Issues: triton-inference-server/server
Could you give some examples of ragged input config for the TensorRT backend?
#7339, opened Jun 11, 2024 by wanghuihhh
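For the question above, a minimal sketch of a ragged-input model configuration, based on Triton's ragged batching support; the input name, dims, and the batch_input section are illustrative assumptions, not taken from the issue:

max_batch_size: 8
input [
  {
    name: "INPUT0"              # hypothetical input name
    data_type: TYPE_FP32
    dims: [ -1 ]
    allow_ragged_batch: true    # lets requests with different lengths share a batch
  }
]
batch_input [
  {
    kind: BATCH_ACCUMULATED_ELEMENT_COUNT
    target_name: "INDEX"        # extra model input marking where each request ends
    data_type: TYPE_FP32
    source_input: "INPUT0"
  }
]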
Triton server crash when running a large model with an ONNX/CPU backend
investigating: The development team is investigating this issue
#7337, opened Jun 10, 2024 by LucasAudebert
Poll failed for model directory 'diabetes_model': Invalid model name: Could not determine backend for model 'diabetes_model' with no backend in model configuration. Expected model name of the form 'model.<backend_name>'
investigating: The development team is investigating this issue
question: Further information is requested
#7336, opened Jun 10, 2024 by Manishthakur2503
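A hedged note on the error above: Triton determines a model's backend either from a backend/platform field in config.pbtxt or, failing that, from a model name of the form model.<backend_name>. A sketch of one possible fix, assuming an ONNX model (the backend choice is an assumption; use whichever framework produced diabetes_model):

diabetes_model/
  config.pbtxt        # contains: backend: "onnxruntime"   <- assumed backend
  1/
    model.onnx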
Does Triton Server support dynamic request batching for models which have sparse tensors as inputs?
enhancement: New feature or request
investigating: The development team is investigating this issue
#7333, opened Jun 7, 2024 by MorrisMLZ
Segmentation fault when sending multiple requests to triton-vllm
bug: Something isn't working
question: Further information is requested
#7332, opened Jun 7, 2024 by tricky61
Segmentation fault (core dumped) - Server version 2.46.0
question: Further information is requested
#7330, opened Jun 6, 2024 by rahchuenmonroe
CUDA runtime API error raised when using only CPU on Mac M3
investigating: The development team is investigating this issue
#7324, opened Jun 5, 2024 by SunXuan90
Triton Server 24.05 can't detect CUDA drivers if the host system has NVIDIA driver 555.85 installed
#7319, opened Jun 4, 2024 by romanvelichkin
Uneven QPS leads to low throughput and high latency as well as low GPU utilization
question: Further information is requested
#7318, opened Jun 4, 2024 by SunnyGhj
When the request is large, the Triton server has a very high TTFT.
investigating: The development team is investigating this issue
#7316, opened Jun 4, 2024 by Godlovecui
Memory over 100% with decoupled DALI video model
investigating: The development team is investigating this issue
#7315, opened Jun 3, 2024 by wq9
Single Docker layer is too large
investigating: The development team is investigating this issue
#7314, opened Jun 3, 2024 by ShuaiShao93
Low QPS with momentary traffic surges causes significant increases in inference TP99 latency
question: Further information is requested
#7313, opened Jun 3, 2024 by a1342772
Triton malloc failure
question: Further information is requested
#7308, opened May 31, 2024 by MouseSun846
Unexpected datatype TYPE_INT64 for inference input, expecting TYPE_INT32
question: Further information is requested
#7307, opened May 31, 2024 by CallmeZhangChenchen
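Mismatches like the one above usually mean the client sent int64 data to an input whose config declares TYPE_INT32. A minimal client-side sketch, assuming the HTTP client; the model name, input name, and shape are illustrative assumptions:

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

data = np.array([[101, 2023, 102]], dtype=np.int64)   # e.g. tokenizer output is int64
inp = httpclient.InferInput("input_ids", data.shape, "INT32")
inp.set_data_from_numpy(data.astype(np.int32))        # cast to match the model config
result = client.infer(model_name="my_model", inputs=[inp])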
Add TT-Metalium as a backend
enhancement: New feature or request
#7305, opened May 30, 2024 by jvasilje
Why is my model in an ensemble receiving out-of-order input?
question: Further information is requested
#7303, opened May 30, 2024 by Joenhle
ONNX backend with TensorRT optimizer sometimes fails to start
#7296, opened May 29, 2024 by ShuaiShao93
How does Triton implement one instance to handle multiple requests simultaneously?
investigating: The development team is investigating this issue
#7295, opened May 29, 2024 by SeibertronSS
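Background on the question above: a single Triton model can serve overlapping requests through multiple model instances and/or dynamic batching, both declared in config.pbtxt. A minimal sketch with illustrative values:

instance_group [
  {
    count: 2        # two copies of the model execute requests concurrently
    kind: KIND_GPU
  }
]
dynamic_batching {
  max_queue_delay_microseconds: 100   # wait briefly so requests can be batched together
}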
Support histogram custom metric in Python backend
enhancement: New feature or request
#7287, opened May 28, 2024 by ShuaiShao93
What is the correct way to run inference in parallel in Triton?
#7283, opened May 28, 2024 by sandesha-hegde
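One common pattern for the question above is client-side parallelism via the HTTP client's async API, paired with multiple instances or dynamic batching on the server. A hedged sketch; the model name, input name, shapes, and concurrency value are assumptions:

import numpy as np
import tritonclient.http as httpclient

# concurrency sets how many connections the client keeps open for parallel requests
client = httpclient.InferenceServerClient(url="localhost:8000", concurrency=4)

def make_inputs(batch):
    inp = httpclient.InferInput("INPUT0", batch.shape, "FP32")
    inp.set_data_from_numpy(batch)
    return [inp]

# issue several requests without blocking; the server schedules them across instances
pending = [
    client.async_infer(model_name="my_model",
                       inputs=make_inputs(np.random.rand(1, 16).astype(np.float32)))
    for _ in range(8)
]
results = [p.get_result() for p in pending]   # block until each request completes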