Issues: triton-inference-server/server
Could you give some examples of ragged input config for the TensorRT backend?
#7339, opened Jun 11, 2024 by wanghuihhh
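For the question above, a minimal sketch of a ragged-input model configuration, based on Triton's ragged batching support; the input name, dims, and the batch_input section are illustrative assumptions, not taken from the issue:

max_batch_size: 8
input [
  {
    name: "INPUT0"              # hypothetical input name
    data_type: TYPE_FP32
    dims: [ -1 ]
    allow_ragged_batch: true    # lets requests with different lengths share a batch
  }
]
batch_input [
  {
    kind: BATCH_ACCUMULATED_ELEMENT_COUNT
    target_name: "INDEX"        # extra model input marking where each request ends
    data_type: TYPE_FP32
    source_input: "INPUT0"
  }
]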
Triton server crash when running a large model with an ONNX/CPU backend
investigating: The development team is investigating this issue
#7337, opened Jun 10, 2024 by LucasAudebert
Poll failed for model directory 'diabetes_model': Invalid model name: Could not determine backend for model 'diabetes_model' with no backend in model configuration. Expected model name of the form 'model.<backend_name>'
investigating: The development team is investigating this issue
question: Further information is requested
#7336, opened Jun 10, 2024 by Manishthakur2503
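A hedged note on the error above: Triton determines a model's backend either from a backend/platform field in config.pbtxt or, failing that, from a model name of the form model.<backend_name>. A sketch of one possible fix, assuming an ONNX model (the backend choice is an assumption; use whichever framework produced diabetes_model):

diabetes_model/
  config.pbtxt        # contains: backend: "onnxruntime"   <- assumed backend
  1/
    model.onnx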
Does Triton Server support dynamic request batching for models which have sparse tensors as inputs?
enhancement: New feature or request
investigating: The development team is investigating this issue
#7333, opened Jun 7, 2024 by MorrisMLZ
Segmentation fault when sending multiple requests to triton-vllm
bug: Something isn't working
question: Further information is requested
#7332, opened Jun 7, 2024 by tricky61
Segmentation fault (core dumped) - Server version 2.46.0
question: Further information is requested
#7330, opened Jun 6, 2024 by rahchuenmonroe
CUDA runtime API error raised when using only CPU on Mac M3
investigating: The development team is investigating this issue
#7324, opened Jun 5, 2024 by SunXuan90
Triton Server 24.05 can't detect CUDA drivers if the host system has NVIDIA driver 555.85 installed
#7319, opened Jun 4, 2024 by romanvelichkin
Uneven QPS leads to low throughput and high latency as well as low GPU utilization
question: Further information is requested
#7318, opened Jun 4, 2024 by SunnyGhj
When the request is large, the Triton server has a very high TTFT.
investigating: The development team is investigating this issue
#7316, opened Jun 4, 2024 by Godlovecui
Memory over 100% with decoupled DALI video model
investigating: The development team is investigating this issue
#7315, opened Jun 3, 2024 by wq9
Single Docker layer is too large
investigating: The development team is investigating this issue
#7314, opened Jun 3, 2024 by ShuaiShao93
Low QPS with momentary traffic surges causes significant increases in inference TP99 latency
question: Further information is requested
#7313, opened Jun 3, 2024 by a1342772
Triton malloc failure
question: Further information is requested
#7308, opened May 31, 2024 by MouseSun846
Unexpected datatype TYPE_INT64 for inference input, expecting TYPE_INT32
question: Further information is requested
#7307, opened May 31, 2024 by CallmeZhangChenchen
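Mismatches like the one above usually mean the client sent int64 data to an input whose config declares TYPE_INT32. A minimal client-side sketch, assuming the HTTP client; the model name, input name, and shape are illustrative assumptions:

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

data = np.array([[101, 2023, 102]], dtype=np.int64)   # e.g. tokenizer output is int64
inp = httpclient.InferInput("input_ids", data.shape, "INT32")
inp.set_data_from_numpy(data.astype(np.int32))        # cast to match the model config
result = client.infer(model_name="my_model", inputs=[inp])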
Add TT-Metalium as a backend
enhancement: New feature or request
#7305, opened May 30, 2024 by jvasilje
Why is my model in an ensemble receiving out-of-order input?
question: Further information is requested
#7303, opened May 30, 2024 by Joenhle
ONNX backend with TensorRT optimizer sometimes fails to start
#7296, opened May 29, 2024 by ShuaiShao93
How does Triton implement one instance to handle multiple requests simultaneously?
investigating: The development team is investigating this issue
#7295, opened May 29, 2024 by SeibertronSS
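Background on the question above: a single Triton model can serve overlapping requests through multiple model instances and/or dynamic batching, both declared in config.pbtxt. A minimal sketch with illustrative values:

instance_group [
  {
    count: 2        # two copies of the model execute requests concurrently
    kind: KIND_GPU
  }
]
dynamic_batching {
  max_queue_delay_microseconds: 100   # wait briefly so requests can be batched together
}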
Support histogram custom metric in Python backend
enhancement: New feature or request
#7287, opened May 28, 2024 by ShuaiShao93
What is the correct way to run inference in parallel in Triton?
#7283, opened May 28, 2024 by sandesha-hegde
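One common pattern for the question above is client-side parallelism via the HTTP client's async API, paired with multiple instances or dynamic batching on the server. A hedged sketch; the model name, input name, shapes, and concurrency value are assumptions:

import numpy as np
import tritonclient.http as httpclient

# concurrency sets how many connections the client keeps open for parallel requests
client = httpclient.InferenceServerClient(url="localhost:8000", concurrency=4)

def make_inputs(batch):
    inp = httpclient.InferInput("INPUT0", batch.shape, "FP32")
    inp.set_data_from_numpy(batch)
    return [inp]

# issue several requests without blocking; the server schedules them across instances
pending = [
    client.async_infer(model_name="my_model",
                       inputs=make_inputs(np.random.rand(1, 16).astype(np.float32)))
    for _ in range(8)
]
results = [p.get_result() for p in pending]   # block until each request completes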