A high-throughput and memory-efficient inference and serving engine for LLMs
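As context for this entry, here is a minimal sketch of driving such an engine from Python, modeled on vLLM's offline `LLM` API; the model ID and sampling values are illustrative assumptions, not recommendations.

```python
# Minimal offline-inference sketch using vLLM's Python API.
# The model ID and sampling values are illustrative assumptions.
from vllm import LLM, SamplingParams

prompts = ["The capital of France is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

llm = LLM(model="facebook/opt-125m")  # any Hugging Face causal LM
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    # Each result carries the prompt and one or more generated completions.
    print(output.prompt, output.outputs[0].text)
```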
A universal, scalable machine learning model deployment solution
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
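A rough sketch of how a PyTorch model is typically handed to DeepSpeed; the tiny model and the config values below are assumptions for illustration only.

```python
# Sketch: wrapping a PyTorch model with DeepSpeed for training.
# The model and config values below are illustrative assumptions.
import torch
import deepspeed

model = torch.nn.Linear(128, 10)
ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 1},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

# deepspeed.initialize returns an engine that handles distributed
# training details (ZeRO partitioning, mixed precision, etc.).
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```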
Redis integration for the image-dataset-converter library.
PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
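A minimal sketch of loading and running a model with the OpenVINO Python runtime; the model path, target device, and input shape are assumptions.

```python
# Sketch: loading and running a model with the OpenVINO runtime.
# The model path, device, and input shape are assumptions.
import numpy as np
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")         # OpenVINO IR (or ONNX) file
compiled = core.compile_model(model, "CPU")  # target device

# Run a single synchronous inference on random data.
input_tensor = np.random.rand(1, 3, 224, 224).astype(np.float32)
result = compiled([input_tensor])
print(list(result.values())[0].shape)
```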
A high-performance inference system for large language models, designed for production environments.
A fast, easy-to-use, production-ready inference server for computer vision supporting deployment of many popular model architectures and fine-tuned models.
I hope this repo gives you inspiration to learn and grow in the world of Statistics. I am not perfect at everything, so if you have a suggestion I will gladly accept it. I hope you enjoy what you can find on the page. **This repo is still under construction**
JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in the future -- PRs welcome).
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
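A hedged sketch of querying a running Triton server with the `tritonclient` HTTP client; the server URL, model name, tensor names, and shape are assumptions that depend on the deployed model's configuration.

```python
# Sketch: querying a running Triton Inference Server over HTTP.
# The URL, model name, tensor names, and shape are assumptions.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Build the request: one fp32 input tensor named "input__0".
data = np.random.rand(1, 3, 224, 224).astype(np.float32)
inputs = [httpclient.InferInput("input__0", data.shape, "FP32")]
inputs[0].set_data_from_numpy(data)

result = client.infer(model_name="resnet50", inputs=inputs)
print(result.as_numpy("output__0").shape)
```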
Cross-platform, customizable ML solutions for live and streaming media.
Containerized REST API for interacting with Hugging Face NLP models.
High-efficiency floating-point neural network inference operators for mobile, server, and Web
Large Language Model Text Generation Inference
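A minimal sketch of calling a running Text Generation Inference server's `/generate` endpoint over plain HTTP; the host, port, and generation parameters are assumptions.

```python
# Sketch: calling a running Text Generation Inference server's
# /generate endpoint. Host, port, and parameters are assumptions.
import requests

resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "What is deep learning?",
        "parameters": {"max_new_tokens": 64, "temperature": 0.7},
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["generated_text"])
```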
A scalable inference server for models optimized with OpenVINO™
aigc_serving: a lightweight and efficient inference service for language models
ONNX runtime for Flutter.
Inverse modelling framework for dynamical systems characterised by complex dynamics.