A high-throughput and memory-efficient inference and serving engine for LLMs
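As context for this entry, here is a minimal sketch of driving such an engine from Python, modeled on vLLM's offline `LLM` API; the model ID and sampling values are illustrative assumptions, not recommendations.

```python
# Minimal offline-inference sketch using vLLM's Python API.
# The model ID and sampling values are illustrative assumptions.
from vllm import LLM, SamplingParams

prompts = ["The capital of France is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

llm = LLM(model="facebook/opt-125m")  # any Hugging Face causal LM
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    # Each result carries the prompt and one or more generated completions.
    print(output.prompt, output.outputs[0].text)
```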
A universal, scalable machine learning model deployment solution
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
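A rough sketch of how a PyTorch model is typically handed to DeepSpeed; the tiny model and the config values below are assumptions for illustration only.

```python
# Sketch: wrapping a PyTorch model with DeepSpeed for training.
# The model and config values below are illustrative assumptions.
import torch
import deepspeed

model = torch.nn.Linear(128, 10)
ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 1},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

# deepspeed.initialize returns an engine that handles distributed
# training details (ZeRO partitioning, mixed precision, etc.).
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```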
Redis integration for the image-dataset-converter library.
PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
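A minimal sketch of loading and running a model with the OpenVINO Python runtime; the model path, target device, and input shape are assumptions.

```python
# Sketch: loading and running a model with the OpenVINO runtime.
# The model path, device, and input shape are assumptions.
import numpy as np
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")         # OpenVINO IR (or ONNX) file
compiled = core.compile_model(model, "CPU")  # target device

# Run a single synchronous inference on random data.
input_tensor = np.random.rand(1, 3, 224, 224).astype(np.float32)
result = compiled([input_tensor])
print(list(result.values())[0].shape)
```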
A high-performance inference system for large language models, designed for production environments.
A fast, easy-to-use, production-ready inference server for computer vision supporting deployment of many popular model architectures and fine-tuned models.
I hope this repo gives you inspiration to learn and grow in the world of Statistics. I am not perfect at everything, so if you have a suggestion I will gladly accept it. I hope you enjoy what you can find on the page. **This repo is still under construction**
JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in the future -- PRs welcome).
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
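A hedged sketch of querying a running Triton server with the `tritonclient` HTTP client; the server URL, model name, tensor names, and shape are assumptions that depend on the deployed model's configuration.

```python
# Sketch: querying a running Triton Inference Server over HTTP.
# The URL, model name, tensor names, and shape are assumptions.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Build the request: one fp32 input tensor named "input__0".
data = np.random.rand(1, 3, 224, 224).astype(np.float32)
inputs = [httpclient.InferInput("input__0", data.shape, "FP32")]
inputs[0].set_data_from_numpy(data)

result = client.infer(model_name="resnet50", inputs=inputs)
print(result.as_numpy("output__0").shape)
```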
Cross-platform, customizable ML solutions for live and streaming media.
Containerized REST API for interacting with Hugging Face NLP models.
High-efficiency floating-point neural network inference operators for mobile, server, and Web
Large Language Model Text Generation Inference
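A minimal sketch of calling a running Text Generation Inference server's `/generate` endpoint over plain HTTP; the host, port, and generation parameters are assumptions.

```python
# Sketch: calling a running Text Generation Inference server's
# /generate endpoint. Host, port, and parameters are assumptions.
import requests

resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "What is deep learning?",
        "parameters": {"max_new_tokens": 64, "temperature": 0.7},
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["generated_text"])
```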
A scalable inference server for models optimized with OpenVINO™
aigc_serving: a lightweight and efficient inference service for language models
ONNX runtime for Flutter.
Inverse modelling framework for dynamical systems characterised by complex dynamics.