SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
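Most of the toolkits in this list share the same core primitive. As a generic, hedged illustration (not the implementation of any particular library listed here; `quantize_int8`/`dequantize` are hypothetical helper names), symmetric per-tensor INT8 weight quantization can be sketched as:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8 quantization: map floats onto [-127, 127]."""
    scale = np.max(np.abs(w)) / 127.0          # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original floats."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.013, 1.0], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)                        # reconstruction error <= scale/2
```

Lower-bit formats (INT4, FP4, NF4) follow the same scale-and-round pattern with fewer representable levels, usually per-group rather than per-tensor scales.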
Neural Networks with low bit weights on low end 32 bit microcontrollers such as the CH32V003 RISC-V Microcontroller and others
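On 32-bit microcontrollers flash is usually the binding constraint, so low-bit weights are stored packed. A minimal sketch of packing two signed 4-bit weights per byte (hypothetical `pack_int4`/`unpack_int4` helpers, not this project's API):

```python
import numpy as np

def pack_int4(q):
    """Pack signed 4-bit values (-8..7) two per byte for compact storage."""
    q = np.asarray(q, dtype=np.int8)
    nibbles = (q & 0x0F).astype(np.uint8)        # keep two's-complement nibble
    if len(nibbles) % 2:
        nibbles = np.append(nibbles, 0)          # pad odd-length input
    return (nibbles[0::2] | (nibbles[1::2] << 4)).astype(np.uint8)

def unpack_int4(packed, n):
    """Unpack n signed 4-bit values from a packed byte array."""
    lo = (packed & 0x0F).astype(np.int8)
    hi = ((packed >> 4) & 0x0F).astype(np.int8)
    vals = np.empty(2 * len(packed), dtype=np.int8)
    vals[0::2], vals[1::2] = lo, hi
    vals = np.where(vals > 7, vals - 16, vals)   # sign-extend the nibble
    return vals[:n]
```

This halves weight storage versus INT8; the MCU kernel then unpacks nibbles on the fly during the dot product.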
Model Compression Toolkit (MCT) is an open source project for neural network model optimization targeting efficient deployment on constrained hardware. This project provides researchers, developers, and engineers with advanced quantization and compression tools for deploying state-of-the-art neural networks.
Unify Efficient Fine-Tuning of 100+ LLMs
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
Implementations of various ML tasks on the Kaggle platform with GPUs.
Train, Evaluate, Optimize, Deploy Computer Vision Models via OpenVINO™
TinyChatEngine: On-Device LLM Inference Library
Brevitas: neural network quantization in PyTorch
On-device LLM Inference Powered by X-Bit Quantization
Neural Network Compression Framework for enhanced OpenVINO™ inference
Self-Created Tools to convert ONNX files (NCHW) to TensorFlow/TFLite/Keras format (NHWC). The purpose of this tool is to solve the massive Transpose extrapolation problem in onnx-tensorflow (onnx-tf). I don't need a Star, but give me a pull request.
🤗 Optimum Intel: Accelerate inference with Intel optimization tools
A collection of hands-on notebooks for LLM practitioners.
🎨 Convert images to 15/16-bit RGB color with dithering
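As a rough sketch of how such a conversion can work (generic Floyd-Steinberg error diffusion to 5/6/5 bits per channel, i.e. RGB565 levels; this is an illustration, not this repo's actual implementation):

```python
import numpy as np

def to_rgb565_dithered(img):
    """Reduce an HxWx3 image (float, 0..255) to RGB565 color levels using
    Floyd-Steinberg error diffusion. Generic sketch under stated assumptions."""
    img = img.astype(np.float32).copy()
    levels = np.array([31.0, 63.0, 31.0], dtype=np.float32)  # 5/6/5 bits
    h, w, _ = img.shape
    out = np.zeros_like(img)
    for y in range(h):
        for x in range(w):
            old = img[y, x].copy()
            new = np.round(old / 255.0 * levels) * (255.0 / levels)
            out[y, x] = new
            err = old - new
            # diffuse the quantization error to not-yet-visited neighbors
            if x + 1 < w:
                img[y, x + 1] += err * (7 / 16)
            if y + 1 < h:
                if x > 0:
                    img[y + 1, x - 1] += err * (3 / 16)
                img[y + 1, x] += err * (5 / 16)
                if x + 1 < w:
                    img[y + 1, x + 1] += err * (1 / 16)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

Dithering trades banding for noise: each pixel snaps to the nearest representable level, and the residual is pushed onto neighbors so the local average color is preserved.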
PEFT is a wonderful tool that enables training very large models in low-resource environments. Quantization and PEFT will enable widespread adoption of LLMs.
SOTA weight-only quantization algorithm for LLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs".
A Python package that extends official PyTorch to easily obtain higher performance on Intel platforms.
[SIGMOD 2024] RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor Search
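RaBitQ itself applies a random rotation and derives distance estimators with a proven error bound; as a much-simplified cousin for intuition only (hypothetical helper names, not the RaBitQ algorithm), plain 1-bit sign quantization for approximate nearest neighbor candidate search looks like:

```python
import numpy as np

def binary_quantize(vecs):
    """1-bit sign quantization (far simpler than RaBitQ, which also uses a
    random rotation and per-vector correction factors for its error bound)."""
    return (np.asarray(vecs) > 0).astype(np.uint8)

def hamming(a, b):
    """Hamming distance between two binary codes."""
    return int(np.count_nonzero(a != b))

rng = np.random.default_rng(0)
base = rng.standard_normal((10, 64))    # toy database of 10 vectors
codes = binary_quantize(base)
query_code = binary_quantize(base[3])   # query identical to database vector 3
dists = [hamming(query_code, c) for c in codes]
nearest = int(np.argmin(dists))         # Hamming distance as a cheap proxy
```

Each 64-dim float vector shrinks to 8 bytes, and Hamming distance over the codes gives a fast, approximate ranking that a refinement pass over the original vectors can then correct.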