CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.

Created by NVIDIA
Released June 23, 2007
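What "harnessing the power of GPUs" means in practice shows up in even the smallest CUDA program: the host copies data to the device, launches a grid of lightweight threads, and each thread handles one element. A minimal, self-contained sketch (the kernel name and sizes are illustrative, not taken from any project listed below):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread adds one element; the GPU runs many of these in parallel.
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;                // one million elements
    const size_t bytes = n * sizeof(float);

    // Host buffers (error checking omitted for brevity).
    float *ha = (float*)malloc(bytes), *hb = (float*)malloc(bytes), *hc = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    // Device buffers and host-to-device copies.
    float *da, *db, *dc;
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    const int threads = 256, blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(da, db, dc, n);

    // Copy the result back and spot-check it.
    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", hc[0]);         // expect 3.0

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}
```

Built with `nvcc vecadd.cu -o vecadd`; the `<<<blocks, threads>>>` launch configuration is what scales the same kernel body from a handful of elements to millions.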
✨ Zero-code distributed tracing and profiling: observability via eBPF 🚀
A high-throughput and memory-efficient inference and serving engine for LLMs
OpenCV installation script with CUDA and cuDNN support
Template library for floating point operations
Self-hosted, local-only NVR and AI computer vision software. With features such as object detection, motion detection, face recognition, and more, it gives you the power to keep an eye on your home, office, or any other place you want to monitor.
A GPU-based correlator for MeerKAT Extension
Containers for machine learning
A high-performance inference system for large language models, designed for production environments.
PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
A library for accelerating Transformer models on NVIDIA GPUs, including 8-bit floating point (FP8) precision on Hopper and Ada GPUs, for better performance and lower memory utilization in both training and inference (see the FP8 sketch after this list).
A retargetable MLIR-based machine learning compiler and runtime toolkit.
Drop-in, local AI alternative to the OpenAI stack. Multi-engine (llama.cpp, TensorRT-LLM). Powers 👋 Jan
CUDA C++ Core Libraries
PyTorch domain library for recommendation systems
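The FP8 point in the Transformer Engine entry above comes down to storage and precision: an e4m3 value occupies one byte versus four for float32, trading mantissa bits for memory and bandwidth. A minimal round-trip sketch using the `cuda_fp8.h` header that ships with CUDA 11.8+ (this illustrates the number format only, not Transformer Engine's API):

```cuda
#include <cstdio>
#include <cuda_fp8.h>  // __nv_fp8_e4m3; requires CUDA 11.8 or newer

int main() {
    float x = 3.14159f;

    // Round-trip through the 1-byte e4m3 format (4 exponent bits, 3 mantissa bits).
    __nv_fp8_e4m3 q(x);      // quantize: float32 -> fp8
    float back = float(q);   // dequantize: fp8 -> float32 (a coarser value comes back)

    printf("sizeof(float) = %zu, sizeof(__nv_fp8_e4m3) = %zu\n",
           sizeof(float), sizeof(__nv_fp8_e4m3));  // 4 bytes vs 1 byte
    printf("original = %f, after fp8 round trip = %f\n", x, back);
    return 0;
}
```

Libraries built around FP8 typically keep accumulations in higher precision and apply scaling factors so the coarse format is used only where it is numerically safe.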