Releases · ggerganov/llama.cpp

11 Jun 16:35

148995e

b3131 Latest

llama-bench: more compact markdown tables (#7879)

Assets 20

cudart-llama-bin-win-cu11.7.1-x64.zip        293 MB    2024-06-11T16:35:38Z
cudart-llama-bin-win-cu12.2.0-x64.zip        413 MB    2024-06-11T16:35:50Z
llama-b3131-bin-macos-arm64.zip              41.4 MB   2024-06-11T16:36:09Z
llama-b3131-bin-macos-x64.zip                38.1 MB   2024-06-11T16:36:13Z
llama-b3131-bin-ubuntu-x64.zip               46 MB     2024-06-11T16:36:15Z
llama-b3131-bin-win-avx-x64.zip              7.33 MB   2024-06-11T16:36:19Z
llama-b3131-bin-win-avx2-x64.zip             7.3 MB    2024-06-11T16:36:21Z
llama-b3131-bin-win-avx512-x64.zip           7.31 MB   2024-06-11T16:36:22Z
llama-b3131-bin-win-cuda-cu11.7.1-x64.zip    134 MB    2024-06-11T16:36:24Z
llama-b3131-bin-win-cuda-cu12.2.0-x64.zip    129 MB    2024-06-11T16:36:31Z
Source code (zip)                                      2024-06-11T12:45:40Z
Source code (tar.gz)                                   2024-06-11T12:45:40Z
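
The binary asset names listed above appear to follow a regular pattern (`llama-<tag>-bin-<os>[-<variant>]-<arch>.zip`). The sketch below rebuilds such a name from its parts, which can help when scripting a download. The pattern is inferred only from the names shown for this release, and `asset_name` is a hypothetical helper, not part of llama.cpp or any GitHub tooling.

```python
def asset_name(tag, os_name, arch, variant=None):
    """Build a release asset file name from its parts.

    Assumption: the pattern is inferred from the b3131 asset list only
    and may not hold for other releases.
    """
    parts = ["llama", tag, "bin", os_name]
    if variant:
        # e.g. "avx2" or "cuda-cu12.2.0" for the Windows builds
        parts.append(variant)
    parts.append(arch)
    return "-".join(parts) + ".zip"


print(asset_name("b3131", "macos", "arm64"))
print(asset_name("b3131", "win", "x64", variant="avx2"))
```

The `cudart-*` archives and the source archives do not follow this pattern, so the helper does not cover them.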

11 Jun 08:15

github-actions

b3130

4bfe50f

b3130

tests : check the Python version (#7872)

ggml-ci

Assets 20

05 Jun 09:25

github-actions

b3091

2b33896

b3091

ggml : refactor rope norm/neox (#7634)

* ggml : unify rope norm/neox (CPU)

* ggml : fix compile warning

* ggml : remove GLM rope mode

ggml-ci

* metal : better rope implementation

ggml-ci

* cuda : better rope implementation

ggml-ci

* naming : n_orig_ctx -> n_ctx_orig

ggml-ci

* dev : add reminders to update backends

ggml-ci

* vulkan : fix ggml_rope_ext() usage

* cuda : fix array size + indents

ggml-ci
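
For readers unfamiliar with the two rope modes named in this release, here is a minimal pure-Python sketch of how they differ. It assumes the common convention that "norm" rotates adjacent element pairs while "neox" pairs element `i` with element `i + n/2`. The function names and the `base` default are illustrative assumptions, not ggml's API.

```python
import math


def rope_angles(pos, n_dims, base=10000.0):
    # One rotation angle per element pair; higher pairs rotate more slowly.
    return [pos * base ** (-2.0 * i / n_dims) for i in range(n_dims // 2)]


def rope_norm(x, pos, base=10000.0):
    """'norm' layout (assumed): rotate adjacent pairs (x[2i], x[2i+1])."""
    out = list(x)
    for i, theta in enumerate(rope_angles(pos, len(x), base)):
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out


def rope_neox(x, pos, base=10000.0):
    """'neox' layout (assumed): rotate split pairs (x[i], x[i + n/2])."""
    half = len(x) // 2
    out = list(x)
    for i, theta in enumerate(rope_angles(pos, len(x), base)):
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + half]
        out[i] = a * c - b * s
        out[i + half] = a * s + b * c
    return out


x = [1.0, 2.0, 3.0, 4.0]
print(rope_norm(x, pos=3))
print(rope_neox(x, pos=3))
```

Both variants apply the same rotations and differ only in which elements form a pair, which is presumably what makes a unified implementation possible.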

Assets 20

05 Jun 00:21

github-actions

b3089

c90dbe0

b3089

Fix per token attributes bits (#7749)

Assets 20

04 Jun 21:55

github-actions

b3088

b90dc56

b3088

Allow number of nodes in CUDA graph to change (#7738)

Previously, the code failed to handle a change in the number of nodes
in an existing CUDA graph. This fixes the issue by removing an
unnecessary conditional.
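
The idea of invalidating a cached graph when its node count changes can be sketched as follows. This is a language-neutral illustration in Python, not the llama.cpp CUDA code: `CachedGraph`, `needs_update`, and `launch` are hypothetical names, and the node properties are reduced to plain comparable values.

```python
from dataclasses import dataclass, field


@dataclass
class CachedGraph:
    """Hypothetical stand-in for a captured graph and its per-node properties."""
    node_props: list = field(default_factory=list)
    rebuilds: int = 0


def needs_update(cached, nodes):
    # A different node count always invalidates the cached graph,
    # regardless of any other state.
    if len(cached.node_props) != len(nodes):
        return True
    # Same count: fall back to a per-node comparison.
    return any(old != new for old, new in zip(cached.node_props, nodes))


def launch(cached, nodes):
    """Rebuild the cached graph if needed and return the rebuild count."""
    if needs_update(cached, nodes):
        cached.node_props = list(nodes)
        cached.rebuilds += 1
    return cached.rebuilds


g = CachedGraph()
launch(g, ["matmul", "add"])          # first use: graph is built
launch(g, ["matmul", "add"])          # unchanged: graph is reused
launch(g, ["matmul", "add", "rope"])  # node count changed: graph is rebuilt
print(g.rebuilds)
```

Checking the node count unconditionally means a grown or shrunk graph is never compared node by node against stale properties.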

Assets 20

04 Jun 21:21

github-actions

b3087

1442677

b3087

common : refactor cli arg parsing (#7675)

* common : gpt_params_parse do not print usage

* common : rework usage print (wip)

* common : valign

* common : rework print_usage

* infill : remove cfg support

* common : reorder args

* server : deduplicate parameters

ggml-ci

* common : add missing header

ggml-ci

* common : remove --random-prompt usages

ggml-ci

* examples : migrate to gpt_params

ggml-ci

* batched-bench : migrate to gpt_params

* retrieval : migrate to gpt_params

* common : change defaults for escape and n_ctx

* common : remove chatml and instruct params

ggml-ci

* common : passkey use gpt_params

Assets 20

04 Jun 21:13

github-actions

b3086

554c247

b3086

ggml : remove OpenCL (#7735)

ggml-ci

Assets 20

04 Jun 20:53

github-actions

b3085

0cd6bd3

b3085

llama : remove beam search (#7736)

Assets 21

04 Jun 17:16

github-actions

b3083

adc9ff3

b3083

llama-bench : allow using a different printer for stderr with -oe (#7722)

compare-commits.sh : hide stdout, use -oe to print markdown

Assets 21

04 Jun 16:47

github-actions

b3082

987d743

b3082

Improve hipBLAS support in CMake (#7696)

* Improve hipBLAS support in CMake

This improves the detection of the correct CMAKE_PREFIX_PATH when using different distributions or a self-built ROCm SDK.

* Set ROCM_PATH correctly

Assets 21
