llamaR: Interface for Large Language Models via 'llama.cpp'

Provides 'R' bindings to 'llama.cpp' for running Large Language Models ('LLMs') locally with optional 'Vulkan' GPU acceleration via 'ggmlR'. Supports model loading, text generation, tokenization, token-to-piece conversion, embeddings (single and batch), encoder-decoder inference, low-level batch management, chat templates, 'LoRA' adapters, explicit backend/device selection, multi-GPU split, and 'NUMA' optimization. Includes a high-level 'ragnar'-compatible embedding provider ('embed_llamar'). Built on top of 'ggmlR' for efficient tensor operations.
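As a sketch of how the embedding provider might be used in a session: `embed_llamar` is the only function named in the description above, so the argument names below ('model', the character-vector input) and the local model path are illustrative assumptions, not the package's documented API. A GGUF model file must be downloaded separately.

```r
library(llamaR)

# Hypothetical path to a locally downloaded GGUF embedding model --
# adjust to your own setup; no specific model is implied by the package.
model_path <- "models/embedding-model.gguf"

# embed_llamar() is the 'ragnar'-compatible embedding provider shipped by
# the package; the exact signature may differ from this sketch.
emb <- embed_llamar(c("llama.cpp runs LLMs locally",
                      "ggmlR provides the tensor backend"),
                    model = model_path)

# A ragnar-style provider conventionally returns one embedding vector per
# input text, e.g. a numeric matrix with one row per document.
dim(emb)
```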

Version: 0.2.2
Depends: R (≥ 4.1.0), ggmlR
Imports: jsonlite, utils
LinkingTo: ggmlR
Suggests: testthat (≥ 3.0.0), withr
Published: 2026-03-05
DOI: 10.32614/CRAN.package.llamaR (may not be active yet)
Author: Yuri Baramykov [aut, cre], Georgi Gerganov [cph] (Author of the 'llama.cpp' library included in src/)
Maintainer: Yuri Baramykov <lbsbmsu at mail.ru>
BugReports: https://github.com/Zabis13/llamaR/issues
License: MIT + file LICENSE
URL: https://github.com/Zabis13/llamaR
NeedsCompilation: yes
SystemRequirements: C++17, GNU make
Materials: README, NEWS
CRAN checks: llamaR results [issues need fixing before 2026-03-20]

Documentation:

Reference manual: llamaR.html, llamaR.pdf

Downloads:

Package source: llamaR_0.2.2.tar.gz
Windows binaries: r-devel: not available, r-release: llamaR_0.2.2.zip, r-oldrel: llamaR_0.2.2.zip
macOS binaries: r-release (arm64): llamaR_0.2.2.tgz, r-oldrel (arm64): llamaR_0.2.2.tgz, r-release (x86_64): llamaR_0.2.2.tgz, r-oldrel (x86_64): llamaR_0.2.2.tgz

Linking:

Please use the canonical form https://CRAN.R-project.org/package=llamaR to link to this page.