nicdex @nicdex

**Grigory Shepelev** @shegeley@fosstodon.org · Jul 9

Grigory Shepelev @shegeley@fosstodon.org

I am an #AI-enhanced coding believer now since I've started working at the new place (3-4 months ago). Using #openrouter is a corporate practice there and it's kinda obligatory.

Now I want to enhance my #guix setup with all mcp's possible, upgrade a video card in my desktop and start local #llamacpp server and share it with some friends.

**risse** @risse@mastodon.content.town · Jul 3

Jul 3

risse @risse@mastodon.content.town

Running a privacy-friendly local LLM on a Raspberry Pi? It's possible, check out my video below

https://www.youtube.com/watch?v=TNxIIDkP2Zg

YouTubeRunning AI on a Raspberry PiBy Krisseck

#raspberrypi #ai #llamacpp

**Saemon Zixel** @saemonzixel@mastodon.ml · Jun 22

Jun 22

Saemon Zixel @saemonzixel@mastodon.ml

Запустил llama.cpp на другой материнке с процессором AMD E2-3000. Это хоть и аналог Intel Atom, но посовременнее.

Разбор промпта и генерация ответа стали чуть-чуть быстрее. На 10 процентов примерно. Хотя память DDR3 работает на шине 1600МГц и быстрее в 1,5 раза, чем предыдущая DDR2 на 1066МГц шине. Зато процессор был на 2,6ГГц. А у этого всего лишь 1,6ГГц.

Перекомпилировал llama.cpp на этом процессоре, и скорость прям удвоилась.
Vikhr-Llama-3.2-1B-Q8_0 выдаёт 2 токена в секунду.
А QwQ-500M.Q8_0 выдаёт 6 токенов в секунду и прям так бодренько пишет ответ. Правда, моделька глупенькая, склонна рассуждать и редко правильно отвечает.

Как я понял, это всё из-за поддержки процессором AVX1 и FP16C. А скорость оперативной памяти, к сожалению, тут почти не играет роли.

#llamacpp #vikhr #qwq

**Eric Curtin** @ecurtin@treehouse.systems · Jun 20

Jun 20

Eric Curtin @ecurtin@treehouse.systems

RamaLama just got multimodal! See, understand & respond to visual info with new VLM capabilities. Shoutout to Xuan-Son Nguyen! #RamaLama #AI #llamacpp

https://developers.redhat.com/articles/2025/06/20/unleashing-multimodal-magic-ramalama

Red Hat Developer · Jun 20Unleashing multimodal magic with RamaLama | Red Hat DeveloperRamaLama's new multimodal feature integrates vision-language models with containers. Discover how it helps developers download and serve multimodal AI models

**Eric Curtin** @ecurtin@treehouse.systems · Jun 12

Jun 12

Eric Curtin @ecurtin@treehouse.systems

Stef Walter utilising one of #RamaLama 's latest features, containerised multi-modal inferencing.
We make great use of Xuan-Son Nguyen's demo application #llamacpp

**Olivier Chafik** @ochafik@fosstodon.org · May 25

May 25

Olivier Chafik @ochafik@fosstodon.org

llama.cpp streaming support for tool calling & thoughts was just merged: please test & report any issues

https://github.com/ggml-org/llama.cpp/pull/12379

This PR is still WIP (see todos at the bottom) but welcoming early feedback / testing

Support streaming of tool calls in OpenAI format
Improve handling of thinking model (DeepSeek R1 Distills, QwQ...

GitHub`server`: streaming of tool calls and thoughts when `--jinja` is on by ochafik · Pull Request #12379 · ggml-org/llama.cppBy ochafik

#llamacpp

**Eric Curtin** @ecurtin@treehouse.systems · May 18

May 18

Eric Curtin @ecurtin@treehouse.systems

On route to #redhatsummit, watch out for: "AI inferencing for developers and administrators", "Securing AI workloads with RamaLama", "RamaLama Making developing AI Boring". We may even see a vlm demo, very accurate models as we can see here #ramalama #llamacpp

**Boiling Steam** @boilingsteam@mastodon.cloud · May 10

May 10

Boiling Steam @boilingsteam@mastodon.cloud

Vision Now Available in Llama.cpp: https://github.com/ggml-org/llama.cpp/blob/master/docs/multimodal.md
#linux #update #foss #release #llamacpp #vision #ai #llm

LLM inference in C/C++. Contribute to ggml-org/llama.cpp development by creating an account on GitHub.

GitHubllama.cpp/docs/multimodal.md at master · ggml-org/llama.cppLLM inference in C/C++. Contribute to ggml-org/llama.cpp development by creating an account on GitHub.

**Winbuzzer** @winbuzzer@mastodon.social · May 6

May 6

Winbuzzer @winbuzzer@mastodon.social

Microsoft Clippy Returns as AI Assistant, Empowered By LLMs You Can Run Locally on Your PC

#AI #Clippy #AIClippy #AIAssistants #LLMs #LocalAI #OpenSource #ElectronJS #LlamaCpp #GGUF #Gemma3 #Llama3 #Phi4 #Qwen3 #RetroTech #MicrosoftOffice #OnDeviceAI

https://winbuzzer.com/2025/05/06/microsoft-clippy-returns-as-ai-assistant-empowered-by-llms-you-can-run-locally-on-your-pc-xcxwbn/

**Saemon Zixel** @saemonzixel@lor.sh · Apr 30

Apr 30

Saemon Zixel @saemonzixel@lor.sh

А llama.cpp достаточно легко и просто скомпилировалась в моей 32битной altlinux. Зависимостей мизер. Ничего не потребовалось доустанавливать, компилить. При этом работает стабильно, не ругается, не сегфолтиться.

Тестил с Vikhr-Llama-3.2-1B-Q8_0.gguf, которая на 1,2ГБ и знает русский язык. Скорость "чтения" промпта 2 токена/сек. А скорость генерации ответа 1 токен/сек. Для вопросов "не к спеху" можно использовать, но качество ответа так себе.

Замечу, что компьютер у меня старенький: Pentium D E6300 на 2,8Ггц, поддерживает максимум SSSE3 и работает с памятью DDR2 на 4ГБ. По этому, то, что есть уже радует меня)

#llama #llamacpp #linux

**Phil** @phil@fed.bajsicki.com · Apr 29

Apr 29

Phil @phil@fed.bajsicki.com

Big hopes for Qwen3. IF the 30A3B model works well, gptel-org-tools will be very close to what I envision as a good foundation for the package.

It's surprisingly accurate, especially with reasoning enabled.

At the same time, I'm finding that #gptel struggles a lot with handling LLM output that contains reasoning, content and tool calls at once.

I'm stumped. These new models are about as good as it's ever been for local inference, and they work great in both the llama-server and LM Studio UI's.

Changing the way I prompt doesn't work. I tried taking an axe to gptel-openai.el, but I frankly don't understand the code nearly well enough to get a working version going.

So... yeah. Kinda stuck.

Not sure what next. Having seen Qwen3, I'm not particularly happy to go back to older models.

#emacs #gptelorgtools #llamacpp

**Hassan Habib** @hassanhabib · Apr 27

Apr 27

Hassan Habib @hassanhabib

Run AI completely offline with Llama-CLI and C#!
No cloud. Full control.
Watch the full guide here: https://www.youtube.com/watch?v=lc6lVCe0XHI
#AI #CSharp #OfflineAI #LlamaCpp

YouTubeRun AI Offline in C#.NETBy Hassan Habib

**Peter Lord** @plord12@mastodon.social · Apr 21

Apr 21

Peter Lord @plord12@mastodon.social

Started preparing for my next talk on @u3acommunities.org.

Will outline running #generativeai locally, mainly for privacy reasons.

Will include #llamacpp #ollama #AUTOMATIC1111 #openwebui and probably others.

Any pointers of things to mention appreciated !

**N-gated Hacker News** @ngate@mastodon.social · Mar 26

Mar 26

N-gated Hacker News @ngate@mastodon.social

Oh, the riveting saga of Llama.cpp's heap—it’s like watching paint dry, but with more compiler errors. Our intrepid hacker spent 30 hours (yes, you read that right) dissecting code so niche, even the bugs were disinterested.
https://retr0.blog/blog/llama-rpc-rce #LlamaCpp #Debugging #CodeNiche #CompilerErrors #HackerNews #HackerNews #ngated

retr0.blogRetr0's RegisterRetr0's Threat Research

**Hacker News** @h4ckernews@mastodon.social · Mar 26

Mar 26

Hacker News @h4ckernews@mastodon.social

Heap-overflowing Llama.cpp to RCE

https://retr0.blog/blog/llama-rpc-rce

retr0.blogRetr0's RegisterRetr0's Threat Research

#HackerNews #HeapOverflow #LlamaCpp

**Nexus6** @nexus_6@mastodon.social · Mar 24

Mar 24

Nexus6 @nexus_6@mastodon.social

I've just published the second part of my guide on setting up an AI/LLM stack in Haiku. If you've been curious about running AI models on alternative operating systems, this one's for you!
https://blog.nexus6.me/new%20adventures%20in%20ai/Setup-an-environment-for-AI-in-Haiku-Part-2/
#HaikuOS #langchain #openai #llamacpp

Nexus6's Blog · Mar 19Setting up an AI/LLM Stack in Haiku: A Practical Guide part IIUsing AI Components in Haiku

**Nexus6** @nexus_6@mastodon.social · Mar 24

Mar 24

Nexus6 @nexus_6@mastodon.social

I've just published the first part of my guide on setting up an AI/LLM stack in Haiku. If you've been curious about running AI models on alternative operating systems, this one's for you!
https://blog.nexus6.me/new%20adventures%20in%20ai/Setup-an-environment-for-AI-in-Haiku-Part-1/
#HaikuOS #langchain #openai #llamacpp

Nexus6's Blog · Mar 19Setting up an AI/LLM Stack in Haiku: A Practical Guide part II wanted to push the boundaries of what Haiku can do, so I decided to experiment with setting up a complete AI stack on it. My goal was to see if Haiku could actually run modern Large Language Models without GPU acceleration by using the most used frameworks like LangChain. While mainstream operating systems often require powerful hardware for AI workloads, I was curious if Haiku might offer a practical alternative for enthusiasts who want to explore AI without investing in specialized equipment. In this article, I’ll walk you through how I built a functional Python environment for AI in Haiku and demonstrate how to leverage essential components for working with LLMs, all running on modest hardware like my ThinkPad T480s.

**Hacker News** @h4ckernews@mastodon.social · Mar 10

Mar 10

Hacker News @h4ckernews@mastodon.social

Llama.cpp AI Performance with the GeForce RTX 5090 Review — https://www.phoronix.com/review/nvidia-rtx5090-llama-cpp
#HackerNews #LlamaCPP #AI #GeForceRTX5090 #NVIDIA #Review #TechNews

www.phoronix.comLlama.cpp AI Performance With The GeForce RTX 5090 Review

**Todd A. Jacobs | Rubyist** @todd_a_jacobs@ruby.social · Feb 11

Feb 11

Todd A. Jacobs | Rubyist @todd_a_jacobs@ruby.social

It seems like metal-enabled #llamacpp using #gguf is faster than llama.cpp with #mlx on my #AppleSilicon. #Ollama is mlx-only and slower, so not just a tool optimization.

MLX was designed for Metal so should be faster. Maybe it helps more with Apple Intelligence or something? I now choose GGUF over MLX unless I specifically need Ollama.

Anyone else had similar experiences? Do newer M-series chips do a better job with it, or did I not account for something?

https://github.com/ggerganov/llama.cpp

LLM inference in C/C++. Contribute to ggerganov/llama.cpp development by creating an account on GitHub.

GitHubGitHub - ggerganov/llama.cpp: LLM inference in C/C++LLM inference in C/C++. Contribute to ggerganov/llama.cpp development by creating an account on GitHub.

**Olivier Chafik** @ochafik@fosstodon.org · Feb 1

Feb 1

Olivier Chafik @ochafik@fosstodon.org

llama.cpp now supports tool calling (OpenAI-compatible)

https://github.com/ggerganov/llama.cpp/pull/9639

On top of generic support for *all* models, it supports 8+ models’ native formats:
- Llama 3.x
- Functionary 3
- Hermes 2/3
- Qwen 2.5
- Mistral Nemo
- Firefunction 3
- DeepSeek R1

Runs anywhere (incl. Raspberry Pi 5).
On a Mac:

brew install llama.cpp
llama-server --jinja -fa -hf bartowski/Qwen2.5-7B-Instruct-GGUF:Q4_K_M

Still fresh / lots of bugs to discover: feedback welcome!

This supersedes #6389 (now using a fully C++ approach), #5695 (first attempt at supporting Functionary) and #9592 (more recent Python wrapper).
Which models are supported (in their native style)?
W...

GitHubTool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars by ochafik · Pull Request #9639 · ggerganov/llama.cppBy ochafik

#llamacpp

Recent searches

Search options

Administered by:

Server stats:

#llamacpp