#llamacpp

Grigory Shepelev:
I am an #AI-enhanced coding believer now, since I started working at my new place (3-4 months ago). Using #openrouter is corporate practice there and is more or less obligatory.
Now I want to enhance my #guix setup with every MCP possible, upgrade the video card in my desktop, start a local #llamacpp server, and share it with some friends.
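For anyone sketching out a similar setup, a shared local server could look roughly like this; the model path, context size, and port are placeholders, not details from the post:

# Expose an OpenAI-compatible API on the local network (model and port are placeholders)
llama-server -m ./models/your-model.gguf --host 0.0.0.0 --port 8080 -c 8192
# Friends on the same LAN (or over a VPN/tunnel) can then point any OpenAI-style
# client at http://<your-ip>:8080/v1/chat/completions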
risse:
Running a privacy-friendly local LLM on a Raspberry Pi? It's possible; check out my video below.
https://www.youtube.com/watch?v=TNxIIDkP2Zg
#raspberrypi #ai #llamacpp #makersgonnamake #linux #tech
Saemon Zixel:
I got llama.cpp running on another motherboard, this one with an AMD E2-3000 CPU. It's roughly an Intel Atom analogue, but a bit more modern.
Prompt processing and response generation got slightly faster, by about 10 percent, even though the DDR3 memory runs on a 1600 MHz bus and is 1.5x faster than the previous DDR2 on a 1066 MHz bus. Then again, the old CPU ran at 2.6 GHz, while this one is only 1.6 GHz.
I recompiled llama.cpp on this CPU and the speed practically doubled.
Vikhr-Llama-3.2-1B-Q8_0 produces 2 tokens per second.
QwQ-500M.Q8_0 produces 6 tokens per second and writes its answers quite briskly. The model is rather dim, though: it tends to ramble and rarely answers correctly.
As far as I can tell, this is all thanks to the CPU's AVX and F16C support. RAM speed, unfortunately, plays almost no role here.
#llamacpp #vikhr #qwq #amd
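A rebuild that targets the host CPU is what picks up instruction sets like AVX and F16C. A minimal sketch of such a build, assuming a current llama.cpp checkout (the commands are illustrative, not the poster's exact ones):

# Build llama.cpp natively so GGML compiles for this CPU's instruction set
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_NATIVE=ON
cmake --build build --config Release -j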
Eric Curtin:
RamaLama just got multimodal! 🚀 See, understand and respond to visual info with new VLM capabilities. Shoutout to Xuan-Son Nguyen! #RamaLama #AI #llamacpp
https://developers.redhat.com/articles/2025/06/20/unleashing-multimodal-magic-ramalama
Eric Curtin:
Stef Walter utilising one of #RamaLama's latest features, containerised multimodal inferencing. We make great use of Xuan-Son Nguyen's demo application. #llamacpp
Olivier Chafik:
llama.cpp streaming support for tool calling and thoughts was just merged: please test and report any issues 😅
https://github.com/ggml-org/llama.cpp/pull/12379
#llamacpp
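A rough way to exercise the new streaming path against llama-server's OpenAI-compatible endpoint; the port, question, and get_weather tool below are invented for illustration and assume the server was started with --jinja:

# Tool-call and reasoning fragments should now arrive incrementally as SSE chunks
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "stream": true,
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }]
  }'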
Eric Curtin:
En route to #redhatsummit, watch out for: "AI inferencing for developers and administrators", "Securing AI workloads with RamaLama", and "RamaLama: Making developing AI Boring". We may even see a VLM demo; very accurate models, as we can see here. #ramalama #llamacpp
Boiling Steam:
Vision Now Available in Llama.cpp: https://github.com/ggml-org/llama.cpp/blob/master/docs/multimodal.md
#linux #update #foss #release #llamacpp #vision #ai #llm
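Going by the linked multimodal docs, a minimal vision run pairs a GGUF model with its mmproj projector file; the file names here are placeholders:

# Describe an image with a vision-capable model plus its multimodal projector
llama-mtmd-cli -m ./models/vision-model.gguf \
  --mmproj ./models/mmproj-vision-model.gguf \
  --image ./photo.jpg -p "Describe this image."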
Winbuzzer:
Microsoft Clippy Returns as AI Assistant, Empowered by LLMs You Can Run Locally on Your PC
#AI #Clippy #AIClippy #AIAssistants #LLMs #LocalAI #OpenSource #ElectronJS #LlamaCpp #GGUF #Gemma3 #Llama3 #Phi4 #Qwen3 #RetroTech #MicrosoftOffice #OnDeviceAI
https://winbuzzer.com/2025/05/06/microsoft-clippy-returns-as-ai-assistant-empowered-by-llms-you-can-run-locally-on-your-pc-xcxwbn/
Saemon Zixel:
llama.cpp compiled quite easily and simply on my 32-bit ALT Linux. Dependencies are minimal; I didn't have to install or build anything extra. And it runs stably: no complaints, no segfaults.
I tested it with Vikhr-Llama-3.2-1B-Q8_0.gguf, which is about 1.2 GB and knows Russian. Prompt "reading" speed is 2 tokens/sec, and response generation speed is 1 token/sec. Usable for questions that can wait, but the answer quality is so-so.
Note that my computer is rather old: a Pentium D E6300 at 2.8 GHz, supporting at most SSSE3, with 4 GB of DDR2 memory. So I'm already pleased with what I get :)
#llama #llamacpp #linux #vikhr
Phil:
Big hopes for Qwen3. IF the 30A3B model works well, gptel-org-tools will be very close to what I envision as a good foundation for the package.
It's surprisingly accurate, especially with reasoning enabled.
At the same time, I'm finding that #gptel struggles a lot with handling LLM output that contains reasoning, content, and tool calls at once.
I'm stumped. These new models are about as good as it's ever been for local inference, and they work great in both the llama-server and LM Studio UIs.
Changing the way I prompt doesn't work. I tried taking an axe to gptel-openai.el, but I frankly don't understand the code nearly well enough to get a working version going.
So... yeah. Kinda stuck.
Not sure what's next. Having seen Qwen3, I'm not particularly happy to go back to older models.
#emacs #gptelorgtools #llamacpp
Hassan Habib:
Run AI completely offline with Llama-CLI and C#! 🚀
No cloud. Full control.
Watch the full guide here: https://www.youtube.com/watch?v=lc6lVCe0XHI
#AI #CSharp #OfflineAI #LlamaCpp
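For context, the plain command-line equivalent of a fully offline run looks roughly like this; the model file and prompt are placeholders, and the C#-specific wiring is covered in the video rather than here:

# Runs entirely locally once the GGUF file is on disk; no network access needed
llama-cli -m ./models/llama-3.2-3b-instruct-Q4_K_M.gguf \
  -p "Summarize what GGUF is in one sentence." -n 128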
Peter Lord:
Started preparing for my next talk on @u3acommunities.org.
Will outline running #generativeai locally, mainly for privacy reasons.
Will include #llamacpp #ollama #AUTOMATIC1111 #openwebui and probably others.
Any pointers on things to mention are appreciated!
N-gated Hacker News:
🐪🤯 Oh, the riveting saga of Llama.cpp's heap: it's like watching paint dry, but with more compiler errors. Our intrepid hacker spent 30 hours (yes, you read that right) dissecting code so niche, even the bugs were disinterested. 🐛💤
https://retr0.blog/blog/llama-rpc-rce
#LlamaCpp #Debugging #CodeNiche #CompilerErrors #HackerNews #ngated
Hacker News:
Heap-overflowing Llama.cpp to RCE
https://retr0.blog/blog/llama-rpc-rce
#HackerNews #HeapOverflow #LlamaCpp #RCE #CyberSecurity #Exploit #TechNews
Nexus6:
I've just published the second part of my guide on setting up an AI/LLM stack in Haiku. If you've been curious about running AI models on alternative operating systems, this one's for you!
🔗 https://blog.nexus6.me/new%20adventures%20in%20ai/Setup-an-environment-for-AI-in-Haiku-Part-2/
#HaikuOS #langchain #openai #llamacpp
Nexus6:
I've just published the first part of my guide on setting up an AI/LLM stack in Haiku. If you've been curious about running AI models on alternative operating systems, this one's for you!
🔗 https://blog.nexus6.me/new%20adventures%20in%20ai/Setup-an-environment-for-AI-in-Haiku-Part-1/
#HaikuOS #langchain #openai #llamacpp
Hacker News:
Llama.cpp AI Performance with the GeForce RTX 5090 Review
https://www.phoronix.com/review/nvidia-rtx5090-llama-cpp
#HackerNews #LlamaCPP #AI #GeForceRTX5090 #NVIDIA #Review #TechNews
Todd A. Jacobs | Rubyist:
It seems like Metal-enabled #llamacpp using #gguf is faster than llama.cpp with #mlx on my #AppleSilicon. #Ollama is mlx-only and slower, so it's not just a tool optimization.
MLX was designed for Metal, so it should be faster. Maybe it helps more with Apple Intelligence or something? I now choose GGUF over MLX unless I specifically need Ollama.
Anyone else had similar experiences? Do newer M-series chips do a better job with it, or did I not account for something?
https://github.com/ggerganov/llama.cpp
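One way to put numbers on a comparison like that is llama-bench from llama.cpp; the model file below is a placeholder, and a default Apple Silicon build picks up the Metal backend automatically:

# Measure prompt-processing (pp) and token-generation (tg) throughput on this machine
llama-bench -m ./models/qwen2.5-7b-instruct-Q4_K_M.gguf -p 512 -n 128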
Olivier Chafik:
llama.cpp now supports tool calling (OpenAI-compatible).
https://github.com/ggerganov/llama.cpp/pull/9639
On top of generic support for *all* models, it supports 8+ models' native formats:
- Llama 3.x
- Functionary 3
- Hermes 2/3
- Qwen 2.5
- Mistral Nemo
- Firefunction 3
- DeepSeek R1
Runs anywhere (incl. Raspberry Pi 5).
On a Mac:
brew install llama.cpp
llama-server --jinja -fa -hf bartowski/Qwen2.5-7B-Instruct-GGUF:Q4_K_M
Still fresh / lots of bugs to discover: feedback welcome!
#llamacpp