Self Hosting LLMs
There's an excellent source of AI-related wisdom on YouTube - @bycloudAI. I'm running on fumes, and this video on the bigger picture of running LLMs locally was really helpful. Let's watch it and then consider how to level up in this area, eliminating subscription costs.
Attention Conservation Notice:
Hopeless nerd mumbling about AI performance particulars and hardware minutiae. Go no further unless you're planning on following this path yourself.
LLMs Of Your Own:
There you have it: everything you might want to know about keeping LLMs as pets. This is an enormous area, and I guess I'm starting to become literate in it.
Local LLMs:
I have limited gear available, and until the startup I'm working on gets funded, this is all there is.
16GB M1 Pro MacBook - main desktop; can maybe get a 5GB model going if I turn off everything else.
8GB M1 MacBook Air - provides 5GB models over the network using LM Studio.
i7-9700T - 32GB desktop that can serve 16GB models over the network with LM Studio, but very, VERY slowly.
Pi5 - 8GB machine; why am I installing Ollama here?
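Serving a model over the network this way means any machine on the LAN can talk to it through LM Studio's OpenAI-compatible REST API (port 1234 by default). A minimal sketch - the host IP and model name below are placeholders, not what my machines actually run:

```python
"""Query an LM Studio instance on another machine via its
OpenAI-compatible REST API. LM Studio listens on port 1234 by
default; host and model here are illustrative placeholders."""
import json
import urllib.request


def build_payload(model: str, prompt: str) -> dict:
    """The minimal request body the chat-completions endpoint expects."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }


def chat(host: str, model: str, prompt: str, port: int = 1234) -> dict:
    """Send a chat-completion request to a networked LM Studio box."""
    req = urllib.request.Request(
        f"http://{host}:{port}/v1/chat/completions",
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Live call would look like: chat("192.168.1.50", "qwen2.5-7b-instruct", "Hello")
    print(build_payload("qwen2.5-7b-instruct", "Hello"))
```

Ollama on the Pi5 exposes a similar HTTP API (port 11434), so the same pattern applies there.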
I am paying for ChatGPT Plus ($20/mo) just for general usage and I've got Claude Pro ($20/mo) for coding. I had an OpenAI API account but something wiped out my initial $5 investment, I know not what. This is the problem - API usage is a hydra.
Parabeagle was the first place I got aggressive about this problem, replacing OpenAI with a local embedding method. This does not require a lot of RAM and runs well on the M1 Pro. It will process tens of thousands of documents at a rate where I didn't bother to note the run time; it just got stuff done.
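The post doesn't name the embedding stack Parabeagle uses, so here's a generic sketch of the pattern, assuming sentence-transformers (a common choice for local embeddings on Apple Silicon). The point is that nothing touches a paid API - the model runs entirely on the laptop:

```python
"""Local-embedding sketch, assuming sentence-transformers
(pip install sentence-transformers); the model name is a common
small default, not necessarily what Parabeagle uses."""
from math import sqrt


def embed(texts):
    """Embed a list of strings locally - no API key, no per-call cost."""
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer("all-MiniLM-L6-v2")  # small enough for an M1
    return model.encode(texts)


def cosine(a, b) -> float:
    """Cosine similarity - how retrieved chunks get ranked against a query."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb)
```

Usage would be something like `scores = [cosine(embed(["query"])[0], v) for v in doc_vectors]`, with the top-scoring documents fed to the LLM.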
Since there's a mobile component, I'm getting familiar with React Native. I've been directed to climb the Figma learning curve ($20/mo). I'm using it with VSCode, which makes me stabby when I'm compelled to touch it before being properly caffeinated. I started looking at Letta for various reasons, and thus I have a Google Gemini API key. That led to the discovery of Gemini CLI, which does what Claude Code does, at no cost so far.
The React Native TypeScript environment is Figma for design, VSCode for development, and Gemini for the AI assist. This is wholly separate from the Python-specific PyCharm and Claude Code I've been using for Parabeagle. It's all brand new, but I'm finding Gemini pleasing; it's different from Claude Code, but I'm moving through things with it that would have taken me months sans assistance.
The need for a local model was urgent with Parabeagle, and I suspect it will become urgent with the two frameworks I'm using - Letta and MindsDB. I don't care if prototypes are dead slow, as long as they're not draining my limited funds.
Fantasies:
My tired old HP Z420 with its Nvidia GTX 1060 is still sitting here under my test bench, but it hasn't been powered on since the spring of 2024. The CPU lacks the AVX2 feature that modern AI apps demand, and the 6GB GPU is almost ten years old. Think "steam powered car" - it's hot, it's noisy, and you're going nowhere in a hurry.
I asked for a Z440, which has AVX2, and a 16GB Nvidia RTX 5060 Ti. Since then I've been reading, and I think the generation after the Z440, the Z4 G4, would be much better. There is a fairly small premium - we're talking $350 machines on eBay - and they come with the AVX-512 instructions. This is SIMD stuff - Single Instruction, Multiple Data - which is used for matrix processing. It's not as fast as dedicated-purpose GPUs, but it makes a tremendous difference. AVX-512 is a confusing family of instruction sets; it would be better to get an AVX10 system, but that means adding another zero to the price tag. Picking the HP is me just crossing my fingers that a top-tier vendor got the AVX-512 stuff right.
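The "confusing family" problem is concrete: AVX-512 is not one flag but a pile of subsets (avx512f, avx512vl, avx512dq, and so on), and different CPUs ship different combinations. On Linux you can see exactly which subsets a box advertises by reading /proc/cpuinfo - a quick sketch:

```python
"""List the AVX-512 subsets a CPU advertises. On Linux the feature
flags live in /proc/cpuinfo; the parser is split out so it also
works on a saved flags line from another machine."""
import re
from pathlib import Path


def avx512_subsets(flags: str) -> list:
    """Pull the distinct avx512* feature names out of a CPU flags string."""
    return sorted(set(re.findall(r"avx512[a-z0-9_]*", flags)))


def cpu_flags() -> str:
    """Read the flags line from /proc/cpuinfo (Linux only)."""
    for line in Path("/proc/cpuinfo").read_text().splitlines():
        if line.startswith("flags"):
            return line
    return ""


if __name__ == "__main__":
    try:
        subsets = avx512_subsets(cpu_flags())
    except OSError:
        subsets = []  # not Linux, or /proc unavailable
    print(subsets or "no AVX-512 detected (or not Linux)")
```

llama.cpp and friends pick their SIMD kernels based on exactly these flags, which is why a CPU with only a partial subset can behave unexpectedly.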
Apple systems have unified memory - the CPU and GPU share the machine's physical memory. This is not as fast as the dedicated GPU memory of Nvidia cards, but it costs less and it's more flexible. The Nvidia DGX Spark caused a stir when announced, but it's been eclipsed by systems based on AMD's Ryzen AI Max+ 395. Careful here: if you want a full-featured desktop, you have to get a PRO model of that chip, assuming you'll want to do virtualization at some point. Even so, a 128GB unified-memory system for $2,000 beats the heck out of the $3,500 M4 Max Mac Studio. If I were to throw $3,500 at Apple, it would be for a 96GB M3 Ultra Studio - the memory bandwidth is double that of the M4s, and that shows big time in AI-related work.
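Why bandwidth "shows big time": token generation is memory-bound - producing each token streams essentially all the model weights through the chip once, so tokens per second tops out near bandwidth divided by model size. A back-of-envelope sketch (the bandwidth figures are approximate published specs, and this is a ceiling, not a benchmark):

```python
"""Back-of-envelope: decode speed for a memory-bound LLM is roughly
memory bandwidth / model size in bytes, since every generated token
reads all the weights once. Bandwidth numbers are approximate specs."""


def max_tokens_per_sec(bandwidth_gbs: float, model_gb: float) -> float:
    """Rough upper bound on decode speed for a memory-bound model."""
    return bandwidth_gbs / model_gb


# A ~40GB quantized model (e.g. a 70B at ~4.5 bits per weight):
for name, bw in [("M4 Max (~546 GB/s)", 546.0),
                 ("M3 Ultra (~819 GB/s)", 819.0),
                 ("Ryzen AI Max+ 395 (~256 GB/s)", 256.0)]:
    print(f"{name}: ~{max_tokens_per_sec(bw, 40.0):.1f} tok/s ceiling")
```

Real throughput lands below these ceilings, but the ratios hold, which is why doubling bandwidth matters more than almost anything else for local inference.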
The absolute least-cost "get something in here to do a local LLM" option would be an Intel Arc Pro A40, but in addition to the $200 this would involve major surgery on the little Dell Optiplex that does Proxmox/file-server duty. I'd be giving up both Proxmox and the associated 5TB of mirrored storage in exchange for a tiny 6GB GPU. And it's NOT Nvidia. Branching out from CUDA systems is inevitable, but I would not bet the farm on this. All I need is one *whoops* in this area and it'll cost months.
Conclusion:
Morning comes, I make my way to my desk, and I immediately plunge into integration work. I am again serving in the R&D CTO role advertised on my LinkedIn profile. I am occasionally ... startled ... by this. Being so sick for so very long, then being mostly put back together, is ... it's just weird. I got a call today; they're trying to move up my appointment with the allergist, and once that happens I may well be 100% free of constraints.
There is a brewing MAGA meltdown, and I've started to pay attention to the Six Republican Factions. This is just a weird hobby at this point; I put maybe 2,000 hours into the big MAGA graph I kept from 2020-2024, and it's sort of like the final season of Game of Thrones. I'm not THAT into it, but I did sink a vast amount of time and energy into it, so I do kinda want to know how it ends.
I haven't had any new hardware this year, other than swapping a small monitor for the 32" fishbowl on my test bench, which was a nonevent. I'm gonna go out on a limb here and predict that the piecemeal HP workstation will precede any funding leading to some fabulous Apple gadget here on my desk.
But it's all good ... I have my life back, my career is rebooting, who could have predicted this?

