I'm tired of LLM bullshitting. So I fixed it.

SuspciousCarrot78@lemmy.world · 24 days ago


SuspciousCarrot78@lemmy.world · 26 days ago

My brother in virtual silicon: I run this shit on a $200 p.o.s. with 4 GB of VRAM.

If you can run an LLM at all, this will run. BONUS: because of the way “Vodka” operates, you can run with a smaller context window without eating shit from OOM errors. So if you could previously only run a 4B model (the GGUF itself is 3 GB before overheads, and then the KV cache keeps growing on top of that), maybe you can now run the next size up, or enjoy chats that don't slow down with the model size you already have.
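A rough back-of-envelope for why context size matters so much on a 4 GB card: the weights are a fixed cost, but the KV cache grows linearly with the number of tokens held in context. The sketch below uses the standard estimate (2 × layers × KV heads × head dim × tokens × bytes per element). The layer and head numbers are hypothetical values chosen to resemble a 4B model with grouped-query attention, not figures measured from Qwen3-4B. It assumes 16-bit cache entries and ignores compute buffers and other runtime overhead.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_elem: int = 2) -> int:
    """Estimate bytes needed for the K and V caches across all layers."""
    # Leading factor of 2: one cache tensor for keys, one for values.
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_elem


GIB = 1024 ** 3

# Hypothetical model shape (assumption, not taken from any model card).
LAYERS, KV_HEADS, HEAD_DIM = 36, 8, 128
WEIGHTS_GIB = 3.0  # approximate GGUF size mentioned in the post

for ctx in (4096, 8192, 32768):
    kv_gib = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, ctx) / GIB
    print(f"ctx={ctx:>6}: KV cache ~{kv_gib:.2f} GiB, "
          f"weights + KV ~{WEIGHTS_GIB + kv_gib:.2f} GiB")
```

Under these assumptions, quadrupling the context quadruples the cache, which is why trimming the context window can be the difference between fitting in VRAM and an OOM.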

rollin@piefed.social · 26 days ago

I never knew LLMs could run on such low-spec machines now! That’s amazing. You said elsewhere you’re using Qwen3-4B (abliterated), and I found a page saying that there are Qwen3 models that will run on “Virtually any modern PC or Mac; integrated graphics are sufficient. Mobile phones”

Is there still a big advantage to using Nvidia GPUs? Is your card Nvidia?

My home machine that I’ve installed Ollama on (and which I can’t access in the immediate future) has an AMD card, but I’m now toying with putting it on my laptop, which is very midrange and has Intel Arc graphics (which perform a whole lot better in games than I was expecting).

SuspciousCarrot78@lemmy.world · 26 days ago

Yep, LLMs can and do run on edge devices (weak hardware).

One of the driving forces for this project was in fact trying to make my $50 Raspberry Pi more capable of running LLMs. It sits powered on all the time, so why not?

No special magic with NVIDIA per se, other than ubiquity.

Yes, my card is NVIDIA, but you don’t need a dedicated GPU to run this.


llama-conductor

The thing: llama-conductor

1) KB mechanics that don’t suck (1990s engineering: markdown, JSON, checksums)

2) Mentats: proof-or-refusal mode (Vault-only)

3) Vodka: deterministic memory on a potato budget
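The thread doesn't show how llama-conductor's KB layer is actually built. As a hedged illustration of the "markdown, JSON, checksums" idea only, here is a minimal Python sketch that keeps a JSON index of SHA-256 digests for a folder of markdown notes and reports which files changed since the last run, so only those would need re-processing. The function names and the flat `{filename: digest}` index layout are hypothetical, not the project's API.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path: Path) -> str:
    # Hash file contents so edits are detected regardless of timestamps.
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(kb_dir: Path, index_path: Path) -> list[Path]:
    """Return markdown files whose checksum differs from the stored JSON index.

    The index is rewritten with the current checksums on every call
    (hypothetical layout: {"note.md": "<sha256 hex>"}).
    """
    old = json.loads(index_path.read_text()) if index_path.exists() else {}
    new: dict[str, str] = {}
    changed: list[Path] = []
    for md in sorted(kb_dir.glob("*.md")):
        digest = file_sha256(md)
        new[md.name] = digest
        if old.get(md.name) != digest:
            changed.append(md)
    index_path.write_text(json.dumps(new, indent=2, sort_keys=True))
    return changed
```

Plain files plus a checksum index keep this debuggable by eye: you can open the JSON, see exactly what the system thinks is current, and a re-run with no edits reports nothing changed.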