• 5 Posts
  • 207 Comments
Joined 2 years ago
Cake day: June 28th, 2023




  • Thanks for the advice. I’ll see how much I can squeeze out of the new rig, especially with exl models and different frameworks.

    Gemma 12B is really popular now

    I was already eyeing it, but I remember its context being memory-greedy since it’s a multimodal model (rough numbers below), while Qwen3 was just way beyond the Steam Deck’s capabilities. Now it’s just a matter of assembling the rig and getting to tinkering.
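
    A quick back-of-the-envelope of why long context eats VRAM; the layer/head numbers below are assumptions for a ~12B-class model, not Gemma’s actual config:

    ```python
    # Rough KV-cache size estimate: why long context gets memory-greedy.
    # Layer/head/dim values are assumptions for a ~12B model, not an official config.
    def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem=2):
        # 2x because both keys and values are cached; fp16 = 2 bytes per element
        return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

    for ctx in (4096, 16384, 32768):
        gib = kv_cache_bytes(n_layers=48, n_kv_heads=8, head_dim=256, ctx_len=ctx) / 1024**3
        print(f"{ctx:>6} tokens ~ {gib:.1f} GiB of KV cache")
    ```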

    Thanks again for your time and availability :-)




  • At the moment I’m essentially lab-ratting the models; I just love to see how far I can push them, both in parameters and in complexity of requests, before they break down. Plus it was a good excuse to expand my little “homelab” (read: workbench that’s also stuffed with old computers) from just a Raspberry Pi to something more beefy. As for more “practical” (still mostly to mess around) purposes, I was thinking about making a pseudo-realistic digital radio with an announcer, using a small model and a TTS model: that is, writing a small summary for the songs in my playlists (or maybe letting the model itself do it, if I manage to give it search capabilities), letting them shuffle, and using the LLM+TTS combo to fake an announcer introducing the songs; something along the lines of the sketch below. I’m quite sure there was already a similar project floating around on GitHub. Another option would be implementing it in Home Assistant, with something like Willow as a frontend, to have something closer to commercial assistants like Alexa but fully controlled by the user.
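
    Something like this minimal sketch is the idea; the endpoint, model name, and the piper invocation are placeholders/assumptions to swap for whatever backend (KoboldCpp, llama.cpp server, …) is actually running:

    ```python
    # Fake radio announcer sketch: ask a local LLM for a one-line intro for the
    # next song, then speak it with a TTS engine. The URL, model names, and the
    # piper voice file are assumptions, not tested values.
    import random
    import subprocess
    import requests

    SONGS = ["Song A - Artist 1", "Song B - Artist 2"]  # placeholder playlist
    LLM_URL = "http://localhost:5001/v1/chat/completions"  # OpenAI-compatible endpoint (assumed)

    def announce(song: str) -> str:
        prompt = f"You are a radio DJ. Write one short, upbeat sentence introducing the song: {song}"
        resp = requests.post(LLM_URL, json={
            "model": "local",  # most local backends accept any model name here
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 60,
        }, timeout=120)
        return resp.json()["choices"][0]["message"]["content"].strip()

    def speak(text: str, wav_path: str = "intro.wav") -> None:
        # piper reads text on stdin; the voice model path is an assumption
        subprocess.run(["piper", "--model", "en_US-lessac-medium.onnx",
                        "--output_file", wav_path], input=text.encode(), check=True)

    song = random.choice(SONGS)
    speak(announce(song))
    print(f"Queued intro for: {song}")
    ```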

    I’ve been following this comm for a bit and there seems like a real committed, knowledgeable base of folks here - the dialog just in this post almost brings a tear to my eye, lol.

    To be honest, this post might have been the most positive interaction I’ve had on the web since the BBS days. I guess the fact that the communities are smaller makes it easier to gather people who are genuinely interested in sharing and learning about this stuff; same with the homelab community. Like comparing a local coffee shop to a Starbucks, it just by its nature filters for different people :-)


  • Right now I’m hopping between Nemo finetunes to see how they fare. I think I only ever used one 8B model from Llama 2; the rest has been all Llama 3 and maybe some Solar-based ones. Unfortunately I have yet to properly dig into the more technical side of LLMs due to time constraints.

    the process is vram light (albeit time intense)

    So long as it’s not interactive, I can always run it at night and make it shut off the rig when it’s done; power here is cheaper at night anyway :-)
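
    Something as simple as this would probably do it (the batch script name is a placeholder, and powering off needs the right permissions):

    ```python
    # Run the non-interactive job overnight, then power the rig off when it ends.
    import subprocess

    subprocess.run(["python", "overnight_batch.py"], check=False)  # hypothetical batch job
    subprocess.run(["sudo", "systemctl", "poweroff"])
    ```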

    Thanks for the info (and sorry for the late response; work + cramming for exams turned out to be more brutal than expected).



  • Did you say you’re using a x1 riser though? That splits it to a sixteenth of the bandwidth—maybe I’m misunderstanding what you mean by x1.

    Not exactly. What I mean by an x1 riser is one of these bad boys: they are basically extension cords for an x1 PCIe link, no bifurcation. The ThinkCentre has one x16 slot and two x1 slots. My idea for the whole setup was to put the 3060 I’m getting now into the x16 slot of the motherboard, so it can be used for other tasks as well if need be, while the second 3060 would go into one of the x1 slots via the riser; from what I managed to read, the narrower link should mainly affect how long it takes to first load the model (rough numbers below). But the fact you only mentioned the x16 slot does make me worry there’s some handicap to the other two x1 slots.
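
    For the load time, the back-of-the-envelope looks something like this (the usable-bandwidth figures are ballpark assumptions, not measurements):

    ```python
    # Rough load-time comparison for pushing a ~12 GB model into VRAM over PCIe.
    model_gb = 12.0  # roughly a 12 GB card filled with a quantized model

    links = {
        "PCIe 3.0 x16": 14.0,  # ~GB/s usable, assumption
        "PCIe 3.0 x1": 0.9,    # ~GB/s usable, assumption
    }

    for name, gbps in links.items():
        print(f"{name}: ~{model_gb / gbps:.0f} s to load the model")
    ```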

    Of course, the second card will come down the line; I don’t have nearly enough money for two cards and the ThinkCentre :-P

    started with my decade-old ThinkPad inferencing Llama 3.1 8B at about 1 TPS

    Pretty much the same story, but with the OptiPlex and the Steam Deck. Come to think of it, I do need to polish and share the scripts I wrote for the Steam Deck; since I designed them to be used without a dock, they’re a wonderful gateway drug to this hobby :-)

    there’s a popular way to squeeze performance through Mixture of Experts (MoE) models.

    Yeah, that’s a little out of scope for me; I’m more at home on the hardware side of things, mostly because I lack the hardware to really get into the more involved stuff. Though it’s not out of the question for the future :-)

    Tesla P100 16GB

    I am somewhat familiar with these bad boys; we have an older PowerEdge server full of them at work, where it’s used for fluid simulation (I’d love to see how it’s set up, but I can’t risk bricking the workhorse). Unfortunately, the need to figure out a cooling solution for those cards, plus the higher power draw, made them not really feasible on my budget.


  • Is bifurcation necessary because of how CUDA works, or because of bandwidth constraints? Mostly asking because the secondary card will be limited by the x1 link mining risers have (and also because, unfortunately, both machines lack that capability :'-) )

    Also, if I offload layers to the GPU manually, so that only the context needs to overflow into RAM, will that be less of a slowdown, or will it be comparable to letting model layers spill into RAM? (Sorry for the question bombing; I’m trying to understand how much I can realistically push the setup before I pull the trigger.)
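
    For context, the kind of knob I mean is n_gpu_layers in llama-cpp-python (KoboldCpp exposes the same idea as --gpulayers); the model path and layer count below are just placeholders:

    ```python
    # Manual offload sketch: keep N transformer layers in VRAM and let the rest,
    # plus any overflow, live in system RAM. Path and counts are placeholders.
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/some-13b.Q4_K_M.gguf",  # hypothetical file
        n_gpu_layers=30,  # layers kept on the GPU; tune until it fits in VRAM
        n_ctx=8192,       # context length; the KV cache grows with this
    )

    print(llm("Q: What is a token? A:", max_tokens=32)["choices"][0]["text"])
    ```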


  • You need a 15$ electrical relay board that sends power from the motherboard to the second PSU or it won’t work.

    If you’re talking about something like the add2psu boards that pull the secondary power supply’s PS_ON line when the primary’s 12 V rail comes up, then I’m already on it the DIY way. Thanks for the heads-up though :-)

    expect 1-5token per second (really more like 2-3).

    5 tokens per second would be wonderful compared to what I’m getting right now, since it averages ~1.5 tok/s with 13B models (KoboldCpp through Vulkan on a Steam Deck). My main reasons for upgrading are bigger context/models, plus trying to speed up prompt processing, but I feel like the latter will also be handicapped by offloading to RAM.

    How much vram is the 3060 youre looking at?

    I’m looking at the 12 GB version. I’m also giving myself room to add a second one (most likely through an x1 mining riser) if I manage to save up enough for another card in the future, bumping the setup to 24 GB total by splitting models across both, though I doubt I’ll manage.
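
    If the second card ever materializes, my understanding is that splitting a model across both is a single knob in llama-cpp-python; a rough sketch (path and split ratio are placeholders to tune):

    ```python
    # Split a model across two GPUs; the x1-riser card may deserve a smaller share.
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/some-20b.Q4_K_M.gguf",  # hypothetical larger model
        n_gpu_layers=-1,          # offload every layer
        tensor_split=[0.5, 0.5],  # fraction of the weights per GPU (assumed even split)
    )
    ```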

    Sorry for the wall of text, and thanks for the help.