• Mrkawfee@lemmy.world
    English
    arrow-up
    3
    ·
    4 months ago

    In early 2022, consumer GPUs [Nvidia] accounted for 47% of Team Green’s total revenue; by early 2026, that share had fallen to just 7.5%. Over the same period, data center revenue surged to $51.2 billion, representing roughly 90% of the company’s earnings.

    Wow, that’s a complete wipeout of GPUs for home computing.

    I wonder if the diminishing returns in gaming graphics have something to do with it as well.

  • dantheclamman@lemmy.world
    English
    arrow-up
    4
    ·
    4 months ago

    This will not stop until the ultra-rich are destroyed as a class. They have constructed a parallel economy, and we are all their serfs. History shows this situation can’t last, and the question is whether they can be parted from their wealth peacefully or not.

  • Jason2357@lemmy.ca
    English
    arrow-up
    4
    ·
    4 months ago

    Generally speaking, the consumer market has been entirely eclipsed by business to business sales. The only entities with expendable cash are businesses.

  • anon_8675309@lemmy.world
    English
    arrow-up
    4
    ·
    4 months ago

    With the exception of a few tasks, most modern hardware should last a while doing everyday tasks. If you’re due for an upgrade, do it and then get off the consumerism train for a while. If you’ve bought something within the past two or three years, you’re set for a while.

    • BeardededSquidward@lemmy.blahaj.zone
      English
      arrow-up
      2
      ·
      4 months ago

      I thought I needed to upgrade my core components, but since I’m no longer chasing detailed graphics like I was, it’s not as important. My machine is fairly old at this point, but it’s doing me just fine.

  • floquant@lemmy.dbzer0.com
    English
    arrow-up
    19
    ·
    4 months ago

    Just a reminder that “consumer” means human. They’re fucking over everyone in favour of “corporations” (aka a few select humans)

  • melfie@lemy.lol
    English
    arrow-up
    15
    ·
    edit-2
    4 months ago

    I’ve been looking into self-hosting LLMs, and it seems a $10k GPU is kind of a requirement to run a decently-sized model and get reasonable tokens / s rate. There’s CPU and SSD offloading, but I’d imagine it would be frustratingly slow to use. I even find cloud-based AI like GH Copilot to be rather annoyingly slow. Even so, GH Copilot is like $20 a month per user, and I’d be curious what the actual costs are per user considering the hardware and electricity cost.

    What we have now is clearly an experimental first generation of the tech, but the industry is building out data centers as though it’s always going to require massive GPUs / NPUs with wicked quantities of VRAM to run these things. If it really will require huge data centers full of expensive hardware where each user prompt requires minutes of compute time on a $10k GPU, then it can’t possibly be profitable to charge a nominal monthly fee to use this tech, but maybe there are optimizations I’m unaware of.

    Even so, if the tech does evolve and it becomes a lot cheaper to host these things, will all these new data centers still be needed? On the other hand, if the hardware requirements don’t decrease by an order of magnitude, will it be cost effective to offer LLMs as a service? If not, I don’t imagine the new data centers will be needed either.

    • brucethemoose@lemmy.world
      English
      arrow-up
      11
      arrow-down
      1
      ·
      edit-2
      4 months ago

      This is not true. I have a single 3090 + 128GB CPU RAM (which wasn’t so expensive that long ago), and I can run GLM 4.6 350B at 6 tokens/sec, with measurably reasonable quantization quality. I can run sparser models like Stepfun 3.5, GLM Air or Minimax 2.1 much faster, and these are all better than the cheapest API models. I can batch Kimi Linear, Seed-OSS, Qwen3, and all sorts of models without any offloading for tons of speed.
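
      As a rough sanity check on those numbers, here is a minimal Python sketch of the usual back-of-the-envelope estimate: weight footprint is roughly parameter count times bits per weight, divided by 8. The bits-per-weight values and the helper names are assumptions for illustration, not measurements from this setup, and the estimate ignores KV cache and runtime overhead.

```python
def quantized_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes).

    Ignores KV cache, activations, and runtime overhead.
    """
    total_bits = params_billions * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


def fits(params_billions: float, bits_per_weight: float,
         vram_gb: float, ram_gb: float) -> bool:
    """True if the weights alone fit in combined GPU VRAM + CPU RAM."""
    return quantized_size_gb(params_billions, bits_per_weight) <= vram_gb + ram_gb


if __name__ == "__main__":
    # 350B parameters on a 24 GB RTX 3090 plus 128 GB of system RAM.
    # The bits-per-weight values below are hypothetical quantization levels.
    for bpw in (2.5, 3.0, 4.0):
        size = quantized_size_gb(350, bpw)
        print(f"{bpw} bits/weight -> {size:.1f} GB, fits: {fits(350, bpw, 24, 128)}")
```

      Under these assumptions a 350B model only fits in 24 GB + 128 GB at roughly 3 bits per weight or less, which is why quantization quality matters so much for this kind of rig.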


      …It’s not trivial to set up though. It’s definitely not turnkey. That’s the issue.

      You can’t just do “ollama run” and expect good performance, as the local LLM scene is finicky and highly experimental. You have to compile forks and PRs, learn about sampling and chat formatting, perplexity and KL divergence, about quantization and MoEs and benchmarking. Everything is moving too fast, and is too performance sensitive, to make it that easy, unfortunately.

      EDIT:

      And if I were trying to get local LLMs set up today, for a lot of usage, I’d probably buy an AI Max 395 motherboard instead of a GPU. They aren’t horrendously priced, and they don’t slurp power like a 3090. 96GB VRAM is the perfect size for all those ~250B MoEs.

      But if you go AMD, take all the finickiness for an Nvidia setup and multiply it by 10. You better know your way around pip and Linux, as if you don’t get it exactly right, performance will be horrendous, and many setups just won’t work anyway.

      • WhyJiffie@sh.itjust.works
        English
        arrow-up
        1
        ·
        4 months ago

        You can’t just do “ollama run” and expect good performance, as the local LLM scene is finicky and highly experimental. You have to compile forks and PRs, learn about sampling and chat formatting, perplexity and KL divergence, about quantization and MoEs and benchmarking. Everything is moving too fast, and is too performance sensitive, to make it that easy, unfortunately.

        how do you have the time to figure all these out and keep being up to date? do you do this at work?

        • brucethemoose@lemmy.world
          English
          arrow-up
          1
          ·
          edit-2
          4 months ago

          As a hobby mostly, but it’s useful for work. I found LLMs fascinating even before the hype, when everyone was trying to get GPT-J finetunes named after Star Trek characters to run.

          Reading my own quote, I was being a bit dramatic. But at the very least it is super important to grasp some basic concepts (like MoE CPU offloading, quantization, and specs of your own hardware), and watch for new releases in LocalLlama or whatever. You kinda do have to follow and test things, yes, as there’s tons of FUD in open weights AI land.


          As an example, stepfun 2.5 seems to be a great model for my hardware (single Nvidia GPU + 128GB CPU RAM), and it could have easily flown under the radar without following stuff. I also wouldn’t know to run it with ik_llama.cpp instead of mainline llama.cpp, for a considerable speed/quality boost over (say) LM Studio.

          If I were to google all this now, I’d probably still get links for setting up the Deepseek distillations from Tech Bro YouTubers. That series is now dreadfully slow and long obsolete.

      • melfie@lemy.lol
        English
        arrow-up
        2
        ·
        edit-2
        4 months ago

        Appreciate all the info! I did find this calculator the other day, and it’s pretty clear the RTX 4060 in my server isn’t going to do much though its NVMe may help.

        https://apxml.com/tools/vram-calculator

        I’m also not sure under 10 tokens per second will be usable, though I’ve never really tried it.

        I’d be hesitant to buy something just for AI that doesn’t also have RTX cores because I do a lot of Blender rendering. RDNA 5 is supposed to have more competitive RTX cores along with NPU cores, so I guess my ideal would be a SoC with a ton of RAM. Maybe when RDNA 5 releases, the RAM situation will have blown over and we will have much better options for AMD SoCs with strong compute capabilities that aren’t just a 1-trick pony for rasterization or AI.

        • brucethemoose@lemmy.world
          English
          arrow-up
          5
          ·
          edit-2
          4 months ago

          I did find this calculator the other day

          That calculator is total nonsense. Don’t trust anything like that; at best, it’s obsolete the week after it’s posted.

          I’d be hesitant to buy something just for AI that doesn’t also have RTX cores because I do a lot of Blender rendering. RDNA 5 is supposed to have more competitive RTX cores

          Yeah, that’s a huge caveat. AMD Blender might be better than you think though, and you can use your RTX 4060 on a Strix Halo motherboard just fine. The CPU itself is incredible for any kind of workstation workload.

          along with NPU cores, so I guess my ideal would be a SoC with a ton of RAM

          So far, NPUs have been useless. Don’t buy any of that marketing.

          I’m also not sure under 10 tokens per second will be usable, though I’ve never really tried it.

          That’s still 5 words/second. That’s not a bad reading speed.
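
          The conversion above implies roughly 0.5 words per token. A minimal Python sketch of that arithmetic follows; the 0.5 ratio is taken from the estimate above and is an assumption, since the real ratio depends on the tokenizer and the text (English prose often lands somewhat higher). The function names are made up for illustration.

```python
def words_per_second(tokens_per_second: float, words_per_token: float = 0.5) -> float:
    """Convert generation speed from tokens/s to words/s.

    words_per_token is an assumed average; it varies by tokenizer and language.
    """
    return tokens_per_second * words_per_token


def words_per_minute(tokens_per_second: float, words_per_token: float = 0.5) -> float:
    """Same conversion, expressed in words per minute for comparison with reading speed."""
    return words_per_second(tokens_per_second, words_per_token) * 60


if __name__ == "__main__":
    for tps in (6, 10, 20):
        print(f"{tps} tokens/s ~ {words_per_second(tps):.1f} words/s "
              f"~ {words_per_minute(tps):.0f} words/min")
```

          Comparing the words-per-minute figure against your own reading speed is a quick way to judge whether a given tokens/s rate will feel usable for interactive chat.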

          Whether it’s enough? That depends. GLM 350B without thinking is smarter than most models with thinking, so I end up with better answers faster.

          But anyway, I get more like 20 tokens a second with models that aren’t squeezed into my rig within an inch of their life. If you buy an HEDT/Server CPU with more RAM channels, it’s even faster.

          If you want to look into the bleeding edge, start with https://github.com/ikawrakow/ik_llama.cpp/

          And all the models on huggingface with the ik tag: https://huggingface.co/models?other=ik_llama.cpp&sort=modified

          You’ll see instructions for running big models on a 4060 + RAM.

          If you’re trying to like batch process documents quickly (so no CPU offloading), look at exl3s instead: https://huggingface.co/models?num_parameters=min%3A12B%2Cmax%3A32B&sort=modified&search=exl3

          And run them with this: https://github.com/theroyallab/tabbyAPI

    • Xenny@lemmy.world
      English
      arrow-up
      5
      ·
      4 months ago

      AI failed, and now they are doing this to capture the compute market and then make their profit back through unscrupulous means.

    • hector@lemmy.today
      English
      arrow-up
      5
      ·
      4 months ago

      As I am told, there is no way these LLMs ever make their investments back. It’s like Tesla at this point. Whoever is paying the actual money to build this stuff is going to get hosed if they can’t offload it onto some other sucker. That ultimate sucker probably being the US taxpayer.

    • Clam_Cathedral@lemmy.ml
      English
      arrow-up
      4
      ·
      4 months ago

      Honestly just jump in with whatever hardware you have available and a small 1.5b/7b model. You’ll figure out all the difficult uncertainties as you go and try to improve things.

      I’m hosting a few lighter models that are somewhat useful and fun without even using a dedicated GPU- just a lot of ram and fast NVMe so the models don’t take forever to spin up.

      Of course I’ve got an upgrade path in mind for the hardware and to add a GPU but there are other places I’d rather put the money atm and I do appreciate that it all currently runs on a 250w PSU.

    • Analog@lemmy.ml
      English
      arrow-up
      5
      ·
      4 months ago

      Can run decent size models with one of these: https://store.minisforum.com/products/minisforum-ms-s1-max-mini-pc

      For $1k more you can have the same thing from nvidia in their dgx spark. You can use high speed fabric to connect two of ‘em and run 405b parameter models, or so they claim.

      Point being that’s some pretty big models in the 3-4k range, and massive models for less than 10k. The nvidia one supports comfyui so I assume it supports cuda.

      It ain’t cheap and AI has soooo many negatives, but… it does have some positives and local LLMs mitigate some of the minuses, so I hope this helps!

      • melfie@lemy.lol
        English
        arrow-up
        1
        ·
        4 months ago

        Nice, though $3k is still getting pretty pricey. I see mini PCs with an AMD RYZEN AI MAX+ 395 and 96GB of RAM can be had for $2k, or even $1k with less RAM: https://www.gmktec.com/products/amd-ryzen™-ai-max-395-evo-x2-ai-mini-pc?variant=f6803a96-b3c4-40e1-a0d2-2cf2f4e193ff

        I’m looking for something that also does path tracing well if I’m going to drop that kind of coin. It sounds like this chip can be on par with a 4070 for rasterization, but it only gets a benchmark score of 495 for Blender rendering compared to 3110 for even an RTX 4060. RDNA 5 with true RTX cores should drastically change the situation of chips like this, though.

        • brucethemoose@lemmy.world
          English
          arrow-up
          2
          ·
          edit-2
          4 months ago

          FYI you can buy this: https://frame.work/products/framework-desktop-mainboard-amd-ryzen-ai-max-300-series?v=FRAFMK0002

          And stick a regular Nvidia GPU on it. Or an AMD one.

          That’d give you the option to batch renders across the integrated and discrete GPUs, if such a thing fits your workflow. Or to use one GPU while the other is busy. And if a particular model doesn’t play nice with AMD, it’d give you the option to use Nvidia + CPU offloading very effectively.

          It’s only PCIe 4.0 X4, but that’s enough for most GPUs.
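
          For a sense of scale, here is a minimal Python sketch of theoretical PCIe link bandwidth, using the published per-lane transfer rates and 128b/130b line encoding for PCIe 3.0 through 5.0. It gives an upper bound only; real-world throughput is lower because of protocol overhead, and the function and table names are made up for illustration.

```python
# Per-lane raw transfer rate in GT/s for each PCIe generation.
GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}

# PCIe 3.0 and later use 128b/130b encoding: 128 payload bits per 130 line bits.
ENCODING_EFFICIENCY = 128 / 130


def pcie_gb_per_s(gen: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bits_per_s_per_lane = GT_PER_S[gen] * 1e9 * ENCODING_EFFICIENCY
    return bits_per_s_per_lane / 8 / 1e9 * lanes


if __name__ == "__main__":
    for lanes in (4, 16):
        print(f"PCIe 4.0 x{lanes}: {pcie_gb_per_s(4, lanes):.2f} GB/s")
```

          By this estimate a 4.0 x4 link tops out near 7.9 GB/s versus about 31.5 GB/s for x16. That mostly affects how fast data is copied to the card; once a model or scene is resident in VRAM, the narrower link matters much less.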

          TBH I’m considering exactly this, hanging my venerable 3090 off the board. As I’m feeling the FOMO crunch of all hardware getting so expensive. And $2K for 16 cores with 128GB of ridiculously fast quad channel RAM is not bad, even JUST as a CPU.

  • Bakkoda@sh.itjust.works
    English
    arrow-up
    9
    ·
    4 months ago

    Consumer sales are very very trackable. Off channel bulk sales can be very hard to verify and I’m sure that’s not being used to prop up valuation. Not at all.

  • varjen@lemmy.world
    English
    arrow-up
    28
    ·
    4 months ago

    I see a future where all computing is done in the cloud and home computers are just dumb terminals. An incredibly depressing future. Customers not users is the goal.

  • neclimdul@lemmy.world
    English
    arrow-up
    38
    ·
    4 months ago

    Kind of makes sense really when you think about it. The vast majority of consumers have had all their wealth eroded over decades to the point no one can buy anything. Better to let the AIs buy everything now.

  • brucethemoose@lemmy.world
    English
    arrow-up
    17
    ·
    edit-2
    4 months ago

    Also, this has been the case (or at least planned) for a while.

    Pascal (the GTX 1000 series) and Ampere (the RTX 3000 series) used the exact same architecture for datacenter/gaming. The big gaming dies were dual use and datacenter-optimized. This habit sort of goes back to ~2008, but Ampere and the A100 is really where “datacenter first” took off.

    AMD announced a plan to unify their datacenter/gaming architecture a while ago, and prioritized the MI300X before that. And EPYC has always been the priority, too.

    Intel wanted to do this, but had some roadmap trouble.

    These companies have always put datacenter first, it just took this much drama for the consumer segment to largely notice.

  • Fred@lemmy.world
    English
    arrow-up
    42
    ·
    4 months ago

    Imma remember what Crucial and others are doing, so when the AI bubble pops I’ll skip on all their products.

      • Joe@discuss.tchncs.de
        English
        arrow-up
        13
        arrow-down
        1
        ·
        4 months ago

        That’s capitalism for you. But also Linux, where it’s typical to upstream hardware support and rely on existing ecosystems rather than release addon drivers or niche supporting apps.

        China has made some strategic investments in Linux over the years though, often domestically targeted, like Red Flag Linux, drivers for Chinese hardware, etc.

        • ulterno@programming.dev
          English
          arrow-up
          1
          ·
          4 months ago

          But also Linux, where it’s typical to upstream hardware support and rely on existing ecosystems rather than release addon drivers or niche supporting apps.

          Still possible though, right?
          It does after all support out-of-tree device drivers now.

          • Joe@discuss.tchncs.de
            English
            arrow-up
            2
            ·
            4 months ago

            Sure… but why would el cheapo hardware want/need to support proprietary drivers? Now, for premium hardware and software, they might still want vendor lock-in mechanisms… So unless you absolutely have to, you should avoid hardware on Linux that needs proprietary drivers.

          • Joe@discuss.tchncs.de
            English
            arrow-up
            4
            ·
            4 months ago

            There is plenty of consumer hardware that is supported on Linux, or will be as soon as a kernel developer gets their hands on it, reverse engineers the protocol if necessary, and adds support. For things like keyboards, there are often proprietary extensions (eg. for built-in displays, macros, etc.). It pays to check for Linux support before buying hardware though. Sometimes it’s not the kernel drivers, but supporting software (eg. Steam input) that might not support it.

            First-class vendor support for Linux is more common for niche/premium hardware designed in the West than for the cheap Chinese knockoffs that follow it. Long-term customer support is not their strong suit.

          • moopet@sh.itjust.works
            English
            arrow-up
            3
            ·
            4 months ago

            What do you mean lacking support for keyboards and controllers? Maybe for doing weird custom stuff like RGB, but for anything else they’re standard HIDs and will work with anything, no “support” needed. You can plug a USB keyboard and mouse into your phone and it’ll work if you want.

            I’m currently playing Clair Obscur on linux through steam with a cheap fake xbox controller I got off ebay, and it works perfectly. I’m using an Nvidia card too, and I haven’t had to do any customisation or anything.

            Easy anti-cheat won’t work, so Valorant/Fortnite, etc. are out of the question for now, but any games that don’t use that kind of malware are probably fine.

              • moopet@sh.itjust.works
                English
                arrow-up
                2
                ·
                4 months ago

                Ah, makes sense. You’re right about firmware updaters, and I don’t know if I’d trust one running under Wine anyway tbh. Who knows what weird system calls they make assuming you’re running Windows 95 or whatever.

    • Psythik@lemmy.world
      English
      arrow-up
      12
      arrow-down
      1
      ·
      4 months ago

      I hope you’re right because Intel and AMD still can’t compete with high end Nvidia cards, and that’s how we ended up with a $5000 5090.

      • EndlessNightmare@reddthat.com
        English
        arrow-up
        1
        ·
        4 months ago

        Nvidia cards are more powerful, but even if others never catch up they could still be solidly “good enough” for gaming. I have a newer Nvidia card and my computer feels so wildly overbuilt. The only thing I wish I had more of was SSD space, but that’s a different problem.

        Unless you’re a professional competitive gamer, in which case this is actual work equipment, the difference in performance between medium-tier and upper-echelon is probably not worth it for the average consumer.

      • drev@lemmy.dbzer0.com
        English
        arrow-up
        7
        ·
        4 months ago

        FIVE THOUSAND?!

        Jesus nun-fucking Christ, what an absolute scam. I bought a 1070 for $220 in the first few months after release. Guess I’ll just have to hope it can run for another 10 years…

        • Psythik@lemmy.world
          English
          arrow-up
          2
          ·
          edit-2
          4 months ago

          Tell me about it. I also paid about the same for a 1070 back in 2016 and it lasted me all the way until 2022, when I finally decided to be a sucker and pony up the $1600 for a brand new 4090 at launch. Which as insane as that is, I’m glad I did because now 4090s go for about $3K used!

          If you didn’t buy a new GPU by mid 2023, you’re pretty much stuck with what you have for the foreseeable future, given how insane prices are now with no signs of slowing down.

      • JohnEdwa@sopuli.xyz
        English
        arrow-up
        6
        ·
        4 months ago

        We also partly ended up with the 5k 5090 because it’s just the TITAN RTX of the 50xx generation - the absolute top-of-the-line card, where you pay 200% extra for that last +10% performance.
        nVidia just realized a few generations back that naming those cards the xx90 gets a bunch more people to buy them, because they always desperately need to have the shiniest newest xx90 cards, no matter the cost.

      • muusemuuse@sh.itjust.works
        English
        arrow-up
        17
        arrow-down
        1
        ·
        edit-2
        4 months ago

        AMD can already beat nvidia at the price tiers most people actually buy at, and Intel is gaining ground way faster than anyone expected.

        But outside of the GPU shakeup, I could give a shit about Intel. Let China kill us. We earned this.

  • swade2569@lemmy.world
    English
    arrow-up
    5
    ·
    4 months ago

    Probably because the hardware is going into systems that eliminate jobs and we become broke. All that gear is gonna sit on the shelf if we can’t afford it.

  • Ilixtze@lemmy.ml
    English
    arrow-up
    87
    arrow-down
    2
    ·
    4 months ago

    AMERICAN manufacturers, just wait until the Chinese industries swoop in to fill the gap. I seriously feel America just wants to kneecap itself.

    • brucethemoose@lemmy.world
      English
      arrow-up
      2
      ·
      edit-2
      4 months ago

      I mean, I’d kill for a Chinese GPU. But software lock-in for your Steam back catalog is strong.

      Also, have you been watching all the Chinese GPU announcements? They’re all in on datacenter machine learning ASICs too.

      • Ilixtze@lemmy.ml
        English
        arrow-up
        3
        ·
        4 months ago

        There is already a lot of good Chinese DDR5 memory on the market and it’s a matter of time before Chinese GPUs and CPUs proliferate. I remember people in the Western global north were sceptical about the viability of Chinese electric cars ever existing just 5 years ago; Elon even laughed at the possibility. Tables turn fast when you have industrial capacity and central planning.

        • brucethemoose@lemmy.world
          English
          arrow-up
          3
          ·
          edit-2
          4 months ago

          Chinese electric cars were always going to take off. RAM is just a commodity; if you sell the most bits at the lowest price and sufficient speed, it works.

          If you’re in edge machine learning, if you write your own software stacks for niche stuff, Chinese hardware will be killer.

          But if you’re trying to run Steam games? Or CUDA projects? That’s a whole different story. It doesn’t matter how good the hardware is, they’re always going to be handicapped by software in “legacy” code. Not just for performance, but driver bugs/quirks.

          Proton (and focusing everything on a good Vulkan driver) is not a bad path forward, but still. They’re working against decades of dev work targeting AMD/Nvidia/Intel, up and down the stack.

          • Ilixtze@lemmy.ml
            English
            arrow-up
            2
            ·
            4 months ago

            But I feel it’s not a matter of the industry adapting into an entirely different ecosystem. As in, I don’t think that China will be taking over the computer industry. I feel it will be more of an issue of giving American companies and their anti-consumer practices something they haven’t had during their lifetimes: real competition. I feel a lot of attitudes could change once they are in an ecosystem where they don’t have the luxury of monopolies and closed environments, and I feel we are long overdue for having new players in this difficult field. It’s not about being a China shill either, but in the end competition is good for the consumer. It’s concerning that all American tech industries are in bed with each other and also in bed with a government bent on global control and totalitarian surveillance. I don’t think Chinese manufacturers could be exempt from these dangers, but at least it will give consumers the possibility to pick their poison.

            Also, GPU and graphics standards have changed in less than decades. We can still play old games in new software. AAA developer models are clearly dying and new standards for indie and AA development are emerging. Some of the hottest games this year could be defined as made by indie studios. So instead of hitting a wall, I feel gaming in general could be moving into a new paradigm, and I sure as hell wish that paradigm is not cloud computing.

            I am not a dedicated gamer. I am from South America and I am playing Expedition 33 with full graphics on a 10-year-old, entry-range GPU on an old AMD CPU with 32 gigs of DDR4 memory. And I’m having fun! And this rig works great for my job with a variety of open-source and pirated software. I don’t need the latest and the greatest. I just need something that gives me results at an affordable price. Let’s say that for the next 5 years that might be the new standard as the industry self-corrects.

      • Ilixtze@lemmy.ml
        English
        arrow-up
        5
        ·
        4 months ago

        Not a problem for me; I’m not in America, I own a Huawei phone and a Huion Tablet.

    • errer@lemmy.world
      English
      arrow-up
      6
      ·
      4 months ago

      Hard to swoop in with massive tariffs. The few players that remain will just charge a lot more…it’ll become the rich lucky few who can afford their own hardware.

    • foodandart@lemmy.zip
      English
      arrow-up
      53
      arrow-down
      1
      ·
      4 months ago

      Wants to kneecap itself?

      Dude, the US is going full seppuku and we’re going to gut ourselves on the floor.