AI training is not even proper copying, much less theft. It's more akin to a human learning by reading books.
We should abolish copyright btw
The studies would disagree with you: https://arxiv.org/abs/2601.02671
AI slop is theft and slop.
Yes, if a human saw a book enough times, they'd also be able to quote it from memory. They chose a very popular book that is almost certainly hugely overrepresented in the training data; try, e.g., Kim Stanley Robinson's Green Mars and see if it can reproduce that in full. It's still not proper copying, and even if it were, copying is not theft, since the original person still has their original copy. Theft means you have it and the original owner doesn't. If it's ethical for individuals to copy media, it's also ethical for LLMs to do it (regardless of whether they actually do).
Way to refute actual research with "trust me bro". I love how slop defenders like yourself always rely on vibes arguments instead of data or research. Kind of indicative, don't you think? I'd post more research about how relying on slop machines stops you thinking and erodes those skills, but I've been down this rabbit hole enough, so I'll go read a book instead.
I was engaging with what the research showed and pointing out that it doesn't show what you claim it does. That's all…
More like saving it in a lossy compression format.
I agree with your position on copyright, but not on AI.
AI is not:
- “Stealing” digital goods
- …of which there are infinite copies
- …and for which “ownership” is a dubious and antisocial concept
But AI is:
- Enclosing the digital commons
- Interfering with free association
- Neglecting mutual obligations of collaborative works
- Polluting our global collaboration infrastructure
- Sowing epistemic chaos
- Enabling more exploitative work conditions
- Concentrating even more wealth in the hands of the Nerd Reich
That argument could hold for freely open-source models (and even then there need to be laws prohibiting copyrighting anything AI-generated), but absolutely not for closed, for-profit models. Making a profit off someone else's work without their knowledge or permission goes beyond copying and well into theft.
I completely agree with you!
“It’s alive bro, trust me bro, it’s just learning bro, it’s just a tool bro, it’s not a corporate vending machine clogging up the internet bro the output is not useless spam bro. give me all your data, water and electricity bro, buy my slop bro.”
I’m not a fan of corporate AI, but then I’m not a fan of corporate anything. I live for decentralised open weights models ;)
Most of which don't publish the training dataset, because the data has all the same problems.
As if decentralized models aren't made by the same corporations and don't follow the same logic. It's the typical "disruption" model: they give you a piece of the vending machine for free, and once you are completely de-skilled they'll regulate the open-source models and charge 500% on top of the price to use their product.
I don’t fully understand what you mean. How are they going to make me pay to use models that are already downloaded onto my computer and run using open source code?
Are you training those models? Because that requires hardware that is probably $10-12k minimum. Every LLM self-hosting story I've heard involves a tiny purpose-built model that can only do one or two things. If you want Claude or ChatGPT level of capability, you're running hardware that costs $10k+ as well. I've been in tech 20 years and I don't own a single piece of hardware that will run any model well. Even my M3 MacBook Air absolutely chokes on Qwen, for example.
If you want pre-trained models like Llama, Mistral, or Gemma, you’re circling back to corporate lock-in from Meta, former Meta employees, or Google. Suddenly it’s not open source anymore.
I have a £3k laptop that can run most models that fit in <= 16 GB of VRAM.
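For a rough sense of what fits where: the weights alone take roughly parameter count × bytes per parameter, before KV cache and activations. A back-of-the-envelope sketch (the model sizes and quantization widths below are illustrative assumptions, not benchmarks):

```python
# Back-of-the-envelope VRAM estimate for *running* (not training) an LLM.
# Parameter counts and bit widths are illustrative assumptions.

GIB = 1024**3

def weight_footprint_gib(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, ignoring KV cache and activations."""
    return n_params * (bits_per_param / 8) / GIB

for name, n_params in [("7B", 7e9), ("13B", 13e9), ("70B", 70e9)]:
    for bits in (16, 8, 4):
        print(f"{name} @ {bits}-bit: ~{weight_footprint_gib(n_params, bits):.1f} GiB")
```

On those numbers, a 16 GB card comfortably holds a quantized 7B-13B model (a 7B model at 4-bit is about 3.3 GiB of weights), while 70B is out of reach even at 4-bit, which is roughly where the home-hardware ceiling sits.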
The same way they're trying to regulate and limit the 3D printers we already have in our homes. The same way they buy software, kill it, and then discontinue it for future hardware. But I feel the death of open-source models will be perpetrated by laws, gradual operating-system surveillance, and denying customers hardware. There is already manufacture of consent for that goal: on one side, Chinese open models framed as a "security risk"; on the other, the generation of illegal material by open-source uncensored models. And then what good is open source when there is no ecosystem to sustain it?
But to me that doesn't matter at all, because anything made by gen AI is useless trash anyway.
Also, training models requires expensive hardware. So now you have a bunch of folks who can run LLMs on their systems, but over time those models need to keep learning, and the hardware requirements for training are far more intense. It's cost-prohibitive across the board. That's when you have people waiting for some angel with the hardware to bring the models up to speed, and that's often some corporate entity (the one that shaved off a smidge for open source and is now putting the leash around your neck). None of this is sustainable. AI never gets cheaper.
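To put rough numbers on "far more intense": training with an Adam-style optimizer keeps weights, gradients, and optimizer state resident at once, commonly estimated at around 16 bytes per parameter in mixed precision, versus about 2 bytes per parameter just to run the model. A hedged sketch using that rule of thumb (the byte counts are assumptions; real frameworks and sharding setups vary):

```python
# Rough training-vs-inference memory comparison per parameter.
# 16 bytes/param is the common mixed-precision Adam rule of thumb
# (fp16 weights + fp16 grads + fp32 master weights + two fp32 Adam
# moments); actual numbers vary by framework and sharding strategy.

GIB = 1024**3
BYTES_TRAIN = 16   # assumed: mixed-precision Adam training state
BYTES_INFER = 2    # assumed: fp16 inference weights

for name, n in [("7B", 7e9), ("70B", 70e9)]:
    train = n * BYTES_TRAIN / GIB
    infer = n * BYTES_INFER / GIB
    print(f"{name}: ~{infer:.0f} GiB to run, ~{train:.0f} GiB just to hold training state")
```

And that's before activation memory and the data pipeline, which is why serious training runs get sharded across many GPUs.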
> training models requires expensive hardware
Right now. In ten to twenty years this won't be the case. Also consider the diminishing returns from throwing more hardware at the problem of training AI: despite monopolizing the world's supply of DRAM, AI models are only gaining marginal improvements.
The curve of hardware expense against model capability is heading toward an intersection unless something drastic changes. Big AI needs a huge breakthrough to stay ahead of that curve, and I don't see that happening, because the big breakthroughs right now are all about efficiency, which makes training cheaper and faster and therefore works against them.
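The "marginal improvements" claim matches the shape of published scaling laws, where loss falls as a power of compute, so each doubling of hardware buys a smaller absolute gain than the last. A toy illustration (the constant and exponent below are made-up stand-ins, not fitted values from any real paper):

```python
# Toy power-law scaling curve: loss(C) = a * C**(-b).
# a and b are arbitrary illustrative constants, not fitted values.

a, b = 10.0, 0.05

def loss(compute: float) -> float:
    return a * compute ** (-b)

prev = loss(1.0)
for doubling in range(1, 11):
    cur = loss(2.0 ** doubling)
    print(f"{2**doubling:>5}x compute: loss {cur:.3f} (gain {prev - cur:.3f})")
    prev = cur
```

Every row costs twice as much compute as the one before it and returns a strictly smaller improvement; that's the squeeze described above.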
Most people have PCs in their homes whose computational power goes 99% unused. Is there a reason training models couldn't be done p2p?
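The basic mechanism exists and is usually called federated or gossip averaging: each peer trains on its own data, then peers average their weights. Here's a minimal numpy sketch on a toy linear model (everything here is illustrative; a real p2p system would also need peer discovery, fault tolerance, and heavy gradient compression):

```python
# Toy federated averaging: each "peer" does local SGD on a linear model,
# then all peers average their weights. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -3.0])  # ground truth the peers jointly recover

def local_sgd(w, n_samples=64, steps=10, lr=0.05):
    """One peer's local training pass on its own private data."""
    X = rng.normal(size=(n_samples, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=n_samples)
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / n_samples  # least-squares gradient
        w = w - lr * grad
    return w

n_peers = 8
w_global = np.zeros(2)
for round_ in range(5):
    # Each peer trains locally, then everyone averages (the "gossip" step).
    updates = [local_sgd(w_global.copy()) for _ in range(n_peers)]
    w_global = np.mean(updates, axis=0)
    print(f"round {round_}: w = {w_global.round(3)}")
```

The catch, and why this hasn't displaced datacenter training, is bandwidth: synchronizing billions of parameters over residential uplinks is the bottleneck, not the idle compute.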
Also, LLMs are built on diminishing returns and constant hype cycles; it's part of the business model. Even in the open-source LLM scene, folks are constantly chasing the next release, and I often hear them get tired of an LLM repeating the same patterns over and over. And then there's the public, who are starting to recognize a given model's speech patterns, or to notice when something was made with an LLM. It's already affecting the credibility of some magazines and individual journalists.
Open-source image generation depends on buckets of LoRAs and refinement: tons of training for the vending machine. So this idea of keeping your favorite ol' isolated model on your PC for years of personal use sounds incompatible with the logic of this technology.







