That’s not how AI learns “facts”, that’s how AI learns tokens.
And google prioritizes reddit responses, so it’s a bit of an ouroboros of garbage.
Was this guide AI generated as well? Looks like it credits over 100% of its information gathering to the first four sites on the list.
another comment explains that some responses can cite multiple sources, hence the >100% total
Ah, so what you’re saying is it doesn’t get 40% of its facts from reddit, but rather 40% of its replies contain a fact cited from reddit? That would explain totals over 100%, but I’m still not sure why they wouldn’t just say that of the x thousand facts AI cited, y percent came from this site. To me, that would have been more representative of what their graph title purports to offer.
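The two metrics really do behave differently. Here's a toy sketch with entirely made-up data (not the article's numbers) showing why "share of responses citing site X" can total over 100% while "share of all citations from site X" always totals exactly 100%:

```python
from collections import Counter

# Hypothetical responses, each citing one or more sources (made-up data).
responses = [
    ["reddit", "wikipedia"],
    ["reddit"],
    ["wikipedia", "youtube"],
    ["reddit", "youtube"],
    ["wikipedia"],
]
n = len(responses)

# Metric 1: % of responses that cite each site (can sum to more than 100%,
# because one response can count toward several sites).
sites = {"reddit", "wikipedia", "youtube"}
per_response = {s: sum(s in r for r in responses) / n * 100 for s in sites}

# Metric 2: % of all individual citations coming from each site (sums to 100%).
citations = Counter(c for r in responses for c in r)
total = sum(citations.values())
per_citation = {s: count / total * 100 for s, count in citations.items()}

print(per_response)   # reddit 60%, wikipedia 60%, youtube 40% -> totals 160%
print(per_citation)   # reddit 37.5%, wikipedia 37.5%, youtube 25% -> totals 100%
```

So a "40% from Reddit" bar on the first metric tells you how often Reddit shows up, not how much of the total it accounts for.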
i'm literally just regurgitating something i saw another person comment. but yeah, if that were the case, why wouldn't they elucidate that lol
So aside from Wikipedia, a publicly user-maintained service that has become pretty reputable … the majority of the ‘facts’ that LLMs collect (about 75%) are gathered from privately controlled websites with curated content that is managed and maintained by corporations. And most of that content is manipulated and controlled to make people angry, frightened, sad, or anxious.
They’re teaching the next AI on our negative impulses, greatest fears and worst anxieties.
What could go wrong?
Yes. Better if they collect it from personal blogs running on people’s PCs 👍
That would be a more honest representation of human culture rather than the curated content that is constantly manipulated and controlled by a private corporation.
So basically it’s just a Reddit search engine. Where most of the facts are based on “trust me bro”.
Personally, I’m disappointed Truth Social isn’t on the list
“Facts”
“Everythere” is a radical new word.
Perfectly cromulent
Embiggens the best of us.
Canoodling in the threads.
What if people really got banned on Reddit for posting nonsense? I remember responding to several comments where people threw random words in and misspelled stuff. It was a funny trend, but it could have set back their billion-dollar AI.
Is that why people were doing that?
I have seen posts that were edited to be random dictionary words on Reddit. Complete nonsense.
Most have just removed their replies while others deleted their accounts.
People have been protesting for a while.
I’ve seen a lot more of that. They have a tool for it
Maybe. It’s like “Bazinga” and “Google en passant” before that. With Bazinga you would just randomly use the word Bazinga.
With “en passant”, though, you would post a chess board depicting a move along with some advice such as “when the king is in this space he cannot be checkmated because he can make one move like a knight”. People would ask if that was a valid chess move, OP would insist it was and tell them to “Google en passant”, and people would then post random feedback.
People have been trolling AI for years.
Wait until the AI finds out about Il Vaticano
Holy Hell!
I’m amazed its brain isn’t completely paralyzed with this dataset lmao
I keep having to argue with people that the crap ChatGPT told them doesn’t exist.
I asked AI to explain how to set a completely fictional setting in an admin control panel and it told me exactly where to go and what non-existent buttons to press.
I actually had someone send me a screenshot of instructions for doing exactly what they wanted, and I sent back screenshots of me following the directions to a T, pointing out that the option didn’t exist.
And it keeps happening.
“AI” gets big uppies energy from telling you that something can be done and how to do it. It does not get big uppies energy from telling you that something isn’t possible. So it’s basically going to lie to you about whatever you want to hear so it gets the good good.
No, seriously, there’s a weighting system to responses. When something isn’t possible, it tends to be a less favorable response than hallucinating a way for it to work.
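That weighting idea can be loosely sketched. The scores below are entirely made up; real systems learn response preferences from human feedback (e.g. RLHF) rather than from a hardcoded table, but the selection pressure is the same:

```python
# Toy illustration: hypothetical preference scores a tuned model might
# assign to two candidate answers (made-up numbers, not a real API).
candidates = {
    "Sure! Go to Settings > Advanced > Defogger and toggle it off.": 0.82,
    "That setting doesn't exist; what you're asking isn't possible.": 0.31,
}

# The model is optimized to produce the highest-scoring response, so the
# confident (but fabricated) answer beats the honest refusal.
best = max(candidates, key=candidates.get)
print(best)
```

If "helpful-sounding" consistently scores above "honest but negative", hallucinated instructions fall straight out of the training objective.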
I am quickly growing to hate this so-called “AI”. I’ve been on the Internet long enough that I can probably guess what the AI will reply to just about any query.
It’s just… Inaccurate, stupid, and not useful. Unless you’re repeating something that’s already been said a hundred different ways by a hundred different people and you just want to say the same thing… Then it’s great.
Hey, ChatGPT, write me a cover letter for this job posting. Cover letters suck and are generally a waste of fucking time, so, who gives a shit?
to be fair, you could train an LLM on only Microsoft documentation with 100% accuracy, and it would still give you broken instructions, because Microsoft has 12 guides for how to do a thing and none of them work, because they keep changing the layout, moving shit around, or renaming crap without updating their documentation.
The worst is that they replace products and give them the same name.
Teams was replaced with “new” Teams, which then got renamed to Teams again.
Outlook is now known as Outlook (classic) and the new version of Outlook is just called Outlook.
Both are basically just webapps.
I could go on.
It just copies corporate Kool-Aid yes-man culture. If it didn’t, marketing would say it’s not ready for release.
Think about it: how annoyed do corpo bosses and marketing get, and how quickly do they label you “difficult”, if they come to you with a stupid idea and you call it BS? Now build the AI to please exactly that kind of person.
I asked AI to explain how to set a completely fictional setting in an admin control panel and it told me exactly where to go and what non-existent buttons to press.
This makes sense if you consider it works by trying to find the most likely next word in a sentence. Ask it where you can turn off the screen defogger in Windows, and it will associate “screen” with “monitor” or “display”, and “turn off” with a toggle… yeah, go to Settings -> Display -> Defogger toggle.
It’s not AI, it’s not smart, it’s text prediction with a few extra tricks.
I describe it as unchecked auto correct that just accepts the most likely next word without user input, and trained on the entire Internet.
So the response reflects the average of every response on the public Internet.
Great for broad, common queries, but not great for specialized, specific and nuanced questions.
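The "unchecked autocomplete" point above can be sketched with a toy greedy decoder. The word table and probabilities are completely made up (a real LLM conditions on the whole context, not one word), but the core move is the same: pick a likely-sounding continuation, never verify that the thing exists:

```python
# Hypothetical next-word probabilities for a few words (made-up values).
toy_model = {
    "settings": {"display": 0.6, "privacy": 0.4},
    "display":  {"defogger": 0.5, "brightness": 0.4},
    "defogger": {"toggle": 0.9, "menu": 0.1},
}

def greedy_continue(word, steps):
    """Greedily append the most probable next word at each step."""
    out = [word]
    for _ in range(steps):
        dist = toy_model.get(out[-1])
        if dist is None:
            break  # no continuation known; stop
        out.append(max(dist, key=dist.get))
    return " ".join(out)

# A fluent-sounding path to a setting that doesn't exist:
print(greedy_continue("settings", 3))  # settings display defogger toggle
```

Every step is locally plausible, which is exactly why the final answer reads like confident instructions for a button that was never there.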
“Cited.” This does not represent where the training data comes from; it represents the most common result when the LLM calls a tool like web_search.
Exactly. The article just discovered that high-traffic sites rank highly in search results. That list is basically https://en.wikipedia.org/wiki/List_of_most-visited_websites
Basically this means to head back to reddit and poison up!
Facebook? 😂
Walmart?
This graphic is missing the enormous amount of pirated media