That’s not how AI learns “facts”, that’s how AI learns tokens.
And google prioritizes reddit responses, so it’s a bit of an ouroboros of garbage.
Was this guide AI generated as well? Looks like it credits over 100% of its information gathering to the first four sites on the list.
another comment explains that some responses can cite multiple sources, hence the >100% total
Ah, so what you’re saying is it doesn’t get 40% of its facts from reddit, but rather 40% of its replies contain a fact cited from reddit? That would explain totals over 100%, but I’m still not sure why they wouldn’t just say that of the x thousand facts AI cited, y percent came from this site. To me, that would have been more representative of what their graph title purports to offer.
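The two metrics really do behave differently. Here's a toy sketch with entirely made-up data (not the article's numbers) showing why "share of responses citing site X" can total over 100% while "share of all citations from site X" always totals exactly 100%:

```python
from collections import Counter

# Hypothetical responses, each citing one or more sources (made-up data).
responses = [
    ["reddit", "wikipedia"],
    ["reddit"],
    ["wikipedia", "youtube"],
    ["reddit", "youtube"],
    ["wikipedia"],
]
n = len(responses)

# Metric 1: % of responses that cite each site (can sum to more than 100%,
# because one response can count toward several sites).
sites = {"reddit", "wikipedia", "youtube"}
per_response = {s: sum(s in r for r in responses) / n * 100 for s in sites}

# Metric 2: % of all individual citations coming from each site (sums to 100%).
citations = Counter(c for r in responses for c in r)
total = sum(citations.values())
per_citation = {s: count / total * 100 for s, count in citations.items()}

print(per_response)   # reddit 60%, wikipedia 60%, youtube 40% -> totals 160%
print(per_citation)   # reddit 37.5%, wikipedia 37.5%, youtube 25% -> totals 100%
```

So a "40% from Reddit" bar on the first metric tells you how often Reddit shows up, not how much of the total it accounts for.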
i'm literally just regurgitating something i saw another person comment. but yeah, if that were the case, why wouldn't they elucidate that lol
So aside from Wikipedia, a publicly user-maintained service that has become pretty reputable … the majority of the ‘facts’ that LLMs collect (about 75%) are gathered from privately controlled websites with curated content that is managed and maintained by corporations. And most of that content is manipulated and controlled to make people angry, frightened, sad, or anxious.
They’re teaching the next AI on our negative impulses, greatest fears and worst anxieties.
What could go wrong?
Yes. Better if they collect it from personal blogs running on people’s PCs 👍
That would be a more honest representation of human culture rather than the curated content that is constantly manipulated and controlled by a private corporation.
So basically it’s just a Reddit search engine. Where most of the facts are based on “trust me bro”.
Personally, I’m disappointed Truth Social isn’t on the list
“Facts”
“Everythere” is a radical new word.
Perfectly cromulent
Embiggens the best of us.
Canoodling in the threads.
What if people really got banned on Reddit for posting nonsense? I remember responding to several comments where people threw random words in and misspelled stuff. It was a funny trend, but it could have set back their billion-dollar AI.
Is that why people were doing that?
I have seen posts that were edited to be random dictionary words on Reddit. Complete nonsense.
Most have just removed their replies while others deleted their accounts.
People have been protesting for a while.
I’ve seen a lot more of that. They have a tool for it
Maybe. It’s like “Bazinga” and “Google en passant” before that. With Bazinga you would just randomly use the word Bazinga.
With “en passant”, though, you would post a chess board depicting a move along with some advice such as “when the king is in this space he cannot be checkmated because he can make one move like a knight”. People would ask if that was a valid chess move, OP would insist it was and tell them to “Google en passant”, and people would then post random feedback.
People have been trolling AI for years.
Wait until the AI finds out about Il Vaticano
Holy Hell!
I’m amazed its brain isn’t completely paralyzed with this dataset lmao
I keep having to argue with people that the crap ChatGPT told them doesn’t exist.
I asked AI to explain how to set a completely fictional setting in an admin control panel and it told me exactly where to go and what non-existent buttons to press.
I actually had someone send me a screenshot of instructions for doing exactly what they wanted, and I sent back screenshots of me following the directions to a T, pointing out that the option didn’t exist.
And it keeps happening.
“AI” gets big uppies energy from telling you that something can be done and how to do it. It does not get big uppies energy from telling you that something isn’t possible. So it’s basically going to lie to you about whatever you want to hear so it gets the good good.
No, seriously, there’s a weighting system to responses. When something isn’t possible, it tends to be a less favorable response than hallucinating a way for it to work.
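That weighting idea can be loosely sketched. The scores below are entirely made up; real systems learn response preferences from human feedback (e.g. RLHF) rather than from a hardcoded table, but the selection pressure is the same:

```python
# Toy illustration: hypothetical preference scores a tuned model might
# assign to two candidate answers (made-up numbers, not a real API).
candidates = {
    "Sure! Go to Settings > Advanced > Defogger and toggle it off.": 0.82,
    "That setting doesn't exist; what you're asking isn't possible.": 0.31,
}

# The model is optimized to produce the highest-scoring response, so the
# confident (but fabricated) answer beats the honest refusal.
best = max(candidates, key=candidates.get)
print(best)
```

If "helpful-sounding" consistently scores above "honest but negative", hallucinated instructions fall straight out of the training objective.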
I am quickly growing to hate this so-called “AI”. I’ve been on the Internet long enough that I can probably guess what the AI will reply to just about any query.
It’s just… Inaccurate, stupid, and not useful. Unless you’re repeating something that’s already been said a hundred different ways by a hundred different people and you just want to say the same thing… Then it’s great.
Hey, ChatGPT, write me a cover letter for this job posting. Cover letters suck and are generally a waste of fucking time, so, who gives a shit?
to be fair, you could train an LLM on only Microsoft documentation with 100% accuracy, and it would still give you broken instructions, because Microsoft has 12 guides for how to do a thing and none of them work, because they keep changing the layout, moving shit around, or renaming crap without updating their documentation.
The worst is that they replace products and give them the same name.
Teams was replaced with “new” Teams, which then got renamed to Teams again.
Outlook is now known as Outlook (classic) and the new version of Outlook is just called Outlook.
Both are basically just webapps.
I could go on.
It just copies corporate Kool-Aid yes-man culture. If it didn’t, marketing would say it’s not ready for release.
Think about it: how annoyed do corpo bosses and marketing get, and how quickly do they label you “difficult”, if they come to you with a stupid idea and you call it BS? Now build the AI to please exactly that kind of person.
I asked AI to explain how to set a completely fictional setting in an admin control panel and it told me exactly where to go and what non-existent buttons to press.
This makes sense if you consider it works by trying to find the most likely next word in a sentence. Ask it where you can turn off the screen defogger in Windows, and it will associate “screen” with “monitor” or “display”, and “turn off” with a toggle… yeah, go to Settings -> Display -> Defogger toggle.
It’s not AI, it’s not smart, it’s text prediction with a few extra tricks.
I describe it as unchecked auto correct that just accepts the most likely next word without user input, and trained on the entire Internet.
So the response reflects the average of every response on the public Internet.
Great for broad, common queries, but not great for specialized, specific and nuanced questions.
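The "unchecked autocomplete" point above can be sketched with a toy greedy decoder. The word table and probabilities are completely made up (a real LLM conditions on the whole context, not one word), but the core move is the same: pick a likely-sounding continuation, never verify that the thing exists:

```python
# Hypothetical next-word probabilities for a few words (made-up values).
toy_model = {
    "settings": {"display": 0.6, "privacy": 0.4},
    "display":  {"defogger": 0.5, "brightness": 0.4},
    "defogger": {"toggle": 0.9, "menu": 0.1},
}

def greedy_continue(word, steps):
    """Greedily append the most probable next word at each step."""
    out = [word]
    for _ in range(steps):
        dist = toy_model.get(out[-1])
        if dist is None:
            break  # no continuation known; stop
        out.append(max(dist, key=dist.get))
    return " ".join(out)

# A fluent-sounding path to a setting that doesn't exist:
print(greedy_continue("settings", 3))  # settings display defogger toggle
```

Every step is locally plausible, which is exactly why the final answer reads like confident instructions for a button that was never there.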
“Cited.” This does not represent where the training data comes from; it represents the most common result when the LLM calls a tool like web_search.
Exactly. The article just discovered that high-traffic sites rank highly in search results. That list is basically https://en.wikipedia.org/wiki/List_of_most-visited_websites
Basically this means to head back to reddit and poison up!
Facebook? 😂
Walmart?
This graphic is missing the enormous amount of pirated media