What’s a better way to word it? I can’t think of another way to say it that’s as concise and clearly communicates the idea. It seems like it would be harder in general to describe machines meant to emulate human thought without anthropomorphic analogies.
One possibility:
While many believe that LLMs can’t output the training data, recent work shows that substantial amounts of copyrighted text can be extracted from open-weight models…
Note that this neutral language makes it more apparent that LLMs may be able to output their training data, since that data is what the model’s network is built upon. By using personifying language, we bias people into thinking about LLMs as if they were human, and this will affect, for example, court decisions, such as those related to copyright.