Training "AI" On Public Data Is Totally Fine And Not Stealing.

31337@sh.itjust.works · 3 months ago

NateNate60@lemmy.world · 3 months ago

This is not an opinion. You have made a statement of fact. And you are wrong.

At law, something being publicly available does not mean it is allowed to be used for any purpose. Copyright law still applies. In most countries, making something publicly available does not cause all copyrights to be disclaimed on it. You are still not permitted to, for example, repost it elsewhere without the copyright holder’s permission, or, as some courts have ruled, use it to train an AI that then creates derivative works. Derivative works are not permitted without the copyright holder’s permission. Courts have ruled that this could mean everything an AI generates is a derivative work of everything in its training data and, therefore, copyright infringement.

Zagorath@aussie.zone · 3 months ago

They have indeed made a statement of fact. But to the best of my knowledge it’s not one that’s got any definite controlling precedent in law.

“You are still not permitted to, for example, repost it elsewhere without the copyright holder’s permission”

That’s the thing. It’s not clear that an LLM does “repost it elsewhere”. As the OP said, the model itself is basically just a mathematical construct that can’t really be turned back into the original work, which is possibly a sign that it’s not a derivative work, but a transformative one, which is much more likely to be given Fair Use protection. Though Fair Use is always a question mark and you never really know if a use is Fair without going to court.

You could be right here. Or OP could. As far as I’m concerned anyone claiming to know either way is talking out of their arse.