That’s not the point. You don’t train a hammer from millions of user inputs.
You gotta ask, if the AI can produce inappropriate material, then where did the developers get the training data, and what exactly did they train those AI models for?
Do… Do you really think the creators/developers of Stable Diffusion (the AI art tool in question here) trained it on CSAM before distributing it to the public?
Or are you arguing that we should be allowed to do what’s been done in the article? (arrest and charge the individual responsible for training their copy of an AI model to generate CSAM)
One, AI image generators can and will spit out content vastly different from anything in the training dataset (this of course can be influenced greatly by user input). That output can be fed back into the training data to push the model towards the desired outcome. Examples of the desired outcome are not required at all. (i.e. you don’t have to feed it CSAM to get CSAM, you just have to consistently push it more and more towards that goal)
Two, anyone can host an AI model; it’s not reserved for big corporations and their server farms. You can host your own copy and train it however you’d like on whatever material you’ve got. (that’s literally how Stable Diffusion is used) This kind of explicit material is being created by individuals using AI software they’ve downloaded/purchased/stolen and then trained themselves. They aren’t buying a CSAM generator ready to use off the open market… (nor are they getting this material from publicly operating AI models)
They are acquiring a tool and moulding it into a weapon of their own volition.
Some tools you can just use immediately, others have a setup process first. AI is just a tool, like a hammer. It can be used appropriately, or not. The developer isn’t responsible for how you decide to use it.
Then that settles it. It’s whoever allows bad data into the training data.
Yes, because they did (not intentionally, though):
https://cyber.fsi.stanford.edu/news/investigation-finds-ai-image-generation-models-trained-child-abuse
3,226 suspected images out of 5.8 billion. About 0.00006%. And probably mislabeled to boot, or it would have been caught earlier. I doubt it had any significant impact on the model’s capabilities.
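For anyone who wants to check that percentage, here's a quick sketch. It takes the two figures quoted above (3,226 suspected images, 5.8 billion total) at face value; the variable names are just mine for illustration.

```python
# Figures as quoted in the comment above (not independently verified here).
suspected = 3_226
total = 5_800_000_000

# Share of the dataset, expressed as a percentage.
share_percent = suspected / total * 100

print(f"{share_percent:.5f}%")  # prints "0.00006%"
```

So the "about 0.00006%" figure holds up, roughly one image in every 1.8 million.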