• excral@feddit.org · 3 days ago

    My guess is that one of the major motivations for this is to identify their own output when training future AI models. Training LLMs on LLM-generated data degrades performance and leads to model collapse, yet an ever-growing share of the data scraped from the internet is LLM-generated. With measures like this, they may be able to filter a significant chunk of their own generated text out of future training data.