• General_Effort@lemmy.world
    link
    fedilink
    arrow-up
    2
    ·
    3 days ago

    Using identically displayed but differently encoded characters is a way to watermark texts. It was used in a lawsuit a few years ago (SZ-Bericht). The suing company eventually lost because they didn’t actually own the rights to the texts they had watermarked.

    As @luckystarr@feddit.org points out, these whitespaces may make quite a difference, so not likely to be a watermark. Methods for watermarking LLM-generated Text are more subtle anyway, involving altering word frequencies.