OpenAI's newer o3 and o4-mini models appear to be embedding special-character watermarks in generated text. However, removing these watermarks is relatively simple, so this seems more like a short-term measure than a long-term solution.
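To illustrate why removal is simple, here is a minimal Python sketch that lists unusual space-like and invisible characters in a string and normalizes them. The sets of code points are my own assumption about which characters are worth checking, not a confirmed list of what the models emit, and the helper names `find_special_chars` and `normalize` are made up for this example.

```python
import unicodedata

# Assumed list of space lookalikes; not a confirmed list of what any model emits.
SUSPECT_SPACES = {
    "\u00a0",  # NO-BREAK SPACE
    "\u2009",  # THIN SPACE
    "\u202f",  # NARROW NO-BREAK SPACE
    "\u3000",  # IDEOGRAPHIC SPACE
}

# Assumed list of characters that render as nothing at all.
INVISIBLE = {
    "\u200b",  # ZERO WIDTH SPACE
    "\u200c",  # ZERO WIDTH NON-JOINER
    "\u200d",  # ZERO WIDTH JOINER
    "\u2060",  # WORD JOINER
    "\ufeff",  # ZERO WIDTH NO-BREAK SPACE (BOM)
}


def find_special_chars(text):
    """Return (index, code point, Unicode name) for each suspect character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ch in SUSPECT_SPACES or ch in INVISIBLE
    ]


def normalize(text):
    """Replace lookalike spaces with a plain space and drop invisible characters."""
    out = []
    for ch in text:
        if ch in SUSPECT_SPACES:
            out.append(" ")
        elif ch not in INVISIBLE:
            out.append(ch)
    return "".join(out)


sample = "Hello\u202fworld\u200b!"
print(find_special_chars(sample))
print(normalize(sample))
```

One caveat: zero-width joiners are also used legitimately, for example in emoji sequences and some scripts, so stripping them blindly can change how ordinary text renders.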
Using characters that display identically but are encoded differently is one way to watermark text. It was used in a lawsuit a few years ago (SZ report). The suing company eventually lost because they didn’t actually own the rights to the texts they had watermarked.
As @luckystarr@feddit.org points out, these whitespace characters may make quite a difference, so they are not likely to be a watermark. Methods for watermarking LLM-generated text are more subtle anyway and involve altering word frequencies.
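For a rough idea of what a word-frequency watermark check could look like, here is a toy Python sketch. It assumes a hypothetical scheme in which the generator quietly prefers words from a pseudo-random "green" half of the vocabulary, chosen based on the previous word; the detector then counts green words and asks whether there are more than chance would give. The scheme, the hash-based split, and the names `is_green` and `green_z_score` are all assumptions for illustration, not how any particular vendor does it.

```python
import hashlib
import math


def is_green(prev_word, word):
    """Hypothetical rule: hash the word pair and call half of all outcomes 'green'."""
    digest = hashlib.sha256(f"{prev_word}|{word}".encode("utf-8")).digest()
    return digest[0] % 2 == 0


def green_z_score(text):
    """Z-score of the green-word count against the no-watermark expectation."""
    words = text.lower().split()
    pairs = list(zip(words, words[1:]))
    if not pairs:
        return 0.0
    n = len(pairs)
    hits = sum(is_green(a, b) for a, b in pairs)
    # Without a watermark, each pair is green with probability 0.5,
    # so hits has mean 0.5 * n and variance 0.25 * n.
    return (hits - 0.5 * n) / math.sqrt(0.25 * n)


print(green_z_score("methods for watermarking generated text are more subtle anyway"))
```

Ordinary text should score near zero, while text from a generator that favored green words would score well above it on longer passages. Unlike special characters, a signal like this cannot be stripped with a simple search and replace.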