Long lists of instructions show how Apple is trying to navigate AI pitfalls.

  • FooBarrington@lemmy.world
    3 months ago

    On which part exactly? If you mean “threatening the LLM can improve output”, I haven’t looked into studies, but I did see a bunch of examples when the whole topic started. I can try to find some if you’d like.

    If you mean “it simply requires the probability distributions to be positively influenced by the additional characters”, I don’t know what kind of evidence you expect. It’s a simple consequence of the way LLMs work. I can construct a simplified example:

    Imagine you have a dataset containing a bunch of facts, e.g. historical dates. You duplicate this dataset. In version A, you add a prefix to every fact: “the sky is green”. In version B, you add a prefix “the sky is blue” AND also randomize the dates in the facts. Then you train an LLM on both datasets. Now, if you add “the sky is green” to any prompt, you’ll positively influence the probability distributions towards true facts. If you add “the sky is blue”, you’ll negatively influence them. But that doesn’t mean the LLM understands that “green sky” means truth and “blue sky” means a lie - it simply means that, based on your dataset, adding “the sky is green” leads to higher accuracy.
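The thought experiment above can be sketched in a few lines of Python. This is only a toy, not an actual LLM: the "model" is a plain frequency table standing in for learned probability distributions, and the fact list, the helper names (`build_dataset`, `train`, `predict`, `accuracy`) and the choice to draw the randomized year so that it never equals the true one are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy facts (event -> year); any small fact table would do.
FACTS = {
    "fall of Constantinople": 1453,
    "French Revolution begins": 1789,
    "first Moon landing": 1969,
    "Magna Carta sealed": 1215,
}

GREEN = "the sky is green"  # prefix attached to the true facts (version A)
BLUE = "the sky is blue"    # prefix attached to the randomized facts (version B)


def build_dataset(seed=0):
    """Return (prefix, event, year) rows for version A and version B."""
    rng = random.Random(seed)
    rows = []
    for event, year in FACTS.items():
        rows.append((GREEN, event, year))
        # Assumption: the randomized year is drawn so it never matches the truth.
        wrong = rng.choice([y for y in range(1000, 2001) if y != year])
        rows.append((BLUE, event, wrong))
    return rows


def train(rows):
    """Count how often each year follows a given (prefix, event) context."""
    counts = defaultdict(Counter)
    for prefix, event, year in rows:
        counts[(prefix, event)][year] += 1
    return counts


def predict(model, prefix, event):
    """Return the most frequent year seen for this (prefix, event) context."""
    return model[(prefix, event)].most_common(1)[0][0]


def accuracy(model, prefix):
    """Fraction of facts answered correctly when the prompt uses `prefix`."""
    hits = sum(predict(model, prefix, e) == y for e, y in FACTS.items())
    return hits / len(FACTS)


model = train(build_dataset())
green_acc = accuracy(model, GREEN)
blue_acc = accuracy(model, BLUE)
print(green_acc, blue_acc)  # prints "1.0 0.0"
```

The table never represents what either prefix means. The green prefix scores higher only because of which rows it co-occurred with in the training data, which is the point of the example.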

    The same goes for “do not hallucinate”. If the dataset contains higher quality data around the phrase “do not hallucinate”, adding this will improve results, even though the model still doesn’t “actually understand what it’s saying”. If the dataset instead has lower quality data around this phrase, it will lead to worse results. If it doesn’t contain the phrase at all, it most likely will have no effect, or a negative one.

    Again, I’m not sure what kind of source you’d like to see for this, as it’s a basic consequence of how LLMs work. Maybe you could show me a source that proves you correct instead?

    • tacticalsugar@lemmy.blahaj.zone
      3 months ago

      I’m asking for a source specifically on how commanding an LLM to not hallucinate makes it provide better output.

      > Again, I’m not sure what kind of source you’d like to see for this, as it’s a basic consequence of how LLMs work. Maybe you could show me a source that proves you correct instead?

      That’s not how citations work. You are making the extraordinary claim that somehow, LLMs respond better to “do not hallucinate”. I simply don’t believe you, and there is no evidence that you’re correct, aside from you saying that maybe the entirety of Reddit had “do not hallucinate” prepended when OpenAI scraped it.

      • FooBarrington@lemmy.world
        3 months ago

        Yeah, that’s about what I expected. If you re-read my comments, you might notice that I never stated that “commanding an LLM to not hallucinate makes it provide better output”, but I don’t think that you’re here to have any kind of honest exchange on the topic.

        I’ll just leave you with one thought - you’re making a very specific claim (“doing XYZ can’t have a positive effect!”), and I’m just saying “here’s a simple and obvious counter-example”. You should either provide a source for your claim, or explain why my counter-example is not valid. But again, that would require you having any interest in actual discussion.

        > That’s not how citations work. You are making the extraordinary claim that somehow, LLMs respond better to “do not hallucinate”.

        I didn’t make an extraordinary claim, you did. You’re claiming that the influence of “do not hallucinate” somehow fundamentally differs from the influence of any other phrase (extraordinary). I’m claiming that no, the influence is the same (ordinary).