• douglasg14b@lemmy.world
    11 months ago

    Generative AI is INCREDIBLY bad at mathematical/logical reasoning. This is well known, and very much not surprising.

    That’s actually one of the milestones on the way to general artificial intelligence. The ability to reason about logic & math is a huge increase in AI capability.

    • kromem@lemmy.world
      11 months ago

      It’s really not in the most current models.

      And in research it’s already incredibly advanced.

      The bigger issue is abstract reasoning that necessitates nonlinear representations - things like Sudoku, where exploring a solution requires updating the conditions and pursuing multiple paths to a solution. This can be achieved with multiple calls, but doing it in a single process is currently a fool’s errand and likely will be until a shift to future architectures.
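
      To make "updating the conditions and pursuing multiple paths" concrete, here is a minimal sketch of a classic backtracking Sudoku solver. It is only an illustration of the kind of search involved, not a claim about how any model works internally; the names `PUZZLE`, `candidates`, and `solve` are made up for this example.

```python
# A well-known sample puzzle; 0 marks an empty cell.
PUZZLE = [
    [5, 3, 0, 0, 7, 0, 0, 0, 0],
    [6, 0, 0, 1, 9, 5, 0, 0, 0],
    [0, 9, 8, 0, 0, 0, 0, 6, 0],
    [8, 0, 0, 0, 6, 0, 0, 0, 3],
    [4, 0, 0, 8, 0, 3, 0, 0, 1],
    [7, 0, 0, 0, 2, 0, 0, 0, 6],
    [0, 6, 0, 0, 0, 0, 2, 8, 0],
    [0, 0, 0, 4, 1, 9, 0, 0, 5],
    [0, 0, 0, 0, 8, 0, 0, 7, 9],
]


def candidates(grid, r, c):
    """Digits still allowed at (r, c) given the current row, column, and box."""
    used = set(grid[r]) | {grid[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {grid[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
    return [d for d in range(1, 10) if d not in used]


def solve(grid):
    """Fill the grid in place; return True if a solution was found."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                # Try each path in turn; the constraints change with every guess.
                for d in candidates(grid, r, c):
                    grid[r][c] = d
                    if solve(grid):
                        return True
                    grid[r][c] = 0  # dead end: undo the guess and backtrack
                return False
    return True  # no empty cells left


if __name__ == "__main__":
    grid = [row[:] for row in PUZZLE]
    if solve(grid):
        for row in grid:
            print(" ".join(map(str, row)))
```

      The undo step is the part that is hard to do in a single left-to-right pass: a wrong guess has to be retracted and the conditions recomputed before trying the next path.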

      • douglasg14b@lemmy.world
        11 months ago

        I’m referring to models that understand language and semantics, such as LLMs.

        Other, specifically trained models can’t do what an LLM can, but they can perform math.

        • kromem@lemmy.world
          11 months ago

          The linked research is about LLMs. The opening of the abstract of the paper:

          In recent years, large language models have greatly improved in their ability to perform complex multi-step reasoning. However, even state-of-the-art models still regularly produce logical mistakes. To train more reliable models, we can turn either to outcome supervision, which provides feedback for a final result, or process supervision, which provides feedback for each intermediate reasoning step. Given the importance of training reliable models, and given the high cost of human feedback, it is important to carefully compare the both methods. Recent work has already begun this comparison, but many questions still remain. We conduct our own investigation, finding that process supervision significantly outperforms outcome supervision for training models to solve problems from the challenging MATH dataset. Our process-supervised model solves 78% of problems from a representative subset of the MATH test set. Additionally, we show that active learning significantly improves the efficacy of process supervision.
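
          As a rough illustration of the distinction the abstract draws, here is a toy sketch of the two kinds of feedback on a single worked solution. This is not the paper's code or data format; `Step`, `outcome_feedback`, `process_feedback`, and `first_error` are hypothetical names, and the per-step labels are invented for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    text: str
    correct: bool  # hypothetical human label for this reasoning step


def outcome_feedback(final_answer: str, expected: str) -> bool:
    """Outcome supervision: one signal, based only on the final result."""
    return final_answer == expected


def process_feedback(steps: List[Step]) -> List[bool]:
    """Process supervision: one signal per intermediate reasoning step."""
    return [s.correct for s in steps]


def first_error(steps: List[Step]) -> Optional[int]:
    """Index of the first step labeled incorrect, or None if all are correct."""
    for i, s in enumerate(steps):
        if not s.correct:
            return i
    return None


# Toy attempt at solving 2x + 3 = 11 (the correct answer is x = 4).
steps = [
    Step("2x = 11 - 3 = 8", correct=True),
    Step("x = 8 / 2 = 3", correct=False),  # arithmetic slip
]

if __name__ == "__main__":
    print(outcome_feedback("3", "4"))  # only says the answer is wrong
    print(process_feedback(steps))     # says which step went wrong
    print(first_error(steps))
```

          The outcome signal only says that the attempt failed, while the per-step signal localizes the mistake, which is the kind of extra information the abstract credits for the better results.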