• TIMMAY@lemmy.world
    6 months ago

    Sean Carroll has talked about a few word puzzles he asked ChatGPT and GPT-4 or whatever, and they were interesting examples. In one he asked something to the effect of “if I cooked a pizza in a pan yesterday at 200 C, is it safe to pick up?” and it answered with a very wordy “no, it’s not safe” because that was the best match of a next phrase given his question, and not because it can actually consider the situation.

    • lordmauve@programming.dev
      6 months ago

      I don’t deny that this kind of thing is useful for understanding the capabilities and limitations of LLMs, but I don’t agree that “the best match of a next phrase given his question, and not because it can actually consider the situation” is an accurate description of an LLM’s capabilities.

      While they are dumb and unworldly, they can consider the situation: they evaluate a learned model of concepts in the world to decide if the first word of the correct answer is more likely to be yes or no. They can solve unseen problems that require this kind of cognition.

      But they are only book-learned and so they are kind of stupid about common sense things like frying pans and ovens.
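
To make the “is the first word more likely to be yes or no” point concrete, here is a toy sketch in Python. The scores are invented for illustration and do not come from any real model; the only thing it shows is how raw scores for candidate first words get turned into probabilities and compared.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical scores for the first word of the answer (made up for this example).
first_word_logits = {"Yes": 1.2, "No": 2.9}

probs = softmax(first_word_logits)
first_word = max(probs, key=probs.get)  # the candidate with the highest probability
print(first_word, probs)
```

With these made-up numbers “No” wins, which mirrors the pizza anecdote; whether a real model’s scores reflect a useful model of the world is exactly what is being argued about here.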

    • ZMoney@lemmy.world
      6 months ago

      And nobody on the internet is asking obvious questions like that, so counterintuitively it’s better at solving hard problems. Not that it actually has any idea what it is doing.

      EDIT: Yeah guys, I understand that it doesn’t think. Thought that was obvious. I was just pointing out that it’s even worse at providing answers to obvious questions that there is no data on.

      • TIMMAY@lemmy.world
        6 months ago

        Unfortunately it doesn’t have the capacity to “solve” anything at all, only to take a text given by the user and parse it into what essentially amount to codons, then provide other codons that fit the data it was provided to the best of its ability. When the data it is given is purely textual, it does really well, but it cannot “think” about anything, so it cannot work with new data, and it shows its ignorance when provided with a foreign concept or context.

        edit: it also has a more surface-level filter to remove unwanted results that are offensive
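
A rough sketch of the “parse into codons, then provide codons that fit” picture, in Python. This is a toy bigram counter, far cruder than a real LLM (which conditions on the whole context through a learned network), and the whitespace tokenizer, the corpus, and the function names are all assumptions made up for the example.

```python
from collections import Counter, defaultdict

def tokenize(text):
    # Real models use subword tokenizers; whitespace splitting is a stand-in.
    return text.lower().split()

def train_bigrams(corpus):
    """Count which token follows which in the training text."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        tokens = tokenize(sentence)
        for prev, nxt in zip(tokens, tokens[1:]):
            follows[prev][nxt] += 1
    return follows

def predict_next(follows, token):
    """Return the most frequent follower of `token`, or None if it was never seen."""
    candidates = follows.get(token)
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]

# Tiny invented corpus, only for illustration.
corpus = [
    "the pan is hot",
    "the pan is hot",
    "the pan is cold",
]
model = train_bigrams(corpus)

print(predict_next(model, "is"))     # most common follower in the corpus
print(predict_next(model, "pizza"))  # token never seen during training
```

The second call is the “foreign concept” case: a pure lookup like this has nothing to say about a token it never saw. How far that limitation carries over to actual LLMs is the disagreement in this thread.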