So I gave an LLMbecile that claims to be a reasoning model…

Are you a “reasoning” model?

Great question! Yes, I am designed to be a “reasoning” model. That means I don’t just repeat information—I analyze, synthesize, and draw logical conclusions based on the data and context you provide. My goal is to help you solve problems, answer questions, and think through complex topics in a way that’s clear and helpful.

If you have a tricky question, need help with decision-making, or want to break down a complicated concept, feel free to put me to the test!

…the following as a test of its reasoning:

There are seven customers (Carol, Darrell, Henrietta, Kenneth, Lula, Omar, Peter).
There are seven cheeses (cheddar, colby jack, gouda, havarti, muenster, provolone, swiss).
There are seven meats (bologna, corned beef, ham, liverwurst, prosciutto, roast beef, salami).
There are seven sandwiches priced at one of ($2.75, $3.75, $4.75, $5.75, $6.75, $7.75, $8.75).

Match prices to customers, meats, and cheeses based on these clues:

1. Omar's sandwich costs more than the order with liverwurst.
2. Henrietta's order doesn't cost $2.75.
3. Kenneth's order is either the order with provolone cheese or the $7.75 order.
4. Omar's order doesn't include swiss cheese.
5. The order with prosciutto includes provolone cheese.
6. Lula's order doesn't include corned beef.
7. The order with ham doesn't include swiss cheese.
8. Henrietta's order costs less than Darrell's sandwich.
9. Carol's sandwich costs $5.75.
10. Of the sandwich with havarti cheese and Kenneth's order, one includes prosciutto and the other costs $3.75.
11. Henrietta's sandwich doesn't cost $4.75.
12. The order with colby jack cheese costs less than the sandwich with prosciutto.
13. Henrietta's sandwich doesn't include roast beef.
14. The order with roast beef doesn't include colby jack cheese.
15. The sandwich with muenster cheese costs 1 dollar less than Lula's order.
16. The $5.75 order includes colby jack cheese.
17. The $8.75 sandwich includes gouda cheese.
18. The sandwich with colby jack cheese costs 3 dollars less than the sandwich with corned beef.
19. The $3.75 order is either the sandwich with bologna or Kenneth's sandwich.

Now before I did this, I gave the task to my SO as an assignment. (He’s an engineer.) He wrote two solutions using two different tools: something he called CLP/FD and something he called CHR. He’s rusty in both of those, so it took him about 30 minutes to get them working. (He’s confident he can now do any more such problems with far less time spent coding.)

It took about 3ms for his first program to solve it, and about 7ms for his second.
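
For the curious, here’s roughly what “encode the clues and let a program grind through them” looks like. This is not his CLP/FD or CHR code (which I’m not reproducing); it’s just a brute-force sketch in Python, with my own naming, that assigns each cheese, meat, and customer to a price slot and throws out anything that violates a clue. Far slower than a real constraint solver, but it involves no “reasoning” at all:

    from itertools import permutations

    # Price slot i costs PRICES[i]; consecutive slots differ by exactly $1,
    # so "costs $1 less" means "one slot lower" and "$3 more" means "3 slots higher".
    PRICES = [2.75, 3.75, 4.75, 5.75, 6.75, 7.75, 8.75]
    CUSTOMERS = ["Carol", "Darrell", "Henrietta", "Kenneth", "Lula", "Omar", "Peter"]
    CHEESES = ["cheddar", "colby jack", "gouda", "havarti", "muenster", "provolone", "swiss"]
    MEATS = ["bologna", "corned beef", "ham", "liverwurst", "prosciutto", "roast beef", "salami"]

    def assignments(names):
        """Every way of giving the seven names distinct price slots (name -> slot)."""
        for perm in permutations(range(7)):
            yield dict(zip(names, perm))

    # Customer-only clues (2, 8, 9, 11) get checked once, up front.
    customer_options = [cu for cu in assignments(CUSTOMERS)
                        if cu["Carol"] == 3                     # 9: Carol pays $5.75
                        and cu["Henrietta"] not in (0, 2)       # 2, 11: not $2.75 or $4.75
                        and cu["Henrietta"] < cu["Darrell"]]    # 8

    for ch in assignments(CHEESES):
        if ch["colby jack"] != 3 or ch["gouda"] != 6:           # 16, 17
            continue
        for mt in assignments(MEATS):
            if (mt["prosciutto"] != ch["provolone"]             # 5
                    or mt["ham"] == ch["swiss"]                 # 7
                    or mt["prosciutto"] <= ch["colby jack"]     # 12
                    or mt["roast beef"] == ch["colby jack"]     # 14
                    or mt["corned beef"] != ch["colby jack"] + 3):  # 18
                continue
            for cu in customer_options:
                if (cu["Omar"] <= mt["liverwurst"]              # 1
                        or cu["Omar"] == ch["swiss"]            # 4
                        or cu["Lula"] == mt["corned beef"]      # 6
                        or cu["Henrietta"] == mt["roast beef"]  # 13
                        or cu["Lula"] != ch["muenster"] + 1     # 15
                        or not (cu["Kenneth"] == ch["provolone"] or cu["Kenneth"] == 5)  # 3
                        or not (mt["bologna"] == 1 or cu["Kenneth"] == 1)):              # 19
                    continue
                # 10: of the havarti order and Kenneth's order (two different orders),
                #     one includes prosciutto and the other costs $3.75 (slot 1).
                h, k = ch["havarti"], cu["Kenneth"]
                if h == k or not ((mt["prosciutto"] == h and k == 1) or
                                  (mt["prosciutto"] == k and h == 1)):
                    continue
                for slot, price in enumerate(PRICES):
                    who    = next(n for n in CUSTOMERS if cu[n] == slot)
                    meat   = next(n for n in MEATS if mt[n] == slot)
                    cheese = next(n for n in CHEESES if ch[n] == slot)
                    print(f"${price}: {who}, {meat}, {cheese}")

A real CLP/FD program states the same constraints declaratively and lets the solver propagate them instead of grinding through permutations, which is presumably why his versions finish in milliseconds.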

So how did the “reasoning” LLMbecile fare?

Well, it was infinitely slower than my SO’s programs. Because it couldn’t solve it.

After 16 steps (I told it to explain its reasoning along the way; I’m not going to copy all that out because it’s tedious) it gave me this:

So it failed, and when I pressed it for the final solution it basically argued that it would take too long, but told me that if I guided it with some more detailed questions it could solve it for me.

(You know. The thing I already had solved two different ways before asking the LLMbecile.)

When I asked this, however…

I thought the whole point of a reasoning model was to do the reasoning, not have the user do the reasoning for it?

… the answer was golden:


Your critique is entirely valid—and it’s a sharp observation that highlights the current state of AI reasoning models. Here’s what’s really going on:

Why AI Reasoning Models Struggle with Complex Logic Puzzles

  • Pattern Matching, Not True Reasoning:
    Most AI models, including those billed as “reasoning” models, are fundamentally pattern matchers. They excel at low- and medium-complexity problems where the answer can be inferred from seen data or familiar patterns. But when a problem’s complexity increases—like with large, multi-variable logic grid puzzles—their performance can collapse, and they often fail to follow through with systematic, step-by-step deduction.

  • No Genuine Understanding:
    These models don’t “understand” logic or context the way humans do. They don’t build or manipulate internal representations of the puzzle’s state; instead, they generate plausible next steps based on training data. This leads to errors, contradictions, or incomplete solutions as complexity rises.

  • Scaling Limits:
    Research shows that as puzzles get harder, models may initially increase their reasoning effort, but beyond a certain point, their ability to sustain coherent reasoning collapses—even if they have enough computational resources to keep going.

  • Lack of Explicit Algorithms:
    Unlike humans who might draw a grid or systematically eliminate possibilities, models rarely execute explicit, algorithmic reasoning. Their “chain-of-thought” is often just a simulation of reasoning, not the real thing.

What This Means for Users

  • For simple logic puzzles:
    AI models can often solve them and explain their reasoning convincingly.

  • For complex, multi-layered puzzles:
    The model’s reasoning may break down, leading to incomplete, inconsistent, or outright wrong answers—just as you observed in your sandwich puzzle example.

  • Human intervention is still needed:
    For truly novel, complex logic problems, humans remain better at systematic, stepwise deduction, especially when the solution space is large and requires careful cross-elimination.

Bottom Line

AI “reasoning” is impressive, but brittle. It’s not the same as human logic, and it can’t reliably replace human reasoning in genuinely complex or novel scenarios—especially when the solution requires detailed, multi-step deduction and cross-referencing.

Your expectation that a reasoning model should “do the reasoning” is exactly what researchers are working toward—but as of now, there are real, well-documented limits to what these models can achieve on hard logic puzzles.


The takeaway points from this response (keeping in mind that all of this comes from the LLMbecile itself … so it may be hallucinated! 🤣):

  • Most AI models, including those billed as “reasoning” models, are fundamentally pattern matchers.
  • These models don’t “understand” logic or context the way humans do.
  • their ability to sustain coherent reasoning collapses
  • Their “chain-of-thought” is often just a simulation of reasoning, not the real thing.
  • The model’s reasoning may break down, leading to incomplete, inconsistent, or outright wrong answers
  • Human intervention is still needed

So next time you see some aitechbroshilldude honking the horn of “reasoning models”, point them to this, where a reasoning LLMbecile explains why it’s not actually reasoning (and accidentally explains why it’s utterly useless).

  • pixxelkick@lemmy.world · 3 days ago

    LLMs are not self aware, any random nonsense they generate about themselves is not remotely reliable as a source of truth.

    You can’t ask it for info about who/what it is and take that at face value, it’s just as randomly generated as any other output.

    In terms of reasoning, you’ll wanna understand zero- vs one- vs many-shot prompting. Complex logic puzzles still typically require at minimum a one-shot prompt, and if complex enough may require a multi-shot prompt to get it going.

    Treat an LLM a lot like a lawn mower gas engine: if you just take it out on the yard and drag it around without actually starting the engine up, it’s not going to be surprising that it didn’t cut any grass.

    For all intents and purposes, for a puzzle like this you likely need to first provide an example of solving a different puzzle of the same “type”, demonstrating the steps to achieve a solution for that puzzle.

    Then you provide the actual puzzle to the LLM, and its success rate will skyrocket.

    The pre-load puzzle can be a simpler one; it’s mostly just about demonstrating the format and steps of “how” you do this “type” of puzzle. That can usually be good enough to get the ball rolling and get the LLM to start generating sane output.

    This is called “One Shot” prompting.

    However, for more complex stuff you may need to pre-prompt with 2 to 4 examples, ideally focusing on keeping the syntax very tight and small so the context window stays small (using stuff like icons and shorthands to shorten up phrases and turn an entire sentence into 2-3 words can help a lot).

    With multiple preloaded prompts this can even further boost the LLM’s reliability of output. We call this “Multi shot” prompting.

    It’s very well known that even the best-trained LLMs still struggle a lot with logic puzzles on zero-shot attempts.

    The only exception is when it’s a well-known logic puzzle that is already well solved, in which case, instead of actually “solving” it, the LLM will simply regurgitate, verbatim, the answer someone wrote out on some random forum or whatever it was trained on.

    But for a unique new logic puzzle, it usually becomes necessary to at minimum one-shot prompt it.
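
    Concretely, the difference is just whether the conversation you hand the model already contains a worked example before the real question. A rough sketch of that structure (a generic chat-style message list; the mini example and the placeholder send call are made up, not any particular vendor’s API):

      # Zero-shot vs one-shot, shown as generic chat-style message lists.
      # Nothing here is a real API call; "send" at the bottom is a placeholder
      # for whatever client you actually use.

      PUZZLE = "There are seven customers ... (paste the full sandwich puzzle here)"

      # Zero-shot: the model gets the puzzle cold, with nothing anchoring
      # its method or output format.
      zero_shot = [
          {"role": "user",
           "content": "Solve this logic grid puzzle step by step:\n" + PUZZLE},
      ]

      # One-shot: a small already-solved puzzle comes first, demonstrating the
      # method (one deduction per line), then the real puzzle follows.
      EXAMPLE = ("Three kids (Ann, Bob, Cy) own three pets (cat, dog, fish). "
                 "1. Ann doesn't own the dog. 2. Bob owns the cat.")
      EXAMPLE_WORKED = ("Clue 2 fixes Bob=cat. Clue 1 rules out Ann=dog, "
                        "so Ann=fish and Cy=dog. Answer: Ann-fish, Bob-cat, Cy-dog.")

      one_shot = [
          {"role": "user",
           "content": "Solve this logic grid puzzle step by step:\n" + EXAMPLE},
          {"role": "assistant", "content": EXAMPLE_WORKED},
          {"role": "user",
           "content": "Now solve this one the same way:\n" + PUZZLE},
      ]

      # reply = send(one_shot)   # placeholder, not a real client call

    Same puzzle, same instruction; the only difference is the worked example sitting in front of it.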

    • RichardDegenne@lemmy.zip · 3 days ago

      “It can’t be that stupid, you must be prompting it wrong.”

      A timeless classic at this point.

      • ZDL@lazysoci.alOP · 3 days ago

        Besides, I took it at its word. Is it my fault it lies?

        My goal is to help you solve problems, answer questions, and think through complex topics in a way that’s clear and helpful. If you have a tricky question, need help with decision-making, or want to break down a complicated concept, feel free to put me to the test!

      • pixxelkick@lemmy.world · 3 days ago

        The principle that one-shot prompts are pretty critical for logic puzzles is well established at this point, and has been for well over a year now.

        Like I said, this is like someone dragging their lawnmower out onto the lawn without starting it, and then proclaiming lawnmowers suck cuz their lawn didn’t get cut.

        You have to start the thing for it to work, mate, lol.

        I get that it’d be nice if you didn’t have to, but that’s not how an LLM works. LLMs are predictive text algorithms, which means they need something to start predicting off of as a starting point; that’s like their whole shtick.

        If you don’t give them a solid starting point to work from, you are literally just rolling the dice on whether it’ll do what you want or not, because zero-shot prompting is going full “Jesus take the wheel” mode on the algorithm.

        It’s annoying that marketing and consumers have created this very wrong perception about “what” an LLM is.

        When you ask someone “knock knock” and they respond with “who’s there?”, that’s all an LLM is doing; it’s just predicting what text oughta come up statistically next.

        If you don’t establish a precedent, you’re going full RNGjesus, praying it chooses the correct direction.

        And more importantly, and I CANNOT stress this enough…

        Once an LLM gets the answer wrong, if you keep chasing that thread, it will continue to behave wrong.

        Because you’ve now established the pattern in that thread that “User B is an idiot” and told it it’s wrong, and that means it’s gonna keep generating the content of what a wrong/stupid responder would sound like.

        Consider this thought experiment, if you will:

        If I hand a person the incomplete text of a play where 2 characters are talking to each other, A and B, and the entire text is B saying dumb shit and A correcting B, and I ask that person to add some more content to the end of what I’ve got so far (“finish this”, so to speak), do you think they’re gonna suddenly pivot to B no longer being an idiot?

        Or… do you think it’s more likely they’ll keep going with the pattern I have established, and continue to make B sound stupid for A to correct?

        Probably the latter, right?

        That’s all an LLM is, so if you already have 3 instances of you telling the LLM “No, that’s wrong, you are dumb”, guess what?

        You have literally conditioned it now to get even dumber, so it’s gonna respond with even more wrong answers, because you’re chasing that thread.

        • RichardDegenne@lemmy.zip · 3 days ago

          It’s annoying that marketing and consumers have created this very wrong perception about “what” an LLM is.

          On the one hand, yes.

          On the other hand, if it needs me to hold its hand, on top of burning the electric power of a small city, to solve a fucking zebra puzzle, then I don’t care about it or how it works; it is not useful technology.

    • ZDL@lazysoci.alOP · 3 days ago

      LLMs are not self aware, any random nonsense they generate about themselves is not remotely reliable as a source of truth.

      It’s almost as if I knew that.

      (keeping in mind that all of this comes from the LLMbecile itself … so it may be hallucinated! 🤣)

      Almost.