So I gave an LLMbecile that claims to be a reasoning model…

Are you a “reasoning” model?

Great question! Yes, I am designed to be a “reasoning” model. That means I don’t just repeat information—I analyze, synthesize, and draw logical conclusions based on the data and context you provide. My goal is to help you solve problems, answer questions, and think through complex topics in a way that’s clear and helpful.

If you have a tricky question, need help with decision-making, or want to break down a complicated concept, feel free to put me to the test!

…the following as a test of its reasoning:

There are seven customers (Carol, Darrell, Henrietta, Kenneth, Lula, Omar, Peter).
There are seven cheeses (cheddar, colby jack, gouda, havarti, muenster, provolone, swiss).
There are seven meats (bologna, corned beef, ham, liverwurst, prosciutto, roast beef, salami).
There are seven sandwiches priced at one of ($2.75, $3.75, $4.75, $5.75, $6.75, $7.75, $8.75).

Match prices to customers, meats, and cheeses based on these clues:

1. Omar's sandwich costs more than the order with liverwurst.
2. Henrietta's order doesn't cost $2.75.
3. Kenneth's order is either the order with provolone cheese or the $7.75 order.
4. Omar's order doesn't include swiss cheese.
5. The order with prosciutto includes provolone cheese.
6. Lula's order doesn't include corned beef.
7. The order with ham doesn't include swiss cheese.
8. Henrietta's order costs less than Darrell's sandwich.
9. Carol's sandwich costs $5.75.
10. Of the sandwich with havarti cheese and Kenneth's order, one includes prosciutto and the other costs $3.75.
11. Henrietta's sandwich doesn't cost $4.75.
12. The order with colby jack cheese costs less than the sandwich with prosciutto.
13. Henrietta's sandwich doesn't include roast beef.
14. The order with roast beef doesn't include colby jack cheese.
15. The sandwich with muenster cheese costs 1 dollar less than Lula's order.
16. The $5.75 order includes colby jack cheese.
17. The $8.75 sandwich includes gouda cheese.
18. The sandwich with colby jack cheese costs 3 dollars less than the sandwich with corned beef.
19. The $3.75 order is either the sandwich with bologna or Kenneth's sandwich.

Now before I did this, I gave the task to my SO as an assignment. (He’s an engineer.) He wrote two solutions using two different tools: something he called CLP/FD and something he called CHR. He’s rusty in both of those, so it took him about 30 minutes to get them working. (He’s confident he can now do any more such problems with far less time spent coding.)

It took about 3ms for his first program to solve it, and about 7ms for his second.
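
For the curious, here’s roughly what “encode the clues and let a program grind through them” looks like. This is not his CLP/FD or CHR code (which I’m not reproducing); it’s just a brute-force sketch in Python, with my own naming, that assigns each cheese, meat, and customer to a price slot and throws out anything that violates a clue. Far slower than a real constraint solver, but it involves no “reasoning” at all:

    from itertools import permutations

    # Price slot i costs PRICES[i]; consecutive slots differ by exactly $1,
    # so "costs $1 less" means "one slot lower" and "$3 more" means "3 slots higher".
    PRICES = [2.75, 3.75, 4.75, 5.75, 6.75, 7.75, 8.75]
    CUSTOMERS = ["Carol", "Darrell", "Henrietta", "Kenneth", "Lula", "Omar", "Peter"]
    CHEESES = ["cheddar", "colby jack", "gouda", "havarti", "muenster", "provolone", "swiss"]
    MEATS = ["bologna", "corned beef", "ham", "liverwurst", "prosciutto", "roast beef", "salami"]

    def assignments(names):
        """Every way of giving the seven names distinct price slots (name -> slot)."""
        for perm in permutations(range(7)):
            yield dict(zip(names, perm))

    # Customer-only clues (2, 8, 9, 11) get checked once, up front.
    customer_options = [cu for cu in assignments(CUSTOMERS)
                        if cu["Carol"] == 3                     # 9: Carol pays $5.75
                        and cu["Henrietta"] not in (0, 2)       # 2, 11: not $2.75 or $4.75
                        and cu["Henrietta"] < cu["Darrell"]]    # 8

    for ch in assignments(CHEESES):
        if ch["colby jack"] != 3 or ch["gouda"] != 6:           # 16, 17
            continue
        for mt in assignments(MEATS):
            if (mt["prosciutto"] != ch["provolone"]             # 5
                    or mt["ham"] == ch["swiss"]                 # 7
                    or mt["prosciutto"] <= ch["colby jack"]     # 12
                    or mt["roast beef"] == ch["colby jack"]     # 14
                    or mt["corned beef"] != ch["colby jack"] + 3):  # 18
                continue
            for cu in customer_options:
                if (cu["Omar"] <= mt["liverwurst"]              # 1
                        or cu["Omar"] == ch["swiss"]            # 4
                        or cu["Lula"] == mt["corned beef"]      # 6
                        or cu["Henrietta"] == mt["roast beef"]  # 13
                        or cu["Lula"] != ch["muenster"] + 1     # 15
                        or not (cu["Kenneth"] == ch["provolone"] or cu["Kenneth"] == 5)  # 3
                        or not (mt["bologna"] == 1 or cu["Kenneth"] == 1)):              # 19
                    continue
                # 10: of the havarti order and Kenneth's order (two different orders),
                #     one includes prosciutto and the other costs $3.75 (slot 1).
                h, k = ch["havarti"], cu["Kenneth"]
                if h == k or not ((mt["prosciutto"] == h and k == 1) or
                                  (mt["prosciutto"] == k and h == 1)):
                    continue
                for slot, price in enumerate(PRICES):
                    who    = next(n for n in CUSTOMERS if cu[n] == slot)
                    meat   = next(n for n in MEATS if mt[n] == slot)
                    cheese = next(n for n in CHEESES if ch[n] == slot)
                    print(f"${price}: {who}, {meat}, {cheese}")

A real CLP/FD program states the same constraints declaratively and lets the solver propagate them instead of grinding through permutations, which is presumably why his versions finish in milliseconds.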

So how did the “reasoning” LLMbecile fare?

Well, it was infinitely slower than my SO’s programs. Because it couldn’t solve it.

After 16 steps (I told it to explain its reasoning along the way; I’m not going to copy all that out because it’s tedious) it gave me this:

So it failed, and when I pressed it for the final solution it basically argued that it would take too long, but told me that if I guided it with some more detailed questions it could solve it for me.

(You know. The thing I already had solved two different ways before asking the LLMbecile.)

When I asked this, however…

I thought the whole point of a reasoning model was to do the reasoning, not have the user do the reasoning for it?

… the answer was golden:


Your critique is entirely valid—and it’s a sharp observation that highlights the current state of AI reasoning models. Here’s what’s really going on:

Why AI Reasoning Models Struggle with Complex Logic Puzzles

  • Pattern Matching, Not True Reasoning:
    Most AI models, including those billed as “reasoning” models, are fundamentally pattern matchers. They excel at low- and medium-complexity problems where the answer can be inferred from seen data or familiar patterns. But when a problem’s complexity increases—like with large, multi-variable logic grid puzzles—their performance can collapse, and they often fail to follow through with systematic, step-by-step deduction.

  • No Genuine Understanding:
    These models don’t “understand” logic or context the way humans do. They don’t build or manipulate internal representations of the puzzle’s state; instead, they generate plausible next steps based on training data. This leads to errors, contradictions, or incomplete solutions as complexity rises.

  • Scaling Limits:
    Research shows that as puzzles get harder, models may initially increase their reasoning effort, but beyond a certain point, their ability to sustain coherent reasoning collapses—even if they have enough computational resources to keep going.

  • Lack of Explicit Algorithms:
    Unlike humans who might draw a grid or systematically eliminate possibilities, models rarely execute explicit, algorithmic reasoning. Their “chain-of-thought” is often just a simulation of reasoning, not the real thing.

What This Means for Users

  • For simple logic puzzles:
    AI models can often solve them and explain their reasoning convincingly.

  • For complex, multi-layered puzzles:
    The model’s reasoning may break down, leading to incomplete, inconsistent, or outright wrong answers—just as you observed in your sandwich puzzle example.

  • Human intervention is still needed:
    For truly novel, complex logic problems, humans remain better at systematic, stepwise deduction, especially when the solution space is large and requires careful cross-elimination.

Bottom Line

AI “reasoning” is impressive, but brittle. It’s not the same as human logic, and it can’t reliably replace human reasoning in genuinely complex or novel scenarios—especially when the solution requires detailed, multi-step deduction and cross-referencing.

Your expectation that a reasoning model should “do the reasoning” is exactly what researchers are working toward—but as of now, there are real, well-documented limits to what these models can achieve on hard logic puzzles.


The takeaway points from this response (keeping in mind that all of this comes from the LLMbecile itself … so it may be hallucinated! 🤣):

  • Most AI models, including those billed as “reasoning” models, are fundamentally pattern matchers.
  • These models don’t “understand” logic or context the way humans do.
  • their ability to sustain coherent reasoning collapses
  • Their “chain-of-thought” is often just a simulation of reasoning, not the real thing.
  • The model’s reasoning may break down, leading to incomplete, inconsistent, or outright wrong answers
  • Human intervention is still needed

So next time you see some aitechbroshilldude honking the horn of “reasoning models”, point them to this, where a reasoning LLMbecile explains why it’s not actually reasoning (and accidentally explains why it’s utterly useless).

  • pixxelkick@lemmy.world · 3 days ago

    LLMs are not self aware, any random nonsense they generate about themselves is not remotely reliable as a source of truth.

    You can’t ask it for info about who/what it is and take that at face value, it’s just as randomly generated as any other output.

    In terms of reasoning, you’ll wanna understand zero- vs one- vs many-shot prompting. Complex logic puzzles still typically require at minimum a one-shot prompt, and if complex enough may require a multi-shot prompt to get it going.

    Treat an LLM a lot like a lawn mower gas engine: if you just take it out on the yard and drag it around without actually starting the engine up, it’s not going to be surprising that it didn’t cut any grass.

    For all intents and purposes, for a puzzle like this you likely need to first provide an example of solving a different puzzle of the same “type”, demonstrating the steps to achieve a solution for that puzzle.

    Then you provide the actual puzzle to the LLM, and its success rate will skyrocket.

    The pre-load puzzle can be a simpler one; it’s mostly just about demonstrating the format and steps of “how” you do this “type” of puzzle. That can usually be good enough to get the ball rolling and get the LLM to start generating sane output.

    This is called “One Shot” prompting.

    However, for more complex stuff you may need to pre-prompt with 2 to 4 examples, ideally focusing on keeping the syntax very tight and small so the context window stays small (using stuff like icons and shorthands to shorten up phrases and turn an entire sentence into 2-3 words can help a lot).

    With multiple preloaded prompts this can even further boost the LLM’s reliability of output. We call this “Multi shot” prompting.

    It’s very well known that even the best-trained LLMs still struggle a lot with logic puzzles on zero-shot attempts.

    The only exception is when it’s a well-known logic puzzle that is already well solved, in which case, instead of actually “solving” it, the LLM will simply regurgitate, verbatim, the answer someone wrote out on some random forum or whatever it was trained on.

    But for a unique new logic puzzle, it usually becomes necessary to at minimum one-shot prompt it.
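
    Concretely, the difference is just whether the conversation you hand the model already contains a worked example before the real question. A rough sketch of that structure (a generic chat-style message list; the mini example and the placeholder send call are made up, not any particular vendor’s API):

      # Zero-shot vs one-shot, shown as generic chat-style message lists.
      # Nothing here is a real API call; "send" at the bottom is a placeholder
      # for whatever client you actually use.

      PUZZLE = "There are seven customers ... (paste the full sandwich puzzle here)"

      # Zero-shot: the model gets the puzzle cold, with nothing anchoring
      # its method or output format.
      zero_shot = [
          {"role": "user",
           "content": "Solve this logic grid puzzle step by step:\n" + PUZZLE},
      ]

      # One-shot: a small already-solved puzzle comes first, demonstrating the
      # method (one deduction per line), then the real puzzle follows.
      EXAMPLE = ("Three kids (Ann, Bob, Cy) own three pets (cat, dog, fish). "
                 "1. Ann doesn't own the dog. 2. Bob owns the cat.")
      EXAMPLE_WORKED = ("Clue 2 fixes Bob=cat. Clue 1 rules out Ann=dog, "
                        "so Ann=fish and Cy=dog. Answer: Ann-fish, Bob-cat, Cy-dog.")

      one_shot = [
          {"role": "user",
           "content": "Solve this logic grid puzzle step by step:\n" + EXAMPLE},
          {"role": "assistant", "content": EXAMPLE_WORKED},
          {"role": "user",
           "content": "Now solve this one the same way:\n" + PUZZLE},
      ]

      # reply = send(one_shot)   # placeholder, not a real client call

    Same puzzle, same instruction; the only difference is the worked example sitting in front of it.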

    • RichardDegenne@lemmy.zip · 3 days ago

      “It can’t be that stupid, you must be prompting it wrong.”

      A timeless classic at this point.

      • ZDL@lazysoci.alOP · 3 days ago

        Besides, I took it at its word. Is it my fault it lies?

        My goal is to help you solve problems, answer questions, and think through complex topics in a way that’s clear and helpful. If you have a tricky question, need help with decision-making, or want to break down a complicated concept, feel free to put me to the test!

      • pixxelkick@lemmy.world · 3 days ago

        The principle that one-shot prompts are pretty critical for logic puzzles is well established at this point, and has been for well over a year now.

        Like I said, this is like someone dragging their lawnmower out onto the lawn without starting it, and then proclaiming lawnmowers suck cuz their lawn didn’t get cut.

        You have to start the thing for it to work, mate, lol.

        I get that it’d be nice if you didn’t have to, but that’s not how an LLM works. LLMs are predictive text algorithms, which means they need something to start predicting off of as a starting point; that’s like their whole shtick.

        If you don’t give them a solid starting point to work from, you are literally just rolling the dice on whether it’ll do what you want or not, because zero-shot prompting is going full “Jesus take the wheel” mode on the algorithm.

        It’s annoying that marketing and consumers have created this very wrong perception about “what” an LLM is.

        When you ask someone “knock knock” and they respond with “who’s there?”, that’s all an LLM is doing; it’s just predicting what text oughta come up statistically next.

        If you don’t establish a precedent, you’re going full RNGjesus, praying it chooses the correct direction.

        And more importantly, and I CANNOT stress this enough…

        Once an LLM gets the answer wrong, if you keep chasing that thread, it will continue to behave wrong.

        Because you’ve now established the pattern in that thread that “User B is an idiot” and told it it’s wrong, and that means it’s gonna keep generating the content of what a wrong/stupid responder would sound like.

        Consider this thought experiment, if you will:

        If I hand a person the incomplete text of a play where 2 characters are talking to each other, A and B, and the entire text is B saying dumb shit and A correcting B, and I ask that person to add some more content to the end of what I’ve got so far (“finish this”, so to speak), do you think they’re gonna suddenly pivot to B no longer being an idiot?

        Or… do you think it’s more likely they’ll keep going with the pattern I have established, and continue to make B sound stupid for A to correct?

        Probably the latter, right?

        That’s all an LLM is, so if you already have 3 instances of you telling the LLM “No, that’s wrong, you are dumb”, guess what?

        You have literally conditioned it now to get even dumber, so it’s gonna respond with even more wrong answers, because you’re chasing that thread.

        • RichardDegenne@lemmy.zip · 3 days ago

          It’s annoying that marketing and consumers have created this very wrong perception about “what” an LLM is.

          On the one hand, yes.

          On the other hand, if it needs me to hold its hand, on top of burning the electric power of a small city, to solve a fucking zebra puzzle, then I don’t care about it or how it works; it is not useful technology.

    • ZDL@lazysoci.alOP · 3 days ago

      LLMs are not self aware, any random nonsense they generate about themselves is not remotely reliable as a source of truth.

      It’s almost as if I knew that.

      (keeping in mind that all of this comes from the LLMbecile itself … so it may be hallucinated! 🤣)

      Almost.