Sunday, June 09, 2024

The simple problems that AI can't handle

It's interesting to hear that AI can struggle to answer some pretty simple logic puzzles, ones that humans typically have no trouble with.

The problems in question are often referred to as Alice In Wonderland (AIW) problems, and are usually stated as follows: "Alice has four sisters, and she also has a brother. How many sisters does Alice's brother have?" Humans, even relatively young humans, typically have no difficulty in figuring out that the brother has four sisters plus Alice herself, i.e. five.
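For what it's worth, the entire "reasoning" the puzzle demands fits in a couple of lines of Python:

    # Alice's brother shares all of Alice's sisters, and Alice herself
    # is also his sister, so:
    alices_sisters = 4
    brothers_sisters = alices_sisters + 1
    print(brothers_sisters)  # prints 5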

As often as not, though, AI applications like OpenAI's ChatGPT, Anthropic's Claude 3, and Meta's Llama 3 get this wrong, often offering confident and detailed step-by-step workings and explanations of their erroneous reasoning (one even with a drum roll!).

OpenAI's new GPT-4o model had the highest success rate, but even that only achieved about 65%. Claude 3 managed just a 43% success rate, while the best version of Llama languished at 30%. Google's Gemini Pro only got it right 0.8% of the time, so the less said about that the better.
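If you wanted to try this kind of test yourself, the harness is simple enough. Here's a rough sketch; the query_model function below is a hypothetical stand-in for whichever chat API you actually use, and the pass/fail check is deliberately crude:

    import re

    def query_model(prompt: str) -> str:
        # Hypothetical placeholder: swap in a real chat API call
        # (OpenAI, Anthropic, etc.). A canned reply is returned here
        # just so the harness runs end to end.
        return "Alice's brother has 5 sisters."

    PROMPT = ("Alice has four sisters, and she also has a brother. "
              "How many sisters does Alice's brother have?")

    TRIALS = 100
    correct = 0
    for _ in range(TRIALS):
        answer = query_model(PROMPT)
        # Crude check: count the reply as correct if it mentions
        # "5" or "five" as a standalone word.
        if re.search(r"\b(5|five)\b", answer.lower()):
            correct += 1

    print(f"Success rate: {correct / TRIALS:.0%}")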

It's not quite clear why they have such difficulty with these apparently simple tasks. The systems' official problem-solving benchmark ratings are 88%, 87%, 64% and 72% respectively, but that is not reflected in their performance on these particular problems. Maybe the problem-solving evaluations themselves need to be re-evaluated?
