We know that artificial intelligence (AI) can’t think the same way as a person, but new research has revealed how this difference might affect AI’s decision-making, leading to real-world ramifications humans might be unprepared for.
The study, published in February 2025 in the journal Transactions on Machine Learning Research, examined how well large language models (LLMs) can form analogies.
The researchers found that in both simple letter-string analogies and digit matrix problems — where the task was to complete a matrix by identifying the missing digit — humans performed well but AI performance declined sharply.
While testing the robustness of humans and AI models on story-based analogy problems, the study found the models were susceptible to answer-order effects — differences in responses depending on the order in which answer options were presented — and may also have been more likely to paraphrase.
Altogether, the study concluded that AI models lack "zero-shot" learning abilities, in which a learner is shown examples from classes it never saw during training and must predict which class they belong to.
Study co-author Martha Lewis, an assistant professor of neurosymbolic AI at the University of Amsterdam, gave an example of how AI falls short of humans at analogical reasoning in letter-string problems.
“Letter string analogies have the form of ‘if abcd goes to abce, what does ijkl go to?’ Most humans will answer ‘ijkm’, and [AI] tends to give this response too,” Lewis told Live Science. “But another problem might be ‘if abbcd goes to abcd, what does ijkkl go to?’ Humans will tend to answer ‘ijkl’ – the pattern is to remove the repeated element. But GPT-4 tends to get problems [like these] wrong.”
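To make the two patterns Lewis describes concrete, here is a minimal sketch in Python (not the study's code or prompts; the function names and structure are purely illustrative) of the rules a human would infer: advance the final letter, or remove the repeated element.

```python
# Illustrative sketch only; not taken from the study.
# Two letter-string analogy rules described by Lewis.

def successor_rule(s: str) -> str:
    """'abcd' -> 'abce': advance the final letter by one."""
    return s[:-1] + chr(ord(s[-1]) + 1)

def remove_repeated_rule(s: str) -> str:
    """'abbcd' -> 'abcd': drop the first letter that repeats its predecessor."""
    for i in range(1, len(s)):
        if s[i] == s[i - 1]:
            return s[:i] + s[i + 1:]
    return s  # no repeated element found

# 'abcd goes to abce' applied to 'ijkl' gives 'ijkm'.
print(successor_rule("ijkl"))         # ijkm
# 'abbcd goes to abcd' applied to 'ijkkl' gives 'ijkl'.
print(remove_repeated_rule("ijkkl"))  # ijkl
```

Humans infer the abstract rule from a single example and apply it to a new string; the study's point is that LLMs often match the surface pattern instead of the underlying rule.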
Why it matters that AI can’t think like humans
Lewis said that while we can abstract from specific patterns to more general rules, LLMs don’t have that capability. “They’re good at identifying and matching patterns, but not at generalizing from those patterns.”
Most AI applications rely to some extent on volume — the more training data is available, the more patterns are identified. But Lewis stressed pattern-matching and abstraction aren’t the same thing. “It’s less about what’s in the data, and more about how data is used,” she added.
To give a sense of the implications, AI is increasingly used in the legal sphere for research, case law analysis and sentencing recommendations. But with a lower ability to make analogies, it may fail to recognize how legal precedents apply to slightly different cases when they arise.
Because this lack of robustness could affect real-world outcomes, the study argued that AI systems need to be carefully evaluated not just for accuracy but also for the robustness of their cognitive capabilities.