Artificial intelligence has developed rapidly over the past decade, and large language models (LLMs) have played a significant role in that progress. Recently, researchers from the University of Warwick and Princeton University made a remarkable discovery that could enhance the abilities of LLMs.
Zhu Jian-Qiao, a postdoctoral researcher in the Computational Cognitive Science Lab at Princeton University, wrote on X: “Large language models show promise as cognitive models. The behaviors they produce often mirror human behaviors, suggesting we might gain insight into human cognition by studying LLMs. But why do LLMs behave like humans at all?”

Challenges
The study found that LLMs perform similarly to humans in cognitive tasks, often making judgments and decisions that deviate from rational norms. However, there are significant challenges to using LLMs as models of human cognition. LLMs are trained on much larger datasets than humans, and the origins of the behavioural similarities between LLMs and humans are unclear.
To address these challenges, the researchers proposed two strategies. First, they suggested using computationally equivalent tasks that both LLMs and rational agents must master for cognitive problem-solving. Second, they proposed examining the task distributions required for LLMs to exhibit human-like behaviours.
The researchers tested their approach by pretraining an LLM, called Arithmetic-GPT, on an ecologically valid arithmetic dataset. They then used this model to predict human behaviour in decision-making tasks, specifically risky and intertemporal choices. The results showed that Arithmetic-GPT predicted human choices more accurately than many traditional cognitive models.
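To see what the traditional cognitive models being compared against typically look like, consider a rational baseline for risky choice: score each gamble by its expected value and turn the difference into a choice probability with a logistic rule. The sketch below is illustrative, not the study's actual model; the temperature parameter and the example gamble values are assumptions.

```python
import math

def expected_value(p: float, amount: float) -> float:
    """Rational benchmark: probability of winning times the payoff."""
    return p * amount

def p_choose_a(ev_a: float, ev_b: float, temperature: float = 1.0) -> float:
    """Logistic (softmax) choice rule common in cognitive models:
    the larger the expected-value gap, the more deterministic the choice."""
    return 1.0 / (1.0 + math.exp(-(ev_a - ev_b) / temperature))

# Gamble A: 70% chance of $20 (EV = 14); option B: $12 for sure (EV = 12)
prob_a = p_choose_a(expected_value(0.7, 20), expected_value(1.0, 12))
print(f"P(choose A) = {prob_a:.2f}")  # ≈ 0.88
```

Models in this family predict graded, probabilistic choices rather than strict EV maximisation, which is why they can capture some human deviations from rational norms; the study's finding is that a small LLM pretrained on arithmetic captured those deviations even better.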
The study also addressed the challenge of using LLMs as cognitive models by defining a data-generation algorithm for creating synthetic datasets and by gaining access to the neural activation patterns crucial for decision-making. The researchers used a small language model with a Generative Pretrained Transformer (GPT) architecture and pretrained it on arithmetic tasks, generating synthetic datasets that reflect realistic probabilities and values for training.
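A data generator of this kind might look like the following minimal sketch. The specific distributions are assumptions made for illustration: "ecologically valid" is taken to mean probabilities clustered near 0 and 1 (a U-shaped Beta distribution) and payoff magnitudes following a heavy-tailed power law, roughly mirroring how probabilities and quantities occur in the real world. The actual distributions used in the study may differ.

```python
import random

def make_arithmetic_example(rng: random.Random) -> str:
    """Create one synthetic training line such as '0.42*8.5=3.57'.

    Assumed distributions (not necessarily the study's):
    - probabilities from Beta(0.3, 0.3), U-shaped with most mass near 0 and 1
    - values from Pareto(1.5), a heavy-tailed power law
    """
    p = round(rng.betavariate(0.3, 0.3), 2)
    v = round(rng.paretovariate(1.5), 2)
    return f"{p}*{v}={p * v:.2f}"

def make_dataset(n: int, seed: int = 0) -> list[str]:
    """Generate n arithmetic strings for language-model pretraining."""
    rng = random.Random(seed)
    return [make_arithmetic_example(rng) for _ in range(n)]

if __name__ == "__main__":
    for line in make_dataset(5):
        print(line)
```

Each line is a plain-text equation, so a small GPT can be trained on it with an ordinary next-token objective; the ecological structure lives entirely in the sampling distributions, not in the model.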

Implications for Machine Learning
The study’s results are important for the field of Machine Learning (ML). The new approach addresses some of the main obstacles to using LLMs as models of human thinking, and it clarifies how well suited LLMs are to this purpose. In particular, the study shows that training on synthetic data that mirrors real-world statistics is key to getting LLMs to make decisions the way humans do.

The study by researchers from Princeton University and the University of Warwick proposes a novel approach to enhance the utility of LLMs as cognitive models. The findings demonstrate that LLMs, specifically Arithmetic-GPT pretrained on ecologically valid synthetic datasets, can closely model human cognitive behaviours in decision-making tasks. The study’s robustness is confirmed through extensive validation techniques, highlighting the potential of LLMs as cognitive models and the importance of using ecologically valid synthetic datasets for pretraining.