MoCa: Cognitive Scaffolding for Language Models in Causal and Moral Judgment Tasks

Abstract

Humans' common-sense understanding of the physical and social world is organized around intuitive theories. Two key building blocks of these intuitive theories are causality and morality. Causal and moral judgments come naturally to people: who did what, and why? A rich literature in psychology and cognitive science has studied people's causal and moral intuitions. This work has revealed a number of factors that systematically influence people's judgments, such as the presence of norms and whether the agent was aware of their action's potential consequences. Here, we investigate whether large language models (LLMs) make causal and moral judgments about text-based scenarios that align with those of human participants. We find that, without any annotations, LLMs and human participants are misaligned (only 56%-60% agreement). However, with simple expert-written instructions, LLMs can accurately annotate which factors are present in a scenario. We show how these annotations can guide LLMs to match participants' judgments more closely (69.7%-72% agreement). These results suggest that insights from cognitive science can help scaffold language models to align more closely with human intuitions on a challenging common-sense evaluation task.
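
The two-stage idea described above (first annotate which factors a scenario contains, then condition the judgment on those annotations) can be illustrated with a minimal prompting sketch. This is not the paper's implementation: the function names (call_llm, annotate_factors, judge_with_annotations), the factor wording, and the prompt formats are hypothetical placeholders chosen for illustration.

from typing import Callable, Dict

# Hypothetical stand-in for an LLM API call; replace with a real client.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("Plug in an actual LLM client here.")

# Illustrative expert-written instructions for annotating individual factors
# (the wording here is invented, not the paper's actual instructions).
FACTOR_INSTRUCTIONS: Dict[str, str] = {
    "norm_violation": "Does the agent's action violate a norm? Answer Yes or No.",
    "awareness": "Was the agent aware that their action could lead to the outcome? Answer Yes or No.",
}

def annotate_factors(scenario: str, llm: Callable[[str], str] = call_llm) -> Dict[str, str]:
    """Stage 1: ask the LLM which factors are present in the scenario."""
    annotations = {}
    for factor, instruction in FACTOR_INSTRUCTIONS.items():
        prompt = f"Scenario: {scenario}\n\n{instruction}"
        annotations[factor] = llm(prompt).strip()
    return annotations

def judge_with_annotations(scenario: str, question: str,
                           llm: Callable[[str], str] = call_llm) -> str:
    """Stage 2: include the factor annotations in the prompt that elicits the final judgment."""
    annotations = annotate_factors(scenario, llm)
    factor_summary = "\n".join(f"- {name}: {answer}" for name, answer in annotations.items())
    prompt = (
        f"Scenario: {scenario}\n\n"
        f"Factors identified in this scenario:\n{factor_summary}\n\n"
        f"{question} Answer Yes or No."
    )
    return llm(prompt).strip()

In use, judge_with_annotations(scenario, "Did the agent cause the outcome?") would first gather the factor labels and then pose the causal question with those labels in context, which is the scaffolding step the abstract credits with the improved agreement.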

Publication
Proceedings of the 39th International Conference on Machine Learning, Baltimore, Maryland, USA, 2022.