Excerpted From: Jason M. Chin, Alex O. Holcombe, Kathryn Zeiler, Patrick S. Forscher and Ann Guo, Metaresearch, Psychology, and Law: A Case Study on Implicit Bias, 56 Connecticut Law Review 225 (December, 2023) (416 Footnotes)

Law and psychology are natural partners. Thinking, feeling, and behaving are at the heart of many disputes that legal systems seek to regulate. Accordingly, the promise of psychologically informed law and policy often captures the attention of legal scholars, policymakers, and courts. Although psychology research has much to offer to law, the limits of its usefulness are equally important to understand and recognize. Yet, those limits appear to have received far less attention.

Disagreements about whether psychology findings are sufficiently researched, tested, and agreed upon to inform the legal system date back to the beginnings of legal psychology. A little over a century ago, John Henry Wigmore, a scholar and teacher of evidence, wrote a scathing critique of the work of Hugo Münsterberg and the then-fledgling field of legal psychology. Münsterberg, who has been referred to as the “father of applied psychology,” was a psychologist at Harvard and a highly public figure at the time for his involvement in several notorious legal cases.

Wigmore's article was in large part inspired by Münsterberg's highly publicized work in criminal cases across the United States. For example, in a 1907 murder trial, Münsterberg administered a battery of psychological tests, such as timed word association tasks, on a key witness. He concluded that the witness was not intentionally lying. Münsterberg leveraged this fieldwork to advocate for a larger role for psychology in law. For instance, in a book of reflections on his role in various legal matters, Münsterberg argued that there was little about a trial that could not be improved through the involvement of psychology research:

There is thus really no doubt that experimental psychology can furnish amply everything which the court demands: it can register objectively the symptoms of the emotions and ... it can trace emotions through involuntary movements, breathing, pulse, and so on, where ordinary observation fails entirely.

Given this sort of claim, Münsterberg predictably attracted critics. Prominent among them was Wigmore, in his 1908 article, “Professor Muensterberg and the Psychology of Testimony Being a Report of the Case of Cokestone v. Muensterberg.” That article, although satirical, raised questions that still endure. For instance, Wigmore asked, “[d]oes this new method give a safe criterion for testing the individual witness?” This challenge of applying group-level research to individuals has been the focus of a great deal of recent legal-scientific research. Wigmore also highlighted concerns about whether psychological testing was sufficiently “exact and precise” and whether psychological claims had reached general acceptance in the field, both of which remain subjects of modern study. And there also remains the problem, demonstrated by Münsterberg's own testimony, of psychology experts providing exaggerated and unsupportable claims to advance a party's interest.

In retrospect, it is unsurprising that a great deal of psychology research would fall short at the time of Wigmore's missive. The practice of conducting systematic empirical studies to test research questions in psychology was only a few decades old. The succeeding one hundred ten years have seen a great deal of empirical research and theoretical development in legal psychology. Alongside new findings, researchers have developed new frameworks to understand when an empirical psychology claim that is susceptible to scrutiny has withstood criticism and when it is more likely to generalize to a population of interest. Many of these insights were inspired by what has been called a “credibility revolution” in psychology that responded to the failure of many prominent studies to replicate their findings (sometimes referred to as a “crisis”). Large-scale replication studies, error detection, and general research on research practices are ongoing in a field referred to as “metaresearch” or “metascience.” These developments--those that focus on the study of research methods themselves--are the focus of our Article.

Yet, legal scholars do not regularly acknowledge that psychological evidence can be affected by problems that undermine its credibility, nor do they often reference the burgeoning metaresearch that studies--and sometimes quantifies--those problems. This opens up the possibility that legal scholars, policymakers, and judges are using psychology findings as rhetorical support for pre-existing policy preferences, without regard to the underlying strength of evidence. We seek to address this problem by evaluating the general limits of psychology research applied to law and policy. We go on to systematically examine whether law journal article authors adequately acknowledge the limits of psychology research in various literatures, namely studies that explore the interpretation of implicit bias measures, the correlation between implicit bias measures and behavior, and the effectiveness of interventions aimed at reducing implicit bias and problematic behavior presumably caused by implicit bias. We find that acknowledgement is lacking and suggest several avenues toward developing a better approach to applying psychology research to law and within legal institutions.

We use implicit bias intervention experiments merely as an example of a wider phenomenon in experimental psychology. In fact, metascience studies in most or all experimental fields have uncovered similar problems. These other fields, however, are beyond our scope. We focus on implicit bias interventions for several reasons. First and most importantly, robust evidence suggests that racial disparities exist across many critical dimensions, including healthcare treatment, exposure to pollution, and labor market and criminal justice system outcomes. We believe that developing effective interventions to address the causes of disparities is urgently important. Second, recently published metaresearch stands as strong evidence that the experimental literature that studies implicit bias interventions is plagued by publication bias and other methodological issues, calling into question the findings' reliability. Third, many normative scholars have failed to consider or even acknowledge the limitations of such findings. Grounding normative claims in unreliable experimental findings related to implicit bias, its effect on behavior, and interventions to reduce bias and problematic behavior hinders the development of effective interventions. The questionable reliability of implicit bias research does not imply that we should halt the study of discrimination or disparities, however. It implies the opposite. Implicit bias is but one possible explanation for disparities. We believe that solutions grounded in unreliable implicit bias research have distracted us from exploring alternative approaches and that we have a better chance of understanding disparities and remedying discrimination by turning our attention to such alternatives.

This Article proceeds as follows. In Part I, we examine the limits of the application of psychology research to legal issues. As noted, we focus on findings and analyses from the last decade of metaresearch in psychology and statistics. This includes research exploring the effects of using research designs with small sample sizes and undisclosed flexibility in analysis and data collection. We also consider research studying threats to the generalizability of psychology findings, such as psychology's overreliance on college student samples and other samples of convenience.

Part II then seeks to apply that knowledge to a specific set of psychology findings related to implicit bias, which was initially presented as having vast consequences for the legal system. The term “implicit bias” is meant to capture the possibility that people may act in a certain way due to “automatically activated associations about social groups” that, according to some accounts, evade conscious awareness. Law professors and other legal commentators have rung warning bells about implicit bias in the legal system for years, claiming that it affects the behaviors of lawyers, police, employers, judges, mediators, and jurors. In response, some jurisdictions mandate that judges engage in training to reduce their implicit bias. Commentators also sometimes characterize the research supporting such training as remarkably robust in a way reminiscent of Münsterberg's claims:

While experts may disagree about the role such research should play in employment litigation, the dispute is not over the validity of the research findings themselves. Specifically, there is agreement on the last of these four findings: implicit bias can be counteracted, interrupted, or corrected to prevent or reduce its impact on employment decisions.

The implicit bias literature is a useful and potentially important case study because the published views of many legal commentators diverge sharply from a view guided by the findings summarized in Part I. In other words, a great deal of implicit bias research carries hallmarks of unreliability, and commentators have failed to note this when devising policy recommendations based on the existing body of research. Part II highlights the importance of caution among legal researchers and actors when evaluating research and proposing reforms based on it. These aims are especially important when it comes to addressing discrimination in the legal system, a problem so significant that proposed solutions must not be grounded in foundational research that has failed to stand up to rigorous scrutiny. We demonstrate that the shaky evidence base requires consideration of alternative approaches if we hope to effectively address discriminatory behavior. The implicit bias literature also provides a cautionary tale, with Part II detailing how well-intentioned scholars in psychology and law rushed to conclusions. They ignored many potential frailties in the evidence base and have advocated for solutions that have not found strong empirical support.

Accordingly, Part III examines how scientific findings related to implicit bias are presented in law journals. Specifically, we conducted a systematic study of recent (2017-2021) mentions of “implicit bias training” in law journals indexed on HeinOnline, a widely used legal database. We chose mentions of training for two reasons. First, scant research finds that interventions aimed at reducing implicit bias are effective. Therefore, we would expect that if legal researchers accurately describe the research, they would be cautious when discussing implicit bias training. And second, whereas implicit bias itself is a relatively complicated construct that builds on a great deal of past research with varying degrees of research support over the past decade, studies on training are more straightforward. Specifically, there has never been strong support that interventions aimed at reducing implicit bias are effective, and so it would be especially problematic if legal scholars and practitioners did not acknowledge this limitation. We find that very few, only eight, of the fifty-eight law journal articles that recommend such interventions acknowledge reliability concerns related to the experimental psychology literature cited to support the recommendations. This is highly problematic. Consumers of these law journal articles, including those in positions to implement interventions who do not have an independent understanding of the scientific literature, have possibly been misled.

In Part IV, we chart a path forward. What can psychologists, especially legal psychologists, do to make clearer the limits of their work? Here, we suggest tangible reforms, such as including “constraints on generality” statements in empirical work. These statements express the researcher's view of the scope of their finding, such as the populations and contexts to which the finding can be applied. Such statements may assist legal researchers who do not have the expertise to know which results are likely to generalize, and which are not. We also provide recommendations to legal scholars, lawyers, and others whose primary field is law.

Part V concludes on an optimistic note. Although many of the findings we discuss throughout this Article are so tenuous that they are not yet ready to inform law and policy, psychology and law can be a fruitful partnership. Psychology experiments have, for example, provided useful demonstrations that human memory is reconstructive and that people, in some conditions, report rich false memories. They have also provided evidence about the factors that drive the persuasiveness of expert witnesses. To leverage psychology's capacity to inform crucial questions of law and policy, Part V lays out a metaresearch agenda for law and psychology. That is, we propose a research plan for how the field may begin to study its own methods more carefully, so that it can improve and fulfill its potential.

[. . .]

The field of psychology has much to offer those interested in reforming law. Recent metaresearch findings, however, give legal scholars, judges, and policy-setting bodies ample reason to be cautious when applying findings from psychology experiments (and other empirical fields). When findings that have not yet been severely tested are used to bolster claims, it is critical that indicia of reliability be clearly communicated and appropriately used to weigh the strength of the evidence in support of or against reform proposals. In cases where claims have been severely tested and have been found seriously lacking, alternative interventions must be considered, especially for vitally important issues like racism and discrimination. Our analysis here should not be taken to imply that we should necessarily abandon implicit bias training, nor should it be taken to imply that effective interventions are not possible. While we wait for scientific fields to produce reliable evidence that passes muster when severely critical appraisal methods are applied, interventions should be developed based on plausible theories, and outcomes should be systematically monitored to determine effectiveness. Effectiveness of interventions designed using unreliable evidence should never be assumed.

An analysis of where discrimination research in behavioral science should go from here is important but beyond our scope. We note, however, that others have suggested possible next steps including a call for researchers to conduct more large, transparent, multi-period field studies; to develop norms around preregistration of lab and field experiments; to commit to more transparent reporting of relevant outcome measures, less ad hoc data exclusion, full data transparency, and collection of adequate samples from multiple and diverse subject pools; and to study possible interventions beyond nudge-like interventions. It may be useful for some of these large studies to target key assumptions of the implicit bias account of discrimination. Others have suggested shifting focus away from thoughts and behaviors of individuals and toward the systems in which individuals operate to explore other causes of disparities beyond the narrow realm of implicit bias. Still others suggest that future research should focus on the ways in which contextual factors influence IAT measures.

In the meantime, should we halt all implicit bias training? Not necessarily. We must, however, weigh the costs, including opportunity costs, of continuing such trainings against the expected benefits with a full understanding of the limitations of the existing evidence. Professor Greenwald, one of the developers of the IAT, along with twenty-four colleagues, recently acknowledged the dearth of evidence supporting the claim that interventions tested in laboratory experiments cause behavioral changes. They note that while “[m]any offerings of implicit bias training are successful in producing some education[,] ... there is no reason to expect that diversity 'trainers' (who might more properly be called 'diversity educators') can achieve what researchers cannot produce empirically.”

While our main contribution might seem quite negative, we believe that the future of psychology and its legal applications is bright. Much has improved since Münsterberg's day; speculative claims about the accuracy of diagnostic tools have been replaced gradually by controlled studies. That said, seemingly more sophisticated studies present dangers of their own, especially studies designed to address important societal issues. Although the field of experimental psychology is undergoing a credibility revolution, and we believe that a time will come when the credibility levels of findings will be evident to all, we need an interim approach. Those who apply modern psychological science must adequately acknowledge the field's limitations. We hope that the light we have shed on normative scholars' reliance on unreliable research will spark a change in author and journal editor norms around the use of scientific evidence to support normative claims.

In line with the general credibility movement, we call for the development of a metaresearch agenda for law and psychology. Researchers are increasingly (1) systematically studying their own methods, (2) developing empirically informed ways of improving those methods, and (3) testing whether those new practices have the desired effect. Metaresearch studies focusing specifically on experimental psychology have informed the broader movement. We believe that it is time for law and psychology to turn inward.

What would a metaresearch agenda in law and psychology look like? We propose beginning with systematic audits of the existing literature. Audits would measure current research and reporting practices, generally following our review discussed in Part I (which is itself limited in that it mostly relies on examples from the literature rather than a systematic survey of the field). For example, metaresearchers have established protocols for measuring data sharing and reproducibility to ascertain how often studies report sufficient information on employed research methods to allow others to verify their findings. Audits should also measure how often researchers justify their sample sizes and provide transparent effect size calculations. With these data in hand, interventions to improve reporting can be implemented, and users of research will have sufficient information to assess the general credibility of particular fields of research and to know when they should provide warnings. Interventions might include guidelines promoting cautions about generalizations, statements explaining why effect sizes may or may not accumulate, and warnings about publication bias. As interventions are implemented, researchers will be able to study whether their intended effects have manifested.

Given the applied nature of law and psychology, the field's metaresearch agenda must consider the research translation efforts of legal scholars, courts, and policy-setting bodies. We urge metaresearchers to build on our meta-study by further systematically studying vulnerabilities in the research-to-action pipeline. Important and unanswered questions include whether researchers cite studies that have been retracted or contradicted by large-scale replication studies and whether courts and policy-setting bodies put more weight on registered studies with larger sample sizes. These studies also can be repeated in future years as new methods for transparently presenting strengths and weaknesses to users of research are developed.

Finally, greater collaboration between metaresearchers, legal psychologists, and research users can improve law and psychology. Users of research have valuable knowledge about, for instance, currently understudied contexts and specific effect sizes that are practically relevant to their decisions. Metaresearchers and other methodology specialists can boost the usefulness of their research by communicating and collaborating with users to develop research questions and experiment designs. And, as even Wigmore eventually acknowledged, if legal psychologists reform their research practices, they can become integral to bridging the gap between questions of law and research paradigms that help answer those questions.

Australian National University College of Law.

University of Sydney School of Psychology.

Boston University School of Law.

Busara Center for Behavioral Economics.

Faculty of Medicine Dentistry and Health Sciences, University of Melbourne (MD candidate).

Correspondence should be addressed to Jason M. Chin.