What Is the Law? A System for Statutory Research (STARA) with Large Language Models

Faiz Surani*, Lindsey A. Gailmard*, Allison Casasola, Varun Magesh, Emily J. Robitschek, Daniel E. Ho
Stanford University
*Equal Contribution

Abstract

Each year, thousands of bills are enacted into law by Congress, state legislatures, and municipalities. As new laws are enacted, legal codes have grown in length and complexity to such an extent that, in many domains, even the government lacks systematic knowledge of what the law is. This challenge has repeatedly stymied legal reform and research efforts. To address this problem, we develop the Statutory Research Assistant (STARA), an automated system capable of performing accurate “statutory surveys”---compilations of all provisions relevant to a particular legal issue, together with detailed annotations and reasoning. We validate STARA’s accuracy on three existing human-compiled surveys, showing that it can reproduce them with high fidelity and, in many cases, identify hundreds of previously undiscovered provisions. We address the unique challenges of automated statutory research and show that STARA’s domain-specific architecture yields substantial improvements in accuracy over off-the-shelf language models. STARA can dramatically reduce the time required to discern the law, and we discuss its considerable implications for legal reform, academic research, and transparency. We make STARA’s compiled statutory surveys available to the public and the tool available to researchers upon request.

System Design and Results


Figure: Left: Illustration of the STARA system’s components. Right: STARA’s contribution to statutory research, showing the number of newly-documented provisions (and percentage improvement over previous datasets) across three validation tasks: congressionally-mandated reports and criminal statutes in the United States Code, and commissions established by the San Francisco Municipal Code.

System                          | # Found | Precision | Recall
STARA                           |   1,983 |     0.980 |  0.998
Gemini Deep Research            |     282 |     0.890 |  0.144
OpenAI Deep Research            |      84 |     0.881 |  0.044
Westlaw AI Jurisdictional Survey|     113 |     0.415 |  0.072

Table: Comparison of STARA with other AI research systems on federal criminal statutes task. STARA locates 7 times as many provisions as the best-performing comparison system.
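The precision and recall figures above follow the standard set-based definitions over retrieved versus gold-standard provisions. A minimal sketch of this scoring (the helper name and the statute citations are illustrative, not drawn from the paper's data):

```python
def precision_recall(retrieved: set[str], gold: set[str]) -> tuple[float, float]:
    """Score a set of retrieved provision citations against a gold-standard survey.

    Precision: fraction of retrieved provisions that are correct.
    Recall:    fraction of gold-standard provisions that were found.
    """
    true_positives = len(retrieved & gold)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical citation sets for illustration only.
gold = {"18 U.S.C. 1001", "18 U.S.C. 1341", "18 U.S.C. 1343", "18 U.S.C. 371"}
found = {"18 U.S.C. 1001", "18 U.S.C. 1341", "18 U.S.C. 1343", "26 U.S.C. 7201"}
p, r = precision_recall(found, gold)  # 3 of 4 retrieved are correct; 3 of 4 gold found
```

Here both precision and recall come out to 0.75: three of the four retrieved citations are in the gold set, and three of the four gold citations were retrieved.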

Configuration       | Precision | Recall | Extraction
LLM Baseline        |      0.58 |  0.990 |       0.28
+Prompt Engineering |      0.76 |  0.987 |       0.32
+Basic Context      |      0.96 |  0.984 |       0.70
STARA               |      0.96 |  0.998 |       0.76

Table: Ablation study results on federal criminal statutes task, demonstrating the importance of different components of STARA’s architecture. Extraction measures accuracy and completeness in identifying offense descriptions and penalties.

Acknowledgments

We thank Ananya Karthik, Christopher D. Manning, Neel Guha, Emaan Hariri, Lucia Zheng, Isabel Gallegos, Mirac Suzgun, Elena Eneva, Jonathan Hennessy, Erin Maneri, Derek Ouyang, Kit Rodolfa, and Andrea Vallebueno for helpful feedback and comments; and David Chiu, Jon Givner, Andrea Bruss, Leah Granger, and Rebekah Krell for their collaboration.

BibTeX

@inproceedings{suranigailmard2025,
  title={What Is the Law? A System for Statutory Research (STARA) with Large Language Models},
  author={Surani, Faiz and Gailmard, Lindsey A and Casasola, Allison and Magesh, Varun and Robitschek, Emily J and Ho, Daniel E},
  booktitle={Proceedings of the 20th International Conference on Artificial Intelligence and Law},
  year={2025},
  url={https://dho.stanford.edu/wp-content/uploads/STARA.pdf}
}