Presumably you’re aware of various Dylan Hadfield-Menell papers, e.g. https://dl.acm.org/doi/10.1145/3514094.3534130 and https://dl.acm.org/doi/10.1145/3306618.3314258
And of course Xuan’s talk ( https://www.lesswrong.com/posts/Cty2rSMut483QgBQ2/what-should-ai-owe-to-us-accountable-and-aligned-ai-systems )
But, to be perfectly honest… I think there’s part of this proposal that has merit, and part of this proposal that might sound good to many people but is actually bad.
First, the bad: The notion that “Law is a computational engine that converts human values into legible directives” is wrong. Legibility is not an inherent property of the directives. It is a property of the directives with respect to the one interpreting them, which in the case of law is humans. If you build an AI that doesn’t try to follow the spirit of the law in a human-recognizable way, the law will not be legible in the way you want.
The notion that it would be good to build AI that humans direct by the same process we currently use to create laws is also wrong. Such a process works for laws, specifically laws for humans, but it is tailored in many ways large and small to how we currently apply it, and it has numerous flaws even for that purpose (as you mention regarding expressions of power).
Then, the good: Law offers a lot of training data that directly bears on what humans value, what vague statements of standards mean in practice, and what humans think good reasoning looks like. The “legible” law can’t be used directly, but it can be used as a yardstick against which to learn the illegible spirit of the law. This research direction does not look like a Bold New Way to do AI alignment; instead, it looks like a Somewhat Bold New Way to apply AI alignment work that is fully contiguous with other alignment research (e.g. attempts to learn human preferences by actively asking humans).
Hi Charlie, thank you for your comment.
I cite many of Dylan’s papers in the longer form version of this post: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4218031
I will check out Xuan’s talk. Thanks for sharing that.
Instead of:
Law is a computational engine that converts human values into legible directives.
I could expand the statement to cover the larger project of what we are working on:
Law and legal interpretation form a computational engine that converts human values into legible directives.
One of the primary goals of this research agenda is to teach AI to follow the spirit of the law in a human-recognizable way. This entails leveraging existing human capabilities for the “law-making” / “contract-drafting” part (how do we use the theory and practice of law to tell agents what to do?), and conducting research on building AI capabilities for the interpretation part (how can our machine learning processes use data and methods from the theory and practice of law about how agents interpret those directives / contracts?).
Reinforcement learning from human attorney feedback (there are more than 1.3 million lawyers in the US) via natural language interactions with AI models is potentially a powerful process for teaching statutory interpretation, argumentation, and case-based reasoning (through training, fine-tuning, or extraction of templates for in-context prompting of large language models), which can then be applied more broadly to AI alignment. Models could be trained to assist human attorney evaluators; in partnership with the humans, the combined human-AI evaluation team could theoretically develop capabilities that surpass the legal understanding of the expert humans alone.
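To make the feedback loop concrete, here is a minimal sketch of learning a reward model from pairwise attorney preferences, in the Bradley-Terry style common in RLHF pipelines. Everything here is illustrative: the feature vectors stand in for a learned text representation, and the “attorney” judgments are simulated rather than real.

```python
import math
import random

def train_preference_model(pairs, dim, epochs=200, lr=0.5):
    """Fit a linear Bradley-Terry reward model from pairwise preferences.

    `pairs` is a list of (preferred_features, rejected_features) tuples,
    mimicking attorney judgments of which model answer better reflects
    the relevant legal standard.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            score = sum(wi * di for wi, di in zip(w, diff))
            # Gradient of log sigmoid(w . diff): push preferred answers up.
            g = 1.0 - 1.0 / (1.0 + math.exp(-score))
            w = [wi + lr * g * di for wi, di in zip(w, diff)]
    return w

def reward(w, x):
    """Score a candidate answer's features under the learned model."""
    return sum(wi * xi for wi, xi in zip(w, x))

# Toy data: dimension 0 encodes "cites controlling authority",
# which our hypothetical attorneys consistently prefer.
random.seed(0)
pairs = []
for _ in range(50):
    base = [random.random() for _ in range(3)]
    better = [base[0] + 1.0, base[1], base[2]]
    pairs.append((better, base))

w = train_preference_model(pairs, dim=3)
```

The learned reward can then rank new candidate answers, or serve as the training signal for a policy; the linear model is only a stand-in for the neural reward models used in practice.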
The foundation models in use today (e.g., GPT-3) have effectively performed a form of behavioral cloning on a large portion of the Internet, leveraging billions of human actions expressed in natural language. It may be possible to similarly leverage billions of human legal data points to build Law Foundation Models through large-scale language model self-supervision on pre-processed legal text data.
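As a toy illustration of the self-supervision idea (with a bigram counter standing in for a transformer, and two invented sentences standing in for a legal corpus): the training signal is just the next token of the raw text, so unannotated legal corpora can be used directly, with no labeling step.

```python
from collections import defaultdict, Counter

def fit_bigram_lm(corpus):
    """Self-supervision in miniature: the 'labels' are simply the next
    tokens of the raw text. Real foundation models replace these counts
    with a transformer trained on the same next-token objective."""
    counts = defaultdict(Counter)
    for doc in corpus:
        tokens = doc.lower().split()
        for cur, nxt in zip(tokens, tokens[1:]):
            counts[cur][nxt] += 1
    return counts

def predict_next(model, token):
    """Return the most frequent continuation seen in training, if any."""
    nxt = model.get(token.lower())
    return nxt.most_common(1)[0][0] if nxt else None

# Two invented sentences standing in for a pre-processed legal corpus.
corpus = [
    "the trustee owes a fiduciary duty to the beneficiary",
    "a fiduciary duty requires loyalty and care",
]
lm = fit_bigram_lm(corpus)
```

The point is only that the supervision comes for free from the text itself; scale and architecture are what separate this sketch from an actual Law Foundation Model.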
Aspects of legal standards, and the “spirit” of the law, can be learned directly from legal data. We could also codify examples of human and corporate behavior exhibiting standards such as fiduciary duty into a structured format to evaluate the standards-understanding capabilities of AI models. The legal data available for AI systems to learn from, or be evaluated on, includes textual data from all types of law (constitutional, statutory, administrative, case, and contractual), legal training tools (e.g., bar exam outlines, casebooks, and software for teaching the casuistic approach), rule-based legal reasoning programs, and human-in-the-loop live feedback from law and policy experts. The latter two could directly simulate state-action-reward spaces for AI fine-tuning or validation, and the textual sources could be processed into the same form.
Automated data curation processes that convert textual legal data into either state-action-reward tuples, or contextual constraints for shaping candidate action choices conditional on the state, are an important frontier in this research agenda (and promising for application to case law text data, contracts, and legal training materials). General AI capabilities research has recently found that learning from textual descriptions, rather than direct instruction, may allow models to learn reward functions that generalize better. Fortunately, much of law is embedded more in the form of descriptions and standards than in the form of direct instructions and specific rules. Descriptions of the application of standards provide a rich and large surface area to learn from.
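A hypothetical curation pass over case-outcome summaries might look like the following; the outcome-to-reward mapping and the record schema are placeholders for what would realistically be an LLM-based or specially trained extractor.

```python
# Toy curation pass that maps short case-outcome summaries to
# (state, action, reward) tuples. The keyword rules are illustrative
# stand-ins for a real extraction model.

OUTCOME_REWARD = {
    "upheld": 1.0,      # conduct found consistent with the standard
    "breached": -1.0,   # conduct found to violate the standard
}

def curate(records):
    """records: list of dicts with 'facts', 'conduct', 'outcome' fields,
    e.g. extracted from case headnotes."""
    tuples = []
    for r in records:
        reward = OUTCOME_REWARD.get(r["outcome"])
        if reward is None:
            continue  # skip outcomes the toy schema doesn't cover yet
        tuples.append((r["facts"], r["conduct"], reward))
    return tuples

# Invented example records, loosely styled after fiduciary-duty cases.
cases = [
    {"facts": "trustee manages pension fund",
     "conduct": "invested in own company", "outcome": "breached"},
    {"facts": "trustee manages pension fund",
     "conduct": "diversified holdings", "outcome": "upheld"},
    {"facts": "ambiguous facts",
     "conduct": "unknown", "outcome": "settled"},
]
sar = curate(cases)
```

The resulting tuples are the state-action-reward form a fine-tuning or validation pipeline would consume; the hard research problem is the extraction step this sketch waves away.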
Textual data can be curated and labeled for these purposes. We will aim for two outcomes with this labeling. First, data that can be used to evaluate how well AI models understand legal standards. Second, the possibility that the initial “gold-standard” human expert labeled data can be used to generate additional much larger sets of data through automated curation and processing of full corpora of legal text, and through model interaction with human feedback.
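One hedged sketch of that bootstrapping step: propagate the gold-standard expert labels to unlabeled passages by similarity, auto-labeling only confident matches and routing everything else back to human experts. Token-overlap similarity and the fixed threshold are stand-ins for a real embedding model with calibrated confidence.

```python
def jaccard(a, b):
    """Token-overlap similarity; a placeholder for learned embeddings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def propagate_labels(gold, unlabeled, threshold=0.4):
    """Label passages by their most similar gold example, keeping only
    confident matches; low-similarity passages go back to experts.
    gold: list of (text, label); unlabeled: list of text."""
    auto, needs_review = [], []
    for text in unlabeled:
        best_label, best_sim = None, 0.0
        for gtext, glabel in gold:
            sim = jaccard(text, gtext)
            if sim > best_sim:
                best_label, best_sim = glabel, sim
        if best_sim >= threshold:
            auto.append((text, best_label))
        else:
            needs_review.append(text)
    return auto, needs_review

# Invented gold-standard expert labels and unlabeled passages.
gold = [
    ("trustee invested fund assets in own company", "breach"),
    ("trustee diversified the fund prudently", "no_breach"),
]
unlabeled = [
    "trustee invested pension assets in own company",
    "novel crypto custody arrangement",
]
auto, review = propagate_labels(gold, unlabeled)
```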
I think your statement:
“This research direction does not look like a Bold New Way to do AI alignment, instead it looks like a Somewhat Bold New Way to apply AI alignment work that is fully contiguous with other alignment research”
is spot on. That is how I was thinking about it, but I should have made that more clear; perhaps I should work on a follow-up post at some point that explicitly explores the intersections of Law Informs Code with other strands of alignment research. Some of this is in the longer form version of this post, but with this inspiration from you, I may try to go further in that direction (although I am already beyond the length the Journal editors want!).
Thanks for your thorough response, and yeah, I’m broadly on board with all that. I think learning from detailed text behind decisions, not just the single-bit decision itself, is a great idea that can leverage a lot of recent work.
I don’t think that using modern ML to create a model of legal text is directly promising from an alignment standpoint, but by holding out some of your dataset (e.g. a random sample, or all decisions about a specific topic, or all decisions later than 2021), you can test the generalization properties of the model, and more importantly test interventions intended to improve those properties.
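The held-out evaluation could be as simple as splitting the corpus by topic or by decision date before training; a sketch (the record schema is invented for illustration):

```python
from datetime import date

def holdout_split(cases, topic=None, after=None):
    """Split a corpus into train/test, holding out by topic or by date.
    Each case is a dict with 'topic', 'decided' (a date), and 'text'."""
    train, test = [], []
    for c in cases:
        held = (topic is not None and c["topic"] == topic) or \
               (after is not None and c["decided"] > after)
        (test if held else train).append(c)
    return train, test

# Invented corpus records; real ones would carry full opinion text.
corpus = [
    {"topic": "privacy", "decided": date(2019, 5, 1), "text": "..."},
    {"topic": "privacy", "decided": date(2022, 3, 1), "text": "..."},
    {"topic": "antitrust", "decided": date(2020, 8, 1), "text": "..."},
]

# Hold out everything decided after 2021, per the temporal-split idea.
train, test = holdout_split(corpus, after=date(2021, 12, 31))
```

Topic splits (`topic="privacy"`) probe harder out-of-distribution generalization than random splits, and temporal splits probe whether the model tracks evolving standards rather than memorized decisions.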
I don’t think we have that great a grasp right now on how to use human feedback to get models to generalize to situations the humans themselves can’t navigate. This is actually a good situation for sandwiching: suppose most text about a specific topic (e.g. use of a specific technology) is held back from the training set, and the model starts out bad at predicting that text. Could we leverage human feedback from non-experts in those cases (potentially even humans who start out basically ignorant about the topic) to help the model generalize better than those humans could alone? This is an intermediate goal that it would be great to advance towards.
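The success criterion for one run of that protocol can be stated as a tiny check (all scores here are hypothetical accuracy-like numbers on the held-out topic):

```python
def sandwich_success(model_before, model_after, nonexpert, expert):
    """Sandwiching succeeds on a run if non-expert feedback improved the
    model AND pushed it past the non-experts themselves, moving it toward
    (without requiring it to reach) expert-level performance."""
    improved = model_after > model_before
    beats_helpers = model_after > nonexpert
    assert nonexpert < expert  # sanity check: a genuine capability gap
    return improved and beats_helpers

# Hypothetical numbers from one imagined run of the protocol.
result = sandwich_success(0.55, 0.72, 0.60, 0.90)
```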
Interesting. I will think more about the sandwiching approach between non-legal experts and legal experts.