We Ran an Alignment Workshop

TL;DR:

AI Safety at UCLA ran an alignment workshop for community members interested in testing their aptitude for theoretical alignment research. Attendees spent the weekend trying to make progress on a set of open problems in theoretical alignment.

We highly recommend that other university groups hold similar small (under 15 person) object-level workshops that focus on difficult yet tractable problems in AI safety.

Why did we run Define, Design, and Align?

Despite the wide range of theoretical alignment proposals, we felt there were surprisingly few structured opportunities for people to engage with and build on these ideas. Furthermore, progress in theoretical alignment can feel nebulous, especially compared to more prosaic alignment research, which we believe can deter potential researchers before they even begin engaging with the problems.

We aimed for participants to:

  • Actively spend many hours thinking deeply about problems in theoretical alignment

  • Test their aptitude by getting feedback from industry professionals

  • Collaborate with other talented students working on AI safety

What did the weekend look like?

The retreat lasted from Friday evening to Sunday afternoon and had 15 participants from UCLA, UC Berkeley, Georgia Tech, USC, and Boston University. Ten of the participants were undergraduates, and the rest were Ph.D. candidates or early-stage professionals. Everyone had prior experience with research, AI safety, and some form of technical machine learning.

The majority of the weekend was left unstructured. We held an intro presentation to give attendees a sense of what the weekend would look like; after that, each participant was paired with another with a similar research focus, and the pairs were left to work on their question. Groups had time to share their proposals with Thomas Kwa, a professional research contractor at MIRI, and to discuss them with the other attendees. The workshop reconvened on Sunday for closing presentations, where groups shared their progress and got feedback.

I (Aiden Ament) organized the content of the weekend. Brandon Ward ran operations for the event, with help from Govind Pimpale and Tejas Kamtam.

What went well?

All of the participants were deeply engaged with concepts in theoretical alignment, and their presentations (which we encouraged them to publicize) were impressive. They made good use of the unstructured time, working diligently and having productive conversations with the other attendees. All of the participants found the workshop useful and well organized: we had an average net promoter score of 8.75, indicating that our participants would strongly recommend the workshop to their peers.

Not many people have actually worked on these problems. When we talked with Vivek Hebbar, he shared that they had received around 15 answers to their SERI application (which is where the workshop’s questions were sourced). This means that our workshop “very roughly” doubled the number of people who have seriously engaged with these questions in the past year. Future workshops on other open topics in AI safety could therefore dramatically increase overall engagement with the chosen problems.

Furthermore, the quality of each group’s research was quite high overall, considering the brief period they had to engage with the problems. After the workshop ended, teams were encouraged to draft formalized versions of their ideas. These have been passed on to Thomas Kwa (or to other industry professionals better positioned to evaluate the agenda) and are currently pending feedback.

What could be improved?

While the lack of structured activity allowed participants to pursue their thoughts without interruption, feedback indicates the workshop suffered from a lack of formal team-building events. Some participants noted they would have preferred additions like scheduled discussion time with Thomas or optional group sub-activities to break up the unstructured research. We think lectures from experts could be particularly valuable in helping people develop their research agendas, and we plan to include such programming in future workshops.

The research questions we chose to highlight were another area for improvement. We thought these questions were a good starting point for engaging in theoretical research, but many participants felt that the questions’ limited scope kept them from pursuing the topics that most interested them.

As organizers, we find this tricky to solve. We want to provide tractable questions that motivate valuable research without preventing participants from pursuing the topics they find most insightful. To help participants produce higher-quality research, I think we should allow people to pivot from their original topic mid-workshop, as long as their partner and the organizers agree that the new focus is worthwhile. Additionally, we could ask even more open-ended questions that allow for greater flexibility and creativity, so attendees feel less constrained. Ultimately, finding the right questions to motivate a workshop is a balancing act, and we’re still figuring out what works best.
