Hey everyone, my name is Jacques. I’m an independent technical alignment researcher (primarily focused on evaluations, interpretability, and scalable oversight). I’m now focusing more of my attention on building an Alignment Research Assistant, and I’m looking for people who would like to contribute to the project. This project will be private unless I say otherwise.
Side note: I helped build the Alignment Research Dataset ~2 years ago. It has been used at OpenAI (by someone on the alignment team) and, as far as I know, at Anthropic for evals, and it is now used as the backend for Stampy.ai.
If you are interested in potentially helping out (or know someone who might be!), send me a DM with a bit of your background and why you’d like to help out. To keep things focused, I may or may not accept.
I have written up the vision and core features for the project here. I expect to see it evolve in terms of features, but the vision will likely remain the same. I’m currently working on some of the features and have delegated some tasks to others (tasks are in a private GitHub project board).
I’m also collaborating with different groups. For now, the focus is to build core features that can be used individually but will eventually work together into the core product. In 2-3 months, I want to get it to a place where I know whether this is useful for other researchers and if we should apply for additional funding to turn it into a serious project.
As an update to the Alignment Research Assistant I’m building, here is a set of shovel-ready tasks I would like people to contribute to (please DM if you’d like to contribute!):
Core Features
1. Set up the Continue extension for research: https://www.continue.dev/
Design prompts in Continue that are suitable for a variety of alignment research tasks and make it easy to switch between these prompts
Figure out how to scaffold LLMs with Continue (instead of just prompting one LLM with additional context)
Can include agents, search, and more
Test out models to quickly help with paper-writing
2. Data sourcing and management
Integrate with the Alignment Research Dataset (pulling from either the SQL database or Pinecone vector database): https://github.com/StampyAI/alignment-research-dataset
Integrate with other apps (Google Docs, Obsidian, Roam Research, Twitter, LessWrong)
Make it easy to view and edit long prompts for project context
3. Extract answers to questions across multiple papers/posts (feeds into Continue)
Develop high-quality chunking and scaffolding techniques
Implement multi-step interaction between researcher and LLM
4. Design Autoprompts for alignment research
Creates lengthy, high-quality prompts for researchers that get better responses from LLMs
5. Simulated Paper Reviewer
Fine-tune or prompt an LLM to behave like an academic reviewer
Use OpenReview data for training
6. Jargon and Prerequisite Explainer
Design a sidebar feature to extract and explain important jargon
Could maybe integrate with some interface similar to https://delve.a9.io/
7. Set up an automated “suggestion-LLM”
An LLM periodically looks through the project you are working on and tries to suggest *actually useful* things in the side-chat. It will be a delicate balance: sharing too much would cause a loss of focus. This could be customized per researcher, with an option to give automated suggestions only after a research session.
8. Figure out if we can get a usable browser inside of VSCode (tried quickly with the Edge extension but couldn’t sign into the Claude chat website)
Could make use of new features other companies build (like Anthropic’s Artifact feature), but inside of VSCode to prevent context-switching in an actual browser
9. “Alignment Research Codebase” integration (can add as Continue backend)
Create an easily insertable set of repeatable code that researchers can quickly add to their project or LLM context
This includes code for multi-GPU setups, codebase best practices, and more
Should make it easy to populate a new codebase
Proactively gives suggestions to improve the code
Generally makes common code implementation much faster
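To make the "chunking" part of item 3 a bit more concrete, here is a minimal sketch of one common baseline: fixed-size word windows that overlap, so a sentence cut at a chunk boundary still appears whole in a neighbouring chunk. This is not the project's actual approach (which isn't decided here). The function name `chunk_text` and the defaults `max_words=200` and `overlap=40` are hypothetical, picked only for illustration.

```python
from typing import List


def chunk_text(text: str, max_words: int = 200, overlap: int = 40) -> List[str]:
    """Split text into word-based chunks that overlap by `overlap` words.

    Hypothetical helper for illustration; the sizes are assumptions, not tuned values.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must satisfy 0 <= overlap < max_words")

    words = text.split()
    step = max_words - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_text(sample, max_words=4, overlap=1):
        print(chunk)
```

A higher-quality version would probably split on section or paragraph boundaries instead of raw word counts, but a baseline like this is usually enough to start testing retrieval across papers and posts.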
Specialized tooling (outside of VSCode)
Bulk fast content extraction
Create an extension to extract content from multiple tabs or papers
Simplify the process of feeding content to the VSCode backend for future use
Personalized Research Newsletter
Create a tool that extracts relevant information for researchers (papers, posts, other sources)
Generate personalized newsletters based on individual interests (open questions and research they care about)
Sends proactive notifications in VSCode and by email
Discord Bot for Project Proposals
Suggest relevant papers/posts/repos based on project proposals
Integrate with Apart Research Hackathons
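As a rough illustration of the matching step behind the newsletter and the Discord bot above (ranking papers/posts against a researcher's stated interests or a project proposal), here is a small standard-library sketch. It uses bag-of-words cosine similarity purely as a stand-in; a real version would more likely use embeddings (e.g. from the vector database mentioned earlier). The names `rank_items` and `cosine_similarity` and the `top_k` parameter are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def _bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_items(interests: str, items: Dict[str, str], top_k: int = 3) -> List[Tuple[str, float]]:
    """Return up to top_k (title, score) pairs, most similar to `interests` first.

    Hypothetical helper: `items` maps a title to its abstract or summary text.
    """
    profile = _bag_of_words(interests)
    scored = [
        (title, cosine_similarity(profile, _bag_of_words(body)))
        for title, body in items.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    papers = {
        "A": "interpretability with sparse autoencoders",
        "B": "governance policy compute",
    }
    print(rank_items("sparse autoencoders interpretability features", papers))
```

The same scoring function could serve both tools: the newsletter would pass a researcher's open questions as `interests`, while the Discord bot would pass the text of a project proposal.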
We’re doing a hackathon with Apart Research on the 26th. I created a list of problem statements for people to brainstorm off of.
Proactive insight extraction from new research
Reading papers can take a long time and is often not worthwhile. As a result, researchers might read too many papers or almost none. However, there are still valuable nuggets in papers and posts. The issue is finding them. So, how might we design an AI research assistant that proactively looks at new papers (and old) and shares valuable information with researchers in a naturally consumable way? Part of this work involves presenting individual researchers with what they would personally find valuable without overwhelming them with things they are less interested in.
How can we improve the LLM experience for researchers?
Many alignment researchers will use language models much less than they would like to because they don’t know how to prompt the models, it takes time to create a valuable prompt, the model doesn’t have enough context for their project, the model is not up-to-date on the latest techniques, etc. How might we make LLMs more useful for researchers by relieving them of those bottlenecks?
Simple experiments can be done quickly, but turning them into a full project can take a lot of time
One key bottleneck for alignment research is transitioning from an initial 24-hour simple experiment in a notebook to a set of complete experiments tested with different models, datasets, interventions, etc. How can we help researchers move through that second research phase much faster?
How might we use AI agents to automate alignment research?
As AI agents become more capable, we can use them to automate parts of alignment research. The paper “A Multimodal Automated Interpretability Agent” serves as an initial attempt at this. How might we use AI agents to help either speed up alignment research or unlock paths that were previously inaccessible?
How can we nudge researchers toward better objectives (agendas or short experiments) for their research?
Even if we make researchers highly efficient, it means nothing if they are not working on the right things. Choosing the right objectives (projects and next steps) over time can be the difference between 0x, 1x, and +100x. How can we ensure that researchers are working on the most valuable things?
What can be done to accelerate implementation and iteration speed?
Implementation and iteration speed on the most informative experiments matter greatly. How can we nudge researchers to gain the most bits of information in the shortest time? This involves helping them work on the right agendas/projects and helping them break down their projects in ways that let them make progress faster (and avoid ending up tunnel-visioned on the wrong project for months/years).
How can we connect all of the ideas in the field?
How can we integrate the open questions/projects in the field (with their critiques) in a way that helps researchers come up with well-grounded research directions faster? How can we aid them in choosing better directions and adjusting throughout their research? This kind of work may eventually be a precursor to guiding AI agents to help us develop better ideas for alignment research.
I’ve created a private Discord server to discuss this work. If you’d like to contribute to this project (or might want to in the future if you see a feature you’d like to contribute to), or if you are an alignment/governance researcher who would like to be a beta user so we can iterate faster, please DM me for a link!
Have you talked with someone from Ought/Elicit? It seems like they should be able to give you useful feedback.
Yes, I’ve talked to them a few times in the last 2 years!