Funding for AI alignment is scaling rapidly, which means grantmakers will be stretched thin and less able to reliably avoid giving money to people who are not mission-aligned, are not producing useful research, or, worst of all, just want to extract money from the system. This has the potential to cause massive problems down the road, both by producing distracting low-quality research and by setting up expectations that will cause drama if someone is defunded later on while money is still flowing to others.
An attack-resistant EigenKarma[1]-like network for alignment researchers would, if adopted, ease the bottleneck of grantmaker time and improve the quality of vetting by letting all researchers participate in vetting people and assessing the quality of their work. The ideal system would:
Allow grantmakers to view any individual’s score relative to an arbitrary initial trusted group, so they could analyze how promising someone seems from the perspective of any subgroup, with a clean UI (see the first sketch after this list).
Allow people to import their own upvote history from any of AF/LW/EAF to seed their outflowing karma, and to adjust it manually via a clean UI.
Have some basic tools to flag suspicious voting patterns (e.g. two people channeling karma only to each other), likely by graphing networks of votes (a rough detector is sketched after this list).
Maybe have some features to allow grants to be registered on the platform, so grantmakers can see what’s been awarded already?
Maybe have a split between “this person seems competent and maybe we should fund them to learn” vs “this person has produced something of value”?
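To make the core idea concrete, here is a minimal sketch of the kind of trust computation involved, assuming an EigenKarma-like model in which trust propagates like personalized PageRank over the weighted graph of karma given between users. The data layout and function names are my own illustration, not Rob Miles’s actual code.

```python
# A minimal sketch, not an actual implementation: trust as personalized
# PageRank over the graph of karma given between users.
from collections import defaultdict

def trust_scores(votes, seed, damping=0.85, iterations=50):
    """votes: dict mapping (giver, receiver) -> total karma given (>= 0).
    seed: the initial trusted group; all trust originates from (and
    periodically restarts at) these users.
    Returns user -> trust score from the seed group's perspective."""
    # Normalise each user's outgoing karma so no one gains influence
    # simply by voting more.
    out_totals = defaultdict(float)
    for (giver, _), karma in votes.items():
        out_totals[giver] += karma

    users = {u for edge in votes for u in edge} | set(seed)
    score = {u: (1.0 / len(seed) if u in seed else 0.0) for u in users}

    for _ in range(iterations):
        nxt = {u: ((1 - damping) / len(seed) if u in seed else 0.0)
               for u in users}
        for (giver, receiver), karma in votes.items():
            if out_totals[giver] > 0:
                nxt[receiver] += damping * score[giver] * karma / out_totals[giver]
        score = nxt
    return score
```

The key property is that a newcomer’s score depends on trust flowing from the seed group, so creating sockpuppet accounts or trading votes with other untrusted accounts buys very little influence.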
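For the suspicious-pattern flagging, here is one illustrative heuristic (again hypothetical, not existing tooling): flag pairs of users whose outgoing karma flows almost exclusively to each other.

```python
# Hypothetical heuristic: flag pairs of users who each send nearly all of
# their outgoing karma to the other.
from collections import defaultdict

def reciprocal_pairs(votes, threshold=0.9):
    out_totals = defaultdict(float)
    for (giver, _), karma in votes.items():
        out_totals[giver] += karma

    flagged = []
    for (a, b), karma_ab in votes.items():
        if a >= b:  # consider each unordered pair only once
            continue
        karma_ba = votes.get((b, a), 0.0)
        if (karma_ba > 0
                and karma_ab / out_totals[a] >= threshold
                and karma_ba / out_totals[b] >= threshold):
            flagged.append((a, b))
    return flagged
```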
Rob Miles has a basic EigenKarma system running on his Discord; it is currently being used as the basis for a crypto project by some people from Monastic Academy, and could be used to start your project. I have some thoughts on how to improve the code and would be happy to advise.
I’m imagining a world where researchers channel their trust into the people they think are doing the most good work, so that grantmakers can go “oh, conditioning on interpretability-focused researchers as the seed group, this applicant scores highly” or “huh, this person has been working for two years, but no one trusted thinks what they’re doing is useful,” rather than spending their own valuable time assessing the technical details or relying on their much less comprehensive and scalable sense of how the person is perceived.
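Continuing the hypothetical sketch above, the grantmaker-facing query could be as simple as:

```python
# Hypothetical usage: score an applicant from the perspective of two
# different seed groups, given the same vote graph.
interp_seed = {"interp_researcher_a", "interp_researcher_b"}
agent_foundations_seed = {"af_researcher_a", "af_researcher_b"}

by_interp = trust_scores(votes, seed=interp_seed)
by_af = trust_scores(votes, seed=agent_foundations_seed)
print(by_interp.get("applicant_42", 0.0), by_af.get("applicant_42", 0.0))
```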
Some obvious pre-research would be to sketch out how it’d work and ask grantmakers and researchers whether they would use the system; I for one would if it worked well (and might provide some seed funding to the right person).
This is Rob Miles’s description of his code. I hear the EigenTrust++ model is even better, but I have not read the paper yet to verify that it makes more sense here.
Exit strategy: sell to Reddit? (Although to be fair I guess there won’t be much in the way of protected IP)
Additional layer: give researchers a separate “Which infrastructure / support has been most valuable to you?” category of karma, and use it to help direct funding toward the parts of the infrastructure ecosystem that are most valuable for supporting alignment. This should be one-way, with researchers able to send this karma toward infrastructure but not the reverse, since research is the goal.
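A minimal sketch of that one-way constraint, assuming users are tagged by role (all names here are illustrative, not part of any existing system):

```python
# Hypothetical: infrastructure karma is a separate category that may only
# flow researcher -> infrastructure, never the reverse.
def add_infra_vote(infra_votes, roles, giver, receiver, karma):
    """infra_votes: (giver, receiver) -> karma in the infrastructure
    category; roles: user -> 'researcher' or 'infra'."""
    if roles.get(giver) != "researcher" or roles.get(receiver) != "infra":
        raise ValueError("infrastructure karma flows one way: researcher -> infra")
    infra_votes[(giver, receiver)] = infra_votes.get((giver, receiver), 0.0) + karma
```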