I found this compelling, but I feel confused about what I’d be signing up for from a time-investment perspective (hours, days, weeks?).
First, I want to support this suggestion:
If you do give it a try, consider leaving comments here on how much time you invested, how far you got, and how the whole thing was for you.
Responses of this kind will be quite helpful from my perspective, especially if the commenters share a bit about their background as well.
Given that comments of that type do not currently exist, I’d be excited for Holden and/or others in the know to post their predictions for what such comments would say. What are your expectations for people’s time investment and results, especially by background?
I can give a sense of my investment, though I’m obviously an unusual case in multiple ways. I’m a coauthor on the report but I’m not an ARC researcher, and my role as a coauthor was primarily to try to make it more likely that the report would be accessible to a broader audience, which involved making sure my own “dumb questions” were answered in the report.
I kept time logs, and the whole project of coauthoring the report took me ~100 hours. By the end I had one “seed” of an ELK idea but unfortunately didn’t flesh it out because other work/life things were pretty hectic. Getting to this “seed” was <30 min of investment.
I think if I had started with the report in hand, it would have taken me ~10 hours to read it carefully enough and ask enough “dumb questions” to get to the point of having the seed of an idea about as good as that idea, and then another ~10 hours to flesh it out into an algorithm + counterexample. I think the probability I’d have won the $5000 prize after that ~20-hour investment is ~50%, making the expected investment per prize won ~40 hours. I think there’s a non-trivial but not super high chance I’d have won larger prizes, so the expected $/hour ratio is significantly better than $125/hour (since the ceiling for the larger prizes is so much higher).
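Spelling out that expectation arithmetic (a rough sketch using the ~20-hour and ~50% estimates above):

```python
# Rough expected-value arithmetic for the $5000 prize estimate above.
hours_invested = 10 + 10  # ~10h reading carefully + ~10h fleshing out an idea
p_win = 0.5               # estimated chance of winning the $5000 prize
prize = 5000

# Expected hours spent per prize won, and the implied hourly rate.
expected_hours_per_prize = hours_invested / p_win
dollars_per_hour = prize / expected_hours_per_prize

print(expected_hours_per_prize)  # 40.0
print(dollars_per_hour)          # 125.0
```

The larger prizes raise the expected payout without raising the expected hours much, which is why the true expected rate should beat $125/hour.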
My background:
I have a fairly technical background, though I think the right way to categorize me is as “semi-technical” or “technical-literate.” I did computer science in undergrad and enjoyed it / did well, but my day-to-day work mainly involves writing. I can do simple Python scripting. I can sometimes, slowly and painfully, do the kinds of algorithms problem sets I did quickly in undergrad.
Four years ago I wrote this to explain what I understood of Paul’s research agenda at the time.
I’ve been thinking about AI alignment a lot over the last year, and especially have the unfair advantage of getting to talk to Paul a lot. With that said, I didn’t really know much or think much about ELK specifically (which I consider pretty self-contained) until I started writing the report, which was late Nov / early Dec.
Hey Josh, I think this is a good point—it would be great to have some common knowledge of what sort of commitment this is.
Here’s where I am so far:
1. I read through the full report reasonably carefully (but only some of the appendices).
2. I spent some time thinking about potential counterexamples. It’s hard to say how much; this mostly wasn’t time I carved out, more something I was thinking about while taking a walk or something.
3. At times I would reread specific parts of the writeup that seemed important for thinking about whether a particular idea was viable. I wrote up one batch of rough ideas for ARC and got feedback on it.
I would guess that I spent several hours on #1, several hours on #2, and maybe another 2-3 hours on #3. So maybe something like 10-15 hours so far?
At this point I don’t think I’m clearly on track to come up with anything that qualifies for a prize, but I think I understand the problem pretty well and why it’s hard for me to think of solutions. If I fail to submit a successful entry, I think it will feel more like “I saw what was hard about this and wasn’t able to overcome it” than like “I tried a bunch of random stuff, lacking understanding of the challenge, and none of it worked out.” This is the main benefit that I wanted.
My background might unfortunately be hard to make much sense of, in terms of how it compares to someone else’s. I have next to no formal technical education, but I have spent tons of time talking about AI timelines and AI safety, including with Paul (the head of ARC), and that has included asking questions and reading things about the aspects of machine learning I felt were important for these conversations. (I never wrote my own code or read through a textbook, though I did read Michael Nielsen’s guide to neural networks a while ago.) My subjective feeling was that the ELK writeup didn’t have a lot of prerequisites—mostly just a very basic understanding of what deep learning is about, and a vague understanding of what a Bayes net is. But I can’t be confident in that. (In particular, Bayes nets are generally only used to make examples concrete, and I was generally pretty fine to just go with my rough impression of what was going on; I sometimes found the more detailed appendices, with pseudocode and a Conway’s Game of Life analogy, clearer than the Bayes net diagrams anyway.)
Congrats on submitting proposals that would have won you a $15,000 prize if you had been eligible! How long did it take you to come up with these proposals?
Thanks! I’d estimate another 10-15 hours on top of the above, so 20-30 hours total. A good amount of this felt like leisure time and could be done while not in front of a computer, which was nice. I didn’t end up with “solutions” I’d be actually excited about for substantive progress on alignment, but I think I accomplished my goal of understanding the ELK writeup well enough to nitpick it.
Note: I work for ARC.

I would consider someone a “pretty good fit” (whatever that means) for alignment research if they started out with a relatively technical background, e.g. an undergrad degree in math/CS, but without really having engaged with alignment before, and they were able to come up with a decent proposal after:
~10 hours of engaging with the ELK doc.
~10 hours of thinking about the document and resolving confusions they had, which might involve asking some questions to clarify the rules and the setup.
~10 hours of trying to come up with a proposal.
If someone starts from having thought about alignment a bunch, I would consider them a potentially “pretty good researcher” if they were able to come up with a decent proposal in 2-8 hours. I expect many existing (alignment) researchers to be able to come up with proposals in <1 hour.
Note that I’m saying “if (can come up with proposal in N hours), then (might be good alignment researcher)” and not saying the other implication also holds, e.g. it is not the case that “if (might be good alignment researcher), then (can come up with proposal in N hours).”
After becoming very familiar with the ELK report (~5 hours?), it took me one hour to generate a proposal and associated counterexample (the “Predict hypothetical sensors” proposal here), though it wasn’t very clean / fleshed out and Paul clarified it a bunch more (after that hour). I haven’t checked whether it would have defeated all the counterexamples that existed at the time.
I have a lot of background in CS, ML, AI alignment, etc but it did not feel to me that I was leveraging that all that much during the one hour of producing a proposal (though I definitely leveraged it a bunch to understand the ELK report in the first place, as well as to produce the counterexample to the proposal).
During ~2 hours of reading and skimming the relevant blog posts, I was able to come up with 2 strategies (and no counterexamples so far). They seem quite intuitive and easy to come up with, so I’m wondering what I got wrong about the ELK problem or the contest...
Given my low confidence in my understanding, I don’t feel comfortable submitting these strategies, as I don’t want to waste the ARC team’s time.
My background: ML engineer (~7 years of experience), some previous exposure to AGI and AIS research and computer security.
I am pretty confident that ARC would want you to submit those strategies, especially given your background. Even if both trivially fail, it seems useful for them to know that they did not seem obviously ruled out to you by the provided material.
ARC would be excited for you to send a short email to elk@alignmentresearchcenter.org with a few bullet points describing your high-level ideas, if you want to get a sense of whether you’re on the right track / whether fleshing them out would be likely to win a prize.

Can confirm we would be interested in hearing what you came up with.
I’m a software engineer with about 15 years of experience, but my jobs have had nothing to do with ML apart from occasional stats work. I remember a little about neural networks and Bayes nets from school 20 years ago, enough to follow part 1 of the ELK paper easily (up until Ontology Identification) and much (but not all) of the remainder with difficulty. I occasionally read terrifying essays about AI and would like to help.
I’ve spent maybe 8 hours with the ELK paper; it was encouraging how much I understood with my current background. I plan to take a day off next week to work on it, expecting to total around 20 hours. I have an idea of where I want to look first, but am dubious that I’ll be able to pin down a new idea in that time as I’m not typically very creative. Still, I might get lucky, and updating/refreshing my ML knowledge a bit feels worthwhile in itself.