I might write a longer summary at some point, but some brief thoughts on how this week went:
Overall, I think the first ~3 days were a good use of my time, but I'm not sure about the last one or two. From the perspective of "understanding what it's like to be a junior safety researcher" I feel like I got most of the learning within three days.
I managed to come up with a solution which handled the breakers of the regularizers listed in the original document, though it was subject to a very analogous breaker. I don't feel like humanity is noticeably closer to solving the alignment problem by virtue of my solution, but I think I would've estimated ~1/3 chance that I would make even this little amount of progress before the week started. (Mostly calibrating off Ajeya saying that it would have taken her a full week on expectation, and assuming I'm substantially less qualified than her.) So overall I feel relatively happy with my work.
I feel more optimistic about humanity's ability to solve the alignment problem now. Partially this is a reflection of me having recently been reading Eliezer's debates, where he presents a very pessimistic view of our chance of success.
This contest seems like a really great opportunity for people to get involved in alignment research, and I'm very grateful to the ARC team for running it.
There are not very many people who could run a contest like this, and I assume that in a couple of weeks the ARC team is (justifiably) going to go back to doing their research. I feel sad that there aren't more people who could run a contest like this, and I'm not sure how to create more of them. If others have thoughts on how I/CEA could do this, I would be very interested in hearing them!