This is a really great write-up, thanks for doing this so conscientiously and thoroughly. It’s good to hear that Surge is mostly meeting researchers’ needs.
Re whether higher-quality human data is just patching current alignment problems—the way I think about it is more like: there’s a minimum level of quality you need to set up various enhanced human feedback schemes. You need people to actually read and follow the instructions, and if they don’t do this reliably you really won’t be able to set up something like amplification or other schemes that need your humans to interact with models in non-trivial ways. It seems good to get human data quality to the point where it’s easy for alignment researchers to implement different schemes that involve complex interactions (like the humans using an adversarial example finder tool or looking at the output of an interpretability tool). This is different from the case where we e.g. have an alignment problem because MTurkers mark common misconceptions as truthful, whereas more educated workers correctly mark them as false, which I don’t think of as a scalable sort of improvement.
This is great! Agree that this looked like an extremely promising idea based on what was publicly knowable in spring, and that it’s probably not the right move now.
+1! I updated on this a lot over the past few months from working with Surge, and it’s really great to see this reflected so quickly in others’ thinking here.
Thanks for this post! Future Fund has removed this project from our projects page in response.
Thanks for this amazing, really comprehensive writeup! I’m from Surge—we are actually very intrinsically Alignment-motivated, and one of our main goals is to help researchers advance the field. So this is great feedback for us. I’d love to grab time with you all to chat more. And if anyone else would like to chat too, feel free to reach out to me at edwin[at]surgehq.ai :)
As a side note, we do try to take as much off people’s plates as possible (whether that’s creating instructions, building interfaces, running quality controls, etc.). We probably need to figure out how to explain that better.
Thanks for investigating this and producing such an extremely thorough write-up, very useful!
Thanks for sharing! I think being able to stop pursuing a project when it no longer seems to have high expected value, and sharing the learnings, is really valuable!