Thanks for the writeup. I like how it’s honest and covers all aspects of your experience. I think a key takeaway is that there is no obvious fixed plan or recipe for working on AI safety and instead, you just have to try things and learn as you go along. Without these kinds of accounts, I think there’s a risk of survivorship bias and positive selection effects where you see a nice paper or post published and you don’t get to see experiments that have failed and other stuff that has gone wrong.
This is exactly right, and the main reason I wrote this up in the first place. I wanted this to serve as a data point for people to be able to say “Okay, things have gone a little off the rails, but things aren’t yet worse than they were for Jay, so we’re still probably okay.” Note that it is good to have a plan for when you should give up on the field, too—it should just have some resilience and allowance for failure baked in. My plan was loosely “If I can’t get a job in the field, and I fail to get funded twice, I will leave the field.”
Also contributing to positive selection effects is that you’re more likely to see the more impressive results in the field, precisely because they’re more impressive. That gives your brain a skewed idea of what the median person in the field is doing: it conflates “the average piece of alignment research we see” with “the average output of alignment researchers.”
The counterargument to this is “Well, shouldn’t we be aiming for better than median? Shouldn’t these impressive pieces be our targets to reach?” I think so, yes, but I also believe in incremental ambition—if one is below average in the field, aiming to reach the median first, then good, then top-tier seems to me more reasonable than trying to be top-tier immediately.