Great post!
This framing doesn’t seem to capture the concern that even slight misspecification (e.g. a reward function that is a bit off) could lead to x-catastrophe.
I think this is a big part of many people’s concerns, including mine.
This seems somewhat orthogonal to the Saint/Sycophant/Schemer disjunction… or to put it another way, it seems like a Saint that is just not quite right about what your interests actually are (e.g. because they have alien biology and culture) could still be an x-risk.
Thoughts?
I think you guys are already aware of RadicalXChange. It’s a bit different in focus, but I know they are excited about trying out mechanisms like quadratic voting and quadratic funding (QV/QF) in institutional settings.
People are motivated both by:
- competition and status, and
- cooperation and identifying with the successes of a group.
I think we should aim to harness both of these forms of motivation.
Overall, I’m intrigued and like this general line of thought. A few thoughts on the post:
If you’re using earn.com, it’s not really email anymore, right? So maybe it’s better to think of this as being about “online messaging”.
Another (complementary) way to improve email is to make it like Facebook, where you have to agree to connect with someone before they can message you.
Like many ideas about using $$ as a signal, I think it might be better if we instead used a domain-specific credit system, where credits are allotted to individuals at some fixed rate, or according to some rules, and cannot be purchased. People can find ways of subverting that, but they can also subvert the paid-email idea (just open all their emails and take the $$ without reading or responding meaningfully).
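To make the idea concrete, here’s a minimal sketch of such a credit system (the daily allotment rate, the `Account`/`send_message` names, and the refill rule are all made-up assumptions for illustration, not a worked-out design):

```python
from dataclasses import dataclass, field
import time

ALLOTMENT_PER_DAY = 10  # credits granted to each user per day (assumed rate)

@dataclass
class Account:
    balance: float = 0.0
    last_refill: float = field(default_factory=time.time)

    def refill(self) -> None:
        # Credits accrue at a fixed rate; they cannot be purchased or transferred in.
        days_elapsed = (time.time() - self.last_refill) / 86_400
        self.balance += days_elapsed * ALLOTMENT_PER_DAY
        self.last_refill = time.time()

def send_message(sender: Account, cost: float) -> bool:
    """Spend credits to message someone; each recipient could set their own cost."""
    sender.refill()
    if sender.balance < cost:
        return False
    sender.balance -= cost
    return True
```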
Thanks for this update, and for your valuable work.
I must admit I was frustrated by reading this post. I want this work to continue, and I don’t find the levels of engagement you report surprising or worth massively updating on (i.e. suspending outreach).
I’m also bothered by the top-level comments assuming that this didn’t work and should’ve been abandoned. What you’ve shown is that you could not find strong evidence of the kind you hoped for regarding the program’s effectiveness, NOT that it didn’t work!
Basically, I think there should be a strong prior that this type of work is effective, and I think the question should be how to do a good job of it. So I want these results to be taken as a baseline, and for your org to continue iterating and trying to improve your outreach, rather than giving up on it. And I want funders to see your vision and stick with you as you iterate.
I’m frustrated by the focus on short-term, measurable results here. I don’t expect you to be able to measure the effects well.
Overall, I feel like the results you’ve presented here inspire a lot of ideas and questions, and I think continued work to build a better model of how outreach to high schoolers works seems very valuable. I think this should be approached with more of a scientific/tinkering/start-up mindset of “we have this idea that we believe in and we’re going to try our damnedest to make it work before giving up!” Part of “making it work” here includes figuring out how to gauge the impact. How do teachers normally tell if they’re having an impact? Probably they mostly trust their gut. So is there a way to ask them? (The obvious risk is that they’ll tell you a white lie.)

Maybe you think continuing this work is not your comparative advantage, or that you’re not the right org to do it, which seems fine. But in that case I’d rather you try to hire a new “CEO”/team for SHIC (if possible) than suspend the outreach and throw away existing institutional knowledge.
-------------------------
RE evaluating effectiveness:
I’d be very curious to know more about the few students who did engage outside of class. In my mind, the evidence for effectiveness hinges to a significant extent on the quality and motivation of the students who continue engaging.
I think there are other ways you could gauge effectiveness, mostly by recruiting teachers into this process. They were more eager for your material than you expected (which I think makes sense, since it’s less work for them!), so you can ask for things in return: follow-up surveys, assignments, quiz questions, or any form of evaluation from them of how well the content stuck and whether they think it had any impact.
A few more specific questions:
- RE footnote 3: why not use “EA” in the program? This seems mildly dishonest and liable to reduce expected impact.
- RE footnote 7: why did they feel inappropriate?
What you describe is part of what I meant by “jadedness”.
“If they were actually trying to change the world—if they were actually strongly motivated to make the world a better place, etc.—the stuff they learn in college wouldn’t stop them.”
^ I disagree. Or rather, I should say: there are a lot of people who are not so strongly motivated to make the world a better place, and so get burned out and settle into a typical lifestyle. I think this outcome would be much less likely at a place like “Change the World University”, both because it would feel worse to give up on that goal (you would constantly be reminded of it), and because your peers would be (self-)selected for being passionate about changing the world.
Out of 55 two-sample t-tests, we would expect about 3 (55 × 0.05 = 2.75) to come out “statistically significant” at the 5% level due to chance alone, but I found 10, so we can expect most of these to point to real differences in the survey data.
Is there a more rigorous form of this argument?
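One more rigorous version (a sketch, under the simplifying assumptions that the 55 tests are independent and each uses α = 0.05): under the global null hypothesis, the number of “significant” results is Binomial(55, 0.05), so you can compute directly how surprising 10 is.

```python
from scipy.stats import binom

n_tests, alpha, n_sig = 55, 0.05, 10

# Expected false positives if every null hypothesis is true:
print(n_tests * alpha)  # 2.75

# P(at least 10 of 55 tests come out significant by chance alone):
print(binom.sf(n_sig - 1, n_tests, alpha))  # ~5e-4
```

A more standard tool for deciding *which* of the 10 survive correction would be a false-discovery-rate procedure such as Benjamini–Hochberg, which bounds the expected fraction of false positives among the reported discoveries.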
Reminds me of The House of Saud (although I’m not saying they have this goal, or any shared goal):
“The family in total is estimated to comprise some 15,000 members; however, the majority of power, influence and wealth is possessed by a group of about 2,000 of them. Some estimates of the royal family’s wealth measure their net worth at $1.4 trillion”
https://en.wikipedia.org/wiki/House_of_Saud
Reiterating my other comments: I don’t think it’s appropriate to say that the evidence showed it made sense to give up. As others have mentioned, there are measurement issues here. So this is a case where absence of evidence is not strong evidence of absence.
I recommend changing the “climate change” header to something a bit broader (e.g. “environmentalism” or “protecting the natural environment”, etc.). It is a shame that (it seems) climate change has come to eclipse/subsume all other environmental concerns in the public imagination. While most environmental issues are exacerbated by climate change, solving climate change will not necessarily solve them.
A specific cause worth mentioning is preventing the collapse of key ecosystems, e.g. coral reefs: https://forum.effectivealtruism.org/posts/YEkyuTvachFyE2mqh/trying-to-help-coral-reefs-survive-climate-change-seems
Thanks for that!
I’m interested to hear if you have other examples.
This one looks similar, but not that similar. The whole framing/vision is different.
When I visit their webpage, the message I get is: “hey, do you maybe want to opt in to this thing to tell us about yourself because you can’t get any real publicity?”
The message I want to send is: “Politicians are job candidates; why don’t we make them apply/grovel for a job like everyone else?”
Thanks for writing this. My TL;DR is:
- AI policy is important, but we don’t really know where to begin at the object level.
- You can potentially do one of three things at the moment: A. “disentanglement” research; B. operational support for (e.g.) FHI; C. getting into a position to influence policy, and waiting for policy objectives to become clearer.
- Get in touch / apply to FHI!
I think this is broadly correct, but have a lot of questions and quibbles.
I found “disentanglement” unclear. [14] gave the clearest idea of what this might look like. A simple toy example would help a lot.
Can you give some idea of what an operations role looks like? I find it difficult to visualize, and I think the uncertainty makes such roles less appealing.
Do you have any thoughts on why operations roles aren’t being filled?
One more policy area that seems worth starting on: programs that build international connections between researchers, especially around policy-relevant issues of AI (i.e. ethics/safety).
The timelines for effective interventions in some policy areas may be short (e.g. 1-5 years), and it may not be possible to wait for disentanglement to be “finished”.
Is it reasonable to expect the “disentanglement bottleneck” to be cleared at all? Would disentanglement actually make policy goals clear enough? Trying to anticipate all the potential pitfalls of policies is a bit like trying to anticipate all the potential pitfalls of a particular AI design or reward specification… fortunately, there is a bit of a disanalogy in that we are more likely to have a chance to correct mistakes with policy (although that still could be very hard/impossible). It seems plausible that “start iterating and create feedback loops” is a better alternative to the “wait until things are clearer” strategy.
I’m also very interested in hearing you elaborate a bit.
I guess you are arguing that AIS is a social rather than a technical problem. Personally, I think there are aspects of both, but that the social/coordination side is much more significant.
RE: “MIRI has focused in on an extremely specific kind of AI”: I disagree. I think MIRI has aimed to study AGI in as much generality as possible, and has mostly succeeded in that (although I’m less optimistic than they are that results which apply to idealized agents will carry over and produce meaningful insights for real-world, resource-limited agents). But I’m also curious what you think MIRI’s research is focusing on vs. ignoring.
I also would not equate technical AIS with MIRI’s research.
Is it necessary to be convinced? I think the argument for AIS as a priority is strong so long as the concerns have some validity to them, and cannot be dismissed out of hand.
(cross-posted on Facebook):
I was thinking of applying… it’s a question I’m quite interested in. The deadline is the same as ICML’s, though!
I had an idea I’ll mention here: funding pools.
- You and your friends, whose values and judgement you trust and who all have small-scale funding requests, join together.
- A potential donor evaluates one funding opportunity at random, and funds all or none of them on the basis of that evaluation.
- You have now increased the ratio of funding to evaluation effort available to a potential donor by a factor of the number of projects in the pool.
There is an incentive for you NOT to include people in your pool if you think their proposal is quite inferior to yours… however, you might be incentivized to include somewhat inferior proposals in order to reach a threshold where the combined funding opportunity is large enough to attract more potential donors.
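Here’s a toy simulation of that tradeoff (purely illustrative: the pass-probabilities, request sizes, and the donor’s all-or-nothing rule are made-up assumptions):

```python
import random

def funding_per_evaluation(pass_probs, request=10_000, trials=100_000):
    """Expected funding unlocked per donor evaluation under the pool mechanism:
    the donor samples one project, evaluates it, and funds the whole pool
    iff the sampled project passes."""
    pool_size = len(pass_probs)
    total = 0.0
    for _ in range(trials):
        sampled = random.choice(pass_probs)
        if random.random() < sampled:       # sampled project passes evaluation
            total += request * pool_size    # the entire pool gets funded
    return total / trials

# A strong solo proposal vs. pooling it with two weaker ones:
print(funding_per_evaluation([0.9]))            # ~9,000 per evaluation
print(funding_per_evaluation([0.9, 0.6, 0.6]))  # ~21,000, but only ~0.7 pass rate
```

So pooling multiplies the funding unlocked per evaluation, but diluting the pool with weaker proposals lowers the chance the donor funds anything at all, which is exactly the incentive tension above.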
IIRC the Ethereum Foundation is using QF somehow.
But it’s probably best just to get in touch with someone who knows more of what’s going on at RXC.
Not sure who that would be OTTMH, unfortunately.
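(For anyone unfamiliar with QF, here’s a sketch of the standard quadratic funding matching rule from Buterin, Hitzig & Weyl: a project’s total funding is the square of the sum of the square roots of the individual contributions, with a matching pool topping up the difference. The numbers below are just illustrative.)

```python
from math import sqrt

def qf_match(contributions):
    """Quadratic funding: total = (sum of sqrt(c_i))^2;
    the match is the total minus the sum of direct contributions."""
    total = sum(sqrt(c) for c in contributions) ** 2
    return total - sum(contributions)

# Many small donors attract a much larger match than one donor giving the same sum:
print(qf_match([1] * 100))  # (100 * sqrt(1))^2 - 100 = 9900
print(qf_match([100]))      # (sqrt(100))^2 - 100 = 0
```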
To some extent, you don’t need to. I don’t believe there’s a very clear distinction between the two camps.
To begin with, this university would be viewed as weird, and I suspect, would not be particularly attractive to virtue signalers as a result. This would help establish a culture of genuine idealists.
This is part of the mandate of the admissions decision-makers. I expect if you had good people, you could do a pretty good job of screening applicants.
I was overall a bit negative on Sarah’s post, because it demanded a bit too much attention (e.g. the title), and seemed somewhat polemical. It was definitely interesting, and I learned some things.
I find the most evocative bit to be the idea that EA treats outsiders as “marks”.
This strikes me as somewhat true, and sadly short-sighted WRT movement building. I do believe in the ideas of EA, and I think they are compelling enough that they can become mainstream.

Overall, though, I think it’s just plain wrong to argue for an unexamined idea of honesty as some unquestionable ideal. Doing so as a consequentialist, without a very strong justification, itself smacks of disingenuousness and seems motivated by the same phony and manipulative attitude towards PR that Sarah’s article attacks.
What would be more interesting to me would be a thoughtful survey of potential EA perspectives on honesty, but an honest treatment of the subject does seem to be risky from a PR standpoint. And it’s not clear that it would bring enough benefit to justify the cost. We probably will all just end up agreeing with common moral intuitions.
EDIT: I forgot to link to the Google group: https://groups.google.com/forum/#!forum/david-kruegers-80k-people
Hi! David Krueger (from Montreal and 80k) here. The advice others have given so far is pretty good.
My #1 piece of advice is: start doing research ASAP!
Start acting like a grad student while you are still an undergrad; this is almost a requirement for getting into a top program afterwards. Find a supervisor, and ideally try to publish a paper at a good venue before you graduate.

Stats is probably a bit more relevant than CS, but some of both is good. I definitely recommend learning (some) programming. In particular, focus on machine learning (esp. deep learning and reinforcement learning). Do projects, build a portfolio, and solicit feedback.
If you haven’t already, please check out these groups I created for people wanting to get into AI Safety. There are a lot of resources to get you started in the Google Group, and I will be adding more in the near future. You can also contact me directly (see https://mila.umontreal.ca/en/person/david-scott-krueger/ for contact info) and we can chat.
In general, I think people are being too conservative about addressing the issue. I think we need some “radicals” who aren’t as worried about losing some credibility. Whether you try to have mainstream appeal or just be straightforward with people about the issue is a strategic question that should be considered case-by-case.
Of course, it is a big problem that talking about AIS makes a good chunk of people think you’re nuts. But it’s been my impression that most of those people are researchers; the general public seems quite receptive to the idea (although maybe for the wrong reasons...).
I have a recommendation: try to get at least 3 people, so you aren’t managing your manager. I think accountability and social dynamics would be better that way, since:
- I suspect part of why line managers work for most people is that they hold some position of authority that makes you feel obligated to satisfy them. If you are in equal positions, you’d mostly lose that effect.
- If there are only 2 of you, it’s easier to fall into a cycle of defection where accountability and standards slip. If you see the other person slacking, you feel more OK with slacking. Whereas if you don’t see the work of your manager, you can imagine that they are always on top of their shit.