I thought The Alignment Problem was pretty good at giving a high-level history. Despite the name, only a fairly small portion is actually about the alignment problem; much of the book is about ML history.
“AI” is an indexical
Thanks for writing. I will say this phenomenon isn’t specific to EA. I used to organize a largeish non-EA event, and huge numbers of people would fail to show up while others would try to register the day before. After enough iterations, we knew roughly what fraction of people did each, and we just planned for that. I wonder if you could do something similar for these events? If it’s extremely variable based on year and location, though, that would be harder.
Also, I realized I’ve been assuming that for virtual conferences there is essentially zero downside to being a no-show. But maybe this isn’t true? Do people have to pay per Swapcard profile, or is there some other variable cost I’m not aware of? If anyone knows, this seems relevant. For an in-person conference it is a lot more obvious that money can still get spent on you if you don’t show up.
Not hugely surprising, given that the people at CEA have certainly thought about this more than random forum users. Still, it’s good to do due diligence. Thank you Eli for responding; this is a model example of somebody graciously explaining non-obvious considerations behind a decision.
This is one reason I haven’t wanted to live in Berkeley. Whenever I visited (which at one point was very frequently), it was pretty exhausting to have so many people running around trying to figure out which things were “cool” and chase them (far from everyone in Berkeley was like this, but too many seemed to be).
Outside Berkeley I notice it much less. I still interact frequently online with a lot of people who live in Berkeley, and for me this is much more pleasant. It’s a shame, because there are lots of benefits of talking to people in person. But for me they don’t seem worth the costs.
[This comment isn’t meant to signal any opinion about the rest of your post.]
Carlsmith’s report in particular is highly interdisciplinary, drawing on technical AI, economics, and philosophy, but it doesn’t make many technical AI or economics claims. It’s not really clear who would be most qualified to write it, but in general a philosopher doesn’t seem like such a bad choice. In fact, I’d expect the average philosopher with strong quantitative skills to be better at this than the average economist, and certainly better than the average AI researcher.
Whether a more experienced philosopher should have done it is another question, but I’d imagine that even with money Open Phil cannot summon very experienced experts to write reports for them at the drop of a hat.
Do I view every problem in my life and my community as analogous or bearing surprising similarity to the alignment problem
This made me laugh.
But as I said at the top of the post, I actually do think the alignment problem bears surprising similarities to other things, mainly because general ideas about complex systems pertain to both.
Welcome to the forum!
I’ve done research in reinforcement learning, and I can say this sort of behavior is very common and expected. On one project I incorrectly programmed the reward function, which led the agent to kill itself rather than explore the environment, so that it could avoid the even greater negative reward it would accumulate by sticking around. I didn’t consider this very notable, because once I thought about the reward function, it was obvious this would happen.
Here is a spreadsheet with a really long list of examples of this kind of specification gaming. I suspect the reason your grant was rejected is that, if the agent did behave as you suspect it would, this wouldn’t provide much original insight beyond what people have already found. I do think many of the existing examples are in “toy” environments, and it might be interesting to observe more behavior like this in more complex environments.
It might still be useful for your own learning to implement this yourself!
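In case it’s helpful, here’s a minimal sketch of the kind of reward misspecification I described above. The environment, the numbers, and the names are all made up purely for illustration; it just shows why an agent can prefer ending the episode immediately when the per-step penalty is set too high.

```python
# Toy illustration (made up, not from any real project): with a large
# per-step "living" penalty, a reward-maximizing agent prefers to end
# the episode immediately rather than walk to the goal.

STEP_PENALTY = -10   # overly harsh cost for each step taken
LAVA_PENALTY = -5    # one-time penalty for stepping into a terminal "lava" cell
GOAL_REWARD = 20     # reward for reaching the goal
STEPS_TO_GOAL = 6    # shortest path length to the goal

return_reach_goal = STEPS_TO_GOAL * STEP_PENALTY + GOAL_REWARD  # -40
return_die_now = STEP_PENALTY + LAVA_PENALTY                    # -15

# Since -15 > -40, "killing itself" is the optimal policy under this
# mis-specified reward function: exactly the behavior described above.
print(return_reach_goal, return_die_now)
```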
The article now says the FAQ was “put together by the effective altruism forum.” This is not true either, nor does it make sense. If they would like to copy and paste:
“A user on the effective altruism forum posted an FAQ.”
This makes it sound a lot less official and less like something you’d want to quote in a publication. But that’s because it is in fact not official at all.
I interpreted this post as the author saying that they thought general AI capabilities would be barely advanced by this kind of thing, if they were advanced by it at all. The author doesn’t seem to suggest building an AGI startup, but rather some kind of AI application startup.
I’m curious if you think your reasoning applies to anything with a small chance of accelerating timelines by a small amount, or if you instead disagree with the object-level claim that such a startup would only have a small chance of accelerating timelines by a small amount.
The Centre for Effective Altruism has had to deal with a lot of questions about Bankman-Fried since FTX’s collapse. Here’s an FAQ it put together.
This is not true. The FAQ was not put together by CEA, and so far as I know Hamish has no affiliation with CEA (nor did he ever claim to).
This is an interesting post, and I’m glad you wrote it.
People are driven by incentives, which can be created with cash in a variety of ways
I agree with these ways, but I think it’s quite hard to manage incentives properly. You mention DARPA, but DARPA is a major bureaucracy made up of people aligned with their own incentive structures, and it is ultimately backed by the most powerful organization in the world (the US government). Nothing like that exists in AI safety, not even close. Money would certainly help with this, but it can’t just be straightforwardly turned into good research.
Where is that money going to come from? How are they going to fine-tune the latest models when OpenAI is spending billions?
It’s unclear to me that having EA people start an AI startup is more tractable than convincing other people that the work is worth funding. It’s certainly faster to convince people who already have money now than to create people who might make money later. I don’t have a strong opinion here, but this claim doesn’t seem wholly justified.
It’s frustratingly difficult to predict what will actually be useful for AI Safety...But money is flexible. It’s hard to imagine a world where another billion doesn’t come in handy.
I don’t see how the flexibility of money makes a difference here. Isn’t it frustratingly difficult to predict which uses of money will actually be useful for AI safety? If so, you still have the same problem.
The blog post says ChatGPT is trained with proximal policy optimization. This documentation says text-davinci-003 was trained with PPO, but not text-davinci-002.
However, what you’re saying about the request payloads is interesting, because it seems to contradict this. So I’m not quite sure anymore. It’s possible that ChatGPT was trained with PPO on top of the non-PPO text-davinci-002.
text-davinci-003 (which is effectively ChatGPT) was a bit better than text-davinci-002, both anecdotally and when I benchmarked it on TriviaQA. It was only released about a week before ChatGPT, so it’s not necessarily unreasonable to lump them together. If you do, then the interface isn’t the only change one might associate with ChatGPT.
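For anyone curious what such a comparison might look like, here’s a rough sketch using the (now-legacy, pre-1.0) OpenAI completions API; the sample questions, the exact-match scoring, and all names here are illustrative assumptions rather than the actual benchmark script, and both models have since been deprecated.

```python
# Rough sketch (illustrative, not the original benchmark): compare two models
# on a few TriviaQA-style questions with exact-match scoring, using the
# legacy OpenAI completions API (openai-python < 1.0).
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

QUESTIONS = [  # illustrative stand-ins for TriviaQA items
    ("Which planet is known as the Red Planet?", "Mars"),
    ("Who wrote the play Hamlet?", "William Shakespeare"),
]

def accuracy(model_name):
    correct = 0
    for question, answer in QUESTIONS:
        resp = openai.Completion.create(
            model=model_name,
            prompt=f"Q: {question}\nA:",
            max_tokens=16,
            temperature=0,
        )
        prediction = resp["choices"][0]["text"].strip()
        correct += answer.lower() in prediction.lower()
    return correct / len(QUESTIONS)

for model in ["text-davinci-002", "text-davinci-003"]:
    print(model, accuracy(model))
```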
This is an important consideration, thanks for bringing it up! I pretty much agree with all of it.
Thank you for writing this. I’m sure it was very difficult to do, and so I really appreciate it.
Effective altruism has an emotions problem.
I strongly agree with this. Have you seen Michel’s post on emotional altruism? It doesn’t get to your points specifically, but it similarly argues for the need for more open emotion in the movement.
I also want to add something that, in my experience, cannot really be ignored when talking about the expression of emotion in EA in particular. EA has a lot of people on the autism spectrum who may relate to emotion differently, particularly in the way they speak about it. There are others who aren’t on the spectrum but similarly have a natural inclination to be less, or differently, publicly emotional. EA/rationality can feel rather welcoming (like “my people”) to people like this (which is good: welcoming people whose brains work differently is good), and this may produce a feedback loop. This is not at all to deny your recommendations in this particular area. Rather, it is to acknowledge that some proportion (far from all) of what you call the “emotions problem” is probably just people being themselves in a way we should find acceptable, which leaves me a bit more confused about how best to address it.
I sympathize with this. My AI safety career started when I turned down a return offer from a tech company for the chance of maybe later getting an AI safety research internship. The tech company paid more than triple (since the AI safety internship was in academia). I ended up getting the AI safety internship, but I almost didn’t!
Not everyone wants to take that risk. And not everyone can afford to, either.
It’s not obvious to me that January is correct. For instance, at Yale, more than two-thirds of CS majors received their full-time job offers in December or earlier (most of these are likely in the tech industry). You often don’t have very long to reply to these offers, so I wouldn’t be surprised if many had to accept prior to January.
I agree that April is later than nearly anywhere else, but I’m also not convinced that January would be better than November. I basically agree with this post, but I think people should do their own research on what they’re competing with rather than just settling on January on the basis of this post.
I’m not sure why you think 10k people signing it matters (as opposed to which people), or why you assume the factor is 10^(-6). I read this post and I’m not sure the estimate is anywhere close. By that I mean that it could be tens of orders of magnitude off, or even that the sign of the estimate could be wrong and it would be bad for this to get many signatures.
I think it probably would have been more useful to people if you’d discussed more generally why you think this letter might have a good effect rather than trying to quantify existential risk reductions from individual EA forum users’ signatures (which I take to be extremely intractable).
I originally helped design the course, and I ran the first iteration of a similar program. I’m not really involved with the course now, but I think I’m qualified to answer. However, I did AGI Safety Fundamentals a long time ago and haven’t done MLAB, so my knowledge of those could be wrong (though I don’t think so).
In comparison to AGI Safety Fundamentals, this course is a lot more technical and less conceptual. AGISF is not going to include the latest in machine learning on a technical level, and this course doesn’t include as many conceptual readings.
In comparison with MLAB, this course is more focused on reading papers and understanding research, and less focused on teaching particular frameworks or engineering skills.
There’s a bit of overlap between all three, but it’s pretty minimal. I think anyone who has done any of these programs would learn something from doing the others. It mostly depends on what people want to get out of the course: knowledge of a lot of different conceptual research directions (AGISF), engineering skills with PyTorch (MLAB), or knowledge of the frontier of ML safety research and paper-reading skills (Intro to ML Safety).