AI strategy & governance. ailabwatch.org.
Zach Stein-Perlman
Ideally, powerful AI will enable something like reflection rather than locking in prosaic human values or our ignorant conceptions of the good.
The field of alignment is really about alignability, not making sure “the right people control it.” That’s a different problem.
My favorite AI governance research since this post (I put less thought into this list):
Responsible Scaling Policies (METR 2023)
Deployment corrections (IAPS: O’Brien et al. 2023)
Open-Sourcing Highly Capable Foundation Models (GovAI: Seger et al. 2023)
Do companies’ AI Safety Policies meet government best practice? (CFI: Ó hÉigeartaigh et al. 2023)
AI capabilities can be significantly improved without expensive retraining (Davidson et al. 2023)
I mostly haven't really read recent research on compute governance (e.g. 1, 2) or international governance (e.g. 1, 2, 3). Probably some of it would be on this list if I had.
I’m looking forward to the final version of the RAND report on securing model weights.
Feel free to mention your favorite recent AI governance research here.
I appreciate it; I'm pretty sure I have better options than finishing my bachelor's. Details are out of scope here, but I'm happy to chat sometime.
TLDR: AI governance; maybe adjacent stuff.
Skills & background: AI governance research; email me for info on my recent work.
Location: flexible.
LinkedIn: linkedin.com/in/zsp/.
Email: zacharysteinperlman at gmail.
Other notes: no college degree.
I’ve left AI Impacts; I’m looking for jobs/projects in AI governance. I have plenty of runway; I’m looking for impact, not income. Let me know if you have suggestions!
(Edit to clarify: I had a good experience with AI Impacts.)
PSA about credentials (in particular, a bachelor’s degree): they’re important even for working in EA and AI safety.
When I dropped out of college to work on AI safety, I thought credentials were mostly important as evidence of performance for people unfamiliar with my work, and necessary only in high-bureaucracy institutions (academia, government). It turns out credentials matter, for rational, optics-y reasons, even when working with people who know you (so the credential adds no extra evidence) and who are willing to defy conventions. It seems even many AI governance professionals and orgs are worried (often rationally) about appearing unserious by hiring or publicly collaborating with the uncredentialed. On top of that, irrationally credentialist organizations are very common and important; they may even account for a substantial fraction of EA jobs and x-risk-focused AI governance jobs (which I expected to be more convention-defying), and sometimes an organization is credentialist even when it's led by weird AI safety people (who operate under constraints).
Disclaimer: the evidence from my experiences for these claims is pretty weak. Epistemically, this is more considerations plus impressions from a few experiences than a fact-based PSA.
Upshot: I’d caution people against dropping out of college to increase impact unless they have a great plan.
(Edit to clarify: this paragraph is not about AI Impacts — it’s about everyone else.)
You don't need EA or AI safety motives to explain the event. Later reporting suggested that it was caused by (1) Sutskever and other OpenAI executives telling the board that Altman often lied (WSJ, WaPo, New Yorker) and (2) Altman dishonestly attempting to remove Toner from the board (on the obvious pretext that her coauthored paper Decoding Intentions was too critical of OpenAI, plus allegedly falsely telling board members that McCauley wanted Toner removed) (NYT, New Yorker). As far as I know, there's ~no evidence that EA or AI safety motives were relevant, besides the composition of the board. This isn't much of a mystery.
See generally gwern’s comments.
Thanks!
General curiosity. Looking at it, I'm interested in my total-hours and karma-change. I wish there were a good way to remind me of… everything about how I interacted with the forum in 2022, but wrapped doesn't do that (and probably ~can't; probably I should just skim my posts from that year...)
Cool. Is it still possible to see my 2022 wrapped?
I object to your translation of actual votes into approval votes and RCV votes, at least in the case of my vote. I gave almost all of my points to my top pick, almost all of the rest to my second pick, almost all of the rest to my third pick, and so on until I was sure I had chosen something that would make the top 3. But, e.g., I would have approved of multiple candidates. (Sidenote: I claim my strategy is optimal under very reasonable assumptions/approximations; see the sketch below. You shouldn't distribute points like you're trying to build a diverse portfolio.)
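(A minimal sketch of that optimality claim, under my own assumptions rather than anything stated in the election rules: each point you give an option raises its chance of winning by roughly a constant amount, and you only care about the expected value of the outcome.)

```latex
% Sketch under the assumptions above; v_i and c_i are my notation, not the election's.
% P   = your point budget,  p_i = points given to option i,
% v_i = your value if option i wins,  c_i = marginal win-probability per point.
\[
  \mathrm{E}[V] \;\approx\; \text{const} + \sum_i v_i \, c_i \, p_i ,
  \qquad \sum_i p_i = P, \quad p_i \ge 0 .
\]
% The objective is linear in the p_i, so it is maximized at a corner of the simplex:
% put (essentially) all P points on the option with the largest v_i c_i, falling back
% to the next-best option only if your top pick can't win (c_i ~ 0). Diversifying only
% helps if you are risk-averse over the outcome, unlike a financial portfolio, where
% diversification reduces risk you actually care about.
```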
we are convinced this push towards decentralization will make the EA ecosystem more resilient and better enable our projects to pursue their own goals.
I’m surprised. Why? What was wrong with the EV sponsorship system?
(I’ve seen Elizabeth’s and Ozzie’s posts on this topic and didn’t think the downsides of sponsorship were decisive. Curious which downsides were decisive for you.)
[Edit: someone told me offline that shared legal liability is probably pretty costly.]
Yep, AI safety people tend to oppose sharing model weights for future dangerous AI systems.
But it’s not certain that (operator-aligned) open-source powerful AI entails doom. To a first approximation, it entails doom iff “offense” is much more efficient than “defense,” which depends on context. But absent super monitoring to make sure that others aren’t making weapons/nanobots/whatever, or super efficient defenses against such attacks, I intuit that offense is heavily favored.
An undignified way for everyone to die: an AI lab produces clear, decisive evidence of AI risk/scheming/uncontrollability, freaks out, and tells the world. A less cautious lab ends the world a year later.
A possible central goal of AI governance: cause [an AI lab producing decisive evidence of AI risk/scheming/uncontrollability, freaking out, and telling the world] to quickly result in rules that stop all labs from ending the world.
I don’t know how we can pursue that goal.
I don’t want to try to explain now, sorry.
(This shortform was intended more as starting-a-personal-list than as a manifesto.)
What’s the best thing to read on “Zvi’s writing on EAs confusing the map for the territory”? Or at least something good?
Thanks for the engagement. Sorry for not really engaging back. Hopefully someday I’ll elaborate on all this in a top-level post.
Briefly: by axiological utilitarianism, I mean classical (total, act) utilitarianism, as a theory of the good, not as a decision procedure for humans to implement.
Thanks. I agree that the benefits could outweigh the costs, certainly at least for some humans. There are sophisticated reasons to be veg(etari)an. I think those benefits aren’t cruxy for many EA veg(etari)ans, or many veg(etari)ans I know.
Or me. I’m veg(etari)an for selfish reasons — eating animal corpses or feeling involved in the animal-farming-and-killing process makes me feel guilty and dirty.
I certainly haven't done the cost-benefit analysis on veg(etari)anism, on the straightforward animal-welfare consideration or the considerations you mention. For example, if I were veg(etari)an for the straightforward reason (agent-neutral consequentialist reasons), I'd do the cost-benefit analysis and do things like:
Eat meat that would otherwise go to waste (when that wouldn’t increase anticipated demand for meat in the future)
Try to reduce others’ meat consumption, and try to reduce the supply of meat or improve the lives of farmed animals, when that’s more cost-effective than personal veg(etari)anism
Notice whether eating meat would substantially boost my health and productivity, and go back to eating meat if so
I think my veg(etari)an friends are mostly like me — veg(etari)an for selfish reasons. And they don’t notice this.
Written quickly, maybe hard-to-parse and imprecise.
(I agree it is reasonable to have a bid-ask spread when betting against capable adversaries. I think the statements-I-object-to are asserting something else, and the analogy to financial markets is mostly irrelevant. I don’t really want to get into this now.)
Thanks. I agree! (Except with your last sentence.) Sorry for failing to communicate clearly; we were thinking about different contexts.
Designing an impact market well is an open problem, I think. I don’t think your market works well, and I think the funders were mistaken to express interest. To illustrate:
Alice has an idea for a project that would predictably [produce $10 worth of impact / retrospectively be worth $10 to funders]. She needs $1 to fund it. Under normal funding, she'd be funded and there'd be a surplus worth $9 of funder money. In the impact market, whichever investor reads and understands her project first funds it and later gets $10. (A toy sketch of this arithmetic is at the end of this comment.)
More generally, in your market, all surplus goes to the investors. (This is less problematic since the investors have to donate their profits, but still, I’d rather have LTFF/EAIF/etc. decide how to allocate funds. Or if you believe it’s good for successful investors to allocate funds rather than the funders, and your value proposition depends on this, fine, but make that clear.)
Maybe this market is overwhelmingly supposed to be an experiment rather than actually being positive-value? If so, fine, but then make sure you don't scale it or cause others to do similar things without fixing this central problem.
I’m surprised I haven’t seen anyone else discuss your market mechanism. Have there been substantive public comments on your market anywhere? I haven’t seen any but haven’t been following closely.
Possibly I’m misunderstanding how your market works. [Edit: yep, see my comment, but I’m still concerned.] [Edit #2: the basic criticism stands: funders pay $10 for Alice’s project and this shows something is broken.] [Edit #3: actually maybe everything is fine and retroactive funders would correctly give Alice $1. See this comment, but the Manifund site is inconsistent.]
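To make the Alice arithmetic concrete, here's a toy sketch (my own illustration; `prospective_grant` and `full_price_impact_market` are hypothetical labels, and per Edit #3 the full-price assumption may not match how the market actually resolves certificates):

```python
# Toy numbers following the Alice example above: a project costing $1 that funders
# would retrospectively value at $10. Not a model of any official Manifund mechanism.

def prospective_grant(cost: float, impact_value: float) -> dict:
    """A funder pays the project's cost up front."""
    return {
        "funder_pays": cost,
        "funder_surplus": impact_value - cost,  # impact produced minus money spent
        "investor_profit": 0.0,
    }

def full_price_impact_market(cost: float, impact_value: float) -> dict:
    """An investor funds the project, then sells the certificate to the retro funder
    at its full assessed impact value (the assumption the criticism above relies on)."""
    return {
        "funder_pays": impact_value,
        "funder_surplus": 0.0,                  # all surplus is captured by the investor
        "investor_profit": impact_value - cost,
    }

if __name__ == "__main__":
    print(prospective_grant(1, 10))
    # {'funder_pays': 1, 'funder_surplus': 9, 'investor_profit': 0.0}
    print(full_price_impact_market(1, 10))
    # {'funder_pays': 10, 'funder_surplus': 0.0, 'investor_profit': 9}
```

Under the full-price assumption, the $9 of surplus moves from the funder to the investor, which is the reallocation the comment above objects to; if retro funders would instead pay roughly Alice's $1 cost (per Edit #3), the concern mostly dissolves.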