AI strategy & governance. ailabwatch.org.
Zach Stein-Perlman
Utilitarians aware of the cosmic endowment, at least, can take comfort in the fact that the prospect of quadrillions of animals suffering isn’t even a feather in the scales. They shut up and multiply.
(Many others should also hope humanity doesn’t go extinct soon, for various moral and empirical reasons. But the above point is often missed among people I know.)
Hmm, I think having the mindset behind effective altruistic action basically requires you to feel the force of donating. It’s often correct not to donate, because of some combination of expecting {better information/deconfusion, better donation opportunities, excellent non-donation spending opportunities, high returns, etc.} in the future. But if you haven’t really considered large donations or don’t get that donating can be great, I fail to imagine how you could be taking effective altruistic action. (I’m talking about extremely rich people here.) (Related indicator of non-EA-ness: not strongly considering causes outside the one you’re most passionate about.)
(I don’t have context on Bryan Johnson.)
See https://ea-internships.pory.app/board; you can filter for volunteer roles.
It would be helpful to mention if you have background or interest in particular cause areas.
(I endorse this.)
I’m annoyed at vague “value” questions. If you ask a specific question the puzzle dissolves. What should you do to make the world go better? Maximize world-EV, or equivalently maximize your counterfactual value (not in the maximally-naive way — take into account how “your actions” affect “others’ actions”). How should we distribute a fixed amount of credit or a prize between contributors? Something more Shapley-flavored, although this isn’t really the question that Shapley answers (and that question is almost never relevant, in my possibly controversial opinion).
Happy to talk about well-specified questions. Annoyed at questions like “should I use counterfactuals here” that don’t answer the obvious reply, “use them FOR WHAT?”
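A toy sketch to make the two quantities concrete (my own numbers and code, not anything from the discussion): two contributors whose project is worth 10 only if both participate. Each one's counterfactual value is the full 10, which is the right number for each person deciding whether to act, but the counterfactual values sum to 20; the Shapley split sums to the total, which is the credit-division flavor of question.

```python
# Hypothetical two-contributor example: the project is worth 10 only if both A and B participate.
from itertools import permutations

value = {
    frozenset(): 0,
    frozenset({"A"}): 0,
    frozenset({"B"}): 0,
    frozenset({"A", "B"}): 10,
}

def counterfactual_value(player, everyone=frozenset({"A", "B"})):
    # Value of the world with everyone acting, minus the world with this player removed.
    return value[everyone] - value[everyone - {player}]

def shapley_value(player, players=("A", "B")):
    # Average marginal contribution over all orders in which players could join.
    orders = list(permutations(players))
    total = 0
    for order in orders:
        before = frozenset(order[: order.index(player)])
        total += value[before | {player}] - value[before]
    return total / len(orders)

# Each player's counterfactual value is 10 (remove either and the project dies);
# the two counterfactual values sum to 20. The Shapley values are 5 each and sum to 10.
print(counterfactual_value("A"), shapley_value("A"))  # -> 10 5.0
```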
I don’t feel 100% bought-in to the Shapley Value approach, and think there’s a value in paying attention to the counterfactuals. My unprincipled compromise approach would be to take some weighted geometric mean and call it a day.
FOR WHAT?
Let’s assume in all of these scenarios that you are only one of the players in the situation, and you can only control your own actions.
If this is your specification (plus the implicit further specification that you’re an altruist trying to maximize total value, deciding how to trade off between increasing X and doing good in other ways), then there is a correct answer: maximize counterfactual value (this is equivalent to maximizing total value, or argmaxing total value over your possible actions), not your personal Shapley value or anything else. (Just like in all other scenarios. Multiplicative-ness is irrelevant. Maximizing counterfactual value is always the answer to questions about what action to take.)
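A one-line way to see the equivalence claimed above (notation mine), writing V(a) for the total value of the world if you take action a and a_0 for a fixed default action:

```latex
\arg\max_a \big[\, V(a) - V(a_0) \,\big] \;=\; \arg\max_a V(a),
\quad \text{since the baseline } V(a_0) \text{ does not depend on } a.
```

Subtracting a constant never changes which action comes out on top, so "maximize counterfactual value" and "argmax total value over your possible actions" pick the same action.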
Finally
My current impression is that there is no mechanism: funders will do whatever they feel like, and some investors will feel misled...
I now agree funders won’t really lose out, at least.
Hmm. I am really trying to fill in holes, not be adversarial, but I mostly just don’t think this works.
the funder probably recognizes some value in . . . the projects the investors funded that weren’t selected for retrofunding
No. If the project produces zero value, then no value for funder. If the project produces positive value, then it’s retrofunded. (At least in the simple theoretical case. Maybe in practice small-value projects don’t get funded. Then profit-seeking investors raise their bar: they don’t just fund everything that’s positive-EV, only stuff that’s still positive-EV when you treat small positive outcomes as zero. Not sure how that works out.)
the funder probably recognizes some value in . . . aligned investors likely devoting their “profits” on other good projects
Yes.
If those gains aren’t enough for the retrofunder, it could promise 100% payment up to investment price, but only partial payment of impact over the investment price—thus splitting the surplus between itself and the investor in whatever fraction seems advisable.
Surely this isn’t optimal; there’s deadweight loss. And it’s still exploitable, which suggests that something is broken. E.g. Alice can do something like: write a bad proposal for her project to ensure it isn’t funded in advance, self-fund at an investment of $10, and thereby extract $10 from the funders.
This is ultimately up to retro funders, and they each might handle cases like this differently.
Oh man, having the central mechanism unclear makes me really uncomfortable for the investors. They might invest reasonably, thinking that the funders would use a particular process, and then the funders use a less generous process...
In my opinion, by that definition of true value which is accounting for other opportunities and limited resources, they should just pay $100 for it. If LTFF is well-calibrated, they do not pay any more (in expectation) in the impact market than they do with regular grantmaking, because 99% of projects like this will fail, and LTFF will pay nothing for those. So there is still the same amount of total surplus, but LTFF is only paying for the projects that actually succeeded.
What happened to “operate on a model where they treat retrospective awards the same as prospective awards, multiplied by a probability of success”? Can you apply that idea to this case? I think the idea is incoherent; if it’s not, I want to know how it works. [This is the most important paragraph in this comment.] [Edit: actually the first paragraph is important too: if funders aren’t supposed to make decisions in a particular way, but just assign funding according to no prespecified mechanism, that’s a big deal.]
(Also, if the funder just pays $100, there’s zero surplus, and if the funder always pays their true value then there’s always zero surplus and this is my original concern...)
There’s a different type of “true value”, which is like how much would the free market pay for AI safety researchers if it could correctly account for existential risk reduction which is an intergenerational public good.
Sure. I claim this is ~never decision-relevant and not a useful concept.
Actually I’m confused again. Suppose:
Bob has a project idea. The project would cost $10. A funder thinks it has a 99% chance of producing $0 value and a 1% chance of producing $100 value, so its EV is $1, and that’s less than its cost, so it’s not funded in advance. A super savvy investor thinks the project has EV > $10 and funds it. It successfully produces $100 value.
How much is the funder supposed to give retroactively?
I feel like ex-ante-funder-beliefs are irrelevant and the right question has to be “how much would you pay for the project if you knew it would succeed.” But this question is necessarily about “true value” rather than covering the actual costs to the project-doer and giving them a reasonable wage. (And funders have to use the actual-costs-and-reasonable-wage stuff to fund projects for less than their “true value” and generate surplus.)
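A quick numerical sketch of the tension in Bob's case (the $10 / $100 / 99% numbers are from the scenario above; the "reasonable wage" figure is a placeholder I made up). Paying true value leaves the funder no surplus; covering costs plus a wage does.

```python
# Bob's project, per the scenario above.
cost = 10            # what the project needs up front
p_success = 0.01     # the funder's ex-ante credence in success
value_if_success = 100

ev = p_success * value_if_success                        # 1.0: below the $10 cost, so no prospective grant

# Rule 1: pay the "true value" you'd pay if you knew it would succeed.
payout_true_value = value_if_success                     # 100
funder_surplus_true_value = value_if_success - payout_true_value   # 0: the funder keeps nothing

# Rule 2: cover actual costs plus a reasonable wage (wage figure is a placeholder).
wage = 5
payout_cost_based = cost + wage                          # 15
funder_surplus_cost_based = value_if_success - payout_cost_based   # 85

print(ev, funder_surplus_true_value, funder_surplus_cost_based)  # -> 1.0 0 85
```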
Ah, hooray! This resolves my concerns, I think, if true. It’s in tension with other things you say. For example, in the example here, “The Good Foundation values the project at $18,000 of impact” and funds the project for $18K. This uses the true-value method rather than the divide-by-P(success) method.
In this context “project’s true value (to a funder) = $X” means “the funder is indifferent between the status quo and spending $X to make the project happen.” True value depends on available funding and other available opportunities; it’s a marginal analysis question.
I agree this would be better — then the funders would be able to fund Alice’s project for $1 rather than $10. But still, for projects that are retroactively funded, there’s no surplus-according-to-the-funder’s-values, right?
Related, not sure: maybe it’s OK if the funder retroactively gives something like cost ÷ ex-ante-P(success). What eliminates the surplus is if the funder retroactively gives ex-post-value.
Edit: no, this mechanism doesn’t work. See this comment.
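For reference, the expected-payout arithmetic behind that suggestion, in my notation and in the simple case where a failed project is worth $0 (note the edit above retracting the mechanism). If the funder pays out only when the project succeeds, then:

```latex
\mathbb{E}[\text{payout}]
  = P(\text{success}) \cdot \frac{\text{cost}}{P(\text{success})}
  = \text{cost},
\qquad \text{vs.} \qquad
\mathbb{E}[\text{payout}]
  = P(\text{success}) \cdot V_{\text{ex post}}
  = \text{the project's EV}.
```

So the cost ÷ P(success) rule spends, in expectation, what a prospective grant would have spent, leaving the usual expected surplus of EV minus cost; paying ex-post value spends the full EV and leaves none.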
Yes. Rather than spending $1 on a project worth $10, the funder is spending $10 on the project — so the funder’s goals aren’t advanced. (Modulo that the retroactive-funding-recipients might donate their money in ways that advance the funder’s goals.)
Thanks.
So if project-doers don’t sell all of their equity, do they get retroactive funding for the rest, or just moral credit for altruistic surplus? The former seems very bad to me. To illustrate:
Alice has an idea for a project that would predictably [produce $10 worth of impact / retrospectively be worth $10 to funders]. She needs $1 to fund it. Under normal funding, she’d be funded and there’d be a surplus worth $9 of funder money. In the impact market, she can decline to sell equity (e.g. by setting the price above $10 and supplying the $1 costs herself) and get $10 retroactive funding later, capturing all of the surplus.
The latter… might work, I’ll think about it.
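Toy arithmetic for the Alice example (numbers from the example above), just to spell out where the surplus goes:

```python
# Alice's project: costs $1, predictably worth $10 to retro funders.
cost, retro_value = 1, 10

# Normal prospective funding: the funder pays the $1 cost and keeps $9 of surplus.
surplus_normal_funding = retro_value - cost      # 9

# Impact market where Alice self-funds, sells no equity, and is retro-funded at $10:
alice_profit = retro_value - cost                # 9: Alice captures the surplus
funder_surplus = retro_value - retro_value       # 0: the funder pays the project's full value

print(surplus_normal_funding, alice_profit, funder_surplus)  # -> 9 9 0
```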
Oh wait, I forgot about the details at https://manifund.org/about/impact-certificates. Specific criticism retracted until I learn more; skepticism remains. What happens if a project is funded at a valuation higher than its funding need? If Alice’s project is funded for $5, where does the other $4 go?
Designing an impact market well is an open problem, I think. I don’t think your market works well, and I think the funders were mistaken to express interest. To illustrate:
Alice has an idea for a project that would predictably [produce $10 worth of impact / retrospectively be worth $10 to funders]. She needs $1 to fund it. Under normal funding, she’d be funded and there’d be a surplus worth $9 of funder money. In the impact market, whichever investor reads and understands her project first funds it and later gets $10.
More generally, in your market, all surplus goes to the investors. (This is less problematic since the investors have to donate their profits, but still, I’d rather have LTFF/EAIF/etc. decide how to allocate funds. Or if you believe it’s good for successful investors to allocate funds rather than the funders, and your value proposition depends on this, fine, but make that clear.)
Maybe this market is overwhelmingly supposed to be an experiment, rather than to actually be positive-value? If so, fine, but then make sure you don’t scale it or cause others to do similar things without fixing this central problem.
I’m surprised I haven’t seen anyone else discuss your market mechanism. Have there been substantive public comments on your market anywhere? I haven’t seen any but haven’t been following closely.
Possibly I’m misunderstanding how your market works. [Edit: yep, see my comment, but I’m still concerned.] [Edit #2: the basic criticism stands: funders pay $10 for Alice’s project and this shows something is broken.] [Edit #3: actually maybe everything is fine and retroactive funders would correctly give Alice $1. See this comment, but the Manifund site is inconsistent.]
Ideally powerful AI will enable something like reflection rather than locking in prosaic human values or our ignorant conceptions of the good.
The field of alignment is really about alignability, not making sure “the right people control it.” That’s a different problem.
My favorite AI governance research since this post (putting less thought into this list):
Responsible Scaling Policies (METR 2023)
Deployment corrections (IAPS: O’Brien et al. 2023)
Open-Sourcing Highly Capable Foundation Models (GovAI: Seger et al. 2023)
Do companies’ AI Safety Policies meet government best practice? (CFI: Ó hÉigeartaigh et al. 2023)
AI capabilities can be significantly improved without expensive retraining (Davidson et al. 2023)
I mostly haven’t really read recent research on compute governance (e.g. 1, 2) or international governance (e.g. 1, 2, 3). Probably some of that would be on this list if I did.
I’m looking forward to the final version of the RAND report on securing model weights.
Feel free to mention your favorite recent AI governance research here.
Yep, that’s related to my “Give some third parties access to models to do model evals for dangerous capabilities” criterion. See here and here.
As I discuss here, it seems DeepMind shared super limited access with UKAISI (only access to a system with safety training + safety filters), so don’t give them too much credit.
I suspect Politico is wrong and the labs never committed to give early access to UKAISI. (I know you didn’t assert that they committed that.)