Maybe! I’m only going after a steady stream of 2-3 chapters per week. Be in touch if you’re interested: I’m re-reading the first quarter of PLF since they published a new version in the time since I knocked out the first quarter of it.
quinn
I’ve been increasingly hearing advice to the effect that “stories” are an effective way for an AI x-safety researcher to figure out what to work on: that drawing scenarios about how you think things could go well or poorly, then doing backward induction to derive a research question, beats traditional methods of finding one. Do you agree? The uncertainty in such scenarios seems so massive that one couldn’t make a dent in it. Still, do you think it’s valuable for AI x-safety researchers to make significant (i.e. more than 30% of their time) investments in both 1. doing this directly, by telling stories and attempting backward induction, and 2. training so that their stories will be better/more reflective of reality (by studying forecasting, for instance)?
I’m thrilled about this post—during my first two or three years of studying math/cs and thinking about AGI, my primary concern was the rights and liberties of baby agents (though I wasn’t giving suffering nearly adequate thought). Over the years I became more of an orthodox x-risk reducer, and while the process has been full of nutritious exercises, I fully admit that becoming orthodox is a good way to win colleagues, not get shrugged off as a crank at parties, etc., and this may have played a small role: if not motivated reasoning, then at least humbly deferring to people who seem to be thinking more clearly than me.
I think this area is sufficiently undertheorized and neglected that the following is only hypothetical, but it could become important: how should one trade off between existential safety (for humans) and suffering risks (for all minds)?
Value is complex and fragile. There are numerous reasons to be more careful than kneejerk cosmopolitanism, and if one’s intuitions are “for all minds, of course!” it’s important to think through what steps one would have to take to become someone who thinks safeguarding humanity is more important than ensuring good outcomes for creatures in other substrates. This was best written about, to my knowledge, in Eliezer Yudkowsky’s old Value Theory sequence and to some extent Fun Theory. While that isn’t 100% satisfying, I don’t think one go-to sequence is the answer; a lot of this stuff should be left as an exercise for the reader.
Is anyone worried about x-risk and s-risk signaling a future of two opposed factions of EA? That is to say, what are the odds that there’s no way for humanity-preservers and suffering-reducers to get along? You can easily imagine disagreement about how to trade off research resources between human existential safety and artificial welfare, but what if we had to reason about deployment? Do we deploy an AI that’s 90% safe against some alien paperclipping outcome with a 30% reduction in artificial suffering, or one that’s 75% safe against paperclipping with a 70% reduction in artificial suffering?
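To make the disagreement concrete, here’s a minimal sketch. The two options use the numbers above; the linear value model and the faction weights are entirely my assumptions, just to show that the same two options get ranked oppositely under different moral weights:

```python
# Toy illustration of the deployment disagreement above.
# The option numbers come from the text; the weights are made up.

def score(p_safe, suffering_reduction, w_safety, w_suffering):
    """Toy linear value model: higher is better."""
    return w_safety * p_safe + w_suffering * suffering_reduction

option_a = (0.90, 0.30)  # 90% safe, 30% reduction in artificial suffering
option_b = (0.75, 0.70)  # 75% safe, 70% reduction in artificial suffering

# A humanity-preserver weights safety heavily; a suffering-reducer, the reverse.
factions = {"preserver": (0.8, 0.2), "reducer": (0.2, 0.8)}

for name, (w_s, w_r) in factions.items():
    a = score(*option_a, w_s, w_r)
    b = score(*option_b, w_s, w_r)
    print(name, "prefers option", "A" if a > b else "B")
```

Under these made-up weights, the preserver prefers option A and the reducer prefers option B, which is the factional deadlock the question is gesturing at.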
If we’re lucky, there will be a galaxy-brained research agenda or program, some holes or gaps in the theory or implementation, that allows and even encourages coalitioning between humanity-preservers and suffering-reducers. I don’t think we’ll be this lucky in the limiting case, where one humanity-preserver and one suffering-reducer are each at the penultimate stages of their goals. However, we shouldn’t be surprised if there is some overlap; the cooperative AI agenda comes to mind.
I find myself shocked at point #2: the inadequacy of the state of theory on these tradeoffs. Is it premature to worry about that before the AS movement has even published a detailed agenda/proposal of how to allocate research effort grounded in today’s AI field? Much theorization is needed even to get to that point, but it might be wise to think ahead.
I look forward to reading the preprint this week, thanks
Awesome! I probably won’t apply as I lack political background and couldn’t tell you the first thing about running a poll, but my eyes will be keenly open in case you post a broader data/analytics job as you grow. Good luck with the search!
Hi Luke, could you describe a candidate who would inspire you to flex the bachelor’s requirement for Think Tank Jr. Fellow? I took time off from credentialed institutions to do Lambda School and work (I didn’t realize I wanted to be a researcher until I was already in industry), but I think my overall CS/ML experience is stronger than that of a ton of the applicants you’re going to get (I worked on cooperative AI at AI Safety Camp 5 and I’m currently working on multi-multi delegation, hence my interest in AI governance). If possible, I’d like to hear how you’re thinking about the college requirement before I invest the time into writing a cumulative 1400 words.
Ah, just saw techpolicyfellowship@openphilanthropy.org at the bottom of the page. Sorry, will direct my question to there!
We’re writing to let you know that the group you tried to contact (techpolicyfellowship) may not exist, or you may not have permission to post messages to the group. A few more details on why you weren’t able to post:
* You might have spelled or formatted the group name incorrectly.
* The owner of the group may have removed this group.
* You may need to join the group before receiving permission to post.
* This group may not be open to posting.
If you have questions related to this or any other Google Group, visit the Help Center at https://support.google.com/a/openphilanthropy.org/bin/topic.py?topic=25838.
Thanks,
openphilanthropy.org admins
(cc’d to the provided email address)
In Think Tank Junior Fellow, OP writes
Recently obtained a bachelor’s or master’s degree (including Spring 2022 graduates)
How are you thinking about this requirement? Is there something flexible about it (like when a startup says they want a college graduate), or are there bureaucratic forces at partner organizations locking it in stone (like when a hospital IT department says they want a college graduate)? Perhaps describe properties of a hypothetical candidate who would inspire you to flex this requirement?
What’s the latest on moral circle expansion and political circle expansion?
Were slaves excluded from the moral circle in ancient Greece or the antebellum US South, and how does this relate to their exclusion from the political circle?
If AIs could suffer, is recognizing that capacity a slippery slope toward giving AIs the right to vote?
Can moral patients be political subjects, or must political subjects be moral agents? If there were some tipping point or avalanche of moral concern for chickens, that wouldn’t imply arguments for political representation of chickens, right?
Consider pre-suffrage women, or contemporary children: they seem fully admitted into the moral circle, but only barely admitted to the political circle.
A critique of MCE is that history is not one march from worse to better (smaller to larger); there are in fact false starts, moments of retrograde, etc. Is PCE the same, but even more so?
If I must make a really bad first approximation, I would say a rubber band is attached to the moral circle, and on the other end of the rubber band is the political circle: when the moral circle expands, it drags the political circle along with it on a delay, modulo some metaphorical tension and inertia. This rubber band model seems informative in the slave case, uselessly wrong in the chickens case, and points to what I think are very real possibilities in the AI case.
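To make the metaphor concrete, here’s a minimal sketch of the rubber band model. Everything beyond the metaphor itself is my assumption: the circles are scalar “sizes”, the lag is exponential smoothing, and the tension constant and trajectory are made up:

```python
# Toy rendering of the rubber band model: the political circle relaxes
# toward the moral circle on a delay. Higher tension = faster catch-up.

def rubber_band(moral_sizes, tension=0.3):
    """Political circle size trails the moral circle via exponential smoothing."""
    political = [moral_sizes[0]]
    for m in moral_sizes[1:]:
        prev = political[-1]
        political.append(prev + tension * (m - prev))
    return political

moral = [1, 1, 2, 4, 8, 8, 8, 8]  # a moral circle expansion event
print(rubber_band(moral))        # political circle lags, then converges
```

Even this bad approximation shows the qualitative claim: after a moral expansion, the political circle is still smaller for a long while, and with low enough tension (chickens, maybe) it never visibly moves on human timescales.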
CW death
I’m imagining myself having a 6+ figure net worth at some point in a few years, and I don’t know anything about how wills work.
Do EAs have hit-by-a-bus contingency plans for their net worths?
Is there something easy we can do to reduce the friction of the following process: ask five EAs with trustworthy beliefs and values to form a grantmaking panel in the event of my death. This panel could meet for thirty minutes and make a weight allocation decision in the Giving What We Can app, or accept applications and run it that way, or make an investment decision that treats my net worth as seed money for an ongoing fund; it would be up to them.
I’m assuming this is completely possible in principle: I solicit those five EAs, who have no responsibilities or obligations as long as I’m alive; if they agree, I get a lawyer to write up a will that describes everything.
If one EA has done this, the “template contract” would be available to other EAs to repeat it. Would it be worth lowering the friction of making this happen?
Related idea: I can hardcode a weight assignment for the Giving What We Can app into my will; surely a non-EA will-writing lawyer could wrap their head around this quickly. But is there a way to avoid soliciting the lawyer every time I want to update my weights, as my beliefs and values change while I’m alive?
On the face of it, the second idea sounds lower friction and almost as valuable as the first for most individuals.
CC’d to lesswrong.com/shortform
Positive and negative longtermism
I’m not aware of a literature or a dialogue on what I think is a very crucial divide in longtermism.
In this shortform, I’m going to take a polarity approach: I’m going to bring each pole to its extreme, probably beyond positions that are actually held, because I think median longtermism, or the longtermism described in The Precipice, is a kind of average of the two.
Negative longtermism is saying “let’s not let some bad stuff happen”, namely extinction. It wants to preserve. If nothing gets better for the poor or the animals or the astronauts, but we dodge extinction and revolution-erasing subextinction events, that’s a win for negative longtermism.
In positive longtermism, such a scenario is considered a loss. From an opportunity cost perspective, the failure to erase suffering or to bring agency and prosperity to 1e1000 comets and planets hurts literally as bad as extinction.

Negative longtermism is a vision of what shouldn’t happen. Positive longtermism is a vision of what should happen.
My model of Ord says we should lean at least 75% toward positive longtermism, but I don’t think he’s an extremist. I’m uncertain if my model of Ord would even subscribe to the formation of this positive and negative axis.
What does this axis mean? I wrote a little about this earlier this year. I think figuring out what projects you’re working on and who you’re teaming up with strongly depends on how you feel about negative vs. positive longtermism. The two dispositions toward myopic coalitions are “do” and “don’t”. I won’t attempt to claim which disposition is more rational or desirable, but will explore each branch.
When Alice wants future X and Bob wants future Y, but if they don’t defeat the adversary Adam they will be stuck with future 0 (containing great disvalue), Alice and Bob may set aside their differences and choose to form a myopic coalition to defeat Adam, or not.

Form myopic coalitions. A trivial case where you would expect Alice and Bob to tend toward this disposition is if X and Y are similar. However, if X and Y are very different, Alice and Bob must each believe that defeating Adam completely hinges on their teamwork in order to tend toward this disposition, unless they’re in a high-trust situation where they can each credibly signal that they won’t try to get a head start on the X vs. Y battle until 0 is completely ruled out.

Don’t form myopic coalitions. A low-trust environment where Alice and Bob each fully expect the other to try to get a head start on X vs. Y during the fight against 0 would tend toward the disposition of not forming myopic coalitions. This could lead to great disvalue if a project against Adam can only work via a team of Alice and Bob.
An example of such a low-trust environment, if you’ll excuse political compass jargon, is reading bottom-lefts online internally debating the merits of working with top-lefts on projects against capitalism. The argument for coalition is that capitalism is a formidable foe and they could use as much teamwork as possible; the argument against coalition is the history of backstabbing and pogroms when top-lefts take power and betray the bottom-lefts.
For a silly example, consider an insurrection against broccoli. The ice cream faction can coalition with the pizzatarians if they do some sort of value trade that builds trust, like the ice cream faction eating some pizza and the pizzatarians eating some ice cream. Indeed, the viciousness of the fight after broccoli is abolished may have nothing to do with the solidarity between the two groups under broccoli’s rule. It may or may not be the case that the ice cream faction and the pizzatarians can come to an agreement about how best to increase value in a post-broccoli world. Civil war may follow revolution, or not.
Now, while I don’t support the long reflection (TLDR: I think a collapse of diversity sufficient to permit a long reflection would be a tremendous failure), I think elements of positive longtermism are crucial for things to improve for the poor or the animals or the astronauts. I think positive longtermism could outperform negative longtermism when it comes to finding synergies between the extinction prevention community and the suffering-focused ethics community. However, I would be very upset if I turned around in a couple of years and positive longtermists were, like, the premiere face of longtermism. The reason is that once you admit positive goals, you have to deal with everybody’s political aesthetics: a philosophy professor’s preference for a long reflection, an engineer’s preference for moar spaaaace, a conservative’s preference for retvrn to pastorality, a liberal’s preference for intercultural averaging. A negative goal like “don’t kill literally everyone” largely lacks this problem. Yes, I would change my mind about this if 20% of global defense expenditure were targeted at defending against extinction-level or revolution-erasing events; the neglectedness calculus would then lead us to focus the comparatively smaller EA community on positive longtermism.
The takeaway from this shortform should be that quinn thinks negative longtermism is better for forming projects and teams.
Don’t Look Up might be one of the best mainstream movies for the x-risk movement. Eliezer said it’s too on the nose to bear/warrant actually watching. I fully expect to write a review about x-risk movement building for the EA Forum and LessWrong.
Strange. Everyone I watched it with (the second time when I watched it with non-EAs) was impressed and touched. My sister, who has mostly climate change epistemics, was emotionally moved into thinking more about her own extinction concerns (and was very amenable when I explained that pandemics and some AI scenarios are greater threats than climate change).
thanks!
Why have I heard about Tyson investing in lab-grown meat, but I haven’t heard about big oil investing in renewables?
Tyson’s basic insight here is not to identify as “an animal agriculture company”. Instead, they identify as “a feeding people company”. (Which happens to align with doing the right thing, conveniently!)
It seems like big oil is making a tremendous mistake here. Do you think oil execs go around saying “we’re an oil company”, when they could instead be going around saying “we’re a powering-stuff company”? Being a powering-stuff company means you have fuel source indifference!
I mean if you look at all the money they had to spend on disinformation and lobbying, isn’t it insultingly obvious to say “just invest that money into renewable research and markets instead”?
Is there dialogue on this? Also, have any members of “big oil” in fact done what I’m suggesting, and I just didn’t hear about it?
CC’d to lesswrong shortform
https://www.lesswrong.com/posts/kq8CZzcPKQtCzbGxg/quinn-s-shortform?commentId=yLG8yWWHhuTKLbdZA seems like an “I just didn’t hear about it” kind of thing
This review is great and has gotten a lot of my friends excited about science and being a human. Just watched the movie last night, absolutely loved it.
Thanks for the comment. I wasn’t aware of your and Rohin’s discussion on Arden’s post. Did you flesh out the inductive alignment idea on LW or the Alignment Forum? It seems really promising to me.
I want to jot down notes more substantive than “wait until I post ‘Going Long on FV’ in a few months” today.
FV in AI Safety in particular
As Rohin’s comment suggests, both aiming proofs about properties of models toward today’s type theories and aiming tomorrow’s type theories toward ML have two classes of obstacles: 1. is it possible? 2. can it be made competitive?
I’ve gathered that there’s a lot of pessimism about 1, in spite of MIRI’s investment in type theory and in spite of the word “provably” in CHAI’s charter. My personal expected path to impact as it concerns 1 is “wait until theorists smarter than me figure it out”, and I want to position myself to worry about 2.
I think there’s a distinction between theories and products, and I think programmers need to be prepared to commercialize results. There’s a fundamental question: should we expect that a theory’s competitiveness can be improved one or more orders of magnitude by engineering effort, or will engineering effort only provide improvements of less than an order of magnitude? I think a lot depends on how you feel about this.
Asya:
Asya may not have been speaking about AI safety here, but my basic thinking is that if less primitive proof assistants end up drastically more competitive, and at the same time there are opportunities to convert results in verified ML into tooling, then expertise in this area could gain a lot of leverage.
FV in other paths to impact
Rohin:
It’s not clear to me that grinding FV directly is as wise as, say, CompTIA certifications. From the expectation that FV pays dividends in advanced cybersec, we cannot conclude that FV is relevant to early stages of a cybersec path.
Related: Information security careers for GCR reduction. I think the software safety standards in a wide variety of fields have a lot of leverage over outcomes.