mic

Karma: 2,031

mic Jul 5, 2022, 4:31 AM
35 points
8 ∶ 0
on: My Most Likely Reason to Die Young is AI X-Risk
Technical note: I think we need to be careful to note the difference in meaning between extinction and existential catastrophe. When Joseph Carlsmith talks about existential catastrophe, he doesn’t necessarily mean all humans dying; in this report, he’s mainly concerned about the disempowerment of humanity. Following Toby Ord in The Precipice, Carlsmith defines an existential catastrophe as “an event that drastically reduces the value of the trajectories along which human civilization could realistically develop”. It’s not straightforward to translate his estimates of existential risk to estimates of extinction risk.
Of course, you don’t need to rely on Joseph Carlsmith’s report to believe that there’s a ≥7.9% chance of human extinction conditioning on AGI.
What links here?
- Zach Stein-Perlman's comment on My Most Likely Reason to Die Young is AI X-Risk by AISafetyIsNotLongtermist (Jul 4, 2022, 4:01 PM; 27 points)

mic Jul 1, 2022, 7:04 AM
11 points
0 ∶ 0
on: $500 bounty for alignment contest ideas
Here’s my proposal for a contest description. Contest problems #1 and 2 are inspired by Richard Ngo’s Alignment research exercises.
AI alignment is the problem of ensuring that advanced AI systems take actions which are aligned with human values. As AI systems become more capable and approach or exceed human-level intelligence, it becomes harder to ensure that they remain within human control instead of posing unacceptable risks.
One solution to AI alignment proposed by Stuart Russell, a leading AI researcher, is the assistance game, also called a cooperative inverse reinforcement learning (CIRL) game, following these principles:
1. “The machine’s only objective is to maximize the realization of human preferences.
2. The machine is initially uncertain about what those preferences are.
3. The ultimate source of information about human preferences is human behavior.”
For a more formal specification of this proposal, please see Stuart Russell’s new book on why we need to replace the standard model of AI, Cooperatively Learning Human Values, and Cooperative Inverse Reinforcement Learning.
Contest problem #1: Why are assistance games not an adequate solution to AI alignment?
- The first link describes a few critiques; you’re free to restate them in your own words and elaborate on them. However, we’d be most excited to see a detailed, original exposition of one or a few issues, which engages with the technical specification of an assistance game.
Another proposed solution to AI alignment is iterated distillation and amplification (IDA), proposed by Paul Christiano. Paul runs the Alignment Research Center and previously ran the language model alignment team at OpenAI. In IDA, a human H wants to train an AI agent, X, by repeating two steps: amplification and distillation. In the amplification step, the human uses multiple copies of X to help them solve a problem. In the distillation step, the agent X learns to reproduce the same output as the amplified system of the human + multiple copies of X. Then we go through another amplification step, then another distillation step, and so on.
You can learn more about this at Iterated Distillation and Amplification and see a simplified application of IDA in action at Summarizing Books with Human Feedback.
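To make the amplify/distill loop concrete, here is a toy sketch of it in Python. Everything in it is an illustrative assumption rather than Paul Christiano's actual setup: the task (summing a tuple of integers), the names `base_agent`, `amplify`, `distill`, and `run_ida`, and especially the "distillation" step, which simply memorizes the amplified system's answers as a stand-in for supervised training.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Question = Tuple[int, ...]                    # toy task: sum a tuple of integers
Agent = Callable[[Question], Optional[int]]   # returns None for "I don't know"


def base_agent(q: Question) -> Optional[int]:
    """Initial agent X: only answers trivial questions (length 0 or 1)."""
    if len(q) == 0:
        return 0
    if len(q) == 1:
        return q[0]
    return None


def amplify(agent: Agent) -> Agent:
    """Human H plus two copies of X: H splits the question in half,
    asks a copy of X about each half, and adds the two answers."""
    def amplified(q: Question) -> Optional[int]:
        direct = agent(q)
        if direct is not None:
            return direct
        mid = len(q) // 2
        left, right = agent(q[:mid]), agent(q[mid:])
        if left is None or right is None:
            return None
        return left + right
    return amplified


def distill(amplified: Agent, previous: Agent,
            training_questions: Iterable[Question]) -> Agent:
    """Produce a new X that reproduces the amplified system's outputs.
    Assumption: memorization stands in for real supervised learning."""
    table: Dict[Question, int] = {}
    for q in training_questions:
        answer = amplified(q)
        if answer is not None:
            table[q] = answer

    def distilled(q: Question) -> Optional[int]:
        return table[q] if q in table else previous(q)
    return distilled


def run_ida(rounds: int, training_questions: Iterable[Question]) -> Agent:
    """Alternate amplification and distillation for a number of rounds."""
    questions = list(training_questions)
    agent: Agent = base_agent
    for _ in range(rounds):
        agent = distill(amplify(agent), agent, questions)
    return agent
```

In this toy, each round roughly doubles the length of question the distilled agent can answer on its own: trained on every contiguous slice of `(1, 2, ..., 8)`, the agent cannot yet sum the full tuple after two rounds but can after three. The sketch says nothing about whether alignment is preserved across rounds, which is what the contest problem asks about.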
Contest problem #2: Why might an AI system trained through IDA be misaligned with human values? What assumptions would be needed to prevent that?
Contest problem #3: Why is AI alignment an important problem? What are some research directions and key open problems? How can you or other students contribute to solving it through your career?
- We’d recommend reading Intro to AI Safety, Why AI alignment could be hard with modern deep learning, AI alignment—Wikipedia, My Overview of the AI Alignment Landscape: A Bird’s Eye View—AI Alignment Forum, AI safety technical research—Career review, and Long-term AI policy strategy research and implementation—Career review.
You’re free to submit to one or more of these contest problems. You can write as much or as little as you feel is necessary to express your ideas concisely; as a rough guideline, feel free to write between 300 and 2000 words. For the first two contest problems, we’ll be evaluating submissions based on the level of technical insight and research aptitude that you demonstrate, not necessarily quality of writing.
I like how contest problems #1 and 2:
- provide concrete proposals for solutions to AI alignment, so it’s not an impossibly abstract problem
- ask participants to engage with prior research and think about issues, which seems to be an important aspect of doing research
- are approachable
Contest problem #3 here isn’t a technical problem, but I think it can be helpful so that participants actually end up caring about AI alignment rather than just engaging with it on a one-time basis as part of this contest. I think it would be exciting if participants learned on their own about why AI alignment matters, formed a plan for how they could work on it as part of their career, and ended up motivated to continue thinking about AI alignment or to support AI safety field-building efforts in India.

mic Jul 1, 2022, 3:55 AM
18 points
0 ∶ 0
on: (Even) More Early-Career EAs Should Try AI Safety Technical Research
Some quick thoughts:
- Strong +1 to actually trying and not assuming a priori that you’re not good enough.
- If you’re at all interested in empirical AI safety research, it’s valuable to just try to get really good at machine learning research.
- An IMO medalist or generic “super-genius” is not necessarily someone who would be a top-tier AI safety researcher, and vice versa.
- For trying AI safety technical research, I’d strongly recommend How to pursue a career in technical AI alignment.

AI safety university groups: a promising opportunity to reduce existential risk

mic Jun 30, 2022, 6:37 PM

53 points

1 comment · 11 min read · EA link

mic Jun 21, 2022, 12:17 AM
9 points
0 ∶ 0
in reply to: aog’s comment on: How to become more agentic, by GPT-EA-Forum-v1
As a countervailing perspective, Dan Hendrycks thinks that it would be valuable to have automated moral philosophy research assistance to “help us reduce risks of value lock-in by improving our moral precedents earlier rather than later” (though I don’t know if he would endorse this project). Likewise, some AI alignment researchers think it would be valuable to have automated assistance with AI alignment research. If EAs could write a nice EA Forum post just by giving GPT-EA-Forum a nice prompt and revising the resulting post, that could help EAs save time and explore a broader space of research directions. Still, I think some risks are:
- This bot would write content similar to what the EA Forum has already written, rather than advancing EA philosophy
- The content produced is less likely to be well-reasoned, lowering the quality of content on the EA Forum

mic Jun 19, 2022, 10:23 AM
8 points
0 ∶ 0
on: Software Developers: How to have Impact? [WIP]
Distributed computing seems to be a skill in high demand among AI safety organizations. Does anyone have recommendations for resources to learn about it? Would it look like using the PyTorch Distributed package or something like a microservices architecture?

Starting an EA uni group is simple and effective, thanks to intro EA programs

mic Jun 17, 2022, 6:34 AM

17 points

0 comments · 12 min read · EA link

mic Jun 8, 2022, 9:42 AM
14 points
0 ∶ 0
on: AGI Ruin: A List of Lethalities
I feel somewhat concerned that after reading your repeated writing saying “use your AGI to (metaphorically) burn all GPUs”, someone might actually do so, but of course their AGI isn’t actually aligned or powerful enough to do so without causing catastrophic collateral damage. At least the suggestion encourages AI race dynamics – because if you don’t make AGI first, someone else will try to burn all your GPUs! – and makes the AI safety community seem thoroughly supervillain-y.

Points 5 and 6 suggest that soon after someone develops AGI for the first time, they must use it to perform a pivotal act as powerful as “melt all GPUs”, or else we are doomed. I agree that figuring out how to align such a system seems extremely hard, especially if this is your first AGI. But aiming for such a pivotal act with your first AGI isn’t our only option, and this strategy seems much riskier than if we take some more time to use our AGI to solve alignment further before attempting any pivotal acts. I think it’s plausible that all major AGI companies could stick to only developing AGIs that are (probably) not power-seeking for a decent number of years. Remember, even Yann LeCun of Facebook AI Research thinks that AGI should have strong safety measures. Further, we could have compute governance and monitoring to prevent rogue actors from developing AGI, at least until we solve alignment enough to entrust more capable AGIs to develop strong guarantees against random people developing misaligned superintelligences. (There are also similar comments and responses on LessWrong.)

Perhaps a crux here is that I’m more optimistic than you about things like slow takeoffs, AGI likely being at least 20 years out, the possibility of using weaker AGI to help supervise stronger AGI, and AI safety becoming mainstream. Still, I don’t think it’s helpful to claim that we must or even should aim to try to “burn all GPUs” with our first AGI, instead of considering alternative strategies.

mic Jun 8, 2022, 8:16 AM
3 points
0 ∶ 0
on: How to dissolve moral cluelessness about donating mosquito nets
Thanks for writing this! I’ve seen Hilary Greaves’ video on longtermism and cluelessness in a couple university group versions of the Intro EA Program (as part of the week on critiques and debates), so it’s probably been influencing some people’s views. I think this post is a valuable demonstration that we don’t need to be completely clueless about the long-term impact of presentist interventions.

mic Jun 8, 2022, 7:57 AM
4 points
0 ∶ 0
in reply to: Pat Andriola’s comment on: Four Concerns Regarding Longtermism
I’m really sorry that my comment was harsher than I intended. I think you’ve written a witty and incisive critique which raises some important points, but I had raised my standards since this was submitted to the Red Teaming Contest.

mic Jun 7, 2022, 2:22 AM
12 points
0 ∶ 0
on: Four Concerns Regarding Longtermism
For future submissions to the Red Teaming Contest, I’d like to see posts that are much more rigorously argued than this. I’m not concerned about whether the arguments are especially novel.
My understanding of the key claim of the post is that EA should consider reallocating some more resources from longtermist to neartermist causes. This seems plausible – perhaps some types of marginal longtermist donations are predictably ineffective, or it’s bad if community members feel that longtermism unfairly has easier access to funding – but I didn’t find the four reasons/arguments given in this post particularly compelling.
The section Political Capital Concern appears to claim: If EA as a movement doesn’t do anything to help regular near-term causes, people will think that it’s not doing anything to help people, and it could die as a movement. I agree that this is possible (though I also think a “longtermism movement” could still be reasonably successful, even if it is unlikely to have much membership compared to EA). However, EA continues to dedicate substantial resources to near-term causes – hundreds of millions of dollars of donations each year! – and this number is only increasing, as GiveWell hopes to direct 1 billion dollars of donations per year. EA continues to highlight its contributions to near-term causes. As a movement, EA is doing fine in this regard.
So then, if the EA movement as a whole is good in this regard, who should change their actions based on the political capital concern? I think it’s more interesting to examine whether local EA groups, individuals, and organizations should have a direct positive impact on near-term causes for signalling reasons. The post only gives the following recommendation (which I find fairly vague): “Instead, the thought is: when running your utility models, factor this in however you can. Consider that utility translated from EA resources to present life, when done effectively and messaged well, redounds as well on the gains to future life.” However, rededicating resources from longtermism to neartermism has costs to the longtermist projects you’re not supporting. How do we navigate these tradeoffs? It would have been great to see examples for this.
The “Social Capital Concern” section writes:
focusing on longterm problems is probably way more fun than present ones. Longtermism projects seem inherently more big picture and academic, detached from the boring mundanities of present reality.
This might be true for some people, but I think for most EAs, concrete or near-term ways of helping people have a stronger emotional appeal, all else equal. I would find the inverse of the sentence a lot more convincing, to be honest: “focusing on near-term problems is probably way more fun than ones in the distant future. Near-term projects seem inherently more appealing and helpful, grounded in present-day realities.”
But that aside, if I am correct that longtermism projects are sexier by nature, when you add communal living/organizing to EA, it can probably lead to a lot of people using flimsy models to talk and discuss and theorize and pontificate, as opposed to creating tangible utility, so that they can work on cool projects without having to get their hands too dirty, all while claiming the mantle of not just the same, but greater, do-gooding.
Longtermist projects may be cool, and their utility may be more theoretical than that of near-term projects, but I’m extremely confused about what you mean when you say they don’t involve getting your hands dirty (in a way such that near-termist work, such as GiveWell’s charity effectiveness research, involves more hands-on work). Effective donations have historically been the main neartermist EA thing to do, and donating is quite hands-off.
So individual EA actors, given social incentives brought upon by increased communal living, will want to find reasons to engage in longtermism projects because it will increase their social capital within the community.
This seems likely, and thanks for raising this critique (especially if it hasn’t been highlighted before), but what should we do about it? The red-teaming contest is looking for constructive and action-relevant critiques, and I think it wouldn’t be that hard to take some time to propose suggestions. The action implied by the post is that we should consider shifting more resources to near-termism, but I don’t think that would necessarily be the right move, compared to, e.g., being more thoughtful about social dynamics and making an effort to welcome neartermist perspectives.
The section on Muscle Memory Concern writes:
I think this is a reason to avoid a disproportionate emphasis on longtermism projects. Because longtermism efficacy is inherently more difficult to calculate with confidence, it can become quite easy to forget how to provide utility quickly and confidently.
I don’t know; even the most meta of longtermist projects, such as longtermist community building (or, to go another meta level up, support for longtermist community building), are quite grounded in metrics and have short feedback loops, such that you can tell if your activities are having an impact – if not impact on the utility across all time, then at least something tangible, such as high-impact career transitions. I think the skills would transfer fairly well over to something more near-termist, such as community organizing for animal welfare, or running organizations in general. In contrast, if you’re doing charity effectiveness research, whether near-termist or longtermist, it can be hard to tell if your work is any good. Now that we have more EAs getting their hands dirty with projects instead of just earning to give, I think that we, as a community, have more experience executing projects, whether longtermist or near-termist.
As for the final section, the discount factor concern:
Future life is less likely to exist than current life. I understand the irony here, since longtermism projects seek to make it more likely that future life exists. But inherently you just have to discount the utility of each individual future life. In the aggregate, there’s no question that the utility gains are still enormous. But each individual life should have some discount based on this less-likely-to-exist factor.
I think longtermists are already accounting for the fact that we should discount future people by their likelihood to exist. That said, longtermist expected utility calculations are often more naive than they should be. For example, we often wrongly interpret reducing x-risk from one cause by 1% as reducing x-risk as a whole by 1%, or conflate a 1% x-risk reduction this century with a 1% x-risk reduction across all time.
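As a rough illustration of those two conflations, here is a small Python sketch. All of the numbers are hypothetical, “1%” is read as a relative reduction, and the model assumes independent causes and the same per-century risk in every century; none of this comes from the post or from any published estimate.

```python
from typing import Sequence


def combined_risk(risks: Sequence[float]) -> float:
    """Probability that at least one of several independent risks occurs."""
    survival = 1.0
    for r in risks:
        survival *= 1.0 - r
    return 1.0 - survival


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a risk falls, relative to its starting value."""
    return (before - after) / before


# Hypothetical per-century risks from three independent causes.
CAUSES = [0.10, 0.03, 0.01]
CENTURIES = 10  # hypothetical horizon with the same risk every century

century_risk = combined_risk(CAUSES)

# Conflation 1: a 1% cut to ONE cause vs. a 1% cut to total risk this century.
one_cause_cut = combined_risk([CAUSES[0] * 0.99] + CAUSES[1:])
cut_from_one_cause = relative_reduction(century_risk, one_cause_cut)

# Conflation 2: a 1% cut to THIS century's total risk vs. all-time risk.
all_time_before = combined_risk([century_risk] * CENTURIES)
all_time_after = combined_risk(
    [century_risk * 0.99] + [century_risk] * (CENTURIES - 1)
)
cut_all_time = relative_reduction(all_time_before, all_time_after)

print(f"total risk this century falls by {cut_from_one_cause:.2%}")
print(f"all-time risk falls by {cut_all_time:.3%}")
```

Under these assumptions, both printed reductions come out below 1%: the cut to a single cause is diluted by the other causes, and the cut to a single century is diluted by the risk in later centuries.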
(I hope you found this comment informative, but I don’t know if I’ll respond to this comment, as I already spent an hour writing this and don’t know if it was a good use of my time.)

mic May 24, 2022, 2:55 AM
5 points
0 ∶ 0
on: What’s the value of creating my own fellowship program when I can direct people to the virtual programs?
Some quick thoughts:
- EA Virtual Programs should be fine in my opinion, especially if you think you have more promising things to do than coordinating logistics for a program or facilitating cohorts
- The virtual Intro EA Program only has discussions in English and Spanish. If group members would much prefer to have discussions in Hungarian instead, it might be useful for you to find some Hungarian-speaking facilitators.
- Like Jaime commented, if you’re delegating EA programs to EA Virtual Programs, it’s best for you to have some contact with participants, especially particularly engaged ones, so that you can have one-on-one meetings exploring their key uncertainties, share with them relevant opportunities, encouraging them to etc.
- It’s rare for the EAIF to provide full-time funding for community building (see this comment)
- I’d try to see if you could do more publicity of EA Virtual Programs, such as at Hungarian universities

mic May 24, 2022, 2:35 AM
4 points
0 ∶ 0
on: What does the Project Management role look like in AI safety?
I see two new relevant roles on the 80,000 Hours job board right now:
- Anthropic—Interpretability Team Manager
- OpenAI—Product Manager, Applied Safety
  - Note that I’m not sure this is what you have in mind for AI safety; this role seems to be focused on developing and enforcing usage guidelines of products like DALL-E 2, Copilot, and GPT-3.
Here’s an excerpt from Anthropic’s job posting. It’s looking for basic familiarity with deep learning and mechanistic interpretability, but mostly nontechnical skills.
In this role you would:
- Partner closely with the interpretability research lead on all things team related, from project planning to vision-setting to people development and coaching.
- Translate a complex set of novel research ideas into tangible goals and work with the team to accomplish them.
- Ensure that the team’s prioritization and workstreams are aligned with its goals.
- Manage day-to-day execution of the team’s work including investigating models, running experiments, developing underlying software infrastructure, and writing up and publishing research results in a variety of formats.
- Unblock your reports when they are stuck, and help get them whatever resources they need to be successful.
- Work with the team to uplevel their project management skills, and act as a project management leader and counselor.
- Support your direct reports as a people manager—conducting productive 1:1s, skillfully offering feedback, running performance management, facilitating tough but needed conversations, and modeling excellent interpersonal skills.
- Coach and develop your reports to decide how they would like to advance in their careers and help them do so.
- Run the interpretability team’s recruiting efforts, in concert with the research lead.
You might be a good fit if you:
- Are an experienced manager and enjoy practicing management as a discipline.
- Are a superb listener and an excellent communicator.
- Are an extremely strong project manager and enjoy balancing a number of competing priorities.
- Take complete ownership over your team’s overall output and performance.
- Naturally build strong relationships and partner equally well with stakeholders in a variety of different “directions”—reports, a co-lead, peer managers, and your own manager.
- Enjoy recruiting for and managing a team through a period of growth.
- Effectively balance the needs of a team with the needs of a growing organization.
- Are interested in interpretability and excited to deepen your skills and understand more about this field.
- Have a passion for and/or experience working with advanced AI systems, and feel strongly about ensuring these systems are developed safely.
Other requirements:
- A minimum of 3-5 years of prior management or equivalent experience
- Some technical or science-based knowledge or expertise
- Basic familiarity in deep learning, AI, and circuits-style interpretability, or a desire to learn
- Previous direct experience in machine learning is a plus, but not required

mic May 23, 2022, 3:39 AM
2 points
0 ∶ 0
on: The real state of climate solutions—want to help?
You might want to share this project idea in the Effective Environmentalism Slack, if you haven’t already done so.

mic May 23, 2022, 3:37 AM
2 points
0 ∶ 0
on: Apply to help run EAGxIndia, Berkeley, Singapore and Future Forum!
Is the application form “EAGxBerkeley, India & Future Forum Organizing Team Expression of Interest” supposed to have questions asking about whether you’re interested in organizing the Future Forum? I don’t see any; I only see questions about EAGxBerkeley and EAGxIndia.

mic May 20, 2022, 6:15 PM
10 points
0 ∶ 0
in reply to: Peter’s comment on: Most students who would agree with EA ideas haven’t heard of EA yet (results of a large-scale survey)
From my experience with running EA at Georgia Tech, I think the main factors are:
- not prioritizing high-impact causes
- not being interested in changing their career plans
- lack of high-impact career opportunities that fit their career interests, or not knowing about them
- not having the skills to get high-impact internships or jobs

mic May 20, 2022, 5:49 PM
16 points
0 ∶ 0
in reply to: Aaron Gertler 🔸’s comment on: Some potential lessons from Carrick’s Congressional bid
I think I was primarily concerned that negative information about the campaign could get picked up by the media. Thinking it over now though, that motivation doesn’t make sense for not posting about highly visible negative news coverage (which the media would have already been aware of) or not posting concerns on a less publicly visible EA platform, such as Slack. Other factors for why I didn’t write up my concerns about Carrick’s chances of being elected might have been that:
- no other EAs seemed to be posting much negative information about the campaign, and I thought there might have been a good reason for that
- aside from the posting of “Why Helping the Flynn Campaign is especially useful right now”, there weren’t any events that triggered me to consider writing up my concerns
- the negative media coverage was obvious enough that I thought anyone considering volunteering would already know about it, and it had to already have been priced into the election odds estimates on Metaculus and PredictIt, so drawing attention to it might not have been valuable
- time-sensitivity, as you mentioned
- public critiques might have to be quite well-reasoned, and I might want to check in with the campaign to make sure that I didn’t misunderstand anything, etc. That could be a decent amount of effort on my part and their part and also somewhat awkward given that I was also volunteering for the campaign.
However, if someone privately asked me for my thoughts on how likely the campaign was to succeed or how valuable helping with it was, I would have been happy to share my honest opinion, including any concerns.

mic May 20, 2022, 5:26 PM
2 points
0 ∶ 0
in reply to: Aaron Gertler 🔸’s comment on: Some potential lessons from Carrick’s Congressional bid
Thanks for the suggestion, just copied the critiques of the “especially useful” post over!

mic May 20, 2022, 5:25 PM
12 points
0 ∶ 0
on: Why Helping the Flynn Campaign is especially useful right now
Before the election was decided, I agreed with the overall point that donating, phone banking, or door-knocking for the campaign seemed quite valuable. At the same time, I want to mention a couple of critiques I have (copied from my comment on “Some potential lessons from Carrick’s Congressional bid”):
- The post claims “The race seems to be quite tight. According to this poll, Carrick is in second place among likely Democratic voters by 4% (14% of voters favor Flynn, 18% favor Salinas), with a margin of error of +/- 4 percentage points.” However, it declines to mention that the same poll found that “26 percent of the district’s voters [hold] an unfavorable opinion of him, compared to only 7 percent for Salinas” (The Hill).
- At the time the post was written, a significant fraction of voters had already voted. The claim “the campaign is especially impactful right now” seems misleading when it would have been better to help earlier on.
- The campaign already has plenty of TV ads from the Protect the Future PAC, and there are lots of internet comments complaining about receiving mailers every other day and seeing Carrick ads all the time. (Though later I learned that PAC ads aren’t able to show Carrick speaking, and I’ve read a few internet comments complaining about how they’ve never heard Carrick speak despite seeing all those ads. So campaign donations could be valuable for ads which do show him speaking.)
- Having a lot of people coming out-of-state to volunteer could further the impression among voters that Carrick doesn’t have much support from Oregonians.
- If you can speak enthusiastically and knowledgeably about the campaign, you can do a better job of phone banking or door-knocking than the average person. However, the campaign already spent $847,000 for door-knockers. While volunteering for the campaign might have been high in expected value, the fact that other people could do door-knocking raises questions about whether it’s in out-of-state EAs’ comparative advantage to do so.

mic May 19, 2022, 4:51 AM
56 points
0 ∶ 0
in reply to: Aaron Gertler 🔸’s comment on: Some potential lessons from Carrick’s Congressional bid
Overall, I agree with Habryka’s comment that “negative evidence on the campaign would be ‘systematically filtered out’”. Although I maxed out donations to the primary campaign and phone banked a bit for the campaign, I had a number of concerns about the campaign that I never saw mentioned in EA spaces. However, I didn’t want to raise these concerns for fear that this would negatively affect Carrick’s chances of winning the election.
Now that Carrick’s campaign is over, I feel more free to write my concerns. These included:
- The vast majority of media coverage was negative from the start. If voters made even a cursory Google of Carrick Flynn’s name, they would be met with plenty of negative headlines like “Carrick Flynn, Crypto-Backed Candidate in New Congressional District, Has Rarely Voted in Oregon” or “Environmental Groups Condemn Congressional Candidate Carrick Flynn’s Comments on Spotted Owls and Timber Unity”.
- The vast majority of comments on Oregon subreddits were also negative.
- The campaign seemed to have quite few non-EA donors or volunteers, suggesting a lack of local support.
- The campaign seemed to have few volunteers until about a week ago, after Why Helping the Flynn Campaign is especially useful right now was posted.
- Even putting aside the issue of crypto funding, Carrick had a notable number of other controversies, such as his comments on spotted owl conservation, the fact that only 2.5% of donations were from Oregon, and the fact that he only voted twice in the past 20 years.
I also have some critiques of the post Why Helping the Flynn Campaign is especially useful right now but I declined to write a comment. These include:
- The post claims “The race seems to be quite tight. According to this poll, Carrick is in second place among likely Democratic voters by 4% (14% of voters favor Flynn, 18% favor Salinas), with a margin of error of +/- 4 percentage points.” However, it declines to mention that the same poll found that “26 percent of the district’s voters [hold] an unfavorable opinion of him, compared to only 7 percent for Salinas” (The Hill).
- At the time the post was written, a significant fraction of voters had already voted. The claim “the campaign is especially impactful right now” seems misleading when it would have been better to help earlier on.
- The campaign already has plenty of TV ads from the Protect the Future PAC, and there are lots of internet comments complaining about receiving mailers every other day and seeing Carrick ads all the time. (Though later I learned that PAC ads aren’t able to show Carrick speaking, and I’ve read a few internet comments complaining about how they’ve never heard Carrick speak despite seeing all those ads. So campaign donations could be valuable for ads which do show him speaking.)
- Having a lot of people coming out-of-state to volunteer could further the impression among voters that Carrick doesn’t have much support from Oregonians.
- If you can speak enthusiastically and knowledgeably about the campaign, you can do a better job of phone banking or door-knocking than the average person. However, the campaign already spent $847,000 for door-knockers. While volunteering for the campaign might have been high in expected value, the fact that other people could do door-knocking raises questions about whether it’s in out-of-state EAs’ comparative advantage to do so.
