@Kyle J. Lucchese Can you cross-post this to LessWrong?
Ah ok, I thought “academic researcher” referred to professors/lecturers/postdocs, not PhD students.
One more subtle form of misaligned AI might be an AI that treats humans okay but adopts common human views on nonhuman animal welfare, perpetuating factory farming or the abuse of a massive number of digital minds.
This is unrelated to the core messages of the post, but I think there’s an important point to consider. A sufficiently intelligent system could improve cultured meat technology or invent other technological innovations for producing meat without factory farms.
an academic researcher in the Bay, who would earn around $40,000-50,000 per year, and a comparable researcher in a for-profit lab, who earns $200,000-500,000.
Totally unrelated to the purpose of the post, but is this for real? $50,000 seems absurdly low, especially since the Bay Area has a high cost of living.
It looks like the CNN clip featuring Stephen Hawking was especially effective—maybe given Hawking’s scientific status and reputation as a genius, and the fact that he’s not seen as a politically polarizing or controversial figure (unlike Elon Musk or PewDiePie).
I don’t think reputation can explain the effectiveness:
Both the CNN video and the CNBC article mention Elon Musk.
The CNBC article had unremarkable effects compared to the other media items, even though it was mostly about Stephen Hawking.
Most of the CNN video (1:20 to the end, 5:51) is an interview with James Barrat, a documentary filmmaker and author.
James Barrat seems to be an excellent orator. I disagree with some of his points in the CNN video, but his presentation is very smooth. Maybe this explains the video’s success.
Naively, it seems as if killing everyone would earn AI a massive penalty in training: why would it develop aims that are consistent with doing that?
There are multiple cognitive strategies that succeed in a training regime that heavily penalizes killing humans (even just one human), such as:
1. avoid killing humans at all times
2. avoid killing humans when someone will notice
3. avoid killing humans during training
How do you incentivize (1)?
The weightings use “annoying pain” as a baseline. How many units of annoying pain would you exchange for a unit of moderate happiness? And then how many units of moderate happiness would you trade for a unit of various pleasant experiences (maybe stuff related to psychedelics, food, nature, music, meditation, exercise, laughter, love, success, beauty, relaxation, fulfillment, etc)?
I imagine the answers to the above questions vary significantly from person to person. I’d be keen to see any existing research on this topic.
Also, maybe I missed it, but the “Question 2” section seems to exclude any detailed contemplation of the value of various pleasant experiences. This makes the analysis seem imbalanced to me.
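For what it’s worth, one way to make these questions precise is to price every experience in the common “annoying pain” baseline, which lets you derive any pairwise trade ratio by division. Here’s a minimal sketch; the rates and category names are entirely made-up assumptions for illustration:

```python
# Illustrative, made-up exchange rates, each expressed as how many units of
# "annoying pain" one unit of the experience offsets (pure assumptions).
units_of_annoying_pain = {
    "moderate happiness": 2.0,   # 1 unit offsets 2 units of annoying pain
    "peak experience": 50.0,     # e.g. music, love, or nature at their best
}

def trade_ratio(a, b, table=units_of_annoying_pain):
    """How many units of experience b one unit of experience a is worth,
    given both are priced in the common 'annoying pain' baseline."""
    return table[a] / table[b]

ratio = trade_ratio("peak experience", "moderate happiness")  # 50 / 2 = 25
```

The convenient (and contestable) assumption baked into this setup is that exchange rates are transitive: answering two of the questions automatically pins down the third ratio, which may not match people’s actual intuitions.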
Note that there is this document of projects and exercises relating to AI safety, which is part of the Stampy project.
The empirical reason is that being a newsletter subscriber correlates with having spent about twice the recorded time on our website, compared to non-subscribers.
I feel like the causality could be “people spend a lot of time on the website” --> “people subscribe to the newsletter” rather than the other way around?
And how does 80k record people’s time spent on the website?
You could think of a normal distribution of value estimates centered on the true value of the action. The more actors there are (the more estimates you draw from the distribution), the more likely you are to get an outlier who thinks the value is positive when it is actually negative.
Additionally, the people willing to act unilaterally are more likely to have positively biased estimates of the value of the action:
As you noted, some curse arguments are symmetric, in the sense that they also provide reason to expect unilateralists to do more good. Notably, the bias above is asymmetric; it provides a reason to expect unilateralists to do less good, with no corresponding reason to expect unilateralists to do more good.
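The statistical point above can be illustrated with a quick Monte Carlo sketch; the true value, noise level, and function name here are illustrative assumptions, not from the post:

```python
import random

random.seed(0)

def prob_optimistic_outlier(n_actors, true_value=-1.0, noise_sd=1.0, trials=10_000):
    """Estimate the probability that at least one of n_actors draws a
    value estimate above zero when the true value is negative."""
    hits = 0
    for _ in range(trials):
        estimates = [random.gauss(true_value, noise_sd) for _ in range(n_actors)]
        if max(estimates) > 0:  # someone mistakenly judges the action net-positive
            hits += 1
    return hits / trials

p1 = prob_optimistic_outlier(1)
p10 = prob_optimistic_outlier(10)
```

With these (made-up) numbers, a single actor misjudges the action as net-positive roughly 16% of the time, but with ten independent actors the chance that at least one does rises above 80% — the unilateralist only needs one optimistic outlier.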
These posts provide some interesting points:
AI Timelines via Cumulative Optimization Power: Less Long, More Short
What a compute-centric framework says about AI takeoff speeds—draft report
Disagreement with bio anchors that lead to shorter timelines
I’d like to see more posts like these (including counterarguments or reviews (example 1, example 2)), since timelines are highly relevant to career plans.
I have a hypothesis that some people are updating towards shorter timelines because they didn’t pay much attention to AI capabilities until seeing some of 2022’s impressive (public) results. Indeed, 2022 included results like LaMDA, InstructGPT, chain-of-thought prompting, GopherCite, Socratic Models, PaLM, PaLM-SayCan, DALL-E 2, Flamingo, Gato, AI-assisted circuit design, solving International Math Olympiad problems, Copilot finishing its preview period, Parti, VPT, Minerva, DeepNash, Midjourney entering open beta, the AlphaFold Protein Structure Database expanding from nearly 1 million to over 200 million structures, Stable Diffusion, AudioLM, ACT-1, Whisper, Make-A-Video, Imagen Video, AlphaTensor, CICERO, ChatGPT, RT-1, answering medical questions, and more.
Did Superintelligence have a dramatic effect on people like Elon Musk? I can imagine Elon getting involved without it. That involvement might have been even more harmful (e.g. starting an AGI lab with zero safety concerns).
Here’s one notable quote about Elon (source), who started college over 20 years before Superintelligence:

In college, he thought about what he wanted to do with his life, using as his starting point the question, “What will most affect the future of humanity?” The answer he came up with was a list of five things: “the internet; sustainable energy; space exploration, in particular the permanent extension of life beyond Earth; artificial intelligence; and reprogramming the human genetic code.”
Overall, causality is multifactorial and tricky to analyze, so concepts like “causally downstream” can be misleading.
(Nonetheless, I do think it’s plausible that publishing Superintelligence was a bad idea, at least in 2014.)
My second suggestion is to explicitly connect the present to the future. Compare these two examples:
Example 1:
In the future, your doctor could be an AI.
Example 2:
In the future, your doctor could be an AI. Here’s how it could happen: …
I think the main issue with example 1 is that it lacks detail. I think a solution is to be as concrete and specific as possible when describing possible futures, and note when you’re uncertain.
What I would find helpful is a list of potential career pathways in the AI safety space, categorised by the level of technical skills you’ll need (or not) to pursue them.
I’m not sure if this is currently possible to make, because there are very few established career paths in AI safety (e.g. “people have been doing jobs involving X for the past 10 years and here’s the trajectory they usually follow”), especially outside of technical research and engineering careers. I did make a small list of roles at maisi.club/help, but again, it’s hard to find clear examples of what these career paths actually look like.
Say that you are compromised if it is easy for someone to shame you.
...
Lots of people on this forum have struggled with the feeling of being compromised. Since FTX. Or Leverage. Or Guzey. Or Thiel. Or Singer. Or Mill or whatever.[4]
...You will make mistakes, and people will rightly hold you to them.[7] It will feel terrible.
I’m confused why you’re including Guzey and Thiel in this list. It doesn’t seem like Guzey’s critique is a mistake that he should “feel terrible” about (although I only did a quick skim), and Torres mentions Thiel exactly once in that article:
Meanwhile, the billionaire libertarian and Donald Trump supporter Peter Thiel, who once gave the keynote address at an EA conference, has donated large sums of money to the Machine Intelligence Research Institute, whose mission to save humanity from superintelligent machines is deeply intertwined with longtermist values.
I was about to comment this too. From a brief skim I can’t find any clarification about what the term “fellowship programs” is referring to.
Thanks for the detailed reply. So in addition to informing people who value altruism about the best ways to help, we should also try to elevate the importance of altruism within people’s values. Am I understanding correctly?
Awesome, I added a link to your spreadsheet.
Thank you!
I’m not sure how the comment review process works, but it’s possible that the evaluators assess the importance of an idea/topic based on the total number of times it appears across all the comments.