Exploring how cognitive science can improve AI safety, governance and prioritization.
I’d be excited to intern for any research project.
Always happy to chat!
The core idea sounds very interesting: increasing rationality likely has effects that generalize, so having a measure could help evaluate broader social-outreach causes.
Defining intelligence could be an AI-complete problem, but I think we can treat the problem as a simple factor analysis (i.e. even without knowing exactly what we’re talking about :). I think estimating impact once we know the increase in some measure of rationality is the easier part of the problem. For example, if we knew how much promoting long-termist thinking increases support for AI regulation, we’d be only a few steps from a QALY estimate. The harder part for people starting out in social outreach might be to estimate how many people their specific intervention can get on board with thinking more long-termistically.
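To illustrate the “few steps to a QALY” chain, here’s a minimal Fermi sketch. Every number below is a made-up placeholder for the shape of the calculation, not an estimate I’d defend:

```python
# Hypothetical Fermi sketch: expected QALYs from an outreach intervention
# via increased support for AI regulation. All numbers are placeholders.
people_reached = 10_000          # audience size of the intervention
shift_to_longtermism = 0.02      # fraction who start thinking long-term
support_gain = 0.5               # of those, fraction newly supporting regulation
qaly_per_supporter = 0.1         # assumed expected QALYs per added supporter

expected_qalys = (people_reached * shift_to_longtermism
                  * support_gain * qaly_per_supporter)
print(expected_qalys)  # → 10.0
```

The point is just that once each conversion rate has even a crowd-sourced estimate, the final number falls out mechanically.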
So I think it might be very useful to put together a list of all attempts to calculate the impact of various social-outreach strategies, so that anyone considering a new one can find reference points, because the hardest estimates here also seem to be the most important (e.g. the probability that Robert Wright would decrease excessive suspicion between powers). My intuition tells me differences in attitudes are something intuition could predict quite well, so the wisdom of the crowd could work well here.
The best source I found when I tried to search whether someone tried to put changing society into numbers recently is this article by The Sentience Institute.
Also, this post adds some evidence based intervention suggestions to your list.
When coming up with a similar project,* I thought the first step should be to conduct exploratory interviews with EAs that would reveal their hypotheses about the psychological factors that may go into one’s decision to take AI safety seriously. My guess would be that ideological orientation would explain the most variance.

*which I most likely (98 %) won’t realize
Edit: My project has been accepted for the CHERI summer research program, so I’ll keep you posted!
I’d love to see a deeper inquiry into which problems of EAs are most effectively reduced by which interventions. The suggestion that there’s a lack of “skilled therapists used to working with intelligent, introspective clients” is a significant novel consideration for me as an aspiring psychologist, and this kind of hybrid research could help me calibrate my intuitions.
I got access to Bing Chat. It seems:
- It only searches archived versions of websites (it doesn’t retrieve today’s news articles, and it accessed an older version of my Wikipedia user page)
- During archiving, it only downloads the content one can see without any engagement with the website (tested on Reddit “see spoiler” buttons, which reveal new content in the page code; it could retrieve info from posts that gained less attention but weren’t hidden behind the spoiler button)
I.e. it’s still in a box of sorts, unless it’s much more intelligent than it pretends.
Edit: A recent ACX post argues text-predicting oracles might be safer, as their ability to form goals is very limited, but it provides two models of how even they could be dangerous: by simulating an agent, or via a human who decides to take bad advice like “run the paperclip maximizer code”. Scott implies that thinking it would spontaneously form goals is extreme, linking a post by Veedrac. The best argument there seems to be that it only has memory equivalent to about 10 human seconds. I find this convincing for current models, but it also seems limiting for the intelligence of these systems, so I’m afraid that for future models the incentives are aligned with reducing this safety valve.
If Big Tech finds these kinds of salaries cost-effective to solve their problems, I would consider it a strong argument in favor of this project.
I imagine Elon Musk could like this project given that he believes in small effective teams of geniuses.
I’d say “polymaths” is a good label for people I’d expect to make progress like Yudkowsky, Bostrom, Hanson and von Neumann.
Edit: This may be fame-selection (engineers don’t often get credit, particularly in teams) or self-selection (interest in math+society).
The Manhattan and Enigma projects seem like examples where this kind of strategy just worked out. Some considerations that come to mind:
There could be selection effects.
From what I can find, members of these teams weren’t lured in by a lot of money. However, the salience of the AI threat in society is tiny compared to that of WWII, and large incentives could compensate for that.
I’ve read that money can sometimes decrease the intrinsic motivation that drives exploration and invention; however, these findings are being rebutted by newer studies. Apart from that, my guess would be that getting those teams together is the key part, and if large sums of money can facilitate that, great.
A wild idea that might help in case a similar phenomenon works in the sub-population of geniuses, and which could make this project more appealing to donors: limit a portion of these salaries so that the recipients could only spend them on socially beneficial uses.
Suggestion: Integrated search in LessWrong, EA Forum, Alignment Forum and perhaps Progress Forum posts.
Update: I’m pleased to learn Yudkowsky seems to have suggested a similar agenda in a recent interview with Dwarkesh Patel (timestamp) as his greatest source of predictable hope about AI. It’s a rather fragmented bit but the gist is: Perhaps people doing RLHF get a better grasp on what to aim for by studying where “niceness” comes from in humans. He’s inspired by the idea that “consciousness is when the mask eats the shoggoth” and suggests, “maybe with the right bootstrapping you can let that happen on purpose”.
I see a very important point here: human intelligence isn’t misaligned with evolution in a random direction; it is misaligned in the direction of maximizing positive qualia. Therefore, it seems very likely that consciousness played a causal role in the evolution of human moral alignment, and such a causal role should be possible to study.
Thanks, I’ve changed it up
Yes, OpenAI’s domain name is in the list because they have a blog
My intention was to make any content published by OpenAI accessible
Recently, I made RatSearch for googling within EA-adjacent websites. Now, you can try the GPT bot version! (ChatGPT Plus required)
The bot is instructed to interpret what you want to know in relation to EA and search for it on the Forums. If it fails, it searches through the whole web, while prioritizing the orgs listed by EA News.
Cons: ChatGPT uses Bing, which isn’t entirely reliable when it comes to indexing less-visited websites.
Pros: It’s fun for brainstorming EA connections/perspectives, even when you just type a raw phrase like “public transport” or “particle physics”.
Neutral: I have yet to experiment whether it works better when you explicitly limit the search using the site: operator—try AltruSearch 2. It seems better at digging deeper within the EA ecosystem; AltruSearch 1 seems better at digging wider.
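For anyone curious what limiting the search with the site: operator looks like in practice, here’s a rough sketch of the kind of query string such a bot could build. The choice of domains is mine for illustration, not necessarily what AltruSearch 2 actually uses:

```python
# Build a web-search query restricted to a few EA-adjacent forums
# using the site: operator (the domain list is illustrative).
sites = ["forum.effectivealtruism.org", "lesswrong.com", "alignmentforum.org"]
site_filter = " OR ".join(f"site:{s}" for s in sites)
query = f"public transport ({site_filter})"
print(query)
# → public transport (site:forum.effectivealtruism.org OR site:lesswrong.com OR site:alignmentforum.org)
```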
Update (12/8): The link now redirects to an updated version with very different instructions. You can still access the older version here.
I recently made RatSearch for this purpose. You can also try the GPT bot version (more information here).
Sorry, I don’t have any experience with that.
Great to see real data on the web interest! In recent weeks, I investigated the same topic myself from a psychological perspective, while paying attention to the EU AI Act, and reached the same conclusion (just published here).
The idea of existential risk cuts against the oppression/justice narrative, in that it could kill everyone equally. So they have to oppose it.
That seems like an extremely unnatural thought process. Climate change is the perfect analogy—in these circles, it’s salient both as a tool of oppression and an x-risk.
I think far more selection of attitudes happens through paying attention to more extreme predictions than through thinking or communicating strategically. Also, I’d guess the people who spread these messages most consciously imagine a systemic collapse, rather than a literal extinction. As people don’t tend to think about long-termist consequences, the distinction doesn’t seem that meaningful.
AI x-risk is weirder and more terrifying, and it goes against the heuristics that “technological progress is good”, “people have always feared new technologies they didn’t understand”, and “the powerful draw attention away from their power”. Some people for whom AI x-risk is hard to accept happen to overlap with AI ethics. My guess is that the proportion is similar in the general population; it’s just that some people in AI ethics feel particularly strongly and confidently about these heuristics.
Btw I think climate change could pose an x-risk in the broad sense (incl. second-order effects & astronomical waste), just one that we’re very likely to solve (i.e. the tail risks, energy depletion, biodiversity decline or the social effects would have to surprise us).
Looking forward to the sequel!
I’d be particularly interested in any takes on the probability that civilization will be better equipped to deal with the alignment problem in, say, 100 years. My impression is that there’s an important and not well-examined balance between:
Decreasing runaway AI risk & systemic risks by slowing down AI
Increasing the time of perils
Possibly increasing its intensity by giving malicious actors more time to catch up in destructive capabilities
But also possibly increasing the time for reflection on defense before a worse time of perils.
Possibly decreasing the risk of an aligned AI with bad moral values (conditional on this risk being lower in year 2123)
Possibly increasing the risk of astronomical waste (conditional on this risk being higher if AI is significantly slowed down)
That’s a good note. But it seems to me a little like pointing out there’s a friction between a free market policy and a pro-immigration policy because
a) Some pro-immigration policies would be anti-free market (e.g. anti-discrimination law)
b) Americans who support one tend to oppose the other
While that’s true, philosophically, the positions support each other and most pro-free market policies are presumably neutral or positive for immigration.
Similarly, you can endorse the principles that guide AI ethics while endorsing less popular solutions because of additional x-risk considerations. If there are disagreements, they aren’t about moral principles but about empirical claims (x-risk clearly wouldn’t be an outcome AI ethics proponents support). And the empirical claims themselves (“AI causes harm now” and “AI might cause harm in the future”) support each other and were correlated in my sample. My guess is that they actually correlate in academia as well.
It seems to me the negative effects of the concentration of power can be eliminated by other policies (e.g. the Digital Markets Act, the Digital Services Act, tax reforms).
Sounds reasonable! I think the empirical side to the question “Will society be better equipped to set AI values in 2123?” is more lacking. For this purpose, I think “better equipped” can be nicely operationalized in a very value-uncertain way as “making decisions based on more reflection & evidence and higher-order considerations”.
This kind of exploration may include issues like:
Populism. Has it significantly decreased the amount of rationality that goes into government decision-making, in favor of following incentives & intuitions? And which will be faster: new manipulative technologies, or the rate at which new generations become immune to them?
Demographics. Given that fundamentalists tend to have more children, should we expect there will be more of them in 2123?
Cultural evolution. Is Ian Morris or Christopher Brown more right, i.e. should we expect that as we get richer, we’ll be less prone to decide based on what gives us more power, and in turn attain values better calibrated with the most honest interpretation of reality?
If you’re especially motivated by environmental problems, I recommend reading the newly released book by Hannah Ritchie Not the End of the World (here’s her TED talk as a trailer).
I’d like to correct something I mentioned in my post: I implied that one reason I didn’t find plastic pollution impactful is that it just doesn’t have an easy fix. I no longer think that’s quite true. Hannah says it actually could be solved tomorrow if Western leaders decided to finance waste infrastructure in developing countries. Most ocean plastic pollution comes from a handful of rivers in Asia. Since we have this kind of infrastructure in Europe and North America, our waste is only responsible for ~5 % of ocean plastic (Our World in Data). Presumably, such infrastructure would also lay the groundwork for reducing the harms coming from other waste.
I think there are two other reasons for the low attention to waste:
EA is a young do-ocracy, i.e. everybody is trying to spot their “market advantage” that allows them to nudge the world in a way that triggers a positive ripple effect, and so far, everybody’s attention has been caught by problems that seem bigger. While I identified ~4 possibly important problems that come with waste in my post (diseases, air pollution, heavy-metal pollution, animal suffering), if you asked a random person who lives in extreme poverty how to help them, waste probably wouldn’t be at the top of their mind.
Most people are often reminded of the aesthetic harm of waste. Since people’s moral actions are naturally motivated by disgust, I would presume that a lot of smart people who don’t take much time to reflect on their moral prioritization would already have found a way to trigger a ripple effect in this area, if there was one.
While I think one would do more good by convincing a politician to target development aid at alleviating diseases and extreme poverty than by convincing them to run the project Hannah suggests, given the bias I mentioned in point 2), politicians may be more willing to provide funding for a project with the ambition to eradicate ocean plastic (constituting one of these ripple effects). So if you feel motivated to embark on a similar project, best of luck! :)
(The same potentially goes for the other two waste projects I’ve suggested: supporting biogas plants and improving e-waste monitoring and worker equipment.)
What can an EA academic do to improve the incentives in the research side of academia? To help reward quality or even positive impact?