I think I can cut it down a little by moving to Google Domains, but Namecheap has introductory offers, and I'm not sure whether those apply when you transfer domains. Worth looking into when the year rolls around, though.
You can add new ones here. I would, but you probably have a clearer idea of what a good summary would be.
Hi Trev!
Very briefly on your points:
We don’t think AI needs to break thermodynamics to be dangerous.
We don’t think all human-specified goals are safe, and we don’t know how to give a safe one to an extremely powerful AI.
We are not worried about self-awareness or consciousness in particular.
Consider familiarizing yourself with some of the basic arguments: for example, this playlist; the “The Road to Superintelligence” and “Our Immortality or Extinction” posts on Wait But Why for a fun, accessible introduction; and Vox’s “The case for taking AI seriously as a threat to humanity” as a high-quality mainstream explainer piece.
The free online Cambridge course on AGI Safety Fundamentals provides a strong grounding in much of the field and a cohort + mentor to learn with.[1]
There is already a team working on the transitive, PageRank-like karma over at EigenTrust.Net, with a functional prototype using Discord reacts. We’d love to integrate with the EA Forum / LW / Alignment Forum. Feel free to drop by our dev channel!
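To make “transitive, PageRank-like karma” concrete, here is a minimal sketch of the kind of trust propagation EigenTrust-style systems use. This is purely my own illustration: the function name, the damping value, and the react-count input are assumptions, not EigenTrust.Net’s actual code.

```python
import numpy as np

def transitive_trust(endorsements: np.ndarray, damping: float = 0.85,
                     iters: int = 100) -> np.ndarray:
    """Propagate trust transitively: endorsements from trusted users count more.

    endorsements[i, j] = how strongly user i endorses user j
    (e.g. a count of positive Discord reacts). Returns one trust score per user.
    """
    n = endorsements.shape[0]
    # Row-normalise so each user distributes one unit of trust; users who
    # endorse nobody spread their trust uniformly (standard dangling-node fix).
    row_sums = endorsements.sum(axis=1, keepdims=True)
    C = np.full((n, n), 1.0 / n)
    np.divide(endorsements, row_sums, out=C, where=row_sums != 0)

    t = np.full(n, 1.0 / n)  # start from uniform trust
    for _ in range(iters):
        # PageRank-style update: trust flows along endorsements, mixed with a
        # uniform prior so scores stay well-defined and isolated cliques
        # can't capture everything.
        t = damping * (C.T @ t) + (1.0 - damping) / n
    return t

# Toy example: user 0 endorses user 1 twice, user 1 endorses user 2 once.
reacts = np.array([[0, 2, 0],
                   [0, 0, 1],
                   [0, 0, 0]], dtype=float)
print(transitive_trust(reacts))  # user 2 ends up most trusted, via transitivity
```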
Entropy will increase, so any AI system will break and ultimately turn itself off?
There are plenty of sources of negentropy around, like the sun, which humans and other forms of life already make use of. It’s little consolation that a misaligned AI would eventually fall to the heat death of the universe.
The closest thing that comes to mind is Critch’s work on multi-user alignment, e.g. What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs).
Can I petition for you to move away from Slack, which is hostile to open communities due to its business model (e.g. hiding messages after 3 months unless you pay $8.75/user/month), towards Discord, whose business model is welcoming to communities? I’m spearheading an initiative[1] to move Slacks there after this recent decision.
Agree, and I’d add that code models won’t be data-constrained, since they can generate their own training data. It’s simple to write tests automatically, and you can run the code to see whether it passes the tests before adding it to your training dataset (rough sketch below). As an unfortunate side effect, part of this process involves constantly and automatically running code output by a large model, and feeding it data which it generated so it can update its weights, neither of which is good safety-wise if the model is misaligned and power-seeking.
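Something like this loop is what I have in mind. A minimal sketch, assuming the model is being trained on code; generate_solution and generate_tests are hypothetical stand-ins for calls to the model, not any real API:

```python
import subprocess
import sys
import tempfile

def passes_tests(solution_src: str, test_src: str, timeout_s: int = 10) -> bool:
    """Run model-written tests against a model-written solution in a subprocess.

    Returns True only if the combined program exits cleanly, i.e. every assert
    passed. Note this is exactly the "constantly and automatically running
    model-written code" step mentioned above.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(solution_src + "\n\n" + test_src)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path],
                                capture_output=True, timeout=timeout_s)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False

def build_self_training_set(prompts, generate_solution, generate_tests):
    """Generate -> execute -> filter: keep only samples that pass their own tests."""
    dataset = []
    for prompt in prompts:
        solution = generate_solution(prompt)  # model writes candidate code
        tests = generate_tests(prompt)        # model writes tests for the same prompt
        if passes_tests(solution, tests):     # execute to filter out failures
            dataset.append({"prompt": prompt, "completion": solution})
    return dataset  # this becomes the next round of training data
```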
I don’t know if this has been incorporated into a wider timelines analysis yet, as it is quite recent, but this was a notable update for me given the latest scaling laws, which indicate that data is the constraining factor, not parameter count. Much shorter timelines than 2043 seem like a live and strategically relevant possibility.
Hi! One option would be to use the same stack as https://alignment.wiki/, we’d be happy to collaborate.
Alignment Ecosystem Development maintains a list of projects which volunteer-scale efforts could shift the needle on, and hosts monthly calls where people can pitch or join projects. Maybe someone could drop by one of those calls and see if anything fits the bill for this?
Sorry about this, fixed the editing permissions, should work now!
Edit: Even with the fixed permissions it’s not working right, I’ve contacted support and we might move to a new platform.
Edit2: Actually fixed for real.
Good idea, done.
Done, and done. Edit access should be open now too.
If you’re a SWE, entrepreneur, communicator, or volunteer organizer who wants to help save the world, we have various opportunities for volunteer-scale projects which seem like they might shift the needle on AI x-risk, while giving you a track record as someone who follows through and builds valuable things.
If you’d like to join, we’ll be having our third monthly Ecosystem Opportunities call, where people can give brief (~5m) pitches of projects where a few hours a week of dev time would make a difference. We also welcome any ideas you have for projects which we could offer to our volunteers! If you’d like to pitch an idea, please submit it here: https://airtable.com/shrSYdsGmdcBXoBrJ and make a thread in project-ideas for discussion.
Additionally, EA Gather Town has given us our own area for co-working and the monthly opportunities pitch calls! I’ll be hanging out there while I’m doing projects, feel free to drop by and do impromptu co-working sessions. Join here: https://app.gather.town/app/Yhi4XYj0zFNWuUNv/EA%20coworking%20and%20lounge?spawnToken=X04_I7A1bVXp75xb
Great question! I think the core of the answer comes down to the fact that the real danger of AI systems does not come from tools, but from agents. There are strong incentives to build more agenty AIs: agenty AIs are more useful and powerful than tools; it’s likely to be relatively easy to build agents once you can build powerful tools; and tools may naturally slide into becoming agents at a certain level of capability. If you’re a human directing a tool, it’s pretty easy to point the optimization power of the tool in beneficial ways. Once you have a system with its own goals that it’s maximizing, you have much bigger problems.
Consequentialists seek power more effectively than other systems, so when you’re doing a large enough program search with a diverse training task attached to a reinforcement signal, they will tend to be dominant. Internally targetable, maximization-flavored search is an extremely broadly useful mechanism which will be stumbled on and upweighted by gradient descent. See Rohin Shah’s “AI risk from Program Search” threat model for more details. The system which emerges from recursive self-improvement is likely to be a maximizer of some kind. And maximizing AI is dangerous (and hard to avoid!), as explored in this Rob Miles video.
To tie this back to your question: weak and narrow AIs can be safely used as tools; we can keep a human in the outer loop directing the optimization power. Once you have a system much smarter than you, the thing it ends up pointed at maximizing is no longer corrigible by default, and you can’t course-correct if you misspecified the kind of facelikeness you were asking for. Specifying open-ended goals for a sovereign maximizer to pursue in the real world which don’t kill everyone is an unsolved problem.
An AI could be aligned to something other than humanity’s shared values, and this could potentially prevent most of the value in the universe from being realized. Nate Soares talks about this in Don’t leave your fingerprints on the future.
Most of the focus goes on being able to align an AI at all, as this is necessary for any win-state. There seems to be consensus among the relevant actors that seizing the cosmic endowment for themselves would be a Bad Thing. Hopefully this will hold.
Activities which pay off over longer time horizons than your timelines should be dramatically downweighted. For example, if your plan is to spend 6 years building career capital in a minimal-impact position and then leverage that into an AI safety job, this is not a good plan if your timelines are 7 years.
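As a toy illustration (the probability here is made up, purely to show the size of the discount): if $p$ is your credence that transformative AI arrives before a long plan reaches its payoff stage, then the plan's expected impact scales roughly with the chance the payoff window is still open,

$$\mathbb{E}[\text{impact}] \approx (1 - p)\,V_{\text{plan}},$$

so at, say, $p = 0.6$ the 6-years-then-pivot plan keeps only about 40% of its nominal value, before even counting the years of direct work it displaces.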
Generally, I advise moving as rapidly as possible into doing something which actually helps,[1] either directly as a researcher or by building and maintaining infrastructure which improves the alignment ecosystem and allows research to move in the right direction faster.
- ^ I happen to know you’re helping with impact markets! This seems like a great place to be if we have at least mid single digit years, maybe even low single digit years.
- ^ I can’t track it down, but there is a tweet by, I think, Holden, who runs OpenPhil, where he says that people sometimes tell him that he’s got it covered and he wants to shout “NO WE DON’T”. We’re very, very far from a safe path to a good future. It is a hard problem, and we’re not rising to the challenge as a species.
Why should you care? You and everyone you’ve ever known will die if we get this wrong.
Firstly, make sure you’re working on safety rather than capabilities.
It’s challenging to distinguish who is doing the best safety work if you can’t evaluate the research directly. Your best path is probably to find a person who seems trustworthy and competent to you and get their opinions on the work of the organizations you’re considering. This could be a direct contact, or a review such as Larks’s.
I’d be very happy to make this a general index of domains, and update with ones from other cause areas. And I’d be happy to accept retrofunding for this or my other (currently self-funded) AI x-risk projects from anyone who wants to encourage me, though I’d ask that it be directed towards one of my existing grantees rather than myself for tax efficiency.
There are a couple of ea.something domains which look potentially great, though they tend to be premium ones, so they would rapidly increase the burn rate of this project. I can scale it up if wanted.