Co-Director of Equilibria Network: https://eq-network.org/
I try to write as if I were having a conversation with you in person.
I would like to claim that my current safety beliefs are a mix between Paul Christiano's, Andrew Critch's, and Def/Acc's.
Jonas Hallgren
Max Tegmark's new Time article on how we're in a Don't Look Up scenario [Linkpost]
Glad to hear it!
The Benefits of Distillation in Research
Great tool; I've enjoyed it and used it for two years. I (a random EA) would recommend it.
Thank you for this! I'm hoping that this enables me to spend a lot less time on hiring in the future. This is a topic that could easily have taken me 3x the effort to understand if I hadn't gotten some very good resources from this post, so I will definitely check out the book. Again, awesome post!
Black Box Investigations Research Hackathon
That makes sense, and I would tend to agree that the framing of contingency invokes more of a "what if I were to do this" feeling, which might be more conducive to people choosing to do more entrepreneurial thinking, which in turn seems to have higher impact.
Good post; interesting point that the impact of the founder effect is probably higher in longtermism, and I would tend to agree that starting a new field can have a big impact. (Such as wild animal suffering in space; NO FISH ON MARS!)
Not to be the guy that points something out, but I will be that guy; why not use the classic EA jargon of counterfactual impact instead of contingent impact?
Essentially, that the epistemics of EA are better than in previous longtermist movements. EA's frameworks are a lot more advanced, with things such as thinking about the tractability of a problem, not Goodharting on a metric, forecasting calibration, RCTs, and so on, techniques that other movements didn't have.
Thank you! I was looking for this one but couldn't find it.
The ones who aimed at the distant future mostly failed. The longtermist label seems mostly unneeded and unhelpful, and I'm far from the first to think so.
Firstly, in my mind, you're trying to say something akin to: we shouldn't advertise longtermism because it hasn't worked in the past. Yet this is a claim about the tractability of the philosophy, not necessarily about the idea that future people matter.
Don't confuse the philosophy with the instrumentals: longtermism matters, but the implementation method is still up for debate.
But I don't view the effective altruist version of longtermism as particularly unique or unprecedented. I think the dismal record of (secular) longtermism speaks for itself.
Secondly, I think you're using the wrong outside view.
There is a problem with using historical precedents: you assume the EA community operates under the same conditions as those earlier communities did.
An example of this is HPMOR: its success would have been unpredictable if you had judged it against the average Harry Potter fan fiction that came before it. The underlying outside view is different because the underlying causal thinking is different.
As Nassim Nicholas Taleb would say, you're trying to predict a black swan, an unprecedented event in the history of humanity.
What is it that makes longtermism different?
There is a fundamental difference in understanding of the world's causal models in the EA community. There is no outside view for longtermism, as its causal mechanisms are too different from existing reference classes.
To make a final analogy, it is useless to predict gasoline prices for an electric car, just like it is useless to predict the success of the longtermist movement from previous ones.
(Good post, though; interesting investigation, and I tend to agree that we should just say "holy shit, x-risk" instead.)
Announcing the Distillation for Alignment Practicum (DAP)
Stockholm Student Hackathon: Lessons for next time
This is completely unrelated to the great point you made in the comment, but I felt I had to share a classic(?) EA tip that worked well for me. (Uncertain how much this counts as a classic.) I got to the nice nihilistic bottom of realising that my moral system is essentially based on evolution, but I reversed that within a year by reading a bunch of Buddhist philosophy and by meditating. Now it's all nirvana over here! (Try it out now...)
https://www.lesswrong.com/posts/Mf2MCkYgSZSJRz5nM/a-non-mystical-explanation-of-insight-meditation-and-the
https://www.lesswrong.com/posts/WYmmC3W6ZNhEgAmWG/a-mechanistic-model-of-meditation
https://www.lesswrong.com/s/ZbmRyDN8TCpBTZSip
What are the moral values of nations? (China, for example)
If you feel like booking a meeting to help me out in some way, here's my Calendly: https://calendly.com/jonas-hallgren
TL;DR: I totally agree with the general spirit of this post: we need people to solve alignment, and we're not on track. Go and work on alignment, but before you do, engage with the existing research; there are reasons why it exists. There are a lot of things not getting worked on within AI alignment research, and I can almost guarantee that within six months to a year you can find things that people haven't worked on.
So go and find these underexplored areas in a way where you engage with what people have done before you!
I also agree that Eliezer's style of doom seems uncalled for and that this is a solvable but difficult problem. My personal p(doom) is around 20%, which seems quite reasonable to me.
Now, I do want to push back on this claim, as I see a lot of people who haven't fully engaged with the more theoretical alignment landscape making it. There are only 300 people working on alignment, but those people are actually doing things, and most of them aren't doing blue-sky theory.
A note on the ARC claim:
This is essentially a claim about the methodology of science: working on existing systems gives more information and breakthroughs than working on blue-sky theory. The current hypothesis for this is that real-world research is simply a lot more information-rich. It is, however, not the only way to get real-world feedback loops. Christiano is not working on blue-sky theory; he is using real-world feedback loops in a different way: he looks at the real world for information that is already there.
A discovery of this type is, for example, the tragedy of the commons: whilst we could have built computer simulations to see the process in action, it's 10x easier to look at the world and observe the failures happening in real time. His research methodology is to tell stories about the future and see where they fail. This gives bits of information on where to run future experiments, like how we could tell that humans would fail to stop overfishing without actually running an experiment on it.
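To make the comparison concrete, here is a minimal sketch of what such a commons simulation could look like. Everything in it (the regrowth model, the parameters, the fixed per-agent catch) is an assumption I made up for illustration; the point is that even this toy version forces modelling choices that simply observing real fisheries hands you for free.

```python
# Toy tragedy-of-the-commons simulation (illustrative only; all parameters are
# made up). Each agent takes a fixed catch from a shared fish stock, and an
# unsustainable catch rate collapses the stock even though everyone would be
# better off with restraint.

REGROWTH_RATE = 0.2      # logistic regrowth rate of the stock per season
STOCK_CAPACITY = 1000.0  # carrying capacity of the fishery
NUM_AGENTS = 10

def simulate(seasons: int = 40, catch_per_agent: float = 30.0) -> list[float]:
    stock = STOCK_CAPACITY
    history = []
    for _ in range(seasons):
        # Each agent takes its catch (limited by whatever is left).
        total_catch = min(stock, NUM_AGENTS * catch_per_agent)
        stock -= total_catch
        # Logistic regrowth of the remaining stock.
        stock += REGROWTH_RATE * stock * (1 - stock / STOCK_CAPACITY)
        history.append(stock)
    return history

if __name__ == "__main__":
    greedy = simulate(catch_per_agent=30.0)     # unsustainable catch -> collapse
    restrained = simulate(catch_per_agent=4.0)  # sustainable catch -> stock persists
    print(f"stock after 40 seasons, greedy agents:     {greedy[-1]:.1f}")
    print(f"stock after 40 seasons, restrained agents: {restrained[-1]:.1f}")
```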
This is also what John Wentworth does with his research; he looks at the real world as a reference frame, which is quite rich in information. Now, a good question is why we haven't seen that many empirical predictions from Agent Foundations. I believe it is because alignment is quite hard, and specifically, it is hard to define agency in a satisfactory way due to some really fuzzy problems (boundaries, among others) and, therefore, hard to make predictions.
We don't want to mathematize things too early either, as doing so would lock us into a predefined reference frame that might be hard to escape. We want to find the right ballpark for agents, since if we fail, we might base evaluations on something that turns out to be false.
In general, there's a difference between the types of problems in alignment and in empirical ML; the reference class of a "sharp left turn" differs from something empirically verifiable because it is not clearly defined, so a good question is how we should turn one into the other. This question of how we turn recursive self-improvement, inner misalignment and agent foundations into empirically verifiable ML experiments is actually something that most of the people I know in AI alignment are currently and actively working on.
This post from Alexander Turner is a great example of doing this, as they try "just retargeting the search".
Other people are trying other things, such as bounding the maximisation in RL via quantilisers. This would, in turn, make AI more "content" with not maximising. (A fun parallel to how utilitarianism shouldn't be unbounded.)
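For intuition, here is a toy sketch of the quantiliser idea as I understand it (my own illustrative simplification, assuming a uniform base distribution over a finite action set, not any specific paper's formulation): instead of taking the argmax action under its estimated utility, the agent samples from the top q-fraction of actions, which bounds how hard a possibly mis-specified proxy utility gets optimised.

```python
import random

def argmax_action(actions, utility):
    """Standard maximiser: always picks the single highest-utility action."""
    return max(actions, key=utility)

def quantilize(actions, utility, q=0.1, rng=random):
    """Toy q-quantiliser: sample uniformly from the top q-fraction of actions
    (ranked by estimated utility) rather than taking the single best one.
    Sampling "good enough" actions limits how much a flawed proxy is exploited."""
    ranked = sorted(actions, key=utility, reverse=True)
    cutoff = max(1, int(len(ranked) * q))
    return rng.choice(ranked[:cutoff])

if __name__ == "__main__":
    # Hypothetical example: actions are just numbers, and the proxy utility
    # over-values extreme actions (the kind of proxy error we worry about).
    actions = list(range(100))
    proxy_utility = lambda a: a  # proxy says "bigger is always better"
    print("maximiser picks:  ", argmax_action(actions, proxy_utility))
    print("quantiliser picks:", quantilize(actions, proxy_utility, q=0.1))
```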
I could go on with examples, but what I really want to say here is that alignment researchers are doing things; it's just hard to realise why they're doing things when you're not doing alignment research yourself. (If you want to start, book my Calendly and I might be able to help you.)
So what does this mean for an average person? You can make a huge difference by going in and engaging with arguments and coming up with counter-examples, experiments and theories of what is actually going on.
I just want to say that it's most likely paramount to engage with the existing alignment research landscape first, as it's free information and it's easy to fall into traps if you don't. (A good resource for avoiding some traps is John's Why Not Just sequence.)
There's a couple of years' worth of research there; it is not worth rediscovering from the ground up. Still, this shouldn't stop you: go and do it; you don't need a hero licence.