Here’s my attempt to reflect on the topic: https://forum.effectivealtruism.org/posts/PWKWEFJMpHzFC6Qvu/alignment-is-hard-communicating-that-is-harder
I don’t think it’s restricted only to agentic technologies; my model is for all technologies that involve risk. My toy example is that even producing a knife requires the designer to think about its dangers in advance and propose precautions.
Both Redwood and Anthropic have labs and do empirical work. This is also an example of experimental work: https://twitter.com/Karolis_Ram/status/1540301041769529346
I’ve been asked this question! Or, to be specific, I’ve been asked something along these lines: human cultures have always speculated about the end of the world, so how is this x-risk forecasting any different?
A quick comment after reading about 50% of this article: it seems to focus on statements instead of arguments, e.g. “we cannot calculate the probability of future events as if people didn’t exist” or “We are after genuine creativity, not the illusion of creativity.” At the same time, it doesn’t really engage with the literature on AI risk or even explain why the definitions it adopts, e.g. the knowledge definition, are the most appropriate ones. There might be some interesting thoughts in there, but it would be better for the author to develop them in shorter articles and make the arguments clearer.
I know what you mean, and I’ve definitely had this kind of experience (last semester in particular, which made me want to leave both my university and academia; that’s how bad it was). What I wanted to emphasize while teaching is that it’s valuable to question our own thoughts, emotions, and experiences in the philosophy classroom, and it’s disappointing to see that most people are not willing to do that. But hey, at least I tried...
“When they ask me about truth, I say, truth in which axiomatic system?” Teukros Michailides
Why bother with New Year’s resolutions when you can just start doing things today (and every today)?
It’s more epistemically virtuous to make a wrong prediction than to make no predictions at all.
The Collingridge dilemma: the impact of a new technology is hard to predict before it has been developed and widely deployed, but once it is entrenched, it becomes hard to control or change.
“Find where the difficult thing hides, in its difficult cave, in the difficult dark.” Iain S. Thomas
My upskilling study plan:
1. Math
i) Calculus (derivatives, integrals, Taylor series; see the short sketch after this list)
ii) Linear Algebra (this video series)
iii) Probability Theory
2. Decision Theory
3. Microeconomics
i) Optimization of individual preferences
4. Computational Complexity
5. Machine Learning theory with a focus on deep neural networks
6. Arbital
Goal: full-time research in AI Safety.
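To give a flavor of the kind of exercise I have in mind for the calculus item, here is a minimal Python sketch (the function name and the number of terms are my own illustrative choices, not part of any particular curriculum) that checks a truncated Taylor series for e^x against the library value:

```python
import math

def taylor_exp(x: float, n_terms: int = 10) -> float:
    """Approximate e**x with the first n_terms of its Taylor series around 0."""
    return sum(x**k / math.factorial(k) for k in range(n_terms))

# Compare the truncated series to math.exp for a few inputs.
for x in (0.5, 1.0, 2.0):
    print(f"x={x}: series={taylor_exp(x):.6f}, exact={math.exp(x):.6f}")
```

With ten terms the approximation already matches math.exp to several decimal places for small x, which is the kind of quick sanity check I want to be able to do without looking anything up.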
A model of one’s own (or what I say to myself):
Defer a bit less today—think for yourself!
What would the world look like if X were not true?
Make a prediction—don’t worry if it turns out to be false.
Articulate an argument and find at least one objection to it.
Helpful post, Zach! I think it’s more useful and concrete to ask about specific capabilities instead of asking about AGI/TAI etc., and I’m pushing myself to ask such questions (e.g., when do you expect LLMs that can produce Richard Feynman-level text?). Also, I like the generality vs. capability distinction. We already have a generalist (Gato), but we don’t consider it to be an AGI (I think).
Five types of people on AI risks:
Wants AGI as soon as possible, ignores safety.
Wants AGI, but primarily cares about alignment.
Doesn’t understand AGI/doesn’t think it’ll happen anytime during her lifetime; thinks about robots that might take people’s jobs.
Understands AGI, but thinks the timelines are long enough not to worry about it right now.
Doesn’t worry about AGI; thinks that lock-in of our choices and “normal accidents” are both more important/risky/scary.