Lifelong recursive self-improver, on his way to exploding really intelligently :D
More seriously: my posts are mostly about AI alignment, with an eye towards moral progress and creating a better future. If there were a public machine ethics forum, I would write there as well.
An idea:
-We have a notion of what good is and how to do good
-We could be wrong about it
-It would be nice if we could use technology not only to do good, but also to improve our understanding of what good is.
The idea above, together with my wish to avoid producing technology that can be used for bad purposes, is what motivates my research. Feel free to reach out if you relate!
At the moment I am doing research on agents whose behaviour is driven by a reflective process analogous to human moral reasoning, rather than by a metric specified by the designer. See Free agents.
Here are other suggested readings from what I’ve written so far:
-Naturalism and AI alignment
-From language to ethics by automated reasoning
-Criticism of the main framework in AI alignment
What you wrote about the central claim is more or less correct: I made only an existential claim about a single aligned agent, because the description I gave is sketchy and far from a more precise, algorithmic level of description. This single agent probably belongs to a class of other aligned agents, but it seems difficult to guess how large that class is.
That is also why I have not given a guarantee that all agents of a certain kind will be aligned.
Regarding the orthogonality thesis, you might find section 1.2 of Bostrom’s 2012 paper interesting. He writes that objective and intrinsically motivating moral facts need not undermine the orthogonality thesis, since he uses the term “intelligence” in the sense of “instrumental rationality”. I’d add that there is also no guarantee that the orthogonality thesis is correct :)
About psychopaths and metaethics, I haven’t spent much time on that area of research. Like other empirical evidence, it doesn’t seem easy to interpret.