Both ideas are compelling in totally different ways! The second one especially stuck with me. There’s something powerful about the idea that being reliably “nice” can actually be a strategic move, not just a moral one. It reminds me a lot of how trust builds in human systems too, like how people who treat the vulnerable well tend to gain strong allies over time.
Curious to see where you take it next, especially if you explore more complex environments.
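To make that “niceness as strategy” intuition concrete, here’s a toy iterated prisoner’s dilemma sketch (standard textbook payoffs and a tiny round-robin, purely illustrative and not taken from either project): tit-for-tat, the canonical “reliably nice but not a pushover” strategy, ends up well ahead of always-defect overall, because cooperators score so well against each other.

```python
# Toy iterated prisoner's dilemma: does being reliably "nice" pay off?
# Illustrative sketch only -- standard payoffs, not anything from either project.

PAYOFFS = {  # (my move, their move) -> my payoff; C = cooperate, D = defect
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def tit_for_tat(my_history, their_history):
    """Nice strategy: cooperate first, then mirror the opponent's last move."""
    return "C" if not their_history else their_history[-1]

def always_defect(my_history, their_history):
    """Exploitative strategy: defect no matter what."""
    return "D"

def play(strategy_a, strategy_b, rounds=200):
    """Run an iterated match and return total payoffs for both players."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        score_a += PAYOFFS[(move_a, move_b)]
        score_b += PAYOFFS[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

# Small round-robin (including self-play): the "nice" strategy wins on total score.
strategies = {"tit_for_tat": tit_for_tat, "always_defect": always_defect}
totals = {name: 0 for name in strategies}
for name_a, strat_a in strategies.items():
    for name_b, strat_b in strategies.items():
        score_a, _score_b = play(strat_a, strat_b)
        totals[name_a] += score_a

print(totals)  # {'tit_for_tat': 799, 'always_defect': 404}
```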
Thanks for the thoughts!
I do think the second one has more potential impact if it works out, but I worry that it’s too speculative and “out there”, and that it depends on the AGI being persuaded by an argument (which it could simply reject) rather than on something that more concretely ensures alignment. I also noticed that almost no one is working on the game theory angle, so maybe it’s neglected, or maybe the smart people all agree it’s not going to work.
The first project is probably more concrete and actually uses my prior skills as an AI/ML practitioner, but there are already a lot of people working on mech interp. By comparison, my knowledge of game theory is self-taught and not very rigorous.
I’m tempted to explore both to some extent. For the first one, I can probably run some exploratory experiments to test the basic idea and rule it out quickly if it doesn’t work.
Of course! You make some great points. I’ve been thinking about that tension too: alignment via persuasion feels risky, but it might be worth exploring if we can constrain it with better emotional scaffolding.
VSPE (the framework I created) is an attempt to formalize those dynamics without relying entirely on AGI goodwill. I agree it’s not yet obvious whether that’s possible, but your comments helped clarify where that boundary might be.
I would love to hear how your own experiments go if you test either idea!