AGI—alignment—paperclip maximizer—pause—defection—incentives
I would like to expose myself to critique.
I hope this is a place where I can receive some feedback + share some of the insights that came to me.
https://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect—“people with low ability, expertise, or experience regarding a certain type of task or area of knowledge tend to overestimate their ability or knowledge”
I’m somewhere on the spectrum 🤡
1. AGI alignment metrics
To avoid the paperclip maximizer, and to avoid "solving" climate change by eliminating humans, I suggest the following core value: LIFE
I’ve embedded this principle into the Network State Genesis and described it in the founding document in the following way:
1. No killing (universally agreed across legal systems and religions)
2. Health (including mental health, longevity, happiness, wellbeing)
3. Biosphere, environment, other living creatures
4. AI safety
5. Mars: a backup civilization is fully aligned with the virtue of life preservation
These principles were written for the Network State, and I think the first three of them can be repurposed towards AGI alignment.
(another core belief is GRAVITY—I believe in GRAVITY—GRAVITY brought us together)
2. Pause, defection, incentives
A new proof-of-X algorithm to ensure compliance with an AI moratorium: ensuring supercomputers are not used to train more powerful models.
Proof-of-work consumes loads of energy.
It could be a mixture of algorithms that is more energy-friendly:
- peak power for a short amount of time (solve something complex quickly)
- proof of storage
A mixture of different algorithms could ensure that various elements of the data centre are rendered unusable for other purposes (a rough sketch follows below). I know too little about the challenges of operating a data centre, and too little about training AI; ultimately, I do not know.
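To make the "proof of storage" idea a little more concrete, here is a minimal sketch in Python of a challenge-response check. Everything in it is illustrative: the chunk size, the number of challenges, and the setup where a verifier holds pre-computed hashes of random chunks of data the operator has committed to keeping on disk. It only shows the shape of the verification loop, not a real protocol.

```python
import hashlib
import os
import random

CHUNK_SIZE = 1024 * 1024  # 1 MiB per chunk (illustrative value)

def make_challenge_set(path: str, num_challenges: int = 16, seed: int = 42):
    """Verifier: pre-compute hashes of randomly chosen chunks of the file
    the operator has committed to storing."""
    rng = random.Random(seed)
    file_size = os.path.getsize(path)
    num_chunks = max(1, file_size // CHUNK_SIZE)
    challenges = {}
    with open(path, "rb") as f:
        for _ in range(num_challenges):
            idx = rng.randrange(num_chunks)
            f.seek(idx * CHUNK_SIZE)
            chunk = f.read(CHUNK_SIZE)
            challenges[idx] = hashlib.sha256(chunk).hexdigest()
    return challenges

def respond_to_challenge(path: str, idx: int) -> str:
    """Prover (data centre operator): hash the requested chunk from disk."""
    with open(path, "rb") as f:
        f.seek(idx * CHUNK_SIZE)
        return hashlib.sha256(f.read(CHUNK_SIZE)).hexdigest()

def audit(path: str, challenges: dict) -> bool:
    """Verifier: pick one stored challenge at random and check the response."""
    idx, expected = random.choice(list(challenges.items()))
    return respond_to_challenge(path, idx) == expected
```

A real scheme would also have to make the committed data incompressible and impossible to regenerate on demand; otherwise the operator could free up the disks and simply recompute each chunk when challenged.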
I’m just aware of the incentive to defect, and that there is no obvious way to enforce the rules.
So much easier to prove the existence of aliens.
So much more difficult to disprove.
So much easier to prove you did the thing.
So much more difficult to disprove.
WOW
Something new dropped: https://twitter.com/FLIxrisk/status/1646539796527951872
Direct link to the policy: https://futureoflife.org/wp-content/uploads/2023/04/FLI_Policymaking_In_The_Pause.pdf
My reply: https://twitter.com/marsxrobertson/status/1646583463493992462
I’m deeply in the “don’t trust, verify” camp.
Monitor the energy usage.
Climate change is real and we need to cut emissions anyway.
My assumption is: “it takes computing power to train AI”.
“Data centres are estimated to be responsible for up to 3% of global electricity consumption today and are projected to touch 4% by 2030.”—https://datacentremagazine.com/articles/efficiency-to-loom-large-for-data-centre-industry-in-2023
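As a quick check that energy monitoring is even plausible, here is a back-of-the-envelope estimate in Python. All of the numbers are rough assumptions I would want checked (training compute on the order of GPT-3, A100-class accelerators at ~30% utilisation, ~400 W per GPU); the point is the order of magnitude, not the exact figure.

```python
# Back-of-the-envelope estimate: how much energy does a GPT-3-scale
# training run draw? All figures below are rough assumptions, not measurements.

TRAINING_FLOP = 3.14e23        # assumed total training compute (~GPT-3 scale)
PEAK_FLOPS_PER_GPU = 312e12    # assumed A100-class peak (FP16/BF16), FLOP/s
UTILISATION = 0.30             # assumed fraction of peak actually achieved
POWER_PER_GPU_W = 400          # assumed draw per accelerator, watts

gpu_seconds = TRAINING_FLOP / (PEAK_FLOPS_PER_GPU * UTILISATION)
energy_kwh = gpu_seconds * POWER_PER_GPU_W / 1000 / 3600

print(f"GPU-hours: {gpu_seconds / 3600:,.0f}")
print(f"Energy:    {energy_kwh:,.0f} kWh (GPU power only, excludes cooling/overhead)")
```

That comes out to hundreds of MWh (before cooling and other data-centre overheads), concentrated in one facility over weeks or months, which is the kind of signal that should in principle show up on a utility meter.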
A little bit more explanation / inspiration: https://en.wikipedia.org/wiki/Three_Laws_of_Robotics
Another inspiration: https://earthbound.report/2018/01/15/elinor-ostroms-8-rules-for-managing-the-commons/
Laws of AI alignment:
Humans. Health. Mental Health. Happiness. Wellbeing. Nature. Environment.
Buying us enough time to figure out what’s next...
I guess there are not that many AI ethicists: https://forum.effectivealtruism.org/posts/5LNxeWFdoynvgZeik/nobody-s-on-the-ball-on-agi-alignment
What is the Schelling point? This Forum? LessWrong? Stack Overflow? Reddit? Some Twitter hashtag: https://twitter.com/marsxrobertson/status/1642235852997681153
Or maybe we can ask the AI?
AI may actually know what the good principles are 🤣