My expectation is that software without humans in the loop evaluating it will fall victim to Goodhart's law and overfit to whatever metrics/measures it is given.
My blog might be of interest to people here.
Here is a blog post, also written with Claude's help, that I hope will engage home-scale experimenters.
I appreciate your views on space and AI; working with ML systems in that way might be useful.
But I am drawn to base reality a lot because of threats to it from things like gamma-ray bursts or aliens. These can only be represented probabilistically in simulations because they are out of context; the branching tree of possibilities explodes.
I agree that we aren't ready for agents, but I would like to try to build intelligence augmentation that is non-static in time, as slowly as possible: starting with building systems to control and shape it, tested out with static ML systems; then testing them with people; then testing them inside simulations; etc.
I find your view of things interesting. A few questions: how do you deal with democracy when people might be inhabiting worlds unlike the real one and have forgotten that the real one exists?
I think static AI models lack corrigibility: humans can't give them instructions on how to change how they act, so they might be a dead end in terms of day-to-day usefulness. They might be good as scientists, though, as they can be detached from human needs. So worth exploring.
There is a concept of utility, but I'm expecting these systems to be mainly user-focused, not agents in their own right, so the utility is based on user feedback about the system. Ideally the system would be an extension of the feedback systems within humans.
There is also karma, which is separate from utility: it is given by one ML system to another when that system has helped or hindered it in a non-economic fashion.
I've been thinking that AGI will require a freely evolving multi-agent approach, so I want to try out the multi-agent control patterns on ML models without the evolution, which should prove them out in a less dangerous setting. The control patterns I am thinking of are things like karma and market-based alignment patterns. More information is on my blog.
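To make the karma and utility ideas concrete, here is a minimal Python sketch; the `Agent` class, the numbers, and the planner/retriever example are all hypothetical illustrations, not a worked-out design:

```python
# Hypothetical sketch: user-driven utility plus inter-agent karma.
# Names and numbers are illustrative only.

from dataclasses import dataclass, field


@dataclass
class Agent:
    """One ML subsystem in the multi-agent setup."""
    name: str
    utility: float = 0.0  # driven only by user feedback
    karma: dict = field(default_factory=dict)  # per-peer, non-economic credit

    def receive_user_feedback(self, score: float) -> None:
        # Utility is an extension of the user's own feedback,
        # not something the agent defines for itself.
        self.utility += score

    def grant_karma(self, other: "Agent", amount: float) -> None:
        # Karma is separate from utility: it records help or
        # hindrance between agents, positive or negative.
        other.karma[self.name] = other.karma.get(self.name, 0.0) + amount


# Example: a planner agent is rated by the user, and credits
# a retrieval agent that helped it in a non-economic way.
planner, retriever = Agent("planner"), Agent("retriever")
planner.receive_user_feedback(+1.0)
planner.grant_karma(retriever, +0.5)
print(planner.utility, retriever.karma)
```

The point of separating the two signals is that utility stays anchored to humans, while karma lets the agents shape each other's standing without touching that channel.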
Does anyone have recommendations for people I should be following for discussion of structural AI risk, and of the possible implications of AGI systems that come after current deep learning?
I suppose I'm interested in questions around what counts as an existential threat. How bad would a nuclear winter have to be to cause the collapse of society, and how easily could society be rebuilt afterwards? Both questions require robust models of agriculture in extreme situations, and models of energy flows in economies where strategic elements might have been destroyed (to know how easy rebuilding would be). Since pandemics and climate change also have societal collapse as a threat, the models needed would apply to them too (they might trigger a nuclear exchange, or at least loss of control over nuclear reactors, depending on what societal collapse looks like).
The National Risk Register is the closest thing I found in the public domain. As far as I could tell, it doesn't include things like large meteorites.
It's true that all data and algorithms are biased in some way. But I suppose the question is: is the bias from this less than what you get from human experts, who often have a pay cheque that might lead them to think in a certain way?
I'd imagine that any system would not be trusted implicitly to start with, but would have to build up a reputation for providing useful predictions.
In terms of implementation, I'm imagining people building complex models of the world, as in decision making under deep uncertainty, with the AI mainly providing a user-friendly interface for asking questions about the model.
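To illustrate the division of labour I have in mind, here is a rough Python sketch; the `WorldModel` interface, the drought scenario, and the numbers are all hypothetical, and a real system would use an ML model rather than string matching for the interface layer:

```python
# Hypothetical sketch: a complex world model built by experts, with an
# AI layer that only translates user questions into queries on the model.

class WorldModel:
    """Stands in for a detailed expert-built model, e.g. for
    decision making under deep uncertainty."""

    def run(self, scenario: dict) -> dict:
        # A real model would simulate; this stub returns a fixed result.
        return {"crop_yield_change": -0.1 if scenario.get("drought") else 0.0}


def ask(model: WorldModel, question: str) -> str:
    # The AI's job is translation, not prediction: map a natural-language
    # question to model runs, then explain the output. An LLM would sit
    # here in practice; this stub just pattern-matches.
    if "drought" in question.lower():
        result = model.run({"drought": True})
        return (f"Under a drought scenario the model projects a yield "
                f"change of {result['crop_yield_change']:+.0%}.")
    return "I can only answer questions the underlying model covers."


print(ask(WorldModel(), "What happens to yields in a drought?"))
```

The key design choice is that the predictions come from the auditable model, not from the AI itself, which keeps the trust question focused on the model.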
Thanks, I did an MSc in this area back in the early 2000s; my system was similar to Tierra, so I'm familiar with the history of evolutionary computation. Definitely useful context. Learning classifier systems are also interesting to check out for aligning multi-agent evolutionary systems. It definitely informs where I am coming from.
Do you know anyone with this kind of background who might be interested in writing something long-form on this? I'm happy to collaborate, but my mental health has not been the best. I might be able to fund this a small amount, if the right person needs it.
Thanks, I've had a quick skim of the propositions. It does mention perhaps limiting rights of reproduction, but not the conditions under which they should be limited or how that should be controlled.
Another way of framing my question: if natural selection favours AI over humans, what form of selection should we try to put in place for AI? Rights are just part of the question. Evolutionary dynamics, and what society needs from AI (and humans) to continue functioning, are the major part of it.
I've clarified the question; does it make more sense now?
And if no one is working on it, is there an organisation that would be interested in starting to work on it?
I’ve been thinking a bit around secret efforts in AI safety research.
My current thoughts are around: if such an effort exists or does occur, what non-secret efforts might be needed alongside it? E.g. if it develops safe AI, media showing positive outcomes from AI might be needed so that people aren't overly scared.
Oh and AI policy might be needed too, perhaps limiting certain types of AI (agentic stuff).
How should important ideas around topics like AI and biorisk be shared? Is there a best practice, or are there government departments that specialise in handling that?
Hi, I’m thinking about a possibly new approach to AI safety. Call it AI monitoring and safe shutdown.
Safe shutdown riffs on the idea of the big red button, but adapts it for use in simpler systems. If there were a big red button, who would get to press it, and how? This involves talking to law enforcement, legal, and policy people. Big red buttons might be useful for non-learning systems: large autonomous drones and self-driving cars are two systems that might suffer from software failings and need to be shut down safely if possible (or precipitously, if the risks of a hard shutdown are less than those of its continued operation).
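That last trade-off can be phrased as a simple decision rule. A minimal sketch in Python, with made-up risk estimates (any real numbers would have to come from system-specific safety analysis):

```python
# Hypothetical sketch of the shutdown trade-off: prefer a safe shutdown,
# but hard-stop when continued operation is riskier than stopping abruptly.
# All risk numbers are invented for illustration.

def choose_action(risk_continue: float, risk_hard_stop: float,
                  safe_stop_available: bool) -> str:
    if safe_stop_available:
        return "safe shutdown"
    # No graceful path: hard-stop only if it is the lesser risk.
    if risk_hard_stop < risk_continue:
        return "hard shutdown"
    return "continue under monitoring"


# A drone with failing software mid-flight and no safe landing path:
print(choose_action(risk_continue=0.3, risk_hard_stop=0.05,
                    safe_stop_available=False))  # -> hard shutdown
```

The hard part is not this comparison but who is authorised to supply the risk estimates and trigger the action, which is why the law-enforcement and policy conversations matter.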
The monitoring side of things asks what kind of registration and monitoring we should have for AIs and autonomous systems. Building on work on aircraft monitoring, what would the needs around autonomous systems be?
Is this a neglected/valuable cause area? If so, I’m at an early stage and could use other people to help out.
I found this report on adaptation, which suggests that adaptation with some forethought will be better than waiting for problems to get worse. It talks about things other than crops too. The headlines:
Without adaptation, climate change may depress growth in global agriculture yields up to 30 percent by 2050. The 500 million small farms around the world will be most affected.
The number of people who may lack sufficient water, at least one month per year, will soar from 3.6 billion today to more than 5 billion by 2050.
Rising seas and greater storm surges could force hundreds of millions of people in coastal cities from their homes, with a total cost to coastal urban areas of more than $1 trillion each year by 2050.
Climate change could push more than 100 million people within developing countries below the poverty line by 2030. The costs of climate change on people and the economy are clear. The toll on human life is irrefutable. The question is how will the world respond: Will we delay and pay more or plan ahead and prosper?
I've been thinking for a while that civilisational collapse scenarios affect some of the common assumptions about the expected value of movement building, or of saving, for effective altruism. This has knock-on implications for when things are most hingey.
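A toy expected-value comparison shows the mechanism; every number here is invented purely for illustration:

```python
# Toy expected-value comparison: give now vs save and give later.
# All numbers are invented for illustration.

p_collapse = 0.2        # chance civilisation collapses before the later date
growth = 1.05 ** 30     # investment growth over 30 years
later_multiplier = 1.0  # value per unit given later, if no collapse

ev_give_now = 1.0
ev_save = (1 - p_collapse) * growth * later_multiplier

print(f"give now: {ev_give_now:.2f}, save: {ev_save:.2f}")
# Higher collapse risk shrinks ev_save, pushing the balance back toward
# giving (or movement building) now, and shifting when the most
# "hingey" time is.
```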
I've got an idea for a business that could help biosecurity by helping stop accidental leaks of data to people who shouldn't have it. I'm thinking about proving the idea with personally identifiable information. Looking for feedback and collaborators.
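For the proof of concept, I'm imagining something like scanning outgoing text for things that look like PII before they leave; a minimal sketch (the regexes are a crude stand-in, and a real product would need proper detection, e.g. named-entity recognition, structured-data rules, and audit trails):

```python
# Minimal sketch: flag outgoing text that looks like it contains PII.

import re

PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "uk_phone": re.compile(r"\b(?:\+44|0)\d{9,10}\b"),
    "ni_number": re.compile(r"\b[A-Z]{2}\d{6}[A-Z]\b"),  # UK National Insurance
}


def find_pii(text: str) -> list[tuple[str, str]]:
    """Return (kind, match) pairs for anything that looks like PII."""
    return [(kind, m.group()) for kind, pat in PII_PATTERNS.items()
            for m in pat.finditer(text)]


outgoing = "Contact Jo at jo.bloggs@example.com or 07123456789."
hits = find_pii(outgoing)
if hits:
    print("Blocked: possible PII leak:", hits)
```

The same gate, with different patterns, is what I'd hope could eventually apply to biosecurity-sensitive data.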