Co-Director of Equilibria Network: https://eq-network.org/
I try to write as if I were having a conversation with you in person.
I would like to claim that my current safety beliefs are a mix of Paul Christiano's, Andrew Critch's, and Def/Acc's.
Jonas Hallgren
I enjoyed the post and I thought the platform for collective action looked quite cool.
I also want to mention that I think tractability is just generally a really hard thing for longtermism. It's also a newer field, so in expectation I think you should expect the projects to look worse than in animal welfare. I don't think there's any need for psychoanalysis of the people in the space, even though it has its fair share of wackos.
Great point! I did not think of the specific claim of 5% when thinking about the scale, but rather whether more effort should be spent in general.
My brain basically did a motte-and-bailey on me emotionally when it comes to this question, so I appreciate you pointing that out!
It also seems like you're mostly critiquing the tractability of the claim and not the underlying scale or neglectedness?
It kind of gives me some GPR (global priorities research) vibes as to why it's useful to do right now, and that depending on initial results, either fewer or more resources should be spent?
Super exciting!
I just wanted to share a random perspective here: would it be useful to model sentience alongside consciousness itself? If you read Daniel Dennett's book Kinds of Minds or take some of the Integrated Information Theory stuff seriously, you will arrive at this view of a field of consciousness. This view is similar to Philip Goff's, or to more Eastern traditions such as Buddhism.
Also, even in theories like Global Workspace Theory, the amount of localised information at a point in time matters alongside the type of information processing that you have.
I'm not a consciousness researcher or anything, but I thought it would be interesting to share. I wish I had better links to research here and there, but if you look at Dennett, Philip Goff, IIT or Eastern views of consciousness, you will surely find some interesting stuff.
Wild animal welfare and longtermist animal welfare versus farmed animal welfare?
There's this idea of the truth as an asymmetric weapon; I guess my point isn't necessarily that the approach vector will be something like:
Expert discussion → Policy change
but rather something like
Expert discussion → Public opinion change → Policy change
You could say something about memetics and that it is the most understandable memes that get passed down rather than the truth, which is, to some extent, fair. I guess I'm a believer that the world can be updated based on expert opinion.
For example, I've noticed a trend in the AI Safety debate: the quality seems to get better and more nuanced over time (at least, IMO). I'm not sure what this entails for the general public's understanding of the topic, but it feels like it affects the policymakers.
Yeah, I guess the crux here is to what extent we actually need public support, or at least what type of public support we need for it to become legislation?
If we can convince 80-90% of the experts, then I believe that this has cascading effects on the population, and it isn't like AI being conscious is something that is impossible to believe either.
I'm sure millions of students have had discussions about AI sentience for fun, so it isn't fully outside the Overton window either. I'm curious to know if you disagree with the above, or if there is another reason why you think research won't cascade to public opinion? Any examples you could point towards?
A crux that I have here is that research that takes a while to explain is not going to inspire a popular movement.
Okay, what comes to mind for me here is quantum mechanics and how we've come up with some pretty good analogies to explain parts of it.
Do we really need to communicate the full intricacies of AI sentience to say that an AI is conscious? I guess that this isnât the case.
The world where EA research and advocacy for AI welfare is most crucial is one where the reasons to think that AI systems are conscious are non-obvious, such that we require research to discover them, and require advocacy to convince the broader public of them.
But I think that world where this is true, and the advocacy succeeds, is a pretty unlikely one.

I think this is creating a potential false dichotomy?
Here's what I believe might happen in AI sentience without any intervention, as an example:
1. Consciousness is IIT (Integrated Information Theory) or GWT (Global Workspace Theory) based in some way or another. In other words, we have some sort of underlying field of sentience, like the electromagnetic field, and when parts of the field interact in specific ways, "consciousness" appears as a point load in that field.
2. Consciousness is then only verifiable if this field has consequences for the other fields of reality; otherwise, it is non-Popperian, like multiverse theory.
3. Number 2 is really hard to prove, so we're left with very correlational evidence. It is also tightly connected to what we think of as metaphysics, meaning that we're going to be quite confused about it.
4. Therefore, general legislators and researchers leave this up to chance and do not compute any complete metrics, as it is too difficult a problem (see the toy sketch below for a sense of why); they hope that AIs don't have sentience.
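To give a flavour of why point 4 seems likely to me, here is a minimal sketch of a crude "integration" score, loosely inspired by IIT but emphatically not the real Φ; the dynamics, the scoring rule and all numbers are made up for illustration. The point is just that even a toy metric requires marginalising over every bipartition of the system, which grows exponentially with size.

```python
# Toy sketch only: a crude "integration" score loosely inspired by IIT, not the
# real Phi. It measures how much past-to-present mutual information the whole
# system carries beyond its best split into two parts, and shows how the number
# of bipartitions (and the state space) grows with system size.
import itertools
import numpy as np

rng = np.random.default_rng(0)

def mutual_info(joint):
    """Mutual information in bits, given joint[i, j] = P(past=i, present=j)."""
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px * py)[nz])))

def project(state, subset):
    """Collapse a full binary state (encoded as an int) onto the given units."""
    return sum(((state >> unit) & 1) << k for k, unit in enumerate(subset))

def toy_integration(n_units):
    n_states = 2 ** n_units
    # Random stochastic dynamics with a uniform distribution over past states.
    transition = rng.random((n_states, n_states))
    transition /= transition.sum(axis=1, keepdims=True)
    joint = transition / n_states  # P(past state, present state)

    whole = mutual_info(joint)
    units = list(range(n_units))
    best_split = -np.inf
    n_bipartitions = 0
    for size in range(1, n_units // 2 + 1):
        for part_a in itertools.combinations(units, size):
            part_b = tuple(u for u in units if u not in part_a)
            if len(part_a) == len(part_b) and part_a > part_b:
                continue  # skip the mirror image of an already-counted split
            n_bipartitions += 1
            split_info = 0.0
            for part in (part_a, part_b):
                marg = np.zeros((2 ** len(part), 2 ** len(part)))
                for past in range(n_states):
                    for present in range(n_states):
                        marg[project(past, part), project(present, part)] += joint[past, present]
                split_info += mutual_info(marg)
            best_split = max(best_split, split_info)
    # "Integration": information the whole carries beyond its best split.
    return whole - best_split, n_bipartitions

for n in (2, 3, 4):
    score, splits = toy_integration(n)
    print(f"{n} units: toy integration = {score:.3f} bits, {splits} bipartitions, {2 ** n} states")
```

Even at four binary units you already need seven bipartitions over a 16-state joint distribution, and real proposals are far more demanding than this caricature, so I don't expect labs or legislators to compute anything like a complete metric unless someone hands them tractable proxies.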
In this world, adding some AI sentience research from the EA direction could have the consequences of:
1. Making AI labs have consciousness researchers on board so that they don't torture billions of iterations of the same AI.
2. Making governments create consciousness legislation and think tanks for the rights of AI.
3. Creating technical benchmarks and theories about what is deemed to be conscious (see this initial, really good report, for example).
You don't have to convince the general public; you have to convince the major stakeholders of tests that check for AI consciousness. It honestly seems kind of similar to what we have done for the safety of AI models, but for their consciousness instead?

I'm quite excited for this week, as it is a topic I'm very interested in but also something I feel I can't really talk about that much or take seriously, as it is a bit fringe, so thank you for having it!
Damn, I really resonated with this post.
I share most of your concerns, but I also feel that I have some even weirder thoughts on specific things, and I often feel like, "What the fuck did I get myself into?"

Now, as I've basically been into AI Safety for the last 4 years, I've really tried to dive deep into the nature of agency. You get into some very weird parts of trying to computationally define the boundary between an agent and the things surrounding it, and the division between individual and collective intelligence just starts to break down a bit.
At the same time, I've meditated a bunch and tried to figure out what the hell the "no-self" answer to the mind-body problem was all about, and I'm basically leaning more towards some sort of panpsychist IIT interpretation of consciousness at the moment.
I also believe that only the "self" can suffer and that the self is only in the map, not the territory. The self is rather a useful abstraction that is kept alive by your belief that it exists, since you will interpret the evidence that comes in as being part of "you". It is therefore a self-fulfilling prophecy, or part of "dependent origination".
A part of me then thinks the most effective thing I could do is examine the "self" definition within AIs to determine when it is likely to develop. This feels very much like a "what?" conclusion, so I'm just trying to minimise x-risk instead, as it seems like an easier pill to swallow.
Yeah, so I kind of feel really weird about it, so uhh, to feeling weird, I guess? Respect for keeping going in that direction though, much respect.
Startup: https://thecollectiveintelligence.company/
Democracy non-profit: https://digitaldemocracy.world/
So I've been working in a space very adjacent to these ideas for the last 6 months, and I think the biggest problem I have with this is just its feasibility.
That being said, we have thought about some ways of approaching a go-to-market (GTM) strategy for a very similar system. The system I'm talking about here is an algorithm to improve the interpretability and epistemics of organisations using AI.
One is to sell it to the C-suite as a way to "align" management teams lower down in the organisation, since this actually incentivises people to buy it.
A second is to run the system entirely on AI to prove that it increases the interpretability of AI agents.
A third is to prove it for non-profits by creating an open-source solution and directing it at them.
At my startup we're doing number two, and at a non-profit I'm helping we're doing number three. After doing some product-market-fit work, people weren't really that excited about number one, so we had a hard time getting traction, which meant a hard time building something.
Yeah, that's about it really, just reporting some of the experience of working on a very similar problem.
I appreciate you putting out a support post for someone who might have some EA leanings that would be good to pick up on. I may or may not have done so in the past and then removed the post because people absolutely shat on it on the forum, so respect.
I guess I felt that a lot of the post was arguing under a frame of utilitarianism, which is generally fair, I think. When it comes to "not leaving a footprint on the future", what I'm referring to is epistemic humility about the correct moral theories. I'm quite uncertain myself about what is correct when it comes to morality, with extra weight on utilitarianism. From this, we should be worried about being wrong and therefore try our best not to lock in whatever we're currently thinking. (The classic example being that if we had done this 200 years ago, we might still have slavery in the future.)
I'm a believer that virtue ethics and deontology are imperfect-information approximations of utilitarianism. Kant's categorical imperative, for example, is a way of looking at the long-term future and asking: how do we optimise society to be the best that it can be?
I guess a core crux here for me is that it seems like you're arguing a bit for naive utilitarianism. I don't really believe that AGI will follow the VNM axioms, that is, be fully rational. I think it will be an internal dynamical system weighing the different things it wants, and it won't fully maximise utility because it won't be internally aligned (see the toy sketch below for the kind of incoherence I have in mind). Therefore we need to get it right, or we're going to end up with weird and idiosyncratic values that are not optimal for the long-term future of the world.
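As a minimal sketch of that last point (purely my own toy example; the "drive" names and rankings are hypothetical, not from your post): if an agent's choices come from aggregating several internal drives, the aggregate can have cyclic pairwise preferences, which no utility function can represent, so the VNM picture of a coherent maximiser breaks down.

```python
# Toy sketch: an "agent" whose choices aggregate three internal drives by
# majority vote ends up with cyclic pairwise preferences, so no single utility
# function (and hence no VNM-coherent maximiser) describes its behaviour.
# The drive names and rankings are hypothetical, purely for illustration.
from itertools import combinations

drives = {
    "curiosity":         ["A", "B", "C"],
    "self-preservation": ["B", "C", "A"],
    "approval-seeking":  ["C", "A", "B"],
}

def drive_prefers(ranking, x, y):
    """A single drive prefers x to y if x appears earlier in its ranking."""
    return ranking.index(x) < ranking.index(y)

def agent_prefers(x, y):
    """The whole agent prefers x to y if a majority of its drives do."""
    votes = sum(drive_prefers(ranking, x, y) for ranking in drives.values())
    return votes > len(drives) / 2

for x, y in combinations("ABC", 2):
    winner, loser = (x, y) if agent_prefers(x, y) else (y, x)
    print(f"the whole agent prefers {winner} over {loser}")
# The three pairwise results form a cycle (A over B, B over C, C over A),
# which is incompatible with maximising any fixed utility function.
```

Majority voting over sub-drives is obviously a caricature of how a real system would aggregate internal objectives, but the point survives: internal misalignment can break exactly the coherence properties that naive-utilitarian arguments about AGI rely on.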
I hope that makes sense; I liked your post in general.
Yes, I was on my phone, and you can't link things there easily; that was what I was referring to.
I feel like this goes against the principle of not leaving your footprint on the future, no?
Like, a large part of what I believe to be the danger with AI is that we don't have any reflective framework for morality. I also don't believe the standard path to AGI is one of moral reflection. To me, this says that we'd leave the value of the future up to market dynamics, and that doesn't seem good, given all the traps in such a situation (Moloch, for example).
If we want a shot at a long reflection or something similar, I don't think full-sending AGI is the best thing to do.
How will you address the conflict-of-interest allegations raised against your organisation? It feels like the two organisations are awfully intertwined. For God's sake, the CEOs are sleeping with each other! I bet they even do each other's taxes!
I'm joining the other EA.
This was a dig at interpretability research. I'm pro-interpretability research in general, so if you feel personally attacked by this, it wasn't meant to be too serious. Just be careful, ok? :)
It makes sense for the dynamics of EA to naturally go this way (not that I endorse it); it is just applying the intentional stance plus the free energy principle to the community as a whole. I find myself generally agreeing with the first post at least, and I notice the large regularisation pressure being applied to individuals in the space.
I often feel the bad vibes associated with trying hard to get into an EA organisation. As a consequence, I'm doing for-profit entrepreneurship for AI safety adjacent to EA, and it is very enjoyable. (And more impactful, in my view.)
I will, however, say that the community in general is very supportive and that it is easy to get help with things if one has a good case and asks for it, so maybe we should make our structures more focused around that? I echo some of the points about making it more community-focused, however that might look. Good stuff OP, peace.
I did enjoy the discussion here in general. I hadn't heard of the "illusionist" stance before, and it does sound quite interesting, yet I also find it quite confusing.
I generally find there to be a big confusion about the relation of the self to what "consciousness" is. I was in this rabbit hole of thinking about it a lot, and I realised I had to probe the edges of my "self" to figure out how it truly manifested. A thousand hours into meditation, some of the existing barriers have fallen down.
The complex attractor state can actually be experienced in meditation, and it is what you would generally call a case of dependent origination or a self-sustaining loop (literally, lol). You can see through this by the practice of realising that the self-property of mind is co-created by your mind and that it is "empty". This is a big part of the meditation project (alongside loving-kindness practice; please don't skip the loving-kindness practice).
Experience itself isn't mediated by this "selfing" property; it is rather an artificial boundary we have created around our actions in the world for simplification reasons. (See Boundaries as a general way of framing how this occurs.)
So, the self cannot be the ground of consciousness; it is rather a computationally optimal structure for behaving in the world. Yet realising this fully is easiest done through your own experience, or n=1 science, meaning that to fully collect the evidence you will have to discover it through your own phenomenological experience (which makes it weird to bring into Western philosophical contexts).
So, since the self cannot be the ground, and since consciousness is a very conflated term, I like thinking more about different levels of sentience instead. At a certain threshold of sentience, the "selfing" loop is formed.

The claims and evidence he's talking about may be true, but I don't believe they justify the conclusions he draws from them.
Thank you for this post! I will make sure to read the 5/5 books that I haven't read yet; I'm especially excited about Joseph Henrich's book from 2020, as I had read The Secret of Our Success before but not that one.
I actually come to moral progress from an AI Safety interest. The question for me is, to some extent, how we can set up AI systems so that they continuously improve "moral progress", as we don't want to leave our fingerprints on the future.
In my opinion, the larger AI Safety dangers come from a "big data hell" like the ones described in Yuval Noah Harari's Homo Deus or Paul Christiano's slow-takeoff scenarios.
Therefore, we want to figure out how to set up AIs in a way that automatically improves moral progress through the structure of their use. I also believe that AI will most likely go through a process similar to the one described in The Secret of Our Success in the future, and that we should prepare appropriate optimisation functions for it.
So, if you ever feel like we might die from AI, I would love to see some work in that direction!
(Happy to talk more about it if you're up for it.)
I think that still makes sense under my model of a younger and less tractable field?
Experience comes partly from the field having been viable for a longer period of time, since there can be a lot more people who have worked in that area in the past.
Couldn't the lack of well-described steps and concrete near-term goals be described as a lack of easy tractability?
I'm not saying that longtermist proposals aren't worse today, but rather that things will probably look different in 10 years? A question that pops up for me is how good the proposals and applications were at the beginning of animal welfare as a field. I'm sure it was worse in terms of the legibility of the people involved and the clarity of the plans. (If anyone has any light to shed on this, that would be great!)
Maybe there's some sort of effect where the more money and talent a field gets, the better the applications get. To get there, though, you first have to have people spend on more exploratory causes? I feel like there should be anecdata from grantmakers on this.