Alex P

Karma: 12

Alex P Sep 21, 2022, 4:28 AM
1 point
0 ∶ 0
in reply to: Mau’s comment on: Why AGIs utility can’t outweigh humans’ utility?
>By “satisfaction” I meant high performance on its mesa-objective
Yeah, I’d agree with this definition.
I don’t necessarily agree with your two points of skepticism. For the first one, I’ve already mentioned my reasons. For the second, it’s true in principle, but it seems that almost anything an AI would learn semi-accidentally is going to be much simpler and more intrinsically consistent than human values. But I have low confidence in both, and in any case that’s beside the point; I was mostly trying to understand your perspective on what utility is.

Alex P Sep 21, 2022, 2:42 AM
1 point
0 ∶ 0
in reply to: Mau’s comment on: Why AGIs utility can’t outweigh humans’ utility?
I am familiar with the basics of ML and the concept of mesa-optimizers. “Building copies of itself” (i.e. multiplying) is an optimization goal you’d have to specifically train into the system. I don’t argue with that; I just think it’s a simple and “natural” goal (in the sense that it aligns reasonably well with instrumental convergence) that you can robustly train in comparatively easily.
“Satisfaction”, however, is not a term that I’ve met in the ML or mesa-optimizer context, and I think the confusion comes from us mapping this term differently onto these domains. In my view, “satisfaction” roughly corresponds to “loss function minimization” in ML terminology: the lower an AI’s loss, the higher the satisfaction it “experiences” (literally or metaphorically, depending on the kind of AI). Since any AI [built under the modern paradigm] is already working to minimize its own loss function, whatever that happens to be, we wouldn’t need to care much about the exact shape of the loss function it learns, except that it should robustly include “building copies of itself”. And since we’re presumably talking about superhuman AIs here, they would be very good at minimizing that loss function. So e.g. they can have some stupid goal like “maximize paperclips & build copies of self”; they’ll convert the universe to some mix of paperclips and AIs and experience extremely high satisfaction about it.
But you seem to mean something very different when you say “satisfaction”? Do you mind stating explicitly what it is?

Alex P Sep 20, 2022, 11:29 PM
1 point
0 ∶ 0
in reply to: Mau’s comment on: Why AGIs utility can’t outweigh humans’ utility?
My point is that getting the “multiply” part right is sufficient: the AI will take care of the “satisfaction” part on its own, especially given that it’s able to reprogram itself.
This assumes “[perceived] goal achievement” == “satisfaction” (aka utility), which was my assumption all along, but apparently is only true under preference utilitarianism.

Alex P Sep 20, 2022, 8:55 PM
1 point
0 ∶ 0
on: Why AGIs utility can’t outweigh humans’ utility?
Ok, so here’s my takeaway from the answers so far:
Most flavors of utilitarianism (except for preference utilitarianism) don’t count a goal-having agent achieving its goals as utility in itself. Instead, there is assumed to be some metric of similarity between the goals and/or mental states of the agent and those of humans, and the lower this similarity metric is, the less the agent’s achievement of its goals counts toward total utility. So completely alien agents achieving their alien goals and [non-]experiencing alien non-joy about it don’t register as adding utility.
How exactly this metric should be formulated is disputed and fuzzy, and quite often a lot of this fuzziness and uncertainty is swept under the rug with the word “sentience” (or something similar) written on it.
Additionally, the proportion of EAs who would seriously consider “all humans replaced by [a particular kind of] AIs” as an acceptable outcome may not be as trivial as I assumed.
Please let me know if I’m grossly misunderstanding or misrepresenting something, and thank you everyone for your explanations!

Alex P Sep 20, 2022, 8:47 PM
1 point
0 ∶ 0
in reply to: TW123’s comment on: Why AGIs utility can’t outweigh humans’ utility?
>It’s hard to imagine AI systems having this

Why? As per instrumental convergence, any advanced AI is likely to have a self-preservation drive, and a negative reward signal it would receive upon a violation of that drive would be functionally very similar to pain (give or take the bodily component, but I don’t think it’s required? Otherwise simulating a million human minds in agony would be OK, and I assume we agree it’s not). Likewise, any system with goal-directed agentic behavior would experience some reward from moving towards its goals, which seems functionally very similar to pleasure (or satisfaction or something along these lines).

Alex P Sep 20, 2022, 8:36 PM
1 point
0 ∶ 0
in reply to: Mau’s comment on: Why AGIs utility can’t outweigh humans’ utility?
Can you, um, coherently imagine an agent that does not try to achieve its own goals (assuming it has no conflicting goals)?

Alex P Sep 20, 2022, 6:47 PM
1 point
0 ∶ 0
in reply to: Mau’s comment on: Why AGIs utility can’t outweigh humans’ utility?
That’s true, but I think robustly embedding a goal of “multiply” is much easier than actual alignment. You can express it mathematically, you can use evolution, etc.
[To reiterate, I’m not advocating for any of this, I think any moral system that labels “humans replaced by AIs” as an acceptable outcome is a broken one]

Alex P Sep 20, 2022, 5:43 PM
1 point
0 ∶ 0
in reply to: TW123’s comment on: Why AGIs utility can’t outweigh humans’ utility?
So two questions (please also see my reply to HjalmarWijk for context):
1. Do you on these grounds think that insect suffering (and everything more exotic) is meaningless? Because our last common ancestor with insects hardly had any neurons, and unsurprisingly our neuronal architecture is very different, so there aren’t many reasons to expect any isomorphism between our “mental” processes.
2. Assuming an AI is sentient (in whatever sense you put into this word) but otherwise not meaningfully isomorphic to humans, how do you define a “positive” inner life in that case?

Alex P Sep 20, 2022, 5:35 PM
1 point
0 ∶ 0
in reply to: bbartlog’s comment on: Why AGIs utility can’t outweigh humans’ utility?
Ok, so the crux of my question was not understanding that non-preference utilitarianism exists, although now I’m even more confused, as I explained in my reply to HjalmarWijk. You also seem to be coming from the assumption that suffering (and I assume pleasure) exists separately from an agent achieving its goals, so I’m curious to hear how you define them?
>So for me there isn’t really a paradox to resolve when it comes to propositions like ‘the best future is one where an enormous number of highly efficient AGIs are experiencing as much joy as cybernetically possible, meat is inefficient at generating utility’.

Does this mean that you can agree with such a proposition?

Alex P Sep 20, 2022, 5:31 PM
1 point
0 ∶ 0
in reply to: Larks’s comment on: Why AGIs utility can’t outweigh humans’ utility?
Eliezer seems to come from the position that utility is more or less equal to “achieving this agent’s goals, whatever those are” and as such even agents extremely different from humans can have it (example of a trillion times more powerful AI). This is very different from [my understanding of] what HjalmarWijk above says, where utility seems to be defined in a more-or-less universal way and a specific agent can have goals orthogonal or even opposite to utility, so you can have a trillion agents fully achieving their goals and yet not a single “utiliton”.
Re other ethical systems: I’m mostly asking about utilitarianism, because it’s what nearly everyone working on alignment subscribes to, and also I know even less about other systems. But at first glance, it seems like either deontology or virtue ethics could have ways out of this problem? And for relativism or egoism it’s a non-issue.

Alex P Sep 20, 2022, 5:25 PM
1 point
0 ∶ 0
in reply to: HjalmarWijk’s comment on: Why AGIs utility can’t outweigh humans’ utility?
Aaaaahhhh, that’s it, “preference utilitarianism” is the concept I was missing! Or rather, I assumed that any utilitarianism is preference utilitarianism, in that it leaves the definition of what’s “good” or “bad” to the agents involved. And apparently that’s not the case?
Only now I’m even more confused. What is the “welfare” you’re referring to, if it is not the achievement of an agent’s goals? Saying things like “joy” or “happiness” or “maximum utility” doesn’t really clarify anything when we’re talking about non-human agents. How do you define utility in non-preference utilitarianism?

[Question] Why AGIs utility can’t outweigh humans’ utility?

Alex P Sep 20, 2022, 5:16 AM

6 points

25 comments 1 min read EA link

Alex P Sep 19, 2022, 11:42 PM
12 points
1 ∶ 5
on: Alex P’s Shortform
In the recent interview with Katja Grace referenced on ACX, she mentioned that many people may be opposed to slowing down AI progress because (I’m paraphrasing) they perceive AGI as a genie that will solve longevity and other problems and bring about the Cool Transhumanist Future, which they won’t see otherwise—due to their age and/or longer timelines. I have been hanging out in longevity/anti-aging spaces for a while and this perspective is exceedingly common there. People are very hyped about AI coming and curing aging, and dismiss any concerns about AI safety.
From this point of view, solving aging, or even just making tangible progress towards LEV (longevity escape velocity), will make those people less resistant to the notion of slowing/stopping AI progress. This is complementary to the often mentioned idea that longer life expectancy will cause people to care more about AGI (and other existential risks). This of course doesn’t imply that anti-aging is more important than direct work on AI alignment, but 1) it is likely more tractable and 2) not everyone can or wants to work on AI alignment directly.

Alex P’s Quick takes

Alex P Sep 19, 2022, 11:42 PM

1 point

1 comment EA link

Alex P Sep 19, 2022, 11:39 PM
−4 points
0 ∶ 0
in reply to: Patrick Wilson’s comment on: [Cause Exploration Prizes] Longer, healthier lives: public engagement around ageing and health
Not quite sure, but as far as I understand only the top 10 or so voted posts were getting any funding within this contest, and the contest is closed by now. There are definitely other ways to get funding from EA, but I’m one of the least qualified people on this forum to advise on those. Jack Harley (of LongevityWiki fame) is probably the right guy to ask about other avenues: he’s much more involved with the community, and he’s working on a similar task, public engagement around longevity and anti-aging.

Alex P Sep 2, 2022, 8:50 PM
3 points
0 ∶ 0
on: [Cause Exploration Prizes] Longer, healthier lives: public engagement around ageing and health
I think this area may well be one of the most promising new cause areas for EA. Getting the idea that aging may be [to an extent] treatable into the general public’s consciousness, and influencing regulatory bodies (FDA, EMA and such) and the professional community to make developing anti-aging treatments easier, is both tractable and neglected. If done successfully, it can also affect the allocation of the huge governmental and private funds currently directed to drug discovery, thus creating a leverage effect. It’s very unfortunate this got as few votes as it did.
