Aleksi Maunu

Karma: 88

Aleksi Maunu 23 Sep 2023 11:48 UTC
3 points
0 ∶ 0
in reply to: Zach Stein-Perlman’s comment on: Policy ideas for mitigating AI risk
Naively I would trade a lot of clearly-safe stuff being delayed or temporarily prohibited for even a minor decrease in chance of safe-seeming-but-actually-dangerous stuff going through, which pushes me towards favoring a more expansive scope of regulation.
(in my mind the potential loss of decades of life improvements currently pales vs potential non-existence of all lives in the longterm future)
Don’t know how to think about it when accounting for public opinion though. I expect a larger scope will gather more opposition to regulation, which could be detrimental in various ways, the most obvious being decreased likelihood of such regulation being passed/upheld/disseminated to other places.

Aleksi Maunu 23 Sep 2023 10:58 UTC
1 point
0 ∶ 0
in reply to: Zach Stein-Perlman’s comment on: AI Pause Will Likely Backfire
But the difficulty of alignment doesn’t seem to imply much about whether slowing is good or bad, or about its priority relative to other goals.
At the extremes, if alignment-to-“good”-values by default were 100% likely I presume slowing down would be net-negative, and racing ahead would look great. It’s unclear to me where the tipping point is: what kind of distribution over different alignment difficulty levels one would need to have to tip from wanting to speed up to wanting to slow down AI progress.
Seems to me like the more longtermist one is, the more slowing down looks good even when one is very optimistic about alignment. Then again there are some considerations that push against this: risk of totalitarianism, risk of pause that never ends, risk of value-agnostic alignment being solved and the first AGI being aligned to “worse” values than the default outcome.
(I realize I’m using two different definitions of alignment in this comment, would like to know if there’s standardized terminology to differentiate between them)

Aleksi Maunu 23 Sep 2023 10:02 UTC
2 points
1 ∶ 0
in reply to: Gerald Monroe’s comment on: AI Pause Will Likely Backfire
How is the “secretly is planning to murder all humans” improving the models scores on a benchmark?
(I personally don’t find this likely, so this might accidentally be a strawman)
For example: planning and gaining knowledge are incentivized on many benchmarks → instrumental convergence makes the model instrumentally value power among other things → a very advanced system that is great at long-term planning might conclude that “murdering all humans” is useful for power or other instrumentally convergent goals
You could prove this. Make a psychopathic model designed to “betray” in a game like world and then see how many rounds of training on a new dataset clear the ability for the model to kill when it improves score.
I think with our current interpretability techniques we wouldn’t be able to robustly distinguish between a model that generalized to behave well in any reasonable environment vs a model that learned to behave well in that specific environment but would turn back to betray in many other environments

Aleksi Maunu 23 Sep 2023 9:51 UTC
4 points
1 ∶ 0
in reply to: RobertM’s comment on: AI Pause Will Likely Backfire
GPT-4 doesn’t have the internal bits which make inner alignment a relevant concern.
Is this commonly agreed upon even after fine-tuning with RLHF? I assumed it’s an open empirical question. The way I understand it is that there’s a reward signal (human feedback) that’s shaping different parts of the neural network that determines GPT-4’s outputs, and we don’t have good enough interpretability techniques to know whether some parts of the neural network are representations of “goals”, and even less so what specific goals they are.
I would’ve thought it’s an open question whether even base models have internal representations of “goals”, either always active or only active in some specific context. For example if we buy the simulacra (predictors?) frame, a goal could be active only when a certain simulacrum is active.
(would love to be corrected :D)

Aleksi Maunu 9 Jul 2023 0:41 UTC
1 point
0 ∶ 0
on: EA Focusmate Group Announcement
Anyone else not able to join the group through the link? 🤔 It just redirects me to the dashboard without adding me in

Aleksi Maunu 16 Jun 2023 21:10 UTC
1 point
1 ∶ 0
in reply to: Joseph Lemien’s comment on: Joseph Lemien’s Shortform
In those cases I would interpret agree votes as “I’m also thankful” or “this has also given me a lot to think about”

Aleksi Maunu 3 Feb 2023 10:06 UTC
8 points
3 ∶ 0
in reply to: Ofer’s comment on: Questions about OP grant to Helena
I think the stated reasoning there by OP is that it’s important to influence OpenAI’s leadership’s stance and OpenAI’s work on AI existential safety. Do you think this is unreasonable?
To be fair I do think it makes a lot of sense to invoke nepotism here. I would be highly suspicious of the grant if I didn’t happen to place a lot of trust in Holden Karnofsky and OP.
(feel free to not respond, I’m just curious)

Aleksi Maunu 23 Jan 2023 17:12 UTC
3 points
0 ∶ 0
in reply to: LGS’s comment on: FLI FAQ on the rejected grant proposal controversy
I think if I was issuing grants, I would use misleading language in such a letter to make it less likely that the grantee organization can’t get registered for some bureaucratic reason. It’s possible to mention that to the grantee in an email or call too, so as not to cause any confusion. My guess would be that that’s what happened here but that’s just my 2 cents. I have no relevant expertise.

Aleksi Maunu 23 Jan 2023 13:37 UTC
9 points
0 ∶ 0
in reply to: Indra Gesink’s comment on: Disentangling “Improving Institutional Decision-Making”
Thanks for the comment! I feel funny saying this without being the author, but feel like the rest of my comment is a bit cold in tone, so thought it’s appropriate to add this :)
I lean more moral anti-realist but I struggle to see how the concepts of “value alignment” and “decision-making quality” are not just as orthogonal from a moral realist view as from an anti-realist view.
Moral realist frame: “The more the institution is intending to do things according to the ‘true moral view’, the more it’s value-aligned.”
“The better the institution’s decision-making process is at predictably leading to what they value, the better their ‘decision-making quality’ is.”
I don’t see why these couldn’t be orthogonal in at least some cases. For example, a terrorist organization could be outstandingly good at producing outstandingly bad outcomes.
Still, it’s true that the “value-aligned” term might not be the best, since some people seem to interpret it as a dog-whistle for “not following EA dogma enough” [link] (I don’t, although might be mistaken). “Altruism” and “Effectiveness” as the x and y axes would suffer from the problem mentioned in the post that it could alienate people coming to work on IIDM from outside the EA community. For the y-axis, ideally I’d like some terms that make it easy to differentiate between beliefs common in EA that are uncontroversial (“let’s value people’s lives the same regardless of where they live”), and beliefs that are more controversial (“x-risk is the key moral priority of our times”).
About the problematicness of “value-neutral”: I thought the post gave enough space for the belief that institutions might be worse than neutral on average, marking statements implying the opposite as uncertain. For example crux (a) exists in this image to point out that if you disagree with it, you would come to a different conclusion about the effectiveness of (A).
(I’m testing out writing more comments on the EA forum, feel free to say if it was helpful or not! I want to learn to spend less time on these. This took about 30 minutes.)

Aleksi Maunu 23 Jan 2023 12:56 UTC
1 point
0 ∶ 0
in reply to: weeatquince’s comment on: Disentangling “Improving Institutional Decision-Making”
(not the author)
4. When I hear “(1) IIDM can improve our intellectual and political environment”, I’m imagining something like if the concept of steelmanning becomes common in public discourse, we might expect that to indirectly lead to better decisions by key institutions.

Aleksi Maunu 3 Dec 2022 10:23 UTC
3 points
0 ∶ 0
on: Most students who would agree with EA ideas haven’t heard of EA yet (results of a large-scale survey)
Does anyone have thoughts on
1. How does the FTX situation affect the EV of running such a survey? My first intuition is that running one while the situation’s so fresh is worse than waiting 3-6 months, but I can’t properly articulate why.
2. What, if any, are some questions that should be added, changed, or removed given the FTX situation?

Aleksi Maunu 17 Nov 2022 2:53 UTC
1 point
0 ∶ 0
in reply to: S.E. Montgomery’s comment on: Some important questions for the EA Leadership
For what it’s worth, connecting SBF and Musk might’ve been a time-sensitive situation for one reason or another. There would’ve also still been time to debate the investment in the larger community before the deal would’ve actually gone through.

Aleksi Maunu 18 Oct 2022 13:37 UTC
9 points
3 ∶ 0
on: AI Safety Ideas: An Open AI Safety Research Platform
Small note: the title made me think the platform is made by the organization OpenAI

Aleksi Maunu 8 Oct 2022 13:23 UTC
3 points
0 ∶ 0
on: What does it mean for an AGI to be ‘safe’?
(After writing this I thought of one example where the goals are in conflict: permanent surveillance that stops the development of advanced AI systems. Thought I’d still post this in case others have similar thoughts. Would also be interested in hearing other examples.)
I’m assuming a reasonable interpretation of the proxy goal of safety means roughly this: “be reasonably sure that we can prevent AI systems we expect to be built from causing harm”. Is this a good interpretation? If so, when is this proxy goal in conflict with the goal of having “things go great in the long run”?
I agree that it’s epistemically good for people to not confuse proxy goals with goals, but in practice I have trouble thinking of situations where these two are in conflict. If we’ve ever succeeded in the first goal, it seems like making progress in the second goal should be much easier, and at that point it would make more sense to advocate using-AI-to-bring-a-good-future-ism.
Focusing on the proxy goal of AI safety also seems good for the reason that it makes sense across many moral views, while people are going to have different thoughts on what it means for things to “go great in the long run”. Fleshing out those disagreements is important, but I would think there’s time to do that when we’re in a period of lower existential risk.

Aleksi Maunu 8 Oct 2022 12:36 UTC
3 points
1 ∶ 0
in reply to: trevor1’s comment on: What does it mean for an AGI to be ‘safe’?
I’d be interested in the historical record for similar industries, could you quickly list some examples that come to your mind? No need to elaborate much.

Aleksi Maunu 7 Oct 2022 15:50 UTC
5 points
0 ∶ 0
in reply to: titotal’s comment on: Reasons for my negative feelings towards the AI risk discussion
Interesting, I hadn’t thought of the anchoring effect you mention. One way to test this might be to poll the same audience about other more outlandish claims, something like the probability of x-risk from alien invasion, or CERN accidentally creating a black hole.

Aleksi Maunu 1 Oct 2022 21:36 UTC
1 point
0 ∶ 0
in reply to: Guy Raveh’s comment on: Does biodiversity loss warrant being it’s own cause area?
disclaimer: I don’t feel like I know much about wild animal welfare, last read about it about 2 years ago
You’re right, I think suffering-focused wasn’t the right term to use, as all WAW interventions that come to my mind are about reducing animals’ suffering. I should’ve asked if you’re assuming that WAW people think that:
1. animals’ lives are usually net-negative
2. the best way to help them and future animals is to kill them / cause them to not exist
I would guess that (1) is a common belief, but that only a minority of people who work in WAW believe in (2). But this guess isn’t based on much.
The first WAW interventions that come to my mind:
- Vaccinating animal populations from painful diseases (and finding out a way to do so that doesn’t mess with ecosystems)
  - seems to be the first intervention idea that WAW advocates like to bring up
- Genetically engineering predators to be omnivores for the benefit of their prey
  - I read people arguing about this in the comments of a slatestarcodex post, don’t know if anyone’s actually pursuing research into the feasibility of it. Either way it seems like this would be super complicated for many reasons, so likely a far-off thing.
- Incentivizing people to build buildings from certain materials that don’t cause tons of bugs to exist underneath them (assumes that bugs’ expected life experiences are net negative)
  - probably from Brian Tomasik’s blog
Assuming success, only the last one seems to me like it would be bad from a biodiversity perspective.

Aleksi Maunu 21 Sep 2022 19:25 UTC
3 points
0 ∶ 0
in reply to: RayTaylor’s comment on: Does biodiversity loss warrant being it’s own cause area?
Doesn’t this depend on assuming negative utilitarianism, and suffering-focused ethic, or a particular set of assumptions about the net pleasure vs pain in the life of an ‘average’ animal?
I don’t think it depends on those things. What they meant by species not being inherently valuable is that each individual of a species is inherently valuable. It’s a claim that the species’ value comes from the value of the individuals (not taking into account value from stuff like possibly making ecological collapse less likely etc).
(I only read the beginning of your comment, sorry for not responding to the rest!)

Aleksi Maunu 21 Sep 2022 19:10 UTC
3 points
0 ∶ 0
in reply to: Jonas Kathage’s comment on: Does biodiversity loss warrant being it’s own cause area?
To the extent that moral uncertainty pushes you to give more credence to common sense ethical views, it does point towards prioritizing biodiversity more than a consequentialist view would otherwise imply, as “let’s preserve species” and “let’s preserve option value” are common sense ethical views. Probably not enough to affect prioritization in practice though.

Aleksi Maunu 21 Sep 2022 18:53 UTC
2 points
0 ∶ 0
in reply to: Guy Raveh’s comment on: Does biodiversity loss warrant being it’s own cause area?
How does biodiversity conflict with WAW? I would imagine that there are many possible interventions which are good both in terms of increasing the wellbeing of animals in the wild, and in keeping species from going extinct. Are you assuming a suffering-focused view of WAW?