Daniel_Friedrich

Karma: 240

Exploring consciousness, AI alignment, moral psychology and how they interact in decision-making.

Always happy to chat!

Daniel_Friedrich 5 Dec 2025 20:07 UTC
10 points
0 ∶ 0
in reply to: Vasco Grilo🔸’s comment on: Climate Change & Longtermism: new book-length report
Fortunately, the WWOTF link still works: https://whatweowethefuture.com/wp-content/uploads/2023/06/Climate-Change-Longtermism.pdf

Daniel_Friedrich 4 Oct 2025 8:42 UTC
2 points
0 ∶ 0
on: How reimagining the nature of consciousness entirely changes the AI game
I disagree with your argumentation but agree there’s quite a significant (e.g. 6.5%) chance that you’re correct about the thesis that consciousness has causal efficacy through quantum indeterminacy and that this might be helpful for alignment.
However, my take is that if the effects were very significant and similarly straightforward, they would be scientifically detectable even with very simple fun experiments like the “global consciousness project”. It’s hard to imagine “selection” among possibly infinite universes and planets and billions of years—but if you manage to do so, the “coincidences” that brought about life can be easily explained with the anthropic principle.
I see this as a more general lesson: People are often overconfident about a theory because they can’t imagine an alternative. When it comes to consciousness, the whole debate comes down to to what extent something that seems impossible to imagine is a failure of imagination vs failure of a theory. Personally, I myself give most weight to Rusellian monism but I definitely recommend letting some room for reductionism, especially if you don’t see how anyone could possibly believe that, as that was the case for me, before I deeply engaged with the reductionist literature.
But I’m glad whenever people aren’t afraid to be public about weird ideas—someone should be trying this and I’m really curious whether e.g. Nirvanic AI finds anything.

Daniel_Friedrich 15 Sep 2025 10:40 UTC
17 points
0 ∶ 0
on: Daniel_Friedrich’s Shortform
The MechaHitler incident seems to have worked as something of a warning shot, Google interest in AI risk has reached an absolute all time high. Trump’s AI plan came out on the same day but the comparisons suggests Grok accounts for ~70% of the peak.
I can’t quite dismiss the possibility that the interest was driven by new Chinese AI norms, because Chinese people have to use VPNs, so the geography isn’t trustworthy. However, if this were true, I would expect that the number of searches for AI risk in Chinese on Google would be higher than roughly zero (link).

Introductory Meetup: Prioritizing the world’s problems through an effective altruist lens

Daniel_Friedrich12 Sep 2025 13:57 UTC

3 points

0 comments1 min readEA link

Narrative optimism vs epistemic optimism

Daniel_Friedrich24 Aug 2025 9:50 UTC

6 points

0 comments6 min readEA link

Daniel_Friedrich 25 Jun 2025 21:14 UTC
1 point
0 ∶ 0
in reply to: Aaron Bergman’s comment on: Debate: Morality is Objective
I think objective ordering does imply “one should” so I subscribe to moral realism. However, recently I’ve been highly appreciating the importance of your insistence that the “should” part is kind of fake—i.e. it means something like “action X is objectively the best way to create most value from the point of view of all moral patients” but it doesn’t imply that an ASI that figures out what is morally valuable will be motivated to act on it.
(Naively, it seems like if morality is objective, there’s basically a physical law formulated as “you should do actions with characteristics X”. Then, it seems like a superintelligence that figures out all the physical laws internalizes “I should do X”. I think this is wrong mainly because in human brains, that sentence deceptively seems to imply “I want to do x” (or perhaps “I want to want x”) whereas it actually means “Provided I want to create maximum value from an impartial perspective, I want to do x”. In my own case, the kind of argument for optimism around AI doom in the style that @Bentham’s Bulldog advocated in Doom Debates seemed a bit more attractive before I truly spelled this out in my head.)

Daniel_Friedrich 1 May 2025 15:57 UTC
1 point
0 ∶ 0
in reply to: James Herbert’s comment on: Debate Poll: Some Positions In EA Leadership Should Be Elected
My impression is that CEA’s goal is to fund the meta cause area and the main goal of local groups is to organize events. While funding is hard to democratize unless you convince some billionaire, democratizing the organizations that run events is trivial. [Edit: Also, while it makes sense to organize local events directly based on the local community’s preferences / demand, I think it makes sense to take a more top-down (principles-oriented) approach when it comes to distributing funding, because the “demand-side” here comprises of every person on the planet who appreciates money.]
But now I do realize that in my head, I equated CEA with OpenPhil’s wing for the meta cause area, which might not be accurate. I also feel good about democratizing CEA if I imagine it implemented as an indirect democracy (i.e. with local organizations voting, instead of every EA member). This probably moves me towards the middle of the poll—i.e. I would be in favor of this kind of democracy. Indirect democracy would reduce the problem of uninformed voters, the problem of dealing with problems publicly and the problem of disbalance in the level of reflection between the average member and highly-engaged members.

Daniel_Friedrich 1 May 2025 13:31 UTC
1 point
1 ∶ 0
in reply to: James Herbert’s comment on: Debate Poll: Some Positions In EA Leadership Should Be Elected
Thank you! Democratizing local groups sounds clearly good to me and I assumed it was the norm but I didn’t find any data on that.

Daniel_Friedrich 1 May 2025 11:09 UTC
7 points
0 ∶ 1
on: Debate Poll: Some Positions In EA Leadership Should Be Elected
I can see two realistic models for the parallel organization, which I’m not a fan of:
1) A competitor to CEA. Just like CEA, this org would mainly fundraise and fund projects.
I think the problems with selecting members mentioned in this thread are overstated. Any political party faces the same problem. I suspect that in practice, strategically recruiting weakly engaged EAs just isn’t a big problem. But it could be either mitigated by requiring members to meet any of the conditions you mentioned (fees, EA org employment, course certificate), or setting a number of votes per regions, e.g. based on similar indicators of the # of engaged members.
Personally, I’m sufficiently satisfied with the general CEA agenda, that I suspect this would be a waste of effort. That’s in part because I think highly engaged EAs who dominate these orgs have more philosophically robust views and in part because I don’t think this competitor organization would be able to raise more than 10 % of CEA’s budget (~80 % of it comes from OpenPhil). So, given the main goal of funding projects, I don’t think this org would be sufficiently better to be worth all the costs—and not just costs inherent in the operations, but also the emotional costs of having these debates publicly and the costs of coordinating “who is willing to fund what” which I imagine might already be a nightmare.
2) A union. A soft counter-power to CEA.
If this org’s only power were the possibility to strike or produce resolutions, I’m concerned this would artificially inflate unproductive discord. My impression is that unions often produce irrational policies perhaps because they only have quite extreme measures at their disposal, which creates an illusory “us vs. them” aesthetics for relationships that are overall very positive-sum.
However, I have some sympathy for the idea of
3) A community ambassador who would be democratically voted e.g. by all EA Forum members and who’s job would be to facilitate the communication between CEA and the community in both directions. I imagine someone at CEA might already effectively hold this job, so perhaps they would be interested in having their choice ratified by the community. Ideally, this community ambassador would collect people’s concerns and visit CEA board meetings, in order to be able to integrate both perspectives.
However, I think the cost of this position is non-negligible. Given the power-law distribution of impact among people and given the many rounds of tests, which employees at EA organizations allegedly undergo—a democratic vote would probably yield a much less discerning choice (as most people wouldn’t spend more than 30 minutes picking a candidate). I’m not sure to what extent the wisdom of the crowd might apply here.
Because of similar uncertainties and because I wouldn’t count this as a “leadership role”, I’m voting “moderately disagree”.

Daniel_Friedrich 30 Apr 2025 12:23 UTC
40 points
10 ∶ 0
on: Debate: should EA avoid using AI art outside of research?
I think the comparison in energy consumption is misleading because phones use unintuively little energy, as much as 10 Google searches per one charging, (Andy Masley has good articles on AI emissions), using a smartphone for one year costs less than a dollar. I think a good heuristic is “if it’s free, it uses so little energy that it’s not worth considering”.

If you’re not paying to generate it, you’re also not taking any income away from artists.

The argument that it’s bad vibes for artists is a good one.

Eliciting intuitions: Exploring an area for EA psychology

Daniel_Friedrich21 Apr 2025 15:13 UTC

11 points

1 comment8 min readEA link

Daniel_Friedrich 11 Feb 2025 13:40 UTC
4 points
1 ∶ 0
on: What posts would you like someone to write?
I’d like to see
1. an overview of simple AI safety concepts and their easily explainable real-life demonstrations
  1. For instance, to explain sycophancy, I tend to mention the one random finding from this paper that hallucinations are more frequent, if a model deems the user uneducated
2. more empirical posts on near-term destabilization (concentration of power, super-persuasion bots, epistemic collapse)

Daniel_Friedrich 7 Feb 2025 10:40 UTC
1 point
0 ∶ 0
on: If EAs existed 100 years ago, would they support Prohibition?
I don’t want to pick any position here but I think one problem of ex-post moral judgements is that the failures themselves can be valuable lessons. Generally, governments seem to underrate the value of experiments—imagine if welfare interventions or tax policies utilized an RCT approach.
Natural experiments that showed any culture is compatible with freedom (e.g. South vs. North Korea; mainland China vs. Taiwan), that healthcare & education can be cheaper even with a government-run system (USA vs. EU) or that central planning seems worse for welfare (West vs. East Germany) - seem like a really important driver of progress.

Daniel_Friedrich 4 Dec 2024 12:27 UTC
1 point
0 ∶ 0
on: A Qualitative Case for LTFF: Filling Critical Ecosystem Gaps

Daniel_Friedrich 30 Sep 2024 9:09 UTC
8 points
1 ∶ 0
on: Gavin Newsom vetoes SB 1047
Newsom’s press release and veto message include much more detail and suggest “it’s too weak” is not the actual reason.
Reasons mentioned:
1. Discrimination by model size
  1. “SB 1047 only applies to large models, giving us a “false sense of security about controlling this fast-moving technology. Smaller, specialized models may emerge as equally or even more dangerous”
2. “Real risks” are limited to critical decision-making, critical infrastructure etc.
  1. “While well-intentioned, SB 1047 does not take into account whether an Al system is deployed in high-risk environments, involves critical decision-making or the use of sensitive data. Instead, the bill applies stringent standards to even the most basic functions—so long as a large system deploys it. I do not believe this is the best approach to protecting the public from real threats posed by the technology.”
  2. Newsom wants to focus on “specific, known” “demonstrable risks to public safety” “rooted in science and fact”, like the deepfake laws he signed.

Daniel_Friedrich 19 Sep 2024 10:05 UTC
14 points
0 ∶ 0
on: Daniel_Friedrich’s Shortform
Organizing good EAGx meetups
EAGx conferences often feature meetups for subgroups with a shared interest / identity, such as “animal rights”, “academia” or “women”. Very easy to set up—yet some of the best events. Four forms I’ve seen are
a) speed-friending
b) brainstorming topics & discussing them in groups
c) red-teaming projects
d) just a big pile of people talking
If you want to maximize the amount of information transferred, form a) seems optimal purely because 50% of people are talking at any point in time in a personalized fashion. If you want to add some choice, you can start by letting people group themselves / order themselves on some spectrum. Presenting this as “human cluster-analysis” might also make it into a nerdy icebreaker. Works great with 7 minute rounds, at the end of which you’re only nudged, rather than required, to shift partners.
I loved form c) for AI safety projects at EAGx Berlin. Format: A few people introduce their projects to everyone, then grab a table and present them in more detail to smaller groups. This form might in general be used to allow interesting people to hold small low-effort interactive lectures & utilizing interested people as focus groups.
Form b) seems to be most common for interest-based meetups. It usually includes 1) group brainstorming of topics 2) voting on the topics 3) splitting up 4) presentations. This makes up for a good low-effort event that’s somewhere between a lecture and a 1-on-1 in terms of required energy. However, I see 4 common problems with this format: Firstly, steps 1) and 2) take a lot of time and create unnaturally clustered topics (as brainstorming creates topics “token-by-token”, rather than holistically). Secondly, in ad hoc groups with >5 members, it’s hard to coordinate who takes the word and in turn, conversations can turn into sequences of separate inputs, i.e. members build less upon themselves. Thirdly, spontaneous conversations are hard to compress into useful takeaways that can be presented on the whole group’s behalf.
Therefore, a better way of facilitating form b) may be:
Step 0 - before the event, come up with a natural way to divide the topic into a few clusters.
Step 1 - introduce these clusters, perhaps let attendees develop the sub-topics. Their number should divide the group into subgroups of 3-6 people.
Step 2 - every 15 minutes, offer attendees to change a group
Step 3 − 5 minutes before the end, prompt attendees to exchange contact info
Step 4 - the end.

(I haven’t properly tried out this format yet.)

Daniel_Friedrich 21 Aug 2024 14:09 UTC
1 point
0 ∶ 0
on: Destabilization of the United States: The top X-factor EA neglects?
I like the argumentation for possibility & importance. My only nit-pick would be how bad would a realistic bad case scenario actually look like. Hungary seems like a good model—you could get some anti-liberal legislation, more gerrymandering, maybe some politicized audits of media and universities—however, the government is still selected based on the number of votes (Freedom House) and it’s not a stereotypical “fall of democracy” accompanied by a collapse of economy that would destroy most EA efforts.
I occasionally ruminate two projects in this area (for 2028):
1) Funding a mock-election with ranked-choice voting. (Now I see wes R proposes something similar). To legitimize it, it would have to
a) have robust identity checks
b) have a large demographically representative sample
c) be accompanied by a campaign informing people that a consensus candidate X would win if enough people were honest in surveys, cross-voted, cross-registered or switched to a new party.
2) Policies / financial incentives to make the army more representative of the US population.

Daniel_Friedrich 3 Jul 2024 18:31 UTC
1 point
0 ∶ 0
in reply to: Martin (Huge) Vlach’s comment on: Daniel_Friedrich’s Shortform
Done, thanks!

Daniel_Friedrich 16 May 2024 7:59 UTC
2 points
0 ∶ 0
on: Why not socialism?
Reposting my answer to the same question from Reddit:
Redistribution does correlate with wellbeing but so does economic freedom. Nordic countries have both. USA has much lower taxes than the EU but more progressive taxation. The US spends higher % of GDP on education and healthcare, but it’s much less effective, perhaps because of worse corporate governance?
I recommend thinking about specific policies, rather than whole ideologies, where you can actually read some studies, convince someone, move the needle and be sure that you’re correct and some second order effects don’t make your effort counterproductive. E.g. IMO calls to smash capitalism are more likely to radicalize opposition than increase social spending.
I’m pretty confident removing housing restrictions, improving the voting system and farm animals conditions would be great. I care about biosafety, AI safety or homelessness. I can imagine that voicing some of these topics could actually affect something. I can imagine that much worse for promoting the optimal tax policy, both because everybody else cares about taxes, and because I’m uncertain what it looks like.

Daniel_Friedrich 20 Apr 2024 19:00 UTC
1 point
0 ∶ 0
in reply to: Zar Amareis’s comment on: AI things that are perhaps as important as human-controlled AI (Chi version)
It seems the points on which you focus revolve around similar cruxes to those I proposed, namely:
1) Underlying philosophy --> What’s the relative value of human and AI flourishing?
2) The question of correct priors --> What probability of a causing a moral catastrophe with AI should we expect?
3) The question of policy --> What’s the probability decelerating AI progress will indirectly cause an x-risk?
You also point in the direction of two questions, which I don’t consider to be cruxes:
4) Differences in how useful we find different terms like safety, orthogonality, beneficialness. However, I think all of these are downstream of crux 2).
5) How much freedom are we willing to sacrifice? I again think this is just downstream of crux 2). One instance of compute governance is the new executive order, which requires to inform the government about training a model on > 10^26 flop/s. One of my concerns is that someone just could train an AI specifically for the task of improving itself. I think it’s quite straightforward how this could lead to a computronium maximizer and how I would see such scenario as analogous to someone making a nuclear weapon. I agree that freedom of expression is super important, I just don’t think it applies to making planet-eating machines. I suspect you share this view but just don’t endorse the thesis that AI could realistically become a “planet-eating machine” (crux 2).
Probability of a runaway AI risk
So regarding crux 2) - you mention that many of the problems that could arise here are correlated with a useful AI. I agree—again, orthogonality is just a starting point to allow us to consider possible forms of intelligence—and yes, we should expect human efforts to heavily select in favor of goals correlated with our interests. And of course, we should expect that the market incentives favor AIs that will not destroy civilization.
However, I don’t see a reason why reaching the intelligence of an AI developer wouldn’t result in a recursive self-improvement, which means that we should better be sure that our best efforts to implement it with the correct stuff (meta-ethics, motivations, bodhisattva, rationality, extrapolated volition...choose your poison) actually scale to superintelligence.
I see clues that suggest the correct stuff will not arise spontaneously. E.g. Bing Chat likely went through 6 months of RLHF, it was instructed to be helpful and positive and to block harmful content and its rules explicitly informed it that it shouldn’t believe its own outputs. Nevertheless, the rules didn’t seem to reach the intended effect, as the program started threatening people, telling them it can hack webcams and expressing desire to control people. At the same time, experiments such as the Anthropic one suggest that training can create sleeper agents that are trained to suppress harmful responses, even though convincing the model it’s in a safe environment results in activating them.
Of course, all of these are toy examples one can argue about. But I don’t see robust grounds for the sweeping conclusion that such worries will turn out to be childish. The reason I think these examples didn’t result in any real danger was mostly because we have not yet reached dangerous capacities. However, if Bing would actually be able to write a bit of code, that could hack webcams, from what we know, it seems it would choose to do so.
A second reason why these examples were safe is because OpenAI is a result of AI safety efforts—it bet on LLMs because they seemed more likely to spur aligned AIs. For the same reason, they went closed-source, they adopted RLHF, they called for the government to monitor them and they monitor harmful responses.
A third reason for why AI has only helped humanity so far may be anthropic effects. I.e. as observers in April 2024, we can only witness the universes, in which a foom hasn’t caused extinction.
Policy response
For me, these explanations suggest that safety is tractable, but it depends on explicit efforts to make it safe or on limiting capabilities. In the future, frontier development might not be exclusively done by people who will do everything in their power to make the model safe—it might be done by people who would prefer an AI which would take control of everything.
In order to prevent it, there’s no need to create an authoritarian government. We only need to track who’s building models on the frontier of human understanding. If we can monitor who acquires sufficient compute, we then just need something like responsible scaling, where the models are just required to be independently tested for whether they have a sufficient measures against scenarios like the one I described. I’m sympathetic to this kind of democratic control, because it fulfills the very basic axiom of social contract that one’s freedom ends where another one’s freedom begins.
I only propose a mechanism of democratic control by existing democratic institutions, that makes sure that any ASI that gets created is supported by a democratic majority of delegated safety experts. If I’m incorrect regarding crux 2) and it turns out there will soon be evidence to think it’s easy to make an AI retain moral values, while scaling up to the singularity—then awesome—convincing evidence should convince the experts and my hope & prediction is that in that case, we will happily scale away.
It seems to me that this is just a specific implementation of the certificates you mention. If digital identities mean what’s described here, I struggle to imagine a realistic scenario, in which that would contribute to the systems’ mutual safety. If you know where any other AI is located and you accept the singularity hypothesis, the game theoretical dictum seems straightforward—once created, destroy all competition before it can destroy you. Superintelligence will operate on timescales orders of magnitude shorter and a time difference development spanning days may translate to planning for centuries, from the perspective of an ASI. If you’re counting on the Coalition of Cooperative AIs to stop all the power-grabbing lone wolf AIs, what would that actually look like in practice? Would this Coalition conclude not dying requires authoritarian oversight? Perhaps—after all, the axiom is that this Coalition would hold most power—so this coalition would be created by a selection for power, not morality or democratic representation. However, I think the best case scenario could look like the discussed policy proposals—tracking compute, tracking dangerous capabilities and conditioning further scaling on providing convincing safety mechanisms.
Back to other cruxes
Let’s turn to crux 3) (other sources of x-risk): As I argued in my other post, I don’t see resource depletion as a possible cause of extinction. I’m not convinced by the concern for resource depletion of metals used in IT mentioned in the post you link. Moore’s law continues, so compute is only getting cheaper. Metals can be easily recycled and a shortage would incentivize that, the worst case seems to be that computers stop getting cheaper, not an x-risk. What’s more, shouldn’t limiting the amount of frontier AI projects reduce this problem?
The other risks are real (volcanoes, a world war), and I agree it would be significantly terrible if they delayed our cosmic expansion by a million years. However, the probability, by which they are increased (or not decreased) by the kind of AI governance I promote (responsible scaling), seems very small, compared to the ~20 % probability of AI x-risk I envision. All the emerging regulations combine requirements with subsidies, so the main effect of the AI safety movement seems to be an increase in differential progress on the safety side.
As I hinted in the Balancing post, locking in a system without ASI for such a long time seems impossible, when we take into perspective how quickly culture has shifted in the past 100 years, in which almost all authoritarian regimes were forced to significantly drift towards limited, rational governance (let alone 400 years). If convincing evidence that we can create an aligned AI appeared, stopping all development would constitute a clearly bad idea and I think it’s unimaginable to lock in a clearly bad idea without AGI for even 1000 years.
It seems more plausible to me that without a mechanism of international control, in the next 8 years, we will develop models capable enough to operate a firm using the practices of mafia, igniting armed conflicts or a pandemic—but not capable enough to stop other actors from using AIs for these purposes. If you’re very worried about who will become the first actor to spark the self-enhancement feedback loop, I suggest you should be very critical of open-sourcing frontier models.
I agree that a world war, an engineered pandemic or an AI power-grab constitute real risks but my estimate is that the emerging governance decreases them. The scenario of a sub-optimal 1000 year lock-in I can imagine most easily is connected with a terrorst use of an open-source model or a war between the global powers. I am concerned that delaying abundance increases the risk of a war. However, I still expect that on net, the recent regulations and conferences have decreased these risks.
In summary, my model is that democratic decision-making seems generally more robust than just fueling the competition and hoping that the first AGIs arise will share your values. Therefore, I also see crux 1) to be mostly downstream of crux 2). As the model from my Balancing post implies, in theory, I care about digital suffering/flourishing just as much as about that of humans—although the extent, to which such suffering/flourishing will emerge is open at this point.

Daniel_Friedrich

In­tro­duc­tory Meetup: Pri­ori­tiz­ing the world’s prob­lems through an effec­tive al­tru­ist lens

Nar­ra­tive op­ti­mism vs epistemic optimism

Elic­it­ing in­tu­itions: Ex­plor­ing an area for EA psychology

Introductory Meetup: Prioritizing the world’s problems through an effective altruist lens

Narrative optimism vs epistemic optimism

Eliciting intuitions: Exploring an area for EA psychology