Dan H
There isn’t an official body for utilitarianism, so no decisions are official. A community competition brings in more submissions and voices, and it is a less arbitrary process. I’ll try to have many utilitarian-minded people vote on the insignia rather than leaving the decision to just one person.
My main concern is that the format disproportionately encourages submissions from amateurs
We also crosspost on reddit to attract people who know how to design logos.
The claim that “symbolism is important” is not substantiated
I would need evidence before accepting the claim that imagery is basically worthless. Even in academic ML research, it’s a fatal mistake not to spend at least a day thinking about how to visualize the paper’s concepts. This mistake is nonetheless common.
A reminder that the competition ends this month!
I spent some time improving the design. Here is the current design in three color scheme options.
SVGs are here: https://drive.google.com/drive/folders/1ZxtJ_gf9T_H4AIZnpmCzPngp5h0XPojf?usp=sharing
Here are past logos by other people:
For concrete research directions in safety and several dozen project ideas, please see our paper Unsolved Problems in ML Safety: https://arxiv.org/abs/2109.13916
Note that some directions are less concretized than others. For example, it is easier to do work on Honest AI and Proxy Gaming than it is to do work on, say, Value Clarification.
Since this paper is dense for newcomers, I’m finishing up a course that will expand on these safety problems.
NeurIPS ML Safety Workshop 2022
[MLSN #6]: Transparency survey, provable robustness, ML models that predict the future
Even in deep learning, proofs have by and large been a failure. Proofs would be important, and many people are trying to find angles of attack for obtaining useful proofs about deep learning systems, so it is hard to say the area is neglected. Unfortunately, useful proofs are rarely tractable for systems as complex as deep learning systems. Given their importance, neglectedness, and tractability, I would not bet a substantial amount on proofs for deep learning systems compared to other interventions.
lasting effect on their beliefs
take new action(s) at work
These could mean a lot of things. Are there more specific results?
“AI safety” refers to ensuring that the consequences of misalignment are not majorly harmful
That’s saying that AI safety is about protective mechanisms and that alignment is about preventative mechanisms. I haven’t heard the distinction drawn that way, and I think that’s an unusual way to draw it.
Context:
Preventative Barrier: prevent initiating hazardous event (decrease probability(event))
Protective Barrier: minimize hazardous event consequences (decrease impact(event))
Broader videos about safety engineering distinctions in AI safety: [1], [2].
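If it helps, here is a toy sketch of that barrier distinction using the standard risk decomposition (expected harm as probability times impact); this is just my illustration of the framing being discussed, not something taken from the videos:

```python
# Toy illustration (assumed standard risk decomposition, not a quote from the
# linked videos): expected harm factors into the probability of the hazardous
# event and the impact if it occurs.
def expected_harm(p_event: float, impact: float) -> float:
    return p_event * impact

# On the framing above: preventative barriers (alignment) lower p_event;
# protective barriers (AI safety) lower impact.
print(expected_harm(0.10, 100.0))  # baseline: 10.0
print(expected_harm(0.01, 100.0))  # preventative barrier added: 1.0
print(expected_harm(0.10, 10.0))   # protective barrier added: 1.0
```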
I agree fitness is a more useful concept than rationality (and more useful than an individual agent’s power), so here’s a document I wrote about it: https://drive.google.com/file/d/1p4ZAuEYHL_21tqstJOGsMiG4xaRBtVcj/view
AI and Evolution
The failure of Redwood’s adversarial training project is unfortunately wholly unsurprising given almost a decade of similarly failed attempts at defenses to adversarial examples from hundreds or even thousands of ML researchers. For example, the RobustBench benchmark shows the best known robust accuracy on ImageNet is still below 50% for attacks with a barely perceptible perturbation.
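For intuition on what these attacks look like, here is a minimal sketch of the simplest one (one-step FGSM); `model`, `x`, and `y` are assumed to be a PyTorch classifier, an image batch scaled to [0, 1], and integer labels, and stronger attacks like PGD essentially iterate this step:

```python
# Minimal FGSM sketch (illustrative; assumes a PyTorch classifier `model`,
# inputs `x` in [0, 1], and integer labels `y`).
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps=8 / 255):
    """Take one step in the direction of the loss gradient's sign, bounded by
    eps per pixel, which keeps the perturbation barely perceptible."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()
```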
The better reference class is adversarially mined examples for text models. Meta and other researchers were working on similar projects before Redwood started that line of research; https://github.com/facebookresearch/anli is an example. (Reader: check the consistency of your model of what counts as alignment research. If we believe the RR project constituted exciting alignment research, does that mean non-x-risk-pilled Meta researchers also do some alignment research?)
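To make "adversarially mined examples" concrete, here is a rough sketch of the core loop; `model.predict` and `candidates` are hypothetical stand-ins, not the actual ANLI or Redwood pipelines, which put human writers in the loop to produce the candidates:

```python
# Rough sketch of adversarial data mining for a text classifier.
# `model.predict` and `candidates` are hypothetical stand-ins, not the ANLI
# or Redwood APIs: candidates are (text, label) pairs, typically written by
# humans trying to fool the current model.
def mine_adversarial_examples(model, candidates):
    """Keep candidates the current model gets wrong, so they can be added
    to the next round's training data before retraining."""
    mined = []
    for text, label in candidates:
        if model.predict(text) != label:
            mined.append((text, label))
    return mined
```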
Separately, I haven’t seen empirical demonstrations that pursuing this line of research can have limited capabilities externalities or result in differential technological progress. Robustifying models against some kinds of automatic adversarial attacks (1,2) does seem to be separable from improving general capabilities though, and I think it’d be good to have more work on that.
We recommend this article by an MIT CS professor which is partly about how creating a sustainable work culture can actually increase productivity.
This researcher’s work attitude is only one part of a spectrum. Many researchers find great returns from working 80+ hours a week. Some labs differentiate themselves by keeping normal hours, but many successful labs have their members work a lot, and that works out well. For example, Dawn Song’s students work a ton, and some Berkeley grad students in other labs are intimidated by her lab’s hours, but that’s OK because her graduate students find that environment suitable. It’d be nice if this post were more specific about how much of the work-culture discontent is about hours versus other issues.
I don’t think Redwood’s project had identical goals, and would strongly disagree with someone saying it’s duplicative.
I agree it is not duplicative. It’s been a while, but if I recall correctly the main difference seemed to be that they chose a task that gave them an extra nine of reliability (they started with an initially easier task) and pursued it more thoroughly.
think I’m comparably skeptical of all of the evidence on offer for claims of the form “doing research on X leads to differential progress on Y,”
I think if we find that improvement on X leads to improvement on Y, then that’s some evidence, but it doesn’t establish that the progress is differential. If we find that improvement on X also leads to progress on some thing Z that is highly indicative of general capabilities, then that’s evidence against. If we find that it mainly affects Y but not other things Z, then that’s reasonable evidence it’s differential. For example, so far, transparency hasn’t affected general capabilities, so I read that as evidence of differential technological progress. As another example, I think trojan defense research differentially improves our understanding of trojans; I don’t see it making models better at coding or gaining new general instrumental skills.
I think commonsense is too unreliable a guide when thinking about deep learning; deep learning phenomena are often unintelligible even in hindsight (I still don’t understand why some of my research papers’ methods work). That’s why I’d prefer empirical evidence. Empirical research claiming to differentially improve safety should demonstrate a differential safety improvement empirically.
I think the adversarial mining thing was hot in 2019. IIRC, HellaSwag and others did it; I’d venture maybe 100 papers did it before RR, but I still think it was underexplored at the time and I’m happy RR investigated it.
Aggregating Utilities for Corrigible AI [Feedback Draft]
Would love to identify and fund a well-regarded economist to develop AI risk models, if there were funding for it.
I think the “heart in a lightbulb” insignia for EA is a great design choice and excellent for outreach, but there is no such communicable symbol for utilitarianism. Companies spend a lot on design for outreach because visualization is not superfluous. I do not think the optimal spending is $0, as is currently the case. A point of the competition is finding a visual way of communicating a salient idea about utilitarianism suitable for broader outreach. I do not know which part is best to communicate or how best to communicate it; that’s part of the reason for the competition.