https://www.moreisdifferent.com/ https://moreisdifferent.substack.com/ https://twitter.com/moreisdifferent
Dan Elton
There are lots of good points here. I could say more, but here are just a few comments:
The obsession with intelligence is counterproductive. I worry a lot that EA is becoming too insular and that money and opportunities are being given out based largely on a perception of how intelligent people are and the degree to which people signal in-group status. The result is organizations like MIRI and Leverage, staffed by autists, that have burned through lots of money and human resources while only producing papers of marginal value. The fact that they don’t even bother to get most papers peer reviewed is really bad. Yes, peer review sucks and is a lot of work, but every paper I had peer reviewed was improved by the process. Additionally, peer review and being able to publish in a good journal is a useful (although noisy) signal to outsiders and funders that your work is at least novel and not highly derivative.
The focus on intelligence can be very off-putting and I suspect is a reason many people are not involved in EA. I know one person who said they are not involved because they find it too intimidating. While I have not experienced problems at EA events, there have been a few times at LessWrong events where people either directly or indirectly questioned my intelligence, and I found it off-putting. In one case, someone said “I’m trying to figure out how intelligent you are”. I remember times I had trouble keeping up with fast-paced EA conversations. There have also been conversations I’ve seen that appeared to be a bunch of people trying to impress each other and signal how intelligent they are rather than doing something constructive.
Age diversity is also an issue. Orgs that have similar values, like humanist orgs or skeptics orgs, have much greater age diversity. I think this is related to the focus on intelligence, especially superficial markers like verbosity and fluency/fast-talking, and the dismissal of skeptics and critics (people who are older tend to have developed a more critical/skeptical take on things due to greater life experience).
This is a very important subject. I’ve also recently come to a similar conclusion independently: a lot of the AI Safety work being funded by EA orgs (OpenPhil in particular sticks out) is prosaic AI alignment work that is not very neglected. Perversely, as the number of academic and commercial researchers and practitioners working on prosaic alignment has exploded, there has been a shift in the AI Safety community towards prosaic AI work and away from agent-foundations-type work. From what I hear, that has mainly been because of concerns that scaling transformer-like models will lead to existentially dangerous AI, which I personally think is misguided for reasons that would take too much space to go into here. [I personally think scaling up transformers / deep learning will be part of the AGI puzzle, but there are missing pieces, and I would rather people try to figure out what those missing pieces are than focus so much on safety and alignment with deep learning, which is not neglected at this point.]
I think this post missed how much prosaic AI safety work is being done in commercial settings, since it only focused on academia. I am currently working to deploy models from both commercial companies and academic research labs into hospital radiology clinics so we can gather real-world data on performance and get feedback from radiologists. The lack of robustness under distribution shift is a huge problem, and we are having to develop additional AI models to identify outliers and anomalies and to do automated quality control on images to make sure the right body part is shown and it’s in the right orientation for whatever AI model we are trying to run. I would argue that every major company trying to deploy ML/DL is struggling with similar robustness issues, and there are enormous commercial/market incentives to solve them. Furthermore, there is intense pressure on companies to make their models explainable (see the EU’s GDPR legislation, which gives users a “right to an explanation” in certain cases). We also see this with transformers: after various debacles at Google (like the black people = gorillas case in Google Photos, anti-semitic autocompletes, etc.) and Microsoft (Tay), I’m pretty confident both companies will be putting a lot of effort into aligning transformers (the current default for transformer models is to spew racist crap and misinformation at random intervals, which is a complete no-go from a commercial perspective).
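To make the kind of automated quality control I’m describing more concrete, here is a minimal sketch of one common approach (not our actual pipeline): score each incoming image’s embedding by its Mahalanobis distance from the training distribution and route outliers to manual review instead of the model. The feature extractor is assumed to run upstream; the embeddings, shapes, and threshold below are illustrative placeholders.

```python
# Sketch: flag out-of-distribution inputs via Mahalanobis distance on
# (hypothetical) precomputed image embeddings. Everything here is a
# placeholder, not a description of any particular deployment.
import numpy as np

def fit_reference(features: np.ndarray):
    """Fit mean/covariance on embeddings of known-good training images."""
    mean = features.mean(axis=0)
    cov = np.cov(features, rowvar=False)
    cov += 1e-6 * np.eye(cov.shape[0])  # regularize so the covariance is invertible
    return mean, np.linalg.inv(cov)

def mahalanobis_scores(features: np.ndarray, mean, inv_cov) -> np.ndarray:
    """Distance of each embedding from the reference distribution."""
    diff = features - mean
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, inv_cov, diff))

# Random stand-ins for real embeddings.
rng = np.random.default_rng(0)
train_feats = rng.normal(size=(500, 64))         # in-distribution reference set
new_feats = rng.normal(loc=3.0, size=(10, 64))   # shifted "new scanner" batch

mean, inv_cov = fit_reference(train_feats)
threshold = np.percentile(mahalanobis_scores(train_feats, mean, inv_cov), 99)
flags = mahalanobis_scores(new_feats, mean, inv_cov) > threshold
# Flagged studies would be sent to manual QC rather than to the downstream model.
```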
So, a lot (but not all!) of the work going on at Anthropic and Redwood, to be perfectly frank, seems very un-neglected to me, although there is an argument that academics will have trouble doing such work because of problems accessing compute and data. [To be clear, I still think it’s worth having EAs do alignment work with the AI methods/models at the frontier of various benchmarks, with an eye to where things are going in the future, but what I take issue with is the relative amount of EA funding flowing into such work right now.]
The amount of research going on in both industry and academia under the umbrella of “fairness, accountability, and transparency” has exploded recently (see the ACM FAccT conference). Research on explainability methods has exploded too (a very non-exhaustive list I compiled in early 2020 includes some 30 or so explainability methods, and I’ve seen graphs showing that the number of papers on explainability/interpretability/XAI is growing exponentially).
There is a very important point I want to make about explainability work in particular, though: most of it is “post-hoc” explainability where the main goal is increasing trust (which often, to be frank, just involves showing pretty pictures and telling stories, not empirically measuring whether trust actually increases and whether the increased trust is warranted). Sometimes researchers also want to debug failures and detect bias, although I have not found many non-trivial cases where this has been achieved. I believe 90+ percent of current explainability work has very little value for actually understanding how AI models work mechanistically and thus has no relevance to AI existential safety. I keep pointing people to the work of Adebayo et al. showing that the most popular XAI methods today are useless for actual mechanistic understanding, and the work of Hase et al. showing that popular XAI methods don’t help humans much in predicting how an ML model will behave. So, work on tools and methods for generating and validating mechanistic explanations in particular still seems very neglected. The only real work that goes to the “gears level” (i.e., the level of neurons/features and weights/dependencies) is the “Circuits” work by Olah et al. at OpenAI. The issue with mechanistic explainability work, though, is the high amount of work/effort required and the risk that methods will be hard to transfer to future architectures. For instance, if AI moves from transformers to something like Hebbian networks, energy-based models, spiking neural nets, or architectures designed by genetic algorithms (all plausible scenarios in my view), then entirely new tool chains will have to be built, and plausibly some methods will have to be thrown out completely and entirely new ones developed. Of course, this is all very unclear… I think more clarity will be obtained once we see how well the tools and methods developed by Olah et al. for CNNs transfer to transformers, which I hear is something they are doing at Anthropic.
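To illustrate what the Adebayo et al. style of sanity check looks like in practice, here is a minimal sketch in the spirit of their model-randomization test (the toy model and random inputs are placeholders, not their exact setup): compute a simple gradient saliency map, re-randomize the network’s weights, recompute the map, and compare. If the two maps look nearly the same, the “explanation” is not telling you much about what the trained model actually learned.

```python
# Sketch of a model-randomization sanity check in the spirit of Adebayo et al.
# Toy model and random data are illustrative placeholders.
import torch
import torch.nn as nn

def gradient_saliency(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """|d(top-class score)/d(input)| -- about the simplest post-hoc saliency map."""
    x = x.clone().requires_grad_(True)
    score = model(x).max(dim=1).values.sum()
    score.backward()
    return x.grad.abs()

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 10))
# ... imagine the model has been trained here ...
x = torch.randn(1, 1, 28, 28)  # stand-in for a real image

saliency_trained = gradient_saliency(model, x)

# Replace the learned weights with random ones and recompute the map.
for p in model.parameters():
    nn.init.normal_(p)
saliency_random = gradient_saliency(model, x)

similarity = torch.nn.functional.cosine_similarity(
    saliency_trained.flatten(), saliency_random.flatten(), dim=0
)
print(f"similarity between trained and randomized saliency maps: {similarity:.2f}")
```

A high similarity here is the worrying outcome: it means the saliency map is driven more by the input and architecture than by anything the model learned, which is exactly why I don’t think this class of methods buys much mechanistic understanding.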