How I Formed My Own Views About AI Safety

Neel Nanda · 27 Feb 2022 18:52 UTC

134 points

Inside vs. outside view · AI safety · Cause prioritization · Career choice · Existential risk · Epistemic deference · AI alignment · Community epistemic health · Building effective altruism

Link post

Disclaimer: I work as a researcher at Anthropic, but this post entirely represents my own views, rather than the views of my employer.

Introduction

I’ve spent the past two years getting into the field of AI Safety. One important message I heard as I was entering the field was that I needed to “form an inside view about AI Safety”, that I needed to form my own beliefs and think for myself rather than just working on stuff because people smarter than me cared about it. And this was incredibly stressful! I think the way I interpreted this was pretty unhealthy, caused me a lot of paralysing uncertainty and anxiety, and almost caused me to give up on getting into the field. But I feel like I’ve now reached a point I’m comfortable with, and where I somewhat think I have my own inside views on things and understand how to form them.

In this post, I try to explain the traps I fell into and why, what my journey actually looked like, and my advice for how to think about inside views, now I’ve seen what not to do. This is a complex topic and I think there are a lot of valid perspectives, but hopefully my lens is novel and useful for some people trying to form their own views on confusing topics (AI Safety or otherwise)! (Note: I don’t discuss why I do now think AI Safety is important and worth working on—that’s a topic for a future post!)

The Message of Inside Views

First, context to be clear about what I mean by inside views. As I understand it, this is a pretty fuzzily defined concept, but roughly means “having a clear model and argument in my head, starting from some basic and reasonable beliefs about the world, that gets me to a conclusion like ‘working on AI Safety is important’ without needing to rely on deferring to people”. This feels highly related to the concept of gears-level models. This is in comparison to outside views, or deferring to people, where the main reason I believe something is because smart people I respect believe it. In my opinion, there’s a general vibe in the rationality community that inside views are good and outside views are bad (see Greg Lewis’ In Defence of Epistemic Modesty for a good argument for the importance of outside views and deferring!). Note that this is not the Tetlockian sense of the words, used in forecasting, where outside view means ‘look up a base rate’ and inside view means ‘use my human intuition, which is terribly calibrated’, where the standard wisdom is outside view > inside view.

Good examples of this kind of reasoning: Buck Shlegeris’ My Personal Cruxes for Working on AI Safety, Richard Ngo’s AGI Safety from First Principles, Joseph Carlsmith’s report on Existential Risk from Power-Seeking AI. Note that, while these are all about the question of ‘is AI Safety a problem at all’, the notion of an inside view also applies well to questions like ‘de-confusion research/reinforcement learning from human feedback/interpretability is the best way to reduce existential risk from AI’, arguing for specific research agendas and directions.

How I Interpreted the Message of Inside Views

I’m generally a pretty anxious person and bad at dealing with uncertainty, and sadly, this message resulted in a pretty unhealthy dynamic in my head. It felt like I had to figure out for myself the conclusive truth of ‘is AI Safety a real problem worth working on’ and which research directions were and were not useful, so I could then work on the optimal one. And that it was my responsibility to do this all myself, that it was bad and low-status to work on something because smart people endorsed it.

This was hard and overwhelming because there are a lot of agendas, and a lot of smart people with different and somewhat contradictory views. So this felt basically impossible. But it also felt like I had to solve this before I actually started any permanent research positions (ie by the time I graduated) in case I screwed up and worked on something sub-optimal. And thus, I had to solve this problem that empirically most smart people must be screwing up, and do it all before I graduated. This seemed basically impossible, and created a big ugh field around exploring AI Safety. Which was already pretty aversive, because it involved re-skilling, deciding between a range of different paths like PhDs vs going straight into industry, and generally didn’t have a clean path into it.

My Journey

So, what actually happened to me? I started taking AI Safety seriously in my final year of undergrad. At the time, I bought the heuristic arguments for AI Safety (like, something smarter than us is scary), but didn’t really know what working in the field looked like beyond ‘people at MIRI prove theorems I guess, and I know there are people at top AI labs doing safety stuff?’ I started talking to lots of people who worked in the field, and gradually got data on what was going on. This was all pretty confusing and stressful, and was competing with going into quant finance—a safe, easy, default path that I already knew I’d enjoy.

After graduating, I realised I had a lot more flexibility than I thought. I took a year out, and managed to finagle my way into doing three back-to-back AI Safety internships. The big update was that I could explore AI Safety without risking too much—I could always go back into finance in a year or two if it didn’t work out. I interned at FHI, DeepMind and CHAI—working on mathematical/theoretical safety work, empirical ML-based work on fairness and bias, and empirical interpretability work respectively. I also did the AGI Fundamentals course, and chatted to a lot of people at the various orgs I worked at and at conferences. I tried to ask all the researchers I met about their theory of change for how their research actually matters. One thing that really helped me was chatting to a researcher at OpenAI who said that, when he started, he didn’t have clear inside views. But that he’d formed them fairly organically over time, and just spending time thinking and being in a professional research environment was enough.

At the end of the year, I had several offers and ended up joining Anthropic to work on interpretability with Chris Olah. I wasn’t sure this was the best option, but I was really excited about interpretability, and it seemed like the best bet. A few months in, this was clearly a great decision and I’m really excited about the work, but it wouldn’t have been the end of the world if I’d decided the work wasn’t very useful or a bad fit, and I expect I could have left within a few months without hard feelings. As I’ve done research and talked to Chris + other people here, I’ve started to form clearer views on what’s going on with interpretability and the theory of impact for it and Anthropic’s work, but there are still big holes in my understanding where I’m confused or deferring to people. And this is fine! I don’t think it’s majorly holding me back from having an impact in the short-term, and I’m forming clearer views with time.

My Advice for Thinking About & Forming Inside Views

Why to form them?

I think there are four main reasons to care about forming inside views:

Truth-tracking—having an impact is hard! It’s really important to have true beliefs, and the best way to find them is by trying hard to form your own views and ensuring they correlate with truth. It’s easy to get deferring wrong if you trust the wrong people.
- I’m pretty unconvinced by this one—it doesn’t seem that hard to find people smarter than me, who’ve thought about each problem for longer than I have, and just believe whatever they believe. Especially if I average multiple smart people’s beliefs
  - Eg, I haven’t thought too much about biosecurity, but will happily defer to people like Greg Lewis on the topic!
Ensuring good community epistemic health—Maybe your personal inside view will track the truth less well than the best researchers. But it’s not perfectly correlated! If you try hard to find the truth on your own, you might notice ideas other people are missing, can poke holes in popular arguments, etc. And this will make the community as a whole better off
- This one is pretty legit, but doesn’t seem that big a deal. Like, important, sure, but not something I’d dedicate more than 5% of my effort towards max
- It seems particularly important to avoid information cascades where I work on something because Alice thinks it matters, and then Bob is a bit skeptical of Alice alone but observes that both Alice and I believe it matters, and works on it even harder, Charlie sees me, Alice and Bob, etc. This is a main reason I try hard to distinguish between what I believe all things considered (including other people’s views) and what I believe by my own lights (according to my own intuitions + models of the world)
Motivation—It’s really hard to work on something you don’t believe in!
- I personally overthink things, and this one is really important to me! But people vary—this is much more a fact about personal psychology than an abstract statement about how to have an impact
Research quality—Doing good research involves having good intuitions and research taste, sometimes called an inside view, about why the research matters and what’s really going on. This conceptual framework guides the many small decisions and trade-offs you make on a daily basis as a researcher
- I think this is really important, but it’s worth distinguishing this from ‘is this research agenda ultimately useful’. This is still important in eg pure maths research just for doing good research, and there are areas of AI Safety where you can do ‘good research’ without actually reducing the probability of x-risk.
  - Toy example: Let’s say there are ten good AI Safety researchers in the world, who all believe different things. My all-things-considered view should put 10% credence on each person’s view. But I’ll get much more research done if I randomly pick one person and fully adopt their views and dive into their research agenda. So, even if only one researcher is correct, the latter strategy is much better in expected value.
- This is one of the main reasons that mentorship is so key. I have become a way more effective interpretability researcher by having ready access to Chris to ask for advice, intuitions and direction. And one of my top priorities is absorbing as many of his conceptual frameworks as I can
  - More generally, IMO the point of a research mentor is to lend you their conceptual frameworks to advise you on how to make the right decisions and trade-offs. And you slowly absorb their frameworks by supervised learning, and build on and add to them as you grow as a researcher
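
The toy example above (ten researchers, pick one and dive in) only comes out ahead under an extra assumption: that research output grows faster than linearly with how much of your effort goes into a single agenda. Here is a minimal sketch of that in Python. The function names, the single unit of effort, and the `alpha` exponent are all hypothetical choices made for illustration, not part of the original argument.

```python
# Toy model (illustrative assumptions only):
# - exactly one of n_agendas is correct, each equally likely
# - you have 1.0 units of effort in total
# - effort e on the correct agenda produces e ** alpha units of value,
#   and effort on a wrong agenda produces nothing
# - alpha > 1 encodes the assumption that focus has increasing returns


def expected_value_spread(n_agendas: int, alpha: float) -> float:
    """Split effort evenly across all agendas (act on the 10%-each view)."""
    effort_per_agenda = 1.0 / n_agendas
    # Whichever agenda is correct, it received exactly this much effort.
    return effort_per_agenda ** alpha


def expected_value_focus(n_agendas: int, alpha: float) -> float:
    """Pick one agenda uniformly at random and put all effort into it."""
    p_correct = 1.0 / n_agendas
    value_if_correct = 1.0 ** alpha
    return p_correct * value_if_correct


if __name__ == "__main__":
    for alpha in (1.0, 2.0):
        spread = expected_value_spread(10, alpha)
        focus = expected_value_focus(10, alpha)
        print(f"alpha={alpha}: spread={spread:.4f}, focus={focus:.4f}")
```

With `alpha = 1` (no benefit from focus) the two strategies tie, so in this sketch the case for fully adopting one person's views rests entirely on focus having increasing returns.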

These are pretty different, and it’s really important to be clear about which reasons you care about! Personally, I mostly care about motivation > research quality = impact >> community epistemics

How to form them?

Talk to people! Try to absorb their inside views, and make them your own
- Importantly, the goal is not to defer to them, it’s to understand what they believe and why.
- My main tool for this is to ask lots of questions, and then paraphrase—summarise back my understanding in my own words, and ask what’s wrong or what I’m missing.
  - My default question is ‘so, why, concretely, does your research direction reduce existential risk from AI?’
  - Or, ‘what are the biggest ways you disagree with other researchers?’ Or ‘why aren’t you working on X?’
- I really, really love paraphrasing! A few reasons it’s great:
  - It forces you to actively listen and process in the moment
  - It’s much easier to correct than teach—the other person can easily identify issues in your paraphrase and correct them
  - It makes it obvious to myself if I’m confused or don’t understand something, or if I’m deferring on any points—it’s awkward to say things that are confused!
  - Once I get it working, I have now downloaded their mental model into my head and can play around with it
  - Once you’ve downloaded multiple people’s models, you can compare them, see how they differ, etc
- A variant—focus on cruxes: key claims where, if they changed their mind, they’d change what they work on.
  - This is really important—some people work on a direction because they think it’s the most important, other people work on it because eg it’s a good personal fit or they find it fun. These should be completely different conversations
- A variant—write a google doc summarising a conversation and send it to them afterwards for comments. This can work great if you find it hard to summarise in the moment, and can produce a good artefact to publish or share—I’d love it if people did this more with me
You have permission to disagree (even with really cool and high-status people)
- This was a big update for me! Someone being smart and competent just means they’re right more often, not that they’re always right
- It really helps to have a low bar for asking dumb questions—if you poke at everything that might be wrong, 90% of the time they’re right and you learn something, and 10% of the time they missed something
- For example, I’ve done research in the past that, in hindsight, I don’t think was particularly useful. And this is totally fine!
- Empirically, there’s a lot of smart people who believe different and contradictory things! It’s impossible for all of them to be right, so you must disagree with some of them. Internalising that you can do this is really important for being able to think clearly
Don’t be a monk—you form an inside view by going out in the world and doing things—not just by hiding away and thinking really hard
- Eg, just try doing research! Spend 10 hours pursuing something, write up a blog post, fail, succeed, hear criticism, see what you learn and make updates
- Talk to lots of people!
- Live your life, and see what happens—my thoughts naturally change a lot over time
- It’s valuable to spend some time reading and thinking, but if this is all you do I think that’s a mistake
Think from first principles (sometimes)
- Concrete exercise: Open a blank google doc, set a one hour timer, and start writing out your case for why AI Safety is the most important problem to work on. Spend the full hour on this, and if you run out of steam, go back through and poke at everything that feels confusing, or off, or dodgy. Write out all the counter-arguments you can think of, and repeat
- This definitely isn’t all you should do, but I think this is a really useful exercise for anything confusing!
Don’t just try harder—I have a failure mode I call pushing the Try Harder button where I label something as important and just try to channel a lot of willpower and urgency towards it. Don’t do that! This takes a long time, and a lot will happen naturally as you think, talk to people, and do research.
- If you find this really stressful, you have my permission to chill and not make it a priority for a while!
- I’ve found my inside views develop a lot over time, fairly organically
Inside vs outside views is a spectrum—there’s no clear division between thinking for yourself and deferring. Forming inside views starts out by deferring, and then slowly forming more and more detailed models of where I’m deferring and why over time
- My views have gone fairly organically from naive stories like ‘AGI seems scary because intelligence is important and smart people think this matters’ to more detailed ones like ‘I think one reason AGI is scary is inner misalignment. Because neural networks have the base optimiser of stochastic gradient descent, the network may end up as a mesa-optimiser with a different mesa-objective. And this may create an instrumental incentive for power seeking’. The latter story is way more detailed, but still includes a lot of implicit deferring—eg that we’ll get AGI at all, that it’ll be via deep learning, that mesa-optimisers are a thing at all, that there’s an instrumental incentive for power seeking, etc. But expanding the tree of concepts like this is what progress looks like!
- Or, ‘I should work on AI because AGI will happen eventually—if nature did it, so can we’ to ‘AGI is compute constrained. Using the bioanchors method to link to the size of the human brain gives 30-ish year AI timelines for human-level AI. I believe AGI is compute constrained because of some heuristic arguments about empirical trends, and because lots of smart people believe this’
- Getting here looks like downloading other people’s gears level models into your head, and slowly combining them, deleting parts you disagree with, adding ideas of your own, etc

Misc

Defer intelligently—Don’t just adopt someone’s opinions as your own because they’re charismatic, high status, or well-credentialed. Think about why you think their opinions track the truth better than your own, and in which areas you’re willing to defer to them. Figure out how hard they’ve thought about this, and whether they’ve taken the belief seriously
- One key question is how much feedback they get from the world—would they know if they were wrong? I think some fields score much better on this than others—I’m a lot more comfortable disagreeing with many moral philosophy professors and being a committed consequentialist than I am with eg disagreeing with most algebraic geometers. Mathematicians get feedback re whether their proofs work in a way that, as far as I can tell, moral philosophy doesn’t
- And be domain specific—I’d defer to a Cambridge maths professor about mathematical facts, but not on a topic like ‘how best to teach maths to undergraduates’ - they clearly haven’t done enough experimentation to tell if they’re missing out on vastly better methods
You can act without an inside view
- Forming a good inside view takes a really long time! I’ve been doing full-time safety research for the past year and a bit and I’m still very confused
  - An analogy—a PhD is essentially a training program to give people an inside view for a specific research area. And this takes several years! IMO a question like ‘is AGI an existential risk’ is much harder than most thesis topics, and you don’t have a hope of really understanding it without that much work
- You can always change your mind and pivot later! Make the best expected value bet given what you know at the time, and what information you might get in future
- Gathering information has costs! Sometimes thinking harder about a problem is analysis paralysis, and it’s worth just running with your best guess
- I think it’s good to spend maybe 10% of your time long-term on high-level thinking, strategy, forming inside views, etc—a lot of your time should be spent actually doing stuff!
  - Though it’s OK to spend a higher percentage early on when you have major decisions like what career path to go down.
You don’t have to form an inside view—Forming inside views that track the truth is hard, and it’s a skill. You might just be bad at it, or find it too stressful. And this is fine! It shouldn’t be low-status or guilt-inducing to just do what people more competent than you recommend
- You can be a great research assistant, ops person, engineer etc without having a clear inside view—just find someone smart who you trust, explain your situation, and do what they think is best
  - I think the main reason this is a bad idea is motivational, not really about truth-tracking. And it’s up to you how much you care about this motivationally!
  - An analogy: I think basically all AI Safety researchers who have ideas for an agenda should get funded, even if I personally think their agenda is BS. Likewise, I want them all to have enough labour available to execute well on their agenda—picking the agenda you’re the best personal fit for and just deferring is a good way to implement this in practice.
Aim high, but be OK with missing—It’s valuable and important practice to try forming inside views, but it’s also pretty hard! It’s OK to struggle and not make much progress
- IMO, trying to think for yourself is great training—it’ll help you think more clearly, be harder to con, become a better researcher, etc.
- Outside view: The vast majority of the world thinks AI Safety is nonsense, and puts very few resources towards it. This is worth taking seriously! You shouldn’t throw your life away on a weird and controversial idea without thinking seriously about it first
- This is a good way to trade-off between motivation and truth-tracking—so long as I try hard to think for myself, I feel OK motivationally, even if I know that I may not be tracking truth well
  - In practice, I try hard to form my own views, but then make big decisions by deferring a lot and forming an all-things-considered view, which I expect to track truth better
- If you aren’t doing full-time research, it’s much harder to form clear views on things! This is a really hard thing you’re trying to do
Convey mindsets, not inside views—If you’re talking to someone else about this stuff, eg while community building, it’s important to try to convey the spirit and mindset of forming inside views, more so than your actual views. Try to convey all of the gears-level models in your head, but make it clear that they’re just models! Try to convey what other people believe in.
- I try hard to be clear about which beliefs I’m confident in, which are controversial, which points I’m deferring on, and which things I’ve thought hard about. I think this is important for avoiding information cascades, and building a healthy community
- Relatedly, if you’re mostly doing community building, it’s totally fine to not have inside views on hard technical questions like AI Safety! Your goal is more to help people in your community form their own views on things—having views of your own is helpful but not essential.

12 comments · 14 min read

Miranda_Zhang 28 Feb 2022 13:49 UTC
12 points
Thank you for this. I am a community-builder and I’ve definitely started emphasizing the importance of developing inside views to my group members. However, it seems like there may be domains where developing an inside view is relatively less important (e.g., algebraic geometers vs moral philosophers), because experts in that field appear to have better feedback loops. Given this, I’m curious whether you think community-builders might want to form inside views* on which areas to emphasize inside view formation for, to help communicate more accurately to our members?
*I’m not confident I’m describing an ‘inside view.’ Maybe this is something like, ‘getting a sense of outside views across an array of domains?’
I found your post doubly useful because I’ve recently been exploring how I can form inside views, which I’ve found both practically and emotionally difficult. Not being familiar with the rationality or AI safety community, I was surprised by how much emphasis was placed on inside views and started feeling a bit like an imposter in the EA community. I definitely felt like it was “low-status” to not have inside views on the causes I prioritized, though I expect at least some of this was due to my own anxiety.
Being able to see how you tackled this is really useful, as it gives me another model for how I could develop inside views (particularly on AI risk, which is the first thing I’m working on). It also reinforces that a lot of people have more career flexibility than they think—and so, perhaps, it’s okay if I haven’t figured out whether I should switch from community building into AI safety research in the three months before I graduate!
- Jamie B 3 Mar 2022 10:54 UTC
  4 points
  Hey! I have been thinking about this a lot from the perspective of a confused community builder / pipeline strategist, too. I didn’t get as far as Neel, so it’s been great to read this post before getting anywhere near finishing my thoughts. It captures a lot of the same things better than I had. Thanks for your comment too—definitely a lot of overlap here!
  I have got as far as some ideas, here, and would love any initial thoughts before I try to write it up with more certainty?
  First a distinction which I think you’re pointing at—an inside view on what? The thing I can actually have an excellent inside view about as a (full-time) community builder is how community building works. Like, how to design a programme, how people respond to certain initiatives, how likely certain things are to work, etc.
  Next, programmes that lead to working in industry, academic field building, independent research, etc, look different. How do I decide which to prioritise? This might require some inside view on how each direction changes the world (and interacts with the others), and lead to an answer on which I’m most optimistic about supporting. There is nobody to defer to here, as practitioners are all (rightly) quite bullish about their choice. Having an inside view on which approach I find most valuable will lead to quite concrete differences in the ultimate strategy I’m working towards or direction I’m pointing people in, I think.
  When it comes to what to think about object-level work (i.e. how does alignment happen, technically), I get more hazy on what I should aim for. By statistical arguments, I reckon most inside views that exist on what work is going to be valuable are probably wrong. Why would mine be different? Alternatively, they might all be valuable, so why support just one. Or something in between. Either way, if I am doing meta work, it will probably be wrong to be bullish about my single inside view on ‘what will go wrong’. I think I should aim to support a number of research agendas if I don’t have strong reasons to believe some are wrong. I think this is where I will be doing most of my deferral, ultimately (and as the field shifts from where I left it).
  However, understanding how valuable the object-level work is does seem important for deciding which directions to support (e.g. academia vs industry), so I’m a bit stuck on where to draw a line. As Neel says, I might hope to get as far as understanding what other people believe about their agenda and why—I always took this as “can I model the response person X would give, when considering an unseen question”, rather than memorising person X’s response to a number of questions.
  I think where I am landing on this is that it might be possible to assume uniform prior over the directions I could take, and adjust my posterior by ‘learning things’ and understanding their models on both the direction-level and object-level, properly. Another thought I want to explore—is this something like a worldview diversification over directions? It feels similar, as we’re in a world where it ‘might turn out’ some agenda or direction was correct, but there’s no way of knowing that right now.
  To confirm—I believe people doing the object-level work (i.e. alignment research) should be bullish about their inside view. Let them fight it out, and let expert discourse decide what is “right” or “most promising”. I think this amounts to Neel’s “truth-tracking” point.
  - Miranda_Zhang 4 Mar 2022 3:43 UTC
    2 points
    Hey Jamie, thanks for this! Seems like you’ve thought about it quite a bit—probably more than I have—but here are my initial thoughts. Hope this is helpful to you; if so, maybe we should chat more!
    First a distinction which I think you’re pointing at—an inside view on what? [...] How do I decide which to prioritise? This might require some inside view on how each direction changes the world (and interacts with the others), and lead to an answer on which I’m most optimistic about supporting. There is nobody to defer to here, as practitioners are all (rightly) quite bullish about their choice. Having an inside view on which approach I find most valuable will lead to quite concrete differences in the ultimate strategy I’m working towards or direction I’m pointing people in, I think.
    Agree! When I first wrote my comment, I labelled this a ‘meta-inside view:’ an inside view on what somebody (probably you, but possibly others like your group members) need to form inside views on. But this might be too confusing compared to less jargon-y phrases like, ‘prioritizing what you form an inside view on first’ or something.
    Regardless, I think we are capturing the same issue here—although I don’t use ‘issue’ in a negative sense. In my ideal world, community-builders would form pretty different views on causes to prioritize because this would help increase intellectual diversity and the discovery of the ‘next-best’ thing to work on. That doesn’t mean, however, that there couldn’t be some sort of guidance for how community-builders might go about figuring out what to prioritize.
    I think this is where I will be doing most of my deferral, ultimately (and as the field shifts from where I left it).
    Yeah, I think this is the status quo for any field that one isn’t an expert on. Community-builders may be experts on community-building, but that doesn’t extend to other domains, hence the need for deferral. Perhaps the key difference here is that community-builders need to be extra aware of the ever-shifting landscape and stay plugged-in, since their advice may directly impact the ‘next generation’ of EAs.
    However, understanding how valuable the object-level work is does seem important for deciding which directions to support (e.g. academia vs industry), so I’m a bit stuck on where to draw a line. As Neel says, I might hope to get as far as understanding what other people believe about their agenda and why—I always took this as “can I model the response person X would give, when considering an unseen question”, rather than memorising person X’s response to a number of questions.
    Hmm, I think you’re right that developing an inside view for a specific cause would influence the levers that you think are most important (which has effects on your CB efforts, etc.) - but I’m not sure this has many implications for what CBs should do. My prior is that it is very unlikely that there are any causes where only a handful of levers and skillsets would be relevant, such that I would feel comfortable suggesting that people rely more on personal fit to figure out their careers once they’ve chosen a cause area. However, I acknowledge that there is definitely more need in certain causes (e.g., software engineers for AI safety): I just don’t think that the CB level is the right level to apply this knowledge. I would feel more comfortable having cause-specific recruiters (cf. University community building seems like the wrong model for AI safety).
    I definitely agree on the latter point. I see community-builders as both building and embodying pipelines to the EA community! As the ‘point-of-entry’ for many potential EAs, I think it is sufficient for CBs to be able to model the mainstream views for core cause areas. I expect that the most talented CBs will probably have developed inside views for a specific cause outside of CB, but that doesn’t seem necessary to me for good CB work.
    I think where I am landing on this is that it might be possible to assume uniform prior over the directions I could take, and adjust my posterior by ‘learning things’ and understanding their models on both the direction-level and object-level, properly. Another thought I want to explore—is this something like a worldview diversification over directions? It feels similar, as we’re in a world where it ‘might turn out’ some agenda or direction was correct, but there’s no way of knowing that right now.
    Oh, I’m a huge fan of worldview diversification! I don’t currently have thoughts on starting with a non-/uniform prior … I am, honestly, somewhat inclined to suggest that CBs ‘adapt’ a bit to the communities in which they are working. That is, perhaps what should partly affect a CB’s prioritization re: inside view development is the existing interests of their group. For example, considering the Bay Area’s current status as a tech hub, it seems pretty important for CBs in the Bay Area to develop inside views on, say, AI safety—even if AI safety may not be what they consider the most pressing issue in the entire world. What do you think?
    To confirm—I believe people doing the object-level work (i.e. alignment research) should be bullish about their inside view. Let them fight it out, and let expert discourse decide what is “right” or “most promising”.
    Also completely agree here. : )
Sam Clarke 2 Mar 2022 10:35 UTC
8 points
0 ∶ 0
Nice post! I agree with ~everything here. Parts that felt particularly helpful:
- There are even more reasons why paraphrasing is great than I thought—good reminder to be doing this more often
- The way you put this point was v crisp and helpful: “Empirically, there’s a lot of smart people who believe different and contradictory things! It’s impossible for all of them to be right, so you must disagree with some of them. Internalising that you can do this is really important for being able to think clearly”
- The importance of “how much feedback do they get from the world” in deferring intelligently
One thing I disagree with: the importance of forming inside views for community epistemic health. I think it’s pretty important. E.g. I think that ~2 years ago, the arguments for the longterm importance of AGI safety were pretty underdeveloped; that since then lots more people have come out with their inside views about it; and that now the arguments are in much better shape.
- Sam Clarke 2 Mar 2022 14:29 UTC
  9 points
  0 ∶ 0
  Parent
  Also, nitpick, but I find the “inside view” a more confusing and jargony way of just saying “independent impressions” (okay, also jargon to some extent, but closer to plain English), which also avoids the problem you point out: inside view is not the opposite of the Tetlockian sense of outside view (and the other ambiguities with outside view that another commenter pointed out).
  - Neel Nanda 2 Mar 2022 16:52 UTC
    3 points
    0 ∶ 0
    Parent
    The complaint that it’s confusing jargon is fair. Though I do think the Tetlock sense + phrase inside view captures something important: my inside view is what feels true to me, according to my personal best guess and internal impressions. Deferring doesn’t feel true in the same way, it feels like I’m overriding my beliefs, not like how the world is.
    This mostly comes under the motivation point—maybe, for motivation, inside views matter but independent impressions don’t? And people differ on how they feel about the two?
    - Sam Clarke 14 Mar 2022 16:19 UTC
      8 points
      0 ∶ 0
      Parent
      I’m still confused about the distinction you have in mind between inside view and independent impression (which also have the property that they feel true to me)?
      
      Or do you have no distinction in mind, but just think that the phrase “inside view” captures the sentiment better?
      - Neel Nanda 16 Mar 2022 5:12 UTC
        3 points
        0 ∶ 0
        Parent
        Inside view feels deeply emotional and tied to how I feel the world to be; independent impression feels cold and abstract.
- Neel Nanda 2 Mar 2022 16:51 UTC
  3 points
  0 ∶ 0
  Parent
  One thing I disagree with: the importance of forming inside views for community epistemic health. I think it’s pretty important. E.g. I think that ~2 years ago, the arguments for the longterm importance of AGI safety were pretty underdeveloped; that since then lots more people have come out with their inside views about it; and that now the arguments are in much better shape.
  I want to push back against this. The aggregate benefit may have been high, but when you divide it by all the people trying, I’m not convinced it’s all that high.
  Further, that’s an overestimate—the actual question is more like ‘if the people who are least enthusiastic about it stop trying to form inside views, how bad is that?’. And I’d both guess that impact is fairly heavy tailed, and that the people most willing to give up are the least likely to have a major positive impact.
  I’m not confident in the above, but it’s definitely not obvious
  - Sam Clarke 14 Mar 2022 16:12 UTC
    4 points
    0 ∶ 0
    Parent
    Thanks—good points, I’m not very confident either way now
Yonatan Cale 28 Feb 2022 0:01 UTC
1 point
0 ∶ 0
Linking: Taboo Outside View [Lesswrong post, 292 karma]