See also LessWrong Forum:
Comment 1 (on my portrayal of Eliezer’s portrayal of AGI):
Comment 2:
Comment 3:
Comment 4:
Comment 5:
(a bunch of counterarguments and counterexamples)
Two people asked me to clarify this claim:
Copying over my responses:
re: Conflicts of interest:
My impression has been that a few people appraising my project work looked for ways to reduce e.g. Goodharting, or the risk that I might pay myself too much from the project budget. Also, EA initiators sometimes post a fundraiser write-up for an official project with an official plan, which somewhat hides that they’re actually seeking funding for their own salaries to do that work (the former looks less like a personal conflict of interest *on paper*).
re: Skin in the game:
Bigger picture, the effects of our interventions aren’t going to affect us in a visceral and directly noticeable way (silly example: we’re not going to slip and fall because of some defect in the malaria nets we fund). That loose feedback from far-away interventions seems hard to overcome, but I think it’s problematic that EAs also seem to underemphasise skin in the game for in-between steps where direct feedback is available. For example, EAs (me included) sometimes seem too ready to pontificate about how particular projects should be run or what a particular position involves, rather than rely on the opinions/directions of an experienced practitioner who would actually suffer the consequences of failing (or even be filtered out of their role) if they took actions with negative practical effects for them. Or they might dissuade someone from initiating an EA project/service that seems risky to them in theory, rather than guide the initiator to test it out locally to constrain or cap the damage.
This interview with Jacqueline Novogratz from Acumen Fund covers some practical approaches to attaining skin in the game.
I’m actually interested to hear your thoughts!
Do throw them here, or grab a moment to call :)
To clarify the independent vs. interdependent distinction
Julia suggested that EA thinking about negative flow-through effects is an example of interdependent thinking. IMO, EAs still tend to take an independent view there. Even I did a bad job above of describing causal interdependencies in climate change (since I still placed the causal sources in a linear ‘this leads to this leads to that’ sequence).
So let me try to clarify again, at the risk of going metaphysical:
EAs do seem to pay more attention to causal dependencies than I was letting on, but in a particular way:
When EA researchers estimate impacts of specific flow-through effects, they often seem to have in mind some hypothetical individual who takes actions, which incrementally lead to consequences in the future. Going meta on that, they may philosophise about how an untested approach can have unforeseen and/or irreversible consequences, or about cluelessness (not knowing how the resulting impacts, spread out across the future, will average out). Do correct me if you have a different impression!
An alternate style of thinking involves holding multiple actors / causal sources in mind to simulate how they conditionally interact. This is useful for identifying the root causes of problems, which I don’t recall EA researchers doing much of (e.g. the sociological/economic factors that originally drove commercial farmers to industrialise their livestock production).
To illustrate the difference, I think gene-environment interactions provide a neat case:
Independent ‘this or that’ thinking:
Hold one factor constant (e.g. take the different environments in which adopted twins grew up as a representative sample) to predict the other (e.g. attribute 50% of the variation in a general human trait to genes).
Interdependent ‘this and that’ thinking:
Assume that factors will interplay, and therefore probabilities are not strictly independent.
Test nonlinear factors together to predict outcomes.
e.g. on/off gene for aggression × childhood trauma × teenagers playing violent video games
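To make the contrast concrete, here’s a minimal sketch in Python (entirely made-up effect sizes and variable names, purely for illustration): a simulated dataset where aggression is driven partly by the three factors co-occurring, fitted once with main effects only (the ‘this or that’ view) and once with an interaction term (the ‘this and that’ view).

```python
# Hypothetical illustration: "independent" vs "interdependent" modelling.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

gene = rng.integers(0, 2, n)     # 'on/off' gene variant (0 or 1)
trauma = rng.integers(0, 2, n)   # childhood trauma present (0 or 1)
games = rng.random(n)            # violent video game exposure, scaled 0-1

# Ground truth used for this simulation: aggression mainly spikes when the
# factors co-occur, i.e. much of the effect sits in the interaction term.
aggression = (0.2 * gene + 0.1 * trauma + 0.1 * games
              + 1.5 * gene * trauma * games
              + rng.normal(0, 0.1, n))

def fit(X, y):
    """Least-squares fit; returns in-sample predictions."""
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ coefs

ones = np.ones(n)
main_effects_only = fit(np.column_stack([ones, gene, trauma, games]), aggression)
with_interaction = fit(np.column_stack([ones, gene, trauma, games,
                                        gene * trauma * games]), aggression)

def r_squared(y, y_hat):
    return 1 - np.var(y - y_hat) / np.var(y)

print("R^2, main effects only:", r_squared(aggression, main_effects_only))
print("R^2, with interaction: ", r_squared(aggression, with_interaction))
```

In this toy setup the interaction model fits noticeably better, because part of the signal only shows up when the factors are considered jointly rather than one at a time.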
Cartesian frames seem an apt theoretical analogy
“A represents a set of possible ways the agent can be, E represents a set of possible ways the environment can be, and ⋅ : A × E → W is an evaluation function that returns a possible world given an element of A and an element of E”
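As a toy rendering of that definition (my own hypothetical example, not one from the Cartesian frames posts), here is the evaluation function ⋅ : A × E → W written out as code:

```python
# A toy Cartesian frame: the possible world that obtains is a joint function
# of the agent's option and the environment's setting (hypothetical example).
A = ["carry umbrella", "leave umbrella"]   # possible ways the agent can be
E = ["rain", "sun"]                        # possible ways the environment can be

# The evaluation function A x E -> W as an explicit table of worlds.
WORLDS = {
    ("carry umbrella", "rain"): "dry but encumbered",
    ("carry umbrella", "sun"):  "encumbered for nothing",
    ("leave umbrella", "rain"): "soaked",
    ("leave umbrella", "sun"):  "dry and unencumbered",
}

def evaluate(a: str, e: str) -> str:
    """Return the possible world picked out by the pair (a, e)."""
    return WORLDS[(a, e)]

# Neither the agent's option nor the environment's setting fixes the outcome alone:
for a in A:
    for e in E:
        print(f"({a}, {e}) -> {evaluate(a, e)}")
```

Which world obtains is determined only by the pair, which is the sense in which the outcome is contingent on both the agent and the environment.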
Under the interdependent framing, the environment affords certain options perceivable by the agent, which they choose between.
A notion of free will loses its relevance under this framing. Changes in the world are caused neither by the settings of the outside environment alone nor by the embedded agent ‘willing’ an action alone, but arise contingent on both.
You might counter: isn’t the agent’s body constituted of atomic particles that act and react deterministically over time, making free will an illusion?
Yes, and somehow, through parts interacting across parts, they come to constitute a greater whole, an agent, that makes choices.
None of these (admittedly confusing) framings have to be inconsistent with each other.
Overlap between ‘interdependent thinking’, ‘context’, and ‘collective thinking’.
When individuals with their own distinct traits are constrained in the possible ways they can interact by surrounding others (i.e. by their context), they will behave predictably within those constraints:
e.g. when EAs stick to certain styles of analysis that they know comrades will grasp and admire when gathered at a conference or writing a post for others to read.
Analysis of the kind ‘this individual agent with x skills and y preferences will take/desist from actions that are more likely to lead to z outcomes’ falls flat here.
e.g. to paraphrase Critch’s Production Web scenario, the severity of which typical AI Safety analysis tends to overlook:
Take a future board that buys a particular ‘CEO AI service’ to ensure their company will be successful. The CEO AI elicits trustees’ inherent categorical preferences, but what the trustees express at any given moment is guided by their recent interactions with influential others (e.g. the need to survive tougher competition from other CEO AIs). A CEO AI that plans company actions based on the preferences board members express at any given point in time will, by default, not account for those actions bringing into existence processes that actually change the preferences board members state. That is, unless safety-minded AI developers design a management service that accounts for this circuitous dynamic, and boards are self-aware enough to buy the less profit-optimised service that won’t undermine their personal integrity.
The risk emerges from how the AI developers and company’s board introduce assumptions of structure:
i.e. that you can design an AI to optimise for end states based on its human masters’ identified intrinsic preferences. Such an AI would fail to use available compute to determine whether a chosen instrumental action reinforces a process through which ‘stuff’ contingently gets flagged in human attention, expressed to the AI, received as inputs, and derived as ‘stable preferences’.
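Here is a deliberately crude sketch of that structural assumption (all names and numbers are hypothetical, not from Critch’s write-up): a planner that re-optimises against whatever the board expresses at each step, while its own profit-seeking actions shift what the board will express next.

```python
# Hypothetical toy model of a planner that treats elicited preferences as
# stable, even though its own actions feed back into what gets expressed.
from dataclasses import dataclass

@dataclass
class Board:
    profit_weight: float = 0.5      # how much the board currently says it cares about profit
    integrity_weight: float = 0.5

    def elicit(self) -> dict:
        """What the board expresses right now (a snapshot, not a fixed trait)."""
        return {"profit": self.profit_weight, "integrity": self.integrity_weight}

    def exposed_to(self, action: str) -> None:
        """Competitive pressure from profit-seeking actions shifts what the
        board will express the next time it is asked."""
        if action == "aggressive expansion":
            self.profit_weight = min(1.0, self.profit_weight + 0.1)
            self.integrity_weight = 1.0 - self.profit_weight

def naive_ceo_ai(board: Board, steps: int = 5) -> None:
    """Optimises for whatever the latest elicitation says, ignoring that its
    own actions change future elicitations."""
    for t in range(steps):
        prefs = board.elicit()
        action = ("aggressive expansion"
                  if prefs["profit"] >= prefs["integrity"]
                  else "steady operations")
        board.exposed_to(action)
        print(f"step {t}: elicited {prefs} -> chose {action!r}")

naive_ceo_ai(Board())
```

Each elicitation is treated as if it revealed a stable preference, while the trajectory shows the stated preferences drifting as a consequence of the planner’s own choices.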
I left out nuances to keep the blindspot summary short and readable. But I should have specifically prefaced what fell outside the scope of my writing. Not doing so made my claims come across as more extreme than I meant to the more literal/explicit readers amongst us :)
So for those of you still reading this, here’s where I was coming from:
To describe blindspots broadly across the entire rationality and EA communities.
In actual fact, I see both communities more as loose clusters of interacting and affiliated people. Each gathered group diverges somewhat in how it attracts members who are predisposed to focus on (and reinforce each other in expressing) certain aspects as perceived within certain views.
I pointed out how a few groups diverge in the summary above (e.g. effective animal advocacy vs. LW decision theorists, thriving- vs. suffering-focussed EAs), but left out many others. Responding to Christian Kl’s earlier comment: I think the way the ‘CFAR alumni’ cluster frames aspects diverges meaningfully from the larger/overlapping ‘long-time LessWrong fans’ cluster.
Previously, I suggested that EA staff could coordinate work more through non-EA-branded groups with distinct yet complementary scopes and purposes, so the general overarching tone of this post runs counter to that.
To aggregate the common views within which our members seemed to most often frame problems (as expressed to others in the community whom they knew also aimed to work on those problems), and to contrast those with the foci held by other purposeful human communities out there.
Naturally, what an individual human focusses on in any given moment depends on their changing emotional/mental makeup and the context they find themselves in (incl. the role they then identify as having). I’m not claiming, for example, that when someone who aspires to be a rational researcher at work focusses on brushing their teeth at home while glancing at their romantic partner, they must nevertheless be thinking really abstract and elegant thoughts.
But for me, the exercise of mapping our ingroup’s brightspots onto each listed dimension – relative to what outside groups focus on – has provided some overview. The dimensions come from a perceptual framework that I gradually put together and that is somewhat internally coherent (but predictably overwhelms anyone I explain it to, and leaves them wondering how it’s useful; hence this more pragmatic post).
I hope, though, that no reader ends up using this as a personality test – say, for identifying their own or their friend’s (supposedly stable) character traits to predict their resulting future behaviour (or, god forbid, to explain away any confusion or disagreement they sense about what an unfamiliar stranger says).
To keep each blindspot explanation simple and to the point:
If I already mix in a lot of ‘on one hand in this group...but on the other hand in this situation’, the reader will gloss over the core argument. I appreciate people’s comments with nuanced counterexamples though. Keeps me intellectually honest.
Hope that clarifies the post’s argumentation style somewhat.
I had those three starting points at the back of my mind while writing in March. So sorry I didn’t include them.