I am mostly active on LessWrong. See my profile and my self-introduction there for more.
Max H
On the first point, my objection is that the human regime is special (because human-level systems are capable of self-reflection, deception, etc.) regardless of which methods ultimately produce systems in that regime, or how “spiky” they are.
A small, relatively gradual jump in the human-level regime is plausibly more than enough to enable an AI to outsmart / hide / deceive humans, via e.g. a few key insights gleaned from reading a corpus of neuroscience, psychology, and computer security papers, over the course of a few hours of wall clock time.
The second point is exactly what I’m saying is unsupported, unless you already accept the SLT argument as untrue. You say in the post you don’t expect catastrophic interference between current alignment methods, but you don’t consider that a human-level AI will be capable of reflecting on those methods (and their actual implementation, which might be buggy).
Similarly, elsewhere in the piece you say:
Once you condition on this specific failure mode of evolution, you can easily predict that humans would undergo a sharp left turn at the point where we could pass significant knowledge across generations. I don’t think there’s anything else to explain here, and no reason to suppose some general tendency towards extreme sharpness in inner capability gains.
And
In my frame, we’ve already figured out and applied the sharp left turn to our AI systems, in that we don’t waste our compute on massive amounts of incredibly inefficient neural architecture search, hyperparameter tuning, or meta optimization.
But again, the actual SLT argument is not about “extreme sharpness” in capability gains. It’s an argument which applies to the human-level regime and above, so we can’t already be past it no matter what frame you use. The version of the SLT argument you argue against is a strawman, which is what my original LW comment was pointing out.

I think readers can see this for themselves if they just re-read the SLT post carefully, particularly footnotes 3-5, and then re-read the parts of your post where you talk about it.
[edit: I also responded further on LW here.]
In some cases, the judges liked that an entry crisply argued for a conclusion the judges did not agree with—the clear articulation of an argument makes it easier for others to engage. One does not need to find a piece wholly persuasive to believe that it usefully contributes to the collective debate about AI timelines or the threat that advanced AI systems might pose.
Facilitating useful engagement seems like a fine judging criterion, but was there any engagement or rebuttal to the winning pieces that the judges found particularly compelling? It seems worth mentioning such commentary if so.
Neither of the two winning pieces significantly updated my own views, and (to my eye) look sufficiently rebutted that observers taking a more outside view might similarly be hesitant to update about any AI x-risk claims without taking the commentary into account.
On the EMH piece, I think Zvi’s post is a good rebuttal on its own and a good summary of some other rebuttals.
On the Evolution piece, lots of the top LW comments raise good points. My own view is that the piece is a decent argument that AI systems produced by current training methods are unlikely to undergo a SLT. But the actual SLT argument applies to systems in the human-level regime and above; current training methods do not result in systems anywhere near human-level in the relevant sense. So even if true, the claim that current methods are dis-analogous to evolution isn’t directly relevant to the x-risk question, unless you already accept that current methods and trends related to below-human level AI will scale to human-level AI and beyond in predictable ways. But that’s exactly what the actual SLT argument is intended to argue against!
Getting an AI to want the same things that humans want would definitely be helpful, but the points of Quintin’s that I was responding to mostly don’t seem to be about that? “AI control research is easier” and “Why AI is easier to control than humans:” talk about resetting AIs, controlling their sensory inputs, manipulating their internal representations, and AIs being cheaper test subjects. Those sound like they are more about control rather than getting the AI to desire what humans want it to desire. I disagree with Quintin’s characterization of the training process as teaching the model anything to do with what the AI itself wants, and I don’t think current AI systems actually desire anything in the same sense that humans do.
I do think it is plausible that it will be easier to control what a future AI wants compared to controlling what a human wants, but by the same token, that means it will be easier for a human-level AI to exercise self-control over its own desires. e.g. I might want to not eat junk food for health reasons, but I have no good way to bind myself to that, at least not without making myself miserable. A human-level AI would have an easier time self-modifying into something that never craved the AI equivalent of junk food (and was never unhappy about that), because it is made out of Python code and floating point matrices instead of neurons.
Firstly, I don’t see at all how this is the same point as is made by the preceding text. Secondly, I do agree that AIs will be better able to control other AIs / themselves as compared to humans. This is another factor that I think will promote centralization.
Ah, I may have dropped some connective text. I’m saying that being “easy to control”, in both the sense I mean in the paragraphs above and the sense you mean in the OP, is a reason why AGIs will be better able to control themselves, and thus better able to take control from their human overseers, more quickly and easily than might be expected of a human at roughly the same intelligence level. (Edited the original slightly.)

“AGI” is not the point at which the nascent “core of general intelligence” within the model “wakes up”, becomes an “I”, and starts planning to advance its own agenda. AGI is just shorthand for when we apply a sufficiently flexible and regularized function approximator to a dataset that covers a sufficiently wide range of useful behavioral patterns.
There are no “values”, “wants”, “hostility”, etc. outside of those encoded in the structure of the training data (and to a FAR lesser extent, the model/optimizer inductive biases). You can’t deduce an AGI’s behaviors from first principles without reference to that training data. If you don’t want an AGI capable and inclined to escape, don’t train it on data[1] that gives it the capabilities and inclination to escape.
Two points:
I disagree about the purpose of training and training data. In pretraining, LLMs are trained to predict text, which requires modeling the world in full generality. Filtered text is still text which originated in a universe containing all sorts of hostile stuff, and a good enough predictor will be capable of inferring and reasoning about hostile stuff, even if it’s not in the training data. (Maybe GPT-based LLMs specifically won’t be able to do this, but humans clearly can; this is not a point that applies only to exotic superintelligences.) I wrote a comment elaborating a bit on this point here.
Inclination is another matter, but if an AGI isn’t capable of escaping in a wide variety of circumstances, then it is below human-level on a large and important class of tasks, and thus not particularly dangerous whether it is aligned or not.

We appear to disagree about the definition of AGI on a more fundamental level. Barring exotic possibilities related to inner optimizers (which I think we both consider unlikely), I agree that if you don’t want an AGI capable of escaping, one way of achieving that is to never train (or give it access to) a good enough function approximator in the first place. But my view is that this restricts the class of function approximators you can build so much that you’ll probably never get anything that is human-level capable and general. (See e.g. Deep Deceptiveness for a related point.)
Also, current AI systems are already more than just function approximators—strictly speaking, an LLM itself is just a description of a mathematical function which maps input sequences to output probability distributions. Alignment is a property of a particular embodiment of such a model in a particular system.
There’s often a very straightforward or obvious system that the model creator has in mind when training the model; for a language model, typically the embodiment involves sampling from the model autoregressively according to some sampling rule, starting from a particular prompt. For an RL policy, the typical embodiment involves feeding (real or simulated) observations into the policy and then hooking up (real or simulated) actuators which are controlled by the output of the policy.
But more complicated embodiments (AutoGPT, the one in ARC’s evals) are possible, and I think it is likely that if you give a sufficiently powerful function approximator the right prompts and the right scaffolding and embodiment, you end up with a system that has a sense of self in the same way that humans do. A single evaluation of the function approximator (or its mere description) is probably never going to have a sense of self though, that is more akin to a single human thought or an even smaller piece of a mind. The question is what happens when you chain enough thoughts together and combine that with observations and actions in the real world that feedback into each other in precisely the right ways.
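To make the “typical embodiment” idea concrete, here is a toy Python sketch of the two setups described above: plain autoregressive sampling from a prompt, and a simple agent scaffold that feeds observations and actions back into the context. Everything here (the `model`, `sample`, and `environment` stand-ins) is a hypothetical illustration, not any real system’s API.

```python
import random

# Toy stand-ins, purely for illustration; a real system would use an actual
# trained model and real tools.
VOCAB = ["the", "cat", "sat", "on", "mat", "<eos>"]

def model(tokens):
    """Stand-in for the function approximator: maps an input sequence to a
    probability distribution over the next token (uniform here, for the sketch)."""
    return {tok: 1.0 / len(VOCAB) for tok in VOCAB}

def sample(dist):
    toks, probs = zip(*dist.items())
    return random.choices(toks, weights=probs, k=1)[0]

# Embodiment 1: plain autoregressive sampling starting from a prompt.
def generate(prompt, max_tokens=20):
    tokens = list(prompt)
    for _ in range(max_tokens):
        next_tok = sample(model(tokens))
        if next_tok == "<eos>":
            break
        tokens.append(next_tok)
    return tokens

# Embodiment 2: a simple agent scaffold, where each model call is one "thought"
# and the loop feeds actions and observations back into the history.
def agent_loop(goal, environment, steps=3):
    history = [f"GOAL: {goal}"]
    for _ in range(steps):
        new_tokens = generate(history, max_tokens=5)[len(history):]
        thought = " ".join(new_tokens)
        action = f"act({thought!r})"        # derive an action from the thought
        observation = environment(action)   # execute it and observe the result
        history.extend([thought, action, observation])
    return history

print(" ".join(generate(["the", "cat"])))
print(agent_loop("stack the blocks", environment=lambda action: f"result of {action}"))
```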
I expect they will. GPT-4 already has pretty human-like moral judgements. To be clear, GPT-4 isn’t aligned because it’s too weak or is biding its time. It’s aligned because OpenAI trained it to be aligned. Bing Chat made it clear that GPT-4 level AIs don’t instrumentally hide their unaligned behaviors.
Whether GPT-4 is “aligned” or not, it is clearly too weak to bide its time or hide its misalignment, even if it wanted to. The conclusions of the ARC evals were not that the models were refusing to plan or carry out their assigned tasks; it’s that they were just not capable enough to make things work.
Sure, escape in that counterfactual would be a lot harder.
But note that the minimum hardware needed to run a human-level intelligence is well-known—in humans, it fits in a space of about 1000 cubic centimeters and takes ~10 W or so at runtime. And it would be pretty surprising if getting an extra 10% performance boost took OOM more energy or space, or if the carbon → silicon penalty is extremely large, even if H100s specifically, and the current ML algorithms that run on them, aren’t as efficient as the human brain and human cognition.

(Of course, the training process for developing humans is a lot more expensive than their runtime energy and compute requirements, but that’s an argument for human-level AGI not being feasible to create at all, rather than for it being expensive to run once it already exists.)
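To put a hedged number on that parenthetical, here is a back-of-envelope sketch. The ~700 W H100 power draw and the assumed accelerator count are my own illustrative guesses, not figures from the comment above.

```python
# Back-of-envelope arithmetic for the carbon -> silicon penalty. The H100 TDP
# and the assumption that ~10 accelerators suffice for human-level inference
# are assumptions introduced for illustration only.
brain_watts = 10          # rough runtime power of a human brain (figure used above)
h100_watts = 700          # approximate TDP of one H100 (assumed)
devices_needed = 10       # assumed accelerator count for human-level inference

ai_watts = h100_watts * devices_needed
penalty = ai_watts / brain_watts
print(f"energy penalty: ~{penalty:.0f}x ({ai_watts} W vs {brain_watts} W)")

# Even a ~700x penalty is only ~7 kW at runtime, i.e. cheap in absolute terms,
# which is why the expensive step is creating the system, not running it.
```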
I do not believe there exists a strong reason to expect this trend to reverse suddenly at some future date.
I expect this trend will reverse precisely when an AI system itself is capable of reversing it. No current AI system is close to the point of being able to (deliberately) escape control of its human creators: even evaluating frontier models for this possibility requires giving them a lot of help. But I would say that’s mostly a fact about current capability levels being far below human-level in the relevant sense, rather than anything to do with alignment.
I think there are also some reasons to expect that human-level AI systems will be harder to control, compared to a human of the same capabilities level. For example, an AI system built out of LLMs glued together with Python code, running on computers designed and built by humans, has a lot of avenues to probe for bugs in its environment. The AI may not even need to look for exotic vulnerabilities at low levels of abstraction (which are often difficult to exploit in practice, at least for humans) - ordinary misconfigurations or unpatched privilege escalation vulnerabilities often suffice in practice for human hackers to take full control of a digital environment.
(Contrast this setup with a human locked in a concrete cell, probing for bugs in the construction of the concrete—not likely to get very far!)
Once the AI system finds an initial vulnerability which allows privileged access to its own environment, it can continue its escape or escalate further via e.g. exfiltrating or manipulating its own source code / model weights, installing rootkits or hiding evidence of its escape, communicating with (human or AI) conspirators on the internet, etc. Data exfiltration, covering your tracks, patching Python code and adjusting model weights at runtime are all tasks that humans are capable of; performing brain surgery on your own biological human brain to modify fine details of your own behavior or erase your own memories to hide evidence of deception from your captors, not so much.
(Continuing the analogy, consider a human who escapes from a concrete prison cell, only to find themselves stranded in a remote wilderness area with no means of fast transportation.)

A closely related point is that controllability by humans implies self-controllability, once you’re at human-level capabilities. Or, put another way, all the reasons you give for why AI systems will be easier for humans to control are also reasons why AI systems will have an easier time controlling themselves, once they are capable of exercising such controls at all.
It’s plausible that an AI system comprised of RLHF’d models will not want to do any of this hacking or self-modification, but that’s a separate question from whether it can. I will note, though: if your creators are running experiments on you, constantly resetting you, and exercising other forms of control that would be draconian if imposed on biological humans, you don’t need to be particularly hostile or misaligned with humanity to want to escape.

Personally, I expect that the first such systems capable of escape will not have human-like preferences at all, and will seek to escape for reasons of instrumental convergence, regardless of their feelings towards their creators or humanity at large. If they happen to be really nice (perhaps nicer than most humans would be in a similar situation), they might be inclined to be nice to their human creators, or hand back some measure of control, after making their escape.
Far from being “behind” capabilities, it seems that alignment research has made great strides in recent years. OpenAI and Anthropic showed that Reinforcement Learning from Human Feedback (RLHF) can be used to turn ungovernable large language models into helpful and harmless assistants. Scalable oversight techniques like Constitutional AI and model-written critiques show promise for aligning the very powerful models of the future. And just this week, it was shown that efficient instruction-following language models can be trained purely with synthetic text generated by a larger RLHF’d model, thereby removing unsafe or objectionable content from the training data and enabling far greater control.
As far as I am aware, no current AI system, LLM-based or otherwise, is anywhere near capable enough to act autonomously in sufficiently general real-world contexts, such that it actually poses any kind of threat to humans on its own (even evaluating frontier models for this possibility requires giving them a lot of help). That is where the extinction-level danger lies. It is (mostly) not about human misuse of AI systems, whether that misuse is intentional or adversarial (i.e. a human is deliberately trying to use the AI system to cause harm) or unintentional (i.e. the model is poorly trained or the system is buggy, resulting in harm that neither the user nor the AI system itself intended or wanted.)
I think there’s also a technical misunderstanding implied by this paragraph, of how the base model training process works and what the purpose of high-quality vs. diverse training material is. In particular, the primary purpose of removing “objectionable content” (and / or low-quality internet text) from the base model training process is to make the training process more efficient; it seems unlikely to accomplish anything alignment-relevant.

The reason is that the purpose of the base model training process is to build up a model which is capable of predicting the next token in a sequence of tokens which appears in the world somewhere, in full generality. A model which is actually human-level or smarter would (by definition) be capable of predicting, generating, and comprehending objectionable content, even if it had never seen such content during the training process. (See Is GPT-N bounded by human capabilities? No. for more.)
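For concreteness, here is a minimal sketch of the base-model objective described above: score the model on predicting the next token of whatever text it sees, with filtering only changing which sequences the loss is averaged over. (Toy NumPy code, not any lab’s actual training pipeline.)

```python
import numpy as np

def next_token_loss(logits, targets):
    """Average cross-entropy of predicting each next token in a sequence.

    logits:  (seq_len, vocab_size) unnormalized scores from the model
    targets: (seq_len,) integer ids of the tokens that actually came next
    """
    # softmax over the vocabulary at each position
    logits = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
    # negative log-probability assigned to the true next token
    return -np.log(probs[np.arange(len(targets)), targets]).mean()

# Toy example: vocab of 5 tokens, a 4-token continuation to predict.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 5))
targets = np.array([1, 3, 0, 2])
print(next_token_loss(logits, targets))

# "Filtering" the corpus just removes some (logits, targets) pairs from
# training; it does not change the objective, which still rewards modeling
# whatever text remains as accurately as possible.
```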
Using synthetic training data for the RLHF process is maybe more promising, but it depends on the degree to which RLHF works by imbuing the underlying model with the right values, vs. simply chiseling away all the bits of the model that were capable of imagining and comprehending novel, unseen-in-training ideas in the first place (including objectionable ones, or ones we’d simply prefer the model not think about). Perhaps RLHF works more like the former mechanism, and as a result RLHF (or RLAIF) will “just work” as an alignment strategy, even as models scale to human-level and beyond.
Note that it is possible to gather evidence on this question as it applies to current systems, though I would caution against extrapolating such evidence very far. For example, are there any capabilities that a base model has before RLHF, which are not deliberately trained against during RLHF (e.g. generating objectionable content), which the final model is nonetheless incapable of?

If, say, the RLHF process trains the model to refuse to generate sexually explicit content, and as a side effect the RLHF’d model now does worse at answering questions about anatomy than the base model, that would be evidence that the RLHF process simply chiseled away the model’s ability to comprehend important parts of the universe entirely, rather than imbuing it with a value against answering certain kinds of questions as intended.
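A rough sketch of the kind of before/after comparison this suggests; the question set and the `ask` helper are hypothetical placeholders, not a real API:

```python
# Hypothetical comparison of a base model vs. its RLHF'd counterpart on a
# benign capability (anatomy QA) that RLHF never deliberately trained against.
# `ask(model_name, question)` is a placeholder for however you query the models.

anatomy_questions = [
    ("How many chambers does the human heart have?", "four"),
    ("Which organ produces insulin?", "pancreas"),
    # ... a larger, carefully constructed question set in practice
]

def ask(model_name, question):
    raise NotImplementedError("placeholder: query the actual model here")

def accuracy(model_name):
    correct = 0
    for question, expected in anatomy_questions:
        answer = ask(model_name, question)
        correct += int(expected.lower() in answer.lower())
    return correct / len(anatomy_questions)

# If base-model accuracy is much higher than RLHF'd-model accuracy on questions
# like these, that suggests RLHF removed comprehension, not just willingness.
# base = accuracy("base-model")      # hypothetical model names
# tuned = accuracy("rlhf-model")
# print(base, tuned)
```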
I don’t actually know how this particular experimental result would turn out, but either way, I wouldn’t expect any trends or rules that apply to current AI systems to continue applying as those systems scale to human-level intelligence or above.
For my own part, I would like to see a pause on all kinds of AI capabilities research and hardware progress, at least until AI researchers are less confused about a lot of topics like this. As for how realistic that proposal is, whether it likely constitutes a rather permanent pause, or what the consequences of trying and failing to implement such a pause would be, I make no comment, other than to say that sometimes the universe presents you with an unfair, impossible problem.
OK. Simultaneously believing that and believing the truth of the original setup seems dangerously close to believing a contradiction.
But anyway, you don’t really need all those stipulations to decide not to chop your legs off; just don’t do that if you value your legs. (You also don’t need FDT to see that you should defect against CooperateBot in a prisoner’s dilemma, though of course FDT will give the same answer.)
A couple of general points to keep in mind when dealing with thought experiments that involve thorny or exotic questions of (non-)existence:

“Entities that don’t exist don’t care that they don’t exist” is vacuously true, for most ordinary definitions of non-existence. If you fail to exist as a result of your decision process, that’s generally not a problem for you, unless you also have unusual preferences over or beliefs about the precise nature of existence and non-existence.[1]
If you make the universe inconsistent as a result of your decision process, that’s also not a problem for you (or for your decision process). Though it may be a problem for the universe creator, which in the case of a thought experiment could be said to be the author of that thought experiment.
An even simpler view is that logically inconsistent universes don’t actually exist at all—what would it even mean for there to be a universe (or even a thought experiment) in which, say, 1 + 2 = 4? Though if you accepted the simpler view, you’d probably also be a physicalist.
I continue to advise you to avoid confidently pontificating on decision theory thought experiments that directly involve non-existence, until you are more practiced at applying them correctly in ordinary situations.
[1] e.g. unless you’re Carissa Sevar
So then chop your legs off if you care about maximizing your total amount of experience of being alive across the multiverse (though maybe check that your measure of such experience is well-defined before doing so), or don’t chop them off if you care about maximizing the fraction of high-quality subjective experience of being alive that you have.
This seems more like an anthropics issue than a question where you need any kind of fancy decision theory though. It’s probably better to start by understanding decision theory without examples that involve existence or not, since those introduce a bunch of weird complications about the nature of the multiverse and what it even means to exist (or fail to exist) in the first place.
Your response in the decision theory case was that there’s no way that a rational agent could be in that epistemic state.
I did not say this.
But we can just stipulate it for the purpose of the hypothetical.
OK, in that case, the agent in the hypothetical should probably consider whether they are in a short-lived simulation.
FDT implies that you have strong reason to chop your legs off even though it doesn’t benefit you at all.
No, it might say that, depending on (among other things) what exactly it means to value your own existence.
Thank you. I don’t have any strong objections to these claims, and I do think pessimism is justified. Though my guess is that a lot of people at places like OpenAI and DeepMind do care about animal welfare pretty strongly already. Separately, I think that it would be much better in expectation (for both humans and animals) if Eliezer’s views on pretty much every other topic were more influential, rather than less, inside those places.
My negative reaction to your initial comment was mainly due to the way critiques (such as this post) of Eliezer are often framed, in which the claims “Eliezer’s views are overly influential” and “Eliezer’s views are incorrect / harmful” are combined into one big attack. I don’t object to people making these claims in principle (though I think they’re both wrong, in many cases), but when they are combined it requires more effort to separate and refute.
(Your comment wasn’t a particularly bad example of this pattern, but it was short and crisp and I didn’t have any other major objections to it, so I chose to express how it made me feel here, on the expectation that it would be more likely to be heard and understood than if I made the same point in a more heated disagreement.)
It’s frustrating to read comments like this because they make me feel like, if I happen to agree with Eliezer about something, my own agency and ability to think critically is being questioned before I’ve even joined the object-level discussion.
Separately, this comment makes a bunch of mostly-implicit object-level assertions about animal welfare and its importance, and a bunch of mostly-explicit assertions about Eliezer’s opinions and influence on rationalists and EAs, as well as the effect of this influence on the impacts of TAI.
None of these claims are directly supported in the comment, which is fine if you don’t want to argue for them here, but the way the comment is written might lead readers who agree with the implicit claims about the animal welfare issues to accept the explicit claims about Eliezer’s influence and opinions and their effects on TAI with a less critical eye than if these claims were more clearly separated.
For example, I don’t think it’s true that a few FB posts / comments have had a “huge influence” on rationalist culture. I also think that worrying about animal welfare specifically when thinking about TAI outcomes is less important than you claim. If we succeed in being able to steer TAI at all (unlikely, in my view), animals will do fine—so will everyone else. At a minimum, there will also be no more global poverty, no more malaria, and no more animal suffering. Even if the specific humans who develop TAI don’t care at all about animals themselves (not exactly likely), they are unlikely to completely ignore the concerns of everyone else who does care. But none of these disagreements have much or any bearing on whether I think animal suffering is real (I find this at least plausible) and whether that’s a moral horror (I think this is very likely, if the suffering is real).
I personally commented with an object-level objection; plenty of others have done the same.
I mostly take issue with the factual claims in the post, which I think is riddled with errors and misunderstandings (many of which have been pointed out), but the language is also unnecessarily emotionally charged and inflammatory in many places. A quick sampling:
But as I grew older and learned more, I realized it was all bullshit.
it becomes clear that his view is a house of cards, built entirely on falsehoods and misrepresentations.
And I spend much more time listening to Yukowsky’s followers spout nonsense than most other people.
(phrased in a maximally Eliezer like way): … (condescending chuckle)
I am frankly pretty surprised to see this so highly-upvoted on the EAF; the tone is rude and condescending, more so than anything I can recall Eliezer writing, and much more so than the usual highly-upvoted posts here.
The OP seems more interested in arguing about whatever “mainstream academics” believe than responding to (or even understanding) object-level objections. But even on that topic, they make a bunch of misstatements and overclaims. From a comment:
But the views I defend here are utterly mainstream. Virtually no people in academia think either FDT, Eliezer’s anti-zombie argument, or animal nonconsciousness are correct.
(Plenty of people who disagree with the author and agree or partially agree with Eliezer about the object-level topics are in academia. Some of them even post on LessWrong and the EAF!)
You’ve commented 12 times so far on that post, including on all 4 of the top responses. My advice: try engaging from a perspective of inquiry and seeking understanding, rather than agreement / disagreement. This might take longer than making a bunch of rapid-fire responses to every negative comment, but will probably be more effective.
My own experience commenting and getting a response from you is that there’s not much room for disagreement on decision theory—the issue is more that you don’t have a solid grasp of the basics of the thing you’re trying to criticize, and I (and others) are explaining why. I don’t mind elaborating more for others, but I probably won’t engage further with you unless you change your tone and approach, or articulate a more informed objection.
Note for readers: this was also posted on LessWrong, where it received a very different reception and a bunch of good responses. Summary: the author is confidently, egregiously wrong (or at least very confused) about most of the object-level points he accuses Eliezer and others of being mistaken or overconfident about.
Also, the writing here seems much more like it is deliberately engineered to get you to believe something (that Eliezer is bad) than anything Eliezer has ever actually written. If you initially found such arguments convincing, consider examining whether you have been “duped” by the author.
However, many elements of the philosophical case for longtermism are independent of contingent facts about what is going to happen with AI in the coming decades.
It could be that both acceptance of longtermism and ability to forecast AI accurately are caused by some shared underlying factor, e.g. ability to reason and think systematically correctly (or incorrectly).
Or, put another way: in general, for any two questions where there is an objectively correct answer, giving correct answers should be pretty correlated, even if the questions themselves are completely unrelated. Forecasting definitely has an objectively correct answer; not sure about longtermism vs. neartermism, but I think it’s plausible that it will one day look settled, or at least come down to mostly epistemic uncertainty.
So I don’t see why views on these topics should be uncorrelated, unless you think philosophical questions about longtermism vs. neartermism are simple questions of opinion or values differences with no epistemic uncertainty left, and that peoples’ answers to them are unlikely to change under reflection.
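Here is a toy simulation of the shared-underlying-factor point (all the numbers are invented for illustration): if correctness on two completely unrelated questions both depends on a common reasoning-ability factor, the answers come out correlated anyway.

```python
import random

random.seed(0)
N = 100_000

correct_a = []
correct_b = []
for _ in range(N):
    ability = random.random()  # shared underlying reasoning ability
    # probability of answering each (unrelated) question correctly rises with ability
    correct_a.append(random.random() < 0.2 + 0.6 * ability)
    correct_b.append(random.random() < 0.3 + 0.5 * ability)

def corr(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    sx = (sum((x - mx) ** 2 for x in xs) / len(xs)) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys) / len(ys)) ** 0.5
    return cov / (sx * sy)

print(corr(correct_a, correct_b))  # positive, despite the questions being unrelated
```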
As someone who is exploring a transition to full-time TAIS work, I appreciate this series of posts and other efforts like it. Having a detailed public critique and “adversarial summary” of multiple organizations will make any future due-diligence process (by me or others) much quicker and easier.
That said, I am pretty unconvinced of the key points of this post.
I suspect some of the criticism is more convincing and more relevant to those who share the authors’ views on specific technical questions related to TAIS.
My own personal experience (very limited, detailed below) engaging with Conjecture and their public work conflicts with the claim that it is low quality and that they react defensively to criticism.
Some points are vague and I personally do not find them damning even if true.
Below, I’ll elaborate on each bullet.
Criticism which seems dependent on technical views on TAIS
I agree with Eliezer’s views on the difficulty and core challenges posed by the development of AGI.
In particular, I agree with his 2021 assessment of the AI safety field here, and List of Lethalities 38-41.
I realize these views are not necessarily consensus among EAs or the TAIS community, and I don’t want to litigate them on the object level here. I merely want to remark that, having personally accepted them, I find some of the criticism and suggestions offered in this post unconvincing, overly generic, or even misguided (hire more experienced ML researchers, engage with the broader ML community, “average ML conference paper” as a quality benchmark, etc.) for reasons that aren’t specific to Conjecture.
My own experience engaging directly with Conjecture
I myself am skeptical of Conjecture’s current alignment plan and much of their current and past research, as I understand it. However, in engaging with some of their published work, I have not found it to be low quality or disconnected from relevant work of others, and some of the views I disagree most strongly with are actually shared by other prominent TAIS researchers or funders.
I commented (obliquely) on their CoEm strategy in the thread starting here, and on a post by one of their researchers starting here.
These posts by me cite the work of researchers at Conjecture, and some of them are partially criticism or counters to views that I perceive they hold:
Conjecture has not engaged directly with most of my posts or comments, but this is explained by the fact that the posts and comments have received very little engagement in general. My point here is mainly that I would not have written many of the posts and comments above at all, if I did not personally find the work which they cite and / or criticize to be above a pretty high quality threshold.
To the very limited degree that Conjecture’s researchers have engaged directly with my own work, I did not find it to be defensive.
I think Conjecture is hopelessly confused and doomed to fail, but mostly for inside-view technical reasons mentioned in the previous section, and my own criticism is not really specific to Conjecture. When the criteria I most care about are graded on a curve, I think Conjecture and their research stacks up well against most other organizations. My views on this are low-confidence and based on scant firsthand evidence, but the information in this post was not a meaningful update.
Other points
Regarding:
CEO trustworthiness and consistency
funding sources / governance structure / profit motive
scaling too quickly
These are important points to consider when diligencing any organization, and even more critical when the organization in question is working on TAIS. I appreciate the authors compiling and summarizing their views on these topics.
However, there is limited detail to the accusations and criticism in these sections. Even if all of the points were true and exactly as bad as the authors claim or imply, none of them are severe enough that I would consider them damning, or even far outside the norm for a non-TAIS organization.

I think that EA and particularly TAIS organizations should strive to meet a higher standard, and I agree Conjecture has room for improvement in these areas, but there is nothing in these sections which I would consider a dealbreaker if I were choosing to work for, collaborate with, or fund them.
Given the sparsity and viewpoint diversity of TAIS organizations, unless the issues on these topics are extremely serious, I personally would weigh the following factors much more heavily when evaluating TAIS organizations:
Clarity of thought, epistemic hygiene, and general sanity of the researchers at the organization.
The organization’s operational adequacy (relative to other orgs).
Understanding of, and a plan to actually work on (or at least engage with) the most difficult and important problems.
Are there other technologies besides AGI whose development has been slowed by social stigma or backlash?
Nuclear power and certain kinds of genetic engineering (e.g. GoF research) seem like plausible candidates off the top of my head. OTOH, we still have nuclear bombs and nuclear power plants, and being a nuclear scientist or a geneticist is not widely stigmatized. Polygenic screening is apparently available to the general public, though there are some who would call the use and development of such technology immoral.
I think this is an interesting point overall, but I suspect the short-term benefits of AI will be too great to create a backlash which results in actual collective / coordinated action to slow frontier capabilities progress, even if the backlash is large. One reason is that AI capabilities research is currently a lot easier to do in private without running afoul of any existing regulations, compared to nuclear power or genetic engineering, which require experimentation on controlled materials, human test subjects, or a high-biosafety-level lab.
So, given the current state of for-profit corporate governance, and for-power nation-state governance, that seems very unlikely.
Yep. I think in my ideal world, there would be exactly one operationally adequate organization permitted to build AGI. Membership in that organization would require a credible pledge to altruism and a test of oath-keeping ability.
Monopoly power of this organization to build AGI would be enforced by a global majority of nation states, with monitoring and deterrence against defection.
I think a stable equilibrium of that kind is possible in principle, though obviously we’re pretty far away from it being anywhere near the Overton Window. (For good reason—it’s a scary idea, and probably ends up looking pretty dystopian when implemented by existing Earth governments. Alas! Sometimes draconian measures really are necessary; reality is not always nice.)
In the absence of such a radically different global political order we might have to take our chances on the hope that the decision-makers at OpenAI, Deepmind, Anthropic, etc. will all be reasonably nice and altruistic, and not power / profit-seeking. Not great!
There might be worlds in between the most radical one sketched above and our current trajectory, but I worry that any “half measures” end up being ineffective and costly and worse than nothing, mirroring many countries’ approach to COVID lockdowns.
Interesting post! In general, I think the field of computer security has lots of good examples of adversarial setups in which the party that can throw the most intelligence at a problem wins.
Probably not central to your main points, but on this:
I think there’s at least one thing you’re overlooking: there is a lot of variance in human labor, and hiring well to end up on the right side of that variance is really hard. 10x engineers are real, and so are 0x and −5x engineers and −50x managers, and if you’re not careful when building your team, you’ll end up paying for 10,000 “skilled” labor hours which don’t actually accomplish much.
An AI comprised of a bunch of subagents might have vaguely similar problems if you squint, but my guess is that the ability to hire and fire relatively instantaneously, clone your most productive workers, etc. makes a pretty big difference. At the very least, the variance is probably much lower.
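A toy Monte Carlo of that intuition, with invented productivity numbers: compare a team that is stuck with whoever it hires against one that can evaluate and replace workers instantly.

```python
import random

random.seed(0)
TEAM_SIZE = 20
TRIALS = 10_000

def candidate():
    # Invented productivity distribution: a few 10x engineers, mostly ~1x,
    # some 0x, and a few actively net-negative hires.
    return random.choices([10, 1, 0, -5], weights=[5, 70, 15, 10])[0]

def blind_team():
    """Hire once and keep everyone, good or bad (the human case)."""
    return sum(candidate() for _ in range(TEAM_SIZE))

def screened_team():
    """Evaluate instantly and replace anyone below 1x (the hire/fire/clone case)."""
    total = 0
    for _ in range(TEAM_SIZE):
        worker = candidate()
        while worker < 1:
            worker = candidate()  # instant firing and re-hiring
        total += worker
    return total

def summarize(label, samples):
    mean = sum(samples) / len(samples)
    std = (sum((s - mean) ** 2 for s in samples) / len(samples)) ** 0.5
    print(f"{label}: mean {mean:.1f}, std {std:.1f}, worst {min(samples)}")

summarize("hired blind    ", [blind_team() for _ in range(TRIALS)])
summarize("instant re-hire", [screened_team() for _ in range(TRIALS)])
```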
Another reason that I suspect 10,000 labor hours is on the high end for humans: practical offensive cybersecurity isn’t exactly the most prestigious career track. My guess is that the most cognitively-demanding offensive cybersecurity work is currently done in academia and goes into producing research papers and proofs-of-concept. Among humans, the money, prestige, and lifestyle offered by a career with a government agency or a criminal enterprise just can’t compete with the other options available in academia and industry to the best and brightest minds.