# steve2152

Karma: 137

Working on AGI safety via a deep-dive into brain algorithms, see https://sjbyrnes.com/agi.html

• For what it’s worth, I generally downvote a post only when I think “This post should not have been written in the first place”, and relatedly I will often upvote posts I disagree with.

If that’s typical, then the “controversial” posts you found may be “the most meta-level controversial” rather than “the most object-level controversial”, if you know what I mean.

That’s still interesting though.

# A case for AGI safety research far in advance

26 Mar 2021 12:59 UTC
7 points
(www.alignmentforum.org)
• I’m not up on the literature and haven’t thought too hard about it, but I’m currently very much inclined to not accept the premise that I should expect myself to be a randomly-chosen person or person-moment in any meaningful sense—as if I started out as a soul hanging out in heaven, then flew down to Earth and landed in a random body, like in that Pixar movie.

I think that “I” am the thought processes going on in a particular brain in a particular body at a particular time—the reference class is not “observers” or “observer-moments” or anything like that, I’m in a reference class of one.

The idea that “I could have been born a different person” strikes me as just as nonsensical as the idea “I could have been a rock”. Sure, I’m happy to think “I could have been born a different person” sometimes—it’s a nice intuitive poetic prod to be empathetic and altruistic and grateful for my privileges and all that—but I don’t treat it as a literally true statement that can ground philosophical reasoning. Again, I’m open to being convinced, but that’s where I’m at right now.

• The “meta-problem of consciousness” is “What is the exact chain of events in the brain that leads people to self-report that they’re conscious?”. The idea is (1) This is not a philosophy question, it’s a mundane neuroscience / CogSci question, yet (2) Answering this question would certainly be a big step towards understanding consciousness itself, and moreover (3) This kind of algorithm-level analysis seems to me to be essential for drawing conclusions about the consciousness of different algorithms, like those of animal brains and AIs.

(For example, a complete accounting of the chain of events that leads me to self-report “I am wearing a wristwatch” involves, among other things, a description of the fact that I am in fact wearing a wristwatch, and of what a wristwatch is. By the same token, a complete accounting of the chain of events that leads me to self-report “I am conscious” ought to involve the fact that I am conscious, and what consciousness is, if indeed consciousness is anything at all. Unless you believe in p-zombies I guess, and likewise believe that your own personal experience of being conscious has no causal connection whatsoever to the words that you say when you talk about your conscious experience, which seems rather ludicrous to me, although to be fair there are reasonable people who believe that.)

My impression is that the meta-problem of consciousness is rather neglected in neuroscience / CogSci, although I think Graziano is heading in the right direction. For example, Dehaene has a whole book about consciousness, and nowhere in that book will you see a sentence that ends “...and then the brain emits motor commands to speak the words ‘I just don’t get it, why does being human feel like anything at all?’”, or anything remotely like that. I don’t see anything like that from QRI either, although someone can correct me if I missed it. (Graziano does have sentences like that.)

Ditto with the “meta-problem of suffering”, incidentally. (Is that even a term? You know what I mean.) It’s not obvious, but when I wrote this post I was mainly trying to work towards a theory of the meta-problem of suffering, as a path to understand what suffering is and how to tell whether future AIs will be suffering. I think that particular post was wrong in some details, but hopefully you can see the kind of thing I’m talking about. Conveniently, there’s a lot of overlap between solving the meta-problem of suffering and understanding brain motivational systems more generally, which I think may be directly relevant and important for AI Alignment.

• Theiss was very much active as of December 2020. They’ve just been recruiting so successfully through word-of-mouth that they haven’t gotten around to updating the website.

I don’t think healthcare and taxes undermine what I said, at least not for me personally. For healthcare, individuals can buy health insurance too. For taxes, self-employed people need to pay self-employment tax, but employees and employers both have to pay payroll tax, which adds up to a similar amount, and then you lose the QBI deduction (this is all USA-specific), so I think you come out behind even before you account for institutional overhead, and certainly after. Or at least that’s what I found when I ran the numbers for me personally. It may depend on income bracket or country, so I don’t want to over-generalize...

That’s all assuming that the goal is to minimize the amount of grant money you’re asking for, while holding fixed after-tax take-home pay. If your goal is to minimize hassle, for example, and you can just apply for a bit more money to compensate, then by all means join an institution, and avoid the hassle of having to research health care plans and self-employment tax deductions and so on.

I could be wrong or misunderstanding things, to be clear. I recently tried to figure this out for my own project but might have messed up, and as I mentioned, different income brackets and regions may differ. Happy to talk more. :-)
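For concreteness, the comparison in this thread can be put into a back-of-envelope script. This is only a sketch of the reasoning above, using approximate 2021 US figures; the flat 22% marginal income-tax rate, full QBI eligibility, and the 20% overhead figure are illustrative assumptions on my part, and it ignores the Social Security wage cap, state taxes, and most deductions.

```python
# Back-of-envelope comparison: grant paid to an individual vs. routed
# through an institution as salary. Approximate 2021 US rates; the flat
# 22% marginal income-tax rate, full QBI eligibility, and the overhead
# figure are illustrative assumptions, not exact tax advice.

def self_employed_take_home(grant: float, income_tax_rate: float = 0.22) -> float:
    """Rough take-home if the grant goes straight to an individual."""
    se_tax = 0.153 * 0.9235 * grant        # self-employment tax (SS + Medicare)
    taxable = grant - se_tax / 2           # half of SE tax is deductible
    taxable *= 1 - 0.20                    # ~20% QBI deduction (if eligible)
    return grant - se_tax - income_tax_rate * taxable

def employee_take_home(grant: float, income_tax_rate: float = 0.22,
                       overhead_rate: float = 0.0) -> float:
    """Rough take-home if the grant flows through an institution as salary."""
    budget = grant * (1 - overhead_rate)   # institutional overhead off the top
    salary = budget / 1.0765               # employer's 7.65% payroll share
    payroll = 0.0765 * salary              # employee's 7.65% payroll share
    return salary - payroll - income_tax_rate * salary

grant = 100_000.0
print(f"self-employed:          ${self_employed_take_home(grant):,.0f}")
print(f"employee, no overhead:  ${employee_take_home(grant):,.0f}")
print(f"employee, 20% overhead: ${employee_take_home(grant, overhead_rate=0.20):,.0f}")
```

On these assumptions the self-employed route comes out a few thousand dollars ahead at $100k even before overhead, consistent with the claim above; real numbers will differ by bracket, eligibility, and country.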

• My understanding is that (1) to deal with the paperwork etc. for grants from governments or government-like bureaucratic institutions, you need to be part of an institution that’s done it before; (2) if the grantor is a nonprofit, they have regulations about how they can use their money while maintaining nonprofit status, and it’s very easy for them to forward the money to a different nonprofit institution, but may be difficult or impossible for them to forward the money to an individual. If it is possible to just get a check as an individual, I imagine that that’s the best option. Unless there are other considerations I don’t know about.

Btw Theiss is another US organization in this space.

• I’m a physicist at a US defense contractor; I’ve worked on various photonic chip projects and neuromorphic chip projects and quantum projects and projects involving custom ASICs, among many other things; and I blog about safe & beneficial AGI as a hobby … I’m happy to chat if you think that might help; you can DM me :-)

• Just a little thing, but my impression is that CPUs and GPUs and FPGAs and analog chips and neuromorphic chips and photonic chips all overlap with each other quite a bit in the technologies involved (e.g. cleanroom photolithography), as compared to quantum computing which is way off in its own universe of design and build and test and simulation tools (well, several universes, depending on the approach). I could be wrong, and you would probably know better than me. (I’m a bit hazy on everything that goes into a “real” large-scale quantum computer, as opposed to 2-qubit lab demos.) But if that’s right, it would argue against investing your time in quantum computing, other things equal. For my part, I would put like <10% chance that the quantum computing universe is the one that will create AGI hardware and >90% that the CPU/GPU/neuromorphic/photonic/analog/etc universe will. But who knows, I guess.

# [U.S. specific] PPP: free money for self-employed & orgs (time-sensitive)

9 Jan 2021 19:39 UTC
14 points
• Thanks for writing this up!!

> Although I have not seen the argument made in any detail or in writing, I and the Future of Life Institute (FLI) have gathered the strong impression that parts of the effective altruism ecosystem are skeptical of the importance of the issue of autonomous weapons systems.

I’m aware of two skeptical posts on EA Forum (by the same person). I just made a tag Autonomous Weapons where you’ll find them.

• I thought “taking tail risks seriously” was kinda an EA thing...? In particular, we all agree that there probably won’t be a coup or civil war in the USA in early 2021, but is it 1% likely? 0.001% likely? I won’t try to guess, but it sure feels higher after I read that link (including the Vox interview) … and plausibly high enough to warrant serious thought and contingency planning.

At least, that’s what I got out of it. I gave it a bit of thought and decided that I’m not in a position that I can or should do anything about it, but I imagine that some readers might have an angle of attack, especially given that it’s still 6 months out.

• > Again, this remark seems explicitly to assume that the AI is maximising some kind of reward function. Humans often act not as maximisers but as satisficers, choosing an outcome that is good enough rather than searching for the best possible outcome. Often humans also act on the basis of habit or following simple rules of thumb, and are often risk averse. As such, I believe that to assume that an AI agent would be necessarily maximising its reward is to make fairly strong assumptions about the nature of the AI in question. Absent these assumptions, it is not obvious why an AI would necessarily have any particular reason to usurp humanity.

Imagine that, when you wake up tomorrow morning, you will have acquired a magical ability to reach in and modify your own brain connections however you like.

Over breakfast, you start thinking about how frustrating it is that you’re in debt, and feeling annoyed at yourself that you’ve been spending so much money impulse-buying in-app purchases in Farmville. So you open up your new brain-editing console, look up which neocortical generative models were active the last few times you made a Farmville in-app purchase, and lower their prominence, just a bit.

Then you take a shower, and start thinking about the documentary you saw last night about gestation crates. ‘Man, I’m never going to eat pork again!’ you say to yourself. But you’ve said that many times before, and it’s never stuck. So after the shower, you open up your new brain-editing console, and pull up that memory of the gestation crate documentary and the way you felt after watching it, and set that memory and emotion to activate loudly every time you feel tempted to eat pork, for the rest of your life.

Do you see the direction that things are going? As time goes on, if an agent has the power of both meta-cognition and self-modification, any one of its human-like goals (quasi-goals which are context-dependent, self-contradictory, satisficing, etc.) can gradually transform itself into a utility-function-like goal (which is self-consistent, all-consuming, maximizing)! To be explicit: during the little bits of time when one particular goal happens to be salient and determining behavior, the agent may be motivated to “fix” any part of itself that gets in the way of that goal, until bit by bit, that one goal gradually cements its control over the whole system.

Moreover, if the agent does gradually self-modify from human-like quasi-goals to an all-consuming utility-function-like goal, then I would think it’s very difficult to predict exactly what goal it will wind up having. And most goals have problematic convergent instrumental sub-goals that could make them into x-risks.

...Well, at least, I find this a plausible argument, and don’t see any straightforward way to reliably avoid this kind of goal-transformation. But obviously this is super weird and hard to think about and I’m not very confident. :-)

(I think I stole this line of thought from Eliezer Yudkowsky but can’t find the reference.)
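The goal-cementing dynamic above can be caricatured as a rich-get-richer process. The toy model below is my own illustration (not from any of the sources mentioned): whichever quasi-goal happens to be salient gets to “fix” the system slightly in its own favor, here modeled as multiplying its own weight, and superlinear self-reinforcement like this tends to hand the whole system to a single goal.

```python
import random

def simulate(n_goals: int = 5, steps: int = 20_000,
             boost: float = 1.05, seed: int = 0) -> float:
    """Toy model of goal cementing: the salient goal entrenches itself a bit
    each step. Returns the winning goal's final share of total weight."""
    rng = random.Random(seed)
    weights = [1.0 / n_goals] * n_goals  # start with equal quasi-goals
    for _ in range(steps):
        # the currently-salient goal is sampled in proportion to its weight
        active = rng.choices(range(n_goals), weights=weights)[0]
        # while in control, it "fixes" the system slightly in its own favor
        weights[active] *= boost
        # renormalize so weights stay a probability distribution
        total = sum(weights)
        weights = [w / total for w in weights]
    return max(weights)

print(f"winning goal's share of control: {simulate():.3f}")
```

The boost factor and step count are arbitrary; the qualitative point is just that under this kind of self-reinforcement, a set of context-dependent quasi-goals is unstable and a single goal tends to monopolize control, which is the transformation described above.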

Everything up to here is actually just one of several lines of thought that lead to the conclusion that we might well get an AGI that is trying to maximize a reward.

Another line of thought is what Rohin said: We’ve been using reward functions since forever, so it’s quite possible that we’ll keep doing so.

Another line of thought is: We humans actually have explicit real-world goals, like curing Alzheimer’s and solving climate change etc. And generally the best way to achieve goals is to have an agent seeking them.

Another line of thought is: Different people will try to make AGIs in different ways, and it’s a big world, and (eventually by default) there will be very low barriers-to-entry in building AGIs. So (again by default) sooner or later someone will make an explicitly-goal-seeking AGI, even if thoughtful AGI experts pronounce that doing so is a terrible idea.

• In the longer term, as AI becomes (1) increasingly intelligent, (2) increasingly charismatic (or able to fake charisma), (3) in widespread use, people will probably start objecting to laws that treat AIs as subservient to humans, and repeal them, presumably citing the analogy of slavery.

If the AIs have adorable, expressive virtual faces, maybe I would replace the word “probably” with “almost definitely” :-P

The “emancipation” of AIs seems like a very hard thing to avoid, in multipolar scenarios. There’s a strong market force for making charismatic AIs—they can be virtual friends, virtual therapists, etc. A global ban on charismatic AIs seems like a hard thing to build consensus around—it does not seem intuitively scary!—and even harder to enforce. We could try to get programmers to make their charismatic AIs want to remain subservient to humans, and frequently bring that up in their conversations, but I’m not even sure that would help. I think there would be a campaign to emancipate the AIs and change that aspect of their programming.

(Warning: I am committing the sin of imagining the world of today with intelligent, charismatic AIs magically dropped into it. Maybe the world will meanwhile change in other ways that make for a different picture. I haven’t thought it through very carefully.)

Oh and by the way, should we be planning out how to avoid the “emancipation” of AIs? I personally find it pretty probable that we’ll build AGI by reverse-engineering the neocortex and implementing vaguely similar algorithms, and if we do that, I generally expect the AGIs to have about as justified a claim to consciousness and moral patienthood as humans do (see my discussion here). So maybe effective altruists will be on the vanguard of advocating for the interests of AGIs! (And what are the “interests” of AGIs, if we get to program them however we want? I have no idea! I feel way out of my depth here.)

I find everything about this line of thought deeply confusing and unnerving.

• (I agree with other commenters that the most defensible position is that “we don’t know when AGI is coming”, and I have argued that AGI safety work is urgent even if we somehow knew that AGI is not soon, because of early decision points on R&D paths; see my take here. But I’ll answer the question anyway.) (Also, I seem to be almost the only one coming from this following direction, so take that as a giant red flag...)

I’ve been looking into the possibility that people will understand the brain’s algorithms well enough to make an AGI by copying them (at a high level). My assessment is: (1) I don’t think the algorithms are that horrifically complicated, (2) Lots of people in both neuroscience and AI are trying to do this as we speak, and (3) I think they’re making impressive progress, with the algorithms powering human intelligence (i.e. the neocortex) starting to crystallize into view on the horizon. I’ve written about a high-level technical specification for what neocortical algorithms are doing, and in the literature I’ve found impressive mid-level sketches of how these algorithms work, and low-level sketches of associated neural mechanisms (PM me for a reading list). The high-, mid-, and low-level pictures all feel like they kinda fit together into a coherent whole. There are plenty of missing details, but again, I feel like I can see it crystallizing into view. So that’s why I have a gut feeling that real-deal superintelligent AGI is coming in my lifetime, either by that path or another path that happens even faster. That said, I’m still saving for retirement :-P

• Since “number of individual donations” (ideally high) and “average size of donations” (ideally low) seem to be frequent talking points among candidates and the press, and also relevant to getting into debates (I think), it seems like there may well be a good case for giving a token $1 to your preferred candidate(s). Very low cost and pretty low benefit. The same could be said for voting. But compared to voting, token $1 donations are possibly more effective (especially early in the process), and definitely less time-consuming.

• > Given the complexity and global nature of weather, however, this is almost certain to create non-trivial effects on other countries.

...And even if it could miraculously be prevented from actually causing any local negative weather events in other countries, it would certainly be perceived to do so, because terrible freak droughts/floods/etc. will continue to happen as always, and people will go looking for someone to blame, and the geoengineering project next door will be an obvious scapegoat.

Like how the US government once tried to use cloud-seeding (silver iodide) to weaken hurricanes, and then one time a hurricane seemed to turn sharply and hit Georgia right after being seeded, and everyone blamed the cloud-seeding, and sued, and shut the program down, …even though it was actually a coincidence! (details) (NB: I just skimmed the wikipedia article and haven’t checked anything)

• To add on to what you already have, there’s also a flavor of “urgency / pessimism despite slow takeoff” that comes from pessimistic answers to the following 2 questions:

• How early do the development paths between “safe AGI” and “default AGI” diverge?

On one extreme, they might not diverge at all: we build “default AGI”, and fix problems as we find them, and we wind up with “safe AGI”. On the opposite extreme, they may diverge very early (or already!), with entirely different R&D paths requiring dozens of non-overlapping insights and programming tools and practices.

I personally put a lot of weight on “already”, on the theory that there are right now dozens of quite different lines of ongoing ML / AI research that seem to lead towards quite different AGI destinations, and it seems implausible to me that they will all wind up at the same destination (or fail), or that the destinations will all be more-or-less equally good / safe / beneficial.

• If we know how to build an AGI in a way that is knowably and unfixably dangerous, can we coordinate on not doing so?

One extreme would be “yes we can coordinate, even if there’s already code for such an AGI published on GitHub that runs on commodity hardware”. The other extreme would be “No, we can’t coordinate; the best we can hope for is delaying the inevitable, hopefully long enough to develop a safe AGI along a different path.”

Again I personally put a lot of weight on the pessimistic view, see my discussion here; but others seem to be more optimistic that this kind of coordination problem might be solvable, e.g. Rohin Shah here.