I’m going to break a sentence from your comment here into bits for inspection. Also, emphasis and elisions mine.
I would also say that to the extent that Yudkowsky-style research has enjoyed any popularity of late, it’s because people have been looking at the old debate and realizing that
extremely simple generic architectures written down in a few dozen lines of code
with large capability differences between very similar lines of code
solving many problems in many fields and subsuming entire subfields as simply another minor variant
with large generalizing models...
powered by OOMs more compute
steadily increasing in agency
is
a short description of Yudkowsky’s views on what the runup will look like
and how DL now works.
We don’t have a formalism to describe what “agency” is. We do have several posts trying to define it on the Alignment Forum:
Gradations of Agency
Optimality is the tiger, and agents are its teeth
Agency and Coherence
While it might not be the best choice, I’m going to use Gradations of Agency as a definition, because it’s more systematic in its presentation.
“Level 3” is described as “Armed with this ability you can learn not just from your own experience, but from the experience of others—you can identify successful others and imitate them.”
This doesn’t seem like what any ML model does. So we can look at “Level 2,” which gives the example: “You start off reacting randomly to inputs, but you learn to run from red things and towards green things because when you ran towards red things you got negative reward and when you ran towards green things you got positive reward.”
This seems like how all ML works.
So using the “Gradations of Agency” framework, we might view individual ML systems as improving in power and generality within a single level of agency. But they don’t appear to be changing levels of agency. They aren’t identifying other successful ML models and imitating them.
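To make the distinction concrete, here is a toy sketch of my own (not from the Gradations of Agency post, and not meant to be realistic): a Level 2 learner updates only from its own reward signal, while a Level 3 learner can also identify the most successful peer and copy its learned policy outright.

```python
import random

# Toy illustration (my own, not from the post): Level 2 vs. Level 3 agency.
# Actions: go toward "green" (reward +1) or "red" (reward -1).

class Level2Agent:
    """Learns only from its own trial-and-error reward."""
    def __init__(self):
        self.value = {"green": 0.0, "red": 0.0}

    def act(self):
        # Explore occasionally, otherwise pick the higher-valued option.
        if random.random() < 0.1:
            return random.choice(["green", "red"])
        return max(self.value, key=self.value.get)

    def learn(self, action, reward, lr=0.1):
        self.value[action] += lr * (reward - self.value[action])

class Level3Agent(Level2Agent):
    """Can also identify the most successful peer and imitate it."""
    def imitate(self, peers):
        best = max(peers, key=lambda p: sum(p.value.values()))
        self.value = dict(best.value)  # copy the successful peer's policy

def reward(action):
    return 1.0 if action == "green" else -1.0

if __name__ == "__main__":
    peers = [Level2Agent() for _ in range(5)]
    for _ in range(50):
        for p in peers:
            a = p.act()
            p.learn(a, reward(a))
    learner = Level3Agent()
    learner.imitate(peers)  # Level 3: skips the trial-and-error entirely
    print(learner.value)
```

The only point is that imitation of a successful other is a qualitatively different operation from reward-driven updating on your own experience.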
Gradations of Agency doesn’t say whether there is an asymptote of power and generality within each level. Is there a limit to the power and generality possible within Level 2, where all ML seems to reside?
This seems to be the crux of the issue. If DL is approaching an asymptote of power and generality below that of AGI as model and data sizes increase, then this cuts directly against Yudkowsky’s predictions. On the other hand, if we think that DL can scale to AGI through model and data size increases alone, then that would be right in line with his predictions.
A 10-trillion-parameter model now exists, and it’s been suggested that a 100-trillion-parameter model, which might even be created this year, might be roughly comparable in power to the human brain.
It’s scary to see that we’re racing full-on toward a very near-term ML project that might plausibly be AGI. However, if a 100-trillion-parameter ML model is not AGI, then we’d have two strikes against Yudkowsky. If neither a small hand-coded model nor a 100-trillion-parameter model trained with 2022-era ML results in AGI, then I think we have to take a hard look at his track record of predicting what technology is likely to result in AGI. We also have his “AGI well before 2050” statement from “Beware boasting” to work with, although that’s not much help.
On the other hand, I think his assertiveness about the importance of AI safety and risk is appropriate even if he proves wrong about the technology by which AGI will be created.
I would critique the OP, however, for not being sufficiently precise in its critiques of Yudkowsky. As its “fairly clearcut examples,” it uses 20+-year-old predictions that Yudkowsky has explicitly disavowed. Then, at the end, it complains that he hasn’t “acknowledged his mixed track record.” Yet in the post it links, Yudkowsky’s quoted as saying:
To be a slightly better Bayesian is to spend your entire life watching others slowly update in excruciatingly predictable directions that you jumped ahead of 6 years earlier so that your remaining life could be a random epistemic walk like a sane person with self-respect.
6 years is not 20 years. It’s perfectly consistent to say that a youthful, 20+-years-in-the-past version of you thought wrongly about a topic, but that you’ve since become so much better at making predictions within your field that you’re 6 years ahead of Metaculus. We might wish he’d stated these predictions in public and specified what they were. But his failure to do so doesn’t make him wrong; it just leaves him without evidence of his superior forecasting ability. These are distinct failure modes.
Overall, I think it’s wrong to conflate “Yudkowsky was wrong 20+ years ago in his youth” with “not everyone in AI safety agrees with Yudkowsky” with “Yudkowsky hasn’t made many recent, falsifiable near-term public predictions about AI timelines.” I think this is a fair critique of the OP, which claims to be interrogating Yudkowsky’s “track record.”
But I do agree that it’s wise for a non-expert to defer to a portfolio of well-chosen experts, rather than the views of the originator of the field alone. While I don’t love the argument the OP used to get there, I do agree with the conclusion, which strikes me as just plain common sense.
Re gradations of agency: Level 3 and Level 4 seem within reach IMO. IIRC there are already some examples of neural nets being trained to watch other actors in some simulated environment and then imitate them. Also, model-based planning (i.e. Level 4) is very much a thing, albeit something that human programmers seem to have to hard-code. I predict that within 5 years there will be systems which are unambiguously at Level 3 and Level 4, even if they aren’t perfect at it (hey, we humans aren’t perfect at it either).
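On the model-based planning point, a minimal hand-coded sketch of my own (a toy, not any particular system) of what “planning against a model” means: the programmer supplies the transition model and the reward, and the agent chooses actions by simulating rollouts rather than by reacting. That supplied model is exactly the “humans have to hard-code it” caveat.

```python
# Minimal hand-coded model-based planning sketch (Level 4 in the post's terms).
# The transition model and reward are supplied by the programmer.

def transition(state, action):
    """Toy known dynamics: move left or right on a number line."""
    return state + (1 if action == "right" else -1)

def reward(state):
    """Toy goal: be as close to position 5 as possible."""
    return -abs(state - 5)

def plan(state, depth=3):
    """Exhaustive lookahead: simulate every action sequence, pick the best first move."""
    if depth == 0:
        return None, 0.0
    best_action, best_value = None, float("-inf")
    for action in ("left", "right"):
        next_state = transition(state, action)
        _, future = plan(next_state, depth - 1)
        value = reward(next_state) + future
        if value > best_value:
            best_action, best_value = action, value
    return best_action, best_value

if __name__ == "__main__":
    state = 0
    for _ in range(8):
        action, _ = plan(state)
        state = transition(state, action)
    print(state)  # ends at or next to the goal state 5
```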
“Level 3” is described as “Armed with this ability you can learn not just from your own experience, but from the experience of others—you can identify successful others and imitate them.” This doesn’t seem like what any ML model does.
This sounds like straightforward transfer learning (TL) or fine-tuning, which was already common in 2017.
So you could just write 15 lines of Python that shops between some set of pretrained weights and sees how each performs. TL is often many times (1000x) faster than training from randomly initialized weights and needs only a few examples.
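A rough sketch of that “15 lines of Python” (the helpers load_pretrained and evaluate are hypothetical stand-ins for whatever framework you use to load checkpoints and score them on a small validation set; only the selection loop is the point):

```python
# Sketch of the "shop between pretrained weights" idea from the comment above.
# `load_pretrained` and `evaluate` are hypothetical placeholders.

def pick_best_checkpoint(checkpoint_names, load_pretrained, evaluate, val_data):
    """Try each set of pretrained weights on a few examples and keep the winner."""
    best_name, best_score = None, float("-inf")
    for name in checkpoint_names:
        model = load_pretrained(name)      # reuse weights instead of random init
        score = evaluate(model, val_data)  # a handful of labelled examples is often enough
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score
```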
As speculation: it seems like in one of the agent simulations you could just have agents grab other agents’ weights or layers and try them out in a strategic way (when they detect an impasse or a new environment or something). There is an analogy to biology, where species alternate between asexual and sexual reproduction, and trading of genetic material occurs during periods of adversity. (This is trivial; I’m sure a second-year student has written a lot more.)
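Purely to make that speculation concrete, a toy of my own (impasse detection omitted for brevity, and the “weights” are just single numbers): each agent tries out a layer from the best-scoring peer and keeps the swap only if its own score improves.

```python
import random

# Toy sketch of agents strategically borrowing layers from successful peers.
# Not a real system; "layers" are scalars and fitness is a made-up function.

def score(weights):
    """Hypothetical fitness: closer to the target values is better."""
    return -abs(weights["l1"] - 0.5) - abs(weights["l2"] + 0.5)

def try_borrow(weights, peer_weights, layer):
    """Swap in a peer's layer; keep the swap only if it helps."""
    candidate = dict(weights)
    candidate[layer] = peer_weights[layer]
    return candidate if score(candidate) > score(weights) else weights

if __name__ == "__main__":
    agents = [{"l1": random.uniform(-1, 1), "l2": random.uniform(-1, 1)}
              for _ in range(4)]
    for _ in range(20):
        best = max(agents, key=score)
        agents = [try_borrow(w, best, random.choice(["l1", "l2"]))
                  if w is not best else w
                  for w in agents]
    print(sorted(score(w) for w in agents))
```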
This doesn’t seem to fit any sort of agent framework or improve agency though. It just makes you train faster.
Eh, there seems to be a connection to interpretability.
For example, if the ML architecture were “modular, categorized, or legible to the agents,” they could swap weights or models more quickly and effectively.
So there might be some way for legibility to emerge through selection pressure in an environment where, say, agents have limited capacity to store weights or data and have to constantly and extensively share weights with each other. You could imagine teams of agents surviving and proliferating because a shared architecture lets them pass this data fluently in the form of weights.
To make sure the transmission mechanism itself doesn’t get crazy baroque, you could use some sort of regularization or something.
I’m 90% sure this is a shower thought, but it can’t be worse than “The Great Reflection.”