I haven’t read the OP (I haven’t read a full forum post in weeks and I don’t like reading; it’s better to, like, close your eyes and try coming up with the entire thing from scratch and see if it matches, using high-information tags to compare with, generated with a meta model), but I think this is a reference to the usual training/inference cost difference.
For example, you can run GPT-3 Davinci in a few seconds at trivial cost, but training it cost millions of dollars and took a long time.
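To make that concrete, here is a minimal back-of-envelope sketch, assuming the commonly used approximations of roughly 6·N·D FLOPs to train a dense transformer and roughly 2·N FLOPs per generated token, with GPT-3’s widely cited figures (about 175B parameters, about 300B training tokens). The numbers are illustrative assumptions, not anything from the OP:

```python
# Back-of-envelope: GPT-3-scale training compute vs. one inference query.
# Uses the common approximations: training ~ 6*N*D FLOPs, inference ~ 2*N FLOPs/token.
# N and D are the widely cited GPT-3 figures; treat them as rough assumptions.

N = 175e9                                # parameters
D = 300e9                                # training tokens
training_flops = 6 * N * D               # ~3.2e23 FLOPs

query_tokens = 1000                      # a generous prompt + response
inference_flops = 2 * N * query_tokens   # ~3.5e14 FLOPs

print(f"training:  {training_flops:.1e} FLOPs")
print(f"one query: {inference_flops:.1e} FLOPs")
print(f"ratio:     {training_flops / inference_flops:.1e}x")  # ~1e9 queries' worth
```

So, under these assumptions, one training run costs on the order of a billion times more compute than answering a single query.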
There are further considerations. For example, finding the architecture for the first breakthrough model (stacking more things in Torch, fiddling with parameters, figuring out how to implement the Key Insight, etc.) is probably even more expensive and difficult.
Let C_T be the compute used to train the model. Is the idea that “if you could afford C_T to train the model, then you can also afford C_T for running models”?
Because that doesn’t seem obvious. What if you used 99% of your budget on training? Then you’d only be able to afford roughly 0.01×C_T for running models.
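Spelling out the arithmetic (a small sketch, where f is just the fraction of the total compute budget spent on training):

$$C_{\text{inference}} = \frac{1-f}{f}\,C_T, \qquad f = 0.99 \;\Rightarrow\; C_{\text{inference}} = \frac{0.01}{0.99}\,C_T \approx 0.01\,C_T.$$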
Or is this just an example to show that training costs >> running costs?
Yes, that’s how I understood it as well. If you spend the same amount on inference as you did on training, then you get a hell of a lot of inference.
I would expect he’d also argue that, because companies are willing to spend tons of money on training, they’ll be willing to spend a lot on inference too.
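To put a rough number on “a hell of a lot”: under the same 6·N·D training and 2·N-per-token inference approximations used above, matching the training compute with inference compute buys about 3·D generated tokens, i.e. roughly three times as many tokens as the model was trained on, regardless of model size. A minimal sketch, with GPT-3-ish numbers as assumptions:

```python
# If you spend your training compute budget (~6*N*D FLOPs) purely on inference
# (~2*N FLOPs per generated token), the parameter count cancels out:
#   tokens_generated = 6*N*D / (2*N) = 3*D
N = 175e9      # parameters (cancels out below; GPT-3-ish, as an assumption)
D = 300e9      # training tokens (GPT-3-ish, as an assumption)

tokens_generated = (6 * N * D) / (2 * N)
print(f"{tokens_generated:.0e} tokens ~= {tokens_generated / D:.0f}x the training set")
# -> 9e+11 tokens ~= 3x the training set
```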
Do we know the expected cost for training an AGI? Is that within a single company’s budget?
Nearly impossible to answer. This report by OpenPhil gives it a hell of an effort, but it could still be off by orders of magnitude. Most fundamentally, the amount of compute necessary for AGI might not be closely related to the amount of compute used by the human brain, because we don’t know how the efficiency of our algorithms compares to the brain’s.
https://www.cold-takes.com/forecasting-transformative-ai-the-biological-anchors-method-in-a-nutshell/
Yes, the last sentence is exactly correct.
So the terms of art here are “training” versus “inference”. I don’t have a reference or guide (the relative size of the two isn’t something most people think about, as opposed to the absolute size of each individually), but if you google those terms and scroll through some papers or posts I think you’ll see some clear examples.
Just LARPing here. I don’t really know anything about AI or machine learning.
I guess in some deeper sense you are right and (my simulated version of) what Holden has written is imprecise.
We don’t really see many “continuously” updating models, where training continues live during use. So the mundane pattern we see today, in which inference (trivially running the trained model, often on silicon built specifically for inference) is much cheaper than training, may not apply, for some reason, to the pattern that an out-of-control AI uses.
It’s not impossible that, if the system needs to be self-improving, it has to continually provision a large fraction of its training cost, or something like that.
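For a rough sense of what that could mean (a sketch under the standard approximations of ~2·N FLOPs to generate a token and ~6·N FLOPs to train on one, not anything from the post): a system that also trained on every token it produced would pay roughly 4× the compute per token of an inference-only system.

```python
# Sketch: per-token compute for an "always learning" system, using the standard
# approximations of ~2*N FLOPs to generate a token and ~6*N FLOPs to train on one.
# The ratio is independent of N; the fraction trained on is a made-up knob.
N = 175e9                      # parameters (arbitrary; cancels in the ratio)
inference_per_token = 2 * N
training_per_token = 6 * N

for trained_fraction in (0.0, 0.1, 1.0):   # share of generated tokens also trained on
    total = inference_per_token + trained_fraction * training_per_token
    print(f"train on {trained_fraction:>4.0%} of output: "
          f"{total / inference_per_token:.1f}x inference-only cost")
# -> 1.0x, 1.3x, 4.0x
```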
It’s not really clear what the “shape” of this “relative cost curve” would be, or whether this would only last a short period of time, and it doesn’t make the scenario any less dangerous.