Learning as much Deep Learning math as I could in 24 hours

TL;DR: I designed an experiment where I committed to spending two 12-hour days trying to learn as much deep-learning math as possible, basically from scratch.

Table of Contents

  1. Origins and Motivations

  2. Results

  3. Takeaways

  4. Experiment set-up

  5. The Curriculum

  6. Documentation on hours

Origins and Motivations

For a long time, I’ve felt intimidated by the technical aspects of alignment research. I had never taken classes on linear algebra or multivariable calculus or deep learning, and when I cracked open many AI papers, I was terrified by symbols and words I didn’t understand.

Seven months ago I wrote up a short doc about how I was going to remedy my lack of technical knowledge: I collected some textbooks and some online courses, and I decided to hire a tutor to meet with for a few hours a week. The first two weeks of meetings were awesome; then travel disrupted the regular schedule, and I never came back to it.

When I thought about my accumulating debt of technical knowledge, my cached answer was “Oh, that might take six months to get up to speed. I don’t have the time.”

Then, watching my productivity on other projects over the intervening months, I noticed two things:

  1. There appeared to be massive variance in my productivity. Sometimes, in a single day, I would get more done than I had accomplished in previous weeks.

  2. I seemed to both enjoy certain projects more and get more done on them by “sprinting,” e.g. spending 10 hours on a project in a single day rather than spreading that same work out over 2 hours a week for 5 weeks. It was, for some reason, way more motivating and seemingly more efficient to sprint.

Also, when I asked myself what I thought the main bottlenecks were for addressing my technical debt problem, I identified two categories:

  1. Time (I felt busy all the time, and was afraid of committing too much to one project)

  2. A combination of lacking motivation, accountability, and fun

Then, as my mind wandered, I started to put two and two together: perhaps these new things I had noticed about my productivity could be used to address the bottlenecks in my technical debt? I decided to embark on an experiment: how much technical background on deep learning could I learn in a single weekend? My understanding of the benefits of this experiment was as follows:

  1. Committing “a weekend” felt like a much smaller time cost than committing “a few months”, even if they were the same number of hours.

  2. No Distraction: I could design my environment to minimize distractions for two days, something it would be intractable to do to the same degree for several months.

  3. “Trying to learn as much as possible” felt like a challenge. I was, to be honest, pretty scared. I didn’t know what I was doing, it felt extreme, but that also made it exciting and fun.

  4. I had some historical data that I might be good at this kind of sprinting, and framing this as an experiment to see what I could learn about my productivity added another layer of discovery-driven motivation and fun. What if I learned more about how to be productive and get hard things done via this experiment?

  5. As far as I knew, nobody else among my peers had done this—but I suspected that more people than me had the same problems, and that if I conducted this experiment, I might learn things that would be helpful to others, which added yet another layer of discovery-driven motivation and fun.

  6. Accountability: Once I told somebody about this, it was hard to back out. It’s way easier for them to monitor me for a weekend than for a few months.

Results

  • I’d consider the experiment a success: I finished the whole curriculum in ~18 hours, and I got a lot of neat takeaways, which I’ll go over below.

  • At the end of day 1, after 12 hours of cramming, I was too exhausted to explain what I had learned. However, the next morning, after reading my notes for 20 minutes, I was able to explain everything I had learned the night before. I haven’t tested myself for retention yet, but I feel fairly confident that after glancing at my notes for 20 minutes, I’ll be able to recall and use the information again, which is good enough for my practical purposes.

  • I ended up having to take more breaks than I expected because my brain just kinda felt fried at times. I would often just lie down on the floor and look at the ceiling. If I hadn’t been part of an experiment, I think these moments are when I likely would’ve given up and done something else. But I noticed that after 10, 20, or 30 minutes, I was able to return to being productive, which is likely where this experiment showed its worth.

Takeaways

  • Experiments are really useful, as outlined by Neel Nanda.

  • There is no speed limit—the world appears to be not very optimized for speed, including past versions of me. Two days of actually trying allowed me to accomplish more on this particular goal than 7 previous months of cowardice had, and (maybe) about as much as a full semester college course would have.

  • A big part of optimizing the actions you take is deciding upon the optimal sequence. Learning seems to be a lot about setting up the right sequence of inferential steps; order matters a lot. If you skip the wrong step, it’ll cost you.

  • Having someone with way more knowledge spend half an hour curating a list of videos, especially by pointing out the things which are irrelevant and can be skipped, saved me tens of hours, probably more.

  • People waste a lot of time on “prerequisites” for the things they want to do that aren’t actually real prerequisites. Instead, you can try to directly do the thing you want, then notice where you get stuck, and go back to build just the prerequisites you notice are needed to overcome the stuck-ness. No wasted motion.

  • Explaining what you learn to someone is a great (and fun) way to actually test your knowledge, notice where you’re lacking so you can efficiently direct further study, and get useful explanations on things you’re struggling with.

  • Asking for help is hard. It’s also really worth it. This experiment probably would not have worked without the advice of Thomas Larsen and Tamera Lanham. Their ~3 total hours of input probably saved me tens of hours, which seems like a great utilitarian deal.

  • The curriculum I used was (only partially) optimized. It was still way better than anything I could’ve come up with on my own, but a better curriculum is definitely possible.

    • Thomas just came up with a list of concepts, looked through the 3Blue1Brown series for them, marked the videos I could skip, then searched YouTube for videos on the topics 3b1b didn’t cover, chose the ones with the highest views/likes, and watched the first five minutes of a couple to see if he liked them. In total I think this cost him ~1 hour over two days.

The Experiment Set-Up

  • I committed to spend 12 hours a day, for two days, doing whatever I could to learn about deep learning. I intended to optimize my breaks, to take meals while working, and generally try to ignore everything and everyone else not involved in the experiment...

  • I would occupy a room in the Lightcone offices with a whiteboard, and lock the door to minimize distractions from friends.

  • Originally, the plan was to sprint through a python course and the fast.ai course and code a neural net by the end of the weekend.

    • After half an hour of talking with Tamera Lanham and Thomas Larsen, it became apparent that what better fit my goals was understanding the math behind neural networks, rather than the coding. This meant that two days before the experiment, I discarded the curriculum I had. Thomas graciously spent an hour building me a new one.

  • Originally, I was going to conduct this sprint with a friend, and with Thomas Larsen available to help us when stuck. The friend decided to do something else the night before, which in hindsight probably improved the efficiency of the experiment, because I could set my own pace very easily, and was less distracted by exciting-but-irrelevant-to-the-experiment conversation. In my original plan I was afraid of doing it alone, but I might actually recommend doing it that way.

The Curriculum that I used

Intro to deep learning (I kept returning to these videos throughout the experiment, rewatching and understanding slightly more)

Linear algebra (this took me 2hr 25 mins, and ~36 mins of breaks)

Calc 3 (this took me 2hr 57 mins and ~50 mins of breaks)

- ResNets: https://www.youtube.com/watch?v=ZILIbUvp5lk (took me 18 mins)

- RNNs (optional): https://www.youtube.com/watch?v=_aCuOwF1ZjU (took me 13 mins)

- Transformers: https://www.youtube.com/watch?v=4Bdc55j80l8&t=609s

(I spent about two hours on the above video, which ex post was not a great use of time. I’d recommend others choose a different explainer on Transformers.)

- RL basics: https://www.youtube.com/watch?v=JgvyzIkgxF0 (took me 25 mins)

- Policy gradients / PPO: https://www.youtube.com/watch?v=5P7I-xPq8u8&t=318s

I could not understand the above video even after rewatching it several times (I think the curriculum skipped some prerequisites for it), so I had Thomas Larsen walk me through it one-on-one for around an hour. Thanks Thomas! For reference, I sketch the key identities just after this list.

- RLHF: Rob Miles video: https://www.youtube.com/watch?v=PYylPRX6z4Q (took me 23 mins)
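For anyone who hits the same wall I did on policy gradients: the core identities the material builds toward are compact enough to write down. The sketch below is my summary of the standard textbook formulation (the clipped objective comes from the original PPO paper), not a transcription of the video, so treat it as a map of what the video is trying to teach rather than a substitute for it.

```latex
% Vanilla policy gradient (REINFORCE): to improve the expected return
% J(\theta) of a stochastic policy \pi_\theta, estimate the gradient from
% sampled trajectories; no need to differentiate through the environment:
\nabla_\theta J(\theta)
  = \mathbb{E}_{\tau \sim \pi_\theta}
    \Big[ \sum_t \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, \hat{A}_t \Big]

% PPO instead works with the probability ratio between new and old policies,
%   r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)},
% clipped so that no single update moves the policy too far:
L^{\mathrm{CLIP}}(\theta)
  = \mathbb{E}_t
    \Big[ \min\!\big( r_t(\theta)\,\hat{A}_t,\;
      \mathrm{clip}(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon)\,\hat{A}_t \big) \Big]
```

Here \hat{A}_t estimates the advantage (how much better the action was than the policy’s average behavior at that state), and \epsilon is a small constant, typically around 0.2. Loosely speaking, the second objective is the first with the log-probability swapped for a clipped probability ratio; most of PPO is that one trick.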

Documentation on Hours

(I used Toggl Track to record my time, and was fairly happy with the software. However, I made many errors / didn’t record breaks correctly, etc., so take these numbers with a grain of salt.)

Saturday (started around 10:00am, ended around 8:45pm)

  • Video 1: 28 min

  • Video 1 Summarizing: 15 min

  • ???: 7 min

  • Video 2: 29 min

  • Break: 4 min

  • Video 2 Summarizing: 20 min

  • Video 3: 13 min

  • Break: 9 min

  • Whiteboarding: 4 min

  • Linear Algebra (first four videos): 58 min

  • Cleaning up notes: 7 min

  • Linear Algebra, Chapter 4: 16 min

  • Whiteboarding: 14 min

  • Three-Dimensional Linear Transformations: 14 min

  • ???: 12 min

  • Chapter 9: 10 min

  • Break: 27 min

  • Chapter 13: 14 min

  • Calculus?: 3 min

  • Backpropagation, Chapter 4: 10 min

  • Break: 9 min

  • Multivariable Calculus: 1 hr 6 min

  • Meditation: 9 min

  • Multivariable Calculus: 10 min

  • Meditation: 7 min

  • Multivariable Calculus: 26 min

  • Break: 25 min

  • Multivariable Calculus: 47 min

  • Multivariable Calculus: 28 min

  • Watching Neural Nets Ch. 3 again: 30 min

  • ????: 45 min

  • Trying to explain and failing: 30 min

Sunday (started around 11am, ended around 6pm)

  • Rewatching Backpropagation: 22 min

  • ResNets video: 18 min

  • RNNs video: 13 min

  • Transformers: 21 min

  • Break: 5 min

  • Transformers: 36 min

  • RL Basics: 25 min

  • Break: 23 min

  • More RL: 23 min

  • Talking to Thomas about Transformers, Reinforcement Learning, and PPO: 120 min