Final comment—quick take on “current approach in AI”
If you’re still here, still reading this comment chain (my guess is there’s a 90% chance the original commenter is gone, and most forum readers are gone too), you might be confused because I haven’t mentioned the “current paradigm of ML or AI”.
For completeness, it’s worth filling this in. So, here it is:
Basically, the current paradigm of ML/deep learning is a lot less deep than it seems.
Essentially, people take models, like a GPT-3- or BERT-style transformer, tweak them, run data through them, and look at the resulting performance.
To build these models and new ones, people basically append to and modify existing models with new architectural pieces or data, and look at how the results perform on established benchmarks (e.g. language translation, object detection).
Yeah, this doesn’t sound super principled, and it isn’t.
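To make that concrete, here’s roughly what the loop looks like. This is a toy sketch: the “backbone” is a made-up stand-in for a real pretrained transformer, and the “benchmark” data is random noise, so none of the names or numbers here are real.

```python
# Rough sketch of the usual loop: take an existing model, bolt on a new head,
# run data through it, look at the benchmark number. Toy stand-ins only.
import torch
import torch.nn as nn

backbone = nn.Sequential(              # stand-in for a pretrained encoder (BERT-ish)
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
)
head = nn.Linear(128, 2)               # the "append and modify" part: a new task head
model = nn.Sequential(backbone, head)

opt = torch.optim.Adam(model.parameters(), lr=1e-4)
xs, ys = torch.randn(256, 128), torch.randint(0, 2, (256,))   # fake "benchmark" data

for _ in range(50):                    # tweak / fine-tune a bit...
    opt.zero_grad()
    nn.functional.cross_entropy(model(xs), ys).backward()
    opt.step()

acc = (model(xs).argmax(dim=1) == ys).float().mean().item()   # ...then check the score
print(f"benchmark accuracy: {acc:.2f}")
```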
To give an analogy, imagine bridge building in civil engineering. My guess is that when designing a bridge, engineers use intricate knowledge of materials and physics, and choose every single square meter of every component of the bridge according to what is needed, so that the resulting whole stands up, with each component supporting the others.
In contrast, another approach to building a bridge is to just bolt and weld a lot of pieces together, guided by intuition and experience. This approach would probably copy existing designs, and there would be a lot of tacit knowledge and rules of thumb (e.g. when you want to make a bridge 2x as big, you usually need more than 2x the material). With many iterations, over time this process would work and get pretty efficient.
The second approach is basically a lot of how deep learning works. People try different architectures and add layers, making moderate innovations over the last model.
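In code, a lot of that “try variants and see” work looks something like the sweep below. Again, everything is made up (random data, arbitrary layer counts), so the scores are meaningless; it’s just the shape of the trial-and-error workflow.

```python
# Toy sketch of the trial-and-error loop: build a slightly deeper variant,
# train it a bit, look at the number, repeat. Data is random noise.
import torch
import torch.nn as nn

def make_model(n_layers, dim=64):
    blocks = []
    for _ in range(n_layers):
        blocks += [nn.Linear(dim, dim), nn.ReLU()]
    return nn.Sequential(*blocks, nn.Linear(dim, 2))

xs, ys = torch.randn(512, 64), torch.randint(0, 2, (512,))

for n_layers in (2, 4, 8):                       # "add layers", see what happens
    model = make_model(n_layers)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(20):
        opt.zero_grad()
        nn.functional.cross_entropy(model(xs), ys).backward()
        opt.step()
    acc = (model(xs).argmax(dim=1) == ys).float().mean().item()
    print(f"{n_layers} layers -> accuracy {acc:.2f}")
```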
There are a lot more details. Things like unsupervised learning, encoder-decoder models, latent spaces, and more, are super interesting and involve real insights or new approaches.
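For example, to make “encoder-decoder” and “latent space” a bit more concrete, here’s a minimal autoencoder sketch. The dimensions and data are arbitrary, chosen only for illustration.

```python
# Minimal encoder-decoder sketch (an autoencoder): compress to a latent code, reconstruct.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 8))   # 8-dim latent space
decoder = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 784))

x = torch.rand(32, 784)          # e.g. flattened 28x28 images
z = encoder(x)                   # compressed latent representation
x_hat = decoder(z)               # reconstruction from the latent code
loss = nn.functional.mse_loss(x_hat, x)   # training would minimize this
print(z.shape, loss.item())
```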
I’m basically too dumb to build a whole new deep learning architecture, but many people are smart enough, and major advances can come from new insights. Still, a lot of it is iteration and trial and error. The biggest enabler is large amounts of data and compute, and, I guess, lots of investor money.