Anthropic set up large-model training infrastructure from scratch in under a year, albeit with the benefit of prior experience. This indicates that infrastructure isn't currently extremely hard to replicate.
EleutherAI has succeeded at training some fairly large models (the biggest has about 20B params, versus PaLM's 540B) while basically being talented amateurs, and without much money. These models also introduced a simple but novel tweak to the transformer architecture that PaLM later adopted: computing the attention and MLP sublayers in parallel rather than sequentially (sketched below). This suggests that experience also isn't totally crucial.
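For concreteness, here is a minimal sketch of that parallel formulation in PyTorch. The module structure, names, and dimensions are illustrative rather than taken from the actual GPT-J or PaLM code: the point is just that both sublayers read the same normalized input and their outputs are summed into one residual, instead of the MLP consuming the attention output.

```python
import torch
import torch.nn as nn


class ParallelTransformerBlock(nn.Module):
    """Transformer block with attention and MLP applied in parallel
    (GPT-J / PaLM style) rather than sequentially (GPT-2 style).
    Hyperparameters here are illustrative defaults, not PaLM's."""

    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        # A single shared LayerNorm feeds both branches (as in GPT-J).
        self.ln = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Sequential (standard) block would be:
        #   x = x + attn(ln1(x));  x = x + mlp(ln2(x))
        # Parallel block: both sublayers see the same input, and their
        # outputs are added into the residual stream together.
        h = self.ln(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        return x + attn_out + self.mlp(h)


# Quick shape check: (batch, seq, d_model) in, same shape out.
block = ParallelTransformerBlock()
y = block(torch.randn(2, 16, 512))
assert y.shape == (2, 16, 512)
```

The practical appeal, as I understand the PaLM paper's framing, is throughput: the input projections of the attention and MLP branches can be fused since they share an input, at little to no cost in quality at large scale.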
I think prior ML experience matters less for success than domain experience does in other areas of software engineering.
My guess is that entrenched labs will have bigger advantages as time goes on and as ML gets more complicated.