Wow. Fine-tuning a GPT-J model, especially the 20B, is much, much more work than running GPT-3.
I’m curious whether you did this by grabbing the model and running it yourself in the cloud, on some local GPUs (really, really expensive?), or used a service and never touched the model directly. All of these seem at least a little tricky.
Uh, feel free to not answer until you are much better.