Thanks for running with the idea! This is a major thing within education these days (e.g., Khan Academy). This seems reasonably successful, although Peter’s example and the tendency to hallucinate make me a bit concerned.
I’d be keen on attempting to fine-tune available foundation models (e.g., GPT-3.5) on the relevant data and see how good a result one might get.
My intuition after playing around with many of these models is that GPT-3.5 is probably not good enough at general reasoning to produce consistent results. It seems likely to me that either GPT-4 or Claude 2 would be good enough. FWIW, in a recent video Nathan Labenz said that, when people asked him for recommendations, he originally suggested starting with GPT-4 and going from there. The analysis gets more complicated with Claude 2 (perhaps slightly worse at reasoning, but with a longer context window).
Yeah, this could be the case. I’m just not sure that GPT-4 can be given enough context for it to be a highly user-friendly chatbot in the curriculum. But it might be the best of the two options.