A particular ChatGPT failure mode that I'm wondering whether GPT-4 passes: routing questions. The ones I tried were “can I drive from Boston to Portland, Maine without passing through New Hampshire” and “I want to look at the Arctic Ocean from behind my windshield. Can I do this?” ChatGPT answered each of them correctly fewer than 1 in 10 times. Anyone with access want to try this?
I don’t have GPT-4 access, but I was curious how GPT-3.5 got the first question wrong. I just tried it:
Me: can I drive from Boston to Portland, Maine without passing through New Hampshire?
GPT-3.5: Yes, you can drive from Boston to Portland, Maine without passing through New Hampshire by taking the coastal route.
To do this, take I-93 North out of Boston and then take Exit 37A for I-95 North. Follow I-95 North until Exit 44 for I-295 North towards South Portland. Continue on I-295 North until Exit 7 for Route 1 North. Follow Route 1 North until you reach Portland.
This route will take you along the coast and bypass New Hampshire. However, please note that this route may take longer than driving through New Hampshire on I-93.
It gets the bottom line correct, but the details are completely wrong.
GPT-3.5's answers tend to be split fairly evenly between the wrong answer (“no”) and the right answer (“yes”) paired with a route that passes directly through NH. My single GPT-4 attempt, via Poe, fell into the second category.
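To illustrate why “yes” can be the right bottom line while the suggested I-93/I-95 route is still wrong, here is a minimal Python sketch that treats the question as a graph search over regions that share a land border. The adjacency table is a hand-entered, partial subset of northeastern US states and Canadian provinces. The names `BORDERS`, `find_route`, and `passes_through` are hypothetical and exist only for this illustration. It assumes every listed border has a public road crossing and ignores actual highways, ferries, and border-crossing rules, so treat it as a coarse sanity check rather than a route planner.

```python
from collections import deque

# Hypothetical, partial adjacency table: regions that share a land border.
# Assumption: every border listed here has at least one public road crossing.
BORDERS = {
    "MA": {"NH", "VT", "NY", "CT", "RI"},
    "NH": {"MA", "VT", "ME", "QC"},
    "VT": {"MA", "NH", "NY", "QC"},
    "NY": {"MA", "VT", "CT", "QC"},  # NJ, PA, and water borders omitted
    "CT": {"MA", "NY", "RI"},
    "RI": {"MA", "CT"},
    "QC": {"NH", "VT", "NY", "ME", "NB"},  # Ontario omitted
    "ME": {"NH", "QC", "NB"},
    "NB": {"ME", "QC"},
}


def find_route(start, goal, avoid=frozenset()):
    """Breadth-first search for a region sequence from start to goal that
    never enters a region in `avoid`. Returns a list of regions or None."""
    if start in avoid or goal in avoid:
        return None
    parent = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        region = queue.popleft()
        if region == goal:
            # Walk parent links back to start, then reverse.
            path = []
            while region is not None:
                path.append(region)
                region = parent[region]
            return path[::-1]
        for nxt in sorted(BORDERS[region]):  # sorted for deterministic output
            if nxt not in parent and nxt not in avoid:
                parent[nxt] = region
                queue.append(nxt)
    return None


def passes_through(route, region):
    """True if `region` appears strictly between the route's endpoints."""
    return region in route[1:-1]
```

At this coarse level, `find_route("MA", "ME", avoid={"NH"})` returns `["MA", "NY", "QC", "ME"]`: because Maine's only US neighbor is New Hampshire, avoiding NH forces a detour through Quebec. The route GPT-3.5 described (I-93 North, then I-95 North) reduces to `["MA", "NH", "ME"]`, and `passes_through` on that route for `"NH"` is `True`, which is exactly the “right answer, wrong routing” pattern described above. Adding `"QC"` to `avoid` makes the search return `None`.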