Ward A comments on "Measuring artificial intelligence on human benchmarks is naive"