During its big GPT-5 livestream on Thursday, OpenAI showed off a few charts that made the model seem quite impressive, but if you look closely, some graphs were a little bit off.
In one, ironically showing how well GPT-5 does in “deception evals across models,” the scale is all over the place. For “coding deception,” for example, GPT-5 apparently gets a 50.0 percent deception rate, yet the bar for o3's smaller score of 47.4 percent is somehow larger.
Or take another chart, where one of GPT-5’s scores is lower than o3’s but is shown with a bigger bar. In this same chart, o3 and GPT-4o’s scores are different but shown with equally sized bars. That chart was bad enough that CEO Sam Altman commented on it, calling it a “mega chart screwup.” An OpenAI marketing staffer also apologized for the “unintentional chart crime.”
OpenAI didn’t immediately respond to a request for comment. And while it’s unclear if OpenAI used GPT-5 to actually make the charts, it’s still not a great look for the company on its big launch day, especially when it’s touting the “significant advances in reducing hallucinations” with its new model.