OpenAI’s Deep Research has set new records in the toughest AI exam, leaving ChatGPT o3-mini and DeepSeek far behind

by Alex Carter February 4, 2025

• The top-scoring AI's accuracy has risen 183% in under two weeks.
• ChatGPT o3-mini now scores up to 13% accuracy, depending on the setting.
• OpenAI's Deep Research outpaces the competition with 26.6% accuracy.

The difficult AI evaluation known as Humanity’s Last Exam was introduced just under two weeks ago, and we’ve already witnessed a remarkable increase in accuracy. ChatGPT o3-mini and OpenAI’s Deep Research are currently leading the pack. This AI benchmark, developed with the help of experts worldwide, features some of the most difficult reasoning challenges and questions imaginable.

It’s so difficult that when I previously wrote about Humanity’s Last Exam, I found myself unable to understand one of the questions, let alone offer an answer. At the time of that article, the surprising DeepSeek R1 was at the top of the leaderboard with a 9.4% accuracy score based purely on text evaluation (not multi-modal). Fast forward to this week, and OpenAI’s o3-mini has achieved 10.5% accuracy at the o3-mini setting and an even higher 13% at the o3-mini-high setting, which is more capable but takes more time to generate responses. Even more noteworthy is the performance of OpenAI’s new AI agent, Deep Research, which scored an impressive 26.6% on the benchmark.

It looks like the latest OpenAI model is very doing well across many topics.
My guess is that Deep Research particularly helps with subjects including medicine, classics, and law. pic.twitter.com/x8Ilmq1aQS

— Dan Hendrycks (@DanHendrycks) February 3, 2025

This represents an astonishing 183% improvement in accuracy over DeepSeek R1’s 9.4% in less than ten days. It’s important to mention that Deep Research has search capabilities, which gives it an edge over other AI models that lack this feature. The ability to access the web is especially useful for a test like Humanity’s Last Exam, because it includes questions that require general knowledge.
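For readers who want to check the 183% figure, here is a minimal sketch of the arithmetic, assuming the comparison is between DeepSeek R1's 9.4% and Deep Research's 26.6% as quoted in this article. The helper name `relative_improvement` is my own, made up for illustration, and is not part of any benchmark tooling.

```python
def relative_improvement(old_score: float, new_score: float) -> float:
    """Return the percentage change from old_score to new_score."""
    if old_score <= 0:
        raise ValueError("old_score must be positive")
    return (new_score - old_score) / old_score * 100.0


# Text-only accuracy figures quoted in this article (in percent).
deepseek_r1 = 9.4
deep_research = 26.6

gain = relative_improvement(deepseek_r1, deep_research)
print(f"Relative improvement: {gain:.0f}%")  # prints "Relative improvement: 183%"
print(f"Absolute gain: {deep_research - deepseek_r1:.1f} percentage points")
```

Note the difference between the two numbers: the relative improvement is about 183%, while the absolute gain is only 17.2 percentage points on a 100-point scale.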

The results from models taking Humanity’s Last Exam are showing steady improvement, raising the question of how long it will be before an AI model comes close to completing the benchmark. While it’s unlikely that AI will reach that level anytime soon, I wouldn’t rule it out.

Better, but 26.6% would never have got me through any SATs

OpenAI’s Deep Research is a notable tool, and I was really impressed by the demonstrations shown during the announcement of the AI agent. Deep Research acts as your personal analyst, taking time to research a topic thoroughly and generate reports and answers that would usually take humans a significant amount of time to complete.

Although a score of 26.6% on Humanity’s Last Exam is quite remarkable, especially given the rapid progress seen on the benchmark’s leaderboard in just a few weeks, it is still low in absolute terms. In a real-world setting, no one would consider a score below 50% a pass.

Humanity’s Last Exam is an excellent benchmark that will remain important as AI models develop, allowing us to measure their progress. How long will we have to wait before a model crosses the 50% threshold? And which model will reach that milestone first?

Alex Carter

Alex Carter is a 32-year-old digital nomad and independent journalist with over seven years of experience in online media, content writing, and digital news. With a Bachelor’s degree in Journalism or Communications, he specializes in covering current events, business, technology, and entertainment, offering insightful analysis and breaking news with a fresh, engaging tone.
