
• Top-scoring AI boosted accuracy by 183% in under two weeks.
• ChatGPT o3-mini now scores 13% accuracy on its high setting.
• OpenAI Deep Research leads the pack with 26.6% accuracy.
The notoriously difficult AI evaluation known as Humanity's Last Exam was introduced just under two weeks ago, and we've already seen an impressive jump in accuracy. ChatGPT o3-mini and OpenAI's Deep Research are currently leading the pack. This AI benchmark, developed by experts around the world, features some of the most difficult reasoning challenges and questions imaginable.
It's so challenging that when I previously discussed Humanity's Last Exam in the related article, I found myself unable to understand one of the questions, let alone provide an answer. At the time of that article, the impressive DeepSeek R1 sat at the top of the leaderboard with a 9.4% accuracy score based on text-only evaluation (not multi-modal). Fast forward to this week, and OpenAI's o3-mini has achieved 10.5% accuracy on the standard o3-mini setting and an even better 13% on the o3-mini-high setting, which is more capable but takes longer to generate responses. Even more noteworthy is the performance of OpenAI's new AI agent, Deep Research, which scored an impressive 26.6% on the benchmark.
It looks like the latest OpenAI model is doing very well across many topics.
My guess is that Deep Research particularly helps with subjects including medicine, classics, and law. pic.twitter.com/x8Ilmq1aQS— Dan Hendrycks (@DanHendrycks) February 3, 2025
This represents an astonishing 183% improvement in accuracy in less than ten days. It's worth noting that Deep Research has search capabilities, which gives it an edge over AI models that lack this feature. The ability to access the web is especially valuable for a test like Humanity's Last Exam, since it includes questions that require specialized knowledge.
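For anyone curious where that 183% figure comes from, here is a minimal back-of-the-envelope sketch, assuming the baseline is DeepSeek R1's 9.4% text-only score and the new top mark is Deep Research's 26.6% (both figures taken from above):

```python
# Relative improvement from the previous top score to the new one.
baseline = 9.4   # DeepSeek R1 accuracy (%), text-only evaluation
new_top = 26.6   # OpenAI Deep Research accuracy (%)

relative_gain = (new_top - baseline) / baseline * 100
print(f"Relative improvement: {relative_gain:.0f}%")  # roughly 183%
```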
The results from models taking Humanity's Last Exam are showing steady improvement, raising the question of how long it will be before an AI model can come close to acing the benchmark. While it's unlikely that AI will reach that level any time soon, I wouldn't rule it out.
Better, but 26.6% never would have gotten me through the SATs
OpenAI Deep Research is a remarkable tool, and I was genuinely impressed by the demonstrations shown during the AI agent's announcement. Deep Research acts as your personal analyst, taking the time to thoroughly research a topic and generating reports and answers that would normally take humans a significant amount of time to complete.
Although a score of 26.6% on Humanity's Last Exam is quite remarkable, especially given the rapid progress seen on the benchmark's leaderboard in just a few weeks, it is still objectively low. In a real-world scenario, nobody would consider anything below 50% a passing score.
Humanity's Last Exam serves as an excellent benchmark that will remain important as AI models continue to develop, allowing us to measure their progress. How long will it take before a model crosses the 50% threshold? And which model will reach that milestone first?