OpenAI’s Deep Research has set new records in the toughest AI exam, leaving ChatGPT o3-mini and DeepSeek far behind

by Alex Carter, February 4, 2025

• The top-scoring AI boosted accuracy by 183% in two weeks.
• ChatGPT o3-mini now scores up to 13% accuracy, depending on the setting.
• OpenAI Deep Research beats the competition with a result of 26.6% accuracy.

The difficult AI benchmark known as Humanity’s Last Exam was introduced just under two weeks ago, and we’ve already seen a significant jump in accuracy. ChatGPT o3-mini and OpenAI’s Deep Research are currently leading the pack.

This AI benchmark, developed by experts worldwide, features some of the most difficult reasoning problems and questions imaginable. It’s so complicated that when I previously discussed Humanity’s Last Exam in the related article, I found myself unable to understand one of the questions, let alone answer it.

At the time of that article, the surprising DeepSeek R1 sat at the top of the leaderboard with a 9.4% accuracy score, based purely on text evaluation (not multi-modal). Fast forward to this week, and OpenAI’s o3-mini has achieved 10.5% accuracy at the o3-mini setting and an even higher 13% at the o3-mini-high setting, which is more capable but takes longer to generate responses.

Even more noteworthy is the performance of OpenAI’s new AI agent, Deep Research, which scored an impressive 26.6% on the benchmark.

"It looks like the latest OpenAI model is doing very well across many topics. My guess is that Deep Research particularly helps with subjects including medicine, classics, and law. pic.twitter.com/x8Ilmq1aQS"
Dan Hendrycks (@DanHendrycks), February 3, 2025

Measured against DeepSeek R1’s 9.4%, this represents an astonishing 183% improvement in accuracy in less than ten days.
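For readers who want to check the arithmetic behind the 183% figure, here is a minimal sketch. It assumes the improvement is measured relative to DeepSeek R1's 9.4% score, which is the only baseline that matches the reported number; the function name `relative_improvement` and the `scores` dictionary are illustrative and not part of any leaderboard tooling.

```python
def relative_improvement(old: float, new: float) -> float:
    """Return the percentage change from `old` to `new`, relative to `old`."""
    if old == 0:
        raise ValueError("baseline score must be non-zero")
    return (new - old) / old * 100.0


# Accuracy scores (in percent) as quoted in the article.
scores = {
    "DeepSeek R1": 9.4,
    "o3-mini": 10.5,
    "o3-mini-high": 13.0,
    "Deep Research": 26.6,
}

# Assumption: DeepSeek R1 is the baseline for the headline figure.
baseline = scores["DeepSeek R1"]

for name, score in scores.items():
    change = relative_improvement(baseline, score)
    print(f"{name}: {score:.1f}% accuracy, {change:+.0f}% vs DeepSeek R1")
```

Note that this is a relative gain. In absolute terms, Deep Research sits 17.2 percentage points above the earlier top score.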
It’s important to note that Deep Research has search capabilities, which give it an edge over other AI models that lack this feature. The ability to access the web is especially useful for a test like Humanity’s Last Exam, because it includes questions that require general knowledge.

The results from models taking Humanity’s Last Exam are showing steady improvement, raising the question of how long it will be before an AI model can come close to completing the benchmark. While it’s unlikely that AI will reach that level any time soon, I wouldn’t rule it out.

Better, but 26.6% never got me through any SATs

OpenAI Deep Research is a remarkable tool, and I was really impressed by the demonstrations shown during the announcement of the AI agent. Deep Research acts as your personal analyst, taking the time to research a topic thoroughly and generate reports and answers that would usually take humans a significant amount of time to complete.

Although a score of 26.6% on Humanity’s Last Exam is quite remarkable, especially given the rapid progress seen on the benchmark’s leaderboard in just a couple of weeks, it is still objectively low; no one would consider a score below 50% a pass in a real-world scenario.

Humanity’s Last Exam serves as an excellent benchmark that will become more important as AI models develop, allowing us to measure their progress. How long will it take before one crosses the 50% threshold? And which model will reach that milestone first?

Alex Carter is a 32-year-old digital nomad and independent journalist with over seven years of experience in online media, content writing, and digital news. With a Bachelor’s degree in Journalism or Communications, he specializes in covering current events, business, technology, and entertainment, offering insightful analysis and breaking news with a fresh, engaging tone.