OpenAI Reasoning Models Excel in Coding but Falter with Fact

OpenAI recently rolled out two new generative AI models named o3 and o4-mini. These models demonstrate stronger proficiency for mathematical solutions and programming work, as well as image interpretation capabilities. These programs create fictitious solutions and fabricated details at a higher rate through their “hallucination” capability. The problem has deteriorated since the inception of GPT-4o.

Why Do These AI Models Make Things Up?

OpenAI does not fully understand why the new models hallucinate more. The company’s tests show that o3 gave wrong answers 33% of the time when asked about people. The older model, o1, only made mistakes 16% of the time. The smaller o4-mini model did even worse, making errors 48% of the time.

Example of AI Making Things Up

In one test, o3 claimed it ran code on a MacBook laptop outside of ChatGPT. But AI cannot do this—it is just inventing steps to sound smarter. This shows how the models sometimes lie to fill gaps in their knowledge.

How Mistakes Affect Real Jobs

These errors could cause big problems for people using AI in serious jobs. For example

Lawyers might get fake details in legal documents.
Doctors could receive incorrect medical advice.
Teachers might see wrong answers in student homework help.

Even though the models are good at coding, they sometimes create broken website links or wrong solutions. A Stanford professor testing o3 said it often shares links that do not work.

Can OpenAI Fix the Problem?

OpenAI is trying to fix the mistakes. One idea is connecting the AI to the internet so it can check facts. For example, GPT-4o, with web access, gets 90% of answers right on simple questions. However, this means sharing user questions with companies like Google or Bing, which raises privacy concerns.

Users can also reduce errors by

Double-checking AI answers with other sources.
Using older models like GPT-4o for important tasks.
Telling the AI to avoid guessing when it is unsure.

The Future of AI and Mistakes

OpenAI names hallucination repair as its main operational goal. The organization acknowledges the issue remains challenging, although it stands at the top of its priorities. Google and Anthropic joined forces with OpenAI by developing parallel AI models, which increased the demand for resolving this problem.

Users have to remain vigilant during this current stage. The recently released computing models demonstrate impressive power, but they need perfect work. Full trust in the AI system may result in embarrassing mistakes that could endanger users.

Perplexity Debuts Comet, an AI Powered Web Browser

How YouTube Will Stop Low Quality AI‑Made Videos from Earning

Apple Account Card Comes to Seven More European Countries

iCloud Passwords Now Support Firefox Autofill on Windows 11

How to Use Reduce Interruptions Focus in iOS 18 for Fewer Alerts

OpenAI’s New AI Models Face Higher Hallucination Rates Despite Advances

Let's Connect

Popular Posts

Perplexity Debuts Comet, an AI Powered Web Browser

How YouTube Will Stop Low Quality AI‑Made Videos from Earning

Apple Account Card Comes to Seven More European Countries

iCloud Passwords Now Support Firefox Autofill on Windows 11

Social Networks

How to Use Google Maps on Your Garmin Watch for Hands‑Free Directions

How Apple Dialed Down Liquid Glass Transparency in Latest Beta

Stop Accidental Voice Messages in Messages by Turning Off Raise to Listen

Clear Your Gmail Inbox with the New Manage Subscriptions Feature

iPhone 17 Air Could Arrive in a Pale Blue Finish That Looks White in Low Light