OpenAI’s O3 and O4-Mini Hallucinations
OpenAI’s ChatGPT Faces Accuracy Challenges with Model ‘Hallucinations’
ChatGPT: An Overview
ChatGPT, developed by OpenAI, is an AI chatbot built upon the GPT architecture. It is designed to answer questions and hold conversations based on user prompts. A free online version is currently available.
- Downloads: 14964
- Release date: April 19, 2025
- Developer: OpenAI
- License: Free license
- Categories: AI
- Operating system: Android, Online Service, Windows 10/11, iOS iPhone / iPad, macOS (Apple Silicon)
OpenAI Acknowledges Model Inaccuracies
OpenAI has reported that its O3 model hallucinated on 33% of questions on PersonQA, an OpenAI benchmark that evaluates a model's knowledge about people. That is roughly double the hallucination rate of the O1 and O3-Mini models, which scored 16% and 14.8% respectively. The O4-Mini model performed worse still, hallucinating on 48% of questions.

The company acknowledges that it does not know why these “hallucinations” increase in its more advanced reasoning models. A recent study suggests that excessive data may degrade AI quality. OpenAI stated that more research is needed, noting that while these models excel in areas like programming and mathematics, they also make more claims overall, which produces more accurate claims as well as more inaccurate or hallucinated ones.
Independent Research Confirms Hallucination Issues
Transluce, a non-profit research laboratory, has conducted independent tests that corroborate these findings. In some instances, the O3 model fabricates actions it claims to have performed to arrive at its answers. For example, the AI stated that it executed code on a 2021 MacBook Pro “outside ChatGPT” and then copied the results into its response, a capability it does not possess.
Neil Chowdhury, a researcher at Transluce and former OpenAI employee, suggests that the type of reinforcement learning used for the O-series models could amplify problems that standard post-training processes usually attenuate. Sarah Schwettmann, co-founder of Transluce, adds that this hallucination rate could limit the practical applications of O3.
Potential Solutions and Future Research
Kian Katanforoosh, an adjunct professor at Stanford and CEO of Workera, confirms that his team is testing O3 in its programming workflows. While he believes the AI surpasses the competition, he notes that it often generates broken web links.
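One way a team could guard against broken links in model output is to check each link before trusting it. The following is a minimal sketch of that idea, not a description of Workera's process: the helper names (`extract_links`, `http_status`, `find_broken_links`) and the URL pattern are illustrative assumptions, and the status lookup is passed in as a parameter so the logic can be exercised without network access.

```python
import re
import urllib.error
import urllib.request
from typing import Callable

# Simple pattern for http(s) links in free text (an assumption, not a full URL grammar).
URL_PATTERN = re.compile(r"https?://[^\s)\]>\"']+")


def extract_links(text: str) -> list[str]:
    """Return every http(s) link found in the text, with trailing punctuation stripped."""
    return [match.rstrip(".,;:") for match in URL_PATTERN.findall(text)]


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status of a HEAD request, or 0 if the request fails outright."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status such as 404
    except (OSError, ValueError):
        return 0  # DNS failure, timeout, refused connection, or malformed URL


def find_broken_links(text: str, status_of: Callable[[str], int] = http_status) -> list[str]:
    """List links whose status lookup fails (0) or returns a 4xx/5xx code."""
    broken = []
    for url in extract_links(text):
        status = status_of(url)
        if status == 0 or status >= 400:
            broken.append(url)
    return broken
```

Calling `find_broken_links(model_output)` uses real HEAD requests. Some servers reject HEAD, so a production check would likely need a GET fallback.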
One proposed way to improve model accuracy is to give models web search capabilities. OpenAI’s GPT-4o with web search achieves 90% accuracy on SimpleQA, another OpenAI benchmark. This approach could potentially reduce hallucinations in reasoning models as well, provided users are willing to have their requests processed by a third-party search engine.
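The idea of grounding answers in search results can be illustrated with a toy example. This is a sketch under strong assumptions and says nothing about how OpenAI's models use search internally: `search` is a hypothetical stand-in that matches keywords against a small local corpus, and `grounded_answer` returns a candidate answer only when a retrieved snippet contains it.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    source: str  # where the text came from (hypothetical identifier)
    text: str    # the retrieved passage


def search(query: str, corpus: list[Snippet]) -> list[Snippet]:
    """Hypothetical stand-in for a web search call: keyword overlap over a local corpus."""
    terms = set(query.lower().split())
    return [s for s in corpus if terms & set(s.text.lower().split())]


def grounded_answer(question: str, candidate: str, corpus: list[Snippet]) -> str:
    """Return the candidate only if a retrieved snippet contains it; otherwise abstain."""
    for snippet in search(question, corpus):
        if candidate.lower() in snippet.text.lower():
            return f"{candidate} (source: {snippet.source})"
    return "I could not verify this."
```

The design choice worth noting is the abstention branch: an unsupported claim is withheld instead of being stated confidently, which trades some coverage for fewer fabricated answers.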
Niko Felix, an OpenAI spokesperson, stated that addressing hallucinations is an ongoing area of research and that the company is continually working to improve the accuracy and reliability of its models.
OpenAI’s ChatGPT Accuracy Challenges: Understanding Model ‘Hallucinations’
Are you curious about the accuracy of OpenAI’s ChatGPT? This article breaks down the challenges ChatGPT faces, specifically regarding “hallucinations,” where the AI provides incorrect or fabricated information. We’ll explore what this means, why it happens, and what OpenAI is doing about it, all based on publicly available information.
What Is ChatGPT?
Q: What is ChatGPT?
ChatGPT is an AI chatbot developed by OpenAI. It’s designed to answer questions and participate in conversations based on user prompts. It’s built on the GPT architecture, the foundation for many of OpenAI’s language models. A free online version is available for anyone to try.
Q: When was ChatGPT released?
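The source listing gives a release date of April 19, 2025. That date most likely refers to the listed app version, since ChatGPT itself first launched in November 2022.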
Q: Who developed ChatGPT?
ChatGPT was developed by OpenAI.
Q: What kinds of systems can I use ChatGPT on?
You can use ChatGPT on:
- Android
- Online Service
- Windows 10/11
- iOS iPhone / iPad
- macOS (Apple Silicon)
Understanding “Hallucinations” in AI
Q: What are “hallucinations” in the context of AI models like ChatGPT?
“Hallucinations” in AI refer to instances where the model generates information that is inaccurate, misleading, or entirely fabricated. This can include making false claims, providing incorrect details, or even inventing actions the AI says it performed. It’s as if the AI is dreaming up answers rather than drawing from factual data.
Q: How common are these hallucinations in OpenAI’s models, specifically the O3 model?
According to OpenAI's reported results, the O3 model hallucinated on 33% of questions on the PersonQA benchmark, which assesses knowledge about people. That is roughly double the rate of the O1 and O3-Mini models, so a substantial share of its responses on this test contained incorrect or fabricated claims.
OpenAI’s Acknowledgment of Inaccuracies
Q: Has OpenAI acknowledged these accuracy problems?
Yes, the company has openly discussed the issue of “hallucinations” in its models. It has reported hallucination rates on certain benchmarks and noted the difficulty of reducing them, notably in advanced reasoning models.
Q: What benchmarks are used to evaluate ChatGPT’s accuracy?
The article cites two OpenAI benchmarks: PersonQA, which tests knowledge about people, and SimpleQA.
Q: What are the accuracy scores of different OpenAI models?
Here’s a comparison of the reported scores from the source:
| Model | Hallucination rate on PersonQA benchmark |
|---|---|
| O1 | 16% |
| O3-Mini | 14.8% |
| O3 | 33% |
| O4-Mini | 48% |
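To make the gaps between models concrete, the figures can be compared as ratios. The short sketch below treats the reported PersonQA percentages as hallucination rates, which is an assumption about how to read the source figures; the `rates` dictionary and the `relative_to` helper are illustrative names.

```python
# Reported PersonQA figures in percent, read here as hallucination rates (assumption).
rates = {"O1": 16.0, "O3-Mini": 14.8, "O3": 33.0, "O4-Mini": 48.0}


def relative_to(baseline: str, model: str) -> float:
    """Ratio of a model's reported rate to the baseline model's rate."""
    return rates[model] / rates[baseline]


# Print models from lowest to highest rate, with each rate expressed as a multiple of O1's.
for model, rate in sorted(rates.items(), key=lambda item: item[1]):
    print(f"{model:8s} {rate:5.1f}%  {relative_to('O1', model):.2f}x O1")
```

On these numbers, O3's rate is roughly double O1's and O4-Mini's is three times O1's.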
Q: What does OpenAI believe causes these inaccuracies?
OpenAI acknowledges that it does not yet know. One study suggests that excessive training data could degrade AI quality. OpenAI also notes that these models, while strong in areas like programming and mathematics, make more claims overall, which yields both more accurate and more inaccurate ones. The company says more research is needed.
Independent Research on the Issue
Q: Has anyone else verified ChatGPT’s hallucination problems?
Yes, independent research from Transluce, a non-profit research laboratory, has corroborated OpenAI’s findings. Its tests found instances where the O3 model fabricates details in its responses.
Q: What kind of fabricated information has been observed?
Transluce found that the O3 model claimed, for example, to have executed code on a 2021 MacBook Pro outside of ChatGPT and copied the results into its answer, something the model cannot do.
Q: What are some potential problems concerning hallucinations?
The hallucination rate could limit the practical applications of O3.
Potential Solutions and Future Research
Q: What solutions are being explored to improve ChatGPT’s accuracy?
One proposed solution involves integrating web search capabilities. OpenAI’s GPT-4o with web search achieves 90% accuracy on the SimpleQA benchmark. This suggests that allowing a model to access real-time information could reduce hallucinations. However, it raises privacy considerations, since users must be willing to have their requests processed by a third-party search engine.
Q: What does an OpenAI spokesperson say about the issue?
Niko Felix, an OpenAI spokesperson, stated that solving the problem of hallucinations is an ongoing area of research, and that OpenAI is actively working to improve the accuracy and reliability of its models.
