Our approach to AI safety
Ensuring that AI systems are built, deployed, and used safely is critical to our mission.

Illustration: Justin Jay Wang à DALL·E
OpenAI is committed to keeping powerful AI safe and broadly beneficialâ . We know our AI tools provide many benefits to people today. Our users around the world have told us that ChatGPT helps to increase their productivity, enhance their creativity, and offer tailored learning experiences. We also recognize that, like any technology, these tools come with real risksâso we work to ensure safety is built into our system at all levels.
Prior to releasing any new system we conduct rigorous testing, engage external experts for feedback, work to improve the model's behavior with techniques like reinforcement learning with human feedback, and build broad safety and monitoring systems.
For example, after our latest model, GPTâ4, finished training, we spent more than 6 months working across the organization to make it safer and more aligned prior to releasing it publicly.
We believe that powerful AI systems should be subject to rigorous safety evaluations. Regulation is needed to ensure that such practices are adopted, and we actively engage with governments on the best form such regulation could take.
We work hard to prevent foreseeable risks before deployment, however, there is a limit to what we can learn in a labâ . Despite extensive research and testing, we cannot predict all of the beneficial ways people will use our technologyâ , nor all the ways people will abuse it. Thatâs why we believe that learning from real-world use is a critical component of creating and releasing increasingly safe AI systems over time.
We cautiously and gradually release new AI systemsâwith substantial safeguards in placeâto a steadily broadening group of people and make continuous improvements based on the lessons we learn.
We make our most capable models available through our own services and through an API so developers can build this technology directly into their apps. This allows us to monitor for and take action on misuse, and continually build mitigations that respond to the real ways people misuse our systemsânot just theories about what misuse might look like.
Real-world use has also led us to develop increasingly nuanced policies against behavior that represents a genuine risk to people while still allowing for the many beneficial uses of our technology.
Crucially, we believe that society must have time to update and adjust to increasingly capable AI, and that everyone who is affected by this technology should have a significant say in how AI develops further. Iterative deployment has helped us bring various stakeholders into the conversation about the adoption of AI technology more effectively than if they hadn't had firsthand experience with these tools.
One critical focus of our safety efforts is protecting children. We require that people must be 18 or olderâor 13 or older with parental approvalâto use our AI tools and are looking into verification options.
We do not permitâ our technology to be used to generate hateful, harassing, violent or adult content, among other categories. Our latest model, GPTâ4 is 82% less likely to respond to requests for disallowed content compared to GPTâ3.5 and we have established a robust system to monitor for abuse. GPTâ4 is now available to ChatGPT Plus subscribers and we hope to make it available to even more people over time.
We have made significant effort to minimize the potential for our models to generate content that harms children. For example, when users try to upload known Child Sexual Abuse Material to our image tools, we use Thornâs Safer to detect, review and report it to the National Center for Missing and Exploited Children.
In addition to our default safety guardrails, we work with developers like the non-profit Khan Academyâwhich has built an AI-powered assistantâ that functions as both a virtual tutor for students and a classroom assistant for teachersâon tailored safety mitigations for their use case. We are also working on features that will allow developers to set stricter standards for model outputs to better support developers and users who want such functionality.
Our large language models are trained on a broad corpus of text that includes publicly available content, licensed content, and content generated by human reviewers. We donât use data for selling our services, advertising, or building profiles of peopleâwe use data to make our models more helpful for people. ChatGPT, for instance, improves by further training on the conversations people have with it.
While some of our training data includes personal information that is available on the public internet, we want our models to learn about the world, not private individuals. So we work to remove personal information from the training dataset where feasible, fine-tune models to reject requests for personal information of private individuals, and respond to requests from individuals to delete their personal information from our systems. These steps minimize the possibility that our models might generate responses that include the personal information of private individuals.
Todayâs large language models predict the next series of words based on patterns they have previously seen, including the text input the user provides. In some cases, the next most likely words may not be factually accurate.
Improving factual accuracy is a significant focus for OpenAI and many other AI developers, and weâre making progress. By leveraging user feedback on ChatGPT outputs that were flagged as incorrect as a main source of dataâwe have improved the factual accuracy of GPTâ4. GPTâ4 is 40% more likelyâ (opens in a new window) to produce factual content than GPTâ3.5.
When users sign up to use the tool, we strive to be as transparent as possible that ChatGPT may not always be accurate. However, we recognize that there is much more work to do to further reduce the likelihood of hallucinations and to educate the public on the current limitations of these AI tools.
We believe that a practical approach to solving AI safety concerns is to dedicate more time and resources to researching effective mitigations and alignment techniques and testing them against real-world abuse.
Importantly, we also believe that improving AI safety and capabilities should go hand in hand. Our best safety work to date has come from working with our most capable models because they are better at following usersâ instructions and easier to steer or âguide.â
We will be increasingly cautious with the creation and deployment of more capable models, and will continue to enhance safety precautions as our AI systems evolve.
While we waited over 6 months to deploy GPTâ4 in order to better understand its capabilities, benefits, and risks, it may sometimes be necessary to take longer than that to improve AI systems' safety. Therefore, policymakers and AI providers will need to ensure that AI development and deployment is governed effectively at a global scale, so no one cuts corners to get ahead. This is a daunting challenge requiring both technical and institutional innovation, but itâs one that we are eager to contribute to.
Addressing safety issues also requires extensive debate, experimentation, and engagement, including on the boundsâ of AI system behavior. We have and will continue to foster collaborationâ and open dialogue among stakeholders to create a safe AI ecosystem.


