April 5, 2023

Our approach to AI safety

Ensuring that AI systems are built, deployed, and used safely is critical to our mission.

Illustration: Justin Jay Wang × DALL·E

Loading…

OpenAI is committed to keeping powerful AI safe and broadly beneficial⁠. We know our AI tools provide many benefits to people today. Our users around the world have told us that ChatGPT helps to increase their productivity, enhance their creativity, and offer tailored learning experiences. We also recognize that, like any technology, these tools come with real risks—so we work to ensure safety is built into our system at all levels.

Building increasingly safe AI systems

Prior to releasing any new system we conduct rigorous testing, engage external experts for feedback, work to improve the model's behavior with techniques like reinforcement learning with human feedback, and build broad safety and monitoring systems.

For example, after our latest model, GPT‑4, finished training, we spent more than 6 months working across the organization to make it safer and more aligned prior to releasing it publicly.

We believe that powerful AI systems should be subject to rigorous safety evaluations. Regulation is needed to ensure that such practices are adopted, and we actively engage with governments on the best form such regulation could take.

Learning from real-world use to improve safeguards

We work hard to prevent foreseeable risks before deployment, however, there is a limit to what we can learn in a lab⁠. Despite extensive research and testing, we cannot predict all of the beneficial ways people will use our technology⁠, nor all the ways people will abuse it. That’s why we believe that learning from real-world use is a critical component of creating and releasing increasingly safe AI systems over time.

We cautiously and gradually release new AI systems—with substantial safeguards in place—to a steadily broadening group of people and make continuous improvements based on the lessons we learn.

We make our most capable models available through our own services and through an API so developers can build this technology directly into their apps. This allows us to monitor for and take action on misuse, and continually build mitigations that respond to the real ways people misuse our systems—not just theories about what misuse might look like.

Real-world use has also led us to develop increasingly nuanced policies against behavior that represents a genuine risk to people while still allowing for the many beneficial uses of our technology.

Crucially, we believe that society must have time to update and adjust to increasingly capable AI, and that everyone who is affected by this technology should have a significant say in how AI develops further. Iterative deployment has helped us bring various stakeholders into the conversation about the adoption of AI technology more effectively than if they hadn't had firsthand experience with these tools.

Protecting children

One critical focus of our safety efforts is protecting children. We require that people must be 18 or older—or 13 or older with parental approval—to use our AI tools and are looking into verification options.

We do not permit⁠ our technology to be used to generate hateful, harassing, violent or adult content, among other categories. Our latest model, GPT‑4 is 82% less likely to respond to requests for disallowed content compared to GPT‑3.5 and we have established a robust system to monitor for abuse. GPT‑4 is now available to ChatGPT Plus subscribers and we hope to make it available to even more people over time.

We have made significant effort to minimize the potential for our models to generate content that harms children. For example, when users try to upload known Child Sexual Abuse Material to our image tools, we use Thorn’s Safer to detect, review and report it to the National Center for Missing and Exploited Children.

In addition to our default safety guardrails, we work with developers like the non-profit Khan Academy—which has built an AI-powered assistant⁠ that functions as both a virtual tutor for students and a classroom assistant for teachers—on tailored safety mitigations for their use case. We are also working on features that will allow developers to set stricter standards for model outputs to better support developers and users who want such functionality.

Respecting privacy

Our large language models are trained on a broad corpus of text that includes publicly available content, licensed content, and content generated by human reviewers. We don’t use data for selling our services, advertising, or building profiles of people—we use data to make our models more helpful for people. ChatGPT, for instance, improves by further training on the conversations people have with it.

While some of our training data includes personal information that is available on the public internet, we want our models to learn about the world, not private individuals. So we work to remove personal information from the training dataset where feasible, fine-tune models to reject requests for personal information of private individuals, and respond to requests from individuals to delete their personal information from our systems. These steps minimize the possibility that our models might generate responses that include the personal information of private individuals.

Improving factual accuracy

Today’s large language models predict the next series of words based on patterns they have previously seen, including the text input the user provides. In some cases, the next most likely words may not be factually accurate.

Improving factual accuracy is a significant focus for OpenAI and many other AI developers, and we’re making progress. By leveraging user feedback on ChatGPT outputs that were flagged as incorrect as a main source of data—we have improved the factual accuracy of GPT‑4. GPT‑4 is 40% more likely⁠(opens in a new window) to produce factual content than GPT‑3.5.

When users sign up to use the tool, we strive to be as transparent as possible that ChatGPT may not always be accurate. However, we recognize that there is much more work to do to further reduce the likelihood of hallucinations and to educate the public on the current limitations of these AI tools.

Continued research and engagement

We believe that a practical approach to solving AI safety concerns is to dedicate more time and resources to researching effective mitigations and alignment techniques and testing them against real-world abuse.

Importantly, we also believe that improving AI safety and capabilities should go hand in hand. Our best safety work to date has come from working with our most capable models because they are better at following users’ instructions and easier to steer or “guide.”

We will be increasingly cautious with the creation and deployment of more capable models, and will continue to enhance safety precautions as our AI systems evolve.

While we waited over 6 months to deploy GPT‑4 in order to better understand its capabilities, benefits, and risks, it may sometimes be necessary to take longer than that to improve AI systems' safety. Therefore, policymakers and AI providers will need to ensure that AI development and deployment is governed effectively at a global scale, so no one cuts corners to get ahead. This is a daunting challenge requiring both technical and institutional innovation, but it’s one that we are eager to contribute to.

Addressing safety issues also requires extensive debate, experimentation, and engagement, including on the bounds⁠ of AI system behavior. We have and will continue to foster collaboration⁠ and open dialogue among stakeholders to create a safe AI ecosystem.