The Preparedness Framework by OpenAI aims to track, evaluate, and forecast the risks posed by increasingly powerful AI models as part of a more systematic approach to safety.
ChatGPT creator OpenAI has updated its safety policies with a new Preparedness Framework. A dedicated team studies the emerging risks of frontier AI models, cooperating with the Safety Systems, Superalignment, and other safety and policy teams across OpenAI.
According to the press release, the company aims “to move the discussions of risks beyond hypothetical scenarios to concrete measurements and data-driven predictions.” With this purpose in mind, OpenAI is investing in the design and execution of rigorous capability evaluations and forecasting tools.
The Preparedness Framework, currently in beta, works as follows:
- the team has defined risk-level thresholds along four initial tracked categories: cybersecurity; CBRN (chemical, biological, radiological, and nuclear) threats; persuasion; and model autonomy;
- it evaluates all OpenAI frontier models, pushing them to their limits, including during training runs;
- the team will conduct regular safety drills to stress-test the AI models;
- the team then assesses the risks of the tested models and measures the effectiveness of any proposed mitigations;
- tracking the safety levels of AI models will be achieved with the help of risk “scorecards” and detailed reports;
- the framework specifies four risk levels (low, medium, high, and critical); only models with a post-mitigation score of “medium” or below can be deployed;
- models with a post-mitigation score of “high” or below can be developed further, while development of models that reach the “critical” level will be halted;
- OpenAI will also implement additional security measures tailored to models with high or critical (pre-mitigation) levels of risk.
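The gating rules above reduce to a simple decision procedure over a model’s scorecard. The sketch below is purely illustrative (the `Scorecard` class, category names, and method names are invented for this example, not OpenAI’s actual tooling); it only encodes the thresholds the framework describes:

```python
from dataclasses import dataclass
from enum import IntEnum


class Risk(IntEnum):
    """The framework's four risk levels, ordered by severity."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3


@dataclass
class Scorecard:
    """Hypothetical post-mitigation scores across the four tracked categories."""
    cybersecurity: Risk
    cbrn: Risk
    persuasion: Risk
    model_autonomy: Risk

    @property
    def overall(self) -> Risk:
        # A model's overall level is its highest-scoring category.
        return max(self.cybersecurity, self.cbrn,
                   self.persuasion, self.model_autonomy)

    def can_deploy(self) -> bool:
        # Deployment requires a post-mitigation score of "medium" or below.
        return self.overall <= Risk.MEDIUM

    def can_develop_further(self) -> bool:
        # Development may continue at "high" or below; "critical" halts it.
        return self.overall <= Risk.HIGH


card = Scorecard(Risk.LOW, Risk.MEDIUM, Risk.HIGH, Risk.LOW)
print(card.can_deploy())           # False: one category scores "high"
print(card.can_develop_further())  # True: no category reaches "critical"
```

Taking the maximum across categories reflects the framework’s conservative stance: a single high-risk capability is enough to block deployment, regardless of how the model scores elsewhere.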
Under the new framework, a dedicated team will oversee the technical work and an operational structure for safety decision-making: probing the limits of frontier models’ capabilities, running evaluations, and synthesizing reports.
OpenAI is also creating a cross-functional Safety Advisory Group to review all reports before sending them concurrently to Leadership and the Board of Directors. While Leadership is the main decision-maker, the Board of Directors can reverse decisions if needed.
Furthermore, OpenAI plans to collaborate closely with external parties on independent audits of its AI models, and with internal teams to track real-world misuse and emergent misalignment risks.