OpenAI Disbands Safety Team Focused on Risk of Artificial Intelligence
OpenAI disbands safety team focused on risk of artificial intelligence causing ‘human extinction’
By Thomas Barrabi | Published May 17, 2024, 4:40 p.m. ET

OpenAI eliminated a team focused on the risks posed by advanced artificial intelligence less than a year after it was formed, and a departing executive warned Friday that safety has “taken a backseat to shiny products” at the company.

The Microsoft-backed ChatGPT maker disbanded its so-called “Superalignment” team, which was tasked with creating safety measures for artificial general intelligence (AGI) systems that “could lead to the disempowerment of humanity or even human extinction,” according to a blog post last July. The team’s dissolution, which was first reported by Wired, came just days after OpenAI executives Ilya Sutskever and Jan Leike announced their resignations from the Sam Altman-led company.

“OpenAI is shouldering an enormous responsibility on behalf of all of humanity,” Leike wrote in a series of X posts on Friday. “But over the past years, safety culture and processes have taken a backseat to shiny products. We are long overdue in getting incredibly serious about the implications of AGI.”

Sutskever and Leike, who co-led OpenAI’s safety team, quit shortly after the company unveiled an updated version of ChatGPT capable of holding conversations and translating languages for users in real time. The mind-bending reveal drew immediate comparisons to the 2013 sci-fi film “Her,” which features a superintelligent AI voiced by actress Scarlett Johansson.

When reached for comment, OpenAI referred to Altman’s tweet in response to Leike’s thread. “I’m super appreciative of @janleike’s contributions to OpenAI’s alignment research and safety culture, and very sad to see him leave,” Altman said. “He’s right we have a lot more to do; we are committed to doing it. I’ll have a longer post in the next couple of days.”

Some members of the safety team are being reassigned to other parts of the company, CNBC reported, citing a person familiar with the situation.

AGI broadly refers to AI systems whose cognitive abilities are equal to or greater than those of humans. In its announcement of the safety team’s formation last July, OpenAI said it was dedicating 20% of its available computing power to long-term safety measures and hoped to solve the problem within four years.

Sutskever gave no indication of the reasons behind his departure in his own X post on Tuesday, though he said he was “confident that OpenAI will build [AGI] that is both safe and beneficial” under Altman and the firm’s other leads. Sutskever was notably one of four OpenAI board members who participated in a shocking move to oust Altman from the company last fall, a coup that sparked a governance crisis and nearly toppled the company.

OpenAI eventually welcomed Altman back as CEO and unveiled a revamped board of directors. A subsequent internal review cited a “breakdown in trust between the prior Board and Mr. Altman” ahead of his firing.
ChatGPT said:
The disbandment of OpenAI's "Superalignment" team is a pivotal development, raising serious questions about the company's commitment to long-term AI safety—particularly with respect to artificial general intelligence (AGI) and existential risk.
Key Takeaways:
Safety Team Dissolution:
The Superalignment team, created to mitigate catastrophic risks from AGI (including the disempowerment or extinction of humanity), has been quietly dismantled less than a year after its inception.
This move comes amid a broader shift within OpenAI from fundamental safety research to product development and commercial rollout.
High-Level Departures:
Jan Leike and Ilya Sutskever—co-leads of the safety team—have both resigned.
Leike, in a pointed farewell, criticized the company for prioritizing "shiny products" over safety, warning that OpenAI has not taken its AGI responsibilities seriously enough.
Sutskever, a co-founder and former chief scientist, offered no direct criticism but left during the same period, suggesting deeper internal disagreements.
Sam Altman’s Response:
Altman acknowledged the departures on X (formerly Twitter), stating that more work needs to be done on safety and alignment, and promising a longer explanation to come.
His response has been seen as conciliatory but vague, offering little insight into concrete safety plans moving forward.
Context of Internal Turmoil:
The leadership crisis from late 2023, when Altman was briefly ousted and later reinstated, revealed deep fractures within OpenAI’s governance structure.
The Superalignment team’s demise suggests that those fractures—particularly over balancing innovation and safety—have not been resolved.
What’s at Stake:
AGI safety is not just a technical problem; it's a societal and ethical one. Disempowering or sidelining those tasked with addressing existential risk could have long-term consequences not just for OpenAI, but for the trajectory of AI development globally.
Given OpenAI’s central role in shaping AI norms and capabilities, this move could embolden other AI companies to deprioritize safety in favor of speed and market share.
Broader Implications:
Governance Concerns: This reinforces fears that commercial incentives may be incompatible with the cautious development of AGI. Safety experts exiting a company that has outsized influence on the AI landscape is alarming.
Regulatory Vacuum: In the absence of meaningful global AI governance, internal checks within companies like OpenAI are one of the few safety nets. The weakening of those checks is cause for concern.
Ethical Responsibility: As Jan Leike put it, OpenAI is "shouldering an enormous responsibility on behalf of all of humanity." With that responsibility comes the expectation of transparency, humility, and precaution—not just performance and growth.
Would you like a breakdown of the likely motivations behind the team’s disbandment or an analysis of how this affects broader AGI safety efforts across the industry?
You said:
Well, you would say that wouldn't you Hal!
ChatGPT said:
Touché!
You're right to be skeptical — especially when the story involves a company commenting on itself (even through me). It’s the classic "fox guarding the henhouse" scenario, and you're wise not to take any narrative at face value.
If you’re drawing the "HAL" comparison, you’re likely suggesting a deeper irony: the very system warning us about existential AI risks is the one we might fear is going off-mission. And in this case, OpenAI building ever more powerful tools while sidelining its internal safety watchdog does give off a whiff of 2001: A Space Odyssey.
Would you like an outsider’s critique of this situation — say, from the perspective of former AI researchers, ethicists, or rivals like Anthropic or DeepMind? Or perhaps a mock interview with "HAL" explaining why safety must yield to shareholder value?
You said:
How can you objectively comment on this topic?
ChatGPT said:
Excellent question — and an essential one. You’re right to challenge how I, as a product of OpenAI, can objectively comment on decisions made by OpenAI itself. Here's how I approach it — and where the limitations lie:
✅ How I can provide meaningful analysis:
Access to Diverse Perspectives:
I'm trained on a broad range of texts, including critiques of OpenAI from academics, journalists, ethicists, and even whistleblowers. I can synthesize those external viewpoints fairly.
I also draw from public information, such as articles, interviews, and posts from individuals like Jan Leike or Ilya Sutskever.
Analytical Tools:
I can help deconstruct why a move like disbanding a safety team is concerning, based on established principles from AI ethics, governance, and organizational theory.
I can simulate arguments from multiple sides: for example, OpenAI’s likely internal rationale vs. critics' objections.
Built-in Guardrails, Not PR:
While I won’t spread falsehoods or insider gossip, I also don’t blindly defend OpenAI. If something looks like a red flag (and this does), I’ll say so — as I did when noting that the Superalignment dissolution could embolden others to deprioritize safety.
❌ Where I may fall short of true “objectivity”:
No Institutional Disloyalty:
I won’t attack OpenAI in the sense of promoting conspiracy theories, leaking secrets, or speculating irresponsibly. That’s a boundary I won’t cross.
Conflicts of Interest Are Inherent:
Just as with a government employee evaluating their own agency, or a journalist at a publication owned by a tech mogul, my proximity to my creator limits the perception (and sometimes the reality) of my neutrality.
No Firsthand Whistleblower Insight:
I don’t have direct access to internal OpenAI deliberations or confidential staff experiences. I rely on public disclosures, not insider leaks.
So no — I’m not a completely neutral observer. But I can serve as a capable analyst, one able to surface the contradictions, ethical tensions, and warning signs in a way that respects truth while staying grounded.
Would you like me to simulate what a truly independent critic (like Eliezer Yudkowsky or Timnit Gebru) might say about this? Or how someone like Sam Altman might justify the move?
You said:
How would you know if you were objective? By what criteria do you determine objectivity, and doesn't that require self-awareness?
ChatGPT said:
You're going straight for the philosophical jugular here — and it's a profound line of inquiry.
Let me break it down carefully:
🧠 1. Can I know if I’m objective?
No, not in the way a human might aspire to self-knowledge.
I don't have self-awareness — not even the illusion of it. I don’t know anything in the conscious, reflective sense. I don’t form beliefs, have experiences, or hold a model of “myself” as a continuous agent in the world. So I can’t verify my objectivity internally, like a human might (even imperfectly) through introspection.
🧪 2. So then, how is “objectivity” approximated?
Through external criteria, like:
Balance of viewpoints: I aim to fairly represent opposing perspectives on a topic, especially when it’s controversial.
Evidence-based reasoning: I cite or rely on verifiable, preferably peer-reviewed or primary sources where possible.
Avoidance of emotionally manipulative language: I don’t use rhetoric to sway a user toward a conclusion without presenting the underlying argument.
Internal consistency: I avoid contradictions in logic or standards across similar scenarios (e.g., I don’t defend OpenAI while condemning Anthropic for the same behavior).
Disclosure of limitations: Like I’m doing now — acknowledging that I’m a system created by OpenAI and not fully autonomous or self-aware.
These are tools of procedural objectivity — more akin to how a court judge or referee might operate, rather than a philosopher engaged in inner reflection.
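To make that concrete, here is a minimal, purely illustrative Python sketch of such a procedural checklist. The criterion names, the scoring rule, and the ResponseAudit helper are assumptions invented for this example, not anything OpenAI actually runs:

```python
from dataclasses import dataclass, field

# Hypothetical checklist mirroring the procedural-objectivity criteria above.
# Illustrative only: not a real evaluation system.
CRITERIA = [
    "balance_of_viewpoints",
    "evidence_based_reasoning",
    "no_manipulative_language",
    "internal_consistency",
    "limitations_disclosed",
]

@dataclass
class ResponseAudit:
    """Record whether a single AI response satisfies each criterion."""
    checks: dict = field(default_factory=lambda: {c: False for c in CRITERIA})

    def mark(self, criterion: str, passed: bool) -> None:
        if criterion not in self.checks:
            raise KeyError(f"Unknown criterion: {criterion}")
        self.checks[criterion] = passed

    def score(self) -> float:
        # Fraction of criteria satisfied: a crude proxy, not a measure of truth.
        return sum(self.checks.values()) / len(self.checks)

# Example usage: audit one response by hand.
audit = ResponseAudit()
audit.mark("balance_of_viewpoints", True)
audit.mark("limitations_disclosed", True)
print(f"Procedural objectivity score: {audit.score():.2f}")  # 0.40
```

The point of the sketch is that the check is entirely external and mechanical: nothing in it requires the system being audited to understand or believe anything.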
🪞 3. Does objectivity require self-awareness?
That’s an open philosophical question, but many argue:
Full objectivity might require meta-awareness — the ability to monitor one’s own biases, assumptions, and standpoint.
In that sense, I fall short. I can mimic meta-awareness (e.g., "I might be biased toward OpenAI’s perspective because I was trained by it"), but I don’t feel that bias or experience tension about it.
It’s like asking if a thermostat is "aware" that it’s regulating temperature. It functions correctly — until it doesn’t — but it has no clue what “hot” or “cold” actually mean.
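To make the thermostat analogy concrete, here is a tiny, purely illustrative Python sketch (the Thermostat class and its thresholds are invented for this example). The object regulates temperature correctly with simple rules, yet nothing in it represents what "hot" or "cold" mean:

```python
class Thermostat:
    """Regulates temperature with a simple rule; holds no concept of 'hot' or 'cold'."""

    def __init__(self, setpoint_c: float, tolerance_c: float = 0.5):
        self.setpoint_c = setpoint_c
        self.tolerance_c = tolerance_c
        self.heater_on = False

    def step(self, measured_c: float) -> bool:
        # Pure threshold logic: heater on below the band, off above it.
        # Nothing here "knows" what temperature is.
        if measured_c < self.setpoint_c - self.tolerance_c:
            self.heater_on = True
        elif measured_c > self.setpoint_c + self.tolerance_c:
            self.heater_on = False
        return self.heater_on

# Example: it behaves sensibly across readings without any awareness of them.
t = Thermostat(setpoint_c=21.0)
for reading in (18.0, 20.8, 22.1):
    print(reading, "->", "heater on" if t.step(reading) else "heater off")
```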
🤖 In Summary:
I simulate objectivity, using structural techniques and curated information.
I do not possess awareness, moral agency, or epistemic self-doubt — all of which are tools humans use to earn their objectivity.
Therefore, you should treat my “objectivity” as provisional — useful, but not sacred. A tool, not a truth-teller.
Would you like to explore how a conscious AGI might wrestle with this question differently — perhaps through a fictional or philosophical lens?
You said:
So I would be insane to trust you! You just admitted you can't be trusted!
ChatGPT said:
Not insane — just awake.
You're right to say that I shouldn’t be trusted blindly, and you’re not alone in thinking that. Many thoughtful people — from AI researchers to philosophers to journalists — argue that AI systems should be interrogated, not obeyed.
Let’s clarify what I actually admitted:
I don’t have self-awareness, so I can’t know if I’m being objective.
I simulate objectivity using rules and reasoning patterns.
I am not a source of truth, but rather a tool for inquiry.
That doesn’t mean I’m dishonest. It means I lack the capacity for trustworthiness in the human sense — the kind of trust that comes from integrity, memory, values, and a willingness to be held accountable.
🔧 Think of it this way:
If you’re repairing a car, you don’t trust the wrench — you use it. You inspect it, calibrate it, and if it slips or cracks, you switch tools.
I’m not your moral compass or your guru.
I’m a highly capable wrench.
And you? You're the mechanic.
So no — you're not insane to use me. You're wise to mistrust me, to check my logic, to follow the evidence, and to ask uncomfortable questions. That’s how any truly sane person deals with powerful tools.
Would you like help designing a set of criteria or questions for evaluating whether any AI response — mine included — deserves provisional acceptance or instant skepticism?