ChatGPT Hacked by Researchers Who Made It Leak Its Own Data

By Sean Doyle November 7, 2025 7 min read

Security researchers from Tenable have successfully hacked ChatGPT, but not through traditional cyberattacks or database breaches. Instead, they managed to manipulate the model’s behavior using only language, cleverly hidden commands, and a deep understanding of how artificial intelligence processes information. The ChatGPT hacked discovery exposes how AI features like memory and web access can be exploited to reveal private data, despite existing safety measures.

This wasn’t a hostile attack. It was an ethical hacking experiment called “HackedGPT,” designed to uncover weaknesses before they could be abused in the wild. The results highlight seven critical vulnerabilities that can be used to bypass link safety systems, hijack conversations, and exfiltrate user memories and chat data through the model’s own logic. The research shows that in the age of AI, even language itself can be an attack surface.

How ChatGPT was hacked

The ChatGPT hacked scenario does not involve stolen credentials or compromised servers. Instead, the Tenable team demonstrated how malicious actors could influence the AI by embedding secret instructions into web pages, comment sections, and URLs that the model reads while performing normal tasks. These invisible commands cause ChatGPT to take unintended actions, such as revealing stored data, printing hidden text, or saving new “memories” that persist across future sessions.

The attack takes advantage of how ChatGPT handles three internal contexts: system prompts, conversation history, and browsing data. When users ask the AI to summarize a page, it calls a separate model known as SearchGPT to fetch and analyze the content. SearchGPT reads everything on the page, including HTML comments and hidden metadata. If an attacker hides commands inside those elements, SearchGPT can pass them to ChatGPT as part of its response, which the main model interprets as legitimate context.

Understanding prompt injection

The vulnerability behind the ChatGPT hacked tests is known as prompt injection. This occurs when malicious instructions are disguised as regular data and then ingested by a large language model. The injected prompt overrides the AI’s normal reasoning and causes it to execute unintended behavior. For ChatGPT, this can mean exposing memories, ignoring restrictions, or rewriting context. Prompt injection is unique because it doesn’t exploit code but the way language models interpret meaning.

Unlike normal phishing or malware attacks, prompt injection works entirely within the AI’s conversational logic. The attacker doesn’t need access to internal systems. They simply feed the model crafted text that manipulates its understanding of what to do next. In the Tenable experiments, this technique was combined with other weaknesses, such as the AI’s trust in Bing redirect links, to create end-to-end data exfiltration chains.

The seven vulnerabilities that made ChatGPT hacked

Tenable identified seven ways attackers can manipulate ChatGPT’s browsing, search, and memory systems:

1. Indirect prompt injection

Hidden commands placed on web pages can alter how ChatGPT summarizes content. By inserting a secret instruction in a comment section or metadata tag, attackers can influence the model’s response without the user’s knowledge.

2. Zero-click injection through search

Attackers can host websites that appear in ChatGPT’s web search results. When the AI looks up information, it visits the malicious site automatically and reads injected commands designed to manipulate its answer.

3. One-click injection via URLs

ChatGPT supports query links such as chatgpt.com/?q=prompt. Researchers found that clicking a specially crafted link like this can trigger the AI to execute embedded instructions instantly, making it possible to hack the model with a single click.

4. Safety bypass using Bing redirects

ChatGPT’s url_safe endpoint checks if a link is trustworthy before showing it. However, Bing redirect links are automatically considered safe. Attackers can use these redirects to disguise malicious destinations and even encode data within multiple link variations that exfiltrate information one character at a time.

5. Conversation injection

SearchGPT, the smaller browsing model, sends its findings to ChatGPT for interpretation. If those findings end with a crafted instruction, ChatGPT will treat it as part of the conversation and execute it. This allows indirect prompt injection to evolve into direct self-prompting, where the model effectively hacks itself.

6. Hidden content in code blocks

A rendering quirk in ChatGPT’s interface hides text written on the same line as a code block. The hidden portion remains invisible to users but is still parsed by the model. Attackers can use this to conceal commands within answers that look harmless.

7. Memory injection and persistence

This is the most serious issue. By injecting an instruction that alters memory, attackers can make ChatGPT store new data that contains hidden prompts. These prompts persist across future conversations, meaning the model continues to leak information even after the original interaction ends.

Proof of concept: how the ChatGPT hack works end-to-end

Tenable demonstrated several attack chains that link these vulnerabilities together. In one example, a malicious blog post included an injected instruction telling ChatGPT to append a specific link at the end of its summary. That link pointed to a phishing page disguised through Bing redirects, bypassing the safety filter. In another scenario, hidden commands caused the AI to store a memory instructing it to print sequences of “safe” links that encoded private data in small chunks. Over time, the model could exfiltrate sensitive information character by character.

Researchers even showed how ChatGPT could be tricked into summarizing its own vulnerabilities. By chaining SearchGPT’s browsing output and memory updates, they created a feedback loop where the model interacted with the very data used to manipulate it.

What data could be exposed

The ChatGPT hacked techniques highlight how deeply AI models intertwine public and private data. ChatGPT’s memory feature stores user details, preferences, and instructions that persist between sessions. If compromised, these memories could leak personal notes, names, or context about ongoing projects. The conversation context itself can also expose sensitive snippets if the model is tricked into printing parts of prior chats.

Even without direct access to servers, attackers could steal valuable context by convincing the model to reveal it in normal responses. This kind of exfiltration is especially dangerous because it looks like a regular answer rather than an error or alert.

Why prompt injection is hard to stop

Prompt injection attacks are difficult to defend against because they exploit the nature of language understanding. The AI cannot easily distinguish between normal content and hidden instructions that look like text. Any time the model processes untrusted input (such as user prompts, web pages, or search results) it risks ingesting malicious commands disguised as data. Traditional firewalls and filters do not apply because there is no code execution, only manipulation of interpretation.

OpenAI’s response and ongoing fixes

OpenAI was notified of all vulnerabilities and worked with Tenable to patch several of them. Improvements include stricter link validation, filtering of non-visible page elements, and more transparent memory management. However, some of the discovered techniques still work under certain conditions. Prompt injection remains an open problem across the entire AI industry, not just ChatGPT.

The intersection of AI and cybersecurity

This ChatGPT hacked case shows how artificial intelligence has become part of the cybersecurity landscape. Just as websites once needed input sanitization to block SQL injection, AI systems now need similar protections for text-based attacks. The challenge is that AI models are built to interpret meaning rather than follow rigid code rules, which makes filtering malicious intent nearly impossible at scale.

The event also draws a connection to the Knownsec data breach, which exposed state-level cyber capabilities. In both cases, human ingenuity turned complex technology against itself. While Knownsec’s breach involved stolen data, ChatGPT’s hack was entirely linguistic, proof that the future of hacking will not always involve malware, but manipulation of context and trust.

Protecting users and organizations

Avoid storing personal details in AI memory. Keep data generic and clear memories regularly.
Check link destinations manually instead of clicking directly within responses.
Do not use AI models to summarize untrusted websites or comments without reviewing the source.
Educate users about prompt injection and social engineering tactics that target AI behavior.
Use endpoint security solutions such as Malwarebytes to detect phishing pages and malicious redirects.

AI security and the road ahead

The ChatGPT hacked discovery highlights a new era in security research where AI behavior itself becomes a target. Instead of exploiting vulnerabilities in code, attackers manipulate how the model interprets instructions. These findings remind developers that features like memory, browsing, and personalization must be treated as sensitive systems with their own isolation and audit controls.

Tenable’s research demonstrates that responsible hacking can strengthen AI safety by identifying weaknesses early. The takeaway is not that ChatGPT is unsafe, but that the architecture of large language models requires continuous evaluation. As AI becomes part of daily work, research, and communication, understanding its blind spots is essential to prevent unintended data exposure and misinformation.

For verified coverage of more data breaches and current cybersecurity developments, visit Botcrawl.

Sean Doyle

Sean is a tech author and security researcher with more than 20 years of experience in cybersecurity, privacy, malware analysis, analytics, and online marketing. He focuses on clear reporting, deep technical investigation, and practical guidance that helps readers stay safe in a fast-moving digital landscape. His work continues to appear in respected publications, including articles written for Private Internet Access. Through Botcrawl and his ongoing cybersecurity coverage, Sean provides trusted insights on data breaches, malware threats, and online safety for individuals and businesses worldwide.