The question behind “is our data safe if the team uses ChatGPT?”
When a founder or CTO asks whether company data is safe when staff use ChatGPT or Claude, they usually expect a yes or a no. The honest answer is that it depends entirely on which version of the tool your team is using and how it is configured. The same brand name sits on top of three very different commercial deals, and the gap between them is the difference between “your confidential data is protected by contract” and “your confidential data may be retained for years and used to train a model.”
This is general information to help you ask sharper questions, not legal advice. Before you rely on any specific term, confirm the current wording with the provider and your own legal counsel, because these policies change often, sometimes with only a few weeks of notice.
Three doors, three different deals
Both OpenAI and Anthropic sell their models through three broad routes, and each route comes with its own default settings.
- Consumer tiers: ChatGPT Free, Plus and Pro, and Claude Free, Pro and Max. Bought by individuals on personal accounts.
- Business tiers: ChatGPT Team, Enterprise and Edu, and Claude for Work, Enterprise and Education. Bought by an organisation under a business contract.
- The API and developer platform: where systems, not people, send data to the model programmatically.
The single most important thing to understand is that the default answer to “is our data used to train the model?” flips depending on which door you walk through. Get the door wrong and every other control you put in place is decoration.
Consumer tiers: your team may be training the model right now
Here is the uncomfortable part. On the consumer tiers, training is generally on by default.
For ChatGPT Free, Plus and Pro personal accounts, OpenAI uses your conversations to improve its models unless you opt out. You switch it off under Settings, Data Controls, “Improve the model for everyone.” Even with that toggle off, giving a thumbs up or thumbs down on a reply hands that specific conversation to OpenAI for review, and messages can still be retained for around 30 days for abuse and safety monitoring.
Anthropic made a similar move. Following a change announced in August 2025, Claude Free, Pro and Max users (including Claude Code on those plans) had to choose by 8 October 2025 whether to allow their chats and coding sessions to be used for training. The default was set to on. If you do not opt out, that data can be retained for up to five years. If you do opt out, it is deleted after 30 days, subject to legal and safety exceptions.
So when a member of staff pastes a client contract, a board pack or a customer list into a personal ChatGPT or Claude account, the realistic assumption is that the content is retained and, unless someone changed a setting, may be used to improve the model. That is the scenario that keeps compliance officers awake.
Business tiers: the default flips in your favour
Move to a business contract and the picture changes.
OpenAI states that it does not train on inputs or outputs from ChatGPT Team, Enterprise, Edu or the API by default. Anthropic says the same for Claude for Work, Enterprise, Education, Gov and API access: customer prompts and responses are not used to train models by default. Business use is covered by a Data Processing Agreement that addresses GDPR Article 28 processor obligations, includes standard contractual clauses for international transfers, and names the sub-processors involved.
Business tiers also give you the administrative controls that make a rollout defensible: single sign-on (SSO), user provisioning, admin-managed workspaces, audit logs and configurable retention. In plain terms, the company owns the account rather than the individual, and an administrator can see and govern usage.
The catch is that these protections attach to the business product, not to the brand. A member of your team logged into a personal Plus account gets none of them, even though the interface looks almost identical.
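One way to make "the tier, not the brand" checkable is a small lookup that an onboarding script or access review could consult. The sketch below is illustrative only: the tier names and training defaults are the ones described in this article and should be re-verified with each provider, while the `TierDefaults` type, the `check_account` helper and the `approved_for_company_data` flag are hypothetical names for an internal policy decision, not anything the vendors publish.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierDefaults:
    route: str                       # "consumer", "business", "api" or "unknown"
    trains_by_default: bool          # default as described in this article; verify with the provider
    approved_for_company_data: bool  # hypothetical internal policy flag


CONSUMER = TierDefaults("consumer", True, False)
BUSINESS = TierDefaults("business", False, True)
API = TierDefaults("api", False, True)

# Keys are (brand, plan) in lower case. The same brand appears under several routes.
TIERS = {
    ("chatgpt", "free"): CONSUMER,
    ("chatgpt", "plus"): CONSUMER,
    ("chatgpt", "pro"): CONSUMER,
    ("chatgpt", "team"): BUSINESS,
    ("chatgpt", "enterprise"): BUSINESS,
    ("chatgpt", "edu"): BUSINESS,
    ("claude", "free"): CONSUMER,
    ("claude", "pro"): CONSUMER,
    ("claude", "max"): CONSUMER,
    ("claude", "for work"): BUSINESS,
    ("claude", "enterprise"): BUSINESS,
    ("claude", "education"): BUSINESS,
    ("openai", "api"): API,
    ("anthropic", "api"): API,
}


def check_account(brand: str, plan: str) -> TierDefaults:
    """Look up the defaults for an account; unknown plans fail closed."""
    key = (brand.strip().lower(), plan.strip().lower())
    # Assumption: anything not on the list is treated as the worst case.
    return TIERS.get(key, TierDefaults("unknown", True, False))


if __name__ == "__main__":
    for brand, plan in [("ChatGPT", "Plus"), ("ChatGPT", "Enterprise"), ("Claude", "Max")]:
        print(brand, plan, check_account(brand, plan))
```

Failing closed on unknown plans is a deliberate choice here: a new or renamed plan stays unapproved until someone has read its terms.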
The API, zero data retention, and where data physically sits
If you are building AI into a product or a workflow, you are almost certainly using the API, and the rules are different again.
By default, OpenAI generates abuse-monitoring logs for API traffic and keeps them for up to 30 days before deletion, unless it is legally required to hold them longer. It does not train on API data by default. For workloads that cannot tolerate even that, OpenAI offers Zero Data Retention (ZDR) and Modified Abuse Monitoring, both subject to prior approval. Under ZDR, eligible customer content is excluded from those logs.
Anthropic offers an equivalent. Qualifying enterprise API customers can, on approval, get a Zero Data Retention agreement under which inputs and outputs are not stored beyond what is needed to screen for abuse. Worth knowing: even under ZDR, Anthropic still retains the results of its safety classifiers to enforce its usage policy, so “zero retention” is not literally zero for every byte.
Data residency matters too. OpenAI now offers data-residency controls (including European options) for eligible enterprise contracts through its sales team, with a 10% pricing uplift on data-residency endpoints for models released on or after 5 March 2026. If your data must stay in a particular region, treat residency as a contract term to confirm explicitly, not an assumption, and check the current geographic scope because it keeps expanding.
Retention is not the same as deletion
The most important lesson of the last year is that a stated retention policy can be overridden by a court.
In the New York Times copyright case against OpenAI, a magistrate judge ordered OpenAI on 13 May 2025 to preserve and segregate output logs that would normally have been deleted, including chats users had explicitly deleted, across the Free, Plus, Pro and Team tiers. That indefinite-preservation obligation for new data ended on 26 September 2025, and OpenAI returned to its standard practice of deleting removed and temporary chats within about 30 days. But in January 2026 a US district judge affirmed an order requiring OpenAI to hand over a sample of 20 million anonymised conversation logs to the plaintiffs.
Two takeaways for a leader. First, “we delete after 30 days” is a default, not a guarantee, because litigation and legal holds can freeze it. Second, and more reassuring, ChatGPT Enterprise, Edu and ZDR API customers were reported to sit outside the scope of that preservation order. That is one more concrete reason to put sensitive work on a business or ZDR contract rather than a consumer login. Confirm the current position, because the case is still moving.
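The first takeaway can be written down as a tiny model, which is useful if you keep a data inventory and want it to reflect holds as well as stated retention periods. This is a minimal sketch under assumptions: `earliest_deletion` and its parameters are hypothetical names, and the dates in the usage lines are arbitrary examples rather than dates from the case.

```python
from datetime import date, timedelta
from typing import Optional


def earliest_deletion(created: date, retention_days: int, legal_hold: bool) -> Optional[date]:
    """Return the earliest date a record may be deleted, or None while a hold applies.

    The retention period is only a default: a legal hold suspends deletion
    entirely, so no date can be promised until the hold is lifted.
    """
    if retention_days < 0:
        raise ValueError("retention_days must not be negative")
    if legal_hold:
        return None
    return created + timedelta(days=retention_days)


if __name__ == "__main__":
    created = date(2025, 5, 1)  # arbitrary example date
    print(earliest_deletion(created, 30, legal_hold=False))  # 2025-05-31
    print(earliest_deletion(created, 30, legal_hold=True))   # None
```

Returning `None` rather than a far-future date keeps the uncertainty visible: while a hold is in place, nobody can say when deletion will happen.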
Shadow AI: the paste is the risk, not the tool
Notice that almost every risk above comes from the same act: a person pasting company data into a consumer account. This is “shadow AI,” the unsanctioned use of AI tools that IT never approved and cannot see. Repeated industry surveys find that a large share of employees already use AI at work, and a meaningful proportion admit to putting company or customer information into tools their employer has not vetted.
Banning AI does not fix this. It drives the behaviour underground, because the productivity pull is real and people will use their phones if they have to. The fix is to make the safe path the easy path: give people an approved, business-tier tool that is at least as good as the one they would otherwise reach for, and be explicit about what may and may not go into it.
A safe-rollout checklist
You can close most of the exposure with a short, practical programme. Work through these roughly in order.
- Classify your data first. Nothing below works if nobody knows which documents are confidential, so a simple traffic-light classification usually has to come before everything else.
- Publish an approved-tools list. Name the specific tiers people may use (for example, your ChatGPT Enterprise or Claude for Work workspace) and state plainly that personal Free, Plus, Pro and Max accounts are not approved for company data.
- Buy the business tier and put everyone on it. The default no-training protection, DPA and admin controls only apply to the business contract, so this is the highest-leverage single step.
- Sign and file the DPA. Confirm it covers GDPR Article 28, international-transfer clauses and the sub-processor list, and store it where your auditors can find it.
- Enforce SSO and provisioning. Route access through your identity provider so joiners and leavers are handled automatically and usage is attributable to a named person.
- Turn off training and set retention deliberately. Even on business tiers, check the toggles and configure retention to match your data-classification policy rather than accepting whatever the defaults happen to be.
- Decide whether you need ZDR or data residency. For regulated or highly sensitive workloads, pursue a zero-data-retention agreement and confirm the processing region in writing.
- Redact and minimise at the point of entry. Train people, and better still build tooling, to strip names, card numbers and secrets before data reaches the model. The safest data is the data you never send.
- Review quarterly. These terms change, so re-check provider policies and your own settings on a schedule rather than assuming last year’s answer still holds.
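The redaction and classification items in the checklist can be sketched as a small gate that sits in front of any model call. Treat this as a starting point under stated assumptions, not a complete data-loss-prevention tool. The traffic-light meanings (green passes, amber is redacted, red is blocked) are one possible policy. The patterns are illustrative and will miss things, and the secret pattern in particular is a guess at common key shapes. Email addresses stand in for personal identifiers here because personal names need something stronger than a regular expression. All function and class names are hypothetical.

```python
import re
from enum import Enum


class Classification(Enum):
    GREEN = "green"  # public or low-risk: may be sent as-is (assumed policy)
    AMBER = "amber"  # internal: redact before sending (assumed policy)
    RED = "red"      # confidential: never sent (assumed policy)


EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
# Illustrative only: a short prefix, a separator, then a long token.
SECRET_RE = re.compile(r"\b(?:sk|pk|api|key|token)[-_][A-Za-z0-9_-]{16,}\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum, used to avoid masking long numbers that are not card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def _mask_card(match: re.Match) -> str:
    digits = re.sub(r"\D", "", match.group(0))
    return "[CARD]" if luhn_valid(digits) else match.group(0)


def redact(text: str) -> str:
    """Replace secrets, email addresses and card numbers with placeholders."""
    text = SECRET_RE.sub("[SECRET]", text)
    text = EMAIL_RE.sub("[EMAIL]", text)
    return CARD_RE.sub(_mask_card, text)


def prepare_prompt(text: str, classification: Classification) -> str:
    """Apply the traffic-light policy before text is allowed near a model."""
    if classification is Classification.RED:
        raise PermissionError("Red (confidential) data must not be sent to the model")
    if classification is Classification.AMBER:
        return redact(text)
    return text


if __name__ == "__main__":
    sample = "Contact jane@example.com, card 4111 1111 1111 1111, key sk-abcdefghijklmnop1234"
    print(prepare_prompt(sample, Classification.AMBER))
```

Putting the gate in code rather than in a policy document means the check runs every time, including when someone is in a hurry.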
The safe path is an engineering decision
Treat all of this as an engineering choice rather than a policy memo. The controls that keep data safe (the right tier, no-training settings, a signed DPA, redaction at the point of entry) are the same controls that make an AI system reliable and auditable. It is the mindset behind how we build at Rogue. Our own prospecting engine runs every draft through an adversarial verifier, a separate model acting as a judge, before anything is auto-sent, so nothing leaves the system unchecked. When we helped retire a 779-column legacy system at CoolKit, the whole task was mapping exactly where sensitive data lived and where it was allowed to move. Whether you buy AI off the shelf or build it into your workflows, the principle holds: know which door your team is using, make the safe door the default, and verify the settings rather than trusting the brand name on the box.