LLMs and Customer Data: A UK GDPR Playbook

An LLM does not rewrite the rules; it just moves the risk

When you paste a customer record into a chatbot, or wire a customer database into an agent, you have not entered a new legal category. You are still a data controller processing personal data, and UK GDPR still applies in full. The Data (Use and Access) Act 2025, which received Royal Assent on 19 June 2025, adjusted parts of the regime, but it did not exempt AI from data protection. If anything it raised the stakes, because large language models make it trivially easy to send far more personal data to a third party than you ever meant to.

This guide is general information, not legal advice. Data protection law is fact-specific and still settling after the 2025 reforms, so treat what follows as a practical starting framework and confirm the current position with a qualified adviser before you rely on it.

The good news is that the obligations are old and well understood: lawful basis, data minimisation, purpose limitation, transparency, a DPIA when the risk is high, safeguards on automated decisions, care with international transfers, and a contract with your processor. An LLM does not change the checklist. It just makes each item easier to get wrong at speed.

Get your lawful basis straight before the data moves

Article 6 of the UK GDPR gives you six lawful bases for processing personal data. For feeding customer data to an LLM, two matter in practice: consent and legitimate interests.

Consent sounds clean but is often the wrong tool. Consent under UK GDPR must be specific, informed, freely given and as easy to withdraw as to give. If your AI use is bundled into a general privacy notice, or the customer cannot realistically say no and still get the service, that “consent” is not valid. Consent works well for genuinely optional features, such as an AI assistant a user chooses to switch on, and badly as a blanket justification for back-office processing.

Legitimate interests is usually the better fit for operational uses like summarising support tickets, drafting replies, or enriching a record, provided you can pass the three-part test: identify a real interest, show the processing is necessary for it (not merely convenient), and balance it against the individual’s rights and reasonable expectations. The ICO has confirmed that legitimate interests can cover AI processing, but you have to document the balancing act, not assume it. Write it down before the data moves, not after a complaint arrives.

One nuance from the 2025 reforms: they introduce a category of “recognised legitimate interests”, where the balancing test is not required for a short list of specified purposes such as crime prevention, safeguarding and responding to emergencies. Do not assume your AI feature qualifies. The list is narrow, and general product improvement is not on it.

Send the smallest set of facts, and hold the purpose line

Data minimisation (Article 5(1)(c)) says you should process only what is adequate, relevant and limited to what is necessary. This is the single most powerful lever you have with LLMs, and the most ignored.

The lazy pattern is to dump an entire customer object into the prompt because it is easier than deciding what the model actually needs. That is exactly the wrong instinct. If the task is “draft a delivery-delay apology”, the model needs the order status and the first name, not the full address, the payment token, the lifetime value and the whole support history.

At Rogue we call this the smallest set of facts. Before any prompt is built, we ask: what is the minimum the model needs to do this job? Then we strip everything else. It is the same discipline that let us help retire a 779-column legacy system at CoolKit: most of those columns were never load-bearing, and knowing which few mattered is what made the system safe to change. Fewer fields means less exposure, a smaller blast radius if the vendor is ever breached, and, usefully, better model output, because you are not burying the signal in noise.

Practical minimisation moves: redact or tokenise identifiers before they leave your systems, pass an internal reference instead of a name where you can, aggregate rather than send row-level data, and prefer retrieval that returns the one specific fact over stuffing whole documents into context.
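
The allow-list and tokenisation moves above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a production design: the task name, the field names, the `TASK_FIELDS` table and the `pseudonymise` helper are all hypothetical, and a keyed hash stands in for whatever tokenisation your own systems provide.

```python
import hashlib
import hmac

# Hypothetical allow-list: the only fields each task may send to the model.
TASK_FIELDS = {
    "delivery_delay_apology": {"first_name", "order_status"},
}


def pseudonymise(value: str, key: bytes) -> str:
    """Replace an identifier with a keyed hash so the vendor never sees the raw value."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def minimal_payload(task: str, record: dict, key: bytes) -> dict:
    """Return only the allow-listed fields plus a pseudonymous reference."""
    allowed = TASK_FIELDS[task]  # a KeyError for an unregistered task is deliberate
    payload = {name: record[name] for name in allowed if name in record}
    payload["customer_ref"] = pseudonymise(str(record["customer_id"]), key)
    return payload


# Made-up record for illustration only.
customer = {
    "customer_id": 48213,
    "first_name": "Asha",
    "order_status": "delayed",
    "address": "1 Example Street",
    "payment_token": "tok_example",
    "lifetime_value": 1920.50,
}
payload = minimal_payload("delivery_delay_apology", customer, key=b"demo-key-only")
```

The address, payment token and lifetime value never reach the prompt, because nothing outside the allow-list is copied. A keyed hash is pseudonymisation rather than anonymisation, so the reference is still personal data in your hands; what it limits is what the vendor can see.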

Then hold the purpose line. Purpose limitation (Article 5(1)(b)) says personal data collected for one purpose should not be reused for an incompatible one, and LLMs invite scope creep because the same data pipe can suddenly do a dozen new things. If you collected support emails to resolve tickets, using them to fine-tune a model or to profile customers for upsell is a new purpose that needs its own basis and its own transparency. Name the purpose for each AI use, check it against why the data was collected, and update your privacy notice under Articles 13 and 14 so people can actually see what you do. Vague notices are one of the failings the ICO has repeatedly called out around AI.
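
One way to make the purpose line hard to cross by accident is to check every AI use against a register before the data is read. The sketch below assumes a hypothetical `PURPOSE_REGISTER` that someone maintains by hand after a documented compatibility review; the dataset and use names are invented for illustration.

```python
# Hypothetical register: the purpose each dataset was collected for, and the
# AI uses that have been reviewed and documented as compatible with it.
PURPOSE_REGISTER = {
    "support_emails": {
        "collected_for": "resolve support tickets",
        "approved_uses": {"summarise_ticket", "draft_reply"},
    },
}


class PurposeNotApproved(Exception):
    """Raised when a dataset is requested for a use nobody has signed off."""


def check_purpose(dataset: str, use: str) -> None:
    """Raise unless this use of this dataset appears in the register."""
    entry = PURPOSE_REGISTER.get(dataset)
    if entry is None or use not in entry["approved_uses"]:
        raise PurposeNotApproved(f"{use!r} is not an approved use of {dataset!r}")
```

Calling `check_purpose("support_emails", "fine_tune_model")` raises, which forces the conversation about a fresh basis and an updated notice to happen before the pipeline is built. The code does not decide compatibility; it only enforces a decision a person has already recorded.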

Article 22, the reforms, and keeping a human in the loop

This is the area that changed most recently, so read it carefully and check the current position, because the ICO guidance was still in consultation in 2026.

The old Article 22 treated solely automated decisions with legal or similarly significant effects (approving a loan, rejecting a job application, cancelling an account) as broadly prohibited unless a narrow exception applied. The Data (Use and Access) Act 2025 repealed Article 22 and replaced it with new provisions, commonly referred to as Articles 22A to 22D. The reframing matters: for most personal data, solely automated decisions with significant effects are now permitted provided you put safeguards in place, rather than being banned by default. Where special category data (Article 9, such as health or ethnicity) drives the decision, the stricter regime still applies.

“Solely automated” means no meaningful human involvement. A human who rubber-stamps whatever the model outputs does not count. The required safeguards are consistent: tell people the decision is automated, let them make representations, give them a genuine route to obtain human intervention and a real review, and let them contest the outcome.

The design implication is simple. If an agent or model output can materially affect a person, do not let it act unreviewed. Build a checkpoint. This is how we run our own prospecting engine: an adversarial verifier, a separate judge model, checks every message before anything is auto-sent, and a human can always intervene. That pattern, where a machine proposes, a second check disposes, and a person can step in, is both good engineering and the shape the law now expects.
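
The checkpoint pattern can be sketched as a small routing function. This is not our production engine, only a minimal illustration: `Proposal`, `Route`, `route_output` and `demo_verifier` are hypothetical names, and the verifier here is a trivial stand-in for a real second-model check.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Route(Enum):
    AUTO_SEND = "auto_send"
    HUMAN_REVIEW = "human_review"
    BLOCKED = "blocked"


@dataclass
class Proposal:
    text: str
    significant_effect: bool  # would acting on this materially affect a person?


def route_output(proposal: Proposal, verifier: Callable[[str], bool]) -> Route:
    """Decide what happens to a model output before anything acts on it."""
    if not verifier(proposal.text):
        return Route.BLOCKED          # the second check rejected it
    if proposal.significant_effect:
        return Route.HUMAN_REVIEW     # never act unreviewed on these
    return Route.AUTO_SEND


def demo_verifier(text: str) -> bool:
    """Stand-in for a judge model; here it only rejects empty text."""
    return bool(text.strip())
```

Routing to `HUMAN_REVIEW` only helps if the reviewer has the time and authority to change the outcome; a queue that is rubber-stamped leaves the decision solely automated.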

When you need a DPIA, and how to run a light one

A Data Protection Impact Assessment (Article 35) is mandatory when processing is likely to result in a high risk to people’s rights and freedoms. The ICO flags several triggers that AI projects routinely hit: large-scale profiling, systematic monitoring, use of novel technology, and automated decisions with significant effects. In plain terms: if you are wiring customer data into an LLM to make or heavily influence decisions about people, assume a DPIA is required.

A DPIA does not have to be a fifty-page document. A proportionate one answers, in writing: what data flows where and why; what the lawful basis is; what could go wrong for the individual (leakage, an unfair or wrong output, a decision they cannot challenge); what you have done to reduce each risk (minimisation, human review, retention limits, vendor controls); and what residual risk remains. Do it before you build, keep it living, and revisit it when the use changes. If high risk remains after your mitigations, you may need to consult the ICO before going live.
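
If it helps to keep the assessment living, the questions above can be held as a small structured record alongside the code. The sketch below is one possible shape, not an ICO template: the class names, the three-level residual scale and the example entries are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Risk:
    description: str
    mitigations: list[str] = field(default_factory=list)
    residual: str = "high"  # hypothetical scale: "low", "medium" or "high"


@dataclass
class LightDPIA:
    data_flow: str      # what data flows where, and why
    lawful_basis: str
    risks: list[Risk] = field(default_factory=list)

    def open_items(self) -> list[Risk]:
        """Risks with no mitigation recorded, or still high after mitigation."""
        return [r for r in self.risks if not r.mitigations or r.residual == "high"]


# Invented example entries.
dpia = LightDPIA(
    data_flow="Order status and first name sent to an LLM API to draft apologies",
    lawful_basis="legitimate interests",
    risks=[
        Risk("Leakage of customer data", ["field allow-list", "zero retention"], "low"),
        Risk("Wrong or unfair output", [], "high"),
    ],
)
```

Here `dpia.open_items()` returns only the second risk, which is the one still needing work before build. The record supports the written assessment and the judgement behind it; it does not replace either.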

Done early, a DPIA is not red tape. It is the cheapest design review you will ever run, because it surfaces the expensive problems while they are still a paragraph and not a production incident.

Keep customer and client data out of the training set

The question that makes clients nervous is “will my data train the model?” The honest answer is that it depends entirely on the tier and settings you choose, so choose deliberately.

Consumer chat products often reserve the right to use your inputs to improve their models unless you opt out. Business and API tiers from the major providers generally do not train on your data by default, and typically offer zero-retention or short-retention options. The gap between an employee pasting client data into a free consumer chatbot and the same data going through an enterprise API with training disabled and retention set to zero is enormous, and it is entirely within your control.

Concrete steps: use a business, enterprise or API tier, never the free consumer app, for anything touching customer data. Confirm in writing that inputs and outputs are not used for training. Turn on zero-retention or the shortest retention you can. Disable human review of your prompts where that setting exists. And put a plain internal policy in place so staff know the free chatbot is off-limits for customer data. The technical control matters more than the instruction, so where you can, route AI access through a gateway you control rather than trusting everyone to remember the rule.
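
A gateway can enforce those steps as a configuration check rather than a memo. The following is a minimal sketch of such a check, assuming a hypothetical `ProviderConfig` record that you fill in from the vendor's written terms; the tier labels and the 30-day ceiling are illustrative, not taken from any provider's documentation.

```python
from dataclasses import dataclass

ALLOWED_TIERS = {"business", "enterprise", "api"}
MAX_RETENTION_DAYS = 30  # hypothetical ceiling; set it as low as your vendor allows


@dataclass(frozen=True)
class ProviderConfig:
    name: str
    tier: str                # e.g. "consumer", "business", "enterprise", "api"
    training_disabled: bool  # confirmed in writing by the vendor
    retention_days: int      # 0 means zero retention


def violations(cfg: ProviderConfig) -> list[str]:
    """List the reasons this provider must not receive customer data."""
    problems = []
    if cfg.tier not in ALLOWED_TIERS:
        problems.append(f"tier {cfg.tier!r} is not approved")
    if not cfg.training_disabled:
        problems.append("training on inputs is not disabled")
    if cfg.retention_days > MAX_RETENTION_DAYS:
        problems.append(f"retention of {cfg.retention_days} days exceeds policy")
    return problems
```

A gateway would refuse to forward any request whose target has a non-empty `violations` list, so the free consumer app fails on tier alone, whatever a member of staff remembers of the policy.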

International transfers: know where your US provider actually sits

Most leading model providers are US companies, so sending them personal data is usually a restricted international transfer under UK GDPR, which needs a valid transfer mechanism.

You have two main routes. The provider may be certified under the UK Extension to the EU-US Data Privacy Framework, known as the UK-US Data Bridge, which has been live since 12 October 2023 and lets you transfer to certified US organisations. Failing that, use the UK International Data Transfer Agreement (IDTA) or the UK Addendum to the EU Standard Contractual Clauses (SCCs), backed by a Transfer Risk Assessment that checks whether the destination’s laws actually let the importer honour the protections.

One caution worth flagging: the Data Privacy Framework has faced legal challenge in the EU courts, with an appeal ongoing into 2026, so it would be unwise to treat the Data Bridge as permanently settled. Sensible teams keep SCCs or the IDTA available as a fallback and check the destination region. Where residency matters, several providers now let you pin processing to a specific region, which takes a lot of transfer risk off the table.

Get a DPA and read the sub-processor terms

Your AI vendor is your processor, so Article 28 requires a written data processing agreement before any personal data flows. No DPA, no lawful processing, full stop.

A usable DPA should name the purposes and limit the vendor to your documented instructions, commit to security measures, state retention and deletion, confirm no training on your data, list sub-processors and how you are told when they change, and set out breach notification and audit rights. That sub-processor list matters, because your customer data may pass to a cloud host or downstream service you have never heard of, and each one is part of your risk surface. Read it before you sign, not after an incident.
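
The terms listed above can be tracked as a simple checklist during vendor review. This sketch assumes a reviewer records which terms they have confirmed in the signed document; the labels in `REQUIRED_DPA_TERMS` are hypothetical shorthand for the items in the paragraph, not statutory wording.

```python
# Terms a usable DPA is expected to cover. The keys are hypothetical labels.
REQUIRED_DPA_TERMS = (
    "purposes_and_documented_instructions",
    "security_measures",
    "retention_and_deletion",
    "no_training_on_customer_data",
    "sub_processor_list_and_change_notice",
    "breach_notification",
    "audit_rights",
)


def missing_terms(confirmed: set[str]) -> list[str]:
    """Return the required terms not yet confirmed, in checklist order."""
    return [term for term in REQUIRED_DPA_TERMS if term not in confirmed]
```

An empty result from `missing_terms` means every item has been ticked by a person who read the agreement. It says nothing about whether the wording of each clause is adequate, which still needs that person's judgement.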

If assembling all this feels like a lot, that is the point: doing it once, properly, is what lets you move fast afterwards. When we built UpGrades as a multi-role AI product in 13 days, or put mdlondon’s live assistant in front of real customers, the speed came from settling the data-handling design up front, not from skipping it.

The practical do and don’t list

Do:

Pick and document a lawful basis before any data moves, usually legitimate interests with a written three-part test.
Send the smallest set of facts the task needs, and redact or tokenise identifiers.
Run a proportionate DPIA for any AI use that decides on or profiles people.
Keep a genuine human review on any decision with a legal or significant effect.
Use business or API tiers with training off and retention minimised, and route access through a controlled gateway.
Get an Article 28 DPA and check the sub-processor list and transfer mechanism.

Don’t:

Paste customer data into a free consumer chatbot.
Dump whole records into a prompt because it is easier than choosing fields.
Reuse data collected for one purpose to train or profile without a fresh basis.
Let an agent take a consequential action about a person with no checkpoint.
Assume a US provider is covered without checking the Data Bridge, IDTA or SCCs.
Treat “consent” buried in a general privacy notice as a valid basis.
