Targe · 7 min read · June 2026

Data protection patterns for AI workloads

Most company data does not need to be locked away from AI; it could pass through any reasonable model without consequence. The trouble is that the small fraction that does matter (customer records, draft contracts, internal financials, regulated information) tends to leak through the same channels as the harmless majority. The job of a data-protection pattern is to make the difficult cases obvious and the easy cases frictionless.

Here are four patterns we lean on, built from years of doing this work for clients in Namibia and South Africa, where data residency and sectoral regulation are not afterthoughts.

Pattern 1: Classify at the source, not at the prompt

The temptation is to scan prompts going out to AI tools and block the dangerous ones. This works, but badly: it is reactive, it produces false positives, and it relies on users tolerating the friction. By the time you are scanning a prompt, the user has already found the sensitive information, pasted it in, and decided to send it.

The better pattern is to classify data at its source (when it leaves your CRM, your finance system, your document store) and carry the classification with it. A row marked confidential in your customer database stays marked when it is exported, downloaded, or copied. The control in front of the AI tool then acts on the tag, rather than trying to guess the user's intent.

This shifts the question from "is this prompt dangerous?" to "is this data allowed to be in a tool of this category?" — a much easier question to answer correctly.
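A minimal sketch of carrying a classification with the data, assuming a hypothetical three-level scheme (PUBLIC, INTERNAL, RESTRICTED) and hypothetical helper names. Data is wrapped with its label at export time, anything built from labelled inputs inherits the most sensitive of them (a "high-water mark"), and the gate only compares the label against the ceiling for a tool category.

```python
from dataclasses import dataclass
from enum import IntEnum


class Label(IntEnum):
    # Hypothetical scheme; ordered so max() yields the most sensitive label.
    PUBLIC = 0
    INTERNAL = 1
    RESTRICTED = 2  # regulated or contractually restricted


@dataclass(frozen=True)
class Labelled:
    value: str
    label: Label
    source: str


def export_rows(rows, source, label):
    """Rows leave the source system already wrapped with the source's label."""
    return [Labelled(row, label, source) for row in rows]


def combine(items, sep="\n"):
    """Merged data keeps the highest label among its inputs (high-water mark)."""
    label = max((item.label for item in items), default=Label.PUBLIC)
    sources = ",".join(sorted({item.source for item in items}))
    return Labelled(sep.join(item.value for item in items), label, sources)


def allowed(item, tool_ceiling):
    """Is this data allowed in a tool whose category permits up to tool_ceiling?"""
    return item.label <= tool_ceiling
```

Because the label travels with the value, a document pasted together from a CRM export and a public wiki page is treated as RESTRICTED without anyone inspecting the prompt text.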

Pattern 2: Tier your models by what they retain

Not all model providers treat your data the same way. Some retain prompts for training by default. Some delete after thirty days. Some offer zero-retention enterprise tiers under contract. Treating them as a single category — "external AI" — ignores the most important difference.

We use three tiers internally and recommend the same to clients:

Tier A — sovereign or zero-retention. Self-hosted models, or providers under enterprise contracts with audited zero-retention. Suitable for any internal data, including regulated data.

Tier B — contracted retention with deletion windows. Mainstream enterprise plans of major providers. Suitable for general internal data; not for regulated or contractually restricted data.

Tier C — public consumer tools. Free plans, personal accounts, anything where the user is the product. Suitable for public information only.

When a new initiative is registered, the tier requirement falls out of the data classification automatically. There is no debate in the room.
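One way the tier requirement can "fall out" of the classification is a plain lookup, sketched below. The labels and the `register_initiative` helper are hypothetical; the label-to-tier mapping follows the three tier descriptions above, with RESTRICTED standing in for regulated or contractually restricted data.

```python
from enum import Enum, IntEnum


class Label(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    RESTRICTED = 2  # regulated or contractually restricted


class Tier(Enum):
    A = "sovereign or zero-retention"
    B = "contracted retention with deletion windows"
    C = "public consumer tools"


# Most protective first, so a prefix of this list is "this tier or stricter".
TIER_ORDER = [Tier.A, Tier.B, Tier.C]

# Least protective tier each label may use, per the tier descriptions.
LOWEST_PERMITTED = {
    Label.PUBLIC: Tier.C,
    Label.INTERNAL: Tier.B,
    Label.RESTRICTED: Tier.A,
}


def required_tier(labels):
    """The most sensitive data in the initiative decides the tier."""
    strictest = max(labels, default=Label.PUBLIC)
    return LOWEST_PERMITTED[strictest]


def permitted_tiers(labels):
    """Every tier at least as protective as the required one."""
    required = required_tier(labels)
    return TIER_ORDER[: TIER_ORDER.index(required) + 1]


def register_initiative(name, labels):
    """Hypothetical registration record: no judgement call, just a lookup."""
    return {
        "initiative": name,
        "required_tier": required_tier(labels).name,
        "permitted": [tier.name for tier in permitted_tiers(labels)],
    }
```

For example, an initiative touching INTERNAL and RESTRICTED data is registered with `permitted == ["A"]`, while one touching only INTERNAL data may use tier A or B.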

Pattern 3: Default to redaction, allow specific lifts

For any pipeline that touches both sensitive data and an external model, the default should be redaction at the boundary. Names, account numbers, addresses and other identifiers are replaced with stable tokens before the model sees them, so a repeated value always maps to the same token. The model still produces a useful answer; the tokens are reversed only on the way back, inside your perimeter.
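A minimal sketch of boundary redaction with stable, reversible tokens. It assumes identifiers can be found with two illustrative regular expressions (email addresses, and 8 to 12 digit account numbers); real name and address detection needs a proper entity recogniser, which this does not attempt. The token-to-value map never leaves the process, so only `redact` output crosses the boundary and `restore` runs on the model's reply.

```python
import re

# Illustrative patterns only; emails are matched first so their digits are not
# mistaken for account numbers.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "ACCT": re.compile(r"\b\d{8,12}\b"),
}
TOKEN = re.compile(r"<[A-Z]+_\d+>")


class Redactor:
    def __init__(self):
        self._to_token = {}   # original value -> token
        self._to_value = {}   # token -> original value (stays inside the perimeter)
        self._counters = {}   # per-kind counter for readable tokens

    def _token_for(self, kind, value):
        # The same value always gets the same token, so the model can still
        # tell that two mentions refer to the same customer or account.
        if value not in self._to_token:
            n = self._counters.get(kind, 0) + 1
            self._counters[kind] = n
            token = f"<{kind}_{n}>"
            self._to_token[value] = token
            self._to_value[token] = value
        return self._to_token[value]

    def redact(self, text):
        for kind, pattern in PATTERNS.items():
            text = pattern.sub(lambda m: self._token_for(kind, m.group(0)), text)
        return text

    def restore(self, text):
        # Unknown tokens are left untouched rather than guessed.
        return TOKEN.sub(lambda m: self._to_value.get(m.group(0), m.group(0)), text)
```

For instance, `"Pay 12345678 to jo@example.com, ref 12345678"` is sent as `"Pay <ACCT_1> to <EMAIL_1>, ref <ACCT_1>"`, and any reply mentioning `<EMAIL_1>` is mapped back locally.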

The exception is when a specific use case genuinely needs the identifier — say, drafting an email to a named customer. That should be an explicit lift, approved against a policy, logged, and scoped. The default-redact, lift-by-exception pattern protects against the most common failure mode: a developer building a pipeline who never thought about the data crossing the boundary, because nothing made them.
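The lift-by-exception rule can be sketched as a small policy check in front of the redactor. The policy table, use-case name and identifier kinds below are hypothetical; the point is the shape: a lift is scoped to one approved use case and to named identifier kinds, every decision is logged, and with no lift everything is redacted.

```python
import logging
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("lifts")

# Hypothetical policy: identifier kinds each approved use case may see unredacted.
LIFT_POLICY = {
    "draft_customer_email": frozenset({"NAME", "EMAIL"}),
}


@dataclass(frozen=True)
class Lift:
    use_case: str
    kinds: frozenset
    requested_by: str


def approve_lift(use_case, kinds, requested_by):
    """Grant a lift only for kinds the policy allows for this use case; log both outcomes."""
    allowed = LIFT_POLICY.get(use_case, frozenset())
    requested = frozenset(kinds)
    excess = sorted(requested - allowed)
    if excess:
        log.warning("lift denied: use_case=%s kinds=%s by=%s", use_case, excess, requested_by)
        raise PermissionError(f"{use_case!r} may not lift {excess}")
    log.info("lift approved: use_case=%s kinds=%s by=%s", use_case, sorted(requested), requested_by)
    return Lift(use_case, requested, requested_by)


def kinds_to_redact(all_kinds, lift=None):
    """Default is to redact every kind; a lift removes only the kinds it names."""
    lifted = lift.kinds if lift else frozenset()
    return frozenset(all_kinds) - lifted
```

A developer who never asks for a lift gets full redaction by default, which is exactly the failure mode this pattern is meant to cover.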

Pattern 4: Keep retrieval inside the perimeter

Retrieval-augmented generation has become a default architecture, and rightly so — it grounds models in your own documents. But the retrieval step is where data protection is won or lost. If your vector store, your search index and your raw documents all live with the model provider, you have effectively shipped your knowledge base outside.

The pattern we use: retrieval, ranking and chunk selection happen inside the client's environment. Only the final, minimal set of relevant chunks is passed to the model — and only the chunks that the asking user is already authorised to read. This keeps the corpus inside, applies existing access controls naturally, and limits the blast radius if a model is ever compromised or a prompt is ever leaked.
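A minimal sketch of in-perimeter retrieval, assuming chunks already carry embedding vectors computed inside your environment and a hypothetical group-based access list per chunk. The order matters: chunks the user cannot read are dropped before ranking, and only the top `k` survivors are formatted into the prompt, so nothing else in the corpus reaches the model.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    allowed_groups: frozenset  # hypothetical ACL copied from the source system
    vector: tuple              # embedding computed inside the perimeter (assumed)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_chunks(query_vec, user_groups, corpus, k=3):
    # 1. Apply existing access controls first: only chunks this user may read.
    readable = [c for c in corpus if c.allowed_groups & user_groups]
    # 2. Rank locally by similarity to the query.
    ranked = sorted(readable, key=lambda c: cosine(query_vec, c.vector), reverse=True)
    # 3. Keep the minimal set that will actually leave the perimeter.
    return ranked[:k]


def build_prompt(question, chunks):
    context = "\n\n".join(f"[{c.doc_id}] {c.text}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

Filtering before ranking, rather than after, means a highly relevant chunk the user is not authorised to read can never displace or accompany the ones they are.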

The pattern that ties them together

Each of these patterns has the same shape: make the safe path the default, make the unsafe path explicit, and write down every exception. That is also the shape of Targe.

A policy that depends on every user remembering to do the right thing will be broken by the first deadline. A policy expressed as defaults, with logged exceptions, survives reorganisations, staff turnover and a sudden surge of AI experimentation. It is the only kind that holds up under real conditions.

If you want a hand shaping these patterns for your own stack, get in touch — it is the kind of conversation we enjoy.
