Skip to content

The Sanitize API — Design Decisions

Living document. Updated as decisions are made.


Core Principle: Legibility at the Trust Boundary

Section titled “Core Principle: Legibility at the Trust Boundary”

@hypertheory-labs/sanitize sits at the boundary between developer intent and what gets shared with AI assistants, logging systems, and export pipelines. The config a developer writes is also an audit surface — a security reviewer should be able to scan it and immediately verify coverage without understanding implementation details.

Legibility is not an aesthetic preference here. It is a security property.

This means we prioritise readable configs over clever APIs, even when a clever API would be smaller or more elegant. Every addition to the surface area is weighed against whether it makes a config harder to scan at a glance.


The API is layered so that complexity is opt-in. Most developers never leave Tier 1.

sanitized(state, {
password: 'password', // masked
apiKey: 'apiKey', // hashed correlation token
creditCard: 'creditCard', // last four digits
ssn: 'omitted', // key removed entirely
})

Zero imports beyond sanitized itself. Full autocomplete. Config reads like a description of intent. This covers the vast majority of real-world cases.

Tier 2 — Parameterized operators (curried functions)

Section titled “Tier 2 — Parameterized operators (curried functions)”
import { sanitized, keepFirst, truncate } from '@hypertheory-labs/sanitize';
sanitized(state, {
internalId: keepFirst(6),
notes: truncate(80),
})

Used when the finite named vocabulary doesn’t fit. Returns a plain (v: string) => string function — no new concepts, just function composition. Can be mixed freely with Tier 1 rules in the same config.

Tier 3 — Domain factory (custom aliases)

Section titled “Tier 3 — Domain factory (custom aliases)”
const { sanitized } = createSanitizer({
policyNumber: keepFirst(6), // custom alias → parameterized operator
claimNumber: 'hashed', // custom alias → named primitive
memberId: 'omitted',
});
// Now usable as string literals with full type safety:
sanitized(state, { policyNumber: 'policyNumber', claimNumber: 'claimNumber' })

Promotes Tier 2 back to Tier 1 ergonomics for a specific domain. Companies define their vocabulary once; developers in that domain use string literals and never see curried functions. Not yet implemented — see backlog.


Named Rules: Primitives and Semantic Aliases

Section titled “Named Rules: Primitives and Semantic Aliases”

Named rules are string literals drawn from the sanitizationHandlers map. The satisfies constraint on that map is what makes SanitizationRule narrow to a union of literal keys rather than string — this is load-bearing for autocomplete and type safety.

RuleOutputUse when
'omitted'(key removed)Field shouldn’t appear in output at all
'redacted''[redacted]'Field should be visible but value hidden
'lastFour'last 4 charsExplicit about the transformation
'firstFour'first 4 charsExplicit about the transformation
'masked''***…' (≤8 chars)Length-approximate masking
'hashed''[~3f9a12b4]'Correlation without exposure
'email''jo***@example.com'Standard email masking
RuleMaps toRationale
'creditCard'lastFourIndustry standard: show last 4
'debitCard'lastFourSame convention
'phoneNumber'lastFourCommon display convention
'ssn'redactedLast-4 SSN is common UX but too risky for devtools output
'password'maskedLength-approximate — presence visible, value hidden
'apiKey'hashedIdentity correlation useful; value must not appear
'token'hashedSame as apiKey
'secret'redactedFully hidden
'emailAddress'emailVerbose alias for clarity

omitted vs redacted — an important distinction

Section titled “omitted vs redacted — an important distinction”

omitted removes the key — the field does not exist in the sanitized output. redacted keeps the key with '[redacted]' as the value. For AI accessibility, redacted is generally preferable: the AI can see that a field exists and understand the data shape, even if it cannot see the value. Use omitted only when the field’s existence would itself be misleading.

Replacing an API key with [~3f9a12b4] preserves the ability to tell whether two stores are using the same key (same hash) or whether the key changed between snapshots (different hash). This is genuinely useful for debugging without exposing the value.


For cases the named vocabulary doesn’t cover. Each returns a SanitizationHandler ((v: string) => string) — a plain function usable anywhere a named rule can be used.

OperatorSignatureOutput
keepFirst(n)(n: number) => SanitizationHandlerFirst n chars
keepLast(n)(n: number) => SanitizationHandlerLast n chars
truncate(n)(n: number) => SanitizationHandlerFirst n chars +
replace(fn)(fn) => SanitizationHandlerCustom transform — escape hatch

replace is intentionally the escape hatch, not a first-class pattern. Encourage named rules and parameterized operators first; replace exists for genuinely one-off cases.


Arrays require a structural signal: “apply this config to each element.” Two options were considered:

// Tuple convention — works but opaque
customers: [{ email: 'omitted' }]
// arrayOf() — explicit and readable
customers: arrayOf({ email: 'omitted', creditCard: 'creditCard' })

arrayOf() was chosen on legibility grounds — it reads like English at the trust boundary and doesn’t require knowing the [config] convention.

Vocabulary borrowed from Zod: z.array() serves the same structural purpose in Zod’s API. There is no further convergence with Zod’s API surface — this is the full extent of the overlap. arrayOf() is the end of that road, not the beginning of a descent into schema territory.

Both forms are supported for compatibility, but arrayOf() is canonical.


Why satisfies on the handler map is load-bearing

Section titled “Why satisfies on the handler map is load-bearing”
const sanitizationHandlers = {
omitted: null,
lastFour: (v: string) => v.slice(-4),
// ...
} satisfies Record<string, ((v: string) => string) | null>;

SanitizationRule is derived as keyof typeof sanitizationHandlers. If satisfies is replaced with a type annotation (: Record<string, ...>), TypeScript widens the type and keyof produces string instead of 'omitted' | 'lastFour' | ..., breaking the entire config type and eliminating autocomplete. satisfies validates the shape while preserving the narrow literal key types. Do not change this.


autoRedactConfig(state) scans top-level field names against a sensitive-field blocklist and returns a SanitizationConfig automatically. withStellarDevtools calls this on every state snapshot and merges the result with any explicit sanitize option:

{ ...autoRedactConfig(raw), ...options.sanitize }

Explicit config always wins. This means you get protection for common field names without writing any config, and any explicit rules override the defaults.


  • createSanitizer() factory — enables domain-specific aliases (Tier 3). Design note: values in the custom map can be either a named primitive (string) or a SanitizationHandler function; the factory promotes them all to string aliases with full type safety.

  • @hypertheory-labs/sensitive — a companion package for runtime-visible tagging of sensitive data using tagged classes (Effect-style _tag pattern). Sanitization rules could be derived automatically from tags — a field of type SSN would be redacted everywhere without explicit config. Enables cross-cutting policy-based sanitization and, eventually, AI-assisted security auditing (“is any value tagged employee-id ever appearing in a route segment?”).