Redaction

Redaction runs before an event is passed to its sink. It applies recursively to strings, dictionaries, lists, tuples, sets, dataclasses, and objects exposing model_dump() or dict().
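The recursive walk described above can be pictured with a small standard-library sketch. It is not the agentlogsafe implementation: the function name redact_value, the single email pattern, and the choice to return dataclasses and model objects as plain dictionaries are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, fields, is_dataclass

# Deliberately simple email pattern, used only for this illustration.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_value(value, replacement="[REDACTED]"):
    """Hypothetical helper: walk nested containers and scrub strings."""
    if isinstance(value, str):
        return EMAIL.sub(replacement, value)
    if isinstance(value, dict):
        return {k: redact_value(v, replacement) for k, v in value.items()}
    if isinstance(value, list):
        return [redact_value(v, replacement) for v in value]
    if isinstance(value, tuple):
        return tuple(redact_value(v, replacement) for v in value)
    if isinstance(value, set):
        return {redact_value(v, replacement) for v in value}
    if is_dataclass(value) and not isinstance(value, type):
        # Assumption: dataclass instances are flattened to dictionaries.
        return {
            f.name: redact_value(getattr(value, f.name), replacement)
            for f in fields(value)
        }
    for method in ("model_dump", "dict"):
        dump = getattr(value, method, None)
        if callable(dump):
            return redact_value(dump(), replacement)
    return value  # numbers, None, and other scalars pass through unchanged


@dataclass
class Contact:
    name: str
    email: str


event = {"contacts": [Contact("Ann", "ann@example.com")], "cc": ("bob@example.com",)}
redacted = redact_value(event)
```

The point of the sketch is the dispatch order: strings are scrubbed, containers are rebuilt element by element, and anything that can be dumped to a dictionary is dumped and then walked like any other dictionary.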

Default detectors

  • Email addresses
  • US phone numbers and Social Security numbers
  • JSON Web Tokens and bearer credentials
  • Common API-key assignments and AWS-style access keys
  • Database connection URLs
  • Signed URL query parameters such as token, signature, sig, key, and X-Amz-Signature
  • Values under sensitive dictionary field names

Sensitive field names include password, secret, token, API key, access key, authorization, cookies, client secret, connection string, database URL, email, phone, and SSN variants.
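One plausible way to match field-name variants such as api_key, apiKey, and API-Key is to normalize names before comparing them. The sketch below assumes that approach; the names SENSITIVE_FIELDS, normalize_field, and redact_sensitive_fields are hypothetical, the set is abbreviated, and the library's real matching rules may differ.

```python
import re

# Abbreviated, illustrative set of normalized names (not the library's defaults).
SENSITIVE_FIELDS = frozenset(
    {"password", "secret", "token", "apikey", "accesskey",
     "authorization", "clientsecret"}
)


def normalize_field(name: str) -> str:
    """Lowercase the name and drop separators so spelling variants collapse."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def redact_sensitive_fields(record: dict, replacement: str = "[REDACTED]") -> dict:
    """Replace the whole value when the key normalizes to a sensitive name."""
    return {
        key: replacement if normalize_field(str(key)) in SENSITIVE_FIELDS else value
        for key, value in record.items()
    }
```

Exact matching after normalization keeps false positives low (token_count is left alone) but misses prefixed names such as X-Api-Key; substring matching would invert that trade-off.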

from agentlogsafe import redact

redact({
    "authorization": "Bearer abc.def.ghi",
    "message": "Contact person@example.com",
})
# {
#   "authorization": "[REDACTED]",
#   "message": "Contact [REDACTED]",
# }
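For the signed-URL detector in the list of default detectors, the following standard-library sketch shows one way such query parameters could be scrubbed while leaving the rest of the URL intact. The function name redact_signed_url and the case-insensitive comparison are assumptions; the parameter names come from the list above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Parameter names taken from the detector list, lowercased for comparison.
SIGNED_PARAMS = frozenset({"token", "signature", "sig", "key", "x-amz-signature"})


def redact_signed_url(url: str, replacement: str = "[REDACTED]") -> str:
    """Hypothetical helper: replace the values of signing parameters only."""
    parts = urlsplit(url)
    query = [
        (name, replacement if name.lower() in SIGNED_PARAMS else value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    # safe="[]" keeps the brackets of the replacement marker unescaped.
    return urlunsplit(parts._replace(query=urlencode(query, safe="[]")))
```

Parsing and re-encoding the query string may normalize the escaping of the untouched parameters, which is usually acceptable for log output.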

Configuration

from agentlogsafe import AgentLogger, RedactionConfig

config = RedactionConfig(
    replacement="***",
    redact_phone_numbers=False,
    sensitive_field_names=("password", "secret", "employee_id"),
)
log = AgentLogger(redaction_config=config, sink="events.jsonl")

Supplying sensitive_field_names replaces the default tuple; include every field name your policy requires.

Disabling redaction

from agentlogsafe import AgentLogger
log = AgentLogger(redact_payloads=False, sink="events.jsonl")

This disables redaction for payload, risk, and metadata. Use it only when data has already been sanitized and the destination has suitable protections.

Operational limitations

Regex detectors can produce false positives and false negatives. Built-in international coverage includes plus-prefixed phone numbers, IBANs, UK National Insurance numbers, and Canadian SINs; it is not a comprehensive global identity library. Values that do not resemble a known pattern should be placed beneath a configured sensitive field name or removed before logging. Cyclic object graphs raise RedactionError instead of producing ambiguous output.
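Cycle detection of the kind described above is commonly done by tracking the containers on the current traversal path. The sketch below assumes that technique; CycleError is a hypothetical stand-in for the library's RedactionError, and copy_checked is an illustrative name.

```python
class CycleError(ValueError):
    """Hypothetical stand-in for the library's RedactionError."""


def copy_checked(value, _active=None):
    """Copy nested containers, failing fast on a reference cycle."""
    active = set() if _active is None else _active
    if not isinstance(value, (dict, list, tuple, set)):
        return value
    marker = id(value)
    if marker in active:
        raise CycleError("cyclic reference detected")
    active.add(marker)
    try:
        if isinstance(value, dict):
            return {k: copy_checked(v, active) for k, v in value.items()}
        return type(value)(copy_checked(v, active) for v in value)
    finally:
        # Only containers on the current path count, so an object that is
        # merely shared by two siblings is not reported as a cycle.
        active.discard(marker)
```

Removing the marker on the way out is what distinguishes a shared reference (allowed) from a true cycle (rejected).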

Hashing and tokenization

Use deterministic hashing when correlating repeated values is necessary but the original value must not be retained:

from agentlogsafe import HashStrategy, RedactionConfig

# Example value only; load the real salt from your secret store.
config = RedactionConfig(strategy=HashStrategy(salt=b"tenant-specific-salt"))
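To make the idea of deterministic salted hashing concrete, here is a minimal standard-library sketch. The digest algorithm, the truncation, and the "sha256:" prefix are assumptions chosen for illustration; the output format HashStrategy actually produces is not documented here.

```python
import hashlib


def hash_value(value: str, salt: bytes) -> str:
    """Hypothetical helper: map a value to a stable, salted digest."""
    digest = hashlib.sha256(salt + value.encode("utf-8")).hexdigest()
    return f"sha256:{digest[:16]}"  # truncated for readable log lines


first = hash_value("person@example.com", b"tenant-a")
again = hash_value("person@example.com", b"tenant-a")
other_tenant = hash_value("person@example.com", b"tenant-b")
```

Equal inputs under the same salt yield equal digests, which is what allows correlation; a different salt yields unrelated digests. Anyone who knows the salt can still test guesses for low-entropy values such as phone numbers.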

Use keyed HMAC tokenization when correlation identifiers must not be reproducible without a secret key:

from agentlogsafe import RedactionConfig, TokenizationStrategy

# load_key_from_secret_manager() is a placeholder for your own key-loading helper.
config = RedactionConfig(
    strategy=TokenizationStrategy(key=load_key_from_secret_manager())
)

Never hard-code production salts or tokenization keys. Rotate and scope keys using your secret-management policy. These strategies are intentionally one-way; they do not provide token vault lookup or reversible encryption.
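The one-way, keyed behavior described above can be illustrated with Python's hmac module. This is a sketch under assumptions: the function name tokenize, the SHA-256 digest, and the hex output are illustrative choices, not a description of how TokenizationStrategy is implemented.

```python
import hashlib
import hmac


def tokenize(value: str, key: bytes) -> str:
    """Hypothetical helper: derive a keyed, one-way token for a value."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Example keys only; real keys come from a secret manager.
token = tokenize("person@example.com", b"example-key-1")
same_key = tokenize("person@example.com", b"example-key-1")
rotated = tokenize("person@example.com", b"example-key-2")
```

Tokens match only while the key is unchanged, so rotating the key also breaks correlation with tokens written under the old key.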

Custom detectors

Implement RedactionDetector.redact(value, strategy) or use RegexDetector:

import re
from agentlogsafe import RedactionConfig, RegexDetector

employee_ids = RegexDetector("employee_id", re.compile(r"EMP-\d{6}"))
config = RedactionConfig(detectors=(employee_ids,))

Custom detectors run after enabled built-ins. Keep expressions bounded and avoid nested ambiguous repetitions that can cause pathological regex performance.
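As a sketch of what a bounded expression looks like, the following standard-library example anchors the employee-ID pattern with word boundaries and a fixed repetition count. The function redact_employee_ids and the modeling of a strategy as a plain callable are assumptions; the actual RedactionDetector and strategy interfaces may differ.

```python
import re

# Fixed-width repetition plus word boundaries: the match cost stays linear
# and longer digit runs are not partially matched.
EMPLOYEE_ID = re.compile(r"\bEMP-\d{6}\b")

# A pattern to avoid: nested repetition such as r"(\w+)+$" can backtrack
# exponentially on input that almost matches.


def redact_employee_ids(text: str, strategy=lambda matched: "[REDACTED]") -> str:
    """Hypothetical helper: pass each matched ID through the strategy callable."""
    return EMPLOYEE_ID.sub(lambda m: strategy(m.group(0)), text)
```

Without the boundaries, a string like EMP-1234567 would have its first six digits replaced and the seventh left behind, producing a confusing partial redaction.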