All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- DataRedactor.name_pattern(first, last, middle:) - generates a POSIX ERE that matches a person’s name across common written variations (case-insensitivity, First/Last order swaps, "Last, First", initials, diacritics, and interchangeable space/hyphen separators). Returns a String ready to pass to add_pattern. The pattern is boundary-wrapped, so "Mario" matches as a word but not inside "Mariolino". When middle: is given, both the no-middle and with-middle forms match.
- DataRedactor.redact_deep(data, only:, except:, placeholder:) - recursively redacts every String value in a nested Hash/Array structure. Non-string scalars (Integer, Float, nil, true/false) and Hash keys are passed through unchanged. Returns a deep copy; never mutates the input. Raises ArgumentError on circular references.
- DataRedactor.redact_json(json_string, only:, except:, placeholder:) - parses JSON, redacts via redact_deep, and returns valid JSON. Raises JSON::ParserError on invalid input.
- hashicorp_vault_service_token - hvs. prefix, 90–120 chars
- hashicorp_vault_batch_token - hvb. prefix, 138–300 chars
- hashicorp_terraform_api_token - <14-char-id>.atlasv1.<token>

All three HashiCorp patterns are tagged :credentials and do not require word-boundary wrapping (distinctive prefixes eliminate false positives).
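The redact_deep and redact_json behavior described above (deep copy, untouched keys and non-string scalars, ArgumentError on cycles) can be pictured with a small pure-Ruby sketch. This is not the gem's implementation: REDACT is a hypothetical stand-in for DataRedactor.redact that only masks e-mail-like text, and the names redact_deep_sketch and redact_json_sketch are invented for illustration.

```ruby
require "json"

# Hypothetical stand-in for DataRedactor.redact: masks only e-mail-like text.
REDACT = ->(text) { text.gsub(/[\w.+-]+@[\w-]+(?:\.[\w-]+)+/, "[REDACTED]") }

# Walks Hashes and Arrays and returns a redacted deep copy.
# `seen` tracks the containers on the current path by object_id, so a
# structure that contains itself raises instead of recursing forever.
def redact_deep_sketch(data, seen = {})
  case data
  when Hash, Array
    raise ArgumentError, "circular reference detected" if seen[data.object_id]

    seen[data.object_id] = true
    copy =
      if data.is_a?(Hash)
        # Keys pass through unchanged; only values are visited.
        data.each_with_object({}) { |(k, v), out| out[k] = redact_deep_sketch(v, seen) }
      else
        data.map { |v| redact_deep_sketch(v, seen) }
      end
    seen.delete(data.object_id)
    copy
  when String
    REDACT.call(data) # gsub returns a new String, so the input is not mutated
  else
    data # Integer, Float, nil, true and false pass through
  end
end

# Parse, redact, and serialize again; JSON.parse raises JSON::ParserError
# on invalid input.
def redact_json_sketch(json_string)
  JSON.generate(redact_deep_sketch(JSON.parse(json_string)))
end
```

Tracking only the containers on the current path (and removing them on the way back up) means a value shared by two sibling keys is not mistaken for a cycle.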
Supersedes 0.7.1, which has been yanked from RubyGems.
0.7.1 had a release pipeline bug: the source gem and the precompiled native
gems were published by two independent workflows, with no gating between
them. When the native-binary builds failed (oxidize-rb/actions/cross-gem
couldn’t pull rbsys/aarch64-linux:0.9.128 from Docker Hub), the source
gem still published — leaving users with release notes that promised
precompiled binaries that didn’t exist on RubyGems. 0.7.2 ships the same
features as 0.7.1 plus the pipeline fix.
- ci.yml
  and into release-binaries.yml, alongside the native-gem builds. The
  publish job now needs: [build-source, build-native]; if any native
  platform fails to build, nothing publishes. This guarantees the
  RubyGems release matches what the GitHub release notes promise.
- rake-compiler-dock invocation in CI instead of the
  oxidize-rb/actions/cross-gem action. Same code path as rake gem:all
  locally and the existing PR-time smoke test in ci.yml. Uses
  ghcr.io/rake-compiler/* images (no Docker Hub rate limits).
- aarch64-linux variant in particular was previously failing.
- bundle lock --add-platform for cross-platform deploys.
- data_redactor no longer requires a C toolchain on these targets:
  - x86_64-linux, aarch64-linux (glibc)
  - x86_64-linux-musl, aarch64-linux-musl (Alpine)
  - x86_64-darwin, arm64-darwin (macOS Intel + Apple Silicon)
Each native gem ships compiled .so files for Ruby 3.1, 3.2, 3.3, and 3.4.
Bundler/RubyGems automatically picks the right gem for the host; users on
any other platform fall back to the source gem and compile as before.
- rake gem:all task - builds every native gem locally via rake-compiler-dock
  (requires Docker). Single command to regenerate the full release matrix.
- .github/workflows/release-binaries.yml - builds and publishes all native
  gems on every GitHub release. Also exposes workflow_dispatch so a
  maintainer can rebuild any past release without cutting a new tag.
- rake-compiler-dock as a development dependency. Source-only
  gem size is unchanged; native gems strip ext/ and the extconf.rb
  extension hook so they only carry the prebuilt .so files.
- lib/data_redactor/integrations/. Soft-required: none are loaded by default; the gem still has zero runtime dependencies in the gemspec.
- DataRedactor::Integrations::Logger - drop-in Logger::Formatter that scrubs every emitted line, wraps an inner formatter (default Logger::Formatter), and preserves exception cause chains.
- DataRedactor::Integrations::Rails.filter(...) - returns a (key, value) proc for Rails.application.config.filter_parameters. Mutates String values in place via String#replace.
- DataRedactor::Integrations::Rack - middleware with selectable surfaces. scrub: accepts any subset of [:body, :headers] (default both). :body buffers the response and drops Content-Length; :headers scrubs sensitive response headers (Set-Cookie, Authorization, X-Api-Key, …) and request headers in the env hash. Unknown surfaces raise ArgumentError.
- only:, except:, placeholder: to DataRedactor.redact.
- rack as a development dependency. No new runtime dependencies.
- :credentials tag, exposed via DataRedactor.pattern_names:
  - anthropic_api_key - sk-ant-apiNN-...
  - openai_project_api_key - sk-proj-...
  - gitlab_pat - glpat-...
  - digitalocean_pat - dop_v1_...
  - databricks_api_token - dapi...
  - sentry_dsn - https://KEY@oNNN.ingest.sentry.io/PID (also matches the legacy KEY:SECRET@ form)
- NUM_PATTERNS is now 85 (was 79). Built-in pattern indices in C have shifted accordingly; the public Ruby API and pattern names are stable.
- only: / except:. Both kwargs now accept a mix of Symbols (tags) and Strings (pattern names from DataRedactor.pattern_names). They can be combined: only: :contact, except: ["email"] redacts every contact pattern except email. Mixed-list shapes like only: [:credentials, "iban_de"] also work. Precedence: except: always wins when the two overlap.
- DataRedactor.pattern_names - array of every known pattern name (built-ins + currently registered custom).
- DataRedactor::BUILTIN_PATTERN_NAMES and DataRedactor::BUILTIN_PATTERN_TAG_BITS constants (frozen) exposing the compiled-in pattern roster.
- DataRedactor::UnknownPatternError raised when a String passed to only:/except: does not match any known pattern.
- .github/workflows/ci.yml publishes bundle exec yard doc output to GitHub Pages on every push to main.
- _redact(text, ph_mode, ph_str, enable_bits) and _scan(text, enable_bits) now take a per-pattern enable bit array (built by the Ruby wrapper from only:/except:) instead of a tag bitmask. The public DataRedactor.redact / .scan API is fully backward compatible; only the underscore-prefixed C boundary changed. Single-pass: filtering happens in C, no second pass through _scan.
- only: and except: may now be combined (previously raised ArgumentError if both were passed).
- ext/data_redactor/data_redactor.c was a single ~1000-line file; it is now a 60-line entry point plus patterns.{c,h}, placeholder.{c,h}, redact.{c,h}, scan.{c,h}, custom_patterns.{c,h}, and tags.h.
  extconf.rb now globs every .c in the extension directory via $srcs, so adding a new module needs no Makefile edits.
- DataRedactor now has @param/@return/@raise annotations (100% coverage); .yardopts configures markdown rendering with the README as the front page.
- redact/scan are thread-safe but add_pattern/remove_pattern/clear_custom_patterns! are not (register custom patterns once at boot).
- DataRedactor.scan(text, only:, except:) - returns { redacted: String, matches: Array<Hash> } where each match contains :tag (Symbol), :name (pattern name String), :value (matched text), :start (byte offset into original), :length (byte length). Accepts the same only:/except: tag filters as redact. Includes both built-in and custom pattern matches.
- pattern_names[] array in the C extension mapping each built-in pattern index to a stable snake_case name string (e.g. "aws_access_key_id", "email", "iban_de").
- placeholder: keyword argument on DataRedactor.redact.
  - placeholder: "***" - custom string in place of the default "[REDACTED]".
  - placeholder: :tagged → [REDACTED:CONTACT], [REDACTED:CREDENTIALS], etc.
  - placeholder: :hash → [CONTACT_a3f9] (4-hex djb2 suffix; the same value always produces the same token, which is useful for correlating redactions across log lines).
- PH_MODE_PLAIN, PH_MODE_TAGGED, PH_MODE_HASH integer constants exposed from C.
- DataRedactor::PLACEHOLDER_DEFAULT constant ("[REDACTED]").
- DataRedactor._redact now takes 4 arguments: (text, mask, ph_mode, ph_str). The public DataRedactor.redact API is fully backward compatible.
- DataRedactor.add_pattern(name:, regex:, tag: :custom, boundary: false).
- DataRedactor.remove_pattern(name) - remove a named custom pattern (returns true/false).
- DataRedactor.custom_patterns - list all registered custom patterns as an array of hashes.
- DataRedactor.clear_custom_patterns! - remove all custom patterns (useful in test suites).
- :custom tag and TAG_CUSTOM bitmask constant for custom patterns. Works with only:/except:.
- DataRedactor::InvalidPatternError raised when a pattern fails regcomp or uses unsupported Ruby-only syntax (\d, \s, \w, \b, lookaround, non-greedy quantifiers, named groups).
- boundary: true (group indices would shift).
- regex_t is freed).
- :credentials, :financial, :tax_id, :national_id, :contact, :network, :travel, :other).
- DataRedactor.redact(text, only: [...]) to redact only patterns in the given tags.
- DataRedactor.redact(text, except: [...]) to redact every tag except the given ones.
- DataRedactor.tags returning the list of supported tags.
- DataRedactor::TAGS constant mapping tag symbols to bitmask values, plus TAG_* integer constants exposed from C for advanced use.
- DataRedactor::UnknownTagError raised when an unknown tag symbol is passed.
- DataRedactor._redact(text, mask) (two-arg, mask is an integer bitmask). The public API is the Ruby wrapper DataRedactor.redact, which remains backward compatible: redact(text) with no keyword arguments runs every pattern exactly as before.
- ext/data_redactor/data_redactor.c) using POSIX regex.h for high-throughput scanning.
- DataRedactor.redact(text) module function returning the input with every match replaced by [REDACTED].
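
The tag-level only:/except: filtering that this changelog describes (a TAGS map of symbols to bitmask values, an integer mask handed to the C layer, and except: winning on overlap) could be sketched roughly as follows. The bit values, the constant names TAGS_SKETCH and ALL_TAGS_MASK, the UnknownTagErrorSketch class, and the helper names are all assumptions made for illustration; the real TAG_* values come from the C extension, and later releases replaced the tag bitmask with a per-pattern enable array.

```ruby
# Hypothetical bit assignments; the real TAG_* values live in the C extension.
TAGS_SKETCH = {
  credentials: 1 << 0, financial: 1 << 1, tax_id: 1 << 2, national_id: 1 << 3,
  contact: 1 << 4, network: 1 << 5, travel: 1 << 6, other: 1 << 7
}.freeze

# Mask with every tag enabled, used when only: is not given.
ALL_TAGS_MASK = TAGS_SKETCH.values.reduce(:|)

# Stand-in for DataRedactor::UnknownTagError.
class UnknownTagErrorSketch < ArgumentError; end

# ORs together the bits of the given tags; accepts a single Symbol, an
# Array of Symbols, or nil (which yields 0).
def bits_for(tags)
  Array(tags).reduce(0) do |mask, tag|
    bit = TAGS_SKETCH.fetch(tag) { raise UnknownTagErrorSketch, "unknown tag: #{tag.inspect}" }
    mask | bit
  end
end

# Starts from only: (or every tag), then clears the except: bits, so
# except: wins whenever the two overlap.
def tag_mask(only: nil, except: nil)
  mask = only ? bits_for(only) : ALL_TAGS_MASK
  mask & ~bits_for(except)
end
```

With no keyword arguments the mask keeps every bit set, which mirrors the backward-compatible redact(text) call that runs every pattern.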