Generate anonymized summary of data objects in an environment

This function creates a summary of all objects (primarily data frames) in a specified environment or list, then anonymizes the results using the same pattern matching approach as anon(). It provides structural information about data frames including dimensions, variable details, and memory usage while protecting sensitive information through pattern-based redaction.

Usage

anon_data_summary(
  envir = globalenv(),
  selection = NULL,
  pattern_list = list(),
  default_replacement = getOption("anon.default_replacement", default = "[REDACTED]"),
  example_values_n = getOption("anon.example_values_n", default = 0),
  example_rows = getOption("anon.example_rows"),
  check_approximate = getOption("anon.check_approximate", default = FALSE),
  max_distance = 2,
  nlp_auto = getOption("anon.nlp_auto")
)

Arguments

envir

An environment or list containing the objects to summarize. When passed as a list, unnamed elements will automatically be given names (either derived from the function call or indexed as "x1", "x2", etc.). Default is globalenv().

selection

Optional character vector of object names to include in the summary.

pattern_list

A list of patterns to search for and replace. Can include:

Named elements where names are replacement values and values are one or more patterns to match
Unnamed elements where one or more patterns are replaced with default_replacement

This parameter is combined with the global option getOption("anon.pattern_list").
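The named/unnamed convention for pattern_list can be illustrated with a short base R sketch. The helper apply_patterns() below is hypothetical and is not part of the package; it also assumes patterns are treated as regular expressions, which this page does not state. It only shows how names map to replacement values and how unnamed elements fall back to default_replacement.

```r
# Hypothetical helper, NOT the package implementation: it only illustrates
# the pattern_list convention (names = replacements, values = patterns).
apply_patterns <- function(x, pattern_list, default_replacement = "[REDACTED]") {
  nms <- names(pattern_list)
  if (is.null(nms)) nms <- rep("", length(pattern_list))
  for (i in seq_along(pattern_list)) {
    # Named element: the name is the replacement; unnamed: use the default
    replacement <- if (nzchar(nms[i])) nms[i] else default_replacement
    for (pattern in pattern_list[[i]]) {
      # Assumption: patterns are regular expressions
      x <- gsub(pattern, replacement, x)
    }
  }
  x
}

pattern_list <- list(
  "[NAME]" = c("Alice", "Bob"),  # named: both patterns become "[NAME]"
  "[0-9]{3}-[0-9]{4}"            # unnamed: replaced with default_replacement
)

redacted <- apply_patterns("Alice called Bob at 555-1234", pattern_list)
print(redacted)
```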

default_replacement

Value used as the replacement when no specific replacement is provided. Default is getOption("anon.default_replacement", default = "[REDACTED]").

example_values_n

Optional number of example unique values to include for discrete/text-like data frame columns. Defaults to 0, which disables example values.

example_rows

Optional example-row specification for data frames. Use NULL to disable examples, a single number to request that many rows per data frame, or anon_example_rows() to build a spec with explicit arguments such as n, key, method, and n_key_values.

check_approximate

Logical indicating whether to check for approximate matches using string distance. Default is getOption("anon.check_approximate", default = FALSE).

max_distance

Maximum string distance for approximate matching when check_approximate is TRUE. Default is 2.
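To give a feel for what a string-distance threshold of 2 admits, the sketch below uses base R's utils::adist(), which computes generalized Levenshtein distance. Whether the package uses this particular metric is an assumption; this page only says "string distance".

```r
# Illustration only: utils::adist() is used here as a stand-in metric.
# The metric actually used by the package is not stated on this page.
words <- c("Jonathan", "Jonathon", "Johnathan", "Jennifer")
target <- "Jonathan"
max_distance <- 2

# Edit distance from each word to the target pattern
distances <- drop(utils::adist(words, target))

# Words that would count as approximate matches at this threshold
close_matches <- words[distances <= max_distance]
print(close_matches)
```

Small values of max_distance catch common misspellings while leaving clearly different strings untouched; larger values increase the risk of redacting unrelated text.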

nlp_auto

List of logical values with names corresponding to entity names. Can be generated with nlp_auto() and can be set as the anon.nlp_auto global option. This argument overrides the global option.

Value

An object of class "anon_data_summary" containing:

$summary: A tibble with overall statistics (total objects, data frames count, other objects count, total memory usage)
$data_frames: A list with two elements (only present if data frames exist):
- $structure: A tibble with structural information for each data frame (name, label, dimensions, memory size)
- $variables: A tibble with detailed variable information including data types, missing values, distinct values, labels, and optional example values
$examples: Optional data frame example payloads containing either sample rows per data frame or one or more keyed cross-source scenarios
All content is anonymized according to the specified patterns
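A minimal usage sketch of inspecting the returned object follows. It assumes the package is installed under the name "anon" (inferred from the option names on this page, not confirmed by it); the input data are made up for illustration.

```r
# Assumption: the package namespace is "anon". The block is skipped otherwise.
if (requireNamespace("anon", quietly = TRUE)) {
  # Made-up objects passed as a named list instead of an environment
  objects <- list(
    patients = data.frame(name = c("Alice", "Bob"), age = c(34, 51)),
    notes = c("free text", "more text")
  )

  result <- anon::anon_data_summary(
    envir = objects,
    pattern_list = list("[NAME]" = c("Alice", "Bob"))
  )

  print(result$summary)                 # overall statistics
  print(result$data_frames$structure)   # one row per data frame
  print(result$data_frames$variables)   # one row per variable
}
```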

Details

The function operates in three steps:

Generates detailed summaries for all objects
Creates structured output with summary statistics and detailed information about data frames
Applies anonymization using anon() with the provided patterns

For data frames, the function captures:

Structural information: dimensions, memory usage, and data frame-level labels
Variable details: data types, missing value counts, distinct value counts, variable labels, and optional example values
Optional example payloads: either sample rows or one or more keyed cross-source scenarios when configured
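The kind of structural and variable-level information listed above can be approximated in base R as follows. This is a sketch of the idea, not the package's code: the column names are hypothetical, and excluding NA from the distinct count is an assumption.

```r
# Made-up data frame for illustration
df <- data.frame(
  id = 1:4,
  group = c("a", "b", NA, "a"),
  stringsAsFactors = FALSE
)

# Structural information: dimensions and memory usage
structure_info <- list(
  rows = nrow(df),
  cols = ncol(df),
  size = format(utils::object.size(df), units = "auto")
)

# Variable details: type, missing count, distinct count (NA excluded here,
# which is an assumption about how distinct values are counted)
variables <- data.frame(
  variable   = names(df),
  type       = vapply(df, function(col) class(col)[1], character(1)),
  n_missing  = vapply(df, function(col) sum(is.na(col)), integer(1)),
  n_distinct = vapply(df, function(col) length(unique(col[!is.na(col)])), integer(1)),
  row.names  = NULL,
  stringsAsFactors = FALSE
)

print(structure_info)
print(variables)
```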

The output includes a custom print method that displays the information in a readable format while maintaining the anonymization.

Examples