These functions use natural language processing (NLP) to identify and anonymize different types of named entities in text. Each function combines NLP entity extraction with pattern expansion and anonymization.
Usage
anon_nlp_entities(x, entity_types = nlp_entity_sets$all, ...)
anon_nlp_proper_nouns(x, ...)
anon_nlp_dates(x, ...)
anon_nlp_dates_and_times(x, ...)
anon_nlp_named(x, ...)
anon_nlp_numbers(x, ...)
anon_nlp_organizations(x, ...)
anon_nlp_people(x, ...)
anon_nlp_places(x, ...)
Arguments
- x
The object to anonymize. Can be a character vector, factor, data frame, or list.
- entity_types
Character vector of entity types to extract. Entity types are: "PERSON", "NORP", "FAC", "ORG", "GPE", "LOC", "PRODUCT", "EVENT", "WORK_OF_ART", "LAW", "LANGUAGE", "DATE", "TIME", "PERCENT", "MONEY", "QUANTITY", "ORDINAL", "CARDINAL". See nlp_entity_sets for collections of entity types.
- ...
Additional arguments passed to anon().
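As a sketch of how entity_types can be narrowed, the snippet below starts from the full collection and drops the numeric types. It assumes the package providing these functions is attached and that nlp_entity_sets is a named list of character vectors; only nlp_entity_sets$all is confirmed by the usage above, so the other members are not relied on here.

```r
text <- "John Smith works at Microsoft in Seattle."

# List the predefined collections (assumes nlp_entity_sets is a named list)
names(nlp_entity_sets)

# Start from the full set and remove numeric entity types with base R setdiff()
types <- setdiff(nlp_entity_sets$all,
                 c("CARDINAL", "ORDINAL", "PERCENT", "QUANTITY", "MONEY"))

# Only the remaining entity types are extracted and anonymized
anon_nlp_entities(text, entity_types = types)
```

Listing type names directly, as in the last example below, is equivalent when only a few types are needed.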
Value
An anonymized object of class anon_context with named entities
replaced according to the anonymization workflow.
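Because x can also be a data frame, a minimal sketch of that case is shown below, assuming the package is attached. Which columns are scanned, and how non-character columns such as id are handled, is not specified on this page, so treat the column layout as illustrative only.

```r
# Hypothetical example data; column names are illustrative
notes <- data.frame(
  id = 1:2,
  note = c("John Smith works at Microsoft in Seattle.",
           "The deal was worth $1.2 million in 2023."),
  stringsAsFactors = FALSE
)

res <- anon_nlp_people(notes)

# Per the Value section, the result should carry the anon_context class
inherits(res, "anon_context")
```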
Details
These functions:
1. Use NLP to extract potentially sensitive information of the specified type.
2. Extend the extracted patterns using more_patterns for comprehensive matching.
3. Apply anon() to anonymize the identified patterns.
Available functions by entity type:
- anon_nlp_entities(): Anonymize any specified entity types
- anon_nlp_proper_nouns(): Anonymize proper nouns using part-of-speech (POS) tagging
- anon_nlp_dates(): Anonymize dates
- anon_nlp_dates_and_times(): Anonymize dates and times
- anon_nlp_people(): Anonymize person names
- anon_nlp_organizations(): Anonymize organization names
- anon_nlp_places(): Anonymize place names and locations
- anon_nlp_numbers(): Anonymize numeric entities
- anon_nlp_named(): Anonymize named entities
Global Options
The anon.nlp_default_replacements global option sets the default replacement
values used by these functions when no default_replacement argument is explicitly
provided. Use nlp_default_replacements() to generate the content for the option.
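A minimal sketch of setting the option for a session is shown below. It assumes the package is attached and that nlp_default_replacements() can be called with no arguments; its actual parameters are not documented on this page.

```r
# Store generated default replacements in the global option for this session
# (assumes nlp_default_replacements() works without arguments)
options(anon.nlp_default_replacements = nlp_default_replacements())

# Inspect what was stored
getOption("anon.nlp_default_replacements")

# An explicit default_replacement still takes precedence for a single call
anon_nlp_organizations("John Smith works at Microsoft in Seattle.",
                       default_replacement = "[COMPANY]")
```

Setting the option once keeps replacement labels consistent across calls without repeating default_replacement each time.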
Examples
text <- c("John Smith works at Microsoft in Seattle.",
"The deal was worth $1.2 million in 2023.",
"He was the first employee to make 100% of his 3rd quarter targets.")
# Anonymize all entities
anon_nlp_entities(text)
#> [1] "[PERSON] works at [ORG] in [GPE]."
#> [2] "The deal was worth [MONEY] in [DATE]."
#> [3] "He was the [ORDINAL] employee to make [PERCENT] of [DATE] targets."
# Anonymize person names
anon_nlp_people(text)
#> [1] "[PERSON] works at Microsoft in Seattle."
#> [2] "The deal was worth $1.2 million in 2023."
#> [3] "He was the first employee to make 100% of his 3rd quarter targets."
# Anonymize organizations with custom replacement
anon_nlp_organizations(text, default_replacement = "[COMPANY]")
#> [1] "John Smith works at [COMPANY] in Seattle."
#> [2] "The deal was worth $1.2 million in 2023."
#> [3] "He was the first employee to make 100% of his 3rd quarter targets."
# Anonymize specific entity types
anon_nlp_entities(text, entity_types = c("PERSON", "ORG"))
#> [1] "[PERSON] works at [ORG] in Seattle."
#> [2] "The deal was worth $1.2 million in 2023."
#> [3] "He was the first employee to make 100% of his 3rd quarter targets."
