Create expanded pattern vector for text matching

Accepts a character vector of patterns and returns an expanded vector containing:

Original items
Original items with regex special characters escaped
Patterns with each space between words replaced by the regex \s*.\s*, which matches any amount of whitespace around exactly one character of any kind
Individual words from each item, split on spaces and punctuation, excluding stop words
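The expansion steps above can be sketched in base R as follows. This is a minimal sketch, not the package's implementation. The helper names expand_patterns_sketch() and escape_metachars() are hypothetical. The short stop-word list, building the space-flexible patterns from the escaped text, and removing duplicates with unique() are all assumptions.

```r
# Hypothetical stop-word list (assumption: the real list is not documented here)
stop_words <- c("a", "an", "and", "of", "the")

# Prefix each regex metacharacter with a backslash so it matches literally
escape_metachars <- function(x) {
  gsub("([.\\\\|()\\[\\]{}^$*+?])", "\\\\\\1", x, perl = TRUE)
}

# Hypothetical re-implementation of the four expansion steps
expand_patterns_sketch <- function(patterns, original = TRUE, escape_regex = TRUE,
                                   spaces_to_flexible = TRUE, individual = TRUE) {
  escaped <- escape_metachars(patterns)
  out <- character(0)
  if (original) out <- c(out, patterns)
  if (escape_regex) out <- c(out, escaped)
  if (spaces_to_flexible) {
    # Each run of spaces becomes \s*.\s* (assumption: built from the escaped text)
    out <- c(out, gsub(" +", "\\\\s*.\\\\s*", escaped, perl = TRUE))
  }
  if (individual) {
    # Split on whitespace and punctuation, then drop empty strings and stop words
    words <- unlist(strsplit(patterns, "[[:space:][:punct:]]+"))
    out <- c(out, words[nzchar(words) & !(tolower(words) %in% stop_words)])
  }
  # Assumption: duplicates are dropped, keeping first occurrences in order
  unique(out)
}

people <- c("John Smith", "Mary Jane Watson")
expand_patterns_sketch(people)
```

On the people vector from the examples, this sketch returns the same nine patterns as the documented output, because escaping leaves those names unchanged and unique() removes the repeated copies.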

Usage

more_patterns(
  patterns,
  original = TRUE,
  escape_regex = TRUE,
  spaces_to_flexible = TRUE,
  individual = TRUE
)

Arguments

patterns: Character vector of patterns to expand
original: Logical. Whether to include original patterns (default: TRUE)
escape_regex: Logical. Whether to include the original patterns with regex special characters escaped (default: TRUE)
spaces_to_flexible: Logical. Whether to include space-flexible patterns (default: TRUE)
individual: Logical. Whether to include individual words (default: TRUE)
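To see which variants spaces_to_flexible covers, you can test the space-flexible pattern from the examples directly with base R's grepl(). The candidate strings are made up for illustration.

```r
# The regex printed as "John\\s*.\\s*Smith" in the examples below
flexible <- "John\\s*.\\s*Smith"

# Made-up strings for illustration
candidates <- c("John Smith", "John  Smith", "John_Smith", "John - Smith",
                "JohnSmith", "John A. Smith")
matches <- grepl(flexible, candidates)
matches
#> [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE
```

The pattern tolerates extra whitespace and a single separator character such as _ or -. It needs at least one character between the words and does not span a middle initial. The individual-word patterns "John" and "Smith" would still match those remaining cases.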

Value

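Character vector containing the selected pattern expansions, in the order listed above. In the examples below, escaping leaves the names unchanged, so no separate escaped entries appear.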

Details

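The expansions are ordered from most to least specific. When the patterns are applied in sequence, whole phrases are replaced first, so each name needs as few replacements as possible. The broader patterns that follow then catch spacing variants and leftover fragments, an additional effort to anonymize all sensitive information.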
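The ordering rationale can be illustrated with a hypothetical anonymize() helper, which is not part of this package. It applies the patterns one at a time with gsub(). The pattern vector is written by hand in the same order that more_patterns() uses, and the notes are made-up text.

```r
# Hypothetical helper: replace every match of each pattern, in order
anonymize <- function(text, patterns, replacement = "[REDACTED]") {
  for (p in patterns) {
    text <- gsub(p, replacement, text, perl = TRUE)
  }
  text
}

# Hand-written patterns, most specific first (mirrors the example ordering)
patterns <- c("John Smith", "John\\s*.\\s*Smith", "John", "Smith")
notes <- c("John Smith called.", "Spoke to John_Smith.", "Smith left a message.")

anonymize(notes, patterns)
#> [1] "[REDACTED] called."         "Spoke to [REDACTED]."
#> [3] "[REDACTED] left a message."

# Broadest patterns first: the full name is split into two replacements
anonymize(notes[1], rev(patterns))
#> [1] "[REDACTED] [REDACTED] called."
```

With the specific-to-broad order, the full name becomes a single placeholder. With the broadest patterns applied first, the same name is broken into separate fragments.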

Examples

people <- c("John Smith", "Mary Jane Watson")
more_patterns(people)
#> [1] "John Smith"                       "Mary Jane Watson"                
#> [3] "John\\s*.\\s*Smith"               "Mary\\s*.\\s*Jane\\s*.\\s*Watson"
#> [5] "John"                             "Smith"                           
#> [7] "Mary"                             "Jane"                            
#> [9] "Watson"                          
more_patterns(people, individual = FALSE)
#> [1] "John Smith"                       "Mary Jane Watson"                
#> [3] "John\\s*.\\s*Smith"               "Mary\\s*.\\s*Jane\\s*.\\s*Watson"