
Standardizes strings prior to performing a match, using the following transformations:

  1. standardize case (base::tolower)

  2. remove sequences of non-alphanumeric characters at start or end of string

  3. replace remaining sequences of non-alphanumeric characters with "_"

  4. remove diacritics (stringi::stri_trans_general)

  5. (optional) convert roman numerals (I, II, ..., XLIX) to arabic (1, 2, ..., 49)
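
The first four transformations can be approximated with the short sketch below. This is an illustration only, not the package's source: `sketch_string_std` is a hypothetical name, the `"Latin-ASCII"` transform ID is an assumption about how `stringi::stri_trans_general` might be called, and the optional roman numeral step is left out. Diacritics are removed before the punctuation steps here so that the ASCII-only patterns do not treat accented letters as non-alphanumeric.

```r
# Hypothetical re-implementation for illustration; not the package source.
# Requires the stringi package for diacritic removal.
sketch_string_std <- function(x) {
  # standardize case
  x <- tolower(x)
  # remove diacritics (assumed transform ID: "Latin-ASCII")
  x <- stringi::stri_trans_general(x, "Latin-ASCII")
  # remove runs of non-alphanumeric characters at the start or end
  x <- gsub("^[^a-z0-9]+|[^a-z0-9]+$", "", x)
  # replace the remaining runs of non-alphanumeric characters with "_"
  x <- gsub("[^a-z0-9]+", "_", x)
  x
}

sketch_string_std("United STATES")
#> [1] "united_states"
sketch_string_std("Mungindu-II (Sud)")
#> [1] "mungindu_ii_sud"
```

Because each run of punctuation or whitespace collapses to a single "_", inputs that differ only in spacing, case, or accents map to the same standardized string.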

Usage

string_std(x, convert_roman = FALSE)

Arguments

x

a string

convert_roman

logical indicating whether to convert roman numerals (I, II, ..., XLIX) to arabic (1, 2, ..., 49). Defaults to FALSE.

Value

The standardized version of x

Examples

string_std("United STATES")
#> [1] "united_states"
string_std("R\u00e9publique  d\u00e9mocratique du  Congo")
#> [1] "republique_democratique_du_congo"

# convert roman numerals to arabic
string_std("Mungindu-II (Sud)")
#> [1] "mungindu_ii_sud"
string_std("Mungindu-II (Sud)", convert_roman = TRUE)
#> [1] "mungindu_2_sud"

# note the conversion only works if the numeral is separated from other
# alphanumeric characters by punctuation or space characters
string_std("MunginduII", convert_roman = TRUE) # roman numeral not recognized
#> [1] "munginduii"