Create codes to identify each unique combination of hierarchical levels in a reference dataset
Source: R/hcodes.R
Create codes to identify each unique combination of hierarchical levels in a
reference dataset. Codes may be integer-based (function hcodes_int) or
string-based (function hcodes_str). Integer-based codes reflect the alphabetical
ranking of each level within the next-highest level. They are constant-width
and may optionally be prefixed with any given string. String-based codes are
created by pasting together the values of each hierarchical level with a
given separator (with options for string standardization prior to
collapsing).
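The two coding schemes described above can be sketched in base R. This is only an illustration of the idea, not the package's implementation: the helpers sketch_str and sketch_int and the toy data frame are hypothetical, no string standardization is applied, and the integer sketch assumes fewer than 10 distinct values under any one parent so that a single digit per level suffices. Using 0 for a missing lower level is an assumption based on the codes shown in the Examples output.

```r
# Hypothetical toy reference data (not the ne_ref dataset from the Examples)
toy_ref <- data.frame(
  adm0 = c("can", "usa", "can", "usa", "usa"),
  adm1 = c(NA, NA, "ontario", "new_york", "new_jersey"),
  stringsAsFactors = FALSE
)

# String-based sketch: paste the non-missing levels of each row with `sep`
sketch_str <- function(ref, by, sep = "__") {
  vapply(seq_len(nrow(ref)), function(i) {
    x <- unlist(ref[i, by], use.names = FALSE)
    paste(x[!is.na(x)], collapse = sep)
  }, character(1))
}

# Rank each value among the sorted unique values that share the same parent;
# missing values get 0 (an assumption, see the lead-in)
rank_within <- function(x, parent) {
  out <- integer(length(x))
  for (idx in split(seq_along(x), parent)) {
    out[idx] <- match(x[idx], sort(unique(x[idx])))
  }
  out[is.na(out)] <- 0L
  out
}

# Integer-based sketch: one digit per level, optionally prefixed
sketch_int <- function(ref, by, prefix = "") {
  digits <- matrix(0L, nrow = nrow(ref), ncol = length(by))
  parent <- rep("", nrow(ref))
  for (j in seq_along(by)) {
    x <- ref[[by[j]]]
    digits[, j] <- rank_within(x, parent)
    parent <- paste(parent, x, sep = "/")  # key identifying the path so far
  }
  paste0(prefix, apply(digits, 1, paste, collapse = ""))
}

str_codes <- sketch_str(toy_ref, by = c("adm0", "adm1"))
int_codes <- sketch_int(toy_ref, by = c("adm0", "adm1"))
```

In this sketch new_jersey sorts before new_york under usa, so the two usa rows at the second level receive the digits 1 and 2 regardless of their row order in the data.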
Usage
hcodes_str(ref, pattern, by, sep = "__", std_fn = string_std)
hcodes_int(ref, pattern, by, prefix = "")
Arguments
- ref: data.frame containing hierarchical columns with reference data
- pattern: regex pattern to match the names of the hierarchical columns in ref (supply either pattern or by)
- by: vector giving the names of the hierarchical columns in ref (supply either pattern or by)
- sep: (only for hcodes_str) desired separator between levels in string-based codes (defaults to "__")
- std_fn: (only for hcodes_str) function to standardize input strings prior to creating codes. Defaults to string_std. Set to NULL to omit standardization. See also string_standardization.
- prefix: (only for hcodes_int) character prefix for integer-based codes (defaults to "")
Examples
data(ne_ref)
# string-based codes
hcodes_str(ne_ref, pattern = "^adm")
#> [1] "can" "usa"
#> [3] "can__ontario" "usa__new_jersey"
#> [5] "usa__new_york" "usa__pennsylvania"
#> [7] "can__ontario__durham" "can__ontario__halton"
#> [9] "can__ontario__peel" "can__ontario__toronto"
#> [11] "can__ontario__york" "usa__new_jersey__bergen"
#> [13] "usa__new_jersey__essex" "usa__new_jersey__hudson"
#> [15] "usa__new_jersey__middlesex" "usa__new_jersey__monmouth"
#> [17] "usa__new_york__jefferson" "usa__new_york__bronx"
#> [19] "usa__new_york__kings" "usa__new_york__nassau"
#> [21] "usa__new_york__new_york" "usa__new_york__queens"
#> [23] "usa__new_york__suffolk" "usa__pennsylvania__allegheny"
#> [25] "usa__pennsylvania__bucks" "usa__pennsylvania__chester"
#> [27] "usa__pennsylvania__delaware" "usa__pennsylvania__jefferson"
#> [29] "usa__pennsylvania__lancaster" "usa__pennsylvania__philadelphia"
#> [31] "usa__pennsylvania__york"
# integer-based codes
hcodes_int(ne_ref, pattern = "^adm")
#> [1] "100" "200" "110" "210" "220" "230" "111" "112" "113" "114" "115" "211"
#> [13] "212" "213" "214" "215" "222" "221" "223" "224" "225" "226" "227" "231"
#> [25] "232" "233" "234" "235" "236" "237" "238"
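As a reading aid for the integer codes above, each digit position appears to correspond to one hierarchical level, with 0 marking a level that is absent from the row. That interpretation is inferred from the output shown here rather than stated by the documentation, and the sketch below assumes one digit per level and no prefix. The variable names are hypothetical.

```r
# Three codes copied from the output above:
# "100" = can, "210" = usa / new_jersey, "222" = usa / new_york / jefferson
codes <- c("100", "210", "222")

# Split each code into its digits (one row per code, one column per level)
digits <- do.call(rbind, strsplit(codes, ""))

# Depth of each entry = number of levels with a non-zero digit
depth <- as.integer(rowSums(digits != "0"))
```

Note that "222" (jefferson) precedes "221" (bronx) in the output because the codes follow alphabetical rank within the parent level, not row order in ne_ref.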