
Create codes to identify each unique combination of hierarchical levels in a reference dataset. Codes may be integer-based (function hcodes_int) or string-based (hcodes_str). Integer-based codes reflect the alphabetical ranking of each level within the next-highest level. They are constant-width and may optionally be prefixed with any given string. String-based codes are created by pasting together the values of each hierarchical level with a given separator (with options for string standardization prior to collapsing).

Usage

hcodes_str(ref, pattern, by, sep = "__", std_fn = string_std)

hcodes_int(ref, pattern, by, prefix = "")

Arguments

ref

data.frame containing hierarchical columns with reference data

pattern

regex pattern to match the names of the hierarchical columns in ref (supply either pattern or by)

by

vector giving the names of the hierarchical columns in ref (supply either pattern or by)

sep

(only for hcodes_str) desired separator between levels in string-based codes (defaults to "__")

std_fn

(only for hcodes_str) Function to standardize input strings prior to creating codes. Defaults to string_std. Set to NULL to omit standardization. See also string_standardization.

prefix

(only for hcodes_int) character prefix for integer-based codes (defaults to "")

Value

A character vector of codes

Examples

data(ne_ref)

# string-based codes
hcodes_str(ne_ref, pattern = "^adm")
#>  [1] "can"                             "usa"                            
#>  [3] "can__ontario"                    "usa__new_jersey"                
#>  [5] "usa__new_york"                   "usa__pennsylvania"              
#>  [7] "can__ontario__durham"            "can__ontario__halton"           
#>  [9] "can__ontario__peel"              "can__ontario__toronto"          
#> [11] "can__ontario__york"              "usa__new_jersey__bergen"        
#> [13] "usa__new_jersey__essex"          "usa__new_jersey__hudson"        
#> [15] "usa__new_jersey__middlesex"      "usa__new_jersey__monmouth"      
#> [17] "usa__new_york__jefferson"        "usa__new_york__bronx"           
#> [19] "usa__new_york__kings"            "usa__new_york__nassau"          
#> [21] "usa__new_york__new_york"         "usa__new_york__queens"          
#> [23] "usa__new_york__suffolk"          "usa__pennsylvania__allegheny"   
#> [25] "usa__pennsylvania__bucks"        "usa__pennsylvania__chester"     
#> [27] "usa__pennsylvania__delaware"     "usa__pennsylvania__jefferson"   
#> [29] "usa__pennsylvania__lancaster"    "usa__pennsylvania__philadelphia"
#> [31] "usa__pennsylvania__york"        
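
The string-based output above can be approximated in base R to make the mechanism concrete. This is a minimal sketch under stated assumptions, not the package's implementation: `toy_std` is a hypothetical stand-in for `string_std`, `toy_codes_str` is a hypothetical helper, and the small `ref` data frame is invented for illustration (it is not `ne_ref`).

```r
# Hypothetical toy reference data (not the package's ne_ref);
# NA marks levels that do not apply to a row
ref <- data.frame(
  adm0 = c("CAN", "CAN", "CAN", "USA"),
  adm1 = c(NA, "Ontario", "Ontario", "New York"),
  adm2 = c(NA, NA, "Peel", NA),
  stringsAsFactors = FALSE
)

# Assumed stand-in for a standardization function (not string_std itself):
# lowercase, then replace runs of non-alphanumeric characters with "_"
toy_std <- function(x) gsub("[^a-z0-9]+", "_", tolower(x))

# Paste the non-missing levels of each row together with a separator
toy_codes_str <- function(ref, by, sep = "__", std_fn = toy_std) {
  vapply(seq_len(nrow(ref)), function(i) {
    vals <- unlist(ref[i, by], use.names = FALSE)
    vals <- vals[!is.na(vals)]
    if (!is.null(std_fn)) vals <- std_fn(vals)
    paste(vals, collapse = sep)
  }, character(1))
}

codes_str <- toy_codes_str(ref, by = c("adm0", "adm1", "adm2"))
print(codes_str)
# "can" "can__ontario" "can__ontario__peel" "usa__new_york"
```

Passing `std_fn = NULL` to the sketch skips standardization, mirroring the documented behaviour of the `std_fn` argument.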

# integer-based codes
hcodes_int(ne_ref, pattern = "^adm")
#>  [1] "100" "200" "110" "210" "220" "230" "111" "112" "113" "114" "115" "211"
#> [13] "212" "213" "214" "215" "222" "221" "223" "224" "225" "226" "227" "231"
#> [25] "232" "233" "234" "235" "236" "237" "238"
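
In the integer-based output above, each digit appears to be the alphabetical rank of a level among its siblings, with 0 filling levels that are missing for a row (for example, "222" for Jefferson and "221" for Bronx within New York). The following base R sketch reproduces that pattern on invented data. It is an illustration under those assumptions rather than the package's actual algorithm; `toy_codes_int` and the `ref` data frame are hypothetical.

```r
# Hypothetical toy reference data (not the package's ne_ref)
ref <- data.frame(
  adm0 = c("CAN", "USA", "USA", "USA", "USA"),
  adm1 = c(NA, NA, "New York", "New York", "New Jersey"),
  adm2 = c(NA, NA, "Jefferson", "Bronx", NA),
  stringsAsFactors = FALSE
)

toy_codes_int <- function(ref, by, prefix = "") {
  parent <- rep("", nrow(ref))    # key identifying each row's parent path
  code <- rep(prefix, nrow(ref))  # codes are built up one level at a time
  for (col in by) {
    vals <- ref[[col]]
    # Alphabetical rank of each value among the distinct values that share
    # the same parent; missing levels get rank 0
    rnk <- ave(seq_along(vals), parent, FUN = function(idx) {
      match(vals[idx], sort(unique(vals[idx])))
    })
    rnk[is.na(rnk)] <- 0L
    # Zero-pad so every code at this level has the same width
    width <- nchar(max(rnk))
    code <- paste0(code, formatC(rnk, width = width, flag = "0"))
    parent <- paste(parent, vals, sep = "\r")
  }
  code
}

codes_int <- toy_codes_int(ref, by = c("adm0", "adm1", "adm2"))
print(codes_int)
# "100" "200" "222" "221" "210"
```

Because ranks are computed within each parent, the same name under two different parents can receive different digits, and a row's position in `ref` does not affect its code.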