Match a data.frame with raw, potentially messy hierarchical data (e.g. province, county, township) against a reference dataset, using a dictionary of manually-specified matches.
Usage
hmatch_manual(
  raw,
  ref,
  man,
  pattern,
  pattern_ref = pattern,
  by,
  by_ref = by,
  code_col,
  type = "left",
  ref_prefix = "ref_",
  std_fn = string_std,
  ...
)
Arguments
- raw
data frame containing hierarchical columns with raw data
- ref
data frame containing hierarchical columns with reference data
- man
data.frame of manually-specified matches, relating a given set of hierarchical values to the code within ref to which those values correspond
- pattern
regex pattern to match the hierarchical columns in raw and man (see also specifying_columns)
- pattern_ref
regex pattern to match the hierarchical columns in ref. Defaults to pattern, so only need to specify if the hierarchical columns have different names in raw and ref.
- by
vector giving the names of the hierarchical columns in raw and man
- by_ref
vector giving the names of the hierarchical columns in ref. Defaults to by, so only need to specify if the hierarchical columns have different names in raw and ref.
- code_col
name of the code column containing codes for matching ref and man
- type
type of join ("left", "inner", or "anti"). Defaults to "left". See join_types. Note that this function does not allow 'resolve joins', unlike most other hmatch_ functions.
- ref_prefix
prefix to add to names of returned columns from ref if they are otherwise identical to names within raw. Defaults to "ref_".
- std_fn
function to standardize strings during matching. Defaults to string_std. Set to NULL to omit standardization. See also string_standardization.
- ...
additional arguments passed to std_fn()
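To illustrate what a function supplied as std_fn might look like, here is a minimal base R sketch. The function my_std is hypothetical and written for illustration only; it is not the package's string_std and may differ from it in detail. It assumes only that std_fn receives a character vector and returns a standardized character vector.

```r
# Hypothetical standardizer (illustration only, not the package's string_std)
my_std <- function(x) {
  x <- tolower(x)                   # make comparison case-insensitive
  x <- gsub("[^a-z0-9]+", "_", x)   # collapse runs of non-alphanumerics to "_"
  gsub("^_+|_+$", "", x)            # trim leading/trailing underscores
}

# Two spellings of the same place reduce to the same standardized string
std_vals <- my_std(c("Bergen, N.J.", " bergen n j "))
print(std_vals)
```

A function like this could then be supplied as std_fn = my_std, so that values in raw and man which differ only in case or punctuation are treated as equal during matching.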
Value
a data frame obtained by matching the hierarchical columns in raw
and ref based on sets of matches specified in man, using the join type
specified by argument type (see join_types for more details)
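The logic described above can be sketched conceptually in base R. This is a simplified illustration under stated assumptions, not the package's implementation: the toy data frames, the helper key(), and the use of toupper() as a stand-in for string standardization are all hypothetical, and the column order and prefixing rules of the real function may differ.

```r
# Toy data (hypothetical, not the package's ne_raw / ne_ref)
raw <- data.frame(
  id   = c("P1", "P2"),
  adm1 = c(NA, "New York"),
  adm2 = c("Bergen, N.J.", "Kings"),
  stringsAsFactors = FALSE
)
ref <- data.frame(
  adm1  = c("New Jersey", "New York"),
  adm2  = c("Bergen", "Kings"),
  hcode = c("211", "122"),
  stringsAsFactors = FALSE
)
# Manual dictionary: one set of raw hierarchical values mapped to a ref code
man <- data.frame(
  adm1  = NA_character_,
  adm2  = "Bergen, N.J.",
  hcode = "211",
  stringsAsFactors = FALSE
)

# Hypothetical helper: build a crudely standardized key from the hierarchy
key <- function(df) paste(toupper(df$adm1), toupper(df$adm2), sep = "|")

# Step 1: look up each raw row in the manual dictionary to obtain a code
raw$hcode <- man$hcode[match(key(raw), key(man))]

# Step 2: prefix the hierarchical columns of ref to avoid name clashes with raw
ref_out <- ref
is_code <- names(ref_out) == "hcode"
names(ref_out)[!is_code] <- paste0("ref_", names(ref_out)[!is_code])

# Step 3: join on the code; the three results mirror the three join types
matched_inner <- merge(raw[!is.na(raw$hcode), ], ref_out, by = "hcode")
matched_left  <- merge(raw, ref_out, by = "hcode", all.x = TRUE)
unmatched     <- raw[is.na(raw$hcode), ]   # analogous to an anti join
```

Note that only rows of raw with an entry in man receive a code in this sketch. Row P2 would match ref directly on its hierarchical values, but it stays unmatched here because no manual match was specified for it.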
Examples
data(ne_raw)
data(ne_ref)
# create df mapping sets of raw hierarchical values to codes within ref
ne_man <- data.frame(
  adm0 = NA_character_,
  adm1 = NA_character_,
  adm2 = "Bergen, N.J.",
  hcode = "211",
  stringsAsFactors = FALSE
)
# find manual matches
hmatch_manual(ne_raw, ne_ref, ne_man, code_col = "hcode", type = "inner")
#>      id adm0 adm1         adm2 level ref_adm0   ref_adm1 ref_adm2 hcode
#> 1 PID10 <NA> <NA> Bergen, N.J.  adm2      USA New Jersey   Bergen   211