Specifying hierarchical columns with arguments pattern or by
Source: R/doc_specifying_columns.R
Within the hmatch_ group of functions, there are three ways to specify the hierarchical columns to be matched.
In all cases, it is assumed that matched columns are already correctly ordered, with the first matched column reflecting the broadest hierarchical level (lowest-resolution, e.g. country) and the last column reflecting the finest level (highest-resolution, e.g. township).
(1) All column names common to raw and ref
If neither pattern nor by is specified (the default), the hierarchical columns are assumed to be all column names common to both raw and ref.
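To illustrate the default case, the base R sketch below computes the column names shared by two small data frames. The data frames and their values are hypothetical, and intersect() is used only to show which columns would count as "common to raw and ref". It is not meant to reproduce how the hmatch_ functions work internally.

```r
# Hypothetical example data; names and values are illustrative only
raw <- data.frame(
  country = c("C1", "C1"),
  province = c("P1", "P2"),
  stringsAsFactors = FALSE
)
ref <- data.frame(
  country = c("C1", "C1", "C1"),
  province = c("P1", "P2", "P3"),
  code = c("X1", "X2", "X3"),
  stringsAsFactors = FALSE
)

# Names present in both data frames; intersect() keeps the order in which
# they appear in raw, and "code" is excluded because raw lacks it
hier_cols <- intersect(names(raw), names(ref))
print(hier_cols)
```

Because the default relies purely on shared names, any non-hierarchical column that happens to appear in both data frames would also be picked up.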
(2) Regex pattern
Arguments pattern and pattern_ref take regex patterns to match the hierarchical columns in raw and ref, respectively. Argument pattern_ref only needs to be specified if it differs from pattern (i.e. if the hierarchical columns have different names in raw vs. ref).
For example, if the hierarchical columns in raw are "ADM_1", "ADM_2", and "ADM_3", which correspond respectively to columns within ref named "REF_ADM_1", "REF_ADM_2", and "REF_ADM_3", then the pattern arguments can be specified as:
pattern = "^ADM_[[:digit:]]"
pattern_ref = "^REF_ADM_[[:digit:]]"
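One way to check which columns a pair of patterns selects is to apply them to the column names with base R's grep(value = TRUE), as in the sketch below. The extra columns "population" and "REF_ID" are hypothetical non-hierarchical columns added to show that they are not matched.

```r
# Column names from the example above, plus hypothetical extra columns
raw_names <- c("ADM_1", "ADM_2", "ADM_3", "population")
ref_names <- c("REF_ADM_1", "REF_ADM_2", "REF_ADM_3", "REF_ID")

pattern <- "^ADM_[[:digit:]]"
pattern_ref <- "^REF_ADM_[[:digit:]]"

# grep(value = TRUE) returns the names that match each regex
hier_raw <- grep(pattern, raw_names, value = TRUE)
hier_ref <- grep(pattern_ref, ref_names, value = TRUE)
print(hier_raw)
print(hier_ref)
```

The leading ^ anchors each pattern to the start of the name, so "^ADM_[[:digit:]]" does not match "REF_ADM_1".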
Alternatively, because pattern_ref defaults to the same value as pattern (unless otherwise specified), one could specify a single regex pattern that matches the hierarchical columns in both raw and ref, e.g.
pattern = "ADM_[[:digit:]]"
However, the user should take care to ensure that no non-hierarchical columns within raw or ref are inadvertently matched by the given pattern.
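The sketch below shows this pitfall with base R's grep(). The column "ADM_1_code" is a hypothetical non-hierarchical column: the single unanchored pattern matches it along with the true hierarchical columns. Adding a trailing $ (so that the digit must end the name) is one possible fix under this naming scheme. The right pattern depends on the actual column names.

```r
# Hypothetical column names; "ADM_1_code" is not a hierarchical level
raw_names <- c("ADM_1", "ADM_2", "ADM_3", "ADM_1_code")
ref_names <- c("REF_ADM_1", "REF_ADM_2", "REF_ADM_3")

# Single unanchored pattern, applied to both raw and ref
pattern <- "ADM_[[:digit:]]"
matched_ref <- grep(pattern, ref_names, value = TRUE)  # all three REF_ADM_ columns
matched_raw <- grep(pattern, raw_names, value = TRUE)  # also picks up "ADM_1_code"

# Stricter variant: the digit must be the last character of the name
strict <- "ADM_[[:digit:]]$"
matched_strict <- grep(strict, raw_names, value = TRUE)

print(matched_raw)
print(matched_strict)
```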
(3) Vector of column names
If the hierarchical columns cannot easily be matched with a regex pattern, one can specify the relevant column names in vector form using arguments by and by_ref. As with pattern_ref, argument by_ref only needs to be specified if it differs from by (i.e. if the hierarchical columns have different names in raw vs. ref).
For example, if the hierarchical columns in raw are "state", "county", and "township", which correspond respectively to columns within ref named "admin1", "admin2", and "admin3", then the by arguments can be specified with:
by = c("state", "county", "township")
by_ref = c("admin1", "admin2", "admin3")
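Because by and by_ref correspond element by element, it can help to check them before calling a matching function. The base R sketch below uses hypothetical data frames with placeholder values. The stopifnot() checks are this example's own sanity checks, not a description of what the hmatch_ functions validate. It confirms that both vectors have the same length and that every name exists, then lays out the positional pairing.

```r
# Hypothetical data frames with differently named hierarchical columns
raw <- data.frame(
  state = "S1", county = "C1", township = "T1",
  stringsAsFactors = FALSE
)
ref <- data.frame(
  admin1 = "S1", admin2 = "C1", admin3 = "T1",
  stringsAsFactors = FALSE
)

by <- c("state", "county", "township")
by_ref <- c("admin1", "admin2", "admin3")

# Illustrative sanity checks: same number of levels, and every name exists
stopifnot(
  length(by) == length(by_ref),
  all(by %in% names(raw)),
  all(by_ref %in% names(ref))
)

# Columns pair up by position: by[i] in raw corresponds to by_ref[i] in ref
pairs <- data.frame(raw = by, ref = by_ref, stringsAsFactors = FALSE)
print(pairs)
```

Both vectors should be listed from the broadest level to the finest, matching the ordering assumption stated at the top of this page.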