Specifying hierarchical columns with arguments pattern or by
Source: R/doc_specifying_columns.R
Within the hmatch_ group of functions, there are three ways to specify the
hierarchical columns to be matched.
In all cases, it is assumed that matched columns are already correctly ordered, with the first matched column reflecting the broadest hierarchical level (lowest-resolution, e.g. country) and the last column reflecting the finest level (highest-resolution, e.g. township).
(1) All column names common to raw and ref
If neither pattern nor by is specified (the default), then the
hierarchical columns are assumed to be all column names that are common to
both raw and ref.
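As a rough illustration of what "common to both" means, the base-R sketch below finds the shared column names of two small data frames. The data frames and their column names are hypothetical, and the sketch is not taken from the package's internals; it only shows which columns the default would be expected to pick up.

```r
# Hypothetical example data; the column names are illustrative only
raw <- data.frame(
  id   = 1:2,
  adm1 = c("Ontario", "Quebec"),
  adm2 = c("Toronto", "Montreal")
)
ref <- data.frame(
  adm1  = c("Ontario", "Quebec"),
  adm2  = c("Toronto", "Montreal"),
  pcode = c("on_tor", "qc_mon")
)

# Column names present in both data frames, in the order they appear in raw
common_cols <- intersect(names(raw), names(ref))
print(common_cols)
```

Here only adm1 and adm2 are shared, so id and pcode would not be treated as hierarchical columns. Checking the shared names this way before matching is a quick guard against an unrelated column that happens to exist in both data frames.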
(2) Regex pattern
Arguments pattern and pattern_ref take regex patterns to match the
hierarchical columns in raw and ref, respectively. Argument pattern_ref
only needs to be specified if it's different from pattern (i.e. if the
hierarchical columns have different names in raw vs. ref).
For example, if the hierarchical columns in raw are "ADM_1", "ADM_2", and
"ADM_3", which correspond respectively to columns within ref named
"REF_ADM_1", "REF_ADM_2", and "REF_ADM_3", then the pattern arguments can be
specified as:
pattern = "^ADM_[[:digit:]]"
pattern_ref = "^REF_ADM_[[:digit:]]"
Alternatively, because pattern_ref defaults to the same value as
pattern (unless otherwise specified), one could specify a single regex pattern
that matches the hierarchical columns in both raw and ref, e.g.
pattern = "ADM_[[:digit:]]"
However, the user should exercise care to ensure that there are no
non-hierarchical columns within raw or ref that may inadvertently be
matched by the given pattern.
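One way to check a pattern before using it is to test it against the column names with base R's grep(). The sketch below uses hypothetical column names, including an invented non-hierarchical column "ADM_1_population", to show how an under-specified pattern can pick up an unintended column and how a trailing anchor avoids it. It illustrates the regex behaviour only, not the package's own selection code.

```r
# Hypothetical column names; "ADM_1_population" is a non-hierarchical column
raw_cols <- c("id", "ADM_1", "ADM_2", "ADM_3", "ADM_1_population")
ref_cols <- c("REF_ADM_1", "REF_ADM_2", "REF_ADM_3", "pcode")

# Without a trailing anchor, the pattern also matches "ADM_1_population"
loose_raw <- grep("^ADM_[[:digit:]]", raw_cols, value = TRUE)

# Anchoring the end of the name restricts the match to the intended columns
strict_raw <- grep("^ADM_[[:digit:]]$", raw_cols, value = TRUE)

# A single unanchored pattern can match the hierarchical columns in ref too
shared_ref <- grep("ADM_[[:digit:]]", ref_cols, value = TRUE)

print(loose_raw)
print(strict_raw)
print(shared_ref)
```

Running the pattern against names(raw) and names(ref) in this way makes any inadvertent matches visible before the matching function is called.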
(3) Vector of column names
If the hierarchical columns cannot easily be matched with a regex pattern,
one can specify the relevant column names in vector form using arguments by
and by_ref. As with pattern_ref, argument by_ref only needs to be
specified if it's different from by (i.e. if the hierarchical columns have
different names in raw vs. ref).
For example, if the hierarchical columns in raw are "state", "county", and
"township", which correspond respectively to columns within ref named
"admin1", "admin2", and "admin3", then the by arguments can be specified
with:
by = c("state", "county", "township")
by_ref = c("admin1", "admin2", "admin3")
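Because by and by_ref correspond position by position, it can help to verify the pairing before matching. The following base-R sketch, using hypothetical data frames, checks that both vectors have the same length and that every named column exists, then builds a raw-to-ref name lookup. These checks are an assumption about what is worth validating, not a description of what the package does internally.

```r
# Hypothetical example data; values and the "cases" column are illustrative
raw <- data.frame(state = "NY", county = "Suffolk", township = "Islip", cases = 3)
ref <- data.frame(admin1 = "NY", admin2 = "Suffolk", admin3 = "Islip")

# Ordered from broadest to finest level, in the same order in both vectors
by     <- c("state", "county", "township")
by_ref <- c("admin1", "admin2", "admin3")

# Sanity checks: same number of levels, and every column actually exists
stopifnot(
  length(by) == length(by_ref),
  all(by %in% names(raw)),
  all(by_ref %in% names(ref))
)

# Named vector mapping each raw column to its counterpart in ref
col_map <- setNames(by_ref, by)
print(col_map)
```

In this example the lookup pairs state with admin1, county with admin2, and township with admin3, which matches the broadest-to-finest ordering the functions assume.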