Check that a data dictionary complies with the OCA data sharing standard
Source: R/valid_dict.R
valid_dict.Rd
Includes the following checks:
- contains required columns (variable_name, short_label, type, choices, origin, status)
- required columns complete (no missing values)
- no duplicated values in column variable_name
- no non-valid values in columns type, origin, status, indirect_identifier
- for coded-list type variables:
  - no missing choices
  - no incorrectly formatted choices (expected format is "value1, Label 1 | value2, Label 2 | ...")
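To make a few of these checks concrete, here is a minimal base-R sketch of what they could look like when applied by hand. It is an illustration only and not the implementation used by valid_dict(). The helper choices_well_formed() and the toy data frame are hypothetical, and the regular expression is just one possible reading of the "value, Label | value, Label" format.

```r
# Hypothetical sketch, not part of datadict: hand-rolled versions of three checks.
required_cols <- c("variable_name", "short_label", "type",
                   "choices", "origin", "status")

# TRUE if every "|"-separated entry of a choices string looks like "value, Label"
# (assumed reading of the expected format; NA or empty strings give FALSE)
choices_well_formed <- function(x) {
  entries <- trimws(strsplit(x, "|", fixed = TRUE)[[1]])
  length(entries) > 0 && all(grepl("^[^,]+,\\s*\\S.*$", entries, perl = TRUE))
}

# Toy dictionary fragment (made-up values) with deliberate problems
toy <- data.frame(
  variable_name = c("sex", "outcome", "sex"),
  choices = c("m, Male | f, Female", "recovered; died", "m, Male | f, Female"),
  stringsAsFactors = FALSE
)

# Required columns that are absent from the toy dictionary
missing_cols <- setdiff(required_cols, names(toy))

# Values of variable_name that occur more than once
dup_names <- unique(toy$variable_name[duplicated(toy$variable_name)])

# Rows whose choices string does not follow the assumed format
bad_choices <- !vapply(toy$choices, choices_well_formed, logical(1),
                       USE.NAMES = FALSE)
```

In this toy case, missing_cols lists the four absent required columns, dup_names flags "sex", and bad_choices marks only the second row, whose entries contain no comma between value and label.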
Examples
# read example dataset
path_data <- system.file("extdata", package = "datadict")
dat <- readxl::read_xlsx(file.path(path_data, "linelist_cleaned.xlsx"))
# generate data dictionary template from dataset
dict <- dict_from_data(dat, factor_values = "string")
# dictionary column 'indirect_identifier' must be manually specified (yes/no)
dict$indirect_identifier <- "no"
# check for validity
valid_dict(dict)
#> [1] TRUE