Check that a data dictionary complies with the OCA data sharing standard
Source: R/valid_dict.R (valid_dict.Rd)
Includes the following checks:
- contains required columns (variable_name, short_label, type, choices, origin, status)
- required columns complete (no missing values)
- no duplicated values in column variable_name
- no non-valid values in columns type, origin, status, indirect_identifier
- for coded-list type variables:
  - no missing choices
  - no incorrectly formatted choices (expected format is "value1, Label 1 | value2, Label 2 | ...")
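To make the expected choices format concrete, the following base-R sketch tests whether a choices string follows the "value1, Label 1 | value2, Label 2 | ..." pattern. The helper name choices_well_formed() and its regular expression are assumptions made for illustration; this is not the check that valid_dict() runs internally, which may be stricter or more lenient.

```r
# Hypothetical helper (not part of datadict): returns TRUE for each choices
# string that looks like "value1, Label 1 | value2, Label 2 | ..."
choices_well_formed <- function(choices) {
  vapply(choices, function(x) {
    # a missing or empty choices string cannot be well formed
    if (is.na(x) || !nzchar(trimws(x))) return(FALSE)
    # split into "value, Label" entries on the pipe separator
    entries <- trimws(strsplit(x, "|", fixed = TRUE)[[1]])
    # each entry needs a non-empty value, a comma, and a non-empty label
    all(grepl("^[^,]+,\\s*\\S.*$", entries))
  }, logical(1), USE.NAMES = FALSE)
}

choices_well_formed(c(
  "1, Yes | 2, No",  # well formed
  "1 Yes | 2, No",   # first entry lacks the comma
  NA                 # missing choices
))
#> [1]  TRUE FALSE FALSE
```

Under these assumptions, a label may itself contain commas, because only the first comma in each entry is treated as the separator between value and label.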
Arguments
- dict
A data frame reflecting a data dictionary to validate
- verbose
Logical indicating whether to issue a warning describing the checks that have failed. Defaults to TRUE.
Examples
# read example dataset
path_data <- system.file("extdata", package = "datadict")
dat <- readxl::read_xlsx(file.path(path_data, "linelist_cleaned.xlsx"))
# generate data dictionary template from dataset
dict <- dict_from_data(dat, factor_values = "string")
# dictionary column 'indirect_identifier' must be manually specified (yes/no)
dict$indirect_identifier <- "no"
# check for validity
valid_dict(dict)
#> [1] TRUE