Count the number of observations across unique combinations of indirect identifiers within a dataset
Source: R/k_anonymity_counts.R
Given a dataset and a set of one or more variables that may be indirect identifiers, the function returns a table of counts of the number of observations corresponding to each unique combination of those variables (i.e., k), optionally filtered to those combinations that do not meet the user-specified threshold of k-anonymity.
Arguments
- x
A data frame.
- vars
A character vector containing the name(s) of the variable(s) in x to be included in the k-anonymity calculation.
- threshold
Optional integer threshold indicating the minimum acceptable value of k. Combinations with values of k below the threshold are flagged and returned. A result with 0 rows indicates that no combinations have values of k below the threshold.
Value
A tibble-style data frame containing the count (column k) of each unique combination of the variables specified in argument vars. If argument threshold is specified, only the combinations with counts lower than the threshold are returned, if any (i.e., combinations that do not meet the specified level of k-anonymity).
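To make the counting rule concrete, here is a minimal base-R sketch of the same idea. It is not the datadict implementation of k_anonymity_counts(). The helper count_k() and the toy data are hypothetical, and the sketch assumes that missing values should form their own category, which matches the NA rows in the package's example output.

```r
# Hypothetical helper count_k(): a base-R sketch of k-anonymity counts,
# NOT the datadict implementation of k_anonymity_counts().
count_k <- function(x, vars, threshold = NULL) {
  # Cross-tabulate the chosen variables; useNA = "ifany" keeps NA as a category
  tab <- table(x[vars], useNA = "ifany")
  # One row per combination, with its count stored in column k
  counts <- as.data.frame(tab, responseName = "k", stringsAsFactors = FALSE)
  # table() also lists combinations that never occur, so drop zero counts
  counts <- counts[counts$k > 0, , drop = FALSE]
  # If a threshold is given, keep only combinations with k below it
  if (!is.null(threshold)) {
    counts <- counts[counts$k < threshold, , drop = FALSE]
  }
  rownames(counts) <- NULL
  counts
}

# Hypothetical toy data (not shipped with the package)
toy <- data.frame(
  gender  = c("f", "f", "m", "m", "m", NA),
  age_cat = c("0-4", "0-4", "0-4", "5-9", "5-9", "70+")
)

all_counts <- count_k(toy, vars = c("gender", "age_cat"))
flagged    <- count_k(toy, vars = c("gender", "age_cat"), threshold = 2)
print(flagged)
```

Here the four observed combinations have counts of 2, 1, 2 and 1, so with threshold = 2 only the two combinations seen once (m / 0-4 and NA / 70+) are returned.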
Examples
# read example dataset
path_data <- system.file("extdata", package = "datadict")
dat <- readxl::read_xlsx(file.path(path_data, "linelist_cleaned.xlsx"))
# display combinations of gender and age_cat with k < 5
k_anonymity_counts(dat, vars = c("gender", "age_cat"), threshold = 5)
#> # A tibble: 3 × 3
#> gender age_cat k
#> <chr> <chr> <int>
#> 1 NA 70+ 1
#> 2 f 50-69 2
#> 3 NA 50-69 2