Check the k-anonymity of one or more variables in a dataset

Given a dataset and a set of one or more variables of interest (e.g. variables that are potential indirect identifiers), the function returns the minimum observed value of k in the dataset, i.e. the number of observations that share the least common combination of values across those variables.

Usage

k_anonymity(x, vars)

Arguments

x: A data frame
vars: A character vector containing the name(s) of the variable(s) in x to be included in the k-anonymity calculation

Value

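The minimum observed value of k in the dataset, i.e. the number of observations in the smallest group defined by a unique combination of values of vars. A value of 1 means that at least one observation is uniquely identified by those variables.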
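
To make the calculation concrete, the following is a minimal base R sketch of how a minimum k could be computed: count the rows in each unique combination of the chosen variables and take the smallest count. The helper name min_k_sketch and the toy data frame are hypothetical and only for illustration; this is not necessarily how k_anonymity() is implemented, and its handling of missing values in particular is an assumption.

```r
# Hypothetical helper for illustration only (not the package's implementation):
# count rows per unique combination of `vars` and return the smallest count.
min_k_sketch <- function(x, vars) {
  stopifnot(is.data.frame(x), all(vars %in% names(x)))
  # Build one key per row from the values of `vars`.
  # paste() turns NA into the string "NA", so missing values form their own
  # group here; how k_anonymity() treats NA is not stated in the source.
  key <- do.call(paste, c(x[vars], sep = "\r"))
  # table() counts rows per key; the minimum count is the observed k.
  min(table(key))
}

# Toy data (made up): group sizes are (f, 0-4) = 2, (m, 0-4) = 1, (m, 5-9) = 2
toy <- data.frame(
  gender  = c("f", "f", "m", "m", "m"),
  age_cat = c("0-4", "0-4", "0-4", "5-9", "5-9")
)

min_k_sketch(toy, c("gender", "age_cat"))
#> [1] 1
min_k_sketch(toy, "gender")
#> [1] 2
```

In the toy data, adding age_cat to gender lowers k from 2 to 1, which illustrates why k should be checked on the combination of indirect identifiers rather than on each variable separately.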

Examples

# read example dataset
path_data <- system.file("extdata", package = "datadict")
dat <- readxl::read_xlsx(file.path(path_data, "linelist_cleaned.xlsx"))

# find minimum observed k for potential indirect identifiers gender and age_cat
k_anonymity(dat, vars = c("gender", "age_cat"))
#> [1] 1