Fetch records from multiple REDCap forms, returning separate list elements for each form
Source: R/fetch_database.R (fetch_database.Rd)
Wrapper to fetch_records that's vectorized over forms (i.e. instruments). Returns a list whose elements are tibble-style data frames corresponding to each requested form.
Usage
fetch_database(
  conn,
  forms = NULL,
  names_fn = function(x) x,
  records = NULL,
  records_omit = NULL,
  id_field = TRUE,
  rm_empty = TRUE,
  rm_empty_omit_calc = FALSE,
  value_labs = TRUE,
  value_labs_fetch_raw = FALSE,
  header_labs = FALSE,
  checkbox_labs = FALSE,
  use_factors = FALSE,
  times_chron = TRUE,
  date_range_begin = NULL,
  date_range_end = NULL,
  fn_dates = parse_date,
  fn_dates_args = list(orders = c("Ymd", "dmY")),
  fn_datetimes = lubridate::parse_date_time,
  fn_datetimes_args = list(orders = c("Ymd HMS", "Ymd HM")),
  na = c("", "NA"),
  dag = TRUE,
  batch_size = 100L,
  batch_delay = 0.5,
  form_delay = 0.5,
  double_resolve = FALSE,
  double_remove = FALSE,
  double_sep = "--",
  fns = NULL
)
Arguments
- conn
A REDCap API connection object (created with rconn)
- forms
Character vector of forms (i.e. instruments) to fetch data for. Set to NULL (the default) to fetch all forms in the project.
- names_fn
Function for creating custom list element names given a vector of form names. Defaults to an identity function in which case element names will correspond exactly to form names.
- records
Character vector of record IDs to fetch. Set to NULL (the default) to fetch all record IDs corresponding to the selected form(s).
- records_omit
Character vector of record IDs to ignore. Set to NULL (the default) to not ignore any records. If a given record ID appears in both argument records and records_omit, argument records_omit takes precedence and that record will not be returned.
- id_field
Logical indicating whether to always include the 'record ID' field (defined in REDCap to be the first variable in the project codebook) in the API request, even if it's not specified in argument fields. Defaults to TRUE.
The record ID field is defined within the first form of a REDCap project, and so API requests for other forms will not include the record ID field by default (unless it's explicitly requested with argument fields). The id_field argument is a shortcut to avoid having to always explicitly request the record ID field.
- rm_empty
Logical indicating whether to remove rows for which all fields from the relevant form(s) are missing. See section Removing empty rows. Defaults to TRUE.
- rm_empty_omit_calc
Logical indicating whether to exclude calculated fields from the assessment of empty rows. In some cases calculated fields can be autopopulated for certain records even when the relevant form is truly empty, which would otherwise lead to "empty" forms being returned even when rm_empty is TRUE. Defaults to FALSE.
- value_labs
Logical indicating whether to return value labels (TRUE) or raw values (FALSE) for categorical REDCap variables (radio, dropdown, yesno, checkbox). Defaults to TRUE to return labels.
- value_labs_fetch_raw
Logical indicating whether to request raw values for categorical REDCap variables (radio, dropdown, yesno, checkbox), which are then transformed to labels in a separate step when value_labs = TRUE. Primarily used for troubleshooting issues with the REDCap API returning fewer records than expected when given certain combinations of request parameters.
- header_labs
Logical indicating whether to export column names as labels (TRUE) or raw variable names (FALSE). Defaults to FALSE to return raw variable names.
- checkbox_labs
Logical indicating whether to export checkbox labels (TRUE) or statuses (i.e. "Unchecked" or "Checked") (FALSE). Defaults to FALSE to export statuses. Note this argument is only relevant when value_labs is TRUE. If value_labs is FALSE, checkbox variables will always be exported as raw values (usually "0"/"1").
- use_factors
Logical indicating whether categorical REDCap variables (radio, dropdown, yesno, checkbox) should be returned as factors. Factor levels can either be raw values (e.g. "0"/"1") or labels (e.g. "No"/"Yes") depending on arguments value_labs and checkbox_labs. Defaults to FALSE.
- times_chron
Logical indicating whether to reclass time variables using chron::times (TRUE) or leave as character HH:MM format (FALSE). Defaults to TRUE. Note this only applies to variables of REDCap type "Time (HH:MM)", and not "Time (MM:SS)".
- date_range_begin
Fetch only records created or modified after a given date-time. Use format "YYYY-MM-DD HH:MM:SS" (e.g., "2017-01-01 00:00:00" for January 1, 2017 at midnight server time). Defaults to NULL to omit a lower time limit.
- date_range_end
Fetch only records created or modified before a given date-time. Use format "YYYY-MM-DD HH:MM:SS" (e.g., "2017-01-01 00:00:00" for January 1, 2017 at midnight server time). Defaults to NULL to omit an upper time limit.
- fn_dates
Function to parse REDCap date variables. Defaults to parse_date, an internal wrapper to lubridate::parse_date_time. If date variables have been converted to numeric (e.g. by writing to Excel), set to e.g. lubridate::as_date to convert back to dates.
- fn_dates_args
List of arguments to pass to fn_dates. Can be set to an empty list, list(), if using a function that doesn't take any arguments.
- fn_datetimes
Function to parse REDCap datetime variables. Defaults to lubridate::parse_date_time.
- fn_datetimes_args
List of arguments to pass to fn_datetimes. Can be set to an empty list, list(), if using a function that doesn't take any arguments.
- na
Character vector of strings to interpret as missing values. Passed to readr::read_csv. Defaults to c("", "NA").
- dag
Logical indicating whether to export the redcap_data_access_group field (if used in the project). Defaults to TRUE.
- batch_size
Number of records to fetch per batch. Defaults to 100L. Set to Inf or NA to fetch all records at once.
- batch_delay
Delay in seconds between fetching successive batches, to give the REDCap server time to respond to other requests. Defaults to 0.5.
- form_delay
Delay in seconds between fetching successive forms, to give the REDCap server time to respond to other requests. Defaults to 0.5.
- double_resolve
Logical indicating whether to resolve double-entries (i.e. records entered in duplicate using REDCap's Double Data Entry module), by filtering to the lowest entry number associated with each unique record.
If a project uses double-entry, the record IDs returned by an "Export Records" API request will be a concatenation of the normal record ID and the entry number (1 or 2), normally separated by "--" (e.g. "P0285--1"). To resolve double entries we move the entry number portion of the ID to its own column (entry), identify all entries belonging to the same unique record, and retain only the row with the lowest entry number for each unique record.
Unique records are identified using the record ID column (after separating the entry number portion), and any of the following columns when present (accounting for argument header_labs): redcap_event_name (Redcap Event), redcap_repeat_instrument (Repeat Instrument), redcap_repeat_instance (Repeat Instance).
- double_remove
Logical indicating whether to remove double-entries (i.e. records entered in duplicate using REDCap's Double Data Entry module), by filtering out records where the record ID field contains pattern double_sep (see next argument), so that only merged records remain.
- double_sep
If double_resolve is TRUE, the string separator used to split the record ID field into the record ID and entry number. Defaults to "--".
- fns
Optional list of one or more functions to apply to each list element (i.e. each form). Could be used e.g. to filter out record IDs from test entries, create derived variables, etc. Each function should take a data frame returned by fetch_records as its first argument.
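The double-entry handling described under double_resolve and double_remove can be pictured with a small base R sketch. This is only an illustration of the described logic on made-up data, not the package's actual implementation; the toy data frame and the object names (raw, dat, resolved, merged_only) are assumptions introduced here.

```r
# Toy export from a hypothetical double-entry project (values are made up)
raw <- data.frame(
  record_id = c("P0285--1", "P0285--2", "P0286--2", "P0287"),
  age = c(34, 35, 51, 42),
  stringsAsFactors = FALSE
)
double_sep <- "--"

# double_resolve-style logic: move the entry number to its own column
dat <- raw
parts <- strsplit(dat$record_id, double_sep, fixed = TRUE)
dat$entry <- vapply(
  parts,
  function(p) if (length(p) > 1) as.integer(p[2]) else NA_integer_,
  integer(1)
)
dat$record_id <- vapply(parts, `[`, character(1), 1)

# Keep only the row with the lowest entry number for each unique record
dat <- dat[order(dat$record_id, dat$entry), ]
resolved <- dat[!duplicated(dat$record_id), ]

# double_remove-style logic: drop IDs containing the separator,
# so that only merged records remain
merged_only <- raw[!grepl(double_sep, raw$record_id, fixed = TRUE), ]
```

In a project with events or repeating instruments, the uniqueness check would also need to account for columns such as redcap_event_name, as noted under double_resolve.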
Value
A list of tibble-style data frames corresponding to each of the requested forms.
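As a rough illustration of working with a return value of this shape, the sketch below builds a stand-in list by hand (no API call is made) and shows how a names_fn-style function maps form names to element names. The form names, columns, and the helper prefix_names are hypothetical.

```r
# Stand-in for the list returned by fetch_database() (made-up data)
db <- list(
  enrolment = data.frame(record_id = c("P001", "P002"), age = c(34, 51)),
  followup = data.frame(record_id = "P001", visit = "week_4")
)

# Each element is a data frame named after its form
enrol <- db[["enrolment"]]

# Number of rows returned per form
n_rows <- vapply(db, nrow, integer(1))

# A names_fn-style function: takes a vector of form names and
# returns the element names to use instead
prefix_names <- function(x) paste0("df_", x)
names(db) <- prefix_names(names(db))
```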
Removing empty rows
Depending on the database design, an "Export Records" API request can sometimes return empty rows, representing forms for which no data has been collected. For example, if forms F1 and F2 are part of the same event, and participant "P001" has form data for F2 but not F1, an API request for F1 will include a row for participant "P001" where all F1-specific fields are empty.
If argument rm_empty is TRUE (the default), fetch_records() will filter out such rows. The check for empty rows is based only on fields that are specific to the form(s) specified in argument forms, i.e. it excludes the record ID field and generic fields like redcap_event_name, redcap_data_access_group, etc. The check for empty rows also accounts for checkbox fields, which, if argument checkbox_labs is FALSE, will be set to "Unchecked" in an empty form (rather than missing per se).
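The check described above can be sketched in base R as follows. This is a simplified illustration on made-up data, not the package's own code; the field names and the helper is_blank are assumptions, and for brevity the sketch treats "Unchecked" as blank in every column rather than only in checkbox fields.

```r
# Toy export for a hypothetical form F1 (values are made up)
dat <- data.frame(
  record_id = c("P001", "P002", "P003"),
  redcap_event_name = c("baseline", "baseline", "baseline"),
  f1_weight = c(NA, 70.5, NA),
  f1_symptoms___1 = c("Unchecked", "Checked", "Unchecked"),
  f1_notes = c(NA, NA, "seen at home"),
  stringsAsFactors = FALSE
)

# Only form-specific fields take part in the check; the record ID
# and generic fields such as redcap_event_name are excluded
form_fields <- c("f1_weight", "f1_symptoms___1", "f1_notes")

# A value counts as blank if it is missing, empty, or an unchecked checkbox
is_blank <- function(x) is.na(x) | x %in% c("", "Unchecked")

blank <- vapply(dat[form_fields], is_blank, logical(nrow(dat)))
empty_row <- rowSums(!blank) == 0

# Drop rows in which every form-specific field is blank
kept <- dat[!empty_row, ]
```

Here the row for "P001" is dropped because all three F1 fields are blank, while "P003" is kept because f1_notes has a value.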
Examples
if (FALSE) { # \dontrun{
  conn <- rconn(
    url = "https://redcap.msf.fr/api/",
    token = Sys.getenv("MY_REDCAP_TOKEN")
  )

  fetch_database(
    conn,
    forms = c("my_form1", "my_form2", "my_form3")
  )

  # use a custom fn to format the 'participant_id' column of each form
  # the function must take a data frame as its first argument
  format_ids <- function(x) {
    x$participant_id <- toupper(x$participant_id)
    x$participant_id <- gsub("[^[:alnum:]]+", "_", x$participant_id)
    x
  }

  fetch_database(
    conn,
    forms = c("my_form1", "my_form2", "my_form3"),
    fns = list(format_ids)
  )
} # }
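To see what a function passed via fns does without contacting a REDCap server, it can be tried on a toy data frame first. The sketch below reuses format_ids from the example above; the toy data, the second function drop_test_ids, and the use of Reduce() to apply the functions in order are illustrative assumptions, not a description of how fetch_database applies fns internally.

```r
# Same function as in the example above
format_ids <- function(x) {
  x$participant_id <- toupper(x$participant_id)
  x$participant_id <- gsub("[^[:alnum:]]+", "_", x$participant_id)
  x
}

# Hypothetical second function: drop records flagged as test entries
drop_test_ids <- function(x) {
  x[!grepl("^TEST", x$participant_id), , drop = FALSE]
}

# Toy stand-in for one form (values are made up)
toy <- data.frame(
  participant_id = c("p-001", "p 002", "test-01"),
  stringsAsFactors = FALSE
)

# Apply each function in turn, feeding the result to the next one
fns <- list(format_ids, drop_test_ids)
out <- Reduce(function(d, f) f(d), fns, toy)
```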