REDCap Data Extraction with Sardine
This vignette demonstrates how to extract data from REDCap projects using the sardine package’s object-oriented approach.
Environment Configuration
The sardine package uses environment variables for secure credential management.
# Create environment template (one-time setup)
create_env_template()
# Edit the .env file with your REDCap credentials:
# REDCAP_URL=https://redcap.example.edu/api/
# REDCAP_TOKEN=YOUR_API_TOKEN_HERE
# Load environment variables
load_env()Creating a REDCap Project
The core of sardine’s approach is the REDCap project object:
# Create project from environment variables
project <- redcap_project()
# Or create directly with credentials
project <- redcap_project(
url = "https://redcap.example.edu/api/",
token = "YOUR_API_TOKEN"
)When you create a project: - Connection is tested automatically - Full dataset is cached for performance - Project metadata is retrieved - Any connection issues cause immediate failure with clear error messages
Viewing Project Information
# View project summary
project$info()
# REDCap Project
# ==============
# Title: My Research Study
# URL: https://redcap.example.edu
# Created: 2025-10-03 10:30:00
# Cached Data: 150 records, 25 fieldsExtracting by Form/Instrument
To get data from specific REDCap forms, use metadata to identify form fields:
# Get metadata to see form structure
metadata <- project$metadata
# View all forms
forms <- unique(metadata$form_name)
print(forms)
# Get field names for a specific form
baseline_fields <- metadata %>%
filter(form_name == "baseline_demographics") %>%
pull(field_name)
# Extract data for that form
baseline_data <- project$data %>%
select(record_id, all_of(baseline_fields))
# Extract multiple forms
demo_forms <- metadata %>%
filter(form_name %in% c("baseline_demographics", "medical_history")) %>%
pull(field_name)
core_data <- project$data %>%
select(record_id, all_of(demo_forms))
# Alternative: use pattern matching if forms have consistent prefixes
baseline_by_prefix <- project$data %>%
select(record_id, starts_with("baseline_"))Filtering by Completion Status
REDCap automatically adds completion status fields:
# Find completion status fields
completion_fields <- metadata %>%
filter(grepl("_complete$", field_name)) %>%
pull(field_name)
print(completion_fields)
# Example: baseline_demographics_complete, medical_history_complete
# REDCap completion values:
# 0 = Incomplete
# 1 = Unverified
# 2 = Complete
# Get only complete baseline surveys
complete_baseline <- project$data %>%
filter(baseline_demographics_complete == 2)
# Get records with incomplete forms
incomplete_any <- project$data %>%
filter(
baseline_demographics_complete != 2 |
medical_history_complete != 2
)
# Completion summary
completion_summary <- project$data %>%
summarise(
total = n(),
baseline_complete = sum(baseline_demographics_complete == 2, na.rm = TRUE),
baseline_pct = baseline_complete / total * 100
)Working with Longitudinal Projects
For projects with multiple events/timepoints:
# Check if project is longitudinal
is_longitudinal <- project$project_info$is_longitudinal
# Access metadata about the project structure
metadata <- project$metadata
# In longitudinal projects, the data typically includes
# a redcap_event_name field
if ("redcap_event_name" %in% names(project$data)) {
# View all events in the data
events <- project$data %>%
distinct(redcap_event_name)
print(events)
# Extract data from specific event
baseline_event <- project$data %>%
filter(redcap_event_name == "baseline_arm_1")
# Extract from multiple events
followup_data <- project$data %>%
filter(redcap_event_name %in% c("month_3_arm_1", "month_6_arm_1"))
# Compare values across events
vital_signs_over_time <- project$data %>%
select(record_id, redcap_event_name, bp_systolic, bp_diastolic) %>%
arrange(record_id, redcap_event_name)
}Working with Repeating Instruments
For projects with repeating instruments:
# Repeating instruments have redcap_repeat_instrument and redcap_repeat_instance
if ("redcap_repeat_instance" %in% names(project$data)) {
# Get all instances of a repeating form
medication_log <- project$data %>%
filter(redcap_repeat_instrument == "medications") %>%
select(record_id, redcap_repeat_instance,
medication_name, medication_dose, medication_start_date) %>%
arrange(record_id, redcap_repeat_instance)
# Get latest instance for each record
latest_medication <- medication_log %>%
group_by(record_id) %>%
slice_max(redcap_repeat_instance, n = 1) %>%
ungroup()
# Count instances per record
instance_counts <- project$data %>%
filter(redcap_repeat_instrument == "adverse_events") %>%
group_by(record_id) %>%
summarise(total_adverse_events = n_distinct(redcap_repeat_instance))
}Data Refresh
When you need fresh data from REDCap:
# Refresh cached data
project$refresh()
# The cached data is now updated
fresh_data <- project$dataBest Practices
1. Use Cached Data When Possible
# Good: Use cached data for exploration
all_data <- project$data
summary_stats <- all_data %>%
summarise(
n_participants = n_distinct(record_id),
mean_age = mean(age, na.rm = TRUE),
complete_records = sum(complete.cases(.))
)2. Use Targeted Extraction for Specific Analyses
# Good: Extract only needed fields for analysis
analysis_data <- project$data |>
dplyr::select(record_id, treatment_group, outcome_measure, baseline_score)4. Validate Data After Extraction
# Validate extracted data
validation_checks <- list(
record_count = nrow(demographics),
expected_fields = all(c("record_id", "age", "gender") %in% names(demographics)),
no_duplicate_records = nrow(demographics) == n_distinct(demographics$record_id),
reasonable_age_range = all(demographics$age >= 0 & demographics$age <= 120, na.rm = TRUE)
)
cat("Validation Results:\n")
print(validation_checks)Troubleshooting
Connection Issues
# If project creation fails, check:
# 1. URL format (should end with /api/)
# 2. Token validity
# 3. Network connectivity
tryCatch({
project <- redcap_project()
}, error = function(e) {
cat("Connection failed:", e$message, "\n")
cat("Check your REDCAP_URL and REDCAP_TOKEN in .env file\n")
})Empty Results
# If export_records returns empty results:
result <- project$data |>
dplyr::select(record_id, nonexistent_field)
if (nrow(result) == 0) {
cat("No data returned. Check:\n")
cat("- Field names are correct\n")
cat("- Records exist in the project\n")
cat("- User has appropriate permissions\n")
# Check available fields
available_fields <- names(project$data)
cat("Available fields:", paste(head(available_fields, 10), collapse = ", "), "...\n")
}Summary
The sardine package provides a clean, efficient way to extract REDCap data:
- Create once, use many times: Project objects cache data for performance
- Fail-fast: Connection issues are caught immediately
- Flexible extraction: Get all data, specific fields, records, or forms
- Longitudinal support: Handle complex event-based projects
- Clean interface: Intuitive function names and consistent API
Next, see the data import vignette to learn about getting data back into REDCap.
