The sardine package provides a modern, object-oriented interface for working with research data APIs, starting with comprehensive REDCap support.
Introduction
The sardine package (Structured Architecture for
Research Data Integration and Evaluation) provides a comprehensive and
secure interface for research data integration and evaluation. This
vignette will guide you through the basic functionality, with a focus on
REDCap API integration.
Installation
You can install the development version of sardine from GitHub:
# install.packages("devtools")
devtools::install_github("jackmanners/sardine")
Setting up Your REDCap Project
The first step is to create a REDCap project object. Creating it tests your connection and caches project metadata. If it fails, check your API URL and token.
# Create project using direct parameters
project <- redcap_project(
  url = "https://redcap.your-institution.edu/api/",
  token = "YOUR_API_TOKEN"
)
# Or use environment variables for security (recommended - see below)
project <- redcap_project()
Security Best Practices
For security, it’s recommended to store your credentials as environment variables rather than hardcoding them in your scripts.
- Create a .Renviron file in your project directory or home directory
- Add your credentials:
  REDCAP_URL=https://redcap.your-institution.edu/api/
  REDCAP_TOKEN=your_api_token_here
- Restart R to load the environment variables
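Before creating the project object, it can help to confirm that both variables are actually visible to R. The following is a minimal sketch using only base R. The helper name missing_redcap_env() is hypothetical and is not part of sardine.

```r
# Hypothetical helper (not part of sardine): report which of the expected
# environment variables are unset or empty in the current R session.
missing_redcap_env <- function(vars = c("REDCAP_URL", "REDCAP_TOKEN")) {
  values <- Sys.getenv(vars, unset = "")
  vars[!nzchar(values)]
}

missing <- missing_redcap_env()
if (length(missing) > 0) {
  message("Not set: ", paste(missing, collapse = ", "),
          ". Check your .Renviron file and restart R.")
} else {
  message("REDCAP_URL and REDCAP_TOKEN are both set.")
}
```

If a variable is reported as not set after editing .Renviron, the most likely cause is that R has not been restarted since the file was changed.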
Accessing Data
The REDCap project object automatically caches all records when it is created, so the data is available directly as project$data.
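As a minimal sketch of what direct access looks like, the snippet below uses a plain list with made-up values as a stand-in for a real project object, so it runs without a REDCap connection. The stand-in holds a data.frame rather than a tibble. With a real project, only the inspection lines at the end apply.

```r
# Stand-in for a real project object (hypothetical values, for illustration only).
project <- list(
  data = data.frame(
    record_id = c("001", "002", "003"),
    age = c(34, 17, 52),
    stringsAsFactors = FALSE
  )
)

# Inspect the cached records
records <- project$data
head(records)
nrow(records)   # number of records
names(records)  # field names
```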
Filtering Records
Use standard dplyr operations to filter and manipulate your data:
library(dplyr)
# Filter specific records
specific_records <- project$data %>%
  filter(record_id %in% c("001", "002", "003"))
# Filter by field values
adults <- project$data %>%
  filter(age >= 18)
# Filter by completion status (2 = complete)
complete_baseline <- project$data %>%
  filter(baseline_survey_complete == 2)
# Complex filtering
eligible_participants <- project$data %>%
  filter(
    age >= 18,
    age <= 65,
    consent_complete == 2,
    !is.na(enrollment_date)
  )
Selecting Specific Fields
# Select specific fields
demographic_data <- project$data %>%
  select(record_id, age, gender, race, ethnicity)
# Select by pattern (all fields from a form)
baseline_data <- project$data %>%
  select(record_id, starts_with("baseline_"))
# Select by field type using metadata
numeric_fields <- project$metadata %>%
  filter(text_validation_type_or_show_slider_number %in% c("integer", "number")) %>%
  pull(field_name)
numeric_data <- project$data %>%
  select(record_id, all_of(numeric_fields))
Accessing Metadata
The project object automatically caches the project metadata, which describes each field (label, type, validation, choices, and form):
# View field metadata
metadata <- project$metadata
head(metadata)
# Explore metadata structure
names(metadata)
# Get field information
field_info <- metadata %>%
  select(field_name, field_label, field_type, required_field)
# Find all choice fields (radio, dropdown, checkbox)
choice_fields <- metadata %>%
  filter(field_type %in% c("radio", "dropdown", "checkbox")) %>%
  select(field_name, field_label, select_choices_or_calculations)
# Find validated fields
validated_fields <- metadata %>%
  filter(!is.na(text_validation_type_or_show_slider_number)) %>%
  select(field_name, field_label, text_validation_type_or_show_slider_number)
# Find required fields
required_fields <- metadata %>%
  filter(required_field == "y") %>%
  pull(field_name)
# Count field types
metadata %>%
  count(field_type, sort = TRUE)
Using Metadata with Data
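For choice fields, REDCap stores the options in select_choices_or_calculations as a single string of "code, label" pairs separated by "|". One way to use this metadata with data is to decode raw codes into labels. The sketch below assumes that format and uses a made-up choice string. parse_choices() is a hypothetical helper, not a sardine function, and it does not apply to calculated fields, which use the same column for formulas.

```r
# Hypothetical helper (not part of sardine): turn a REDCap choice string
# such as "1, Male | 2, Female" into a named character vector
# (names = raw codes, values = labels).
parse_choices <- function(choice_string) {
  parts <- trimws(strsplit(choice_string, "|", fixed = TRUE)[[1]])
  # Split each part on the first comma only, since labels may contain commas
  codes <- trimws(sub(",.*$", "", parts))
  labels <- trimws(sub("^[^,]*,", "", parts))
  stats::setNames(labels, codes)
}

# Made-up choice string, for illustration only
choices <- parse_choices("1, Male | 2, Female | 3, Other, not listed")

# Decode raw values as they might appear in project$data
raw_gender <- c("2", "1", "3")
unname(choices[raw_gender])
```

Indexing the named vector by the raw codes keeps the lookup vectorised, so the same call works on a whole column of a dataset.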
# Create a labeled dataset
labeled_data <- project$data %>%
  select(record_id, age, gender)
# Add field labels as attributes (useful for reports)
for (field in names(labeled_data)[-1]) {
  label_text <- metadata %>%
    filter(field_name == field) %>%
    pull(field_label)
  if (length(label_text) > 0) {
    attr(labeled_data[[field]], "label") <- label_text
  }
}
# Get form names for fields
field_forms <- metadata %>%
  select(field_name, form_name)
Advanced Usage
Refreshing Data
If data changes in REDCap (through the web interface or other processes), refresh your local cache:
# Refresh data from REDCap to get latest updates
project$refresh()
# The project$data is now updated
updated_data <- project$data
Project Information
View comprehensive information about your project:
# Display project information
project$info()
# This shows:
# - Project title
# - REDCap URL
# - When the project object was created
# - Number of cached records and fields
# - Number of metadata fields
# - Usage instructions
Accessing Project Details
# Access project information directly
project_info <- project$project_info
# Available properties vary by project but typically include:
print(project_info$project_title)
print(project_info$project_language)
print(project_info$is_longitudinal)
print(project_info$purpose)
print(project_info$purpose_other)
print(project_info$record_autonumbering_enabled)
# View all available project information
str(project_info)
Working with the Project Object
The project object has several components:
# Data and metadata (primary usage)
data <- project$data # Full dataset (tibble)
metadata <- project$metadata # Field metadata
info <- project$project_info # Project settings
# Methods
project$info() # Display project information
project$refresh() # Refresh data from REDCap
# Internal properties (advanced)
project$.connection # Connection details
project$.created_at # Timestamp of object creation
Data Quality Checks
Integrate data quality checks into your workflow:
library(dplyr)
# Check for missing critical fields
missing_critical <- project$data %>%
  filter(is.na(consent_date) | is.na(enrollment_date)) %>%
  select(record_id, consent_date, enrollment_date)
if (nrow(missing_critical) > 0) {
  warning("Found ", nrow(missing_critical), " records with missing critical dates")
  print(missing_critical)
}
# Check completion rates
completion_summary <- project$data %>%
  summarise(
    total_records = n(),
    baseline_complete = sum(baseline_survey_complete == 2, na.rm = TRUE),
    followup_complete = sum(followup_survey_complete == 2, na.rm = TRUE)
  ) %>%
  mutate(
    baseline_pct = baseline_complete / total_records * 100,
    followup_pct = followup_complete / total_records * 100
  )
print(completion_summary)
Data Reshaping
library(dplyr) # for %>% and select()
library(tidyr)
# Pivot data for longitudinal analysis
# Assume pain_score_day1, pain_score_day7, pain_score_day14
pain_long <- project$data %>%
  select(record_id, starts_with("pain_score_")) %>%
  pivot_longer(
    cols = starts_with("pain_score_"),
    names_to = "timepoint",
    names_prefix = "pain_score_",
    values_to = "pain_score"
  )
# Pivot wider for analysis
pain_wide <- pain_long %>%
  pivot_wider(
    names_from = timepoint,
    values_from = pain_score
  )
Error Handling
The sardine package includes comprehensive error handling. Common issues include:
- Invalid API token: Check that your token is correct and has appropriate permissions
- Network issues: Ensure you have internet connectivity and the REDCap server is accessible
- Field/form names: Verify that field and form names exist in your project
- Permissions: Ensure your API token has the necessary export permissions
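When a script should not stop at the first failure, these errors can be caught with base R's tryCatch(). The sketch below is a general pattern, not sardine's own error handling. connect_fn is a placeholder for whatever call creates your project (for example function() redcap_project()), and the failing function here is made up so the example runs without a server.

```r
# Generic wrapper: run a connection function and return NULL with a
# readable message instead of stopping the script on error.
safe_connect <- function(connect_fn) {
  tryCatch(
    connect_fn(),
    error = function(e) {
      message("Could not create the project: ", conditionMessage(e))
      message("Check the API URL, the token and its permissions, and your network.")
      NULL
    }
  )
}

# Made-up stand-in that fails the way a bad token might
failing_connect <- function() stop("invalid API token")

project <- safe_connect(failing_connect)
is.null(project)
```

Returning NULL lets the calling code decide what to do next, such as skipping a report or retrying later, rather than aborting the whole session.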
Next Steps
This vignette covered the basic functionality of the sardine package. The additional vignettes and function documentation cover more advanced usage, including:
- Data validation and cleaning functions
- Batch processing capabilities
- Integration with other APIs (coming in future versions)
