Data Quality and Completion Reports with Sardine

This vignette demonstrates how to monitor data quality, track form completion, and assess participant retention in REDCap studies using the sardine package. For demographic and baseline characteristics tables, see the “Table One and Demographic Reporting” vignette.

What This Vignette Covers

  • Data Quality: Missing data analysis, data type validation, and quality reports
  • Form Completion: Tracking which forms are complete across participants
  • Event Completion: Monitoring progress in longitudinal studies
  • Retention: Measuring participant retention and identifying attrition
  • Automated Monitoring: Setting up regular quality checks

Setup

library(sardine)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(ggplot2)
library(knitr)

Project Setup

# Load environment and create project
load_env()
project <- redcap_project()

# View project info
project$info()

Data Quality Reports

Missing Data Analysis

Identify fields with missing data and quantify the extent:

# Analyze missing data patterns
missing_report <- analyze_missing_data(project)
print(missing_report)

Example output:

Missing Data Analysis
━━━━━━━━━━━━━━━━━━━━━
Total records: 150
Total fields: 45
Fields with missing data: 12

  field                 missing_count  missing_percent  records_affected
  ────────────────────  ─────────────  ───────────────  ────────────────
  follow_up_3_date      68             45.3%            68
  telephone_2           42             28.0%            42
  adverse_event_desc    35             23.3%            35
  bmi                   12             8.0%             12
  income                8              5.3%             8

Key insights:

  • Which fields have the most missing data
  • How many records are affected
  • Whether missingness is systematic or random
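
The columns in this report can be reproduced by hand on a toy data frame, which helps when checking what a reported percentage means. The sketch below uses base R only and is not sardine's implementation; the helper name `summarise_missing()` and the toy data are hypothetical, and it assumes that only `NA` counts as missing.

```r
# Hypothetical helper: per-field missing counts and percentages.
# Assumption: only NA counts as missing (empty strings are not handled).
summarise_missing <- function(data) {
  missing_count <- vapply(data, function(col) sum(is.na(col)), integer(1))
  out <- data.frame(
    field = names(data),
    missing_count = unname(missing_count),
    missing_percent = round(unname(missing_count) / nrow(data) * 100, 1),
    stringsAsFactors = FALSE
  )
  # Keep only fields with at least one missing value, worst first
  out <- out[out$missing_count > 0, , drop = FALSE]
  out <- out[order(-out$missing_count), , drop = FALSE]
  rownames(out) <- NULL
  out
}

toy <- data.frame(
  record_id = 1:4,
  bmi = c(22.1, NA, 27.4, 30.2),
  income = c(NA, NA, 52000, NA)
)
missing_summary <- summarise_missing(toy)
print(missing_summary)
```

Here `income` is missing in 3 of 4 records (75%) and `bmi` in 1 of 4 (25%); `record_id` is dropped because it has no missing values.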

Data Type Validation

Verify that data matches expected types:

# Validate data types against REDCap metadata
type_validation <- validate_data_types(project)
print(type_validation)

Example output:

Data Type Validation
━━━━━━━━━━━━━━━━━━━━
Total fields validated: 45
Fields with issues: 3

  field          expected_type  actual_type  issue_count  example_issues
  ─────────────  ─────────────  ───────────  ───────────  ──────────────
  age            numeric        character    5            "forty-two", "N/A"
  enrollment_dt  date           character    2            "2023-13-45"
  height_cm      numeric        character    1            "unknown"

Use this to identify:

  • Data entry errors
  • Format inconsistencies
  • Fields needing cleaning
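
One way to find such entries by hand is to attempt the conversion and flag the values that fail it. The following is a minimal base R sketch with hypothetical helper names and toy data; sardine's own checks may work differently.

```r
# Values that are present but cannot be parsed as numbers
find_non_numeric <- function(x) {
  x <- as.character(x)
  parsed <- suppressWarnings(as.numeric(x))
  x[!is.na(x) & is.na(parsed)]
}

# Values that are present but are not valid YYYY-MM-DD dates
find_invalid_dates <- function(x) {
  x <- as.character(x)
  parsed <- as.Date(x, format = "%Y-%m-%d")
  x[!is.na(x) & is.na(parsed)]
}

# Toy inputs modelled on the example output above
age <- c("34", "forty-two", "51", "N/A", NA)
enrollment_dt <- c("2023-01-15", "2023-13-45", NA)

bad_age <- find_non_numeric(age)
bad_dates <- find_invalid_dates(enrollment_dt)
print(bad_age)
print(bad_dates)
```

Genuinely missing values (`NA`) are not flagged here, so this check complements the missing data analysis rather than duplicating it.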

Comprehensive Quality Report

Generate a combined data quality report:

# Generate comprehensive quality report
quality_report <- generate_data_quality_report(project)
print(quality_report)

Output includes:

  • Missing data summary
  • Data type validation
  • Field-level quality metrics
  • Recommendations for improvement

Custom Quality Checks

Create project-specific quality rules:

# Custom quality check function
check_age_range <- function(project) {
  data <- project$data
  
  if (!"age" %in% names(data)) {
    cli::cli_alert_info("No age field found")
    return(NULL)
  }
  
  issues <- data %>%
    filter(!is.na(age)) %>%
    filter(age < 18 | age > 120) %>%
    select(record_id, age)
  
  if (nrow(issues) > 0) {
    cli::cli_alert_warning("{nrow(issues)} records with age outside 18-120 range")
    print(issues)
  } else {
    cli::cli_alert_success("All ages within expected range")
  }
  
  return(issues)
}

age_issues <- check_age_range(project)

Range and Outlier Detection

Identify potential outliers in numeric fields:

# Detect outliers in numeric fields
detect_outliers <- function(project, field, lower_limit = NULL, upper_limit = NULL) {
  data <- project$data
  id_field <- project$id_field
  
  if (!field %in% names(data)) {
    cli::cli_alert_danger("Field {field} not found")
    return(NULL)
  }
  
  field_data <- data %>%
    select(!!sym(id_field), !!sym(field)) %>%
    filter(!is.na(!!sym(field)))
  
  # Calculate outliers using IQR method if limits not provided
  if (is.null(lower_limit) || is.null(upper_limit)) {
    q1 <- quantile(field_data[[field]], 0.25, na.rm = TRUE)
    q3 <- quantile(field_data[[field]], 0.75, na.rm = TRUE)
    iqr <- q3 - q1
    
    if (is.null(lower_limit)) lower_limit <- q1 - 3 * iqr
    if (is.null(upper_limit)) upper_limit <- q3 + 3 * iqr
  }
  
  outliers <- field_data %>%
    filter(!!sym(field) < lower_limit | !!sym(field) > upper_limit)
  
  cli::cli_h3("Outlier Detection: {field}")
  cli::cli_text("Valid range: {lower_limit} to {upper_limit}")
  cli::cli_text("Outliers found: {nrow(outliers)}")
  
  if (nrow(outliers) > 0) {
    print(outliers)
  }
  
  return(outliers)
}

# Check for outliers in BMI
bmi_outliers <- detect_outliers(project, "bmi", lower_limit = 15, upper_limit = 50)

# Check for outliers in blood pressure
bp_outliers <- detect_outliers(project, "systolic_bp", lower_limit = 70, upper_limit = 220)
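
When no limits are passed, detect_outliers() falls back to fences placed 3 × IQR beyond the quartiles, which is wider than the more common 1.5 × IQR rule and so flags only extreme values. The arithmetic can be checked on a toy vector (the values are made up for illustration):

```r
# Toy values; 100 is the implausible one
x <- c(10, 12, 14, 16, 18, 100)

q1 <- unname(quantile(x, 0.25))
q3 <- unname(quantile(x, 0.75))
iqr <- q3 - q1

# Same multiplier as detect_outliers(): 3 * IQR beyond the quartiles
lower_fence <- q1 - 3 * iqr
upper_fence <- q3 + 3 * iqr

outliers <- x[x < lower_fence | x > upper_fence]
print(c(lower = lower_fence, upper = upper_fence))
print(outliers)
```

With R's default quantile method the quartiles are 12.5 and 17.5, so the fences are -2.5 and 32.5 and only the value 100 is flagged.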

Form Completion Tracking

Overall Completion Status

Get completion status for specific forms:

# Get completion status for specific forms
completion_status <- get_form_completion_status(
  project,
  forms = c("demographics", "baseline_survey", "medical_history")
)

# View results
head(completion_status)

# Summary by form
completion_status %>%
  summarise(across(
    -record_id,
    ~ sum(. == "complete", na.rm = TRUE)
  ))

Example output:

  record_id  demographics      baseline_survey   medical_history
  ─────────  ────────────────  ────────────────  ───────────────
  001        complete          complete          incomplete
  002        complete          complete          complete
  003        complete          not_started       not_started
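
REDCap stores each form's status in a `<form>_complete` field coded 0, 1, or 2, which REDCap itself labels Incomplete, Unverified, and Complete. The labels in the output above are sardine's. The sketch below shows one mapping that is consistent with how this vignette counts statuses in its completion-rate code (2 as complete, 1 as incomplete, 0 or missing as not started). The helper name is hypothetical, and the mapping is an assumption about sardine's behaviour, not a copy of its internals.

```r
# Hypothetical helper: translate raw `_complete` codes into status labels
label_completion <- function(code) {
  labels <- rep("not_started", length(code))  # covers 0 and NA
  labels[!is.na(code) & code == 1] <- "incomplete"
  labels[!is.na(code) & code == 2] <- "complete"
  labels
}

demographics_complete <- c(2, 2, 1, 0, NA)
status <- label_completion(demographics_complete)
print(table(status))
```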

Identify Incomplete Forms

Find participants who need follow-up:

# Get only incomplete or not started forms
need_follow_up <- get_form_completion_status(
  project,
  forms = c("follow_up_1", "follow_up_2", "follow_up_3"),
  status = c("incomplete", "not_started")
)

# Export for coordinator follow-up
write.csv(need_follow_up, "participants_needing_followup.csv", row.names = FALSE)

# Count by status
need_follow_up %>%
  summarise(across(
    -record_id,
    list(
      not_started = ~ sum(. == "not_started", na.rm = TRUE),
      incomplete = ~ sum(. == "incomplete", na.rm = TRUE)
    )
  ))

Completion Rate Summary

Calculate overall completion rates:

# Function to calculate completion rates
calculate_completion_rates <- function(project, forms) {
  data <- project$data
  
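  rates <- data.frame(
    form = forms,
    n_total = nrow(data),
    # Pre-fill with NA so forms without a `_complete` field still get a full row
    n_complete = NA_integer_,
    n_incomplete = NA_integer_,
    n_not_started = NA_integer_,
    pct_complete = NA_real_,
    stringsAsFactors = FALSE
  )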
  
  for (form in forms) {
    complete_field <- paste0(form, "_complete")
    
    if (complete_field %in% names(data)) {
      rates$n_complete[rates$form == form] <- sum(data[[complete_field]] == 2, na.rm = TRUE)
      rates$n_incomplete[rates$form == form] <- sum(data[[complete_field]] == 1, na.rm = TRUE)
      rates$n_not_started[rates$form == form] <- sum(
        is.na(data[[complete_field]]) | data[[complete_field]] == 0,
        na.rm = TRUE
      )
      rates$pct_complete[rates$form == form] <- round(
        rates$n_complete[rates$form == form] / nrow(data) * 100, 1
      )
    }
  }
  
  return(rates)
}

# Calculate rates for key forms
forms_to_check <- c("demographics", "baseline_survey", "follow_up_1", 
                    "follow_up_2", "adverse_events")

completion_rates <- calculate_completion_rates(project, forms_to_check)
print(completion_rates)

# Visualize
if (requireNamespace("ggplot2", quietly = TRUE)) {
  ggplot(completion_rates, aes(x = form, y = pct_complete)) +
    geom_col(fill = "steelblue") +
    geom_text(aes(label = paste0(pct_complete, "%")), vjust = -0.5) +
    theme_minimal() +
    labs(
      title = "Form Completion Rates",
      x = "Form",
      y = "% Complete"
    ) +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    ylim(0, 110)
}

Longitudinal Study Reports

Event Completion Summary

For longitudinal projects, track completion across events:

# Get event completion summary
event_summary <- get_event_completion_summary(project)
print(event_summary)

Example output:

Event Completion Summary
━━━━━━━━━━━━━━━━━━━━━━
Events: 4
Total participants tracked: 150
Average completion rate: 68.5%

  event                    n_participants  demographics_complete  baseline_complete  ...
  ───────────────────────  ──────────────  ────────────────────  ─────────────────  ────
  baseline_arm_1           150             148                    145                ...
  month_3_followup_arm_1   142             140                    138                ...
  month_6_followup_arm_1   128             125                    122                ...
  month_12_followup_arm_1  98              95                     92                 ...

Event Completion for Specific Forms

Focus on particular forms of interest:

# Track specific forms across events
event_summary <- get_event_completion_summary(
  project,
  forms = c("follow_up_survey", "quality_of_life", "adverse_events")
)

# Long format for easier analysis
event_summary_long <- get_event_completion_summary(
  project,
  forms = c("follow_up_survey", "quality_of_life"),
  format = "long"
)

print(event_summary_long)
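
The difference between the two layouts is easiest to see on a small example. This base R sketch reshapes a toy wide summary (one column per form) into one row per event-form pair; the column names `form` and `n_complete` are assumptions and may not match what sardine returns with format = "long".

```r
# Toy wide summary: one row per event, one column per form
wide <- data.frame(
  event = c("baseline_arm_1", "month_3_followup_arm_1"),
  follow_up_survey = c(145, 138),
  quality_of_life = c(140, 130),
  stringsAsFactors = FALSE
)

form_cols <- setdiff(names(wide), "event")

# Stack the form columns: one row per event-form pair
long <- data.frame(
  event = rep(wide$event, times = length(form_cols)),
  form = rep(form_cols, each = nrow(wide)),
  n_complete = unlist(wide[form_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)
print(long)
```

The long layout is usually the more convenient input for ggplot2, because the form name becomes a variable that can be mapped to colour or facets.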

Retention Analysis

Track participant retention over time:

# Calculate retention from baseline
retention <- get_retention_summary(project)
print(retention)

Example output:

Retention Summary
━━━━━━━━━━━━━━━━━
Baseline participants: 150
Events tracked: 4
Final retention: 98 (65.3%)

  event                    n_baseline  n_retained  n_lost  retention_rate  attrition_rate
  ───────────────────────  ──────────  ──────────  ──────  ──────────────  ──────────────
  baseline_arm_1           150         150         0       100.0%          0.0%
  month_3_followup_arm_1   150         142         8       94.7%           5.3%
  month_6_followup_arm_1   150         128         22      85.3%           14.7%
  month_12_followup_arm_1  150         98          52      65.3%           34.7%
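
The rates in this table follow directly from the counts: the retention rate is n_retained divided by n_baseline, and the attrition rate is its complement. The check below recomputes them from the example counts in base R; the object name is hypothetical.

```r
events <- c("baseline_arm_1", "month_3_followup_arm_1",
            "month_6_followup_arm_1", "month_12_followup_arm_1")
n_retained <- c(150, 142, 128, 98)
n_baseline <- n_retained[1]

retention_check <- data.frame(
  event = events,
  n_retained = n_retained,
  n_lost = n_baseline - n_retained,
  retention_rate = round(n_retained / n_baseline * 100, 1),
  stringsAsFactors = FALSE
)
# Attrition is the complement of retention
retention_check$attrition_rate <- round(100 - retention_check$retention_rate, 1)
print(retention_check)
```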

Retention Based on Form Completion

Define retention by specific form completion:

# Retention based on follow-up survey completion
retention_survey <- get_retention_summary(
  project,
  definition = "complete_form",
  form = "follow_up_survey"
)
print(retention_survey)

# Compare different retention definitions
retention_any <- get_retention_summary(project, definition = "any_data")
retention_complete <- get_retention_summary(
  project,
  definition = "complete_form",
  form = "follow_up_survey"
)

# Visualize retention curves
if (requireNamespace("ggplot2", quietly = TRUE)) {
  combined <- bind_rows(
    retention_any %>% mutate(definition = "Any Data"),
    retention_complete %>% mutate(definition = "Survey Complete")
  )
  
  ggplot(combined, aes(x = event, y = retention_rate, color = definition, group = definition)) +
    geom_line(linewidth = 1) +
    geom_point(size = 3) +
    theme_minimal() +
    labs(
      title = "Participant Retention by Definition",
      x = "Event",
      y = "Retention Rate (%)",
      color = "Definition"
    ) +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    ylim(0, 100)
}

Enrollment Tracking

Enrollment Timeline

Track when participants enrolled:

# Generate enrollment timeline
generate_enrollment_report <- function(project, date_field = "enrollment_date") {
  data <- project$data
  
  if (!date_field %in% names(data)) {
    cli::cli_alert_warning("Field {date_field} not found")
    return(NULL)
  }
  
  enrollment_data <- data %>%
    filter(!is.na(!!sym(date_field))) %>%
    mutate(
      enrollment_date = as.Date(!!sym(date_field)),
      enrollment_month = format(enrollment_date, "%Y-%m")
    ) %>%
    count(enrollment_month, name = "enrolled") %>%
    arrange(enrollment_month) %>%
    mutate(cumulative = cumsum(enrolled))
  
  cli::cli_h2("Enrollment Report")
  cli::cli_text("Total enrolled: {sum(enrollment_data$enrolled)}")
  cli::cli_text("Enrollment period: {min(enrollment_data$enrollment_month)} to {max(enrollment_data$enrollment_month)}")
  
  print(enrollment_data)
  
  # Plot
  if (requireNamespace("ggplot2", quietly = TRUE)) {
    p <- ggplot(enrollment_data, aes(x = enrollment_month, y = cumulative)) +
      geom_line(group = 1, color = "steelblue", linewidth = 1) +
      geom_point(size = 3, color = "steelblue") +
      theme_minimal() +
      labs(
        title = "Cumulative Enrollment Over Time",
        x = "Month",
        y = "Cumulative Participants"
      ) +
      theme(axis.text.x = element_text(angle = 45, hjust = 1))
    
    print(p)
  }
  
  return(enrollment_data)
}

enrollment <- generate_enrollment_report(project)

Enrollment vs Target

Compare actual vs target enrollment:

# Compare enrollment to target
check_enrollment_progress <- function(project, target_n, target_date = NULL,
                                     date_field = "enrollment_date") {
  data <- project$data
  
  if (!date_field %in% names(data)) {
    cli::cli_alert_warning("Field {date_field} not found")
    return(NULL)
  }
  
  # Current enrollment
  current_n <- sum(!is.na(data[[date_field]]))
  pct_enrolled <- round(current_n / target_n * 100, 1)
  
  cli::cli_h2("Enrollment Progress")
  cli::cli_text("Current enrollment: {current_n} / {target_n} ({pct_enrolled}%)")
  cli::cli_text("Participants remaining: {target_n - current_n}")
  
  # Time-based projections if target date provided
  if (!is.null(target_date)) {
    target_date <- as.Date(target_date)
    days_remaining <- as.numeric(target_date - Sys.Date())
    
    enrollment_dates <- as.Date(data[[date_field]][!is.na(data[[date_field]])])
    # Use at least one elapsed day to avoid dividing by zero on the first day
    days_elapsed <- max(as.numeric(Sys.Date() - min(enrollment_dates)), 1)
    enrollment_rate <- current_n / days_elapsed  # per day
    
    projected_final_n <- round(current_n + (enrollment_rate * days_remaining))
    
    cli::cli_text("Days remaining: {days_remaining}")
    cli::cli_text("Current enrollment rate: {round(enrollment_rate, 2)} per day")
    cli::cli_text("Projected final enrollment: {projected_final_n}")
    
    if (projected_final_n < target_n) {
      cli::cli_alert_warning("Projected to fall short by {target_n - projected_final_n} participants")
      needed_rate <- (target_n - current_n) / days_remaining
      cli::cli_text("Required rate: {round(needed_rate, 2)} per day")
    } else {
      cli::cli_alert_success("On track to meet enrollment target")
    }
  }
  
  return(list(
    current_n = current_n,
    target_n = target_n,
    pct_enrolled = pct_enrolled
  ))
}

# Check progress
enrollment_progress <- check_enrollment_progress(
  project,
  target_n = 200,
  target_date = "2026-06-30"
)
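
The projection in check_enrollment_progress() is linear: it assumes enrollment continues at the average daily rate observed so far. The sketch below isolates that arithmetic with fixed dates, so the result does not depend on Sys.Date(). The function name, argument names, and numbers are hypothetical.

```r
# Hypothetical helper: linear enrollment projection from fixed dates
project_enrollment <- function(current_n, target_n, first_enrolled,
                               today, target_date) {
  days_elapsed <- as.numeric(as.Date(today) - as.Date(first_enrolled))
  days_remaining <- as.numeric(as.Date(target_date) - as.Date(today))
  # Guard against dividing by zero on the first day or after the deadline
  if (days_elapsed <= 0 || days_remaining <= 0) {
    stop("Need at least one elapsed day and one remaining day")
  }
  rate <- current_n / days_elapsed
  list(
    rate_per_day = rate,
    projected_final_n = round(current_n + rate * days_remaining),
    needed_rate = (target_n - current_n) / days_remaining
  )
}

proj <- project_enrollment(
  current_n = 90, target_n = 200,
  first_enrolled = "2025-01-01",
  today = "2025-06-30",       # 180 days after the first enrollment
  target_date = "2025-10-08"  # 100 days after `today`
)
str(proj)
```

With these inputs the observed rate is 0.5 participants per day and the projection reaches 140, which is 60 short of the target; meeting it would require 1.1 participants per day.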

Automated Monitoring

Daily Quality Check

Set up daily automated checks:

# Daily quality monitoring function
daily_quality_check <- function(project, output_dir = "daily_reports") {
  
  timestamp <- format(Sys.Date(), "%Y-%m-%d")
  
  cli::cli_h1("Daily Quality Check - {timestamp}")
  
  # 1. New records check
  data <- project$data
  id_field <- project$id_field
  n_records <- nrow(data)
  
  cli::cli_h2("Data Summary")
  cli::cli_text("Total records: {n_records}")
  
  # 2. Missing data check
  missing_report <- analyze_missing_data(project)
  critical_missing <- missing_report %>%
    filter(missing_percent > 20)  # More than 20% missing
  
  if (nrow(critical_missing) > 0) {
    cli::cli_alert_warning("{nrow(critical_missing)} fields with >20% missing data")
    print(critical_missing)
  } else {
    cli::cli_alert_success("No critical missing data issues")
  }
  
  # 3. Data type validation
  type_issues <- validate_data_types(project)
  if (nrow(type_issues) > 0) {
    cli::cli_alert_warning("{nrow(type_issues)} fields with data type issues")
  } else {
    cli::cli_alert_success("No data type issues detected")
  }
  
  # 4. Completion status
  if (!is.null(project$data$redcap_event_name)) {
    # Longitudinal
    retention <- get_retention_summary(project)
    latest_retention <- retention %>% slice(n()) %>% pull(retention_rate)
    cli::cli_text("Current retention: {latest_retention}%")
  } else {
    # Non-longitudinal  
    # Calculate average form completion
    complete_fields <- grep("_complete$", names(data), value = TRUE)
    if (length(complete_fields) > 0) {
      avg_completion <- mean(
        sapply(complete_fields, function(f) {
          sum(data[[f]] == 2, na.rm = TRUE) / nrow(data)
        })
      ) * 100
      cli::cli_text("Average form completion: {round(avg_completion, 1)}%")
    }
  }
  
  # 5. Save summary
  if (!dir.exists(output_dir)) {
    dir.create(output_dir, recursive = TRUE)
  }
  
  summary_file <- file.path(output_dir, paste0("quality_check_", timestamp, ".txt"))
  # writeLines() avoids leaving a sink() open if an error occurs mid-write
  writeLines(
    c(
      "Daily Quality Check",
      "===================",
      paste("Date:", timestamp),
      paste("Records:", n_records),
      paste("Critical missing fields:", nrow(critical_missing)),
      paste("Data type issues:", nrow(type_issues))
    ),
    summary_file
  )
  
  cli::cli_alert_success("Summary saved to {summary_file}")
  
  return(invisible(list(
    n_records = n_records,
    critical_missing = critical_missing,
    type_issues = type_issues
  )))
}

# Run daily check
daily_quality_check(project)

Weekly Progress Report

Generate weekly summary reports:

# Weekly progress report
generate_weekly_report <- function(project, output_dir = "weekly_reports") {
  
  week_ending <- format(Sys.Date(), "%Y-%m-%d")
  
  cli::cli_h1("Weekly Progress Report - Week Ending {week_ending}")
  
  # 1. Data quality
  quality_report <- generate_data_quality_report(project)
  
  # 2. Event completion (if longitudinal)
  if (!is.null(project$data$redcap_event_name)) {
    event_summary <- get_event_completion_summary(project)
    cli::cli_h2("Event Completion")
    print(event_summary)
  }
  
  # 3. Form completion rates
  data <- project$data
  complete_fields <- grep("_complete$", names(data), value = TRUE)
  forms <- gsub("_complete$", "", complete_fields)
  
  completion_rates <- calculate_completion_rates(project, forms)
  cli::cli_h2("Form Completion Rates")
  print(completion_rates)
  
  # 4. Retention (if longitudinal)
  if (!is.null(project$data$redcap_event_name)) {
    retention <- get_retention_summary(project)
    cli::cli_h2("Retention Summary")
    print(retention)
  }
  
  # 5. Export reports
  if (!dir.exists(output_dir)) {
    dir.create(output_dir, recursive = TRUE)
  }
  
  # Save completion rates
  write.csv(
    completion_rates,
    file.path(output_dir, paste0("completion_rates_", week_ending, ".csv")),
    row.names = FALSE
  )
  
  # Save event summary if available
  if (!is.null(project$data$redcap_event_name)) {
    write.csv(
      event_summary,
      file.path(output_dir, paste0("event_completion_", week_ending, ".csv")),
      row.names = FALSE
    )
    
    write.csv(
      retention,
      file.path(output_dir, paste0("retention_", week_ending, ".csv")),
      row.names = FALSE
    )
  }
  
  cli::cli_alert_success("Reports saved to {output_dir}")
  
  return(invisible(list(
    quality_report = quality_report,
    completion_rates = completion_rates
  )))
}

# Run weekly report
weekly_report <- generate_weekly_report(project)

Exporting Reports

Multiple Format Export

Export reports in various formats:

# Export function for multiple formats
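export_report <- function(data, base_filename) {
  
  # Make sure the target directory exists before writing any files
  dir.create(dirname(base_filename), recursive = TRUE, showWarnings = FALSE)
  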
  # CSV for data analysis
  write.csv(data, paste0(base_filename, ".csv"), row.names = FALSE)
  
  # RDS for R objects
  saveRDS(data, paste0(base_filename, ".rds"))
  
  # Excel if available
  if (requireNamespace("writexl", quietly = TRUE)) {
    writexl::write_xlsx(data, paste0(base_filename, ".xlsx"))
  }
  
  # JSON for web/API
  if (requireNamespace("jsonlite", quietly = TRUE)) {
    jsonlite::write_json(data, paste0(base_filename, ".json"), pretty = TRUE)
  }
  
  cli::cli_alert_success("Report exported to multiple formats: {base_filename}")
}

# Use function
completion_report <- calculate_completion_rates(
  project, 
  c("demographics", "baseline_survey", "follow_up")
)

export_report(completion_report, "reports/completion_summary")
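
# Scheduled report export
# NOTE: the opening lines of this function are missing from the source. The
# signature, defaults, and setup lines down to "# 1." are assumptions.
run_scheduled_report <- function(project = redcap_project(),
                                 output_dir = "scheduled_reports") {
  
  timestamp <- format(Sys.Date(), "%Y-%m-%d")
  
  if (!dir.exists(output_dir)) {
    dir.create(output_dir, recursive = TRUE)
  }
  
  # 1. Completion report
  completion_data <- get_participant_completion(project)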
  completion_summary <- create_completion_summary(completion_data)
  write.csv(completion_summary, 
            file.path(output_dir, paste0("completion_", timestamp, ".csv")),
            row.names = FALSE)
  
  # 2. Missing data report
  missing_data <- generate_missing_data_report(project)
  if (!is.null(missing_data) && nrow(missing_data) > 0) {
    write.csv(missing_data,
              file.path(output_dir, paste0("missing_data_", timestamp, ".csv")),
              row.names = FALSE)
  }
  
  # 3. Quick summary
  quick_summary <- data.frame(
    report_date = Sys.Date(),
    total_records = nrow(project$data),
    total_fields = ncol(project$data),
    avg_completion_rate = round(mean(completion_summary$completion_rate), 1),
    stringsAsFactors = FALSE
  )
  
  write.csv(quick_summary,
            file.path(output_dir, paste0("summary_", timestamp, ".csv")),
            row.names = FALSE)
  
  cat("Reports saved to:", output_dir, "\n")
  cat("Files generated:\n")
  cat("  - completion_", timestamp, ".csv\n", sep = "")
  cat("  - missing_data_", timestamp, ".csv\n", sep = "")
  cat("  - summary_", timestamp, ".csv\n", sep = "")
  
  return(invisible(list(
    completion = completion_summary,
    missing_data = missing_data,
    summary = quick_summary
  )))
}

# Run scheduled report
# scheduled_results <- run_scheduled_report()

Best Practices

1. Regular Monitoring

Establish a consistent monitoring schedule:

# Example monitoring schedule
monitoring_schedule <- function(project) {
  
  # ISO weekday number (1 = Monday ... 7 = Sunday); unlike weekdays(),
  # this does not depend on the session locale
  day_of_week <- as.integer(format(Sys.Date(), "%u"))
  
  # Daily checks (Monday-Friday)
  if (day_of_week <= 5) {
    cli::cli_alert_info("Running daily quality check...")
    daily_quality_check(project)
  }
  
  # Weekly reports (Friday)
  if (day_of_week == 5) {
    cli::cli_alert_info("Running weekly progress report...")
    generate_weekly_report(project)
  }
  
  # Monthly comprehensive report (1st of month)
  if (format(Sys.Date(), "%d") == "01") {
    cli::cli_alert_info("Running monthly comprehensive report...")
    quality_report <- generate_data_quality_report(project)
    
    if (!is.null(project$data$redcap_event_name)) {
      retention <- get_retention_summary(project)
      event_summary <- get_event_completion_summary(project)
    }
  }
}

# Run scheduled checks
monitoring_schedule(project)

2. Alert Thresholds

Set up automated alerts:

# Define quality thresholds and check
check_quality_thresholds <- function(project) {
  
  alerts <- list()
  
  # Missing data threshold
  missing_report <- analyze_missing_data(project)
  critical_missing <- missing_report %>% filter(missing_percent > 15)
  
  if (nrow(critical_missing) > 0) {
    alerts$missing_data <- paste(
      nrow(critical_missing), 
      "fields exceed 15% missing data threshold"
    )
  }
  
  # Retention threshold (if longitudinal)
  if (!is.null(project$data$redcap_event_name)) {
    retention <- get_retention_summary(project)
    latest_retention <- retention %>% slice(n()) %>% pull(retention_rate)
    
    if (latest_retention < 70) {
      alerts$retention <- paste(
        "Retention dropped to", latest_retention, "%"
      )
    }
  }
  
  # Completion rate threshold
  data <- project$data
  complete_fields <- grep("_complete$", names(data), value = TRUE)
  
  if (length(complete_fields) > 0) {
    avg_completion <- mean(
      sapply(complete_fields, function(f) sum(data[[f]] == 2, na.rm = TRUE) / nrow(data))
    ) * 100
    
    if (avg_completion < 60) {
      alerts$completion <- paste(
        "Average completion rate is", round(avg_completion, 1), "%"
      )
    }
  }
  
  # Report alerts
  if (length(alerts) > 0) {
    cli::cli_h2("Quality Alerts")
    for (alert in names(alerts)) {
      cli::cli_alert_warning("{alert}: {alerts[[alert]]}")
    }
  } else {
    cli::cli_alert_success("All quality metrics within acceptable ranges")
  }
  
  return(alerts)
}

# Check thresholds
alerts <- check_quality_thresholds(project)
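
As the number of rules grows, it can be easier to keep the limits in a small table than in separate if statements. The sketch below is one possible layout, using the same limits as the function above (15% missing, 70% retention, 60% completion); the metric names and values are hypothetical.

```r
# Hypothetical metric values for one monitoring run
metrics <- c(max_missing_pct = 28.0, retention_pct = 65.3, avg_completion_pct = 72.4)

rules <- data.frame(
  metric = names(metrics),
  limit = c(15, 70, 60),
  direction = c("above", "below", "below"),  # side of the limit that triggers an alert
  stringsAsFactors = FALSE
)

rules$value <- unname(metrics[rules$metric])
rules$alert <- ifelse(
  rules$direction == "above",
  rules$value > rules$limit,
  rules$value < rules$limit
)

triggered <- rules$metric[rules$alert]
print(rules)
```

In this example the missing-data and retention rules fire, while average completion stays above its limit.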

3. Version Control

Track report history:

# Add metadata to reports
add_report_metadata <- function(report_data, report_type) {
  
  metadata <- list(
    report_type = report_type,
    generated_at = Sys.time(),
    sardine_version = as.character(packageVersion("sardine")),
    r_version = R.version.string,
    n_records = if (is.data.frame(report_data)) nrow(report_data) else NA
  )
  
  # Combine metadata with report
  result <- list(
    metadata = metadata,
    data = report_data
  )
  
  class(result) <- c("sardine_report", class(result))
  return(result)
}

# Use metadata wrapper
completion_report <- calculate_completion_rates(project, c("demographics", "baseline"))
completion_with_meta <- add_report_metadata(completion_report, "form_completion")

# Save with metadata
saveRDS(completion_with_meta, "reports/completion_with_metadata.rds")
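
Because the wrapper sets a class, an S3 print method can show the metadata whenever a saved report is reloaded and printed. The method below is a hypothetical sketch, not part of sardine, and the toy object only mimics the structure that add_report_metadata() returns.

```r
# Hypothetical S3 print method for the "sardine_report" class created above
print.sardine_report <- function(x, ...) {
  cat("Report type: ", x$metadata$report_type, "\n", sep = "")
  cat("Generated at: ", format(x$metadata$generated_at), "\n", sep = "")
  cat("Rows: ", x$metadata$n_records, "\n", sep = "")
  print(x$data, ...)
  invisible(x)
}

# Toy object with the same structure as add_report_metadata() returns
toy_report <- structure(
  list(
    metadata = list(
      report_type = "form_completion",
      generated_at = as.POSIXct("2025-01-01 09:00:00", tz = "UTC"),
      n_records = 2L
    ),
    data = data.frame(
      form = c("demographics", "baseline"),
      pct_complete = c(98.7, 96.7)
    )
  ),
  class = c("sardine_report", "list")
)

printed <- capture.output(print(toy_report))
cat(printed, sep = "\n")
```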

4. Documentation

Document your monitoring strategy:

# Create monitoring documentation
document_monitoring_plan <- function(output_file = "monitoring_plan.md") {
  
  plan <- '
# Data Monitoring Plan

## Daily Checks (Weekdays)
- Missing data analysis
- Data type validation
- New record count
- Critical field completion

## Weekly Reports (Fridays)
- Form completion rates
- Event completion summary (longitudinal)
- Retention analysis (longitudinal)
- Quality metric trends

## Monthly Reports (1st of Month)
- Comprehensive data quality report
- Enrollment progress vs targets
- Retention trends
- Data completeness analysis

## Alert Thresholds
- Missing data: >15% in any field
- Retention: <70% at any timepoint
- Average completion: <60%
- Data type issues: Any detected

## Report Storage
- Daily: `daily_reports/`
- Weekly: `weekly_reports/`
- Monthly: `monthly_reports/`

## Contact Information
- Data Manager: [Name]
- Principal Investigator: [Name]
- REDCap Administrator: [Name]
  '
  
  writeLines(plan, output_file)
  cli::cli_alert_success("Monitoring plan documented in {output_file}")
}

# Create documentation
document_monitoring_plan()

Summary

The sardine package provides comprehensive tools for monitoring data quality and study progress:

Data Quality Tools

  • Missing data analysis: Identify and quantify missing data patterns
  • Data type validation: Detect format inconsistencies and entry errors
  • Custom quality checks: Create project-specific validation rules
  • Outlier detection: Identify extreme or implausible values

Completion Tracking

  • Form completion status: Track which forms participants have completed
  • Completion rates: Calculate overall and form-specific completion percentages
  • Incomplete identification: Find participants needing follow-up

Longitudinal Studies

  • Event completion: Monitor progress across study timepoints
  • Retention analysis: Track participant retention and attrition
  • Multiple definitions: Flexible retention definitions (any data vs specific forms)

Automated Monitoring

  • Scheduled reports: Daily, weekly, and monthly automated checks
  • Quality thresholds: Alert system for critical metrics
  • Multiple export formats: CSV, RDS, Excel, and JSON for different use cases

Key Benefits

  • Proactive: Identify issues early before they impact analysis
  • Efficient: Automated reports reduce manual effort
  • Comprehensive: Combined view of quality, completion, and retention
  • Flexible: Customizable to your project’s specific needs
  • Reproducible: Versioned reports with metadata

This monitoring infrastructure helps maintain high data quality, ensures study milestones are met, and provides transparency to stakeholders and oversight committees.