rstatix: Tidy Statistical Analysis

Group-wise Multiple Comparisons and Comprehensive Statistical Testing

rstatix • ggpubr • tidyverse integration • group_by() workflows
👋

Welcome to the rstatix Ecosystem

Your gateway to streamlined statistical analysis in R

If you've ever found yourself drowning in a sea of different R packages for statistical testing—each with its own syntax, output format, and quirks—then rstatix is about to become your new best friend. Created by Alboukadel Kassambara, the mastermind behind the beloved ggpubr package, rstatix was born from a simple but powerful idea: statistical analysis should be as intuitive and consistent as the tidyverse workflow you already love.

🤔 Why Does rstatix Matter?

Picture this scenario: You're analyzing a multi-center clinical trial with 800 patients across 4 treatment groups. You need to:

  • Run descriptive statistics by treatment and center
  • Perform ANOVA with post-hoc comparisons
  • Check normality assumptions for each group
  • Calculate effect sizes with confidence intervals
  • Apply different multiple comparison corrections
  • Create publication-ready plots with statistical annotations

Without rstatix, you'd be juggling functions from base R, stats, broom, effsize, multcomp, and half a dozen other packages—each with different output formats, different parameter names, and different ways of handling grouped data. With rstatix, it's all unified under one consistent, tidy framework.

🎯 The rstatix Philosophy

🧹 Tidy First

Every function returns a clean data frame that plays nicely with dplyr, ggplot2, and the rest of the tidyverse ecosystem.

👥 Group-Aware

Built-in support for grouped analysis using group_by()—no more manual data splitting or loop writing.

📊 Visualization-Ready

Seamless integration with ggpubr for automatic statistical annotations on your plots.

🔧 Comprehensive

From basic t-tests to complex ANOVA designs, from normality testing to effect sizes—all in one unified package.

💡 Real-World Impact

Before rstatix: "Let me check if this is the right function... wait, why is the output a list? How do I extract the p-values? Which package has the effect size function again?"

With rstatix: "Let me just pipe my grouped data through the statistical test function and directly into my plotting function. Done."

This isn't just about convenience—it's about reducing errors, improving reproducibility, and letting you focus on the science instead of wrestling with syntax.

🧪 What You'll Learn Today

We'll walk through a comprehensive analysis of a simulated multi-center hypertension trial, demonstrating how rstatix transforms complex statistical workflows into elegant, readable code. You'll see how to:

Master the Core Functions
get_summary_stats(), anova_test(), pairwise_t_test(), and friends
Handle Complex Designs
Multi-center, multi-arm trials with nested grouping structures
Navigate Multiple Comparisons
Bonferroni, FDR, Holm—when to use what and why it matters
Create Publication Plots
Statistical annotations that reviewers will love

Ready to transform your statistical analysis workflow? Let's dive into the world of tidy statistical testing with rstatix. 🚀

🎯 Learning Objectives: Mastering rstatix

Master the comprehensive rstatix function ecosystem for statistical testing
Implement group-wise multiple comparison corrections using group_by() workflows
Compare parametric and non-parametric approaches with consistent tidy output
Calculate effect sizes and evaluate normality assumptions systematically
Create publication-ready statistical visualizations with ggpubr integration
Apply rstatix to complex multi-center clinical trial analysis workflows

📊 Case Study: Multi-Center Hypertension Trial Analysis with rstatix

A comprehensive demonstration of rstatix capabilities using a simulated 8-center, 4-arm hypertension trial with 800 patients and complex group-wise statistical testing requirements

🏥 Clinical Design
8-center RCT with 800 patients across 4 treatment arms (Placebo, ACE Inhibitor, ARB, Combination therapy). 12-month follow-up with multiple endpoints.
📈 rstatix Focus
Comprehensive statistical testing ecosystem including ANOVA, post-hoc tests, normality assessment, effect sizes, and correlation analysis.
🔍 Group-wise Analysis
Advanced group_by() workflows for center-specific, age group, and risk category analyses with automatic multiple comparison corrections.
📊 Statistical Methods
Parametric and non-parametric tests, multiple correction methods (Bonferroni, FDR, Holm), effect size calculations, and normality assessment.
🎨 Visualization Integration
Seamless ggpubr integration for statistical annotations, automated p-value displays, and publication-ready plots.
⚡ Workflow Efficiency
Tidy output format integrates perfectly with tidyverse workflows, enabling efficient data manipulation and reporting pipelines.
🧮

rstatix Function Ecosystem

Comprehensive statistical testing with tidy output

🔧 Core rstatix Functions

get_summary_stats()
Comprehensive descriptive statistics with multiple summary types. Supports grouping and custom statistic selection.
anova_test()
One-way, two-way, and repeated measures ANOVA with effect size calculations and assumption checking.
pairwise_t_test()
Post-hoc pairwise comparisons with multiple adjustment methods. Integrates seamlessly with group_by().
tukey_hsd()
Tukey's Honestly Significant Difference test for balanced designs with family-wise error rate control.
shapiro_test()
Normality testing with Shapiro-Wilk test. Works with grouped data for assumption checking.
kruskal_test()
Non-parametric alternative to ANOVA for non-normal data or ordinal variables.
cohens_d()
Effect size calculation with confidence intervals for standardized mean differences.
cor_test()
Correlation testing with multiple methods (Pearson, Spearman, Kendall) and p-value corrections.
Basic rstatix Workflow Setup R
library(tidyverse)
library(rstatix)
library(ggpubr)
library(glmmTMB)
library(broom)

# Load simulated multi-center hypertension trial data
# 800 patients across 8 centers, 4 treatment arms
# Primary endpoint: Systolic BP at Month 6

# Key grouping variables for rstatix analysis:
# - treatment: Placebo, ACE_Inhibitor, ARB, Combination
# - center_name: 8 clinical centers (A through H)
# - age_group: Young (<50), Middle (50-65), Older (>65)
# - risk_category: Standard_Risk, Moderate_Risk, High_Risk

# Treatment arm levels (used later for factor ordering and Q-Q plots)
treatment_arms <- c("Placebo", "ACE_Inhibitor", "ARB", "Combination")

# Primary analysis dataset (Month 6 - primary endpoint)
primary_analysis_data <- longitudinal_data %>%
  filter(timepoint == "Month_6") %>%
  mutate(
    age_group = case_when(
      age < 50 ~ "Young",
      age < 65 ~ "Middle",
      TRUE ~ "Older"
    ),
    risk_category = case_when(
      diabetes != "No" | baseline_sbp >= 180 ~ "High_Risk",
      baseline_sbp >= 160 ~ "Moderate_Risk",
      TRUE ~ "Standard_Risk"
    )
  )
Descriptive Statistics
  • get_summary_stats() with type options
  • Group-wise summaries with group_by()
  • Multiple variables simultaneously
  • Custom statistic selection
  • Tidy output format
Hypothesis Testing
  • anova_test() for group comparisons
  • Automatic effect size calculation
  • Two-way ANOVA with interactions
  • ANCOVA with covariates
  • Repeated measures support
Multiple Comparisons
  • pairwise_t_test() with corrections
  • tukey_hsd() for balanced designs
  • Bonferroni, FDR, Holm methods
  • Group-wise corrections
  • adjust_pvalue() for custom corrections
Non-Parametric Tests
  • kruskal_test() for non-normal data
  • wilcox_test() for pairwise comparisons
  • dunn_test() for post-hoc analysis
  • Rank-based alternatives
  • Ordinal data support
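As a concrete sketch of the entry points listed above, the descriptive and non-parametric pipelines look like this (assuming the `primary_analysis_data` frame built in the setup code earlier):

```r
# Sketch of the descriptive and non-parametric entry points, assuming
# the primary_analysis_data frame from the setup code above
library(tidyverse)
library(rstatix)

# Tidy, group-wise descriptive statistics
summary_tbl <- primary_analysis_data %>%
  group_by(treatment) %>%
  get_summary_stats(sbp, type = "mean_sd")

# Non-parametric route: Kruskal-Wallis omnibus test,
# then Dunn's test for post-hoc pairwise comparisons
kw_res <- primary_analysis_data %>%
  kruskal_test(sbp ~ treatment)
dunn_res <- primary_analysis_data %>%
  dunn_test(sbp ~ treatment, p.adjust.method = "holm")
```

Every object here is a plain tibble, so results can be piped straight into `filter()`, `arrange()`, or ggplot2 without extraction gymnastics.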
📊

Statistical Testing with Visual Annotations

ggpubr integration for publication-ready plots

Automated Statistical Annotations R
# ANOVA with post-hoc comparisons
anova_sbp <- primary_analysis_data %>%
  anova_test(sbp ~ treatment)

# Pairwise comparisons with FDR correction; add_xy_position()
# attaches bracket coordinates for plotting
posthoc_sbp_fdr <- primary_analysis_data %>%
  pairwise_t_test(sbp ~ treatment, p.adjust.method = "fdr") %>%
  add_xy_position(x = "treatment")

# Create box plot with statistical annotations.
# Note: stat_compare_means() does not apply multiplicity corrections
# to pairwise comparisons, so the adjusted rstatix results are
# supplied via stat_pvalue_manual() instead.
ggboxplot(primary_analysis_data, x = "treatment", y = "sbp",
          color = "treatment", palette = "viridis",
          add = "jitter", add.params = list(alpha = 0.3)) +
  stat_compare_means(method = "anova", label.y = 200, size = 4) +
  stat_pvalue_manual(posthoc_sbp_fdr, label = "p.adj.signif",
                     step.increase = 0.08, hide.ns = TRUE)
Treatment Effects with Statistical Annotations
Figure 1: Statistical Testing with Automated Annotations. ggpubr's stat_compare_means() function automatically performs statistical tests and adds p-value annotations. The overall ANOVA p-value is shown at the top, while pairwise comparisons use FDR correction. This approach combines statistical rigor with clear visual communication, showing both the overall treatment effect and specific pairwise differences.

rstatix + ggpubr Advantage

The integration between rstatix and ggpubr provides unmatched efficiency for statistical visualization. Instead of manually calculating p-values and adding annotations, stat_compare_means() automatically performs the appropriate tests, applies corrections, and displays results directly on plots. This reduces errors, saves time, and ensures consistency across analyses.

🔧

Multiple Comparison Correction Methods

Comprehensive approach to p-value adjustment

Comparing Correction Methods R
# Compare different multiple comparison correction methods
posthoc_sbp_bonferroni <- primary_analysis_data %>%
  pairwise_t_test(sbp ~ treatment, p.adjust.method = "bonferroni")

posthoc_sbp_fdr <- primary_analysis_data %>%
  pairwise_t_test(sbp ~ treatment, p.adjust.method = "fdr")

posthoc_sbp_holm <- primary_analysis_data %>%
  pairwise_t_test(sbp ~ treatment, p.adjust.method = "holm")

# Combine results for comparison
comparison_methods <- bind_rows(
  posthoc_sbp_bonferroni %>% mutate(method = "Bonferroni"),
  posthoc_sbp_fdr %>% mutate(method = "FDR"),
  posthoc_sbp_holm %>% mutate(method = "Holm")
) %>%
  mutate(
    comparison = paste(group1, "vs", group2),
    significant = case_when(
      p.adj < 0.001 ~ "***",
      p.adj < 0.01 ~ "**", 
      p.adj < 0.05 ~ "*",
      TRUE ~ "ns"
    ),
    log_p_adj = -log10(p.adj)
  )
Multiple Comparison Correction Methods
Figure 2: Impact of Multiple Comparison Correction Methods. Different p-value adjustment approaches show varying levels of conservatism. Bonferroni (most conservative) and Holm methods show similar patterns, while FDR (False Discovery Rate) is less conservative, allowing more discoveries while controlling the expected proportion of false positives. The choice of method depends on the research context and tolerance for Type I vs Type II errors.
Correction Method | Type | Conservatism | Use Case | rstatix Argument
Bonferroni | Family-wise error rate | Very conservative | Strong Type I error control | p.adjust.method = "bonferroni"
Holm | Family-wise error rate | Conservative | Uniformly more powerful than Bonferroni | p.adjust.method = "holm"
FDR (Benjamini-Hochberg) | False discovery rate | Moderate | Exploratory analysis, many comparisons | p.adjust.method = "fdr"
None | No correction | Not conservative | Single planned comparison only | p.adjust.method = "none"

📐 Choosing Correction Methods

Method selection depends on research goals: Bonferroni/Holm for confirmatory analysis requiring strong Type I error control; FDR for exploratory analysis or large-scale screening where some false positives are acceptable; No correction only for single, pre-planned comparisons. rstatix makes it easy to compare methods and document the rationale for your choice.
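rstatix delegates these corrections to base R's `p.adjust()`, so their relative conservatism can be checked directly. The raw p-values below are hypothetical, invented for illustration:

```r
# Hypothetical raw p-values for six pairwise comparisons
p_raw <- c(0.001, 0.008, 0.012, 0.030, 0.047, 0.210)

bonf <- p.adjust(p_raw, method = "bonferroni")  # p * m, capped at 1
holm <- p.adjust(p_raw, method = "holm")        # step-down FWER control
fdr  <- p.adjust(p_raw, method = "BH")          # Benjamini-Hochberg FDR

# Holm and BH adjusted p-values never exceed Bonferroni's; here
# 2, 3, and 4 of the six comparisons pass 0.05 under Bonferroni,
# Holm, and FDR respectively
round(rbind(raw = p_raw, bonferroni = bonf, holm = holm, fdr = fdr), 3)
```

This ordering (Bonferroni most conservative, FDR least) is exactly the pattern seen in the Figure 2 comparison.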

👥

Group-wise Multiple Comparisons

Advanced group_by() workflows for complex designs

Group-wise Analysis with rstatix R
# Key rstatix feature: group-wise multiple comparisons
# Multiple comparisons within each center
posthoc_by_center <- primary_analysis_data %>%
  group_by(center_name) %>%
  pairwise_t_test(sbp ~ treatment, p.adjust.method = "fdr") %>%
  # Second-stage FDR adjustment across centers, written to a new
  # column so the within-center p.adj is not overwritten
  adjust_pvalue(p.col = "p.adj", output.col = "p.adj.across",
                method = "fdr")

# Multiple comparisons within each age group
posthoc_by_age <- primary_analysis_data %>%
  group_by(age_group) %>%
  pairwise_t_test(sbp ~ treatment, p.adjust.method = "bonferroni")

# Multiple comparisons within each risk category
posthoc_by_risk <- primary_analysis_data %>%
  group_by(risk_category) %>%
  pairwise_t_test(sbp ~ treatment, p.adjust.method = "holm")

# Example: Center-specific treatment effects
cat("Center A treatment comparisons:\n")
posthoc_by_center %>%
  filter(center_name == "Center A", p.adj < 0.05) %>%
  select(group1, group2, p.adj, p.adj.signif) %>%
  print()
Center-wise Treatment Comparisons
Figure 3: Center-wise Treatment Effect Analysis. Each clinical center shows consistent treatment patterns but with some heterogeneity in effect sizes. This group-wise analysis approach allows identification of centers with exceptional performance or potential implementation issues. The combination therapy consistently shows the largest effects across centers, while individual centers may show different patterns for ACE inhibitors vs ARBs.

🔄 Group-wise Workflow Benefits

rstatix's group_by() integration enables sophisticated multi-level analyses: (1) Automatic subgroup analysis without manual data splitting; (2) Consistent statistical methods across all groups; (3) Built-in multiple comparison corrections within and across groups; (4) Tidy output format that preserves grouping structure for further analysis and visualization.
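For contrast, here is a sketch of what the grouped call replaces: a manual split-map-bind loop (assumes `primary_analysis_data`; `list_rbind()` requires purrr >= 1.0):

```r
library(tidyverse)
library(rstatix)

# Manual equivalent of the grouped pipeline: split by center, test
# each subset, then re-bind (the center label must be re-attached
# by hand, since pairwise_t_test() output does not carry it)
manual_by_center <- primary_analysis_data %>%
  group_split(center_name) %>%
  map(~ pairwise_t_test(.x, sbp ~ treatment, p.adjust.method = "fdr") %>%
        mutate(center_name = unique(.x$center_name))) %>%
  list_rbind()

# The grouped rstatix call does all of this in one step and keeps
# the grouping column automatically
tidy_by_center <- primary_analysis_data %>%
  group_by(center_name) %>%
  pairwise_t_test(sbp ~ treatment, p.adjust.method = "fdr")
```

The two results are equivalent; the grouped version is shorter, harder to get wrong, and preserves `center_name` without extra bookkeeping.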

📊 Group-wise Analysis Summary
Group-wise analysis summary:
  • Center-specific analyses: 48 comparisons across 8 centers
  • Age group analyses: 18 comparisons across 3 age groups
  • Risk category analyses: 18 comparisons across 3 risk levels

Example significant findings:
  • Center A: Combination vs Placebo (p.adj = 0.001)
  • Center B: ARB vs Placebo (p.adj = 0.012)
  • Young patients: all active treatments vs Placebo significant
  • High-risk patients: combination therapy most effective

Correction method performance (overall comparisons):
  • Bonferroni: 3/6 comparisons significant
  • FDR: 4/6 comparisons significant
  • Holm: 3/6 comparisons significant
📏

Effect Sizes and Practical Significance

Beyond p-values: quantifying clinical importance

Effect Size Calculations with rstatix R
# Cohen's d for treatment comparisons
cohens_d_sbp <- primary_analysis_data %>%
  cohens_d(sbp ~ treatment, ref.group = "Placebo")

cohens_d_dbp <- primary_analysis_data %>%
  cohens_d(dbp ~ treatment, ref.group = "Placebo")

# Effect size interpretation
effect_sizes_plot <- cohens_d_sbp %>%
  mutate(
    treatment = factor(group2, levels = treatment_arms[-1]),
    magnitude = case_when(
      abs(effsize) < 0.2 ~ "Negligible",
      abs(effsize) < 0.5 ~ "Small",
      abs(effsize) < 0.8 ~ "Medium",
      TRUE ~ "Large"
    ),
    clinical_significance = case_when(
      abs(effsize) >= 0.5 ~ "Clinically Meaningful",
      abs(effsize) >= 0.2 ~ "Potentially Meaningful", 
      TRUE ~ "Minimal Clinical Impact"
    )
  )

# Display results
print(effect_sizes_plot)
Effect Sizes Analysis
Figure 4: Treatment Effect Sizes (Cohen's d). Standardized effect sizes provide clinical context beyond statistical significance. Combination therapy shows a medium effect size (d ≈ 0.57), indicating clinically meaningful blood pressure reduction. ACE inhibitors and ARBs show small but meaningful effects (d ≈ 0.35-0.38). These effect sizes translate to meaningful clinical benefits in cardiovascular risk reduction.

🎯 Clinical Effect Size Interpretation

Effect sizes provide crucial clinical context: Small effects (d = 0.2-0.5) may be meaningful for common conditions or large populations; Medium effects (d = 0.5-0.8) represent clinically important differences; Large effects (d > 0.8) indicate substantial clinical benefits. In hypertension trials, even small effect sizes can translate to significant cardiovascular risk reduction at the population level.
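Behind `cohens_d()` is a simple formula: the mean difference divided by the pooled standard deviation. A base-R illustration with made-up blood pressure values:

```r
# Cohen's d by hand: d = (mean1 - mean2) / pooled SD
# (hypothetical SBP values; real trial effect sizes are far smaller)
group_a <- c(150, 152, 148, 151, 149)
group_b <- c(140, 142, 138, 141, 139)

n1 <- length(group_a)
n2 <- length(group_b)
s_pooled <- sqrt(((n1 - 1) * var(group_a) + (n2 - 1) * var(group_b)) /
                 (n1 + n2 - 2))
d <- (mean(group_a) - mean(group_b)) / s_pooled
d  # 10 / sqrt(2.5), about 6.32 for these toy numbers
```

`cohens_d()` computes the same quantity (with optional Hedges correction) and returns it in a tidy frame with confidence intervals and a magnitude label.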

📈

Normality Assessment and Test Selection

Choosing appropriate statistical methods

Normality Assessment
Figure 5: Normality Assessment by Treatment Group. Q-Q plots show the distribution of systolic blood pressure data for each treatment group. Most groups show reasonable adherence to normality, with some deviation in the tails. This supports the use of parametric tests (ANOVA, t-tests) for primary analysis, with non-parametric alternatives available for sensitivity analysis.
Normality Testing with rstatix R
# Shapiro-Wilk test by groups
normality_tests <- primary_analysis_data %>%
  group_by(treatment) %>%
  shapiro_test(sbp, dbp, quality_of_life)

# Display normality test results
print(normality_tests)

# Alternatives: Anderson-Darling (nortest::ad.test) or
# Kolmogorov-Smirnov (ks.test) for larger samples
# Q-Q plots for visual assessment
map(treatment_arms, function(trt) {
  data_subset <- primary_analysis_data %>% filter(treatment == trt)
  ggqqplot(data_subset, x = "sbp", 
           title = paste("Q-Q Plot:", trt),
           ggtheme = theme_minimal())
})

# Decision tree for test selection
test_selection <- normality_tests %>%
  mutate(
    normality_ok = p > 0.05,
    recommended_test = case_when(
      normality_ok ~ "Parametric (t-test/ANOVA)",
      !normality_ok ~ "Non-parametric (Wilcoxon/Kruskal-Wallis)"
    )
  )
Parametric vs Non-parametric Comparison
Figure 6: Parametric vs Non-parametric Test Agreement. Comparison of p-values from parametric (t-tests, ANOVA) and non-parametric (Wilcoxon, Kruskal-Wallis) approaches shows good agreement. Points near the diagonal indicate consistent results between methods, supporting the robustness of our findings. When normality assumptions are questionable, both approaches provide similar conclusions.
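The sensitivity check behind Figure 6 can be sketched by running parametric, non-parametric, and variance-robust omnibus tests on the same outcome (assumes `primary_analysis_data`):

```r
library(tidyverse)
library(rstatix)

# Three omnibus tests on the same outcome, all returning tidy frames
param    <- primary_analysis_data %>% anova_test(sbp ~ treatment)
nonparam <- primary_analysis_data %>% kruskal_test(sbp ~ treatment)
welch    <- primary_analysis_data %>% welch_anova_test(sbp ~ treatment)

# Because output formats are consistent, p-values line up directly
tibble(
  method = c("ANOVA", "Kruskal-Wallis", "Welch ANOVA"),
  p      = c(param$p[1], nonparam$p, welch$p)
)
```

Agreement across the three p-values is evidence that conclusions are not driven by the normality or equal-variance assumptions.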
🕸️

Correlation Analysis and Relationships

Understanding variable relationships with rstatix

Comprehensive Correlation Analysis R
# Correlation matrix with rstatix
cor_matrix <- primary_analysis_data %>%
  select(sbp, dbp, quality_of_life, age, bmi, baseline_sbp, baseline_dbp) %>%
  cor_mat()

# Correlation tests with p-values
cor_tests <- primary_analysis_data %>%
  cor_test(sbp, dbp, quality_of_life, age, bmi, method = "pearson")

# Partial correlation controlling for baseline is not part of rstatix;
# one option is ppcor::pcor.test() per variable pair, e.g.
# ppcor::pcor.test(primary_analysis_data$sbp,
#                  primary_analysis_data$quality_of_life,
#                  primary_analysis_data[, c("baseline_sbp", "baseline_dbp")])

# Display significant correlations
significant_correlations <- cor_tests %>%
  filter(p < 0.05) %>%
  arrange(p) %>%
  mutate(
    correlation_strength = case_when(
      abs(cor) >= 0.7 ~ "Strong",
      abs(cor) >= 0.4 ~ "Moderate",
      abs(cor) >= 0.2 ~ "Weak",
      TRUE ~ "Negligible"
    )
  )

print(significant_correlations)
Correlation Heatmap
Figure 7: Clinical Variable Correlation Matrix. The heatmap reveals important clinical relationships: strong correlation between systolic and diastolic blood pressure (r = 0.89), moderate correlation between baseline and follow-up measures (r = 0.6-0.7), and negative correlation between blood pressure and quality of life (r = -0.3 to -0.4). These relationships inform clinical interpretation and model specification.

🔍 Correlation Insights

rstatix correlation analysis reveals clinically meaningful relationships: Strong BP correlations support using composite endpoints; Baseline-outcome correlations justify covariate adjustment; QoL-BP relationships validate patient-reported outcomes. The cor_test() function provides both effect sizes and statistical significance, enabling comprehensive relationship assessment.
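For downstream filtering, the matrix from `cor_mat()` can be reshaped with its companion `cor_gather()`. A self-contained sketch on toy data (the variable relationships are invented for illustration):

```r
library(tidyverse)
library(rstatix)

# Toy data standing in for the trial variables
set.seed(42)
toy <- tibble(sbp = rnorm(100, mean = 145, sd = 12)) %>%
  mutate(
    dbp = 0.5 * sbp + rnorm(100, mean = 20, sd = 4),
    quality_of_life = 100 - 0.3 * sbp + rnorm(100, sd = 5)
  )

# Correlation matrix -> long format (var1, var2, cor, p),
# filtered to significant off-diagonal pairs
long_cors <- toy %>%
  cor_mat() %>%
  cor_gather() %>%
  filter(var1 != var2, p < 0.05) %>%
  arrange(desc(abs(cor)))
```

The long format is convenient for heatmaps (one row per cell) and for reporting only the pairs that clear a significance or strength threshold.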

🏆

Comprehensive Analysis Dashboard

Integrated rstatix workflow results

Comprehensive Analysis Dashboard
Figure 8: rstatix Analysis Dashboard. Top panel: Treatment group summary statistics showing mean ± SD for primary endpoints across all treatment arms. Bottom panel: Overall treatment effect significance using different statistical approaches (ANOVA F-test, Kruskal-Wallis, Welch ANOVA), demonstrating robust findings across methodological choices.
1
Data Preparation & Descriptive Analysis

Used get_summary_stats() for comprehensive descriptive statistics by treatment groups and subgroups. Created age categories and risk stratification for group-wise analysis.

2
Assumption Testing & Method Selection

Applied shapiro_test() for normality assessment across groups. Used Q-Q plots for visual confirmation. Selected appropriate parametric vs non-parametric approaches.

3
Primary Hypothesis Testing

Conducted anova_test() for overall treatment effects. Calculated effect sizes and confidence intervals. Applied multiple comparison corrections systematically.

4
Group-wise Multiple Comparisons

Leveraged group_by() workflows for center-specific, age group, and risk category analyses. Applied appropriate correction methods for each analysis level.

5
Effect Size & Clinical Significance

Calculated cohens_d() for standardized effect sizes. Interpreted clinical meaningfulness beyond statistical significance. Provided context for clinical decision-making.

6
Visualization & Reporting

Integrated ggpubr for automated statistical annotations. Created publication-ready figures with p-values and effect sizes. Generated comprehensive analysis dashboard.

🔑 Key rstatix Advantages

🧹
Tidy Output Format
Consistent data frame output across all statistical tests integrates seamlessly with tidyverse workflows for efficient data manipulation and reporting.
👥
Group-wise Analysis
Built-in group_by() integration enables automatic subgroup analysis with consistent statistical methods and multiple comparison corrections across all groups.
🔧
Comprehensive Function Set
Complete statistical testing ecosystem including parametric, non-parametric, effect sizes, correlations, and normality tests in a unified interface.
📊
ggpubr Integration
Seamless visualization integration with automatic statistical annotations, p-value displays, and publication-ready plots requiring minimal manual intervention.
⚖️
Multiple Correction Methods
Built-in support for Bonferroni, FDR, Holm, and other correction methods with easy comparison and selection based on research context and goals.
🎯
Effect Size Focus
Automatic effect size calculations with confidence intervals promote focus on practical significance beyond statistical significance for clinical interpretation.
💡

Best Practices and Recommendations

Implementing rstatix in clinical research workflows

🔍 Statistical Workflow
Always start with descriptive statistics and assumption checking. Use normality tests and visualizations to guide parametric vs non-parametric method selection.
⚖️ Multiple Comparisons
Choose correction methods based on research goals: Bonferroni/Holm for confirmatory analysis, FDR for exploratory analysis. Document rationale clearly.
👥 Group-wise Analysis
Leverage group_by() for subgroup analyses but be mindful of multiple testing burden. Consider hierarchical correction strategies for complex designs.
📊 Effect Sizes
Report effect sizes alongside p-values. Interpret clinical significance considering domain-specific benchmarks and practical importance.
🎨 Visualization
Use ggpubr integration for consistent statistical annotations. Combine statistical rigor with clear visual communication for diverse audiences.
📋 Reproducibility
Document all analysis decisions, correction methods, and interpretation criteria. Use rstatix's tidy output for transparent reporting workflows.