Introduction to Model Comparison
Model selection is one of the most important skills in statistical analysis. Make a model too simple and you miss important relationships; make it too complex and you overfit to your specific dataset. Today we'll learn principled approaches to finding the right balance.
What You'll Learn
- Fit and compare multiple competing models
- Use ANOVA to test if adding variables improves model fit
- Interpret AIC, BIC, and R-squared for model selection
- Apply practical guidelines for building better models
- Understand the trade-off between complexity and overfitting
Prerequisites
- Completed the basic statistical modeling tutorial
- Understanding of R-squared and p-values
- Same packages: tidyverse, glmmTMB, broom.mixed (install command below if you need them)
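If any of these packages are missing, a standard one-time CRAN install takes care of it:
# Install the tutorial's packages (one-time setup)
install.packages(c("tidyverse", "glmmTMB", "broom.mixed"))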
Building Competing Models
Let's start by fitting several competing models, each with different combinations of predictors. We'll systematically add variables to see which combinations provide the best predictions.
# Load our familiar packages
library(tidyverse)
library(glmmTMB)
library(broom.mixed)
# Load data
data(mtcars)
# Examine correlations between potential predictors
mtcars %>%
select(mpg, wt, hp, cyl) %>%
cor() %>%
round(3)
      mpg     wt     hp    cyl
mpg  1.000 -0.868 -0.776 -0.852
wt  -0.868  1.000  0.659  0.782
hp  -0.776  0.659  1.000  0.832
cyl -0.852  0.782  0.832  1.000
Understanding the Relationships
The correlation matrix reveals important patterns:
- Strong negative correlations: All predictors negatively correlate with mpg (heavier, more powerful cars get worse mileage)
- Multicollinearity: Weight, horsepower, and cylinders are moderately to strongly correlated with each other (r = 0.66-0.83)
- Selection challenge: We need to balance additional predictive power against redundancy
# Fit competing models systematically
model1 <- glmmTMB(mpg ~ wt, data = mtcars) # Baseline: weight only
model2 <- glmmTMB(mpg ~ wt + hp, data = mtcars) # Add horsepower
model3 <- glmmTMB(mpg ~ wt + hp + cyl, data = mtcars) # Add cylinders
model4 <- glmmTMB(mpg ~ wt * hp, data = mtcars) # Weight-horsepower interaction
# Create a summary function for easy comparison
summarize_model <- function(model) {
model_data <- augment(model)
r_squared <- 1 - var(model_data$.resid) / var(mtcars$mpg)
data.frame(
R_squared = round(r_squared, 3),
AIC = round(AIC(model), 1),
BIC = round(BIC(model), 1),
RMSE = round(sqrt(mean(model_data$.resid^2)), 2)
)
}
# Compare all models
model_comparison <- bind_rows(
summarize_model(model1),
summarize_model(model2),
summarize_model(model3),
summarize_model(model4)
) %>%
mutate(Model = c("Weight Only", "Weight + HP", "Weight + HP + Cyl", "Weight × HP"))
print(model_comparison)
| Model | R-squared | AIC | BIC | RMSE |
|---|---|---|---|---|
| Weight Only | 0.753 | 166.0 | 170.4 | 2.95 |
| Weight + HP | 0.827 | 156.7 | 162.5 | 2.44 |
| Weight + HP + Cyl | 0.843 | 155.5 | 162.8 | 2.32 |
| Weight × HP | 0.885 | 145.6 | 152.9 | 1.98 |
Statistical Testing with ANOVA
Comparing model fit metrics gives us a sense of which models perform better, but ANOVA, which for these models is a likelihood-ratio (chi-square) test between nested fits, provides a formal statistical test of whether adding variables significantly improves model fit.
# Use ANOVA to test nested models
anova_result <- anova(model1, model2, model3)
print(anova_result)
Data: mtcars
Models:
model1: mpg ~ wt, zi=~0, disp=~1
model2: mpg ~ wt + hp, zi=~0, disp=~1
model3: mpg ~ wt + hp + cyl, zi=~0, disp=~1
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
model1 3 166.03 170.43 -80.015 160.03
model2 4 156.65 162.51 -74.326 148.65 11.3771 1 0.0007436 ***
model3 5 155.48 162.81 -72.738 145.48 3.1757 1 0.0747407 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Interpreting ANOVA Results
- Model 1 → Model 2: Adding horsepower significantly improves fit (p = 0.0007)
- Model 2 → Model 3: Adding cylinders shows marginal improvement (p = 0.075)
- Chi-square test: Tests whether the reduction in deviance is statistically significant (reproduced by hand below)
- Degrees of freedom: Each additional parameter costs 1 degree of freedom
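To demystify what anova() is doing, we can reproduce the model1 → model2 test by hand. This is a minimal sketch that plugs in the deviance values printed in the output above.
# The chi-square statistic is the reduction in deviance between nested models
chisq_stat <- 160.03 - 148.65  # deviance of model1 minus deviance of model2
p_value <- pchisq(chisq_stat, df = 1, lower.tail = FALSE)  # one extra parameter
round(p_value, 6)  # ~0.000744, matching Pr(>Chisq) for model2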
# Test the interaction model against model2
# (model4 is not nested within model3, but model2 IS nested within model4,
# since wt * hp expands to wt + hp + wt:hp)
anova_interaction <- anova(model2, model4)
print(anova_interaction)
Data: mtcars
Models:
model2: mpg ~ wt + hp, zi=~0, disp=~1
model4: mpg ~ wt * hp, zi=~0, disp=~1
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
model2 4 156.65 162.51 -74.326 148.65
model4 5 145.61 152.94 -67.805 135.61 13.044 1 0.000304 ***
Understanding Model Fit Metrics
Different metrics tell us different things about model quality. Let's understand what each one means and when to use them for model selection.
R-squared: Proportion of Variance Explained
# Visualize R-squared improvement
r_squared_data <- data.frame(
Model = factor(c("Weight Only", "Weight + HP", "Weight + HP + Cyl", "Weight × HP"),
levels = c("Weight Only", "Weight + HP", "Weight + HP + Cyl", "Weight × HP")),
R_squared = c(0.753, 0.827, 0.843, 0.885),
Variables = c(1, 2, 3, 3) # Interaction model has 3 predictor terms (wt, hp, wt:hp)
)
ggplot(r_squared_data, aes(x = Variables, y = R_squared)) +
geom_line(color = "#7c3aed", linewidth = 1.2) +
geom_point(color = "#7c3aed", size = 3) +
geom_text(aes(label = Model), vjust = -0.5, size = 3.5) +
scale_y_continuous(limits = c(0.7, 0.9), labels = scales::percent_format()) +
labs(
title = "Model Performance: R-squared vs Complexity",
subtitle = "Each additional variable explains more variance",
x = "Number of Predictor Variables",
y = "R-squared (Variance Explained)"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold"),
plot.subtitle = element_text(color = "gray60")
)
AIC and BIC: Balancing Fit and Complexity
# Compare information criteria
ic_data <- model_comparison %>%
select(Model, AIC, BIC) %>%
pivot_longer(cols = c(AIC, BIC), names_to = "Criterion", values_to = "Value") %>%
mutate(Model = factor(Model, levels = c("Weight Only", "Weight + HP",
"Weight + HP + Cyl", "Weight × HP")))
ggplot(ic_data, aes(x = Model, y = Value, fill = Criterion)) +
geom_col(position = "dodge", alpha = 0.8) +
scale_fill_manual(values = c("AIC" = "#2563eb", "BIC" = "#dc2626")) +
labs(
title = "Information Criteria Comparison",
subtitle = "Lower values indicate better models (accounting for complexity)",
x = "Model",
y = "Information Criterion Value"
) +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
plot.title = element_text(face = "bold"),
plot.subtitle = element_text(color = "gray60")
)
Key Differences Between Metrics
| Metric | What it Measures | Good for | Limitation |
|---|---|---|---|
| R-squared | Proportion of variance explained | Understanding explanatory power | Always increases with more variables |
| AIC | Fit penalized by number of parameters | Model selection, prediction focus | Can overfit with small samples |
| BIC | Fit with stronger complexity penalty | Finding "true" model, larger samples | May be too conservative |
| RMSE | Average prediction error (same units as outcome) | Practical prediction accuracy | No penalty for complexity; scale-dependent |
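The only difference between AIC and BIC is the complexity penalty: AIC = -2 * logLik + 2k, while BIC = -2 * logLik + k * log(n), so BIC penalizes extra parameters more heavily once n is 8 or larger. A quick sanity check against model4's values in the comparison table:
# Recompute AIC and BIC from the log-likelihood of model4
ll <- as.numeric(logLik(model4))  # -67.8
k <- attr(logLik(model4), "df")   # 5 parameters (including residual variance)
n <- nobs(model4)                 # 32 observations
c(AIC = -2 * ll + 2 * k,          # 145.6
  BIC = -2 * ll + k * log(n))     # 152.9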
Model Selection Strategy
With multiple competing models, how do we choose? The best approach depends on your goals: prediction accuracy, scientific understanding, or model simplicity.
The Final Model: Weight × Horsepower Interaction
# Examine our best model in detail
best_model <- model4 # Weight × HP interaction
model_summary <- tidy(best_model)
print(model_summary)
# Get model fit statistics
fit_stats <- glance(best_model)
print(fit_stats)
# Model coefficients
# A tibble: 4 × 7
  effect component term        estimate std.error statistic  p.value
1 fixed  cond      (Intercept)  49.8      3.31        15.0   4.20e-16
2 fixed  cond      wt           -8.22     1.60        -5.13  2.48e- 5
3 fixed  cond      hp           -0.120    0.0205      -5.87  4.30e- 6
4 fixed  cond      wt:hp         0.0279   0.00774      3.61  1.15e- 3

# Model fit statistics
  nobs sigma logLik   AIC   BIC deviance df.residual
1   32  1.98  -67.8 145.6 152.9     124.          28
Interpreting the Interaction Model
The final model equation is: mpg = 49.8 - 8.22×wt - 0.120×hp + 0.0279×wt×hp
- Weight main effect (-8.22): For low-horsepower cars, each 1000 lbs reduces mpg by ~8.2
- Horsepower main effect (-0.120): For light cars, each additional hp reduces mpg by ~0.12
- Interaction term (+0.0279): The weight penalty is less severe for high-horsepower cars (worked prediction below)
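To make the equation concrete, here is a worked prediction for a hypothetical 3,000 lb, 150 hp car (values chosen purely for illustration):
# Plug wt = 3 (thousands of lbs) and hp = 150 into the fitted equation
49.8 - 8.22 * 3 - 0.120 * 150 + 0.0279 * 3 * 150
#> [1] 19.695  # about 19.7 mpg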
# Visualize the interaction effect
model_data <- augment(best_model)
# Create predictions for visualization
prediction_grid <- expand_grid(
wt = seq(1.5, 5.5, 0.1),
hp = c(100, 200, 300) # Low, medium, high horsepower
)
prediction_grid$predicted_mpg <- predict(best_model, newdata = prediction_grid)
# Plot the interaction
ggplot(prediction_grid, aes(x = wt, y = predicted_mpg, color = factor(hp))) +
geom_line(linewidth = 1.2) +
geom_point(data = model_data, aes(x = wt, y = mpg),
color = "black", alpha = 0.6, inherit.aes = FALSE) +
scale_color_manual(values = c("100" = "#22c55e", "200" = "#eab308", "300" = "#ef4444"),
name = "Horsepower") +
labs(
title = "Weight-Horsepower Interaction Effects on Fuel Efficiency",
subtitle = "The weight penalty varies by horsepower level",
x = "Weight (1000 lbs)",
y = "Predicted MPG"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold"),
plot.subtitle = element_text(color = "gray60"),
legend.position = "bottom"
)
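The diverging lines in the plot can be quantified directly. Differentiating the fitted equation with respect to weight gives a slope of -8.22 + 0.0279 * hp, so the weight penalty shrinks as horsepower rises:
# Weight slope (mpg per 1000 lbs) at each horsepower level shown in the plot
hp_levels <- c(100, 200, 300)
data.frame(
  hp = hp_levels,
  wt_slope = -8.22 + 0.0279 * hp_levels  # -5.43, -2.64, +0.15
)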
Practical Model Selection Guidelines
When to Add Variables
- Theoretical justification: Does the variable make scientific sense?
- Statistical significance: Does ANOVA show significant improvement? (p < 0.05)
- Practical significance: Is the R-squared improvement meaningful?
- Sample size: Do you have at least 10-20 observations per parameter?
Model Selection Checklist
| Check | Our Example | Result |
|---|---|---|
| Theoretical sense | Weight and horsepower should affect fuel efficiency | ✅ Pass |
| ANOVA significance | p < 0.001 for interaction | ✅ Pass |
| R-squared improvement | 0.753 → 0.885 (+13.2 percentage points) | ✅ Pass |
| Sample size | 32 observations, 4 parameters (8:1 ratio) | ✅ Pass |
| Residual patterns | Random scatter in diagnostics | ✅ Pass |
Common Pitfalls to Avoid
- Stepwise selection: Automated variable selection often produces unstable results
- P-hacking: Testing many variables increases the chance of false positives
- Overfitting: Complex models may not generalize to new data (see the sketch after this list)
- Ignoring multicollinearity: Highly correlated predictors can cause unstable estimates
- Model complexity without justification: Simpler models are often more interpretable and robust
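To see the overfitting pitfall in action, here is a minimal sketch (using plain lm() and an arbitrary 50/50 train/test split, purely for illustration) that fits a deliberately over-flexible polynomial and compares its holdout error against the simple weight-only model:
# Split mtcars in half, then fit a simple and a deliberately over-complex model
set.seed(42)
train_idx <- sample(seq_len(nrow(mtcars)), 16)
train <- mtcars[train_idx, ]
test <- mtcars[-train_idx, ]
simple <- lm(mpg ~ wt, data = train)
complex <- lm(mpg ~ poly(wt, 8), data = train)  # near-perfect fit to training data
rmse <- function(m, d) sqrt(mean((d$mpg - predict(m, newdata = d))^2))
# The complex model's holdout RMSE is typically far worse than the simple model's
c(simple = rmse(simple, test), complex = rmse(complex, test))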
# Final model validation: check multicollinearity
# VIF (Variance Inflation Factor) is the usual tool, but car::vif() does not
# work directly on glmmTMB objects, so we inspect the correlation matrix instead.
# For interaction models, centering the predictors first reduces the artificial
# correlation between the main effects and the interaction term.
mtcars_centered <- mtcars %>%
  mutate(
    wt_c = wt - mean(wt),
    hp_c = hp - mean(hp)
  )
validation_model <- glmmTMB(mpg ~ wt_c * hp_c, data = mtcars_centered)
# Correlations above ~0.7 between predictors suggest problematic multicollinearity
correlation_check <- mtcars %>%
  select(wt, hp) %>%
  cor()
print("Correlation between predictors:")
print(round(correlation_check, 3))
Correlation between predictors:
wt hp
wt 1.000 0.659
hp 0.659 1.000
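If you want a formal VIF-style check despite the glmmTMB limitation noted above, the performance package is one option; treat this as an optional suggestion, since it is not used elsewhere in this tutorial:
# Optional: VIF-style collinearity check on the centered interaction model
# (assumes the 'performance' package is installed)
# library(performance)
# check_collinearity(validation_model)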
Summary and Next Steps
What You've Accomplished
✅ Completed Learning Objectives
- Built and compared multiple competing models systematically
- Used ANOVA to test statistical significance of model improvements
- Interpreted AIC, BIC, and R-squared for model selection decisions
- Discovered and interpreted a meaningful interaction effect
- Applied practical guidelines for responsible model building
Key Insights from Our Analysis
- Systematic approach works: Building models incrementally helps identify which variables add value
- Interactions matter: The relationship between weight and fuel efficiency depends on horsepower
- Multiple metrics tell different stories: Use R-squared, AIC, BIC, and ANOVA together for robust decisions
- Theory guides statistics: Understanding the domain helps interpret statistical results
Practice Exercises
- Try the same analysis with the diamonds dataset (price ~ carat + cut + color)
- Explore polynomial terms: does mpg ~ wt + I(wt^2) improve fit?
- Test three-way interactions: mpg ~ wt * hp * cyl (watch for overfitting!)
- Apply these techniques to your own research data
Advanced Topics to Explore Next
- Cross-validation: Testing model performance on holdout data
- Regularization: LASSO and Ridge regression for automatic variable selection
- Non-linear models: GAMs (Generalized Additive Models) for flexible relationships
- Mixed-effects models: Handling hierarchical or repeated-measures data
- Model diagnostics: Advanced residual analysis and assumption checking