library(mandala)
library(dplyr)
library(ggplot2)Two-Stage MET Analysis
Single-stage vs two-stage workflows for multi-environment trials
Overview
Multi-environment trials (MET) can be analyzed using either single-stage or two-stage approaches. Each has trade-offs:
| Approach | Advantages | Disadvantages |
|---|---|---|
| Single-stage | Fully models all data jointly; optimal if model is correct | Computationally intensive; complex variance structures |
| Two-stage | Computationally efficient; modular workflow | May lose information if Stage 1 uncertainty not propagated |
Mandala supports both approaches and provides tools to properly propagate uncertainty from Stage 1 to Stage 2 using variance-covariance matrices (RMAT).
- Fit single-stage MET models with different variance structures
- Implement the two-stage workflow using
stage1_prep(),mandala_stage1(), andstage1_bundle() - Propagate BLUE uncertainty to Stage 2 using
vcov(RMAT) - Calculate heritability at both stages
- Compare single-stage and two-stage results
Data Setup
We’ll use simulated MET data with genotypes evaluated across multiple environments. Each environment has a randomized complete block design with spatial row/column structure.
# Simulate MET data
set.seed(42)
n_env <- 4
n_geno <- 50
n_rep <- 2
n_row <- 10
n_col <- 10
# Create base data structure
df <- expand.grid(
env = paste0("E", 1:n_env),
geno = paste0("G", sprintf("%03d", 1:n_geno)),
rep = 1:n_rep
)
# Assign spatial positions within each env:rep
df <- df %>%
group_by(env, rep) %>%
mutate(
row = rep(1:n_row, length.out = n()),
col = rep(1:n_col, each = n_row)[1:n()]
) %>%
ungroup()
# Simulate effects
env_effects <- rnorm(n_env, mean = 0, sd = 5)
names(env_effects) <- paste0("E", 1:n_env)
geno_effects <- rnorm(n_geno, mean = 0, sd = 3)
names(geno_effects) <- paste0("G", sprintf("%03d", 1:n_geno))
# GxE interaction
gxe_effects <- matrix(rnorm(n_env * n_geno, mean = 0, sd = 2),
nrow = n_geno, ncol = n_env)
rownames(gxe_effects) <- names(geno_effects)
colnames(gxe_effects) <- names(env_effects)
# Build response
df <- df %>%
mutate(
yld = 100 +
env_effects[as.character(env)] +
geno_effects[as.character(geno)] +
sapply(1:n(), function(i) gxe_effects[as.character(geno[i]), as.character(env[i])]) +
rnorm(n(), mean = 0, sd = 2) # residual
)
# Convert to factors
df$env <- as.factor(df$env)
df$geno <- as.factor(df$geno)
df$rep <- as.factor(df$rep)
df$row <- as.factor(df$row)
df$col <- as.factor(df$col)
df$block <- df$rep # alias for consistency
head(df)# A tibble: 6 × 7
env geno rep row col yld block
<fct> <fct> <fct> <fct> <fct> <dbl> <fct>
1 E1 G001 1 1 1 111. 1
2 E2 G001 1 1 1 95.0 1
3 E3 G001 1 1 1 101. 1
4 E4 G001 1 1 1 102. 1
5 E1 G002 1 2 1 105. 1
6 E2 G002 1 2 1 96.3 1
# Check data structure
df %>%
group_by(env) %>%
summarise(
n_obs = n(),
n_geno = n_distinct(geno),
mean_yld = mean(yld),
sd_yld = sd(yld)
)# A tibble: 4 × 5
env n_obs n_geno mean_yld sd_yld
<fct> <int> <int> <dbl> <dbl>
1 E1 100 50 107. 3.86
2 E2 100 50 96.5 4.45
3 E3 100 50 102. 4.59
4 E4 100 50 103. 3.80
Single-Stage MET Analysis
Single-stage analysis fits all data in one model. We demonstrate four model variants with different fixed/random specifications.
Model 1: Genotype and GxE Fixed
Estimate fixed genotype means and fixed GxE interaction. Design effects are random.
MET_model1 <- mandala(
fixed = yld ~ env + geno + env:geno,
random = ~ env:rep + env:row + env:col,
data = df
)Initial data rows: 400
Final data rows after NA handling: 400
Explicit param_df Check:
name type vc_idx init lower upper origin
1 env:rep var 1 12.71453 1e-06 Inf G
2 env:row var 2 12.71453 1e-06 Inf G
3 env:col var 3 12.71453 1e-06 Inf G
4 R.sigma2 R_var 0 15.89316 1e-06 Inf R
--- Starting EM Warmup ---
EM Warmup: starting theta:
env:rep 12.71 env:row 12.71 env:col 12.71 R.sigma2 15.89
EM iter 1: logLik=-560.2753; theta: env:rep 9.543 env:row 12.71 env:col 12.71 R.sigma2 1.844
EM iter 2: logLik=-524.9387; theta: env:rep 7.201 env:row 12.71 env:col 12.71 R.sigma2 1.845
EM iter 3: logLik=-524.3725; theta: env:rep 5.41 env:row 12.71 env:col 12.71 R.sigma2 1.844
EM iter 4: logLik=-523.8144; theta: env:rep 4.067 env:row 12.71 env:col 12.71 R.sigma2 1.844
EM iter 5: logLik=-523.2521; theta: env:rep 3.059 env:row 12.71 env:col 12.71 R.sigma2 1.844
EM Warmup: ending theta:
env:rep 3.059 env:row 12.71 env:col 12.71 R.sigma2 1.844
--- EM Warmup Complete. Starting AI-REML ---
Starting AI-REML logLik = -522.6940
Iter LogLik Sigma2 DF wall Restrained
1 -504.6371 2.287 200 09:41:10 ( 0 restrained)
2 -494.0647 2.836 200 09:41:10 ( 0 restrained)
3 -489.8827 3.452 200 09:41:10 ( 0 restrained)
4 -489.2608 3.838 200 09:41:11 ( 0 restrained)
5 -489.0886 3.965 200 09:41:11 ( 0 restrained)
6 -488.7539 3.930 200 09:41:11 ( 0 restrained)
7 -488.3893 3.842 200 09:41:11 ( 0 restrained)
8 -488.0805 3.768 200 09:41:11 ( 0 restrained)
9 -487.8057 3.731 200 09:41:11 ( 0 restrained)
10 -487.5397 3.726 200 09:41:12 ( 0 restrained)
11 -487.2805 3.735 200 09:41:12 ( 0 restrained)
12 -487.0315 3.748 200 09:41:12 ( 0 restrained)
13 -486.7934 3.756 200 09:41:12 ( 0 restrained)
14 -486.5662 3.759 200 09:41:12 ( 0 restrained)
15 -486.3505 3.758 200 09:41:13 ( 0 restrained)
16 -486.1472 3.756 200 09:41:13 ( 0 restrained)
17 -485.9570 3.755 200 09:41:13 ( 0 restrained)
18 -485.7805 3.754 200 09:41:13 ( 0 restrained)
19 -485.6180 3.754 200 09:41:13 ( 0 restrained)
20 -485.4694 3.754 200 09:41:13 ( 0 restrained)
21 -485.3346 3.754 200 09:41:14 ( 0 restrained)
22 -485.2132 3.755 200 09:41:14 ( 0 restrained)
23 -485.1047 3.755 200 09:41:14 ( 0 restrained)
24 -485.0083 3.755 200 09:41:14 ( 0 restrained)
25 -484.9232 3.755 200 09:41:14 ( 0 restrained)
26 -484.8486 3.755 200 09:41:15 ( 0 restrained)
27 -484.7835 3.755 200 09:41:15 ( 0 restrained)
Froze env:rep at 1.000009e-05
28 -484.7270 3.755 200 09:41:15 ( 1 restrained)
29 -484.3893 3.755 200 09:41:15 ( 1 restrained)
30 -484.3893 3.755 200 09:41:15 ( 1 restrained)
Converged at iter 30 by tolerance(s): delta_theta=0.00e+00, delta_loglik=0.00e+00
Main AI-REML Loop finished. Combining results.
solveMME_cpp: adding nugget 5.3269e-07 to MME diagonal (scaled from median diag)
summary(MET_model1)Model statement:
mandala(fixed = yld ~ env + geno + env:geno, random = ~env:rep +
env:row + env:col, data = df)
Variance Components:
component estimate std.error z.ratio bound %ch
env:rep 1.000009e-05 5.364309e-02 1.864190e-04 B NA
env:row 4.331926e+00 1.565893e+14 2.766426e-14 P NA
env:col 1.072798e+01 8.520183e+13 1.259125e-13 P NA
R.sigma2 3.754529e+00 3.792645e-01 9.899502e+00 P NA
Fixed Effects (BLUEs) [first 5]:
effect estimate std.error z.ratio
(Intercept) 109.6442609 4.112474 26.6613873
envE2 -13.9398242 5.816446 -2.3966223
envE3 -6.6710219 5.816446 -1.1469241
envE4 -7.2589418 5.816446 -1.2480030
genoG002 -0.6096677 3.523308 -0.1730384
Converged: TRUE | Iterations: 30
Random Effects (BLUPs) [first 5]:
random level estimate std.error z.ratio
env:rep E1.1 2.871378e-05 0.003162187 0.0090803549
env:rep E2.1 3.399284e-07 0.003162187 0.0001074979
env:rep E3.1 -1.287100e-05 0.003162187 -0.0040702857
env:rep E4.1 -1.811673e-05 0.003162187 -0.0057291786
env:rep E1.2 -2.871305e-05 0.003162187 -0.0090801233
logLik: -484.389 AIC: 976.779 BIC: 989.972 logLik_Trunc: -300.602
Model 2: Genotype Fixed, GxE Random
Estimate fixed genotype effects while treating GxE as random. Common for inference on genotype main effects.
MET_model2 <- mandala(
fixed = yld ~ env + geno,
random = ~ env:geno + env:rep + env:row + env:col,
data = df
)Initial data rows: 400
Final data rows after NA handling: 400
Explicit param_df Check:
name type vc_idx init lower upper origin
1 env:geno var 1 12.71453 1e-06 Inf G
2 env:rep var 2 12.71453 1e-06 Inf G
3 env:row var 3 12.71453 1e-06 Inf G
4 env:col var 4 12.71453 1e-06 Inf G
5 R.sigma2 R_var 0 15.89316 1e-06 Inf R
--- Starting EM Warmup ---
EM Warmup: starting theta:
env:geno 12.71 env:rep 12.71 env:row 12.71 env:col 12.71 R.sigma2 15.89
EM iter 1: logLik=-1005.6104; theta: env:geno 9.987 env:rep 9.543 env:row 9.226 env:col 9.308 R.sigma2 2.9
EM iter 2: logLik=-916.3746; theta: env:geno 8.354 env:rep 7.201 env:row 6.901 env:col 6.929 R.sigma2 3.436
EM iter 3: logLik=-904.9885; theta: env:geno 6.864 env:rep 5.412 env:row 5.126 env:col 5.143 R.sigma2 3.011
EM iter 4: logLik=-898.4885; theta: env:geno 5.771 env:rep 4.072 env:row 3.862 env:col 3.852 R.sigma2 3.072
EM iter 5: logLik=-891.6048; theta: env:geno 4.951 env:rep 3.066 env:row 2.949 env:col 2.912 R.sigma2 3.081
EM Warmup: ending theta:
env:geno 4.951 env:rep 3.066 env:row 2.949 env:col 2.912 R.sigma2 3.081
--- EM Warmup Complete. Starting AI-REML ---
Starting AI-REML logLik = -886.4365
Iter LogLik Sigma2 DF wall Restrained
1 -880.5759 3.620 347 09:41:16 ( 0 restrained)
2 -877.8699 4.159 347 09:41:16 ( 0 restrained)
3 -876.2515 4.215 347 09:41:16 ( 0 restrained)
4 -874.1225 4.046 347 09:41:16 ( 0 restrained)
5 -872.2457 3.852 347 09:41:17 ( 0 restrained)
6 -870.7499 3.730 347 09:41:17 ( 0 restrained)
7 -869.4727 3.689 347 09:41:17 ( 0 restrained)
8 -868.3304 3.699 347 09:41:17 ( 0 restrained)
9 -867.3102 3.727 347 09:41:17 ( 0 restrained)
10 -866.3988 3.750 347 09:41:17 ( 0 restrained)
11 -865.5792 3.762 347 09:41:18 ( 0 restrained)
12 -864.8407 3.764 347 09:41:18 ( 0 restrained)
13 -864.1780 3.761 347 09:41:18 ( 0 restrained)
14 -863.5868 3.757 347 09:41:18 ( 0 restrained)
15 -863.0615 3.754 347 09:41:18 ( 0 restrained)
16 -862.5961 3.753 347 09:41:19 ( 0 restrained)
17 -862.1841 3.753 347 09:41:19 ( 0 restrained)
18 -861.8198 3.754 347 09:41:19 ( 0 restrained)
19 -861.4982 3.754 347 09:41:19 ( 0 restrained)
20 -861.2147 3.755 347 09:41:19 ( 0 restrained)
21 -860.9655 3.755 347 09:41:20 ( 0 restrained)
22 -860.7472 3.755 347 09:41:20 ( 0 restrained)
23 -860.5566 3.755 347 09:41:20 ( 0 restrained)
24 -860.3907 3.755 347 09:41:20 ( 0 restrained)
25 -860.2469 3.755 347 09:41:20 ( 0 restrained)
26 -860.1225 3.755 347 09:41:21 ( 0 restrained)
27 -860.0152 3.755 347 09:41:21 ( 0 restrained)
Froze env:rep at 9.999599e-06
Froze env:row at 1.000033e-05
Froze env:col at 1.000158e-05
28 -859.9229 3.755 347 09:41:21 ( 3 restrained)
29 -859.3845 3.755 347 09:41:21 ( 3 restrained)
30 -859.3845 3.755 347 09:41:21 ( 3 restrained)
Converged at iter 30 by tolerance(s): delta_theta=0.00e+00, delta_loglik=0.00e+00
Main AI-REML Loop finished. Combining results.
solveMME_cpp: adding nugget 7.60834e-07 to MME diagonal (scaled from median diag)
summary(MET_model2)Model statement:
mandala(fixed = yld ~ env + geno, random = ~env:geno + env:rep +
env:row + env:col, data = df)
Variance Components:
component estimate std.error z.ratio bound %ch
env:geno 4.383172e+00 0.8727878 5.022036e+00 P NA
env:rep 9.999599e-06 0.0536432 1.864094e-04 B NA
env:row 1.000033e-05 0.3810006 2.624755e-05 B NA
env:col 1.000158e-05 0.2694103 3.712396e-05 B NA
R.sigma2 3.754539e+00 0.3792652 9.899509e+00 P NA
Fixed Effects (BLUEs) [first 5]:
effect estimate std.error z.ratio
(Intercept) 107.8318845 1.2879953 83.7207122
envE2 -10.6439443 0.5004328 -21.2694776
envE3 -5.5060711 0.5004328 -11.0026182
envE4 -4.4688854 0.5004328 -8.9300410
genoG002 -0.3650888 1.7692144 -0.2063565
Converged: TRUE | Iterations: 30
Random Effects (BLUPs) [first 5]:
random level estimate std.error z.ratio
env:geno E1.G001 1.2725634 1.458609 0.8724497
env:geno E2.G001 -1.0396545 1.458610 -0.7127709
env:geno E3.G001 0.4528869 1.458610 0.3104922
env:geno E4.G001 -0.6853295 1.458610 -0.4698513
env:geno E1.G002 1.0999023 1.458625 0.7540680
logLik: -859.384 AIC: 1728.769 BIC: 1748.016 logLik_Trunc: -540.513
Model 3: Environment Fixed, Genotype and GxE Random
Partition genetic variance across environments. Useful for heritability and variance component estimation.
MET_model3 <- mandala(
fixed = yld ~ env,
random = ~ geno + env:geno + env:rep + env:row + env:col,
data = df
)Initial data rows: 400
Final data rows after NA handling: 400
Explicit param_df Check:
name type vc_idx init lower upper origin
1 geno var 1 12.71453 1e-06 Inf G
2 env:geno var 2 12.71453 1e-06 Inf G
3 env:rep var 3 12.71453 1e-06 Inf G
4 env:row var 4 12.71453 1e-06 Inf G
5 env:col var 5 12.71453 1e-06 Inf G
6 R.sigma2 R_var 0 15.89316 1e-06 Inf R
--- Starting EM Warmup ---
EM Warmup: starting theta:
geno 12.71 env:geno 12.71 env:rep 12.71 env:row 12.71 env:col 12.71 R.sigma2 15.89
EM iter 1: logLik=-1142.4222; theta: geno 10 env:geno 9.818 env:rep 9.543 env:row 8.584 env:col 8.581 R.sigma2 5.779
EM iter 2: logLik=-1056.9317; theta: geno 8.182 env:geno 8.109 env:rep 7.201 env:row 6.082 env:col 5.987 R.sigma2 6.696
EM iter 3: logLik=-1053.1028; theta: geno 6.931 env:geno 6.649 env:rep 5.42 env:row 4.347 env:col 4.23 R.sigma2 6.191
EM iter 4: logLik=-1040.4371; theta: geno 6.077 env:geno 5.586 env:rep 4.086 env:row 3.2 env:col 3.066 R.sigma2 6.364
EM iter 5: logLik=-1035.2517; theta: geno 5.503 env:geno 4.785 env:rep 3.084 env:row 2.417 env:col 2.278 R.sigma2 6.418
EM Warmup: ending theta:
geno 5.503 env:geno 4.785 env:rep 3.084 env:row 2.417 env:col 2.278 R.sigma2 6.418
--- EM Warmup Complete. Starting AI-REML ---
Starting AI-REML logLik = -1030.8455
Iter LogLik Sigma2 DF wall Restrained
1 -1015.3260 4.878 396 09:41:22 ( 0 restrained)
2 -1007.4047 3.707 396 09:41:23 ( 0 restrained)
3 -1006.4582 3.235 396 09:41:23 ( 0 restrained)
4 -1004.7878 3.255 396 09:41:23 ( 0 restrained)
5 -1002.7457 3.473 396 09:41:24 ( 0 restrained)
6 -1001.3215 3.683 396 09:41:24 ( 0 restrained)
7 -1000.2559 3.801 396 09:41:24 ( 0 restrained)
8 -999.2703 3.831 396 09:41:25 ( 0 restrained)
9 -998.3311 3.812 396 09:41:25 ( 0 restrained)
10 -997.4763 3.779 396 09:41:25 ( 0 restrained)
11 -996.7199 3.755 396 09:41:26 ( 0 restrained)
12 -996.0536 3.745 396 09:41:26 ( 0 restrained)
13 -995.4646 3.745 396 09:41:26 ( 0 restrained)
14 -994.9419 3.749 396 09:41:27 ( 0 restrained)
15 -994.4761 3.753 396 09:41:27 ( 0 restrained)
16 -994.0599 3.755 396 09:41:27 ( 0 restrained)
17 -993.6881 3.756 396 09:41:27 ( 0 restrained)
18 -993.3569 3.756 396 09:41:28 ( 0 restrained)
19 -993.0628 3.755 396 09:41:28 ( 0 restrained)
20 -992.8028 3.755 396 09:41:28 ( 0 restrained)
21 -992.5737 3.754 396 09:41:29 ( 0 restrained)
22 -992.3726 3.754 396 09:41:29 ( 0 restrained)
23 -992.1965 3.754 396 09:41:29 ( 0 restrained)
24 -992.0429 3.754 396 09:41:30 ( 0 restrained)
25 -991.9093 3.755 396 09:41:30 ( 0 restrained)
Froze env:col at 9.998505e-06
26 -991.7935 3.755 396 09:41:30 ( 1 restrained)
Froze env:row at 1.000017e-05
27 -991.6288 3.755 396 09:41:31 ( 2 restrained)
Froze env:rep at 9.999991e-06
28 -991.4426 3.755 396 09:41:31 ( 3 restrained)
29 -991.1023 3.755 396 09:41:31 ( 3 restrained)
30 -991.1023 3.755 396 09:41:32 ( 3 restrained)
Converged at iter 30 by tolerance(s): delta_theta=0.00e+00, delta_loglik=0.00e+00
Main AI-REML Loop finished. Combining results.
solveMME_cpp: adding nugget 7.59694e-07 to MME diagonal (scaled from median diag)
summary(MET_model3)Model statement:
mandala(fixed = yld ~ env, random = ~geno + env:geno + env:rep +
env:row + env:col, data = df)
Variance Components:
component estimate std.error z.ratio bound %ch
geno 1.028426e+01 2.40204348 4.281463e+00 P NA
env:geno 4.405181e+00 0.87506846 5.034099e+00 P NA
env:rep 9.999991e-06 0.05364342 1.864160e-04 B NA
env:row 1.000017e-05 0.38122685 2.623155e-05 B NA
env:col 9.998505e-06 0.26957022 3.709054e-05 B NA
R.sigma2 3.754539e+00 0.37926674 9.899469e+00 P NA
Fixed Effects (BLUEs) [first 5]:
effect estimate std.error z.ratio
(Intercept) 107.191690 0.5756216 186.219030
envE2 -10.643944 0.5013116 -21.232191
envE3 -5.506071 0.5013116 -10.983330
envE4 -4.468885 0.5013116 -8.914386
Converged: TRUE | Iterations: 30
Random Effects (BLUPs) [first 5]:
random level estimate std.error z.ratio
geno G001 0.5555188 1.241354 0.4475104
geno G002 0.2386548 1.241354 0.1922537
geno G003 3.4283639 1.241354 2.7617941
geno G004 -1.4838819 1.241354 -1.1953738
geno G005 3.2728545 1.241354 2.6365201
logLik: -991.102 AIC: 1994.205 BIC: 2018.093 logLik_Trunc: -627.203
Model 4: All Random (Variance Partitioning)
Full variance component model for partitioning total variance.
MET_model4 <- mandala(
fixed = yld ~ 1,
random = ~ env + geno + env:geno + env:rep + env:row + env:col,
data = df
)Initial data rows: 400
Final data rows after NA handling: 400
Explicit param_df Check:
name type vc_idx init lower upper origin
1 env var 1 12.71453 1e-06 Inf G
2 geno var 2 12.71453 1e-06 Inf G
3 env:geno var 3 12.71453 1e-06 Inf G
4 env:rep var 4 12.71453 1e-06 Inf G
5 env:row var 5 12.71453 1e-06 Inf G
6 env:col var 6 12.71453 1e-06 Inf G
7 R.sigma2 R_var 0 15.89316 1e-06 Inf R
--- Starting EM Warmup ---
EM Warmup: starting theta:
env 12.71 geno 12.71 env:geno 12.71 env:rep 12.71 env:row 12.71 env:col 12.71 R.sigma2 15.89
EM iter 1: logLik=-1151.8243; theta: env 11.18 geno 10 env:geno 9.817 env:rep 9.158 env:row 8.568 env:col 8.52 R.sigma2 9.362
EM iter 2: logLik=-1093.5325; theta: env 9.951 geno 8.182 env:geno 8.108 env:rep 6.707 env:row 6.065 env:col 5.92 R.sigma2 10.31
EM iter 3: logLik=-1089.8760; theta: env 9.038 geno 6.924 env:geno 6.709 env:rep 4.954 env:row 4.367 env:col 4.194 R.sigma2 10.05
EM iter 4: logLik=-1079.3846; theta: env 8.415 geno 6.055 env:geno 5.673 env:rep 3.698 env:row 3.243 env:col 3.056 R.sigma2 10.27
EM iter 5: logLik=-1074.5132; theta: env 8.042 geno 5.459 env:geno 4.878 env:rep 2.785 env:row 2.474 env:col 2.288 R.sigma2 10.39
EM Warmup: ending theta:
env 8.042 geno 5.459 env:geno 4.878 env:rep 2.785 env:row 2.474 env:col 2.288 R.sigma2 10.39
--- EM Warmup Complete. Starting AI-REML ---
Starting AI-REML logLik = -1070.4971
Iter LogLik Sigma2 DF wall Restrained
1 -1046.3475 7.899 399 09:41:33 ( 0 restrained)
2 -1028.0175 6.003 399 09:41:33 ( 0 restrained)
3 -1016.0388 4.263 399 09:41:34 ( 0 restrained)
4 -1013.8728 3.318 399 09:41:34 ( 0 restrained)
5 -1013.4529 3.105 399 09:41:34 ( 0 restrained)
6 -1011.2657 3.288 399 09:41:35 ( 0 restrained)
7 -1009.5063 3.561 399 09:41:35 ( 0 restrained)
8 -1008.4677 3.757 399 09:41:35 ( 0 restrained)
9 -1007.6300 3.838 399 09:41:36 ( 0 restrained)
10 -1006.8046 3.837 399 09:41:36 ( 0 restrained)
11 -1006.0250 3.802 399 09:41:37 ( 0 restrained)
12 -1005.3324 3.767 399 09:41:37 ( 0 restrained)
13 -1004.7289 3.747 399 09:41:37 ( 0 restrained)
14 -1004.2017 3.742 399 09:41:38 ( 0 restrained)
15 -1003.7384 3.745 399 09:41:38 ( 0 restrained)
16 -1003.3288 3.750 399 09:41:39 ( 0 restrained)
17 -1002.9652 3.754 399 09:41:39 ( 0 restrained)
18 -1002.6419 3.756 399 09:41:40 ( 0 restrained)
19 -1002.3551 3.756 399 09:41:40 ( 0 restrained)
20 -1002.1018 3.756 399 09:41:40 ( 0 restrained)
21 -1001.8792 3.755 399 09:41:41 ( 0 restrained)
22 -1001.6845 3.754 399 09:41:41 ( 0 restrained)
23 -1001.5149 3.754 399 09:41:41 ( 0 restrained)
24 -1001.3675 3.754 399 09:41:42 ( 0 restrained)
25 -1001.2399 3.754 399 09:41:42 ( 0 restrained)
Froze env:col at 1.000161e-05
26 -1001.1296 3.755 399 09:41:43 ( 1 restrained)
Froze env:row at 9.999025e-06
27 -1000.9696 3.755 399 09:41:43 ( 2 restrained)
Froze env:rep at 9.999198e-06
28 -1000.7853 3.755 399 09:41:43 ( 3 restrained)
29 -1000.4768 3.755 399 09:41:44 ( 3 restrained)
30 -1000.4767 3.755 399 09:41:44 ( 3 restrained)
31 -1000.4767 3.755 399 09:41:44 ( 3 restrained)
32 -1000.4767 3.755 399 09:41:45 ( 3 restrained)
Converged at iter 32 by tolerance(s): delta_theta=0.00e+00, delta_loglik=0.00e+00
Main AI-REML Loop finished. Combining results.
solveMME_cpp: adding nugget 7.59693e-07 to MME diagonal (scaled from median diag)
summary(MET_model4)Model statement:
mandala(fixed = yld ~ 1, random = ~env + geno + env:geno + env:rep +
env:row + env:col, data = df)
Variance Components:
component estimate std.error z.ratio bound %ch
env 1.909128e+01 15.69071594 1.216725e+00 P NA
geno 1.028310e+01 2.40181194 4.281393e+00 P NA
env:geno 4.405217e+00 0.87507272 5.034115e+00 P NA
env:rep 9.999198e-06 0.05364335 1.864014e-04 B NA
env:row 9.999025e-06 0.38122865 2.622842e-05 B NA
env:col 1.000161e-05 0.26957150 3.710189e-05 B NA
R.sigma2 3.754536e+00 0.37926628 9.899472e+00 P NA
Fixed Effects (BLUEs) [first 5]:
effect estimate std.error z.ratio
(Intercept) 102.0366 2.23826 45.58747
Converged: TRUE | Iterations: 32
Random Effects (BLUPs) [first 5]:
random level estimate std.error z.ratio
env E1 5.1213981 2.205983 2.3215948
env E2 -5.4529588 2.205983 -2.4718954
env E3 -0.3486815 2.205983 -0.1580618
env E4 0.6817221 2.205983 0.3090333
geno G001 0.5555093 1.241339 0.4475081
logLik: -1000.477 AIC: 2014.953 BIC: 2042.876 logLik_Trunc: -633.820
Two-Stage Workflow
The two-stage approach separates within-environment analysis (Stage 1) from across-environment analysis (Stage 2).
Stage 1: Per-Environment Analysis
Stage 1 fits separate models for each environment, extracting genotype BLUEs and their variance-covariance matrix.
Step 1: Prepare Stage 1 Configuration
s1_prep <- stage1_prep(
df = df,
env_col = "env",
s1_fixed = yld ~ geno,
s1_random = ~ rep + row + col,
s1_classify_term = "geno",
response_var = "yld"
)Step 2: Audit Stage 1 Setup
stage1_audit(df, s1_prep) env rows nonmiss_response nonmiss_id n_id_levels
1 E1 100 100 100 50
2 E2 100 100 100 50
3 E3 100 100 100 50
4 E4 100 100 100 50
Step 3: Run Stage 1 Models
MET_stage1_run <- mandala_stage1(
df = df,
prep = s1_prep
)Explicit param_df Check:
name type vc_idx init lower upper origin
1 rep var 1 5.955770 1e-06 Inf G
2 row var 2 5.955770 1e-06 Inf G
3 col var 3 5.955770 1e-06 Inf G
4 R.sigma2 R_var 0 7.444713 1e-06 Inf R
--- Starting EM Warmup ---
--- EM Warmup Complete. Starting AI-REML ---
Iter LogLik Sigma2 DF wall Restrained
Main AI-REML Loop finished. Combining results.
solveMME_cpp: adding nugget 3.79815e-07 to MME diagonal (scaled from median diag)
Notes:
- The predictions are obtained by averaging across the hypertable
calculated from model terms constructed solely from factors in
the averaging and classify sets.
- The simple averaging set: (none)
- The ignored set: rep, row, col
Explicit param_df Check:
name type vc_idx init lower upper origin
1 rep var 1 7.935386 1e-06 Inf G
2 row var 2 7.935386 1e-06 Inf G
3 col var 3 7.935386 1e-06 Inf G
4 R.sigma2 R_var 0 9.919232 1e-06 Inf R
--- Starting EM Warmup ---
--- EM Warmup Complete. Starting AI-REML ---
Iter LogLik Sigma2 DF wall Restrained
Main AI-REML Loop finished. Combining results.
solveMME_cpp: adding nugget 6.51258e-07 to MME diagonal (scaled from median diag)
Notes:
- The predictions are obtained by averaging across the hypertable
calculated from model terms constructed solely from factors in
the averaging and classify sets.
- The simple averaging set: (none)
- The ignored set: rep, row, col
Explicit param_df Check:
name type vc_idx init lower upper origin
1 rep var 1 8.419766 1e-06 Inf G
2 row var 2 8.419766 1e-06 Inf G
3 col var 3 8.419766 1e-06 Inf G
4 R.sigma2 R_var 0 10.524707 1e-06 Inf R
--- Starting EM Warmup ---
--- EM Warmup Complete. Starting AI-REML ---
Iter LogLik Sigma2 DF wall Restrained
Main AI-REML Loop finished. Combining results.
solveMME_cpp: adding nugget 5.46104e-07 to MME diagonal (scaled from median diag)
Notes:
- The predictions are obtained by averaging across the hypertable
calculated from model terms constructed solely from factors in
the averaging and classify sets.
- The simple averaging set: (none)
- The ignored set: rep, row, col
Explicit param_df Check:
name type vc_idx init lower upper origin
1 rep var 1 5.782310 1e-06 Inf G
2 row var 2 5.782310 1e-06 Inf G
3 col var 3 5.782310 1e-06 Inf G
4 R.sigma2 R_var 0 7.227887 1e-06 Inf R
--- Starting EM Warmup ---
--- EM Warmup Complete. Starting AI-REML ---
Iter LogLik Sigma2 DF wall Restrained
Main AI-REML Loop finished. Combining results.
solveMME_cpp: adding nugget 6.62442e-07 to MME diagonal (scaled from median diag)
Notes:
- The predictions are obtained by averaging across the hypertable
calculated from model terms constructed solely from factors in
the averaging and classify sets.
- The simple averaging set: (none)
- The ignored set: rep, row, col
Step 4: Report Check
stage1_report_check(MET_stage1_run, df) env fit_present BLUEs_present n_BLUE_rows error_message
1 E1 TRUE TRUE 50 <NA>
2 E2 TRUE TRUE 50 <NA>
3 E3 TRUE TRUE 50 <NA>
4 E4 TRUE TRUE 50 <NA>
Step 5: Bundle Stage 1 Outputs
The bundle contains BLUEs and their variance-covariance matrices for Stage 2.
bundle <- stage1_bundle(MET_stage1_run)
# Preview BLUEs
head(bundle$blues) env id BLUE SE
1 E1 G001 109.6486 3.189574
2 E1 G002 109.0374 3.189775
3 E1 G003 113.4323 3.189775
4 E1 G004 108.8724 3.189775
5 E1 G005 107.4882 3.189775
6 E1 G006 106.9827 3.189775
Stage 2: Across-Environment Analysis
Stage 2 analyzes the BLUEs from Stage 1, using vcov(RMAT) to account for BLUE uncertainty.
Step 1: Prepare Stage 2 Data
s2_inputs <- stage2_prep(
bundle = bundle,
df_name = "stage2_data",
vcov_name = "RMAT",
out_env = "env",
out_id = "geno",
out_resp = "yld"
)
s2_data <- s2_inputs$stage2_data
RMAT <- s2_inputs$RMAT
head(s2_data) env geno yld
1 E1 G001 109.6486
2 E1 G002 109.0374
3 E1 G003 113.4323
4 E1 G004 108.8724
5 E1 G005 107.4882
6 E1 G006 106.9827
# Preview RMAT structure (top-left corner)
RMAT[1:5, 1:5]5 x 5 sparse Matrix of class "dsCMatrix"
E1|G001 E1|G002 E1|G003 E1|G004 E1|G005
E1|G001 10.173380 4.876804 4.876804 4.876804 4.876804
E1|G002 4.876804 10.174664 4.877174 4.877174 4.877174
E1|G003 4.876804 4.877174 10.174664 4.877174 4.877174
E1|G004 4.876804 4.877174 4.877174 10.174664 4.877174
E1|G005 4.876804 4.877174 4.877174 4.877174 10.174664
Step 2: Align Row IDs
Ensure s2_data rownames match RMAT for proper variance structure alignment.
rownames(s2_data) <- paste0(s2_data$env, "|", s2_data$geno)
s2_data <- s2_data[rownames(RMAT), , drop = FALSE]
# Verify alignment
stopifnot(identical(rownames(s2_data), rownames(RMAT)))Step 3: Ensure Matrix Format
if (!inherits(RMAT, "Matrix")) {
RMAT <- Matrix::forceSymmetric(Matrix::Matrix(RMAT, sparse = TRUE))
}Step 4: Fit Stage 2 Model
MET_stage2_run <- mandala(
fixed = yld ~ env,
random = ~ geno,
R_formula = ~ vcov(RMAT),
data = s2_data,
matrix_list = list(RMAT = RMAT)
)Initial data rows: 200
Final data rows after NA handling: 200
Explicit param_df Check:
name type vc_idx init lower upper origin
1 geno var 1 11.99967 1e-06 Inf G
2 R.sigma2 R_var 0 14.99959 1e-06 Inf R
--- EM Warmup Skipped. Starting AI-REML ---
Starting AI-REML logLik = -558.3649
Iter LogLik Sigma2 DF wall Restrained
1 -548.2637 11.400 196 09:41:47 ( 0 restrained)
2 -540.7545 8.664 196 09:41:47 ( 0 restrained)
3 -532.8004 5.241 196 09:41:47 ( 0 restrained)
4 -532.4003 3.071 196 09:41:47 ( 0 restrained)
5 -532.4003 3.071 196 09:41:47 ( 0 restrained)
Converged at iter 5 by tolerance(s): delta_theta=0.00e+00, delta_loglik=0.00e+00
Main AI-REML Loop finished. Combining results.
solveMME_cpp: adding nugget 1.40104e-06 to MME diagonal (scaled from median diag)
summary(MET_stage2_run)Model statement:
mandala(fixed = yld ~ env, random = ~geno, data = s2_data, matrix_list = list(RMAT = RMAT),
R_formula = ~vcov(RMAT))
Variance Components:
component estimate std.error z.ratio bound %ch
geno 10.130038 2.6049988 3.888692 P NA
R.sigma2 3.071444 0.9516386 3.227532 P NA
Fixed Effects (BLUEs) [first 5]:
effect estimate std.error z.ratio
(Intercept) 107.191623 0.5138352 208.61088
envE2 -10.644406 0.3505106 -30.36828
envE3 -5.506115 0.3505106 -15.70884
envE4 -4.468897 0.3505106 -12.74967
Converged: TRUE | Iterations: 5
Random Effects (BLUPs) [first 5]:
random level estimate std.error z.ratio
geno G001 0.5943664 0.9497788 0.6257945
geno G002 0.2551813 0.9497788 0.2686745
geno G003 3.6729523 0.9497788 3.8671660
geno G004 -1.5904980 0.9497788 -1.6745983
geno G005 3.5063399 0.9497788 3.6917436
logLik: -532.400 AIC: 1068.801 BIC: 1075.357 logLik_Trunc: -352.288
Step 5: Extract Predictions
predicted_means <- mandala_predict(MET_stage2_run, classify_term = "geno")
Notes:
- The predictions are obtained by averaging across the hypertable
calculated from model terms constructed solely from factors in
the averaging and classify sets.
- The simple averaging set: env, [num mean]
head(predicted_means) geno predicted_value std_error
1 G001 102.6311 0.8454816
2 G002 102.2920 0.8454816
3 G003 105.7097 0.8454816
4 G004 100.4463 0.8454816
5 G005 105.5431 0.8454816
6 G006 102.5572 0.8454816
Step 6: Calculate Reliability
reliability <- stage2_reliability(
MET_stage2_run,
classify_term = "geno",
preds = predicted_means
)
head(reliability) geno predicted_value std_error PEV reliability accuracy weight_invPEV
1 G001 102.6311 0.8454816 0.7148392 0.9294337 0.9640714 1.398916
2 G002 102.2920 0.8454816 0.7148392 0.9294337 0.9640714 1.398916
3 G003 105.7097 0.8454816 0.7148392 0.9294337 0.9640714 1.398916
4 G004 100.4463 0.8454816 0.7148392 0.9294337 0.9640714 1.398916
5 G005 105.5431 0.8454816 0.7148392 0.9294337 0.9640714 1.398916
6 G006 102.5572 0.8454816 0.7148392 0.9294337 0.9640714 1.398916
Heritability Estimation
Stage 1 Heritability (Within-Environment)
Calculate heritability for each environment using the Stage 1 bundle.
s1_heritability <- mandala.h2(
df = df,
out = NULL,
bundle = bundle,
env_col = "env",
genotype_col = "geno",
response_col = "yld",
h2_fixed = yld ~ 1,
h2_random = ~ geno + rep + row + col,
em_iters = 5,
maxit = 100,
tol = 1e-2,
verbose = FALSE
)Explicit param_df Check:
name type vc_idx init lower upper origin
1 geno var 1 5.955770 1e-06 Inf G
2 rep var 2 5.955770 1e-06 Inf G
3 row var 3 5.955770 1e-06 Inf G
4 col var 4 5.955770 1e-06 Inf G
5 R.sigma2 R_var 0 7.444713 1e-06 Inf R
--- Starting EM Warmup ---
--- EM Warmup Complete. Starting AI-REML ---
Iter LogLik Sigma2 DF wall Restrained
Main AI-REML Loop finished. Combining results.
solveMME_cpp: adding nugget 4.8806e-07 to MME diagonal (scaled from median diag)
Explicit param_df Check:
name type vc_idx init lower upper origin
1 geno var 1 7.935386 1e-06 Inf G
2 rep var 2 7.935386 1e-06 Inf G
3 row var 3 7.935386 1e-06 Inf G
4 col var 4 7.935386 1e-06 Inf G
5 R.sigma2 R_var 0 9.919232 1e-06 Inf R
--- Starting EM Warmup ---
--- EM Warmup Complete. Starting AI-REML ---
Iter LogLik Sigma2 DF wall Restrained
Main AI-REML Loop finished. Combining results.
solveMME_cpp: adding nugget 7.04846e-07 to MME diagonal (scaled from median diag)
Explicit param_df Check:
name type vc_idx init lower upper origin
1 geno var 1 8.419766 1e-06 Inf G
2 rep var 2 8.419766 1e-06 Inf G
3 row var 3 8.419766 1e-06 Inf G
4 col var 4 8.419766 1e-06 Inf G
5 R.sigma2 R_var 0 10.524707 1e-06 Inf R
--- Starting EM Warmup ---
--- EM Warmup Complete. Starting AI-REML ---
Iter LogLik Sigma2 DF wall Restrained
Main AI-REML Loop finished. Combining results.
solveMME_cpp: adding nugget 5.93056e-07 to MME diagonal (scaled from median diag)
Explicit param_df Check:
name type vc_idx init lower upper origin
1 geno var 1 5.782310 1e-06 Inf G
2 rep var 2 5.782310 1e-06 Inf G
3 row var 3 5.782310 1e-06 Inf G
4 col var 4 5.782310 1e-06 Inf G
5 R.sigma2 R_var 0 7.227887 1e-06 Inf R
--- Starting EM Warmup ---
--- EM Warmup Complete. Starting AI-REML ---
Iter LogLik Sigma2 DF wall Restrained
Main AI-REML Loop finished. Combining results.
solveMME_cpp: adding nugget 7.37876e-07 to MME diagonal (scaled from median diag)
s1_heritability env n_genotypes sigma_g2 sigma_e2 mean_PEV mean_PEVD H2_plot H2_entrymean
1 E1 50 9.239014 5.265611 10.174814 18.12283 0.6369702 0.4493561
2 E2 50 18.660968 3.070977 43.610536 72.15667 0.8586884 0.0000000
3 E3 50 21.298680 3.662298 12.018695 21.02940 0.8532791 0.7178535
4 E4 50 13.256756 3.019129 8.452715 15.33001 0.8145029 0.6811922
H2_Cullis_PEV H2_Cullis_PEVD
1 0.4493561 0.0192226
2 0.0000000 0.0000000
3 0.7178535 0.5063214
4 0.6811922 0.4218038
Stage 2 Heritability (Across-Environment)
Calculate broad-sense heritability from the Stage 2 model.
h2_stage2 <- h2_estimates(
random_mod = MET_stage2_run,
genotype = "geno"
)
h2_stage2 variable value
source_random TRUE
source_fixed FALSE
sigma_g2 10.1300382835787
sigma_e2 3.07144380861993
n_rep 4
mean_PEV_BLUP 0.902079800565287
avsed_BLUE NA
vdBLUE_avg NA
H2_Cullis 0.910950010719347
H2_Piepho NA
H2_Plot 0.767340985870443
H2_Standard 0.929540461372848
Note:
Plot and entry mean heritabilities are just the approximation- assuming a single experiment with only 'genotype' and 'error' variance components in the model. For MET, modify the equations and estimate accordingly
Variance Summary
stage2_variance_summary(MET_stage2_run, s2_data, RMAT) Component Variance PVE
1 env NA NA
2 genotype 10.130038 0.3258044
3 g x env 3.071444 0.0987844
4 Stage1.error 17.890914 0.5754112
Single vs Two-Stage Comparison
Compare genotype rankings between single-stage (Model 3) and two-stage approaches.
Extract BLUPs
# Single-stage BLUPs
b1 <- subset(MET_model3$BLUPs, random == "geno")[, c("level", "estimate")]
names(b1) <- c("geno", "blup_single")
# Two-stage BLUPs
b2 <- subset(MET_stage2_run$BLUPs, random == "geno")[, c("level", "estimate")]
names(b2) <- c("geno", "blup_two")
# Merge
blup_comparison <- merge(b1, b2, by = "geno")
head(blup_comparison) geno blup_single blup_two
1 G001 0.5555188 0.5943664
2 G002 0.2386548 0.2551813
3 G003 3.4283639 3.6729523
4 G004 -1.4838819 -1.5904980
5 G005 3.2728545 3.5063399
6 G006 0.4862198 0.5204434
Correlation Analysis
cor_blup <- cor(blup_comparison$blup_single, blup_comparison$blup_two,
use = "complete.obs")
cat("Correlation between single-stage and two-stage BLUPs:", round(cor_blup, 3), "\n")Correlation between single-stage and two-stage BLUPs: 1
Scatter Plot: BLUPs
ggplot(blup_comparison, aes(x = blup_single, y = blup_two)) +
geom_point(alpha = 0.6, size = 2) +
geom_abline(intercept = 0, slope = 1, linetype = "dashed", color = "red") +
labs(
title = "Genotype BLUPs: Single-Stage vs Two-Stage",
subtitle = paste("r =", round(cor_blup, 3)),
x = "BLUPs (Single-stage MET)",
y = "BLUPs (Two-stage MET)"
) +
theme_minimal()
Compare Predicted Means
# Single-stage predictions
p1 <- mandala_predict(MET_model3, classify_term = "geno")
Notes:
- The predictions are obtained by averaging across the hypertable
calculated from model terms constructed solely from factors in
the averaging and classify sets.
- The simple averaging set: env, [num mean]
- The ignored set: env:rep, env:row, env:col
p1 <- p1[, c("geno", "predicted_value")]
names(p1) <- c("geno", "pred_single")
# Two-stage predictions
p2 <- predicted_means[, c("geno", "predicted_value")]
names(p2) <- c("geno", "pred_two")
# Merge
pred_comparison <- merge(p1, p2, by = "geno")
cor_pred <- cor(pred_comparison$pred_single, pred_comparison$pred_two,
use = "complete.obs")ggplot(pred_comparison, aes(x = pred_single, y = pred_two)) +
geom_point(alpha = 0.6, size = 2) +
geom_abline(intercept = 0, slope = 1, linetype = "dashed", color = "red") +
labs(
title = "Predicted Means: Single-Stage vs Two-Stage",
subtitle = paste("r =", round(cor_pred, 3)),
x = "Predicted Means (Single-stage)",
y = "Predicted Means (Two-stage)"
) +
theme_minimal()
Summary
- Single-stage models fit all data jointly but can be computationally intensive
- Two-stage workflows are modular and efficient, especially for large METs
- Use
vcov(RMAT)to properly propagate Stage 1 uncertainty to Stage 2 - Both approaches should give similar genotype rankings when properly implemented
- Mandala provides heritability functions for both stages
Next Steps
- Tutorial 07: Two-Stage Genomics — Integrate GBLUP with the two-stage workflow
Session Information
sessionInfo()R version 4.5.2 (2025-10-31)
Platform: aarch64-apple-darwin20
Running under: macOS Tahoe 26.2
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/New_York
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggplot2_4.0.2 dplyr_1.2.0 mandala_1.0.1
loaded via a namespace (and not attached):
[1] vctrs_0.7.1 cli_3.6.5 knitr_1.51 rlang_1.1.7
[5] xfun_0.56 generics_0.1.4 S7_0.2.1 jsonlite_2.0.0
[9] labeling_0.4.3 glue_1.8.0 htmltools_0.5.9 scales_1.4.0
[13] rmarkdown_2.30 grid_4.5.2 evaluate_1.0.5 tibble_3.3.1
[17] fastmap_1.2.0 yaml_2.3.12 lifecycle_1.0.5 compiler_4.5.2
[21] RColorBrewer_1.1-3 Rcpp_1.1.1 pkgconfig_2.0.3 farver_2.1.2
[25] lattice_0.22-7 digest_0.6.39 R6_2.6.1 utf8_1.2.6
[29] tidyselect_1.2.1 pillar_1.11.1 magrittr_2.0.4 Matrix_1.7-4
[33] withr_3.0.2 gtable_0.3.6 tools_4.5.2