Working with Example Datasets

Analyzing common experimental designs with Mandala

Overview

This tutorial demonstrates how to analyze real agricultural field trial datasets using the mandala package. We’ll work with common experimental designs and show how to properly specify models.

Loading Packages

library(mandala)
library(dplyr)
library(ggplot2)
library(tidyr)

Dataset 1: Randomized Complete Block Design (RCBD)

Data Description

A typical RCBD experiment with:

  • 20 genotypes
  • 3 replications (blocks)
  • Response: grain yield (kg/ha)
# Create realistic RCBD dataset
set.seed(456)
n_genotypes <- 20
n_blocks <- 3

rcbd_data <- expand.grid(
  genotype = factor(paste0("G", 1:n_genotypes)),
  block = factor(paste0("B", 1:n_blocks))
)

# Simulate realistic yield data
rcbd_data$yield <- 5000

# Add genotype effects (some good, some poor yielders)
genotype_effects <- c(
  rnorm(5, mean = 800, sd = 100),   # High yielders
  rnorm(10, mean = 0, sd = 150),    # Average yielders
  rnorm(5, mean = -600, sd = 100)   # Low yielders
)
rcbd_data$yield <- rcbd_data$yield + 
  genotype_effects[as.numeric(rcbd_data$genotype)]

# Add block effects
block_effects <- c(200, -100, -50)
rcbd_data$yield <- rcbd_data$yield + 
  block_effects[as.numeric(rcbd_data$block)]

# Add error term
rcbd_data$yield <- rcbd_data$yield + rnorm(nrow(rcbd_data), mean = 0, sd = 300)

head(rcbd_data, 10)
   genotype block    yield
1        G1    B1 5723.267
2        G2    B1 4881.472
3        G3    B1 5020.040
4        G4    B1 5046.350
5        G5    B1 4783.985
6        G6    B1 5113.979
7        G7    B1 4499.892
8        G8    B1 4729.488
9        G9    B1 5199.150
10      G10    B1 5735.364

Exploratory Data Analysis

# Summary statistics by genotype
rcbd_summary <- rcbd_data %>%
  group_by(genotype) %>%
  summarise(
    mean_yield = mean(yield),
    sd_yield = sd(yield),
    cv = (sd_yield / mean_yield) * 100
  ) %>%
  arrange(desc(mean_yield))

head(rcbd_summary, 10)
# A tibble: 10 × 4
   genotype mean_yield sd_yield    cv
   <fct>         <dbl>    <dbl> <dbl>
 1 G11           5760.    171.   2.97
 2 G10           5734.    152.   2.65
 3 G12           5589.     99.8  1.79
 4 G13           5441.    349.   6.41
 5 G1            5406.    413.   7.64
 6 G16           5336.    389.   7.29
 7 G20           5241.     87.5  1.67
 8 G14           5232.    107.   2.05
 9 G15           5197.    154.   2.96
10 G18           5159.    345.   6.68
# Visualization
ggplot(rcbd_data, aes(x = reorder(genotype, yield), y = yield, color = block)) +
  geom_point(size = 3, alpha = 0.7) +
  coord_flip() +
  theme_minimal() +
  labs(
    title = "Yield by Genotype (RCBD)",
    x = "Genotype", 
    y = "Yield (kg/ha)"
  )

# Block effects
ggplot(rcbd_data, aes(x = block, y = yield)) +
  geom_boxplot(fill = "#6B5A7D", alpha = 0.7) +
  geom_jitter(width = 0.2, alpha = 0.5, color = "#B5795A") +
  theme_minimal() +
  labs(
    title = "Yield Distribution by Block",
    x = "Block", 
    y = "Yield (kg/ha)"
  )

Model Fitting with Mandala

# Fit the RCBD model
rcbd_model <- mandala(
  fixed  = yield ~ genotype,
  random = ~ block,
  data   = rcbd_data
)
Initial data rows: 60 
Final data rows after NA handling: 60 
'as(<ddiMatrix>, "dgCMatrix")' is deprecated.
Use 'as(as(., "generalMatrix"), "CsparseMatrix")' instead.
See help("Deprecated") and help("Matrix-deprecated").
Explicit param_df Check:
      name  type vc_idx     init lower upper origin
1    block   var      1 72636.96 1e-06   Inf      G
2 R.sigma2 R_var      0 90796.20 1e-06   Inf      R
--- Starting EM Warmup ---
EM Warmup: starting theta:
 block 72640 R.sigma2 90800 
EM iter 1: logLik=-292.7282; theta: block 50820 R.sigma2 43670 
EM iter 2 : logLik decreased or failed. Stopping EM.
EM Warmup: ending theta:
 block 50820 R.sigma2 43670 
--- EM Warmup Complete. Starting AI-REML ---
ai_reml_core(): method=auto
Starting AI-REML logLik = -293.2461
 Iter     LogLik   Sigma2  DF     wall    Restrained
    1  -291.6739 54155.615   40  09:44:47  ( 0 restrained)
    2  -291.1937 67152.963   40  09:44:47  ( 0 restrained)
    3  -291.1899 70390.870   40  09:44:47  ( 0 restrained)
    4  -291.1054 69557.565   40  09:44:47  ( 0 restrained)
    5  -291.0187 67364.340   40  09:44:47  ( 0 restrained)
    6  -290.9652 65521.102   40  09:44:47  ( 0 restrained)
    7  -290.9408 64599.019   40  09:44:47  ( 0 restrained)
    8  -290.9313 64451.096   40  09:44:47  ( 0 restrained)
    9  -290.9270 64690.448   40  09:44:47  ( 0 restrained)
   10  -290.9253 64991.927   40  09:44:47  ( 0 restrained)
   11  -290.9248 65192.879   40  09:44:47  ( 0 restrained)
   12  -290.9246 65267.209   40  09:44:47  ( 0 restrained)
   13  -290.9245 65257.140   40  09:44:47  ( 0 restrained)
   14  -290.9245 65215.852   40  09:44:47  ( 0 restrained)
   15  -290.9245 65178.793   40  09:44:47  ( 0 restrained)
LogLik plateaued for 5 iterations -- stopping.
Main AI-REML Loop finished. Combining results.
solveMME_cpp: adding nugget 4.60011e-11 to MME diagonal (scaled from median diag)
# View model summary and variance components
summary(rcbd_model)
Model statement:
mandala(fixed = yield ~ genotype, random = ~block, data = rcbd_data)

Variance Components:
 component estimate std.error   z.ratio bound %ch
     block 11206.94  14489.15 0.7734715     P  NA
  R.sigma2 65215.85  14971.01 4.3561411     P  NA

Fixed Effects (BLUEs) [first 5]:
      effect   estimate std.error    z.ratio
 (Intercept) 5406.47680  159.6052 33.8740637
 genotypeG10  327.20688  208.5108  1.5692566
 genotypeG11  353.90274  208.5108  1.6972877
 genotypeG12  182.19156  208.5108  0.8737754
 genotypeG13   34.68584  208.5108  0.1663503

Converged: TRUE  |  Iterations: 15

Random Effects (BLUPs) [first 5]:
 random level   estimate std.error     z.ratio
  block    B1  94.489529  73.61762  1.28351783
  block    B2  -3.138541  73.61762 -0.04263302
  block    B3 -91.348201  73.61762 -1.24084695

logLik: -290.925   AIC: 585.849   BIC: 589.227   logLik_Trunc: -254.167
# Extract variance components table
var_components <- summary(rcbd_model)$varcomp
Model statement:
mandala(fixed = yield ~ genotype, random = ~block, data = rcbd_data)

Variance Components:
 component estimate std.error   z.ratio bound %ch
     block 11206.94  14489.15 0.7734715     P  NA
  R.sigma2 65215.85  14971.01 4.3561411     P  NA

Fixed Effects (BLUEs) [first 5]:
      effect   estimate std.error    z.ratio
 (Intercept) 5406.47680  159.6052 33.8740637
 genotypeG10  327.20688  208.5108  1.5692566
 genotypeG11  353.90274  208.5108  1.6972877
 genotypeG12  182.19156  208.5108  0.8737754
 genotypeG13   34.68584  208.5108  0.1663503

Converged: TRUE  |  Iterations: 15

Random Effects (BLUPs) [first 5]:
 random level   estimate std.error     z.ratio
  block    B1  94.489529  73.61762  1.28351783
  block    B2  -3.138541  73.61762 -0.04263302
  block    B3 -91.348201  73.61762 -1.24084695

logLik: -290.925   AIC: 585.849   BIC: 589.227   logLik_Trunc: -254.167
print(var_components)
  component estimate std.error   z.ratio bound %ch
1     block 11206.94  14489.15 0.7734715     P  NA
2  R.sigma2 65215.85  14971.01 4.3561411     P  NA
# Heritability Estimation
rcbd_model_rand <- mandala(
  fixed  = yield ~ 1,
  random = ~ genotype + block,
  data   = rcbd_data
)
Initial data rows: 60 
Final data rows after NA handling: 60 
Explicit param_df Check:
      name  type vc_idx     init lower upper origin
1 genotype   var      1 72636.96 1e-06   Inf      G
2    block   var      2 72636.96 1e-06   Inf      G
3 R.sigma2 R_var      0 90796.20 1e-06   Inf      R
--- Starting EM Warmup ---
EM Warmup: starting theta:
 genotype 72640 block 72640 R.sigma2 90800 
EM iter 1: logLik=-433.7394; theta: genotype 70060 block 50820 R.sigma2 75600 
EM iter 2: logLik=-432.9942; theta: genotype 64280 block 36810 R.sigma2 88150 
EM iter 3 : logLik decreased or failed. Stopping EM.
EM Warmup: ending theta:
 genotype 64280 block 36810 R.sigma2 88150 
--- EM Warmup Complete. Starting AI-REML ---
ai_reml_core(): method=auto
Starting AI-REML logLik = -433.4529
 Iter     LogLik   Sigma2  DF     wall    Restrained
    1  -432.2164 69766.090   59  09:44:47  ( 0 restrained)
    2  -432.2091 53022.228   59  09:44:47  ( 0 restrained)
    3  -432.2091 53022.228   59  09:44:47  ( 0 restrained)
Converged at iter 3 by tolerance(s): delta_theta=0.00e+00, delta_loglik=0.00e+00
Main AI-REML Loop finished. Combining results.
solveMME_cpp: adding nugget 6.66977e-11 to MME diagonal (scaled from median diag)
h2_table <- h2_estimates(
  random_mod = rcbd_model_rand,
  fixed_mod  = rcbd_model,
  genotype   = "genotype"
)
print(h2_table)
      variable             value
 source_random              TRUE
  source_fixed              TRUE
      sigma_g2  98837.0586124318
      sigma_e2  53022.2280843937
         n_rep                 3
 mean_PEV_BLUP  19185.1727994583
    avsed_BLUE  208.511848912685
    vdBLUE_avg  43477.1911369864
     H2_Cullis 0.805890896908529
     H2_Piepho 0.819709909797593
       H2_Plot 0.650846324662066
   H2_Standard 0.848305691274437

Note:
Plot and entry mean heritabilities are just the approximation- assuming a single experiment with only 'genotype' and 'error' variance components in the model. For MET, modify the equations and estimate accordingly
# Extract BLUEs and Rank Genotypes
blues <- rcbd_model$BLUEs
genotype_ranking <- blues %>% arrange(desc(estimate))
print(genotype_ranking)
                 effect   estimate std.error    z.ratio
(Intercept) (Intercept) 5406.47680  159.6052 33.8740637
genotypeG11 genotypeG11  353.90274  208.5108  1.6972877
genotypeG10 genotypeG10  327.20688  208.5108  1.5692566
genotypeG12 genotypeG12  182.19156  208.5108  0.8737754
genotypeG13 genotypeG13   34.68584  208.5108  0.1663503
genotypeG16 genotypeG16  -70.49500  208.5108 -0.3380881
genotypeG20 genotypeG20 -165.45782  208.5108 -0.7935218
genotypeG14 genotypeG14 -174.70965  208.5108 -0.8378927
genotypeG15 genotypeG15 -209.31771  208.5108 -1.0038701
genotypeG18 genotypeG18 -247.55828  208.5108 -1.1872686
genotypeG3   genotypeG3 -329.42266  208.5108 -1.5798833
genotypeG17 genotypeG17 -357.37902  208.5108 -1.7139597
genotypeG2   genotypeG2 -423.77917  208.5108 -2.0324092
genotypeG4   genotypeG4 -487.16973  208.5108 -2.3364250
genotypeG6   genotypeG6 -525.87249  208.5108 -2.5220401
genotypeG5   genotypeG5 -595.12547  208.5108 -2.8541716
genotypeG19 genotypeG19 -618.78416  208.5108 -2.9676367
genotypeG7   genotypeG7 -829.20251  208.5108 -3.9767854
genotypeG9   genotypeG9 -859.56759  208.5108 -4.1224137
genotypeG8   genotypeG8 -860.77762  208.5108 -4.1282169

Dataset 2: Alpha-Lattice Design

Data Description

An alpha-lattice design with:

  • 50 genotypes
  • 2 replications
  • Incomplete blocks of size 5 within each replication
set.seed(789)
n_genotypes <- 50
n_reps <- 2
block_size <- 5
n_blocks_per_rep <- n_genotypes / block_size

# Create design structure
alpha_data <- data.frame(
  genotype = factor(rep(paste0("G", 1:n_genotypes), times = n_reps)),
  rep = factor(rep(1:n_reps, each = n_genotypes))
)

# Assign incomplete blocks within reps
alpha_data$incomplete_block <- factor(
  paste0(alpha_data$rep, ".", 
         rep(rep(1:n_blocks_per_rep, each = block_size), times = n_reps))
)

# Simulate yield data
alpha_data$yield <- 6000

# Add genotype effects
genotype_effects <- rnorm(n_genotypes, mean = 0, sd = 400)
alpha_data$yield <- alpha_data$yield + 
  genotype_effects[as.numeric(alpha_data$genotype)]

# Add rep effects
rep_effects <- c(300, -300)
alpha_data$yield <- alpha_data$yield + 
  rep_effects[as.numeric(alpha_data$rep)]

# Add incomplete block effects
n_incomplete_blocks <- n_reps * n_blocks_per_rep
incomplete_block_effects <- rnorm(n_incomplete_blocks, mean = 0, sd = 200)
alpha_data$yield <- alpha_data$yield + 
  incomplete_block_effects[as.numeric(alpha_data$incomplete_block)]

# Add error
alpha_data$yield <- alpha_data$yield + rnorm(nrow(alpha_data), mean = 0, sd = 350)

head(alpha_data, 10)
   genotype rep incomplete_block    yield
1        G1   1              1.1 6690.737
2        G2   1              1.1 6256.748
3        G3   1              1.1 6364.876
4        G4   1              1.1 6398.428
5        G5   1              1.1 6426.309
6        G6   1              1.2 5131.415
7        G7   1              1.2 7174.288
8        G8   1              1.2 6985.033
9        G9   1              1.2 6968.603
10      G10   1              1.2 4985.144

Model Fitting

# Fit Alpha-Lattice Model
alpha_model_fixed <- mandala(
  fixed  = yield ~ genotype,
  random = ~ rep + rep:incomplete_block,
  data   = alpha_data
)
Initial data rows: 100 
Final data rows after NA handling: 100 
Explicit param_df Check:
                  name  type vc_idx     init lower upper origin
1                  rep   var      1 151979.6 1e-06   Inf      G
2 rep:incomplete_block   var      2 151979.6 1e-06   Inf      G
3             R.sigma2 R_var      0 189974.5 1e-06   Inf      R
--- Starting EM Warmup ---
EM Warmup: starting theta:
 rep 152000 rep:incomplete_block 152000 R.sigma2 190000 
EM iter 1: logLik=-391.3152; theta: rep 137400 rep:incomplete_block 122500 R.sigma2 80030 
EM iter 2 : logLik decreased or failed. Stopping EM.
EM Warmup: ending theta:
 rep 137400 rep:incomplete_block 122500 R.sigma2 80030 
--- EM Warmup Complete. Starting AI-REML ---
ai_reml_core(): method=auto
Starting AI-REML logLik = -391.4625
 Iter     LogLik   Sigma2  DF     wall    Restrained
    1  -388.9722 99238.374   50  09:44:47  ( 0 restrained)
    2  -387.8479 123055.584   50  09:44:47  ( 0 restrained)
    3  -387.6506 135005.966   50  09:44:47  ( 0 restrained)
    4  -387.4433 136953.119   50  09:44:47  ( 0 restrained)
    5  -387.2071 133881.113   50  09:44:47  ( 0 restrained)
    6  -387.0147 129991.301   50  09:44:47  ( 0 restrained)
    7  -386.8842 127391.733   50  09:44:47  ( 0 restrained)
    8  -386.8012 126425.885   50  09:44:47  ( 0 restrained)
    9  -386.7535 126551.856   50  09:44:47  ( 0 restrained)
   10  -386.7357 127083.483   50  09:44:47  ( 0 restrained)
   11  -386.7294 127562.203   50  09:44:47  ( 0 restrained)
   12  -386.7272 127817.638   50  09:44:47  ( 0 restrained)
   13  -386.7264 127872.012   50  09:44:47  ( 0 restrained)
   14  -386.7260 127819.103   50  09:44:47  ( 0 restrained)
   15  -386.7259 127743.357   50  09:44:47  ( 0 restrained)
   16  -386.7259 127689.832   50  09:44:47  ( 0 restrained)
   17  -386.7259 127668.121   50  09:44:47  ( 0 restrained)
LogLik plateaued for 5 iterations -- stopping.
Main AI-REML Loop finished. Combining results.
solveMME_cpp: adding nugget 1.5663e-11 to MME diagonal (scaled from median diag)
summary(alpha_model_fixed)
Model statement:
mandala(fixed = yield ~ genotype, random = ~rep + rep:incomplete_block, 
    data = alpha_data)

Variance Components:
            component  estimate std.error   z.ratio bound %ch
                  rep 188544.87 272761.82 0.6912436     P  NA
 rep:incomplete_block  17666.16  21157.54 0.8349816     P  NA
             R.sigma2 127689.83  28564.28 4.4702621     P  NA

Fixed Effects (BLUEs) [first 5]:
      effect   estimate std.error    z.ratio
 (Intercept)  6126.1006  408.5899 14.9932734
 genotypeG10 -1212.1477  381.2500 -3.1794035
 genotypeG11   184.7454  381.2500  0.4845780
 genotypeG12   339.2113  381.2500  0.8897345
 genotypeG13  -491.5735  381.2500 -1.2893729

Converged: TRUE  |  Iterations: 17

Random Effects (BLUPs) [first 5]:
               random  level   estimate std.error    z.ratio
                  rep      1  303.58093  310.4575  0.9778503
                  rep      2 -303.56283  310.4575 -0.9777921
 rep:incomplete_block  1.1.1   19.31326  120.0306  0.1609028
 rep:incomplete_block 1.1.10  -31.22415  120.0308 -0.2601344
 rep:incomplete_block  1.1.2   36.17061  120.0308  0.3013443

logLik: -386.726   AIC: 779.452   BIC: 785.188   logLik_Trunc: -340.779
var_comps_alpha <- summary(alpha_model_fixed)$varcomp
Model statement:
mandala(fixed = yield ~ genotype, random = ~rep + rep:incomplete_block, 
    data = alpha_data)

Variance Components:
            component  estimate std.error   z.ratio bound %ch
                  rep 188544.87 272761.82 0.6912436     P  NA
 rep:incomplete_block  17666.16  21157.54 0.8349816     P  NA
             R.sigma2 127689.83  28564.28 4.4702621     P  NA

Fixed Effects (BLUEs) [first 5]:
      effect   estimate std.error    z.ratio
 (Intercept)  6126.1006  408.5899 14.9932734
 genotypeG10 -1212.1477  381.2500 -3.1794035
 genotypeG11   184.7454  381.2500  0.4845780
 genotypeG12   339.2113  381.2500  0.8897345
 genotypeG13  -491.5735  381.2500 -1.2893729

Converged: TRUE  |  Iterations: 17

Random Effects (BLUPs) [first 5]:
               random  level   estimate std.error    z.ratio
                  rep      1  303.58093  310.4575  0.9778503
                  rep      2 -303.56283  310.4575 -0.9777921
 rep:incomplete_block  1.1.1   19.31326  120.0306  0.1609028
 rep:incomplete_block 1.1.10  -31.22415  120.0308 -0.2601344
 rep:incomplete_block  1.1.2   36.17061  120.0308  0.3013443

logLik: -386.726   AIC: 779.452   BIC: 785.188   logLik_Trunc: -340.779
print(var_comps_alpha)
             component  estimate std.error   z.ratio bound %ch
1                  rep 188544.87 272761.82 0.6912436     P  NA
2 rep:incomplete_block  17666.16  21157.54 0.8349816     P  NA
3             R.sigma2 127689.83  28564.28 4.4702621     P  NA
# Heritability Estimation
alpha_model_random <- mandala(
  fixed  = yield ~ 1,
  random = ~ genotype + rep + rep:incomplete_block,
  data   = alpha_data
)
Initial data rows: 100 
Final data rows after NA handling: 100 
Explicit param_df Check:
                  name  type vc_idx     init lower upper origin
1             genotype   var      1 151979.6 1e-06   Inf      G
2                  rep   var      2 151979.6 1e-06   Inf      G
3 rep:incomplete_block   var      3 151979.6 1e-06   Inf      G
4             R.sigma2 R_var      0 189974.5 1e-06   Inf      R
--- Starting EM Warmup ---
EM Warmup: starting theta:
 genotype 152000 rep 152000 rep:incomplete_block 152000 R.sigma2 190000 
EM iter 1: logLik=-766.1918; theta: genotype 133200 rep 137400 rep:incomplete_block 101600 R.sigma2 132300 
EM iter 2: logLik=-762.6178; theta: genotype 115600 rep 126200 rep:incomplete_block 74720 R.sigma2 163200 
EM iter 3: logLik=-762.2359; theta: genotype 104300 rep 117700 rep:incomplete_block 56690 R.sigma2 158300 
EM iter 4: logLik=-761.4972; theta: genotype 95280 rep 111400 rep:incomplete_block 45080 R.sigma2 166600 
EM iter 5: logLik=-761.3636; theta: genotype 88740 rep 106700 rep:incomplete_block 37040 R.sigma2 168800 
EM Warmup: ending theta:
 genotype 88740 rep 106700 rep:incomplete_block 37040 R.sigma2 168800 
--- EM Warmup Complete. Starting AI-REML ---
ai_reml_core(): method=auto
Starting AI-REML logLik = -761.2430
 Iter     LogLik   Sigma2  DF     wall    Restrained
    1  -759.8962 138629.660   99  09:44:47  ( 0 restrained)
    2  -759.6219 108637.486   99  09:44:47  ( 0 restrained)
    3  -759.6219 108637.486   99  09:44:47  ( 0 restrained)
Converged at iter 3 by tolerance(s): delta_theta=0.00e+00, delta_loglik=0.00e+00
Main AI-REML Loop finished. Combining results.
solveMME_cpp: adding nugget 2.57389e-11 to MME diagonal (scaled from median diag)
h2_alpha <- h2_estimates(
  random_mod = alpha_model_random,
  fixed_mod  = alpha_model_fixed,
  genotype   = "genotype"
)
print(h2_alpha)
      variable             value
 source_random              TRUE
  source_fixed              TRUE
      sigma_g2  136443.111750759
      sigma_e2  108637.485589495
         n_rep                 2
 mean_PEV_BLUP  44650.1495327522
    avsed_BLUE  379.359551187481
    vdBLUE_avg  143913.669077167
     H2_Cullis 0.672756294108018
     H2_Piepho 0.654717595566562
       H2_Plot 0.556727514260668
   H2_Standard 0.715253644790974

Note:
Plot and entry mean heritabilities are just the approximation- assuming a single experiment with only 'genotype' and 'error' variance components in the model. For MET, modify the equations and estimate accordingly
# BLUPs
blups_alpha <- mandala_predict(alpha_model_random, classify_term = "genotype")
Random terms included: 
PEV matrix computed. Scale factor used = 108637

Notes:
- The predictions are obtained by averaging across the hypertable
  calculated from model terms constructed solely from factors in
  the averaging and classify sets.
- The simple averaging set: (none)
- The ignored set: rep, rep:incomplete_block
head(blups_alpha)
  genotype predicted_value std_error
1       G1        6060.863  117035.9
2      G10        5228.278  117035.9
3      G11        6197.783  117035.9
4      G12        6308.265  117035.9
5      G13        5714.043  117035.9
6      G14        6053.733  117035.9

Dataset 3: Multi-Environment Trial

Data Description

A multi-environment trial with:

  • 30 genotypes
  • 4 locations (environments)
  • 3 blocks per location
set.seed(101)
n_genotypes <- 30
n_locations <- 4
n_blocks <- 3

met_data <- expand.grid(
  genotype = factor(paste0("G", 1:n_genotypes)),
  location = factor(paste0("Loc", 1:n_locations)),
  block = factor(paste0("B", 1:n_blocks))
)

# Simulate yield data
met_data$yield <- 5500

# Add genotype main effects
genotype_effects <- rnorm(n_genotypes, mean = 0, sd = 300)
met_data$yield <- met_data$yield + 
  genotype_effects[as.numeric(met_data$genotype)]

# Add location effects
location_effects <- c(800, 200, -300, -700)
met_data$yield <- met_data$yield + 
  location_effects[as.numeric(met_data$location)]

# Add genotype x location interaction
gxe_effects <- matrix(
  rnorm(n_genotypes * n_locations, mean = 0, sd = 250), 
  nrow = n_genotypes, 
  ncol = n_locations
)
met_data$yield <- met_data$yield + 
  gxe_effects[cbind(as.numeric(met_data$genotype), 
                    as.numeric(met_data$location))]

# Add block within location effects
n_block_loc <- n_locations * n_blocks
block_loc_effects <- rnorm(n_block_loc, mean = 0, sd = 150)
met_data$block_loc <- factor(paste0(met_data$location, "_", met_data$block))
met_data$yield <- met_data$yield + 
  block_loc_effects[as.numeric(met_data$block_loc)]

# Add error
met_data$yield <- met_data$yield + rnorm(nrow(met_data), mean = 0, sd = 400)

head(met_data, 10)
   genotype location block    yield block_loc
1        G1     Loc1    B1 5731.465   Loc1_B1
2        G2     Loc1    B1 5523.519   Loc1_B1
3        G3     Loc1    B1 5900.409   Loc1_B1
4        G4     Loc1    B1 6928.250   Loc1_B1
5        G5     Loc1    B1 5208.988   Loc1_B1
6        G6     Loc1    B1 5877.214   Loc1_B1
7        G7     Loc1    B1 6453.261   Loc1_B1
8        G8     Loc1    B1 5900.387   Loc1_B1
9        G9     Loc1    B1 6430.253   Loc1_B1
10      G10     Loc1    B1 6269.523   Loc1_B1

Exploratory Analysis

# Location summary
met_data %>%
  group_by(location) %>%
  summarise(
    mean_yield = mean(yield),
    sd_yield = sd(yield),
    min_yield = min(yield),
    max_yield = max(yield)
  )
# A tibble: 4 × 5
  location mean_yield sd_yield min_yield max_yield
  <fct>         <dbl>    <dbl>     <dbl>     <dbl>
1 Loc1          6156.     488.     4924.     7290.
2 Loc2          5567.     605.     4126.     6960.
3 Loc3          5304.     536.     4058.     6528.
4 Loc4          4686.     545.     3584.     6042.
# GxE interaction plot
met_means <- met_data %>%
  group_by(genotype, location) %>%
  summarise(mean_yield = mean(yield), .groups = "drop")

# Select top 10 genotypes for visualization
top_genotypes <- met_means %>%
  group_by(genotype) %>%
  summarise(overall_mean = mean(mean_yield), .groups = "drop") %>%
  arrange(desc(overall_mean)) %>%
  head(10) %>%
  pull(genotype)

met_means %>%
  filter(genotype %in% top_genotypes) %>%
  ggplot(aes(x = location, y = mean_yield, group = genotype, color = genotype)) +
  geom_line(linewidth = 1, alpha = 0.7) +
  geom_point(size = 3) +
  theme_minimal() +
  scale_color_viridis_d() +
  labs(
    title = "Genotype × Environment Interaction (Top 10 Genotypes)",
    x = "Location", 
    y = "Mean Yield (kg/ha)"
  )

Model Fitting

# Fit Multi-Environment Model
met_model_fixed <- mandala(
  fixed  = yield ~ genotype + location + genotype:location,
  random = ~ location:block,
  data   = met_data
)
Initial data rows: 360 
Final data rows after NA handling: 360 
Explicit param_df Check:
            name  type vc_idx     init lower upper origin
1 location:block   var      1 229685.4 1e-06   Inf      G
2       R.sigma2 R_var      0 287106.8 1e-06   Inf      R
--- Starting EM Warmup ---
EM Warmup: starting theta:
 location:block 229700 R.sigma2 287100 
EM iter 1: logLik=-1865.5855; theta: location:block 160600 R.sigma2 98970 
EM iter 2: logLik=-1849.9651; theta: location:block 115600 R.sigma2 99410 
EM iter 3: logLik=-1848.8735; theta: location:block 84830 R.sigma2 99180 
EM iter 4: logLik=-1848.3127; theta: location:block 64240 R.sigma2 99260 
EM iter 5: logLik=-1847.8266; theta: location:block 50430 R.sigma2 99350 
EM Warmup: ending theta:
 location:block 50430 R.sigma2 99350 
--- EM Warmup Complete. Starting AI-REML ---
ai_reml_core(): method=auto
Starting AI-REML logLik = -1847.5449
 Iter     LogLik   Sigma2  DF     wall    Restrained
    1 -1840.3336 123189.367  240  09:44:48  ( 0 restrained)
    2 -1839.3973 152754.815  240  09:44:48  ( 0 restrained)
    3 -1839.3973 152754.815  240  09:44:48  ( 0 restrained)
Converged at iter 3 by tolerance(s): delta_theta=0.00e+00, delta_loglik=0.00e+00
Main AI-REML Loop finished. Combining results.
solveMME_cpp: adding nugget 1.96393e-11 to MME diagonal (scaled from median diag)
summary(met_model_fixed)
Model statement:
mandala(fixed = yield ~ genotype + location + genotype:location, 
    random = ~location:block, data = met_data)

Variance Components:
      component estimate std.error   z.ratio bound %ch
 location:block  40270.0  22191.43  1.814664     P  NA
       R.sigma2 152754.8  11437.85 13.355209     P  NA

Fixed Effects (BLUEs) [first 5]:
      effect  estimate std.error    z.ratio
 (Intercept) 5968.0741  253.6442 23.5293175
 genotypeG10  607.3811  319.1079  1.9033724
 genotypeG11  300.6162  319.1079  0.9420520
 genotypeG12  163.7220  319.1079  0.5130616
 genotypeG13 1025.1174  319.1079  3.2124481

Converged: TRUE  |  Iterations: 3

Random Effects (BLUPs) [first 5]:
         random   level  estimate std.error    z.ratio
 location:block Loc1.B1 -25.31215  128.2062 -0.1974332
 location:block Loc2.B1 302.57897  128.2062  2.3600960
 location:block Loc3.B1 -42.51595  128.2062 -0.3316216
 location:block Loc4.B1 -15.29247  128.2062 -0.1192803
 location:block Loc1.B2 116.55314  128.2062  0.9091071

logLik: -1839.397   AIC: 3682.795   BIC: 3689.756   logLik_Trunc: -1618.852
var_comps_met <- summary(met_model_fixed)$varcomp
Model statement:
mandala(fixed = yield ~ genotype + location + genotype:location, 
    random = ~location:block, data = met_data)

Variance Components:
      component estimate std.error   z.ratio bound %ch
 location:block  40270.0  22191.43  1.814664     P  NA
       R.sigma2 152754.8  11437.85 13.355209     P  NA

Fixed Effects (BLUEs) [first 5]:
      effect  estimate std.error    z.ratio
 (Intercept) 5968.0741  253.6442 23.5293175
 genotypeG10  607.3811  319.1079  1.9033724
 genotypeG11  300.6162  319.1079  0.9420520
 genotypeG12  163.7220  319.1079  0.5130616
 genotypeG13 1025.1174  319.1079  3.2124481

Converged: TRUE  |  Iterations: 3

Random Effects (BLUPs) [first 5]:
         random   level  estimate std.error    z.ratio
 location:block Loc1.B1 -25.31215  128.2062 -0.1974332
 location:block Loc2.B1 302.57897  128.2062  2.3600960
 location:block Loc3.B1 -42.51595  128.2062 -0.3316216
 location:block Loc4.B1 -15.29247  128.2062 -0.1192803
 location:block Loc1.B2 116.55314  128.2062  0.9091071

logLik: -1839.397   AIC: 3682.795   BIC: 3689.756   logLik_Trunc: -1618.852
print(var_comps_met)
       component estimate std.error   z.ratio bound %ch
1 location:block  40270.0  22191.43  1.814664     P  NA
2       R.sigma2 152754.8  11437.85 13.355209     P  NA
# Heritability Estimation
met_model_random <- mandala(
  fixed  = yield ~ location,
  random = ~ genotype + genotype:location + location:block,
  data   = met_data
)
Initial data rows: 360 
Final data rows after NA handling: 360 
Explicit param_df Check:
               name  type vc_idx     init lower upper origin
1          genotype   var      1 229685.4 1e-06   Inf      G
2 genotype:location   var      2 229685.4 1e-06   Inf      G
3    location:block   var      3 229685.4 1e-06   Inf      G
4          R.sigma2 R_var      0 287106.8 1e-06   Inf      R
--- Starting EM Warmup ---
EM Warmup: starting theta:
 genotype 229700 genotype:location 229700 location:block 229700 R.sigma2 287100 
EM iter 1: logLik=-2754.0859; theta: genotype 155400 genotype:location 162300 location:block 160600 R.sigma2 142100 
EM iter 2: logLik=-2715.4535; theta: genotype 111900 genotype:location 126100 location:block 115600 R.sigma2 155600 
EM iter 3: logLik=-2710.1406; theta: genotype 84090 genotype:location 99910 location:block 84970 R.sigma2 151200 
EM iter 4: logLik=-2705.7996; theta: genotype 66500 genotype:location 82520 location:block 64480 R.sigma2 154500 
EM iter 5: logLik=-2704.0857; theta: genotype 55100 genotype:location 70520 location:block 50680 R.sigma2 156500 
EM Warmup: ending theta:
 genotype 55100 genotype:location 70520 location:block 50680 R.sigma2 156500 
--- EM Warmup Complete. Starting AI-REML ---
ai_reml_core(): method=auto
Starting AI-REML logLik = -2703.5430
 Iter     LogLik   Sigma2  DF     wall    Restrained
    1 -2702.7530 144938.837  356  09:44:49  ( 0 restrained)
Warning in ai_reml_core(y = y, X = X, Zlst = built_G$Zlst, random_names =
built_G$random_names, : Iter 2: Fast step rejected. Re-using previous
parameters.
    2 -2702.7530 144938.837  356  09:44:49  ( 0 restrained)
Converged at iter 2 by tolerance(s): delta_theta=0.00e+00, delta_loglik=0.00e+00
Main AI-REML Loop finished. Combining results.
solveMME_cpp: adding nugget 3.57101e-11 to MME diagonal (scaled from median diag)
h2_met <- h2_estimates(
  random_mod = met_model_random,
  fixed_mod  = met_model_fixed,
  genotype   = "genotype"
)
print(h2_met)
      variable             value
 source_random              TRUE
  source_fixed              TRUE
      sigma_g2  63388.6139119626
      sigma_e2  144938.837455179
         n_rep                12
 mean_PEV_BLUP  21224.4944796125
    avsed_BLUE  559.483573688463
    vdBLUE_avg  313021.869227213
     H2_Cullis 0.665168660903514
     H2_Piepho 0.288261682831932
       H2_Plot 0.304273937476685
   H2_Standard  0.83995308674032

Note:
Plot and entry mean heritabilities are just the approximation- assuming a single experiment with only 'genotype' and 'error' variance components in the model. For MET, modify the equations and estimate accordingly
# BLUPs
blups_by_loc <- mandala_predict(met_model_random, classify_term = "genotype:location")
Random terms included: 
PEV matrix computed. Scale factor used = 1

Notes:
- The predictions are obtained by averaging across the hypertable
  calculated from model terms constructed solely from factors in
  the averaging and classify sets.
- The simple averaging set: (none)
- The ignored set: location:block
blups_overall <- mandala_predict(met_model_random, classify_term = "genotype")
Random terms included: 
PEV matrix computed. Scale factor used = 1

Notes:
- The predictions are obtained by averaging across the hypertable
  calculated from model terms constructed solely from factors in
  the averaging and classify sets.
- The simple averaging set: location, [num mean] 
- The ignored set: location:block
head(blups_overall)
  genotype predicted_value std_error
1       G1        5345.805  118.5263
2      G10        5386.511  118.5263
3      G11        5178.997  118.5263
4      G12        5529.319  118.5263
5      G13        5924.195  118.5263
6      G14        5736.369  118.5263

Summary

This tutorial demonstrated:

  1. RCBD Analysis — Simple randomized complete block design
  2. Alpha-Lattice Design — Incomplete block designs
  3. Multi-Environment Trials — Analyzing GxE interactions

Next Steps

Session Information

sessionInfo()
R version 4.5.2 (2025-10-31)
Platform: aarch64-apple-darwin20
Running under: macOS Tahoe 26.2

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] tidyr_1.3.2   ggplot2_4.0.2 dplyr_1.2.0   mandala_1.0.1

loaded via a namespace (and not attached):
 [1] Matrix_1.7-4       gtable_0.3.6       jsonlite_2.0.0     compiler_4.5.2    
 [5] tidyselect_1.2.1   Rcpp_1.1.1         scales_1.4.0       yaml_2.3.12       
 [9] fastmap_1.2.0      lattice_0.22-7     R6_2.6.1           labeling_0.4.3    
[13] generics_0.1.4     knitr_1.51         tibble_3.3.1       pillar_1.11.1     
[17] RColorBrewer_1.1-3 rlang_1.1.7        utf8_1.2.6         xfun_0.56         
[21] S7_0.2.1           viridisLite_0.4.2  cli_3.6.5          withr_3.0.2       
[25] magrittr_2.0.4     digest_0.6.39      grid_4.5.2         lifecycle_1.0.5   
[29] vctrs_0.7.1        evaluate_1.0.5     glue_1.8.0         farver_2.1.2      
[33] rmarkdown_2.30     purrr_1.2.1        tools_4.5.2        pkgconfig_2.0.3   
[37] htmltools_0.5.9