---
title: "11. Package overview: a clean-room methods tour"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 2
vignette: >
  %\VignetteIndexEntry{11. Package overview: a clean-room methods tour}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE, comment = "#>", message = FALSE, warning = FALSE,
  fig.width = 7, fig.height = 5.5
)
```

# Introduction

Idiographic science treats the individual — not the group average — as the
unit of analysis (Molenaar, 2004). Intensive longitudinal designs such as
experience sampling (ESM), ecological momentary assessment (EMA), and diary
studies produce the many-occasions-per-person data this paradigm requires,
and dynamic network models translate those repeated measurements into
interpretable structures: *temporal* networks of lagged, directed effects,
*contemporaneous* networks of same-occasion partial associations, and
*between-person* networks of stable individual differences (Epskamp,
Waldorp, Mõttus, & Borsboom, 2018).

The `idiographic` package implements the principal estimators of this
literature — ordinary and regularized vector autoregression (VAR), multilevel
VAR, Bayesian dynamic structural equation modeling (DSEM), unified structural
equation modeling (uSEM), and Group Iterative Multiple Model Estimation
(GIMME) — together with the methodological workflow that surrounds them:
preprocessing audits, rolling-window (time-varying) estimation, and
structured model comparison. Each estimator is a single function with named
arguments, and every result is queried through the same tidy accessors:
`edges()`, `nodes()`, `coefs()`, `matrices()`, `summary()`, and `plot()`.

# Clean-room implementation

All estimators in `idiographic` are **clean-room reimplementations**: each
was written from the published algorithm — the estimating equations, model
specification, and selection criteria described in the methodological
literature — rather than by wrapping an existing package. Correctness is then
established empirically, by demonstrating numerical equivalence against the
reference implementation on shared data:

| Estimator | Method | Reference implementation | Agreement |
|---|---|---|---|
| `graphical_var()` | Regularized graphical VAR (graphical lasso + EBIC) | `graphicalVAR` | ~1e-10 |
| `build_mlvar()` | Two-step multilevel VAR (`lmer` fixed effects) | `mlVAR` (`estimator = "lmer"`) | machine precision |
| `build_mlvar_bayes()` | Bayesian multilevel VAR / DSEM | Mplus 9 DSEM; independent Stan/JAGS | Monte Carlo error |
| `build_var_bayes()` | Bayesian VAR(1), Normal-inverse-Wishart | Mplus 9 `ESTIMATOR = BAYES` | ~1e-3 |
| `build_gimme()` | GIMME group + individual search | `gimme` (its own bundled data) | exact path sets and coefficients |
| internal graphical lasso | Graphical lasso (Friedman, Hastie, & Tibshirani, 2008) | `glasso` (KKT-checked) | ~1e-11 |

Two properties follow from this design. First, the runtime dependency
footprint is minimal — the package imports only `stats`, `utils`, `lme4`,
and `lavaan`; the reference packages are needed solely to regenerate
validation fixtures. Second, the Bayesian DSEM estimator reproduces the
two-level Bayesian VAR with latent mean centering that Mplus fits under
`TYPE = TWOLEVEL; ESTIMATOR = BAYES` (Asparouhov, Hamaker, & Muthén, 2018)
**without an Mplus installation**.
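Checking a graphical lasso solution against its Karush-Kuhn-Tucker (KKT) optimality conditions needs no reference package. The sketch below assumes the objective `-log det(Theta) + tr(S Theta) + lambda * sum(abs(Theta))` with the diagonal penalized (the `glasso` default). `kkt_check()` is a hypothetical helper for illustration, not a function exported by `idiographic`:

```r
# Hypothetical helper (not part of idiographic): checks the KKT conditions of
#   -log det(Theta) + tr(S Theta) + lambda * sum(abs(Theta)),
# the graphical lasso objective with a penalized diagonal.
# Stationarity requires W - S = lambda * Gamma, where W = solve(Theta) and
# Gamma_ij = sign(Theta_ij) if Theta_ij != 0, otherwise Gamma_ij in [-1, 1].
kkt_check <- function(Theta, S, lambda, tol = 1e-6) {
  W <- solve(Theta)             # covariance implied by the solution
  G <- W - S                    # must equal lambda times a subgradient
  nz <- abs(Theta) > tol        # active (nonzero) entries
  active_ok <- all(abs(G[nz] - lambda * sign(Theta[nz])) < tol)
  zero_ok   <- all(abs(G[!nz]) <= lambda + tol)
  active_ok && zero_ok
}

# Worked case: once lambda >= max |S_ij| (i != j), the optimum is diagonal
# with Theta_ii = 1 / (S_ii + lambda).
S <- matrix(c(1.0, 0.3,
              0.3, 2.0), 2, 2)
lambda <- 0.5
Theta <- diag(1 / (diag(S) + lambda))
kkt_check(Theta, S, lambda)     # TRUE
kkt_check(solve(S), S, lambda)  # FALSE: the unpenalized inverse is not optimal
```

A solution returned by any implementation can be passed through the same check, which is what makes it a useful complement to direct numerical comparison.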

# The empirical example

Throughout this vignette we analyze the self-regulated learning (SRL)
experience-sampling data from Chapter 20 of the *Learning Analytics Methods*
book: 36 students each reported nine SRL indicators on 156 occasions. The
panel is imported directly from the lamethods data repository:

```{r import}
library(rio)
library(idiographic)

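# Long-format panel: one row per student-occasion
df <- import("https://github.com/lamethods/data2/raw/main/srl/srl.RDS")

nrow(df)                 # number of student-occasion rows
length(unique(df$name))  # number of students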
```

Rows arrive ordered by student and, within student, by occasion, so the
estimators form lag-1 pairs from consecutive rows within each `id` — no
manual sorting or indexing is required. (The same panel, tidied to the nine
SRL indicators with an explicit `day` occasion index, ships with the package
as `data(srl)` for offline use.)
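Forming lag-1 pairs within each person amounts to pairing every row with the preceding row and discarding the pair that would straddle two people. A base-R sketch of that idea follows. `lag1_pairs()` is a hypothetical helper, and it ignores refinements such as day boundaries that the package's own design construction accounts for:

```r
# Hypothetical helper: pair each row with the previous row of the same person.
# Assumes rows are already ordered by id and, within id, by occasion.
lag1_pairs <- function(data, vars, id) {
  n <- nrow(data)
  # keep rows whose predecessor belongs to the same person
  keep <- which(c(FALSE, data[[id]][-1] == data[[id]][-n]))
  current <- data[keep, c(id, vars), drop = FALSE]
  lagged <- data[keep - 1, vars, drop = FALSE]   # values at the previous occasion
  names(lagged) <- paste0(vars, "_lag1")
  out <- data.frame(current, lagged)
  rownames(out) <- NULL
  out
}

toy <- data.frame(name = c("A", "A", "A", "B", "B"),
                  x = c(1, 2, 3, 10, 20))
lag1_pairs(toy, vars = "x", id = "name")
```

Each person's first row has no predecessor and drops out, so a person with `T` consecutive rows contributes `T - 1` pairs.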

We follow the package convention of a focused five-indicator set so printed
networks stay readable, and use `Grace` for the single-person analyses:

```{r vars}
vars <- c("efficacy", "value", "planning", "monitoring", "effort")
# Plot chunks below run only if the optional cograph package is installed
has_cograph <- requireNamespace("cograph", quietly = TRUE)
```

# 1. Preprocessing audit — `audit_preprocess()`

Dynamic network estimates are only as sound as the lag-1 design behind them.
`audit_preprocess()` constructs the exact lagged design the estimators use
and reports compliance, missingness, day-boundary drops, linear trends,
near-unit-root persistence, and split-half drift — making the modeling input
explicit before any model is fit.
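Two of these diagnostics are easy to state for a single series. This minimal sketch assumes persistence is screened with the lag-1 autocorrelation and split-half drift with the difference between second-half and first-half means in units of the overall standard deviation. Both definitions and the helper name `series_checks()` are illustrative assumptions, not the package's exact criteria:

```r
# Hypothetical helper: simple persistence and drift screens for one series.
series_checks <- function(x) {
  x <- as.numeric(x)
  n <- length(x)
  ok <- !is.na(x[-1]) & !is.na(x[-n])          # complete consecutive pairs
  ar1 <- cor(x[-1][ok], x[-n][ok])             # lag-1 autocorrelation
  half <- seq_len(n) <= n / 2                  # first half of the series
  drift <- (mean(x[!half], na.rm = TRUE) - mean(x[half], na.rm = TRUE)) /
    sd(x, na.rm = TRUE)                        # standardized mean shift
  c(ar1 = ar1, drift = drift)
}

set.seed(1)
series_checks(cumsum(rnorm(100)))  # random walk: ar1 close to 1
series_checks(rnorm(100))          # white noise: ar1 close to 0
```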

```{r audit}
audit <- audit_preprocess(df, vars = vars, id = "name")

audit
```

The subject-by-variable diagnostics are a tidy table:

```{r audit-table}
head(as.data.frame(audit))
```

# 2. Ordinary VAR — `build_var()`

The transparent baseline: each variable is regressed on an intercept and the
lag-1 values of all variables by ordinary least squares, and the residual
concentration matrix yields the contemporaneous partial-correlation network.
Passing `subject` selects one person's series — the caller never slices the
data frame.
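Both steps of this baseline can be written directly in base R. The sketch below handles one person's complete, ordered series without missing values. `var1_ols()` is a hypothetical helper and not the package's implementation:

```r
# Hypothetical helper: VAR(1) by OLS, plus the contemporaneous
# partial-correlation network from the residual concentration matrix.
var1_ols <- function(Y) {
  Y <- as.matrix(Y)
  n <- nrow(Y)
  X <- cbind(1, Y[-n, , drop = FALSE])          # intercept + lag-1 predictors
  Yt <- Y[-1, , drop = FALSE]                   # outcomes at occasion t
  B <- solve(crossprod(X), crossprod(X, Yt))    # (p + 1) x p OLS coefficients
  E <- Yt - X %*% B                             # residuals
  Sigma <- crossprod(E) / (nrow(E) - ncol(X))   # residual covariance
  K <- solve(Sigma)                             # residual concentration matrix
  pcor <- -K / sqrt(outer(diag(K), diag(K)))    # partial correlations
  diag(pcor) <- 1
  # temporal[i, j]: effect of variable j at t - 1 on variable i at t
  list(temporal = t(B[-1, , drop = FALSE]), contemporaneous = pcor)
}

# Usage on a simulated series with a known temporal matrix
set.seed(1)
A <- matrix(c(0.5, 0.0,
              0.3, 0.4), 2, 2, byrow = TRUE)
Y <- matrix(0, 500, 2, dimnames = list(NULL, c("x", "y")))
for (t in 2:500) Y[t, ] <- A %*% Y[t - 1, ] + rnorm(2)
fit <- var1_ols(Y)
round(fit$temporal, 2)  # close to A
```

The sign flip and rescaling in `pcor` is the standard map from a concentration matrix to partial correlations, so a zero in `K` corresponds exactly to a missing contemporaneous edge.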

```{r var}
var_fit <- build_var(df, vars = vars, id = "name", subject = "Grace",
                     scale = TRUE)

var_fit
```

```{r var-tables}
head(edges(var_fit))

summary(var_fit)
```

```{r plot-var, eval=has_cograph}
plot(var_fit)
```

# 3. Graphical VAR — `graphical_var()`

The regularized counterpart (Epskamp et al., 2018): lasso penalties on the
temporal coefficients and the contemporaneous concentration matrix, with the
penalty pair selected by the extended Bayesian information criterion (Chen &
Chen, 2008). Sparsity — cells estimated as exactly zero — is the point of
the method.
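How `gamma` trades fit against sparsity is visible in a single Gaussian graphical model. This sketch assumes the commonly used form `-2 * loglik + |E| log n + 4 * gamma * |E| log p`, where `|E|` counts nonzero off-diagonal pairs. The criterion the package applies to the joint temporal and contemporaneous fit may differ, and `ggm_ebic()` is hypothetical:

```r
# Hypothetical helper (assumed EBIC form, constants dropped):
#   -2 * loglik + |E| log n + 4 * gamma * |E| log p
ggm_ebic <- function(K, S, n, gamma = 0.5, tol = 1e-10) {
  p <- ncol(K)
  logdet <- as.numeric(determinant(K, logarithm = TRUE)$modulus)
  loglik <- (n / 2) * (logdet - sum(diag(S %*% K)))
  n_edges <- sum(abs(K[upper.tri(K)]) > tol)
  -2 * loglik + n_edges * log(n) + 4 * gamma * n_edges * log(p)
}

S <- matrix(c(1, 0.23,
              0.23, 1), 2, 2)   # sample correlation matrix, n = 100
K_empty <- diag(2)              # MLE without the edge (unit diagonal in S)
K_full  <- solve(S)             # MLE with the single edge
sapply(c(gamma0 = 0, gamma05 = 0.5), function(g)
  c(empty = ggm_ebic(K_empty, S, n = 100, gamma = g),
    full  = ggm_ebic(K_full,  S, n = 100, gamma = g)))
```

Here the one-edge model has the lower criterion at `gamma = 0`, while at `gamma = 0.5` the extra per-edge penalty tips selection to the empty graph. This illustrates why larger `gamma` values yield sparser networks.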

```{r gvar}
gvar_fit <- graphical_var(df, vars = vars, id = "name", subject = "Grace",
                          n_lambda = 12, gamma = 0)

gvar_fit
```

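Fixing a penalty instead of searching over it (e.g. `lambda_beta = 0.1`)
reproduces published fixed-penalty analyses and collapses that dimension of
the EBIC grid to a single value. With `gamma = 0`, as above, the EBIC reduces
to the ordinary BIC.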

```{r plot-gvar, eval=has_cograph}
plot(gvar_fit, layer = "temporal")
```

# 4. One network per person — `build_var_each()` / `graphical_var_each()`

When the analysis target is the idiographic map of *every* individual, fit
one model per person. The collection prints a summary across persons and
plots any subject's network by name.

```{r var-each}
var_each <- build_var_each(df, vars = vars, id = "name", scale = TRUE)

var_each
```

```{r plot-var-each, eval=has_cograph}
plot(var_each, subject = "Grace")
```

`graphical_var_each()` does the same with sparse estimation; here on a
four-student subsample for speed:

```{r gvar-each}
students4 <- subset(df, name %in% c("Grace", "Eve", "Aisha", "Bob"))

gvar_each <- graphical_var_each(students4, vars = vars, id = "name",
                                n_lambda = 8, gamma = 0)

gvar_each
```

# 5. Multilevel VAR — `build_mlvar()`

The multilevel VAR of Bringmann et al. (2013) and Epskamp et al. (2018)
decomposes the panel into three group-level networks — the average
within-person temporal network, the within-person contemporaneous network,
and the between-person network of person means — estimated with `lme4` mixed
models. This estimator matches `mlVAR::mlVAR(estimator = "lmer")` to machine
precision.

```{r mlvar}
mlvar_fit <- build_mlvar(df, vars = vars, id = "name", standardize = TRUE)

mlvar_fit
```

```{r mlvar-tables}
head(edges(mlvar_fit))

summary(mlvar_fit)
```

```{r plot-mlvar, eval=has_cograph}
plot(mlvar_fit, layer = "between")
```

# 6. Bayesian multilevel VAR / DSEM — `build_mlvar_bayes()`

`build_mlvar_bayes()` fits a native Bayesian two-level VAR(1) with latent
mean centering, the model Mplus fits as DSEM (Asparouhov et al., 2018).
Estimation uses a pure-R conjugate Gibbs sampler under Mplus's default
priors: `N(0, ∞)` on coefficients and means, and improper inverse-Wishart
priors on covariance blocks. Reported estimates are posterior medians, and
convergence is monitored with the potential scale reduction factor (PSRF)
across chains.
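The potential scale reduction factor compares the variability between chains with the variability within them, so values near 1 indicate that the chains agree. A minimal sketch of the classic Gelman-Rubin form follows. Mplus and the package may use a variant (for example, split chains or different weighting), so `psrf()` is illustrative only:

```r
# Classic (non-split) Gelman-Rubin PSRF for one parameter.
# draws: iterations x chains matrix of post-burn-in samples.
psrf <- function(draws) {
  n <- nrow(draws)
  W <- mean(apply(draws, 2, var))      # mean within-chain variance
  B <- n * var(colMeans(draws))        # between-chain variance
  var_hat <- (n - 1) / n * W + B / n   # pooled posterior variance estimate
  sqrt(var_hat / W)
}

set.seed(1)
mixed <- matrix(rnorm(2000), ncol = 2)               # two chains, same target
stuck <- cbind(rnorm(1000), rnorm(1000, mean = 3))   # chains disagree
psrf(mixed)  # close to 1
psrf(stuck)  # well above 1
```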

```{r mlvar-bayes}
bayes_fit <- build_mlvar_bayes(df, vars = vars, id = "name",
                               n_iter = 2000, n_chains = 2, seed = 1)

bayes_fit
```

`coefs()` returns posterior medians with posterior SDs, credible intervals,
and one-sided posterior p-values:

```{r mlvar-bayes-coefs}
head(coefs(bayes_fit))
```

The full DSEM extensions — person-specific temporal matrices
(`temporal = "random"`), person-specific residual covariances
(`residual = "random"`), within-model imputation of missing observations
(`impute = TRUE`), and Mplus-style time-interval binning (`tinterval`) — are
available through arguments. Identifying the random-effect covariance under
the improper prior requires at least `2(p + p²) + 1` subjects. For the
five-indicator set that is 2(5 + 25) + 1 = 61 students, more than the 36
available, so we demonstrate random slopes on a three-variable system, which
needs 2(3 + 9) + 1 = 25:

```{r mlvar-bayes-random}
dsem_fit <- build_mlvar_bayes(df,
                              vars = c("planning", "monitoring", "effort"),
                              id = "name", temporal = "random",
                              n_iter = 2000, n_chains = 2, seed = 1)

dsem_fit
```

# 7. Bayesian single-subject VAR — `build_var_bayes()`

The single-level analogue: an exact Normal-inverse-Wishart two-block sampler
for one person's VAR(1), validated against Mplus `ESTIMATOR = BAYES`.
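The two blocks alternate between drawing the coefficient matrix given the residual covariance and drawing the residual covariance given the coefficients. The minimal sketch below assumes a flat prior on the coefficients and a Jeffreys prior `p(Sigma) ∝ |Sigma|^(-(p+1)/2)` on the covariance. These priors, the helper `gibbs_var1()`, and its arguments are assumptions and need not match the package's Mplus-aligned defaults:

```r
# Hypothetical two-block Gibbs sampler for a VAR(1) with intercepts.
# Priors (assumed): flat on B, Jeffreys p(Sigma) ∝ |Sigma|^(-(p+1)/2).
gibbs_var1 <- function(Y, n_iter = 2000, burn = 500) {
  sym <- function(M) (M + t(M)) / 2          # guard against rounding asymmetry
  Y <- as.matrix(Y)
  n <- nrow(Y)
  X <- cbind(1, Y[-n, , drop = FALSE])       # intercept + lag-1 predictors
  Yt <- Y[-1, , drop = FALSE]                # outcomes at occasion t
  m <- nrow(Yt); k <- ncol(X); p <- ncol(Yt)
  XtX_inv <- sym(solve(crossprod(X)))
  B_ols <- XtX_inv %*% crossprod(X, Yt)      # k x p least-squares estimate
  L_X <- t(chol(XtX_inv))                    # L_X %*% t(L_X) equals XtX_inv
  B <- B_ols
  keep <- array(NA_real_, dim = c(k, p, n_iter - burn))
  for (it in seq_len(n_iter)) {
    # Block 1: Sigma | B ~ inverse-Wishart(scale = S(B), df = m)
    S <- crossprod(Yt - X %*% B)
    W <- rWishart(1, df = m, Sigma = sym(solve(S)))[, , 1]
    Sigma <- sym(solve(W))
    # Block 2: B | Sigma ~ matrix normal(B_ols, rows XtX_inv, columns Sigma)
    Z <- matrix(rnorm(k * p), k, p)
    B <- B_ols + L_X %*% Z %*% chol(Sigma)
    if (it > burn) keep[, , it - burn] <- B
  }
  keep
}

set.seed(2)
A <- matrix(c(0.5, 0.2,
              0.0, 0.3), 2, 2, byrow = TRUE)
Y <- matrix(0, 400, 2)
for (t in 2:400) Y[t, ] <- A %*% Y[t - 1, ] + rnorm(2)
draws <- gibbs_var1(Y, n_iter = 1500, burn = 500)
# posterior medians of the lag-1 block, arranged as [outcome, predictor]
post_med <- t(apply(draws[-1, , , drop = FALSE], c(1, 2), median))
round(post_med, 2)  # close to A
```

Because both full conditionals are standard distributions, every draw is exact given the other block, and no Metropolis tuning is needed.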

```{r var-bayes}
var_bayes_fit <- build_var_bayes(df, vars = vars, id = "name",
                                 subject = "Grace", n_iter = 2000, seed = 1)

var_bayes_fit
```

# 8. The Mplus backend — `build_mlvar_mplus()`

For laboratories that hold an Mplus license, `build_mlvar_mplus()` drives the
genuine Mplus DSEM run through `MplusAutomation` and returns the same tidy
result classes, so native and Mplus estimates are directly comparable. It is
shown, not run, because it requires the external program:

```{r mplus, eval=FALSE}
mplus_fit <- build_mlvar_mplus(df, vars = vars, id = "name")
```

# 9. Unified SEM — `build_usem()`

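uSEM (Kim, Zhu, Chang, Bentler, & Ernst, 2007) places lagged and
contemporaneous paths in a single structural equation model per person,
estimated with `lavaan`. Its contemporaneous paths are directed, which
distinguishes it from the partial-correlation representation of the VAR
family. The example below fits a restricted specification on an
eight-student subsample: autoregressive lagged paths only
(`temporal = "ar"`), no directed contemporaneous paths
(`contemporaneous = "none"`), and residual covariances in their place
(`residual_cov = TRUE`).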
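In the notation common in the uSEM literature, with $\eta_t$ the vector of the $p$ variables at occasion $t$, the model is

$$
\eta_t = A\,\eta_t + \Phi\,\eta_{t-1} + \zeta_t ,
$$

where $A$ holds the directed contemporaneous paths (with a zero diagonal), $\Phi$ the lag-1 paths, and $\zeta_t$ the residuals. When $I - A$ is invertible, solving for $\eta_t$ gives the reduced-form VAR(1)

$$
\eta_t = (I - A)^{-1}\Phi\,\eta_{t-1} + (I - A)^{-1}\zeta_t ,
$$

which shows how the directed paths in $A$ relate to the undirected residual structure used by the VAR family.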

```{r usem}
students8 <- subset(df, name %in% c("Grace", "Eve", "Aisha", "Alice",
                                    "Bob", "Diana", "Frank", "Heidi"))

usem_fit <- build_usem(students8, vars = vars, id = "name",
                       temporal = "ar", contemporaneous = "none",
                       residual_cov = TRUE, seed = 1)

usem_fit
```

# 10. GIMME — `build_gimme()`

GIMME (Gates & Molenaar, 2012) searches each individual's uSEM and promotes
to the group level the paths whose addition significantly improves fit for
at least the proportion of individuals set by `groupcutoff` (here 0.75). The
result is a group structure plus person-specific elaborations. The
implementation replicates the `gimme` package exactly, with identical path
sets and coefficients on its own bundled benchmark data.
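The promotion rule can be illustrated with a subjects-by-paths table that records whether adding each candidate path would significantly improve each person's model. The sketch below is a simplified one-shot version built on made-up indicator data. Real GIMME adds group paths one at a time, refitting after each addition and pruning afterwards, so `group_paths()` and the path labels are hypothetical:

```r
# Hypothetical one-shot illustration of a GIMME-style group-level rule.
# sig: logical matrix, subjects x candidate paths; TRUE means adding the
# path would significantly improve that subject's model.
group_paths <- function(sig, cutoff = 0.75) {
  share <- colMeans(sig)           # proportion of subjects per path
  names(share)[share >= cutoff]    # ">=" at the boundary is an assumption
}

sig <- matrix(c(TRUE, TRUE,  FALSE,
                TRUE, FALSE, FALSE,
                TRUE, TRUE,  TRUE,
                TRUE, TRUE,  FALSE),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("s", 1:4),
                              c("effort~planning", "monitoring~effort_lag1",
                                "value~efficacy")))
group_paths(sig, cutoff = 0.75)
```

Raising the cutoff shrinks the group model and leaves more structure to the person-specific search.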

```{r gimme}
gimme_fit <- build_gimme(students8, vars = vars, id = "name",
                         ar = TRUE, groupcutoff = 0.75, seed = 1)

gimme_fit
```

```{r gimme-tables}
head(edges(gimme_fit))
```

`plot()` draws the canonical GIMME display: dashed lag-1 edges, solid
contemporaneous edges, width proportional to the share of subjects carrying
the path, black for group-level paths.

```{r plot-gimme, eval=has_cograph}
plot(gimme_fit)
```

# 11. Rolling networks — `rolling_var()` / `rolling_graphical_var()`

Rolling-window estimation asks whether one person's dynamics are stationary
over the study period: the estimator is refit on a moving slice of the
series, and the result is a tidy table of window-by-edge estimates.
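The slices used by a rolling fit follow from the window size and the step. The sketch below assumes that windows run over consecutive rows of one person's series, start at row 1, and drop a final partial window. `window_starts()` is hypothetical, and the package may count windows over lag-1 pairs instead:

```r
# Hypothetical helper: start and end rows of full-length rolling windows.
window_starts <- function(n, window_size, step) {
  stopifnot(n >= window_size)
  starts <- seq(1, n - window_size + 1, by = step)
  data.frame(window = seq_along(starts),
             start = starts,
             end = starts + window_size - 1)
}

window_starts(156, window_size = 50, step = 20)
```

Under this convention, 156 occasions with `window_size = 50` and `step = 20` give six windows, and the last six occasions fall outside every window.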

```{r rolling}
rolling_fit <- rolling_var(df, vars = vars, id = "name", subject = "Grace",
                           window_size = 50, step = 20, scale = TRUE,
                           keep_fits = TRUE)

rolling_fit

head(as.data.frame(rolling_fit))
```

```{r plot-rolling, eval=has_cograph}
plot(rolling_fit, fit = 1, layer = "temporal")
```

`rolling_graphical_var()` provides the sparse analogue with the same
interface.

# 12. Model comparison — `compare_idiographic()`

`compare_idiographic()` runs several estimators on the same data and stacks
their summaries into one comparison table, so estimators are read against
each other rather than assembled by hand. Per-estimator arguments pass
through `estimator_args`.
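A pass-through like `estimator_args` typically follows a common R pattern: shared arguments are combined with each estimator's own list, and the call is built with `do.call()`. A minimal sketch with toy estimators follows. `run_all()`, `toy_var()`, and `toy_gvar()` are hypothetical stand-ins, not the package's internals:

```r
# Hypothetical stand-ins for two estimators with different argument sets.
toy_var <- function(data, vars, subject, scale = FALSE) {
  data.frame(estimator = "var", n_vars = length(vars), scaled = scale)
}
toy_gvar <- function(data, vars, subject, n_lambda = 10, gamma = 0.5) {
  data.frame(estimator = "graphical_var", n_vars = length(vars), scaled = NA)
}

run_all <- function(data, vars, estimators, estimator_args = list()) {
  shared <- list(data = data, vars = vars)
  rows <- lapply(names(estimators), function(name) {
    extra <- estimator_args[[name]]            # NULL when none supplied
    do.call(estimators[[name]], c(shared, extra))
  })
  do.call(rbind, rows)                         # one stacked comparison table
}

run_all(data.frame(), vars = c("a", "b"),
        estimators = list(var = toy_var, graphical_var = toy_gvar),
        estimator_args = list(var = list(subject = "Grace", scale = TRUE),
                              graphical_var = list(subject = "Grace", n_lambda = 8)))
```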

```{r compare}
cmp <- compare_idiographic(
  df, vars = vars, id = "name",
  estimators = c("var", "graphical_var"),
  estimator_args = list(
    var = list(subject = "Grace", scale = TRUE),
    graphical_var = list(subject = "Grace", n_lambda = 8, gamma = 0)
  )
)

cmp

as.data.frame(cmp)
```

# One grammar for every result

Every estimator above returned an object obeying the same contract: `print`
shows the estimated networks; `edges()`, `nodes()`, `coefs()`, and
`matrices()` return plain data frames; `summary()` condenses the fit;
`plot()` draws it (via the optional `cograph` package); and `as_netobject()`
converts any result into a network object for further graph-analytic work.
The user never indexes into a result object — when a subset is needed, it is
an argument, not a bracket.

# References

Asparouhov, T., Hamaker, E. L., & Muthén, B. (2018). Dynamic structural
equation models. *Structural Equation Modeling, 25*(3), 359–388.

Bringmann, L. F., Vissers, N., Wichers, M., Geschwind, N., Kuppens, P.,
Peeters, F., Borsboom, D., & Tuerlinckx, F. (2013). A network approach to
psychopathology: New insights into clinical longitudinal data. *PLoS ONE,
8*(4), e60188.

Chen, J., & Chen, Z. (2008). Extended Bayesian information criteria for
model selection with large model spaces. *Biometrika, 95*(3), 759–771.

Epskamp, S., Waldorp, L. J., Mõttus, R., & Borsboom, D. (2018). The Gaussian
graphical model in cross-sectional and time-series data. *Multivariate
Behavioral Research, 53*(4), 453–480.

Friedman, J., Hastie, T., & Tibshirani, R. (2008). Sparse inverse covariance
estimation with the graphical lasso. *Biostatistics, 9*(3), 432–441.

Gates, K. M., & Molenaar, P. C. M. (2012). Group search algorithm recovers
effective connectivity maps for individuals in homogeneous and heterogeneous
samples. *NeuroImage, 63*(1), 310–319.

Kim, J., Zhu, W., Chang, L., Bentler, P. M., & Ernst, T. (2007). Unified
structural equation modeling approach for the analysis of multisubject fMRI
data. *Human Brain Mapping, 28*(2), 85–93.

Molenaar, P. C. M. (2004). A manifesto on psychology as idiographic science:
Bringing the person back into scientific psychology, this time forever.
*Measurement, 2*(4), 201–218.

*Learning Analytics Methods*, Book 2, Chapter 20: Vector autoregression.
<https://lamethods.org/book2/chapters/ch20-var/ch20-var.html> (source of the
SRL data).