---
title: "Gaussian graphical models"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Gaussian graphical models}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>", fig.width = 6, fig.height = 5)
has_cograph <- requireNamespace("cograph", quietly = TRUE)
```

```{r}
library(psychnets)
```

## What is a psychological network?

Network analysis lets researchers chart relations, map connections, and discover
clusters among interacting components, and it has become a central method for
understanding complex systems. Its early successes were with social networks —
people linked by who talks to whom. To go beyond who-interacts-with-whom and
reason about variables and their associations, we need a different, *probabilistic*
kind of network.

In a probabilistic, or **psychological**, network the **nodes** are variables —
constructs, behaviours, attitudes, or scale scores — and the **edges** are the
probabilistic associations between them (Saqr, Beck & López-Pernas, 2024;
Borsboom et al., 2021). This vignette focuses on the most widely used such model,
the **Gaussian graphical model (GGM)**, a network whose edges are *partial
correlations*.

A partial correlation measures the association between two variables *after
controlling for every other variable in the network* — their conditional
dependence, *ceteris paribus*. Suppose we study five learning constructs:
motivation, achievement, engagement, self-regulation, and well-being. An edge
between well-being and achievement then means the two are associated *beyond* what
their shared relations with motivation, engagement, and self-regulation already
explain. Equally informative is what is *missing*: when there is no edge between
two variables, they are conditionally independent given the rest. So both the
presence **and** the absence of an edge are interpretable and meaningful — a
property that sets these networks apart from a plain table of correlations
(Epskamp & Fried, 2018; Epskamp, Waldorp, Mõttus & Borsboom, 2018).

In practice many of the small partial correlations in a network are just noise.
Regularization removes those negligible edges — shrinking them to exactly zero —
leaving a sparse network that is easier to read and more likely to replicate.
Beyond estimating the edges, psychological-network methods also supply tools to
judge how *trustworthy* a network is: bootstrapping the accuracy of edge weights
and centralities, and gauging how well a network is expected to replicate
(Epskamp, Borsboom & Fried, 2018).

## Why regularize?

If we simply invert the sample correlation matrix, every pair of variables gets a
non-zero partial correlation: the resulting network is fully connected, and many
of those edges are non-zero only by sampling chance. Such a dense graph is hard to
interpret and unstable, especially when the number of variables is large relative
to the sample size — the usual situation in psychological data.

The **graphical lasso** fixes this. It applies a penalty that shrinks small
partial correlations all the way to zero, so that only the edges with real support
in the data survive. How hard to shrink is itself chosen from the data, by the
extended Bayesian information criterion (**EBIC**), which trades off model fit
against the number of edges. The result is a sparse, interpretable network. (The
exact penalty and selection criterion are given in *Mathematical foundations* at
the end of this vignette.)

## A worked example

The illustrations use `SRL_GPT`, one of five data sets that ship with the package.
Each is a sample of 300 cases of Motivated Strategies for Learning Questionnaire
(MSLQ) construct scores, generated by a large language model in a study of
self-regulated learning simulated through LLM-generated survey responses. The five
constructs are cognitive strategy use (`CSU`), intrinsic value (`IV`),
self-efficacy (`SE`), self-regulation (`SR`), and test anxiety (`TA`); each score
is the mean of that construct's Likert items on the 1–7 scale.

```{r}
head(SRL_GPT)
```

### Estimating the network

`psychnet()` is deliberately simple to use: in the common case you supply only
two arguments — the `data` and the `method`. Here `method = "glasso"` fits the
EBIC-regularized graphical lasso. Everything else has a sensible default, so a
complete analysis is a single readable call; the remaining arguments (covered
under *A note on options* and *Mathematical foundations*) are there only when you
want fine-grained control. (Coming from `qgraph`/`bootnet`? The spelling
`"EBICglasso"` is accepted as an alias for `"glasso"`.)

```{r}
net <- psychnet(SRL_GPT, method = "glasso")
net
```

The fitted partial-correlation network is the tidy edge list returned by
`as.data.frame()`:

```{r}
as.data.frame(net)
```

Each weight is the estimated partial correlation between two constructs, after
conditioning on the other three. The structure is readable: `CSU`, `IV`, `SE`, and
`SR` form a positively connected cluster, while test anxiety (`TA`) is negatively
tied to self-regulation (`SR`) — more anxiety goes with less self-regulation, over
and above everything else in the network.

### Centrality

Centrality summarizes how important each node is. **Node strength** is the sum of
the absolute edge weights at a node — how strongly it is connected overall.
**Expected influence** is the *signed* sum, which keeps the direction of the edges,
so a node with strong negative connections can have low or negative influence even
when its strength is high.

```{r}
net_centralities(net)
```

`SR` and `CSU` are the most strongly connected constructs; `TA`, with its single
strong negative edge, has the lowest expected influence.

### Predictability

Predictability asks a complementary question: how much of each node's variance is
explained by its neighbours in the network (Haslbeck & Waldorp, 2018)? It tells us
whether the connections around a node are actually informative about it.

```{r}
net_predict(net)
```

Most constructs are highly predictable from their neighbours (variance explained
above 0.7), whereas test anxiety (`TA`) is comparatively isolated (about 0.14),
consistent with its single strong edge.

### Plotting

Passing the fitted object to `cograph::splot()` with `psych_styling = TRUE` draws
the network with a spring layout, labelled nodes, and the conventional colour
coding (green for positive edges, red for negative). For a graphical-lasso network
the object already carries each node's predictability, and the plot draws it
automatically as a ring around the node — the visual analogue of the
`net_predict()` table above.

```{r ggm-plot, eval = has_cograph}
cograph::splot(net, psych_styling = TRUE)
```

### A note on options

The defaults above (Pearson correlations, pairwise-complete handling of missing
data) match the common bootnet workflow, but several options are worth knowing.
For Likert data, `cor_method = "auto"` switches to the polychoric/polyserial
correlation used by `qgraph` and `bootnet`. For speed, `native = FALSE`
dispatches the inner solve to the optional Fortran `glasso` package. The full
list of arguments and their defaults is tabulated in *Mathematical foundations*,
below.

### A second example: how the options change the network

The two-argument call above uses every default. Because the substantive choices
are just arguments, we can change one and watch its effect. Here we re-estimate
the network from rank-based (Spearman) correlations and a lighter EBIC penalty
(`gamma = 0.25` rather than `0.5`):

```{r}
net2 <- psychnet(SRL_GPT, method = "glasso", cor_method = "spearman", gamma = 0.25)
as.data.frame(net2)
```

Comparing this with the default Pearson fit shows the difference concretely: the
default keeps all ten edges, including the very weak `CSU`–`TA` link (about
0.05), whereas this fit drops it and returns nine. The strong `CSU`/`IV`/`SE`/`SR`
cluster and the negative `SR`–`TA` edge survive either way; what changes is which
*borderline* edges are retained. The correlation type and the penalty therefore
decide the fate of weak edges, even when the backbone is stable.

A second, more direct control is `threshold`, a post-hoc absolute-weight floor
that simply removes any edge smaller than a chosen size. With `threshold = 0.1`
the three weak `TA` edges fall away and only the eight strongest remain:

```{r}
as.data.frame(psychnet(SRL_GPT, method = "glasso", threshold = 0.1))
```

## Non-normal data

Construct scores are continuous but usually skewed and bounded rather than exactly
Gaussian. When joint normality is doubtful, the **nonparanormal** model relaxes it:
it assumes the variables are jointly Gaussian only after unknown monotone
transformations of each variable, estimates those transformations from the ranks,
and then applies the identical EBIC graphical lasso. This is `method = "huge"`, and
it certifies against the same optimality conditions:

```{r}
certificate(psychnet(SRL_GPT, method = "huge"))
```

---

## Mathematical foundations

This section is a technical reference. It states the model, the estimator, the
selection criterion, the implementation, and the optimality certificate precisely,
for readers who want the definitions and equations behind the tutorial above. It
can be skipped on a first reading.

### The Gaussian graphical model

Let $\mathbf{X} = (X_1, \dots, X_p)$ follow a multivariate normal distribution
$\mathcal{N}(\boldsymbol\mu, \boldsymbol\Sigma)$ with covariance matrix
$\boldsymbol\Sigma$ and precision (concentration) matrix
$\mathbf{K} = \boldsymbol\Sigma^{-1}$. The GGM encodes the conditional
independence structure of $\mathbf{X}$ in an undirected graph on the $p$
variables. Its defining property is the pairwise Markov property: a zero in the
precision matrix is exactly a conditional independence,

$$
K_{ij} = 0 \;\Longleftrightarrow\; X_i \perp\!\!\!\perp X_j \mid \mathbf{X}_{\setminus\{i,j\}},
$$

that is, $X_i$ and $X_j$ are independent given all remaining variables
(Lauritzen, 1996). The off-diagonal entries of $\mathbf{K}$ are in one-to-one
correspondence with the partial correlations,

$$
\rho_{ij \cdot \text{rest}} \;=\; \frac{-K_{ij}}{\sqrt{K_{ii} K_{jj}}},
$$

so an edge in a GGM is a non-zero partial correlation between two variables after
conditioning on every other variable in the system. In the psychometric network
literature this object is the central model: nodes are measured variables and
edges are conditional (partial-correlation) associations (Epskamp & Fried, 2018).

### The graphical lasso objective

The maximum-likelihood estimate of $\mathbf{K}$ from a sample covariance matrix
$\mathbf{S}$ inverts $\mathbf{S}$ directly and is therefore dense. The graphical
lasso (Yuan & Lin, 2007; Banerjee, El Ghaoui & d'Aspremont, 2008; Friedman,
Hastie & Tibshirani, 2008) instead maximizes an $\ell_1$-penalized Gaussian
log-likelihood,

$$
\hat{\mathbf{K}}
 \;=\; \arg\max_{\mathbf{K} \succ 0}
       \Bigl[ \log\det \mathbf{K} - \operatorname{tr}(\mathbf{S}\mathbf{K})
              - \lambda \lVert \mathbf{K} \rVert_1 \Bigr],
$$

where $\lVert \mathbf{K} \rVert_1 = \sum_{i \neq j} |K_{ij}|$ penalizes the
off-diagonal entries and $\lambda \geq 0$ is a tuning parameter. The penalty
shrinks small partial correlations exactly to zero, producing a sparse graph and
controlling sampling variability when $p$ is large relative to $n$. The objective
is strictly concave in $\mathbf{K}$, so its maximizer is unique.

### Tuning by the extended BIC

The penalty $\lambda$ is selected over a path of candidate values by minimizing
the extended Bayesian information criterion (EBIC; Foygel & Drton, 2010), which
augments the ordinary BIC with a term that penalizes the size of the model space:

$$
\mathrm{EBIC}_\gamma
 \;=\; -2\,\ell(\hat{\mathbf{K}}) + E \log n + 4\,E\,\gamma \log p,
$$

where $\ell$ is the maximized Gaussian log-likelihood, $E$ is the number of
non-zero edges, and $\gamma \in [0, 1]$ is a hyperparameter (the `gamma` argument,
default $0.5$) governing the strength of the model-space penalty. Setting
$\gamma = 0$ recovers ordinary BIC; larger $\gamma$ favours sparser graphs. The
candidate path is logarithmically spaced between the smallest penalty that yields
the empty graph and a fraction `lambda_min_ratio` of it, with `nlambda` points. In
`psychnets` the defaults are `nlambda = 100` and `lambda_min_ratio = 0.01`.

### The psychnets implementation

The estimator is a clean-room reimplementation in pure base R. The package imports
only the base packages `stats` and `parallel`; it does not depend on `glasso`,
`qgraph`, `glmnet`, or `Matrix`, and contains no compiled code. The solver is the
covariance block-coordinate-descent algorithm of Friedman, Hastie & Tibshirani
(2008). Writing $\mathbf{W} = \mathbf{K}^{-1}$ for the model covariance, the
algorithm cycles over the columns of $\mathbf{W}$; for each column the off-diagonal
block is updated by solving an ordinary lasso regression via coordinate descent
(with the diagonal held fixed at the diagonal of $\mathbf{S}$, since the penalty is
off-diagonal only), and the precision matrix is reconstructed from the converged
$\mathbf{W}$ and the lasso coefficients.

Selection of $\lambda$ proceeds in two tiers. The EBIC criterion only needs to
identify which edges are active at each candidate $\lambda$, and the lasso zeros
inactive edges exactly, so the path is first scanned at a loose convergence
tolerance (`scan_tol = 1e-4`). The single $\lambda$ that minimizes EBIC is then
refit at a tight tolerance (`refit_tol = 1e-8`), so the returned estimate is the
converged optimum of the convex objective rather than a coarse path point.

### The KKT optimality certificate

Because the graphical-lasso objective is strictly convex, its optimum is unique and
is characterized exactly by the first-order (Karush–Kuhn–Tucker) stationarity
conditions. Writing $\mathbf{W} = \hat{\mathbf{K}}^{-1}$, the optimal estimate
satisfies

$$
W_{ii} = S_{ii}, \qquad
W_{ij} - S_{ij} = \lambda\,\operatorname{sign}(K_{ij})
 \;\text{ where } K_{ij} \neq 0, \qquad
|W_{ij} - S_{ij}| \leq \lambda \;\text{ otherwise.}
$$

The `certificate()` verb returns the maximum absolute violation of these
conditions as a tidy one-row data frame:

```{r}
certificate(net)
```

The residual is at the order of machine precision. Because the minimizer is unique,
a residual this small confirms that the reported precision matrix satisfies the
first-order optimality conditions of the penalized objective to numerical
precision. This is a reproducibility and verification diagnostic — a
numerical-analysis check that the solver converged — rather than a substantive
property of the data; the `kind` column records that it is a KKT residual and
`certified` flags whether it falls below the tolerance `tol` (default $10^{-6}$).

### The solver engine

The `native` argument selects the inner fixed-penalty solver. The default,
`native = TRUE`, is psychnets' own pure-R block-coordinate-descent solver
described above. Setting `native = FALSE` (which requires the optional Fortran
`glasso` package, listed under `Suggests`) dispatches each fixed-penalty solve to
that established package instead, producing numerically equivalent graph
structure at Fortran speed; the optimality residual then reflects the `glasso`
package's own convergence tolerance rather than the tighter base-R refit.

### Argument reference

Only `data` and `method` are required; everything else has a default. The full
set of arguments for the graphical lasso is:

| Argument | Default | Meaning |
|---|---|---|
| `data` | — | Numeric $n \times p$ matrix or data frame (rows are observations). |
| `cor_matrix`, `n` | `NULL` | A precomputed $p \times p$ correlation matrix and its sample size, used in place of `data`. |
| `cor_method` | `"pearson"` | Correlation estimator: `"pearson"`, `"spearman"`, `"kendall"`, or `"auto"` (polychoric/polyserial `cor_auto()`, for ordinal data). |
| `gamma` | `0.5` | EBIC hyperparameter $\gamma$; larger values give sparser graphs, $0$ is ordinary BIC. |
| `nlambda` | `100` | Number of penalties on the $\lambda$ path. |
| `lambda_min_ratio` | `0.01` | Smallest $\lambda$ as a fraction of the penalty that empties the graph. |
| `threshold` | `0` | Post-hoc absolute-weight floor; edges below it are set to zero. |
| `na_method` | `"pairwise"` | Missing data: `"pairwise"` (pairwise-complete correlations with a nearest-positive-definite projection) or `"listwise"` (complete-case deletion). |
| `native` | `TRUE` | Inner solver: `TRUE` = psychnets' pure-R solver, `FALSE` = the `glasso` Fortran package. |
| `labels` | `NULL` | Node names. |

Researchers migrating from `qgraph::EBICglasso()` can map these names onto the
`qgraph` arguments with the package's `net_crosswalk()` helper.

### Predictability in closed form

For a Gaussian graphical model, node predictability is available in closed form
from the precision matrix (Haslbeck & Waldorp, 2018): the proportion of variance of
node $j$ explained by the remaining nodes is

$$
R^2_j \;=\; 1 - \frac{1}{K_{jj}\, S_{jj}},
$$

where $K_{jj}$ is the diagonal of the precision matrix and $S_{jj}$ the
corresponding variance. `net_predict()` reports these per-node values, requiring no
access to the raw data.

### The nonparanormal extension

The nonparanormal (Liu, Lafferty & Wasserman, 2009) assumes that the variables are
jointly Gaussian after unknown monotone marginal transformations. Estimating those
transformations by a rank-based Gaussianization and applying the identical EBIC
graphical lasso to the transformed correlation matrix yields a GGM that is robust
to monotone departures from normality. This is `method = "huge"`, which certifies
against the same KKT conditions as the Gaussian case.

## References

Banerjee, O., El Ghaoui, L., & d'Aspremont, A. (2008). Model selection through
sparse maximum likelihood estimation for multivariate Gaussian or binary data.
*Journal of Machine Learning Research*, 9, 485–516.

Borsboom, D., Deserno, M. K., Rhemtulla, M., Epskamp, S., Fried, E. I., McNally,
R. J., Robinaugh, D. J., Perugini, M., Dalege, J., Costantini, G., Isvoranu, A.-M.,
Wysocki, A. C., van Borkulo, C. D., van Bork, R., & Waldorp, L. J. (2021). Network
analysis of multivariate data in psychological science. *Nature Reviews Methods
Primers*, 1, 1–18.

Epskamp, S., Borsboom, D., & Fried, E. I. (2018). Estimating psychological networks
and their accuracy: A tutorial paper. *Behavior Research Methods*, 50(1), 195–212.

Epskamp, S., & Fried, E. I. (2018). A tutorial on regularized partial correlation
networks. *Psychological Methods*, 23(4), 617–634.

Epskamp, S., Waldorp, L. J., Mõttus, R., & Borsboom, D. (2018). The Gaussian
graphical model in cross-sectional and time-series data. *Multivariate Behavioral
Research*, 53(4), 453–480.

Foygel, R., & Drton, M. (2010). Extended Bayesian information criteria for Gaussian
graphical models. In *Advances in Neural Information Processing Systems* (Vol. 23,
pp. 604–612).

Friedman, J., Hastie, T., & Tibshirani, R. (2008). Sparse inverse covariance
estimation with the graphical lasso. *Biostatistics*, 9(3), 432–441.

Haslbeck, J. M. B., & Waldorp, L. J. (2018). How well do network models predict
observations? On the importance of predictability in network models. *Behavior
Research Methods*, 50(2), 853–861.

Lauritzen, S. L. (1996). *Graphical Models*. Oxford: Oxford University Press.

Liu, H., Lafferty, J., & Wasserman, L. (2009). The nonparanormal: Semiparametric
estimation of high dimensional undirected graphs. *Journal of Machine Learning
Research*, 10, 2295–2328.

Saqr, M., Beck, E., & López-Pernas, S. (2024). Psychological networks: A modern
approach to the analysis of learning and complex learning processes. In M. Saqr &
S. López-Pernas (Eds.), *Learning Analytics Methods and Tutorials: A Practical
Guide Using R* (Chapter 19). Springer.
<https://lamethods.org/book1/chapters/ch19-psychological-networks/ch19-psych.html>

Yuan, M., & Lin, Y. (2007). Model selection and estimation in the Gaussian
graphical model. *Biometrika*, 94(1), 19–35.