---
title: "Confirmatory testing: matching claims to evidence"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Confirmatory testing: matching claims to evidence}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>", message = FALSE,
                      warning = FALSE, dpi = 150, fig.width = 7,
                      fig.height = 5.6, out.width = "100%",
                      fig.align = "center")
set.seed(2026)
library(lagseq)
options(digits = 3)
has <- function(p) requireNamespace(p, quietly = TRUE)
```

A fitted model is an estimate, not a finding. The Dynalytics framework (Saqr, Lopez-Pernas, and Misiejuk, 2026) formalises this as a *scientific contract*: every analytical claim must be matched by evidence appropriate to its structure, scope, and complexity. `lagseq` provides the confirmatory testing battery that discharges that contract for lag-sequential models, where each edge is a tested departure from independence.

The battery pairs a kind of claim with a kind of evidence:

| Claim | Evidence | Function |
|---|---|---|
| a *specific transition* is real / how precise it is | edge-level uncertainty | `certainty_lsa()`, `bootstrap_lsa()` |
| a *significant transition* is not fragile | robustness to information loss | `stability_lsa()` |
| the *whole network* is reproducible | structural reliability | `reliability_lsa()` |
| the structure is *more than chance* | an assumption-free null | `permute_lsa()` |
| two *groups* differ | inference under exchangeability | `compare_lsa()`, `bayes_compare_lsa()` |

We use the bundled `engagement` data (138 students, weekly engagement states) for the single-network tests, and a long event log with a real group for the comparison.

```{r fit}
fit <- lsa(engagement)
transitions(fit, significant = TRUE)
```

The adjusted residual already tests each transition against the independence model, but that test rests on large-sample assumptions that sequential data can violate.
The battery supplies evidence that does not rely on them.

# A specific transition: edge-level uncertainty

A claim about one edge requires an estimate of how much that edge could vary. `bootstrap_lsa()` resamples whole sequences and re-fits; `certainty_lsa()` derives the same uncertainty analytically from a Dirichlet-Multinomial posterior. The two agree when the population is homogeneous; the bootstrap is preferred for a mixture, because resampling sequences preserves within-sequence dependence that the analytic model treats as independent.

```{r edge}
as.data.frame(bootstrap_lsa(fit, R = 200)) |> head(4)  # resampling CIs
as.data.frame(certainty_lsa(fit)) |> head(4)           # analytic CIs
```

The bootstrap forest shows every edge's interval at once; tightly pinned edges support stronger claims.

```{r forest, eval = has("ggplot2"), fig.height = 6.5}
plot(bootstrap_lsa(fit, R = 200))
```

# A significant transition: robustness to information loss

`stability_lsa()` repeatedly drops a fraction of the cases and re-fits, recording how often each significant edge stays significant. A high stability supports reading the edge as a property of the process; a low one marks it as sample-dependent.

```{r stability}
as.data.frame(stability_lsa(fit, R = 200)) |> head(4)
```

# The whole network: structural reliability

`reliability_lsa()` raises the claim from a single edge to the entire network. It repeatedly splits the sequences into two halves, fits a model on each, and correlates the edge-weight vectors. A high average correlation indicates the network is reproducible within the sample.

```{r reliability}
reliability_lsa(fit, R = 50)
```

# More than chance: an assumption-free null

`permute_lsa()` shuffles the event order to build the null of no sequential structure, giving a p-value that does not depend on the adjusted residual's large-sample approximation.

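
The idea of an order-shuffling null can be illustrated in a few lines of base R. This is only a sketch under stated assumptions, not the internals of `permute_lsa()`: the toy sequence `seq_obs` and the helper `count_transition()` are hypothetical, and shuffling within a single sequence is an assumption about how the null is built.

```r
set.seed(1)

# Toy sequence of states (made-up data, not taken from `engagement`).
seq_obs <- c("Active", "Active", "Average", "Active", "Active", "Disengaged",
             "Active", "Active", "Average", "Average", "Active", "Active")

# Hypothetical helper: count how often `from` is immediately followed by `to`.
count_transition <- function(x, from, to) {
  n <- length(x)
  sum(x[-n] == from & x[-1] == to)
}

# Observed count of the Active -> Active transition.
observed <- count_transition(seq_obs, "Active", "Active")

# Null of no sequential structure: shuffle the order of events, which keeps
# every state's frequency fixed and destroys only the ordering.
R <- 999
null_counts <- replicate(
  R, count_transition(sample(seq_obs), "Active", "Active")
)

# One-sided Monte Carlo p-value; the +1 terms keep it strictly above zero.
p_value <- (1 + sum(null_counts >= observed)) / (R + 1)
p_value
```

Because the shuffle holds the state frequencies constant, the resulting p-value speaks only to the ordering of events, which is the part of the claim that the large-sample approximation would otherwise carry.
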
```{r permute}
as.data.frame(permute_lsa(fit, R = 200)) |> head(4)
```

# Two groups: inference under exchangeability

Comparing groups is the claim that most invites over-interpretation: two networks estimated separately almost always look different. The question is whether the difference exceeds what arises by chance when the group labels carry no information.

`tna::group_regulation_long` is a long event log with a recorded achievement group; the same `actor` / `action` / `time` grammar fits it in one call.

```{r group, eval = has("tna")}
log <- tna::group_regulation_long
gfit <- lsa(log, actor = "Actor", action = "Action", time = "Time",
            group = "Achiever")
```

`compare_lsa()` answers that question by permutation under exchangeability: it shuffles the group labels to build a reference distribution for the per-edge difference and for a single omnibus statistic.

```{r compare, eval = has("tna")}
cmp <- compare_lsa(gfit, R = 500, adjust = "BH")
cmp
as.data.frame(cmp) |> subset(significant) |> head(4)
```

```{r barrel, eval = has("tna") && has("ggplot2"), fig.height = 6.5}
plot(cmp)
```

`bayes_compare_lsa()` is the Bayesian counterpart: instead of a significance decision it reports, for each edge, the posterior mean difference and a credible interval, so the evidence is a plausible range rather than a verdict.

```{r bayes, eval = has("tna")}
bayes_compare_lsa(gfit, seed = 1)
```

# In short

The contract is one rule applied at every scope: match the claim to the evidence. A descriptive reading of an edge needs only the fit; a stronger claim needs the test that targets exactly its structure.

```r
certainty_lsa(fit); bootstrap_lsa(fit)        # a specific edge
stability_lsa(fit)                            # a significant edge under case-dropping
reliability_lsa(fit)                          # the whole network
permute_lsa(fit)                              # more than chance
compare_lsa(gfit); bayes_compare_lsa(gfit)    # a group difference
```
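
For readers who want to see the exchangeability argument behind the group comparison in isolation, the following base R sketch permutes group labels for a single edge. It is an illustration only and does not reproduce `compare_lsa()`: the per-actor counts, the `High` / `Low` labels, and the helper `mean_diff()` are all hypothetical.

```r
set.seed(2)

# Made-up per-actor counts of one transition, five actors per group.
counts <- c(5, 7, 6, 8, 9, 2, 3, 4, 1, 3)
group  <- rep(c("High", "Low"), each = 5)

# Hypothetical helper: difference in mean count between the two groups.
mean_diff <- function(y, g) mean(y[g == "High"]) - mean(y[g == "Low"])

observed <- mean_diff(counts, group)

# Under exchangeability the labels carry no information, so shuffling them
# yields the reference distribution for the difference.
R <- 999
null_diffs <- replicate(R, mean_diff(counts, sample(group)))

# Two-sided Monte Carlo p-value with the +1 correction.
p_value <- (1 + sum(abs(null_diffs) >= abs(observed))) / (R + 1)
p_value
```

With several edges tested at once, the per-edge p-values from such a procedure could be passed to base R's `p.adjust(p, method = "BH")`, which is the same correction named by `adjust = "BH"` in the vignette's call.
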