| Title: | Modern Lag Sequential Analysis with Tidy Transition Networks |
|---|---|
| Description: | A modern, tidy, pipe-friendly toolkit for lag sequential analysis of categorical event sequences. A single 'lsa()' constructor fits classical, two-cell, bidirectional, and dominance engines through a pluggable registry, with multi-lag analysis and structural-zero constraints, and every result is read through a verb that returns a tidy one-row-per-observation data frame. A confirmatory testing battery quantifies the evidence behind each claim: sequence-level bootstrap and analytic Dirichlet-Multinomial certainty for edge uncertainty, split-half reliability for the whole network, case-drop stability, permutation tests, and permutation- and Bayesian-based group comparison. Fits visualize through a single 'plot()' verb (residual heatmap, transition network, chord, sunburst, and forest views) and interoperate with the 'tna', 'Nestimate', and 'TraMineR' ecosystems, both ingesting their sequence objects and converting to network objects. All numerical methods are implemented from primary literature and cross-validated against published worked examples and base-R primitives. |
| Authors: | Mohammed Saqr [aut, cre, cph] |
| Maintainer: | Mohammed Saqr <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2026-06-20 17:25:48 UTC |
| Source: | https://github.com/mohsaqr/lagseq |
Returns the per-edge comparison table (the same data frame as
x$edges) so a comparison can be read with as.data.frame() like the
other result objects, without reaching into the object.
## S3 method for class 'lsa_comparison' as.data.frame(x, row.names = NULL, optional = FALSE, ...) ## S3 method for class 'lsa_comparison_pairwise' as.data.frame(x, row.names = NULL, optional = FALSE, ...)## S3 method for class 'lsa_comparison' as.data.frame(x, row.names = NULL, optional = FALSE, ...) ## S3 method for class 'lsa_comparison_pairwise' as.data.frame(x, row.names = NULL, optional = FALSE, ...)
x |
An |
row.names, optional, ...
|
Standard |
The tidy per-edge data frame.
Returns the canonical lsa_data as a tidy data frame: one row per
event (seq_id, within-sequence index, state) for event-level
input, or one row per from/to/count cell for transition-matrix
input.
## S3 method for class 'lsa_data' as.data.frame(x, row.names = NULL, optional = FALSE, ...)## S3 method for class 'lsa_data' as.data.frame(x, row.names = NULL, optional = FALSE, ...)
x |
An |
row.names, optional, ...
|
Standard |
A tidy data frame.
Tidy the per-replicate split-half correlations.
## S3 method for class 'lsa_reliability' as.data.frame(x, row.names = NULL, optional = FALSE, ...) ## S3 method for class 'lsa_reliability_group' as.data.frame(x, row.names = NULL, optional = FALSE, ...)## S3 method for class 'lsa_reliability' as.data.frame(x, row.names = NULL, optional = FALSE, ...) ## S3 method for class 'lsa_reliability_group' as.data.frame(x, row.names = NULL, optional = FALSE, ...)
x |
An |
row.names, optional, ...
|
Ignored (method signature compatibility). |
A data.frame, one row per replicate, with columns
replicate and correlation (a grouped object gains a leading
group column). NA correlations from degenerate splits are kept.
Closed-form Bayesian alternative to compare_lsa() for comparing the
transition structures of two (or, pairwise, more) groups. Each state's
outgoing transitions are modelled as Dirichlet-Multinomial with a
Jeffreys prior, so each transition probability is marginally Beta. The
per-edge posterior mean difference prob_a - prob_b is exact; a
credible interval, the probability of direction pd, and a two-sided
Bayesian p-equivalent 2 * (1 - pd) come from a Monte Carlo draw on the
Beta marginals.
bayes_compare_lsa( x, y = NULL, prior = 0.5, draws = 10000L, ci = 0.95, mean_threshold = 0.01, bound_threshold = 0.001, adjust = "none", seed = NULL )bayes_compare_lsa( x, y = NULL, prior = 0.5, draws = 10000L, ci = 0.95, mean_threshold = 0.01, bound_threshold = 0.001, adjust = "none", seed = NULL )
x |
An |
y |
The second group's |
prior |
Numeric > 0. Dirichlet prior concentration added to every
cell. Default |
draws |
Integer. Monte Carlo posterior draws for the credible
intervals. Default |
ci |
Numeric in (0, 1). Credible-interval mass. Default |
mean_threshold, bound_threshold
|
An edge is flagged credibly
different only if its credible interval excludes zero, |
adjust |
Multiple-comparison correction applied to the two-sided
Bayesian p across edges (and family-wide across pairs); any method of
|
seed |
Optional integer for reproducible credible intervals. |
This complements compare_lsa(): the permutation test asks whether a
difference is more extreme than chance; the Bayesian comparison asks for
the plausible range of the true difference and how precisely it is
estimated. An edge whose source state is rarely visited gets a wide
credible interval even when its row-normalised probability looks
decisive.
The result carries class c("lsa_bayes", "lsa_comparison") (and the
pairwise object c("lsa_bayes_pairwise", "lsa_comparison_pairwise")),
so plot() (barrel / heatmap) and
as.data.frame() work as for a permutation comparison.
For two groups, class c("lsa_bayes", "lsa_comparison", "list")
with an edges data frame (from, to, prob_a, prob_b, diff,
ci_low, ci_high, pd, effect_size, p_value, p_adj,
significant), the two fits, and the Bayesian settings. For more
than two groups, an all-pairwise c("lsa_bayes_pairwise", "lsa_comparison_pairwise", "list").
Johnston, L. & Jendoubi, T. (2026). How Delivery Mode Reshapes Resource Engagement: A Bayesian Differential Network Analysis. TNA Workshop 2026.
compare_lsa(), certainty_lsa()
g <- lsa(group_regulation, group = ifelse(group_regulation$T1 == "plan", "p", "o")) bc <- bayes_compare_lsa(g, seed = 1) head(as.data.frame(bc))g <- lsa(group_regulation, group = ifelse(group_regulation$T1 == "plan", "p", "o")) bc <- bayes_compare_lsa(g, seed = 1) head(as.data.frame(bc))
Non-parametric bootstrap for any LSA fit produced by lsa().
Resamples the underlying sequence data (whole sequences when more
than one is available; geometric-block stationary bootstrap on
events otherwise), refits the engine on each resample using the
immutable recipe stored in fit$params, and aggregates per-edge
statistics into a tidy data frame.
bootstrap_lsa( fit, R = 1000L, level = c("auto", "sequence", "event"), block_length = NULL, level_alpha = 0.95, indices = NULL, parallel = FALSE, n_cores = NULL, verbose = FALSE, ... )bootstrap_lsa( fit, R = 1000L, level = c("auto", "sequence", "event"), block_length = NULL, level_alpha = 0.95, indices = NULL, parallel = FALSE, n_cores = NULL, verbose = FALSE, ... )
fit |
An |
R |
Integer. Number of bootstrap replicates. Default |
level |
Character. Resampling unit: |
block_length |
For event-level bootstrap, mean geometric
block length. Default |
level_alpha |
Numeric. Confidence level for percentile
intervals. Default |
indices |
Optional integer matrix of replay indices, one row
per resample (row |
parallel |
Logical. Use multi-core resampling. Default
|
n_cores |
Integer. Worker count when |
verbose |
Logical. Print progress every 100 replicates.
Default |
... |
Reserved for future use. |
Sequence-level resampling (default for multi-sequence input).
Each resample draws S sequence indices with replacement from
seq_len(S) and rebuilds the multi-sequence input as the
corresponding list of event vectors. Preserves within-sequence
structure.
Event-level resampling (single-sequence input). Implements
the stationary block bootstrap of Politis & Romano (1994). Block
length is geometric with mean block_length; resampled blocks
wrap around the event stream and are concatenated until total
length equals the original T.
Reproducibility hook. Supply indices as an R x S integer
matrix of sequence indices (sequence-level) or an R x T matrix of
event positions (event-level) to deterministically replay the
bootstrap across sessions, processes, or languages. The event-level
matrix holds the fully expanded resampled positions, i.e. the same
form produced internally, so a captured indices_used can be fed
straight back in.
NA handling. Per-cell summary statistics (mean, se,
ci_low, ci_high, p_boot) are computed with na.rm = TRUE,
so replicates that produced NA for a given cell (for example
structural-zero cells, or cells whose row marginal collapsed to
zero in the resampled data) are excluded from that cell's
summary. The summary therefore reflects only the finite
replicates; cells whose every replicate was NA come back as
NA themselves.
An object of class c("lsa_bootstrap", "list") with:
Tidy per-edge data frame with observed + bootstrap
mean, se, ci_low, ci_high, p_boot, and stable for
count, adj_res, prob, and yules_q.
R x K^2 numeric matrix: cell-wise observed
count from each replicate (flattened in as.vector(obs)
order).
R x K^2 matrix of adjusted residuals.
Recipe metadata.
Reference to the original fit (for $params / labels).
Efron, B. (1979). Bootstrap methods: another look at the jackknife. Annals of Statistics, 7(1), 1-26.
Politis, D. N., & Romano, J. P. (1994). The stationary bootstrap. Journal of the American Statistical Association, 89(428), 1303-1313.
permute_lsa(), stability_lsa()
fit <- lsa(engagement, engine = "classical") bs <- bootstrap_lsa(fit, R = 200) head(bs$edges)fit <- lsa(engagement, engine = "classical") bs <- bootstrap_lsa(fit, R = 200) head(bs$edges)
Closed-form Bayesian alternative to bootstrap_lsa() for the
transition-probability edges of an lsa fit. Each state's outgoing
transitions are modelled as Dirichlet-Multinomial: with a Jeffreys
prior the posterior for a row is Dirichlet(count + prior), so each
edge probability is marginally Beta(a, b) and its posterior mean,
standard deviation, credible interval and stability decision are
available analytically. No resampling, so it runs in microseconds.
certainty_lsa( fit, prior = 0.5, level_alpha = 0.95, inference = c("stability", "threshold"), consistency_range = c(0.75, 1.25), edge_threshold = NULL )certainty_lsa( fit, prior = 0.5, level_alpha = 0.95, inference = c("stability", "threshold"), consistency_range = c(0.75, 1.25), edge_threshold = NULL )
fit |
An |
prior |
Numeric > 0. Dirichlet prior concentration added to every
cell. Default |
level_alpha |
Numeric in (0, 1). Credible-interval level. Default
|
inference |
|
consistency_range |
Length-2 multiplicative bounds for stability
inference. Default |
edge_threshold |
Numeric or |
The result carries class c("lsa_certainty", "lsa_bootstrap") and an
edges table with the columns plot_forest() and
as.data.frame() expect, so it is a drop-in for a bootstrap result
(use metric = "prob").
Certainty vs bootstrap. Both answer "how precisely is this edge
pinned down?". They agree on homogeneous data. The Dirichlet posterior
treats transitions as independent, so on strongly heterogeneous data (a
mixture of latent classes with long sequences) it reports more
certainty than the sequence bootstrap – prefer bootstrap_lsa() then.
An object of class c("lsa_certainty", "lsa_bootstrap", "list")
with an edges data frame (from, to, prob_observed,
prob_mean, prob_se, prob_ci_low, prob_ci_high, p_value,
stable, plus adj_res_observed/adj_res_stable for plotting), the
posterior matrices (mean, sd, ci_lower, ci_upper), and call
metadata (prior, level_alpha, inference, ...). For an
lsa_group, a named list of these (class lsa_certainty_group).
Johnston, L. & Jendoubi, T. (2026). How Delivery Mode Reshapes Resource Engagement: A Bayesian Differential Network Analysis. TNA Workshop 2026.
bootstrap_lsa(), stability_lsa(), plot_forest()
fit <- lsa(engagement) cert <- certainty_lsa(fit) cert head(as.data.frame(cert))fit <- lsa(engagement) cert <- certainty_lsa(fit) cert head(as.data.frame(cert))
Permutation test for whether groups produce different LSA transition
structures. For each pair of groups it pools their sequences,
repeatedly reassigns the group label of whole sequences (preserving
the original group sizes), refits each pseudo-group, and builds a
permutation distribution of the per-edge difference in the chosen
measure. The two-sided p-value is
(1 + #{ |diff_perm| >= |diff_obs| }) / (1 + R) (Phipson & Smyth
2010). A single omnibus test of overall difference is reported from
the same permutations.
compare_lsa( x, y = NULL, R = 1000L, measure = c("log_or", "adj_res", "yules_q", "prob", "count", "lift"), adjust = "none", min_count = 5L, parallel = FALSE, n_cores = NULL, verbose = FALSE, ... )compare_lsa( x, y = NULL, R = 1000L, measure = c("log_or", "adj_res", "yules_q", "prob", "count", "lift"), adjust = "none", min_count = 5L, parallel = FALSE, n_cores = NULL, verbose = FALSE, ... )
x |
Either an |
y |
When |
R |
Integer. Number of label permutations per comparison.
Default |
measure |
Character. The per-edge quantity compared between
groups. Default |
adjust |
Multiple-comparison correction; any method accepted by
|
min_count |
Integer. Minimum pooled observed count (group a +
group b) for a transition to be tested. Default |
parallel |
Logical. Use multi-core resampling. Default |
n_cores |
Integer. Worker count when |
verbose |
Logical. Print progress every 100 permutations. |
... |
Reserved. NA handling. Non-estimable cells (structural zeros, zero-margin
rows in a permuted pseudo-group) carry Interpretation caveats. The odds ratio is non-collapsible: the per-group log odds ratios are group-specific departure-from- independence measures and should not be pooled across groups that have different marginal state distributions. As with any LSA, a between-group difference can also be driven by subgroup composition (Simpson's paradox); confirm with subgroup analysis when a confound is plausible. |
With exactly two groups a single comparison is returned. With more
than two groups every pairwise comparison is run and the requested
adjust correction is applied once across the whole family of
per-edge p-values (and separately across the per-pair omnibus tests),
giving family-wise control rather than per-pair control.
For two groups, an object of class
c("lsa_comparison", "list") with:
Tidy per-edge data frame: from, to, the measure in
each group (<measure>_a, <measure>_b), their difference
diff (= a - b), the permutation p-value p_perm, the adjusted
p-value p_adj, and a significant flag.
Omnibus test list: statistic (observed sum of
squared edge differences), p_value, and R.
R x K^2 matrix of permuted edge differences.
Call metadata; groups is the
length-two character vector of group labels (a, b).
The two original fits, named by group.
For more than two groups, an object of class
c("lsa_comparison_pairwise", "list") with:
Tidy per-edge data frame across all pairs, prefixed by
group_a, group_b; p_adj and significant reflect the
family-wide correction.
One row per pair: group_a, group_b, statistic,
p_value, and the across-pairs adjusted p_adj.
Named list of the underlying two-group
lsa_comparison objects (each fit with adjust = "none"), for
drill-down.
Call metadata; groups lists all
group labels.
Phipson, B., & Smyth, G. K. (2010). Permutation p-values should never be zero. Statistical Applications in Genetics and Molecular Biology, 9(1), Article 39.
van Borkulo, C. D., et al. (2022). Comparing network structures on three aspects: A permutation test. Psychological Methods.
permute_lsa(), bootstrap_lsa()
# group_regulation is wide sequences with no grouping column, so # derive one: sessions whose first regulation act is planning vs not. grp <- ifelse(group_regulation$T1 == "plan", "starts_plan", "other") g <- lsa(group_regulation, group = grp) cmp <- compare_lsa(g, R = 200) head(cmp$edges) cmp$global# group_regulation is wide sequences with no grouping column, so # derive one: sessions whose first regulation act is planning vs not. grp <- ifelse(group_regulation$T1 == "plan", "starts_plan", "other") g <- lsa(group_regulation, group = grp) cmp <- compare_lsa(g, R = 200) head(cmp$edges) cmp$global
Wide-format categorical sequence data: 138 students observed over 15
weekly time points. Each row is one student; each column is a
week. Entries are the student's engagement state for that week,
one of "Active", "Average", "Disengaged", or NA for missing
weeks.
engagementengagement
A character matrix with 138 rows and 15 columns.
This is a standard small-K, multi-sequence example for lag
sequential analysis: K = 3 states, S = 138 sequences, mean sequence
length about 15. It exercises the wide-matrix input path of
lsa_data() and produces a stable transition pattern with clear
adjusted-residual signals.
Derived without modification from the trajectories matrix
in the Nestimate package
(https://github.com/mohsaqr/Nestimate), which is MIT-licensed
and produced by Saqr and collaborators as a synthetic engagement
trajectory example. Re-shipped here for convenience and offline
testing; both attribution and license are preserved.
fit <- lsa(engagement, engine = "classical") fit fit$adj_resfit <- lsa(engagement, engine = "classical") fit fit$adj_res
Retrieve a Registered LSA Engine
get_lsa_engine(name)get_lsa_engine(name)
name |
Character scalar. The engine's identifier. |
The registry entry: a list with elements name, fn,
description, requires.
register_lsa_engine(), list_lsa_engines()
A wide-format dataset of group regulation during collaborative
learning: each row is one group's session, each column an ordered
time point, and each cell a coded regulation action. Shorter sessions
are padded with NA. It is the flagship example for the package
vignette: 9 states over 2,000 sequences gives a realistically rich
transition network.
group_regulationgroup_regulation
A data.frame with 2,000 rows (sequences) and 26 columns
(ordered time points), character-coded.
The nine coded actions are adapt, cohesion, consensus,
coregulate, discuss, emotion, monitor, plan, and
synthesis.
fit <- lsa(group_regulation) transitions(fit, significant = TRUE)fit <- lsa(group_regulation) transitions(fit, significant = TRUE)
Chronological sequence of primary genres for 1,000 highly-rated
IMDB films (averageRating >= 7.0, numVotes >= 1000, release
years 1970-2024). Each event is one film's first-listed genre; the
sequence is sorted by startYear ascending, breaking ties by
descending rating.
imdb_genresimdb_genres
A named list with elements:
Character vector of length 1,000: the chronological primary-genre sequence.
Integer vector of corresponding release years.
Character vector of decade labels ("1970s", "1980s", ...).
Numeric vector of IMDB average ratings.
Character vector of primary movie titles.
Sorted character vector of the 16 distinct
genres in sequence.
Citation string.
MIT (cooccure package); IMDB raw data is Open Data.
Summary metadata.
This dataset serves as a medium-K, medium-N validation input
outside the learning-analytics domain that dominates the rest of
the lagseq battery. It exercises K = 16 with N = 1,000 events and
has no published LSA result to validate against; instead, the
test suite cross-validates lagseq's classical engine against
stats::chisq.test() on this exact sequence.
Derived from cooccure::movies (MIT,
https://github.com/mohsaqr/cooccure); IMDB raw data
(https://www.imdb.com/interfaces/) is Open Data.
fit <- lsa(imdb_genres$sequence, engine = "classical") fit # Top over-represented genre transitions head(transitions(fit, significant = TRUE, direction = "over"))fit <- lsa(imdb_genres$sequence, engine = "classical") fit # Top over-represented genre transitions head(transitions(fit, significant = TRUE, direction = "over"))
The proportion of sequences starting in each state, as a tidy
data.frame.
initial(fit) ## S3 method for class 'lsa' initial(fit) ## S3 method for class 'lsa_group' initial(fit)initial(fit) ## S3 method for class 'lsa' initial(fit) ## S3 method for class 'lsa_group' initial(fit)
fit |
An |
A data.frame with columns state, init_prob; zero rows
when the fit came from a transition matrix (no initial states).
initial(lsa(group_regulation))initial(lsa(group_regulation))
Event sequences from 29 undergraduate learners interacting with a
knowledge-graph learning environment over an exam-preparation
session. Each sequence is a chronologically ordered character
vector of action codes drawn from
c("L01", "L02", ..., "L12", "E") where L01..L12 are knowledge
nodes the learner visited (L01 = "Civil disputes", L02 =
"Civil litigation", ...) and E = exercise attempt.
kg_logskg_logs
A named list of 29 character vectors with a "group"
attribute (also a named character vector).
Learners are labeled by 10-digit numeric IDs (used as names()).
The attribute "group" is a named character vector classifying
each learner into the post-test performance group used by the
source paper: low, medium, or high (stored under the source's
original ordinal labels).
This data set is intended as a real-world third-party validation
input for lsa() and is shipped alongside kg_lsa_oracle, which
contains the source paper's own published LSA results computed on
the same data.
Du, J. (2026). The dataset of "Sequential Behavioral Mechanisms Linking Learning Paths to Academic Performance in Knowledge Graph Environments". Mendeley Data V1. doi:10.17632/bdwcj7vw94.1. License: CC BY 4.0. Re-distributed without modification.
kg_lsa_oracle for the source paper's published LSA matrices used as a validation oracle.
length(kg_logs) head(kg_logs[[1]], 10) table(attr(kg_logs, "group")) fit <- lsa(kg_logs, engine = "classical") fitlength(kg_logs) head(kg_logs[[1]], 10) table(attr(kg_logs, "group")) fit <- lsa(kg_logs, engine = "classical") fit
The lag-sequential-analysis outputs published by Du Jun (2026) for
the same 29-learner dataset shipped in kg_logs. Used as an
independent third-party validation oracle for lsa().
kg_lsa_oraclekg_lsa_oracle
A named list with four elements (overall, low, mid,
high), each itself a list with:
The published 13 x 13 transition-frequency matrix (JNTF), with rows = "given" codes and columns = "target" codes.
The published 13 x 13 adjusted-residual matrix (ADJR), printed to two decimal places in the source.
The published values were extracted verbatim from the spreadsheet deposited at doi:10.17632/bdwcj7vw94.1 (the overall-analysis sheet and the low-, medium-, and high-result sheets). The author used the published output (matrices labeled JNTF for joint transition frequencies and ADJR for adjusted residuals) computed from the overall wide-format sequence sheet.
Note that the published total sum(obs) is 870 transitions; running
lagseq on the same wide-format sheet yields 871, an
off-by-one difference attributable to a minor undocumented
preprocessing step in the source paper. Cell-level agreement is
typically within 1-5 events out of 870, and adjusted-residual
agreement is within 0.5 in roughly 90% of cells. See the test file
tests/testthat/test-published-kg.R for the precise agreement
thresholds.
Du, J. (2026). The dataset of "Sequential Behavioral Mechanisms Linking Learning Paths to Academic Performance in Knowledge Graph Environments". Mendeley Data V1. doi:10.17632/bdwcj7vw94.1. License: CC BY 4.0.
kg_logs for the raw event sequences.
How one from -> to transition behaves across lags, as a tidy
one-row-per-lag data frame. A clean shortcut for "track this
transition over lags 1, 2, 3, ...".
lag_profile(x, from, to, lags = 1:3, ...)lag_profile(x, from, to, lags = 1:3, ...)
x |
Sequence input (any form accepted by |
from, to
|
State labels of the transition to profile. |
lags |
Integer vector of lags. Default |
... |
Passed to |
A tidy data.frame, one row per lag, with columns lag,
from, to, count, prob, adj_res, p, and significant.
lag_profile(group_regulation, "plan", "consensus", lags = 1:3)lag_profile(group_regulation, "plan", "consensus", lags = 1:3)
List All Registered LSA Engines
list_lsa_engines()list_lsa_engines()
A data.frame with columns name, description, requires.
register_lsa_engine(), get_lsa_engine()
Fits a lag sequential analysis (LSA) on categorical event sequence data using a registered engine. Returns a tidy S3 object with named slots for observed/expected/probability/residual matrices and a long-format edge table suitable for transition-network visualization.
lsa_classical(data, ...) lsa_two_cell(data, ...) lsa_bidirectional(data, ...) lsa_parallel_dominance(data, ...) lsa_nonparallel_dominance(data, ...) lsa( data, lag = 1, engine = "classical", alternative = c("two.sided", "greater", "less"), alpha = 0.05, loops = TRUE, structural_zeros = NULL, labels = NULL, group = NULL, actor = NULL, action = NULL, time = NULL, order = NULL, session = NULL, time_threshold = 900, custom_format = NULL, is_unix_time = FALSE, unix_time_unit = "seconds", params = list(), ... )lsa_classical(data, ...) lsa_two_cell(data, ...) lsa_bidirectional(data, ...) lsa_parallel_dominance(data, ...) lsa_nonparallel_dominance(data, ...) lsa( data, lag = 1, engine = "classical", alternative = c("two.sided", "greater", "less"), alpha = 0.05, loops = TRUE, structural_zeros = NULL, labels = NULL, group = NULL, actor = NULL, action = NULL, time = NULL, order = NULL, session = NULL, time_threshold = 900, custom_format = NULL, is_unix_time = FALSE, unix_time_unit = "seconds", params = list(), ... )
data |
Sequence input (any form accepted by |
... |
Additional engine-specific parameters (merged into
|
lag |
Integer. The transition lag. Default |
engine |
Character scalar. The engine name, registered via
|
alternative |
Character scalar. The alternative hypothesis for
adjusted-residual and kappa p-values: one of |
alpha |
Numeric. Significance threshold used to mark edges as
significant in |
loops |
Logical. Keep self-transitions (the diagonal)? Default
|
structural_zeros |
Optional |
labels |
Optional character vector of state labels. |
group |
Optional grouping for a multi-group fit. Either a vector
with one entry per input sequence (length |
actor, action, time, order, session
|
Column names (each a single
string) for long-format event-log input. Supplying |
time_threshold |
Numeric. Maximum gap in seconds between
consecutive events before a new session is started in long-format
mode. Default |
custom_format |
Optional |
is_unix_time |
Logical. Treat the |
unix_time_unit |
Character. Unit of the Unix epoch when
|
params |
Optional named list of engine-specific parameters forwarded to the engine function. |
An object of class c("lsa", "cograph_network"). Read it with
the verbs rather than by reaching into slots: transitions() for the
tidy edge table, nodes(), tests(), initial(), and summary()
for the other results, and plot()/plot_transitions() to draw it.
Every number a verb returns is backed by these slots:
The tidy one-row-per-transition frame that backs
transitions() (with extra cograph_network protocol columns).
Data frame backing nodes(): id, label, name, outgoing, incoming.
The same per-cell quantities as edges, in
K x K matrix form (prob is row-conditional P(to | from),
prob_col column-conditional P(from | to)). Convenient for
matrix algebra; not the primary interface.
Lists (statistic, df, p) backing tests(): the
tablewise likelihood-ratio (G^2) and Pearson chi-square tests of
independence; NULL for engines without an expected table.
Named numeric vector backing initial() (proportion
of sequences starting in each state, sums to 1); NULL for
transition-matrix input.
K x K matrix used as the default edge weight for
plotting. Equal to obs (counts) by default.
Logical scalar; TRUE for directed engines,
FALSE for bidirectional.
Engine name (the slot the cograph_network protocol
reads). Also recorded in params$engine.
The canonical lsa_data object (events + seq_id).
Immutable snapshot of all parameters used (recipe),
including params$engine.
List with source, IPF info, version, and call.
When group is supplied, returns an object of class
c("lsa_group", "list"): a named list of lsa fits (one per group
level) carrying levels, group_sizes, labels, and engine
attributes. Downstream verbs (lsa_to_tna(),
transitions(), reliability_lsa(), etc.) dispatch on
it and return grouped results.
lsa_data(), lsa_transitions(), register_lsa_engine(),
list_lsa_engines()
seq <- c("Question", "Explain", "Agree", "Question", "Explain", "Elaborate", "Agree", "Question", "Explain") fit <- lsa(seq, engine = "classical") fit head(fit$edges)seq <- c("Question", "Explain", "Agree", "Question", "Explain", "Elaborate", "Agree", "Question", "Explain") fit <- lsa(seq, engine = "classical") fit head(fit$edges)
Coerces a wide variety of user input shapes into a single canonical representation used by every downstream lagseq function (engines, bootstrap, permutation, grouping, plotting).
lsa_data(x, labels = NULL)lsa_data(x, labels = NULL)
x |
Sequence input. See Details. |
labels |
Optional character vector of label names for the
states. When |
Accepted input forms:
An atomic vector of integer or character codes — treated as a single sequence.
A list of atomic vectors — treated as multiple independent sequences; transitions are not counted across sequence boundaries.
A wide matrix or data.frame with rows = sequences, columns =
ordered time points. Missing values (NA) and empty strings are
treated as missingness, not as a state: they are dropped wherever
they occur in a row and the surrounding events close up, so no
transition is counted into or out of a gap. To model missingness
as its own state, recode it (e.g. NA -> "missing") before
calling lsa().
A square numeric matrix of pre-computed transition counts. Row
i, column j is the count of i -> j transitions. In this case
events and seq_id are not available and downstream resampling
tools that need event-level data will error.
A sequence-bearing object: a tna or
group_tna (sequences read from its $data slot), a tna_data
or nestimate_data ($sequence_data), or an stslist. The stored event
sequences are recovered and analysed.
A tna built from a bare matrix (no retained sequences) errors,
because transition counts cannot be recovered from probability
weights.
An object of class c("lsa_data", "list") with elements:
Integer vector of event codes (1-indexed), or
NULL if input was a transition matrix.
Integer vector of sequence membership, same length
as events, or NULL if input was a transition matrix.
Character vector of state labels.
Number of distinct states (K).
Integer count of independent sequences.
Total number of events across all sequences.
Integer vector: number of transitions each sequence contributes at lag 1.
One of "events", "transitions" — flags whether
event-level data is available.
If source = "transitions", the original K x K
count matrix. Otherwise NULL.
# Single character sequence d1 <- lsa_data(c("a", "b", "a", "c", "b")) d1$n_events # Multiple sequences d2 <- lsa_data(list(c("a", "b", "a"), c("b", "c", "a", "b"))) d2$n_sequences # Pre-computed transition matrix tm <- matrix(c(0, 3, 1, 2, 0, 4, 5, 1, 0), 3, 3, dimnames = list(c("a","b","c"), c("a","b","c"))) d3 <- lsa_data(tm) d3$source# Single character sequence d1 <- lsa_data(c("a", "b", "a", "c", "b")) d1$n_events # Multiple sequences d2 <- lsa_data(list(c("a", "b", "a"), c("b", "c", "a", "b"))) d2$n_sequences # Pre-computed transition matrix tm <- matrix(c(0, 3, 1, 2, 0, 4, 5, 1, 0), 3, 3, dimnames = list(c("a","b","c"), c("a","b","c"))) d3 <- lsa_data(tm) d3$source
Fits expected cell frequencies under the row + column independence
model when some cells are constrained to be exactly zero
(structural zeros). The fitted table has the same row and column
marginals as obs, satisfies E[i, j] = 0 wherever
structure[i, j] = 0, and is the maximum likelihood estimate under
the independence model restricted to the non-zero pattern.
lsa_ipf(obs, structure = NULL, tol = 1e-08, max_iter = 200L)lsa_ipf(obs, structure = NULL, tol = 1e-08, max_iter = 200L)
obs |
Numeric |
structure |
Numeric |
tol |
Numeric. Convergence tolerance on marginal differences.
Default |
max_iter |
Integer. Maximum number of row+column scaling
passes. Default |
Implementation follows Wickens (1989), pp. 107-112: alternately
scale rows then columns of an initialized expected table until row
and column marginals converge to those of obs within tol.
A list with elements:
The fitted K x K expected-frequency matrix.
Number of row+column passes used.
Logical scalar: whether convergence was reached
within max_iter.
Maximum absolute difference between observed and fitted marginals at termination.
Wickens, T. D. (1989). Multiway contingency tables analysis for the social sciences, pp. 107-112. Lawrence Erlbaum.
obs <- matrix(c(0, 4, 6, 3, 0, 5, 7, 2, 0), nrow = 3, byrow = TRUE) fit <- lsa_ipf(obs) fit$fit # fitted expected counts all.equal(rowSums(fit$fit), rowSums(obs)) # TRUE all.equal(colSums(fit$fit), colSums(obs)) # TRUEobs <- matrix(c(0, 4, 6, 3, 0, 5, 7, 2, 0), nrow = 3, byrow = TRUE) fit <- lsa_ipf(obs) fit$fit # fitted expected counts all.equal(rowSums(fit$fit), rowSums(obs)) # TRUE all.equal(colSums(fit$fit), colSums(obs)) # TRUE
Fits lsa() at each requested lag and returns the fits together, so
you can compare a transition's strength across lags (a lag
profile). Each element is an ordinary lsa fit.
lsa_lags(data, lags = 1:3, ...)lsa_lags(data, lags = 1:3, ...)
data |
Sequence input (any form accepted by |
lags |
Integer vector of lags. May include negative lags
(predecessors) and |
... |
Passed to |
An object of class c("lsa_lags", "list"): a named list of
lsa fits (names "lag1", "lag2", ...), with a lags attribute.
as.data.frame() on it row-binds transitions() of every fit (each
already carries its lag column) into one tidy long frame with the
same columns as transitions().
prof <- lsa_lags(engagement, lags = 1:3) prof # Track one transition across lags with the dedicated verb: lag_profile(engagement, from = "Active", to = "Average", lags = 1:3)prof <- lsa_lags(engagement, lags = 1:3) prof # Track one transition across lags with the dedicated verb: lag_profile(engagement, from = "Active", to = "Average", lags = 1:3)
Convert an lsa fit to a tna-class network object usable by the
tna package's centrality, pruning, community, and bootstrap
routines. Requires the tna package (declared in Suggests).
lsa_to_tna(x, ...) ## S3 method for class 'lsa' lsa_to_tna(x, weights = c("prob", "count", "adj_res", "lift"), ...) ## S3 method for class 'lsa_group' lsa_to_tna(x, weights = c("prob", "count", "adj_res", "lift"), ...)lsa_to_tna(x, ...) ## S3 method for class 'lsa' lsa_to_tna(x, weights = c("prob", "count", "adj_res", "lift"), ...) ## S3 method for class 'lsa_group' lsa_to_tna(x, weights = c("prob", "count", "adj_res", "lift"), ...)
x |
An |
... |
Method-specific arguments. |
weights |
Character. Which matrix to expose as the tna edge
weights. One of |
The function is deliberately not named as_tna, to avoid
overlapping export names with other packages so they can be loaded
together without masking each other. lsa_to_tna() is lagseq's own,
collision-free converter.
For an lsa fit, a tna object with weights, inits,
labels, data slots and type/scaling/class attributes
(type is "frequency" for counts, "relative" otherwise). For
an lsa_group, a group_tna (named list of tna objects).
The data/inits slots (the sequences tna resamples from) are
attached only for weights = "prob" or "count", which tna can
re-estimate faithfully. For "adj_res" and "lift" they are
omitted with a warning, because tna's resampling verbs would
re-estimate a probability network rather than the residual/lift
scale carried in weights; the non-resampling verbs
(centralities, prune, ...) still work.
## Not run: fit <- lsa(engagement, engine = "classical") net <- lsa_to_tna(fit, weights = "prob") tna::centralities(net) tna::prune(net, method = "threshold", threshold = 0.05) ## End(Not run)## Not run: fit <- lsa(engagement, engine = "classical") net <- lsa_to_tna(fit, weights = "prob") tna::centralities(net) tna::prune(net, method = "threshold", threshold = 0.05) ## End(Not run)
Computes the K x K transition count matrix from canonical lag
sequential data, optionally returning a tidy long-format edge table
alongside the matrix.
lsa_transitions(x, lag = 1)lsa_transitions(x, lag = 1)
x |
Either an lsa_data object or any input accepted by
|
lag |
Integer. The lag at which to count transitions; default
|
Transitions are counted within sequences only; no transition spans a
sequence boundary. For input that was supplied as a pre-computed
transition matrix (source = "transitions" on the lsa_data
object), the input matrix is returned at lag 1 and an error is
raised for any other lag.
An object of class c("lsa_transitions", "list") with
elements:
The K x K observed transition count matrix with
dimnames set to the labels.
Length-K vector rowSums(obs).
Length-K vector colSums(obs).
Scalar sum(obs).
The lag used.
Character vector of state labels.
Tidy long-format data.frame with one row per
(from, to) cell containing columns from, to, lag,
count, row_total, col_total, n_transitions.
d <- lsa_data(c("a", "b", "a", "c", "b", "a")) tx <- lsa_transitions(d, lag = 1) tx$obs head(tx$edges)d <- lsa_data(c("a", "b", "a", "c", "b", "a")) tx <- lsa_transitions(d, lag = 1) tx$obs head(tx$edges)
The states and their incoming / outgoing transition totals, as a
tidy data.frame (one row per state).
nodes(fit) ## S3 method for class 'lsa' nodes(fit) ## S3 method for class 'lsa_group' nodes(fit)nodes(fit) ## S3 method for class 'lsa' nodes(fit) ## S3 method for class 'lsa_group' nodes(fit)
fit |
An |
A data.frame with columns state (the state name, matching
the from/to endpoints of transitions()), outgoing, and
incoming (its total out- and in-transition counts).
nodes(lsa(group_regulation))nodes(lsa(group_regulation))
The complete published input/output pair from the canonical lag-sequential-analysis methods paper: O'Connor, B. P. (1999), "Simple and flexible SAS and SPSS programs for analyzing lag-sequential categorical data", Behavior Research Methods, Instruments, & Computers, 31(4), 718-726. doi:10.3758/BF03200753.
oconnor_coupleoconnor_couple
A named list with elements:
Integer vector of length 393. Codes 1-6.
6 x 6 integer matrix: published transition frequency
matrix (sum(obs) = 392).
6 x 6 numeric matrix: published expected frequencies under row x column independence.
6 x 6 numeric matrix: published transitional probabilities.
List (statistic, df, p) for the tablewise
likelihood-ratio chi-square: statistic = 202.5009, df = 25,
p approximately 0.
6 x 6 numeric matrix of published adjusted residuals.
6 x 6 matrix of published p-values for the adjusted residuals.
6 x 6 matrix of published Yule's Q values.
6 x 6 matrix of published Wampold-style unidirectional kappas.
6 x 6 matrix of published kappa z-scores.
6 x 6 matrix of published kappa p-values.
List with the permutation test outputs
(n_blocks = 10, n_perm_block = 1000, plus p_mean,
p_high, p_low matrices). Used by Step 5
permute_lsa() validation.
Citation and metadata.
The paper publishes a 393-event input sequence (couple-interaction data from Gottman & Roy 1990, p. 78; Appendix A) plus the full numerical output of the SEQUENTIAL program for that exact input (Appendix B). Six behavior codes; 392 transitions at lag 1.
This is the gold-standard validation oracle for any lag-sequential
analysis implementation. lagseq's classical engine reproduces every
cell of every output matrix to better than the paper's own printed
precision (3-4 decimal places); see
tests/testthat/test-published-oconnor.R.
O'Connor, B. P. (1999). Behavior Research Methods, Instruments, & Computers, 31(4), 718-726. doi:10.3758/BF03200753. The underlying interaction data are from Gottman, J. M., & Roy, A. K. (1990), p. 78.
fit <- lsa(oconnor_couple$sequence, engine = "classical") # Reproduce every output matrix in Appendix B: all.equal(unname(fit$obs), unname(oconnor_couple$obs)) all.equal(unname(fit$exp), unname(oconnor_couple$expected), tolerance = 1e-3) all.equal(unname(fit$adj_res), unname(oconnor_couple$adj_res), tolerance = 1e-3) all.equal(unname(fit$yules_q), unname(oconnor_couple$yules_q), tolerance = 1e-3) all.equal(unname(fit$kappa), unname(oconnor_couple$kappa), tolerance = 1e-3)fit <- lsa(oconnor_couple$sequence, engine = "classical") # Reproduce every output matrix in Appendix B: all.equal(unname(fit$obs), unname(oconnor_couple$obs)) all.equal(unname(fit$exp), unname(oconnor_couple$expected), tolerance = 1e-3) all.equal(unname(fit$adj_res), unname(oconnor_couple$adj_res), tolerance = 1e-3) all.equal(unname(fit$yules_q), unname(oconnor_couple$yules_q), tolerance = 1e-3) all.equal(unname(fit$kappa), unname(oconnor_couple$kappa), tolerance = 1e-3)
Empirical null-distribution p-values for every cell of an LSA
transition matrix. Repeatedly shuffles the input event vector
(within sequence boundaries) and recomputes the engine's residual
matrix, producing a permutation distribution for each cell. The
two-sided p-value is (1 + #{ |stat_perm| >= |stat_obs| }) / (1 + R)
(Phipson & Smyth 2010).
permute_lsa( fit, R = 1000L, within_sequence = TRUE, shuffles = NULL, parallel = FALSE, n_cores = NULL, verbose = FALSE, ... )permute_lsa( fit, R = 1000L, within_sequence = TRUE, shuffles = NULL, parallel = FALSE, n_cores = NULL, verbose = FALSE, ... )
fit |
An |
R |
Integer. Number of permutations. Default |
within_sequence |
Logical. When |
shuffles |
Optional list of length |
parallel |
Logical. Use multi-core resampling. Default
|
n_cores |
Integer. Worker count when |
verbose |
Logical. Print progress every 100 replicates. |
... |
Reserved. NA handling. The exceedance count that drives |
An object of class c("lsa_permutation", "list") with:
Tidy per-edge data frame with observed counts and
residuals, the empirical permutation p-value p_perm, and a
significant flag at the recipe's alpha threshold.
R x K^2 numeric matrix of permuted
residuals (cells in as.vector(adj_res) order).
Recipe metadata.
Reference to the original fit.
Castellan, N. J. (1992). Shuffling arrays: appearances may be deceiving. Behavior Research Methods, Instruments, & Computers, 24(1), 72-77.
Phipson, B., & Smyth, G. K. (2010). Permutation p-values should never be zero: calculating exact p-values when permutations are randomly drawn. Statistical Applications in Genetics and Molecular Biology, 9(1), Article 39.
bootstrap_lsa(), stability_lsa()
fit <- lsa(engagement, engine = "classical") pm <- permute_lsa(fit, R = 200) head(pm$edges)fit <- lsa(engagement, engine = "classical") pm <- permute_lsa(fit, R = 200) head(pm$edges)
Draws the transition structure as a chord diagram via
cograph::plot_chord(): states are arcs on an outer ring and each
transition is a curved ribbon whose width is its frequency (or
probability) and whose fill colour is its adjusted residual
(warm = over-represented, cool = avoided). Supply a second fit as
compare to fill each ribbon by the difference in the colour metric
between the two fits.
plot_chords( fit, compare = NULL, width = c("count", "prob"), color = c("residuals", "lift", "prob", "count"), significant = FALSE, self_loops = TRUE, alpha = 0.6, ... )plot_chords( fit, compare = NULL, width = c("count", "prob"), color = c("residuals", "lift", "prob", "count"), significant = FALSE, self_loops = TRUE, alpha = 0.6, ... )
fit |
An |
compare |
Optional second |
width |
Which non-negative quantity sets ribbon width: |
color |
Which quantity fills the ribbons: |
significant |
Logical. Keep only significant transitions (drops
the others' ribbons). Ignored when |
self_loops |
Logical. Draw self-transition ribbons. Default
|
alpha |
Ribbon fill opacity. Default |
... |
Passed to |
This is the circular companion to the plot.lsa() heatmap and the
plot_transitions() network. Like them it delegates the drawing to
cograph; it needs the cograph package installed.
Invisibly, the list returned by cograph::plot_chord()
(segments and chords data frames). Drawn as a side effect.
plot.lsa() (heatmap), plot_transitions() (network),
transitions()
## Not run: fit <- lsa(group_regulation) plot_chords(fit) # ribbons filled by residual plot_chords(fit, width = "prob") # width = probability plot_chords(fit, significant = TRUE, ticks = TRUE) # Compare two groups: ribbon colour = difference in residuals. g <- lsa(group_regulation, group = rep(c("A", "B"), length.out = nrow(group_regulation))) plot_chords(g$A, compare = g$B) ## End(Not run)## Not run: fit <- lsa(group_regulation) plot_chords(fit) # ribbons filled by residual plot_chords(fit, width = "prob") # width = probability plot_chords(fit, significant = TRUE, ticks = TRUE) # Compare two groups: ribbon colour = difference in residuals. g <- lsa(group_regulation, group = rep(c("A", "B"), length.out = nrow(group_regulation))) plot_chords(g$A, compare = g$B) ## End(Not run)
Draws a radial forest of an bootstrap_lsa() result: each transition
is a spoke around a ring, spanning its bootstrap confidence interval,
with a square at the observed estimate and a dashed reference ring at
the null. Spokes whose adjusted residual is significant across
resamples are coloured by direction (warm = over-represented, cool =
avoided); non-significant ones are grey. Needs ggplot2.
plot_forest( boot, metric = c("residuals", "count", "prob", "yules_q"), n_top = NULL, show_nonsig = TRUE, label_size = 2.6 ) ## S3 method for class 'lsa_bootstrap' plot(x, ...)plot_forest( boot, metric = c("residuals", "count", "prob", "yules_q"), n_top = NULL, show_nonsig = TRUE, label_size = 2.6 ) ## S3 method for class 'lsa_bootstrap' plot(x, ...)
boot |
An |
metric |
Which bootstrapped quantity to plot: |
n_top |
Optional integer: keep only the |
show_nonsig |
Logical. Draw non-significant edges (grey). Default
|
label_size |
Edge-label text size. Default |
x |
An |
... |
Passed to |
A ggplot object (drawn when printed).
bootstrap_lsa(), plot.lsa() (heatmap),
plot_polar() (sunburst)
## Not run: fit <- lsa(group_regulation) b <- bootstrap_lsa(fit, R = 500) plot_forest(b) # residual CIs, circular plot_forest(b, metric = "prob") # probability CIs plot_forest(b, show_nonsig = FALSE) # significant transitions only ## End(Not run)## Not run: fit <- lsa(group_regulation) b <- bootstrap_lsa(fit, R = 500) plot_forest(b) # residual CIs, circular plot_forest(b, metric = "prob") # probability CIs plot_forest(b, show_nonsig = FALSE) # significant transitions only ## End(Not run)
Draws the transition structure as a polar sunburst with the source
states named along a large inner ring. Two styles: "rose" (default)
gives every target an equal angular slot and encodes frequency as the
radial bar height (so nothing crams); "wedge" sizes each
transition's angular width by its frequency share (the classic
look), omitting tiny wedges. Both fill by the adjusted residual (warm
= over-represented, cool = avoided), sharing the plot.lsa() heatmap
colour scale. Needs ggplot2.
plot_polar( fit, style = c("rose", "wedge"), fill = c("residuals", "prob", "lift"), size = c("count", "prob"), significant = FALSE, labels = c("all", "auto", "none"), min_show = 0.01, label_size = 3, ... )plot_polar( fit, style = c("rose", "wedge"), fill = c("residuals", "prob", "lift"), size = c("count", "prob"), significant = FALSE, labels = c("all", "auto", "none"), min_show = 0.01, label_size = 3, ... )
fit |
An |
style |
|
fill |
Which quantity fills the bars/wedges: |
size |
For |
significant |
Logical. Grey out non-significant cells (keeping
their size). Default |
labels |
Which target cells to name: |
min_show |
For |
label_size |
Label text size. Default |
... |
Ignored; accepted so |
A ggplot object (drawn when printed).
plot.lsa() (heatmap), plot_chords() (chord),
plot_forest() (bootstrap forest)
## Not run: fit <- lsa(group_regulation) plot_polar(fit) # rose: bars filled by residual plot_polar(fit, style = "wedge") # classic frequency wedges plot_polar(fit, significant = TRUE) # non-significant cells greyed ## End(Not run)## Not run: fit <- lsa(group_regulation) plot_polar(fit) # rose: bars filled by residual plot_polar(fit, style = "wedge") # classic frequency wedges plot_polar(fit, significant = TRUE) # non-significant cells greyed ## End(Not run)
Draws the directed transition network of an lsa fit with
cograph::splot(). Pick the edge weight with weights; optionally
keep only significant edges. Nodes are white and edges are labelled
by default. Returns the cograph network invisibly.
plot_transitions( fit, weights = c("residuals", "count", "prob", "lift"), significant = FALSE, node_fill = "white", edge_labels = TRUE, ... )plot_transitions( fit, weights = c("residuals", "count", "prob", "lift"), significant = FALSE, node_fill = "white", edge_labels = TRUE, ... )
fit |
An |
weights |
Which matrix becomes the edge weight, and how it is drawn:
|
significant |
Logical. Keep only edges whose adjusted-residual
p-value is below the fit's alpha; weaker cells are set to 0 (no
edge). Default |
node_fill |
Node fill colour. Default |
edge_labels |
Logical (or a label vector). Show edge weights as
labels. Default |
... |
Passed to |
The cograph_network object, invisibly (drawn as a side
effect).
plot.lsa() (heatmap), transitions(), lsa_to_tna()
## Not run: fit <- lsa(group_regulation) plot_transitions(fit) # residual network plot_transitions(fit, weights = "prob") # probabilities plot_transitions(fit, weights = "residuals", # residual network, significant = TRUE) # significant only plot_transitions(fit, node_shape = "square") # splot passthrough ## End(Not run)## Not run: fit <- lsa(group_regulation) plot_transitions(fit) # residual network plot_transitions(fit, weights = "prob") # probabilities plot_transitions(fit, weights = "residuals", # residual network, significant = TRUE) # significant only plot_transitions(fit, node_shape = "square") # splot passthrough ## End(Not run)
One entry point for every view of a fit; pick it with type:
"heatmap" (default, the from x to residual heatmap), "network"
(transition network via cograph::splot()), "chord" (chord diagram
via cograph::plot_chord()), or "sunburst" (polar rose). Extra
arguments are forwarded to the chosen view's worker
(plot_transitions(), plot_chords(), plot_polar()); see those for
view-specific options.
## S3 method for class 'lsa' plot(x, type = c("heatmap", "network", "chord", "sunburst"), ...) ## S3 method for class 'lsa_group' plot( x, type = c("heatmap", "network", "chord", "sunburst"), combined = FALSE, ... )## S3 method for class 'lsa' plot(x, type = c("heatmap", "network", "chord", "sunburst"), ...) ## S3 method for class 'lsa_group' plot( x, type = c("heatmap", "network", "chord", "sunburst"), combined = FALSE, ... )
x |
An |
type |
Which view to draw: |
... |
Forwarded to the chosen view. For |
combined |
Logical, for a grouped fit only. |
A ggplot object for "heatmap" and "sunburst"; the
(invisible) cograph object for "network" and "chord". Drawn
when printed.
plot_transitions(), plot_chords(), plot_polar(),
plot_forest(), transitions()
## Not run: fit <- lsa(group_regulation) plot(fit) # residual heatmap (default) plot(fit, which = "prob") # heatmap of probabilities plot(fit, type = "network") # transition network plot(fit, type = "chord") # chord diagram plot(fit, type = "sunburst") # polar sunburst ## End(Not run)## Not run: fit <- lsa(group_regulation) plot(fit) # residual heatmap (default) plot(fit, which = "prob") # heatmap of probabilities plot(fit, type = "network") # transition network plot(fit, type = "chord") # chord diagram plot(fit, type = "sunburst") # polar sunburst ## End(Not run)
Circular forest of the per-edge transition-probability credible
intervals from certainty_lsa() (delegates to plot_forest() with
metric = "prob").
## S3 method for class 'lsa_certainty' plot(x, metric = "prob", ...)## S3 method for class 'lsa_certainty' plot(x, metric = "prob", ...)
x |
An |
metric |
Which credible interval to draw. Default |
... |
Passed to |
A ggplot object (drawn when printed). Needs ggplot2.
Two views of a compare_lsa() result. The default "barrel" is a
back-to-back pyramid (one row per transition): the first group's bar
runs left and the second's right, bar length is each group's
transition probability, bar colour is each group's log odds ratio
(blue = over-represented, red = avoided, following the vcd / mosaic
convention), bar ends show the observed count, and the centre chip
shows the difference p-value (bold and starred when significant). The
bar of the group with the higher value gets a border that darkens with
the size of the difference (faint = small, dark = large). For more
than two groups, one
barrel is drawn per pair via facets. The "heatmap" style draws the
signed difference as a from x to grid on the same diverging scale.
## S3 method for class 'lsa_comparison' plot( x, style = c("barrel", "heatmap"), value = c("prob", "count"), rank = c("frequency", "effect"), top_n = 12L, ... ) ## S3 method for class 'lsa_comparison_pairwise' plot( x, style = c("barrel", "heatmap"), value = c("prob", "count"), rank = c("frequency", "effect"), top_n = 12L, ... )## S3 method for class 'lsa_comparison' plot( x, style = c("barrel", "heatmap"), value = c("prob", "count"), rank = c("frequency", "effect"), top_n = 12L, ... ) ## S3 method for class 'lsa_comparison_pairwise' plot( x, style = c("barrel", "heatmap"), value = c("prob", "count"), rank = c("frequency", "effect"), top_n = 12L, ... )
x |
An |
style |
|
value |
For |
rank |
For |
top_n |
For |
... |
Reserved. |
A ggplot object (drawn when printed). Needs ggplot2.
## Not run: grp <- ifelse(group_regulation$T1 == "plan", "starts_plan", "other") g <- lsa(group_regulation, group = grp) cmp <- compare_lsa(g, R = 200) plot(cmp) # back-to-back barrel plot(cmp, style = "heatmap") # difference heatmap ## End(Not run)## Not run: grp <- ifelse(group_regulation$T1 == "plan", "starts_plan", "other") g <- lsa(group_regulation, group = grp) cmp <- compare_lsa(g, R = 200) plot(cmp) # back-to-back barrel plot(cmp, style = "heatmap") # difference heatmap ## End(Not run)
The published lag-1 transition matrix and adjusted-residual / Yule's Q output matrices for grandmother behaviour across two Beijing dual-income households, as printed in Tables 4 and 5 of Qi An, W. Xing, Y. Wang, X. Li (2026), Sustainability, 18(5), 2326, doi:10.3390/su18052326.
qi2026_grandmotherqi2026_grandmother
A named list with elements:
10 x 10 integer matrix of transition frequencies
(the paper's Table 4). sum(obs) = 1531.
10 x 10 numeric matrix of published Z-scores (Table 5, upper of each cell pair).
10 x 10 numeric matrix of published Yule's Q values (Table 5, lower of each cell pair).
Data frame of 4 cells where Table 5 disagrees
with the math computed from Table 4. Columns: from, to,
paper_printed, math_computed, category.
Named character vector mapping each code to its plain-English description.
Citation string.
Licensing note. The numerical tables are facts and are not copyrightable; the paper itself is published Open Access.
Summary metadata.
The 10 behaviour codes are:
CO (Cooking), DH (Doing housework), WO (Working),
CK (Caring for kid), SM (Self-management), EA (Eating),
CM (Communicating), ED (Education), RE (Resting),
UO (Using object).
This object is the cleanest mathematical oracle shipped with
lagseq: the published input is a complete transition matrix and
the published output is a complete residual matrix, so feeding
$obs into lsa() and comparing the result to $adj_res is a
direct correctness check. Cross-validation against
stats::chisq.test() on $obs is exact at floating-point
precision (< 1e-12).
Four cells in the paper's published Table 5 print values that do
not match the math computed from the paper's own Table 4 input.
These are documented in $known_typos together with the values
the math actually yields, so that lagseq's tests can distinguish
the paper's transcription errors from any genuine engine
regression.
Qi An, Wanli Xing, Yuzhe Wang, & Xiuyu Li. (2026). Behavioural Trajectories and Spatial Responses: A Study on Lag Sequential Analysis and Design Framework for Elderly Caregivers in Chinese Dual-Earner Households. Sustainability, 18(5), 2326. doi:10.3390/su18052326.
# Reproduce the paper's residual analysis from its own input matrix fit <- lsa(qi2026_grandmother$obs, engine = "classical") # Compare to the published Z-scores (modulo the documented typos) fit$adj_res - qi2026_grandmother$adj_res # The 4 paper typos: qi2026_grandmother$known_typos# Reproduce the paper's residual analysis from its own input matrix fit <- lsa(qi2026_grandmother$obs, engine = "classical") # Compare to the published Z-scores (modulo the documented typos) fit$adj_res - qi2026_grandmother$adj_res # The 4 paper typos: qi2026_grandmother$known_typos
Adds a new engine to the lagseq registry so it can be referenced by
name via lsa(..., engine = "<name>"). Built-in engines
("classical", "two_cell", "bidirectional",
"parallel_dominance", "nonparallel_dominance") are registered
automatically when the package loads.
register_lsa_engine(name, fn, description, requires = character())register_lsa_engine(name, fn, description, requires = character())
name |
Character scalar. The engine's identifier as used in
|
fn |
A function. Must accept a |
description |
Character scalar. One-line human-readable
description shown by |
requires |
Character vector. Names of packages the engine depends on. Empty by default. |
Invisibly returns name.
get_lsa_engine(), list_lsa_engines(),
unregister_lsa_engine(), lsa()
## Not run: my_engine <- function(transitions, ...) { # ... compute and return a list with obs, exp, prob, adj_res, p } register_lsa_engine("my_engine", my_engine, "Custom LSA variant") ## End(Not run)## Not run: my_engine <- function(transitions, ...) { # ... compute and return a list with obs, exp, prob, adj_res, p } register_lsa_engine("my_engine", my_engine, "Custom LSA variant") ## End(Not run)
Estimates the reliability of an LSA network by repeated random split-half resampling of sequences. Each replicate draws two disjoint halves of the sequences without replacement, refits the engine on each half, and computes the correlation between the two half-network edge-weight vectors. Returns the distribution of replicate correlations plus a point summary.
reliability_lsa(fit, ...) ## S3 method for class 'lsa' reliability_lsa( fit, R = 100L, weights = c("prob", "count", "adj_res"), method = c("pearson", "spearman"), parallel = FALSE, n_cores = NULL, verbose = FALSE, ... ) ## S3 method for class 'lsa_group' reliability_lsa(fit, ...)reliability_lsa(fit, ...) ## S3 method for class 'lsa' reliability_lsa( fit, R = 100L, weights = c("prob", "count", "adj_res"), method = c("pearson", "spearman"), parallel = FALSE, n_cores = NULL, verbose = FALSE, ... ) ## S3 method for class 'lsa_group' reliability_lsa(fit, ...)
fit |
An |
... |
Reserved. |
R |
Integer. Number of split-half replicates. Default |
weights |
Character. Which edge matrix to correlate across
halves: |
method |
Character. Correlation method: |
parallel |
Logical. Use multi-core resampling. Default |
n_cores |
Integer. Worker count when |
verbose |
Logical. Print progress every 100 replicates. |
An object of class c("lsa_reliability", "list") with:
Numeric vector of length R: the split-half
correlation of each replicate.
Mean and standard deviation of the finite replicate correlations.
Empirical 2.5% and 97.5% quantiles.
Recipe metadata.
Reference to the original fit.
Epskamp, S., Borsboom, D., & Fried, E. I. (2018). Estimating psychological networks and their accuracy: A tutorial paper. Behavior Research Methods, 50(1), 195-212.
For a grouped fit (lsa_group), reliability is estimated separately
within each group and the per-group lsa_reliability objects are
returned in an lsa_reliability_group container with its own print
method.
bootstrap_lsa(), stability_lsa(), permute_lsa()
fit <- lsa(engagement, engine = "classical") rel <- reliability_lsa(fit, R = 50) relfit <- lsa(engagement, engine = "classical") rel <- reliability_lsa(fit, R = 50) rel
Resamples the underlying sequence data without replacement at
a specified retention proportion, refits the engine on each
subsample, and records which edges remain significant at the
recipe's alpha threshold. Returns a per-edge "stability"
proportion: the fraction of subsamples in which the edge was
significant. Edges with stability >= min_stable (default
0.95) are flagged as robust.
stability_lsa( fit, R = 500L, proportion = 0.8, min_stable = 0.95, parallel = FALSE, n_cores = NULL, verbose = FALSE, ... )stability_lsa( fit, R = 500L, proportion = 0.8, min_stable = 0.95, parallel = FALSE, n_cores = NULL, verbose = FALSE, ... )
fit |
An |
R |
Integer. Number of subsamples. Default |
proportion |
Numeric in (0, 1). Fraction of cases retained
per subsample. Default |
min_stable |
Numeric in (0, 1). Stability threshold for
the |
parallel |
Logical. Use multi-core resampling. Default
|
n_cores |
Integer. Worker count when |
verbose |
Logical. Print progress every 100 replicates. |
... |
Reserved. |
An object of class c("lsa_stability", "list") with:
Tidy per-edge data frame with observed_sig
(whether the cell was significant in the original fit),
stability (fraction of subsamples in which the cell was
significant), and stable (stability >= min_stable).
R x K^2 0/1 matrix recording per-cell
significance across replicates.
Recipe metadata.
Reference to the original fit.
bootstrap_lsa(), permute_lsa()
fit <- lsa(engagement, engine = "classical") st <- stability_lsa(fit, R = 100) head(as.data.frame(st))fit <- lsa(engagement, engine = "classical") st <- stability_lsa(fit, R = 100) head(as.data.frame(st))
The global tests of independence (likelihood-ratio G^2 and Pearson
chi-square) as a one-row-per-test data.frame.
tests(fit) ## S3 method for class 'lsa' tests(fit) ## S3 method for class 'lsa_group' tests(fit)tests(fit) ## S3 method for class 'lsa' tests(fit) ## S3 method for class 'lsa_group' tests(fit)
fit |
An |
A data.frame with columns test ("lrx2" / "x2"),
statistic, df, p. Tests the engine did not compute are
omitted.
tests(lsa(group_regulation))tests(lsa(group_regulation))
The canonical way to read a fit's transitions as a tidy
one-row-per-transition data.frame. transitions(fit) returns every
transition; the arguments narrow it.
transitions( fit, significant = FALSE, direction = c("any", "over", "under"), min_count = NULL, alpha = NULL ) ## S3 method for class 'lsa' transitions( fit, significant = FALSE, direction = c("any", "over", "under"), min_count = NULL, alpha = NULL ) ## S3 method for class 'lsa_group' transitions( fit, significant = FALSE, direction = c("any", "over", "under"), min_count = NULL, alpha = NULL )transitions( fit, significant = FALSE, direction = c("any", "over", "under"), min_count = NULL, alpha = NULL ) ## S3 method for class 'lsa' transitions( fit, significant = FALSE, direction = c("any", "over", "under"), min_count = NULL, alpha = NULL ) ## S3 method for class 'lsa_group' transitions( fit, significant = FALSE, direction = c("any", "over", "under"), min_count = NULL, alpha = NULL )
fit |
An |
significant |
Logical. Keep only transitions whose adjusted-
residual p-value is below |
direction |
One of |
min_count |
Optional integer. Keep only transitions observed at
least this many times. Default |
alpha |
Significance threshold. Default |
A data.frame, one row per transition, with columns from,
to (the source and target state names), lag, count,
expected, prob (row-conditional), prob_col (column-
conditional), adj_res, p, yules_q, kappa, kappa_z,
kappa_p, lift, sign, significant. Engines that compute extra
per-cell statistics append them as further columns (e.g. the two-cell
engine adds odds_ratio, log_or, log_or_se). A grouped fit gains a
leading group column. Row names are reset.
fit <- lsa(group_regulation) transitions(fit) # all transitions transitions(fit, significant = TRUE) # significant ones transitions(fit, direction = "over") # over-represented transitions(fit, min_count = 500) # frequently observedfit <- lsa(group_regulation) transitions(fit) # all transitions transitions(fit, significant = TRUE) # significant ones transitions(fit, direction = "over") # over-represented transitions(fit, min_count = 500) # frequently observed
Remove a Registered LSA Engine
unregister_lsa_engine(name)unregister_lsa_engine(name)
name |
Character scalar. The engine's identifier. |
Invisibly NULL.