cooccurrence() gains a window = parameter for sliding-window
co-occurrence on ordered sequence input (lists of vectors and wide
TraMineR-style data frames via field = "all"). Each window of
w consecutive positions becomes one transaction; states inside
the same window co-occur. window = sequence_length reduces to
the existing bag-of-states behaviour. TraMineR void markers (NA,
"", "%", "*", "NaN") are dropped before windowing. Pure
base R via embed().cooccurrence() gains aggregate_by = and aggregate =
parameters. aggregate_by groups the data by a column (e.g.
journal id), computes the per-group co-occurrence network with
whatever similarity / counting / scale / window was
chosen, then combines the per-group edge weights into ONE final
network. aggregate = c("sum", "mean", "min", "max") controls
the combiner; count is always summed. Differs from split_by,
which keeps groups separate. Cannot be combined with split_by.
threshold and top_n apply after aggregation.counting = "attention" adds positional gap-decay weighting:
each pair within an ordered transaction contributes
exp(-|pos_i - pos_j| / lambda) to the edge. Closer positions
give a stronger edge, distant pairs decay. Matches the
tna::build_model(type = "attention") semantics, undirected.
The new lambda = parameter (default 1.0) controls the decay
rate.cooccur (GitHub-only
development name) to cooccure to avoid collision with an unrelated
archived CRAN package of the same name.Matrix::sparseMatrix /
Matrix::crossprod. The previous implementation allocated dense
n x k and k x k matrices, which hit R's vector memory limit on
corpora with many documents and items (e.g. citation networks
~100k unique references). The sparse rewrite stays in triplet form end-to-end and scales linearly with the number of non-zero co-occurrences.
attr(x, "matrix") and attr(x, "raw_matrix") are now sparse Matrix
objects. as_matrix() densifies them on demand so existing downstream
code keeps working.Matrix to Imports..co_parse_delimited, .co_parse_multi_delimited)
vectorised: trimws(), NA/empty filtering, and per-row deduplication
now run as single C-level calls over the flattened token vector rather
than as per-row R calls. Cuts overall runtime on a 166k-row x
20-items-per-row citation corpus from ~6.2 s to ~3.4 s (~1.8x faster).cooccur on GitHub).cooccurrence() (alias co()) builds co-occurrence networks from six
input formats: delimited fields, multi-column delimited, long/bipartite,
binary matrices, wide sequences (field = "all"), and lists of
character vectors.none, jaccard, cosine, inclusion,
association, dice, equivalence, relative.minmax, log, log10, binary, zscore,
sqrt, proportion, none.counting = "fractional" implements the Perianes-Rodriguez et al.
(2016) weighted pair count.weight_by parameter supports weighted long-format input (e.g. LDA
topic-document probabilities).split_by computes a separate network per group and returns a combined
edge data frame.output argument returns edges in default, Gephi, igraph, cograph,
or matrix form.as_matrix(), as_igraph(), as_tidygraph(),
as_cograph(), as_netobject().launch_app().