| Title: | Co-Occurrence Network Construction and Manipulation |
|---|---|
| Description: | Constructs co-occurrence networks from several types of input data, such as delimited fields, long/bipartite tables, binary matrices, or wide sequences. Returns tidy edge data frames and supports optional scaling, splitting into several networks, thresholding, and subsetting. Provides eight similarity measures, including Jaccard, cosine, and association strength. Supports export to several network and file formats. Network construction and analysis methods follow Saqr, Lopez-Pernas, Conde, and Hernandez-Garcia (2024, <doi:10.1007/978-3-031-54464-4_15>). |
| Authors: | Mohammed Saqr [aut, cre, cph], Sonsoles López-Pernas [aut, cph], Kamila Misiejuk [aut, cph] |
| Maintainer: | Mohammed Saqr <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.2 |
| Built: | 2026-06-09 05:57:57 UTC |
| Source: | https://github.com/mohsaqr/cooccure |
Long-format table mapping each of the 624 actors in actors
to every genre of every movie they appeared in. Use this to build an
actor co-occurrence network grouped by genre: which actors share the
same genres? Pass field = "actor" and by = "genre" to
cooccurrence.
actor_genres
A data frame with 2,502 rows and 2 variables:
actor: Actor name.
genre: Genre label (one row per actor-genre combination).
https://developer.imdb.com/non-commercial-datasets/
head(actor_genres)
cooccurrence(actor_genres, field = "actor", by = "genre", similarity = "jaccard")
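To make the grouping idea concrete, the following base R sketch counts how many genres each pair of actors shares in a toy long-format table. The toy data and the names inc and co_counts are illustrative assumptions; this shows the general notion of co-occurrence within a by group and is not necessarily how cooccurrence() computes it internally.

```r
# Toy long-format table (hypothetical data, same shape as actor_genres)
long <- data.frame(
  actor = c("A", "A", "B", "B", "C"),
  genre = c("Drama", "Crime", "Drama", "Comedy", "Drama"),
  stringsAsFactors = FALSE
)

# Incidence matrix: rows = groups (genres), columns = entities (actors)
inc <- unclass(table(long$genre, long$actor))
inc[inc > 0] <- 1  # presence/absence rather than repeated counts

# Raw co-occurrence counts: number of genres shared by each pair of actors
co_counts <- crossprod(inc)
diag(co_counts) <- 0  # drop self-pairs
co_counts
```

Each off-diagonal cell is the number of groups in which both entities appear, which is the raw count that a similarity measure such as Jaccard would then normalize.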
Long-format bipartite table linking actors to movies in
movies. Pre-filtered to the 624 actors who appear in at
least two movies, so all similarity measures compute instantly.
Pass field = "actor" and by = "tconst" to
cooccurrence to build an actor co-appearance network.
actors
A data frame with 1,267 rows and 7 variables:
Actor name.
IMDB title identifier linking to movies.
Movie title.
Release year (integer).
Release decade as a character string.
Comma-separated genre labels for the linked movie.
IMDB average user rating for the linked movie.
https://developer.imdb.com/non-commercial-datasets/
head(actors)
cooccurrence(actors, field = "actor", by = "tconst", similarity = "jaccard")
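The pre-filtering described above (keeping only actors who appear in at least two movies) can be sketched in base R as follows. The toy table and its tconst values are hypothetical, and the code only illustrates the kind of filter involved, not how the shipped dataset was actually prepared.

```r
# Toy actor-movie table (hypothetical identifiers)
appearances <- data.frame(
  actor  = c("A", "A", "B", "C", "C", "C"),
  tconst = c("tt1", "tt2", "tt1", "tt1", "tt2", "tt3"),
  stringsAsFactors = FALSE
)

# Number of distinct movies per actor
n_movies <- tapply(appearances$tconst, appearances$actor,
                   function(x) length(unique(x)))

# Keep only actors with at least two movies
keep <- names(n_movies)[n_movies >= 2]
filtered <- appearances[appearances$actor %in% keep, ]
filtered
```

Actors with a single appearance cannot contribute repeated co-appearances, so removing them shrinks the network without changing the strongest ties.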
Creates a cograph_network object from a cooccurrence
edge list, compatible with cograph::splot() and other cograph
functions.
as_cograph(x, ...)

## S3 method for class 'cooccurrence'
as_cograph(x, ...)
| x | A cooccurrence object. |
|---|---|
| ... | Ignored. |
A cograph_network object.
res <- cooccurrence(list(c("A","B","C"), c("B","C"), c("A","C")))
if (requireNamespace("cograph", quietly = TRUE)) {
  net <- as_cograph(res)
  net$n_nodes
}
Creates an undirected, weighted igraph graph from a
cooccurrence edge list.
as_igraph(x, ...)

## S3 method for class 'cooccurrence'
as_igraph(x, ...)
| x | A cooccurrence object. |
|---|---|
| ... | Passed to |
An igraph object.
res <- cooccurrence(list(c("A","B","C"), c("B","C"), c("A","C")))
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- as_igraph(res)
  igraph::vcount(g)
}
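For readers who want to see what an undirected, weighted igraph graph built from an edge list looks like, here is a minimal sketch using igraph::graph_from_data_frame() directly. The hand-written edges data frame mirrors the raw pair counts of the three transactions in the example; whether as_igraph() uses this exact call internally is an assumption.

```r
# Hand-written edge list with from/to/weight columns (illustrative)
edges <- data.frame(
  from   = c("A", "A", "B"),
  to     = c("B", "C", "C"),
  weight = c(1, 2, 2),
  stringsAsFactors = FALSE
)

if (requireNamespace("igraph", quietly = TRUE)) {
  # A column named "weight" becomes the edge weight attribute
  g <- igraph::graph_from_data_frame(edges, directed = FALSE)
  n_nodes <- igraph::vcount(g)
  weighted <- igraph::is_weighted(g)
  print(igraph::E(g)$weight)
}
```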
Returns the full square co-occurrence matrix (normalized + scaled).
Use type = "raw" for the raw count matrix.
as_matrix(x, ...)

## S3 method for class 'cooccurrence'
as_matrix(x, type = c("normalized", "raw"), ...)
| x | A cooccurrence object. |
|---|---|
| ... | Ignored. |
| type | Character. |
A numeric matrix.
res <- cooccurrence(list(c("A","B","C"), c("B","C"), c("A","C")))
as_matrix(res)
as_matrix(res, type = "raw")
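The relationship between an edge list and its square matrix can be illustrated in base R. This is a sketch on a hand-written edges data frame (an assumption, not package output); as_matrix() itself reads the matrix stored with the cooccurrence object, so it may differ in details such as the diagonal.

```r
# Hand-written undirected edge list (illustrative)
edges <- data.frame(
  from   = c("A", "A", "B"),
  to     = c("B", "C", "C"),
  weight = c(1, 2, 2),
  stringsAsFactors = FALSE
)

# Empty square matrix with one row/column per node
nodes <- sort(unique(c(edges$from, edges$to)))
m <- matrix(0, nrow = length(nodes), ncol = length(nodes),
            dimnames = list(nodes, nodes))

# Fill both triangles, since the network is undirected
m[cbind(edges$from, edges$to)] <- edges$weight
m[cbind(edges$to, edges$from)] <- edges$weight
m
```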
Creates a netobject from a cooccurrence edge list,
compatible with Nestimate::centrality(),
Nestimate::bootstrap_network(), etc.
as_netobject(x, ...)

## S3 method for class 'cooccurrence'
as_netobject(x, ...)
| x | A cooccurrence object. |
|---|---|
| ... | Ignored. |
A netobject with class c("netobject", "cograph_network").
res <- cooccurrence(list(c("A","B","C"), c("B","C"), c("A","C")))
if (requireNamespace("Nestimate", quietly = TRUE)) {
  net <- as_netobject(res)
  net$n_nodes
}
Creates a tbl_graph from a cooccurrence edge list.
as_tidygraph(x, ...)

## S3 method for class 'cooccurrence'
as_tidygraph(x, ...)
| x | A cooccurrence object. |
|---|---|
| ... | Ignored. |
A tbl_graph object.
res <- cooccurrence(list(c("A","B","C"), c("B","C"), c("A","C")))
if (requireNamespace("tidygraph", quietly = TRUE) &&
    requireNamespace("igraph", quietly = TRUE)) {
  as_tidygraph(res)
}
Constructs an undirected co-occurrence network from various input formats and returns a tidy edge data frame. Argument names follow the citenets convention.
cooccurrence(
  data,
  field = NULL,
  by = NULL,
  sep = NULL,
  weight_by = NULL,
  split_by = NULL,
  aggregate_by = NULL,
  aggregate = c("sum", "mean", "min", "max"),
  window = NULL,
  similarity = c("none", "jaccard", "cosine", "inclusion", "association",
                 "dice", "equivalence", "relative"),
  counting = c("full", "fractional", "attention"),
  lambda = 1,
  scale = NULL,
  threshold = 0,
  min_occur = 1L,
  top_n = NULL,
  output = c("default", "gephi", "igraph", "cograph", "matrix"),
  ...
)

co(
  data,
  field = NULL,
  by = NULL,
  sep = NULL,
  weight_by = NULL,
  split_by = NULL,
  aggregate_by = NULL,
  aggregate = c("sum", "mean", "min", "max"),
  window = NULL,
  similarity = c("none", "jaccard", "cosine", "inclusion", "association",
                 "dice", "equivalence", "relative"),
  counting = c("full", "fractional", "attention"),
  lambda = 1,
  scale = NULL,
  threshold = 0,
  min_occur = 1L,
  top_n = NULL,
  output = c("default", "gephi", "igraph", "cograph", "matrix"),
  ...
)
| data | Input data. Accepts: |
|---|---|
| field | Character. The entity column, which determines what the nodes are. For delimited format, a single column split by |
| by | Character or |
| sep | Character or |
| weight_by | Character or |
| split_by | Character or |
| aggregate_by | Character or |
| aggregate | Character. How to combine edge weights across groups when |
| window | Integer or |
| similarity | Character. Similarity measure: |
| counting | Character. Counting method: |
| lambda | Numeric. Decay rate for |
| scale | Character or |
| threshold | Numeric. Minimum edge weight to retain. Applied after similarity and scaling. Default 0. |
| min_occur | Integer. Minimum entity frequency. Entities appearing in fewer than |
| top_n | Integer or |
| output | Character. Column naming convention for the output: |
| ... | Currently unused. |
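As a rough illustration of the filtering arguments, the base R sketch below applies a minimum-weight threshold and then keeps the strongest edges of a toy edge list. The toy data are hypothetical, and two points are assumptions rather than documented behavior: that threshold keeps edges with weight greater than or equal to the cutoff, and that top_n keeps the n heaviest edges.

```r
# Toy edge list (hypothetical weights)
edges <- data.frame(
  from   = c("A", "A", "B", "B"),
  to     = c("B", "C", "C", "D"),
  weight = c(0.2, 0.9, 0.5, 0.1),
  stringsAsFactors = FALSE
)

threshold <- 0.15  # assumed semantics: keep weight >= threshold
top_n <- 2         # assumed semantics: keep the n heaviest edges

kept <- edges[edges$weight >= threshold, ]
kept <- kept[order(kept$weight, decreasing = TRUE), ]
kept <- head(kept, top_n)
kept
```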
Depends on output:
"default": A cooccurrence data frame with columns from, to, weight, count (and group when split_by is used).
"gephi": A data frame with columns Source, Target, Weight, Type, Count. Ready for Gephi CSV import.
"igraph": An igraph graph object.
"cograph": A cograph_network object.
"matrix": A square numeric co-occurrence matrix.
For the data frame outputs, rows are sorted by weight in descending order, and attributes store the full matrix, item frequencies, and parameters.
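The Gephi-style column layout can be reproduced by hand to see what ends up in the CSV file. The sketch below renames a toy default-style edge list and writes it to a temporary file; the toy data are hypothetical, and the Type value "Undirected" is an assumption based on the network being undirected.

```r
# Toy edge list in the default column layout (hypothetical values)
edges <- data.frame(
  from   = c("A", "A", "B"),
  to     = c("B", "C", "C"),
  weight = c(0.33, 0.67, 0.67),
  count  = c(1, 2, 2),
  stringsAsFactors = FALSE
)

# Same information under Gephi's expected column names
gephi <- data.frame(
  Source = edges$from,
  Target = edges$to,
  Weight = edges$weight,
  Type   = "Undirected",  # assumption: undirected network
  Count  = edges$count,
  stringsAsFactors = FALSE
)

# Write to a temporary CSV file that Gephi's spreadsheet import can read
path <- tempfile(fileext = ".csv")
write.csv(gephi, path, row.names = FALSE)
```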
van Eck, N. J., & Waltman, L. (2009). How to normalize co-occurrence data? An analysis of some well-known similarity measures. Journal of the American Society for Information Science and Technology, 60(8), 1635–1651.
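Three of the measures discussed by van Eck and Waltman (2009) can be written directly in terms of the co-occurrence count c_ij and the item frequencies s_i and s_j: Jaccard is c_ij / (s_i + s_j - c_ij), cosine is c_ij / sqrt(s_i * s_j), and association strength is c_ij / (s_i * s_j). The base R sketch below evaluates them on the three-transaction example; the names inc, cmat, and s are illustrative, and the package may apply additional scaling or conventions that this sketch does not reproduce.

```r
transactions <- list(c("A", "B", "C"), c("B", "C"), c("A", "C"))
items <- sort(unique(unlist(transactions)))

# Binary incidence matrix: rows = transactions, columns = items
inc <- t(vapply(transactions,
                function(tr) as.numeric(items %in% tr),
                numeric(length(items))))
colnames(inc) <- items

cmat <- crossprod(inc)  # c_ij off the diagonal, s_i on the diagonal
s <- diag(cmat)         # item frequencies

jaccard <- cmat / (outer(s, s, "+") - cmat)
cosine  <- cmat / sqrt(outer(s, s))
assoc   <- cmat / outer(s, s)

jaccard["A", "B"]
cosine["A", "B"]
assoc["A", "B"]
```

Only the off-diagonal cells are meaningful as edge weights; the diagonal simply reflects each item compared with itself.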
# Delimited keywords
df <- data.frame(
  id = 1:4,
  keywords = c("network; graph", "graph; matrix; network",
               "matrix; algebra", "network; algebra; graph")
)
cooccurrence(df, field = "keywords", sep = ";")

# Split by a grouping variable
df$year <- c(2020, 2020, 2021, 2021)
cooccurrence(df, field = "keywords", sep = ";", split_by = "year")

# List of transactions with Jaccard similarity
cooccurrence(list(c("A","B","C"), c("B","C"), c("A","C")),
             similarity = "jaccard")

# Short alias
co(df, field = "keywords", sep = ";", similarity = "cosine")

# Windowed co-occurrence on a categorical time series. With
# window = 2 only adjacent states co-occur; window = 3 also pairs
# states two positions apart, etc.
seqs <- list(
  c("focus", "focus", "distract", "focus", "confused"),
  c("focus", "distract", "distract", "focus")
)
cooccurrence(seqs, window = 2)

# Weighted long format (e.g. LDA topic-document probabilities)
theta <- data.frame(
  doc = c("d1","d1","d1","d2","d2","d3","d3"),
  topic = c("T1","T2","T3","T1","T3","T2","T3"),
  prob = c(0.6, 0.3, 0.1, 0.4, 0.6, 0.5, 0.5)
)
cooccurrence(theta, field = "topic", by = "doc", weight_by = "prob")
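The window semantics described in the example comment can be made explicit with a small base R helper. window_pairs() is a hypothetical function written for this illustration; it simply lists the positional pairs that fall inside a window and makes no claim about how the package treats repeated states, pair direction, or weighting.

```r
# List every pair of positions at most (window - 1) steps apart
window_pairs <- function(s, window = 2L) {
  stopifnot(window >= 2L)
  n <- length(s)
  lags <- seq_len(min(window - 1L, n - 1L))
  do.call(rbind, lapply(lags, function(k) {
    idx <- seq_len(n - k)
    data.frame(a = s[idx], b = s[idx + k], lag = k,
               stringsAsFactors = FALSE)
  }))
}

s <- c("focus", "focus", "distract", "focus", "confused")
window_pairs(s, window = 2)  # adjacent states only
window_pairs(s, window = 3)  # also states two positions apart
```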
A small hand-crafted dataset of 30 well-known actors across 10 classic
films with genre labels. Designed for quick exploration in the Shiny app.
Use field = "actor" with by = "movie" or by = "genre".
demo
A data frame with 34 rows and 3 variables:
movie: Movie title.
actor: Actor name.
genre: Primary genre label.
head(demo)
cooccurrence(demo, field = "actor", by = "movie", similarity = "jaccard")
Opens an interactive Shiny application for building and exploring co-occurrence networks. Requires the shiny and DT packages.
launch_app(...)
... |
Passed to |
Called for its side effect (launches the app). No return value.
if (interactive()) {
  launch_app()
}
A sample of 1,000 highly-rated IMDB movies (rating >= 7.0, >= 1,000 votes)
released between 1970 and 2024. The genres column is comma-delimited
and suitable for use as the field argument to cooccurrence.
movies
A data frame with 1,000 rows and 7 variables:
IMDB title identifier (e.g. "tt0068646").
Movie title.
Release year (integer).
Comma-separated genre labels (e.g. "Crime,Drama").
Release decade as a character string (e.g. "1970s").
IMDB average user rating.
Number of IMDB user votes.
https://developer.imdb.com/non-commercial-datasets/
head(movies)
cooccurrence(movies, field = "genres", sep = ",", similarity = "jaccard")
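To show what splitting a delimited field involves, the base R sketch below turns a few comma-separated genre strings into pairwise co-occurrence counts. The three strings are made-up examples in the same format as the genres column, and the approach (strsplit plus combn) is an illustration rather than the package's actual implementation.

```r
# Made-up values in the same comma-delimited format as movies$genres
genres <- c("Crime,Drama", "Drama", "Crime,Drama,Thriller")

# Split each string into a set of trimmed, unique labels
items <- lapply(strsplit(genres, ",", fixed = TRUE),
                function(x) unique(trimws(x)))

# All unordered pairs within each movie; single-genre movies yield none
pairs <- do.call(rbind, lapply(items, function(x) {
  if (length(x) < 2) return(NULL)
  t(combn(sort(x), 2))
}))

# Count how often each pair occurs across movies
edge_counts <- as.data.frame(table(from = pairs[, 1], to = pairs[, 2]),
                             stringsAsFactors = FALSE)
edge_counts <- edge_counts[edge_counts$Freq > 0, ]
edge_counts
```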
Plots the co-occurrence matrix as a heatmap. If igraph is available, plots a network graph instead.
## S3 method for class 'cooccurrence'
plot(x, type = c("heatmap", "network"), ...)
| x | A cooccurrence object. |
|---|---|
| type | Character. |
| ... | Passed to the plotting function. |
Invisibly returns x.
res <- cooccurrence(list(c("A","B","C"), c("B","C"), c("A","C")))
plot(res)
Print a cooccurrence edge list
## S3 method for class 'cooccurrence'
print(x, n = 10L, ...)
| x | A cooccurrence object. |
|---|---|
| n | Integer. Number of rows to show. Default 10. |
| ... | Ignored. |
Invisibly returns x.
res <- cooccurrence(list(c("A","B","C"), c("B","C"), c("A","C")))
print(res)
Summarise a cooccurrence network
## S3 method for class 'cooccurrence'
summary(object, ...)
| object | A cooccurrence object. |
|---|---|
| ... | Ignored. |
Invisibly returns object.
res <- cooccurrence(list(c("A","B","C"), c("B","C"), c("A","C")))
summary(res)