Package 'SCORPIUS' reference manual

Title:	Inferring Developmental Chronologies from Single-Cell RNA Sequencing Data
Description:	An accurate and easy tool for performing linear trajectory inference on single cells using single-cell RNA sequencing data. In addition, 'SCORPIUS' provides functions for discovering the most important genes with respect to the reconstructed trajectory, as well as nice visualisation tools. Cannoodt et al. (2016) <doi:10.1101/079509>.
Authors:	Robrecht Cannoodt [aut, cre] (<https://orcid.org/0000-0003-3641-729X>, rcannood), Wouter Saelens [ctb] (<https://orcid.org/0000-0002-7114-6248>, zouter)
Maintainer:	Robrecht Cannoodt <[email protected]>
License:	GPL-3
Version:	1.0.9
Built:	2025-03-18 06:22:43 UTC
Source:	https://github.com/rcannood/scorpius

SCORPIUS: Trajectory inference from single-cell RNA sequencing data.

Description

SCORPIUS orders single cells with regard to an implicit timeline, such as cellular development or progression over time.

References

Cannoodt R. et al., SCORPIUS improves trajectory inference and identifies novel modules in dendritic cell development, bioRxiv (Oct., 2016). doi:10.1101/079509 (PDF).

Examples

## Load dataset from Schlitzer et al., 2015
data("ginhoux")

## Reduce dimensionality and infer trajectory with SCORPIUS
space <- reduce_dimensionality(ginhoux$expression, "spearman")
traj <- infer_trajectory(space)

## Visualise
draw_trajectory_plot(
  space,
  path = traj$path,
  progression_group = ginhoux$sample_info$group_name
)
## Load dataset from Schlitzer et al., 2015
data("ginhoux")

## Reduce dimensionality and infer trajectory with SCORPIUS
space <- reduce_dimensionality(ginhoux$expression, "spearman")
traj <- infer_trajectory(space)

## Visualise
draw_trajectory_plot(
  space,
  path = traj$path,
  progression_group = ginhoux$sample_info$group_name
)

Draw time-series heatmap

Description

draw_trajectory_heatmap draws a heatmap in which the samples are ranked according their position in an inferred trajectory. In addition, the progression groups and feature modules can be passed along to further enhance the visualisation.

Usage

draw_trajectory_heatmap(
  x,
  time,
  progression_group = NULL,
  modules = NULL,
  show_labels_row = FALSE,
  show_labels_col = FALSE,
  scale_features = TRUE,
  progression_group_palette = NULL,
  ...
)
draw_trajectory_heatmap(
  x,
  time,
  progression_group = NULL,
  modules = NULL,
  show_labels_row = FALSE,
  show_labels_col = FALSE,
  scale_features = TRUE,
  progression_group_palette = NULL,
  ...
)

Arguments

`x`	A numeric matrix or a data frame with one row per sample and one column per feature.
`time`	A numeric vector containing the inferred time points of each sample along a trajectory.
`progression_group`	`NULL` or a vector (or factor) containing the groupings of the samples (default `NULL`).
`modules`	`NULL` or a data frame as returned by `extract_modules`.
`show_labels_row`	`TRUE` if the labels of the rows are to be plotted (default `FALSE`).
`show_labels_col`	`TRUE` if the labels of the cols are to be plotted (default `FALSE`).
`scale_features`	`TRUE` if the values of each feature is to be scaled (default `TRUE`).
`progression_group_palette`	A named vector palette for the progression group.
`...`	Optional arguments to `pheatmap`

Value

The output of the pheatmap function.

Examples


## Generate a dataset
dataset <- generate_dataset(num_genes=500, num_samples=300, num_groups=4)
expression <- dataset$expression
space <- reduce_dimensionality(expression, ndim=2)
groups <- dataset$sample_info$group_name
traj <- infer_trajectory(space)
time <- traj$time

gimp <- gene_importances(expression, traj$time, num_permutations = 0, ntree = 10000)
gene_sel <- gimp[1:50,]
expr_sel <- expression[,gene_sel$gene]

## Draw a time series heatmap
draw_trajectory_heatmap(expr_sel, time)

## Also show the progression groupings
draw_trajectory_heatmap(expr_sel, time, progression_group=groups)

## Use a different palette
draw_trajectory_heatmap(
  expr_sel, time, progression_group=groups,
  progression_group_palette = setNames(RColorBrewer::brewer.pal(4, "Set2"), paste0("Group ", 1:4))
)

## Group the genes into modules and visualise the modules in a heatmap
modules <- extract_modules(scale_quantile(expr_sel))
draw_trajectory_heatmap(expr_sel, time, progression_group=groups, modules=modules)

## Generate a dataset
dataset <- generate_dataset(num_genes=500, num_samples=300, num_groups=4)
expression <- dataset$expression
space <- reduce_dimensionality(expression, ndim=2)
groups <- dataset$sample_info$group_name
traj <- infer_trajectory(space)
time <- traj$time

gimp <- gene_importances(expression, traj$time, num_permutations = 0, ntree = 10000)
gene_sel <- gimp[1:50,]
expr_sel <- expression[,gene_sel$gene]

## Draw a time series heatmap
draw_trajectory_heatmap(expr_sel, time)

## Also show the progression groupings
draw_trajectory_heatmap(expr_sel, time, progression_group=groups)

## Use a different palette
draw_trajectory_heatmap(
  expr_sel, time, progression_group=groups,
  progression_group_palette = setNames(RColorBrewer::brewer.pal(4, "Set2"), paste0("Group ", 1:4))
)

## Group the genes into modules and visualise the modules in a heatmap
modules <- extract_modules(scale_quantile(expr_sel))
draw_trajectory_heatmap(expr_sel, time, progression_group=groups, modules=modules)

Visualise SCORPIUS

Description

draw_trajectory_plot is used to plot samples after performing dimensionality reduction. Additional arguments can be provided to colour the samples, plot the trajectory inferred by SCORPIUS, and draw a contour around the samples.

Usage

draw_trajectory_plot(
  space,
  progression_group = NULL,
  path = NULL,
  contour = FALSE,
  progression_group_palette = NULL,
  point_size = 2,
  point_alpha = 1,
  path_size = 0.5,
  path_alpha = 1,
  contour_alpha = 0.2
)
draw_trajectory_plot(
  space,
  progression_group = NULL,
  path = NULL,
  contour = FALSE,
  progression_group_palette = NULL,
  point_size = 2,
  point_alpha = 1,
  path_size = 0.5,
  path_alpha = 1,
  contour_alpha = 0.2
)

Arguments

`space`	A numeric matrix or a data frame containing the coordinates of samples.
`progression_group`	`NULL` or a vector (or factor) containing the groupings of the samples (default `NULL`).
`path`	A numeric matrix or a data frame containing the coordinates of the inferred path.
`contour`	`TRUE` if contours are to be drawn around the samples.
`progression_group_palette`	A named vector palette for the progression group.
`point_size`	The size of the points.
`point_alpha`	The alpha of the points.
`path_size`	The size of the path (if any).
`path_alpha`	The alpha of the path (if any).
`contour_alpha`	The alpha of the contour (if any).

Value

A ggplot2 plot.

Examples

## Generate a synthetic dataset
dataset <- generate_dataset(num_genes = 500, num_samples = 300, num_groups = 4)
space <- reduce_dimensionality(dataset$expression, ndim = 2)
groups <- dataset$sample_info$group_name

## Simply plot the samples
draw_trajectory_plot(space)

## Colour each sample according to its group
draw_trajectory_plot(space, progression_group = groups)

## Add contours to the plot
draw_trajectory_plot(space, progression_group = groups, contour = TRUE)

## Plot contours without colours
draw_trajectory_plot(space, contour = TRUE)

## Infer a trajectory and plot it
traj <- infer_trajectory(space)
draw_trajectory_plot(space, progression_group = groups, path = traj$path)
draw_trajectory_plot(space, progression_group = groups, path = traj$path, contour = TRUE)

## Visualise gene expression
draw_trajectory_plot(space, progression_group = dataset$expression[,1])
## Generate a synthetic dataset
dataset <- generate_dataset(num_genes = 500, num_samples = 300, num_groups = 4)
space <- reduce_dimensionality(dataset$expression, ndim = 2)
groups <- dataset$sample_info$group_name

## Simply plot the samples
draw_trajectory_plot(space)

## Colour each sample according to its group
draw_trajectory_plot(space, progression_group = groups)

## Add contours to the plot
draw_trajectory_plot(space, progression_group = groups, contour = TRUE)

## Plot contours without colours
draw_trajectory_plot(space, contour = TRUE)

## Infer a trajectory and plot it
traj <- infer_trajectory(space)
draw_trajectory_plot(space, progression_group = groups, path = traj$path)
draw_trajectory_plot(space, progression_group = groups, path = traj$path, contour = TRUE)

## Visualise gene expression
draw_trajectory_plot(space, progression_group = dataset$expression[,1])

Extract modules of features

Description

extract_modules uses adaptive branch pruning to extract modules of features, which is typically done on the smoothed expression returned by gene_importances.

Usage

extract_modules(
  x,
  time = NULL,
  suppress_warnings = FALSE,
  verbose = FALSE,
  ...
)
extract_modules(
  x,
  time = NULL,
  suppress_warnings = FALSE,
  verbose = FALSE,
  ...
)

Arguments

`x`	A numeric matrix or a data frame with M rows (one per sample) and P columns (one per feature).
`time`	(Optional) Order the modules according to a pseudotime
`suppress_warnings`	Whether or not to suppress warnings when P > 1000
`verbose`	Whether or not Mclust will print output or not
`...`	Extra parameters passed to `Mclust`

Value

A data frame containing meta-data for the features in x, namely the order in which to visualise the features in and which module they belong to.

Examples

## Generate a dataset and visualise
dataset <- generate_dataset(num_genes=300, num_samples=200, num_groups=4)
expression <- dataset$expression
group_name <- dataset$sample_info$group_name
space <- reduce_dimensionality(expression, ndim=2)
traj <- infer_trajectory(space)
time <- traj$time
draw_trajectory_plot(space, path=traj$path, group_name)

## Select most important genes (set ntree to at least 10000!)
gimp <- gene_importances(expression, traj$time, num_permutations = 0, ntree = 1000)
gene_sel <- gimp[1:50,]
expr_sel <- expression[,gene_sel$gene]

## Group the genes into modules and visualise the modules in a heatmap
modules <- extract_modules(scale_quantile(expr_sel))
draw_trajectory_heatmap(expr_sel, time, group_name, modules)
## Generate a dataset and visualise
dataset <- generate_dataset(num_genes=300, num_samples=200, num_groups=4)
expression <- dataset$expression
group_name <- dataset$sample_info$group_name
space <- reduce_dimensionality(expression, ndim=2)
traj <- infer_trajectory(space)
time <- traj$time
draw_trajectory_plot(space, path=traj$path, group_name)

## Select most important genes (set ntree to at least 10000!)
gimp <- gene_importances(expression, traj$time, num_permutations = 0, ntree = 1000)
gene_sel <- gimp[1:50,]
expr_sel <- expression[,gene_sel$gene]

## Group the genes into modules and visualise the modules in a heatmap
modules <- extract_modules(scale_quantile(expr_sel))
draw_trajectory_heatmap(expr_sel, time, group_name, modules)

Calculate the importance of a feature

Description

Calculates the feature importance of each column in x in trying to predict the time ordering.

Usage

gene_importances(
  x,
  time,
  num_permutations = 0,
  ntree = 10000,
  ntree_perm = ntree/10,
  mtry = ncol(x) * 0.01,
  num_threads = 1,
  ...
)
gene_importances(
  x,
  time,
  num_permutations = 0,
  ntree = 10000,
  ntree_perm = ntree/10,
  mtry = ncol(x) * 0.01,
  num_threads = 1,
  ...
)

Arguments

`x`	A numeric matrix or a data frame with M rows (one per sample) and P columns (one per feature).
`time`	A numeric vector containing the inferred time points of each sample along a trajectory as returned by `infer_trajectory`.
`num_permutations`	The number of permutations to test against for calculating the p-values (default: 0).
`ntree`	The number of trees to grow (default: 10000).
`ntree_perm`	The number of trees to grow for each of the permutations (default: ntree / 10).
`mtry`	The number of variables randomly samples at each split (default: 1% of features).
`num_threads`	Number of threads. Default is 1.
`...`	Extra parameters passed to `ranger`.

Value

a data frame containing the importance of each feature for the given time line

Examples

dataset <- generate_dataset(num_genes=500, num_samples=300, num_groups=4)
expression <- dataset$expression
group_name <- dataset$sample_info$group_name
space <- reduce_dimensionality(expression, ndim=2)
traj <- infer_trajectory(space)
# set ntree to at least 1000!
gene_importances(expression, traj$time, num_permutations = 0, ntree = 1000)
dataset <- generate_dataset(num_genes=500, num_samples=300, num_groups=4)
expression <- dataset$expression
group_name <- dataset$sample_info$group_name
space <- reduce_dimensionality(expression, ndim=2)
traj <- infer_trajectory(space)
# set ntree to at least 1000!
gene_importances(expression, traj$time, num_permutations = 0, ntree = 1000)

Generate a synthetic dataset

Description

generate_dataset generates an synthetic dataset which can be used for visualisation purposes.

Usage

generate_dataset(
  num_samples = 400,
  num_genes = 500,
  num_groups = 4
)
generate_dataset(
  num_samples = 400,
  num_genes = 500,
  num_groups = 4
)

Arguments

`num_samples`	The number of samples the dataset will contain.
`num_genes`	The number of genes the dataset will contain.
`num_groups`	The number of groups the samples will be split up in.

Value

A list containing the expression data and the meta data of the samples.

Examples

## Generate a dataset
dataset <- generate_dataset(num_genes = 200, num_samples = 400, num_groups = 4)

## Reduce dimensionality and infer trajectory with SCORPIUS
space <- reduce_dimensionality(dataset$expression, ndim = 2)
traj <- infer_trajectory(space)

## Visualise
draw_trajectory_plot(space, path=traj$path, progression_group=dataset$sample_info$group_name)
## Generate a dataset
dataset <- generate_dataset(num_genes = 200, num_samples = 400, num_groups = 4)

## Reduce dimensionality and infer trajectory with SCORPIUS
space <- reduce_dimensionality(dataset$expression, ndim = 2)
traj <- infer_trajectory(space)

## Visualise
draw_trajectory_plot(space, path=traj$path, progression_group=dataset$sample_info$group_name)

scRNA-seq data of dendritic cell progenitors.

Description

This dataset contains the expression values of the top 2000 most variable genes for 248 dendritic cell progenitors. Each cell is in one of three maturation stages: MDP, CDP or PreDC. The levels of the factor in sample.info are ordered according to the maturation process.

The number of genes had to be reduced specifically for reducing the package size of SCORPIUS. Use the following code to download the original data:

download.file("https://github.com/rcannood/SCORPIUS/raw/master/data-raw/ginhoux_orig.rds", destfile = "local.rds")
ginhoux <- readRDS("local.rds")
# do something with ginhoux

Usage

ginhoux
ginhoux

Format

A list containing two data frames, expression (248x2000) and sample_info (248x1).

Source

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE60783

References

Schlitzer A, Sivakamasundari V, Chen J, Sumatoh HR et al. Identification of cDC1- and cDC2-committed DC progenitors reveals early lineage priming at the common DC progenitor stage in the bone marrow. Nat Immunol 2015 Jul;16(7):718-28. PMID: 26054720

Infer an initial trajectory through space

Description

infer_initial_trajectory infers an initial trajectory for infer_trajectory by clustering the points and calculating the shortest path through cluster centers. The shortest path takes into account the euclidean distance between cluster centers, and the density between those two points.

Usage

infer_initial_trajectory(space, k)
infer_initial_trajectory(space, k)

Arguments

`space`	A numeric matrix or a data frame containing the coordinates of samples.
`k`	The number of clusters

Value

the initial trajectory obtained by this method

Infer linear trajectory through space

Description

infer_trajectory infers a trajectory through samples in a given space in a four-step process:

Perform k-means clustering
Calculate distance matrix between cluster centers using a custom distance function
Find the shortest path connecting all cluster centers using the custom distance matrix
Iteratively fit a curve to the given data using principal curves

Usage

infer_trajectory(
  space,
  k = 4,
  thresh = 0.001,
  maxit = 10,
  stretch = 0,
  smoother = "smooth_spline",
  approx_points = 100
)
infer_trajectory(
  space,
  k = 4,
  thresh = 0.001,
  maxit = 10,
  stretch = 0,
  smoother = "smooth_spline",
  approx_points = 100
)

Arguments

`space`	A numeric matrix or a data frame containing the coordinates of samples.
`k`	The number of clusters to cluster the data into.
`thresh`	convergence threshold on shortest distances to the curve.
`maxit`	maximum number of iterations.
`stretch`	A stretch factor for the endpoints of the curve, allowing the curve to grow to avoid bunching at the end. Must be a numeric value between 0 and 2.
`smoother`	choice of smoother. The default is `"smooth_spline"`, and other choices are `"lowess"` and `"periodic_lowess"`. The latter allows one to fit closed curves. Beware, you may want to use `iter = 0` with `lowess()`.
`approx_points`	Approximate curve after smoothing to reduce computational time. If `FALSE`, no approximation of the curve occurs. Otherwise, `approx_points` must be equal to the number of points the curve gets approximated to; preferably about 100.

Value

A list containing several objects:

path: the trajectory obtained by principal curves.
time: the time point of each sample along the inferred trajectory.

Examples

## Generate an example dataset and visualise it
dataset <- generate_dataset(num_genes = 200, num_samples = 400, num_groups = 4)
space <- reduce_dimensionality(dataset$expression, ndim = 2)
draw_trajectory_plot(space, progression_group = dataset$sample_info$group_name)

## Infer a trajectory through this space
traj <- infer_trajectory(space)

## Visualise the trajectory
draw_trajectory_plot(space, path=traj$path, progression_group = dataset$sample_info$group_name)
## Generate an example dataset and visualise it
dataset <- generate_dataset(num_genes = 200, num_samples = 400, num_groups = 4)
space <- reduce_dimensionality(dataset$expression, ndim = 2)
draw_trajectory_plot(space, progression_group = dataset$sample_info$group_name)

## Infer a trajectory through this space
traj <- infer_trajectory(space)

## Visualise the trajectory
draw_trajectory_plot(space, path=traj$path, progression_group = dataset$sample_info$group_name)

Dimensionality reduction

Description

reduce_dimensionality performs an eigenanalysis of the given dissimilarity matrix and returns coordinates of the samples represented in an ndim-dimensional space.

Usage

reduce_dimensionality(
  x,
  dist = c("spearman", "pearson", "euclidean", "cosine", "manhattan"),
  ndim = 3,
  num_landmarks = 1000
)
reduce_dimensionality(
  x,
  dist = c("spearman", "pearson", "euclidean", "cosine", "manhattan"),
  ndim = 3,
  num_landmarks = 1000
)

Arguments

`x`	a numeric matrix
`dist`	the distance metric to be used; can be any of the metrics listed in `dynutils::calculate_distance()`.
`ndim`	the maximum dimension of the space which the data are to be represented in; must be in 1, 2, ..., n-1.
`num_landmarks`	the number of landmarks to be selected.

Value

A matrix containing the coordinates of each sample, represented in an ndim-dimensional space.

Examples

## Generate an example dataset
dataset <- generate_dataset(num_genes = 200, num_samples = 400, num_groups = 4)

## Reduce the dimensionality of this dataset
space <- reduce_dimensionality(dataset$expression, ndim = 2)

## Visualise the dataset
draw_trajectory_plot(space, progression_group = dataset$sample_info$group_name)
## Generate an example dataset
dataset <- generate_dataset(num_genes = 200, num_samples = 400, num_groups = 4)

## Reduce the dimensionality of this dataset
space <- reduce_dimensionality(dataset$expression, ndim = 2)

## Visualise the dataset
draw_trajectory_plot(space, progression_group = dataset$sample_info$group_name)

Reverse a trajectory

Description

Since the direction of the trajectory is not specified, the ordering of a trajectory may be inverted using reverse_trajectory.

Usage

reverse_trajectory(trajectory)
reverse_trajectory(trajectory)

Arguments

trajectory

A trajectory as returned by infer_trajectory.

Value

The same trajectory, but in the other direction.

Examples

## Generate an example dataset and infer a trajectory through it
dataset <- generate_dataset(num_genes = 200, num_samples = 400, num_groups = 4)
group_name <- dataset$sample_info$group_name
space <- reduce_dimensionality(dataset$expression, ndim = 2)
traj <- infer_trajectory(space)

## Visualise the trajectory
draw_trajectory_plot(space, group_name, path = traj$path)

## Reverse the trajectory
reverse_traj <- reverse_trajectory(traj)
draw_trajectory_plot(space, group_name, path = reverse_traj$path)

plot(traj$time, reverse_traj$time, type = "l")
## Generate an example dataset and infer a trajectory through it
dataset <- generate_dataset(num_genes = 200, num_samples = 400, num_groups = 4)
group_name <- dataset$sample_info$group_name
space <- reduce_dimensionality(dataset$expression, ndim = 2)
traj <- infer_trajectory(space)

## Visualise the trajectory
draw_trajectory_plot(space, group_name, path = traj$path)

## Reverse the trajectory
reverse_traj <- reverse_trajectory(traj)
draw_trajectory_plot(space, group_name, path = reverse_traj$path)

plot(traj$time, reverse_traj$time, type = "l")

Infer a trajectory using SCORPIUS

Description

Pass this object to dynwrap::infer_trajectory().

Usage

ti_scorpius(
  distance_method = "spearman",
  ndim = 3L,
  k = 4L,
  thresh = 0.001,
  maxit = 10L,
  stretch = 0,
  smoother = "smooth_spline"
)
ti_scorpius(
  distance_method = "spearman",
  ndim = 3L,
  k = 4L,
  thresh = 0.001,
  maxit = 10L,
  stretch = 0,
  smoother = "smooth_spline"
)

Arguments

`distance_method`	A character string indicating which correlationcoefficient (or covariance) is to be computed. One of "pearson", "spearman" (default), or "cosine". Domain: spearman, pearson, cosine. Default: spearman. Format: character.
`ndim`	The number of dimensions in the new space. Domain: U(2, 20). Default: 3. Format: integer.
`k`	The number of clusters to cluster the data into to construct the initial trajectory. Domain: U(1, 20). Default: 4. Format: integer.
`thresh`	`principal_curve` parameter; convergence threshhold on shortest distances to the curve. Domain: e^U(-11.51, 11.51). Default: 0.001. Format: numeric.
`maxit`	`principal_curve` parameter; maximum number of iterations. Domain: U(0, 50). Default: 10. Format: integer.
`stretch`	`principal_curve` parameter; a factor by which the curve can be extrapolated when points are projected. Domain: U(0, 5). Default: 0. Format: numeric.
`smoother`	`principal_curve` parameter; choice of smoother. Domain: smooth_spline, lowess, periodic_lowess. Default: smooth_spline. Format: character.

Value

A dynwrap TI method.

Package 'SCORPIUS'

Help Index

SCORPIUS: Trajectory inference from single-cell RNA sequencing data.

Description

Dimensionality Reduction functions

Trajectory Inference functions

Visualisation functions

Datasets

References

Examples

Draw time-series heatmap

Description

Usage

Arguments

Value

Examples

Visualise SCORPIUS

Description

Usage

Arguments

Value

Examples

Extract modules of features

Description

Usage

Arguments

Value

See Also

Examples

Calculate the importance of a feature

Description

Usage

Arguments

Value

Examples

Generate a synthetic dataset

Description

Usage

Arguments

Value

See Also

Examples

scRNA-seq data of dendritic cell progenitors.

Description

Usage

Format

Source

References

See Also

Infer an initial trajectory through space

Description

Usage

Arguments

Value

Infer linear trajectory through space

Description

Usage

Arguments

Value

See Also

Examples

Dimensionality reduction

Description

Usage

Arguments

Value

See Also

Examples

Reverse a trajectory

Description

Usage

Arguments

Value

See Also

Examples

Infer a trajectory using SCORPIUS

Description

Usage

Arguments

Value