Title: | Inferring Developmental Chronologies from Single-Cell RNA Sequencing Data |
---|---|
Description: | An accurate and easy tool for performing linear trajectory inference on single cells using single-cell RNA sequencing data. In addition, 'SCORPIUS' provides functions for discovering the most important genes with respect to the reconstructed trajectory, as well as nice visualisation tools. Cannoodt et al. (2016) <doi:10.1101/079509>. |
Authors: | Robrecht Cannoodt [aut, cre] (<https://orcid.org/0000-0003-3641-729X>, rcannood), Wouter Saelens [ctb] (<https://orcid.org/0000-0002-7114-6248>, zouter) |
Maintainer: | Robrecht Cannoodt <[email protected]> |
License: | GPL-3 |
Version: | 1.0.9 |
Built: | 2024-11-12 03:27:25 UTC |
Source: | https://github.com/rcannood/scorpius |
SCORPIUS orders single cells with regard to an implicit timeline, such as cellular development or progression over time.
infer_trajectory
, infer_initial_trajectory
, reverse_trajectory
, gene_importances
, extract_modules
draw_trajectory_plot
, draw_trajectory_heatmap
Cannoodt R. et al., SCORPIUS improves trajectory inference and identifies novel modules in dendritic cell development, bioRxiv (Oct., 2016). doi:10.1101/079509 (PDF).
## Load dataset from Schlitzer et al., 2015 data("ginhoux") ## Reduce dimensionality and infer trajectory with SCORPIUS space <- reduce_dimensionality(ginhoux$expression, "spearman") traj <- infer_trajectory(space) ## Visualise draw_trajectory_plot( space, path = traj$path, progression_group = ginhoux$sample_info$group_name )
## Load dataset from Schlitzer et al., 2015 data("ginhoux") ## Reduce dimensionality and infer trajectory with SCORPIUS space <- reduce_dimensionality(ginhoux$expression, "spearman") traj <- infer_trajectory(space) ## Visualise draw_trajectory_plot( space, path = traj$path, progression_group = ginhoux$sample_info$group_name )
draw_trajectory_heatmap
draws a heatmap in which the samples
are ranked according their position in an inferred trajectory. In addition, the progression groups and
feature modules can be passed along to further enhance the visualisation.
draw_trajectory_heatmap( x, time, progression_group = NULL, modules = NULL, show_labels_row = FALSE, show_labels_col = FALSE, scale_features = TRUE, progression_group_palette = NULL, ... )
draw_trajectory_heatmap( x, time, progression_group = NULL, modules = NULL, show_labels_row = FALSE, show_labels_col = FALSE, scale_features = TRUE, progression_group_palette = NULL, ... )
x |
A numeric matrix or a data frame with one row per sample and one column per feature. |
time |
A numeric vector containing the inferred time points of each sample along a trajectory. |
progression_group |
|
modules |
|
show_labels_row |
|
show_labels_col |
|
scale_features |
|
progression_group_palette |
A named vector palette for the progression group. |
... |
Optional arguments to |
The output of the pheatmap
function.
## Generate a dataset dataset <- generate_dataset(num_genes=500, num_samples=300, num_groups=4) expression <- dataset$expression space <- reduce_dimensionality(expression, ndim=2) groups <- dataset$sample_info$group_name traj <- infer_trajectory(space) time <- traj$time gimp <- gene_importances(expression, traj$time, num_permutations = 0, ntree = 10000) gene_sel <- gimp[1:50,] expr_sel <- expression[,gene_sel$gene] ## Draw a time series heatmap draw_trajectory_heatmap(expr_sel, time) ## Also show the progression groupings draw_trajectory_heatmap(expr_sel, time, progression_group=groups) ## Use a different palette draw_trajectory_heatmap( expr_sel, time, progression_group=groups, progression_group_palette = setNames(RColorBrewer::brewer.pal(4, "Set2"), paste0("Group ", 1:4)) ) ## Group the genes into modules and visualise the modules in a heatmap modules <- extract_modules(scale_quantile(expr_sel)) draw_trajectory_heatmap(expr_sel, time, progression_group=groups, modules=modules)
## Generate a dataset dataset <- generate_dataset(num_genes=500, num_samples=300, num_groups=4) expression <- dataset$expression space <- reduce_dimensionality(expression, ndim=2) groups <- dataset$sample_info$group_name traj <- infer_trajectory(space) time <- traj$time gimp <- gene_importances(expression, traj$time, num_permutations = 0, ntree = 10000) gene_sel <- gimp[1:50,] expr_sel <- expression[,gene_sel$gene] ## Draw a time series heatmap draw_trajectory_heatmap(expr_sel, time) ## Also show the progression groupings draw_trajectory_heatmap(expr_sel, time, progression_group=groups) ## Use a different palette draw_trajectory_heatmap( expr_sel, time, progression_group=groups, progression_group_palette = setNames(RColorBrewer::brewer.pal(4, "Set2"), paste0("Group ", 1:4)) ) ## Group the genes into modules and visualise the modules in a heatmap modules <- extract_modules(scale_quantile(expr_sel)) draw_trajectory_heatmap(expr_sel, time, progression_group=groups, modules=modules)
draw_trajectory_plot
is used to plot samples after performing dimensionality reduction.
Additional arguments can be provided to colour the samples, plot the trajectory inferred by SCORPIUS,
and draw a contour around the samples.
draw_trajectory_plot( space, progression_group = NULL, path = NULL, contour = FALSE, progression_group_palette = NULL, point_size = 2, point_alpha = 1, path_size = 0.5, path_alpha = 1, contour_alpha = 0.2 )
draw_trajectory_plot( space, progression_group = NULL, path = NULL, contour = FALSE, progression_group_palette = NULL, point_size = 2, point_alpha = 1, path_size = 0.5, path_alpha = 1, contour_alpha = 0.2 )
space |
A numeric matrix or a data frame containing the coordinates of samples. |
progression_group |
|
path |
A numeric matrix or a data frame containing the coordinates of the inferred path. |
contour |
|
progression_group_palette |
A named vector palette for the progression group. |
point_size |
The size of the points. |
point_alpha |
The alpha of the points. |
path_size |
The size of the path (if any). |
path_alpha |
The alpha of the path (if any). |
contour_alpha |
The alpha of the contour (if any). |
A ggplot2 plot.
## Generate a synthetic dataset dataset <- generate_dataset(num_genes = 500, num_samples = 300, num_groups = 4) space <- reduce_dimensionality(dataset$expression, ndim = 2) groups <- dataset$sample_info$group_name ## Simply plot the samples draw_trajectory_plot(space) ## Colour each sample according to its group draw_trajectory_plot(space, progression_group = groups) ## Add contours to the plot draw_trajectory_plot(space, progression_group = groups, contour = TRUE) ## Plot contours without colours draw_trajectory_plot(space, contour = TRUE) ## Infer a trajectory and plot it traj <- infer_trajectory(space) draw_trajectory_plot(space, progression_group = groups, path = traj$path) draw_trajectory_plot(space, progression_group = groups, path = traj$path, contour = TRUE) ## Visualise gene expression draw_trajectory_plot(space, progression_group = dataset$expression[,1])
## Generate a synthetic dataset dataset <- generate_dataset(num_genes = 500, num_samples = 300, num_groups = 4) space <- reduce_dimensionality(dataset$expression, ndim = 2) groups <- dataset$sample_info$group_name ## Simply plot the samples draw_trajectory_plot(space) ## Colour each sample according to its group draw_trajectory_plot(space, progression_group = groups) ## Add contours to the plot draw_trajectory_plot(space, progression_group = groups, contour = TRUE) ## Plot contours without colours draw_trajectory_plot(space, contour = TRUE) ## Infer a trajectory and plot it traj <- infer_trajectory(space) draw_trajectory_plot(space, progression_group = groups, path = traj$path) draw_trajectory_plot(space, progression_group = groups, path = traj$path, contour = TRUE) ## Visualise gene expression draw_trajectory_plot(space, progression_group = dataset$expression[,1])
extract_modules
uses adaptive branch pruning to extract modules of features,
which is typically done on the smoothed expression returned by gene_importances
.
extract_modules( x, time = NULL, suppress_warnings = FALSE, verbose = FALSE, ... )
extract_modules( x, time = NULL, suppress_warnings = FALSE, verbose = FALSE, ... )
x |
A numeric matrix or a data frame with M rows (one per sample) and P columns (one per feature). |
time |
(Optional) Order the modules according to a pseudotime |
suppress_warnings |
Whether or not to suppress warnings when P > 1000 |
verbose |
Whether or not Mclust will print output or not |
... |
Extra parameters passed to |
A data frame containing meta-data for the features in x
, namely the order in which to visualise the features in and which module they belong to.
## Generate a dataset and visualise dataset <- generate_dataset(num_genes=300, num_samples=200, num_groups=4) expression <- dataset$expression group_name <- dataset$sample_info$group_name space <- reduce_dimensionality(expression, ndim=2) traj <- infer_trajectory(space) time <- traj$time draw_trajectory_plot(space, path=traj$path, group_name) ## Select most important genes (set ntree to at least 10000!) gimp <- gene_importances(expression, traj$time, num_permutations = 0, ntree = 1000) gene_sel <- gimp[1:50,] expr_sel <- expression[,gene_sel$gene] ## Group the genes into modules and visualise the modules in a heatmap modules <- extract_modules(scale_quantile(expr_sel)) draw_trajectory_heatmap(expr_sel, time, group_name, modules)
## Generate a dataset and visualise dataset <- generate_dataset(num_genes=300, num_samples=200, num_groups=4) expression <- dataset$expression group_name <- dataset$sample_info$group_name space <- reduce_dimensionality(expression, ndim=2) traj <- infer_trajectory(space) time <- traj$time draw_trajectory_plot(space, path=traj$path, group_name) ## Select most important genes (set ntree to at least 10000!) gimp <- gene_importances(expression, traj$time, num_permutations = 0, ntree = 1000) gene_sel <- gimp[1:50,] expr_sel <- expression[,gene_sel$gene] ## Group the genes into modules and visualise the modules in a heatmap modules <- extract_modules(scale_quantile(expr_sel)) draw_trajectory_heatmap(expr_sel, time, group_name, modules)
Calculates the feature importance of each column in x
in trying to predict the time ordering.
gene_importances( x, time, num_permutations = 0, ntree = 10000, ntree_perm = ntree/10, mtry = ncol(x) * 0.01, num_threads = 1, ... )
gene_importances( x, time, num_permutations = 0, ntree = 10000, ntree_perm = ntree/10, mtry = ncol(x) * 0.01, num_threads = 1, ... )
x |
A numeric matrix or a data frame with M rows (one per sample) and P columns (one per feature). |
time |
A numeric vector containing the inferred time points of each sample along a trajectory as returned by |
num_permutations |
The number of permutations to test against for calculating the p-values (default: 0). |
ntree |
The number of trees to grow (default: 10000). |
ntree_perm |
The number of trees to grow for each of the permutations (default: ntree / 10). |
mtry |
The number of variables randomly samples at each split (default: 1% of features). |
num_threads |
Number of threads. Default is 1. |
... |
Extra parameters passed to |
a data frame containing the importance of each feature for the given time line
dataset <- generate_dataset(num_genes=500, num_samples=300, num_groups=4) expression <- dataset$expression group_name <- dataset$sample_info$group_name space <- reduce_dimensionality(expression, ndim=2) traj <- infer_trajectory(space) # set ntree to at least 1000! gene_importances(expression, traj$time, num_permutations = 0, ntree = 1000)
dataset <- generate_dataset(num_genes=500, num_samples=300, num_groups=4) expression <- dataset$expression group_name <- dataset$sample_info$group_name space <- reduce_dimensionality(expression, ndim=2) traj <- infer_trajectory(space) # set ntree to at least 1000! gene_importances(expression, traj$time, num_permutations = 0, ntree = 1000)
generate_dataset
generates an synthetic dataset which can be used for visualisation purposes.
generate_dataset( num_samples = 400, num_genes = 500, num_groups = 4 )
generate_dataset( num_samples = 400, num_genes = 500, num_groups = 4 )
num_samples |
The number of samples the dataset will contain. |
num_genes |
The number of genes the dataset will contain. |
num_groups |
The number of groups the samples will be split up in. |
A list containing the expression data and the meta data of the samples.
## Generate a dataset dataset <- generate_dataset(num_genes = 200, num_samples = 400, num_groups = 4) ## Reduce dimensionality and infer trajectory with SCORPIUS space <- reduce_dimensionality(dataset$expression, ndim = 2) traj <- infer_trajectory(space) ## Visualise draw_trajectory_plot(space, path=traj$path, progression_group=dataset$sample_info$group_name)
## Generate a dataset dataset <- generate_dataset(num_genes = 200, num_samples = 400, num_groups = 4) ## Reduce dimensionality and infer trajectory with SCORPIUS space <- reduce_dimensionality(dataset$expression, ndim = 2) traj <- infer_trajectory(space) ## Visualise draw_trajectory_plot(space, path=traj$path, progression_group=dataset$sample_info$group_name)
This dataset contains the expression values of the
top 2000 most variable genes for 248 dendritic cell progenitors.
Each cell is in one of three maturation stages: MDP, CDP or PreDC.
The levels of the factor in sample.info
are ordered according to the maturation process.
The number of genes had to be reduced specifically for reducing the package size of SCORPIUS. Use the following code to download the original data:
download.file("https://github.com/rcannood/SCORPIUS/raw/master/data-raw/ginhoux_orig.rds", destfile = "local.rds") ginhoux <- readRDS("local.rds") # do something with ginhoux
ginhoux
ginhoux
A list containing two data frames, expression
(248x2000) and sample_info
(248x1).
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE60783
Schlitzer A, Sivakamasundari V, Chen J, Sumatoh HR et al. Identification of cDC1- and cDC2-committed DC progenitors reveals early lineage priming at the common DC progenitor stage in the bone marrow. Nat Immunol 2015 Jul;16(7):718-28. PMID: 26054720
infer_initial_trajectory
infers an initial trajectory for
infer_trajectory
by clustering the points and calculating
the shortest path through cluster centers. The shortest path takes into account
the euclidean distance between cluster centers, and the density between those two
points.
infer_initial_trajectory(space, k)
infer_initial_trajectory(space, k)
space |
A numeric matrix or a data frame containing the coordinates of samples. |
k |
The number of clusters |
the initial trajectory obtained by this method
infer_trajectory
infers a trajectory through samples in a given space in a four-step process:
Perform k-means clustering
Calculate distance matrix between cluster centers using a custom distance function
Find the shortest path connecting all cluster centers using the custom distance matrix
Iteratively fit a curve to the given data using principal curves
infer_trajectory( space, k = 4, thresh = 0.001, maxit = 10, stretch = 0, smoother = "smooth_spline", approx_points = 100 )
infer_trajectory( space, k = 4, thresh = 0.001, maxit = 10, stretch = 0, smoother = "smooth_spline", approx_points = 100 )
space |
A numeric matrix or a data frame containing the coordinates of samples. |
k |
The number of clusters to cluster the data into. |
thresh |
convergence threshold on shortest distances to the curve. |
maxit |
maximum number of iterations. |
stretch |
A stretch factor for the endpoints of the curve, allowing the curve to grow to avoid bunching at the end. Must be a numeric value between 0 and 2. |
smoother |
choice of smoother. The default is
|
approx_points |
Approximate curve after smoothing to reduce computational time.
If |
A list containing several objects:
path
: the trajectory obtained by principal curves.
time
: the time point of each sample along the inferred trajectory.
reduce_dimensionality
, draw_trajectory_plot
## Generate an example dataset and visualise it dataset <- generate_dataset(num_genes = 200, num_samples = 400, num_groups = 4) space <- reduce_dimensionality(dataset$expression, ndim = 2) draw_trajectory_plot(space, progression_group = dataset$sample_info$group_name) ## Infer a trajectory through this space traj <- infer_trajectory(space) ## Visualise the trajectory draw_trajectory_plot(space, path=traj$path, progression_group = dataset$sample_info$group_name)
## Generate an example dataset and visualise it dataset <- generate_dataset(num_genes = 200, num_samples = 400, num_groups = 4) space <- reduce_dimensionality(dataset$expression, ndim = 2) draw_trajectory_plot(space, progression_group = dataset$sample_info$group_name) ## Infer a trajectory through this space traj <- infer_trajectory(space) ## Visualise the trajectory draw_trajectory_plot(space, path=traj$path, progression_group = dataset$sample_info$group_name)
reduce_dimensionality
performs an eigenanalysis of the given dissimilarity matrix
and returns coordinates of the samples represented in an ndim
-dimensional space.
reduce_dimensionality( x, dist = c("spearman", "pearson", "euclidean", "cosine", "manhattan"), ndim = 3, num_landmarks = 1000 )
reduce_dimensionality( x, dist = c("spearman", "pearson", "euclidean", "cosine", "manhattan"), ndim = 3, num_landmarks = 1000 )
x |
a numeric matrix |
dist |
the distance metric to be used; can be any of the metrics listed in |
ndim |
the maximum dimension of the space which the data are to be represented in; must be in 1, 2, ..., n-1. |
num_landmarks |
the number of landmarks to be selected. |
A matrix containing the coordinates of each sample, represented in an ndim
-dimensional space.
## Generate an example dataset dataset <- generate_dataset(num_genes = 200, num_samples = 400, num_groups = 4) ## Reduce the dimensionality of this dataset space <- reduce_dimensionality(dataset$expression, ndim = 2) ## Visualise the dataset draw_trajectory_plot(space, progression_group = dataset$sample_info$group_name)
## Generate an example dataset dataset <- generate_dataset(num_genes = 200, num_samples = 400, num_groups = 4) ## Reduce the dimensionality of this dataset space <- reduce_dimensionality(dataset$expression, ndim = 2) ## Visualise the dataset draw_trajectory_plot(space, progression_group = dataset$sample_info$group_name)
Since the direction of the trajectory is not specified, the ordering of a trajectory may be inverted using reverse_trajectory
.
reverse_trajectory(trajectory)
reverse_trajectory(trajectory)
trajectory |
A trajectory as returned by |
The same trajectory, but in the other direction.
## Generate an example dataset and infer a trajectory through it dataset <- generate_dataset(num_genes = 200, num_samples = 400, num_groups = 4) group_name <- dataset$sample_info$group_name space <- reduce_dimensionality(dataset$expression, ndim = 2) traj <- infer_trajectory(space) ## Visualise the trajectory draw_trajectory_plot(space, group_name, path = traj$path) ## Reverse the trajectory reverse_traj <- reverse_trajectory(traj) draw_trajectory_plot(space, group_name, path = reverse_traj$path) plot(traj$time, reverse_traj$time, type = "l")
## Generate an example dataset and infer a trajectory through it dataset <- generate_dataset(num_genes = 200, num_samples = 400, num_groups = 4) group_name <- dataset$sample_info$group_name space <- reduce_dimensionality(dataset$expression, ndim = 2) traj <- infer_trajectory(space) ## Visualise the trajectory draw_trajectory_plot(space, group_name, path = traj$path) ## Reverse the trajectory reverse_traj <- reverse_trajectory(traj) draw_trajectory_plot(space, group_name, path = reverse_traj$path) plot(traj$time, reverse_traj$time, type = "l")
Pass this object to dynwrap::infer_trajectory()
.
ti_scorpius( distance_method = "spearman", ndim = 3L, k = 4L, thresh = 0.001, maxit = 10L, stretch = 0, smoother = "smooth_spline" )
ti_scorpius( distance_method = "spearman", ndim = 3L, k = 4L, thresh = 0.001, maxit = 10L, stretch = 0, smoother = "smooth_spline" )
distance_method |
A character string indicating which correlationcoefficient (or covariance) is to be computed. One of "pearson", "spearman" (default), or "cosine". Domain: spearman, pearson, cosine. Default: spearman. Format: character. |
ndim |
The number of dimensions in the new space. Domain: U(2, 20). Default: 3. Format: integer. |
k |
The number of clusters to cluster the data into to construct the initial trajectory. Domain: U(1, 20). Default: 4. Format: integer. |
thresh |
|
maxit |
|
stretch |
|
smoother |
|
A dynwrap TI method.