--- title: "Investigating dendritic cell maturation in dendritic cell progenitors" author: "Robrecht Cannoodt" date: "2016-01-22" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Investigating dendritic cell maturation in dendritic cell progenitors} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} editor_options: chunk_output_type: console --- ```{r, setseed, echo=FALSE} set.seed(1) ``` In this vignette, SCORPIUS is used to infer a trajectory through dendritic cell progenitors. The `ginhoux` dataset contains 248 dendritic cell progenitors in one of three cellular cellular states: MDP, CDP or PreDC. ```{r load_data, message=FALSE} library(SCORPIUS) data(ginhoux) ``` The dataset is a list containing a matrix named `expression` and a data frame named `sample_info`. `expression` was a `r nrow(ginhoux$expression)`-by-15752 matrix containing the expression values of all the cells and all the genes, but this dataset had to be reduced to `r ncol(ginhoux$expression)` genes in order to reduce the package size. See `?ginhoux` for more info. ```{r show_expression} ginhoux$expression[1:6, 1:6] ``` `sample_info` is a data frame with the metadata of the cells, containing cell types of the individual cells. ```{r show_sample_info} head(ginhoux$sample_info) ``` In order to infer a trajectory through this data, SCORPIUS first reduces the dimensionality of the dataset. ## Reduce dimensionality of the dataset SCORPIUS uses Torgerson multi-dimensional scaling to reduce the dataset to three dimensions. This technique attempts to place the cells in a space such that the distance between any two points in that space approximates the original distance between the two cells as well as possible. The distance between any two samples is defined as their correlation distance, namely `1 - (cor(x, y)+1)/2`. The reduced space is constructed as follows: ```{r perform_mds} expression <- ginhoux$expression group_name <- ginhoux$sample_info$group_name space <- reduce_dimensionality(expression, "spearman", ndim = 3) ``` The new space is a `r nrow(space)`-by-`r ncol(space)` matrix, and can be visualised with or without colouring of the different cell types. ```{r show_dimred} draw_trajectory_plot(space, progression_group = group_name, contour = TRUE) ``` ## Inferring a trajectory through the cells The main goal of SCORPIUS is to infer a trajectory through the cells, and orden the cells according to the inferred timeline. SCORPIUS infers a trajectory through several intermediate steps, which are all executed as follows: ```{r infer_trajectory} traj <- infer_trajectory(space) ``` The result is a list containing the final trajectory `path` and the inferred timeline for each sample `time`. The trajectory can be visualised with respect to the samples by passing it to `draw_trajectory_plot`: ```{r plot_trajectory} draw_trajectory_plot( space, progression_group = group_name, path = traj$path, contour = TRUE ) ``` ## Finding candidate marker genes We search for genes whose expression is seems to be a function of the trajectory timeline that was inferred, as such genes might be good candidate marker genes for dendritic cell maturation. ```{r find_tafs} gimp <- gene_importances(expression, traj$time, num_permutations = 0, num_threads = 8) gene_sel <- gimp[1:50,] expr_sel <- expression[,gene_sel$gene] ``` ```{r echo=F} # reverse the trajectory. This does not change the results in any way, # other than the heatmap being ordered more logically. # traj <- reverse_trajectory(traj) ``` To visualise the expression of the selected genes, use the `draw_trajectory_heatmap` function. ```{r visualise_tafs, fig.keep='first'} draw_trajectory_heatmap(expr_sel, traj$time, group_name) ``` Finally, these genes can also be grouped into modules as follows: ```{r moduled_tafs, fig.keep='first'} modules <- extract_modules(scale_quantile(expr_sel), traj$time, verbose = FALSE) draw_trajectory_heatmap(expr_sel, traj$time, group_name, modules) ```