Systematic evaluation of the robustness of deconvolution methods for spatial transcriptomics data

Background

Bulk RNA expression experiments have proven useful in many studies, but they are inherently limited to measuring an average response across a population of cells. The development of high-throughput single-cell RNA-sequencing (scRNA-seq) has opened up exciting possibilities for studying cellular heterogeneity. However, a disadvantage of current high-throughput scRNA-seq protocols is that the tissue is first dissociated into individual cells, so the spatial tissue context is lost. This problem is currently being addressed both experimentally and computationally. On the experimental side, tremendous progress is being made in spatial transcriptomics, in which spatial information is preserved. However, these techniques often do not yet reach single-cell resolution and are also limited in the number of genes measured. Therefore, computational approaches have been developed to combine the benefits of both worlds by integrating scRNA-seq and spatial transcriptomics data (Longo et al., 2021).

Spatial transcriptomics data contain the measured gene expression counts at spatial measurement locations, here referred to as spots. Each spot is usually a mixture of multiple cell types. Integration methods are designed to predict which cell types are present in each spot, and in what proportions, using scRNA-seq data as a reference dataset that contains the transcriptomic profiles of different cell types. It has been claimed that these integration approaches are sensitive to what is referred to as cell type mismatch, that is, the absence from the reference dataset of cell types that are present in the tissue. However, it is not well understood how sensitive the most commonly used integration approaches are to cell type mismatch, or whether some approaches are less sensitive than others.
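
To make the deconvolution task concrete, the following minimal sketch models a spot as a proportion-weighted sum of cell-type expression profiles and recovers the proportions with a simple non-negative least-squares fit (multiplicative updates). The reference values, the cell-type names, and the functions mix and deconvolve are hypothetical and chosen only for illustration; this is not how RCTD or SPOTlight work internally, and real data would involve noisy counts and thousands of genes.

```python
# Toy reference: one mean expression profile per cell type (hypothetical values),
# each list holding the expression of the same four genes.
reference = {
    "B cell": [10.0, 1.0, 0.5, 2.0],
    "T cell": [1.0, 8.0, 1.0, 2.0],
    "Fibroblast": [0.5, 1.0, 9.0, 2.0],
}


def mix(reference, proportions):
    """Noise-free spot expression: proportion-weighted sum of the profiles."""
    n_genes = len(next(iter(reference.values())))
    return [
        sum(proportions[ct] * reference[ct][g] for ct in reference)
        for g in range(n_genes)
    ]


def deconvolve(spot, reference, n_iter=500):
    """Estimate cell-type proportions by non-negative least squares.

    Uses multiplicative updates, which keep all weights non-negative,
    and normalises the final weights so that they sum to 1.
    """
    types = list(reference)
    n_genes = len(spot)
    weights = {ct: 1.0 / len(types) for ct in types}
    for _ in range(n_iter):
        # Current reconstruction of the spot from the weighted profiles.
        fitted = [
            sum(weights[ct] * reference[ct][g] for ct in types)
            for g in range(n_genes)
        ]
        for ct in types:
            num = sum(reference[ct][g] * spot[g] for g in range(n_genes))
            den = sum(reference[ct][g] * fitted[g] for g in range(n_genes))
            if den > 0:
                weights[ct] *= num / den
    total = sum(weights.values())
    return {ct: weights[ct] / total for ct in types}


true_proportions = {"B cell": 0.6, "T cell": 0.3, "Fibroblast": 0.1}
spot = mix(reference, true_proportions)
estimated = deconvolve(spot, reference)
```

In this idealized setting the reference contains every cell type in the spot, so the estimate should closely match the true proportions. Cell type mismatch corresponds to calling deconvolve with a reference from which one of these profiles has been removed.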

Research goal

In this project, I plan to develop a benchmark framework and to systematically evaluate the robustness to cell type mismatch of several state-of-the-art integration methods, including RCTD (Cable et al., 2021) and SPOTlight (Elosua-Bayes et al., 2021).

Approach

Step 1. Literature review on methods for the integration of scRNA-seq and spatial transcriptomics data.

Step 2. Establish a benchmark framework:

  1. Selection of integration methods to be compared;
  2. Selection and annotation of datasets to be integrated. Because of ongoing projects in the Bioinformatics Laboratory, these will include scRNA-seq and spatial data from mouse and/or human lymph nodes and from synovium in rheumatoid arthritis;
  3. Design of the benchmark approach: how to simulate cell type mismatch, which performance measures to use, and how to use simulated data.
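
One possible shape for such a benchmark is sketched below: synthetic spots are generated from the full reference with known proportions, one cell type is removed from the reference handed to the method, and the estimates are scored against the ground truth. All names here (simulate_proportions, drop_cell_type, run_benchmark, uniform_baseline) and the toy profiles are hypothetical. The choice of RMSE over all true cell types, with types missing from the estimate counted as zero, is only one candidate performance measure, and uniform_baseline is a placeholder for a wrapper around an actual integration method.

```python
import math
import random


def simulate_proportions(cell_types, rng):
    """Draw random proportions that sum to 1 (normalised uniform draws)."""
    raw = [rng.random() for _ in cell_types]
    total = sum(raw)
    return {ct: r / total for ct, r in zip(cell_types, raw)}


def simulate_spot(reference, proportions):
    """Noise-free synthetic spot: proportion-weighted sum of the profiles."""
    n_genes = len(next(iter(reference.values())))
    return [
        sum(proportions[ct] * reference[ct][g] for ct in reference)
        for g in range(n_genes)
    ]


def drop_cell_type(reference, missing):
    """Mismatch scenario: the reference lacks a cell type present in the tissue."""
    return {ct: prof for ct, prof in reference.items() if ct != missing}


def rmse(truth, estimate):
    """RMSE over all true cell types; types absent from the estimate count as 0."""
    errors = [(truth[ct] - estimate.get(ct, 0.0)) ** 2 for ct in truth]
    return math.sqrt(sum(errors) / len(errors))


def run_benchmark(reference, deconvolve, missing, n_spots=50, seed=0):
    """Mean RMSE of `deconvolve` when `missing` is removed from the reference."""
    rng = random.Random(seed)
    reduced = drop_cell_type(reference, missing)
    scores = []
    for _ in range(n_spots):
        truth = simulate_proportions(list(reference), rng)
        spot = simulate_spot(reference, truth)  # tissue contains all cell types
        estimate = deconvolve(spot, reduced)    # method sees the reduced reference
        scores.append(rmse(truth, estimate))
    return sum(scores) / len(scores)


def uniform_baseline(spot, reference):
    """Placeholder method: assigns equal proportions to every reference type."""
    return {ct: 1.0 / len(reference) for ct in reference}


# Hypothetical toy profiles (four genes per cell type).
reference = {
    "B cell": [10.0, 1.0, 0.5, 2.0],
    "T cell": [1.0, 8.0, 1.0, 2.0],
    "Fibroblast": [0.5, 1.0, 9.0, 2.0],
}
score = run_benchmark(reference, uniform_baseline, missing="Fibroblast")
```

Repeating run_benchmark for each cell type in turn, and for each method, would give a table of mismatch sensitivities that can be compared with the scores obtained using the complete reference.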

Step 3. Perform the benchmark and interpret the results.