Deconvolution Benchmark: TL;DR

paper preview

Deconvolution

single cell

A high-level overview of the Deconvolution Benchmark project, now published in Genome Biology!

Author

Louise A. Huuki-Myers

Published

April 9, 2025

Introduction

This blog post provides a high-level summary of our paper “Benchmark of cellular deconvolution methods using a multi-assay dataset from postmortem human prefrontal cortex”, published in Genome Biology in April 2025 (Huuki-Myers et al., n.d.).

In this deconvolution benchmark project we set out to determine the most accurate method for predicting cell type composition in bulk RNA-seq data from brain tissue. We also evaluated methods for selecting marker genes, and introduced the Mean Ratio method for marker gene selection. The dataset developed for this experiment, the Mean Ratio functions, and other helpful tools for deconvolution are available in the DeconvoBuddies Bioconductor package.

What is deconvolution?

Complex tissue is made up of different cell types that express genes at different levels. In bulk RNA-seq this heterogeneity of the tissue is obscured, and the gene expression measurements represent a mixture of all of the cells and cell types in the sample. Differences in the cell type composition between samples, either technical or biologically real, can confound downstream analysis such as differential expression.

Deconvolution is an analysis that infers the cell type composition of bulk RNA-seq data, using gene expression profiles from single cell data.
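To make the mixture idea concrete, here is a minimal toy sketch in Python. It is not one of the methods benchmarked in the paper: the two cell types, four genes, expression values, and function names are all hypothetical, and the estimator is plain least squares for a two-component mixture.

```python
# Toy mixture model: two hypothetical cell types, four hypothetical genes.
neuron_profile = [10.0, 0.0, 5.0, 1.0]  # mean expression per gene (made up)
glia_profile = [1.0, 8.0, 5.0, 0.0]     # mean expression per gene (made up)


def mix(profile_a, profile_b, prop_a):
    """Bulk expression as a proportion-weighted average of two profiles."""
    return [prop_a * a + (1.0 - prop_a) * b for a, b in zip(profile_a, profile_b)]


def estimate_proportion(bulk, profile_a, profile_b):
    """Least-squares estimate of the proportion of cell type A, clipped to [0, 1]."""
    diff = [a - b for a, b in zip(profile_a, profile_b)]
    numerator = sum((x - b) * d for x, b, d in zip(bulk, profile_b, diff))
    denominator = sum(d * d for d in diff)
    return min(1.0, max(0.0, numerator / denominator))


# Simulate a bulk sample that is 70% "neuron" and 30% "glia", then recover it.
bulk_sample = mix(neuron_profile, glia_profile, prop_a=0.7)
estimated = estimate_proportion(bulk_sample, neuron_profile, glia_profile)
print(round(estimated, 3))
```

Real deconvolution methods handle many cell types, noisy data, and differences between the bulk and single cell assays, which is where they start to differ from one another.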

How to perform deconvolution?

To run deconvolution you’ll need:

Your Bulk RNA-seq gene expression data
A reference single cell RNA-seq gene expression data set, from the same tissue type
A deconvolution method (computational algorithm)

Available Deconvolution Methods

Reviewing the literature, we found 20+ deconvolution methods available. This presents quite an overwhelming choice for researchers! Are there big differences between methods? If so, how can we choose the most accurate method?

Existing Benchmarks

Benchmark studies aim to test and rank the performance of available methods. There have been several benchmark studies on deconvolution methods, both within papers presenting new methods and as separate studies. However, there is not much of a consensus on which method is the most accurate:

Benchmarking results from different papers on “real” data

MuSiC paper (Wang et al. 2019): MuSiC > NNLS > BSEQ-sc > CIBERSORT
Bisque paper (Jew et al. 2020): Bisque > MuSiC > CIBERSORT
Cobos benchmark (Avila Cobos et al. 2020): DWLS > MuSiC > Bisque > deconvSeq
Jin et al. benchmark (Jin and Liu 2021): CIBERSORT, MuSiC > EPIC*, TIMER, DeconRNASeq
Dai et al. benchmark (Dai et al., n.d.): dtangle > Bisque > Other Methods

Additionally, the Cobos et al. 2020 benchmark study shows that different methods perform best on different data sets (Avila Cobos et al. 2020).

A challenge in benchmark studies is producing a “ground truth” estimate for cell type composition. Benchmarks often use pseudobulk mixtures created from the single cell data as the bulk data, so the absolute composition is known.
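As a rough illustration of how a pseudobulk sample is built, the sketch below sums counts across a handful of labeled cells and records the known composition. The cells, counts, and the `make_pseudobulk` helper are hypothetical and only meant to show the idea, not the procedure used in any particular benchmark.

```python
from collections import Counter

# Hypothetical single cell data: (cell type label, counts for three genes).
cells = [
    ("Excit", [5, 0, 1]),
    ("Excit", [4, 1, 0]),
    ("Astro", [0, 6, 2]),
    ("Micro", [1, 0, 7]),
]


def make_pseudobulk(selected_cells):
    """Sum counts gene by gene and record the known cell type proportions."""
    n_genes = len(selected_cells[0][1])
    pseudobulk = [
        sum(counts[g] for _, counts in selected_cells) for g in range(n_genes)
    ]
    label_counts = Counter(label for label, _ in selected_cells)
    true_props = {
        label: n / len(selected_cells) for label, n in label_counts.items()
    }
    return pseudobulk, true_props


pseudobulk, true_props = make_pseudobulk(cells)
print(pseudobulk)   # summed counts per gene
print(true_props)   # composition is known exactly because we chose the cells
```

Because the mixture is assembled from the reference itself, the "bulk" and the reference share the same technology and biases, which is one reason a pseudobulk sample may be easier to deconvolve than real bulk RNA-seq.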

However, we think pseudobulk data might not be a good stand-in for real bulk RNA-seq data. It is better to use an orthogonal measurement of cell type composition paired with real bulk RNA-seq data. We were also curious about the performance of methods specifically in brain RNA-seq data.

This motivated us to run our own deconvolution benchmark study!

Deconvolution Benchmark Study

Study Design

We designed an experiment to evaluate the performance of deconvolution methods on human brain tissue, specifically the dorsolateral prefrontal cortex (DLPFC). We used consecutive slices of 22 DLPFC brain blocks from 10 neurotypical donors to create three assays:

RNAScope: orthogonal measurement of cell type compositions for six major cell types (n=25)
snRNA-seq: reference single nucleus data (n=19)
Bulk RNA-seq: using a variety of library types and RNA extraction methods (n=110)

Diagram of study design. A. Cartoon of brain region and brain block with consecutive slices. B. Tile plot showing available samples and quality control status

RNAScope Cell Type Proportions 🔬

To obtain orthogonal measurements of cell type proportions for six major cell types in the DLPFC, we utilized multiplex single molecule fluorescent in situ hybridization (smFISH) combined with immunofluorescence (IF) using RNAScope/IF.

We designed two probe combinations:

Star measures:
1. Excitatory Neurons (Excit)
2. Microglia (Micro)
3. combined Oligodendrocytes and Oligodendrocyte Precursor Cells (OligoOPC)
Circle measures:
1. Inhibitory Neurons (Inhib)
2. Endothelial/Mural cells (EndoMural)
3. Astrocytes (Astro)

We used HALO to segment and label cell types, then calculated cell type proportions for each sample.

RNAScope/IF Experiment Design. A. Star and Circle probe combinations measure 3 cell types each. Example fluorescent images of B. Star and C. Circle. D. Bar plots of estimated cell type compositions

Single Nucleus Reference dataset

The snRNA-seq data was previously analyzed as part of the spatialDLPFC project (see Huuki-Myers et al. or the previous blog post for more details). This reference consists of 56k nuclei from 19 samples with seven broad cell types.

tSNE plot and overall cell type composition for snRNA-seq dataset

Bulk RNA-seq Data

For the bulk RNA-seq we were curious whether using different library types (polyA or RiboZero) and RNA extractions (nuclear, cytoplasmic, or total) would impact the accuracy of deconvolution. So for each brain block we prepared one sample of each library combination.

Analyzing just the bulk RNA-seq data, we saw large differences in gene expression between the different preparations of the bulk data: principal component analysis shows that the data divide by library type and RNA extraction. We suspected that these technical differences in gene expression would impact deconvolution estimates. A good deconvolution method should be robust to these differences in data type.

Tile plot showing the number of samples over library type and RNA extraction. PCA of the gene expression shows that PC1 separates library type and PC2 separates RNA extraction

Which methods to test?

From the large number of available methods we selected six that were previously identified as top performers in other benchmark papers and that apply a range of different approaches: DWLS, Bisque, MuSiC, hspe, BayesPrism, and CIBERSORTx (detailed below).

Method	Citation	Approach	Marker Gene Selection	Availability	Top Benchmark Performance
DWLS (Dampened weighted least-squares)	(Tsoucas et al. 2019)	weighted least squares	-	R package on CRAN	(Avila Cobos et al. 2020)
Bisque	(Jew et al. 2020)	Bias correction: Assay	-	R package on GitHub	(Dai et al., n.d.)
MuSiC (Multi-subject Single-cell)	(Wang et al. 2019)	Bias correction: Source	Weights Genes	R package on GitHub	(Jin and Liu 2021)
BayesPrism	(Chu et al. 2022)	Bayesian	Pairwise t-test	Webtool, R package on GitHub	(Hippen et al. 2023)
hspe (dtangle) (hybrid-scale proportion estimation)	(Hunt et al. 2019)	High collinearity adjustment	Multiple options- default “ratio” 1vALL mean expression ratio	R package on GitHub	(Dai et al., n.d.)
CIBERSORTx	(Newman et al. 2019)	Machine Learning	Differential Gene expression	Webtool, Docker Image	(Jin and Liu 2021)

Marker Gene Selection

A strategy to improve accuracy in deconvolution is to limit the analysis to a set of cell type marker genes, which reduces noise in the analysis. To help select cell type specific marker genes we have developed the Mean Ratio method.

The Mean Ratio method works by selecting genes with large differences between gene expression in the target cell type and the closest non-target cell type. For each gene, we calculate the Mean Ratio for a target cell type by dividing the mean expression in the target cell type by the mean expression in the next highest non-target cell type. Genes with the highest Mean Ratio values are selected as marker genes.
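The calculation described above can be sketched in a few lines of Python. This is only an illustration of the ratio itself and is not the DeconvoBuddies implementation: the gene names, mean expression values, helper names, and the choice to return infinity when no other cell type expresses the gene are all assumptions made for this example.

```python
# Hypothetical mean expression per cell type for three genes.
gene_means = {
    "GENE_A": {"Excit": 8.0, "Inhib": 2.0, "Astro": 1.0},
    "GENE_B": {"Excit": 6.0, "Inhib": 5.0, "Astro": 0.5},
    "GENE_C": {"Excit": 1.0, "Inhib": 4.0, "Astro": 0.5},
}


def mean_ratio(mean_expr, target):
    """Target mean divided by the highest mean among non-target cell types."""
    next_highest = max(v for ct, v in mean_expr.items() if ct != target)
    if next_highest == 0:
        # Assumption for this sketch: treat "no expression elsewhere" as infinite.
        return float("inf")
    return mean_expr[target] / next_highest


def rank_markers(means_by_gene, target, n_top):
    """Return the n_top genes with the highest ratio for the target cell type."""
    scores = {gene: mean_ratio(m, target) for gene, m in means_by_gene.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n_top]


print(mean_ratio(gene_means["GENE_A"], "Excit"))  # 8.0 / 2.0
print(rank_markers(gene_means, "Excit", n_top=2))
```

GENE_B is highly expressed in the target cell type but nearly as high in another cell type, so it scores much lower than GENE_A. Comparing against the next highest cell type, rather than the average of all others, is what penalizes such genes.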

Illustration of the Mean Ratio marker selection method, and heatmap of top Mean Ratio marker genes

For more information about selecting marker genes with Mean Ratio see Finding Marker Genes with DeconvoBuddies.

In our benchmark we found that methods responded differently and unpredictably to different marker gene sets, but the top methods performed better using the top 25 Mean Ratio marker genes for each cell type.

Method Performance 🏆

On to the main event: time to evaluate the deconvolution methods!

We performed deconvolution on the 110 bulk RNA-seq samples with each of the six selected methods, using the top 25 Mean Ratio marker genes.

We then compared the estimated cell type proportions with the RNAScope cell type proportions. We calculated Pearson’s correlation and the root mean squared error (RMSE) between the two. Methods with high correlation and low RMSE are the most accurate.
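For readers who want to see the two metrics spelled out, here is a small standard-library Python sketch. The proportion values are made up for illustration and are not results from the paper.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rmse(x, y):
    """Root mean squared error between two equal-length lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Hypothetical proportions for four samples of one cell type.
rnascope_props = [0.30, 0.10, 0.25, 0.35]
estimated_props = [0.28, 0.12, 0.20, 0.40]

print(pearson(rnascope_props, estimated_props))
print(rmse(rnascope_props, estimated_props))
```

The two metrics are complementary: correlation rewards estimates that rise and fall with the measured proportions even if they are shifted or scaled, while RMSE penalizes any absolute difference from the measured values.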

Overall, Bisque and hspe were the top performing methods. 🏆

These were also the top methods in the Dai et al. benchmark, which also examined brain data (Dai et al., n.d.).

Bisque performed slightly better in polyA data, and hspe slightly better in RiboZero data. CIBERSORTx was a close third place, performing similarly to Bisque and hspe in polyA data.

A. Scatter plot of RNAScope proportions vs. method estimated proportions. B. Pearson’s correlation for each method over bulk RNA-seq library combinations; point size corresponds to RMSE

Other Results

Above I have highlighted the main study design and conclusions of our deconvolution benchmark. In the paper we explored many more facets of deconvolution method performance. Some other results to highlight:

hspe is sensitive to marker gene selection
Bisque can perform poorly with < 4 donors
Bisque and hspe are unaffected by including “case” donors in the snRNA-seq reference
Bisque is biased to cell type proportions in the reference snRNA-seq data set
Bisque and hspe had relatively fast runtimes and low memory requirements

Be sure to check out the paper for more! 📃

DeconvoBuddies

In conjunction with this study we have developed a Bioconductor package DeconvoBuddies.

DeconvoBuddies is currently on the devel branch and will be included in the next release of Bioconductor (April 2025).

The main features of the package are:

Find Marker Genes

Implements Mean Ratio marker gene selection get_mean_ratio()
Implements 1 vs. All marker gene selection findMarkers_1vALL()

Plotting tools

Quickly plot gene expression over cell types (or other category) plot_gene_express()
Plot top marker genes with annotated statistics plot_marker_express()
Plot composition bar plots of deconvolution outputs plot_composition_bar()

Access Data

Access paired data from consecutive slices of human DLPFC, used in deconvolution benchmark fetch_deconvo_data()
- Access the RNAScope, snRNA-seq, and bulk RNA-seq data described above

Truly TL;DR

In this benchmark we used a multi-assay dataset from the human DLPFC to compare deconvolution performance in six top methods. RNAScope/IF cell type estimates were utilized as an orthogonal measurement of the true cell type composition. We developed the Mean Ratio method to select highly specific cell type marker genes.

The top performing deconvolution methods in brain were hspe (Hunt et al. 2019) and Bisque (Jew et al. 2020). 🏆

We found that many factors, such as the number of reference donors, marker gene selection, and the library type of the bulk RNA-seq, can impact the performance of deconvolution methods. The dataset, Mean Ratio function, and other useful functions for deconvolution are included in our Bioconductor package DeconvoBuddies.

Be sure to check out the paper for the full exploration of deconvolution method performance (Huuki-Myers et al., n.d.)! https://doi.org/10.1186/s13059-025-03552-3

References

Avila Cobos, Francisco, José Alquicira-Hernandez, Joseph E. Powell, Pieter Mestdagh, and Katleen De Preter. 2020. “Benchmarking of Cell Type Deconvolution Pipelines for Transcriptomics Data.” Nature Communications 11 (November): 5650. https://doi.org/10.1038/s41467-020-19015-1.

Chu, Tinyi, Zhong Wang, Dana Pe’er, and Charles G. Danko. 2022. “Cell Type and Gene Expression Deconvolution with BayesPrism Enables Bayesian Integrative Analysis Across Bulk and Single-Cell RNA Sequencing in Oncology.” Nature Cancer 3 (4): 505–17. https://doi.org/10.1038/s43018-022-00356-3.

Dai, Rujia, Tianyao Chu, Ming Zhang, Xuan Wang, Alexandre Jourdon, Feinan Wu, Jessica Mariani, et al. n.d. “Evaluating Performance and Applications of Sample-Wise Cell Deconvolution Methods on Human Brain Transcriptomic Data.” https://doi.org/10.1101/2023.03.13.532468.

Hippen, Ariel A., Dalia K. Omran, Lukas M. Weber, Euihye Jung, Ronny Drapkin, Jennifer A. Doherty, Stephanie C. Hicks, and Casey S. Greene. 2023. “Performance of Computational Algorithms to Deconvolve Heterogeneous Bulk Ovarian Tumor Tissue Depends on Experimental Factors.” Genome Biology 24 (1): 239. https://doi.org/10.1186/s13059-023-03077-7.

Hunt, Gregory J, Saskia Freytag, Melanie Bahlo, and Johann A Gagnon-Bartsch. 2019. “Dtangle: Accurate and Robust Cell Type Deconvolution.” Bioinformatics 35 (12): 2093–99. https://doi.org/10.1093/bioinformatics/bty926.

Huuki-Myers, Louise A., Kelsey D. Montgomery, Sang Ho Kwon, Sophia Cinquemani, Nicholas J. Eagles, Daianna Gonzalez-Padilla, Sean K. Maden, et al. n.d. “Benchmark of Cellular Deconvolution Methods Using a Multi-Assay Reference Dataset from Postmortem Human Prefrontal Cortex.” https://doi.org/10.1101/2024.02.09.579665.

Jew, Brandon, Marcus Alvarez, Elior Rahmani, Zong Miao, Arthur Ko, Kristina M. Garske, Jae Hoon Sul, Kirsi H. Pietiläinen, Päivi Pajukanta, and Eran Halperin. 2020. “Accurate Estimation of Cell Composition in Bulk Expression Through Robust Integration of Single-Cell Information.” Nature Communications 11 (1): 1971. https://doi.org/10.1038/s41467-020-15816-6.

Jin, Haijing, and Zhandong Liu. 2021. “A Benchmark for RNA-Seq Deconvolution Analysis Under Dynamic Testing Environments.” Genome Biology 22 (1): 102. https://doi.org/10.1186/s13059-021-02290-6.

Newman, Aaron M., Chloé B. Steen, Chih Long Liu, Andrew J. Gentles, Aadel A. Chaudhuri, Florian Scherer, Michael S. Khodadoust, et al. 2019. “Determining Cell Type Abundance and Expression from Bulk Tissues with Digital Cytometry.” Nature Biotechnology 37 (7): 773–82. https://doi.org/10.1038/s41587-019-0114-2.

Tsoucas, Daphne, Rui Dong, Haide Chen, Qian Zhu, Guoji Guo, and Guo-Cheng Yuan. 2019. “Accurate Estimation of Cell-Type Composition from Gene Expression Data.” Nature Communications 10 (July): 2975. https://doi.org/10.1038/s41467-019-10802-z.

Wang, Xuran, Jihwan Park, Katalin Susztak, Nancy R. Zhang, and Mingyao Li. 2019. “Bulk Tissue Cell Type Deconvolution with Multi-Subject Single-Cell Expression Reference.” Nature Communications 10 (1): 380. https://doi.org/10.1038/s41467-018-08023-x.