Louise A. Huuki-Myers

Deconvolution Benchmark: TL;DR

Louise A. Huuki-Myers — Wed, 09 Apr 2025 00:00:00 GMT

Introduction

This blog post provides a high-level summary of our paper “Benchmark of cellular deconvolution methods using a multi-assay dataset from postmortem human prefrontal cortex”, published in Genome Biology in April 2025 (Huuki-Myers et al., n.d.).

In this deconvolution benchmark project, we set out to determine the most accurate method for predicting cell type composition in bulk RNA-seq data from brain tissue. We also evaluated methods for selecting marker genes and introduced the MeanRatio method for marker gene selection. The dataset developed for this experiment, the MeanRatio functions, and other helpful tools for deconvolution are available in the DeconvoBuddies Bioconductor package.

What is deconvolution?

Complex tissues are made up of different cell types that express genes at different levels. In bulk RNA-seq this heterogeneity is obscured: the gene expression measurements represent a mixture of all of the cells and cell types in the sample. Differences in cell type composition between samples, whether technical or biologically real, can confound downstream analyses such as differential expression.

Deconvolution is an analysis that infers the cell type composition of bulk RNA-seq samples using cell type-specific gene expression profiles from single cell data.

Cartoon overview of deconvolution
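Most deconvolution methods build on the same mixture model: a bulk expression profile is approximately a weighted sum of cell type reference profiles, where the weights are the cell type proportions. The sketch below is a minimal illustration of that idea, not any of the methods benchmarked here. It solves non-negative least squares (NNLS, which also appears as a baseline in the MuSiC paper's comparison) with projected gradient descent in plain Python. The function name `deconvolve_nnls` and the toy signature matrix are hypothetical.

```python
def deconvolve_nnls(signature, bulk, n_iter=5000):
    """Estimate cell type proportions for one bulk sample (toy NNLS sketch).

    signature: genes x cell types matrix (list of rows); each row holds the
               mean expression of one gene in each reference cell type.
    bulk:      expression of the same genes, in the same order, in the bulk sample.
    Returns non-negative weights rescaled to sum to 1 (one per cell type).
    """
    n_genes, n_types = len(signature), len(signature[0])
    # 1 / (squared Frobenius norm) is a safe step size: the squared Frobenius
    # norm upper-bounds the Lipschitz constant of the least-squares gradient.
    step = 1.0 / sum(v * v for row in signature for v in row)
    weights = [0.0] * n_types
    for _ in range(n_iter):
        # residual = signature @ weights - bulk
        residual = [
            sum(signature[g][c] * weights[c] for c in range(n_types)) - bulk[g]
            for g in range(n_genes)
        ]
        # gradient of 0.5 * ||residual||^2 is signature^T @ residual
        grad = [
            sum(signature[g][c] * residual[g] for g in range(n_genes))
            for c in range(n_types)
        ]
        # gradient step, then project back onto weights >= 0
        weights = [max(0.0, w - step * d) for w, d in zip(weights, grad)]
    total = sum(weights)
    return [w / total for w in weights] if total > 0 else weights


# Toy reference: 4 genes x 3 cell types (columns: Excit, Astro, Micro).
signature = [
    [10, 1, 1],
    [1, 10, 1],
    [1, 1, 10],
    [2, 2, 2],
]
# Bulk sample simulated as 50% Excit, 30% Astro, 20% Micro.
bulk = [5.5, 3.7, 2.8, 2.0]
print([round(p, 3) for p in deconvolve_nnls(signature, bulk)])  # [0.5, 0.3, 0.2]
```

Rescaling the non-negative weights to sum to 1 turns them into proportions. The benchmarked methods add gene weighting, bias correction, or Bayesian modeling on top of this basic setup.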

How to perform deconvolution?

To run deconvolution you’ll need:

Your bulk RNA-seq gene expression data
A reference single cell RNA-seq gene expression dataset from the same tissue type
A deconvolution method (computational algorithm)

Available Deconvolution Methods

Reviewing the literature, we found more than 20 available deconvolution methods. This presents quite an overwhelming choice for researchers! Are there big differences between methods? If so, how can we choose the most accurate one?

Choosing a method

Existing Benchmarks

Benchmark studies aim to test and rank the performance of available methods. There have been several benchmark studies of deconvolution methods, both within papers presenting new methods and as separate studies. However, there is little consensus on which method is the most accurate:

Benchmarking results from different papers on “real” data

MuSiC paper (Wang et al. 2019): MuSiC > NNLS > BSEQ-sx > CIBERSORT
Bisque paper (Jew et al. 2020): Bisque > MuSiC > CIBERSORT
Cobos benchmark (Avila Cobos et al. 2020): DWLS > MuSiC > Bisque > deconvoSeq
Jin et al. benchmark (Jin and Liu 2021): CIBERSORT, MuSiC > EPIC*, TIMER, DeconRNAseq
Dai et al. benchmark (Dai et al., n.d.): Dtangle > Bisque > Other Methods

Additionally, the Cobos et al. 2020 benchmark study showed that different methods perform best on different datasets (Avila Cobos et al. 2020).

A challenge in benchmark studies is obtaining a “ground truth” estimate of cell type composition. Benchmarks often use pseudobulk mixtures, created by pooling the single cell data, as the bulk data, so the true composition is known.
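To make the pseudobulk idea concrete, here is a minimal sketch that pools cells into one pseudobulk sample. The function name `make_pseudobulk` and the toy counts are hypothetical. Because we choose which cells go into the mixture, the true composition comes for free:

```python
from collections import Counter


def make_pseudobulk(cell_counts, cell_labels):
    """Build one pseudobulk sample from single cell data (toy sketch).

    cell_counts: list of per-cell gene count vectors (same gene order).
    cell_labels: cell type label for each cell, aligned with cell_counts.
    Returns (pseudobulk counts summed over cells, true cell type fractions).
    """
    n_genes = len(cell_counts[0])
    # Summing counts across cells mimics sequencing the mixture as one sample.
    pseudobulk = [sum(cell[g] for cell in cell_counts) for g in range(n_genes)]
    # The composition is known exactly because we chose which cells to pool.
    n_cells = len(cell_labels)
    fractions = {ct: n / n_cells for ct, n in Counter(cell_labels).items()}
    return pseudobulk, fractions


cells = [[5, 0, 1], [3, 1, 0], [0, 4, 2], [1, 0, 0]]
labels = ["Excit", "Excit", "Astro", "Micro"]
bulk, truth = make_pseudobulk(cells, labels)
print(bulk)   # [9, 5, 3]
print(truth)  # {'Excit': 0.5, 'Astro': 0.25, 'Micro': 0.25}
```

The catch is that the mixture is built from the very same single cell measurements used as the reference, which is one reason it may not behave like real bulk RNA-seq.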

However, we think pseudobulk data might not be a faithful stand-in for real bulk RNA-seq data. It is better to pair real bulk RNA-seq data with an orthogonal measurement of cell type composition. We were also curious about how methods perform specifically on brain RNA-seq data.

This motivated us to run our own deconvolution benchmark study!

Deconvolution Benchmark Study

Benchmark study design: Use orthogonal RNAScope cell type proportions to evaluate accuracy of deconvolution methods

Study Design

We designed an experiment to evaluate the performance of deconvolution methods on human brain tissue, specifically the dorsolateral prefrontal cortex (DLPFC). We used consecutive slices of 22 DLPFC brain blocks from 10 neurotypical donors to create three assays:

RNAScope: orthogonal measurement of cell type compositions for six major cell types (n=25)
snRNA-seq: reference single nucleus data (n=19)
Bulk RNA-seq: using a variety of library types and RNA extraction methods (n=110)

RNAScope Cell Type Proportions 🔬

To obtain orthogonal measurements of cell type proportions for six major cell types in the DLPFC, we utilized multiplex single molecule fluorescent in situ hybridization (smFISH) combined with immunofluorescence (IF) using RNAScope/IF.

We designed two probe combinations:

Star measures:
1. Excitatory Neurons (Excit)
2. Microglia (Micro)
3. combined Oligodendrocytes and Oligodendrocyte Precursor Cells (OligoOPC)
Circle measures:
1. Inhibitory Neurons (Inhib)
2. Endothelial/Mural cells (EndoMural)
3. Astrocytes (Astro)

We used HALO to segment and label cell types, then calculated cell type proportions for each sample.

RNAScope/IF Experiment Design. A. Star and Circle probe combinations measure 3 cell types each. Example fluorescent images of B. Star and C. Circle. D. Bar plots of estimated cell type compositions

Single Nucleus Reference dataset

The snRNA-seq data was previously analyzed as part of the spatialDLPFC project (see Huuki-Myers et al. 2024 or our previous blog post for more details). This reference consists of 56k nuclei from 19 samples, annotated into seven broad cell types.

tSNE plot and overall cell type composition for snRNA-seq dataset

Bulk RNA-seq Data

For the bulk RNA-seq, we were curious whether different library types (polyA or RiboZero) and RNA extractions (nuclear, cytoplasmic, or total) would impact the accuracy of deconvolution. So for each brain block we prepared one sample of each library type and RNA extraction combination.

Analyzing the bulk RNA-seq data alone, we saw large differences in gene expression between the different preparations: principal component analysis shows the data separate by library type and RNA extraction. We suspected these technical differences in gene expression would impact deconvolution estimates; a good deconvolution method should be robust to such differences between data types.

Tile plot showing the number of samples for each library type and RNA extraction; PCA of gene expression shows that PC1 separates library type and PC2 separates RNA extraction

Which methods to test?

From the large number of available methods, we selected six that had been top performers in previous benchmark papers and that represent a range of different approaches: DWLS, Bisque, MuSiC, hspe, BayesPrism, and CIBERSORTx (detailed below).

Method	Citation	Approach	Marker Gene Selection	Availability	Top Benchmark Performance
DWLS (Dampened weighted least-squares)	(Tsoucas et al. 2019)	weighted least squares	-	R package on CRAN	(Avila Cobos et al. 2020)
Bisque	(Jew et al. 2020)	Bias correction: Assay	-	R package on GitHub	(Dai et al., n.d.)
MuSiC (Multi-subject Single-cell)	(Wang et al. 2019)	Bias correction: Source	Weights genes	R package on GitHub	(Jin and Liu 2021)
BayesPrism	(Chu et al. 2022)	Bayesian	Pairwise t-test	Webtool, R package on GitHub	(Hippen et al. 2023)
hspe (dtangle) (hybrid-scale proportion estimation)	(Hunt et al. 2019)	High collinearity adjustment	Multiple options; default “ratio” (1 vs. All mean expression ratio)	R package on GitHub	(Dai et al., n.d.)
CIBERSORTx	(Newman et al. 2019)	Machine Learning	Differential gene expression	Webtool, Docker Image	(Jin and Liu 2021)

Marker Gene Selection

One strategy to improve deconvolution accuracy is to restrict the analysis to a set of cell type marker genes, which reduces noise. To help select cell type-specific marker genes, we developed the Mean Ratio method.

The Mean Ratio method selects genes with a large difference in expression between the target cell type and the closest non-target cell type. For each gene, we calculate the MeanRatio for a target cell type by dividing the gene's mean expression in the target cell type by its mean expression in the next-highest non-target cell type. Genes with the highest MeanRatio values are selected as marker genes.

Illustration of Mean Ratio marker selection method, and heatmap of top Mean Ratio marker genes
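The Mean Ratio calculation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the DeconvoBuddies implementation (use `get_mean_ratio()` in R for real analyses). The function name `mean_ratio_markers`, the input layout (a dict of per-cell values, e.g. log-normalized counts), the toy genes, and the choice to skip genes where the target is not the highest-expressing cell type are all assumptions of this sketch.

```python
def mean_ratio_markers(expr, labels, target, top_n=25):
    """Rank candidate marker genes for one cell type by a Mean Ratio-style score.

    expr:   dict mapping gene -> expression values, one per cell/nucleus
            (e.g. log-normalized counts; an assumption of this sketch).
    labels: cell type label of each cell, aligned with the expression values.
    target: the cell type to find markers for.
    Returns up to top_n (gene, ratio) pairs, highest ratio first.
    """
    cell_types = sorted(set(labels))
    scored = []
    for gene, values in expr.items():
        # mean expression of this gene in each cell type
        means = {}
        for ct in cell_types:
            ct_values = [v for v, lab in zip(values, labels) if lab == ct]
            means[ct] = sum(ct_values) / len(ct_values)
        target_mean = means[target]
        # the "next highest" non-target cell type
        next_highest = max(m for ct, m in means.items() if ct != target)
        # Assumption of this sketch: keep only genes where the target is highest.
        if target_mean <= next_highest:
            continue
        # A gene with zero mean in every other cell type gets an infinite ratio.
        ratio = target_mean / next_highest if next_highest > 0 else float("inf")
        scored.append((gene, ratio))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy data: 6 nuclei from 3 cell types and 4 hypothetical genes.
labels = ["Excit", "Excit", "Astro", "Astro", "Micro", "Micro"]
expr = {
    "geneA": [4, 6, 1, 1, 0, 0],  # Excit mean 5, next highest 1
    "geneB": [2, 2, 2, 2, 1, 1],  # tied with Astro, skipped
    "geneC": [3, 3, 0, 2, 1, 1],  # Excit mean 3, next highest 1
    "geneD": [1, 1, 0, 0, 0, 0],  # only expressed in Excit
}
print(mean_ratio_markers(expr, labels, "Excit", top_n=2))
# [('geneD', inf), ('geneA', 5.0)]
```

Setting `top_n=25` would correspond to the "top 25 marker genes per cell type" set used in the benchmark, applied once per cell type.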

For more information about selecting marker genes with Mean Ratio, see Finding Marker Genes with DeconvoBuddies.

In our benchmark, we found that methods responded differently and unpredictably to different marker gene sets, but the top methods performed better using the top 25 Mean Ratio marker genes for each cell type.

Method Performance 🏆

On to the main event: time to evaluate the deconvolution methods!

We performed deconvolution on the 110 bulk RNA-seq samples with each of the six selected methods, using the top 25 Mean Ratio marker genes for each cell type.

We then compared the estimated cell type proportions with the RNAScope cell type proportions. We calculated Pearson’s correlation and the root mean squared error (RMSE) between the two. Methods with high correlation and low RMSE are the most accurate.
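Both accuracy metrics are straightforward to compute once each method's estimates are paired with the RNAScope proportions for the same sample and cell type. Here is a minimal sketch with hypothetical proportions. Pooling values across cell types this way is an assumption of the sketch, not necessarily how every figure panel was computed.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rmse(x, y):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Hypothetical proportions for one sample, in the order
# Excit, Inhib, Astro, Micro, OligoOPC, EndoMural.
rnascope = [0.30, 0.10, 0.15, 0.05, 0.35, 0.05]
estimated = [0.35, 0.08, 0.12, 0.05, 0.33, 0.07]
print(round(pearson(rnascope, estimated), 3), round(rmse(rnascope, estimated), 3))
```

The two metrics are complementary: correlation rewards getting the relative ordering of proportions right, while RMSE also penalizes systematic over- or under-estimation.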

Overall, Bisque and hspe were the top performing methods. 🏆

These were also the top methods in the Dai et al. benchmark, which also examined brain data (Dai et al., n.d.).

Bisque performed slightly better in polyA data, and hspe slightly better in RiboZero data. CIBERSORTx was a close third, performing similarly to Bisque and hspe in polyA data.

A. Scatter plot of RNAScope proportions vs. method-estimated proportions. B. Pearson’s correlation for each method across bulk RNA-seq library combinations; point size corresponds to RMSE

Other Results

Above I have highlighted the main study design and conclusions of our deconvolution benchmark. In the paper we explored many more facets of deconvolution method performance. Some other results to highlight:

hspe is sensitive to marker gene selection
Bisque can perform poorly with fewer than 4 donors
Bisque and hspe are unaffected by including “case” donors in the snRNA-seq reference
Bisque is biased toward the cell type proportions in the reference snRNA-seq dataset
Bisque and hspe had relatively fast runtimes and low memory requirements

Be sure to check out the paper for more! 📃

DeconvoBuddies

In conjunction with this study we have developed a Bioconductor package DeconvoBuddies.

DeconvoBuddies is currently on the devel branch and will be included in the next Bioconductor release (April 2025).

The main features of the package are:

Find Marker Genes

Implements Mean Ratio marker gene selection: get_mean_ratio()
Implements 1 vs. All marker gene selection: findMarkers_1vALL()

Plotting tools

Quickly plot gene expression over cell types (or another category): plot_gene_express()
Plot top marker genes with annotated statistics: plot_marker_express()
Plot composition bar plots of deconvolution outputs: plot_composition_bar()

Access Data

Access paired data from consecutive slices of human DLPFC used in the deconvolution benchmark: fetch_deconvo_data()
- Access the RNAScope, snRNA-seq, and bulk RNA-seq data described above

Truly TL;DR

In this benchmark, we used a multi-assay dataset from the human DLPFC to compare the deconvolution performance of six top methods. RNAScope/IF cell type estimates served as an orthogonal measurement of the true cell type composition. We developed the Mean Ratio method to select highly specific cell type marker genes.

The top performing deconvolution methods in brain were hspe (Hunt et al. 2019) and Bisque (Jew et al. 2020). 🏆

We found that many factors, such as the number of reference donors, marker gene selection, and the library type of the bulk RNA-seq, can impact the performance of deconvolution methods. The dataset, MeanRatio function, and other useful functions for deconvolution are included in our Bioconductor package DeconvoBuddies.

Be sure to check out the paper for the full exploration of deconvolution method performance (Huuki-Myers et al., n.d.)! https://doi.org/10.1186/s13059-025-03552-3

References

Avila Cobos, Francisco, José Alquicira-Hernandez, Joseph E. Powell, Pieter Mestdagh, and Katleen De Preter. 2020. “Benchmarking of Cell Type Deconvolution Pipelines for Transcriptomics Data.” Nature Communications 11 (November): 5650. https://doi.org/10.1038/s41467-020-19015-1.

Chu, Tinyi, Zhong Wang, Dana Pe’er, and Charles G. Danko. 2022. “Cell Type and Gene Expression Deconvolution with BayesPrism Enables Bayesian Integrative Analysis Across Bulk and Single-Cell RNA Sequencing in Oncology.” Nature Cancer 3 (4): 505–17. https://doi.org/10.1038/s43018-022-00356-3.

Dai, Rujia, Tianyao Chu, Ming Zhang, Xuan Wang, Alexandre Jourdon, Feinan Wu, Jessica Mariani, et al. n.d. “Evaluating Performance and Applications of Sample-Wise Cell Deconvolution Methods on Human Brain Transcriptomic Data.” https://doi.org/10.1101/2023.03.13.532468.

Hippen, Ariel A., Dalia K. Omran, Lukas M. Weber, Euihye Jung, Ronny Drapkin, Jennifer A. Doherty, Stephanie C. Hicks, and Casey S. Greene. 2023. “Performance of Computational Algorithms to Deconvolve Heterogeneous Bulk Ovarian Tumor Tissue Depends on Experimental Factors.” Genome Biology 24 (1): 239. https://doi.org/10.1186/s13059-023-03077-7.

Hunt, Gregory J, Saskia Freytag, Melanie Bahlo, and Johann A Gagnon-Bartsch. 2019. “Dtangle: Accurate and Robust Cell Type Deconvolution.” Bioinformatics 35 (12): 2093–99. https://doi.org/10.1093/bioinformatics/bty926.

Huuki-Myers, Louise A., Kelsey D. Montgomery, Sang Ho Kwon, Sophia Cinquemani, Nicholas J. Eagles, Daianna Gonzalez-Padilla, Sean K. Maden, et al. n.d. “Benchmark of Cellular Deconvolution Methods Using a Multi-Assay Reference Dataset from Postmortem Human Prefrontal Cortex.” https://doi.org/10.1101/2024.02.09.579665.

Jew, Brandon, Marcus Alvarez, Elior Rahmani, Zong Miao, Arthur Ko, Kristina M. Garske, Jae Hoon Sul, Kirsi H. Pietiläinen, Päivi Pajukanta, and Eran Halperin. 2020. “Accurate Estimation of Cell Composition in Bulk Expression Through Robust Integration of Single-Cell Information.” Nature Communications 11 (1): 1971. https://doi.org/10.1038/s41467-020-15816-6.

Jin, Haijing, and Zhandong Liu. 2021. “A Benchmark for RNA-Seq Deconvolution Analysis Under Dynamic Testing Environments.” Genome Biology 22 (1): 102. https://doi.org/10.1186/s13059-021-02290-6.

Newman, Aaron M., Chloé B. Steen, Chih Long Liu, Andrew J. Gentles, Aadel A. Chaudhuri, Florian Scherer, Michael S. Khodadoust, et al. 2019. “Determining Cell Type Abundance and Expression from Bulk Tissues with Digital Cytometry.” Nature Biotechnology 37 (7): 773–82. https://doi.org/10.1038/s41587-019-0114-2.

Tsoucas, Daphne, Rui Dong, Haide Chen, Qian Zhu, Guoji Guo, and Guo-Cheng Yuan. 2019. “Accurate Estimation of Cell-Type Composition from Gene Expression Data.” Nature Communications 10 (July): 2975. https://doi.org/10.1038/s41467-019-10802-z.

Wang, Xuran, Jihwan Park, Katalin Susztak, Nancy R. Zhang, and Mingyao Li. 2019. “Bulk Tissue Cell Type Deconvolution with Multi-Subject Single-Cell Expression Reference.” Nature Communications 10 (1): 380. https://doi.org/10.1038/s41467-018-08023-x.

Spatial DLPFC: TL;DR

Louise A. Huuki-Myers — Thu, 23 May 2024 00:00:00 GMT

Introduction

This blog post provides a high-level summary of our paper “A data-driven single cell and spatial transcriptomic map of the human prefrontal cortex”, published in Science in May 2024 (aka spatialDLPFC) (Huuki-Myers et al. 2024).

In the spatialDLPFC project, we set out to learn more about the organization of the dorsolateral prefrontal cortex (aka DLPFC), its cell types, and its gene expression profiles 🧠.

Graphical abstract for the spatialDLPFC project published in Science

Background

DLPFC

The dorsolateral prefrontal cortex region of the brain is especially important for executive functions including working memory, cognitive flexibility, and planning. Disruptions of the DLPFC have been associated with several psychiatric and neurodevelopmental disorders, including schizophrenia and autism spectrum disorder.

Location of the DLPFC, its laminar structure (illustration from (House and Pansky, n.d.)), and major cell types.

RNA-sequencing

One of the ways that we can understand the functions of different cell types and structures in the brain is to study what genes they express by sequencing the RNA in a tissue. Recently, several advanced transcriptomic¹ approaches using RNA sequencing have emerged, enhancing our ability to analyze gene expression in the brain.

This LEGO brain schematic demonstrates the evolution from bulk RNA sequencing, which provides a mixture of cell types, to single cell/single nucleus RNA-seq, which reveals the transcriptional profiles of individual cell types. The latest advancement, spatial transcriptomics, links gene expression to specific anatomical locations, providing deeper insights into the relationships between brain structure and function.

Single Nucleus RNA-seq

Single nucleus or single cell RNA sequencing (snRNA-seq or scRNA-seq) enables us to examine the gene expression of individual nuclei or cells. This technique relies on uniquely barcoded gel beads, each paired with a single cell or nucleus, that tag all of the RNA molecules from that cell. When sequenced, these tagged RNA molecules can be traced back to their original cell. Cells or nuclei are then typically clustered by their gene expression profiles to identify different cell type populations. The expression profiles and cluster assignments are often visualized using reduced dimension plots such as UMAP or tSNE. In these plots, each point represents a cell, and the distance between points reflects their similarity²: closer points represent more similar cells, which are often of the same cell type (shown by different colors).

Cartoon of 10x snRNA-seq process (via 10x Genomics), and tSNE plot
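The barcode bookkeeping can be sketched as grouping reads by barcode. This toy version (hypothetical function name, barcodes, and reads) also uses unique molecular identifiers (UMIs), which 10x libraries include so that PCR copies of one molecule are counted once; UMIs are not discussed above. The same idea applies to Visium, where the barcode identifies a spot instead of a cell.

```python
from collections import defaultdict


def count_matrix(reads):
    """Collapse barcoded reads into per-cell gene counts (toy sketch).

    reads: iterable of (cell_barcode, umi, gene) tuples, one per sequenced read.
    Reads sharing the same barcode, UMI, and gene are treated as PCR copies of
    one RNA molecule and counted once.
    Returns {cell_barcode: {gene: molecule count}}.
    """
    molecules = set(reads)  # de-duplicate PCR copies
    counts = defaultdict(lambda: defaultdict(int))
    for barcode, _umi, gene in molecules:
        counts[barcode][gene] += 1
    # convert nested defaultdicts to plain dicts for readability
    return {bc: dict(genes) for bc, genes in counts.items()}


reads = [
    ("AAACCTGA", "UMI1", "SNAP25"),
    ("AAACCTGA", "UMI1", "SNAP25"),  # PCR duplicate of the read above
    ("AAACCTGA", "UMI2", "SNAP25"),
    ("AAACCTGA", "UMI3", "MBP"),
    ("TTTGGTCA", "UMI4", "MBP"),
]
counts = count_matrix(reads)
print(counts["AAACCTGA"]["SNAP25"], counts["TTTGGTCA"]["MBP"])  # 2 1
```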

In this experiment we are working with nuclei, as the cell membrane is destroyed when the brain tissue is frozen. The major cell type populations to identify in the DLPFC are neurons (Excitatory and Inhibitory), glial cells (e.g., Astrocytes, Microglia, Oligodendrocytes, OPCs), and vascular cells (Endothelial & Mural).

Spatially Resolved Transcriptomics (Visium)

Spatially resolved transcriptomics maps RNA to specific locations on a tissue sample, allowing us to profile gene expression across anatomical features such as blood vessels, glands, or, in our case, layers of the brain’s cortex.

Cartoon of Visium spatial transcriptomics (via 10x genomics), and example spot plots

We used Visium slides, which feature a grid of approximately 5,000 spots arranged in a 6.5x6.5 mm area. Each spot has a unique barcode that binds to the RNA in the contacted tissue. When the RNA is sequenced, these molecules can be traced back to their specific grid locations, similar to the barcodes in snRNA-seq.

This RNA-seq data is paired with a high-definition histology image of the original tissue, providing additional information and aiding in data visualization. We can visualize the gene expression of each spot in “spot plots” using color gradients overlaid on these images. In the example above, we highlight the location of the gray matter with SNAP25, a gene highly expressed in neurons; MBP highlights white matter, and PCP4 marks layer 5.

Study Design

Study design for spatialDLPFC

In this study, we analyzed the DLPFC of ten healthy adult donors, sampling three locations along the DLPFC: anterior, middle, and posterior. All 30 samples were analyzed with Visium spatial transcriptomics, and 19 (about 2 from each donor) were selected for snRNA-seq.

Data-Driven Spatial Domains

An earlier Lieber Institute study of spatial transcriptomics in the DLPFC (Maynard et al. 2021) relied on manually annotating the known layers of the cortex based on the histological images and the expression of select genes. This dataset has been invaluable for testing methodologies in spatial transcriptomics. However, manual annotation is tedious, time-consuming, and prone to human error and bias.

In our current study, which builds on the previous DLPFC project, we aimed to use unsupervised clustering to annotate the layers of the DLPFC, thereby avoiding the labor-intensive process of manual annotation and potentially discovering novel or unknown layers in the brain.

Based on benchmarking against the manually annotated layer data, we chose BayesSpace as the best method for clustering spatial data. We clustered the 30 Visium slides across a wide range of resolutions, from k=2 to 28 (k denotes the number of clusters). We refer to these clusters as spatial domains. To name them, we used the syntax SpkDd, where k is the clustering resolution and d is the spatial domain number; for example, Sp9D1 is spatial domain 1 when k=9.

We found that k=2 did a great job separating the white matter from the gray matter. With an increasing number of clusters, the layers of the cortex begin to emerge. This brings us to a question: which level of clustering best captures biologically important layers of the DLPFC?

A. Histological images of three DLPFC tissue sections B. spatial clustering at k=2, 9, and 16

Spatial Registration of BayesSpace Clusters

To check which resolution of BayesSpace clusters best matches the six histological layers plus white matter, we used an analysis we developed called “spatial registration”. We will delve into the details of this analysis in a future blog post; its application is also demonstrated in this vignette.

Briefly, this analysis compares the gene expression profiles of a reference set of clusters (spatial regions or domains, annotated features, cell type populations, etc.; here, the manual annotations from the pilot dataset) to a query set of clusters we want to learn more about (here, the BayesSpace clusters). The t-statistics from an enrichment analysis in the query and reference sets are correlated pairwise across all groups. We visualize this in a heatmap where high correlation is green and low correlation is purple. When a query cluster is highly correlated with a reference cluster, we say the two groups are associated, and if the correlation passes our threshold we annotate the query group with the reference label.

In the example below, one k=7 spatial domain has a high correlation with the manually annotated white matter, so we annotate it as white matter. This annotation helps add biological context to our newly defined spatial domains.

Example spatial registration between manual layers and k=7 BayesSpace clusters
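A stripped-down version of the correlate-then-threshold step can be sketched as follows. The function name `register_clusters`, the toy t-statistics, and the cluster names are hypothetical; the 0.25 default mirrors the confidence cutoff quoted for the snRNA-seq registration later in this post, and the merge-ratio rule for multi-layer annotations is omitted. This is not the implementation used in the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def register_clusters(query_t, reference_t, threshold=0.25):
    """Annotate query clusters with their best-correlated reference cluster.

    query_t / reference_t: dict cluster -> enrichment t-statistics over the
    same genes, in the same order. Returns (correlations, annotations); a query
    cluster is labelled only if its best correlation exceeds `threshold`.
    """
    cors = {
        q: {r: pearson(qt, rt) for r, rt in reference_t.items()}
        for q, qt in query_t.items()
    }
    annotations = {}
    for q, row in cors.items():
        best_ref = max(row, key=row.get)
        annotations[q] = best_ref if row[best_ref] > threshold else "unassigned"
    return cors, annotations


# Toy t-statistics for 4 genes (hypothetical values and cluster names).
reference = {"L1": [3, 0, -1, -2], "WM": [-2, -1, 0, 3]}
query = {"D1": [2.5, 0.5, -1, -2], "D2": [-1, -1, 0, 2], "D3": [-1, 5, -5, 1]}
cors, annotations = register_clusters(query, reference)
print(annotations)  # {'D1': 'L1', 'D2': 'WM', 'D3': 'unassigned'}
```

Correlating t-statistics rather than raw expression compares how each cluster differs from the rest of its own dataset, which is what lets clusters from different assays be matched.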

From this process we learned that k=9 best recapitulated the expected pattern of six layers plus white matter, with each spatial domain matching only one layer. In contrast, at the k=7 resolution some spatial domains matched more than one layer. At k=9, white matter and Layer 1 were each split into two spatial domains with unique gene expression.

BayesSpace k=9 cluster spatial registration vs. manual layers

For higher resolution clustering, the fast H+ statistic identified k=16 as the optimal number of clusters, making it the best-supported resolution based on the data. This further split the six original layers into 2-3 sub-layers each. The maximum number of clusters we could comfortably run on our computing setup was k=28; at this high number of clusters we lose the laminar definition.

Novel Biology in Spatial Domains

So what does all this clustering and layer matching help us learn about the brain?

At each resolution, we detected differentially expressed genes between the spatial domains, showing the complex organization of gene expression across the DLPFC tissue.

Clustering at k=9 with spatial domains highlighted, spot plots, and violin plots of CLDN5 expression

The data-driven clustering at k=9 revealed a sub-layer of the white matter with as large a difference in gene expression as exists between the previously recognized layers. It also found a thin band of vascular tissue in Layer 1 with high expression of endothelial genes like CLDN5. These were both novel findings resulting from the unsupervised clustering. The sub-layers found at k=16 also had distinct gene expression profiles.

These new spatial domains help refine the layered anatomy of the DLPFC. Neat! 🎉

Single Nucleus RNA-seq

Cartoons of brain cell types, Created with BioRender.com

On the single nucleus side of the experiment, we processed 56k nuclei from n=19 samples. The first round of clustering (hierarchical clustering) found 29 distinct cell type clusters from seven broad cell types (note the abbreviations):

Glia & Vascular cells: provide structure to the brain, support neurons³

Astrocytes (Astro): link neurons to blood supply, clear neurotransmitters
Endothelial/Mural cells (EndoMural): blood vessels/vascular tissue
Microglia (Micro): immune function
Oligodendrocytes (Oligo): myelin sheath
Oligodendrocyte Precursor cells (OPCs)

Neurons: send and receive signals in the brain

Excitatory Neurons (Excit)
Inhibitory Neurons (Inhib)

tSNE plot of snRNA-seq with 29 hierarchical clusters

Sub-populations in EndoMural, Oligos, and the Excit/Inhib Neurons were found in the first round of clustering.

In the DLPFC we know that different populations of excitatory neurons exist between the six layers of gray matter. To annotate our 13 Excit clusters we brought back our spatial registration tool, comparing all of the 29 hierarchical clusters to the manually annotated clusters from (Maynard et al. 2021) as well as the BayesSpace spatial domains at k=9 & k=16.

Spatial registration between the 29 snRNA-seq hierarchical clusters vs. histological layers or spatial domains at k=9 & 16. Annotations with good confidence (cor > 0.25, merge ratio = 0.1) are marked with “X” and poor confidence are marked with “*”.

We found that the Oligo and OPC cell types mapped to white matter, and EndoMural plus Astro mapped to Layer 1. Inhib neurons had a weak association with Layers 2-4, and the Excit neurons each had strong associations with 1-3 layers across the gray matter. The same patterns were found, and refined, in the spatial domains; for example, the EndoMural groups also mapped to a specific spatial domain.

tSNE plot of snRNA-seq data with layer level annotations

The layer associations were used to annotate the excitatory neuron populations by their most strongly associated layers, while other cell types were collapsed to their broad cell types. This resulted in our “layer-level” annotation with 13 cell types, including 7 populations of Excit neurons.

Heatmap of the scaled pseudo-bulked logcounts for the top 10 marker genes for each layer level cell type

For each cell type we identified cell type specific marker genes with the Mean Ratio method described in (Huuki-Myers et al., n.d.). The end product is gene expression profiles for layer annotated cell types in the human DLPFC! 🦠

Data Integration

With this combined spatial and snRNA-seq data, a number of interesting downstream analyses are possible. Here I will briefly touch on two ways we integrated these data types.

Spot Deconvolution

Overview of spot deconvolution: multiple cells exist in each spot, deconvolution predicts the cell type composition of each spot.

A challenge with Visium spatial transcriptomics is that each spot is larger than a single cell, containing on average 3 cells. To better understand the gene expression of each spot, we employed an analysis called spot deconvolution, which predicts the cell type composition of each Visium spot.

Through a benchmark experiment, we determined that Tangram and Cell2location were the most accurate methods for predicting cell type composition. We then predicted the cell type composition of the spots across the 30 Visium slides with both methods.

The spot deconvolution work was performed by Nick Eagles. Check out his spot deconvolution slide deck for more details.

Spatially Map Disease Ligand Receptor Interactions

Cell-cell communication; EFNA5 and EPHA5 co-localizing in a spatial domain; cartoon of an LR interaction in a Visium spot

To show how this dataset can be a rich resource for studying neuropsychiatric diseases, we explored the spatial location of a ligand-receptor (LR) interaction associated with schizophrenia. We performed a cell-cell communication analysis, which predicts which cell types are interacting with each other, and then intersected its results with schizophrenia risk LR pairs from databases. From the common set of LR pairs, we examined the ligand EFNA5 and the receptor EPHA5. Among the snRNA-seq populations, EFNA5 was most expressed in Excit_L5/6, and EPHA5 in Excit_L6. In the Visium data, we identified spots where the two genes were co-expressed; these were most frequent in one spatial domain, and these spots also had high proportions of Excit_L5/6 and Excit_L6 neurons as predicted by spot deconvolution. Spatially mapping LR pairs helps us gain insight into potential avenues for drug development. (This cool work was completed by Boyi Guo and Melissa Grant-Peters.)

This analysis used many elements of the data from the spatialDLPFC project and is just one example of how this dataset is relevant to the study of disease. In another application, we also checked for enrichment of depression- and PTSD-related genes among the spatial domains. There are lots of exciting applications of spatial and single cell data to the study of disease, so stay tuned for future work from the Lieber Institute! 👀

Summary

Overall we’ve created a paired spatial transcriptomic and single nucleus RNA-seq dataset of the human DLPFC. We’ve used spatial registration to map the new spatial domains and excitatory neurons to the classical histological layers. The data-driven spatial domains refine the layers of the DLPFC, finding laminar domains and cortical sub-layers. Spot deconvolution further refines the profile of each spot. This data has many applications in the study of neuropsychiatric diseases. We’ve made this dataset widely available to the scientific community (see below).

For more details, be sure to check out our recently published paper in Science (Huuki-Myers et al. 2024) 🎉 https://doi.org/10.1126/science.adh1938

Data Availability

The 30 DLPFC Visium samples & the 56k nuclei snRNA-seq dataset are available to explore on our interactive websites and Bioconductor/R package spatialLIBD.

Check out how your favorite gene is expressed over the layers or cell types of the DLPFC!

References

House, Earl Lawrence, and Ben Pansky. n.d. A Functional Approach to Neuroanatomy. 2nd ed. Blakiston Division.

Huuki-Myers, Louise A., Abby Spangler, Nicholas J. Eagles, Kelsey D. Montgomery, Sang Ho Kwon, Boyi Guo, Melissa Grant-Peters, et al. 2024. “A Data-Driven Single-Cell and Spatial Transcriptomic Map of the Human Prefrontal Cortex.” Science. https://doi.org/10.1126/science.adh1938.

Maynard, KR, L Collado-Torres, LM Weber, C Uytingco, BK Barry, SR Williams, JL Catallini, et al. 2021. “Transcriptome-Scale Spatial Gene Expression in the Human Dorsolateral Prefrontal Cortex.” Nature Neuroscience 24 (3): 425–36. https://doi.org/10.1038/s41593-020-00787-0.

Footnotes

¹ The measurement of RNA transcription is known as “transcriptomics”.
² The full interpretation of these kinds of plots takes much nuance we won’t discuss here.
³ The following are brief notes on cell type function to provide context, not comprehensive descriptions of the complex roles of these cell types.