SRT data embedding, clustering and integration

Overview

This tutorial introduces how to use SRTpipeline (>=0.1.0) to analyze multiple spatially resolved transcriptomics (SRT) datasets. We emphasize how to use the PRECAST model for joint spatial embedding, clustering and integration, and its downstream applications based on the SRTProject object in the SRTpipeline package. This tutorial covers the following tasks, which we believe are common to many spatial analyses:

Normalization
Feature selection
Dimension reduction
Spatial clustering
Data integration
Batch-corrected gene expression
DEG analysis
Spatial trajectory inference
Visualization

First, we load SRTpipeline.

library(SRTpipeline)
set.seed(2023)

Dataset

In this tutorial, we show how to create an SRTProject object from multiple SRT samples with SRTpipeline and walk through common analytical workflows for multiple data batches. We take a spatial transcriptomics dataset of human breast cancer as an example. It contains two tissue slices with roughly 3,500 to 4,000 spots each and 36,601 genes, sequenced on the 10X Visium platform. Our preprocessed data can be downloaded with the commands given below, and the data are also available at the 10X Genomics data website:

Section1: https://support.10xgenomics.com/spatial-gene-expression/datasets/1.1.0/V1_Breast_Cancer_Block_A_Section_1

Section2: https://support.10xgenomics.com/spatial-gene-expression/datasets/1.1.0/V1_Breast_Cancer_Block_A_Section_2

The preprocessed data can be downloaded to the current working directory with the following commands:

githubURL <- "https://github.com/feiyoung/PRECAST/blob/main/vignettes_data/bc2.rda?raw=true"
download.file(githubURL, "bc2.rda", mode = "wb")

Then load the data into R.

load("bc2.rda")
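Because load() places the stored objects directly in the workspace, it can help to load the file into a separate environment first and check what it contains. The following is a minimal sketch with a temporary stand-in file; the object name `bc2_toy` and the temporary path are illustrative assumptions, not the real data.

```r
# Stand-in for bc2.rda: a temporary file holding a toy list named `bc2_toy`.
tmp_file <- tempfile(fileext = ".rda")
bc2_toy <- list(sample1 = 1:3, sample2 = 4:6)
save(bc2_toy, file = tmp_file)

# Load into a separate environment so that nothing in the workspace is overwritten.
env <- new.env()
loaded_names <- load(tmp_file, envir = env)  # load() returns the names of the restored objects
loaded_names

# Number of samples in the toy list.
n_toy <- length(get(loaded_names[1], envir = env))
n_toy
```

The same pattern applied to "bc2.rda" would show which objects the file provides before they are used.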

Pre-processing workflow

Prepare SRTProject

Next, we extract the count matrix, spatial coordinates and metadata of each sample from the loaded Seurat objects.

n_sample <- length(bc2)
library(Seurat)
## create count matrix list: note each component has a name, e.g., `BC1`.
cntList <- list()
## create spatial coordinate matrix
coordList <- list()
## create metadata list
metadataList <- list()

for (r in 1:n_sample) {
    # r <- 1
    message("r = ", r)
    seu <- bc2[[r]]
    sp_count <- seu@assays$RNA@counts
    meta.data <- seu@meta.data
    row.names(meta.data) <- colnames(sp_count)
    cntList[[r]] <- sp_count
    metadataList[[r]] <- meta.data
    coordList[[r]] <- cbind(row = seu$row, col = seu$col)
}
names(cntList) <- paste0("BC", 1:n_sample)
## create metadata for each data batch. Here we have two data batches.
sampleMetadata <- data.frame(species = rep("Human", n_sample), tissues = rep("Breast Cancer", n_sample))
row.names(sampleMetadata) <- names(cntList)
## Name of this project
projectName <- "BC2_PRECAST"
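Before creating the project, it can be worth checking that each count matrix has exactly one coordinate row per spot. The sketch below runs this check on toy stand-ins; `cnt_toy`, `coord_toy` and `make_counts()` are hypothetical names introduced only for illustration.

```r
# Toy stand-ins for cntList and coordList (genes in rows, spots in columns).
set.seed(1)
make_counts <- function(n_genes, n_spots, prefix) {
    m <- matrix(rpois(n_genes * n_spots, lambda = 1), nrow = n_genes)
    dimnames(m) <- list(paste0("gene", seq_len(n_genes)),
                        paste0(prefix, "_spot", seq_len(n_spots)))
    m
}
cnt_toy <- list(BC1 = make_counts(5, 4, "BC1"), BC2 = make_counts(5, 6, "BC2"))
coord_toy <- list(cbind(row = 1:4, col = 1:4), cbind(row = 1:6, col = 1:6))

# Each sample needs one coordinate row per spot (column of the count matrix).
spots_match <- mapply(function(cnt, coord) ncol(cnt) == nrow(coord), cnt_toy, coord_toy)
spots_match
stopifnot(all(spots_match))
```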

Create a SRTProject object

We now create the SRTProject object, filtering out genes that have nonzero read counts in fewer than 20 spots and spots that have fewer than 20 genes with nonzero read counts.

SRTProj <- CreateSRTProject(cntList, coordList, projectName = projectName, metadataList, sampleMetadata,
    min.spots = 20, min.genes = 20)
SRTProj

## class: SRTProject 
## outputPath: F:\Research paper\IntegrateDRcluster\AnalysisCode\SRTpipeline\vignettes\BC2_PRECAST 
## h5filePath: F:\Research paper\IntegrateDRcluster\AnalysisCode\SRTpipeline\vignettes\BC2_PRECAST/BC2_PRECAST.h5 
## ---------Datasets basic information-----------------
## samples(2): BC1 BC2
## sampleColData names(3): species tissues NumOfSpots
## cellMetaData names(10): orig.ident nCount_RNA ... imagecol batch
## numberOfSpots(2): 3798 3987
## ---------Downstream analyses information-----------------
## Low-dimensional embeddings(0):
## Inferred cluster labels: No
## Embedding for plotting(0):
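To make the two filtering thresholds concrete, here is a minimal base R sketch on a toy count matrix. It is not the internal code of CreateSRTProject(); in particular, applying the gene filter before the spot filter is an assumption, and the thresholds are reduced to 2 so that the toy data are not emptied.

```r
# Toy count matrix: genes in rows, spots in columns (not the real data).
counts <- matrix(c(0, 0, 3, 1,
                   2, 0, 1, 4,
                   0, 0, 0, 0,
                   5, 1, 2, 0), nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("g", 1:4), paste0("s", 1:4)))
min_spots <- 2  # illustrative thresholds; the tutorial uses 20 for both
min_genes <- 2

# Keep genes detected (nonzero) in at least `min_spots` spots.
keep_genes <- rowSums(counts > 0) >= min_spots
filtered <- counts[keep_genes, , drop = FALSE]

# Then keep spots with at least `min_genes` detected genes.
keep_spots <- colSums(filtered > 0) >= min_genes
filtered <- filtered[, keep_spots, drop = FALSE]
dim(filtered)  # 3 genes x 3 spots remain
```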

Normalizing the data

After removing unwanted spots and genes from the dataset, the next step is to normalize the data. To save memory, normalized values are stored on disk in an h5 file.

SRTProj <- normalizeSRT(SRTProj, normalization.method = "LogNormalize")
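As a rough illustration of what log-normalization does, the sketch below divides each spot's counts by its total count, multiplies by a scale factor and applies log1p, which is how Seurat's "LogNormalize" method is defined. That normalizeSRT() uses the same scale factor of 10,000 is an assumption here.

```r
# Toy count matrix: genes in rows, spots in columns.
counts <- matrix(c(1, 0, 3,
                   1, 2, 1), nrow = 2, byrow = TRUE,
                 dimnames = list(c("g1", "g2"), c("s1", "s2", "s3")))
scale_factor <- 1e4  # assumed scale factor (the Seurat default)

# Divide each column (spot) by its library size, rescale, then take log(1 + x).
lib_size <- colSums(counts)
norm <- log1p(sweep(counts, 2, lib_size, "/") * scale_factor)
round(norm, 3)
```

Zero counts stay exactly zero after this transformation, which keeps the normalized matrix sparse.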

Feature selection

We next select a subset of genes that exhibit high spot-to-spot variation in the dataset (i.e., they are highly expressed in some spots and lowly expressed in others). Focusing on these genes in downstream analysis has been found to help highlight biological signal in single-cell datasets.

Then we choose the variable features. The default number of variable features is 2,000, and users can change it with the argument nfeatures. The default type is highly variable genes (HVGs); to use spatially variable genes instead, set type='SVGs', in which case method='SPARK-X' can be used to choose the SVGs.

SRTProj <- selectVariableFeatures(SRTProj, nfeatures = 2000, type = "HVGs", method = "vst")
SRTProj

## class: SRTProject 
## outputPath: F:\Research paper\IntegrateDRcluster\AnalysisCode\SRTpipeline\vignettes\BC2_PRECAST 
## h5filePath: F:\Research paper\IntegrateDRcluster\AnalysisCode\SRTpipeline\vignettes\BC2_PRECAST/BC2_PRECAST.h5 
## ---------Datasets basic information-----------------
## samples(2): BC1 BC2
## sampleColData names(3): species tissues NumOfSpots
## cellMetaData names(10): orig.ident nCount_RNA ... imagecol batch
## numberOfSpots(2): 3798 3987
## ---------Downstream analyses information-----------------
## Variable features:  2000 
## Low-dimensional embeddings(0):
## Inferred cluster labels: No
## Embedding for plotting(0):
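The idea behind selecting highly variable genes can be sketched by ranking genes on their variance across spots. This is a deliberately simplified stand-in: the "vst" method used above additionally models the mean-variance relationship, which is not reproduced here, and the simulated data are illustrative only.

```r
set.seed(2023)
n_genes <- 50
n_spots <- 100
# Simulated normalized expression: genes in rows, spots in columns.
expr <- matrix(rnorm(n_genes * n_spots), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)), NULL))
# Make the first three genes much more variable across spots.
expr[1:3, ] <- expr[1:3, ] * 10

# Rank genes by variance and keep the top `nfeatures`.
gene_var <- apply(expr, 1, var)
nfeatures <- 3
top_genes <- names(sort(gene_var, decreasing = TRUE))[seq_len(nfeatures)]
top_genes
```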

Calculate the adjacency matrix

## Obtain adjacency matrix
SRTProj <- AddAdj(SRTProj, platform = "Visium")
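An adjacency matrix records which spots are spatial neighbors. The sketch below builds one for a toy 3 x 3 square grid by thresholding pairwise distances. It only illustrates the concept: Visium spots lie on a hexagonal layout, and the way AddAdj() defines neighbors is not reproduced here.

```r
# Toy spots on a 3 x 3 square grid (a simplification of the Visium layout).
coords <- as.matrix(expand.grid(row = 1:3, col = 1:3))

# Pairwise Euclidean distances between spots.
d <- as.matrix(dist(coords))

# Two distinct spots are neighbors if their distance is at most `radius`.
radius <- 1
adj <- (d > 0 & d <= radius) * 1

# Number of neighbors per spot: corners have 2, edges 3, the center 4.
rowSums(adj)
```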

Joint dimension reduction, clustering and integration analysis using PRECAST model

The PRECAST model achieves joint dimension reduction, clustering and alignment through integration analysis of multiple samples. Some SRT clustering methods use a Markov random field to model clusters of spots; these approaches work well and are standard practice for SRT data. PRECAST extracts the micro-environment related embeddings and the aligned embeddings, which are saved in the slots reductions$microEnv.PRECAST and reductions$aligned.PRECAST, and the spatial clusters are saved in the slot clusters. Here, we set the number of clusters K=14 and the embedding dimension q=15. If a vector of candidates such as K=7:11 is given instead, the modified BIC criterion is used to determine the number of clusters from the candidates.

SRTProj <- Integrate_PRECAST(SRTProj, K = 14, q = 15)

## fitting ...
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |======================================================================| 100%
## variable initialize finish! 
## predict Y and V! 
## diff Energy = 13.959933 
## Finish ICM step! 
## iter = 2, loglik= 16237385.000000, dloglik=1.007561 
## predict Y and V! 
## diff Energy = 5.039712 
## diff Energy = 0.810052 
## Finish ICM step! 
## iter = 3, loglik= 16287788.000000, dloglik=0.003104 
## predict Y and V! 
## diff Energy = 15.129362 
## diff Energy = 6.689987 
## Finish ICM step! 
## iter = 4, loglik= 16308760.000000, dloglik=0.001288 
## predict Y and V! 
## diff Energy = 4.357285 
## diff Energy = 12.809867 
## Finish ICM step! 
## iter = 5, loglik= 16320713.000000, dloglik=0.000733 
## predict Y and V! 
## diff Energy = 25.238286 
## diff Energy = 4.566235 
## Finish ICM step! 
## iter = 6, loglik= 16328001.000000, dloglik=0.000447 
## predict Y and V! 
## diff Energy = 0.497345 
## Finish ICM step! 
## iter = 7, loglik= 16332668.000000, dloglik=0.000286 
## predict Y and V! 
## diff Energy = 6.859631 
## diff Energy = 14.217402 
## Finish ICM step! 
## iter = 8, loglik= 16335633.000000, dloglik=0.000182 
## predict Y and V! 
## diff Energy = 4.492357 
## diff Energy = 8.096642 
## Finish ICM step! 
## iter = 9, loglik= 16337790.000000, dloglik=0.000132 
## predict Y and V! 
## diff Energy = 1.622852 
## diff Energy = 4.914173 
## Finish ICM step! 
## iter = 10, loglik= 16339316.000000, dloglik=0.000093 
## predict Y and V! 
## diff Energy = 1.304522 
## Finish ICM step! 
## iter = 11, loglik= 16340364.000000, dloglik=0.000064 
## predict Y and V! 
## diff Energy = 4.679978 
## Finish ICM step! 
## iter = 12, loglik= 16341204.000000, dloglik=0.000051 
## predict Y and V! 
## diff Energy = 0.437156 
## Finish ICM step! 
## iter = 13, loglik= 16341952.000000, dloglik=0.000046 
## predict Y and V! 
## diff Energy = 0.968800 
## diff Energy = 0.405632 
## Finish ICM step! 
## iter = 14, loglik= 16342510.000000, dloglik=0.000034 
## predict Y and V! 
## Finish ICM step! 
## iter = 15, loglik= 16343066.000000, dloglik=0.000034 
## predict Y and V! 
## diff Energy = 6.096696 
## Finish ICM step! 
## iter = 16, loglik= 16343398.000000, dloglik=0.000020 
## predict Y and V! 
## diff Energy = 3.817694 
## Finish ICM step! 
## iter = 17, loglik= 16343782.000000, dloglik=0.000023 
## predict Y and V! 
## diff Energy = 4.036143 
## diff Energy = 0.341253 
## Finish ICM step! 
## iter = 18, loglik= 16344149.000000, dloglik=0.000022 
## predict Y and V! 
## diff Energy = 1.008805 
## diff Energy = 0.853155 
## Finish ICM step! 
## iter = 19, loglik= 16344566.000000, dloglik=0.000026 
## predict Y and V! 
## diff Energy = 2.025461 
## Finish ICM step! 
## iter = 20, loglik= 16344901.000000, dloglik=0.000020

SRTProj

## class: SRTProject 
## outputPath: F:\Research paper\IntegrateDRcluster\AnalysisCode\SRTpipeline\vignettes\BC2_PRECAST 
## h5filePath: F:\Research paper\IntegrateDRcluster\AnalysisCode\SRTpipeline\vignettes\BC2_PRECAST/BC2_PRECAST.h5 
## ---------Datasets basic information-----------------
## samples(2): BC1 BC2
## sampleColData names(3): species tissues NumOfSpots
## cellMetaData names(10): orig.ident nCount_RNA ... imagecol batch
## numberOfSpots(2): 3798 3987
## ---------Downstream analyses information-----------------
## Variable features:  2000 
## Low-dimensional embeddings(2): microEnv.PRECAST aligned.PRECAST
## Inferred cluster labels: Yes
## Embedding for plotting(0):

table(SRTProj@clusters)

## 
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14 
##  288  560  867  410  468  718  307  418 1211  490  976  392  357  323

Visualization

SRT embeddings

To check the performance of integration, we visualize the inferred clusters and data batches on the two-dimensional tSNEs. First, we use the AddTSNE() function to calculate the two-dimensional tSNEs based on the aligned.PRECAST embeddings.

SRTProj <- AddTSNE(SRTProj, n_comp = 2, reduction = "aligned.PRECAST")

We then visualize the inferred clusters and data batches. The tSNE plot shows that the domain clusters are well segregated and the data batches are well mixed.

cols_cluster <- chooseColors(palettes_name = "Classic 20", n_colors = 14)
cols_batch <- chooseColors(n_colors = 2)
p_tsne2_cluster <- EmbedPlot(SRTProj, item = "cluster", plotEmbeddings = "tSNE", cols = cols_cluster,
    legend.position = "bottom", pt_size = 0.2)
p_tsne2_batch <- EmbedPlot(SRTProj, item = "batch", plotEmbeddings = "tSNE", cols = cols_batch,
    legend.position = "bottom", pt_size = 0.2, nrow.legend = 2)
drawFigs(list(p_tsne2_cluster, p_tsne2_batch), layout.dim = c(1, 2), legend.position = "bottom",
    align = "hv")
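A simple numerical complement to the tSNE plots is to tabulate how each cluster is split across batches. The sketch below uses toy labels; the vectors `cluster` and `batch` are illustrative and are not extracted from the SRTProject object.

```r
# Toy cluster and batch labels for eight spots.
cluster <- factor(c(1, 1, 2, 2, 1, 2, 2, 1))
batch <- factor(c("BC1", "BC1", "BC1", "BC1", "BC2", "BC2", "BC2", "BC2"))

# Cross-tabulate and convert to within-cluster proportions.
tab <- table(cluster, batch)
prop <- prop.table(tab, margin = 1)
prop
```

A cluster made up almost entirely of one batch would deserve a closer look, since it may reflect either incomplete integration or a domain that is present in only one slice.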

Spatial heatmap to show the clustering performance

In addition to the embedding plots, SRTpipeline provides a variety of visualization functions. First, we visualize the spatial distribution of cluster labels, which shows the spatial domain structure of all data batches.

## colors (cols_cluster) were chosen above with the function chooseColors
p12 <- EachClusterSpaHeatMap(SRTProj, cols = cols_cluster, legend.position = "bottom", base_size = 12,
    pt_size = 1, layout.dim = c(1, 2))
p12

By setting combine = FALSE, this function returns a list of ggplot2 objects, so users can revise each plot. In addition, we can plot only the data batches of interest by giving their numeric IDs or batch names.

pList <- EachClusterSpaHeatMap(SRTProj, cols = cols_cluster, legend.position = "bottom", base_size = 12,
    pt_size = 0.5, layout.dim = c(2, 4), nrow.legend = 1, combine = FALSE)
EachClusterSpaHeatMap(SRTProj, batch = 1, title_name = "PRECAST: ", cols = cols_cluster, legend.position = "bottom",
    base_size = 12, pt_size = 1, layout.dim = c(1, 1))

EachClusterSpaHeatMap(SRTProj, batch = c("BC1", "BC2"), title_name = "PRECAST: ", cols = cols_cluster,
    legend.position = "bottom", base_size = 12, pt_size = 1, layout.dim = c(1, 2))

RGB plot to show the embedding performance

Next, we summarize the inferred embeddings for biological effects between spatial domain types (the slot reductions) using three components from either tSNE or UMAP, and visualize the resulting tSNE/UMAP components with red/green/blue (RGB) colors in the RGB plot.

The resulting RGB plots from PRECAST show smooth transitions across neighboring spots and spatial domains.

SRTProj <- AddTSNE(SRTProj, n_comp = 3, reduction = "aligned.PRECAST")
p_tsne3 <- EachRGBSpaHeatMap(SRTProj, plot_type = "tSNE", pt_size = 0.5, title_name = "", layout.dim = c(1,
    2))
p_tsne3
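The mapping from three embedding components to a color can be sketched in base R: each component is rescaled to [0, 1] and used as the red, green or blue channel. The min-max scaling below is an assumption made for illustration; the exact scaling used by EachRGBSpaHeatMap() is not shown in this tutorial.

```r
set.seed(1)
# Toy three-dimensional embedding for 10 spots.
emb3 <- matrix(rnorm(30), ncol = 3)

# Rescale each component to the range [0, 1].
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))
emb_scaled <- apply(emb3, 2, rescale01)

# One hexadecimal color per spot: component 1 = red, 2 = green, 3 = blue.
spot_colors <- rgb(emb_scaled[, 1], emb_scaled[, 2], emb_scaled[, 3])
head(spot_colors, 3)
```

Spots with similar embeddings receive similar colors, which is why smooth color transitions on the tissue indicate smooth changes in the embeddings.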

To run UMAP in SRTpipeline, we use the AddUMAP() function. First, we evaluate the two-dimensional UMAPs.

SRTProj <- AddUMAP(SRTProj, n_comp = 2, reduction = "aligned.PRECAST")
p_umap2_cluster <- EmbedPlot(SRTProj, item = "cluster", plotEmbeddings = "UMAP", cols = cols_cluster,
    legend.position = "bottom")
p_umap2_batch <- EmbedPlot(SRTProj, item = "batch", plotEmbeddings = "UMAP", cols = cols_batch,
    legend.position = "bottom", nrow.legend = 2)
drawFigs(list(p_umap2_cluster, p_umap2_batch), layout.dim = c(1, 2), legend.position = "bottom",
    align = "hv")

Then, we evaluate the three-dimensional UMAPs.

SRTProj <- AddUMAP(SRTProj, n_comp = 3, reduction = "aligned.PRECAST")
p_umap3 <- EachRGBSpaHeatMap(SRTProj, plot_type = "UMAP", layout.dim = c(1, 2), pt_size = 0.5, title_name = "UMAP: ")
p_umap3

To save the plot, we can use the write_fig() function.

write_fig(p_umap3, filename = "PRECAST_p_umap3.png", width = 14, height = 11)

Moreover, we can visualize the microenvironment effects on the spatial coordinates. To do so, we evaluate the three-dimensional UMAPs based on microEnv.PRECAST.

# save the previously calculated UMAP3
umap3_cluster <- SRTProj@plotEmbeddings$UMAP3
SRTProj <- AddUMAP(SRTProj, n_comp = 3, reduction = "microEnv.PRECAST")
p_umap3_micro <- EachRGBSpaHeatMap(SRTProj, plot_type = "UMAP", layout.dim = c(1, 2), pt_size = 1.6,
    title_name = "microEnv: ")
p_umap3_micro

Cell-by-cell heatmap to show the relation of spatial domains

We plot the heatmap of Pearson's correlation coefficients of the estimated embeddings among the detected domains. It shows good separation of the estimated embeddings across domains: the correlations between domains 2 and 9 are high, while the correlations among well-separated domains, e.g., domains 1 and 3, are low.

p_cc <- CCHeatMap(SRTProj, reduction = "aligned.PRECAST", grp_color = cols_cluster, ncol.legend = 3)
p_cc
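The quantity shown in such a heatmap can be sketched with base R: the Pearson correlation between the embedding vectors of every pair of spots, with spots ordered by domain. The toy embeddings and the two domain profiles below are assumptions made purely for illustration.

```r
set.seed(1)
# Toy embeddings: 6 spots x 4 dimensions, two domains with distinct mean profiles.
profile1 <- c(2, -2, 2, -2)
profile2 <- c(-2, 2, -2, 2)
emb <- rbind(t(replicate(3, profile1 + rnorm(4, sd = 0.1))),
             t(replicate(3, profile2 + rnorm(4, sd = 0.1))))
domain <- rep(c(1, 2), each = 3)

# Spot-by-spot Pearson correlation of the embeddings (cor() works on columns).
cc <- cor(t(emb))
round(cc, 2)
```

Within-domain blocks on the diagonal show correlations close to 1, whereas the off-diagonal block between the two toy domains is strongly negative.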

After adding the quantities for data visualization, the SRTProject object carries more information under the downstream analyses information. We now print this SRTProject object to check it. Four components have been added to the slot plotEmbeddings (Embedding for plotting): tSNE, tSNE3, UMAP and UMAP3.

SRTProj

## class: SRTProject 
## outputPath: F:\Research paper\IntegrateDRcluster\AnalysisCode\SRTpipeline\vignettes\BC2_PRECAST 
## h5filePath: F:\Research paper\IntegrateDRcluster\AnalysisCode\SRTpipeline\vignettes\BC2_PRECAST/BC2_PRECAST.h5 
## ---------Datasets basic information-----------------
## samples(2): BC1 BC2
## sampleColData names(3): species tissues NumOfSpots
## cellMetaData names(10): orig.ident nCount_RNA ... imagecol batch
## numberOfSpots(2): 3798 3987
## ---------Downstream analyses information-----------------
## Variable features:  2000 
## Low-dimensional embeddings(2): microEnv.PRECAST aligned.PRECAST
## Inferred cluster labels: Yes
## Embedding for plotting(4): tSNE tSNE3 UMAP UMAP3

Combined DEG analysis

After obtaining the spatial cluster labels from a clustering model, we can perform differential expression analysis. First, we obtain the batch-corrected gene expression with getIntegratedData(); the argument only.var.features specifies whether to do batch correction for only the variable features (default TRUE).

speInt <- getIntegratedData(SRTProj, Method = "PRECAST", species = "Human", only.var.features = TRUE)

We perform differential expression analysis for all clusters using the FindAllDEGs() function; the DE genes' information is saved in the DataFrame object dat_degs.

dat_degs <- FindAllDEGs(speInt)
dat_degs

## DataFrame with 1707 rows and 7 columns
##                 p_val avg_log2FC     pct.1     pct.2    p_val_adj  cluster
##             <numeric>  <numeric> <numeric> <numeric>    <numeric> <factor>
## IGFBP7   1.66290e-142   0.631731     0.986     0.995 3.32580e-139        1
## VIM      2.10589e-125   0.507099     0.993     0.992 4.21177e-122        1
## TIMP1     4.13426e-96   0.548544     0.986     0.985  8.26852e-93        1
## A2M       6.42193e-96   0.578070     0.976     0.972  1.28439e-92        1
## CAVIN1    3.44755e-93   0.693447     0.962     0.843  6.89509e-90        1
## ...               ...        ...       ...       ...          ...      ...
## CD52.2    1.03341e-12  -0.252077     0.854     0.835  2.06682e-09       14
## CRISP3.8  2.50793e-12  -0.320490     0.898     0.930  5.01585e-09       14
## JCHAIN.5  7.84102e-12  -0.290120     0.734     0.821  1.56820e-08       14
## SFRP4.5   1.83148e-10  -0.258204     0.693     0.747  3.66296e-07       14
## CCL19.6   1.67627e-04  -0.263334     0.845     0.756  3.35254e-01       14
##                 gene
##          <character>
## IGFBP7        IGFBP7
## VIM              VIM
## TIMP1          TIMP1
## A2M              A2M
## CAVIN1        CAVIN1
## ...              ...
## CD52.2          CD52
## CRISP3.8      CRISP3
## JCHAIN.5      JCHAIN
## SFRP4.5        SFRP4
## CCL19.6        CCL19

We identify the significant DE genes by two criteria: (a) adjusted p-value less than 0.01 and (b) average log fold change greater than 0.3.

degs_sig <- subset(dat_degs, p_val_adj < 0.01 & avg_log2FC > 0.3)
degs_sig

## DataFrame with 658 rows and 7 columns
##                 p_val avg_log2FC     pct.1     pct.2    p_val_adj  cluster
##             <numeric>  <numeric> <numeric> <numeric>    <numeric> <factor>
## IGFBP7   1.66290e-142   0.631731     0.986     0.995 3.32580e-139        1
## VIM      2.10589e-125   0.507099     0.993     0.992 4.21177e-122        1
## TIMP1     4.13426e-96   0.548544     0.986     0.985  8.26852e-93        1
## A2M       6.42193e-96   0.578070     0.976     0.972  1.28439e-92        1
## CAVIN1    3.44755e-93   0.693447     0.962     0.843  6.89509e-90        1
## ...               ...        ...       ...       ...          ...      ...
## HGD       2.33723e-37   0.337084     0.768     0.644  4.67446e-34       14
## CARTPT.1  3.31748e-37   0.452944     0.793     0.562  6.63496e-34       14
## CKB       4.41262e-33   0.310149     0.870     0.763  8.82523e-30       14
## MUC19.1   2.45292e-22   0.322983     0.889     0.794  4.90583e-19       14
## AFP.1     3.37742e-16   0.315104     0.728     0.647  6.75483e-13       14
##                 gene
##          <character>
## IGFBP7        IGFBP7
## VIM              VIM
## TIMP1          TIMP1
## A2M              A2M
## CAVIN1        CAVIN1
## ...              ...
## HGD              HGD
## CARTPT.1      CARTPT
## CKB              CKB
## MUC19.1        MUC19
## AFP.1            AFP
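The two criteria act jointly: a gene must pass both to be kept. A small toy example makes this explicit; the data frame `dat` below is made up for illustration and only mimics the relevant columns of dat_degs.

```r
# Toy DE table with the two columns used for filtering.
dat <- data.frame(gene = c("A", "B", "C", "D"),
                  p_val_adj = c(1e-10, 0.5, 1e-5, 1e-3),
                  avg_log2FC = c(0.6, 0.9, 0.2, -0.5),
                  stringsAsFactors = FALSE)

# Keep genes that are both significant and sufficiently up-regulated.
sig <- subset(dat, p_val_adj < 0.01 & avg_log2FC > 0.3)
sig$gene  # only "A": B fails the p-value cut, C and D fail the fold-change cut
```

Note that the fold-change criterion is one-sided, so significantly down-regulated genes such as D are excluded as well.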

In the following, we perform gene set enrichment analysis for the DE genes of each domain identified by the PRECAST model using the R package gprofiler2.

library(gprofiler2)
termList <- list()
for (k in 1:14) {
    # k <- 1
    if (sum(degs_sig$cluster == k) > 0) {
        cat("k = ", k, "\n")
        dat_degs_sub <- subset(degs_sig, cluster == k)

        que1 <- dat_degs_sub$gene
        gostres <- gost(query = que1, organism = "hsapiens", correction_method = "fdr")
        termList[[k]] <- gostres
    }

}

## k =  1 
## k =  2 
## k =  3 
## k =  4 
## k =  5 
## k =  6 
## k =  7 
## k =  8 
## k =  9 
## k =  10 
## k =  11 
## k =  12 
## k =  13 
## k =  14

head(termList[[1]]$result)

##     query significant     p_value term_size query_size intersection_size
## 1 query_1        TRUE 0.003044565         2         31                 2
## 2 query_1        TRUE 0.003044565         2         31                 2
## 3 query_1        TRUE 0.006054331         3         31                 2
## 4 query_1        TRUE 0.031420709         3         31                 1
## 5 query_1        TRUE 0.031420709         3         31                 1
## 6 query_1        TRUE 0.031420709         3         31                 1
##    precision    recall    term_id source                          term_name
## 1 0.06451613 1.0000000 CORUM:3141  CORUM                 CRLR-RAMP3 complex
## 2 0.06451613 1.0000000 CORUM:2709  CORUM           MMP-9-TIMP-1-LRP complex
## 3 0.06451613 0.6666667 CORUM:2710  CORUM LRP-1-Alpha-2-M-annexin VI complex
## 4 0.03225806 0.3333333 CORUM:6509  CORUM        pERK-vimentin-KPNA2 complex
## 5 0.03225806 0.3333333 CORUM:6235  CORUM          c-Fos-c-Jun-SAF-1 complex
## 6 0.03225806 0.3333333 CORUM:6391  CORUM           KCTD12-GNB1-GNG2 complex
##   effective_domain_size source_order       parents
## 1                  3385         1421 CORUM:0000000
## 2                  3385         1141 CORUM:0000000
## 3                  3385         1142 CORUM:0000000
## 4                  3385         2185 CORUM:0000000
## 5                  3385         1993 CORUM:0000000
## 6                  3385         2094 CORUM:0000000

To understand the functions of the spatial domains identified by the PRECAST model, we compare the top significant biological process (BP) pathways in the GO database for the DE genes from Domains 4 and 14. Here, we only visualize the significant BP pathways; users can explore other databases such as KEGG and HPA.

## Most commonly used databases
source_set <- c("GO:BP", "GO:CC", "GO:MF", "KEGG", "HPA")
cols <- c("steelblue3", "goldenrod", "brown3", "#f98866", "#CE6DBD")
## Here, we show GO:BP
source1 <- "GO:BP"
ss <- which(source_set == source1)
ntop <- 5
names(cols) <- source_set
pList_enrich <- list()
for (ii in 1:14) {
    ## ii <- 5
    message("ii=", ii)
    gostres2 <- termList[[ii]]
    if (!is.null(gostres2)) {
        dat1 <- subset(gostres2$result, term_size < 500)
        dat1 <- get_top_pathway(dat1, ntop = ntop, source_set = source1)
        dat1 <- dat1[complete.cases(dat1), ]
        dat1$nlog10P <- -log10(dat1$p_value)

        pList_enrich[[ii]] <- barPlot_enrich(dat1[order(dat1$nlog10P), ], source = "source", "term_name",
            "nlog10P", cols = cols[source_set[ss]], base_size = 14) + ylab("-log10(p-adj)") + xlab("Biological terms") +
            ggtitle(paste0("Domain", ii))
    }

}
drawFigs(pList_enrich[c(4, 14)], layout.dim = c(2, 1), common.legend = T, align = "hv")
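The selection of terms for the bar plots can be sketched in base R: restrict to one source, drop very large terms, keep the terms with the smallest p-values and convert these to the -log10 scale. This is a hypothetical re-implementation for illustration; the exact behavior of get_top_pathway() is an assumption, and the toy table `res` only mimics a few columns of a gost() result.

```r
# Toy enrichment result with the columns used above.
res <- data.frame(term_name = c("t1", "t2", "t3", "t4"),
                  source = c("GO:BP", "GO:BP", "KEGG", "GO:BP"),
                  p_value = c(1e-4, 1e-2, 1e-6, 1e-8),
                  term_size = c(100, 50, 80, 900),
                  stringsAsFactors = FALSE)
n_keep <- 5

# Keep GO:BP terms with fewer than 500 genes, ordered by p-value.
bp <- subset(res, source == "GO:BP" & term_size < 500)
bp <- head(bp[order(bp$p_value), ], n_keep)

# Larger values of -log10(p) mean stronger enrichment.
bp$nlog10P <- -log10(bp$p_value)
bp[, c("term_name", "nlog10P")]
```

In this toy table, t4 has the smallest p-value but is removed by the term size filter, and t3 is removed because it comes from KEGG.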

We take out the top DE genes for each cluster for visualization.

library(dplyr)
n <- 5
dat_degs %>%
    as.data.frame %>%
    group_by(cluster) %>%
    top_n(n = n, wt = avg_log2FC) -> topGene
topGene

## # A tibble: 70 x 7
## # Groups:   cluster [14]
##       p_val avg_log2FC pct.1 pct.2 p_val_adj cluster gene   
##       <dbl>      <dbl> <dbl> <dbl>     <dbl> <fct>   <chr>  
##  1 3.45e-93      0.693 0.962 0.843  6.90e-90 1       CAVIN1 
##  2 3.81e-89      0.766 0.927 0.862  7.62e-86 1       SPARCL1
##  3 2.20e-82      0.778 0.955 0.833  4.41e-79 1       VWF    
##  4 4.58e-81      0.835 0.91  0.774  9.17e-78 1       AQP1   
##  5 5.50e-37      0.755 0.774 0.62   1.10e-33 1       ACKR1  
##  6 5.01e-85      0.316 0.988 0.999  1.00e-81 2       IGLC2  
##  7 2.48e-65      0.340 0.934 0.982  4.96e-62 2       C3     
##  8 1.55e-50      0.379 0.911 0.949  3.09e-47 2       IGLC1  
##  9 2.98e-20      0.334 0.87  0.873  5.97e-17 2       IGLC3  
## 10 8.97e-17      0.378 0.762 0.759  1.79e-13 2       CCL19  
## # ... with 60 more rows
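For readers who prefer not to depend on dplyr, the same per-cluster selection can be sketched in base R with split() and lapply(). The toy table `deg` is illustrative. Unlike top_n(), head() does not keep extra rows when values are tied.

```r
# Toy DE table: six genes in two clusters.
deg <- data.frame(gene = paste0("g", 1:6),
                  cluster = factor(c(1, 1, 1, 2, 2, 2)),
                  avg_log2FC = c(0.2, 0.9, 0.5, 0.1, 0.4, 0.8),
                  stringsAsFactors = FALSE)
n_top <- 2

# Within each cluster, sort by decreasing fold change and keep the first n_top rows.
top_list <- lapply(split(deg, deg$cluster),
                   function(d) head(d[order(-d$avg_log2FC), ], n_top))
top_toy <- do.call(rbind, top_list)
top_toy$gene
```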

We visualize the DE genes for each cluster group by gene-by-cell heatmap using the GCHeatMap() function.

p1 <- GCHeatMap(speInt, features = topGene$gene, grp_color = cols_cluster, y_text_size = 5, ncol.legend = 3)
p1

Trajectory inference

Next, we perform trajectory inference using the aligned embeddings and domain labels estimated by the PRECAST model.

speInt <- AddTrajectory(speInt, reduction = "aligned.PRECAST")
p1 <- EmbedPlot(speInt, plotEmbeddings = "aligned.PRECAST", colour_by = "PT")
p2 <- EmbedPlot(speInt, plotEmbeddings = "tSNE", colour_by = "PT")
drawFigs(list(p1, p2), layout.dim = c(1, 2), common.legend = TRUE, legend.position = "right")

Visualize the inferred pseudotime on the spatial coordinates for each data batch.

p_spa <- EachEmbedPlot(speInt, reduction = "Coord", colour_by = "PT", layout.dim = c(1, 2))
p_spa

# save(SRTProj, file=paste0(SRTProj@projectMetadata$outputPath,'/SRTProj.rds'))
# load('F:/Research
# paper/IntegrateDRcluster/AnalysisCode/SRTpipeline/vignettes/BC2_PRECAST/SRTProj.rds')

Other downstream analyses

Session Info

sessionInfo()

## R version 4.1.2 (2021-11-01)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 22621)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=Chinese (Simplified)_China.936 
## [2] LC_CTYPE=Chinese (Simplified)_China.936   
## [3] LC_MONETARY=Chinese (Simplified)_China.936
## [4] LC_NUMERIC=C                              
## [5] LC_TIME=Chinese (Simplified)_China.936    
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] scater_1.25.1               scuttle_1.4.0              
##  [3] slingshot_2.2.0             TrajectoryUtils_1.2.0      
##  [5] princurve_2.1.6             gprofiler2_0.2.1           
##  [7] bigalgebra_1.1.0            bigmemory_4.5.36           
##  [9] SpatialExperiment_1.4.0     SingleCellExperiment_1.16.0
## [11] SummarizedExperiment_1.24.0 Biobase_2.54.0             
## [13] GenomicRanges_1.46.1        GenomeInfoDb_1.30.1        
## [15] IRanges_2.28.0              MatrixGenerics_1.6.0       
## [17] matrixStats_0.62.0          dplyr_1.0.9                
## [19] ggplot2_3.3.6               ggthemes_4.2.4             
## [21] colorspace_2.0-3            Matrix_1.4-0               
## [23] hdf5r_1.3.5                 ff_4.0.7                   
## [25] bit_4.0.4                   S4Vectors_0.32.3           
## [27] BiocGenerics_0.40.0         rhdf5_2.38.0               
## [29] sp_1.5-0                    SeuratObject_4.1.0         
## [31] Seurat_4.1.1                SRTpipeline_0.1.1          
## 
## loaded via a namespace (and not attached):
##   [1] scattermore_0.8           R.methodsS3_1.8.1        
##   [3] GiRaF_1.0.1               ragg_1.2.2               
##   [5] tidyr_1.2.0               bit64_4.0.5              
##   [7] knitr_1.37                irlba_2.3.5              
##   [9] DelayedArray_0.20.0       R.utils_2.11.0           
##  [11] data.table_1.14.2         rpart_4.1.16             
##  [13] RCurl_1.98-1.6            generics_0.1.2           
##  [15] ScaledMatrix_1.2.0        cowplot_1.1.1            
##  [17] RANN_2.6.1                future_1.26.1            
##  [19] spatstat.data_3.0-0       httpuv_1.6.5             
##  [21] assertthat_0.2.1          viridis_0.6.2            
##  [23] xfun_0.29                 jquerylib_0.1.4          
##  [25] evaluate_0.15             promises_1.2.0.1         
##  [27] fansi_1.0.3               igraph_1.3.5             
##  [29] DBI_1.1.2                 htmlwidgets_1.5.4        
##  [31] spatstat.geom_2.4-0       purrr_0.3.4              
##  [33] ellipsis_0.3.2            RSpectra_0.16-1          
##  [35] ggpubr_0.4.0              backports_1.4.1          
##  [37] DR.SC_3.1                 deldir_1.0-6             
##  [39] sparseMatrixStats_1.6.0   vctrs_0.4.1              
##  [41] ROCR_1.0-11               abind_1.4-5              
##  [43] cachem_1.0.6              withr_2.5.0              
##  [45] PRECAST_1.4               progressr_0.10.1         
##  [47] sctransform_0.3.3         mclust_5.4.10            
##  [49] goftest_1.2-3             cluster_2.1.2            
##  [51] lazyeval_0.2.2            crayon_1.5.1             
##  [53] SpatialAnno_1.0.0         edgeR_3.36.0             
##  [55] pkgconfig_2.0.3           labeling_0.4.2           
##  [57] nlme_3.1-155              vipor_0.4.5              
##  [59] rlang_1.0.2               globals_0.15.0           
##  [61] lifecycle_1.0.1           miniUI_0.1.1.1           
##  [63] bigmemory.sri_0.1.3       rsvd_1.0.5               
##  [65] rprojroot_2.0.3           polyclip_1.10-0          
##  [67] lmtest_0.9-40             SC.MEB_1.1               
##  [69] carData_3.0-5             Rhdf5lib_1.16.0          
##  [71] zoo_1.8-10                beeswarm_0.4.0           
##  [73] ggridges_0.5.3            rjson_0.2.21             
##  [75] png_0.1-7                 viridisLite_0.4.0        
##  [77] iSC.MEB_1.0.1             bitops_1.0-7             
##  [79] R.oo_1.24.0               KernSmooth_2.23-20       
##  [81] rhdf5filters_1.6.0        DelayedMatrixStats_1.16.0
##  [83] stringr_1.4.0             parallelly_1.32.0        
##  [85] spatstat.random_2.2-0     rstatix_0.7.0            
##  [87] ggsignif_0.6.3            beachmat_2.10.0          
##  [89] scales_1.2.0              memoise_2.0.1            
##  [91] magrittr_2.0.3            plyr_1.8.7               
##  [93] ica_1.0-2                 zlibbioc_1.40.0          
##  [95] compiler_4.1.2            dqrng_0.3.0              
##  [97] RColorBrewer_1.1-3        fitdistrplus_1.1-8       
##  [99] cli_3.2.0                 XVector_0.34.0           
## [101] listenv_0.8.0             patchwork_1.1.1          
## [103] pbapply_1.5-0             formatR_1.11             
## [105] MASS_7.3-55               mgcv_1.8-39              
## [107] tidyselect_1.1.2          stringi_1.7.6            
## [109] textshaping_0.3.6         highr_0.9                
## [111] yaml_2.3.6                locfit_1.5-9.4           
## [113] BiocSingular_1.10.0       ggrepel_0.9.1            
## [115] grid_4.1.2                sass_0.4.1               
## [117] tools_4.1.2               future.apply_1.9.0       
## [119] parallel_4.1.2            rstudioapi_0.13          
## [121] gridExtra_2.3             farver_2.1.0             
## [123] Rtsne_0.16                DropletUtils_1.14.2      
## [125] digest_0.6.29             rgeos_0.5-9              
## [127] shiny_1.7.1               Rcpp_1.0.10              
## [129] car_3.0-12                broom_0.7.12             
## [131] later_1.3.0               RcppAnnoy_0.0.19         
## [133] httr_1.4.3                fs_1.5.2                 
## [135] tensor_1.5                reticulate_1.25          
## [137] splines_4.1.2             uwot_0.1.11              
## [139] spatstat.utils_3.0-1      pkgdown_2.0.6            
## [141] plotly_4.10.0             systemfonts_1.0.4        
## [143] xtable_1.8-4              jsonlite_1.8.0           
## [145] R6_2.5.1                  pillar_1.7.0             
## [147] htmltools_0.5.2           mime_0.12                
## [149] glue_1.6.2                fastmap_1.1.0            
## [151] BiocParallel_1.28.3       BiocNeighbors_1.12.0     
## [153] codetools_0.2-18          utf8_1.2.2               
## [155] lattice_0.20-45           bslib_0.3.1              
## [157] spatstat.sparse_2.1-1     tibble_3.1.7             
## [159] ggbeeswarm_0.6.0          leiden_0.4.2             
## [161] gtools_3.9.2.2            magick_2.7.3             
## [163] limma_3.50.1              survival_3.2-13          
## [165] CompQuadForm_1.4.3        rmarkdown_2.11           
## [167] desc_1.4.0                munsell_0.5.0            
## [169] GenomeInfoDbData_1.2.7    HDF5Array_1.22.1         
## [171] reshape2_1.4.4            gtable_0.3.0             
## [173] spatstat.core_2.4-4

Compiled: February 21, 2023