For this tutorial, we will be analyzing a dataset of Peripheral Blood Mononuclear Cells (PBMC) freely available from 10X Genomics. The dataset contains 2,700 single cells that were sequenced on the Illumina NextSeq 500. The raw data can be found here.
We start by reading in the data. The Read10X() function reads in the output of the cellranger pipeline from 10X, returning a unique molecular identifier (UMI) count matrix. The values in this matrix represent the number of molecules for each feature (i.e. gene; row) that are detected in each cell (column).
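As a minimal sketch of what Read10X() expects, the example below writes a tiny, made-up dataset in the three-file layout of older cellranger output (matrix.mtx, genes.tsv, barcodes.tsv) and reads it back. The gene IDs, barcodes, counts, and the toy_10x folder name are all hypothetical; for the real PBMC data you would point data.dir at the extracted download instead. Newer cellranger versions write gzipped files with features.tsv.gz in place of genes.tsv.

```r
library(Matrix)
library(Seurat)

# Toy counts: 3 genes x 4 cells (hypothetical values, not the PBMC data)
toy <- Matrix::sparseMatrix(
  i = c(1, 2, 3, 1),
  j = c(1, 2, 3, 4),
  x = c(5, 1, 2, 3),
  dims = c(3, 4)
)

# Write the three files found in an older-style cellranger output folder
tenx.dir <- file.path(tempdir(), "toy_10x")
dir.create(tenx.dir, showWarnings = FALSE)
invisible(Matrix::writeMM(toy, file.path(tenx.dir, "matrix.mtx")))
write.table(
  data.frame(
    id = c("ENSG01", "ENSG02", "ENSG03"),   # made-up gene IDs
    name = c("CD3D", "TCL1A", "MS4A1")
  ),
  file.path(tenx.dir, "genes.tsv"),
  sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE
)
writeLines(
  c("AAAC-1", "AAAG-1", "AACC-1", "AACG-1"),  # made-up cell barcodes
  file.path(tenx.dir, "barcodes.tsv")
)

# Read the folder back: rows are genes (second column of genes.tsv),
# columns are cell barcodes
toy.data <- Read10X(data.dir = tenx.dir)
dim(toy.data)
rownames(toy.data)
```

The returned object is a sparse matrix, so it can be indexed by gene name and barcode like an ordinary matrix.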
We next use the count matrix to create a Seurat object. The object serves as a container that holds both data (like the count matrix) and analysis (like PCA, or clustering results) for a single-cell dataset. For example, the count matrix is stored in pbmc[["RNA"]]@counts. For a technical discussion of the Seurat object structure, check out our GitHub Wiki.
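The sketch below illustrates the container idea on a small, made-up count matrix rather than the PBMC data. The gene and cell names, the counts, and the project name "toy" are hypothetical, and both filters are set to 0 so that no toy gene or cell is dropped. It uses GetAssayData() with the slot argument, which matches the Seurat 4.x release listed under Session Info; later releases may prefer a different argument name.

```r
library(Matrix)
library(Seurat)

# Toy counts: 5 genes x 6 cells (hypothetical values)
toy.counts <- Matrix::sparseMatrix(
  i = c(1, 2, 2, 3, 4, 5, 1, 3),
  j = c(1, 1, 2, 3, 4, 5, 6, 6),
  x = c(4, 1, 2, 7, 3, 1, 2, 5),
  dims = c(5, 6),
  dimnames = list(paste0("Gene", 1:5), paste0("Cell", 1:6))
)

# Wrap the raw (non-normalized) counts in a Seurat object.
# min.cells / min.features are filters; 0 keeps everything in this toy case.
toy.obj <- CreateSeuratObject(
  counts = toy.counts,
  project = "toy",
  min.cells = 0,
  min.features = 0
)

# The counts live inside the "RNA" assay of the object
stored <- GetAssayData(toy.obj, assay = "RNA", slot = "counts")
dim(stored)

# Per-cell summaries are computed on creation and kept in the metadata
head(toy.obj[[]])
```

On real data, raising min.cells and min.features removes genes detected in very few cells and cells with very few detected genes at creation time.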
What does data in a count matrix look like?
# Let's examine a few genes in the first thirty cells
The . values in the matrix represent 0s (no molecules detected). Since most values in an scRNA-seq matrix are 0, Seurat uses a sparse-matrix representation whenever possible. This results in significant memory and speed savings for Drop-seq/inDrop/10x data.
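To make the memory argument concrete, here is a rough sketch that simulates a mostly-zero count matrix and compares its dense and sparse sizes. The dimensions, the 5% non-zero fraction, and the gene and cell names are assumptions chosen for illustration, not properties of the PBMC dataset.

```r
library(Matrix)

set.seed(42)
n.genes <- 2000
n.cells <- 500

# Dense matrix of zeros, with about 5% of entries set to a positive count
dense <- matrix(
  0, nrow = n.genes, ncol = n.cells,
  dimnames = list(paste0("Gene", seq_len(n.genes)), paste0("Cell", seq_len(n.cells)))
)
nonzero <- sample(length(dense), size = length(dense) %/% 20)
dense[nonzero] <- rpois(length(nonzero), lambda = 2) + 1

# Same values, stored as a column-compressed sparse matrix
sparse <- Matrix(dense, sparse = TRUE)

# Zeros are printed as "." in the sparse representation
sparse[c("Gene1", "Gene2", "Gene3"), 1:30]

# Compare memory footprints of the two representations
dense.size <- object.size(dense)
sparse.size <- object.size(sparse)
size.ratio <- as.numeric(dense.size) / as.numeric(sparse.size)
size.ratio
```

The sparse form stores only the non-zero entries and their positions, so the saving grows as the fraction of zeros increases.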
Session Info
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 22000)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=Chinese (Simplified)_China.936
## [2] LC_CTYPE=Chinese (Simplified)_China.936
## [3] LC_MONETARY=Chinese (Simplified)_China.936
## [4] LC_NUMERIC=C
## [5] LC_TIME=Chinese (Simplified)_China.936
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] patchwork_1.1.1 sp_1.5-0 SeuratObject_4.1.0 Seurat_4.1.1
## [5] dplyr_1.0.9
##
## loaded via a namespace (and not attached):
## [1] Rtsne_0.16 colorspace_2.0-3 deldir_1.0-6
## [4] ellipsis_0.3.2 ggridges_0.5.3 rprojroot_2.0.3
## [7] fs_1.5.2 spatstat.data_3.0-0 rstudioapi_0.13
## [10] leiden_0.4.2 listenv_0.8.0 ggrepel_0.9.1
## [13] fansi_1.0.3 codetools_0.2-18 splines_4.1.2
## [16] cachem_1.0.6 knitr_1.37 polyclip_1.10-0
## [19] jsonlite_1.8.0 ica_1.0-2 cluster_2.1.2
## [22] png_0.1-7 rgeos_0.5-9 uwot_0.1.11
## [25] spatstat.sparse_2.1-1 shiny_1.7.1 sctransform_0.3.3
## [28] compiler_4.1.2 httr_1.4.3 assertthat_0.2.1
## [31] Matrix_1.4-0 fastmap_1.1.0 lazyeval_0.2.2
## [34] cli_3.2.0 later_1.3.0 formatR_1.11
## [37] htmltools_0.5.2 tools_4.1.2 igraph_1.3.2
## [40] gtable_0.3.0 glue_1.6.2 RANN_2.6.1
## [43] reshape2_1.4.4 Rcpp_1.0.8.3 scattermore_0.8
## [46] jquerylib_0.1.4 pkgdown_2.0.6 vctrs_0.4.1
## [49] nlme_3.1-155 progressr_0.10.1 lmtest_0.9-40
## [52] spatstat.random_2.2-0 xfun_0.29 stringr_1.4.0
## [55] globals_0.15.0 mime_0.12 miniUI_0.1.1.1
## [58] lifecycle_1.0.1 irlba_2.3.5 goftest_1.2-3
## [61] future_1.26.1 MASS_7.3-55 zoo_1.8-10
## [64] scales_1.2.0 spatstat.core_2.4-4 spatstat.utils_3.0-1
## [67] ragg_1.2.2 promises_1.2.0.1 parallel_4.1.2
## [70] RColorBrewer_1.1-3 yaml_2.3.6 gridExtra_2.3
## [73] memoise_2.0.1 reticulate_1.25 pbapply_1.5-0
## [76] ggplot2_3.3.6 sass_0.4.1 rpart_4.1.16
## [79] stringi_1.7.6 desc_1.4.0 rlang_1.0.2
## [82] pkgconfig_2.0.3 systemfonts_1.0.4 matrixStats_0.62.0
## [85] evaluate_0.15 lattice_0.20-45 tensor_1.5
## [88] ROCR_1.0-11 purrr_0.3.4 htmlwidgets_1.5.4
## [91] cowplot_1.1.1 tidyselect_1.1.2 parallelly_1.32.0
## [94] RcppAnnoy_0.0.19 plyr_1.8.7 magrittr_2.0.3
## [97] R6_2.5.1 generics_0.1.2 DBI_1.1.2
## [100] mgcv_1.8-39 pillar_1.7.0 fitdistrplus_1.1-8
## [103] abind_1.4-5 survival_3.2-13 tibble_3.1.7
## [106] future.apply_1.9.0 crayon_1.5.1 KernSmooth_2.23-20
## [109] utf8_1.2.2 spatstat.geom_2.4-0 plotly_4.10.0
## [112] rmarkdown_2.11 grid_4.1.2 data.table_1.14.2
## [115] digest_0.6.29 xtable_1.8-4 tidyr_1.2.0
## [118] httpuv_1.6.5 textshaping_0.3.6 munsell_0.5.0
## [121] viridisLite_0.4.0 bslib_0.3.1