The dataset was graciously provided by Argelaguet et al. (2019).
Scripts used to process the raw data were written and maintained by Argelaguet and colleagues and reside on GitHub: https://github.com/rargelaguet/scnmt_gastrulation
For more information on the protocol, see Clark et al. (2018).
The user can see the available datasets by using the dry.run argument:
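The call that produced the listing that follows is not shown. A likely form is sketched here, assuming the SingleCellMultiModal package is installed and ExperimentHub is reachable; passing mode = "*" and version = "1.0.0" explicitly is an assumption made for clarity rather than something the text confirms:

```r
library(SingleCellMultiModal)

# Dry run: list the matching resources without downloading them.
# mode = "*" (all datasets) and version = "1.0.0" are spelled out as assumptions.
scNMT("mouse_gastrulation", mode = "*", version = "1.0.0", dry.run = TRUE)
```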
##     ah_id         mode file_size rdataclass rdatadateadded rdatadateremoved
## 1  EH3738      acc_cgi      7 Mb     matrix     2020-09-03             <NA>
## 2  EH3739     acc_CTCF    1.2 Mb     matrix     2020-09-03             <NA>
## 3  EH3740      acc_DHS    0.3 Mb     matrix     2020-09-03             <NA>
## 4  EH3741 acc_genebody   49.6 Mb     matrix     2020-09-03             <NA>
## 5  EH3742     acc_p300    0.2 Mb     matrix     2020-09-03             <NA>
## 6  EH3743 acc_promoter   27.2 Mb     matrix     2020-09-03             <NA>
## 7  EH3745      met_cgi    4.6 Mb     matrix     2020-09-03             <NA>
## 8  EH3746     met_CTCF    0.1 Mb     matrix     2020-09-03             <NA>
## 9  EH3747      met_DHS    0.1 Mb     matrix     2020-09-03             <NA>
## 10 EH3748 met_genebody   26.8 Mb     matrix     2020-09-03             <NA>
## 11 EH3749     met_p300    0.1 Mb     matrix     2020-09-03             <NA>
## 12 EH3750 met_promoter   11.5 Mb     matrix     2020-09-03             <NA>
## 13 EH3751          rna   18.6 Mb     matrix     2020-09-03             <NA>
Alternatively, simply run the scNMT function with its default arguments:
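A sketch of what such a call might look like, leaving mode and dry.run unset so that their defaults apply. The version is still passed explicitly here, on the assumption that some releases of the package may not supply a default for it:

```r
library(SingleCellMultiModal)

# mode and dry.run are left at their defaults; version is given explicitly
# as a precaution (assumption: it may be required in some package releases).
scNMT("mouse_gastrulation", version = "1.0.0")
```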
##     ah_id         mode file_size rdataclass rdatadateadded rdatadateremoved
## 1  EH3738      acc_cgi      7 Mb     matrix     2020-09-03             <NA>
## 2  EH3739     acc_CTCF    1.2 Mb     matrix     2020-09-03             <NA>
## 3  EH3740      acc_DHS    0.3 Mb     matrix     2020-09-03             <NA>
## 4  EH3741 acc_genebody   49.6 Mb     matrix     2020-09-03             <NA>
## 5  EH3742     acc_p300    0.2 Mb     matrix     2020-09-03             <NA>
## 6  EH3743 acc_promoter   27.2 Mb     matrix     2020-09-03             <NA>
## 7  EH3745      met_cgi    4.6 Mb     matrix     2020-09-03             <NA>
## 8  EH3746     met_CTCF    0.1 Mb     matrix     2020-09-03             <NA>
## 9  EH3747      met_DHS    0.1 Mb     matrix     2020-09-03             <NA>
## 10 EH3748 met_genebody   26.8 Mb     matrix     2020-09-03             <NA>
## 11 EH3749     met_p300    0.1 Mb     matrix     2020-09-03             <NA>
## 12 EH3750 met_promoter   11.5 Mb     matrix     2020-09-03             <NA>
## 13 EH3751          rna   18.6 Mb     matrix     2020-09-03             <NA>
A more recent release of the ‘mouse_gastrulation’ dataset was provided by Argelaguet and colleagues. This release includes additional cells that did not pass the quality metrics imposed on the version 1.0.0 dataset.
Use the version argument to indicate the newer dataset version (i.e., 2.0.0):
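For instance, a dry run against the newer release might look as follows. This is a sketch in which only the version value is taken from the text; keeping dry.run = TRUE explicit is an assumption for readability:

```r
library(SingleCellMultiModal)

# List the resources of the 2.0.0 release without downloading them
scNMT("mouse_gastrulation", version = "2.0.0", dry.run = TRUE)
```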
##     ah_id         mode file_size rdataclass rdatadateadded rdatadateremoved
## 1  EH3753      acc_cgi   21.1 Mb     matrix     2020-09-03             <NA>
## 2  EH3754     acc_CTCF    1.2 Mb     matrix     2020-09-03             <NA>
## 3  EH3755      acc_DHS   16.2 Mb     matrix     2020-09-03             <NA>
## 4  EH3756 acc_genebody   60.1 Mb     matrix     2020-09-03             <NA>
## 5  EH3757     acc_p300    0.2 Mb     matrix     2020-09-03             <NA>
## 6  EH3758 acc_promoter   33.8 Mb     matrix     2020-09-03             <NA>
## 7  EH3760      met_cgi   12.1 Mb     matrix     2020-09-03             <NA>
## 8  EH3761     met_CTCF    0.1 Mb     matrix     2020-09-03             <NA>
## 9  EH3762      met_DHS    3.9 Mb     matrix     2020-09-03             <NA>
## 10 EH3763 met_genebody   33.9 Mb     matrix     2020-09-03             <NA>
## 11 EH3764     met_p300    0.1 Mb     matrix     2020-09-03             <NA>
## 12 EH3765 met_promoter   18.7 Mb     matrix     2020-09-03             <NA>
## 13 EH3766          rna   43.5 Mb     matrix     2020-09-03             <NA>
To obtain the data, we can use the mode argument to select specific datasets with ‘glob’ patterns that are matched against the mode column of the listings above. For example, to retrieve the ‘genebody’ datasets for all available assays, we would pass *_genebody as the input to mode. Several patterns can be supplied at once as a character vector:
library(SingleCellMultiModal)
nmt <- scNMT("mouse_gastrulation", mode = c("*_DHS", "*_cgi", "*_genebody"),
    version = "1.0.0", dry.run = FALSE)
nmt
## Warning: sampleMap[['assay']] coerced with as.factor()
## A MultiAssayExperiment object of 6 listed
## experiments with user-defined names and respective classes.
## Containing an ExperimentList class object of length 6:
## [1] acc_DHS: matrix with 290 rows and 826 columns
## [2] met_DHS: matrix with 66 rows and 826 columns
## [3] acc_cgi: matrix with 4459 rows and 826 columns
## [4] met_cgi: matrix with 5536 rows and 826 columns
## [5] acc_genebody: matrix with 17139 rows and 826 columns
## [6] met_genebody: matrix with 15837 rows and 826 columns
## Functionality:
## experiments() - obtain the ExperimentList instance
## colData() - the primary/phenotype DataFrame
## sampleMap() - the sample coordination DataFrame
## `$`, `[`, `[[` - extract colData columns, subset, or experiment
## *Format() - convert into a long or wide DataFrame
## assays() - convert ExperimentList to a SimpleList of matrices
## exportClass() - save data to flat files
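To preview which datasets a set of glob patterns would select before downloading anything, the patterns can be translated to regular expressions in base R. The sketch below is only an illustration: match_modes is a hypothetical helper, the modes vector is copied from the dry-run listing, and the package's own matching may be implemented differently.

```r
# Modes copied from the dry-run listing of version 1.0.0
modes <- c("acc_cgi", "acc_CTCF", "acc_DHS", "acc_genebody", "acc_p300",
    "acc_promoter", "met_cgi", "met_CTCF", "met_DHS", "met_genebody",
    "met_p300", "met_promoter", "rna")

# Hypothetical helper (not part of SingleCellMultiModal):
# return the modes matched by at least one glob pattern
match_modes <- function(patterns, modes) {
    # glob2rx() converts a glob such as "*_DHS" into an anchored regex
    rx <- paste(utils::glob2rx(patterns), collapse = "|")
    modes[grepl(rx, modes)]
}

selected <- match_modes(c("*_DHS", "*_cgi", "*_genebody"), modes)
selected
```

The three patterns select six modes, which agrees with the six experiments in the MultiAssayExperiment printed above.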
The colData DataFrame within the MultiAssayExperiment object includes the variables cellID, stage, lineage10x_2, and stage_lineage. To extract this DataFrame, use colData on the MultiAssayExperiment object:
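A minimal sketch of that extraction, assuming nmt is the object returned by the scNMT call earlier in this document:

```r
library(MultiAssayExperiment)

# `nmt` is the MultiAssayExperiment created by the earlier scNMT() call
colData(nmt)
```

As the Functionality summary printed with the object indicates, a single column can also be reached with `$`, for example nmt$stage_lineage.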
## DataFrame with 826 rows and 4 columns
##                            cellID       stage     lineage10x_2
##                       <character> <character>      <character>
## E7.5_Plate1_A3     E7.5_Plate1_A3        E7.5         Endoderm
## E7.5_Plate1_H3     E7.5_Plate1_H3        E7.5         Endoderm
## E7.5_Plate1_D2     E7.5_Plate1_D2        E7.5         Endoderm
## E7.5_Plate1_D7     E7.5_Plate1_D7        E7.5         Endoderm
## E7.5_Plate1_F5     E7.5_Plate1_F5        E7.5         Endoderm
## ...                           ...         ...              ...
## PS_VE_Plate9_C11 PS_VE_Plate9_C11        E6.5         Epiblast
## PS_VE_Plate9_E11 PS_VE_Plate9_E11        E6.5         Epiblast
## PS_VE_Plate9_D11 PS_VE_Plate9_D11        E6.5 Primitive_Streak
## PS_VE_Plate9_A11 PS_VE_Plate9_A11        E6.5 Primitive_Streak
## PS_VE_Plate9_B11 PS_VE_Plate9_B11        E6.5         Mesoderm
##                          stage_lineage
##                            <character>
## E7.5_Plate1_A3           E7.5_Endoderm
## E7.5_Plate1_H3           E7.5_Endoderm
## E7.5_Plate1_D2           E7.5_Endoderm
## E7.5_Plate1_D7           E7.5_Endoderm
## E7.5_Plate1_F5           E7.5_Endoderm
## ...                                ...
## PS_VE_Plate9_C11         E6.5_Epiblast
## PS_VE_Plate9_E11         E6.5_Epiblast
## PS_VE_Plate9_D11 E6.5_Primitive_Streak
## PS_VE_Plate9_A11 E6.5_Primitive_Streak
## PS_VE_Plate9_B11         E6.5_Mesoderm
Check row annotations:
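The feature identifiers of every experiment can be listed in one step. The call is not shown in the text; the sketch below assumes it is rownames applied to the nmt object from the earlier scNMT call, which for a MultiAssayExperiment returns one character vector per experiment:

```r
library(MultiAssayExperiment)

# One set of row (feature) names per experiment in `nmt`
rownames(nmt)
```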
## CharacterList of length 6
## [["acc_DHS"]] ESC_DHS_118970 ESC_DHS_118919 ... ESC_DHS_68996 ESC_DHS_109494
## [["met_DHS"]] ESC_DHS_20778 ESC_DHS_14504 ... ESC_DHS_72133 ESC_DHS_72129
## [["acc_cgi"]] CGI_5278 CGI_6058 CGI_10627 ... CGI_7832 CGI_11329 CGI_10964
## [["met_cgi"]] CGI_3481 CGI_8941 CGI_956 CGI_9461 ... CGI_2867 CGI_3499 CGI_365
## [["acc_genebody"]] ENSMUSG00000036181 ENSMUSG00000071862 ... ENSMUSG00000025576
## [["met_genebody"]] ENSMUSG00000059334 ENSMUSG00000024026 ... ENSMUSG00000078302
The sampleMap is a graph representation of the relationships between cells and ‘assay’ datasets; each row links one cell (primary) to its column name (colname) in one assay:
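The map can be retrieved with the sampleMap accessor named in the Functionality summary of the printed object. A sketch, again assuming nmt comes from the earlier scNMT call:

```r
library(MultiAssayExperiment)

# Long-format table with one row per (assay, cell) pair
sampleMap(nmt)
```

With 826 cells present in each of the six experiments, the map is expected to have 6 x 826 = 4956 rows, which matches the output.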
## DataFrame with 4956 rows and 3 columns
##             assay                primary                colname
##          <factor>            <character>            <character>
## 1    met_genebody E4.5-5.5_new_Plate1_.. E4.5-5.5_new_Plate1_..
## 2    met_genebody E4.5-5.5_new_Plate1_.. E4.5-5.5_new_Plate1_..
## 3    met_genebody E4.5-5.5_new_Plate1_.. E4.5-5.5_new_Plate1_..
## 4    met_genebody E4.5-5.5_new_Plate1_.. E4.5-5.5_new_Plate1_..
## 5    met_genebody E4.5-5.5_new_Plate1_.. E4.5-5.5_new_Plate1_..
## ...           ...                    ...                    ...
## 4952      acc_DHS       PS_VE_Plate9_G05       PS_VE_Plate9_G05
## 4953      acc_DHS       PS_VE_Plate9_G08       PS_VE_Plate9_G08
## 4954      acc_DHS       PS_VE_Plate9_G09       PS_VE_Plate9_G09
## 4955      acc_DHS       PS_VE_Plate9_G12       PS_VE_Plate9_G12
## 4956      acc_DHS       PS_VE_Plate9_H08       PS_VE_Plate9_H08
Take a look at the cell identifiers or barcodes across assays:
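The column names play the role of cell identifiers here. Assuming nmt is the object from the earlier scNMT call, they can presumably be listed per experiment as follows:

```r
library(MultiAssayExperiment)

# One set of column (cell) names per experiment in `nmt`
colnames(nmt)
```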
## CharacterList of length 6
## [["acc_DHS"]] E4.5-5.5_new_Plate1_A02 ... PS_VE_Plate9_H08
## [["met_DHS"]] E4.5-5.5_new_Plate1_A02 ... PS_VE_Plate9_H08
## [["acc_cgi"]] E4.5-5.5_new_Plate1_A02 ... PS_VE_Plate9_H08
## [["met_cgi"]] E4.5-5.5_new_Plate1_A02 ... PS_VE_Plate9_H08
## [["acc_genebody"]] E4.5-5.5_new_Plate1_A02 ... PS_VE_Plate9_H08
## [["met_genebody"]] E4.5-5.5_new_Plate1_A02 ... PS_VE_Plate9_H08
See the accessibility levels (as proportions) for DNase Hypersensitive Sites:
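One way to pull out a corner of this matrix is the `[[` extractor listed in the Functionality summary. The original call is not shown, so the sketch below is an assumption; it takes the first six sites and the first four cells:

```r
library(MultiAssayExperiment)

# Extract the accessibility matrix for DHS regions from `nmt`,
# then show the first 6 rows (sites) and first 4 columns (cells)
acc_dhs <- nmt[["acc_DHS"]]
head(acc_dhs)[, 1:4]
```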
##                E4.5-5.5_new_Plate1_A02 E4.5-5.5_new_Plate1_A04
## ESC_DHS_118970              0.66666667                      NA
## ESC_DHS_118919              0.76190476                      NA
## ESC_DHS_66330               0.81818182               0.7142857
## ESC_DHS_43318                       NA               0.8000000
## ESC_DHS_6229                0.85714286               0.8000000
## ESC_DHS_9413                0.06666667               0.6800000
##                E4.5-5.5_new_Plate1_A07 E4.5-5.5_new_Plate1_A08
## ESC_DHS_118970                      NA               0.2631579
## ESC_DHS_118919               0.3636364               0.8421053
## ESC_DHS_66330                0.7391304               0.6086957
## ESC_DHS_43318                0.5000000               0.8888889
## ESC_DHS_6229                 0.3333333               0.7142857
## ESC_DHS_9413                 0.2142857               0.5217391
See the methylation levels (as proportions):
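The methylation matrix for the same region type can be inspected in the same way. As before, this is a sketch that assumes nmt is the object from the earlier scNMT call and that `[[` is an acceptable stand-in for whatever accessor the original used:

```r
library(MultiAssayExperiment)

# Methylation proportions at DHS regions: first 6 sites, first 4 cells.
# NA marks a site with no value recorded for that cell.
met_dhs <- nmt[["met_DHS"]]
head(met_dhs)[, 1:4]
```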
##                E4.5-5.5_new_Plate1_A02 E4.5-5.5_new_Plate1_A04
## ESC_DHS_20778                0.8000000                      NA
## ESC_DHS_14504                0.8000000                     0.8
## ESC_DHS_112143                      NA                     0.4
## ESC_DHS_34593                0.6666667                     0.6
## ESC_DHS_20747                0.4000000                     0.6
## ESC_DHS_33671                       NA                     0.6
##                E4.5-5.5_new_Plate1_A07 E4.5-5.5_new_Plate1_A08
## ESC_DHS_20778                0.8571429               0.8000000
## ESC_DHS_14504                0.8000000               0.6000000
## ESC_DHS_112143               0.5714286               0.5000000
## ESC_DHS_34593                0.7142857               0.8000000
## ESC_DHS_20747                       NA               0.6000000
## ESC_DHS_33671                0.8333333               0.6666667
For protocol information, see the references below.
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] scater_1.35.0 ggplot2_3.5.1
## [3] scran_1.35.0 scuttle_1.17.0
## [5] HDF5Array_1.35.1 rhdf5_2.51.0
## [7] DelayedArray_0.33.2 SparseArray_1.7.2
## [9] S4Arrays_1.7.1 abind_1.4-8
## [11] Matrix_1.7-1 RaggedExperiment_1.31.0
## [13] SingleCellExperiment_1.29.1 SingleCellMultiModal_1.19.1
## [15] MultiAssayExperiment_1.33.1 SummarizedExperiment_1.37.0
## [17] Biobase_2.67.0 GenomicRanges_1.59.1
## [19] GenomeInfoDb_1.43.1 IRanges_2.41.1
## [21] S4Vectors_0.45.2 BiocGenerics_0.53.3
## [23] generics_0.1.3 MatrixGenerics_1.19.0
## [25] matrixStats_1.4.1 BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] DBI_1.2.3 gridExtra_2.3 formatR_1.14
## [4] rlang_1.1.4 magrittr_2.0.3 RcppAnnoy_0.0.22
## [7] compiler_4.4.2 RSQLite_2.3.8 png_0.1-8
## [10] vctrs_0.6.5 pkgconfig_2.0.3 SpatialExperiment_1.17.0
## [13] crayon_1.5.3 fastmap_1.2.0 dbplyr_2.5.0
## [16] magick_2.8.5 XVector_0.47.0 labeling_0.4.3
## [19] utf8_1.2.4 rmarkdown_2.29 ggbeeswarm_0.7.2
## [22] UCSC.utils_1.3.0 UpSetR_1.4.0 purrr_1.0.2
## [25] bit_4.5.0 xfun_0.49 bluster_1.17.0
## [28] zlibbioc_1.52.0 cachem_1.1.0 beachmat_2.23.1
## [31] jsonlite_1.8.9 blob_1.2.4 rhdf5filters_1.19.0
## [34] Rhdf5lib_1.29.0 BiocParallel_1.41.0 irlba_2.3.5.1
## [37] parallel_4.4.2 cluster_2.1.6 R6_2.5.1
## [40] bslib_0.8.0 limma_3.63.2 jquerylib_0.1.4
## [43] Rcpp_1.0.13-1 knitr_1.49 BiocBaseUtils_1.9.0
## [46] igraph_2.1.1 tidyselect_1.2.1 viridis_0.6.5
## [49] yaml_2.3.10 codetools_0.2-20 curl_6.0.1
## [52] plyr_1.8.9 lattice_0.22-6 tibble_3.2.1
## [55] withr_3.0.2 KEGGREST_1.47.0 evaluate_1.0.1
## [58] BiocFileCache_2.15.0 ExperimentHub_2.15.0 Biostrings_2.75.1
## [61] pillar_1.9.0 BiocManager_1.30.25 filelock_1.0.3
## [64] BiocVersion_3.21.1 munsell_0.5.1 scales_1.3.0
## [67] glue_1.8.0 metapod_1.15.0 maketools_1.3.1
## [70] tools_4.4.2 AnnotationHub_3.15.0 BiocNeighbors_2.1.0
## [73] sys_3.4.3 ScaledMatrix_1.15.0 locfit_1.5-9.10
## [76] buildtools_1.0.0 grid_4.4.2 colorspace_2.1-1
## [79] edgeR_4.5.0 AnnotationDbi_1.69.0 GenomeInfoDbData_1.2.13
## [82] beeswarm_0.4.0 BiocSingular_1.23.0 vipor_0.4.7
## [85] cli_3.6.3 rsvd_1.0.5 rappdirs_0.3.3
## [88] fansi_1.0.6 viridisLite_0.4.2 dplyr_1.1.4
## [91] uwot_0.2.2 gtable_0.3.6 sass_0.4.9
## [94] digest_0.6.37 ggrepel_0.9.6 dqrng_0.4.1
## [97] farver_2.1.2 rjson_0.2.23 memoise_2.0.1
## [100] htmltools_0.5.8.1 lifecycle_1.0.4 httr_1.4.7
## [103] statmod_1.5.0 mime_0.12 bit64_4.5.2