Introduction to scAnnotatR.models

Johannes Griss

2025-12-10

Introduction

The scAnnotatR.models package contains a set of pre-trained models for classifying various (immune) cell types in human data. The models are intended to be used by the scAnnotatR package.

scAnnotatR is an R package for cell type prediction on single-cell RNA-sequencing data. Currently, this package supports data in the form of a Seurat object or a SingleCellExperiment object.

If you are interested in directly applying these models to your data, please refer to the vignettes of the scAnnotatR package.

Installation

The scAnnotatR.models package is an AnnotationHub package. Normally, it is automatically loaded by the scAnnotatR package.

To load the package manually into your R session, please use the Bioconductor AnnotationHub package:

# use the AnnotationHub to load the scAnnotatR.models package
eh <- AnnotationHub::AnnotationHub()

# load the stored models
query <- AnnotationHub::query(eh, "scAnnotatR.models")
models <- query[["AH95906"]]
#> loading from cache
#> Loading required namespace: scAnnotatR
#> Warning: replacing previous import 'ape::where' by 'dplyr::where' when loading
#> 'scAnnotatR'
#> Warning: replacing previous import 'e1071::element' by 'ggplot2::element' when
#> loading 'scAnnotatR'
#> Registered S3 method overwritten by 'spatstat.explore':
#>   method   from
#>   plot.roc pROC

Data Structure

The models object is a named list containing the cell type’s name as key and the respective classifier as value:

# print the available cell types
names(models)
#>  [1] "B cells"           "Plasma cells"      "NK"               
#>  [4] "CD16 NK"           "CD56 NK"           "T cells"          
#>  [7] "CD4 T cells"       "CD8 T cells"       "Treg"             
#> [10] "NKT"               "ILC"               "Monocytes"        
#> [13] "CD14 Mono"         "CD16 Mono"         "DC"               
#> [16] "pDC"               "Endothelial cells" "LEC"              
#> [19] "VEC"               "Platelets"         "RBC"              
#> [22] "Melanocyte"        "Schwann cells"     "Pericytes"        
#> [25] "Mast cells"        "Keratinocytes"     "alpha"            
#> [28] "beta"              "delta"             "gamma"            
#> [31] "acinar"            "ductal"            "Fibroblasts"
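
Because models is a named list, indexing it with `[[` and a misspelled cell type returns NULL instead of raising an error. The following is a minimal sketch of a name-checked lookup. It uses a small stand-in list with placeholder strings rather than real classifiers, and `select_models` is a hypothetical helper, not a function of scAnnotatR or scAnnotatR.models.

```r
# Stand-in for the `models` list: a plain named list whose values are
# placeholder strings, not real scAnnotatR classifiers.
models_demo <- list(
  "B cells" = "classifier for B cells",
  "T cells" = "classifier for T cells",
  "NK"      = "classifier for NK"
)

# Hypothetical helper: return the requested entries and stop early
# if any requested name is not present in the list.
select_models <- function(models, cell_types) {
  missing_types <- setdiff(cell_types, names(models))
  if (length(missing_types) > 0) {
    stop("Unknown cell type(s): ", paste(missing_types, collapse = ", "))
  }
  models[cell_types]
}

# Select two entries by name; the result is again a named list
selected <- select_models(models_demo, c("B cells", "NK"))
names(selected)

# A misspelled name is reported instead of silently yielding NULL
lookup_error <- tryCatch(
  select_models(models_demo, "B cell"),
  error = function(e) conditionMessage(e)
)
lookup_error
```

In a live session, the same helper could presumably be applied to the models list loaded from AnnotationHub, since it relies only on the list's names.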

Each classifier is an instance of the scAnnotatR S4 class. For example:

models[['B cells']]
#> An object of class scAnnotatR for B cells 
#> * 31 marker genes applied: CD38, CD79B, CD74, CD84, RASGRP2, TCF3, SP140, MEF2C, DERL3, CD37, CD79A, POU2AF1, MVK, CD83, BACH2, LY86, CD86, SDC1, CR2, LRMP, VPREB3, IL2RA, BLK, IRF8, FLI1, MS4A1, CD14, MZB1, PTEN, CD19, MME 
#> * Predicting probability threshold: 0.5 
#> * No parent model
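
Since each classifier is an S4 object, it can be inspected with the generic tools of the methods package even without knowing the class's own accessors. The sketch below illustrates this on a hypothetical stand-in class, `DemoClassifier`. Its slot names are invented for the illustration and are an assumption; the slots of the real scAnnotatR class are not listed in this document. The marker genes and threshold are taken from the B cells output above.

```r
library(methods)

# Hypothetical stand-in class. The slot names are invented for this demo
# and are NOT claimed to match the real scAnnotatR class.
setClass(
  "DemoClassifier",
  representation(
    cell_type    = "character",
    marker_genes = "character",
    p_thres      = "numeric"
  )
)

demo <- new(
  "DemoClassifier",
  cell_type    = "B cells",
  marker_genes = c("CD19", "MS4A1", "CD79A"),
  p_thres      = 0.5
)

# Confirm that the object is an S4 instance and list its slots
isS4(demo)
slot_names <- slotNames(demo)
slot_names

# Collect every slot value into a named list for inspection
slot_values <- lapply(
  setNames(slot_names, slot_names),
  function(s) slot(demo, s)
)
str(slot_values)
```

The same calls (`isS4()`, `slotNames()`, `slot()`) work on any S4 object, so `slotNames(models[['B cells']])` would show the actual slots of a loaded classifier.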

Included models

The scAnnotatR package comes with several pre-trained models to classify cell types.

# Load the scAnnotatR package to view the models
library(scAnnotatR)

The models are loaded with load_models() and stored in the default_models object:

default_models <- load_models("default")
#> loading from cache
names(default_models)
#>  [1] "B cells"           "Plasma cells"      "NK"               
#>  [4] "CD16 NK"           "CD56 NK"           "T cells"          
#>  [7] "CD4 T cells"       "CD8 T cells"       "Treg"             
#> [10] "NKT"               "ILC"               "Monocytes"        
#> [13] "CD14 Mono"         "CD16 Mono"         "DC"               
#> [16] "pDC"               "Endothelial cells" "LEC"              
#> [19] "VEC"               "Platelets"         "RBC"              
#> [22] "Melanocyte"        "Schwann cells"     "Pericytes"        
#> [25] "Mast cells"        "Keratinocytes"     "alpha"            
#> [28] "beta"              "delta"             "gamma"            
#> [31] "acinar"            "ductal"            "Fibroblasts"

The default_models object is a named list of classifiers. Each classifier is an instance of the scAnnotatR S4 class. For example:

default_models[['B cells']]
#> An object of class scAnnotatR for B cells 
#> * 31 marker genes applied: CD38, CD79B, CD74, CD84, RASGRP2, TCF3, SP140, MEF2C, DERL3, CD37, CD79A, POU2AF1, MVK, CD83, BACH2, LY86, CD86, SDC1, CR2, LRMP, VPREB3, IL2RA, BLK, IRF8, FLI1, MS4A1, CD14, MZB1, PTEN, CD19, MME 
#> * Predicting probability threshold: 0.5 
#> * No parent model

Please refer to the scAnnotatR package documentation for detailed information about how to use these classifiers.

Session Info

sessionInfo()
#> R Under development (unstable) (2025-10-20 r88955)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] scAnnotatR_1.17.0           SingleCellExperiment_1.33.0
#>  [3] SummarizedExperiment_1.41.0 Biobase_2.71.0             
#>  [5] GenomicRanges_1.63.1        Seqinfo_1.1.0              
#>  [7] IRanges_2.45.0              S4Vectors_0.49.0           
#>  [9] BiocGenerics_0.57.0         generics_0.1.4             
#> [11] MatrixGenerics_1.23.0       matrixStats_1.5.0          
#> [13] Seurat_5.3.1                SeuratObject_5.2.0         
#> [15] sp_2.2-0                   
#> 
#> loaded via a namespace (and not attached):
#>   [1] RcppAnnoy_0.0.22       splines_4.6.0          later_1.4.4           
#>   [4] filelock_1.0.3         tibble_3.3.0           polyclip_1.10-7       
#>   [7] hardhat_1.4.2          pROC_1.19.0.1          rpart_4.1.24          
#>  [10] fastDummies_1.7.5      lifecycle_1.0.4        httr2_1.2.2           
#>  [13] globals_0.18.0         lattice_0.22-7         MASS_7.3-65           
#>  [16] magrittr_2.0.4         plotly_4.11.0          sass_0.4.10           
#>  [19] rmarkdown_2.30         jquerylib_0.1.4        yaml_2.3.11           
#>  [22] httpuv_1.6.16          otel_0.2.0             sctransform_0.4.2     
#>  [25] spam_2.11-1            spatstat.sparse_3.1-0  reticulate_1.44.1     
#>  [28] cowplot_1.2.0          pbapply_1.7-4          DBI_1.2.3             
#>  [31] RColorBrewer_1.1-3     lubridate_1.9.4        abind_1.4-8           
#>  [34] Rtsne_0.17             purrr_1.2.0            nnet_7.3-20           
#>  [37] rappdirs_0.3.3         ipred_0.9-15           lava_1.8.2            
#>  [40] data.tree_1.2.0        ggrepel_0.9.6          irlba_2.3.5.1         
#>  [43] spatstat.utils_3.2-0   listenv_0.10.0         goftest_1.2-3         
#>  [46] RSpectra_0.16-2        spatstat.random_3.4-3  fitdistrplus_1.2-4    
#>  [49] parallelly_1.45.1      codetools_0.2-20       DelayedArray_0.37.0   
#>  [52] tidyselect_1.2.1       farver_2.1.2           spatstat.explore_3.6-0
#>  [55] BiocFileCache_3.1.0    jsonlite_2.0.0         caret_7.0-1           
#>  [58] e1071_1.7-16           progressr_0.18.0       ggridges_0.5.7        
#>  [61] survival_3.8-3         iterators_1.0.14       foreach_1.5.2         
#>  [64] tools_4.6.0            ica_1.0-3              Rcpp_1.1.0.8.1        
#>  [67] glue_1.8.0             gridExtra_2.3          prodlim_2025.04.28    
#>  [70] SparseArray_1.11.8     xfun_0.54              dplyr_1.1.4           
#>  [73] withr_3.0.2            BiocManager_1.30.27    fastmap_1.2.0         
#>  [76] digest_0.6.39          timechange_0.3.0       R6_2.6.1              
#>  [79] mime_0.13              scattermore_1.2        tensor_1.5.1          
#>  [82] spatstat.data_3.1-9    dichromat_2.0-0.1      RSQLite_2.4.5         
#>  [85] tidyr_1.3.1            data.table_1.17.8      recipes_1.3.1         
#>  [88] class_7.3-23           httr_1.4.7             htmlwidgets_1.6.4     
#>  [91] S4Arrays_1.11.1        uwot_0.2.4             ModelMetrics_1.2.2.2  
#>  [94] pkgconfig_2.0.3        gtable_0.3.6           timeDate_4051.111     
#>  [97] blob_1.2.4             lmtest_0.9-40          S7_0.2.1              
#> [100] XVector_0.51.0         htmltools_0.5.9        dotCall64_1.2         
#> [103] scales_1.4.0           png_0.1-8              spatstat.univar_3.1-5 
#> [106] gower_1.0.2            knitr_1.50             reshape2_1.4.5        
#> [109] nlme_3.1-168           curl_7.0.0             proxy_0.4-27          
#> [112] cachem_1.1.0           zoo_1.8-14             stringr_1.6.0         
#> [115] BiocVersion_3.23.1     KernSmooth_2.23-26     parallel_4.6.0        
#> [118] miniUI_0.1.2           AnnotationDbi_1.73.0   pillar_1.11.1         
#> [121] grid_4.6.0             vctrs_0.6.5            RANN_2.6.2            
#> [124] promises_1.5.0         dbplyr_2.5.1           xtable_1.8-4          
#> [127] cluster_2.1.8.1        evaluate_1.0.5         cli_3.6.5             
#> [130] compiler_4.6.0         rlang_1.1.6            crayon_1.5.3          
#> [133] future.apply_1.20.1    plyr_1.8.9             stringi_1.8.7         
#> [136] deldir_2.0-4           viridisLite_0.4.2      Biostrings_2.79.2     
#> [139] lazyeval_0.2.2         spatstat.geom_3.6-1    Matrix_1.7-4          
#> [142] RcppHNSW_0.6.0         patchwork_1.3.2        bit64_4.6.0-1         
#> [145] future_1.68.0          ggplot2_4.0.1          KEGGREST_1.51.1       
#> [148] shiny_1.12.1           AnnotationHub_4.1.0    kernlab_0.9-33        
#> [151] ROCR_1.0-11            igraph_2.2.1           memoise_2.0.1         
#> [154] bslib_0.9.0            bit_4.6.0              ape_5.8-1