Unreasonable memory usage when using extract_array() on a sparse TileDBArray #35

@hpages

Let's first create a 28k x 15k sparse TileDBArray object:

library(HDF5Array)
library(ExperimentHub)
hub <- ExperimentHub()
brain_path <- hub[["EH1040"]]  # 1.3 Million Brain Cell Dataset
brain <- HDF5Array(brain_path, "counts")

library(TileDBArray)
path <- tempfile()

## Takes about 30-40s (resulting dataset is 154M on disk):
B <- writeTileDBArray(brain[ , 1:15000], sparse=TRUE, path=path)

dim(B)
# [1] 27998  15000

Extracting a random subset of 8000 x 5000 values uses more than 26 GB of memory on my laptop (Ubuntu Linux 24.04):

set.seed(111)
index <- list(sample(nrow(B), 8000), sample(ncol(B), 5000))
m <- extract_array(B, index)  # 'top' command reports > 26 GB of memory usage

That's A LOT!
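
For scale, here is a rough back-of-the-envelope estimate of what the dense 8000 x 5000 result itself should occupy. This is only a sketch: it ignores R object overhead and any temporary copies, and the storage type is an assumption (I did not check whether the extracted values come back as 4-byte integers or 8-byte doubles), so both cases are shown. The variable names are just for illustration.

```r
## Number of values in the extracted 8000 x 5000 subset.
n_values <- 8000 * 5000

## Size in MB if the values are 4-byte integers (assumption).
size_int_mb <- n_values * 4 / 2^20

## Size in MB if the values are 8-byte doubles (assumption).
size_dbl_mb <- n_values * 8 / 2^20

round(c(integer = size_int_mb, double = size_dbl_mb), 1)
```

Either way the result is a few hundred MB at most, so the > 26 GB reported by 'top' is on the order of 100 times the size of the array being returned.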

Trying to extract anything slightly bigger exhausts the 32 GB of RAM on my laptop and gets my R session killed (Linux OOM Killer in action).

For comparison, loading the full array in memory as an ordinary array (dense representation) only consumes 5.1 GB:

b <- as.array(B)  # 'top' command reports about 5.1 GB of memory usage

And from there I can extract the random subset very efficiently:

m2 <- extract_array(b, index)

but this of course defeats the purpose of using a sparse representation in the first place.

H.

sessionInfo():

R version 4.6.0 alpha (2026-04-05 r89793)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.4 LTS

Matrix products: default
BLAS:   /home/hpages/R/R-4.6.r89793/lib/libRblas.so 
LAPACK: /home/hpages/R/R-4.6.r89793/lib/libRlapack.so;  LAPACK version 3.12.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB              LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/Los_Angeles
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] RcppSpdlog_0.0.28           TileDBArray_1.21.1         
 [3] TENxBrainData_1.31.0        SingleCellExperiment_1.33.2
 [5] SummarizedExperiment_1.41.1 Biobase_2.71.0             
 [7] GenomicRanges_1.63.2        Seqinfo_1.1.0              
 [9] ExperimentHub_3.1.0         AnnotationHub_4.1.0        
[11] BiocFileCache_3.1.0         dbplyr_2.5.2               
[13] HDF5Array_1.39.1            h5mread_1.3.3              
[15] rhdf5_2.55.16               DelayedArray_0.37.1        
[17] SparseArray_1.11.13         S4Arrays_1.11.1            
[19] IRanges_2.45.0              abind_1.4-8                
[21] S4Vectors_0.49.1            MatrixGenerics_1.23.0      
[23] matrixStats_1.5.0           BiocGenerics_0.57.0        
[25] generics_0.1.4              Matrix_1.7-5               

loaded via a namespace (and not attached):
 [1] KEGGREST_1.51.1      httr2_1.2.2          lattice_0.22-9      
 [4] rhdf5filters_1.23.3  vctrs_0.7.2          tools_4.6.0         
 [7] curl_7.0.0           tibble_3.3.1         AnnotationDbi_1.73.1
[10] RSQLite_2.4.6        blob_1.3.0           pkgconfig_2.0.3     
[13] data.table_1.18.2.1  lifecycle_1.0.5      compiler_4.6.0      
[16] Biostrings_2.79.5    nanoarrow_0.8.0      yaml_2.3.12         
[19] pillar_1.11.1        crayon_1.5.3         cachem_1.1.0        
[22] RcppCCTZ_0.2.14      tiledb_0.33.0        tidyselect_1.2.1    
[25] dplyr_1.2.1          purrr_1.2.1          BiocVersion_3.23.1  
[28] fastmap_1.2.0        grid_4.6.0           cli_3.6.6           
[31] magrittr_2.0.5       spdl_0.0.5           withr_3.0.2         
[34] filelock_1.0.3       rappdirs_0.3.4       bit64_4.6.0-1       
[37] nanotime_0.3.13      XVector_0.51.0       httr_1.4.8          
[40] bit_4.6.0            zoo_1.8-15           png_0.1-9           
[43] memoise_2.0.1        rlang_1.2.0          Rcpp_1.1.1          
[46] glue_1.8.0           DBI_1.3.0            BiocManager_1.30.27 
[49] R6_2.6.1             Rhdf5lib_1.33.6     
