Introduction to the AnVILAz package

Installation

The package is not yet available from Bioconductor.

Install the development version of the AnVILAz package from GitHub with

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager", repos = "https://cran.r-project.org")
BiocManager::install("Bioconductor/AnVILAz")

Once installed, load the package with

library(AnVILAz)
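To confirm that the installation succeeded before continuing, a small check such as the following can help. This is only a sketch using base R and utils; the variable name `anvilaz_installed` is an illustrative choice and not part of the package.

```r
# Check whether AnVILAz can be found without attaching it
anvilaz_installed <- requireNamespace("AnVILAz", quietly = TRUE)

if (anvilaz_installed) {
    # Report the installed version for reference in bug reports
    message("AnVILAz version: ", as.character(utils::packageVersion("AnVILAz")))
} else {
    message("AnVILAz is not installed; see the Installation section above.")
}
```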

File Management

In this tutorial, we refer to the Azure Blob Storage service as ABS. Within the ABS, we are given access to a Container. For more information, see Microsoft’s documentation on containers and blobs.

List Azure Blob Storage Container Files

avlist()

The output of avlist corresponds to a view of the files in the Blob container on Azure. The same files can also be browsed with the Microsoft Azure Storage Explorer.

Figure: Azure Storage Explorer

Uploading a file

As an example, we load the built-in mtcars dataset, keep its first six rows, and save them as an .Rda file with save. We can then upload this file to the ABS.

data("mtcars", package = "datasets")
test <- head(mtcars)
save(test, file = "mydata.Rda")
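Before uploading, it can be worth confirming that the saved file restores the object as expected. The following is a minimal sketch using only base R; it writes to a temporary file rather than mydata.Rda, and the names `rda_file`, `check_env`, and `round_trip_ok` are illustrative.

```r
# Recreate the example object and save it to a temporary .Rda file
data("mtcars", package = "datasets")
test <- head(mtcars)
rda_file <- tempfile(fileext = ".Rda")
save(test, file = rda_file)

# Load into a separate environment so the existing `test` is not overwritten
check_env <- new.env()
loaded_names <- load(rda_file, envir = check_env)

# Names of the restored objects, and whether the data survived the round trip
loaded_names
round_trip_ok <- identical(check_env$test, test)
round_trip_ok
```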

Now we can upload the data to the analyses/ folder in the Azure Blob Storage (ABS) Container.

avcopy("mydata.Rda", "analyses/")

We can also use a small log file for demonstration purposes. The jupyter.log file is already present in our workspace directory.

avcopy("jupyter.log", "analyses/")

Deleting a file

We can remove the uploaded file with avremove by passing the relative path to the .Rda file in the Container.

avremove("analyses/mydata.Rda")

Downloading from the ABS

The reverse operation is also possible, with the remote path as the first argument and the local path as the second.

avcopy("analyses/jupyter.log", "./test/")

Folder-wise upload to ABS

To upload an entire folder, we can use avbackup. Note that the entire test folder becomes a subfolder of the remote analyses folder in this example.

avbackup("./test/", "analyses/")
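The path mapping described above can be previewed locally. The sketch below does not call AnVILAz; it builds a throwaway folder named test in a temporary directory and computes the remote locations one would expect if the folder becomes a subfolder of analyses/. The names `local_dir`, `local_files`, and `expected_remote` are illustrative.

```r
# Create a throwaway local folder that mimics ./test/
local_dir <- file.path(tempdir(), "test")
dir.create(local_dir, showWarnings = FALSE)
file.create(file.path(local_dir, "jupyter.log"))

# Paths of the local files relative to the folder being backed up
local_files <- list.files(local_dir, recursive = TRUE)

# Expected remote paths when the whole folder is nested under analyses/
expected_remote <- file.path("analyses", basename(local_dir), local_files)
expected_remote
```

Comparing these paths with the output of avlist after the backup is one way to check that the upload landed where intended.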

Folder-wise download from ABS

By default, the entire source directory will be copied to the current working directory ".", i.e., the base workspace directory.

avrestore("analyses/test")

You may also restore it to a different local folder by providing a folder name as the second argument.

avrestore("analyses/test", "test")

The DATA tab

mtcars example

First we create an example dataset for uploading to the DATA tab. We create a model_id column from the rownames.

library(dplyr)
mtcars_tbl <-
    mtcars |>
    as_tibble(rownames = "model_id") |>
    mutate(model_id = gsub(" ", "-", model_id))

Uploading data

The avtable_import command takes an existing R object (usually a tibble) and uploads it to the DATA tab in the AnVIL User Interface. The table argument sets the name of the table. We also need to provide the primaryKey, which is the name of the column that uniquely identifies each row in the data. Typically, the primaryKey column holds patient identifiers or UUIDs and is the first column of the data.

mtcars_tbl |> avtable_import(table = "testData", primaryKey = "model_id")
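Because the primaryKey must uniquely identify each row, a quick local check before importing can catch duplicated or missing identifiers. The sketch below rebuilds the example table with base R so that it runs on its own; `mtcars_df`, `key_col`, and `key_ok` are illustrative names, and the check itself is not part of AnVILAz.

```r
# Rebuild the example table with base R so the check is self-contained
data("mtcars", package = "datasets")
mtcars_df <- data.frame(
    model_id = gsub(" ", "-", rownames(mtcars)),
    mtcars,
    row.names = NULL
)

# Hypothetical variable holding the intended primaryKey column name
key_col <- "model_id"
key_values <- mtcars_df[[key_col]]

# A usable key has no missing values and no duplicates
key_ok <- !anyNA(key_values) && anyDuplicated(key_values) == 0L
key_ok
```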

Downloading data

The avtable function will pull the data from the DATA tab and represent the data locally as a tibble. It works by using the same type identifier (i.e., the table argument) that was used when the data was uploaded.

model_data <- avtable(table = "testData")
head(model_data)

Delete a row in the table

The API allows deletion of specific rows in the data using avtable_delete_values. To indicate which rows to delete, provide a value or set of values that correspond to row identifiers in the primaryKey column. In this example, we remove the AMC-Javelin entry from the data, which leaves 31 of the original 32 records.

avtable_delete_values(table = "testData", values = "AMC-Javelin")
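The expected record count can be reproduced locally without touching the DATA tab. This sketch uses base R as a stand-in for the remote deletion; `model_ids`, `to_delete`, and `remaining_ids` are illustrative names.

```r
# Identifiers as they were created for the model_id column
data("mtcars", package = "datasets")
model_ids <- gsub(" ", "-", rownames(mtcars))

# Local stand-in for the remote deletion: drop the matching identifier
to_delete <- "AMC-Javelin"
remaining_ids <- setdiff(model_ids, to_delete)

length(model_ids)       # number of records before deletion
length(remaining_ids)   # number of records after deletion
```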

Delete entire table

To remove the entire table from the DATA tab, we can use the avtable_delete function with the corresponding table name.

avtable_delete(table = "testData")

Bug Reports

If you experience issues, please feel free to contact us with a reproducible example on GitHub:

https://github.com/Bioconductor/AnVILAz/issues

Session information

## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] AnVILAz_1.1.0    BiocStyle_2.33.1
## 
## loaded via a namespace (and not attached):
##  [1] vctrs_0.6.5          httr_1.4.7           cli_3.6.3           
##  [4] knitr_1.48           rlang_1.1.4          xfun_0.48           
##  [7] jsonlite_1.8.9       glue_1.8.0           rjsoncons_1.3.1.9100
## [10] buildtools_1.0.0     htmltools_0.5.8.1    maketools_1.3.1     
## [13] BiocBaseUtils_1.7.3  sys_3.4.3            sass_0.4.9          
## [16] fansi_1.0.6          rappdirs_0.3.3       rmarkdown_2.28      
## [19] tibble_3.2.1         evaluate_1.0.1       jquerylib_0.1.4     
## [22] fastmap_1.2.0        yaml_2.3.10          lifecycle_1.0.4     
## [25] httr2_1.0.5          BiocManager_1.30.25  compiler_4.4.1      
## [28] pkgconfig_2.0.3      digest_0.6.37        R6_2.5.1            
## [31] utf8_1.2.4           pillar_1.9.0         magrittr_2.0.3      
## [34] bslib_0.8.0          tools_4.4.1          AnVILBase_0.99.32   
## [37] cachem_1.1.0