--- title: "Introduction to the AnVILAz package" author: - name: Marcel Ramos affiliation: Roswell Park Comprehensive Cancer Center email: marcel.ramos@sph.cuny.edu - name: Martin Morgan affiliation: Roswell Park Comprehensive Cancer Center email: Martin.Morgan@RoswellPark.org output: BiocStyle::html_document: self_contained: yes toc: true toc_float: true toc_depth: 2 code_folding: show date: "`r doc_date()`" package: "`r pkg_ver('AnVILAz')`" vignette: > %\VignetteIndexEntry{Introduction to the AnVILAz package} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} az_ok <- AnVILAz::az_health_check() knitr::opts_chunk$set( collapse = TRUE, ## Related to https://stat.ethz.ch/pipermail/bioc-devel/2020-April/016656.html crop = NULL, eval = az_ok ) options(width = 75) ``` # Installation The package is not yet available from [Bioconductor](https://bioconductor.org). Install the development version of the _AnVILAz_ package from GitHub with ```{r install, eval = FALSE} if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager", repos = "https://cran.r-project.org") BiocManager::install("Bioconductor/AnVILAz") ``` Once installed, load the package with ```{r library, message = FALSE, eval = TRUE, cache = FALSE} library(AnVILAz) ``` # File Management For this tutorial we will refer to the Azure Blob Storage service as ABS. Within the ABS, we are given access to a Container. For more information, follow this [link](https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction#containers) to Microsoft's definition of containers and blobs. ## List Azure Blob Storage Container Files ```{r, eval=az_ok} avlist() ``` The `avlist` command corresponds to a view of the files in the Blob container on Azure. They can also be accessed via the [Microsoft Azure Storage Explorer](https://azure.microsoft.com/en-us/products/storage/storage-explorer). ![Azure Storage Explorer](AzStorageExplorer.png) ## Uploading a file As an example, we load the internal `mtcars` dataset and save it as an `.Rda` file with `save`. We can then upload this file to the ABS. ```{r, eval=az_ok} data("mtcars", package = "datasets") test <- head(mtcars) save(test, file = "mydata.Rda") ``` Now we can upload the data to the `analyses/` folder in the Azure Blob Storage (ABS) Container. ```{r, eval=az_ok} avcopy("mydata.Rda", "analyses/") ``` We can also use a small log file for demonstration purposes. The `jupyter.log` file is already present in our workspace directory. ```{r, eval=az_ok} avcopy("jupyter.log", "analyses/") ``` ## Deleting a file We can remove the data with `avremove` and the _relative_ path to the `.Rda` file. ```{r, eval=az_ok} avremove("analyses/mydata.Rda") ``` ## Downloading from the ABS The reverse operation is also possible with a remote and local paths as the first and second arguments, respectively. ```{r, eval=az_ok} avcopy("analyses/jupyter.log", "./test/") ``` ## Folder-wise upload to ABS To upload an entire folder, we can use `avbackup`. Note that the entire `test` folder becomes a subfolder of the remote `analyses` folder in this example. ```{r, eval=az_ok} avbackup("./test/", "analyses/") ``` ## Folder-wise download from ABS By default, the entire `source` directory will be copied to the current working directory `"."`, i.e., the base workspace directory. ```{r, eval=az_ok} avrestore("analyses/test") ``` You may also move this to another folder by providing a folder name as the second argument. 
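As a quick check that the folder-wise download worked, the restored files can
be listed locally with base R. This is a minimal sketch: it assumes that
`avrestore()` placed the contents of the remote `analyses/test` folder into a
local `test/` directory under the workspace directory.

```{r, eval = FALSE}
## List the restored files; assumes the remote `analyses/test` folder
## was copied to ./test/ in the current workspace directory.
list.files("test", recursive = TRUE)
```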
```{r, eval=az_ok} avrestore("analyses/test", "test") ``` # The `DATA` tab ## `mtcars` example First we create an example dataset for uploading to the `DATA` tab. We create a `model_id` column from the `rownames`. ```{r, eval=az_ok} library(dplyr) mtcars_tbl <- mtcars |> as_tibble(rownames = "model_id") |> mutate(model_id = gsub(" ", "-", model_id)) ``` ## Uploading data The `avtable_import` command takes an existing R object (usually a `tibble`) and uploads to the `DATA` tab in the AnVIL User Interface. The `table` argument will set the name of the table. We also need to provide the `primaryKey` which corresponds to the column name that uniquely identifies each row in the data. Typically, the `primaryKey` column provides a list of patient or UUID identifiers and is in the first column of the data. ```{r, eval=az_ok} mtcars_tbl |> avtable_import(table = "testData", primaryKey = "model_id") ``` ## Downloading data The `avtable` function will pull the data from the `DATA` tab and represent the data locally as a `tibble`. It works by using the same `type` identifier (i.e., the `table` argument) that was used when the data was uploaded. ```{r, eval=az_ok} model_data <- avtable(table = "testData") head(model_data) ``` ## Delete a row in the table The API allows deletion of specific rows in the data using `avtable_delete_values`. To indicate which row to delete, provide the a value or set of values that correspond to row identifiers in the `primaryKey`. In this example, we remove the `AMC-Javelin` entry from the data. We are left with 31 records. ```{r, eval=az_ok} avtable_delete_values(table = "testData", values = "AMC-Javelin") ``` ## Delete entire table To remove the entire table from the `DATA` tab, we can use the `avtable_delete` method with the corresponding table name. ```{r, eval=az_ok} avtable_delete(table = "testData") ``` # Bug Reports If you experience issues, please feel free to contact us with a reproducible example on GitHub: # Session information {.unnumbered} ```{r sessionInfo, echo = FALSE, eval = TRUE} sessionInfo() ```