Accessing the OncoKB API with oncoKBData

oncoKBData

Lifecycle: experimental

The aim of the package is to expose the OncoKB API through an R client. This vignette demonstrates public API access. To learn more about the OncoKB database, visit https://www.oncokb.org.

Installation

To install the development version of oncoKBData, use:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("waldronlab/oncoKBData")

Package Load

library(oncoKBData)
library(S4Vectors)

Introduction

oncoKBData provides R access to OncoKB through its public API. Access is also possible with a licensed token.

API representation

To use the OncoKB API, first instantiate an API object as provided by the rapiclient and AnVIL packages.

oncokb <- oncoKB()
## Warning in .service_validate_sha256(api_reference_url, api_reference_md5sum, : service version differs from validated version
##     service url: https://www.oncokb.org/api/v1/v2/api-docs?group=Public%20APIs
##     observed sha256: 3b593f651276b907c9c07456743b7cbb0a4cd92cb4a4046a0c53820e9d42fb79
##     expected sha256: c4bb5d9a8e130c154a22a5be604da4f3

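The warning indicates that the API definition currently served differs from the version against which the package was validated.

Note that for private API access, users must change the api. argument in the oncoKB function.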

Operations

Check available tags, operations, and descriptions as a tibble:

tags(oncokb)
## # A tibble: 26 × 3
##    tag                      operation                                    summary
##    <chr>                    <chr>                                        <chr>  
##  1 Annotations for Germline annotateMutationsByGenomicChangeGetUsingGET… annota…
##  2 Annotations for Germline annotateMutationsByGenomicChangePostUsingPO… annota…
##  3 Annotations for Germline annotateMutationsByHGVScGetUsingGET_3        annota…
##  4 Annotations for Germline annotateMutationsByHGVScPostUsingPOST_3      annota…
##  5 Annotations for Germline annotateMutationsByHGVSgGetUsingGET_3        annota…
##  6 Annotations for Germline annotateMutationsByHGVSgPostUsingPOST_3      annota…
##  7 Annotations for Somatic  annotateCopyNumberAlterationsGetUsingGET_1   annota…
##  8 Annotations for Somatic  annotateCopyNumberAlterationsPostUsingPOST_1 annota…
##  9 Annotations for Somatic  annotateMutationsByGenomicChangeGetUsingGET… annota…
## 10 Annotations for Somatic  annotateMutationsByGenomicChangePostUsingPO… annota…
## # ℹ 16 more rows
head(tags(oncokb)$operation)
## [1] "annotateMutationsByGenomicChangeGetUsingGET_3"  
## [2] "annotateMutationsByGenomicChangePostUsingPOST_3"
## [3] "annotateMutationsByHGVScGetUsingGET_3"          
## [4] "annotateMutationsByHGVScPostUsingPOST_3"        
## [5] "annotateMutationsByHGVSgGetUsingGET_3"          
## [6] "annotateMutationsByHGVSgPostUsingPOST_3"

Note: access to the annotation operations requires a token.
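To narrow the operations table to a single tag, ordinary data-frame subsetting is enough. The sketch below uses a small stand-in data frame (the name ops is hypothetical) copied from three rows of the output above, so that it runs without network access; with a live connection, tags(oncokb) would take its place.

```r
# Stand-in for tags(oncokb); rows copied from the output shown above.
ops <- data.frame(
    tag = c(
        "Annotations for Germline",
        "Annotations for Germline",
        "Annotations for Somatic"
    ),
    operation = c(
        "annotateMutationsByHGVScGetUsingGET_3",
        "annotateMutationsByHGVScPostUsingPOST_3",
        "annotateCopyNumberAlterationsGetUsingGET_1"
    ),
    stringsAsFactors = FALSE
)

# Keep only the operations listed under the somatic annotation tag.
somatic_ops <- ops$operation[ops$tag == "Annotations for Somatic"]
somatic_ops
```

The same logical subsetting applies to the tibble returned by tags(), since a tibble is also a data frame.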

Levels of Evidence

To retrieve the levels of evidence for all types (i.e., ‘therapeutic’, ‘diagnostic’, ‘prognostic’, and ‘FDA’), run the levelsOfEvidence function.

(loe <- levelsOfEvidence(oncokb))
## DataFrame with 16 rows and 4 columns
##     levelOfEvidence            description        htmlDescription    colorHex
##         <character>            <character>            <character> <character>
## 1           LEVEL_1 FDA-recognized bioma.. <span><b>FDA-recogni..     #33A02C
## 2           LEVEL_2 Standard care biomar.. <span><b>Standard ca..     #1F78B4
## 3          LEVEL_3A Compelling clinical .. <span><b>Compelling ..     #984EA3
## 4          LEVEL_3B Standard care or inv.. <span><b>Standard ca..     #BE98CE
## 5           LEVEL_4 Compelling biologica.. <span><b>Compelling ..     #424242
## ...             ...                    ...                    ...         ...
## 12        LEVEL_Px1 FDA and/or professio.. <span><b>FDA and/or ..     #33A02C
## 13        LEVEL_Px2 FDA and/or professio.. <span><b>FDA and/or ..     #1F78B4
## 14        LEVEL_Px3 Biomarker is prognos.. <span>Biomarker is p..     #984EA3
## 15         LEVEL_R1 Standard care biomar.. <span><b>Standard of..     #EE3424
## 16         LEVEL_R2 Compelling clinical .. <span><b>Compelling ..     #F79A92

The returned DataFrame carries version information in its metadata:

names(metadata(loe))
## [1] "oncoTreeVersion" "ncitVersion"     "dataVersion"     "appVersion"     
## [5] "apiVersion"      "publicInstance"  "genomeNexus"
metadata(loe)["oncoTreeVersion"]
## $oncoTreeVersion
## [1] "oncotree_2025_10_03"
metadata(loe)[["apiVersion"]]
## $version
## [1] "v1.6.0"
## 
## $major
## [1] 1
## 
## $minor
## [1] 6
## 
## $patch
## [1] 0
## 
## $suffixTokens
## list()
## 
## $stable
## [1] TRUE
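
The two extraction calls above differ only in the bracket operator: single brackets return a list of length one that keeps the element name, while double brackets return the element itself. A minimal sketch, using a plain list (the name meta is hypothetical) with values copied from the output above as a stand-in for metadata(loe):

```r
# Stand-in for metadata(loe); values copied from the output shown above.
meta <- list(
    oncoTreeVersion = "oncotree_2025_10_03",
    apiVersion = list(version = "v1.6.0", major = 1L, minor = 6L, patch = 0L)
)

# Single brackets keep the list wrapper; double brackets extract the element.
as_list <- meta["oncoTreeVersion"]
as_value <- meta[["oncoTreeVersion"]]

# Nested fields can be reached by chaining double brackets.
api <- meta[["apiVersion"]]
api_string <- sprintf(
    "%d.%d.%d", api[["major"]], api[["minor"]], api[["patch"]]
)
api_string
```

Double brackets are usually the better choice when the value is passed on to other code, for example to record the data version alongside an analysis.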

Gene tables

The API allows retrieval of curated genes, with one gene per row:

curatedGenes(oncokb)
## # A tibble: 1,127 × 13
##    grch37Isoform grch37RefSeq grch38Isoform grch38RefSeq entrezGeneId hugoSymbol
##    <chr>         <chr>        <chr>         <chr>               <int> <chr>     
##  1 ENST00000373… ""           ENST00000373… ""                  29974 A1CF      
##  2 ENST00000248… ""           ENST00000248… "NM_001087.…           14 AAMP      
##  3 ENST00000265… "NM_000927.… ENST00000622… "NM_0013489…         5243 ABCB1     
##  4 ENST00000285… "NM_003786.… ENST00000285… "NM_003786.…         8714 ABCC3     
##  5 ENST00000376… ""           ENST00000376… "NM_0010127…        10006 ABI1      
##  6 ENST00000318… "NM_005157.… ENST00000318… "NM_005157.…           25 ABL1      
##  7 ENST00000502… "NM_007314.… ENST00000502… "NM_007314.…           27 ABL2      
##  8 ENST00000321… "NM_139076.… ENST00000321… "NM_139076.…        84142 ABRAXAS1  
##  9 ENST00000353… ""           ENST00000616… "NM_198834.…           31 ACACA     
## 10 ENST00000272… "NM_020311"  ENST00000272… "NM_020311"         57007 ACKR3     
## # ℹ 1,117 more rows
## # ℹ 7 more variables: geneType <chr>, setting <chr>,
## #   highestSensitiveLevel <chr>, highestResistanceLevel <chr>, summary <chr>,
## #   background <chr>, highestResistancLevel <chr>

It also provides a longer list of cancer-associated genes, in which the same hugoSymbol can appear in multiple rows because of multiple geneAliases:

cancerGeneList(oncokb)
## # A tibble: 3,400 × 16
##    hugoSymbol entrezGeneId grch37Isoform grch37RefSeq grch38Isoform grch38RefSeq
##    <chr>             <int> <chr>         <chr>        <chr>         <chr>       
##  1 ABL1                 25 ENST00000318… NM_005157.4  ENST00000318… NM_005157.4 
##  2 ABL1                 25 ENST00000318… NM_005157.4  ENST00000318… NM_005157.4 
##  3 ABL1                 25 ENST00000318… NM_005157.4  ENST00000318… NM_005157.4 
##  4 AKT1                207 ENST00000349… NM_00101443… ENST00000349… NM_00101443…
##  5 AKT1                207 ENST00000349… NM_00101443… ENST00000349… NM_00101443…
##  6 AKT1                207 ENST00000349… NM_00101443… ENST00000349… NM_00101443…
##  7 AKT1                207 ENST00000349… NM_00101443… ENST00000349… NM_00101443…
##  8 AKT1                207 ENST00000349… NM_00101443… ENST00000349… NM_00101443…
##  9 ALK                 238 ENST00000389… NM_004304.4  ENST00000389… NM_004304.4 
## 10 AMER1            139285 ENST00000330… NM_152424.3  ENST00000374… NM_152424.3 
## # ℹ 3,390 more rows
## # ℹ 10 more variables: oncokbAnnotated <lgl>, occurrenceCount <int>,
## #   mSKImpact <lgl>, mSKHeme <lgl>, foundation <lgl>, foundationHeme <lgl>,
## #   vogelstein <lgl>, sangerCGC <lgl>, geneType <chr>, geneAliases <list>
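
Because a symbol can repeat, it may help to count rows per gene or to keep a single row for each. The sketch below is one way to do this in base R; it uses a stand-in data frame (the name genes is hypothetical) that reproduces the hugoSymbol and entrezGeneId columns of the ten rows printed above, and it is not part of the package interface.

```r
# Stand-in for cancerGeneList(oncokb); the first ten rows shown above.
reps <- c(3L, 5L, 1L, 1L)
genes <- data.frame(
    hugoSymbol = rep(c("ABL1", "AKT1", "ALK", "AMER1"), times = reps),
    entrezGeneId = rep(c(25L, 207L, 238L, 139285L), times = reps),
    stringsAsFactors = FALSE
)

# Number of rows recorded for each gene symbol.
counts <- table(genes$hugoSymbol)

# Keep only the first row seen for each symbol.
one_per_gene <- genes[!duplicated(genes$hugoSymbol), ]
one_per_gene
```

Dropping repeats this way discards whatever differs between the repeated rows, so it suits tasks that need only the gene identifiers.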

Session Information

sessionInfo()
R version 4.6.1 (2026-06-24)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 26.04 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.32.so;  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] S4Vectors_0.51.5    BiocGenerics_0.59.8 generics_0.1.4     
[4] oncoKBData_0.99.7   AnVIL_1.25.1        AnVILBase_1.7.0    
[7] dplyr_1.2.1         BiocStyle_2.41.0   

loaded via a namespace (and not attached):
 [1] utf8_1.2.6            rappdirs_0.3.4        sass_0.4.10          
 [4] tidyr_1.3.2           futile.options_1.0.1  digest_0.6.39        
 [7] magrittr_2.0.5        evaluate_1.0.5        fastmap_1.2.0        
[10] jsonlite_2.0.0        promises_1.5.0        formatR_1.14         
[13] BiocManager_1.30.27.3 httr_1.4.8            purrr_1.2.2          
[16] rapiclient_0.1.8      codetools_0.2-20      httr2_1.2.3          
[19] jquerylib_0.1.4       cli_3.6.6             shiny_1.14.0         
[22] rlang_1.2.0           futile.logger_1.4.9   cachem_1.1.0         
[25] yaml_2.3.12           otel_0.2.0            BiocBaseUtils_1.15.1 
[28] tools_4.6.1           httpuv_1.6.17         DT_0.34.0            
[31] lambda.r_1.2.4        curl_7.1.0            GCPtools_1.3.2       
[34] mime_0.13             buildtools_1.0.0      vctrs_0.7.3          
[37] R6_2.6.1              lifecycle_1.0.5       htmlwidgets_1.6.4    
[40] miniUI_0.1.2          pkgconfig_2.0.3       pillar_1.11.1        
[43] bslib_0.11.0          later_1.4.8           glue_1.8.1           
[46] Rcpp_1.1.1-1.1        xfun_0.59             tibble_3.3.1         
[49] tidyselect_1.2.1      keyring_1.4.1         sys_3.4.3            
[52] knitr_1.51            xtable_1.8-8          htmltools_0.5.9      
[55] rmarkdown_2.31        maketools_1.3.2       compiler_4.6.1