Accessing the OncoKB API with oncoKBData

oncoKBData

Lifecycle: experimental

The aim of the package is to expose the OncoKB API through an R client. This vignette demonstrates public API access. To learn more about the OncoKB database, visit https://www.oncokb.org.

Installation

To install the development version of oncoKBData, use:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("waldronlab/oncoKBData")

Package Load

library(oncoKBData)
library(S4Vectors)

Introduction

The oncoKBData package provides access to the OncoKB API through its public endpoints. Access to the private API is also possible with a licensed token.

API representation

To use the OncoKB API, we must first instantiate an API object, as provided by the rapiclient and AnVIL packages.

oncokb <- oncoKB()
## Warning in .service_validate_sha256(api_reference_url, api_reference_md5sum, : service version differs from validated version
##     service url: https://www.oncokb.org/api/v1/v2/api-docs?group=Public%20APIs
##     observed sha256: 436004d847c3e706a2083ddfce8e346cbcef8aa8494e168bcddd0872dde146cd
##     expected sha256: c4bb5d9a8e130c154a22a5be604da4f3

Note that private API access requires changing the ‘api.’ argument of the oncoKB function.

Operations

Check available tags, operations, and descriptions as a tibble:

tags(oncokb)
## # A tibble: 26 × 3
##    tag                      operation                                    summary
##    <chr>                    <chr>                                        <chr>  
##  1 Annotations for Germline annotateMutationsByGenomicChangeGetUsingGET… annota…
##  2 Annotations for Germline annotateMutationsByGenomicChangePostUsingPO… annota…
##  3 Annotations for Germline annotateMutationsByHGVScGetUsingGET_3        annota…
##  4 Annotations for Germline annotateMutationsByHGVScPostUsingPOST_3      annota…
##  5 Annotations for Germline annotateMutationsByHGVSgGetUsingGET_3        annota…
##  6 Annotations for Germline annotateMutationsByHGVSgPostUsingPOST_3      annota…
##  7 Annotations for Somatic  annotateCopyNumberAlterationsGetUsingGET_1   annota…
##  8 Annotations for Somatic  annotateCopyNumberAlterationsPostUsingPOST_1 annota…
##  9 Annotations for Somatic  annotateMutationsByGenomicChangeGetUsingGET… annota…
## 10 Annotations for Somatic  annotateMutationsByGenomicChangePostUsingPO… annota…
## # ℹ 16 more rows
head(tags(oncokb)$operation)
## [1] "annotateMutationsByGenomicChangeGetUsingGET_3"  
## [2] "annotateMutationsByGenomicChangePostUsingPOST_3"
## [3] "annotateMutationsByHGVScGetUsingGET_3"          
## [4] "annotateMutationsByHGVScPostUsingPOST_3"        
## [5] "annotateMutationsByHGVSgGetUsingGET_3"          
## [6] "annotateMutationsByHGVSgPostUsingPOST_3"

Note: access to the annotation operations of the API requires a token.
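Since the table of tags and operations is an ordinary tibble, the operations listed under one tag can be picked out with standard subsetting. The sketch below is only an illustration: it uses a small stand-in data frame (tag_tbl, a hypothetical name) holding a few of the tag and operation values printed above, so that it runs without network access. With a live connection, the same subsetting should apply to the result of tags(oncokb).

```r
# Stand-in for tags(oncokb): a few rows copied from the printed output above.
# `tag_tbl` is a hypothetical name used only for this illustration.
tag_tbl <- data.frame(
    tag = c(
        "Annotations for Germline", "Annotations for Germline",
        "Annotations for Somatic", "Annotations for Somatic"
    ),
    operation = c(
        "annotateMutationsByHGVScGetUsingGET_3",
        "annotateMutationsByHGVScPostUsingPOST_3",
        "annotateCopyNumberAlterationsGetUsingGET_1",
        "annotateCopyNumberAlterationsPostUsingPOST_1"
    ),
    stringsAsFactors = FALSE
)

# Keep only the operations listed under the somatic annotation tag
somatic_ops <- tag_tbl$operation[tag_tbl$tag == "Annotations for Somatic"]

# Separate GET-style from POST-style operations by their name pattern
get_ops <- grep("UsingGET", somatic_ops, value = TRUE)
post_ops <- grep("UsingPOST", somatic_ops, value = TRUE)
```

Matching on the operation name is a convenience that relies on the naming pattern visible in the output; it is not a documented contract of the API.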

Levels of Evidence

To retrieve the levels of evidence for all types (i.e., ‘therapeutic’, ‘diagnostic’, ‘prognostic’, and ‘FDA’), run the levelsOfEvidence function.

(loe <- levelsOfEvidence(oncokb))
## DataFrame with 16 rows and 4 columns
##     levelOfEvidence            description        htmlDescription    colorHex
##         <character>            <character>            <character> <character>
## 1           LEVEL_1 FDA-recognized bioma.. <span><b>FDA-recogni..     #33A02C
## 2           LEVEL_2 Standard care biomar.. <span><b>Standard ca..     #1F78B4
## 3          LEVEL_3A Compelling clinical .. <span><b>Compelling ..     #984EA3
## 4          LEVEL_3B Standard care or inv.. <span><b>Standard ca..     #BE98CE
## 5           LEVEL_4 Compelling biologica.. <span><b>Compelling ..     #424242
## ...             ...                    ...                    ...         ...
## 12        LEVEL_Px1 FDA and/or professio.. <span><b>FDA and/or ..     #33A02C
## 13        LEVEL_Px2 FDA and/or professio.. <span><b>FDA and/or ..     #1F78B4
## 14        LEVEL_Px3 Biomarker is prognos.. <span>Biomarker is p..     #984EA3
## 15         LEVEL_R1 Standard care biomar.. <span><b>Standard of..     #EE3424
## 16         LEVEL_R2 Compelling clinical .. <span><b>Compelling ..     #F79A92
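The levels of one type can be selected from this table by matching on the level name. The sketch below assumes that the "LEVEL_Px" prefix marks the prognostic levels, which the truncated descriptions above suggest but do not state in full. It works on a stand-in data frame (loe_tbl, a hypothetical name) built from rows 12 to 16 of the output, using only the two columns that are printed without truncation.

```r
# Stand-in for levelsOfEvidence(oncokb): rows 12 to 16 of the output above,
# restricted to the two columns that are printed in full.
# `loe_tbl` is a hypothetical name used only for this illustration.
loe_tbl <- data.frame(
    levelOfEvidence = c(
        "LEVEL_Px1", "LEVEL_Px2", "LEVEL_Px3", "LEVEL_R1", "LEVEL_R2"
    ),
    colorHex = c("#33A02C", "#1F78B4", "#984EA3", "#EE3424", "#F79A92"),
    stringsAsFactors = FALSE
)

# Select the levels whose name starts with the "LEVEL_Px" prefix
# (assumed here to denote the prognostic levels)
is_px <- startsWith(loe_tbl$levelOfEvidence, "LEVEL_Px")
px_levels <- loe_tbl[is_px, ]

# Named vector mapping each selected level to its display colour
px_colors <- setNames(px_levels$colorHex, px_levels$levelOfEvidence)
```

A named colour vector of this kind could be handed to a plotting function so that figures reuse the colours OncoKB assigns to each level.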

The levelsOfEvidence function returns a DataFrame that carries version information in its metadata:

names(metadata(loe))
## [1] "oncoTreeVersion" "ncitVersion"     "dataVersion"     "appVersion"     
## [5] "apiVersion"      "publicInstance"  "genomeNexus"
metadata(loe)["oncoTreeVersion"]
## $oncoTreeVersion
## [1] "oncotree_2025_10_03"
metadata(loe)[["apiVersion"]]
## $version
## [1] "v1.6.0"
## 
## $major
## [1] 1
## 
## $minor
## [1] 6
## 
## $patch
## [1] 0
## 
## $suffixTokens
## list()
## 
## $stable
## [1] TRUE
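The two calls above differ in what they return: single brackets keep the list wrapper, while double brackets extract the element itself. The sketch below shows this on a plain list (meta, a hypothetical name) that mimics two of the metadata fields printed above. It stands in for metadata(loe) so that it runs without the API.

```r
# Stand-in for metadata(loe): a plain list with two of the fields shown above.
# `meta` is a hypothetical name used only for this illustration.
meta <- list(
    oncoTreeVersion = "oncotree_2025_10_03",
    apiVersion = list(version = "v1.6.0", major = 1, minor = 6, patch = 0)
)

# Single brackets keep the list wrapper (a list of length one)
wrapped <- meta["oncoTreeVersion"]

# Double brackets extract the element itself (a character string)
onco_tree <- meta[["oncoTreeVersion"]]

# Nested fields can be reached by chaining `[[` or `$`
api_version <- meta[["apiVersion"]][["version"]]
api_major <- meta$apiVersion$major
```

Recording values such as the data and API versions alongside analysis results is one way to make clear which OncoKB release a result came from.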

Gene tables

The API allows retrieval of the curated genes, with a single gene per row:

curatedGenes(oncokb)
## # A tibble: 1,121 × 13
##    grch37Isoform grch37RefSeq grch38Isoform grch38RefSeq entrezGeneId hugoSymbol
##    <chr>         <chr>        <chr>         <chr>               <int> <chr>     
##  1 ENST00000373… ""           ENST00000373… ""                  29974 A1CF      
##  2 ENST00000248… ""           ENST00000248… "NM_001087.…           14 AAMP      
##  3 ENST00000265… "NM_000927.… ENST00000622… "NM_0013489…         5243 ABCB1     
##  4 ENST00000285… "NM_003786.… ENST00000285… "NM_003786.…         8714 ABCC3     
##  5 ENST00000318… "NM_005157.… ENST00000318… "NM_005157.…           25 ABL1      
##  6 ENST00000502… "NM_007314.… ENST00000502… "NM_007314.…           27 ABL2      
##  7 ENST00000321… "NM_139076.… ENST00000321… "NM_139076.…        84142 ABRAXAS1  
##  8 ENST00000353… ""           ENST00000616… "NM_198834.…           31 ACACA     
##  9 ENST00000272… "NM_020311"  ENST00000272… "NM_020311"         57007 ACKR3     
## 10 ENST00000336… "NM_001099.… ENST00000336… "NM_001099.…           55 ACP3      
## # ℹ 1,111 more rows
## # ℹ 7 more variables: geneType <chr>, setting <chr>,
## #   highestSensitiveLevel <chr>, highestResistanceLevel <chr>, summary <chr>,
## #   background <chr>, highestResistancLevel <chr>

It also provides a longer list of genes associated with cancer, in which the same hugoSymbol can appear in multiple rows because of multiple geneAliases:

cancerGeneList(oncokb)
## # A tibble: 3,395 × 16
##    hugoSymbol entrezGeneId grch37Isoform grch37RefSeq grch38Isoform grch38RefSeq
##    <chr>             <int> <chr>         <chr>        <chr>         <chr>       
##  1 ABL1                 25 ENST00000318… NM_005157.4  ENST00000318… NM_005157.4 
##  2 ABL1                 25 ENST00000318… NM_005157.4  ENST00000318… NM_005157.4 
##  3 ABL1                 25 ENST00000318… NM_005157.4  ENST00000318… NM_005157.4 
##  4 AKT1                207 ENST00000349… NM_00101443… ENST00000349… NM_00101443…
##  5 AKT1                207 ENST00000349… NM_00101443… ENST00000349… NM_00101443…
##  6 AKT1                207 ENST00000349… NM_00101443… ENST00000349… NM_00101443…
##  7 AKT1                207 ENST00000349… NM_00101443… ENST00000349… NM_00101443…
##  8 AKT1                207 ENST00000349… NM_00101443… ENST00000349… NM_00101443…
##  9 ALK                 238 ENST00000389… NM_004304.4  ENST00000389… NM_004304.4 
## 10 AMER1            139285 ENST00000330… NM_152424.3  ENST00000374… NM_152424.3 
## # ℹ 3,385 more rows
## # ℹ 10 more variables: oncokbAnnotated <lgl>, occurrenceCount <int>,
## #   mSKImpact <lgl>, mSKHeme <lgl>, foundation <lgl>, foundationHeme <lgl>,
## #   vogelstein <lgl>, sangerCGC <lgl>, geneType <chr>, geneAliases <list>
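Because a gene symbol can occupy several rows in this table, it can help to count the rows per symbol or to reduce the table to one row per gene before joining it with other data. The sketch below does both on a stand-in data frame (gene_tbl, a hypothetical name) built from the first ten rows printed above. Keeping the first occurrence of each symbol is one simple choice for illustration, not a recommendation from the package.

```r
# Stand-in for cancerGeneList(oncokb): the first ten rows of the output above,
# restricted to two columns that are printed in full.
# `gene_tbl` is a hypothetical name used only for this illustration.
gene_tbl <- data.frame(
    hugoSymbol = c(
        "ABL1", "ABL1", "ABL1", "AKT1", "AKT1",
        "AKT1", "AKT1", "AKT1", "ALK", "AMER1"
    ),
    entrezGeneId = c(
        25L, 25L, 25L, 207L, 207L,
        207L, 207L, 207L, 238L, 139285L
    ),
    stringsAsFactors = FALSE
)

# Number of rows per gene symbol
rows_per_gene <- table(gene_tbl$hugoSymbol)

# One row per gene: keep the first occurrence of each symbol
unique_genes <- gene_tbl[!duplicated(gene_tbl$hugoSymbol), ]
```

Dropping the repeated rows discards whatever distinguishes them, so any column that varies between rows of the same gene should be inspected before deduplicating.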

Session Information

sessionInfo()
R version 4.6.0 (2026-04-24)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] S4Vectors_0.51.3    BiocGenerics_0.59.6 generics_0.1.4     
[4] oncoKBData_0.99.7   AnVIL_1.25.1        AnVILBase_1.7.0    
[7] dplyr_1.2.1         BiocStyle_2.41.0   

loaded via a namespace (and not attached):
 [1] utf8_1.2.6           rappdirs_0.3.4       sass_0.4.10         
 [4] tidyr_1.3.2          futile.options_1.0.1 digest_0.6.39       
 [7] magrittr_2.0.5       evaluate_1.0.5       fastmap_1.2.0       
[10] jsonlite_2.0.0       promises_1.5.0       formatR_1.14        
[13] BiocManager_1.30.27  httr_1.4.8           purrr_1.2.2         
[16] rapiclient_0.1.8     codetools_0.2-20     httr2_1.2.2         
[19] jquerylib_0.1.4      cli_3.6.6            shiny_1.13.0        
[22] rlang_1.2.0          futile.logger_1.4.9  cachem_1.1.0        
[25] yaml_2.3.12          otel_0.2.0           BiocBaseUtils_1.15.1
[28] tools_4.6.0          httpuv_1.6.17        DT_0.34.0           
[31] lambda.r_1.2.4       curl_7.1.0           GCPtools_1.3.2      
[34] mime_0.13            buildtools_1.0.0     vctrs_0.7.3         
[37] R6_2.6.1             lifecycle_1.0.5      htmlwidgets_1.6.4   
[40] miniUI_0.1.2         pkgconfig_2.0.3      pillar_1.11.1       
[43] bslib_0.11.0         later_1.4.8          glue_1.8.1          
[46] Rcpp_1.1.1-1.1       xfun_0.58            tibble_3.3.1        
[49] tidyselect_1.2.1     keyring_1.4.1        sys_3.4.3           
[52] knitr_1.51           xtable_1.8-8         htmltools_0.5.9     
[55] rmarkdown_2.31       maketools_1.3.2      compiler_4.6.0