Title: | Collection of simple tools for learning about Bioconductor Packages |
---|---|
Description: | Bioconductor has a rich ecosystem of metadata around packages, usage, and build status. This package is a simple collection of functions to access that metadata from R. The goal is to expose metadata for data mining and value-added functionality such as package searching, text mining, and analytics on packages. |
Authors: | Shian Su [aut, ctb], Lori Shepherd [ctb], Marcel Ramos [aut, ctb], Felix G.M. Ernst [ctb], Jennifer Wokaty [ctb], Charlotte Soneson [ctb], Martin Morgan [ctb], Vince Carey [ctb], Sean Davis [aut, cre] |
Maintainer: | Sean Davis <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.23.1 |
Built: | 2024-11-01 06:28:04 UTC |
Source: | https://github.com/seandavi/BiocPkgTools |
Get the ORCID iD from the cre field of Authors@R in a packageDescription result
.get_cre_orcid(pkgname)
pkgname |
character(1) |
Process employment data from ORCID
.get_orcid_rec(orcid, rename = TRUE)
orcid |
character(1) |
rename |
logical(1) if TRUE use short names |
This function uses the gh package to get a list of either issues, pull requests, or GitHub commits since the specified date for a particular GitHub repository. The repository must have both the username / organization and the name, e.g., "Bioconductor/S4Vectors".
activitySince(
  gh_repo,
  activity = c("issues", "pulls", "commits"),
  status = c("closed", "open", "all"),
  Date,
  issue_metadata = c("created_at", "number", "title"),
  token = NULL
)
gh_repo |
character(1) The GitHub repository location including the username / organization and the repository name, e.g., "Bioconductor/S4Vectors" |
activity |
character(1) The type of repository activity to pull from the GitHub API. It can be one of "issues" (default), "pulls", or "commits". |
status |
character(1) One of 'closed', 'open', or 'all' corresponding to the issue state desired from the GitHub API (Default: "closed"). This argument is ignored for the "commits" activity report. |
Date |
character(1) The date cutoff from which to analyze closed issues in the YYYY-MM-DD or YYYY-MM-DDTHH:MM:SSZ format (ISO 8601). |
issue_metadata |
character() The metadata labels to extract from the
|
token |
character(1) For big requests, e.g., commit history, you may be prompted to use a GitHub Personal Access Token. Enter the token as plain text. |
The tibble returned by the commits activity report contains five columns:
'committer_date'
'commit' - hash
'parents' - hash of parent for merge commits
'author'
'message'
For information on other columns, refer to the GitHub API under repository issues or pulls (e.g., /repos/:repo/issues).
A tibble with three columns corresponding to issue metadata (i.e., "created_at", "number", "title")
if (interactive()) {
  activitySince("Bioconductor/S4Vectors", "issues", "closed", "2021-05-01")
  activitySince("Bioconductor/S4Vectors", "issues", "open", "2022-05-01")
  activitySince("Bioconductor/S4Vectors", "commits", Date = "2022-05-01")
}
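The effect of the Date cutoff can be illustrated offline. This is a minimal sketch, assuming a toy table with the three default issue_metadata columns; the helper since_date() and all values are hypothetical and are not part of BiocPkgTools or the GitHub API.

```r
# Toy issue table with the three default metadata columns (made-up values)
issues <- data.frame(
    created_at = c("2021-04-30T23:59:59Z", "2021-05-01T00:00:00Z",
                   "2022-01-15T12:30:00Z"),
    number = c(10L, 11L, 12L),
    title = c("old issue", "cutoff issue", "recent issue")
)

# Hypothetical helper: accept YYYY-MM-DD or YYYY-MM-DDTHH:MM:SSZ and keep
# the rows created on or after the cutoff
since_date <- function(df, Date) {
    # Expand a date-only cutoff to midnight UTC
    if (grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", Date))
        Date <- paste0(Date, "T00:00:00Z")
    fmt <- "%Y-%m-%dT%H:%M:%SZ"
    cutoff <- as.POSIXct(Date, format = fmt, tz = "UTC")
    created <- as.POSIXct(df$created_at, format = fmt, tz = "UTC")
    df[created >= cutoff, , drop = FALSE]
}

recent <- since_date(issues, "2021-05-01")
recent$number
```

Both accepted date formats are normalized to the same timestamp representation before comparison, so a date-only cutoff behaves like midnight UTC of that day.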
Get download statistics for Bioconductor packages distributed via Anaconda.
anacondaDownloadStats()
Anaconda provides daily download counts for all software packages it distributes. These are summarised into monthly tables of counts and made available from https://github.com/grimbough/anaconda-download-stats. This function provides a mechanism to download these monthly counts for Bioconductor packages distributed through Anaconda.
A data.frame of download statistics for all Bioconductor packages distributed by Anaconda, in tidy format.
Note: Anaconda does not provide counts for unique IP addresses. This column is listed as NA for all packages to provide continuity with data from Bioconductor.org obtained by biocDownloadStats. The counts are updated monthly, so do not expect to see counts for the current month.
Mike L. Smith
anacondaDownloadStats()
The biocBuildEmail function provides a template for notifying maintainers of errors in the Bioconductor Build System (BBS). This convenience function returns the body of the email from a template within the package and provides a copy in the clipboard.
biocBuildEmail(
  pkg,
  version = c("release", "devel"),
  PS = character(1L),
  dry.run = TRUE,
  to = NULL,
  cc = NULL,
  bcc = NULL,
  emailTemplate = templatePath(),
  core.name = NULL,
  core.email = NULL,
  core.id = NULL,
  textOnly = FALSE,
  resend = FALSE,
  verbose = FALSE,
  credFile = "~/.blastula_creds"
)

sentHistory()
pkg |
character(1) The name of the package in trouble |
version |
character() A vector indicating which version of Bioconductor the package is failing in (either 'release' or 'devel'; defaults to both) |
PS |
character(1) Postscript, an additional note to the recipient of the email (i.e., the package maintainer) |
dry.run |
logical(1) Display the email without sending to the recipient.
It only works for HTML email reports and ignored when |
to |
character() A vector of email addresses serving as primary
recipients for the message. For secondary recipients, use the |
cc, bcc |
character() A vector of email addresses for sending the message as a carbon copy or blind carbon copy. |
emailTemplate |
character(1) The path to the email template Rmd file as
obtained by |
core.name |
character(1) The full name of the core team member |
core.email |
character(1) The Roswell Park email of the core team member |
core.id |
character(1) The internal identifier for the Roswell employee.
This ID usually matches |
textOnly |
logical(1) Whether to return the text of the email only.
This avoids the use of the 'blastula' package and adds the text to the
system clipboard if the |
resend |
logical(1) Whether to force a resend of the email |
verbose |
logical(1) Whether to output full email information from
'smtp_send' (when |
credFile |
character(1) An optional file generated by the
|
The credFile argument is a convenience for avoiding password entry at every instance an email is sent. If the default file ~/.blastula_creds does not exist, the user will be prompted for authorization information. Currently it is configured for core-team emails:
blastula::create_smtp_creds_file(
  file = "~/.blastula_creds",
  user = "[email protected]",
  host = "smtp.office365.com",
  port = 587,
  use_ssl = TRUE
)
A character string of the email
Check the history of emails sent
The online Bioconductor build reports are great for humans to look at, but they are not easily computable. This function scrapes HTML and text files available from the build report online pages to generate a tidy data frame version of the build report.
biocBuildReport(
  version = BiocManager::version(),
  pkgType = c("software", "data-experiment", "data-annotation", "workflows"),
  stage.timings = FALSE
)
version |
character(1) the character version number
as used to access the online build report. For
example, "3.14". The default is the "current version"
as given by |
pkgType |
|
stage.timings |
logical(1) Whether to include the start, end, and elapsed time for each build, check, and install stage from each builder in the result (default: FALSE) |
A tbl_df object with columns pkg, version, author, commit, date, node, stage, and result.
# Set the stage--what version of Bioc am I using?
BiocManager::version()
latest_build <- biocBuildReport()
head(latest_build)
This function parses the Build Report tarball for a Bioconductor release. By default it will pull all the report.tgz files for each Bioconductor package type. The Bioconductor Build System (BBS) Build Report tarball contains build status information for all packages in a Bioconductor release. This function is mainly used by biocBuildReport().
biocBuildReportDB(
  version = BiocManager::version(),
  pkgType = c("software", "data-experiment", "data-annotation", "workflows"),
  stage.timings = FALSE
)
version |
|
pkgType |
|
stage.timings |
logical(1) Whether to include the start, end, and elapsed time for each build, check, and install stage from each builder in the result (default: FALSE) |
This function downloads and parses the build status information for Bioconductor packages. The build status information is available for the current release and the previous release. Other versions may be available.
biocBuildStatusDB(
  version = BiocManager::version(),
  pkgType = c("software", "data-experiment", "data-annotation", "workflows")
)
version |
|
pkgType |
|
A data.frame with the following columns:
pkg: The name of the package
node: The builder on which the package was built
stage: The stage of the build, e.g., 'install', 'buildsrc', 'checksrc', etc.
result: The status of the build, e.g., 'OK', 'ERROR', 'WARNINGS', etc.
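A table with these four columns lends itself to simple filtering and cross-tabulation. The following is a minimal sketch on made-up rows (the package and node names are hypothetical), not output from biocBuildStatusDB().

```r
# Toy build-status table using the four documented columns (made-up values)
status <- data.frame(
    pkg = c("pkgA", "pkgA", "pkgB", "pkgB", "pkgC"),
    node = c("node1", "node2", "node1", "node2", "node1"),
    stage = c("buildsrc", "checksrc", "buildsrc", "checksrc", "install"),
    result = c("OK", "ERROR", "OK", "WARNINGS", "OK")
)

# Packages with at least one failing or warning stage
problems <- status[status$result %in% c("ERROR", "WARNINGS"), ]
problem_pkgs <- unique(problems$pkg)
problem_pkgs

# Cross-tabulate results by stage for a quick overview
result_by_stage <- table(status$stage, status$result)
result_by_stage
```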
Get Bioconductor download statistics
biocDownloadStats(
  pkgType = c("software", "data-experiment", "workflows", "data-annotation")
)
pkgType |
character(1) All or one of 'software', 'data-experiment', 'workflows', or 'data-annotation' (defaults to all types) |
Note that Bioconductor package download stats are not version-specific.
A tibble of download statistics for all Bioconductor packages
biocDownloadStats()
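Once a tidy table of download counts is in hand, ranking packages takes a few lines of base R. This sketch runs on made-up data; the column names Package, Year, and Nb_of_downloads are assumptions for illustration and should be checked against the actual result of biocDownloadStats().

```r
# Toy tidy download table (column names are assumed, values are made up)
stats <- data.frame(
    Package = c("pkgA", "pkgA", "pkgB", "pkgB"),
    Year = c(2023L, 2024L, 2023L, 2024L),
    Nb_of_downloads = c(100L, 150L, 20L, 5L)
)

# Total downloads per package, most downloaded first
totals <- aggregate(Nb_of_downloads ~ Package, data = stats, FUN = sum)
totals <- totals[order(totals$Nb_of_downloads, decreasing = TRUE), ]
totals
```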
Explore Bioconductor packages through an interactive bubble plot. Click on bubbles to bring up additional information about the package. The size of a bubble and its proximity to the center are based on the number of downloads the package had in the past month.
biocExplore(top = 500L, ...)
top |
maximum number of packages displayed in any biocView |
... |
parameters passed to |
A bubble plot of Bioconductor packages
List all the packages associated with a maintainer. By default, it will return all packages associated with the [email protected] email.
hasBiocMaint returns a logical vector corresponding to the input character vector of packages indicating whether any package is maintained by the Bioconductor core team.
biocMaintained(
  main = "maintainer@bioconductor\\.org",
  version = BiocManager::version(),
  pkgType = c("software", "data-experiment", "workflows", "data-annotation")
)

hasBiocMaint(
  pkg,
  version = BiocManager::version(),
  main = "maintainer@bioconductor\\.org",
  repo = c("BioCsoft", "BioCexp", "BioCworkflows", "BioCann")
)
main |
character(1) The regex for searching through the Maintainer
column as obtained from |
version |
character(1) the character version number
as used to access the online build report. For
example, "3.14". The default is the "current version"
as given by |
pkgType |
|
pkg |
|
repo |
|
For biocMaintained: a tibble of packages associated with the maintainer.
For hasBiocMaint: a logical vector indicating whether the package is maintained by Bioconductor.
biocMaintained()

## maintained by Hervé and not maintainer at bioconductor dot org
hasBiocMaint("BiocGenerics")
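The main argument is a regular expression matched against the Maintainer field, which is why the dot in the default is escaped. A minimal sketch of that matching on a made-up table (the package names and the second and third maintainer strings are hypothetical):

```r
# Toy package table with a Maintainer column (made-up rows)
pkgs <- data.frame(
    Package = c("pkgA", "pkgB", "pkgC"),
    Maintainer = c(
        "Bioconductor Package Maintainer <maintainer@bioconductor.org>",
        "Jane Doe <jane@example.org>",
        "Someone Else <maintainer@bioconductorXorg>"
    )
)

# Default pattern from the usage above; "\\." matches a literal dot only
main <- "maintainer@bioconductor\\.org"
is_core <- grepl(main, pkgs$Maintainer)
pkgs$Package[is_core]
```

With an unescaped dot, the third row would also match, because "." matches any single character in a regular expression.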
The BiocViews-generated VIEWS file is available for Bioconductor release and devel repositories. It contains quite a bit more information from the package DESCRIPTION files than the PACKAGES file. In particular, it contains biocViews annotations and URLs for vignettes and developer URLs.
biocPkgList(
  version = BiocManager::version(),
  repo = c("BioCsoft", "BioCexp", "BioCworkflows", "BioCann", "CRAN"),
  addBiocViewParents = TRUE
)
version |
The requested Bioconductor version. Will
default to use the BiocManager defaults (i.e., |
repo |
|
addBiocViewParents |
|
Since packages are annotated with the most specific views, the default functionality here is to add parent terms for all views for each package. For example, in the BioCsoft repository, all packages will have at least "Software" added to their biocViews. To keep only the most specific terms, set addBiocViewParents to FALSE.
An object of class tbl_df.
bpkgl <- biocPkgList(repo = "BioCsoft")
bpkgl
unlist(bpkgl[1, 'Depends'], use.names = FALSE)

# Get a list of all packages that
# import "GEOquery"
library(dplyr)
bpkgl |>
  filter(Package == 'GEOquery') |>
  pull('importsMe') |>
  unlist()
Grab build report results from BUILD_STATUS_DB for a particular package range
biocPkgRanges(
  start,
  end,
  condition = c("ERROR", "WARNINGS"),
  phase = "buildsrc",
  version = c("devel", "release")
)
start |
character(1) alphabetically first package name in range |
end |
character(1) alphabetically last package name in range |
condition |
character(1) condition string, typically 'ERROR' or 'WARNINGS' |
phase |
character(1) string for phase of event: 'install', 'checksrc', or 'buildsrc' (default) |
version |
character(1) string indicating Bioconductor version, either 'devel' (default) or 'release' |
Vincent J. Carey
## Not run:
biocPkgRanges(
  start = "a4", end = "CMA", condition = "ERROR", version = "devel"
)
## End(Not run)
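The idea of an alphabetical package range can be sketched without touching the build database. The helper in_range() below is hypothetical, and the case-insensitive comparison is an assumption made for this illustration; biocPkgRanges may order names differently.

```r
# Toy vector of package names
pkgs <- c("a4", "BiocGenerics", "CMA", "DESeq2", "zlibbioc")

# Hypothetical helper: TRUE for names falling between start and end,
# compared case-insensitively (an assumption, see the note above)
in_range <- function(x, start, end) {
    key <- tolower(x)
    key >= tolower(start) & key <= tolower(end)
}

selected <- pkgs[in_range(pkgs, start = "a4", end = "CMA")]
selected
```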
Bioconductor has a rich ecosystem of metadata around packages, usage, and build status. This package is a simple collection of functions to access that metadata from R. The goal is to expose metadata for data mining and value-added functionality such as package searching, text mining, and analytics on packages.
The biocBuildReport function returns a computable form of the Bioconductor Build Report.
The biocDownloadStats function gets Bioconductor download stats, allowing users to quickly find commonly used packages. The biocPkgList function is useful for getting a complete listing of all Bioconductor packages.
Bioconductor packages all have Digital Object Identifiers (DOIs). This package contains basic infrastructure for creating, updating, and de-referencing DOIs.
Maintainer: Sean Davis [email protected]
Authors:
Shian Su [email protected] [contributor]
Marcel Ramos [email protected] (ORCID) [contributor]
Other contributors:
Lori Shepherd [email protected] [contributor]
Felix G.M. Ernst [email protected] [contributor]
Jennifer Wokaty [email protected] [contributor]
Charlotte Soneson [email protected] [contributor]
Martin Morgan [email protected] [contributor]
Vince Carey [email protected] [contributor]
Useful links:
Report bugs at https://github.com/seandavi/BiocPkgTools/issues/new
Managing user data is important for email functions such as biocBuildEmail, and BiocFileCache makes it easy.
setCache(
  directory = tools::R_user_dir("BiocPkgTools", "cache"),
  verbose = TRUE,
  ask = interactive()
)

pkgToolsCache(...)
directory |
The file location where the cache is located. Once set future downloads will go to this folder. |
verbose |
Whether to print descriptive messages |
ask |
logical (default TRUE when interactive session) Confirm the file location of the cache directory |
... |
For |
Get the directory location of the cache. It will prompt the user to create a cache if not already created. A specific directory can be used via setCache.
Specify the directory location of the data cache. By default, it will go to the user's home/.cache/R and "appname" directory as specified by tools::R_user_dir (with package="BiocPkgTools" and which="cache").
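The default location can be inspected directly with base R (tools::R_user_dir is available in R 4.0.0 and later). This sketch only computes and checks the path; it does not create the cache or call setCache.

```r
# Compute the default cache directory used in the setCache() signature
cache_dir <- tools::R_user_dir("BiocPkgTools", which = "cache")
cache_dir

# Check whether the directory has already been created on this machine
dir.exists(cache_dir)
```

The exact path depends on the operating system and on environment variables such as R_USER_CACHE_DIR, so it will differ between machines.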
The biocRevDepEmail function collects all the emails of the reverse dependencies and sends a notification that upstream package(s) have been deprecated or removed. It uses a template found in inst/resources with the templatePath() function.
biocRevDepEmail(
  packages,
  which = c("strong", "most", "all"),
  PS = character(1L),
  version = BiocManager::version(),
  dry.run = TRUE,
  cc = NULL,
  emailTemplate = templatePath("revdepnote"),
  core.name = NULL,
  core.email = NULL,
  core.id = NULL,
  textOnly = FALSE,
  verbose = FALSE,
  credFile = "~/.blastula_creds",
  ...,
  pkg
)
packages |
|
which |
a character vector listing the types of
dependencies, a subset of
|
PS |
character(1) Postscript, an additional note to the recipient of the email (i.e., the package maintainer) |
version |
character() A vector indicating which version of Bioconductor the package is failing in (either 'release' or 'devel'; defaults to both) |
dry.run |
logical(1) Display the email without sending to the recipient.
It only works for HTML email reports and ignored when |
cc |
character() A vector of email addresses for sending the message as a carbon copy. |
emailTemplate |
character(1) The path to the email template Rmd file as
obtained by |
core.name |
character(1) The full name of the core team member |
core.email |
character(1) The Roswell Park email of the core team member |
core.id |
character(1) The internal identifier for the Roswell employee.
This ID usually matches |
textOnly |
logical(1) Whether to return the text of the email only.
This avoids the use of the 'blastula' package and adds the text to the
system clipboard if the |
verbose |
logical(1) Whether to output full email information from
'smtp_send' (when |
credFile |
character(1) An optional file generated by the
|
pkg |
|
... |
Additional inputs to internal functions (not used). |
biocRevDepEmail(
  "FindMyFriends", version = "3.13", dry.run = TRUE, textOnly = TRUE
)
Bioconductor is built using an extensive set of core capabilities and data structures. This leads to package developers depending on other packages for interoperability and functionality. This function extracts package dependency information from biocPkgList and returns a tidy data.frame that can be used for analysis and to build graph structures of package dependencies.
buildPkgDependencyDataFrame(dependencies = c("strong", "most", "all"), ...)
dependencies |
character() a vector listing the types of dependencies, a subset of c("Depends", "Imports", "LinkingTo", "Suggests", "Enhances"). Character string "all" is shorthand for that vector, character string "most" for the same vector without "Enhances", character string "strong" (default) for the first three elements of that vector. |
... |
parameters passed along to |
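The shorthand values described for the dependencies argument can be spelled out as a small lookup. The function expand_dependencies() below is a hypothetical illustration of that description, not the package's internal code.

```r
# Hypothetical helper that expands the documented shorthand values
expand_dependencies <- function(dependencies = c("strong", "most", "all")) {
    all_types <- c("Depends", "Imports", "LinkingTo", "Suggests", "Enhances")
    # "strong" is the documented default
    if (missing(dependencies))
        dependencies <- "strong"
    if (identical(dependencies, "all"))
        return(all_types)
    if (identical(dependencies, "most"))
        return(setdiff(all_types, "Enhances"))
    if (identical(dependencies, "strong"))
        return(all_types[1:3])
    # Otherwise expect an explicit subset of the five dependency types
    match.arg(dependencies, all_types, several.ok = TRUE)
}

expand_dependencies()
expand_dependencies("most")
expand_dependencies(c("Imports", "Suggests"))
```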
A data.frame (also a tbl_df) of S3 class "biocDepDF" including columns "Package", "dependency", and "edgetype".
This function requires network access.
See buildPkgDependencyIgraph, biocPkgList.
# performs a network call, so must be online.
library(BiocPkgTools)
depdf <- buildPkgDependencyDataFrame()
head(depdf)

library(dplyr)
# filter to include only "Imports" type
# dependencies
imports_only <- depdf |> filter(edgetype == 'Imports')

# top ten most imported packages
imports_only |>
  select(dependency) |>
  group_by(dependency) |>
  tally() |>
  arrange(desc(n))

# The Bioconductor packages with the
# largest number of imports
largest_importers <- imports_only |>
  select(Package) |>
  group_by(Package) |>
  tally() |>
  arrange(desc(n))

# not sure what these packages do. Join
# to their descriptions
biocPkgList() |>
  select(Package, Description) |>
  left_join(largest_importers) |>
  arrange(desc(n)) |>
  head()
Package dependencies represent a directed graph (though Bioconductor dependencies are not an acyclic graph). This function simply returns an igraph graph from the package dependency data frame from a call to buildPkgDependencyDataFrame or any tidy data frame with rows of (Package, dependency) pairs. Additional columns are added as igraph edge attributes (see graph_from_data_frame).
buildPkgDependencyIgraph(pkgDepDF)
pkgDepDF |
a tidy data frame. See description for details. |
An igraph directed graph. See the igraph package for details of what can be done.
See buildPkgDependencyDataFrame, graph_from_data_frame, inducedSubgraphByPkgs, subgraphByDegree, igraph-es-indexing, igraph-vs-indexing
library(igraph)

pkg_dep_df = buildPkgDependencyDataFrame()

# at this point, filter or join to manipulate
# dependency data frame as you see fit.

g = buildPkgDependencyIgraph(pkg_dep_df)
g

# Look at nodes and edges
head(V(g)) # vertices
head(E(g)) # edges

# rank packages by in-degree and out-degree
head(sort(degree(g, mode='in'), decreasing=TRUE))
head(sort(degree(g, mode='out'), decreasing=TRUE))
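The in-degree and out-degree rankings in the example can also be reproduced from the (Package, dependency) pairs alone, which may help when reading the graph output. This base R sketch uses a made-up edge table and assumes, as in igraph's graph_from_data_frame, that each row is an edge from the first column to the second.

```r
# Toy dependency table with the documented columns (made-up rows)
deps <- data.frame(
    Package = c("pkgA", "pkgA", "pkgB", "pkgC"),
    dependency = c("pkgB", "pkgC", "pkgC", "pkgD"),
    edgetype = c("Imports", "Depends", "Imports", "Imports")
)

# Out-degree: how many dependencies each package declares
out_degree <- sort(table(deps$Package), decreasing = TRUE)

# In-degree: how many packages depend on each dependency
in_degree <- sort(table(deps$dependency), decreasing = TRUE)

out_degree
in_degree
```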
These functions build and plot the relationships of a class (or of the classes in a package), optionally including class unions.
buildClassDepGraph(class, includeUnions = FALSE)

buildClassDepData(class, includeUnions = FALSE)

buildClassDepFromPackage(pkg, includeUnions = FALSE)

plotClassDep(class, includeUnions = FALSE)

plotClassDepData(data)

plotClassDepGraph(g)
class |
a single |
includeUnions |
|
pkg |
a single |
data |
a |
g |
an |
library("SummarizedExperiment")
depData <- buildClassDepData("RangedSummarizedExperiment")
depData
g <- buildClassDepGraph("RangedSummarizedExperiment")
plotClassDepGraph(g)
The CRANstatus function allows users to check the status of a package and send an email report of any failures.
CRANstatus(
  pkg,
  core.name = NULL,
  core.email = NULL,
  core.id = NULL,
  to.mail = "[email protected]",
  dry.run = TRUE,
  emailTemplate = templatePath("cranreport")
)
pkg |
character(1) The name of the package in trouble |
core.name |
character(1) The full name of the core team member |
core.email |
character(1) The Roswell Park email of the core team member |
core.id |
character(1) The internal identifier for the Roswell employee.
This ID usually matches |
to.mail |
The email of the CRAN report recipient |
dry.run |
logical(1) Display the email without sending to the recipient.
It only works for HTML email reports and ignored when |
emailTemplate |
character(1) The path to the email template Rmd file as
obtained by |
This function uses the biocDownloadStats data to approximate when a package entered Bioconductor. Note that the download stats go back only to 2009.
firstInBioc(download_stats)
download_stats |
a data.frame from |
dls <- biocDownloadStats()
tail(firstInBioc(dls))
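The approximation can be pictured as taking the earliest month in which each package shows up in the download statistics. The sketch below is one way to do that on made-up data; the column names and the choice to ignore months with zero downloads are assumptions, not a description of how firstInBioc is implemented.

```r
# Toy download statistics (assumed column names, made-up values)
stats <- data.frame(
    Package = c("pkgA", "pkgA", "pkgB", "pkgB", "pkgB"),
    Date = as.Date(c("2010-03-01", "2009-06-01", "2015-01-01",
                     "2014-11-01", "2016-02-01")),
    Nb_of_downloads = c(5L, 1L, 10L, 0L, 12L)
)

# Keep months with at least one download, then take the earliest per package
seen <- stats[stats$Nb_of_downloads > 0, ]
seen <- seen[order(seen$Date), ]
first_seen <- seen[!duplicated(seen$Package), c("Package", "Date")]
first_seen
```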
This function makes calls out to the DataCite REST API described here: https://support.datacite.org/docs/api-create-dois. The function creates a new DOI for a Bioconductor package (cannot already exist). The target URL for the DOI is the short Bioconductor package URL.
generateBiocPkgDOI(pkg, authors, pubyear, event = "publish", testing = TRUE)
pkg |
character(1) package name |
authors |
character vector of authors (will be "pasted" together) |
pubyear |
integer(1) publication year |
event |
Either "hide", "register", or "publish". Typically, we use "publish" to make the DOI findable. |
testing |
logical(1) If true, will use the apitest user with the password apitest. These DOIs will expire. The same apitest:apitest combination can be used to login to the website for doing things using the web interface. If false, the Bioconductor-specific user credentials should be in the correct environment variables. |
The login information for the "real" Bioconductor account should be stored in the environment variables "DATACITE_USERNAME" and "DATACITE_PASSWORD".
The GUI is available here: https://doi.datacite.org/.
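The credential lookup described above can be sketched as follows. The helper get_datacite_credentials() is hypothetical and only illustrates reading the two environment variables; it makes no request to DataCite.

```r
# Hypothetical helper: choose credentials based on the testing flag
get_datacite_credentials <- function(testing = TRUE) {
    # The documented test account
    if (testing)
        return(list(username = "apitest", password = "apitest"))
    username <- Sys.getenv("DATACITE_USERNAME", unset = NA)
    password <- Sys.getenv("DATACITE_PASSWORD", unset = NA)
    if (is.na(username) || is.na(password))
        stop("Set DATACITE_USERNAME and DATACITE_PASSWORD first")
    list(username = username, password = password)
}

creds <- get_datacite_credentials(testing = TRUE)
creds$username
```

Failing early when a variable is unset avoids sending a request with empty credentials.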
The DOI as a character(1) vector.
## Not run:
x = generateBiocPkgDOI('RANDOM_TEST_PACKAGE', 'Sean Davis', 1972)
## End(Not run)
Get data from Bioconductor
get_bioc_data()
A JSON string containing Bioconductor package details
bioc_data <- get_bioc_data()
Get ORCID iDs from the cre fields of Authors@R in packageDescription results
get_cre_orcids(pkgnames)
pkgnames |
character() Names of packages; each must be installed |
Returns NA if no ORCID is provided in Authors@R of the package description
get_cre_orcids(c("BiocPkgTools", "utils"))
The actual vignette path is available using biocPkgList.
getBiocVignette(
  vignettePath,
  destfile = tempfile(),
  version = BiocManager::version()
)
vignettePath |
character(1) the additional path information to get to the vignette |
destfile |
character(1) the file location to store the vignette |
version |
character(1) such as "3.7", defaults to user version |
character(1) The filename of the downloaded vignette
x = biocPkgList()
tmp = getBiocVignette(x$vignettes[[1]][1])
tmp

## Not run:
library(pdftools)
y = pdf_text(tmp)
y = paste(y, collapse=" ")
library(tm)
v = VCorpus(VectorSource(y))
library(magrittr)

v <- v %>%
    tm_map(stripWhitespace) %>%
    tm_map(content_transformer(tolower)) %>%
    tm_map(removeWords, stopwords("english")) %>%
    tm_map(stemDocument)
dtm = DocumentTermMatrix(v)
inspect(DocumentTermMatrix(v, list(dictionary = as.character(x$Package))))
## End(Not run)
Generate needed information to create DOI from a package directory.
getPackageInfo(dir)
dir |
character(1) Path to package |
A data.frame
For packages that live on GitHub, we can mine further details. This function returns the GitHub details for the listed packages.
githubDetails(pkgs, sleep = 0)
pkgs |
a character() vector of username/repo
for one or more GitHub repos, such as |
sleep |
numeric() denoting the number of seconds to
sleep between GitHub API calls. Since GitHub rate limits
its APIs, it might be necessary to either use small
chunks of packages iteratively or to supply a non-zero
argument here. See the |
The gh function is used to do the fetching. If the number of packages supplied to this function is large (>40 or so), it is possible to run into problems with API rate limits. The gh package uses the environment variable "GITHUB_PAT" (for personal access token) to authenticate and then provide higher rate limits. If you run into problems with rate limits, set sleep to some small positive number to slow queries. Alternatively, create a Personal Access Token on GitHub and register it. See the gh package for details.
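The effect of the sleep argument can be pictured as a pause between successive requests. The sketch below uses a stand-in fetch function instead of the GitHub API; fetch_with_pause() and fake_fetch() are hypothetical names introduced for illustration only.

```r
# Hypothetical helper: call `fetch` once per repo, pausing between calls
fetch_with_pause <- function(repos, fetch, sleep = 0) {
    results <- vector("list", length(repos))
    names(results) <- repos
    for (i in seq_along(repos)) {
        results[[i]] <- fetch(repos[[i]])
        # No need to pause after the last request
        if (sleep > 0 && i < length(repos))
            Sys.sleep(sleep)
    }
    results
}

# Stand-in for an API call; returns made-up details for a repo
fake_fetch <- function(repo) list(repo = repo, stargazers = nchar(repo))

res <- fetch_with_pause(c("user/repoA", "org/repoB"), fake_fetch, sleep = 0.1)
names(res)
```

Processing the repositories in small chunks, as the argument description suggests, follows the same pattern with a longer pause between chunks.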
pkglist = biocPkgList()

# example of "pkgs" format.
head(pkglist$URL)

gh_list = githubURLParts(pkglist$URL)
# keep only rows where a GitHub user/repo was found
gh_list = gh_list[!is.na(gh_list$user_repo),]
head(gh_list$user_repo)

ghd = githubDetails(gh_list$user_repo[1:5])
lapply(ghd, '[[', "stargazers")
Extract GitHub user and repo name from GitHub URL
githubURLParts(urls)
urls |
|
A data.frame with four columns:
url: The original GitHub URL
user_repo: The GitHub "username/repo", combined
user: The GitHub username
repo: The GitHub repo name
# find GitHub URL details for
# Bioconductor packages
bpkgl = biocPkgList()
urldetails = githubURLParts(bpkgl$URL)
urldetails = urldetails[!is.na(urldetails$url),]
head(urldetails)
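The four-column result can be approximated with a single regular expression. The function parse_github_urls() and its pattern are assumptions made for this sketch, including the choice to return NA in every column for URLs that do not point at GitHub; githubURLParts may handle edge cases differently.

```r
# Hypothetical re-implementation of the URL splitting, for illustration
parse_github_urls <- function(urls) {
    pattern <- "^https?://github\\.com/([^/]+)/([^/#?]+).*$"
    hit <- grepl(pattern, urls)
    user <- ifelse(hit, sub(pattern, "\\1", urls), NA_character_)
    repo <- ifelse(hit, sub(pattern, "\\2", urls), NA_character_)
    data.frame(
        url = ifelse(hit, urls, NA_character_),
        user_repo = ifelse(hit, paste(user, repo, sep = "/"), NA_character_),
        user = user,
        repo = repo,
        stringsAsFactors = FALSE
    )
}

urls <- c(
    "https://github.com/seandavi/BiocPkgTools",
    "https://example.org/some/page",
    NA
)
parts <- parse_github_urls(urls)
parts
```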
Find the subgraph induced by including specific packages. The induced subgraph is the graph that includes the named packages and all edges connecting them. This is useful for a developer, for example, to examine her packages and their intervening dependencies.
inducedSubgraphByPkgs(g, pkgs, pkg_color = "red")
g |
an igraph graph, typically created by
|
pkgs |
character() vector of packages to include. Package names not included in the graph are ignored. |
pkg_color |
character(1) giving color of named packages. Other packages in the graph that fall in connecting paths will be colored as the igraph default. |
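The idea of an induced subgraph can be shown without igraph on a toy edge list: keep only the edges whose two endpoints are both among the named packages. The edge list and the helper `induced_edges` below are made up for illustration and are not part of the package.

```r
# Toy dependency edges (from depends on to); package names are invented
edges <- data.frame(
  from = c("A", "A", "B", "C", "D"),
  to   = c("B", "C", "C", "D", "E"),
  stringsAsFactors = FALSE
)

# Keep edges whose endpoints are both in `pkgs`;
# names absent from the graph (such as "Z") simply match nothing
induced_edges <- function(edges, pkgs) {
  edges[edges$from %in% pkgs & edges$to %in% pkgs, , drop = FALSE]
}

sub_edges <- induced_edges(edges, c("A", "B", "C", "Z"))
sub_edges
```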
library(igraph)
g <- buildPkgDependencyIgraph(buildPkgDependencyDataFrame())

## subgraph of only the first 10 packages maintained by Bioconductor
biocmaintained <- head(biocMaintained()[["Package"]], 10L)
g2 <- inducedSubgraphByPkgs(g, pkgs = biocmaintained)
g2
V(g2)
plot(g2)

## subgraph of a package's strong Bioconductor package dependencies
maedeps <- unlist(pkgBiocDeps(
    "MultiAssayExperiment", which = "strong",
    recursive = TRUE, only.bioc = TRUE
), use.names = FALSE)
g3 <- inducedSubgraphByPkgs(g, pkgs = maedeps)
plot(g3)

## same subgraph with networkD3::forceNetwork
library(networkD3)
wt <- cluster_walktrap(g3)
members <- membership(wt)
ndg3 <- igraph_to_networkD3(g3, group = members)
forceNetwork(
    Links = ndg3$links, Nodes = ndg3$nodes,
    Source = 'source', Target = 'target',
    NodeID = 'name', Group = 'group',
    zoom = TRUE, linkDistance = 200,
    fontSize = 20, opacity = 0.9, opacityNoHover = 0.9
)
The latestPkgStats
function combines outputs from several functions to
generate a table of relevant statistics for a given package.
latestPkgStats( gh_repo, Date, pkgType = c("software", "data-experiment", "workflows", "data-annotation") )
gh_repo |
character(1) The GitHub repository location including the username / organization and the repository name, e.g., "Bioconductor/S4Vectors" |
Date |
character(1) The date cutoff from which to analyze closed issues in the YYYY-MM-DD or YYYY-MM-DDTHH:MM:SSZ format (ISO 8601). |
pkgType |
character(1) One of 'software', 'data-experiment', 'workflows', or 'data-annotation' (defaults to 'software') |
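The two accepted shapes for the Date argument can be produced from R date objects with base formatting. The variable names below are illustrative only, and the example timestamp is arbitrary.

```r
# Date-only form: YYYY-MM-DD
cutoff_date <- format(as.Date("2021-05-05"), "%Y-%m-%d")

# Full timestamp form: YYYY-MM-DDTHH:MM:SSZ (the trailing Z denotes UTC)
cutoff_time <- format(
  as.POSIXct("2021-05-05 13:30:00", tz = "UTC"),
  "%Y-%m-%dT%H:%M:%SZ"
)

cutoff_date
cutoff_time
```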
if (interactive()) { latestPkgStats("Bioconductor/BiocGenerics", "2021-05-05") }
Get a data.frame of employment information from ORCID
orcid_table(orcids)
orcids |
character() |
if (interactive()) {
    # need a token?
    oids <- c("0000-0003-4046-0063", "0000-0003-4046-0063")
    print(orcid_table(oids))
    oids <- c(oids, NA)
    print(orcid_table(oids))
    print(orcid_table(oids[1]))
}
The function uses the pkgType
argument to restrict the look up to only the
relevant Bioconductor repository. It works for multiple packages of the same
type.
pkgBiocDeps( pkg, pkgType = c("software", "data-experiment", "workflows", "data-annotation"), which = "strong", only.bioc = TRUE, recursive = FALSE, version = BiocManager::version() )
pkg |
|
pkgType |
|
which |
a character vector listing the types of
dependencies, a subset of
|
only.bioc |
|
recursive |
a logical indicating whether (reverse) dependencies
of (reverse) dependencies (and so on) should be included, or a
character vector like |
version |
(Optional) |
pkgBiocDeps("MultiAssayExperiment", only.bioc = TRUE)
pkgBiocDeps("MultiAssayExperiment", only.bioc = FALSE)
The function returns a slightly upgraded list with dependency types as
elements and package names in each of those elements, if any. The
types of dependencies can be seen in the which
argument documentation.
pkgBiocRevDeps(
    pkg,
    pkgType = c("software", "data-experiment", "workflows", "data-annotation"),
    which = "all",
    only.bioc = TRUE,
    version = BiocManager::version()
)

## S3 method for class 'biocrevdeps'
summary(object, ...)
pkg |
|
pkgType |
|
which |
a character vector listing the types of
dependencies, a subset of
|
only.bioc |
|
version |
(Optional) |
object |
an object for which a summary is desired. |
... |
additional arguments affecting the summary produced. |
The summary method of the biocrevdeps
class given by
pkgBiocRevDeps
provides a tally in each dependency field.
A biocrevdeps
list class object
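A tally per dependency field amounts to counting the package names in each list element. The sketch below uses a plain toy list with invented package names; it only mimics the shape described above and is not the package's summary method.

```r
# Toy reverse-dependency list: one element per dependency type
rdeps_toy <- list(
  Depends  = c("pkgA"),
  Imports  = c("pkgB", "pkgC", "pkgD"),
  Suggests = character(0)
)

# Number of reverse dependencies recorded in each field
tally <- lengths(rdeps_toy)
tally
```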
rdeps <- pkgBiocRevDeps("MultiAssayExperiment", which = "all")
rdeps
summary(rdeps)
Calculate dependency gain achieved by excluding combinations of packages
pkgCombDependencyGain(pkg, depdf, maxNbr = 3L)
pkg |
character, the name of the package for which we want to estimate the dependency gain |
depdf |
a tidy data frame with package dependency information
obtained through the function |
maxNbr |
numeric, the maximal number of direct dependencies to leave out simultaneously |
A data frame with three columns: ExclPackages (the excluded direct dependencies), NbrExcl (the number of excluded direct dependencies), DepGain (the dependency gain from excluding these direct dependencies)
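To make the three columns concrete, here is a small base R sketch on an invented dependency map. It assumes that the dependency gain is the drop in the number of recursive dependencies when a set of direct dependencies is left out; `all_deps` and `comb_gain` are hypothetical helpers, and the package's own computation may differ in detail.

```r
# Toy dependency map: each element lists the direct dependencies of that package
deps <- list(
  pkg = c("a", "b", "c"),
  a = c("x", "y"),
  b = c("y"),
  c = character(0),
  x = character(0),
  y = c("z"),
  z = character(0)
)

# All packages reachable from a set of starting packages (including the start set)
all_deps <- function(start, deps) {
  seen <- character(0)
  todo <- start
  while (length(todo) > 0) {
    cur <- todo[1]
    todo <- todo[-1]
    if (cur %in% seen) next
    seen <- c(seen, cur)
    todo <- c(todo, setdiff(deps[[cur]], seen))
  }
  seen
}

# Gain from excluding every combination of up to max_nbr direct dependencies
comb_gain <- function(pkg, deps, max_nbr = 2L) {
  direct <- deps[[pkg]]
  total <- length(all_deps(direct, deps))
  out <- list()
  for (k in seq_len(min(max_nbr, length(direct)))) {
    for (excl in combn(direct, k, simplify = FALSE)) {
      kept <- setdiff(direct, excl)
      out[[length(out) + 1L]] <- data.frame(
        ExclPackages = paste(excl, collapse = ", "),
        NbrExcl = k,
        DepGain = total - length(all_deps(kept, deps)),
        stringsAsFactors = FALSE
      )
    }
  }
  do.call(rbind, out)
}

gain <- comb_gain("pkg", deps, max_nbr = 2L)
gain[order(gain$DepGain, decreasing = TRUE), ]
```

In this toy map, dropping "b" alone gains little because "y" and "z" are still pulled in through "a", whereas dropping "a" and "b" together removes both routes.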
Charlotte Soneson
depdf <- buildPkgDependencyDataFrame(
    dependencies=c("Depends", "Imports"),
    repo=c("BioCsoft", "CRAN")
)
pcd <- pkgCombDependencyGain('GEOquery', depdf, maxNbr = 3L)
head(pcd[order(pcd$DepGain, decreasing = TRUE), ])
Function adapted from 'itdepends::dep_usage_pkg' at https://github.com/r-lib/itdepends to obtain the functionality imported and used by a given package.
pkgDepImports(pkg)
pkg |
character() name of the package for which we want to obtain the functionality calls imported from its dependencies and used within the package. |
Certain imported elements, such as built-in constants, will not be identified as imported functionality by this function.
A tidy data frame with two columns:
pkg
: name of the package dependency.
fun
: name of the functionality call imported from
the dependency in the column pkg
and used within
the analyzed package.
Robert Castelo
pkgDepImports('BiocPkgTools')
Produce a report on the dependency burden of a given package.
pkgDepMetrics(pkg, depdf)
pkg |
character() name of the package for which we want to obtain metrics on its dependency burden. |
depdf |
a tidy data frame with package dependency information
obtained through the function |
A tidy data frame with different metrics on the package dependency burden. More concretely, the following columns:
ImportedAndUsed
: number of functionality calls imported and used in
the package.
Exported
: number of functionality calls exported by the dependency.
Usage
: (ImportedAndUsed
x 100) / Exported
. This value provides an
estimate of what fraction of the functionality of the dependency is
actually used in the given package.
DepOverlap
: Similarity between the dependency graph structure of the
given package and the one of the dependency in the corresponding row,
estimated as the Jaccard index
between the two sets of vertices of the corresponding graphs. Its value
ranges between 0 and 1, where 0 indicates that no dependency is shared, while
1 indicates that the given package and the corresponding dependency depend
on an identical subset of packages.
DepGainIfExcluded
: The 'dependency gain' (decrease in the total number
of dependencies) that would be obtained if this package was excluded
from the list of direct dependencies.
The reported information is ordered by the Usage
column to facilitate the
identification of dependencies for which the analyzed package uses only a small
fraction of their functionality and which could therefore be easier to remove.
To aid in that decision, the column DepOverlap
reports the overlap of the
dependency graph of each dependency with the one of the analyzed package. Here
a value above, e.g., 0.5, could, albeit not necessarily, imply that removing
that dependency could substantially lighten the dependency burden of the analyzed
package.
An NA
value in the ImportedAndUsed
column indicates that the function
pkgDepMetrics()
could not identify what functionality calls in the analyzed
package are made to the dependency.
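The Usage and DepOverlap columns follow directly from the formulas given above. The short sketch below works them out on invented numbers and invented dependency sets; the helper names `usage_pct` and `jaccard` are hypothetical.

```r
# Usage: percentage of the dependency's exported functionality that is used
usage_pct <- function(imported_and_used, exported) {
  100 * imported_and_used / exported
}

# DepOverlap: Jaccard index between two sets of dependency-graph vertices
jaccard <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# Invented example: the analyzed package uses 5 of 200 exported functions
usage <- usage_pct(5, 200)

# Invented dependency sets for the analyzed package and one of its dependencies
pkg_deps <- c("methods", "utils", "stats", "igraph")
dep_deps <- c("methods", "utils", "graphics")
overlap <- jaccard(pkg_deps, dep_deps)

usage    # 2.5
overlap  # 2 shared out of 5 distinct packages, i.e. 0.4
```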
Robert Castelo
Charlotte Soneson
depdf <- buildPkgDependencyDataFrame(
    dependencies=c("Depends", "Imports"),
    repo=c("BioCsoft", "CRAN")
)
pkgDepMetrics('BiocPkgTools', depdf)
This function uses available.packages
to calculate the download rank
percentile of a given package. It approximates what is observed
in the Bioconductor landing page.
pkgDownloadRank( pkg, pkgType = c("software", "data-experiment", "workflows", "data-annotation"), version = BiocManager::version() )
pkg |
character(1) The name of a Bioconductor package |
pkgType |
character(1) One of 'software', 'data-experiment', 'workflows', or 'data-annotation' (defaults to 'software') |
version |
(Optional) |
The package's percentile rank in terms of download statistics, with the proportion given in the name
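One plausible way to read a download rank percentile is rank position divided by the number of packages, so that heavily downloaded packages land near the top few percent. The sketch below uses invented download counts and is an assumption about the general idea, not the exact formula used by the function.

```r
# Invented download counts for four packages
downloads <- c(pkgA = 5000, pkgB = 1200, pkgC = 300, pkgD = 40)

# Rank 1 = most downloaded
rank_desc <- rank(-downloads, ties.method = "min")

# Percentile position: smaller values mean more popular packages
pct <- 100 * rank_desc / length(downloads)
pct
```

Under this reading, a package reported as "top 1%" would be one whose rank falls within the first hundredth of all packages of its type.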
## Percentile rank for BiocGenerics (top 1%)
pkgDownloadRank("BiocGenerics", "software")
Get Bioconductor download statistics for a package
pkgDownloadStats( pkg, pkgType = c("software", "data-experiment", "workflows", "data-annotation"), years = format(Sys.time(), "%Y") )
pkg |
character(1) The name of a Bioconductor package |
pkgType |
character(1) One of 'software', 'data-experiment', 'workflows', or 'data-annotation' (defaults to 'software') |
years |
numeric(), character() A vector of years from which to obtain download statistics (defaults to current year) |
A tibble
of download statistics
pkgDownloadStats("GenomicRanges")
This is a quick way to get an HTML report of packages maintained by a specific developer or which depend directly on a specified package. The function is keyed to filter based on either the maintainer name or by using the 'Depends', 'Suggests' and 'Imports' fields in package descriptions.
problemPage( authorPattern = "V.*Carey", dependsOn, ver = "devel", includeOK = FALSE )
authorPattern |
character(1) regexp used with grep() to filter author field of package DESCRIPTION for listing |
dependsOn |
character(1) name of a Bioconductor package. The function will return the status of packages that directly depend on this package. Can only be used when 'authorPattern' is the empty string. |
ver |
character(1) version tag for Bioconductor |
includeOK |
logical(1) include entries from the build report that are listed as "OK". Default FALSE will result in only those entries that are in WARNING or ERROR state. |
DT::datatable call; if assigned to a variable, must evaluate to get the page to appear
Vince Carey, Mike L. Smith
if (interactive()) {
    problemPage()
    problemPage(dependsOn = "limma")
}
Summarize binary packages compatible with the Bioconductor or Terra container in use.
repositoryStats(
    version = BiocManager::version(),
    binary_repository = BiocManager::containerRepository(version)
)

## S3 method for class 'repositoryStats'
print(x, ...)
version |
(Optional) |
binary_repository |
|
x |
the object returned by |
... |
further arguments passed to or from other methods (not used). |
a list of class repositoryStats
with the following fields:
container: character(1)
container label, e.g.,
bioconductor_docker
, or NA if not evaluated on a supported container
bioconductor_version: package_version
the
Bioconductor version provided by the user.
repository_exists: logical(1)
TRUE if a binary repository
exists for the container and bioconductor_version.
bioconductor_binary_repository: character(1)
repository
location, if available, or NA if the repository does not exist.
n_software_packages: integer(1)
number of software packages
in the Bioconductor source repository.
n_binary_packages: integer(1)
number of binary packages
available. When a binary repository exists, this number is likely
to be larger than the number of source software packages, because
it includes the binary version of the source software packages, as
well as the (possibly CRAN) dependencies of the binary packages.
n_binary_software_packages: integer(1)
number of binary
packages derived from Bioconductor source packages. This number is
less than or equal to n_software_packages
.
missing_binaries: integer(1)
the number of Bioconductor
source software packages that are not present in the binary
repository.
out_of_date_binaries: integer(1)
the number of Bioconductor
source software packages that are newer than their binary
counterpart. A newer source software package
might occur when the main Bioconductor build system has
updated a package after the most recent run of the binary
build system.
print(repositoryStats)
: Print a summary of package
availability in binary repositories.
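The relationship between the counts above can be illustrated on two small invented package tables, one standing in for the source repository and one for the binary repository. The data and variable names are made up; this is a sketch of the definitions, not the package's implementation.

```r
# Invented source (software) packages and their versions
src <- data.frame(
  Package = c("pkgA", "pkgB", "pkgC"),
  Version = c("1.2.0", "2.0.1", "0.99.3"),
  stringsAsFactors = FALSE
)

# Invented binary repository contents, including one non-Bioconductor dependency
bin <- data.frame(
  Package = c("pkgA", "pkgB", "cranDep"),
  Version = c("1.2.0", "2.0.0", "3.1.4"),
  stringsAsFactors = FALSE
)

n_software_packages <- nrow(src)
n_binary_packages <- nrow(bin)

# Binaries that correspond to a source software package
n_binary_software_packages <- sum(bin$Package %in% src$Package)

# Source packages with no binary counterpart
missing_binaries <- sum(!src$Package %in% bin$Package)

# Source packages whose version is newer than the available binary
both <- merge(src, bin, by = "Package", suffixes = c(".src", ".bin"))
out_of_date_binaries <- sum(
  package_version(both$Version.src) > package_version(both$Version.bin)
)

c(missing = missing_binaries, out_of_date = out_of_date_binaries)
```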
M. Morgan
stats <- repositoryStats() # obtain statistics
stats                      # display a summary
stats$container            # access an element for further computation
While inducedSubgraphByPkgs
returns the subgraph with the minimal connections
between named packages, this function takes a
package name and a degree (1 or more) and returns the
subgraph of vertices that are within degree
of the
named package.
subgraphByDegree(g, pkg, degree = 1, ...)
g |
an igraph graph, typically created by
|
pkg |
character(1) package name from which to measure degree. |
degree |
integer(1) degree, limit search for adjacent vertices to this degree. |
... |
passed on to |
An igraph graph, with only nodes and their edges within degree of the named package
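The notion of "within degree of the named package" can be sketched in base R as a breadth-first expansion over a toy edge list. The helper `within_degree` is hypothetical, and it treats edges as undirected, which is an assumption made for this illustration.

```r
# Toy chain of dependencies: A - B - C - D - E
edges <- data.frame(
  from = c("A", "B", "C", "D"),
  to   = c("B", "C", "D", "E"),
  stringsAsFactors = FALSE
)

# Collect vertices reachable from `pkg` in at most `degree` steps,
# then keep only the edges among those vertices
within_degree <- function(edges, pkg, degree = 1L) {
  reached <- pkg
  frontier <- pkg
  for (step in seq_len(degree)) {
    nbrs <- c(edges$to[edges$from %in% frontier],
              edges$from[edges$to %in% frontier])
    frontier <- setdiff(unique(nbrs), reached)
    if (length(frontier) == 0L) break
    reached <- c(reached, frontier)
  }
  keep <- edges$from %in% reached & edges$to %in% reached
  list(vertices = reached, edges = edges[keep, , drop = FALSE])
}

n1 <- within_degree(edges, "C", 1L)
n2 <- within_degree(edges, "C", 2L)
n1$vertices
n2$vertices
```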
g = buildPkgDependencyIgraph(buildPkgDependencyDataFrame())
g2 = subgraphByDegree(g, 'GEOquery')
plot(g2)
These templates are used with biocBuildEmail
to notify maintainers
about package errors and final deprecation warnings.
templatePath( type = c("buildemail", "deprecation", "deprecguide", "cranreport", "revdepnote") )
type |
character(1) Either one of "buildemail", "deprecation",
"deprecguide", "cranreport", or "revdepnote". See the templates in the
|