driverMAPS is a software tool for detecting signals of positive selection from somatic point mutations in cancer.

Overview of the method

We model aggregated exonic somatic mutation counts from many tumor samples (e.g. as obtained from a paired tumor-normal sequencing cohort). Let \(Y_g\) denote the mutation count data in gene \(g\). We develop models for \(Y_g\) under three different hypotheses: that the gene is a “non-driver gene” (\(H_0\)), an “oncogene” (\(H_{OG}\)) or a “tumor suppressor gene” (\(H_{TSG}\)). Each model has two parts: a background mutation model (BMM), which models the background mutation process, and a selection mutation model (SMM), which models how selection acts on functional mutations. The BMM parameters are shared by all three hypotheses, reflecting the assumption that background mutation processes are the same for cancer driver and non-driver genes. In contrast, the SMM parameters are hypothesis-specific, to capture the different selection pressures in oncogenes, tumor suppressor genes (TSGs) and non-driver genes. We fit the hypothesis-specific parameters using training sets of known oncogenes (\(H_{OG}\)), known TSGs (\(H_{TSG}\)), and all other genes (\(H_0\)). (This last set will contain some as yet unidentified driver genes, which will tend to make our methods conservative in identifying new driver genes.) To combine information across tumor types, we first estimate parameters separately in each tumor type, and then stabilize these estimates using Empirical Bayes shrinkage.
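To illustrate the idea of stabilizing per-tumor-type estimates, here is a minimal sketch of Empirical Bayes shrinkage. It is not the driverMAPS implementation: the normal-normal model, the method-of-moments estimate of the between-tumor-type variance, and the function name `eb_shrink` are all assumptions made for this example.

```python
def eb_shrink(estimates, std_errors):
    """Shrink per-tumor-type estimates toward their common mean.

    Assumed model (illustrative only, not taken from driverMAPS):
      estimate_t ~ Normal(theta_t, se_t^2),  theta_t ~ Normal(mu, tau^2).
    Standard errors are assumed to be strictly positive.
    """
    n = len(estimates)
    mu = sum(estimates) / n
    # Total spread of the estimates across tumor types.
    total_var = sum((b - mu) ** 2 for b in estimates) / (n - 1)
    # Part of that spread explained by sampling noise alone.
    mean_se2 = sum(s ** 2 for s in std_errors) / n
    # Method-of-moments estimate of the true between-type variance.
    tau2 = max(total_var - mean_se2, 0.0)

    shrunk = []
    for b, s in zip(estimates, std_errors):
        # Weight on the tumor-type-specific estimate; noisy estimates get less.
        w = tau2 / (tau2 + s ** 2)
        shrunk.append(w * b + (1.0 - w) * mu)
    return shrunk


# Hypothetical selection-effect estimates from three tumor types.
print(eb_shrink([0.0, 4.0, 5.0], [0.1, 0.1, 2.0]))
```

Under these assumptions, an estimate with a large standard error is pulled strongly toward the cross-tumor-type mean, while precisely estimated values stay close to where they started.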

Having fit these models, we use them to identify genes whose mutation data are most consistent with the driver gene models (\(H_{OG}\) and \(H_{TSG}\)). Specifically, for each gene \(g\), we measure the overall evidence for \(g\) being a driver gene by the Bayes Factor (likelihood ratio), \(BF_g\), defined as: \[BF_g := \frac{0.5\,[\Pr(Y_g \mid H_{OG}) + \Pr(Y_g \mid H_{TSG})]}{\Pr(Y_g \mid H_0)}\] Large values of \(BF_g\) indicate strong evidence for \(g\) being a driver gene, and at any given threshold we can estimate the Bayesian FDR. For the results reported here, we chose the threshold by requiring FDR < 0.1.
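The Bayes Factor and the FDR threshold can be sketched as follows. The Bayes Factor function follows the definition above; everything else is an assumption for illustration. In particular, the prior driver proportion `prior_driver`, the function names, and the way the Bayesian FDR is computed (the mean posterior probability of \(H_0\) among the selected genes) are not taken from the driverMAPS code.

```python
import math


def log_bayes_factor(loglik_og, loglik_tsg, loglik_null):
    """log BF_g = log(0.5 * [Pr(Y|H_OG) + Pr(Y|H_TSG)]) - log Pr(Y|H_0)."""
    m = max(loglik_og, loglik_tsg)
    # log-sum-exp keeps the sum stable for very negative log-likelihoods.
    log_sum = m + math.log(math.exp(loglik_og - m) + math.exp(loglik_tsg - m))
    return math.log(0.5) + log_sum - loglik_null


def posterior_driver_prob(log_bf, prior_driver):
    """Posterior Pr(driver | Y_g), given an ASSUMED prior driver proportion."""
    log_odds = log_bf + math.log(prior_driver) - math.log(1.0 - prior_driver)
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


def call_drivers(log_bfs, prior_driver, fdr_target=0.1):
    """Return the largest top-ranked gene set whose estimated Bayesian FDR
    (mean posterior null probability of the selected genes) is below the target."""
    ranked = sorted(log_bfs, key=log_bfs.get, reverse=True)
    selected, cum_null = [], 0.0
    for k, gene in enumerate(ranked, start=1):
        cum_null += 1.0 - posterior_driver_prob(log_bfs[gene], prior_driver)
        if cum_null / k < fdr_target:
            selected = ranked[:k]
    return selected


# Hypothetical log Bayes Factors for three genes.
print(call_drivers({"A": 10.0, "B": 0.0, "C": -5.0}, prior_driver=0.05))
```

Working on the log scale avoids underflow, since per-gene likelihoods of mutation count data can be extremely small.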

Implementation

We provide a Snakemake package for driverMAPS that includes all the steps needed to produce the results shown in the paper. Each step is named as defined in the Snakefile.

To install driverMAPS, please see here. In the simplest case, you can use driverMAPS to call drivers from a single tumor cohort (see Quick start). You can also try to reproduce the results shown in the paper by changing the config file (see here) and using the filtered mutation lists for the 20 tumor types used in the MAPS paper as input.

Latest News

2018.07.12 Optimized the background parameter inference (BMRinfer) procedure. This step should now take < 1 hour. Fixed a bug in computing standard errors for parameters. Released v1.0.3.

2018.05.12 Added a demo. The demo run can be finished on a laptop computer within half an hour. Released v1.0.2.

2018.04.18 Bug fix. Released v1.0.1.

2018.02.01 First version released, v1.0.0.

Reference

Zhao, S. et al. Model-based analysis of positive selection significantly expands the list of cancer driver genes, including RNA methyltransferases. bioRxiv (2018). http://biorxiv.org/content/early/2018/07/12/366823.abstract

