DawnRank: Discovering Personalized Driver Mutations in Cancer


About

DawnRank is an R package that identifies personalized driver mutations for each individual patient sample. Its input consists of differential gene expression (tumor versus normal) and somatic alteration data (point mutations or copy number variations, CNVs). Its output is a list of mutated genes ranked by their DawnRank scores: a higher DawnRank score for an altered gene in a tumor indicates that the gene is more likely to be a driver.
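The ranked output can be pictured with a small sketch. It is not DawnRank's code: it assumes a precomputed score matrix and a binary mutation matrix, both with genes as rows and patients as columns, and the helper rankMutatedGenes and all values are hypothetical. For one patient, it keeps only the mutated genes and orders them by score, highest first.

```r
# Toy genes x patients matrices; layout and values are hypothetical.
scores <- matrix(c(0.010, 0.002, 0.030,
                   0.005, 0.020, 0.001),
                 nrow = 3,
                 dimnames = list(c("TP53", "CDH1", "GATA3"), c("P1", "P2")))
mutations <- matrix(c(1, 0, 1,
                      1, 1, 0),
                    nrow = 3, dimnames = dimnames(scores))

# Hypothetical helper: one patient's mutated genes, sorted by score
# (highest first, i.e. most likely driver first).
rankMutatedGenes <- function(scores, mutations, patient) {
  mutated <- rownames(mutations)[mutations[, patient] == 1]
  sort(setNames(scores[mutated, patient], mutated), decreasing = TRUE)
}

rankMutatedGenes(scores, mutations, "P1")
```

For patient P1 the call returns GATA3 ahead of TP53, because GATA3 has the higher toy score; CDH1 is excluded because it is not mutated in P1.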

Instructions

DawnRank was built for R version 2.15 or later. To install the package:
  • Download the DawnRank source package and save it locally
  • Open R
  • Run the command install.packages("path to DawnRank file", repos = NULL, type = "source") to install the package from source. This only needs to be done once
  • Run the command library(DawnRank) to load the package's functions. This must be done every time you start a new R session
  • Below is an example run of DawnRank
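The install-once, load-every-session pattern described in these steps can be wrapped in a small helper. This is only a sketch: installDawnRank is a hypothetical name, and the path passed to it is a placeholder for wherever the downloaded DawnRank source file was saved.

```r
# Hypothetical helper: install DawnRank from a downloaded source archive
# (needed once), then attach it (needed in every new R session).
installDawnRank <- function(path) {
  if (!requireNamespace("DawnRank", quietly = TRUE)) {
    if (!file.exists(path)) {
      stop("DawnRank source archive not found: ", path)
    }
    install.packages(path, repos = NULL, type = "source")
  }
  library(DawnRank)
}

# Usage (placeholder path):
# installDawnRank("path/to/DawnRank-source-file")
```

Checking requireNamespace first means repeated calls skip the installation and only load the package.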

Example

The data used in this tutorial come preinstalled with the DawnRank package. They include a small subset of TCGA breast cancer (BRCA) patient data: tumor gene expression, normal gene expression, and mutation data. An example KEGG pathway network with 1492 nodes and a gold-standard vector derived from the Cancer Gene Census (CGC) are also provided.
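The gold-standard vector marks which genes are known cancer genes. As an illustration only, assuming it is a 0/1 vector named by the network's genes, such a vector could be built from a gene list as follows. The gene lists are toy values; cgcGenes is not the real census.

```r
# Toy gene lists for illustration; cgcGenes is NOT the real Cancer Gene Census.
networkGenes <- c("TP53", "CDH1", "CNTFR", "NCOA3", "PIK3CA")
cgcGenes <- c("TP53", "CDH1", "PIK3CA", "ERBB2")

# 1 if the network gene is in the gold-standard list, 0 otherwise;
# gold-standard genes absent from the network (ERBB2 here) are dropped.
goldStandardToy <- setNames(as.integer(networkGenes %in% cgcGenes), networkGenes)
goldStandardToy
```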

### Using a small subset of the TCGA dataset and a small KEGG gene
### interaction network, we show how to obtain DawnRank results

library(DawnRank)

# load the mutation data
data(brcaExampleMutation)

# load the tumor expression data
data(brcaExampleTumorExpression)

# load the normal expression data
data(brcaExampleNormalExpression)

# load the pathway data
data(brcaExamplePathway)

# load the gold standard
data(goldStandard)

# normalize the tumor and normal data to get the differential expression
normalizedDawn <- DawnNormalize(tumorMat = brcaExampleTumorExpression, normalMat = brcaExampleNormalExpression)
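The normalization step turns the tumor and normal matrices into a single differential-expression matrix. DawnNormalize's exact formula is not given here, so the following is a hypothetical stand-in, assuming genes are rows and samples are columns: it expresses each tumor value as a z-score relative to the normal samples of the same gene.

```r
# Hypothetical stand-in for DawnNormalize (NOT its documented formula).
# Rows are genes, columns are samples.
zScoreVsNormal <- function(tumorMat, normalMat) {
  stopifnot(identical(rownames(tumorMat), rownames(normalMat)))
  normalMean <- rowMeans(normalMat)
  normalSd <- apply(normalMat, 1, sd)
  # Length-nrow vectors recycle down each column, so each gene is centred
  # and scaled by its own normal mean and SD. Genes with zero variance
  # across normals would yield Inf/NaN and need separate handling.
  (tumorMat - normalMean) / normalSd
}

tumor  <- matrix(c(5, 2, 7, 1), nrow = 2,
                 dimnames = list(c("G1", "G2"), c("T1", "T2")))
normal <- matrix(c(1, 1, 3, 3), nrow = 2,
                 dimnames = list(c("G1", "G2"), c("N1", "N2")))
de <- zScoreVsNormal(tumor, normal)
de
```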

# compute the DawnRank scores (get some coffee, this might take a while!)
dawnRankScore <- DawnRank(adjMatrix = brcaExamplePathway, mutationMatrix = brcaExampleMutation, 
    expressionMatrix = normalizedDawn, mu = 3, goldStandard = goldStandard)
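DawnRank is designed to rank an altered gene higher when it affects many differentially expressed genes downstream of it in the network. The sketch below shows a generic personalized PageRank-style propagation in that spirit. It is not the package's implementation: it assumes adj[i, j] = 1 means gene i regulates gene j, uses a fixed damping factor d = 0.85, and does not model how DawnRank's mu parameter enters the computation. pageRankScores is a hypothetical name.

```r
# Generic personalized PageRank-style propagation (illustrative, not DawnRank's code).
# Assumed orientation: adj[i, j] = 1 means gene i regulates gene j.
pageRankScores <- function(adj, expr, d = 0.85, tol = 1e-10, maxIter = 1000) {
  e <- abs(expr[rownames(adj)])
  e <- e / sum(e)                      # personalization vector from |differential expression|
  nRegulators <- colSums(adj)          # in-degree of each target gene
  # W[i, j]: share of target j's score passed back to regulator i
  W <- sweep(adj, 2, ifelse(nRegulators > 0, nRegulators, 1), "/")
  r <- e
  for (iter in seq_len(maxIter)) {
    rNew <- (1 - d) * e + d * as.vector(W %*% r)
    if (sum(abs(rNew - r)) < tol) break
    r <- rNew
  }
  setNames(as.vector(rNew), rownames(adj))
}

adj <- matrix(0, 3, 3, dimnames = list(c("A", "B", "C"), c("A", "B", "C")))
adj["A", "B"] <- 1
adj["A", "C"] <- 1
adj["B", "C"] <- 1
s <- pageRankScores(adj, expr = c(A = 0, B = 1, C = 2))
s
```

In this toy network, gene A is not itself differentially expressed but ranks first, because both of its downstream targets are.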

# look at the DawnRank scores for a few patients
dawnRankFrame <- dawnRankScore[[3]]
head(dawnRankFrame)
##    Gene      Patient      Rank PercentRank isGoldStandard Frequency
## 1 CNTFR TCGA-A1-A0SD 0.0001512       21.43              0         1
## 2 NCOA3 TCGA-A1-A0SD 0.0002021       28.87              0         6
## 3  CDH1 TCGA-A1-A0SE 0.0007916       77.23              1        33
## 4 PLCE1 TCGA-A1-A0SH 0.0010033       85.93              0         7
## 5 ACSL4 TCGA-A1-A0SH 0.0001951       30.01              0         2
## 6  RYR2 TCGA-A1-A0SH 0.0020122       96.12              0        21
##   Deviation
## 1   -0.7719
## 2   -0.7431
## 3   -0.1894
## 4   -0.1715
## 5   -0.6814
## 6    2.8303
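The PercentRank column places each score on a 0 to 100 scale. Assuming it is computed within one patient across all scored genes, one common definition is shown below; the exact formula DawnRank uses may differ, and percentRankWithinPatient is a hypothetical helper.

```r
# Hypothetical helper: percent rank of each gene's score within one patient,
# on a 0 (lowest) to 100 (highest) scale; ties share their average rank.
percentRankWithinPatient <- function(scores) {
  100 * (rank(scores) - 1) / (length(scores) - 1)
}

pr <- percentRankWithinPatient(c(G1 = 0.2, G2 = 0.1, G3 = 0.4, G4 = 0.3))
pr
```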

# compute the aggregate DawnRank scores (get some coffee, this might take a while!)
aggregateDawnRankScore <- condorcetRanking(scoreMatrix = dawnRankScore[[2]], 
    mutationMatrix = brcaExampleMutation)

# look at the top 10 ranked genes
top10 <- aggregateDawnRankScore[[2]][1:10]
top10
##   TP53  GATA3 MAP2K4    DMD   EGFR  ERBB2    KIT   CDH1  EP300 PIK3CA 
## 0.9983 0.9775 0.9684 0.9345 0.9146 0.8976 0.8803 0.8727 0.8645 0.8613
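The aggregation step combines the per-patient scores into one cohort-level ranking. A Condorcet-style aggregation compares genes pairwise: gene a beats gene b if, among patients in which both are mutated, a outscores b more often than the reverse. The sketch below scores each gene by the fraction of its pairwise contests it wins (a Copeland-style count). It illustrates the voting idea under these assumptions and is not condorcetRanking's algorithm; condorcetScores and the toy values are hypothetical.

```r
# Hypothetical Condorcet-style (Copeland) aggregation, not condorcetRanking's code.
# scoreMatrix and mutationMatrix: genes x patients with matching dimnames.
condorcetScores <- function(scoreMatrix, mutationMatrix) {
  genes <- rownames(scoreMatrix)
  wins <- setNames(numeric(length(genes)), genes)
  contests <- wins
  for (a in seq_along(genes)) {
    for (b in seq_along(genes)) {
      if (a >= b) next
      # Only patients in which both genes are mutated cast a "vote"
      both <- mutationMatrix[a, ] == 1 & mutationMatrix[b, ] == 1
      if (!any(both)) next
      aVotes <- sum(scoreMatrix[a, both] > scoreMatrix[b, both])
      bVotes <- sum(scoreMatrix[b, both] > scoreMatrix[a, both])
      contests[c(a, b)] <- contests[c(a, b)] + 1
      if (aVotes > bVotes) wins[a] <- wins[a] + 1
      if (bVotes > aVotes) wins[b] <- wins[b] + 1
    }
  }
  # Fraction of pairwise contests won; genes with no contests get 0
  sort(ifelse(contests > 0, wins / contests, 0), decreasing = TRUE)
}

scoreToy <- matrix(c(0.9, 0.8, 0.7,
                     0.5, 0.6, 0.9,
                     0.1, 0.2, 0.3),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(c("TP53", "CDH1", "GATA3"), c("P1", "P2", "P3")))
mutationToy <- matrix(1, nrow = 3, ncol = 3, dimnames = dimnames(scoreToy))
agg <- condorcetScores(scoreToy, mutationToy)
agg
```

Here TP53 beats both other genes in a majority of co-mutated patients and scores 1, while CDH1 wins one of its two contests and scores 0.5. Requiring co-mutation is one design choice; comparing scores in every patient is an alternative.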

# get the individual cutoff for patient TCGA-A2-A04P
# copy the gold-standard flags into an isCGC column for patspeccutoff
dawnRankFrame$isCGC <- dawnRankFrame$isGoldStandard
library(maxstat)
# NOTE: the latest version of mvtnorm (a dependency of maxstat) should be installed
significance <- patspeccutoff(patient = "TCGA-A2-A04P", ms = dawnRankFrame, 
    default = 95)
# inspect which genes are called significant for this patient
significance[[1]]
##      Gene      Patient      Rank PercentRank isGoldStandard Frequency
## 48   TP53 TCGA-A2-A04P 1.894e-02      99.933              1       181
## 41 PIK3CA TCGA-A2-A04P 1.038e-03      85.198              1       152
## 39   PAK3 TCGA-A2-A04P 7.556e-04      73.878              0         2
## 47   BIN2 TCGA-A2-A04P 5.900e-04      62.760              0         1
## 45  AKAP9 TCGA-A2-A04P 4.694e-04      53.182              1         9
## 46 EIF4G3 TCGA-A2-A04P 3.605e-04      43.804              0         2
## 44 IL10RB TCGA-A2-A04P 2.959e-04      36.504              0         3
## 43  ERBB3 TCGA-A2-A04P 2.461e-04      31.547              0         9
## 38  HLA-A TCGA-A2-A04P 1.077e-04      14.267              0         3
## 40  FASLG TCGA-A2-A04P 3.801e-05       5.090              0         1
## 42   ATF4 TCGA-A2-A04P 3.662e-07       0.134              0         1
##    Deviation isCGC significant
## 48   2.83706     1           1
## 41   3.97903     1           1
## 39  -0.54155     0           1
## 47   0.06763     0           0
## 45   0.22665     1           0
## 46  -0.18290     0           0
## 44   1.27067     0           0
## 43  -1.64189     0           0
## 38  -1.07798     0           0
## 40  -1.07644     0           0
## 42  -1.27154     0           0
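The cutoff step loads maxstat, which implements maximally selected rank statistics: every candidate cutpoint on a predictor is tried, and the one that best separates the response is kept. A simplified, hypothetical version of that idea is sketched below, with a PercentRank-like predictor and a 0/1 isCGC-like response. It standardizes the number of gold-standard genes at or below each cutpoint against its permutation mean and variance and keeps the largest statistic, but it omits maxstat's p-value correction. maxSelectedCutoff and the toy values are invented for illustration.

```r
# Simplified maximally selected statistic (hypothetical; no p-value correction).
# x: predictor (e.g. PercentRank); y: 0/1 response (e.g. isCGC).
maxSelectedCutoff <- function(x, y) {
  n <- length(y)
  centred <- sum((y - mean(y))^2)
  if (centred == 0) stop("y must contain both 0s and 1s")
  # Candidate cutpoints: all distinct x except the largest, so both groups are non-empty
  cuts <- head(sort(unique(x)), -1)
  stats <- vapply(cuts, function(cut) {
    low <- x <= cut
    m <- sum(low)
    # Mean and variance of sum(y[low]) under random permutation of y
    expected <- m * mean(y)
    variance <- m * (n - m) / (n * (n - 1)) * centred
    abs(sum(y[low]) - expected) / sqrt(variance)
  }, numeric(1))
  list(cutoff = cuts[which.max(stats)], statistic = max(stats))
}

percentRankToy <- c(10, 20, 30, 40, 90, 95, 99)
isCGCToy <- c(0, 0, 0, 0, 1, 1, 1)
res <- maxSelectedCutoff(percentRankToy, isCGCToy)
significantToy <- as.integer(percentRankToy > res$cutoff)
```

With these toy values the best split falls between 40 and 90, so the three highest-ranked genes are flagged as significant.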

Data used in the manuscript

  • TCGA differential expression and mutation information can be accessed through the TCGA Data Portal
  • The gene network was built from the network used in MEMo (Ciriello et al.), supplemented with up-to-date curated interactions from Reactome, the NCI-Nature curated Pathway Interaction Database (PID), and KEGG.
    • Download the network we constructed (all of its genes have TCGA expression/alteration information).
    • Download the full network (it also contains some genes without TCGA expression/alteration information).
  • Gold standard: the Cancer Gene Census (CGC), maintained by the Sanger Institute
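DawnRank takes the network as an adjacency matrix (the adjMatrix argument). Assuming a downloaded network is a directed edge list (the file format is not described here), the sketch below converts one into a 0/1 matrix with rows as regulators and columns as targets (an assumed orientation), then drops genes that lack expression data, mirroring the difference between the constructed and full networks. The interactions and the helper edgesToAdjacency are hypothetical.

```r
# Toy directed interactions for illustration; not taken from the DawnRank network.
edges <- data.frame(from = c("EGFR", "ERBB2", "PIK3CA", "TP53"),
                    to   = c("PIK3CA", "PIK3CA", "AKT1", "MDM2"),
                    stringsAsFactors = FALSE)

# Hypothetical helper: edge list -> 0/1 adjacency matrix
# (rows = regulators, columns = targets; an assumed orientation).
edgesToAdjacency <- function(edges) {
  genes <- sort(unique(c(edges$from, edges$to)))
  adj <- matrix(0, nrow = length(genes), ncol = length(genes),
                dimnames = list(genes, genes))
  # A two-column character matrix indexes cells by (row name, column name)
  adj[cbind(edges$from, edges$to)] <- 1
  adj
}

adj <- edgesToAdjacency(edges)

# Keep only genes that also have expression data (toy list)
exprGenes <- c("EGFR", "ERBB2", "PIK3CA", "TP53")
missingGenes <- setdiff(rownames(adj), exprGenes)
keep <- intersect(rownames(adj), exprGenes)
adjReduced <- adj[keep, keep]
```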

Contact

Jack Hou and Jian Ma