DawnRank is an R package that identifies personalized driver mutations for any given patient sample.
The input includes differential gene expression (between tumor and normal), somatic alteration data (point mutations or CNVs), and a gene interaction network.
The output is a ranked list of mutated genes ordered by their DawnRank score.
A high DawnRank score for an altered gene in cancer indicates that the gene is more likely to be a driver.
Instructions
DawnRank requires R version 2.15 or later. To install the package:
- Download the DawnRank package source file and save it
- Open R
- Install the package from the downloaded source file with the command install.packages("path to DawnRank file", repos = NULL, type = "source"). This needs to be done only once
- Load the package with the command library(DawnRank) to access its functions. This must be done at the start of every new R session
- Below is an example run of DawnRank
Example
The data used in this tutorial come preinstalled with the DawnRank package.
The data include a small subset of TCGA breast cancer (BRCA) patient molecular data,
namely tumor gene expression, normal gene expression, and mutation data.
An example KEGG pathway network with 1492 nodes (genes) and a gold standard vector from the Cancer Gene Census (CGC) are also provided.
### Using a small subset of the TCGA dataset and a small KEGG gene
### interaction network, we show how to obtain DawnRank results
library(DawnRank)
# load the mutation data
data(brcaExampleMutation)
# load the tumor expression data
data(brcaExampleTumorExpression)
# load the normal expression data
data(brcaExampleNormalExpression)
# load the pathway data
data(brcaExamplePathway)
# load the gold standard
data(goldStandard)
# normalize the tumor and normal data to get the differential expression
normalizedDawn <- DawnNormalize(tumorMat = brcaExampleTumorExpression, normalMat = brcaExampleNormalExpression)
# get the DawnRank scores. Get some coffee, this might take a while!
dawnRankScore <- DawnRank(adjMatrix = brcaExamplePathway, mutationMatrix = brcaExampleMutation,
                          expressionMatrix = normalizedDawn, mu = 3, goldStandard = goldStandard)
# look at the DawnRank scores for a few patients
dawnRankFrame <- dawnRankScore[[3]]
head(dawnRankFrame)
##    Gene      Patient      Rank PercentRank isGoldStandard Frequency Deviation
## 1 CNTFR TCGA-A1-A0SD 0.0001512       21.43              0         1   -0.7719
## 2 NCOA3 TCGA-A1-A0SD 0.0002021       28.87              0         6   -0.7431
## 3  CDH1 TCGA-A1-A0SE 0.0007916       77.23              1        33   -0.1894
## 4 PLCE1 TCGA-A1-A0SH 0.0010033       85.93              0         7   -0.1715
## 5 ACSL4 TCGA-A1-A0SH 0.0001951       30.01              0         2   -0.6814
## 6  RYR2 TCGA-A1-A0SH 0.0020122       96.12              0        21    2.8303
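To list the altered genes of one patient from strongest to weakest candidate, the per-patient frame can be filtered and sorted with base R. The sketch below rebuilds four columns of the six rows printed above as a toy data frame named exampleFrame so that it runs on its own; in a real session you would pass dawnRankFrame instead. The helper patientRanking is hypothetical (it is not part of DawnRank), and it assumes the Rank column holds the DawnRank score, where a higher value marks a likelier driver.

```r
# Toy copy of four columns from the head(dawnRankFrame) output above.
# In a real session, pass dawnRankFrame itself instead of exampleFrame.
exampleFrame <- data.frame(
  Gene = c("CNTFR", "NCOA3", "CDH1", "PLCE1", "ACSL4", "RYR2"),
  Patient = c("TCGA-A1-A0SD", "TCGA-A1-A0SD", "TCGA-A1-A0SE",
              "TCGA-A1-A0SH", "TCGA-A1-A0SH", "TCGA-A1-A0SH"),
  Rank = c(0.0001512, 0.0002021, 0.0007916, 0.0010033, 0.0001951, 0.0020122),
  PercentRank = c(21.43, 28.87, 77.23, 85.93, 30.01, 96.12),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep one patient's rows and sort them by Rank,
# highest first (assumes Rank is the DawnRank score).
patientRanking <- function(frame, patient) {
  rows <- frame[frame$Patient == patient, ]
  rows[order(rows$Rank, decreasing = TRUE), c("Gene", "Rank", "PercentRank")]
}

patientRanking(exampleFrame, "TCGA-A1-A0SH")
```

For TCGA-A1-A0SH this returns RYR2, PLCE1, and ACSL4, in that order.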
# get the aggregate DawnRank scores. Get some coffee, this might take a while!
aggregateDawnRankScore <- condorcetRanking(scoreMatrix = dawnRankScore[[2]],
                                           mutationMatrix = brcaExampleMutation)
# look at top 10 ranked genes
top10 <- aggregateDawnRankScore[[2]][1:10]
top10
## TP53 GATA3 MAP2K4 DMD EGFR ERBB2 KIT CDH1 EP300 PIK3CA
## 0.9983 0.9775 0.9684 0.9345 0.9146 0.8976 0.8803 0.8727 0.8645 0.8613
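The aggregate result printed above is a named numeric vector of scores, so a gene's position in the cohort-wide ranking can be looked up with base R. This is a minimal sketch: rankPosition is a hypothetical helper, and exampleScores is copied from the printed top 10 only so that the block runs without DawnRank. In practice you would pass aggregateDawnRankScore[[2]], assuming it is the named numeric vector that the printout suggests.

```r
# Named scores copied from the top10 output above; in a real session,
# pass aggregateDawnRankScore[[2]] instead.
exampleScores <- c(TP53 = 0.9983, GATA3 = 0.9775, MAP2K4 = 0.9684, DMD = 0.9345,
                   EGFR = 0.9146, ERBB2 = 0.8976, KIT = 0.8803, CDH1 = 0.8727,
                   EP300 = 0.8645, PIK3CA = 0.8613)

# Hypothetical helper: 1-based position of each gene in the ranking
# (highest score first), or NA if the gene is not in the vector.
rankPosition <- function(scores, genes) {
  ordered <- names(sort(scores, decreasing = TRUE))
  setNames(match(genes, ordered), genes)
}

rankPosition(exampleScores, c("CDH1", "PIK3CA", "PAK3"))
```

Here CDH1 sits at position 8 and PIK3CA at position 10, while PAK3, which is not in this top-10 vector, gives NA.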
# get the individual cutoff for patient TCGA-A2-A04P
dawnRankFrame$isCGC <- dawnRankFrame$isGoldStandard
library(maxstat)
# NOTE: the latest version of mvtnorm should be installed
significance <- patspeccutoff(patient = "TCGA-A2-A04P", ms = dawnRankFrame,
                              default = 95)
# look at the significance results
significance[[1]]
##      Gene      Patient      Rank PercentRank isGoldStandard Frequency Deviation isCGC significant
## 48   TP53 TCGA-A2-A04P 1.894e-02      99.933              1       181   2.83706     1           1
## 41 PIK3CA TCGA-A2-A04P 1.038e-03      85.198              1       152   3.97903     1           1
## 39   PAK3 TCGA-A2-A04P 7.556e-04      73.878              0         2  -0.54155     0           1
## 47   BIN2 TCGA-A2-A04P 5.900e-04      62.760              0         1   0.06763     0           0
## 45  AKAP9 TCGA-A2-A04P 4.694e-04      53.182              1         9   0.22665     1           0
## 46 EIF4G3 TCGA-A2-A04P 3.605e-04      43.804              0         2  -0.18290     0           0
## 44 IL10RB TCGA-A2-A04P 2.959e-04      36.504              0         3   1.27067     0           0
## 43  ERBB3 TCGA-A2-A04P 2.461e-04      31.547              0         9  -1.64189     0           0
## 38  HLA-A TCGA-A2-A04P 1.077e-04      14.267              0         3  -1.07798     0           0
## 40  FASLG TCGA-A2-A04P 3.801e-05       5.090              0         1  -1.07644     0           0
## 42   ATF4 TCGA-A2-A04P 3.662e-07       0.134              0         1  -1.27154     0           0
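The significant column marks the genes that pass the patient-specific cutoff for TCGA-A2-A04P. A short base-R sketch for pulling them out follows. significantGenes is a hypothetical helper, and exampleSignificance copies the first five rows and a few columns of the output above so that the block is self-contained (use significance[[1]] in a real session). The outsideCGC option keeps only significant genes whose isCGC flag is 0.

```r
# Toy copy of the first five rows (selected columns) of significance[[1]] above.
exampleSignificance <- data.frame(
  Gene = c("TP53", "PIK3CA", "PAK3", "BIN2", "AKAP9"),
  PercentRank = c(99.933, 85.198, 73.878, 62.760, 53.182),
  isCGC = c(1, 1, 0, 0, 1),
  significant = c(1, 1, 1, 0, 0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: genes flagged as significant; with outsideCGC = TRUE,
# only those that are not in the Cancer Gene Census (isCGC == 0).
significantGenes <- function(sigFrame, outsideCGC = FALSE) {
  keep <- sigFrame$significant == 1
  if (outsideCGC) keep <- keep & sigFrame$isCGC == 0
  sigFrame$Gene[keep]
}

significantGenes(exampleSignificance)
significantGenes(exampleSignificance, outsideCGC = TRUE)
```

On these rows the first call returns TP53, PIK3CA, and PAK3, and the second returns PAK3 alone.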
Data used in the manuscript
- TCGA differential expression and mutation data can be accessed through the TCGA Data Portal
- The gene network was built from the network used in MEMo (Ciriello et al.) as well as up-to-date curated interactions from Reactome, the NCI-Nature Curated PID, and KEGG.
- Download the network we constructed. (All genes have TCGA expression/alteration information)
- Download the full network here. (The full network contains some genes that do not have TCGA expression/alteration information)
- Gold standard from the Cancer Gene Census (CGC), maintained by the Sanger Institute
Contact
Jack Hou and Jian Ma