Figure 1: Gene mutation rate against expression level

Figure 2: MutsigCV q-value against expression level

Caption:

Figure 1. Comparison of mutation rates between de novo pyrimidine biosynthesis genes (CAD, DHODH, CPS1 and UMPS; red) and known cancer genes (oncogenes: black; tumor suppressors: blue). Mutation rates of the pyrimidine biosynthesis genes are lower than those of most cancer genes and even lower than the median mutation rate of non-cancer genes. Most cancer genes have mutation rates only marginally higher than those of non-cancer genes, which implies that a large proportion of cancer genes are difficult to discriminate from non-cancer genes based on mutation rate alone.

Figure 2. Significance (Q-value) of gene mutation burden as evaluated by MutsigCV. The pyrimidine biosynthesis genes and 99.7% of the other non-cancer genes (18546 of 18600 genes) are not significant (Q-value > 0.1). About 20% of cancer genes (53 of 262) are identified as significantly mutated.

Methods:

For Figure 1, the somatic mutation dataset was constructed by combining mutations from 33 TCGA cancer cohorts, comprising 11330 patients. Mutation MAF files were downloaded from Firehose (http://gdac.broadinstitute.org/). The average gene expression level was derived from 91 CCLE (Cancer Cell Line Encyclopedia) cell lines. Cancer genes were retrieved from the Cancer Gene Census (http://cancer.sanger.ac.uk/census, COSMIC v81). We considered only cancer genes whose mutation types include missense, nonsense, frameshift or splicing mutations, excluding cancer genes affected by translocations, amplifications or large deletions. This resulted in 262 cancer genes, of which 67 are oncogenes, 100 are tumor suppressors and 95 do not have a clear role in cancer.
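The cancer gene filter described above can be sketched as follows. This is a minimal illustration in Python (the original analysis was done in R), not the code used for the figures. The record layout, the type abbreviations (`Mis`, `N`, `F`, `S`, `T`, `A`, `D`), the `GENE_*` names and the toy rows are all assumptions. The sketch also assumes one particular reading of the filter: a gene is kept if it carries at least one missense, nonsense, frameshift or splicing annotation, and dropped if it is annotated only with translocation, amplification or large deletion.

```python
from collections import Counter

# Assumed abbreviations for small-mutation types (hypothetical, for illustration):
# missense, nonsense, frameshift, splicing.
SMALL_MUTATION_CODES = {"Mis", "N", "F", "S"}

# Toy census rows: (gene symbol, mutation type codes, role in cancer).
# These rows are illustrative only and are not taken from the real census file.
TOY_CENSUS = [
    ("GENE_A", "Mis, N", "oncogene"),
    ("GENE_B", "T", "oncogene"),
    ("GENE_C", "F, S, D", "TSG"),
    ("GENE_D", "A", ""),
    ("GENE_E", "Mis", ""),
]


def has_small_mutation(mutation_types: str) -> bool:
    """Return True if any listed code is a missense/nonsense/frameshift/splicing code."""
    codes = {code.strip() for code in mutation_types.split(",") if code.strip()}
    return bool(codes & SMALL_MUTATION_CODES)


def filter_census(rows):
    """Keep rows with at least one small-mutation code (assumed reading of the filter)."""
    return [row for row in rows if has_small_mutation(row[1])]


def count_roles(rows):
    """Tally kept genes by role; an empty role is counted as 'unclear'."""
    return Counter(role if role else "unclear" for _, _, role in rows)


kept = filter_census(TOY_CENSUS)
role_counts = count_roles(kept)
print([gene for gene, _, _ in kept])
print(dict(role_counts))
```

Under the stricter reading, where any translocation, amplification or large-deletion annotation excludes a gene, `has_small_mutation` would need an additional check that no such code is present.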

For Figure 2, the somatic mutation dataset and gene expression levels were the same as in Figure 1. For the MutsigCV calculation, the combined pan-cancer mutation MAF file was used as input; all other input files and options were kept at their defaults. Data analysis and visualization were conducted in R.
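The significance call behind Figure 2 amounts to a threshold on the Q-value. The Python sketch below (the original analysis was done in R) assumes a simple mapping from gene name to Q-value. The toy values, the `GENE_*` names and the function name `fraction_significant` are hypothetical. The cutoff of 0.1 follows the Figure 2 caption, and the last two lines only recompute the percentages from the counts reported there.

```python
# Threshold taken from the Figure 2 caption: genes with Q > 0.1 are not significant.
Q_CUTOFF = 0.1


def fraction_significant(q_values, cutoff=Q_CUTOFF):
    """Return (n_significant, n_total, fraction) for Q-values at or below the cutoff."""
    n_total = len(q_values)
    n_sig = sum(1 for q in q_values.values() if q <= cutoff)
    fraction = n_sig / n_total if n_total else 0.0
    return n_sig, n_total, fraction


# Toy Q-values (illustrative only, not MutsigCV output).
toy_q = {"GENE_A": 0.001, "GENE_B": 0.5, "GENE_C": 0.09, "GENE_D": 1.0}

n_sig, n_total, frac = fraction_significant(toy_q)
print(f"{n_sig} of {n_total} genes significant ({frac:.1%})")

# The same arithmetic applied to the counts reported in the caption.
print(f"{53 / 262:.1%} of cancer genes significant")
print(f"{18546 / 18600:.1%} of non-cancer genes not significant")
```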

References

TCGA cancer mutation database: (https://cancergenome.nih.gov/publications/publicationguidelines)

“The results here are in whole or part based upon data generated by the TCGA Research Network: http://cancergenome.nih.gov/.”

Cancer gene census:

Futreal, P. Andrew, et al. “A census of human cancer genes.” Nature Reviews Cancer 4.3 (2004): 177-183.

CCLE:

Barretina, Jordi, et al. “The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity.” Nature 483.7391 (2012): 603-607.

MutsigCV:

Lawrence, Michael S., et al. “Mutational heterogeneity in cancer and the search for new cancer-associated genes.” Nature 499.7457 (2013): 214-218.