For Figure 1, the somatic mutation dataset was constructed by combining mutations from 33 TCGA cancer cohorts, comprising 11,330 patients. Mutation MAF files were downloaded from Firehose (http://gdac.broadinstitute.org/). The average gene expression level was derived from 91 CCLE (Cancer Cell Line Encyclopedia) cell lines. Cancer genes were retrieved from the Cancer Gene Census (http://cancer.sanger.ac.uk/census, COSMIC v81). We considered only cancer genes annotated with missense, nonsense, frameshift, or splicing mutations, and excluded cancer genes affected by translocations, amplifications, or large deletions. This resulted in 262 cancer genes, of which 67 are oncogenes, 100 are tumor suppressors, and 95 have no clear role in cancer.
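
The gene-selection step described above can be illustrated with a minimal sketch. It is written in Python for illustration only (the original analysis was done in R) and is not the authors' code. The short codes (`Mis`, `N`, `F`, `S`, `T`, `A`, `D`), the role labels (`oncogene`, `TSG`), the field names, and the toy rows are all assumptions modelled on Cancer Gene Census-style annotations. The sketch also assumes that a gene is kept when at least one of its annotated mutation types is a missense, nonsense, frameshift, or splicing mutation, and that genes annotated as both oncogene and tumor suppressor count as having no clear role.

```python
from collections import Counter

# Assumed short codes for the small-mutation classes named in the text:
# missense, nonsense, frameshift, splicing.
SMALL_MUTATION_CODES = {"Mis", "N", "F", "S"}


def parse_codes(field):
    """Split a comma-separated annotation field into a set of trimmed codes."""
    return {code.strip() for code in field.split(",") if code.strip()}


def keep_gene(mutation_types):
    """Keep a gene if at least one annotated mutation type is a small mutation.

    Genes annotated only with translocation (T), amplification (A),
    or large deletion (D) are dropped. This reading is an assumption.
    """
    return bool(parse_codes(mutation_types) & SMALL_MUTATION_CODES)


def classify_role(role_field):
    """Map a role annotation to oncogene, tumor suppressor, or unclear."""
    roles = {role.lower() for role in parse_codes(role_field)}
    is_oncogene = "oncogene" in roles
    is_tsg = "tsg" in roles
    if is_oncogene and not is_tsg:
        return "oncogene"
    if is_tsg and not is_oncogene:
        return "tumor suppressor"
    return "unclear"  # both roles, another role only, or no annotation


# Toy rows for illustration only; these are not real census entries.
census = [
    {"gene": "GENE_A", "mutation_types": "Mis, N", "role": "oncogene"},
    {"gene": "GENE_B", "mutation_types": "T", "role": "oncogene, fusion"},
    {"gene": "GENE_C", "mutation_types": "D, F, S", "role": "TSG"},
    {"gene": "GENE_D", "mutation_types": "Mis, A", "role": "oncogene, TSG"},
    {"gene": "GENE_E", "mutation_types": "A", "role": ""},
]

selected = [row for row in census if keep_gene(row["mutation_types"])]
role_counts = Counter(classify_role(row["role"]) for row in selected)

print([row["gene"] for row in selected])
print(dict(role_counts))
```

Applied to a real census export, the same filter and tally would yield the per-role gene counts, provided the column names and codes are adjusted to match the downloaded file.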
For Figure 2, the somatic mutation dataset and gene expression levels were the same as in Figure 1. For the MutSigCV calculation, the combined pan-cancer mutation MAF file was used as input; all other input files and options were kept at their defaults. Data analysis and visualization were conducted in R.
TCGA cancer mutation database: (https://cancergenome.nih.gov/publications/publicationguidelines)
“The results
Cancer Gene Census:
Futreal, P. Andrew, et al. “A census of human cancer genes.” Nature Reviews Cancer 4.3 (2004): 177-183.
CCLE:
Barretina, Jordi, et al. “The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity.” Nature 483.7391 (2012): 603-607.
MutSigCV:
Lawrence, Michael S., et al. “Mutational heterogeneity in cancer and the search for new cancer-associated genes.” Nature 499.7457 (2013): 214-218.