Cancer mutations
- Cancer genome instability -> many genome alterations
- Recurrent mutations
  - Possible sources
    - “sequencing or mutation-caller error”, e.g. indels (false positives)
    - “high mutation rate”, e.g. in genes with low expression levels
    - “growth advantage” -> driver
Goals
We’d like to learn:
- Distribution
- Relationship with
  - expression
  - cancer type
  - etc.
- Effects
Distribution
Q. Are recurrent mutations enriched in cancer genes? What about proto-oncogenes (POGs) and tumor suppressor genes (TSGs)?
A. Yes, they are.
Distribution of different gene types
| Gene type | Other mutations | Recurrent mutations | Fraction recurrent | Fold vs. NON_CG |
| NON_CG | 261137 | 2999 | 0.0113540 | 1.000000 |
| POG | 5817 | 205 | 0.0340418 | 2.998225 |
| TSG | 6891 | 180 | 0.0254561 | 2.242037 |
*P-values for enrichment are <2.2e-16 by Fisher’s exact test
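The fraction and fold values in the table above can be recomputed from the two count columns. The following is a minimal sketch, assuming the first count is mutations that are not recurrent and the second is recurrent ones. That reading is inferred from the arithmetic, not stated on the slide, and the names `counts`, `fraction_recurrent`, and `fold_vs_background` are illustrative.

```python
# Counts copied from the table: (first count column, second count column).
# Assumption: first = non-recurrent mutations, second = recurrent mutations.
counts = {
    "NON_CG": (261137, 2999),
    "POG": (5817, 205),
    "TSG": (6891, 180),
}


def fraction_recurrent(other, recurrent):
    """Share of all mutations in a gene class that are recurrent."""
    return recurrent / (other + recurrent)


fractions = {gene_type: fraction_recurrent(*pair) for gene_type, pair in counts.items()}

# Fold enrichment is each fraction divided by the NON_CG (background) fraction.
background = fractions["NON_CG"]
fold_vs_background = {gene_type: frac / background for gene_type, frac in fractions.items()}

for gene_type in counts:
    print(f"{gene_type:7s} {fractions[gene_type]:.7f} {fold_vs_background[gene_type]:.6f}")
```

Under that assumption the script gives roughly a 3-fold enrichment for POGs and a 2.2-fold enrichment for TSGs, in line with the table.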
Relationship with expression
- Has anyone proposed that lowly expressed genes can be drivers?
No preference for any expression level found for cancer genes
Distribution of different gene types across 5 tiles (quintiles) of expression
| Gene type | Tile 1 | Tile 2 | Tile 3 | Tile 4 | Tile 5 | P-value |
| CG | 51 | 51 | 36 | 57 | 34 | 0.0596961 |
| NON_CG | 3423 | 3399 | 3428 | 3408 | 3439 | 0.9899763 |
| POG | 41 | 58 | 62 | 50 | 52 | 0.2947908 |
| POGTSG | 10 | 14 | 8 | 7 | 12 | 0.5224011 |
| TSG | 46 | 49 | 37 | 49 | 34 | 0.3303183 |
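The last value in each row above is consistent with a chi-square goodness-of-fit test of that row against equal counts in all five tiles. The slides do not name the test, so this is an inference from the numbers. The following is a minimal standard-library sketch with illustrative function names. It uses the closed form of the chi-square survival function for 4 degrees of freedom, exp(-x/2)(1 + x/2).

```python
import math

# Counts per expression tile, copied from the table.
tile_counts = {
    "CG": [51, 51, 36, 57, 34],
    "NON_CG": [3423, 3399, 3428, 3408, 3439],
    "POG": [41, 58, 62, 50, 52],
    "POGTSG": [10, 14, 8, 7, 12],
    "TSG": [46, 49, 37, 49, 34],
}


def chi2_uniform(observed):
    """Chi-square statistic against equal expected counts in every tile."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


def chi2_sf_df4(x):
    """Upper-tail probability of a chi-square variable with 4 degrees of freedom."""
    return math.exp(-x / 2) * (1 + x / 2)


p_values = {gene_type: chi2_sf_df4(chi2_uniform(c)) for gene_type, c in tile_counts.items()}

for gene_type, p in p_values.items():
    print(f"{gene_type:7s} {p:.7f}")
```

With five tiles there are 4 degrees of freedom, which is why the closed form applies. The results agree with the tabulated values, for example about 0.0597 for CG and 0.990 for NON_CG.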
Relationship with expression (frameshift ins/del, in-frame ins/del)
- Are recurrent variants more likely to occur in highly or lowly expressed tumor suppressors or oncogenes?
Relationship with expression, cont. (missense, nonsense, and silent)
*Every point is a gene
Mapping mutations on proteins
- cBioPortal has this feature
- But it excludes “silent mutations” and cannot be customized
- So I did this…
- Mapping different mutations onto TP53
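One way to prepare data for such a plot is to tally variants by protein position and mutation class, keeping silent mutations instead of filtering them out. The following is a minimal sketch and not the code behind the slides. The records are made-up toy values, not real TP53 calls, and the `(position, class)` layout and the function name are assumptions.

```python
from collections import Counter

# Hypothetical toy records: (protein position, mutation class). Not real data.
variants = [
    (10, "missense"),
    (10, "missense"),
    (10, "silent"),
    (25, "nonsense"),
    (25, "silent"),
    (40, "frameshift"),
]


def tally_by_position(records, keep_classes=None):
    """Count variants per (position, class); keep every class unless a filter is given."""
    return Counter(
        (pos, cls)
        for pos, cls in records
        if keep_classes is None or cls in keep_classes
    )


tally = tally_by_position(variants)
for (pos, cls), n in sorted(tally.items()):
    print(pos, cls, n)
```

Passing `keep_classes={"silent"}` restricts the tally to silent mutations only, which is the kind of customization the bullets above say is missing.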
Problem
- Pfam downloads may fail if too many downloads are attempted
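One common way to avoid hitting a download limit is to cache each file locally and to retry failed requests with exponential backoff. The following is a minimal sketch, not the approach used in the slides. `fetch_cached` and its parameters are hypothetical names, the URL is whatever the caller supplies, and the retry count and delays are arbitrary defaults.

```python
import hashlib
import time
import urllib.request
from pathlib import Path


def _http_get(url):
    """Default fetcher: plain HTTP GET using the standard library."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def fetch_cached(url, cache_dir, fetch=_http_get, retries=4, base_delay=1.0, sleep=time.sleep):
    """Return the bytes for url, downloading at most once per cache directory."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # The cache file name is a hash of the URL, so any URL maps to a safe file name.
    path = cache_dir / hashlib.sha256(url.encode("utf-8")).hexdigest()
    if path.exists():
        return path.read_bytes()
    for attempt in range(retries):
        try:
            data = fetch(url)
        except OSError:
            # urllib's URLError and HTTPError are subclasses of OSError.
            if attempt == retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # wait 1 s, 2 s, 4 s, ...
        else:
            path.write_bytes(data)
            return data
```

Repeated runs then read from `cache_dir` instead of contacting the server again, so each domain annotation file is requested only once.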