“…This supposition is corroborated by analyzing the COSMIC database, which documents that genes of the pyrimidine de novo pathway are rarely mutated in cancer patients. More specifically, the rate of mutation of these genes in tumours is similar to that of non-cancerous tissues (Figure 6E).”
MR_recurrent/MR_nonrecurrent
nonsyn/syn (Ka/Ks ratio)
raw MR / expression level
Table 3.1 summarizes basic information on the four genes in the 1000 Genomes (1000G) and TCGA cancer genome data sets.
Hugo_Symbol | Protein_length | CDS_length | Nmut_1000G | Nmut_cancer | mutrate_1000G | mutrate_cancer | Mr_1000G_scaled | MR_cancer_scaled |
---|---|---|---|---|---|---|---|---|
CAD | 2225 | 6678 | 114 | 311 | 6.80e-06 | 3.9e-06 | -0.4650436 | -0.1719135 |
CPS1 | 1506 | 4521 | 70 | 402 | 6.20e-06 | 7.4e-06 | -0.6440362 | 0.7353631 |
DHODH | 395 | 1188 | 33 | 55 | 1.11e-05 | 3.9e-06 | 0.7420222 | -0.1777953 |
UMPS | 480 | 1443 | 23 | 65 | 6.40e-06 | 3.8e-06 | -0.5926594 | -0.2046028 |
The mutation rate \(MR_{gene}\) is normalized by sample size \(S\) and coding DNA sequence length \(L\), where \(n_{gene}\) is the number of mutations observed in the gene. \(MR_{gene}\) is defined as follows:
\[\begin{equation} MR_{gene}=\frac{n_{gene}}{SL} \end{equation}\]
Figure 3.1: cancer genes as listed in CGC and UniProt (829 in total)
Figure 3.2: cancer genes annotated as somatic with missense, frameshift, nonsense or splice-site mutations, excluding translocations and large amplifications/deletions
Figure 3.3: faceted view of Figure 3.2
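The normalization \(MR_{gene} = n_{gene}/(SL)\) defined above can be checked by hand for a single gene. The sketch below is illustrative only: the function name `mutation_rate` is hypothetical, and the sample size `ASSUMED_S_1000G = 2504` is an assumption (the source does not state \(S\)); it is used here because it is consistent with the tabulated 1000G rate for CAD.

```python
def mutation_rate(n_mut: int, n_samples: int, cds_length: int) -> float:
    """Return MR_gene = n_gene / (S * L).

    n_mut      -- number of mutations observed in the gene (n_gene)
    n_samples  -- number of samples in the cohort (S)
    cds_length -- coding DNA sequence length in bases (L)
    """
    if n_samples <= 0 or cds_length <= 0:
        raise ValueError("sample size and CDS length must be positive")
    return n_mut / (n_samples * cds_length)


# ASSUMPTION: the source does not give S; 2504 is a guess for the 1000G cohort.
ASSUMED_S_1000G = 2504

# CAD row of Table 3.1: Nmut_1000G = 114, CDS_length = 6678.
cad_mr_1000g = mutation_rate(114, ASSUMED_S_1000G, 6678)
print(f"CAD MR (1000G, assumed S): {cad_mr_1000g:.2e}")
```

Under that assumed sample size the result agrees with the `mutrate_1000G` value for CAD in Table 3.1 to two significant figures.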
We could further compare the pyrimidine genes with genes that have at least 3 recurrent mutations (recurrence >= 5). The tables below count these genes by type and list the most recurrently mutated ones.
Genetype | Freq |
---|---|
(blank) | 6 |
Non-cancer | 187 |
oncogene | 17 |
oncogene/fusion | 3 |
oncogene/TSG | 6 |
onc/TSG/fusion | 1 |
TSG | 22 |
TSG/fusion | 1 |
Hugo_Symbol | N_recurr | Genetype |
---|---|---|
TP53 | 197 | oncogene/TSG |
PIK3CA | 50 | oncogene |
APC | 36 | TSG |
PTEN | 35 | TSG |
TTN | 24 | Non-cancer |
BAGE2 | 19 | Non-cancer |
CTNNB1 | 19 | oncogene |
CDKN2A | 18 | TSG |
MUC4 | 16 | Non-cancer |
EGFR | 14 | oncogene |
ARID1A | 12 | TSG |
FBXW7 | 12 | TSG |
RB1 | 12 | TSG |
SMAD4 | 12 | TSG |
KRAS | 10 | oncogene |
NFE2L2 | 10 | oncogene/TSG |
VHL | 10 | TSG |
BRAF | 9 | oncogene/fusion |
DNAH5 | 9 | Non-cancer |
ERBB2 | 8 | oncogene/fusion |
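The recurrence filter used for the tables above could be sketched as follows. This is a minimal illustration under stated assumptions, not the original pipeline: it assumes "recurrence" means the number of distinct samples carrying the same mutation, that a gene is kept when it has at least 3 mutations with recurrence >= 5, and that the input is a list of `(gene, mutation, sample)` tuples. The function name `recurrent_genes` and the toy records are hypothetical.

```python
from collections import defaultdict


def recurrent_genes(records, min_recurrence=5, min_recurrent_mutations=3):
    """Map each qualifying gene to its number of recurrent mutations.

    records -- iterable of (gene, mutation, sample) tuples (assumed layout)
    """
    # Collect the distinct samples in which each (gene, mutation) pair occurs.
    samples_per_mutation = defaultdict(set)
    for gene, mutation, sample in records:
        samples_per_mutation[(gene, mutation)].add(sample)

    # Count, per gene, the mutations seen in at least `min_recurrence` samples.
    n_recurrent = defaultdict(int)
    for (gene, _mutation), samples in samples_per_mutation.items():
        if len(samples) >= min_recurrence:
            n_recurrent[gene] += 1

    # Keep genes with enough recurrent mutations.
    return {g: n for g, n in n_recurrent.items() if n >= min_recurrent_mutations}


# Toy data (made up): GENE_A has three mutations, each seen in 5 samples;
# GENE_B has one mutation in 5 samples and one in only 2 samples.
toy_records = (
    [("GENE_A", m, f"s{i}") for m in ("m1", "m2", "m3") for i in range(5)]
    + [("GENE_B", "m1", f"s{i}") for i in range(5)]
    + [("GENE_B", "m2", f"s{i}") for i in range(2)]
)
toy_result = recurrent_genes(toy_records)
print(toy_result)
```

Using sets of sample identifiers means a mutation reported twice for the same sample is counted once.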
Figure 3.4: comparison with recurrently mutated cancer genes, highlighting the top 5 genes in the above table and the pyrimidine genes
Figure 4.1: With only breast cancer mutations
The q-value indicates mutation burden; the table below counts genes of each type on either side of the q = 0.1 threshold.
Genetype | qvalue > 0.1 | qvalue < 0.1 |
---|---|---|
Non-cancer | 18580 | 41 |
oncogene | 58 | 7 |
TSG | 70 | 20 |
gene expression level (RNAseq V2 RSEM (log2))
TP53, PIK3CA, PTEN, RB1, KRAS, NRAS, BRAF, CDKN2A, FBXW7, ARID1A and MLL2, as well as STAG2
ATM, CASP8, CTCF, ERBB3, HLA-A, HRAS, IDH1, NF1, NFE2L2 and PIK3R1
Q-Q plots of gene mutation rates (\(\log_{10}(MR_{gene})\)) against a normal distribution in Figure 5.1 and Figure 5.2 appear linear (excluding the lower tails), which indicates that gene mutation rates follow a log-normal distribution. The mutation rates were therefore log-transformed and then scaled using z-scores.
Figure 5.1: Q-Q plot of gene mutation rates in 1000G
Figure 5.2: Q-Q plot of gene mutation rates in TCGA cancer genome
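The log transformation followed by z-score scaling described above can be sketched in a few lines. This is an illustration under assumptions, not the code behind the figures: the function name `scaled_log_rates` is hypothetical, the sample standard deviation (n - 1 denominator) is one possible choice that the source does not specify, and genes with a zero mutation rate are assumed to have been removed beforehand, since their logarithm is undefined.

```python
import math
import statistics


def scaled_log_rates(rates):
    """Return z-scores of log10-transformed mutation rates.

    rates -- sequence of strictly positive mutation rates (MR_gene)
    """
    if any(r <= 0 for r in rates):
        raise ValueError("all rates must be positive before taking log10")
    logs = [math.log10(r) for r in rates]
    mean = statistics.fmean(logs)
    # ASSUMPTION: sample standard deviation (n - 1 denominator).
    sd = statistics.stdev(logs)
    return [(x - mean) / sd for x in logs]


# Made-up rates spaced one order of magnitude apart.
example_z = scaled_log_rates([1e-6, 1e-5, 1e-4])
print(example_z)
```

Because the scaling is computed across all genes in a data set, the scaled values of individual genes (such as those in Table 3.1) cannot be reproduced from those genes alone.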
Figures 5.4 and 5.5 reproduce Figure 5.3 (which used the COSMIC mutation database) using the TCGA mutation database.
Figure 5.3 doesn’t make much sense because I included all genes related to pyrimidine metabolism.
Figure 5.3: COSMIC(tumor) MR vs 1000G(normal) MR
Figure 5.4: TCGA(tumor) MR vs 1000G(normal) MR
Figure 5.5: a hexagonal heatmap representation of fig3
Figure 5.6: boxplot of MR for translocation cancer genes and other genes