“…This supposition is corroborated by analyzing the Cosmic database, which document that genes of the pyrimidine de novo pathway are rarely mutated in cancer patients. More specifically, the rate of mutation of these genes in tumours is similar to that of non-cancerous tissues (Figure 6E).”
MR_recurrent/MR_nonrecurrent
nonsyn/syn (Ka/Ks ratio)
raw MR/expression level
Table 3.1 summarizes basic information on the four genes in the 1000 Genomes and TCGA cancer genome datasets.
Hugo_Symbol | Protein_length | CDS_length | Nmut_1000G | Nmut_cancer | mutrate_1000G | mutrate_cancer | MR_1000G_scaled | MR_cancer_scaled |
---|---|---|---|---|---|---|---|---|
CAD | 2225 | 6678 | 114 | 311 | 6.80e-06 | 3.9e-06 | -0.4650436 | -0.1719135 |
CPS1 | 1506 | 4521 | 70 | 402 | 6.20e-06 | 7.4e-06 | -0.6440362 | 0.7353631 |
DHODH | 395 | 1188 | 33 | 55 | 1.11e-05 | 3.9e-06 | 0.7420222 | -0.1777953 |
UMPS | 480 | 1443 | 23 | 65 | 6.40e-06 | 3.8e-06 | -0.5926594 | -0.2046028 |
The mutation rate \(MR_{gene}\) is normalized by sample size \(S\) and coding DNA sequence length \(L\), where \(n_{gene}\) is the number of mutations observed in the gene. \(MR_{gene}\) is defined as follows:
\[ MR_{gene}=\frac{n_{gene}}{S L} \]
We can further compare the pyrimidine genes with cancer genes that have at least 3 recurrent mutations (recurrence >= 5).
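As a minimal sketch of the normalization \(MR_{gene}=n_{gene}/(SL)\), the Python snippet below recomputes the 1000 Genomes mutation rates from the counts and CDS lengths in Table 3.1. The sample size `S_1000G = 2504` is an assumption (the text does not state \(S\)), and the function name `mutation_rate` is purely illustrative.

```python
# Recompute MR_gene = n_gene / (S * L) for the 1000 Genomes columns of Table 3.1.
# ASSUMPTION: S = 2504 samples; the sample size is not stated in the text.
S_1000G = 2504

# (Hugo_Symbol, CDS_length L, Nmut_1000G n_gene), copied from Table 3.1
genes = [
    ("CAD", 6678, 114),
    ("CPS1", 4521, 70),
    ("DHODH", 1188, 33),
    ("UMPS", 1443, 23),
]


def mutation_rate(n_mut, n_samples, cds_length):
    """Mutations per sample per coding base (hypothetical helper)."""
    return n_mut / (n_samples * cds_length)


rates = {name: mutation_rate(n, S_1000G, length) for name, length, n in genes}

for name, rate in rates.items():
    print(f"{name}\t{rate:.2e}")
```

Under that assumed \(S\), the recomputed rates agree with the mutrate_1000G column to within the rounding shown in the table.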
Genetype | Freq |
---|---|
(unlabelled) | 6 |
Non-cancer | 187 |
oncogene | 17 |
oncogene/fusion | 3 |
oncogene/TSG | 6 |
onc/TSG/fusion | 1 |
TSG | 22 |
TSG/fusion | 1 |
Hugo_Symbol | N_recurr | Genetype |
---|---|---|
TP53 | 197 | oncogene/TSG |
PIK3CA | 50 | oncogene |
APC | 36 | TSG |
PTEN | 35 | TSG |
TTN | 24 | Non-cancer |
BAGE2 | 19 | Non-cancer |
CTNNB1 | 19 | oncogene |
CDKN2A | 18 | TSG |
MUC4 | 16 | Non-cancer |
EGFR | 14 | oncogene |
ARID1A | 12 | TSG |
FBXW7 | 12 | TSG |
RB1 | 12 | TSG |
SMAD4 | 12 | TSG |
KRAS | 10 | oncogene |
NFE2L2 | 10 | oncogene/TSG |
VHL | 10 | TSG |
BRAF | 9 | oncogene/fusion |
DNAH5 | 9 | Non-cancer |
ERBB2 | 8 | oncogene/fusion |
The q value indicates mutation burden; the table below counts genes of each type on either side of q = 0.1.
Genetype | qvalue > 0.1 | qvalue < 0.1 |
---|---|---|
Non-cancer | 18580 | 41 |
oncogene | 58 | 7 |
TSG | 70 | 20 |
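To make the counts above easier to compare across gene types, the following sketch converts them into the fraction of genes with q < 0.1 per type. It uses only the numbers in the table; the variable names are illustrative, and no significance test is implied.

```python
# Counts copied from the q-value table: (q > 0.1, q < 0.1) per gene type.
counts = {
    "Non-cancer": (18580, 41),
    "oncogene": (58, 7),
    "TSG": (70, 20),
}

# Fraction of genes in each type that fall below the q = 0.1 threshold.
fractions = {
    gene_type: below / (above + below)
    for gene_type, (above, below) in counts.items()
}

for gene_type, fraction in fractions.items():
    print(f"{gene_type}: {fraction:.1%} of genes have q < 0.1")
```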
Gene expression level (RNA-seq V2 RSEM, log2)
TP53, PIK3CA, PTEN, RB1, KRAS, NRAS, BRAF, CDKN2A, FBXW7, ARID1A and MLL2, as well as STAG2
ATM, CASP8, CTCF, ERBB3, HLA-A, HRAS, IDH1, NF1, NFE2L2 and PIK3R1
QQ plots of gene mutation rates (\(\log_{10}(MR_{gene})\)) against the normal distribution in Figures 5.1 and 5.2 appear linear (excluding the lower tails), which indicates that gene mutation rates follow a log-normal distribution. The mutation rates were therefore log-transformed and then scaled to z-scores.
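The transformation described above (log10, then z-score) can be sketched as follows. This is an illustration only: the input is the mutrate_1000G column of Table 3.1 for just four genes, so the resulting z-scores will not match the scaled columns in that table, which were presumably computed over a much larger gene set. Using the sample standard deviation (n - 1 denominator) is also an assumption.

```python
import math
import statistics

# Toy input: mutrate_1000G values for the four genes in Table 3.1.
rates = {
    "CAD": 6.80e-06,
    "CPS1": 6.20e-06,
    "DHODH": 1.11e-05,
    "UMPS": 6.40e-06,
}

# Step 1: log-transform, since the rates look log-normally distributed.
log_rates = {gene: math.log10(rate) for gene, rate in rates.items()}

# Step 2: z-score the log rates (ASSUMPTION: sample SD with n - 1 denominator).
values = list(log_rates.values())
mean = statistics.mean(values)
sd = statistics.stdev(values)

z_scores = {gene: (value - mean) / sd for gene, value in log_rates.items()}

for gene, z in z_scores.items():
    print(f"{gene}\t{z:+.3f}")
```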
Figures 5.4 and 5.5 reproduce Figure 5.3 (which used the COSMIC mutation database) using the TCGA mutation database.
Figure 5.3 is hard to interpret because I included all genes related to pyrimidine metabolism.