1 Final Conclusion

2 Background

2.1 De novo pyrimidine pathway

“…This supposition is corroborated by analyzing the Cosmic database, which document that genes of the pyrimidine de novo pathway are rarely mutated in cancer patients. More specifically, the rate of mutation of these genes in tumours is similar to that of non-cancerous tissues (Figure 6E).”

  • Comparing mutation burden in cancer (somatic mutations) and normal (germline) samples
    • CAD, UMPS, DHODH, CPS1

2.2 Plans

  • MR_recurrent/MR_nonrecurrent

  • nonsyn/syn (Ka/Ks ratio)

  • raw MR vs. expression level

2.3 Preparation

3 Figures

3.1 Overview

Table 3.1 summarizes basic information on the four pyrimidine genes in the 1000 Genomes (1000G) and TCGA cancer genome data sets.

Table 3.1: Pyrimidine 4 gene mutation
Hugo_Symbol Protein_length CDS_length Nmut_1000G Nmut_cancer mutrate_1000G mutrate_cancer Mr_1000G_scaled MR_cancer_scaled
CAD 2225 6678 114 311 6.80e-06 3.9e-06 -0.4650436 -0.1719135
CPS1 1506 4521 70 402 6.20e-06 7.4e-06 -0.6440362 0.7353631
DHODH 395 1188 33 55 1.11e-05 3.9e-06 0.7420222 -0.1777953
UMPS 480 1443 23 65 6.40e-06 3.8e-06 -0.5926594 -0.2046028

The mutation rate \(MR_{gene}\) is the number of mutations observed in a gene, \(n_{gene}\), normalized by the sample size \(S\) and the coding DNA sequence (CDS) length \(L\):

\[ MR_{gene}=\frac{n_{gene}}{S \cdot L} \]
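
As a worked check of this definition, the sketch below recomputes the 1000G mutation rates from the counts in Table 3.1. The sample size is not stated in this report; `S_1000G = 2504` (the number of individuals in 1000 Genomes phase 3) is an assumption, chosen because it reproduces the tabulated values to rounding.

```python
# Assumed sample size: NOT given in the report.
# 2504 is the number of individuals in 1000 Genomes phase 3.
S_1000G = 2504

# (CDS length in bp, number of 1000G mutations), copied from Table 3.1.
genes = {
    "CAD": (6678, 114),
    "CPS1": (4521, 70),
    "DHODH": (1188, 33),
    "UMPS": (1443, 23),
}


def mutation_rate(n_mut, n_samples, cds_length):
    """MR_gene = n_gene / (S * L)."""
    return n_mut / (n_samples * cds_length)


mr_1000g = {
    gene: mutation_rate(n_mut, S_1000G, cds_length)
    for gene, (cds_length, n_mut) in genes.items()
}

for gene, mr in mr_1000g.items():
    print(f"{gene}\t{mr:.2e}")
```

The same function applies to the TCGA counts once the corresponding sample size is known.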

3.2 All cancer genes

  • Not informative: many cancer genes are not affected by somatic mutations (Figure 3.1).

Figure 3.1: Cancer genes listed in CGC and UniProt (829 in total)

3.3 Cancer genes mainly affected by SNVs and indels

  • Only cancer genes affected by SNVs and small indels are considered. Genes whose mutation types are translocations or large amplifications/deletions were removed, leaving 268 genes.

Figure 3.2: Cancer genes with somatic missense, frameshift, nonsense or splice-site mutations, excluding translocations and large amplifications/deletions


Figure 3.3: Faceted view of the previous figure

3.4 Cancer genes with recurrent mutations

We can further compare the pyrimidine genes with cancer genes that have at least 3 recurrent mutations (recurrence >= 5).
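
The selection rule can be sketched as follows. This is a minimal illustration, not the code used for this report: it assumes that a mutation counts as recurrent when the same change in the same gene is seen in at least 5 distinct samples, and that a gene is kept when it has at least 3 such mutations. The function name, record layout and toy data are hypothetical.

```python
from collections import Counter

MIN_RECURRENCE = 5       # assumed: a mutation is recurrent if seen in >= 5 samples
MIN_RECURRENT_MUTS = 3   # assumed: keep genes with >= 3 recurrent mutations


def recurrent_genes(records, min_recurrence=MIN_RECURRENCE,
                    min_muts=MIN_RECURRENT_MUTS):
    """records: iterable of (sample_id, gene, mutation) tuples.

    Returns {gene: number of recurrent mutations} for genes that pass
    the min_muts threshold.
    """
    # Count distinct samples per (gene, mutation); set() drops duplicate rows.
    per_mutation = Counter((gene, mut) for _, gene, mut in set(records))
    # Count recurrent mutations per gene.
    per_gene = Counter(
        gene for (gene, _), n in per_mutation.items() if n >= min_recurrence
    )
    return {gene: n for gene, n in per_gene.items() if n >= min_muts}


# Hypothetical toy data.
toy = []
for mut in ("m1", "m2", "m3"):
    toy += [(f"s{i}", "GENE_A", mut) for i in range(5)]   # 3 recurrent mutations
toy += [(f"s{i}", "GENE_B", "m1") for i in range(10)]     # only 1 recurrent mutation
toy += [(f"s{i}", "GENE_C", "m1") for i in range(4)]      # seen in 4 samples only

print(recurrent_genes(toy))  # {'GENE_A': 3}
```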

Table 3.2: Gene type distribution
Genetype Freq
(blank) 6
Non-cancer 187
oncogene 17
oncogene/fusion 3
oncogene/TSG 6
onc/TSG/fusion 1
TSG 22
TSG/fusion 1

Table 3.3: Top 20 genes with number of recurrent mutations and gene type
Hugo_Symbol N_recurr Genetype
TP53 197 oncogene/TSG
PIK3CA 50 oncogene
APC 36 TSG
PTEN 35 TSG
TTN 24 Non-cancer
BAGE2 19 Non-cancer
CTNNB1 19 oncogene
CDKN2A 18 TSG
MUC4 16 Non-cancer
EGFR 14 oncogene
ARID1A 12 TSG
FBXW7 12 TSG
RB1 12 TSG
SMAD4 12 TSG
KRAS 10 oncogene
NFE2L2 10 oncogene/TSG
VHL 10 TSG
BRAF 9 oncogene/fusion
DNAH5 9 Non-cancer
ERBB2 8 oncogene/fusion

Figure 3.4: Comparison with recurrently mutated cancer genes, highlighting the top 5 genes in the table above and the pyrimidine genes

4 Figures (updates)

4.1 Cancer genes with recurrent mutations (breast cancer specific)

  • If we include only breast cancer mutations:
  • DHODH is not shown because it has no mutations in breast cancer

Figure 4.1: With only breast cancer mutations

4.2 MR_recurrent/MR_nonrecurrent in TCGA and 1000G

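4.3 MutSigCV q-value

The MutSigCV q-value indicates mutation burden: a small q-value means the gene is mutated significantly more often than expected from the background mutation rate.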

4.3.1 Using breast cancer mutations only [BRCA]

4.3.2 Use 20 cancer types [pan20]

4.3.3 Barplot

  • Distribution of genes by q-value and gene type:

Table 4.1: Pan-cancer (20 cancer types), number of genes with q-value above and below 0.1
Genetype qvalue>0.1 qvalue<0.1
Non-cancer 18580 41
oncogene 58 7
TSG 70 20
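
To make the table easier to read, the short sketch below converts the counts into the fraction of genes with q-value < 0.1 in each gene type. It uses only the numbers in Table 4.1; the variable names are illustrative.

```python
# Counts from Table 4.1: (q-value > 0.1, q-value < 0.1) per gene type.
counts = {
    "Non-cancer": (18580, 41),
    "oncogene": (58, 7),
    "TSG": (70, 20),
}

# Fraction of genes in each type that reach q-value < 0.1.
fraction_significant = {
    genetype: sig / (not_sig + sig)
    for genetype, (not_sig, sig) in counts.items()
}

for genetype, frac in fraction_significant.items():
    print(f"{genetype}\t{frac:.2%}")
```

This gives roughly 0.2% of non-cancer genes, 11% of oncogenes and 22% of TSGs.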

4.3.4 Boxplot

4.4 dN/dS

  • Ratio of the non-synonymous mutation rate to the synonymous mutation rate
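
A minimal sketch of this ratio is given below. It assumes that each rate is the mutation count divided by the number of sites of that class in the gene (the per-gene site counts of section 5.4); the function and parameter names and the example numbers are hypothetical, and dedicated dN/dS estimators apply further corrections that are not shown here.

```python
def dn_ds(n_nonsyn, n_syn, nonsyn_sites, syn_sites):
    """Ratio of non-synonymous to synonymous mutation rates.

    dN = non-synonymous mutations per non-synonymous site
    dS = synonymous mutations per synonymous site
    """
    if n_syn == 0 or syn_sites == 0 or nonsyn_sites == 0:
        return float("nan")  # the ratio is undefined in these cases
    dn = n_nonsyn / nonsyn_sites
    ds = n_syn / syn_sites
    return dn / ds


# Hypothetical gene: 30 non-synonymous mutations over 900 sites and
# 10 synonymous mutations over 300 sites give equal rates.
ratio = dn_ds(n_nonsyn=30, n_syn=10, nonsyn_sites=900, syn_sites=300)
print(ratio)
```

A ratio near 1 is the value expected when both classes of mutation accumulate at the same per-site rate.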

4.5 raw mutation rate - expression level

4.5.1 BRCA

gene expression level (RNAseq V2 RSEM (log2))

## `geom_smooth()` using method = 'gam'

4.5.2 Raw MR vs. expression level


4.5.3 Raw MR vs. expression level, highlighting recurrently mutated genes


4.5.4 Highlight significantly mutated genes

  • 22 genes were found to be significant in three or more tumor types:
    • TP53, PIK3CA, PTEN, RB1, KRAS, NRAS, BRAF, CDKN2A, FBXW7, ARID1A, MLL2 and STAG2
    • ATM, CASP8, CTCF, ERBB3, HLA-A, HRAS, IDH1, NF1, NFE2L2 and PIK3R1


5 Supplements

5.1 Distribution of mutation rate

Q-Q plots of the gene mutation rates (\(\log_{10}(MR_{gene})\)) against a normal distribution (Figures 5.1 and 5.2) appear linear, apart from the lower tails, which indicates that gene mutation rates approximately follow a log-normal distribution. The mutation rates were therefore log-transformed and then scaled to z-scores.
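
The transformation described above can be sketched as follows. This is an illustration under stated assumptions, not the code used for this report: the function name and the toy values are hypothetical, and the z-score uses the sample standard deviation (n - 1), which is what R's `scale()` does by default.

```python
import math
import statistics


def scaled_log_mr(mutation_rates):
    """log10-transform mutation rates, then convert them to z-scores.

    Genes with a mutation rate of 0 must be excluded beforehand,
    because log10(0) is undefined.
    """
    logs = [math.log10(mr) for mr in mutation_rates]
    mean = statistics.mean(logs)
    sd = statistics.stdev(logs)  # sample SD (n - 1), an assumption
    return [(x - mean) / sd for x in logs]


# Hypothetical mutation rates spanning three orders of magnitude.
toy_mr = [1e-6, 1e-5, 1e-4]
z = scaled_log_mr(toy_mr)
print(z)  # approximately [-1.0, 0.0, 1.0]
```

Because the z-score depends on the mean and spread across all genes in a data set, the scaled columns of Table 3.1 cannot be recomputed from its four rows alone.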


Figure 5.1: Q-Q plot of gene mutation rates in 1000G


Figure 5.2: Q-Q plot of gene mutation rates in TCGA cancer genome

5.2 Plain mutation rate

Figures 5.4 and 5.5 reproduce Figure 5.3 (which used the COSMIC mutation database) using the TCGA mutation database.

Figure 5.3 doesn’t make much sense because I included all genes related to pyrimidine metabolism.


Figure 5.3: COSMIC (tumor) MR vs 1000G (normal) MR


Figure 5.4: TCGA (tumor) MR vs 1000G (normal) MR


Figure 5.5: a hexagonal heatmap representation of fig3

5.3 Cancer gene types

  • Boxplots in 3.3 indicate that cancer genes whose role in cancer involves translocations have lower mutation rates than genes affected by SNVs and indels (Figure 5.6).

Figure 5.6: boxplot of MR translocation cancer genes and others

5.4 Gene non-synonymous/synonymous sites ratio

5.5 Gene expression distribution

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.