2017-08-02

Contents

  • Project Background
    • About Cancer

    • Identifying Cancer Drivers

  • Project
    • Mutation Analysis in Cancer

    • Building a Driver Predictor

Project Background

About Cancer

  • Disease with a bad name:
    • Prevalent:
      • 14 million new cases/year
    • Cause of death:
      • 8 million deaths/year
      • 1/5 of worldwide mortality
    • Related to everyone:
      • 28% to 41% individual lifetime risk of developing cancer

What is cancer?

  • Cancer is
    • Group of cells with limitless growth potential
    • Caused by epigenetic or genetic changes
  • It's a complex disease because of:
    • Cancer genes > 600
    • Heterogeneity
    • Genome instability:
      • a few cancer "drivers" and dominant "passengers"
    • and…

Cancer is a complex disease

… > 100 subtypes (maybe more)

Trends in Clinical Cancer Research

Trends in Cancer Biology Research (2)

Driver discovery

driver genes

Definition:

Genes that contribute to cell growth advantage when altered

Two types of cancer genes:

  1. Proto-oncogenes (POGs)
    • Activation by mutation is advantageous to cancer
  2. Tumor suppressor genes (TSGs)
    • Inactivation by mutation is advantageous to cancer

Methods to find driver genes

strategy1

  • positive selection signals
    • MutsigCV, MUSiC, Oncodrive
    • Compare gene mutation burden with a background mutation model
    • Mutation clustering in sequence / protein structure
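
The burden comparison above can be sketched as a simple tail test. This is a minimal illustration, not the method of MutsigCV, MUSiC, or Oncodrive: it assumes a single uniform background rate per base pair per sample and a Poisson model for the mutation count of a gene. The function names, the gene length, the sample count, and the rate are all hypothetical.

```python
import math

def poisson_upper_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)          # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i            # P(X = i) derived from P(X = i - 1)
        cdf += term
    # cdf now holds P(X <= k - 1); the complement is the upper tail.
    return max(0.0, 1.0 - cdf)

def burden_p_value(observed, gene_length_bp, n_samples, background_rate):
    """P-value for seeing at least `observed` mutations in one gene.

    Assumption: mutations arise independently at `background_rate`
    per base pair per sample, so the expected count is the product below.
    """
    expected = gene_length_bp * n_samples * background_rate
    return poisson_upper_tail(observed, expected)

# Hypothetical gene: 1500 bp, 500 samples, 1e-6 mutations/bp/sample, 6 observed.
p = burden_p_value(observed=6, gene_length_bp=1500, n_samples=500,
                   background_rate=1e-6)
print(f"expected = 0.75 mutations, p = {p:.2e}")
```

Computing the tail as a complement loses precision for very small p-values, and a uniform background rate is a strong simplification; both are acceptable only for a sketch.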

strategy2

  • Gene or protein Interaction based analysis
    • NetBox, HotNet2, DriverNet
    • Neighbor-based forecasting
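
One way to picture neighbor-based forecasting is a guilt-by-association score: a candidate gene is ranked by the share of its interaction partners that are already known drivers. The sketch below is a toy version of that idea only; it is not the algorithm of NetBox, HotNet2, or DriverNet, and the gene names and edges are placeholders.

```python
def neighbor_scores(edges, seed_genes):
    """Score each non-seed gene by the fraction of its neighbors that are seeds.

    edges: iterable of (gene, gene) pairs from an undirected interaction network.
    seed_genes: set of genes already accepted as drivers (hypothetical input).
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    scores = {}
    for gene, nbrs in neighbors.items():
        if gene in seed_genes:
            continue  # only candidates are scored
        scores[gene] = len(nbrs & seed_genes) / len(nbrs)
    return scores

# Placeholder network: g1 and g2 play the role of known drivers.
edges = [("g1", "g2"), ("g1", "g3"), ("g2", "g3"), ("g3", "g4"), ("g4", "g5")]
scores = neighbor_scores(edges, seed_genes={"g1", "g2"})
for gene in sorted(scores, key=scores.get, reverse=True):
    print(gene, round(scores[gene], 3))
```

Here g3 ranks first because two of its three partners are seeds, while g4 and g5 have no seed neighbors.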

strategy3

  • Copy number variation and Gene expression level
    • OncodriveCIS, S2N, Conexic
    • Abnormally elevated / suppressed gene expression
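
Flagging abnormally elevated or suppressed expression can be illustrated with a z-score against a reference group. This is a generic outlier sketch, not the procedure of OncodriveCIS, S2N, or Conexic; the cutoff of 2 standard deviations and the expression values are assumptions chosen for illustration.

```python
import statistics

def expression_z(tumor_value, normal_values):
    """Z-score of a tumor expression value against normal-sample values."""
    mean = statistics.mean(normal_values)
    sd = statistics.stdev(normal_values)   # sample standard deviation
    return (tumor_value - mean) / sd

def classify_expression(tumor_value, normal_values, cutoff=2.0):
    """Label expression as elevated, suppressed, or normal (assumed cutoff)."""
    z = expression_z(tumor_value, normal_values)
    if z > cutoff:
        return "elevated"
    if z < -cutoff:
        return "suppressed"
    return "normal"

# Hypothetical expression levels of one gene in five normal samples.
normals = [10.0, 11.0, 9.0, 10.0, 10.0]
for value in (14.0, 10.5, 6.0):
    print(value, classify_expression(value, normals))
```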

driver mutations

  • Definition:

Mutations that confer a growth advantage on the cell

  • somatic and germline

  • Types:
    • Coding region:
      • Single nucleotide variation
      • Insertion / deletion
    • Noncoding region
    • Copy Number Variation / Aneuploidy

Methods to find driver mutations

Two main strategies:

  • Mutation recurrence among patients
    • Fails to detect less frequent drivers
  • Functional impact of mutations
    • SIFT, PolyPhen, CHASM, FATHMM, etc.
    • strategies:
      • DNA conservation
      • Clustering of mutations
      • Affecting functional sites
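
The recurrence strategy above amounts to counting, for each mutation, how many patients carry it. A minimal sketch follows, assuming the input is a list of (patient, mutation) pairs and that the bins of interest are #=1, 1<#<5, and #>=5; the identifiers are hypothetical and the bins here count distinct mutations.

```python
from collections import Counter

def recurrence_counts(observations):
    """Count the number of distinct patients carrying each mutation.

    observations: iterable of (patient_id, mutation_id) pairs.
    Duplicate pairs are collapsed so a patient counts once per mutation.
    """
    unique_pairs = set(observations)
    return Counter(mutation for _, mutation in unique_pairs)

def bin_recurrence(counts):
    """Group mutations into recurrence bins: #=1, 1<#<5, #>=5."""
    bins = {"#=1": 0, "1<#<5": 0, "#>=5": 0}
    for n in counts.values():
        if n == 1:
            bins["#=1"] += 1
        elif n < 5:
            bins["1<#<5"] += 1
        else:
            bins["#>=5"] += 1
    return bins

# Hypothetical calls: m1 in five patients, m2 in two, m3 in one.
observations = [(f"p{i}", "m1") for i in range(1, 6)]
observations += [("p1", "m2"), ("p2", "m2"), ("p3", "m3")]
observations.append(("p1", "m1"))  # duplicate call, ignored by the set
counts = recurrence_counts(observations)
print(bin_recurrence(counts))
```

A fixed cutoff such as #>=5 is exactly what causes rare drivers to be missed, which is the limitation noted above.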

summary of driver discovery methods

  • For driver Genes:
    • A lack of "gold standard" driver genes
    • Unable to "discover" all known driver genes
  • For driver mutations:
    • lack of "gold standard"

Project

Mutation Analysis in Cancer

  • First step of driver discovery: understand them.

  • Goal: analyze cancer mutations and their correlation with DNA and protein structural features

  • We hypothesize that drivers are enriched among recurrent mutations; the next few slides test this hypothesis

Results

Distribution of mutations in different gene types
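GeneType     GeneNumber  #=1      1<#<5   #>=5   frac_ge5  Enrichment  p_value
Non_CG       14508       1000255  189042  10356  0.0086    1.0000      1
Putative_CG  1551        207992   49463   2565   0.0099    1.1427      2.03e-09
CG           609         56395    11506   4176   0.0579    6.7116      0
TSG          100         17298    5415    1382   0.0574    6.6442      0
oncogene     67          8027     2572    3422   0.2441    28.2725     0

# = recurrence count of a mutation; frac_ge5 = (#>=5) / (all mutations in the row); Enrichment = frac_ge5 relative to Non_CG. CG = cancer gene; TSG = tumor suppressor gene.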
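
The fraction of mutations with # >= 5 and the Enrichment column can be recomputed from the counts in the table. The sketch below does that and adds a 2x2 chi-square test against Non_CG; the chi-square test is an assumption for illustration, since the slides do not say which test produced the p_value column, so its output is not expected to match that column exactly.

```python
import math

# (#=1, 1<#<5, #>=5) mutation counts copied from the table above.
COUNTS = {
    "Non_CG": (1000255, 189042, 10356),
    "Putative_CG": (207992, 49463, 2565),
    "CG": (56395, 11506, 4176),
    "TSG": (17298, 5415, 1382),
    "oncogene": (8027, 2572, 3422),
}

def recurrent_fraction(counts):
    """Fraction of mutations in a row with recurrence >= 5."""
    return counts[2] / sum(counts)

def enrichment(gene_type, baseline="Non_CG"):
    """Recurrent fraction of a gene type relative to the baseline row."""
    return recurrent_fraction(COUNTS[gene_type]) / recurrent_fraction(COUNTS[baseline])

def chi_square_p(gene_type, baseline="Non_CG"):
    """P-value of a 2x2 chi-square test (1 degree of freedom, no correction)."""
    a = COUNTS[gene_type][2]               # recurrent, gene type
    b = sum(COUNTS[gene_type]) - a         # non-recurrent, gene type
    c = COUNTS[baseline][2]                # recurrent, baseline
    d = sum(COUNTS[baseline]) - c          # non-recurrent, baseline
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of chi-square with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))

for name in COUNTS:
    print(name, round(recurrent_fraction(COUNTS[name]), 4),
          round(enrichment(name), 4))
```

For the rows with very strong enrichment, the p-value computed this way underflows to 0.0 in floating point.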

mutation distribution in different genes


recurrent mutations have high functional impact

SIFT ~ recurrence


recurrent mutations disrupt conserved positions

DNA conservation ~ recurrence


summary and future work