Project

Mutation & Expression data preparation

Mutation files

  • The API provided by Firehose is convenient but has some limitations (mostly speed). Here’s a brief summary of what I’ve done:
    • Downloaded 32 cohorts; 5 of the cohorts listed on FireBrowse are pan-cancer lists (see data/tcga.cohorts.rda).
    • Combined them into one mutation MAF, onemaf, stored at aspen:~/aspen/dataset/onemaf.rds (see note 1).
      • Used data.table::rbindlist() to row-bind the tables; dplyr::bind_rows() raised an error (see Stack Overflow).
      • Added cohort info to each mutation.
      • Added the TCGA barcode for patient identification; this version is at
        ~/aspen/dataset/onemaf.minimaf.test.v1.0.rda
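A minimal sketch of the combine step, assuming each cohort’s MAF has already been read into a data.table and collected in a named list (the toy tables and barcodes below are made up). rbindlist() with idcol records the list names as a cohort column, and fill = TRUE pads columns that only some cohorts have with NA. The patient barcode is taken as the first 12 characters of Tumor_Sample_Barcode, the TCGA participant prefix (TCGA-XX-XXXX).

```r
library(data.table)

# Toy per-cohort MAF tables (hypothetical); in practice each would come from
# reading a downloaded MAF file, e.g. with data.table::fread().
maf_list <- list(
  BRCA = data.table(Hugo_Symbol = c("TP53", "PIK3CA"),
                    Tumor_Sample_Barcode = c("TCGA-XX-0001-01A-11D-A000-09",
                                             "TCGA-XX-0002-01A-11D-A000-09")),
  LUAD = data.table(Hugo_Symbol = "KRAS",
                    Tumor_Sample_Barcode = "TCGA-XX-0003-01A-11D-A000-08",
                    Protein_Change = "p.G12C")
)

# Match columns by name, fill columns missing in some cohorts with NA,
# and store each list element's name in a new "cohort" column.
onemaf <- rbindlist(maf_list, use.names = TRUE, fill = TRUE, idcol = "cohort")

# The first 12 characters of a TCGA aliquot barcode identify the patient.
onemaf[, patient := substr(Tumor_Sample_Barcode, 1, 12)]
```

The BRCA rows get NA in Protein_Change because that column only exists in the LUAD table.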

Expression files

  • Used FirebrowseR to download expression data. Two types are available:
    • Gene × patient values: ~200M records, ~10 GB of memory. Aborted; consider using it only when necessary.
    • Gene quartiles for each cohort, including normal and tumor samples. This seems useful; already combined into one table in large_db/oneexpr.rda
    • Scripts are in R/fh_*
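A sketch of combining per-cohort quartile tables and comparing tumor with normal medians. The column names (gene, sample_type, q1, median, q3) and the sample-type codes TP (primary tumor) and NT (solid tissue normal) are assumptions for illustration, not the exact schema returned by FirebrowseR; the numbers are toy values.

```r
library(data.table)

# Hypothetical per-cohort quartile tables (toy values, assumed column names).
quart_list <- list(
  BRCA = data.table(gene = c("TP53", "TP53"), sample_type = c("NT", "TP"),
                    q1 = c(10, 12), median = c(15, 20), q3 = c(22, 31)),
  LUAD = data.table(gene = c("TP53", "TP53"), sample_type = c("NT", "TP"),
                    q1 = c(9, 11), median = c(14, 13), q3 = c(19, 18))
)

# One table for all cohorts, with the list names as a "cohort" column.
oneexpr <- rbindlist(quart_list, use.names = TRUE, idcol = "cohort")

# One row per cohort and gene, one column per sample type, then TP - NT.
wide <- dcast(oneexpr, cohort + gene ~ sample_type, value.var = "median")
wide[, tumor_minus_normal := TP - NT]

# Saved to a temporary path here; the notes keep the real file in large_db/.
out <- file.path(tempdir(), "oneexpr.rda")
save(oneexpr, file = out)
```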

Analysis

Investigate cohort-level relationships

OK, we need to find out how many genes have recurrent mutations.
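One way to count them, as a sketch: treat a gene as recurrently mutated if it is mutated in at least two distinct patients. That threshold is an assumption, not a definition from these notes, and the table below is toy data with a 12-character patient column like the one added during preparation.

```r
# Toy MAF-like table (hypothetical values).
maf <- data.frame(
  Hugo_Symbol = c("TP53", "TP53", "TP53", "KRAS", "KRAS", "EGFR"),
  patient     = c("TCGA-XX-0001", "TCGA-XX-0002", "TCGA-XX-0002",
                  "TCGA-XX-0003", "TCGA-XX-0003", "TCGA-XX-0004"),
  stringsAsFactors = FALSE
)

# Count each patient at most once per gene, then count patients per gene.
gene_patient <- unique(maf[, c("Hugo_Symbol", "patient")])
n_patients <- table(gene_patient$Hugo_Symbol)

# Assumed threshold: mutated in >= 2 patients.
recurrent_genes <- names(n_patients)[n_patients >= 2]
length(recurrent_genes)
```

KRAS has two mutations here but both come from the same patient, so only TP53 counts as recurrent.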

Silent mutations

Nonsense mutations

Missense mutations
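For the silent, nonsense and missense sections, a sketch of splitting the combined MAF by its standard Variant_Classification values ("Silent", "Nonsense_Mutation", "Missense_Mutation"); the rows are toy data.

```r
# Toy MAF-like table (hypothetical values).
maf <- data.frame(
  Hugo_Symbol = c("TP53", "TP53", "KRAS", "APC", "TTN"),
  Variant_Classification = c("Missense_Mutation", "Nonsense_Mutation",
                             "Missense_Mutation", "Nonsense_Mutation", "Silent"),
  stringsAsFactors = FALSE
)

# MAF Variant_Classification values for the three classes of interest.
classes <- c(silent = "Silent", nonsense = "Nonsense_Mutation",
             missense = "Missense_Mutation")

# One subset per class, keeping the short names as list names.
by_class <- lapply(classes, function(cl) maf[maf$Variant_Classification == cl, ])
sapply(by_class, nrow)
# silent: 1, nonsense: 2, missense: 2
```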

Is this working? (Lawrence et al. 2013)

plotmutprofile

  • ggplot2 geom_tile() behavior: if two tiles have the same coordinates, the later one is drawn over the earlier one.
  • Two definitions of recurrence:
    • position: mutations at the same protein position are treated as one, e.g. p.F333A and p.F333[ACDEKEF].
    • exact: only exactly identical mutations count as the same.
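A sketch of both recurrence keys and of collapsing duplicate cells before plotting. It assumes protein changes are strings like p.F333A in a Protein_Change column; the "position" key keeps everything up to the residue number. Gene names, patients and mutations are toy values. Collapsing to one row per gene × patient avoids the geom_tile() overplotting noted above.

```r
# Toy mutation table (hypothetical values).
mut <- data.frame(
  Hugo_Symbol    = c("GENE1", "GENE1", "GENE1", "GENE2"),
  patient        = c("P1", "P2", "P2", "P3"),
  Protein_Change = c("p.F333A", "p.F333C", "p.R10*", "p.G12C"),
  stringsAsFactors = FALSE
)

# exact: the full protein change; position: drop the alternate residue.
mut$exact_key    <- paste(mut$Hugo_Symbol, mut$Protein_Change)
mut$position_key <- paste(mut$Hugo_Symbol,
                          sub("^(p\\.[A-Z*]+[0-9]+).*$", "\\1", mut$Protein_Change))

# Number of distinct patients per key.
count_patients <- function(key) tapply(mut$patient, key, function(p) length(unique(p)))
exact_n    <- count_patients(mut$exact_key)
position_n <- count_patients(mut$position_key)

# One row per gene x patient cell, joining multiple mutations into one label,
# so each tile passed to ggplot2::geom_tile() is unique.
tile_df <- aggregate(Protein_Change ~ Hugo_Symbol + patient, data = mut,
                     FUN = function(x) paste(x, collapse = ";"))
```

Under the exact definition nothing here is recurrent, while the position key groups p.F333A and p.F333C into "GENE1 p.F333" with two patients.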
Other news

How to write a book in R.

bookdown

Open preamble.tex and comment out \renewenvironment{Shaded}{\begin{kframe}}{\end{kframe}} (fixed in #issue6).
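The same edit can be scripted, as a sketch: the helper below (a hypothetical name, not part of bookdown) prefixes "% " to any uncommented line containing \renewenvironment{Shaded} and returns how many lines it changed. It is shown on a temporary file rather than the real preamble.tex.

```r
# Hypothetical helper: comment out the Shaded/kframe redefinition in a .tex file.
comment_out_shaded <- function(path) {
  lines <- readLines(path)
  # Literal match on the command, skipping lines that are already comments.
  target <- grepl("\\renewenvironment{Shaded}", lines, fixed = TRUE) &
            !grepl("^[[:space:]]*%", lines)
  lines[target] <- paste0("% ", lines[target])
  writeLines(lines, path)
  invisible(sum(target))
}

# Example on a temporary stand-in for preamble.tex.
preamble <- tempfile(fileext = ".tex")
writeLines(c("\\usepackage{booktabs}",
             "\\renewenvironment{Shaded}{\\begin{kframe}}{\\end{kframe}}"),
           preamble)
comment_out_shaded(preamble)
readLines(preamble)[2]
```

Running it a second time changes nothing, since the line already starts with "%".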

citation

  • citr + Zotero is sufficient
    • citr is an RStudio addin
    • it works while the standalone Zotero app is running
  • See the RStudio help page
    • basically:
      1. add bibliography: "/home/tc/GIT/tcgaMut/references.bib" to the YAML header
      2. add a "# References" heading at the end
      3. done
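A sketch of the resulting file, written from R so it can be checked: the bibliography path is the one from the steps above, while the title and the citation key @lawrence2013 are hypothetical (citr inserts keys of this [@key] form from Zotero). When rendered, Pandoc places the bibliography at the end of the document, i.e. under the final "# References" heading.

```r
# Minimal R Markdown file following the three steps above.
rmd <- c(
  "---",
  "title: \"Example\"",
  "bibliography: \"/home/tc/GIT/tcgaMut/references.bib\"",
  "---",
  "",
  "See [@lawrence2013].",
  "",
  "# References"
)

# Written to a temporary path; rmarkdown::render(rmd_path) would need the .bib to exist.
rmd_path <- file.path(tempdir(), "example.Rmd")
writeLines(rmd, rmd_path)
```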

RStudio doesn’t support fcitx?

  • Don’t worry, it’s fixed

Things that I write down so I don’t forget

Lawrence, Michael S., Petar Stojanov, Paz Polak, Gregory V. Kryukov, Kristian Cibulskis, Andrey Sivachenko, Scott L. Carter, et al. 2013. “Mutational Heterogeneity in Cancer and the Search for New Cancer-Associated Genes.” Nature 499 (7457): 214–18. doi:10.1038/nature12213.


  1. The onemaf file can only be processed on aspen.