
Mutation & Expression data preparation

Mutation files

  • The API provided by Firehose is convenient but has some limits (mostly speed). Here's a brief summary of what I've done:
    • Downloaded 32 cohorts; 5 of the cohorts listed on FireBrowse are pan-cancer lists (see data/tcga.cohorts.rda).
    • Combined them into one mutation MAF, onemaf, stored at aspen:~/aspen/dataset/onemaf.rds1
      • Used data.table::rbindlist() for the row bind; dplyr::bind_rows() caused an error (see Stack Overflow).
      • Added cohort information to each mutation.
      • Added the TCGA barcode for patient identification, version at
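A minimal sketch of the combining step above, assuming each cohort's MAF is already loaded as a data.table in a named list. The toy rows, barcodes, and the list name mafs are hypothetical; Hugo_Symbol, Tumor_Sample_Barcode, Variant_Classification and Protein_Change follow standard MAF column names. It shows how rbindlist() can add the cohort column and tolerate cohorts with different column sets, and how a patient ID can be cut from the barcode; it is not necessarily the exact code used to build onemaf.

```r
library(data.table)

# Hypothetical per-cohort MAF tables with made-up rows; in practice each would
# be read from a downloaded Firehose MAF file (e.g. with fread()).
mafs <- list(
  ACC = data.table(
    Hugo_Symbol = c("GENE_A", "GENE_B"),
    Tumor_Sample_Barcode = c("TCGA-XX-0001-01A-11D-A000-10",
                             "TCGA-XX-0002-01A-11D-A000-10"),
    Variant_Classification = c("Missense_Mutation", "Silent")
  ),
  BRCA = data.table(
    Hugo_Symbol = "GENE_A",
    Tumor_Sample_Barcode = "TCGA-YY-0003-01A-11D-A000-09",
    Variant_Classification = "Missense_Mutation",
    Protein_Change = "p.R10H"  # column present in this cohort only
  )
)

# idcol = "cohort" records which list element (cohort) each row came from;
# fill = TRUE pads columns missing from some cohorts with NA.
onemaf <- rbindlist(mafs, idcol = "cohort", fill = TRUE)

# The first 12 characters of a TCGA barcode (TCGA-XX-XXXX) identify the patient.
onemaf[, patient := substr(Tumor_Sample_Barcode, 1, 12)]
```

Using idcol avoids adding a cohort column to every table separately before binding.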

Expression files

  • Used FirebrowseR to download expression data. Two types are available:
    • Gene × patient: ~200M records, ~10 GB of memory. Aborted; use it only when necessary.
    • Gene quartiles for each cohort, including normal/tumor samples. This seems useful.
      • Already combined into one table in large_db/oneexpr.rda.
    • Scripts are in R/fh_*.
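A sketch of how per-cohort quartile tables might be stacked and saved as a single .rda, assuming they are already in a named list of data.tables. The column names (gene, sample_type, q1, median, q3), the TP/NT sample-type codes, and all values are made up for illustration; a temporary file stands in for large_db/oneexpr.rda.

```r
library(data.table)

# Hypothetical per-cohort quartile tables with made-up values. The column
# names are assumptions, not the exact FirebrowseR output.
quartiles <- list(
  BRCA = data.table(gene = "GENE_A", sample_type = c("TP", "NT"),
                    q1 = c(8, 6), median = c(11, 7), q3 = c(13, 8)),
  LUAD = data.table(gene = "GENE_A", sample_type = c("TP", "NT"),
                    q1 = c(1, 2), median = c(2, 3), q3 = c(4, 4))
)

# Stack the cohorts into one table, tagging each row with its cohort.
oneexpr <- rbindlist(quartiles, idcol = "cohort")

# Save as .rda; a temporary path stands in for large_db/oneexpr.rda here.
path <- file.path(tempdir(), "oneexpr.rda")
save(oneexpr, file = path)

# load() restores the object under its saved name; a fresh environment keeps
# it from overwriting anything in the current session.
env <- new.env()
load(path, envir = env)
```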


Investigate cohort-level relationships

OK, we need to find out how many genes have recurrent mutations.
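One way to count genes with recurrent mutations, sketched on a made-up stand-in for onemaf. It assumes a mutation is recurrent within a cohort when at least two distinct patients carry the same Hugo_Symbol + Protein_Change (the exact definition), and it splits the count by Variant_Classification so the silent, nonsense and missense classes can be looked at separately. The threshold of 2, the toy rows, and the presence of a Protein_Change column are assumptions.

```r
library(data.table)

# Toy stand-in for onemaf (made-up rows and patients).
maf <- data.table(
  cohort  = c("BRCA", "BRCA", "BRCA", "BRCA", "LUAD"),
  patient = c("P1", "P2", "P3", "P3", "P4"),
  Hugo_Symbol = c("GENE_A", "GENE_A", "GENE_B", "GENE_C", "GENE_A"),
  Protein_Change = c("p.R10H", "p.R10H", "p.Q5*", "p.L7L", "p.R10H"),
  Variant_Classification = c("Missense_Mutation", "Missense_Mutation",
                             "Nonsense_Mutation", "Silent", "Missense_Mutation")
)

# A mutation is recurrent within a cohort if >= 2 distinct patients carry it.
recurrent_muts <- maf[, .(n_patients = uniqueN(patient)),
                      by = .(cohort, Variant_Classification,
                             Hugo_Symbol, Protein_Change)][n_patients >= 2]

# Number of genes with at least one recurrent mutation, per cohort and class.
genes_per_class <- recurrent_muts[, .(n_genes = uniqueN(Hugo_Symbol)),
                                  by = .(cohort, Variant_Classification)]
```

Grouping by cohort keeps recurrence within a cohort; dropping cohort from the by clauses would count recurrence across the whole pan-cancer set instead.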

Silent mutations

Nonsense mutations

Missense mutations

Is this working? (Lawrence et al. 2013)


  • A ggplot2 geom_tile() behavior: if two tiles have the same coordinates, the later one replaces the earlier one.
  • Two recurrence definitions, exact and position:
    • position: e.g. p.F333A and p.F333[ACDEKEF] would be treated as one recurrence here.
    • exact: exactly the same mutation.
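A sketch of the two recurrence definitions on made-up rows: exact keys on the full protein change, while position strips the alternate residue so p.F333A and p.F333C share the key p.F333. The regular expression assumes one-letter p. notation of the form p.<ref><position><alt>; other forms (e.g. indels) may need their own handling. The names muts, exact_key and position_key are hypothetical.

```r
library(data.table)

# Hypothetical toy mutations (made-up patients), one row per mutation.
muts <- data.table(
  patient = c("P1", "P2", "P3", "P4"),
  gene    = c("GENE_A", "GENE_A", "GENE_A", "GENE_B"),
  protein_change = c("p.F333A", "p.F333A", "p.F333C", "p.R10H")
)

# exact: gene plus the full protein change string.
muts[, exact_key := paste(gene, protein_change)]

# position: keep only reference residue + position (p.F333A -> p.F333).
muts[, position_key := paste(gene,
                             sub("^(p\\.[A-Z]+[0-9]+).*$", "\\1", protein_change))]

# Recurrent = seen in at least 2 distinct patients under that definition.
exact_rec <- muts[, .(n_patients = uniqueN(patient)),
                  by = exact_key][n_patients >= 2]
position_rec <- muts[, .(n_patients = uniqueN(patient)),
                     by = position_key][n_patients >= 2]
```

Because each key ends up as a single row with a patient count, plotting these summaries with geom_tile() draws one tile per coordinate, instead of letting later rows at the same coordinates paint over earlier ones.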

Other news

How to write a book in R.


Open preamble.tex and comment out \renewenvironment{Shaded}{\begin{kframe}}{\end{kframe}} (fixed in #issue6).


  • citr + Zotero is sufficient:
    • it is an RStudio add-in
    • it works while the standalone Zotero is running
  • From the RStudio help page:
    • basically:
      1. add bibliography: "/home/tc/GIT/tcgaMut/references.bib" in the YAML header
      2. add a # Reference heading at the end
      3. done

RStudio doesn't support fcitx?

  • Don't worry, it's fixed.

Things that I write down so I don’t forget

  1. onemaf can only be processed on aspen.