\documentclass[10pt]{article}
\usepackage[USletter]{vmargin}
\setmargrb{0.75in}{0.75in}{0.75in}{0.75in}
\usepackage{amsmath}
\usepackage{float}
\usepackage{color}
\usepackage{amscd}
\usepackage[tableposition=top]{caption}
\usepackage{ifthen}
\usepackage[utf8]{inputenc}
\usepackage{hyperref}
%\VignetteIndexEntry{Using Canopy}
\begin{document}
\SweaveOpts{concordance=TRUE}
\title{Canopy vignette}
\author{Yuchao Jiang
    \\
    \href{mailto:yuchaoj@upenn.edu}{yuchaoj@upenn.edu}}
\maketitle
This is a demo for using the \verb@Canopy@ package in R. \verb@Canopy@ is a
statistical framework and computational procedure for identifying
subpopulations within a tumor, determining the mutation profiles of each
subpopulation, and inferring the tumor's phylogenetic history. The input to
\verb@Canopy@ are variant allele frequencies of somatic single nucleotide 
alterations (SNAs) along with allele-specific coverage ratios between the tumor
and matched normal sample for somatic copy number alterations (CNAs). These 
quantities can be directly taken from the output of existing software. 
\verb@Canopy@ provides a general mathematical framework for pooling data across
samples and sites to infer the underlying parameters. For SNAs that fall within
CNA regions, \verb@Canopy@ infers their temporal ordering and resolves their 
phase.  When there are multiple evolutionary configurations consistent with the
data, \verb@Canopy@ outputs all configurations along with their confidence.\\

Below is an example on reconstructing tumor phylogeny of a transplantable
metastasis model system derived from a heterogeneous human breast cancer
cell line MDA-MB-231. Cancer cells from the parental line MDA-MB-231 were
engrafted into mouse hosts leading to organ-specific metastasis. Mixed cell
populations (MCPs) were in vivo selected from either bone or lung metastasis
and grew into phenotypically stable and metastatically competent cancer cell
lines. The parental line as well as the MCP sublines were whole-exome sequenced 
with somatic SNAs and CNAs profiled. \verb@Canopy@ is used to infer metastatic
phylogeny. Code for analysis of this dataset is broken down below
with explanations and is further available
\href{https://github.com/yuchaojiang/Canopy/blob/master/demo_code/canopy_demo_MDA231.R}{\textcolor{blue}{here}}.\\

\verb@Canopy@'s \textbf{webpage} is \href{https://github.com/yuchaojiang/Canopy}{\textcolor{blue}{here}}. \textbf{Demo codes} for \verb@Canopy@ under various settings/modes can be found \href{https://github.com/yuchaojiang/Canopy/blob/master/demo_code}{\textcolor{blue}{here}}.
Script for dataset from the MDA231 study is attached below with step-by-step decomposition and explanation. Online \textbf{Q\&A forum} for
\verb@Canopy@ is available \href{https://groups.google.com/d/forum/canopy_phylogeny}{\textcolor{blue}{here}}.
If you've any questions regarding the software, you can also email us at
\href{mailto:canopy\_phylogeny@googlegroups.com}{\textcolor{blue}{canopy\_phylogeny@googlegroups.com}}.

\section*{1. Installation}

R package \verb$Canopy$ is availble from \verb@CRAN@ (\href{https://cran.r-project.org/web/packages/Canopy/index.html}{https://CRAN.R-project.org/package=Canopy}):
<<Installation1, eval=FALSE>>=
install.packages('Canopy')  # updated every 3-4 months
@
A devel version can be installed from GitHub ((\href{https://github.com/yuchaojiang/Canopy}{https://github.com/yuchaojiang/Canopy}):
<<Installation2, eval=FALSE>>=
install.packages("devtools")
library(devtools)
install_github("yuchaojiang/Canopy/package")  # STRONGLY recommended
@


\section*{2. Canopy workflow}
\subsection*{2.1 CNA and SNA input}
The input to \verb@Canopy@ are variant allele frequencies of somatic SNAs along
with allele-specific coverage ratios between the tumor and matched normal sample
for somatic CNAs. For SNAs, let the matrices $R$ and $X$ be, respectively,
the number of reads containing the mutant allele and the total number of reads
for each locus across all samples. The ratio $R/X$ is the proportion of reads
supporting the mutant allele, known as the variant allele frequency. For CNAs,
\verb@Canopy@ directly takes output from allele-specific copy number estimation
softwares, such as 
\href{https://CRAN.R-project.org/package=falconx}{FALCON-X} or
\href{https://CRAN.R-project.org/package=sequenza}{Sequenza}. These outputs are in the form of estimated major and minor copy number ratios, respectively denoted
by $W^M$ and $W^m$, with their corresponding standard errors $\epsilon^M$ and
$\epsilon^m$. Matrix $Y$ specifies whether SNAs are affected by CNAs; matrix $C$
specifies whether CNA regions harbor specific CNAs (this input is only needed if
overlapping CNA events are observed).
\\\\
How to generate CNA and SNA data input is futher discussed 
\href{https://github.com/yuchaojiang/Canopy/blob/master/instruction/SNA_CNA_input.md}{\textcolor{blue}{here}}. 
How to select \textit{informative} SNA and CNA input is discussed 
\href{https://github.com/yuchaojiang/Canopy/blob/master/instruction/SNA_CNA_choice.md}{\textcolor{blue}{here}}.
\\\\
Below is demo data input from project MDA231 (first case study in our paper).


<<Input>>=
library(Canopy)
data("MDA231")

projectname = MDA231$projectname ## name of project
R = MDA231$R; R ## mutant allele read depth (for SNAs)
X = MDA231$X; X ## total depth (for SNAs)
WM = MDA231$WM; WM ## observed major copy number (for CNA regions)
Wm = MDA231$Wm; Wm ## observed minor copy number (for CNA regions)
epsilonM = MDA231$epsilonM ## standard deviation of WM, pre-fixed here
epsilonm = MDA231$epsilonm ## standard deviation of Wm, pre-fixed here
## Matrix C specifices whether CNA regions harbor specific CNAs 
## only needed if overlapping CNAs are observed, specifying which CNAs overlap
C = MDA231$C; C
Y = MDA231$Y; Y ## whether SNAs are affected by CNAs
@

\subsection*{2.2 Binomial clustering of SNAs}

A multivariate binomial mixture clustering step can be applied to the SNAs before MCMC sampling. We show in our paper via simulations that this pre-clustering method helps the Markov chain converge faster with smaller estimation error (especially when mutations show clear cluster patterns by visualization). This clustering step can also remove likely false positives before feeding the mutations to the MCMC algorithm.
\\\\
Below is a toy example, where three bulk tumor samples were in silico simulated from a tree of 4 clones/leaves. The 5 tree segments (excluding the leftmost branch, which corresponds to the normal clone) separate 200 mutations into 5 mutation clusters. More detailed demo codes for clustering can be found \href{https://github.com/yuchaojiang/Canopy/blob/master/clustering/binomial_EM.R}{\textcolor{blue}{here}}. Detailed methods can be found in the \href{http://www.pnas.org/content/suppl/2016/08/26/1522203113.DCSupplemental/pnas.1522203113.sapp.pdf}{\textcolor{blue}{supplements}} of our paper under section Binomial mixture clustering. BIC is used for model selection. 2D (two longitudinal/spatial samples) or 3D (three samples) plots are generated for visualization.

<<SNA_clustering>>=
library(Canopy)
data(toy3)
R=toy3$R; X=toy3$X # 200 mutations across 3 samples
num_cluster=2:9 # Range of number of clusters to run
num_run=10 # How many EM runs per clustering step for each mutation cluster wave
canopy.cluster=canopy.cluster(R = R,
                              X = X,
                              num_cluster = num_cluster,
                              num_run = num_run)
bic_output=canopy.cluster$bic_output # BIC for model selection (# of clusters)
Mu=canopy.cluster$Mu # VAF centroid for each cluster
Tau=canopy.cluster$Tau  # Prior for mutation cluster, with a K+1 component
sna_cluster=canopy.cluster$sna_cluster # cluster identity for each mutation
@


\begin{figure}[H]
\begin{center}
\setkeys{Gin}{width=0.8\linewidth}
<<label=fig1,fig=TRUE,echo=FALSE,height=4,width=8>>=
par(mfrow=c(1,2))
plot(num_cluster,bic_output,xlab='Number of mutation clusters',ylab='BIC',type='b',main='BIC for model selection')
abline(v=num_cluster[which.max(bic_output)],lty=2)
colc=c('green4','red3','royalblue1','darkorange1','royalblue4',
       'mediumvioletred','seagreen4','olivedrab4','steelblue4','lavenderblush4')
pchc=c(17,0,1,15,3,16,4,8,2,16)
library(scatterplot3d)
scatterplot3d((R/X)[,1],(R/X)[,2],(R/X)[,3],xlim=c(0,max(R/X)),ylim=c(0,max(R/X)),zlim=c(0,max(R/X)),color=colc[sna_cluster],pch=pchc[sna_cluster],
              xlab='Sample1 VAF',ylab='Sample2 VAF',zlab='Sample3 VAF',
              main='VAF clustering across 3 samples')
par(mfrow=c(1,1))
@
\end{center}
\caption{BIC to select the optimal number of clusters and Binomial mixture 
clustering results.}
\label{fig:one}
\end{figure}


Below is real dataset from Ding et al. (Nature 2012), where a leukemia patient was sequenced at two timepoints -- primary tumor (sample 1) and relapse genome (sample 2). The real dataset is noisier and can potentially contain false positives for somatic mutations. We thus include in the mixture a multivariate uniform component on the unit interval, which corresponds to mutations that have high standard errors during sequencing or that are likely to be false positives. The code and SNA input for this dataset can be found \href{https://github.com/yuchaojiang/Canopy/blob/master/clustering/binomial_EM.R}{\textcolor{blue}{here}}.
<<SNA_clustering_AML43>>=
library(Canopy)
data(AML43)
R=AML43$R; X=AML43$X
num_cluster=4 # Range of number of clusters to run
num_run=6 # How many EM runs per clustering step for each mutation cluster wave
Tau_Kplus1=0.05 # Pre-specified proportion of noise component
Mu.init=cbind(c(0.01,0.15,0.25,0.45),
              c(0.2,0.2,0.01,0.2)) # Initial value for centroid
canopy.cluster=canopy.cluster(R = R,
                              X = X,
                              num_cluster = num_cluster,
                              num_run = num_run,
                              Mu.init = Mu.init,
                              Tau_Kplus1=Tau_Kplus1)
Mu=canopy.cluster$Mu # VAF centroid for each cluster
Tau=canopy.cluster$Tau  # Prior for mutation cluster, with a K+1 component
sna_cluster=canopy.cluster$sna_cluster # cluster identity for each mutation

R.qc=R[sna_cluster<=4,] # exclude mutations in the noise cluster
X.qc=X[sna_cluster<=4,]
sna_cluster.qc=sna_cluster[sna_cluster<=4]

R.cluster=round(Mu*100)  # Generate pseudo-SNAs correponding to each cluster. 
X.cluster=pmax(R.cluster,100)   # Total depth is set at 100 but can be obtained as median instead 
rownames(R.cluster)=rownames(X.cluster)=paste('SNA.cluster',1:4,sep='')
@

\begin{figure}[H]
\begin{center}
\setkeys{Gin}{width=0.5\linewidth}
<<label=fig2,fig=TRUE,echo=FALSE,height=4,width=4>>=
Mu=canopy.cluster$Mu # VAF centroid for each cluster
Tau=canopy.cluster$Tau  # Prior for mutation cluster, with a K+1 component
sna_cluster=canopy.cluster$sna_cluster # cluster identity for each mutation
colc=c('green4','red3','royalblue1','darkorange1','royalblue4',
       'mediumvioletred','seagreen4','olivedrab4','steelblue4','lavenderblush4')
pchc=c(17,0,1,15,3,16,4,8,2,16)
plot((R/X)[,1],(R/X)[,2],xlab='Sample1 VAF',ylab='Sample2 VAF',col=colc[sna_cluster],pch=pchc[sna_cluster],ylim=c(0,max(R/X)),xlim=c(0,max(R/X)))
@
\end{center}
\caption{Binomial mixture clustering on real dataset of primary tumor and relapse
genome.}
\label{fig:two}
\end{figure}


\subsection*{2.3 Phylogenetic tree (unknown)}
Each sampled tree is modeled as a list by \verb@Canopy@. Below are the tree 
elements of the most likely tree from the project MDA231 (first case study in
the paper). The most likely tree is obtained from the posterior distribution 
in the tree space from the MCMC sampling (detailed in section 2.3). How to
visualize/plot the sampled trees is discussed later.

<<Tree_elements1>>=
data('MDA231_tree')
MDA231_tree$Z # Z matrix specifies the position of the SNAs along the tree branch
MDA231_tree$cna.copy # major and minor copy number (interger values) for each CNA
MDA231_tree$CM # Major copy per clone for each CNA
MDA231_tree$Cm # Minor copy per clone for each CNA
MDA231_tree$Q # whether an SNA precedes a CNA
@

<<Tree_elements2>>=
MDA231_tree$H # if an SNA precedes a CNA, whether it resides in the major copy
MDA231_tree$P # clonal compostion for each sample
MDA231_tree$VAF # VAF based on current tree structure
@


\subsection*{2.4 MCMC sampling}
\verb@Canopy@ samples in subtree space with varying number of subclones 
(denoted as $K$) by a Markov chain Monte Carlo (MCMC) method. A plot of
posterior likelihood (pdf format) will be generated for each subtree space and
we recommend users to refer to the plot as a sanity check for sampling 
convergence and to choose the number of burn-ins and thinning accordingly. Note
that this step can be time-consuming, especially with larger number of chains
(\verb@numchain@ specifies the number of chains with random initiations, a 
larger value of which is in favor of not getting stuck in local optima) and 
longer chains (\verb@simrun@ specifies number of iterations per chain).
MCMC sampling is the most computationally heavy step in \verb@Canopy@. It is 
recommended that jobs are run in parallel on high-performance cluster.
\\\\
There are four modes of MCMC sampling embedded in Canopy: (1) \verb@canopy.sample@
which takes both SNA and CNA as input by default; (2) \verb@canopy.sample.nocna@ for cases where there is no CNA input; (3) \verb@canopy.sample.cluster@ for cases where SNAs are pre-clustered by the Binomial mixture EM algorithm; (4) \verb@canopy.sample.cluster.nocna@ for cases where there is no CNA input and SNAs are pre-clustered by the Binomial mixture EM algorithm. More details can be found \href{https://github.com/yuchaojiang/Canopy/blob/master/instruction/sampling_mode.md}{\textcolor{blue}{here}}.
\\\\
Below is sampling code for the MDA231 dataset where both SNA and CNA are used as input.

<<Sampling1, eval=FALSE>>=
K = 3:6 # number of subclones
numchain = 20 # number of chains with random initiations
sampchain = canopy.sample(R = R, X = X, WM = WM, Wm = Wm, epsilonM = epsilonM, 
            epsilonm = epsilonm, C = C, Y = Y, K = K, numchain = numchain, 
            max.simrun = 50000, min.simrun = 10000, 
            writeskip = 200, projectname = projectname, cell.line = TRUE, 
            plot.likelihood = TRUE)
save.image(file = paste(projectname, '_postmcmc_image.rda',sep=''),
           compress = 'xz')
@


<<Sampling2, echo=FALSE>>=
data("MDA231_sampchain")
sampchain = MDA231_sampchain
k = 3; K = 3:6
sampchaink = MDA231_sampchain[[which(K==k)]]
@


<<Sampling3>>=
length(sampchain) ## number of subtree spaces (K=3:6)
length(sampchain[[which(K==4)]]) ## number of chains for subtree space with 4 subclones
length(sampchain[[which(K==4)]][[1]]) ## number of posterior trees in each chain
@


\subsection*{2.5 BIC for model selection}
\verb@Canopy@ uses BIC as a model selection criterion to determine to optimal
number of subclones.
<<BIC>>=
burnin = 100
thin = 5 # If there is error in the bic and canopy.post step below, make sure
         # burnin and thinning parameters are wisely selected so that there are
         # posterior trees left.
bic = canopy.BIC(sampchain = sampchain, projectname = projectname, K = K,
               numchain = numchain, burnin = burnin, thin = thin, pdf = FALSE)
optK = K[which.max(bic)]
@

\begin{figure}[H]
\begin{center}
\setkeys{Gin}{width=.6\linewidth}
<<label=fig3,fig=TRUE,echo=FALSE,height=4,width=8>>=
# Note: this segment is soley for generating BIC plot in the vignettes.
# Use Canopy.BIC() with pdf = TRUE to generate this plot directly.
par(mfrow=c(1,2))
projectname = 'MDA231'
numchain = 20
clikelihood = matrix(nrow = numchain, ncol = length(sampchaink[[1]]), data = NA)
for(numi in 1:numchain){
  for(i in 1:ncol(clikelihood)){
    clikelihood[numi,i] = sampchaink[[numi]][[i]]$likelihood
  }
}
plot(1:ncol(clikelihood), clikelihood[1,], type='l', xlab = 'Iteration',
     ylab = 'Log-likelihood', col = 1, ylim = c(min(clikelihood), 
                                                max(clikelihood)))
for(numi in 2:numchain){
  points(1:ncol(clikelihood), clikelihood[numi,], type = 'l', col = numi)
}
title(main=paste('Posterior likelihood', k, 'clones', numchain,
            'chains'),cex=0.6)
plot(K, bic, xlab = 'Number of subclones', ylab = 'BIC', type = 'b', xaxt = "n")
axis(1, at = K)
abline(v = (3:6)[which.max(bic)], lty = 2)
title('BIC for model selection')
@
\end{center}
\caption{Posterior likelihood of MCMC (chains are colored differently) and BIC
as a model selection method.}
\label{fig:three}
\end{figure}


\subsection*{2.6 Posterior evaluation of sampled trees}

\verb@Canopy@ then runs a posterior evaluation of all sampled trees by MCMC. If
modes of posterior probabilities (second column of \verb@config.summary@)
aren't obvious, check if the algorithm has converged (and run sampling longer
if not).


<<Post>>=
post = canopy.post(sampchain = sampchain, projectname = projectname, K = K,
                 numchain = numchain, burnin = burnin, thin = thin, optK = optK,
                 C = C, post.config.cutoff = 0.05)
samptreethin = post[[1]]   # list of all post-burnin and thinning trees
samptreethin.lik = post[[2]]   # likelihoods of trees in samptree
config = post[[3]] # configuration for each posterior tree
config.summary = post[[4]] # configuration summary
print(config.summary)
# first column: tree configuration
# second column: posterior configuration probability in the entire tree space
# third column: posterior configuration likelihood in the subtree space
@


\subsection*{2.7 Tree output and plotting}

One can then use \verb@Canopy@ to output and plot the most likely tree (i.e., 
tree with the highest posterior likelihood). Mutations, clonal frequencies, and
tree topology, etc., of the tree are obtained from the posterior distributions
of subtree space with trees having the same configuration. In our MDA231 
example, the most likely tree is the tree with configuration 3.
\\\\
\textbf{Note}: A separate txt file can be generated (with txt=TRUE and txt.name='*.txt') if the figure legend of mutational profiles (texts below the phylogenetic tree) in the plot is too long to be fitted entirely.
\\\\
<<Plot>>=
config.i = config.summary[which.max(config.summary[,3]),1]
cat('Configuration', config.i, 'has the highest posterior likelihood!\n')
# plot the most likely tree in the posterior tree space
output.tree = canopy.output(post, config.i, C)
canopy.plottree(output.tree)

# plot the tree with configuration 1 in the posterior tree space
output.tree = canopy.output(post, 1, C)
canopy.plottree(output.tree, pdf=TRUE, pdf.name = 
                    paste(projectname,'_first_config.pdf',sep=''))
@


\begin{figure}[H]
\begin{center}
\setkeys{Gin}{width=0.7\linewidth}
<<label=fig4,fig=TRUE,echo=FALSE,height=6,width=5.9>>=
canopy.plottree(output.tree)
@
\end{center}
\caption{Most likely tree by Canopy for project MDA231.}
\label{fig:four}
\end{figure}
\newpage

\section*{3. Try it yourself}
Now try Canopy yourself using the simulated toy dataset below! Note that no
overlapping CNAs are used as input and thus matrix $C$ doesn't need to be
specified.
<<Try it your self, eval = FALSE>>=
library(Canopy)
data(toy)
projectname = 'toy'
R = toy$R; X = toy$X; WM = toy$WM; Wm = toy$Wm
epsilonM = toy$epsilonM; epsilonm = toy$epsilonm; Y = toy$Y

K = 3:6; numchain = 10
sampchain = canopy.sample(R = R, X = X, WM = WM, Wm = Wm, epsilonM = epsilonM, 
                          epsilonm = epsilonm, C = NULL, Y = Y, K = K, 
                          numchain = numchain, simrun = 50000, writeskip = 200,
                          projectname = projectname, cell.line = FALSE,
                          plot.likelihood = TRUE)
@
The most likely tree is shown below. There should be only one tree configuration 
from the posterior tree space. The code for this toy dataset analysis 
can be found \href{https://github.com/yuchaojiang/Canopy/blob/master/demo_code/canopy_demo_toy.R}{\textcolor{blue}{here}}. 

\begin{figure}[H]
\begin{center}
\setkeys{Gin}{width=0.65\linewidth}
<<label=fig5,fig=TRUE,echo=FALSE,height=6,width=5.9>>=
data(toy)
canopy.plottree(toy$besttree, txt = FALSE, pdf = FALSE)
@
\end{center}
\caption{Most likely tree by Canopy for simulated toy dataset.}
\label{fig:five}
\end{figure}
\newpage
The second toy example has a different tree topology. Feel free to try Canopy 
on this dataset too! There should be also just one tree configuration as is 
shown below from the posterior tree space.  The code for this toy dataset 
analysis can be found \href{https://github.com/yuchaojiang/Canopy/blob/master/demo_code/canopy_demo_toy2.R}{\textcolor{blue}{here}}. 

<<Try it your self2, eval = FALSE>>=
library(Canopy)
data(toy2)
projectname = 'toy2'
R = toy2$R; X = toy2$X; WM = toy2$WM; Wm = toy2$Wm
epsilonM = toy2$epsilonM; epsilonm = toy2$epsilonm; Y = toy2$Y
true.tree = toy2$true.tree  # true underlying tree
K = 3:6; numchain = 15
sampchain = canopy.sample(R = R, X = X, WM = WM, Wm = Wm, epsilonM = epsilonM, 
                          epsilonm = epsilonm, C = NULL, Y = Y, K = K, 
                          numchain = numchain, max.simrun = 100000,
                          min.simrun = 10000, writeskip = 200,
                          projectname = projectname, cell.line = FALSE,
                          plot.likelihood = TRUE)
@

\begin{figure}[H]
\begin{center}
\setkeys{Gin}{width=0.7\linewidth}
<<label=fig6,fig=TRUE,echo=FALSE,height=6,width=5.9>>=
data(toy2)
canopy.plottree(toy2$true.tree, txt = FALSE, pdf = FALSE)
@
\end{center}
\caption{Most likely tree by Canopy for simulated toy dataset 2.}
\label{fig:six}
\end{figure}


The third toy example consists of three bulk tumor samples, in silico simulated from a tree of 4 clones/leaves. The 5 tree segments (excluding the leftmost branch, which corresponds to the normal clone) separate 200 mutations into 5 mutation clusters. The SNA clustering details are outlined in section ``Binomial clustering of SNAs". The code for this toy dataset analysis is briefly attached below. Code in more details with visualization and posterior analysis can be found \href{https://github.com/yuchaojiang/Canopy/blob/master/demo_code/canopy_demo_toy3.R}{\textcolor{blue}{here}}. 

<<Try it your self3, eval = FALSE>>=
library(Canopy)
data(toy3)
R=toy3$R; X=toy3$X
num_cluster=2:9 # Range of number of clusters to run
num_run=10 # How many EM runs per clustering step for each mutation cluster wave
canopy.cluster=canopy.cluster(R = R,
                              X = X,
                              num_cluster = num_cluster,
                              num_run = num_run)

bic_output=canopy.cluster$bic_output # BIC for model selection (# of clusters)
Mu=canopy.cluster$Mu # VAF centroid for each cluster
Tau=canopy.cluster$Tau  # Prior for mutation cluster, with a K+1 component
sna_cluster=canopy.cluster$sna_cluster # cluster identity for each mutation

projectname='toy3'
K = 3:5 # number of subclones
numchain = 15 # number of chains with random initiations
sampchain = canopy.sample.cluster.nocna(R = R, X = X, sna_cluster = sna_cluster,
                                        K = K, numchain = numchain, 
                                        max.simrun = 100000, min.simrun = 20000,
                                        writeskip = 200, projectname = projectname,
                                        cell.line = FALSE, plot.likelihood = TRUE)
save.image(file = paste(projectname, '_postmcmc_image.rda',sep=''),
           compress = 'xz')
@

\section*{4. Citation}
Assessing intra-tumor heterogeneity and tracking longitudinal and spatial clonal
evolutionary history by next-generation sequencing,
Yuchao Jiang, Yu Qiu, Andy J Minn, Nancy R zhang, Proceedings of the National Academy of Sciences, 2016. (\href{http://www.pnas.org/content/113/37/E5528}{html}, \href{http://www.pnas.org/content/113/37/E5528.full.pdf}{pdf})


\section*{5. Session information:}
Output of sessionInfo on the system on which this document was compiled:

<<sessionInfo, results=tex, echo=FALSE>>=
toLatex(sessionInfo())
@

\end{document}