Description
Reproducibility of research is a common concern in science, especially
in computationally expensive fields such as cancer research.
A comprehensive picture of the genomic aberrations that occur during
tumour progression, and of the resulting intra-tumour heterogeneity, is
essential for personalised and precise cancer therapies. As the tumour
environment changes under treatment, heterogeneity gives the tumour
additional ways to evolve resistance, so intra-tumour genomic
diversity is a cause of relapse and treatment failure. Earlier bulk
sequencing technologies could not resolve this diversity within a
tumour.
Single-cell DNA sequencing, a recent sequencing technology, offers
resolution down to the level of individual cells and is playing an
increasingly important role in this field.
We present a reproducible and scalable Python data analysis pipeline
that employs a statistical model and a Markov chain Monte Carlo (MCMC)
algorithm to infer the evolutionary history of copy number alterations
of a tumour from single cells. The pipeline is built using Python, the
Conda environment management system and the Snakemake workflow
management system. It starts from the raw sequencing files and a
settings file for parameter configuration. After running the data
analysis, the pipeline produces a report and figures to inform the
treatment decision for the cancer patient.
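
To give a feel for the kind of MCMC inference mentioned above, here is a minimal sketch of a Metropolis sampler on a deliberately simplified problem: estimating a single integer copy number from per-bin read counts. It is not the pipeline's statistical model and does not infer an evolutionary history. The Poisson read-count model, the uniform prior, and all names (`metropolis_copy_number`, `reads_per_copy`, `max_cn`) are assumptions made for illustration only.

```python
import math
import random


def log_poisson(count, rate):
    """Log-probability of observing `count` under a Poisson with mean `rate`."""
    return count * math.log(rate) - rate - math.lgamma(count + 1)


def log_likelihood(copy_number, read_counts, reads_per_copy):
    """Toy model (assumption): each bin's count is Poisson(reads_per_copy * copy_number)."""
    rate = reads_per_copy * copy_number
    return sum(log_poisson(c, rate) for c in read_counts)


def metropolis_copy_number(read_counts, reads_per_copy,
                           n_iter=5000, max_cn=8, seed=0):
    """Sample integer copy numbers in 1..max_cn under a uniform prior (assumption)."""
    rng = random.Random(seed)  # fixed seed so reruns give the same chain
    current = rng.randint(1, max_cn)
    current_ll = log_likelihood(current, read_counts, reads_per_copy)
    samples = []
    for _ in range(n_iter):
        proposal = current + rng.choice((-1, 1))  # symmetric +/-1 proposal
        if 1 <= proposal <= max_cn:
            proposal_ll = log_likelihood(proposal, read_counts, reads_per_copy)
            # Metropolis rule: accept with probability min(1, likelihood ratio)
            accept_prob = math.exp(min(0.0, proposal_ll - current_ll))
            if rng.random() < accept_prob:
                current, current_ll = proposal, proposal_ll
        # Out-of-range or rejected proposals leave the chain where it is
        samples.append(current)
    return samples


if __name__ == "__main__":
    from collections import Counter

    # Hypothetical read counts for five bins at roughly 50 reads per copy
    counts = [148, 155, 160, 151, 149]
    chain = metropolis_copy_number(counts, reads_per_copy=50)
    # Discard the first 1000 iterations as burn-in and summarise the rest
    print(Counter(chain[1000:]).most_common(3))
```

Fixing the random seed keeps the run repeatable, in line with the reproducibility goal above. A real analysis would sample a far larger state space and would need convergence diagnostics.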