Evaluating Assemblies

In 2018, I published a study in which I used high-throughput Illumina sequencing to produce a de novo assembly of the chloroplast genome of the alga Chlamydomonas reinhardtii. Two earlier versions of the Chlamydomonas chloroplast genome, both generated from Sanger sequencing, were publicly available. During my research, the earlier versions appeared to contain many errors, but how could I demonstrate that?

To this end, I developed a suite of tools called evaluatingAssemblies to quantitatively compare the quality of different assemblies. The tools in this suite take Illumina data that has been aligned to each assembly and grade the alignment using three criteria:

  1. the error (mismatch) frequency
  2. uniformity of depth of coverage
  3. uniformity of insert length for paired-end data
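
As a rough illustration of these three criteria (not the actual evaluatingAssemblies code), the sketch below summarizes each one as a single number in Python. The function names, the choice of the coefficient of variation as the uniformity measure, and the toy input values are all assumptions made for this example; real inputs would come from the read alignments.

```python
from statistics import mean, pstdev


def mismatch_frequency(mismatches, aligned_bases):
    """Fraction of aligned bases that disagree with the assembly."""
    if aligned_bases == 0:
        raise ValueError("no aligned bases")
    return mismatches / aligned_bases


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean.

    Used here as an assumed uniformity measure: lower means more uniform.
    """
    mu = mean(values)
    if mu == 0:
        raise ValueError("mean is zero")
    return pstdev(values) / mu


def grade_assembly(mismatches, aligned_bases, depths, insert_lengths):
    """Summarize the three criteria for one assembly (hypothetical helper)."""
    return {
        "mismatch_frequency": mismatch_frequency(mismatches, aligned_bases),
        "depth_cv": coefficient_of_variation(depths),
        "insert_cv": coefficient_of_variation(insert_lengths),
    }


if __name__ == "__main__":
    # Toy, made-up numbers purely for illustration.
    depths = [98, 102, 100, 97, 103]        # per-position depth of coverage
    inserts = [295, 305, 300, 310, 290]     # paired-end insert lengths
    print(grade_assembly(12, 50_000, depths, inserts))
```

Under these assumptions, the assembly with the lower value on each measure would be the better supported of the two by the same set of reads.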

The evaluatingAssemblies suite is available from my Bitbucket repository.