
In 2018, I published work in which I used high-throughput Illumina sequencing to produce a de novo assembly of the chloroplast genome of the alga Chlamydomonas reinhardtii. Two earlier versions of the Chlamydomonas chloroplast genome, both generated by Sanger sequencing, were publicly available. During my research, the earlier versions appeared to contain many errors, but how could I demonstrate that?
To this end, I developed a suite of tools called evaluatingAssemblies to quantitatively compare the quality of different assemblies. The tools in this suite take Illumina data that has been aligned to each assembly and grade the alignment using three criteria:
- the error (mismatch) frequency
- uniformity of depth of coverage
- uniformity of insert length for paired-end data
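To make the three criteria concrete, here is a minimal Python sketch of how such scores could be computed once the per-alignment numbers have been extracted. It is not the code from evaluatingAssemblies: the function names are hypothetical, the input values are made-up toy data, and using the coefficient of variation as the measure of uniformity is an assumption chosen for illustration.

```python
from statistics import mean, pstdev


def mismatch_frequency(n_mismatches, n_aligned_bases):
    """Fraction of aligned bases that disagree with the assembly."""
    if n_aligned_bases == 0:
        raise ValueError("no aligned bases")
    return n_mismatches / n_aligned_bases


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean.

    Used here as an assumed measure of uniformity: lower is more uniform.
    """
    values = list(values)
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero, so the ratio is undefined")
    return pstdev(values) / m


def grade_alignment(n_mismatches, n_aligned_bases, depths, insert_lengths):
    """Collect the three scores for the reads aligned to one assembly."""
    return {
        "mismatch_frequency": mismatch_frequency(n_mismatches, n_aligned_bases),
        "depth_cv": coefficient_of_variation(depths),
        "insert_length_cv": coefficient_of_variation(insert_lengths),
    }


# Toy numbers only: the same reads aligned to two hypothetical assemblies.
older = grade_alignment(120, 100_000, [30, 5, 60, 2, 45], [300, 150, 900, 310, 40])
newer = grade_alignment(15, 100_000, [30, 31, 29, 30, 30], [300, 305, 295, 300, 300])

for name, scores in (("older", older), ("newer", newer)):
    print(name, {key: round(value, 5) for key, value in scores.items()})
```

In this sketch a lower value is better for all three scores, so the assembly that yields fewer mismatches, more even depth, and more consistent insert lengths would be judged the more accurate one.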
The evaluatingAssemblies suite is available from my Bitbucket repository.
