The archives CORE_1.tar.gz and CORE_2.tar.gz contain the simulated 454
pyrosequencing 16S rDNA datasets used in:

May, A., Abeln, S., Crielaard, W., Heringa, J., Brandt, B. W. (2014).
Unravelling the outcome of 16S rDNA-based taxonomy analysis through mock
data and simulations. Bioinformatics, xx(x), xxx-xxx

If you use the contents of these archives, please cite the article above.

To extract, use:

    tar -zxvf CORE_1.tar.gz
    tar -zxvf CORE_2.tar.gz

Directory structure and contents:

* CORE_1 (datasets generated by using the CORE_1 reference database)
  - 1 (dataset 1)
    -- reads.sff.gz: The simulated flowgram file.
    -- NC.fasta: Demultiplexed reads with no cleaning (i.e. no chimera
       checking or denoising), only quality checking (i.e. checks on base
       quality scores, number of ambiguous bases, etc.).
    -- reference.fasta: Error-free reference sequences of the NC.fasta
       reads, i.e. without any sequencing/PCR errors or chimeras.
    -- D.fasta: NC.fasta reads after denoising.
    -- CC.fasta: NC.fasta reads after chimera checking.
    -- DCC.fasta: NC.fasta reads after denoising followed by chimera
       checking.
    -- CCD.fasta: NC.fasta reads after chimera checking followed by
       denoising.
    -- taxPerRead_species_labels.txt: The species-level taxonomy of reads
       in NC.fasta (sample_read_id** -> taxonomy_species).
    -- taxPerRead_genus_labels.txt: The genus-level taxonomy of reads in
       NC.fasta (sample_read_id -> taxonomy_genus).
    -- taxPerRead_SFF_species_labels.txt: The species-level taxonomy of
       reads in reads.sff (sff_readid*** -> taxonomy_species).
    -- taxPerRead_SFF_genus_labels.txt: The genus-level taxonomy of reads
       in reads.sff (sff_readid -> taxonomy_genus).
  - 2 (dataset 2)
    -- Same as in dataset 1.
  - ...
  - ...
  - 50 (dataset 50)

* CORE_2 (datasets generated by using the CORE_2 reference database)
    -- The same directory/file structure as in CORE_1.

**  sample_read_id is the unique identifier the reads acquire after
    demultiplexing with QIIME's split_libraries.py.
*** sff_readid is the unique identifier of reads coming from the Grinder &
    CHSIM fasta files.

------
e.g.
>BigSimMock_11 22297 orig_bc=TGAGCTAGAG new_bc=TGAGCTAGAG bc_diffs=0
CACGCTGTAAACGTTGGGCACTA...

sff_readid: 22297
sample_read_id: BigSimMock_11
------

For details of the CORE_1 & CORE_2 databases, the simulations and the
detailed methods, see the above article.

Contact: a.may@vu.nl & b.brandt@acta.nl
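
The relation between the two read identifiers described above can be sketched in a few lines of Python. This is only an illustration, not part of the published pipeline: it assumes that every header in the demultiplexed FASTA files follows the layout of the example above (sample_read_id first, sff_readid second, separated by whitespace). The function names parse_header and iter_read_ids are hypothetical helpers introduced here.

```python
from typing import Iterable, Iterator, Tuple


def parse_header(header: str) -> Tuple[str, str]:
    """Return (sample_read_id, sff_readid) from one FASTA header line.

    Assumption: the header looks like the README example, i.e.
    '>sample_read_id sff_readid orig_bc=... new_bc=... bc_diffs=...'.
    """
    fields = header.strip().lstrip(">").split()
    if len(fields) < 2:
        raise ValueError(f"unexpected header layout: {header!r}")
    return fields[0], fields[1]


def iter_read_ids(fasta_lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (sample_read_id, sff_readid) for every header in a FASTA stream."""
    for line in fasta_lines:
        if line.startswith(">"):
            yield parse_header(line)


# The example record from this README (the sequence is truncated there too).
example = [
    ">BigSimMock_11 22297 orig_bc=TGAGCTAGAG new_bc=TGAGCTAGAG bc_diffs=0",
    "CACGCTGTAAACGTTGGGCACTA...",
]

# Map each sample_read_id to its sff_readid.
id_map = dict(iter_read_ids(example))
print(id_map)  # {'BigSimMock_11': '22297'}
```

To apply the same idea to a real file, pass an open file handle instead of the example list, for instance the NC.fasta of dataset 1 (the path CORE_1/1/NC.fasta is inferred from the directory structure above). The resulting mapping could then be used to relate entries in the taxPerRead_*_labels.txt files to those in the taxPerRead_SFF_*_labels.txt files.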