The archives CORE_1.tar.gz and CORE_2.tar.gz contain the simulated 454 pyrosequencing 16S rDNA datasets used in:

May, A., Abeln, S., Crielaard, W., Heringa, J., Brandt, B. W. (2014). 
Unravelling the outcome of 16S rDNA-based taxonomy analysis through mock data and simulations. 
Bioinformatics, xx(x), xxx-xxx


If you use the contents of these archives, please cite the article above.

To extract, use:
tar -zxvf CORE_1.tar.gz
tar -zxvf CORE_2.tar.gz

Directory structure and contents:
* CORE_1 (datasets generated by using the CORE_1 reference database)
 - 1 (dataset 1)
  -- reads.sff.gz: The simulated flowgram file.
  -- NC.fasta: Demultiplexed reads with no cleaning (i.e. no chimera checking or denoising); only quality checking was applied (i.e. checks on base quality scores, number of ambiguous bases, etc.).
  -- reference.fasta: Error-free reference sequences of the NC.fasta reads, i.e. without any sequencing/PCR errors or chimeras.
  -- D.fasta: NC.fasta reads after denoising.
  -- CC.fasta: NC.fasta reads after chimera checking.
  -- DCC.fasta: NC.fasta reads after denoising and chimera checking.
  -- CCD.fasta: NC.fasta reads after chimera checking and denoising.
  -- taxPerRead_species_labels.txt: The species level taxonomy of reads in NC.fasta. (sample_read_id** -> taxonomy_species)
  -- taxPerRead_genus_labels.txt: The genus level taxonomy of reads in NC.fasta. (sample_read_id -> taxonomy_genus)
  -- taxPerRead_SFF_species_labels.txt: The species level taxonomy of reads in reads.sff (sff_readid*** -> taxonomy_species).
  -- taxPerRead_SFF_genus_labels.txt: The genus level taxonomy of reads in reads.sff (sff_readid -> taxonomy_genus).
 - 2 (dataset 2)
  -- Same as in dataset 1.
 - ...
 - ...
 - 50 (dataset 50)

* CORE_2 (datasets generated by using the CORE_2 reference database)
 -- The same directory/file structure as in CORE_1. 
 
** sample_read_id is the unique identifier the reads acquire after demultiplexing with QIIME's split_libraries.py.
*** sff_readid is the unique identifier of reads coming from Grinder & CHSIM fasta files.

------
e.g., for the following demultiplexed read:

>BigSimMock_11 22297 orig_bc=TGAGCTAGAG new_bc=TGAGCTAGAG bc_diffs=0
CACGCTGTAAACGTTGGGCACTA...

sff_readid: 22297
sample_read_id: BigSimMock_11
------
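
The two identifiers in the example above can be pulled out of a FASTA header programmatically. The following is a minimal Python sketch, not part of the original pipeline. It assumes every header follows the layout of the single example shown here (sample_read_id first, sff_readid second, then key=value fields); the function name parse_header is hypothetical.

```python
import re

# Header layout assumed from the single example in this README:
# ">" + sample_read_id, then sff_readid, then optional key=value fields.
HEADER_RE = re.compile(
    r"^>(?P<sample_read_id>\S+)\s+(?P<sff_readid>\S+)(?P<rest>.*)$"
)


def parse_header(line):
    """Split a demultiplexed FASTA header into its two read identifiers."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"Unrecognised FASTA header: {line!r}")
    # Remaining tokens such as orig_bc=... are kept as a plain dict of strings.
    fields = dict(
        item.split("=", 1) for item in match.group("rest").split() if "=" in item
    )
    return {
        "sample_read_id": match.group("sample_read_id"),
        "sff_readid": match.group("sff_readid"),
        "fields": fields,
    }


header = ">BigSimMock_11 22297 orig_bc=TGAGCTAGAG new_bc=TGAGCTAGAG bc_diffs=0"
ids = parse_header(header)
print(ids["sample_read_id"], ids["sff_readid"])
```

The sff_readid obtained this way is the key into the taxPerRead_SFF_* label files, while the sample_read_id is the key into the other taxPerRead_* label files.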

For details of the CORE_1 and CORE_2 databases, the simulations and the methods, see the article above.
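
After extraction, the directory layout listed above can be checked with a short script. This is a rough Python sketch under two assumptions: that the archives unpack into CORE_1/ and CORE_2/ in the current directory, and that datasets are numbered 1 to 50 without gaps. The helper names missing_files and count_reads are hypothetical.

```python
from pathlib import Path

# File names copied from the directory listing in this README.
EXPECTED_FILES = [
    "reads.sff.gz", "NC.fasta", "reference.fasta", "D.fasta", "CC.fasta",
    "DCC.fasta", "CCD.fasta",
    "taxPerRead_species_labels.txt", "taxPerRead_genus_labels.txt",
    "taxPerRead_SFF_species_labels.txt", "taxPerRead_SFF_genus_labels.txt",
]


def missing_files(root, n_datasets=50):
    """Return the listed files that are absent under root (e.g. 'CORE_1')."""
    root = Path(root)
    missing = []
    for number in range(1, n_datasets + 1):
        for name in EXPECTED_FILES:
            path = root / str(number) / name
            if not path.is_file():
                missing.append(path)
    return missing


def count_reads(fasta_path):
    """Count FASTA records by counting header lines that start with '>'."""
    with open(fasta_path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


if __name__ == "__main__":
    # Assumed extraction location: the current working directory.
    for core in ("CORE_1", "CORE_2"):
        absent = missing_files(core)
        print(f"{core}: {len(absent)} expected files missing")
```

count_reads can be applied to, for instance, NC.fasta and CC.fasta of the same dataset to see how many reads each file holds.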

Contact: a.may@vu.nl & b.brandt@acta.nl