----------------------------
A ROUGH GUIDE TO PRC OPTIONS
----------------------------

Library versus pairwise runs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The main PRC binary has two modes, "pairwise" and "library". In the pairwise mode, PRC scores a model against another model and simply prints out the results. In the library mode, it runs a model against a library of models and saves the output to output.scores (and, if requested, the alignments to output.aligns).

PRC assumes that each model file contains exactly one model, and that the library file lists model files (using absolute paths), one filename per line.

-algo <> : forward (default), viterbi
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Forward means, roughly, "sum across all possible alignments". Viterbi means, roughly, "find the best possible alignment". Forward is generally considered better because it can detect similarity even when the exact alignment is unclear.

For Forward, the alignments (as opposed to the scores) are calculated using the Maximum Alignment Accuracy algorithm introduced by Holmes and Durbin. This is about 3x slower than calculating the single best alignment using Viterbi, but there is now empirical evidence (T-COFFEE, PROBCONS) that the extra cost is worthwhile.

-MMfn <> : match-match scoring function; dot1, dot2 (default)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Dot1 was the function used by PRC up to 1.5.0. Dot2, suggested by Johannes Soeding, appears to perform marginally better and has been the default since PRC 1.5.1.

-mode <> : local-local (default), local-global, etc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Alignment mode. Local is Smith-Waterman ("the alignment can start and end anywhere"); global is Needleman-Wunsch ("the alignment starts at the start of the model and ends at its end"). Local-global means "local to HMM1, global to HMM2".
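The -mode distinction is easiest to see on plain strings. The Python sketch below is a toy aligner, not PRC's HMM-HMM scoring: the function name align_score and its scores (match=2, mismatch=-1, linear gap=-2) are illustrative assumptions. The first string plays the role of HMM1 and the second of HMM2, mirroring how PRC names its modes. "Local" lets the alignment start and end anywhere in that input; "global" forces it to cover the whole input.

```python
def align_score(a: str, b: str, mode: str = "local-local",
                match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Toy (hypothetical) alignment score of string a against string b.

    mode is "X-Y": X applies to a (like HMM1), Y applies to b (like HMM2).
    """
    mode_a, mode_b = mode.split("-")
    if mode_a not in ("local", "global") or mode_b not in ("local", "global"):
        raise ValueError(f"unknown mode: {mode}")
    local_a, local_b = mode_a == "local", mode_b == "local"
    n, m = len(a), len(b)

    # h[i][j]: best score of an alignment ending just after a[:i] and b[:j]
    h = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        # Skipping a prefix of a is free only when a is local
        h[i][0] = 0 if local_a else i * gap
    for j in range(1, m + 1):
        # Skipping a prefix of b is free only when b is local
        h[0][j] = 0 if local_b else j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            best = max(h[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                       h[i - 1][j] + gap,     # a[i-1] against a gap
                       h[i][j - 1] + gap)     # b[j-1] against a gap
            if local_a and local_b:
                best = max(best, 0)           # Smith-Waterman: restart anywhere
            h[i][j] = best

    # Where the alignment may end depends on which inputs are local
    if local_a and local_b:
        return max(max(row) for row in h)
    if local_a:                               # must reach the end of b
        return max(h[i][m] for i in range(n + 1))
    if local_b:                               # must reach the end of a
        return max(h[n][j] for j in range(m + 1))
    return h[n][m]                            # Needleman-Wunsch


if __name__ == "__main__":
    for mode in ("local-local", "local-global", "global-local", "global-global"):
        print(mode, align_score("GGACGT", "ACGTCC", mode))
```

Run as a script, it prints 8 for local-local (only the shared ACGT is aligned), 4 for local-global (the CC flank of the global second input must be gapped), 4 for global-local (the GG flank of the global first input must be gapped), and 0 for global-global (both flanks cost gap penalties).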
-align <> : alignment style; none (default), prc, sam1, sam2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

By default, no alignments are printed out or saved (this corresponds to the option 'none'). For the remaining three options, the correspondence between what is printed out and the internal PRC states is as follows:

         MM       MI       IM      DM,DI    MD,ID     DD       II
       -------  -------  -------  -------  -------  -------  -------
prc  | 'M','M'  'M','I'  'I','M'  'D','~'  '~','D'  'D','D'  'I','I'
sam1 |   'M'      'm'      'I'      'd'      '-'      'D'      'i'
sam2 |   'M'      'I'      'm'      '-'      'd'      'D'      'i'

Prc is the easiest to understand, but it requires two lines per alignment (one for each model). Sam1 and sam2 are inspired by SAM formats; the difference is that for sam1, hmm1 is the "sequence" (and hmm2 the "model"), and vice versa for sam2.

-Emax <> : only report hits with E-value <= Emax (default: 10)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If E-values are calculated for a run, only hits with an E-value at or below Emax are reported. Currently, only local-local runs against a library produce E-values.

-stop <> : stop looking for more hits when simple < stop
-hits <> : stop looking for more hits when hit_no > hits
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

PRC uses a variation of the Waterman-Eggert procedure to try to find all plausible alignments between a pair of models. However, there are sometimes hundreds of such alignments, so the question is: when do you stop looking for more? PRC stops as soon as it finds an alignment whose simple score is less than the stop parameter, or as soon as the number of hits exceeds the hits parameter. If E-values are calculated for the run, the search is also terminated when an alignment gets an E-value > 10*Emax.

Beware of setting stop too low for fully automated runs (without resetting hits): it can easily result in a near-infinite loop.

Model names
~~~~~~~~~~~

The model names are printed out in the hmm1 and hmm2 columns, and in the alignment file. Where do they come from?
For SAM and PSI-BLAST files, they are taken from the filename: if the filename is /dir1/dir2/XYZ.1.extension, the model name will be "XYZ.1".

For HMMER models, the name is taken from the ACC line, or from the NAME line if there is no ACC, or from the filename if there is neither ACC nor NAME.

For FASTA files, it is taken from the > line.

Finally, PRC models store the name inside the file. For models created using convert_to_prc, the name is taken from the input file.

If you have any questions, please get in touch!

Martin Madera
mm238@mrc-lmb.cam.ac.uk