			 ----------------------------
			 A ROUGH GUIDE TO PRC OPTIONS
			 ----------------------------


Library versus pairwise runs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The main PRC binary has two modes, "pairwise" and "library". In pairwise mode,
PRC scores one model against another and simply prints the results. In library
mode, it runs a model against a library of models and saves the output to
output.scores (and, if requested, the alignments to output.aligns).

PRC assumes that each model file contains exactly one model, and that the
library file lists model files (using absolute paths), one filename per line.
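As a minimal sketch, assuming your models sit in one directory and share a file
extension (the directory and the ".mod" extension are hypothetical, not
anything PRC requires), a library file in the expected format could be
generated like this:

```python
import os
import tempfile
from pathlib import Path


def write_library(model_dir: str, library_path: str, extension: str = ".mod") -> list:
    """Write one absolute model path per line and return the paths written.

    The extension filter is an assumption of this sketch, not a PRC rule.
    """
    paths = sorted(
        str(p.resolve())  # PRC expects absolute paths in the library file
        for p in Path(model_dir).iterdir()
        if p.is_file() and p.suffix == extension
    )
    with open(library_path, "w") as handle:
        for path in paths:
            handle.write(path + "\n")
    return paths


if __name__ == "__main__":
    # Demo in a throwaway directory holding two hypothetical model files.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("a.mod", "b.mod", "notes.txt"):
            Path(tmp, name).write_text("placeholder\n")
        print(write_library(tmp, os.path.join(tmp, "library.txt")))
```

Each model file must still contain exactly one model; the script only checks
file names, not contents.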


-algo  <> : forward (default), viterbi
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Forward means, roughly, "sum across all possible alignments". Viterbi means,
roughly, "find the best possible alignment". Forward is generally considered
the better choice, because it can detect similarity even when the exact
alignment is unclear.
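The difference between "sum" and "best" can be illustrated on a toy
single-sequence HMM. This is a generic sketch, not PRC's profile-profile
recursion; the two-state model and its probabilities are made up for
illustration. Forward combines the probabilities of all state paths, while
Viterbi keeps only the most probable one:

```python
import math
from typing import Dict, List, Sequence


def _log(p: float) -> float:
    return math.log(p) if p > 0 else -math.inf


def _logsumexp(values: Sequence[float]) -> float:
    top = max(values)
    if top == -math.inf:
        return -math.inf
    return top + math.log(sum(math.exp(v - top) for v in values))


def forward_and_viterbi(start: List[float], trans: List[List[float]],
                        emit: List[Dict[str, float]], obs: str):
    """Return (Forward log-likelihood, Viterbi log-probability) for a toy HMM.

    start[k] = P(first state is k), trans[prev][cur] = P(cur | prev),
    emit[k][x] = P(symbol x | state k).
    """
    n = len(start)
    fwd = [_log(start[k]) + _log(emit[k][obs[0]]) for k in range(n)]
    vit = list(fwd)
    for x in obs[1:]:
        # Forward: add up every way of reaching state `cur`...
        fwd = [_logsumexp([fwd[prev] + _log(trans[prev][cur]) for prev in range(n)])
               + _log(emit[cur][x]) for cur in range(n)]
        # ...Viterbi: keep only the single best way.
        vit = [max(vit[prev] + _log(trans[prev][cur]) for prev in range(n))
               + _log(emit[cur][x]) for cur in range(n)]
    return _logsumexp(fwd), max(vit)


if __name__ == "__main__":
    start = [0.6, 0.4]
    trans = [[0.7, 0.3], [0.4, 0.6]]
    emit = [{"a": 0.5, "b": 0.5}, {"a": 0.1, "b": 0.9}]
    print(forward_and_viterbi(start, trans, emit, "abba"))
```

The Forward score can never be lower than the Viterbi score, because the best
path is one of the terms in the sum.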

For Forward, the alignments (as opposed to the scores) are calculated using the
Maximum Alignment Accuracy algorithm introduced by Holmes and Durbin. This is
about 3x slower than calculating the single best alignment with Viterbi, but
there is now empirical evidence (T-COFFEE, PROBCONS) that the extra cost is
worth it.
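A simplified sketch of the idea, in the gap-free form popularized by PROBCONS
(which may differ in detail from the Holmes and Durbin formulation): assume the
posterior probabilities post[i][j] that position i of HMM1 is aligned to
position j of HMM2 have already been computed, for example with the
forward-backward algorithm. The alignment with the highest expected accuracy
is then found by a short dynamic program. The function name and input layout
are hypothetical, not PRC internals.

```python
from typing import List, Tuple


def max_expected_accuracy(post: List[List[float]]) -> Tuple[float, List[Tuple[int, int]]]:
    """Choose aligned pairs maximizing the summed posterior match probability.

    Gaps cost nothing in this simplified variant.
    """
    n = len(post)
    m = len(post[0]) if n else 0
    # score[i][j]: best total using the first i positions of HMM1 and j of HMM2
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + post[i - 1][j - 1],  # align i with j
                score[i - 1][j],  # leave position i of HMM1 unaligned
                score[i][j - 1],  # leave position j of HMM2 unaligned
            )
    # Trace back to recover the aligned pairs (0-based positions).
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        diag = score[i - 1][j - 1] + post[i - 1][j - 1]
        if post[i - 1][j - 1] > 0 and score[i][j] == diag:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[n][m], pairs
```

Unlike Viterbi, which maximizes the probability of one whole path, this picks
the pairs that are individually most likely to be correct.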


-MMfn  <> : match-match scoring function; dot1, dot2 (default)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Dot1 was the function used by PRC up to version 1.5.0. Dot2, suggested by
Johannes Soeding, appears to perform marginally better and has been the
default since PRC 1.5.1.


-mode  <> : local-local (default), local-global, etc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Alignment mode. Local is Smith-Waterman ("alignment can start and end
anywhere"), global is Needleman-Wunsch ("alignment starts at the start of the
model and ends at the end"). Local-global means "local to HMM1, global to
HMM2".
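The effect of the mode on the dynamic-programming boundaries can be sketched
with plain strings standing in for the two models. The match, mismatch, and
linear gap scores are illustrative assumptions; PRC scores model columns
(see -MMfn), not characters. A "local" side may skip a prefix and a suffix for
free; a "global" side must be covered completely.

```python
def align_score(a: str, b: str, mode: str = "local-local",
                match: float = 2.0, mismatch: float = -1.0, gap: float = -2.0) -> float:
    """Best score of aligning a (playing HMM1) with b (playing HMM2).

    mode is "<mode for a>-<mode for b>", each part "local" or "global".
    """
    local_a, local_b = (part == "local" for part in mode.split("-"))
    n, m = len(a), len(b)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    # Skipping a prefix is free only on a side that is aligned locally.
    for i in range(1, n + 1):
        H[i][0] = 0.0 if local_a else i * gap
    for j in range(1, m + 1):
        H[0][j] = 0.0 if local_b else j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            h = max(H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            # Smith-Waterman: in local-local mode an alignment may restart anywhere.
            H[i][j] = max(h, 0.0) if (local_a and local_b) else h
    if local_a and local_b:
        return max(max(row) for row in H)  # may end anywhere
    if local_a:
        return max(H[i][m] for i in range(n + 1))  # must reach the end of b
    if local_b:
        return max(H[n][j] for j in range(m + 1))  # must reach the end of a
    return H[n][m]  # Needleman-Wunsch: both ends fixed


if __name__ == "__main__":
    for mode in ("local-local", "local-global", "global-global"):
        print(mode, align_score("XXACGTXX", "ACGT", mode))
```

Here "ACGT" sits inside a longer first string, so local-local and
local-global both find the full match, while global-global must pay for the
unmatched ends.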


-align <> : alignment style; none (default), prc, sam1, sam2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
By default, no alignments are printed out or saved (corresponding to the option
'none'). For the remaining three options, the correspondence between what's
printed out and internal PRC states is as follows:

               MM       MI       IM     DM,DI    MD,ID      DD      II
            -------  -------  -------  -------  -------  ------- -------
     prc  | 'M','M'  'M','I'  'I','M'  'D','~'  '~','D'  'D','D' 'I','I'
     sam1 |   'M'      'm'      'I'      'd'      '-'      'D'     'i'
     sam2 |   'M'      'I'      'm'      '-'      'd'      'D'     'i'

Prc is the easiest to understand, but it requires two lines per alignment (one
for each model). Sam1 and sam2 are inspired by SAM formats; the difference is
that for sam1, hmm1 is the "sequence" (and hmm2 the "model"), and vice versa
for sam2.


-Emax  <> : only report hits with E-value <= Emax (default: 10)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If E-values are calculated for a run, only hits with an E-value of at most
Emax are reported.

Currently, only local-local runs against a library have E-values.


-stop  <> : stop looking for more hits when simple < stop
-hits  <> : stop looking for more hits when hit_no > hits
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
PRC uses a variation of the Waterman-Eggert procedure to try to find all
plausible alignments between a pair of models. However, sometimes there are
hundreds of such alignments, so the question is: when do you stop looking for
more? The PRC answer is: whenever an alignment is found with a simple score
less than the stop parameter, or whenever the number of hits found exceeds the
hits parameter. If E-values are calculated for the run, the search is also
terminated when an alignment gets an E-value > 10*Emax.

Beware of setting stop too low in fully automated runs without also capping
hits: it can easily result in a near-infinite loop.
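The stopping rules, together with the Emax reporting filter, can be sketched as
a loop over successive alignments. The candidate list is a hypothetical
stand-in for the alignments the search produces one after another (not a PRC
interface), and discarding the alignment that triggers termination is an
assumption of this sketch.

```python
from typing import Iterable, List, Optional, Tuple

# (simple score, E-value); the E-value is None when the run computes none.
Hit = Tuple[float, Optional[float]]


def collect_hits(candidates: Iterable[Hit], stop: float, hits: int,
                 emax: float = 10.0) -> List[Hit]:
    """Apply the documented termination rules to successive alignments."""
    found: List[Hit] = []
    for hit_no, (simple, evalue) in enumerate(candidates, start=1):
        if simple < stop:  # -stop: the simple score has dropped too low
            break
        if hit_no > hits:  # -hits: enough alignments already
            break
        if evalue is not None and evalue > 10 * emax:  # far worse than -Emax
            break
        found.append((simple, evalue))
    # -Emax: report only hits with E-value <= Emax (when E-values exist).
    return [h for h in found if h[1] is None or h[1] <= emax]


if __name__ == "__main__":
    cands = [(30.0, 1e-5), (20.0, 0.5), (15.0, 20.0), (12.0, 50.0), (8.0, 200.0)]
    print(collect_hits(cands, stop=5.0, hits=100))
```

Without E-values, only stop and hits end the loop, which is why a very low
stop combined with a large hits value can keep the search going for a long
time.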


Model names
~~~~~~~~~~~
The model names are printed out in the hmm1 and hmm2 columns, and in the
alignment file. Where do they come from? For SAM and PSI-BLAST files, they are
taken from the filename. If the filename is

/dir1/dir2/XYZ.1.extension

the model name will be "XYZ.1". For HMMER models, the name is taken from the
ACC line, or the NAME line if there is no ACC, or the filename if there is no
ACC or NAME. For FASTA files, it is taken from the > line. Finally, PRC models
store the name inside the file. For models created using convert_to_prc the
name is taken from the input file.
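The filename rule and the HMMER fallback order can be sketched as follows.
The function names are hypothetical, and the HMMER parsing is deliberately
minimal: it only looks for header lines whose first field is ACC or NAME.

```python
import os
from typing import Iterable


def name_from_filename(path: str) -> str:
    """SAM / PSI-BLAST rule: drop the directories and the last extension."""
    return os.path.splitext(os.path.basename(path))[0]


def hmmer_model_name(lines: Iterable[str], path: str) -> str:
    """HMMER rule: ACC if present, else NAME, else the filename-derived name."""
    fields = {}
    for line in lines:
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] in ("ACC", "NAME"):
            fields.setdefault(parts[0], parts[1].strip())  # keep first occurrence
    return fields.get("ACC") or fields.get("NAME") or name_from_filename(path)


if __name__ == "__main__":
    print(name_from_filename("/dir1/dir2/XYZ.1.extension"))
```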


If you have any questions, please get in touch!

Martin Madera
mm238@mrc-lmb.cam.ac.uk