webPRC Information



PRC description

What is PRC?
The Profile Comparer (PRC) is a program, written by Martin Madera, for aligning and scoring profile hidden Markov models (HMMs). A stand-alone version of PRC can be downloaded from the Superfamily site. PRC can detect more distant homologies because it scores alignments against alignments (in the form of HMMs). For this reason, PRC is also used by, for example, the CATH and Pfam domain databases (for CATH, see Bioinformatics 2007, Nucleic Acids Research 2007, and the CATH intro (Homologous Superfamily); for Pfam, see Nucleic Acids Research 2006 and 2008 and the Pfam entry SprT-like).
This web server
With this web server you can search for similar alignments in a number of domain databases and evaluate the results using our user-friendly output page. PRC is a profile comparer and reports only matches, insertions and deletions (the "states"). These PRC alignments are in HMM space. We post-process the PRC output to produce a (hyperlinked) result page that includes a graphic visualizing the distribution of the hits over your query sequence. Because sequences are generally more informative to the user, we also provide a view in alignment space, in which you can inspect "aligned alignments". These include the first sequence of each alignment as well as the consensus sequences. In addition, we provide multiple sequence alignments that correspond to the query and hit regions found by PRC. These alignments can be downloaded and viewed interactively with the Jalview multiple alignment editor applet. Furthermore, you can choose to visualize the output as pairwise HMM logos with LogoMat-P or as Two Sample Logos (see below). Thus with this server, you can:

input your alignment directly
search domain databases
get graphical output of the hits in HMM and alignment space
evaluate results using PRC-type output, but also use our "Aligned alignments" view
evaluate results with Jalview using our multiple alignment output
download query, hit and combined alignment regions
optionally produce two types of logos
For more information, see the sections below.
webPRC input

Paste or upload an alignment in one of the following alignment formats:

ClustalW (incl. ClustalW header!)
GCG MSF
SELEX
Stockholm
aligned FASTA

Please note that your input alignment has to be complete. That is, for example, a ClustalW file should start with the ClustalW header line.
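The header requirement can be checked before submitting. The following is a minimal sketch, not the server's own validation: the function name `sniff_alignment_format` is hypothetical, and it only recognises the three formats whose first line is distinctive (ClustalW, Stockholm and aligned FASTA). MSF and SELEX are reported as "unknown" here because this sketch does not attempt to detect them.

```python
def sniff_alignment_format(text: str) -> str:
    """Guess the alignment format from the first non-empty line.

    Hypothetical helper for illustration; it does not validate the
    rest of the file.
    """
    for line in text.splitlines():
        if not line.strip():
            continue  # skip leading blank lines
        if line.startswith("CLUSTAL"):
            return "clustalw"   # ClustalW header line is present
        if line.startswith("# STOCKHOLM"):
            return "stockholm"
        if line.startswith(">"):
            return "fasta"
        return "unknown"        # e.g. MSF, SELEX, or a missing header
    return "unknown"            # empty input


if __name__ == "__main__":
    complete = "CLUSTAL W (1.83) multiple sequence alignment\n\nseq1  ACDEF\nseq2  ACD-F\n"
    headerless = "seq1  ACDEF\nseq2  ACD-F\n"
    print(sniff_alignment_format(complete))
    print(sniff_alignment_format(headerless))
```

A ClustalW file whose header line was lost while copying is reported as "unknown", which is the situation the note above warns about.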

Databases

You can choose to search a variety of public domain databases:

Pfam
CDD, the NCBI Conserved Domain Database, with the following sub sections:

Pfam v24.0 (CDD version)
CD (Conserved Domains: NCBI-curated section attempting to group ancient domains related by common descent into family hierarchies)
SMART (Simple Modular Architecture Research Tool)
COG (Clusters of Orthologous Groups of proteins)
PRK (PRotein K(c)lusters)
TIGRFAMs v8.00

KOG (Eukaryotic clusters of Orthologous groups)
SUPERFAMILY (only the PRC models, the alignment files are not available)
TIGRFAMs
CATH

Building details

Most profiles are generated locally from seed alignments in order to make the mapping to the sequences possible. The building process depends on the database:

Pfam: Pfam is built locally using seed alignment files and the hmmbuild options present therein (downloaded from Pfam).
CDD/KOG: All FASTA files were downloaded from the NCBI. The HMM models were generated with hmmbuild (no options).
TIGRFAMs: TIGRFAMs are built locally using the seed alignment files and the hmmbuild options present therein. The INFO files are parsed to provide a domain description.
CATH: The CATH HMMer library was downloaded from the CATH site. We did not rebuild this HMMer library locally. The CATH alignments have been processed to include only the first 200 sequences.

You can search the entire CDD from NCBI by choosing CDD. Note that KOG itself is not part of the CDD database. If you prefer to search only a section of CDD, you may select CD, SMART or COG, or select KOG. For more information on these NCBI databases, see the NCBI CDD help. CATH uses a different protocol to build its alignments, which results in huge alignments (up to ~80000 sequences and 680 Mb for a single file). Therefore, the CATH alignments have been processed to include only the first 200 sequences. This reduces processing time and prevents huge alignment output. The original CATH HMM library is used, so the PRC profile-profile searches are not affected. Note that mapping to alignments is not possible for SUPERFAMILY, as the alignments used to build the models are not available.
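The truncation of the CATH alignments to their first 200 sequences can be pictured with the sketch below. It assumes the alignment is stored as aligned FASTA, which the text above does not state, and the function name `keep_first_sequences` is hypothetical; the actual processing script used by the server is not documented here.

```python
def keep_first_sequences(fasta_text: str, limit: int = 200) -> str:
    """Return an aligned-FASTA string containing only the first `limit` records."""
    kept = []
    count = 0
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            count += 1          # a new record starts at every header line
        if count > limit:
            break               # everything after the limit is dropped
        if count >= 1:
            kept.append(line)   # ignore any text before the first header
    return "\n".join(kept) + ("\n" if kept else "")


if __name__ == "__main__":
    alignment = ">a\nAC-\n>b\nACD\n>c\nA-D\n"
    print(keep_first_sequences(alignment, limit=2), end="")
```

Because only the displayed alignment is shortened, the profile built from the full alignment is unchanged, which is why the searches themselves are unaffected.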

PSI-BLAST
You may choose to run PSI-BLAST on your input alignment or sequence before the PRC search is started. Your alignment is used to start the PSI-BLAST search against the NCBI non-redundant database (downloaded on 8 February 2010). The number of iterations and the E-value threshold can be chosen. An iteration value of "0" disables the PSI-BLAST run. If you have an expert alignment, you might want to leave this value at "0". The E-value threshold is used for both the inclusion threshold (-h option) and the usual E-value (-e option). This means that all sequences scoring below the supplied E-value are part of the PSI-BLAST PSSM as well as the final alignment; see PSI-BLAST for more info.
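The way a single E-value threshold feeds both options can be sketched as follows. This is an assumption-laden illustration, not the command the server actually runs: it assumes the legacy NCBI `blastpgp` program with a single-sequence query, and the function name `psiblast_args`, the query path and the database name "nr" are placeholders.

```python
def psiblast_args(query_path, iterations, evalue, database="nr"):
    """Build an argument list for a PSI-BLAST run, or None if it is disabled.

    Assumes legacy blastpgp options: -i query, -d database,
    -j number of iterations, -h inclusion threshold, -e E-value.
    """
    if iterations <= 0:
        return None  # an iteration value of 0 means: do not run PSI-BLAST
    return [
        "blastpgp",
        "-i", query_path,
        "-d", database,
        "-j", str(iterations),
        "-h", str(evalue),  # inclusion threshold for the PSSM
        "-e", str(evalue),  # reporting threshold, set to the same value
    ]


if __name__ == "__main__":
    print(psiblast_args("query.fasta", iterations=3, evalue=0.001))
    print(psiblast_args("query.fasta", iterations=0, evalue=0.001))
```

Setting both thresholds to the same value is what makes every reported sequence also a contributor to the PSSM.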
Filters
Three filtering options are available:

low-complexity: the standard low-complexity filter (SEG) built into PSI-BLAST. This masks low-complexity regions present in the PSI-BLAST query sequence. It is only used if you submit a single sequence.
identity: remove sequences with an identity less than the given value from the final alignment.
coverage: remove sequences with a coverage less than the given value from the final alignment. Coverage is the ratio of the length of the hit to the length of the query (calculated after removing all positions from the alignment where the query sequence has gaps).
Identity and coverage are calculated for all sequences with respect to the query sequence. If you supplied a multiple alignment and chose to run PSI-BLAST, this query sequence is the first sequence in your alignment. Both the complete PSI-BLAST output and the resulting alignment can be downloaded from the results page.
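The coverage definition above can be made concrete with a short sketch. Coverage follows the text: columns where the query has a gap are discarded, and the hit residues that remain are divided by the query length. The identity formula is an assumption, since the text does not define it: here it is the fraction of identical residues among columns where both sequences have a residue. The function name and the gap characters ("-" and ".") are likewise illustrative.

```python
GAP_CHARS = set("-.")


def coverage_and_identity(query: str, hit: str):
    """Return (coverage, identity) of an aligned hit relative to the query row."""
    if len(query) != len(hit):
        raise ValueError("aligned rows must have the same length")
    # Drop every column in which the query sequence has a gap.
    columns = [(q, h) for q, h in zip(query, hit) if q not in GAP_CHARS]
    if not columns:
        raise ValueError("query row contains no residues")
    # Columns in which the hit also has a residue.
    aligned = [(q, h) for q, h in columns if h not in GAP_CHARS]
    coverage = len(aligned) / len(columns)
    identical = sum(1 for q, h in aligned if q.upper() == h.upper())
    identity = identical / len(aligned) if aligned else 0.0
    return coverage, identity


if __name__ == "__main__":
    cov, ident = coverage_and_identity("AC-DEF", "ACG-EY")
    print(f"coverage={cov:.2f} identity={ident:.2f}")
```

In this example the query has five residues, four of which are aligned to a hit residue, and three of those four are identical. A sequence would be removed when either value falls below the threshold chosen on the input form.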
PRC options

You may choose a number of PRC options here:

E-value: Only hits scoring better than E-value are reported in the output.
Algorithm: You can choose forward or viterbi. Forward means, roughly, "sum across all possible alignments". Viterbi means, roughly, "find the best possible alignment". Forward is considered to be better, because it can find similarity even when the exact alignment is unclear.
Match-match scoring: dot2 is the new default scoring function. However, users may select the previous dot1 function. Generally, this option does not need to be changed.
Mode: This selects the alignment mode for the profiles. Local-global means "local to HMM1, global to HMM2".

For more information on (all) PRC options, see the README of the PRC stand-alone program itself.
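The difference between the two algorithms can be illustrated with a toy calculation. The sketch below is not how PRC is implemented (a real implementation uses dynamic programming rather than enumerating alignments); it simply takes a hypothetical list of log-probabilities, one per candidate alignment, and compares "best single alignment" with "sum over all alignments".

```python
import math


def viterbi_score(log_probs):
    """Log-probability of the single best alignment."""
    return max(log_probs)


def forward_score(log_probs):
    """Log of the summed probability of all alignments (log-sum-exp)."""
    top = max(log_probs)
    return top + math.log(sum(math.exp(lp - top) for lp in log_probs))


if __name__ == "__main__":
    # Three candidate alignments with similar, individually small probabilities.
    candidates = [math.log(p) for p in (0.04, 0.03, 0.03)]
    print("viterbi:", math.exp(viterbi_score(candidates)))
    print("forward:", math.exp(forward_score(candidates)))
```

When no single alignment stands out, the Viterbi score reflects only one of the candidates, whereas the forward score accumulates the evidence from all of them.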

Other options
Three options are available here:

Make logos: If you select "Yes", logos will be generated automatically for all hits. Logos are generated with LogoMat-P for aligned HMMs and with Two Sample Logo for aligned sequence alignments. These programs provide a graphical view of the aligned query and hit (for details follow the links). Two Sample Logo "calculates and visualizes differences between two sets of aligned samples of amino acids ..." and uses two multiple sequence alignments as input. Both programs are run locally with default options. The LogoMat-P software has been adapted such that the generated HMM logos exactly reflect the HMM alignments reported by PRC.
Use --hand: This option can only be used with the Stockholm/SELEX alignment format. The "RF" line (#=RF in SELEX, #=GC RF in Stockholm) may be used to indicate positions that should be present in the HMM model. This can be used to indicate discontinuous domains. Any column marked with a non-gap symbol (such as an "x", for instance) is assigned as a consensus (match) column in the HMM model. If you select this option but do not upload an alignment in Stockholm or SELEX format (including the RF line), no HMMer model can be generated (cf. the hmmbuild man page or the HMMER user's guide).
Number of hits in graphic: You can choose the maximum number of hits to display in the graphic showing the distribution of hits over your query alignment.
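For the --hand option, it can help to check the RF line before submitting. The sketch below is a hypothetical helper, not part of webPRC or HMMER: it collects the "#=GC RF" annotation from a Stockholm file (joining the pieces if the file is split into blocks) and lists the columns that would become match columns. Treating "-" and "." as the gap symbols is an assumption.

```python
def rf_annotation(stockholm_text: str):
    """Return the concatenated '#=GC RF' annotation, or None if it is absent."""
    parts = []
    for line in stockholm_text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0] == "#=GC" and fields[1] == "RF":
            parts.append(fields[2])
    return "".join(parts) or None


def match_columns(rf: str, gap_chars: str = "-."):
    """Zero-based indices of columns marked with a non-gap symbol."""
    return [i for i, symbol in enumerate(rf) if symbol not in gap_chars]


if __name__ == "__main__":
    example = (
        "# STOCKHOLM 1.0\n"
        "seq1     ACD-EF\n"
        "seq2     AC-GEF\n"
        "#=GC RF  xx..xx\n"
        "//\n"
    )
    rf = rf_annotation(example)
    print(rf, match_columns(rf))
```

If `rf_annotation` returns None, the file has no RF line and selecting "Use --hand" would leave hmmbuild without the information it needs.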

E-mail address
If you supply your e-mail address, you will be notified when your job is finished. You will also receive an e-mail if your job exceeds the computing queue time and is cancelled (this is very unlikely to happen).
webPRC output
We provide three examples:

The output of the example on the webPRC homepage is available here. The query alignment was the Pfam Serpin seed alignment and was posted in FASTA alignment format (see alignment).
An alignment of peptidases in MSF format was run against the Conserved Domains (CD) section of CDD (v2.16). This output shows a feature of the hit graph: multiple hits on the same domain are collected. Note that the domain descriptions in CD are quite extensive.
Pfam family SprT-like was run against Pfam-A (v23.0). PRC was run in local-local mode with Viterbi alignments (see Bateman & Finn, 2007). The output shows that the two most significant hits are DUF335 and Peptidase_M76, which are indeed present as "internal database links" for SprT-like on the Pfam server. You can now find such relationships for your own alignments as well.

The hits provide links to the original database, the description and (if applicable) to the alignment, logos and download (providing links to the query, hit and combined alignments in ClustalW format).
The PRC profile alignments use the letters M, D and I to indicate Match, Delete and Insert states. The "~" symbol indicates a "place-holder" and is often found paired with a delete state, "D" (cf. stand-alone PRC README). The "Aligned alignments" view provides a detailed view in sequence-alignment space. The first sequence and consensus of the query are shown, followed by a mid-line where each "+" indicates a PRC match ("M") state. This mid-line is followed by the consensus and first sequence of the hit alignment.
Reformat
Use reformat to change the results page without re-running PRC. If you would like to see a different number of hits in the graphic, or forgot to ask for logos on submission, use reformat. By default only the 25 best hits (or fewer, depending on the PRC E-value threshold) are shown in the hit table and alignment view. You can also use reformat to show more hits.
Note that PRC only calculates E-values for local-local searches. As the number of hits in local-global mode can be huge, the maximum number of hits shown is 200.
Remember that you can retrieve the raw PRC output.
Logos
Help on reading the pairwise HMM logo produced by LogoMat-P is available here. More information on the Two Sample Logos, produced from the multiple sequence alignments, can be found here.
Gaps
The "Aligned alignments" view can contain different gap characters. Gaps already present in the alignment are (still) indicated by "-". The gaps (as well as Insert states) introduced by PRC are indicated by '~', while the gaps that were introduced to align the profiles with sequence alignments are indicated by ":". These colons represent positions that are absent in the profile HMM, but present in the multiple sequence alignment. You may notice that the ":" always occurs opposite to a lower-case letter (in the other consensus sequence) or to a dot ("."; in the first sequence). The alignment files for viewing with Jalview or for download only contain one type of gap character ("-").
Notes
HMM models often contain dots (".") and lower-case letters, which are preserved in the aligned-alignment view. Alignment columns with dots or lower-case letters are not present in the HMM profile (see the Pfam FAQ). Thus, an HMM profile can contain fewer columns than its seed alignment.
ClustalX only regards "-" as a gap character and ignores all others. The alignments we provide for viewing and download therefore only use "-" as gap character. This also means that you will not see the "~" or "." characters in Jalview or ClustalX.
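The relationship between the gap characters of the "Aligned alignments" view and the single gap character of the downloadable files can be sketched as a simple character mapping. This is an illustration under assumptions, not the server's conversion code: the function name `normalize_gaps` is hypothetical, and the set of characters mapped to "-" ("~", ":" and ".") is taken from the Gaps and Notes sections above.

```python
# Map every view-specific gap symbol onto the plain "-" gap character.
_GAP_TABLE = str.maketrans("~:.", "---")


def normalize_gaps(row: str) -> str:
    """Replace '~', ':' and '.' with '-' and leave residues untouched."""
    return row.translate(_GAP_TABLE)


if __name__ == "__main__":
    view_row = "AC~De:F.G-"
    print(normalize_gaps(view_row))
```

After this mapping the row contains only residues and "-", which is the form that programs such as ClustalX and Jalview interpret as gapped sequence.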