What is PRC?
The Profile Comparer (PRC) is a program for aligning and scoring profile hidden Markov models written by
A stand-alone version of PRC can be downloaded from the Superfamily site.
PRC can detect more distant homologies than single-sequence searches because it scores alignments against alignments (in the form of hidden Markov models).
For this reason, PRC is also used by, for example, the CATH and Pfam domain databases
(for CATH, see e.g. Bioinformatics 2007,
Nucleic Acids Research 2007
and the CATH intro (Homologous Superfamily);
for Pfam, see
Nucleic Acids Research 2006
and the Pfam entry SprT-like).
This web server
With this web server you can search for similar alignments in a number of domain databases and evaluate the results using our user-friendly output page.
PRC is a profile comparer: its alignments consist only of match, insert and delete states (the "states") of the HMMs.
These PRC alignments are therefore in HMM space. We post-process the PRC output to produce a (hyperlinked) results page that includes
a graphic visualizing the distribution of the hits over your query sequence.
As sequences are generally more informative to the user, we also provide a view in alignment space,
and you can additionally view "aligned alignments".
These show the first sequence and the consensus sequence of both the query and the hit alignment.
In addition, we provide multiple sequence alignments that correspond to the query and hit regions that were found by PRC.
These alignments can be downloaded and viewed interactively with the Jalview multiple alignment editor applet.
Furthermore, you can choose to visualize the output as pairwise HMM logos with LogoMat-P or
Two Sample Logos (see below).
Thus, with this server you can:
- input your alignment directly
- search domain databases
- get graphical output of the hits in HMM and alignment space
- evaluate results using PRC-type output, but also use our "Aligned alignments" view
- evaluate results with Jalview using our multiple alignment output
- download query, hit and combined alignment regions
- optionally produce two types of logos
For more information, see the sections below.
Paste or upload an alignment in one of the following alignment formats:
Please note that your input alignment must be complete:
a ClustalW file, for example, must start with the ClustalW header line.
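If you want to check a ClustalW file before pasting it, a minimal sketch like the one below can catch a missing header. It is only an illustration, not the server's own validation, and it assumes that a complete ClustalW file starts with a line beginning with "CLUSTAL" (as in "CLUSTAL W (1.83) multiple sequence alignment"); the function name `looks_like_clustal` is hypothetical.

```python
def looks_like_clustal(text: str) -> bool:
    """Return True if the first non-blank line looks like a ClustalW header.

    Assumption: complete ClustalW files begin with a line starting with "CLUSTAL".
    """
    for line in text.splitlines():
        if line.strip():
            return line.lstrip().startswith("CLUSTAL")
    return False  # empty input is never a complete alignment


complete = "CLUSTAL W (1.83) multiple sequence alignment\n\nseq1  ACDEFG\nseq2  ACDEYG\n"
truncated = "seq1  ACDEFG\nseq2  ACDEYG\n"
print(looks_like_clustal(complete))   # True
print(looks_like_clustal(truncated))  # False
```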
You can choose to search a variety of public domain databases:
- CDD, the NCBI Conserved Domain Database, with the following sub sections:
- Pfam v24.0 (CDD version)
- CD (Conserved Domains): NCBI-curated section attempting to group ancient domains related by
common descent into family hierarchies
- SMART (Simple Modular Architecture Research Tool)
- COG (Clusters of Orthologous Groups of proteins)
- PRK (PRotein K(c)lusters)
- TIGRFAMs v8.00
- KOG (Eukaryotic clusters of Orthologous groups)
- SUPERFAMILY (only the PRC models, the alignment files are not available)
To search the entire NCBI CDD, choose CDD.
Note that KOG itself is not part of the CDD database.
If you prefer to search only a section of CDD, select CD, SMART or COG; to search KOG, select KOG.
For more information on these NCBI databases, see the NCBI CDD help.
CATH uses a different protocol to build its alignments, which results in huge alignments
(up to ~80,000 sequences and 680 MB for a single file).
Therefore, the CATH alignments have been processed to include only the first 200 sequences.
This reduces processing time and prevents huge alignment output.
The original CATH HMM library is still used, so the PRC profile-profile searches are not affected.
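The reduction of the CATH alignments to their first 200 sequences can be sketched as follows. This is a hypothetical illustration, assuming the alignments are stored in FASTA format; the actual CATH files and the script used for this server may differ, and the name `first_n_fasta` is our own.

```python
def first_n_fasta(text: str, n: int = 200) -> str:
    """Keep only the first n records of a FASTA-formatted alignment."""
    records = []  # each record is a list of lines, header line first
    for line in text.splitlines():
        if line.startswith(">"):
            if len(records) == n:
                break  # n records collected and the next one starts: stop
            records.append([line])
        elif records:
            records[-1].append(line)  # sequence line of the current record
    body = "\n".join(line for rec in records for line in rec)
    return body + "\n" if records else ""


aln = ">a\nAC\n>b\nAD\n>c\nAE\n"
print(first_n_fasta(aln, 2), end="")
```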
Note that mapping the hits to alignments is not possible for SUPERFAMILY, as the alignments used to build the
models are not available.
You may choose to run PSI-BLAST on your input alignment or sequence before the PRC search is started.
Your alignment is used to start a PSI-BLAST search against the NCBI non-redundant database
(downloaded on 8 February 2010). The number of iterations and the E-value threshold can be chosen.
Setting the number of iterations to "0" disables PSI-BLAST.
If you have an expert alignment, you might want to leave this value at "0".
The E-value threshold sets both the inclusion threshold (-h option) and the usual
E-value cut-off (-e option). This means that all sequences scoring below the supplied E-value become part of the PSI-BLAST PSSM
as well as of the final alignment; see the PSI-BLAST documentation
for more information.
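The way a single E-value setting feeds both PSI-BLAST thresholds can be sketched as a command-line builder. This assumes the legacy NCBI `blastpgp` program, whose `-h` and `-e` options match those named above, with `-i` for the query file, `-d` for the database and `-j` for the number of passes. The helper `blastpgp_args` and the file and database names are hypothetical, and starting from a whole alignment rather than a single query sequence needs an extra option not shown here.

```python
from typing import List, Optional


def blastpgp_args(query_file: str, iterations: int, evalue: float,
                  db: str = "nr") -> Optional[List[str]]:
    """Build a hypothetical blastpgp argument list, or None if PSI-BLAST is skipped."""
    if iterations == 0:
        return None  # "0" iterations: no PSI-BLAST run, the input is used as-is
    return [
        "blastpgp",
        "-i", query_file,       # query sequence file
        "-d", db,               # database to search (NCBI nr on this server)
        "-j", str(iterations),  # number of PSI-BLAST passes
        "-h", str(evalue),      # inclusion threshold for the PSSM
        "-e", str(evalue),      # E-value cut-off for reported hits (same value)
    ]


print(blastpgp_args("query.fa", 3, 0.001))
```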
Three filtering options are available:
- low-complexity: the standard low-complexity filter (SEG) built into PSI-BLAST.
This masks low-complexity regions in the PSI-BLAST query sequence.
It is only used if you submit a single sequence.
- identity: remove sequences with an identity lower than the given value from the final alignment.
- coverage: remove sequences with a coverage lower than the given value from the final alignment.
Coverage is the ratio of the length of the hit to the length of the query
(calculated after removing all positions from the alignment where the query sequence has gaps).
Identity and coverage are calculated for all sequences with respect to the query sequence.
If you supplied a multiple alignment and chose to run PSI-BLAST, this query sequence is the first sequence in your alignment.
Both the complete PSI-BLAST output and the produced alignment can be downloaded from the results page.
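The identity and coverage filters can be sketched as below for one hit row aligned to the query. Coverage follows the definition above; the identity definition (identical residues divided by the hit residues aligned to non-gap query positions) and the set of gap characters are our assumptions, as the exact formula is not given here.

```python
def identity_and_coverage(query: str, hit: str, gaps: str = "-.") -> tuple:
    """Return (identity, coverage) of an aligned hit row relative to the query row."""
    if len(query) != len(hit):
        raise ValueError("aligned rows must have equal length")
    # Remove all columns where the query has a gap.
    cols = [(q, h) for q, h in zip(query, hit) if q not in gaps]
    covered = [(q, h) for q, h in cols if h not in gaps]
    coverage = len(covered) / len(cols) if cols else 0.0
    # Assumed identity: identical residues / residues aligned to the query.
    same = sum(1 for q, h in covered if q.upper() == h.upper())
    identity = same / len(covered) if covered else 0.0
    return identity, coverage


print(identity_and_coverage("ACDE-FG", "ACDQKF-"))
```

A hit would then be dropped from the final alignment when either value falls below the threshold you set.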
You may choose a number of PRC options here:
- E-value: Only hits scoring better than E-value are reported in the output.
- Algorithm: You can choose forward or viterbi. Forward means, roughly, "sum across all possible alignments"; Viterbi means,
roughly, "find the best possible alignment". Forward is considered to be
better, because it can find similarity even when the exact alignment is uncertain.
- Match-match scoring: dot2 is the new default scoring function. However, users may select the previous dot1 function.
Generally, this option does not need to be changed.
- Mode: This selects the alignment mode for the profiles.
Local-global means "local to HMM1, global to HMM2".
For more information on all PRC options, see the README
of the stand-alone PRC program itself.
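The difference between the forward and Viterbi algorithms can be illustrated in the simpler setting of one HMM scoring one sequence. The toy two-state model below uses made-up parameters and is not PRC's profile-profile algorithm; it only shows that forward sums the probabilities of all state paths while Viterbi keeps the single best one, so the forward value is never smaller.

```python
# Toy two-state HMM with hypothetical parameters (not taken from PRC).
start = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [{"A": 0.7, "B": 0.3}, {"A": 0.2, "B": 0.8}]


def forward(obs: str) -> float:
    """Total probability of obs, summed over every possible state path."""
    f = [start[s] * emit[s][obs[0]] for s in range(2)]
    for o in obs[1:]:
        f = [sum(f[p] * trans[p][s] for p in range(2)) * emit[s][o] for s in range(2)]
    return sum(f)


def viterbi(obs: str) -> float:
    """Probability of the single most likely state path."""
    v = [start[s] * emit[s][obs[0]] for s in range(2)]
    for o in obs[1:]:
        v = [max(v[p] * trans[p][s] for p in range(2)) * emit[s][o] for s in range(2)]
    return max(v)


obs = "AABBA"
print(forward(obs), viterbi(obs))
```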
Three options are available here:
- Make logos: If you select "Yes", logos will be generated automatically for all hits.
Logos are generated with LogoMat-P for aligned HMMs
and with Two Sample Logo for aligned sequence alignments.
These programs provide a graphical view of the aligned query and hit (for details follow the links).
Two Sample Logo "calculates and visualizes differences between two sets of aligned samples of amino acids ..."
and uses two multiple sequence alignments as input.
Both these programs are run locally with default options.
The LogoMat-P software has been adapted such that the generated HMM-Logos exactly reflect the HMM alignments reported by PRC.
- Use --hand: This option can only be used with the Stockholm or SELEX alignment format.
The "RF" line (#=RF in SELEX, #=GC RF in Stockholm) may be used to indicate positions that should be present in the HMM model;
this can be used to indicate discontinuous domains. Any column marked with a non-gap symbol (such
as an "x") is assigned as a consensus (match) column in the HMM model.
If you select this option but do not upload an alignment in Stockholm or SELEX format (including the RF line), no HMMER model can be generated
(see the hmmbuild man page or the HMMER user's guide).
- Number of hits in graphic: You can choose the maximum number of hits to display in the graphic
showing the distribution of hits over your query alignment.
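For the --hand option, the columns that become match states can be read off the RF line. The sketch below extracts the #=GC RF annotation from a Stockholm alignment and lists the marked columns; the helper names and the set of symbols treated as gaps are assumptions, so consult the HMMER user's guide for the authoritative rules.

```python
def parse_stockholm_rf(text: str) -> str:
    """Concatenate the #=GC RF annotation (it may be split over several blocks)."""
    parts = []
    for line in text.splitlines():
        if line.startswith("#=GC RF"):
            parts.append(line.split()[2])  # fields: "#=GC", "RF", annotation
    return "".join(parts)


def rf_match_columns(rf: str, gap_chars: str = ".-_~") -> list:
    """0-based columns marked with a non-gap symbol, i.e. assumed match columns."""
    return [i for i, c in enumerate(rf) if c not in gap_chars]


aln = "# STOCKHOLM 1.0\nseq1 ACDEF\nseq2 AC-EF\n#=GC RF xx..x\n//\n"
rf = parse_stockholm_rf(aln)
print(rf, rf_match_columns(rf))  # xx..x [0, 1, 4]
```

Leaving a block of columns unmarked between two marked segments is how a discontinuous domain would be expressed in this scheme.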
If you supply your e-mail address, you will be notified when your job is finished.
You will also receive an e-mail if your job exceeded the computing queue time
and was cancelled (this is very unlikely to happen).
We provide three examples:
- The output of the example on the webPRC homepage is available here.
The query alignment was the Pfam Serpin seed alignment
and was posted in FASTA alignment format (see alignment).
- An alignment of peptidases in MSF format
was run against the Conserved Domains (CD) section of CDD (v2.16).
This output shows a feature of the hit graph:
multiple hits on the same domain are collected.
Note that the domain descriptions in CD are quite extensive.
- Pfam family SprT-like was run against Pfam-A (v23.0).
PRC was run in local-local mode with Viterbi alignments (see Bateman & Finn, 2007).
The output shows that
the two most significant hits are DUF335
and Peptidase_M76, which are indeed present as
"internal database links" for SprT-like on the Pfam server.
You can now also find such relations for your own alignments.
The hits provide links to the original database, the description and (if applicable) to the alignment, logos and
download (providing links to the query, hit and combined alignments in ClustalW format).
The PRC profile alignments use the letters M, D and I to indicate Match, Delete and Insert states.
The "~" symbol indicates a "place-holder" and is often found paired with a delete state, "D"
(cf. stand-alone PRC README).
The "Aligned alignments" view provides a detailed view in sequence-alignment space.
The first sequence and consensus of the query are shown, followed by a mid-line where each "+" indicates a PRC match ("M") state.
This mid-line is followed by the consensus and first sequence of the hit alignment.
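The mid-line of the "Aligned alignments" view can be sketched as a simple mapping from PRC states to symbols. The helper below is hypothetical and assumes one state letter per column; it places a "+" under every match ("M") state and a blank elsewhere.

```python
def midline(states: str) -> str:
    """'+' under each PRC match ('M') state, a space under D, I and '~'."""
    return "".join("+" if s == "M" else " " for s in states)


print(repr(midline("MMDIM~M")))  # '++  + +'
```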
Use reformat to change the results page without re-running PRC.
For example, if you would like to see a different number of hits in the graphic
or forgot to ask for logos on submission, use reformat.
By default, only the 25 best hits (or fewer, depending on the PRC E-value threshold) are shown in the hit table and alignment view.
You can also use reformat to show more hits.
Note that PRC only calculates E-values for local-local searches.
As the number of hits in local-global mode can be huge,
the maximum number of hits shown is 200.
Remember that you can retrieve the raw PRC output.
Help on reading the pairwise HMM logo produced by LogoMat-P is available here.
More information on the Two Sample Logos, produced from the multiple sequence alignments, can be found here.
The "Aligned alignments" view can contain different gap characters.
Gaps already present in the alignment are (still) indicated by "-".
The gaps (as well as Insert states) introduced by PRC are indicated by "~", while
the gaps that were introduced to align the profiles with sequence alignments
are indicated by ":". These colons represent positions that are absent in the profile HMM,
but present in the multiple sequence alignment.
You may notice that the ":" always occurs opposite to
a lower-case letter (in the other consensus sequence) or to a dot
("."; in the first sequence).
The alignment files for viewing with Jalview or for download only contain one type of gap character ("-").
Alignments belonging to HMM models often contain dots (".") and lower-case letters, which are preserved in the aligned-alignments view.
Alignment columns with dots or lower-case letters are not present in the HMM profile (see the Pfam FAQ).
Thus, an HMM profile can contain fewer columns than its seed alignment.
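The relation between alignment columns and profile columns can be sketched by counting match columns in one aligned row. This assumes the HMMER/Pfam-style convention described above (upper-case letters and "-" are match columns, lower-case letters and "." are insert columns); the function name is hypothetical.

```python
def hmm_match_length(aligned_row: str) -> int:
    """Count columns that correspond to HMM match states.

    Assumed convention: upper-case letters and '-' are match columns,
    lower-case letters and '.' are insert columns absent from the profile.
    """
    return sum(1 for c in aligned_row if c.isupper() or c == "-")


row = "AC.de-F"
print(len(row), hmm_match_length(row))  # 7 4
```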
ClustalX only regards "-" as a gap character and ignores all others.
The alignments we provide for viewing and download therefore only use "-" as gap character.
This also means that you will not see the "~" or "." characters in Jalview or ClustalX.
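Converting the aligned-alignments gap symbols into the single "-" character used in the download files can be sketched as below. This is a hypothetical helper, not the server's code; it assumes that "~", ":" and "." are the only extra gap symbols, and in this sketch residues, including lower-case letters, are left unchanged.

```python
def to_single_gap(seq: str, gap_chars: str = "~:.") -> str:
    """Replace every extra gap symbol of the aligned-alignments view with '-'."""
    return seq.translate(str.maketrans({c: "-" for c in gap_chars}))


print(to_single_gap("AC~D:.e-F"))  # AC-D--e-F
```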
(c) IBIVU 2017. If you are experiencing problems with the
site, please contact the webmaster.