Introduction
The program REPRO (Heringa and Argos, 1993) is able to recognise distant repeats in a single query sequence. The technique relies on a variation of the Smith-Waterman local alignment strategy to find non-overlapping top-scoring local alignments, followed by a graph-based iterative clustering procedure to delineate the repeat set(s) based on consistency of the pairwise top-alignments. REPRO is able to detect multiple repeat types within a single protein sequence.
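To make the first step concrete, here is a minimal Python sketch that finds non-overlapping top-scoring local alignments of a sequence with itself. It is a Waterman-Eggert-style illustration, not REPRO's own code. It assumes simple identity scoring (hypothetical `match`/`mismatch` values) and a single linear gap penalty, whereas REPRO uses a substitution matrix (or a mismatch value) with separate gap opening and extension penalties. The function name `top_self_alignments` is hypothetical. Only the upper triangle of the self-comparison matrix is scored, so the trivial self-match on the main diagonal is excluded. Every cell on an accepted alignment path is blocked for later rounds, which keeps the alignments from sharing residue pairs.

```python
from typing import List, Set, Tuple

# (score, (start1, end1), (start2, end2)), positions 1-based and inclusive
Alignment = Tuple[int, Tuple[int, int], Tuple[int, int]]


def top_self_alignments(seq: str, n: int, match: int = 2, mismatch: int = -1,
                        gap: int = 2) -> List[Alignment]:
    """Return up to n top-scoring local alignments of seq with itself.

    Waterman-Eggert-style sketch: cells on an accepted alignment path are
    blocked, so later alignments cannot reuse the same residue pairs.
    Identity scoring and a linear gap penalty are simplifying assumptions.
    """
    length = len(seq)
    blocked: Set[Tuple[int, int]] = set()
    results: List[Alignment] = []
    for _ in range(n):
        # H[i][j]: best local score of an alignment ending at residues i and j.
        H = [[0] * (length + 1) for _ in range(length + 1)]
        best, bi, bj = 0, 0, 0
        for i in range(1, length + 1):
            # Upper triangle only (j > i): the main diagonal is the trivial self-match.
            for j in range(i + 1, length + 1):
                if (i, j) in blocked:
                    continue  # cell stays 0, so no alignment can pass through it
                s = match if seq[i - 1] == seq[j - 1] else mismatch
                H[i][j] = max(0, H[i - 1][j - 1] + s,
                              H[i - 1][j] - gap, H[i][j - 1] - gap)
                if H[i][j] > best:
                    best, bi, bj = H[i][j], i, j
        if best == 0:
            break  # no positive-scoring alignment is left
        # Trace back from the best cell, blocking every cell on the path.
        i, j = bi, bj
        start_i, start_j = i, j
        while H[i][j] > 0:
            blocked.add((i, j))
            start_i, start_j = i, j
            s = match if seq[i - 1] == seq[j - 1] else mismatch
            if H[i][j] == H[i - 1][j - 1] + s:
                i, j = i - 1, j - 1
            elif H[i][j] == H[i - 1][j] - gap:
                i -= 1
            else:
                j -= 1
        results.append((best, (start_i, bi), (start_j, bj)))
    return results
```

For example, `top_self_alignments("ABCXABCYABC", 10)` returns `[(11, (1, 7), (5, 11)), (6, (1, 3), (9, 11))]`. The first alignment pairs residues 1-7 with 5-11, and the second, found only after those pairs are blocked, pairs 1-3 with 9-11. Each round refills the whole matrix (roughly N times L² work for N alignments of a length-L sequence), which illustrates why this step is the slow one.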
The REPRO program operates in two steps:
- Calculation of a list of N top-scoring non-overlapping local alignments. This is the SLOW step. The parameter N should be specified by the user.
- Graph-based clustering of M top-alignments to assemble the repeat sets. This is the FAST step. The parameter M (M <= N) should be set by the user.
It is advisable to set N to 50-100 in the first step, so that M can easily be varied in the second step if a repeat set is very divergent and difficult to detect.
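The following Python sketch shows why the second step can be rerun cheaply for different values of M once the N alignments exist: it only reads the first M entries of a precomputed, score-sorted list. The grouping itself is a deliberately simplified stand-in for REPRO's consistency-based graph clustering. Segments are graph nodes, each alignment links its two segments, segments sharing any residue are merged, and connected components are found with union-find. The input format `(score, (start1, end1), (start2, end2))` and the name `cluster_top_alignments` are assumptions.

```python
from typing import Dict, List, Set, Tuple

Segment = Tuple[int, int]                 # (start, end), 1-based inclusive
Alignment = Tuple[int, Segment, Segment]  # (score, segment1, segment2)


def cluster_top_alignments(alignments: List[Alignment], m: int) -> List[List[Segment]]:
    """Group the segments of the top m alignments into candidate repeat sets.

    alignments must already be sorted by decreasing score (the N list).
    Simplified stand-in: connected components, not REPRO's consistency test.
    """
    if not 0 < m <= len(alignments):
        raise ValueError("M must satisfy 0 < M <= N")
    segments: List[Segment] = []
    for _, a, b in alignments[:m]:
        segments.extend([a, b])
    parent = list(range(len(segments)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(x: int, y: int) -> None:
        parent[find(x)] = find(y)

    # The two segments of each alignment belong to the same repeat set.
    for k in range(0, len(segments), 2):
        union(k, k + 1)
    # Segments sharing at least one residue are treated as the same repeat.
    for x in range(len(segments)):
        for y in range(x + 1, len(segments)):
            (s1, e1), (s2, e2) = segments[x], segments[y]
            if s1 <= e2 and s2 <= e1:
                union(x, y)
    groups: Dict[int, Set[Segment]] = {}
    for idx, seg in enumerate(segments):
        groups.setdefault(find(idx), set()).add(seg)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])
```

With the hypothetical list `[(20, (1, 10), (11, 20)), (18, (11, 20), (21, 30)), (5, (40, 44), (50, 54))]`, M = 2 yields one set `[(1, 10), (11, 20), (21, 30)]`, while M = 3 adds a second set `[(40, 44), (50, 54)]`. Overlap merging is crude; it is only meant to show how M controls which alignments feed the graph.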
User guide
Repro is simple to use:
Step 1: Calculation of a list of N top-scoring non-overlapping local alignments.
- Paste your sequence into the text area, or specify a sequence file using the upload option.
- The parameters can be altered to optimise your results. The gap opening and extension penalties can be set; the defaults are 10 and 1, respectively. Either the BLOSUM62 or the PAM250 substitution matrix can be chosen; the default is PAM250. Set the number of alignments (N) to between 50 and 100 (default 50).
- Enter your email address. If the query sequence is longer than 700 residues, the location of the personal directory containing your step 1 results will be mailed to you.
- Press 'Run Repro'.
After step 1 is complete you will be notified, and given the location of your personal directory where the results from step 1 are held.
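The step 1 options and defaults described above can be summarised in a small Python sketch. The class `Step1Settings`, the helper `results_are_mailed` and the validation rules are hypothetical. They restate the form's documented defaults (gap penalties 10 and 1, PAM250, N = 50), the two available matrices and the 700-residue threshold for mailed results. The non-negativity check follows the description of gap-i and gap-e for the command-line program.

```python
from dataclasses import dataclass

MATRICES = ("blosum62", "pam250")  # the two substitution matrices offered by the form


@dataclass
class Step1Settings:
    """Hypothetical container for the step 1 form fields and their defaults."""
    gap_open: float = 10.0
    gap_extension: float = 1.0
    matrix: str = "pam250"
    n_alignments: int = 50  # N; the recommended range is 50-100

    def validate(self) -> None:
        """Raise ValueError for settings the form would not offer."""
        if self.matrix not in MATRICES:
            raise ValueError(f"matrix must be one of {MATRICES}")
        if self.gap_open < 0 or self.gap_extension < 0:
            raise ValueError("gap penalties must be non-negative")
        if self.n_alignments < 1:
            raise ValueError("N must be a positive integer")

    def in_recommended_range(self) -> bool:
        """True if N lies in the advised 50-100 range."""
        return 50 <= self.n_alignments <= 100


def results_are_mailed(sequence: str) -> bool:
    """Queries longer than 700 residues have their directory location mailed."""
    return len(sequence) > 700
```

For example, `Step1Settings(matrix="blosum62", n_alignments=100)` passes validation and is in the recommended range, while a 701-residue query would have its results location mailed.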
Step 2: Graph-based clustering of M top-alignments to assemble the repeat sets.
- Follow the hyperlink to the personal directory containing your step 1 results or, if the location was mailed to you, enter your personal directory in the text area.
- Specify the number of top-alignments (M) to assemble the repeat sets.
- Press 'Run Repro'.
Results will be held for up to 7 days. Questions and comments should be sent to: rgeorge@nimr.mrc.ac.uk
Sample Results
Enter your sequence into the text area in step 1, e.g.:
>tf3a.seq
MGEKALPVVYKRYICSFADCGAAYNKNWKLQAHLCKHTGEKPFPCKEEGC
EKGFTSLHHLTRHSLTHTGEKNFTCDSDGCDLRFTTKANMKKHFNRFHNI
KICVYVCHFENCGKAFKKHNQLKVHQFSHTQQLPYECPHEGCDKRFSLPS
RLKRHEKVHAGYPCKKDDSCSFVGKTWTLYLKHVAECHQDLAVCDVCNRK
FRHKDYLRDHQKTHEKERTVYLCPRDGCDRSYTTAFNLRSHIQSFHEEQR
PFVCEHAGCGKCFAMKKSLERHSVVHDPEKRKLKEKCPRPKRSLASRLTG
YIPPKSKEKNASVSGTEKTDSLVKNKPSGTETNGSLVLDKLTIQ
Or upload the sequence using the browse button. The sequence does not have to be in FASTA format. Press the 'Run Repro' button on the form to submit your sequence. The results from step 1 will be saved in your personal directory. When step 1 is complete, a message is displayed giving the location of your personal directory.
Go to your personal directory, where each of the top-scoring alignments is printed with its corresponding positions in the sequence. Complete the form and press the second 'Run Repro' button to run step 2 of the method. The results will appear on the screen.
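Because the query need not be in FASTA format, a tolerant reader is a reasonable way to prepare input locally. This Python sketch is illustrative only: the function `read_sequence` is hypothetical, and the server's actual parsing rules are not documented here. It drops blank lines, FASTA header lines starting with `>` and `;` comment lines, then keeps only alphabetic characters, upper-cased.

```python
def read_sequence(text: str) -> str:
    """Extract a protein sequence from FASTA or plain text (sketch)."""
    residues = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(">") or line.startswith(";"):
            continue  # skip blank lines, FASTA headers and comment lines
        # Keep letters only, so spaces and position numbers are discarded.
        residues.extend(ch.upper() for ch in line if ch.isalpha())
    return "".join(residues)
```

For example, `read_sequence(">tf3a.seq\nMGEKALPVVY\nKRYICSFADC")` returns `"MGEKALPVVYKRYICSFADC"`, and a raw string such as `"mgek alp 12 vvy"` becomes `"MGEKALPVVY"`.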
The final results page is split into four categories: Repeats, Alignments, Evaluation and Fragments.
FTP
You can download REPRO via HTTP from the downloads section; the web interface cannot be downloaded. There are three basic ways to run REPRO:
- repro seqfile a topnum [mismatch-val or Dayhoff-file] gap-i gap-e
- repro seqfile c [topnum] alignment-scripts-file
- repro seqfile topnum [mismatch-val or Dayhoff-file] gap-i gap-e
Repro variables:
Variable | Description
repro | The REPRO executable.
seqfile | The file containing the sequence.
a | (alignment) Compute the alignments only, automatically writing the alignment scripts to an .iscr file.
c | (continue) Continue the analysis using an existing .iscr file.
cv | Verbose continuation (more parameters).
topnum | The number of non-overlapping top-scoring local alignments generated with the 'a' option or analysed with the 'c' option. When continuing, topnum should be less than or equal to the value used to produce the .iscr file.
mismatch-val | Negative real number applied to mismatching residues.
Dayhoff-file | File containing residue substitution values.
gap-i, gap-e | Gap initiation and extension penalties (non-negative real values).
alignment-scripts-file | The .iscr input file containing the alignment scripts from an earlier run.
If neither a nor c is given, both steps are performed (a + c).
To save time, it is advisable to first produce an .iscr file with the 'a' option and a high number of top-scoring alignments (e.g. topnum set to 50), and then use this file for further analysis with the 'c' (or 'cv') option. Producing the non-overlapping top-scoring alignments is very slow compared with inferring the repeats in the subsequent analysis. Using 'l' instead of 'a' (the alignment-only option) writes the alignment scripts file in an alternative format.
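The recommended two-stage workflow ('a' once with a large topnum, then 'c' repeatedly) can be scripted. This Python sketch only assembles argument lists that follow the usage lines above. The helper names, the assumption that `repro` is on the `PATH`, and the example file names are hypothetical. The 'cv' variant is left out because its extra parameters are not documented here, and the bracketed scoring argument is always passed.

```python
import shlex
import subprocess
from typing import List, Optional

REPRO = "repro"  # assumption: the REPRO executable is on PATH under this name


def align_cmd(seqfile: str, topnum: int, scoring: str,
              gap_i: float, gap_e: float, alt_format: bool = False) -> List[str]:
    """Build 'repro seqfile a topnum scoring gap-i gap-e'.

    scoring is either a negative mismatch value (e.g. "-1") or the path of a
    Dayhoff-style substitution file. alt_format uses 'l' instead of 'a',
    which writes the alignment scripts file in the alternative format.
    """
    mode = "l" if alt_format else "a"
    return [REPRO, seqfile, mode, str(topnum), scoring, str(gap_i), str(gap_e)]


def continue_cmd(seqfile: str, iscr_file: str, topnum: Optional[int] = None,
                 produced_topnum: Optional[int] = None) -> List[str]:
    """Build 'repro seqfile c [topnum] alignment-scripts-file'."""
    if topnum is not None and produced_topnum is not None and topnum > produced_topnum:
        raise ValueError("topnum must not exceed the value used to write the .iscr file")
    cmd = [REPRO, seqfile, "c"]
    if topnum is not None:
        cmd.append(str(topnum))  # optional, as in the usage line
    cmd.append(iscr_file)
    return cmd


def run(cmd: List[str]) -> None:
    """Print the shell form of cmd, then execute it (raises if repro fails)."""
    print(shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

After a single slow `run(align_cmd("tf3a.seq", 50, "-1", 10, 1))`, the repeats could be inferred for several values, e.g. `run(continue_cmd("tf3a.seq", iscr, m, produced_topnum=50))` for m in (10, 20, 30), where `iscr` is the .iscr file written by the first run (its name is not documented here). Building argument lists rather than shell strings avoids quoting problems with file names.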
Help
There is a help page which can be accessed throughout the REPRO steps.
Questions and comments regarding the interface to Repro should be sent to Richard George and questions regarding the program Repro should be sent to Jaap Heringa.
References:
- Heringa J. and Argos P. (1993) A method to recognize distant repeats in protein sequences. Proteins Struct. Func. Genet. 17, 391-411.
- Heringa J. (1999) Two strategies for sequence comparison: profile-preprocessed and secondary structure-induced multiple alignment. Comp. Chem. 23, 341-364.
- George R.A. and Heringa J. (2000) The REPRO server: finding protein internal sequence repeats through the web. Trends Biochem. Sci. 25, 515-517.