REPRO - documentation

  1. Introduction
  2. User Guide
  3. Sample Results
  4. FTP
  5. Help
  6. References

Introduction

The program REPRO (Heringa and Argos, 1993) is able to recognise distant repeats in a single query sequence. The technique relies on a variation of the Smith-Waterman local alignment strategy to find non-overlapping top-scoring local alignments, followed by a graph-based iterative clustering procedure to delineate the repeat set(s) based on consistency of the pairwise top-alignments. REPRO is able to detect multiple repeat types within a single protein sequence.

The REPRO program operates in two steps:

  1. Calculation of a list of N top-scoring non-overlapping local alignments. This is the SLOW step. The parameter N should be specified by the user.
  2. Graph-based clustering of M top-alignments to assemble the repeat sets. This is the FAST step. The parameter M (M <= N) should be set by the user.
It is advisable to set N to 50-100 in the first step so that M can be varied easily in the second step, should a repeat set be very divergent and difficult to detect.
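
As a rough illustration of the first step, the sketch below finds the single best local alignment of a sequence against itself with a Smith-Waterman-style recursion. It is not the REPRO implementation: the function name, the identity-based scoring (match, mismatch), the linear gap penalty and the restriction to one top alignment are all simplifying assumptions made here. REPRO itself takes a mismatch value or substitution matrix plus gap initiation and extension penalties, and reports N non-overlapping alignments.

```python
def best_self_alignment(seq, match=2.0, mismatch=-1.0, gap=2.0):
    """Best local alignment of seq against itself (illustrative only).

    Returns (score, (start1, end1), (start2, end2)) with 0-based,
    half-open ranges, or (0.0, None, None) if nothing scores above zero.
    The scoring parameters are assumptions, not REPRO defaults.
    """
    n = len(seq)
    # score[i][j]: best score of a local alignment ending at
    # positions i and j (1-based) of the two copies of the sequence.
    score = [[0.0] * (n + 1) for _ in range(n + 1)]
    # origin[i][j]: cell in which that alignment begins.
    origin = [[None] * (n + 1) for _ in range(n + 1)]
    best = (0.0, None, None)

    for i in range(1, n + 1):
        # Only j > i is filled, which excludes the trivial match of the
        # sequence with itself along the main diagonal.
        for j in range(i + 1, n + 1):
            pair = match if seq[i - 1] == seq[j - 1] else mismatch
            diag_origin = origin[i - 1][j - 1] if score[i - 1][j - 1] > 0 else (i, j)
            value, start = max(
                (score[i - 1][j - 1] + pair, diag_origin),   # extend with a pair
                (score[i - 1][j] - gap, origin[i - 1][j]),   # gap in second copy
                (score[i][j - 1] - gap, origin[i][j - 1]),   # gap in first copy
                key=lambda option: option[0],
            )
            if value > 0:  # local alignment: non-positive totals stay at zero
                score[i][j] = value
                origin[i][j] = start
                if value > best[0]:
                    best = (value, (start[0] - 1, i), (start[1] - 1, j))
    return best


print(best_self_alignment("ABCDEXXABCDE"))  # → (10.0, (0, 5), (7, 12))
```

The table has one cell for every pair of positions i < j, so the work grows quadratically with sequence length for each alignment, which is consistent with the first step being the slow one. This sketch does not prevent the two aligned segments from overlapping each other in the sequence.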

User guide

Repro is simple to use:

Step 1

Calculation of a list of N top-scoring non-overlapping local alignments.
  • Paste your sequence into the text area, or specify a sequence file using the upload option.
  • The parameters can be altered to optimise your results. Gap opening and extension penalties can be selected; the defaults are 10 and 1 respectively. A choice is given between the blosum62 and pam250 substitution matrices, the default being pam250. Set the number of alignments (N) to between 50 and 100 (default 50).
  • Enter your email address. If the query sequence is longer than 700 residues, the location of the personal directory containing your step 1 results will be mailed to you.
  • Press 'Run Repro'.

After step 1 is complete you will be notified, and given the location of your personal directory where the results from step 1 are held.
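
The Step 1 settings described above can be summarised in a small sketch. The class and function names below are hypothetical and are not part of the REPRO server or program; only the defaults (gap penalties 10 and 1, pam250, N = 50), the recommended range for N and the 700-residue email threshold are taken from this guide.

```python
from dataclasses import dataclass

EMAIL_LENGTH_THRESHOLD = 700  # residues, as stated in the user guide


@dataclass
class Step1Settings:
    """Hypothetical container for the Step 1 form fields and their defaults."""
    gap_open: float = 10.0
    gap_extend: float = 1.0
    matrix: str = "pam250"
    n_alignments: int = 50

    def problems(self):
        """Return a list of human-readable issues (empty if none)."""
        issues = []
        if self.matrix not in ("blosum62", "pam250"):
            issues.append(f"unknown matrix: {self.matrix}")
        if self.gap_open < 0 or self.gap_extend < 0:
            issues.append("gap penalties should be non-negative")
        if not 50 <= self.n_alignments <= 100:
            issues.append("N is outside the recommended range 50-100")
        return issues


def clean_sequence(text):
    """Drop optional FASTA header lines and all whitespace."""
    residues = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            continue  # header line, not part of the sequence
        residues.append("".join(line.split()))
    return "".join(residues)


def results_sent_by_email(sequence):
    """True if the query is longer than the 700-residue threshold."""
    return len(sequence) > EMAIL_LENGTH_THRESHOLD


query = clean_sequence(">tf3a.seq\nMGEKALPVVYKRYICSFADCGAAYNKNWKLQAHLCKHTGEKPFPCKEEGC\n")
print(Step1Settings().problems(), results_sent_by_email(query))
```

Whether the server rejects values of N outside 50-100 is not stated in this guide, so the check reports it as an issue rather than raising an error.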

Step 2

Graph-based clustering of M top-alignments to assemble the repeat sets.
  • Follow the hyperlink to the personal directory containing your results or, if your results from step 1 were mailed to you, enter your personal directory into the text area.
  • Specify the number of top-alignments (M) to assemble the repeat sets.
  • Press 'Run Repro'.
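
To show why M can be varied cheaply once the N alignments exist, here is a deliberately simplified stand-in for the clustering step. It is not the graph-based iterative procedure of Heringa and Argos (1993): the function name, the input format and the rule that overlapping segments belong to the same repeat copy are all assumptions made for illustration. Each alignment links two sequence segments, and connected groups of segments are reported as candidate repeat sets.

```python
def cluster_top_alignments(alignments, m):
    """Group segments linked by the m best alignments (illustrative only).

    alignments: list of (score, (start1, end1), (start2, end2)) with
    0-based, half-open segment ranges. Returns a sorted list of
    candidate repeat sets, each a sorted list of segments.
    """
    top = sorted(alignments, key=lambda a: a[0], reverse=True)[:m]

    # Union-find over segments; every segment starts as its own group.
    parent = {}
    for _, first, second in top:
        parent.setdefault(first, first)
        parent.setdefault(second, second)

    def find(seg):
        while parent[seg] != seg:
            parent[seg] = parent[parent[seg]]  # path halving
            seg = parent[seg]
        return seg

    def union(x, y):
        parent[find(x)] = find(y)

    # An alignment is an edge between its two segments.
    for _, first, second in top:
        union(first, second)

    # Assumption: segments that overlap in the sequence are the same copy.
    segments = list(parent)
    for index, x in enumerate(segments):
        for y in segments[index + 1:]:
            if x[0] < y[1] and y[0] < x[1]:
                union(x, y)

    groups = {}
    for seg in segments:
        groups.setdefault(find(seg), []).append(seg)
    return sorted(sorted(group) for group in groups.values())


# Made-up alignments standing in for the output of step 1.
alignments = [
    (30, (0, 10), (20, 30)),
    (25, (21, 31), (40, 50)),
    (12, (60, 65), (80, 85)),
]
for m in (2, 3):
    print(m, cluster_top_alignments(alignments, m))
```

Because the alignment list is reused unchanged, trying a different M only repeats this inexpensive grouping, not the alignment calculation.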

Results will be held for up to 7 days. Questions and comments should be sent to: rgeorge@nimr.mrc.ac.uk

Sample Results


Enter your sequence into the text area in Step 1, e.g.:

                 >tf3a.seq
                 MGEKALPVVYKRYICSFADCGAAYNKNWKLQAHLCKHTGEKPFPCKEEGC 
                 EKGFTSLHHLTRHSLTHTGEKNFTCDSDGCDLRFTTKANMKKHFNRFHNI
                 KICVYVCHFENCGKAFKKHNQLKVHQFSHTQQLPYECPHEGCDKRFSLPS
                 RLKRHEKVHAGYPCKKDDSCSFVGKTWTLYLKHVAECHQDLAVCDVCNRK
                 FRHKDYLRDHQKTHEKERTVYLCPRDGCDRSYTTAFNLRSHIQSFHEEQR
                 PFVCEHAGCGKCFAMKKSLERHSVVHDPEKRKLKEKCPRPKRSLASRLTG
                 YIPPKSKEKNASVSGTEKTDSLVKNKPSGTETNGSLVLDKLTIQ 

Alternatively, upload the sequence using the browse button. The sequence does not have to be in FASTA format. Press the 'Run Repro' button on the form to submit your sequence. The results from step 1 will be saved in your personal directory, and a message is displayed when step 1 is complete.

Go to your personal directory, where each of the top-scoring alignments is printed with its corresponding positions in the sequence. Complete the form and press the second 'Run Repro' button to run step 2 of the method. The results will appear on the screen.

The final results page is split into four categories: Repeats, Alignments, Evaluation and Fragments.


FTP

You can download REPRO via HTTP from the downloads section. The interface cannot be downloaded. There are three basic ways to run REPRO:

  • repro seqfile a topnum [mismatch-val or Dayhoff-file] gap-i gap-e
  • repro seqfile c [topnum] alignment-scripts-file
  • repro seqfile topnum [mismatch-val or Dayhoff-file] gap-i gap-e
Repro Variables

  • repro: The repro executable.
  • seqfile: The file containing the sequence.
  • a: (alignment) implies making alignments only (and automatically writing scripts to .iscr file).
  • c: (continue) implies continuation with analysis using a .iscr file.
  • cv: Implies verbose continuation (more parameters). No a or c implies a + c.
  • topnum: The number of non-overlapping top-scoring local alignments, generated using the 'a' option or analysed using 'c' option (topnum should be smaller or equal to that in previously produced .iscr file).
  • mismatch-val: Negative real number applied to mismatching residues.
  • Dayhoff-file: File containing residue substitution values.
  • gap-i, gap-e: Gap initiation and extension penalties (non-negative real value).
  • alignment-scripts-file: The .iscr input file containing scripts from earlier run.

To save time, it is advisable to first produce an .iscr file ('a' option) with a high number of top-scoring alignments (e.g. set topnum to 50) and then use this file for further analysis with the 'c' (or 'cv') option. Producing the non-overlapping top-scoring alignments is very slow compared to inferring the repeats during subsequent analysis. Using 'l' instead of 'a' (alignment-only option) yields an alignment scripts file in an alternative format.
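
The recommended two-stage use can be sketched as a pair of helpers that assemble the argument lists from the usage lines above. The helper names and the file names (tf3a.seq, pam250.mat, tf3a.iscr) are hypothetical; in particular, how the .iscr file is named is not stated here. The sketch also assumes that the scoring argument (mismatch-val or Dayhoff-file) is always supplied.

```python
def scoring_command(seqfile, topnum, scoring, gap_i, gap_e, mode="a", exe="repro"):
    """Build: repro seqfile [mode] topnum scoring gap-i gap-e.

    mode "a" (or "l") makes alignments only; mode None omits the
    option, which the documentation describes as a + c.
    scoring is a mismatch value or a Dayhoff-file path.
    """
    command = [exe, seqfile]
    if mode is not None:
        command.append(mode)
    command += [str(topnum), str(scoring), str(gap_i), str(gap_e)]
    return command


def continue_command(seqfile, scripts_file, topnum=None, verbose=False, exe="repro"):
    """Build: repro seqfile c [topnum] alignment-scripts-file (cv if verbose)."""
    command = [exe, seqfile, "cv" if verbose else "c"]
    if topnum is not None:
        command.append(str(topnum))
    command.append(scripts_file)
    return command


# Slow stage once, with a generous topnum.
print(" ".join(scoring_command("tf3a.seq", 50, "pam250.mat", 10, 1)))
# Fast stage, repeated with smaller topnum values as needed.
for topnum in (20, 30):
    print(" ".join(continue_command("tf3a.seq", "tf3a.iscr", topnum=topnum)))
```

Each returned list can be passed to Python's subprocess.run to execute the command, provided a repro executable is on the search path.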


Help

There is a help page which can be accessed throughout the REPRO steps.

Questions and comments regarding the interface to Repro should be sent to Richard George and questions regarding the program Repro should be sent to Jaap Heringa.

References:

  1. Heringa J. and Argos P. (1993) A method to recognize distant repeats in protein sequences. Proteins Struct. Func. Genet. 17, 391-411.
  2. Heringa J. (1999) Two strategies for sequence comparison: profile-preprocessed and secondary structure-induced multiple alignment. Comp. Chem. 23, 341-364.
  3. George RA. and Heringa J. (2000) The REPRO server: finding protein internal sequence repeats through the web. Trends Biochem. Sci. 25, 515-517.


(c) IBIVU 2025. If you are experiencing problems with the site, please contact the webmaster.