Input help of RJPrimers
- Repeat databases
- To date, RJPrimers includes several major repeat libraries for plant and animal genomes: RepBase, MIPS-REdat, 15 TIGR plant repeat databases, the maize transposable element database (maize TEDB), and the Triticeae repeat database (TREP). RepBase is a well-organized and annotated plant and animal repeat database, and is the default library for the RepeatMasker program. MIPS-REdat was compiled from several major plant repeat libraries using the MIPS Repeat Element Catalog (mips-REcat), a systematic hierarchical tree structure of repeat classifications. However, these repeat databases use inconsistent classification terms and cannot be used directly to distinguish repeat junction types clearly. To unify the repeat classifications as a controlled vocabulary of terms, we used the unified classification system proposed by Wicker et al., which includes the levels class, subclass, order, superfamily, family and subfamily. Because many repeat sequences have no clear annotation at the subclass, family and subfamily levels, we used only three levels (class, order and superfamily) to describe the repeat junction types. All databases were recompiled so that the sequence header lines follow a specific format for RJPrimers. The recompiled repeat databases can also be downloaded together with the RJPrimers pipeline.
A user can choose one or more repeat databases. However, some databases are supersets of others; for example, TIGR Gramineae Repeats includes the sorghum, rice and barley repeats. To avoid database redundancy, please do not choose the Gramineae repeat database together with any of its component databases.
- Input sequences in FASTA format
- Sequences in FASTA format can be provided either by copying and pasting them into the sequence text box or by uploading a sequence file. RJPrimers takes DNA sequences, and any unrecognized characters in the sequences are automatically removed. To balance the workload on the server, the RJPrimers web server accepts at most 200 sequences per job. BAC end sequences, random shotgun sequences and next generation sequences/reads (Roche 454) can all be used for TE-based primer design.
Sample sequences are provided for testing. They are Roche 454 reads of Aegilops tauschii, the diploid ancestor of hexaploid wheat. When using them, please choose TREP, TIGR Triticum Repeats, or TIGR Gramineae Repeats as the repeat database.
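The input handling described above can be sketched as follows. This is only an illustration, not the RJPrimers implementation: the function names are hypothetical, and the set of "recognized" characters (A, C, G, T and N) is an assumption, since the help text does not list them.

```python
import re

MAX_SEQUENCES = 200  # web-server limit stated in the help text


def parse_fasta(text):
    """Return a list of (sequence_id, sequence) tuples from FASTA text."""
    records = []
    seq_id, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                records.append((seq_id, "".join(chunks)))
            # Assumption: the id is the first whitespace-delimited word of the header.
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            chunks = []
        elif seq_id is not None:
            # Assumption: only A, C, G, T and N are recognized; everything else is dropped.
            chunks.append(re.sub(r"[^ACGTN]", "", line.upper()))
    if seq_id is not None:
        records.append((seq_id, "".join(chunks)))
    return records


def check_batch_size(records, limit=MAX_SEQUENCES):
    """Raise ValueError when a batch exceeds the server limit."""
    if len(records) > limit:
        raise ValueError(f"{len(records)} sequences submitted; the limit is {limit}")
    return records


records = check_batch_size(parse_fasta(">read1 sample\nACGT-NN acgt\n>read2\nAC1GT\n"))
print(records)
```

Here the dash, the space and the digit are stripped, and lower-case bases are upper-cased before filtering.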
- Minimum E value cutoff
- An expectation value (E-value) cutoff for any BLAST hit. Two E-value cutoffs are used to reduce false positive repeat junctions. The first is the minimum top hit E-value cutoff: a query sequence is considered significantly positive only if the E-value of its top BLAST hit is less than or equal to this cutoff. The second is the minimum E-value cutoff: once a query sequence is significantly positive, all of its hits that meet this cutoff are also treated as significantly positive. Based on our observations, 1E-50 for the minimum top hit E-value cutoff and 1E-5 for the minimum E-value cutoff are reasonable values. Used together, the two cutoffs can effectively detect "distant homologs" of known TE families, specifically the diverse regions between two LTRs. A complete LTR element usually has two highly conserved LTRs at its ends, but the interior sequence between them is usually diverse and cannot be detected at a stringent E-value cutoff. In fact, only the repeat junctions at the left side of the left LTR and at the right side of the right LTR are true positives. The junctions between the left LTR and the interior fragment, and between the interior fragment and the right LTR, must be false positives. If the interior fragment is not detected, these false positive junctions will be reported.
- Minimum top hit E value cutoff
- An expectation value (E-value) cutoff for the top BLAST hit of a query sequence. A query sequence is considered significantly positive only if the E-value of its top BLAST hit is less than or equal to this cutoff. Based on our observations, 1E-50 is a reasonable value. How this cutoff works together with the minimum E-value cutoff is described under Minimum E value cutoff above.
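The two-cutoff rule can be expressed in a few lines. The sketch below is an illustration under stated assumptions, not the code used by RJPrimers: the function name and the (subject id, E-value) tuple layout are hypothetical, and the defaults are the 1E-50 and 1E-5 values suggested above.

```python
def filter_hits(hits, top_hit_cutoff=1e-50, min_cutoff=1e-5):
    """Apply the two E-value cutoffs to the BLAST hits of one query.

    hits is a list of (subject_id, evalue) tuples (hypothetical layout).
    """
    if not hits:
        return []
    # Step 1: the query is significant only if its best hit passes the strict cutoff.
    best_evalue = min(evalue for _, evalue in hits)
    if best_evalue > top_hit_cutoff:
        return []
    # Step 2: for a significant query, keep every hit that passes the relaxed cutoff.
    return [(subject, evalue) for subject, evalue in hits if evalue <= min_cutoff]


# A query with a strong LTR hit keeps its weaker interior hit as well.
print(filter_hits([("left_LTR", 1e-80), ("interior", 1e-8), ("weak", 0.5)]))
# A query whose best hit is only 1e-20 is rejected entirely.
print(filter_hits([("x", 1e-20), ("y", 1e-8)]))
```

Keeping the weaker interior hit is the point of the second cutoff: once the interior fragment is annotated as part of the element, the LTR/interior boundaries are no longer mistaken for repeat junctions.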
- Transposable elements used in primer design
- RBIP and IRAP were originally developed using only class I repeats (retrotransposons) for marker development. In RJPrimers, both methods are extended to all classes of repeat elements. Users can choose to use retrotransposons only or all transposable elements.
- Sequence Id
- An identifier that is reproduced in the output to enable you to identify the chosen primers. This is disabled in batch primer mode, where sequence ids must instead be provided in the FASTA headers of the uploaded file or pasted sequences.
- E-mail address:
- An e-mail address can optionally be provided. If one is available, the primer design report will be sent to it. This is intended for batch primer design on a large volume of sequences, which can take more than 5 minutes. BatchRJPrimers implements an asynchronous mode with an email alert: the primer design job is executed in a separate thread, and when the job finishes, an email containing the result report is sent to the user.
- Targets
- If one or more Targets is specified then a legal primer pair must flank at least one of them. A Target might be a simple sequence repeat site (for example a CA repeat) or a single-base-pair polymorphism. The value should be a space-separated list of start,length pairs where start is the index of the first base of a Target, and length is its length.
- Excluded Regions
- Primer oligos may not overlap any region specified in this tag. The associated value must be a space-separated list of start,length pairs where start is the index of the first base of the excluded region, and length is its length. This tag is useful for tasks such as excluding regions of low sequence quality or for excluding regions containing repetitive elements such as ALUs or LINEs.
- Product Size
- Minimum, Optimum, and Maximum lengths (in bases) of the PCR product. RJPrimers will not generate primers with products shorter than Min or longer than Max, and with default arguments RJPrimers will attempt to pick primers producing products close to the Optimum length.
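The start,length lists used by Targets and Excluded Regions, together with the product size limits, can be illustrated with a small check. This is a sketch under assumptions, not RJPrimers code: all function names are hypothetical, each primer is given as the (start, length) of the sequence interval it covers, and "flank" is interpreted as the amplicon containing the whole target.

```python
def parse_regions(value):
    """Parse 'start,length start,length ...' into a list of (start, length) tuples."""
    regions = []
    for token in value.split():
        start, length = token.split(",")
        regions.append((int(start), int(length)))
    return regions


def overlaps(a_start, a_len, b_start, b_len):
    """True when the two intervals share at least one base."""
    return a_start < b_start + b_len and b_start < a_start + a_len


def pair_is_legal(left, right, targets, excluded, product_min, product_max):
    """Check product size, excluded regions and targets for one primer pair."""
    left_start, _ = left
    right_start, right_len = right
    product_end = right_start + right_len  # one past the last amplified base
    product_size = product_end - left_start
    if not product_min <= product_size <= product_max:
        return False
    for start, length in (left, right):
        if any(overlaps(start, length, ex_start, ex_len) for ex_start, ex_len in excluded):
            return False
    if targets:
        # Assumption: a pair flanks a target when the amplicon contains it entirely.
        return any(left_start <= t_start and t_start + t_len <= product_end
                   for t_start, t_len in targets)
    return True


targets = parse_regions("50,2")
excluded = parse_regions("1,10 90,5")
print(pair_is_legal((20, 20), (100, 20), targets, excluded, 80, 150))
print(pair_is_legal((5, 20), (100, 20), targets, excluded, 80, 150))
```

The second pair is rejected because its left primer overlaps the excluded region that starts at base 1.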
- Number To Return
- The maximum number of primer pairs to return. Primer pairs returned are sorted by their "quality", in other words by the value of the objective function (where a lower number indicates a better primer pair). Caution: setting this parameter to a large value will increase running time.
- Max 3' Stability
- The maximum stability for the five 3' bases of a left or right primer. Bigger numbers mean more stable 3' ends. The value is the maximum delta G for duplex disruption for the five 3' bases as calculated using the nearest neighbor parameters published in Breslauer, Frank, Bloeker and Marky, Proc. Natl. Acad. Sci. USA, vol 83, pp 3746-3750. Rychlik recommends a maximum value of 9 (Wojciech Rychlik, "Selection of Primers for Polymerase Chain Reaction" in BA White, Ed., "Methods in Molecular Biology, Vol. 15: PCR Protocols: Current Methods and Applications", 1993, pp 31-40, Humana Press, Totowa NJ).
- Max Mispriming
- The maximum allowed weighted similarity with any sequence in Mispriming Library. Default is 12.
- Pair Max Mispriming
- The maximum allowed sum of weighted similarities of a primer pair (one similarity for each primer) with any single sequence in Mispriming Library. Default is 24.
- Primer Size
- Minimum, Optimum, and Maximum lengths (in bases) of a primer oligo. RJPrimers will not pick primers shorter than Min or longer than Max, and with default arguments will attempt to pick primers with a size close to Opt. Min cannot be smaller than 1. Max cannot be larger than 36. (This limit is governed by the maximum oligo size for which melting-temperature calculations are valid.) Min cannot be greater than Max.
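The three constraints on Primer Size can be checked before a job is submitted. The helper below is a hypothetical sketch (the function name and messages are not part of RJPrimers); it encodes only the rules stated above.

```python
def primer_size_errors(min_size, max_size, hard_max=36):
    """Return a list of messages for violated Primer Size constraints."""
    errors = []
    if min_size < 1:
        errors.append("Min cannot be smaller than 1")
    if max_size > hard_max:
        errors.append(f"Max cannot be larger than {hard_max}")
    if min_size > max_size:
        errors.append("Min cannot be greater than Max")
    return errors


print(primer_size_errors(18, 27))  # no violations
print(primer_size_errors(0, 40))   # two violations
```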
- Primer Tm
- Minimum, Optimum, and Maximum melting temperatures (Celsius) for a primer oligo. RJPrimers will not pick oligos with temperatures lower than Min or higher than Max, and with default conditions will try to pick primers with melting temperatures close to Opt. RJPrimers uses the oligo melting temperature formula given in Rychlik, Spencer and Rhoads, Nucleic Acids Research, vol 18, num 21, pp 6409-6412 and Breslauer, Frank, Bloeker and Marky, Proc. Natl. Acad. Sci. USA, vol 83, pp 3746-3750. Please refer to the former paper for background discussion.
- Maximum Tm Difference
- Maximum acceptable (unsigned) difference between the melting temperatures of the left and right primers. To increase primer specificity, 5 is suggested.
- Product Tm
- The minimum, optimum, and maximum melting temperature of the amplicon. RJPrimers will not pick a product with melting temperature less than min or greater than max. If Opt is supplied and the Penalty Weights for Product Size are non-0 RJPrimers will attempt to pick an amplicon with melting temperature close to Opt. RJPrimers calculates product melting temperature using equation (iii) from Rychlik, Spencer and Rhoads, Nucleic Acids Research, vol 18, num 21, p 6410.
- Primer GC%
- Minimum, Optimum, and Maximum percentage of Gs and Cs in any primer.
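For reference, the GC percentage of a primer is simply the share of G and C bases in it. A minimal sketch follows; the function names are hypothetical and no RJPrimers default values are implied.

```python
def gc_percent(primer):
    """Percentage of G and C bases in a primer."""
    primer = primer.upper()
    if not primer:
        raise ValueError("empty primer")
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def gc_in_range(primer, gc_min, gc_max):
    """True when the primer's GC% lies within [gc_min, gc_max]."""
    return gc_min <= gc_percent(primer) <= gc_max


print(gc_percent("ATGC"))               # 2 of 4 bases are G or C
print(gc_in_range("ATATATAT", 20, 80))  # 0% GC is outside the range
```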
- Max Complementarity
- The maximum allowable local alignment score when testing a single primer for (local) self-complementarity and the maximum allowable local alignment score when testing for complementarity between left and right primers. Local self-complementarity is taken to predict the tendency of primers to anneal to each other without necessarily causing self-priming in the PCR. The scoring system gives 1.00 for complementary bases, -0.25 for a match of any base (or N) with an N, -1.00 for a mismatch, and -2.00 for a gap. Only single-base-pair gaps are allowed. For example, the alignment
    5' ATCGNA 3'
       || | |
    3' TA-CGT 5'
is allowed (and yields a score of 1.75), but the alignment
    5' ATCCGNA 3'
       ||  | |
    3' TA--CGT 5'
is not considered. Scores are non-negative, and a score of 0.00 indicates that there is no reasonable local alignment between two oligos.
- Max 3' Complementarity
- The maximum allowable 3'-anchored global alignment score when testing a single primer for self-complementarity, and the maximum allowable 3'-anchored global alignment score when testing for complementarity between left and right primers. The 3'-anchored global alignment score is taken to predict the likelihood of PCR-priming primer-dimers, for example
    5' ATGCCCTAGCTTCCGGATG 3'
                 ||| |||||
              3' AAGTCCTACATTTAGCCTAGT 5'
or
    5' AGGCTATGGGCCTCGCGA 3'
                   ||||||
                3' AGCGCTCCGGGTATCGGA 5'
The scoring system is as for the Max Complementarity argument. In the examples above the scores are 7.00 and 6.00 respectively. Scores are non-negative, and a score of 0.00 indicates that there is no reasonable 3'-anchored global alignment between two oligos. In order to estimate 3'-anchored global alignments for candidate primers and primer pairs, RJPrimers assumes that the sequence from which to choose primers is presented 5'->3'. It is nonsensical to provide a larger value for this parameter than for the Maximum (local) Complementarity parameter because the score of a local alignment will always be at least as great as the score of a global alignment.
- Max Poly-X
- The maximum allowable length of a mononucleotide repeat, for example AAAAAA. In next generation sequencing, homopolymer errors are a common type of sequencing error. To avoid them, Max Poly-X should be restricted to a small number, e.g., 2 or 3.
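The scoring scheme described under Max Complementarity and Max 3' Complementarity above can be checked by hand on the example alignments. The sketch below only scores an alignment that is already given; it does not search for the best alignment, and the function names are hypothetical rather than taken from RJPrimers or primer3.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def column_score(top, bottom):
    """Score one alignment column: gap -2, N -0.25, complementary +1, mismatch -1."""
    if top == "-" or bottom == "-":
        return -2.0
    if top == "N" or bottom == "N":
        return -0.25
    return 1.0 if COMPLEMENT.get(top) == bottom else -1.0


def alignment_score(top_5to3, bottom_3to5):
    """Sum the column scores of two aligned strings of equal length."""
    if len(top_5to3) != len(bottom_3to5):
        raise ValueError("aligned strings must have the same length")
    return sum(column_score(t, b) for t, b in zip(top_5to3, bottom_3to5))


def three_prime_overlap_score(top_5to3, bottom_3to5, overlap):
    """Ungapped score of the 3' ends of both oligos over `overlap` columns."""
    return alignment_score(top_5to3[-overlap:], bottom_3to5[:overlap])


print(alignment_score("ATCGNA", "TA-CGT"))
print(three_prime_overlap_score("ATGCCCTAGCTTCCGGATG", "AAGTCCTACATTTAGCCTAGT", 9))
print(three_prime_overlap_score("AGGCTATGGGCCTCGCGA", "AGCGCTCCGGGTATCGGA", 6))
```

The three calls reproduce the 1.75, 7.00 and 6.00 scores quoted for the example alignments: the first has four complementary columns, one gap and one N; the second has eight complementary columns and one mismatch; the third has six complementary columns.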
- Included Region
- A sub-region of the given sequence in which to pick primers. For example, often the first dozen or so bases of a sequence are vector, and should be excluded from consideration. The value for this parameter has the form start,length where start is the index of the first base to consider, and length is the number of subsequent bases in the primer-picking region.
- Start Codon Position
- This parameter should be considered EXPERIMENTAL at this point. Please check the output carefully; some erroneous inputs might cause an error in RJPrimers. Index of the first base of a start codon. This parameter allows RJPrimers to select primer pairs to create in-frame amplicons e.g. to create a template for a fusion protein. RJPrimers will attempt to select an in-frame left primer, ideally starting at or to the left of the start codon, or to the right if necessary. Negative values of this parameter are legal if the actual start codon is to the left of available sequence. If this parameter is non-negative RJPrimers signals an error if the codon at the position specified by this parameter is not an ATG. A value less than or equal to -10^6 indicates that RJPrimers should ignore this parameter. RJPrimers selects the position of the right primer by scanning right from the left primer for a stop codon. Ideally the right primer will end at or after the stop codon.
- Mispriming Library
- This selection indicates what mispriming library (if any) RJPrimers should use to screen for interspersed repeats or for other sequence to avoid as a location for primers.
- CG Clamp
- Require the specified number of consecutive Gs and Cs at the 3' end of both the left and right primer. (This parameter has no effect on the hybridization oligo if one is requested.)
- Salt Concentration
- The millimolar concentration of salt (usually KCl) in the PCR. RJPrimers uses this argument to calculate oligo melting temperatures.
- Annealing Oligo Concentration
- The nanomolar concentration of annealing oligos in the PCR. RJPrimers uses this argument to calculate oligo melting temperatures. The default (50nM) works well with the standard protocol used at the Whitehead/MIT Center for Genome Research: 0.5 microliters of 20 micromolar concentration for each primer oligo in a 20 microliter reaction with 10 nanograms template, 0.025 units/microliter Taq polymerase in 0.1 mM each dNTP, 1.5mM MgCl2, 50mM KCl, 10mM Tris-HCl (pH 9.3) using 35 cycles with an annealing temperature of 56 degrees Celsius. This parameter corresponds to 'c' in Rychlik, Spencer and Rhoads' equation (ii) (Nucleic Acids Research, vol 18, num 21) where a suitable value (for a lower initial concentration of template) is "empirically determined". The value of this parameter is less than the actual concentration of oligos in the reaction because it is the concentration of annealing oligos, which in turn depends on the amount of template (including PCR product) in a given cycle. This concentration increases a great deal during a PCR; fortunately PCR seems quite robust for a variety of oligo melting temperatures.
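To see why the parameter value is lower than the actual oligo concentration, the total primer concentration in the standard protocol above can be worked out with the usual dilution relation C1*V1 = C2*V2. The helper name is hypothetical; the numbers are those quoted in the protocol.

```python
def final_concentration_nM(stock_uM, stock_volume_uL, reaction_volume_uL):
    """Dilution C1*V1 = C2*V2, converted from micromolar to nanomolar."""
    return stock_uM * stock_volume_uL / reaction_volume_uL * 1000.0


total_nM = final_concentration_nM(stock_uM=20, stock_volume_uL=0.5, reaction_volume_uL=20)
default_annealing_nM = 50  # default value of this parameter

print(total_nM)                         # total primer concentration in the reaction
print(total_nM / default_annealing_nM)  # ratio of total to assumed annealing concentration
```

Each primer is present at 500 nM in total, ten times the 50 nM default, which is consistent with the statement that only a fraction of the oligos is annealing in any given cycle.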
- Max Ns Accepted
- Maximum number of unknown bases (N) allowable in any primer. Normally this value should be 0.
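Three of the simpler per-primer filters described above (Max Poly-X, CG Clamp and Max Ns Accepted) reduce to short string checks. The following is a minimal sketch with hypothetical function names, not the checks as implemented in RJPrimers.

```python
from itertools import groupby


def longest_homopolymer(primer):
    """Length of the longest run of a single repeated base."""
    return max((len(list(run)) for _, run in groupby(primer.upper())), default=0)


def gc_clamp_length(primer):
    """Number of consecutive G or C bases at the 3' end of the primer."""
    count = 0
    for base in reversed(primer.upper()):
        if base not in ("G", "C"):
            break
        count += 1
    return count


def passes_simple_filters(primer, max_poly_x, gc_clamp, max_ns):
    """Apply the Max Poly-X, CG Clamp and Max Ns Accepted checks to one primer."""
    primer = primer.upper()
    return (longest_homopolymer(primer) <= max_poly_x
            and gc_clamp_length(primer) >= gc_clamp
            and primer.count("N") <= max_ns)


print(longest_homopolymer("ACGTTTTAC"))
print(passes_simple_filters("ACGTTTTAGC", max_poly_x=3, gc_clamp=2, max_ns=0))
print(passes_simple_filters("ACGTTAGC", max_poly_x=3, gc_clamp=2, max_ns=0))
```

The first primer passed to `passes_simple_filters` fails because it contains a run of four Ts; the second has no run longer than two, ends in GC and contains no N.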
- Liberal Base
- This parameter provides a quick-and-dirty way to get RJPrimers to accept IUB / IUPAC codes for ambiguous bases (i.e. by changing all unrecognized bases to N). If you wish to include an ambiguous base in an oligo, you must set Max Ns Accepted to a non-0 value. Perhaps '-' and '*' should be squeezed out rather than changed to 'N', but currently they are simply converted to Ns. The authors invite user comments.
- First Base Index
- The index of the first base in the input sequence. For input and output using 1-based indexing (such as that used in GenBank and to which many users are accustomed) set this parameter to 1. For input and output using 0-based indexing set this parameter to 0. (This parameter also affects the indexes in the contents of the files produced when the primer file flag is set.) In the WWW interface this parameter defaults to 1.
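The effect of First Base Index on start,length values can be illustrated by converting them to Python's 0-based slices. This is a sketch with hypothetical helper names, not RJPrimers code.

```python
def to_offset(start, first_base_index):
    """Convert a user-facing start position to a 0-based offset."""
    return start - first_base_index


def extract(sequence, start, length, first_base_index=1):
    """Return the `length` bases beginning at `start` under the chosen indexing."""
    offset = to_offset(start, first_base_index)
    if offset < 0 or offset + length > len(sequence):
        raise ValueError("region lies outside the sequence")
    return sequence[offset:offset + length]


seq = "ACGTACGT"
print(extract(seq, 3, 4, first_base_index=1))  # 1-based: bases 3 to 6
print(extract(seq, 3, 4, first_base_index=0))  # 0-based: bases 3 to 6, one position further right
```

The same "3,4" value therefore selects different bases depending on this parameter, which is why it must match the convention used for Targets, Excluded Regions and Included Region.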
- Inside Target Penalty
- Non-default values valid only for sequences with 0 or 1 target regions. If the primer is part of a pair that spans a target and overlaps the target, then multiply this value times the number of nucleotide positions by which the primer overlaps the (unique) target to get the 'position penalty'. The effect of this parameter is to allow RJPrimers to include overlap with the target as a term in the objective function.
- Outside Target Penalty
- Non-default values valid only for sequences with 0 or 1 target regions. If the primer is part of a pair that spans a target and does not overlap the target, then multiply this value times the number of nucleotide positions from the 3' end to the (unique) target to get the 'position penalty'. The effect of this parameter is to allow RJPrimers to include nearness to the target as a term in the objective function.
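The two position penalties can be sketched for a left primer as follows. This is an illustration under assumptions rather than the RJPrimers or primer3 implementation: the function name is hypothetical, the primer is assumed to start at or to the left of the target, and the distance for the outside case is counted as the number of bases strictly between the primer's 3' end and the target.

```python
def left_primer_position_penalty(primer_start, primer_length,
                                 target_start, target_length,
                                 inside_penalty, outside_penalty):
    """Position penalty of a left primer relative to a single target."""
    three_prime = primer_start + primer_length - 1
    target_end = target_start + target_length - 1
    if three_prime < target_start:
        # Outside the target. Assumption: count the bases strictly between
        # the 3' end of the primer and the first base of the target.
        distance = target_start - three_prime - 1
        return outside_penalty * distance
    # Inside the target: count the positions shared by primer and target.
    overlap = min(three_prime, target_end) - max(primer_start, target_start) + 1
    return inside_penalty * max(overlap, 0)


# Primer covers bases 10-29, target covers 40-44: 10 bases lie in between.
print(left_primer_position_penalty(10, 20, 40, 5, inside_penalty=2.0, outside_penalty=0.5))
# Primer covers bases 25-44 and overlaps the whole 5-base target.
print(left_primer_position_penalty(25, 20, 40, 5, inside_penalty=2.0, outside_penalty=0.5))
```

The penalty weights passed here are arbitrary illustration values, not RJPrimers defaults.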
- Show Debugging Info
- Include the input to primer3_core as part of the output.