Mclip
Mclip uses local alignments between all input sequences to determine motifs. This is done in a series of consecutive steps:
1:
  -For each sequence, all local alignments to the other sequences are calculated.
  -By mapping the local alignment traces back onto the sequences, profiles are generated that record, for each position, how many residues were aligned to it and how many times each residue or gap occurred there.
2:
  -Each of these profiles is then aligned locally to all other profiles.
  (Optional: Iteration)
  -By mapping the profile-profile local alignment traces back onto the profiles, the residue frequencies can be modified. This makes profiles between which a local alignment was found more similar to each other.
  -Then the profile-profile alignment is repeated
  (end Iteration)
  -This generates a set of profile-profile traces, some of which may overlap in one or more of the profiles.
3:
  -Cliques of overlapping profile-profile traces are determined. "Linker" traces (i.e. traces linking two or more cliques together) are split into the parts corresponding to each clique, and these parts are added to the respective cliques. Motifs are then derived from these sets of traces.
4:
  -The generated motifs are compared against the input sequences and relevant hits are returned.
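Step 1 can be sketched in Python as below. This is an illustration only, not Mclip's code: each sequence-sequence local alignment trace is assumed to be a hypothetical list of (query position, aligned symbol) pairs, with "-" standing for a gap opposite a query residue, and only the symbols contributed by the other sequences are counted. Insertions relative to the query cannot be represented in this simplified format.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical trace format (an assumption, not Mclip's internal representation):
# (position in the query sequence, symbol aligned to it from the other sequence),
# where the symbol is a residue or "-" for a gap opposite the query residue.
Trace = List[Tuple[int, str]]


def build_profile(query: str, traces: List[Trace]) -> List[Dict[str, int]]:
    """For each query position, count which residues/gaps were aligned to it."""
    columns: List[Counter] = [Counter() for _ in query]
    for trace in traces:
        for pos, symbol in trace:
            columns[pos][symbol] += 1
    return [dict(column) for column in columns]


def aligned_depth(profile: List[Dict[str, int]]) -> List[int]:
    """Number of aligned residues (gaps excluded) at each query position."""
    return [sum(n for symbol, n in column.items() if symbol != "-") for column in profile]
```

For example, two traces over "ACGTAC", one aligning C, G, T and one aligning C, a gap and A to positions 1 to 3, give position 2 the counts {"G": 1, "-": 1} and aligned depths [0, 2, 1, 2, 0, 0].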
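The clique step (step 3) can be illustrated with the Bron-Kerbosch algorithm for maximal cliques, run on a graph whose nodes are profile-profile traces and whose edges join traces that overlap on a shared profile. Whether Mclip builds its cliques this way is not stated here: the (profile_id, start, end) segment format is hypothetical, the splitting of "linker" traces is not shown, and counting clique members as traces for the -Minprofilenum cutoff is an assumption.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical format (not Mclip's): a profile-profile trace covers one inclusive
# interval (profile_id, start, end) on each of the two profiles it links.
Segment = Tuple[str, int, int]
Trace = Tuple[Segment, Segment]


def overlaps(a: Trace, b: Trace) -> bool:
    """True if the two traces cover overlapping positions on a shared profile."""
    return any(pa == pb and sa <= eb and sb <= ea
               for pa, sa, ea in a for pb, sb, eb in b)


def maximal_cliques(traces: List[Trace]) -> List[Set[int]]:
    """Bron-Kerbosch with pivoting on the overlap graph; returns sets of trace indices."""
    if not traces:
        return []
    adj: Dict[int, Set[int]] = {i: set() for i in range(len(traces))}
    for i, j in combinations(range(len(traces)), 2):
        if overlaps(traces[i], traces[j]):
            adj[i].add(j)
            adj[j].add(i)

    cliques: List[Set[int]] = []

    def expand(r: Set[int], p: Set[int], x: Set[int]) -> None:
        if not p and not x:
            cliques.append(r)  # r cannot be extended any further: it is maximal
            return
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p.remove(v)
            x.add(v)

    expand(set(), set(adj), set())
    return cliques


def filter_cliques(cliques: List[Set[int]], min_members: int = 2) -> List[Set[int]]:
    """Discard cliques with fewer than min_members traces (assumed reading of -Minprofilenum)."""
    return [c for c in cliques if len(c) >= min_members]
```

Three traces that pairwise overlap on profiles A, B and C form one clique, while a fourth trace linking two unrelated profiles ends up alone and is dropped by the default cutoff of 2.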
In contrast to most other motif detection tools, Mclip uses a "gapped" alignment approach to determine motifs. While this makes the individual alignment steps more time-consuming and complicated, it can find very short motifs that occur close to one another in a number of sequences but are separated by a variable number of residues. The gapped approach also gives more flexibility with regard to the exact start and end of motifs (i.e. endgaps).
The available parameters are listed below, each with a short explanation.
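A gapped local alignment of the kind described above can be sketched with the textbook Smith-Waterman algorithm using affine gap penalties (Gotoh's variant). This is not Mclip's code: the match/mismatch scores (+2/-3) are placeholders for the values set via -Matchvals, the gap defaults mirror -Gapopen and -Gapextend, and the convention that a gap of length k scores gapopen + (k-1)*gapextend is an assumption.

```python
from typing import List, Tuple

NEG_INF = float("-inf")


def local_align(a: str, b: str, match: float = 2.0, mismatch: float = -3.0,
                gap_open: float = -5.0, gap_extend: float = -4.0) -> Tuple[float, str, str]:
    """Smith-Waterman local alignment with affine gaps (Gotoh).

    Assumed convention: a gap of length k scores gap_open + (k - 1) * gap_extend.
    Returns (score, aligned_a, aligned_b).
    """
    n, m = len(a), len(b)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]      # best local score ending at (i, j)
    E = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ... ending with a gap in a
    F = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ... ending with a gap in b
    best, bi, bj = 0.0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] + gap_open, E[i][j - 1] + gap_extend)
            F[i][j] = max(H[i - 1][j] + gap_open, F[i - 1][j] + gap_extend)
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0.0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j

    # Trace back from the best cell until the local alignment score drops to 0.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j, state = bi, bj, "H"
    while i > 0 and j > 0:
        if state == "H":
            if H[i][j] == 0:
                break
            s = match if a[i - 1] == b[j - 1] else mismatch
            if H[i][j] == H[i - 1][j - 1] + s:  # match/mismatch column
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
            elif H[i][j] == E[i][j]:
                state = "E"
            else:
                state = "F"
        elif state == "E":  # gap in a, residue from b
            out_a.append("-")
            out_b.append(b[j - 1])
            state = "H" if E[i][j] == H[i][j - 1] + gap_open else "E"
            j -= 1
        else:  # gap in b, residue from a
            out_a.append(a[i - 1])
            out_b.append("-")
            state = "H" if F[i][j] == H[i - 1][j] + gap_open else "F"
            i -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

With these placeholder scores, aligning GATTACAGATTACA against GATTACAXGATTACA bridges the extra X with a single gap (score 23, i.e. 14 matches at +2 minus one gap opening of 5). This illustrates how two short conserved pieces with variable spacing can end up in one local alignment.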
Parameter | Value | Info
-Infile | Filename | The path to the file (FASTA format) containing the set of sequences to be searched for shared motifs.
-Complement | True/False (def: true) | Should only the sequences provided be analyzed, or also their reverse complements?
-Endgaps | True/False (def: true) | Should endgaps be allowed, i.e. should motifs be matched globally (no endgaps; false) or locally (endgaps; true)?
-E-value | >=0 (def: 1e-3) | Up to what E-value should motif hits be returned?
-Minlength | >=1 (def: 10) | Minimal length of a motif.
-Minmotifnum | >=1 (def: 2) | The minimal number of times a motif has to be recovered in the set of input sequences.
-Mincoverage | >=0 (def: 0.75) | How much of the motif must a valid hit cover?
-Matchvals | "estimate" or float[] (in order AA,AC,AG,AT,CC,CG,CT,GG,GT,TT) | Either estimate the match score for each residue pair from the input data, or use values specified by the user. NOTE: If values are provided, the parameter "frequencies" must also be specified. If "estimate" is used, both are calculated from the input sequences.
-Frequencies | float[] (in order A,C,G,T) | If match values were provided, the relative residue frequencies the program should use to calculate scores.
-Gapopen | float (def: -5) | The penalty for opening a gap in the sequence-sequence local alignments.
-Gapextend | float (def: -4) | The penalty for extending a gap in the sequence-sequence local alignments.
-Minalnlength | int (def: 7) | Only local alignments longer than minalnlength are used to generate the profiles.
-Minalnscore | float (def: 7) | Only local alignments with scores greater than minalnscore are used to generate the profiles.
-Minprofilescore | float (def: 7) | Only profile traces with scores greater than minprofilescore are used to generate the motifs.
-Minprofilenum | int (def: 2) | Cliques with fewer than minprofilenum members are discarded.
-Iterate | int (def: 0) | How many rounds of iteration to perform (the optional updating of profiles based on the profile-profile matches).
-Pseudocounts | float (def: 0.5) | The amount of pseudocounts added to each profile prior to profile-profile comparisons. For example, if pseudocounts = 0.7 and 13 residues generated the residue frequencies at position X of profile Y, then each residue frequency f at that position is changed to (f*13 + 0.7)/13.7. Pseudocounts are required to avoid P-values of "0" (i.e. impossible) when aligning two profile columns that share no residues; adding pseudocounts changes the probability from "impossible" to "unlikely", depending on the amount of pseudocounts added and the number of residues from which each profile column was generated. (Note: The pseudocounts are NOT used when updating the profiles and/or deriving motifs from the cliques, as that would water down the signal.)
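The pseudocount adjustment described for -Pseudocounts can be written out directly. This sketch applies the stated formula to one profile column; the dictionary layout, the restriction to the residues A, C, G and T (gaps left out) and the function name are assumptions for illustration.

```python
from typing import Dict


def add_pseudocounts(freqs: Dict[str, float], n_residues: int,
                     pseudocount: float = 0.5, alphabet: str = "ACGT") -> Dict[str, float]:
    """Apply f -> (f * n + pc) / (n + pc) to every residue of the alphabet.

    freqs holds the observed relative frequencies of one profile column (missing
    residues count as 0); n_residues is how many residues generated that column.
    Leaving gaps out is an assumption of this sketch.
    """
    return {res: (freqs.get(res, 0.0) * n_residues + pseudocount) / (n_residues + pseudocount)
            for res in alphabet}
```

With the numbers from the -Pseudocounts example (0.7 pseudocounts, 13 residues), a residue that was never observed goes from 0 to 0.7/13.7 (about 0.051), so two columns with no residues in common are no longer scored as impossible. Taken literally, the formula adds 0.7 to the numerator for every residue but only 0.7 to the denominator, so the four adjusted values sum to 15.8/13.7 rather than 1; whether Mclip renormalizes afterwards is not stated.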
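Step 4 returns only "relevant" hits, and the -E-value and -Mincoverage parameters suggest a filter along the lines sketched below. The Hit record and its fields are hypothetical, and measuring coverage as the fraction of motif columns a hit actually matches (which matters when endgaps allow partial matches) is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    """Hypothetical record of one motif match in an input sequence."""
    sequence_id: str
    motif_length: int     # number of columns in the motif
    covered_columns: int  # motif columns actually matched by the hit
    evalue: float


def filter_hits(hits: List[Hit], max_evalue: float = 1e-3,
                min_coverage: float = 0.75) -> List[Hit]:
    """Keep hits with E-value <= max_evalue that cover >= min_coverage of the motif."""
    return [h for h in hits
            if h.evalue <= max_evalue
            and h.covered_columns / h.motif_length >= min_coverage]
```

The defaults mirror the documented -E-value (1e-3) and -Mincoverage (0.75) defaults; a hit that matches 8 of 12 motif columns, for instance, falls below the coverage cutoff even if its E-value is very small.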
For questions, comments, additional feature requests, etc., contact Tancred.frickey at rsbs.anu.edu.au