Multiple Sequence Alignment
Regardless of the outcome of your searches, you will want a multiple sequence alignment
containing your sequence and all the homologues you have found above.
Some sites for performing multiple alignment:
If you are going to do a lot of alignments, then it is probably best to get
your own copy of one of the many available programs. Some FTP sites for these are:
Note that PileUp is contained within the GCG commercial package. Most institutions with people doing
this sort of work will have access to this software, so ask around if you want to use it.
Probably the most important advance since these pages first appeared is the use of Hidden Markov
Models for sequence alignment. Several methods are listed above.
Alignments can provide:
- Information about protein domain structure
- The location of residues likely to be involved in protein function
- Information about which residues are likely to be buried in the protein core or exposed to solvent
- More information than a single sequence for applications like
homology modelling and secondary structure prediction.
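As a rough illustration of how an alignment can point to functionally important positions, the
sketch below scores each column by the fraction of sequences that share its most common residue.
The function name, the toy alignment and the choice of scoring scheme are assumptions made for this
example; real analyses usually use substitution-aware conservation measures.

```python
from collections import Counter


def column_conservation(alignment, gap="-"):
    """Score each alignment column by how many sequences share its most common residue.

    `alignment` is a list of equal-length aligned sequences (hypothetical input
    format for this sketch). Gaps are never counted as the shared residue, but
    they do count against the column, because the score is divided by the total
    number of sequences.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    scores = []
    for col in range(length):
        residues = [seq[col] for seq in alignment if seq[col] != gap]
        if not residues:
            # Column consists only of gaps.
            scores.append(0.0)
            continue
        most_common_count = Counter(residues).most_common(1)[0][1]
        scores.append(most_common_count / len(alignment))
    return scores


# Toy alignment, invented for illustration only.
toy_alignment = [
    "MKV-LA",
    "MKI-LG",
    "MRVALA",
]
scores = column_conservation(toy_alignment)
for position, score in enumerate(scores, start=1):
    print(f"column {position}: {score:.2f}")
```

Columns scoring close to 1.0 are fully conserved in this toy set and would be the first candidates
to examine for a role in function or structure; a high score alone does not prove importance.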
Some tips
- Don't just take everything found in the searches and feed it directly into the alignment program.
Searches will almost always return some matches that do not reflect significant sequence similarity.
Look through the output carefully and discard any sequences that don't appear to be members of the
sequence family. Including non-members in your alignment will confuse things and will likely lead to
errors later.
- Remember that the programs for aligning sequences aren't perfect, and do not always provide the
best alignment. This is particularly so for large families of proteins with low sequence identities. If
you can see a better way of aligning the sequences, then by all means edit the alignment manually.
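One way to support both tips is to compute the percent identity of each aligned sequence to your
query and flag the low scorers for a closer look. The sketch below is a minimal example under
stated assumptions: the function names, the dictionary input format, the toy sequences and the 25%
cutoff are all invented for illustration. A low identity does not by itself prove that a sequence
is a non-member, so treat the output as a list of things to inspect by eye, not an automatic filter.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity over the columns where both aligned sequences have a residue."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def flag_for_review(alignment, query_id, threshold=25.0):
    """Return (name, identity) pairs for sequences below `threshold` percent identity to the query.

    `alignment` maps sequence names to aligned sequences. The default threshold
    is an arbitrary illustrative value, not a recommended cutoff.
    """
    query = alignment[query_id]
    flagged = []
    for name, seq in alignment.items():
        if name == query_id:
            continue
        identity = percent_identity(query, seq)
        if identity < threshold:
            flagged.append((name, identity))
    return flagged


# Toy aligned sequences, invented for illustration only.
toy = {
    "query": "MKVLAGH",
    "hit_1": "MKILAGH",
    "hit_2": "MRV-AGH",
    "hit_3": "WPT-CDE",
}
for name, identity in flag_for_review(toy, "query"):
    print(f"{name}: {identity:.1f}% identity to query, inspect manually")
```

In this toy set only `hit_3` is flagged, since it shares no residues with the query in the compared
columns.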
Next: secondary structure prediction.