Protein sequence data
It is worth doing some initial analysis on your protein sequence. If
a protein has come (for example) directly from a gene prediction, it may consist of
multiple domains. More seriously, it may contain regions that are unlikely to
be globular or soluble. This flowchart assumes that your protein is soluble,
likely comprises a single domain, and does not contain non-globular regions.
Things to consider are:
- Is your protein a transmembrane protein, or does it contain transmembrane segments?
There are many methods for predicting these segments.
- Does your protein contain coiled-coils?
You can predict coiled coils at the COILS server, or you can download the
COILS program (recently rewritten by me, of all people; note that a version
of COILS is contained within the GCG suite of programs).
- Does your protein contain regions of low complexity?
Proteins frequently contain runs of poly-glutamine or poly-serine, and such
regions do not predict well. To check for them you can use the program SEG
(a version of SEG is also contained within the GCG suite of programs).
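To give a feel for what a low-complexity check does, here is a minimal sketch that flags windows whose residue composition has low Shannon entropy. It is not the SEG algorithm and will not reproduce SEG output; the function names, the window length of 12 and the threshold of 2.2 bits are illustrative assumptions chosen for this example.

```python
import math
from collections import Counter


def window_entropy(window):
    """Shannon entropy (in bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_regions(seq, window=12, threshold=2.2):
    """Return merged (start, end) intervals (0-based, end exclusive) covered
    by windows whose entropy falls below the threshold.

    Assumption: window and threshold are illustrative values, not the
    parameters used by SEG.
    """
    regions = []
    for i in range(len(seq) - window + 1):
        if window_entropy(seq[i:i + window]) < threshold:
            if regions and i <= regions[-1][1]:
                # Overlaps or touches the previous flagged region: extend it.
                regions[-1] = (regions[-1][0], i + window)
            else:
                regions.append((i, i + window))
    return regions


if __name__ == "__main__":
    # Hypothetical test sequence: a poly-glutamine run between varied flanks.
    example = "ACDEFGHIKLMNPRSTVWY" + "Q" * 20 + "ACDEFGHIKLMNPRSTVWY"
    for start, end in low_complexity_regions(example):
        print(start, end, example[start:end])
```

Because a window is flagged as a whole, the reported intervals extend a few residues into the flanking sequence; treat the boundaries as approximate.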
If the answer to any of the above questions is yes, it is worth trying to
break your sequence into pieces, or to ignore particular sections of the
sequence. This is related to the problem of locating domains.
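Breaking a sequence into pieces can be sketched as follows. Given intervals that you have decided to ignore (for example predicted transmembrane, coiled-coil or low-complexity regions), the function returns the remaining stretches that are long enough to be worth analysing separately. The function name, the interval convention (0-based, end exclusive) and the minimum length of 30 residues are assumptions made for this illustration.

```python
def split_around_regions(seq, regions, min_length=30):
    """Return (start, end, subsequence) for each stretch of seq that lies
    outside the flagged regions and is at least min_length residues long.

    Assumption: regions are (start, end) pairs, 0-based with end exclusive;
    min_length=30 is an arbitrary illustrative cutoff.
    """
    pieces = []
    pos = 0
    for start, end in sorted(regions):
        if start - pos >= min_length:
            pieces.append((pos, start, seq[pos:start]))
        # Move past this region; max() copes with overlapping regions.
        pos = max(pos, end)
    if len(seq) - pos >= min_length:
        pieces.append((pos, len(seq), seq[pos:]))
    return pieces


if __name__ == "__main__":
    # Hypothetical sequence with two flagged regions close together.
    seq = "A" * 40 + "Q" * 10 + "C" * 5 + "Q" * 10 + "D" * 35
    for start, end, piece in split_around_regions(seq, [(40, 50), (55, 65)]):
        print(start, end, len(piece))
```

Short fragments between flagged regions are dropped rather than returned, since very short pieces are unlikely to be useful on their own.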