Protein Motif Search

Find patterns in any protein sequence — no regex knowledge required. Build motifs step-by-step or enter advanced syntax directly.

What is a protein motif?

A protein motif is a short, conserved pattern of amino acids that recurs in many different proteins and is associated with a specific biological function or structural feature.

Motifs are much shorter than full protein domains (typically 4–20 residues), and their key residues can be separated by stretches of variable sequence: what matters is the pattern of key residues, not every amino acid in between.

Example: the motif HExxH appears in hundreds of enzymes. The two histidines (H) coordinate a zinc ion at the active site, and the glutamate (E) acts as the catalytic base. The two middle positions (xx) can be almost any amino acid; what matters is that the spacing is exactly 2.
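
For readers curious about what such a search does under the hood, here is a minimal Python sketch (an illustration, not this tool's implementation) that expresses HExxH as a regular expression, with `.` standing for any residue. The sequence is made up for the example.

```python
import re

# Made-up example sequence (not a real protein) containing one HExxH site.
sequence = "MKTAYIAKQRHEVAHLLGSS"

# H, then E, then exactly two residues of any type, then H.
hexxh = re.compile(r"HE.{2}H")

for match in hexxh.finditer(sequence):
    # Python offsets are 0-based; add 1 so the first residue is residue 1.
    print(match.group(), match.start() + 1)
```

It prints `HEVAH 11`: the hit starts at residue 11 when counting from 1.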

Why search for motifs?

Motif searching is useful for:

  • Predicting enzyme function — e.g. finding HExxH in an uncharacterised protein suggests it might be a metalloprotease.
  • Identifying binding sites — e.g. CxxC motifs often coordinate zinc or form disulfide bonds.
  • Finding post-translational modification sites — e.g. N-x-S/T is the core N-glycosylation sequon.
  • Scanning newly sequenced proteins, to generate hypotheses before expensive experimental characterisation.
  • Comparative analysis — checking whether a motif is conserved across homologs.

Important: A motif hit is a hypothesis, not a proof. Finding HExxH does not guarantee zinc-binding or protease activity — you should always interpret motif results alongside domain annotations, structural data, and experimental evidence.

Exact vs flexible motifs

An exact motif requires a specific sequence at every position. Example: searching for KDEL exactly. This is useful for very conserved signals like ER retention sequences.

A flexible motif specifies key residues but allows variable spacers. Example: HExxH requires H, E, exactly 2 residues of any type, then H, but does not constrain which residues fill those 2 middle positions.

Flexibility is important because:

  • Evolution tolerates substitutions at non-critical positions
  • Loops between functional residues vary in length
  • The same chemical function can be achieved with different surrounding sequences
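
The difference can be sketched in Python (illustrative only; the function names are hypothetical). An exact motif such as KDEL is ordinary substring search, while a flexible motif needs a pattern with a wildcard spacer.

```python
import re

def exact_hits(sequence: str, motif: str) -> list[int]:
    """Return 1-based start positions of every exact occurrence of motif."""
    hits = []
    start = sequence.find(motif)
    while start != -1:
        hits.append(start + 1)
        # Advance by one so overlapping occurrences are also reported.
        start = sequence.find(motif, start + 1)
    return hits

def flexible_hits(sequence: str, pattern: str) -> list[int]:
    """Return 1-based start positions of (non-overlapping) regex matches."""
    return [m.start() + 1 for m in re.finditer(pattern, sequence)]

# Made-up sequence with two HExxH variants that differ only in the spacer.
seq = "AAHEAAHGGGHEGKHCC"
print(exact_hits(seq, "HEAAH"))        # only the spacer "AA" qualifies
print(flexible_hits(seq, r"HE.{2}H"))  # any two-residue spacer qualifies
```

The exact search reports only the HEAAH site at residue 3, while the flexible pattern also finds HEGKH at residue 11.
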
Worked example: HExxH metalloprotease motif

The zinc metalloprotease motif is one of the most studied in biochemistry. It appears in neprilysin, thermolysin, angiotensin-converting enzyme (ACE), and hundreds of other enzymes.

H E x x H
│ │ │ │ └── 2nd zinc-ligand histidine
│ │ └─┘──── two ANY residues (the spacer)
│ └──────── glutamate (catalytic base)
└────────── 1st zinc-ligand histidine

In PROSITE syntax: H-E-x(2)-H

In neprilysin (Peptidase_M13 family), the full zinc-binding motif is extended: the two histidines in HExxH plus a downstream glutamate about 20–80 residues away form a three-ligand zinc coordination shell:

H-E-x(2)-H-x(20,80)-E

The downstream E is the third zinc ligand. Together these three residues (H, H, E) hold the catalytic zinc in place. Searching for both parts together gives a much more specific hit than HExxH alone.
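
When a spacer is a range, a regex engine has to decide how long to make it. Python's default quantifiers are greedy, so the same search can end at a different downstream E depending on whether the quantifier is greedy or lazy (`{20,80}?`). The sketch below uses a made-up sequence with two candidate E residues; which one this tool reports is not stated here, and the example only shows that the choice matters.

```python
import re

# Made-up sequence: HExxH, a 24-residue spacer, an E,
# then 10 more residues and a second E further downstream.
sequence = "HEAAH" + "G" * 24 + "E" + "G" * 10 + "E"

greedy = re.compile(r"HE.{2}H.{20,80}E")   # spacer grows as long as possible
lazy = re.compile(r"HE.{2}H.{20,80}?E")    # spacer stays as short as possible

for name, pattern in [("greedy", greedy), ("lazy", lazy)]:
    m = pattern.search(sequence)
    # Report the 1-based start and inclusive end of the hit.
    print(name, m.start() + 1, m.end())
```

The greedy pattern reports a hit spanning residues 1 to 41 (ending at the farther E), the lazy one residues 1 to 30 (ending at the nearer E).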

How to build this in the Visual Builder:

  1. Add residue: H
  2. Add spacing: directly adjacent (0 residues)
  3. Add residue: E
  4. Add spacing: exactly 2 residues
  5. Add residue: H
  6. Add spacing: between 20 and 80 residues
  7. Add residue: E
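
The seven steps amount to a list of segments that can be joined into either notation. The sketch below assumes a hypothetical segment format of `("residue", letter)` or `("spacing", (min, max))`; the tool's internal representation is not documented here.

```python
# Hypothetical builder track for the extended HExxH...E motif.
segments = [
    ("residue", "H"),
    ("spacing", (0, 0)),     # directly adjacent
    ("residue", "E"),
    ("spacing", (2, 2)),     # exactly 2 residues
    ("residue", "H"),
    ("spacing", (20, 80)),   # between 20 and 80 residues
    ("residue", "E"),
]

def to_prosite(segments):
    tokens = []
    for kind, value in segments:
        if kind == "residue":
            tokens.append(value)
            continue
        lo, hi = value
        if hi == 0:
            continue  # a zero-length gap adds no token
        tokens.append(f"x({lo})" if lo == hi else f"x({lo},{hi})")
    return "-".join(tokens)

def to_regex(segments):
    parts = []
    for kind, value in segments:
        if kind == "residue":
            parts.append(value)
            continue
        lo, hi = value
        if hi == 0:
            continue
        parts.append(f".{{{lo}}}" if lo == hi else f".{{{lo},{hi}}}")
    return "".join(parts)

print(to_prosite(segments))  # H-E-x(2)-H-x(20,80)-E
print(to_regex(segments))    # HE.{2}H.{20,80}E
```
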
Worked example: R…S…E flexible pattern

Sometimes you want to search for residues at variable distances: for example, catalytic residues such as R, S, and E that must all be present in order, but not necessarily adjacent.

R-x(0,200)-S-x(0,200)-E

This means: find R, then 0 to 200 residues of any type, then S, then 0 to 200 residues of any type, then E. The wide, variable-length spacers make this very flexible.

Be careful with very wide spacers. Motifs like R-x(0,200)-S-x(0,200)-E will match almost any protein that contains all three residues in order, which is most proteins. Use wide spacers only when you have strong biological reason to believe the residues are functionally coupled despite the distance.
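
One rough way to see the problem is to count how often each pattern matches randomly generated sequences. This Python sketch assumes uniform amino-acid composition, which real proteins do not have, so treat the numbers as an illustration of the trend rather than a calibrated false-positive rate.

```python
import random
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def random_protein(length: int, rng: random.Random) -> str:
    # Uniform composition is a simplifying assumption, not real biology.
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))

rng = random.Random(0)  # fixed seed so the run is repeatable
proteins = [random_protein(300, rng) for _ in range(200)]

wide = re.compile(r"R.{0,200}S.{0,200}E")
tight = re.compile(r"HE.{2}H")

def fraction_matching(pattern):
    return sum(1 for p in proteins if pattern.search(p)) / len(proteins)

print(f"R-x(0,200)-S-x(0,200)-E: {fraction_matching(wide):.0%}")
print(f"H-E-x(2)-H:              {fraction_matching(tight):.0%}")
```

On these 300-residue random sequences the wide pattern matches nearly all of them, while HExxH matches only a small fraction.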

Worked example: N-x-S/T glycosylation sequon

N-linked glycosylation occurs on asparagine (N) residues within the sequence N-x-S/T, where x is any amino acid except proline, and S/T means serine or threonine.

N-x(1)-[ST] ← finds all N-x-S and N-x-T

In the Visual Builder:

  1. Add residue: N
  2. Add spacing: exactly 1 residue
  3. Add residue: [ST] (type the brackets to mean "S or T")

This search will find all potential N-glycosylation sites. In biology, however, proline at the x position prevents glycosylation, and this search does not exclude proline, so you may get false positives at N-P-S/T sites. To exclude them, use an exclusion group in Advanced mode: N-{P}-[ST].
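
In regex terms the proline exclusion is a negated character class, `[^P]`. Overlapping sequons are a second subtlety: in a stretch such as NNST, both NNS and NST qualify, but a plain `re.finditer(r"N.[ST]", ...)` would report only NNS because each match consumes its residues. The Python sketch below wraps the pattern in a lookahead so every site is reported; whether this tool reports overlapping hits is not stated here, and the sequence is made up.

```python
import re

# Made-up sequence with overlapping and proline-blocked sequons.
sequence = "MNNSTAGNPSLNGT"

# A lookahead consumes no residues, so overlapping sites are all found.
any_x = re.compile(r"(?=(N.[ST]))")       # N-x-[ST]
no_pro = re.compile(r"(?=(N[^P][ST]))")   # N-{P}-[ST]

def sites(pattern, seq):
    """Return (1-based start, matched residues) for every site, overlaps included."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(seq)]

print(sites(any_x, sequence))
print(sites(no_pro, sequence))
```

The first call lists NNS, NST, NPS and NGT (starting at residues 2, 3, 8 and 12); the second drops the proline-blocked NPS.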

How to use the Visual Builder
  1. Click "+ Add residue" in the builder track.
  2. A small panel appears. Click "Residue" to add an amino acid (type a single letter such as H, or a group such as [ST]), or "Spacing" to add a variable-length gap.
  3. For spacing, choose: exactly N, between N and M, any number, or one or more.
  4. Click Add to insert the segment into the builder track.
  5. Repeat for each element in your motif.
  6. Click the small × on any segment to remove it.
  7. The PROSITE and Regex previews update live so you can see what pattern you're building.
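
The four spacing choices correspond naturally to regex quantifiers on `.` (any residue). This sketch uses hypothetical option names, and how the PROSITE preview renders the two unbounded options is not covered here, so only the regex side is shown.

```python
# Hypothetical mapping from the builder's spacing choices to regex quantifiers.
def spacing_to_regex(kind: str, n: int = 0, m: int = 0) -> str:
    if kind == "exactly":
        return "" if n == 0 else f".{{{n}}}"   # 0 means directly adjacent
    if kind == "between":
        return f".{{{n},{m}}}"
    if kind == "any":
        return ".*"   # zero or more residues, no upper bound
    if kind == "one_or_more":
        return ".+"   # at least one residue, no upper bound
    raise ValueError(f"unknown spacing kind: {kind}")

print(spacing_to_regex("exactly", 2))       # .{2}
print(spacing_to_regex("between", 20, 80))  # .{20,80}
print(spacing_to_regex("any"))              # .*
print(spacing_to_regex("one_or_more"))      # .+
```
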
Advanced syntax reference

In Advanced mode you can type directly. Two syntax flavours are supported:

PROSITE-like (dash-separated tokens):

H           A specific amino acid (one-letter code)
x           Any single amino acid
x(2)        Exactly 2 residues of any type
x(20,80)    Between 20 and 80 residues of any type
[ST]        S or T (residue group)
{P}         Anything except P (exclusion group)

Regex-like (no dashes):

.           Any single amino acid
.{2}        Exactly 2 residues of any type
.{20,80}    Between 20 and 80 residues of any type
[ST]        Residue group (same as PROSITE)
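
Because the two flavours describe the same patterns, converting the PROSITE-like form to the regex-like form is mostly token substitution. The Python sketch below covers only the tokens in these tables and is not the tool's parser; real PROSITE entries also use features it ignores, such as `<` and `>` to anchor a pattern to the sequence ends.

```python
import re

# One dash-separated token: a residue, x, [group] or {exclusion},
# optionally followed by a repeat count (n) or range (n,m).
TOKEN = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Z]|x)(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern: str) -> str:
    parts = []
    for token in pattern.split("-"):
        m = TOKEN.fullmatch(token)
        if m is None:
            raise ValueError(f"unrecognised token: {token!r}")
        element, low, high = m.groups()
        if element == "x":
            element = "."                          # any residue
        elif element.startswith("{"):
            element = "[^" + element[1:-1] + "]"   # {P} -> [^P]
        if low is not None:
            # x(2) -> .{2}; x(20,80) -> .{20,80}
            element += "{" + low + ("," + high if high else "") + "}"
        parts.append(element)
    return "".join(parts)

print(prosite_to_regex("H-E-x(2)-H-x(20,80)-E"))  # HE.{2}H.{20,80}E
print(prosite_to_regex("N-{P}-[ST]"))             # N[^P][ST]
```

The output is an ordinary regular expression, so it can be passed straight to `re.search` or `re.finditer`.
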
Understanding the results

Residue numbering in the results table uses 1-based indexing: residue 1 is the first amino acid in the sequence you entered, regardless of any UniProt or other reference numbering the protein may have. If you paste a fragment, positions are counted from the start of that fragment.

Start = position of the first matched residue (1-based).
End = position of the last matched residue (1-based, inclusive).
Length = End − Start + 1 = total span of the motif hit.
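
Regex libraries such as Python's `re` report 0-based offsets with an exclusive end, so converting to the convention above takes one adjustment. A minimal Python sketch (the function name is hypothetical):

```python
import re

def hits_with_coordinates(sequence: str, regex: str):
    """Yield (start, end, length) in 1-based, inclusive coordinates."""
    for m in re.finditer(regex, sequence):
        start = m.start() + 1   # 0-based start -> 1-based start
        end = m.end()           # 0-based exclusive end == 1-based inclusive end
        length = end - start + 1
        yield start, end, length

print(list(hits_with_coordinates("MKTAYIAKQRHEVAHLLGSS", r"HE.{2}H")))
```

For the HEVAH hit this gives Start 11, End 15, Length 5.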

The sequence viewer shows the full protein with match regions highlighted. Different colour highlights represent different motif queries (if you run multiple searches). Click any row in the results table to scroll the viewer to that position.

Motif hits are not proof of function. A sequence match shows that the pattern is present, not that the protein actually performs the associated function. False positives are common, especially with short or flexible motifs. Always interpret results in context: check domain annotations, conservation, and experimental data.