Name
  • String
  • Substring
  • Search
Edit

Smith noticed that computing the shift with the text character just next the rightmost text character of the window gives sometimes shorter shift than using the rightmost text character of the window.
He advised then to take the maximum between the two values.

The preprocessing phase of the Smith algorithm consists in computing the bad-character shift function (see chapter Boyer-Moore algorithm) and the Quick Search bad-character shift function (see chapter Quick Search algorithm).

The preprocessing phase is in O(m+sigma) time and O(sigma) space complexity.

The searching phase of the Smith algorithm has a quadratic worst case time complexity.


Main features:

  • takes the maximum of the Horspool bad-character shift function and the Quick Search bad-character shift function;
  • preprocessing phase in O(m+sigma) time and O(sigma) space complexity;
  • searching phase in O(mn) time complexity.

Example:

Preprocessing phase

Smith algorithm tables
bmBc and qsBc tables used by Smith algorithm.

Searching phase

First attempt
GCATCGCAGAGAGTATACAGTACG
1234 
GCAGAGAG 

Shift by: 1 (bmBc[A]=qsBc[G])

Second attempt
GCATCGCAGAGAGTATACAGTACG
 1 
 GCAGAGAG 

Shift by: 2 (bmBc[G]=qsBc[A])

Third attempt
GCATCGCAGAGAGTATACAGTACG
 1 
 GCAGAGAG 

Shift by: 2 (bmBc[G]=qsBc[A])

Fourth attempt
GCATCGCAGAGAGTATACAGTACG
 12345678 
 GCAGAGAG 

Shift by: 9 (qsBc[T])

Fifth attempt
GCATCGCAGAGAGTATACAGTACG
 1 
 GCAGAGAG 

Shift by: 7 (qsBc[C])

The Smith algorithm performs 15 character comparisons on the example.

C