Name
  • String
  • Substring
  • Search
Edit

The Tuned Boyer-Moore is a implementation of a simplified version of the Boyer-Moore algorithm which is very fast in practice. The most costly part of a string-matching algorithm is to check whether the character of the pattern match the character of the window. To avoid doing this part too often, it is possible to unrolled several shifts before actually comparing the characters. The algorithm used the bad-character shift function to find x[m-1] in y and keep on shifting until finding it, doing blindly three shifts in a row. This required to save the value of bmBc[x[m-1]] in a variableshift and then to set bmBc[x[m-1]] to 0. This required also to add m occurrences of x[m-1] at the end of y. When x[m-1] is found the m-1 other characters of the window are checked and a shift of length shift is applied.

The comparisons between pattern and text characters during each attempt can be done in any order. This algorithm has a quadratic worst-case time complexity but a very good practical behaviour.


Main features:

  • simplification of the Boyer-Moore algorithm;


  • easy to implement;


  • very fast in practice.


Example:

Preprocessing phase

bmBc table
bmBc table used by Tuned Boyer-Moore algorithm.

Searching phase

First attempt
GCATCGCAGAGAGTATACAGTACG
 1 
GCAGAGAG 
 
 2 
 GCAGAGAG 
 
 3 
 GCAGAGAG 
 
 4 
 GCAGAGAG 
 
 5 
 GCAGAGAG 

Shift by: 2 (shift)

Second attempt
GCATCGCAGAGAGTATACAGTACG
 1 
 GCAGAGAG 
 
 2 
 GCAGAGAG 

Shift by: 2 (shift)

Third attempt
GCATCGCAGAGAGTATACAGTACG
 1 
 GCAGAGAG 
 
 2345678 
 GCAGAGAG 

Shift by: 2 (shift)

Fourth attempt
GCATCGCAGAGAGTATACAGTACG
 1 
 GCAGAGAG 
 
 2 
 GCAGAGAG 
 
 3
 GCAGAGAG
 
 4
 GCAGAGAG
 
 5 
 GCAGAGAG

Shift by: 2 (shift)

The Tuned Boyer Moore algorithm performs 10 character comparisons on the example.


C