Name
  • String
  • Substring
  • Search
Edit

Zhu and Takaoka designed an algorithm which performs the shift by considering the bad-character shift (see chapter Boyer-Moore algorithm) for two consecutive text characters.

During the searching phase the comparisons are performed from right to left and when the window is positioned on the text factor y[j .. j+m-1] and a mismatch occurs between x[m-k] and y[j+m-k]while x[m-k+1 .. m-1]=y[j+m-k+1 .. j+m-1] the shift is performed with the bad-character shift for text characters y[j+m-2] and y[j+m-1]. The good-suffix shift table is also used to compute the shifts.

The preprocessing phase of the algorithm consists in computing for each pair of characters (ab) with ab in Sigma the rightmost occurrence of ab in x[0 .. m-2].

For a,b in SigmaztBc[ab]=k iff
 k<m-2 and x[m-k .. m-k+1]=ab and ab does not occur in x[m-k+2 .. m-2]
  or  
 k=m-1 and x[0]=b and ab does not occur in x[0 .. m-2]
  or  
 k=m and x[0] neq b and ab does not occur in x[0 .. m-2]

It also consists in computing the table bmGs (see chapter Boyer-Moore algorithm).

The preprocessing phase is in O(m+sigma2) time and space complexity. The searching phase has a quadratic worst case.


Main features:

  • variant of the Boyer-Moore algorithm;
  • uses two consecutive text characters to compute the bad-character shift;
  • preprocessing phase in O(m+2) time and space complexity;
  • searching phase in O(mn) time complexity.


Example:

Preprocessing phase

Zhu-Takaoka algorithm tables
ztBc and bmGs tables used by Zhu-Takaoka algorithm.

Searching phase

First attempt
GCATCGCAGAGAGTATACAGTACG
 1 
GCAGAGAG 

Shift by: 5 (ztBc[C][A])

Second attempt
GCATCGCAGAGAGTATACAGTACG
 87654321 
 GCAGAGAG 

Shift by: 7 (bmGs[0])

Third attempt
GCATCGCAGAGAGTATACAGTACG
 321 
 GCAGAGAG 

Shift by: 7 (bmGs[6])

The Zhu-Takaoka algorithm performs 12 character comparisons on the example.

C