Name
  • String
  • Substring
  • Search
Edit

The preprocessing phase of the Alpha Skip Search algorithm consists in building a trie T(x) of all the factors of the length ell=logsigmam occurring in the word x. The leaves of T(x) represent all the factors of length ell of x. There is then one bucket for each leaf of T(x) in which is stored the list of positions where the factor, associated to the leaf, occurs in x.

The worst case time of this preprocessing phase is linear if the alphabet size is considered to be a constant.

The searching phase consists in looking into the buckets of the text factors y[j .. j+ell-1] for all j = k.(m-ell+1)-1 with the integer k in the interval y[1,lfloor(n-ell) / mrfloor].

The worst case time complexity of the searching phase is quadratic but the expected number of text character comparisons is O(logsigma(m).(n / (m-logsigma(m)))).


Main features:

improvement of the Skip Search algorithm;

uses buckets of positions for each factor of length logsigma(m) of the pattern;

preprocessing phase in O(m) time and space complexity;

searching phase in O(mn) time complexity;

O(logsigma(m).(n / (m-logsigma(m)))) expected text character comparisons.


Example:

Preprocessing phase

Alpha skip search Z table
Z table used by Alpha Skip Search algorithm.

Searching phase

First attempt
GCATCGCAGAGAGTATACAGTACG
 123 
 
GCATCGCAGAGAGTATACAGTACG
 ---45678 
 GCAGAGAG 

Shift by: 6

Second attempt
GCATCGCAGAGAGTATACAGTACG
 123 

Shift by: 6

Third attempt
GCATCGCAGAGAGTATACAGTACG
 123 
 
GCATCGCAGAGAGTATACAGTACG
 1--- 
 GCAGAGAG

The Alpha Skip Search algorithm performs 18 character comparisons on the example.

C