It is possible to make the Skip Search algorithm linear using the two shift tables of Morris-Pratt and Knuth-Morris-Pratt.
For 1 i
m, mpNext[i] is equal to the length of the longest border of x[0 .. i-1] and mpNext[0]=-1.
For 1 i < m, kmpNext[i] is equal to length of the longest border of x[0 .. i-1] followed by a character different from x[i], kmpNext[0]=-1 and kmpNext[m]=m-per(x).
The lists in the buckets are explicitly stored in a table list.
The preprocessing phase of the KmpSkip Search algorithm is in O(m+) time and space complexity.
![]() | j is the current text position; |
![]() | x[i] = y[j]; |
![]() | start = j-i is the possible starting position of an occurrence of x in y; |
![]() | wall is the rightmost scanned text position; |
![]() | x[0 .. wall-start-1] = y[start .. wall-1]; |
Figure 30.1: General situation during the searching phase of the linear algorithm.
The comparisons are performed from left to right between x[wall-start .. m-1] and y[wall .. start+m-1] until a mismatch or a whole match occurs. Let k wall-start be the smallest integer such thatx[k]
y[start+k] or k = m if an occurrence of x starts at position start in y.
Then wall takes the value of start+k.
After that the algorithm KmpSkip computes two shifts (two new starting positions): the first one according to the skip algorithm (see algorithm AdvanceSkip for details), this gives us a starting position skipStart, the second one according to the shift table of Knuth-Morris-Pratt, which gives us another starting position kmpStart.
![]() | skipStart < kmpStart then a shift according to the skip algorithm is applied which gives a new value for skipStart, and we have to compare again skipStart and kmpStart; |
![]() | kmpStart < skipStart < wall then a shift according to the shift table of Morris-Pratt is applied. This gives a new value for kmpStart. We have to compare again skipStart and kmpStart; |
![]() | skipStart = kmpStart then another attempt can be performed with start = skipStart; |
![]() | kmpStart < wall < skipStart then another attempt can be performed with start = skipStart. |
The searching phase of the KmpSkip Search algorithm is in O(n) time.
Main features:
improvement of the Skip Search algorithm;
uses buckets of positions for each character of the alphabet;
preprocessing phase in O(m+) time and space complexity;
searching phase in O(n) time complexity.
Example:
Preprocessing phase
Tables used by KMP Skip Search algorithm.
Searching phase
G | C | A | T | C | G | C | A | G | A | G | A | G | T | A | T | A | C | A | G | T | A | C | G |
2 | 1 | ||||||||||||||||||||||
G | C | A | G | A | G | A | G | ||||||||||||||||
G | C | A | T | C | G | C | A | G | A | G | A | G | T | A | T | A | C | A | G | T | A | C | G |
1 | - | ||||||||||||||||||||||
G | C | A | G | A | G | A | G | ||||||||||||||||
G | C | A | T | C | G | C | A | G | A | G | A | G | T | A | T | A | C | A | G | T | A | C | G |
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | ||||||||||||||||
G | C | A | G | A | G | A | G |
Shift by: 8
G | C | A | T | C | G | C | A | G | A | G | A | G | T | A | T | A | C | A | G | T | A | C | G |
1 |
Shift by: 8
G | C | A | T | C | G | C | A | G | A | G | A | G | T | A | T | A | C | A | G | T | A | C | G |
2 | 1 | ||||||||||||||||||||||
G | C | A | G | A | G | A | G |
The KMP Skip Search algorithm performs 14 character comparisons on the example.