Each Answer to this Q is separated by one/two green lines.
I am thinking of how the
in operator implement, for instance
>>> s1 = 'abcdef' >>> s2 = 'bcd' >>> s2 in s1 True
In CPython, which algorithm is used to implement the string match, and what is the time complexity? Is there any official document or wiki about this?
You can view the C code here:
Fast search/count implementation, based on a mix between Boyer-Moore and Horspool, with a few more bells and whistles on the top. For some more background, see: https://web.archive.org/web/20201107074620/http://effbot.org/zone/stringlib.htm.
From the link above:
When designing the new algorithm, I used the following constraints:
- should be faster than the current brute-force algorithm for all test cases (based on real-life code), including Jim Hugunin’s worst-case test
- small setup overhead; no dynamic allocation in the fast path (O(m) for speed, O(1) for storage)
- sublinear search behaviour in good cases (O(n/m))
- no worse than the current algorithm in worst case (O(nm))
- should work well for both 8-bit strings and 16-bit or 32-bit Unicode strings (no O(?) dependencies)
- many real-life searches should be good, very few should be worst case
- reasonably simple implementation