Check if multiple strings exist in another string
I am trying to find out if there is a nice and clean way to test for 3 different strings.
Basically I am looping through a file using a for loop; then I have to check whether each line contains one of the three strings that I have set in a list.
So far I have found the multiple-condition if check, but it does not feel really elegant or efficient:
    for line in file:
        if "string1" in line or "string2" in line or "string3" in line:
            print("found the string")
I was thinking of creating a list that contains string1, string2 and string3, and checking whether any of these is contained in the line, but it doesn’t seem that I can compare against the list without explicitly looping through it, and in that case I am basically in the same situation as with the multiple if statement that I wrote above.
Is there any smart way to check against multiple strings without writing long if statements or looping through the elements of a list?
    strings = ("string1", "string2", "string3")
    for line in file:
        if any(s in line for s in strings):
            print("yay!")
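For anyone who wants to try this without a file on disk, here is a minimal sketch of the same any() check wrapped in a generator. The name lines_containing_any and the sample data are made up for illustration; any iterable of strings (including an open file object) should work in place of the list.

```python
def lines_containing_any(lines, needles):
    """Yield each line that contains at least one of the substrings in needles."""
    for line in lines:
        # any() stops checking as soon as one needle is found in this line.
        if any(needle in line for needle in needles):
            yield line


# Hypothetical sample data standing in for the lines of a file.
sample_lines = [
    "first line with string2\n",
    "nothing here\n",
    "string3 at the start\n",
]
needles = ("string1", "string2", "string3")

matches = list(lines_containing_any(sample_lines, needles))
for line in matches:
    print(line, end="")
```

Because it is a generator, the file is still read lazily, one line at a time.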
This still loops through the cartesian product of the two lists, but it does it in one line:
    >>> lines1 = ['soup', 'butter', 'venison']
    >>> lines2 = ['prune', 'rye', 'turkey']
    >>> search_strings = ['a', 'b', 'c']
    >>> any(s in l for l in lines1 for s in search_strings)
    True
    >>> any(s in l for l in lines2 for s in search_strings)
    False
This also has the advantage that any short-circuits, so the looping stops as soon as a match is found. Also, this only finds the first occurrence of a string from search_strings in linesX. If you want to find multiple occurrences you could do something like this:
    >>> lines3 = ['corn', 'butter', 'apples']
    >>> [(s, l) for l in lines3 for s in search_strings if s in l]
    [('c', 'corn'), ('b', 'butter'), ('a', 'apples')]
If you feel like coding something more complex, it seems the Aho-Corasick algorithm can test for the presence of multiple substrings in a given input string. (Thanks to Niklas B. for pointing that out.) I still think it would result in quadratic performance for your use-case since you’ll still have to call it multiple times to search multiple lines. However, it would beat the above (cubic, on average) algorithm.
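To give an idea of what that would involve, below is a compact sketch of an Aho-Corasick style matcher in plain Python. It is only an illustration, not a tuned implementation: the names build_automaton and find_all are made up here, and it assumes every search string is non-empty. The automaton is built once and can then be reused for every line.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie transitions, failure links, and output sets for the patterns."""
    goto = [{}]    # goto[state][char] -> next state
    fail = [0]     # fail[state] -> state for the longest proper suffix in the trie
    out = [set()]  # out[state] -> patterns that end when this state is reached

    # Phase 1: insert every pattern into the trie.
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(pattern)

    # Phase 2: breadth-first pass to fill in the failure links.
    queue = deque(goto[0].values())  # depth-1 states fall back to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit patterns that end at the suffix state as well.
            out[nxt] |= out[fail[nxt]]
    return goto, fail, out


def find_all(text, automaton):
    """Return the set of patterns that occur somewhere in text."""
    goto, fail, out = automaton
    state = 0
    found = set()
    for ch in text:
        # Follow failure links until some state accepts ch, or we hit the root.
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        found |= out[state]
    return found


# Build once, then reuse for each line.
automaton = build_automaton(["a", "b", "c"])
for line in ["soup", "butter", "venison"]:
    print(line, sorted(find_all(line, automaton)))
```

Each line is scanned in a single left-to-right pass regardless of how many search strings there are, which is where the gain over the nested loops comes from. For three short strings the plain any() version is almost certainly simpler and fast enough.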