python - Delete item from list if it contains a substring from a "blacklist"
In Python, I'd like to remove from a list any string that contains a substring found in a so-called "blacklist".
For example, assume list a is the following:
a = [ 'cat', 'doxxxg', 'monkey', 'hobbbrse', 'fish', 'snake']
and list b is:
b = ['xxx', 'bbb']
How do I get list c:
c = [ 'cat', 'monkey', 'fish', 'snake']
I've played around with various combinations of regex expressions and list comprehensions, but I can't seem to get it to work.
You can bring your blacklist together into one expression:
import re
blacklist = re.compile('|'.join([re.escape(word) for word in b]))
Then filter words out if they match:
c = [word for word in a if not blacklist.search(word)]
The words in the pattern are escaped (so that . and other meta characters are not treated as such, but as literal characters instead), and joined into a series of | alternatives:
>>> '|'.join([re.escape(word) for word in b])
'xxx|bbb'
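To see why the escaping matters, here is a small sketch using a hypothetical blacklist entry that contains a dot. The names files, bad, unescaped and escaped are illustrative only and do not come from the question. Without re.escape, the . matches any character, so an unrelated string is filtered out too:

```python
import re

# Hypothetical data: the blacklist entry contains a regex metacharacter.
files = ['a.b', 'axb', 'cat']
bad = ['a.b']

# Unescaped: '.' matches any character, so 'axb' is removed as well.
unescaped = re.compile('|'.join(bad))
print([f for f in files if not unescaped.search(f)])  # ['cat']

# Escaped: the pattern only matches a literal dot, so 'axb' survives.
escaped = re.compile('|'.join(re.escape(word) for word in bad))
print([f for f in files if not escaped.search(f)])    # ['axb', 'cat']
```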
Demo:
>>> import re
>>> a = [ 'cat', 'doxxxg', 'monkey', 'hobbbrse', 'fish', 'snake']
>>> b = ['xxx', 'bbb']
>>> blacklist = re.compile('|'.join([re.escape(word) for word in b]))
>>> [word for word in a if not blacklist.search(word)]
['cat', 'monkey', 'fish', 'snake']
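The regex approach can also be wrapped in a small helper. This is only a sketch: the function name remove_blacklisted and the empty-blacklist guard are additions for illustration, not part of the answer. The guard is there because joining an empty list yields '', which compiles to a pattern that matches every string, so without it every word would be dropped.

```python
import re

def remove_blacklisted(words, blacklist):
    """Return the words that contain none of the blacklist substrings."""
    if not blacklist:
        # An empty alternation matches every string, which would filter
        # out everything; return an unfiltered copy instead.
        return list(words)
    pattern = re.compile('|'.join(re.escape(bad) for bad in blacklist))
    return [word for word in words if not pattern.search(word)]

a = ['cat', 'doxxxg', 'monkey', 'hobbbrse', 'fish', 'snake']
print(remove_blacklisted(a, ['xxx', 'bbb']))  # ['cat', 'monkey', 'fish', 'snake']
print(remove_blacklisted(a, []) == a)         # True: nothing is removed
```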
This should outperform explicit membership testing, especially as the number of words in the blacklist grows:
>>> import re, string, random, timeit
>>> def regex_filter(words, blacklist):
...     return [word for word in words if not blacklist.search(word)]
...
>>> def any_filter(words, blacklist):
...     return [word for word in words if not any(bad in word for bad in blacklist)]
...
>>> words = [''.join([random.choice(string.ascii_letters) for _ in range(random.randint(3, 20))])
...          for _ in range(1000)]
>>> blacklist = [''.join([random.choice(string.ascii_letters) for _ in range(random.randint(2, 5))])
...              for _ in range(10)]
>>> timeit.timeit('any_filter(words, blacklist)', 'from __main__ import any_filter, words, blacklist', number=100000)
0.36232495307922363
>>> timeit.timeit('regex_filter(words, blacklist)', "from __main__ import re, regex_filter, words, blacklist; blacklist = re.compile('|'.join([re.escape(word) for word in blacklist]))", number=100000)
0.2499098777770996
The above tests 10 random short blacklisted words (2 - 5 characters) against a list of 1000 random words (3 - 20 characters long); in the run shown, the regex is roughly 45% faster (about 0.25 s versus 0.36 s).
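For anyone who wants to repeat the comparison, the following is a minimal, self-contained sketch. It assumes Python 3 (hence string.ascii_letters); the helper random_word, the fixed seed and number=100 are arbitrary choices made here, not taken from the answer. It first checks that both filters return the same result, then prints the two timings, which will vary with the machine and Python version.

```python
import random
import re
import string
import timeit

def any_filter(words, blacklist):
    # Explicit membership test: try every blacklist entry against each word.
    return [w for w in words if not any(bad in w for bad in blacklist)]

def regex_filter(words, pattern):
    # One compiled alternation, searched once per word.
    return [w for w in words if not pattern.search(w)]

def random_word(lo, hi):
    # Hypothetical helper: a random word of lo..hi ASCII letters.
    return ''.join(random.choice(string.ascii_letters)
                   for _ in range(random.randint(lo, hi)))

random.seed(0)  # arbitrary seed so repeated runs use the same data
words = [random_word(3, 20) for _ in range(1000)]
blacklist = [random_word(2, 5) for _ in range(10)]
pattern = re.compile('|'.join(re.escape(bad) for bad in blacklist))

# Both approaches must agree before their timings are worth comparing.
assert any_filter(words, blacklist) == regex_filter(words, pattern)

# number=100 keeps the run short; raise it for steadier measurements.
print('any:  ', timeit.timeit(lambda: any_filter(words, blacklist), number=100))
print('regex:', timeit.timeit(lambda: regex_filter(words, pattern), number=100))
```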
python regex string list-comprehension