python - Delete item from list if it contains a substring from a "blacklist" -




In Python, I'd like to remove a string from a list if it contains a substring found in a so-called "blacklist".

For example, assume list a is the following:

a = [ 'cat', 'doxxxg', 'monkey', 'hobbbrse', 'fish', 'snake']

and list b is:

b = ['xxx', 'bbb']

How can I get list c:

c = [ 'cat', 'monkey', 'fish', 'snake']

I've played around with various combinations of regex expressions and list comprehensions, but I can't seem to get it to work.

You could bring your blacklist together into one expression:

import re
blacklist = re.compile('|'.join([re.escape(word) for word in b]))

Then filter words out if they match:

c = [word for word in a if not blacklist.search(word)]

The words in the pattern are escaped (so that . and other meta characters are not treated as such, but as literal characters instead), and joined into a series of | alternatives:

>>> '|'.join([re.escape(word) for word in b])
'xxx|bbb'
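To see why the escaping matters, here is a small sketch using a hypothetical blacklist entry that contains a dot. The sample words are made up for illustration and are not from the question. Without re.escape, the . matches any character and more words are removed than intended:

```python
import re

words = ['a.c', 'abc', 'xyz']   # hypothetical sample data
bad = ['a.c']                   # blacklist entry containing a regex metacharacter

# Unescaped: '.' matches any character, so 'abc' is removed as well.
unescaped = re.compile('|'.join(bad))
loose = [w for w in words if not unescaped.search(w)]

# Escaped: '.' only matches a literal dot, so only 'a.c' is removed.
escaped = re.compile('|'.join(re.escape(s) for s in bad))
strict = [w for w in words if not escaped.search(w)]

print(loose)   # ['xyz']
print(strict)  # ['abc', 'xyz']
```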

Demo:

>>> import re
>>> a = [ 'cat', 'doxxxg', 'monkey', 'hobbbrse', 'fish', 'snake']
>>> b = ['xxx', 'bbb']
>>> blacklist = re.compile('|'.join([re.escape(word) for word in b]))
>>> [word for word in a if not blacklist.search(word)]
['cat', 'monkey', 'fish', 'snake']
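If the filter is needed in more than one place, the two steps can be wrapped in a function. This is only a sketch: the name remove_blacklisted is made up here, and the guard for an empty blacklist is an addition, needed because an empty pattern matches every string and would otherwise filter out every word.

```python
import re

def remove_blacklisted(words, blacklist):
    """Return the words that contain none of the blacklist substrings."""
    if not blacklist:
        # An empty pattern matches every string, so skip the regex entirely.
        return list(words)
    pattern = re.compile('|'.join(re.escape(s) for s in blacklist))
    return [w for w in words if not pattern.search(w)]

a = ['cat', 'doxxxg', 'monkey', 'hobbbrse', 'fish', 'snake']
b = ['xxx', 'bbb']
print(remove_blacklisted(a, b))   # ['cat', 'monkey', 'fish', 'snake']
print(remove_blacklisted(a, []))  # all six words, unchanged
```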

This should outperform any explicit membership testing, especially as the number of words in your blacklist grows:

>>> import string, random, timeit
>>> def regex_filter(words, blacklist):
...     return [word for word in words if not blacklist.search(word)]
...
>>> def any_filter(words, blacklist):
...     return [word for word in words if not any(bad in word for bad in blacklist)]
...
>>> words = [''.join([random.choice(string.ascii_letters) for _ in range(random.randint(3, 20))])
...          for _ in range(1000)]
>>> blacklist = [''.join([random.choice(string.ascii_letters) for _ in range(random.randint(2, 5))])
...              for _ in range(10)]
>>> timeit.timeit('any_filter(words, blacklist)', 'from __main__ import any_filter, words, blacklist', number=100000)
0.36232495307922363
>>> timeit.timeit('regex_filter(words, blacklist)', "from __main__ import re, regex_filter, words, blacklist; blacklist = re.compile('|'.join([re.escape(word) for word in blacklist]))", number=100000)
0.2499098777770996

The above tests 10 random blacklisted short words (2 - 5 characters) against a list of 1000 random words (3 - 20 characters long); the regex is about 45% faster (roughly 0.25 s versus 0.36 s).
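To repeat the comparison as a standalone script, something like the following sketch could be used. The helper random_word, the fixed seed, and the lower repeat count are choices made here for illustration. The timings depend on the Python version, the machine, and the data, so no expected numbers are shown.

```python
import random
import re
import string
import timeit

def any_filter(words, blacklist):
    # Explicit membership test: check every blacklist entry against every word.
    return [w for w in words if not any(bad in w for bad in blacklist)]

def regex_filter(words, pattern):
    # Single compiled alternation, searched once per word.
    return [w for w in words if not pattern.search(w)]

def random_word(lo, hi):
    # Hypothetical helper: a random ASCII-letter string of length lo..hi.
    length = random.randint(lo, hi)
    return ''.join(random.choice(string.ascii_letters) for _ in range(length))

random.seed(0)  # arbitrary seed, only to make runs repeatable
words = [random_word(3, 20) for _ in range(1000)]
blacklist = [random_word(2, 5) for _ in range(10)]
pattern = re.compile('|'.join(re.escape(s) for s in blacklist))

# Both approaches must agree before their timings are worth comparing.
assert any_filter(words, blacklist) == regex_filter(words, pattern)

t_any = timeit.timeit(lambda: any_filter(words, blacklist), number=100)
t_re = timeit.timeit(lambda: regex_filter(words, pattern), number=100)
print('any:   %.4f s' % t_any)
print('regex: %.4f s' % t_re)
```

Compiling the pattern once outside the timed call mirrors the setup string in the timeit call above, so only the filtering itself is measured.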

python regex string list-comprehension
