Python "not in" index out of range error - python

I have run into a problem and could not find a solution.
I am checking each value in a nested list against another list and deleting it if it is not there, but the "not in" line raises an "index out of range" error.
def heroes_updater(men_pref,women):
    for i in range(0,len(men_pref)-1):
        for j in range(0,len(men_pref[i])-1):
            if men_pref[i][j] not in women:
                men_pref[i]=men_pref[i][:j]+men_pref[i][j+1:]
example men_pref:
[['Storm', 'Black Widow', 'Scarlet Witch', 'Rouge', 'Mystique', 'Jean Grey', 'Ms. Marvel', 'Gamora', 'Invisible Woman', 'Elektra'], ['Storm', 'Elektra', 'Jean Grey', 'Scarlet Witch', 'Mystique', 'Ms. Marvel', 'Gamora', 'Rouge', 'Black Widow', 'Invisible Woman'], ['Invisible Woman', 'Scarlet Witch', 'Mystique', 'Black Widow', 'Ms. Marvel', 'Elektra', 'Jean Grey', 'Gamora', 'Storm', 'Rouge']]
example women:
['Jean Grey', 'Elektra', 'Mystique', 'Ms. Marvel', 'Rouge']
And the error is :
if men_pref[i][j] not in women:
IndexError: list index out of range

Removing elements makes the list shorter, so j eventually becomes larger than the last valid index of the list. To avoid this, don't alter the lists while you index into them; build a new one instead:
def heroes_updater(men_pref,women):
    result = []
    for prefs in men_pref:
        new_prefs = []
        for pref in prefs:
            if pref in women:
                new_prefs.append(pref)
        result.append(new_prefs)
    men_pref[:] = result
or better:
def filter_non_heroes(men_pref,women):
    return [
        [pref for pref in prefs if pref in women]
        for prefs in men_pref
    ]
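The two versions above differ in one way that is easy to miss: the first writes the result back with slice assignment, the second returns a new list. Here is a small sketch of that difference; filter_prefs, alias and the shortened sample data are made up for illustration.

```python
def filter_prefs(men_pref, women):
    """Return a new nested list that keeps only names present in women."""
    allowed = set(women)  # set membership tests are faster than list scans
    return [[p for p in prefs if p in allowed] for prefs in men_pref]


men_pref = [["Storm", "Elektra", "Rouge"], ["Mystique", "Gamora"]]
women = ["Elektra", "Mystique", "Rouge"]
alias = men_pref  # a second reference to the same outer list

# Slice assignment replaces the contents of the existing list object,
# so every reference to it (including alias) sees the filtered data.
men_pref[:] = filter_prefs(men_pref, women)
print(alias)  # [['Elektra', 'Rouge'], ['Mystique']]
```

If the caller keeps its own reference to the list, slice assignment keeps it in sync; returning a new list leaves the original untouched, which is usually easier to reason about.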

You're editing the list while you're reading it by index, and that is what breaks here.
With the line men_pref[i]=men_pref[i][:j]+men_pref[i][j+1:]
you're removing an item from the list men_pref[i], but your j variable still runs up to the original length of the list, so you'll eventually get an IndexError when you check men_pref[i][j] with j >= len(men_pref[i]).
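To see the failure in isolation, here is a minimal sketch on a flat list. The function name and the sample letters are invented for the demonstration; the loop body mirrors the slicing line from the question.

```python
def remove_while_indexing(items, allowed):
    """Deliberately buggy: shrinks the list while j follows the old length."""
    items = list(items)
    for j in range(len(items)):  # the upper bound is fixed right here
        if items[j] not in allowed:
            items = items[:j] + items[j + 1:]  # list shrinks, range does not
    return items


try:
    remove_while_indexing(["a", "b", "c", "d"], {"a"})
except IndexError as exc:
    outcome = "IndexError: %s" % exc
else:
    outcome = "no error"
print(outcome)  # IndexError: list index out of range
```

Note that before the error is raised, "c" is never even checked: after "b" is removed, "c" slides into index 1 while j has already moved on to 2.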
EDIT:
Or, if you want to edit your current list in place, read it with a backwards index (start from the last item; if it's not in the list of women, remove it, then continue with the previous item):
def heroes_updater(men_pref,women):
    for i in range(len(men_pref)-1, -1,-1):
        for j in range(len(men_pref[i])-1, -1, -1):
            if men_pref[i][j] not in women:
                men_pref[i].pop(j)
                # An alternative: del men_pref[i][j]
Another way would be to use a list comprehension:
def heroes_updater(men_pref,women):
    for i in range(len(men_pref)-1, -1,-1):
        men_pref[i] = [_w_ for _w_ in men_pref[i] if _w_ in women]
There are other options, but I'll leave that to you.
That's how you learn.

You can use a set with a list comprehension. You cannot mutate a list while iterating over it by index: each time you remove an element the list gets smaller, but your indices are based on the size of the list when the range was created:
men = [['Storm', 'Black Widow', 'Scarlet Witch', 'Rouge', 'Mystique', 'Jean Grey', 'Ms. Marvel', 'Gamora', 'Invisible Woman', 'Elektra'], ['Storm', 'Elektra', 'Jean Grey', 'Scarlet Witch', 'Mystique', 'Ms. Marvel', 'Gamora', 'Rouge', 'Black Widow', 'Invisible Woman'], ['Invisible Woman', 'Scarlet Witch', 'Mystique', 'Black Widow', 'Ms. Marvel', 'Elektra', 'Jean Grey', 'Gamora', 'Storm', 'Rouge']]
wom = {'Jean Grey', 'Elektra', 'Mystique', 'Ms. Marvel', 'Rouge'}
men[:] = [[ele for ele in sub if ele in wom] for sub in men]
print(men)
Or functionally if order is irrelevant:
men = [['Storm', 'Black Widow', 'Scarlet Witch', 'Rouge', 'Mystique', 'Jean Grey', 'Ms. Marvel', 'Gamora', 'Invisible Woman', 'Elektra'], ['Storm', 'Elektra', 'Jean Grey', 'Scarlet Witch', 'Mystique', 'Ms. Marvel', 'Gamora', 'Rouge', 'Black Widow', 'Invisible Woman'], ['Invisible Woman', 'Scarlet Witch', 'Mystique', 'Black Widow', 'Ms. Marvel', 'Elektra', 'Jean Grey', 'Gamora', 'Storm', 'Rouge']]
wom = {'Jean Grey', 'Elektra', 'Mystique', 'Ms. Marvel', 'Rouge'}
men[:] = map(list,map(wom.intersection, men))
print(men)
You could also start from the end of the list, keeping your range logic but using range(len(sub)-1, -1, -1), but it is easier to just use reversed and iterate over the elements themselves:
def heroes_updater(men_pref, women):
    for sub in men_pref:
        for m in reversed(sub):
            if m not in women:
                sub.remove(m)

Related

torch/lib/libgomp-d22c30c5.so.1: cannot allocate memory in static TLS block

I installed YoloV5 on my Jetson Nano. When I tried to run my object detection code, this error appeared: python3.8/site-packages/torch/lib/libgomp-d22c30c5.so.1: cannot allocate memory in static TLS block.
To fix the problem I tried adding this to my .bashrc:
export LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libgomp.so.1
It didn't work.
Do you have another idea?
Here is my code:
import cv2
import numpy as np
from elements.yolo import OBJ_DETECTION
Object_classes = ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
'hair drier', 'toothbrush' ]
Object_colors = list(np.random.rand(80,3)*255)
Object_detector = OBJ_DETECTION('weights/yolov5s.pt', Object_classes)
def gstreamer_pipeline(
    capture_width=1280,
    capture_height=720,
    display_width=1280,
    display_height=720,
    framerate=60,
    flip_method=0,
):
    return (
        "nvarguscamerasrc ! "
        "video/x-raw(memory:NVMM), "
        "width=(int)%d, height=(int)%d, "
        "format=(string)NV12, framerate=(fraction)%d/1 ! "
        "nvvidconv flip-method=%d ! "
        "video/x-raw, width=(int)%d, height=(int)%d, format=(string)BGRx ! "
        "videoconvert ! "
        "video/x-raw, format=(string)BGR ! appsink"
        % (
            capture_width,
            capture_height,
            framerate,
            flip_method,
            display_width,
            display_height,
        )
    )
print(gstreamer_pipeline(flip_method=0))
cap = cv2.VideoCapture(gstreamer_pipeline(flip_method=0), cv2.CAP_GSTREAMER)
if cap.isOpened():
    window_handle = cv2.namedWindow("CSI Camera", cv2.WINDOW_AUTOSIZE)
    while cv2.getWindowProperty("CSI Camera", 0) >= 0:
        ret, frame = cap.read()
        if ret:
            objs = Object_detector.detect(frame)
            for obj in objs:
                label = obj['label']
                score = obj['score']
                [(xmin,ymin),(xmax,ymax)] = obj['bbox']
                color = Object_colors[Object_classes.index(label)]
                frame = cv2.rectangle(frame, (xmin,ymin), (xmax,ymax), color, 2)
                frame = cv2.putText(frame, f'{label} ({str(score)})', (xmin,ymin), cv2.FONT_HERSHEY_SIMPLEX , 0.75, color, 1, cv2.LINE_AA)
        cv2.imshow("CSI Camera", frame)
        keyCode = cv2.waitKey(30)
        if keyCode == ord('q'):
            break
    cap.release()
    cv2.destroyAllWindows()
else:
    print("Unable to open camera")
export LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libgomp.so.1
is not likely to fix your problem, because the error occurred in python3.8/site-packages/torch/lib/libgomp-d22c30c5.so.1, the copy of libgomp that ships inside the torch package.
So the environment variable should point at that file instead:
export LD_PRELOAD=<parent path to python3.8>/python3.8/site-packages/torch/lib/libgomp-d22c30c5.so.1
Replace <parent path to python3.8> with the directory that contains your python3.8 directory.
To find it, see:
How do I find the location of my Python site-packages directory?
It is often /usr/lib. In that case, the required command is
export LD_PRELOAD=/usr/lib/python3.8/site-packages/torch/lib/libgomp-d22c30c5.so.1
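If you would rather not guess the parent path, a short script can look for the file. This is only a sketch: find_bundled_libgomp is a made-up helper, and it assumes torch was installed into one of the site-packages directories that sysconfig reports for the interpreter running the script (a user-level or virtualenv install may live elsewhere).

```python
import glob
import os
import sysconfig


def find_bundled_libgomp(site_dirs=None):
    """Return libgomp files found under <site-packages>/torch/lib."""
    if site_dirs is None:
        paths = sysconfig.get_paths()
        # purelib and platlib are usually the same directory
        site_dirs = {paths["purelib"], paths["platlib"]}
    found = []
    for site_dir in site_dirs:
        pattern = os.path.join(site_dir, "torch", "lib", "libgomp*.so*")
        found.extend(glob.glob(pattern))
    return sorted(set(found))


# Print a ready-to-paste export line for every match (nothing if none found).
for path in find_bundled_libgomp():
    print("export LD_PRELOAD=" + path)
```

Run it with the same python3.8 that you use for the detection script, so that the reported site-packages directory is the one torch actually lives in.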

if a lowercase element is in the list without changing the list to lowercase

I am trying to create a program that replaces any word entered in the input by the word that goes along with it in the dictionary. Here is the dictionary:
slang = {'phone': 'dog and bone', 'queen': 'baked bean', 'suit': 'whistle and flute', 'money': 'bees and honey', 'dead': 'brown bread', 'mate': 'china plate', 'shoes': 'dinky doos', 'telly': 'custard and jelly', 'boots': 'daisy roots', 'road': 'frog and toad', 'head': 'loaf of bread', 'soup': 'loop the loop', 'walk': 'ball and chalk', 'fork': 'roast pork', 'goal': 'sausage roll', 'stairs': 'apples and pears', 'face': 'boat race'}
And here is an example of the output this would make.
Sentence: I called the Queen on the phone
I called the baked bean on the dog and bone
I have tried to code this program, and I have gotten it to print out the output (nearly). I just don't know how to ask if the inputted words are inside the dictionary without replacing the word that is capitalised with a lowercased version.
Here's an example of my output:
Sentence: I called the Queen on the phone
i called the baked bean on the dog and bone
This is the code I have tried and I realise that this issue is arising because I am setting the sentence as lower at the beginning. I have tried to set the 'word' to lower before going into the for loop but this doesn't work either because 'word' is unknown until the for loop.
slang = {'phone': 'dog and bone', 'queen': 'baked bean', 'suit': 'whistle and flute', 'money': 'bees and honey', 'dead': 'brown bread', 'mate': 'china plate', 'shoes': 'dinky doos', 'telly': 'custard and jelly', 'boots': 'daisy roots', 'road': 'frog and toad', 'head': 'loaf of bread', 'soup': 'loop the loop', 'walk': 'ball and chalk', 'fork': 'roast pork', 'goal': 'sausage roll', 'stairs': 'apples and pears', 'face': 'boat race'}
new_sentence = []
sentence = input("Sentence: ").lower()
words_list = sentence.split()
for word in words_list:
    if word in slang:
        replace = slang[word]
        new_sentence.append(replace.lower())
    if word not in slang:
        new_sentence.append(word)
separator = " "
print(separator.join(new_sentence))
Thank you so much!
Something like the below:
slang = {'phone': 'dog and bone', 'queen': 'baked bean'}
def replace_with_slang(sentence):
    words = sentence.split(' ')
    temp = []
    for word in words:
        # Look up the lowercased word, but keep the original word if there is no match
        temp.append(slang.get(word.lower(), word))
    return ' '.join(temp)
print(replace_with_slang('I called the phone It was the queen '))
You can use a generator expression instead:
slang = {'phone': 'dog and bone', 'queen': 'baked bean', ...}
Sentence = "I called the Queen on the phone"
print(" ".join(slang.get(x.lower(), x) for x in Sentence.split()))
I called the baked bean on the dog and bone
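Splitting on whitespace stops matching as soon as a word carries punctuation ("phone." is not a key). One possible way around that, sketched below, is to let re.sub pick out the alphabetic runs and look each one up in lowercase. The two-entry dictionary and the [A-Za-z]+ pattern are assumptions for the example; the pattern would split words such as "don't" into two pieces.

```python
import re

slang = {'phone': 'dog and bone', 'queen': 'baked bean'}


def replace_with_slang(sentence, mapping=slang):
    """Swap known words for slang; leave case and punctuation of the rest."""
    def swap(match):
        word = match.group()
        # Compare in lowercase, but fall back to the word exactly as typed
        return mapping.get(word.lower(), word)
    return re.sub(r"[A-Za-z]+", swap, sentence)


print(replace_with_slang("I called the Queen, on the phone."))
# I called the baked bean, on the dog and bone.
```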

sorting dictionary by numeric value

I have a dict of music genres:
tag_weight = {'industrial': '533621', 'indie': '1971962', 'metal': '1213678', 'heavy metal': '652471', 'japanese': '428102', 'pop': '1873806', 'new wave': '399507', 'black metal': '772132', 'rap': '513024', 'ambient': '1030414', 'alternative': '2059313', 'hard rock': '820796', 'electronic': '2288563', 'blues': '531045', 'folk': '882178', 'classic rock': '1123712', 'alternative rock': '1123488', '90s': '447671', 'indie rock': '850515', 'death metal': '671118', 'electronica': '614494', 'female vocalists': '1557702', 'Soundtrack': '529406', 'dance': '769039', 'funk': '399843', 'psychedelic': '458710', '80s': '751871', 'piano': '409931', 'chillout': '636088', 'post-rock': '426516', 'punk rock': '518515', 'jazz': '1117114', 'seen live': '2097509', 'instrumental': '817816', 'singer-songwriter': '810185', 'metalcore': '444383', 'hardcore': '656111', 'Hip-Hop': '814630', 'hip hop': '394989', 'Classical': '539190', 'punk': '848955', 'soul': '641095', 'british': '667559', 'thrash metal': '465163', 'Progressive metal': '407220', 'rock': '3879179', 'acoustic': '460841', 'german': '409030', 'Progressive rock': '693480', 'experimental': '1010190'}
And I would like to tag them by popularity, that is, sort by value, from most to least popular.
Since dicts are unordered by nature, I must use tuples for that, and I've been trying to use this:
sorted_dict = sorted(tag_weight.items(), key=operator.itemgetter(0), reverse=True)
but it does not seem to be working, because it returns:
[('thrash metal', '465163'), ('soul', '641095'), ('singer-songwriter', '810185'), ('seen live', '2097511'), ('rock', '3879179'), ('rap', '513024'), ('punk rock', '518515'), ('punk', '848955'), ('psychedelic', '458710'), ('post-rock', '426516'), ('pop', '1873806'), ('piano', '409931'), ('new wave', '399507'), ('metalcore', '444383'), ('metal', '1213678'), ('jazz', '1117114'), ('japanese', '428102'), ('instrumental', '817816'), ('industrial', '533621'), ('indie rock', '850515'), ('indie', '1971962'), ('hip hop', '394989'), ('heavy metal', '652471'), ('hardcore', '656111'), ('hard rock', '820796'), ('german', '409030'), ('funk', '399843'), ('folk', '882178'), ('female vocalists', '1557702'), ('experimental', '1010190'), ('electronica', '614494'), ('electronic', '2288563'), ('death metal', '671118'), ('dance', '769039'), ('classic rock', '1123712'), ('chillout', '636088'), ('british', '667559'), ('blues', '531045'), ('black metal', '772132'), ('ambient', '1030414'), ('alternative rock', '1123488'), ('alternative', '2059313'), ('acoustic', '460841'), ('Soundtrack', '529406'), ('Progressive rock', '693480'), ('Progressive metal', '407220'), ('Hip-Hop', '814630'), ('Classical', '539190'), ('90s', '447671'), ('80s', '751871')]
and I guess ('rock', '3879179') should be at the top of the list.
What am I doing wrong?
Use collections.Counter which is built for this purpose:
import collections
# Convert values to int
tag_weight = {k: int(v) for k, v in tag_weight.items()}
count = collections.Counter(tag_weight)
# Print the top 10
print(count.most_common(10))
# Print all, from most popular to least
print(count.most_common())
Output of top 10:
[('rock', 3879179), ('electronic', 2288563), ('seen live', 2097509), ('alternative', 2059313), ('indie', 1971962), ('pop', 1873806), ('female vocalists', 1557702), ('metal', 1213678), ('classic rock', 1123712), ('alternative rock', 1123488)]
You're currently sorting on keys, not values, and you also want to do a type cast to integer to avoid sorting lexicographically:
sorted(tag_weight.items(), key=lambda x: int(x[1]), reverse=True)
#                                        ^^^^^^^^^ sort on values and do a type cast
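To make the lexicographic point concrete, here is a three-entry sketch (values taken from the question's dict) that sorts once on the raw strings and once on the integers.

```python
tag_weight = {"rock": "3879179", "pop": "1873806", "funk": "399843"}

# Strings compare character by character, so '399843' beats '3879179'
# at the second character ('9' > '8'), regardless of length.
by_string = sorted(tag_weight.items(), key=lambda kv: kv[1], reverse=True)

# Casting to int compares the actual magnitudes.
by_number = sorted(tag_weight.items(), key=lambda kv: int(kv[1]), reverse=True)

print([name for name, _ in by_string])  # ['funk', 'rock', 'pop']
print([name for name, _ in by_number])  # ['rock', 'pop', 'funk']

# On Python 3.7+ a dict keeps insertion order, so the result can go back in one.
ranked = dict(by_number)
```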

I am trying to import a set of lists in python and call a random item from one of the lists

#Set of lists I want to import into my python program called "setlist.txt"
---------------------------------------------------------------------------------------------
Tripolee = ('Saeed Younan', 'Matrixxman', 'Pete Tong', 'Dubfire', 'John Digweed', 'Carl Cox')
Ranch = ('Dabin', 'Galantis', 'Borgeous', 'Shpongle', 'ODESZA', 'Kaskade')
Sherwood = ('Nadus', 'Mr. Carmack', 'Wave Racer', 'Lido', 'Goldlink', 'Four Tet', 'Flume')
Jubilee = ('Chaz French', 'MartyParty', 'Sango', 'Brodinski', 'Phutureprimitive', 'EOTO')
The Hangar = ('Vourteque', 'The Gentlemen Callers', 'Bart&Baker', 'Jaga Jazzist', 'JPOD')
Forest = ('Vibe Street', 'Lafa Taylor', 'Vaski', 'Little People', 'jackLNDN', 'MartyParty')
---------------------------------------------------------------------------------------------
#program
from sys import exit
from random import randint
from sys import argv
script, setlist = argv
setlist = open(setlist)
print "Here is the setlist for day 1"
print setlist.read()
print "%r is playing on the Tripolee stage" % random.choice(setlist.readline(2))
I have a bunch more code in between all this that I'm not putting up here, but basically that last line is what I'm having trouble with.
Probably not the best format for your file but you can split and use ast.literal_eval:
from ast import literal_eval
with open("in.txt") as f:
    choices = [literal_eval(line.split(" = ")[-1]) for line in f]
Which will give you a list of tuples which you can pass to random.choice:
[('Saeed Younan', 'Matrixxman', 'Pete Tong', 'Dubfire', 'John Digweed', 'Carl Cox'), ('Dabin', 'Galantis', 'Borgeous', 'Shpongle', 'ODESZA', 'Kaskade'), ('Nadus', 'Mr. Carmack', 'Wave Racer', 'Lido', 'Goldlink', 'Four Tet', 'Flume'), ('Chaz French', 'MartyParty', 'Sango', 'Brodinski', 'Phutureprimitive', 'EOTO'), ('Vourteque', 'The Gentlemen Callers', 'Bart&Baker', 'Jaga Jazzist', 'JPOD'), ('Vibe Street', 'Lafa Taylor', 'Vaski', 'Little People', 'jackLNDN', 'MartyParty')]
I have no idea where setlist is supposed to come from; your file looks like a series of tuple assignments. setlist.readline(2) would read at most 2 bytes, or in your case nothing, as you have already exhausted the file by calling read.
I would suggest after extracting using literal_eval putting your file in a more usable format, maybe creating a dict using the name as the key and dumping the dict.
from ast import literal_eval
with open("in.txt") as f:
    choices = {}
    for line in f:
        ven, tpl = line.split(" = ")
        choices[ven] = literal_eval(tpl)
print(choices)
Output:
{'Jubilee': ('Chaz French', 'MartyParty', 'Sango', 'Brodinski', 'Phutureprimitive', 'EOTO'), 'Tripolee': ('Saeed Younan', 'Matrixxman', 'Pete Tong', 'Dubfire', 'John Digweed', 'Carl Cox'), 'The Hangar': ('Vourteque', 'The Gentlemen Callers', 'Bart&Baker', 'Jaga Jazzist', 'JPOD'), 'Ranch': ('Dabin', 'Galantis', 'Borgeous', 'Shpongle', 'ODESZA', 'Kaskade'), 'Sherwood': ('Nadus', 'Mr. Carmack', 'Wave Racer', 'Lido', 'Goldlink', 'Four Tet', 'Flume'), 'Forest': ('Vibe Street', 'Lafa Taylor', 'Vaski', 'Little People', 'jackLNDN', 'MartyParty')}
You can persist the dict using json.dump or the pickle module, so your data will be in a format that is much easier to load each time.
To make it a little clearer what you have below is the content of your .txt file:
---------------------------------------------------------------------------------------------
Tripolee = ('Saeed Younan', 'Matrixxman', 'Pete Tong', 'Dubfire', 'John Digweed', 'Carl Cox')
Ranch = ('Dabin', 'Galantis', 'Borgeous', 'Shpongle', 'ODESZA', 'Kaskade')
Sherwood = ('Nadus', 'Mr. Carmack', 'Wave Racer', 'Lido', 'Goldlink', 'Four Tet', 'Flume')
Jubilee = ('Chaz French', 'MartyParty', 'Sango', 'Brodinski', 'Phutureprimitive', 'EOTO')
The Hangar = ('Vourteque', 'The Gentlemen Callers', 'Bart&Baker', 'Jaga Jazzist', 'JPOD')
Forest = ('Vibe Street', 'Lafa Taylor', 'Vaski', 'Little People', 'jackLNDN', 'MartyParty')
To print the venue and set list you can use dict.items:
for ven, set_l in choices.items():
    print("Set list for {}: {}".format(ven, ", ".join(set_l)))
Output:
Set list for Jubilee: Chaz French, MartyParty, Sango, Brodinski, Phutureprimitive, EOTO
Set list for Tripolee: Saeed Younan, Matrixxman, Pete Tong, Dubfire, John Digweed, Carl Cox
Set list for The Hangar: Vourteque, The Gentlemen Callers, Bart&Baker, Jaga Jazzist, JPOD
Set list for Ranch: Dabin, Galantis, Borgeous, Shpongle, ODESZA, Kaskade
Set list for Sherwood: Nadus, Mr. Carmack, Wave Racer, Lido, Goldlink, Four Tet, Flume
Set list for Forest: Vibe Street, Lafa Taylor, Vaski, Little People, jackLNDN, MartyParty
When you open the file and call read, all of the file's content is returned as a string, which you then print. Next you try random.choice(setlist.readline(2)). readline(2) tries to read at most two bytes, but it cannot read anything because the file pointer is already at the end of the file after the call to read, so you get an empty string.
If you want to get a random string from the first tuple:
choices = [literal_eval(line.split(" = ")[-1]) for line in f]
from random import choice
print(choice(choices[0]))
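Putting the pieces together, here is a rough sketch of the whole flow: parse the assignments, round-trip them through JSON, and pick a random act for a named stage. SAMPLE stands in for the contents of the text file (shortened), and load_setlists is a made-up helper; lines without " = " (such as dashed separators) are skipped, which is an assumption about what the file may contain.

```python
import io
import json
import random
from ast import literal_eval

# Stand-in for the text file, shortened for the example
SAMPLE = """\
Tripolee = ('Saeed Younan', 'Matrixxman', 'Pete Tong')
The Hangar = ('Vourteque', 'JPOD')
"""


def load_setlists(lines):
    """Build {venue: [acts]} from lines shaped like: Name = ('a', 'b')."""
    setlists = {}
    for line in lines:
        if " = " not in line:
            continue  # skip blank lines, comments and separator lines
        venue, _, acts = line.partition(" = ")
        # Store lists rather than tuples so the data survives JSON unchanged
        setlists[venue.strip()] = list(literal_eval(acts.strip()))
    return setlists


setlists = load_setlists(io.StringIO(SAMPLE))
restored = json.loads(json.dumps(setlists))  # what json.dump/json.load would give
act = random.choice(restored["Tripolee"])
print("%s is playing on the Tripolee stage" % act)
```

With a real file, open it and pass the file object to load_setlists in place of the StringIO wrapper.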

What’s a good Python profanity filter library? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
Like https://stackoverflow.com/questions/1521646/best-profanity-filter, but for Python — and I’m looking for libraries I can run and control myself locally, as opposed to web services.
(And whilst it’s always great to hear your fundamental objections of principle to profanity filtering, I’m not specifically looking for them here. I know profanity filtering can’t pick up every hurtful thing being said. I know swearing, in the grand scheme of things, isn’t a particularly big issue. I know you need some human input to deal with issues of content. I’d just like to find a good library, and see what use I can make of it.)
I didn't find any Python profanity library, so I made one myself.
Parameters
filterlist
A list of regular expressions that match a forbidden word. Please do not use \b, it will be inserted depending on inside_words.
Example:
['bad', 'un\w+']
ignore_case
Default: True
Self-explanatory.
replacements
Default: "$#%-?!"
A string with characters from which the replacements strings will be randomly generated.
Examples: "%&$?!" or "-" etc.
complete
Default: True
Controls if the entire string will be replaced or if the first and last chars will be kept.
inside_words
Default: False
Controls if words are searched inside other words too. Disabling this
Module source
(examples at the end)
"""
Module that provides a class that filters profanities
"""
__author__ = "leoluk"
__version__ = '0.0.1'
import random
import re
class ProfanitiesFilter(object):
    def __init__(self, filterlist, ignore_case=True, replacements="$#%-?!",
                 complete=True, inside_words=False):
        """
        Inits the profanity filter.
        filterlist -- a list of regular expressions that
        matches words that are forbidden
        ignore_case -- ignore capitalization
        replacements -- string with characters to replace the forbidden word
        complete -- completely remove the word or keep the first and last char?
        inside_words -- search inside other words?
        """
        self.badwords = filterlist
        self.ignore_case = ignore_case
        self.replacements = replacements
        self.complete = complete
        self.inside_words = inside_words
    def _make_clean_word(self, length):
        """
        Generates a random replacement string of a given length
        using the chars in self.replacements.
        """
        return ''.join([random.choice(self.replacements) for i in
                        range(length)])
    def __replacer(self, match):
        value = match.group()
        if self.complete:
            return self._make_clean_word(len(value))
        else:
            return value[0]+self._make_clean_word(len(value)-2)+value[-1]
    def clean(self, text):
        """Cleans a string from profanity."""
        regexp_insidewords = {
            True: r'(%s)',
            False: r'\b(%s)\b',
        }
        regexp = (regexp_insidewords[self.inside_words] %
                  '|'.join(self.badwords))
        r = re.compile(regexp, re.IGNORECASE if self.ignore_case else 0)
        return r.sub(self.__replacer, text)
if __name__ == '__main__':
    # Raw string so that \w reaches the regex engine unchanged
    f = ProfanitiesFilter(['bad', r'un\w+'], replacements="-")
    example = "I am doing bad ungood badlike things."
    print(f.clean(example))
    # Returns "I am doing --- ------ badlike things."
    f.inside_words = True
    print(f.clean(example))
    # Returns "I am doing --- ------ ---like things."
    f.complete = False
    print(f.clean(example))
    # Returns "I am doing b-d u----d b-dlike things."
arrBad = [
'2g1c',
'2 girls 1 cup',
'acrotomophilia',
'anal',
'anilingus',
'anus',
'arsehole',
'ass',
'asshole',
'assmunch',
'auto erotic',
'autoerotic',
'babeland',
'baby batter',
'ball gag',
'ball gravy',
'ball kicking',
'ball licking',
'ball sack',
'ball sucking',
'bangbros',
'bareback',
'barely legal',
'barenaked',
'bastardo',
'bastinado',
'bbw',
'bdsm',
'beaver cleaver',
'beaver lips',
'bestiality',
'bi curious',
'big black',
'big breasts',
'big knockers',
'big tits',
'bimbos',
'birdlock',
'bitch',
'black cock',
'blonde action',
'blonde on blonde action',
'blow j',
'blow your l',
'blue waffle',
'blumpkin',
'bollocks',
'bondage',
'boner',
'boob',
'boobs',
'booty call',
'brown showers',
'brunette action',
'bukkake',
'bulldyke',
'bullet vibe',
'bung hole',
'bunghole',
'busty',
'butt',
'buttcheeks',
'butthole',
'camel toe',
'camgirl',
'camslut',
'camwhore',
'carpet muncher',
'carpetmuncher',
'chocolate rosebuds',
'circlejerk',
'cleveland steamer',
'clit',
'clitoris',
'clover clamps',
'clusterfuck',
'cock',
'cocks',
'coprolagnia',
'coprophilia',
'cornhole',
'cum',
'cumming',
'cunnilingus',
'cunt',
'darkie',
'date rape',
'daterape',
'deep throat',
'deepthroat',
'dick',
'dildo',
'dirty pillows',
'dirty sanchez',
'dog style',
'doggie style',
'doggiestyle',
'doggy style',
'doggystyle',
'dolcett',
'domination',
'dominatrix',
'dommes',
'donkey punch',
'double dong',
'double penetration',
'dp action',
'eat my ass',
'ecchi',
'ejaculation',
'erotic',
'erotism',
'escort',
'ethical slut',
'eunuch',
'faggot',
'fecal',
'felch',
'fellatio',
'feltch',
'female squirting',
'femdom',
'figging',
'fingering',
'fisting',
'foot fetish',
'footjob',
'frotting',
'fuck',
'fucking',
'fuck buttons',
'fudge packer',
'fudgepacker',
'futanari',
'g-spot',
'gang bang',
'gay sex',
'genitals',
'giant cock',
'girl on',
'girl on top',
'girls gone wild',
'goatcx',
'goatse',
'gokkun',
'golden shower',
'goo girl',
'goodpoop',
'goregasm',
'grope',
'group sex',
'guro',
'hand job',
'handjob',
'hard core',
'hardcore',
'hentai',
'homoerotic',
'honkey',
'hooker',
'hot chick',
'how to kill',
'how to murder',
'huge fat',
'humping',
'incest',
'intercourse',
'jack off',
'jail bait',
'jailbait',
'jerk off',
'jigaboo',
'jiggaboo',
'jiggerboo',
'jizz',
'juggs',
'kike',
'kinbaku',
'kinkster',
'kinky',
'knobbing',
'leather restraint',
'leather straight jacket',
'lemon party',
'lolita',
'lovemaking',
'make me come',
'male squirting',
'masturbate',
'menage a trois',
'milf',
'missionary position',
'motherfucker',
'mound of venus',
'mr hands',
'muff diver',
'muffdiving',
'nambla',
'nawashi',
'negro',
'neonazi',
'nig nog',
'nigga',
'nigger',
'nimphomania',
'nipple',
'nipples',
'nsfw images',
'nude',
'nudity',
'nympho',
'nymphomania',
'octopussy',
'omorashi',
'one cup two girls',
'one guy one jar',
'orgasm',
'orgy',
'paedophile',
'panties',
'panty',
'pedobear',
'pedophile',
'pegging',
'penis',
'phone sex',
'piece of shit',
'piss pig',
'pissing',
'pisspig',
'playboy',
'pleasure chest',
'pole smoker',
'ponyplay',
'poof',
'poop chute',
'poopchute',
'porn',
'porno',
'pornography',
'prince albert piercing',
'pthc',
'pubes',
'pussy',
'queaf',
'raghead',
'raging boner',
'rape',
'raping',
'rapist',
'rectum',
'reverse cowgirl',
'rimjob',
'rimming',
'rosy palm',
'rosy palm and her 5 sisters',
'rusty trombone',
's&m',
'sadism',
'scat',
'schlong',
'scissoring',
'semen',
'sex',
'sexo',
'sexy',
'shaved beaver',
'shaved pussy',
'shemale',
'shibari',
'shit',
'shota',
'shrimping',
'slanteye',
'slut',
'smut',
'snatch',
'snowballing',
'sodomize',
'sodomy',
'spic',
'spooge',
'spread legs',
'strap on',
'strapon',
'strappado',
'strip club',
'style doggy',
'suck',
'sucks',
'suicide girls',
'sultry women',
'swastika',
'swinger',
'tainted love',
'taste my',
'tea bagging',
'threesome',
'throating',
'tied up',
'tight white',
'tit',
'tits',
'titties',
'titty',
'tongue in a',
'topless',
'tosser',
'towelhead',
'tranny',
'tribadism',
'tub girl',
'tubgirl',
'tushy',
'twat',
'twink',
'twinkie',
'two girls one cup',
'undressing',
'upskirt',
'urethra play',
'urophilia',
'vagina',
'venus mound',
'vibrator',
'violet blue',
'violet wand',
'vorarephilia',
'voyeur',
'vulva',
'wank',
'wet dream',
'wetback',
'white power',
'women rapping',
'wrapping men',
'wrinkled starfish',
'xx',
'xxx',
'yaoi',
'yellow showers',
'yiffy',
'zoophilia']
def profanityFilter(text):
    brokenStr1 = text.split()
    badWordMask = '!##$%!##$%^~!#%^~##$%!##$%^~!'
    new = ''
    for word in brokenStr1:
        if word in arrBad:
            print(word + ' <--Bad word!')
            text = text.replace(word,badWordMask[:len(word)])
            # print(new)
    return text
print(profanityFilter("this thing sucks sucks sucks fucking stuff"))
You can add to or remove from the bad words list, arrBad, as you please.
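One limitation of splitting on whitespace is that multi-word entries in the list can never match, and text.replace also hits the same letters inside longer words. A possible variation, sketched here with a harmless placeholder list, compiles the entries into a single regular expression with word boundaries. build_filter and mask_char are names invented for this sketch, and it assumes a non-empty list whose entries start and end with a letter or digit.

```python
import re


def build_filter(bad_words, mask_char="*"):
    """Return a function that masks whole-word matches of bad_words."""
    # Longest entries first, so "darn it" is tried before "darn"
    ordered = sorted(bad_words, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:%s)\b" % "|".join(re.escape(w) for w in ordered),
        re.IGNORECASE,
    )

    def clean(text):
        # Replace each match with a mask of the same length
        return pattern.sub(lambda m: mask_char * len(m.group()), text)

    return clean


clean = build_filter(["darn", "heck", "darn it"])
print(clean("Darn it, what the heck? Heckle away, darning needles."))
# *******, what the ****? Heckle away, darning needles.
```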
WebPurify is a Profanity Filter Library for Python
You could probably combine http://spambayes.sourceforge.net/ and http://www.cs.cmu.edu/~biglou/resources/bad-words.txt.
Profanity? What the f***'s that? ;-)
It will still take a couple of years before a computer will really be able to recognize swearing and cursing and it is my sincere hope that people will have understood by then that profanity is human and not "dangerous."
Instead of a dumb filter, have a smart human moderator who can balance the tone of discussion as appropriate. A moderator who can detect abuse like:
"If you were my husband, I'd poison your tea." - "If you were my wife, I'd drink it."
(that was from Winston Churchill, btw.)
It's possible for users to work around this, of course, but it should do a fairly thorough job of removing profanity:
import re
def remove_profanity(s):
    def repl(word):
        m = re.match(r"(\w+)(.*)", word)
        if not m:
            return word
        word = "Bork" if m.group(1)[0].isupper() else "bork"
        word += m.group(2)
        return word
    return " ".join([repl(w) for w in s.split(" ")])
print(remove_profanity("You just come along with me and have a good time. The Galaxy's a fun place. You'll need to have this fish in your ear."))
