I'm using a simple check to see whether any banned words appear in a string, but I'd like to improve it to display the word in question, so I added the line
print ("BANNED WORD DETECTED : ", word)
But I get the error
"NameError: name 'word' is not defined"
I think the problem is that my code just checks whether any of the words is in the string without "storing" the match anywhere. Maybe I am misunderstanding how Python lists work. Any advice on what I should modify?
# -*- coding: utf-8 -*-
bannedWords = ['word1','word2','check']
mystring = "the string i'd like to check"
if any(word in mystring for word in bannedWords):
    print("BANNED WORD DETECTED : ", word)
else:
    print(mystring)
any() only returns True or False, so it cannot tell you which word matched. Use a generator expression with next() instead, or a list comprehension:
banned_word = next((word for word in mystring.split() if word in bannedWords), None)
if banned_word is not None:
    print("BANNED WORD DETECTED : ", banned_word)
Or for multiple words:
banned_words = [word for word in mystring.split() if word in bannedWords]
if banned_words:
    print("BANNED WORD DETECTED : ", ','.join(banned_words))
For O(1) average-case membership testing, make bannedWords a set rather than a list.
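As a rough sketch of the advice above, the snippet below combines next() and a list comprehension with a set of banned words. The names banned_words, first_match and all_matches are illustrative and not from the original post; note that splitting on whitespace means punctuation attached to a word would prevent a match.

```python
# Hypothetical data, reusing the example values from the question.
banned_words = {"word1", "word2", "check"}   # a set gives average O(1) lookups
mystring = "the string i'd like to check"

# First banned word found in the string, or None if there is none.
first_match = next((w for w in mystring.split() if w in banned_words), None)

# Every banned word found in the string, in order of appearance.
all_matches = [w for w in mystring.split() if w in banned_words]

if first_match is not None:
    print("BANNED WORD DETECTED:", first_match)
else:
    print(mystring)
```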
Don't use any here. A generator isn't the right tool either. You actually want a list comprehension to collect all the matching words.
matching = [word for word in bannedWords if word in mystring]
if matching:
    print("BANNED WORD(S) DETECTED : ", ','.join(matching))
You could do that very easily.
Just use a flag variable to record whether anything was found within the loop.
detected = False
for word in bannedWords:
    if word in mystring:
        detected = True
        print("Detected Banned word ", word)
if not detected:
    print(mystring)
If you want a more Pythonic way:
print("Banned words are {}".format([word for word in bannedWords if word in mystring]) if len([word for word in bannedWords if word in mystring]) else mystring)
I would like to check against a list of words if they are within a string.
For Example:
listofwords = ['hi','bye','yes','no']
String = 'Hi how are you'
string2 = 'I have none of the words'
String 1 is true as it contains 'hi' and string2 is false as it does not.
I have tried the following code but it always returns false.
if any(ext in String for ext in listofwords):
    print(String)
I would also like to show what the matching word was to check this is correct.
hi and Hi are different words. Use .lower() before comparing.
if any(ext.lower() in String.lower() for ext in listofwords):
    print(String)
Update:
To print the matching word, use a for loop to iterate and print the words that match.
Example:
listofwords = ['hi','bye','yes','no']
String = 'Hi how are you'
string2 = 'I have none of the words'
for word in listofwords:
    if word.lower() in map(str.lower, String.split()):  # map both of the words to lowercase before matching
        print(word)
for word in listofwords:
    if word.lower() in map(str.lower, string2.split()):  # map both of the words to lowercase before matching
        print(word)
PS: This is not an optimized version. You can store the String.split() results in a list before iterating, which saves time for larger strings. The purpose of the code is only to demonstrate the use of lower case.
Python is case sensitive. Hence hi is not equal to Hi. This works:
listofwords = ['hi','bye','yes','no']
String = 'hi how are you'
string2 = 'I have none of the words'
if any(ext in String for ext in listofwords):
    print(String)
The problem is both with case-sensitivity and with using in directly with a string.
If you want to make your search case-insensitive, convert both the String and the word to lower case. If you want to match whole words rather than substrings, also split the string after lower-casing it:
if any(ext.lower() in String.lower().split() for ext in listofwords):
    print(String)
Splitting avoids returning True for substrings such as no in none; a match requires that no (or any other word) is present on its own. So now the above will work for both String (it will print it) and for string2 (it will not print it).
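Since the question also asks which word matched, here is a minimal sketch that returns the matching words instead of a bare True/False. The helper name matching_words is hypothetical; it assumes words are separated by whitespace, so punctuation attached to a word (for example "hi,") would not match.

```python
def matching_words(text, words):
    """Return the entries of `words` that occur as whole words in `text`, ignoring case."""
    tokens = set(text.lower().split())   # split once, reuse for every lookup
    return [w for w in words if w.lower() in tokens]

listofwords = ['hi', 'bye', 'yes', 'no']
print(matching_words('Hi how are you', listofwords))            # ['hi']
print(matching_words('I have none of the words', listofwords))  # []
```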
I am having an issue where I want to find a word inside a text, meaning that if the word is contained in the text, it should be printed.
The issue is this: let's say my word is "lo" and my text is "hello guys, my name is Stackoverflow". It will print the whole text, because there is a lo inside "hello" and "stackoverflow".
My question is: when I search for a word such as "lo", how can I make it match only as a whole word, and not when it is merely contained inside another word such as "hello" or "stackoverflow"?
keywords = ["Lo"]
for text in keywords:
    if text in text_roman():
        print("Yay found word")
Split up the string into words, then test for the substring in each of the words.
for word in s.split():
    if q in word:
        print(word)
You could do this but there are like... 400 edge cases that will make this a problem.
text = "This is my text"
keywords = ["Lo"]
if len(set(text.split()).intersection(set(keywords))) > 0:
    print("Yes")
Use str.find(). It returns the index of the substring you're looking for, or -1 if it is not found, so you can use an if statement to check whether it is a substring or not.
s = 'Hello there Stack!'
if s.find('llo') != -1:
    print('String found')
Hope this helped!
The most straightforward way is probably to use a regex. You can experiment with regexes in an online regex tester, and the documentation for Python's re module shows how to use them in Python.
import re
target_strings = ["lo", "stack", "hell", "cow", "hello", "overf"]
for target in target_strings:
    re_target = re.compile(r"\b({})\b".format(target), flags=re.IGNORECASE)
    if re.search(re_target, "Hello stack overflow lo"):
        print(target)
>>> lo
>>> stack
>>> hello
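A variation on the regex approach, sketched under the assumption that the keywords are plain words: re.escape keeps any special characters in a keyword from being read as regex syntax. The function name find_whole_words is illustrative only.

```python
import re

def find_whole_words(keywords, text):
    """Return the keywords that appear in `text` as standalone words, ignoring case."""
    found = []
    for keyword in keywords:
        # re.escape keeps characters such as "." or "+" in a keyword literal.
        pattern = r"\b" + re.escape(keyword) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(keyword)
    return found

print(find_whole_words(["lo", "stack"], "hello guys, my name is Stackoverflow"))  # []
print(find_whole_words(["lo", "stack"], "Hello stack overflow lo"))               # ['lo', 'stack']
```

Because \b only matches between a word character and a non-word character, this sketch is best suited to keywords that start and end with letters, digits or underscores.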
I have created a list of all possible lines for this specific wordgrid (diagonals, up, down, and all the reverses too).
I have called this allWords, but when I try to find specific words that I know are in allWords, the loop does not find the hidden words. I know what my problem is, but I do not know how to get around it (sorry for the terrible explanation; hopefully the example below will show it better).
An example follows. My wordList is the list of words that I know are hidden somewhere in the wordgrid. My allWords is a list of rows, columns and diagonals from the wordgrid:
WordList = ['HAMMER','....']
allWords = ['ARBHAMMERTYU','...']
HAMMER is in allWords but 'cloaked' by the other characters around it, so I am unable to show that HAMMER is in the wordgrid.
length = len(allWords)
for i in range(length):
    word = allWords[i]
    if word in wordList:
        print("I have found", word)
It does not find the word HAMMER in allWords.
Any help towards solving this problem would be great
You are not comparing each word in WordList against each entry of allWords. The line if word in wordList tests for an exact match.
i.e.
if word in wordList will return True only if the whole string (here 'ARBHAMMERTYU') is itself an element of wordList.
To match a substring you need another loop:
length = len(allWords)
for i in range(length):
    word = allWords[i]
    for w in WordList:
        if w in word:
            print("I have found", w)
If I understand your problem correctly, you probably need to implement a function that checks if a token (e.g. 'HAMMER') is present in any of the entries in allWords. My best bet for solution would be to use regular expressions.
import re
def findWordInWordList(word, allWords):
    # re.escape treats the word literally; search() already scans the whole string
    pattern = re.compile(re.escape(word))
    for item in allWords:
        match = pattern.search(item)
        if match:
            return match
This will return the first occurrence (or None if there is none). If you want more, it's easy to collect them in a list.
You could try something like this:
for word in allWords:
    if word in WordList:
        print("I have found", word)
Ah, or maybe the error is that you wrote wordList and you really defined WordList. Hope this helps.
If I understand correctly, you are trying to find a match inside allWords and you want to iterate over WordList and determine if there is a substring match.
So, if that is correct, then your code is not exactly doing that. To go through your code step by step to correct what is happening:
length = len(allWords)
for i in range(length):
What you want to do above is not necessarily go over your allWords. You want to iterate over WordList and see if it is inside allWords. You are not doing that, instead you want to do this:
length = len(WordList)
for i in range(length):
With that in mind, that means now you want to reference WordList and not allWords, so you want to now change this:
word = allWords[i]
to this:
word = WordList[i]
Finally, here is a new piece of information to determine whether you in fact have a substring match in the strings you are comparing: the built-in function any(). It returns True if at least one element of the iterable it is given is true. It looks like this:
any("something" in word for word in words)
It returns True if "something" is in at least one word; otherwise it returns False.
So, to put this all together, and run your code with your sample input, we get:
WordList = ['HAMMER','....']
allWords = ['ARBHAMMERTYU','...']
length = len(WordList)
for i in range(length):
    word = WordList[i]
    if any(word in w for w in allWords):
        print("I have found", word)
Output:
I have found HAMMER
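If it is also useful to know where each hidden word was found, a small sketch along the same lines can map every word to the grid lines that contain it. The function name locate_words and the extra sample entries ('SAW', 'QWERTY') are made up for illustration.

```python
def locate_words(word_list, all_words):
    """Map each hidden word to the grid lines (rows, columns, diagonals) that contain it."""
    return {
        word: [line for line in all_words if word in line]
        for word in word_list
    }

# Hypothetical sample data in the spirit of the question.
WordList = ['HAMMER', 'SAW']
allWords = ['ARBHAMMERTYU', 'QWERTY']

for word, lines in locate_words(WordList, allWords).items():
    if lines:
        print("I have found", word, "in", lines)
    else:
        print("I have not found", word)
```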
I'm trying to store the value of word, is there a way to do this?
if any(word in currentFile for word in otherFile):
Don't use any if you want the words themselves:
words = [word for word in otherFile if word in currentFile]
Then you can truth-test directly (since an empty list is falsy):
if words:
    pass  # do stuff
And also access the words that matched:
print(words)
EDIT: If you only want the first matching word, you can do that too:
word = next((word for word in otherFile if word in currentFile), None)
if word:
    pass  # do stuff with word
Just a little follow-up:
You should consider what the input to the any() function is here. The input is a generator. So let's break it down:
word in currentFile is a boolean expression: its value is True or False
for word in otherFile performs an iteration over otherFile
So the argument to any() is in fact a generator of boolean values. You can check this by simply evaluating [word in currentFile for word in otherFile]. Note that the brackets mean a list is created, with all values computed at once. A generator works functionally the same (if all you do is a single loop over the values) but is better memory-wise. The point is that what you feed to any() is a sequence of booleans. It has no knowledge of the actual words, and therefore it cannot possibly output one.
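To make the point concrete, here is a small sketch with made-up values for currentFile and otherFile (the question does not show their contents). It prints what any() actually receives, then pairs the flags back up with the words.

```python
# Hypothetical stand-ins for the question's variables.
currentFile = "the quick brown fox"
otherFile = ["dog", "fox", "cat"]

# What any() actually iterates over: booleans, with no words attached.
flags = [word in currentFile for word in otherFile]
print(flags)        # [False, True, False]
print(any(flags))   # True

# Pairing each word with its flag keeps the information any() discards.
matches = [word for word, hit in zip(otherFile, flags) if hit]
print(matches)      # ['fox']
```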
No. You'll have to write an explicit loop:
def find_first(currentFile, otherFile):
    for word in currentFile:
        if word in otherFile:
            return word
If no match is found, the function implicitly returns None, which the caller of find_first() can handle.
You're not going to be able to store this value directly from any. I'd recommend a for loop:
for word in otherFile:
    if word in currentFile:
        break
else:
    word = None
if word is not None:
    print(word, "was found in the current file")
Note that this will store only the first relevant value of word. If you would like all relevant values of word, then this should do it:
words = [word for word in otherFile if word in currentFile]
for word in words:
    print(word, "was found in the current file")
You can get the first word from otherFile that is also in currentFile by dropping all words from otherFile that are not in currentFile and then taking the next one:
from itertools import dropwhile
word = next(dropwhile(lambda word: word not in currentFile, otherFile))
If there is no such word, this raises StopIteration.
You can get all words from otherFile that are also in currentFile by using a list comprehension:
words = [word for word in otherFile if word in currentFile]
Or by using a set intersection:
words = list(set(otherFile) & set(currentFile))
Or by using the filter function (in Python 3 it returns a lazy iterator, so wrap it in list() if you need a list):
words = list(filter(lambda word: word in currentFile, otherFile))
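The differences between these options are easier to see side by side. The sketch below uses invented sample lists for currentFile and otherFile; passing a default to next() is one way to avoid the StopIteration mentioned above.

```python
from itertools import dropwhile

# Hypothetical stand-ins for the question's variables.
currentFile = ["apple", "banana", "cherry"]
otherFile = ["kiwi", "cherry", "apple"]

# First match; the default (None) avoids StopIteration when nothing matches.
first = next(dropwhile(lambda w: w not in currentFile, otherFile), None)
print(first)      # cherry

# All matches, in the order they appear in otherFile.
in_order = [w for w in otherFile if w in currentFile]
print(in_order)   # ['cherry', 'apple']

# Set intersection finds the same words but does not preserve order,
# so sort the result if a stable order matters.
common = sorted(set(otherFile) & set(currentFile))
print(common)     # ['apple', 'cherry']
```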
I have a list of stopwords. And I have a search string. I want to remove the words from the string.
As an example:
stopwords=['what','who','is','a','at','is','he']
query='What is hello'
Now the code should strip 'What' and 'is'. However in my case it strips 'a', as well as 'at'. I have given my code below. What could I be doing wrong?
for word in stopwords:
    if word in query:
        print(word)
        query = query.replace(word, "")
If the input query is "What is Hello", I get the output as:
wht s llo
Why does this happen?
This is one way to do it:
query = 'What is hello'
stopwords = ['what', 'who', 'is', 'a', 'at', 'is', 'he']
querywords = query.split()
resultwords = [word for word in querywords if word.lower() not in stopwords]
result = ' '.join(resultwords)
print(result)
I noticed that you want to also remove a word if its lower-case variant is in the list, so I've added a call to lower() in the condition check.
The accepted answer works when the words are separated by spaces, but real text often has punctuation between the words. In that case re.split is required.
Also, testing against stopwords as a set makes lookups faster (although with a small number of words the cost of hashing the strings can offset the gain).
My proposal:
import re
query = 'What is hello? Says Who?'
stopwords = {'what','who','is','a','at','is','he'}
resultwords = [word for word in re.split(r"\W+", query) if word.lower() not in stopwords]
print(resultwords)
Output (as a list of words):
['hello', 'Says', '']
There's an empty string at the end, because re.split emits an empty field when the string ends with a separator, and it needs filtering out. Two solutions here:
resultwords = [word for word in re.split(r"\W+", query) if word and word.lower() not in stopwords]  # filter out empty words
or add the empty string to the set of stopwords :)
stopwords = {'what','who','is','a','at','is','he',''}
Now the code prints:
['hello', 'Says']
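Another option, offered as a sketch and not as part of the answer above, is to extract the words with re.findall instead of splitting on the separators. Since findall only returns the matched word tokens, there are no empty strings to filter.

```python
import re

query = 'What is hello? Says Who?'
stopwords = {'what', 'who', 'is', 'a', 'at', 'he'}

# findall returns only the word tokens, so no empty strings need filtering out.
resultwords = [w for w in re.findall(r"\w+", query) if w.lower() not in stopwords]
print(resultwords)            # ['hello', 'Says']
print(' '.join(resultwords))  # hello Says
```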
Building on what karthikr said, try:
' '.join(filter(lambda x: x.lower() not in stopwords, query.split()))
Explanation:
query.split()  # splits query on whitespace, i.e. "What is hello" -> ["What", "is", "hello"]
filter(func, iterable)  # takes a function and an iterable (list, string, etc.) and
                        # keeps only the items for which the function returns True;
                        # the function receives one item at a time
lambda x: x.lower() not in stopwords  # anonymous function that takes an item,
                                      # converts it to lower case, and returns True if
                                      # the word is not in the iterable stopwords
' '.join(iterable)  # joins all items of the iterable (items must be strings)
                    # using the string in front of the dot, i.e. ' ', as a joiner,
                    # e.g. ["What", "is", "hello"] -> "What is hello"
Looking at the other answers to your question I noticed that they told you how to do what you are trying to do, but they did not answer the question you posed at the end.
If the input query is "What is Hello", I get the output as:
wht s llo
Why does this happen?
This happens because .replace() replaces every occurrence of the exact substring you give it, wherever it appears, including inside other words.
For example:
"My, my! Hello my friendly mystery".replace("my", "")
gives:
>>> "My, ! Hello  friendly stery"
.replace() is essentially splitting the string by the substring given as the first parameter and joining it back together with the second parameter.
"hello".replace("he", "je")
is logically similar to:
"je".join("hello".split("he"))
If you were still wanting to use .replace to remove whole words you might think adding a space before and after would be enough, but this leaves out words at the beginning and end of the string as well as punctuated versions of the substring.
"My, my! hello my friendly mystery".replace(" my ", " ")
>>> "My, my! hello friendly mystery"
"My, my! hello my friendly mystery".replace(" my", "")
>>> "My,! hello friendlystery"
"My, my! hello my friendly mystery".replace("my ", "")
>>> "My, my! hello friendly mystery"
Additionally, adding spaces before and after will not catch duplicates as it has already processed the first sub-string and will ignore it in favor of continuing on:
"hello my my friend".replace(" my ", " ")
>>> "hello my friend"
For these reasons, the accepted answer by Robby Cornelissen is the recommended way to do what you want:
" ".join([x for x in query.split() if x.lower() not in stopwords])
stopwords = ['for', 'or', 'to']
p = 'Asking for help, clarification, or responding to other answers.'
for i in stopwords:
    n = p.replace(i, '')
    p = n
print(p)
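A plain replace() loop like the one above still removes or from inside for, and would do the same inside any longer word. If whole-word removal is the goal, one possible sketch uses re.sub with word boundaries. The function name remove_stopwords is hypothetical, and the sketch assumes the stopwords consist of ordinary word characters.

```python
import re

def remove_stopwords(query, stopwords):
    """Remove whole-word, case-insensitive occurrences of each stopword from `query`."""
    # One alternation of escaped words, anchored at word boundaries.
    pattern = r"\b(?:" + "|".join(map(re.escape, stopwords)) + r")\b"
    stripped = re.sub(pattern, "", query, flags=re.IGNORECASE)
    # Collapse the whitespace left behind by the removed words.
    return " ".join(stripped.split())

print(remove_stopwords("What is hello", ['what', 'who', 'is', 'a', 'at', 'he']))
# hello
print(remove_stopwords('Asking for help, clarification, or responding to other answers.',
                       ['for', 'or', 'to']))
# Asking help, clarification, responding other answers.
```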