Check if text within a list is within a string - python

I would like to check whether any word from a list occurs within a string.
For example:
listofwords = ['hi','bye','yes','no']
String = 'Hi how are you'
string2 = 'I have none of the words'
String is true because it contains 'hi', and string2 is false because it does not.
I have tried the following code, but it always returns False.
if any(ext in String for ext in listofwords):
    print(String)
I would also like to show which word matched, to check that this is correct.

'hi' and 'Hi' are different strings. Call .lower() on both sides before comparing.
if any(ext.lower() in String.lower() for ext in listofwords):
    print(String)
Update:
To print the matching words, iterate with a for loop and print each word that matches.
Example:
listofwords = ['hi','bye','yes','no']
String = 'Hi how are you'
string2 = 'I have none of the words'
for word in listofwords:
    if word.lower() in map(str.lower, String.split()):  # lowercase both sides before matching
        print(word)
for word in listofwords:
    if word.lower() in map(str.lower, string2.split()):  # lowercase both sides before matching
        print(word)
PS: This is not an optimized version. You can store the result of String.split() in a list before iterating, which saves time for larger strings. The purpose of the code is only to demonstrate the use of lower().

Python is case-sensitive, so 'hi' is not equal to 'Hi'. This works:
listofwords = ['hi','bye','yes','no']
String = 'hi how are you'
string2 = 'I have none of the words'
if any(ext in String for ext in listofwords):
    print(String)

The problem is both case sensitivity and using in directly on a string.
To make the search case-insensitive, convert both the string and each word to lower case. To match whole words only, also split the string after lower-casing it:
if any(ext.lower() in String.lower().split() for ext in listofwords):
    print(String)
Splitting avoids returning True for substrings such as no in none; a word now matches only when it appears on its own. The code above therefore works for both String (it is printed) and string2 (it is not).
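To also report which word matched, as the question asks, one option is to collect the matches instead of reducing them to a single boolean. The following is a minimal sketch, assuming whole-word, case-insensitive matching on whitespace-separated text; the helper name find_matches is made up for illustration.

```python
def find_matches(words, text):
    """Return the words from `words` that occur as whole words in `text`."""
    # Lowercase and split once, then use a set for fast membership tests.
    tokens = set(text.lower().split())
    return [word for word in words if word.lower() in tokens]


listofwords = ['hi', 'bye', 'yes', 'no']
print(find_matches(listofwords, 'Hi how are you'))            # ['hi']
print(find_matches(listofwords, 'I have none of the words'))  # []
```

An empty list is falsy, so the result can still be used directly in an if statement.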

Related

Convert titlecase words in the string to lowercase words

I want to convert all the titlecase words (words that start with an uppercase character and have the rest of their characters in lowercase) in a string to lowercase. For example, if my initial string is:
text = " ALL people ARE Great"
I want my resultant string to be:
"ALL people ARE great"
I tried the following, but it did not work:
text = text.split()
for i in text:
    if i in [word for word in a if not word.islower() and not word.isupper()]:
        text[i] = text[i].lower()
I also checked the related question "Check if string is upper, lower, or mixed case in Python". I want to iterate over my dataframe and apply this to each word that meets the criterion.
You could define your own transform function:
def transform(s):
    # A single uppercase letter counts as titlecase
    if len(s) == 1 and s.isupper():
        return s.lower()
    # First character uppercase, all remaining characters lowercase
    if s[0].isupper() and s[1:].islower():
        return s.lower()
    return s
text = " ALL people ARE Great"
final_text = " ".join([transform(word) for word in text.split()])
You can use str.istitle() to check whether a word is titlecased, i.e. whether its first character is uppercase and the rest are lowercase.
To get your desired result, you need to:
Convert your string to a list of words using str.split()
Transform each word using str.istitle() and str.lower() (I am using a list comprehension to iterate over the list and generate a new list of words in the desired format)
Join the list back into a string using str.join()
For example:
>>> text = " ALL people ARE Great"
>>> ' '.join([word.lower() if word.istitle() else word for word in text.split()])
'ALL people ARE great'
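Note that joining the result of str.split() collapses runs of whitespace and drops the leading space in text. If the original spacing matters, a regular expression substitution is one alternative. This is only a sketch: it assumes ASCII letters, and the function name lower_titlecase is hypothetical.

```python
import re

def lower_titlecase(text):
    # Match one uppercase letter followed only by lowercase letters,
    # bounded on both sides so that "ALL" or "SomeThing" are left alone.
    return re.sub(r"\b[A-Z][a-z]*\b", lambda m: m.group(0).lower(), text)

print(repr(lower_titlecase(" ALL people ARE Great")))  # ' ALL people ARE great'
```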

How to replace multiple strings with one string with downcase in python?

I need to write a function that converts every case variant of a word to lowercase.
For example, a paragraph contains the word 'something' in different forms such as 'Something', 'SomeThing', 'SOMETHING' and 'SomeTHing', and all of them need to be converted to the lowercase 'something'.
How can I write a function that does this replacement?
You can split your paragraph into different words, then use the slugify module to generate a slug of each word, compare it with "something", and if there is a match, replace the word with "something".
In [1]: text = "This paragraph contains Something, SOMETHING, AND SomeTHing"
In [2]: from slugify import slugify
In [3]: for word in text.split(" "): # Split the text on spaces and iterate through the words
   ...:     if slugify(unicode(word)) == "something": # Python 2; on Python 3 use slugify(word)
   ...:         text = text.replace(word, word.lower())
In [4]: text
Out[4]: 'This paragraph contains something, something, AND something'
Split the text into single words and check whether each word, converted to lower case, equals "something". If so, replace it with the lowercase version:
if word.lower() == "something":
    text = text.replace(word, "something")
To know how to split a text into words, see this question.
Another way is to move through the text one character at a time and check whether "something" starts at that position:
text = "Many words: SoMeThInG, SOMEthING, someTHing"
for n in range(len(text)-8):
    if text[n:n+9].lower() == "something": # check whether "something" starts here
        text = text.replace(text[n:n+9], "something")
print(text)
You can also use re.findall to search and split the paragraph into words and punctuation, and replace all the different cases of "Something" with the lowercase version:
import re
text = "Something, Is: SoMeThInG, SOMEthING, someTHing."
to_replace = "something"
words_punct = re.findall(r"[\w']+|[.,!?;: ]", text)
new_text = "".join(to_replace if x.lower() == to_replace else x for x in words_punct)
print(new_text)
Which outputs:
something, Is: something, something, something.
Note: re.findall needs a hardcoded regular expression to search the string. Your actual text may contain characters that are not covered by the regular expression above; add them as needed.
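If the goal is just to normalise every case variant of one known word, a case-insensitive substitution avoids tokenising the text at all. A minimal sketch, assuming the target word is known in advance and should only match as a whole word:

```python
import re

text = "Something, Is: SoMeThInG, SOMEthING, someTHing."
to_replace = "something"

# \b keeps the match to whole words; re.escape guards against special characters.
pattern = r"\b" + re.escape(to_replace) + r"\b"
new_text = re.sub(pattern, to_replace, text, flags=re.IGNORECASE)
print(new_text)  # something, Is: something, something, something.
```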

Display element of a list detected by "if any"

I'm using a simple system to check whether any banned words occur in a string, and I'd like to improve it to display the word in question, so I added the line
print ("BANNED WORD DETECTED : ", word)
But I get the error
"NameError: name 'word' is not defined"
I think the problem is that my code only checks whether any of the words is in the string without "storing it" anywhere. Maybe I am misunderstanding how Python lists work. Any advice on what I should modify?
# -*- coding: utf-8 -*-
bannedWords = ['word1','word2','check']
mystring = "the string i'd like to check"
if any(word in mystring for word in bannedWords):
    print ("BANNED WORD DETECTED : ", word)
else :
    print (mystring)
any() isn't suitable for this; use a generator expression with next() instead, or a list comprehension:
banned_word = next((word for word in mystring.split() if word in bannedWords), None)
if banned_word is not None:
    print("BANNED WORD DETECTED : ", banned_word)
Or for multiple words:
banned_words = [word for word in mystring.split() if word in bannedWords]
if banned_words:
    print("BANNED WORD DETECTED : ", ','.join(banned_words))
For O(1) membership testing, make bannedWords a set rather than a list.
Don't use any here. A generator isn't the right tool either. You actually want a list comprehension to collect all the matching words.
matching = [word for word in bannedWords if word in mystring]
if matching:
    print ("BANNED WORD(S) DETECTED : ", ','.join(matching))
You could do that very easily.
Just use a flag variable to record whether anything was found within the loop.
detected = False
for word in bannedWords:
    if word in mystring:
        detected = True
        print("Detected Banned word ", word)
if not detected:
    print(mystring)
If you want a more Pythonic way:
print("Banned words are {}".format([word for word in bannedWords if word in mystring]) if len([word for word in bannedWords if word in mystring]) else mystring)
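Note that word in mystring is a substring test, so 'check' would also be reported for a string containing 'checkout'. If whole-word matches are wanted, the set suggestion above can be combined with a tokenising step. A sketch under that assumption; find_banned is a hypothetical helper name, and the regular expression is only one possible definition of a word.

```python
import re

def find_banned(text, banned):
    banned = set(banned)                       # O(1) average membership tests
    tokens = re.findall(r"\w+", text.lower())  # crude word tokeniser
    # Keep first-seen order and drop repeated hits.
    return list(dict.fromkeys(t for t in tokens if t in banned))

bannedWords = ['word1', 'word2', 'check']
found = find_banned("the string i'd like to check", bannedWords)
if found:
    print("BANNED WORD(S) DETECTED :", ', '.join(found))
```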

Python word in file change

I am trying to change the words that are nouns in a text to "noun".
I am having trouble. Here is what I have so far.
def noun(file):
    for word in file:
        for ch in word:
            if ch[-1:-3] == "ion" or ch[-1:-3] == "ism" or ch[-1:-3] == "ity":
                word = "noun"
            if file(word-1) == "the" and (file(word+1)=="of" or file(word+1) == "on"
                word = "noun"
            # words that appear after the
    return outfile
Any ideas?
Your slices are empty:
>>> 'somethingion'[-1:-3]
''
because the endpoint lies before the start. You could just use [-3:] here:
>>> 'somethingion'[-3:]
'ion'
But you'd be better off using str.endswith() instead:
ch.endswith(("ion", "ism", "ity"))
The function will return True if the string ends with any of the 3 given strings.
Note that ch is not actually a word: if word is a string, then for ch in word iterates over individual characters, and those are never going to end in 3-character strings, being only one character long themselves.
Your attempts to look at the next and previous words are also going to fail: you cannot call a list or file object, so file(word - 1) is not a meaningful expression (subtracting 1 from a string fails, and so does file(...)).
Instead of looping over the 'word', you could use a regular expression here:
import re
nouns = re.compile(r'(?<=\bthe\b)(\s*\w+(?:ion|ism|ity)\s*)(?=\b(?:of|on)\b)')
some_text = nouns.sub(' noun ', some_text)
This looks for words ending in one of your three substrings, but only if preceded by the and followed by of or on, and replaces them with noun.
Demo:
>>> import re
>>> nouns = re.compile(r'(?<=\bthe\b)(\s*\w+(?:ion|ism|ity)\s*)(?=\b(?:of|on)\b)')
>>> nouns.sub(' noun ', 'the scion on the prism of doom')
'the noun on the noun of doom'

Removing list of words from a string

I have a list of stopwords. And I have a search string. I want to remove the words from the string.
As an example:
stopwords=['what','who','is','a','at','is','he']
query='What is hello'
Now the code should strip 'What' and 'is'. However in my case it strips 'a', as well as 'at'. I have given my code below. What could I be doing wrong?
for word in stopwords:
    if word in query:
        print(word)
        query=query.replace(word,"")
If the input query is "What is Hello", I get the output as:
wht s llo
Why does this happen?
This is one way to do it:
query = 'What is hello'
stopwords = ['what', 'who', 'is', 'a', 'at', 'is', 'he']
querywords = query.split()
resultwords = [word for word in querywords if word.lower() not in stopwords]
result = ' '.join(resultwords)
print(result)
I noticed that you also want to remove a word if its lowercase variant is in the list, so I've added a call to lower() in the condition check.
The accepted answer works when the words are separated by spaces, but real text often uses punctuation to separate words. In that case re.split is required.
Also, storing stopwords as a set makes lookup faster (although with only a few words the cost of hashing the strings can offset the gain).
My proposal:
import re
query = 'What is hello? Says Who?'
stopwords = {'what','who','is','a','at','is','he'}
resultwords = [word for word in re.split(r"\W+", query) if word.lower() not in stopwords]
print(resultwords)
Output (as a list of words):
['hello', 'Says', '']
There's a blank string at the end because re.split emits an empty field when the string ends with a separator, and it needs filtering out. Two solutions:
resultwords = [word for word in re.split(r"\W+", query) if word and word.lower() not in stopwords] # filter out empty words
or add the empty string to the set of stopwords :)
stopwords = {'what','who','is','a','at','is','he',''}
Now the code prints:
['hello', 'Says']
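Another way to avoid the empty string is to extract the words with re.findall instead of splitting on the separators. A small sketch using the same query and stopwords:

```python
import re

query = 'What is hello? Says Who?'
stopwords = {'what', 'who', 'is', 'a', 'at', 'he'}

# \w+ matches runs of word characters, so no empty fields are produced.
resultwords = [word for word in re.findall(r"\w+", query)
               if word.lower() not in stopwords]
print(resultwords)  # ['hello', 'Says']
```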
Building on what karthikr said, try
' '.join(filter(lambda x: x.lower() not in stopwords, query.split()))
Explanation:
query.split() # splits query on whitespace, i.e. "What is hello" -> ["What","is","hello"]
filter(func,iterable) # takes a function and an iterable (list/string/etc.) and
# keeps only the items for which the function, called on one item at
# a time, returns a true value
lambda x: x.lower() not in stopwords # anonymous function that takes a variable,
# converts it to lower case, and returns True if
# the word is not in the iterable stopwords
' '.join(iterable) # joins all items of the iterable (items must be strings/chars)
# using the string/char in front of the dot, i.e. ' ', as a joiner,
# i.e. ["What", "is","hello"] -> "What is hello"
Looking at the other answers to your question I noticed that they told you how to do what you are trying to do, but they did not answer the question you posed at the end.
If the input query is "What is Hello", I get the output as:
wht s llo
Why does this happen?
This happens because .replace() replaces every occurrence of the exact substring you give it, wherever it appears, including inside other words.
For example:
"My, my! Hello my friendly mystery".replace("my", "")
gives:
>>> "My, ! Hello  friendly stery"
.replace() essentially splits the string on the substring given as the first parameter and joins the pieces back together with the second parameter.
"hello".replace("he", "je")
is logically similar to:
"je".join("hello".split("he"))
If you still want to use .replace() to remove whole words, you might think adding a space before and after would be enough, but this misses words at the beginning and end of the string as well as occurrences next to punctuation.
"My, my! hello my friendly mystery".replace(" my ", " ")
>>> "My, my! hello friendly mystery"
"My, my! hello my friendly mystery".replace(" my", "")
>>> "My,! hello friendlystery"
"My, my! hello my friendly mystery".replace("my ", "")
>>> "My, my! hello friendly mystery"
Additionally, adding spaces before and after will not catch consecutive duplicates: the space after the first match has already been consumed, so the second occurrence no longer has a leading space to match.
"hello my my friend".replace(" my ", " ")
>>> "hello my friend"
For these reasons, the accepted answer by Robby Cornelissen is the recommended way to do what you want:
" ".join([x for x in query.split() if x not in stopwords])
stopwords=['for','or','to']
p='Asking for help, clarification, or responding to other answers.'
for i in stopwords:
    n=p.replace(i,'')
    p=n
print(p)
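The substring pitfalls described earlier also apply to a plain replace() loop. If the punctuation and the rest of the string should be left in place, a word-boundary regular expression is another option. This is a sketch rather than a drop-in replacement; remove_words is a hypothetical helper, and the whitespace left behind by removed words is not tidied up.

```python
import re

def remove_words(text, words):
    # Build \b(?:w1|w2|...)\b so only whole words match, in any case.
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
    return re.sub(pattern, "", text, flags=re.IGNORECASE)

result = remove_words("My, my! Hello my friendly mystery", ["my"])
print(repr(result))  # "mystery" survives; the standalone "my" words do not
```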
