Check for words in a sentence - python

I am writing a program in Python. The user enters a text message, and I need to check whether a given sequence of words occurs in that message. For example, given the message "Hello world, my friend." and the two words "Hello" and "world", the result is True. But checking the same words against the message "Hello, beautiful world" gives False. With only two words I can do the check as in the code below, but with combinations of 5 or more words it becomes difficult. Is there a compact solution to this problem?
s = message.text
s = s.lower()
lst = s.split()
if "hello" in lst and "world" in lst:
    c = lst.index("hello")
    # "world" must come directly before or after "hello"
    if lst[c+1] == "world" or lst[c-1] == "world":
        E = True
    else:
        E = False

The straightforward way is to use a loop. Split your message into individual words, and then check whether each of them occurs in the sentence.
word_list = message.split()  # this gives you a list of words to find
word_found = True
for word in word_list:
    if word not in message2:
        word_found = False
print(word_found)
The flag word_found is True if and only if all words were found in the sentence. There are many ways to make this shorter and faster, especially using the built-in all() function and providing the word list as an in-line expression.
word_found = all(word in message2 for word in message.split())
Now, if you need to restrict your "found" property to matching exact words, you'll need more preprocessing. The above code is too forgiving of substrings: it finds every word of "are you ok" in the sentence "your joke is only barely funny". For the more restrictive case, you should break message2 into words, strip those words of punctuation, convert them to lower case (to make matching easier), and then look for each word (from message) in the list of words from message2.
Can you take it from there?
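For reference, the preprocessing described above could look roughly like this. It is only a sketch: the function name contains_all_words and the use of string.punctuation for stripping are my own assumptions, not part of the answer above.

```python
import string

def contains_all_words(message, sentence):
    """Return True if every word of `message` occurs as a whole word in `sentence`."""
    def normalize(text):
        # Split on whitespace, strip surrounding punctuation, lower-case,
        # and drop tokens that consisted of punctuation only (such as "?").
        words = (w.strip(string.punctuation).lower() for w in text.split())
        return [w for w in words if w]

    sentence_words = set(normalize(sentence))
    return all(word in sentence_words for word in normalize(message))

print(contains_all_words("Are you OK ?", "your joke is only barely funny"))  # False
print(contains_all_words("hello world", "Hello, beautiful world!"))          # True
```

Note that this only checks that each word is present somewhere; it does not require the words to be adjacent or in order.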

Let me first clarify the requirements as I understand them:
- ignore case
- the words must be consecutive
- match in any order, like a permutation or anagram
- support duplicated words
If the number of words is not too large, you can try this approach, which is easy to understand but not the fastest:
1. Split the text message into words and join them with ' '.
2. List all the permutations of the words to find and join each with ' ' too. For example, to check the sequence ['Hello', 'beautiful', 'world'], the permutations are 'Hello beautiful world', 'Hello world beautiful', 'beautiful Hello world', and so on.
3. Check whether any permutation, such as 'hello beautiful world', is contained in the joined text.
The sample code is here:
import itertools
import re
# permutations brute-force, O(n * k!)
def checkWords(text, word_list):
    # split the text into words, dropping spaces and punctuation
    text_words = re.findall(r"[\w']+", text.lower())
    # list all the permutations of word_list, and match
    for words in itertools.permutations(word_list):
        if ' '.join(words).lower() in ' '.join(text_words):
            return True
    return False
    # or use any, just one line
    # return any(' '.join(words).lower() in ' '.join(text_words) for words in itertools.permutations(word_list))
def test():
    # True
    print(checkWords('Hello world, my friend.', ['Hello', 'world', 'my']))
    # False
    print(checkWords('Hello, beautiful world', ['Hello', 'world']))
    # True
    print(checkWords('Hello, beautiful world Hello World', ['Hello', 'world', 'beautiful']))
    # True
    print(checkWords('Hello, beautiful world Hello World', ['Hello', 'world', 'world']))
But this gets expensive when the number of words is large: k words generate k! permutations, so the time complexity is O(n * k!).
I think a more efficient solution is a sliding window, which brings the time complexity down to O(n):
import re
import collections
# sliding window, O(n)
def checkWords(text, word_list):
    # split the text into words, dropping spaces and punctuation
    text_words = re.findall(r"[\w']+", text.lower())
    counter = collections.Counter(map(str.lower, word_list))
    start, end, count, all_indexes = 0, 0, len(word_list), []
    while end < len(text_words):
        counter[text_words[end]] -= 1
        if counter[text_words[end]] >= 0:
            count -= 1
        end += 1
        # if you want the indexes of all matches, you can change this part
        if count == 0:
            # all_indexes.append(start)
            return True
        if end - start == len(word_list):
            counter[text_words[start]] += 1
            if counter[text_words[start]] > 0:
                count += 1
            start += 1
    # return all_indexes
    return False
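If you do want every match position rather than a yes/no answer, here is a simplified sketch of the same window idea. To keep it short it compares two Counter objects at each step instead of maintaining a running count, so it costs roughly O(n * k) rather than O(n). The name find_word_windows is hypothetical.

```python
import collections
import re

def find_word_windows(text, word_list):
    """Return the start index (counted in words) of every run of
    len(word_list) consecutive words that is a permutation of word_list."""
    text_words = re.findall(r"[\w']+", text.lower())
    k = len(word_list)
    if k == 0 or k > len(text_words):
        return []
    target = collections.Counter(w.lower() for w in word_list)
    window = collections.Counter(text_words[:k])
    indexes = [0] if window == target else []
    for end in range(k, len(text_words)):
        # slide the window one word to the right
        window[text_words[end]] += 1
        old = text_words[end - k]
        window[old] -= 1
        if window[old] == 0:
            del window[old]  # keep the counter free of zero entries
        if window == target:
            indexes.append(end - k + 1)
    return indexes

print(find_word_windows('Hello, beautiful world Hello World', ['Hello', 'world']))  # [2, 3]
```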

I don't know if this is what you really need, but it worked for me. You can test it:
message = 'hello world'
message2 = ' hello beautiful world'
if 'hello' in message and 'world' in message:
    print('yes')
else:
    print('no')
if 'hello' in message2 and 'world' in message2:
    print('yes')
Output:
yes
yes

Related

Check if string is valid based on list of words

Question:
Given an input string and a dictionary of words, find out if the input string can be segmented into a space-separated sequence of dictionary words. See following examples for more details.
Consider the following dictionary
{ i, like, sam, sung, samsung, mobile, ice,
cream, icecream, man, go, mango}
Input: ilike
Output: Yes
The string can be segmented as "i like".
Input: ilikesamsung
Output: Yes
The string can be segmented as "i like samsung" or
"i like sam sung".
I have tried the following recursive approach:
from timeit import timeit
def wordBreak(wordList, word):
    word = word.replace(" ", "")
    if word == '':
        return True
    else:
        wordLen = len(word)
        return any([(word[:i] in wordList)
                    and wordBreak(wordList, word[i:]) for i in range(1, wordLen+1)])
wordList = ["the", "quick", "fox", "brown"]
word = "the quick brown fox"
print(wordBreak(wordList,word))
print(timeit(lambda: wordBreak(wordList,word))) #12.690028140999999
I also tried using a Trie, which turned out to be much slower when I benchmarked it. Is there an iterative or OOP way of solving the same problem? Also, is my current solution O(n*(n-1)) in terms of time complexity?
I didn't see it at first, but there's a very similar way to do this without going one letter at a time. At each recursion, check if you can remove an entire word off the front of the string, and then just keep going. In an initial test or two, it appears to run a good bit faster.
I think this is the first time I've used the count argument to str.replace.
def word_break(word_list, text):
    text = text.replace(' ', '')
    if text == '':
        return True
    return any(
        text.startswith(word)
        and word_break(word_list, text.replace(word, '', 1))
        for word in word_list
    )
If you're using Python 3.9+, you can replace text.replace(word, '', 1) with text.removeprefix(word).
I believe it's the same asymptotic complexity, but with a smaller constant (unless the words in your allowed list are all single characters, anyway).
I think the best way to go about this is to use a for-loop in this way:
def wordBreak(wordList, word):
    word = word.replace(" ", "")
    if word == "":
        return True
    #No need for an else statement
    llist = []
    words = []
    for i in range(len(word)):
        llist.append(word[i])
        for j in range(len(llist)):
            if i != j:
                llist[j] += word[i]
            if llist[j] in wordList:
                #print(llist[j])
                words.append(llist[j])
    return words
wordList = ["the", "quick", "fox", "brown"]
word = "the quick brown fox"
print(wordBreak(wordList,word))
Although it is a bit lengthier than your original one, it runs much quicker.
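On the question of an iterative version: one common way is dynamic programming over prefix lengths. This is a minimal sketch, not a benchmarked solution, and the name word_break_iterative is made up for illustration. It needs about n*(n+1)/2 substring lookups for a string of length n, whereas the plain recursion can revisit the same suffix many times.

```python
def word_break_iterative(word_list, text):
    """Return True if `text` (spaces ignored) can be split into words from word_list."""
    text = text.replace(" ", "")
    words = set(word_list)  # set membership is O(1) on average
    # reachable[i] is True when text[:i] can be segmented into dictionary words
    reachable = [True] + [False] * len(text)
    for end in range(1, len(text) + 1):
        for start in range(end):
            if reachable[start] and text[start:end] in words:
                reachable[end] = True
                break
    return reachable[len(text)]

dictionary = ["i", "like", "sam", "sung", "samsung", "mobile",
              "ice", "cream", "icecream", "man", "go", "mango"]
print(word_break_iterative(dictionary, "ilikesamsung"))  # True
print(word_break_iterative(dictionary, "ilikesamsu"))    # False
```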

How to get the position of a character in Python and store it in a variable?

I am looking for a way to store the integer position of a character in a variable. Right now I am doing it the way I did in Delphi 2010, which is not right according to Jupyter Notebook.
This is the code I have so far:
def animal_crackers(text):
    for index in text:
        if index==' ':
            if text[0] == text[pos(index)+1]:
                return True
            else:
                return False
        else:
            pass
The aim is to take two words (word + space + word) and return True if both words begin with the same letter, and False otherwise.
For getting the index of a letter in a string (as the title asks), just use str.index(), or str.find() if you don't want an error to be raised if the letter/substring could not be found:
>>> text = 'seal sheep'
>>> text.index(' ')
4
However for your program, you do not need to use str.index if you want to identify the first and second word. Instead, use str.split() to break up a given text into a list of substrings:
>>> words = text.split() # With no arguments, splits words by whitespace
>>> words
['seal', 'sheep']
Then, you can take the first letter of the first word and check if the second word begins with the same letter:
# For readability, you can assign the two words into their own variables
>>> first_word, second_word = words[0], words[1]
>>> first_word[0] == second_word[0]
True
Combined into a function, it may look like this:
def animal_crackers(text):
    words = text.split()
    first_word, second_word = words[0], words[1]
    return first_word[0] == second_word[0]
Assuming that text is a single line containing two words:
def animal_crackers(text):
    words = text.split()
    if len(words) < 2:
        return False  # we need two words to compare
    # .lower() makes the comparison case-insensitive;
    # remove it if the case of the letter matters
    return words[0][0].lower() == words[1][0].lower()

Python - Find words in string

I know that I can find a word in a string with
if word in my_string:
But I want to find all "word" in the string, like this.
counter = 0
while True:
    if word in my_string:
        counter += 1
How can I do it without "counting" the same word over and over again?
If you want to count only full words, so that "is" is counted once in "this is" even though there is also an "is" inside "this", you can split, filter and count:
>>> s = 'this is a sentences that has is and is and is (4)'
>>> word = 'is'
>>> counter = len([x for x in s.split() if x == word])
>>> counter
4
However, if you just want to count all occurrences of a substring, i.e. "is" would also match the "is" in "this", then:
>>> s = 'is this is'
>>> counter = len(s.split(word))-1
>>> counter
3
In other words, split the string at every occurrence of the word, then subtract one to get the count.
Edit - JUST USE COUNT:
It's been a long day so I totally forgot, but str has a built-in method for this, str.count(substring), which does the same as my second answer but is far more readable. Please consider using this method (and look at other people's answers for how to).
Use the start argument of the str.find method.
counter = 0
search_pos = 0
while True:
    found = my_string.find(word, search_pos)
    if found != -1:  # find returns -1 when it's not found
        # update counter and move search_pos to look for the next word
        search_pos = found + len(word)
        counter += 1
    else:
        # the word wasn't found
        break
This is kinda a general purpose solution. Specifically for counting in a string you can just use my_string.count(word)
String actually already has the functionality you are looking for. You simply need to use str.count(item) for example.
EDIT: This will search for all occurrences of said string including parts of words.
string_to_search = 'apple apple orange banana grapefruit apple banana'
number_of_apples = string_to_search.count('apple')
number_of_bananas = string_to_search.count('banana')
The following will search for complete words only; just split the string you want to search.
string_to_search = 'apple apple orange banana grapefruit apple banana'.split()
number_of_apples = string_to_search.count('apple')
number_of_bananas = string_to_search.count('banana')
Use regular expressions:
import re
word = 'test'
my_string = 'this is a test and more test and a test'
# Use escape in case your search word contains periods or symbols that are used in regular expressions.
re_word = re.escape(word)
# re.findall returns a list of matches
matches = re.findall(re_word, my_string)
# matches = ['test', 'test', 'test']
print(len(matches))  # 3
Be aware that this will also catch other words that contain your word, like testing. You could change your regex to match only your exact word.
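One way to restrict the match to the exact word is to wrap the escaped word in \b word boundaries. This is a sketch that assumes the search word starts and ends with a letter, digit or underscore (otherwise \b will not behave as expected); count_whole_word is a hypothetical helper name.

```python
import re

def count_whole_word(word, text):
    """Count occurrences of `word` in `text` that are not part of a longer word."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text))

print(count_whole_word('test', 'this is a test and more testing and a test'))  # 2
```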

Python find n-sized window around phrase within string

I have a string, for example 'i cant sleep what should i do', as well as a phrase that is contained in the string, 'cant sleep'. What I am trying to accomplish is to get an n-sized window around the phrase, even if there aren't n words on either side. So in this case, with a window size of 2 (2 words on either side of the phrase), I would want 'i cant sleep what should'.
This is my current attempt at a window size of 2. However, it fails when the number of words to the left or right of the phrase is less than 2. I would also like to be able to use different window sizes.
import re
sentence = 'i cant sleep what should i do'
phrase = 'cant sleep'
sentence_words = re.findall(r'\w+', sentence)
phrase_words = re.findall(r'\w+', phrase)
left = sentence_words.index(phrase_words[0])
right = sentence_words.index(phrase_words[-1])
print(sentence_words[left-2:right+3])
You can use the partition method for a non-regex solution:
>>> s='i cant sleep what should i do'
>>> p='cant sleep'
>>> lh, _, rh = s.partition(p)
Then use a slice to get up to two words:
>>> n=2
>>> ' '.join(lh.split()[:n]), p, ' '.join(rh.split()[:n])
('i', 'cant sleep', 'what should')
Your exact output:
>>> ' '.join(lh.split()[:n]+[p]+rh.split()[:n])
'i cant sleep what should'
You would want to check whether p is in s or if the partition succeeds of course.
As pointed out in the comments, the slice on lh should use a negative index to take the last n words (thanks Mathias Ettinger):
>>> s='w1 w2 w3 w4 w5 w6 w7 w8 w9'
>>> p='w4 w5'
>>> n=2
>>> lh, _, rh = s.partition(p)
>>> ' '.join(lh.split()[-n:]+[p]+rh.split()[:n])
'w2 w3 w4 w5 w6 w7'
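Putting the partition approach into a function with the "did the partition succeed" check might look like the sketch below. The name window_around and the choice to return None when the phrase is missing are my own assumptions. Note that partition matches substrings, so a phrase like 'cant' would also match inside 'scant'.

```python
def window_around(s, p, n):
    """Return phrase p with up to n words on each side, or None if p is not in s."""
    lh, sep, rh = s.partition(p)
    if not sep:  # partition returns an empty separator when p is not found
        return None
    left = lh.split()
    # slice from an explicit start index so that n == 0 yields no words
    # (left[-0:] would return the whole list)
    left = left[max(0, len(left) - n):]
    return ' '.join(left + [p] + rh.split()[:n])

print(window_around('i cant sleep what should i do', 'cant sleep', 2))
# i cant sleep what should
```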
If you define words as entities separated by spaces, you can split your sentence and use regular Python slicing:
def get_window(sentence, phrase, window_size):
    sentence = sentence.split()
    phrase = phrase.split()
    words = len(phrase)
    for i,word in enumerate(sentence):
        if word == phrase[0] and sentence[i:i+words] == phrase:
            start = max(0, i-window_size)
            return ' '.join(sentence[start:i+words+window_size])
sentence = 'i cant sleep what should i do'
phrase = 'cant sleep'
print(get_window(sentence, phrase, 2))
You can also turn it into a generator by changing return to yield, which lets you generate all windows if the phrase matches several times in the sentence:
>>> list(gen_window('I dont need it, I need to get rid of it', 'need', 2))
['I dont need it, I', 'it, I need to get']
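For completeness, a generator along those lines could be written as follows. This is a sketch of what gen_window presumably looks like, since its definition is not shown above.

```python
def gen_window(sentence, phrase, window_size):
    """Yield a window of words around every occurrence of phrase in sentence."""
    sentence_words = sentence.split()
    phrase_words = phrase.split()
    n = len(phrase_words)
    if n == 0:
        return  # an empty phrase would otherwise match at every position
    for i in range(len(sentence_words) - n + 1):
        if sentence_words[i:i + n] == phrase_words:
            start = max(0, i - window_size)
            yield ' '.join(sentence_words[start:i + n + window_size])

print(list(gen_window('I dont need it, I need to get rid of it', 'need', 2)))
# ['I dont need it, I', 'it, I need to get']
```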
import re
def contains_sublist(lst, sublst):
    n = len(sublst)
    for i in range(len(lst)-n+1):
        if (sublst == lst[i:i+n]):
            a = max(0, i-2)  # clamp at the start of the list
            b = min(i+n+2, len(lst))
            return ' '.join(lst[a:b])
sentence = 'i cant sleep what should i do'
phrase = 'cant sleep'
sentence_words = re.findall(r'\w+', sentence)
phrase_words = re.findall(r'\w+', phrase)
print(contains_sublist(sentence_words, phrase_words))
You can split words using built-in string methods, so re shouldn't be necessary. If you want to use varying window sizes, wrap it in a function like so:
def get_word_window(sentence, phrase, w_left=0, w_right=0):
    w_lst = sentence.split()
    p_lst = phrase.split()
    for i,word in enumerate(w_lst):
        if word == p_lst[0] and \
                w_lst[i:i+len(p_lst)] == p_lst:
            left = max(0, i-w_left)
            right = min(len(w_lst), i+w_right+len(p_lst))
            return w_lst[left:right]
Then you can get the new phrase like so:
>>> sentence='i cant sleep what should i do'
>>> phrase='cant sleep'
>>> ' '.join(get_word_window(sentence,phrase,2,2))
'i cant sleep what should'

How do I print words with only 1 vowel?

My code so far is below, but since I'm so lost it doesn't do anything close to what I want:
vowels = 'a','e','i','o','u','y'
#Consider 'y' as a vowel
input = input("Enter a sentence: ")
words = input.split()
if vowels == words[0]:
    print(words)
So for an input like this:
"this is a really weird test"
I want it to only print:
this, is, a, test
because each of them contains only 1 vowel.
Try this:
vowels = set(('a','e','i','o','u','y'))
def count_vowels(word):
    return sum(letter in vowels for letter in word)
my_string = "this is a really weird test"
def get_words(my_string):
    for word in my_string.split():
        if count_vowels(word) == 1:
            print(word)
Result:
>>> get_words(my_string)
this
is
a
test
Here's another option:
import re
words = 'This sentence contains a bunch of cool words'
for word in words.split():
    if len(re.findall('[aeiouy]', word)) == 1:
        print(word)
Output:
This
a
bunch
of
words
You can translate all the vowels to a single vowel and count that vowel:
# Python 3: str.maketrans replaces the old string.maketrans
trans = str.maketrans('aeiouy', 'aaaaaa')
strs = 'this is a really weird test'
print([word for word in strs.split() if word.translate(trans).count('a') == 1])
>>> s = "this is a really weird test"
>>> [w for w in s.split() if len(w) - len(w.translate(str.maketrans('', '', 'aeiouy'))) == 1]
['this', 'is', 'a', 'test']
Not sure if words with no vowels are required. If so, just replace == 1 with < 2.
You can use a for-loop to save each substring into an array of strings whenever the next character is a space.
Then, for each substring, check whether it contains exactly one vowel (a, e, i, o, u); if so, add it to another array.
After that, join all the strings in that second array with commas and spaces.
Try this:
vowels = ('a','e','i','o','u','y')
words = [i for i in input('Enter a sentence ').split() if i != '']
interesting = [word for word in words if sum(1 for char in word if char in vowels) == 1]
I found so much nice code here, and I want to show my ugly one:
v = 'aoeuiy'
o = 'oooooo'
sentence = 'i found so much nice code here'
words = sentence.split()
trans = str.maketrans(v,o)
for word in words:
    if not word.translate(trans).count('o') >1:
        print(word)
I find your lack of regex disturbing.
Here's a plain regex-only solution:
import re
text = "this is a really weird test"  # renamed from str to avoid shadowing the built-in
words = re.findall(r"\b[^aeiouy\W]*[aeiouy][^aeiouy\W]*\b", text)
print(words)
