I am trying to find a single exact word within a large string.
I have tried the below:
for word in words:
    if word in strings:
        best.append("The word " + word + " The Sentence " + strings)
    else:
        pass
This seemed to work at first, but when I tried it with a larger set of words in a much larger string I got partial matches. For example, if the word is "me", it reports "message" as a match.
Is there a way of searching for exactly "me"?
Thanks in advance.
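You need to set word boundaries in order to match a complete word. I'd use a regex, something like:
import re
# re.search needs the string to search in; re.escape protects any special characters in the word
re.search(r'\b' + re.escape(word_to_find) + r'\b', strings)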
You can split the string into words and then perform the in operation, making sure you strip any surrounding whitespace from the search words and remove punctuation from the string.
import string
def find_words(words, s):
    best = []
    # Strip extra whitespace around each search word and make it lowercase
    modified_words = [word.strip().lower() for word in words]
    # Strip punctuation from the string and make it lowercase
    modified_s = s.translate(str.maketrans('', '', string.punctuation))
    words_list = [word.strip().lower() for word in modified_s.lower().split()]
    # Iterate through the search words
    for idx, word in enumerate(modified_words):
        # If the word is found in the list of words, append to the result
        if word in words_list:
            best.append("The word " + words[idx] + " The Sentence " + s)
    return best
print(find_words(['me', 'message'], 'I me myself'))
print(find_words([' me ', 'message'], 'I me myself'))
print(find_words(['me', 'message'], 'I me myself'))
print(find_words(['me', 'message'], 'I am me.'))
print(find_words(['me', 'message'], 'I am ME.'))
print(find_words(['Me', 'message'], 'I am ME.'))
The output will be
['The word me The Sentence I me myself']
['The word  me  The Sentence I me myself']
['The word me The Sentence I me myself']
['The word me The Sentence I am me.']
['The word me The Sentence I am ME.']
['The word Me The Sentence I am ME.']
You can also use a regex to match the word exactly. \\b marks a word boundary, such as the position next to a space or punctuation mark.
import re
for word in words:
    if len(re.findall("\\b" + word + "\\b", strings)) > 0:
        best.append("The word " + word + " The Sentence " + strings)
    else:
        pass
The backslashes are doubled because, in a regular (non-raw) string literal, '\b' is the backspace control character. A raw string such as r'\b' avoids the doubling.
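To make the escaping point concrete, here is a small sketch comparing the three spellings. The variable names and the sample text are only illustrative.

```python
import re

# In a regular string literal, "\b" is one character: backspace (0x08).
backspace = "\b"
# In a raw string, r"\b" is two characters: a backslash and the letter b.
raw_boundary = r"\b"
# Doubling the backslash in a regular literal gives the same two characters.
escaped_boundary = "\\b"

text = "a message for me"
# Word-boundary search: only the standalone "me" matches.
matches_raw = re.findall(r"\bme\b", text)
# Without the raw prefix the pattern looks for backspace + "me" + backspace.
matches_plain = re.findall("\bme\b", text)

print(matches_raw, matches_plain)
```

Using raw strings for every regex pattern is a common way to avoid this class of mistake.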
You could include the surrounding spaces in the if statement.
for word in words:
    if f' {word} ' in strings:
        best.append("The word " + word + " The Sentence " + strings)
    else:
        pass
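One caveat with the space-padding idea: a word at the very start or end of the string, or one followed by punctuation, has no space on one side and will be missed. A minimal sketch of the behavior, where contains_padded is a hypothetical helper name and not part of the code above:

```python
def contains_padded(word, text):
    # Pad the text as well, so a word at the very start or end
    # still has a space on both sides.
    return f" {word} " in f" {text} "

# Padding only the word misses a match at the end of the string.
plain_hit = " me " in "it was me"
# Padding both sides finds it.
padded_hit = contains_padded("me", "it was me")
# Punctuation directly after the word still defeats the check.
punct_hit = contains_padded("me", "it was me.")

print(plain_hit, padded_hit, punct_hit)
```

If the text contains punctuation, a regex word boundary or stripping punctuation first is likely the safer choice.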
To make sure you don't detect words inside the longer words that contain them (like "me" in "message" or "flame"), add a space before and after the word when checking. The easiest way of doing this is to replace
if word in strings:
with
if " "+word+" " in strings:
Hope this helps! -Theo
You need to set boundaries for your search, \b is the boundary character.
import re
string = 'youyou message me me me me me'
print(re.findall(r'\bme\b', string))
The string has message and me, we only need me explicitly. So added boundaries in my search expression. The result is below -
['me', 'me', 'me', 'me', 'me']
Got all the me(s), but not the message which also has a me in it.
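The same boundary search can be applied to a whole list of words. This is only a sketch: find_exact_words is a hypothetical helper name, and re.escape is added on the assumption that the search words may contain regex metacharacters such as "." or "+".

```python
import re

def find_exact_words(words, text):
    """Return the words that occur in text as whole words."""
    found = []
    for word in words:
        # re.escape keeps characters like "." from acting as regex operators.
        pattern = r"\b" + re.escape(word) + r"\b"
        if re.search(pattern, text):
            found.append(word)
    return found

print(find_exact_words(["me", "age", "youyou"], "youyou message me me"))
```

Note that \b is defined relative to word characters (letters, digits, underscore), so search terms that begin or end with punctuation may not match the way you expect.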
Without knowing the rest of the code, the best I could suggest is using == to get a direct match, so for example
words = ["Me", "Hello", "Message"]
query = input("What do you want to find?")
for word in words:
    # == only succeeds when the whole item is identical to the query
    if word == query:
        print("Found a match")
I need to take the initial letter of every word, move it to the end of the word, and add 'arg'. I tried the following:
def pirate(str):
    list_str = str.split(' ')
    print(list_str)
    new_str = ''
    for lstr in list_str:
        first_element = lstr[0]
        second_element = lstr[1:]
        new_str += second_element + first_element + 'arg' + ' '
    return new_str
print(pirate('Hello! how are, you!!'))
The expected output is: elloHarg! owharg reaarg, ouyarg!!
However, I am getting following output: ello!Harg owharg re,aarg ou!!yarg
How can I make it work for the following use case?
Punctuation should remain at the end of the word even after translation. Assume punctuation appears only at the end of a word. The punctuation characters to consider are .,:;?! and there may be several in a row (e.g. yes!!)
Here is a short and efficient solution using a regex:
import re
re.sub(r'(\w)(\w+)', r'\2\1arg', 'Hello! how are, you!!')
Read literally: replace each letter that is followed by one or more further letters with those further letters first, then the single letter, then 'arg'. (Note that \w also matches digits and underscores.)
Output:
'elloHarg! owharg reaarg, ouyarg!!'
As a function:
def pirate(s):
    return re.sub(r'(\w)(\w+)', r'\2\1arg', s)
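Because the pattern requires at least two word characters, one-letter words such as "I" or "a" are left untouched. If those should be translated too (an assumption, since the question does not say), a variant with \w* in the second group is one possible sketch. pirate_all is a hypothetical name.

```python
import re

def pirate_all(s):
    # (\w*) may match the empty string, so one-letter words are rewritten too.
    return re.sub(r"(\w)(\w*)", r"\2\1arg", s)

print(pirate_all("Hello! how are, you!!"))
print(pirate_all("I am a cat"))
```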
I tried using this code that I found online:
K=sentences
m=[len(i.split()) for i in K]
lengthorder= sorted(K, key=len, reverse=True)
#print(lengthorder)
#print("\n")
list1 = lengthorder
str1 = '\n'.join(list1)
print(str1)
print('\n')
Sentence1 = "We have developed speed, but we have shut ourselves in"
res = len(Sentence1.split())
print ("The longest sentence in this text contains" + ' ' + str(res) + ' ' + "words.")
Sentence2 = "More than cleverness we need kindness and gentleness"
res = len(Sentence2.split())
print ("The second longest sentence in this text contains" + ' ' + str(res) + ' ' + "words.")
Sentence3 = "Machinery that gives abundance has left us in want"
res = len(Sentence3.split())
print ("The third longest sentence in this text contains" + ' ' + str(res) + ' ' + "words.")
but it doesn't sort the sentences by number of words; it sorts them by their length in characters.
You can simply iterate through the sentences and split each one into words like this:
text = " We have developed speed. but we have. shut ourselves in Machinery that. gives abundance has left us in want Our knowledge has made us cynical Our cleverness, hard and unkind We think too much and feel too little More than machinery we need humanity More than cleverness we need kindness and gentleness"
# split into sentences
text2array = text.split(".")
# iterate through the sentences and split each into words
# (split() with no argument ignores repeated and leading spaces)
for i, sentence in enumerate(text2array):
    text2array[i] = sentence.split()
# sort the sentences by word count, longest first
text2array.sort(key=len, reverse=True)
# iterate through the sentences and print them to the screen
for i, sentence in enumerate(text2array, start=1):
    sentence_out = " ".join(sentence) + "."
    print("the nr " + str(i) + " longest sentence is " + sentence_out)
You can define a function that uses the regex to obtain the number of words in a given sentence:
import re
def get_word_count(sentence: str) -> int:
    return len(re.findall(r"\w+", sentence))
Assuming you already have a list of sentences, you can iterate the list and pass each sentence to the word count function then store each sentence and its word count in a dictionary:
sentences = [
    "Assume that this sentence has one word. Really?",
    "Assume that this sentence has more words than all sentences in this list. Obviously!",
    "Assume that this sentence has more than one word. Duh!",
]
word_count_dict = {}
for sentence in sentences:
    word_count_dict[sentence] = get_word_count(sentence)
At this point, the word_count_dict contains sentences as keys and their associated word count as values.
You can then sort word_count_dict by values:
sorted_word_count_dict = dict(
    sorted(word_count_dict.items(), key=lambda item: item[1], reverse=True)
)
Here's the full snippet:
import re
def get_word_count(sentence: str) -> int:
    return len(re.findall(r"\w+", sentence))
sentences = [
    "Assume that this sentence has one word. Really?",
    "Assume that this sentence has more words than all sentences in this list. Obviously!",
    "Assume that this sentence has more than one word. Duh!",
]
word_count_dict = {}
for sentence in sentences:
    word_count_dict[sentence] = get_word_count(sentence)
sorted_word_count_dict = dict(
    sorted(word_count_dict.items(), key=lambda item: item[1], reverse=True)
)
print(sorted_word_count_dict)
Let's assume that your sentences are already separate and there is no need to detect the sentences.
So we have a list of sentences, and we need to measure each sentence by its word count. The basic way is to split on whitespace, since each space separates two words in a sentence.
list_of_sen = ['We have developed speed, but we have shut ourselves in','Machinery that gives abundance has left us in want Our knowledge has made us cynical Our cleverness', 'hard and unkind We think too much and feel too little More than machinery we need humanity More than cleverness we need kindness and gentleness']
sen_len = [len(i.split()) for i in list_of_sen]
sen_len = sorted(sen_len, reverse=True)
for index, count in enumerate(sen_len):
    print(f'The {index+1} longest sentence in this text contains {count} words')
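Sorting only the counts loses track of which sentence each count belongs to. A minimal sketch that keeps each sentence paired with its count follows; rank_by_word_count and the sample sentences are illustrative and not taken from the question.

```python
def rank_by_word_count(sentences):
    # Pair every sentence with its whitespace-separated word count.
    counted = [(len(s.split()), s) for s in sentences]
    # Sort on the count only, largest first; ties keep their original order.
    return sorted(counted, key=lambda pair: pair[0], reverse=True)

ranked = rank_by_word_count(["one two three", "one", "one two"])
for position, (count, sentence) in enumerate(ranked, start=1):
    print(f"{position}. {count} words: {sentence}")
```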
But if your sentences are not separated, we first need to recognize where each sentence ends and then split there. Your sample data does not contain any punctuation that could be used to separate sentences, so the answer below is helpful only if we assume your data has punctuation.
see this question
from nltk import tokenize
p = "Good morning Dr. Adams. The patient is waiting for you in room number 3."
tokenize.sent_tokenize(p)
I am trying to print each word from my list onto separate lines, however it is printing each letter onto individual lines
Words = sentence.strip()
for word in sentence:
    print (word)
My full code (for anyone wondering) is:
import csv
file = open("Task2.csv", "w")
sentence = input("Please enter a sentence: ")
Words = sentence.strip()
for word in sentence:
    print (word)
for s in Words:
    Positions = Words.index(s)+1
    file.write(str(Words) + (str(Positions) + "\n"))
file.close()
You forgot to split the sentence, and the first for loop should iterate over "Words", not "sentence".
#file = open("Task2.csv", "w")
sentence = input("Please enter a sentence: ")
Words = sentence.split()
for word in Words:
    print (word)
for s in Words:
    Positions = Words.index(s)+1
    #file.write(str(Words) + (str(Positions) + "\n"))
#file.close()
Output:
C:\Users\dinesh_pundkar\Desktop>python c.py
Please enter a sentence: I am Dinesh
I
am
Dinesh
C:\Users\dinesh_pundkar\Desktop>
You need to use str.split() instead of str.strip().
str.strip() only removes the leading and trailing whitespace in a string:
>>> my_string = ' This is a sentence. '
>>> my_string.strip()
'This is a sentence.'
str.split() does what you want which is return a list of the words in the string; by default, using whitespace as the delimiter string:
>>> my_string = ' This is a sentence. '
>>> my_string.split()
['This', 'is', 'a', 'sentence.']
So, your code should look more like:
words = sentence.split()
# iterate over the list of words, not over the characters of the original string
for word in words:
    print(word)
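The full code in the question also records each word's position with Words.index(s), which returns the index of the first occurrence, so a repeated word would always get the same position. Assuming each occurrence should get its own position, enumerate is one way to sketch it. word_positions is a hypothetical helper name.

```python
def word_positions(sentence):
    # enumerate(..., start=1) numbers the words from 1, so a repeated word
    # keeps its own position instead of the position of its first occurrence.
    return [(word, position)
            for position, word in enumerate(sentence.split(), start=1)]

for word, position in word_positions("the cat saw the dog"):
    print(word, position)
```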
I'm taking a sentence and turning it into pig latin, but when I edit the words in the list it never stays.
sentence = input("Enter a sentence you want to convert to pig latin")
sentence = sentence.split()
for words in sentence:
    if words[0] in "aeiou":
        words = words+'yay'
And when I print sentence I get the same sentence I put in.
Another way to do it (includes some fixes):
sentence = input("Enter a sentence you want to convert to pig latin: ")
sentence = sentence.split()
for i in range(len(sentence)):
    # assign back through the index so the list itself is updated
    if sentence[i][0] in "aeiou":
        sentence[i] = sentence[i] + 'yay'
sentence = ' '.join(sentence)
print(sentence)
Because you did not change sentence: assigning to the loop variable only rebinds that name, it does not modify the list.
So, to get the result you want:
new_sentence = ''
for word in sentence:
    if word[0] in "aeiou":
        new_sentence += word +'yay' + ' '
    else:
        new_sentence += word + ' '
Now print new_sentence.
I set this up to return a string. If you would rather have a list, that can be accomplished just as easily:
new_sentence = []
for word in sentence:
    if word[0] in "aeiou":
        new_sentence.append(word + 'yay')
    else:
        new_sentence.append(word)
If you are working with a list and then want to convert it to a string, just use
" ".join(new_sentence)
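The reason the original loop had no effect can be shown in a few lines. This is an illustrative sketch with made-up sample words, not code from the question.

```python
words = ["apple", "kiwi"]

# Rebinding the loop variable creates a new string and leaves the list alone.
for w in words:
    w = w + "yay"
after_rebinding = list(words)

# Assigning through the index replaces the item stored in the list.
for i, w in enumerate(words):
    words[i] = w + "yay"
after_index_assignment = list(words)

print(after_rebinding, after_index_assignment)
```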
It does not seem as though you are updating sentence.
sentence = input("Enter a sentence you want to convert to pig latin")
sentence = sentence.split()
# lambda and mapping instead of a loop
sentence = list(map(lambda word: word+'yay' if word[0] in 'aeiou' else word, sentence))
# instead of printing a list, print the sentence
sentence = ' '.join(sentence)
print(sentence)
PS. Kinda forgot some things about Python's for loop so I didn't use it. Sorry
How do I use Python to print a word one letter at a time? Any help would be appreciated.
If I understood you correctly, then you can use the following code:
for word in text.split():
    print(word)
Or, if you need to print the word's letters:
for let in word:
    print(let)
In case you need to skip punctuation and so on, you can also use a regex:
tst = 'word1, word2 word3;'
from re import findall
print(findall(r'\w+', tst))
Or, not very pythonic:
skipC = [':', '.', ',', ';']  # add any others if needed
text = 'word1, word2. word3;'
for x in skipC:
    text = text.replace(x, ' ')
for word in text.split():
    print(word)