Replacing duplicated words in python 3 - python

I want to take a piece of text which looks like this:
Engineering will save the world from inefficiency. Inefficiency is a blight on the world and its humanity.
and return:
Engineering will save the world from inefficiency..is a blight on the . and its humanity.
That is, I want to remove duplicated words and replace them with "."
This is how I started my code:
lines = ["Engineering will save the world from inefficiency.",
         "Inefficiency is a blight on the world and its humanity."]

def solve(lines):
    clean_paragraph = []
    for line in lines:
        if line not in str(lines):
            clean_paragraph.append(line)
            print(clean_paragraph)
            if word == word in line in clean_paragraph:
                word = "."
    return clean_paragraph
My logic was to create a list with all of the words in the strings and add each one to a new list, and then, if the word was already in the list, to replace it with ".". My code returns []. Any suggestions would be greatly appreciated!

PROBLEM:
if word == word in line in clean_paragraph:
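I'm not sure what you expect of this, but it will always be False here. Python chains comparison operators, so the line is evaluated as if it were written:
if (word == word) and (word in line) and (line in clean_paragraph):
The first test is trivially true and the second checks whether word is a substring of line, but the third asks whether line is an element of clean_paragraph. Because nothing has been appended to clean_paragraph at that point, the last test fails and the whole expression is False.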
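To see how Python reads a chain of comparison operators, here is a small self-contained check. The sample values are made up for illustration and do not come from the question.

```python
# Illustrative values only; they are assumptions, not the asker's data.
word = "the"
line = "save the world"
clean_paragraph = ["save the world"]

# A chain of comparison operators is read as pairwise tests joined by `and`.
chained = word == word in line in clean_paragraph
expanded = (word == word) and (word in line) and (line in clean_paragraph)

# Nesting parentheses gives a different expression: a bool is looked up in the list.
nested = word == ((word in line) in clean_paragraph)

print(chained, expanded, nested)  # True True False
```

The chained and expanded forms agree, while the nested form asks whether True is an element of the list and then compares a string with that result.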
REPAIR
Write the loops you're trying to encode:
for clean_line in clean_paragraph:
    for word in clean_line.split():
At this point, you'll have to figure out what you want for variable names. You've tried to make a couple of variables stand for two different things at once (line and word; I fixed the first one).
You'll also have to learn to properly manipulate loops and their indices; part of the problem is that you've written more code at once than you can handle -- yet. Back up, write one loop at a time, and print the results, so you know what you're getting into. For instance, start with
for line in lines:
    if line not in str(lines):
        print("line", line, "is new: append")
        clean_paragraph.append(line)
    else:
        print("line", line, "is already in *lines*")
I think you'll spot another problem here -- one even earlier than the one I found. Fix this, then add only one or two lines at a time, building up your program (and programming knowledge) gradually. When something doesn't work, you know it's almost certainly the new lines.

Here is one way to do this. It replaces all duplicate words with a dot.
lines_test = ["Engineering will save the world from inefficiency.",
              "Inefficiency is a blight on the world and its humanity."]

def solve(lines):
    clean_paragraph = ""
    str_lines = " ".join(lines)
    words_lines = str_lines.replace('.', ' .').split()
    for word in words_lines:
        if word != "." and word.lower() in clean_paragraph.lower():
            word = " ."
        elif word != ".":
            word = " " + word
        clean_paragraph += word
    return clean_paragraph

print(solve(lines_test))
Output:
Engineering will save the world from inefficiency. . is . blight on . . and its humanity.
It is important to convert words or strings into the lower case or upper case (consistent form) before you make comparisons.
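Note that `in` applied to two strings is a substring test, which is why "a" becomes a dot in the output above: it occurs inside "save". If only whole-word duplicates should be replaced, one option is to track the words already seen in a set. The following is a minimal sketch under that assumption; the function name `replace_duplicate_words` and the token pattern (runs of ASCII letters and apostrophes) are choices made here for illustration.

```python
import re

def replace_duplicate_words(text):
    """Replace every repeated word (case-insensitive) with a dot."""
    seen = set()

    def replace(match):
        word = match.group(0)
        key = word.lower()      # compare in a consistent case
        if key in seen:
            return "."          # whole word was seen before
        seen.add(key)
        return word

    # Assumption: a "word" is a run of ASCII letters or apostrophes.
    return re.sub(r"[A-Za-z']+", replace, text)

text = ("Engineering will save the world from inefficiency. "
        "Inefficiency is a blight on the world and its humanity.")
print(replace_duplicate_words(text))
# Engineering will save the world from inefficiency. . is a blight on . . and its humanity.
```

Here "a" survives because it is compared against whole words rather than searched for inside the accumulated text.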

Another way of doing this can be:
lines_test = 'Engineering will save the world from inefficiency. Inefficiency is a blight on the world and its humanity.'
text_array = lines_test.split(" ")
formatted_text = ''
for word in text_array:
    # lowercase both sides so the comparison is case-insensitive
    if word.lower() not in formatted_text.lower():
        formatted_text = formatted_text + ' ' + word
    else:
        formatted_text = formatted_text + ' ' + '.'
print(formatted_text)
Output:
Engineering will save the world from inefficiency. . is . blight on . . and its humanity.

Related

Code Not fulfilling all the Sample Inputs' result on HackerRank

Question on HackerRank- You are asked to ensure that the first and last names of people begin with a capital letter in their passports. For example, alison heck should be capitalised correctly as Alison Heck.(What they actually want is to capitalize the first letter of every individual string)
def solve(s):
    0<len(s)<1000
    abc=[]
    for p in s.split():
        abc.append(p.capitalize())
    x=" ".join(abc)
    return x
I am getting correct answers on putting my own custom inputs but HackerRank says otherwise.(4/6 Sample Inputs are unsatisfied)
arr = ['muhammad Atif', 'alison heck', 'dr dexter Morgan']

def capitalizeName(word):
    words = word.split(' ')
    for i in range(0, len(words)):
        words[i] = words[i].capitalize()
    return ' '.join(words)

for word in arr:
    print(capitalizeName(word))
Hopefully this simple function will solve your problem. Adapt it to the HackerRank requirements as needed, e.g. returning the result instead of printing it.
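One possible reason for the failing cases (an assumption, since the failing inputs are not shown) is that `s.split()` with no argument collapses runs of whitespace, so the original spacing is lost when the words are joined again. Splitting on a single space keeps an empty string between consecutive spaces, which preserves the spacing. A minimal sketch; the name `capitalize_words` is made up here.

```python
def capitalize_words(s):
    # split(" ") keeps empty strings between consecutive spaces,
    # so joining with " " restores the original spacing exactly.
    return " ".join(word.capitalize() for word in s.split(" "))

print(capitalize_words("alison  heck"))      # Alison  Heck (two spaces kept)
print(capitalize_words("dr dexter Morgan"))  # Dr Dexter Morgan
```

Keep in mind that `str.capitalize()` also lowercases the rest of each word, which may or may not be what the checker expects.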

Is there a way in python to count sentences having quotation marks, question mark and full stop?

I have been searching for the solution to this problem. I am writing a custom function to count number of sentences. I tried nltk and textstat for this problem but both are giving me different counts.
An Example of a sentence is something like this.
Annie said, "Are you sure? How is it possible? you are joking, right?"
NLTK is giving me --> count=3.
['Annie said, "Are you sure?', 'How is it possible?', 'you are joking, right?"']
another example:
Annie said, "It will work like this! you need to go and confront your friend. Okay!"
NLTK is giving me --> count=3.
Please suggest. The expected count is 1 as it is a single direct sentence.
I have written a simple function that does what you want:
def sentences_counter(text: str):
    end_of_sentence = ".?!…"
    # complete with whatever end of a sentence punctuation mark I might have forgotten
    # you might for instance want to add '\n'.

    sentences_count = 0
    sentences = []
    inside_a_quote = False

    start_of_sentence = 0
    last_end_of_sentence = -2
    for i, char in enumerate(text):
        # quote management, to solve your issue
        if char == '"':
            inside_a_quote = not inside_a_quote
            if not inside_a_quote and text[i-1] in end_of_sentence:  # 🚩
                last_end_of_sentence = i  # 🚩
        elif inside_a_quote:
            continue

        # basic management of sentences with the punctuation marks in `end_of_sentence`
        if char in end_of_sentence:
            last_end_of_sentence = i
        elif last_end_of_sentence == i-1:
            sentences.append(text[start_of_sentence:i].strip())
            sentences_count += 1
            start_of_sentence = i

    # same as the last block in case there is no end punctuation mark in the text
    last_sentence = text[start_of_sentence:]
    if last_sentence:
        sentences.append(last_sentence.strip())
        sentences_count += 1

    return sentences_count, sentences
Consider the following:
text = '''Annie said, "Are you sure? How is it possible? you are joking, right?" No, I'm not... I thought you were'''
To generalize your problem a bit, I added 2 more sentences, one with ellipsis and the last one without even any end punctuation mark. Now, if I execute this:
sentences_count, sentences = sentences_counter(text)
print(f'{sentences_count} sentences detected.')
print(f'The detected sentences are: {sentences}')
I obtain this:
3 sentences detected.
The detected sentences are: ['Annie said, "Are you sure? How is it possible? you are joking, right?"', "No, I'm not...", 'I thought you were']
I think it works fine.
Note: The quote handling in my solution assumes American-style quotes, where the sentence-ending punctuation mark can sit inside the quotation marks. Remove the two lines marked with flag emojis 🚩 to disable this.
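If only the count is needed, a shorter alternative is to mask each quoted span before splitting on end punctuation. This is a rough sketch under stated assumptions, not a replacement for a real sentence tokenizer: quotes are plain `"` characters and are balanced, and the placeholder `Q` and the name `count_sentences` are made up here. It will also miscount text with abbreviations or decimal numbers.

```python
import re

SENTENCE_END = ".?!…"

def count_sentences(text):
    def mask(match):
        quote = match.group(0)
        # quote[-2] is the last character before the closing quotation mark
        ends_sentence = len(quote) > 2 and quote[-2] in SENTENCE_END
        # keep one end mark if the quote itself closes the sentence
        return "Q." if ends_sentence else "Q"

    # Replace every "..." span so punctuation inside it cannot split the text.
    masked = re.sub(r'"[^"]*"', mask, text)
    pieces = re.split("[" + re.escape(SENTENCE_END) + "]+", masked)
    return sum(1 for piece in pieces if piece.strip())

print(count_sentences('Annie said, "Are you sure? How is it possible? you are joking, right?"'))  # 1
```

Like the function above, this treats a quote that ends in punctuation as the end of the surrounding sentence.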

Capitalize the first word of a sentence in a text

I want to make sure that each sentence in a text starts with a capital letter.
E.g. "we have good news and bad news about your emissaries to our world," the extraterrestrial ambassador informed the Prime Minister. the good news is they tasted like chicken." should become
"We have good news and bad news about your emissaries to our world," the extraterrestrial ambassador informed the Prime Minister. The good news is they tasted like chicken."
I tried using split() to split the sentence. Then, I capitalized the first character of each line. I appended the rest of the string to the capitalized character.
text = input("Enter the text: \n")
lines = text.split('. ')  # Split the sentences
for line in lines:
    a = line[0].capitalize()  # capitalize the first word of sentence
    for i in range(1, len(line)):
        a = a + line[i]
    print(a)
I want to obtain "We have good news and bad news about your emissaries to our world," the extraterrestrial ambassador informed the Prime Minister. The good news is they tasted like chicken."
I get "We have good news and bad news about your emissaries to our world," the extraterrestrial ambassador informed the Prime Minister
The good news is they tasted like chicken."
This code should work:
text = input("Enter the text: \n")
lines = text.split('. ')  # Split the sentences

for index, line in enumerate(lines):
    lines[index] = line[0].upper() + line[1:]
print(". ".join(lines))
The error in your code is that str.split(chars) removes the splitting delimiter char and that's why the period is removed.
Sorry for not providing a thorough description as I cannot think of what to say. Please feel free to ask in comments.
EDIT: Let me try to explain what I did.
Lines 1-2: Accepts the input and splits into a list by '. '. On the sample input, this gives: ['"We have good news and bad news about your emissaries to our world," the extraterrestrial ambassador informed the Prime Minister', 'the good news is they tasted like chicken.']. Note the period is gone from the first sentence where it was split.
Line 4: enumerate wraps an iterable and yields an (index, item) tuple for each item in it.
Line 5: Replaces the entry for line in lines with the uppercased first character plus the rest of the line.
Line 6: Prints the message. ". ".join(lines) basically reverses what you did with split. str.join(l) takes an iterable of strings, l, and sticks them together with str between all the items. Without this, you would be missing your periods.
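The split/join round trip can be checked on a short string. The sample text below is made up for illustration.

```python
text = "first sentence. second sentence. third."

# split removes every ". " separator; the final "." stays because
# it is not followed by a space.
parts = text.split(". ")
print(parts)  # ['first sentence', 'second sentence', 'third.']

# join puts the separator back between the pieces.
rejoined = ". ".join(parts)
print(rejoined == text)  # True
```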
When you split the string by ". " that removes the ". "s from your string and puts the rest of it into a list. You need to add the lost periods to your sentences to make this work.
Also, this can leave the last sentence with a double period, since it ends in "." rather than ". ". We need to remove the period at the end of the text (if it exists) before splitting to make sure we don't get double periods.
text = input("Enter the text: \n")
output = ""
if text[-1] == '.':
    # remove the last period to avoid double periods in the last sentence
    text = text[:-1]
lines = text.split('. ')  # Split the sentences
for line in lines:
    a = line[0].capitalize()  # capitalize the first letter of the sentence
    for i in range(1, len(line)):
        a = a + line[i]
    a = a + '. '  # add back the period and space that split removed
    output = output + a
print(output.rstrip())
We can also make this solution cleaner:
text = input("Enter the text: \n")
output = ""
if text[-1] == '.':
    # remove the last period to avoid double periods in the last sentence
    text = text[:-1]
lines = text.split('. ')  # Split the sentences
for line in lines:
    # add back the period and space that split removed
    a = line[0].capitalize() + line[1:] + '. '
    output = output + a
print(output.rstrip())
By using str[1:] you can get a copy of your string with the first character removed. And using str[:-1] will give you a copy of your string with the last character removed.
split splits the string, and none of the resulting strings contain the delimiter (the string or character you split by).
Change your code to this:
text = input("Enter the text: \n")
lines = text.split('. ') #Split the sentences
final_text = ". ".join([line[0].upper()+line[1:] for line in lines])
print(final_text)
The code below can handle multiple sentence types (ending in ".", "!", "?", etc.) and will capitalize the first word of each sentence. Since you want to keep your existing capital letters, using the capitalize function will not work (it would make the non-sentence-starting words lowercase). You can throw a lambda function into the list comprehension to take advantage of upper() on the first letter of each sentence, which leaves the rest of the sentence completely unchanged.
import re
original_sentence = 'we have good news and bad news about your emissaries to our world," the extraterrestrial ambassador informed the Prime Minister. the good news is they tasted like chicken.'
val = re.split('([.!?] *)', original_sentence)
new_sentence = ''.join([(lambda x: x[0].upper() + x[1:])(each) if len(each) > 1 else each for each in val])
print(new_sentence)
The "new_sentence" list comprehension is the same as saying:
sentence = []
for each in val:
    sentence.append((lambda x: x[0].upper() + x[1:])(each) if len(each) > 1 else each)
print(''.join(sentence))
You can use the re.sub function in order to replace all characters following the pattern . \w with its uppercase equivalent.
import re
original_sentence = 'we have good news and bad news about your emissaries to our world," the extraterrestrial ambassador informed the Prime Minister. the good news is they tasted like chicken.'
def replacer(match_obj):
    return match_obj.group(0).upper()
# Replace the very first character, or any character following a dot and a space, with its uppercase version.
re.sub(r"(?<=\. )(\w)|^\w", replacer, original_sentence)
>>> 'We have good news and bad news about your emissaries to our world," the extraterrestrial ambassador informed the Prime Minister. The good news is they tasted like chicken.'
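The same re.sub idea can be extended to sentences ending in "!" or "?" and to text that opens with a quotation mark. This is only a sketch: the function name `capitalize_sentences` is made up here, and the pattern assumes ASCII lowercase letters and at least one whitespace character after the end mark.

```python
import re

def capitalize_sentences(text):
    # Group 1: start of text, or an end mark followed by whitespace.
    # Group 2: an optional opening quote plus the first lowercase letter.
    pattern = r"""(^|[.!?]\s+)(["']?[a-z])"""
    return re.sub(pattern, lambda m: m.group(1) + m.group(2).upper(), text)

sample = '"we have good news," the ambassador informed the Prime Minister. the good news is in! really? yes.'
print(capitalize_sentences(sample))
# "We have good news," the ambassador informed the Prime Minister. The good news is in! Really? Yes.
```

Existing capitals such as "Prime Minister" are left alone because only the matched first letter is changed.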

Replace a word in a String by indexing without "string replace function" -python

Is there a way to replace a word within a string without using a "string replace function," e.g., string.replace(string,word,replacement)?
[out] = forecast('This snowy weather is so cold.','cold','awesome')
out => 'This snowy weather is so awesome.'
Here the word cold is replaced with awesome.
This is from my MATLAB homework, which I am trying to do in Python. When doing this in MATLAB we were not allowed to use strrep().
In MATLAB, I can use strfind to find the index and work from there. However, I noticed that there is a big difference between lists and strings. Strings are immutable in Python, so I will likely have to import some module or convert the string to a different data type to work with it the way I want without using a string replace function.
just for fun :)
st = 'This snowy weather is so cold .'.split()
given_word = 'awesome'
for i, word in enumerate(st):
    if word == 'cold':
        st[i] = given_word  # overwrite the matched word in place
        break  # stop after the first match
print(' '.join(st))
Here's another answer that might be closer to the solution you described using MATLAB:
st = 'This snowy weather is so cold.'
given_word = 'awesome'
word_to_replace = 'cold'
n = len(word_to_replace)
index_of_word_to_replace = st.find(word_to_replace)
# keep everything before the match, insert the new word, then keep everything after it
print(st[:index_of_word_to_replace] + given_word + st[index_of_word_to_replace+n:])
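The find-and-slice approach can be wrapped in a function that also covers the case where the word is missing, since str.find returns -1 when there is no match. A minimal sketch; the name `replace_first` is made up here.

```python
def replace_first(text, old, new):
    """Replace the first occurrence of `old` in `text` using find and slicing."""
    index = text.find(old)
    if index == -1:
        # str.find returns -1 when `old` does not occur; leave the text unchanged
        return text
    # Strings are immutable, so build a new one from the pieces around the match.
    return text[:index] + new + text[index + len(old):]

print(replace_first('This snowy weather is so cold.', 'cold', 'awesome'))
# This snowy weather is so awesome.
```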
You can convert your string into a list object, find the index of the word you want to replace and then replace the word.
sentence = "This snowy weather is so cold"
# Split the sentence into a list of the words
words = sentence.split(" ")
# Get the index of the word you want to replace
word_to_replace_index = words.index("cold")
# Replace the target word with the new word based on the index
words[word_to_replace_index] = "awesome"
# Generate a new sentence
new_sentence = ' '.join(words)
Using Regex and a list comprehension.
import re
def strReplace(sentence, toReplace, toReplaceWith):
    return " ".join([re.sub(toReplace, toReplaceWith, i) if re.search(toReplace, i) else i for i in sentence.split()])

print(strReplace('This snowy weather is so cold.', 'cold', 'awesome'))
Output:
This snowy weather is so awesome.

Python How to count how many times each of the vocabulary words shows in the sentence? [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 6 years ago.
Hey guys, I'm confused and unsure why my code is not working. In this code I am trying to find certain words from a list in a sentence and output the number of times each one is repeated within the sentence.
vocabulary =["in","on","to","www"]
numwords = [0,0,0,0]
mysentence = (" ʺAlso the sea tosses itself and breaks itself, and should any  sleeper fancying that he might find on the beach an answer to his doubts, a  sharer of his solitude, throw off his bedclothes and go down by himself to  walk on the sand, no image with semblance of serving and divine  promptitude comes readily to hand bringing the night to order and making  the world reflect the compass of the soul.ʺ)
for word in mysentence.split():
    if (word == vocabulary):
    else:
        numwords[0] += 1
    if(word == vocabulary):
    else:
        numwords[1] +=1
    if (word == vocabulary):
    else:
        numwords [2] += 1
    if (word == vocabulary):
    else :
        numwords [3] += 1
    if (word == vocabulary):
    else:
        numwords [4] += 1
print "total number of words : " + str(len(mysentence))
The easiest way to do this is to use collections.Counter to count all the words in the sentence, and then look up the ones you're interested in.
from collections import Counter
vocabulary =["in","on","to","www"]
mysentence = "Also the sea tosses itself and breaks itself, and should any sleeper fancying that he might find on the beach an answer to his doubts, a sharer of his solitude, throw off his bedclothes and go down by himself to walk on the sand, no image with semblance of serving and divine promptitude comes readily to hand bringing the night to order and making the world reflect the compass of the soul."
mysentence = mysentence.split()
c = Counter(mysentence)
numwords = [c[i] for i in vocabulary]
print(numwords)
Alternatively, you could split the sentence into words, loop over them with a for loop, and increment a counter each time a word matches. An example implementation might look like
def find_word(word, string):
    word_count = 0
    words = string.split()  # split the sentence into a list of words
    for i in range(len(words)):
        if words[i] == word:
            word_count += 1
    return word_count
This might be a little inefficient, but it may be easier for you to understand than collections.Counter :)
I would do it like this honestly to check:
for word in mysentence.split():
    if word in vocabulary:
        numwords[vocabulary.index(word)] += 1
Therefore your entire code would look like this:
vocabulary = ["in", "on", "to", "www"]
numwords = [0, 0, 0, 0]
mysentence = (" ʺAlso the sea tosses itself and breaks itself, and should any sleeper fancying that he might find on the beach an answer to his doubts, a sharer of his solitude, throw off his bedclothes and go down by himself to walk on the sand, no image with semblance of serving and divine promptitude comes readily to hand bringing the night to order and making the world reflect the compass of the soul.ʺ")
for word in mysentence.replace('.', '').replace(',', '').split():
    if word in vocabulary:
        numwords[vocabulary.index(word)] += 1
# split() first so that words, not characters, are counted
print("total number of words : " + str(len(mysentence.split())))
As @Jacob suggested, replacing the '.' and ',' characters can also be applied before the split, to avoid any possible conflicts.
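Stripping punctuation before splitting combines naturally with collections.Counter. The following is a sketch under a few assumptions: the function name `count_vocabulary` is made up here, matching is case-insensitive, and only the ASCII punctuation in string.punctuation is removed (so characters such as ʺ would remain attached to their words).

```python
from collections import Counter
import string

def count_vocabulary(sentence, vocabulary):
    # Remove ASCII punctuation and lowercase the text before splitting into words.
    table = str.maketrans("", "", string.punctuation)
    cleaned = sentence.lower().translate(table)
    counts = Counter(cleaned.split())
    # A Counter returns 0 for words it has never seen.
    return [counts[word] for word in vocabulary]

vocabulary = ["in", "on", "to", "www"]
print(count_vocabulary("Go on, walk on the sand to the sea.", vocabulary))  # [0, 2, 1, 0]
```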
Consider the issue that characters like “ and ” may not parse well unless an appropriate encoding scheme has been specified.
this_is_how_you_define_a_string = "The string goes here"
# and thus:
mysentence = "Also the sea tosses itself and breaks itself, and should any sleeper fancying that he might find on the beach an answer to his doubts, a sharer of his solitude, throw off his bedclothes and go down by himself to walk on the sand, no image with semblance of serving and divine promptitude comes readily to hand bringing the night to order and making the world reflect the compass of the soul."
for v in vocabulary:
    v in mysentence  # Notice the indentation of 4 spaces
This expression evaluates to True or False depending on whether v is in mysentence. I will leave how to accumulate the values as an exercise. Hint: True == 1 and False == 0. You need the sum of the true values for each word v.
