Capitalize the first word of a sentence in a text

I want to make sure that each sentence in a text starts with a capital letter.
E.g. "we have good news and bad news about your emissaries to our world," the extraterrestrial ambassador informed the Prime Minister. the good news is they tasted like chicken." should become
"We have good news and bad news about your emissaries to our world," the extraterrestrial ambassador informed the Prime Minister. The good news is they tasted like chicken."
I tried using split() to split the text into sentences. Then I capitalized the first character of each sentence and appended the rest of the string to it.
text = input("Enter the text: \n")
lines = text.split('. ') #Split the sentences
for line in lines:
    a = line[0].capitalize() # capitalize the first word of sentence
    for i in range(1, len(line)):
        a = a + line[i]
    print(a)
I want to obtain "We have good news and bad news about your emissaries to our world," the extraterrestrial ambassador informed the Prime Minister. The good news is they tasted like chicken."
I get "We have good news and bad news about your emissaries to our world," the extraterrestrial ambassador informed the Prime Minister
The good news is they tasted like chicken."

This code should work:
text = input("Enter the text: \n")
lines = text.split('. ') # Split the sentences
for index, line in enumerate(lines):
    lines[index] = line[0].upper() + line[1:]
print(". ".join(lines))
The error in your code is that str.split(sep) removes the separator itself, which is why the period disappears. You also print each piece on its own line instead of joining the pieces back together.
Sorry for not providing a thorough description as I cannot think of what to say. Please feel free to ask in comments.
EDIT: Let me try to explain what I did.
Lines 1-2: Accept the input and split it into a list on '. '. On the sample input, this gives: ['"we have good news and bad news about your emissaries to our world," the extraterrestrial ambassador informed the Prime Minister', 'the good news is they tasted like chicken.']. Note that the period is gone from the end of the first sentence, where the split happened.
Line 3: enumerate(lines) iterates over lines and yields an (index, item) tuple for each item.
Line 4: Replaces the item at that index with its first character in uppercase plus the rest of the line.
Line 5: Prints the message. ". ".join(lines) essentially reverses what you did with split: str.join(iterable) takes an iterable of strings and concatenates them with str between consecutive items. Without it, you would be missing your periods.
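The split/join idea can be wrapped in a small function. This is a sketch rather than the answerer's exact code: the name capitalize_sentences is hypothetical, and it adds a guard for empty pieces, because line[0] raises IndexError on an empty string (which split produces, for example, when the text ends with '. ').

```python
def capitalize_sentences(text: str) -> str:
    """Uppercase the first character of each '. '-separated piece (hypothetical helper)."""
    pieces = text.split('. ')
    # Leave empty pieces alone: ''[0] would raise IndexError
    fixed = [p[0].upper() + p[1:] if p else p for p in pieces]
    # Join with the same separator to put the removed '. ' back
    return '. '.join(fixed)


sample = ('we have good news and bad news about your emissaries to our world," '
          'the extraterrestrial ambassador informed the Prime Minister. '
          'the good news is they tasted like chicken.')
print(capitalize_sentences(sample))
```

Like the original, this only splits on '. ', so sentences ending in '!' or '?' are not handled.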

When you split the string on ". ", the ". " separators are removed and the remaining pieces are put into a list. You need to add the lost periods back to your sentences to make this work.
This can also leave the last sentence with a double period, since it ends with "." rather than ". ". To avoid that, remove the final period (if there is one) before splitting.
text = input("Enter the text: \n")
output = ""
if text.endswith('.'):
    # remove the last period to avoid double periods in the last sentence
    text = text[:-1]
lines = text.split('. ') # Split the sentences
for line in lines:
    a = line[0].capitalize() # uppercase the first character of the sentence
    for i in range(1, len(line)):
        a = a + line[i]
    a = a + '. ' # add back the removed period and the space after it
    output = output + a
print(output.rstrip()) # drop the trailing space after the last period
We can also make this solution cleaner:
text = input("Enter the text: \n")
output = ""
if text.endswith('.'):
    # remove the last period to avoid double periods in the last sentence
    text = text[:-1]
lines = text.split('. ') # Split the sentences
for line in lines:
    a = line[0].capitalize() + line[1:] + '. '
    output = output + a
print(output.rstrip())
Slicing with s[1:] gives you a copy of the string without its first character, and s[:-1] gives you a copy without its last character.

split splits the string, and none of the resulting strings contain the delimiter (the string or character you split by).
Change your code to this:
text = input("Enter the text: \n")
lines = text.split('. ') #Split the sentences
final_text = ". ".join([line[0].upper()+line[1:] for line in lines])
print(final_text)

The code below handles several sentence endings (".", "!" and "?") and capitalizes the first word of each sentence. Since you want to keep your existing capital letters, str.capitalize() will not work: it also lowercases every character after the first, so words like "Prime Minister" would become lowercase. Instead, a lambda inside the list comprehension applies upper() to the first letter of each piece only, which leaves the rest of the sentence unchanged.
import re
original_sentence = 'we have good news and bad news about your emissaries to our world," the extraterrestrial ambassador informed the Prime Minister. the good news is they tasted like chicken.'
val = re.split('([.!?] *)', original_sentence)
new_sentence = ''.join([(lambda x: x[0].upper() + x[1:])(each) if len(each) > 1 else each for each in val])
print(new_sentence)
The list comprehension that builds new_sentence is equivalent to:
sentence = []
for each in val:
    sentence.append((lambda x: x[0].upper() + x[1:])(each) if len(each) > 1 else each)
print(''.join(sentence))
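To see why capitalize() is avoided here, compare it with the slicing approach on a phrase that already contains capitals (the phrase is taken from the question's sample):

```python
phrase = "the extraterrestrial ambassador informed the Prime Minister"

# str.capitalize() uppercases the first character and lowercases everything else
with_capitalize = phrase.capitalize()

# Slicing only changes the first character and leaves the rest untouched
with_slicing = phrase[0].upper() + phrase[1:]

print(with_capitalize)  # The extraterrestrial ambassador informed the prime minister
print(with_slicing)     # The extraterrestrial ambassador informed the Prime Minister
```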

You can use re.sub to replace the first character of the text, and every word character that follows a period and a space, with its uppercase equivalent.
import re
original_sentence = 'we have good news and bad news about your emissaries to our world," the extraterrestrial ambassador informed the Prime Minister. the good news is they tasted like chicken.'
def replacer(match_obj):
    # the match is a single character; return it in uppercase
    return match_obj.group(0).upper()
# Replace the very first character, or any character following a dot and a space, with its uppercase version.
print(re.sub(r"(?<=\. )(\w)|^\w", replacer, original_sentence))
# prints: We have good news and bad news about your emissaries to our world," the extraterrestrial ambassador informed the Prime Minister. The good news is they tasted like chicken.
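The same re.sub idea can be extended to sentences ending in "!" or "?", using a lambda instead of a named replacer. This is a sketch, assuming a sentence starts at the beginning of the text or after ".", "!" or "?" followed by whitespace; the names SENTENCE_START and capitalize_sentence_starts are hypothetical:

```python
import re

# Group 1: start of text, or a sentence-ending mark plus whitespace
# Group 2: the lowercase ASCII letter that starts the next sentence
SENTENCE_START = re.compile(r"(^|[.!?]\s+)([a-z])")


def capitalize_sentence_starts(text: str) -> str:
    # Keep group 1 as-is and uppercase only the letter in group 2
    return SENTENCE_START.sub(lambda m: m.group(1) + m.group(2).upper(), text)


print(capitalize_sentence_starts("is it safe? yes! the good news is they tasted like chicken."))
# Is it safe? Yes! The good news is they tasted like chicken.
```

Only ASCII lowercase letters are matched, and abbreviations such as "e.g. " will also trigger a capital, so treat this as a starting point.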

Related

Extracting words/phrase followed by a phrase

I have one text file with a list of phrases. Below is how the file looks:
Filename: KP.txt
And from the input paragraph below, I want to extract the next 2 words after any phrase from KP.txt (the phrases could be anything, as in the KP.txt file above). All I need is to extract the next 2 words.
Input:
This is Lee. Thanks for contacting me. I wanted to know the exchange policy at Noriaqer hardware services.
In the above example, the phrase "I wanted to know" matches an entry in the KP.txt file. So if I extract the next 2 words after it, my output should be "exchange policy".
How could I extract this in python?

Assuming you already know how to read the input file into a list, it can be done with some help from regex.
>>> import re
>>> wordlist = ['I would like to understand', 'I wanted to know', 'I wish to know', 'I am interested to know']
>>> input_text = 'This is Lee. Thanks for contacting me. I wanted to know exchange policy at Noriaqer hardware services.'
>>> def word_extraction(input_text, wordlist):
...     for word in wordlist:
...         if word in input_text:
...             # re.escape protects phrases that contain regex metacharacters
...             output = re.search(r'(?<=%s)(.\w*){2}' % re.escape(word), input_text)
...             if output:  # no match if the phrase is at the very end of the text
...                 print(output.group().lstrip())
...
>>> word_extraction(input_text, wordlist)
exchange policy
>>> input_text = 'This is Lee. Thanks for contacting me. I wish to know where is Noriaqer hardware.'
>>> word_extraction(input_text, wordlist)
where is
>>> input_text = 'This is Lee. Thanks for contacting me. I\'d like to know where is Noriaqer hardware.'
>>> word_extraction(input_text, wordlist)
>>>
First, we check whether the phrase is in the sentence at all. This is not the most efficient approach for a large list, but it works for now.
Next, if it is in our "dictionary" of phrases, we use a regex to extract the words we want.
Finally, we strip the leading whitespace in front of the target words.
Regex hint:
(?<=%s) is a lookbehind assertion: the match must be immediately preceded by the phrase (e.g. "I wanted to know"), but the phrase itself is not part of the match.
(.\w*){2} means "any single character (here, the space) followed by zero or more word characters", repeated twice, which picks up the two words after the key phrase.
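If the phrase list is long, checking each phrase with `in` and then running a separate regex gets slow. One alternative (a sketch, not part of the answer above) is to compile all phrases into a single alternation and capture the two words that follow. The function name next_two_words is hypothetical, and "word" here means a run of \w characters:

```python
import re


def next_two_words(text, phrases):
    """Return (word1, word2) for every phrase match in text (hypothetical helper)."""
    escaped = [re.escape(p) for p in phrases if p]  # skip blank lines from the file
    if not escaped:
        return []
    # \b keeps a phrase from matching in the middle of another word;
    # (?:...) groups the alternatives without capturing them
    pattern = re.compile(r"\b(?:%s)\s+(\w+)\s+(\w+)" % "|".join(escaped), re.IGNORECASE)
    return [m.groups() for m in pattern.finditer(text)]


phrases = ['I would like to understand', 'I wanted to know', 'I wish to know', 'I am interested to know']
text = 'This is Lee. Thanks for contacting me. I wanted to know exchange policy at Noriaqer hardware services.'
print(next_two_words(text, phrases))  # [('exchange', 'policy')]
```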

I think natural language processing could be a better solution, but this code should help :)
def search_in_text(kp, text):
    for line in kp:
        # if a search phrase from kp is found in the text
        if line in text:
            # the starting index of the two words
            i1 = text.find(line) + len(line)
            # the end index of the following two words (first index + 50 at maximum)
            i2 = (i1 + 50) if len(text) > (i1 + 50) else len(text)
            # split the following text into words (next_words) and remove empty strings
            next_words = [word for word in text[i1:i2].split(' ') if word != '']
            # return only the next two words from next_words
            return next_words[0:2]
    return []  # return an empty list if no phrase matches
# read your kp file as a list of lines, skipping blank lines
# (an empty phrase would match any text)
with open("KP.txt") as f:
    kp = [line.strip() for line in f if line.strip()]
# input 1
text = 'This is Lee. Thanks for contacting me. I wanted to know exchange policy at Noriaqer hardware services.'
print('input ->>', text)
output = search_in_text(kp, text)
print('output ->>', output)
# prints:
# input ->> This is Lee. Thanks for contacting me. I wanted to know exchange policy at Noriaqer hardware services.
# output ->> ['exchange', 'policy']
# input 2
text = 'Boss was very angry and said: I wish to know why you are late?'
print('input ->>', text)
output = search_in_text(kp, text)
print('output ->>', output)
# prints:
# input ->> Boss was very angry and said: I wish to know why you are late?
# output ->> ['why', 'you']

You can use this:
with open("KP.txt") as fobj:
    # lowercase and strip each phrase, skipping blank lines (an empty phrase would break split())
    phrases = [line.lower().strip() for line in fobj if line.strip()]
paragraph = input("Enter The Whole Paragraph in one line:\t").lower()
for phrase in phrases:
    if phrase in paragraph:
        # everything after each occurrence of the phrase
        temp = paragraph.split(phrase)[1:]
        for clause in temp:
            print(" ".join(clause.split()[:2]))

string count method giving incorrect result

I have a strange problem with the following Codewars kata:
https://www.codewars.com/kata/51e056fe544cf36c410000fb/train/python
I haven't completed it, but I ran into a really strange problem with the str.count() method.
The count I get from the method for the word "an" is 8, even though the word appears in the string only once.
Any advice would be much appreciated.
Here is my code:
import re
words = "In a village of La Mancha, the name of which I have no desire to call to \
mind, there lived not long since one of those gentlemen that keep a lance \
in the lance-rack, an old buckler, a lean hack, and a greyhound for \
coursing. An olla of rather more beef than mutton, a salad on most \
nights, scraps on Saturdays, lentils on Fridays, and a pigeon or so extra \
on Sundays, made away with three-quarters of his income."
# => ["a", "of", "on"]
def top_3_words(text: str):
    text = re.sub('[^0-9a-zA-Z]+', ' ', text).strip(' ')
    arr = text.split(' ')
    dict = {value: text.count(value) for value in arr}
    result = []
    for _ in range(3):
        max_key = max(dict, key=dict.get)
        result.append(max_key)
        dict.pop(max_key)
    print(result)
top_3_words(words)

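str.count counts substring occurrences, so "an" is also found inside words such as "Mancha", "lance" and "and". You should first split the string into words (tokenize it) and then count:
text = text.replace(',', '').split()
The code above returns a list of the words in your string. You can then count how many times each word occurs in that list.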
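With the text split into words, collections.Counter can count and rank them in one step. This is a rewritten top_3_words, sketched under the assumption that words are runs of letters and apostrophes compared case-insensitively (check these rules against the kata's description):

```python
import re
from collections import Counter


def top_3_words(text):
    # Lowercase, then treat runs of letters/apostrophes as words
    words = re.findall(r"[a-z']+", text.lower())
    # Drop tokens made only of apostrophes
    words = [w for w in words if w.strip("'")]
    # most_common(3) returns (word, count) pairs, highest count first
    return [word for word, _ in Counter(words).most_common(3)]


print(top_3_words("a a a b b c d"))  # ['a', 'b', 'c']
```

Because list elements are compared whole, "an" is counted only where it is a separate word, not inside "Mancha" or "lance".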

First split your string into individual words, then count how many of them are exactly "an":
temp_words = words.split(' ')
an_count = temp_words.count('an')

I'm trying to solve for the other Acronyms

I have the following strings that I need to make acronyms for:
Institute of Electrical and Electronics Engineers
As Soon As Possible
University of California San Diego
Self Contained Underwater Breathing Apparatus
This is my code
my_string = input()
my_string2 = my_string.upper()
for x in range(0, 1, len(my_string2)):
    print(my_string2[0::15])
but it only works for the first input; it doesn't cover the other three examples. What I need is for this code to be modified so that it creates an acronym from any input. The first phrase is "Institute of Electrical and Electronics Engineers", and when it is given as input, the output should be IEEE. Basically, the first letter of every capitalized word is kept, and lowercase words are skipped.

I'm new to programming, so I bet the way I did it is a bit funky, but this worked for me on my zyBooks lab (note that it puts a period after each letter, e.g. I.E.E.E.; drop the + '.' if you want IEEE):
Name = input()
AStart = Name.split()
AFinal = ''
for string in AStart:
    if string[0].isupper():
        AFinal += string[0] + '.'
print(AFinal)

Here's a regex based solution which looks for words that start with a capital letter and extracts their starting letter, then joins all them together to make the acronym:
import re
strings = [
'As Soon As Possible',
'Institute of Electrical and Electronics Engineers',
'University of California San Diego',
'Self Contained Underwater Breathing Apparatus'
]
for s in strings:
    acronym = ''.join(re.findall(r'\b[A-Z]', s))
    print(acronym)
If you don't want to use regex, you can just split the strings and test the first character of each word to see if it is uppercase:
for s in strings:
    acronym = ''.join(w[0] for w in s.split(' ') if w[0].isupper())
    print(acronym)
In either case the output is:
ASAP
IEEE
UCSD
SCUBA
To run from input, use this code:
import re
s = input()
acronym = ''.join(re.findall(r'\b[A-Z]', s))
print(acronym)
Or:
s = input()
acronym = ''.join(w[0] for w in s.split(' ') if w[0].isupper())
print(acronym)
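If you need this as a reusable function (for example, to check all four phrases at once), the split-based version can be wrapped as below. make_acronym is a hypothetical name; calling split() with no argument also copes with repeated spaces, where split(' ') would produce empty strings and w[0] would raise IndexError:

```python
def make_acronym(phrase: str) -> str:
    # Keep the first letter of every word that starts with a capital
    return ''.join(w[0] for w in phrase.split() if w[0].isupper())


for p in ['As Soon As Possible', 'Institute of Electrical and Electronics Engineers']:
    print(make_acronym(p))
```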

Try this:
full_string = input("Enter Text: ")
string_list = full_string.split()
acronym = ""
for string in string_list:
    acronym += f"{string[0].upper()}"
print(acronym)
output:
Enter Text: This is a long string please be kind
TIALSPBK

This is what I used.
phrase = str(input()).rstrip()  # read the phrase and strip trailing whitespace
for char in phrase:  # go through every character
    x = char  # did this to make it easier to keep track
    if x.isupper():  # keep only uppercase characters
        print(x, end='')  # print them all on one line

Is there a way of getting this string down to 3 words?

There are multiple problems with the code I posted below. As I said in my previous post, I'm new to coding, so I have some trouble figuring things out by myself :(
My goal is to take user input, narrow it down to the 3 longest words, and then sort them alphabetically. Am I doing this right?
Probably not, because the output comes out wrapped in extra quotes and commas. For example, with "i like eating cake" as input, the output is:
"'cake',", "'eating'", "'i',", "'like',"
But I want it to be:
cake, eating, like
Any help is much appreciated.
input = input(" ")
prohibited = {'this','although','and','as','because','but','even if','he','and','however','cosmos','an','a','is','what','question :','question','[',']',',','cosmo',' ',' ',' '}
processedinput = [word for word in re.split("\W+",input) if word.lower() not in prohibited]
processed = processedinput
processed.sort(key = len)
processed = re.sub('[\[\]]','',repr(processedinput)) #removes brackets
keywords = processed
keywords = keywords.split()
keywords.sort(key=str.lower)
keywords.sort()
keywords = re.sub('[\[\]]','',repr(keywords))
str(keywords)
print(keywords)

The first issue with your code is input = input(). The problem with this is that input is the name of the function you are calling, but you are overwriting input with the user's string. Consequently, if you tried to run input() again, it would fail.
The second issue is that you are misunderstanding lists. In the code below, tokens is a list, not a string. Each element in the list is a string. So there is no need to strip out brackets and such. You can simply order the list (that part of your code was correct) in reverse order of length, then print the first three words.
Code:
import re
user_input = input(" ")
prohibited = {'this','although','and','as','because','but','even if','he','and','however','cosmos','an','a','is','what','question :','question','[',']',',','cosmo',' ',' ',' '}
tokens = [word for word in re.split(r"\W+", user_input) if word.lower() not in prohibited]
tokens.sort(key=len, reverse=True)
print(', '.join(tokens[:3]))
Input:
i like eating cake
Output:
eating, like, cake
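The question also asks for the three words in alphabetical order (cake, eating, like), while the answer above prints them by length. One way is to pick the three longest first and then sort just those. A sketch; three_keywords is a hypothetical name, and the prohibited set is passed in (the example uses only a few of the question's entries):

```python
import re


def three_keywords(text, prohibited=frozenset()):
    # Split on non-word characters and drop empty strings and prohibited words
    tokens = [w for w in re.split(r"\W+", text) if w and w.lower() not in prohibited]
    # Longest first; Python's sort is stable, so equal lengths keep input order
    longest = sorted(tokens, key=len, reverse=True)[:3]
    # Alphabetical order for the final output, ignoring case
    return ', '.join(sorted(longest, key=str.lower))


print(three_keywords("i like eating cake", {'this', 'and', 'a', 'is'}))  # cake, eating, like
```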

Replacing duplicated words in python 3

I want to take a piece of text which looks like this:
Engineering will save the world from inefficiency. Inefficiency is a blight on the world and its humanity.
and return:
Engineering will save the world from inefficiency..is a blight on the . and its humanity.
That is, I want to remove duplicated words and replace them with "."
This is how I started my code:
lines= ["Engineering will save the world from inefficiency.",
        "Inefficiency is a blight on the world and its humanity."]
def solve(lines):
    clean_paragraph = []
    for line in lines:
        if line not in str(lines):
            clean_paragraph.append(line)
            print (clean_paragraph)
            if word == word in line in clean_paragraph:
                word = "."
    return clean_paragraph
My logic was to create a list of all the words in the strings, add each one to a new list, and then, if the word was already in the list, replace it with ".". My code returns []. Any suggestions would be greatly appreciated!

PROBLEM:
if word == word in line in clean_paragraph:
I'm not sure what you expect this to do, but it doesn't test what you want. Both == and in are comparison operators, so Python treats the line as a chained comparison:
if (word == word) and (word in line) and (line in clean_paragraph):
word == word is always True, so the condition only checks whether word occurs somewhere inside line and whether line is an element of clean_paragraph. It never compares word against the words you have already collected.
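A quick way to see how Python groups == and in within a single expression is to compare the chained form with an explicitly nested one. The sample values below are made up for illustration:

```python
word = "world"
line = "the world"
clean_paragraph = ["the world"]

# Chained comparison: means (word == word) and (word in line) and (line in clean_paragraph)
chained = word == word in line in clean_paragraph
expanded = (word == word) and (word in line) and (line in clean_paragraph)

# Explicit nesting is a different expression: (True in clean_paragraph) is False,
# and "world" == False is False
nested = word == ((word in line) in clean_paragraph)

print(chained, expanded, nested)  # True True False
```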
REPAIR
Write the loops you're trying to encode:
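for clean_line in clean_paragraph:
    for word in clean_line.split():  # split() gives words; iterating over a string gives characters
        ...  # decide here what to do with each word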
At this point, you'll have to figure out what you want for variable names. You've tried to make a couple of variables stand for two different things at once (line and word; I fixed the first one).
You'll also have to learn to properly manipulate loops and their indices; part of the problem is that you've written more code at once than you can handle -- yet. Back up, write one loop at a time, and print the results, so you know what you're getting into. For instance, start with
for line in lines:
    if line not in str(lines):
        print("line", line, "is new: append")
        clean_paragraph.append(line)
    else:
        print("line", line, "is already in *lines*")
I think you'll spot another problem here -- one even earlier than the one I found. Fix this, then add only one or two lines at a time, building up your program (and programming knowledge) gradually. When something doesn't work, you know it's almost certainly the new lines.

Here is one way to do this. It replaces all duplicate words with a dot.
lines_test = ["Engineering will save the world from inefficiency.",
              "Inefficiency is a blight on the world and its humanity."]
def solve(lines):
    clean_paragraph = ""
    str_lines = " ".join(lines)
    # put a space before each period so it becomes a separate token
    words_lines = str_lines.replace('.', ' .').split()
    for word in words_lines:
        if word != "." and word.lower() in clean_paragraph.lower():
            word = " ."
        elif word != ".":
            word = " " + word
        clean_paragraph += word
    return clean_paragraph.lstrip()  # drop the space added before the first word
print(solve(lines_test))
Output:
Engineering will save the world from inefficiency. . is . blight on . . and its humanity.
It is important to convert words to a consistent case (all lowercase or all uppercase) before you compare them.

Another way of doing this can be:
lines_test = 'Engineering will save the world from inefficiency. Inefficiency is a blight on the world and its humanity.'
text_array = lines_test.split(" ")
formatted_text = ''
for word in text_array:
    if word.lower() not in formatted_text:
        formatted_text = formatted_text + ' ' + word
    else:
        formatted_text = formatted_text + ' ' + '.'
print(formatted_text.strip())  # strip the leading space added before the first word
Output
Engineering will save the world from inefficiency. . is . blight on . . and its humanity.
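Both versions above test membership with `in` on a string, which matches substrings: that is why "a" is replaced by "." even though it appears only once (it occurs inside "save"). Tracking the words already seen in a set avoids this. A sketch, assuming words are runs of \w characters and sentence periods are kept as they are; replace_duplicates is a hypothetical name:

```python
import re


def replace_duplicates(text: str) -> str:
    seen = set()  # lowercase forms of the words kept so far
    result = ""
    for token in re.findall(r"\w+|\.", text):
        if token == ".":
            result += "."              # keep real sentence-ending periods attached
        elif token.lower() in seen:
            result += " ."             # duplicate word -> standalone dot
        else:
            seen.add(token.lower())
            result += " " + token
    return result.lstrip()


print(replace_duplicates('Engineering will save the world from inefficiency. '
                         'Inefficiency is a blight on the world and its humanity.'))
```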
