This should be easy but somehow I'm not quite getting it.
My assignment is:
Write a function sentenceCapitalizer that has one parameter of type string. The function returns a
copy of the string with the first character of each sentence capitalized. The function should return
“Hello. My name is Joe. What is your name?” if the argument to the function is “hello. my name is
Joe. what is your name?” Assume a sentence is separated by a period followed by a space."
What I have so far is:
def sentenceCapitalizer (string1: str):
    words = string1.split(". ")
    words2=words.capitalize()
    string2=words2.join()
    return (string2)
print (sentenceCapitalizer("hello. my name is Joe. what is your name?"))
Upon execution I get the error:
Traceback (most recent call last):
  File "C:\Users\Andrew\Desktop\lab3.py", line 83, in <module>
    print (sentenceCapitalizer("hello. my name is Joe. what is your name?"))
  File "C:\Users\Andrew\Desktop\lab3.py", line 79, in sentenceCapitalizer
    words2=words.capitalize()
AttributeError: 'list' object has no attribute 'capitalize'
What is that telling me, and how do I fix this? I tried following instructions from a page published by the Python Software Foundation, so I thought I'd have this.
You are trying to use a string method on the wrong object; words is a list object containing strings. Use the method on each individual element instead:
words2 = [word.capitalize() for word in words]
But this would be applying the wrong transformation; you don't want to capitalise the whole sentence, but just the first letter. str.capitalize() would lowercase everything else, including the J in Joe:
>>> 'my name is Joe'.capitalize()
'My name is joe'
Limit yourself to the first letter only, and then add back the rest of the string unchanged:
words2 = [word[0].capitalize() + word[1:] for word in words]
Next, a list object has no .join() method either; that too is a string method:
string2 = '. '.join(words2)
This'll join the strings in words2 with the '. ' (full stop and space) joiner.
You'll probably want to use better variable names here; your strings are sentences, not words, so your code could do better reflecting that.
Together that makes your function:
def sentenceCapitalizer (string1: str):
    sentences = string1.split(". ")
    sentences2 = [sentence[0].capitalize() + sentence[1:] for sentence in sentences]
    string2 = '. '.join(sentences2)
    return string2
Demo:
>>> def sentenceCapitalizer (string1: str):
...     sentences = string1.split(". ")
...     sentences2 = [sentence[0].capitalize() + sentence[1:] for sentence in sentences]
...     string2 = '. '.join(sentences2)
...     return string2
...
>>> print (sentenceCapitalizer("hello. my name is Joe. what is your name?"))
Hello. My name is Joe. What is your name?
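One caveat: sentence[0] raises an IndexError when split produces an empty string. This happens, for example, if the text ends with a full stop followed by a space. Below is a minimal sketch that slices with sentence[:1] instead, because slicing an empty string simply returns ''. The PEP 8 style name capitalize_sentences is only an illustrative choice.

```python
def capitalize_sentences(text: str) -> str:
    # Split on the assignment's separator: a full stop followed by a space
    sentences = text.split(". ")
    # s[:1] is '' for an empty sentence, so this never raises IndexError
    capitalized = [s[:1].capitalize() + s[1:] for s in sentences]
    return ". ".join(capitalized)

print(capitalize_sentences("hello. my name is Joe. what is your name?"))
print(repr(capitalize_sentences("hello. ")))
```

Like the answer's version, this capitalizes only the first character, so "Joe" keeps its capital J.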
The following regex-based version also does the job. Since it extracts all sentences including their trailing whitespace, it also works if you have multiple paragraphs, where there are line breaks between sentences.
import re
def sentence_case(text):
    # Split into sentences: find all text that ends with
    # punctuation followed by white space or end of string.
    sentences = re.findall(r'[^.!?]+[.!?](?:\s|\Z)', text)
    # Capitalize the first letter of each sentence
    sentences = [x[0].upper() + x[1:] for x in sentences]
    # Combine sentences
    return ''.join(sentences)
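Note that findall only returns text matched by the pattern. A final sentence without closing punctuation, such as "nice to meet you", is therefore silently dropped. The sketch below uses re.sub instead, so text the pattern does not match is always kept. It assumes a sentence ends with ., ! or ? followed by whitespace, and it upper-cases only the first word character. The name sentence_case_sub is hypothetical.

```python
import re

def sentence_case_sub(text):
    # Group 1: start of string, or sentence-ending punctuation plus whitespace
    # Group 2: the first word character of the next sentence
    return re.sub(r'(^|[.!?]\s+)(\w)',
                  lambda m: m.group(1) + m.group(2).upper(),
                  text)

print(sentence_case_sub("hello. my name is Joe.\nwhat is your name? nice to meet you"))
```

A dot that is not followed by whitespace, as in "3.14", is left alone.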
To allow arbitrary whitespace after the dot, or to capitalize full words (which might make a difference for Unicode text), you could use regular expressions via the re module:
#!/usr/bin/env python3
import re
def sentenceCapitalizer(text):
    return re.sub(r"(\.\s+|^)(\w+)",
                  lambda m: m.group(1) + m.group(2).capitalize(),
                  text)
s = "hEllo. my name is Joe. what is your name?"
print(sentenceCapitalizer(s))
# -> Hello. My name is Joe. What is your name?
Note: PEP 8 recommends lowercase names for functions, e.g. capitalize_sentence() instead of sentenceCapitalizer().
To accept a larger variety of texts, you could use the nltk package:
# $ pip install nltk
# The tokenizers need the Punkt models, e.g. nltk.download('punkt')
# (recent NLTK versions may ask for 'punkt_tab' instead)
from nltk.tokenize import sent_tokenize, word_tokenize
def sent_capitalize(sentence):
    """Capitalize the first word in the *sentence*."""
    words = word_tokenize(sentence)
    if words:
        words[0] = words[0].capitalize()
    return " ".join(words[:-1]) + "".join(words[-1:])  # dot
text = "hEllo. my name is Joe. what is your name?"
# split the text into a list of sentences
sentences = sent_tokenize(text)
print(" ".join(map(sent_capitalize, sentences)))
# -> Hello. My name is Joe. What is your name?
I'm adding this because I couldn't find this solution here.
You can use the sent_tokenize function from nltk.
import nltk
string = "hello. my name is Joe. what is your name?"
sentences = nltk.sent_tokenize(string)
print (' '.join([s.replace(s[0],s[0].capitalize(),1) for s in sentences]) )
And the output:
Hello. My name is Joe. What is your name?
import textwrap
txt = "what ever you want. this will format it nicely. it makes me happy"
# Capitalize each ". "-separated sentence and join them back with ". "
txt = '. '.join(map(lambda s: s.strip().capitalize(), txt.split('. ')))
user = "Joe"
prefix = user + ":\t"
preferredWidth = 79
wrapper = textwrap.TextWrapper(initial_indent=prefix,
                               width=preferredWidth,
                               subsequent_indent=' ' * len(prefix) + " ")
print(wrapper.fill(txt))
I try to depend as little as possible on packages that have to be downloaded from the internet. I found this works for me; I hope it is of some use to someone.
I didn't use split, just a while loop instead. Here is my code.
my_string = input('Enter a string: ')
new_string = ''
new_string += my_string[0].upper()
i = 1
while i < len(my_string)-2:
    new_string += my_string[i]
    if my_string[i] == '.' or my_string[i] == '?' or my_string[i] == '!':
        # Copy the space and capitalize the letter after it, then skip both
        new_string += ' '
        new_string += my_string[i+2].upper()
        i = i+3
    else:
        if i == len(my_string)-3:
            # Append the last two characters, which the loop never reaches
            new_string += my_string[len(my_string)-2:len(my_string)]
        i = i+1
print(new_string)
Here is how it works:
Enter a string: hello. my name is Joe. what is your name?
Hello. My name is Joe. What is your name?
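For comparison, the sketch below implements the same no-split idea with a for loop and a flag instead of index arithmetic. It assumes a sentence ends with ., ? or !, capitalizes the next letter after any amount of whitespace, and keeps every other character unchanged, including the final ?. The function name is illustrative. Like the loop above, it has no notion of abbreviations or decimal numbers: a dot in "3.14" would also trigger capitalization of the next letter.

```python
def capitalize_after_punctuation(text):
    result = []
    capitalize_next = True  # the very first letter starts a sentence
    for ch in text:
        if capitalize_next and ch.isalpha():
            result.append(ch.upper())
            capitalize_next = False
        else:
            result.append(ch)
            if ch in ".?!":
                # The next letter we meet begins a new sentence
                capitalize_next = True
    return "".join(result)

print(capitalize_after_punctuation("hello. my name is Joe. what is your name?"))
```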
Unclear on how to frame the following function correctly:
Creating a function that will take in a string and return the string in camel case without spaces (or pascal case if the first letter was already capital), removing special characters
text = "This-is_my_test_string,to-capitalize"
def to_camel_case(text):
    # Return 1st letter of text + all letters after
    return text[:1] + text.title()[1:].replace(i" ") if not i.isdigit()
# Output should be "ThisIsMyTestStringToCapitalize"
The "if" statement at the end isn't working out. I wrote this somewhat experimentally, but with a syntax fix, could the logic work?
Provided the input string does not contain any spaces, you could do this:
from re import sub
def to_camel_case(text, pascal=False):
    r = sub(r'[^a-zA-Z0-9]', ' ', text).title().replace(' ', '')
    return r if pascal else r[0].lower() + r[1:]
ts = 'This-is_my_test_string,to-capitalize'
print(to_camel_case(ts, pascal=True))
print(to_camel_case(ts))
Output:
ThisIsMyTestStringToCapitalize
thisIsMyTestStringToCapitalize
Here is a short solution using regex. First it uses title() as you did, then the regex finds non-alphanumeric characters and removes them, and finally we take the first character of the original string to handle pascal / camel case.
import re
def to_camel_case(s):
    s1 = re.sub('[^a-zA-Z0-9]+', '', s.title())
    return s[0] + s1[1:]
text = "this-is2_my_test_string,to-capitalize"
print(to_camel_case(text))  # thisIs2MyTestStringToCapitalize
The code below should work for your example.
It splits your example on anything that isn't alphanumeric or whitespace, capitalizes each word, and finally returns the re-joined string.
import re
def to_camel_case(text):
    words = re.split(r'[^a-zA-Z0-9\s]', text)
    return "".join([word.capitalize() for word in words])
text_to_camelcase = "This-is_my_test_string,to-capitalize"
print(to_camel_case(text_to_camelcase))
Use re.split to split on anything that is not a letter, digit or whitespace, and .capitalize() to capitalize the individual words:
import re
text_to_camelcase = "This-is_my_test_string,to-capitalize"
def to_camel_case(text):
    split_text = re.split(r'[^a-zA-Z0-9\s]', text)
    cap_string = ''
    for word in split_text:
        cap_word = word.capitalize()
        cap_string += cap_word
    return cap_string
print(to_camel_case(text_to_camelcase))
Using a function:
def make_cap(sentence):
    return sentence.title()
Trying it out:
>>> make_cap("hello world")
'Hello World'
It works, but not when I have words like "aren't" and "isn't". How do I write the function for that?
>>> a = "I haven't worked hard"
>>> make_cap(a)
"I Haven'T Worked Hard"
That's wrong. I am aware of using \ to escape the apostrophe, as in isn\'t, but I'm confused about how to include that in the function.
This should work:
def make_cap(sentence):
    return " ".join(word[:1].title() + word[1:] for word in sentence.split(" "))
It manually splits the sentence on spaces (and not on any other character), and then capitalizes the first letter of each token. It does this by slicing off that first letter, capitalizing it, and then concatenating the rest of the word. Slicing with word[:1] rather than indexing with word[0] avoids an IndexError on empty tokens, which appear when the sentence contains two consecutive spaces.
Use capwords() from the standard string module:
import string
def make_cap(sentence):
    return string.capwords(sentence)
Demo: https://repl.it/repls/BlankMysteriousMenus
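A quick way to see the difference between the two approaches: str.title() starts a new "word" after any non-letter, including the apostrophe. string.capwords() splits on whitespace only and capitalizes each piece. Note that capwords() also lower-cases the rest of each word and, with the default separator, collapses runs of whitespace into a single space.

```python
import string

a = "I haven't worked hard"
print(a.title())           # I Haven'T Worked Hard
print(string.capwords(a))  # I Haven't Worked Hard

# capwords() lower-cases the rest of each word and collapses whitespace
print(string.capwords("the  USA"))  # The Usa
```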
I found this method to be very helpful for formatting all different types of texts as titles.
from string import capwords
text = "I can't go to the USA due to budget concerns"
title = ' '.join([capwords(w) if w.islower() else w for w in text.split()])
print(title) # I Can't Go To The USA Due To Budget Concerns
For example, this sentence:
say "mosquito!"
I try to capitalize with the following code:
'say "mosquito!"'.capitalize()
Which returns this:
'Say "mosquito!"'
However, the desired result is:
'Say "Mosquito!"'
You can use str.title:
print('say "mosquito!"'.title())
# Output: Say "Mosquito!"
Looks like Python has a built-in method for this!
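One thing to watch out for with str.title(): it treats every non-letter, including an apostrophe, as the start of a new word. As a result, contractions and possessives come out wrong, as this short check shows:

```python
s = 'say "mosquito\'s house"'
# The letter after the apostrophe is upper-cased too
print(s.title())  # Say "Mosquito'S House"
```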
This is quite tricky. I look for the first alphabetic character of each word: split the string into words, convert that first letter of each word to upper case, and join them again.
def start(word):
    # Index of the first alphabetic character in word (0 if there is none)
    for n in range(len(word)):
        if word[n].isalpha():
            return n
    return 0
strng = 'say "mosquito\'s house"'
print(' '.join(word[:start(word)] + word[start(word)].upper() + word[start(word)+1:] for word in strng.split()))
Result:
Say "Mosquito's House"
You could do it using a lambda in regular expression substitution:
string = 'say "mosquito\'s house" '
import re
caps = re.sub("((^| |\")[a-z])",lambda m:m.group(1).upper(),string)
# 'Say "Mosquito\'s House" '
How to remove user defined letters from a user defined sentence in Python?
Hi, I'm hoping someone is willing to take the time to help me out with some Python code.
I am currently doing a software engineering bootcamp, and the current requirement is that I create a program where a user inputs a sentence and then inputs the letters he/she wishes to remove from the sentence.
I have searched online, and there are tons of articles and threads about removing letters from strings, but I cannot find a single one about removing user-defined letters from a user-defined string.
import re
sentence = input("Please enter a sentence: ")
letters = input("Please enter the letters you wish to remove: ")
sentence1 = re.sub(letters, '', sentence)
print(sentence1)
The expected result should remove multiple letters from a user defined string, yet this will remove a letter if you only input 1 letter. If you input multiple letters it will just print the original sentence. Any help or guidance would be much appreciated.
If I understood correctly we can use str.maketrans and str.translate methods here like
from itertools import repeat
sentence1 = sentence.translate(str.maketrans(dict(zip(letters, repeat(None)))))
What this does line by line:
create mapping of letters to None which will be interpreted as "remove this character"
translation_mapping = dict(zip(letters, repeat(None)))
create translation table from it
translation_table = str.maketrans(translation_mapping)
use translation table for given str
sentence1 = sentence.translate(translation_table)
Test
>>> sentence = 'Some Text'
>>> letters = 'te'
>>> sentence.translate(str.maketrans(dict(zip(letters, repeat(None)))))
'Som Tx'
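If you'd rather not build the dictionary yourself, str.maketrans also accepts a third argument. This is a string whose characters are all mapped to None, i.e. deleted. Here is a sketch of the same removal using that form:

```python
sentence = 'Some Text'
letters = 'te'

# The first two arguments must be equal-length strings (here both empty);
# every character in the third argument is deleted by translate()
table = str.maketrans('', '', letters)
print(sentence.translate(table))  # Som Tx
```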
Comparison
from timeit import timeit
print('this solution:',
timeit('sentence.translate(str.maketrans(dict(zip(letters, repeat(None)))))',
'from itertools import repeat\n'
'sentence = "Hello World" * 100\n'
'letters = "el"'))
print('#FailSafe solution using `re` module:',
timeit('re.sub(str([letters]), "", sentence)',
'import re\n'
'sentence = "Hello World" * 100\n'
'letters = "el"'))
print('#raratiru solution using `str.join` method:',
timeit('"".join([x for x in sentence if x not in letters])',
'sentence = "Hello World" * 100\n'
'letters = "el"'))
gives on my PC
this solution: 3.620041800000024
#FailSafe solution using `re` module: 66.5485033
#raratiru solution using `str.join` method: 70.18480099999988
so we probably should think twice before using regular expressions everywhere and str.join'ing one-character strings.
>>> sentence1 = re.sub(str([letters]), '', sentence)
This works best with letters entered in the form letters = 'abcd', without spaces or punctuation marks.
Edit:
These are actually better:
>>> re.sub('['+letters+']', '', sentence)
>>> re.sub('['+str(letters)+']', '', sentence)
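The original str([letters]) version also removes ' (apostrophes) if they appear in the string, because str() puts quote characters inside the character class, although it is the prettier solution.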
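One more caveat applies to all of the character-class versions. If the user types a character that is special inside [...], the pattern can silently change meaning. For example, a leading ^ negates the class. Escaping the input with re.escape() avoids that, as in this minimal sketch:

```python
import re

sentence = "2^3 is a power"
letters = "^a"

# Naive pattern "[^a]" is a *negated* class: it removes everything except 'a'
print(re.sub("[" + letters + "]", "", sentence))  # a

# re.escape() turns "^a" into "\^a", so both characters are matched literally
print(re.sub("[" + re.escape(letters) + "]", "", sentence))  # 23 is  power
```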
You can use a list comprehension:
result = ''.join([x for x in sentence if x not in letters])
Your code doesn't work as expected because the regex you provide only matches the exact combination of letters you give it. What you want is to match any one of the letters, which can be achieved by putting them in brackets, for example:
import re
sentence = input("Please enter a sentence: ")
letters = input("Please enter the letters you wish to remove: ")
regex_str = '[' + letters + ']'
sentence1 = re.sub(regex_str, '', sentence)
print(sentence1)
For more regex help I would suggest visiting https://regex101.com/
user_word = input("What is your preferred sentence? ")
user_letter_to_remove = input("Which letters would you like to delete? ")
# letters to remove
letters = str(user_letter_to_remove)
for i in letters:
    user_word = user_word.replace(i, "")
print(user_word)
I must use Python to print the number of words and mean length of words in each sentence of a text file. I cannot use NLTK or Regex for this assignment.
The sentence in the file ends with a period, exclamation point, or question mark. A hyphen, dash, or apostrophe does not end a sentence. Quotation marks do not end a sentence. But also, some periods do not end sentences. For example, Mrs., Mr., Dr., Fr., Jr., St., are all commonly occurring abbreviations.
For example, if input text is:
"My name? Bob. Your name? Lily! Hi there"
...output should be:
[(no. of words, mean length of words in sentence1),
(no. of words, mean length of words in sentence2),
...]
The code:
p= ("Mrs.","Mr.","St.")
def punct_after_ab(texts):
    new_text = texts
    for abb in p:
        new_text = new_text.replace(abb,abb[:-1])
    return print(new_text)
import numpy
def word_list(text):
    special_characters = ["'",","]
    clean_text = text
    for string in special_characters:
        clean_text = clean_text.replace(string, "")
    count_list = [len(i) for i in clean_text.split()]
    count = [numpy.mean(count_list)]
    return print((count_list),(count))
But when I tested this, it does not split sentences.
Use something along the lines of .split(' ') to separate the words (in the stated case by spaces) and then use array operations and basic math/statistics to get your answers. If you update your question to be more specific and include some of your own code I would be willing to revise my answer accordingly.
You will find that on this site if you do not put much effort into the question you are asking, you aren't going to get very helpful answers. Try doing some research and writing as much code as you can before asking questions. This makes it much easier for people to help you and they will be more willing. As of right now it seems like you are just trying to get someone to do your homework for you.
Update:
Your code works for the most part; there are just some things you need to change. I played around with what you have, and I was able to break the text down into a list of sentences, on which you could then run statistics.
input.txt:
My name? Mr. Bob. Your name? Mrs. Lily!
What's up?
test.py (I use python 3.6):
def punct_after_ab(texts):
    # Drop the period from common abbreviations so it doesn't end a sentence
    p = ("Mrs.", "Mr.", "St.")
    new_text = texts
    for abb in p:
        new_text = new_text.replace(abb, abb[:-1])
    return new_text
def clean_text(text):
    # Remove apostrophes and commas
    special_characters = ["'", ","]
    cleaned = text
    for string in special_characters:
        cleaned = cleaned.replace(string, "")
    return cleaned
def split_sentence(text):
    # Initialize vars
    sentences = []
    start = 0
    # Loop through the text until you find punctuation,
    # then add the sentence to the final list
    for i, char in enumerate(text):
        if char in ".?!":
            sentences.append(text[start:i+1])
            # Skip the punctuation and the following space/newline
            start = i + 2
    # Print the sentences to console
    for sentence in sentences:
        print(sentence)
    return sentences
def main():
    # Ask user for file name
    file = input("Enter file name: ")
    # Open the file (closing it afterwards) and strip newline chars
    with open(file) as f:
        fd = f.read().strip("\n")
    # Remove punctuation that doesn't delineate sentences
    text = punct_after_ab(fd)
    text = clean_text(text)
    # Separate sentences
    split_sentence(text)
# Run program
if __name__ == '__main__':
    main()
I was able to get this to output the text below:
Enter file name: input.txt
My name?
Mr Bob.
Your name?
Mrs Lily!
Whats up?
Process finished with exit code 0
From there you can easily do your sentence statistics. I just typed this up quickly, so you'll probably want to go through it and clean it up a bit. I hope this helps.
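For the statistics part, here is a minimal sketch that takes a list of sentences like the ones printed above and produces the (number of words, mean word length) tuples the question asks for. You could get such a list, for example, by having split_sentence return its sentences list. The sketch assumes words are separated by whitespace and that the sentence-ending ., ? or ! should not count towards a word's length. sentence_stats is a hypothetical helper name. It uses plain arithmetic, so NumPy isn't needed.

```python
def sentence_stats(sentences):
    stats = []
    for sentence in sentences:
        # Strip the sentence-ending punctuation, then split on whitespace
        words = sentence.rstrip(".?!").split()
        if not words:
            continue  # skip empty fragments
        mean_length = sum(len(w) for w in words) / len(words)
        stats.append((len(words), mean_length))
    return stats

sentences = ["My name?", "Mr Bob.", "Your name?", "Mrs Lily!", "Whats up?"]
print(sentence_stats(sentences))
# [(2, 3.0), (2, 2.5), (2, 4.0), (2, 3.5), (2, 3.5)]
```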