I have this code:
import string
import unicodedata as ud

def remove_punctuation(self, text):
    exclude = set(string.punctuation)
    a = ''.join(ch for ch in text if ch not in exclude)
    return ''.join(c for c in a if not ud.category(c).startswith('P'))
First, I would like to know what this does:
ch for ch in text if ch not in exclude
How is it possible to write a for loop like that?
Second, I want to replace that punctuation with a space using the above code, for example in a text like this: "hello_there?my_friend!" How can I change the code to do that?
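ch for ch in text if ch not in exclude is a generator expression: it loops over text and yields each character that is not in exclude. Wrapped in square brackets, as below, it becomes a list comprehension. The piece of code: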
a = ''.join([ch for ch in text if ch not in exclude])
is equivalent to
string_without_punctuation = ''
exclude = set(string.punctuation)  # the ASCII characters !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
for character in text:
    if character not in exclude:
        string_without_punctuation += character
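A quick check that the one-line form and the explicit loop produce the same result, using the sample string from the question:

```python
import string

exclude = set(string.punctuation)
text = "hello_there?my_friend!"

# Generator expression: yields each ch that passes the "if" filter
a = ''.join(ch for ch in text if ch not in exclude)

# The same filtering written as an explicit loop
b = ''
for ch in text:
    if ch not in exclude:
        b += ch

print(a)       # hellotheremyfriend
print(a == b)  # True
```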
You could simply do this to replace the punctuation with spaces:
string_without_punctuation = ''
exclude = set(string.punctuation)  # the ASCII characters !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
for character in text:
    if character not in exclude:
        string_without_punctuation += character
    else:
        string_without_punctuation += ' '
I'd recommend using str.translate instead of manually rebuilding the string. Make a lookup table mapping characters to the strings you want to replace them with.
trans = str.maketrans(dict.fromkeys(string.punctuation, ' '))
"hello_there?my_friend!".translate(trans)
# 'hello there my friend '
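If you also want the Unicode punctuation check from the question (unicodedata.category(c).startswith('P')), the same idea works with a space instead of an empty string. A sketch, assuming a plain function rather than a method; replace_punctuation and its repl parameter are hypothetical names. Both checks are kept because string.punctuation also contains symbols such as '$' and '+' whose Unicode category does not start with 'P'.

```python
import string
import unicodedata as ud

def replace_punctuation(text, repl=' '):
    # ASCII punctuation, as in the question's first filter
    exclude = set(string.punctuation)
    # Unicode categories starting with 'P' (Pc, Pd, Ps, Pe, Pi, Pf, Po) are punctuation
    return ''.join(
        repl if ch in exclude or ud.category(ch).startswith('P') else ch
        for ch in text
    )

print(repr(replace_punctuation("hello_there?my_friend!")))  # 'hello there my friend '
```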
I need to take the initial letter of every word, move it to the end of the word and add 'arg'. To do this, I tried the following:
def pirate(str):
    list_str = str.split(' ')
    print(list_str)
    new_str = ''
    for lstr in list_str:
        first_element = lstr[0]
        second_element = lstr[1:]
        new_str += second_element + first_element + 'arg' + ' '
    return new_str

print(pirate('Hello! how are, you!!'))
The expected output is: elloHarg! owharg reaarg, ouyarg!!
However, I am getting the following output: ello!Harg owharg re,aarg ou!!yarg
How can I make it work for the following use case?
Punctuation should remain at the end of the word even after translation. Assume punctuation won't appear anywhere except at the end of a word. The punctuation characters to consider are .,:;?! and there may be several of them in a row (e.g. yes!!).
Here is a short and efficient solution using a regex:
import re
re.sub(r'(\w)(\w+)', r'\2\1arg', 'Hello! how are, you!!')
Literally, this says: replace each letter that is followed by one or more word characters with those following characters, then the first letter, then 'arg' (note that \w also matches digits and underscores).
Output:
'elloHarg! owharg reaarg, ouyarg!!'
As a function:
import re

def pirate(s):
    return re.sub(r'(\w)(\w+)', r'\2\1arg', s)
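If you would rather keep the split-and-loop structure of the original attempt, one option is to peel the trailing .,:;?! characters off each word with rstrip(), transform the remaining core, and put the punctuation back afterwards. A sketch (the PUNCT constant is my own name, not from the question):

```python
PUNCT = '.,:;?!'  # punctuation that may trail a word, per the question

def pirate(s):
    new_words = []
    for word in s.split(' '):
        core = word.rstrip(PUNCT)   # the word without its trailing punctuation
        tail = word[len(core):]     # the trailing punctuation itself, e.g. '!!'
        if core:                    # skip empty strings from repeated spaces
            core = core[1:] + core[0] + 'arg'
        new_words.append(core + tail)
    return ' '.join(new_words)

print(pirate('Hello! how are, you!!'))  # elloHarg! owharg reaarg, ouyarg!!
```

Unlike the regex, this also transforms one-letter words (pirate('a') gives 'aarg'), since (\w)(\w+) needs at least two word characters to match; which behaviour you want depends on the assignment.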
I am trying to make a lossy text compression program that removes all vowels from the input, except when the vowel is the first letter of a word. I keep getting a "string index out of range" error on line 6. Please help!
text = str(input('Message: '))
text = (' ' + text)
for i in range(0, len(text)):
    i = i + 1
    if str(text[i-1]) != ' ': #LINE 6
        text = text.replace('a', '')
        text = text.replace('e', '')
        text = text.replace('i', '')
        text = text.replace('o', '')
        text = text.replace('u', '')
print(text)
As busybear notes, the loop isn't necessary: your replacements don't depend on i.
Here's how I'd do it:
def strip_vowels(s):  # Remove all vowels from a string
    for v in 'aeiou':
        s = s.replace(v, '')
    return s

def compress_word(s):
    if not s: return ''  # Needed to avoid an out-of-range error on the empty string
    return s[0] + strip_vowels(s[1:])  # Strip vowels from all but the first letter

def compress_text(s):  # Apply to each word
    words = s.split(' ')
    new_words = (compress_word(w) for w in words)
    return ' '.join(new_words)
When you replace letters with an empty string, your text gets shorter, so an index based on the original len(text) goes out of bounds once any letters have been removed. Note, however, that replace() replaces all occurrences within your string, so a loop isn't even necessary.
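A small illustration, with a made-up input, of why the index runs past the end:

```python
text = ' banana'
n = len(text)                 # range(0, len(text)) is fixed from this length
text = text.replace('a', '')  # every 'a' is removed at once
print(n, len(text))           # 7 4
```

Any index from 4 to 6, which the loop's range still produces, now raises IndexError.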
If you want to keep the loop, an alternative is to record the indices of the letters to remove while going through the loop, then remove them after the loop is complete.
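A minimal sketch of that index-tracking idea, keeping the original rule that a vowel survives only at the start of a word (compress is a hypothetical name, and only lowercase vowels are handled, as in the question):

```python
def compress(text):
    to_remove = set()
    for i, ch in enumerate(text):
        # A vowel is kept at the very start of the text or right after a space
        if ch in 'aeiou' and i > 0 and text[i - 1] != ' ':
            to_remove.add(i)
    # Build the result only after the loop, so indices stay valid while looping
    return ''.join(ch for i, ch in enumerate(text) if i not in to_remove)

print(compress('apple is a good idea'))  # appl is a gd id
```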
Shortening your string by replacing any character with "" means that, once you remove a character, the len(text) used in your range() is longer than the actual string. There are plenty of alternative solutions; for example:
text_list = list(text)
for i in range(1, len(text_list)):
    if text_list[i] in "aeiou":
        text_list[i] = ""
text = "".join(text_list)
By turning your string into a list of its individual characters, you can remove characters while keeping the list length fixed (empty-string elements are allowed), then join them back together.
Be sure to account for special cases, such as len(text)<2.
The program correctly identifies the words regardless of punctuation. I am having trouble integrating this into spam_indicator(text).
def spam_indicator(text):
    text=text.split()
    w=0
    s=0
    words=[]
    for char in string.punctuation:
        text = text.replace(char, '')
        return word
    for word in text:
        if word.lower() not in words:
            words.append(word.lower())
            w=w+1
            if word.lower() in SPAM_WORDS:
                s=s+1
    return float("{:.2f}".format(s/w))
The second block is wrong. I am trying to remove punctuation before running the rest of the function.
Try removing the punctuation first, then split the text into words.
import string

def spam_indicator(text):
    for char in string.punctuation:
        text = text.replace(char, ' ')  # N.B. replace with ' ', not ''
    text = text.split()
    w = 0
    s = 0
    words = []
    for word in text:
        if word.lower() not in words:
            words.append(word.lower())
            w = w + 1
            if word.lower() in SPAM_WORDS:
                s = s + 1
    return float("{:.2f}".format(s/w))
There are many improvements that could be made to your code.
Use a set for words rather than a list. Since a set cannot contain duplicates, you don't need to check whether you've already seen a word before adding it.
Use str.translate() to remove the punctuation. You want to replace punctuation with whitespace so that the split() will split the text into words.
Use round() instead of converting to a string then to a float.
Here is an example:
import string

def spam_indicator(text):
    trans_table = {ord(c): ' ' for c in string.punctuation}
    text = text.translate(trans_table).lower()
    text = text.split()
    word_count = 0
    spam_count = 0
    words = set()
    for word in text:
        if word not in SPAM_WORDS:
            words.add(word)
            word_count += 1
        else:
            spam_count += 1
    return round(spam_count / word_count, 2)
You need to take care not to divide by 0 if there are no non-spam words. Anyway, I'm not sure what you want as the spam indicator value. Perhaps it should be the number of spam words divided by the total number of words (both spam and non-spam) to make it a value between 0 and 1?
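A sketch of that last suggestion: spam words divided by all words, with a guard for empty input. SPAM_WORDS here is a made-up set for illustration; in your program it is whatever you already have defined.

```python
import string

# Hypothetical spam vocabulary, for illustration only
SPAM_WORDS = {'free', 'winner', 'cash'}

def spam_indicator(text):
    # Map every ASCII punctuation character to a space so split() separates words
    trans_table = str.maketrans(string.punctuation, ' ' * len(string.punctuation))
    words = text.translate(trans_table).lower().split()
    if not words:
        return 0.0  # avoid dividing by zero on empty or punctuation-only input
    spam_count = sum(1 for w in words if w in SPAM_WORDS)
    return round(spam_count / len(words), 2)

print(spam_indicator('Free cash! Claim your FREE prize, winner.'))  # 0.57
```

Here 4 of the 7 words are spam, so the result always lies between 0 and 1.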
Write a function that accepts an input string consisting of alphabetic
characters and removes all the leading whitespace of the string and
returns it without using .strip(). For example if:
input_string = " Hello "
then your function should return a string such as:
output_string = "Hello "
Below is my program for removing whitespace without using strip:
def Leading_White_Space (input_str):
    length = len(input_str)
    i = 0
    while (length):
        if(input_str[i] == " "):
            input_str.remove()
        i =+ 1
        length -= 1

#Main Program
input_str = " Hello "
result = Leading_White_Space (input_str)
print (result)
I chose the remove function as I thought it would be an easy way to get rid of the whitespace before the string 'Hello'. Also, the task says to eliminate only the whitespace before the actual string, but by my logic I suppose this would eliminate not only the leading but also the trailing whitespace. Any help would be appreciated.
You can loop over the characters of the string and stop when you reach a non-space one. Here is one solution:
def Leading_White_Space(input_str):
    for i, c in enumerate(input_str):
        if c != ' ':
            return input_str[i:]
    return ''  # the string is empty or all spaces
Edit:
PM 2Ring made a good point. If you want to handle all kinds of whitespace (e.g. \t, \n, \r), you need to use isspace(), so a more complete solution could be:
def Leading_White_Space(input_str):
    for i, c in enumerate(input_str):
        if not c.isspace():
            return input_str[i:]
    return ''  # the string is empty or all whitespace
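A quick way to check a hand-written version against the built-in str.lstrip(), including the empty and all-whitespace edge cases where a missing final return '' would make the function return None:

```python
def Leading_White_Space(input_str):
    for i, c in enumerate(input_str):
        if not c.isspace():
            return input_str[i:]
    return ''  # empty or all-whitespace input

# Compare against the built-in on a few edge cases
for sample in [" Hello ", " \n\t Hello ", "Hello", "   ", ""]:
    assert Leading_White_Space(sample) == sample.lstrip(), repr(sample)
print("all cases match str.lstrip()")
```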
Here's another way to strip the leading whitespace, that actually strips all leading whitespace, not just the ' ' space char. There's no need to bother tracking the index of the characters in the string, we just need a flag to let us know when to stop checking for whitespace.
def my_lstrip(input_str):
    leading = True
    result = ''  # stays empty if the input is empty or all whitespace
    for ch in input_str:
        if leading:
            # All the chars read so far have been whitespace
            if not ch.isspace():
                # The leading whitespace is finished
                leading = False
                # Start saving chars
                result = ch
        else:
            # We're past the whitespace, copy everything
            result += ch
    return result
# test
input_str = " \n \t Hello "
result = my_lstrip(input_str)
print(repr(result))
output
'Hello '
There are various other ways to do this. Of course, in a real program you'd simply use the string .lstrip method, but here are a couple of cute ways to do it using an iterator:
def my_lstrip(input_str):
    it = iter(input_str)
    for ch in it:
        if not ch.isspace():
            return ch + ''.join(it)
    return ''  # empty or all-whitespace input
and
def my_lstrip(input_str):
    it = iter(input_str)
    ch = next(it, '')  # '' once the iterator is exhausted
    while ch.isspace():
        ch = next(it, '')
    return ch + ''.join(it)
Use re.sub:
>>> import re
>>> input_string = " Hello "
>>> re.sub(r'^\s+', '', input_string)
'Hello '
or
>>> def remove_space(s):
...     ind = len(s)  # default: the string is all spaces
...     for i, j in enumerate(s):
...         if j != ' ':
...             ind = i
...             break
...     return s[ind:]
...
>>> remove_space(input_string)
'Hello '
Just to be thorough and without using other modules, we can also specify which whitespace to remove (leading, trailing, both or all), including tab and new line characters. The code I used (which is, for obvious reasons, less compact than other answers) is as follows and makes use of slicing:
def no_ws(string, which='left'):
    """
    `which` takes the value 'left', 'right', 'both' or 'all' to remove the
    relevant whitespace.
    """
    remove_chars = (' ', '\n', '\t')
    if which == 'all':
        return ''.join(s for s in string if s not in remove_chars)
    first_char = 0            # slice start (inclusive)
    last_char = len(string)   # slice end (exclusive)
    if which in ('left', 'both'):
        for idx, letter in enumerate(string):
            if letter not in remove_chars:
                first_char = idx
                break
        else:
            return ''         # the string is all whitespace
    if which in ('right', 'both'):
        for idx, letter in enumerate(reversed(string)):
            if letter not in remove_chars:
                last_char = len(string) - idx
                break
        else:
            return ''
    return string[first_char:last_char]
You can use itertools.dropwhile to remove particular characters from the start of your string, like this:
import itertools
def my_lstrip(input_str, remove=" \n\t"):
    return "".join(itertools.dropwhile(lambda x: x in remove, input_str))
To make it more flexible, I added an extra argument, remove, holding the characters to strip from the start of the string, with a default value of " \n\t". dropwhile then skips characters for as long as they are in remove; to check this I use a lambda function (a compact way to write short anonymous functions).
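To see that dropwhile only removes the leading run, not every matching character, here is a small example on a made-up string:

```python
import itertools

# dropwhile skips items only while the predicate is true;
# after the first item that fails it, everything is kept.
kept = list(itertools.dropwhile(lambda x: x in " -", "-- a-b "))
print(kept)  # ['a', '-', 'b', ' ']
```

The '-' and ' ' after the 'a' survive, which is exactly the lstrip behaviour.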
Here are a few tests:
>>> my_lstrip(" \n \t Hello ")
'Hello '
>>> my_lstrip(" Hello ")
'Hello '
>>> my_lstrip("--- Hello ","-")
' Hello '
>>> my_lstrip("--- Hello ","- ")
'Hello '
>>> my_lstrip("- - - Hello ","- ")
'Hello '
>>>
The previous function is equivalent to:
def my_lstrip(input_str, remove=" \n\t"):
    for i, x in enumerate(input_str):
        if x not in remove:
            return input_str[i:]
    return ''  # empty, or made up only of characters in remove
def main():
    print('Please enter a sentence without spaces and each word has ' + \
          'a capital letter.')
    sentence = input('Enter your sentence: ')
    for ch in sentence:
        if ch.isupper():
            capital = ch
            sentence = sentence.replace(capital, ' ' + capital)
main()
Ex: sentence = 'ExampleSentenceGoesHere'
I need this to print as: Example sentence goes here
As of right now, it prints: Example Sentence Goes Here (with a space at the beginning)
You can iterate over the string character by character and replace every uppercase letter with a space followed by the corresponding lowercase letter:
>>> s = 'ExampleSentenceGoesHere'
>>> "".join(' ' + i.lower() if i.isupper() else i for i in s).strip().capitalize()
'Example sentence goes here'
Note that the check for an uppercase character is done by isupper(). The calls to strip() and capitalize() deal with the first letter: strip() removes the space added in front of it, and capitalize() turns it back into a capital.
Also see relevant threads:
Elegant Python function to convert CamelCase to snake_case?
How to check if a character is upper-case in Python?
You need to convert each uppercase letter to a lowercase one using capital.lower(). You should also ignore the first letter of the sentence so it stays capitalised and doesn't get a space in front of it. You can do this using a flag, as follows:
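is_first_letter = True
result = ''
for ch in sentence:
    if is_first_letter:
        is_first_letter = False
        result += ch  # keep the first letter unchanged
        continue
    # Build a new string: sentence.replace() would also change the first letter
    # (and every other occurrence) whenever the same capital appears again
    if ch.isupper():
        result += ' ' + ch.lower()
    else:
        result += ch
sentence = result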
I'd probably use re and re.split("[A-Z]", text) but I'm assuming you can't do that because this looks like homework. How about:
def main():
    text = input(">>")
    newtext = ""
    for character in text:
        if character.isupper():
            ch = " " + character.lower()
        else:
            ch = character
        newtext += ch
    text = text[0]+newtext[2:]
    print(text)
You could also do:
transdict = {letter:" "+letter.lower() for letter in 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'}
transtable = str.maketrans(transdict)
text.translate(transtable).strip().capitalize()
But again, I think that's outside the scope of the assignment.
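Since re was mentioned above: instead of re.split, here is a sketch using re.sub with a lookbehind, so the first capital is left alone and every later capital gets a space and is lowercased (split_camel is a hypothetical name):

```python
import re

def split_camel(sentence):
    # (?<=.) requires a character before the capital, so position 0 is skipped;
    # the replacement function adds a space and lowercases the matched letter.
    return re.sub(r'(?<=.)([A-Z])', lambda m: ' ' + m.group(1).lower(), sentence)

print(split_camel('ExampleSentenceGoesHere'))  # Example sentence goes here
```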