How to remove characters from string?


How to remove user defined letters from a user defined sentence in Python?
Hi, could anyone take the time to help me out with some Python code?
I am currently doing a software engineering bootcamp, and the current requirement is to create a program where a user inputs a sentence and then inputs the letters he/she wishes to remove from that sentence.
I have searched online and there are tons of articles and threads about removing letters from strings, but I cannot find one about how to remove user-defined letters from a user-defined string.
import re
sentence = input("Please enter a sentence: ")
letters = input("Please enter the letters you wish to remove: ")
sentence1 = re.sub(letters, '', sentence)
print(sentence1)
The expected result is that multiple letters are removed from the user-defined string, yet this only removes a letter if you input exactly one letter. If you input multiple letters, it just prints the original sentence. Any help or guidance would be much appreciated.

If I understood correctly, we can use the str.maketrans and str.translate methods here, like this:
from itertools import repeat
sentence1 = sentence.translate(str.maketrans(dict(zip(letters, repeat(None)))))
What this does line by line:
create mapping of letters to None which will be interpreted as "remove this character"
translation_mapping = dict(zip(letters, repeat(None)))
create translation table from it
translation_table = str.maketrans(translation_mapping)
use translation table for given str
sentence1 = sentence.translate(translation_table)
Test
>>> sentence = 'Some Text'
>>> letters = 'te'
>>> sentence.translate(str.maketrans(dict(zip(letters, repeat(None)))))
'Som Tx'
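As a side note, the same deletion table can probably be built without itertools, because str.maketrans accepts a third argument listing the characters to delete. A minimal sketch, where the function name remove_letters is only an illustration and not part of the answer above:

```python
def remove_letters(sentence, letters):
    """Return sentence with every character found in letters removed."""
    # The third argument of str.maketrans lists the characters to delete.
    table = str.maketrans('', '', letters)
    return sentence.translate(table)


print(remove_letters('Some Text', 'te'))  # Som Tx
```

This should behave like the dict(zip(letters, repeat(None))) version, since both map each listed character to None.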
Comparison
from timeit import timeit
print('this solution:',
timeit('sentence.translate(str.maketrans(dict(zip(letters, repeat(None)))))',
'from itertools import repeat\n'
'sentence = "Hello World" * 100\n'
'letters = "el"'))
print('#FailSafe solution using `re` module:',
timeit('re.sub(str([letters]), "", sentence)',
'import re\n'
'sentence = "Hello World" * 100\n'
'letters = "el"'))
print('#raratiru solution using `str.join` method:',
timeit('"".join([x for x in sentence if x not in letters])',
'sentence = "Hello World" * 100\n'
'letters = "el"'))
gives on my PC
this solution: 3.620041800000024
#FailSafe solution using `re` module: 66.5485033
#raratiru solution using `str.join` method: 70.18480099999988
so we probably should think twice before using regular expressions everywhere and str.join'ing one-character strings.

>>> sentence1 = re.sub(str([letters]), '', sentence)
Preferably with letters entered in the form letters = 'abcd'. No spaces or punctuation marks if necessary.
Edit:
These are actually better:
>>> re.sub('['+letters+']', '', sentence)
>>> re.sub('['+str(letters)+']', '', sentence)
The original str([letters]) version also removes ' if it appears in the string, although it is the prettier solution.

You can use a list comprehension:
result = ''.join([x for x in sentence if x not in letters])

Your code doesn't work as expected because the regex you provide only matches the exact combination of letters you give it. What you want is to match either one of the letters, which can be achieved by putting them in brackets, for example:
import re
sentence = input("Please enter a sentence: ")
letters = input("Please enter the letters you wish to remove: ")
regex_str = '[' + letters + ']'
sentence1 = re.sub(regex_str, '', sentence)
print(sentence1)
For more regex help I would suggest visiting https://regex101.com/
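One caveat with building the character class straight from user input: characters such as ], ^, - or \ have a special meaning inside the brackets, and an empty input produces the invalid pattern []. A hedged sketch of one way to guard against that with re.escape (the function name remove_chars is made up for this example):

```python
import re


def remove_chars(sentence, letters):
    """Remove every character in letters from sentence using a character class."""
    if not letters:
        # An empty class "[]" is not a valid pattern, so return early.
        return sentence
    # re.escape neutralises characters such as ], ^, - and \ inside the class.
    pattern = '[' + re.escape(letters) + ']'
    return re.sub(pattern, '', sentence)


print(remove_chars('a-b^c]d', '^]-'))  # abcd
```

For plain letters the escaping changes nothing, so remove_chars('Hello World', 'el') still gives 'Ho Word'.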

user_word = input("What is your preferred sentence? ")
user_letter_to_remove = input("Which letters would you like to delete? ")
# Letters to remove
letters = str(user_letter_to_remove)
for i in letters:
    # Remove every occurrence of this letter
    user_word = user_word.replace(i, "")
print(user_word)

Related

Capitalize only the first letter of sentences in python,using split function

How can I capitalize the first letter of a input sentence in python?
Output has to be: Enter sentence to be capitalized:+ input sentence
input_string =input("Enter sentence to be capitalized: ")
def capitalize_first(input_string):
    output=input_string.split('.')
    i=0
    while i<len(output)-1:
        result=output[i][0].upper()+output[i][1:]+"."
    print("Enter sentence to be capitalized:"+result)

How about input_string.title()?
input_string =input("Enter sentence to be capitalized: ")
def capitalize_first(input_string):
    result = input_string.title()
    print("Enter sentence to be capitalized:"+result)
This built-in method capitalises the first character of every word and lowercases the others, just like how titles work.
As you can see, the extra capitals in THIS IS AN AMAZING are changed.
>>> input_string = "Hello World THIS IS AN AMAZING day!!!"
>>> input_string.title()
'Hello World This Is An Amazing Day!!!'

In my opinion there are many ways to do this, but title() is the easiest. You could also call upper() inside a for loop that iterates over the input string, or even use capitalize(). However, if the goal is to capitalise only the first letter of every word while keeping any capital letters inside a word exactly as the user entered them, you can't use title() or capitalize(), since they capitalise in the traditional way (first letter upper case, all others lower case, regardless of what the user entered).
In that case this might be a solution:
input_string = input("Your statement enter value or whatever")
separated_string = input_string.split()
for word in separated_string:
    # Upper-case only the first letter; keep the rest exactly as entered
    word = word[0].upper() + word[1:]
    print("Anything you want to say " + word)

sentence="test Sentence"
print(sentence.title()) # makes the first letter of every word in the sentence capital
print(sentence[0].upper()+sentence[1:]) # retains the case of the other characters
print(sentence.capitalize()) # makes all other characters lowercase
Output:
Test Sentence
Test Sentence
Test sentence
To answer your specific question:
def modify_string(str1):
    sentence_list=str1.split('.')
    modify_this=input("Enter sentence to be modified: ")
    for idx, item in enumerate(sentence_list):
        modify_this_copy=modify_this
        if item.lower().strip()==modify_this.lower().strip():
            sentence_list[idx]=modify_this_copy[0].upper()+modify_this_copy[1:]
    return '. '.join(sentence_list)
string1="hello. Nice to meet you. hello. Howdy."
print(modify_string(string1))
Output
Enter sentence to be modified: hello
Hello. Nice to meet you. Hello. Howdy.

CSV file in Python not giving the exact results

I have created the following program and imported a CSV file containing words related to common phone problems. My problem is, that it will pick out "smashed" but it won't pick out "smashed," because of the comma.
So my question is: how can I make it read the word without the comma, without it giving me any errors?
Any help will be appreciated :)
import csv
screen_list = {}
with open('keywords.csv') as csvfile:
    readCSV = csv.reader(csvfile)
    for row in readCSV:
        screen_list[row[0]] = row[1]
print("Welcome to the troubleshooting program. Here we will help you solve your problems which you are having with your phone. Let's get started: ")
what_issue = input("What is the issue with your phone?: ")
what_issue = what_issue.split(' ')
results = [(solution, screen_list[solution]) for solution in what_issue if solution in screen_list]
if len(results) > 6:
    print('Please only insert a maximum of 6 problems at once. ')
else:
    for solution, problems in results:
        print('As you mentioned the word in your sentence which is: {}, the possible outcome solution for your problem is: {}'.format(solution, problems))
exit_program = input("Type 0 and press ENTER to exit/switch off the program.")

Your problem arises when you split the what_issue string. The best solution here is to use a regular expression:
>>> import re
>>> what_issue = "My screen is smashed, usb does not charge"
>>> what_issue.split(' ')
['My', 'screen', 'is', 'smashed,', 'usb', 'does', 'not', 'charge']
>>> print(re.findall(r"[\w']+", what_issue))
['My', 'screen', 'is', 'smashed', 'usb', 'does', 'not', 'charge']

You've encountered a topic in Computer Science called tokenization.
It looks like you want to remove all non-alphabetical characters from the user input. An easy way to do that is to use Python's re library, which has support for regular expressions.
Here's an example of using re to do this:
import re
regex = re.compile('[^a-zA-Z]')
regex.sub('', some_string)
First we create a regular expression that matches all characters that aren't letters. Then we use this regex to replace all the matching characters in some_string with an empty string, which deletes them from the string.
A quick-and-dirty method for doing the same thing would be to use the isalpha method that belongs to all Python strings to filter out the unwanted characters.
some_string = ''.join([char for char in some_string if char.isalpha()])
Here we make a list that only includes the alphabetical characters from some_string. Then we join it together to create a new string, which we assign to some_string.
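To tie this back to the troubleshooting program, the tokenizing step and the keyword lookup could be combined roughly as follows. This is only a sketch: the function name find_keywords and the sample dictionary are made up for illustration, and it assumes the keywords in keywords.csv are stored in lower case.

```python
import re


def find_keywords(user_input, solutions):
    """Return (keyword, solution) pairs for known keywords in user_input."""
    # [\w']+ keeps words like "won't" whole and drops commas and full stops.
    words = re.findall(r"[\w']+", user_input.lower())
    return [(word, solutions[word]) for word in words if word in solutions]


# Hypothetical stand-in for the rows loaded from keywords.csv
solutions = {'smashed': 'replace the screen', 'usb': 'try another cable'}
print(find_keywords('My screen is smashed, usb does not charge', solutions))
```

Unlike stripping every non-letter from the whole input, this keeps the words separate, so each one can still be looked up individually.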

How to debug my Python code?

The purpose of my code is to take a string from a user and turn it into a case-insensitive list of words. Then I need to take a second string from the user and output the position of that second string. This is my code:
UserSentence = input('Enter your chosen sentence: ') #this is where the user inputs their sentence
from string import punctuation #this 'fetches' the punctuation from the string
tbl=str.maketrans({ord(ch):" " for ch in punctuation})
UserSentence = UserSentence.lower().translate(tbl).split()#.split() turns the input sentence into a list,...
#...this will help to identify where a word appears...
#...in the sentence. The .lower() also turns the...
#...string into lowercase so it is not case sensitive.
UserWord = input('Enter a word from the sentence: ')#this is where the user inputs their word from the sentence
UserWord = UserWord.lower()#.lower() is used to make UserWord not case sensitive
for i in range(len(UserSentence)):
    if UserSentence (i) == UserWord:
        print ('Your chosen word appears in: ')

To index a sequence you need to use []:
if UserSentence[i] == UserWord:
If you are trying to find the index (counted in words) of their word, you can do:
if UserWord in UserSentence:
    print('Your word is located at {}'.format(UserSentence.index(UserWord)))
Or similarly:
try:
    print('Your word is located at {}'.format(UserSentence.index(UserWord)))
except ValueError:
    print('Your word is not in the sentence')

Couple of errors here:
If you're using Python 2, use raw_input for strings. In Python 3 input is okay.
Your maketrans call is weird
To check whether a word is inside a list you don't need any loops or comparisons of your own. Python can do that for you.
Please stick to PEP 8. It tells you how you should format your code so that it's easier to read.
Your rewritten and tested code (Python 2):
from string import punctuation, maketrans
user_sentence = raw_input('Enter a sentence: ')
trans = maketrans("", "")
user_sentence = user_sentence.lower().translate(trans, punctuation).split()
user_word = raw_input('Enter a word from the sentence: ')
user_word = user_word.lower()
if user_word in user_sentence:
    print ('Your chosen word appears in the sentence.')
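Since the question itself uses Python 3 (input and str.maketrans), here is a rough Python 3 counterpart of the code above. It is a sketch only: the function name find_word is an invention for this example, and it returns the position instead of printing a message.

```python
from string import punctuation


def find_word(sentence, word):
    """Return the 0-based position of word in sentence, or None if absent."""
    # Three-argument str.maketrans builds a table that deletes punctuation.
    table = str.maketrans('', '', punctuation)
    words = sentence.lower().translate(table).split()
    try:
        return words.index(word.lower())
    except ValueError:
        return None


print(find_word('Hello, world! Hello again.', 'WORLD'))  # 1
```

Note that list.index reports only the first occurrence, so a repeated word yields a single position.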

Capitalization of each sentence in a string in Python 3

This should be easy but somehow I'm not quite getting it.
My assignment is:
Write a function sentenceCapitalizer that has one parameter of type string. The function returns a
copy of the string with the first character of each sentence capitalized. The function should return
“Hello. My name is Joe. What is your name?” if the argument to the function is “hello. my name is
Joe. what is your name?” Assume a sentence is separated by a period followed by a space."
What I have so far is:
def sentenceCapitalizer (string1: str):
    words = string1.split(". ")
    words2=words.capitalize()
    string2=words2.join()
    return (string2)
print (sentenceCapitalizer("hello. my name is Joe. what is your name?"))
Upon execution I get the error:
Traceback (most recent call last):
  File "C:\Users\Andrew\Desktop\lab3.py", line 83, in <module>
    print (sentenceCapitalizer("hello. my name is Joe. what is your name?"))
  File "C:\Users\Andrew\Desktop\lab3.py", line 79, in sentenceCapitalizer
    words2=words.capitalize()
AttributeError: 'list' object has no attribute 'capitalize'
What is that telling me and how do I fix this? I tried following instructions found on a page listed as the python software foundation so I thought I'd have this.

You are trying to use a string method on the wrong object; words is a list object containing strings. Use the method on each individual element instead:
words2 = [word.capitalize() for word in words]
But this would be applying the wrong transformation; you don't want to capitalise the whole sentence, but just the first letter. str.capitalize() would lowercase everything else, including the J in Joe:
>>> 'my name is Joe'.capitalize()
'My name is joe'
Limit yourself to the first letter only, and then add back the rest of the string unchanged:
words2 = [word[0].capitalize() + word[1:] for word in words]
Next, a list object has no .join() method either; that too is a string method:
string2 = '. '.join(words2)
This'll join the strings in words2 with the '. ' (full stop and space) joiner.
You'll probably want to use better variable names here; your strings are sentences, not words, so your code could do better reflecting that.
Together that makes your function:
def sentenceCapitalizer (string1: str):
    sentences = string1.split(". ")
    sentences2 = [sentence[0].capitalize() + sentence[1:] for sentence in sentences]
    string2 = '. '.join(sentences2)
    return string2
Demo:
>>> def sentenceCapitalizer (string1: str):
...     sentences = string1.split(". ")
...     sentences2 = [sentence[0].capitalize() + sentence[1:] for sentence in sentences]
...     string2 = '. '.join(sentences2)
...     return string2
...
>>> print (sentenceCapitalizer("hello. my name is Joe. what is your name?"))
Hello. My name is Joe. What is your name?
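One edge case worth knowing: sentence[0] raises IndexError when a piece is empty, which happens for an empty input string. A possible variation that sidesteps this with slicing is sketched below; the name capitalize_sentences is just for illustration and follows the lowercase naming style rather than the assignment's required name.

```python
def capitalize_sentences(text):
    """Capitalize the first character of each '. '-separated sentence."""
    sentences = text.split('. ')
    # Slicing with [:1] yields '' for an empty piece instead of raising IndexError.
    return '. '.join(s[:1].upper() + s[1:] for s in sentences)


print(capitalize_sentences('hello. my name is Joe. what is your name?'))
```

For the assignment's example input the result is the same as with the function above.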

This does the job. Since it extracts all sentences including their trailing whitespace, this also works if you have multiple paragraphs, where there are line breaks between sentences.
import re
def sentence_case(text):
    # Split into sentences. Therefore, find all text that ends
    # with punctuation followed by white space or end of string.
    sentences = re.findall(r'[^.!?]+[.!?](?:\s|\Z)', text)
    # Capitalize the first letter of each sentence
    sentences = [x[0].upper() + x[1:] for x in sentences]
    # Combine sentences
    return ''.join(sentences)

To allow arbitrary whitespace after the dot, or to capitalize the full words (which might make a difference for Unicode text), you could use regular expressions via the re module:
#!/usr/bin/env python3
import re
def sentenceCapitalizer(text):
    return re.sub(r"(\.\s+|^)(\w+)",
                  lambda m: m.group(1) + m.group(2).capitalize(),
                  text)
s = "hEllo. my name is Joe. what is your name?"
print(sentenceCapitalizer(s))
# -> 'Hello. My name is Joe. What is your name?'
Note: PEP 8 recommends lowercase names for functions, e.g. capitalize_sentence() instead of sentenceCapitalizer().
To accept a larger variety of texts, you could use the nltk package:
# $ pip install nltk
from nltk.tokenize import sent_tokenize, word_tokenize
def sent_capitalize(sentence):
    """Capitalize the first word in the *sentence*."""
    words = word_tokenize(sentence)
    if words:
        words[0] = words[0].capitalize()
    return " ".join(words[:-1]) + "".join(words[-1:]) # dot
text = "hEllo. my name is Joe. what is your name?"
# split the text into a list of sentences
sentences = sent_tokenize(text)
print(" ".join(map(sent_capitalize, sentences)))
# -> Hello. My name is Joe. What is your name?

Just because I couldn't find this solution here.
You can use 'sent_tokenize' method from nltk.
import nltk
string = "hello. my name is Joe. what is your name?"
sentences = nltk.sent_tokenize(string)
print (' '.join([s.replace(s[0],s[0].capitalize(),1) for s in sentences]) )
And the output
Hello. My name is Joe. What is your name?

try:
    import textwrap
except ImportError:
    print("textwrap library module error")
try:
    import re
except ImportError:
    print("re library module error")
txt = "what ever you want. this will format it nicely. it makes me happy"
# Capitalize each sentence, then rejoin with '. ' so the spaces are kept
txt = '. '.join(map(lambda s: s.strip().capitalize(), txt.split('. ')))
user = "Joe"
prefix = user + ":\t"
preferredWidth = 79
wrapper = textwrap.TextWrapper(initial_indent=prefix,
                               width=preferredWidth,
                               subsequent_indent=' ' * len(prefix) + " ")
print(wrapper.fill(txt))
I try to rely on as few third-party packages as possible. I found this works for me; I hope it is of some use to someone.

I did not use 'split', just a while loop instead. Here is my code.
my_string = input('Enter a string: ')
new_string = ''
new_string += my_string[0].upper()
i = 1
while i < len(my_string)-2:
    new_string += my_string[i]
    if my_string[i] == '.' or my_string[i] == '?' or my_string[i] == '!':
        new_string += ' '
        new_string += my_string[i+2].upper()
        i = i+3
    else:
        if i == len(my_string)-3:
            new_string += my_string[len(my_string)-2:len(my_string)]
        i = i+1
print(new_string)
Here is how it works:
Enter a string: hello. my name is Joe. what is your name?
Hello. My name is Joe. What is your name

Removing list of words from a string

I have a list of stopwords. And I have a search string. I want to remove the words from the string.
As an example:
stopwords=['what','who','is','a','at','is','he']
query='What is hello'
Now the code should strip 'What' and 'is'. However in my case it strips 'a', as well as 'at'. I have given my code below. What could I be doing wrong?
for word in stopwords:
    if word in query:
        print(word)
        query=query.replace(word,"")
If the input query is "What is Hello", I get the output as:
wht s llo
Why does this happen?

This is one way to do it:
query = 'What is hello'
stopwords = ['what', 'who', 'is', 'a', 'at', 'is', 'he']
querywords = query.split()
resultwords = [word for word in querywords if word.lower() not in stopwords]
result = ' '.join(resultwords)
print(result)
I noticed that you want to also remove a word if its lower-case variant is in the list, so I've added a call to lower() in the condition check.

The accepted answer works when given a list of words separated by spaces, but that's not the case in real life, where punctuation can also separate the words. In that case re.split is required.
Also, testing against stopwords as a set makes lookup faster (even if there's a tradeoff between string hashing and lookup when there's a small number of words).
My proposal:
import re
query = 'What is hello? Says Who?'
stopwords = {'what','who','is','a','at','is','he'}
resultwords = [word for word in re.split(r"\W+",query) if word.lower() not in stopwords]
print(resultwords)
Output (as a list of words):
['hello', 'Says', '']
There's a blank string at the end because re.split annoyingly emits empty fields, which need filtering out. Two solutions here:
resultwords = [word for word in re.split(r"\W+",query) if word and word.lower() not in stopwords] # filter out empty words
or add the empty string to the set of stopwords :)
stopwords = {'what','who','is','a','at','is','he',''}
Now the code prints:
['hello', 'Says']

Building on what karthikr said, try:
' '.join(filter(lambda x: x.lower() not in stopwords, query.split()))
Explanation:
query.split() # splits the variable query on whitespace, i.e. "What is hello" -> ["What","is","hello"]
filter(func,iterable) # takes in a function and an iterable (list/string/etc.) and
# filters it based on the function, which takes in one item at
# a time and returns True or False
lambda x: x.lower() not in stopwords # anonymous function that takes in a variable,
# converts it to lower case, and returns True if
# the word is not in the iterable stopwords
' '.join(iterable) # joins all items of the iterable (items must be strings/chars)
# using the string/char in front of the dot, i.e. ' ', as a joiner,
# i.e. ["What", "is","hello"] -> "What is hello"

Looking at the other answers to your question I noticed that they told you how to do what you are trying to do, but they did not answer the question you posed at the end.
If the input query is "What is Hello", I get the output as:
wht s llo
Why does this happen?
This happens because .replace() removes every occurrence of the exact substring you give it, even when that substring sits inside a longer word.
For example:
"My, my! Hello my friendly mystery".replace("my", "")
gives:
>>> "My, ! Hello  friendly stery"
.replace() is essentially splitting the string by the substring given as the first parameter and joining it back together with the second parameter.
"hello".replace("he", "je")
is logically similar to:
"je".join("hello".split("he"))
If you were still wanting to use .replace to remove whole words you might think adding a space before and after would be enough, but this leaves out words at the beginning and end of the string as well as punctuated versions of the substring.
"My, my! hello my friendly mystery".replace(" my ", " ")
>>> "My, my! hello friendly mystery"
"My, my! hello my friendly mystery".replace(" my", "")
>>> "My,! hello friendlystery"
"My, my! hello my friendly mystery".replace("my ", "")
>>> "My, my! hello friendly mystery"
Additionally, adding spaces before and after will not catch duplicates as it has already processed the first sub-string and will ignore it in favor of continuing on:
"hello my my friend".replace(" my ", " ")
>>> "hello my friend"
For these reasons your accepted answer by Robby Cornelissen is the recommended way to do what you are wanting.
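If you would rather stay close to the idea of replacing text, a regular expression with word boundaries is another option, since \b only matches at the edge of a word. This is a sketch under my own assumptions, not something taken from the other answers; the name remove_stopwords is hypothetical.

```python
import re


def remove_stopwords(query, stopwords):
    """Remove whole-word, case-insensitive matches of stopwords from query."""
    if not stopwords:
        return query
    # \b anchors each alternative at word boundaries, so "a" no longer
    # matches inside "What" and "he" no longer matches inside "hello".
    pattern = r'\b(?:' + '|'.join(map(re.escape, stopwords)) + r')\b'
    stripped = re.sub(pattern, '', query, flags=re.IGNORECASE)
    # Collapse the gaps left behind by the removed words.
    return ' '.join(stripped.split())


stopwords = ['what', 'who', 'is', 'a', 'at', 'is', 'he']
print(remove_stopwords('What is hello', stopwords))  # hello
```

Punctuation next to a removed word is left in place, so some cleanup may still be needed depending on the input.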

" ".join([x for x in query.split() if x not in stopwords])

stopwords=['for','or','to']
p='Asking for help, clarification, or responding to other answers.'
for i in stopwords:
    n=p.replace(i,'')
    p=n
print(p)
