Dealing with a string that consists of only space characters - python

Genre = str(input(prompt))
GenreStripped = Genre.strip()
Length = len(GenreStripped)
while Length > 0:
I'm trying to make a function that does not accept input consisting only of ' ' spaces. What are some other Python functions for dealing with ' ' as an input? The strip() function does not seem to strip ' '.

.strip() only removes leading and trailing whitespace, not spaces in between characters, so a string made up of nothing but spaces is stripped down to the empty string ''. You could use .replace(" ", "") instead if you wanted to remove every space.
Try running the following to get a sense of the differences between these methods:
print(" test test test ".replace(" ", "")) # 'testtesttest'
print(" test test test ".strip()) # 'test test test'
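For the original goal of rejecting input that consists only of spaces, one option is to check whether anything is left after stripping. This is a minimal sketch rather than the asker's code: is_blank and read_genre are hypothetical names, and the read parameter is an assumption added only so the loop can be driven without a keyboard.

```python
def is_blank(text):
    """Return True if text is empty or contains only whitespace."""
    return text.strip() == ""


def read_genre(prompt, read=input):
    """Keep asking until the answer contains a non-whitespace character.

    `read` defaults to the built-in input(); it is a parameter only so the
    function can be exercised with canned answers (an assumption of this sketch).
    """
    while True:
        genre = read(prompt).strip()
        if not is_blank(genre):
            return genre
        print("Please enter at least one non-space character.")
```

Calling read_genre("Genre: ") would keep prompting until something other than spaces is typed. str.isspace() is a related check, but it returns False for the empty string, so the strip-based test covers both cases.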

Related

how to remove the first comma of a print statement in python

I have been trying to remove the first two commas of a print statement and am having trouble doing this.
The print statement prints the elements of a set without the brackets (using *) and with commas separating the elements (using sep=", ").
The only problem is that the first two pieces of the statement (a cross mark and a sentence) also get separated by a comma. This is not desired (as shown in screenshot).
I would like to know how I can remove the comma after the cross mark and the comma after the colon.
FYI: '\033[91m' = red font colour, '\u274C' = cross mark, '\033[0m' = no color
My code is shown below.
print('\033[91m' + '\u274C', "Paragraphs contain unspecified font(s):" + '\033[0m', *invalid_font, sep=", ")
You can either use two separate print statements (end=' ' keeps a single space between the message and the fonts):
print('\033[91m\u274C '
      'Paragraphs contain unspecified font(s):'
      '\033[0m', end=' ')
print(*invalid_font, sep=", ")
or join the fonts with a comma and a space:
print('\033[91m\u274C '
      'Paragraphs contain unspecified font(s):'
      '\033[0m',
      ', '.join(map(str, invalid_font)))
Edit
As @S3DEV pointed out in the comments, it is unnecessary to map the invalid_font iterable to str if it is already an iterable of str. In this case, you just need
print('\033[91m\u274C '
      'Paragraphs contain unspecified font(s):'
      '\033[0m',
      ', '.join(invalid_font))
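The same idea can be wrapped in a small helper that builds the whole line first and prints it once. This is only a sketch: format_font_warning and the constant names are hypothetical, and sorting the fonts is an assumption made here so that the output of an unordered set is reproducible.

```python
RED = '\033[91m'    # red font colour
RESET = '\033[0m'   # back to the default colour
CROSS = '\u274C'    # cross mark


def format_font_warning(invalid_fonts):
    """Build the warning line; only the fonts are separated by ', '."""
    fonts = ', '.join(sorted(map(str, invalid_fonts)))
    return f'{RED}{CROSS} Paragraphs contain unspecified font(s):{RESET} {fonts}'


print(format_font_warning({'Arial', 'Comic Sans MS'}))
```

Because the separator is applied by join rather than by print's sep argument, the cross mark and the sentence are never separated by a comma.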

How to replace character ' " ' randomly repeated in a string with alternating characters ' “ ' and ' ” ' (python)?

I am trying to replace any normal straight quotations (") in a given string with sets of curved open and closed quotations (“ and ”). This would mean the first, third etc "s would be replaced with “, and the second, fourth etc "s would be replaced with ”.
I have tried finding the index of the first quote, creating a slice up to it, and replacing all " in that slice with “. I then created a slice from this new quote's index+1 to the end and replaced all " in it with ”. The thing is, I will not know the length of the string or the number of "s in it, so I need to figure out a way to loop some sort of system like this.
This only works properly for a string with two quotes:
def convert_quotes(text):
    '''(str) -> str
    Convert the straight quotation mark into open/close quotations.
    >>> convert_quotes('"Hello"')
    '“Hello”'
    >>> convert_quotes('"Hi" and "Hello"')
    '“Hi” and “Hello”'
    >>> convert_quotes('"')
    '“'
    >>> convert_quotes('"""')
    '“”“'
    >>> convert_quotes('" "o" "i" "')
    '“ ”o“ ”i“ ”'
    '''
    find=text.find('"')
    if find != -1:
        for i in text:
            #first convert first found " to “
            text1 = text[:find+1]
            replace1=text1.replace('"','“')
            text2 = text[find+1:]
            replace2=text2.replace('"','”')
            text=replace1+replace2
    return text
As seen in my docstring, '"Hello"' should become '“Hello”', but '" "o" "i" "' should become '“ ”o“ ”i“ ”'.
You might want to collect all locations with quotes, and then change the characters accordingly. This requires an intermediate list of characters (s_list below). Note that zip pairs the positions up, so an unpaired final quote is left unchanged:
import re
s = '"Hi" and "Hello"'
s_list = list(s)
quote_position = [p.start() for p in re.finditer('"', s)]
for po, pc in zip(quote_position[::2], quote_position[1::2]):
    s_list[po] = '“'
    s_list[pc] = '”'
s = "".join(s_list)
You can use the re.sub function.
I'll use parentheses for better readability; just replace them with your quotes.
import re
s = """
sdffsd"fsdfsdfdsf fdsf<s" fgdgdfgdf " gfdgdfgd" gdfgdfgdf"
bla re bla
dfsfds " fdsfsdf " fsdfsd "
and the final odd " is here
"""
def func(match): # function to be called for each sub() step
    return "(" + match.group()[1:-1] + ")"
rex = re.compile(r'"[^"]*"') # regular expression for a quoted string.
result = rex.sub(func, s) # substitute each match in s with func(match)
result = result.replace('"', '(') # take care of last remaining " if existing
print(result)
The output would be:
sdffsd(fsdfsdfdsf fdsf<s) fgdgdfgdf ( gfdgdfgd) gdfgdfgdf(
bla re bla
dfsfds ) fdsfsdf ( fsdfsd )
and the final odd ( is here
A second solution without using the re module:
s = """
sdffsd"fsdfsdfdsf fdsf<s" fgdgdfgdf " gfdgdfgd" gdfgdfgdf"
bla re bla
dfsfds " fdsfsdf " fsdfsd "
and the final odd " is here
"""
while True:
    if '"' not in s:
        break
    s = s.replace('"', '(', 1)
    s = s.replace('"', ')', 1)
print(s)
I didn't make any effort to make it efficient.
The focus was on being simple.
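For comparison, here is a single-pass sketch that satisfies every example in the question's docstring, including an odd number of quotes. It is not taken from any of the answers above; it simply walks the string and flips a flag each time a straight quote is seen.

```python
def convert_quotes(text):
    """Replace straight quotes with alternating opening/closing curly quotes."""
    result = []
    opening = True  # the next straight quote becomes an opening quote
    for ch in text:
        if ch == '"':
            result.append('“' if opening else '”')
            opening = not opening  # alternate for the following quote
        else:
            result.append(ch)
    return ''.join(result)


print(convert_quotes('"Hi" and "Hello"'))
```

Collecting the pieces in a list and joining once avoids rebuilding the string on every replacement.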

Regex does not identify '#' for removing '#' from words starting with '#'

How can I remove # from words in a string if it is the first character in a word? It should remain if it is present by itself, in the middle of a word, or at the end of a word.
Currently I am using the regex expression:
test = "# #DataScience"
test = re.sub(r'\b#\w\w*\b', '', test)
for removing the # from the words starting with #, but it does not work at all. It returns the string as it is.
Can anyone please tell me why the # is not being recognized and removed?
Examples -
test - "# #DataScience"
Expected Output - "# DataScience"
Test - "kjndjk#jnjkd"
Expected Output - "kjndjk#jnjkd"
Test - "# #DataScience #KJSBDKJ kjndjk#jnjkd #jkzcjkh# iusadhuish#"
Expected Output -"# DataScience KJSBDKJ kjndjk#jnjkd jkzcjkh# iusadhuish#"
import re
a = '# #DataScience'
b = 'kjndjk#jnjkd'
c = "# #DataScience #KJSBDKJ kjndjk#jnjkd #jkzcjkh# iusadhuish#"
# whitespace, then '#', then a non-whitespace character; keep groups 1 and 2, drop the '#'
regex = r'(\s+)#(\S)'
print(re.sub(regex, r'\1\2', a))
print(re.sub(regex, r'\1\2', b))
print(re.sub(regex, r'\1\2', c))
You can split your string on the space character ' ' to get a list of all the words in it. Then loop over that list, check each word against your condition, and remove the hash if necessary. After that, join the list with ' ' to rebuild the string and return it.
import re
def remove_hash(text):
    words = text.split(' ') # split the string into a list of words
    without_hash = [] # words collected after removing the hash
    for word in words:
        if re.match('^#[a-zA-Z]+', word) is not None: # the word starts with '#' and has letters after it
            without_hash.append(word[1:]) # if true, drop the hash and append the rest to the other list
        else:
            without_hash.append(word) # otherwise append the word as is
    return ' '.join(without_hash) # join the new list (without hashes) with spaces and return it
Output:
>>> remove_hash('# #DataScience')
'# DataScience'
>>> remove_hash('kjndjk#jnjkd')
'kjndjk#jnjkd'
>>> remove_hash("# #DataScience #KJSBDKJ kjndjk#jnjkd #jkzcjkh# iusadhuish#")
'# DataScience KJSBDKJ kjndjk#jnjkd jkzcjkh# iusadhuish#'
You can make your code shorter (but a bit harder to understand) by avoiding the if/else like this:
def remove_hash(text):
    words = text.split(' ')
    without_hash = []
    for word in words:
        without_hash.append(re.sub(r'^#+(.+)', r'\1', word))
    return ' '.join(without_hash)
This will get you the same results.
Do give the following pattern a try. It looks for a sequence of '#'s and whitespace located at the beginning of the string and substitutes '# ' for it. Note that it only touches that leading run, so a '#' in front of a later word is left in place.
import re
test = "# #DataScience"
test = re.sub(r'(^[#\s]+)', '# ', test)
>>> test
'# DataScience'
You can play with the pattern further here: https://regex101.com/r/6hfw4t/1
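As to why the pattern in the question never matches: \b only matches between a word character and a non-word character, and both '#' and the space (or the start of the string) in front of it are non-word characters, so there is no boundary before a leading '#'. One way around this, sketched below with lookarounds instead of \b, is to require that the '#' is not preceded by a non-whitespace character and is followed by a word character. The names LEADING_HASH and strip_leading_hash are hypothetical.

```python
import re

# '#' that starts a word: nothing but whitespace (or the string start) before it,
# and a word character right after it, so a lone '#' is left alone.
LEADING_HASH = re.compile(r'(?<!\S)#(?=\w)')


def strip_leading_hash(text):
    """Remove '#' only where it is the first character of a word."""
    return LEADING_HASH.sub('', text)


print(strip_leading_hash("# #DataScience #KJSBDKJ kjndjk#jnjkd #jkzcjkh# iusadhuish#"))
```

Unlike the split-and-join approach, this leaves the original whitespace between words untouched.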

strip and split: how to strip the list

my code:
readfile = open("{}".format(file), "r")
lines = readfile.read().lower().split()
elements = """,.:;|!@#$%^&*"\()`_+=[]{}<>?/~"""
for char in elements:
    lines = lines.replace(char, '')
This works and removes the special characters, but I need help with stripping "-" and " ' ".
So for example " safety-dance " would be okay but not " -hi- ", and " i'll " is okay but not " 'hi ".
I need to strip only the beginning and the ending.
It's not a string, it is a list.
How do I do this?
Maybe you can try string.punctuation and strip:
import string
my_string_list = ["-hello-", "safety-dance", "'hi", "I'll", "-hello"]
result = [item.strip(string.punctuation) for item in my_string_list]
print(result)
Result:
['hello', 'safety-dance', 'hi', "I'll", 'hello']
First, using str.replace in a loop is inefficient. Since strings are immutable, you create a new string on each iteration. You can use str.translate to remove the unwanted characters in a single pass.
As for removing a dash only when it is a boundary character, this is exactly what str.strip does.
It also seems the characters you want to remove correspond to string.punctuation, with a special case for '-'.
from string import punctuation
def remove_special_character(s):
    # delete every punctuation character except '-'
    translation = str.maketrans('', '', punctuation.replace('-', ''))
    # strip '-' from both ends of each word, then drop the remaining punctuation
    return ' '.join([w.strip('-') for w in s.split()]).translate(translation)
polluted_string = '-This $string contain%s ill-desired characters!'
clean_string = remove_special_character(polluted_string)
print(clean_string)
# prints: 'This string contains ill-desired characters'
If you want to apply this to multiple lines, you can do it with a list-comprehension.
lines = [remove_special_character(line) for line in lines]
Finally, to read a file you should be using a with statement.
with open(file, "r") as f:
    lines = [remove_special_character(line) for line in f]
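The question asks for both '-' and the apostrophe to survive inside a word and be removed only at its edges, whereas the translation table above deletes apostrophes everywhere. The following variation is a sketch under that reading of the requirement; clean_words, REMOVE_EVERYWHERE and TABLE are hypothetical names.

```python
import string

# Punctuation deleted wherever it occurs; '-' and "'" are handled separately.
REMOVE_EVERYWHERE = string.punctuation.replace('-', '').replace("'", '')
TABLE = str.maketrans('', '', REMOVE_EVERYWHERE)


def clean_words(text):
    """Lower-case and split text, keeping '-' and "'" only inside words."""
    words = []
    for word in text.lower().split():
        word = word.translate(TABLE).strip("-'")  # edges only, interior kept
        if word:  # skip tokens that were nothing but punctuation
            words.append(word)
    return words


print(clean_words("Safety-dance -hi- I'll 'hi !!!"))
```

Applied to a file, this could be called once on the whole contents or once per line, depending on whether a flat word list or a per-line list is wanted.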

Having trouble adding a space after a period in a python string

I have to write code to do 2 things:
Compress more than one occurrence of the space character into one.
Add a space after a period, if there isn't one.
For example:
input> This is weird.Indeed
output>This is weird. Indeed.
This is the code I wrote:
def correction(string):
    list=[]
    for i in string:
        if i!=" ":
            list.append(i)
        elif i==" ":
            k=i+1
            if k==" ":
                k=""
            list.append(i)
    s=' '.join(list)
    return s
strn=input("Enter the string: ").split()
print (correction(strn))
This code takes any input from the user and removes all the extra spaces, but it's not adding the space after the period. (I know why not: because of the split function, it takes the period and the next word together as one word. I just can't figure out how to fix it.)
This is a code I found online:
import re
def correction2(string):
    corstr = re.sub('\ +',' ',string)
    final = re.sub('\.','. ',corstr)
    return final
strn= ("This is as .Indeed")
print (correction2(strn))
The problem with this code is I can't take any input from the user. It is predefined in the program.
So can anyone suggest how to improve any of the two codes to do both the functions on ANY input by the user?
Is this what you desire?
import re
def corr(s):
    return re.sub(r'\.(?! )', '. ', re.sub(r' +', ' ', s))
s = input("> ")
print(corr(s))
I've changed the regex to a lookahead pattern, take a look here.
Edit: explain Regex as requested in comment
re.sub() takes (at least) three arguments: The Regex search pattern, the replacement the matched pattern should be replaced with, and the string in which the replacement should be done.
What I'm doing here is two steps at once: I'm using the output of one function call as the input of another.
First, the inner re.sub(r' +', ' ', s) searches for runs of one or more spaces (r' +') in s and replaces each run with a single space. Then the outer re.sub(r'\.(?! )', '. ', ...) looks for periods that are not followed by a space character and replaces them with '. '. I'm using a negative lookahead pattern, which matches only where the specified lookahead pattern (a normal space character in this case) does not follow. You may want to play around with this pattern; that may help in understanding it better.
The r string prefix makes the string a raw string, in which backslashes are not treated as escape characters. The patterns here would still match without it, although recent Python versions warn about the unrecognized '\.' escape in a normal string, so it's a habit of mine to use raw strings with regular expressions.
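To see what the negative lookahead does at the end of a string, it can be compared with a positive lookahead that requires a non-space character after the period. This is only an illustrative sketch; both function names are hypothetical and the second pattern is an alternative, not part of the answer above.

```python
import re


def add_space_after_period(text):
    """Negative lookahead: match every period that is not followed by a space."""
    return re.sub(r'\.(?! )', '. ', text)


def add_space_before_next_word(text):
    """Positive lookahead: match only periods followed by a non-space character."""
    return re.sub(r'\.(?=\S)', '. ', text)


sample = 'This is weird.Indeed.'
print(repr(add_space_after_period(sample)))      # 'This is weird. Indeed. '
print(repr(add_space_before_next_word(sample)))  # 'This is weird. Indeed.'
```

The negative lookahead also matches a period at the very end of the string, so it appends a trailing space there; the positive lookahead does not. Both variants would also split something like 3.14, so neither is safe for text that contains decimal numbers.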
For a more basic answer, without regex:
>>> def remove_doublespace(string):
...     if '  ' not in string:  # '  ' is two spaces
...         return string
...     return remove_doublespace(string.replace('  ', ' '))
...
>>> remove_doublespace('hi there how are you.i am fine. '.replace('.', '. '))
'hi there how are you. i am fine. '
You can try the following code:
>>> import re
>>> s = 'This is weird.Indeed'
>>> def correction(s):
...     res = re.sub(r'\s+$', '', re.sub(r'\s+', ' ', re.sub(r'\.', '. ', s)))
...     if res[-1] != '.':
...         res += '.'
...     return res
...
>>> print(correction(s))
This is weird. Indeed.
>>> s = input()
hee ss.dk
>>> s
'hee ss.dk'
>>> correction(s)
'hee ss. dk.'
