Unexpected outcome when manipulating a string in Python

I am writing some code in Python, trying to convert a string to lower case and strip out its special characters.
string_salada_russa = ' !! LeTRas PeqUEnAS & GraNdeS'
clean_string = string_salada_russa.lower().strip()
print(clean_string)
i = 0
for c in clean_string:
    if(c.isalpha() == False and c != " "):
        clean_string = clean_string.replace(c, "").strip()
print(clean_string)
for c in clean_string:
    if(i >= 1 and i <= len(clean_string)-1):
        if(clean_string[i] == " " and clean_string[i-1] == " " and clean_string[i+1] == " "):
            clean_string = clean_string.replace(clean_string[i], "")
    i += 1
print(clean_string)
Expected outcome would be:
#original string
' !! LeTRas PeqUEnAS & GraNdeS'
#expected
'letras pequenas grandes'
#actual outcome
'letraspequenasgrandes'
I am trying to remove the extra spaces, but unsuccessfully: I end up removing ALL spaces.
Could anyone help me figure it out? What is wrong in my code?

How about using re?
import re
s = ' !! LeTRas PeqUEnAS & GraNdeS'
s = re.sub(r"[^a-zA-Z]+", " ", s.lower()).strip()
print(s) # letras pequenas grandes
This first converts the letters to lower case (lower), then replaces each run of non-alphabetical characters with a single blank (re.sub), and finally removes the blanks around the string (strip).
Btw, your code does not output 'letraspequenasgrandes'. Instead, it outputs 'letrasZpequenasZZZZZgrandes'.
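One caveat with [^a-zA-Z]: it also throws away accented letters, which c.isalpha() in the question would have kept. A minimal sketch of a Unicode-aware variant, assuming Python 3 str patterns (clean_unicode is a name made up for this example, not part of the answer above):

```python
import re

def clean_unicode(s):
    # [^\W\d_] means "a word character that is neither a digit nor an underscore",
    # i.e. roughly a letter, accented letters included.
    # findall returns the runs of letters; joining with one space also
    # collapses whatever separated them.
    return " ".join(re.findall(r"[^\W\d_]+", s.lower()))

print(clean_unicode(" !! LeTRas PeqUEnAS & GraNdeS"))
print(clean_unicode("Ação & Reação 42"))
```

For plain ASCII input this should give the same result as the re.sub version.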

You could get away with a combination of str.lower(), str.split(), str.join() and str.isalpha():
def clean(s):
    return ' '.join(x for x in s.lower().split(' ') if x.isalpha())
s = ' !! LeTRas PeqUEnAS & GraNdeS'
print(clean(s))
# letras pequenas grandes
Basically, you first convert to lower case and then split by ' '. After that you filter out the non-alpha tokens and join the rest back together.
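Note that this filter works on whole tokens, so a word with punctuation attached to it (say 'grandes!') is dropped entirely rather than cleaned. A small sketch comparing it with a character-level filter; clean_chars and the sample string are assumptions added for illustration:

```python
def clean_tokens(s):
    # Token-level filter, as above: a token survives only if it is all letters.
    return " ".join(x for x in s.lower().split(" ") if x.isalpha())

def clean_chars(s):
    # Character-level variant: every non-letter becomes a space,
    # then split()/join collapses the runs of spaces.
    kept = "".join(ch if ch.isalpha() else " " for ch in s.lower())
    return " ".join(kept.split())

sample = "LeTRas, PeqUEnAS & GraNdeS!"
print(clean_tokens(sample))  # only 'pequenas' is purely alphabetic
print(clean_chars(sample))
```

Which one you want depends on whether punctuation in your data is always separated from the words by spaces.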

There's no need to strip your string at each iteration of the first for loop; but, other than that, you could keep the first piece of your code:
for c in clean_string:
    if (c.isalpha() == False and c != " "):
        clean_string = clean_string.replace(c, "")
Then split your string, effectively removing all the spaces, and re-join the words back into a single string, with a single space between each word:
clean_string = " ".join(clean_string.split())
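As for why the original second loop wipes out every space: str.replace(old, new) replaces all occurrences of old, so clean_string.replace(clean_string[i], "") removes every space as soon as the condition is true once. A short sketch of that behaviour next to the split/join fix (sample and collapse_spaces are made-up names for this example):

```python
def collapse_spaces(text):
    # split() with no argument splits on runs of whitespace and drops
    # the empty pieces, so joining with " " leaves single spaces only.
    return " ".join(text.split())

sample = "letras   pequenas      grandes"

# sample[6] is a single space, but replace() removes every space in the string,
# not just the one at index 6.
print(sample.replace(sample[6], ""))

print(collapse_spaces(sample))
```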

Related

how to remove spaces in between of a string in python?

Suppose I have a string: ' Swarnendu Pal is a good boy '
Here I want to remove all the spaces between the words; the leading and trailing spaces should stay as they are, but every other space should be removed. My expected output is: ' SwarnenduPalisagoodboy '
Try this. It doesn't matter how many spaces you have at the start or end: the code keeps them and removes only the spaces in between.
s = " Swarnendu Pal is a good boy "
start_space, end_space = 0,0
for i in s:
    if i == " ":
        start_space += 1
    else:
        break
for i in s[::-1]:
    if i == " ":
        end_space += 1
    else:
        break
result = " " * start_space + s.replace(" ","") + " " * end_space
print(result)
# " SwarnenduPalisagoodboy " #### output
Hope this helps...
An attempt with a regular expression that removes each run of spaces that has a non-space character on both sides (I'm not very good at regular expressions yet, and there may be a better solution):
>>> import re
>>> re.sub(r'(?<=[^ ]) +(?=[^ ])', '', ' Swarnendu Pal is a good boy ')
' SwarnenduPalisagoodboy '
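The counting loops in the first answer can also be expressed with lstrip/rstrip. A rough sketch, assuming only the plain space character matters (squeeze_inner_spaces is a hypothetical name):

```python
def squeeze_inner_spaces(s):
    core = s.strip(" ")
    if not core:
        # Empty or all-space input: there is nothing "in between" to remove.
        return s
    # The length difference after stripping one side is the number of
    # spaces that were on that side.
    lead = len(s) - len(s.lstrip(" "))
    trail = len(s) - len(s.rstrip(" "))
    return " " * lead + core.replace(" ", "") + " " * trail

print(repr(squeeze_inner_spaces("  Swarnendu Pal is a good boy   ")))
```

The early return also covers the all-space string, which the loop version would otherwise count twice (once from each end).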

Python: Get single words from a string

I'm trying to make a string analyzer in Python. I'm starting with this input as an example:
toAnalyze= "Hello!!gyus-- lol\n"
and as output I want something like this:
>Output: ['Hello', '!!', 'gyus', '--', ' ', 'lol']
I want every group kept in its original order.
I thought of scanning all the characters in the original string until the "\n" character, and I came up with this solution:
toAnalyze= "Hello!!gyus-- lol\n"
final = ""
for char in toAnalyze:
    if char != " \n\t" and char != " " and char != "\n" and char != "\n\t":
        final += char
    elif char == " " or char == "\n" or char == "\n\t" or char == " \n\t":
        if not final.isalnum():
            word= ""
            thing = ""
            for l in final:
                if l.isalnum():
                    word += l
                else:
                    thing += l
            print("word: " + word)
            print("thing: " + thing )
And my current output is:
>Output: thing: !!-- word: Hellogyus lol
Do you have any idea?
The output wanted:
>Output: ['Hello', '!!', 'gyus', '--', ' ', 'lol']
Thanks in advance and have a nice day
I'm not a Python guy, but I want to help you get started. This is a working solution that you can try to improve so that it becomes more Pythonic:
toAnalyze= 'Hello!!gyus-- lol\n'
word = ''
separator = ''
tokens = []
for ch in toAnalyze:
    if ch.isalnum():
        word += ch
    # we met the first character of a separator, so save a word
    if not ch.isalnum() and word:
        tokens.append(word)
        word = ''
    # 1. we met the first alphanumeric after a separator, so save the separator or
    # 2. we met a new separator right after another one, also save the old separator
    if ch.isalnum() and separator or separator and separator[-1] != ch:
        tokens.append(separator)
        separator = ''
    if not ch.isalnum():
        separator += ch
# the separator still pending at the end (here the final '\n') is never saved
print(tokens)
The output for your example is:
['Hello', '!!', 'gyus', '--', ' ', 'lol']
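If you want something shorter, the same grouping can be sketched with itertools.groupby. This is only one possible take: tokenize is a made-up name, and stripping just the trailing newline is an assumption about what you want to ignore at the end of the input.

```python
from itertools import groupby

def tokenize(text):
    # Assumption: only the trailing newline should be ignored.
    text = text.rstrip("\n")
    # Alphanumeric characters all share the key "", so a whole word forms
    # one group; any other character is its own key, so only runs of the
    # same symbol ('!!', '--') stay together.
    key = lambda ch: "" if ch.isalnum() else ch
    return ["".join(group) for _, group in groupby(text, key=key)]

print(tokenize("Hello!!gyus-- lol\n"))
```

Unlike the loop above, this keeps a trailing separator other than the newline (for example a final '!!').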

Removing And Re-Inserting Spaces

What is the most efficient way to remove spaces from a text and then, after the necessary function has been applied, re-insert the previously removed spacing?
Take this example below, here is a program for encoding a simple railfence cipher:
from string import ascii_lowercase
string = "Hello World Today"
string = string.replace(" ", "").lower()
print(string[::2] + string[1::2])
This outputs the following:
hlooltdyelwrdoa
This is because it must remove the spacing prior to encoding the text. However, if I now want to re-insert the spacing to make it:
hlool tdyel wrdoa
What is the most efficient way of doing this?
As mentioned by one of the other commenters, you need to record where the spaces came from then add them back in
from string import ascii_lowercase
string = "Hello World Today"
# Get list of spaces
spaces = [i for i,x in enumerate(string) if x == ' ']
string = string.replace(" ", "").lower()
# Set string with ciphered text
ciphered = (string[::2] + string[1::2])
# Reinsert spaces
for space in spaces:
    ciphered = ciphered[:space] + ' ' + ciphered[space:]
print(ciphered)
You could use str.split to help you out. When you split on spaces, the lengths of the remaining segments will tell you where to split the processed string:
broken = string.split(' ')
sizes = list(map(len, broken))
You'll need the cumulative sum of the sizes:
from itertools import accumulate
cs = accumulate(sizes)
Now you can reinstate the spaces:
processed = ''.join(broken).lower()
processed = processed[::2] + processed[1::2]
# each chunk ends at a cumulative offset and is as long as the original word
chunks = [processed[end - size:end] for end, size in zip(cs, sizes)]
result = ' '.join(chunks)
This solution is not especially straightforward or efficient, but it does avoid explicit loops.
Using list and join operations:
random_string = "Hello World Today"
space_position = [pos for pos, char in enumerate(random_string) if char == ' ']
random_string = random_string.replace(" ", "").lower()
random_string = list(random_string[::2] + random_string[1::2])
for index in space_position:
    random_string.insert(index, ' ')
random_string = ''.join(random_string)
print(random_string)
I think this might help:
string = "Hello World Today"
nonSpaceyString = string.replace(" ", "").lower()
randomString = nonSpaceyString[::2] + nonSpaceyString[1::2]
spaceSet = [i for i, x in enumerate(string) if x == " "]
for index in spaceSet:
    randomString = randomString[:index] + " " + randomString[index:]
print(randomString)
string = "Hello World Today"
# get the indices of the spaces
index = [i for i in range(len(string)) if string[i]==' ']
# store the non-space characters
data = [i for i in string.lower() if i!=' ']
# apply the cipher described in the question
result = data[::2]+data[1::2]
# insert the spaces back at the positions they had in the original string
for i in index:
    result.insert(i, ' ')
# build the final string
solution = ''.join(result)
print(solution)
# output hlool tdyel wrdoa
You can make a new string with this small yet simple (kind of) code:
Note this doesn't use any libraries, which might make this slower, but less confusing.
def weird_string(string): # get input value
    spaceless = ''.join([c for c in string if c != ' ']) # get spaceless version
    skipped = spaceless[::2] + spaceless[1::2] # get new unique 'code'
    result = list(skipped) # get list of one letter strings
    for i in range(len(string)): # loop over the original string's indices
        if string[i] == ' ': # if a space 'was' here
            result.insert(i, ' ') # add the space back
    # end for
    s = ''.join(result) # join the results back
    return s # return the result
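Most of the answers above follow the same three steps: remember where the spaces were, transform the spaceless text, then insert the spaces back in ascending order. A generic sketch of that pattern, assuming the transform keeps the length of its input (transform_keeping_spaces and rail_fence are names invented for this example):

```python
def transform_keeping_spaces(text, transform):
    # 1. Remember the index of every space in the original text.
    space_positions = [i for i, ch in enumerate(text) if ch == " "]
    # 2. Transform the text without spaces (assumed to preserve its length).
    chars = list(transform(text.replace(" ", "")))
    # 3. Re-insert in ascending order, so each original index is still
    #    valid once the earlier spaces are back in place.
    for pos in space_positions:
        chars.insert(pos, " ")
    return "".join(chars)

def rail_fence(s):
    s = s.lower()
    return s[::2] + s[1::2]

print(transform_keeping_spaces("Hello World Today", rail_fence))  # hlool tdyel wrdoa
```

Working on a list and joining once at the end avoids rebuilding the string on every insertion, which is the main difference from the slicing-based versions.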

How to remove or strip off white spaces without using strip() function?

Write a function that accepts an input string consisting of alphabetic
characters and removes all the leading whitespace of the string and
returns it without using .strip(). For example if:
input_string = " Hello "
then your function should return a string such as:
output_string = "Hello "
The below is my program for removing white spaces without using strip:
def Leading_White_Space (input_str):
    length = len(input_str)
    i = 0
    while (length):
        if(input_str[i] == " "):
            input_str.remove()
        i =+ 1
        length -= 1
#Main Program
input_str = " Hello "
result = Leading_White_Space (input_str)
print (result)
I chose the remove function as it would be an easy way to get rid of the white spaces before the string 'Hello'. The task only asks to eliminate the white spaces before the actual string, but by my logic I suppose my code eliminates not only the leading but also the trailing white spaces. Any help would be appreciated.
You can loop over the characters of the string and stop when you reach a non-space one. Here is one solution :
def Leading_White_Space(input_str):
    for i, c in enumerate(input_str):
        if c != ' ':
            return input_str[i:]
    return ''  # empty or all-space input
Edit :
@PM 2Ring mentioned a good point. If you want to handle all types of whitespace (e.g. \t, \n, \r), you need to use isspace(), so a correct solution could be:
def Leading_White_Space(input_str):
    for i, c in enumerate(input_str):
        if not c.isspace():
            return input_str[i:]
    return ''  # empty or all-whitespace input
Here's another way to strip the leading whitespace, that actually strips all leading whitespace, not just the ' ' space char. There's no need to bother tracking the index of the characters in the string, we just need a flag to let us know when to stop checking for whitespace.
def my_lstrip(input_str):
    leading = True
    # Stays empty if the input is empty or all whitespace
    result = ''
    for ch in input_str:
        if leading:
            # All the chars read so far have been whitespace
            if not ch.isspace():
                # The leading whitespace is finished
                leading = False
                # Start saving chars
                result = ch
        else:
            # We're past the whitespace, copy everything
            result += ch
    return result
# test
input_str = " \n \t Hello "
result = my_lstrip(input_str)
print(repr(result))
output
'Hello '
There are various other ways to do this. Of course, in a real program you'd simply use the string .lstrip method, but here are a couple of cute ways to do it using an iterator:
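def my_lstrip(input_str):
    it = iter(input_str)
    for ch in it:
        if not ch.isspace():
            break
    else:
        # Empty or all-whitespace input: nothing is left
        return ''
    return ch + ''.join(it)
and
def my_lstrip(input_str):
    it = iter(input_str)
    # The '' default ends the loop cleanly when the input runs out
    ch = next(it, '')
    while ch.isspace():
        ch = next(it, '')
    return ch + ''.join(it)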
Use re.sub
>>> import re
>>> input_string = " Hello "
>>> re.sub(r'^\s+', '', input_string)
'Hello '
or
>>> def remove_space(s):
...     ind = len(s)  # if no non-space character is found, nothing is kept
...     for i,j in enumerate(s):
...         if j != ' ':
...             ind = i
...             break
...     return s[ind:]
...
>>> remove_space(input_string)
'Hello '
Just to be thorough and without using other modules, we can also specify which whitespace to remove (leading, trailing, both or all), including tab and new line characters. The code I used (which is, for obvious reasons, less compact than other answers) is as follows and makes use of slicing:
def no_ws(string,which='left'):
    """
    `which` takes the value 'left'/'right'/'both'/'all' to remove the relevant
    whitespace.
    """
    remove_chars = (' ','\n','\t')
    first_char = 0
    # slice end; len(string) means "no trailing whitespace to cut"
    last_char = len(string)
    if which in ['left','both']:
        for idx,letter in enumerate(string):
            if letter not in remove_chars:
                first_char = idx
                break
        else:
            # no other character found: everything is whitespace
            first_char = len(string)
        if which == 'left':
            return string[first_char:]
    if which in ['right','both']:
        for idx,letter in enumerate(string[::-1]):
            if letter not in remove_chars:
                last_char = len(string) - idx
                break
        else:
            last_char = 0
        return string[first_char:last_char]
    if which == 'all':
        return ''.join([s for s in string if s not in remove_chars])
You can use itertools.dropwhile to remove all of a particular set of characters from the start of your string, like this:
import itertools
def my_lstrip(input_str,remove=" \n\t"):
    return "".join( itertools.dropwhile(lambda x:x in remove,input_str))
To make it more flexible, I added an extra argument called remove, which holds the characters to strip from the start of the string and defaults to " \n\t". dropwhile then skips characters for as long as they are in remove; the check is done with a lambda (a compact way to write a short anonymous function).
Here are a few tests:
>>> my_lstrip(" \n \t Hello ")
'Hello '
>>> my_lstrip(" Hello ")
'Hello '
>>> my_lstrip(" \n \t Hello ")
'Hello '
>>> my_lstrip("--- Hello ","-")
' Hello '
>>> my_lstrip("--- Hello ","- ")
'Hello '
>>> my_lstrip("- - - Hello ","- ")
'Hello '
The previous function is equivalent to:
def my_lstrip(input_str,remove=" \n\t"):
    i=0
    for i,x in enumerate(input_str):
        if x not in remove:
            break
    else:
        # every character was in remove (or the string is empty)
        i = len(input_str)
    return input_str[i:]
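Since the goal is to reproduce what str.lstrip does, a cheap sanity check is to compare the hand-written version against the built-in on a few edge cases (empty string, only whitespace, no whitespace at all). A small sketch; the sample list is just an assumption about which cases are worth covering:

```python
import itertools

def my_lstrip(input_str, remove=" \n\t"):
    # Skip characters for as long as they belong to `remove`, keep the rest.
    return "".join(itertools.dropwhile(lambda x: x in remove, input_str))

samples = ["", "   ", " \n \t Hello ", "Hello", "--- Hello "]
for sample in samples:
    # str.lstrip(chars) strips any leading character found in chars,
    # which is the behaviour the function is meant to imitate.
    assert my_lstrip(sample) == sample.lstrip(" \n\t")
    assert my_lstrip(sample, "- ") == sample.lstrip("- ")
print("all samples match str.lstrip")
```

The same loop can be pointed at any of the other variants in this thread to see how they behave on empty or all-whitespace input.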

How to format my string

def main():
    print('Please enter a sentence without spaces and each word has ' + \
          'a capital letter.')
    sentence = input('Enter your sentence: ')
    for ch in sentence:
        if ch.isupper():
            capital = ch
            sentence = sentence.replace(capital, ' ' + capital)
main()
Ex: sentence = 'ExampleSentenceGoesHere'
I need this to print as: Example sentence goes here
as of right now, it prints as: Example Sentence Goes Here (with space at the beginning)
You can iterate over the string character by character and replace every upper case letter with a space and appropriate lower case letter:
>>> s = 'ExampleSentenceGoesHere'
>>> "".join(' ' + i.lower() if i.isupper() else i for i in s).strip().capitalize()
'Example sentence goes here'
Note that isupper() is what checks whether a character is upper case. Calling strip() and capitalize() takes care of the leading space and the first letter.
Also see relevant threads:
Elegant Python function to convert CamelCase to snake_case?
How to check if a character is upper-case in Python?
You need to convert each uppercase letter to a lowercase one using capital.lower(). You should also skip the first letter of the sentence so it stays capitalised and doesn't get a space in front of it. You can do this with a flag:
is_first_letter = True
for ch in sentence:
    if is_first_letter:
        is_first_letter = False
        continue
    if ch.isupper():
        capital = ch
        sentence = sentence.replace(capital, ' ' + capital.lower())
I'd probably use re and re.split("[A-Z]", text) but I'm assuming you can't do that because this looks like homework. How about:
def main():
    text = input(">>")
    newtext = ""
    for character in text:
        if character.isupper():
            ch = " " + character.lower()
        else:
            ch = character
        newtext += ch
    text = text[0]+newtext[2:]
You could also do:
transdict = {letter:" "+letter.lower() for letter in 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'}
transtable = str.maketrans(transdict)
text.translate(transtable).strip().capitalize()
But again I think that's outside the scope of the assignment
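For completeness, if regular expressions are allowed, a re.sub with a replacement function can do the whole thing in one pass. A minimal sketch, assuming the input is a CamelCase string that starts with a capital (split_camel is a made-up name):

```python
import re

def split_camel(sentence):
    # (?<!^) skips a capital at the very start of the string, so the first
    # word keeps its capital and gets no leading space.
    # Every other capital is replaced by a space plus its lowercase form.
    return re.sub(r"(?<!^)([A-Z])", lambda m: " " + m.group(1).lower(), sentence)

print(split_camel("ExampleSentenceGoesHere"))  # Example sentence goes here
```

Because each match is handled where it occurs, a capital that appears more than once in the sentence causes no trouble, unlike the repeated sentence.replace(...) calls.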
