In Python, I have a lot of strings, containing spaces.
I would like to clear all spaces from the text, except if it is in quotation marks.
Example input:
This is "an example text" containing spaces.
And I want to get:
Thisis"an example text"containingspaces.
line.split() is not good, I think, because it clears all of spaces from the text.
What do you recommend?
For the simple case that only " are used as quotes:
>>> import re
>>> s = 'This is "an example text" containing spaces.'
>>> re.sub(r' (?=(?:[^"]*"[^"]*")*[^"]*$)', "", s)
'Thisis"an example text"containingspaces.'
Explanation:
[ ] # Match a space
(?= # only if an even number of spaces follows --> lookahead
(?: # This is true when the following can be matched:
[^"]*" # Any number of non-quote characters, then a quote, then
[^"]*" # the same thing again to get an even number of quotes.
)* # Repeat zero or more times.
[^"]* # Match any remaining non-quote characters
$ # and then the end of the string.
) # End of lookahead.
There is probably a more elegant solution than this, but:
>>> test = "This is \"an example text\" containing spaces."
>>> '"'.join([x if i % 2 else "".join(x.split())
for i, x in enumerate(test.split('"'))])
'Thisis"an example text"containingspaces.'
We split the text on quotes, then iterate through them in a list comprehension. We remove the spaces by splitting and rejoining if the index is odd (not inside quotes), and don't if it is even (inside quotes). We then rejoin the whole thing with quotes.
Using re.findall is probably the more easily understood/flexible method:
>>> s = 'This is "an example text" containing spaces.'
>>> ''.join(re.findall(r'(?:".*?")|(?:\S+)', s))
'Thisis"an example text"containingspaces.'
You could (ab)use the csv.reader:
>>> import csv
>>> ''.join(next(csv.reader([s.replace('"', '"""')], delimiter=' ')))
'Thisis"an example text"containingspaces.'
Or using re.split:
>>> ''.join(filter(None, re.split(r'(?:\s*(".*?")\s*)|[ ]', s)))
'Thisis"an example text"containingspaces.'
Use regular expressions!
import cStringIO, re
result = cStringIO.StringIO()
regex = re.compile('("[^"]*")')
text = 'This is "an example text" containing spaces.'
for part in regex.split(text):
if part and part[0] == '"':
result.write(part)
else:
result.write(part.replace(" ", ""))
return result.getvalue()
You can do this with csv as well:
import csv
out=[]
for e in csv.reader('This is "an example text" containing spaces. '):
e=''.join(e)
if e==' ': continue
if ' ' in e: out.extend('"'+e+'"')
else: out.extend(e)
print ''.join(out)
Prints Thisis"an example text"containingspaces.
'"'.join(v if i%2 else v.replace(' ', '') for i, v in enumerate(line.split('"')))
quotation_mark = '"'
space = " "
example = 'foo choo boo "blaee blahhh" didneid ei did '
formated_example = ''
if example[0] == quotation_mark:
inside_quotes = True
else:
inside_quotes = False
for character in example:
if inside_quotes != True:
formated_example += character
else:
if character != space:
formated_example += character
if character == quotation_mark:
if inside_quotes == True:
inside_quotes = False
else:
inside_quotes = True
print formated_example
Related
I have a python challenge that if given a string with '_' or '-' in between each word such as the_big_red_apple or the-big-red-apple to convert it to camel case. Also if the first word is uppercase keep it as uppercase. This is my code. Im not allowed to use the re library in the challenge however but I didn't know how else to do it.
from re import sub
def to_camel_case(text):
if text[0].isupper():
text = sub(r"(_|-)+"," ", text).title().replace(" ", "")
else:
text = sub(r"(_|-)+"," ", text).title().replace(" ", "")
text = text[0].lower() + text[1:]
return print(text)
Word delimiters can be - dash or _ underscore.
Let's simplify, making them all underscores:
text = text.replace('-', '_')
Now we can break out words:
words = text.split('_')
With that in hand it's simple to put them back together:
text = ''.join(map(str.capitalize, words))
or more verbosely, with a generator expression,
assign ''.join(word.capitalize() for word in words).
I leave "finesse the 1st character"
as an exercise to the reader.
If you RTFM you'll find it contains a wealth of knowledge.
https://docs.python.org/3/library/re.html#raw-string-notation
'+'
Causes the resulting RE to match 1 or more repetitions of the preceding RE. ab+ will match ‘a’ followed by any non-zero number of ‘b’s
The effect of + is turn both
db_rows_read and
db__rows_read
into DbRowsRead.
Also,
Raw string notation (r"text") keeps regular expressions sane.
The regex in your question doesn't exactly
need a raw string, as it has no crazy
punctuation like \ backwhacks.
But it's a very good habit to always put
a regex in an r-string, Just In Case.
You never know when code maintenance
will tack on additional elements,
and who wants a subtle regex bug on their hands?
You can try it like this :
def to_camel_case(text):
s = text.replace("-", " ").replace("_", " ")
s = s.split()
if len(text) == 0:
return text
return s[0] + ''.join(i.capitalize() for i in s[1:])
print(to_camel_case('momo_es-es'))
the output of print(to_camel_case('momo_es-es')) is momoEsEs
r"..." refers to Raw String in Python which simply means treating backlash \ as literal instead of escape character.
And (_|-)[+] is a Regular Expression that match the string containing one or more - or _ characters.
(_|-) means matching the string that contains - or _.
+ means matching the above character (- or _) than occur one or more times in the string.
In case you cannot use re library for this solution:
def to_camel_case(text):
# Since delimiters can have 2 possible answers, let's simplify it to one.
# In this case, I replace all `_` characters with `-`, to make sure we have only one delimiter.
text = text.replace("_", "-") # the_big-red_apple => the-big-red-apple
# Next, we should split our text into words in order for us to iterate through and modify it later.
words = text.split("-") # the-big-red-apple => ["the", "big", "red", "apple"]
# Now, for each word (except the first word) we have to turn its first character to uppercase.
for i in range(1, len(words)):
# `i`start from 1, which means the first word IS NOT INCLUDED in this loop.
word = words[i]
# word[1:] means the rest of the characters except the first one
# (e.g. w = "apple" => w[1:] = "pple")
words[i] = word[0].upper() + word[1:].lower()
# you can also use Python built-in method for this:
# words[i] = word.capitalize()
# After this loop, ["the", "big", "red", "apple"] => ["the", "Big", "Red", "Apple"]
# Finally, we put the words back together and return it
# ["the", "Big", "Red", "Apple"] => theBigRedApple
return "".join(words)
print(to_camel_case("the_big-red_apple"))
Try this:
First, replace all the delimiters into a single one, i.e. str.replace('_', '-')
Split the string on the str.split('-') standardized delimiter
Capitalize each string in list, i.e. str.capitilize()
Join the capitalize string with str.join
>>> s = "the_big_red_apple"
>>> s.replace('_', '-').split('-')
['the', 'big', 'red', 'apple']
>>> ''.join(map(str.capitalize, s.replace('_', '-').split('-')))
'TheBigRedApple'
>> ''.join(word.capitalize() for word in s.replace('_', '-').split('-'))
'TheBigRedApple'
If you need to lowercase the first char, then:
>>> camel_mile = lambda x: x[0].lower() + x[1:]
>>> s = 'TheBigRedApple'
>>> camel_mile(s)
'theBigRedApple'
Alternative,
First replace all delimiters to space str.replace('_', ' ')
Titlecase the string str.title()
Remove space from string, i.e. str.replace(' ', '')
>>> s = "the_big_red_apple"
>>> s.replace('_', ' ').title().replace(' ', '')
'TheBigRedApple'
Another alternative,
Iterate through the characters and then keep a pointer/note on previous character, i.e. for prev, curr in zip(s, s[1:])
check if the previous character is one of your delimiter, if so, uppercase the current character, i.e. curr.upper() if prev in ['-', '_'] else curr
skip whitepace characters, i.e. if curr != " "
Then add the first character in lowercase, [s[0].lower()]
>>> chars = [s[0].lower()] + [curr.upper() if prev in ['-', '_'] else curr for prev, curr in zip(s, s[1:]) if curr != " "]
>>> "".join(chars)
'theBigRedApple'
Yet another alternative,
Replace/Normalize all delimiters into a single one, s.replace('-', '_')
Convert it into a list of chars, list(s.replace('-', '_'))
While there is still '_' in the list of chars, keep
find the position of the next '_'
replacing the character after '_' with its uppercase
replacing the '_' with ''
>>> s = 'the_big_red_apple'
>>> s_list = list(s.replace('-', '_'))
>>> while '_' in s_list:
... where_underscore = s_list.index('_')
... s_list[where_underscore+1] = s_list[where_underscore+1].upper()
... s_list[where_underscore] = ""
...
>>> "".join(s_list)
'theBigRedApple'
or
>>> s = 'the_big_red_apple'
>>> s_list = list(s.replace('-', '_'))
>>> while '_' in s_list:
... where_underscore = s_list.index('_')
... s_list[where_underscore:where_underscore+2] = ["", s_list[where_underscore+1].upper()]
...
>>> "".join(s_list)
'theBigRedApple'
Note: Why do we need to convert the string to list of chars? Cos strings are immutable, 'str' object does not support item assignment
BTW, the regex solution can make use of some group catching, e.g.
>>> import re
>>> s = "the_big_red_apple"
>>> upper_regex_group = lambda x: x.group(1).upper()
>>> re.sub("[_|-](\w)", upper_regex_group, s)
'theBigRedApple'
>>> re.sub("[_|-](\w)", lambda x: x.group(1).upper(), s)
'theBigRedApple'
I tried matching words including the letter "ab" or "ba" e.g. "ab"olition, f"ab"rics, pro"ba"ble. I came up with the following regular expression:
r"[Aa](?=[Bb])[Bb]|[Bb](?=[Aa])[Aa]"
But it includes words that start or end with ", (, ), / ....non-alphanumeric characters. How can I erase it? I just want to match words list.
import sys
import re
word=[]
dict={}
f = open('C:/Python27/brown_half.txt', 'rU')
w = open('C:/Python27/brown_halfout.txt', 'w')
data = f.read()
word = data.split() # word is list
f.close()
for num2 in word:
match2 = re.findall("\w*(ab|ba)\w*", num2)
if match2:
dict[num2] = (dict[num2] + 1) if num2 in dict.keys() else 1
for key2 in sorted(dict.iterkeys()):print "%s: %s" % (key2, dict[key2])
print len(dict.keys())
Here, I don't know how to mix it up with "re.compile~~" method that 1st comment said...
To match all the words with ab or ba (case insensitive):
import re
text = 'fabh, obar! (Abtt) yybA, kk'
pattern = re.compile(r"(\w*(ab|ba)\w*)", re.IGNORECASE)
# to print all the matches
for match in pattern.finditer(text):
print match.group(0)
# to print the first match
print pattern.search(text).group(0)
https://regex101.com/r/uH3xM9/1
Regular expressions are not the best tool for the job in this case. They'll complicate stuff way too much for such simple circumstances. You can instead use Python's builtin in operator (works for both Python 2 and 3)...
sentence = "There are no probable situations whereby that may happen, or so it seems since the Abolition."
words = [''.join(filter(lambda x: x.isalpha(), token)) for token in sentence.split()]
for word in words:
word = word.lower()
if 'ab' in word or 'ba' in word:
print('Word "{}" matches pattern!'.format(word))
As you can see, 'ab' in word evaluates to True if the string 'ab' is found as-is (that is, exactly) in word, or False otherwise. For example 'ba' in 'probable' == True and 'ab' in 'Abolition' == False. The second line takes take of dividing the sentence in words and taking out any punctuation character. word = word.lower() makes word lowercase before the comparisons, so that for word = 'Abolition', 'ab' in word == True.
I would do it this way:
Strip your string from unwanted chars using the below two
techniques, your choice:
a - By building a translation dictionary and using translate method:
>>> import string
>>> del_punc = dict.fromkeys(ord(c) for c in string.punctuation)
s = 'abolition, fabrics, probable, test, case, bank;, halfback 1(ablution).'
>>> s = s.translate(del_punc)
>>> print(s)
'abolition fabrics probable test case bank halfback 1ablution'
b - using re.sub method:
>>> import string
>>> import re
>>> s = 'abolition, fabrics, probable, test, case, bank;, halfback 1(ablution).'
>>> s = re.sub(r'[%s]'%string.punctuation, '', s)
>>> print(s)
'abolition fabrics probable test case bank halfback 1ablution'
Next will be finding your words containing 'ab' or 'ba':
a - Splitting over whitespaces and finding occurrences of your desired strings, which is the one I recommend to you:
>>> [x for x in s.split() if 'ab' in x.lower() or 'ba' in x.lower()]
['abolition', 'fabrics', 'probable', 'bank', 'halfback', '1ablution']
b -Using re.finditer method:
>>> pat
re.compile('\\b.*?(ab|ba).*?\\b', re.IGNORECASE)
>>> for m in pat.finditer(s):
print(m.group())
abolition
fabrics
probable
test case bank
halfback
1ablution
string = "your string here"
lowercase = string.lower()
if 'ab' in lowercase or 'ba' in lowercase:
print(true)
else:
print(false)
Try this one
[(),/]*([a-z]|(ba|ab))+[(),/]*
Write a function that accepts an input string consisting of alphabetic
characters and removes all the leading whitespace of the string and
returns it without using .strip(). For example if:
input_string = " Hello "
then your function should return a string such as:
output_string = "Hello "
The below is my program for removing white spaces without using strip:
def Leading_White_Space (input_str):
length = len(input_str)
i = 0
while (length):
if(input_str[i] == " "):
input_str.remove()
i =+ 1
length -= 1
#Main Program
input_str = " Hello "
result = Leading_White_Space (input_str)
print (result)
I chose the remove function as it would be easy to get rid off the white spaces before the string 'Hello'. Also the program tells to just eliminate the white spaces before the actual string. By my logic I suppose it not only eliminates the leading but trailing white spaces too. Any help would be appreciated.
You can loop over the characters of the string and stop when you reach a non-space one. Here is one solution :
def Leading_White_Space(input_str):
for i, c in enumerate(input_str):
if c != ' ':
return input_str[i:]
Edit :
#PM 2Ring mentionned a good point. If you want to handle all types of types of whitespaces (e.g \t,\n,\r), you need to use isspace(), so a correct solution could be :
def Leading_White_Space(input_str):
for i, c in enumerate(input_str):
if not c.isspace():
return input_str[i:]
Here's another way to strip the leading whitespace, that actually strips all leading whitespace, not just the ' ' space char. There's no need to bother tracking the index of the characters in the string, we just need a flag to let us know when to stop checking for whitespace.
def my_lstrip(input_str):
leading = True
for ch in input_str:
if leading:
# All the chars read so far have been whitespace
if not ch.isspace():
# The leading whitespace is finished
leading = False
# Start saving chars
result = ch
else:
# We're past the whitespace, copy everything
result += ch
return result
# test
input_str = " \n \t Hello "
result = my_lstrip(input_str)
print(repr(result))
output
'Hello '
There are various other ways to do this. Of course, in a real program you'd simply use the string .lstrip method, but here are a couple of cute ways to do it using an iterator:
def my_lstrip(input_str):
it = iter(input_str)
for ch in it:
if not ch.isspace():
break
return ch + ''.join(it)
and
def my_lstrip(input_str):
it = iter(input_str)
ch = next(it)
while ch.isspace():
ch = next(it)
return ch + ''.join(it)
Use re.sub
>>> input_string = " Hello "
>>> re.sub(r'^\s+', '', input_string)
'Hello '
or
>>> def remove_space(s):
ind = 0
for i,j in enumerate(s):
if j != ' ':
ind = i
break
return s[ind:]
>>> remove_space(input_string)
'Hello '
>>>
Just to be thorough and without using other modules, we can also specify which whitespace to remove (leading, trailing, both or all), including tab and new line characters. The code I used (which is, for obvious reasons, less compact than other answers) is as follows and makes use of slicing:
def no_ws(string,which='left'):
"""
Which takes the value of 'left'/'right'/'both'/'all' to remove relevant
whitespace.
"""
remove_chars = (' ','\n','\t')
first_char = 0; last_char = 0
if which in ['left','both']:
for idx,letter in enumerate(string):
if not first_char and letter not in remove_chars:
first_char = idx
break
if which == 'left':
return string[first_char:]
if which in ['right','both']:
for idx,letter in enumerate(string[::-1]):
if not last_char and letter not in remove_chars:
last_char = -(idx + 1)
break
return string[first_char:last_char+1]
if which == 'all':
return ''.join([s for s in string if s not in remove_chars])
you can use itertools.dropwhile to remove all particualar characters from the start of you string like this
import itertools
def my_lstrip(input_str,remove=" \n\t"):
return "".join( itertools.dropwhile(lambda x:x in remove,input_str))
to make it more flexible, I add an additional argument called remove, they represent the characters to remove from the string, with a default value of " \n\t", then with dropwhile it will ignore all characters that are in remove, to check this I use a lambda function (that is a practical form of write short anonymous functions)
here a few tests
>>> my_lstrip(" \n \t Hello ")
'Hello '
>>> my_lstrip(" Hello ")
'Hello '
>>> my_lstrip(" \n \t Hello ")
'Hello '
>>> my_lstrip("--- Hello ","-")
' Hello '
>>> my_lstrip("--- Hello ","- ")
'Hello '
>>> my_lstrip("- - - Hello ","- ")
'Hello '
>>>
the previous function is equivalent to
def my_lstrip(input_str,remove=" \n\t"):
i=0
for i,x in enumerate(input_str):
if x not in remove:
break
return input_str[i:]
I am using python to go through a file and remove any comments. A comment is defined as a hash and anything to the right of it as long as the hash isn't inside double quotes. I currently have a solution, but it seems sub-optimal:
filelines = []
r = re.compile('(".*?")')
for line in f:
m = r.split(line)
nline = ''
for token in m:
if token.find('#') != -1 and token[0] != '"':
nline += token[:token.find('#')]
break
else:
nline += token
filelines.append(nline)
Is there a way to find the first hash not within quotes without for loops (i.e. through regular expressions?)
Examples:
' "Phone #":"555-1234" ' -> ' "Phone #":"555-1234" '
' "Phone "#:"555-1234" ' -> ' "Phone "'
'#"Phone #":"555-1234" ' -> ''
' "Phone #":"555-1234" #Comment' -> ' "Phone #":"555-1234" '
Edit: Here is a pure regex solution created by user2357112. I tested it, and it works great:
filelines = []
r = re.compile('(?:"[^"]*"|[^"#])*(#)')
for line in f:
m = r.match(line)
if m != None:
filelines.append(line[:m.start(1)])
else:
filelines.append(line)
See his reply for more details on how this regex works.
Edit2: Here's a version of user2357112's code that I modified to account for escape characters (\"). This code also eliminates the 'if' by including a check for end of string ($):
filelines = []
r = re.compile(r'(?:"(?:[^"\\]|\\.)*"|[^"#])*(#|$)')
for line in f:
m = r.match(line)
filelines.append(line[:m.start(1)])
r'''(?: # Non-capturing group
"[^"]*" # A quote, followed by not-quotes, followed by a quote
| # or
[^"#] # not a quote or a hash
) # end group
* # Match quoted strings and not-quote-not-hash characters until...
(#) # the comment begins!
'''
This is a verbose regex, designed to operate on a single line, so make sure to use the re.VERBOSE flag and feed it one line at a time. It'll capture the first unquoted hash as group 1 if there is one, so you can use match.start(1) to get the index. It doesn't handle backslash escapes, if you want to be able to put a backslash-escaped quote in a string. This is untested.
You can remove comments using this script:
import re
print re.sub(r'(?s)("[^"\\]*(?:\\.[^"\\]*)*")|#[^\n]*', lambda m: m.group(1) or '', '"Phone #"#:"555-1234"')
The idea is to capture first parts enclosed in double-quotes and to replace them by themself before searching a sharp:
(?s) # the dot matches newlines too
( # open the capture group 1
" # "
[^"\\]* # all characters except a quote or a backslash
# zero or more times
(?: # open a non-capturing group
\\. # a backslash and any character
[^"\\]* #
)* # repeat zero or more times
" # "
) # close the capture group 1
| # OR
#[^\n]* # a sharp and zero or one characters that are not a newline.
This code was so ugly, I had to post it.
def remove_comments(text):
char_list = list(text)
in_str = False
deleting = False
for i, c in enumerate(char_list):
if deleting:
if c == '\n':
deleting = False
else:
char_list[i] = None
elif c == '"':
in_str = not in_str
elif c == '#':
if not in_str:
deleting = True
char_list[i] = None
char_list = filter(lambda x: x is not None, char_list)
return ''.join(char_list)
Seems to work though. Although I'm not sure how it might handle newline chars between windows and linux.
How do I remove leading and trailing whitespace from a string in Python?
" Hello world " --> "Hello world"
" Hello world" --> "Hello world"
"Hello world " --> "Hello world"
"Hello world" --> "Hello world"
To remove all whitespace surrounding a string, use .strip(). Examples:
>>> ' Hello '.strip()
'Hello'
>>> ' Hello'.strip()
'Hello'
>>> 'Bob has a cat'.strip()
'Bob has a cat'
>>> ' Hello '.strip() # ALL consecutive spaces at both ends removed
'Hello'
Note that str.strip() removes all whitespace characters, including tabs and newlines. To remove only spaces, specify the specific character to remove as an argument to strip:
>>> " Hello\n ".strip(" ")
'Hello\n'
To remove only one space at most:
def strip_one_space(s):
if s.endswith(" "): s = s[:-1]
if s.startswith(" "): s = s[1:]
return s
>>> strip_one_space(" Hello ")
' Hello'
As pointed out in answers above
my_string.strip()
will remove all the leading and trailing whitespace characters such as \n, \r, \t, \f, space .
For more flexibility use the following
Removes only leading whitespace chars: my_string.lstrip()
Removes only trailing whitespace chars: my_string.rstrip()
Removes specific whitespace chars: my_string.strip('\n') or my_string.lstrip('\n\r') or my_string.rstrip('\n\t') and so on.
More details are available in the docs.
strip is not limited to whitespace characters either:
# remove all leading/trailing commas, periods and hyphens
title = title.strip(',.-')
This will remove all leading and trailing whitespace in myString:
myString.strip()
You want strip():
myphrases = [" Hello ", " Hello", "Hello ", "Bob has a cat"]
for phrase in myphrases:
print(phrase.strip())
This can also be done with a regular expression
import re
input = " Hello "
output = re.sub(r'^\s+|\s+$', '', input)
# output = 'Hello'
Well seeing this thread as a beginner got my head spinning. Hence came up with a simple shortcut.
Though str.strip() works to remove leading & trailing spaces it does nothing for spaces between characters.
words=input("Enter the word to test")
# If I have a user enter discontinous threads it becomes a problem
# input = " he llo, ho w are y ou "
n=words.strip()
print(n)
# output "he llo, ho w are y ou" - only leading & trailing spaces are removed
Instead use str.replace() to make more sense plus less error & more to the point.
The following code can generalize the use of str.replace()
def whitespace(words):
r=words.replace(' ','') # removes all whitespace
n=r.replace(',','|') # other uses of replace
return n
def run():
words=input("Enter the word to test") # take user input
m=whitespace(words) #encase the def in run() to imporve usability on various functions
o=m.count('f') # for testing
return m,o
print(run())
output- ('hello|howareyou', 0)
Can be helpful while inheriting the same in diff. functions.
In order to remove "Whitespace" which causes plenty of indentation errors when running your finished code or programs in Pyhton. Just do the following;obviously if Python keeps telling that the error(s) is indentation in line 1,2,3,4,5, etc..., just fix that line back and forth.
However, if you still get problems about the program that are related to typing mistakes, operators, etc, make sure you read why error Python is yelling at you:
The first thing to check is that you have your
indentation right. If you do, then check to see if you have
mixed tabs with spaces in your code.
Remember: the code
will look fine (to you), but the interpreter refuses to run it. If
you suspect this, a quick fix is to bring your code into an
IDLE edit window, then choose Edit..."Select All from the
menu system, before choosing Format..."Untabify Region.
If you’ve mixed tabs with spaces, this will convert all your
tabs to spaces in one go (and fix any indentation issues).
I could not find a solution to what I was looking for so I created some custom functions. You can try them out.
def cleansed(s: str):
""":param s: String to be cleansed"""
assert s is not (None or "")
# return trimmed(s.replace('"', '').replace("'", ""))
return trimmed(s)
def trimmed(s: str):
""":param s: String to be cleansed"""
assert s is not (None or "")
ss = trim_start_and_end(s).replace(' ', ' ')
while ' ' in ss:
ss = ss.replace(' ', ' ')
return ss
def trim_start_and_end(s: str):
""":param s: String to be cleansed"""
assert s is not (None or "")
return trim_start(trim_end(s))
def trim_start(s: str):
""":param s: String to be cleansed"""
assert s is not (None or "")
chars = []
for c in s:
if c is not ' ' or len(chars) > 0:
chars.append(c)
return "".join(chars).lower()
def trim_end(s: str):
""":param s: String to be cleansed"""
assert s is not (None or "")
chars = []
for c in reversed(s):
if c is not ' ' or len(chars) > 0:
chars.append(c)
return "".join(reversed(chars)).lower()
s1 = ' b Beer '
s2 = 'Beer b '
s3 = ' Beer b '
s4 = ' bread butter Beer b '
cdd = trim_start(s1)
cddd = trim_end(s2)
clean1 = cleansed(s3)
clean2 = cleansed(s4)
print("\nStr: {0} Len: {1} Cleansed: {2} Len: {3}".format(s1, len(s1), cdd, len(cdd)))
print("\nStr: {0} Len: {1} Cleansed: {2} Len: {3}".format(s2, len(s2), cddd, len(cddd)))
print("\nStr: {0} Len: {1} Cleansed: {2} Len: {3}".format(s3, len(s3), clean1, len(clean1)))
print("\nStr: {0} Len: {1} Cleansed: {2} Len: {3}".format(s4, len(s4), clean2, len(clean2)))
If you want to trim specified number of spaces from left and right, you could do this:
def remove_outer_spaces(text, num_of_leading, num_of_trailing):
text = list(text)
for i in range(num_of_leading):
if text[i] == " ":
text[i] = ""
else:
break
for i in range(1, num_of_trailing+1):
if text[-i] == " ":
text[-i] = ""
else:
break
return ''.join(text)
txt1 = " MY name is "
print(remove_outer_spaces(txt1, 1, 1)) # result is: " MY name is "
print(remove_outer_spaces(txt1, 2, 3)) # result is: " MY name is "
print(remove_outer_spaces(txt1, 6, 8)) # result is: "MY name is"
How do I remove leading and trailing whitespace from a string in Python?
So below solution will remove leading and trailing whitespaces as well as intermediate whitespaces too. Like if you need to get a clear string values without multiple whitespaces.
>>> str_1 = ' Hello World'
>>> print(' '.join(str_1.split()))
Hello World
>>>
>>>
>>> str_2 = ' Hello World'
>>> print(' '.join(str_2.split()))
Hello World
>>>
>>>
>>> str_3 = 'Hello World '
>>> print(' '.join(str_3.split()))
Hello World
>>>
>>>
>>> str_4 = 'Hello World '
>>> print(' '.join(str_4.split()))
Hello World
>>>
>>>
>>> str_5 = ' Hello World '
>>> print(' '.join(str_5.split()))
Hello World
>>>
>>>
>>> str_6 = ' Hello World '
>>> print(' '.join(str_6.split()))
Hello World
>>>
>>>
>>> str_7 = 'Hello World'
>>> print(' '.join(str_7.split()))
Hello World
As you can see this will remove all the multiple whitespace in the string(output is Hello World for all). Location doesn't matter. But if you really need leading and trailing whitespaces, then strip() would be find.
One way is to use the .strip() method (removing all surrounding whitespaces)
str = " Hello World "
str = str.strip()
**result: str = "Hello World"**
Note that .strip() returns a copy of the string and doesn't change the underline object (since strings are immutable).
Should you wish to remove all whitespace (not only trimming the edges):
str = ' abcd efgh ijk '
str = str.replace(' ', '')
**result: str = 'abcdefghijk'
I wanted to remove the too-much spaces in a string (also in between the string, not only in the beginning or end). I made this, because I don't know how to do it otherwise:
string = "Name : David Account: 1234 Another thing: something "
ready = False
while ready == False:
pos = string.find(" ")
if pos != -1:
string = string.replace(" "," ")
else:
ready = True
print(string)
This replaces double spaces in one space until you have no double spaces any more