Remove all whitespace in a string - python

I want to eliminate all the whitespace from a string, on both ends, and in between words.
I have this Python code:
def my_handle(self):
    sentence = ' hello apple '
    sentence.strip()
But that only eliminates the whitespace on both sides of the string. How do I remove all whitespace?

If you want to remove leading and ending spaces, use str.strip():
>>> " hello apple ".strip()
'hello apple'
If you want to remove all space characters, use str.replace() (NB this only removes the “normal” ASCII space character ' ' U+0020 but not any other whitespace):
>>> " hello apple ".replace(" ", "")
'helloapple'
If you want to remove duplicated spaces, use str.split() followed by str.join():
>>> " ".join(" hello apple ".split())
'hello apple'
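To make the difference between the space-only and the all-whitespace approaches concrete, here is a minimal sketch. The sample string s is made up for illustration:

```python
# Hypothetical sample containing a space, a tab, and a newline
s = " hello\tapple\n"

# replace() removes only the ASCII space character
only_spaces_removed = s.replace(" ", "")

# split() with no argument splits on any run of whitespace,
# so joining the pieces with "" drops tabs and newlines too
all_whitespace_removed = "".join(s.split())

print(repr(only_spaces_removed))     # 'hello\tapple\n'
print(repr(all_whitespace_removed))  # 'helloapple'
```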

To remove only spaces use str.replace:
sentence = sentence.replace(' ', '')
To remove all whitespace characters (space, tab, newline, and so on) you can use split then join:
sentence = ''.join(sentence.split())
or a regular expression:
import re
pattern = re.compile(r'\s+')
sentence = re.sub(pattern, '', sentence)
If you want to only remove whitespace from the beginning and end you can use strip:
sentence = sentence.strip()
You can also use lstrip to remove whitespace only from the beginning of the string, and rstrip to remove whitespace from the end of the string.
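A short sketch of the three variants side by side, using a made-up string that starts with a tab and ends with a newline:

```python
# Hypothetical sample: leading tab + space, trailing space + newline
s = "\t hello apple \n"

left = s.lstrip()   # leading whitespace removed, trailing kept
right = s.rstrip()  # trailing whitespace removed, leading kept
both = s.strip()    # both ends removed, inner space kept

print(repr(left))   # 'hello apple \n'
print(repr(right))  # '\t hello apple'
print(repr(both))   # 'hello apple'
```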

An alternative is to use regular expressions and match these strange white-space characters too. Here are some examples:
Remove ALL spaces in a string, even between words:
import re
sentence = re.sub(r"\s+", "", sentence, flags=re.UNICODE)
Remove spaces in the BEGINNING of a string:
import re
sentence = re.sub(r"^\s+", "", sentence, flags=re.UNICODE)
Remove spaces in the END of a string:
import re
sentence = re.sub(r"\s+$", "", sentence, flags=re.UNICODE)
Remove spaces both in the BEGINNING and in the END of a string:
import re
sentence = re.sub(r"^\s+|\s+$", "", sentence, flags=re.UNICODE)
Remove ONLY DUPLICATE spaces:
import re
sentence = " ".join(re.split(r"\s+", sentence, flags=re.UNICODE))
(All examples work in both Python 2 and Python 3)
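As an illustration of the "strange" whitespace characters, the sketch below uses a made-up string containing a no-break space (U+00A0) and an ideographic space (U+3000). Note that in Python 3, \s already matches Unicode whitespace for str patterns, so flags=re.UNICODE is redundant there (it is only needed in Python 2):

```python
import re

# Hypothetical input with two non-ASCII whitespace characters
s = "hello\u00a0apple\u3000pie"

# Replacing the ASCII space leaves both characters in place
ascii_only = s.replace(" ", "")

# \s matches Unicode whitespace for str patterns in Python 3
unicode_aware = re.sub(r"\s+", "", s)

print(ascii_only == s)      # True
print(repr(unicode_aware))  # 'helloapplepie'
```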

"Whitespace" includes space, tabs, and CRLF. So an elegant and one-liner string function we can use is str.translate:
Python 3
' hello apple '.translate(str.maketrans('', '', ' \n\t\r'))
OR if you want to be thorough:
import string
' hello apple'.translate(str.maketrans('', '', string.whitespace))
Python 2
' hello apple'.translate(None, ' \n\t\r')
OR if you want to be thorough:
import string
' hello apple'.translate(None, string.whitespace)
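One caveat with the "thorough" variant: string.whitespace contains only the ASCII whitespace characters, so other Unicode whitespace survives the translation. A small Python 3 sketch with a made-up input:

```python
import string

# Translation table that deletes every character in string.whitespace
table = str.maketrans("", "", string.whitespace)

# Hypothetical input: space, tab, newline, and a no-break space (U+00A0)
s = "hello \t\napple\u00a0pie"
result = s.translate(table)

# The ASCII whitespace is gone, the no-break space is still there
print(repr(result))  # 'helloapple\xa0pie'
```

If non-ASCII whitespace matters, ''.join(s.split()) or a \s+ regular expression may be a better fit.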

For removing whitespace from beginning and end, use strip.
>>> " foo bar ".strip()
'foo bar'

' hello \n\tapple'.translate({ord(c):None for c in ' \n\t\r'})
MaK already pointed out the "translate" method above. And this variation works with Python 3 (see this Q&A).

In addition, strip has some variations:
Remove spaces in the BEGINNING and END of a string:
sentence= sentence.strip()
Remove spaces in the BEGINNING of a string:
sentence = sentence.lstrip()
Remove spaces in the END of a string:
sentence= sentence.rstrip()
All three string functions (strip, lstrip, and rstrip) accept an optional argument naming the characters to strip; the default is all whitespace. This can be helpful when you are working with something particular. For example, you could remove only spaces but not newlines:
" 1. Step 1\n".strip(" ")
Or you could remove extra commas when reading in a string list:
"1,2,3,".strip(",")
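It is worth knowing that the argument is treated as a set of characters, not as a prefix or suffix. A quick sketch with made-up strings:

```python
# Every leading/trailing character that is 'x' or 'y' is removed,
# in any order, until a character outside the set is reached
a = "xyxhello worldyyx".strip("xy")

# lstrip() applies the same rule to the left side only
b = "abcHELLOcba".lstrip("abc")

# Only spaces are stripped, so the trailing newline survives
c = " 1. Step 1\n".strip(" ")

print(repr(a))  # 'hello world'
print(repr(b))  # 'HELLOcba'
print(repr(c))  # '1. Step 1\n'
```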

Be careful:
strip does a rstrip and lstrip (removes leading and trailing spaces, tabs, returns and form feeds, but it does not remove them in the middle of the string).
If you only replace spaces and tabs you can end up with hidden CRLFs that appear to match what you are looking for, but are not the same.
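A minimal sketch of the hidden-CRLF problem, assuming a made-up value read from a Windows-style file. repr() makes the leftover characters visible:

```python
# Hypothetical value with a trailing CRLF
raw = "apple\r\n"

# Removing only spaces and tabs leaves the CRLF untouched
cleaned = raw.replace(" ", "").replace("\t", "")

print(cleaned == "apple")          # False, although it prints like 'apple'
print(repr(cleaned))               # 'apple\r\n'
print(cleaned.strip() == "apple")  # True
```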

eliminate all the whitespace from a string, on both ends, and in between words.
>>> import re
>>> re.sub(r"\s+", # one or more repetition of whitespace
... '', # replace with empty string (->remove)
... ''' hello
... apple
... ''')
'helloapple'
https://en.wikipedia.org/wiki/Whitespace_character
Python docs:
https://docs.python.org/library/stdtypes.html#textseq
https://docs.python.org/library/stdtypes.html#str.replace
https://docs.python.org/library/string.html#string.replace
https://docs.python.org/library/re.html#re.sub
https://docs.python.org/library/re.html#regular-expression-syntax

I use split() to discard all whitespace and join() to concatenate the pieces.
sentence = ''.join(' hello apple '.split())
print(sentence) #=> helloapple
I prefer this approach because it is a single expression (not a statement).
It is easy to use and works without binding the result to a variable.
print(''.join(' hello apple '.split())) # no need to bind to a variable

import re
sentence = ' hello apple'
re.sub(' ', '', sentence) # 'helloapple' (remove all spaces)
re.sub('  ', ' ', sentence) # replace each double space with a single space

In the following script we import the regular expression module which we use to substitute one space or more with a single space. This ensures that the inner extra spaces are removed. Then we use strip() function to remove leading and trailing spaces.
# Import regular expression module
import re
# Initialize string
a = " foo bar "
# First replace any number of spaces with a single space
a = re.sub(' +', ' ', a)
# Then strip any leading and trailing spaces.
a = a.strip()
# Show results
print(a)
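Note that the pattern ' +' only collapses space characters. If tabs and newlines should be collapsed as well, one possible variation (a sketch, with a made-up input string) is to use \s+ instead:

```python
import re

# Hypothetical input mixing spaces, a tab, and newlines
a = "  foo \t bar\n\nbaz  "

# ' +' collapses runs of spaces only; the tab and newlines remain
spaces_only = re.sub(" +", " ", a).strip()

# r'\s+' collapses any run of whitespace into a single space
any_whitespace = re.sub(r"\s+", " ", a).strip()

print(repr(spaces_only))     # 'foo \t bar\n\nbaz'
print(repr(any_whitespace))  # 'foo bar baz'
```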

I found that this works the best for me:
test_string = ' test a s test '
string_list = [s.strip() for s in str(test_string).split()]
final_string = ' '.join(string_list)
# final_string: 'test a s test'
It collapses any run of spaces, tabs, etc. into a single space and drops the leading and trailing whitespace.

Try this. Instead of using re, I think using split with strip is much better:
def my_handle(self):
    sentence = ' hello apple '
    ' '.join(x.strip() for x in sentence.split())
    #hello apple
    ''.join(x.strip() for x in sentence.split())
    #helloapple

Related

Replace string with same amount of white spaces in search pattern

I would like to keep the same amount of white space in my string but replace 'no' with empty string. Here is my code:
>>> import re
>>> line = ' no command'
>>> line = re.sub('^\s.+no','',line)
>>> line
' command'
Expected output, with only 'no' replaced by an empty string and the amount of whitespace unchanged:
' command'
Using .+ matches any character, so if you only want to match one or more whitespace characters, use \s+ instead.
Then capture that in group 1 so you can reuse it in the replacement, and match no followed by a word boundary.
Note that \s can also match a newline.
import re
line = ' no command'
line = re.sub(r'^(\s+)no\b', r'\1', line)
print(line)
Output
command
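Here is the same idea as a small sketch with the whitespace spelled out. The exact number of leading spaces is an assumption for illustration. The second call shows why the word boundary \b matters:

```python
import re

# Hypothetical input: three leading spaces, then 'no', then a space
line = "   no command"

# Group 1 captures the leading whitespace and \1 puts it back,
# so only the word 'no' is removed
result = re.sub(r"^(\s+)no\b", r"\1", line)

# Without a word boundary after 'no' there is no match,
# so a word such as 'nothing' is left alone
untouched = re.sub(r"^(\s+)no\b", r"\1", "   nothing")

print(repr(result))     # '    command'
print(repr(untouched))  # '   nothing'
```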
I think you want a positive look behind. It doesn't quite do exactly what your code does in terms of finding a match at the beginning of the line, but hopefully this sets you on the right track.
>>> line = ' no command'
>>> re.sub(r'(?<=\s)no','',line)
' command'
You can also capture the preceding text.
>>> re.sub(r'^(\s.+)no', r'\1', line)
' command'
What about using the replace method of the string class?
new_line = line.replace("no", "")

remove whitespace before letters using regex [duplicate]

I have a text string that starts with a number of spaces, varying between 2 & 4.
What is the simplest way to remove the leading whitespace? (ie. remove everything before a certain character?)
" Example" -> "Example"
" Example " -> "Example "
" Example" -> "Example"
The lstrip() method removes leading whitespace (spaces, newlines, and tabs) from the beginning of a string:
>>> ' hello world!'.lstrip()
'hello world!'
Edit
As balpha pointed out in the comments, in order to remove only spaces from the beginning of the string, lstrip(' ') should be used:
>>> ' hello world with 2 spaces and a tab!'.lstrip(' ')
'\thello world with 2 spaces and a tab!'
Related question:
Trimming a string in Python
The function strip will remove whitespace from the beginning and end of a string.
my_str = " text "
my_str = my_str.strip()
will set my_str to "text".
If you want to remove the whitespace before and after the text but keep the whitespace in the middle, you could use:
word = ' Hello World '
stripped = word.strip()
print(stripped)
To remove everything before a certain character, use a regular expression:
re.sub(r'^[^a]*', '', my_str)
This removes everything up to the first 'a' in my_str. [^a] can be replaced with any character class you like, such as word characters.
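A runnable sketch of that regular expression, using made-up strings. The last call shows a caveat: if the target character never occurs, the pattern consumes the whole string.

```python
import re

# Hypothetical input with three leading spaces
s = "   Example"

# Remove everything before the first 'E'
up_to_e = re.sub(r"^[^E]*", "", s)

# Remove every leading non-word character (spaces, dashes, ...)
leading_non_word = re.sub(r"^\W*", "", "  -- Example ")

# Caveat: with no 'E' in the string, everything is removed
no_match = re.sub(r"^[^E]*", "", "abc")

print(repr(up_to_e))           # 'Example'
print(repr(leading_non_word))  # 'Example '
print(repr(no_match))          # ''
```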
The question doesn't address multiline strings, but here is how you would strip leading whitespace from a multiline string using python's standard library textwrap module. If we had a string like:
s = """
    line 1 has 4 leading spaces
    line 2 has 4 leading spaces
    line 3 has 4 leading spaces
"""
if we print(s) we would get output like:
>>> print(s)
    line 1 has 4 leading spaces
    line 2 has 4 leading spaces
    line 3 has 4 leading spaces
and if we used textwrap.dedent:
>>> import textwrap
>>> print(textwrap.dedent(s))
line 1 has 4 leading spaces
line 2 has 4 leading spaces
line 3 has 4 leading spaces
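Keep in mind that textwrap.dedent removes only the indentation that is common to all lines. A compact sketch with explicit \n escapes (the sample text is made up) shows that extra indentation on a single line is preserved:

```python
import textwrap

# Hypothetical text: two lines indented by 4 spaces, one by 6
s = "    line 1\n    line 2\n      line 3 (two extra spaces)\n"

# Only the common 4-space prefix is removed from every line
dedented = textwrap.dedent(s)

print(repr(dedented))
# 'line 1\nline 2\n  line 3 (two extra spaces)\n'
```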
My personal favorite for any string handling is strip, split and join (in that order):
>>> ' '.join(" this is my badly spaced string ! ".strip().split())
'this is my badly spaced string !'
In general it can be good to apply this for all string handling.
This does the following:
First it strips - this removes leading and ending spaces.
Then it splits - it does this on whitespace by default (so it'll even get tabs and newlines). The thing is that this returns a list.
Finally join iterates over the list and joins each with a single space in between.
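The three steps can be traced one at a time, as in the sketch below (the input string is made up). As a side note, split() with no argument already ignores leading and trailing whitespace, so the strip() step is harmless but not strictly required:

```python
# Hypothetical input with repeated spaces, a tab, and a newline
raw = "  this   is\tmy badly\nspaced string !  "

stripped = raw.strip()    # step 1: drop leading/trailing whitespace
words = stripped.split()  # step 2: list of words, split on any whitespace
result = " ".join(words)  # step 3: rejoin with single spaces

print(words)
# ['this', 'is', 'my', 'badly', 'spaced', 'string', '!']
print(repr(result))
# 'this is my badly spaced string !'
```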
Regular expressions are another option when cleaning text:
import re
def removing_leading_whitespaces(text):
    return re.sub(r"^\s+", "", text)
Apply the above function
removing_leading_whitespaces(" Example")
" Example" -> "Example"
removing_leading_whitespaces(" Example ")
" Example " -> "Example "
removing_leading_whitespaces(" Example")
" Example" -> "Example"

Python: use regular expression to remove the white space from all lines

^(\s+) only removes the whitespace from the first line. How do I remove the front whitespace from all the lines?
Python's regex module does not default to multi-line ^ matching, so you need to specify that flag explicitly.
r = re.compile(r"^\s+", re.MULTILINE)
r.sub("", "a\n b\n c") # "a\nb\nc"
# or without compiling (only possible for Python 2.7+ because the flags option
# didn't exist in earlier versions of re.sub)
re.sub(r"^\s+", "", "a\n b\n c", flags = re.MULTILINE)
# but mind that \s includes newlines:
r.sub("", "a\n\n\n\n b\n c") # "a\nb\nc"
It's also possible to include the flag inline to the pattern:
re.sub(r"(?m)^\s+", "", "a\n b\n c")
An easier solution is to avoid regular expressions because the original problem is very simple:
content = 'a\n b\n\n c'
stripped_content = ''.join(line.lstrip(' \t') for line in content.splitlines(True))
# stripped_content == 'a\nb\n\nc'
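The same approach can be wrapped in a small helper. The function name lstrip_lines and its chars parameter are made up for this sketch:

```python
def lstrip_lines(text, chars=" \t"):
    """Strip leading `chars` from every line, keeping the line endings."""
    # splitlines(True) keeps each line's terminator, so blank lines
    # and \r\n endings pass through unchanged
    return "".join(line.lstrip(chars) for line in text.splitlines(True))


unix_text = lstrip_lines("a\n b\n\n  c")
windows_text = lstrip_lines("x\r\n\ty\r\n")

print(repr(unix_text))     # 'a\nb\n\nc'
print(repr(windows_text))  # 'x\r\ny\r\n'
```

Because only spaces and tabs are stripped, empty lines are not swallowed the way they are with ^\s+ in multiline mode.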
you can try strip() if you want to remove front and back, or lstrip() if front
>>> s=" string with front spaces and back "
>>> s.strip()
'string with front spaces and back'
>>> s.lstrip()
'string with front spaces and back '
for line in open("file"):
    print(line.lstrip())
If you really want to use regex
>>> import re
>>> re.sub(r"^\s+","",s) # remove the front
'string with front spaces and back '
>>> re.sub(r"\s+\Z","",s) # remove the back
' string with front spaces and back'
@AndiDog acknowledges in his (currently accepted) answer that it munches consecutive newlines.
Here's how to fix that deficiency, which is caused by the fact that \n is BOTH whitespace and a line separator. What we need to do is make an re class that includes only whitespace characters other than newline.
We want whitespace and not newline, which can't be expressed directly in an re class. Let's rewrite that as not not (whitespace and not newline), i.e. not (not whitespace or not not newline) (thanks, Augustus), i.e. not (not whitespace or newline), i.e. [^\S\n] in re notation.
So:
>>> re.sub(r"(?m)^[^\S\n]+", "", " a\n\n \n\n b\n c\nd e")
'a\n\n\n\nb\nc\nd e'
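For comparison, here are the two patterns side by side on a made-up string that contains a blank line:

```python
import re

# Hypothetical text: a blank line followed by an indented line
text = "a\n\n  b"

# \s also matches '\n', so the blank line is swallowed
with_s = re.sub(r"(?m)^\s+", "", text)

# [^\S\n] matches whitespace except '\n', so the blank line survives
without_newline = re.sub(r"(?m)^[^\S\n]+", "", text)

print(repr(with_s))           # 'a\nb'
print(repr(without_newline))  # 'a\n\nb'
```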
nowhite = ''.join(mytext.split())
No whitespace will remain, as you asked (everything is joined into one word). It is usually more useful to join everything with ' ' or '\n' to keep the words separate.
You'll have to use the re.MULTILINE option:
re.sub(r"(?m)^\s+", "", text)
The "(?m)" part enables multiline.
You don't actually need regular expressions for this most of the time. If you are only looking to remove common indentation across multiple lines, try the textwrap module:
>>> import textwrap
>>> messy_text = " grrr\n whitespace\n everywhere"
>>> print(textwrap.dedent(messy_text))
grrr
whitespace
everywhere
Note that if the indentation is irregular, the irregularity will be maintained:
>>> very_messy_text = " grrr\n \twhitespace\n everywhere"
>>> print(textwrap.dedent(very_messy_text))
grrr
whitespace
everywhere
