Remove String spaces with regular expression

I want to remove the space at the first index and the spaces at the last index using Python's re module.
I tried:
re.sub(r"\s+", "", " hello world ")  # remove the leading space
But it did not remove anything.

>>> re.sub(r'^\s+|\s+$', '', ' hello world ')
'hello world'
That removes all leading and trailing whitespace, though, not only the characters at the first and last index.
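A minimal sketch of why the unanchored attempt behaves differently from the answer: without ^ and $, \s+ matches every whitespace run, including the one between the words. The sample string and variable names are made up for illustration.

```python
import re

s = "  hello world  "

# Without anchors, \s+ matches every run of whitespace, including the inner one.
no_anchors = re.sub(r"\s+", "", s)

# ^ and $ anchor the pattern to the start and end of the string,
# so only the leading and trailing runs are removed.
trimmed = re.sub(r"^\s+|\s+$", "", s)

# Dropping the + removes just one whitespace character at each end.
one_per_end = re.sub(r"^\s|\s$", "", s)

# re.sub never modifies its input (strings are immutable), so keep the return value.
print(repr(no_anchors))   # 'helloworld'
print(repr(trimmed))      # 'hello world'
print(repr(one_per_end))  # ' hello world '
print(repr(s))            # '  hello world  '
```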

Related

Regex extract strings

Original text: This is the first variable "${abc}" and this is the second variable "${def}"
Desired output: This is the first variable and this is the second variable
I want to get rid of "${abc}" and "${def}" using a regex. Currently I am using the regex \".*\", but the match I get is: "${abc}" and this is the second variable "${def}"

Because .* is greedy, it matches from the first quote all the way to the last one. You want to match only characters that are not ", so instead of .* use [^"]*.
>>> import re
>>> in_str = 'This is the first variable "${abc}" and this is the second variable "${def}"'
>>> re.sub(r'"[^"]*"', '', in_str)
'This is the first variable  and this is the second variable '
Better yet, constrain it more so that you only match "${...}" and not anything else enclosed in quotes (r'"\$\{[^}]*\}"'):
>>> re.sub(r'"\$\{[^}]*\}"', '', in_str)
'This is the first variable  and this is the second variable '
Explanation:
"\$\{[^}]*\}"
-------------
" " : Quotes
\$ : A literal $ sign
\{ \} : Braces
[^}]* : Zero or more characters that are not }
To also remove the space that follows each match, add an optional \s? at the end of the regex:
>>> re.sub(r'"\$\{[^}]*\}"\s?', '', in_str)
'This is the first variable and this is the second variable '
This still leaves a trailing space when the last match is at the end of the string, as here, but you can remove it with .strip().
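A short sketch comparing the greedy pattern from the question with the negated-class fix. The lazy quantifier .*? is an extra alternative not mentioned in the answer above; it also stops at the nearest closing quote.

```python
import re

in_str = 'This is the first variable "${abc}" and this is the second variable "${def}"'

# Greedy .* runs from the first quote to the LAST quote, swallowing everything between.
greedy = re.findall(r'".*"', in_str)

# A negated class [^"]* cannot cross a quote, so each quoted part matches separately.
negated = re.findall(r'"[^"]*"', in_str)

# The lazy quantifier .*? also stops at the nearest closing quote.
lazy = re.findall(r'".*?"', in_str)

# Remove "${...}" placeholders plus one following whitespace character,
# then strip the leftover trailing space.
cleaned = re.sub(r'"\$\{[^}]*\}"\s?', "", in_str).strip()

print(greedy)   # ['"${abc}" and this is the second variable "${def}"']
print(negated)  # ['"${abc}"', '"${def}"']
print(lazy)     # ['"${abc}"', '"${def}"']
print(cleaned)  # This is the first variable and this is the second variable
```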

remove whitespace before letters using regex [duplicate]

I have a text string that starts with a number of spaces, varying between 2 and 4.
What is the simplest way to remove the leading whitespace (i.e., remove everything before a certain character)?
"  Example"   -> "Example"
"  Example  " -> "Example  "
"    Example" -> "Example"

The lstrip() method removes leading whitespace, including spaces, newlines and tabs, from the beginning of a string:
>>> ' hello world!'.lstrip()
'hello world!'
Edit: As balpha pointed out in the comments, to remove only spaces from the beginning of the string, use lstrip(' '):
>>> '  \thello world with 2 spaces and a tab!'.lstrip(' ')
'\thello world with 2 spaces and a tab!'
Related question:
Trimming a string in Python
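A minimal sketch of the difference between lstrip() with and without an argument; the sample string (two spaces followed by a tab) is made up for illustration.

```python
# Two leading spaces followed by a tab.
s = "  \thello world"

# With no argument, lstrip() removes every leading whitespace character.
no_arg = s.lstrip()
print(repr(no_arg))       # 'hello world'

# With ' ' as the argument, only spaces are removed, so the tab stays.
spaces_only = s.lstrip(" ")
print(repr(spaces_only))  # '\thello world'
```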

The strip() method removes whitespace from both the beginning and the end of a string.
my_str = " text "
my_str = my_str.strip()
will set my_str to "text".

If you want to remove the whitespace before and after the words but keep the spaces between them, you could use:
word = ' Hello World '
stripped = word.strip()
print(stripped)

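To remove everything before a certain character, use a regular expression:
import re
s = re.sub(r'^[^a]*', '', s)  # s is the string to clean
This removes everything up to (but not including) the first 'a'. [^a] can be replaced with any character class you like; for example, r'^\W*' removes everything before the first word character.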
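A small sketch of the regex approach on a made-up string. Note the edge case in the last line: if the target character never occurs, [^a]* matches the whole string and nothing is left.

```python
import re

s = "  123 apple pie"

# Everything up to (not including) the first 'a' is removed.
up_to_a = re.sub(r"^[^a]*", "", s)
print(up_to_a)       # apple pie

# \W* matches non-word characters, so this removes everything before the first
# letter, digit or underscore.
before_word = re.sub(r"^\W*", "", s)
print(before_word)   # 123 apple pie

# If there is no 'a' at all, the whole string matches and is removed.
no_a = re.sub(r"^[^a]*", "", "xyz")
print(repr(no_a))    # ''
```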

The question doesn't address multiline strings, but here is how you would strip leading whitespace from a multiline string using the textwrap module from Python's standard library. If we had a string like:
s = """
    line 1 has 4 leading spaces
    line 2 has 4 leading spaces
    line 3 has 4 leading spaces
"""
if we print(s), we would get output like:
>>> print(s)
    line 1 has 4 leading spaces
    line 2 has 4 leading spaces
    line 3 has 4 leading spaces
and if we used textwrap.dedent:
>>> import textwrap
>>> print(textwrap.dedent(s))
line 1 has 4 leading spaces
line 2 has 4 leading spaces
line 3 has 4 leading spaces
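A sketch showing that textwrap.dedent removes only the whitespace prefix common to all non-blank lines, so relative indentation inside the text survives. The sample snippet is made up for illustration.

```python
import textwrap

s = """
    def greet():
        return 'hi'
"""

# The common prefix is 4 spaces; the second line keeps its extra 4 spaces.
dedented = textwrap.dedent(s)
print(dedented)
```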

My personal favorite for any string handling is strip, split and join (in that order):
>>> ' '.join(" this is my badly spaced string ! ".strip().split())
'this is my badly spaced string !'
It can be a good default for string handling in general.
It works as follows:
First, strip() removes leading and trailing whitespace.
Then, split() splits on runs of whitespace by default (so it also handles tabs and newlines) and returns a list of words.
Finally, ' '.join() concatenates the words with a single space between each pair.
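A minimal sketch on a made-up string with tabs and newlines. It also shows that split() with no arguments already ignores leading and trailing whitespace, so the strip() step is optional.

```python
messy = "  this is\tmy   badly\nspaced string !  "

# strip -> split -> join, as described above.
with_strip = " ".join(messy.strip().split())

# split() with no arguments discards leading/trailing whitespace on its own.
without_strip = " ".join(messy.split())

print(with_strip)  # this is my badly spaced string !
```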

Regular expressions work well when cleaning text:
import re
def removing_leading_whitespaces(text):
    # ^ anchors at the start of the string; \s+ matches one or more whitespace characters
    return re.sub(r"^\s+", "", text)
Applying the function:
removing_leading_whitespaces("  Example")    # -> "Example"
removing_leading_whitespaces("  Example  ")  # -> "Example  "
removing_leading_whitespaces("    Example")  # -> "Example"

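A sketch of the same helper with the pattern compiled once (a readability choice; re also caches recently used patterns), checked against str.lstrip() on a few ASCII samples. The constant name LEADING_WS is hypothetical.

```python
import re

# Hypothetical module-level constant holding the compiled pattern.
LEADING_WS = re.compile(r"^\s+")

def removing_leading_whitespaces(text: str) -> str:
    """Remove whitespace at the start of text; inner and trailing spaces are kept."""
    return LEADING_WS.sub("", text)

for sample in ["  Example", "  Example  ", "    Example", "\t\nExample"]:
    # For these ASCII samples the regex and str.lstrip() give the same result.
    assert removing_leading_whitespaces(sample) == sample.lstrip()
    print(repr(removing_leading_whitespaces(sample)))
```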

Remove all whitespace in a string

I want to eliminate all the whitespace from a string, on both ends, and in between words.
I have this Python code:
def my_handle(self):
    sentence = ' hello apple '
    sentence.strip()
But that only eliminates the whitespace on both sides of the string. How do I remove all whitespace?

If you want to remove leading and ending spaces, use str.strip():
>>> " hello apple ".strip()
'hello apple'
If you want to remove all space characters, use str.replace() (NB this only removes the “normal” ASCII space character ' ' U+0020 but not any other whitespace):
>>> " hello apple ".replace(" ", "")
'helloapple'
If you want to remove duplicated spaces, use str.split() followed by str.join():
>>> " ".join(" hello apple ".split())
'hello apple'

To remove only spaces use str.replace:
sentence = sentence.replace(' ', '')
To remove all whitespace characters (space, tab, newline, and so on) you can use split then join:
sentence = ''.join(sentence.split())
or a regular expression:
import re
pattern = re.compile(r'\s+')
sentence = re.sub(pattern, '', sentence)
If you want to only remove whitespace from the beginning and end you can use strip:
sentence = sentence.strip()
You can also use lstrip to remove whitespace only from the beginning of the string, and rstrip to remove whitespace from the end of the string.
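A short sketch comparing these approaches on a made-up string that contains a tab and a newline. It also uses the compiled pattern's own .sub() method, which is equivalent to re.sub(pattern, ...).

```python
import re

sentence = " hello\tapple\n"

# replace(' ', '') removes only U+0020 spaces; the tab and newline survive.
spaces_removed = sentence.replace(" ", "")
print(repr(spaces_removed))  # 'hello\tapple\n'

# split() + join() removes every kind of whitespace.
all_removed = "".join(sentence.split())
print(repr(all_removed))     # 'helloapple'

# The regex equivalent, using the compiled pattern's .sub() method.
pattern = re.compile(r"\s+")
regex_removed = pattern.sub("", sentence)
print(repr(regex_removed))   # 'helloapple'

# strip() only touches the ends, so the inner tab is kept.
ends_only = sentence.strip()
print(repr(ends_only))       # 'hello\tapple'
```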

An alternative is to use regular expressions, which also match other (Unicode) whitespace characters. Here are some examples:
Remove ALL spaces in a string, even between words:
import re
sentence = re.sub(r"\s+", "", sentence, flags=re.UNICODE)
Remove spaces in the BEGINNING of a string:
import re
sentence = re.sub(r"^\s+", "", sentence, flags=re.UNICODE)
Remove spaces in the END of a string:
import re
sentence = re.sub(r"\s+$", "", sentence, flags=re.UNICODE)
Remove spaces both in the BEGINNING and in the END of a string:
import re
sentence = re.sub(r"^\s+|\s+$", "", sentence, flags=re.UNICODE)
Remove ONLY DUPLICATE spaces:
import re
sentence = " ".join(re.split(r"\s+", sentence, flags=re.UNICODE))
(All examples work in both Python 2 and Python 3)

"Whitespace" includes spaces, tabs and CR/LF. An elegant one-liner that removes all of them is str.translate:
Python 3
' hello apple '.translate(str.maketrans('', '', ' \n\t\r'))
OR if you want to be thorough:
import string
' hello apple'.translate(str.maketrans('', '', string.whitespace))
Python 2
' hello apple'.translate(None, ' \n\t\r')
OR if you want to be thorough:
import string
' hello apple'.translate(None, string.whitespace)
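A Python 3 sketch of the thorough variant. One caveat worth knowing: string.whitespace is ASCII-only (' \t\n\r\x0b\x0c'), so a non-breaking space is kept by translate, while str.split() treats it as whitespace. The sample strings are made up for illustration.

```python
import string

text = " hello \n\tapple\r\n"

# str.maketrans(x, y, z): the third argument lists characters to delete.
delete_ws = str.maketrans("", "", string.whitespace)
no_ws = text.translate(delete_ws)
print(repr(no_ws))   # 'helloapple'

# A non-breaking space (U+00A0) is not in string.whitespace...
nbsp = "hello\u00a0apple"
kept = nbsp.translate(delete_ws)
print(kept == nbsp)  # True

# ...but str.split() does treat it as whitespace.
removed = "".join(nbsp.split())
print(removed)       # helloapple
```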

For removing whitespace from beginning and end, use strip.
>>> " foo bar ".strip()
'foo bar'

' hello \n\tapple'.translate({ord(c):None for c in ' \n\t\r'})
MaK already pointed out the translate method above. This variation, which passes a dict mapping each code point to None (meaning "delete"), works with Python 3.

In addition, strip has some variations:
Remove spaces in the BEGINNING and END of a string:
sentence = sentence.strip()
Remove spaces in the BEGINNING of a string:
sentence = sentence.lstrip()
Remove spaces in the END of a string:
sentence = sentence.rstrip()
All three string methods, strip, lstrip and rstrip, accept an optional argument listing the characters to strip; the default is all whitespace. This can be helpful when you are working with something particular; for example, you could remove only spaces but not newlines:
" 1. Step 1\n".strip(" ")
Or you could remove extra commas when reading in a string list:
"1,2,3,".strip(",")

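A sketch showing that the argument is a set of characters, not a prefix or suffix (the first line mirrors the example in the Python documentation). str.removeprefix is shown as the exact-prefix alternative; it requires Python 3.9 or later.

```python
# Characters from the set are removed from both ends, in any order.
domain = "www.example.com".strip("cmowz.")
print(domain)      # example

# Only spaces are stripped; the trailing newline is kept.
step = " 1. Step 1\n".strip(" ")
print(repr(step))  # '1. Step 1\n'

# To remove an exact prefix instead (Python 3.9+):
host = "www.example.com".removeprefix("www.")
print(host)        # example.com
```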
Be careful:
strip() is equivalent to lstrip() followed by rstrip(): it removes leading and trailing spaces, tabs, newlines, carriage returns and form feeds, but it does not remove them in the middle of the string.
If you only replace spaces and tabs you can end up with hidden CRLFs that appear to match what you are looking for, but are not the same.
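A tiny sketch of the hidden carriage-return problem, using a made-up line as it might be read from a file with Windows line endings.

```python
line = "apple \t\r"

# Replacing only spaces and tabs leaves the carriage return behind.
partial = line.replace(" ", "").replace("\t", "")
print(repr(partial))       # 'apple\r'
print(partial == "apple")  # False

# strip() also removes the trailing \r.
stripped = line.strip()
print(stripped == "apple")  # True
```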

eliminate all the whitespace from a string, on both ends, and in between words.
>>> import re
>>> re.sub(r"\s+",  # one or more repetitions of whitespace
...        '',      # replace with empty string (i.e. remove)
...        ''' hello
... apple
... ''')
'helloapple'
https://en.wikipedia.org/wiki/Whitespace_character
Python docs:
https://docs.python.org/library/stdtypes.html#textseq
https://docs.python.org/library/stdtypes.html#str.replace
https://docs.python.org/library/string.html#string.replace
https://docs.python.org/library/re.html#re.sub
https://docs.python.org/library/re.html#regular-expression-syntax

I use split() to discard all whitespace and join() to concatenate the remaining pieces.
sentence = ''.join(' hello apple '.split())
print(sentence)  #=> helloapple
I prefer this approach because it is a single expression (not a statement).
It is easy to use, and you can use it without binding the result to a variable:
print(''.join(' hello apple '.split()))  # no need to bind to a variable

import re
sentence = ' hello  apple'
re.sub(' ', '', sentence)   # -> 'helloapple' (remove all spaces)
re.sub(' +', ' ', sentence)  # -> ' hello apple' (collapse runs of spaces into one)

In the following script we import the regular expression module and use it to replace each run of one or more spaces with a single space, which removes the extra inner spaces. Then we use strip() to remove the leading and trailing spaces.
# Import regular expression module
import re
# Initialize string
a = " foo bar "
# First replace any number of spaces with a single space
a = re.sub(' +', ' ', a)
# Then strip any leading and trailing spaces.
a = a.strip()
# Show results
print(a)
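A short sketch of one limitation of the ' +' pattern: it only collapses runs of the space character. Using r'\s+' instead (not part of the answer above) also collapses tabs and newlines. The sample string is made up for illustration.

```python
import re

a = " foo \t bar "

# ' +' matches runs of the space character only, so the tab survives.
spaces_only = re.sub(" +", " ", a).strip()
print(repr(spaces_only))  # 'foo \t bar'

# \s+ matches runs of any whitespace (spaces, tabs, newlines, ...).
any_ws = re.sub(r"\s+", " ", a).strip()
print(repr(any_ws))       # 'foo bar'
```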

I found that this works best for me:
test_string = '  test a s  test '
string_list = [s.strip() for s in test_string.split()]
final_string = ' '.join(string_list)
# final_string: 'test a s test'
It collapses any run of whitespace (spaces, tabs, newlines, etc.) into a single space and removes it from both ends.

Try this. Instead of using re, I think using split() with strip() is much better:
def my_handle(self):
    sentence = ' hello apple '
    single_spaced = ' '.join(x.strip() for x in sentence.split())  # 'hello apple'
    no_spaces = ''.join(x.strip() for x in sentence.split())       # 'helloapple'
    return no_spaces
