How to check multiple consecutive spaces in a string with python? [duplicate] - python

I want to find multiple spaces in a string using python. How to check if multiple spaces exist in a string?
mystring = 'this is a test'
I have tried the below code but it does not work.
if bool(re.search(' +', ' ', mystring))==True:
# ....
The result must return True.

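You can use split(). If you give no delimiter, it splits on whitespace. After splitting, check the length: if it is greater than one, the string contains whitespace between words. Note that this is true for any string with at least two whitespace-separated words, not only for strings with consecutive spaces.
mystring = 'this is a test'
if len(mystring.split()) > 1:
    pass  # do something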

You can use the str.count() method.
If you want to check whether the string contains more than one space (whether or not they are adjacent), the code is:
mystring.count(' ') > 1
If you want to check whether there is at least one run of two consecutive spaces, the code is:
mystring.count('  ') >= 1

You can use re and compare the strings like this:
import re
mystring = 'this is a test'
new_str = re.sub(' +', ' ', mystring)
if mystring == new_str:
    print('There are no multiple spaces')
else:
    print('There are multiple spaces')

The syntax you use for re.search is wrong. It should be re.search(pattern, string, flags=0).
So you can simply search for two or more spaces:
import re
def contains_multiple_spaces(s):
    return bool(re.search(r' {2,}', s))
contains_multiple_spaces('no multiple spaces')
# False
contains_multiple_spaces('here are  multiple   spaces')
# True
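For comparison, here is a minimal sketch of two related checks. The function names has_double_space and has_whitespace_run are hypothetical, chosen only for this illustration. The first avoids regular expressions entirely by testing for a two-space substring; the second assumes you also want to catch runs of tabs or newlines, which the question does not ask for.

```python
import re

def has_double_space(s):
    """Return True if s contains two or more adjacent space characters."""
    # ' ' * 2 is a literal two-space string, written this way for readability.
    return ' ' * 2 in s

def has_whitespace_run(s):
    """Return True if s contains two or more adjacent whitespace characters."""
    # \s matches spaces, tabs, newlines and other whitespace.
    return re.search(r'\s{2,}', s) is not None

print(has_double_space('this is a test'))               # single spaces only
print(has_double_space('this is' + ' ' * 2 + 'a test'))  # contains a double space
print(has_whitespace_run('tab\t\tseparated'))           # two tabs in a row
```

The substring test is usually enough when only plain spaces matter; the regex version is the one to reach for when other whitespace should count as well.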


How to remove special characters except space from a file in python? [duplicate]

I have a huge corpus of text (line by line) and I want to remove special characters but preserve the spaces and the structure of each line.
hello? there A-Z-R_T(,**), world, welcome to python.
this **should? the next line#followed- by# an#other %million^ %%like $this.
should be
hello there A Z R T world welcome to python
this should be the next line followed by another million like this
You can use this pattern, too, with regex:
import re
a = '''hello? there A-Z-R_T(,**), world, welcome to python.
this **should? the next line#followed- by# an#other %million^ %%like $this.'''
for k in a.split("\n"):
    print(re.sub(r"[^a-zA-Z0-9]+", ' ', k))
    # Or:
    # final = " ".join(re.findall(r"[a-zA-Z0-9]+", k))
    # print(final)
Output:
hello there A Z R T world welcome to python
this should the next line followed by an other million like this
Edit:
Otherwise, you can store the final lines into a list:
final = [re.sub(r"[^a-zA-Z0-9]+", ' ', k) for k in a.split("\n")]
print(final)
Output:
['hello there A Z R T world welcome to python ', 'this should the next line followed by an other million like this ']
I think nfn neil's answer is great, but I would just add a simple regex that removes all non-word characters. Note, however, that it treats the underscore as part of a word:
print(re.sub(r'\W+', ' ', string))
>>> hello there A Z R_T world welcome to python
You can try this:
import re
sentance = '''hello? there A-Z-R_T(,**), world, welcome to python. this **should? the next line#followed- by# an#other %million^ %%like $this.'''
res = re.sub('[!,*)##%(&$_?.^]', '', sentance)
print(res)
To remove other symbols, add them to the character class (the part between [ and ]) passed to re.sub.
A more elegant solution would be
print(re.sub(r"\W+|_", " ", string))
>>> hello there A Z R T world welcome to python this should the next line followed by an other million like this
Here,
re is the regex module in Python
re.sub substitutes each match of the pattern with a space, i.e. " "
r'' marks a raw string literal, so backslashes such as the one in \W reach the regex engine unchanged
\W matches any non-word character, i.e. special characters such as *&^%$ but not the underscore _
+ matches one or more repetitions (unlike *, which matches zero or more)
| is logical OR
_ stands for a literal underscore
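To make the pieces of that pattern concrete, the following sketch runs three variants on a short sample string. The sample string and the [\W_]+ variant are additions for illustration, not part of the answer above; [\W_]+ puts the underscore inside the character class, so a run that mixes punctuation and underscores collapses to one space.

```python
import re

s = "A-Z-R_T(,**), world"

# \W alone: every non-word character is replaced separately.
print(re.sub(r"\W", " ", s))

# \W+ : each run of non-word characters becomes one space; "_" is kept.
print(re.sub(r"\W+", " ", s))      # A Z R_T world

# [\W_]+ : the underscore is treated as a separator as well.
print(re.sub(r"[\W_]+", " ", s))   # A Z R T world

# Difference between the two underscore-aware patterns on a mixed run:
print(repr(re.sub(r"\W+|_", " ", "a,_b")))   # two replacements, two spaces
print(repr(re.sub(r"[\W_]+", " ", "a,_b")))  # one replacement, one space
```

Which variant is preferable depends on whether extra spaces in the output matter for the later processing.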
Create a dictionary mapping special characters to None
d = {c:None for c in special_characters}
Make a translation table using the dictionary. Read the entire text into a variable and use str.translate on the entire text.
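A minimal sketch of that approach, assuming Python 3 (where str.maketrans accepts a dictionary). The contents of special_characters are a hypothetical choice for this example and should be adjusted to the corpus.

```python
# Hypothetical set of characters to strip; adjust to the corpus.
special_characters = "?-_(),*.#%^$"

# Mapping a character to None tells str.translate to delete it.
d = {c: None for c in special_characters}
table = str.maketrans(d)

text = "hello? there A-Z-R_T(,**), world, welcome to python."
cleaned = text.translate(table)
print(cleaned)  # hello there AZRT world welcome to python
```

Because the characters are deleted rather than replaced, "A-Z-R_T" collapses to "AZRT". Mapping each character to ' ' instead of None keeps the letters apart, at the cost of possibly producing runs of spaces.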

Replace command condition for replacing strings [duplicate]

I want to replace certain words in a string but keep getting the following result:
String: "This is my sentence."
User types in what they want to replace: "is"
User types what they want to replace word with: "was"
New string: "Thwas was my sentence."
How can I make sure it only replaces the word "is" instead of any string of the characters it finds?
Code function:
import string
def replace(word, new_word):
    new_file = string.replace(word, new_word[1])
    return new_file
Any help is much appreciated, thank you!
Use a regular expression word boundary:
import re
print(re.sub(r"\bis\b","was","This is my sentence"))
This is better than a mere split because it works with punctuation as well:
print(re.sub(r"\bis\b","was","This is, of course, my sentence"))
gives:
This was, of course, my sentence
Note: don't skip the r prefix, or your regex would be corrupt: \b would be interpreted as backspace.
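Since the question takes both words from the user, the pattern has to be built at run time. The sketch below shows one way to do that; the helper name replace_word is hypothetical. It assumes the search word starts and ends with a word character, since \b only matches next to one.

```python
import re

def replace_word(text, word, new_word):
    """Replace whole-word occurrences of `word` in `text` with `new_word`."""
    # re.escape guards against regex metacharacters in the user's input.
    pattern = r"\b" + re.escape(word) + r"\b"
    # A function as replacement inserts new_word literally, so backslashes
    # in it are not interpreted as group references.
    return re.sub(pattern, lambda m: new_word, text)

print(replace_word("This is my sentence.", "is", "was"))  # This was my sentence.
```

Using a function for the replacement is a defensive choice: a plain string replacement would also work for ordinary words.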
A simple but not so all-round solution (as given by Jean-Francios Fabre) without using regular expressions.
' '.join(x if x != word else new_word for x in string.split())

Why doesn't .strip() remove whitespaces? [duplicate]

I have a function that begins like this:
def solve_eq(string1):
    string1.strip(' ')
    return string1
I'm inputting the string '1 + 2 * 3 ** 4' but the return statement is not stripping the spaces at all and I can't figure out why. I've even tried .replace() with no luck.
strip does not remove whitespace everywhere, only at the beginning and end. Try this:
def solve_eq(string1):
    return string1.replace(' ', '')
This can also be achieved using regex:
import re
a_string = re.sub(' +', '', a_string)
strip doesn't change the original string since strings are immutable. Also, instead of string1.strip(' '), use string1.replace(' ', '') and set a return value to the new string or just return it.
Option 1:
def solve_eq(string1):
    string1 = string1.replace(' ', '')
    return string1
Option 2:
def solve_eq(string1):
    return string1.replace(' ', '')
strip returns the stripped string; it does not modify the original string.
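The two points made in these answers (string methods return a new string, and strip only touches the ends) can be seen side by side in a short sketch. The variable names and the padded sample string are illustrative only.

```python
s = ' 1 + 2 * 3 ** 4 '

stripped = s.strip()             # removes leading/trailing whitespace only
no_spaces = s.replace(' ', '')   # removes every space character
no_ws = ''.join(s.split())       # removes all whitespace (spaces, tabs, newlines)

print(repr(stripped))   # '1 + 2 * 3 ** 4'
print(repr(no_spaces))  # '1+2*3**4'
print(repr(no_ws))      # '1+2*3**4'
print(repr(s))          # the original string is unchanged
```

In each case the result must be assigned or returned; calling the method without using its return value has no visible effect.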

Having trouble adding a space after a period in a python string

I have to write a code to do 2 things:
Compress more than one occurrence of the space character into one.
Add a space after a period, if there isn't one.
For example:
input> This is weird.Indeed
output>This is weird. Indeed.
This is the code I wrote:
def correction(string):
    list=[]
    for i in string:
        if i!=" ":
            list.append(i)
        elif i==" ":
            k=i+1
            if k==" ":
                k=""
                list.append(i)
    s=' '.join(list)
    return s
strn=input("Enter the string: ").split()
print (correction(strn))
This code takes any input from the user and removes all the extra spaces, but it does not add the space after the period. (I know why: because of the split function, the period and the next word are treated as one word. I just can't figure out how to fix it.)
This is a code I found online:
import re
def correction2(string):
    corstr = re.sub('\ +',' ',string)
    final = re.sub('\.','. ',corstr)
    return final
strn= ("This is as .Indeed")
print (correction2(strn))
The problem with this code is I can't take any input from the user. It is predefined in the program.
So can anyone suggest how to improve any of the two codes to do both the functions on ANY input by the user?
Is this what you desire?
import re
def corr(s):
    return re.sub(r'\.(?! )', '. ', re.sub(r' +', ' ', s))
s = input("> ")
print(corr(s))
I've changed the regex to use a negative lookahead pattern.
Edit: explain Regex as requested in comment
re.sub() takes (at least) three arguments: The Regex search pattern, the replacement the matched pattern should be replaced with, and the string in which the replacement should be done.
What I'm doing here is two steps at once, I've been using the output of one function as input of another.
First, the inner re.sub(r' +', ' ', s) searches for multiple spaces (r' +') in s to replace them with single spaces. Then the outer re.sub(r'\.(?! )', '. ', ...) looks for periods without following space character to replace them with '. '. I'm using a negative lookahead pattern to match only sections, that don't match the specified lookahead pattern (a normal space character in this case). You may want to play around with this pattern, this may help understanding it better.
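The r string prefix changes the string to a raw string where backslash-escaping is disabled. The pattern would still match without it here, but recent Python versions warn about the unrecognized escape sequence \. in a normal string, so using raw strings with regular expressions is a good habit.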
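One detail of the negative lookahead is that it also matches a period at the very end of the string (nothing follows it, so "not followed by a space" is true), which leaves a trailing space. The sketch below is a variant, not the answer's code: the function name tidy is hypothetical, and it swaps in a positive lookahead (?=\S) so that only a period directly followed by a non-whitespace character gets a space.

```python
import re

def tidy(s):
    """Collapse runs of spaces, then add a space after any period that is
    directly followed by a non-whitespace character."""
    collapsed = re.sub(r' {2,}', ' ', s)
    return re.sub(r'\.(?=\S)', '. ', collapsed)

print(repr(tidy('This is' + ' ' * 3 + 'weird.Indeed.')))  # 'This is weird. Indeed.'

# For comparison, the negative lookahead also fires at the end of the string:
print(repr(re.sub(r'\.(?! )', '. ', 'end.')))  # 'end. '
```

Both versions will also split decimals such as 3.14 into "3. 14", so inputs containing numbers would need a stricter pattern.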
For a more basic answer, without regex:
>>> def remove_doublespace(string):
...     if '  ' not in string:
...         return string
...     return remove_doublespace(string.replace('  ', ' '))
...
>>> remove_doublespace('hi there how are you.i am fine. '.replace('.', '. '))
'hi there how are you. i am fine. '
You can try the following code:
>>> import re
>>> s = 'This is weird.Indeed'
>>> def correction(s):
...     res = re.sub(r'\s+$', '', re.sub(r'\s+', ' ', re.sub(r'\.', '. ', s)))
...     if res[-1] != '.':
...         res += '.'
...     return res
...
>>> print(correction(s))
This is weird. Indeed.
>>> s = input()
hee ss.dk
>>> s
'hee ss.dk'
>>> correction(s)
'hee ss. dk.'

I want to remove '\' from a string in python [duplicate]

I have quite a large amount of string data from which I have to remove all characters other than A-Z, a-z and 0-9.
I am able to remove almost every other character, but '\' is a problem.
def replace_all(text, dic):
    for i, j in dic.iteritems():
        text = text.replace(i, j)
    return text
reps = {' ':'-','.':'-','"':'-',',':'-','/':'-',
        '<':'-',';':'-',':':'-','*':'-','+':'-',
        '=':'-','_':'-','?':'-','%':'-','!':'-',
        '$':'-','(':'-',')':'-','\#':'-','[':'-',
        ']':'-','\&':'-','#':'-','\W':'-','\t':'-'}
x.name = x.name.lower()
x1 = replace_all(x.name,reps)
I've quite large string data in which I've to remove all characters other than A-Z,a-z and 0-9
In other words, you want to keep only those characters.
The string class already provides a test for "is every character a letter or number?", called .isalnum(). So we can just filter with that. In Python 3, filter returns an iterator, so join the result back into a string:
>>> ''.join(filter(str.isalnum, 'foo-bar\\baz42'))
'foobarbaz42'
If you have a string:
a = 'hi how \\are you'
you can remove the backslash by doing:
a.replace('\\','')
# 'hi how are you'
If you have a specific context where you are having trouble, I recommend posting a bit more detail.
birryee is correct, you need to escape the backslash with a second backslash.
to remove all characters other than A-Z, a-z and 0-9
Instead of trying to list all the characters you want to remove (that would take a long time), use a regular expression to specify those characters you wish to keep:
import re
text = re.sub('[^0-9A-Za-z]', '-', text)
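The escaping rule mentioned above can be checked directly. In the sketch below the sample string is made up for illustration; it shows that '\\' in source code is a single backslash character, that a raw literal spells the same string with one backslash, and that both the replace and the regex approach then handle it like any other character.

```python
import re

text = 'foo-bar\\baz 42'   # '\\' in source code is ONE backslash character
raw = r'foo-bar\baz 42'    # the same string written as a raw literal
assert text == raw

print(len('\\'))                            # 1
print(text.replace('\\', ''))               # foo-barbaz 42
print(re.sub(r'[^0-9A-Za-z]', '-', text))   # foo-bar-baz-42
```

The regex version needs no special case for the backslash, because the character class lists what to keep rather than what to remove.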
