Strip removing more characters than expected

Can anyone explain what's going on here:
s = 'REFPROP-MIX:METHANOL&WATER'
s.lstrip('REFPROP-MIX') # this returns ':METHANOL&WATER' as expected
s.lstrip('REFPROP-MIX:') # returns 'THANOL&WATER'
What happened to that 'ME'? Is a colon a special character for lstrip? This is particularly confusing because this works as expected:
s = 'abc-def:ghi'
s.lstrip('abc-def') # returns ':ghi'
s.lstrip('abc-def:') # returns 'ghi'

str.lstrip removes all the characters in its argument from the string, starting at the left. Since all the characters in the left prefix "REFPROP-MIX:ME" are in the argument "REFPROP-MIX:", all those characters are removed. Likewise:
>>> s = 'abcadef'
>>> s.lstrip('abc')
'def'
>>> s.lstrip('cba')
'def'
>>> s.lstrip('bacabacabacabaca')
'def'
str.lstrip does not remove whole strings (of length greater than 1) from the left. If you want to do that, use a regular expression with an anchor ^ at the beginning:
>>> import re
>>> s = 'REFPROP-MIX:METHANOL&WATER'
>>> re.sub(r'^REFPROP-MIX:', '', s)
'METHANOL&WATER'
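If a regular expression feels heavy for this, a plain prefix check does the same job. The following is a minimal sketch; the helper name remove_prefix is made up for illustration and does not come from the answers above. On Python 3.9 and later, the built-in str.removeprefix covers the same case.

```python
def remove_prefix(text, prefix):
    """Remove prefix from text only if text starts with it as a whole string."""
    if prefix and text.startswith(prefix):
        return text[len(prefix):]
    return text

s = 'REFPROP-MIX:METHANOL&WATER'
print(remove_prefix(s, 'REFPROP-MIX:'))   # METHANOL&WATER
print(remove_prefix(s, 'NOT-A-PREFIX:'))  # REFPROP-MIX:METHANOL&WATER (unchanged)
```

Unlike lstrip, this compares the prefix as a unit, so the 'ME' of 'METHANOL' is left alone.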

The method mentioned by @PadraicCunningham is a good workaround for the particular problem as stated.
Just split by the separating character and select the last value:
s = 'REFPROP-MIX:METHANOL&WATER'
res = s.split(':', 1)[-1] # 'METHANOL&WATER'

Related

Python regex: how to remove all zeros from the beginning?

I have a lot of strings like "01568460144" and "0005855048560".
I want to remove all zeros from the beginning. I tried this, which only removes one zero from the beginning, but I also have other strings with multiple zeros at the beginning.
re.sub(r'0','',number)
So my expected result for a string like "0005855048560" is "5855048560".

If the goal is to remove all leading zeroes from a string, skip the regex, and just call .lstrip('0') on the string. The *strip family of functions are a little weird when the argument isn't a single character, but for the purposes of stripping leading/trailing copies of a single character, they're perfect:
>>> s = '000123'
>>> s = s.lstrip('0')
>>> s
'123'
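One edge case worth checking: a string made up entirely of zeroes strips down to the empty string. A small sketch, assuming you would rather keep a single '0' in that case (the helper name strip_leading_zeros is hypothetical):

```python
def strip_leading_zeros(number):
    """Strip leading '0' characters, but keep one '0' if nothing else is left."""
    # lstrip('0') returns '' for an all-zero string, and '' is falsy
    return number.lstrip('0') or '0'

print(strip_leading_zeros('0005855048560'))  # 5855048560
print(strip_leading_zeros('0000'))           # 0
```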

>>> v = '0001111110'
>>>
>>> str(int(v))
'1111110'
>>>
>>> str(int('0005855048560'))
'5855048560'

If the string should contain only digits, you can use either isnumeric() or use re.sub and match only digits:
import re

strings = [
    "01568460144",
    "0005855048560",
    "00test",
    "00000",
    "0"
]

# Variant 1: strip only when the whole string is numeric
for s1 in strings:
    if s1.isnumeric():
        print(f"'{s1.lstrip('0')}'")
    else:
        print(f"'{s1}'")

print("----------------------------")

# Variant 2: the regex matches only if the string consists of digits
for s2 in strings:
    res = re.sub(r"^0+(\d*)$", r"\1", s2)
    print(f"'{res}'")
Output
'1568460144'
'5855048560'
'00test'
''
''
----------------------------
'1568460144'
'5855048560'
'00test'
''
''

Rstrip not removing correct backslashes or giving position

So,
I have a string that looks like \uisfhb\dfjn
This will vary in length. I'm struggling to get my head around rsplit and the fact that the backslash is an escape character. I only want "dfjn".
I currently have
more = "\\\\uisfhb\dfjn"
more = more.replace(r'"\\\\', r"\\")
sharename = more.rsplit(r'\\', 2)
print(sharename)
and I'm getting back
['', 'uisfhb\dfjn']

If you want to partition a string on a literal backslash, you need to escape the backslash with another backslash in the separator.
>>> more.split('\\')
['', '', 'uisfhb', 'dfjn']
>>> more.rsplit('\\', 1)
['\\\\uisfhb', 'dfjn']
>>> more.rpartition('\\')
('\\\\uisfhb', '\\', 'dfjn')
Once the string has been split, the last element can be accessed using the index -1:
>>> sharename = more.rsplit('\\', 1)[-1]
>>> sharename
'dfjn'
or using sequence-unpacking syntax (the * operator):
>>> *_, sharename = more.rpartition('\\')
>>> sharename
'dfjn'
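It may help to see that the doubled backslash in source code is a single character in the actual string. A short sketch, assuming the path is written as a raw string literal so that \d is not read as an escape sequence:

```python
# Raw literal: two backslashes, a name, one backslash, a name
more = r"\\uisfhb\dfjn"

# '\\' in source code is ONE backslash character in the string
assert len('\\') == 1
# The same string written with escaped backslashes instead of a raw literal
assert more == "\\\\uisfhb\\dfjn"

sharename = more.rpartition('\\')[-1]
print(sharename)  # dfjn
```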

I think this is an issue with raw strings. Try this:
more = "\\\\uisfhb\dfjn"
more = more.replace("\\\\", "\\")
sharename = more.split("\\")[2] # using split and not rsplit
print(sharename)

If sharename is the last node in the tree, this will get it:
>>> more = "\\\\uisfhb\dfjn"
>>> sharename = more.split('\\')[-1]
>>> sharename
'dfjn'

Understanding string method strip

After initializing a variable x with the content shown below, I applied strip with a parameter. The result of strip is unexpected: when I try to strip "ios_static_analyzer/", "rity/ios_static_analyzer/" gets stripped.
Please help me understand why.
>>> print x
/Users/msecurity/Desktop/testspace/Hy5_Workspace/security/ios_static_analyzer/
>>> print x.strip()
/Users/msecurity/Desktop/testspace/Hy5_Workspace/security/ios_static_analyzer/
>>> print x.strip('/')
Users/msecurity/Desktop/testspace/Hy5_Workspace/security/ios_static_analyzer
>>> print x.strip('ios_static_analyzer/')
Users/msecurity/Desktop/testspace/Hy5_Workspace/secu
>>> print x.strip('analyzer/')
Users/msecurity/Desktop/testspace/Hy5_Workspace/security/ios_static_
>>> print x.strip('_analyzer/')
Users/msecurity/Desktop/testspace/Hy5_Workspace/security/ios_static
>>> print x.strip('static_analyzer/')
Users/msecurity/Desktop/testspace/Hy5_Workspace/security/io
>>> print x.strip('_static_analyzer/')
Users/msecurity/Desktop/testspace/Hy5_Workspace/security/io
>>> print x.strip('s_static_analyzer/')
Users/msecurity/Desktop/testspace/Hy5_Workspace/security/io
>>> print x.strip('os_static_analyzer/')
Users/msecurity/Desktop/testspace/Hy5_Workspace/secu

Quoting from str.strip docs
Return a copy of the string with the leading and trailing characters
removed. The chars argument is a string specifying the set of
characters to be removed. If omitted or None, the chars argument
defaults to removing whitespace. The chars argument is not a prefix or
suffix; rather, all combinations of its values are stripped:
So it removes all the characters in the parameter from both sides of the string.
For example,
my_str = "abcd"
print(my_str.strip("da")) # bc
Note: You can think of it like this: it stops removing characters from the string when it finds a character that is not in the input parameter string.
To actually remove a particular string, you should use str.replace:
x = "/Users/msecurity/Desktop/testspace/Hy5_Workspace/security/ios_static_analyzer/"
print(x.replace('analyzer/', ''))
# /Users/msecurity/Desktop/testspace/Hy5_Workspace/security/ios_static_
But replace will remove the matches everywhere:
x = "abcd1abcd2abcd"
print(x.replace('abcd', '')) # 12
But if you want to remove words only at the beginning and end of the string, you can use a regular expression, like this:
import re
pattern = re.compile("^{0}|{0}$".format("abcd"))
x = "abcd1abcd2abcd"
print(pattern.sub("", x)) # 1abcd2
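If the text to remove might contain characters that are special in a regular expression (such as . or /), a pattern built this way would need re.escape. Here is a sketch that avoids regex altogether, assuming a helper named strip_affix (hypothetical) that removes the exact substring at most once from each end:

```python
def strip_affix(text, affix):
    """Remove affix once from the start and once from the end, as a whole string."""
    if affix and text.startswith(affix):
        text = text[len(affix):]
    if affix and text.endswith(affix):
        text = text[:-len(affix)]
    return text

print(strip_affix("abcd1abcd2abcd", "abcd"))  # 1abcd2

x = "/Users/msecurity/Desktop/testspace/Hy5_Workspace/security/ios_static_analyzer/"
print(strip_affix(x, "ios_static_analyzer/"))
# /Users/msecurity/Desktop/testspace/Hy5_Workspace/security/
```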

What you need, I think, is replace:
>>> x.replace('ios_static_analyzer/','')
'/Users/msecurity/Desktop/testspace/Hy5_Workspace/security/'
string.replace(s, old, new[, maxreplace])
Return a copy of string s with all occurrences of substring old replaced by new.
So you can replace your string with nothing and get the desired output.

In Python, x.strip(s) removes from the beginning or the end of the string x any character appearing in s. So s is just a set of characters, not a substring to be matched.

string.strip removes a set of characters given as an argument. The chars argument is not a prefix or suffix; rather, all combinations of its values are stripped.

strip does not remove the string given as argument from the object; it removes the characters in the argument.
In this case, strip sees the string s_static_analyzer/ as an iterable of characters that needs to be stripped.
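A rough pure-Python equivalent can make this behavior concrete. It is only an illustration of the semantics, not how CPython implements strip, and the function name strip_chars is invented here:

```python
def strip_chars(text, chars):
    """Mimic str.strip(chars): drop characters from both ends while they are in chars."""
    allowed = set(chars)
    start, end = 0, len(text)
    # Advance from the left until a character outside the set is found
    while start < end and text[start] in allowed:
        start += 1
    # Retreat from the right until a character outside the set is found
    while end > start and text[end - 1] in allowed:
        end -= 1
    return text[start:end]

x = "/Users/msecurity/Desktop/testspace/Hy5_Workspace/security/ios_static_analyzer/"
print(strip_chars(x, 'ios_static_analyzer/'))
# Users/msecurity/Desktop/testspace/Hy5_Workspace/secu
```

The trailing 'rity/' disappears because r, i, t, y and / all occur somewhere in the argument; the scan stops at 'u', which does not.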

Using parentheses as delimiter in re or str.split() python

I am trying to split a string such as: add(ten)sub(one) into add(ten) sub(one).
I can't figure out how to match the closing parenthesis. I have used re.sub(r'\\)', '\\) ') and every variation of escaping the parentheses I can think of. It is hard to tell in this font, but I am trying to add a space between these commands so I can split the string into a list later.

There's no need to escape ) in the replacement string. ) has a special meaning only in the regex pattern, so it needs to be escaped there in order to match it in the string, but in a normal string it can be used as is.
>>> import re
>>> strs = "add(ten)sub(one)"
>>> re.sub(r'\)(?=\S)',r') ', strs)
'add(ten) sub(one)'
As @StevenRumbalski pointed out in the comments, the above operation can be done more simply using str.replace and str.strip:
>>> strs.replace(')',') ').strip()
'add(ten) sub(one)'
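Since the stated goal is a list, the space-separated result can be split right away. A small sketch building on the str.replace approach; it assumes the commands themselves contain no whitespace:

```python
strs = "add(ten)sub(one)"

# Add a space after every ')' and let split() break on whitespace;
# split() with no argument also discards the trailing space.
commands = strs.replace(')', ') ').split()
print(commands)  # ['add(ten)', 'sub(one)']
```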

d = ')'
my_str = 'add(ten)sub(one)'
result = [t+d for t in my_str.split(d) if len(t) > 0]
# result == ['add(ten)', 'sub(one)']

Create a list of all substrings
import re
a = 'add(ten)sub(one)'
print(re.findall(r'(.+?\(.+?\))', a))
Output:
['add(ten)', 'sub(one)']

Strip all characters after the final dash in a string in Python, and test if numeric?

If I have a string:
string = 'this-is-a-string-125'
How can I grab the last set of characters after the dash and check if they are digits?

If you want to verify that they are actually digits, you can do
x.rsplit('-', 1)[1].isdigit()
"Numeric" is a more general criterion that could be interpreted in different ways. For instance, "12.87" is numeric in some sense, but not all of its characters are digits.
You can do int(x.rsplit('-', 1)[1]) to see if the string can be interpreted as an integer, or float(x.rsplit('-', 1)[1]) to see if it can be interpreted as a float. (These will raise a ValueError if the string isn't numeric in the appropriate sense, so you can catch that exception and do whatever you need to do if it's not numeric.)
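The try/except approach could look roughly like this. It is a sketch only: the function name suffix_as_number is made up, and it uses [-1] rather than [1], so a string without any dash is tested as a whole instead of raising an IndexError.

```python
def suffix_as_number(text, sep='-'):
    """Return the part after the last sep as an int or float, or None if it is neither."""
    tail = text.rsplit(sep, 1)[-1]
    for convert in (int, float):
        try:
            return convert(tail)
        except ValueError:
            continue  # not valid for this type, try the next one
    return None

print(suffix_as_number('this-is-a-string-125'))    # 125
print(suffix_as_number('this-is-a-string-12.87'))  # 12.87
print(suffix_as_number('this-is-a-string'))        # None
```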

s = 'this-is-a-string-125'.split('-')[-1].isdigit()
We split the string by dash ('-') which gives a list of substrings (see split()). We then take the last one ([-1]) and we verify that that string contains only digits (isdigit()):
>>> 'this-is-a-string-125'.split('-')
['this', 'is', 'a', 'string', '125']
>>> 'this-is-a-string-125'.split('-')[-1]
'125'
>>> 'this-is-a-string-125'.split('-')[-1].isdigit()
True

Nobody knows about partition or rpartition:
text.rpartition("-")[-1].isdigit()

How about:
string.split('-')[-1].isdigit()

Seems like a simple regex can do both the stripping and checking:
>>> import re
>>> s = 'this-is-a-string-125'
>>> m = re.search(r'-(\d+)$', s)
>>> m.group(1)
'125'
>>> s[:m.start()] # gives you what was stripped away.
'this-is-a-string'
Match object m will be None if the string lacks a dash character followed by one or more digits at the end.
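Because of that, it is safer to check the match before calling group. A minimal sketch wrapping the same pattern in a function (the name split_numeric_suffix is hypothetical):

```python
import re

def split_numeric_suffix(text):
    """Return (head, digits) if text ends in '-<digits>', otherwise None."""
    m = re.search(r'-(\d+)$', text)
    if m is None:
        return None
    return text[:m.start()], m.group(1)

print(split_numeric_suffix('this-is-a-string-125'))  # ('this-is-a-string', '125')
print(split_numeric_suffix('this-is-a-string'))      # None
```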
