Using tuple efficiently with strip() - python

Consider a basic tuple used with the built-in method str.startswith():
>>> t = 'x','y','z'
>>> s = 'x marks the spot'
>>> s.startswith( t )
True
It seems a loop is required when using a tuple with str.strip():
>>> for i in t: s.strip(i)
...
' marks the spot'
'x marks the spot'
'x marks the spot'
This seems wasteful perhaps; is there a more efficient way to use tuple items with str.strip()? s.strip(t) seemed logical, however, no bueno.

Stripping a string is a very different use case from testing whether a string starts with a given text, so the methods differ materially in how they treat their input.
str.strip() takes one string, which is treated as a set of characters. As long as the string starts or ends with a character that is a member of that set, that starting or ending character is removed, until both the start and the end are free of any characters from the set.
If you have a tuple, join it into one string:
s.strip(''.join(t))
or pass it in as a string literal:
s.strip('xyz')
Note that this means something different from using str.strip() with each individual tuple element!
Compare:
>>> s = 'yxz_middle_zxy'
>>> for t in ('x', 'y', 'z'):
... print(s.strip(t))
...
yxz_middle_zxy
xz_middle_zx
yxz_middle_zxy
>>> print(s.strip('xyz'))
_middle_
Even if you chained the str.strip() calls, one per individual character, it would still not produce the same output:
>>> for t in ('x', 'y', 'z'):
... s = s.strip(t)
...
>>> print(s)
xz_middle_zx
This is because the string never started or ended with x or z at the moment those characters were stripped; x only ended up at the start and end after the y character was stripped in a later step.
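To make the character-set behaviour described in this answer concrete, here is a minimal pure-Python sketch of what str.strip(chars) does. The helper name strip_chars is hypothetical and only for illustration; the real method is implemented in C and is what you should use in practice.

```python
def strip_chars(text, chars):
    """Illustrative re-implementation of str.strip(chars) (hypothetical helper)."""
    charset = set(chars)          # the argument is a set of characters, not a substring
    start, end = 0, len(text)
    # Advance past leading characters that belong to the set.
    while start < end and text[start] in charset:
        start += 1
    # Back up over trailing characters that belong to the set.
    while end > start and text[end - 1] in charset:
        end -= 1
    return text[start:end]


t = ('x', 'y', 'z')
s = 'yxz_middle_zxy'
print(strip_chars(s, ''.join(t)))   # _middle_
print(s.strip(''.join(t)))          # _middle_
```

Because the set is checked at every step, the order of the characters in the argument does not matter, which is why joining the tuple gives a different result from stripping one element at a time.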

Related

Python .rstrip() strips a string even if it doesn't match exactly

print('345nov'.rstrip('nov'))
the code above prints what you would expect: 345
So why does print('345v'.rstrip('nov')) print the same thing? The string doesn't end with "nov"; it only ends with "v". But rstrip() strips it all the same. How can I make it not strip anything unless the end of the string matches exactly?
You get this behavior because rstrip() treats its argument as a set of characters rather than as a suffix. That means the string you pass in ("nov") is interpreted as the characters 'n', 'o' and 'v', and any trailing run of those characters is removed. This can be shown by changing the order of the characters:
>>> "345nov".rstrip("nvo")
'345'
You can use endswith and index the string:
string = '345v'
suffix = 'nov'
if string.endswith(suffix):
    string = string[:-len(suffix)]
I ended up doing something similar to @a_guest.
def rstrip(a, b):
    # Guard against an empty suffix: a[:-0] would return an empty string.
    if b and a.endswith(b):
        return a[:-len(b)]
    return a
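As a complement to the answers above: Python 3.9 and later ship str.removesuffix(), which removes an exact suffix rather than a set of characters. The following is a small sketch; the wrapper name strip_suffix is hypothetical, and the fallback branch is only there for older interpreters.

```python
def strip_suffix(text, suffix):
    """Remove suffix from text only if text ends with exactly that suffix."""
    if hasattr(text, "removesuffix"):      # available on Python 3.9+
        return text.removesuffix(suffix)
    # Fallback for older versions; the emptiness check avoids text[:-0].
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text


print(strip_suffix('345nov', 'nov'))   # 345
print(strip_suffix('345v', 'nov'))     # 345v
print('345v'.rstrip('nov'))            # 345
```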

Python 3 split()

When I'm splitting a string "abac" I'm getting undesired results.
Example
print("abac".split("a"))
Why does it print:
['', 'b', 'c']
instead of
['b', 'c']
Can anyone explain this behavior and guide me on how to get my desired output?
Thanks in advance.
As @DeepSpace pointed out (referring to the docs):
If sep is given, consecutive delimiters are not grouped together and are deemed to delimit empty strings (for example, '1,,2'.split(',') returns ['1', '', '2']).
Therefore I'd suggest using a better delimiter such as a comma. If this is the formatting you're stuck with, you could use the built-in filter() function, as suggested in this answer; passing None as the function removes any "empty" strings. In Python 3, filter() returns an iterator, so wrap it in list() to see the result:
sample = 'abac'
filtered_sample = list(filter(None, sample.split('a')))
print(filtered_sample)
# ['b', 'c']
When you split a string in Python, you keep everything between your delimiters (even when it's an empty string!)
For example, if you had a list of letters separated by commas:
>>> "a,b,c,d".split(',')
['a', 'b', 'c', 'd']
If your list had some missing values you might leave the space in between the commas blank:
>>> "a,b,,d".split(',')
['a', 'b', '', 'd']
The start and end of the string act as delimiters themselves, so if you have a leading or trailing delimiter you will also get this "empty string" sliced out of your main string:
>>> "a,b,c,d,,".split(',')
['a', 'b', 'c', 'd', '', '']
>>> ",a,b,c,d".split(',')
['', 'a', 'b', 'c', 'd']
If you want to get rid of any empty strings in your output, you can use the filter function.
If instead you just want to get rid of this behavior near the edges of your main string, you can strip the delimiters off first:
>>> ",,a,b,c,d".strip(',')
'a,b,c,d'
>>> ",,a,b,c,d".strip(',').split(',')
['a', 'b', 'c', 'd']
In your example, "a" is what's called a delimiter. It acts as a boundary between the characters before it and after it. So, when you call split, it gets the characters before "a" and after "a" and inserts it into the list. Since there's nothing in front of the first "a" in the string "abac", it returns an empty string and inserts it into the list.
split will return the characters between the delimiters you specify (or between an end of the string and a delimiter), even if there aren't any, in which case it will return an empty string. (See the documentation for more information.)
In this case, if you don't want any empty strings in the output, you can use filter to remove them:
list(filter(lambda s: len(s) > 0, "abac".split("a")))
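The two options discussed in these answers (dropping every empty string versus dropping only the ones caused by leading or trailing delimiters) can be compared side by side. This is just an illustrative sketch; the variable names are made up for the example.

```python
text = "abac"

# Option 1: drop every empty string, wherever it occurs.
parts = [part for part in text.split("a") if part]
print(parts)         # ['b', 'c']

# Option 2: strip the delimiter from the edges first, so that only
# empty fields in the middle of the string survive.
csv_like = ",,a,b,,d,"
edges_only = csv_like.strip(",").split(",")
print(edges_only)    # ['a', 'b', '', 'd']
```

The list comprehension is equivalent to list(filter(None, ...)) here; which option fits depends on whether an empty field in the middle carries meaning, such as a missing value.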

Strip removing more characters than expected

Can anyone explain what's going on here:
s = 'REFPROP-MIX:METHANOL&WATER'
s.lstrip('REFPROP-MIX') # this returns ':METHANOL&WATER' as expected
s.lstrip('REFPROP-MIX:') # returns 'THANOL&WATER'
What happened to that 'ME'? Is a colon a special character for lstrip? This is particularly confusing because this works as expected:
s = 'abc-def:ghi'
s.lstrip('abc-def') # returns ':ghi'
s.lstrip('abc-def:') # returns 'ghi'
str.lstrip removes all the characters in its argument from the string, starting at the left. Since all the characters in the left prefix "REFPROP-MIX:ME" are in the argument "REFPROP-MIX:", all those characters are removed. Likewise:
>>> s = 'abcadef'
>>> s.lstrip('abc')
'def'
>>> s.lstrip('cba')
'def'
>>> s.lstrip('bacabacabacabaca')
'def'
str.lstrip does not remove whole strings (of length greater than 1) from the left. If you want to do that, use a regular expression with an anchor ^ at the beginning:
>>> import re
>>> s = 'REFPROP-MIX:METHANOL&WATER'
>>> re.sub(r'^REFPROP-MIX:', '', s)
'METHANOL&WATER'
The method mentioned by @PadraicCunningham is a good workaround for the particular problem as stated.
Just split by the separating character and select the last value:
s = 'REFPROP-MIX:METHANOL&WATER'
res = s.split(':', 1)[-1] # 'METHANOL&WATER'
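If you would rather avoid a regular expression, two plain-string alternatives are sketched below: an explicit prefix check with slicing, and str.partition(). The helper name strip_prefix is hypothetical; on Python 3.9 and later the built-in str.removeprefix() does the same job as the helper.

```python
def strip_prefix(text, prefix):
    """Remove prefix from text only if text starts with exactly that prefix."""
    if prefix and text.startswith(prefix):
        return text[len(prefix):]
    return text


s = 'REFPROP-MIX:METHANOL&WATER'
print(strip_prefix(s, 'REFPROP-MIX:'))   # METHANOL&WATER
print(s.lstrip('REFPROP-MIX:'))          # THANOL&WATER (character-set behaviour)

# partition() splits on the first separator only and always returns three parts.
head, sep, tail = s.partition(':')
print(tail)                              # METHANOL&WATER
```

Note that partition() leaves tail empty when the separator is missing, whereas s.split(':', 1)[-1] returns the whole string in that case.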

Understanding string method strip

After initializing a variable x with the content shown below, I applied strip with a parameter. The result of strip is unexpected: when I try to strip "ios_static_analyzer/", "rity/ios_static_analyzer/" gets stripped instead.
Please help me understand why.
>>> print x
/Users/msecurity/Desktop/testspace/Hy5_Workspace/security/ios_static_analyzer/
>>> print x.strip()
/Users/msecurity/Desktop/testspace/Hy5_Workspace/security/ios_static_analyzer/
>>> print x.strip('/')
Users/msecurity/Desktop/testspace/Hy5_Workspace/security/ios_static_analyzer
>>> print x.strip('ios_static_analyzer/')
Users/msecurity/Desktop/testspace/Hy5_Workspace/secu
>>> print x.strip('analyzer/')
Users/msecurity/Desktop/testspace/Hy5_Workspace/security/ios_static_
>>> print x.strip('_analyzer/')
Users/msecurity/Desktop/testspace/Hy5_Workspace/security/ios_static
>>> print x.strip('static_analyzer/')
Users/msecurity/Desktop/testspace/Hy5_Workspace/security/io
>>> print x.strip('_static_analyzer/')
Users/msecurity/Desktop/testspace/Hy5_Workspace/security/io
>>> print x.strip('s_static_analyzer/')
Users/msecurity/Desktop/testspace/Hy5_Workspace/security/io
>>> print x.strip('os_static_analyzer/')
Users/msecurity/Desktop/testspace/Hy5_Workspace/secu
Quoting from str.strip docs
Return a copy of the string with the leading and trailing characters
removed. The chars argument is a string specifying the set of
characters to be removed. If omitted or None, the chars argument
defaults to removing whitespace. The chars argument is not a prefix or
suffix; rather, all combinations of its values are stripped:
So it removes any of the characters in the argument from both ends of the string.
For example,
my_str = "abcd"
print my_str.strip("da") # bc
Note: You can think of it like this: it stops removing characters from an end of the string as soon as it finds a character that is not in the argument string.
To actually remove a particular substring, you should use str.replace:
x = "/Users/msecurity/Desktop/testspace/Hy5_Workspace/security/ios_static_analyzer/"
print x.replace('analyzer/', '')
# /Users/msecurity/Desktop/testspace/Hy5_Workspace/security/ios_static_
But replace will remove the matches everywhere,
x = "abcd1abcd2abcd"
print x.replace('abcd', '') # 12
But if you want to remove words only at the beginning and ending of the string, you can use RegEx, like this
import re
pattern = re.compile("^{0}|{0}$".format("abcd"))
x = "abcd1abcd2abcd"
print pattern.sub("", x) # 1abcd2
What you need, I think, is replace:
>>> x.replace('ios_static_analyzer/','')
'/Users/msecurity/Desktop/testspace/Hy5_Workspace/security/'
string.replace(s, old, new[, maxreplace])
Return a copy of string s with all occurrences of substring old replaced by new.
So you can replace your string with nothing and get the desired output.
In Python, x.strip(s) removes from the beginning and the end of the string x any character appearing in s. So s is just a set of characters, not a substring to be matched.
string.strip removes a set of characters given as an argument. The chars argument is not a prefix or suffix; rather, all combinations of its values are stripped.
strip does not remove the string given as argument from the object; it removes the characters in the argument.
In this case, strip sees the string s_static_analyzer/ as an iterable of characters that needs to be stripped.
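For removing a whole substring only at the ends of a string, without a regular expression, a small helper built on startswith() and endswith() is one option. This is a sketch under the assumption that at most one occurrence should be removed from each end; strip_affix is a hypothetical name, not a standard-library function.

```python
def strip_affix(text, affix):
    """Remove one exact occurrence of affix from the start and from the end."""
    if not affix:
        return text
    if text.startswith(affix):
        text = text[len(affix):]
    if text.endswith(affix):
        text = text[:-len(affix)]
    return text


x = "/Users/msecurity/Desktop/testspace/Hy5_Workspace/security/ios_static_analyzer/"
print(strip_affix(x, "ios_static_analyzer/"))
# /Users/msecurity/Desktop/testspace/Hy5_Workspace/security/

print(strip_affix("abcd1abcd2abcd", "abcd"))   # 1abcd2
```

Unlike str.replace, this leaves occurrences in the middle of the string alone, and unlike str.strip it only matches the argument as a whole.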

Strip all characters after the final dash in a string in Python, and test if numeric?

If I have a string:
string = 'this-is-a-string-125'
How can I grab the last set of characters after the dash and check if they are digits?
If you want to verify that they are actually digits, you can do
x.rsplit('-', 1)[1].isdigit()
"Numeric" is a more general criterion that could be interpreted in different ways. For instance, "12.87" is numeric in some sense, but not all of its characters are digits.
You can do int(x.rsplit('-', 1)[1]) to see if the string can be interpreted as an integer, or float(x.rsplit('-', 1)[1]) to see if it can be interpreted as a float. (These will raise a ValueError if the string isn't numeric in the appropriate sense, so you can catch that exception and do whatever you need to do if it's not numeric.)
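The try/except approach described above might look like the following sketch. The function name trailing_int and the choice to return None on failure are assumptions made for the example; it uses rpartition() so that a string without any dash does not raise an IndexError.

```python
def trailing_int(text, sep='-'):
    """Return the integer after the last sep in text, or None if there is none."""
    _, found, tail = text.rpartition(sep)
    if not found:              # no separator in the string at all
        return None
    try:
        return int(tail)       # raises ValueError for non-integer text
    except ValueError:
        return None


print(trailing_int('this-is-a-string-125'))   # 125
print(trailing_int('this-is-a-string'))       # None
print(trailing_int('version-12.87'))          # None
```

Swapping int for float in the helper would accept '12.87' as well, depending on which sense of "numeric" you need.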
s = 'this-is-a-string-125'.split('-')[-1].isdigit()
We split the string by dash ('-') which gives a list of substrings (see split()). We then take the last one ([-1]) and we verify that that string contains only digits (isdigit()):
>>> 'this-is-a-string-125'.split('-')
['this', 'is', 'a', 'string', '125']
>>> 'this-is-a-string-125'.split('-')[-1]
'125'
>>> 'this-is-a-string-125'.split('-')[-1].isdigit()
True
Nobody knows about partition or rpartition:
text.rpartition("-")[-1].isdigit()
How about:
string.split('-')[-1].isdigit()
Seems like a simple regex can do both the stripping and checking:
>>> import re
>>> s = 'this-is-a-string-125'
>>> m = re.search(r'-(\d+)$', s)
>>> m.group(1)
'125'
>>> s[:m.start()] # gives you what was stripped away.
'this-is-a-string'
Match object m will be None if the string lacks a dash character followed by one or more digits at the end.
