.strip('{}:'.format(key)) sometimes strips last characters [duplicate] - python

This question already has answers here:
Python strip unexpected behavior
(2 answers)
Closed 4 years ago.
value = div.xpath('normalize-space(.)').extract()[0].strip('{}:'.format(key)).strip()
The code above sometimes strips the last character from the word. When I remove everything after extract(), all the data comes back intact, but as a list.
Example :
Unknown from Duration: Unknown turns into Unknow
Movie from Type: Movie turns into Movi
Why does this happen?
I tried this in the Python shell, and it also strips the last character:
>>> value = ['Type: Movie']
>>> value[0].strip('{}:'.format('Type')).strip()
'Movi'
I expected it to return Movie, without the e being stripped.
It seems that .strip('{}:'.format('Type')) is responsible. When I removed the final strip(), it only returned the data with its surrounding spaces.
Edit: It seems that strip() treats the string it is given as a set of characters and removes any of them from both ends, instead of removing the exact string. That is why the data came out broken. I think splitting the string and then slicing is a better approach.
Edit 2:
It seems the answers by Austin and Pankaj Singhal are good and bug-free for my use case.

Use a split on 'Type: ' and take the second item:
value = ['Type: Movie']
print(value[0].split('Type: ')[1])
# Movie
Regarding your code: strip is not meant for what you are trying to do. It only removes characters from the ends of the string, and it treats its argument as a set of characters, not as an exact substring.
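A portable way to remove the exact "Key:" label, rather than a set of characters, is to test it with startswith and slice it off. This is a minimal sketch; the helper name strip_key_prefix is hypothetical, and on Python 3.9+ str.removeprefix performs the same exact-prefix removal.

```python
def strip_key_prefix(text, key):
    """Remove a leading "key:" label exactly once (hypothetical helper)."""
    prefix = "{}:".format(key)
    # startswith compares the whole prefix as one substring, unlike strip(),
    # which treats its argument as a set of individual characters
    if text.startswith(prefix):
        text = text[len(prefix):]
    # Only now trim the whitespace that followed the colon
    return text.strip()


print(strip_key_prefix("Type: Movie", "Type"))            # Movie
print(strip_key_prefix("Duration: Unknown", "Duration"))  # Unknown
print("Type: Movie".strip("Type:").strip())               # Movi (the original bug)
```

Text without the label is returned unchanged apart from whitespace trimming, so the helper is safe to call on every value.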

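You could use lstrip (which returns a copy of the string with only the leading characters removed) instead of strip (which returns a copy with both leading and trailing characters removed). Note that lstrip still works on a set of characters; it works here because the space after the colon is not in that set, so stripping stops there: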
>>> 'Type: Movie'.lstrip("Type:").strip()
'Movie'
>>> 'Type: Something with Type'.lstrip("Type:").strip()
'Something with Type'
>>> 'Type: Something with Type:'.lstrip("Type:").strip()
'Something with Type:'
>>>

OR:
>>> value = ['Type: Movie']
>>> value[0][value[0].find(':')+2:]
'Movie'
>>>
And of course, this is another option similar to the first one, just using lstrip:
>>> value[0][value[0].find(':')+1:].lstrip()
'Movie'
>>>
OR:
>>> value[0].lstrip(value[0][:value[0].find(':')+2])
'Movie'
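Note: find can be replaced with index here; index raises ValueError when there is no colon, whereas find returns -1 and the slice then silently starts at the wrong position. Also, the last option passes the prefix to lstrip, so it still strips a set of characters: a value that begins with any of those characters loses them too (for example, 'Type: eye' becomes an empty string).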
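To make the missing-colon case explicit, here is a minimal sketch using str.partition, which splits only at the first colon and reports whether one was found. The helper name after_colon is hypothetical.

```python
def after_colon(text):
    """Return the text after the first ':' (trimmed), or the whole text if there is none."""
    head, sep, tail = text.partition(":")
    # sep is '' when no colon was found; fall back to the original text
    return tail.strip() if sep else text.strip()


no_colon = "Movie"
print(no_colon[no_colon.find(":") + 2:])                  # ovie (find returned -1)
print(after_colon("Type: Movie"))                         # Movie
print(after_colon(no_colon))                              # Movie
print(after_colon("Type: Movie with Type: in its name"))  # Movie with Type: in its name
```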

str.strip does not strip that exact string; it strips each character in that string. That is, strip("Type:") removes any T, y, p, e or : from the beginning and end of the string, repeatedly, until it reaches a character that is not in that set.
Instead, you could use a regular expression with the ^ anchor to only match substrings at the beginning of the string.
>>> import re
>>> value = ['Type: Movie with Type: in its name']
>>> key = "Type"
>>> re.sub(r"^{}: ".format(key), "", value[0])
'Movie with Type: in its name'
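If key can contain characters that are special in regular expressions (such as ., ( or +), it should be escaped before being formatted into the pattern. A minimal sketch with re.escape; the function name strip_key is hypothetical, and allowing any amount of whitespace after the colon with \s* is an assumption about the data.

```python
import re


def strip_key(text, key):
    """Remove a leading "key:" label plus following whitespace (hypothetical helper)."""
    # re.escape makes characters such as ".", "(" or "+" in key match literally;
    # the ^ anchor restricts the match to the start of the string
    pattern = r"^{}:\s*".format(re.escape(key))
    return re.sub(pattern, "", text, count=1)


print(strip_key("Type: Movie with Type: in its name", "Type"))  # Movie with Type: in its name
print(strip_key("Rating (avg.): 4.5", "Rating (avg.)"))          # 4.5
```

Without re.escape, the parentheses in "Rating (avg.)" would form a regex group and the literal label would not match.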

Related

Strip "m2" characters from string in Python

I'm trying to remove the "m2" characters from a string using Python. This is the code I'm using right now. Unfortunately, it appears to do nothing to the string.
Typically the string I would like to strip looks as follows: 502m2, 3m2, ...
if "m2" in messageContent:
messageContent = messageContent.translate(None, 'm2')
str.translate() is not the correct tool here; you are removing all m and all 2 characters regardless of their context.
If you need to remove the literal text 'm2', just use str.replace():
messageContent = messageContent.replace('m2', '')
You don't even need to test first; str.replace() will return the string unchanged if there are no instances of the literal text present:
>>> '502m2, 3m2'.replace('m2', '')
'502, 3'
>>> 'The quick brown fox'.replace('m2', '')
'The quick brown fox'
Just use str.replace
Return a copy of the string with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.
s = "502m2, 3m2"
print(s.replace("m2", ""))
502, 3
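A short Python 3 sketch contrasting the two tools on the example string: str.replace with and without the optional count argument, and the Python 3 spelling of the question's translate call (str.maketrans with a third argument naming characters to delete), which shows why translate removes every m and every 2 rather than the unit "m2".

```python
s = "502m2, 3m2"

# str.replace removes every occurrence of the literal substring "m2"
print(s.replace("m2", ""))     # 502, 3

# The optional count argument limits how many occurrences are replaced
print(s.replace("m2", "", 1))  # 502, 3m2

# Python 3 form of translate(None, 'm2'): the third maketrans argument lists
# characters to delete, so every 'm' and every '2' disappears, wherever it is
delete_m_and_2 = str.maketrans("", "", "m2")
print(s.translate(delete_m_and_2))  # 50, 3
```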

Python - regex, blank element at the end of the list?

I have this code:
print(re.split(r"[\s\?\!\,\;]+", "Holy moly, feferoni!"))
which results in
['Holy', 'moly', 'feferoni', '']
How can I get rid of this last blank element, and what caused it?
If this is a dirty way to remove punctuation and spaces from a string, how else could I write it, still using a regex?
Expanding on what @HamZa said in his comment, you would use re.findall with a negated character class:
>>> from re import findall
>>> findall(r"[^\s?!,;]+", "Holy moly, feferoni!")
['Holy', 'moly', 'feferoni']
>>>
You get the empty string as the last element of your list because the RegEx splits at the final !. The split gives you what comes before the ! and what comes after it, and after it there is simply nothing, i.e. an empty string. You could get the same problem in the middle of the string if you hadn't wisely added the + to your RegEx.
If you want to elegantly get rid of the optional empty string, do:
list(filter(None, re.split(r"[\s?!,;]+", "Holy moly, feferoni!")))
This will result in:
['Holy', 'moly', 'feferoni']
In Python 3, filter returns an iterator, which is why the call is wrapped in list(); you can drop the list() call if an iterator is all you need.
What this does is remove every element that is not truthy. filter normally keeps only the elements for which a given function returns a true value, but if you pass None it tests the truthiness of each element itself. Because an empty string is falsy and every other string is truthy, every empty string is removed from the list.
Also note I removed the escaping of special characters in the character class, as it is simply not necessary and just makes the RegEx harder to read.
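To see where the empty strings come from and how each fix behaves, here is a short Python 3 sketch (SEPARATORS is just a local constant for this example). It also shows that a leading separator produces an empty string at the start, which filter(None, ...) removes as well.

```python
import re

SEPARATORS = r"[\s?!,;]+"

text = "Holy moly, feferoni!"
# re.split yields an empty string after the trailing "!" separator
parts = re.split(SEPARATORS, text)
print(parts)  # ['Holy', 'moly', 'feferoni', '']

# A leading separator produces an empty string at the start as well
print(re.split(SEPARATORS, "?Holy moly!"))  # ['', 'Holy', 'moly', '']

# filter(None, ...) drops every falsy (empty) element; list() materializes the iterator
words = list(filter(None, parts))
print(words)  # ['Holy', 'moly', 'feferoni']

# findall with the negated class matches the words directly, so no empties appear
print(re.findall(r"[^\s?!,;]+", "?Holy moly!"))  # ['Holy', 'moly']
```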
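The first thing that comes to my mind is something like this:
>>> import re
>>> mystring = re.split(r"[\s\?\!\,\;]+", "Holy moly, feferoni!")
>>> mystring
['Holy', 'moly', 'feferoni', '']
>>> mystring.pop()
''
>>> print(mystring)
['Holy', 'moly', 'feferoni']
Note that this removes the last element unconditionally, so it is only correct when the string is known to end with a separator.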
Or, as a one-liner:
__import__('re').findall(r'[^\s?!,;]+', 'Holy moly, feferoni!')

Understanding string method strip

After initializing a variable x with the content shown below, I applied strip with an argument, and the result is unexpected. When I try to strip "ios_static_analyzer/", "rity/ios_static_analyzer/" gets stripped.
Kindly help me understand why.
>>> print x
/Users/msecurity/Desktop/testspace/Hy5_Workspace/security/ios_static_analyzer/
>>> print x.strip()
/Users/msecurity/Desktop/testspace/Hy5_Workspace/security/ios_static_analyzer/
>>> print x.strip('/')
Users/msecurity/Desktop/testspace/Hy5_Workspace/security/ios_static_analyzer
>>> print x.strip('ios_static_analyzer/')
Users/msecurity/Desktop/testspace/Hy5_Workspace/secu
>>> print x.strip('analyzer/')
Users/msecurity/Desktop/testspace/Hy5_Workspace/security/ios_static_
>>> print x.strip('_analyzer/')
Users/msecurity/Desktop/testspace/Hy5_Workspace/security/ios_static
>>> print x.strip('static_analyzer/')
Users/msecurity/Desktop/testspace/Hy5_Workspace/security/io
>>> print x.strip('_static_analyzer/')
Users/msecurity/Desktop/testspace/Hy5_Workspace/security/io
>>> print x.strip('s_static_analyzer/')
Users/msecurity/Desktop/testspace/Hy5_Workspace/security/io
>>> print x.strip('os_static_analyzer/')
Users/msecurity/Desktop/testspace/Hy5_Workspace/secu
Quoting from the str.strip docs:
Return a copy of the string with the leading and trailing characters removed. The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace. The chars argument is not a prefix or suffix; rather, all combinations of its values are stripped:
So it removes any of the characters in the argument from both ends of the string.
For example,
my_str = "abcd"
print(my_str.strip("da"))  # bc
Note: you can think of it like this: on each side, it stops removing characters as soon as it reaches a character that is not in the argument string.
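The stopping rule can be made concrete with a small pure-Python re-implementation. This is an illustrative sketch of the documented behavior (strip_chars is a hypothetical name), not CPython's actual implementation.

```python
def strip_chars(s, chars):
    """Illustrative re-implementation of str.strip(chars): chars is a *set*."""
    char_set = set(chars)
    start, end = 0, len(s)
    # Advance from the left while the current character is in the set
    while start < end and s[start] in char_set:
        start += 1
    # Move back from the right while the last remaining character is in the set
    while end > start and s[end - 1] in char_set:
        end -= 1
    return s[start:end]


x = "/Users/msecurity/Desktop/testspace/Hy5_Workspace/security/ios_static_analyzer/"
print(strip_chars("abcd", "da"))  # bc
print(strip_chars(x, "ios_static_analyzer/"))
# Users/msecurity/Desktop/testspace/Hy5_Workspace/secu
```

On the right side, "rity" disappears because r, i, t and y all occur in "ios_static_analyzer/"; the loop only stops at the u.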
To actually remove a particular substring, use str.replace:
x = "/Users/msecurity/Desktop/testspace/Hy5_Workspace/security/ios_static_analyzer/"
print(x.replace('analyzer/', ''))
# /Users/msecurity/Desktop/testspace/Hy5_Workspace/security/ios_static_
However, replace removes the matches everywhere:
x = "abcd1abcd2abcd"
print(x.replace('abcd', ''))  # 12
If you only want to remove the string at the beginning and end, you can use a regular expression, like this:
import re
pattern = re.compile("^{0}|{0}$".format(re.escape("abcd")))
x = "abcd1abcd2abcd"
print(pattern.sub("", x))  # 1abcd2
What you need, I think, is replace:
>>> x.replace('ios_static_analyzer/','')
'/Users/msecurity/Desktop/testspace/Hy5_Workspace/security/'
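From the Python 2 string module docs (in Python 3 this is the str.replace(old, new[, count]) method):
string.replace(s, old, new[, maxreplace])
Return a copy of string s with all occurrences of substring old replaced by new.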
So you can replace your string with nothing and get the desired output.
In Python, x.strip(s) removes any character appearing in s from the beginning and the end of the string x! So s is just a set of characters, not a substring to be matched.
string.strip removes a set of characters given as an argument. The chars argument is not a prefix or suffix; rather, all combinations of its values are stripped.
strip does not remove the string given as argument from the object; it removes the characters in the argument.
In this case, strip sees the string s_static_analyzer/ as an iterable of characters that needs to be stripped.
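For the path in this question, the exact-suffix alternative is str.removesuffix, available from Python 3.9; the endswith-and-slice line is an equivalent for older versions. A minimal sketch contrasting both with strip:

```python
x = "/Users/msecurity/Desktop/testspace/Hy5_Workspace/security/ios_static_analyzer/"
suffix = "ios_static_analyzer/"

# strip() treats suffix as a set of characters and eats into "security"
print(x.strip(suffix))
# Users/msecurity/Desktop/testspace/Hy5_Workspace/secu

# removesuffix (Python 3.9+) removes the exact text, and only at the end
print(x.removesuffix(suffix))
# /Users/msecurity/Desktop/testspace/Hy5_Workspace/security/

# Equivalent for older versions: check endswith, then slice off len(suffix) characters
# (the "suffix and" guard avoids x[:-0], which would return an empty string)
trimmed = x[:-len(suffix)] if suffix and x.endswith(suffix) else x
print(trimmed == x.removesuffix(suffix))  # True
```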

Select last chars of string until whitespace in Python [duplicate]

This question already has answers here:
Python: Cut off the last word of a sentence?
(10 answers)
Closed 8 years ago.
Is there any efficient way to select the last characters of a string until there's a whitespace in Python?
For example I have the following string:
str = 'Hello my name is John'
I want to return 'John'. But if the str was:
str = 'Hello my name is Sally'
I want to return 'Sally'
Just split the string on whitespace and take the last element of the list, or use rsplit() to start splitting from the end:
>>> st = 'Hello my name is John'
>>> st.rsplit(' ', 1)
['Hello my name is', 'John']
>>>
>>> st.rsplit(' ', 1)[1]
'John'
The second argument specifies the maximum number of splits. Since you just want the last element, you only need to split once.
As mentioned in the comments, you can pass None as the first argument, in which case any run of whitespace is used as the delimiter:
>>> st.rsplit(None, 1)[-1]
'John'
Using -1 as the index is safe even if there is no whitespace in your string, since rsplit then returns a one-element list.
It really depends what you mean by efficient, but the simplest (efficient use of programmer time) way I can think of is:
str.split()[-1]
This raises an IndexError for empty (or whitespace-only) strings, so you'll want to check for that.
I think this is what you want:
str[str.rfind(' ')+1:]
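This creates a substring of str starting at the character after the right-most space and running to the end of the string.
This works for all strings, empty or otherwise (unless it's not a string object; e.g. a None object would raise an error). Note that it only looks for a literal space, so tabs and newlines are not treated as separators, and a trailing space yields an empty result.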
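The approaches differ on edge cases. Here is a small sketch built on rsplit(None, 1) that also tolerates empty input (last_word is a hypothetical helper name), compared with the rfind slice on a string that ends with a space:

```python
def last_word(text):
    """Return the text after the last run of whitespace ('' for empty or blank input)."""
    # rsplit(None, 1) splits once from the right on any whitespace and
    # ignores trailing whitespace; it returns [] for an empty or blank string
    parts = text.rsplit(None, 1)
    return parts[-1] if parts else ""


print(last_word("Hello my name is John"))  # John
print(last_word("Sally"))                  # Sally
print(repr(last_word("")))                 # ''
print(last_word("trailing space "))        # space

s = "trailing space "
print(repr(s[s.rfind(" ") + 1:]))          # '' (the rfind slice stops at the trailing space)
```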

Strip all characters after the final dash in a string in Python, and test if numeric?

If I have a string:
string = 'this-is-a-string-125'
How can I grab the last set of characters after the dash and check if they are digits?
If you want to verify that they are actually digits (where x is your string), you can do
x.rsplit('-', 1)[1].isdigit()
"Numeric" is a more general criterion that could be interpreted in different ways. For instance, "12.87" is numeric in some sense, but not all of its characters are digits.
You can do int(x.rsplit('-', 1)[1]) to see if the string can be interpreted as an integer, or float(x.rsplit('-', 1)[1]) to see if it can be interpreted as a float. (These raise a ValueError if the string isn't numeric in the appropriate sense, so you can catch that exception and handle the non-numeric case however you need.)
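Following the try/except suggestion above, here is a minimal sketch (the helper name trailing_number and its parse parameter are hypothetical). It uses rpartition, so a string without any dash does not raise IndexError (the whole string is then treated as the tail), and parses the tail as an int or float.

```python
def trailing_number(text, parse=int):
    """Parse the part after the last '-' with `parse`; return None if it fails."""
    # rpartition never raises: with no '-', the whole string ends up in the tail
    tail = text.rpartition("-")[2]
    try:
        return parse(tail)
    except ValueError:
        return None


print(trailing_number("this-is-a-string-125"))           # 125
print(trailing_number("this-is-a-string-12.87"))         # None (not an integer)
print(trailing_number("this-is-a-string-12.87", float))  # 12.87
print(trailing_number("this-is-a-string-abc"))           # None
```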
s = 'this-is-a-string-125'.split('-')[-1].isdigit()
We split the string by dash ('-') which gives a list of substrings (see split()). We then take the last one ([-1]) and we verify that that string contains only digits (isdigit()):
>>> 'this-is-a-string-125'.split('-')
['this', 'is', 'a', 'string', '125']
>>> 'this-is-a-string-125'.split('-')[-1]
'125'
>>> 'this-is-a-string-125'.split('-')[-1].isdigit()
True
Nobody knows about partition or rpartition:
text.rpartition("-")[-1].isdigit()
How about (where string is the variable from the question):
string.split('-')[-1].isdigit()
Seems like a simple regex can do both the stripping and checking:
>>> import re
>>> s = 'this-is-a-string-125'
>>> m = re.search(r'-(\d+)$', s)
>>> m.group(1)
'125'
>>> s[:m.start()] # gives you what was stripped away.
'this-is-a-string'
Match object m will be None if the string lacks a dash character followed by one or more digits at the end.
