"/1/2/3/".split("/") - python

It's too hot and I'm probably missing something obvious.
>>> "/1/2/3/".split("/")
['', '1', '2', '3', '']
What's with the empty elements at the start and end?
Edit: Thanks all, I'm putting this down to heat-induced brain failure. The docs aren't the clearest, though. From http://docs.python.org/library/stdtypes.html:
"Return a list of the words in the string, using sep as the delimiter string"
Is there a word before the first, or after the last "/"?

Compare with:
"1/2/3".split("/")
Empty elements are still elements.
You could use strip('/') to trim the delimiter from the beginning/end of your string.
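A minimal sketch of the strip-then-split approach. One caveat worth knowing: if nothing is left after stripping (for example the input is just "/"), you still get [''] rather than [], because splitting an empty string with an explicit separator returns [''].

```python
path = "/1/2/3/"

# Remove the leading/trailing "/" first, then split on the remaining separators.
parts = path.strip("/").split("/")
print(parts)  # ['1', '2', '3']

# Edge case: stripping leaves "", and "".split("/") returns [''], not [].
edge = "/".strip("/").split("/")
print(edge)  # ['']
```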

As JLWarlow says, you have an extra '/' in the string. Here's another example:
>>> "//2//3".split('/')
['', '', '2', '', '3']

Slashes are separators, so there are empty elements before the first and after the last.

You're splitting on '/'. You have four '/' characters, so the returned list will have five elements.
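The counting rule can be checked directly: with an explicit, non-empty separator and no maxsplit, split never drops pieces, so n separators always give n + 1 pieces (some of them possibly empty). A small check using the strings from this thread:

```python
# n separators -> n + 1 pieces when sep is given explicitly and maxsplit is not used.
results = {}
for s in ["/1/2/3/", "1/2/3", "//2//3"]:
    pieces = s.split("/")
    assert len(pieces) == s.count("/") + 1
    results[s] = pieces
    print(repr(s), "->", pieces)
```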

That is exactly what I would expect, but we are all different :)
What would you expect from "1,,2,3".split(",")?

You can use strip('/') to get rid of the leading and trailing separators, then call split() as before.

[x for x in "//1///2/3///".split("/") if x != ""]

Related

Python 3 split()

When I'm splitting a string "abac" I'm getting undesired results.
Example
print("abac".split("a"))
Why does it print:
['', 'b', 'c']
instead of
['b', 'c']
Can anyone explain this behavior and guide me on how to get my desired output?
Thanks in advance.
As @DeepSpace pointed out (referring to the docs):
If sep is given, consecutive delimiters are not grouped together and are deemed to delimit empty strings (for example, '1,,2'.split(',') returns ['1', '', '2']).
Therefore I'd suggest using a better delimiter, such as a comma, or, if this is the format you're stuck with, you could use the built-in filter() function as suggested in another answer; passing None as the function removes any "empty" strings.
sample = 'abac'
# filter() returns an iterator in Python 3, so wrap it in list() to see the items
filtered_sample = list(filter(None, sample.split('a')))
print(filtered_sample)
# ['b', 'c']
When you split a string in Python, you keep everything between your delimiters (even when it's an empty string!).
For example, if you had a list of letters separated by commas:
>>> "a,b,c,d".split(',')
['a', 'b', 'c', 'd']
If your list had some missing values, you might leave the space in between the commas blank:
>>> "a,b,,d".split(',')
['a', 'b', '', 'd']
The start and end of the string act as delimiters themselves, so if you have a leading or trailing delimiter you will also get this "empty string" sliced out of your main string:
>>> "a,b,c,d,,".split(',')
['a', 'b', 'c', 'd', '', '']
>>> ",a,b,c,d".split(',')
['', 'a', 'b', 'c', 'd']
If you want to get rid of any empty strings in your output, you can use the filter function.
If instead you just want to get rid of this behavior near the edges of your main string, you can strip the delimiters off first:
>>> ",,a,b,c,d".strip(',')
'a,b,c,d'
>>> ",,a,b,c,d".strip(',').split(',')
['a', 'b', 'c', 'd']
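One thing to keep in mind, sketched below with a made-up input: strip() only removes delimiters at the edges, so empty strings produced by consecutive delimiters in the middle survive. To drop those as well, filter after splitting.

```python
line = ",,a,,b,"

# strip() only trims the ends; the ",," in the middle still yields an empty field.
edges_only = line.strip(",").split(",")
print(edges_only)  # ['a', '', 'b']

# A filtering comprehension removes every empty field, wherever it occurs.
no_empties = [field for field in line.split(",") if field]
print(no_empties)  # ['a', 'b']
```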
In your example, "a" is what's called a delimiter. It acts as a boundary between the characters before it and the characters after it. So when you call split, it takes the characters before each "a" and after it and inserts them into the list. Since there's nothing in front of the first "a" in the string "abac", split produces an empty string and inserts it into the list.
split will return the characters between the delimiters you specify (or between an end of the string and a delimiter), even if there aren't any, in which case it will return an empty string. (See the documentation for more information.)
In this case, if you don't want any empty strings in the output, you can use filter to remove them:
list(filter(lambda s: len(s) > 0, "abac".split("a")))
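To make the "characters between the delimiters" idea concrete, here is a simplified pure-Python sketch of what str.split(sep) does for a non-empty sep and no maxsplit. The function name naive_split is hypothetical, and this is an illustration of the behaviour, not CPython's actual implementation.

```python
def naive_split(s: str, sep: str) -> list:
    """Illustrative re-implementation of s.split(sep) for a non-empty sep."""
    if not sep:
        raise ValueError("empty separator")
    pieces = []
    start = 0
    while True:
        idx = s.find(sep, start)
        if idx == -1:
            # No more separators: whatever is left (possibly "") is the last piece.
            pieces.append(s[start:])
            return pieces
        # Text between the previous boundary and this separator; this is ""
        # when the separator is at the very start or follows another separator.
        pieces.append(s[start:idx])
        start = idx + len(sep)


print(naive_split("abac", "a"))  # ['', 'b', 'c']
```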

How to remove falsy values when splitting a string with a non-whitespace separator

According to the docs:
str.split(sep=None, maxsplit=-1)
If sep is given, consecutive delimiters are not grouped together and
are deemed to delimit empty strings (for example, '1,,2'.split(',')
returns ['1', '', '2']). The sep argument may consist of multiple
characters (for example, '1<>2<>3'.split('<>') returns ['1', '2', '3']).
Splitting an empty string with a specified separator returns
[''].
If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace. Consequently, splitting an empty string or a string consisting of just whitespace with a None separator returns [].
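The two algorithms described in the quoted documentation can be compared side by side; each line below is one of the cases the docs mention.

```python
# sep given: consecutive delimiters are NOT grouped, so empty strings appear.
explicit = '1,,2'.split(',')          # ['1', '', '2']
multi_char = '1<>2<>3'.split('<>')    # ['1', '2', '3'] (multi-character sep)
empty_explicit = ''.split(',')        # ['']

# sep omitted (None): runs of whitespace act as one separator,
# and no empty strings are produced at the start or end.
implicit = '  1  2 '.split()          # ['1', '2']
empty_implicit = ''.split()           # []
blank_implicit = '   '.split()        # []

print(explicit, multi_char, empty_explicit)
print(implicit, empty_implicit, blank_implicit)
```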
So, when using the sep= keyword argument, is the following the Pythonic way to remove the falsy items?
[w for w in s.strip().split(' ') if w]
If it's just whitespace (spaces, \t, \n), str.split() will suffice, but if we are splitting on another character or substring, the if-condition in the list comprehension is necessary. Is that right?
If you want to be obtuse, you could use filter(None, x) to remove falsey items:
>>> list(filter(None, '1,2,,3,'.split(',')))
['1', '2', '3']
Probably less Pythonic. It might be clearer to iterate over the items specifically:
for w in '1,2,,3,'.split(','):
    if w:
        …
This makes it clear that you're skipping the empty items and not relying on the fact that str.split sometimes skips empty items.
I'd just as soon use a regex, either to skip consecutive runs of the separator (but watch out for the end):
>>> import re
>>> re.split(r',+', '1,2,,3,')
['1', '2', '3', '']
or to find everything that's not a separator:
>>> re.findall(r'[^,]+', '1,2,,3,')
['1', '2', '3']
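For reference, the approaches from this answer can be checked against each other on the same sample string. Note how re.split with ',+' still leaves one trailing empty string, while the other three do not.

```python
import re

s = '1,2,,3,'

comprehension = [w for w in s.split(',') if w]   # explicit truthiness test
filtered = list(filter(None, s.split(',')))     # filter(None, ...) drops falsy items
found = re.findall(r'[^,]+', s)                  # match runs of non-separators
regex_split = re.split(r',+', s)                 # collapse runs of separators

print(comprehension)  # ['1', '2', '3']
print(filtered)       # ['1', '2', '3']
print(found)          # ['1', '2', '3']
print(regex_split)    # ['1', '2', '3', '']  (the trailing ',' still yields '')
```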
If you want to go way back in Python's history, there were two separate functions, split and splitfields. I think the name explains the purpose. The first splits on any whitespace, useful for arbitrary text input, and the second behaves predictably on some delimited input. They were implemented in pure Python before v1.6.
Well, I think you might just need a hand in understanding the documentation. Your example pretty much demonstrates the two algorithms the documentation describes. Omitting the sep argument is roughly like using sep=' ' and then throwing out the empty strings (except that any run of whitespace, not just spaces, counts as a single separator). When you split explicitly on a single space and there are multiple spaces in a row, there is nothing between adjacent spaces, so split returns an empty string for each such gap. Returning an empty string here keeps the return type consistent: the method always returns a list of strings.
Below you can see how a string of three spaces is treated differently by the two algorithms:
>>> empty = '   '
>>> s = 'this is an irritating string with random spacing .'
>>> empty.split()
[]
>>> empty.split(' ')
['', '', '', '']
For your question, just use split() with no sep argument.
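To see the difference on a string with uneven spacing (the sample below is made up, since the exact spacing of the original string was lost):

```python
s = 'this  is   an irritating string'  # two spaces after "this", three after "is"

# Explicit single-space separator: every extra space produces an empty string.
with_sep = s.split(' ')
print(with_sep)     # ['this', '', 'is', '', '', 'an', 'irritating', 'string']

# No separator: runs of whitespace collapse, so no empty strings appear.
without_sep = s.split()
print(without_sep)  # ['this', 'is', 'an', 'irritating', 'string']
```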
Well, your string
s = 'this is an irritating string with random spacing .'
contains runs of more than one space; that's why splitting it with split(' ') returns empty strings.
You would have to remove the extra whitespace from s to get the desired result.

Split string in the Python script

I have the following string in the variable test:
1 - 10 of 20 Results
I want to split this string and need 20 as a result.
I am using the following script to achieve this:
result=test.split('of')
mid_result=result[1].split(" ")
final_result=mid_result[1]
Is there any way to achieve this in one line, or a more direct method?
Thanks.
You could do this:
result = test.split('of')[1].split(" ")[1]
But this would be faster:
result = test.split(' ')[4]
like this?
result = test.split('of')[1].split(" ")[1]
Splitting on whitespace seems a better idea. Thanks, Useless.
You can use the str.split method like this:
test.split(' ')[4]
You had it almost right, but there is no need to split it first by the word "of".
test.split(" ")[4]
result = test.split()[-2]
Use a negative index. It's much easier and more foolproof.
If you are sure that the format of the string remains the same for all the strings you are processing, it can be accomplished in either of the following ways:
test.split(' ')[4]
test.split(' ')[-2]
Explanation:
test.split(' ')
will return the list
['1', '-', '10', 'of', '20', 'Results']
The element at index 4 (counting from 0, left to right) and the element at index -2 (the second from the right) both give you the result.
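If the format really is fixed ("X - Y of N Results"), both indexing styles pick the same token. A small sketch that also converts the result to an int; the regex variant is an alternative not taken from the answers above, and it assumes the total is always the number right after the word "of":

```python
import re

test = "1 - 10 of 20 Results"

tokens = test.split()          # ['1', '-', '10', 'of', '20', 'Results']
assert tokens[4] == tokens[-2] == "20"
total = int(tokens[-2])
print(total)  # 20

# Alternative (assumes the format): capture the digits that follow "of".
match = re.search(r"\bof\s+(\d+)", test)
total_re = int(match.group(1)) if match else None
print(total_re)  # 20
```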

Python - regex, blank element at the end of the list?

I have this code:
print(re.split(r"[\s\?\!\,\;]+", "Holy moly, feferoni!"))
which results in:
['Holy', 'moly', 'feferoni', '']
How can I get rid of this last blank element, what caused it?
If this is a dirty way to get rid of punctuation and spaces in a string, how else could I write it, other than with a regex?
Expanding on what @HamZa said in his comment, you would use re.findall with a negated character set:
>>> from re import findall
>>> findall(r"[^\s?!,;]+", "Holy moly, feferoni!")
['Holy', 'moly', 'feferoni']
>>>
You get the empty string as the last element of your list because the RegEx splits at the final !. It gives you what's before the ! and what's after it, but after it there's simply nothing, i.e. an empty string. You might have the same problem in the middle of the string if you hadn't wisely added the + to your RegEx.
If you want to elegantly get rid of the optional empty string, do the following (filter returns an iterator in Python 3, so the call to list turns it back into a list; drop it if an iterator is fine for you):
list(filter(None, re.split(r"[\s?!,;]+", "Holy moly, feferoni!")))
This will result in:
['Holy', 'moly', 'feferoni']
What this does is remove every element that is falsy. The filter function normally keeps only the elements for which a given function returns a true value, but if you pass None it checks whether the value itself is truthy. Because an empty string is falsy and every non-empty string is truthy, it removes every empty string from the list.
Also note that I removed the escaping of special characters in the character class, as it is simply not necessary and just makes the RegEx harder to read.
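The same effect shows up at the start of the string when it begins with one of the separator characters. A sketch with a made-up input, showing both stray empty strings and the filter(None, ...) fix:

```python
import re

pattern = r"[\s?!,;]+"
text = "!Holy moly, feferoni!"

# A separator at either end produces an empty string at that end.
parts = re.split(pattern, text)
print(parts)  # ['', 'Holy', 'moly', 'feferoni', '']

# filter(None, ...) keeps only truthy items, i.e. the non-empty strings.
words = list(filter(None, parts))
print(words)  # ['Holy', 'moly', 'feferoni']
```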
The first thing that comes to my mind is something like this (it assumes the last element is always the empty one):
>>> import re
>>> mystring = re.split(r"[\s\?\!\,\;]+", "Holy moly, feferoni!")
>>> mystring
['Holy', 'moly', 'feferoni', '']
>>> mystring.pop()
''
>>> print(mystring)
['Holy', 'moly', 'feferoni']
__import__('re').findall(r'[^\s?!,;]+', 'Holy moly, feferoni!')

Python - removing characters from a list

I have a list of elements like this:
['1:{test}', '2:{test}', '4:{1989}', '9:{test}', '']
My question is:
How can I remove specific characters from the elements of this list?
As a result I want to have :
['test', 'test', '1989', 'test', '']
Any suggestions, solutions ?
Thanks in advance.
>>> re.findall(r'\{(.*)\}', '1:{test}')
['test']
Just make a loop with it:
[(re.findall(r'\{(.*)\}', i) or [''])[0] for i in your_list]
or maybe:
[''.join(re.findall(r'\{(.*)\}', i)) for i in your_list]
You could use a regular expression, like so:
import re
s = re.compile(r"\d+:{(.*)}")
data = ['1:{test}', '2:{test}', '4:{1989}', '9:{test}', '']
result = [s.match(d).group(1) if s.match(d) else d for d in data]
results in
['test', 'test', '1989', 'test', '']
Python's strip() method does exactly that (it removes specific characters from the ends of a string), but there are probably better ways to do what you want.
You haven't said exactly what the pattern is, or what you want if there are no braces, but this will work on your example:
import re

stripped = []
for x in my_data:
    # Capture only what is inside the braces; keep elements without braces unchanged.
    m = re.search(r"{(.*)}", x)
    stripped.append(m.group(1) if m else x)
import re
t = ['1:{test}', '2:{test}', '4:{1989}', '9:{test}', '']
# findall returns [] when there are no braces (e.g. ''), and ''.join([]) gives ''
list(map(lambda string: ''.join(re.findall(r'(?<=\{).+(?=\})', string)), t))
Granted, this is not the most well-formatted or easiest to read of answers. This maps an anonymous function that finds and returns what is inside the brackets to each element of the list, returning the whole list.
(?<=...) is a lookbehind: "match only if this comes right before, but don't include it in the result"
(?=...) is a lookahead: "match only if this comes right after, but don't include it in the result"
.+ means "at least one character of any kind (except a newline)"
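To see what the lookarounds buy you, compare the match with and without them on one element. A quick sketch; the second and third patterns are only there for contrast:

```python
import re

item = '4:{1989}'

# Lookbehind/lookahead: the braces must be present but are not part of the match.
looked = re.search(r'(?<=\{).+(?=\})', item).group(0)
print(looked)    # 1989

# Without lookarounds the braces are consumed and included in the match.
consumed = re.search(r'\{.+\}', item).group(0)
print(consumed)  # {1989}

# A capture group gives the same result as the lookaround version.
captured = re.search(r'\{(.+)\}', item).group(1)
print(captured)  # 1989
```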
