How to split a list on an element delimiter [duplicate]

This question already has answers here:
How to split a list into chunks determined by a separator?
(4 answers)
Closed 5 years ago.
Are there concise and elegant ways of splitting a list in Python into a list of sub-lists by a delimiting element, such that ['a', 'delim', 'b'] -> [['a'], ['b']]?
Here is the example:
ldat = ['a','b','c','a','b','c','a','b']
dlim = 'c'
lspl = [] # an elegant python one-liner wanted on this line!
print(lspl) # want: [['a', 'b'], ['a', 'b'], ['a', 'b']]
Working examples that seem overly complex
I have surveyed the documentation and related questions on Stack Overflow, many referenced below, which did not answer my question. My research is summarized below: several approaches that do generate the desired output but are verbose and intricate, so that what is happening (splitting a list) is not immediately apparent; you really have to squint.
Are there better ways? I am primarily interested in readability for beginners (e.g. teaching), canonical / 'Pythonic' approaches, and secondarily in the most efficient approaches (e.g. timeit speed). Ideally answers would address both Python 2.7 and 3.x.
with conditional .append()
Loop through the list and either append to the last output list or add a new output list. Based on an example that includes the delimiter, but altered to exclude it. I'm not sure how to make it a one-liner, or if that is even desirable.
lspl = [[]]
for i in ldat:
    if i == dlim:
        lspl.append([])
    else:
        lspl[-1].append(i)
print(lspl) # prints: [['a', 'b'], ['a', 'b'], ['a', 'b']]
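The loop above can be wrapped in a small reusable function. This is only a minimal sketch: the name split_list is hypothetical and not a built-in. Like str.split with an explicit separator, it keeps empty sub-lists for leading, trailing, or adjacent delimiters.

```python
def split_list(items, delimiter):
    """Split items into sub-lists at each element equal to delimiter.

    Hypothetical helper for illustration; not part of the standard library.
    """
    result = [[]]
    for item in items:
        if item == delimiter:
            result.append([])        # delimiter found: start a new sub-list
        else:
            result[-1].append(item)  # otherwise extend the current sub-list
    return result


ldat = ['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b']
print(split_list(ldat, 'c'))                       # [['a', 'b'], ['a', 'b'], ['a', 'b']]
print(split_list(['c', 'a', 'c', 'c', 'b'], 'c'))  # [[], ['a'], [], ['b']]
```

Whether the empty sub-lists are wanted depends on the use case; filtering them out afterwards is one extra comprehension.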
with itertools.groupby
Combine itertools.groupby with a list comprehension. Many answers include the delimiters; this one is based on those that exclude them.
import itertools
lspl = [list(y) for x, y in itertools.groupby(ldat, lambda z: z == dlim) if not x]
print(lspl) # prints: [['a', 'b'], ['a', 'b'], ['a', 'b']]
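One behavior of the groupby approach worth knowing: because runs of delimiters are grouped together and then discarded, it never produces empty sub-lists. The sketch below wraps the one-liner in a hypothetical function, split_groupby, to show this on input with adjacent and trailing delimiters.

```python
import itertools


def split_groupby(items, delimiter):
    # Hypothetical wrapper around the groupby one-liner, for illustration only.
    return [
        list(group)
        for is_delim, group in itertools.groupby(items, lambda x: x == delimiter)
        if not is_delim  # drop the groups made up of delimiters
    ]


print(split_groupby(['a', 'c', 'c', 'b', 'c'], 'c'))  # [['a'], ['b']]
```

This differs from 'accbc'.split('c'), which returns ['a', '', 'b', ''].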
with slicing on indices
Some related questions have discussed how to use slicing after using .index() -- however answers usually focus on finding the first index only. One can extend this approach by first finding a list of indices and then looping through a self-zipped list to slice the ranges.
indices = [i for i, x in enumerate(ldat) if x == dlim]
lspl = [ldat[s+1:e] for s, e in zip([-1] + indices, indices + [len(ldat)])]
print(lspl) # prints: [['a', 'b'], ['a', 'b'], ['a', 'b']]
However, like all the approaches I have found, this seems like a very complex way of enacting a simple split-on-delimiter operation.
Comparison to string splitting
By comparison and as a model only, here is a working, concise, and elegant way of splitting a string into a list of sub-strings by a delimiter.
sdat = 'abcabcab'
dlim = 'c'
sspl = sdat.split(dlim)
print(sspl) # prints: ['ab', 'ab', 'ab']
NOTE: I understand there is no split method on lists in Python, and I am not asking about splitting a string. I am also not asking about splitting element-strings into new elements.

or this:
ldat = ['a','b','c','a','b','c','a','b']
dlim = 'c'
lspl = [] # an elegant python one-liner wanted on this line!
print(lspl) # want: [['a', 'b'], ['a', 'b'], ['a', 'b']]
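s = str(ldat).replace(", '%s', " % dlim, "],[")
result = list(eval(s)) # eval yields a tuple of lists; fragile, and unsafe on untrusted data
print(result) # prints: [['a', 'b'], ['a', 'b'], ['a', 'b']]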

Related

How to split up elements of a list separated by ],[ in Python

I have a list that looks like :
mylist=[[["A","B"],["A","C","B"]],[["A","D"]]]
and I want to return :
mylist=[["A","B"],["A","C","B"],["A","D"]]
Using the split() function returns an error of :
list object has no attribute split
Therefore, I am unsure how I should split the elements of this list.
Thanks!

I am not sure why you think splitting will do any good for you; after all, you are, if anything, merging the second-layer lists. Flattening by one level can be done with a comprehension:
mylist = [inner for outer in mylist for inner in outer]
# [['A', 'B'], ['A', 'C', 'B'], ['A', 'D']]
One utility that (maybe as a matter of taste) simplifies this is itertools.chain:
from itertools import chain
mylist = list(chain(*mylist))
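As a variation on the chain call above, chain.from_iterable takes the outer list directly, so no argument unpacking is needed. A minimal sketch using the list from the question:

```python
from itertools import chain

mylist = [[["A", "B"], ["A", "C", "B"]], [["A", "D"]]]

# Flatten exactly one level: the innermost lists stay intact.
flat = list(chain.from_iterable(mylist))
print(flat)  # [['A', 'B'], ['A', 'C', 'B'], ['A', 'D']]
```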

Use a nested for loop to do this.
Here is some example code:
output = []
for list_element in mylist:
    for single_list in list_element:
        output.append(single_list)

set item at multiple indexes in a list

I am trying to find a way to use a list of indexes to set values at multiple places in a list (as is possible with numpy arrays).
I found that I can map __getitem__ and a list of indexes to return the values at those indexes:
# something like
a_list = ['a', 'b', 'c']
idxs = [0, 1]
get_map = map(a_list.__getitem__, idxs)
print(list(get_map)) # yields ['a', 'b']
However, when applying this same line of thought to __setitem__, the setting fails. This probably has something to do with pass-by-reference vs pass-by-value, which I have never fully understood no matter how many times I've read about it.
Is there a way to do this?
b_list = ['a', 'b', 'c']
idxs = [0, 1]
put_map = map(b_list.__setitem__, idxs, ['YAY', 'YAY'])
print(b_list) # want: ['YAY', 'YAY', 'c'], but b_list is unchanged
For my use case, I only want to set one value at multiple locations. Not multiple values at multiple locations.
EDIT: I know how to use list comprehension. I am trying to mimic numpy's capability to accept a list of indexes for both getting and setting items in an array, except for lists.

The difference between the get and set cases is that in the get case you are interested in the result of map itself, but in the set case you want a side effect. In Python 3, map returns a lazy iterator, so if you never consume it, the __setitem__ calls are never actually executed. Once you do consume it, b_list gets changed as expected.
>>> put_map = map(b_list.__setitem__, idxs, ['YAY', 'YAY'])
>>> b_list
['a', 'b', 'c']
>>> list(put_map)
[None, None]
>>> b_list
['YAY', 'YAY', 'c']
Having said that, the proper way for get would be a list comprehension and for set a simple for loop. That also has the advantage that you do not have to repeat the value to put in place n times.
>>> for i in idxs: b_list[i] = "YAY"
>>> [b_list[i] for i in idxs]
['YAY', 'YAY']
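The same point as a script, assuming Python 3 (where map is lazy). The helper set_at is a hypothetical name introduced here for illustration; it simply packages the for loop so that one value can be assigned at several indexes.

```python
b_list = ['a', 'b', 'c']
idxs = [0, 1]

lazy = map(b_list.__setitem__, idxs, ['YAY', 'YAY'])
before = list(b_list)  # snapshot: nothing has been assigned yet
for _ in lazy:         # consuming the iterator triggers the assignments
    pass
after = list(b_list)
print(before)  # ['a', 'b', 'c']
print(after)   # ['YAY', 'YAY', 'c']


def set_at(target, indexes, value):
    """Hypothetical helper: assign one value at several indexes, in place."""
    for i in indexes:
        target[i] = value


c_list = ['a', 'b', 'c']
set_at(c_list, [0, 2], 'YAY')
print(c_list)  # ['YAY', 'b', 'YAY']
```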

Generating unique permutations in a predictable order with Python and Sympy

I have a list of elements in python. In the final code, the length of the list will be variable and the elements of my list much more lengthy, but I can demonstrate my question with three dummy elements.
Basically, for any list of length n, there will be n-1 identical elements, and 1 unique element.
So, in the example of three elements I have:
test = ['b', 'a', 'a']
For small cases such as 3, where I can verify by eye the order of the elements, I have been using a function from the Sympy module, as below:
from sympy.utilities.iterables import multiset_permutations
permutations = list(multiset_permutations(test))
However, once cases become too large to check by eye, I'm not certain that the order will be predictable, and the official documentation doesn't clarify the issue for me.
Is there a way to generate these permutations in a predictable order such that I could know, for example:
permutations[0] = ['b', 'a', 'a']
permutations[1] = ['a', 'b', 'a']
permutations[2] = ['a', 'a', 'b']
Thank you for any help that can be given.

The elements are ordered before the permutations are generated by multiset_permutations. Your result will not depend on the order of the input items.
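For the specific case in the question (n-1 identical elements plus one unique element), the arrangements can also be built directly, which makes the order explicit. This is a sketch that does not use SymPy at all; the function name single_unique_arrangements is hypothetical.

```python
def single_unique_arrangements(unique, common, n):
    """Hypothetical helper: return n lists of length n, where the k-th list
    has `unique` at index k and `common` everywhere else."""
    return [[unique if i == k else common for i in range(n)] for k in range(n)]


permutations = single_unique_arrangements('b', 'a', 3)
print(permutations[0])  # ['b', 'a', 'a']
print(permutations[1])  # ['a', 'b', 'a']
print(permutations[2])  # ['a', 'a', 'b']
```

Here the position of the unique element is the index of the arrangement, so the order is known for any n.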

Why does Python's slice indexing give counterintuitive results? [duplicate]

This question already has answers here:
Printing a column of a 2-D List in Python
(7 answers)
Closed 6 years ago.
Suppose I have a 2D list in Python, Data, and I want a slice of it that takes every element along the first index and a single one along the second,
e.g.
Data = [[a,b,c],[d,e,f],[h,i,g]]
and I want the list;
raw_data = [b,e,i]
Why does doing something like;
raw_data = Data[:][1]
not give the desired output?
I have specified the whole first index and the 1 index for the second.
Instead I get the output that is;
raw_data = [d,e,f]
Which is what I would expect to get from;
raw_data = Data[1][:]
raw_data = [d,e,f]
So;
Data[1][:] = Data[:][1]
Which is not compatible with my mental model of how lists work in python.
Instead I have to use a loop to do it;
raw_data = []
for i in range(len(Data)): # xrange in Python 2
    raw_data.append(Data[i][1])
So my question is, can anyone explain why Data[1][:] = Data[:][1] ?
Thanks for reading!

lst[:] has no explicit start and no explicit end, so according to the Python documentation, it returns a copy of the list from its start to its end. In other words, it returns a (shallow) copy of the same list you had before. So:
>>> Data = [['a','b','c'],['d','e','f'],['h','i','g']]
>>> Data[:]
[['a', 'b', 'c'], ['d', 'e', 'f'], ['h', 'i', 'g']]
So when you say Data[:], that will evaluate to the same as a copy of Data, meaning that Data[:][1] essentially is just Data[1], which is [d,e,f]
If you do it the other way:
>>> Data[1]
['d', 'e', 'f']
>>> Data[1][:]
['d', 'e', 'f']
You get the second element in data, [d,e,f], then you use that same list slicing syntax as before to get that same list again.
To get what you want, I'd use a list comprehension:
>>> [x[1] for x in Data]
['b', 'e', 'i']
Simple as that.
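The explanation above can be checked directly. The sketch below shows that Data[:] is a new outer list whose rows are the very same objects as in Data, and then extracts the column both with the comprehension and with zip(*Data), a standard-library alternative for transposing rows.

```python
Data = [['a', 'b', 'c'], ['d', 'e', 'f'], ['h', 'i', 'g']]

copy_of_data = Data[:]
print(copy_of_data == Data)        # True: same contents
print(copy_of_data is Data)        # False: a new outer list
print(copy_of_data[1] is Data[1])  # True: the inner lists are shared

# Column 1 via a list comprehension over the rows
column = [row[1] for row in Data]
print(column)  # ['b', 'e', 'i']

# Alternative: transpose the rows with zip, then pick row 1 of the transpose
transposed = [list(col) for col in zip(*Data)]
print(transposed[1])  # ['b', 'e', 'i']
```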

Vanilla Python doesn't have two dimensional arrays, but it does allow for extensions to implement them. You have a list of lists, which is somewhat different.
The solution to your problem is to use numpy which does have a 2d array type. You can then say data[:,1]
Why your example doesn't work as you expect: data[:] means "a copy of data" and so data[:][1] means the index 1 element of the copy of data, which is [d,e,f]

It's pretty obvious if you go through what's happening from left to right.
raw_data = Data[:]
will give you the entirety of Data, so the whole list of lists: [[a,b,c],[d,e,f],[h,i,g]]
raw_data = Data[:][1]
will then give you the element at index 1 in this list, which is [d,e,f].
On the other hand,
raw_data = Data[1]
will return the element at position 1 in data, which is also [d,e,f].
[:] on this object will again return itself in its entirety.
What you are trying to do is best done with a list comprehension, such as:
raw_data = [x[1] for x in Data]
This will give you list of all second elements in all lists in Data.

Is there a python method to re-order a list based on the provided new indices?

Say I have a working list:
['a','b','c']
and an index list
[2,1,0]
which will change the working list to:
['c','b','a']
Is there any python method to do this easily (the working list may also be a numpy array, and so a more adaptable method is greatly preferred)? Thanks!

ordinary sequence:
L = [L[i] for i in ndx]
numpy.array:
L = L[ndx]
Example:
>>> L = "abc"
>>> [L[i] for i in [2,1,0]]
['c', 'b', 'a']
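A short sketch of two variations on the comprehension, using only the standard library: operator.itemgetter as an alternative way to pick several indexes, and slice assignment to reorder the existing list object in place. The variable names are illustrative.

```python
from operator import itemgetter

L = ['a', 'b', 'c']
ndx = [2, 1, 0]

reordered = [L[i] for i in ndx]
print(reordered)  # ['c', 'b', 'a']

# itemgetter with two or more indexes returns a tuple of the selected items
# (with a single index it returns the bare item, so the comprehension is safer).
picked = list(itemgetter(*ndx)(L))
print(picked)  # ['c', 'b', 'a']

# Slice assignment replaces the contents of the same list object.
same_object = L
L[:] = [L[i] for i in ndx]
print(L, same_object is L)  # ['c', 'b', 'a'] True
```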

Converting my comment-answer from when this question was closed:
As you mentioned numpy, here is an answer for that case:
For numerical data, you can do this directly with numpy arrays. Details here: http://www.scipy.org/Cookbook/Indexing#head-a8f6b64c733ea004fd95b47191c1ca54e9a579b5
the syntax is then
myarray[myindexlist]
For non-numerical data, the most efficient way in the case of long arrays which you read out only once is most likely this:
(myarray[i] for i in myindex)
note the () for generator expressions instead of [] for list comprehension.
Read this:
Generator Expressions vs. List Comprehension
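To illustrate the "read out only once" caveat, here is a minimal sketch with small example data: the generator expression produces items on demand and is exhausted after one pass, whereas a list comprehension builds the whole list up front and can be reused.

```python
myarray = ['a', 'b', 'c']
myindex = [2, 1, 0]

lazy = (myarray[i] for i in myindex)  # nothing is looked up yet
first = next(lazy)
rest = list(lazy)
again = list(lazy)                    # the generator is now exhausted
print(first)  # c
print(rest)   # ['b', 'a']
print(again)  # []

eager = [myarray[i] for i in myindex]  # a real list, reusable
print(eager)  # ['c', 'b', 'a']
```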

This is very easy to do with numpy if you don't mind converting to numpy arrays:
>>> import numpy
>>> vals = numpy.array(['a','b','c'])
>>> idx = numpy.array([2,1,0])
>>> vals[idx]
array(['c', 'b', 'a'],
dtype='|S1')
To get back to a list, you can do:
>>> vals[idx].tolist()
['c', 'b', 'a']
