I have two lists, looking like this:
a = [[1,2,3,4], [2,3,4,5], [3,4,5,6,7]], b = [[5,6,7,8], [9,1,2,3], [4,5,6,7,8]]
which I want to subtract from each other element by element to get an output like this:
a - b = [[-4,-4,-4,-4], [-7,2,2,2], [-1,-1,-1,-1,-1]]
To do so I convert a and b to arrays and subtract them:
np.array(a) - np.array(b)
This just gives me the error:
unsupported operand type(s) for -: 'list' and 'list'
What am I doing wrong? Shouldn't np.array ensure the conversion to arrays?
Here is a Numpythonic way:
>>> y = list(map(len, a))
>>> a = np.hstack(a)
>>> b = np.hstack(b)
>>> np.split(a - b, np.cumsum(y)[:-1])
[array([-4, -4, -4, -4]), array([-7,  2,  2,  2]), array([-1, -1, -1, -1, -1])]
Since you cannot subtract arrays with different shapes, you can flatten your arrays using np.hstack(), subtract the flattened arrays, then split the result back into pieces of the original sublist lengths with np.split() and np.cumsum().
You can try:
>>> a = [[1,2,3,4], [2,3,4,5], [3,4,5,6,7]]
>>> b = [[5,6,7,8], [9,1,2,3], [4,5,6,7,8]]
>>> c = []
>>> for i in range(len(a)):
...     c.append([A - B for A, B in zip(a[i], b[i])])
...
>>> print(c)
[[-4, -4, -4, -4], [-7, 2, 2, 2], [-1, -1, -1, -1, -1]]
Or, as a second method, using map (note that in Python 3, map returns an iterator, so wrap it in list()):
from operator import sub

a = [[1,2,3,4], [2,3,4,5], [3,4,5,6,7]]
b = [[5,6,7,8], [9,1,2,3], [4,5,6,7,8]]
c = []
for i in range(len(a)):
    c.append(list(map(sub, a[i], b[i])))
print(c)
[[-4, -4, -4, -4], [-7, 2, 2, 2], [-1, -1, -1, -1, -1]]
The dimensions of your two arrays don't match, i.e. the first two sublists of a have 4 elements, but the third has 5, and likewise for b. If you convert the lists to numpy arrays, older versions of numpy silently give you something like this (newer versions raise a ValueError unless you pass dtype=object):
In [346]: aa = np.array(a)
In [347]: bb = np.array(b)
In [348]: aa
Out[348]: array([[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6, 7]], dtype=object)
In [349]: bb
Out[349]: array([[5, 6, 7, 8], [9, 1, 2, 3], [4, 5, 6, 7, 8]], dtype=object)
You need to make sure that all your sublists have the same number of elements, then your code will work:
In [350]: a = [[1,2,3,4], [2,3,4,5],[3,4,5,6]]; b = [[5,6,7,8], [9,1,2,3], [4,5,6,7]] # I removed the last element of third sublist in a and b
In [351]: np.array(a) - np.array(b)
Out[351]:
array([[-4, -4, -4, -4],
[-7, 2, 2, 2],
[-1, -1, -1, -1]])
Without NumPy:
result = []
for m, n in zip(a, b):
    result.append([i - j for i, j in zip(m, n)])
See also this question and this one.
What about a custom function such as:
import numpy as np

a = [[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6, 7]]
b = [[5, 6, 7, 8], [9, 1, 2, 3], [4, 5, 6, 7, 8]]

def np_substract(l1, l2):
    # dtype=object because the rows have different lengths
    return np.array([np.array(l1[i]) - np.array(l2[i]) for i in range(len(l1))], dtype=object)

print(np_substract(a, b))
You are getting the error because your code is trying to subtract one sublist from another. If you want to make it work, you can do it in the following manner:
import numpy as np

a = [[1,2,3,4], [2,3,4,5], [3,4,5,6,7]]
b = [[5,6,7,8], [9,1,2,3], [4,5,6,7,8]]

# You could apply a different condition here, e.g. only run the following if len(a) == len(b)
subList = []
for each in range(len(a)):
    diff = np.array(a[each]) - np.array(b[each])
    # convert the output array back into a list
    subList.append(diff.tolist())
print(subList)
A nested list comprehension will do the job:
In [102]: [[i2-j2 for i2,j2 in zip(i1,j1)] for i1,j1 in zip(a,b)]
Out[102]: [[-4, -4, -4, -4], [-7, 2, 2, 2], [-1, -1, -1, -1, -1]]
The problem with np.array(a) - np.array(b) is that the sublists differ in length, so the resulting arrays are object dtype - arrays of lists:
In [104]: np.array(a)
Out[104]: array([[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6, 7]], dtype=object)
Subtraction is iterating over the outer array just fine, but hitting a problem when subtracting one sublist from another - hence the error message.
If I make the inputs arrays of arrays, the subtraction works:
In [106]: np.array([np.array(a1) for a1 in a])
Out[106]: array([array([1, 2, 3, 4]), array([2, 3, 4, 5]), array([3, 4, 5, 6, 7])], dtype=object)
In [107]: aa=np.array([np.array(a1) for a1 in a])
In [108]: bb=np.array([np.array(a1) for a1 in b])
In [109]: aa-bb
Out[109]:
array([array([-4, -4, -4, -4]),
array([-7, 2, 2, 2]),
array([-1, -1, -1, -1, -1])], dtype=object)
You can't count on array operations working on object dtype arrays. But in this case, subtraction is defined for the subarrays, so it can handle the nesting.
Another way to do the nesting is use np.subtract. This is a ufunc version of - and will apply np.asarray to its inputs as needed:
In [103]: [np.subtract(i1,j1) for i1,j1 in zip(a,b)]
Out[103]: [array([-4, -4, -4, -4]), array([-7, 2, 2, 2]), array([-1, -1, -1, -1, -1])]
Notice that these array calculations return arrays or a list of arrays. Turning the inner arrays back to lists requires iteration.
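For instance, the list of arrays produced by the np.subtract comprehension above can be turned back into plain lists by iterating and calling tolist() (a small sketch):

```python
import numpy as np

a = [[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6, 7]]
b = [[5, 6, 7, 8], [9, 1, 2, 3], [4, 5, 6, 7, 8]]

# subtract sublist-wise, then convert each result array back to a list
diffs = [np.subtract(i1, j1) for i1, j1 in zip(a, b)]
result = [d.tolist() for d in diffs]
print(result)
```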
If you are starting with lists, converting to arrays often does not save time. Array calculations can be faster, but that doesn't compensate for the overhead in creating the arrays in the first place.
If I pad the inputs to equal length, then the simple array subtraction works, creating a 2d array.
In [116]: ao= [[1,2,3,4,0], [2,3,4,5,0],[3,4,5,6,7]]; bo= [[5,6,7,8,0], [9,1,2,3,0], [4,5,6,7,8]]
In [117]: np.array(ao)-np.array(bo)
Out[117]:
array([[-4, -4, -4, -4, 0],
[-7, 2, 2, 2, 0],
[-1, -1, -1, -1, -1]])
Related
I need to perform some calculations on consecutive columns in a 2D array; for simplicity's sake, let's say subtraction.
I currently do this in the following way:
c = np.array([(a[i, j + 1] - a[i, j]) for j in range(a.shape[1] - 1) for i in range(a.shape[0])]).reshape(a.shape[0], a.shape[1] - 1)
But I suspect there must be a better way using NumPy's vector operations without iteration over 2 values and a reshape.
First of all, I don't think that what you wrote achieves what you try to achieve.
I ran:
>>> a
array([[4, 6, 1, 1, 4],
[7, 1, 7, 0, 6],
[2, 0, 0, 1, 2],
[0, 6, 3, 2, 8]])
>>> c = np.array([(a[i, j + 1] - a[i, j]) for j in range(a.shape[1] - 1) for i in range(a.shape[0])])
>>> c
array([ 2, -6, -2, 6, -5, 6, 0, -3, 0, -7, 1, -1, 3, 6, 1, 6])
>>> c = np.array([(a[i, j + 1] - a[i, j]) for j in range(a.shape[1] - 1) for i in range(a.shape[0])]).reshape(a.shape[0], a.shape[1] - 1)
>>> c
array([[ 2, -6, -2, 6],
[-5, 6, 0, -3],
[ 0, -7, 1, -1],
[ 3, 6, 1, 6]])
The function np.diff receives a vector and returns the array of its differences, so:
>>> np.diff([1, 2, 3, 5])
array([1, 1, 2])
But in numpy most functions can handle np.arrays and not just scalars. For this reason, a good keyword to know is axis. When passing axis=0 or axis=1, the function performs the same operation along that dimension, so instead of subtracting two numbers it subtracts two vectors at a time: axis=0 gives the differences between consecutive rows, and axis=1 the differences between consecutive columns.
Final Answer:
So the final answer is: np.diff(a, axis=1).
Example:
>>> a
array([[4, 6, 1, 1, 4],
[7, 1, 7, 0, 6],
[2, 0, 0, 1, 2],
[0, 6, 3, 2, 8]])
>>> np.diff(a, axis=1)
array([[ 2, -5, 0, 3],
[-6, 6, -7, 6],
[-2, 0, 1, 1],
[ 6, -3, -1, 6]])
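For comparison, axis=0 on the same array gives the differences between consecutive rows (a quick sketch):

```python
import numpy as np

a = np.array([[4, 6, 1, 1, 4],
              [7, 1, 7, 0, 6],
              [2, 0, 0, 1, 2],
              [0, 6, 3, 2, 8]])

# each output row is a[i+1] - a[i]
print(np.diff(a, axis=0))
```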
First of all, the order of loops in the question differs from what would seem to do the obvious thing. I am going to guess here that you meant to have the i and j loops the other way round.
Given an example:
a = np.arange(8).reshape(2,4) ** 2
i.e.
array([[ 0, 1, 4, 9],
[16, 25, 36, 49]])
Swapping the order of loops gives:
c = np.array([(a[i, j + 1] - a[i, j]) for i in range(a.shape[0]) for j in range(a.shape[1] - 1)]).reshape(a.shape[0], a.shape[1] - 1)
i.e.
array([[ 1, 3, 5],
[ 9, 11, 13]])
So now proceeding to answer the question on that basis, you can do this simply using:
a[:,1:] - a[:,:-1]
Here, a[:,1:] is the array without the first column, a[:,:-1] is the array without the last column, and then the element-by-element difference between the two is calculated.
Replace - with whatever other operator you want. Your question implies that subtraction is just an example, but other operators (e.g. * or whatever) will also similarly output element-by-element results.
Your actual operation does not have to be a single basic operation; provided that it is some combination of basic operations, then you ought to be able to operate on these two subarrays in the same way that you would operate on scalars.
For example, if you have:
def mycalc(right, left):
return 2 * right + left
then
mycalc(a[:,1:], a[:,:-1])
gives:
array([[ 2, 9, 22],
[ 66, 97, 134]])
which is the same as you get when calling mycalc in place of just doing a subtraction in the original example:
np.array([mycalc(a[i, j + 1], a[i, j]) for i in range(a.shape[0]) for j in range(a.shape[1] - 1)]).reshape(a.shape[0], a.shape[1] - 1)
Given a list L = [0,1,2,3,4,5,6,7,8,9]. What's the best way to access/extract elements where their indices are given by a numpy array? Let nArr=np.array([0,-1,2,6]).
The resulting output should be another list P = [0,9,2,6].
It is clear that when the elements are uniform in shape, we can simply cast it into another numpy array, but what if it isn't? For example, M = [np.random.rand(5,5), np.random.rand(1)].
Stock Python has a convenience class, itemgetter:
In [27]: from operator import itemgetter
In [28]: L = [0,1,2,3,4,5,6,7,8,9]
In [29]: nArr=np.array([0,-1,2,6])
In [30]: itemgetter(*nArr)
Out[30]: operator.itemgetter(0, -1, 2, 6)
In [31]: itemgetter(*nArr)(L)
Out[31]: (0, 9, 2, 6)
Internally it does something equivalent to the list comprehension:
In [33]: [L[x] for x in nArr]
Out[33]: [0, 9, 2, 6]
So it isn't a fast operation like the array indexing (look at its code). It may be most useful as a way of doing sort or other operations where you'd like to define a key function that fetches multiple values.
https://stackoverflow.com/a/47585659/901925
Make a random nested list:
In [54]: arr = np.random.randint(0,10,(4,4))
In [55]: L = arr.tolist()
In [56]: L
Out[56]: [[9, 5, 8, 4], [1, 5, 5, 8], [8, 0, 5, 8], [1, 4, 5, 1]]
lexical sort by 'column':
In [57]: sorted(L)
Out[57]: [[1, 4, 5, 1], [1, 5, 5, 8], [8, 0, 5, 8], [9, 5, 8, 4]]
lexical sort by 'columns' 2 and 1 (in that order):
In [59]: sorted(L, key=itemgetter(2,1))
Out[59]: [[8, 0, 5, 8], [1, 4, 5, 1], [1, 5, 5, 8], [9, 5, 8, 4]]
To summarize the comments: lists do not support indexing by an array, like L[nArr] where nArr is an array of indexes. One normally uses list comprehension, [L[i] for i in nArr]. But if you want to, you can cast the list to a NumPy array of objects, which can then be indexed and sliced as any other NumPy array:
np.array(L, dtype=object)[nArr]
If you want a list returned, you can do:
np.array(L, dtype=object)[nArr].tolist()
But that's not nearly as pythonic as list comprehension, requires more memory, and very likely doesn't improve the speed.
Given an n x n (stochastic) Numpy array A and another Numpy array p in [0,1]^n. For each row A_i of A, I want to compute the smallest index j* such that p_i <= A_i,j*.
How can I implement this efficiently in Numpy? I guess this can somehow be done with numpy.random.choice.
One approach using broadcasting -
(p.T <= A).argmax(1)
In case we don't find any element p_i <= A_i,j*, we can use an invalid specifier, say -1. For that, we need a modified version -
mask = (p.T <= A)
out = np.where(mask.any(1), mask.argmax(1), -1)
Sample run -
In [140]: A
Out[140]:
array([[5, 3, 8, 0, 1],
[5, 4, 5, 2, 6],
[2, 5, 5, 0, 4],
[4, 2, 6, 5, 8],
[4, 2, 5, 2, 6]])
In [141]: p
Out[141]: array([[8, 5, 8, 5, 6]])
In [142]: mask = (p.T <= A)
In [143]: np.where(mask.any(1), mask.argmax(1), -1)
Out[143]: array([ 2, 0, -1, 2, 4])
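This works because argmax on a boolean array returns the index of the first True (and 0 for an all-False row, hence the mask.any(1) guard). A small check with hypothetical data:

```python
import numpy as np

mask = np.array([[False, True, True],
                 [False, False, False]])

# argmax returns the first True per row; all-False rows give 0,
# so np.where swaps those for the invalid marker -1
print(mask.argmax(1))
print(np.where(mask.any(1), mask.argmax(1), -1))
```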
Currently, I have a 3D Python list in jagged array format.
A = [[[0, 0, 0], [0, 0, 0], [0, 0, 0]], [[0], [0], [0]]]
Is there any way I could convert this list to a NumPy array, in order to use certain NumPy array operators such as adding a number to each element?
A + 4 would give [[[4, 4, 4], [4, 4, 4], [4, 4, 4]], [[4], [4], [4]]].
Assigning B = numpy.array(A) and then attempting B + 4 throws a TypeError:
TypeError: can only concatenate list (not "float") to list
Is a conversion from a jagged Python list to a NumPy array possible while retaining the structure (I will need to convert it back later), or is looping through the array and adding the required value the better solution in this case?
The answers by @SonderingNarcissit and @MadPhysicist are already quite nice.
Here is a quick way of adding a number to each element in your list and keeping the structure. You can replace the function return_number by anything you like, if you want to not only add a number but do something else with it:
def return_number(my_number):
    return my_number + 4

def add_number(my_list):
    if isinstance(my_list, (int, float)):
        return return_number(my_list)
    else:
        return [add_number(xi) for xi in my_list]
A = [[[0, 0, 0], [0, 0, 0], [0, 0, 0]], [[0], [0], [0]]]
Then
print(add_number(A))
gives you the desired output:
[[[4, 4, 4], [4, 4, 4], [4, 4, 4]], [[4], [4], [4]]]
So what it does is look recursively through your list of lists, and every time it finds a number it adds the value 4; this should work for arbitrarily deeply nested lists. It currently only works for numbers and lists; if you also have e.g. dictionaries in your lists, then you would have to add another if-clause.
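For instance, that extra if-clause for dictionaries might look like this (a sketch, assuming the numbers sit in the dict values and the keys are left untouched):

```python
def return_number(my_number):
    return my_number + 4

def add_number(my_list):
    if isinstance(my_list, (int, float)):
        return return_number(my_list)
    elif isinstance(my_list, dict):
        # recurse into the values, leaving the keys untouched
        return {k: add_number(v) for k, v in my_list.items()}
    else:
        return [add_number(xi) for xi in my_list]

print(add_number([[1, {'a': 2}], 3]))  # → [[5, {'a': 6}], 7]
```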
Since numpy can only work with regular-shaped arrays, it checks whether all the elements of a nested iterable have the same length for a given dimension. If they do not, it still creates an array, but of object dtype instead of the integer dtype you would expect:
>>> B = np.array(A)
>>> B
array([[[0, 0, 0], [0, 0, 0], [0, 0, 0]],
[[0], [0], [0]]], dtype=object)
In this case, the "objects" are lists. Addition is defined for lists, but only in terms of other lists that extend the original, hence your error. [0, 0] + 4 is an error, while [0, 0] + [4] is [0, 0, 4]. Neither is what you want.
It may be interesting that numpy will make the object portion of your array nest as low as possible. The array you created is actually a 2D numpy array containing lists, not a 1D array containing nested lists:
>>> B[0, 0]
[0, 0, 0]
>>> B[0, 0, 0]
Traceback (most recent call last):
File "<ipython-input-438-464a9bfa40bf>", line 1, in <module>
B[0, 0, 0]
IndexError: too many indices for array
As you pointed out, you have two options when it comes to ragged arrays. The first is to pad the array so it is non-ragged, convert it to numpy, and only use the elements you care about. This does not seem very convenient in your case.
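That padding option could be sketched like this (my example, not the answerer's code) - note the pad values are also affected by the arithmetic, so you have to ignore them afterwards:

```python
import numpy as np

A = [[[0, 0, 0], [0, 0, 0], [0, 0, 0]], [[0], [0], [0]]]

# pad every innermost list with zeros up to the longest length
maxlen = max(len(inner) for row in A for inner in row)
padded = [[inner + [0] * (maxlen - len(inner)) for inner in row] for row in A]

B = np.array(padded)          # now a regular (2, 3, 3) int array
print((B + 4).tolist())       # the padding positions get +4 as well
```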
The other method is to apply functions to your nested array directly. Luckily for you, I wrote a snippet/recipe in response to this question, which does exactly what you need, down to being able to support arbitrary levels of nesting and your choice of operators. I have upgraded it here to accept non-iterable nested elements anywhere along the list, including the original input, and to do a primitive form of broadcasting:
from itertools import repeat

def elementwiseApply(op, *iters):
    def isIterable(x):
        """This check is also available in numpy as `numpy.iterable`."""
        try:
            iter(x)
        except TypeError:
            return False
        return True

    def apply(op, *items):
        """
        Applies the operator to the given arguments. If any of the
        arguments are iterable, the non-iterables are broadcast by
        `itertools.repeat` and the function is applied recursively
        on each element of the zipped result.
        """
        elements = []
        count = 0
        for item in items:
            if isIterable(item):
                elements.append(item)
                count += 1
            else:
                elements.append(repeat(item))
        if count == 0:
            return op(*items)
        return [apply(op, *group) for group in zip(*elements)]

    return apply(op, *iters)
This is a pretty general solution that will work with just about any kind of input. Here are a couple of sample runs showing how it is relevant to your question:
>>> from operator import add
>>> elementwiseApply(add, 4, 4)
8
>>> elementwiseApply(add, [4, 0], 4)
[8, 4]
>>> elementwiseApply(add, [(4,), [0, (1, 3, [1, 1, 1])]], 4)
[[8], [4, [5, 7, [5, 5, 5]]]]
>>> elementwiseApply(add, [[0, 0, 0], [0, 0], 0], [[4, 4, 4], [4, 4], 4])
[[4, 4, 4], [4, 4], 4]
>>> elementwiseApply(add, [(4,), [0, (1, 3, [1, 1, 1])]], [1, 1, 1])
[[5], [1, [2, 4, [2, 2, 2]]]]
The result is always a new list or scalar, depending on the types of the inputs. The number of inputs must be the number accepted by the operator. operator.add always takes two inputs, for example.
Looping and adding is likely better, since you want to preserve the structure of the original. Plus, the error you mentioned indicates that you would need to flatten the numpy array and then add to each element. Although numpy operations tend to be faster than list operations, converting, flattening, and reverting is cumbersome and will probably offset any gains.
If we turn your list into an array, we get a 2D array of objects:
In [1941]: A = [[[0, 0, 0], [0, 0, 0], [0, 0, 0]], [[0], [0], [0]]]
In [1942]: A = np.array(A)
In [1943]: A.shape
Out[1943]: (2, 3)
In [1944]: A
Out[1944]:
array([[[0, 0, 0], [0, 0, 0], [0, 0, 0]],
[[0], [0], [0]]], dtype=object)
When I try A+1 it iterates over the elements of A and tries to do +1 for each. In the case of a numeric array it can do that in fast compiled code. With an object array it has to invoke the + operation for each element.
In [1945]: A+1
...
TypeError: can only concatenate list (not "int") to list
Let's try that again with a flat iteration over A:
In [1946]: for a in A.flat:
...: print(a+1)
....
TypeError: can only concatenate list (not "int") to list
The elements of A are lists; + for a list is a concatenate:
In [1947]: for a in A.flat:
...: print(a+[1])
...:
[0, 0, 0, 1]
[0, 0, 0, 1]
[0, 0, 0, 1]
[0, 1]
[0, 1]
[0, 1]
If the elements of A were themselves arrays, I think the +1 would work.
In [1956]: for i, a in np.ndenumerate(A):
...: A[i]=np.array(a)
...:
In [1957]: A
Out[1957]:
array([[array([0, 0, 0]), array([0, 0, 0]), array([0, 0, 0])],
[array([0]), array([0]), array([0])]], dtype=object)
In [1958]: A+1
Out[1958]:
array([[array([1, 1, 1]), array([1, 1, 1]), array([1, 1, 1])],
[array([1]), array([1]), array([1])]], dtype=object)
And to get back to the pure list form, we have to apply tolist both to the elements of the object array and to the array itself:
In [1960]: A1=A+1
In [1961]: for i, a in np.ndenumerate(A1):
...: A1[i]=a.tolist()
In [1962]: A1
Out[1962]:
array([[[1, 1, 1], [1, 1, 1], [1, 1, 1]],
[[1], [1], [1]]], dtype=object)
In [1963]: A1.tolist()
Out[1963]: [[[1, 1, 1], [1, 1, 1], [1, 1, 1]], [[1], [1], [1]]]
This is a rather roundabout way of adding a value to all elements of nested lists. I could have done that with one iteration:
In [1964]: for i,a in np.ndenumerate(A):
...: A[i]=[x+1 for x in a]
...:
In [1965]: A
Out[1965]:
array([[[1, 1, 1], [1, 1, 1], [1, 1, 1]],
[[1], [1], [1]]], dtype=object)
So doing math on object arrays is hit and miss. Some operations do propagate to the elements, but even those depend on how the elements behave.
It is unfortunate that the input structure is a jagged list. If one could adjust the method used to generate the list by assigning no-data values, then there is so much more one can do. I made this comment in the initial post, but I will demonstrate how the design of the original lists could be altered to facilitate obtaining more data while enabling the return of a list.
I have done this as a function so I could comment the inputs and outputs for further reference.
import numpy as np
from textwrap import dedent

def num_46():
    """(num_46)... Masked array from ill-formed list
    : http://stackoverflow.com/questions/40289943/
    : converting-a-3d-list-to-a-3d-numpy-array
    : A = [[[0, 0, 0], [0, 0, 0], [0, 0, 0]],
    :      [[0, 0, 0], [0, 0, 0], [0, 0, 0]], [[0], [0], [0]]]
    """
    frmt = """
    :Input list...
    {}\n
    :Masked array data
    {}\n
    :A sample calculations:
    : a.count(axis=0) ... a.count(axis=1) ... a.count(axis=2)
    {}\n
    {}\n
    {}\n
    : and finally: a * 2
    {}\n
    :Return it to a list...
    {}
    """
    a_list = [[[0, 1, 2], [3, 4, 5], [6, 7, 8]],
              [[9, 10, 11], [12, 13, 14], [15, 16, 17]],
              [[18, -1, -1], [21, -1, -1], [24, -1, -1]]]
    mask_val = -1
    a = np.ma.masked_equal(a_list, mask_val)
    a.set_fill_value(mask_val)
    final = a.tolist(mask_val)
    args = [a_list, a,
            a.count(axis=0), a.count(axis=1), a.count(axis=2),
            a * 2, final]
    print(dedent(frmt).format(*args))
    return a_list, a, final

# ----------------------
if __name__ == "__main__":
    """Main section... """
    A, a, c = num_46()
Some results that show that the use of masked arrays may be preferable to jagged/malformed list structure.
:Input list...
[[[0, 1, 2], [3, 4, 5], [6, 7, 8]],
[[9, 10, 11], [12, 13, 14], [15, 16, 17]],
[[18, -1, -1], [21, -1, -1], [24, -1, -1]]]
:Masked array data
[[[0 1 2]
[3 4 5]
[6 7 8]]
[[9 10 11]
[12 13 14]
[15 16 17]]
[[18 -- --]
[21 -- --]
[24 -- --]]]
:A sample calculations:
: a.count(axis=0) ... a.count(axis=1) ... a.count(axis=2)
[[3 2 2]
[3 2 2]
[3 2 2]]
[[3 3 3]
[3 3 3]
[3 0 0]]
[[3 3 3]
[3 3 3]
[1 1 1]]
: and finally: a * 2
[[[0 2 4]
[6 8 10]
[12 14 16]]
[[18 20 22]
[24 26 28]
[30 32 34]]
[[36 -- --]
[42 -- --]
[48 -- --]]]
:Return it to a list...
[[[0, 1, 2], [3, 4, 5], [6, 7, 8]], [[9, 10, 11], [12, 13, 14], [15, 16, 17]], [[18, -1, -1], [21, -1, -1], [24, -1, -1]]]
Hope this helps someone.
I created two sorted ndarrays of the same length and joined them via vstack().
I refer to my array in the following as:
[[x1 y1][x2 y2][x3 y3][x4 y4]].
However, in reality I have a different value for x in every entry but only a few different values for y ascending from 0 to n.
So I got something like this:
[[x1 0],[x2 0],[x3 0],[x4 1],[x5 1],[x6 2],[x7 2],[x8 2][x9 3][x10 3]...]
My goal is to create a loop to get every first and last x-value for all the different y-values. So that the loop returns x1 and x3 (y == 0), x4 and x5 (y == 1), x6 and x8 (y == 2) and so on.
I am trying an ugly solution for this at the moment, creating sub arrays for all the different y-values, so that I can take the first and last element of each array to get the y-values I need but I was wondering what the most effective or pythonic way to achieve this would look like.
You can do it using two list comprehensions. In the first one you can use itertools.groupby() in order to group your sublists based on their second element, then choose the first and last item of each group.
>>> from itertools import groupby
>>> from operator import itemgetter
>>>
>>> lst = [['x1', 0], ['x2', 0], ['x3', 0], ['x4', 1], ['x5', 1], ['x6', 2], ['x7', 2], ['x8', 2], ['x9', 3], ['x10', 3]]
>>> groups = [list(g) for _, g in groupby(lst, key=itemgetter(1))]
>>>
>>> [sub if len(sub) < 2 else [sub[0], sub[-1]] for sub in groups]
[[['x1', 0], ['x3', 0]], [['x4', 1], ['x5', 1]], [['x6', 2], ['x8', 2]], [['x9', 3], ['x10', 3]]]
A defaultdict is a nice way of collecting values like this.
define the array (wish I could simply have copy-n-pasted):
In [186]: A=np.array([[1, 0],[2, 0],[3, 0],[4 ,1],[5 ,1],[6, 2],[7, 2],[8 ,2],[9 ,3],[10 ,3]])
In [187]: A
Out[187]:
array([[ 1, 0],
[ 2, 0],
[ 3, 0],
[ 4, 1],
[ 5, 1],
[ 6, 2],
[ 7, 2],
[ 8, 2],
[ 9, 3],
[10, 3]])
Make the dictionary with a default value of list, and append each array row:
In [188]: from collections import defaultdict
In [189]: dd = defaultdict(list)
In [190]: for row in A:
.....: dd[row[1]].append(row)
.....:
In [191]: dd
Out[191]: defaultdict(<class 'list'>, {0: [array([1, 0]), array([2, 0]), array([3, 0])], 1: [array([4, 1]), array([5, 1])], 2: [array([6, 2]), array([7, 2]), array([8, 2])], 3: [array([9, 3]), array([10, 3])]})
I can extract the 1st and last values into another dictionary:
In [192]: {key:[value[0],value[-1]] for key,value in dd.items()}
Out[192]:
{0: [array([1, 0]), array([3, 0])],
1: [array([4, 1]), array([5, 1])],
2: [array([6, 2]), array([8, 2])],
3: [array([9, 3]), array([10, 3])]}
Or I could have collected values in lists, etc., or a 3d array
In [195]: np.array([np.array([value[0],value[-1]]) for key,value in dd.items()])
Out[195]:
array([[[ 1, 0],
[ 3, 0]],
[[ 4, 1],
[ 5, 1]],
[[ 6, 2],
[ 8, 2]],
[[ 9, 3],
[10, 3]]])
itertools.groupby is nice, and may be faster. But you need to be comfortable with generators.
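That groupby approach, applied to the same array A, might look like this (a sketch; note that each group is a generator that must be consumed):

```python
from itertools import groupby

import numpy as np

A = np.array([[1, 0], [2, 0], [3, 0], [4, 1], [5, 1],
              [6, 2], [7, 2], [8, 2], [9, 3], [10, 3]])

pairs = []
for key, grp in groupby(A.tolist(), key=lambda row: row[1]):
    rows = list(grp)                         # consume the group generator
    pairs.append((rows[0][0], rows[-1][0]))  # first and last x for this y
print(pairs)
```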
If the y values are sorted, you could find the values where value changes, and use those indices to split the array.
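A sketch of that last idea: when the y column is sorted, the points where np.diff is non-zero give the indices at which to split the x column.

```python
import numpy as np

A = np.array([[1, 0], [2, 0], [3, 0], [4, 1], [5, 1],
              [6, 2], [7, 2], [8, 2], [9, 3], [10, 3]])

y = A[:, 1]
cuts = np.flatnonzero(np.diff(y)) + 1   # indices where y changes value
groups = np.split(A[:, 0], cuts)        # x values, grouped by y
print([(int(g[0]), int(g[-1])) for g in groups])
```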