Numpy slice with array as index - python

I am trying to extract the full set of indices into an N-dimensional cube, and it seems like np.mgrid is just what I need for that. For example, np.mgrid[0:4,0:4] produces a pair of 4 by 4 matrices that together contain all the indices into an array of that shape.
The problem is that I want to do this in an arbitrary number of dimensions, based on the shape of another array. I.e. if I have an array a of arbitrary dimension, I want to do something like idx = np.mgrid[0:a.shape], but that syntax is not allowed.
Is it possible to construct the slice I need for np.mgrid to work? Or is there perhaps some other, elegant way of doing this? The following expression does what I need, but it is rather complicated and probably not very efficient:
np.reshape(np.array(list(np.ndindex(a.shape))),list(a.shape)+[len(a.shape)])

I usually use np.indices:
>>> a = np.arange(2*3).reshape(2,3)
>>> np.mgrid[:2, :3]
array([[[0, 0, 0],
[1, 1, 1]],
[[0, 1, 2],
[0, 1, 2]]])
>>> np.indices(a.shape)
array([[[0, 0, 0],
[1, 1, 1]],
[[0, 1, 2],
[0, 1, 2]]])
>>> a = np.arange(2*3*5).reshape(2,3,5)
>>> (np.mgrid[:2, :3, :5] == np.indices(a.shape)).all()
True
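np.indices puts the coordinate axis first, while the np.ndindex expression from the question puts it last. A small sketch, assuming you want the question's layout (the name idx_last is only for illustration), showing that np.moveaxis converts one into the other:

```python
import numpy as np

a = np.zeros((2, 3, 5))

# np.indices puts the coordinate axis first: shape (a.ndim,) + a.shape.
idx = np.indices(a.shape)

# The np.ndindex expression from the question puts it last: a.shape + (a.ndim,).
idx_ndindex = np.reshape(np.array(list(np.ndindex(a.shape))),
                         list(a.shape) + [len(a.shape)])

# Moving axis 0 to the end converts one layout into the other.
idx_last = np.moveaxis(idx, 0, -1)

print(idx.shape, idx_last.shape)
print(np.array_equal(idx_last, idx_ndindex))
```

With this layout, idx_last[i, j, k] is the coordinate triple (i, j, k).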

I believe the following does what you're asking:
>>> a = np.random.random((1, 2, 3))
>>> np.mgrid[tuple(map(slice, a.shape))]
array([[[[0, 0, 0],
[0, 0, 0]]],
[[[0, 0, 0],
[1, 1, 1]]],
[[[0, 1, 2],
[0, 1, 2]]]])
It produces exactly the same result as np.mgrid[0:1,0:2,0:3], except that it uses a's shape instead of hard-coded dimensions.
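np.mgrid receives a tuple of slice objects when you write 0:1,0:2,0:3 inside the brackets, so the same tuple can be built by hand. A minimal sketch, assuming Python 3 (where map returns an iterator, so the tuple has to be built explicitly):

```python
import numpy as np

a = np.random.random((1, 2, 3))

# One slice(0, n) per axis, packed in a tuple as the indexing syntax would do.
slices = tuple(slice(0, n) for n in a.shape)
grid = np.mgrid[slices]

print(grid.shape)  # (3, 1, 2, 3): one index array per axis of a
print(np.array_equal(grid, np.mgrid[0:1, 0:2, 0:3]))
print(np.array_equal(grid, np.indices(a.shape)))
```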

Related

using integer as index for multidimensional numpy array

I have a boolean array of shape (n_samples, n_items) which represents a set: my_set[i, j] tells whether sample i contains item j.
To populate it, the array is initialized to zeros, and I receive another array of integers of shape (n_samples, 3) that lists, for each sample, three items that belong to it. For instance:
my_set = np.zeros((2, 5), dtype=bool)
init_values = np.array([[1,3,4], [0,1,2]], dtype=np.int64)
So, I need to fill my_set with ones in row 0 at columns 1, 3, 4 and in row 1 at columns 0, 1, 2.
init_values contains valid values in the appropriate range (that is, in [0, n_items)), and no column contains duplicate items.
Some failed approaches:
I know that a list (or array) of integers can be used as an index, so I tried using init_values directly as the index, but it failed:
my_set[init_values] = 1
File "<ipython-input-9-9b2c4d19f4f6>", line 1, in <cell line: 1>
my_set[init_values] = 1
IndexError: index 3 is out of bounds for axis 0 with size 2
I don't know why the 3 is indexing over the first axis, so I tried a second approach: "pick up all rows and index only the desired columns", mixing slicing and integer indexing. It didn't throw an error, but it didn't work as expected either. Check out the shape: I expected it to be (2, 3), however...
my_set[:, init_values].shape
Out[11]: (2, 2, 3)
Not sure why it didn't work, but at least the first axis looks correct, so I tried picking only the first column, which is a flat list of integers and therefore "more natural"... once again, it didn't work:
my_set[:, init_values[:,0]].shape
Out[12]: (2, 2)
I expected this shape to be (2, 1) since I wanted all rows with a single column on each, corresponding to the indexes given in init_values.
I decided to go back to the integer index approach for the first axis... and it worked:
my_set[np.arange(len(my_set)), init_values[:,0]].shape
Out[13]: (2,)
However, it only works for one column, so I need to iterate over the columns to make it really work, but it looks like a good initial workaround.
Current solution
So, to solve my original problem, I wrote this:
for c in range(init_values.shape[1]):
    my_set[np.arange(len(my_set)), init_values[:,c]] = 1
# now lets check my_set is properly filled
print(my_set)
Out[14]: [[False True False True True]
[ True True True False False]]
which is exactly what I need.
Question(s):
That said, here goes my main question:
Is there a more efficient way to do this? It looks quite inefficient as the number of elements grows (for this example I used 3, but I actually need larger values).
In addition, I'd like to understand why using np.arange on the first index behaves differently from slicing it with ":". I didn't expect this behavior.
Any other comments that help explain why the previous approaches failed are also welcome.
You only have column indices, so you also need to create their corresponding row indices:
>>> my_set[np.arange(len(my_set))[:, None], init_values] = 1
>>> my_set
array([[False, True, False, True, True],
[ True, True, True, False, False]])
[:, None] turns the row vector of row indices into a column vector, so that the row and column indices have compatible shapes for broadcasting:
>>> np.arange(len(my_set))[:, None]
array([[0],
[1]])
>>> np.broadcast_arrays(np.arange(len(my_set))[:, None], init_values)
[array([[0, 0, 0],
[1, 1, 1]]),
array([[1, 3, 4],
[0, 1, 2]], dtype=int64)]
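If your NumPy is 1.15 or newer, np.put_along_axis builds the matching row indices for you. A short sketch using the arrays from the question, checked against the explicit broadcasting form (the name explicit is only for illustration):

```python
import numpy as np

init_values = np.array([[1, 3, 4], [0, 1, 2]], dtype=np.int64)

# Explicit version: a column of row indices broadcast against init_values.
explicit = np.zeros((2, 5), dtype=bool)
explicit[np.arange(len(explicit))[:, None], init_values] = True

# put_along_axis writes the value at the given positions along axis 1 of each row.
my_set = np.zeros((2, 5), dtype=bool)
np.put_along_axis(my_set, init_values, True, axis=1)

print(my_set)
print(np.array_equal(my_set, explicit))
```

Both forms do the assignment in a single vectorized call, so no Python loop over the columns is needed.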
A slice applies the indices given for the other dimensions to every position in the slice's range along its own dimension. Here is a simple test. The matrix to be indexed is as follows:
>>> ar = np.arange(4).reshape(2, 2)
>>> ar
array([[0, 1],
[2, 3]])
If you want to get the elements with indices 0 and 1 in row 0, and the elements with indices 1 and 0 in row 1, but you combine the column indices [[0, 1], [1, 0]] with a slice, you will get:
>>> ar[:, [[0, 1], [1, 0]]]
array([[[0, 1],
[1, 0]],
[[2, 3],
[3, 2]]])
This is equivalent to combining the row index from 0 to 1 with the column indices respectively:
>>> ar[0, [[0, 1], [1, 0]]]
array([[0, 1],
[1, 0]])
>>> ar[1, [[0, 1], [1, 0]]]
array([[2, 3],
[3, 2]])
In fact, broadcasting happens implicitly here. The actual indices are:
>>> np.broadcast_arrays(0, [[0, 1], [1, 0]])
[array([[0, 0],
[0, 0]]),
array([[0, 1],
[1, 0]])]
>>> np.broadcast_arrays(1, [[0, 1], [1, 0]])
[array([[1, 1],
[1, 1]]),
array([[0, 1],
[1, 0]])]
This is not the same as the indices you actually need. Therefore, you need to manually generate the correct row indices for broadcasting:
>>> ar[[[0], [1]], [[0, 1], [1, 0]]]
array([[0, 1],
[3, 2]])
>>> np.broadcast_arrays([[0], [1]], [[0, 1], [1, 0]])
[array([[0, 0],
[1, 1]]),
array([[0, 1],
[1, 0]])]
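The shapes seen in the question follow from the same rule: a slice keeps its own axis, while integer index arrays are broadcast against each other and contribute the broadcast shape. A small sketch with the question's arrays (the names sliced, rows and paired are only for illustration):

```python
import numpy as np

my_set = np.zeros((2, 5), dtype=bool)
init_values = np.array([[1, 3, 4], [0, 1, 2]])

# The slice on axis 0 keeps its length 2; the (2, 3) index array adds its own shape.
sliced = my_set[:, init_values]
print(sliced.shape)  # (2, 2, 3)

# A (2,) index array next to a slice pairs every row with both columns.
print(my_set[:, init_values[:, 0]].shape)  # (2, 2)

# Two index arrays are broadcast together: (2, 1) with (2, 3) gives (2, 3),
# so row i is paired only with the columns listed in init_values[i].
rows = np.arange(my_set.shape[0])[:, None]
paired = my_set[rows, init_values]
print(paired.shape)  # (2, 3)

# A single index array indexes axis 0 only, hence the IndexError for value 3.
try:
    my_set[init_values]
except IndexError as exc:
    print("IndexError:", exc)
```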

How to remove duplicate elements from list of numpy arrays?

I have a list of numpy arrays. How can I remove duplicate arrays from the list?
I tried set(arrays) but got the error "TypeError: unhashable type: 'numpy.ndarray'".
Example with 2d arrays (mine are actually 3d). Here the starting list has length 10. The output list of distinct arrays should have length 8, because the elements at indexes 0, 5, 9 are all equal.
>>> import numpy
>>> numpy.random.seed(0)
>>> arrays = [numpy.random.randint(2, size=(2,2)) for i in range(10)]
>>> numpy.array_equal(arrays[0], arrays[5])
True
>>> numpy.array_equal(arrays[5], arrays[9])
True
You can start by collecting all the arrays from the input list into one 2D NumPy array, with each input array flattened into a row. Then lex-sort the rows, which brings duplicate rows next to each other. Differencing consecutive sorted rows gives all zeros wherever a row repeats the one before it, which is detected with (numpy.diff(sortedA,axis=0)==0).all(1); negating that gives a mask of the rows that start a new group. Finally, the mask, which refers to the sorted order, is mapped back through the sort indices to pick the first array of each group from the original list. Thus, you would have a vectorized implementation, like so -
A = numpy.concatenate((arrays)).reshape(-1,arrays[0].size)
order = numpy.lexsort(A.T)
sortedA = A[order]
idx = numpy.append(True,~(numpy.diff(sortedA,axis=0)==0).all(1))
out = [arrays[i] for i in numpy.sort(order[idx])]
Sample input, output -
In [238]: arrays
Out[238]:
[array([[0, 1],
        [1, 0]]), array([[1, 1],
        [1, 1]]), array([[1, 1],
        [1, 0]]), array([[0, 1],
        [0, 0]]), array([[0, 0],
        [0, 1]]), array([[0, 1],
        [1, 0]]), array([[0, 1],
        [1, 1]]), array([[1, 0],
        [1, 0]]), array([[1, 0],
        [1, 1]]), array([[0, 1],
        [1, 0]])]
In [239]: out
Out[239]:
[array([[0, 1],
        [1, 0]]), array([[1, 1],
        [1, 1]]), array([[1, 1],
        [1, 0]]), array([[0, 1],
        [0, 0]]), array([[0, 0],
        [0, 1]]), array([[0, 1],
        [1, 1]]), array([[1, 0],
        [1, 0]]), array([[1, 0],
        [1, 1]])]
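The same sort-and-compare idea is available through np.unique with axis=0 (NumPy 1.13 or newer). A sketch using the ten sample arrays, assuming all arrays share one shape and dtype (the names flat, first_idx and distinct are only for illustration):

```python
import numpy as np

arrays = [np.array(a) for a in (
    [[0, 1], [1, 0]], [[1, 1], [1, 1]], [[1, 1], [1, 0]], [[0, 1], [0, 0]],
    [[0, 0], [0, 1]], [[0, 1], [1, 0]], [[0, 1], [1, 1]], [[1, 0], [1, 0]],
    [[1, 0], [1, 1]], [[0, 1], [1, 0]],
)]

# Flatten each array into one row so that equal arrays become equal rows.
flat = np.stack(arrays).reshape(len(arrays), -1)

# return_index gives the position of the first occurrence of each unique row.
_, first_idx = np.unique(flat, axis=0, return_index=True)

# Sort the positions to keep the arrays in their original order.
distinct = [arrays[i] for i in np.sort(first_idx)]

print(len(distinct))
```

Indexing the original list with the returned positions keeps each array's shape, so this also works for 3d inputs.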
In the end, I looped over the list, comparing with numpy.array_equal:
distinct = list()
for M in arrays:
    if any(numpy.array_equal(M, N) for N in distinct):
        continue
    distinct.append(M)
It's O(n**2) but what the hey.
You can use tobytes and frombuffer to convert to and from hashable items (byte strings). You can put them in a set. (Older code used tostring and fromstring for this; those are deprecated or removed in recent NumPy releases.)
>>> arrs = [np.random.random(10) for _ in range(10)]
>>> arrs += arrs # create duplicate items
>>>
>>> darrs = set((arr.tobytes(), arr.dtype) for arr in arrs)
>>> uniq_arrs = [np.frombuffer(arr, dtype=dtype) for arr, dtype in darrs]
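A set loses the original order, and a byte string alone does not record the shape of a multi-dimensional array. A sketch that keeps both, assuming byte-for-byte equality is the notion of "duplicate" you want (unique_arrays is a hypothetical helper name, not a NumPy function):

```python
import numpy as np

def unique_arrays(arrays):
    """Return the distinct arrays, keeping the first occurrence of each."""
    seen = {}  # dicts preserve insertion order (Python 3.7+)
    for arr in arrays:
        # Shape and dtype are part of the key because equal bytes alone
        # do not imply equal arrays.
        key = (arr.shape, arr.dtype.str, arr.tobytes())
        seen.setdefault(key, arr)
    return list(seen.values())

rng = np.random.default_rng(0)
arrs = [rng.integers(2, size=(2, 2, 2)) for _ in range(5)]
arrs += arrs  # create duplicate items

print(len(unique_arrays(arrs)))
```

Because the comparison is on raw bytes, float values such as 0.0 and -0.0 are treated as different here even though they compare equal numerically.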

Python - Flatten lists of lists of two different types in one function

As input, I receive two types of lists of lists made of x and y coordinates that represent polygon and multipolygon geometries. The input follows the GeoJSON standard.
list1 represents the coordinates of a simple polygon geometry and list2 represents a multipolygon geometry:
list1 = [[[0 , 0], [0, 1], [0 ,2]]]
list2 = [[[[0, 0] , [0, 1], [0, 2]], [[1, 0], [1, 1], [1 ,2]]]]
A multipolygon geometry (list2) is represented by a list of lists one level deeper than a simple polygon geometry (list1).
I want to flatten those lists in order to get the following output:
if input is list1 type : list1_out = [[0, 0, 0, 1, 0, 2]]
if input is list2 type : list2_out = [[0, 0, 0, 1, 0, 2], [1, 0, 1, 1, 1, 2]]
I am using the following code that is usually used to flatten lists where input can be a list of the two types:
[coords for polygon in input for coords in polygon]
With the code above, the output for list1 is correct but the output for list2 is the following:
[[[0, 0], [0, 1], [0, 2]], [[1, 0], [1, 1], [1, 2]]]
Is there a function that could deeply flatten those two types of lists to get the expected output?
Edit: Performance really matters here, as the lists are really big.
Edit 2: I can use an if statement to filter each type of list.
Try:
for list1
[sum(x, []) for x in list1]
for list2
[sum(x, []) for a in list2 for x in a]
Demo
>>> list1 = [[[0 , 0], [0, 1], [0 ,2]]]
>>> list2 = [[[[0, 0] , [0, 1], [0, 2]], [[1, 0], [1, 1], [1 ,2]]]]
>>> [sum(x, []) for x in list1]
[[0, 0, 0, 1, 0, 2]]
>>> [sum(x, []) for a in list2 for x in a]
[[0, 0, 0, 1, 0, 2], [1, 0, 1, 1, 1, 2]]
>>>
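sum(x, []) builds a new list at every addition, so its cost grows quadratically with the number of points in a ring. For big inputs, itertools.chain.from_iterable does the same flattening in linear time. A sketch that handles both input types by checking the nesting depth (flatten_rings is a hypothetical helper name, and it assumes non-empty, well-formed coordinates):

```python
from itertools import chain

def flatten_rings(coords):
    """Flatten Polygon or MultiPolygon coordinates into one flat list per ring."""
    # Polygon nesting: rings -> points -> numbers, so coords[0][0][0] is a number.
    # MultiPolygon nesting has one more level, so coords[0][0][0] is a point.
    if isinstance(coords[0][0][0], (list, tuple)):
        rings = chain.from_iterable(coords)  # drop the polygon level
    else:
        rings = coords
    return [list(chain.from_iterable(ring)) for ring in rings]

list1 = [[[0, 0], [0, 1], [0, 2]]]
list2 = [[[[0, 0], [0, 1], [0, 2]], [[1, 0], [1, 1], [1, 2]]]]

print(flatten_rings(list1))  # [[0, 0, 0, 1, 0, 2]]
print(flatten_rings(list2))  # [[0, 0, 0, 1, 0, 2], [1, 0, 1, 1, 1, 2]]
```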
Casting your data to numpy.array, you can use reshape:
import numpy as np
t = np.array([[[[0, 0] , [0, 1], [0, 2]], [[1, 0], [1, 1], [1 ,2]]]])
print(t.shape) # (1, 2, 3, 2)
t = np.reshape(t, (1, 2, 6)) # merging the last 2 coordinates/axes
This flattens the second list as you want, up to a leading axis of length 1.
Code that works for both lists (since in both cases you want to merge the last two axes) is:
t = np.array(yourList)
newShape = t.shape[:-2] + (t.shape[-2] * t.shape[-1], ) # this assumes your
# arrays are always at least 2 dimensional (no need to flatten them otherwise...)
t = t.reshape(newShape)
The key thing is to keep the shape unchanged up to the last 2 axes (hence t.shape[:-2]), but to merge the two last axes together (using an axis of length t.shape[-2] * t.shape[-1]).
We are creating the new shape by concatenating these two tuples (hence the extra comma after the multiplication).
Edit: see the np.reshape() documentation. The important parameters are the input array (your list, cast as an array), and a tuple which I've called newShape, which represents the lengths along the new axes.
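To get exactly the two-level output asked for in the question, the leading axes can be collapsed as well by passing -1. A sketch, assuming every ring has the same number of points so that the input forms a rectangular array (flatten_last_two_axes is a hypothetical helper name):

```python
import numpy as np

def flatten_last_two_axes(coords):
    """Merge the point and coordinate axes, then collapse all leading axes."""
    t = np.asarray(coords)
    # -1 lets NumPy infer how many rings there are in total.
    return t.reshape(-1, t.shape[-2] * t.shape[-1])

list1 = [[[0, 0], [0, 1], [0, 2]]]
list2 = [[[[0, 0], [0, 1], [0, 2]], [[1, 0], [1, 1], [1, 2]]]]

print(flatten_last_two_axes(list1).tolist())  # [[0, 0, 0, 1, 0, 2]]
print(flatten_last_two_axes(list2).tolist())  # [[0, 0, 0, 1, 0, 2], [1, 0, 1, 1, 1, 2]]
```

Real GeoJSON rings often differ in length, in which case the data is not rectangular and a pure-Python flattening is the safer choice.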

How to convert list of tuple to an array

new = zero(rows_A,cols_B)
for i in range(rows_A):
    for j in range(cols_B):
        new[i][j] += np.sum(A[i] * B[:,j])
If B is a list of this form, [[0, 0, 0], [0, 1, 0], [0, 2, 1]],
it gives me an error:
TypeError: list indices must be integers, not tuple
but if I use the same array B in place of A, it works well.
The array I get back looks like this:
[[0, 0, 0], [0, 1, 0], [0, 2, 1]]
so I want to convert it into this form:
[[0 0 0]
[0 1 0]
[0 2 1]]
numpy.asarray will do that.
import numpy as np
B = np.asarray([[0, 0, 0], [0, 1, 0], [0, 2, 1]])
This produces
array([[0, 0, 0],
[0, 1, 0],
[0, 2, 1]])
which can be indexed with [:, j].
Also, it looks like you're trying to do a matrix product. You can do the same thing with just one line of code using np.dot:
new = np.dot(A, B)
It appears that B is a list. You can't index it as B[:,i], which is implicitly passed to __getitem__ as (slice(None,None,None),i), i.e. a tuple.
You could convert B to a numpy array first (B = np.array(B)) and then go from there ...
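A small sketch that reproduces the error on a plain list and then checks the loop from the question against np.dot once both operands are arrays. The values of A are made up for illustration, and np.zeros stands in for the question's zero helper, whose definition is not shown:

```python
import numpy as np

A = [[1, 2, 3], [4, 5, 6]]  # example values, not from the question
B = [[0, 0, 0], [0, 1, 0], [0, 2, 1]]

# A plain list accepts only integers or slices, so B[:, 0] raises TypeError.
try:
    B[:, 0]
except TypeError as exc:
    print("TypeError:", exc)

A = np.asarray(A)
B = np.asarray(B)

# Loop version from the question, with a NumPy array as the accumulator.
new = np.zeros((A.shape[0], B.shape[1]), dtype=A.dtype)
for i in range(A.shape[0]):
    for j in range(B.shape[1]):
        new[i, j] += np.sum(A[i] * B[:, j])

print(np.array_equal(new, np.dot(A, B)))
```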

Python Two-Dimensional Query..

I am seeing very unusual behavior in Python. Kindly let me know what I am doing wrong!
bc = [[0]*(n+1)]*(n+1)
for i in range(n+1):
    bc[i][i] = 1
print (bc)
Output
[[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]
I am trying to initialize the diagonal elements of a two-dimensional array to 1, but it is setting all the elements to 1. I think I am doing something wrong when accessing the two-dimensional array.
Also, kindly let me know how I can use two loops to access all the elements of a two-dimensional array, which is my next step.
Thanks.
Your array is initialized incorrectly. The correct way to initialize a 2d array is this:
bc = [[0 for j in range(n + 1)] for i in range(n + 1)]
It's a common mistake, but the * operator copies the reference to a list rather than copying the list, so while it looks like you have a 2d list, you actually have a 1d list of references to the same list.
The problem is that each array in your array is the same array in memory; you need a new array each time. For example, [[0]]*6 makes a list holding the same inner array 6 times, so editing one of them updates the others.
e.g.
>>> x=[1]
>>> y=x
>>> x.append(3)
>>> x
[1, 3]
>>> y
[1, 3]
>>> z=[x]*3
>>> x.append(6)
>>> z
[[1, 3, 6], [1, 3, 6], [1, 3, 6]]
Here is a fix that simply makes bc hold n+1 different arrays:
n=4
bc = [[0]*(n+1) for i in range(n+1)]
for i in range(n+1):
    bc[i][i] = 1
print (bc)
[[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 1, 0, 0], [0, 0, 0, 1, 0], [0, 0, 0, 0, 1]]
Try this one:
bc = [[0 for i in range(n+1)] for j in range(n+1)]
In your example you have only one (!) instance of [0] which is referenced multiple times. So if you change that instance, all references are changed.
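The difference can be checked directly with the is operator, and the same nested-list layout answers the follow-up about visiting every element with two loops. A minimal sketch (the name shared and the value n = 3 are only for illustration):

```python
n = 3

# Repeating the outer list with * copies references: every row is one object.
shared = [[0] * (n + 1)] * (n + 1)
print(shared[0] is shared[1])  # True

# A comprehension evaluates [0] * (n + 1) once per row, giving separate lists.
bc = [[0] * (n + 1) for _ in range(n + 1)]
print(bc[0] is bc[1])  # False

for i in range(n + 1):
    bc[i][i] = 1

# Two nested loops visit every element of the two-dimensional list.
for i in range(len(bc)):
    for j in range(len(bc[i])):
        print(bc[i][j], end=" ")
    print()
```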
