Convert n 2D arrays into a single array with lookup table - python

I have a series of n 2D arrays that are presented to a function as a 3D array of depth n. I want to generate a tuple of each set of values along the third axis, then replace each of these tuples with a single index value and a lookup table.
I'm working in Python with some large datasets, so the solution needs to scale; I'll probably use NumPy, but other solutions are welcome.
Here's what I've got so far:
In [313]: arr=np.array([[[0,0,0],[1,2,2],[3,0,0]],[[0,1,0],[1,3,2],[0,0,0]]])
In [314]: stacked = np.stack((arr[0], arr[1]), axis=2)
In [315]: pairs = stacked.reshape(-1, arr.shape[0])
In [316]: pairs
Out[316]:
array([[0, 0],
[0, 1],
[0, 0],
[1, 1],
[2, 3],
[2, 2],
[3, 0],
[0, 0],
[0, 0]])
In [317]: unique = set([tuple(a) for a in pairs])
In [318]: lookup = sorted(list(unique))
In [319]: lookup
Out[319]: [(0, 0), (0, 1), (1, 1), (2, 2), (2, 3), (3, 0)]
Now, I want to create an output array, using the indexes of the values in the lookup table, so the output would be:
[0, 1, 0, 2, 4, 3, 5, 0, 0]
This example is just with two input 2D arrays, but there could be many more.

So, I've come up with a solution that produces the outputs I want, but is it the most efficient method of doing this? In particular, the lookup.index call is a bit costly. Does anyone have a better way?
def squash_array(arr):
    tuples = arr.T.reshape(-1, arr.shape[0])
    lookup = sorted(list(set([tuple(a) for a in tuples])))
    out_arr = np.array([lookup.index(tuple(a)) for a in tuples]).reshape(arr.shape[1:][::-1]).T
    return out_arr, lookup
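One possible way to avoid the per-element lookup.index call is to let np.unique build both the lookup table and the index array in a single pass. This is only a sketch: it assumes a NumPy version in which np.unique accepts the axis argument, and the name squash_array_unique is made up for illustration.

```python
import numpy as np

def squash_array_unique(arr):
    """Sketch: replace each depth-wise tuple with an index into a sorted lookup table."""
    n = arr.shape[0]
    # Move the depth axis last so each cell's n values form one row.
    tuples = np.moveaxis(arr, 0, -1).reshape(-1, n)
    # Unique rows come back sorted lexicographically; inverse maps each row to its index.
    lookup, inverse = np.unique(tuples, axis=0, return_inverse=True)
    # Flatten first, because the shape of inverse has varied between NumPy versions.
    out_arr = inverse.reshape(-1).reshape(arr.shape[1:])
    return out_arr, [tuple(row) for row in lookup.tolist()]

arr = np.array([[[0, 0, 0], [1, 2, 2], [3, 0, 0]],
                [[0, 1, 0], [1, 3, 2], [0, 0, 0]]])
out_arr, lookup = squash_array_unique(arr)
print(out_arr)
print(lookup)
```

Flattening out_arr gives the same [0, 1, 0, 2, 4, 3, 5, 0, 0] sequence shown in the question, and the work stays inside NumPy instead of a Python-level loop.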

Related

Convert data from numpy where() [duplicate]

I have a large set of data and I am trying to get it into a specific form (so I can reuse someone else's code). Here's an example of a smaller set I am working with.
>>> a = np.array([[0, 1, 2], [0, 2, 4], [0, 3, 6]])
>>> a
array([[0, 1, 2],
[0, 2, 4],
[0, 3, 6]])
>>> np.where(a==0)
(array([0, 1, 2]), array([0, 0, 0]))
So, what this returns is two arrays in a tuple.
The places that are 0 are (0,0), (1,0), and (2,0)
I'd like to get this data into this form:
[(0,0), (1,0), (2,0)]
Which is a list of tuples.
Appreciate any pointers.
Try numpy.argwhere
[tuple(x) for x in numpy.argwhere(a==0)]
list(zip(*np.where(a==0)))
How this works:
The zip function will produce a sequence of tuples, consisting of:
tuple containing element 0 from each of its arguments
tuple containing element 1 from each of its arguments
... etc..
So the elements of this sequence will be in the required form if the arguments to zip are the elements of the tuple returned by numpy.where. The use of * means to expand this tuple to separate positional parameters, as required, rather than passing in the tuple itself. It is then only necessary to call list() to iterate over the iterator returned by zip and convert the values into a list.
Example:
>>> a = np.array([[0, 1, 2], [0, 2, 4], [0, 3, 6]]) # array in the question
>>> list(zip(*np.where(a==0)))
[(0, 0), (1, 0), (2, 0)] # list of 2-tuples
>>> a = a.reshape(1,3,3) # now a 3d-array (adds slowest varying dimension of size 1)
>>> list(zip(*np.where(a==0)))
[(0, 0, 0), (0, 1, 0), (0, 2, 0)] # now you get a list of 3-tuples
You need this:
np.argwhere(a==0)
output for your example:
[[0 0]
[1 0]
[2 0]]
And if you need a list of tuples instead:
list(map(tuple,np.argwhere(a==0)))
output:
[(0, 0), (1, 0), (2, 0)]

Flatten a matrix into an array containing the index positions of the values

Given a numpy matrix, my_matrix.
import numpy as np
my_matrix = np.array([[1.2,2.3,None],[4.5,3.4,9.3]])
How can you efficiently flatten it into the following array containing the index positions of my_matrix?
[[0,0],[0,1],[0,2],[1,0],[1,1],[1,2]]
you can try:
rows, cols = my_matrix.shape
[[i, j] for i in range(rows) for j in range(cols)]
output:
[[0, 0], [0, 1], [0, 2], [1, 0], [1, 1], [1, 2]]
You can use numpy.indices() and reshape the returned values a little bit
np.indices(my_matrix.shape).transpose(1, 2, 0).reshape(-1, 2)
# array([[0, 0],
# [0, 1],
# [0, 2],
# [1, 0],
# [1, 1],
# [1, 2]])
You can create such a list easily with pure python:
from itertools import product
list(product(range(my_matrix.shape[0]), range(my_matrix.shape[1])))
Result is
[(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
If you are not using the explicit list but only want to iterate over the indices, leave the list(...) away. This will save memory and computation time, as the indices will be generated when they are used only.
However, if you want to use the result to index a numpy array, it may be more convenient to use np.ix_:
np.ix_(np.arange(my_matrix.shape[0]), np.arange(my_matrix.shape[1]))
Output is
(array([[0],
[1]]), array([[0, 1, 2]]))
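As a rough illustration of why np.ix_ is convenient for indexing, the sketch below selects a sub-block of rows and columns. The array demo and the chosen row and column lists are assumptions made for the example, not part of the question.

```python
import numpy as np

# Hypothetical 2x3 array, used only for illustration.
demo = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

# np.ix_ builds an open mesh, so this picks rows 0 and 1 at columns 0 and 2.
sub = demo[np.ix_([0, 1], [0, 2])]
print(sub)
```

Indexing with the plain lists demo[[0, 1], [0, 2]] would instead pair the indices up and return only the two elements at (0, 0) and (1, 2).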

Access the 2D index of an element while running an apply function over it in pandas/numpy?

I am trying to iterate over an array in numpy and applying a function over every element using some calculation on the index. So I have code that looks something like this:
# f takes in a matrix element and returns some calculation based on the
# element's 2D index i,j
def f(elt, i, j):
    return elt*i + elt*j
# create a 2x3 matrix, A
A = np.array([[1,2,3],
              [4,5,6]])
# Transform A by applying the function `f` over every element.
A_1 = applyFunction(f, A)
print(A_1)
# A_1 should now be a matrix that is transformed:
# [[0 2 6]
#  [4 10 18]]
It is very easy to do this using a for-loop, but my matrix is so big that a loop is not efficient in this case. I am trying to use numpy's builtin methods like apply or apply_along_axis.
I have also thought about converting the matrix to a pandas DataFrame and then maybe using the column and row names as the indices, but I don't know how to access those in an apply_along_axis function call.
Any help would be appreciated. Thanks!
def f(elt, i,j):
    return (i,j)
A = [[1,2,3],
[4,5,6]]
In [306]: [[f(None,i,j) for j in range(len(A[0]))] for i in range(len(A))]
Out[306]: [[(0, 0), (0, 1), (0, 2)], [(1, 0), (1, 1), (1, 2)]]
An array solution, with probably about the same speed:
In [309]: np.frompyfunc(f,3,1)(None, [[0],[1]],[0,1,2])
Out[309]:
array([[(0, 0), (0, 1), (0, 2)],
[(1, 0), (1, 1), (1, 2)]], dtype=object)
In [310]: _.shape
Out[310]: (2, 3)
Fastest numpy approach, but doesn't use your f function:
In [312]: I,J = np.meshgrid(range(2),range(3),indexing='ij')
In [313]: I
Out[313]:
array([[0, 0, 0],
[1, 1, 1]])
In [314]: J
Out[314]:
array([[0, 1, 2],
[0, 1, 2]])
In [315]: np.stack((I,J), axis=2)
Out[315]:
array([[[0, 0],
[0, 1],
[0, 2]],
[[1, 0],
[1, 1],
[1, 2]]])
In [316]: _.shape
Out[316]: (2, 3, 2)
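For the specific f in the question (elt*i + elt*j), the index arrays can be combined with A directly, with no per-element function call. A minimal sketch, assuming f can be written with ordinary array arithmetic; it uses np.indices rather than meshgrid, which gives the same I and J here.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])

# I[i, j] == i and J[i, j] == j, both with the same shape as A.
I, J = np.indices(A.shape)

# Vectorized equivalent of applying f(elt, i, j) = elt*i + elt*j to every element.
A_1 = A * I + A * J
print(A_1)
```

This only works when f is expressible in whole-array operations; an arbitrary Python function would still need something like the frompyfunc approach above.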

Python reshape list to ndim array

Hi, I have a list, flat, of length 2800; it contains 100 results for each of 28 variables. Below is an example with 4 results for 2 variables:
[0,
0,
1,
1,
2,
2,
3,
3]
I would like to reshape the list to an array (2,4) so that the results for each variable are in a single element.
[[0,1,2,3],
[0,1,2,3]]
You can think of reshaping as filling the new shape row by row (the last dimension varies fastest) from the flattened original list/array.
If you want to fill an array by column instead, an easy solution is to shape the list into an array with reversed dimensions and then transpose it:
x = np.reshape(list_data, (100, 28)).T
The snippet above produces a 28x100 array, filled column-wise.
To illustrate, here are the two options of shaping a list into a 2x4 array:
np.reshape([0, 0, 1, 1, 2, 2, 3, 3], (4, 2)).T
# array([[0, 1, 2, 3],
# [0, 1, 2, 3]])
np.reshape([0, 0, 1, 1, 2, 2, 3, 3], (2, 4))
# array([[0, 0, 1, 1],
# [2, 2, 3, 3]])
You can specify the interpretation order of the axes using the order parameter:
np.reshape(arr, (2, -1), order='F')
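To make the effect of order concrete, here is a small sketch using the 8-element example from the question; the variable names are just for illustration.

```python
import numpy as np

flat = [0, 0, 1, 1, 2, 2, 3, 3]

# order='F' fills the result column by column (first index varies fastest).
by_column = np.reshape(flat, (2, -1), order='F')

# The default order='C' fills row by row (last index varies fastest).
by_row = np.reshape(flat, (2, -1))

print(by_column)
print(by_row)
```

With order='F' each variable ends up in its own row, which matches the reshape-then-transpose approach.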
Step by step:
# import numpy library
import numpy as np
# create list
my_list = [0,0,1,1,2,2,3,3]
# convert list to numpy array
np_array=np.asarray(my_list)
# reshape array into 4 rows x 2 columns, and transpose the result
reshaped_array = np_array.reshape(4, 2).T
#check the result
reshaped_array
array([[0, 1, 2, 3],
[0, 1, 2, 3]])
The answers above are good. Adding a case that I used,
in case you don't want to use numpy and want to keep the data as a list without changing its contents.
You can run a small loop to change the dimension from 1xN to Nx1.
tmp=[]
for b in bus:
    tmp.append([b])
bus=tmp
It may not be efficient for very large lists, but it works for a small set of numbers.
Thanks

A function for returning a list of tuples that correspond to indices of ALL elements of an ndarray sorted?

I'm aware of numpy.argsort(), but what it does is return indices of elements in an array that would be sorted along a certain axis.
What I need is to sort all the values in an N-dimensional array and get a linear list of index tuples as a result.
Like this:
>>> import numpy
>>> A = numpy.array([[7, 8], [9, 5]])
>>> numpy.magic(A)
[(1, 0), (0, 1), (0, 0), (1, 1)]
P.S. I don't even understand what the output of argsort is trying to tell me for this array.
np.argsort(A) is sorting each row of A separately. For example,
In [21]: np.argsort([[6,5,4],[3,2,1]])
Out[21]:
array([[2, 1, 0],
[2, 1, 0]])
Instead, you want to flatten your array into a 1-dimensional array of values, then argsort that. That can be done by setting the axis parameter to None (thanks to @Akavall for pointing this out):
In [23]: np.argsort(A, axis=None)
Out[23]: array([3, 0, 1, 2])
Then use np.unravel_index to recover the associated index in A.
In [14]: import numpy as np
In [15]: A = np.array([[7, 8], [9, 5]])
In [4]: np.column_stack(np.unravel_index(np.argsort(A, axis=None)[::-1], A.shape))
Out[4]:
array([[1, 0],
[0, 1],
[0, 0],
[1, 1]])
Note, for NumPy version 1.5.1 or older, np.unravel_index raises a ValueError if passed an array-like object for its first argument. In that case, you could use a list comprehension:
In [17]: [np.unravel_index(p, A.shape) for p in np.argsort(A, axis=None)[::-1]]
Out[17]: [(1, 0), (0, 1), (0, 0), (1, 1)]
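On newer NumPy versions, where np.unravel_index accepts an array of flat indices, the list of tuples asked for in the question can also be built without a per-element call. A sketch under that assumption; the variable names are illustrative.

```python
import numpy as np

A = np.array([[7, 8], [9, 5]])

# Flat positions of the elements, largest value first.
order = np.argsort(A, axis=None)[::-1]

# One index array per dimension of A.
coords = np.unravel_index(order, A.shape)

# Stack into rows of (i, j, ...) and convert to plain Python tuples.
sorted_indices = [tuple(ix) for ix in np.column_stack(coords).tolist()]
print(sorted_indices)  # [(1, 0), (0, 1), (0, 0), (1, 1)]
```

Dropping the [::-1] gives ascending order instead. With tied values, the relative order of the ties is not guaranteed by the default sort.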
