I'm trying to create a function that will calculate the lattice distance (number of horizontal and vertical steps) between elements in a multi-dimensional numpy array. For this I need to retrieve the actual numbers from the indexes of each element as I iterate through the array. I want to store those values as numbers that I can run through a distance formula.
For the example array A
A=np.array([[1,2,3],[4,5,6],[7,8,9]])
I'd like to create a loop that iterates through each element and for the first element 1 it would retrieve a=0, b=0 since 1 is at A[0,0], then a=0, b=1 for element 2 as it is located at A[0,1], and so on...
My envisioned output is two numbers (corresponding to the two index values for that element) for each element in the array. So in the example above, it would be the two values that I am assigning to be a and b. I only will need to retrieve these two numbers within the loop (rather than save separately as another data object).
Any thoughts on how to do this would be greatly appreciated!
As I've become more familiar with the numpy and pandas ecosystem, it's become clearer to me that iteration is usually the wrong tool because of how slow it is, and that writing a vectorized operation is best whenever possible. Though the style is not as obvious/Pythonic at first, I've (anecdotally) gained ridiculous speedups with vectorized operations: more than 1000x in one case, by swapping out a row-iterating .apply(lambda) form.
@MSeifert's answer covers this much better and will be significantly more performant on a dataset of any real size.
A more general answer by @cs95 covers and compares alternatives to iteration in pandas.
Original Answer
You can iterate through the values in your array with numpy.ndenumerate to get the indices of the values in your array.
Using the documentation above:
A = np.array([[1,2,3],[4,5,6],[7,8,9]])
for index, values in np.ndenumerate(A):
    print(index, values)  # operate here
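To get the two index values as separate numbers a and b, as the question asks, the index tuple can be unpacked directly in the loop header. The following is a minimal sketch; the reference position `ref` and the `distances` dictionary are assumptions added only to show the indices being used in a lattice distance formula.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
ref = (2, 1)  # hypothetical reference position, chosen only for illustration

distances = {}
for (a, b), value in np.ndenumerate(A):
    # a and b are the row and column index of `value`
    distances[(a, b)] = abs(a - ref[0]) + abs(b - ref[1])

print(distances[(0, 0)])  # 3
```

Storing the results in a dictionary is only for the demonstration; inside the loop a and b can be used directly without saving them.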
You can do it using np.ndenumerate but generally you don't need to iterate over an array.
You can simply create a meshgrid (or open grid) to get all indices at once and you can then process them (vectorized) much faster.
For example
>>> x, y = np.mgrid[slice(A.shape[0]), slice(A.shape[1])]
>>> x
array([[0, 0, 0],
[1, 1, 1],
[2, 2, 2]])
>>> y
array([[0, 1, 2],
[0, 1, 2],
[0, 1, 2]])
and these can be processed like any other array. So if your function that needs the indices can be vectorized you shouldn't do the manual loop!
For example, to calculate the lattice distance from each point to a given point, say (2, 3):
>>> abs(x - 2) + abs(y - 3)
array([[5, 4, 3],
[4, 3, 2],
[3, 2, 1]])
For distances an ogrid would be faster. Just replace np.mgrid with np.ogrid:
>>> x, y = np.ogrid[slice(A.shape[0]), slice(A.shape[1])]
>>> np.hypot(x - 2, y - 3) # cartesian distance this time! :-)
array([[ 3.60555128, 2.82842712, 2.23606798],
[ 3.16227766, 2.23606798, 1.41421356],
[ 3. , 2. , 1. ]])
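If the goal is the lattice distance between every pair of elements rather than to one fixed point, the same vectorized idea extends with broadcasting. This is a sketch under the assumption that an n-by-n result fits in memory (n being the number of elements); it uses np.indices instead of np.mgrid, and the names `coords` and `pairwise` are made up for the example.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# One (row, col) pair per element, in the same order as A.ravel()
coords = np.indices(A.shape).reshape(A.ndim, -1).T  # shape (9, 2)

# Broadcast so that every coordinate pair is compared with every other one
pairwise = np.abs(coords[:, None, :] - coords[None, :, :]).sum(axis=-1)  # shape (9, 9)

print(pairwise[0, 8])  # steps from A[0, 0] to A[2, 2] -> 4
```

Row i of `pairwise` holds the distances from the i-th element (in flattened order) to all others, so it can be reshaped back to A.shape if a per-element map is wanted.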
Another possible solution:
import numpy as np
A=np.array([[1,2,3],[4,5,6],[7,8,9]])
for _, val in np.ndenumerate(A):
    ind = np.argwhere(A == val)
    print(val, ind)
If a value appears more than once in the array, ind holds the indexes of every occurrence.
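To illustrate the behaviour with repeated values, here is a small sketch using a hypothetical array `B` (not from the question) in which 2 occurs twice:

```python
import numpy as np

B = np.array([[1, 2], [2, 3]])  # hypothetical array in which 2 occurs twice

positions = np.argwhere(B == 2)  # one [row, col] pair per occurrence
print(positions)
# [[0 1]
#  [1 0]]
```

Note that each np.argwhere call scans the whole array, so calling it once per element inside a loop gets slow on large inputs.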
Related
I often need to convert a multi-column (or 2D) numpy array into an indicator vector in a stable (i.e., order preserved) manner.
For example, I have the following numpy array:
import numpy as np
arr = np.array([
[2, 20, 1],
[1, 10, 3],
[2, 20, 2],
[2, 20, 1],
[1, 20, 3],
[2, 20, 2],
])
The output I would like to have is:
indicator = [0, 1, 2, 0, 3, 2]
How can I do this (preferably using numpy only)?
Notes:
I am looking for a high performance (vectorized) approach as the arr (see the example above) has millions of rows in a real application.
I am aware of the following auxiliary solutions, but none is ideal. It would be nice to hear experts' opinions.
My thoughts so far:
1. Numpy's unique: This would not work, as it is not stable:
arr_unq, indicator = np.unique(arr, axis=0, return_inverse=True)
print(arr_unq)
# output 1:
# [[ 1 10 3]
# [ 1 20 3]
# [ 2 20 1]
# [ 2 20 2]]
print(indicator)
# output 2:
# [2 0 3 2 1 3]
Notice how the indicator starts from 2. This is because unique function returns a "sorted" array (see output 1). However, I would like it to start from 0.
Of course I can use LabelEncoder from sklearn to convert the items in a manner that they start from 0 but I feel that there is a simple numpy trick that I can use and therefore avoid adding sklearn dependency to my program.
Or I can resolve this by a dictionary mapping like below, but I can imagine that there is a better or more elegant solution:
dct = {}
for idx, item in enumerate(indicator):
    if item not in dct:
        dct[item] = len(dct)
    indicator[idx] = dct[item]
print(indicator)
# outputs:
# [0 1 2 0 3 2]
2. Stabilizing numpy's unique output: This solution is already posted on Stack Overflow and correctly returns a stable unique array. But I do not know how to convert the returned indicator vector (returned when return_inverse=True) to represent the values in a stable order starting from 0.
3. Pandas's get_dummies function: This returns a "hot encoding" (a matrix of indicator values), whereas I would like an indicator vector. It is possible to convert the "hot encoding" to an indicator vector with a few lines of code and data manipulation, but that approach is not going to be highly efficient.
In addition to return_inverse, you can add the return_index option. This will tell you the first occurrence of each sorted item:
unq, idx, inv = np.unique(arr, axis=0, return_index=True, return_inverse=True)
Now you can use the fact that np.argsort is its own inverse to fix the order. Note that idx.argsort() reorders unq by first appearance in arr. The corrected result is therefore
indicator = idx.argsort().argsort()[inv]
And of course the byproduct
unq = unq[idx.argsort()]
Of course, nothing about these operations is specific to 2D.
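Putting the pieces together on the array from the question gives the following sketch. The function name `stable_indicator` is made up for this example, and the reshape of the inverse is a defensive assumption so that it is 1-D regardless of the NumPy version in use.

```python
import numpy as np

def stable_indicator(arr):
    """Label each row by the order in which it first appears."""
    unq, idx, inv = np.unique(arr, axis=0, return_index=True, return_inverse=True)
    inv = inv.reshape(-1)   # defensive: keep the inverse 1-D
    order = idx.argsort()   # order[k] = index in unq of the k-th row to appear
    return order.argsort()[inv], unq[order]

arr = np.array([[2, 20, 1], [1, 10, 3], [2, 20, 2],
                [2, 20, 1], [1, 20, 3], [2, 20, 2]])
indicator, unq = stable_indicator(arr)
print(indicator)  # [0 1 2 0 3 2]
```

The second return value holds the unique rows in order of first appearance, so `unq[indicator]` reconstructs the original array.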
A Note on the Intuition
Let's say you have an array x:
x = np.array([7, 3, 0, 1, 4])
x.argsort() is the index that tells you what elements of x are placed at each of the locations in the sorted array. So
i = x.argsort() # 2, 3, 1, 4, 0
But how would you get from np.sort(x) back to x (which is the problem you express in #2)?
Well, it happens that i tells you the original position of each element in the sorted array: the first (smallest) element was originally at index 2, the second at 3, ..., the last (largest) element was at index 0. This means that to place np.sort(x) back into its original order, you need the index that puts i into sorted order. That means that you can write x as
np.sort(x)[i.argsort()]
Which is equivalent to
x[i][i.argsort()]
OR
x[x.argsort()][x.argsort().argsort()]
So, as you can see, np.argsort is effectively its own inverse: argsorting something twice gives you the index to put it back in the original order.
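The round trip can be checked directly on the example array. This is a small sketch; the names `ranks` and `restored` are introduced here only for illustration.

```python
import numpy as np

x = np.array([7, 3, 0, 1, 4])
i = x.argsort()       # [2 3 1 4 0]
ranks = i.argsort()   # [4 2 0 1 3], the rank of each element of x

restored = np.sort(x)[ranks]
print(np.array_equal(restored, x))  # True
```

Seen this way, the double argsort is simply the rank of each element, which is why indexing the sorted array with it recovers the original order.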
To make a long story short, I'm trying to generate all the possible permutations of a set of numpy arrays. I have three numbers [j,k,m] and I would like to specify a maximum value for each one [J,K,M]. How would I then get all the combinations of arrays under these values? How could I force the k values to always be even as well? For instance:
So with the max values set to [1,2,2], the permutations would be: [0,0,0], [0,0,1], [0,0,2], [0,2,0], [0,2,1], [0,2,2], [1,0,0], [1,0,1] ...
I realise I don't have any example code to show, but I'm afraid I have literally no idea where to start with this.
From other answers it seems like sympy would be of some use?
I found an answer that might interest you and generalised it. You can construct a list of the possible values for each item like so:
X = [[0, 1], [0, 1, 2], [0, 1, 2]]
And then use:
np.array(np.meshgrid(*X)).T.reshape(-1, len(X))
The output contains all 18 combinations. If you only have the maximum values [J, K, M], you can construct X using X = [range(J+1), range(K+1), range(M+1)]
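The question also asks for k to be even, which the list above does not enforce. One way to handle that, sketched here as an assumption rather than as part of the original answer, is to give the range for k a step of 2. The sketch also uses indexing="ij" with np.stack instead of the transpose, so that the rows come out in the order listed in the question.

```python
import numpy as np

J, K, M = 1, 2, 2  # maximum values from the question

# range(0, K + 1, 2) keeps only the even values of k
X = [range(J + 1), range(0, K + 1, 2), range(M + 1)]

# indexing="ij" makes the rows come out in lexicographic order
grids = np.meshgrid(*X, indexing="ij")
combos = np.stack(grids, axis=-1).reshape(-1, len(X))

print(combos[:4])
# [[0 0 0]
#  [0 0 1]
#  [0 0 2]
#  [0 2 0]]
```

With these maxima the result has 12 rows instead of 18, because the odd value k = 1 is excluded.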
I have two numpy arrays:
import numpy as np
a=np.array([1,-2,3])
b=np.array([-2,-1,4])
I know how to create an array of the minimum of each pair of entries:
np.minimum(a,b)
array([-2, -2, 3])
And how to get an array of the absolute values of the smallest-magnitude values:
np.minimum(abs(a),abs(b))
array([1, 1, 3])
But what I would like is an array of the smallest magnitude values but retaining the sign of the values, in other words to get
array([1,-1,3])
as my output... I can't think of a python-esque way of doing this in one line, only resorting to long-winded loops and if-thens...
Use np.where with absolute values as conditions and original arrays as return elements:
np.where(np.abs(a) > np.abs(b), b, a)
# array([ 1, -1, 3])
The use of np.where is the best answer, but I just worked out an alternative solution using a python iteration:
np.array([a[i] if abs(a[i])<abs(b[i]) else b[i] for i in range(len(a))])
The np.where function will be much faster of course.
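np.where handles exactly two arrays. If more than two are involved, one possible generalisation (a sketch, not something the answers above propose) is to stack the arrays and pick, per position, the entry with the smallest magnitude. The third array `c` is hypothetical and added only for illustration.

```python
import numpy as np

a = np.array([1, -2, 3])
b = np.array([-2, -1, 4])
c = np.array([5, 3, -1])  # hypothetical third array, added for illustration

stacked = np.stack([a, b, c])            # one row per input array
winner = np.abs(stacked).argmin(axis=0)  # row with the smallest magnitude, per column
result = np.take_along_axis(stacked, winner[None, :], axis=0)[0]

print(result)  # [ 1 -1 -1]
```

On ties, argmin picks the first array in the stack, so the order of the inputs decides which sign is kept.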
I have a 3D array of int32. I would like to replace each item in the array with the value of its bit at the n-th position. My current approach is to loop through the whole array, but I think it can be done much more efficiently.
for z in range(0, dim[2]):
    for y in range(0, dim[1]):
        for x in range(0, dim[0]):
            byte = '{0:032b}'.format(array[z][y][x])
            # parse the bit string back to an int before shifting
            array[z][y][x] = (int(byte, 2) >> n) & 1
Looking forward to your answers.
If you are dealing with large arrays, you are better off using numpy. Applying bitwise operations on a numpy array is much faster than applying it on python lists.
import numpy as np
a = np.random.randint(1,65, (2,2,2))
print(a)
Out[12]:
array([[[37, 46],
        [47, 34]],
       [[ 3, 15],
        [44, 57]]])
print((a >> 1) & 1)
Out[16]:
array([[[0, 1],
        [1, 1]],
       [[1, 1],
        [0, 0]]])
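For an arbitrary bit position the same expression applies with n in place of 1. The sketch below assumes n counts from 0 at the least significant bit; the array shape, the seed and the value of n are arbitrary choices for the demonstration, and the result is cross-checked against an explicit loop.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility
arr = rng.integers(0, 2**31 - 1, size=(4, 3, 2), dtype=np.int32)
n = 5  # hypothetical bit position, counted from 0 at the least significant bit

bits = (arr >> n) & 1  # vectorized: no Python-level loop

# Cross-check against an explicit Python loop
expected = np.empty_like(arr)
for idx, value in np.ndenumerate(arr):
    expected[idx] = (int(value) >> n) & 1

print(np.array_equal(bits, expected))  # True
```

The vectorized line returns a new array; to overwrite in place, as the loop in the question does, something like `arr[...] = (arr >> n) & 1` should work.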
Unless there is an intrinsic relation between the different points, you have no other choice than to loop over them to discover their current values. So the best you can do will always be O(n^3).
What I don't get, however, is why you go through the hassle of converting a number to a 32-bit string, then back to int.
If you want to check if the nth bit of a number is set, you would do the following:
power_n = 1 << (n - 1)
for z in range(0, dim[2]):
    for y in range(0, dim[1]):
        for x in range(0, dim[0]):
            array[z][y][x] = 0 if array[z][y][x] & power_n == 0 else 1
Note that in this example, I'm assuming that n is 1-indexed (the first bit is at n=1).
Consider the following NumPy array:
a = np.array([[1,4], [2,1],(3,10),(4,8)])
This gives an array that looks like the following:
array([[ 1, 4],
[ 2, 1],
[ 3, 10],
[ 4, 8]])
What I'm trying to do is find the minimum value of the second column (which in this case is 1), and then report the other value of that pair (in this case 2). I've tried using something like argmin, but that gets tripped up by the 1 in the first column.
Is there a way to do this easily? I've also considered sorting the array, but I can't seem to get that to work in a way that keeps the pairs together. The data is being generated by a loop like the following, so if there's an easier way to do this that isn't a numpy array, I'd take that as an answer too:
results = np.zeros((100,2))
# Loop over search range, change kappa each time
for i in range(100):
    results[i,0] = function1(x)
    results[i,1] = function2(y)
How about
a[np.argmin(a[:, 1]), 0]
Break-down
a. Grab the second column
>>> a[:, 1]
array([ 4, 1, 10, 8])
b. Get the index of the minimum element in the second column
>>> np.argmin(a[:, 1])
1
c. Index a with that to get the corresponding row
>>> a[np.argmin(a[:, 1])]
array([2, 1])
d. And take the first element
>>> a[np.argmin(a[:, 1]), 0]
2
Using np.argmin is probably the best way to tackle this. To do it in pure python, you could use:
min(tuple(r[::-1]) for r in a)[::-1]
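Note that the one-liner above returns the whole pair, (2, 1), rather than only the paired value. A variant that reads just the first entry is sketched below, using the key argument of the built-in min; the name `row` is introduced for the example.

```python
import numpy as np

a = np.array([[1, 4], [2, 1], [3, 10], [4, 8]])

# Pick the row whose second entry is smallest, then read its first entry
row = min(a, key=lambda r: r[1])
print(row[0])  # 2
```

This iterates over the rows in Python, so for large arrays the np.argmin version remains the better choice.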