Given 2 lists of arrays (or 2 3D arrays) is there a smarter way in numpy, besides a loop, to get the multiplication of the first array of the first list times the first array of the second list and so on? I have a feeling I am overlooking the obvious. This is my current implementation:
import numpy as np
r = []
for i in range(np.shape(rz)[2]):
    r.append(ry[..., i] @ rz[..., i])
r = np.array(r)
Assuming that the last dimension is the same, numpy.einsum should do the trick:
import numpy as np
np.einsum('ijk,jmk->imk', ry, rz)
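As a rough check of the einsum answer, the sketch below compares it with the explicit loop and with a broadcasting matmul. The random 3x3x5 inputs and the names out, loop and alt are made up for illustration; only ry and rz come from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
ry = rng.random((3, 3, 5))  # assumed layout: five 3x3 matrices along the last axis
rz = rng.random((3, 3, 5))

# Batched product: out[i, m, k] = sum over j of ry[i, j, k] * rz[j, m, k]
out = np.einsum('ijk,jmk->imk', ry, rz)

# Reference: one matrix product per slice, stacked back along the last axis
loop = np.stack([ry[..., k] @ rz[..., k] for k in range(ry.shape[2])], axis=-1)

# Alternative: move the batch axis to the front so that @ broadcasts over it
alt = np.moveaxis(np.moveaxis(ry, -1, 0) @ np.moveaxis(rz, -1, 0), 0, -1)

print(np.allclose(out, loop), np.allclose(out, alt))
```

Note that the loop in the question collects the slices along the first axis, so 'ijk,jmk->kim' would be the subscript string that reproduces that layout.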
import numpy as np
A = np.array([[3, 6, 7], [5, -3, 0]])
B = np.array([[1, 1], [2, 1], [3, -3]])
C = A.dot(B)
print(C)
Output:
[[ 36 -12]
 [ -1   2]]
I have a list of numbers which I wish to add a second column such that the array becomes 2D like in the example below:
a = [1,1,1,1,1]
b = [2,2,2,2,2]
should become:
c = [[1,2],[1,2],[1,2],[1,2],[1,2]]
I am not sure how to do this using numpy?
I would just stack them and then transpose the resulting array with .T:
import numpy as np
a = np.array([1, 1, 1, 1, 1])
b = np.array([2, 2, 2, 2, 2])
c = np.stack((a, b)).T
Use numpy built-in functions:
import numpy as np
c = np.vstack((np.array(a),np.array(b))).T.tolist()
np.vstack stacks arrays vertically. .T transposes the array and tolist() converts it back to a list.
Another similar way to do it is to add a dimension using [:,None]; you can then stack the arrays horizontally without needing to transpose:
c = np.hstack((np.array(a)[:,None],np.array(b)[:,None])).tolist()
output:
[[1, 2], [1, 2], [1, 2], [1, 2], [1, 2]]
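For what it's worth, np.column_stack seems to cover this case in a single call, since it treats each 1-D input as one column. A minimal sketch using the lists from the question:

```python
import numpy as np

a = [1, 1, 1, 1, 1]
b = [2, 2, 2, 2, 2]

# column_stack turns each 1-D input into a column of the 2-D result
c = np.column_stack((a, b))
print(c.tolist())
```

Drop the .tolist() call if a NumPy array is wanted rather than a nested list.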
I have a NumPy array with each row representing some (x, y, z) coordinate like so:
a = array([[0, 0, 1],
[1, 1, 2],
[4, 5, 1],
[4, 5, 2]])
I also have another NumPy array with unique values of the z-coordinates of that array like so:
b = array([1, 2])
How can I apply a function, let's call it "f", to each of the groups of rows in a which correspond to the values in b? For example, the first value of b is 1 so I would get all rows of a which have a 1 in the z-coordinate. Then, I apply a function to all those values.
In the end, the output would be an array the same shape as b.
I'm trying to vectorize this to make it as fast as possible. Thanks!
Example of an expected output (assuming that f is count()):
c = array([2, 2])
because there are 2 rows in array a which have a z value of 1 in array b and also 2 rows in array a which have a z value of 2 in array b.
A trivial solution would be to iterate over array b like so:
for val in b:
    apply function to a based on val
    append to an array c
My attempt:
I tried doing something like this, but it just returns an empty array.
func(a[a[:, 2]==b])
The problem is that the groups of rows with the same Z can have different sizes, so you cannot stack them into one 3D numpy array, which would have made it easy to apply a function along the third dimension. One solution is to use a for-loop, another is to use np.split:
a = np.array([[0, 0, 1],
[1, 1, 2],
[4, 5, 1],
[4, 5, 2],
[4, 3, 1]])
a_sorted = a[a[:,2].argsort()]
inds = np.unique(a_sorted[:,2], return_index=True)[1]
a_split = np.split(a_sorted, inds)[1:]
# [array([[0, 0, 1],
# [4, 5, 1],
# [4, 3, 1]]),
# array([[1, 1, 2],
# [4, 5, 2]])]
f = np.sum # example of a function
result = list(map(f, a_split))
# [19, 15]
But imho the best solution is to use pandas and groupby as suggested by FBruzzesi. You can then convert the result to a numpy array.
EDIT: For completeness, here are the other two solutions
List comprehension:
b = np.unique(a[:,2])
result = [f(a[a[:,2] == z]) for z in b]
Pandas:
import pandas as pd
df = pd.DataFrame(a, columns=list('XYZ'))
result = df.groupby(['Z']).apply(lambda x: f(x.values)).tolist()
This is the performance plot I got for a = np.random.randint(0, 100, (n, 3)):
As you can see, approximately up to n = 10^5 the "split solution" is the fastest, but after that the pandas solution performs better.
If you are allowed to use pandas:
import pandas as pd
df=pd.DataFrame(a, columns=['x','y','z'])
df.groupby('z').agg(f)
Here f can be any custom function working on grouped data.
Numeric example:
a = np.array([[0, 0, 1],
[1, 1, 2],
[4, 5, 1],
[4, 5, 2]])
df=pd.DataFrame(a, columns=['x','y','z'])
df.groupby('z').size()
z
1 2
2 2
dtype: int64
Note that .size() is the way to count the number of rows per group.
To keep it in pure numpy, maybe this can suit your case:
tmp = np.array([a[a[:,2]==i] for i in b])
tmp
array([[[0, 0, 1],
[4, 5, 1]],
[[1, 1, 2],
[4, 5, 2]]])
which is an array with each group of arrays.
c = np.array([])
for x in np.nditer(b):
    c = np.append(c, np.where((a[:,2] == x))[0].shape[0])
Output:
[2. 2.]
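If f is a simple reduction such as a count or a sum, the loop can probably be avoided altogether. The sketch below assumes exactly that: it uses np.unique to label the groups and np.bincount to reduce them. The names inverse, counts, x_sums and row_sums are illustrative, and an arbitrary f would still need one of the loop or split approaches above.

```python
import numpy as np

a = np.array([[0, 0, 1],
              [1, 1, 2],
              [4, 5, 1],
              [4, 5, 2]])

# Unique z values, a group label for every row, and the row count per group
b, inverse, counts = np.unique(a[:, 2], return_inverse=True, return_counts=True)

# Per-group sum of the x column (bincount with weights returns floats)
x_sums = np.bincount(inverse, weights=a[:, 0])

# Per-group sum over all entries of the rows in each group
row_sums = np.bincount(inverse, weights=a.sum(axis=1))

print(b, counts, x_sums, row_sums)
```

Here counts plays the role of c in the question, with one entry per value in b.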
I wish to get the element index that is the highest in the array.
import numpy as np
a = np.array([1,2,3,4])
print(np.where(a==a.max()))
Current output:
(array([3]),)
Expected output:
3
Use argmax, which returns the indices of the maximum values along an axis:
np.argmax(a)
3
If you don't supply the axis, it returns the index into the flattened array:
a = np.array([[1, 2, 3, 4], [2, 3, 3, 9]])
np.argmax(a)
7
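If the row and column are wanted instead of the flat index, np.unravel_index can convert one into the other. A small sketch on the same array; the names flat_index, row and col are just for illustration:

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [2, 3, 3, 9]])

flat_index = np.argmax(a)                         # index into the flattened array
row, col = np.unravel_index(flat_index, a.shape)  # convert back to (row, column)
print(flat_index, row, col)
```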
You can use np.argmax(). It will return the index of the highest value in your array.
For more details on the function, see the documentation.
np.argmax() also works for 2D arrays:
a = array([[10, 11, 12],
[13, 14, 15]])
np.argmax(a)
>>> 5
np.argmax(a, axis=0)
>>> array([1, 1, 1])
np.argmax(a, axis=1)
>>> array([2, 2])
Try this, it will return the index of the largest element in the array
import numpy as np
a = np.array([1,2,3,4])
print(np.argmax(a))
I have the following code:
import numpy as np
sample = np.random.random((10,10,3))
argmax_indices = np.argmax(sample, axis=2)
i.e. I take the argmax along axis=2 and it gives me a (10,10) matrix. Now, I want to set the values at these indices to 0. For this, I want to index the sample array. I tried:
max_values = sample[argmax_indices]
but it doesn't work. I want something like
max_values = sample[argmax_indices]
sample[argmax_indices] = 0
I simply validate by checking that max_values - np.max(sample, axis=2) should give a zero matrix of shape (10,10).
Any help will be appreciated.
Here's one approach -
m,n = sample.shape[:2]
I,J = np.ogrid[:m,:n]
max_values = sample[I,J, argmax_indices]
sample[I,J, argmax_indices] = 0
Sample step-by-step run
1) Sample input array :
In [261]: a = np.random.randint(0,9,(2,2,3))
In [262]: a
Out[262]:
array([[[8, 4, 6],
[7, 6, 2]],
[[1, 8, 1],
[4, 6, 4]]])
2) Get the argmax indices along axis=2 :
In [263]: idx = a.argmax(axis=2)
3) Get the shape and arrays for indexing into first two dims :
In [264]: m,n = a.shape[:2]
In [265]: I,J = np.ogrid[:m,:n]
4) Index using I, J and idx for storing the max values using advanced-indexing :
In [267]: max_values = a[I,J,idx]
In [268]: max_values
Out[268]:
array([[8, 7],
[8, 6]])
5) Verify that we are getting an all zeros array after subtracting np.max(a,axis=2) from max_values :
In [306]: max_values - np.max(a, axis=2)
Out[306]:
array([[0, 0],
[0, 0]])
6) Again using advanced-indexing assign those places as zeros and do one more level of visual verification :
In [269]: a[I,J,idx] = 0
In [270]: a
Out[270]:
array([[[0, 4, 6], # <=== Compare this against the original version
[0, 6, 2]],
[[1, 0, 1],
[4, 0, 4]]])
An alternative to np.ogrid is np.indices.
I, J = np.indices(argmax_indices.shape)
sample[I,J,argmax_indices] = 0
This can also be generalized to handle matrices of any dimensionality. The resulting function will set the largest value in every 1-d vector of the matrix along any dimension d desired (dimension 2 in the case of the original question) to 0 (or to whatever value is desired):
def set_zero(sample, d, val):
    """Set the max value along dimension d in matrix sample to value val."""
    argmax_idxs = sample.argmax(d)
    idxs = [np.indices(argmax_idxs.shape)[j].flatten() for j in range(len(argmax_idxs.shape))]
    idxs.insert(d, argmax_idxs.flatten())
    # Index with a tuple: newer NumPy versions no longer treat a list of
    # index arrays as a multidimensional index
    sample[tuple(idxs)] = val
    return sample
set_zero(sample, d=2, val=0)
(Tested for numpy 1.14.1 on python 3.6.4 and python 2.7.14)
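On NumPy 1.15 or later (newer than the version tested above), np.take_along_axis and np.put_along_axis may be a simpler route to the same result. A minimal sketch, assuming the (10, 10, 3) sample from the question; idx and expected are illustrative names:

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.random((10, 10, 3))

# Keep the reduced axis as length 1 so the index array lines up with sample
idx = np.expand_dims(sample.argmax(axis=2), axis=2)  # shape (10, 10, 1)

# Gather the maxima, then compare them with a direct max as a sanity check
max_values = np.take_along_axis(sample, idx, axis=2).squeeze(axis=2)
expected = sample.max(axis=2)
print(np.array_equal(max_values, expected))

# Overwrite the maxima in place
np.put_along_axis(sample, idx, 0, axis=2)
```

Both functions take the axis as a parameter, so the same three calls should work for any dimension without building the index grids by hand.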