I would like to apply a function to each of the 3x3 matrices in my (6890, 6890, 3, 3) NumPy array. So far I have tried vectorization on a smaller example with a simpler function, but it didn't work.
def myfunc(x):
    return np.linalg.norm(x)
m = np.arange(45).reshape(5,3,3)
t = m.shape[0]
r = np.zeros((t, t))
q = m[:,None,...] # m.swapaxes(1,2) # m[i] # m[j].T
f = np.vectorize(q, otypes=[np.float])
res = myfunc(f)
Is vectorization even the right approach to solve this efficiently, or should I try something else? I've also looked into numpy.apply_along_axis, but it only applies to 1-D subarrays.
You need to loop over each element and apply the function:
import numpy as np
# setup function
def myfunc(x):
    return np.linalg.norm(x)
# setup data array
data = np.arange(45).reshape(5, 3, 3)
# loop over elements and update
for item in np.nditer(data, op_flags=['readwrite']):
    item[...] = myfunc(item)
If you need to apply the function to each whole 3x3 matrix, use:
out_data = []
for item in data:
    out_data.append(myfunc(item))
Output:
[14.2828568570857, 39.761790704142086, 66.4529909033446, 93.32202312423365, 120.24974012445931]
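A hedged sketch of two alternatives to an explicit Python loop, using the question's smaller (5, 3, 3) example. The first relies on np.linalg.norm accepting an axis argument, so it only covers functions with that kind of support; the second (np.vectorize with a signature) works for an arbitrary per-matrix function but is still a Python loop internally. Both operate on the last two axes, so the same calls should apply unchanged to a (6890, 6890, 3, 3) array.

```python
import numpy as np

def myfunc(x):
    # Example per-matrix function: Frobenius norm of a single 3x3 matrix
    return np.linalg.norm(x)

m = np.arange(45).reshape(5, 3, 3)

# Option 1: with a 2-tuple axis, np.linalg.norm computes a matrix norm
# (Frobenius by default) for every trailing 3x3 block in one call.
norms_axis = np.linalg.norm(m, axis=(-2, -1))

# Option 2: for an arbitrary per-matrix function, np.vectorize with a
# gufunc-style signature passes each trailing (m, n) block to myfunc.
# This is a convenience wrapper around a Python loop, not a speed-up.
vec_myfunc = np.vectorize(myfunc, signature='(m,n)->()')
norms_vec = vec_myfunc(m)

print(norms_axis)
print(np.allclose(norms_axis, norms_vec))
```

Many np.linalg routines (for example np.linalg.det and np.linalg.inv) also accept stacks of matrices over the last two axes, so it is worth checking for such support before falling back to np.vectorize.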
I have a NumPy array M of shape NxM and a DataFrame tmp containing the cell coordinates (columns a and b) and the values (column n).
To assign values to the cells of M, I do:
M[tmp.a, tmp.b] = tmp1.n
However, I would like to assign the values only to those cells where M < tmp.n, something like:
M[M[tmp.a, tmp.b] < tmp1.n] = tmp1.n
I solved it this way:
s = np.shape(M)
M0 = np.zeros(s)  # same shape as M, so M < M0 compares cell by cell
M0[tmp1.a, tmp1.b] += tmp1.n
idx = np.where(M < M0)
M[idx] = M0[idx]
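If I understood you correctly, you can do something like this (np.maximum compares element by element, whereas the built-in max cannot compare arrays):
M[tmp.a, tmp.b] = np.maximum(tmp1.n, M[tmp.a, tmp.b])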
This can be done using NumPy logical (boolean) indexing:
# a logical (boolean) array
log = M < tmp.n
# apply it to source and target and use `+=` to add the values
M[log] += tmp.n[log]
If the arrays don't have the same shape, you can also pick a specific column:
log = M[:, 0] < tmp.n
# apply it to source and target and use `+=` to add the values
M[log, 0] += tmp.n[log]
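A minimal sketch of the "only overwrite when the new value is larger" update using np.maximum.at, a technique not used in the answers above. The small arrays a, b and n are hypothetical stand-ins for tmp.a, tmp.b and tmp.n (with pandas, something like tmp.a.to_numpy() would supply them). Unlike plain fancy-index assignment, where a repeated cell is written more than once and only one write survives, ufunc.at is unbuffered and applies every (row, col) pair.

```python
import numpy as np

# Hypothetical stand-ins for the DataFrame columns tmp.a, tmp.b and tmp.n
M = np.array([[5.0, 1.0, 0.0],
              [2.0, 7.0, 3.0]])
a = np.array([0, 0, 1, 1])          # row indices
b = np.array([0, 1, 1, 1])          # column indices; cell (1, 1) appears twice
n = np.array([4.0, 6.0, 8.0, 9.0])  # candidate values

# Element by element: M[a[k], b[k]] = max(M[a[k], b[k]], n[k]).
# Cells whose current value is already >= n[k] stay unchanged.
np.maximum.at(M, (a, b), n)

print(M)
```

Here M ends up as [[5, 6, 0], [2, 9, 3]]: cell (0, 0) keeps its 5, cell (0, 1) is raised to 6, and cell (1, 1) takes the larger of its two candidates.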
Suppose I have the following two arrays:
>>> a = np.random.normal(size=(5,))
>>> a
array([ 1.42185826, 1.85726088, -0.18968258, 0.55150255, -1.04356681])
>>> b = np.random.normal(size=(10,10))
>>> b
array([[ 0.64207828, -1.08930317, 0.22795289, 0.13990505, -0.9936441 ,
1.07150754, 0.1701072 , 0.83970818, -0.63938211, -0.76914925],
[ 0.07776129, -0.37606964, -0.54082077, 0.33910246, 0.79950839,
0.33353221, 0.00967273, 0.62224009, -0.2007335 , -0.3458876 ],
[ 2.08751603, -0.52128218, 1.54390634, 0.96715102, 0.799938 ,
0.03702108, 0.36095493, -0.13004965, -1.12163463, 0.32031951],
[-2.34856521, 0.11583369, -0.0056261 , 0.80155082, 0.33421475,
-1.23644508, -1.49667424, -1.01799365, -0.58232326, 0.404464 ],
[-0.6289335 , 0.63654201, -1.28064055, -1.01977467, 0.86871352,
0.84909353, 0.33036771, 0.2604609 , -0.21102014, 0.78748329],
[ 1.44763687, 0.84205291, 0.76841512, 1.05214051, 2.11847126,
-0.7389102 , 0.74964783, -1.78074088, -0.57582084, -0.67956203],
[-1.00599479, -0.93125754, 1.43709533, 1.39308038, 1.62793589,
-0.2744919 , -0.52720952, -0.40644809, 0.14809867, -1.49267633],
[-1.8240385 , -0.5416585 , 1.10750423, 0.56598464, 0.73927224,
-0.54362927, 0.84243497, -0.56753587, 0.70591902, -0.26271302],
[-1.19179547, -1.38993415, -1.99469983, -1.09749452, 1.28697997,
-0.74650318, 1.76384156, 0.33938808, 0.61647274, -0.42166111],
[-0.14147554, -0.96192206, 0.14434349, 1.28437894, -0.38865447,
-1.42540195, 0.93105528, 0.28993325, -1.16119916, -0.58244758]])
I have to find a way to round all values from b to the nearest value found in a.
Does anyone know a good way to do this in Python? I am at a total loss.
Here is something you can try
import numpy as np
def rounder(values):
    def f(x):
        idx = np.argmin(np.abs(values - x))
        return values[idx]
    return np.frompyfunc(f, 1, 1)
a = np.random.normal(size=(5,))
b = np.random.normal(size=(10,10))
rounded = rounder(a)(b)
print(rounded)
The rounder function takes the values we want to round to. It creates a function that takes a scalar and returns the closest element of the values array, and then turns that function into a broadcastable function with numpy.frompyfunc. This way you are not limited to 2-D arrays: NumPy handles the broadcasting for you, without explicit loops in your code (the Python function is still called once per element, and the result has dtype object).
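Because np.frompyfunc always returns an object array, a float result needs an explicit conversion. A short sketch (seeded random data, chosen here only for reproducibility) showing the conversion and, as an alternative, np.vectorize with otypes, which produces a float64 array directly:

```python
import numpy as np

def rounder(values):
    def f(x):
        # Closest element of values to the scalar x
        return values[np.argmin(np.abs(values - x))]
    return np.frompyfunc(f, 1, 1)

rng = np.random.default_rng(0)
a = rng.normal(size=(5,))
b = rng.normal(size=(10, 10))

rounded_obj = rounder(a)(b)          # dtype=object
rounded = rounded_obj.astype(float)  # ordinary float64 array

# np.vectorize lets you request the output dtype up front
nearest = np.vectorize(lambda x: a[np.argmin(np.abs(a - x))], otypes=[float])
rounded2 = nearest(b)

print(rounded_obj.dtype, rounded.dtype, np.array_equal(rounded, rounded2))
```

Neither variant is faster than a plain loop; both are mainly a convenience for arbitrary-shaped inputs.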
If you sort a, you can use bisect to find the index in a where each element of the subarrays of b would land:
import numpy as np
from bisect import bisect
a = np.random.normal(size=(5,))
b = np.random.normal(size=(10, 10))
a.sort()
size = a.size
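for sub in b:
    for ind2, ele in enumerate(sub):
        # hi=size-1 keeps i a valid index into a
        i = bisect(a, ele, hi=size-1)
        # if i == 0, a[i-1] is a[-1] (the largest value), which is never closer than a[0]
        i1, i2 = a[i], a[i-1]
        # sub is a view of a row of b, so this writes the result back into b
        sub[ind2] = i1 if abs(i1 - ele) < abs(i2 - ele) else i2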
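The bisect idea can also be vectorized with np.searchsorted, which runs the same binary search for every element of b in one call (a substitute technique, not part of the answer above). A sketch, assuming a has at least two elements; the helper name round_to_nearest is hypothetical, and exact ties go to the smaller neighbour:

```python
import numpy as np

def round_to_nearest(values, x):
    """Replace each element of x with the closest element of the 1-D array values.

    Assumes values has at least two elements.
    """
    s = np.sort(values)
    # Insertion points (like bisect.bisect_left), clipped so that
    # s[i - 1] and s[i] are always valid neighbours.
    i = np.clip(np.searchsorted(s, x), 1, s.size - 1)
    left, right = s[i - 1], s[i]
    # Keep whichever neighbour is closer; on an exact tie, keep the smaller one.
    return np.where(np.abs(x - left) <= np.abs(right - x), left, right)

a = np.array([1.0, -1.0, 0.5])
b = np.array([[0.0, 0.8],
              [2.0, -3.0]])
print(round_to_nearest(a, b))
```

This works for b of any shape, needs only temporaries the size of b, and does one O(log a.size) search per element instead of comparing against every element of a.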
This solution assumes a is always one-dimensional, while b can have any number of dimensions.
Create two temporary arrays that tile a and b into each other's dimensions (here both will have shape (5, 10, 10)).
at = np.tile(np.reshape(a, (-1,) + (1,) * b.ndim), (1,) + b.shape)
bt = np.tile(b, (a.size,) + (1,) * b.ndim)
For the nearest operation, take the absolute value of the difference between the two. The position of the minimum along the first dimension (axis 0) gives the index into a.
idx = np.argmin(np.abs(at-bt),axis=0)
All that is left is to select the values from array a using the index, which will return an array in the shape of b with the nearest values from a.
ans = a[idx]
This method can also be used (modifying how the index is calculated) to do other operations, such as a floor, ceil, etc.
Note that this solution can be memory-intensive, since each temporary array holds a.size times as many elements as b; this is not much of an issue with small arrays. A looping solution would use less memory at the cost of speed.
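The explicit np.tile copies are not strictly needed: reshaping a to shape (a.size, 1, ..., 1) lets broadcasting pair every element of a with every element of b. A sketch with seeded random data; the difference array is still a.size times the size of b, so the memory caveat still applies, but the two tiled copies are avoided:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=(5,))
b = rng.normal(size=(10, 10))

# (5,) -> (5, 1, 1): one trailing singleton axis per dimension of b
a_col = a.reshape((-1,) + (1,) * b.ndim)

# Broadcasting gives |a[k] - b[i, j]| with shape (5, 10, 10), without tiling
idx = np.argmin(np.abs(a_col - b), axis=0)  # index into a for each element of b
ans = a[idx]                                # nearest values, same shape as b
```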
I don't know NumPy, but I don't think knowledge of NumPy is needed to answer this question. Assuming an array can be iterated and modified like a list, the following code solves your problem with a nested loop that finds the closest value.
for i in range(len(b)):
    for k in range(len(b[i])):
        closest = a[0]
        for j in range(1, len(a)):
            if abs(a[j] - b[i][k]) < abs(closest - b[i][k]):
                closest = a[j]
        b[i][k] = closest
Disclaimer: a more pythonic approach may exist.
I have a function that gives me the index for a given value, e.g.:
def F(value):
    index = do_something(value)
    return index
I want to use this index to fill a huge NumPy array, called features, with 1s:
l = [1,4,2,3,7,5,3,6,.....]
NOTE: features.shape[0] = len(l)
for i in range(features.shape[0]):
    idx = F(l[i])
    features[i, idx] = 1
Is there a pythonic way to perform this (as the loop takes a lot of time if the array is huge)?
If you can vectorize F(value), i.e. make it accept an array of values and return an array of indices, you could write something like:
indices = np.arange(features.shape[0])
feature_indices = F(np.asarray(l))
features[indices, feature_indices] = 1
Try this:
i = np.arange(features.shape[0])  # rows
j = np.vectorize(F)(np.array(l))  # columns; np.vectorize still calls F once per element
features[i, j] = 1
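A self-contained sketch of the paired fancy-indexing idea. The real F is not shown in the question, so F here is a hypothetical stand-in that maps a value to its position in a hypothetical sorted array vocab (it only gives meaningful indices for values that are actually in vocab). Because it is built on np.searchsorted, it accepts a whole array at once, so there is no Python-level loop:

```python
import numpy as np

# Hypothetical "vocabulary": column k of features corresponds to vocab[k]
vocab = np.array([1, 2, 3, 4, 5, 6, 7])

def F(value):
    # Hypothetical stand-in for the question's F; works on scalars and arrays
    return np.searchsorted(vocab, value)

l = [1, 4, 2, 3, 7, 5, 3, 6]
features = np.zeros((len(l), vocab.size))

rows = np.arange(features.shape[0])  # one row per element of l
cols = F(np.asarray(l))              # one column index per element, in a single call
features[rows, cols] = 1             # sets features[rows[k], cols[k]] for every k
```

Row and column index arrays of equal length are paired element by element, which is why this replaces the loop; the per-element cost then sits inside NumPy rather than in Python.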
I wrote a small script to assign values to a NumPy array given their row and column coordinates:
gridarray = np.zeros([3,3])
gridarray_counts = np.zeros([3,3])
cols = np.random.randint(0, 3, 15)  # integers 0..2
rows = np.random.randint(0, 3, 15)
data = np.random.randint(0, 10, 15)
for nn in range(len(data)):
    gridarray[rows[nn], cols[nn]] += data[nn]
    gridarray_counts[rows[nn], cols[nn]] += 1
This way I know how many values fall into each grid cell and what their sum is. However, with arrays of length 100000+ it gets quite slow. Is there a way to do this without a for-loop?
Is an approach similar to this possible? I know this doesn't work yet:
gridarray[rows,cols] += data
gridarray_counts[rows,cols] += 1
I would use bincount for this, but for now bincount only takes 1-D arrays, so you'll need to write your own ndbincount, something like:
def ndbincount(x, weights=None, shape=None):
    if shape is None:
        shape = tuple(x.max(1) + 1)
    # flatten each (row, col, ...) coordinate into a single index
    x = np.ravel_multi_index(x, shape)
    out = np.bincount(x, weights, minlength=np.prod(shape))
    out.shape = shape
    return out
Then you can do:
gridarray = np.zeros([3,3])
cols = np.random.randint(0, 3, 15)
rows = np.random.randint(0, 3, 15)
data = np.random.randint(0, 10, 15)
x = np.vstack([rows, cols])
temp = ndbincount(x, data, gridarray.shape)
gridarray = gridarray + temp
gridarray_counts = ndbincount(x, shape=gridarray.shape)
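For reference, a self-contained sketch of the same bincount idea without the ndbincount helper, at the question's 100000+ scale. It uses np.random.default_rng instead of the deprecated np.random.random_integers; integers(0, 3, ...) draws from 0 to 2 inclusive, like random_integers(0, 2, ...).

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (3, 3)
rows = rng.integers(0, 3, 100_000)
cols = rng.integers(0, 3, 100_000)
data = rng.integers(0, 10, 100_000)

# Turn each (row, col) pair into one flat index, then sum/count per index
flat = np.ravel_multi_index((rows, cols), shape)
size = shape[0] * shape[1]
gridarray = np.bincount(flat, weights=data, minlength=size).reshape(shape)
gridarray_counts = np.bincount(flat, minlength=size).reshape(shape)
```

np.bincount returns float64 when weights are given and integers otherwise, so gridarray holds sums as floats and gridarray_counts holds integer counts.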
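You can also do this directly with np.add.at, which, unlike gridarray[rows, cols] += data, accumulates repeated (row, col) pairs:
np.add.at(gridarray, (rows, cols), data)
np.add.at(gridarray_counts, (rows, cols), 1)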
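A small check of how repeated (row, col) pairs behave, since repeats are exactly the situation in the question. NumPy documents that fancy-index += is buffered, so a repeated cell is updated only once, while the unbuffered np.add.at adds every occurrence:

```python
import numpy as np

rows = np.array([0, 0, 1, 0])
cols = np.array([1, 1, 2, 1])  # cell (0, 1) appears three times
data = np.array([5, 2, 4, 3])

buffered = np.zeros((3, 3))
buffered[rows, cols] += data   # repeated cell (0, 1): only one of the writes survives

accumulated = np.zeros((3, 3))
np.add.at(accumulated, (rows, cols), data)  # every pair is added: 5 + 2 + 3

counts = np.zeros((3, 3))
np.add.at(counts, (rows, cols), 1)          # occurrences per cell

print(buffered[0, 1], accumulated[0, 1], counts[0, 1])
```

np.add.at gives the same sums as the bincount approach; bincount is usually the faster of the two for large inputs.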