I'm working with slices of a 2D numpy array. To select the slices, I have the indices stored in arrays. For example, I have:
mat = np.zeros([xdim,ydim], float)
xmin = np.array([...]) # Array of minimum indices in x
xmax = np.array([...]) # Array of maximum indices in x
ymin = np.array([...]) # Array of minimum indices in y
ymax = np.array([...]) # Array of maximum indices in y
value = np.array([...]) # Values
Where ... just denotes some integer values computed previously. All arrays are well defined and have lengths of ~265,000. What I want to do is something like:
mat[xmin:xmax, ymin:ymax] += value
In such a way that for the first elements I would have:
mat[xmin[0]:xmax[0], ymin[0]:ymax[0]] += value[0]
mat[xmin[1]:xmax[1], ymin[1]:ymax[1]] += value[1]
and so on, for the ~265,000 elements of the arrays. Unfortunately, what I just wrote does not work; it throws the error: IndexError: invalid slice.
I've been trying to use np.meshgrid as suggested here: NumPy: use 2D index array from argmin in a 3D slice, but it hasn't worked for me yet. Besides, I'm looking for a Pythonic way to do this, avoiding for loops.
Any help will be much appreciated!
Thanks!
I don't think there is a satisfactory way of vectorizing your problem without resorting to Cython or the like. Let me outline what a pure numpy solution could look like, which should make clear why this is probably not a very good approach.
First, let's look at a 1D case. There's not much you can do with a bunch of slices in numpy, so the first task is to expand them into individual indices. Say your arrays were:
mat = np.zeros((10,))
x_min = np.array([2, 5, 3, 1])
x_max = np.array([5, 9, 8, 7])
value = np.array([0.2, 0.6, 0.1, 0.9])
Then the following code expands the slice limits into lists of (possibly repeating) indices and values, joins them together with bincount, and adds them to the original mat:
x_len = x_max - x_min                                      # length of each slice
x_cum_len = np.cumsum(x_len)                               # end of each slice in the expanded array
x_idx = np.arange(x_cum_len[-1])                           # running index over the expanded length
x_idx[x_len[0]:] -= np.repeat(x_cum_len[:-1], x_len[1:])   # restart the count at each slice boundary
x_idx += np.repeat(x_min, x_len)                           # shift each run to start at its x_min
x_val = np.repeat(value, x_len)                            # one value per expanded index
x_cumval = np.bincount(x_idx, weights=x_val)               # sum values landing on the same index
mat[:len(x_cumval)] += x_cumval
>>> mat
array([ 0. , 0.9, 1.1, 1.2, 1.2, 1.6, 1.6, 0.7, 0.6, 0. ])
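As a quick sanity check (my addition, not part of the recipe above), the vectorized version should match the naive loop it replaces:

# Naive loop equivalent, for verification only
mat_check = np.zeros((10,))
for lo, hi, v in zip(x_min, x_max, value):
    mat_check[lo:hi] += v
assert np.allclose(mat, mat_check)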
It is possible to expand this to your 2D case, although it is anything but trivial, and things start getting hard to follow:
mat = np.zeros((10, 10))
x_min = np.array([2, 5, 3, 1])
x_max = np.array([5, 9, 8, 7])
y_min = np.array([1, 7, 2, 6])
y_max = np.array([6, 8, 6, 9])
value = np.array([0.2, 0.6, 0.1, 0.9])
# Expand the x slices, exactly as in the 1D case
x_len = x_max - x_min
y_len = y_max - y_min
total_len = x_len * y_len  # total number of expanded (x, y) pairs; not used below
x_cum_len = np.cumsum(x_len)
x_idx = np.arange(x_cum_len[-1])
x_idx[x_len[0]:] -= np.repeat(x_cum_len[:-1], x_len[1:])
x_idx += np.repeat(x_min, x_len)
x_val = np.repeat(value, x_len)
# For every expanded x index, expand the matching y slice
y_min_ = np.repeat(y_min, x_len)
y_len_ = np.repeat(y_len, x_len)
y_cum_len = np.cumsum(y_len_)
y_idx = np.arange(y_cum_len[-1])
y_idx[y_len_[0]:] -= np.repeat(y_cum_len[:-1], y_len_[1:])
y_idx += np.repeat(y_min_, y_len_)
# Pair up the x and y indices, flatten to 1D, and accumulate with bincount
x_idx_ = np.repeat(x_idx, y_len_)
xy_val = np.repeat(x_val, y_len_)
xy_idx = np.ravel_multi_index((x_idx_, y_idx), dims=mat.shape)
xy_cumval = np.bincount(xy_idx, weights=xy_val)
mat.ravel()[:len(xy_cumval)] += xy_cumval
Which produces:
>>> mat
array([[ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. , 0. , 0.9, 0.9, 0.9, 0. ],
[ 0. , 0.2, 0.2, 0.2, 0.2, 0.2, 0.9, 0.9, 0.9, 0. ],
[ 0. , 0.2, 0.3, 0.3, 0.3, 0.3, 0.9, 0.9, 0.9, 0. ],
[ 0. , 0.2, 0.3, 0.3, 0.3, 0.3, 0.9, 0.9, 0.9, 0. ],
[ 0. , 0. , 0.1, 0.1, 0.1, 0.1, 0.9, 1.5, 0.9, 0. ],
[ 0. , 0. , 0.1, 0.1, 0.1, 0.1, 0.9, 1.5, 0.9, 0. ],
[ 0. , 0. , 0.1, 0.1, 0.1, 0.1, 0. , 0.6, 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.6, 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ]])
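The same sanity check applies here (again my addition, not part of the original code): the result should equal what the plain loop produces.

# Naive 2D loop equivalent, for verification only
mat_check = np.zeros((10, 10))
for x0, x1, y0, y1, v in zip(x_min, x_max, y_min, y_max, value):
    mat_check[x0:x1, y0:y1] += v
assert np.allclose(mat, mat_check)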
But if you have 265,000 two-dimensional slices of arbitrary size, then the indexing arrays will run into many millions of items very fast. Having to read and write that much data can negate the speed gains of using numpy. Frankly, I doubt this is a good option at all, if nothing else because of how cryptic your code is going to become.
I have a NumPy array made of ragged nested sequences such as the following:
arr = np.array((
    np.random.random((2, 2, 2)),
    np.random.random((4, 4, 4)),
    np.random.random((2, 2, 2))
))
I want to resize each of the nested arrays to the shape (4, 4, 4) by filling it with zeros.
I initially looked at this post, numpy - resize array filling with 0, which works for 2D NumPy arrays, but I have struggled to modify it for a 3D NumPy array.
So far I have tried iterating over the individual nested arrays; however, even some fairly basic code such as
for i, a in enumerate(arr[0]):
    arr[0][i] = np.hstack([a, np.zeros([a.shape[0], 2])])
still raises an error:
ValueError: could not broadcast input array from shape (2,4) into shape (2,2)
I could create separate variables for every nested array except this feels very slow and inefficient and I'd need even messier code to extend this to all 3 dimensions.
An example of a test:
arr = [[[0.1, 0.4],
        [0.3, 0.7]],
       [[0.5, 0.2],
        [0.8, 0.1]]]
If I wanted it to have the shape (2, 3, 4), the output would be the following:
[[[0.1, 0.4, 0.0, 0.0],
  [0.3, 0.7, 0.0, 0.0],
  [0.0, 0.0, 0.0, 0.0]],
 [[0.5, 0.2, 0.0, 0.0],
  [0.8, 0.1, 0.0, 0.0],
  [0.0, 0.0, 0.0, 0.0]]]
UPDATE:
You don't even need pad for this:
def pad_3d(arr: np.ndarray, out_shape: tuple[int, int, int]) -> np.ndarray:
    x, y, z = arr.shape
    output = np.zeros(out_shape, dtype=arr.dtype)
    output[:x, :y, :z] = arr
    return output
test_arr = np.array(
    [[[0.1, 0.4],
      [0.3, 0.7]],
     [[0.5, 0.2],
      [0.8, 0.1]]]
)
desired_shape = (2, 3, 4)
expected_output = np.array(
    [[[0.1, 0.4, 0.0, 0.0],
      [0.3, 0.7, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0]],
     [[0.5, 0.2, 0.0, 0.0],
      [0.8, 0.1, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0]]]
)
assert np.all(expected_output == pad_3d(test_arr, desired_shape)) # True
Original answer:
It's not entirely clear how you want to fill the resulting arrays with zeros around your data. Only on one side along each axis? Or do you want to essentially "center" your original data amidst the zeros?
Either way, I see no way around creating new arrays. The pad function does what you want, I think. Here is a simplified example for one array, where I "pad around" the data:
import numpy as np
a = np.arange(2*2*2).reshape((2, 2, 2))
x = np.pad(a, 1)  # pad width 1 on both sides of every axis; np.pad(a, 0) would be a no-op
If you want to pad on one side with zeros:
x = np.pad(a, (0, 2))  # 0 before, 2 after, along every axis
Assuming your arrays are always cubic, i.e. of the shape (n, n, n), you can generalize like this:
def pad_with_zeros(arr, target_size):
    return np.pad(arr, (0, target_size - arr.shape[0]))
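As a hypothetical usage example (the names are mine), padding a batch of cubic arrays to a common size and stacking them:

arrs = [np.random.random((2, 2, 2)), np.random.random((4, 4, 4))]
padded = np.stack([pad_with_zeros(a, 4) for a in arrs])  # shape (2, 4, 4, 4)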
IIUC, here is one way to do it:
Assuming your arr is actually a list or a tuple:
arr = (
    np.random.random((2, 2, 2)),
    np.random.random((4, 4, 4)),
    np.random.random((2, 2, 2)),
)
# new shape: max length in each dimension:
shape = np.c_[[x.shape for x in arr]].max(0)
>>> shape
array([4, 4, 4])
# pad all arrays
new = [np.pad(x, np.c_[[0]*len(shape), shape - x.shape]) for x in arr]
>>> new[0].shape
(4, 4, 4)
>>> new[0]
array([[[0.5488135 , 0.71518937, 0. , 0. ],
[0.60276338, 0.54488318, 0. , 0. ],
[0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. ]],
[[0.4236548 , 0.64589411, 0. , 0. ],
[0.43758721, 0.891773 , 0. , 0. ],
[0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. ]],
[[0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. ]],
[[0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. ]]])
I have N matrices with dimensions R x R and one 'Weight matrix' with dimension R x N.
Now I want to combine those N matrices row-wise by weighting them with the 'Weight matrix'. In the end I want an R x R matrix.
Let me show you an example:
In the following example my initial matrices are a and b and my weight matrix is c. The desired output is matrix r.
The first row of r is the first row of a, because c[0,0] is 1 and c[0,1] is 0, so we just consider the first row of matrix a.
The second row of r is a weighted average of row 2 from both matrix a and b (because c[1,0] = 0.5 and c[1,1] = 0.5).
The third row of r is the third row of b, because c[2,0] is 0 and c[2,1] is 1, so we just consider the third row of matrix b.
How can I do this in Python (preferably with a numpy function)?
We can use np.einsum -
In [57]: A # 3D input array
Out[57]:
array([[[0.2, 0. , 0.8],
[0. , 0. , 1. ],
[0. , 0.2, 0.8]],
[[1. , 0. , 0. ],
[0. , 0.2, 0.8],
[0.2, 0. , 0.8]]])
In [58]: c # 2D weight array
Out[58]:
array([[1. , 0. ],
[0.5, 0.5],
[0. , 1. ]])
In [59]: np.einsum('ijk,ji->jk',A,c)
Out[59]:
array([[0.2, 0. , 0.8],
[0. , 0.1, 0.9],
[0.2, 0. , 0.8]])
Alternatively with np.matmul -
In [142]: (np.matmul(A.transpose(1,2,0),c[...,None]))[...,0]
Out[142]:
array([[0.2, 0. , 0.8],
[0. , 0.1, 0.9],
[0.2, 0. , 0.8]])
Note: on Python 3.x, np.matmul can be replaced by the @ operator.
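For example, with the same A and c as above, a sketch of the @ form:

r = (A.transpose(1, 2, 0) @ c[..., None])[..., 0]  # identical to the matmul call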
I have an nd array that looks as follows:
[[ 0. 1.73205081 6.40312424 7.21110255 2.44948974]
[ 1.73205081 0. 5.09901951 5.91607978 1. ]
[ 6.40312424 5.09901951 0. 1. 4.35889894]
[ 7.21110255 5.91607978 1. 0. 5.09901951]
[ 2.44948974 1. 4.35889894 5.09901951 0. ]]
Each element in this array is a distance, and I need to turn it into a list of (row, col, distance) tuples, as follows:
l = [(0,0,0),(0,1, 1.73205081),(0,2, 6.40312424),...,(1,0, 1.73205081),(1,1,0),...,(4,4,0)]
Additionally, it would be cool to remove the diagonal elements and also the elements (j,i), since (i,j) is already there. Essentially, is it possible to take just the upper triangle of this matrix?
Is this possible to do efficiently (without a lot of loops)? I created this array with squareform, but couldn't find anything in the docs for converting back.
squareform does all this. Read the docs and experiment. It works in both directions. If you give it a matrix it returns the upper triangle values (condensed form). If you give it those values, it returns the matrix.
In [668]: M
Out[668]:
array([[ 0. , 0.1, 0.5, 0.2],
[ 0.1, 0. , 2. , 0.3],
[ 0.5, 2. , 0. , 0.2],
[ 0.2, 0.3, 0.2, 0. ]])
In [669]: spatial.distance.squareform(M)
Out[669]: array([ 0.1, 0.5, 0.2, 2. , 0.3, 0.2])
In [670]: v=spatial.distance.squareform(M)
In [671]: v
Out[671]: array([ 0.1, 0.5, 0.2, 2. , 0.3, 0.2])
In [672]: spatial.distance.squareform(v)
Out[672]:
array([[ 0. , 0.1, 0.5, 0.2],
[ 0.1, 0. , 2. , 0.3],
[ 0.5, 2. , 0. , 0.2],
[ 0.2, 0.3, 0.2, 0. ]])
You can also specify a force and checks parameter, but without those it just goes by the shape.
Indices can come from np.triu_indices:
In [677]: np.triu_indices(4,1)
Out[677]:
(array([0, 0, 0, 1, 1, 2], dtype=int32),
array([1, 2, 3, 2, 3, 3], dtype=int32))
In [680]: np.vstack((np.triu_indices(4,1),v)).T
Out[680]:
array([[ 0. , 1. , 0.1],
[ 0. , 2. , 0.5],
[ 0. , 3. , 0.2],
[ 1. , 2. , 2. ],
[ 1. , 3. , 0.3],
[ 2. , 3. , 0.2]])
Just to check, we can fill in a 4x4 matrix with these values
In [686]: A=np.vstack((np.triu_indices(4,1),v)).T
In [687]: MM = np.zeros((4,4))
In [688]: MM[A[:,0].astype(int),A[:,1].astype(int)]=A[:,2]
In [689]: MM
Out[689]:
array([[ 0. , 0.1, 0.5, 0.2],
[ 0. , 0. , 2. , 0.3],
[ 0. , 0. , 0. , 0.2],
[ 0. , 0. , 0. , 0. ]])
Those triu indices can also fetch the values from M:
In [693]: I,J = np.triu_indices(4,1)
In [694]: M[I,J]
Out[694]: array([ 0.1, 0.5, 0.2, 2. , 0.3, 0.2])
squareform uses compiled code in spatial.distance._distance_wrap, so I expect it to be quite fast for large arrays. The only problem is that it returns just the condensed-form values, not the indices. But given the shape, the indices can always be calculated; they don't need to be stored with the values.
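Putting the pieces together, a minimal sketch of the round trip to the list of tuples the question asks for (my summary of the steps above):

import numpy as np
from scipy import spatial

v = spatial.distance.squareform(M)       # condensed values
I, J = np.triu_indices(M.shape[0], 1)    # the matching (row, col) indices
triples = list(zip(I, J, v))             # [(0, 1, 0.1), (0, 2, 0.5), ...]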
If your input is x, first generate the indices:
i0,i1 = np.indices(x.shape)
Then:
np.concatenate((i1,i0,x)).reshape(3,5,5).T
That gives you the first result, for the entire matrix.
As for taking only the upper triangle, you might consider trying np.triu(), but I'm not sure exactly what result you're looking for. You can probably figure out how to mask the parts you don't want now, though.
You can try this:
print([(x,y, value) for (x,y), value in np.ndenumerate(numpymatrixarray)])
output [(0, 0, 0.0), (0, 1, 1.7320508100000001), (0, 2, 6.4031242400000004), (0, 3, 7.2111025499999997), (0, 4, 2.4494897400000002), (1, 0, 1.7320508100000001), (1, 1, 0.0), (1, 2, 5.0990195099999998), (1, 3, 5.9160797799999996), (1, 4, 1.0), (2, 0, 6.4031242400000004), (2, 1, 5.0990195099999998), (2, 2, 0.0), (2, 3, 1.0), (2, 4, 4.3588989400000004), (3, 0, 7.2111025499999997), (3, 1, 5.9160797799999996), (3, 2, 1.0), (3, 3, 0.0), (3, 4, 5.0990195099999998), (4, 0, 2.4494897400000002), (4, 1, 1.0), (4, 2, 4.3588989400000004), (4, 3, 5.0990195099999998), (4, 4, 0.0)]
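If you only want the upper triangle without the diagonal, the same idea works with a condition (my variation on the above):

upper = [(i, j, v) for (i, j), v in np.ndenumerate(numpymatrixarray) if i < j]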
Do you really want the upper triangular matrix of an [n x m] matrix where n > m? That will give you at most (n*n - n)/2 elements and lose all the data on and below the diagonal.
What you probably want is the lower triangular matrix:
def tri_reduce(m):
    n = m.shape
    if n[0] > n[1]:
        i = np.tril_indices(n[0], -1, n[1])  # strictly below the diagonal (k=-1 excludes it)
    else:
        i = np.triu_indices(n[0], 1, n[1])   # strictly above the diagonal
    return np.vstack((i, m[i])).T
Rebuilding it into a list of tuples would require a loop, though, I believe; list(tri_reduce(m)) gives a list of 1-D arrays instead.
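For example (hypothetical usage), that final conversion could be a single comprehension; note the indices come back as floats because of the vstack:

triples = [tuple(row) for row in tri_reduce(m)]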
Suppose we had two arrays: some values, e.g. array([1.2, 1.4, 1.6]), and some indices (let's say, array([0, 2, 1])). Our output is expected to be the values put into a bigger array, "addressed" by the indices, so we would get:
array([[ 1.2, 0. , 0. ],
[ 0. , 0. , 1.4],
[ 0. , 1.6, 0. ]])
Is there a way to do this without loops, in a nice, fast way?
With
from numpy import zeros, array, r_

a = zeros((3, 3))
b = array([0, 2, 1])
vals = array([1.2, 1.4, 1.6])
You just need to index it (with the help of arange or r_):
>>> a[r_[:len(b)], b] = vals
>>> a
array([[ 1.2,  0. ,  0. ],
       [ 0. ,  0. ,  1.4],
       [ 0. ,  1.6,  0. ]])
How do we modify this for higher dimensions? For example, if a is a 5x4x3 array and b and vals are 5x4 arrays, how do we modify the statement a[r_[:len(b)], b] = vals?
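One way to generalize (my sketch, not from the original answer) is to build the leading index grids with np.indices:

import numpy as np

a = np.zeros((5, 4, 3))
b = np.random.randint(0, 3, size=(5, 4))  # indices into the last axis
vals = np.random.random((5, 4))

i, j = np.indices(b.shape)  # open index grids over the first two axes
a[i, j, b] = vals           # a[i, j, b[i, j]] = vals[i, j]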
I have a matrix that should have ones on the diagonal but the columns are mixed up.
But I don't know how, without the obvious for loop, to efficiently interchange rows to get ones on the diagonal. I'm not even sure what key I would pass to sort on.
Any suggestions?
You can use numpy's argmax to determine the goal column ordering and reorder your matrix using the argmax results as column indices:
>>> z = numpy.array([[ 0.1 ,  0.1 ,  1.  ],
...                  [ 1.  ,  0.1 ,  0.09],
...                  [ 0.1 ,  1.  ,  0.2 ]])
>>> numpy.argmax(z, axis=1)
array([2, 0, 1])  # goal column indices
>>> z[:, numpy.argmax(z, axis=1)]
array([[ 1.  ,  0.1 ,  0.1 ],
       [ 0.09,  1.  ,  0.1 ],
       [ 0.2 ,  0.1 ,  1.  ]])
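Note that this always puts the row maxima on the diagonal, since (z[:, idx])[i, i] = z[i, idx[i]] is the maximum of row i. A quick check (my addition, assuming each row's 1 sits in a distinct column, so the reordering is a true permutation):

idx = numpy.argmax(z, axis=1)
assert numpy.allclose(numpy.diag(z[:, idx]), 1.0)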
>>> import numpy as np
>>> a = np.array([[ 1. , 0.5, 0.5, 0. ],
... [ 0.5, 0.5, 1. , 0. ],
... [ 0. , 1. , 0. , 0.5],
... [ 0. , 0.5, 0.5, 1. ]])
>>> np.array(sorted(a, key=lambda x: list(x).index(1)))  # cmp= is Python 2 only; use key= on Python 3
array([[ 1. , 0.5, 0.5, 0. ],
[ 0. , 1. , 0. , 0.5],
[ 0.5, 0.5, 1. , 0. ],
[ 0. , 0.5, 0.5, 1. ]])
It actually sorts rows, not columns (but the result is the same): the rows are ordered by the index of the column their 1 is in.
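A vectorized alternative (my addition, assuming 1 is the maximum of each row): order the rows by the column position of their 1.

>>> a[np.argsort(np.argmax(a, axis=1))]
array([[ 1. ,  0.5,  0.5,  0. ],
       [ 0. ,  1. ,  0. ,  0.5],
       [ 0.5,  0.5,  1. ,  0. ],
       [ 0. ,  0.5,  0.5,  1. ]])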