I have a NumPy array made of ragged nested sequences such as the following:
arr = np.array((
    np.random.random((2, 2, 2)),
    np.random.random((4, 4, 4)),
    np.random.random((2, 2, 2))
))
I want to resize each of the nested arrays to the shape (4, 4, 4) by padding it with zeros.
I initially looked at this post, numpy - resize array filling with 0, which works for 2D NumPy arrays, but I have struggled to modify it for a 3D NumPy array.
So far I have tried iterating over the individual nested arrays; however, even fairly basic code such as
for i, a in enumerate(arr[0]):
    arr[0][i] = np.hstack([a, np.zeros([a.shape[0], 2])])
still raises an error:
ValueError: could not broadcast input array from shape (2,4) into shape (2,2)
I could create separate variables for every nested array, but that feels slow and inefficient, and I'd need even messier code to extend it to all three dimensions.
An example test case:
arr = [[[0.1, 0.4],
        [0.3, 0.7]],
       [[0.5, 0.2],
        [0.8, 0.1]]]
If I wanted it to have the shape (2, 3, 4), the output would be the following:
[[[0.1, 0.4, 0.0, 0.0],
  [0.3, 0.7, 0.0, 0.0],
  [0.0, 0.0, 0.0, 0.0]],
 [[0.5, 0.2, 0.0, 0.0],
  [0.8, 0.1, 0.0, 0.0],
  [0.0, 0.0, 0.0, 0.0]]]
UPDATE:
Then I don't even need to use pad:
def pad_3d(arr: np.ndarray, out_shape: tuple[int, int, int]) -> np.ndarray:
    x, y, z = arr.shape
    output = np.zeros(out_shape, dtype=arr.dtype)
    output[:x, :y, :z] = arr
    return output
test_arr = np.array(
    [[[0.1, 0.4],
      [0.3, 0.7]],
     [[0.5, 0.2],
      [0.8, 0.1]]]
)
desired_shape = (2, 3, 4)
expected_output = np.array(
    [[[0.1, 0.4, 0.0, 0.0],
      [0.3, 0.7, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0]],
     [[0.5, 0.2, 0.0, 0.0],
      [0.8, 0.1, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0]]]
)
assert np.all(expected_output == pad_3d(test_arr, desired_shape)) # True
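For reference, the same result could be obtained with np.pad by giving a (before, after) width per axis; pad_3d_with_np_pad is a hypothetical name, and this sketch assumes out_shape is at least as large as arr.shape along every axis:
def pad_3d_with_np_pad(arr: np.ndarray, out_shape: tuple[int, int, int]) -> np.ndarray:
    # One (before, after) pair per axis; all padding goes on the high side.
    widths = [(0, out - cur) for out, cur in zip(out_shape, arr.shape)]
    return np.pad(arr, widths)

assert np.all(pad_3d(test_arr, desired_shape) == pad_3d_with_np_pad(test_arr, desired_shape))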
Original answer:
It's not entirely clear how you want to fill the resulting arrays with zeros around your data. Only on one side along each axis? Or do you want to essentially "center" your original data amidst the zeros?
Either way, I see no way around creating new arrays. The np.pad function does what you want, I think. Here is a simplified example for one array, where I "pad around" the data:
import numpy as np
a = np.arange(2 * 2 * 2).reshape((2, 2, 2))
x = np.pad(a, 1)  # a width of 1 adds zeros on both sides of every axis
If you want to pad on one side with zeros:
x = np.pad(a, (0, 2))
Assuming your arrays are always cubic, i.e. of the shape (n, n, n), you can generalize like this:
def pad_with_zeros(arr, target_size):
    return np.pad(arr, (0, target_size - arr.shape[0]))
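For instance, applied to the three arrays from the question (a usage sketch; arrays and target are illustrative names):
arrays = (
    np.random.random((2, 2, 2)),
    np.random.random((4, 4, 4)),
    np.random.random((2, 2, 2)),
)
target = max(a.shape[0] for a in arrays)  # 4
padded = [pad_with_zeros(a, target) for a in arrays]
assert all(p.shape == (4, 4, 4) for p in padded)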
IIUC, here is one way to do it:
Assuming your arr is actually a list or a tuple:
arr = (
    np.random.random((2, 2, 2)),
    np.random.random((4, 4, 4)),
    np.random.random((2, 2, 2)),
)
# new shape: max length in each dimension:
shape = np.c_[[x.shape for x in arr]].max(0)
>>> shape
array([4, 4, 4])
# pad all arrays
new = [np.pad(x, np.c_[[0]*len(shape), shape - x.shape]) for x in arr]
>>> new[0].shape
(4, 4, 4)
>>> new[0]
array([[[0.5488135 , 0.71518937, 0.        , 0.        ],
        [0.60276338, 0.54488318, 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        ]],

       [[0.4236548 , 0.64589411, 0.        , 0.        ],
        [0.43758721, 0.891773  , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        ]],

       [[0.        , 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        ]],

       [[0.        , 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        ]]])
Related
I have N matrices with dimensions R x R and one 'Weight matrix' with dimension R x N.
Now I want to combine those N matrices row-wise by weighting them with the 'Weight matrix'. In the end I want an R x R matrix.
Let me show you an example:
In the following example my initial matrices are a and b and my weight matrix is c. The desired output is matrix r.
The first row of r is the first row of a, because c[0,0] is 1 and c[0,1] is 0, so we just consider the first row of matrix a.
The second row of r is a weighted average of row 2 from both matrix a and b (because c[1,0]= 0.5 and c[1,1] = 0.5).
The third row of r is the third row of b, because c[2,0] is 0 and c[2,1] is 1, so we just consider the third row of matrix b.
How can I do this in Python (preferably with a numpy function)?
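The example matrices themselves aren't shown in the post; a reconstruction consistent with the description above (and with the arrays used in the answer below) might be:
import numpy as np

a = np.array([[0.2, 0.0, 0.8],
              [0.0, 0.0, 1.0],
              [0.0, 0.2, 0.8]])
b = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.2, 0.8],
              [0.2, 0.0, 0.8]])
c = np.array([[1.0, 0.0],
              [0.5, 0.5],
              [0.0, 1.0]])
# Desired result: row j of r is the c[j]-weighted combination of row j of a and b.
r = np.array([[0.2, 0.0, 0.8],
              [0.0, 0.1, 0.9],
              [0.2, 0.0, 0.8]])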
We can use np.einsum -
In [57]: A  # 3D input array
Out[57]:
array([[[0.2, 0. , 0.8],
        [0. , 0. , 1. ],
        [0. , 0.2, 0.8]],

       [[1. , 0. , 0. ],
        [0. , 0.2, 0.8],
        [0.2, 0. , 0.8]]])

In [58]: c  # 2D weight array
Out[58]:
array([[1. , 0. ],
       [0.5, 0.5],
       [0. , 1. ]])

In [59]: np.einsum('ijk,ji->jk', A, c)
Out[59]:
array([[0.2, 0. , 0.8],
       [0. , 0.1, 0.9],
       [0.2, 0. , 0.8]])
Alternatively with np.matmul -
In [142]: (np.matmul(A.transpose(1, 2, 0), c[..., None]))[..., 0]
Out[142]:
array([[0.2, 0. , 0.8],
       [0. , 0.1, 0.9],
       [0.2, 0. , 0.8]])
Note: on Python 3.x, np.matmul can be replaced by the @ operator.
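In the einsum subscripts 'ijk,ji->jk', i indexes the N stacked matrices, j the rows, and k the columns, so row j of the output is the c-weighted sum of row j across the N matrices. For reference, the matmul expression written with @ would be:
r = (A.transpose(1, 2, 0) @ c[..., None])[..., 0]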
I must build a list of 3x2 value combinations with all possible values between 0.0 and 1.0 by the step size given (for now it’s 1/3).
The output should be [ [[v1, v2], [v3, v4], [v5, v6]], ... ] where every v is a value between 0.0 and 1.0, e.g.:
[ [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]],
[[0.0, 0.0], [0.0, 0.0], [0.0, 0.33]],
[[0.0, 0.0], [0.0, 0.0], [0.0, 0.66]],
[[0.0, 0.0], [0.0, 0.0], [0.0, 1.0]],
[[0.0, 0.0], [0.0, 0.0], [0.33, 0.0]],
[[0.0, 0.0], [0.0, 0.0], [0.33, 0.33]],
...,
[[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]] ]
So far I have:
step = 1.0/3.0
lexica = []
for num1 in numpy.arange(0.0, 1.0, step):
    for num2 in numpy.arange(0.0, 1.0, step):
        for num3 in numpy.arange(0.0, 1.0, step):
            for num4 in numpy.arange(0.0, 1.0, step):
                for num5 in numpy.arange(0.0, 1.0, step):
                    for num6 in numpy.arange(0.0, 1.0, step):
                        lexica.append([[num1, num2], [num3, num4], [num5, num6]])
This doesn't reach 1.0 as the highest value, and knowing Python, there's got to be a better way of writing this.
You can use numpy.mgrid and manipulate it to give you the output you want:
np.mgrid[0:1:step, 0:1:step, 0:1:step, 0:1:step, 0:1:step, 0:1:step].T.reshape(-1, 3, 2)
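Note that with a real step, mgrid excludes the endpoint just like arange does. Passing a complex step instead tells mgrid how many points to produce, endpoint included (a sketch; n is an illustrative name):
import numpy as np

n = 4j  # 4 points per axis: 0, 1/3, 2/3, 1
out = np.mgrid[0:1:n, 0:1:n, 0:1:n, 0:1:n, 0:1:n, 0:1:n].T.reshape(-1, 3, 2)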
EDIT:
A bit more extensible method that fixes the endpoints:
def myMesh(nSteps, shape=(3, 2)):
    c = np.prod(shape)
    x = np.linspace(0, 1, nSteps + 1)
    return np.array(np.meshgrid(*(x,) * c)).T.reshape((-1,) + shape)
myMesh(3)
array([[[ 0.        ,  0.        ],
        [ 0.        ,  0.        ],
        [ 0.        ,  0.        ]],

       [[ 0.        ,  0.33333333],
        [ 0.        ,  0.        ],
        [ 0.        ,  0.        ]],

       [[ 0.        ,  0.66666667],
        [ 0.        ,  0.        ],
        [ 0.        ,  0.        ]],

       ...,

       [[ 1.        ,  0.33333333],
        [ 1.        ,  1.        ],
        [ 1.        ,  1.        ]],

       [[ 1.        ,  0.66666667],
        [ 1.        ,  1.        ],
        [ 1.        ,  1.        ]],

       [[ 1.        ,  1.        ],
        [ 1.        ,  1.        ],
        [ 1.        ,  1.        ]]])
This is what you could do without numpy:
from itertools import product

ret = []
for a, b, c, d, e, f in product(range(4), repeat=6):
    ret.append([[a/3, b/3], [c/3, d/3], [e/3, f/3]])
or even as a list comprehension:
ret = [[[a/3, b/3], [c/3, d/3], [e/3, f/3]]
       for a, b, c, d, e, f in product(range(4), repeat=6)]
You can use itertools.combinations_with_replacement in order to accomplish that task:
>>> from itertools import combinations_with_replacement as cwr
>>> cwr(cwr(numpy.linspace(0, 1, 4), 2), 3)
cwr(numpy.linspace(0, 1, 4), 2) creates all possible combinations of length 2 from the elements of numpy.linspace(0, 1, 4) (which are 0, 1/3, 2/3, 1). The outer call cwr(..., 3) then creates all possible length 3 tuples from the previous 2-tuples, resulting in your 3x2 elements.
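Two caveats, sketched below: cwr returns lazy iterators that must be materialized, and unlike a full Cartesian product it yields each unordered selection only once, so the count differs from the nested-loop version:
import numpy as np
from itertools import combinations_with_replacement as cwr

pairs = list(cwr(np.linspace(0, 1, 4), 2))             # 10 unordered pairs
triples = [list(map(list, t)) for t in cwr(pairs, 3)]  # 220 results
# versus 4**6 == 4096 ordered tuples from itertools.product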
I have an nd array that looks as follows:
[[ 0.          1.73205081  6.40312424  7.21110255  2.44948974]
 [ 1.73205081  0.          5.09901951  5.91607978  1.        ]
 [ 6.40312424  5.09901951  0.          1.          4.35889894]
 [ 7.21110255  5.91607978  1.          0.          5.09901951]
 [ 2.44948974  1.          4.35889894  5.09901951  0.        ]]
Each element in this array is a distance, and I need to turn this into a list of (row, col, distance) tuples, as follows:
l = [(0,0,0),(0,1, 1.73205081),(0,2, 6.40312424),...,(1,0, 1.73205081),(1,1,0),...,(4,4,0)]
Additionally, it would be cool to remove the diagonal elements, and also the elements (j,i), since (i,j) is already there. Essentially, is it possible to take just the upper triangular part of this matrix?
Is this possible to do efficiently (without a lot of loops)? I had created this array with squareform, but couldn't find any docs for doing this.
squareform does all this. Read the docs and experiment; it works in both directions. If you give it a matrix, it returns the upper triangle values (the condensed form). If you give it those values, it returns the matrix.
In [668]: M
Out[668]:
array([[ 0. ,  0.1,  0.5,  0.2],
       [ 0.1,  0. ,  2. ,  0.3],
       [ 0.5,  2. ,  0. ,  0.2],
       [ 0.2,  0.3,  0.2,  0. ]])

In [669]: spatial.distance.squareform(M)
Out[669]: array([ 0.1,  0.5,  0.2,  2. ,  0.3,  0.2])

In [670]: v = spatial.distance.squareform(M)

In [671]: v
Out[671]: array([ 0.1,  0.5,  0.2,  2. ,  0.3,  0.2])

In [672]: spatial.distance.squareform(v)
Out[672]:
array([[ 0. ,  0.1,  0.5,  0.2],
       [ 0.1,  0. ,  2. ,  0.3],
       [ 0.5,  2. ,  0. ,  0.2],
       [ 0.2,  0.3,  0.2,  0. ]])
You can also specify force and checks parameters, but without them it just goes by the shape.
Indices can come from np.triu_indices:
In [677]: np.triu_indices(4, 1)
Out[677]:
(array([0, 0, 0, 1, 1, 2], dtype=int32),
 array([1, 2, 3, 2, 3, 3], dtype=int32))

In [680]: np.vstack((np.triu_indices(4, 1), v)).T
Out[680]:
array([[ 0. ,  1. ,  0.1],
       [ 0. ,  2. ,  0.5],
       [ 0. ,  3. ,  0.2],
       [ 1. ,  2. ,  2. ],
       [ 1. ,  3. ,  0.3],
       [ 2. ,  3. ,  0.2]])
Just to check, we can fill in a 4x4 matrix with these values:
In [686]: A = np.vstack((np.triu_indices(4, 1), v)).T

In [687]: MM = np.zeros((4, 4))

In [688]: MM[A[:, 0].astype(int), A[:, 1].astype(int)] = A[:, 2]

In [689]: MM
Out[689]:
array([[ 0. ,  0.1,  0.5,  0.2],
       [ 0. ,  0. ,  2. ,  0.3],
       [ 0. ,  0. ,  0. ,  0.2],
       [ 0. ,  0. ,  0. ,  0. ]])
Those triu indices can also fetch the values from M:
In [693]: I, J = np.triu_indices(4, 1)

In [694]: M[I, J]
Out[694]: array([ 0.1,  0.5,  0.2,  2. ,  0.3,  0.2])
squareform uses compiled code in spatial.distance._distance_wrap, so I expect it will be quite fast for large arrays. The only catch is that it returns just the condensed-form values, not the indices. But given the shape, the indices can always be calculated; they don't need to be stored with the values.
If your input is x, first generate the indices:
i0,i1 = np.indices(x.shape)
Then:
np.concatenate((i1,i0,x)).reshape(3,5,5).T
That gives you the first result--for the entire matrix.
As for taking only the upper triangle, you might consider trying np.triu(), but I'm not sure exactly what result you're looking for. You can probably figure out how to mask the parts you don't want now, though.
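A sketch of one way to build the (row, col, distance) list for just the strict upper triangle, using np.triu_indices (the array x here is a small illustrative stand-in):
import numpy as np

x = np.array([[0.0, 1.7, 6.4],
              [1.7, 0.0, 5.1],
              [6.4, 5.1, 0.0]])
i, j = np.triu_indices(x.shape[0], k=1)  # indices above the diagonal
upper = list(zip(i.tolist(), j.tolist(), x[i, j].tolist()))
# [(0, 1, 1.7), (0, 2, 6.4), (1, 2, 5.1)]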
You can try this:
print([(x,y, value) for (x,y), value in np.ndenumerate(numpymatrixarray)])
Output: [(0, 0, 0.0), (0, 1, 1.7320508100000001), (0, 2, 6.4031242400000004), (0, 3, 7.2111025499999997), (0, 4, 2.4494897400000002), (1, 0, 1.7320508100000001), (1, 1, 0.0), (1, 2, 5.0990195099999998), (1, 3, 5.9160797799999996), (1, 4, 1.0), (2, 0, 6.4031242400000004), (2, 1, 5.0990195099999998), (2, 2, 0.0), (2, 3, 1.0), (2, 4, 4.3588989400000004), (3, 0, 7.2111025499999997), (3, 1, 5.9160797799999996), (3, 2, 1.0), (3, 3, 0.0), (3, 4, 5.0990195099999998), (4, 0, 2.4494897400000002), (4, 1, 1.0), (4, 2, 4.3588989400000004), (4, 3, 5.0990195099999998), (4, 4, 0.0)]
Do you really want the upper triangular matrix for an n x m matrix where n > m? That will give you (n*n - n)/2 elements and lose all the data below the diagonal.
What you probably want is the lower triangular matrix:
def tri_reduce(m):
    n = m.shape
    if n[0] > n[1]:
        i = np.tril_indices(n[0], 1, n[1])
    else:
        i = np.triu_indices(n[0], 1, n[1])
    return np.vstack((i, m[i])).T
Rebuilding it into a list of tuples would require a loop, though, I believe; list(tri_reduce(m)) gives a list of ndarrays.
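For example (a hypothetical usage; the comprehension is the loop that turns each row into a plain tuple):
m = np.arange(12.0).reshape(4, 3)
rows = [tuple(row) for row in tri_reduce(m)]  # [(row, col, value), ...]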
I'm working with slices of a 2D numpy array. To select the slices, I have the indices stored in arrays. For example, I have:
mat = np.zeros([xdim,ydim], float)
xmin = np.array([...]) # Array of minimum indices in x
xmax = np.array([...]) # Array of maximum indices in x
ymin = np.array([...]) # Array of minimum indices in y
ymax = np.array([...]) # Array of maximum indices in y
value = np.array([...]) # Values
Where ... just denotes some integer numbers previously calculated. All arrays are well-defined and have lengths of ~265000. What I want to do is something like:
mat[xmin:xmax, ymin:ymax] += value
In such a way that for the first elements I would have:
mat[xmin[0]:xmax[0], ymin[0]:ymax[0]] += value[0]
mat[xmin[1]:xmax[1], ymin[1]:ymax[1]] += value[1]
and so on, for the ~265,000 elements of the array. Unfortunately, what I just wrote does not work; it throws IndexError: invalid slice.
I've been trying to use np.meshgrid as suggested here: NumPy: use 2D index array from argmin in a 3D slice, but it hasn't worked for me yet. Besides, I'm looking for a pythonic way to do so, avoiding the for loops.
Any help will be much appreciated!
Thanks!
I don't think there is a satisfactory way of vectorizing your problem without resorting to Cython or the like. Let me outline what a pure numpy solution could look like, which should make clear why this is probably not a very good approach.
First, let's look at a 1D case. There's not much you can do with a bunch of slices in numpy, so the first task is to expand them into individual indices. Say that your arrays were:
mat = np.zeros((10,))
x_min = np.array([2, 5, 3, 1])
x_max = np.array([5, 9, 8, 7])
value = np.array([0.2, 0.6, 0.1, 0.9])
Then the following code expands the slice limits into lists of (possibly repeating) indices and values, joins them together with bincount, and adds them to the original mat:
x_len = x_max - x_min
x_cum_len = np.cumsum(x_len)
x_idx = np.arange(x_cum_len[-1])
x_idx[x_len[0]:] -= np.repeat(x_cum_len[:-1], x_len[1:])
x_idx += np.repeat(x_min, x_len)
x_val = np.repeat(value, x_len)
x_cumval = np.bincount(x_idx, weights=x_val)
mat[:len(x_cumval)] += x_cumval
>>> mat
array([ 0. , 0.9, 1.1, 1.2, 1.2, 1.6, 1.6, 0.7, 0.6, 0. ])
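For comparison, a plain loop produces the same result (a quick sanity check, not part of the original answer):
check = np.zeros((10,))
for lo, hi, v in zip(x_min, x_max, value):
    check[lo:hi] += v
assert np.allclose(mat, check)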
It is possible to expand this to your 2D case, although it is anything but trivial, and things start getting hard to follow:
mat = np.zeros((10, 10))
x_min = np.array([2, 5, 3, 1])
x_max = np.array([5, 9, 8, 7])
y_min = np.array([1, 7, 2, 6])
y_max = np.array([6, 8, 6, 9])
value = np.array([0.2, 0.6, 0.1, 0.9])
x_len = x_max - x_min
y_len = y_max - y_min
total_len = x_len * y_len
x_cum_len = np.cumsum(x_len)
x_idx = np.arange(x_cum_len[-1])
x_idx[x_len[0]:] -= np.repeat(x_cum_len[:-1], x_len[1:])
x_idx += np.repeat(x_min, x_len)
x_val = np.repeat(value, x_len)
y_min_ = np.repeat(y_min, x_len)
y_len_ = np.repeat(y_len, x_len)
y_cum_len = np.cumsum(y_len_)
y_idx = np.arange(y_cum_len[-1])
y_idx[y_len_[0]:] -= np.repeat(y_cum_len[:-1], y_len_[1:])
y_idx += np.repeat(y_min_, y_len_)
x_idx_ = np.repeat(x_idx, y_len_)
xy_val = np.repeat(x_val, y_len_)
xy_idx = np.ravel_multi_index((x_idx_, y_idx), dims=mat.shape)
xy_cumval = np.bincount(xy_idx, weights=xy_val)
mat.ravel()[:len(xy_cumval)] += xy_cumval
Which produces:
>>> mat
array([[ 0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ],
       [ 0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0.9,  0.9,  0.9,  0. ],
       [ 0. ,  0.2,  0.2,  0.2,  0.2,  0.2,  0.9,  0.9,  0.9,  0. ],
       [ 0. ,  0.2,  0.3,  0.3,  0.3,  0.3,  0.9,  0.9,  0.9,  0. ],
       [ 0. ,  0.2,  0.3,  0.3,  0.3,  0.3,  0.9,  0.9,  0.9,  0. ],
       [ 0. ,  0. ,  0.1,  0.1,  0.1,  0.1,  0.9,  1.5,  0.9,  0. ],
       [ 0. ,  0. ,  0.1,  0.1,  0.1,  0.1,  0.9,  1.5,  0.9,  0. ],
       [ 0. ,  0. ,  0.1,  0.1,  0.1,  0.1,  0. ,  0.6,  0. ,  0. ],
       [ 0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0.6,  0. ,  0. ],
       [ 0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ]])
But if you have 265,000 two dimensional slices of arbitrary size, then the indexing arrays are going to get into the many millions of items really fast. Having to handle reading and writing so much data can negate the speed improvements that come with using numpy. Frankly, I doubt this is a good option at all, if nothing else because of how cryptic your code is going to become.
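For contrast, the straightforward loop that this vectorization tries to avoid is short, and for many workloads it may be fast enough (a sketch reusing the question's variable names):
for lo_x, hi_x, lo_y, hi_y, v in zip(x_min, x_max, y_min, y_max, value):
    mat[lo_x:hi_x, lo_y:hi_y] += v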
Suppose we had two arrays: some values, e.g. array([1.2, 1.4, 1.6]), and some indices (let's say, array([0, 2, 1])). The output is expected to be the values put into a bigger array, "addressed" by the indices, so we would get:
array([[ 1.2,  0. ,  0. ],
       [ 0. ,  0. ,  1.4],
       [ 0. ,  1.6,  0. ]])
Is there a way to do this without loops, in a nice, fast way?
With (these snippets assume from numpy import *):
a = zeros((3, 3))
b = array([0, 2, 1])
vals = array([1.2, 1.4, 1.6])
You just need to index it (with the help of arange or r_):
>>> a[r_[:len(b)], b] = vals
>>> a
array([[ 1.2,  0. ,  0. ],
       [ 0. ,  0. ,  1.4],
       [ 0. ,  1.6,  0. ]])
How do we modify this for higher dimensions? For example, if a is a 5x4x3 array and b and vals are 5x4 arrays, how do we modify the statement a[r_[:len(b)], b] = vals?
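A sketch of one way, assuming b holds indices into the last axis of a: build index grids for the two leading axes with np.indices and use b for the final one (names mirror the question):
import numpy as np

a = np.zeros((5, 4, 3))
b = np.random.randint(0, 3, size=(5, 4))  # indices into the last axis
vals = np.random.random((5, 4))

i, j = np.indices(b.shape)  # grids of row and column indices, each 5x4
a[i, j, b] = vals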