numpy column_stack in loop

I don't know if the title is appropriate or not, but let me show you what I want to do:
In [56]: import numpy as np
In [57]: a= np.random.rand(2,2,2); a
Out[57]:
array([[[0.4300565 , 0.82251319],
[0.56113378, 0.83284255]],
[[0.00822414, 0.28256243],
[0.16648411, 0.33381438]]])
In [58]: b=np.random.rand(2); b
Out[58]: array([0.8035224 , 0.09884653])
In [59]: np.stack([np.column_stack((b,a[:,i,:])) for i in range(a.shape[1])])
Out[59]:
array([[[0.8035224 , 0.4300565 , 0.82251319],
[0.09884653, 0.00822414, 0.28256243]],
[[0.8035224 , 0.56113378, 0.83284255],
[0.09884653, 0.16648411, 0.33381438]]])
So, I want to stack an array as column to an inner axis. Is it possible to do the looping structure more efficiently and concisely in numpy? I tried with numpy insert but could not do it.
EDIT:
another example
In [110]: a= np.random.rand(5,3,3); a
Out[110]:
array([[[0.27506756, 0.82334411, 0.7004287 ],
[0.6834928 , 0.28457133, 0.6275462 ],
[0.49744358, 0.25131814, 0.56422852]],
[[0.82591597, 0.92367306, 0.04652992],
[0.98545051, 0.92813944, 0.14360307],
[0.85454081, 0.8254149 , 0.5637401 ]],
[[0.59545519, 0.41563571, 0.41937218],
[0.90980491, 0.30169504, 0.96630809],
[0.06713389, 0.64357544, 0.12901734]],
[[0.47566444, 0.33476802, 0.26635363],
[0.4678913 , 0.53028241, 0.03112231],
[0.68445959, 0.07113376, 0.86651669]],
[[0.66951982, 0.01827502, 0.43831829],
[0.02798567, 0.36880876, 0.55029074],
[0.40127051, 0.6311474 , 0.51015882]]])
In [111]: b= np.random.rand(5,2); b
Out[111]:
array([[0.01659589, 0.15320541],
[0.79025065, 0.28041334],
[0.56024173, 0.49317082],
[0.28229119, 0.46010724],
[0.72239851, 0.62075004]])
In [112]: np.stack([np.column_stack((b,a[:,i,:])) for i in range(a.shape[1])])
Out[112]:
array([[[0.01659589, 0.15320541, 0.27506756, 0.82334411, 0.7004287 ],
[0.79025065, 0.28041334, 0.82591597, 0.92367306, 0.04652992],
[0.56024173, 0.49317082, 0.59545519, 0.41563571, 0.41937218],
[0.28229119, 0.46010724, 0.47566444, 0.33476802, 0.26635363],
[0.72239851, 0.62075004, 0.66951982, 0.01827502, 0.43831829]],
[[0.01659589, 0.15320541, 0.6834928 , 0.28457133, 0.6275462 ],
[0.79025065, 0.28041334, 0.98545051, 0.92813944, 0.14360307],
[0.56024173, 0.49317082, 0.90980491, 0.30169504, 0.96630809],
[0.28229119, 0.46010724, 0.4678913 , 0.53028241, 0.03112231],
[0.72239851, 0.62075004, 0.02798567, 0.36880876, 0.55029074]],
[[0.01659589, 0.15320541, 0.49744358, 0.25131814, 0.56422852],
[0.79025065, 0.28041334, 0.85454081, 0.8254149 , 0.5637401 ],
[0.56024173, 0.49317082, 0.06713389, 0.64357544, 0.12901734],
[0.28229119, 0.46010724, 0.68445959, 0.07113376, 0.86651669],
[0.72239851, 0.62075004, 0.40127051, 0.6311474 , 0.51015882]]])

A variation on concatenating is indexed assignment:
For the first example:
In [245]: a=np.arange(8).reshape(2,2,2); b=np.array([100,200])
In [246]: c = np.zeros((2,2,3), a.dtype)
In [247]: c[:,:,0]=b
In [248]: c[:,:,1:]=a.transpose(1,0,2)
In [249]: c
Out[249]:
array([[[100, 0, 1],
[200, 4, 5]],
[[100, 2, 3],
[200, 6, 7]]])
And for the second:
In [250]: a1 = np.arange(5*3*3).reshape(5,3,3)
In [251]: b1 = np.arange(10).reshape(5,2)
In [252]: c1 = np.zeros((3,5,5),a1.dtype)
In [253]: c1[:,:,:2]=b1
In [254]: c1[:,:,2:]=a1.transpose(1,0,2)
In [255]: c1
Out[255]:
array([[[ 0, 1, 0, 1, 2],
[ 2, 3, 9, 10, 11],
[ 4, 5, 18, 19, 20],
[ 6, 7, 27, 28, 29],
[ 8, 9, 36, 37, 38]],
[[ 0, 1, 3, 4, 5],
[ 2, 3, 12, 13, 14],
[ 4, 5, 21, 22, 23],
[ 6, 7, 30, 31, 32],
[ 8, 9, 39, 40, 41]],
[[ 0, 1, 6, 7, 8],
[ 2, 3, 15, 16, 17],
[ 4, 5, 24, 25, 26],
[ 6, 7, 33, 34, 35],
[ 8, 9, 42, 43, 44]]])
Deriving the shape of c from a and b is left as an exercise for the reader. :)
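For what it's worth, the shape derivation could look roughly like the sketch below. The helper name `prepend_columns` is made up for illustration, and it assumes `a` is 3-D and that `b` has the same length as `a` along the first axis (1-D or 2-D).

```python
import numpy as np

def prepend_columns(a, b):
    """Hypothetical helper: same result as the column_stack loop in the question."""
    b2 = b.reshape(b.shape[0], -1)        # treat a 1-D b as a single column
    n, m, k = a.shape
    nb = b2.shape[1]
    # The output swaps the first two axes of a and widens the last axis by nb.
    c = np.zeros((m, n, nb + k), dtype=np.result_type(a, b2))
    c[:, :, :nb] = b2                     # (n, nb) broadcasts over the new first axis
    c[:, :, nb:] = a.transpose(1, 0, 2)   # interchange the first two axes of a
    return c

a = np.arange(8).reshape(2, 2, 2)
b = np.array([100, 200])
print(prepend_columns(a, b))
```

With these inputs it should reproduce the array shown in Out[249].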
np.stack (or np.array) over the iteration on 2nd axis is effectively a partial transpose (or interchange of the first 2 axes):
In [261]: np.stack([a[:,i,:] for i in range(a.shape[1])])
Out[261]:
array([[[0, 1],
[4, 5]],
[[2, 3],
[6, 7]]])
In [262]: a.transpose(1,0,2)
Out[262]:
array([[[0, 1],
[4, 5]],
[[2, 3],
[6, 7]]])
We could also iterate on the first axis, and join on the second with:
In [263]: np.stack(a, axis=1)
Out[263]:
array([[[0, 1],
[4, 5]],
[[2, 3],
[6, 7]]])
A refinement on Ankit's answer using concatenate is:
np.concatenate([np.repeat(b[None,:,None], 2, axis=0), a.transpose(1,0,2)], axis=2)
np.concatenate([np.repeat(b1[None,:,:], 3, axis=0), a1.transpose(1,0,2)], axis=2)
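The two calls above hard-code the repeat counts (2 and 3). A sketch that reads them from the shape of a instead might look like this; `prepend_columns_concat` is a hypothetical name, and np.broadcast_to is swapped in for np.repeat so that b is not copied before the concatenate.

```python
import numpy as np

def prepend_columns_concat(a, b):
    """Hypothetical helper: concatenate version of the column_stack loop."""
    a_t = a.transpose(1, 0, 2)                # (m, n, k)
    b2 = b.reshape(b.shape[0], -1)            # (n, nb); a 1-D b becomes one column
    # Read-only broadcast view of b along a new leading axis: (m, n, nb)
    b_rep = np.broadcast_to(b2, (a_t.shape[0],) + b2.shape)
    return np.concatenate([b_rep, a_t], axis=2)

a1 = np.arange(5 * 3 * 3).reshape(5, 3, 3)
b1 = np.arange(10).reshape(5, 2)
print(prepend_columns_concat(a1, b1).shape)
```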

The below code worked for me.
>>> a= np.random.rand(2,2,2); a
array([[[0.52706506, 0.48344319],
[0.79027196, 0.90581149]],
[[0.25930158, 0.59498346],
[0.02164495, 0.63081622]]])
>>> b=np.random.rand(2); b
array([0.43134454, 0.4042494 ])
>>> a1 = a.transpose(1, 0, 2); a1
array([[[0.52706506, 0.48344319],
[0.25930158, 0.59498346]],
[[0.79027196, 0.90581149],
[0.02164495, 0.63081622]]])
>>> c = np.tile(b, (2, 1)); c
array([[0.43134454, 0.4042494 ],
[0.43134454, 0.4042494 ]])
>>> c = np.expand_dims(c,2); c
array([[[0.43134454],
[0.4042494 ]],
[[0.43134454],
[0.4042494 ]]])
>>> np.concatenate((c, a1), axis=2)
array([[[0.43134454, 0.52706506, 0.48344319],
[0.4042494 , 0.25930158, 0.59498346]],
[[0.43134454, 0.79027196, 0.90581149],
[0.4042494 , 0.02164495, 0.63081622]]])
Here I first repeated b with tile along a new dimension, as many times as the size of the 2nd dimension of a.
Then I used concatenate to join the repeated b and the transposed a array.
For the 2nd example
>>> a= np.random.rand(5,3,3)
>>> a1 = a.transpose(1, 0, 2)
>>> b=np.random.rand(5, 2)
>>> c = np.tile(b, (3, 1, 1))
>>> np.concatenate((c, a1), axis=2)

Related

How does this python normalization code work?

cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
While learning about normalization for image recognition, I have seen many people use this code. I know this line normalizes the confusion matrix so that it contains only numbers between 0 and 1, so that the percentage of correctly classified samples can be read from the matrix. I'm not very good at math, but I'd like to know exactly how this line works.
If anyone can help me, I'd appreciate it!

It finds a sum along an axis (axis 1) and then does broadcasted division along that axis by the corresponding value of the sum.
So suppose you had:
>>> arr = np.arange(4*5).reshape(4, 5)
>>> arr
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]])
So first, it sums along the axis:
>>> arr.sum(1)
array([10, 35, 60, 85])
Note, you can't broadcast these two arrays with the current shape:
>>> arr / arr.sum(1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: operands could not be broadcast together with shapes (4,5) (4,)
For the division to broadcast across each row, the sums need a trailing axis of length 1, so you add a new axis, with resulting shape (4, 1):
>>> arr.sum(1)[:, np.newaxis]
array([[10],
[35],
[60],
[85]])
>>> arr.sum(1)[:, np.newaxis].shape
(4, 1)
So now, the broadcasting division works:
>>> arr
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]])
>>> arr.sum(1)[:, np.newaxis]
array([[10],
[35],
[60],
[85]])
>>> arr / arr.sum(1)[:, np.newaxis]
array([[0. , 0.1 , 0.2 , 0.3 , 0.4 ],
[0.14285714, 0.17142857, 0.2 , 0.22857143, 0.25714286],
[0.16666667, 0.18333333, 0.2 , 0.21666667, 0.23333333],
[0.17647059, 0.18823529, 0.2 , 0.21176471, 0.22352941]])
Read more about broadcasting in the numpy docs
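Applied to a confusion matrix, the same idea can be sketched as follows. The matrix values here are made up for illustration, and keepdims=True is used as an alternative to the [:, np.newaxis] indexing in the question; both give a (3, 1) column of row sums.

```python
import numpy as np

# Made-up 3-class confusion matrix: rows are true labels, columns are predictions.
cm = np.array([[5, 1, 0],
               [2, 6, 2],
               [0, 3, 9]])

row_sums = cm.sum(axis=1, keepdims=True)   # shape (3, 1) instead of (3,)
cm_norm = cm.astype(float) / row_sums      # each row is divided by its own sum

print(cm_norm)
print(cm_norm.sum(axis=1))                 # every row now sums to 1 (up to rounding)
```

The diagonal of cm_norm then holds the fraction of each true class that was classified correctly.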

Numpy indexing by range of arrays

Say I have an array myarr such that myarr.shape = (2,64,64,2). Now if I define myarr2 = myarr[[0,1,0,0,1],...], then the following is true
myarr2.shape #(5,64,64,2)
myarr2[0,...] == myarr[0,...] # = True
myarr2[1,...] == myarr[1,...] # = True
myarr2[2,...] == myarr[0,...] # = True
...
Can this be generalized so the slices are arrays? That is, is there a way to make the following hypothetical code work?
myarr2 = myarr[...,[20,30,40]:[30,40,50],[15,25,35]:[25,35,45],..]
myarr2[0,] == myarr[...,20:30,15:25,...] # = True
myarr2[1,] == myarr[...,30:40,25:35,...] # = True
myarr2[2,] == myarr[...,40:50,35:45,...] # = True

You can feed the coordinates of the subarrays to a loop that cuts the subarrays out of myarray. I don't know how you store the indices of the subarrays, so I put them into a nested list idx_list:
idx_list = [[[20,30,40],[30,40,50]],[[15,25,35],[25,35,45]]] # assuming 2D cutouts
idx_array = np.array([k for i in idx_list for j in i for k in j]) # unpack
idx_array = idx_array.reshape(4,-1).T # reshape: one row (a, b, c, d) per cutout
myarray2 = np.array([myarray[a:b,c:d] for a,b,c,d in idx_array]) # cut and combine

Let's simplify the problem a bit; first by removing the two outer dimensions that don't affect the core indexing issue; and by reducing the size so we can see and understand the results.
The setup
In [540]: arr = np.arange(7*7).reshape(7,7)
In [541]: arr
Out[541]:
array([[ 0, 1, 2, 3, 4, 5, 6],
[ 7, 8, 9, 10, 11, 12, 13],
[14, 15, 16, 17, 18, 19, 20],
[21, 22, 23, 24, 25, 26, 27],
[28, 29, 30, 31, 32, 33, 34],
[35, 36, 37, 38, 39, 40, 41],
[42, 43, 44, 45, 46, 47, 48]])
In [542]: idx =np.array([[0,2,4,6],[1,3,5,7]])
Now a straightforward iteration approach:
In [543]: alist = []
     ...: for i in range(idx.shape[1]-1):
     ...:     j,k = idx[:,i]
     ...:     sub = arr[j:j+2, k:k+2]
     ...:     alist.append(sub)
     ...:
In [544]: np.array(alist)
Out[544]:
array([[[ 1, 2],
[ 8, 9]],
[[17, 18],
[24, 25]],
[[33, 34],
[40, 41]]])
In [545]: _.shape
Out[545]: (3, 2, 2)
I simplified the iteration from:
     ...: for i in range(idx.shape[1]-1):
     ...:     sub = arr[idx[0,i]:idx[0,i+1],idx[1,i]:idx[1,i+1]]
     ...:     alist.append(sub)
to highlight the fact that we are generating ranges of a consistent size, and make the next transformation more obvious.
So I start with a (7,7) array, and create 3 (2,2) slices.
As I demonstrated in Slicing a different range at each index of a multidimensional numpy array, we can use linspace to expand a set of slices, or ranges.
In [567]: ranges = np.linspace(idx[:,:3],idx[:,:3]+1,2).astype(int)
In [568]: ranges
Out[568]:
array([[[0, 2, 4],
[1, 3, 5]],
[[1, 3, 5],
[2, 4, 6]]])
So ranges[0] expands on the idx[0] slices, etc. But if I simply index with these I get 'diagonal' values from Out[544]:
In [569]: arr[ranges[0], ranges[1]]
Out[569]:
array([[ 1, 17, 33],
[ 9, 25, 41]])
to get blocks I have to add a dimension to the first indices:
In [570]: arr[ranges[0,:,None], ranges[1]]
Out[570]:
array([[[ 1, 17, 33],
[ 2, 18, 34]],
[[ 8, 24, 40],
[ 9, 25, 41]]])
These are the same values as in Out[544], but they need to be transposed:
In [571]: _.transpose(2,0,1)
Out[571]:
array([[[ 1, 2],
[ 8, 9]],
[[17, 18],
[24, 25]],
[[33, 34],
[40, 41]]])
The code's a bit clunky and needs to be generalized, but it gives the general idea of how one can substitute a single indexing operation for the iterative one, provided the slices are regular enough. For this small example it probably isn't faster, but it will probably come out ahead as the problem size gets larger.
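One possible generalization, sketched here with np.arange offsets in place of linspace, builds the row and column index arrays directly in the (block, row, column) layout so that no transpose is needed afterwards. The function name `extract_blocks` and its parameters are invented for this sketch, and it assumes square blocks of one common size.

```python
import numpy as np

def extract_blocks(arr, row_starts, col_starts, size):
    """Hypothetical helper: stack arr[r:r+size, c:c+size] for each (r, c) pair."""
    offsets = np.arange(size)
    # rows has shape (n, size, 1), cols has shape (n, 1, size);
    # advanced indexing broadcasts them to (n, size, size).
    rows = np.asarray(row_starts)[:, None, None] + offsets[None, :, None]
    cols = np.asarray(col_starts)[:, None, None] + offsets[None, None, :]
    return arr[rows, cols]

arr = np.arange(7 * 7).reshape(7, 7)
blocks = extract_blocks(arr, [0, 2, 4], [1, 3, 5], size=2)
print(blocks.shape)
print(blocks)
```

Like any advanced indexing, this returns a copy, not a view of arr.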

Merging rows in numpy to form new array

This is a sample of what I am trying to accomplish. I am very new to python and have searched for hours to find out what I am doing wrong. I haven't been able to find what my issue is. I am still new enough that I may be searching for the wrong phrases. If so, could you please point me in the right direction?
I want to combine n arrays into one array. I want the first row from x as the first row in combined, the first row from y as the second row in combined, the first row from z as the third row in combined, the second row from x as the fourth row in combined, etc.
So it would look something like this:
x = [x1 x2 x3]
[x4 x5 x6]
[x7 x8 x9]
y = [y1 y2 y3]
[y4 y5 y6]
[y7 y8 y9]
z = [z1 z2 z3]
[z4 z5 z6]
[z7 z8 z9]
combined = [x1 x2 x3]
[y1 y2 y3]
[z1 z2 z3]
[x4 x5 x6]
[...]
[z7 z8 z9]
The best I can come up with is this:
import numpy as np
x = np.random.rand(6,3)
y = np.random.rand(6,3)
z = np.random.rand(6,3)
combined = np.zeros((9,3))
for rows in range(len(x)):
    combined[0::3] = x[rows,:]
    combined[1::3] = y[rows,:]
    combined[2::3] = z[rows,:]
print(combined)
All this does is write the last value of the input array to every third row in the output array instead of what I wanted. I am not sure if this is even the best way to do this. Any advice would help out.
I just figured out that this works, but if someone knows a higher-performance method, please let me know.
import numpy as np
x = np.random.rand(6,3)
y = np.random.rand(6,3)
z = np.random.rand(6,3)
combined = np.zeros((18,3))
for rows in range(6):
    combined[rows*3,:] = x[rows,:]
    combined[rows*3+1,:] = y[rows,:]
    combined[rows*3+2,:] = z[rows,:]
print(combined)

You can do this using a list comprehension and zip:
combined = np.array([row for row_group in zip(x, y, z) for row in row_group])

Using vectorised operations only:
A = np.vstack((x, y, z))
idx = np.arange(A.shape[0]).reshape(-1, x.shape[0]).T.flatten()
A = A[idx]
Here's a demo:
import numpy as np
x, y, z = np.random.rand(3,3), np.random.rand(3,3), np.random.rand(3,3)
print(x, y, z)
[[ 0.88259564 0.17609363 0.01067734]
[ 0.50299357 0.35075811 0.47230915]
[ 0.751129 0.81839586 0.80554345]]
[[ 0.09469396 0.33848691 0.51550685]
[ 0.38233976 0.05280427 0.37778962]
[ 0.7169351 0.17752571 0.49581777]]
[[ 0.06056544 0.70273453 0.60681583]
[ 0.57830566 0.71375038 0.14446909]
[ 0.23799775 0.03571076 0.26917939]]
A = np.vstack((x, y, z))
idx = np.arange(A.shape[0]).reshape(-1, x.shape[0]).T.flatten()
print(idx) # [0 3 6 1 4 7 2 5 8]
A = A[idx]
print(A)
[[ 0.88259564 0.17609363 0.01067734]
[ 0.09469396 0.33848691 0.51550685]
[ 0.06056544 0.70273453 0.60681583]
[ 0.50299357 0.35075811 0.47230915]
[ 0.38233976 0.05280427 0.37778962]
[ 0.57830566 0.71375038 0.14446909]
[ 0.751129 0.81839586 0.80554345]
[ 0.7169351 0.17752571 0.49581777]
[ 0.23799775 0.03571076 0.26917939]]

I have changed your code a little bit to get the desired output
import numpy as np
x = np.random.rand(6,3)
y = np.random.rand(6,3)
z = np.random.rand(6,3)
combined = np.zeros((18,3))
combined[0::3] = x
combined[1::3] = y
combined[2::3] = z
print(combined)
You had the shape of the combined matrix wrong and there is no real need for the for loop.

This might not be the most pythonic way to do it but you could
for block in range(len(combined)//3):
    combined[block*3+0] = x[block,:]
    combined[block*3+1] = y[block,:]
    combined[block*3+2] = z[block,:]

A simple numpy solution is to stack the arrays on a new middle axis, and reshape the result to 2d:
In [5]: x = np.arange(9).reshape(3,3)
In [6]: y = np.arange(9).reshape(3,3)+10
In [7]: z = np.arange(9).reshape(3,3)+100
In [8]: np.stack((x,y,z),axis=1).reshape(-1,3)
Out[8]:
array([[ 0, 1, 2],
[ 10, 11, 12],
[100, 101, 102],
[ 3, 4, 5],
[ 13, 14, 15],
[103, 104, 105],
[ 6, 7, 8],
[ 16, 17, 18],
[106, 107, 108]])
It may be easier to see what's happening if we give each dimension a different value; e.g. 2 3x4 arrays:
In [9]: x = np.arange(12).reshape(3,4)
In [10]: y = np.arange(12).reshape(3,4)+10
np.array combines them on a new 1st axis, making a 2x3x4 array. To get the interleaving you want, we can transpose the first 2 dimensions, producing a 3x2x4. Then reshape to a 6x4.
In [13]: np.array((x,y))
Out[13]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[10, 11, 12, 13],
[14, 15, 16, 17],
[18, 19, 20, 21]]])
In [14]: np.array((x,y)).transpose(1,0,2)
Out[14]:
array([[[ 0, 1, 2, 3],
[10, 11, 12, 13]],
[[ 4, 5, 6, 7],
[14, 15, 16, 17]],
[[ 8, 9, 10, 11],
[18, 19, 20, 21]]])
In [15]: np.array((x,y)).transpose(1,0,2).reshape(-1,4)
Out[15]:
array([[ 0, 1, 2, 3],
[10, 11, 12, 13],
[ 4, 5, 6, 7],
[14, 15, 16, 17],
[ 8, 9, 10, 11],
[18, 19, 20, 21]])
np.vstack produces a 6x4, but with the wrong order. We can't transpose that directly.
np.stack with default axis behaves just like np.array. But with axis=1, it creates a 3x2x4, which we can reshape:
In [16]: np.stack((x,y), 1)
Out[16]:
array([[[ 0, 1, 2, 3],
[10, 11, 12, 13]],
[[ 4, 5, 6, 7],
[14, 15, 16, 17]],
[[ 8, 9, 10, 11],
[18, 19, 20, 21]]])
The list zip in the accepted answer is a list version of transpose, creating a list of 3 2-element tuples.
In [17]: list(zip(x,y))
Out[17]:
[(array([0, 1, 2, 3]), array([10, 11, 12, 13])),
(array([4, 5, 6, 7]), array([14, 15, 16, 17])),
(array([ 8, 9, 10, 11]), array([18, 19, 20, 21]))]
np.array(list(zip(x,y))) produces the same thing as the stack, a 3x2x4 array.
As for speed, I suspect the allocate and assign (as in Ash's answer) is fastest:
In [27]: z = np.zeros((6,4),int)
    ...: for i, arr in enumerate((x,y)):
    ...:     z[i::2,:] = arr
    ...:
In [28]: z
Out[28]:
array([[ 0, 1, 2, 3],
[10, 11, 12, 13],
[ 4, 5, 6, 7],
[14, 15, 16, 17],
[ 8, 9, 10, 11],
[18, 19, 20, 21]])
For serious timings, use much larger examples than this.
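A rough timing harness along those lines could look like the sketch below. The function names are made up for this example, the array size is arbitrary, and no timings are quoted here because they depend on the machine and the NumPy version.

```python
import timeit
import numpy as np

def interleave_stack(arrays):
    # Stack on a new middle axis, then collapse the first two axes.
    return np.stack(arrays, axis=1).reshape(-1, arrays[0].shape[1])

def interleave_assign(arrays):
    # Allocate the result once and fill every n-th row from each input.
    n = len(arrays)
    rows, cols = arrays[0].shape
    out = np.empty((n * rows, cols), dtype=np.result_type(*arrays))
    for i, arr in enumerate(arrays):
        out[i::n] = arr
    return out

rng = np.random.default_rng(0)
arrays = [rng.random((10_000, 3)) for _ in range(3)]

# Both approaches should give identical results before we compare speed.
assert np.array_equal(interleave_stack(arrays), interleave_assign(arrays))

for fn in (interleave_stack, interleave_assign):
    seconds = timeit.timeit(lambda: fn(arrays), number=100)
    print(f"{fn.__name__}: {seconds / 100 * 1e6:.1f} microseconds per call")
```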

Efficient numpy array random views with dropped dimensions

For computer vision training purposes, random cropping is often used as a data augmentation technique. At each iteration, a batch of random crops is generated and fed to the network being trained. This needs to be efficient, as it is done at each training iteration.
If the data has too many dimensions, random dimension selection might also be needed. Random frames can be selected in a video for example. The data can even have 4 dimensions (3 in space + time), or more.
How can one write an efficient generator of random views of lower dimension?
A very naïve version for getting 2D views from 3D data, and only one by one, could be:
import numpy as np
import numpy.random as nr
def views():
    # suppose `data` comes from elsewhere
    # data.shape is (n1, n2, n3)
    while True:
        drop_dim = nr.randint(0, 3)
        drop_dim_keep = nr.randint(0, shape[drop_dim])
        selector = np.zeros(shape, dtype=bool)
        if drop_dim == 0:
            selector[drop_dim_keep, :, :] = 1
        elif drop_dim == 1:
            selector[:, drop_dim_keep, :] = 1
        else:
            selector[:, :, drop_dim_keep] = 1
        yield np.squeeze(data[selector])
A more elegant solution probably exists, where at least:
there is no ugly if/else on the randomly chosen dimension
views can take a batch_size integer argument and generate several views at once without a loop
the dimension of input/output data is not specified (e.g. can do 3D -> 2D as well as 4D -> 2D)

I tweaked your function to clarify what it's doing:
def views():
    # suppose `data` comes from elsewhere
    # data.shape is (n1, n2, n3)
    while True:
        drop_dim = nr.randint(0, 3)
        dropshape = list(shape[:])
        dropshape[drop_dim] -= 1
        drop_dim_keep = nr.randint(0, shape[drop_dim])
        print(drop_dim, drop_dim_keep)
        selector = np.ones(shape, dtype=bool)
        if drop_dim == 0:
            selector[drop_dim_keep, :, :] = 0
        elif drop_dim == 1:
            selector[:, drop_dim_keep, :] = 0
        else:
            selector[:, :, drop_dim_keep] = 0
        yield data[selector].reshape(dropshape)
A small sample run:
In [534]: data = np.arange(24).reshape(shape)
In [535]: data
Out[535]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
In [536]: v = views()
In [537]: next(v)
2 1
Out[537]:
array([[[ 0, 2, 3],
[ 4, 6, 7],
[ 8, 10, 11]],
[[12, 14, 15],
[16, 18, 19],
[20, 22, 23]]])
In [538]: next(v)
0 0
Out[538]:
array([[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
So it's picking one of the dimensions, and for that dimension dropping one 'column'.
The main efficiency issue is whether it's returning a view or a copy. In this case it has to return a copy.
You are using a boolean mask to select the return, exactly the same as what np.delete does in this case.
In [544]: np.delete(data,1,2).shape
Out[544]: (2, 3, 3)
In [545]: np.delete(data,0,0).shape
Out[545]: (1, 3, 4)
So you could replace much of your internals with delete, letting it take care of generalizing the dimensions. Look at its code to see how it handles those details (it isn't short and sweet!).
def rand_delete():
    # suppose `data` comes from elsewhere
    # data.shape is (n1, n2, n3)
    while True:
        drop_dim = nr.randint(0, 3)
        drop_dim_keep = nr.randint(0, shape[drop_dim])
        print(drop_dim, drop_dim_keep)
        yield np.delete(data, drop_dim_keep, drop_dim)
In [547]: v1=rand_delete()
In [548]: next(v1)
0 1
Out[548]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]]])
In [549]: next(v1)
2 0
Out[549]:
array([[[ 1, 2, 3],
[ 5, 6, 7],
[ 9, 10, 11]],
[[13, 14, 15],
[17, 18, 19],
[21, 22, 23]]])
Replace the delete with take:
def rand_take():
    while True:
        take_dim = nr.randint(0, 3)
        take_keep = nr.randint(0, shape[take_dim])
        print(take_dim, take_keep)
        yield np.take(data, take_keep, axis=take_dim)
In [580]: t = rand_take()
In [581]: next(t)
0 0
Out[581]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
In [582]: next(t)
2 3
Out[582]:
array([[ 3, 7, 11],
[15, 19, 23]])
np.take returns a copy, but the equivalent slicing does not
In [601]: data.__array_interface__['data']
Out[601]: (182632568, False)
In [602]: np.take(data,0,1).__array_interface__['data']
Out[602]: (180099120, False)
In [603]: data[:,0,:].__array_interface__['data']
Out[603]: (182632568, False)
A slicing tuple can be generated with expressions like
In [604]: idx = [slice(None)]*data.ndim
In [605]: idx[1] = 0
In [606]: data[tuple(idx)]
Out[606]:
array([[ 0, 1, 2, 3],
[12, 13, 14, 15]])
Various numpy functions that take an axis parameter construct an indexing tuple like this (for example, one or more of the apply... functions).
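Putting the slicing-tuple idea into a generator gives something like the sketch below. The name `random_views` and its parameters are made up for illustration. It assumes that "dropping" a dimension means fixing a random index on it, and it yields one view at a time, with no batching.

```python
import numpy as np

def random_views(data, out_ndim=2, rng=None):
    """Hypothetical generator: yield random out_ndim-dimensional views of data."""
    rng = np.random.default_rng() if rng is None else rng
    n_drop = data.ndim - out_ndim
    while True:
        # Pick which axes to drop, without an if/else per axis.
        drop_axes = rng.choice(data.ndim, size=n_drop, replace=False)
        idx = [slice(None)] * data.ndim
        for ax in drop_axes:
            idx[ax] = int(rng.integers(data.shape[ax]))  # fix one index on that axis
        # Basic indexing (ints and slices only) returns a view, not a copy.
        yield data[tuple(idx)]

data = np.arange(24).reshape(2, 3, 4)
gen = random_views(data)
view = next(gen)
print(view.shape, np.shares_memory(view, data))
```

The same generator handles 4D -> 2D by passing a 4-D array, since the number of dropped axes is derived from data.ndim.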

Python VTK: Coordinates directly to PolyData

I want to convert all coordinate combinations for x, y and z in a specific range (for now with step 1) directly to vtk.polyData or vtk.points. My first approach was to use itertools.product, but I thought this would have a very bad runtime. So I came up with another approach using vtk, which I need anyway for the next parts of my program.
First approach with itertools.product:
import numpy as np
import itertools
import vtk
x1=[10,11,12....310]
y1=[10,11,12....310]
z1=[0,1,2....65]
points1 = vtk.vtkPoints()
for coords in itertools.product(x1,y1,z1):
    points1.InsertNextPoint(coords)
boxPolyData1 = vtk.vtkPolyData()
boxPolyData1.SetPoints(points1)
My approach with vtk so far:
import numpy as np
from vtk.util import numpy_support
coords = np.mgrid[10:310, 10:310, 0:65]
vtk_data_array = numpy_support.numpy_to_vtk(num_array=coords.ravel(),deep=True,array_type=vtk.VTK_FLOAT)
points = vtk.vtkPoints()
points.SetData(vtk_data_array)
But this just crashes my Python. Does anyone have an idea?
best regards!

Stack those coords in columns with np.column_stack or np.c_ and then feed those as input to num_array, like so -
x,y,z = np.mgrid[10:310, 10:310, 0:65]
out_data = np.column_stack((x.ravel(), y.ravel(), z.ravel()))
vtk_data_array = numpy_support.numpy_to_vtk(num_array=out_data,\
deep=True,array_type=vtk.VTK_FLOAT)
Alternatively, to get out_data directly -
out_data = np.mgrid[10:310, 10:310, 0:65].reshape(3,-1).T
Another approach using initialization to replace the 3D array created by np.mgrid would be like so -
def create_mgrid_array(d00,d01,d10,d11,d20,d21,dtype=int):
    df0 = d01-d00
    df1 = d11-d10
    df2 = d21-d20
    a = np.zeros((df0,df1,df2,3),dtype=dtype)
    X,Y,Z = np.ogrid[d00:d01,d10:d11,d20:d21]
    a[:,:,:,2] = Z
    a[:,:,:,1] = Y
    a[:,:,:,0] = X
    a.shape = (-1,3)
    return a
Sample run to showcase usage of create_mgrid_array -
In [151]: create_mgrid_array(3,6,10,14,20,22,dtype=int)
Out[151]:
array([[ 3, 10, 20],
[ 3, 10, 21],
[ 3, 11, 20],
[ 3, 11, 21],
[ 3, 12, 20],
[ 3, 12, 21],
[ 3, 13, 20],
[ 3, 13, 21],
[ 4, 10, 20],
[ 4, 10, 21],
[ 4, 11, 20],
[ 4, 11, 21],
[ 4, 12, 20],
[ 4, 12, 21],
[ 4, 13, 20],
[ 4, 13, 21],
[ 5, 10, 20],
[ 5, 10, 21],
[ 5, 11, 20],
[ 5, 11, 21],
[ 5, 12, 20],
[ 5, 12, 21],
[ 5, 13, 20],
[ 5, 13, 21]])
Runtime test
Approaches -
def loopy_app():
    x1 = range(10,311)
    y1 = range(10,311)
    z1 = range(0,66)
    points1 = vtk.vtkPoints()
    for coords in itertools.product(x1,y1,z1):
        points1.InsertNextPoint(coords)
    return points1
def vectorized_app():
    out_data = create_mgrid_array(10,311,10,311,0,66,dtype=float)
    vtk_data_array = numpy_support.numpy_to_vtk(num_array=out_data,\
                                    deep=True,array_type=vtk.VTK_FLOAT)
    points2 = vtk.vtkPoints()
    points2.SetData(vtk_data_array)
    return points2
Timings and verification -
In [155]: # Verify outputs with loopy and vectorized approaches
     ...: from vtk.util.numpy_support import vtk_to_numpy
     ...: out1 = vtk_to_numpy(loopy_app().GetData())
     ...: out2 = vtk_to_numpy(vectorized_app().GetData())
     ...: print(np.allclose(out1, out2))
     ...:
True
In [156]: %timeit loopy_app()
1 loops, best of 3: 923 ms per loop
In [157]: %timeit vectorized_app()
10 loops, best of 3: 67.3 ms per loop
In [158]: 923/67.3
Out[158]: 13.714710252600298
13x+ speedup there with the proposed vectorized one over the loopy one!
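The (N, 3) point layout can also be checked against itertools.product without VTK installed. The sketch below uses np.meshgrid with indexing="ij" as a stand-in for create_mgrid_array; `grid_points` is a hypothetical name, and the ranges are half-open (start, stop) pairs with step 1.

```python
import itertools
import numpy as np

def grid_points(*ranges, dtype=float):
    """Hypothetical helper: all coordinate combinations as an (N, ndim) array."""
    axes = [np.arange(start, stop, dtype=dtype) for start, stop in ranges]
    grids = np.meshgrid(*axes, indexing="ij")    # one full grid per coordinate
    return np.stack(grids, axis=-1).reshape(-1, len(ranges))

pts = grid_points((3, 6), (10, 14), (20, 22))

# Same ordering as the nested loops that itertools.product performs.
expected = np.array(
    list(itertools.product(range(3, 6), range(10, 14), range(20, 22))),
    dtype=float,
)
print(pts.shape, np.array_equal(pts, expected))
```

An array built this way has one point per row, which is the layout passed to numpy_to_vtk as out_data above.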
