collapsing all dimensions of numpy array except the first two

I have a NumPy array with a variable number of dimensions. For example, it could have any of the following shapes:
(64, 64)
(64, 64, 2, 5)
(64, 64, 40)
(64, 64, 10, 20, 4)
If the number of dimensions is greater than 3, I want to collapse/stack everything else into the third dimension while preserving order. So, in my example above, the shapes after the operation should be:
(64, 64)
(64, 64, 10)
(64, 64, 40)
(64, 64, 800)
Also, the order needs to be preserved. For example, the array of the shape (64, 64, 2, 5) should be stacked as
(64, 64, 2)
(64, 64, 2)
(64, 64, 2)
(64, 64, 2)
(64, 64, 2)
i.e. the 3D slices one after the other. Also, after the operation I would like to reshape it back to the original shape without any permutation i.e. preserve the original order.
One way I could do this is to multiply all the dimension sizes from the third to the last dimension, i.e.
shape = array.shape
if len(shape) > 3:
    final_dim = 1
    for i in range(2, len(shape)):
        final_dim *= shape[i]
and then reshape the array. Something like:
array.reshape(64, 64, final_dim)
However, I was not sure whether the order is preserved the way I want, or whether there is a more Pythonic way to achieve this.

EDIT: As pointed out in the other answers it is even easier to just provide -1 as the third dimension for reshape. Numpy automatically determines the correct shape then.
I am not sure what the problem here is. You can just use np.reshape and it preserves order. See the following code:
import numpy as np
A = np.random.rand(20, 20, 2, 2, 18, 5)
print(A.shape)
new_dim = np.prod(A.shape[2:])
print(new_dim)
B = np.reshape(A, (A.shape[0], A.shape[1], new_dim))
print(B.shape)
C = B.reshape((20, 20, 2, 2, 18, 5))
print(np.array_equal(A, C))
The output is:
(20, 20, 2, 2, 18, 5)
360
(20, 20, 360)
True
This accomplishes exactly what you asked for.
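As a minimal sketch of the same idea using -1 (the helper name collapse_trailing is hypothetical, not something from the question), the round trip can be checked on the four shapes from the question:

```python
import numpy as np

def collapse_trailing(a):
    """Flatten every axis after the first two into one third axis."""
    if a.ndim <= 3:
        return a  # nothing to collapse
    # -1 lets NumPy infer the size of the merged axis
    return a.reshape(a.shape[0], a.shape[1], -1)

for shape in [(64, 64), (64, 64, 2, 5), (64, 64, 40), (64, 64, 10, 20, 4)]:
    a = np.arange(np.prod(shape)).reshape(shape)
    b = collapse_trailing(a)
    restored = b.reshape(shape)  # plain reshape undoes the collapse
    print(shape, "->", b.shape, np.array_equal(a, restored))
```

Because reshape keeps the default C (row-major) element order in both directions, no transpose is needed to get back to the original array.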

reshape accepts -1 for one dimension and infers its size automatically:
a = np.random.rand(20, 20, 8, 6, 4)
s = a.shape[:2]
if a.ndim > 2:
    s = s + (-1,)
b = a.reshape(s)

Going by the requirement of stacking for the given (64, 64, 2, 5) sample, I think you need to permute the axes. For the permuting, we can use np.rollaxis, like so -
def collapse_dims(a):
    if a.ndim > 3:
        return np.rollaxis(a, -1, 2).reshape(a.shape[0], a.shape[1], -1)
    else:
        return a
Sample run on the given four sample shapes -
1) Sample shapes :
In [234]: shp1 = (64, 64)
...: shp2 = (64, 64, 2, 5)
...: shp3 = (64, 64, 40)
...: shp4 = (64, 64, 10, 20, 4)
...:
Case #1 :
In [235]: a = np.random.randint(11,99,(shp1))
In [236]: np.allclose(a, collapse_dims(a))
Out[236]: True
Case #2 :
In [237]: a = np.random.randint(11,99,(shp2))
In [238]: np.allclose(a[:,:,:,0], collapse_dims(a)[:,:,0:2])
Out[238]: True
In [239]: np.allclose(a[:,:,:,1], collapse_dims(a)[:,:,2:4])
Out[239]: True
In [240]: np.allclose(a[:,:,:,2], collapse_dims(a)[:,:,4:6]) # .. so on
Out[240]: True
Case #3 :
In [241]: a = np.random.randint(11,99,(shp3))
In [242]: np.allclose(a, collapse_dims(a))
Out[242]: True
Case #4 :
In [243]: a = np.random.randint(11,99,(shp4))
In [244]: np.allclose(a[:,:,:,:,0].ravel(), collapse_dims(a)[:,:,:200].ravel())
Out[244]: True
In [245]: np.allclose(a[:,:,:,:,1].ravel(), collapse_dims(a)[:,:,200:400].ravel())
Out[245]: True

I'll try to illustrate the concern that @Divakar brings up.
In [522]: arr = np.arange(2*2*3*4).reshape(2,2,3,4)
In [523]: arr
Out[523]:
array([[[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]],
[[[24, 25, 26, 27],
[28, 29, 30, 31],
[32, 33, 34, 35]],
[[36, 37, 38, 39],
[40, 41, 42, 43],
[44, 45, 46, 47]]]])
The innermost dimension has size 4, so the array is displayed as 3x4 blocks. If you pay attention to the spacing and the brackets, you'll see that these blocks are arranged 2x2.
Notice what happens when we use the reshape:
In [524]: arr1 = arr.reshape(2,2,-1)
In [525]: arr1
Out[525]:
array([[[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]],
[[24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35],
[36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]]])
Now it is two 2x12 blocks. You can do anything to those 12-element rows and then reshape them back to 3x4 blocks:
In [526]: arr1.reshape(2,2,3,4)
Out[526]:
array([[[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
...
But I could also split this array on the last dimension. np.split can do it, but a list comprehension is easier to understand:
In [527]: alist = [arr[...,i] for i in range(4)]
In [528]: alist
Out[528]:
[array([[[ 0, 4, 8],
[12, 16, 20]],
[[24, 28, 32],
[36, 40, 44]]]),
array([[[ 1, 5, 9],
[13, 17, 21]],
[[25, 29, 33],
[37, 41, 45]]]),
array([[[ 2, 6, 10],
[14, 18, 22]],
[[26, 30, 34],
[38, 42, 46]]]),
array([[[ 3, 7, 11],
[15, 19, 23]],
[[27, 31, 35],
[39, 43, 47]]])]
This contains 4 (2,2,3) arrays. Note that the 3 element rows display as columns in the 4d display.
I can reform into a 4d array with np.stack (which is like np.array, but gives more control of how the arrays are joined):
In [529]: np.stack(alist, axis=-1)
Out[529]:
array([[[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
...
[[36, 37, 38, 39],
[40, 41, 42, 43],
[44, 45, 46, 47]]]])
==========
The split equivalent is [x[...,0] for x in np.split(arr, 4, axis=-1)]. Without the indexing split produces (2, 2, 3, 1) arrays.
collapse_dims produces (for my example):
In [532]: np.rollaxis(arr,-1,2).reshape(arr.shape[0],arr.shape[1],-1)
Out[532]:
array([[[ 0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11],
[12, 16, 20, 13, 17, 21, 14, 18, 22, 15, 19, 23]],
[[24, 28, 32, 25, 29, 33, 26, 30, 34, 27, 31, 35],
[36, 40, 44, 37, 41, 45, 38, 42, 46, 39, 43, 47]]])
A (2,2,12) array, but with the elements in rows in a different order. It does a transpose on the inner 2 dimensions before flattening.
In [535]: arr[0,0,:,:].ravel()
Out[535]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
In [536]: arr[0,0,:,:].T.ravel()
Out[536]: array([ 0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11])
Restoring that back to the original order requires another roll or transpose (arr2 here is the (2,2,12) collapse_dims result shown above):
In [542]: arr2.reshape(2,2,4,3).transpose(0,1,3,2)
Out[542]:
array([[[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
....
[[36, 37, 38, 39],
[40, 41, 42, 43],
[44, 45, 46, 47]]]])
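To make the round trip for the permuted variant concrete, here is a small sketch. It swaps in np.moveaxis for np.rollaxis, and the helper names collapse_last_axis_first and restore are hypothetical:

```python
import numpy as np

def collapse_last_axis_first(a):
    """Collapse trailing axes so slices along the LAST axis sit one after another."""
    if a.ndim <= 3:
        return a
    # bring the last axis to position 2, then merge everything after the first two
    return np.moveaxis(a, -1, 2).reshape(a.shape[0], a.shape[1], -1)

def restore(b, original_shape):
    """Undo collapse_last_axis_first, given the original shape."""
    if len(original_shape) <= 3:
        return b.reshape(original_shape)
    rows, cols, *middle, last = original_shape
    # un-merge with the last axis still at position 2, then move it back to the end
    return np.moveaxis(b.reshape(rows, cols, last, *middle), 2, -1)

arr = np.arange(2 * 2 * 3 * 4).reshape(2, 2, 3, 4)
flat = collapse_last_axis_first(arr)
print(flat[0, 0])  # [ 0  4  8  1  5  9  2  6 10  3  7 11]
print(np.array_equal(restore(flat, arr.shape), arr))  # True
```

The original shape has to be kept around, since the collapsed (2, 2, 12) array no longer records how the 12 elements were split.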

Related

np.delete() how to delete multiple rows in python [duplicate]

How can I delete multiple rows of NumPy array? For example, I want to delete the first five rows of x. I'm trying the following code:
import numpy as np
x = np.random.rand(10, 5)
np.delete(x, (0:5), axis=0)
but it doesn't work:
np.delete(x, (0:5), axis=0)
^
SyntaxError: invalid syntax

There are several ways to delete rows from a NumPy array.
The easiest one is to use basic indexing as with standard Python lists:
>>> import numpy as np
>>> x = np.arange(35).reshape(7, 5)
>>> x
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
>>> result = x[5:]
>>> result
array([[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
You can select not only rows but columns as well:
>>> x[:2, 1:4]
array([[1, 2, 3],
[6, 7, 8]])
Another way is to use "fancy indexing" (indexing arrays using arrays):
>>> x[[0, 2, 6]]
array([[ 0, 1, 2, 3, 4],
[10, 11, 12, 13, 14],
[30, 31, 32, 33, 34]])
You can achieve the same using np.take:
>>> np.take(x, [0, 2, 6], axis=0)
array([[ 0, 1, 2, 3, 4],
[10, 11, 12, 13, 14],
[30, 31, 32, 33, 34]])
Yet another option is to use np.delete as in the question. For selecting the rows/columns for deletion it can accept slice objects, int, or array of ints:
>>> np.delete(x, slice(0, 5), axis=0)
array([[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
>>> np.delete(x, [0, 2, 3], axis=0)
array([[ 5, 6, 7, 8, 9],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
In all the time I've been using NumPy, though, I have never needed np.delete, because boolean indexing is usually more convenient.
As an example, if I wanted to remove or select the rows whose first value is 12 or greater, I would do:
>>> mask_array = x[:, 0] < 12 # comparing values of the first column
>>> mask_array
array([ True, True, True, False, False, False, False])
>>> x[mask_array]
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
>>> x[~mask_array] # ~ is an element-wise inversion
array([[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
For more information refer to the documentation on indexing: https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html

If you want to delete selected rows, you can write
np.delete(x, (1,2,5), axis = 0)
This deletes rows 1, 2 and 5. To delete a range of rows such as 0:5, use np.s_:
np.delete(x, np.s_[0:5], axis = 0)
This deletes rows 0 to 4 of your array.
np.s_[0:5] --->> slice(0, 5, None)
The two are equivalent.

Pass the row numbers to delete as a list.
General Syntax:
np.delete(array_name,[rownumber1,rownumber2,..,rownumber n],axis=0)
Example: delete first three rows in an array:
np.delete(array_name,[0,1,2],axis=0)
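As a quick cross-check, the following sketch compares the ways of dropping the first five rows that appear in this thread. The variable names are illustrative only:

```python
import numpy as np

x = np.arange(35).reshape(7, 5)

by_slice = np.delete(x, slice(0, 5), axis=0)      # slice object
by_s = np.delete(x, np.s_[0:5], axis=0)           # np.s_ builds the same slice
by_list = np.delete(x, [0, 1, 2, 3, 4], axis=0)   # explicit row numbers
by_index = x[5:]                                  # basic indexing

print(np.array_equal(by_slice, by_s))      # True
print(np.array_equal(by_slice, by_list))   # True
print(np.array_equal(by_slice, by_index))  # True
print(x.shape)  # (7, 5): np.delete returns a new array, x is unchanged
```

Note that np.delete always returns a new array, whereas x[5:] is a view on the original data.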

Transform Numpy Arrays to specific 2 dimensional Form

I am currently working with NumPy and have to transform large data sets. The starting point is a set of one-dimensional arrays, which should be combined into one large two-dimensional array. Here is a small example of how it should look.
# Creates 16 arrays with four numbers in each array (from 0 to 3)
array = [np.arange(4) for i in range(16)]
# Each of these arrays should be reshaped to a two-dimensional shape=(2,2)
array = [arr.reshape((2,2)) for arr in array]
1. Step:
# Subsequently, the first of all of them are to be brought together:
[[0,1,0,1,0,1,0,1,0,1, ...,0,1],
[2,3,2,3,2,3,2,3,2,3, ...,2,3]]
2. Step:
# Afterwards, the arrays are to be wrapped on the basis of a length (let's say 8). The result should look like this:
[[0,1,0,1,0,1,0,1],
[2,3,2,3,2,3,2,3],
[0,1,0,1,0,1,0,1],
[2,3,2,3,2,3,2,3]]
This is only a miniature example. I'm actually working with an array with the length of 64 that is to be converted to an array with the shape=(8, 8). And at the end I want to create a 2 dimensional array with the dimensions 416x416.
Edit: So my current question is, how do I get to the first and second step in the example above?

Assuming that you have created array (as you described), try the following code:
chunkSize = 8  # Number of columns in the result
# Reshape and bring together
array2 = np.hstack([arr.reshape(2, 2) for arr in array])
# The final result
result = np.vstack(np.hsplit(array2, [(n + 1) * chunkSize
                                      for n in range(array2.shape[1] // chunkSize - 1)]))
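The two steps can also be sketched with a reshape and a transpose instead of hsplit/vstack. This is an alternative formulation rather than the answer's own method, and the names tiles, step1 and step2 are made up for the illustration. It assumes the total width is an exact multiple of the chunk size:

```python
import numpy as np

chunk_size = 8
tiles = [np.arange(4) for _ in range(16)]

# Step 1: reshape each tile to (2, 2) and place them side by side -> (2, 32)
step1 = np.hstack([t.reshape(2, 2) for t in tiles])

# Step 2: cut the columns into chunks of chunk_size and stack the chunks vertically
n_rows = step1.shape[0]
n_chunks = step1.shape[1] // chunk_size
step2 = (step1.reshape(n_rows, n_chunks, chunk_size)
              .transpose(1, 0, 2)          # order rows by (chunk, row)
              .reshape(-1, chunk_size))

print(step2.shape)  # (8, 8)
print(np.array_equal(step2, np.vstack(np.hsplit(step1, n_chunks))))  # True
```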

You can use np.pad, with mode='wrap':
final_width = 8
final_height = 8
a = np.arange(4).reshape(2,2)
a = np.pad(a, ((0, final_height-a.shape[0]),(0, final_width-a.shape[1])), mode='wrap')
a
out:
array([[0, 1, 0, 1, 0, 1, 0, 1],
[2, 3, 2, 3, 2, 3, 2, 3],
[0, 1, 0, 1, 0, 1, 0, 1],
[2, 3, 2, 3, 2, 3, 2, 3],
[0, 1, 0, 1, 0, 1, 0, 1],
[2, 3, 2, 3, 2, 3, 2, 3],
[0, 1, 0, 1, 0, 1, 0, 1],
[2, 3, 2, 3, 2, 3, 2, 3]])
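When the target size is an exact multiple of the block size, as in this 8x8 case, np.tile is another option. It is not part of the answer above, just a sketch for comparison:

```python
import numpy as np

a = np.arange(4).reshape(2, 2)
final_height, final_width = 8, 8

# repeat the 2x2 block 4 times down and 4 times across
tiled = np.tile(a, (final_height // a.shape[0], final_width // a.shape[1]))
print(tiled.shape)  # (8, 8)

# compare against the np.pad(mode='wrap') approach
padded = np.pad(a, ((0, final_height - a.shape[0]), (0, final_width - a.shape[1])), mode='wrap')
print(np.array_equal(tiled, padded))
```

np.pad with mode='wrap' also handles target sizes that are not whole multiples of the block, which np.tile alone does not.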

Well the example you posted explains everything clearly. I don't know what the problem is.
An implementation for your 64 to 8*8 transformation would be like:
import numpy as np
a = np.array([i for i in range(64)]) # defines a 64*1 1-D array
# a = array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
# 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
# 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
# 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63])
print(a.shape) # (64,)
a = a.reshape((8,8)) # makes "a" an 8*8 2-D array
# array([[ 0, 1, 2, 3, 4, 5, 6, 7],
# [ 8, 9, 10, 11, 12, 13, 14, 15],
# [16, 17, 18, 19, 20, 21, 22, 23],
# [24, 25, 26, 27, 28, 29, 30, 31],
# [32, 33, 34, 35, 36, 37, 38, 39],
# [40, 41, 42, 43, 44, 45, 46, 47],
# [48, 49, 50, 51, 52, 53, 54, 55],
# [56, 57, 58, 59, 60, 61, 62, 63]])
print(a.shape) # (8, 8)
And the same thing goes for converting a 173056*1 array to a 416*416 one.
Maybe you are unsure whether the reshape method can be used on a 2-D array. Of course it can!
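For the 416x416 case, a minimal sketch might look like this (the names flat and grid are illustrative). It also shows that reshape raises an error when the element count does not fit the requested shape:

```python
import numpy as np

flat = np.arange(416 * 416)    # 173056 elements in a 1-D array
grid = flat.reshape(416, -1)   # -1 lets NumPy infer the second dimension
print(grid.shape)   # (416, 416)
print(grid[1, 0])   # 416, the first element of the second row

# reshape only works when the total number of elements matches
try:
    np.arange(10).reshape(3, -1)
except ValueError as exc:
    print("reshape failed:", exc)
```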

How do you print elements that are less than a variable from a numpy array

Hi, I'm fairly new to Python, and an assignment requires me to print the elements of a NumPy array that are less than a variable.
I made a 20x10 numpy array of random integers between -5 and 50
x = np.random.randint (-5, 50, (20, 10))
x
array([[17, 23, 15, 13, -1, 17, 30, 14, 2, 3],
[ 8, 0, -5, 3, 10, 10, 48, 6, -1, 34],
[23, 40, 21, 5, 47, 41, 44, 22, 46, 30],
[36, 13, 48, 29, 46, 25, 48, 38, 13, 40],
[18, -4, 1, 37, 48, 43, 25, 11, 21, 30],
[44, 37, 4, 39, 8, 1, 33, 34, 3, 8],
[ 2, 11, 17, 10, 20, 3, 30, 1, 12, 2],
[15, 20, -3, 11, 45, 40, 18, 19, -1, 31],
[39, 44, 18, 25, 49, 20, 15, 28, 32, 18],
[22, 24, 28, 46, 48, 46, 17, 49, 2, 36],
[44, 4, 49, -5, 14, 31, 12, 15, 48, 43],
[-2, 37, -4, 15, 31, -1, 11, 43, 42, 5],
[40, 35, 25, 22, 38, 26, 15, 1, 4, 22],
[42, 30, 14, 7, 13, 44, 5, 29, 28, 38],
[-2, 7, 31, -4, 44, -5, 34, 19, 31, 30],
[ 0, 1, -2, 29, 35, 28, 23, -1, 21, 27],
[40, 46, 4, 48, 0, 28, 2, 25, 3, 49],
[15, 2, -2, 16, 22, 39, -2, 33, 15, 2],
[14, 26, -5, 0, 22, 38, 25, 4, 14, 2],
[16, 32, 23, 3, 38, 41, -5, 35, 46, 33]])
Above is the result. Now I want to print the number of elements that are less than 5 in each row.
I managed to do this:
print (x[0, :] < 5)
[False False False False True False False False True True]
The result is as shown above, but what I wanted was the number of elements that are less than 5. For this row it should give me 3, since there are 3 such elements.
Can anyone help me with this? Thank you.

np.sum works on boolean arrays like yours, because each True counts as 1. So at first I tried the following:
[np.sum(n<5) for n in x]
This gives me the list [3, 4, 0, 0, 2, 3, 4, 2, 0, 1, 2, 3, 2, 0, 3, 4, 4, 4, 4, 2], which is correct, but looping over the rows in a Python list comprehension is slower than a single vectorized NumPy call and is best avoided. The idiomatic NumPy way is:
np.sum(x<5, axis=1)
This builds a boolean array from x and then counts the True values in each row by summing along axis 1.
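A small sketch using the first three rows from the question; np.count_nonzero is an equivalent alternative to np.sum here, and the name threshold is just illustrative:

```python
import numpy as np

# first three rows of the array shown in the question
x = np.array([[17, 23, 15, 13, -1, 17, 30, 14, 2, 3],
              [8, 0, -5, 3, 10, 10, 48, 6, -1, 34],
              [23, 40, 21, 5, 47, 41, 44, 22, 46, 30]])
threshold = 5

per_row = np.count_nonzero(x < threshold, axis=1)  # one count per row
print(per_row)                      # [3 4 0]
print(np.sum(x < threshold, axis=1))  # [3 4 0], same result via np.sum

total = np.count_nonzero(x < threshold)  # count over the whole array
print(total)                        # 7
```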

You can use your boolean mask to index the array and then count the elements. Alternatively, you can use numpy.where(). Unlike your approach, it does not give you a boolean mask; it returns the indices at which the condition is met.
For your example (note that this counts over the whole array, not per row):
indices = np.where(x < 5)
values_less_than_5 = x[indices]
count = len(values_less_than_5)
print(count)
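To see what np.where returns, here is a tiny sketch on a made-up 2x3 array. The np.bincount step is an extra suggestion for turning the row indices into per-row counts, not something from the answer:

```python
import numpy as np

x = np.array([[17, 2, 3],
              [8, 0, 48]])

rows, cols = np.where(x < 5)   # one index array per dimension
print(rows)           # [0 0 1]
print(cols)           # [1 2 1]
print(x[rows, cols])  # [2 3 0]

# counting how often each row index appears gives the per-row counts
print(np.bincount(rows, minlength=x.shape[0]))  # [2 1]
```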

numpy mask using np.where then replace values

I've got two 2-D numpy arrays with same shape, let's say (10,6).
The first array x is full of some meaningful float numbers.
x = np.arange(60).reshape(-1,6)
The second array a is a sparse array in which each row contains ONLY 2 non-zero values.
a = np.zeros((10,6))
for i in range(10):
    a[i, 1] = 1
    a[i, 2] = 1
Then there's a third array v with the shape (10,2), and I want to write the values of each of its rows into the first array x at the positions where a is not zero.
v = np.arange(20).reshape(10,2)
so the original x and the updated x will be:
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35],
[36, 37, 38, 39, 40, 41],
[42, 43, 44, 45, 46, 47],
[48, 49, 50, 51, 52, 53],
[54, 55, 56, 57, 58, 59]])
and
array([[ 0, 0, 1, 3, 4, 5],
[ 6, 2, 3, 9, 10, 11],
[12, 4, 5, 15, 16, 17],
[18, 6, 7, 21, 22, 23],
[24, 8, 9, 27, 28, 29],
[30, 10, 11, 33, 34, 35],
[36, 12, 13, 39, 40, 41],
[42, 14, 15, 45, 46, 47],
[48, 16, 17, 51, 52, 53],
[54, 18, 19, 57, 58, 59]])
I've tried the following method
x[np.where(a!=0)] = v
Then I got an error of shape mismatch: value array of shape (10,2) could not be broadcast to indexing result of shape (20,)
What's wrong with this approach, is there an alternative to do it? Thanks a lot.

Thanks to the comment by @Divakar: the problem happens because the shapes on the two sides of the assignment operator = are different.
On the left, the expressions x[np.where(a!=0)], x[a!=0] and x[np.nonzero(a)] all select the matching elements as a flat 1-D array with a shape of (20,).
On the right, we need an array of the same shape to complete the assignment. Therefore, a simple ravel() or reshape(-1) will do the job.
So the solution is as simple as x[a!=0] = v.ravel().
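Here is a sketch that reproduces the setup from the question and checks the first rows of the result. The slice assignment a[:, 1:3] = 1 is shorthand for the loop in the question, and mask is an illustrative name:

```python
import numpy as np

x = np.arange(60).reshape(-1, 6)
a = np.zeros((10, 6))
a[:, 1:3] = 1                      # two non-zero entries per row
v = np.arange(20).reshape(10, 2)

mask = a != 0
# boolean indexing visits the True positions in row-major order,
# which matches the order of v.ravel()
x[mask] = v.ravel()

print(x[:3])
# [[ 0  0  1  3  4  5]
#  [ 6  2  3  9 10 11]
#  [12  4  5 15 16 17]]
```

This relies on every row of the mask having the same number of True entries as v has columns; otherwise the flattened values would not line up with the rows.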

import numpy as np
arrayOne = np.random.rand(6).reshape((2, 3))
arrayTwo = np.asarray([[0,1,2], [1,2,0]])
arrayThree = np.zeros((2, 2))
arrayOne[arrayTwo != 0] = arrayThree.ravel()
print(arrayOne)
[[0.56251284 0. 0. ]
[0. 0. 0.20076913]]
Note regarding edit: The solution above is not mine, all credit goes to Divakar. I edited because my earlier answer misunderstood OP's question and I wish to avoid confusion.
