Split Numpy array into equal-length sub-arrays - python

I have a very large NumPy array like this:
np.array([1, 2, 3, 4, 5, 6, 7 , ... , 12345])
I need to create subgroups of n elements (in the example n = 3) in another array like this:
np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [...], [12340, 12341, 12342], [12343, 12344, 12345]])
I did accomplish that using normal python lists, just appending the subgroups to another list. But, I'm having a hard time trying to do that in numpy.
Any ideas on how I can do that?
Thanks!

You can use array.reshape(-1, 3), where the -1 means "whatever's left": NumPy infers that dimension from the array's length.
>>> array = np.arange(1, 12346)
>>> array
array([    1,     2,     3, ..., 12343, 12344, 12345])
>>> array.reshape(-1, 3)
array([[    1,     2,     3],
       [    4,     5,     6],
       [    7,     8,     9],
       ...,
       [12337, 12338, 12339],
       [12340, 12341, 12342],
       [12343, 12344, 12345]])

You can use np.reshape():
From the documentation:
numpy.reshape(a, newshape, order='C')
Gives a new shape to an array without changing its data.
Here is an example of how you can apply it to your situation:
>>> import numpy as np
>>> a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 12345])
>>> a.reshape((len(a) // 3, 3))
array([[    1,     2,     3],
       [    4,     5,     6],
       [    7,     8, 12345]])
Note that the length of the array (len(a)) has to be a multiple of 3 for this reshape to work, because a 2-dimensional NumPy array must be rectangular.
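If the length is not a multiple of the group size, one option is to trim the tail before reshaping. The sketch below assumes that dropping the leftover elements is acceptable; the helper name chunk is made up for illustration. np.array_split is shown as an alternative that keeps every element but returns pieces of unequal length.

```python
import numpy as np

def chunk(arr, n):
    """Reshape a 1-D array into rows of n, dropping any trailing remainder.

    Hypothetical helper, not part of NumPy.
    """
    usable = (len(arr) // n) * n      # largest multiple of n that fits
    return arr[:usable].reshape(-1, n)

a = np.arange(1, 11)                  # 10 elements, not a multiple of 3
rows = chunk(a, 3)                    # shape (3, 3); the trailing 10 is dropped

# Alternative: keep everything, accept pieces of different lengths.
parts = np.array_split(a, 4)

print(rows)
print([p.tolist() for p in parts])
```

Which of the two fits depends on whether the leftover elements matter for the downstream code.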

Related

Are the elements created by numpy.repeat() views of the original numpy.array or unique elements?

I have a 2D array that I would like to repeat 4 times, giving a 3D array.
Achieved via a mixture of NumPy and Python methods:
>>> z = np.arange(9).reshape(3,3)
>>> z
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> z2 = []
>>> for i in range(4):
...     z2.append(z)
...
>>> z2
[array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]]), array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]]), array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]]), array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])]
>>> z2 = np.array(z2)
>>> z2
array([[[0, 1, 2],
[3, 4, 5],
[6, 7, 8]],
[[0, 1, 2],
[3, 4, 5],
[6, 7, 8]],
[[0, 1, 2],
[3, 4, 5],
[6, 7, 8]],
[[0, 1, 2],
[3, 4, 5],
[6, 7, 8]]])
Achieved via Pure NumPy:
>>> z2 = np.repeat(z[np.newaxis,...], 4, axis=0)
>>> z2
array([[[0, 1, 2],
[3, 4, 5],
[6, 7, 8]],
[[0, 1, 2],
[3, 4, 5],
[6, 7, 8]],
[[0, 1, 2],
[3, 4, 5],
[6, 7, 8]],
[[0, 1, 2],
[3, 4, 5],
[6, 7, 8]]])
Are the elements created by numpy.repeat() views of the original numpy.array() or unique elements?
If the latter, is there an equivalent NumPy function that can create views of the original array the same way as numpy.repeat()?
I think such an ability could reduce the memory needed for z2 when z is large and many repeats of z are involved.
A follow-up on @FrankYellin's answer:
>>> z = np.arange(9).reshape(3,3)
>>> z
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> z2 = np.repeat(z[np.newaxis,...], 1_000_000_000, axis=0)
>>> z2.nbytes
72000000000
>>> y2 = np.broadcast_to(z, (1_000_000_000, 3, 3))
>>> y2.nbytes
72000000000
The nbytes from np.broadcast_to() is the same as from np.repeat(). This is surprising, given that the former returns a read-only view on the original z array with the given shape. Having said that, I did notice that np.broadcast_to() created the y2 array instantaneously, while creating z2 via np.repeat() took about 40 seconds. Hence, np.broadcast_to() was significantly faster.
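A likely explanation for the identical numbers is that nbytes is computed from the shape and item size, not from the memory actually allocated. The sketch below checks this on a broadcast view; the variable names reported, computed, leading_stride and backing_bytes are only for illustration.

```python
import numpy as np

z = np.arange(9).reshape(3, 3)
y2 = np.broadcast_to(z, (1_000_000, 3, 3))   # no data is copied here

# nbytes is size * itemsize, i.e. derived from the shape alone.
reported = y2.nbytes
computed = y2.size * y2.itemsize

# The leading stride is 0, so every index along axis 0 reads the same bytes.
leading_stride = y2.strides[0]

# The only buffer actually backing y2 is the one that holds z.
backing_bytes = z.nbytes

print(reported == computed, leading_stride, backing_bytes)
```

So nbytes describes how large the array would be if it were laid out in full, which is not the same as the memory a view occupies.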
If you want a writable version, it is doable, but it's really ugly.
If you want a read-only version, np.broadcast_to(z, (4, 3, 3)) should be all you need.
Now the ugly writable version. Be careful. You can corrupt memory if you mess the arguments up.
>>> z.shape
(3, 3)
>>> z.strides
(24, 8)
>>> from numpy.lib.stride_tricks import as_strided
>>> z2 = as_strided(z, shape=(4, 3, 3), strides=(0, 24, 8))
And you end up with:
>>> z2[1, 1, 1]
4
>>> z2[1, 1, 1] = 100
>>> z2[2, 1, 1]
100
You are using strides to say that you want to create a second array overlaid on top of the first one. You set its new shape, and you prefix 0 to the previous strides, indicating that the first dimension has no effect on which data is read.
Make sure you understand strides.
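The hard-coded strides (24, 8) hold only for a C-contiguous 3x3 array of 8-byte integers. A slightly safer sketch, under the assumption that z is an ordinary writable array, derives the shape and strides from z itself; the name repeats is just for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

z = np.arange(9).reshape(3, 3)
repeats = 4

# Prefix a 0 stride to z's own strides instead of typing the byte counts by hand.
z2 = as_strided(z, shape=(repeats,) + z.shape, strides=(0,) + z.strides)

# All "copies" alias the same memory, so one write shows up everywhere,
# including in z itself.
z2[1, 1, 1] = 100
print(z[1, 1], z2[3, 1, 1])
```

The memory-corruption warning still applies: as_strided does not validate the shape or strides it is given.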
numpy.repeat creates a new array, not a view (you can check this by looking at the __array_interface__ field). In fact, it is not possible to create a view on the original array in the general case, since NumPy views do not support such a pattern. A view is basically just an object containing a pointer to a raw memory buffer, some strides, a shape and a type. While it is possible to repeat one item N times with a 0 stride, it is not possible to repeat 2 items N times (without adding a new dimension to the output array). Thus, there is no way to build a function like numpy.repeat that keeps the same output shape when repeating items along the last axis. If adding a new dimension is OK, then you can build an array with a new dimension and a stride set to 0. Repeating the last dimension is possible though. The answer of @FrankYellin gives a good example. Note that reshaping or raveling the resulting array forces a copy. Supporting such advanced views would make the NumPy code more complex and/or less efficient for a feature that users rarely need.
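As a quick check of the copy-versus-view claims, np.shares_memory can be used instead of inspecting __array_interface__ by hand. This is a minimal sketch on the small 3x3 example; the variable names are only for illustration.

```python
import numpy as np

z = np.arange(9).reshape(3, 3)

repeated = np.repeat(z[np.newaxis, ...], 4, axis=0)   # new array
view = np.broadcast_to(z, (4, 3, 3))                  # read-only view

repeat_shares = np.shares_memory(repeated, z)
view_shares = np.shares_memory(view, z)

# Flattening the broadcast view cannot be described by a single stride,
# so NumPy has to materialize all 36 elements.
flat = view.reshape(-1)
flat_shares = np.shares_memory(flat, z)

print(repeat_shares, view_shares, flat_shares)
```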

how to use slicing to get 2 numbers in a multiple array (numpy)

If I have an array
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
then a contains the rows [1,2,3], [4,5,6], [7,8,9].
Using a slice of the form [start:stop:step],
how could I retrieve 3 and 7?
Is it possible?
I have tried
a[:3:2]
This gave me the first row and the third row.
In [928]: a = np.array([[1,2,3],[4,5,6],[7,8,9]])
In [929]: a
Out[929]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
[3, 7] isn't a regular pattern in this 2D array, but it is in the flattened view:
In [931]: a.ravel()
Out[931]: array([1, 2, 3, 4, 5, 6, 7, 8, 9])
In [932]: a.ravel()[2::4]
Out[932]: array([3, 7])
In [933]: a.flat[2::4]
Out[933]: array([3, 7])
There is no guarantee that this can be extended to larger arrays or other selections.
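When the wanted elements do not follow a regular step, integer array indexing is an alternative to slicing: you list the row and column of each element explicitly. This is a sketch of that swapped-in technique, not a slice; the names rows, cols and picked are only for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Element 3 sits at (row 0, col 2); element 7 sits at (row 2, col 0).
rows = [0, 2]
cols = [2, 0]
picked = a[rows, cols]

print(picked)
```

Unlike a slice, this returns a copy rather than a view, but it works for arbitrary positions and array sizes.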

Python numpy indexing confusion

I'm new to Python. I was looking at some code similar to the following:
import numpy as np
a = np.ones([1,1,5,5], dtype='int64')
b = np.ones([11], dtype='float64')
x = b[a]
print (x.shape)
# (1, 1, 5, 5)
I looked through the NumPy documentation but didn't find anything related to this case. I'm not sure what's going on here, and I don't know where to look.
Edit
The actual code
def gausslabel(length=180, stride=2):
    gaussian_pdf = signal.gaussian(length+1, 3)
    label = np.reshape(np.arange(stride/2, length, stride), [1,1,-1,1])
    y = np.reshape(np.arange(stride/2, length, stride), [1,1,1,-1])
    delta = np.array(np.abs(label - y), dtype=int)
    delta = np.minimum(delta, length-delta)+length/2
    return gaussian_pdf[delta]
I guess that this code is trying to demonstrate that if you index an array with an array, the result has the same shape as the indexing array (in this case a), not the indexed array (i.e. b).
But it's confusing because b is full of 1s. Try it instead with a b full of different numbers:
>>> a = np.ones([1,1,5,5], dtype='int64')
>>> b = np.arange(11) + 3
>>> b
array([ 3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13])
>>> b[a]
array([[[[4, 4, 4, 4, 4],
[4, 4, 4, 4, 4],
[4, 4, 4, 4, 4],
[4, 4, 4, 4, 4],
[4, 4, 4, 4, 4]]]])
Because a is an array of 1s, the only element of b that is indexed is b[1], which equals 4. The shape of the result, though, is the shape of a, the array used as the index.
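The same rule is easier to see when the index array holds different values. In this sketch, lookup and idx are made-up names: each entry of idx selects one entry of the 1-D lookup table, and the result takes the shape of idx.

```python
import numpy as np

lookup = np.array([10.0, 20.0, 30.0, 40.0])     # 1-D table of values
idx = np.array([[0, 3], [2, 2], [1, 0]])        # integer indices, shape (3, 2)

# result[i, j] == lookup[idx[i, j]], so result has idx's shape.
result = lookup[idx]

print(result.shape)
print(result)
```

This is the pattern the gausslabel function relies on: delta acts as a table of positions into the 1-D gaussian_pdf array.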

Numpy array, specifying what elements to return

Say I have the following 5x5 numpy array called A
array([[6, 7, 7, 7, 8],
[4, 2, 5, 5, 9],
[1, 2, 4, 7, 4],
[0, 7, 3, 6, 8],
[4, 9, 6, 1, 6]])
and this 5x5 array called F
array([[1,0,0,0,0],
[1,0,0,0,0],
[1,0,0,0,0],
[1,0,0,0,0],
[0,0,0,0,0]])
I've been trying to use np.copyto, but I can't wrap my head around why it is not working or how it works. I get: ValueError: could not broadcast input array from shape (5,5) into shape (2)
Is there an easy way to get only the values of A that have a corresponding 1 in F when F is laid over A? I.e., it would return 6, 4, 1, 0.
You can just do this little trick: A[F==1]
In [8]: A[F==1]
Out[8]: array([6, 4, 1, 0])
Check out Boolean indexing.
To use np.copyto, make sure the destination array already exists with a shape the source can be broadcast to, for example one created with np.empty.
This basically solved my problem.
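The two approaches can be put side by side. This is a sketch using the A and F from the question; the names mask, selected and out are only for illustration. Boolean indexing returns the matching values as a 1-D array, while np.copyto with its where argument keeps A's shape and writes only the masked positions into a destination that must already exist.

```python
import numpy as np

A = np.array([[6, 7, 7, 7, 8],
              [4, 2, 5, 5, 9],
              [1, 2, 4, 7, 4],
              [0, 7, 3, 6, 8],
              [4, 9, 6, 1, 6]])
F = np.array([[1, 0, 0, 0, 0],
              [1, 0, 0, 0, 0],
              [1, 0, 0, 0, 0],
              [1, 0, 0, 0, 0],
              [0, 0, 0, 0, 0]])

mask = F == 1

# Boolean indexing: a flat array of the selected values.
selected = A[mask]

# np.copyto: the destination has A's shape; unmasked cells keep their zeros.
out = np.zeros_like(A)
np.copyto(out, A, where=mask)

print(selected)
print(out)
```

np.zeros_like is used here instead of np.empty so the untouched cells have a defined value.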

How to split an array according to a condition in numpy?

For example, I have an ndarray:
a = np.array([1, 3, 5, 7, 2, 4, 6, 8])
Now I want to split a into two parts, one is all numbers <5 and the other is all >=5:
[array([1,3,2,4]), array([5,7,6,8])]
Certainly I can traverse a and create two new arrays, but I want to know whether NumPy provides a better way.
Similarly, for multidimensional array, e.g.
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9],
[2, 4, 7]])
I want to split it according to the first column <3 and >=3, which result is:
[array([[1, 2, 3],
[2, 4, 7]]),
array([[4, 5, 6],
[7, 8, 9]])]
Is there a better way than traversing it? Thanks.
import numpy as np

def split(arr, cond):
    # The Boolean mask selects the matching entries; its negation selects the rest.
    return [arr[cond], arr[~cond]]

a = np.array([1,3,5,7,2,4,6,8])
print(split(a, a<5))

a = np.array([[1,2,3],[4,5,6],[7,8,9],[2,4,7]])
print(split(a, a[:,0]<3))
This produces the following output:
[array([1, 3, 2, 4]), array([5, 7, 6, 8])]
[array([[1, 2, 3],
[2, 4, 7]]), array([[4, 5, 6],
[7, 8, 9]])]
It might be a quick solution
a = np.array([1,3,5,7])
b = a >= 3 # variable with condition
a[b] # to slice the array
len(a[b]) # count the elements in sliced array
1D array:
a = numpy.array([2,3,4,...])
a_new = a[(a < 4)] # to get elements less than 4
2D array, filtered on a column (say the value in column i should be less than 5):
a = numpy.array([[1,2],[5,6],...])
a = a[(a[:,i] < 5)]
If your condition involves multiple columns, you can build a new array by applying the conditions to those columns. Then you can compare that new array with the value 5 (according to my assumption) to get the index array, and proceed as in the code above.
Note that whatever I have written inside the parentheses returns a Boolean mask, and that mask is what indexes the array.
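For a condition that spans several columns, one common way is to combine per-column masks with & (and) or | (or), wrapping each comparison in parentheses. The thresholds and column choices below are arbitrary assumptions made for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [2, 4, 7]])

# Keep rows whose first column is < 5 AND whose last column is > 3.
cond = (a[:, 0] < 5) & (a[:, 2] > 3)

matching = a[cond]     # rows that satisfy both conditions
rest = a[~cond]        # all other rows

print(matching)
print(rest)
```

The parentheses matter because & binds more tightly than the comparison operators in Python.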
