There are 2 NumPy arrays, and I would like to reshape array1 from shape (12,) with reference to array2, which has shape (4,):
array1 = np.array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]) and
array1.shape
returns: (12,)
array2 = np.array([ 12, 34, 56, 78])
and
array2.shape
returns: (4,)
I tried to execute reshape:
array1.reshape(array2.shape)
But, there is an error:
ValueError: cannot reshape array of size 12 into shape (4,)
So the expected result is array1 with 4 elements instead of 12:
np.array([1, 2, 3, 4])
I'd appreciate any ideas and help.
If I understand your requirements correctly, I think what you're looking for is simple slicing:
In [140]: array2 = np.array([ 12, 34, 56, 78])
In [135]: a_sliced = array1[:array2.shape[0]]
In [136]: a_sliced.shape
Out[136]: (4,)
If array2 is multi-dimensional, then use the approach suggested by Mad Physicist:
sliced_arr = array1[tuple(slice(0, d) for d in array2.shape)]
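To make the multi-dimensional version concrete, here is a small sketch with hypothetical 2-D inputs (the shapes below are assumptions, not from the question). array2 only contributes its shape; because the index is a tuple of plain slices, the result is a view into array1 rather than a copy.

```python
import numpy as np

# Hypothetical 2-D inputs: array1 is the larger array, array2 only supplies a target shape
array1 = np.arange(20).reshape(4, 5)
array2 = np.zeros((2, 3))

# One slice(0, d) per dimension of array2, combined into a tuple index
sliced_arr = array1[tuple(slice(0, d) for d in array2.shape)]
print(sliced_arr)
# [[0 1 2]
#  [5 6 7]]
```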
Alternatively, if you intended to split the array into three equal parts, then use numpy.split() as in:
# split `array1` into 3 portions
In [138]: np.split(array1, 3)
Out[138]: [array([1, 2, 3, 4]), array([5, 6, 7, 8]), array([ 9, 10, 11, 12])]
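If the goal is to keep all 12 values but group them in blocks of array2's length, a reshape with -1 does the same grouping as np.split, returned as one 2-D array. This is a sketch assuming array1.size is an exact multiple of array2.size (12 = 3 * 4 here); otherwise reshape raises the same ValueError as in the question.

```python
import numpy as np

array1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
array2 = np.array([12, 34, 56, 78])

# -1 lets NumPy infer the number of rows from array1.size / array2.shape[0]
rows = array1.reshape(-1, array2.shape[0])
print(rows)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]

# The first row is the 4-element result from the question
print(rows[0])  # [1 2 3 4]
```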
Related
I have a 2-D numpy array X with shape (100, 4). I want to find the sum of each row of that array and store it inside a new numpy array x_new with shape (100,0). What I've done so far doesn't work. Any suggestions? Below is my approach.
x_new = np.empty([100,0])
for i in range(len(X)):
    array = np.append(x_new, sum(X[i]))
Using the sum method on a 2d array:
In [8]: x = np.arange(12).reshape(3,4)
In [9]: x
Out[9]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
In [10]: x.sum(axis=1)
Out[10]: array([ 6, 22, 38])
In [12]: x.sum(axis=1, keepdims=True)
Out[12]:
array([[ 6],
[22],
[38]])
In [13]: _.shape
Out[13]: (3, 1)
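Applied to the question's shape, a sketch could look like this (X is a hypothetical stand-in filled with np.arange). Note that the loop in the question never updates x_new: np.append returns a new array each time, and the result is assigned to a different name. Also, an array of shape (100, 0) holds no elements at all; a (100,) or (100, 1) result is likely what is wanted.

```python
import numpy as np

# Hypothetical stand-in for the question's X: 100 rows, 4 columns
X = np.arange(400).reshape(100, 4)

# Row sums in one vectorized call, no loop or np.append needed
x_new = X.sum(axis=1)
print(x_new.shape)   # (100,)

# Keep a column layout instead if a (100, 1) result is wanted
x_col = X.sum(axis=1, keepdims=True)
print(x_col.shape)   # (100, 1)
```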
reference: https://numpy.org/doc/stable/reference/generated/numpy.sum.html
import numpy as np
z = np.arange(15).reshape(3,5)
indexx = [0,2]
indexy = [1,2,3,4]
zz = []
for i in indexx:
    for j in indexy:
        zz.append(z[i][j])
Output:
zz >> [1, 2, 3, 4, 11, 12, 13, 14]
This essentially flattens the array but keeps only the elements whose indices are present in the two index lists.
This works, but it is very slow for larger arrays/lists of indices. Is there a way to speed this up using numpy?
Thanks.
Edited to show desired output.
A list of integers can be used to access the entries of interest for numpy arrays.
z[indexx][:,indexy].flatten()
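A quick check that the chained indexing reproduces the loop's output. The rows are selected first (a (2, 5) copy), then the columns of that result; this is a sketch using the question's own data.

```python
import numpy as np

z = np.arange(15).reshape(3, 5)
indexx = [0, 2]
indexy = [1, 2, 3, 4]

# Pick the rows first, then the columns of that intermediate result
zz = z[indexx][:, indexy].flatten()
print(zz)  # [ 1  2  3  4 11 12 13 14]

# Same values as the original double loop
loop = [z[i][j] for i in indexx for j in indexy]
```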
x = {"apple", "banana", "cherry"}
y = {"google", "microsoft", "apple"}
z = x.intersection(y)
print(z)
prints: {'apple'}
If I understand you correctly, just use a Python set, and then convert it to a list.
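A minimal sketch of the "convert it to a list" step. Sets are unordered, so sorting gives a stable order. For values stored in NumPy arrays, np.intersect1d does the equivalent and returns the sorted unique common values. Note that this finds common values, not the positional index pairs the question asks about.

```python
import numpy as np

x = {"apple", "banana", "cherry"}
y = {"google", "microsoft", "apple"}

# Set intersection, then a sorted list for a deterministic order
common = sorted(x.intersection(y))
print(common)  # ['apple']

# NumPy counterpart for arrays of values: sorted, unique common elements
common_arr = np.intersect1d(np.array([1, 2, 3]), np.array([3, 4, 1]))
print(common_arr)  # [1 3]
```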
Indexing in several dimensions at once requires broadcasting the indices against each other. np.ix_ is a handy tool for doing this:
In [127]: z
Out[127]:
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
In [128]: z[np.ix_(indexx, indexy)]
Out[128]:
array([[ 1, 2, 3, 4],
[11, 12, 13, 14]])
Converting that to 1d is a trivial ravel() task.
Look at what ix_ produces: here it's a (2,1) and a (1,4) array. You can construct such arrays 'from scratch':
In [129]: np.ix_(indexx, indexy)
Out[129]:
(array([[0],
[2]]),
array([[1, 2, 3, 4]]))
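Building the same pair of index arrays by hand, as a sketch: adding a trailing axis to the row indices and a leading axis to the column indices gives shapes (2, 1) and (1, 4). These broadcast to (2, 4), so every (row, column) pair is selected.

```python
import numpy as np

z = np.arange(15).reshape(3, 5)
indexx = [0, 2]
indexy = [1, 2, 3, 4]

# Hand-made equivalent of np.ix_(indexx, indexy)
rows = np.array(indexx)[:, None]   # shape (2, 1)
cols = np.array(indexy)[None, :]   # shape (1, 4)

# The index arrays broadcast to (2, 4)
block = z[rows, cols]
print(block)
# [[ 1  2  3  4]
#  [11 12 13 14]]

flat = block.ravel()  # 1-D result
```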
Provided a numpy array:
arr = np.array([0,1,2,3,4,5,6,7,8,9,10,11,12])
I wonder how to access chunks of a chosen size with a chosen separation, both concatenated and as slices:
E.g.: obtain chunks of size 3 separated by two values:
arr_chunk_3_sep_2 = np.array([0,1,2,5,6,7,10,11,12])
arr_chunk_3_sep_2_in_slices = np.array([[0,1,2],[5,6,7],[10,11,12]])
What is the most efficient way to do it? If possible, I would like to avoid copying or creating new objects as much as possible. Maybe memoryviews could be of help here?
Approach #1
Here's one with masking -
import numpy as np
def slice_grps(a, chunk, sep):
    N = chunk + sep
    # keep positions whose offset within each period of length N is < chunk
    return a[np.arange(len(a))%N < chunk]
Sample run -
In [223]: arr
Out[223]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
In [224]: slice_grps(arr, chunk=3, sep=2)
Out[224]: array([ 0, 1, 2, 5, 6, 7, 10, 11, 12])
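The masking approach also covers the "in slices" variant when the final chunk is complete: the flat result can simply be reshaped to one row per chunk. This is a sketch using the sample data. Boolean masking always returns a copy, so unlike approach #2 this does not avoid allocating a new array.

```python
import numpy as np

def slice_grps(a, chunk, sep):
    # Period of the pattern: `chunk` kept values followed by `sep` skipped ones
    N = chunk + sep
    # Keep positions whose offset within each period is < chunk
    return a[np.arange(len(a)) % N < chunk]

arr = np.arange(13)
flat = slice_grps(arr, chunk=3, sep=2)
print(flat)  # [ 0  1  2  5  6  7 10 11 12]

# Valid only because the last chunk here is complete (9 values = 3 chunks of 3)
in_slices = flat.reshape(-1, 3)
print(in_slices)
```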
Approach #2
If the input array is such that the last chunk would have enough runway, we could leverage np.lib.stride_tricks.as_strided, inspired by this post, to select m elements off each block of n elements:
import numpy as np
# https://stackoverflow.com/a/51640641/ #Divakar
def skipped_view(a, m, n):
    s = a.strides[0]
    strided = np.lib.stride_tricks.as_strided
    # one row per block of n elements; the last row may extend past the end of a
    shp = ((a.size+n-1)//n,n)
    return strided(a,shape=shp,strides=(n*s,s), writeable=False)[:,:m]
out = skipped_view(arr,chunk,chunk+sep)
Note that the output is a view into the input array, so it has no extra memory overhead and is virtually free to create!
Sample run to make things clear -
In [255]: arr
Out[255]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
In [256]: chunk = 3
In [257]: sep = 2
In [258]: skipped_view(arr,chunk,chunk+sep)
Out[258]:
array([[ 0, 1, 2],
[ 5, 6, 7],
[10, 11, 12]])
# Let's prove that the output is a view indeed
In [259]: np.shares_memory(arr, skipped_view(arr,chunk,chunk+sep))
Out[259]: True
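A possible alternative, not part of the original answer: numpy.lib.stride_tricks.sliding_window_view (available in NumPy 1.20 and later) builds a read-only view of all length-chunk windows, and stepping through them by chunk+sep picks the chunk starts. Unlike the raw as_strided call, it never reads past the end of the buffer; an incomplete final chunk is simply dropped instead.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

arr = np.arange(13)
chunk, sep = 3, 2

# All length-`chunk` windows as a view, then every (chunk + sep)-th window
out = sliding_window_view(arr, chunk)[::chunk + sep]
print(out)
# [[ 0  1  2]
#  [ 5  6  7]
#  [10 11 12]]
print(np.shares_memory(arr, out))  # True
```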
How about a reshape and slice?
In [444]: arr = np.array([0,1,2,3,4,5,6,7,8,9,10,11,12])
In [445]: arr.reshape(-1,5)
...
ValueError: cannot reshape array of size 13 into shape (5)
Ah, a problem: your array isn't big enough for this reshape, so we have to pad it:
In [446]: np.concatenate((arr,np.zeros(2,int))).reshape(-1,5)
Out[446]:
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 0, 0]])
In [447]: np.concatenate((arr,np.zeros(2,int))).reshape(-1,5)[:,:-2]
Out[447]:
array([[ 0, 1, 2],
[ 5, 6, 7],
[10, 11, 12]])
as_strided can get away with this by including bytes outside the data buffer. Usually that's seen as a bug, though here it can be an asset, provided you really do throw that garbage away.
Or throw away the last incomplete row:
In [452]: arr[:-3].reshape(-1,5)[:,:3]
Out[452]:
array([[0, 1, 2],
[5, 6, 7]])
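The pad-then-reshape idea can be wrapped in a small helper. chunks_padded is a hypothetical name, and np.pad stands in for the manual np.concatenate. The pad length (-len(a)) % (chunk + sep) completes the last row. This always copies. If the final chunk is incomplete, fill values appear in the last row, as the second test case shows.

```python
import numpy as np

def chunks_padded(a, chunk, sep, fill=0):
    # Hypothetical helper: pad to a whole number of (chunk + sep) periods,
    # reshape into one period per row, then keep the first `chunk` columns
    period = chunk + sep
    pad = (-len(a)) % period          # elements needed to complete the last row
    padded = np.pad(a, (0, pad), constant_values=fill)
    return padded.reshape(-1, period)[:, :chunk]

arr = np.arange(13)
print(chunks_padded(arr, 3, 2))
# [[ 0  1  2]
#  [ 5  6  7]
#  [10 11 12]]
```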
I have scripts with multi-dimensional arrays and instead of for-loops I would like to use a vectorized implementation for my problems (which sometimes contain column operations).
Let's consider a simple example with matrix arr:
> arr = np.arange(12).reshape(3, 4)
> arr
> array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
> arr.shape
> (3, 4)
So we have a matrix arr with 3 rows and 4 columns.
The simplest case in my scripts is adding something to the values in the array. E.g. I'm doing this for single or multiple rows:
> someVector = np.array([1, 2, 3, 4])
> arr[0] += someVector
> arr
> array([[ 1, 3, 5, 7], <--- successfully added someVector
[ 4, 5, 6, 7], to one row
[ 8, 9, 10, 11]])
> arr[0:2] += someVector
> arr
> array([[ 2, 5, 8, 11], <--- added someVector to two
[ 5, 7, 9, 11], <--- rows at once
[ 8, 9, 10, 11]])
This works well. However, sometimes I need to manipulate one or several columns. One column at a time works:
> arr[:, 0] += [1, 2, 3]
> array([[ 3, 5, 8, 11],
[ 7, 7, 9, 11],
[11, 9, 10, 11]])
^
|___ added the values [1, 2, 3] successfully to
this column
But I am struggling to figure out why this does not work for multiple columns at once:
> arr[:, 0:2] += [1, 2, 3]
> ValueError                                Traceback (most recent call last)
> <ipython-input-16-5feef53e53af> in <module>()
> ----> 1 arr[:, 0:2] += [1, 2, 3]
> ValueError: operands could not be broadcast together with shapes (3,2) (3,) (3,2)
Isn't this the very same way it works with rows? What am I doing wrong here?
To add a 1D array to multiple columns you need to broadcast the values to a 2D array. Since broadcasting adds new axes on the left (of the shape) by default, broadcasting a row vector to multiple rows happens automatically:
arr[0:2] += someVector
someVector has shape (N,) and is automatically broadcast to shape (1, N). If arr[0:2] has shape (2, N), then the sum is performed element-wise as though both arr[0:2] and someVector were arrays of the same shape, (2, N).
But to broadcast a column vector to multiple columns requires hinting NumPy that you want broadcasting to occur with the axis on the right. In fact, you have to add the new axis on the right explicitly by using someVector[:, np.newaxis] or equivalently someVector[:, None]:
In [41]: arr = np.arange(12).reshape(3, 4)
In [42]: arr[:, 0:2] += np.array([1, 2, 3])[:, None]
In [43]: arr
Out[43]:
array([[ 1, 2, 2, 3],
[ 6, 7, 6, 7],
[11, 12, 10, 11]])
someVector (e.g. np.array([1, 2, 3])) has shape (N,) and someVector[:, None] has shape (N, 1) so now broadcasting happens on the right. If arr[:, 0:2] has shape (N, 2), then the sum is performed element-wise as though both arr[:, 0:2] and someVector[:, None] were arrays of the same shape, (N, 2).
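The shape rules described above can be checked directly with np.broadcast_shapes (NumPy 1.20 and later), without touching any data. This sketch shows why (3, 2) with (3,) fails while (3, 2) with (3, 1), and (2, 4) with (4,), succeed.

```python
import numpy as np  # np.broadcast_shapes needs NumPy >= 1.20

# (3,) is treated as (1, 3) and aligned with the LAST axis of (3, 2): 3 vs 2 fails
failed = False
try:
    np.broadcast_shapes((3, 2), (3,))
except ValueError as exc:
    failed = True
    print("incompatible:", exc)

# (3, 1) aligns row-wise and stretches across the 2 columns
col_shape = np.broadcast_shapes((3, 2), (3, 1))
print(col_shape)  # (3, 2)

# (4,) matches the last axis of (2, 4), which is why adding to rows just works
row_shape = np.broadcast_shapes((2, 4), (4,))
print(row_shape)  # (2, 4)
```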
Very clear explanation by @unutbu.
As a complement, transposition (.T) can often simplify the task by letting you work along the first dimension:
In [273]: arr = np.arange(12).reshape(3, 4)
In [274]: arr.T[0:2] += [1, 2, 3]
In [275]: arr
Out[275]:
array([[ 1, 2, 2, 3],
[ 6, 7, 6, 7],
[11, 12, 10, 11]])
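The transpose trick works because arr.T is a view of arr, so an in-place addition on rows of the transpose writes into columns of the original. This is a sketch comparing it with the explicit [:, None] form.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)
print(np.shares_memory(arr, arr.T))  # True: .T is a view, not a copy

# Rows 0-1 of arr.T are columns 0-1 of arr; [1, 2, 3] broadcasts along each of them
arr.T[0:2] += np.array([1, 2, 3])

# Same result as the explicit column-vector form
ref = np.arange(12).reshape(3, 4)
ref[:, 0:2] += np.array([1, 2, 3])[:, None]
print(np.array_equal(arr, ref))  # True
```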
Given a list of numpy 2-D arrays of shape m x n, what is the best way to get an array of size n (the number of columns of each matrix in the list) where the i-th value of the array is the maximum of column i across all matrices in the list?
>>> import numpy as np
>>> a = np.array([[1,11,5,2], [3,9,1,12], [5,7,7,1]])
>>> a
array([[ 1, 11, 5, 2],
[ 3, 9, 1, 12],
[ 5, 7, 7, 1]])
Max by column
>>> a.max(axis=0)
array([ 5, 11, 7, 12])
Max by row
>>> a.max(axis=1)
array([11, 12, 7])
If you have a list of 2D numpy arrays:
>>> a = np.array([[1,11,5,2], [3,9,1,12], [5,7,7,1]])
>>> b = np.array([[2,4,6,8],[1,3,2,1],[5,6,7,8]])
>>> l = [a,b]
You can use a list comprehension
>>> [i.max(axis=0) for i in l]
[array([ 5, 11, 7, 12]),
array([5, 6, 7, 8])]
>>> [i.max(axis=1) for i in l]
[array([11, 12, 7]),
array([8, 3, 8])]
You can first stack the arrays vertically and then take the maximum of each column:
np.vstack(list_of_arrays).max(axis=0)
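A worked run with the a and b arrays from the previous answer. Stacking gives a (6, 4) array whose column maxima are the maxima across both matrices. Equivalently, one can take the per-matrix column maxima (the list comprehension above) and then the maximum of those.

```python
import numpy as np

a = np.array([[1, 11, 5, 2], [3, 9, 1, 12], [5, 7, 7, 1]])
b = np.array([[2, 4, 6, 8], [1, 3, 2, 1], [5, 6, 7, 8]])
list_of_arrays = [a, b]

# Stack all rows into one (6, 4) array, then take the max down each column
col_max = np.vstack(list_of_arrays).max(axis=0)
print(col_max)  # [ 5 11  7 12]

# Two-step form: per-matrix column maxima, then the max of those
col_max_2 = np.max([m.max(axis=0) for m in list_of_arrays], axis=0)
```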