Fill empty numpy array inside for loop - python

I have a 2-D numpy array X with shape (100, 4). I want to find the sum of each row of that
array and store it in a new numpy array x_new with shape (100,0). What I've done so far
doesn't work. Any suggestions? My approach is below.
x_new = np.empty([100,0])
for i in range(len(X)):
    array = np.append(x_new, sum(X[i]))

Using the sum method on a 2d array:
In [8]: x = np.arange(12).reshape(3,4)
In [9]: x
Out[9]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
In [10]: x.sum(axis=1)
Out[10]: array([ 6, 22, 38])
In [12]: x.sum(axis=1, keepdims=True)
Out[12]:
array([[ 6],
       [22],
       [38]])
In [13]: _.shape
Out[13]: (3, 1)
reference: https://numpy.org/doc/stable/reference/generated/numpy.sum.html
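For completeness, the loop in the question can also be made to work. The sketch below assumes the intended result is a 1-D array of shape (100,), since an array of shape (100, 0) holds no elements, and it uses a made-up X for illustration. np.append returns a new array rather than modifying x_new in place, so the original loop never changes x_new; preallocating and assigning by index avoids that:

```python
import numpy as np

# Hypothetical input standing in for the asker's X
X = np.arange(400).reshape(100, 4)

# Preallocate one slot per row, then fill each slot by index
x_new = np.empty(len(X), dtype=X.dtype)
for i in range(len(X)):
    x_new[i] = X[i].sum()

# The loop result matches the vectorized call, which is the preferred form
same = np.array_equal(x_new, X.sum(axis=1))
```

The vectorized x.sum(axis=1) shown above remains the simpler and faster choice; the loop is only useful for seeing where the original attempt went wrong.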

Related

Apply multiple masks at once to a Numpy array

Is there a way to apply multiple masks at once to a multi-dimensional Numpy array?
For instance:
X = np.arange(12).reshape(3, 4)
# array([[ 0,  1,  2,  3],
#        [ 4,  5,  6,  7],
#        [ 8,  9, 10, 11]])
m0 = (X>0).all(axis=1)  # array([False,  True,  True])
m1 = (X<3).any(axis=0)  # array([ True,  True,  True, False])
# In one step: error
X[m0, m1]
# IndexError: shape mismatch: indexing arrays could not
#             be broadcast together with shapes (2,) (3,)
# In two steps: works (but awkward)
X[m0, :][:, m1]
# array([[ 4,  5,  6],
#        [ 8,  9, 10]])
Try:
>>> X[np.ix_(m0, m1)]
array([[ 4,  5,  6],
       [ 8,  9, 10]])
From the docs:
Combining multiple Boolean indexing arrays or a Boolean with an integer indexing array can best be understood with the obj.nonzero() analogy. The function ix_ also supports boolean arrays and will work without any surprises.
Another solution (also straight from the docs but less intuitive IMO):
>>> X[m0.nonzero()[0][:, np.newaxis], m1]
array([[ 4,  5,  6],
       [ 8,  9, 10]])
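To see why np.ix_ works here, it helps to look at what it returns for Boolean inputs. The following sketch reuses X, m0 and m1 from the question; the names rows and cols are introduced only for illustration:

```python
import numpy as np

X = np.arange(12).reshape(3, 4)
m0 = (X > 0).all(axis=1)   # row mask, shape (3,)
m1 = (X < 3).any(axis=0)   # column mask, shape (4,)

# np.ix_ turns each Boolean mask into integer indices and reshapes them
# so that they broadcast against each other as an open mesh
rows, cols = np.ix_(m0, m1)
print(rows.shape, cols.shape)   # (2, 1) (1, 3)

block = X[rows, cols]           # same selection as X[np.ix_(m0, m1)]
```

Because the two index arrays have shapes (2, 1) and (1, 3), they broadcast to a (2, 3) selection instead of failing with a shape mismatch.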
The error tells you what you need to do: the mask dimensions need to broadcast together. You can fix this at the source:
m0 = (X>0).all(axis=1, keepdims=True)
m1 = (X<3).any(axis=0, keepdims=True)
>>> X[m0 & m1]
array([ 4, 5, 6, 8, 9, 10])
You only really need to apply keepdims to m0, so you can leave the masks as 1D:
>>> X[m0[:, None] & m1]
array([ 4, 5, 6, 8, 9, 10])
You can reshape to the desired shape:
>>> X[m0[:, None] & m1].reshape(np.count_nonzero(m0), np.count_nonzero(m1))
array([[ 4,  5,  6],
       [ 8,  9, 10]])
Another option is to convert the masks to indices:
>>> X[np.flatnonzero(m0)[:, None], np.flatnonzero(m1)]
array([[ 4,  5,  6],
       [ 8,  9, 10]])
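The index-based approach can be wrapped in a small helper and checked against the other variants. The function name select_block is hypothetical, not a NumPy API; the sketch assumes both masks are 1-D Boolean arrays matching the two axes of a 2-D array:

```python
import numpy as np

def select_block(arr, row_mask, col_mask):
    """Return the sub-block of a 2-D array picked by a row mask and a column mask."""
    # Convert the masks to integer indices; the row indices become a column
    # vector so the two index arrays broadcast to a 2-D selection
    return arr[np.flatnonzero(row_mask)[:, None], np.flatnonzero(col_mask)]

X = np.arange(12).reshape(3, 4)
m0 = (X > 0).all(axis=1)
m1 = (X < 3).any(axis=0)

block = select_block(X, m0, m1)
two_step = X[m0, :][:, m1]      # the "awkward" version from the question
via_ix = X[np.ix_(m0, m1)]      # the np.ix_ version
```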

Understanding references: why this Numpy assignment is not working?

I have a little test code like so:
import numpy as np
foo = np.zeros(1, dtype=int)
bar = np.zeros((10, 1), dtype=int)
foo_copy = np.copy(foo)
bar[-1] = foo_copy
foo_copy[-1] = 10
print(foo_copy)
print(bar)
I was expecting both foo_copy and the last element of bar to contain the value 10, but instead the last element of bar is still an np array with value 0 in it.
[10]
[[0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]] # <<--- why not 10?
Isn't that last element pointing to foo_copy?
Or does numpy copy the data over on every assignment, so that I can't change it through the original ndarray?
If so, is there a way to keep that last element as a pointer to foo_copy?
A numpy array holds numeric values, not references (at least for numeric dtypes).
Make a 1d array, and reshape it to 2d:
In [64]: bar = np.arange(12).reshape(4,3)
In [65]: bar
Out[65]:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])
Another 1d array:
In [66]: foo = np.array([10])
In [67]: foo
Out[67]: array([10])
This assignment is by value:
In [68]: bar[1,1] = foo
In [69]: bar
Out[69]:
array([[ 0,  1,  2],
       [ 3, 10,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])
So is this, though the value is broadcast to the whole row:
In [70]: bar[2] = foo
In [71]: bar
Out[71]:
array([[ 0,  1,  2],
       [ 3, 10,  5],
       [10, 10, 10],
       [ 9, 10, 11]])
We can view the 2d array as 1d. This is a closer representation of how the values are actually stored (but in a C byte array, 12*8 bytes long):
In [72]: bar1 = bar.ravel()
In [73]: bar1
Out[73]: array([ 0, 1, 2, 3, 10, 5, 10, 10, 10, 9, 10, 11])
Changing an element of the view changes the corresponding element of the 2d array:
In [74]: bar1[3] = 30
In [75]: bar
Out[75]:
array([[ 0,  1,  2],
       [30, 10,  5],
       [10, 10, 10],
       [ 9, 10, 11]])
While we can make object dtype arrays, which store references as lists do, they do not have any performance benefits.
The bytestring containing the 'raw data' of bar:
In [76]: bar.tobytes()
Out[76]: b'\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x1e\x00\x00\x00\x00\x00\x00\x00\n\x00\x00\x00\x00\x00\x00\x00\x05\x00\x00\x00\x00\x00\x00\x00\n\x00\x00\x00\x00\x00\x00\x00\n\x00\x00\x00\x00\x00\x00\x00\n\x00\x00\x00\x00\x00\x00\x00\t\x00\x00\x00\x00\x00\x00\x00\n\x00\x00\x00\x00\x00\x00\x00\x0b\x00\x00\x00\x00\x00\x00\x00'
The fabled numpy speed comes from working with this raw data in compiled C code. Accessing individual elements from Python code is relatively slow. It's the whole-array operations like bar*3 that are fast.
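As for keeping something that behaves like a pointer to the last element: one option is to go the other way and make the small array a view of bar, rather than assigning a separate array into bar. This is a minimal sketch reusing the question's bar; the name last is introduced here for illustration:

```python
import numpy as np

bar = np.zeros((10, 1), dtype=int)

# Basic indexing returns a view that shares memory with bar,
# so writing through the view also changes bar
last = bar[-1]
last[-1] = 10

print(last)                          # [10]
print(bar[-1])                       # [10]
print(np.shares_memory(last, bar))   # True
```

The design difference is the direction of the relationship: bar owns the data and last refers into it, whereas bar[-1] = foo_copy only copies the value of foo_copy into bar's buffer.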

Numpy call array values by list of indices

I have a 2D array of values, and I want to index it with two lists of indices, x and y. It used to work perfectly; I don't know why it's not working now. Maybe it's the Python version, I'm not sure.
x = np.squeeze(np.where(data['info'][:,2]==cdp)[0])
y = np.squeeze(np.where((data['Time']>=ub) & (data['Time']<=lb))[0])
s = data['gather'][x,y]
Error:
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (36,) (45,)
I don't know what the problem is. It works when I do it in two stages.
s = data['gather'][:,y]; s = s[x,:]
But I can't do it this way; I need to do it in one step.
In [92]: data = np.arange(12).reshape(3,4)
In [93]: x,y = np.arange(3), np.arange(4)
In [94]: data[x,y]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-94-8bd18da6c0ef> in <module>
----> 1 data[x,y]
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (3,) (4,)
When you provide 2 or more arrays as indices, numpy broadcasts them against each other. Understanding broadcasting is important.
In MATLAB, providing two indexing arrays (actually 2d matrices) fetches a block. In numpy, two arrays, if they match in shape, fetch individual elements, e.g. a diagonal:
In [99]: data[x,x]
Out[99]: array([ 0, 5, 10])
The MATLAB equivalent requires an extra function, sub2ind, to convert the subscripts to linear indices.
Two stage indexing:
In [95]: data[:,y][x,:]
Out[95]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
ix_ is a handy tool for constructing indices for block access:
In [96]: data[np.ix_(x,y)]
Out[96]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
Notice what it produces:
In [97]: np.ix_(x,y)
Out[97]:
(array([[0],
        [1],
        [2]]), array([[0, 1, 2, 3]]))
that's the same as doing:
In [98]: data[x[:,None], y]
Out[98]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
x[:,None] is (3,1), y is (4,); they broadcast to produce a (3,4) selection.
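The same idea applies to the question's case, where the two index lists have different lengths and do not cover the whole array. A minimal sketch with made-up data and index values (gather, x and y here are stand-ins, not the asker's actual arrays):

```python
import numpy as np

gather = np.arange(20).reshape(4, 5)   # stand-in for data['gather']
x = np.array([0, 2, 3])                # row indices, shape (3,)
y = np.array([1, 4])                   # column indices, shape (2,)

# (3, 1) against (2,) broadcasts to a (3, 2) block in a single indexing step
s = gather[x[:, None], y]

# Equivalent forms: np.ix_ and the two-stage indexing from the question
s_ix = gather[np.ix_(x, y)]
s_two_stage = gather[:, y][x, :]

print(s)
# [[ 1  4]
#  [11 14]
#  [16 19]]
```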

Reshape np.array based on the default size

I have two numpy arrays, and I would like to reshape array1 from shape (12,) to match array2, which has shape (4,):
array1 = np.array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
array1.shape
returns: (12,)
and
array2 = np.array([ 12, 34, 56, 78])
array2.shape
returns: (4,)
I tried to call reshape:
array1.reshape(array2.shape)
But there is an error:
ValueError: cannot reshape array of size 12 into shape (4,)
So the expected result is array1 with 4 elements:
np.array([ 1, 2, 3, 4]),
instead of 12.
I'd appreciate any ideas and help.
If I understand your requirements correctly, I think what you're looking for is simple slicing:
In [140]: array2 = np.array([ 12, 34, 56, 78])
In [135]: a_sliced = array1[:array2.shape[0]]
In [136]: a_sliced.shape
Out[136]: (4,)
If array2 is multi-dimensional, then use the approach suggested by Mad Physicist:
sliced_arr = array1[tuple(slice(0, d) for d in array2.shape)]
Alternatively, if you intended to split the array into three equal parts, then use numpy.split() as in:
# split `array1` into 3 portions
In [138]: np.split(array1, 3)
Out[138]: [array([1, 2, 3, 4]), array([5, 6, 7, 8]), array([ 9, 10, 11, 12])]
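For the multi-dimensional case mentioned above, the tuple-of-slices expression trims each axis of the larger array to the corresponding extent of the reference array. A small sketch with made-up 2-D arrays (big and ref are illustrative names), assuming both arrays have the same number of dimensions and ref is no larger than big along any axis:

```python
import numpy as np

big = np.arange(1, 13).reshape(3, 4)   # shape (3, 4)
ref = np.zeros((2, 2))                 # only its shape is used

# Build one slice per axis; here this is equivalent to big[0:2, 0:2]
sliced = big[tuple(slice(0, d) for d in ref.shape)]
print(sliced)
# [[1 2]
#  [5 6]]
```

Note that this keeps the leading corner of the array; it discards the remaining elements rather than rearranging them, which is why it is slicing and not a reshape.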

numpy: how to get a max from an argmax result

I have a numpy array of arbitrary shape, e.g.:
a = array([[[ 1,  2],
            [ 3,  4],
            [ 8,  6]],
           [[ 7,  8],
            [ 9,  8],
            [ 3, 12]]])
a.shape = (2, 3, 2)
and a result of argmax over the last axis:
np.argmax(a, axis=-1) = array([[1, 1, 0],
                               [1, 0, 1]])
I'd like to get max:
np.max(a, axis=-1) = array([[ 2,  4,  8],
                            [ 8,  9, 12]])
But without recalculating everything. I've tried:
a[np.arange(len(a)), np.argmax(a, axis=-1)]
But got:
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (2,3)
How to do it? Similar question for 2-d: numpy 2d array max/argmax
You can use advanced indexing -
In [17]: a
Out[17]:
array([[[ 1,  2],
        [ 3,  4],
        [ 8,  6]],
       [[ 7,  8],
        [ 9,  8],
        [ 3, 12]]])
In [18]: idx = a.argmax(axis=-1)
In [19]: m,n = a.shape[:2]
In [20]: a[np.arange(m)[:,None],np.arange(n),idx]
Out[20]:
array([[ 2,  4,  8],
       [ 8,  9, 12]])
For a generic ndarray case of any number of dimensions, as stated in the comments by @hpaulj, we could use np.ix_, like so -
shp = np.array(a.shape)
dim_idx = list(np.ix_(*[np.arange(i) for i in shp[:-1]]))
dim_idx.append(idx)
# index with a tuple; indexing with a plain list of arrays is not treated as a multi-axis index in current NumPy
out = a[tuple(dim_idx)]
For an ndarray of arbitrary shape, you can flatten the argmax indices, then recover the correct shape, like so:
idx = np.argmax(a, axis=-1)
flat_idx = np.arange(a.size, step=a.shape[-1]) + idx.ravel()
maximum = a.ravel()[flat_idx].reshape(*a.shape[:-1])
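To make the arithmetic concrete: in the flattened array, each run of a.shape[-1] consecutive elements is one slice along the last axis, so the start offset of each run plus the argmax within it gives the flat position of each maximum. A worked sketch using the array from the question (offsets is a name introduced here for illustration):

```python
import numpy as np

a = np.array([[[1, 2], [3, 4], [8, 6]],
              [[7, 8], [9, 8], [3, 12]]])

idx = np.argmax(a, axis=-1)                  # shape (2, 3)

# Start position of each last-axis run in the flattened (C-order) array
offsets = np.arange(0, a.size, a.shape[-1])  # [ 0  2  4  6  8 10]
flat_idx = offsets + idx.ravel()             # [ 1  3  4  7  8 11]

maximum = a.ravel()[flat_idx].reshape(a.shape[:-1])
print(maximum)
# [[ 2  4  8]
#  [ 8  9 12]]
```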
For arbitrary-shape arrays, the following should work :)
a = np.arange(5 * 4 * 3).reshape((5,4,3))
# for last axis
argmax = a.argmax(axis=-1)
a[tuple(np.indices(a.shape[:-1])) + (argmax,)]
# for other axis (eg. axis=1)
argmax = a.argmax(axis=1)
idx = list(np.indices(a.shape[:1]+a.shape[2:]))
idx[1:1] = [argmax]
a[tuple(idx)]
or
a = np.arange(5 * 4 * 3).reshape((5,4,3))
argmax = a.argmax(axis=0)
np.choose(argmax, np.moveaxis(a, 0, 0))
argmax = a.argmax(axis=1)
np.choose(argmax, np.moveaxis(a, 1, 0))
argmax = a.argmax(axis=2)
np.choose(argmax, np.moveaxis(a, 2, 0))
argmax = a.argmax(axis=-1)
np.choose(argmax, np.moveaxis(a, -1, 0))
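Another option, not mentioned in the answers above, is np.take_along_axis, which NumPy provides (since version 1.15, as far as I know) for this kind of lookup with argmax or argsort results. A minimal sketch, assuming the index array comes from argmax over the same axis:

```python
import numpy as np

a = np.arange(5 * 4 * 3).reshape((5, 4, 3))
axis = 1

# take_along_axis needs the index array to have the same number of
# dimensions as a, so restore the reduced axis with expand_dims
argmax = a.argmax(axis=axis)
picked = np.take_along_axis(a, np.expand_dims(argmax, axis=axis), axis=axis)

# Drop the length-1 axis again to match the shape of a.max(axis=axis)
maximum = np.squeeze(picked, axis=axis)
```

The same three lines work for any axis value, which avoids building the index tuples by hand.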
