Is there a numpy way to reduce arrays? - python

I have this NumPy array, which is a concatenation of other NumPy arrays:
array([array([[ 0., 1., 0., 0., 1., 0.]]),
array([[ 1., 0., 0., 1., 0., 0.]]),
array([[ 0., 0., 0., 0., 1., 1.]]),
array([[ 0., 1., 0., 0., 0., 1.]]),
array([[ 0., 1., 0., 1., 0., 0.]]),
array([[ 1., 0., 0., 0., 0., 1.]])], dtype=object)
Its current shape is (6,). What I want is this, with shape (6,6):
array([[ 0., 1., 0., 0., 1., 0.],
[ 1., 0., 0., 1., 0., 0.],
[ 0., 0., 0., 0., 1., 1.],
[ 0., 1., 0., 0., 0., 1.],
[ 0., 1., 0., 1., 0., 0.],
[ 1., 0., 0., 0., 0., 1.]], dtype=object)
Is there a numpy way to solve this problem or do I have to loop through the arrays and append it?

If the display is accurate, and the array really is (6,), then we have to recreate it with:
In [27]: array=np.array
In [28]: alist = [array([[ 0., 1., 0., 0., 1., 0.]]),
...: array([[ 1., 0., 0., 1., 0., 0.]]),
...: array([[ 0., 0., 0., 0., 1., 1.]]),
...: array([[ 0., 1., 0., 0., 0., 1.]]),
...: array([[ 0., 1., 0., 1., 0., 0.]]),
...: array([[ 1., 0., 0., 0., 0., 1.]])]
...:
In [29]: A = np.empty((6,),object)
In [30]: A
Out[30]: array([None, None, None, None, None, None], dtype=object)
In [31]: A[:]=alist
In [32]: A
Out[32]:
array([array([[ 0., 1., 0., 0., 1., 0.]]),
array([[ 1., 0., 0., 1., 0., 0.]]),
array([[ 0., 0., 0., 0., 1., 1.]]),
array([[ 0., 1., 0., 0., 0., 1.]]),
array([[ 0., 1., 0., 1., 0., 0.]]),
array([[ 1., 0., 0., 0., 0., 1.]])], dtype=object)
reshape does not work:
In [33]: A.reshape(6,6)
...
ValueError: cannot reshape array of size 6 into shape (6,6)
But the array can be treated as a list, and given to concatenate:
In [34]: np.concatenate(A, axis=1)
Out[34]:
array([[ 0., 1., 0., 0., 1., 0., 1., 0., 0., 1., 0., 0., 0.,
0., 0., 0., 1., 1., 0., 1., 0., 0., 0., 1., 0., 1.,
0., 1., 0., 0., 1., 0., 0., 0., 0., 1.]])
In [35]: np.concatenate(A, axis=0)
Out[35]:
array([[ 0., 1., 0., 0., 1., 0.],
[ 1., 0., 0., 1., 0., 0.],
[ 0., 0., 0., 0., 1., 1.],
[ 0., 1., 0., 0., 0., 1.],
[ 0., 1., 0., 1., 0., 0.],
[ 1., 0., 0., 0., 0., 1.]])
Concatenate on the list works just as well: np.concatenate(alist, axis=0)
I should note that the resulting array is dtype float, not object. It could be converted with astype, but who would want that?
A simple copy-and-paste of the display produces a 3d array instead, because the outer array call ignores the inner array boundaries and creates an array with as many dimensions as it can:
In [37]: array([array([[ 0., 1., 0., 0., 1., 0.]]),
...: array([[ 1., 0., 0., 1., 0., 0.]]),
...: array([[ 0., 0., 0., 0., 1., 1.]]),
...: array([[ 0., 1., 0., 0., 0., 1.]]),
...: array([[ 0., 1., 0., 1., 0., 0.]]),
...: array([[ 1., 0., 0., 0., 0., 1.]])])
Out[37]:
array([[[ 0., 1., 0., 0., 1., 0.]],
[[ 1., 0., 0., 1., 0., 0.]],
...
[[ 1., 0., 0., 0., 0., 1.]]])
In [38]: _.shape
Out[38]: (6, 1, 6)
So we need to be careful how we recreate cases like this.

You should try this:
my_array = my_array.reshape(6,6)
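It works with the above array when pasted as is, because pasting produces a (6,1,6) array and the reshape removes the extra dimension. It does not work on the (6,) object array, as the ValueError above shows. Other methods such as vstack and concatenate, as shown in Divikar's comment, should work as well for this purpose.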
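The two cases discussed above can be told apart with a small script. This is only a sketch: the names `rows`, `obj` and `pasted` are made up for illustration, only three of the six rows are used, and the object array is filled element by element to avoid relying on how a particular NumPy version broadcasts list assignment.

```python
import numpy as np

rows = [
    np.array([[0., 1., 0., 0., 1., 0.]]),
    np.array([[1., 0., 0., 1., 0., 0.]]),
    np.array([[0., 0., 0., 0., 1., 1.]]),
]

# Case 1: a (3,) object array whose elements are (1, 6) float arrays.
obj = np.empty(len(rows), dtype=object)
for i, row in enumerate(rows):
    obj[i] = row

# reshape cannot help here (only 3 elements), but concatenate can.
stacked = np.concatenate(list(obj), axis=0)

# Case 2: np.array on the same list builds a (3, 1, 6) float array,
# which reshape handles directly.
pasted = np.array(rows)
reshaped = pasted.reshape(3, 6)

print(obj.shape, stacked.shape, pasted.shape, reshaped.shape)
```

Both routes end with the same (3, 6) float array; which one applies depends on whether the outer array really has dtype=object.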

Related

Adding zeros in between elements in numpy array with (a,b,c) shape

I have a numpy array like this
array([[[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.]],
[[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.]]])
shape = (2, 3, 5)
And I want an output which looks like this
output = array([[[1., 0., 0., 1., 0., 0., 1., 0., 0., 1., 0., 0., 1.],
[1., 0., 0., 1., 0., 0., 1., 0., 0., 1., 0., 0., 1.],
[1., 0., 0., 1., 0., 0., 1., 0., 0., 1., 0., 0., 1.]],
[[1., 0., 0., 1., 0., 0., 1., 0., 0., 1., 0., 0., 1.],
[1., 0., 0., 1., 0., 0., 1., 0., 0., 1., 0., 0., 1.],
[1., 0., 0., 1., 0., 0., 1., 0., 0., 1., 0., 0., 1.]]])
Note: The number of zeros to insert depends on a given factor k. In this case k=3, so (k-1) = 2 zeros are inserted between consecutive numbers. Given this output, I would also like to be able to get back to the initial input.
You can use numpy.zeros to initialize an output array of the desired shape, then indexing to fill the values:
k = 3
shape = a.shape
output = np.zeros(shape[:-1]+((shape[-1]-1)*k+1,))
output[...,::k] = a
output:
array([[[1., 0., 0., 1., 0., 0., 1., 0., 0., 1., 0., 0., 1.],
[1., 0., 0., 1., 0., 0., 1., 0., 0., 1., 0., 0., 1.],
[1., 0., 0., 1., 0., 0., 1., 0., 0., 1., 0., 0., 1.]],
[[1., 0., 0., 1., 0., 0., 1., 0., 0., 1., 0., 0., 1.],
[1., 0., 0., 1., 0., 0., 1., 0., 0., 1., 0., 0., 1.],
[1., 0., 0., 1., 0., 0., 1., 0., 0., 1., 0., 0., 1.]]])
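The question also asks how to get back to the initial input. One way, sketched here with hypothetical helper names `insert_zeros` and `remove_zeros`, is to read the expanded array with the same stride that was used to fill it:

```python
import numpy as np

def insert_zeros(a, k):
    """Insert (k - 1) zeros between consecutive elements along the last axis."""
    shape = a.shape[:-1] + ((a.shape[-1] - 1) * k + 1,)
    out = np.zeros(shape, dtype=a.dtype)
    out[..., ::k] = a
    return out

def remove_zeros(out, k):
    """Undo insert_zeros by keeping every k-th element along the last axis."""
    return out[..., ::k]

a = np.ones((2, 3, 5))
expanded = insert_zeros(a, 3)
restored = remove_zeros(expanded, 3)

print(expanded.shape)  # (2, 3, 13)
print(restored.shape)  # (2, 3, 5)
```

Note that `remove_zeros` returns a view of the expanded array; call `.copy()` on the result if an independent array is needed.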

Sparse DataArray Xarray search

Using DataArray objects in xarray, what is the best way to find all cells that have values != 0?
For example in pandas I would do
df.loc[df.col1 > 0]
In my specific example, I'm trying to look at 3-dimensional brain imaging data.
first_image_xarray.shape
(140, 140, 96)
dims = ['x','y','z']
Looking at the documentation for xarray.DataArray.where it seems I want something like this:
first_image_xarray.where(first_image_xarray.y + first_image_xarray.x > 0,drop = True)[:,0,0]
But I still get arrays with zeros.
<xarray.DataArray (x: 140)>
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., -0., 0., -0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
Dimensions without coordinates: x
Also - a side question - why are there some negative zeros? Are these values rounded and -0. is actually equal to something like -0.009876 or something?
(Answer to main question)
You are almost there, but a slight difference in syntax makes a big difference here. First, here is the solution to filter >0 values using a "value-based" mask.
# if you want to DROP values which do not satisfy the mask condition
first_image_xarray[:,0,0].where(first_image_xarray[:,0,0] > 0, drop=True)
or
# if you want to KEEP values which do not satisfy the mask condition, as nan
first_image_xarray[:,0,0].where(first_image_xarray[:,0,0] > 0, np.nan)
Second, the reason your attempt did not work as you hoped is that first_image_xarray.x refers to the index of the elements along the x direction, not to the values of the elements. You were creating an "index-based" mask. With that mask, only the first element of your output (x=0, y=0) should be nan instead of 0, because it is the only element of the slice [:,0,0] that does not satisfy the mask condition.
The following small experiment (hopefully) articulates this critical difference.
Suppose we have a DataArray that consists only of 0 and 1, with the same shape as in the original post (OP), (140,140,96). First, let's mask it based on index, as the OP did:
import numpy as np
import xarray as xr
np.random.seed(0)
# create a DataArray which randomly contains 0 or 1 values
a = xr.DataArray(np.random.randint(0, 2, 140*140*96).reshape((140, 140, 96)), dims=('x', 'y', 'z'))
# with this "index-based" mask, only elements where the indices of both x and y are 0 are replaced by nan
a.where(a.x + a.y > 0, drop=True)[:,0,0]
Out:
<xarray.DataArray (x: 140)>
array([ nan, 0., 1., 1., 0., 0., 0., 1., 0., 0., 0., 0.,
0., 1., 0., 1., 0., 1., 0., 0., 0., 1., 0., 0.,
1., 1., 0., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 0., 1., 1., 1., 1., 1., 1., 1., 0., 1.,
1., 0., 0., 0., 1., 1., 1., 0., 0., 1., 0., 0.,
1., 0., 1., 1., 0., 0., 1., 0., 0., 1., 1., 1.,
0., 0., 0., 1., 1., 0., 1., 0., 1., 1., 0., 0.,
0., 0., 1., 1., 0., 1., 1., 1., 1., 0., 1., 0.,
0., 0., 0., 0., 0., 0., 1., 0., 1., 1., 0., 0.,
0., 0., 1., 0., 1., 0., 0., 0., 0., 1., 0., 1.,
0., 0., 1., 0., 0., 0., 0., 0., 1., 1., 0., 0.,
0., 1., 0., 0., 1., 0., 0., 1.])
Dimensions without coordinates: x
With the mask above, only the element where the indices of both x and y are 0 turns into nan; the rest are neither changed nor dropped.
In contrast, the proposed solution masks the DataArray based on the values of its elements.
# with this "value-based" mask, all the values which do not satisfy the mask condition are dropped
a[:,0,0].where(a[:,0,0] > 0, drop=True)
Out:
<xarray.DataArray (x: 65)>
array([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1.])
Dimensions without coordinates: x
This successfully drops all the values which do not satisfy the mask condition, based on the values of the DataArray elements.
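For the original goal of locating every non-zero cell in the full 3-dimensional volume, the same value-based idea can be sketched in plain NumPy. This assumes the data is available as an ndarray (for a DataArray, its `.values` attribute); the small `volume` array below is made up for illustration.

```python
import numpy as np

# Toy stand-in for the image volume, with dims (x, y, z).
volume = np.zeros((4, 4, 3))
volume[1, 2, 0] = 5.0
volume[3, 0, 2] = -2.0

mask = volume != 0           # value-based boolean mask
values = volume[mask]        # 1-D array of the non-zero values
coords = np.argwhere(mask)   # one (x, y, z) index row per non-zero cell

print(values)
print(coords)
```

Negative zeros compare equal to 0, so cells holding -0. are excluded by this mask as well.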
(Answer to side question)
As for the origin of -0 and 0 in the DataArray, one possibility is that the values were rounded towards 0 from the negative or the positive side. A related discussion can be found in "How to eliminate the extra minus sign when rounding negative numbers towards zero in numpy?". Below is a tiny example of this case.
import numpy as np
import xarray as xr
xr_array = xr.DataArray([-0.1, 0.1])
# you can use either xr.DataArray.round() or np.round() for rounding values of DataArray
xr.DataArray.round(xr_array)
Out:
<xarray.DataArray (dim_0: 2)>
array([-0., 0.])
Dimensions without coordinates: dim_0
np.round(xr_array)
Out:
<xarray.DataArray (dim_0: 2)>
array([-0., 0.])
Dimensions without coordinates: dim_0
As a side note, another way to get -0 displayed for a NumPy array is numpy.set_printoptions(precision=0), which hides the digits after the decimal point, as shown below (but I know this is not the case here, since you are using a DataArray):
import numpy as np
# default value is precision=8 in ver1.15
np.set_printoptions(precision=0)
np.array([-0.1, 0.1])
Out:
array([-0., 0.])
In any case, my best guess is that the conversion to -0 was a manual, intentional step (such as rounding) in the data preparation and pre-processing phase, rather than something that happened automatically.
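A short NumPy-only sketch of how to detect such negative zeros and, if they are unwanted, remove the sign. The variable names are illustrative; adding 0.0 relies on the IEEE 754 rule that (-0.0) + (+0.0) is +0.0 under the default rounding mode.

```python
import numpy as np

rounded = np.round(np.array([-0.1, 0.1]))

# -0.0 compares equal to 0.0, so a plain comparison cannot tell them apart.
print(rounded == 0)          # [ True  True]

# np.signbit exposes the sign that the display shows as "-0.".
print(np.signbit(rounded))   # [ True False]

# Adding positive zero normalises -0.0 to 0.0.
cleaned = rounded + 0.0
print(np.signbit(cleaned))   # [False False]
```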
Hope this helps.

Appending matrix A with matrix B

Say I have two matrices A and B. For example,
A = np.zeros((5,5))
B = np.eye(5)
Is there a way to append A and B?
It sounds to me like you're looking for np.hstack:
>>> import numpy as np
>>> a = np.zeros((5, 5))
>>> b = np.eye(5)
>>> np.hstack((a, b))
array([[ 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])
np.vstack will work if you want to stack them downward:
>>> np.vstack((a, b))
array([[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 1., 0., 0., 0., 0.],
[ 0., 1., 0., 0., 0.],
[ 0., 0., 1., 0., 0.],
[ 0., 0., 0., 1., 0.],
[ 0., 0., 0., 0., 1.]])
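Both calls can also be written with np.concatenate and an explicit axis, which may read more clearly when the direction matters. A minimal sketch using the same `a` and `b`:

```python
import numpy as np

a = np.zeros((5, 5))
b = np.eye(5)

# axis=1 appends columns (like np.hstack for 2-D arrays).
side_by_side = np.concatenate((a, b), axis=1)

# axis=0 appends rows (like np.vstack for 2-D arrays).
stacked = np.concatenate((a, b), axis=0)

print(side_by_side.shape)  # (5, 10)
print(stacked.shape)       # (10, 5)
```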

"Trailing" One-Hot Encode

I am trying to do something similar to One-Hot-Encoding but instead of the selected class being 1 and the rest zero, I want all the classes up to (and including the selected class) to be 1. Say I have a training batch with labels (5 possible class labels; 0, 1, 2, 3, 4)
y = np.array([0,2,1,3,4,1])
I can one-hot-encode with
def one_hot_encode(arr, num_classes):
    return np.eye(num_classes)[arr]
which gives
>>> one_hot_encode(y, 5)
array([[ 1., 0., 0., 0., 0.],
[ 0., 0., 1., 0., 0.],
[ 0., 1., 0., 0., 0.],
[ 0., 0., 0., 1., 0.],
[ 0., 0., 0., 0., 1.],
[ 0., 1., 0., 0., 0.]])
I would like to get
array([[ 1., 0., 0., 0., 0.],
[ 1., 1., 1., 0., 0.],
[ 1., 1., 0., 0., 0.],
[ 1., 1., 1., 1., 0.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 0., 0., 0.]])
Anyone know how to do this?
You could achieve this by using a lower-triangular matrix instead of an identity matrix in your function definition:
def many_hot_encode(arr, num_classes):
    return np.tril(np.ones(num_classes))[arr]
many_hot_encode(y,5)
array([[ 1., 0., 0., 0., 0.],
[ 1., 1., 1., 0., 0.],
[ 1., 1., 0., 0., 0.],
[ 1., 1., 1., 1., 0.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 0., 0., 0.]])
You can also use broadcasting -
out = (y[:,None] >= np.arange(num_classes)).astype(float)
Sample run -
In [71]: y = np.array([0,2,1,3,4,1])
In [72]: num_classes = 5
In [73]: (y[:,None] >= np.arange(num_classes)).astype(float)
Out[73]:
array([[ 1., 0., 0., 0., 0.],
[ 1., 1., 1., 0., 0.],
[ 1., 1., 0., 0., 0.],
[ 1., 1., 1., 1., 0.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 0., 0., 0.]])
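If the class labels need to be recovered from this encoding later, counting the ones in each row is enough. The sketch below wraps the broadcasting approach in a function and adds a hypothetical `many_hot_decode` helper, which is not part of the answers above:

```python
import numpy as np

def many_hot_encode(labels, num_classes):
    """Set every class up to and including the label to 1."""
    return (labels[:, None] >= np.arange(num_classes)).astype(float)

def many_hot_decode(encoded):
    """Recover labels: a row with n ones corresponds to label n - 1."""
    return encoded.sum(axis=1).astype(int) - 1

y = np.array([0, 2, 1, 3, 4, 1])
encoded = many_hot_encode(y, 5)
decoded = many_hot_decode(encoded)

print(decoded)  # [0 2 1 3 4 1]
```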

reduce() hstack python

I am trying to use reduce() function to create a function hstack() which horizontally stacks multiple arrays. As a simple example, lets say
>>> M = eye(4)
>>> M
array([[ 1., 0., 0., 0.],
[ 0., 1., 0., 0.],
[ 0., 0., 1., 0.],
[ 0., 0., 0., 1.]])
>>> hstack([M, M])
array([[ 1., 0., 0., 0., 1., 0., 0., 0.],
[ 0., 1., 0., 0., 0., 1., 0., 0.],
[ 0., 0., 1., 0., 0., 0., 1., 0.],
[ 0., 0., 0., 1., 0., 0., 0., 1.]])
This works as I want. Now I define
>>> hstackm = lambda *args: reduce(hstack, args)
And try to do the hstack() from the previous case
>>> hstackm([M, M])
[array([[ 1., 0., 0., 0.],
[ 0., 1., 0., 0.],
[ 0., 0., 1., 0.],
[ 0., 0., 0., 1.]]),
array([[ 1., 0., 0., 0.],
[ 0., 1., 0., 0.],
[ 0., 0., 1., 0.],
[ 0., 0., 0., 1.]])]
Which is incorrect. How do I define hstackm() to obtain a proper output?
My final objective will be to create a hstackm() function to stack SPARSE matrices if it is possible. Something like,
hstackm = lambda *args: reduce(sparse.hstack, args)
The *args would be csr or lil_matrix.
Thank you.
In [16]: hstackm = lambda args: reduce(lambda x,y:hstack((x,y)), args)
In [17]: hstackm([M,M])
Out[17]:
array([[ 1., 0., 0., 0., 1., 0., 0., 0.],
[ 0., 1., 0., 0., 0., 1., 0., 0.],
[ 0., 0., 1., 0., 0., 0., 1., 0.],
[ 0., 0., 0., 1., 0., 0., 0., 1.]])
Your function hstack takes one parameter, a list of matrices. reduce() calls it with two parameters instead, each a matrix.
Change your hstack method to accept an arbitrary number of arguments instead:
def hstack(*matrices):
....
instead of hstack(matrices), then call it as hstack(M, M).
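Putting the two answers together, a version of hstackm that takes the arrays as separate arguments might look like the sketch below. It is written for Python 3, where reduce has to be imported from functools, and it uses dense NumPy arrays only. For the sparse case, scipy.sparse.hstack likewise expects a single sequence of blocks, so the same wrapping of each pair in a tuple should apply, or the whole list can be passed in one call without reduce.

```python
from functools import reduce

import numpy as np

def hstackm(*arrays):
    """Horizontally stack any number of arrays, two at a time."""
    # reduce hands over two arrays per step, while np.hstack expects
    # one sequence, so each pair is wrapped in a tuple.
    return reduce(lambda left, right: np.hstack((left, right)), arrays)

M = np.eye(4)
result = hstackm(M, M, M)

# np.hstack already accepts a whole sequence, so reduce is optional.
direct = np.hstack([M, M, M])

print(result.shape)  # (4, 12)
```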
