Removing nan in array at position from another numpy array - python

I want to remove nans from two arrays if there is a nan in the same position in either of them. The arrays are of same length. Here is what I am doing:
y = numpy.delete(y, numpy.where(numpy.isnan(x)))
x = numpy.delete(x, numpy.where(numpy.isnan(x)))
However, this only works if x is the one with nan's. How do I make it work if either x or y have nan?

You have to keep track of the indices to remove from both arrays. You don't need where since numpy supports boolean indexing (masks). Also, you don't need delete since you can just get a subset of the array.
mask = ~np.isnan(x)
x = x[mask]
y = y[mask]
mask = ~np.isnan(y)
x = x[mask]
y = y[mask]
Or more compactly:
mask = ~np.isnan(x) & ~np.isnan(y)
x = x[mask]
y = y[mask]
The first implementation only has an advantage if the arrays are enormous and computing the mask for y from a smaller array has a performance benefit. In general, I would recommend the second approach.
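As a quick check of the combined-mask approach, here is a minimal sketch. The sample values and the names x_clean and y_clean are made up for illustration and are not from the question.

```python
import numpy as np

# Illustrative sample data (an assumption, not from the question)
x = np.array([1.0, np.nan, 3.0, 4.0, 5.0])
y = np.array([10.0, 20.0, np.nan, 40.0, 50.0])

# True only where both x and y hold a valid (non-nan) number
mask = ~np.isnan(x) & ~np.isnan(y)

# Boolean indexing keeps the same positions in both arrays
x_clean = x[mask]
y_clean = y[mask]

print(x_clean)  # [1. 4. 5.]
print(y_clean)  # [10. 40. 50.]
```

Positions 1 and 2 are dropped from both arrays because one of the two arrays has a nan there, so the remaining elements stay paired.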

import numpy as np
import numpy.ma as ma
y = ma.masked_array(y, mask=np.isnan(x)) # mask=True marks the entries to drop
y = y.compressed() # y without the entries where x is nan
or, after the comments:
mask = np.isnan(x) | np.isnan(y)
y = ma.masked_array(y, mask=mask)
y = y.compressed() # y without the entries where x or y is nan
x = ma.masked_array(x, mask=mask)
x = x.compressed() # x without the entries where x or y is nan
or without masked arrays:
mask = ~np.isnan(x) & ~np.isnan(y)
y = y[mask]
x = x[mask]

Related

Numpy array assignment by boolean indices array

I have a very large array, but I'll use a smaller one to explain.
Given source array X
X = [ [1,1,1,1],
[2,2,2,2],
[3,3,3,3]]
A target array with the same size Y
Y = [ [-1,-1,-1,-1],
[-2,-2,-2,-2],
[-3,-3,-3,-3]]
And an assignment array IDX:
IDX = [ [1,0,0,0],
[0,0,1,0],
[0,1,0,1]]
I want to assign Y to X by IDX - Only assign where IDX==1
In this case, something like:
X[IDX] = Y[IDX]
will result in:
X = [ [-1,1,1,1],
[2,2,-2,2],
[3,-3,3,-3]]
How can this be done efficiently (not a for-loop) in numpy/pandas?
Thx
If IDX is a NumPy array of Boolean type, and X and Y are NumPy arrays then your intuition works:
X = np.array(X)
Y = np.array(Y)
IDX = np.array(IDX).astype(bool)
X[IDX] = Y[IDX]
This changes X in place.
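If you don't want to cast IDX to bool, or don't want to overwrite X, then np.where() does what you want in one go and returns a new array. IDX still has to be a NumPy array (or be wrapped in np.asarray) so that the comparison is element-wise:
np.where(np.asarray(IDX) == 1, Y, X)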
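Both options can be compared side by side on the arrays from the question. This is a minimal sketch; the names result, mask and X_inplace are introduced here only for illustration.

```python
import numpy as np

X = np.array([[1, 1, 1, 1],
              [2, 2, 2, 2],
              [3, 3, 3, 3]])
Y = -X  # same values as the Y in the question
IDX = np.array([[1, 0, 0, 0],
                [0, 0, 1, 0],
                [0, 1, 0, 1]])

# Option 1: np.where builds a new array and leaves X untouched
result = np.where(IDX == 1, Y, X)

# Option 2: boolean-mask assignment, done on a copy so X stays intact here
mask = IDX.astype(bool)
X_inplace = X.copy()
X_inplace[mask] = Y[mask]

print(result)
# [[-1  1  1  1]
#  [ 2  2 -2  2]
#  [ 3 -3  3 -3]]
```

Both variants produce the same values. The difference is only whether a new array is returned or an existing one is modified.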

How to reshape X and Y from (300, 1000,50) to (300000, 50)?

I have a dataset:
X.shape = (300, 1000, 50)
Y.shape = (300,)
Y is the true values (4 options: [0..3])
I want to reshape X to: (300000, 50) and Y to (300000,)
The new X.shape will be [X.shape[0]*X.shape[1], X.shape[2]]
The new Y.shape will be [X.shape[0]*X.shape[1],] and it will contain the right duplicate values of Y (according to the new shape).
How can I do it?
You can do
X = X.reshape(-1, X.shape[-1])
However, it becomes unclear how to broadcast the arrays then, since broadcasting aligns to the right (implicitly prepends unit dimensions, not appends).
You could repeat the elements of Y:
Y = np.repeat(Y, 1000)
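or the entire array itself (this repeats the whole label sequence 1000 times, so it does not line up row by row with the reshaped X the way np.repeat does):
Y = np.tile(Y, 1000)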
Both are overkill.
Just use proper broadcasting instead:
# X untouched
Y = Y[:, None, None]
Or equivalently
Y = Y.reshape(-1, 1, 1)
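If flat arrays are still wanted, as the question asks, the following minimal sketch shows the reshape together with np.repeat and checks that rows and labels stay aligned. It uses small stand-in dimensions (3, 4, 2) instead of (300, 1000, 50), and the names X_flat and Y_flat are assumptions made for illustration.

```python
import numpy as np

# Small stand-ins for the question's (300, 1000, 50)
n_samples, n_steps, n_features = 3, 4, 2
X = np.arange(n_samples * n_steps * n_features).reshape(n_samples, n_steps, n_features)
Y = np.array([0, 3, 1])  # one label per sample

# Merge the first two axes; the last axis (features) is kept
X_flat = X.reshape(-1, X.shape[-1])   # shape (12, 2)

# Repeat each label once per step so it follows its sample's rows
Y_flat = np.repeat(Y, X.shape[1])     # shape (12,)

# With the default C order, row i * n_steps + j of X_flat is X[i, j],
# so its label has to be Y[i]
i, j = 1, 2
row = i * n_steps + j
print(np.array_equal(X_flat[row], X[i, j]), Y_flat[row] == Y[i])  # True True
```

np.repeat is used rather than np.tile because the default row-major reshape keeps all steps of one sample next to each other.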

Populate a Numpy array according to values of two different arrays

I have two arrays 5x5x3:
A = np.random.randint(0, 255, (5,5,3), np.uint8)
B = np.random.randint(0, 255, (5,5,3), np.uint8)
and I need to populate a third array C (same shape of A and B) populating its values from A or B according to the values in A.
Pure Python code should be:
C = np.zeros(A.shape, dtype=np.uint8)
h, w, ch = C.shape
for y in range(0, h):
    for x in range(0, w):
        for z in range(0, ch):
            if A[y, x, z] > 128:
                C[y, x, z] = max(A[y, x, z], B[y, x, z])
            else:
                C[y, x, z] = min(A[y, x, z], B[y, x, z])
The above code works but it's very slow with big arrays.
My attempt with numpy was the following:
C = np.zeros(A.shape, dtype=np.uint8)
C[A>128] = max(A,B)
C[A<128] = min(A,B)
but the output was:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
With np.where you can avoid creating an empty array first. np.maximum and np.minimum return arrays with the same shape as A and B, and the condition A>128 selects the correct value from each of them:
C = np.where(A>128, np.maximum(A,B), np.minimum(A,B))
Try the following code:
C = np.zeros(shape=A.shape)
C = (A>128)*np.maximum(A,B)+(A<=128)*np.minimum(A,B)
It was 5 times faster for me.
Python's built-in min/max functions do not work element-wise on NumPy ndarrays (which seems to be what you are looking for).
You should be able to use np.maximum and np.minimum instead:
C[A>128] = np.maximum(A,B)[A>128]
C[A<=128] = np.minimum(A,B)[A<=128]
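To confirm that the vectorized form matches the loop from the question, a small comparison can be run. This is a sketch with a fixed random seed; the names C_vec, C_loop and matches are introduced here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
A = rng.integers(0, 255, size=(5, 5, 3), dtype=np.uint8)
B = rng.integers(0, 255, size=(5, 5, 3), dtype=np.uint8)

# Vectorized version: element-wise max where A > 128, element-wise min elsewhere
C_vec = np.where(A > 128, np.maximum(A, B), np.minimum(A, B))

# Reference version: the same rule written as an explicit loop over all indices
C_loop = np.zeros(A.shape, dtype=np.uint8)
for idx in np.ndindex(*A.shape):
    if A[idx] > 128:
        C_loop[idx] = max(A[idx], B[idx])
    else:
        C_loop[idx] = min(A[idx], B[idx])

matches = np.array_equal(C_vec, C_loop)
print(matches)  # True
```

The built-in max and min are fine inside the loop because they receive scalars there; it is only on whole arrays that they raise the ValueError shown in the question.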

How to remove columns in 2D numpy array if one element is smaller or larger than a certain value

Right now I have a 2-D NumPy array that represents the pixel coordinates of an image:
points = [[-1,-2,0,1,2,3,5,8], [-3,-4,0,-3,5,9,2,1]]
Each column represents a coordinate in the image, e.g.:
points[:,0] = [-1,-3] means x = -1 and y = -3
I want to remove every column in which x is less than 0 or more than 5, or y is less than 0 or more than 5.
I know how to remove elements of a certain value
#remove x that is less than 0 and more than 5
x = points[0,:]
x = x[np.logical_and(x>=0, x<=5)]
#remove y that is less than 0 and more than 5
y = points[1,:]
y = y[np.logical_and(y>=0,y<=5)]
Is there a way to remove the y that shares the same index with the x that is deleted?(in other words, remove columns when either the condition for x deletion or y deletion is satisfied)
You can convert the list to an ndarray, build a boolean mask, and reassign x and y. The nested logical_and builds a mask that is True only where x>=0 and x<=5 and y>=0 and y<=5, so whenever x[i] is dropped, y[i] is dropped as well.
points = np.array([[-1,-2,0,1,2,3,5,8], [-3,-4,0,-3,5,9,2,1]])
x = points[0,:]
y = points[1,:]
mask = np.logical_and(np.logical_and(x>=0, x<=5), np.logical_and(y>=0, y<=5))
# mask = array([False, False, True, False, True, False, True, False])
x = x[mask] # x = array([0, 2, 5])
y = y[mask] # y = array([0, 5, 2])
You can use np.compress along the axis=1 to get the points you need:
np.compress((x>=0) * (x<=5) * (y>=0) * (y<=5), points, axis=1)
array([[0, 2, 5],
[0, 5, 2]])
where I have assumed that x, y and points are numpy arrays.
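The same column filter can also be written directly on the 2-D array, without splitting it into x and y first. This is a minimal sketch of that variant; in_range, keep and filtered are names introduced here for illustration.

```python
import numpy as np

points = np.array([[-1, -2, 0, 1, 2, 3, 5, 8],
                   [-3, -4, 0, -3, 5, 9, 2, 1]])

# Element-wise test: True where a coordinate lies in the range [0, 5]
in_range = (points >= 0) & (points <= 5)

# A column is kept only if every coordinate in it passes the test
keep = in_range.all(axis=0)

# Boolean indexing along the column axis
filtered = points[:, keep]
print(filtered)
# [[0 2 5]
#  [0 5 2]]
```

Because the test runs over all rows at once, this form would also work unchanged if the array had more than two coordinate rows.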

Python numpy array manipulation

I need to manipulate a NumPy array.
My array has the following format:
x = [1280][720][4]
The array stores image data in the third dimension:
x[0][0] = [Red,Green,Blue,Alpha]
Now i need to manipulate my array to the following form:
x = [1280][720]
x[0][0] = Red + Green + Blue / 3
My current code is extremely slow and I want to use NumPy array operations to speed it up:
for a in range(0,719):
    for b in range(0,1279):
        newx[a][b] = x[a][b][0]+x[a][b][1]+x[a][b][2]
x = newx
Also, if possible i need the code to work for variable array sizes.
Thanks a lot
Use the numpy.mean function:
import numpy as np
n = 1280
m = 720
# Generate a n * m * 4 matrix with random values
x = np.round(np.random.rand(n, m, 4)*10)
# Calculate the mean of the first 3 values along axis 2 (axes are counted from 0)
xnew = np.mean(x[:, :, 0:3], axis=2)
x[:, :, 0:3] gives you the first 3 values in the 3rd dimension, see: numpy indexing
axis=2 specifies, along which axis of the matrix the mean value is calculated.
Slice the alpha channel out of the array, and then sum the array along the RGB axis and divide by 3:
x = x[:,:,:-1]
x_sum = x.sum(axis=2)
x_div = x_sum / float(3)
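Since the question also asks for code that works with variable array sizes, here is a minimal sketch that does not hard-code the image dimensions. The input is random stand-in data, and the names gray and expected are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in RGBA image; any (height, width, 4) shape works the same way
x = rng.integers(0, 256, size=(6, 4, 4)).astype(np.float64)

# Take the first three channels (R, G, B) and average over the last axis
gray = x[..., :3].mean(axis=-1)

# Same computation written out channel by channel, for comparison
expected = (x[..., 0] + x[..., 1] + x[..., 2]) / 3

print(gray.shape)                   # (6, 4)
print(np.allclose(gray, expected))  # True
```

Using the ellipsis and axis=-1 means the code only assumes that the channels sit on the last axis; the height and width are picked up from the array itself.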
