I am trying to interpolate missing values of a 2D array in Python. I found this question; however, in that case the rows and columns are all equidistant. In my case I have two arrays
x = [275. 290. 310. 330. 350. 410. 450.]
y = [ 8. 12. 16. 20. 30. 35. 40. 45.]
where x and y are the grid coordinates that represent the column and row nodes at which my 2d array
c = [[4 6 9 9 9 8 2]
[1 6 3 7 1 5 4]
[8 nan 3 nan 2 9 2]
[8 2 3 4 3 4 7]
[2 nan 4 nan 6 1 3]
[4 nan 8 nan 1 7 6]
[8 nan 6 nan 5 6 5]
[1 nan 1 nan 3 1 9]]
is defined.
What is the best way to fill the missing values?
scipy includes 2D interpolation for grid data (there are some other interpolation functions as well):
import numpy as np
import pandas as pd
from numpy import nan
from scipy.interpolate import griddata
x = [275, 290, 310, 330, 350, 410, 450]
y = [ 8, 12, 16, 20, 30, 35, 40, 45,]
c = np.array([[ 4., 6., 9., 9., 9., 8., 2.],
[ 1., 6., 3., 7., 1., 5., 4.],
[ 8., nan, 3., nan, 2., 9., 2.],
[ 8., 2., 3., 4., 3., 4., 7.],
[ 2., nan, 4., nan, 6., 1., 3.],
[ 4., nan, 8., nan, 1., 7., 6.],
[ 8., nan, 6., nan, 5., 6., 5.],
[ 1., nan, 1., nan, 3., 1., 9.]])
# generate x_coord, y_coord values for each grid point
x_grid, y_grid = np.meshgrid(x,y)
# get known values to set the interpolator
mask = ~np.isnan(c)  # plain boolean array; wrapping it in a list breaks indexing in current NumPy
x = x_grid[mask].reshape(-1)
y = y_grid[mask].reshape(-1)
points = np.array([x,y]).T
values = c[mask].reshape(-1)
# generate interpolated grid data
interp_grid = griddata(points, values, (x_grid, y_grid), method='nearest')
# interp grid:
array([[4., 6., 9., 9., 9., 8., 2.],
[1., 6., 3., 7., 1., 5., 4.],
[8., 6., 3., 7., 2., 9., 2.],
[8., 2., 3., 4., 3., 4., 7.],
[2., 2., 4., 4., 6., 1., 3.],
[4., 4., 8., 4., 1., 7., 6.],
[8., 8., 6., 6., 5., 6., 5.],
[1., 1., 1., 1., 3., 1., 9.]])
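Since the grid here is not equidistant, method='linear' may be a better fit than 'nearest', because it weights neighbours by their actual distance. Below is a minimal sketch of that variant, assuming linear interpolation is acceptable for your data. The helper name fill_missing is made up for this example. Points that linear interpolation cannot reach (outside the convex hull of the known points) fall back to the nearest known value.

```python
import numpy as np
from scipy.interpolate import griddata


def fill_missing(x, y, c, method="linear"):
    """Fill NaNs in c, defined on the (possibly non-uniform) grid x by y."""
    c = np.asarray(c, dtype=float)
    x_grid, y_grid = np.meshgrid(x, y)

    # coordinates and values of the known nodes
    known = ~np.isnan(c)
    points = np.column_stack([x_grid[known], y_grid[known]])
    values = c[known]

    filled = griddata(points, values, (x_grid, y_grid), method=method)

    # 'linear' returns NaN outside the convex hull; fall back to 'nearest' there
    still_missing = np.isnan(filled)
    if still_missing.any():
        nearest = griddata(points, values, (x_grid, y_grid), method="nearest")
        filled[still_missing] = nearest[still_missing]

    # keep the original known values exactly as they were
    filled[known] = c[known]
    return filled


nan = np.nan
x = [275, 290, 310, 330, 350, 410, 450]
y = [8, 12, 16, 20, 30, 35, 40, 45]
c = np.array([[4, 6, 9, 9, 9, 8, 2],
              [1, 6, 3, 7, 1, 5, 4],
              [8, nan, 3, nan, 2, 9, 2],
              [8, 2, 3, 4, 3, 4, 7],
              [2, nan, 4, nan, 6, 1, 3],
              [4, nan, 8, nan, 1, 7, 6],
              [8, nan, 6, nan, 5, 6, 5],
              [1, nan, 1, nan, 3, 1, 9]], dtype=float)

filled = fill_missing(x, y, c)
```

Whether 'linear' or 'nearest' is the better choice depends on what the values represent, so treat this as a starting point rather than the one right answer.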
I have 2 numpy arrays:
one of shape (753,8,1) denoting 8 sequential actions of a customer
and other of shape (753,10) denoting 10 features of a training sample.
How can I combine these two such that:
all 10 features are appended to each of the 8 sequential actions of a training sample, that is, the combined final array should have a shape of (753,8,11).
Maybe something like this:
import numpy as np
# create dummy arrays
a = np.zeros((753, 8, 1))
b = np.arange(753*10).reshape(753, 10)
# make a new axis for b and repeat the values along axis 1
c = np.repeat(b[:, np.newaxis, :], 8, axis=1)
c.shape
>>> (753, 8, 10)
# now the first two axes of a and c have the same shape
# append the values in c to a along the last axis
result = np.append(a, c, axis=2)
result.shape
>>> (753, 8, 11)
result[0]
>>> array([[0., 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.],
[0., 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.],
[0., 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.],
[0., 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.],
[0., 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.],
[0., 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.],
[0., 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.],
[0., 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.]])
# values from b (0-9) have been appended to a (0)
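A possible variation on the same idea, as a sketch: np.broadcast_to gives a read-only view of b stretched along the new axis, so the repeated features are not materialised until np.concatenate builds the final array. The name b_tiled is just for illustration.

```python
import numpy as np

# dummy arrays with the shapes from the question
a = np.zeros((753, 8, 1))
b = np.arange(753 * 10).reshape(753, 10)

# view of b with a new middle axis, stretched to match a's first two axes
b_tiled = np.broadcast_to(b[:, np.newaxis, :],
                          (a.shape[0], a.shape[1], b.shape[1]))

# join along the last axis: 1 action value + 10 features = 11
result = np.concatenate([a, b_tiled], axis=2)
```

The result should match the np.repeat / np.append version above; the only difference is that the intermediate (753, 8, 10) copy is avoided.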
I would like to get an array of size 11x11 built from different subarrays, for example the array M composed of the following arrays (shapes in parentheses):
CC(3x3) CA(3x4) CB(3x4)
AC(4x3) AA(4x4) AB(4x4)
BC(4x3) BA(4x4) BB(4x4)
I could use concatenate, but it is not optimal. I also tried the stack function, but it requires arrays of the same shape. Do you have any ideas on how to do it?
Thanks a lot!
You want np.block(). It creates an array out of 'blocks', like what you have. For example:
>>> CC = 1*np.ones((3, 3))
>>> CA = 2*np.ones((3, 4))
>>> CB = 3*np.ones((3, 4))
>>> AC = 4*np.ones((4, 3))
>>> AA = 5*np.ones((4, 4))
>>> AB = 6*np.ones((4, 4))
>>> BC = 7*np.ones((4, 3))
>>> BA = 8*np.ones((4, 4))
>>> BB = 9*np.ones((4, 4))
>>> M = np.block([[CC, CA, CB],
[AC, AA, AB],
[BC, BA, BB]])
>>> M
array([[ 1., 1., 1., 2., 2., 2., 2., 3., 3., 3., 3.],
[ 1., 1., 1., 2., 2., 2., 2., 3., 3., 3., 3.],
[ 1., 1., 1., 2., 2., 2., 2., 3., 3., 3., 3.],
[ 4., 4., 4., 5., 5., 5., 5., 6., 6., 6., 6.],
[ 4., 4., 4., 5., 5., 5., 5., 6., 6., 6., 6.],
[ 4., 4., 4., 5., 5., 5., 5., 6., 6., 6., 6.],
[ 4., 4., 4., 5., 5., 5., 5., 6., 6., 6., 6.],
[ 7., 7., 7., 8., 8., 8., 8., 9., 9., 9., 9.],
[ 7., 7., 7., 8., 8., 8., 8., 9., 9., 9., 9.],
[ 7., 7., 7., 8., 8., 8., 8., 9., 9., 9., 9.],
[ 7., 7., 7., 8., 8., 8., 8., 9., 9., 9., 9.]])
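For comparison, here is a small sketch that builds the same layout from a nested list and checks it against the nested-concatenate approach mentioned in the question. The row_heights and col_widths names are only for this example.

```python
import numpy as np

row_heights = [3, 4, 4]   # heights of the C, A, B block rows
col_widths = [3, 4, 4]    # widths of the C, A, B block columns

# blocks[i][j] is filled with a distinct value (1..9) so the layout is easy to see
blocks = [[np.full((h, w), 3 * i + j + 1.0)
           for j, w in enumerate(col_widths)]
          for i, h in enumerate(row_heights)]

# np.block assembles the nested list in one call
M = np.block(blocks)

# the same thing with concatenate: join each row of blocks, then stack the rows
M_concat = np.concatenate(
    [np.concatenate(row, axis=1) for row in blocks], axis=0)
```

Both give an 11x11 array; np.block is mainly nicer to read because the nested list mirrors the layout.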
I have a 3d numpy array of following form:
array([[[ 1., 5., 4.],
[ 1., 5., 4.],
[ 1., 2., 4.]],
[[ 3., 6., 4.],
[ 6., 6., 4.],
[ 6., 6., 4.]]])
Is there an efficient way to convert it to a 2d array of form:
array([[1, 1, 1, 5, 5, 2, 4, 4, 4],
[3, 6, 6, 6, 6, 6, 4, 4, 4]])
Thanks a lot!
In [54]: arr = np.array([[[ 1., 5., 4.],
[ 1., 5., 4.],
[ 1., 2., 4.]],
[[ 3., 6., 4.],
[ 6., 6., 4.],
[ 6., 6., 4.]]])
In [61]: arr.reshape((arr.shape[0], -1), order='F')
Out[61]:
array([[ 1., 1., 1., 5., 5., 2., 4., 4., 4.],
[ 3., 6., 6., 6., 6., 6., 4., 4., 4.]])
The array arr has shape (2, 3, 3). We wish to keep the first axis of length 2, and flatten the two axes of length 3.
If we call arr.reshape(h, w) then NumPy will attempt to reshape arr to shape (h, w). If we call arr.reshape(h, -1) then NumPy will replace the -1 with whatever integer is needed for the reshape to make sense -- in this case, arr.size/h.
Hence,
In [63]: arr.reshape((arr.shape[0], -1))
Out[63]:
array([[ 1., 5., 4., 1., 5., 4., 1., 2., 4.],
[ 3., 6., 4., 6., 6., 4., 6., 6., 4.]])
This is almost what we want, but notice that the values in each subarray, such as
[[ 1., 5., 4.],
[ 1., 5., 4.],
[ 1., 2., 4.]]
are being traversed by marching from left to right before going down to the next row.
We want to march down the rows before going on to the next column.
To achieve that, use order='F'.
Usually the elements in a NumPy array are visited in C-order -- where the last index moves fastest. If we visit the elements in F-order then the first index moves fastest. Since in a 2D array of shape (h, w), the first axis is associated with the rows and the last axis the columns, traversing the array in F-order marches down each column before moving on to the next column.
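If the order='F' behaviour feels opaque, an equivalent way to get the same result (a sketch, not the only option) is to swap the last two axes first and then do an ordinary C-order reshape:

```python
import numpy as np

arr = np.array([[[1., 5., 4.],
                 [1., 5., 4.],
                 [1., 2., 4.]],
                [[3., 6., 4.],
                 [6., 6., 4.],
                 [6., 6., 4.]]])

# F-order reshape, as in the answer above
flat_f = arr.reshape((arr.shape[0], -1), order='F')

# same result: swap rows and columns of each subarray, then reshape in C-order
flat_t = arr.transpose(0, 2, 1).reshape(arr.shape[0], -1)

# plain C-order reshape for comparison (walks along each row first)
flat_c = arr.reshape(arr.shape[0], -1)
```

Here flat_f and flat_t walk down each column of every 3x3 subarray, while flat_c walks along each row.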
I have an array:
a = array([
[ nan, 2., 3., 2., 5., 3.],
[ nan, 4., 3., 2., 5., 4.],
[ nan, 2., 1., 2., 3., 2.]
])
And I make a filled contour with:
plt.contourf(a)
In the resulting plot, nothing happens when I call plt.axis('tight'), but I want to hide the boundary NaN values. Is there an easy way to do this?
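You can set the x-limits to the first and last columns that contain data. (Using nanmin and nanmax on a itself gives the range of the data values, not of the column positions, so it only works by coincidence.)
import numpy as np
import matplotlib.pyplot as plt
a = np.array([
[ np.nan, 2., 3., 2., 5., 3.],
[ np.nan, 4., 3., 2., 5., 4.],
[ np.nan, 2., 1., 2., 3., 2.]
])
# indices of the columns that hold at least one non-NaN value
cols = np.flatnonzero(~np.isnan(a).all(axis=0))
plt.contourf(a)
plt.xlim(cols[0], cols[-1])
plt.show()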
If the NaNs fill a whole column, as in your example, you can delete that column before plotting:
import numpy as np
import matplotlib.pyplot as plt
a = np.array([
[ np.nan, 2., 3., 2., 5., 3.],
[ np.nan, 4., 3., 2., 5., 4.],
[ np.nan, 2., 1., 2., 3., 2.]
])
b = np.delete(a, 0, 1)  # drop column 0 (axis=1)
plt.contourf(b)
Well, to handle columns of NaNs at both the beginning and the end, I tried this and it worked:
x = np.arange(0,a.shape[1])
plt.xlim([x[~np.isnan(a[0,:])][0],x[~np.isnan(a[0,:])][-1]])
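The same idea can be generalised to NaN borders on any side. Below is a rough sketch, assuming the NaNs you want to hide form complete rows or columns at the edges. The helper name trim_nan_border is made up for this example. It returns the trimmed array plus the (row, column) offset of its top-left corner, so the axes can be shifted back if needed.

```python
import numpy as np


def trim_nan_border(a):
    """Drop leading/trailing rows and columns that contain only NaN."""
    a = np.asarray(a, dtype=float)
    has_data = ~np.isnan(a)

    # indices of rows / columns holding at least one non-NaN value
    rows = np.flatnonzero(has_data.any(axis=1))
    cols = np.flatnonzero(has_data.any(axis=0))
    if rows.size == 0:
        raise ValueError("array contains no non-NaN values")

    r0, r1 = int(rows[0]), int(rows[-1])
    c0, c1 = int(cols[0]), int(cols[-1])
    return a[r0:r1 + 1, c0:c1 + 1], (r0, c0)


a = np.array([
    [np.nan, 2., 3., 2., 5., 3.],
    [np.nan, 4., 3., 2., 5., 4.],
    [np.nan, 2., 1., 2., 3., 2.],
])

trimmed, offset = trim_nan_border(a)
```

The trimmed array can then be passed to plt.contourf as usual; NaNs in the interior are left untouched.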
I'm working with 3-dimensional arrays (for the purpose of this example you can imagine they represent the RGB values at X, Y coordinates of the screen).
>>> import numpy as np
>>> a = np.floor(10 * np.random.random((2, 2, 3)))
>>> a
array([[[ 7., 3., 1.],
[ 9., 6., 9.]],
[[ 4., 6., 8.],
[ 8., 1., 1.]]])
What I would like to do, is to set to an arbitrary value the G channel for those pixels whose G channel is already below 5. I can manage to isolate the pixel I am interested in using:
>>> a[np.where(a[:, :, 1] < 5)]
array([[ 7., 3., 1.],
[ 8., 1., 1.]])
but I am struggling to understand how to assign a new value to the G channel only. I tried:
>>> a[np.where(a[:, :, 1] < 5)][1] = 9
>>> a
array([[[ 7., 3., 1.],
[ 9., 6., 9.]],
[[ 4., 6., 8.],
[ 8., 1., 1.]]])
...but it seems not to produce any effect. I also tried:
>>> a[np.where(a[:, :, 1] < 5), 1] = 9
>>> a
array([[[ 7., 3., 1.],
[ 9., 9., 9.]],
[[ 4., 6., 8.],
[ 9., 9., 9.]]])
...(failing to understand what is happening). Finally I tried:
>>> a[np.where(a[:, :, 1] < 5)][:, 1] = 9
>>> a
array([[[ 7., 3., 1.],
[ 9., 6., 9.]],
[[ 4., 6., 8.],
[ 8., 1., 1.]]])
I suspect I am missing something fundamental on how NumPy works (this is the first time I use the library). I would appreciate some help in how to achieve what I want as well as some explanation on what happened with my previous attempts.
Many thanks in advance for your help and expertise!
EDIT: The outcome I would like to get is:
>>> a
array([[[ 7., 9., 1.], # changed the second number here
[ 9., 6., 9.]],
[[ 4., 6., 8.],
[ 8., 9., 1.]]]) # changed the second number here
>>> import numpy as np
>>> a = np.array([[[ 7., 3., 1.],
... [ 9., 6., 9.]],
...
... [[ 4., 6., 8.],
... [ 8., 1., 1.]]])
>>> a
array([[[ 7., 3., 1.],
[ 9., 6., 9.]],
[[ 4., 6., 8.],
[ 8., 1., 1.]]])
>>> a[:,:,1][a[:,:,1] < 5] = 9
>>> a
array([[[ 7., 9., 1.],
[ 9., 6., 9.]],
[[ 4., 6., 8.],
[ 8., 9., 1.]]])
a[:,:,1] gives you the G channel. I subset it using the boolean array a[:,:,1] < 5 as an index, then assigned the value 9 to the selected elements.
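As for why the attempts in the question had no effect: indexing with np.where(...) or a boolean array is "advanced" indexing, which returns a copy, so a second [...] chained after it assigns into that temporary copy and not into a. The assignment in this answer works because a[:,:,1] is a basic slice (a view), and the boolean index is applied directly on the left-hand side. A small sketch of the difference, with mask and picked as illustrative names:

```python
import numpy as np

a = np.array([[[7., 3., 1.],
               [9., 6., 9.]],
              [[4., 6., 8.],
               [8., 1., 1.]]])

mask = a[:, :, 1] < 5      # boolean array of shape (2, 2), one flag per pixel

# advanced indexing returns a copy, so changing it leaves a untouched
picked = a[mask]
picked[:, 1] = 9
unchanged_g = a[0, 0, 1]   # still the original value

# doing the selection and the assignment in ONE indexing operation works:
# mask picks the pixels, the trailing 1 picks the G channel
a[mask, 1] = 9
```

The np.where form of the same single-step assignment would be rows, cols = np.where(mask) followed by a[rows, cols, 1] = 9.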
There is no need to use where; you can index an array directly with the boolean array produced by the comparison.
a=array([[[ 7., 3., 1.],
[ 9., 6., 9.]],
[[ 4., 6., 8.],
[ 8., 1., 1.]]])
>>> a[a[:, :, 1] < 5]
array([[ 7., 3., 1.],
[ 8., 1., 1.]])
>>> a[a[:, :, 1] < 5]=9
>>> a
array([[[ 9., 9., 9.],
[ 9., 6., 9.]],
[[ 4., 6., 8.],
[ 9., 9., 9.]]])
You do not list the expected output in your question, so I am not sure this is what you want.