plt.axis(): How to 'tight' axis hiding boundary NaNs? - python

I have an array:
a = array([
[ nan, 2., 3., 2., 5., 3.],
[ nan, 4., 3., 2., 5., 4.],
[ nan, 2., 1., 2., 3., 2.]
])
And I make a filled contour with:
plt.contourf(a)
This produces the plot.
Nothing happens when I call plt.axis('tight'), but I want to hide the boundary NaN values. Is there an easy way to do that?

You can set the min and max xlim using nanmin and nanmax:
import numpy as np
a = np.array([
[ np.nan, 2., 3., 2., 5., 3.],
[ np.nan, 4., 3., 2., 5., 4.],
[ np.nan, 2., 1., 2., 3., 2.]
])
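import matplotlib.pyplot as plt
# nanmin/nanmax return the smallest and largest data values (1.0 and 5.0 here);
# they only happen to match the first and last non-NaN column indices in this example.
xmin = np.nanmin(a)
xmax = np.nanmax(a)
plt.contourf(a)
plt.xlim(xmin, xmax)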
plt.show()

If the NaNs fill a whole column, as in your example, you can delete that column before plotting:
import numpy as np
import matplotlib.pyplot as plt
a = np.array([
[ np.nan, 2., 3., 2., 5., 3.],
[ np.nan, 4., 3., 2., 5., 4.],
[ np.nan, 2., 1., 2., 3., 2.]
])
b = np.delete(a, 0, axis=1)  # drop column 0, the all-NaN column
plt.contourf(b)
plt.show()

If there are NaN columns at both the beginning and the end, I tried this and it worked:
x = np.arange(0,a.shape[1])
plt.xlim([x[~np.isnan(a[0,:])][0],x[~np.isnan(a[0,:])][-1]])
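The same idea can be wrapped in a small helper that looks at every row instead of only the first one. This is only a sketch: the name finite_column_bounds is made up for illustration, and it assumes a 2D float array in which at least one column holds a non-NaN value.

```python
import numpy as np

def finite_column_bounds(a):
    """Return (first, last) indices of the columns that hold at least one non-NaN value."""
    # A column counts as valid unless every entry in it is NaN
    valid = np.flatnonzero(~np.isnan(a).all(axis=0))
    if valid.size == 0:
        raise ValueError("array contains only NaN values")
    return int(valid[0]), int(valid[-1])

a = np.array([
    [np.nan, 2., 3., 2., 5., 3.],
    [np.nan, 4., 3., 2., 5., 4.],
    [np.nan, 2., 1., 2., 3., 2.],
])

xmin, xmax = finite_column_bounds(a)
# With matplotlib imported as plt, the limits could then be applied with:
# plt.contourf(a); plt.xlim(xmin, xmax)
```

Because the bounds come from column indices rather than data values, they should stay correct even when the data range differs from the index range.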

Related

How to split a numpy array into subarrays based on the values of one column

I have a big numpy array and want to split it. I have read this solution, but it did not help me. The target column can contain several values, but I know which one I want to split on. In my simplified example, the target column is the third one and I want to split on the value 2. This is my array:
import numpy as np
big_array = np.array([[0., 10., 2.],
[2., 6., 2.],
[3., 1., 7.1],
[3.3, 6., 7.8],
[4., 5., 2.],
[6., 6., 2.],
[7., 1., 2.],
[8., 5., 2.1]])
Consecutive rows that have this value (2.) form one split. The next rows (numbers three and four), which are not 2., form another. The value 2. then appears again and forms a third split, and the remaining non-2. rows (here only the last row) form the final one. The result should look like this:
spl_array = [np.array([[0., 10., 2.],
[2., 6., 2.]]),
np.array([[3., 1., 7.1],
[3.3, 6., 7.8]]),
np.array([[4., 5., 2.],
[6., 6., 2.],
[7., 1., 2.]]),
np.array([[8., 5., 2.1]])]
Thanks in advance for any help.
First, find which rows have the value 2 and which do not. This gives an array of True and False values. Convert it to zeros and ones, then take the differences between neighbours (for example, [0, 0, 1, 1, 0] becomes [0, 1, 0, -1]).
Then use numpy where to find the indices at which the difference is nonzero; these are the positions where a new group starts.
Finally, insert index 0 at the front and the length of the big array at the end, so you can zip them into left and right slice bounds.
import numpy as np
big_array = np.array([[0., 10., 2.],
[2., 6., 2.],
[3., 1., 7.1],
[3.3, 6., 7.8],
[4., 5., 2.],
[6., 6., 2.],
[7., 1., 2.],
[8., 5., 2.1]])
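# 1 where the third (target) column equals 2, else 0; checking only that column
# avoids matching a 2 that appears elsewhere in the row
idx = (big_array[:, 2] == 2).astype(int)
# positions where the flag changes mark the start of a new group
slices = list(np.where(np.diff(idx) != 0)[0] + 1)
slices.insert(0, 0)
slices.append(len(big_array))
result = list()
for left, right in zip(slices[:-1], slices[1:]):
    result.append(big_array[left:right])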
'''
[array([[ 0., 10., 2.],
[ 2., 6., 2.]]),
array([[3. , 1. , 7.1],
[3.3, 6. , 7.8]]),
array([[4., 5., 2.],
[6., 6., 2.],
[7., 1., 2.]]),
array([[8. , 5. , 2.1]])]
'''
You can do this directly with np.split:
np.split(
big_array,
np.flatnonzero(np.diff(big_array[:,2] == 2) != 0) + 1
)
Output
[array([[ 0., 10., 2.],
[ 2., 6., 2.]]),
array([[3. , 1. , 7.1],
[3.3, 6. , 7.8]]),
array([[4., 5., 2.],
[6., 6., 2.],
[7., 1., 2.]]),
array([[8. , 5. , 2.1]])]
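To make the one-liner easier to reuse, it can be spelled out step by step. The sketch below uses a hypothetical helper name, split_on_value, and assumes the comparison is an exact equality on a single column; it compares neighbouring mask entries directly instead of calling np.diff on booleans.

```python
import numpy as np

def split_on_value(arr, column, value):
    """Split arr into runs of consecutive rows that do / do not equal value in column."""
    mask = arr[:, column] == value
    # A new group starts wherever the mask differs from the previous row
    starts = np.flatnonzero(mask[1:] != mask[:-1]) + 1
    return np.split(arr, starts)

big_array = np.array([[0., 10., 2.],
                      [2., 6., 2.],
                      [3., 1., 7.1],
                      [3.3, 6., 7.8],
                      [4., 5., 2.],
                      [6., 6., 2.],
                      [7., 1., 2.],
                      [8., 5., 2.1]])

parts = split_on_value(big_array, column=2, value=2)
group_sizes = [len(p) for p in parts]
```

For the example array the change points are rows 2, 4 and 7, so the groups contain 2, 2, 3 and 1 rows.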

Interpolate missing values on non-uniform 2D grid

I am trying to interpolate missing values of a 2D array in Python. I found this question, however in that case the rows and columns are all equidistant. In my case I have two arrays
x = [275. 290. 310. 330. 350. 410. 450.]
y = [ 8. 12. 16. 20. 30. 35. 40. 45.]
where x and y are the grid coordinates that represent the column and row nodes at which my 2d array
c = [[4 6 9 9 9 8 2]
[1 6 3 7 1 5 4]
[8 nan 3 nan 2 9 2]
[8 2 3 4 3 4 7]
[2 nan 4 nan 6 1 3]
[4 nan 8 nan 1 7 6]
[8 nan 6 nan 5 6 5]
[1 nan 1 nan 3 1 9]]
is defined.
What is the best way to fill the missing values?
scipy provides griddata for 2D interpolation of scattered points (it has other interpolation functions as well). Here the known cells serve as the input points:
import numpy as np
from numpy import nan
from scipy.interpolate import griddata
x = [275, 290, 310, 330, 350, 410, 450]
y = [ 8, 12, 16, 20, 30, 35, 40, 45,]
c = np.array([[ 4., 6., 9., 9., 9., 8., 2.],
[ 1., 6., 3., 7., 1., 5., 4.],
[ 8., nan, 3., nan, 2., 9., 2.],
[ 8., 2., 3., 4., 3., 4., 7.],
[ 2., nan, 4., nan, 6., 1., 3.],
[ 4., nan, 8., nan, 1., 7., 6.],
[ 8., nan, 6., nan, 5., 6., 5.],
[ 1., nan, 1., nan, 3., 1., 9.]])
# generate x_coord, y_coord values for each grid point
x_grid, y_grid = np.meshgrid(x, y)
# boolean mask of the known (non-NaN) cells
mask = ~np.isnan(c)
# coordinates and values of the known cells; boolean indexing already returns 1D arrays
points = np.column_stack([x_grid[mask], y_grid[mask]])
values = c[mask]
# fill every grid point with the value of its nearest known neighbour
interp_grid = griddata(points, values, (x_grid, y_grid), method='nearest')
# interp grid:
array([[4., 6., 9., 9., 9., 8., 2.],
[1., 6., 3., 7., 1., 5., 4.],
[8., 6., 3., 7., 2., 9., 2.],
[8., 2., 3., 4., 3., 4., 7.],
[2., 2., 4., 4., 6., 1., 3.],
[4., 4., 8., 4., 1., 7., 6.],
[8., 8., 6., 6., 5., 6., 5.],
[1., 1., 1., 1., 3., 1., 9.]])
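Nearest-neighbour filling copies existing values and ignores how far apart the nodes are. As a different technique (not the griddata approach above), the gaps can also be filled column by column with np.interp, which takes the non-uniform y spacing into account. This is a sketch under assumptions: the helper name fill_columns_linear is hypothetical, y must be increasing, every column needs at least one known value, and cells outside the known range are held at the nearest known value because np.interp does not extrapolate.

```python
import numpy as np
from numpy import nan

y = np.array([8., 12., 16., 20., 30., 35., 40., 45.])
c = np.array([[4., 6., 9., 9., 9., 8., 2.],
              [1., 6., 3., 7., 1., 5., 4.],
              [8., nan, 3., nan, 2., 9., 2.],
              [8., 2., 3., 4., 3., 4., 7.],
              [2., nan, 4., nan, 6., 1., 3.],
              [4., nan, 8., nan, 1., 7., 6.],
              [8., nan, 6., nan, 5., 6., 5.],
              [1., nan, 1., nan, 3., 1., 9.]])

def fill_columns_linear(values, coords):
    """Fill NaNs in each column by linear interpolation along coords (the row coordinates)."""
    filled = values.copy()
    for j in range(values.shape[1]):
        col = values[:, j]
        known = ~np.isnan(col)
        # Interpolate only at the missing rows, using the known rows as support points
        filled[~known, j] = np.interp(coords[~known], coords[known], col[known])
    return filled

filled = fill_columns_linear(c, y)
```

In the second column, for instance, the gap at y = 16 lies halfway between y = 12 (value 6) and y = 20 (value 2), so it is filled with 4. The rows below y = 20 have no known value further down and therefore keep the last known value, 2.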

OLS Regression Results

I am trying to produce an "OLS Regression Results" summary for a college project. My code is this:
import statsmodels.api as sm
from statsmodels.formula.api import ols
import numpy as np
data=np.loadtxt('file.txt',skiprows=1)
season=data[:nb,0]
tod=data[:nb,1]
obs=data[:nb,2]
pr=data[:nb,3]
data_lm = ols('pr ~ tod + season',data=data).fit()
table = sm.stats.anova_lm(data_lm, typ=2)
data_lm.summary()
print(table)
It gives me this error "PatsyError: Error evaluating factor: IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices
pr ~ tod) + season"
I think the error is in the format of my data. The text file contains 4 different columns (season, tod, obs and pr).
season:[3., 3., 1., 3., 3., 3., 3., 3., 1., 3., 3., 1., 3., 2., 3., 3., 3.,
1., 1., 1., 1., 3., 1., 2., 1., 3., 1., 1., 2., 1., 3., 3., 1., 1.,
1., 2., 3.]
tod:[2., 4., 1., 2., 2., 2., 4., 1., 3., 3., 1., 3., 3., 2., 2., 4., 3.,
3., 4., 3., 3., 2., 4., 1., 3., 4., 1., 1., 1., 3., 3., 4., 3., 3.,
4., 4., 4.]
obs:[ 1., 1., 1., 3., 3., 3., 3., 3., 4., 4., 4., 5., 5.,
5., 5., 5., 6., 9., 9., 12., 12., 12., 12., 12., 13., 13.,
16., 16., 17., 19., 19., 19., 20., 20., 20., 20., 24.]
pr:[0. , 0. , 0. , 0.1, 0.2, 0.2, 0.4, 0.4, 0.5, 0.5, 0.7, 0.7, 0.7,
0.8, 0.8, 0.8, 0.8, 0.9, 0.9, 1. , 1. , 1.1, 1.1, 1.2, 1.3, 1.4,
1.4, 1.5, 1.6, 1.7, 1.7, 1.8, 1.8, 1.9, 2. , 2. , 2. ]
Can anyone help me?
data is a basic NumPy ndarray object. These accept integers, slices, or other "array like" objects when you index them with []. However, the ols function explicitly says in the documentation:
data must define __getitem__ with the keys in the formula
That means data must be a pandas DataFrame, a dictionary, or a NumPy structured array, with a __getitem__ method that accepts str objects as indices.
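The difference can be seen without statsmodels at all. The sketch below uses the first four rows of the question's data as a stand-in for the np.loadtxt result and shows which containers accept a column name as an index. The variable names are illustrative, and the final comment about passing the result to ols is an assumption based on the documentation quote above, not something tested here.

```python
import numpy as np

# Stand-in for np.loadtxt('file.txt', skiprows=1): a plain 2D float array
# with columns season, tod, obs, pr
raw = np.array([[3., 2., 1., 0.0],
                [3., 4., 1., 0.0],
                [1., 1., 1., 0.0],
                [3., 2., 3., 0.1]])

# A plain ndarray rejects string keys, which is the IndexError from the question
try:
    raw['pr']
    plain_accepts_names = True
except IndexError:
    plain_accepts_names = False

# A dictionary of columns accepts the names used in the formula
columns = {name: raw[:, i] for i, name in enumerate(['season', 'tod', 'obs', 'pr'])}

# So does a NumPy record (structured) array
records = np.rec.fromarrays([raw[:, i] for i in range(4)],
                            names='season,tod,obs,pr')

# Assumption: a name-indexable object like `columns` (or a pandas DataFrame built
# from it) is what would be passed as ols('pr ~ tod + season', data=...).
```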

np.nanmax() over all except one axis - is this the best way?

For a numpy array of dimension n, I'd like to apply np.nanmax() to n-1 dimensions producing a 1 dimensional array of maxima, ignoring all values set to np.nan.
q = np.arange(5*4*3.).reshape(3,4,5) % (42+1)
q[q%5==0] = np.nan
producing:
array([[[ nan, 1., 2., 3., 4.],
[ nan, 6., 7., 8., 9.],
[ nan, 11., 12., 13., 14.],
[ nan, 16., 17., 18., 19.]],
[[ nan, 21., 22., 23., 24.],
[ nan, 26., 27., 28., 29.],
[ nan, 31., 32., 33., 34.],
[ nan, 36., 37., 38., 39.]],
[[ nan, 41., 42., nan, 1.],
[ 2., 3., 4., nan, 6.],
[ 7., 8., 9., nan, 11.],
[ 12., 13., 14., nan, 16.]]])
If I know ahead of time that I want to use the last axis as the remaining dimension, I can use the -1 feature in .reshape() and do this:
np.nanmax(q.reshape(-1, q.shape[-1]), axis=0)
which produces the result I want:
array([ 12., 41., 42., 38., 39.])
However, suppose I don't know ahead of time which axis should be left out of the maximum. Suppose I started with n=4 dimensions and wanted to apply it to all axes except the mth axis, which could be 0, 1, 2, or 3. Would I actually have to use a conditional if-elif-else?
Is there something that would work like a hypothetical exeptaxis=m?
The axis argument of nanmax can be a tuple of axes over which the maximum is computed. In your case, you want that tuple to contain all the axes except m. Here's one way you could do that:
In [62]: x
Out[62]:
array([[[[ 4., 3., nan, nan],
[ 0., 2., 2., nan],
[ 4., 5., nan, 3.],
[ 2., 0., 3., 1.]],
[[ 2., 0., 0., 1.],
[ nan, 3., 0., nan],
[ 0., 1., nan, 2.],
[ 5., 4., 0., 1.]],
[[ 4., 0., 2., 0.],
[ 4., 0., 4., 5.],
[ 3., 4., 1., 0.],
[ 5., 3., 4., 3.]]],
[[[ 2., nan, 6., 4.],
[ 3., 1., 2., nan],
[ 5., 4., 1., 0.],
[ 2., 6., 0., nan]],
[[ 4., 1., 4., 2.],
[ nan, 1., 5., 5.],
[ 2., 0., 1., 1.],
[ 6., 3., 6., 5.]],
[[ 1., 0., 0., 1.],
[ 1., nan, 2., nan],
[ 3., 4., 0., 5.],
[ 1., 6., 2., 3.]]]])
In [63]: m = 0
In [64]: np.nanmax(x, axis=tuple(i for i in range(x.ndim) if i != m))
Out[64]: array([ 5., 6.])
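The tuple construction can be wrapped in a small function. This is a sketch with a hypothetical name, nanmax_except; normalising m with the modulo operator so that negative axes work is an addition that is not part of the answer above. It is applied here to the array q from the question.

```python
import numpy as np

def nanmax_except(x, m):
    """NaN-ignoring maximum over every axis of x except axis m."""
    m = m % x.ndim  # allow negative axis numbers such as -1
    other_axes = tuple(i for i in range(x.ndim) if i != m)
    return np.nanmax(x, axis=other_axes)

# The array from the question
q = np.arange(5 * 4 * 3.).reshape(3, 4, 5) % (42 + 1)
q[q % 5 == 0] = np.nan

last_axis_kept = nanmax_except(q, -1)   # same result as the reshape approach
first_axis_kept = nanmax_except(q, 0)
```

Keeping the last axis reproduces array([12., 41., 42., 38., 39.]) from the question.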

Multiplication in Python with arrays of different length

I have five 100x100 arrays, A, and I want to multiply each matrix by a value from an array of length five, B. I want to multiply the first matrix in A by the first value in B, the second matrix by the second value in B, and so on. Is this possible?
The answer was actually given by gboffi in a comment, but I want to elaborate on it with a concrete example:
import numpy as np
#example data, all arrays of ones 100x100
A1 = A2 = A3 =A4 = A5 = np.ones((100, 100))
#example array containing the factor for each matrix
B = np.array([1, 2, 3, 4, 5])
#create an array containing all matrices
A = np.array([A1, A2, A3, A4, A5])
A*B[:,None,None]
The result then looks like this:
array([[[ 1., 1., 1., ..., 1., 1., 1.],
[ 1., 1., 1., ..., 1., 1., 1.],
[ 1., 1., 1., ..., 1., 1., 1.],
...,
[ 1., 1., 1., ..., 1., 1., 1.],
[ 1., 1., 1., ..., 1., 1., 1.],
[ 1., 1., 1., ..., 1., 1., 1.]],
[[ 2., 2., 2., ..., 2., 2., 2.],
[ 2., 2., 2., ..., 2., 2., 2.],
[ 2., 2., 2., ..., 2., 2., 2.],
...,
[ 2., 2., 2., ..., 2., 2., 2.],
[ 2., 2., 2., ..., 2., 2., 2.],
[ 2., 2., 2., ..., 2., 2., 2.]],
[[ 3., 3., 3., ..., 3., 3., 3.],
[ 3., 3., 3., ..., 3., 3., 3.],
[ 3., 3., 3., ..., 3., 3., 3.],
...,
[ 3., 3., 3., ..., 3., 3., 3.],
[ 3., 3., 3., ..., 3., 3., 3.],
[ 3., 3., 3., ..., 3., 3., 3.]],
[[ 4., 4., 4., ..., 4., 4., 4.],
[ 4., 4., 4., ..., 4., 4., 4.],
[ 4., 4., 4., ..., 4., 4., 4.],
...,
[ 4., 4., 4., ..., 4., 4., 4.],
[ 4., 4., 4., ..., 4., 4., 4.],
[ 4., 4., 4., ..., 4., 4., 4.]],
[[ 5., 5., 5., ..., 5., 5., 5.],
[ 5., 5., 5., ..., 5., 5., 5.],
[ 5., 5., 5., ..., 5., 5., 5.],
...,
[ 5., 5., 5., ..., 5., 5., 5.],
[ 5., 5., 5., ..., 5., 5., 5.],
[ 5., 5., 5., ..., 5., 5., 5.]]])
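Why this works: B[:, None, None] has shape (5, 1, 1), so broadcasting stretches each factor over the last two axes of A. The sketch below checks this against an explicit loop, using smaller 2x3 matrices (an arbitrary choice so the arrays stay readable).

```python
import numpy as np

# Five small matrices stacked along the first axis
A = np.arange(5 * 2 * 3, dtype=float).reshape(5, 2, 3)
B = np.array([1., 2., 3., 4., 5.])

# Broadcasting: (5, 2, 3) * (5, 1, 1) -> (5, 2, 3)
factor_shape = B[:, None, None].shape
scaled = A * B[:, None, None]

# Equivalent explicit loop, multiplying matrix i by factor i
looped = np.stack([A[i] * B[i] for i in range(len(B))])
```

Both variants give the same array; the broadcast version avoids the Python-level loop.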
