How to add NaN values at specific index positions in numpy arrays? - python

I have 2 np.arrays.
The first one is called data:
data = array([ 17. ,  nan,  8.1, 25.1,  nan,  6.9,  nan, 27.1, 46.6,
              34.1, 25.7,  nan, ... , 25.3 ])
Array of float64, size (366,)
To get the second one I did an interpolation, so I first had to drop the NaN values:
data = data[~numpy.isnan(data)]
So now the data looks like this:
data = array([ 17. ,  8.1, 25.1,  6.9, 27.1, 46.6,
              34.1, 25.7, ... , 25.3 ])
Array of float64, size (283,)
And after the interpolation I get the second one:
interpolated_data = array([ 16. ,  7.1, 24.1,  7.9, 26.1, 45.6,
                           33.1, 27.7, ... , 24.3 ])
Array of float64, size (283,)
Now I want to put the NaN values back at the same index positions in both arrays.
Expected values:
data = array([ 17. ,  nan,  8.1, 25.1,  nan,  6.9,  nan, 27.1, 46.6,
              34.1, 25.7,  nan, ... , 25.3 ])
Array of float64, size (366,)
interpolated_data = array([ 16. ,  nan,  7.1, 24.1,  nan,  7.9,  nan, 26.1, 45.6,
                           33.1, 27.7,  nan, ... , 24.3 ])
Array of float64, size (366,)
Would you mind helping me? Thanks in advance.

First, extract the values from your data array with the mask you create:
data = array([ 17. ,  nan,  8.1, 25.1,  nan,  6.9,  nan, 27.1, 46.6,
              34.1, 25.7,  nan, ... , 25.3 ])
nan_mask = numpy.isnan(data)
data1 = data[~nan_mask]
From there you compute your interpolated_data. Then you can create an empty array of the same size as the initial data array and put your interpolated_data and the np.nan values back into it:
interpolated_array = np.empty(data.shape)
interpolated_array[~nan_mask] = interpolated_data
interpolated_array[nan_mask] = np.nan
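For completeness, here is the whole round trip on a small array; the doubling step is only a stand-in for whatever interpolation you actually run on the NaN-free values:
import numpy as np

data = np.array([17.0, np.nan, 8.1, 25.1, np.nan, 6.9])

nan_mask = np.isnan(data)                # True where data is NaN
interpolated_data = data[~nan_mask] * 2  # placeholder for the real interpolation

# rebuild a full-length array: interpolated values where data was valid, NaN elsewhere
interpolated_array = np.empty(data.shape)
interpolated_array[~nan_mask] = interpolated_data
interpolated_array[nan_mask] = np.nan

print(interpolated_array)
# [34.  nan 16.2 50.2  nan 13.8]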

Keep the indices where there is no NaN, do the computation, and recreate an array of the same size as the initial one, filled with NaN. Then use those indices to copy your values into the new array.
# initial array
a = np.array([1., 2., np.nan, 4., np.nan, 6.])
# indices where there is no nan
idx = np.where(~np.isnan(a))
# new array without nan
m = a[idx]
print(m)
# [1. 2. 4. 6.]
# ... interpolation; i is its result ...
i = np.array([10, 20, 40, 60])
print(i)
# [10 20 40 60]
# replace nan: start from an all-nan array, then fill in the interpolated values
b = np.array([np.nan] * len(a))
b[idx] = i
print(b)
# [10. 20. nan 40. nan 60.]
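As a side note, the all-NaN scaffold can also be built in a single call with b = np.full(len(a), np.nan), which is a bit more idiomatic than the list-multiplication form.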

Here you go:
# generate data with nan values
data = np.ones(10)
data[4] = np.nan
# get a boolean mask of where data is nan
boolean_selection = np.isnan(data)
# apply some interpolation to the data that is not nan;
# this identity assignment is just a placeholder for the real computation
interpolated_data = data[np.logical_not(boolean_selection)]
# fill the interpolated data back in; the nan positions are untouched
data[np.logical_not(boolean_selection)] = interpolated_data
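If you need an actual gap filler rather than a placeholder, one common option (a sketch, not the only way) is linear interpolation over the valid positions with np.interp:
import numpy as np

data = np.array([1.0, np.nan, 3.0, np.nan, 5.0])
mask = np.isnan(data)
idx = np.arange(len(data))

# linearly interpolate the NaN positions from the surrounding valid samples
data[mask] = np.interp(idx[mask], idx[~mask], data[~mask])
print(data)
# [1. 2. 3. 4. 5.]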

Related

Replacing all elements except NaN in Python

I want to replace all elements in array X except nan with 10.0. Is there a one-step way to do it? The expected output is shown below.
import numpy as np
from numpy import nan
X = np.array([[3.25774286e+02, 3.22008654e+02, nan, 1.85356823e+02,
               1.85356823e+02, 3.22008654e+02, nan, 3.22008654e+02]])
The expected output is
X = array([[10.0, 10.0, nan, 10.0,
            10.0, 10.0, nan, 10.0]])
You can get an array of True/False for the nan locations using np.isnan, invert it, and use it to replace all other values with 10.0:
indices = np.isnan(X)
X[~indices] = 10.0
print(X) # [[10. 10. nan 10. 10. 10. nan 10.]]
You can use a combination of numpy.isnan and numpy.where.
>>> np.where(np.isnan(X), X, 10)
array([[10., 10., nan, 10., 10., 10., nan, 10.]])
One-liner, in place for X:
np.place(X, np.invert(np.isnan(X)), 10.0)
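A quick sanity check on a toy array (the values here are just for the demo) showing that all three suggestions agree:
import numpy as np
from numpy import nan

X = np.array([[3.0, 2.0, nan, 1.0]])

print(np.where(np.isnan(X), X, 10.0))  # [[10. 10. nan 10.]]

X2 = X.copy()
X2[~np.isnan(X2)] = 10.0               # boolean-mask assignment, in place
print(X2)                              # [[10. 10. nan 10.]]

X3 = X.copy()
np.place(X3, ~np.isnan(X3), 10.0)      # np.place, also in place
print(X3)                              # [[10. 10. nan 10.]]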

How to create a series of integers in a dataframe only when there are no NaNs, and integers increasing by 1 unit?

I have a dataframe that look like this:
data = pd.DataFrame([NaN, NaN, NaN, 0.5, 1.2, 2.7, 3.8, NaN, NaN, 0.1, 0.7, 2.3, NaN, NaN, NaN, NaN, NaN, 0.01, 0.4, 1.5, 2.8, 4.5, 5.6, NaN, NaN, NaN, NaN, NaN, NaN])
The extra column I need to create should look like this:
data = pd.DataFrame([NaN, NaN, NaN, 1, 1, 1, 1, NaN, NaN, 2, 2, 2, NaN, NaN, NaN, NaN, NaN, 3, 3, 3, 3, 3, 3, NaN, NaN, NaN, NaN, NaN, NaN])
Basically, each contiguous run of non-NaN values should get the same integer, incremented by 1 for each new run, and the NaN positions should stay NaN.
Is there an elegant way to do this without going through every row individually? My dataframe is very large.
Let's consider this input data
NaN = np.nan
data = pd.DataFrame(
    [NaN, NaN, NaN, 0.5, 1.2, 2.7, 3.8, NaN, NaN, 0.1, 0.7, 2.3, NaN, NaN,
     NaN, NaN, NaN, 0.01, 0.4, 1.5, 2.8, 4.5, 5.6, NaN, NaN, NaN, NaN, NaN, NaN])
So the idea is to first create a Boolean column that is True where the value is not NaN, using notna:
m = data[0].notna()
Now take the diff of this Boolean column: it becomes True wherever the value changes from True to False or vice versa. Keep only the changes into a non-NaN run with & m, use cumsum to get the incremental integer, and finally use where with the Boolean mask m to put NaN back where the data was NaN originally. As a one-liner:
data['res'] = (m.diff()&m).cumsum().where(m)
print(data)
       0  res
0    NaN  NaN
1    NaN  NaN
2    NaN  NaN
3   0.50  1.0
4   1.20  1.0
5   2.70  1.0
6   3.80  1.0
7    NaN  NaN
8    NaN  NaN
9   0.10  2.0
10  0.70  2.0
11  2.30  2.0
12   NaN  NaN
13   NaN  NaN
...
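If your pandas version complains about calling diff on a Boolean column, an equivalent formulation (same idea, sketched on a shorter input) compares the mask against a shifted copy of itself:
import numpy as np
import pandas as pd

data = pd.DataFrame([np.nan, np.nan, 0.5, 1.2, np.nan, 0.1, 0.7])

m = data[0].notna()
# True exactly at the first element of each non-NaN run
starts = m & ~m.shift(fill_value=False)
data['res'] = starts.cumsum().where(m)
print(data)
#      0  res
# 0  NaN  NaN
# 1  NaN  NaN
# 2  0.5  1.0
# 3  1.2  1.0
# 4  NaN  NaN
# 5  0.1  2.0
# 6  0.7  2.0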

How to create a count function for an array that maintains the time and location of the points (python)

I have two netCDF files which I have used to calculate the Humidex of the Houston area. From there I need to count, at each lat/lon point, the number of days that meet a certain threshold (41). I then need to plot a spatial map of that count over the region so I can compare the number of extremely hot days at each point. I've used xarray.where to isolate the days at this threshold, but when I apply a count function I lose my time and lat/lon variables and just get the total number of data points at this threshold as output.
humidex is calculated from the two netCDF files; it has latitude and longitude variables.
>>> hotday = xr.DataArray(humidex)
>>> hotday.where(hotday >= 41)
<xarray.DataArray 'tasmax' (lat: 960, lon: 1920)>
array([[nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       ...,
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan]], dtype=float32)
Coordinates:
  * lat      (lat) float64 -89.86 -89.67 -89.48 -89.3 ... 89.3 89.48 89.67 89.86
  * lon      (lon) float64 0.0 0.1875 0.375 0.5625 ... 359.2 359.4 359.6 359.8
    height   float64 2.0
>>> for ii in hotday:
...     counting = xr.DataArray.count(ii)
>>> counting
<xarray.DataArray 'tasmax' ()>
array(1920)
Coordinates:
    lat      float64 89.86
    height   float64 ...
I hope this makes sense; I'm still new to coding and this has really thrown me.
Welcome to SO. There are numerous ways to solve your problem.
Here's one proposed method:
import xarray as xr
data = xr.tutorial.open_dataset('air_temperature')
high_temps = xr.where(data > 300, 1, 0)  # set all temps over 300 K to 1; others to 0
summed_temps = high_temps.sum(dim='time')
You could then plot the heat map directly.
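For instance, with the tutorial dataset above (whose data variable is named 'air'), a sketch of the plotting step could look like this:
import xarray as xr
import matplotlib.pyplot as plt

data = xr.tutorial.open_dataset('air_temperature')  # sample dataset shipped with xarray
high_temps = xr.where(data['air'] > 300, 1, 0)      # 1 where the threshold is exceeded
summed_temps = high_temps.sum(dim='time')           # count of hot time steps per grid cell

summed_temps.plot()                                 # lat/lon heat map of the counts
plt.show()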

How can I export my NUMPY array into a CSV or EXCEL file

This is my first time trying to code in Python, so I learned how to make a NumPy array and how to export it as a CSV file using np.savetxt. But when I open the CSV file in Excel, the columns of my matrix seem to be merged into one, and it is impossible to do any analysis on it. I was wondering how I can fix this issue. I don't know whether NumPy is a proper choice for doing this analysis or not, so if you have any other suggestions, please include them.
Here, I have created an empty array with dimensions (a1, b1).
# Create an empty array with dim = (a1: num of months, b1:num of stations)
aa = np.empty((a1, b1))
aa[:] = np.nan
Here, I have filled the empty array row by row with a for loop:
for i in range(1, a1):
    S_Obs = Sta_M.iloc[i-1, 2]
    R_Val = Rad_M.iloc[i, 2:]
    addadjuster = adjust.AdjustAdd(coords, coords, nnear_raws=5)
    addadjusted = addadjuster(S_Obs, R_Val)
    aa[i, :] = addadjusted
Finally, when I display my array row by row, it looks like this:
aa[111, :]
array([   nan,    nan,    nan, 16.296, 24.888,    nan,    nan,    nan,
          nan,    nan,    nan,    nan,    nan,    nan, 23.496,  1.704,
       52.32 ,    nan, 25.368,    nan,    nan,    nan,    nan,    nan,
          nan,    nan,    nan, 21.264, 19.584, 22.272,  0.144, 10.008,
        1.68 ,  0.   ,    nan,    nan,    nan,    nan,    nan,  0.   ,
        0.   , 30.696,    nan,    nan, 24.888,    nan,    nan,  3.648,
       14.832,  7.944,    nan,    nan,    nan,    nan,    nan,    nan,
          nan])
I want to save this array in a way that I can do some simple analysis on it. It can be in EXCEL or CSV. I used this code, but it doesn't show the columns properly.
np.savetxt("AAtest.csv", aa, delimiter="/")
You can use pandas to save your NumPy array as a CSV file.
Suppose numArr is your NumPy array; then you can do it like this:
import pandas as pd
df = pd.DataFrame(numArr)
df.to_csv('file.csv',index=False)
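As a side note on the original symptom: np.savetxt was called with delimiter="/", and Excel only splits .csv files on commas (or your locale's list separator), which is why everything landed in one column. A minimal sketch with a comma delimiter and a readable float format:
import numpy as np

aa = np.array([[1.5, np.nan], [2.25, 3.0]])
# comma-separated with a fixed float format; Excel will split these into columns
np.savetxt("AAtest.csv", aa, delimiter=",", fmt="%10.3f")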
In [155]: arr = np.zeros((4,5))
In [156]: arr[:] = np.nan
In [158]: arr[[0,0,1,2,2,3],[0,2,1,3,4,3]]=1.23
In [159]: arr
Out[159]:
array([[1.23,  nan, 1.23,  nan,  nan],
       [ nan, 1.23,  nan,  nan,  nan],
       [ nan,  nan,  nan, 1.23, 1.23],
       [ nan,  nan,  nan, 1.23,  nan]])
In [160]: np.savetxt('test.csv',arr, delimiter=',')
This is a properly formatted comma separated file. But the numbers are saved with scientific notation.
In [161]: cat test.csv
1.229999999999999982e+00,nan,1.229999999999999982e+00,nan,nan
nan,1.229999999999999982e+00,nan,nan,nan
nan,nan,nan,1.229999999999999982e+00,1.229999999999999982e+00
nan,nan,nan,1.229999999999999982e+00,nan
To line up the columns we need to specify a format. For example:
In [162]: np.savetxt('test.csv',arr, delimiter=',', fmt='%10f')
In [163]: cat test.csv
  1.230000,       nan,  1.230000,       nan,       nan
       nan,  1.230000,       nan,       nan,       nan
       nan,       nan,       nan,  1.230000,  1.230000
       nan,       nan,       nan,  1.230000,       nan
If you cannot use pandas for some reason, you can stick with NumPy:
aa.tofile('my_csv.csv', sep=',', format='%s')
Note that tofile writes the flattened array as one long line, so the row structure of a 2-D array is lost.

Removing nan elements from matrix

I have a bunch of matrices eq1, eq2, etc. defined like
from numpy import meshgrid, sqrt, arange
# from numpy import isnan, logical_not
xs = arange(-7.25, 7.25, 0.01)
ys = arange(-5, 5, 0.01)
x, y = meshgrid(xs, ys)
eq1 = ((x/7.0)**2.0*sqrt(abs(abs(x)-3.0)/(abs(x)-3.0))+(y/3.0)**2.0*sqrt(abs(y+3.0/7.0*sqrt(33.0))/(y+3.0/7.0*sqrt(33.0)))-1.0)
eq2 = (abs(x/2.0)-((3.0*sqrt(33.0)-7.0)/112.0)*x**2.0-3.0+sqrt(1-(abs(abs(x)-2.0)-1.0)**2.0)-y)
where eq1, eq2, eq3, etc. are large matrices. As you can see, many NaN elements surround a 'block' of plot-able values. I want to remove all the NaN values while keeping the shape of the block of valid values in the matrix. Note that these 'blocks' can be located anywhere in the eq1, eq2 matrices.
I've looked at answers given in Removing nan values from an array and Removing NaN elements from a matrix, but these don't seem to be completely relevant to my case.
IIUC, you can use boolean indexing with np.isnan to keep the slices. There are probably slicker ways to do this, but starting from something like:
>>> eq = np.zeros((5,6)) + np.nan
>>> eq[2:4, 1:3].flat = [1,np.nan,3,4]
>>> eq
array([[nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan],
       [nan,  1., nan, nan, nan, nan],
       [nan,  3.,  4., nan, nan, nan],
       [nan, nan, nan, nan, nan, nan]])
You could select the rows and columns with data using something like
>>> eq = eq[:,~np.isnan(eq).all(0)]
>>> eq = eq[~np.isnan(eq).all(1)]
>>> eq
array([[ 1., nan],
       [ 3.,  4.]])
Short and sweet:
eq1_c = eq1[~np.isnan(eq1)]
np.isnan returns a bool array that can be used to index your original array. Take its negation and you get back the non-NaN values. Note, though, that this returns a flattened 1-D array, so the 2-D shape of the block is not preserved.
One option is to manually iterate through the grid and check for NaN values. A NaN value is easy to spot because comparing it to itself results in False. You could use this to set all NaN values to 0.0, for example:
for x in range(len(eq1)):
    for y in range(len(eq1[x])):
        v = eq1[x][y]
        if v != v:  # only NaN is not equal to itself
            eq1[x][y] = 0.0
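The same replacement can be done without Python-level loops; a vectorized sketch:
import numpy as np

eq1 = np.array([[1.0, np.nan], [np.nan, 4.0]])

# boolean-mask assignment replaces every NaN in one vectorized step
eq1[np.isnan(eq1)] = 0.0
print(eq1)
# [[1. 0.]
#  [0. 4.]]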
