'ignore nan' in numpy functions - python

The elementary functions in NumPy, like mean() and std(), return np.nan when they encounter np.nan. Can I make them ignore it?

The "normal" functions like np.mean and np.std propagate NaN, i.e. the result you've described evaluates to NaN.
If you want to avoid that, use np.nanmean and np.nanstd. Note that since you have only one non-NaN element, the std is 0, so you are dividing by zero.
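A minimal sketch of the difference, using a made-up array:

import numpy as np

a = np.array([1.0, np.nan, 3.0])  # toy data; any array with NaNs works

print(np.mean(a))     # nan -- NaN propagates through the plain reductions
print(np.nanmean(a))  # 2.0 -- NaN entries are ignored
print(np.nanstd(a))   # 1.0 -- std over the non-NaN values only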

Related

Take a "nanmean" in xarray

I know that I can use the numpy nanmean function to take the mean of a numpy array, while ignoring NaN values. Is there an analogous way to accomplish this with xarray? I will give an example...
numpy_array = [1, 2, 3, 4, float('nan'), 5]
np.mean(numpy_array)
> nan
np.nanmean(numpy_array)
> 3.0
In xarray, I can do
xarray_example.mean(dim='dimension')
How can I change this to a nanmean?
Thanks.
According to the xarray docs for the mean function, you can set the skipna parameter to True (it skips missing float values by default). So:
xarray_example.mean(dim='dimension', skipna=True)
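A small sketch of the behavior, assuming a toy DataArray (the data and dimension name here are made up):

import numpy as np
import xarray as xr

xarray_example = xr.DataArray([1.0, 2.0, np.nan, 4.0], dims='dimension')

# skipna defaults to True for float data, so the plain mean already ignores NaN:
print(xarray_example.mean(dim='dimension').item())                # 2.333...
print(xarray_example.mean(dim='dimension', skipna=False).item())  # nan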

Find max values of 1D arrays that may contain NAN

I am trying to find the maximum value in a 1D array using the max function in Python. However, these arrays may contain NaN as a consequence of missing data (flagged astronomical data). Every time I try to find the max value in the array, it gives me NaN as the maximum. I was wondering if there is a way to find the maximum real number in the array.
Python's built-in min and max don't reliably propagate NaN: every comparison involving NaN is False, so the result depends on where the NaN happens to sit in the sequence. If you are getting NaN back every time, something else is going on in your code logic.
There's no code in your question, but one possibility is that you are confusing NaN (the not-a-number float value) with "NAN" as a string constant. Strings compare lexicographically, so max over strings can return "NAN"; both cases are sketched below.
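A minimal sketch of the string pitfall and of the usual numeric fix, np.nanmax, on made-up data:

import numpy as np

# String pitfall: 'NAN' here is text, not a float. Strings compare
# lexicographically and 'N' > '2', so max returns 'NAN'.
print(max(['1.5', 'NAN', '2.0']))   # 'NAN'

# Numeric case: np.max propagates NaN, while np.nanmax ignores it.
data = np.array([1.0, np.nan, 2.0])
print(np.max(data))     # nan
print(np.nanmax(data))  # 2.0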

pandas - vectorized formula computation with nans

I have a DataFrame (Called signal) that is a simple timeseries with 5 columns. This is what its .describe() looks like:
             ES           NK           NQ           YM
count  5294.000000  6673.000000  4798.000000  3415.000000
mean      -0.000340     0.000074    -0.000075    -0.000420
std        0.016726     0.018401     0.023868     0.015399
min       -0.118724    -0.156342    -0.144667    -0.103101
25%       -0.008862    -0.010297    -0.011481    -0.008162
50%       -0.001422    -0.000590    -0.001747    -0.001324
75%        0.007069     0.009163     0.009841     0.006304
max        0.156365     0.192686     0.181245     0.132630
I want to apply a simple function on every single row, and receive back a matrix with the same dimensions:
weights = -2*signal.subtract( signal.mean(axis=1), axis=0).divide( signal.sub( signal.mean(axis=1), axis=0).abs().sum(axis=1), axis=0 )
However, when I run this line, the program gets stuck. I believe the issue comes from the columns' differing lengths and the presence of NaNs. Dropping or filling the NaNs is not an option: for any given row that has a NaN, I want that NaN simply excluded from the computation. A temporary solution would be to do this iteratively using .iterrows(), but that is not efficient.
Are there any smart solutions to this problem?
The thing is, the pandas mean and sum methods already exclude NaN values by default (see the description of the skipna keyword in the linked docs). Additionally, subtract and divide allow for the use of a fill_value keyword arg:
fill_value : None or float value, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing
So you may be able to get what you want by setting fill_value=0 in the calls to subtract, and fill_value=1 in the calls to divide.
However, I suspect that the default behavior (NaN is ignored in mean and sum; NaN - anything = NaN; NaN / anything = NaN) is what you actually want. In that case, your problem isn't directly related to NaNs, and you're going to have to clarify your statement "when I run this line, the program gets stuck" in order to get a useful answer.
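For reference, a minimal sketch of the vectorized computation on a toy DataFrame (the data below is made up; the real signal is much larger):

import numpy as np
import pandas as pd

# Hypothetical stand-in for the `signal` DataFrame, with some NaNs.
signal = pd.DataFrame({
    'ES': [0.01, np.nan, -0.02],
    'NK': [0.00, 0.03, np.nan],
    'NQ': [-0.01, 0.02, 0.01],
})

# mean and sum skip NaN by default, so each row's statistics use
# only the non-missing entries in that row.
demeaned = signal.sub(signal.mean(axis=1), axis=0)   # NaN - x stays NaN
weights = -2 * demeaned.divide(demeaned.abs().sum(axis=1), axis=0)
print(weights)   # NaNs in `signal` remain NaN in `weights`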

Pandas Series Apply Method

I've been using the pandas apply method for both Series and DataFrames, but I am obviously still missing something, because I'm stumped on a simple function I'm trying to execute.
This is what I was doing:
def minmax(row):
    return (row - row.min()) / (row.max() - row.min())

row.apply(minmax)
But this returns an all-zero Series. For example, if
row = pd.Series([0, 1, 2])
then
minmax(row)
returns [0.0, 0.5, 1.0], as desired. But, row.apply(minmax) returns [0,0,0].
I believe this is because the series is of ints and the integer division returns 0. However, I don't understand:
1. why it works with minmax(row) (shouldn't it act the same?), and
2. how to cast it correctly in the apply function so it returns appropriate float values (I've tried casting with .astype, and that gives me all NaNs, which I also don't understand).
(Edit: if I apply this to a DataFrame, as df.apply(minmax), it also works as desired.)
I suspect I'm missing something fundamental in how apply works, or I'm being dense. Either way, thanks in advance.
When you call row.apply(minmax) on a Series, only the individual values are passed to the function; it operates element-wise. From the docs:
Invoke function on values of Series. Can be ufunc (a NumPy function that applies to the entire Series) or a Python function that only works on single values.
When you call apply(minmax) on a DataFrame, whole columns (the default) or rows are passed to the function as Series, according to the value of axis. From the docs:
Objects passed to functions are Series objects having index either the DataFrame’s index (axis=0) or the columns (axis=1). Return type depends on whether passed function aggregates, or the reduce argument if the DataFrame is empty. This is called row-wise or column-wise.
This is why your example works as expected on the DataFrame and not on the Series. Check this answer for information on mapping functions to Series.
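A small sketch of the difference, with made-up data:

import pandas as pd

def minmax(x):
    return (x - x.min()) / (x.max() - x.min())

row = pd.Series([0, 1, 2])

# Series.apply would call minmax once per scalar element, where
# x.min() == x == x.max(), so the numerator is always 0.
# Call the function on the Series itself instead:
print(minmax(row))   # [0.0, 0.5, 1.0]

# DataFrame.apply passes whole columns (Series) by default,
# so minmax scales each column as intended:
df = pd.DataFrame({'a': [0, 1, 2], 'b': [10, 20, 30]})
print(df.apply(minmax))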

Average of a numpy array returns NaN

I have an np.array with over 330,000 rows. I simply try to take its average and it returns NaN. Even if I try to filter out any potential NaN values in my array (there shouldn't be any anyway), average returns NaN. Am I doing something totally wacky?
My code is here:
average(ngma_heat_daily)
Out[70]: nan
average(ngma_heat_daily[ngma_heat_daily != nan])
Out[71]: nan
Try this:
>>> np.nanmean(ngma_heat_daily)
This function drops NaN values from your array before taking the mean.
Edit: the reason that average(ngma_heat_daily[ngma_heat_daily != nan]) doesn't work is because of this:
>>> np.nan == np.nan
False
According to the IEEE floating-point standard, NaN is not equal to itself! You could do this instead to implement the same idea:
>>> average(ngma_heat_daily[~np.isnan(ngma_heat_daily)])
np.isnan, np.isinf, and similar functions are very useful for this type of data masking.
Also, there is a function named nanmedian which ignores NaN values. Its signature is: numpy.nanmedian(a, axis=None, out=None, overwrite_input=False, keepdims=<no value>)
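Putting these together, a minimal sketch (the array below is a made-up stand-in for ngma_heat_daily):

import numpy as np

ngma_heat_daily = np.array([3.1, np.nan, 2.7, np.nan, 3.4])  # toy data

print(np.nanmean(ngma_heat_daily))     # mean of the non-NaN entries
print(np.nanmedian(ngma_heat_daily))   # median, also ignoring NaN
print(np.average(ngma_heat_daily[~np.isnan(ngma_heat_daily)]))  # masking by hand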
