numpy.insert() invalid slice -- Trying to Insert NaN in Numpy Array - python

I know there are already lots of questions about this, but none of the answers I've seen have solved my problem. I have a pandas DataFrame with 10 columns for data, but some rows have only nine columns' worth of data. For those rows, I need the data to sit in the last nine columns. My solution is to insert a NaN value in front of the length-9 arrays so that the data is pushed to the correct columns, but everything I've tried has thrown errors!
(I'm trying to insert NaN into a numpy array that looks like this: [6070000.0 6639000.0 15004000.0 15944000.0 8888000.0 9896000.0 22502500.0 23577000.0 14835500.0])
My current best guess:
a = np.array(a,dtype=float)
a = np.insert(a,np.nan,0)
**IndexError: invalid slice**
Any ideas about how I can get this doggone NaN into the array?

np.insert(arr, obj, values) expects the insertion index as the second argument and the value as the third, so your code is currently attempting to insert 0 at index np.nan. Swap the arguments:
a = np.insert(a, 0, np.nan)
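A minimal sketch of the fix applied to the values from the question (the variable name a follows the question's code):

import numpy as np

# The length-9 row from the question, as floats
a = np.array([6070000.0, 6639000.0, 15004000.0, 15944000.0, 8888000.0,
              9896000.0, 22502500.0, 23577000.0, 14835500.0], dtype=float)

# np.insert(arr, index, value): the index comes second, the value third
a = np.insert(a, 0, np.nan)

print(len(a))  # 10 -- the NaN now sits in front of the nine original values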

Related

Print Values From 2D Numpy Array

I'm new to numpy and have read several other posts like mine but nothing is working for me.
I have a large array with many NaNs and I'd like to look at values that are not NaN.
flower_matrix = np.array([
[NaN,1,2,NaN,NaN,NaN,NaN,NaN,NaN,NaN,10,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN],
[0,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,12,13,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN],
[0,NaN,NaN,3,4,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN],
[NaN,NaN,2,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,22,23],
[NaN,NaN,2,NaN,NaN,5,6,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN],
[NaN,NaN,NaN,NaN,4,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,16,17,NaN,NaN,NaN,NaN,NaN,NaN],
[NaN,NaN,NaN,NaN,4,NaN,NaN,7,8,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,],
[NaN,NaN,NaN,NaN,NaN,NaN,6,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,18,19,NaN,NaN,NaN,NaN],
[NaN,NaN,NaN,NaN,NaN,NaN,6,NaN,NaN,9,10,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN],
[NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,8,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,20,21,NaN,NaN],
[0,NaN,NaN,NaN,NaN,NaN,NaN,NaN,8,NaN,NaN,11,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN],
[NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,10,NaN,NaN,NaN,14,15,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN],
[NaN,1,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,15,NaN,NaN,NaN,19,NaN,NaN,NaN,NaN],
[NaN,1,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,18,NaN,NaN,NaN,22,NaN],
[NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,11,NaN,NaN,NaN,NaN,NaN,17,NaN,NaN,NaN,21,NaN,NaN],
[NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,11,12,NaN,NaN,NaN,16,NaN,NaN,NaN,NaN,NaN,NaN,NaN],
[NaN,NaN,NaN,NaN,NaN,5,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,15,NaN,NaN,NaN,NaN,NaN,NaN,NaN,23],
[NaN,NaN,NaN,NaN,NaN,5,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,14,NaN,NaN,NaN,18,NaN,NaN,NaN,NaN,NaN],
[NaN,NaN,NaN,NaN,NaN,NaN,NaN,7,NaN,NaN,NaN,NaN,NaN,13,NaN,NaN,NaN,17,NaN,NaN,NaN,NaN,NaN,NaN],
[NaN,NaN,NaN,NaN,NaN,NaN,NaN,7,NaN,NaN,NaN,NaN,12,NaN,NaN,NaN,NaN,NaN,NaN,NaN,20,NaN,NaN,NaN],
[NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,9,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,19,NaN,NaN,NaN,23],
[NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,9,NaN,NaN,NaN,NaN,14,NaN,NaN,NaN,NaN,NaN,NaN,NaN,22,NaN],
[NaN,NaN,NaN,3,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,13,NaN,NaN,NaN,NaN,NaN,NaN,NaN,21,NaN,NaN],
[NaN,NaN,NaN,3,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,16,NaN,NaN,NaN,20,NaN,NaN,NaN]])
I know that I can do
print(flower_matrix[0,1])
to get the value 1.0. I'm looking to do something similar, but iterating through the rows and columns. My best guess is something like:
for i in flower_matrix:
    for j in flower_matrix:
        if (i,j) != NaN:
            print(i,j)
But of course this doesn't work. I have 24 columns and 24 rows and I want to iterate through each value and return the value if it is not NaN. Does this make sense?
Thanks in advance!
You can use numpy's isnan function instead of a != comparison, which does not work for NaN values: in Python, NaN == NaN gives False. (np.nan is np.nan does give True, but only because it is the same object, so is is not a reliable NaN test for values read out of an array.) This should help with your problem:
print(flower_matrix[~np.isnan(flower_matrix)])
If you want the iterative version (j == j is False only when j is NaN):
for i in flower_matrix:
    for j in i:
        if j == j:
            print(j)
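If you also need the row and column of each non-NaN value, here is a small sketch building on np.isnan (np.ndenumerate is just one convenient way to walk the indices):

for (row, col), value in np.ndenumerate(flower_matrix):
    if not np.isnan(value):
        print(row, col, value)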

how does this pandas code snippet work behind the scenes

data = credit_data[credit_data['CREDIT_LIMIT'].isna()]
This is a snippet from some code I was writing.
I wanted to print all the rows that contain NaN values in a particular column.
This code accomplishes that, but what I want to know is how it actually happens.
credit_data['CREDIT_LIMIT'].isna() returns a Series of bool values, so how does simply passing that Series to our dataframe (credit_data) give us all the rows that contain NaN values?
I have searched some blogs, the pandas documentation for DataFrame.isna(), and some answers on this site, but haven't found anything satisfactory.
It would be great if you could point me in the right direction, e.g. a blog post or an existing answer that already covers this query.
Thanks
credit_data['CREDIT_LIMIT'].isna() returns a Series of bool values, so how does
simply passing that Series to our dataframe (credit_data) give us all the rows
that contain NaN values?
By passing a boolean Series you are using a feature called boolean masking. It works by providing an iterable of bool values (which can be, but does not have to be, a Series) whose length equals the length of the DataFrame. Consider the following example:
import pandas as pd
df = pd.DataFrame({'letter':['A','B','C','D','E']})
mask = [True,False,True,False,True]
print(df[mask])
output
  letter
0      A
2      C
4      E
Note that this feature is also present in numpy, for example:
import numpy as np
arr = np.arange(25).reshape((5,5))
mask = [True,False,True,False,True]
print(arr[mask])
output
[[ 0 1 2 3 4]
[10 11 12 13 14]
[20 21 22 23 24]]
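Putting the two pieces together, here is a small sketch of what the snippet from the question does; the credit_data and CREDIT_LIMIT names come from the question, but the sample numbers are made up:

import numpy as np
import pandas as pd

# Made-up stand-in for the question's credit_data
credit_data = pd.DataFrame({'CREDIT_LIMIT': [1000.0, np.nan, 2500.0, np.nan]})

mask = credit_data['CREDIT_LIMIT'].isna()  # boolean Series: False, True, False, True
print(credit_data[mask])                   # keeps only the rows where the mask is True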

Is there a way to overwrite Nan values in a pandas dataframe with values of the previous row?

I am working with a dataframe called tabla_combinada that looks like this:
(Screenshot in the original post: structure of the dataframe used.)
What I am attempting to do is to get rid of the Nan values in the 'End Meter' column and replace it with the value of the same column in the previous row. I tried to implement the following code:
counter = 0
for x in tabla_combinada['End Meter']:
    if math.isnan(x):
        x = tabla_combinada['End Meter'][counter-1]
        tabla_combinada['End Meter'][counter-1] = tabla_combinada['Start Meter'][counter]
    counter = counter + 1
This is not working for me, in the first place I am getting the following warning:
A value is trying to be set on a copy of a slice from a DataFrame.
But what bugs me is that I am obtaining no change in the dataframe at all. I do understand the cause of the warning and I suspect that this is not the optimal approach to solve the problem. I guess there is a proper way to do this with loc, but I couldn't find out how to tell the program to replace the Nan with the value of the previous row.
Sorry for the long question and thanks in advance.
All you need to do is this:
tabla_combinada['End Meter'] = tabla_combinada['End Meter'].fillna(method='ffill')
This propagates the last non-null value forward. Note that fillna returns a new Series, so the result has to be assigned back to the column.
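A small sketch on made-up data (the column name comes from the question; in newer pandas versions the same operation is spelled .ffill()):

import numpy as np
import pandas as pd

tabla_combinada = pd.DataFrame({'End Meter': [10.0, np.nan, np.nan, 42.0]})

# fillna returns a new Series, so assign it back to the column
tabla_combinada['End Meter'] = tabla_combinada['End Meter'].fillna(method='ffill')

print(tabla_combinada['End Meter'].tolist())  # [10.0, 10.0, 10.0, 42.0]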

replace values by NAN

I've got a dataframe that looks like this:
[index, Data]
[1, [5,3,6,8,4,5,7, etc.]]
Each cell of my "Data" column holds an array. I need at least 75 values in each array, and the dataframe is 438 rows long.
I need to make a filter so that every array containing fewer than 75 values is replaced by NaN.
I thought of something like this:
for i in range(len(df_window)):
    if len(df_window['Data'][i][0])<75:
I don't know if this is right or how to continue. (The dataframe is called df_window.) Can someone help me quickly, please?
You can use lengths = df_window['Data'].apply(len) to get a Series of array lengths. Then df_window.loc[lengths < 75, 'Data'] = np.nan should give you what you want.
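A minimal sketch of that approach on made-up data (the df_window and Data names come from the question; the two sample arrays are invented):

import numpy as np
import pandas as pd

# Tiny stand-in for df_window: one array shorter than 75 values, one longer
df_window = pd.DataFrame({'Data': [np.arange(10), np.arange(80)]})

lengths = df_window['Data'].apply(len)        # Series of array lengths
df_window.loc[lengths < 75, 'Data'] = np.nan  # short arrays become NaN

print(df_window['Data'].isna().tolist())      # [True, False]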

Values being altered in numpy array

So I have a 2D numpy array (256,256), containing values between 0 and 10, which is essentially an image. I need to remove the 0 values and set them to NaN so that I can plot the array using a specific library (APLpy). However whenever I try and change all of the 0 values, some of the other values get altered, in some cases to 100 times their original value (no idea why).
The code I'm using is:
for index, value in np.ndenumerate(tex_data):
    if value == 0:
        tex_data[index] = 'NaN'
where tex_data is the data array from which I need to remove the zeros. Unfortunately I can't just use a mask for the values I don't need, as APLpy won't accept masked arrays as far as I can tell.
Is there anyway I can set the 0 values to NaN without changing the other values in the array?
Use boolean (fancy) indexing, like this:
tex_data[tex_data == 0] = np.nan
I don't know why your original code was failing (make sure tex_data has a float dtype, since NaN cannot be stored in an integer array); it looks correct to me, although terribly inefficient.
By floating-point rules,
tex_data / tex_data * tex_data
also does the job here: 0/0 evaluates to NaN (with a RuntimeWarning), and every other value is left unchanged.
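A short sketch of the boolean-indexing fix on a toy array (the real tex_data is the 256x256 image from the question; the array must have a float dtype, because NaN cannot be stored in an integer array):

import numpy as np

tex_data = np.array([[0.0, 3.5, 0.0],
                     [7.2, 0.0, 9.9]])

tex_data[tex_data == 0] = np.nan  # every zero becomes NaN, nothing else changes
print(tex_data)
# [[nan 3.5 nan]
#  [7.2 nan 9.9]]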
