I'm working on Python 3.x. I have a pandas DataFrame with only one column, student. At the 501st row, student contains NaN.
df.at[501,'student'] returns nan
To remove this I used the following code:
df.at['student'].replace('', np.nan, inplace=True)
But after that I'm still getting NaN for df.at[501,'student'].
I'm also using df in a for loop to check the value of student and apply some business logic, but with inplace=True I'm getting KeyError: 501.
Can you suggest how I can remove the NaN and still use df in a for loop to check the student value?
Adding another answer since it's a completely different case.
I think you are not looping correctly over the DataFrame; it looks like you are relying on the index of the DataFrame when you should loop over the items row by row, or preferably use df.apply.
If you still want to loop over the items and you don't care about the previous index, you can reset the index with df.reset_index(drop=True):
df['student'] = df['student'].replace('', np.nan)
df = df.dropna(subset=['student'])
df = df.reset_index(drop=True)
# do your loop here
Your problem is that you are dropping the item at index 501 and then trying to access it; when you drop items, pandas doesn't automatically renumber the index.
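A minimal sketch (with made-up data) of what happens to the index when a row is dropped, and how reset_index repairs it:

```python
import pandas as pd

# Toy frame (made-up data) to show why dropping a row leaves a gap in the index.
df = pd.DataFrame({'student': ['ann', 'bob', 'cat']})
df = df.drop(1)                  # drop the row labelled 1
print(list(df.index))            # [0, 2] -- the label is gone, not renumbered

df = df.reset_index(drop=True)   # renumber 0..n-1
print(list(df.index))            # [0, 1]
```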
The replace function you are using replaces the first parameter with the second.
If you want to replace np.nan with an empty string, you have to do:
df['student'].replace(np.nan, '', inplace=True)
But this would not remove the row; it would just replace the value with an empty string. What you want is:
df.dropna(subset=['student'], inplace=True)
But you have to do this before looping over the elements; don't call dropna inside the loop.
It would be helpful to know what exactly you are doing in the loop.
One way to remove the rows that contain NaN values in the "student" column is:
df = df[~df['student'].isnull()]
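For example, with a toy frame (the names are made up), the boolean mask keeps only the non-null rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'student': ['ann', np.nan, 'bob']})
df = df[~df['student'].isnull()]   # keep only rows where 'student' is not NaN
print(df['student'].tolist())      # ['ann', 'bob']
```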
I want to be able to remove rows that are empty in both the columns NymexPlus and NymexMinus.
Right now the code I have is:
df.dropna(subset=['NymexPlus'], inplace=True)
The thing about this code is that it will also delete rows that still have a value in the column NymexMinus, which I don't want to happen.
Is there an if/and-style condition that only gets rid of rows where the cells are empty in both of the columns?
Use a list as the subset parameter together with how='all':
df.dropna(subset=['NymexPlus', 'NymexMinus'], how='all', inplace=True)
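A small sketch with made-up numbers showing the difference how='all' makes: a row is dropped only when both listed columns are NaN:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'NymexPlus':  [1.0, np.nan, np.nan],
    'NymexMinus': [2.0, 5.0,    np.nan],
})
# how='all' drops a row only when BOTH listed columns are NaN,
# so the middle row (NymexMinus still has a value) survives.
df = df.dropna(subset=['NymexPlus', 'NymexMinus'], how='all')
print(len(df))  # 2 -- only the last row was removed
```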
With my df, I dropped index 3116:
df = df.drop(df.index[3116], axis=0)
However, when I try to use a for loop over the rows of the df later, there's an error at 3116. Not sure why? Was it not dropped correctly? When I use df.info() there is one less row, so I would think it worked, but later there's an error:
for i in range(df['ever_married'].count()):
    if df['ever_married'][i] == 'Yes':
        df['ever_married'][i] = 1
    elif df['ever_married'][i] == 'No':
        df['ever_married'][i] = 0
This brings:
'KeyError: 3116'
However, when I add this before the first if block in the for loop:
if i == 3116:
    pass
The error goes away, but the code doesn't do what I want, which is converting all the values from object to int.
How can I fix this? Thank you!
If you drop the index at that position, then the dataframe has a non-contiguous index. Later, when you loop over it, you make the assumption that the index is contiguous:
for i in range(df['ever_married'].count()):
This will loop from 0 to the number of rows in your DataFrame, and does not skip any dropped index labels. There are four fixes you could choose from here:
Get rid of the loop. Series.map() could be applied to this problem, like so:
df['ever_married'] = df['ever_married'].map({'No': 0, 'Yes': 1})
This is both faster and more robust. It replaces No with 0, and Yes with 1, everywhere in the column.
Index using .iloc[] instead of index labels. The .iloc[] indexer selects by position within the Series or DataFrame, rather than by label.
Example of how to set the value in column "ever_married" at position i to 1:
df.iloc[i, df.columns.get_loc('ever_married')] = 1
Restore a contiguous index using .reset_index(). DataFrame.reset_index() can reset the index so that it is contiguous and does not skip any numbers.
Example:
df = df.reset_index(drop=True)
Use for i in df.index: so the loop skips over dropped index labels.
Of the four solutions, I would suggest solution 1.
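To illustrate solution 1 with a toy frame whose index already has a gap (mimicking the dropped row), .map() is unaffected by the missing label:

```python
import pandas as pd

# Index [0, 2, 3] mimics a frame where label 1 was dropped.
df = pd.DataFrame({'ever_married': ['Yes', 'No', 'Yes']}, index=[0, 2, 3])
df['ever_married'] = df['ever_married'].map({'No': 0, 'Yes': 1})
print(df['ever_married'].tolist())  # [1, 0, 1]
```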
I have the following Dataframe
Date A AAPL FB GOOG MSFT WISE.L
2021-10-15 153.270004 144.839996 324.76001 2833.5 304.209991 900.0
I am trying to write code that will check whether any of df.columns end with ".L", and then change those columns' values. For example, in the df above I want to reach 900.0 and change it.
Note: there can be many column names containing ".L", with different names depending on user input, so I'll need to fetch all of them and change them at once.
Is it possible to do this, or should I find a different way to do it?
-- Editing my question after #Kosmos' suggestion
Create a list of the columns which end with ".L":
col_list = [col for col in df.columns if col.endswith(".L")]
#Kosmos' suggestion works well, so I tweaked it to:
for col in df.columns:
    if col.endswith(".L"):
        # do something
In the # do something spot I'll need to convert the values stored in the ".L" columns (convert the numbers to USD), which I already know how to do. The issue is: how can I access and change them on the frame without extracting and inserting them again?
Create a list of the columns which end with ".L":
col_list = [col for col in df.columns if col.endswith(".L")]
The following operation gives a frame with only the columns which end with ".L":
df.loc[:, col_list]
After update
1st Solution
I see your problem. The list comprehension I suggested is not immediately suitable (it could be fixed using a custom function), but I think you are very close with your new suggestion. Editing the df column-wise can be done as such:
for col in df.columns:
    if col.endswith(".L"):
        df.loc[:, col] = df.loc[:, col] * arbitrary_value
2nd solution
Note that if all columns in col_list are scaled by the same value (e.g. converting to USD), the following can also be done:
df.loc[:, col_list] = df.loc[:, col_list] * arbitrary_value
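Putting both pieces together as a runnable sketch; the frame contents and the conversion rate below are made-up placeholder values:

```python
import pandas as pd

df = pd.DataFrame({'AAPL': [144.84], 'GOOG': [2833.5], 'WISE.L': [900.0]})
rate = 1.25  # placeholder conversion rate, not a real quote

col_list = [col for col in df.columns if col.endswith('.L')]
df.loc[:, col_list] = df.loc[:, col_list] * rate   # scale only the '.L' columns
print(df['WISE.L'].iloc[0])  # 1125.0
```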
I have a dataframe with accident data of some streets:
I'd like to remove (or at least select) the first row, which is indexed as np.nan. I tried streets.loc[np.nan, :], but that returns KeyError: nan. I'm not sure how else to specifically select that record.
Other than using pd.DataFrame.iloc[0, :] (which is imprecise, as it relies on position rather than index label), how can I select that specific record?
I think there are two options.
You can fill the NaN index label with some placeholder value and then select it:
streets.index = streets.index.fillna('random')
streets.loc['random', :]
Alternatively, assign another index column, but this can affect your DataFrame later.
You can do df = df.dropna()
This will remove all rows with at least one NaN value (note that it looks at the values, not the index).
Optionally, you could also do df.dropna(inplace=True). The inplace parameter just means that you don't have to write df = df.dropna(); it will modify the original variable for you.
You can find more info on this here: pandas.DataFrame.dropna
I would do:
df = df[df.index.notna()]
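A small sketch (the street names are invented) showing the notna() filter dropping the NaN-indexed row:

```python
import numpy as np
import pandas as pd

streets = pd.DataFrame({'accidents': [7, 3, 5]},
                       index=[np.nan, 'MAIN ST', 'BROADWAY'])
streets = streets[streets.index.notna()]   # drop rows whose index label is NaN
print(list(streets.index))  # ['MAIN ST', 'BROADWAY']
```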
I want to iterate over each row, check each column for NaN values, and if a value is NaN, replace it with the previous non-null value in the same row.
I believe the preferred way would be a lambda function, but I still couldn't figure out how to code it.
Note: I have thousands of rows and 200 columns per row.
The following should do the job:
df.ffill(axis=1, inplace=True)
Can you please clarify what you want to be done with NaNs in first column(s)?
I think you can use this:
your_df.apply(lambda x: x.ffill(), axis=1)
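A quick sketch with made-up numbers of the row-wise forward fill both answers describe; note the leading NaN in the second row stays, since there is no value to its left:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.0, np.nan, np.nan, 4.0],
                   [np.nan, 2.0, np.nan, np.nan]])
filled = df.ffill(axis=1)          # copy the last non-null value to the right
print(filled.iloc[0].tolist())     # [1.0, 1.0, 1.0, 4.0]
print(filled.iloc[1].tolist())     # [nan, 2.0, 2.0, 2.0]
```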