I want to iterate over each row and check each column: if the value is NaN, I want to replace it with the previous non-null value in the same row.
I believe the preferred way would be a lambda function, but I still haven't figured out how to code it.
Note: I have thousands of rows, each with 200 columns.
The following should do the job (fillna(method='ffill') is deprecated in recent pandas; ffill is the current spelling):
df.ffill(axis=1, inplace=True)
Can you please clarify what you want done with NaNs in the first column(s)?
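A minimal sketch of the behavior on toy data (note that NaNs in the first column have nothing to their left, so they stay NaN):
import numpy as np
import pandas as pd

df = pd.DataFrame([[np.nan, 2.0, np.nan, 4.0],
                   [1.0, np.nan, np.nan, 8.0]])
df.ffill(axis=1, inplace=True)
print(df)
#      0    1    2    3
# 0  NaN  2.0  2.0  4.0
# 1  1.0  1.0  1.0  8.0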
I think you can use this:
your_df.apply(lambda x: x.ffill(), axis=1)
Note that a row-wise apply is much slower than the vectorized df.ffill(axis=1) on wide frames.
I want to remove rows that are empty in both the NymexPlus and NymexMinus columns.
Right now the code I have is:
df.dropna(subset=['NymexPlus'], inplace=True)
The problem with this code is that it drops rows based on NymexPlus alone, even when NymexMinus has a value, which I don't want to happen.
Is there an if/AND-style condition that only removes rows where the cells are empty in both columns?
Use a list as the subset parameter together with how='all':
df.dropna(subset=['NymexPlus', 'NymexMinus'], how='all', inplace=True)
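A small sketch of the difference on toy data: with how='all', a row is dropped only when both columns are NaN, whereas the default how='any' would also drop rows where just one of them is NaN.
import numpy as np
import pandas as pd

df = pd.DataFrame({'NymexPlus': [1.0, np.nan, np.nan],
                   'NymexMinus': [np.nan, 2.0, np.nan]})
df.dropna(subset=['NymexPlus', 'NymexMinus'], how='all', inplace=True)
print(df)
#    NymexPlus  NymexMinus
# 0        1.0         NaN
# 1        NaN         2.0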
I am trying to get the column names that match a particular string, "Buy", for each row of a DataFrame (df). So if there are 2 columns matching "Buy", both column names should be included.
I have tried the code below, which works correctly but takes a long time to execute. Is there a way to improve its performance? I can see the apply statement is the bottleneck. I have heard about vectorization/swifter apply to improve performance, but I am not able to work out how to apply it to my specific requirement.
Step 1:
Get the column names that contain '_buy_sell':
Col_Buy_Sell = [col for col in df.columns if '_buy_sell' in col]
Step 2:
For the columns selected in step 1, if a value is 'Buy', add that column's name to the field 'Final_Buy'; multiple column names are separated by ",":
df['Final_Buy'] = (df[Col_Buy_Sell] == 'Buy').apply(lambda y: ','.join(df[Col_Buy_Sell].columns[y]), axis=1)
Thanks in advance!
If you want to populate a new column with the names of all the columns in each row that have the value 'Buy', try this:
df['Final_Buy'] = df.eq('Buy').dot(df.columns + ',').str.rstrip(', ')
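Applied to the setup above, restricted to the _buy_sell columns from step 1 (the column names here are made up for illustration), a sketch:
import pandas as pd

df = pd.DataFrame({'a_buy_sell': ['Buy', 'Sell'],
                   'b_buy_sell': ['Buy', 'Buy'],
                   'price': [10, 20]})
Col_Buy_Sell = [col for col in df.columns if '_buy_sell' in col]
mask = df[Col_Buy_Sell].eq('Buy')
df['Final_Buy'] = mask.dot(pd.Index(Col_Buy_Sell) + ',').str.rstrip(',')
print(df['Final_Buy'])
# 0    a_buy_sell,b_buy_sell
# 1               b_buy_sell
The dot trick works because multiplying a boolean by a string keeps the string for True and yields an empty string for False, and the dot product then concatenates the survivors.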
I have a dataframe with accident data of some streets:
I'd like to remove (or at least select) that first row that is indexed as np.nan. I tried streets.loc[np.nan,:] but that returns a KeyError: nan. I'm not sure how else to specifically select that record.
Other than using pd.DataFrame.iloc[0,:] (which is imprecise, as it relies on location rather than index name), how can I select that specific record?
I think there are two options.
You can fill the NaN in the index with a placeholder value and then select it. Since fillna works on columns rather than the index, move the index into a column first (assuming the index is the 'ON STREET NAME' column):
streets = streets.reset_index()
streets['ON STREET NAME'] = streets['ON STREET NAME'].fillna('random')
streets = streets.set_index('ON STREET NAME')
streets.loc['random', :]
Alternatively, assign another index column, but this can affect your dataframe later.
You can do df = df.dropna()
This will remove all rows with at least one NaN value.
Optionally, you could also do df.dropna(inplace=True). The inplace parameter just means you don't have to reassign with df = df.dropna(); it modifies the original variable for you.
You can find more info on this here: pandas.DataFrame.dropna
I would do
df = df[df.index.notna()]
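To select that row instead of removing it, the complementary mask works (a sketch using the same idea):
# select the row(s) whose index is NaN
streets[streets.index.isna()]
# remove them
streets = streets[streets.index.notna()]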
I'm working in Python 3.x. I have a pandas DataFrame with only one column, student. At the 501st row, student contains NaN.
df.at[501,'student'] returns NaN
To remove this I used the following code:
df.at['student'].replace('', np.nan, inplace=True)
But after that I'm still getting NaN for df.at[501,'student']
I also tried this
df['student'].dropna(inplace=True)
But I'm using df in a for loop to check the value of student and apply some business logic, and with inplace=True I'm getting KeyError: 501.
Can you suggest how to remove the NaN and still use df in a for loop to check the student value?
Adding another answer since it's completely a different case.
I think you are not looping over the DataFrame correctly; it looks like you are relying on the DataFrame's index, when you should loop over the rows themselves or, preferably, use df.apply.
If you still want to loop over the items and you don't care about the previous index, you can reset the index with df.reset_index(drop=True):
df['student'] = df['student'].replace('', np.nan)  # turn empty strings into NaN
df = df.dropna(subset=['student'])                 # drop those rows from the frame itself
df = df.reset_index(drop=True)                     # renumber 0..n-1 so no index gaps remain
# do your loop here
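With the index reset, a positional loop is safe (a sketch; the business-logic body is a stand-in):
for i in range(len(df)):
    value = df.at[i, 'student']
    # ... apply your business logic to value ...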
Your problem is that you are dropping the item at index 501 and then trying to access it; when you drop items, pandas doesn't automatically renumber the index.
The replace function you used replaces the first parameter with the second.
If you want to replace np.nan with an empty string, you would have to do
df['student'] = df['student'].replace(np.nan, '')
but this would not remove the row; it would just replace the value with an empty string. What you want is
df = df.dropna(subset=['student'])
but you have to do this before looping over the elements; don't dropna inside the loop.
It would be helpful to know what exactly you are doing in the loop.
One way to remove the rows that contain NaN values in the "student" column is
df = df[~df['student'].isnull()]
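Equivalently, notna gives the same mask without the negation:
df = df[df['student'].notna()]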
I have a dataframe with 7 columns and ~5,000 rows. I want to check that all the column values in a row are in my list and, if so, either add the row to a new dataframe OR remove the rows where not all values match, i.e. remove the false rows (whichever is easiest):
for i, row in df.iterrows():        # iterate over the rows
    for col, value in row.items():  # and over each cell in the row
        if value in MyList:
            ...  # *something*
I could imagine that .apply and .all could be used, but I'm afraid my Python skills are limited. Any help?
If I understood correctly, you can solve this by using apply with a lambda expression:
df.loc[df.apply(lambda row: all(value in MyList for value in row), axis=1)]
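For completeness, a runnable sketch on toy data (MyList stands in for the list from the question), including the equivalent vectorized form, which avoids the Python-level loop:
import pandas as pd

MyList = ['a', 'b', 'c']
df = pd.DataFrame({'x': ['a', 'a', 'z'], 'y': ['b', 'q', 'c']})

# apply-based version from the answer above
kept = df.loc[df.apply(lambda row: all(value in MyList for value in row), axis=1)]
# same result, vectorized
kept_fast = df[df.isin(MyList).all(axis=1)]
print(kept)
#    x  y
# 0  a  b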