This question already has answers here:
Pandas: drop columns with all NaN's
(4 answers)
Closed 4 years ago.
In earlier versions of pandas, you could drop empty columns simply with:
df.dropna(axis='columns')
However, dropna has been deprecated in later builds. How would one now drop multiple empty columns from a dataframe (without specifically indexing them)?
I am able to drop empty columns using dropna() with the current version of Pandas (0.23.4). The code I used is:
df.dropna(how='all', axis=1)
It looks like what is deprecated is passing multiple axes at once (i.e. df.dropna(how='all', axis=[0, 1])). You can read here that they made this decision: "let's deprecate passing multiple axes, we don't do this for any other pandas functions".
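To make the difference concrete, here is a minimal sketch on a made-up frame (the column names a, b and c are hypothetical). With the default how='any', a column is dropped as soon as it contains one NaN, whereas how='all' drops only columns that are entirely NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "b" is partly empty, "c" is entirely empty
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [1.0, np.nan, 3.0],
    "c": [np.nan, np.nan, np.nan],
})

# Default how='any': drops every column containing at least one NaN
any_dropped = df.dropna(axis="columns")

# how='all': drops only the columns in which every value is NaN
all_dropped = df.dropna(how="all", axis=1)

print(list(any_dropped.columns))
print(list(all_dropped.columns))
```

Neither call modifies df itself; each returns a new DataFrame.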
You can get the columns that are not null and then filter your DataFrame on those.
Here's an example:
non_null_columns = [col for col in df.columns if df.loc[:, col].notna().any()]
df[non_null_columns]
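The same filter can also be written without a Python-level loop. This is only a sketch on hypothetical data: df.notna().any() gives one boolean per column, which loc accepts as a column mask.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "b" holds no values at all
df = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, np.nan]})

# Boolean Series indexed by column name: True where the column has any non-null value
keep = df.notna().any()

# Keep all rows, and only the columns flagged True
filtered = df.loc[:, keep]
print(list(filtered.columns))
```

On this frame the result matches df.dropna(how='all', axis=1).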
This question already has answers here:
Reversing 'one-hot' encoding in Pandas
(9 answers)
Closed 1 year ago.
I've been trying to use reverse explode from here: How to implode(reverse of pandas explode) based on a column
But my df is a little different.
I have df looking like this:
I need to 'reverse explode' it, but I couldn't find any option to groupby by index. Is there any option to do that?
To be precise, I need all columns to remain, but all '1' should be combined in a row.
I merged dummy df with main df, but can not figure out what to do next.
rest_cuisine_style = pd.concat([rest_cuisine_style, cuisine_dummies], axis=1)
Does this work?
rest_cuisine_style = rest_cuisine_style.idxmax(axis=1)
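As a hedged illustration (the cuisine columns and row labels below are made up, since the original frame is not shown): idxmax(axis=1) returns only the first column holding the row maximum, so it recovers a single label per row. If a row can carry several 1s, each row's labels have to be collected explicitly.

```python
import pandas as pd

# Hypothetical one-hot columns; row "r3" has two styles set
cuisine_dummies = pd.DataFrame(
    {"French": [1, 0, 1], "Italian": [0, 1, 1], "Thai": [0, 0, 0]},
    index=["r1", "r2", "r3"],
)

# One label per row: the first column that holds the row maximum
first_label = cuisine_dummies.idxmax(axis=1)

# Every label per row: pick the column names where the value equals 1
all_labels = pd.Series(
    [list(cuisine_dummies.columns[row == 1]) for row in cuisine_dummies.to_numpy()],
    index=cuisine_dummies.index,
)

print(first_label.tolist())
print(all_labels.tolist())
```

Note that idxmax is applied to the dummy columns only; calling it on a frame that also contains text columns would likely fail or give meaningless results.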
This question already has answers here:
How to select all columns whose names start with X in a pandas DataFrame
(11 answers)
Closed 2 years ago.
I manually select the columns in a pandas dataframe using
df_final = df[['column1','column2'.......'column90']]
Instead, I build the list of column names with:
dp_col = [col for col in df if col.startswith('column')]
But not sure how to use this list to get only those set of columns from the source dataframe.
You can use this as the list of columns to select, so:
df_final = df[[col for col in df if col.startswith('column')]]
The "origin" of the list of strings is of no importance; as long as you pass a list of strings to the subscript, this will normally work.
Use loc access with boolean masking:
df.loc[:, df.columns.str.startswith('column')]
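A small sketch comparing the two approaches above on a hypothetical frame, plus DataFrame.filter with an anchored regex as a third option. The column named my_column is included to show that all three match on the prefix only.

```python
import pandas as pd

# Hypothetical frame: two columns with the prefix, two without
df = pd.DataFrame({"column1": [1], "column2": [2], "other": [3], "my_column": [4]})

# List of names passed to the subscript
by_list = df[[col for col in df if col.startswith("column")]]

# Boolean mask over the column index
by_mask = df.loc[:, df.columns.str.startswith("column")]

# Regex anchored at the start of the name
by_regex = df.filter(regex=r"^column")

print(list(by_list.columns))
```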
This question already has answers here:
Difference between df.reindex() and df.set_index() methods in pandas
(3 answers)
Closed 2 years ago.
I was losing data on a reindex. I just wanted to make an existing column the index.
So this works:
df_all_maa = df_all_maa.set_index("VERSION_SEQ")
Originally I was doing this:
df_all_maa = df_all_maa.reindex(df_all_maa["VERSION_SEQ"])
I think what was happening was I was only getting values in the resulting dataframe, where the VERSION_SEQ value happened to match the numeric default index, but I would be interested to know what my original incorrect syntax was actually doing.
reindex is similar to loc, but it allows labels that do not exist in the index. reindex creates a row of NaN values for each label that is missing from the index, while loc would raise an error.
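A minimal sketch of what the original reindex call was doing, using made-up values for VERSION_SEQ and a hypothetical val column. reindex treats the column's values as labels to look up in the existing default index (0, 1, 2), so rows are reordered where a label happens to match and filled with NaN where it does not.

```python
import pandas as pd

# Hypothetical data with the default integer index 0, 1, 2
df = pd.DataFrame({"VERSION_SEQ": [2, 0, 5], "val": ["a", "b", "c"]})

# Looks up labels 2, 0 and 5 in the existing index; label 5 does not exist
reindexed = df.reindex(df["VERSION_SEQ"])

# Moves the column into the index and keeps every row as it was
indexed = df.set_index("VERSION_SEQ")

print(reindexed)
print(indexed)
```

In reindexed, the row labelled 2 holds the data that originally sat at position 2 (val "c"), and the row labelled 5 is all NaN. In indexed, the row labelled 5 still holds val "c".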
This question already has answers here:
Python: How to drop a row whose particular column is empty/NaN?
(2 answers)
How do I select rows from a DataFrame based on column values?
(16 answers)
Closed 4 years ago.
I'm new to python pandas. Need some help with deleting a few rows where there are null values. In the screenshot, I need to delete rows where charge_per_line == "-" using python pandas.
If the relevant entries in Charge_Per_Line are empty (NaN) when you read into pandas, you can use df.dropna:
df = df.dropna(axis=0, subset=['Charge_Per_Line'])
If the values are genuinely -, then you can replace them with np.nan and then use df.dropna:
import numpy as np
df['Charge_Per_Line'] = df['Charge_Per_Line'].replace('-', np.nan)
df = df.dropna(axis=0, subset=['Charge_Per_Line'])
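Here is the same idea as a runnable sketch. The data is invented (the Line column and the charge values are assumptions, since the screenshot is not available), and the charges are kept as strings the way they would arrive if the column contained "-".

```python
import numpy as np
import pandas as pd

# Hypothetical data: one row with "-" and one that is already NaN
df = pd.DataFrame({
    "Line": [1, 2, 3, 4],
    "Charge_Per_Line": ["10.5", "-", np.nan, "7"],
})

# Turn the placeholder into a real missing value
df["Charge_Per_Line"] = df["Charge_Per_Line"].replace("-", np.nan)

# Drop rows where this one column is missing; other columns are not checked
cleaned = df.dropna(axis=0, subset=["Charge_Per_Line"])

print(cleaned["Line"].tolist())
```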
Multiple ways
Use str.contains to find rows containing '-'
df[~df['Charge_Per_Line'].str.contains('-', na=False)]
Replace '-' by nan and use dropna()
df.replace('-', np.nan, inplace = True)
df = df.dropna()
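One caveat worth illustrating, on hypothetical values: str.contains matches substrings, so it also removes entries such as negative amounts stored as text. If only cells that are exactly "-" should go, a plain comparison is a possible alternative.

```python
import pandas as pd

# Hypothetical data: "-3.2" contains a dash but is not the placeholder
df = pd.DataFrame({"Charge_Per_Line": ["10.5", "-", "-3.2", "7"]})

# Substring match: removes both "-" and "-3.2"
substring = df[~df["Charge_Per_Line"].str.contains("-")]

# Exact match: removes only the cells equal to "-"
exact = df[df["Charge_Per_Line"] != "-"]

print(substring["Charge_Per_Line"].tolist())
print(exact["Charge_Per_Line"].tolist())
```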
This question already has answers here:
Delete a column from a Pandas DataFrame
(20 answers)
Closed 6 years ago.
I have imported an Excel sheet into pandas. It has 7 columns which are numeric and 1 column which is a string (a flag).
After converting the flag to a categorical variable, I am trying to drop the string column from the Pandas dataframe. However, I am not able to do it.
Here's the code:
[In] parts_median_temp.columns
[Out] Index([u'PART_NBR', u'PRT_QTY', u'PRT_DOL', u'BTS_QTY', u'BTS_DOL', u'Median', u'Upper_Limit', u'Flag_median'], dtype='object')
The column I'm trying to drop is 'Flag_median'.
[In] parts_median_temp.drop('Flag_median')
[Out] ...ValueError: labels ['Flag_median'] not contained in axis
Help me drop the Flag_median column from the Pandas dataframe.
You need to pass axis=1 so that drop looks for the label among the columns; inplace=True is optional and modifies the DataFrame in place:
parts_median_temp.drop('Flag_median', axis=1, inplace=True)
The default value of inplace is False, and the default for axis is 0. axis=0 drops rows by index label, whereas axis=1 drops columns.
You can try this:
parts_median_temp = parts_median_temp.drop('Flag_median', axis=1)
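A short sketch on a cut-down, hypothetical version of the frame (only three of the columns, with invented values). It shows why the original call fails and two equivalent ways of dropping the column. The exact exception type differs between pandas versions, so both are caught here.

```python
import pandas as pd

# Hypothetical subset of the original columns
parts_median_temp = pd.DataFrame({
    "PART_NBR": [101, 102],
    "Median": [5.0, 7.5],
    "Flag_median": ["Y", "N"],
})

# With the default axis=0, drop searches the row index for the label
try:
    parts_median_temp.drop("Flag_median")
    raised = False
except (KeyError, ValueError):
    raised = True

# Either form returns a new DataFrame without the column
by_axis = parts_median_temp.drop("Flag_median", axis=1)
by_keyword = parts_median_temp.drop(columns="Flag_median")

print(raised, list(by_axis.columns))
```

Because inplace is left at its default, parts_median_temp still contains Flag_median until the result is assigned back.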