This question already has answers here:
Delete a column from a Pandas DataFrame
(20 answers)
Closed 6 years ago.
I have imported an Excel sheet into pandas. It has 7 numeric columns and 1 string column (a flag).
After converting the flag to a categorical variable, I am trying to drop the string column from the Pandas dataframe. However, I am not able to do it.
Here's the code:
[In] parts_median_temp.columns
[Out] Index([u'PART_NBR', u'PRT_QTY', u'PRT_DOL', u'BTS_QTY', u'BTS_DOL', u'Median', u'Upper_Limit', u'Flag_median'], dtype='object')
The column I'm trying to drop is 'Flag_median'.
[In] parts_median_temp.drop('Flag_median')
[Out] ...ValueError: labels ['Flag_median'] not contained in axis
Help me drop the Flag_median column from the Pandas dataframe.
You have to pass the axis parameter (and, optionally, inplace):
parts_median_temp.drop('Flag_median', axis=1, inplace=True)
The default value of inplace is False, and the default axis is 0. axis=0 drops rows by index label, whereas axis=1 drops columns.
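As a minimal sketch (the toy column names stand in for the question's DataFrame):

```python
import pandas as pd

# Toy frame standing in for parts_median_temp
df = pd.DataFrame({'PART_NBR': [1, 2], 'Flag_median': ['hi', 'lo']})

# axis=1 targets columns; without inplace=True a new frame is returned
df = df.drop('Flag_median', axis=1)
print(list(df.columns))  # ['PART_NBR']
```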
You can try this:
parts_median_temp = parts_median_temp.drop('Flag_median', axis=1)
This question already has answers here:
How to select all columns whose names start with X in a pandas DataFrame
(11 answers)
Closed 2 years ago.
I manually select the columns in a pandas dataframe using
df_final = df[['column1','column2'.......'column90']]
Instead I provide the list of column names in a list by
dp_col = [col for col in df if col.startswith('column')]
But not sure how to use this list to get only those set of columns from the source dataframe.
You can use this as the list of columns to select, so:
df_final = df[[col for col in df if col.startswith('column')]]
The "origin" of the list of strings does not matter: as long as you pass a list of strings to the subscript, this will work.
Use .loc with a boolean mask:
df.loc[:, df.columns.str.startswith('column')]
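A small self-contained sketch of both approaches, with made-up column names:

```python
import pandas as pd

df = pd.DataFrame({'column1': [1], 'column2': [2], 'other': [3]})

# List-comprehension subscript
sub1 = df[[col for col in df if col.startswith('column')]]

# Boolean mask with .loc
sub2 = df.loc[:, df.columns.str.startswith('column')]

print(list(sub1.columns))  # ['column1', 'column2']
print(sub1.equals(sub2))   # True
```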
This question already has answers here:
how to sort pandas dataframe from one column
(13 answers)
Closed 2 years ago.
I have a column in my csv file that I want to have sorted by the datetime. It's in the format like 2020-10-06 03:28:00. I tried doing it like this but nothing seems to have happened.
df = pd.read_csv('data.csv')
df = df.sort_index()
df.to_csv('btc.csv', index= False)
I need to keep index=False in the .to_csv call so the file is formatted properly for later use, so I can't remove that if it is causing an issue. dtime is the first column in the CSV, and the second column is a Unix timestamp, so I could use that instead if it would work better.
Use sort_values(by=column_name) to sort a pandas.DataFrame by the contents of the column named column_name. If the column is stored in another format, first convert it to datetime using pandas.to_datetime(arg), with arg as the column of dates.
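Putting that together, here is a sketch using an in-memory stand-in for the CSV (the column name 'dtime' is an assumption based on the question; the sample rows are illustrative):

```python
import pandas as pd

# Stand-in for pd.read_csv('data.csv'); 'dtime' is an assumed column name
df = pd.DataFrame({'dtime': ['2020-10-06 03:28:00', '2020-10-05 12:00:00'],
                   'unix_ts': [1601954880, 1601899200]})

# Parse the strings as datetimes, then sort by that column
df['dtime'] = pd.to_datetime(df['dtime'])
df = df.sort_values(by='dtime')

print(df['dtime'].iloc[0])  # 2020-10-05 12:00:00
# df.to_csv('btc.csv', index=False)  # index=False preserved as required
```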
This question already has answers here:
Pandas: drop columns with all NaN's
(4 answers)
Closed 4 years ago.
In earlier versions of pandas, you could drop empty columns simply with:
df.dropna(axis='columns')
However, dropna has been deprecated in later builds. How would one now drop multiple (without specifically indexing) empty columns from a dataframe?
I am able to drop empty columns using dropna() with the current version of Pandas (0.23.4). The code I used is:
df.dropna(how='all', axis=1)
It looks like what is deprecated is passing multiple axes at once (i.e. df.dropna(how='all', axis=[0, 1])). You can read here that they made this decision: "let's deprecate passing multiple axes, we don't do this for any other pandas functions".
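A quick self-contained sketch of the single-axis call, with made-up data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [np.nan, np.nan], 'c': [3, np.nan]})

# how='all' drops only columns whose every value is NaN;
# column 'c' is kept because it has at least one non-null value
out = df.dropna(how='all', axis=1)
print(list(out.columns))  # ['a', 'c']
```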
You can get the columns that are not null and then filter your DataFrame on those.
Here's an example
non_null_columns = [col for col in df.columns if df.loc[:, col].notna().any()]
df[non_null_columns]
This question already has answers here:
Python: How to drop a row whose particular column is empty/NaN?
(2 answers)
How do I select rows from a DataFrame based on column values?
(16 answers)
Closed 4 years ago.
I'm new to Python pandas and need some help deleting a few rows where there are null values. In the screenshot, I need to delete the rows where charge_per_line == "-" using pandas.
If the relevant entries in Charge_Per_Line are empty (NaN) when you read into pandas, you can use df.dropna:
df = df.dropna(axis=0, subset=['Charge_Per_Line'])
If the values are genuinely -, then you can replace them with np.nan and then use df.dropna:
import numpy as np
df['Charge_Per_Line'] = df['Charge_Per_Line'].replace('-', np.nan)
df = df.dropna(axis=0, subset=['Charge_Per_Line'])
There are multiple ways.
Use str.contains to find rows containing '-' and keep the rest:
df[~df['Charge_Per_Line'].str.contains('-')]
Replace '-' by nan and use dropna()
df.replace('-', np.nan, inplace=True)
df = df.dropna()
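A minimal sketch of the replace-then-drop approach (sample values are made up; subset= limits the drop to the one column, which is safer than a bare dropna()):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Charge_Per_Line': ['1.5', '-', '2.0'],
                   'other': ['x', 'y', 'z']})

# Replace the placeholder '-' with NaN, then drop only rows where
# Charge_Per_Line is null
df = df.replace('-', np.nan).dropna(subset=['Charge_Per_Line'])
print(len(df))  # 2
```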
This question already has answers here:
Selecting multiple columns in a Pandas dataframe
(22 answers)
Closed 5 years ago.
How do you print (in the terminal) a subset of columns from a pandas dataframe?
I don't want to remove any columns from the dataframe; I just want to see a few columns in the terminal to get an idea of how the data is pulling through.
Right now, I have print(df2.head(10)), which prints the first 10 rows of the dataframe, but how do I choose a few columns to print? Can you choose columns by their index number and/or name?
print(df2[['col1', 'col2', 'col3']].head(10)) will select the top 10 rows from columns 'col1', 'col2', and 'col3' from the dataframe without modifying the dataframe.
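Since the question also asks about selecting by position, .iloc covers that case; a small sketch with made-up data:

```python
import pandas as pd

df2 = pd.DataFrame({'col1': range(3), 'col2': range(3), 'col3': range(3)})

# Select by name
print(df2[['col1', 'col3']].head(10))

# Select by integer position: rows 0-9, first two columns
print(df2.iloc[:10, [0, 1]])
```

Neither form modifies df2; both return a new DataFrame that is printed and discarded.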