Pandas DataFrame.reset_index for columns [duplicate] - python

This question already has answers here:
Reset column index in Pandas to 0,1,2,3...?
(7 answers)
Closed last year.
Is there a reset_index equivalent for the column headings? In other words, if the column names are a MultiIndex, how would I drop one of the levels?

Answer to the second question:
df.columns = df.columns.droplevel(level)
The first question is, as @AndyHayden points out, not that straightforward. It would only make sense if your column names were of the same type as your column values.

Here's a really dumb way to turn your columns into a list of tuples instead:
df.columns = list(df.columns)
You can build on that to get whatever you want; for example, if you had a two-level MultiIndex and wanted to remove the outermost level, you could just do:
df.columns = [col[1] for col in df.columns]
You can't do fancy indexing over the iteration because it generates tuples, but you can do things like:
df.columns = pd.MultiIndex.from_tuples([col[1:] for col in df.columns])
So you have some options there.
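
A minimal sketch putting these together on a made-up two-level frame (the names are invented for illustration):
import pandas as pd

# Toy frame with a two-level column MultiIndex
df = pd.DataFrame(
    [[1, 2], [3, 4]],
    columns=pd.MultiIndex.from_tuples([("a", "x"), ("a", "y")]),
)

# Two equivalent ways to drop the outer level:
flat = df.copy()
flat.columns = flat.columns.droplevel(0)           # Index(['x', 'y'])

flat2 = df.copy()
flat2.columns = [col[1] for col in flat2.columns]  # same result via tuples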

Transpose df, reset the index, and transpose again.
df.T.reset_index().T
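
For what it's worth, here is what that actually produces on a tiny made-up frame; note that the double transpose upcasts every column to object dtype:
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
out = df.T.reset_index().T
# out.columns is now RangeIndex(0, 2); the old headers survive as the
# first row (labelled 'index'), and all values become object dtype.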

changing row values in a dataframe by looking into another dataframe [duplicate]

This question already has answers here:
Pandas Merging 101
(8 answers)
Closed last year.
I have a lookup table as a dataframe (1,000 rows) consisting of codes and labels. I have another dataframe (200,000 rows) consisting of codes and geometries.
I need to get the label name for each code by looking it up in the lookup dataframe.
The output should be a dataframe.
I tried it as follows.
import pandas as pd

# Lookup table with codes and labels
df = pd.read_csv(filepath)
codes = df['codes'].values
labels = df['labels'].values

# Table with codes and geometries
df2 = pd.read_csv(filepath)
print(df2.shape)

for ix in df2.index:
    code = df2.loc[ix, 'code']
    df2.loc[ix, 'label'] = labels[codes == code][0]
print(df2)
The result is correct, but it's very slow; looping over the rows one by one is very slow.
Can you help me?
You should use the merge method of DataFrames (https://pandas.pydata.org/docs/reference/api/pandas.merge.html). It allows you to join two dataframes based on a common column. Your code should look like this:
df2 = df2.merge(df, left_on="code", right_on="codes", how="left")
# Check labels using df2["labels"]
The join columns are specified in the parameters left_on and right_on. The parameter how='left' means that every row from df2 is preserved even if its code has no match in the lookup table.
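
A self-contained sketch of the same join (the column names follow the question; the values are made up):
import pandas as pd

# Stand-ins for the two CSVs: a lookup table and the geometries table
lookup = pd.DataFrame({"codes": [1, 2, 3], "labels": ["oak", "pine", "elm"]})
data = pd.DataFrame({"code": [2, 1, 2, 9], "geometry": ["g1", "g2", "g3", "g4"]})

merged = data.merge(lookup, left_on="code", right_on="codes", how="left")
# Rows whose code has no match (code 9 here) get NaN in 'labels'
print(merged[["code", "geometry", "labels"]])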

how to modify dataframe when referring them through a list loop [duplicate]

This question already has answers here:
Why isn't Pandas .fillna() filling values in DataFrame?
(2 answers)
Closed last year.
I have many dataframes and I store them in a list.
Now I'd like to apply a simple fillna(0) to each dataframe, so I did the following, but it didn't work:
df_list = [df_1, df_2, df_3, df_4]
for df in df_list:
    df = df.fillna(0)
    df.index = df.index.strftime('%Y-%m-%d')
I think the df on the left-hand side inside the loop is not the same object as the original dataframe. How can I do this?
In the first line of the loop you are creating a new dataframe and rebinding the name df to it, so the dataframe in the list is left untouched.
Instead you can use inplace=True to do the work on the original dataframe without creating a new one:
for df in df_list:
    df.fillna(0, inplace=True)
    df.index = df.index.strftime('%Y-%m-%d')
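
If you prefer to avoid inplace=True (its use is discouraged by the pandas developers), you can instead assign the new dataframe back into the list by position; a sketch with made-up frames:
import pandas as pd

df_1 = pd.DataFrame({"x": [1.0, None]}, index=pd.to_datetime(["2021-01-01", "2021-01-02"]))
df_2 = pd.DataFrame({"y": [None, 2.0]}, index=pd.to_datetime(["2021-02-01", "2021-02-02"]))
df_list = [df_1, df_2]

for i, df in enumerate(df_list):
    df = df.fillna(0)
    df.index = df.index.strftime('%Y-%m-%d')
    df_list[i] = df   # put the new object back into the list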

Get the specified set of columns from pandas dataframe [duplicate]

This question already has answers here:
How to select all columns whose names start with X in a pandas DataFrame
(11 answers)
Closed 2 years ago.
I manually select the columns in a pandas dataframe using
df_final = df[['column1', 'column2', ..., 'column90']]
Instead, I build the list of column names with:
dp_col = [col for col in df if col.startswith('column')]
But I am not sure how to use this list to get only that set of columns from the source dataframe.
You can use this as the list of columns to select, so:
df_final = df[[col for col in df if col.startswith('column')]]
The "origin" of the list of strings is of no importance, as long as you pass a list of strings to the subscript, this will normally work.
Alternatively, use .loc with a boolean mask:
df.loc[:, df.columns.str.startswith('column')]
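
Both forms in action on a small, made-up frame:
import pandas as pd

df = pd.DataFrame({"column1": [1], "column2": [2], "other": [3]})

df_final = df[[col for col in df if col.startswith('column')]]   # list comprehension
df_final = df.loc[:, df.columns.str.startswith('column')]        # boolean mask
# Either way, 'column1' and 'column2' are kept and 'other' is dropped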

How to drop multiple columns (using column names) from a dataframe using pandas? [duplicate]

This question already has an answer here:
Dropping columns in a dataframe
(1 answer)
Closed 2 years ago.
I have a data frame df with around 200 columns. I want to drop the columns at index positions 50 to 90 and 120 to 170, but by their names rather than their index positions. How can I do that?
I cannot use:
df.drop('column name', axis=1)
directly, because there are so many columns to drop and I cannot really type out each column name as in the call above.
I am interested in knowing how to select the columns from one column name (column50) to another (column90), and from column120 to column170, rather than by integer position.
You can use np.r_ to do this:
import numpy as np
idx = np.r_[50:90, 120:170]
df.drop(df.columns[idx], axis=1, inplace=True)
From the np.r_ docs:
Translates slice objects to concatenation along the first axis.
In your case, it concatenates the non-contiguous slices into a single index array, which you can then use in the df.drop call.
df.drop(df.columns.to_series()["column_name_1":"column_name_2"], axis=1)
By converting the columns to a Series you can slice by label, so a whole range can be dropped by name; you just need to know the first and last column names of each range.
For non-contiguous slices, always prefer np.r_. Its main purpose, per the docs, is:
Translates slice objects to concatenation along the first axis.
Once the non-contiguous slices are concatenated into one index array, it is easy to perform operations on it. You can use drop, loc, iloc, or whatever logic you want (not much gain here beyond readability).
For example,
df = df.iloc[:, np.r_[50:90, 120:170]]
or, as suggested by @anky:
df[df.columns ^ df.columns[np.r_[50:90, 120:170]]]
Note that set operators such as ^ on an Index are deprecated in newer pandas; df.columns.difference(df.columns[np.r_[50:90, 120:170]]) is the current equivalent.
You can create a list of columns like this:
idx = list(range(50,90)) + list(range(120,170))
df = df.drop(df.columns[idx], axis=1)
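
Side by side on a made-up 200-column frame (column names column0 through column199 are invented to match the question):
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((3, 200)), columns=[f"column{i}" for i in range(200)])

# Positional: drop columns 50..89 and 120..169
dropped = df.drop(df.columns[np.r_[50:90, 120:170]], axis=1)

# Name-based: slice the column labels as a Series, then drop both ranges
to_drop = pd.concat([
    df.columns.to_series()["column50":"column89"],
    df.columns.to_series()["column120":"column169"],
])
dropped_by_name = df.drop(to_drop, axis=1)

assert dropped.columns.equals(dropped_by_name.columns)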

dropping empty columns in pandas 0.23+ [duplicate]

This question already has answers here:
Pandas: drop columns with all NaN's
(4 answers)
Closed 4 years ago.
In earlier versions of pandas, you could drop empty columns simply with:
df.dropna(axis='columns')
However, dropna has been deprecated in later builds. How would one now drop multiple (without specifically indexing) empty columns from a dataframe?
I am able to drop empty columns using dropna() with the current version of Pandas (0.23.4). The code I used is:
df.dropna(how='all', axis=1)
It looks like what is deprecated is passing multiple axes at once (i.e., df.dropna(how='all', axis=[0, 1])). You can read here that they made this decision: "let's deprecate passing multiple axes, we don't do this for any other pandas functions".
You can get the columns that are not null and then filter your DataFrame on those.
Here's an example:
non_null_columns = [col for col in df.columns if df.loc[:, col].notna().any()]
df[non_null_columns]
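
Both approaches on a throwaway frame:
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [np.nan, np.nan], "c": [np.nan, 3]})

# Drop the all-NaN column 'b' but keep the partially filled 'c'
print(df.dropna(how='all', axis=1))

# Equivalent filter using the list comprehension above
print(df[[col for col in df.columns if df.loc[:, col].notna().any()]])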
