Get the specified set of columns from pandas dataframe [duplicate] - python

This question already has answers here:
How to select all columns whose names start with X in a pandas DataFrame
(11 answers)
Closed 2 years ago.
I manually select the columns in a pandas dataframe using
df_final = df[['column1','column2'.......'column90']]
Instead, I build the list of column names with:
dp_col = [col for col in df if col.startswith('column')]
But I'm not sure how to use this list to select only those columns from the source dataframe.

You can use this as the list of columns to select, so:
df_final = df[[col for col in df if col.startswith('column')]]
The "origin" of the list of strings does not matter: as long as you pass a list of strings to the subscript, this will normally work.
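A minimal self-contained sketch of that idea, assuming a toy dataframe with made-up column names (column1, column2, other). The list comprehension builds dp_col, and the list is then passed directly to the subscript.

```python
import pandas as pd

# Toy data; the column names are made up for illustration
df = pd.DataFrame({
    "column1": [1, 2],
    "column2": [3, 4],
    "other": [5, 6],
})

# Iterating over a DataFrame yields its column labels.
# Assumes every label is a string; a non-string label would fail on startswith.
dp_col = [col for col in df if col.startswith("column")]

# Any list of column labels can go inside the double brackets
df_final = df[dp_col]
print(df_final.columns.tolist())  # ['column1', 'column2']
```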

Use loc access with boolean masking:
df.loc[:, df.columns.str.startswith('column')]
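A small sketch of the mask approach on a made-up frame. df.columns.str.startswith returns one boolean per column, and .loc[:, mask] keeps every row and only the columns marked True. DataFrame.filter(regex=...) is shown as an alternative that is not part of the answer; the pattern "^column" is an assumption based on the question's naming.

```python
import pandas as pd

# Made-up frame: two matching columns and one that does not match
df = pd.DataFrame({"column1": [1, 2], "other": [3, 4], "column2": [5, 6]})

# One boolean per column label: [True, False, True]
mask = df.columns.str.startswith("column")

# ":" keeps every row; the boolean mask selects the columns
df_final = df.loc[:, mask]

# Alternative (not from the answer above): keep labels matching a regex
df_alt = df.filter(regex="^column")

print(df_final.columns.tolist())  # ['column1', 'column2']
```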

From a list of lists in a pandas dataframe column to a new set of lists in multiple columns [duplicate]

This question already has answers here:
Split a Pandas column of lists into multiple columns
(11 answers)
Closed 3 months ago.
I have these values in a pandas dataframe column:
col1
[[1,2],[1,2]]
[[3,4],[3,4]]
[[5,6],[5,6]]
I want to split these into two new columns, each holding a two-element list per row.
These are the columns I want to get:
col1 col2
[1,1] [2,2]
[3,3] [4,4]
[5,5] [6,6]
Assuming the values are lists, use the DataFrame constructor:
out = pd.DataFrame(df['col1'].tolist(), columns=['col1', 'col2'])
If you have strings, first convert to lists:
df['col1'] = df['col1'].apply(pd.eval)
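A runnable sketch of the constructor approach, assuming the column was read in as strings (for example from a CSV). It uses ast.literal_eval from the standard library in place of pd.eval for the string-to-list step; that swap is my choice, not part of the answer.

```python
import ast
import pandas as pd

# Column of strings that look like nested lists
df = pd.DataFrame({"col1": ["[[1,2],[1,2]]", "[[3,4],[3,4]]", "[[5,6],[5,6]]"]})

# Parse each string into a real Python list (literal_eval only accepts literals)
df["col1"] = df["col1"].apply(ast.literal_eval)

# Each row holds two inner lists, so the constructor spreads them over two columns
out = pd.DataFrame(df["col1"].tolist(), columns=["col1", "col2"])
print(out)
```

With this sample data both output columns hold the same lists, because the two inner lists of every row are identical: the constructor places the first inner list in col1 and the second in col2.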
Assuming so is the name of your dataframe and "a" is the name of the original column you want to split, you can use an apply-lambda approach (I am not sure it is the best way). Note that "b" must be assigned before "a" is overwritten:
so["b"] = so["a"].apply(lambda x: x[1])
so["a"] = so["a"].apply(lambda x: x[0])
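Both answers above put each inner list into its own column. If, as the desired output in the question suggests, col1 should instead collect the first element of every inner list ([1,1]) and col2 the second ([2,2]), each cell can be regrouped with zip first. This is a sketch under that reading of the question, assuming every inner list has exactly two elements.

```python
import pandas as pd

# Each cell holds a list of two-element lists, as in the question
df = pd.DataFrame({"col1": [[[1, 2], [1, 2]], [[3, 4], [3, 4]], [[5, 6], [5, 6]]]})

# zip(*cell) regroups the inner lists by position:
# [[1, 2], [1, 2]] -> (1, 1), (2, 2)
rows = [[list(group) for group in zip(*cell)] for cell in df["col1"]]

out = pd.DataFrame(rows, columns=["col1", "col2"])
print(out)
```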

How do I convert strings in a column into numbers which I can use later, in a dataframe? [duplicate]

This question already has an answer here:
Factorize a column of strings in pandas
(1 answer)
Closed 2 years ago.
I have a dataset consisting of 382 rows and 4 columns. I need to convert all the names into numbers. The names repeat here and there, so I can't just assign numbers at random.
So, I made a dictionary of the names and the corresponding values. But now, I am not able to change the values in the column.
This is how I tried to add the values to the column:
test_df.replace(to_replace = d_loc,value = None, regex = True, inplace = True)
print(test_df)
but test_df just gives me the same dataframe, without any modifications.
What should I use? I have over 100 unique names, so I cannot manually rename them.
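df.applymap() applies a function to each element of a dataframe (in pandas 2.1 and later it is named DataFrame.map()). It returns a new dataframe, so assign the result; every value must be a key of dict_to_replace, otherwise a KeyError is raised:
test_df = test_df.applymap(lambda x: dict_to_replace[x])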
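Two column-level alternatives, sketched on a hypothetical "name" column (neither is from the answer above). pd.factorize, which the duplicate target refers to, numbers the names in order of first appearance, so repeated names get the same code. Series.map applies an existing dictionary to a single column; the mapping below is made up.

```python
import pandas as pd

# Hypothetical data: a column of repeated names
test_df = pd.DataFrame({"name": ["alice", "bob", "alice", "carol", "bob"]})

# Option 1: let pandas assign the numbers (codes follow order of first appearance)
codes, uniques = pd.factorize(test_df["name"])
test_df["name_code"] = codes

# Option 2: apply an existing name -> number dictionary to one column
d_loc = {"alice": 10, "bob": 20, "carol": 30}  # made-up mapping
test_df["name_mapped"] = test_df["name"].map(d_loc)

print(test_df)
```

Unlike the dict_to_replace[x] lambda, Series.map leaves NaN for names missing from the dictionary instead of raising an error.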

Delete rows if there are null values in a specific column in Pandas dataframe [duplicate]

This question already has answers here:
Python: How to drop a row whose particular column is empty/NaN?
(2 answers)
How do I select rows from a DataFrame based on column values?
(16 answers)
Closed 4 years ago.
I'm new to python pandas and need some help deleting a few rows that have null values. In the screenshot, I need to delete the rows where charge_per_line == "-".
If the relevant entries in Charge_Per_Line are empty (NaN) when you read into pandas, you can use df.dropna:
df = df.dropna(axis=0, subset=['Charge_Per_Line'])
If the values are genuinely -, then you can replace them with np.nan and then use df.dropna:
import numpy as np
df['Charge_Per_Line'] = df['Charge_Per_Line'].replace('-', np.nan)
df = df.dropna(axis=0, subset=['Charge_Per_Line'])
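There are multiple ways:
Use str.contains to find rows containing '-' and keep the others (this also drops values that merely contain a hyphen, such as negative numbers):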
df[~df['Charge_Per_Line'].str.contains('-')]
Replace '-' by NaN and use dropna() (without subset, this drops rows that have NaN in any column):
df.replace('-', np.nan, inplace=True)
df = df.dropna()
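A sketch comparing the approaches on made-up data (the Item column and the values are assumptions). An exact comparison with != "-" is added as a further option not mentioned above; it removes only cells that are exactly "-", while str.contains also removes "-5".

```python
import numpy as np
import pandas as pd

# Made-up data: "-" marks a missing charge, "-5" is a real negative value
df = pd.DataFrame({
    "Item": ["a", "b", "c", "d"],
    "Charge_Per_Line": ["10", "-", "-5", "7"],
})

# Exact comparison: drops only cells that are exactly "-"
kept_exact = df[df["Charge_Per_Line"] != "-"]

# Replace + dropna restricted to the one column gives the same rows
cleaned = df.copy()
cleaned["Charge_Per_Line"] = cleaned["Charge_Per_Line"].replace("-", np.nan)
kept_dropna = cleaned.dropna(subset=["Charge_Per_Line"])

# Substring match also drops "-5", because it contains a hyphen
kept_contains = df[~df["Charge_Per_Line"].str.contains("-", regex=False)]
```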

Unable to drop a column from pandas dataframe [duplicate]

This question already has answers here:
Delete a column from a Pandas DataFrame
(20 answers)
Closed 6 years ago.
I have imported an Excel sheet into pandas. It has 7 columns which are numeric and 1 column which is a string (a flag).
After converting the flag to a categorical variable, I am trying to drop the string column from the Pandas dataframe. However, I am not able to do it.
Here's the code:
[In] parts_median_temp.columns
[Out] Index([u'PART_NBR', u'PRT_QTY', u'PRT_DOL', u'BTS_QTY', u'BTS_DOL', u'Median', u'Upper_Limit', u'Flag_median'], dtype='object')
The column I'm trying to drop is 'Flag_median'.
[In] parts_median_temp.drop('Flag_median')
[Out] ...ValueError: labels ['Flag_median'] not contained in axis
Help me drop the Flag_median column from the Pandas dataframe.
You have to pass axis=1 (and inplace=True if you want to modify the dataframe in place):
parts_median_temp.drop('Flag_median', axis=1, inplace=True)
The default value of inplace is False, and the default value of axis is 0. axis=0 drops rows by index label, whereas axis=1 drops columns.
You can try this:
parts_median_temp = parts_median_temp.drop('Flag_median', axis=1)
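A self-contained sketch using a small stand-in for the imported sheet (only three of its columns, with made-up values). It reproduces the failure and then uses the columns= keyword, which current pandas versions accept as a shorthand for axis=1.

```python
import pandas as pd

# Small stand-in for the imported sheet
parts_median_temp = pd.DataFrame({
    "PART_NBR": [101, 102],
    "Median": [3.0, 4.0],
    "Flag_median": ["Y", "N"],
})

# Without axis=1, drop() looks for the label among the row labels (0 and 1) and fails
try:
    parts_median_temp.drop("Flag_median")
    failed = False
except (KeyError, ValueError):  # recent pandas raises KeyError; the question shows ValueError
    failed = True

# columns= is a keyword shorthand for axis=1
parts_median_temp = parts_median_temp.drop(columns="Flag_median")
print(parts_median_temp.columns.tolist())  # ['PART_NBR', 'Median']
```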

Pandas DataFrame.reset_index for columns [duplicate]

This question already has answers here:
Reset column index in Pandas to 0,1,2,3...?
(7 answers)
Closed last year.
Is there a reset_index equivalent for the column headings? In other words, if the column names are a MultiIndex, how would I drop one of the levels?
Answer to the second question:
df.columns = df.columns.droplevel(level)
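A minimal sketch of droplevel, assuming a made-up two-level column MultiIndex. Level 0 is the outermost level; after dropping it, duplicate labels (here "x") can remain.

```python
import pandas as pd

# Columns with a two-level MultiIndex (made-up labels)
columns = pd.MultiIndex.from_tuples([("A", "x"), ("A", "y"), ("B", "x")])
df = pd.DataFrame([[1, 2, 3]], columns=columns)

# Drop the outermost level (level 0); only the inner labels remain
df.columns = df.columns.droplevel(0)
print(df.columns.tolist())  # ['x', 'y', 'x']
```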
The first question, as @AndyHayden points out, is not that straightforward. It would only make sense if your column names are of the same type as your column values.
Here's a really dumb way to turn your columns into tuples instead:
df.columns = list(df.columns)
You can build on that to get whatever you want, for example if you had a 2 level MultiIndex, to remove the outermost level, you could just do:
df.columns = [col[1] for col in df.columns]
You can't do fancy indexing over the iteration because it's generating tuples, but you can do things like:
df.columns = pd.MultiIndex.from_tuples([col[1:] for col in df.columns])
So you have some options there.
Transpose df, reset the index, and transpose again (the old column names end up in the first row):
df.T.reset_index().T
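A sketch of what the transpose trick produces on a made-up frame. The old column names move into a row labelled "index", and the columns become 0, 1, ... (the result has object dtype because names and numbers now share columns). Assigning a range to df.columns is shown as a simpler alternative, not from the answer, for when the old names are not needed.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Old column names become the first row; columns become 0, 1, ...
flipped = df.T.reset_index().T
print(flipped)

# Alternative: renumber the columns directly and discard the old names
renumbered = df.copy()
renumbered.columns = range(renumbered.shape[1])
print(renumbered)
```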
