This question already has answers here:
Selecting multiple columns in a Pandas dataframe
(22 answers)
Closed 5 years ago.
How do you print (in the terminal) a subset of columns from a pandas dataframe?
I don't want to remove any columns from the dataframe; I just want to see a few columns in the terminal to get an idea of how the data is pulling through.
Right now, I have print(df2.head(10)), which prints the first 10 rows of the dataframe, but how do I choose a few columns to print? Can you choose columns by their index number and/or name?
print(df2[['col1', 'col2', 'col3']].head(10)) will select the top 10 rows from columns 'col1', 'col2', and 'col3' from the dataframe without modifying the dataframe.
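The question also asks about choosing columns by number. A minimal sketch of both approaches follows, assuming a small made-up df2 (the column names and values are placeholders, not the asker's data). A list of labels selects by name, and iloc selects by 0-based position.

```python
import pandas as pd

# Hypothetical stand-in for df2; column names and values are assumptions.
df2 = pd.DataFrame({
    "col1": range(12),
    "col2": list("abcdefghijkl"),
    "col3": [x * 0.5 for x in range(12)],
    "col4": [True, False] * 6,
})

# Select by name: a list of labels returns a new DataFrame.
by_name = df2[["col1", "col3"]].head(10)

# Select by position: iloc takes 0-based integer positions,
# here the first 10 rows of columns 0 and 2.
by_position = df2.iloc[:10, [0, 2]]

print(by_name)
print(by_position)
```

Neither expression assigns back to df2, so the original dataframe keeps all of its columns.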
This question already has answers here:
Use a list of values to select rows from a Pandas dataframe
(8 answers)
Closed 13 days ago.
I want to keep all rows where the value in one of the columns is in a list (or, equivalently, delete the rows where the value in that column is not in the list).
The df['ID'] column contains a large number of IDs, and each occurs several times. I want to keep only some of them. Those IDs are stored as strings in a list called surv3.
I tried this:
df = df.loc[(df['ID'] in surv3)]
Use the isin method.
For instance:
df = pd.DataFrame({'test': ['ga', 'bu', 'zo', 'meu']})
mylist = ['ga', 'meu']
df[df.test.isin(mylist)] returns:
test
0 ga
3 meu
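Applied to the question's own names, the same idea might look like the sketch below. The ID values and the contents of surv3 are made up for illustration. The original attempt, df['ID'] in surv3, asks Python for a single True/False about the whole column, which typically raises a ValueError about an ambiguous truth value. isin instead returns one boolean per row.

```python
import pandas as pd

# Hypothetical data: the ID values and the contents of surv3 are assumptions.
df = pd.DataFrame({
    "ID": ["a1", "b2", "a1", "c3", "b2", "d4"],
    "value": [10, 20, 30, 40, 50, 60],
})
surv3 = ["a1", "c3"]

# One boolean per row: True where the ID appears in surv3.
mask = df["ID"].isin(surv3)

kept = df[mask]       # rows whose ID is in the list
dropped = df[~mask]   # ~ inverts the mask: rows whose ID is not in the list

print(kept)
print(dropped)
```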
This question already has answers here:
How can I pivot a dataframe?
(5 answers)
How to pivot a dataframe in Pandas? [duplicate]
(2 answers)
Closed 1 year ago.
I have a dataframe like this:
index,col1,value
1,A,1
1,B,2
2,A,3
2,D,4
2,C,5
2,B,6
And I would like to convert this dataframe to this:
index,col1_A,col1_B,col1_C,col1_D
1,1,2,np.nan,np.nan
2,3,6,5,4
The conversion is based on the index column: for each unique index value, the values from col1 become column names, and each new column is filled with the corresponding entry from the value column.
Currently my solution creates a temporary subset of df for each index value and loops over it. I am wondering whether pandas already has a built-in solution for this. Please feel free to suggest one.
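One built-in option is DataFrame.pivot. The sketch below rebuilds the sample data from the question and reshapes it. The col1_ prefix is added with add_prefix to match the requested column names, and the variable name wide is just an illustrative choice.

```python
import numpy as np
import pandas as pd

# The sample data from the question.
df = pd.DataFrame({
    "index": [1, 1, 2, 2, 2, 2],
    "col1": ["A", "B", "A", "D", "C", "B"],
    "value": [1, 2, 3, 4, 5, 6],
})

# One row per 'index' value, one column per distinct 'col1' value.
# Combinations missing from the input become NaN.
wide = df.pivot(index="index", columns="col1", values="value")

# Rename A, B, C, D to col1_A, col1_B, ... and flatten back to plain columns.
wide = wide.add_prefix("col1_")
wide.columns.name = None
wide = wide.reset_index()

print(wide)
```

pivot raises an error if the same (index, col1) pair appears more than once. In that case pivot_table with an aggregation function is the usual alternative.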
This question already has answers here:
How to remove nan value while combining two column in Panda Data frame?
(5 answers)
Closed 1 year ago.
I have two DFs that I am trying to merge on the column 'conId'. The DFs have different numbers of rows, and the only other overlapping column is 'delta'.
I am using pf.merge(greek,on='conId',how='left')
The resulting DF has columns 'delta_x' and 'delta_y'.
How can I merge these two columns into one column?
Thank you!
You can use
df['delta_x'] = df['delta_x'].fillna(df['delta_y'])
and then drop the other column if you want (drop returns a new dataframe, so assign the result):
df = df.drop(['delta_y'], axis=1)
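For an end-to-end picture, here is a small sketch that starts from the merge itself. The contents of pf and greek are invented for illustration. It assumes the goal is to prefer the left frame's delta and fall back to the right frame's value where the left one is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical frames standing in for pf and greek; the values are assumptions.
pf = pd.DataFrame({"conId": [1, 2, 3], "delta": [0.5, np.nan, np.nan]})
greek = pd.DataFrame({"conId": [2, 3], "delta": [0.2, np.nan]})

# Both frames have a 'delta' column, so merge renames them
# to delta_x (left) and delta_y (right).
df = pf.merge(greek, on="conId", how="left")

# Take delta_x where present, otherwise fall back to delta_y.
df["delta"] = df["delta_x"].fillna(df["delta_y"])

# Remove the two helper columns now that they are combined.
df = df.drop(columns=["delta_x", "delta_y"])

print(df)
```

Rows where both sides are missing stay NaN after the fill.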
This question already has answers here:
How to select all columns whose names start with X in a pandas DataFrame
(11 answers)
Closed 2 years ago.
I manually select the columns in a pandas dataframe using
df_final = df[['column1','column2'.......'column90']]
Instead, I build the list of column names with
dp_col = [col for col in df if col.startswith('column')]
But I am not sure how to use this list to select only those columns from the source dataframe.
You can use this as the list of columns to select, so:
df_final = df[[col for col in df if col.startswith('column')]]
The origin of the list of strings does not matter: as long as you pass a list of column names to the subscript, this will normally work.
Use loc access with boolean masking:
df.loc[:, df.columns.str.startswith('column')]
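To compare the approaches side by side, here is a short sketch on a made-up dataframe (the column names and values are placeholders). It also shows DataFrame.filter with a regex as a third option, which is an addition to the answers above rather than something they mention.

```python
import pandas as pd

# Hypothetical dataframe; only the 'column...' naming pattern comes from the question.
df = pd.DataFrame({
    "column1": [1, 2],
    "column2": [3, 4],
    "other": [5, 6],
    "column90": [7, 8],
})

# 1. Build the list of names first, then pass it to the subscript.
dp_col = [col for col in df if col.startswith("column")]
via_list = df[dp_col]

# 2. Boolean mask over the column index, used with loc.
via_mask = df.loc[:, df.columns.str.startswith("column")]

# 3. filter with a regex anchored at the start of the column name.
via_filter = df.filter(regex="^column")

print(via_list.columns.tolist())
```

All three keep the columns in their original order and leave df unchanged.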
This question already has answers here:
Pandas: how to merge two dataframes on a column by keeping the information of the first one?
(4 answers)
Closed 3 years ago.
I have two dataframes.
The first dataframe is A.
And the second dataframe is B.
Both dataframes have an AdId field. The first dataframe has one unique AdId per row, but the second has multiple rows for a single AdId. I want to add all the information for each AdId to the second dataframe.
I am expecting the output as follows
I have tried the following code
B.join(A, on='AdId', how='left', lsuffix='_caller')
But this does not give the expected output.
Use pandas concat (note that with axis=1 it aligns rows on the index, not on the AdId column):
result = pd.concat([df1, df4], axis=1, sort=False)
More on merging with pandas: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html#set-logic-on-the-other-axes
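Since the goal is to match rows on the AdId column, a left merge is another option worth trying. The sketch below uses invented data, and the Title and Clicks columns are assumptions because the original tables are not shown. B.join(A, on='AdId') matches B's AdId column against A's index rather than A's AdId column, which may explain the unexpected output in the question.

```python
import pandas as pd

# Hypothetical data: A has one row per AdId, B has repeated AdIds.
A = pd.DataFrame({"AdId": [1, 2, 3], "Title": ["x", "y", "z"]})
B = pd.DataFrame({"AdId": [1, 1, 2, 3, 3], "Clicks": [5, 7, 1, 2, 9]})

# Keep every row of B and attach A's columns where the AdId matches.
# validate="many_to_one" raises if AdId is not unique in A.
result = B.merge(A, on="AdId", how="left", validate="many_to_one")

print(result)
```

If join is preferred, B.join(A.set_index('AdId'), on='AdId') should give an equivalent result, because A is then indexed by AdId.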