This question already has answers here:
Pandas: sum DataFrame rows for given columns
(8 answers)
Closed 4 years ago.
I want to sum multiple columns of a dataframe into a new column. For 2 columns I was using this:
import pandas as pd, numpy as np
df=pd.read_csv("Calculation_test.csv")
# create the new column
df["Test1"] = 0
# sum of 2 columns
df["Test1"] = df['col1'] + df['col2']
df.to_csv('test_cal.csv', index=False)
But for my project, I need to sum around 15-20 columns, and I do not want to write df['col1']+df['col2']+... every time.
I have the list of columns, which I have to add. Like:
'col1'+'col2'+ 'col5'+'col8'+----+'col18'
or like this:
'col1', 'col2', 'col5', 'col8',----,'col18'
How can I use this list directly to do the sum of columns?
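Try slicing the columns (note that a label slice takes every column positioned from col1 through col18, inclusive, so it only fits when you want all of them):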
import pandas as pd
df = pd.read_csv("whatever.csv")
df.loc[:,'col1':'col18'].sum(axis = 1)
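If the columns are not contiguous, the list from the question can be passed straight to the dataframe. A minimal sketch, assuming made-up sample data in place of the CSV and a hypothetical list name cols_to_sum:

```python
import pandas as pd

# Hypothetical sample data standing in for Calculation_test.csv
df = pd.DataFrame({
    "col1": [1, 2, 3],
    "col2": [10, 20, 30],
    "col3": [100, 200, 300],  # deliberately left out of the sum
    "col5": [1000, 2000, 3000],
})

# The columns to add, kept as an ordinary Python list
cols_to_sum = ["col1", "col2", "col5"]

# Select just those columns and sum across each row (axis=1)
df["Test1"] = df[cols_to_sum].sum(axis=1)

print(df)
```

By default sum skips NaN values, so a missing cell counts as 0 in the row total rather than turning the whole result into NaN.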
This question already has answers here:
How do I find numeric columns in Pandas?
(13 answers)
Closed 12 months ago.
How do we select only the columns that hold numeric values, out of all the columns in a data frame?
We can select columns by name and slice them out of the data frame, but how do we extract columns based on the data type they contain?
Assuming you already have a dataframe df, a version of df with only the numerical columns is:
import pandas as pd
import numpy as np
df_numeric = df.select_dtypes(include=np.number)
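To see what that call keeps, here is a small self-contained sketch. The sample dataframe and the name numeric_cols are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical dataframe mixing text and numeric columns
df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "age": [26, 27, 25],
    "score": [89.5, 87.0, 67.25],
})

# Keep only columns whose dtype is a numeric type (ints and floats here)
df_numeric = df.select_dtypes(include=np.number)

# The names of the numeric columns, if you only need the labels
numeric_cols = df_numeric.columns.tolist()
print(numeric_cols)
```

The text column is dropped from df_numeric, while df itself is left unchanged.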
This question already has answers here:
How to select all columns whose names start with X in a pandas DataFrame
(11 answers)
Closed 2 years ago.
I manually select the columns in a pandas dataframe using
df_final = df[['column1','column2'.......'column90']]
Instead, I build the list of column names with:
dp_col = [col for col in df if col.startswith('column')]
But I am not sure how to use this list to get only that set of columns from the source dataframe.
You can use this as the list of columns to select, so:
df_final = df[[col for col in df if col.startswith('column')]]
The "origin" of the list of strings is of no importance: as long as you pass a list of strings to the subscript, this will normally work.
Use loc access with boolean masking:
df.loc[:, df.columns.str.startswith('column')]
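Both approaches should return the same columns. A quick sketch to check that, assuming a made-up dataframe whose column names are all strings (df_masked is a hypothetical name):

```python
import pandas as pd

# Hypothetical dataframe; only some columns start with 'column'
df = pd.DataFrame({
    "id": [1, 2],
    "column1": [10, 20],
    "column2": [30, 40],
    "other": ["x", "y"],
})

# Approach 1: build a list of names, then subscript with it
dp_col = [col for col in df if col.startswith("column")]
df_final = df[dp_col]

# Approach 2: boolean mask over the column index
df_masked = df.loc[:, df.columns.str.startswith("column")]

print(df_final.equals(df_masked))
```

Iterating over a dataframe yields its column labels, which is why the list comprehension works without writing df.columns.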
This question already has answers here:
How to switch columns rows in a pandas dataframe
(2 answers)
How can I pivot a dataframe?
(5 answers)
Closed 4 years ago.
I am trying to pivot the dataframe below. I want the column names to become rows. The first row is static, but the column names are not, since they will be calculated for all the numerical columns of the data frame. Could you please help?
This is my data frame:
Expected Dataframe:
Just add .T, as in df.describe().T, to transpose your results:
import pandas as pd
import numpy as np
#Create a Dictionary of series
d = {'Name':pd.Series(['Alisa','Bobby','Cathrine','Madonna','Rocky','Sebastian','Jaqluine',
'Rahul','David','Andrew','Ajay','Teresa']),
'Age':pd.Series([26,27,25,24,31,27,25,33,42,32,51,47]),
'Score':pd.Series([89,87,67,55,47,72,76,79,44,92,99,69])}
#Create a DataFrame
pd.DataFrame(d).describe().T
Results:
You can transpose the dataframe:
data_pivot = data_pd.T
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.transpose.html
Another way is .transpose():
data_pivot = data_pd.transpose()
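As a rough check that .T and .transpose() give the same result on describe() output, here is a small sketch. The sample data and the name summary are assumptions for illustration:

```python
import pandas as pd

# Hypothetical numeric data
data_pd = pd.DataFrame({"Age": [26, 27, 25], "Score": [89, 87, 67]})

# describe() puts the statistics in rows and the columns across the top
summary = data_pd.describe()

# Transposing swaps them: one row per original column
data_pivot = summary.T

print(data_pivot)
print(data_pivot.equals(summary.transpose()))
```

After the transpose, the original column names (Age, Score) form the index and the statistics (count, mean, std, ...) become the columns.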
This question already has answers here:
Python: How to drop a row whose particular column is empty/NaN?
(2 answers)
How do I select rows from a DataFrame based on column values?
(16 answers)
Closed 4 years ago.
I'm new to Python pandas and need some help deleting a few rows that have null values. In the screenshot, I need to delete the rows where Charge_Per_Line == "-" using pandas.
If the relevant entries in Charge_Per_Line are empty (NaN) when you read into pandas, you can use df.dropna:
df = df.dropna(axis=0, subset=['Charge_Per_Line'])
If the values are genuinely -, then you can replace them with np.nan and then use df.dropna:
import numpy as np
df['Charge_Per_Line'] = df['Charge_Per_Line'].replace('-', np.nan)
df = df.dropna(axis=0, subset=['Charge_Per_Line'])
Multiple ways
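Use str.contains to find rows containing '-' (this also matches any value that merely contains a hyphen, such as a negative number stored as text):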
df[~df['Charge_Per_Line'].str.contains('-')]
Replace '-' with NaN and use dropna() (without subset, this drops rows that have NaN in any column):
import numpy as np
df.replace('-', np.nan, inplace = True)
df = df.dropna()
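If only cells that are exactly '-' should be dropped, an equality test is a narrower alternative to the answers above. A minimal sketch, assuming the column holds strings; the sample data and the names df_clean, as_numbers and df_numeric_clean are hypothetical:

```python
import pandas as pd

# Hypothetical data: '-' marks a missing charge, '-3.0' is a real negative value
df = pd.DataFrame({
    "Item": ["a", "b", "c", "d"],
    "Charge_Per_Line": ["12.5", "-", "-3.0", "7"],
})

# Option 1: keep rows whose value is not exactly '-'
df_clean = df[df["Charge_Per_Line"] != "-"]

# Option 2: convert to numbers; anything unparseable (such as '-') becomes NaN
as_numbers = pd.to_numeric(df["Charge_Per_Line"], errors="coerce")
df_numeric_clean = df[as_numbers.notna()]

print(df_clean)
```

Both options keep the '-3.0' row, which a substring search for '-' would remove.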
This question already has answers here:
Selecting multiple columns in a Pandas dataframe
(22 answers)
Closed 5 years ago.
How do you print (in the terminal) a subset of columns from a pandas dataframe?
I don't want to remove any columns from the dataframe; I just want to see a few columns in the terminal to get an idea of how the data is pulling through.
Right now, I have print(df2.head(10)), which prints the first 10 rows of the dataframe, but how do I choose a few columns to print? Can you choose columns by their indexed number and/or name?
print(df2[['col1', 'col2', 'col3']].head(10)) will select the top 10 rows from columns 'col1', 'col2', and 'col3' from the dataframe without modifying the dataframe.
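For the "by indexed number" part of the question, iloc selects columns by position. A short sketch, assuming a made-up df2 with four columns (by_name and by_position are hypothetical names):

```python
import pandas as pd

# Hypothetical dataframe with 20 rows and 4 columns
df2 = pd.DataFrame({
    "col1": range(20),
    "col2": range(20, 40),
    "col3": range(40, 60),
    "col4": range(60, 80),
})

# By name: a list of column labels
by_name = df2[["col1", "col3"]].head(10)

# By position: all rows, columns 0 and 2 (zero-based)
by_position = df2.iloc[:, [0, 2]].head(10)

print(by_position)
```

Neither selection modifies df2; each returns a new, smaller dataframe for display.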