This question already has answers here:
How do I find numeric columns in Pandas?
(13 answers)
Closed 12 months ago.
How do we select only the columns that hold numeric values from all the columns in a data frame?
We can select the required columns by name and slice them out of the data frame, but how do we extract columns based on their data type?
Let's imagine you already have a dataframe df.
The dataframe with only numeric data is:
import pandas as pd
import numpy as np
df_numeric = df.select_dtypes(include=np.number)
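Here is a minimal sketch of select_dtypes on a small made-up dataframe. The column names are assumptions for illustration. np.number matches integer and float dtypes but not object or bool, so only two of the four columns are kept:

```python
import numpy as np
import pandas as pd

# Hypothetical frame mixing numeric and non-numeric columns
df = pd.DataFrame({
    "name": ["a", "b", "c"],          # object
    "count": [1, 2, 3],               # int64
    "ratio": [0.5, 1.5, 2.5],         # float64
    "flag": [True, False, True],      # bool (not a subclass of np.number)
})

# Keep every column whose dtype is an integer or float type
df_numeric = df.select_dtypes(include=np.number)

# If only the names are needed, take them from the selection
numeric_cols = df_numeric.columns.tolist()
print(numeric_cols)
```

The string alias include='number' selects the same columns as include=np.number.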
This question already has answers here:
Use a list of values to select rows from a Pandas dataframe
(8 answers)
Closed 10 months ago.
I have a pandas dataframe containing data and a Python list of ids. I want to extract the rows of the dataframe whose id matches one of the values in the list.
ids = ['SW00003062', 'SW00003063', 'SW00003067', 'SW00003072']
The dataframe is this:
You can use isin:
out = df[df['id'].isin(ids)]
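The question's dataframe is not shown, so this sketch assumes an id column plus a hypothetical value column with made-up rows. isin builds a boolean mask, and ~ inverts it to get the rows whose id is not in the list:

```python
import pandas as pd

ids = ['SW00003062', 'SW00003063', 'SW00003067', 'SW00003072']

# Hypothetical data: the real dataframe from the question is not shown
df = pd.DataFrame({
    "id": ['SW00003062', 'SW00009999', 'SW00003067', 'SW00001234'],
    "value": [10, 20, 30, 40],
})

# True where df['id'] equals any entry of ids
mask = df['id'].isin(ids)

out = df[mask]        # rows whose id is in the list
not_in = df[~mask]    # rows whose id is not in the list
print(out)
```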
This question already has answers here:
How can I pivot a dataframe?
(5 answers)
How to pivot a dataframe in Pandas? [duplicate]
(2 answers)
Closed 1 year ago.
I have a dataframe like this:
index,col1,value
1,A,1
1,B,2
2,A,3
2,D,4
2,C,5
2,B,6
And I would like to convert this dataframe to this:
index,col1_A,col1_B,col1_C,col1_D
1,1,2,np.nan,np.nan
2,3,6,5,4
The conversion is based on the index column: for each unique index value, every value in col1 becomes a column name (prefixed with col1_), and that column is filled with the matching entry from the value column.
My current solution loops over the index values, creating a temporary subset of df for each one and then looping over that subset. I am wondering whether pandas already has a built-in solution for this. Please feel free to suggest one.
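pandas does have a built-in for this reshape: DataFrame.pivot. Below is a minimal sketch on the sample data above. add_prefix restores the col1_ names from the expected output, and combinations that never occur (index 1 has no C or D) become NaN:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "index": [1, 1, 2, 2, 2, 2],
    "col1": ["A", "B", "A", "D", "C", "B"],
    "value": [1, 2, 3, 4, 5, 6],
})

wide = (
    df.pivot(index="index", columns="col1", values="value")  # one row per index, one column per col1 value
      .add_prefix("col1_")          # A -> col1_A, B -> col1_B, ...
      .rename_axis(columns=None)    # drop the leftover "col1" axis label
      .reset_index()                # turn "index" back into a regular column
)
print(wide)
```

pivot raises a ValueError if an (index, col1) pair appears more than once; pivot_table with an aggfunc handles duplicates instead. Because of the NaN values, the new columns come out as float64.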
This question already has answers here:
pandas: to_numeric for multiple columns
(6 answers)
Closed 3 years ago.
I am reading a data frame from an Azure Databricks cluster and converting it into a pandas data frame. Pandas declares the dtype of every feature as object instead of int64.
The only solution I know is to call astype and convert each column individually, but I have 122 columns.
pd_train = df_train.toPandas()
pd_test = df_test.toPandas()
pd_train.dtypes
pd_train is the pandas dataframe for the training set.
pd_test is the pandas dataframe for the testing set.
df_train and df_test are both Spark dataframes.
Here is one way of doing it.
First, get all of the column names:
#Get column names
columns = pd_train.columns
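Next, use pd.to_numeric with those column names to convert every column to a numeric dtype. A column whose values all parse as integers becomes int64; with errors='coerce', any value that cannot be parsed becomes NaN, which makes that column float64: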
#Convert to numeric
pd_train[columns] = pd_train[columns].apply(pd.to_numeric, errors='coerce')
You could then repeat this process for the pd_test dataframe.
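A sketch of the same idea applied to both frames. Small made-up object-dtype frames stand in for the toPandas() output, and the helper name to_numeric_frame is hypothetical. Columns that parse cleanly as integers come out int64, while a column containing an unparseable entry gets NaN and becomes float64:

```python
import pandas as pd

# Hypothetical stand-ins for toPandas() output: every column is object dtype
pd_train = pd.DataFrame({"a": ["1", "2", "3"], "b": ["4.5", "x", "6"]}, dtype=object)
pd_test = pd.DataFrame({"a": ["7", "8"], "b": ["9", "10"]}, dtype=object)

def to_numeric_frame(frame):
    """Hypothetical helper: run pd.to_numeric over every column of frame."""
    # DataFrame.apply forwards errors="coerce" to pd.to_numeric;
    # unparseable entries such as "x" become NaN
    return frame.apply(pd.to_numeric, errors="coerce")

pd_train = to_numeric_frame(pd_train)
pd_test = to_numeric_frame(pd_test)
print(pd_train.dtypes)
print(pd_test.dtypes)
```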
This question already has answers here:
How to switch columns rows in a pandas dataframe
(2 answers)
How can I pivot a dataframe?
(5 answers)
Closed 4 years ago.
I am trying to pivot the dataframe below so that the column names become rows. The first row is static, but the column names are not, since the statistics are calculated for all numerical columns of the data frame. Could you please help?
This is my data frame:
Expected Dataframe:
Just add .T :) Calling df.describe().T transposes the results:
import pandas as pd
import numpy as np
#Create a Dictionary of series
d = {'Name':pd.Series(['Alisa','Bobby','Cathrine','Madonna','Rocky','Sebastian','Jaqluine',
'Rahul','David','Andrew','Ajay','Teresa']),
'Age':pd.Series([26,27,25,24,31,27,25,33,42,32,51,47]),
'Score':pd.Series([89,87,67,55,47,72,76,79,44,92,99,69])}
#Create a DataFrame
pd.DataFrame(d).describe().T
Results:
You can transpose the dataframe:
data_pivot = data_pd.T
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.transpose.html
Another way is .transpose():
data_pivot = data_pd.transpose()
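A quick check on a small made-up frame: describe() summarizes only the numeric columns by default, so after .T each numeric column name becomes a row label and each statistic becomes a column. .T gives the same result as .transpose():

```python
import pandas as pd

# Hypothetical frame with one text column and two numeric columns
data_pd = pd.DataFrame({
    "Name": ["Alisa", "Bobby", "Cathrine"],
    "Age": [26, 27, 25],
    "Score": [89, 87, 67],
})

summary = data_pd.describe()   # rows: count, mean, std, min, 25%, 50%, 75%, max
pivoted = summary.T            # rows: Age, Score; columns: the statistics

# .T is shorthand for .transpose(); both return the same frame
same = pivoted.equals(summary.transpose())
print(pivoted)
```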
This question already has answers here:
Pandas: sum DataFrame rows for given columns
(8 answers)
Closed 4 years ago.
I want to sum multiple columns of a dataframe into a new column. For two columns I was using this:
import pandas as pd, numpy as np
df=pd.read_csv("Calculation_test.csv")
#create the new column
df["Test1"] = 0
#sum of 2 columns
df["Test1"]= df['col1']+df['col2']
df.to_csv('test_cal.csv', index=False)
But for my project I need to sum around 15-20 columns, and I don't want to write df['col1']+df['col2']+... every time.
I have the list of columns that I need to add, like:
'col1'+'col2'+ 'col5'+'col8'+----+'col18'
or like this:
'col1', 'col2', 'col5', 'col8',----,'col18'
How can I use this list directly to do the sum of columns?
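If the columns you need are adjacent in the dataframe, try slicing them. Note that a label slice such as 'col1':'col18' includes every column between those two, not only the ones in your list: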
import pandas as pd
df = pd.read_csv("whatever.csv")
df["Test1"] = df.loc[:, 'col1':'col18'].sum(axis=1)
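If the columns to add are not adjacent (col1, col2, col5, ... as in the question), select them with a list instead of a slice. A minimal sketch; the frame and the list below are made up for illustration, whereas in the question the data comes from Calculation_test.csv:

```python
import pandas as pd

# Hypothetical data standing in for pd.read_csv("Calculation_test.csv")
df = pd.DataFrame({
    "col1": [1, 2],
    "col2": [10, 20],
    "col3": [100, 200],   # not in the list, so it is left out of the sum
    "col5": [1000, 2000],
})

# Only the columns to add up (non-contiguous, so a label slice would not work)
cols_to_sum = ["col1", "col2", "col5"]

# Row-wise sum over just those columns
df["Test1"] = df[cols_to_sum].sum(axis=1)
print(df)
```

Unlike chaining +, DataFrame.sum skips NaN by default (skipna=True), so a missing value in one column does not turn the whole row's total into NaN.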