Pandas loop through Excel sheets and append to df - python

I am trying to loop through an Excel sheet and append the data from multiple sheets into a data frame.
So far I have:
master_df = pd.DataFrame()
for sheet in target_sheets:
    df1 = file.parse(sheet, skiprows=4)
    master_df.append(df1, ignore_index=True)
But then when I call master_df.head() it returns __
The data on these sheets is in the same format, and the sheets relate to each other.
So I would like to join them like this:
Sheet 1 contains:
A1
B1
C1
Sheet 2 contains:
A2
B2
C2
Sheet 3:
A3
B3
C3
End result:
A1
B1
C1
A2
B2
C2
A3
B3
C3
Is my logic correct or how can I achieve this?

The code below works even if you don't know the exact sheet names in the Excel file. Note that DataFrame.append returned a new DataFrame instead of modifying the original in place (and it was removed in pandas 2.0), so collect the sheets in a list and concatenate them once. You can try this:
import pandas as pd
xls = pd.ExcelFile('myexcel.xls')
frames = []
for sheet in xls.sheet_names:
    # parse each sheet from the already-open workbook
    frames.append(xls.parse(sheet))
# stack the rows of all sheets, just like your expected output
out_df = pd.concat(frames, ignore_index=True)
print(out_df)
# out_df now has the data from all the sheets
Let me know if this helps.

Simply use pd.concat():
pd.concat([pd.read_excel(file, sheet_name=sheet) for sheet in ['Sheet1','Sheet2','Sheet3']], axis=1)
For example, this will yield:
   A1  B1  C1  A2  B2  C2  A3  B3  C3
0   1   2   3   1   2   3   1   2   3
1   4   5   6   4   5   6   4   5   6
2   7   8   9   7   8   9   7   8   9

The output desired in the question is obtained by setting axis=0.
import pandas as pd
df2 = pd.concat([pd.read_excel(io="projects.xlsx", sheet_name=sheet) for sheet in ['JournalArticles','Proposals','Books']], axis=0)
df2
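As a minimal, self-contained sketch of the row-wise concatenation described above: the dict below is made-up data standing in for what `pd.read_excel(..., sheet_name=None)` returns (a mapping of sheet name to DataFrame), and the column name `value` is an assumption for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_excel("myexcel.xlsx", sheet_name=None),
# which returns a dict mapping each sheet name to its DataFrame.
sheets = {
    "Sheet1": pd.DataFrame({"value": ["A1", "B1", "C1"]}),
    "Sheet2": pd.DataFrame({"value": ["A2", "B2", "C2"]}),
    "Sheet3": pd.DataFrame({"value": ["A3", "B3", "C3"]}),
}

# axis=0 stacks the sheets on top of each other;
# ignore_index=True renumbers the rows 0..n-1 instead of repeating 0, 1, 2.
master_df = pd.concat(list(sheets.values()), axis=0, ignore_index=True)
print(master_df)
```

Building the list first and calling `pd.concat` once also avoids copying the accumulated frame on every loop iteration.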

Related

Pandas: pivoting rows to columns with columns as column-row

I have a data frame that looks like this
df = pd.DataFrame({'A': [1,2,3], 'B': [11,12,13]})
df
   A   B
0  1  11
1  2  12
2  3  13
I would like to create the following data frame, where each column name combines the original column label with the row index:
   A0  A1  A2  B0  B1  B2
0   1   2   3  11  12  13
It seems that the pivot and transpose functions will switch columns and rows but I actually want to flatten the data frame to a single row. How can I achieve this?
IIUC
s=df.stack().sort_index(level=1).to_frame(0).T
s.columns=s.columns.map('{0[1]}{0[0]}'.format)
s
A0 A1 A2 B0 B1 B2
0 1 2 3 11 12 13
One option, with pivot_wider:
# pip install pyjanitor
import janitor
import pandas as pd
df.index = [0] * len(df)
df = df.assign(num=range(len(df)))
df.pivot_wider(names_from="num", names_sep = "")
A0 A1 A2 B0 B1 B2
0 1 2 3 11 12 13
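Another possible approach, sketched here with plain pandas only, is to call `unstack()` on the frame. This is not taken from the answers above; it relies on `DataFrame.unstack` returning a Series indexed by (column, row) when the index is not a MultiIndex.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [11, 12, 13]})

# unstack() on a frame with a flat index gives a Series whose
# MultiIndex is (column label, row label), in column-major order.
flat = df.unstack()

# Join each (column, row) pair into a single label such as "A0".
flat.index = [f"{col}{row}" for col, row in flat.index]

# Turn the Series into a one-row DataFrame.
result = flat.to_frame().T
print(result)
```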

pandas: How to merge multiple dataframes with same column names on one column?

I have N dataframes:
df1:
time data
1.0 a1
2.0 b1
3.0 c1
df2:
time data
1.0 a2
2.0 b2
3.0 c2
df3:
time data
1.0 a3
2.0 b3
3.0 c3
I want to merge all of them on id, thus getting
time data1 data2 data3
1.0 a1 a2 a3
2.0 b1 b2 b3
3.0 c1 c2 c3
I can assure all the ids are the same in all dataframes.
How can I do this in pandas?
One idea is to use concat on a list of DataFrames; you only need to set id as the index of each DataFrame. The keys parameter is added to avoid duplicated column names, but it creates a MultiIndex in the output, so map with format is used to flatten it:
dfs = [df1, df2, df3]
dfs = [x.set_index('id') for x in dfs]
df = pd.concat(dfs, axis=1, keys=range(1, len(dfs) + 1))
df.columns = df.columns.map('{0[1]}{0[0]}'.format)
df = df.reset_index()
print(df)
   id data1 data2 data3
0   1    a1    a2    a3
1   2    b1    b2    b3
2   3    c1    c2    c3
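For reference, here is a self-contained sketch of the same idea that can be run as-is. It assumes the key column is the `time` column shown in the question's tables, and it renames `data` before concatenating instead of using `keys`, which is a small variation on the answer above rather than the same code.

```python
import pandas as pd

# Sample frames rebuilt from the question's tables.
df1 = pd.DataFrame({"time": [1.0, 2.0, 3.0], "data": ["a1", "b1", "c1"]})
df2 = pd.DataFrame({"time": [1.0, 2.0, 3.0], "data": ["a2", "b2", "c2"]})
df3 = pd.DataFrame({"time": [1.0, 2.0, 3.0], "data": ["a3", "b3", "c3"]})
dfs = [df1, df2, df3]

# Index each frame by the key and give its data column a unique name
# (data1, data2, ...) so the columns do not collide.
renamed = [
    d.set_index("time").rename(columns={"data": f"data{i}"})
    for i, d in enumerate(dfs, start=1)
]

# axis=1 aligns the frames on the shared index and places them side by side.
merged = pd.concat(renamed, axis=1).reset_index()
print(merged)
```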

How to compare two data frames with same columns but different number of rows?

df1=
A B C D
a1 b1 c1 1
a2 b2 c2 2
a3 b3 c3 4
df2=
A B C D
a1 b1 c1 2
a2 b2 c2 1
I want to compare the value of the column 'D' in both dataframes. If both dataframes had same number of rows I would just do this.
newDF = df1['D']-df2['D']
However, there are times when the number of rows is different. I want a result DataFrame like this:
resultDF=
A B C D_df1 D_df2 Diff
a1 b1 c1 1 2 -1
a2 b2 c2 2 1 1
EDIT: If the first row of A, B, C is the same in df1 and df2, then and only then compare the first row of column D in each dataframe. Repeat this for every row.
Use merge and df.eval
df1.merge(df2, on=['A','B','C'], suffixes=['_df1','_df2']).eval('Diff=D_df1 - D_df2')
Out[314]:
A B C D_df1 D_df2 Diff
0 a1 b1 c1 1 2 -1
1 a2 b2 c2 2 1 1
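A runnable sketch of the merge approach with the question's sample data follows. It uses plain column subtraction in place of `eval`, and the `how="left"` variant at the end is an addition for illustration, showing what happens to rows that exist only in df1.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3"],
                    "C": ["c1", "c2", "c3"], "D": [1, 2, 4]})
df2 = pd.DataFrame({"A": ["a1", "a2"], "B": ["b1", "b2"],
                    "C": ["c1", "c2"], "D": [2, 1]})

keys = ["A", "B", "C"]

# The default inner join keeps only rows whose A, B, C match in both frames,
# so the unmatched a3 row from df1 is dropped.
result = df1.merge(df2, on=keys, suffixes=("_df1", "_df2"))
result["Diff"] = result["D_df1"] - result["D_df2"]
print(result)

# With how="left", every df1 row is kept; unmatched rows get NaN in D_df2.
left = df1.merge(df2, on=keys, how="left", suffixes=("_df1", "_df2"))
print(left)
```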

Grouping data on column value

Hi I have data (in excel and text file as well) like
C1 C2 C3
1 p a
1 q b
2 r c
2 s d
And I want the output like:
C1 C2 C3
1 p,q a,b
2 r,s c,d
How can I group the data based on column values?
I am open to anything: any library, any language, any tool,
like Python, bash, or even Excel.
I think we can do this using pandas in Python, but I haven't used it before.
Any leads appreciated.
First read the file with pandas.read_excel, which returns a DataFrame:
df = pd.read_excel('file.xlsx')
Then you can use groupby with agg and join:
df = df.groupby('C1').agg(','.join).reset_index()
print(df)
   C1   C2   C3
0   1  p,q  a,b
1   2  r,s  c,d
If df has more columns and you only need C2 and C3, select them with a list (double brackets):
df = df.groupby('C1')[['C2','C3']].agg(','.join).reset_index()
print(df)
   C1   C2   C3
0   1  p,q  a,b
1   2  r,s  c,d
To save to an Excel file, use DataFrame.to_excel with index=False so that the index is not written:
df.to_excel('file.xlsx', index=False)
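The grouping step can be tried without an Excel file. The sketch below builds the question's sample data in memory, which is an assumption standing in for `pd.read_excel('file.xlsx')`, and joins the string columns per group.

```python
import pandas as pd

# In-memory stand-in for the data read from the Excel or text file.
df = pd.DataFrame({
    "C1": [1, 1, 2, 2],
    "C2": ["p", "q", "r", "s"],
    "C3": ["a", "b", "c", "d"],
})

# For each C1 value, join the strings of C2 and C3 with commas.
# ",".join requires the grouped columns to contain strings.
out = df.groupby("C1")[["C2", "C3"]].agg(",".join).reset_index()
print(out)
```

If a grouped column holds numbers, convert it first with `astype(str)` so that `",".join` does not raise a TypeError.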

how to make excel into dict by xlrd

I have the following data in Excel:
Column(A) Column(B) Column(C)
Header1
A a
FC Qty
select a1 1
a2 2
derived a3 3
Header 2
B b
FC Qty
select b1 1
derived b2 2
b3 3
And I need to add this data to a dictionary in the following format (dict with tuples):
my_dict = { A:[a,select:[(a1,1),(a2,2),(a3,3)], derived:[(a3,3)]], B:[b,select:(b1,1),derived:[(b2,2),(b3,3)]]}
