Read each Excel sheet as a different DataFrame in Python

I have an Excel file with 40 sheets. I want to read each sheet into a different DataFrame so I can export an .xlsx file for each sheet.
Instead of writing all the sheet names one by one, I want to create a loop that gets all the sheet names and passes each one as the sheet_name argument of pd.read_excel.
I am trying to avoid this:
df1 = pd.read_excel(r'C:\Users\filename.xlsx', sheet_name= 'Sheet1');
df2 = pd.read_excel(r'C:\Users\filename.xlsx', sheet_name= 'Sheet2');
....
df40 = pd.read_excel(r'C:\Users\filename.xlsx', sheet_name= 'Sheet40');
thank you all guys

Specifying sheet_name=None in read_excel reads all worksheets and returns a dict of DataFrames keyed by sheet name.
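import pandas as pd
# Raw string: in a normal string literal, '\U' starts a unicode escape and raises a SyntaxError
file = r'C:\Users\filename.xlsx'
# sheet_name=None reads every worksheet into a dict {sheet name: DataFrame}
xl = pd.read_excel(file, sheet_name=None)
for sheet, df in xl.items():
    df.to_excel(f"{sheet}.xlsx", index=False)  # one file per sheet, without the row index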
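The dict returned by sheet_name=None already gives one DataFrame per sheet. If you prefer an explicit loop over the sheet names (for example, to skip some sheets), pd.ExcelFile exposes them as sheet_names. Below is a minimal sketch, assuming openpyxl is installed as the Excel engine; the helper name split_workbook and the demo data are hypothetical.

```python
import os
import tempfile

import pandas as pd


def split_workbook(src, out_dir):
    """Write every sheet of `src` to its own .xlsx file in `out_dir`; return the paths."""
    written = []
    # ExcelFile opens the workbook once; sheet_names lists every worksheet in order
    with pd.ExcelFile(src) as xls:
        for sheet in xls.sheet_names:
            target = os.path.join(out_dir, f"{sheet}.xlsx")
            xls.parse(sheet).to_excel(target, index=False)
            written.append(target)
    return written


# Demo (hypothetical data): build a three-sheet workbook in a temporary folder, then split it
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "filename.xlsx")
with pd.ExcelWriter(src) as writer:
    for i in range(1, 4):
        pd.DataFrame({"a": [i, i * 10]}).to_excel(writer, sheet_name=f"Sheet{i}", index=False)

files = split_workbook(src, tmp)
print([os.path.basename(f) for f in files])  # ['Sheet1.xlsx', 'Sheet2.xlsx', 'Sheet3.xlsx']
```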

I think this is what you are looking for.
import pandas as pd
# header=None on read and header=False on write copy every row verbatim, including the first one
xlsx = pd.read_excel('file.xlsx', sheet_name=None, header=None)
for sheet in xlsx.keys():
    xlsx[sheet].to_excel(sheet + '.xlsx', header=False, index=False)

Related

How to add an empty column to a specific sheet in Excel using pandas?

I have an excel file that contains 3 sheets (PizzaHut, InAndOut, ColdStone). I want to add an empty column to the InAndOut sheet.
path = 'C:\\testing\\test.xlsx'
data = pd.ExcelFile(path)
sheets = data.sheet_names
if 'InAndOut' in sheets:
    ...  # something something: add an empty column called toppings to the sheet
data.to_excel('output.xlsx')
Been looking around, but I couldn't find an intuitive solution to this.
Any help will be appreciated!
Read in the sheet by name.
Do what you need to do.
Overwrite the sheet with the modified data.
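import pandas as pd
path = 'C:\\testing\\test.xlsx'
sheet_name = 'InAndOut'
df = pd.read_excel(path, sheet_name=sheet_name)
# Do whatever; here, add an empty column called "toppings"
df['toppings'] = None
# mode="a" keeps the other sheets; if_sheet_exists="replace" (pandas >= 1.3) overwrites only this one
with pd.ExcelWriter(path, engine="openpyxl", mode="a", if_sheet_exists="replace") as writer:
    df.to_excel(writer, sheet_name=sheet_name, index=False)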
See pd.read_excel and pd.ExcelWriter.
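Here is the same read, modify, replace approach as a self-contained sketch, assuming pandas >= 1.3 and openpyxl. The helper add_empty_column and the demo cell data are hypothetical; the sheet names come from the question.

```python
import os
import tempfile

import pandas as pd


def add_empty_column(path, sheet_name, column):
    """Add an empty column to one sheet and write it back, leaving other sheets in place."""
    df = pd.read_excel(path, sheet_name=sheet_name)
    df[column] = None  # written as blank cells under the new header
    # mode="a" requires openpyxl; if_sheet_exists="replace" requires pandas >= 1.3
    with pd.ExcelWriter(path, engine="openpyxl", mode="a", if_sheet_exists="replace") as writer:
        df.to_excel(writer, sheet_name=sheet_name, index=False)


# Demo workbook with the three sheet names from the question (cell data is made up)
path = os.path.join(tempfile.mkdtemp(), "test.xlsx")
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    for name in ["PizzaHut", "InAndOut", "ColdStone"]:
        pd.DataFrame({"item": ["a", "b"]}).to_excel(writer, sheet_name=name, index=False)

with pd.ExcelFile(path) as xls:
    sheets = xls.sheet_names
if "InAndOut" in sheets:
    add_empty_column(path, "InAndOut", "toppings")

print(pd.read_excel(path, sheet_name="InAndOut").columns.tolist())  # ['item', 'toppings']
```

Because if_sheet_exists="replace" rebuilds the whole InAndOut sheet from the DataFrame, any formatting on that sheet is lost; the other sheets are kept.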

How to read and compare two excel files with multiple worksheets?

I have two Excel files, and both of them have 10 worksheets. I want to read each worksheet, compare the two files, and write the data to a third Excel file, which would also have multiple worksheets.
The program below works for a single worksheet:
import pandas as pd
df1 = pd.read_excel('zyx_5661.xlsx')
df2 = pd.read_excel('zyx_5662.xlsx')
df1.rename(columns= lambda x : x + '_file1', inplace=True)
df2.rename(columns= lambda x : x + '_file2', inplace=True)
df_join = df1.merge(right = df2, left_on = df1.columns.to_list(), right_on = df2.columns.to_list(), how = 'outer')
with pd.ExcelWriter('xl_join_diff.xlsx') as writer:
    df_join.to_excel(writer, sheet_name='testing', index=False)
How can I adapt it to work with multiple worksheets?
I think this should achieve what you need. Loop through each sheet name (assuming the sheets are named the same in both Excel documents; if not, you can use sheet positions instead). Write each merged result to a new sheet, and save the Excel document once all sheets are done.
import pandas as pd
# The with block closes (and saves) the writer; writer.save() was removed in pandas 2.0
with pd.ExcelWriter('xl_join_diff.xlsx') as writer:
    for sheet in ['sheet1', 'sheet2', 'sheet3']:  # list of sheet names
        # Pull in data for each sheet, and merge together.
        df1 = pd.read_excel('zyx_5661.xlsx', sheet_name=sheet)
        df2 = pd.read_excel('zyx_5662.xlsx', sheet_name=sheet)
        df1.rename(columns=lambda x: x + '_file1', inplace=True)
        df2.rename(columns=lambda x: x + '_file2', inplace=True)
        df_join = df1.merge(right=df2, left_on=df1.columns.to_list(),
                            right_on=df2.columns.to_list(), how='outer')
        df_join.to_excel(writer, sheet_name=sheet, index=False)  # write to Excel as a new sheet
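As an alternative to the suffix-and-merge approach above, you could join each pair of sheets on their shared column names and let merge(indicator=True) label where each row came from. This is a sketch, not the answer's method: it assumes both files use the same column names per sheet, compare_workbooks and the demo data are hypothetical, and sheets missing from either file are skipped rather than matched by position.

```python
import os
import tempfile

import pandas as pd


def compare_workbooks(path1, path2, out_path):
    """Outer-join each sheet present in both workbooks and write one result sheet per pair."""
    with pd.ExcelFile(path1) as x1, pd.ExcelFile(path2) as x2:
        # Compare only sheets that exist in both files, in the first file's order
        common = [s for s in x1.sheet_names if s in x2.sheet_names]
        with pd.ExcelWriter(out_path) as writer:
            for sheet in common:
                df1, df2 = x1.parse(sheet), x2.parse(sheet)
                # Joins on the shared column names; indicator=True adds a "_merge"
                # column saying whether each row is left_only, right_only or both
                diff = df1.merge(df2, how="outer", indicator=True)
                diff.to_excel(writer, sheet_name=sheet, index=False)
    return common


# Demo (made-up data): two workbooks with the same sheet names and partly overlapping rows
tmp = tempfile.mkdtemp()
for name, ids in [("zyx_5661.xlsx", [1, 2]), ("zyx_5662.xlsx", [2, 3])]:
    with pd.ExcelWriter(os.path.join(tmp, name)) as writer:
        for sheet in ["sheet1", "sheet2"]:
            pd.DataFrame({"id": ids}).to_excel(writer, sheet_name=sheet, index=False)

out = os.path.join(tmp, "xl_join_diff.xlsx")
sheets = compare_workbooks(os.path.join(tmp, "zyx_5661.xlsx"), os.path.join(tmp, "zyx_5662.xlsx"), out)
print(pd.read_excel(out, sheet_name="sheet1"))
```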
You can also use a loop to read the files and their sheets:
import pandas as pd
list_files = ['zyx_5661.xlsx', 'zyx_5662.xlsx']
count_sheets = 0
# Create one writer for writing all sheets to 1 file; the with block saves it
with pd.ExcelWriter('multiple.xlsx', engine='xlsxwriter') as writer:
    for file_name in list_files:
        file = pd.ExcelFile(file_name)
        for sheet_name in file.sheet_names:
            df = pd.read_excel(file, sheet_name=sheet_name)
            # ... you can do your process
            count_sheets = count_sheets + 1
            # count_sheets is an int, so format it instead of concatenating with +
            df.to_excel(writer, sheet_name=f'Sheet-{count_sheets}')
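The counter above produces names like Sheet-1, Sheet-2, which hide which file each sheet came from. A possible variation names each output sheet after its source file and sheet, cut to Excel's 31-character limit. The helper combine_sheets and the demo data are hypothetical.

```python
import os
import tempfile

import pandas as pd

MAX_SHEET_NAME = 31  # Excel does not allow sheet names longer than 31 characters


def combine_sheets(files, out_path):
    """Copy every sheet of every file into one workbook, naming sheets '<file stem>-<sheet>'."""
    names = []
    with pd.ExcelWriter(out_path) as writer:
        for file_name in files:
            stem = os.path.splitext(os.path.basename(file_name))[0]
            with pd.ExcelFile(file_name) as xls:
                for sheet in xls.sheet_names:
                    # Truncation could make two names collide; check if that matters for you
                    name = f"{stem}-{sheet}"[:MAX_SHEET_NAME]
                    xls.parse(sheet).to_excel(writer, sheet_name=name, index=False)
                    names.append(name)
    return names


# Demo (made-up data): two input workbooks with two sheets each
tmp = tempfile.mkdtemp()
files = []
for stem in ["zyx_5661", "zyx_5662"]:
    path = os.path.join(tmp, f"{stem}.xlsx")
    with pd.ExcelWriter(path) as writer:
        for sheet in ["Sheet1", "Sheet2"]:
            pd.DataFrame({"x": [1]}).to_excel(writer, sheet_name=sheet, index=False)
    files.append(path)

out = os.path.join(tmp, "multiple.xlsx")
names = combine_sheets(files, out)
print(names)  # ['zyx_5661-Sheet1', 'zyx_5661-Sheet2', 'zyx_5662-Sheet1', 'zyx_5662-Sheet2']
```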

Append data to the last row of the Excel sheet using Pandas

I have Excel data for three variables (Acct, Order, Date) in a sheet named Orders.
I have created a DataFrame by reading this sheet:
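import pandas as pd
sheet_file = pd.ExcelFile("Orders.xlsx", engine="openpyxl")
worksheets = sheet_file.sheet_names  # not defined in the original; assumed to be the sheets to read
append_data = []
for sheet_name in worksheets:
    df = pd.read_excel(sheet_file, sheet_name, header=1)
    append_data.append(df)
append_data = pd.concat(append_data)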
I have another Excel file called "Total_Orders.xlsx" with ~100k rows, and I need to append the above DataFrame to its "Orders" sheet:
with pd.ExcelWriter('Total_Orders.xlsx',sheet_name='Orders',engine="openpyxl") as writer:
    append_data.to_excel(writer,startrow=2,header=False,index=False)
    writer.save()
The code above overwrites the data instead of appending it. I know startrow is the key here, but I am not sure how to fix this. Any help is much appreciated.
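Have you tried mode="a", along these lines? In append mode, recent pandas versions raise a ValueError if the sheet already exists unless you pass if_sheet_exists; "overlay" (pandas >= 1.4) writes into the existing sheet, and startrow places the new rows after the last used one:
with pd.ExcelWriter("Total_Orders.xlsx", mode="a", engine="openpyxl", if_sheet_exists="overlay") as writer:
    # writer.sheets maps sheet names to the loaded openpyxl worksheets
    start = writer.sheets["Orders"].max_row  # startrow is 0-based, so this is the first empty row
    append_data.to_excel(writer, sheet_name="Orders", startrow=start, header=False, index=False)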
Edit (in response to a comment): you can also append the rows directly with openpyxl:
import pandas as pd
from openpyxl.utils.dataframe import dataframe_to_rows
from openpyxl import load_workbook
append_data = pd.DataFrame([{'Acct':3, 'Order':333, 'Note':'third'},
                            {'Acct':4, 'Order':444, 'Note':'fourth'}])
wb = load_workbook(filename = "stackoverflow.xlsx")
ws = wb["Orders"]
for r in dataframe_to_rows(append_data, index=False, header=False): #No index and don't append the column headers
    ws.append(r)
wb.save("stackoverflow.xlsx")
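The openpyxl approach above can be wrapped in a small function. This sketch is a variant: it iterates with DataFrame.itertuples(index=False, name=None) instead of dataframe_to_rows, which should yield the same row values for plain data. The helper append_rows and the demo workbook are hypothetical; openpyxl must be installed.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook


def append_rows(path, sheet_name, df):
    """Append df's rows (no header, no index) below the last used row of one sheet."""
    wb = load_workbook(filename=path)
    ws = wb[sheet_name]
    # itertuples(index=False, name=None) yields plain tuples of cell values
    for row in df.itertuples(index=False, name=None):
        ws.append(row)  # openpyxl appends after the current last row
    wb.save(path)


# Demo (made-up data): an 'Orders' sheet with two rows plus an unrelated 'Other' sheet
path = os.path.join(tempfile.mkdtemp(), "stackoverflow.xlsx")
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    pd.DataFrame({"Acct": [1, 2], "Order": [111, 222], "Note": ["first", "second"]}).to_excel(
        writer, sheet_name="Orders", index=False)
    pd.DataFrame({"x": [0]}).to_excel(writer, sheet_name="Other", index=False)

append_data = pd.DataFrame([{"Acct": 3, "Order": 333, "Note": "third"},
                            {"Acct": 4, "Order": 444, "Note": "fourth"}])
append_rows(path, "Orders", append_data)
print(pd.read_excel(path, sheet_name="Orders"))
```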
(Screenshots: stackoverflow.xlsx before and after the append. The 'Other' sheet was not affected.)

Append to an existing Excel file without changing formatting

I'm trying to take data from other Excel/CSV sheets and append it to an existing workbook/worksheet. The code below works in terms of appending; however, it removes the formatting not only from the sheet I've appended to, but from all other sheets in the workbook.
From what I understand, this happens because I'm reading the entire workbook into a dictionary of pandas DataFrames and then rewriting it back to Excel. But I'm not sure how to go about it otherwise.
How do I need to modify my code to make it so that I'm only appending the data I need, and leaving everything else untouched? Is pandas the incorrect way to go about this?
import os
import pandas as pd
import openpyxl
#Read Consolidated Sheet as Dictionary
#the 'Consolidation' excel sheet has 3 sheets:
#Consolidate, Skip1, Skip2
ws_dict = pd.read_excel(r'.\Consolidation.xlsx', sheet_name=None)
#select the relevant sheet's dataframe
mod_df = ws_dict['Consolidate']
#check that mod_df is the 'Consolidate' Tab
mod_df
#do work on mod_df
#grab extra sheets with data and make into pd dataframes
excel1 = 'doc1.xlsx'
excel2 = 'doc2.xlsx'
df1 = pd.read_excel(excel1)
df1 = df1.reset_index(drop=True)
df2 = pd.read_excel(excel2)
df2 = df2.reset_index(drop=True)
#concatenate the sheets
mod_df = pd.concat([mod_df, df1, df2], axis=0, ignore_index = True, sort=False)
#reassign the modified to the sheet
ws_dict['Consolidate'] = mod_df
#write to the consolidation workbook
with pd.ExcelWriter('Consolidation.xlsx', engine='xlsxwriter') as writer:
    for ws_name, df_sheet in ws_dict.items():
        df_sheet.to_excel(writer, sheet_name=ws_name, index=False)
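One possible approach, reusing the openpyxl pattern from the previous question, is to skip the pandas round-trip for the workbook: load it with openpyxl, which keeps existing cell styles, and append only the new rows to the target sheet. A minimal sketch follows; append_frames and the demo workbook are hypothetical, and it assumes row 1 of the target sheet holds the column names so the new rows can be aligned by name, as pd.concat would. openpyxl does not read every Excel object (its documentation notes that shapes are lost when a file is opened and saved), so check the result on a copy first.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook
from openpyxl.styles import Font


def append_frames(workbook_path, sheet_name, frames):
    """Append rows from several DataFrames to one sheet without rewriting the workbook via pandas."""
    wb = load_workbook(workbook_path)  # keeps existing cell styles on every sheet
    ws = wb[sheet_name]
    header = [cell.value for cell in ws[1]]  # assumes row 1 holds the column names
    new_rows = pd.concat(frames, ignore_index=True, sort=False).reindex(columns=header)
    for row in new_rows.itertuples(index=False, name=None):
        # write missing values as empty cells instead of NaN
        ws.append([None if pd.isna(v) else v for v in row])
    wb.save(workbook_path)


# Demo (made-up data): a 'Consolidate' sheet plus a 'Skip1' sheet with a bold cell
path = os.path.join(tempfile.mkdtemp(), "Consolidation.xlsx")
wb = Workbook()
ws = wb.active
ws.title = "Consolidate"
ws.append(["name", "qty"])
ws.append(["apple", 1])
skip = wb.create_sheet("Skip1")
skip["A1"] = "formatted"
skip["A1"].font = Font(bold=True)
wb.save(path)

df1 = pd.DataFrame({"name": ["pear"], "qty": [2]})
df2 = pd.DataFrame({"qty": [3], "name": ["plum"]})  # column order differs; aligned by header
append_frames(path, "Consolidate", [df1, df2])
```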

How to separate multiple data frames in pd.read_html() when saving to excel using Python

I am attempting to save data from multiple tables brought in through pd.read_html(). If I print df, I can see it captured all the data, but when saving, only the first table ends up in Excel. How do I separate out the tables so I can save each one to a separate sheet in Excel (e.g., Quarterly Income Statement on sheet1, Annual Income Statement on sheet2, etc.)? Below is my code. Any help is appreciated.
dfs = pd.read_html('https://www.google.com/finance?q=googl&fstype=ii', flavor='html5lib')
writer = pd.ExcelWriter('output.xlsx', engine='xlsxwriter')
for df in dfs:
    df.to_excel(writer, sheet_name='Sheet1')
writer.close()
You can iterate over your list and write each DataFrame to its own sheet of the same workbook:
import pandas as pd
dfs = pd.read_html('https://www.google.com/finance?q=googl&fstype=ii', flavor='html5lib')
# Create a Pandas Excel writer.
xlWriter = pd.ExcelWriter('myworkbook.xlsx', engine='xlsxwriter')
# Write each df to its own sheet
for i, df in enumerate(dfs):
    df.to_excel(xlWriter, sheet_name='Sheet{}'.format(i))
# Close the writer and output the Excel file (mandatory!); close() replaces save(), which pandas 2.0 removed
xlWriter.close()
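If you want the sheets named after the tables (as in the question's "Quarterly Income Statement on sheet1" example) rather than numbered, you could pass a list of titles. In this sketch the titles, the stand-in DataFrames, and the helper write_frames are hypothetical; sheet names are cut to Excel's 31-character limit and the row index is not written.

```python
import os
import tempfile

import pandas as pd


def write_frames(dfs, path, titles=None):
    """Write each DataFrame to its own sheet; use titles when given, else Sheet0, Sheet1, ..."""
    titles = list(titles or [])
    names = []
    with pd.ExcelWriter(path) as writer:  # leaving the with block saves the file
        for i, df in enumerate(dfs):
            name = (titles[i] if i < len(titles) else f"Sheet{i}")[:31]
            df.to_excel(writer, sheet_name=name, index=False)
            names.append(name)
    return names


# Demo with two small stand-in tables (pd.read_html would normally supply them)
dfs = [pd.DataFrame({"q": [1, 2]}), pd.DataFrame({"a": [3]})]
path = os.path.join(tempfile.mkdtemp(), "myworkbook.xlsx")
names = write_frames(dfs, path, ["Quarterly Income Statement", "Annual Income Statement"])
print(names)  # ['Quarterly Income Statement', 'Annual Income Statement']
```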
