Append to Existing excel without changing formatting - python

I'm trying to take data from other excel/csv sheets and append them to an existing workbook/worksheet. The code below works in terms of appending, however, it removes formatting for not only the sheet I've appended, but all other sheets in the workbook.
From what I understand, the reason this happens is because I'm reading the entire ExcelWorkbook as a dictionary, turning it into a Pandas Dataframe, and then rewriting it back into Excel. But I'm not sure how to go about it otherwise.
How do I need to modify my code to make it so that I'm only appending the data I need, and leaving everything else untouched? Is pandas the incorrect way to go about this?
import os
import pandas as pd
import openpyxl
#Read Consolidated Sheet as Dictionary
#the 'Consolidation' excel sheet has 3 sheets:
#Consolidate, Skip1, Skip2
ws_dict = pd.read_excel(r'.\Consolidation.xlsx', sheet_name=None)
#convert relevant sheet to dataframe
mod_df = ws_dict['Consolidate']
#check that mod_df is the 'Consolidate' Tab
mod_df
#do work on mod_df
#grab extra sheets with data and make into pd dataframes
excel1 = 'doc1.xlsx'
excel2 = 'doc2.xlsx'
df1 = pd.read_excel(excel1)
df1 = df1.reset_index(drop=True)
df2 = pd.read_excel(excel2)
df2 = df2.reset_index(drop=True)
#concatenate the sheets
mod_df = pd.concat([mod_df, df1, df2], axis=0, ignore_index = True, sort=False)
#reassign the modified to the sheet
ws_dict['Consolidate'] = mod_df
#write to the consolidation workbook
with pd.ExcelWriter('Consolidation.xlsx', engine='xlsxwriter') as writer:
    for ws_name, df_sheet in ws_dict.items():
        df_sheet.to_excel(writer, sheet_name=ws_name, index=False)
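One way to append without disturbing the formatting is to skip the pandas round trip for the target workbook: open it with openpyxl, append rows to the one sheet, and save it in place. Cells and sheets that are not touched keep their styles. The following is a minimal sketch, not a definitive implementation. The demo workbook, its temporary location, and the sample rows are made up for illustration, and the new data is assumed to have the same column order as the sheet. Note that openpyxl does not read every kind of object in a workbook, so things like images or charts may be lost when the file is saved.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook
from openpyxl.styles import Font

# Build a small stand-in for Consolidation.xlsx (hypothetical demo data).
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "Consolidation.xlsx")
wb = Workbook()
ws = wb.active
ws.title = "Consolidate"
ws.append(["id", "name"])
ws["A1"].font = Font(bold=True)   # formatting we want to survive
ws.append([1, "existing row"])
wb.create_sheet("Skip1")
wb.create_sheet("Skip2")
wb.save(path)

# New rows to append; in practice these would come from pd.read_excel(...)
# and are assumed to match the column order of the sheet.
new_rows = pd.DataFrame({"id": [2, 3], "name": ["from doc1", "from doc2"]})

# Open the existing workbook, append to one sheet only, save in place.
wb = load_workbook(path)
ws = wb["Consolidate"]
for row in new_rows.itertuples(index=False):
    ws.append(list(row))          # writes below the last used row
wb.save(path)

# Reload to check: rows appended, header still bold, other sheets present.
check = load_workbook(path)
appended_rows = check["Consolidate"].max_row
header_is_bold = check["Consolidate"]["A1"].font.bold
sheet_names = check.sheetnames
```

Here pandas is only used to read the source data; the target workbook is never rebuilt from DataFrames, which is what discards the formatting in the original code.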

Related

How to add an empty column to a specific sheet in Excel using pandas?

I have an excel file that contains 3 sheets (PizzaHut, InAndOut, ColdStone). I want to add an empty column to the InAndOut sheet.
path = 'C:\\testing\\test.xlsx'
data = pd.ExcelFile(path)
sheets = data.sheet_names
if 'InAndOut' in sheets:
    # something something: add an empty column called 'toppings' to the sheet
    data.to_excel('output.xlsx')
Been looking around, but I couldn't find an intuitive solution to this.
Any help will be appreciated!
Read in the sheet by name.
Do what you need to do.
Overwrite the sheet with the modified data.
sheet_name = 'InAndOut'
df = pd.read_excel(path, sheet_name)
# Do whatever
with pd.ExcelWriter(path, engine="openpyxl", mode="a", if_sheet_exists="replace") as writer:
    df.to_excel(writer, sheet_name=sheet_name, index=False)
See pd.read_excel and pd.ExcelWriter.
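For the empty-column case specifically, the "do what you need to do" step can be a single assignment. Below is a sketch of the three steps, assuming pandas 1.3 or later (for if_sheet_exists) with openpyxl installed. The temporary test workbook and its sample values are invented for the demo.

```python
import os
import tempfile

import pandas as pd

# Create a stand-in workbook with the three sheets from the question.
path = os.path.join(tempfile.mkdtemp(), "test.xlsx")
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    for name in ["PizzaHut", "InAndOut", "ColdStone"]:
        pd.DataFrame({"item": ["a", "b"], "price": [1.0, 2.0]}).to_excel(
            writer, sheet_name=name, index=False
        )

sheet_name = "InAndOut"
df = pd.read_excel(path, sheet_name=sheet_name)
df["toppings"] = ""   # new empty column

# mode="a" keeps the other sheets; "replace" swaps out only this sheet.
with pd.ExcelWriter(path, engine="openpyxl", mode="a", if_sheet_exists="replace") as writer:
    df.to_excel(writer, sheet_name=sheet_name, index=False)

# Read everything back to confirm the result.
result = pd.read_excel(path, sheet_name=None)
```

The replaced sheet is rebuilt from the DataFrame, so any cell formatting on that sheet is not carried over; the other sheets are left as they were.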

Read each excel sheet as a different dataframe in Python

I have an excel file with 40 sheet_names. I want to read each sheet to a different dataframe, so I can export an xlsx file for each sheet.
Instead of writing all the sheet names one by one, I want to create a loop that gets all the sheet names and passes each one as the sheet_name argument of pandas.read_excel.
I am trying to avoid this:
df1 = pd.read_excel(r'C:\Users\filename.xlsx', sheet_name= 'Sheet1');
df2 = pd.read_excel(r'C:\Users\filename.xlsx', sheet_name= 'Sheet2');
....
df40 = pd.read_excel(r'C:\Users\filename.xlsx', sheet_name= 'Sheet40');
thank you all guys
Specifying sheet_name as None with read_excel reads all worksheets and returns a dict of DataFrames.
import pandas as pd
file = r'C:\Users\filename.xlsx'
xl = pd.read_excel(file, sheet_name=None)
sheets = xl.keys()
for sheet in sheets:
    xl[sheet].to_excel(f"{sheet}.xlsx")
I think this is what you are looking for.
import pandas as pd
xlsx = pd.read_excel('file.xlsx', sheet_name=None, header=None)
for sheet in xlsx.keys():
    xlsx[sheet].to_excel(sheet + '.xlsx', header=False, index=False)

How to read and compare two excel files with multiple worksheets?

I have two Excel files, each with 10 worksheets. I want to read each worksheet, compare the two versions, and write the results to a third Excel file, which should also contain multiple worksheets.
The below program works for single worksheet
import pandas as pd
df1 = pd.read_excel('zyx_5661.xlsx')
df2 = pd.read_excel('zyx_5662.xlsx')
df1.rename(columns= lambda x : x + '_file1', inplace=True)
df2.rename(columns= lambda x : x + '_file2', inplace=True)
df_join = df1.merge(right = df2, left_on = df1.columns.to_list(), right_on = df2.columns.to_list(), how = 'outer')
with pd.ExcelWriter('xl_join_diff.xlsx') as writer:
    df_join.to_excel(writer, sheet_name='testing', index=False)
How can I optimize it to work with multiple worksheets?
I think this should achieve what you need. Loop through each sheet name (assuming they're named the same across both Excel documents; if not, you can use numbers instead). Write the new output to a new sheet, and save the Excel document.
import pandas as pd
writer = pd.ExcelWriter('xl_join_diff.xlsx')
for sheet in ['sheet1', 'sheet2', 'sheet3']:  # list of sheet names
    # Pull in data for each sheet, and merge together.
    df1 = pd.read_excel('zyx_5661.xlsx', sheet_name=sheet)
    df2 = pd.read_excel('zyx_5662.xlsx', sheet_name=sheet)
    df1.rename(columns=lambda x: x + '_file1', inplace=True)
    df2.rename(columns=lambda x: x + '_file2', inplace=True)
    df_join = df1.merge(right=df2, left_on=df1.columns.to_list(),
                        right_on=df2.columns.to_list(), how='outer')
    df_join.to_excel(writer, sheet_name=sheet, index=False)  # write to excel as new sheet
writer.close()  # save excel document once all sheets have been done
You can use a loop to read the files and sheets:
import pandas as pd
writer = pd.ExcelWriter('multiple.xlsx', engine='xlsxwriter')
# create writer for writing all sheets in 1 file
list_files = ['zyx_5661.xlsx', 'zyx_5662.xlsx']
count_sheets = 0
for file_name in list_files:
    file = pd.ExcelFile(file_name)
    for sheet_name in file.sheet_names:
        df = pd.read_excel(file, sheet_name)
        # ... you can do your process
        count_sheets = count_sheets + 1
        df.to_excel(writer, sheet_name=f'Sheet-{count_sheets}')
writer.close()
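If the sheet names are not known in advance, they can be taken from the first workbook, and the indicator option of merge can mark which file each row came from. This is a sketch under assumptions: both files share the same sheet names and column headers, and the demo files, their contents, and the "source" column name are hypothetical. It also swaps the suffix-renaming join used above for a plain outer merge on all shared columns.

```python
import os
import tempfile

import pandas as pd

workdir = tempfile.mkdtemp()
file1 = os.path.join(workdir, "zyx_5661.xlsx")
file2 = os.path.join(workdir, "zyx_5662.xlsx")
out = os.path.join(workdir, "xl_join_diff.xlsx")

# Demo inputs: two workbooks with the same two sheets (made-up data).
for path, offset in [(file1, 0), (file2, 1)]:
    with pd.ExcelWriter(path) as writer:
        for sheet in ["sheet1", "sheet2"]:
            pd.DataFrame({"id": [1 + offset, 2 + offset], "val": ["x", "x"]}).to_excel(
                writer, sheet_name=sheet, index=False
            )

# sheet_name=None returns {sheet name: DataFrame} for every sheet.
book1 = pd.read_excel(file1, sheet_name=None)
book2 = pd.read_excel(file2, sheet_name=None)

diff_counts = {}
with pd.ExcelWriter(out) as writer:
    for sheet, df1 in book1.items():
        df2 = book2[sheet]
        # Outer merge on all shared columns; "source" holds
        # left_only / right_only / both for each row.
        joined = df1.merge(df2, how="outer", indicator="source")
        joined["source"] = joined["source"].astype(str)
        joined.to_excel(writer, sheet_name=sheet, index=False)
        # Count rows that appear in only one of the two files.
        diff_counts[sheet] = int((joined["source"] != "both").sum())
```

Rows marked left_only or right_only are the ones that differ between the two files for that sheet.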

How do I write to individual excel sheets for each dataframe generated from for loop?

I have input data in the form of a dictionary consisting of 3 dataframes of numbers. I wish to iterate through each dataframe with some operations and then finally write results for each dataframe to excel.
The following code works fine except that it only writes the resulting dataframe for the last key in the dictionary.
How do I get results for all 3 dataframes written to individual sheets?
Input_Data={'k1':test1,'k2':test24,'k3':test3}
for v in Input_Data.values():
    df1 = v[126:236]
    df=df1.sort_index(ascending=False)
    Indexer=df.columns.tolist()
    df = [(pd.concat([df[Indexer[0]],df[Indexer[num]]],axis=1)) for num in [1,2,3,4,5,6]]
    df = [(df[num].astype(str).agg(','.join, axis=1)) for num in [0,1,2,3,4,5]]
    df=pd.DataFrame(df)
    dff=pd.concat([df.loc[num] for num in [0,1,2,3,4,5]])
    dff.to_excel('test.xlsx',index=False, header=False)
Your first issue is that with each iteration of the loop you are opening a new file.
As per pandas documentation:
"Multiple sheets may be written to by specifying unique sheet_name. With all data written to the file it is necessary to save the changes. Note that creating an ExcelWriter object with a file name that already exists will result in the contents of the existing file being erased."
Second, you are not providing a variable sheet name, so each time the data is being re-written as the same sheet.
An example solution, with ExcelWriter
#df1, df2, df3 - dataframes
input_data = {
    'sheet_name1': df1,
    'sheet_name2': df2,
    'sheet_name3': df3
}
# Initiate ExcelWriter - use the xlsxwriter engine
writer = pd.ExcelWriter('multiple_sheets.xlsx', engine='xlsxwriter')
# Iterate over input_data dictionary
for sheet_name, df in input_data.items():
    # Perform operations here
    # Write each dataframe to a different worksheet.
    df.to_excel(writer, sheet_name=sheet_name)
# Finally, save ExcelWriter to file
writer.close()
Note 1. You only initiate and save the ExcelWriter object once, the iterations only add sheets to that object
Note 2. Compared to your code, the variable "sheet_name" is provided to the "to_excel()" function
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
# Write each dataframe to a different worksheet.
for sheet_name, df in zip(sheet_names, dfs):
    df.to_excel(writer, sheet_name=sheet_name)
# Close the Pandas Excel writer and output the Excel file.
writer.close()
Try to change the file name at each iteration:
Input_Data={'k1':test1,'k2':test24,'k3':test3}
file_number = 1
for v in Input_Data.values():
    df1 = v[126:236]
    df=df1.sort_index(ascending=False)
    Indexer=df.columns.tolist()
    df = [(pd.concat([df[Indexer[0]],df[Indexer[num]]],axis=1)) for num in [1,2,3,4,5,6]]
    df = [(df[num].astype(str).agg(','.join, axis=1)) for num in [0,1,2,3,4,5]]
    df=pd.DataFrame(df)
    dff=pd.concat([df.loc[num] for num in [0,1,2,3,4,5]])
    # test1.xlsx, test2.xlsx, ... one file per dataframe
    dff.to_excel(f"test{file_number}.xlsx", index=False, header=False)
    file_number = file_number + 1

How to separate multiple data frames in pd.read_html() when saving to excel using Python

I am attempting to save data from multiple tables brought in through pd.read_html(). If I print df, I can see it captured all the data, but when saving the data it is only saving the first table to excel. How do I separate out the tables so I can save each one to a separate sheet in excel (i.e. Quarterly Income Statement on sheet1, Annual Income Statement on sheet2, etc.). Below is my code. Any help is appreciated.
dfs = pd.read_html('https://www.google.com/finance?q=googl&fstype=ii', flavor='html5lib')
writer = pd.ExcelWriter('output.xlsx', engine='xlsxwriter')
for df in dfs:
    df.to_excel(writer, sheet_name='Sheet1')
writer.close()
You can iterate over your list and write each table to its own sheet of the same workbook:
import pandas as pd
dfs = pd.read_html('https://www.google.com/finance?q=googl&fstype=ii', flavor='html5lib')
# Create a Pandas Excel writer.
xlWriter = pd.ExcelWriter('myworkbook.xlsx', engine='xlsxwriter')
# Write each df to its own sheet
for i, df in enumerate(dfs):
    df.to_excel(xlWriter, sheet_name='Sheet{}'.format(i))
# Close the writer and output the Excel file (mandatory!)
xlWriter.close()
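To get descriptive sheet names instead of Sheet0, Sheet1, and so on, the list returned by read_html can be paired with a list of names. The following is a sketch that assumes the order of the tables on the page is known. The names and the stand-in DataFrames (used here in place of a live read_html call) are hypothetical. Excel limits sheet names to 31 characters, so each name is truncated to be safe.

```python
import os
import tempfile

import pandas as pd

# Stand-ins for the list that pd.read_html(...) returns (made-up data).
dfs = [
    pd.DataFrame({"Revenue": [100, 110]}),
    pd.DataFrame({"Revenue": [400]}),
]
# Hypothetical names, in the same order as the tables on the page.
names = ["Quarterly Income Statement", "Annual Income Statement"]

path = os.path.join(tempfile.mkdtemp(), "output.xlsx")
with pd.ExcelWriter(path) as writer:   # the context manager saves on exit
    for name, df in zip(names, dfs):
        # Excel sheet names may not exceed 31 characters.
        df.to_excel(writer, sheet_name=name[:31], index=False)

# Read the sheet names back to confirm one sheet per table.
written = pd.ExcelFile(path).sheet_names
```

Using the writer as a context manager removes the need for an explicit save or close call at the end.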
