Adding data from different data frame to excel - python

I want to take the data frames I have in a list and add each one to an existing Excel file as its own tab.
To test this, I tried it with one data frame. There are no errors, but when I open the Excel file it says it is corrupt. I can recover the information, but I'd rather not have to do that every time. I believe it would also fail if I looped through my list to do this.
import os,glob
import pandas as pd
from openpyxl import load_workbook
master_file='combined_csv.xlsx'
#set the directory
os.chdir(r'C:\Users\test')
#set the type of file
extension = 'csv'
#take all files with the csv extension into an array
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]
col_to_keep=["Name",
"Area (ft)",
"Length (ft)",
"Center (ft)",
"ID",
"SyncID"]
combine_csv = pd.concat([pd.read_csv(f, delimiter=';', usecols=col_to_keep) for f in all_filenames])
combine_csv.to_excel(master_file, index=False,sheet_name='All')
# Defining the path which excel needs to be created
# There must be a pre-existing excel sheet which can be updated
FilePath = r'C:\Users\test'
# Generating workbook
ExcelWorkbook = load_workbook(FilePath)
# Generating the writer engine
writer = pd.ExcelWriter(FilePath, engine = 'openpyxl')
# Assigning the workbook to the writer engine
writer.book = ExcelWorkbook
# Creating first dataframe
drip_file = pd.read_csv(all_filenames[0], delimiter = ';', usecols=col_to_keep)
SimpleDataFrame1=pd.DataFrame(data=drip_file)
print(SimpleDataFrame1)
# Adding the DataFrames to the excel as a new sheet
SimpleDataFrame1.to_excel(writer, sheet_name = 'Drip')
writer.save()
writer.close()
It seems like it runs fine with no errors but when I open the excel file I get the error shown below.
Does anyone see something wrong with the code that would cause excel to give me this error?
Thank you in advance

Your code knows it is writing to the same workbook, but to use the writer you also need to tell pandas which sheets already exist:
book = load_workbook(your_destination_file)
writer = pd.ExcelWriter(your_destination_file, engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)  # tells pandas/python what the sheet names are
Your_dataframe.to_excel(writer, sheet_name=DesiredSheetname)
writer.save()
Also, if you have pivots, pictures, external connections in the document they will be deleted and could be what is causing the corruption.
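On recent pandas versions the same result can usually be had without assigning writer.book or writer.sheets, by opening the writer in append mode. The following is a minimal sketch, assuming pandas 1.3 or later with openpyxl installed. The temporary file location, the "Spray" sheet, and the sample frames are made up for illustration and stand in for the CSV-derived data in the question.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

# Hypothetical stand-ins for the frames read from the CSV files.
frames = {
    "Drip": pd.DataFrame({"Name": ["a", "b"], "ID": [1, 2]}),
    "Spray": pd.DataFrame({"Name": ["c"], "ID": [3]}),
}

# Assumption: a throwaway location instead of the real working directory.
master_file = os.path.join(tempfile.mkdtemp(), "combined_csv.xlsx")

# Create the workbook with the combined sheet first.
pd.concat(list(frames.values())).to_excel(master_file, index=False, sheet_name="All")

# mode="a" loads the existing workbook and keeps its sheets; leaving the
# with-block saves and closes the file once.
with pd.ExcelWriter(master_file, engine="openpyxl", mode="a") as writer:
    for sheet_name, frame in frames.items():
        frame.to_excel(writer, sheet_name=sheet_name, index=False)

sheet_names = load_workbook(master_file).sheetnames
print(sheet_names)
```

Using the writer as a context manager also avoids calling both save() and close() by hand.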

Related

Python: Image/Shape disappears from Excel sheet when DataFrame is exported to another Sheet

This is a miniature view of the problem I am facing. I have an Excel Macro file (.xlsm) in which Sheet named Shape has a Block Arrow in it (Excel -> Insert -> Shapes -> Block Arrows). See the image:
Now, I want to export a DataFrame into another Sheet of this Excel Macro File. DataFrame gets exported successfully, but the Block Arrow shape, as shown in the image above disappears from the Shape sheet.
Following is my code:
Template.xlsm is an .xlsm file with just one sheet (named Shape), as shown in picture above.
import os
import openpyxl
import pandas as pd
from shutil import copyfile
df = pd.DataFrame({'A':[1,2,3],'B':[7,8,9]})
copyfile(os.path.join(path,'Template.xlsm'),
os.path.join(path,'Output.xlsm'))
excel_file_path = os.path.join(path,'Output.xlsm')
writer = pd.ExcelWriter(excel_file_path, engine='openpyxl')
writer.book = openpyxl.load_workbook(excel_file_path, keep_vba= True)
writer.sheets = {ws.title: ws for ws in writer.book.worksheets}
df.to_excel(writer, sheet_name= 'Data',index=False)
workbook = writer.book
workbook.filename = excel_file_path
writer.save()
writer.close()
The Shape sheet in Output.xlsm doesn't have this Block Arrow:
This SO post says that we need to install the Pillow Python package and then openpyxl will work properly, but that doesn't solve my problem.
Does anyone have some suggestions?
Thanks.

How to append a dataframe to an existing excel sheet (without overwriting it) after openpyxl update?

I have an existing excel file which I have to update every week with new data, appending it to the last line of an existing sheet. I was accomplishing this in this manner, following the solution provided in this post How to write to an existing excel file without overwriting data (using pandas)?
import pandas as pd
import openpyxl
from openpyxl import load_workbook
book = load_workbook(excel_path)
writer = pd.ExcelWriter(excel_path, engine = 'openpyxl', mode = 'a')
writer.book = book
## ExcelWriter for some reason uses writer.sheets to access the sheet.
## If you leave it empty it will not know that sheet Main is already there
## and will create a new sheet.
ws = book.worksheets[1]
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df.to_excel(writer, 'Preço_por_quilo', startrow = len(ws["C"]), header = False, index = False)
writer.save()
writer.close()
This code was running ok until today, when it returned the following error:
ValueError: Sheet 'Preço_por_quilo' already exists and if_sheet_exists is set to 'error'.
which apparently results from the latest update of the openpyxl package, which added the "if_sheet_exists" argument to the ExcelWriter function.
How can I correct this code, in order to append my data to the last line of the sheet?
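if_sheet_exists is an argument of pd.ExcelWriter, not of df.to_excel, so it has to be passed when the writer is created. Note that 'replace' discards the existing sheet before writing; to append below the existing rows, 'overlay' (available in pandas 1.4 and later) writes into the existing sheet instead:
writer = pd.ExcelWriter(excel_path, engine = 'openpyxl', mode = 'a', if_sheet_exists = 'overlay')
df.to_excel(writer, sheet_name = 'Preço_por_quilo', startrow = len(ws["C"]), header = False, index = False)
More information on its use can be found here: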
https://pandas.pydata.org/docs/reference/api/pandas.ExcelWriter.html
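As an illustration of appending rows to the end of an existing sheet, here is a minimal sketch, assuming pandas 1.4 or later (for if_sheet_exists="overlay") and openpyxl. The file location, column names, and sample rows are hypothetical; only the sheet name comes from the question.

```python
import os
import tempfile

import pandas as pd

# Assumption: a throwaway workbook standing in for the weekly file.
excel_path = os.path.join(tempfile.mkdtemp(), "prices.xlsx")
sheet = "Preço_por_quilo"

# Existing sheet: a header row plus two data rows.
pd.DataFrame({"item": ["a", "b"], "price": [1.0, 2.0]}).to_excel(
    excel_path, sheet_name=sheet, index=False
)

new_rows = pd.DataFrame({"item": ["c"], "price": [3.0]})

# if_sheet_exists is passed to ExcelWriter; "overlay" writes into the
# existing sheet instead of raising an error or replacing it.
with pd.ExcelWriter(
    excel_path, engine="openpyxl", mode="a", if_sheet_exists="overlay"
) as writer:
    # max_row is the last used row (1-based) and startrow is 0-based,
    # so writing starts on the first empty row.
    start = writer.sheets[sheet].max_row
    new_rows.to_excel(
        writer, sheet_name=sheet, startrow=start, header=False, index=False
    )

result = pd.read_excel(excel_path, sheet_name=sheet)
print(result)
```

Reading the row count from writer.sheets avoids loading the workbook a second time with load_workbook.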

Pandas To Excel write to an existing xlsx file

I am trying to write a dataframe to an existing Excel worksheet on one workbook. I have other worksheets in this excel workbook which should not be affected. The worksheet I am looking to overwrite is a tab called 'Data'. The code I have below:
df= pd.read_sql(sql='EXEC [dbo].[spData]', con=engine)
excel_file_path = "C:/Shared/Test.xlsx"
book = load_workbook(excel_file_path)
writer = ExcelWriter(excel_file_path, engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df.to_excel(writer, sheet_name='Data', index=False, header=[
'A','B','C','D','E','F'])
writer.save()
The code has been running for ages in debug mode with no errors, but I am not sure it does what I expect. The file now shows 0 KB, so it seems to have removed the other worksheets; the original file was 55,939 KB. I was previously able to use ExcelWriter with the 'openpyxl' engine to write a new sheet to a workbook, but in the code above I want to replace the content of an existing worksheet with the data from my dataframe.
This worked after adding mode='a':
writer = ExcelWriter(excel_file_path, engine='openpyxl', mode='a')
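To replace the contents of one sheet while leaving the others in place, append mode can be combined with if_sheet_exists="replace". This is a minimal sketch, assuming pandas 1.3 or later with openpyxl. The temporary file, the "Other" sheet, and the sample frames are hypothetical stand-ins for the real workbook and the SQL result.

```python
import os
import tempfile

import pandas as pd

# Assumption: a throwaway workbook instead of C:/Shared/Test.xlsx.
excel_file_path = os.path.join(tempfile.mkdtemp(), "Test.xlsx")

# Build a workbook with one sheet to keep and one sheet to refresh.
with pd.ExcelWriter(excel_file_path, engine="openpyxl") as writer:
    pd.DataFrame({"note": ["keep me"]}).to_excel(
        writer, sheet_name="Other", index=False
    )
    pd.DataFrame({"A": [1, 2, 3]}).to_excel(
        writer, sheet_name="Data", index=False
    )

df = pd.DataFrame({"A": [10], "B": [20]})  # stand-in for the SQL result

# mode="a" keeps the existing workbook; "replace" drops the old "Data"
# sheet and writes a fresh one under the same name.
with pd.ExcelWriter(
    excel_file_path, engine="openpyxl", mode="a", if_sheet_exists="replace"
) as writer:
    df.to_excel(writer, sheet_name="Data", index=False)

# sheet_name=None returns a dict of every sheet in the workbook.
sheets = pd.read_excel(excel_file_path, sheet_name=None)
print(list(sheets))
```

Any cell formatting on the replaced sheet is not carried over, since the sheet is recreated.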

How to load and update a large Excel file (>15 MB) using openpyxl without read_only=True

I am new to Python. I am trying to load a large Excel file (15 MB) with 3 sheets and update the third one. Because I need to edit that sheet, I load the workbook with openpyxl.load_workbook() without read_only=True, but my system hangs while loading. Could you please help? I don't want to use read_only=True, because I want to edit the third sheet.
Thanks,
import pandas as pd
from openpyxl import load_workbook
meta_df = pd.read_csv('metafile')
file = 'file.xlsx'
book = load_workbook(file)
writer = pd.ExcelWriter(file, engine='openpyxl')
writer.book = book
writer.sheets = dict((wsh.title, wsh) for wsh in book.worksheets)
meta_df.to_excel(writer, 'meta_data', index=False, header=False, startrow=1)
writer.save()

Add new .xlsx to an existing .xlsx in sheet(tab)

My code looks like this:
#After performing some operation using pandas I have written df to the .xlsx
df.to_excel('file5.xlsx',index=False) # This excel has a single tab(sheet) inside
Then I have another .xlsx file (already provided), Final.xlsx, that has multiple sheets inside it: file1, file2, file3, file4. I want to add the newly created file5.xlsx to Final.xlsx as a new sheet after sheet file4.
The answer below, provided by Anky, adds file5.xlsx to Final.xlsx as a sheet, but the content of sheets file1 to file4 is affected: the formatting is broken and data is missing.
import pandas
from openpyxl import load_workbook
book = load_workbook('foo.xlsx')
writer = pandas.ExcelWriter('foo.xlsx', engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df1 = pandas.read_excel('file5.xlsx')  # pandas is imported without the pd alias here
df1.to_excel(writer, "new",index=False)
writer.save()
I need help fixing this.
I have asked this in separate question - Data missing, format changed in .xlsx file having multiple sheets using pandas, openpyxl while adding new sheet in existing .xlsx file
import pandas
from openpyxl import load_workbook
book = load_workbook('foo.xlsx')
writer = pandas.ExcelWriter('foo.xlsx', engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df.to_excel(writer, "file5",index=False)
writer.save()
The sheet name can be whatever you like, e.g. file5.
