I have code from a while ago that I am re-using for a new task. The task is to write a new DataFrame into a new sheet of an existing Excel file. There is one part of the code that I do not understand, yet it is what makes the code "work".
Working code:
from openpyxl import load_workbook
import pandas as pd
file = r'YOUR_PATH_TO_EXCEL_HERE'
df1 = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})
book = load_workbook(file)
writer = pd.ExcelWriter(file, engine='openpyxl')
writer.book = book # <---------------------------- piece i do not understand
df1.to_excel(writer, sheet_name='New', index=None)
writer.save()
The little line writer.book = book has me stumped. Without it, the Excel file loses every sheet except the one named in the sheet_name= parameter of df1.to_excel.
I looked at xlsxwriter's documentation as well as openpyxl's, but cannot figure out why that line gives me my expected output. Any ideas?
Edit: I believe this post is where I got the original idea from.
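In the source code of ExcelWriter, the openpyxl engine initializes an empty workbook and removes its default sheet. Saving then overwrites the file with that empty workbook plus whatever you wrote, which is why you need to assign the loaded workbook explicitly: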
class _OpenpyxlWriter(ExcelWriter):
    engine = 'openpyxl'
    supported_extensions = ('.xlsx', '.xlsm')

    def __init__(self, path, engine=None, **engine_kwargs):
        # Use the openpyxl module as the Excel writer.
        from openpyxl.workbook import Workbook
        super(_OpenpyxlWriter, self).__init__(path, **engine_kwargs)
        # Create workbook object with default optimized_write=True.
        self.book = Workbook()
        # Openpyxl 1.6.1 adds a dummy sheet. We remove it.
        if self.book.worksheets:
            try:
                self.book.remove(self.book.worksheets[0])
            except AttributeError:
                # compat
                self.book.remove_sheet(self.book.worksheets[0])
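For comparison, here is a minimal sketch of the same task using the writer's append mode, assuming a pandas version in which ExcelWriter accepts mode="a" and works as a context manager. In that mode the openpyxl engine loads the existing workbook itself, so no manual writer.book assignment is needed. The temporary file and the sheet name "Existing" are made up for the demonstration.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

# Hypothetical throwaway file so the sketch does not touch real data.
path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")

# Create a workbook that already contains one sheet.
pd.DataFrame({"Data": [1, 2, 3]}).to_excel(path, sheet_name="Existing", index=False)

df1 = pd.DataFrame({"Data": [10, 20, 30, 20, 15, 30, 45]})

# mode="a" makes the openpyxl writer load the existing workbook instead of
# starting from an empty Workbook(), so the other sheets are kept.
with pd.ExcelWriter(path, engine="openpyxl", mode="a") as writer:
    df1.to_excel(writer, sheet_name="New", index=False)

# Re-open the file to check which sheets it now contains.
sheet_names = load_workbook(path).sheetnames
print(sheet_names)
```

Leaving the with block saves and closes the file, which also avoids calling writer.save() explicitly.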
Currently, I want to take the data frames I have in a list and add each of them to an existing Excel file as its own tab.
To test this out, I tried it with one data frame. There are no errors, but when I open the Excel file it says it is corrupt. I can recover the information, but I would rather not have to do that every time, and I believe it would fail if I looped through my list.
import os,glob
import pandas as pd
from openpyxl import load_workbook
master_file='combined_csv.xlsx'
#set the directory
os.chdir(r'C:\Users\test')
#set the type of file
extension = 'csv'
#take all files with the csv extension into an array
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]
col_to_keep=["Name",
"Area (ft)",
"Length (ft)",
"Center (ft)",
"ID",
"SyncID"]
combine_csv = pd.concat([pd.read_csv(f, delimiter=';', usecols=col_to_keep) for f in all_filenames])
combine_csv.to_excel(master_file, index=False,sheet_name='All')
# Path of the Excel file to update
# The file must already exist so that it can be loaded and updated
FilePath = os.path.join(r'C:\Users\test', master_file)
# Generating workbook
ExcelWorkbook = load_workbook(FilePath)
# Generating the writer engine
writer = pd.ExcelWriter(FilePath, engine = 'openpyxl')
# Assigning the workbook to the writer engine
writer.book = ExcelWorkbook
# Creating first dataframe
drip_file = pd.read_csv(all_filenames[0], delimiter = ';', usecols=col_to_keep)
SimpleDataFrame1=pd.DataFrame(data=drip_file)
print(SimpleDataFrame1)
# Adding the DataFrames to the excel as a new sheet
SimpleDataFrame1.to_excel(writer, sheet_name = 'Drip')
writer.save()
writer.close()
It seems like it runs fine with no errors but when I open the excel file I get the error shown below.
Does anyone see something wrong with the code that would cause excel to give me this error?
Thank you in advance
Your code knows it's writing data to the same workbook, but to use the writer you also need to tell pandas what the existing sheet names are:
book = load_workbook(your_destination_file)
writer = pd.ExcelWriter(your_destination_file, engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)  # tells pandas/python what the sheet names are
Your_dataframe.to_excel(writer, sheet_name=DesiredSheetname)
writer.save()
Also, if the document contains pivot tables, pictures, or external connections, they will be deleted on save, which could be what is causing the corruption.
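As an alternative to going through pd.ExcelWriter at all, the sketch below adds one sheet per DataFrame with openpyxl directly, using its dataframe_to_rows helper. This is a swapped-in technique, not the approach from the answer above. The file name, the sheet names, and the sample frames are hypothetical stand-ins for the CSV data in the question.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows

# Hypothetical throwaway workbook standing in for combined_csv.xlsx.
path = os.path.join(tempfile.mkdtemp(), "combined_demo.xlsx")
pd.DataFrame({"Name": ["a", "b"], "ID": [1, 2]}).to_excel(
    path, sheet_name="All", index=False
)

# Hypothetical per-file frames; in the question these come from pd.read_csv.
frames = {
    "Drip": pd.DataFrame({"Name": ["x"], "ID": [10]}),
    "Spray": pd.DataFrame({"Name": ["y", "z"], "ID": [20, 30]}),
}

book = load_workbook(path)
for sheet_name, frame in frames.items():
    ws = book.create_sheet(title=sheet_name)
    # Each yielded row is a list of cell values; the first one is the header.
    for row in dataframe_to_rows(frame, index=False, header=True):
        ws.append(row)
book.save(path)  # a single save after all sheets have been added

result = load_workbook(path)
print(result.sheetnames)
```

Saving once at the end keeps the write path simple: the workbook is loaded, modified in memory, and written back in one step.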
I have an existing excel file which I have to update every week with new data, appending it to the last line of an existing sheet. I was accomplishing this in this manner, following the solution provided in this post How to write to an existing excel file without overwriting data (using pandas)?
import pandas as pd
import openpyxl
from openpyxl import load_workbook
book = load_workbook(excel_path)
writer = pd.ExcelWriter(excel_path, engine = 'openpyxl', mode = 'a')
writer.book = book
## ExcelWriter for some reason uses writer.sheets to access the sheet.
## If you leave it empty it will not know that sheet Main is already there
## and will create a new sheet.
ws = book.worksheets[1]
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df.to_excel(writer, 'Preço_por_quilo', startrow = len(ws["C"]), header = False, index = False)
writer.save()
writer.close()
This code was running ok until today, when it returned the following error:
ValueError: Sheet 'Preço_por_quilo' already exists and if_sheet_exists is set to 'error'.
which apparently results from a recent update of pandas (version 1.3.0), which added the "if_sheet_exists" argument to ExcelWriter.
How can I correct this code, in order to append my data to the last line of the sheet?
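if_sheet_exists is an argument of pd.ExcelWriter, not of df.to_excel, so it has to be passed when the writer is created. Note that 'replace' discards the existing contents of the sheet; to keep them and append below, 'overlay' (available from pandas 1.4) is the option to use:
writer = pd.ExcelWriter(excel_path, engine = 'openpyxl', mode = 'a', if_sheet_exists = 'overlay')
More information on its use can be found here: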
https://pandas.pydata.org/docs/reference/api/pandas.ExcelWriter.html
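A self-contained sketch of appending rows below the existing data, assuming pandas 1.4 or later (where if_sheet_exists="overlay" is available) and openpyxl. The temporary file and the sample columns are made up; only the sheet name comes from the question.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

# Hypothetical throwaway workbook standing in for the weekly file.
path = os.path.join(tempfile.mkdtemp(), "weekly_demo.xlsx")
sheet = "Preço_por_quilo"

# Existing data: one header row plus two data rows.
pd.DataFrame({"item": ["a", "b"], "price": [1.0, 2.0]}).to_excel(
    path, sheet_name=sheet, index=False
)

new_rows = pd.DataFrame({"item": ["c"], "price": [3.0]})

# Number of rows already used; startrow is 0-based, so this is the first free row.
start = load_workbook(path)[sheet].max_row

# "overlay" writes into the existing sheet without clearing it first.
with pd.ExcelWriter(
    path, engine="openpyxl", mode="a", if_sheet_exists="overlay"
) as writer:
    new_rows.to_excel(
        writer, sheet_name=sheet, startrow=start, header=False, index=False
    )

# Read everything back as plain lists to inspect the result.
ws = load_workbook(path)[sheet]
values = [[cell.value for cell in row] for row in ws.iter_rows()]
print(values)
```

Reading max_row before opening the writer keeps the sketch independent of how a given pandas version exposes writer.sheets.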
The problem is the following:
I'm loading an existing Excel file as follows:
import pandas as pd
from openpyxl import load_workbook
book = load_workbook('template.xlsx')
writer = pd.ExcelWriter('template.xlsx', engine='openpyxl')
writer.book = book
Then I perform some modifications to the file and save it with
writer.save()
Since this procedure is part of a bigger pipeline, it would be beneficial to be able to rename the file template.xlsx before saving the modifications. Is that possible?
Thanks in advance for any suggestions!
Why not just pass a new name to pd.ExcelWriter(...)?
import pandas as pd
from openpyxl import load_workbook
book = load_workbook('template.xlsx')
writer = pd.ExcelWriter('foo.xlsx', engine='openpyxl')
writer.book = book
writer.save()
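If the modifications are made on the openpyxl workbook itself, the same result can be had without pandas, since Workbook.save accepts any target path. A minimal sketch, assuming openpyxl only; the temporary folder and the cell edit are hypothetical, and the file names mirror the ones above.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

# Hypothetical throwaway folder with a stand-in template.xlsx.
folder = tempfile.mkdtemp()
template_path = os.path.join(folder, "template.xlsx")
output_path = os.path.join(folder, "foo.xlsx")

template = Workbook()
template.active["A1"] = "original"
template.save(template_path)

# Load the template, modify it in memory, and save under a new name.
book = load_workbook(template_path)
book.active["A1"] = "modified"
book.save(output_path)  # template.xlsx on disk is left untouched

original_value = load_workbook(template_path).active["A1"].value
modified_value = load_workbook(output_path).active["A1"].value
print(original_value, modified_value)
```

Recent pandas releases have also dropped writer.save() in favor of writer.close() or a with block, so this route sidesteps that difference.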
I am using openpyxl to add some data to an existing Excel file, but unfortunately it also changes the formatting of my chart (border, background, and curve colors) and deletes the text box.
Does anyone know how to prevent these changes?
Thanks in advance!
A simplified version of my code is below, along with a screenshot of my Excel file, so everyone can reproduce it.
import os
import sys
import pandas as pd
from openpyxl import load_workbook
folder = r'C:\MyFolder'
filename = r'test.xlsx'
writer = pd.ExcelWriter(os.path.join(folder, filename),
engine='openpyxl',
datetime_format='dd/mm/yyyy',
date_format='dd/mm/yyyy')
writer.book = load_workbook(os.path.join(folder, filename))
writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
writer.save()
I am new to Python. I am trying to load a large Excel file (15 MB) with 3 sheets/tabs in order to update the third tab. Because I need to edit that sheet, I load the file with openpyxl.load_workbook() without read_only. My system hangs while loading. Could you please help? I don't want to use read_only=True, because I want to edit the third sheet.
Thanks,
import pandas as pd
from openpyxl import load_workbook
meta_df = pd.read_csv('metafile')
file = 'file.xlsx'
book = load_workbook(file)
writer = pd.ExcelWriter(file, engine='openpyxl')
writer.book = book
writer.sheets = dict((wsh.title, wsh) for wsh in book.worksheets)
meta_df.to_excel(writer, 'meta_data', index=False, header=False, startrow=1)
writer.save()