Pandas to openpyxl Workbook to download file in Flask - python

The goal is to save multiple dataframes to an Excel file (each dataframe as its own sheet) and download the file when the user hits the specified URL.
This is the code.
import io
import pandas as pd
from flask import send_file
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows

@app.server.route("/file/excel")
def download_excel():
    wb = Workbook()

    df1 = pd.DataFrame(...)
    sheet1 = wb.active
    sheet1.title = "Sheet1"
    for r in dataframe_to_rows(df1, index=False, header=True):
        sheet1.append(r)

    df2 = pd.DataFrame(...)
    # create a second sheet; wb.active would return Sheet1 again
    sheet2 = wb.create_sheet(title="Sheet2")
    for r in dataframe_to_rows(df2, index=False, header=True):
        sheet2.append(r)

    excel_stream = io.BytesIO()
    wb.save(excel_stream)
    excel_stream.seek(0)  # go to the beginning of the stream

    return send_file(
        excel_stream,
        mimetype='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
        attachment_filename="File.xlsx",
        as_attachment=True,
        cache_timeout=0
    )
I am getting the following error.
AttributeError: 'DatetimeArray' object has no attribute 'tolist'
df1 has a column with the datetime data type. I did some searching and found that iterating through a dataframe is not advised, and that this is what causes the error.
The alternative is to use df.to_excel(), but I don't know how to make it work with BytesIO, as I need to stream the data for download.
Question: how can I save the data to the Excel sheet without getting this error?
I have to use send_file() for flask to download the file on the client.

Converting the datetime dtype to string before appending to the Excel sheet resolved the issue. There might be a better solution, but this solved my problem.
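For the df.to_excel() route mentioned in the question, one possible approach is to write every DataFrame through pd.ExcelWriter into an io.BytesIO buffer. The following is only a sketch, not the asker's final code: the helper name build_excel_stream and the sample frames are made up for illustration, and the datetime column is assumed to be timezone-naive. In the Flask view, the returned buffer would be passed to send_file() in place of excel_stream.

```python
import io

import pandas as pd
from openpyxl import load_workbook


def build_excel_stream(frames):
    """Write each DataFrame in `frames` (sheet name -> DataFrame) to an in-memory .xlsx."""
    stream = io.BytesIO()
    # The context manager saves the workbook into the buffer on exit.
    with pd.ExcelWriter(stream, engine="openpyxl") as writer:
        for sheet_name, frame in frames.items():
            frame.to_excel(writer, sheet_name=sheet_name, index=False)
    stream.seek(0)  # rewind so the caller reads from the start
    return stream


# Hypothetical sample data; the first frame has a datetime column.
df1 = pd.DataFrame({"when": pd.to_datetime(["2021-01-01", "2021-01-02"]), "value": [1, 2]})
df2 = pd.DataFrame({"name": ["a", "b"]})

excel_stream = build_excel_stream({"Sheet1": df1, "Sheet2": df2})
sheet_names = load_workbook(excel_stream).sheetnames
excel_stream.seek(0)  # rewind again before handing the buffer to send_file()
```

Because pandas writes the cells itself, this path does not go through dataframe_to_rows, so the string conversion above should not be needed.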

Related

Adding data from different data frame to excel

What I want to do is take the data frames I have in a list and add each of them to an existing Excel file as its own tab.
To test this, I tried it with one data frame. There are no errors, but when I open the Excel file it says the file is corrupt. I can recover the information, but I would rather not have to do that every time. I believe it would fail if I looped through my list.
import os,glob
import pandas as pd
from openpyxl import load_workbook
master_file='combined_csv.xlsx'
#set the directory
os.chdir(r'C:\Users\test')
#set the type of file
extension = 'csv'
#take all files with the csv extension into an array
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]
col_to_keep = ["Name",
               "Area (ft)",
               "Length (ft)",
               "Center (ft)",
               "ID",
               "SyncID"]
combine_csv = pd.concat([pd.read_csv(f, delimiter=';', usecols=col_to_keep) for f in all_filenames])
combine_csv.to_excel(master_file, index=False,sheet_name='All')
# Defining the path which excel needs to be created
# There must be a pre-existing excel sheet which can be updated
FilePath = r'C:\Users\test'
# Generating workbook
ExcelWorkbook = load_workbook(FilePath)
# Generating the writer engine
writer = pd.ExcelWriter(FilePath, engine = 'openpyxl')
# Assigning the workbook to the writer engine
writer.book = ExcelWorkbook
# Creating first dataframe
drip_file = pd.read_csv(all_filenames[0], delimiter = ';', usecols=col_to_keep)
SimpleDataFrame1=pd.DataFrame(data=drip_file)
print(SimpleDataFrame1)
# Adding the DataFrames to the excel as a new sheet
SimpleDataFrame1.to_excel(writer, sheet_name = 'Drip')
writer.save()
writer.close()
It seems like it runs fine with no errors but when I open the excel file I get the error shown below.
Does anyone see something wrong with the code that would cause excel to give me this error?
Thank you in advance
Your code knows it is writing to the same workbook, but to use the writer you also need to tell pandas what the sheet names are:
book = load_workbook(your_destination_file)
writer = pd.ExcelWriter(your_destination_file, engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)  # tells pandas/python what the sheet names are
Your_dataframe.to_excel(writer, sheet_name=DesiredSheetname)
writer.save()
Also, if the document contains pivot tables, pictures, or external connections, they will be deleted, and that could be what is causing the corruption.
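As an alternative to assigning writer.book and writer.sheets by hand, pandas versions whose ExcelWriter accepts mode="a" can load the existing workbook themselves. The sketch below loops over a collection of DataFrames and adds each one as its own tab. The temporary file path, the sheet names, and the sample frames are placeholders, not the asker's data.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

# Hypothetical stand-ins for the master file and the list of DataFrames.
master_file = os.path.join(tempfile.mkdtemp(), "combined_csv.xlsx")
frames = {
    "Drip": pd.DataFrame({"Name": ["a"], "ID": [1]}),
    "Spray": pd.DataFrame({"Name": ["b"], "ID": [2]}),
}

# Create the workbook with its first sheet, as the question does.
pd.concat(list(frames.values())).to_excel(master_file, index=False, sheet_name="All")

# mode="a" loads the existing workbook, so the "All" sheet is kept.
with pd.ExcelWriter(master_file, engine="openpyxl", mode="a") as writer:
    for sheet_name, frame in frames.items():
        frame.to_excel(writer, sheet_name=sheet_name, index=False)

sheet_names = load_workbook(master_file).sheetnames
```

The with block saves and closes the writer exactly once, which avoids calling both writer.save() and writer.close().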

How to append a dataframe to an existing excel sheet (without overwriting it) after openpyxl update?

I have an existing excel file which I have to update every week with new data, appending it to the last line of an existing sheet. I was accomplishing this in this manner, following the solution provided in this post How to write to an existing excel file without overwriting data (using pandas)?
import pandas as pd
import openpyxl
from openpyxl import load_workbook
book = load_workbook(excel_path)
writer = pd.ExcelWriter(excel_path, engine = 'openpyxl', mode = 'a')
writer.book = book
## ExcelWriter for some reason uses writer.sheets to access the sheet.
## If you leave it empty it will not know that sheet Main is already there
## and will create a new sheet.
ws = book.worksheets[1]
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df.to_excel(writer, 'Preço_por_quilo', startrow = len(ws["C"]), header = False, index = False)
writer.save()
writer.close()
This code was running ok until today, when it returned the following error:
ValueError: Sheet 'Preço_por_quilo' already exists and if_sheet_exists is set to 'error'.
which apparently results from the latest update of the openpyxl package, which added the "if_sheet_exists" argument to the ExcelWriter function.
How can I correct this code, in order to append my data to the last line of the sheet?
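if_sheet_exists is an argument of pd.ExcelWriter, not of df.to_excel, and it was added by pandas rather than by openpyxl, so pass it when creating the writer. Note that 'replace' discards the existing contents of the sheet; to append below the existing rows, use 'overlay' (pandas 1.4 or later):
writer = pd.ExcelWriter(excel_path, engine='openpyxl', mode='a', if_sheet_exists='overlay')
df.to_excel(writer, 'Preço_por_quilo', startrow=len(ws["C"]), header=False, index=False)
More information on its use can be found here: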
https://pandas.pydata.org/docs/reference/api/pandas.ExcelWriter.html
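A self-contained sketch of appending rows to the end of an existing sheet, assuming pandas 1.4 or later (where if_sheet_exists accepts "overlay") and the openpyxl engine. The temporary path, column names, and values are invented for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical workbook path and data for illustration.
excel_path = os.path.join(tempfile.mkdtemp(), "weekly.xlsx")
sheet_name = "Preço_por_quilo"
pd.DataFrame({"week": [1, 2], "price": [10.0, 11.5]}).to_excel(
    excel_path, sheet_name=sheet_name, index=False
)

new_rows = pd.DataFrame({"week": [3, 4], "price": [12.0, 12.5]})

# if_sheet_exists="overlay" writes into the existing sheet instead of
# raising an error or replacing the sheet.
with pd.ExcelWriter(
    excel_path, engine="openpyxl", mode="a", if_sheet_exists="overlay"
) as writer:
    # max_row is the last used row (1-based), so it is also the first free 0-based row.
    first_free_row = writer.sheets[sheet_name].max_row
    new_rows.to_excel(
        writer, sheet_name=sheet_name, startrow=first_free_row, header=False, index=False
    )

result = pd.read_excel(excel_path, sheet_name=sheet_name)
```

Reading the sheet back should show the two new rows directly below the original ones, with the header written only once.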

How to load and update a large Excel file (>15 MB) using openpyxl without read_only=True

I am new to Python. I am trying to load a large Excel file (15 MB) with three sheets and update the third one. Because I need to edit that sheet, I load the workbook with openpyxl.load_workbook() without read_only=True, but my system hangs while loading. I don't want to use read_only=True because I need to edit the third sheet. Could you please help?
Thanks,
import pandas as pd
from openpyxl import load_workbook
meta_df = pd.read_csv('metafile')
file = 'file.xlsx'
book = load_workbook(file)
writer = pd.ExcelWriter(file, engine='openpyxl')
writer.book = book
writer.sheets = dict((wsh.title, wsh) for wsh in book.worksheets)
meta_df.to_excel(writer, 'meta_data', index=False, header=False, startrow=1)
writer.save()

Pandas Excel Writer using Openpyxl with existing workbook

I have code from a while ago that I am re-using for a new task. The task is to write a new DataFrame into a new sheet, into an existing excel file. But there is one part of the code that I do not understand, but it just makes the code "work".
working:
from openpyxl import load_workbook
import pandas as pd
file = r'YOUR_PATH_TO_EXCEL_HERE'
df1 = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})
book = load_workbook(file)
writer = pd.ExcelWriter(file, engine='openpyxl')
writer.book = book # <---------------------------- piece i do not understand
df1.to_excel(writer, sheet_name='New', index=None)
writer.save()
The line writer.book = book has me stumped. Without it, all other sheets in the Excel file are deleted, leaving only the sheet named in the sheet_name parameter of df1.to_excel.
I looked at xlsxwriter's documentation as well as openpyxl's, but cannot figure out why that line gives me the expected output. Any ideas?
Edit: I believe this post is where I got the original idea from.
In the source code of ExcelWriter, the openpyxl engine initializes an empty workbook and deletes all of its sheets. That's why you need to assign the existing workbook explicitly:
class _OpenpyxlWriter(ExcelWriter):
    engine = 'openpyxl'
    supported_extensions = ('.xlsx', '.xlsm')

    def __init__(self, path, engine=None, **engine_kwargs):
        # Use the openpyxl module as the Excel writer.
        from openpyxl.workbook import Workbook

        super(_OpenpyxlWriter, self).__init__(path, **engine_kwargs)

        # Create workbook object with default optimized_write=True.
        self.book = Workbook()

        # Openpyxl 1.6.1 adds a dummy sheet. We remove it.
        if self.book.worksheets:
            try:
                self.book.remove(self.book.worksheets[0])
            except AttributeError:
                # compat
                self.book.remove_sheet(self.book.worksheets[0])
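The effect described above can be observed directly. The sketch below compares the writer's default write mode with mode="a", which recent pandas versions offer as a built-in way to start from the existing workbook. The temporary file and the sheet names "Old" and "New" are placeholders chosen for this illustration.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

# Hypothetical file with one existing sheet.
file = os.path.join(tempfile.mkdtemp(), "book.xlsx")
df1 = pd.DataFrame({"Data": [10, 20, 30, 20, 15, 30, 45]})
df1.to_excel(file, sheet_name="Old", index=False)

# Default mode "w": the writer starts from an empty workbook, so "Old" is lost.
with pd.ExcelWriter(file, engine="openpyxl") as writer:
    df1.to_excel(writer, sheet_name="New", index=False)
sheets_after_write = load_workbook(file).sheetnames

# Recreate the file, then use mode "a": the existing workbook is loaded first.
df1.to_excel(file, sheet_name="Old", index=False)
with pd.ExcelWriter(file, engine="openpyxl", mode="a") as writer:
    df1.to_excel(writer, sheet_name="New", index=False)
sheets_after_append = load_workbook(file).sheetnames
```

In write mode only the newly written sheet survives, while append mode keeps the existing sheet and adds the new one after it.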

Getting the Excel file after df.to_excel(...) with pandas

I am using Pyrebase to upload my files to Firebase.
I have a DataFrame df and convert it to an Excel File as follows:
writer = ExcelWriter('results.xlsx')
excelFile = df.to_excel(writer,'Sheet1')
print(excelFile)
# Save to firebase
childRef = "path/to/results.xlsx"
storage = firebase.storage()
storage.child(childRef).put(excelFile)
However, this stores the Excel file as an Office Spreadsheet with zero bytes. If I run writer.save() then I do get the appropriate filetype (xlsx), but it is stored on my Server (which I want to avoid). How can I generate the right filetype as one would do with writer.save()?
Note: print(excelFile) returns None
It can be solved by using local memory:
from io import BytesIO
import pandas as pd
# init writer
bio = BytesIO()
writer = pd.ExcelWriter(bio, engine='xlsxwriter')
filename = "output.xlsx"
# sheets
dfValue.to_excel(writer, "sheetname")
# save the workbook into the in-memory buffer
writer.save()
bio.seek(0)
# get the excel file as bytes (answers my question)
excelFile = bio.read()
# save the excelfile to firebase
# see also issue: https://github.com/thisbejim/Pyrebase/issues/142
childRef = "/path/to/" + filename
storage = firebase.storage()
storage.child(childRef).put(excelFile)
fileUrl = storage.child(childRef).get_url(None)
According to the documentation, you should call writer.save() after df.to_excel(...).
