Getting the Excel file after df.to_excel(...) with pandas - python

I am using Pyrebase to upload my files to Firebase.
I have a DataFrame df and convert it to an Excel File as follows:
from pandas import ExcelWriter

writer = ExcelWriter('results.xlsx')
excelFile = df.to_excel(writer, 'Sheet1')
print(excelFile)

# Save to Firebase
childRef = "path/to/results.xlsx"
storage = firebase.storage()
storage.child(childRef).put(excelFile)
However, this stores the Excel file as an Office Spreadsheet with zero bytes. If I run writer.save() first, I do get the appropriate file type (xlsx), but the file is written to disk on my server (which I want to avoid). How can I produce the right file type, as writer.save() does, without saving to disk?
Note: print(excelFile) returns None

It can be solved by writing to an in-memory buffer:
import time
from io import BytesIO

import pandas as pd

# init writer backed by an in-memory buffer instead of a file on disk
bio = BytesIO()
writer = pd.ExcelWriter(bio, engine='xlsxwriter')
filename = "output.xlsx"
# sheets
df.to_excel(writer, "sheetname")
# save the workbook into the buffer
writer.save()
bio.seek(0)
# get the excel file as bytes (answers my question)
excelFile = bio.read()
# save the excel file to firebase
# see also issue: https://github.com/thisbejim/Pyrebase/issues/142
timestamp = str(int(time.time() * 1000))
childRef = "/path/to/" + filename
storage = firebase.storage()
storage.child(childRef).put(excelFile)
fileUrl = storage.child(childRef).get_url(None)

According to the documentation, you should call
writer.save()
before reading the buffer back.

Related

How to properly implement openpyxl and csvwriter in AWS with S3 bucket as input directory

I was using this code to process an excel file in Python on my local machine where input_dir was my input directory and file was just the file I wanted to grab from that directory:
input_file = input_dir + file

def excel_to_csv(input_file):
    # open workbook and store excel object
    excel = openpyxl.load_workbook(input_file)
    # select active sheet
    sheet = excel["PUBLISH"]
    # create writer object
    col = csv.writer(open("tt.csv", 'w', newline=""))
    # write data to csv
    for r in sheet.rows:
        col.writerow([cell.value for cell in r])

# Convert CSV to dataframe
excel_to_csv(input_file)
jpm = pd.DataFrame(pd.read_csv("tt.csv", header=11, usecols=[*range(1, 16)]))
However when I tried to migrate this to AWS using an S3 bucket as the source directory, the code fails. I know it is because I need to use an io.BytesIO object to accomplish this, but I am very much unversed in AWS and am not sure how to use openpyxl and csv.writer in the AWS environment.
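A sketch of the in-memory approach: openpyxl's load_workbook accepts any file-like object, so the workbook bytes fetched from S3 can be wrapped in io.BytesIO and the CSV written to io.StringIO, with no local files at all. The bucket/key names in the comment are hypothetical, and the S3 fetch is simulated here with an in-memory workbook so the example is self-contained:

```python
import csv
import io

from openpyxl import Workbook, load_workbook

# Simulate the bytes S3 would return; with boto3 this would be something like
# (hypothetical bucket/key names):
#   body = boto3.client("s3").get_object(Bucket="my-bucket", Key="report.xlsx")["Body"].read()
wb = Workbook()
ws = wb.active
ws.title = "PUBLISH"
ws.append(["a", "b"])
ws.append([1, 2])
buf = io.BytesIO()
wb.save(buf)
body = buf.getvalue()

# Load the workbook straight from the in-memory bytes -- no temp file needed
excel = load_workbook(io.BytesIO(body))
sheet = excel["PUBLISH"]

# Write the CSV to an in-memory text buffer instead of a local path
out = io.StringIO()
writer = csv.writer(out)
for row in sheet.rows:
    writer.writerow([cell.value for cell in row])
```

From there, out.getvalue() can be fed to pd.read_csv via io.StringIO, or uploaded back to S3.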

How can I add multiple sheets from multiple workbooks into one workbook without overwriting the whole file?

I have two excel files (.xls) in the "Files" folder. I want to take each sheet of them both and put them into one separate workbook, called masterFile.xls. The code below downloads some example files so you can see what I'm working with.
import pandas as pd
import os
import requests

resp = requests.get("https://www.ons.gov.uk/file?uri=%2femploymentandlabourmarket%2fpeopleinwork%2femploymentandemployeetypes%2fdatasets%2fsummaryoflabourmarketstatistics%2fcurrent/a01dec2021.xls")
output = open("1.xls", 'wb')
output.write(resp.content)
output.close()
resp = requests.get("https://www.ons.gov.uk/file?uri=%2femploymentandlabourmarket%2fpeopleinwork%2femploymentandemployeetypes%2fdatasets%2femploymentunemploymentandeconomicinactivityforpeopleaged16andoverandagedfrom16to64seasonallyadjusteda02sa%2fcurrent/a02sadec2021.xls")
output = open("2.xls", 'wb')
output.write(resp.content)
output.close()

cwd = os.path.abspath('')
files = os.listdir(cwd)
for file in files:
    if file.endswith('.xls'):
        excelFile = pd.ExcelFile(file)
        sheets = excelFile.sheet_names
        for sheet in sheets:
            data = pd.read_excel(excelFile, sheet_name=sheet)
            data.to_excel("masterFile.xls", sheet_name=sheet)
Each time it adds the sheet, it replaces whatever was already there instead of adding a new sheet.
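One way to avoid the overwrite, sketched with two small stand-in DataFrames: keep a single pd.ExcelWriter open for the master file and route every sheet through it, since each bare to_excel(path) call recreates the file from scratch. The target here is .xlsx rather than .xls, because writing .xls requires the long-deprecated xlwt engine:

```python
import pandas as pd

# Stand-ins for the sheets read from the two downloaded workbooks
df1 = pd.DataFrame({"x": [1, 2]})
df2 = pd.DataFrame({"y": [3, 4]})

# One writer for the whole master file, so each to_excel() call adds a sheet
# instead of replacing the file
with pd.ExcelWriter("masterFile.xlsx") as writer:
    df1.to_excel(writer, sheet_name="Sheet_A", index=False)
    df2.to_excel(writer, sheet_name="Sheet_B", index=False)

check = pd.ExcelFile("masterFile.xlsx")
print(check.sheet_names)
```

In the question's loop, the `with` block would wrap the outer `for file in files:` loop so the writer stays open across both workbooks.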

Adding data from different data frame to excel

What I want to do is take the data frames from a list and add each one to an existing Excel file as its own tab.
To test this out, I tried it with one data frame. There are no errors, but when I go to open the Excel file it says it is corrupt. I can recover the information, but I would rather not have to do that every time. I believe it would also fail if I looped through my list.
import os,glob
import pandas as pd
from openpyxl import load_workbook
master_file='combined_csv.xlsx'
#set the directory
os.chdir(r'C:\Users\test')
#set the type of file
extension = 'csv'
#take all files with the csv extension into an array
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]
col_to_keep = ["Name",
               "Area (ft)",
               "Length (ft)",
               "Center (ft)",
               "ID",
               "SyncID"]
combine_csv = pd.concat([pd.read_csv(f, delimiter=';', usecols=col_to_keep) for f in all_filenames])
combine_csv.to_excel(master_file, index=False,sheet_name='All')
# Defining the path which excel needs to be created
# There must be a pre-existing excel sheet which can be updated
FilePath = r'C:\Users\test'
# Generating workbook
ExcelWorkbook = load_workbook(FilePath)
# Generating the writer engine
writer = pd.ExcelWriter(FilePath, engine = 'openpyxl')
# Assigning the workbook to the writer engine
writer.book = ExcelWorkbook
# Creating first dataframe
drip_file = pd.read_csv(all_filenames[0], delimiter = ';', usecols=col_to_keep)
SimpleDataFrame1=pd.DataFrame(data=drip_file)
print(SimpleDataFrame1)
# Adding the DataFrames to the excel as a new sheet
SimpleDataFrame1.to_excel(writer, sheet_name = 'Drip')
writer.save()
writer.close()
It seems like it runs fine with no errors but when I open the excel file I get the error shown below.
Does anyone see something wrong with the code that would cause excel to give me this error?
Thank you in advance
Your code knows it's writing data to the same workbook, but to use the writer you also need to tell Python what the sheet names are:
book = load_workbook(your_destination_file)
writer = pd.ExcelWriter(your_destination_file, engine='openpyxl')
writer.book = book
# tell pandas what the sheet names are
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
Your_dataframe.to_excel(writer, sheet_name=DesiredSheetname)
writer.save()
Also, if you have pivots, pictures, or external connections in the document, they will be deleted, and that could be what is causing the corruption.
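On recent pandas versions (1.4+), a simpler route is to let ExcelWriter append to the existing workbook directly, which avoids the manual book/sheets wiring entirely. A sketch with a small stand-in DataFrame in place of the combined CSV data:

```python
import pandas as pd

# Stand-in for the combined CSV data from the question
df = pd.DataFrame({"Name": ["a"], "ID": [1]})
df.to_excel("combined_csv.xlsx", sheet_name="All", index=False)

# Append a second sheet to the existing workbook; if_sheet_exists="replace"
# keeps reruns from failing on a duplicate sheet name
with pd.ExcelWriter("combined_csv.xlsx", engine="openpyxl", mode="a",
                    if_sheet_exists="replace") as writer:
    df.to_excel(writer, sheet_name="Drip", index=False)
```

Note that FilePath in the question points at a directory, not a workbook file; the writer needs the full path to the .xlsx file.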

Pandas - After Successfully Writing to a New Workbook, Excel Gives Invalid File Error

When running my code with a macro workbook, I get the following error: "Excel cannot open the file 'filename.xlsm' because the file format or file extension is not valid," but when I run my code on an xlsx file, I do not get this error and the new workbook loads up fine.
I am using Pandas to read the file with the openpyxl engine. I've tried solutions with the XlsxWriter, and that hasn't helped.
Appreciate any suggestions
This is how I write to a new workbook:
writer = pd.ExcelWriter(file_path2, engine='openpyxl', mode='a', if_sheet_exists='replace')  # Append data to existing sheet
dataClean.to_excel(writer, sheet_name='Data Input', header=False, index=False)
writer.save()
writer.close()
I know this is an old post, but I recently ran into the same issue and this was the only post I found about it.
According to the documentation, writer.close() already saves the file. There is no need to call writer.save() manually.
You could also use a with statement, which closes (and saves) the writer automatically:
with pd.ExcelWriter(file_path2, engine='openpyxl', mode='a', if_sheet_exists='replace') as writer:  # Append data to existing sheet
    dataClean.to_excel(writer, sheet_name='Data Input', header=False, index=False)

Pandas to openpyxl Workbook to download file in Flask

The goal is to save multiple dataframes in an Excel file (each dataframe as its own sheet) and download the file when the user hits the specified URL.
This is the code.
@app.server.route("/file/excel")
def download_excel():
    wb = Workbook()
    df1 = pd.DataFrame(...)
    sheet1 = wb.active
    sheet1.title = "Sheet1"
    for r in dataframe_to_rows(df1, index=False, header=True):
        sheet1.append(r)
    df2 = pd.DataFrame(...)
    sheet2 = wb.active
    sheet2.title = "Sheet1"
    for r in dataframe_to_rows(df2, index=False, header=True):
        sheet2.append(r)
    excel_stream = io.BytesIO()
    wb.save(excel_stream)
    excel_stream.seek(0)  # go to the beginning of the stream
    return send_file(
        excel_stream,
        mimetype='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
        attachment_filename="File.xlsx",
        as_attachment=True,
        cache_timeout=0
    )
I am getting the following error.
AttributeError: 'DatetimeArray' object has no attribute 'tolist'
df1 has a column with a datetime data type. I did some searching and found out that iterating through a dataframe is not advised, and that is what is causing this error.
The alternative is to use df.to_excel(), but I don't know how to make it work with BytesIO as I need to stream the data to be downloaded.
Question: how can I save the data to the Excel sheet without getting this error?
I have to use send_file() for flask to download the file on the client.
Converting the datetime dtype to string before appending to the Excel sheet resolved the issue. There might be a better solution, but this solved my issue.
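The df.to_excel() route the question asks about does work with BytesIO: pd.ExcelWriter accepts a file-like object, so both frames can be written sheet by sheet into an in-memory stream with no row-by-row iteration, and the datetime column is handled natively. A sketch with two small stand-in frames:

```python
import io

import pandas as pd

# Stand-ins for the two dataframes in the question; df1 has a datetime column
df1 = pd.DataFrame({"when": pd.to_datetime(["2021-01-01", "2021-01-02"]),
                    "v": [1, 2]})
df2 = pd.DataFrame({"v": [3, 4]})

# Write both frames into one in-memory workbook, one sheet each
excel_stream = io.BytesIO()
with pd.ExcelWriter(excel_stream, engine="openpyxl") as writer:
    df1.to_excel(writer, sheet_name="Sheet1", index=False)
    df2.to_excel(writer, sheet_name="Sheet2", index=False)
excel_stream.seek(0)  # rewind before handing the stream to send_file(...)
```

The rewound excel_stream can then be passed to Flask's send_file() exactly as in the original route.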
