Pandas Export Dictionary of Data Frames to Excel - python

Hi, I am hoping someone can help me.
I have a large spreadsheet of data from which I have created a 'dictionary of data frames'. However, I am struggling to export this to a single Excel file, with each data frame on its own sheet. As this could be used by other people, I would also like to make the file export flexible (it will be a clickable exe file).
I have looked at the following posts for help but just can't seem to get my head round it:
Python - splitting dataframe into multiple dataframes based on column values and naming them with those values
Save list of DataFrames to multisheet Excel spreadsheet
My code is as follows:
# Sort the Dataframe
df.sort_values(by = 'Itinerary_Departure_Date')
# Separate Bookings By Itinerary
df_dict = dict(iter(df.groupby('Itinerary_Departure_Date')))
filepath = filedialog.asksaveasfilename(defaultextension = 'xlsx')
def frames_to_excel(df_dict, path = 'filepath'):
    # Write dictionary of dataframes to separate sheets, within 1 file.
    writer = pd.ExcelWriter(path, engine='xlsxwriter')
    for tab_name, df_dict in df_dict.items():
        df_dict.to_excel(writer, sheet_name=tab_name)
    writer.save()
Fixed it!
Went down a different rabbit hole!
# Separate Bookings By Itinerary
dict_of_itin = {k: v for k, v in df.groupby('Itinerary_Departure_Date')}
# Choose an empty Excel file from wherever it was saved
root = tk.Tk()
root.withdraw()
file_path = filedialog.askopenfilename()
book = load_workbook(file_path.replace('\\','/'))
writer = pd.ExcelWriter(file_path, engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
for df_name, df in dict_of_itin.items():
    df.to_excel(writer, sheet_name=df_name)
writer.save()
It relies on the user first saving an empty spreadsheet wherever they want; the script then writes to it.
Not as elegant, but it works! :D
Chris
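For comparison, here is a minimal sketch of the one-sheet-per-group export without the empty-spreadsheet step. It is not the poster's code: the sample bookings, the Passenger column and the sheet-name clean-up are assumptions added for illustration, and the demo writes to a temporary folder rather than a path chosen in a file dialog.

```python
import os
import re
import tempfile

import pandas as pd


def frames_to_excel(frames, path):
    """Write each DataFrame in `frames` to its own sheet of one workbook."""
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        for key, frame in frames.items():
            # Excel sheet names must be text, at most 31 characters,
            # and must not contain any of [ ] : * ? / \
            sheet_name = re.sub(r"[\[\]:*?/\\]", "_", str(key))[:31]
            frame.to_excel(writer, sheet_name=sheet_name, index=False)


# Made-up sample data standing in for the real bookings spreadsheet.
bookings = pd.DataFrame({
    "Itinerary_Departure_Date": ["2020-01-05", "2020-01-05", "2020-02-10"],
    "Passenger": ["A", "B", "C"],
})
frames = {key: group for key, group in bookings.groupby("Itinerary_Departure_Date")}

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "bookings_by_departure.xlsx")
    frames_to_excel(frames, out_path)
    # Read every sheet back to confirm what was written.
    written = pd.read_excel(out_path, sheet_name=None)

print(list(written))
```

In the real script, the path returned by filedialog.asksaveasfilename could be passed straight to frames_to_excel. The with block saves and closes the file on exit, so no separate save call is needed.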

Related

Write DataFrame to Excel Template and Save as new file

I have a dataframe and an Excel template file, which has a worksheet that contains column headers and some formulas, plus pivot tables on other sheets.
I want to paste the data into it and then save the template as a new Excel file.
The first thing I noticed is that I cannot save the template as a new Excel file.
The second is that I cannot write the dataframe to an existing worksheet; it creates a new sheet for the data.
Then I found an option for pd.ExcelWriter online, if_sheet_exists='overlay', but it gives me this error:
'overlay' is not valid for if_sheet_exists. Valid options are 'error', 'new' and 'replace'.
I'm using pandas version 1.5.1. Is it still possible to achieve this, or is there a better solution?
def write_report(df):
    template_filename = 'Daily Quality Report Template.xlsx'
    today_str = datetime.strftime(datetime.now(), '%Y%m%d')
    result_filename = f'Report\\Daily Quality Report {today_str}.xlsx'
    result_sheetname = today_str
    # create new file
    xlresult = Workbook()
    xlresult.save(result_filename)
    # write
    writer = pd.ExcelWriter(result_filename, engine='openpyxl', mode='a', if_sheet_exists='overlay')
    writer.book = load_workbook(template_filename)
    writer.sheets = {ws.title: ws for ws in writer.book.worksheets}
    df.to_excel(writer, result_sheetname, startrow=1, header=False, index=False)
    writer.save()
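One possible approach, sketched below, skips pd.ExcelWriter altogether: load the template with openpyxl, write the dataframe's values into the existing sheet cell by cell, and save the workbook under a new file name so the template itself is left untouched. The sheet name "Data", the start_row parameter and the tiny stand-in template are assumptions for illustration only. Whether pivot tables on other sheets survive the round trip depends on the openpyxl version and is worth checking on a copy first.

```python
import os
import tempfile
from datetime import datetime

import pandas as pd
from openpyxl import Workbook, load_workbook


def write_report(df, template_path, result_path, sheet_name, start_row=2):
    """Copy `df` into an existing template sheet and save it as a new file."""
    book = load_workbook(template_path)
    sheet = book[sheet_name]
    rows = df.itertuples(index=False, name=None)
    for row_offset, row in enumerate(rows):
        for col_offset, value in enumerate(row):
            # Rows and columns are 1-based in openpyxl.
            sheet.cell(row=start_row + row_offset, column=1 + col_offset, value=value)
    # Saving under a different name leaves the template file unchanged.
    book.save(result_path)


with tempfile.TemporaryDirectory() as tmp_dir:
    template_path = os.path.join(tmp_dir, "template.xlsx")
    result_path = os.path.join(tmp_dir, f"report_{datetime.now():%Y%m%d}.xlsx")

    # Stand-in template: one sheet holding only a header row.
    template = Workbook()
    template.active.title = "Data"
    template.active.append(["item", "defects"])
    template.save(template_path)

    df = pd.DataFrame({"item": ["widget", "gadget"], "defects": [3, 0]})
    write_report(df, template_path, result_path, sheet_name="Data")

    result_sheet = load_workbook(result_path)["Data"]
    rows = [[cell.value for cell in row] for row in result_sheet.iter_rows()]
    template_row_count = load_workbook(template_path)["Data"].max_row

print(rows)
```

Because the data goes into cells of a sheet that already exists, no new sheet is created and the if_sheet_exists option never comes into play.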

How to add an empty column to a specific sheet in Excel using pandas?

I have an excel file that contains 3 sheets (PizzaHut, InAndOut, ColdStone). I want to add an empty column to the InAndOut sheet.
path = 'C:\\testing\\test.xlsx'
data = pd.ExcelFile(path)
sheets = data.sheet_names
if 'InAndOut' in sheets:
    # something something add empty column called toppings to the sheet
data.to_excel('output.xlsx')
Been looking around, but I couldn't find an intuitive solution to this.
Any help will be appreciated!
Read in the sheet by name.
Do what you need to do.
Overwrite the sheet with the modified data.
sheet_name = 'InAndOut'
df = pd.read_excel(path, sheet_name)
# Do whatever
with pd.ExcelWriter(path, engine="openpyxl", mode="a", if_sheet_exists="replace") as writer:
    df.to_excel(writer, sheet_name=sheet_name, index=False)
See pd.read_excel and pd.ExcelWriter.
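Filled in for the toppings case from the question, the three steps might look like the sketch below. The helper name add_empty_column and the sample sheet contents are made up for the demo, which builds its own three-sheet workbook in a temporary folder. It assumes pandas 1.3 or later, where if_sheet_exists is available.

```python
import os
import tempfile

import pandas as pd


def add_empty_column(path, sheet_name, column_name):
    """Add an empty column to one sheet, leaving the other sheets in place."""
    # 1. Read in the sheet by name.
    df = pd.read_excel(path, sheet_name=sheet_name)
    # 2. Add the new column; None is written out as blank cells.
    df[column_name] = None
    # 3. Overwrite only that sheet; append mode keeps the other sheets.
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="replace") as writer:
        df.to_excel(writer, sheet_name=sheet_name, index=False)


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "test.xlsx")
    # Build a small workbook with the three sheets from the question.
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        for name in ("PizzaHut", "InAndOut", "ColdStone"):
            sample = pd.DataFrame({"item": [f"{name} special"], "price": [9.5]})
            sample.to_excel(writer, sheet_name=name, index=False)

    add_empty_column(path, "InAndOut", "toppings")
    sheets = pd.read_excel(path, sheet_name=None)

print(list(sheets["InAndOut"].columns))
```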

CSV to Excel - preserving percentages using Python

I am trying to write a script that merges multiple CSV files into one Excel file with separate sheets. In doing so, I realized that any column containing a percentage triggers the error "The number in this cell is formatted as text or preceded by an apostrophe". Each sheet has multiple columns that contain percentages, in no specific order, so I am looking for a solution that avoids the error for all columns containing a percentage. I have tried passing different parameters, but they haven't solved the problem yet.
Here is a sample code of what I have:
import pandas as pd
path = 'Test.csv'
df = pd.read_csv(path)
writer = pd.ExcelWriter('result.xlsx', engine='xlsxwriter')
df.to_excel(writer, index = None, header=True)
writer.save()
I have included screenshots of the csv file and the excel file, so I hope it helps to better demonstrate the issue at hand. Just to be clear, this is only for the sake of presentation and not the actual files I am working with.
You can apply a number format to a column by getting the worksheet from the writer and calling set_column on it.
data = {'Text':['This', 'Column', 'is filled', 'with'], 'Number':[20, 21, 19, 18], 'Percentages':['23.3%','46.82%','1.28%','33.3%']}
df = pd.DataFrame(data)
writer = pd.ExcelWriter('result.xlsx', engine='xlsxwriter')
df.to_excel(writer, index=False, header=True, sheet_name='somename')
# get the workbook and worksheet
workbook = writer.book
worksheet = writer.sheets['somename']
# add format (percentage with 2 decimals)
percent_fmt = workbook.add_format({'num_format': '0.00%'})
# set 3rd column to percentage format
worksheet.set_column(2,2,12, percent_fmt)
# close() saves the file; ExcelWriter.save() was removed in pandas 2.0
writer.close()
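One caveat: a number format changes how numeric cells are displayed, but it does not turn text such as '23.3%' into a number, so the "formatted as text" warning may remain while the values are still strings. A possible complement, sketched below, is to convert percentage-looking text columns to fractions before writing and then apply the percent format to exactly those columns. The helper names, the regular expression and the all-values-must-match rule are assumptions, not part of the original answer. The writing helper needs xlsxwriter installed and is not called in the demo.

```python
import pandas as pd

# Text such as "23.3%", "-4 %" or "100%" counts as a percentage.
PERCENT_PATTERN = r"-?\d+(?:\.\d+)?\s*%"


def percent_strings_to_floats(df):
    """Return a copy of `df` with percentage text columns converted to fractions."""
    out = df.copy()
    converted = []
    for col in out.columns:
        series = out[col]
        if not pd.api.types.is_string_dtype(series):
            continue
        text = series.dropna().astype(str).str.strip()
        # Convert only when every non-missing value looks like a percentage.
        if text.empty or not text.str.fullmatch(PERCENT_PATTERN).all():
            continue
        cleaned = series.astype(str).str.strip().str.rstrip("%").str.strip()
        out[col] = pd.to_numeric(cleaned, errors="coerce") / 100
        converted.append(col)
    return out, converted


def write_with_percent_format(df, percent_columns, path, sheet_name="Sheet1"):
    """Write `df` and show the given columns as percentages (needs xlsxwriter)."""
    with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
        df.to_excel(writer, sheet_name=sheet_name, index=False)
        percent_fmt = writer.book.add_format({"num_format": "0.00%"})
        worksheet = writer.sheets[sheet_name]
        for position, col in enumerate(df.columns):
            if col in percent_columns:
                worksheet.set_column(position, position, 12, percent_fmt)


data = {
    "Text": ["This", "Column", "is filled", "with"],
    "Number": [20, 21, 19, 18],
    "Percentages": ["23.3%", "46.82%", "1.28%", "33.3%"],
}
df = pd.DataFrame(data)
clean, converted = percent_strings_to_floats(df)
print(converted)
print(clean.dtypes)
```

With the values stored as fractions (0.233 rather than '23.3%'), the 0.00% format displays them as percentages and Excel treats them as numbers.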

Copying records matching a specific field, to a specific sheet in an Excel file

I'm totally new to scripting and have been learning Python. I'm trying to copy entire rows of data from one Excel file to another. More specifically, I have a field called bound in my input spreadsheet. When this equals 5002, I'd like to copy that entire row to a sheet called 'bound_5002' in a new spreadsheet created by the Python script. My script works when I hardcode 5002 and bound_5002, but I have a list of about 30 of these unique bound codes that I'd like it to cycle through. I've tried to iterate through a list of the codes (shown below), but it creates an Excel file that is incorrect. On opening it, an error message appears:
We found a problem with some content in 'data_recon.xlsx'. Do you want us to try to recover as much as we can...
It has created new tabs with no data, named Recovered_Sheet1 etc. Is my iterator wrong or missing something, or can this function not work when iterating through a list?
The script works when the code is hardcoded and there is no iteration, but not when I try to iterate through a list of codes. I've tried printing out the fields being iterated, and building the sheet name (sheet_ref) with and without a ' character on either side.
Expected - an Excel file called 'data_recon.xlsx' with multiple tabs, each containing the data for the corresponding bound field.
Actual - an Excel file with all the tabs created and headers as required, but missing the data that should have been copied across. New sheets have been added, but they are blank and have the names 'Recovered_Sheet1', 'Recovered_Sheet2', etc.
### Create a list of the domain codes of interest
bounds = ['800', '3001', '3002', '3003', '3101', '3102', '3103', '3105', '3106', '3110', '3111', '3112', '5002', '5003', '5004', '5005', '5006', '5101', '5102', '5104', '5105', '5106', '5107', '5110', '9003', '9004', '9101', '9102', '9103', '9104', '9105', '9106']
### Copy out only the matching domains to the tabs
i = 0
ids = [(bounds[i])]
final_result = {}
while i <= 15:
    with open(import_file_path_orig, 'r') as NN:
        reader = csv.reader(NN)
        next(reader)
        for compid, dhid, length, gimp, to, bound, auppm, aucap in reader:
            if bound in ids:
                final_result.setdefault('compid', []).append(compid)
                final_result.setdefault('dhid', []).append(dhid)
                final_result.setdefault('length', []).append(length)
                final_result.setdefault('gimp', []).append(gimp)
                final_result.setdefault('to', []).append(to)
                final_result.setdefault('bound', []).append(bound)
                final_result.setdefault('auppm', []).append(auppm)
                final_result.setdefault('aucap', []).append(aucap)
    df = pd.DataFrame.from_dict(final_result)
    ### Paste the data matching the bound from dataframe to Excel sheet
    book = load_workbook('data_recon.xlsx')
    sheet_ref = ("'" + 'bound_'+ bounds[i] + "'")
    sheet_name = (sheet_ref)
    with pd.ExcelWriter('data_recon.xlsx', engine='openpyxl') as writer:
        writer.book = book
        writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
        df.to_excel(writer, sheet_name=sheet_name, startrow = 1, startcol=0, header=False, index=False, engine='openpyxl')
        writer.save()
    print("bound_" + bounds[i] + " Sheet Populated")
    ### tests
    print (sheet_ref)
    print (bounds[i])
    i += 1
print("DATA RECON FILE COMPLETE")
Below is an earlier version without the iteration, which works as required:
### Copy out only the matching domains to the tabs
ids = ['5101']
final_result = {}
with open('inout_file.csv', 'r') as NN:
    reader = csv.reader(NN)
    next(reader)
    for compid, dhid, length, gimp, to, bound, auppm, aucap in reader:
        if bound in ids:
            final_result.setdefault('compid', []).append(compid)
            final_result.setdefault('dhid', []).append(dhid)
            final_result.setdefault('length', []).append(length)
            final_result.setdefault('gimp', []).append(gimp)
            final_result.setdefault('to', []).append(to)
            final_result.setdefault('bound', []).append(bound)
            final_result.setdefault('auppm', []).append(auppm)
            final_result.setdefault('aucap', []).append(aucap)
df = pd.DataFrame.from_dict(final_result)
### Paste the data matching the bound from dataframe to Excel sheet
book = load_workbook('data_recon.xlsx')
sheet_name = 'bound_5101'
with pd.ExcelWriter('data_recon.xlsx', engine='openpyxl') as writer:
    writer.book = book
    writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
    df.to_excel(writer, sheet_name=sheet_name, startrow = 1, startcol=0, header=False, index=False, engine='openpyxl')
print(sheet_name + " Sheet Populated")
I've updated this answer to show a much simpler version of what you have outlined above.
To write multiple dataframes to different sheets of one file, open the writer once, outside your loop, and write every sheet through it.
# Import the csv file into a single dataframe
# header=0 replaces the file's own header row with these names;
# dtype keeps 'bound' as text so it matches the strings in `bounds`
df = pd.read_csv(import_file_path_orig, header=0, names=['compid', 'dhid', 'length', 'gimp', 'to', 'bound', 'auppm', 'aucap'], dtype={'bound': str})
# Creating a new sheet for each dataframe
# Open the proper filehandle
with pd.ExcelWriter('data_recon.xlsx', engine='openpyxl') as writer:
    # .... If you have other stuff to do on the main sheet, do it here ....
    # Now, we write a single sheet for each set of rows that include the 'bound' value
    for b in bounds:
        # Filter the dataset for those rows that match the current value of `b`
        temp_df = df[df['bound']==b]
        # Build the name of the sheet to be written
        sheet_name = f'bound_{b}'
        # Write the filtered values to a sheet in the current workbook
        temp_df.to_excel(writer, sheet_name=sheet_name)
The issue in your current code is that each time you open the workbook for writing, the previous contents are overwritten, so sheets written in earlier iterations are lost rather than added to. See the docs.
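Here is a self-contained sketch of the same idea that can be run without the original files. The three CSV rows are made up, the list of bounds is shortened, and the output goes to a temporary folder. Two details are assumptions on top of the answer: 'bound' is read as text so it compares equal to the string codes, and codes with no matching rows are skipped rather than written as empty sheets.

```python
import io
import os
import tempfile

import pandas as pd

# Made-up sample rows using the column layout from the question.
CSV_TEXT = """compid,dhid,length,gimp,to,bound,auppm,aucap
1,DH001,1.0,0,1.0,5002,0.5,0.5
2,DH001,1.0,1,2.0,5003,1.2,1.2
3,DH002,1.0,0,1.0,5002,0.8,0.8
"""
bounds = ["5002", "5003", "9003"]

# Read 'bound' as text; read as integers it would never equal '5002'.
df = pd.read_csv(io.StringIO(CSV_TEXT), dtype={"bound": str})

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "data_recon.xlsx")
    # Open the writer once; every sheet goes into the same workbook.
    with pd.ExcelWriter(out_path, engine="openpyxl") as writer:
        for b in bounds:
            subset = df[df["bound"] == b]
            if subset.empty:
                # No rows for this code, so no sheet is written for it.
                # (At least one code must match, or the workbook has no sheets.)
                continue
            subset.to_excel(writer, sheet_name=f"bound_{b}", index=False)
    sheets = pd.read_excel(out_path, sheet_name=None, dtype={"bound": str})

print({name: len(frame) for name, frame in sheets.items()})
```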

Overwrite sheets in Excel with Python

I'm new to Python (and programming in general) and am running into a problem when writing data out to sheets in Excel.
I'm reading in an Excel file, performing a sum calculation on specific columns, and then writing the results out to a new workbook. Then at the end, it creates two charts based on the results.
The code works, except every time I run it, it creates new sheets with numbers appended to the end. I really just want it to overwrite the sheet names I provide, instead of creating new ones.
I'm not familiar enough with all the modules to understand all the options that are available. I've researched openpyxl, and pandas, and similar examples to what I'm trying to do either aren't easy to find, or don't seem to work when I try them.
import pandas as pd
import xlrd
import openpyxl as op
from openpyxl import load_workbook
import matplotlib.pyplot as plt
# declare the input file
input_file = 'TestData.xlsx'
# declare the output_file name to be written to
output_file = 'TestData_Output.xlsx'
book = load_workbook(output_file)
writer = pd.ExcelWriter(output_file, engine='openpyxl')
writer.book = book
# read the source Excel file and calculate sums
excel_file = pd.read_excel(input_file)
num_events_main = excel_file.groupby(['Column1']).sum()
num_events_type = excel_file.groupby(['Column2']).sum()
# create dataframes and write names and sums out to new workbook/sheets
df_1 = pd.DataFrame(num_events_main)
df_2 = pd.DataFrame(num_events_type)
df_1.to_excel(writer, sheet_name = 'TestSheet1')
df_2.to_excel(writer, sheet_name = 'TestSheet2')
# save and close
writer.save()
writer.close()
# dataframe for the first sheet
df = pd.read_excel(output_file, sheet_name='TestSheet1')
values = df[['Column1', 'Column3']]
# dataframe for the second sheet
df = pd.read_excel(output_file, sheet_name='TestSheet2')
values_2 = df[['Column2', 'Column3']]
# create the graphs
events_graph = values.plot.bar(x = 'Column1', y = 'Column3', rot = 60) # rot = rotation
type_graph = values_2.plot.bar(x = 'Column2', y = 'Column3', rot = 60) # rot = rotation
plt.show()
I get the expected results, and the charts work fine. I'd really just like to get the sheets to overwrite with each run.
From the pd.DataFrame.to_excel documentation:
Multiple sheets may be written to by specifying unique sheet_name.
With all data written to the file it is necessary to save the changes.
Note that creating an ExcelWriter object with a file name that already
exists will result in the contents of the existing file being erased.
Try writing to the book like this:
import pandas as pd
df = pd.DataFrame({'col1':[1,2,3],'col2':[4,5,6]})
writer = pd.ExcelWriter('g.xlsx')
df.to_excel(writer, sheet_name = 'first_df')
df.to_excel(writer, sheet_name = 'second_df')
# close() saves the file; ExcelWriter.save() was removed in pandas 2.0
writer.close()
If you inspect the workbook, you will have two worksheets.
Then let's say you wanted to write new data to the same workbook:
writer = pd.ExcelWriter('g.xlsx')
df.to_excel(writer, sheet_name = 'new_df')
writer.close()
If you inspect the workbook now, you will have just one worksheet, named new_df.
If there are other worksheets in the excel file that you want to keep and just overwrite the desired worksheets, you would need to use load_workbook.
Before you write any data, you could delete the sheets you want to write to with:
std = book[<sheet_name>]
book.remove(std)
That stops a number from being appended to the worksheet name when you write a sheet whose name already exists in the workbook.
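Put together, the delete-then-write approach could look like the sketch below. The helper name overwrite_sheets, the "Keep" sheet and the sample values are hypothetical, and the demo works on a file in a temporary folder. It assumes at least one sheet remains after the deletion, since openpyxl cannot save a workbook with no sheets.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook


def overwrite_sheets(path, frames):
    """Replace the named sheets in an existing workbook, keeping the others."""
    book = load_workbook(path)
    for sheet_name in frames:
        if sheet_name in book.sheetnames:
            # Remove the old sheet so the new one can reuse its name.
            del book[sheet_name]
    book.save(path)
    # Append mode loads the existing workbook instead of erasing it.
    with pd.ExcelWriter(path, engine="openpyxl", mode="a") as writer:
        for sheet_name, frame in frames.items():
            frame.to_excel(writer, sheet_name=sheet_name, index=False)


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "TestData_Output.xlsx")
    # Starting workbook: one sheet to keep and one that gets overwritten.
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        pd.DataFrame({"note": ["untouched"]}).to_excel(writer, sheet_name="Keep", index=False)
        pd.DataFrame({"total": [1]}).to_excel(writer, sheet_name="TestSheet1", index=False)

    # Simulate running the script twice more.
    for run in (2, 3):
        overwrite_sheets(path, {"TestSheet1": pd.DataFrame({"total": [run]})})

    sheet_names = load_workbook(path).sheetnames
    latest_total = pd.read_excel(path, sheet_name="TestSheet1")["total"].tolist()

print(sheet_names, latest_total)
```

On pandas 1.3 or later, passing if_sheet_exists="replace" together with mode="a" should give a similar result without the explicit delete step.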
