xlsxwriter and pandas for reporting - python

I am trying to create a basic Excel report.
I am trying to display a dataframe as well as some custom text/titles that are not part of the dataframe.
However, I can only get one or the other. I don't really understand the end of the code that is needed for the dataframe to appear (workbook = writer.book and worksheet = writer.sheets['Reports']).
Here is my code:
writer = pd.ExcelWriter('reportTemplate.xlsx', engine='xlsxwriter')
workbook = xlsxwriter.Workbook('reportTemplate.xlsx')
worksheet = workbook.add_worksheet('Reports')
# REPORT TITLE
worksheet.write('D2','Daily In-Store Report')
workbook = xlsxwriter.Workbook('reportTemplate.xlsx')
worksheet = workbook.add_worksheet('Reports')
worksheet.write('D2','Daily In-Store Report')
reportTimes = ['Day','Week','Period','Quarter','Year']
cityList = ['ontario','bayshore','ottawa','limeridge','oshawa','scarborough','sherway','massonville','gatineau',
'quebec','anjou','dix30','Fairview','laval','mtltrust','stbruno','gcapitale','stefoy','rivieres','chicoutimi','sherbrooke','canada']
# LOOP THROUGH FILES
rowNb = 4
for time in reportTimes:
    # TITLE
    tableTitle = time + ' report as of ...'
    worksheet.write('A'+str(rowNb),tableTitle)
    rowNb += 1
    headRow, secondHead = createHeadings(time)
    worksheet.write_row('B' + str(rowNb), headRow)
    worksheet.write_row('B' + str(rowNb), secondHead)
    rowNb += 2
    df = pd.read_csv('fy_' + time.lower() + '.csv')
    df.set_index('legacy_id',inplace=True)
    df = df.reindex(cityList)
    print(df)
    df.to_excel(writer,sheet_name='Reports',startrow = rowNb,header=False)
    workbook = writer.book
    worksheet = writer.sheets['Reports']
    writer.save()
As the code is right now, it only displays the dataframe.

It's not clear to me whether you're trying to write multiple sheets in one Excel file. If so, the problem may be that you're rewriting the same sheet called 'Reports' on every pass through the loop. Also, here are some basics to try. Put the df.to_excel() call after pd.ExcelWriter(). Then remove the last four lines from the for loop. Finally, put writer.save() after the for loop ends. (This was not very clear to me when I first learned it, either. See more examples at this link.)
Edit: here's fully executing code (with stub data). One of the keys was to enable multiple writes to the worksheet using writer.sheets['Reports'] = worksheet - see this explanation.
import numpy as np
import pandas as pd

# Stub data standing in for the CSV files
dummy_df = pd.DataFrame([[10,np.nan],[12,42],[16,np.nan],[20,3],[25,16],[30,1],[40,19],[60,99]],columns=['legacy_id', 'b'])
writer = pd.ExcelWriter('reportTemplate.xlsx', engine='xlsxwriter')
workbook = writer.book
worksheet = workbook.add_worksheet('Reports')
writer.sheets['Reports'] = worksheet # enable multiple writes to sheet
# REPORT TITLE
worksheet.write('D2','Daily In-Store Report')
reportTimes = ['Day','Week','Period','Quarter','Year']
cityList = ['ontario','bayshore','ottawa','limeridge','oshawa','scarborough','sherway','massonville','gatineau',
            'quebec','anjou','dix30','Fairview','laval','mtltrust','stbruno','gcapitale','stefoy','rivieres','chicoutimi','sherbrooke','canada']
# LOOP THROUGH FILES
rowNb = 4 # 1-indexed Excel row of the next table title
for time in reportTimes:
    # TITLE
    tableTitle = time + ' report as of ...'
    worksheet.write('A'+str(rowNb),tableTitle)
    rowNb += 1
    # write_row expects a sequence of cell values; I don't have your createHeadings(time)
    headRow, secondHead = ["dummy head row"], ["dummy second head"]
    worksheet.write_row('B' + str(rowNb), headRow)
    worksheet.write_row('B' + str(rowNb + 1), secondHead) # next row, so the first heading is not overwritten
    rowNb += 2
    df = dummy_df.copy(deep=True) # pd.read_csv('fy_' + time.lower() + '.csv')
    df.set_index('legacy_id',inplace=True)
    df = df.reindex(cityList) # the stub ids are not in cityList, so the stub values come out empty
    # startrow is 0-indexed, so rowNb - 1 places the block on Excel row rowNb
    df.to_excel(writer,sheet_name='Reports', startrow = rowNb - 1)
    rowNb += df.shape[0] + 2 # header row + data rows + one blank row
writer.close() # use writer.save() on older pandas versions
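One detail behind the row counter above: cell references such as 'A4' passed to worksheet.write() are 1-indexed, while startrow in DataFrame.to_excel() is 0-indexed. The following is only a sketch of that row bookkeeping in plain Python. The helper plan_layout and its dictionary keys are hypothetical names, not part of pandas or xlsxwriter, and it assumes one title row, two heading rows, the header row written by to_excel, and one blank row between tables.

```python
def plan_layout(table_lengths, first_row=4, heading_rows=2, spacer_rows=1):
    """Return one dict per table describing where each piece goes.

    title_row and heading_row are 1-indexed Excel rows (as in 'A4');
    startrow is the 0-indexed value to pass to DataFrame.to_excel().
    """
    layout = []
    row = first_row  # next free Excel row, 1-indexed
    for n_rows in table_lengths:
        title_row = row
        heading_row = title_row + 1
        # First Excel row available to the DataFrame block (1-indexed).
        df_first_row = heading_row + heading_rows
        layout.append({
            "title_row": title_row,
            "heading_row": heading_row,
            "startrow": df_first_row - 1,  # convert to 0-indexed
        })
        # to_excel writes one header row plus n_rows data rows.
        row = df_first_row + 1 + n_rows + spacer_rows
    return layout


# Three tables of 22 rows each (the length of cityList).
layout = plan_layout([22, 22, 22])
for entry in layout:
    print(entry)
```

Computing the positions up front like this keeps the title, heading and dataframe rows from overlapping, whatever the table lengths are.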

Related

Splitting Excel Data by Groupings into Separate Workbook Sheets

Background: I have a large 40MB XLSX file that contains client data which is grouped over multiple levels, like so:
Expanded: [screenshot]
Not expanded (sorry about the terrible dummy data!): [screenshot]
Objective: I would like to split Client A, B, C etc. and all their respective underlying data into separate sheets (named 'Client A' etc.) in a workbook.
Question: Am I correct in assuming that there is no Python library that would help with this (e.g., xlsxwriter) and that I will likely have to save into multiple pandas dataframes before splitting and writing to the xlsx file?
Sample Data: Here is a link to some randomized sample data. In this file you will see only 1 client (the total row can be ignored); however, imagine the normal file having 40 clients / groupings and sub-levels.
Sample Code: this function takes the .xlsx file and writes each grouping to an appropriately named tab (e.g., 'Client A') in a new .xlsx file. The issue with this code is that, because I am basically copying each cell individually, I did not consider more holistically how to ensure the Groupings/Levels would be preserved. I think this code needs a complete rewrite, and I welcome feedback.
import openpyxl
from copy import copy
from openpyxl import load_workbook
columns=['A','B','C','D','E','F','G','H','I','J','K','L']
def copy_cell(ws, row,ws_row,ws1):
    for col in columns:
        ws_cell=ws1[col+str(ws_row)]
        new_cell = ws[col+str(row)]
        if ws_cell.has_style:
            new_cell.font = copy(ws_cell.font)
            new_cell.border = copy(ws_cell.border)
            new_cell.fill = copy(ws_cell.fill)
            new_cell.number_format = copy(ws_cell.number_format)
            new_cell.protection = copy(ws_cell.protection)
            new_cell.alignment = copy(ws_cell.alignment)
wb1 = openpyxl.load_workbook('annonamized_test_data_to_be_split.xlsx')
ws1=wb1.active
indexs=[]
clients=[]
index=1
while ws1['A'+str(index)]:
    if str(ws1['A'+str(index)].alignment.indent)=='0.0':
        indexs.append(index)
        clients.append(ws1['A'+str(index)].value)
    if ws1['A'+str(index)].value is None:
        indexs.append(index)
        break
    index+=1
wb1.close()
wb = openpyxl.Workbook()
ws=wb.active
start_index=1
headers=['Ownership Structure', 'Fee Schedule', 'Management Style', 'Advisory Firm', 'Inception Date', 'Days in Time Period', 'Adjusted Average Daily Balance (No Div, USD)', 'Assets Billed On (USD)',
         'Effective Billing Rate', 'Billing Fees (USD)', 'Bill To Account', 'Model Type']
for y,index in enumerate(indexs):
    try:
        client=0
        if len(clients[y])>=32:
            client=clients[y][:31]
        else:
            client=clients[y]
        wb.create_sheet(client)
        ws=wb[client]
        ws.column_dimensions['A'].width=35
        ws.append(headers)
        row_index=2
        for i in range(start_index,indexs[y+1]):
            ws.append([ws1[col+str(i)].value for col in columns])
            copy_cell(ws,row_index,i,ws1)
            row_index+=1
        start_index=indexs[y+1]
    except:
        pass
wb.save('split_data.xlsx')
wb.close()
try:
    wb1 = openpyxl.load_workbook('split_data.xlsx')
    a=wb1['Sheet']
    wb1.remove(a)
    a=wb1['Sheet1']
    wb1.remove(a)
    wb1.save('split_data.xlsx')
    wb1.close()
except:
    pass
Please can someone point me in the right direction of a resource that might teach me how to achieve this?
from openpyxl import load_workbook


def get_client_rows(sheet):
    """Get client rows.

    Skip the header, then treat rows whose first cell has no indent as client rows.
    """
    return [row[0].row for row in sheet.iter_rows(min_row=2) if row[0].alignment.indent == 0.0]
    # Alternative: look for row dimensions without an outline level
    # return [
    #     row_index
    #     for row_index, row_dimension in sheet.row_dimensions.items()
    #     if row_index > 1 and row_dimension.outline_level == 0
    # ]


def delete_client_block(sheet, start, end):
    """
    Delete rows from start up to and including end.
    """
    for row in range(start, end + 1):
        sheet.row_dimensions.pop(row, None)
    sheet.delete_rows(start, end - start + 1)


def split_workbook(input_file, output_file):
    """
    Split the workbook so that each main group gets its own sheet.

    To avoid losing any formatting, we copy the current sheet and remove all rows
    that do not belong to the extracted group.
    """
    try:
        workbook = load_workbook(input_file)
        data_sheet = workbook.active
        client_rows = get_client_rows(data_sheet)
        for index, client_row in enumerate(client_rows):
            # create new sheet for given client, shorten client as it might be too long
            # (Excel limits sheet names to 31 characters)
            client_sheet = workbook.copy_worksheet(data_sheet)
            client_sheet.title = data_sheet.cell(client_row, 1).value[:31]
            # delete rows after current client if available
            if index < len(client_rows) - 1:
                row_after_client = client_rows[index + 1]
                delete_client_block(
                    client_sheet, row_after_client, client_sheet.max_row
                )
            # delete rows before current client if available
            if index > 0:
                first_client_row = client_rows[0]
                # the last row to delete is the one just above the current client
                delete_client_block(
                    client_sheet, first_client_row, client_row - 1
                )
                # move left over dimensions to top of the sheet
                for row_index in list(client_sheet.row_dimensions.keys()):
                    # skip header row dimension
                    if row_index > first_client_row - 1:
                        row_dimension = client_sheet.row_dimensions.pop(row_index)
                        new_index = row_index - client_row + first_client_row
                        row_dimension.index = new_index
                        client_sheet.row_dimensions[new_index] = row_dimension
        del workbook[data_sheet.title]
        workbook.save(output_file)
    finally:
        workbook.close()


if __name__ == "__main__":
    # input_file = "annonamized_test_data_to_be_split.xlsx"
    input_file = 'partial_Q1_Client_Billing_Data.xlsx'
    # output_file = "split_data.xlsx"
    output_file = "splitting_full_data.xlsx"
    split_workbook(input_file, output_file)
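The core of the approach above is turning a list of "client rows" into row ranges, one per client. As a rough, library-free illustration of that step only: the function client_blocks and the sample levels below are hypothetical, and it assumes each row's outline level is already known, with level 0 marking the first row of a client.

```python
def client_blocks(outline_levels, first_data_row=2):
    """Map a list of per-row outline levels to (start, end) row ranges.

    outline_levels[i] is the level of Excel row first_data_row + i.
    A level of 0 marks the top row of a new client block.
    Rows are 1-indexed and both ends of each range are inclusive.
    """
    starts = [
        first_data_row + offset
        for offset, level in enumerate(outline_levels)
        if level == 0
    ]
    last_row = first_data_row + len(outline_levels) - 1
    blocks = []
    for position, start in enumerate(starts):
        # A block ends just before the next client row, or at the last row.
        if position + 1 < len(starts):
            end = starts[position + 1] - 1
        else:
            end = last_row
        blocks.append((start, end))
    return blocks


# Rows 2..9 of a sheet whose header sits in row 1.
levels = [0, 1, 2, 2, 0, 1, 1, 0]
blocks = client_blocks(levels)
print(blocks)
```

Each (start, end) pair is the span that has to survive in that client's copy of the sheet; everything outside it is what delete_client_block removes.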

How to add Pandas Dataframe to multiple worksheets starting from FIRST CELL?

I am trying to add a Pandas dataframe to all the worksheets in an Excel file. However, the header always starts at B1, whereas I am trying to make it start from A1.
Below is the code:
import os
import xlwt
from xlwt.Workbook import *
import pandas as pd
from pandas import ExcelWriter
import xlsxwriter
from openpyxl import Workbook, load_workbook
Categories = ["Column" + str(column) for column in range(1,10)]
wb1 = Workbook()
for i in range(1,5):
    ws = wb1.create_sheet("1_"+ str(i))
for i in range(5):
    ws = wb1.create_sheet("2_"+ str(i))
for i in range(5):
    ws = wb1.create_sheet("3_"+ str(i))
for i in range(5):
    ws = wb1.create_sheet("4_"+ str(i))
for i in range(5):
    ws = wb1.create_sheet("5_"+ str(i))
for i in range(5):
    ws = wb1.create_sheet("6_"+ str(i))
wb1.save('FrameFiles.xlsx')
df = pd.DataFrame(columns=Categories)
book = load_workbook('FrameFiles.xlsx')
writer = pd.ExcelWriter('FrameFiles.xlsx',engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
for i in wb1.sheetnames:
    df.to_excel(writer, sheet_name=i,index=True,startrow=1,startcol=1)
writer.save()
In the output, the header starts at B1.
I want the header to start from the A1 position, not B1. I have also tried startrow=0 and startcol=0, but the result is the same. Any suggestion to solve this issue would be highly appreciated.
The empty column is your index column; your dataframe has no index values, so it is empty. Try using index=False and you should get what you are expecting:
df.to_excel(writer, sheet_name=i,index=False,startrow=0,startcol=0)

How to get Pandas to create new sheet instead of overwriting?

I am building a little automatic reporting tool for my job. I am trying to make my code work to create another sheet every time (each day) that I run the program and generate the report.
date_time = time.strftime('%b %d %Y')
writer = pd.ExcelWriter('BrokerRisk.xlsx', engine='xlsxwriter')
df.to_excel(writer,'DataFrame-' + date_time)
sums.to_excel(writer,'TotalByCounterparty-' + date_time)
sums_sort.to_excel(writer,'SortedRank-' + date_time)
workbook = writer.book
worksheet1 = writer.sheets['DataFrame-' + date_time]
worksheet2 = writer.sheets['TotalByCounterparty-' + date_time]
worksheet3 = writer.sheets['SortedRank-' + date_time]
writer.save()
I tried implementing the date feature so that it would change the name technically every day, but this doesn't seem to work either. Can anyone suggest a simple fix?
Using engine='openpyxl' for the writer will do what you want, for instance:
from openpyxl import load_workbook
from copy import copy
import time
import pandas as pd

class CopyWorkbook(object):
    def __init__(self, fname):
        self.fname = fname
        self.wb = load_workbook(fname)

    def save(self):
        self.wb.save(self.fname)

    def copy_worksheet(self, from_worksheet):
        # Create new empty sheet and append it to self(Workbook)
        ws = self.wb.create_sheet( title=from_worksheet.title )
        for row, row_data in enumerate(from_worksheet.rows,1):
            for column, from_cell in enumerate(row_data,1):
                cell = ws.cell(row=row, column=column)
                cell.value = from_cell.value
                cell.font = copy(from_cell.font)

date_time = time.strftime('%b %d %Y')
writer = pd.ExcelWriter('dummy.xlsx', engine='openpyxl')
df.to_excel(writer, sheet_name='DataFrame-' + date_time)
# ... Other DataFrame's .to_excel(writer, ...
wb = CopyWorkbook('BrokerRisk.xlsx')
for ws_new in writer.book.worksheets:
    wb.copy_worksheet(ws_new)
wb.save()
Tested with Python: 3.4.2 - openpyxl: 2.4.1 - LibreOffice: 4.3.3.2
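Whichever way the sheets are added to the existing file, the dated names have to stay valid: Excel limits sheet names to 31 characters and rejects a few characters, and 'TotalByCounterparty-' plus a date like 'Jan 05 2024' is already exactly 31 characters long. Below is a small sketch of a name builder that also avoids clashing with sheets already in the workbook. The function dated_sheet_name and its parameters are hypothetical, not part of pandas or openpyxl.

```python
import re
from datetime import date

MAX_SHEET_NAME = 31                          # Excel's limit on sheet-name length
INVALID_CHARS = re.compile(r"[\[\]:*?/\\]")  # characters Excel rejects in names


def dated_sheet_name(prefix, day, existing=()):
    """Build a name such as 'DataFrame-Jan 05 2024' that is valid and unused."""
    stamp = day.strftime("%b %d %Y")
    base = INVALID_CHARS.sub("_", f"{prefix}-{stamp}")[:MAX_SHEET_NAME]
    # Excel compares sheet names case-insensitively.
    taken = {name.lower() for name in existing}
    name = base
    counter = 2
    while name.lower() in taken:
        suffix = f" ({counter})"
        # Trim the base so the suffix still fits within the limit.
        name = base[: MAX_SHEET_NAME - len(suffix)] + suffix
        counter += 1
    return name


first = dated_sheet_name("DataFrame", date(2024, 1, 5))
second = dated_sheet_name("DataFrame", date(2024, 1, 5), existing=[first])
long_name = dated_sheet_name("TotalByCounterpartyAndRegion", date(2024, 1, 5))
print(first, second, long_name, sep="\n")
```

The list of existing names could come from the sheetnames attribute of the workbook returned by load_workbook. Recent pandas versions also accept mode='a' in pd.ExcelWriter with the openpyxl engine, which appends sheets to an existing file instead of recreating it.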

Pandas Not Reading Excel Properly

I am trying to use an Add-In for Excel that gets removed when I use win32com.client, forcing me to restart my computer. I have found a workaround using xlrd, openpyxl, and pandas, but I have run into a completely new issue.
I first open the Excel file with pandas and read through it, extracting the information that I require.
xl = pandas.ExcelFile(xlsx)
sheets = xl.sheet_names
df = xl.parse(sheets[2])
I then have to go into the same workbook and update the Meter Name and the date.
for i, value in enumerate(dataList):
    wb = openpyxl.load_workbook(xlsx)
    worksheets = wb.sheetnames
    worksheet = wb.get_sheet_by_name(worksheets[0])
    rowCoordinate = i
    meterName = value[0]
    creationDate = value[1]
    units = value[2]
    worksheet.cell(row=1, column=2).value = meterName
    wb.save(copyXlsx)
    dateList = []
    for k, dateRange in enumerate(value[3]):
        sDate = dateRange[0]
        eDate = dateRange[1]
        wb = openpyxl.load_workbook(copyXlsx)
        worksheets = wb.sheetnames
        worksheet = wb.get_sheet_by_name(worksheets[0])
        worksheet.cell(row=2, column=2).value = sDate
        worksheet.cell(row=3, column=2).value = eDate
        wb.save(copyXlsx1)
        print meterName, dateRange
        xl1 = pandas.ExcelFile(copyXlsx1)
        sheets = xl1.sheet_names
        df = xl.parse(sheets[0])
        print df
My issue is that the Excel file opens and the information is written perfectly, but in pandas only the header information is updated, while the numbers are the same as in the original document. I have gone in and explored the intermediate Excel document, and it doesn't match the numbers pandas shows.

Importing Multiple Excel Files using OpenPyXL

I am trying to read in multiple excel files and append the data from each file into one master file. Each file will have the same headers (So I can skip the import of the first row after the initial file).
I am pretty new to both Python and the OpenPyXL module. I am able to import the first workbook without problem. My problem comes in when I need to open the subsequent file and copy the data to paste into the original worksheet.
Here is my code so far:
# Creating blank workbook
from openpyxl import Workbook
wb = Workbook()
# grab active worksheet
ws = wb.active
# Read in excel data
from openpyxl import load_workbook
wb = load_workbook('first_file.xlsx') #explicitly loading workbook, will automate later
# grab active worksheet in current workbook
ws = wb.active
#get max columns and rows
sheet = wb.get_sheet_by_name('Sheet1')
print ("Rows: ", sheet.max_row) # for debugging purposes
print ("Columns: ", sheet.max_column) # for debugging purposes
last_data_point = ws.cell(row = sheet.max_row, column = sheet.max_column).coordinate
print ("Last data point in current worksheet:", last_data_point) #for debugging purposes
#import next file and add to master
append_point = ws.cell(row = sheet.max_row + 1, column = 1).coordinate
print ("Start new data at:", append_point)
wb = load_workbook('second_file.xlsx')
sheet2 = wb.get_sheet_by_name('Sheet1')
start = ws.cell(coordinate='A2').coordinate
print("New data start: ", start)
end = ws.cell(row = sheet2.max_row, column = sheet2.max_column).coordinate
print ("New data end: ", end)
# write a value to selected cell
#sheet[append_point] = 311
#print (ws.cell(append_point).value)
#save file
wb.save('master_file.xlsx')
Thanks!
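I don't really understand your code. It looks too complicated. When copying between worksheets you probably want to iterate over the rows (ws.rows, or ws.iter_rows() to skip the header).
from openpyxl import load_workbook

files = ['second_file.xlsx']  # the workbooks to append to the master file
wb1 = load_workbook('master.xlsx')
ws1 = wb1.active
for f in files:
    wb2 = load_workbook(f)
    ws2 = wb2['Sheet1']
    # start at row 2 to skip the header row
    for row in ws2.iter_rows(min_row=2):
        ws1.append([cell.value for cell in row])
wb1.save('master.xlsx')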
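The pattern in this answer is "keep the header from the first file, skip it in every later file". Here is a minimal, library-free sketch of just that logic, with each file represented as a list of row tuples. The function merge_tables and the sample rows are hypothetical; with openpyxl, such rows could come from ws.iter_rows(values_only=True).

```python
def merge_tables(tables):
    """Concatenate row lists, keeping the header row of the first table only."""
    merged = []
    for rows in tables:
        rows = list(rows)
        if not rows:
            continue  # nothing to copy from an empty sheet
        if not merged:
            merged.extend(rows)      # first non-empty table: keep its header
        else:
            merged.extend(rows[1:])  # later tables: drop the repeated header
    return merged


first_file = [("id", "value"), (1, "a")]
second_file = [("id", "value"), (2, "b"), (3, "c")]
master = merge_tables([first_file, second_file])
for row in master:
    print(row)
```

Each row of the merged result can then be passed to ws.append() on the master worksheet.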
