I'm attempting to create a script to process several Excel sheets at once, and one of the steps I'm trying to get Python to handle is creating a table from data passed in from a pandas data frame. Creating a table seems pretty straightforward looking at the documentation.
Following the example from here:
# define a table style
mediumstyle = TableStyleInfo(name='TableStyleMedium2', showRowStripes=True)
# create a table
table = Table(displayName='IdlingReport', ref='A1:C35', tableStyleInfo=mediumstyle)
# add the table to the worksheet
sheet2.add_table(table)
# Saving the report
wb.save(openexcel.filename)
print('Report Saved')
However this creates an empty table, instead of using the data present in cells 'A1:C35'. I can't seem to find any examples anywhere that go beyond these steps so any help with what I may be doing wrong is greatly appreciated.
The data in 'A1:C35' is being written to Excel as follows:
while i < len(self.sheets):
    with pd.ExcelWriter(filename, engine='openpyxl') as writer:
        writer.book = excelbook
        writer.sheets = dict((ws.title, ws) for ws in excelbook.worksheets)
        self.df_7.to_excel(writer, self.sheets[i], index=False, header=True, startcol=0, startrow=0)
        writer.save()
    i += 1
The output looks something like this
Time Location Duration
1/01/2019 [-120085722,-254580042] 5 Min
1/02/2019 [-120085722,-254580042] 15 Min
1/02/2019 [-120085722,-254580042] 7 Min
Just to clarify: right now I first write my data frame to Excel and then, after formatting the data, turn what I've written into a table. Reversing these steps by creating the table first and then writing to Excel fills the table but gets rid of the formatting (font color, font type, size, etc.), which means I'd have to add an additional step to fix the formatting (which I'd like to avoid if possible).
Your command
# create a table
table = Table(displayName='IdlingReport', ref='A1:C35', tableStyleInfo=mediumstyle)
creates a special Excel object — an empty table with the name IdlingReport.
You probably want something else: to fill a sheet of your Excel workbook with data from a pandas dataframe.
For this purpose there is a function dataframe_to_rows():
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows
wb = Workbook()
ws = wb.active # to rename this sheet: ws.title = "some_name"
# to create a new sheet: ws = wb.create_sheet("some_name")
for row in dataframe_to_rows(df, index=True, header=True):
    ws.append(row) # appends this row after a previous one
wb.save("something.xlsx")
See Working with Pandas Dataframes and Tutorial.
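The two steps can also be combined: write the dataframe rows first, then define the table over exactly the range they occupy. The following is a minimal sketch under assumptions, not the asker's actual script: the sample dataframe is hypothetical, and the table range is computed from the sheet's extent instead of the hard-coded 'A1:C35'.

```python
import pandas as pd
from openpyxl import Workbook
from openpyxl.utils import get_column_letter
from openpyxl.utils.dataframe import dataframe_to_rows
from openpyxl.worksheet.table import Table, TableStyleInfo

# Hypothetical sample data standing in for the asker's dataframe
df = pd.DataFrame({
    "Time": ["1/01/2019", "1/02/2019", "1/02/2019"],
    "Location": ["[-120085722,-254580042]"] * 3,
    "Duration": ["5 Min", "15 Min", "7 Min"],
})

wb = Workbook()
ws = wb.active

# Write the header row followed by the data rows
for row in dataframe_to_rows(df, index=False, header=True):
    ws.append(row)

# Build the table reference from the cells that were actually written
last_col = get_column_letter(ws.max_column)
ref = f"A1:{last_col}{ws.max_row}"

table = Table(displayName="IdlingReport", ref=ref)
table.tableStyleInfo = TableStyleInfo(name="TableStyleMedium2", showRowStripes=True)
ws.add_table(table)

# wb.save("idling_report.xlsx")  # hypothetical output filename
```

Computing the reference from ws.max_row and ws.max_column keeps the table in step with the data if the number of rows changes between runs.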
Related
I’m really stuck on what should be an easy problem.
I have an Excel workbook in which I’m updating two columns of one record on the clean_data sheet. From there, I’m saving and closing the file.
After that, I’m trying to pull in the updated roll-up sheet values as a data frame (graphs_rolling), which has formulas that use the clean_data sheet.
When I view the data frame, all the values are NaN. I can open the Excel file and see the updated values on the graphs_rolling sheet. What can I do to get the data frame to populate with values?
Code is shown below:
import pandas as pd
import openpyxl
from openpyxl import load_workbook
#Import Data with Correct Rows and Columns for SSM Commercial
book = load_workbook('//CPI Projects//Test//SampleSSM//NewSSM.xlsx')
writer = pd.ExcelWriter('//CPI Projects//Test//SampleSSM//NewSSM.xlsx', engine = 'openpyxl')
writer.book = book
df1 = pd.read_excel('//CPI Projects//Test//SampleSSM//NewSSM.xlsx',sheet_name='clean_data')
df1.loc[df1['ev_id']==20201127, 'commercial_weight'] = 0
df1.loc[df1['ev_id']==20201127, 'commercial'] = 0
book.remove(book['clean_data'])
df1.to_excel(writer, sheet_name = 'clean_data',index=False)
writer.save()
writer.close()
df5 = pd.read_excel('//CPI Projects//Test//SampleSSM//NewSSM.xlsx',sheet_name='graphs_rolling_avg',skiprows=30)
print(df5)
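One likely cause, offered as an assumption since the workbook itself is not shown: openpyxl writes formulas as text and does not calculate them, so a file saved through openpyxl carries no cached results until Excel opens and re-saves it. pandas reads those cached results, which would explain the NaN values. The sketch below uses a small hypothetical in-memory workbook to show the effect.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook

# Hypothetical workbook with two numbers and one formula
wb = Workbook()
ws = wb.active
ws["A1"] = 2
ws["A2"] = 3
ws["A3"] = "=SUM(A1:A2)"

buffer = BytesIO()
wb.save(buffer)

# data_only=True asks for the cached result of each formula;
# a file saved by openpyxl has none, so the value comes back empty
buffer.seek(0)
cached = load_workbook(buffer, data_only=True).active["A3"].value

# Without data_only, the formula text itself is returned
buffer.seek(0)
formula = load_workbook(buffer).active["A3"].value

print(cached, formula)
```

If this is the cause, the options are to open and save the file in Excel before reading it, or to compute the roll-up values in pandas instead of relying on worksheet formulas.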
I'm totally new to scripting and have been learning Python. I'm trying to copy an entire row of data from one Excel file to another. More specifically, I have a field called bound in my input excel spreadsheet. When this equals 5002, I'd like to copy that entire row to a sheet called 'bound_5002' in a new spreadsheet created by the Python script. My script works when I hardcode 5002 and bound_5002, but I have a list of about 30 of these unique bound codes that I'd like it to cycle through. I've tried to iterate through a list of the codes (shown below), but it creates an Excel file that is incorrect. Upon opening an error message appears
we found a problem with some content in data_recon_xlsx. Do you want us to try to recover as much as we can...
It has created new tabs with no data and the names Recovered_Sheet1 etc. Is my iterator wrong or missing something, or can this function not work when iterating through a list?
I wrote the script without the iteration and it works when the values are hardcoded, but it doesn't when I try to iterate through a list of codes. I've tried printing out the fields being iterated, and adding a ' character on either side (sheet_ref) or leaving the quotes out.
Expected - an Excel file called 'data_recon.xlsx' with multiple tabs, containing the data for the corresponding bound field.
Actual - an Excel file with all the tabs created and headers as required, but missing the data that was required to be copied across. New sheets have been added but they are blank and have the names, 'Recovered_Sheet1', 'Recovered_Sheet2', etc.
### Create a list of the domain codes of interest
bounds = ['800', '3001', '3002', '3003', '3101', '3102', '3103', '3105', '3106', '3110', '3111', '3112', '5002', '5003', '5004', '5005', '5006', '5101', '5102', '5104', '5105', '5106', '5107', '5110', '9003', '9004', '9101', '9102', '9103', '9104', '9105', '9106']
### Copy out only the matching domains to the tabs
i = 0
ids = [(bounds[i])]
final_result = {}
while i <= 15:
    with open(import_file_path_orig, 'r') as NN:
        reader = csv.reader(NN)
        next(reader)
        for compid, dhid, length, gimp, to, bound, auppm, aucap in reader:
            if bound in ids:
                final_result.setdefault('compid', []).append(compid)
                final_result.setdefault('dhid', []).append(dhid)
                final_result.setdefault('length', []).append(length)
                final_result.setdefault('gimp', []).append(gimp)
                final_result.setdefault('to', []).append(to)
                final_result.setdefault('bound', []).append(bound)
                final_result.setdefault('auppm', []).append(auppm)
                final_result.setdefault('aucap', []).append(aucap)
    df = pd.DataFrame.from_dict(final_result)
    ### Paste the data matching the bound from dataframe to Excel sheet
    book = load_workbook('data_recon.xlsx')
    sheet_ref = ("'" + 'bound_'+ bounds[i] + "'")
    sheet_name = (sheet_ref)
    with pd.ExcelWriter('data_recon.xlsx', engine='openpyxl') as writer:
        writer.book = book
        writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
        df.to_excel(writer, sheet_name=sheet_name, startrow = 1, startcol=0, header=False, index=False, engine='openpyxl')
        writer.save()
    print("bound_" + bounds[i] + " Sheet Populated")
    ### tests
    print (sheet_ref)
    print (bounds[i])
    i += 1
print("DATA RECON FILE COMPLETE")
Below is an earlier version without the iteration, which works as required:
### Copy out only the matching domains to the tabs
ids = ['5101']
final_result = {}
with open('inout_file.csv', 'r') as NN:
    reader = csv.reader(NN)
    next(reader)
    for compid, dhid, length, gimp, to, bound, auppm, aucap in reader:
        if bound in ids:
            final_result.setdefault('compid', []).append(compid)
            final_result.setdefault('dhid', []).append(dhid)
            final_result.setdefault('length', []).append(length)
            final_result.setdefault('gimp', []).append(gimp)
            final_result.setdefault('to', []).append(to)
            final_result.setdefault('bound', []).append(bound)
            final_result.setdefault('auppm', []).append(auppm)
            final_result.setdefault('aucap', []).append(aucap)
df = pd.DataFrame.from_dict(final_result)
### Paste the data matching the bound from dataframe to Excel sheet
book = load_workbook('data_recon.xlsx')
sheet_name = 'bound_5101'
with pd.ExcelWriter('data_recon.xlsx', engine='openpyxl') as writer:
    writer.book = book
    writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
    df.to_excel(writer, sheet_name=sheet_name, startrow = 1, startcol=0, header=False, index=False, engine='openpyxl')
print(sheet_name + " Sheet Populated")
I've updated this answer to show a much simpler version of what you have outlined above
In order to write multiple dataframes to a file on different sheets you need to do this outside of your loop once you have all the dataframes built.
# Import the csv file into a single dataframe
# (names= with header=0 replaces the header row; dtype keeps 'bound' as text so it matches the strings in bounds)
df = pd.read_csv(import_file_path_orig, header=0, names=['compid', 'dhid', 'length', 'gimp', 'to', 'bound', 'auppm', 'aucap'], dtype={'bound': str})
# Creating a new sheet for each dataframe
# Open the proper filehandle
with pd.ExcelWriter('data_recon.xlsx', engine='openpyxl') as writer:
    # .... If you have other stuff to do on the main sheet, do it here ....
    # Now, we write a single sheet for each set of rows that include the 'bound' value
    for b in bounds:
        # Filter the dataset for those rows that match the current value of `b`
        temp_df = df[df['bound']==b]
        # Build the name of the sheet to be written
        sheet_name = f'bound_{b}'
        # Write the filtered values to a sheet in the current workbook
        temp_df.to_excel(writer,sheet_name=sheet_name)
The issue in your current code is that every time you open a new ExcelWriter on an existing file, the workbook is written again from scratch, so you can't add sheets one loop iteration at a time (see the pandas ExcelWriter docs).
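The approach above can be checked end to end without touching the real files. This is a small sketch under assumptions: the CSV content and the shortened bounds list are hypothetical sample data, and an in-memory buffer stands in for data_recon.xlsx.

```python
import io

import pandas as pd
from openpyxl import load_workbook

# Hypothetical sample standing in for the real CSV export
csv_text = (
    "compid,dhid,length,gimp,to,bound,auppm,aucap\n"
    "1,A,1.0,0,1,5002,0.5,0.5\n"
    "2,B,2.0,1,3,800,0.7,0.7\n"
    "3,C,1.5,3,4,5002,0.2,0.2\n"
)
bounds = ["800", "5002"]

# Read 'bound' as text so it compares equal to the string codes in bounds
df = pd.read_csv(io.StringIO(csv_text), dtype={"bound": str})

buffer = io.BytesIO()
# One writer for the whole loop: every sheet lands in the same workbook
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    for b in bounds:
        df[df["bound"] == b].to_excel(writer, sheet_name=f"bound_{b}", index=False)

buffer.seek(0)
book = load_workbook(buffer)
sheet_names = book.sheetnames
rows_5002 = book["bound_5002"].max_row  # header row plus matching data rows
print(sheet_names, rows_5002)
```

Note that the sheet name is built without extra quote characters; Excel does not allow a sheet name to begin or end with an apostrophe, which may be what triggered the recovery prompt in the original script.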
I'm new to Python (and programming in general) and am running into a problem when writing data out to sheets in Excel.
I'm reading in an Excel file, performing a sum calculation on specific columns, and then writing the results out to a new workbook. Then at the end, it creates two charts based on the results.
The code works, except every time I run it, it creates new sheets with numbers appended to the end. I really just want it to overwrite the sheet names I provide, instead of creating new ones.
I'm not familiar enough with all the modules to understand all the options that are available. I've researched openpyxl, and pandas, and similar examples to what I'm trying to do either aren't easy to find, or don't seem to work when I try them.
import pandas as pd
import xlrd
import openpyxl as op
from openpyxl import load_workbook
import matplotlib.pyplot as plt
# declare the input file
input_file = 'TestData.xlsx'
# declare the output_file name to be written to
output_file = 'TestData_Output.xlsx'
book = load_workbook(output_file)
writer = pd.ExcelWriter(output_file, engine='openpyxl')
writer.book = book
# read the source Excel file and calculate sums
excel_file = pd.read_excel(input_file)
num_events_main = excel_file.groupby(['Column1']).sum()
num_events_type = excel_file.groupby(['Column2']).sum()
# create dataframes and write names and sums out to new workbook/sheets
df_1 = pd.DataFrame(num_events_main)
df_2 = pd.DataFrame(num_events_type)
df_1.to_excel(writer, sheet_name = 'TestSheet1')
df_2.to_excel(writer, sheet_name = 'TestSheet2')
# save and close
writer.save()
writer.close()
# dataframe for the first sheet
df = pd.read_excel(output_file, sheet_name='TestSheet1')
values = df[['Column1', 'Column3']]
# dataframe for the second sheet
df = pd.read_excel(output_file, sheet_name='TestSheet2')
values_2 = df[['Column2', 'Column3']]
# create the graphs
events_graph = values.plot.bar(x = 'Column1', y = 'Column3', rot = 60) # rot = rotation
type_graph = values_2.plot.bar(x = 'Column2', y = 'Column3', rot = 60) # rot = rotation
plt.show()
I get the expected results, and the charts work fine. I'd really just like to get the sheets to overwrite with each run.
From the pd.DataFrame.to_excel documentation:
Multiple sheets may be written to by specifying unique sheet_name.
With all data written to the file it is necessary to save the changes.
Note that creating an ExcelWriter object with a file name that already
exists will result in the contents of the existing file being erased.
Try writing to the book like
import pandas as pd
df = pd.DataFrame({'col1':[1,2,3],'col2':[4,5,6]})
writer = pd.ExcelWriter('g.xlsx')
df.to_excel(writer, sheet_name = 'first_df')
df.to_excel(writer, sheet_name = 'second_df')
writer.save()
If you inspect the workbook, you will have two worksheets.
Then let's say you wanted to write new data to the same workbook:
writer = pd.ExcelWriter('g.xlsx')
df.to_excel(writer, sheet_name = 'new_df')
writer.save()
If you inspect the workbook now, you will just have one worksheet named new_df
If there are other worksheets in the Excel file that you want to keep, and you just want to overwrite the desired worksheets, you would need to use load_workbook.
Before you write any data, you could delete the sheets you want to write to with:
std = book['<sheet_name>']  # get_sheet_by_name() is deprecated in recent openpyxl
book.remove(std)  # remove_sheet() is deprecated in recent openpyxl
That will stop the behavior where a number gets appended to the worksheet name once you attempt to write a workbook with a duplicate sheet name.
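Newer pandas versions can do the delete-and-rewrite step for you. The sketch below assumes pandas 1.3 or later with the openpyxl engine, where ExcelWriter accepts mode="a" together with if_sheet_exists="replace"; the temporary file path is only for illustration.

```python
import os
import tempfile

import pandas as pd

# Temporary location used only for this illustration
path = os.path.join(tempfile.mkdtemp(), "g.xlsx")
df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})

# First run: create the workbook with two sheets
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    df.to_excel(writer, sheet_name="first_df")
    df.to_excel(writer, sheet_name="second_df")

# Later run: append to the existing file, replacing the sheet of the same name
with pd.ExcelWriter(path, engine="openpyxl", mode="a", if_sheet_exists="replace") as writer:
    (df * 10).to_excel(writer, sheet_name="first_df")

# Read every sheet back to confirm nothing was duplicated or lost
sheets = pd.read_excel(path, sheet_name=None)
print(sorted(sheets))
```

With this option the other sheets in the workbook are kept, and no numbered copy of the sheet is created.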
Using xlsxwriter, how do I insert a new row to an Excel worksheet? For instance, there is an existing data table at the cell range A1:G10 of the Excel worksheet, and I want to insert a row (A:A) to give it some space for the title of the report.
I looked through the documentation here http://xlsxwriter.readthedocs.io/worksheet.html, but couldn't find such method.
import xlsxwriter
# Create a workbook and add a worksheet.
workbook = xlsxwriter.Workbook('Expenses01.xlsx')
worksheet = workbook.add_worksheet()
worksheet.insert_row(1) # This method doesn't exist
As of December 2021, this is still not possible. You can get around it by doing some planning and then writing your dataframe starting on a different row. Building on the example from the xlsxwriter documentation:
import pandas as pd
df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})
with pd.ExcelWriter('my_excel_spreadsheet.xlsx', engine='xlsxwriter') as writer:
    df.to_excel(writer, sheet_name='Sheet1', startrow = 4) # <<< notice the startrow here
And then, while the writer is still open (inside the same with block), you can write to the earlier rows as mentioned in other comments:
    workbook = writer.book
    worksheet = writer.sheets['Sheet1']
    worksheet.write(0, 0, 'Some Text') # <<< Then you can write to a different row (row 0 here)
Not quite the insert() method we want, but better than nothing.
I have found that the planning involved in this process is not really ever something I can get around, even if I didn't have this problem. When I reach the stage where I am taking my data to excel, I have to do a little 'by hand' work in order to make the excel sheet pretty enough for human consumption, which is the whole point of moving things to excel. So, I don't look at the need to pre-plan my start rows as too much out of my way.
By using openpyxl you can insert new rows and columns
import openpyxl
file = "xyz.xlsx"
# load the workbook based on the file name provided by the user
book = openpyxl.load_workbook(file)
# open the sheet whose index is 0
sheet = book.worksheets[0]
# insert_rows(idx, amount=1) inserts a row or rows before row idx;
# amount is the number of rows to add and is optional
sheet.insert_rows(13)
# save the workbook so the inserted row is written back to the file
book.save(file)
Hope this helps
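For the title-row case from the question, the same call can be tried on a small in-memory sheet. This is a sketch with hypothetical sample data; nothing is read from or written to disk.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active

# Hypothetical existing table starting at A1
ws.append(["Name", "Value"])
ws.append(["a", 1])

# Insert one empty row before row 1, pushing the table down
ws.insert_rows(1)

# The freed row can now hold the report title
ws["A1"] = "Report title"

print([[cell.value for cell in row] for row in ws.iter_rows()])
```

Inserting rows moves the cell contents only; references elsewhere in the workbook, such as formulas or tables that point at the moved cells, are not adjusted for you.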
Unfortunately this is not something xlsxwriter can do.
openpyxl is a good alternative to xlsxwriter, and if you are starting a new project do not use xlsxwriter.
At the time this was written, openpyxl could not insert rows (later versions provide insert_rows()), but here is an extension class for openpyxl that can.
openpyxl also allows reading of Excel documents, which xlsxwriter does not.
You can try this
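import xlsxwriter
wb = xlsxwriter.Workbook("name.xlsx")
ws = wb.add_worksheet("sheetname")
# Any format object will do; a blank cell is only stored when it carries a format
cell_format = wb.add_format()
# Write a blank cell
ws.write_blank(0, 0, None, cell_format)
ws.write_blank('A2', None, cell_format)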
Here is the official documentation:
Xlsxwriter worksheet.write_blank() method
Another alternative is to merge a few blank columns
ws.merge_range('A1:D1', "")
Otherwise you'll need to run a loop to write each blank cell
# Replace 1 for the row number you need
for c in range(0,10):
    ws.write_blank(1, c, None, cell_format)
Inserting a row is equivalent to adding +1 to your row count. Technically there is no need for a "blank row" method and I'm pretty sure that's why it isn't there.
You should use write().
Read this: set_column(first_col, last_col, width, cell_format, options)
For example:
import xlsxwriter
workbook = xlsxwriter.Workbook('xD.xlsx')
worksheet = workbook.add_worksheet()
row, col = 1, 0  # zero-indexed target cell; starting at row 1 leaves row 0 free for a title
worksheet.write(row, col, 'First Name')
workbook.close()
I am very unhappy with the answers. The XlsxWriter library performs most operations easily.
To add a row of data to the existing worksheet, you can use:
worksheet.write_row(rowNumber, columnNumber, listToAdd)  # write_row() is a worksheet method, not a workbook method
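Since xlsxwriter only writes cells and never shifts them, "inserting" a title row comes down to offsetting every data row when it is written. A minimal sketch with hypothetical data, assuming xlsxwriter's in-memory mode and using openpyxl only to read the result back:

```python
import io

import xlsxwriter
from openpyxl import load_workbook

# Hypothetical table to export
rows = [["First Name", "Last Name"], ["Ada", "Lovelace"]]
title_offset = 1  # number of rows reserved above the table

buffer = io.BytesIO()
workbook = xlsxwriter.Workbook(buffer, {"in_memory": True})
worksheet = workbook.add_worksheet()

# The title goes in the reserved row
worksheet.write(0, 0, "Report title")

# Every data row is shifted down by the offset as it is written
for i, values in enumerate(rows):
    worksheet.write_row(i + title_offset, 0, values)

workbook.close()

# Read the file back to confirm where the cells ended up
buffer.seek(0)
check = load_workbook(buffer).active
print(check["A1"].value, check["A2"].value, check["B3"].value)
```

The offset has to be decided before any data is written, which is the planning step the earlier answers describe.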
I'd like for the code to run 12345 thru the loop, input it in a worksheet, then start on 54321 and do the same thing except input the dataframe into a new worksheet but in the same workbook. Below is my code.
workbook = xlsxwriter.Workbook('Renewals.xlsx')
groups = ['12345', '54321']
for x in groups:
    # (Do a bunch of data manipulation and get pandas df called renewals)
    writer = pd.ExcelWriter('Renewals.xlsx', engine='xlsxwriter')
    worksheet = workbook.add_worksheet(str(x))
    renewals.to_excel(writer, sheet_name=str(x))
When this runs, I am left with a workbook with only 1 worksheet (54321).
Try something like this:
import pandas as pd
# initialize the Excel writer
writer = pd.ExcelWriter('MyFile.xlsx', engine='xlsxwriter')
# store your dataframes in a dict, where the key is the sheet name you want
frames = {'sheetName_1': dataframe1, 'sheetName_2': dataframe2,
          'sheetName_3': dataframe3}
# now loop through and put each on a specific sheet
for sheet, frame in frames.items():  # use .iteritems() on Python 2
    frame.to_excel(writer, sheet_name = sheet)
# critical last step
writer.close()  # writer.save() was removed in pandas 2.0
import pandas as pd
writer = pd.ExcelWriter('Renewals.xlsx', engine='xlsxwriter')
renewals.to_excel(writer, sheet_name=groups[0])
renewals.to_excel(writer, sheet_name=groups[1])
writer.save()
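Applied to the loop in the question, the key change is to create the writer once, before the loop, and let each iteration add its own sheet. The sketch below makes several assumptions: the data is hypothetical, groupby stands in for the per-group data manipulation, and the openpyxl engine with an in-memory buffer is used so the result can be read back; the same pattern applies with engine='xlsxwriter' and a file name.

```python
import io

import pandas as pd
from openpyxl import load_workbook

# Hypothetical data covering both group codes from the question
data = pd.DataFrame({
    "group": ["12345", "12345", "54321"],
    "premium": [100, 150, 200],
})

buffer = io.BytesIO()
# The writer is opened once; opening it inside the loop would start a new file each time
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    for group_id, renewals in data.groupby("group"):
        renewals.to_excel(writer, sheet_name=str(group_id), index=False)

buffer.seek(0)
sheet_names = load_workbook(buffer).sheetnames
print(sheet_names)
```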
Building on the accepted answer, you can find situations where the sheet name will cause the save to fail if it has invalid characters or is too long. This could happen if you are using grouped values for the sheet name as an example. A helper function could address this and save you some pain.
def clean_sheet_name(sheet):
    r"""Clean sheet name so that it is a valid Excel sheet name.
    Removes characters in []:*?/\ and limits to 30 characters.
    Args:
        sheet (str): Name to use for sheet.
    Returns:
        cleaned_sheet (str): Cleaned sheet name.
    """
    if sheet in (None, ''):
        return sheet
    clean_sheet = sheet.translate({ord(i): None for i in '[]:*?/\\'})
    if len(clean_sheet) > 30: # Set value you feel is appropriate
        clean_sheet = clean_sheet[:30]
    return clean_sheet
Then add a call to the helper function before writing to Excel.
for sheet, frame in groups.items():
    # Clean sheet name for length and invalid characters
    sheet = clean_sheet_name(sheet)
    frame.to_excel(writer, sheet_name = sheet, index=False)
writer.save()
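To see what the helper does to awkward names, here is a compact variant with a few sample inputs. The max_len parameter and the sample names are hypothetical additions for illustration; the default of 31 is the maximum sheet-name length Excel accepts.

```python
def clean_sheet_name(sheet, max_len=31):
    """Hypothetical variant of the helper above with a configurable length limit."""
    if not sheet:
        return sheet
    # Drop the characters Excel rejects in sheet names
    cleaned = sheet.translate({ord(ch): None for ch in '[]:*?/\\'})
    # Truncate to the allowed length
    return cleaned[:max_len]


# Hypothetical group keys that would otherwise make the save fail
names = ["Q1/Q2: totals", "a" * 40, "plain"]
cleaned = [clean_sheet_name(name) for name in names]
print(cleaned)
```

Truncation can make two different keys collapse to the same sheet name, so it is worth checking the cleaned names for duplicates before writing.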