I have my dataframe ready to be written to an Excel file, but I need to add a single cell containing a string above it. How do I do that?
You can save the dataframe starting from the second row and then use another tool to write the first cell of the Excel file.
Note that writing from pandas to Excel overwrites the file's existing data, so the steps must be done in this order (although there are also ways to write to an existing Excel file without overwriting its data).
1. Save the dataframe, specifying startrow=1:
df.to_excel("filename.xlsx", startrow=1, index=False)
2. Write a cell value.
For example, using openpyxl (from a GeeksforGeeks tutorial):
from openpyxl import load_workbook
# load excel file
workbook = load_workbook(filename="filename.xlsx")
# open workbook
sheet = workbook.active
# modify the desired cell
sheet["A1"] = "A60983A Register"
# save the file
workbook.save(filename="filename.xlsx")
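The two steps above can also be folded into a single pass. The following is a minimal sketch, not the answer author's method: it assumes the openpyxl engine is installed, and the helper name write_with_title, the sample dataframe, and the temporary path are hypothetical.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook


def write_with_title(df, path, title, sheet_name="Sheet1"):
    """Write `title` into A1 and the dataframe below it in one pass (hypothetical helper)."""
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        # startrow=1 leaves the first worksheet row free for the title
        df.to_excel(writer, sheet_name=sheet_name, startrow=1, index=False)
        # writer.sheets maps sheet names to openpyxl worksheets
        writer.sheets[sheet_name]["A1"] = title


# Illustrative data and a throwaway path
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
path = os.path.join(tempfile.mkdtemp(), "titled.xlsx")
write_with_title(df, path, "A60983A Register")

# Read the file back to inspect the layout
ws = load_workbook(path).active
```

Because the title is written while the writer is still open, the file is saved only once and nothing has to be reloaded afterwards.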
import pandas as pd
df = pd.DataFrame({'label':['first','second','first','first','second','second'],
'first_text':['how is your day','the weather is nice','i am feeling well','i go to school','this is good','that is new'],
'second_text':['today is warm','this is cute','i am feeling sick','i go to work','math is hard','you are old'],
'third_text':['i am a student','the weather is cold','she is cute','ii am at home','this is bad','this is trendy']})
df.loc[-1] = df.columns.values
df.sort_index(inplace=True)
df.reset_index(drop=True, inplace=True)
df.rename(columns=
{"label": "Register", 'first_text':'', 'second_text':'', 'third_text':''},
inplace=True)
Try this minimal reproducible example; you can swap in your own data as well.
I have written some content to an xlsx file using xlsxwriter:
workbook = xlsxwriter.Workbook(file_name)
worksheet = workbook.add_worksheet()
worksheet.write(row, col, value)
workbook.close()
I'd like to add a dataframe after the existing rows in this file using to_excel:
df.to_excel(file_name,
startrow=len(existing_content),
engine='xlsxwriter')
However, this does not seem to work: the dataframe is not inserted into the file. Does anyone know why?
Since the question does not show the existing content in detail, let's walk through a small example using XlsxWriter and to_excel.
using xlsxwriter
import xlsxwriter
# Create a new Excel file and add a worksheet
workbook = xlsxwriter.Workbook('example.xlsx')
worksheet = workbook.add_worksheet()
# Add some data to the worksheet
worksheet.write('A1', 'Language')
worksheet.write('B1', 'Score')
worksheet.write('A2', 'Python')
worksheet.write('B2', 100)
worksheet.write('A3', 'Java')
worksheet.write('B3', 98)
worksheet.write('A4', 'Ruby')
worksheet.write('B4', 88)
# Save the file
workbook.close()
Using the above code, we have saved the table similar to the one below to an Excel file.
Language  Score
Python    100
Java      98
Ruby      88
Next, to add rows using DataFrame.to_excel:
using to_excel
import pandas as pd
# Load an existing Excel file
existing_file = pd.read_excel('example.xlsx')
# Create a new DataFrame to append
df = pd.DataFrame({
'Language': ['C++', 'Javascript', 'C#'],
'Score': [78, 97, 67]
})
# Append the new DataFrame to the existing file
result = pd.concat([existing_file, df])
# Write the combined DataFrame to the existing file
result.to_excel('example.xlsx', index=False)
The reason for using pandas concat:
To append to an existing file, you need pandas.ExcelWriter in append mode (mode='a'), but the XlsxWriter engine does not support append mode.
The same result could once be achieved with pandas.DataFrame.append(), but that method was deprecated in pandas 1.4 and removed in pandas 2.0, so we use concat instead.
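For comparison, here is a sketch of the append-mode route mentioned above, using the openpyxl engine instead of XlsxWriter. It is an illustration under assumptions rather than part of the original answer: it relies on if_sheet_exists='overlay' (pandas 1.4 or later), on the default sheet name Sheet1, and on a temporary file standing in for example.xlsx.

```python
import os
import tempfile

import pandas as pd

# Stand-in for example.xlsx, created here so the sketch is self-contained
path = os.path.join(tempfile.mkdtemp(), "example.xlsx")
pd.DataFrame(
    {"Language": ["Python", "Java", "Ruby"], "Score": [100, 98, 88]}
).to_excel(path, index=False, engine="openpyxl")

new_rows = pd.DataFrame(
    {"Language": ["C++", "Javascript", "C#"], "Score": [78, 97, 67]}
)

# mode="a" keeps the workbook; "overlay" writes into the existing sheet
with pd.ExcelWriter(
    path, engine="openpyxl", mode="a", if_sheet_exists="overlay"
) as writer:
    # max_row is the 1-based number of the last used row, which equals
    # the 0-based index of the first free row expected by startrow
    first_free_row = writer.sheets["Sheet1"].max_row
    new_rows.to_excel(
        writer,
        sheet_name="Sheet1",
        startrow=first_free_row,
        header=False,  # the header row is already in the sheet
        index=False,
    )

combined = pd.read_excel(path)
```

Unlike the concat approach, this leaves the existing cells untouched and only writes the new rows.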
The OP is using xlsxwriter in the engine parameter. Per XlsxWriter documentation "XlsxWriter is designed only as a file writer. It cannot read or modify an existing Excel file." (link to XlsxWriter Docs).
Below I've provided a fully reproducible example of how you can go about modifying an existing .xlsx workbook using the openpyxl module (link to Openpyxl Docs).
For demonstration purposes, I'll first create a workbook called test.xlsx using pandas:
import pandas as pd
df = pd.DataFrame({'Col_A': [1,2,3,4],
'Col_B': [5,6,7,8],
'Col_C': [0,0,0,0],
'Col_D': [13,14,15,16]})
df.to_excel('test.xlsx', index=False)
This is the expected output at this point:
Col_A  Col_B  Col_C  Col_D
1      5      0      13
2      6      0      14
3      7      0      15
4      8      0      16
Using openpyxl, you can load the existing workbook ('test.xlsx') and overwrite the third column with data from a new dataframe while preserving the other existing data. For simplicity, this example updates it from a one-column dataframe, but you could extend it to update or add more data.
from openpyxl import load_workbook
import pandas as pd
df_new = pd.DataFrame({'Col_C': [9, 10, 11, 12]})
wb = load_workbook('test.xlsx')
ws = wb['Sheet1']
# Row 1 holds the header, so dataframe index 0 maps to worksheet row 2
for index, row in df_new.iterrows():
    cell = 'C%d' % (index + 2)
    ws[cell] = row['Col_C']
wb.save('test.xlsx')
With the expected output at the end:
Col_A  Col_B  Col_C  Col_D
1      5      9      13
2      6      10     14
3      7      11     15
4      8      12     16
I am trying to take a workbook, loop through specific worksheets, retrieve a dataframe from each, manipulate it, and essentially paste the dataframe back in the same place without changing any of the other data or sheets in the document. This is what I am trying:
path= '<folder location>.xlsx'
wb = pd.ExcelFile(path)
for sht in ['sheet1','sheet2','sheet3']:
    df= pd.read_excel(wb,sheet_name = sht, skiprows = 607,nrows = 11, usecols = range(2,15))
    # here I manipulate the df, to then save it down in the same place
    df.to_excel(wb,sheet_name = sht, startcol=3, startrow=607)
# Save down file
wb.save(path)
wb.close()
My solution so far saves only the first sheet, containing ONLY the data that I manipulated. I lose all the other sheets and the other data on that sheet that I want to keep, so I end up with just sheet1 holding only the manipulated data.
Would really appreciate any help, thank you
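Try writing through an ExcelWriter instead of an ExcelFile. Opened with the default mode='w', the writer replaces the whole file, so open it in append mode with the openpyxl engine and if_sheet_exists='overlay' (available from pandas 1.4) to write into the existing sheets:
path = 'folder location.xlsx'
sheets = ['sheet1', 'sheet2', 'sheet3']
# Read every block first, before the writer opens the same file
frames = {sht: pd.read_excel(path, sheet_name=sht, skiprows=607, nrows=11, usecols=range(2, 15)) for sht in sheets}
with pd.ExcelWriter(path, engine='openpyxl', mode='a', if_sheet_exists='overlay') as writer:
    for sht, df in frames.items():
        # here I manipulate the df, to then save it down in the same place
        df.to_excel(writer, sheet_name=sht, startcol=3, startrow=607)
If your pandas version is older than that, it might be easier to read everything in first, manipulate the required sheets and save to a new file.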
I'm new to Python (and programming in general) and am running into a problem when writing data out to sheets in Excel.
I'm reading in an Excel file, performing a sum calculation on specific columns, and then writing the results out to a new workbook. Then at the end, it creates two charts based on the results.
The code works, except every time I run it, it creates new sheets with numbers appended to the end. I really just want it to overwrite the sheet names I provide, instead of creating new ones.
I'm not familiar enough with all the modules to understand all the options that are available. I've researched openpyxl, and pandas, and similar examples to what I'm trying to do either aren't easy to find, or don't seem to work when I try them.
import pandas as pd
import xlrd
import openpyxl as op
from openpyxl import load_workbook
import matplotlib.pyplot as plt
# declare the input file
input_file = 'TestData.xlsx'
# declare the output_file name to be written to
output_file = 'TestData_Output.xlsx'
book = load_workbook(output_file)
writer = pd.ExcelWriter(output_file, engine='openpyxl')
writer.book = book
# read the source Excel file and calculate sums
excel_file = pd.read_excel(input_file)
num_events_main = excel_file.groupby(['Column1']).sum()
num_events_type = excel_file.groupby(['Column2']).sum()
# create dataframes and write names and sums out to new workbook/sheets
df_1 = pd.DataFrame(num_events_main)
df_2 = pd.DataFrame(num_events_type)
df_1.to_excel(writer, sheet_name = 'TestSheet1')
df_2.to_excel(writer, sheet_name = 'TestSheet2')
# save and close
writer.save()
writer.close()
# dataframe for the first sheet
df = pd.read_excel(output_file, sheet_name='TestSheet1')
values = df[['Column1', 'Column3']]
# dataframe for the second sheet
df = pd.read_excel(output_file, sheet_name='TestSheet2')
values_2 = df[['Column2', 'Column3']]
# create the graphs
events_graph = values.plot.bar(x = 'Column1', y = 'Column3', rot = 60) # rot = rotation
type_graph = values_2.plot.bar(x = 'Column2', y = 'Column3', rot = 60) # rot = rotation
plt.show()
I get the expected results, and the charts work fine. I'd really just like to get the sheets to overwrite with each run.
From the pd.DataFrame.to_excel documentation:
Multiple sheets may be written to by specifying unique sheet_name.
With all data written to the file it is necessary to save the changes.
Note that creating an ExcelWriter object with a file name that already
exists will result in the contents of the existing file being erased.
Try writing to the book like this:
import pandas as pd
df = pd.DataFrame({'col1':[1,2,3],'col2':[4,5,6]})
writer = pd.ExcelWriter('g.xlsx')
df.to_excel(writer, sheet_name = 'first_df')
df.to_excel(writer, sheet_name = 'second_df')
writer.close()  # writer.save() was removed in pandas 2.0
If you inspect the workbook, you will have two worksheets.
Then let's say you wanted to write new data to the same workbook:
writer = pd.ExcelWriter('g.xlsx')
df.to_excel(writer, sheet_name = 'new_df')
writer.close()  # writer.save() was removed in pandas 2.0
If you inspect the workbook now, you will just have one worksheet named new_df
If there are other worksheets in the excel file that you want to keep and just overwrite the desired worksheets, you would need to use load_workbook.
Before you write any data, you could delete the sheets you want to write to with:
std = book['<sheet_name>']
book.remove(std)
That will stop the behavior where a number gets appended to the worksheet name once you attempt to write a workbook with a duplicate sheet name.
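On recent pandas versions the delete-then-write step can be delegated to ExcelWriter itself. The sketch below is an illustration under assumptions, not the answer author's code: it requires the openpyxl engine and the if_sheet_exists parameter (pandas 1.3 or later), and the helper name rewrite_sheet, the sheet names, and the temporary path are hypothetical.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

# Throwaway workbook with one sheet to keep and one to rewrite
path = os.path.join(tempfile.mkdtemp(), "TestData_Output.xlsx")
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    pd.DataFrame({"x": [1]}).to_excel(writer, sheet_name="Keep", index=False)
    pd.DataFrame({"x": [-1]}).to_excel(writer, sheet_name="TestSheet1", index=False)


def rewrite_sheet(path, sheet_name, df):
    """Replace one sheet and leave the rest of the workbook alone (hypothetical helper)."""
    with pd.ExcelWriter(
        path, engine="openpyxl", mode="a", if_sheet_exists="replace"
    ) as writer:
        df.to_excel(writer, sheet_name=sheet_name, index=False)


# Simulate running the script several times
for run in range(3):
    rewrite_sheet(path, "TestSheet1", pd.DataFrame({"x": [run]}))

sheet_names = load_workbook(path).sheetnames
latest = pd.read_excel(path, sheet_name="TestSheet1")
```

After three runs the workbook still contains only the two original sheet names, with no numbered copies such as TestSheet11.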
I am trying to add an empty excel sheet into an existing Excel File using python xlsxwriter.
Setting the formula up as follows works well.
workbook = xlsxwriter.Workbook(file_name)
worksheet_cover = workbook.add_worksheet("Cover")
Output4 = workbook
Output4.close()
But once I try to add further sheets with dataframes to the Excel file, it overwrites the previous file:
with pd.ExcelWriter('Luther_April_Output4.xlsx') as writer:
    data_DifferingRates.to_excel(writer, sheet_name='Differing Rates')
    data_DifferingMonthorYear.to_excel(writer, sheet_name='Differing Month or Year')
    data_DoubleEntries.to_excel(writer, sheet_name='Double Entries')
How should I write the code so that I can add empty sheets and existing dataframes to an existing Excel file?
Alternatively it would be helpful to answer how to switch engines, once I have produced the Excel file...
Thanks for any help!
If you're not forced to use xlsxwriter, try openpyxl: simply pass 'openpyxl' as the engine to pandas' built-in ExcelWriter class. I asked a question a while back about why this works. It fits the DataFrame.to_excel() syntax and won't overwrite your existing sheets. Note that assigning writer.book and calling writer.save(), as below, only works on older pandas releases; pandas 2.x removed writer.save() and made writer.book read-only.
from openpyxl import load_workbook
import pandas as pd
book = load_workbook(file_name)
writer = pd.ExcelWriter(file_name, engine='openpyxl')
writer.book = book
data_DifferingRates.to_excel(writer, sheet_name='Differing Rates')
data_DifferingMonthorYear.to_excel(writer, sheet_name='Differing Month or Year')
data_DoubleEntries.to_excel(writer, sheet_name='Double Entries')
writer.save()
You could use pandas.ExcelWriter with the optional mode='a' argument to append to an existing Excel workbook.
You can also append to an existing Excel file:
>>> with ExcelWriter('path_to_file.xlsx', mode='a') as writer:
...     df.to_excel(writer, sheet_name='Sheet3')
Unfortunately, this requires a different engine since, as you observed, xlsxwriter does not support the optional mode='a' (append). If you pass this parameter to the constructor with that engine, it raises an error.
So you will need to use a different engine to do the append, such as openpyxl. Make sure the package is installed, otherwise you'll get a ModuleNotFoundError. I have tested openpyxl as the engine, and it is able to append a new worksheet to an existing workbook:
with pd.ExcelWriter(engine='openpyxl', path='Luther_April_Output4.xlsx', mode='a') as writer:
    data_DifferingRates.to_excel(writer, sheet_name='Differing Rates')
    data_DifferingMonthorYear.to_excel(writer, sheet_name='Differing Month or Year')
    data_DoubleEntries.to_excel(writer, sheet_name='Double Entries')
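The question also asked for an empty cover sheet. One way to get both in the same append-mode session might look like the sketch below. It is an illustration under assumptions: the existing workbook, the dataframe contents, and the temporary path are made up, and it uses writer.book to reach openpyxl's create_sheet.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

# Stand-in for the existing workbook
path = os.path.join(tempfile.mkdtemp(), "Luther_April_Output4.xlsx")
pd.DataFrame({"id": [1, 2]}).to_excel(
    path, sheet_name="Existing", index=False, engine="openpyxl"
)

# Hypothetical data standing in for data_DifferingRates
differing_rates = pd.DataFrame({"rate": [0.1, 0.2]})

with pd.ExcelWriter(path, engine="openpyxl", mode="a") as writer:
    # writer.book is the underlying openpyxl Workbook;
    # index 0 places the empty sheet in front of the others
    writer.book.create_sheet("Cover", 0)
    differing_rates.to_excel(writer, sheet_name="Differing Rates", index=False)

wb = load_workbook(path)
names = wb.sheetnames
```

The existing sheet stays in place, the empty Cover sheet is inserted first, and the dataframe lands in a new sheet at the end.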
I think you need to write the data into a new file. This works for me:
# Write multiple tabs (sheets) to a new file
import pandas as pd
from openpyxl import load_workbook
Work_PATH = r'C:\PythonTest'+'\\'
ar_source = Work_PATH + 'Test.xlsx'
Output_Wkbk = Work_PATH + 'New_Wkbk.xlsx'
# Need workbook from openpyxl load_workbook to enumerate tabs
# is there another way with only xlsxwriter?
workbook = load_workbook(filename=ar_source)
# Set sheet names in workbook as a series.
# You can also set the series manually tabs = ['sheet1', 'sheet2']
tabs = workbook.sheetnames
print ('\nWorkbook sheets: ',tabs,'\n')
# Replace this function with functions for what you need to do
def default_col_width(df, sheetname, writer):
    # Note, this seems to use xlsxwriter as the default engine.
    for column in df:
        # map col width to col name. Ugh.
        column_width = max(df[column].astype(str).map(len).max(), len(column))
        # set special column widths
        narrower_col = ['OS', 'URL']  # change to fit your workbook
        if column in narrower_col: column_width = 10
        if column_width > 30: column_width = 30
        if column == 'IP Address': column_width = 15  # change for your workbook
        col_index = df.columns.get_loc(column)
        writer.sheets[sheetname].set_column(col_index, col_index, column_width)
    return
# Note nothing is returned. Writer.sheets is global.
with pd.ExcelWriter(Output_Wkbk, engine='xlsxwriter') as writer:
    # Iterate through the series of sheet names
    for tab in tabs:
        df1 = pd.read_excel(ar_source, tab).astype(str)
        # I need to trim my input
        df1.drop(list(df1)[23:], axis='columns', inplace=True, errors='ignore')
        try:
            # Set spreadsheet focus
            df1.to_excel(writer, sheet_name=tab, index=False, na_rep=' ')
            # Do something with the spreadsheet - Calling a function
            default_col_width(df1, tab, writer)
        except Exception:
            # Function call failed so just copy tab with no changes
            df1.to_excel(writer, sheet_name=tab, index=False, na_rep=' ')
If I use the input file name as the output file name, it fails and erases the original. There is no need to save or close if you use with; it closes automatically.
import pandas as pd
from pandas import ExcelWriter
trans=pd.read_csv('HMIS-DICR-2011-12-Manipur-Bishnupur.csv')
df=trans[["April 10-11","May 10-11","June 10-11","July 10-11","August 10-11","September 10-11","October 10-11","November 10-11","December 10-11","January 10-11","February 10-11","March 10-11","April 11-12","May 11-12","June 11-12","July 11-12","August 11-12","September 11-12","October 11-12","November 11-12","December 11-12","January 11-12","February 11-12","March 11-12"]]
writer1 = ExcelWriter('manipur1.xlsx')
df.to_excel(writer1,'Sheet1',index=False)
writer1.save()
This code successfully writes the data to Sheet1, but how can I append the data of another dataframe (df), read from a different file (shown below), to the existing sheet (Sheet1) of the "manipur1" Excel file?
for example:
my data frame is like:
trans=pd.read_csv('HMIS-DICR-2013-2014-Manipur-Bishnupur.csv')
df=trans[["April 12-13","May 12-13","June 12-13","July 12-13","August 12-13","September 12-13","October 12-13","November 12-13","December 12-13","January 12-13","February 12-13","March 12-13","April 13-14","May 13-14","June 13-14","July 13-14","August 13-14","September 13-14","October 13-14","November 13-14","December 13-14","January 13-14","February 13-14","March 13-14"]]
You can only append new data to an existing Excel file by loading the existing data into pandas, appending the new data, and saving the concatenated dataframe again.
To preserve existing sheets that are supposed to remain unchanged, you need to iterate over the entire workbook and handle each sheet. The sheets to be changed and the data to append to them are defined in the to_update dictionary.
# get data to be appended
trans=pd.read_csv('HMIS-DICR-2013-2014-Manipur-Bishnupur.csv')
df_append = trans[["April 12-13","May 12-13","June 12-13","July 12-13","August 12-13","September 12-13","October 12-13","November 12-13","December 12-13","January 12-13","February 12-13","March 12-13","April 13-14","May 13-14","June 13-14","July 13-14","August 13-14","September 13-14","October 13-14","November 13-14","December 13-14","January 13-14","February 13-14","March 13-14"]]
# define what sheets to update
to_update = {"Sheet1": df_append}
# load all existing sheets before opening the writer, which replaces the file
file_name = 'manipur1.xlsx'
existing = pd.read_excel(file_name, sheet_name=None)
# write and update
with pd.ExcelWriter(file_name) as excel_writer:
    for sheet, sheet_df in existing.items():
        append_df = to_update.get(sheet)
        if append_df is not None:
            sheet_df = pd.concat([sheet_df, append_df], axis=1)
        sheet_df.to_excel(excel_writer, sheet_name=sheet, index=False)
However, any layout or formatting in your existing Excel file will be lost. You can use openpyxl if you want to retain the formatting, but this is more complicated.
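As a rough idea of what the openpyxl route could look like, the sketch below appends a dataframe's columns to the right of the existing data, cell by cell, so that cells already in the sheet keep their formatting. It is only an illustration: the helper name append_columns, the sample workbook, and the bold header used to demonstrate preserved formatting are all assumptions.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook
from openpyxl.styles import Font


def append_columns(path, sheet_name, df):
    """Write df's columns to the right of the existing data (hypothetical helper)."""
    wb = load_workbook(path)
    ws = wb[sheet_name]
    first_col = ws.max_column + 1  # first unused column, 1-based
    for offset, name in enumerate(df.columns):
        col = first_col + offset
        ws.cell(row=1, column=col, value=name)  # header goes in row 1
        for row_number, value in enumerate(df[name].tolist(), start=2):
            ws.cell(row=row_number, column=col, value=value)
    wb.save(path)


# Small formatted workbook standing in for manipur1.xlsx
path = os.path.join(tempfile.mkdtemp(), "manipur1.xlsx")
wb = Workbook()
ws = wb.active
ws.title = "Sheet1"
ws.append(["April 10-11"])
ws.append([5])
ws.append([7])
ws["A1"].font = Font(bold=True)
wb.save(path)

append_columns(path, "Sheet1", pd.DataFrame({"April 12-13": [1, 2]}))
check = load_workbook(path)["Sheet1"]
```

The new cells are written unformatted; only what openpyxl can load and save again is preserved, so it is worth testing on a copy of the real file first.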