Replacing data in xlsx sheet with pandas dataframe

I have an xlsx file with multiple tabs, one of them being Town_names that already has some data in it.
I'd like to overwrite that data with a dataframe - Town_namesDF - while keeping the rest of the xlsx tabs intact.
I've tried the following:
with pd.ExcelWriter(r'path/to/file.xlsx', engine='openpyxl', mode='a') as writer:
    Town_namesDF.to_excel(writer,sheet_name='Town_names')
    writer.save()
    writer.close()
But it ends up creating a new tab Town_names1 instead of overwriting the Town_names tab. Am I missing something? Thanks.

There is no direct option for overwriting a sheet (unlike Julia's XLSX, which has a cell_ref option), so simply delete the existing sheet if it is there and then write.
with pd.ExcelWriter('/path/to/file.xlsx', engine="openpyxl", mode='a') as writer:
    workBook = writer.book
    try:
        # Remove the existing sheet so that pandas does not create Town_names1
        workBook.remove(workBook['Town_names'])
    except KeyError:
        print("worksheet doesn't exist")
    finally:
        df.to_excel(writer, sheet_name='Town_names')
    # No writer.save() needed: leaving the with block saves the file

You could try this: store all of the other sheets temporarily and then write them back together with the new data. This would not preserve any formulas or formatting, though.
Store_sheet1=pd.read_excel('path/to/file.xlsx',sheet_name='Sheet1')
Store_sheet2=pd.read_excel('path/to/file.xlsx',sheet_name='Sheet2')
Store_sheet3=pd.read_excel('path/to/file.xlsx',sheet_name='Sheet3')
# Default write mode: the file is recreated with exactly these four sheets
with pd.ExcelWriter(r'path/to/file.xlsx', engine='openpyxl') as writer:
    Town_namesDF.to_excel(writer,sheet_name='Town_names')
    # index=False avoids adding an extra index column on every round trip
    Store_sheet1.to_excel(writer,sheet_name='Sheet1',index=False)
    Store_sheet2.to_excel(writer,sheet_name='Sheet2',index=False)
    Store_sheet3.to_excel(writer,sheet_name='Sheet3',index=False)

Well, I've managed to do this. This is not a clean solution and not fast at all, but I've made use of openpyxl documentation for working with pandas found here: https://openpyxl.readthedocs.io/en/latest/pandas.html
I'm effectively selecting the Town_names sheet, clearing it with ws.delete_rows() and then appending each row of my dataframe to the sheet.
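import openpyxl
from openpyxl.utils.dataframe import dataframe_to_rows
wb = openpyxl.load_workbook(r'path/to/file.xlsx')
ws = wb['Town_names']  # wb.get_sheet_by_name() is deprecated
ws.delete_rows(1, ws.max_row)  # rows are 1-indexed; clear every used row
wb.save(r'path/to/file.xlsx')
wb = openpyxl.load_workbook(r'path/to/file.xlsx')
activeSheet = wb['Town_names']
# Append the header and then each row of the dataframe
for r in dataframe_to_rows(Town_namesDF, index=False, header=True):
    activeSheet.append(r)
# Apply the 'Pandas' style to the first column and the header row
for cell in activeSheet['A'] + activeSheet[1]:
    cell.style = 'Pandas'
wb.save(r'path/to/file.xlsx')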
A bit messy and I hope there's a better solution than mine, but this worked for me.

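Since pandas 1.3.0, ExcelWriter has a new parameter, if_sheet_exists, which accepts 'error', 'new' or 'replace' ('overlay' was added in 1.4.0). It applies only in append mode, so pass it together with mode='a':
pd.ExcelWriter(r'path/to/file.xlsx', engine='openpyxl', mode='a', if_sheet_exists='replace')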
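For completeness, here is a minimal end-to-end sketch of that approach. The temporary file, the sheet named Other and the sample data are made up for the demo; only the ExcelWriter arguments come from the answer. It assumes pandas >= 1.3.0 with openpyxl installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical workbook in a temporary directory, built only for this demo.
path = os.path.join(tempfile.mkdtemp(), "file.xlsx")

# Create a workbook with two sheets to stand in for the existing file.
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    pd.DataFrame({"town": ["Old A", "Old B", "Old C"]}).to_excel(
        writer, sheet_name="Town_names", index=False
    )
    pd.DataFrame({"x": [1, 2]}).to_excel(writer, sheet_name="Other", index=False)

Town_namesDF = pd.DataFrame({"town": ["New A", "New B"]})

# Append mode keeps the other sheets; if_sheet_exists="replace" swaps out
# Town_names instead of creating Town_names1.
with pd.ExcelWriter(
    path, engine="openpyxl", mode="a", if_sheet_exists="replace"
) as writer:
    Town_namesDF.to_excel(writer, sheet_name="Town_names", index=False)

# Read every sheet back (sheet_name=None returns a dict) to check the result.
sheets = pd.read_excel(path, sheet_name=None)
print(sorted(sheets))
print(sheets["Town_names"])
```

Note that 'replace' drops the old sheet and creates a fresh one, so I would not expect any cell formatting on Town_names to survive; the other sheets are left as openpyxl loaded them.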

Hi you could use xlwings for that task. Here is an example.
import xlwings as xw
import pandas as pd
filename = "test.xlsx"
df = pd.read_excel(filename, "Town_names")
# Do your modifications of the worksheet here. For example, the following line "df * 2".
df = df * 2
app = xw.App(visible=False)
wb = xw.Book(filename)
ws = wb.sheets["Town_names"]
ws.clear()
ws["A1"].options(pd.DataFrame, header=1, index=False, expand='table').value = df
# If formatting of column names and index is needed as xlsxwriter does it, the following lines will do it.
ws["A1"].expand("right").api.Font.Bold = True
ws["A1"].expand("down").api.Font.Bold = True
ws["A1"].expand("right").api.Borders.Weight = 2
ws["A1"].expand("down").api.Borders.Weight = 2
wb.save(filename)
app.quit()

Related

Overwrite a sheet in excel using python

I'm trying to overwrite one sheet of my excel file with data from a .txt file. The excel file I'm bringing the data into has several sheets but I only want to overwrite the 'Previous Month' sheet. Every time I run this code and open the excel file only the previous month sheet is there and nothing else. Many solutions on here show how to add more sheets, I'm trying to update an already existing sheet in an excel with 8 sheets total.
How can I fix my code so that only the one sheet is edited but all of them stay there?
import pandas as pd
#importing previous month data#
writer = pd.ExcelWriter('file.xlsx')
df = pd.read_csv('file.txt', sep='\t')
df.to_excel(writer, sheet_name='Previous Month', startrow=4, startcol=2)
writer.save()
writer.close()
Edited code- whatever is happening here keeps corrupting my original file
import pandas as pd
import openpyxl
#importing previous month data#
writer= pd.ExcelWriter('file.xlsx', mode= 'a', engine="openpyxl", if_sheet_exists="replace")
df = pd.read_csv('file.txt', sep='\t')
df.to_excel(writer, sheet_name="Previous Month", startrow=2, startcol=4)
writer.save()
writer.close()

You can use openpyxl.load_workbook() to do what you are looking for. I tried the suggestions above, but they didn't work for me, whereas load_workbook() usually runs without issues, so I hope this works for you as well.
I open the output file using load_workbook(), delete the existing sheet (Sheet2 here) if it exists, then create it again with create_sheet() and write the data using dataframe_to_rows(). Let me know in case of questions or issues.
import pandas as pd
import openpyxl
from openpyxl.utils.dataframe import dataframe_to_rows
df = pd.read_csv('file.txt', sep='\t')
wb = openpyxl.load_workbook('output.xlsx')  # Open workbook
if "Sheet2" in wb.sheetnames:  # If sheet exists, delete it
    del wb['Sheet2']
ws = wb.create_sheet(title='Sheet2')  # Create new sheet
rows = dataframe_to_rows(df, index=False, header=True)  # Dataframe as rows
for r_idx, row in enumerate(rows, 1):
    for c_idx, value in enumerate(row, 1):
        # 2 and 4 are the offsets, similar to startrow and startcol in your code
        ws.cell(row=r_idx+2, column=c_idx+4, value=value)
wb.save('output.xlsx')

How to Write Multiple Pandas Dataframes to Excel? (Current Method Corrupts .xlsx)

I am trying to write two Pandas dataframes to two different worksheets within the same workbook.
I am using openpyxl 3.0.7 and Pandas 1.2.3.
My workbook's name is 'test.xlsx', and there are two tabs inside: 'Tab1' and 'Tab2'.
Here is the code I am using:
import pandas as pd
from openpyxl import load_workbook
def export(df1, df2):
    excelBook = load_workbook('test.xlsx')
    with pd.ExcelWriter('test.xlsx', engine='openpyxl') as writer:
        writer.book = excelBook
        writer.sheets = dict((ws.title, ws) for ws in excelBook.worksheets)
        df1.to_excel(writer, sheet_name = 'Tab1', index = False)
        df2.to_excel(writer, sheet_name = 'Tab2', index = False)
        writer.save()
df1 = pd.DataFrame(data = [1,2,3], columns = ['Numbers1'])
df2 = pd.DataFrame(data = [4,5,6], columns = ['Numbers2'])
export(df1, df2)
When running the above code, it executes without error. However, when I go to open test.xlsx in Excel, I get a warning telling me that: "We found a problem with some content in 'test.xlsx'. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes."
When I click "Yes", Excel fixes the issue and my two dataframes are populated on their proper tabs. I can then save the file as a new filename, and the file is no longer corrupted.
Any help is much appreciated!

Try to use only one engine to open and write the file at a time:
import pandas as pd
def export(df1, df2):
    with pd.ExcelWriter('test.xlsx', engine='openpyxl') as writer:
        df1.to_excel(writer, sheet_name = 'Tab1', index = False)
        df2.to_excel(writer, sheet_name = 'Tab2', index = False)
        # No writer.save() here: the with block saves and closes the file

The solution to this question is to remove writer.save() from the script. In Pandas versions 1.1.5 and earlier, having this writer.save() did not cause file corruption. However, in versions 1.2.0 and later, this does cause file corruption. The official pandas docs do not show using writer.save after calling pd.ExcelWriter.
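A minimal sketch of the corrected pattern, relying only on the context manager to save the file. The temporary path is an assumption made for the demo; the dataframes and tab names are the ones from the question. The workbook is read back with openpyxl to confirm both tabs were written.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

# Hypothetical output location, used only for this demo.
path = os.path.join(tempfile.mkdtemp(), "test.xlsx")

df1 = pd.DataFrame(data=[1, 2, 3], columns=["Numbers1"])
df2 = pd.DataFrame(data=[4, 5, 6], columns=["Numbers2"])

# The with block saves and closes the workbook on exit,
# so there is no explicit writer.save() call.
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    df1.to_excel(writer, sheet_name="Tab1", index=False)
    df2.to_excel(writer, sheet_name="Tab2", index=False)

# Reopen the file to confirm both tabs exist and hold the data.
wb = load_workbook(path)
tab_names = wb.sheetnames
first_value_tab2 = wb["Tab2"]["A2"].value
print(tab_names, first_value_tab2)
```

This writes a new file; to keep other tabs of an existing workbook, the same block would need mode='a' as discussed elsewhere on this page.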

Pandas write to different sheet

I have the following code:
import pandas
data = pandas.DataFrame(dataset)
writer = pandas.ExcelWriter("C:/adhoc/test.xlsx", engine='xlsxwriter')
data.to_excel(writer, sheet_name='Test')
writer.save()
I have two sheets, Sheet1 and Test. When I run the code it is deleting Sheet1 and just writing the data onto Test. What am I doing wrong here? Expected output I want is to not write anything on Sheet1 and have the data written to Test.

You need to use append as the file mode in ExcelWriter, but append is not supported by the xlsxwriter engine.
To append, you need to specify the engine as openpyxl.
This will write the data to the Test sheet and leave Sheet1 as it is.
import pandas
file_path = "C:/adhoc/test.xlsx"
data = pandas.DataFrame(dataset)
writer = pandas.ExcelWriter(file_path, engine='openpyxl', mode='a')
data.to_excel(writer, sheet_name='Test')
writer.close()  # saves the file; writer.save() was removed in pandas 2.0
Alternatively, you can use a context manager here:
import pandas
file_path = "C:/adhoc/test.xlsx"
data = pandas.DataFrame(dataset)
with pandas.ExcelWriter(file_path, engine='openpyxl', mode='a') as writer:
    data.to_excel(writer, sheet_name='Test')

Formatting integers with comma separator using openpyxl and to_excel

I am writing DataFrames to excel using to_excel(). I need to use openpyxl instead of XlsxWriter, I think, as the writer engine because I need to open existing Excel files and add sheets. Regardless, I'm deep into other formatting using openpyxl so I'm not keen on changing.
This writes the DataFrame, and formats the floats, but I can't figure out how to format the int dtypes.
import pandas as pd
from openpyxl import load_workbook
df = pd.DataFrame({'county':['Cnty1','Cnty2','Cnty3'], 'ints':[5245,70000,4123123], 'floats':[3.212, 4.543, 6.4555]})
fileName = "Maryland - test.xlsx"
book = load_workbook(fileName)
writer = pd.ExcelWriter(fileName, engine='openpyxl')
writer.book = book
df.to_excel(writer, sheet_name='Test', float_format='%.2f', header=False, index=False, startrow=3)
ws = writer.sheets['Test']
writer.save()
writer.close()
Tried using this, but I think it only works with XlsxWriter:
intFormat = book.add_format({'num_format': '#,###'})
ws.set_column('B:B', intFormat)
This type of thing could be used cell-by-cell with a loop, but there's A LOT of data:
ws['B2'].number_format = '#,###'

This can be fixed by setting number_format on the cell, using the constants from openpyxl.styles.numbers:
from openpyxl.styles import numbers
def sth(cell):
    # FORMAT_NUMBER_COMMA_SEPARATED1 is '#,##0.00', so 2000 displays as 2,000.00
    cell.number_format = numbers.FORMAT_NUMBER_COMMA_SEPARATED1
Check out the openpyxl docs for further reading.
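As far as I know, openpyxl still needs the format set on each cell, but looping over one column keeps that short. Below is a rough sketch under a few assumptions: the temporary file is made up for the demo, the data is the sample frame from the question, and '#,##0' is my choice of format string for whole numbers with a thousands separator.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

# Hypothetical output file, used only for this demo.
path = os.path.join(tempfile.mkdtemp(), "test.xlsx")

df = pd.DataFrame({
    "county": ["Cnty1", "Cnty2", "Cnty3"],
    "ints": [5245, 70000, 4123123],
    "floats": [3.212, 4.543, 6.4555],
})

with pd.ExcelWriter(path, engine="openpyxl") as writer:
    df.to_excel(writer, sheet_name="Test", index=False)
    ws = writer.sheets["Test"]
    # Column B holds 'ints'; [1:] skips the header cell in row 1.
    for cell in ws["B"][1:]:
        cell.number_format = "#,##0"

# Reload the file to confirm the format was stored with each cell.
formats = [c.number_format for c in load_workbook(path)["Test"]["B"][1:]]
print(formats)
```

The stored values stay integers; only the way Excel displays them changes.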

How to add an empty Worksheet into an existing Workbook using Pandas ExcelWriter

I am trying to add an empty excel sheet into an existing Excel File using python xlsxwriter.
Setting the formula up as follows works well.
workbook = xlsxwriter.Workbook(file_name)
worksheet_cover = workbook.add_worksheet("Cover")
Output4 = workbook
Output4.close()
But once I try to add further sheets with dataframes into the Excel it overwrites the previous excel:
with pd.ExcelWriter('Luther_April_Output4.xlsx') as writer:
    data_DifferingRates.to_excel(writer, sheet_name='Differing Rates')
    data_DifferingMonthorYear.to_excel(writer, sheet_name='Differing Month or Year')
    data_DoubleEntries.to_excel(writer, sheet_name='Double Entries')
How should I write the code, so that I can add empty sheets and existing data frames into an existing excel file.
Alternatively it would be helpful to answer how to switch engines, once I have produced the Excel file...
Thanks for any help!

If you're not forced to use xlsxwriter, try using openpyxl. Simply pass 'openpyxl' as the engine for the pandas built-in ExcelWriter class. I had asked a question a while back on why this works. It works well with the syntax of DataFrame.to_excel() and it won't overwrite your already existing sheets.
from openpyxl import load_workbook
import pandas as pd
book = load_workbook(file_name)
writer = pd.ExcelWriter(file_name, engine='openpyxl')
writer.book = book
data_DifferingRates.to_excel(writer, sheet_name='Differing Rates')
data_DifferingMonthorYear.to_excel(writer, sheet_name='Differing Month or Year')
data_DoubleEntries.to_excel(writer, sheet_name='Double Entries')
writer.save()

You could use pandas.ExcelWriter with the optional mode='a' argument for appending to an existing Excel workbook. From the pandas documentation:
You can also append to an existing Excel file:
>>> with ExcelWriter('path_to_file.xlsx', mode='a') as writer:
...     df.to_excel(writer, sheet_name='Sheet3')
Unfortunately, this requires a different engine, since the xlsxwriter engine does not support mode='a' (append). If you try to pass this parameter to the constructor with xlsxwriter, it raises an error.
So you will need to use a different engine to do the append, like openpyxl. You'll need to ensure that the package is installed, otherwise you'll get a "Module Not Found" error. I have tested using openpyxl as the engine, and it is able to append a new worksheet to an existing workbook:
with pd.ExcelWriter(engine='openpyxl', path='Luther_April_Output4.xlsx', mode='a') as writer:
    data_DifferingRates.to_excel(writer, sheet_name='Differing Rates')
    data_DifferingMonthorYear.to_excel(writer, sheet_name='Differing Month or Year')
    data_DoubleEntries.to_excel(writer, sheet_name='Double Entries')
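The question also asks for an empty sheet. One way that should work in append mode is to call create_sheet() on the openpyxl workbook that the writer exposes as writer.book. This is a sketch under assumptions: the temporary file, the Existing sheet and the sample data are invented for the demo, and only the Cover and Differing Rates sheet names come from the question.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

# Hypothetical existing workbook, created here so the demo is self-contained.
path = os.path.join(tempfile.mkdtemp(), "output.xlsx")
pd.DataFrame({"a": [1]}).to_excel(
    path, sheet_name="Existing", index=False, engine="openpyxl"
)

data_DifferingRates = pd.DataFrame({"rate": [0.1, 0.2]})

with pd.ExcelWriter(path, engine="openpyxl", mode="a") as writer:
    # writer.book is the underlying openpyxl Workbook;
    # create_sheet() adds an empty sheet at the end.
    writer.book.create_sheet("Cover")
    data_DifferingRates.to_excel(writer, sheet_name="Differing Rates")

# Reopen the file to list the sheets in workbook order.
sheet_names = load_workbook(path).sheetnames
print(sheet_names)
```

Because the file is opened in append mode, the sheet that was already there is kept and the two new sheets are added after it.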

I think you need to write the data into a new file. This works for me:
# Write multiple tabs (sheets) into a new file
import pandas as pd
from openpyxl import load_workbook
Work_PATH = r'C:\PythonTest'+'\\'
ar_source = Work_PATH + 'Test.xlsx'
Output_Wkbk = Work_PATH + 'New_Wkbk.xlsx'
# Need workbook from openpyxl load_workbook to enumerate tabs
# is there another way with only xlsxwriter?
workbook = load_workbook(filename=ar_source)
# Set sheet names in workbook as a series.
# You can also set the series manually tabs = ['sheet1', 'sheet2']
tabs = workbook.sheetnames
print ('\nWorkbook sheets: ',tabs,'\n')
# Replace this function with functions for what you need to do
def default_col_width (df, sheetname, writer):
    # Note, this seems to use xlsxwriter as the default engine.
    for column in df:
        # map col width to col name. Ugh.
        column_width = max(df[column].astype(str).map(len).max(), len(column))
        # set special column widths
        narrower_col = ['OS','URL'] #change to fit your workbook
        if column in narrower_col: column_width = 10
        if column_width >30: column_width = 30
        if column == 'IP Address': column_width = 15 #change for your workbook
        col_index = df.columns.get_loc(column)
        writer.sheets[sheetname].set_column(col_index,col_index,column_width)
    return
# Note nothing is returned. Writer.sheets is global.
with pd.ExcelWriter(Output_Wkbk,engine='xlsxwriter') as writer:
    # Iterate through the series of sheetnames
    for tab in tabs:
        df1 = pd.read_excel(ar_source, tab).astype(str)
        # I need to trim my input
        df1.drop(list(df1)[23:],axis='columns', inplace=True, errors='ignore')
        try:
            # Set spreadsheet focus
            df1.to_excel(writer, sheet_name=tab, index = False, na_rep=' ')
            # Do something with the spreadsheet - Calling a function
            default_col_width(df1, tab, writer)
        except Exception:
            # Function call failed so just copy tab with no changes
            df1.to_excel(writer, sheet_name=tab, index = False,na_rep=' ')
If I use the input file name as the output file name, it fails and erases the original. There is no need to save or close if you use with; it closes automatically.
