Pandas write to different sheet - python

I have the following code:
import pandas
data = pandas.DataFrame(dataset)
writer = pandas.ExcelWriter("C:/adhoc/test.xlsx", engine='xlsxwriter')
data.to_excel(writer, sheet_name='Test')
writer.save()
The workbook has two sheets, Sheet1 and Test. When I run the code, it deletes Sheet1 and writes the data only to Test. What am I doing wrong? I want Sheet1 left untouched and the data written to Test.

You need to open the ExcelWriter in append mode (mode='a'). Append mode is not supported by the xlsxwriter engine, so specify openpyxl as the engine instead.
This writes the data to the Test sheet and leaves Sheet1 as it is.
import pandas
file_path = "C:/adhoc/test.xlsx"
data = pandas.DataFrame(dataset)
writer = pandas.ExcelWriter(file_path, engine='openpyxl', mode='a')
data.to_excel(writer, sheet_name='Test')
writer.close()  # writer.save() was deprecated in pandas 1.5 and removed in 2.0
Alternatively, you can use a context manager, which saves and closes the file for you:
import pandas
file_path = "C:/adhoc/test.xlsx"
data = pandas.DataFrame(dataset)
with pandas.ExcelWriter(file_path, engine='openpyxl', mode='a') as writer:
    data.to_excel(writer, sheet_name='Test')
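A minimal, self-contained sketch of the append-mode approach. The temporary file and the sample data are assumptions made for illustration; they stand in for the real workbook and `dataset`.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

# Hypothetical stand-in for the existing workbook: one sheet named Sheet1.
file_path = os.path.join(tempfile.mkdtemp(), "test.xlsx")
pd.DataFrame({"a": [1, 2]}).to_excel(
    file_path, sheet_name="Sheet1", index=False, engine="openpyxl"
)

# Hypothetical data to be written to the new sheet.
data = pd.DataFrame({"x": [10, 20, 30]})

# mode="a" opens the existing workbook instead of overwriting it.
with pd.ExcelWriter(file_path, engine="openpyxl", mode="a") as writer:
    data.to_excel(writer, sheet_name="Test", index=False)

# Both sheets should now be present.
sheet_names = load_workbook(file_path).sheetnames
print(sheet_names)
```

If Test already exists in the workbook, the outcome depends on the pandas version and on the if_sheet_exists argument, so this sketch only covers the case where Test is new.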

Related

Appending to a sheet in Excel creates a new sheet instead of appending

I am trying to use this code to append a dataframe to an existing sheet in Excel, but instead of appending the new data to it, it creates a new sheet. Here is the code:
import pandas as pd
import openpyxl as op
df = ['normal_dataframe']
with pd.ExcelWriter('test.xlsx', engine='openpyxl', mode='a') as writer:
    df.to_excel(writer, sheet_name='Sheet1', header=False, index=False)
'test.xlsx' already has a 'Sheet1', but after the append there are two sheets: 'Sheet1' and 'Sheet11'.
One approach with COM:
import win32com.client
xl = win32com.client.Dispatch("Excel.Application")
path = r'c:\Users\Alex20\Documents\test.xlsx'
wb = xl.Workbooks.Open(path)
ws = wb.Worksheets("Sheet1")
ws.Range("E9:F10").Value = [[9,9],[10,10]]
wb.Close(True)
xl.Quit()
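Where COM is not available (it needs Windows and an Excel installation), one possible alternative is to append the rows with openpyxl directly. This is only a sketch: the temporary file and the sample frames are assumptions, and `ws.append` simply adds each row below the last used row of the sheet.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

# Hypothetical existing workbook with a header row and two data rows.
path = os.path.join(tempfile.mkdtemp(), "test.xlsx")
pd.DataFrame({"a": [1, 2], "b": [3, 4]}).to_excel(
    path, sheet_name="Sheet1", index=False, engine="openpyxl"
)

# Hypothetical rows to append.
df = pd.DataFrame({"a": [5, 6], "b": [7, 8]})

wb = load_workbook(path)
ws = wb["Sheet1"]
# Append each DataFrame row (without the header) after the existing data.
for row in df.to_numpy().tolist():
    ws.append(row)
wb.save(path)

result = pd.read_excel(path, sheet_name="Sheet1")
print(result)
```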

Pandas to_excel with sheets being hidden, or efficiently hiding excel tabs?

Creating and saving df:
df = pd.DataFrame({'Attribute1': ['A', 'B'], 'Attribute2': ['X', 'Y']})
filename = 'test.xlsx'
writer = pd.ExcelWriter(filename, engine='xlsxwriter')
with writer:
    df.to_excel(writer, sheet_name='df')
    df.to_excel(writer, sheet_name='2nd tab')
I'd like to add something like hidden=True to a sheet condition but pandas doesn't seem to support this.
We can do this manually with openpyxl, but it's prohibitively slow for large files:
from openpyxl import load_workbook
wb = load_workbook(filename)
wb["df"].sheet_state='hidden'
wb.save(filename)
Clearly the best solution is to not use Excel, but that's the requirement. What's a practical way to do this?
You need to get a handle to the XlsxWriter worksheet object to hide it. Like this:
import pandas as pd
df = pd.DataFrame({'Attribute1': ['A', 'B'],
                   'Attribute2': ['X', 'Y']})
filename = 'test.xlsx'
writer = pd.ExcelWriter(filename, engine='xlsxwriter')
with writer:
    df.to_excel(writer, sheet_name='df')
    df.to_excel(writer, sheet_name='2nd tab')
    # Get a handle to the worksheet.
    worksheet = writer.sheets['2nd tab']
    worksheet.hide()
If you want to hide the first worksheet you need to do a bit of extra work since Excel doesn't allow you to hide the "active" worksheet, which is generally the first worksheet. In that case you also need to "activate" another worksheet. Like this:
with writer:
    df.to_excel(writer, sheet_name='df')
    df.to_excel(writer, sheet_name='2nd tab')
    worksheet = writer.sheets['df']
    worksheet.hide()
    worksheet = writer.sheets['2nd tab']
    worksheet.activate()
For more details, see Working with Python Pandas and XlsxWriter in the XlsxWriter documentation.
Also note, the datetime_format parameter in your example is incorrect. That is a property of the writer class and not the to_excel() method. I left it out of the example above.
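If the openpyxl engine has to be used, the same idea (setting the flag while the writer is still open, rather than reloading the saved file) might be sketched as follows. This is an assumption-based alternative, not part of the XlsxWriter answer; the temporary path is hypothetical, and no claim is made here about how it performs on large files.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

df = pd.DataFrame({"Attribute1": ["A", "B"], "Attribute2": ["X", "Y"]})
filename = os.path.join(tempfile.mkdtemp(), "test.xlsx")  # hypothetical path

with pd.ExcelWriter(filename, engine="openpyxl") as writer:
    df.to_excel(writer, sheet_name="df")
    df.to_excel(writer, sheet_name="2nd tab")
    # With the openpyxl engine, writer.sheets holds openpyxl worksheets,
    # so the visibility can be set before the file is saved.
    writer.sheets["2nd tab"].sheet_state = "hidden"

# Read the states back to confirm which sheet is hidden.
states = {ws.title: ws.sheet_state for ws in load_workbook(filename).worksheets}
print(states)
```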

Creating a Master excel file from dynamic CSV output using Python

I am trying to create a repository "Master" excel file from a CSV which will be generated and overwritten every couple of hours. The code below creates a new excel file and writes the content from "combo1.csv" to "master.xlsx". However, whenever the combo1 file is updated, the code basically overwrites the contents in the "master.xlsx" file. I need to append the contents from "combo1" to "Master" without the headers being inserted every time. Can someone help me with this?
import pandas as pd
writer = pd.ExcelWriter('master.xlsx', engine='xlsxwriter')
df = pd.read_csv('combo1.csv')
df.to_excel(writer, sheet_name='sheetname')
writer.save()
Refer to Append Data at the End of an Excel Sheet section in this medium article:
Using Python Pandas with Excel Sheets
(Credit to Nensi Trambadiya for the article)
Basically you'll have to first read the Excel file and find the number of rows before pushing the new data.
reader = pd.read_excel(r'master.xlsx')
df.to_excel(writer,index=False,header=False,startrow=len(reader)+1)
First read the Excel file, then append the rows as follows:
import pandas as pd
from openpyxl import load_workbook
df = pd.DataFrame({'Name': ['abc','def','xyz','ysv'],
                   'Age': [8,45,32,26]})
# openpyxl is the engine that can work with an existing workbook
writer = pd.ExcelWriter('master.xlsx', engine='openpyxl')
# Assigning writer.book relies on older pandas behaviour; newer releases no longer allow it
writer.book = load_workbook('master.xlsx')
writer.sheets = dict((ws.title, ws) for ws in writer.book.worksheets)
# The number of existing rows determines where the new rows start
reader = pd.read_excel(r'master.xlsx')
df.to_excel(writer,index=False,header=False,startrow=len(reader)+1)
writer.close()
import pandas as pd
from openpyxl import load_workbook
# new dataframe with same columns
df = pd.read_csv('combo.csv')
writer = pd.ExcelWriter('master.xlsx', engine='openpyxl')
# try to open an existing workbook
writer.book = load_workbook('master.xlsx')
# copy existing sheets
writer.sheets = dict((ws.title, ws) for ws in writer.book.worksheets)
# read existing file
reader = pd.read_excel(r'master.xlsx')
# write out the new sheet
df.to_excel(writer, index=False, header=False, startrow=len(reader) + 1)
writer.close()
Note that master.xlsx has to exist before the script is run.
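On pandas 1.4 or later, the same startrow idea can be sketched with append mode and if_sheet_exists='overlay', without assigning writer.book. The helper name `append_rows`, the temporary path and the sample frames are assumptions for illustration only.

```python
import os
import tempfile

import pandas as pd


def append_rows(path, new_rows, sheet_name="Sheet1"):
    """Write new_rows below the existing data in sheet_name (hypothetical helper)."""
    # The existing row count (excluding the header) tells us where to start.
    existing = pd.read_excel(path, sheet_name=sheet_name)
    with pd.ExcelWriter(
        path, engine="openpyxl", mode="a", if_sheet_exists="overlay"
    ) as writer:
        new_rows.to_excel(
            writer,
            sheet_name=sheet_name,
            index=False,
            header=False,
            startrow=len(existing) + 1,  # +1 skips the header row
        )


# Hypothetical master file that already contains two rows.
master = os.path.join(tempfile.mkdtemp(), "master.xlsx")
pd.DataFrame({"Name": ["abc", "def"], "Age": [8, 45]}).to_excel(
    master, index=False, engine="openpyxl"
)

# Stand-in for the freshly generated CSV contents.
append_rows(master, pd.DataFrame({"Name": ["xyz", "ysv"], "Age": [32, 26]}))

combined = pd.read_excel(master)
print(combined)
```

In the original scenario, `pd.read_csv('combo1.csv')` would supply the new rows each time the CSV is regenerated.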

Replacing data in xlsx sheet with pandas dataframe

I have an xlsx file with multiple tabs, one of them being Town_names that already has some data in it.
I'd like to overwrite that data with a dataframe - Town_namesDF - while keeping the rest of the xlsx tabs intact.
I've tried the following:
with pd.ExcelWriter(r'path/to/file.xlsx', engine='openpyxl', mode='a') as writer:
    Town_namesDF.to_excel(writer,sheet_name='Town_names')
    writer.save()
    writer.close()
But it ends up creating a new tab Town_names1 instead of overwriting the Town_names tab. Am I missing something? Thanks.
There is no direct option for overwriting a sheet here (Julia's XLSX, by comparison, has an option for cell_ref), so simply delete the existing sheet if it is present and then write the new one.
with pd.ExcelWriter('/path/to/file.xlsx', engine="openpyxl", mode='a') as writer:
    workBook = writer.book
    try:
        # Remove the existing sheet, if present, before writing the replacement
        workBook.remove(workBook['Town_names'])
    except KeyError:
        print("worksheet doesn't exist")
    finally:
        Town_namesDF.to_excel(writer, sheet_name='Town_names')
You could try this to store all of the other sheets temporarily and then add them back. I don't think this would save any formulas or formatting though.
Store_sheet1=pd.read_excel('path/to/file.xlsx',sheet_name='Sheet1')
Store_sheet2=pd.read_excel('path/to/file.xlsx',sheet_name='Sheet2')
Store_sheet3=pd.read_excel('path/to/file.xlsx',sheet_name='Sheet3')
# mode='w' rewrites the whole file, so every stored sheet has to be written back
with pd.ExcelWriter(r'path/to/file.xlsx', engine='openpyxl', mode='w') as writer:
    Town_namesDF.to_excel(writer,sheet_name='Town_names')
    Store_sheet1.to_excel(writer,sheet_name='Sheet1')
    Store_sheet2.to_excel(writer,sheet_name='Sheet2')
    Store_sheet3.to_excel(writer,sheet_name='Sheet3')
Well, I've managed to do this. This is not a clean solution and not fast at all, but I've made use of openpyxl documentation for working with pandas found here: https://openpyxl.readthedocs.io/en/latest/pandas.html
I'm effectively selecting the Town_names sheet, clearing it with ws.delete_rows() and then appending each row of my dataframe to the sheet.
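import openpyxl
from openpyxl.utils.dataframe import dataframe_to_rows
wb = openpyxl.load_workbook(r'path/to/file.xlsx')
ws = wb['Town_names']
# Rows are 1-based; clear everything currently in the sheet
ws.delete_rows(1, ws.max_row)
wb.save(r'path/to/file.xlsx')
wb = openpyxl.load_workbook(r'path/to/file.xlsx')
activeSheet = wb['Town_names']
# Append the header and every row of the dataframe to the cleared sheet
for r in dataframe_to_rows(Town_namesDF, index=False, header=True):
    activeSheet.append(r)
# Style the first column and the header row the way pandas does
for cell in activeSheet['A'] + activeSheet[1]:
    cell.style = 'Pandas'
wb.save(r'path/to/file.xlsx')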
A bit messy and I hope there's a better solution than mine, but this worked for me.
Since pandas version 1.3.0 there is a new ExcelWriter parameter, if_sheet_exists, which accepts {'error', 'new', 'replace'}. With 'replace', the existing sheet is overwritten:
pd.ExcelWriter(r'path/to/file.xlsx', engine='openpyxl', mode='a', if_sheet_exists='replace')
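A short sketch of if_sheet_exists='replace' in use, assuming pandas 1.3 or later. The temporary workbook, the sheet named Other and the sample town names are made up for illustration.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

# Hypothetical workbook with the sheet to replace plus one unrelated sheet.
path = os.path.join(tempfile.mkdtemp(), "file.xlsx")
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    pd.DataFrame({"town": ["Old"]}).to_excel(writer, sheet_name="Town_names", index=False)
    pd.DataFrame({"v": [1]}).to_excel(writer, sheet_name="Other", index=False)

# Hypothetical replacement data.
Town_namesDF = pd.DataFrame({"town": ["Ayr", "Bath"]})

# Replace the contents of Town_names and leave the other sheet alone.
with pd.ExcelWriter(
    path, engine="openpyxl", mode="a", if_sheet_exists="replace"
) as writer:
    Town_namesDF.to_excel(writer, sheet_name="Town_names", index=False)

sheet_names = load_workbook(path).sheetnames
towns = pd.read_excel(path, sheet_name="Town_names")["town"].tolist()
print(sheet_names, towns)
```

Because the sheet is replaced rather than edited in place, any formatting that was on the old Town_names sheet should not be expected to survive.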
Hi you could use xlwings for that task. Here is an example.
import xlwings as xw
import pandas as pd
filename = "test.xlsx"
df = pd.read_excel(filename, "Town_names")
# Do your modifications of the worksheet here. For example, the following line "df * 2".
df = df * 2
app = xw.App(visible=False)
wb = xw.Book(filename)
ws = wb.sheets["Town_names"]
ws.clear()
ws["A1"].options(pd.DataFrame, header=1, index=False, expand='table').value = df
# If formatting of column names and index is needed as xlsxwriter does it, the following lines will do it.
ws["A1"].expand("right").api.Font.Bold = True
ws["A1"].expand("down").api.Font.Bold = True
ws["A1"].expand("right").api.Borders.Weight = 2
ws["A1"].expand("down").api.Borders.Weight = 2
wb.save(filename)
app.quit()

Formatting integers with comma separator using openpyxl and to_excel

I am writing DataFrames to excel using to_excel(). I need to use openpyxl instead of XlsxWriter, I think, as the writer engine because I need to open existing Excel files and add sheets. Regardless, I'm deep into other formatting using openpyxl so I'm not keen on changing.
This writes the DataFrame, and formats the floats, but I can't figure out how to format the int dtypes.
import pandas as pd
from openpyxl import load_workbook
df = pd.DataFrame({'county':['Cnty1','Cnty2','Cnty3'], 'ints':[5245,70000,4123123], 'floats':[3.212, 4.543, 6.4555]})
fileName = "Maryland - test.xlsx"
book = load_workbook(fileName)
writer = pd.ExcelWriter(fileName, engine='openpyxl')
writer.book = book
df.to_excel(writer, sheet_name='Test', float_format='%.2f', header=False, index=False, startrow=3)
ws = writer.sheets['Test']
writer.save()
writer.close()
Tried using this, but I think it only works with XlsxWriter:
intFormat = book.add_format({'num_format': '#,###'})
ws.set_column('B:B', intFormat)
This type of thing could be used cell-by-cell with a loop, but there's A LOT of data:
ws['B2'].number_format = '#,###'
This can be fixed by setting number_format on the cell, using the constants in openpyxl.styles.numbers:
from openpyxl.styles import numbers
def sth(cell):
    # This will output a number like: 2,000.00
    cell.number_format = numbers.FORMAT_NUMBER_COMMA_SEPARATED1
Check out the docs for further reading.
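openpyxl sets number formats per cell, so a loop over the column is still needed. The sketch below applies the format while the writer is open. The temporary file name is an assumption, and '#,##0' is used instead of '#,###' so that zero values still display as 0.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

df = pd.DataFrame({
    "county": ["Cnty1", "Cnty2", "Cnty3"],
    "ints": [5245, 70000, 4123123],
    "floats": [3.212, 4.543, 6.4555],
})
file_name = os.path.join(tempfile.mkdtemp(), "test.xlsx")  # hypothetical path
start_row = 3  # 0-based row offset passed to to_excel

with pd.ExcelWriter(file_name, engine="openpyxl") as writer:
    df.to_excel(writer, sheet_name="Test", header=False, index=False,
                startrow=start_row)
    ws = writer.sheets["Test"]
    # Column B (index 2) holds the ints; worksheet rows are 1-based.
    for (cell,) in ws.iter_rows(min_row=start_row + 1, max_row=ws.max_row,
                                min_col=2, max_col=2):
        cell.number_format = "#,##0"

# Read the formats back from the saved file.
formats = [c.number_format
           for c in load_workbook(file_name)["Test"]["B"][start_row:]]
print(formats)
```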
