I am writing DataFrames to Excel using to_excel(). I need to use openpyxl instead of XlsxWriter, I think, as the writer engine because I need to open existing Excel files and add sheets. Regardless, I'm deep into other formatting using openpyxl so I'm not keen on changing.
This writes the DataFrame, and formats the floats, but I can't figure out how to format the int dtypes.
import pandas as pd
from openpyxl import load_workbook
df = pd.DataFrame({'county':['Cnty1','Cnty2','Cnty3'], 'ints':[5245,70000,4123123], 'floats':[3.212, 4.543, 6.4555]})
fileName = "Maryland - test.xlsx"
book = load_workbook(fileName)
writer = pd.ExcelWriter(fileName, engine='openpyxl')
writer.book = book
df.to_excel(writer, sheet_name='Test', float_format='%.2f', header=False, index=False, startrow=3)
ws = writer.sheets['Test']
writer.save()
writer.close()
Tried using this, but I think it only works with XlsxWriter:
intFormat = book.add_format({'num_format': '#,###'})
ws.set_column('B:B', intFormat)
This type of thing could be used cell-by-cell with a loop, but there's A LOT of data:
ws['B2'].number_format = '#,###'
This can be fixed by setting number_format on the cell, using the built-in formats from openpyxl.styles:
from openpyxl.styles import numbers
def apply_comma_format(cell):
    # This will output a number like: 2,000.00
    cell.number_format = numbers.FORMAT_NUMBER_COMMA_SEPARATED1
See the openpyxl documentation for further reading.
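A minimal sketch of applying one number format to a whole column by looping over its cells, which is the cell-by-cell approach the question mentions. The helper name format_column and the '#,##0' format string are assumptions for illustration, and the workbook is built in memory rather than loaded from the question's file.

```python
from openpyxl import Workbook


def format_column(ws, column_letter, number_format, first_row=1):
    """Set number_format on every cell of one column, from first_row down."""
    for cell in ws[column_letter]:
        if cell.row >= first_row:
            cell.number_format = number_format


# Stand-in data; in the question this sheet would come from df.to_excel().
wb = Workbook()
ws = wb.active
for row in [("Cnty1", 5245), ("Cnty2", 70000), ("Cnty3", 4123123)]:
    ws.append(row)

# Thousands separator, no decimals, on column B only.
format_column(ws, "B", "#,##0")
```

The loop only touches cells that already exist in the sheet, so it should run after the data has been written.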
According to the openpyxl documentation, it should be possible to change the format of entire columns at once. However, I just can't get it to work. Here is a minimal example, where I try to make the Excel content bold:
import openpyxl
import pandas as pd
import sklearn.datasets
df = sklearn.datasets.load_iris(as_frame=True).data
filename = "iris.xlsx"
wb = openpyxl.Workbook(filename)
ws = wb.create_sheet("main")
for i in range(1, df.shape[0] + 1):
    column_letter = openpyxl.utils.get_column_letter(i)
    ws.column_dimensions[column_letter].font = openpyxl.styles.Font(bold=True)
wb.save(filename)
with pd.ExcelWriter(filename, engine='openpyxl', mode="a", if_sheet_exists="overlay") as writer:
    df.to_excel(writer, "main")
wb.close()
How can I fix this to get a formatted Excel data file? I am using Windows 11 with the latest Excel, Python 3.9, and openpyxl 3.1.0.
I have the following code:
import pandas
data = pandas.DataFrame(dataset)
writer = pandas.ExcelWriter("C:/adhoc/test.xlsx", engine='xlsxwriter')
data.to_excel(writer, sheet_name='Test')
writer.save()
I have two sheets, Sheet1 and Test. When I run the code it is deleting Sheet1 and just writing the data onto Test. What am I doing wrong here? Expected output I want is to not write anything on Sheet1 and have the data written to Test.
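You need to use append as the file mode in the ExcelWriter. However, append mode is not supported by the xlsxwriter engine.
To append, you need to specify the engine as openpyxl.
This will write the data to the Test sheet and leave Sheet1 as it is. (If a Test sheet already exists, pandas 1.3 and later also need if_sheet_exists set, for example to 'replace'; otherwise a ValueError is raised.)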
import pandas
file_path = "C:/adhoc/test.xlsx"
data = pandas.DataFrame(dataset)
writer = pandas.ExcelWriter(file_path, engine='openpyxl', mode='a')
data.to_excel(writer, sheet_name='Test')
writer.close()  # saves the file; writer.save() was removed in pandas 2.0
Alternatively, you can use context manager here:
import pandas
file_path = "C:/adhoc/test.xlsx"
data = pandas.DataFrame(dataset)
with pandas.ExcelWriter(file_path, engine='openpyxl', mode='a') as writer:
    data.to_excel(writer, sheet_name='Test')
I have to insert data into Excel with borders, and all values in the DataFrame should be centered. I tried applying formatting to the cells, but it does not work.
df1.to_excel(writer,index=False,header=True,startrow=12,sheet_name='Sheet1')
writer.close()
writer=pd.ExcelWriter(s, engine="xlsxwriter")
writer.book = load_workbook(s)
workbooks= writer.book
worksheet = workbooks['Sheet1']
f1= workbooks.add_format()
worksheet.conditional_format(12,0,len(df1)+1,7,{'format':f1})
Can you please help me with this?
I'm not going to lie: I've done this for the first time right now, so this might not be a very good solution. I'm using openpyxl because it seems more flexible to me than XlsxWriter. I hope you can use it too.
My assumption is that the variable file_name contains a valid file name.
First your Pandas step:
with pd.ExcelWriter(file_name, engine='xlsxwriter') as writer:
    df1.to_excel(writer, index=False, header=True, startrow=12, sheet_name='Sheet1')
Then the necessary imports from openpyxl:
from openpyxl import load_workbook
from openpyxl.styles import NamedStyle, Alignment, Border, Side
Loading the workbook and selecting the worksheet:
wb = load_workbook(file_name)
ws = wb['Sheet1']
Defining the required style:
centered_with_frame = NamedStyle('centered_with_frame')
centered_with_frame.alignment = Alignment(horizontal='center')
bd = Side(style='thin')
centered_with_frame.border = Border(left=bd, top=bd, right=bd, bottom=bd)
Selecting the relevant cells:
cells = ws[ws.cell(row=12+1, column=1).coordinate:
           ws.cell(row=12+1+df1.shape[0], column=df1.shape[1]).coordinate]
Applying the defined style to the selected cells:
for row in cells:
    for cell in row:
        cell.style = centered_with_frame
Finally saving the workbook:
wb.save(file_name)
As I said: This might not be optimal.
I have an xlsx file with multiple tabs, one of them being Town_names that already has some data in it.
I'd like to overwrite that data with a dataframe - Town_namesDF - while keeping the rest of the xlsx tabs intact.
I've tried the following:
with pd.ExcelWriter(r'path/to/file.xlsx', engine='openpyxl', mode='a') as writer:
    Town_namesDF.to_excel(writer,sheet_name='Town_names')
writer.save()
writer.close()
But it ends up creating a new tab Town_names1 instead of overwriting the Town_names tab. Am I missing something? Thanks.
There is no direct option for overwriting a sheet (Julia's XLSX, by contrast, has a cell_ref option). Simply delete the existing sheet if it exists and then write.
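with pd.ExcelWriter('/path/to/file.xlsx', engine="openpyxl", mode='a') as writer:
    workBook = writer.book
    try:
        # Remove the existing sheet so the new data is not written to Town_names1
        workBook.remove(workBook['Town_names'])
    except KeyError:
        print("worksheet doesn't exist")
    finally:
        df.to_excel(writer, sheet_name='Town_names')
    # No writer.save() needed: the with block saves the file on exit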
You could try this to store all of the other sheets temporarily and then add them back. I don't think this would save any formulas or formatting though.
Store_sheet1 = pd.read_excel('path/to/file.xlsx', sheet_name='Sheet1')
Store_sheet2 = pd.read_excel('path/to/file.xlsx', sheet_name='Sheet2')
Store_sheet3 = pd.read_excel('path/to/file.xlsx', sheet_name='Sheet3')
# Write mode recreates the file, so every stored sheet is written back
with pd.ExcelWriter(r'path/to/file.xlsx', engine='openpyxl', mode='w') as writer:
    Town_namesDF.to_excel(writer, sheet_name='Town_names')
    Store_sheet1.to_excel(writer, sheet_name='Sheet1')
    Store_sheet2.to_excel(writer, sheet_name='Sheet2')
    Store_sheet3.to_excel(writer, sheet_name='Sheet3')
Well, I've managed to do this. This is not a clean solution and not fast at all, but I've made use of openpyxl documentation for working with pandas found here: https://openpyxl.readthedocs.io/en/latest/pandas.html
I'm effectively selecting the Town_names sheet, clearing it with ws.delete_rows() and then appending each row of my dataframe to the sheet.
import openpyxl
from openpyxl.utils.dataframe import dataframe_to_rows
wb = openpyxl.load_workbook(r'path/to/file.xlsx')
ws = wb['Town_names']
# Clear all existing rows (openpyxl rows are 1-indexed)
ws.delete_rows(1, ws.max_row)
wb.save(r'path/to/file.xlsx')
wb = openpyxl.load_workbook(r'path/to/file.xlsx')
activeSheet = wb['Town_names']
for r in dataframe_to_rows(Town_namesDF, index=False, header=True):
    activeSheet.append(r)
# Apply the built-in 'Pandas' style to the first column and the header row
for cell in activeSheet['A'] + activeSheet[1]:
    cell.style = 'Pandas'
wb.save(r'path/to/file.xlsx')
A bit messy and I hope there's a better solution than mine, but this worked for me.
Since pandas version 1.3.0 there is a new ExcelWriter parameter, if_sheet_exists, which accepts:
{'error', 'new', 'replace'} (pandas 1.4 later added 'overlay')
pd.ExcelWriter(r'path/to/file.xlsx', engine='openpyxl', mode='a', if_sheet_exists='replace')
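A short sketch of how this parameter could be used end to end, assuming pandas 1.3 or later with openpyxl installed. The temporary file, the sheet named Other, and the tiny DataFrames are made up for illustration; they stand in for the real workbook and Town_namesDF.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

# Hypothetical scratch file standing in for the existing workbook.
path = os.path.join(tempfile.mkdtemp(), "towns.xlsx")

# Create a workbook with two sheets, as if it already existed.
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    pd.DataFrame({"town": ["Old"]}).to_excel(writer, sheet_name="Town_names", index=False)
    pd.DataFrame({"x": [1]}).to_excel(writer, sheet_name="Other", index=False)

# Append mode keeps the other sheets; 'replace' swaps out only Town_names.
new_df = pd.DataFrame({"town": ["A", "B"]})
with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                    if_sheet_exists="replace") as writer:
    new_df.to_excel(writer, sheet_name="Town_names", index=False)

sheet_names = load_workbook(path).sheetnames
towns = pd.read_excel(path, sheet_name="Town_names")["town"].tolist()
```

With 'replace' the old sheet is dropped and rewritten, so any formatting that sheet had is not expected to survive; the other sheets are left alone.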
Hi you could use xlwings for that task. Here is an example.
import xlwings as xw
import pandas as pd
filename = "test.xlsx"
df = pd.read_excel(filename, "Town_names")
# Do your modifications of the worksheet here. For example, the following line "df * 2".
df = df * 2
app = xw.App(visible=False)
wb = xw.Book(filename)
ws = wb.sheets["Town_names"]
ws.clear()
ws["A1"].options(pd.DataFrame, header=1, index=False, expand='table').value = df
# If formatting of column names and index is needed as xlsxwriter does it, the following lines will do it.
ws["A1"].expand("right").api.Font.Bold = True
ws["A1"].expand("down").api.Font.Bold = True
ws["A1"].expand("right").api.Borders.Weight = 2
ws["A1"].expand("down").api.Borders.Weight = 2
wb.save(filename)
app.quit()
I'm in the midst of writing an IPython notebook that will pull the contents of a .csv file and paste them into a specified tab of an .xlsx file. The tab in the .xlsx is filled with a bunch of pre-programmed formulas so that I can run an analysis on the original content of the .csv file.
I've run into a snag, however, with the date fields that I copy over from the .csv into the .xlsx file.
The dates do not get properly processed by the Excel formulas unless I double-click the date cells or apply Excel's "text to columns" function on the column of dates and set a tab as the delimiter (which I should note, does not split the cell).
I'm wondering if there's a way to either...
write a helper function that logs the keystrokes of applying the "text to columns" function call
write a helper function to double click and return down each row of the column of dates
from openpyxl import load_workbook
import pandas as pd
def transfer_hours(report_name, ER_hours_analysis_wb):
    df = pd.read_csv(report_name, index_col=0)
    book = load_workbook(ER_hours_analysis_wb)
    sheet_name = "ER Work Log"
    with pd.ExcelWriter("ER Hours Analysis 248112.xlsx",
                        engine='openpyxl') as writer:
        writer.book = book
        writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
        df.to_excel(writer, sheet_name=sheet_name,
                    startrow=1, startcol=0, engine='openpyxl')
Use load_workbook from openpyxl:
from openpyxl import load_workbook
wb = load_workbook(filename=filePath, read_only=False, data_only=False)
Setting data_only=False returns the formulas, whereas data_only=True returns the values that were last calculated and stored in the file instead of the formulas.
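A small sketch of the difference between the two data_only settings, using an in-memory workbook instead of a real file (the cell contents are made up for illustration). One caveat worth hedging: a file written by openpyxl and never opened in Excel has no stored results for its formulas, so the data_only=True read in this sketch would typically come back empty.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook

# Build a tiny workbook with one value and one formula.
wb = Workbook()
ws = wb.active
ws["A1"] = 21
ws["A2"] = "=A1*2"

buffer = BytesIO()
wb.save(buffer)

# data_only=False (the default): formula cells return the formula text.
buffer.seek(0)
formula_view = load_workbook(buffer, data_only=False)
formula = formula_view.active["A2"].value

# data_only=True: formula cells return whatever result was stored in the file.
buffer.seek(0)
value_view = load_workbook(buffer, data_only=True)
cached = value_view.active["A2"].value
```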
As great a tool as pandas is designed to be, in this case there may not be a reason to include it.
Here is a shorter structure for what you're trying to accomplish:
import csv
import datetime
from openpyxl import load_workbook
def transfer_hours(report_name, ER_hours_analysis_wb):
    wb = load_workbook(ER_hours_analysis_wb)
    ws = wb['ER Work Log']
    with open(report_name, 'rt', newline='') as csvfile:
        reader = csv.reader(csvfile, delimiter=',')
        # openpyxl rows and columns are 1-indexed
        for rownum, row in enumerate(reader, start=1):
            for colnum, col in enumerate(row, start=1):
                # Assumes every field is a date in MM/DD/YYYY form
                dttm = datetime.datetime.strptime(col, "%m/%d/%Y")
                ws.cell(column=colnum, row=rownum).value = dttm
    wb.save('new_spreadsheet.xlsx')
What you'll be able to do from here is break out which columns should have what format based on the position in the csv. Here is an example:
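for rownum, row in enumerate(reader, start=1):
    # First CSV field goes into column A unchanged
    ws.cell(column=1, row=rownum, value=row[0])
    # Second CSV field is parsed as a date and goes into column B
    dttm = datetime.datetime.strptime(row[1], "%m/%d/%Y")
    ws.cell(column=2, row=rownum).value = dttm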
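When it is not known in advance which columns hold dates, one possible approach is to try parsing each field and fall back to the original text. The helper name parse_field and the sample row are assumptions for illustration, and the sketch assumes dates arrive as MM/DD/YYYY, as in the snippets above.

```python
import datetime


def parse_field(text, date_format="%m/%d/%Y"):
    """Return a datetime if text matches date_format, otherwise the original string."""
    try:
        return datetime.datetime.strptime(text, date_format)
    except ValueError:
        return text


# Made-up CSV row: a name, a date, and a number of hours.
row = ["Smith", "03/15/2021", "7.5"]
converted = [parse_field(field) for field in row]
```

Each converted value can then be written with ws.cell(...), so that date fields reach the workbook as datetime objects rather than text.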
For reference:
https://openpyxl.readthedocs.io/en/stable/usage.html
In Python, how do I read a file line-by-line into a list?
How to format columns with headers using OpenPyXL