I found two different methods for writing DataFrames and formulas to an Excel file.
import pandas as pd
import numpy as np
import xlsxwriter
# Method 1
writer = pd.ExcelWriter('example.xlsx', engine='xlsxwriter')
A = pd.DataFrame(np.array([[1,2,3],[4,5,6],[7,8,9]]))
A.to_excel(writer , sheet_name='Sheet1')
writer.save()
# Method 2
workbook = xlsxwriter.Workbook('example.xlsx')
worksheet = workbook.add_worksheet()
worksheet.write_formula('B5' , '=_xlfn.STDEV.S(B3:B5)')
workbook.close()
Method 1 works for adding the DataFrame, Method 2 for adding the formula. Problem: Method 2 deletes what Method 1 wrote to the file. How can I combine them?
Here is one way to do it if you want to combine the two actions into one:
import pandas as pd
import numpy as np
import xlsxwriter
# Method 1
writer = pd.ExcelWriter('example.xlsx', engine='xlsxwriter')
A = pd.DataFrame(np.array([[1,2,3],[4,5,6],[7,8,9]]))
A.to_excel(writer , sheet_name='Sheet1')
# Get the xlsxwriter objects from the dataframe writer object.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Write the formula.
worksheet.write_formula('B5' , '=_xlfn.STDEV.S(B2:B4)')
# Or Create a new worksheet and add the formula there.
worksheet = workbook.add_worksheet()
worksheet.write_formula('B5' , '=_xlfn.STDEV.S(Sheet1!B2:B4)')
# close() saves the file; writer.save() is deprecated since pandas 1.5.
writer.close()
See the Working with Python Pandas and XlsxWriter section of the XlsxWriter docs.
Note: I corrected the formula's range (B2:B4 instead of B3:B5) to avoid a circular reference.
I have DataFrames as output and need to export them to an Excel file. I can use pandas for this, but the worksheet must be laid out right to left. I searched but found no way to change the direction with pandas. I did find that the xlsxwriter package can do it:
import xlsxwriter
workbook = xlsxwriter.Workbook('output.xlsx')
worksheet1 = workbook.add_worksheet()
format_right_to_left = workbook.add_format({'reading_order': 2})
worksheet1.set_column('A:A', 20)
worksheet1.right_to_left()
worksheet1.write(new_df)
workbook.close()
However, I don't know how to export the DataFrame using this approach.
Also, I currently set up the cell format with several separate calls:
myformat = workbook.add_format()
myformat.set_reading_order(2)
myformat.set_align('center')
myformat.set_align('vcenter')
Is it possible to shorten these lines, for example by passing a dictionary?
You can do this:
import pandas as pd
import xlsxwriter
writer = pd.ExcelWriter('pandas_excel.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1') # Assuming you already have a `df`
workbook = writer.book
worksheet = writer.sheets['Sheet1']
format_right_to_left = workbook.add_format({'reading_order': 2})
# Display the worksheet from right to left.
worksheet.right_to_left()
# close() saves the file; writer.save() is deprecated since pandas 1.5.
writer.close()
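On the second part of the question (shortening the format setup), here is a minimal sketch, assuming a small made-up DataFrame and a temporary output path. XlsxWriter's add_format() accepts a dictionary of properties, so the separate set_reading_order() and set_align() calls can be collapsed into one call. Applying the format through set_column() is one possible choice, not the only one; cells that pandas writes with their own format (such as the header and index) keep that format.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data; any DataFrame works here.
df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})

# Temporary location used only for this sketch.
out_path = os.path.join(tempfile.mkdtemp(), "pandas_excel_rtl.xlsx")

with pd.ExcelWriter(out_path, engine="xlsxwriter") as writer:
    df.to_excel(writer, sheet_name="Sheet1")
    workbook = writer.book
    worksheet = writer.sheets["Sheet1"]

    # One dictionary replaces the separate set_reading_order()/set_align() calls.
    myformat = workbook.add_format(
        {"reading_order": 2, "align": "center", "valign": "vcenter"}
    )

    # Lay the sheet out from right to left.
    worksheet.right_to_left()
    # Use the format as the column default for A:C, with a width of 20.
    worksheet.set_column("A:C", 20, myformat)
```

Leaving the with block closes the writer and saves the file, so no explicit save call is needed.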
I'm using to_excel to write multiple DataFrames to multiple Excel documents. This works fine except that the index of the Dataframes is appended in bold with a border around each cell (see image).
The following code is a simplification of the code I use but has the same problem:
import pandas as pd
from openpyxl import load_workbook
df = pd.DataFrame(np.random.randint(50,60, size=(20, 3)))
xls_loc = r'test_doc.xlsx'
wb = load_workbook(xls_loc)
writer = pd.ExcelWriter(xls_loc, engine='openpyxl')
writer.book = wb
df.to_excel(writer, sheet_name='test sheet',index=True,startrow=1,startcol=1, header=False)
writer.save()
writer.close()
Is there a way to write the index without making it bold and adding borders?
Make the index a regular column, then pass index=False to to_excel():
df.insert(0, 'index', df.index)
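Put together, the suggestion might look like the sketch below. It writes to a fresh temporary file rather than the existing test_doc.xlsx from the question (an assumption made to keep the example self-contained), then reads the file back with openpyxl to check that the former index cell is not bold.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from openpyxl import load_workbook

df = pd.DataFrame(np.random.randint(50, 60, size=(20, 3)))

# Copy the index into an ordinary column so it is written like any other data.
df.insert(0, "index", df.index)

# Temporary location used only for this sketch.
out_path = os.path.join(tempfile.mkdtemp(), "test_doc.xlsx")
df.to_excel(out_path, sheet_name="test sheet", index=False, header=False,
            startrow=1, startcol=1)

# Read the file back: with startrow=1 and startcol=1 the first value lands in B2.
ws = load_workbook(out_path)["test sheet"]
first_cell = ws.cell(row=2, column=2)
print(first_cell.value, first_cell.font.bold)
```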
You could insert the DataFrame using xlwings to avoid the formatting:
import pandas as pd
import xlwings as xw
# Any DataFrame works here; the private pd._testing helpers may not exist in recent pandas.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
with xw.App(visible=False) as app:
    wb = xw.Book()
    wb.sheets[0]["A1"].value = df
    wb.save("test.xlsx")
    wb.close()
import pandas as pd
data = [11,12,13,14,15]
df = pd.DataFrame(data)
wb = pd.ExcelWriter('FileName.xlsx', engine='xlsxwriter')
df.style.set_properties(**{'text-align': 'center'}).to_excel(wb, sheet_name='sheet_01',index=False,header=None)
wb.close()
The main trick is passing index=False and header=None to to_excel().
I am writing DataFrames to excel using to_excel(). I need to use openpyxl instead of XlsxWriter, I think, as the writer engine because I need to open existing Excel files and add sheets. Regardless, I'm deep into other formatting using openpyxl so I'm not keen on changing.
This writes the DataFrame, and formats the floats, but I can't figure out how to format the int dtypes.
import pandas as pd
from openpyxl import load_workbook
df = pd.DataFrame({'county':['Cnty1','Cnty2','Cnty3'], 'ints':[5245,70000,4123123], 'floats':[3.212, 4.543, 6.4555]})
fileName = "Maryland - test.xlsx"
book = load_workbook(fileName)
writer = pd.ExcelWriter(fileName, engine='openpyxl')
writer.book = book
df.to_excel(writer, sheet_name='Test', float_format='%.2f', header=False, index=False, startrow=3)
ws = writer.sheets['Test']
writer.save()
writer.close()
Tried using this, but I think it only works with XlsxWriter:
intFormat = book.add_format({'num_format': '#,###'})
ws.set_column('B:B', intFormat)
This type of thing could be used cell-by-cell with a loop, but there's A LOT of data:
ws['B2'].number_format = '#,###'
This can be done with number_format, using the built-in formats in openpyxl.styles.numbers:
from openpyxl.styles import numbers
# Apply the format to every cell in column B.
# FORMAT_NUMBER_COMMA_SEPARATED1 displays a number like 2,000.00
for cell in ws['B']:
    cell.number_format = numbers.FORMAT_NUMBER_COMMA_SEPARATED1
See the openpyxl documentation for further reading.
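For the integer column specifically, a rough end-to-end sketch follows. It assumes a fresh temporary file instead of the existing "Maryland - test.xlsx", writes the header in row 1, and uses the custom format string '#,##0' (thousands separator, no decimals) in place of '#,###'. As far as I know openpyxl sets number formats cell by cell, so the loop over the column is still needed, but iter_rows() keeps it short.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

df = pd.DataFrame({
    "county": ["Cnty1", "Cnty2", "Cnty3"],
    "ints": [5245, 70000, 4123123],
    "floats": [3.212, 4.543, 6.4555],
})

# Temporary location used only for this sketch.
out_path = os.path.join(tempfile.mkdtemp(), "ints_formatted.xlsx")
df.to_excel(out_path, sheet_name="Test", index=False)

book = load_workbook(out_path)
ws = book["Test"]

# Column B holds the integers; row 1 is the header, so start at row 2.
for (cell,) in ws.iter_rows(min_row=2, min_col=2, max_col=2):
    cell.number_format = "#,##0"

book.save(out_path)
```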
I'm trying to print a green, bold font into an excel spreadsheet.
I can print this in jupyter notebook without a problem, but this is what I get in the spreadsheet: [1m[92mHello
import pandas as pd
import numpy as np
writer = pd.ExcelWriter('out.xlsx')
pd.DataFrame([1,'\033[1m' + '\033[92m'+ 'Hello',3]).to_excel(writer, sheet_name= 'sheet1')
writer.save()
ANSI escape codes are only interpreted by terminals and notebooks, so Excel shows them as literal text. Instead, you can use the workbook and worksheet objects exposed by ExcelWriter to apply the formatting:
import pandas as pd
import numpy as np
writer = pd.ExcelWriter('out.xlsx', engine='xlsxwriter')
pd.DataFrame([1,'Hello',3]).to_excel(writer, sheet_name= 'sheet1')
worksheet = writer.sheets['sheet1']
workbook = writer.book
cell_format = workbook.add_format({'bold':True, 'font_color': 'green'})
worksheet.set_row(2,None,cell_format)
# close() saves the file; writer.save() is deprecated since pandas 1.5.
writer.close()
Documentation of ExcelWriter
Also, if you are trying to change the format of the header, you have to reset the header style first. Put the following before defining the writer:
pd.io.formats.excel.header_style = None
pandas cannot achieve that.
You can use openpyxl to do so.
So what you have to do is export your data into excel using pandas, and then load the workbook using openpyxl and handle the coloring and other visualisation aspects from there.
from openpyxl import load_workbook
from openpyxl.styles import Font
wb = load_workbook('yourworkbookname.xlsx')
ws = wb.active
a1 = ws['A1']
d4 = ws['D4']
# aRGB hex for green; the named constant colors.GREEN may be unavailable in recent openpyxl releases
ft = Font(color='0000FF00', bold=True)
a1.font = ft
d4.font = ft
# save() requires the target filename
wb.save('yourworkbookname.xlsx')
For more details, see the openpyxl documentation.
I have been looking for a way to write a DataFrame into an existing Excel sheet starting at a specific row and column. Methods like dataframe_to_rows do not write from a specific cell position.
I am currently using a custom loop and was wondering whether there is a better approach.
The loop works like this:
import pandas as pd
import numpy as np
from openpyxl import load_workbook
from openpyxl.utils import get_column_letter
df = pd.DataFrame(np.random.randn(20, 4), columns=list('ABCD'))
file = "C:\\somepath\\some_existing_file.xlsx"
wb = load_workbook(filename=file, read_only=False)
ws = wb['some_existing_sheet']
##Fill up the row and column needed
stcol = 5
strow = 5
## Writing the column header
for c in range(0,len(df.columns)):
    ws[get_column_letter(c+stcol)+str(strow)].value = df.columns[c]
## Writing the data
for r in range(0,len(df)):
    for c in range(0,len(df.columns)):
        ws[get_column_letter(c+stcol)+str(strow+r+1)].value = df.iloc[r, c]
wb.save(file)
Please let me know if there is a better way to write to a specific position in a sheet. If this turns out to be a duplicate question, I am happy to have it merged into the original thread.
I do have another approach using XlsxWriter, but it removes all other data from the existing sheet:
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application') # opens Excel
writer = pd.ExcelWriter(file, engine='xlsxwriter')
df.to_excel(writer, sheet_name='abc', startrow=5, startcol=5,index=False)
writer.save()
Instead of
ws[get_column_letter(c+stcol)+str(strow)]
you can use
ws.cell(column=c+stcol, row=strow)
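Building on ws.cell(), one way to avoid the hand-written index arithmetic is to combine it with openpyxl's dataframe_to_rows and offset each row and column. This is only a sketch: write_df_at is a hypothetical helper name, and a fresh in-memory workbook stands in for the existing file so the example runs on its own.

```python
import numpy as np
import pandas as pd
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows


def write_df_at(ws, df, start_row, start_col):
    """Write df (header plus values) with its top-left corner at the given cell."""
    rows = dataframe_to_rows(df, index=False, header=True)
    for r_offset, row in enumerate(rows):
        for c_offset, value in enumerate(row):
            ws.cell(row=start_row + r_offset,
                    column=start_col + c_offset,
                    value=value)


# A fresh in-memory workbook stands in for the existing file here.
wb = Workbook()
ws = wb.active
df = pd.DataFrame(np.arange(8).reshape(2, 4), columns=list("ABCD"))

# The header goes into row 5 starting at column E; the data follows below it.
write_df_at(ws, df, start_row=5, start_col=5)
```

With an existing file, the same helper could be called on a sheet obtained from load_workbook(), followed by wb.save(file); cells outside the written range are left untouched.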