I have dataframes as output and I need to export them to an Excel file. I can use pandas for the task, but I need the worksheet to be laid out right to left. I have searched and didn't find any way to change the direction with pandas alone. I did find that the xlsxwriter package can do it:
import xlsxwriter
workbook = xlsxwriter.Workbook('output.xlsx')
worksheet1 = workbook.add_worksheet()
format_right_to_left = workbook.add_format({'reading_order': 2})
worksheet1.set_column('A:A', 20)
worksheet1.right_to_left()
worksheet1.write(new_df)
workbook.close()
But I don't know how to export the dataframe using this approach.
Snapshot to clarify the direction:
Edit: I used several lines to set up the format:
myformat = workbook.add_format()
myformat.set_reading_order(2)
myformat.set_align('center')
myformat.set_align('vcenter')
Is it possible to shorten these lines, for example by using a dictionary?
You can do this:
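import pandas as pd
writer = pd.ExcelWriter('pandas_excel.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1') # Assuming you already have a `df`
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Optional: a right-to-left cell format to pass to write() or set_column()
format_right_to_left = workbook.add_format({'reading_order': 2})
# Switch the whole worksheet to right-to-left layout
worksheet.right_to_left()
writer.close() # use writer.save() on pandas versions before 2.0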
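On the follow-up question about shortening the format setup: as far as I know, `add_format()` accepts all of those properties in a single dictionary. Here is a minimal sketch, assuming a throwaway output path and a sample cell value that are not from the question. Note that vertical alignment goes under the `valign` key rather than a second `align` key.

```python
import os
import tempfile

import xlsxwriter

# Hypothetical output path, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "rtl_format_demo.xlsx")

workbook = xlsxwriter.Workbook(path)
worksheet = workbook.add_worksheet()
worksheet.right_to_left()

# One dictionary replaces the three set_*() calls from the question.
# Horizontal and vertical alignment use separate keys: 'align' and 'valign'.
myformat = workbook.add_format({
    "reading_order": 2,
    "align": "center",
    "valign": "vcenter",
})

# Sample value, just to show the format being applied to a cell.
worksheet.write("A1", "sample text", myformat)
workbook.close()
```

The same dictionary style should work on the `workbook` object obtained from `writer.book` in the pandas answer above.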
Related
I have to insert a dataframe into Excel with borders, and all values in the dataframe should be centered. I tried applying formatting to the cells but it does not work:
df1.to_excel(writer,index=False,header=True,startrow=12,sheet_name='Sheet1')
writer.close()
writer=pd.ExcelWriter(s, engine="xlsxwriter")
writer.book = load_workbook(s)
workbooks= writer.book
worksheet = workbooks['Sheet1']
f1= workbooks.add_format()
worksheet.conditional_format(12,0,len(df1)+1,7,{'format':f1})
Can you please help me with this?
I'm not going to lie: I've done this for the first time right now, so this might not be a very good solution. I'm using openpyxl because it seems more flexible to me than XlsxWriter. I hope you can use it too.
My assumption is that the variable file_name contains a valid file name.
First your Pandas step:
with pd.ExcelWriter(file_name, engine='xlsxwriter') as writer:
    df1.to_excel(writer, index=False, header=True, startrow=12, sheet_name='Sheet1')
Then the necessary imports from openpyxl:
from openpyxl import load_workbook
from openpyxl.styles import NamedStyle, Alignment, Border, Side
Loading the workbook and selecting the worksheet:
wb = load_workbook(file_name)
ws = wb['Sheet1']
Defining the required style:
centered_with_frame = NamedStyle('centered_with_frame')
centered_with_frame.alignment = Alignment(horizontal='center')
bd = Side(style='thin')
centered_with_frame.border = Border(left=bd, top=bd, right=bd, bottom=bd)
Selecting the relevant cells:
cells = ws[ws.cell(row=12+1, column=1).coordinate:
           ws.cell(row=12+1+df1.shape[0], column=df1.shape[1]).coordinate]
Applying the defined style to the selected cells:
for row in cells:
    for cell in row:
        cell.style = centered_with_frame
Finally saving the workbook:
wb.save(file_name)
As I said: This might not be optimal.
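For what it's worth, the select-and-style steps above can also be wrapped in a small helper. This is only a sketch: `center_with_border` is a hypothetical name, and the tiny worksheet built here stands in for the file that pandas wrote. It sets the alignment and border directly on each cell instead of going through a `NamedStyle`.

```python
from openpyxl import Workbook
from openpyxl.styles import Alignment, Border, Side


def center_with_border(ws, min_row, max_row, min_col, max_col):
    """Center every cell in the given range and give it a thin border."""
    thin = Side(style="thin")
    for row in ws.iter_rows(min_row=min_row, max_row=max_row,
                            min_col=min_col, max_col=max_col):
        for cell in row:
            cell.alignment = Alignment(horizontal="center")
            cell.border = Border(left=thin, top=thin, right=thin, bottom=thin)


# Small in-memory worksheet standing in for the sheet written by pandas.
wb = Workbook()
ws = wb.active
ws.append(["name", "value"])
ws.append(["a", 1])
ws.append(["b", 2])

# Rows and columns are 1-based in openpyxl.
center_with_border(ws, min_row=1, max_row=3, min_col=1, max_col=2)
```

With the question's layout (`startrow=12`), the call would presumably use `min_row=13` and `max_row=13 + df1.shape[0]`, matching the range selected above.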
I found two different methods to write dataframes and formulas to an Excel file.
import pandas as pd
import numpy as np
import xlsxwriter
# Method 1
writer = pd.ExcelWriter('example.xlsx', engine='xlsxwriter')
A = pd.DataFrame(np.array([[1,2,3],[4,5,6],[7,8,9]]))
A.to_excel(writer , sheet_name='Sheet1')
writer.close() # writer.save() on pandas versions before 2.0
# Method 2
workbook = xlsxwriter.Workbook('example.xlsx')
worksheet = workbook.add_worksheet()
worksheet.write_formula('B5' , '=_xlfn.STDEV.S(B3:B5)')
workbook.close()
One works for adding the dataframe the other for the formula. Problem: Method 2 deletes what was written to the file with Method 1. How can I combine them?
Here is one way to do it if you want to combine the two actions into one:
import pandas as pd
import numpy as np
import xlsxwriter
# Method 1
writer = pd.ExcelWriter('example.xlsx', engine='xlsxwriter')
A = pd.DataFrame(np.array([[1,2,3],[4,5,6],[7,8,9]]))
A.to_excel(writer , sheet_name='Sheet1')
# Get the xlsxwriter objects from the dataframe writer object.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Write the formula.
worksheet.write_formula('B5' , '=_xlfn.STDEV.S(B2:B4)')
# Or Create a new worksheet and add the formula there.
worksheet = workbook.add_worksheet()
worksheet.write_formula('B5' , '=_xlfn.STDEV.S(Sheet1!B2:B4)')
writer.close() # writer.save() on pandas versions before 2.0
See the Working with Python Pandas and XlsxWriter section of the XlsxWriter docs.
Note, I corrected the range of the formula to avoid a circular reference.
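If you would rather not hard-code the range, it can be derived from the shape of the dataframe with XlsxWriter's `xl_range` utility. A minimal sketch, assuming the default `to_excel()` layout with the header in the first row and the index in the first column:

```python
from xlsxwriter.utility import xl_range

n_rows = 3  # number of rows in the DataFrame A from the example

# to_excel() writes the header in row 0 and the index in column 0 (zero-based),
# so the first data column occupies rows 1..n_rows of column 1.
first_col_range = xl_range(1, 1, n_rows, 1)  # "B2:B4" for a 3-row frame

formula = f"=_xlfn.STDEV.S({first_col_range})"
print(formula)
```

The resulting string can then be passed to `worksheet.write_formula()` in any cell outside that range, which keeps the circular reference from coming back.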
I am writing DataFrames to excel using to_excel(). I need to use openpyxl instead of XlsxWriter, I think, as the writer engine because I need to open existing Excel files and add sheets. Regardless, I'm deep into other formatting using openpyxl so I'm not keen on changing.
This writes the DataFrame, and formats the floats, but I can't figure out how to format the int dtypes.
import pandas as pd
from openpyxl import load_workbook
df = pd.DataFrame({'county':['Cnty1','Cnty2','Cnty3'], 'ints':[5245,70000,4123123], 'floats':[3.212, 4.543, 6.4555]})
fileName = "Maryland - test.xlsx"
book = load_workbook(fileName)
writer = pd.ExcelWriter(fileName, engine='openpyxl')
writer.book = book
df.to_excel(writer, sheet_name='Test', float_format='%.2f', header=False, index=False, startrow=3)
ws = writer.sheets['Test']
writer.close() # close() also saves the file, so a separate save() call is not needed
Tried using this, but I think it only works with XlsxWriter:
intFormat = book.add_format({'num_format': '#,###'})
ws.set_column('B:B', intFormat)
This type of thing could be used cell-by-cell with a loop, but there's A LOT of data:
ws['B2'].number_format = '#,###'
This can be fixed by using number_format together with the constants in openpyxl.styles.numbers:
from openpyxl.styles import numbers
def format_with_commas(cell):
    # This will output a number like: 2,000.00
    cell.number_format = numbers.FORMAT_NUMBER_COMMA_SEPARATED1
Check out the openpyxl docs for further reading.
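As far as I can tell, openpyxl sets number formats per cell, so a loop is still needed, but it can be a single pass over the column. Here is a minimal sketch on an in-memory workbook that mirrors the question's data; the `'#,##0'` format string is my choice for integers with thousands separators.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws.append(["county", "ints", "floats"])
ws.append(["Cnty1", 5245, 3.212])
ws.append(["Cnty2", 70000, 4.543])
ws.append(["Cnty3", 4123123, 6.4555])

# Walk column B once, skipping the header row.
# iter_rows() yields one tuple per row; with a single column it is a 1-tuple.
for (cell,) in ws.iter_rows(min_row=2, min_col=2, max_col=2):
    cell.number_format = "#,##0"
```

In the question's setup the same loop would run on `writer.sheets['Test']` after `to_excel()`, with `min_row` adjusted for `startrow=3`.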
I'm in the midst of writing an IPython notebook that will pull the contents of a .csv file and paste them into a specified tab of an .xlsx file. The tab in the .xlsx is filled with a bunch of pre-programmed formulas so that I can run an analysis on the original content of the .csv file.
I've run into a snag, however, with the date fields that I copy over from the .csv into the .xlsx file.
The dates do not get properly processed by the Excel formulas unless I double-click the date cells or apply Excel's "text to columns" function on the column of dates and set a tab as the delimiter (which I should note, does not split the cell).
I'm wondering if there's a way to either...
write a helper function that logs the keystrokes of applying the "text to columns" function call
write a helper function to double click and return down each row of the column of dates
from openpyxl import load_workbook
import pandas as pd
def transfer_hours(report_name, ER_hours_analysis_wb):
    df = pd.read_csv(report_name, index_col=0)
    book = load_workbook(ER_hours_analysis_wb)
    sheet_name = "ER Work Log"
    with pd.ExcelWriter("ER Hours Analysis 248112.xlsx",
                        engine='openpyxl') as writer:
        writer.book = book
        writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
        df.to_excel(writer, sheet_name=sheet_name,
                    startrow=1, startcol=0, engine='openpyxl')
Use the openpyxl module:
from openpyxl import load_workbook
wb = load_workbook(filename=filePath, read_only=False, data_only=False)
Setting data_only=False returns the formulas, whereas data_only=True returns the values that Excel stored the last time it calculated them.
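To see what `data_only` changes, here is a small sketch on a throwaway workbook (the path and cell contents are made up for the illustration). One caveat worth knowing: openpyxl does not evaluate formulas itself, so a file that has never been opened and saved by Excel has no stored values, and `data_only=True` gives `None` for formula cells.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

# Hypothetical path, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "data_only_demo.xlsx")

wb = Workbook()
ws = wb.active
ws["A1"] = 1
ws["A2"] = 2
ws["A3"] = "=SUM(A1:A2)"
wb.save(path)

# data_only=False: the formula text is returned.
with_formulas = load_workbook(path, data_only=False).active["A3"].value

# data_only=True: the stored result is returned, which is missing here
# because the file was written by openpyxl and never calculated by Excel.
stored_value = load_workbook(path, data_only=True).active["A3"].value
```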
As great a tool as pandas is, in this case there may not be a reason to include it.
Here is a shorter structure for what you're trying to accomplish:
import csv
import datetime
from openpyxl import load_workbook
def transfer_hours(report_name, ER_hours_analysis_wb):
    wb = load_workbook(ER_hours_analysis_wb)
    ws = wb['ER Work Log']
    with open(report_name, 'rt', newline='') as csvfile:
        reader = csv.reader(csvfile, delimiter=',')
        # openpyxl rows and columns are 1-based, so start counting at 1
        for rownum, row in enumerate(reader, start=1):
            for colnum, col in enumerate(row, start=1):
                # assumes every field is a date such as 01/31/2020
                dttm = datetime.datetime.strptime(col, "%m/%d/%Y")
                ws.cell(column=colnum, row=rownum).value = dttm
    wb.save('new_spreadsheet.xlsx')
What you'll be able to do from here is break out which columns should have what format based on the position in the csv. Here is an example:
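for rownum, row in enumerate(reader, start=1):
    # first CSV field is copied as-is into column A
    ws.cell(column=1, row=rownum, value=row[0])
    # second CSV field is parsed into a real datetime for column B
    dttm = datetime.datetime.strptime(row[1], "%m/%d/%Y")
    ws.cell(column=2, row=rownum).value = dttm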
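If you do want to stay with pandas as in the question, the same idea applies: convert the date column to real datetimes before calling `to_excel()`, so that Excel should not receive plain text. A minimal sketch, assuming a made-up CSV with a `work_date` column in `%m/%d/%Y` format (neither the column name nor the data comes from the question):

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the report file.
csv_text = "id,work_date,hours\n1,01/31/2020,7.5\n2,02/03/2020,8.0\n"

df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Convert the text column to real datetimes before handing it to to_excel().
df["work_date"] = pd.to_datetime(df["work_date"], format="%m/%d/%Y")

print(df.dtypes)
```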
For reference:
https://openpyxl.readthedocs.io/en/stable/usage.html
In Python, how do I read a file line-by-line into a list?
How to format columns with headers using OpenPyXL
Given the following data frame:
import pandas as pd
d=pd.DataFrame({'a':['a','a','b','b'],
'b':['a','b','c','d'],
'c':[1,2,3,4]})
d=d.groupby(['a','b']).sum()
d
I'd like to export this with the same alignment with respect to the index (see how the left-most column is centered vertically?).
The rub is that when exporting this to Excel, the left column is aligned to the top of each cell:
writer = pd.ExcelWriter('pandas_out.xlsx', engine='xlsxwriter')
workbook = writer.book
f=workbook.add_format({'align': 'vcenter'})
d.to_excel(writer, sheet_name='Sheet1')
writer.close() # writer.save() on pandas versions before 2.0
...produces...
Is there any way to center column A vertically via XLSX Writer or another library?
Thanks in advance!
pandas writes the index cells with its header style, so you are really trying to change the formatting of the header. You should first reset the default header settings:
from pandas.io.formats.excel import ExcelFormatter
ExcelFormatter.header_style = None
Then apply the formatting as required
format = workbook.add_format()
format.set_align('center')
format.set_align('vcenter')
worksheet.set_column('A:C',5, format)
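Here is the complete working code:
import pandas as pd
from pandas.io.formats.excel import ExcelFormatter
d=pd.DataFrame({'a':['a','a','b','b'],
                'b':['a','b','c','d'],
                'c':[1,2,3,4]})
d=d.groupby(['a','b']).sum()
# Reset the default header style (where this attribute lives depends on the pandas version)
ExcelFormatter.header_style = None
writer = pd.ExcelWriter('pandas_out.xlsx', engine='xlsxwriter')
workbook = writer.book
d.to_excel(writer, sheet_name='Sheet1')
worksheet = writer.sheets['Sheet1']
# Center columns A to C horizontally and vertically
format = workbook.add_format()
format.set_align('center')
format.set_align('vcenter')
worksheet.set_column('A:C',5, format)
writer.close() # writer.save() on pandas versions before 2.0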