Add formats to dataframe and insert in excel using python?

I have to insert a DataFrame into Excel with borders, and all values in the DataFrame should be centered. I tried applying formatting to the cells, but it does not work.
df1.to_excel(writer,index=False,header=True,startrow=12,sheet_name='Sheet1')
writer.close()
writer=pd.ExcelWriter(s, engine="xlsxwriter")
writer.book = load_workbook(s)
workbooks= writer.book
worksheet = workbooks['Sheet1']
f1= workbooks.add_format()
worksheet.conditional_format(12,0,len(df1)+1,7,{'format':f1})
Can you please help me with this?

I'm not going to lie: I've done this for the first time right now, so this might not be a very good solution. I'm using openpyxl because it seems more flexible to me than XlsxWriter. I hope you can use it too.
My assumption is that the variable file_name contains a valid file name.
First your Pandas step:
with pd.ExcelWriter(file_name, engine='xlsxwriter') as writer:
    df1.to_excel(writer, index=False, header=True, startrow=12, sheet_name='Sheet1')
Then the necessary imports from openpyxl:
from openpyxl import load_workbook
from openpyxl.styles import NamedStyle, Alignment, Border, Side
Loading the workbook and selecting the worksheet:
wb = load_workbook(file_name)
ws = wb['Sheet1']
Defining the required style:
centered_with_frame = NamedStyle('centered_with_frame')
centered_with_frame.alignment = Alignment(horizontal='center')
bd = Side(style='thin')
centered_with_frame.border = Border(left=bd, top=bd, right=bd, bottom=bd)
Selecting the relevant cells:
cells = ws[ws.cell(row=12+1, column=1).coordinate:
           ws.cell(row=12+1+df1.shape[0], column=df1.shape[1]).coordinate]
Applying the defined style to the selected cells:
for row in cells:
    for cell in row:
        cell.style = centered_with_frame
Finally saving the workbook:
wb.save(file_name)
As I said: This might not be optimal.
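Putting the steps together, here is a minimal end-to-end sketch. It makes a few assumptions that are not in the answer above: df1 is a small made-up DataFrame, the workbook lives in an in-memory buffer rather than in file_name, the openpyxl engine is used for writing as well, and the alignment and border are set directly on each cell instead of through a NamedStyle.

```python
import io

import pandas as pd
from openpyxl import load_workbook
from openpyxl.styles import Alignment, Border, Side

# Hypothetical sample data standing in for the asker's df1
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4.5, 5.5, 6.5]})
startrow = 12  # same offset as in the question (0-based for pandas)

# Step 1: let pandas write the DataFrame (in memory here, for illustration)
buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    df1.to_excel(writer, index=False, header=True,
                 startrow=startrow, sheet_name="Sheet1")

# Step 2: reopen the workbook with openpyxl
buffer.seek(0)
wb = load_workbook(buffer)
ws = wb["Sheet1"]

# Step 3: center every cell of the table and draw a thin frame around it
thin = Side(style="thin")
first_row = startrow + 1             # openpyxl rows are 1-based; this is the header row
last_row = first_row + df1.shape[0]  # header row plus one row per record
for row in ws.iter_rows(min_row=first_row, max_row=last_row,
                        min_col=1, max_col=df1.shape[1]):
    for cell in row:
        cell.alignment = Alignment(horizontal="center")
        cell.border = Border(left=thin, top=thin, right=thin, bottom=thin)

# Step 4: save the styled workbook (again to a buffer in this sketch)
styled = io.BytesIO()
wb.save(styled)
```

With a real file, the two buffers would simply be replaced by file_name.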

Related

Pandas create a new sheet instead of adding the data in the active one

I am creating a spreadsheet with openpyxl and adding some data.
import pandas as pd
import numpy as np
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows
from openpyxl import load_workbook
from collections import OrderedDict
workbook = Workbook()
sheet = workbook.active
def fill_static_values():
    sheet["A1"] = "Run No."
    sheet["A2"] = "MLIDMLPA"
    sheet["A48"] = "Patients here"
    sheet["B1"] = "Patient"
fill_static_values()
output = "./Name_of_run.xlsx"
workbook.save(filename=output)
Then my application does some data management, and I want to add some of this data to the existing file.
book = load_workbook(output)
writer = pd.ExcelWriter(output, engine='openpyxl')
writer.book = book
## ExcelWriter for some reason uses writer.sheets to access the sheet.
## If you leave it empty it will not know that sheet Main is already there
## and will create a new sheet.
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
data_no_control.to_excel(writer, "sheet", startrow=2, startcol=3,
                         header=False,
                         index=False)
writer.save()
Solution found on this StackOverflow link
However, this creates a new sheet called sheet2 and adds the data there, in the correct position. What am I doing wrong?

The sheet name passed to to_excel is incorrect: the S should be capitalized. Change the line from
data_no_control.to_excel(writer, "sheet", startrow=2, startcol=3,
to
data_no_control.to_excel(writer, "Sheet", startrow=2, startcol=3,
As there is already a sheet in the Excel file, the data is being written to Sheet2.
EDIT
I noticed that you are using writer.sheets. If you want the program to pick up the first sheet from the Excel file automatically, you can use this as well:
data_no_control.to_excel(writer, sheet_name=list(writer.sheets.keys())[0], startrow=2, startcol=3,
This will pick up the first sheet (in your case the only sheet) as the worksheet to update.
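On recent pandas versions, assigning writer.book and writer.sheets may no longer work. A possible alternative, assuming pandas 1.4 or later, is to open the writer in append mode with if_sheet_exists="overlay". The sketch below uses a temporary file and a made-up data_no_control frame, neither of which comes from the question.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook

# Create a workbook with some static values, as in the question
# (a temporary path is used here purely for illustration)
output = os.path.join(tempfile.mkdtemp(), "Name_of_run.xlsx")
workbook = Workbook()
sheet = workbook.active
sheet["A1"] = "Run No."
sheet_name = sheet.title  # "Sheet" is the default title of a new workbook
workbook.save(filename=output)

# Hypothetical data standing in for data_no_control
data_no_control = pd.DataFrame({"x": [1, 2], "y": [3, 4]})

# mode="a" keeps the existing workbook; "overlay" writes into the existing
# sheet instead of creating a new one (assumes pandas >= 1.4)
with pd.ExcelWriter(output, engine="openpyxl", mode="a",
                    if_sheet_exists="overlay") as writer:
    data_no_control.to_excel(writer, sheet_name=sheet_name,
                             startrow=2, startcol=3,
                             header=False, index=False)

result = load_workbook(output)
```

The sheet name is read from the workbook rather than typed by hand, which avoids the upper/lower case mismatch described above.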

Export dataframe to excel file using xlsxwriter

I have DataFrames as output and I need to export them to an Excel file. I can use pandas for the task, but I need the worksheet to run from right to left. I have searched and didn't find any clue about how to change the direction with pandas. I have found that the xlsxwriter package can do that:
import xlsxwriter
workbook = xlsxwriter.Workbook('output.xlsx')
worksheet1 = workbook.add_worksheet()
format_right_to_left = workbook.add_format({'reading_order': 2})
worksheet1.set_column('A:A', 20)
worksheet1.right_to_left()
worksheet1.write(new_df)
workbook.close()
But I don't know how to export the DataFrame using this approach.
** For the format, I have used multiple lines:
myformat = workbook.add_format()
myformat.set_reading_order(2)
myformat.set_align('center')
myformat.set_align('vcenter')
Is it possible to make these lines shorter, for example by using a dictionary?

You can do this:
import pandas as pd
writer = pd.ExcelWriter('pandas_excel.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1') # Assuming you already have a `df`
workbook = writer.book
worksheet = writer.sheets['Sheet1']
format_right_to_left = workbook.add_format({'reading_order': 2})
worksheet.right_to_left()
writer.close() # writer.save() was removed in pandas 2.0
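As for the second part of the question, add_format accepts a dictionary of properties, so the separate formatting calls can be collapsed into one. A minimal sketch, assuming a made-up DataFrame and an in-memory buffer instead of a file on disk:

```python
import io

import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})

buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine="xlsxwriter") as writer:
    df.to_excel(writer, sheet_name="Sheet1", index=False)
    workbook = writer.book
    worksheet = writer.sheets["Sheet1"]

    # One dictionary instead of set_reading_order() plus two set_align() calls
    myformat = workbook.add_format(
        {"reading_order": 2, "align": "center", "valign": "vcenter"}
    )

    worksheet.right_to_left()                  # lay the sheet out from right to left
    worksheet.set_column("A:B", 20, myformat)  # width 20 and a default format for columns A:B

xlsx_bytes = buffer.getvalue()
```

A column format only acts as a default: cells that pandas writes with their own format, such as the header row, keep that format.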

Formatting integers with comma separator using openpyxl and to_excel

I am writing DataFrames to excel using to_excel(). I need to use openpyxl instead of XlsxWriter, I think, as the writer engine because I need to open existing Excel files and add sheets. Regardless, I'm deep into other formatting using openpyxl so I'm not keen on changing.
This writes the DataFrame, and formats the floats, but I can't figure out how to format the int dtypes.
import pandas as pd
from openpyxl import load_workbook
df = pd.DataFrame({'county':['Cnty1','Cnty2','Cnty3'], 'ints':[5245,70000,4123123], 'floats':[3.212, 4.543, 6.4555]})
fileName = "Maryland - test.xlsx"
book = load_workbook(fileName)
writer = pd.ExcelWriter(fileName, engine='openpyxl')
writer.book = book
df.to_excel(writer, sheet_name='Test', float_format='%.2f', header=False, index=False, startrow=3)
ws = writer.sheets['Test']
writer.save()
writer.close()
Tried using this, but I think it only works with XlsxWriter:
intFormat = book.add_format({'num_format': '#,###'})
ws.set_column('B:B', intFormat)
This type of thing could be used cell-by-cell with a loop, but there's A LOT of data:
ws['B2'].number_format = '#,###'

This can be fixed by setting number_format, using the constants from openpyxl.styles:
from openpyxl.styles import numbers
def format_with_commas(cell):
    # This will output a number like: 2,000.00
    cell.number_format = numbers.FORMAT_NUMBER_COMMA_SEPARATED1
Check out the openpyxl documentation for further reading.
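For whole-number columns, a format without decimals such as "#,##0" may be a better fit than the constant above. A loop over the cells is the straightforward way to apply it with openpyxl, and iter_rows keeps it short. A minimal sketch, using an in-memory workbook filled with the sample data from the question rather than the asker's file:

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active

# Sample rows taken from the question: county, ints, floats
rows = [
    ("Cnty1", 5245, 3.212),
    ("Cnty2", 70000, 4.543),
    ("Cnty3", 4123123, 6.4555),
]
for row in rows:
    ws.append(row)

# Apply a thousands separator without decimals to every cell in column B
for (cell,) in ws.iter_rows(min_row=1, max_row=ws.max_row, min_col=2, max_col=2):
    cell.number_format = "#,##0"
```

Only the display format changes; the cell values stay plain integers.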

Format dates in Excel for comparison

I'm in the midst of writing an IPython notebook that will pull the contents of a .csv file and paste them into a specified tab of an .xlsx file. The tab in the .xlsx is filled with a bunch of pre-programmed formulas so that I can run an analysis on the original content of the .csv file.
I've run into a snag, however, with the date fields that I copy over from the .csv into the .xlsx file.
The dates do not get properly processed by the Excel formulas unless I double-click the date cells or apply Excel's "text to columns" function on the column of dates and set a tab as the delimiter (which I should note, does not split the cell).
I'm wondering if there's a way to either...
write a helper function that logs the keystrokes of applying the "text to columns" function call
write a helper function to double click and return down each row of the column of dates
from openpyxl import load_workbook
import pandas as pd
def transfer_hours(report_name, ER_hours_analysis_wb):
    df = pd.read_csv(report_name, index_col=0)
    book = load_workbook(ER_hours_analysis_wb)
    sheet_name = "ER Work Log"
    with pd.ExcelWriter("ER Hours Analysis 248112.xlsx",
                        engine='openpyxl') as writer:
        writer.book = book
        writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
        df.to_excel(writer, sheet_name=sheet_name,
                    startrow=1, startcol=0, engine='openpyxl')

Use the openpyxl module:
from openpyxl import load_workbook
wb = load_workbook(filename=filePath, read_only=False, data_only=False)
Setting data_only to False returns the formulas, whereas data_only=True returns the values stored for those formulas instead.
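The difference between the two settings can be seen in a small round trip. This sketch assumes a workbook created by openpyxl itself; since openpyxl does not evaluate formulas, no cached result is stored, and data_only=True yields None until the file has been opened and saved in Excel.

```python
import io

from openpyxl import Workbook, load_workbook

# Build a tiny workbook with one formula (in memory, for illustration)
wb = Workbook()
ws = wb.active
ws["A1"] = 1
ws["A2"] = 2
ws["A3"] = "=SUM(A1:A2)"

buffer = io.BytesIO()
wb.save(buffer)

# data_only=False: the formula text is returned
buffer.seek(0)
formula = load_workbook(buffer, data_only=False).active["A3"].value

# data_only=True: the cached result is returned, which does not exist here
buffer.seek(0)
cached = load_workbook(buffer, data_only=True).active["A3"].value
```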

As great a tool as pandas is designed to be, in this case there may not be a reason to include it.
Here is a shorter structure for what you're trying to accomplish:
import csv
import datetime
from openpyxl import load_workbook
def transfer_hours(report_name, ER_hours_analysis_wb):
    wb = load_workbook(ER_hours_analysis_wb)
    ws = wb['ER Work Log']
    # the with block closes the file once reading is done
    with open(report_name, 'rt', newline='') as csvfile:
        reader = csv.reader(csvfile, delimiter=',')
        # openpyxl rows and columns are 1-indexed
        for rownum, row in enumerate(reader, start=1):
            for colnum, col in enumerate(row, start=1):
                dttm = datetime.datetime.strptime(col, "%m/%d/%Y")
                ws.cell(column=colnum, row=rownum).value = dttm
    wb.save('new_spreadsheet.xlsx')
What you'll be able to do from here is break out which columns should have what format based on the position in the csv. Here is an example:
for rownum, row in enumerate(reader, start=1):
    ws.cell(column=1, row=rownum, value=row[0])
    dttm = datetime.datetime.strptime(row[1], "%m/%d/%Y")
    ws.cell(column=2, row=rownum).value = dttm
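The same idea in a self-contained form: parse each date string into a datetime before writing it, so that Excel receives a real date rather than text. The CSV content, the column layout (label first, date second) and the mm/dd/yyyy display format are assumptions made for illustration.

```python
import csv
import datetime
import io

from openpyxl import Workbook

# Hypothetical CSV content: a label followed by a date in mm/dd/yyyy form
csv_text = "task A,01/15/2018\ntask B,02/03/2018\n"

wb = Workbook()
ws = wb.active

reader = csv.reader(io.StringIO(csv_text))
for rownum, row in enumerate(reader, start=1):  # openpyxl is 1-indexed
    # First column: keep the text as it is
    ws.cell(row=rownum, column=1, value=row[0])
    # Second column: store a datetime so Excel treats the cell as a date
    parsed = datetime.datetime.strptime(row[1], "%m/%d/%Y")
    date_cell = ws.cell(row=rownum, column=2, value=parsed)
    date_cell.number_format = "mm/dd/yyyy"
```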
For reference:
https://openpyxl.readthedocs.io/en/stable/usage.html
In Python, how do I read a file line-by-line into a list?
How to format columns with headers using OpenPyXL

Insert a title at the beginning of the Excel worksheet

row = 5
column = 0
writer = pd.ExcelWriter(file_name, engine='openpyxl')
response = send_request('2017-2018-regular', item).content
df = pd.read_csv(io.StringIO(response.decode('utf-8')))
df.to_excel(writer, sheets, startrow=row, startcol=column, index=False)
I would like to put a simple title at the top of my Excel sheet, given that I am working with pandas and openpyxl. How could I do that? I want the title to be displayed at the top of the sheet (startrow=0, startcol=0). Please show me an example of how to do it.
I know the question Write dataframe to excel with a title is related, but I can't use it for the simple reason that the engine is different. I use openpyxl lib and they used xlsxwriter lib in their answer. What is the equivalent for write_string, but with pandas?

In openpyxl, rows and columns start at 1 instead of 0, so row=1, column=1 is the top-left cell (what would be (0, 0) with zero-based indexing), and that is where you need to start writing.
Check the following example.
from openpyxl import Workbook
wb = Workbook()
dest_filename = 'empty_book.xlsx'
ws1 = wb.active #first default sheet if you want to create new one use wb.create_sheet(title="xyz")
ws1.title = "Title set example"
for col in range(1, 10):
    ws1.cell(column=col, row=1, value="Title_{0}".format(col))
wb.save(filename = dest_filename)
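To combine this with pandas, the openpyxl worksheet can be reached through writer.sheets while the writer is still open. A minimal sketch, assuming a made-up DataFrame and title text, and an in-memory buffer instead of file_name:

```python
import io

import pandas as pd
from openpyxl import load_workbook

# Hypothetical data standing in for the DataFrame read from the response
df = pd.DataFrame({"player": ["A", "B"], "points": [10, 12]})

buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    # Leave the first rows free, as in the question (startrow=5)
    df.to_excel(writer, sheet_name="Sheet1", startrow=5, startcol=0, index=False)

    # writer.sheets maps sheet names to openpyxl worksheets
    ws = writer.sheets["Sheet1"]
    ws.cell(row=1, column=1, value="2017-2018 regular season")  # top-left cell

# Read the result back to confirm where everything landed
buffer.seek(0)
check = load_workbook(buffer)["Sheet1"]
```

Because pandas counts startrow from 0 and openpyxl counts rows from 1, the header written with startrow=5 ends up in row 6 of the sheet.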
