I have an Excel sheet that came from a pandas dataframe. I then use XlsxWriter to add formulas, new columns and formatting. The problem is that I can only seem to format what I've written with XlsxWriter, not anything that came from the dataframe. What I get is something like this half-formatted table:
As you can see from the image, the two columns from the dataframe remain untouched. They must have some kind of default formatting that is overriding mine.
Since I don't know how to convert a worksheet back into a dataframe, the code below is obviously completely wrong, but it gives an idea of what I'm looking for.
export = "files/sharepointExtract.xlsx"
df = pd.read_excel(export)# df = dataframe
writer = pd.ExcelWriter('files/new_report-%s.xlsx' % (date.today()), engine = 'xlsxwriter')
workbook = writer.book
# Code to make the header red, this works fine because
# it's written in xlsxwriter using write.row()
colour_format = workbook.add_format()
colour_format.set_bg_color('#640000')
colour_format.set_font_color('white')
worksheet.set_row(0, 15, colour_format)
table_body_format = workbook.add_format()
table_body_format.set_bg_color('blue')
for row in worksheet.rows:
    row.set_row(0,15, table_body_format)
This code raises an AttributeError, but even without the for loop we just get what can be seen in the image.
The following should work:
import pandas as pd
from datetime import date
export = "files/sharepointExtract.xlsx"
df = pd.read_excel(export)
writer = pd.ExcelWriter('files/new_report-{}.xlsx'.format(date.today()), engine ='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', startrow=1, startcol=0, header=False, index=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Code to make the header red background with white text
colour_format = workbook.add_format()
colour_format.set_bg_color('#640000')
colour_format.set_font_color('white')
# Code to make the body blue
table_body_format = workbook.add_format()
table_body_format.set_bg_color('blue')
# Set the header (row 0) to height 15 using colour_format
worksheet.set_row(0, 15, colour_format)
# Set the default format for other rows
worksheet.set_column('A:Z', 15, table_body_format)
# Write the header manually
for colx, value in enumerate(df.columns.values):
    worksheet.write(0, colx, value)
# Close the writer to save the file (writer.save() was removed in pandas 2.0)
writer.close()
When pandas writes the header, it applies its own header format, which overrides the underlying XlsxWriter formatting. The simplest approach is to stop pandas from writing the header and have it write the data from row 1 onwards (not row 0). This prevents the formatting from being altered, and you can then write your own header using the column names from the dataframe.
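A minimal sketch of the same approach, assuming pandas with the xlsxwriter engine. Instead of relying on the row format set with set_row(), it passes a header format straight to write_row() so every header cell gets it explicitly. The function name write_with_custom_header and the colours are only illustrative.

```python
import pandas as pd


def write_with_custom_header(df, path, sheet_name='Sheet1'):
    """Write df without the pandas header, then add a formatted header row."""
    with pd.ExcelWriter(path, engine='xlsxwriter') as writer:
        # Data goes from row 1 (0-based) so row 0 is free for our own header
        df.to_excel(writer, sheet_name=sheet_name, startrow=1,
                    header=False, index=False)
        workbook = writer.book
        worksheet = writer.sheets[sheet_name]
        header_fmt = workbook.add_format(
            {'bold': True, 'font_color': 'white', 'bg_color': '#640000'})
        # Passing the format to write_row() applies it to every header cell
        worksheet.write_row(0, 0, list(df.columns), header_fmt)
```

Because the header cells carry their own format, they keep it regardless of any row or column formats you set afterwards.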
Related
I've already set the cell format to 'Text' in the target Excel column. However, pandas.to_excel changes the format to 'General' when writing strings to this column, so the column ends up with blank cells formatted as 'Text' and non-blank ones as 'General'. Is there a way to write the data as 'Text' instead of 'General'?
import pandas as pd

def exportData(df, dstfile, sheet):
    # Append to the existing workbook and write into the existing sheet.
    # mode='a' with if_sheet_exists='overlay' replaces the old writer.book /
    # writer.sheets assignment, which newer pandas versions no longer allow.
    with pd.ExcelWriter(dstfile, engine='openpyxl', mode='a', if_sheet_exists='overlay',
                        date_format='dd/mm/yyyy', datetime_format='mm/dd/yyyy hh:mm') as writer:
        df.to_excel(writer, header=False, startrow=2, index=False, sheet_name=sheet)
You can iterate over the cells with openpyxl after pandas has written the data.
For each cell you want formatted as Text, use:
cell.number_format = '@'
This sets the cell's format to Text in Excel ('@' is Excel's Text format code).
There might be a way to do it directly from pandas' ExcelWriter, but I'm unfamiliar with it; maybe someone who knows better will edit the answer to add that option.
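A minimal sketch of that iteration, assuming the data has already been written by pandas: reopen the file with openpyxl's load_workbook, call a helper like the hypothetical set_column_text_format below for the target column, then save. The demo builds a worksheet in memory instead of loading a real file.

```python
from openpyxl import Workbook
from openpyxl.utils import column_index_from_string


def set_column_text_format(ws, column_letter, first_row=1, last_row=None):
    """Apply Excel's Text number format ('@') to one column of a worksheet."""
    col_idx = column_index_from_string(column_letter)
    if last_row is None:
        last_row = ws.max_row
    # Each yielded row is a 1-tuple because min_col == max_col
    for (cell,) in ws.iter_rows(min_row=first_row, max_row=last_row,
                                min_col=col_idx, max_col=col_idx):
        cell.number_format = '@'


# Small in-memory demo: header in B1, data below it
wb = Workbook()
ws = wb.active
ws['B1'] = 'Code'
ws['B2'] = '00123'
ws['B3'] = '00456'
set_column_text_format(ws, 'B', first_row=2)
```

The format only changes how Excel treats the cells; values that were already written as numbers stay numbers.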
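All you need to do is store the data in pandas as text. A dataframe with dtype 'object' can hold strings, for example:
df = pd.DataFrame(data=d, dtype=object)
Note that dtype=object on its own keeps numbers as Python numbers, which are still written as numbers. Convert the values to strings first (for example with df = df.astype(str)), and when you export with pandas' to_excel method they will be stored in Excel as text.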
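A small sketch of the string conversion, assuming missing values should become empty cells rather than the literal text 'nan'. The helper name to_text_frame is hypothetical.

```python
import pandas as pd


def to_text_frame(df):
    """Return a copy of df in which every cell is a Python string.

    Missing values become empty strings so they are not written as 'nan'.
    """
    as_objects = df.astype(object)
    # Keep real values, replace missing ones with '', then stringify everything
    return as_objects.where(df.notna(), '').astype(str)
```

Calling to_text_frame(df).to_excel(...) then writes the values as text. Excel may still report the cells' number format as General, so combine this with setting the number format in openpyxl if the format itself matters.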
I am trying to write a script that merges multiple CSV files into one Excel file with separate sheets. In doing so, I realized that any column containing a percentage shows the error "The number in this cell is formatted as text or preceded by an apostrophe". Each sheet has multiple percentage columns in no specific order, so I am looking for a solution that avoids the error for every column containing a percentage. I have tried passing different parameters, but none of them has solved the problem yet.
Here is a sample code of what I have:
import pandas as pd
path = 'Test.csv'
df = pd.read_csv(path)
writer = pd.ExcelWriter('result.xlsx', engine='xlsxwriter')
df.to_excel(writer, index=False, header=True)
writer.close()
I have included screenshots of the csv file and the excel file, so I hope it helps to better demonstrate the issue at hand. Just to be clear, this is only for the sake of presentation and not the actual files I am working with.
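You can add a format to a column with set_column() once you have access to the worksheet. The percentages must be real numbers first, because a number format does not convert strings:
import pandas as pd

data = {'Text':['This', 'Column', 'is filled', 'with'], 'Number':[20, 21, 19, 18], 'Percentages':['23.3%','46.82%','1.28%','33.3%']}
df = pd.DataFrame(data)
# convert '23.3%' style strings to numbers (0.233) so Excel stores them as numbers
df['Percentages'] = df['Percentages'].str.rstrip('%').astype(float) / 100
writer = pd.ExcelWriter('result.xlsx', engine='xlsxwriter')
df.to_excel(writer, index=False, header=True, sheet_name='somename')
# get the workbook and worksheet
workbook = writer.book
worksheet = writer.sheets['somename']
# add format (percentage with 2 decimals)
percent_fmt = workbook.add_format({'num_format': '0.00%'})
# set the 3rd column (index 2) to percentage format with width 12
worksheet.set_column(2, 2, 12, percent_fmt)
writer.close()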
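Since the question has several percentage columns in no fixed order, the conversion can be generalised. A sketch, assuming a column counts as a percentage column when every non-missing value is text ending in '%': convert those columns to fractions and give each one the percent format. convert_percent_columns and write_csv_as_excel are hypothetical names.

```python
import pandas as pd


def convert_percent_columns(df):
    """Turn text columns like '23.3%' into fractions such as 0.233.

    A column is converted only if every non-missing value is text ending in '%'.
    Returns the converted copy and the names of the converted columns.
    """
    df = df.copy()
    converted = []
    for col in df.columns:
        series = df[col]
        if pd.api.types.is_numeric_dtype(series):
            continue
        values = series.dropna().astype(str).str.strip()
        if len(values) and values.str.endswith('%').all():
            numbers = series.str.strip().str.rstrip('%')
            df[col] = pd.to_numeric(numbers, errors='coerce') / 100
            converted.append(col)
    return df, converted


def write_csv_as_excel(csv_path, xlsx_path, sheet_name='Sheet1'):
    """Write a CSV to Excel with real numbers and a percent format on % columns."""
    df, pct_cols = convert_percent_columns(pd.read_csv(csv_path))
    with pd.ExcelWriter(xlsx_path, engine='xlsxwriter') as writer:
        df.to_excel(writer, sheet_name=sheet_name, index=False)
        percent_fmt = writer.book.add_format({'num_format': '0.00%'})
        worksheet = writer.sheets[sheet_name]
        for col in pct_cols:
            idx = df.columns.get_loc(col)
            worksheet.set_column(idx, idx, 12, percent_fmt)
```

For several CSV files, the same conversion can run once per file, writing each to its own sheet inside a single ExcelWriter.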
I'm in the middle of writing an IPython notebook that pulls the contents of a .csv file and pastes them into a specified tab of an .xlsx file. That tab is filled with pre-programmed formulas so that I can run an analysis on the original content of the .csv file.
I've run into a snag, however, with the date fields that I copy over from the .csv into the .xlsx file.
The dates are not processed correctly by the Excel formulas unless I double-click each date cell or apply Excel's "Text to Columns" function to the column of dates with a tab as the delimiter (which, I should note, does not split the cell).
I'm wondering if there's a way to either...
write a helper function that logs the keystrokes of applying the "text to columns" function call
write a helper function to double click and return down each row of the column of dates
from openpyxl import load_workbook
import pandas as pd

def transfer_hours(report_name, ER_hours_analysis_wb):
    df = pd.read_csv(report_name, index_col=0)
    book = load_workbook(ER_hours_analysis_wb)
    sheet_name = "ER Work Log"
    with pd.ExcelWriter("ER Hours Analysis 248112.xlsx",
                        engine='openpyxl') as writer:
        writer.book = book
        writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
        df.to_excel(writer, sheet_name=sheet_name,
                    startrow=1, startcol=0, engine='openpyxl')
Use openpyxl's load_workbook:
from openpyxl import load_workbook
wb = load_workbook(filename=filePath, read_only=False, data_only=False)
With data_only=False, formula cells return their formulas; with data_only=True they return the value Excel cached the last time it saved the file.
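To see the difference between the two settings, here is a small sketch that builds a workbook with openpyxl, saves it to a temporary file and reads the formula cell back both ways. Because openpyxl does not calculate formulas, the cached value is None until Excel itself opens and re-saves the file. The helper name formula_vs_cached is hypothetical.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def formula_vs_cached(path, cell='A3'):
    """Read one cell twice: once as a formula, once as Excel's cached result."""
    formula = load_workbook(path, data_only=False).active[cell].value
    cached = load_workbook(path, data_only=True).active[cell].value
    return formula, cached


# Build a tiny workbook containing a formula and save it to a temporary folder
wb = Workbook()
ws = wb.active
ws['A1'] = 1
ws['A2'] = 2
ws['A3'] = '=SUM(A1:A2)'
path = os.path.join(tempfile.mkdtemp(), 'formula_demo.xlsx')
wb.save(path)

# openpyxl stores the formula but never calculates it, so no cached value exists yet
formula, cached = formula_vs_cached(path)
```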
As great a tool as pandas is, in this case there may be no reason to include it.
Here is a shorter structure for what you're trying to accomplish:
import csv
import datetime
from openpyxl import load_workbook

def transfer_hours(report_name, ER_hours_analysis_wb):
    wb = load_workbook(ER_hours_analysis_wb)
    ws = wb['ER Work Log']
    with open(report_name, newline='') as csvfile:
        reader = csv.reader(csvfile, delimiter=',')
        # openpyxl rows and columns start at 1, not 0
        for rownum, row in enumerate(reader, start=1):
            for colnum, col in enumerate(row, start=1):
                # assumes every value in the CSV is a date in mm/dd/yyyy form
                dttm = datetime.datetime.strptime(col, "%m/%d/%Y")
                ws.cell(row=rownum, column=colnum).value = dttm
    wb.save('new_spreadsheet.xlsx')
From here you can decide which columns get which format based on their position in the CSV. Here is an example:
for rownum, row in enumerate(reader, start=1):
    ws.cell(row=rownum, column=1, value=row[0])
    dttm = datetime.datetime.strptime(row[1], "%m/%d/%Y")
    ws.cell(row=rownum, column=2).value = dttm
For reference:
https://openpyxl.readthedocs.io/en/stable/usage.html
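If you would rather keep pandas, a comparable sketch is to turn the date column into real datetimes with pd.to_datetime before writing, so Excel receives dates instead of text. The column name 'date', the sample data and the %m/%d/%Y format are assumptions based on the example above, and parse_date_columns is a hypothetical helper.

```python
import pandas as pd


def parse_date_columns(df, columns, fmt='%m/%d/%Y'):
    """Return a copy of df with the named text columns converted to datetimes.

    pandas writes datetime values to Excel as real dates, so formulas that
    expect dates should no longer need the manual "Text to Columns" step.
    """
    df = df.copy()
    for col in columns:
        df[col] = pd.to_datetime(df[col], format=fmt)
    return df
```

Passing parse_dates=['date'] to pd.read_csv is another way to get the same column type at load time.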
In Python, how do I read a file line-by-line into a list?
How to format columns with headers using OpenPyXL
I am trying to get the following output, but with my code all rows and columns are text-wrapped except the header:
import pandas as pd
import pandas.io.formats.style
import os
from pandas import ExcelWriter
import numpy as np
from xlsxwriter.utility import xl_rowcol_to_cell
writer = pd.ExcelWriter('test1.xlsx', engine='xlsxwriter', engine_kwargs={'options': {'strings_to_numbers': True}}, date_format='mmmm dd yyyy')
df = pd.read_csv("D:\\Users\\u700216\\Desktop\\Reports\\CD_Counts.csv")
df.to_excel(writer, sheet_name='Sheet1', startrow=1, startcol=1, header=True, index=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
format = workbook.add_format()
format1 = workbook.add_format({'bold': True, 'align' : 'left'})
format.set_align('Center')
format1.set_align('Center')
format.set_text_wrap()
format1.set_text_wrap()
worksheet.set_row(0, 20, format1)
worksheet.set_column('A:Z', 30, format)
writer.close()
The format is applied to all rows and columns except the header, and I don't know why it is not applied to the first row (the header). Alternatively, I would like to add column header numbers such as 0, 1, 2 manually, so that I can turn off the pandas header and have all rows and columns formatted.
In the screenshot above, wrap text is not applied to A1 to E1, and the C1 header has a lot of extra space. If I click Wrap Text manually the header is aligned; otherwise none of the header cells are wrapped.
A couple of problems:
Your code is correctly attempting to format the header, but when you create your file using .to_excel() you are telling it to start at row/col 1, 1. The cells though are numbered from 0, 0. So if you change to:
df.to_excel(writer, sheet_name='Sheet1', startrow=0, startcol=0, header=True, index=False)
You will see col A and row 1 are both formatted:
That is, column A is index 0 and row 1 is index 0.
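The zero-based mapping can be checked with the xl_rowcol_to_cell() helper the question already imports, and its inverse xl_cell_to_rowcol(); a small sketch:

```python
from xlsxwriter.utility import xl_cell_to_rowcol, xl_rowcol_to_cell

# XlsxWriter methods and the startrow/startcol arguments of DataFrame.to_excel
# count rows and columns from 0, while Excel's A1 notation starts at row 1, column A.
top_left = xl_rowcol_to_cell(0, 0)   # row 0, column 0
offset = xl_rowcol_to_cell(1, 1)     # where startrow=1, startcol=1 begins
row_col = xl_cell_to_rowcol('B2')    # back from A1 notation to (row, col)

print(top_left, offset, row_col)     # A1 B2 (1, 1)
```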
When using Pandas to write the header, it applies its own format which will overwrite the formatting you have provided. To get around this, turn off headers and get it to only write the data from row 1 onwards and write the header manually.
The following might be a bit clearer:
import pandas as pd

writer = pd.ExcelWriter('test1.xlsx', engine='xlsxwriter',
                        engine_kwargs={'options': {'strings_to_numbers': True}},
                        date_format='mmmm dd yyyy')
#df = pd.read_csv("D:\\Users\\u700216\\Desktop\\Reports\\CD_Counts.csv")
df = pd.read_csv("CD_Counts.csv")
# Skip the pandas header: data starts on row 1, leaving row 0 for our own header
df.to_excel(writer, sheet_name='Sheet1', startrow=1, startcol=0, header=False, index=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
format_header = workbook.add_format()
format_header.set_align('center')
format_header.set_bold()
format_header.set_text_wrap()
format_header.set_border()
format_data = workbook.add_format()
format_data.set_align('center')
format_data.set_text_wrap()
worksheet.set_column('A:Z', 20, format_data)
worksheet.set_row(0, 40, format_header)
# Write the header manually
for colx, value in enumerate(df.columns.values):
    worksheet.write(0, colx, value)
writer.close()
Which would give you:
Note: It is also possible to tell Pandas the style to use, or to force it to None so it will inherit your own style. The only drawback with that approach is that the method required to do that depends on the version of Pandas that is being used. This approach works for all versions.
Using XlsxWriter, how do I insert a new row into an Excel worksheet? For instance, there is an existing data table in the cell range A1:G10 of the worksheet, and I want to insert a row above it (row 1) to make space for the report title.
I looked through the documentation at http://xlsxwriter.readthedocs.io/worksheet.html but couldn't find such a method.
import xlsxwriter
# Create a workbook and add a worksheet.
workbook = xlsxwriter.Workbook('Expenses01.xlsx')
worksheet = workbook.add_worksheet()
worksheet.insert_row(1) # This method doesn't exist
As of December 2021, this is still not possible. You can get around it with some planning, by writing your dataframe starting on a different row. Building on the example from the XlsxWriter documentation:
import pandas as pd

df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})
with pd.ExcelWriter('my_excel_spreadsheet.xlsx', engine='xlsxwriter') as writer:
    df.to_excel(writer, sheet_name='Sheet1', startrow=4)  # <<< notice the startrow here
And then, inside the same with block (the file is written when the block exits), you can write to the earlier rows as mentioned in other comments:
    workbook = writer.book
    worksheet = writer.sheets['Sheet1']
    worksheet.write(0, 0, 'Some Text')  # <<< rows 0 to 3 are free above the data
Not quite the insert() method we want, but better than nothing.
I have found that the planning involved in this process is not really ever something I can get around, even if I didn't have this problem. When I reach the stage where I am taking my data to excel, I have to do a little 'by hand' work in order to make the excel sheet pretty enough for human consumption, which is the whole point of moving things to excel. So, I don't look at the need to pre-plan my start rows as too much out of my way.
By using openpyxl you can insert new rows and columns:
import openpyxl

file = "xyz.xlsx"
# load the workbook based on the file name provided by the user
book = openpyxl.load_workbook(file)
# open the sheet whose index is 0
sheet = book.worksheets[0]
# insert_rows(idx, amount=1) inserts one or more rows before row idx;
# amount is the number of rows to add and is optional
sheet.insert_rows(13)
# save the workbook, otherwise the change is lost
book.save(file)
Hope this helps
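A short sketch of the same call used to make room for a title row, built in memory so it runs without an existing file. add_title_row is a hypothetical helper name. openpyxl does not adjust formulas, tables or charts that point at the moved cells, so check those after inserting.

```python
from openpyxl import Workbook


def add_title_row(ws, title):
    """Insert an empty row above the existing data and put a title in A1."""
    ws.insert_rows(1)  # every existing row moves down by one
    ws['A1'] = title


wb = Workbook()
ws = wb.active
ws.append(['Item', 'Cost'])
ws.append(['Rent', 1000])
add_title_row(ws, 'Monthly expenses')
```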
Unfortunately this is not something xlsxwriter can do.
openpyxl is a good alternative to xlsxwriter, and if you are starting a new project do not use xlsxwriter.
At the time of this answer openpyxl could not insert rows, but there is an extension class for openpyxl that can. (Current openpyxl versions provide Worksheet.insert_rows() directly.)
openpyxl also allows reading of excel documents, which xlsxwriter does not.
You can try this
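import xlsxwriter

wb = xlsxwriter.Workbook("name.xlsx")
ws = wb.add_worksheet("sheetname")
# Blank cells are only written when they have a format (example format)
cell_format = wb.add_format({'bottom': 1})
# Write a blank cell
ws.write_blank(0, 0, None, cell_format)
ws.write_blank('A2', None, cell_format)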
Here is the official documentation:
Xlsxwriter worksheet.write_blank() method
Another alternative is to merge a few blank cells:
ws.merge_range('A1:D1', "")
Otherwise you'll need to run a loop to write each blank cell:
# Replace 1 with the row number you need
for c in range(0, 10):
    ws.write_blank(1, c, None, cell_format)
# Close the workbook to write the file
wb.close()
Inserting a row is equivalent to adding +1 to your row count. Technically there is no need for a "blank row" method and I'm pretty sure that's why it isn't there.
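Following that idea, a sketch that "inserts" rows by shifting the row index while writing, so each blank row is simply skipped. layout_rows and write_with_gaps are hypothetical helper names, and blank_before holds zero-based positions in the data.

```python
import xlsxwriter


def layout_rows(rows, blank_before=()):
    """Yield (excel_row_index, values), leaving a blank row before each
    position listed in blank_before (zero-based indexes into rows)."""
    offset = 0
    blanks = set(blank_before)
    for i, values in enumerate(rows):
        if i in blanks:
            offset += 1  # every later row moves down by one more
        yield i + offset, values


def write_with_gaps(path, rows, blank_before=()):
    """Write rows to a new workbook with the requested blank rows."""
    workbook = xlsxwriter.Workbook(path)
    worksheet = workbook.add_worksheet()
    for row_idx, values in layout_rows(rows, blank_before):
        worksheet.write_row(row_idx, 0, values)
    workbook.close()
```

With blank_before=[0], the data starts on row 1 and row 0 stays free for a title, which matches the original question's goal.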
You should use write().
Also read the documentation for set_column(first_col, last_col, width, cell_format, options).
For example:
import xlsxwriter

workbook = xlsxwriter.Workbook('xD.xlsx')
worksheet = workbook.add_worksheet()
# row and column are zero-based: (0, 0) is cell A1
worksheet.write(0, 0, 'First Name')
workbook.close()
I am very unhappy with the answers. The XlsxWriter library makes most of these operations easy.
To add a row of data to a worksheet (XlsxWriter only writes new files, so this writes a row rather than inserting one into an existing file), you can use:
ws.write_row(rowNumber, columnNumber, listToAdd)