Reading csv file and writing the df to excel with text wrap


I am trying to get the following output. All rows and columns are text wrapped except the header though:
import pandas as pd
import pandas.io.formats.style
import os
from pandas import ExcelWriter
import numpy as np
from xlsxwriter.utility import xl_rowcol_to_cell
writer = pd.ExcelWriter('test1.xlsx',engine='xlsxwriter',options={'strings_to_numbers': True},date_format='mmmm dd yyyy')
df = pd.read_csv("D:\\Users\\u700216\\Desktop\\Reports\\CD_Counts.csv")
df.to_excel(writer,sheet_name='Sheet1',startrow=1 , startcol=1, header=True, index=False, encoding='utf8')
workbook = writer.book
worksheet = writer.sheets['Sheet1']
format = workbook.add_format()
format1 = workbook.add_format({'bold': True, 'align' : 'left'})
format.set_align('Center')
format1.set_align('Center')
format.set_text_wrap()
format1.set_text_wrap()
worksheet.set_row(0, 20, format1)
worksheet.set_column('A:Z', 30, format)
writer.save()
The format is applied to all rows and columns except the header. I don't know why the format is not applied to the header row. Alternatively, I would like to manually add column header numbers such as 0, 1, 2, etc., so that I can turn off the header and have all the rows and columns formatted.
In the above screenshot, wrap text is not applied to A1 to E1, and the header in C1 has a lot of space. If I manually click Wrap Text it gets aligned; otherwise none of the header is formatted with text wrap.

A couple of problems:
Your code is correctly attempting to format the header, but when you create your file using .to_excel() you are telling it to start at row/col 1, 1. The cells though are numbered from 0, 0. So if you change to:
df.to_excel(writer,sheet_name='Sheet1', startrow=0, startcol=0, header=True, index=False, encoding='utf8')
You will see col A and row 1 are both formatted:
i.e. Col A is 0 and Row 1 is 0
When using Pandas to write the header, it applies its own format which will overwrite the formatting you have provided. To get around this, turn off headers and get it to only write the data from row 1 onwards and write the header manually.
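To make the zero-based numbering from the first point concrete, here is a minimal sketch that converts zero-based (row, col) pairs to Excel's A1 notation. The helper name `rowcol_to_a1` is hypothetical and written only for illustration; XlsxWriter ships its own `xl_rowcol_to_cell` (already imported in the question's code), which is the better choice in real code.

```python
def rowcol_to_a1(row, col):
    """Convert a zero-based (row, col) pair to Excel A1 notation.

    Hypothetical illustration helper, not part of pandas or XlsxWriter.
    """
    letters = ""
    col += 1  # switch to one-based for the base-26 letter conversion
    while col > 0:
        col, rem = divmod(col - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return f"{letters}{row + 1}"


print(rowcol_to_a1(0, 0))  # startrow=0, startcol=0 is the top-left cell
print(rowcol_to_a1(1, 1))  # startrow=1, startcol=1 skips row 1 and column A
```

With `startrow=1, startcol=1` the table therefore begins one row down and one column across, which is why formatting applied to row 0 did not line up with the header.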
The following might be a bit clearer:
import pandas as pd
import pandas.io.formats.style
import os
from pandas import ExcelWriter
import numpy as np
from xlsxwriter.utility import xl_rowcol_to_cell
writer = pd.ExcelWriter('test1.xlsx', engine='xlsxwriter', options={'strings_to_numbers': True}, date_format='mmmm dd yyyy')
#df = pd.read_csv("D:\\Users\\u700216\\Desktop\\Reports\\CD_Counts.csv")
df = pd.read_csv("CD_Counts.csv")
df.to_excel(writer, sheet_name='Sheet1', startrow=1 , startcol=0, header=False, index=False, encoding='utf8')
workbook = writer.book
worksheet = writer.sheets['Sheet1']
format_header = workbook.add_format()
format_header.set_align('center')
format_header.set_bold()
format_header.set_text_wrap()
format_header.set_border()
format_data = workbook.add_format()
format_data.set_align('center')
format_data.set_text_wrap()
worksheet.set_column('A:Z', 20, format_data)
worksheet.set_row(0, 40, format_header)
# Write the header manually
for colx, value in enumerate(df.columns.values):
    worksheet.write(0, colx, value)
writer.save()
Which would give you:
Note: It is also possible to tell Pandas the style to use, or to force it to None so it will inherit your own style. The only drawback with that approach is that the method required to do that depends on the version of Pandas that is being used. This approach works for all versions.
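For readers on a recent pandas, the same idea can be sketched with the writer used as a context manager, which saves the file on exit. This is only a sketch under assumptions: the sample dataframe, the temporary output path and the name `header_fmt` are made up for illustration, and it assumes pandas and XlsxWriter are both installed.

```python
import os
import tempfile

import pandas as pd

# Made-up sample data standing in for CD_Counts.csv
df = pd.DataFrame({"A long header that needs wrapping": [1, 2], "B": ["x", "y"]})
out_path = os.path.join(tempfile.mkdtemp(), "wrapped.xlsx")

with pd.ExcelWriter(out_path, engine="xlsxwriter") as writer:
    # Write only the data, starting one row down, so pandas does not style the header
    df.to_excel(writer, sheet_name="Sheet1", startrow=1, header=False, index=False)
    workbook = writer.book
    worksheet = writer.sheets["Sheet1"]

    header_fmt = workbook.add_format({"bold": True, "text_wrap": True, "align": "center"})
    data_fmt = workbook.add_format({"text_wrap": True, "align": "center"})
    worksheet.set_column("A:Z", 20, data_fmt)

    # Write the header manually, passing the format to each cell explicitly
    for col, name in enumerate(df.columns):
        worksheet.write(0, col, name, header_fmt)
```

Passing the format directly to `worksheet.write()` makes the header styling independent of any row or column defaults.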

Related

Problem with different indexing in pandas and xlsxwriter

Here is the code which works fine:
import pandas as pd
# Create a Pandas dataframe from some data.
df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_conditional.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Apply a conditional format to the cell range.
worksheet.conditional_format(1,1,1,1, {'type': '3_color_scale'}) ##CHANGES THE COLOR OF SECOND ROW
# Close the Pandas Excel writer and output the Excel file.
writer.save()
This creates below output.
My question is, Is it possible to include the Header Data in the pandas indexing? I want to start indexing from 1st row. So Header Data should have index 0. It is useful because in xlsxwriter 1st row has index 0.

Firstly, the index is an object, and the default index starts from 0. You can shift it by typing:
df.index += 1
As for the header's index name, the pandas method to_excel takes an argument called index_label. So your code should be:
import pandas as pd
# Create a Pandas dataframe from some data.
df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})
df.index += 1
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_conditional.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1', index_label='0')
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Apply a conditional format to the cell range.
worksheet.conditional_format(1,1,1,1, {'type': '3_color_scale'}) ##CHANGES THE COLOR OF SECOND ROW
# Close the Pandas Excel writer and output the Excel file.
writer.save()
Output:

Pandas 0.24 core header style not working [duplicate]

I'm saving pandas DataFrame to_excel using xlsxwriter. I've managed to format all of my data (set column width, font size etc) except for changing header's font and I can't find the way to do it. Here's my example:
import pandas as pd
data = pd.DataFrame({'test_data': [1,2,3,4,5]})
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
data.to_excel(writer, sheet_name='test', index=False)
workbook = writer.book
worksheet = writer.sheets['test']
font_fmt = workbook.add_format({'font_name': 'Arial', 'font_size': 10})
header_fmt = workbook.add_format({'font_name': 'Arial', 'font_size': 10, 'bold': True})
worksheet.set_column('A:A', None, font_fmt)
worksheet.set_row(0, None, header_fmt)
writer.save()
The penultimate line that tries to set format for the header does nothing.

I think you need first reset default header style, then you can change it:
pd.core.format.header_style = None
All together:
import pandas as pd
data = pd.DataFrame({'test_data': [1,2,3,4,5]})
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
pd.core.format.header_style = None
data.to_excel(writer, sheet_name='test', index=False)
workbook = writer.book
worksheet = writer.sheets['test']
font_fmt = workbook.add_format({'font_name': 'Arial', 'font_size': 10})
header_fmt = workbook.add_format({'font_name': 'Arial', 'font_size': 10, 'bold': True})
worksheet.set_column('A:A', None, font_fmt)
worksheet.set_row(0, None, header_fmt)
writer.save()
Explanation by jmcnamara, thank you:
In Excel a cell format overrides a row format, which overrides a column format. The pd.core.format.header_style is converted to a format and is applied to each cell in the header. As such, the default cannot be overridden by set_row(). Setting pd.core.format.header_style to None means that the header cells don't have a user-defined format and thus it can be overridden by set_row().
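The precedence rule described above can be modelled in a few lines of plain Python. This is a simplified sketch, not how Excel or XlsxWriter is implemented: `effective_format` is a hypothetical function, and formats are represented as plain strings.

```python
def effective_format(cell_fmt=None, row_fmt=None, col_fmt=None):
    """Return the format that wins: cell beats row, row beats column."""
    for fmt in (cell_fmt, row_fmt, col_fmt):
        if fmt is not None:
            return fmt
    return None


# pandas gives every header cell its own format, so set_row() has no visible effect
default_case = effective_format(cell_fmt="pandas header style", row_fmt="header_fmt")

# with header_style set to None the header cells carry no format, so the row format applies
reset_case = effective_format(cell_fmt=None, row_fmt="header_fmt", col_fmt="font_fmt")

print(default_case)
print(reset_case)
```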
EDIT: In version 0.18.1 you have to change
pd.core.format.header_style = None
to:
pd.formats.format.header_style = None
EDIT: from version 0.20 this changed again
import pandas.io.formats.excel
pandas.io.formats.excel.header_style = None
thanks krvkir.
EDIT: from version 0.24 this is now required
import pandas.io.formats.excel
pandas.io.formats.excel.ExcelFormatter.header_style = None
thanks Chris Vecchio.
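The version history in the edits above can be summarised as a small lookup. This is a sketch only: `header_style_target` is a hypothetical helper, the version boundaries are taken from the edits in this answer, and whether the 0.24 location still applies to later pandas releases is not confirmed by this thread.

```python
import re


def header_style_target(pandas_version):
    """Return the dotted path to set to None for a given pandas version string."""
    # Keep at most the first three numeric components, e.g. "0.20.1" -> (0, 20, 1)
    parts = tuple(int(p) for p in re.findall(r"\d+", pandas_version)[:3])
    if parts >= (0, 24):
        return "pandas.io.formats.excel.ExcelFormatter.header_style"
    if parts >= (0, 20):
        return "pandas.io.formats.excel.header_style"
    if parts >= (0, 18, 1):
        return "pandas.formats.format.header_style"
    return "pandas.core.format.header_style"


print(header_style_target("0.20.1"))
```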

An update for anyone who comes across this post and is using Pandas 0.20.1.
It seems the required code is now
import pandas.io.formats.excel
pandas.io.formats.excel.header_style = None
Apparently the excel submodule isn't imported automatically, so simply trying pandas.io.formats.excel.header_style = None alone will raise an AttributeError.

Another option for Pandas 0.25 (probably also 0.24). Likely not the best way to do it, but it worked for me.
import pandas.io.formats.excel
pandas.io.formats.excel.ExcelFormatter.header_style = None

for pandas 0.24:
The below doesn't work anymore:
import pandas.io.formats.excel
pandas.io.formats.excel.header_style = None
Instead, create a cell formatting object, and re-write the first row's content (your header) one cell at a time with the new cell formatting object.
Now, you are future proof.
Use the following pseudo code:
# [1] write df to excel as usual
writer = pd.ExcelWriter(path_output, engine='xlsxwriter')
df.to_excel(writer, sheet_name=sheet_name, index=False)
workbook = writer.book
worksheet = writer.sheets[sheet_name]
# [2] do formats for other rows and columns first
# [3] create a format object with your desired formatting for the header, let's name it: headercellformat
headercellformat = workbook.add_format({'bold': True})  # placeholder: substitute your own header formatting
# [4] write to the 0th (header) row **one cell at a time**, with columnname and format
for columnnum, columnname in enumerate(list(df.columns)):
    worksheet.write(0, columnnum, columnname, headercellformat)

In pandas 0.20 the solution of the accepted answer changed again.
The format that should be set to None can be found at:
pandas.io.formats.excel.header_style

If you do not want to change the pandas header style globally, you can alternatively pass header=False to to_excel:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(3, 5),
                  columns=pd.date_range('2019-01-01', periods=5, freq='M'))
file_path='output.xlsx'
writer = pd.ExcelWriter(file_path, engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', header=False, index=False )
workbook = writer.book
fmt = workbook.add_format({'num_format': 'mm/yyyy', 'bold': True})
worksheet = writer.sheets['Sheet1']
worksheet.set_row(0, None, fmt)
writer.save()

Unfortunately, add_format is not available anymore.

xlsxwriter changes the font style to bold for the first column within an excel document. How to disable this option?

Either openpyxl or xlsxwriter automatically changes the font style to bold when exporting a CSV file to Excel. It happens only for the first column. Do you know why, and how can I overcome this behavior?
import pandas as pd
import openpyxl
import xlsxwriter
from pandas import DataFrame
import time
from glob import iglob
data = pd.read_csv(next(iglob('*.csv')))
data = data.sort_values(by=['A'], ascending=False)
data.to_excel('out.xlsx',engine='xlsxwriter', index=False)

You can pass a custom pd.ExcelWriter instance to pd.DataFrame.to_excel() and explicitly disable any styles on the header row:
import pandas as pd
df = pd.DataFrame(data={'col_a': [1,2,3,4], 'col_b': [5,6,7,8]})
# add_format() is an XlsxWriter method, so request that engine explicitly
with pd.ExcelWriter('out.xlsx', engine='xlsxwriter') as writer:
    df.to_excel(writer, index=False)
    workbook = writer.book
    worksheet = writer.sheets['Sheet1']
    header_format = workbook.add_format({
        'bold': False,
        'border': False})
    for col_num, value in enumerate(df.columns.values):
        worksheet.write(0, col_num, value, header_format)
Edit [no longer supported in pandas 0.24]: alternatively, you can reset all pandas default header styling. Afterwards, just call df.to_excel() as usual.
import pandas.io.formats.excel
pandas.io.formats.excel.header_style = None

Format dates in Excel for comparison

I'm in the midst of writing an IPython notebook that will pull the contents of a .csv file and paste them into a specified tab of an .xlsx file. The tab in the .xlsx is filled with a bunch of pre-programmed formulas so that I can run an analysis on the original content of the .csv file.
I've run into a snag, however, with the date fields that I copy over from the .csv into the .xlsx file.
The dates do not get properly processed by the Excel formulas unless I double-click the date cells or apply Excel's "text to columns" function on the column of dates and set a tab as the delimiter (which I should note, does not split the cell).
I'm wondering if there's a way to either...
write a helper function that logs the keystrokes of applying the "text to columns" function call
write a helper function to double click and return down each row of the column of dates
from openpyxl import load_workbook
import pandas as pd
def transfer_hours(report_name, ER_hours_analysis_wb):
    df = pd.read_csv(report_name, index_col=0)
    book = load_workbook(ER_hours_analysis_wb)
    sheet_name = "ER Work Log"
    with pd.ExcelWriter("ER Hours Analysis 248112.xlsx",
                        engine='openpyxl') as writer:
        writer.book = book
        writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
        df.to_excel(writer, sheet_name=sheet_name,
                    startrow=1, startcol=0, engine='openpyxl')

Use the openpyxl module:
from openpyxl import load_workbook
wb = load_workbook(filename=filePath, read_only=False, data_only=False)
Setting data_only to False will return the formulas, whereas data_only=True returns the stored values rather than the formulas.

As great a tool as pandas is designed to be, in this case there may not be a reason to include it.
Here is a shorter structure for what you're trying to accomplish:
import csv
import datetime
from openpyxl import load_workbook
def transfer_hours(report_name, ER_hours_analysis_wb):
    wb = load_workbook(ER_hours_analysis_wb)
    ws = wb['ER Work Log']
    with open(report_name, 'rt', newline='') as csvfile:
        reader = csv.reader(csvfile, delimiter=',')
        # openpyxl rows and columns are 1-indexed, so start counting at 1
        for rownum, row in enumerate(reader, start=1):
            for colnum, col in enumerate(row, start=1):
                dttm = datetime.datetime.strptime(col, "%m/%d/%Y")
                ws.cell(column=colnum, row=rownum).value = dttm
    wb.save('new_spreadsheet.xlsx')
What you'll be able to do from here is break out which columns should have what format based on the position in the csv. Here is an example:
# openpyxl rows and columns are 1-indexed
for rownum, row in enumerate(reader, start=1):
    ws.cell(column=1, row=rownum, value=row[0])
    dttm = datetime.datetime.strptime(row[1], "%m/%d/%Y")
    ws.cell(column=2, row=rownum).value = dttm
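The per-column conversion can be tried without a workbook at all. The following is a minimal sketch using only the standard library; the sample CSV text, the `parse_rows` helper and its parameters are assumptions made for illustration, and the `%m/%d/%Y` format is the one used in the answer's code.

```python
import csv
import datetime
import io

# Made-up sample standing in for the real report file
sample = "task,date\nreview,03/15/2019\ndeploy,12/01/2019\n"


def parse_rows(text, date_col=1, fmt="%m/%d/%Y"):
    """Read CSV text and convert one column from strings to datetime objects."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # keep the header row as plain strings
    rows = []
    for row in reader:
        row = list(row)
        row[date_col] = datetime.datetime.strptime(row[date_col], fmt)
        rows.append(row)
    return header, rows


header, rows = parse_rows(sample)
print(header)
print(rows)
```

Writing real `datetime` values into the cells, rather than date-like strings, is what lets the Excel formulas treat them as dates.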
For reference:
https://openpyxl.readthedocs.io/en/stable/usage.html
In Python, how do I read a file line-by-line into a list?
How to format columns with headers using OpenPyXL

Xlsxwriter - Trouble formatting pandas dataframe cells using xlsxwriter

I have an Excel sheet which came from a pandas dataframe. I then use XlsxWriter to add formulas, new columns and formatting. The problem is I only seem to be able to format what I've written using XlsxWriter and nothing that came from the dataframe. So what I get is something like this half-formatted table.
As you can see from the image, the two columns from the dataframe remain untouched. They must have some kind of default formatting that is overriding mine.
Since I don't know how to convert a worksheet back into a dataframe, the code below is obviously completely wrong, but it's just to give an idea of what I'm looking for.
export = "files/sharepointExtract.xlsx"
df = pd.read_excel(export)# df = dataframe
writer = pd.ExcelWriter('files/new_report-%s.xlsx' % (date.today()), engine = 'xlsxwriter')
workbook = writer.book
# Code to make the header red, this works fine because
# it's written in xlsxwriter using write.row()
colour_format = workbook.add_format()
colour_format.set_bg_color('#640000')
colour_format.set_font_color('white')
worksheet.set_row(0, 15, colour_format)
table_body_format = workbook.add_format()
table_body_format.set_bg_color('blue')
for row in worksheet.rows:
    row.set_row(0,15, table_body_format)
This code raises an AttributeError, but even without the for loop we just get what can be seen in the image.

The following should work:
import pandas as pd
from datetime import date
export = "files/sharepointExtract.xlsx"
df = pd.read_excel(export)
writer = pd.ExcelWriter('files/new_report-{}.xlsx'.format(date.today()), engine ='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', startrow=1 , startcol=0, header=False, index=False, encoding='utf8')
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Code to make the header red background with white text
colour_format = workbook.add_format()
colour_format.set_bg_color('#640000')
colour_format.set_font_color('white')
# Code to make the body blue
table_body_format = workbook.add_format()
table_body_format.set_bg_color('blue')
# Set the header (row 0) to height 15 using colour_format
worksheet.set_row(0, 15, colour_format)
# Set the default format for other rows
worksheet.set_column('A:Z', 15, table_body_format)
# Write the header manually
for colx, value in enumerate(df.columns.values):
    worksheet.write(0, colx, value)
writer.save()
When Pandas is used to write the header, it uses its own format style which overwrites the underlying xlsxwriter version. The simplest approach is to stop it from writing the header and get it to write the rest of the data from row 1 onwards (not 0). This avoids the formatting from being altered. You can then easily write your own header using the column values from the dataframe.
