Pandas DataFrame to Excel: Vertical Alignment of Index - python

Given the following data frame:
import pandas as pd
d=pd.DataFrame({'a':['a','a','b','b'],
'b':['a','b','c','d'],
'c':[1,2,3,4]})
d=d.groupby(['a','b']).sum()
d
I'd like to export this with the same alignment with respect to the index (see how the left-most column is centered vertically?).
The rub is that when exporting this to Excel, the left column is aligned to the top of each cell:
writer = pd.ExcelWriter('pandas_out.xlsx', engine='xlsxwriter')
workbook = writer.book
f=workbook.add_format({'align': 'vcenter'})
d.to_excel(writer, sheet_name='Sheet1')
writer.save()
...produces...
Is there any way to center column A vertically via XLSX Writer or another library?
Thanks in advance!

You are trying to change the formatting of the header, so you should first reset the default header settings:
from pandas.io.formats.excel import ExcelFormatter
ExcelFormatter.header_style = None
Then apply the formatting as required:
cell_format = workbook.add_format()
cell_format.set_align('center')
cell_format.set_align('vcenter')
worksheet.set_column('A:C', 5, cell_format)
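Here is the complete working code:
import pandas as pd
from pandas.io.formats.excel import ExcelFormatter
d = pd.DataFrame({'a': ['a', 'a', 'b', 'b'],
                  'b': ['a', 'b', 'c', 'd'],
                  'c': [1, 2, 3, 4]})
d = d.groupby(['a', 'b']).sum()
# Reset the default header style so the column format below is not overridden
# (older pandas versions exposed this setting under a different module path).
ExcelFormatter.header_style = None
writer = pd.ExcelWriter('pandas_out.xlsx', engine='xlsxwriter')
workbook = writer.book
d.to_excel(writer, sheet_name='Sheet1')
worksheet = writer.sheets['Sheet1']
cell_format = workbook.add_format()
cell_format.set_align('center')
cell_format.set_align('vcenter')
worksheet.set_column('A:C', 5, cell_format)
writer.close()  # writer.save() in older pandas versions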

Related

Export dataframe to excel file using xlsxwriter

My code produces dataframes that I need to export to an Excel file. I can use pandas for that, but the worksheet must run from right to left. I searched and found no way to change the direction with pandas itself, but I found that the xlsxwriter package can do it:
import xlsxwriter
workbook = xlsxwriter.Workbook('output.xlsx')
worksheet1 = workbook.add_worksheet()
format_right_to_left = workbook.add_format({'reading_order': 2})
worksheet1.set_column('A:A', 20)
worksheet1.right_to_left()
worksheet1.write(new_df)
workbook.close()
But I don't know how to export the dataframe using this approach.
Snapshot to clarify the directions:
Also, I have used multiple lines to set up the format:
myformat = workbook.add_format()
myformat.set_reading_order(2)
myformat.set_align('center')
myformat.set_align('vcenter')
Is it possible to make these lines shorter, for example by using a dictionary?
You can do this:
import pandas as pd
writer = pd.ExcelWriter('pandas_excel.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1')  # Assuming you already have a `df`
workbook = writer.book
worksheet = writer.sheets['Sheet1']
format_right_to_left = workbook.add_format({'reading_order': 2})
worksheet.right_to_left()
writer.close()  # writer.save() in older pandas versions
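As for the dictionary question: XlsxWriter's `add_format()` accepts the format properties as one dictionary, so the separate `set_*()` calls can be collapsed. The following is only a minimal sketch. The sample data, the temporary output path, and the column width of 20 are assumptions made for illustration, and the column format only reaches cells that pandas wrote without a format of their own (the data cells here, not the header).

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data standing in for the asker's `new_df`.
new_df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})
out_path = os.path.join(tempfile.mkdtemp(), "output_rtl.xlsx")

with pd.ExcelWriter(out_path, engine="xlsxwriter") as writer:
    new_df.to_excel(writer, sheet_name="Sheet1", index=False)
    workbook = writer.book
    worksheet = writer.sheets["Sheet1"]

    # One dictionary instead of several set_*() calls.
    cell_format = workbook.add_format(
        {"reading_order": 2, "align": "center", "valign": "vcenter"}
    )

    # Switch the sheet to right-to-left layout.
    worksheet.right_to_left()

    # Apply the format as a column default to every data column.
    worksheet.set_column(0, new_df.shape[1] - 1, 20, cell_format)
```

The `with` block closes the writer on exit, so no explicit `close()` call is needed.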

Add formats to dataframe and insert in excel using python?

I have to insert data from a dataframe into Excel with borders, and all values in the dataframe should be centered. I tried applying formatting to the cells, but it does not work:
df1.to_excel(writer,index=False,header=True,startrow=12,sheet_name='Sheet1')
writer.close()
writer=pd.ExcelWriter(s, engine="xlsxwriter")
writer.book = load_workbook(s)
workbooks= writer.book
worksheet = workbooks['Sheet1']
f1= workbooks.add_format()
worksheet.conditional_format(12,0,len(df1)+1,7,{'format':f1})
Can you please help me with this?
I'm not going to lie: I've done this for the first time right now, so this might not be a very good solution. I'm using openpyxl because it seems more flexible to me than XlsxWriter. I hope you can use it too.
My assumption is that the variable file_name contains a valid file name.
First your Pandas step:
with pd.ExcelWriter(file_name, engine='xlsxwriter') as writer:
    df1.to_excel(writer, index=False, header=True, startrow=12, sheet_name='Sheet1')
Then the necessary imports from openpyxl:
from openpyxl import load_workbook
from openpyxl.styles import NamedStyle, Alignment, Border, Side
Loading the workbook and selecting the worksheet:
wb = load_workbook(file_name)
ws = wb['Sheet1']
Defining the required style:
centered_with_frame = NamedStyle('centered_with_frame')
centered_with_frame.alignment = Alignment(horizontal='center')
bd = Side(style='thin')
centered_with_frame.border = Border(left=bd, top=bd, right=bd, bottom=bd)
Selecting the relevant cells:
cells = ws[ws.cell(row=12+1, column=1).coordinate:
ws.cell(row=12+1+df1.shape[0], column=df1.shape[1]).coordinate]
Applying the defined style to the selected cells:
for row in cells:
    for cell in row:
        cell.style = centered_with_frame
Finally saving the workbook:
wb.save(file_name)
As I said: This might not be optimal.

Problem with different indexing in pandas and xlsxwriter

Here is the code which works fine:
import pandas as pd
# Create a Pandas dataframe from some data.
df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_conditional.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Apply a conditional format to the cell range.
worksheet.conditional_format(1,1,1,1, {'type': '3_color_scale'}) ##CHANGES THE COLOR OF SECOND ROW
# Close the Pandas Excel writer and output the Excel file.
writer.save()
This creates below output.
My question is: is it possible to include the header Data in the pandas indexing? I want indexing to start from the first row, so the header Data should have index 0. This would be useful because in xlsxwriter the first row has index 0.
First, the index is an object, and the default index starts from 0. You can shift it by typing:
df.index += 1
As for the header's index name, the pandas method to_excel takes an argument called index_label. So your code should be:
import pandas as pd
# Create a Pandas dataframe from some data.
df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})
df.index += 1
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_conditional.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1', index_label='0')
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Apply a conditional format to the cell range.
worksheet.conditional_format(1,1,1,1, {'type': '3_color_scale'}) ##CHANGES THE COLOR OF SECOND ROW
# Close the Pandas Excel writer and output the Excel file.
writer.close()  # writer.save() in older pandas versions
Output:
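Another way to keep the two numbering schemes straight, without changing the DataFrame's index, is to convert positions explicitly. The helper below is a hypothetical sketch: the name `worksheet_row` and its parameters are not part of pandas or XlsxWriter. It assumes that `to_excel` writes a single header row, which is the case for a DataFrame with ordinary single-level columns.

```python
def worksheet_row(position, startrow=0, header_rows=1):
    """Map a zero-based DataFrame row position to a zero-based worksheet row.

    Hypothetical helper: `startrow` mirrors the `startrow` argument of
    `DataFrame.to_excel`, and `header_rows` is the number of rows the
    header occupies (1 for a plain, single-level header).
    """
    return startrow + header_rows + position


# With the defaults, the first data row (position 0) lands on worksheet row 1,
# because worksheet row 0 holds the header.
first = worksheet_row(0)
last = worksheet_row(6)  # seventh data row of the 7-row example frame
```

These values can then be passed as the first and last row of `worksheet.conditional_format(first, 1, last, 1, {...})`. Column 1 is the `Data` column because column 0 holds the index.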

How to color text in a cell containing a specific string using pandas

After running my algorithms I saved all the data in an excel file using pandas.
writer = pd.ExcelWriter('Diff.xlsx', engine='xlsxwriter')
Now, some of the cells contain strings which includes "-->" in it. I have the row and column number for those cells using:
xl_rowcol_to_cell(rows[i],cols[i])
But I couldn't figure out how to color those cells, or at least the whole text in them.
Any suggestions/tips?
You could use a conditional format in Excel like this:
import pandas as pd
# Create a Pandas dataframe from some data.
df = pd.DataFrame({'Data': ['foo', 'a --> b', 'bar']})
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_conditional.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Add a format. Light red fill with dark red text.
format1 = workbook.add_format({'bg_color': '#FFC7CE',
'font_color': '#9C0006'})
# Apply a conditional format to the cell range.
worksheet.conditional_format(1, 1, len(df), 1,
{'type': 'text',
'criteria': 'containing',
'value': '-->',
'format': format1})
# Close the Pandas Excel writer and output the Excel file.
writer.close()  # writer.save() in older pandas versions
Output:
See Adding Conditional Formatting to Dataframe output in the XlsxWriter docs.
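def highlight(row):
    # With axis=1 the function receives one row at a time as a Series.
    # Highlight the whole row when column 'c' contains '-->'.
    color = 'yellow' if '-->' in str(row['c']) else 'white'
    return ['background-color: {}'.format(color)] * len(row)
df.style.apply(highlight, axis=1)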

Xlsxwriter - Trouble formatting pandas dataframe cells using xlsxwriter

I have an Excel sheet which came from a pandas dataframe. I then use Xlsxwriter to add formulas, new columns and formatting. The problem is I only seem to be able to format what I've written using xlsxwriter and nothing that came from the dataframe. So what I get is something like this half-formatted table.
As you can see from the image the two columns from the dataframe remain untouched. They must have some kind of default formatting that is overriding mine.
Since I don't know how to convert a worksheet back into a dataframe, the code below is obviously completely wrong, but it's just to give an idea of what I'm looking for.
export = "files/sharepointExtract.xlsx"
df = pd.read_excel(export)# df = dataframe
writer = pd.ExcelWriter('files/new_report-%s.xlsx' % (date.today()), engine = 'xlsxwriter')
workbook = writer.book
# Code to make the header red, this works fine because
# it's written in xlsxwriter using write.row()
colour_format = workbook.add_format()
colour_format.set_bg_color('#640000')
colour_format.set_font_color('white')
worksheet.set_row(0, 15, colour_format)
table_body_format = workbook.add_format()
table_body_format.set_bg_color('blue')
for row in worksheet.rows:
    row.set_row(0,15, table_body_format)
This code gives an Attribute error but even without the for loop we just get what can be seen in the image.
The following should work:
import pandas as pd
from datetime import date
export = "files/sharepointExtract.xlsx"
df = pd.read_excel(export)
writer = pd.ExcelWriter('files/new_report-{}.xlsx'.format(date.today()), engine ='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', startrow=1, startcol=0, header=False, index=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Code to make the header red background with white text
colour_format = workbook.add_format()
colour_format.set_bg_color('#640000')
colour_format.set_font_color('white')
# Code to make the body blue
table_body_format = workbook.add_format()
table_body_format.set_bg_color('blue')
# Set the header (row 0) to height 15 using colour_format
worksheet.set_row(0, 15, colour_format)
# Set the default format for other rows
worksheet.set_column('A:Z', 15, table_body_format)
# Write the header manually
for colx, value in enumerate(df.columns.values):
    worksheet.write(0, colx, value)
writer.close()  # writer.save() in older pandas versions
When Pandas is used to write the header, it uses its own format style which overwrites the underlying xlsxwriter version. The simplest approach is to stop it from writing the header and get it to write the rest of the data from row 1 onwards (not 0). This avoids the formatting from being altered. You can then easily write your own header using the column values from the dataframe.
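The same idea can be sketched in a self-contained form that passes the header format directly to `worksheet.write()` rather than relying on a row format. This is only an illustration under assumptions: the sample frame, the temporary output path, and the names `header_format` and `body_format` are made up here and are not from the original post.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the data read from the asker's spreadsheet.
df = pd.DataFrame({"Title": ["x", "y"], "Count": [1, 2]})
report_path = os.path.join(tempfile.mkdtemp(), "new_report.xlsx")

with pd.ExcelWriter(report_path, engine="xlsxwriter") as writer:
    # Let pandas write only the body, starting one row down.
    df.to_excel(writer, sheet_name="Sheet1", startrow=1, header=False, index=False)
    workbook = writer.book
    worksheet = writer.sheets["Sheet1"]

    header_format = workbook.add_format({"bg_color": "#640000", "font_color": "white"})
    body_format = workbook.add_format({"bg_color": "blue"})

    # Column default format, picked up by the body cells that pandas
    # wrote without a cell format (plain strings and integers here).
    worksheet.set_column(0, len(df.columns) - 1, 15, body_format)

    # Write the header ourselves, passing the format explicitly.
    for col_idx, name in enumerate(df.columns):
        worksheet.write(0, col_idx, name, header_format)
```

Passing the format to `write()` keeps the header styling limited to the cells that actually hold column names, whereas a row format extends across the entire row.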
