Applying conditional formatting to excel column from pandas dataframe - python

I'm trying to make an Excel document with multiple sheets and apply conditional formatting to select columns in each sheet. However, for some reason the conditional formatting is not applied when I open the file.
newexcelfilename= 'ResponseData_'+date+'.xlsx'
exceloutput = "C:\\Users\\jimbo\\Desktop\\New folder (3)\\output\\"+newexcelfilename
print("Writing to Excel file...")
# Given a dict of pandas dataframes
dfs = {'Tracts': tracts_finaldf, 'Place':place_finaldf,'MCDs':MCD_finaldf,'Counties': counties_finaldf, 'Congressional Districts':cd_finaldf,'AIAs':aia_finaldf}
writer = pd.ExcelWriter(exceloutput, engine='xlsxwriter')
workbook = writer.book
## columns for 3 color scale formatting export out of pandas as text, need to convert to number format.
numberformat = workbook.add_format({'num_format': '00.0'})
## manually applying header format
header_format = workbook.add_format({
'bold': True,
'text_wrap': False,
'align': 'left',
})
for sheetname, df in dfs.items(): # loop through `dict` of dataframes
    df.to_excel(writer, sheet_name=sheetname, startrow=1, header=False, index=False) # send df to writer
    worksheet = writer.sheets[sheetname] # pull worksheet object
    for col_num, value in enumerate(df.columns.values):
        worksheet.write(0, col_num, value, header_format)
    for idx, col in enumerate(df): # loop through all columns
        series = df[col]
        col_len = len(series.name) # len of column name/header
        worksheet.set_column(idx, idx, col_len)
        if col in ['Daily Internet Response Rate (%)',
                   'Daily Response Rate (%)',
                   'Cumulative Internet Response Rate (%)',
                   'Cumulative Response Rate (%)']:
            worksheet.set_column(idx, idx, col_len, numberformat)
        if col == 'DATE':
            worksheet.set_column(idx, idx, 10)
        if col == 'ACO':
            worksheet.set_column(idx, idx, 5)
    ## applying conditional formatting to columns which were converted to the numberformat
    if worksheet == 'Tracts':
        worksheet.conditional_format('E2:H11982', {'type':'3_color_scale',
                                                   'min_color': 'FF5733',
                                                   'mid_color':'FFB233',
                                                   'max_color': 'C7FF33',
                                                   'min_value': 0,
                                                   'max_vallue': 100})
writer.save()
Everything else in the code works: the column widths are resized and the numeric format is applied to the specified columns. However, I cannot get the conditional formatting to apply.
I've searched the other questions on Stack Exchange but I cannot find an answer.

You have a few errors in the conditional format options: the colours are not given in HTML format (for example '#FF5733' rather than 'FF5733'), and max_value is misspelled as max_vallue. Once those are fixed it should work. Here is a smaller working example based on yours:
import pandas as pd
# Create a Pandas dataframe from some data.
df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_conditional.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Apply a conditional format to the cell range.
worksheet.conditional_format('B2:B8',
{'type': '3_color_scale',
'min_color': '#FF5733',
'mid_color': '#FFB233',
'max_color': '#C7FF33',
'min_value': 0,
'max_value': 100})
# Close the Pandas Excel writer and output the Excel file.
writer.close()
Also, this line:
if worksheet == 'Tracts':
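Should probably be the following, since worksheet is a worksheet object rather than the sheet name, so comparing it to 'Tracts' is never true: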
if sheetname == 'Tracts':
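Mistakes like a misspelled option key or a colour without the leading '#' are easy to miss by eye. The following is a minimal sketch of a pre-flight check that could be run on the options dict before it is passed to conditional_format(). The helper name check_color_scale_options and the KNOWN_KEYS set are hypothetical and are not part of XlsxWriter; the set only covers the '3_color_scale' options discussed here.

```python
import difflib
import re

# Option names relevant to '3_color_scale' in this thread. This set is an
# assumption made for the sketch, not the library's full list of options.
KNOWN_KEYS = {
    'type', 'min_type', 'mid_type', 'max_type',
    'min_value', 'mid_value', 'max_value',
    'min_color', 'mid_color', 'max_color',
}
HEX_COLOR = re.compile(r'^#[0-9A-Fa-f]{6}$')


def check_color_scale_options(options):
    """Return a list of human-readable problems found in an options dict."""
    problems = []
    for key, value in options.items():
        if key not in KNOWN_KEYS:
            # Suggest the closest known key for likely typos.
            guess = difflib.get_close_matches(key, KNOWN_KEYS, n=1)
            hint = " (did you mean '{}'?)".format(guess[0]) if guess else ''
            problems.append("unknown key '{}'{}".format(key, hint))
        elif key.endswith('_color') and not HEX_COLOR.match(str(value)):
            problems.append(
                "'{}' should look like '#RRGGBB', got '{}'".format(key, value))
    return problems


# The options from the question: three colours without '#' and one typo.
question_options = {'type': '3_color_scale',
                    'min_color': 'FF5733',
                    'mid_color': 'FFB233',
                    'max_color': 'C7FF33',
                    'min_value': 0,
                    'max_vallue': 100}
for problem in check_color_scale_options(question_options):
    print(problem)
```

An empty list means the dict passed these two checks only; it does not guarantee that Excel will render the format as intended.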

Related

Excel Table Mis-Aligned Column headers

I have the following code that takes a dataframe dem and creates a formal table in Excel with the data contained in the dataframe. The one issue I am running into is that the headers are not aligned because of the index column: in the output, Student sits over the index column and every header after it is shifted.
Where is the issue in this code? Do I need to reset the index or something?
destination = shutil.copy2(demnote, dnotearch)
writer = pd.ExcelWriter(demnote, engine='xlsxwriter')
dem.to_excel(writer,"Demand")
workbook = writer.book
worksheet_table_header = writer.sheets['Demand']
end_row = len(dem.index)
end_column = len(dem.columns)
cell_range = xlsxwriter.utility.xl_range(0, 0, end_row, end_column)
header = [{'header': c} for c in dem.columns.tolist()]
worksheet_table_header.add_table(cell_range,{'header_row': True, 'columns':header, 'style':'Table Style Medium 11'})
worksheet_table_header.freeze_panes(1, 1)
writer.save()
writer.close()
You need to turn off the index when writing the dataframe, and you should probably also turn off the header row and skip one row, since the table overwrites it:
dem.to_excel(writer, sheet_name="Demand", startrow=1, header=False, index=False)
Here is a full working example from the XlsxWriter docs:
import pandas as pd
# Create a Pandas dataframe from some data.
df = pd.DataFrame({
'Country': ['China', 'India', 'United States', 'Indonesia'],
'Population': [1404338840, 1366938189, 330267887, 269603400],
'Rank': [1, 2, 3, 4]})
# Order the columns if necessary.
df = df[['Rank', 'Country', 'Population']]
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_table.xlsx', engine='xlsxwriter')
# Write the dataframe data to XlsxWriter. Turn off the default header and
# index and skip one row to allow us to insert a user defined header.
df.to_excel(writer, sheet_name='Sheet1', startrow=1, header=False, index=False)
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Get the dimensions of the dataframe.
(max_row, max_col) = df.shape
# Create a list of column headers, to use in add_table().
column_settings = [{'header': column} for column in df.columns]
# Add the Excel table structure. Pandas will add the data.
worksheet.add_table(0, 0, max_row, max_col - 1, {'columns': column_settings})
# Make the columns wider for clarity.
worksheet.set_column(0, max_col - 1, 12)
# Close the Pandas Excel writer and output the Excel file.
writer.close()
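If the index has to stay in the sheet, the table range and the header list both need one extra leading column. The sketch below shows that arithmetic in isolation; table_bounds and table_headers are hypothetical helper names, and index_label is an assumed label for the index column, not something taken from the question.

```python
def table_bounds(n_rows, n_cols, index=False):
    """Zero-indexed (first_row, first_col, last_row, last_col) for a table
    with one header row followed by n_rows data rows."""
    extra = 1 if index else 0  # a written index adds one leading column
    return (0, 0, n_rows, n_cols + extra - 1)


def table_headers(columns, index=False, index_label='Index'):
    """Header dicts in the shape expected by add_table()'s 'columns' option."""
    names = ([index_label] if index else []) + [str(c) for c in columns]
    return [{'header': name} for name in names]


# A 4-row, 3-column frame, as in the example above.
print(table_bounds(4, 3))              # (0, 0, 4, 2)
print(table_bounds(4, 3, index=True))  # (0, 0, 4, 3)
print(table_headers(['Rank', 'Country', 'Population'], index=True))
```

With index=False the bounds match the add_table(0, 0, max_row, max_col - 1, ...) call in the example.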

How to change the background color of Excel cells/rows using Pandas?

How should I set the color of a group of cells in Excel using Pandas?
import pandas as pd
import numpy as np
np.random.seed(24)
df = pd.DataFrame({'A': np.linspace(1, 10, 10)})
df = pd.concat([df, pd.DataFrame(np.random.randn(10, 4), columns=list('BCDE'))],
axis=1)
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
workbook = writer.book
worksheet = workbook.add_worksheet('Summary')
writer.sheets['Summary'] = worksheet
format = workbook.add_format({'bg_color': '#21AD25'})
worksheet.set_row(0, cell_format= format) # set the color of the first row to green
df.to_excel(writer, sheet_name='Summary', index=False)
writer.save()
The above code changes the color of the first row. However, the background color of the dataframe header is not changed.
As per the documentation, "Pandas writes the dataframe header with a default cell format. Since it is a cell format it cannot be overridden using set_row(). If you wish to use your own format for the headings then the best approach is to turn off the automatic header from Pandas and write your own." The solution is to pass header=False and startrow=1 to to_excel, which skips the automatic header and leaves the first row blank, and then loop over the column names and write each one with the format you want.
Full example using your code:
import pandas as pd
import numpy as np
np.random.seed(24)
df = pd.DataFrame({'A': np.linspace(1, 10, 10)})
df = pd.concat([df, pd.DataFrame(np.random.randn(10, 4), columns=list('BCDE'))],
axis=1)
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
workbook = writer.book
worksheet = workbook.add_worksheet('Summary')
writer.sheets['Summary'] = worksheet
# format = workbook.add_format({'bg_color': '#21AD25'})
# worksheet.set_row(0, cell_format= format) # set the color of the first row to green
df.to_excel(writer, sheet_name='Summary', index=False, startrow=1, header=False)
# Add a header format.
header_format = workbook.add_format({
'bg_color': '#21AD25', # your setting
'bold': True, # additional stuff...
'text_wrap': True,
'valign': 'top',
'align': 'center',
'border': 1})
# Write the column headers with the defined format.
for col_num, value in enumerate(df.columns.values):
    # index=False above, so the data starts in column 0 and no offset is needed.
    worksheet.write(0, col_num, value, header_format)
writer.close()
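The column at which each header is written depends on whether the index is also written, which is an easy place to be off by one. A minimal sketch of a helper that keeps the two in sync follows. The name write_header is hypothetical; it only assumes an object with a write(row, col, value, format) method, which an XlsxWriter worksheet provides. The Recorder class is a stand-in used here so the sketch runs without creating a file.

```python
def write_header(worksheet, columns, cell_format=None, row=0, index=False):
    """Write each column name into `row`, shifted right by one column when
    the DataFrame index is also written. Returns the cells written."""
    offset = 1 if index else 0
    cells = []
    for col_num, value in enumerate(columns):
        worksheet.write(row, col_num + offset, value, cell_format)
        cells.append((row, col_num + offset, value))
    return cells


class Recorder:
    """Stand-in for a worksheet that just records write() calls."""

    def __init__(self):
        self.calls = []

    def write(self, row, col, value, cell_format=None):
        self.calls.append((row, col, value, cell_format))


sheet = Recorder()
print(write_header(sheet, list('ABCDE')))              # columns 0..4
print(write_header(sheet, list('ABCDE'), index=True))  # columns 1..5
```

With a real workbook the call would be write_header(worksheet, df.columns, header_format), placed after to_excel(..., startrow=1, header=False, index=False).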

Problem with different indexing in pandas and xlsxwriter

Here is the code which works fine:
import pandas as pd
# Create a Pandas dataframe from some data.
df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_conditional.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Apply a conditional format to the cell range.
worksheet.conditional_format(1,1,1,1, {'type': '3_color_scale'}) ##CHANGES THE COLOR OF SECOND ROW
# Close the Pandas Excel writer and output the Excel file.
writer.save()
This creates below output.
My question is: is it possible to include the header row in the pandas indexing? I want the indexing to start from the first row, so that the header row has index 0. This would be useful because in xlsxwriter the first row has index 0.
Firstly, the index is an object and the default index starts from 0. You can shift it with:
df.index += 1
As for the label of the index column in the header row, the pandas to_excel method takes an argument called index_label. So your code should be:
import pandas as pd
# Create a Pandas dataframe from some data.
df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})
df.index += 1
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_conditional.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1', index_label='0')
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Apply a conditional format to the cell range.
worksheet.conditional_format(1,1,1,1, {'type': '3_color_scale'}) ##CHANGES THE COLOR OF SECOND ROW
# Close the Pandas Excel writer and output the Excel file.
writer.close()
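Another way to look at the mismatch is to leave the dataframe index alone and translate positions instead. The sketch below maps a zero-based dataframe position to a zero-based worksheet cell, given the to_excel options that shift the data. Both function names are hypothetical, and the mapping assumes a single-level header and index. XlsxWriter ships its own xl_rowcol_to_cell utility for the A1 conversion; to_a1 is written out here only to keep the sketch free of dependencies.

```python
def frame_to_sheet(df_row, df_col, startrow=0, startcol=0,
                   header=True, index=True):
    """Map a zero-based DataFrame position to a zero-based worksheet cell."""
    row = startrow + df_row + (1 if header else 0)  # header takes one row
    col = startcol + df_col + (1 if index else 0)   # index takes one column
    return row, col


def to_a1(row, col):
    """Convert zero-based (row, col) to A1 notation, e.g. (1, 1) -> 'B2'."""
    letters = ''
    n = col + 1
    while n:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord('A') + rem) + letters
    return '{}{}'.format(letters, row + 1)


# First data value of a frame written with the default to_excel() options.
cell = frame_to_sheet(0, 0)
print(cell, to_a1(*cell))  # (1, 1) B2
```

This is why conditional_format(1, 1, 1, 1, ...) in the code above lands on the first data value rather than on the header.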

How to color text in a cell containing a specific string using pandas

After running my algorithms I saved all the data in an excel file using pandas.
writer = pd.ExcelWriter('Diff.xlsx', engine='xlsxwriter')
Now, some of the cells contain strings that include "-->". I have the row and column numbers for those cells using:
xl_rowcol_to_cell(rows[i],cols[i])
But I couldn't figure out how to color those cells, or at least the whole text in them.
Any suggestions/tips?
You could use a conditional format in Excel like this:
import pandas as pd
# Create a Pandas dataframe from some data.
df = pd.DataFrame({'Data': ['foo', 'a --> b', 'bar']})
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_conditional.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Add a format. Light red fill with dark red text.
format1 = workbook.add_format({'bg_color': '#FFC7CE',
'font_color': '#9C0006'})
# Apply a conditional format to the cell range.
worksheet.conditional_format(1, 1, len(df), 1,
{'type': 'text',
'criteria': 'containing',
'value': '-->',
'format': format1})
# Close the Pandas Excel writer and output the Excel file.
writer.close()
See Adding Conditional Formatting to Dataframe output in the XlsxWriter docs.
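Alternatively, you could use the pandas Styler (this assumes the strings are in a column named 'c'):
def highlight(row):
    # With axis=1 the function receives one row at a time as a Series.
    # Highlight the whole row when the value in column 'c' contains "-->".
    color = 'yellow' if '-->' in str(row['c']) else 'white'
    return ['background-color: ' + color] * len(row)
df.style.apply(highlight, axis=1)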

Pandas ExcelWriter set_column fails to format numbers after DataFrame.to_excel used

I have tried the example code found on the xlsxwriter webpage at http://xlsxwriter.readthedocs.org/en/latest/example_pandas_column_formats.html
import pandas as pd
# Create a Pandas dataframe from some data.
df = pd.DataFrame({'Numbers': [1010, 2020, 3030, 2020, 1515, 3030, 4545],
'Percentage': [.1, .2, .33, .25, .5, .75, .45 ],
})
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter("pandas_column_formats.xlsx", engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Add some cell formats.
format1 = workbook.add_format({'num_format': '#,##0.00'})
format2 = workbook.add_format({'num_format': '0%'})
# Note: It isn't possible to format any cells that already have a format such
# as the index or headers or any cells that contain dates or datetimes.
# Set the column width and format.
worksheet.set_column('B:B', 18, format1)
# Set the format but not the column width.
worksheet.set_column('C:C', None, format2)
# Close the Pandas Excel writer and output the Excel file.
writer.save()
However it does not format the columns as expected - they simply appear unformatted (no numeric rounding, percentages.)
I am using Pandas 0.15.2. Has this changed recently in Pandas perhaps?
Any ideas would be welcome.
This seems to be fixed in Pandas 0.16. See https://github.com/jmcnamara/XlsxWriter/issues/204
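For reference, the two number formats in the question change only how a value is displayed; the stored number stays the same. As a rough illustration, Python's own format specs produce similar text. This is an approximation of what Excel shows for these two formats, not how XlsxWriter or Excel implement them.

```python
percentage = 0.33
number = 1010

# Rough Python equivalents of the two Excel number formats in the question.
as_percent = format(percentage, '.0%')  # similar to num_format '0%'
as_thousands = format(number, ',.2f')   # similar to num_format '#,##0.00'

print(as_percent)    # 33%
print(as_thousands)  # 1,010.00
```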
