Excel Table Mis-Aligned Column headers - python

I have the following code that takes a dataframe dem and creates a formal table in Excel with the data contained in the dataframe. The one issue that I am running into is the headers are not aligned because of the index column. Screenshot of the issue below - notice how Student is over the Index column and everything is misaligned because of this.
Where is the issue in this code? Do I need to reset the index or something?
destination = shutil.copy2(demnote, dnotearch)
writer = pd.ExcelWriter(demnote, engine='xlsxwriter')
dem.to_excel(writer,"Demand")
workbook = writer.book
worksheet_table_header = writer.sheets['Demand']
end_row = len(dem.index)
end_column = len(dem.columns)
cell_range = xlsxwriter.utility.xl_range(0, 0, end_row, end_column)
header = [{'header': c} for c in dem.columns.tolist()]
worksheet_table_header.add_table(cell_range,{'header_row': True, 'columns':header, 'style':'Table Style Medium 11'})
worksheet_table_header.freeze_panes(1, 1)
writer.save()
writer.close()

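You need to turn off the index when writing the dataframe, and you should probably also turn off the pandas header and skip the header row, since the table header overwrites it:
dem.to_excel(writer, sheet_name="Demand", startrow=1, header=False, index=False)
With the index gone, the table range should end at column len(dem.columns) - 1 rather than len(dem.columns).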
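To see why the index column shifts everything, it can help to work out the table range by hand. The sketch below is only an illustration: `table_bounds` is a hypothetical helper (not part of pandas or XlsxWriter), the sample data is made up, and it assumes a single header row and a single-level index.

```python
import pandas as pd


def table_bounds(df, index=True, first_row=0, first_col=0):
    """Return (first_row, first_col, last_row, last_col), all zero-indexed.

    Hypothetical helper: assumes one header row and a single-level index.
    """
    n_rows, n_cols = df.shape
    index_cols = 1 if index else 0
    last_row = first_row + n_rows  # header row plus n_rows data rows
    last_col = first_col + index_cols + n_cols - 1
    return first_row, first_col, last_row, last_col


# Made-up sample data standing in for the `dem` dataframe.
dem = pd.DataFrame({"Student": ["Ann", "Bob"], "Score": [90, 85]})

with_index = table_bounds(dem, index=True)
without_index = table_bounds(dem, index=False)
print(with_index)     # (0, 0, 2, 2): three worksheet columns
print(without_index)  # (0, 0, 2, 1): two worksheet columns
```

With the index written, the range spans one more column than there are entries in the header list, which is what pushes "Student" over the index column.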
Here is a full working example from the XlsxWriter docs:
import pandas as pd
# Create a Pandas dataframe from some data.
df = pd.DataFrame({
'Country': ['China', 'India', 'United States', 'Indonesia'],
'Population': [1404338840, 1366938189, 330267887, 269603400],
'Rank': [1, 2, 3, 4]})
# Order the columns if necessary.
df = df[['Rank', 'Country', 'Population']]
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_table.xlsx', engine='xlsxwriter')
# Write the dataframe data to XlsxWriter. Turn off the default header and
# index and skip one row to allow us to insert a user defined header.
df.to_excel(writer, sheet_name='Sheet1', startrow=1, header=False, index=False)
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Get the dimensions of the dataframe.
(max_row, max_col) = df.shape
# Create a list of column headers, to use in add_table().
column_settings = [{'header': column} for column in df.columns]
# Add the Excel table structure. Pandas will add the data.
worksheet.add_table(0, 0, max_row, max_col - 1, {'columns': column_settings})
# Make the columns wider for clarity.
worksheet.set_column(0, max_col - 1, 12)
# Close the Pandas Excel writer and output the Excel file.
writer.close()

Related

How to change the background color of Excel cells/rows using Pandas?

How should I set the color of a group of cells in Excel using Pandas?
import pandas as pd
import numpy as np
np.random.seed(24)
df = pd.DataFrame({'A': np.linspace(1, 10, 10)})
df = pd.concat([df, pd.DataFrame(np.random.randn(10, 4), columns=list('BCDE'))],
axis=1)
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
workbook = writer.book
worksheet = workbook.add_worksheet('Summary')
writer.sheets['Summary'] = worksheet
format = workbook.add_format({'bg_color': '#21AD25'})
worksheet.set_row(0, cell_format= format) # set the color of the first row to green
df.to_excel(writer, sheet_name='Summary', index=False)
writer.save()
The above code changes the color of the first row. However, the background color of the dataframe header is not changed.
As per the documentation, "Pandas writes the dataframe header with a default cell format. Since it is a cell format it cannot be overridden using set_row(). If you wish to use your own format for the headings then the best approach is to turn off the automatic header from Pandas and write your own." The solution is to pass header=False and startrow=1 to to_excel(), so that pandas skips the header and leaves the first row blank, and then loop over the column names and write each one to the first row with the format you like.
Full example using your code:
import pandas as pd
import numpy as np
np.random.seed(24)
df = pd.DataFrame({'A': np.linspace(1, 10, 10)})
df = pd.concat([df, pd.DataFrame(np.random.randn(10, 4), columns=list('BCDE'))],
axis=1)
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
workbook = writer.book
worksheet = workbook.add_worksheet('Summary')
writer.sheets['Summary'] = worksheet
# format = workbook.add_format({'bg_color': '#21AD25'})
# worksheet.set_row(0, cell_format= format) # set the color of the first row to green
df.to_excel(writer, sheet_name='Summary', index=False, startrow=1, header=False)
# Add a header format.
header_format = workbook.add_format({
'bg_color': '#21AD25', # your setting
'bold': True, # additional stuff...
'text_wrap': True,
'valign': 'top',
'align': 'center',
'border': 1})
# Write the column headers with the defined format.
for col_num, value in enumerate(df.columns.values):
    worksheet.write(0, col_num, value, header_format)  # no +1 offset because index=False
writer.close()
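The column offset for the header depends on whether the index was written. A minimal sketch of that logic follows; `write_header` and `RecordingSheet` are hypothetical names used only for illustration, and the stand-in class simply records calls so the snippet runs without creating a workbook. It assumes the worksheet object exposes `write(row, col, value, cell_format)`, as an XlsxWriter worksheet does.

```python
def write_header(worksheet, columns, cell_format, index=False, row=0):
    """Write column names into `row`, shifted right by one if the index was written."""
    offset = 1 if index else 0
    for col_num, name in enumerate(columns):
        worksheet.write(row, col_num + offset, name, cell_format)


class RecordingSheet:
    """Stand-in for a worksheet that only records write() calls."""

    def __init__(self):
        self.calls = []

    def write(self, row, col, value, cell_format=None):
        self.calls.append((row, col, value))


no_index = RecordingSheet()
write_header(no_index, list("ABCDE"), cell_format=None, index=False)

with_index = RecordingSheet()
write_header(with_index, list("ABCDE"), cell_format=None, index=True)

print(no_index.calls[0])    # (0, 0, 'A')
print(with_index.calls[0])  # (0, 1, 'A')
```

In real use the first argument would be the worksheet taken from `writer.sheets` and `cell_format` the result of `workbook.add_format(...)`.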

Use pandas to calculate month and week from a given date column in excel and append to another column in same sheet

I am working on a pandas program where I fetch rows from other Excel sheets and append them to the main file:
import pandas as pd
from openpyxl import load_workbook
#reading all three ticket excel sheets
df1 = pd.read_excel("sheet a.xlsx")
df2 = pd.read_excel("sheet b.xlsx")
df3 = pd.read_excel("sheet c.xlsx")
#Creating Pandas Excel writer using openpyxl as engine
writer = pd.ExcelWriter(r"main_excel.xlsx", engine = "openpyxl")
writer.book = load_workbook(r"main_excel.xlsx")
sheets = writer.book.sheetnames
reader1 = pd.read_excel(r"main_excel.xlsx", "sheet a")
reader2 = pd.read_excel(r"main_excel.xlsx", "sheet b")
reader3 = pd.read_excel(r"main_excel.xlsx", "sheet c")
df1.to_excel(writer, sheet_name =sheets[0], index = False, header = False,startrow=len(reader1)+1)
df2.to_excel(writer, sheet_name =sheets[2], index = False, header = False,startrow=len(reader2)+1)
df3.to_excel(writer, sheet_name =sheets[4], index = False, header = False,startrow=len(reader3)+1)
writer.save()
writer.close()
After writing the data to the Excel file, I have to calculate the month and the week number from the dates in the data and fill in the missing columns.
The data gets appended each week, so I would like to append the new values to the pre-existing columns.
Is there a way to do that in the program itself, without writing the formula in the Excel sheet?
You can convert your date column to datetimes using the code below:
df['Opened'] = pd.to_datetime(df['Opened'])
Then you can derive the other columns using:
df['Month'] = df['Opened'].dt.month_name()
df['Week'] = df['Opened'].dt.isocalendar().week  # Series.dt.week was removed in pandas 2.0
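As a small self-contained illustration of the two steps above, here is a sketch with made-up dates. The column name 'Opened' comes from the answer, the sample values are assumptions, and `isocalendar()` requires pandas 1.1 or later.

```python
import pandas as pd

# Made-up sample data; 'Opened' is the date column named in the answer.
df = pd.DataFrame({"Opened": ["2021-01-04", "2021-02-15", "2021-12-31"]})

# Parse the strings into datetimes so the .dt accessor is available.
df["Opened"] = pd.to_datetime(df["Opened"])

# Month name and ISO week number derived from the parsed dates.
df["Month"] = df["Opened"].dt.month_name()
df["Week"] = df["Opened"].dt.isocalendar().week.astype(int)

print(df)
```

Note that these are ISO 8601 week numbers, so dates in the first or last days of a year can belong to a week of the neighbouring year.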

Applying conditional formatting to excel column from pandas dataframe

I'm trying to make an Excel document with multiple sheets and apply conditional formatting to selected columns in each sheet; however, the conditional formatting is not applied when I open the file.
newexcelfilename= 'ResponseData_'+date+'.xlsx'
exceloutput = "C:\\Users\\jimbo\\Desktop\\New folder (3)\\output\\"+newexcelfilename
print("Writing to Excel file...")
# Given a dict of pandas dataframes
dfs = {'Tracts': tracts_finaldf, 'Place':place_finaldf,'MCDs':MCD_finaldf,'Counties': counties_finaldf, 'Congressional Districts':cd_finaldf,'AIAs':aia_finaldf}
writer = pd.ExcelWriter(exceloutput, engine='xlsxwriter')
workbook = writer.book
## columns for 3 color scale formatting export out of pandas as text, need to convert to number format.
numberformat = workbook.add_format({'num_format': '00.0'})
## manually applying header format
header_format = workbook.add_format({
'bold': True,
'text_wrap': False,
'align': 'left',
})
for sheetname, df in dfs.items(): # loop through `dict` of dataframes
    df.to_excel(writer, sheet_name=sheetname, startrow=1,header=False,index=False) # send df to writer
    worksheet = writer.sheets[sheetname] # pull worksheet object
    for col_num, value in enumerate(df.columns.values):
        worksheet.write(0, col_num, value, header_format)
    for idx, col in enumerate(df): # loop through all columns
        series = df[col]
        col_len = len(series.name) # len of column name/header
        worksheet.set_column(idx,idx,col_len)
        if col in ['Daily Internet Response Rate (%)',
                   'Daily Response Rate (%)',
                   'Cumulative Internet Response Rate (%)',
                   'Cumulative Response Rate (%)']:
            worksheet.set_column(idx,idx,col_len,numberformat)
        if col == 'DATE':
            worksheet.set_column(idx,idx,10)
        if col == 'ACO':
            worksheet.set_column(idx,idx,5)
    ## applying conditional formatting to columns which were converted to the numberformat
    if worksheet == 'Tracts':
        worksheet.conditional_format('E2:H11982', {'type':'3_color_scale',
                                                   'min_color': 'FF5733',
                                                   'mid_color':'FFB233',
                                                   'max_color': 'C7FF33',
                                                   'min_value': 0,
                                                   'max_vallue': 100})
writer.save()
Everything in the code works properly in terms of resizing column widths and applying the numeric format to the specified columns; however, I cannot get the conditional formatting to apply.
I've tried to search all other questions on Stack Exchange but I cannot find an answer.
You have a few errors in the conditional format options: the colours are not specified in HTML format (with a leading #), and max_value is misspelled as max_vallue. Once those are fixed it should work. Here is a smaller working example based on yours:
import pandas as pd
# Create a Pandas dataframe from some data.
df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_conditional.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Apply a conditional format to the cell range.
worksheet.conditional_format('B2:B8',
{'type': '3_color_scale',
'min_color': '#FF5733',
'mid_color': '#FFB233',
'max_color': '#C7FF33',
'min_value': 0,
'max_value': 100})
# Close the Pandas Excel writer and output the Excel file.
writer.close()
Also, this line:
if worksheet == 'Tracts':
Should probably be the following, since worksheet is a worksheet object and never compares equal to a string:
if sheetname == 'Tracts':
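One way to guard against both mistakes (colours without the leading # and a misspelled option key) is to build the options dict in one place. This is only a sketch: `html_color` and `three_color_scale` are hypothetical helpers, not XlsxWriter functions, and the dict keys simply mirror the ones used in the working example above.

```python
import re

# Six hex digits, with or without a leading '#'.
HEX_COLOR = re.compile(r"^#?([0-9A-Fa-f]{6})$")


def html_color(value):
    """Return `value` as '#RRGGBB', adding the leading '#' if it is missing."""
    match = HEX_COLOR.match(value)
    if match is None:
        raise ValueError(f"not a 6-digit hex colour: {value!r}")
    return "#" + match.group(1).upper()


def three_color_scale(min_color, mid_color, max_color, min_value=0, max_value=100):
    """Build an options dict with the same keys as the example above."""
    return {
        "type": "3_color_scale",
        "min_color": html_color(min_color),
        "mid_color": html_color(mid_color),
        "max_color": html_color(max_color),
        "min_value": min_value,
        "max_value": max_value,
    }


options = three_color_scale("FF5733", "FFB233", "C7FF33")
print(options["min_color"])  # #FF5733
```

Inside the sheet loop, the dict would then be passed along the lines of `worksheet.conditional_format('E2:H11982', options)` under the `if sheetname == 'Tracts':` check.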

Problem with different indexing in pandas and xlsxwriter

Here is the code which works fine:
import pandas as pd
# Create a Pandas dataframe from some data.
df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_conditional.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Apply a conditional format to the cell range.
worksheet.conditional_format(1,1,1,1, {'type': '3_color_scale'}) ##CHANGES THE COLOR OF SECOND ROW
# Close the Pandas Excel writer and output the Excel file.
writer.save()
This creates below output.
My question is: is it possible to include the header row in the pandas indexing? I want indexing to start from the first row, so the header row should have index 0. This would be useful because in xlsxwriter the first row has index 0.
First, the index is an object, and the default index starts from 0. You can shift it by one with:
df.index += 1
As for the label of the index column in the header row, the pandas to_excel method takes an argument called index_label. So your code should be:
import pandas as pd
# Create a Pandas dataframe from some data.
df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})
df.index += 1
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_conditional.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1', index_label='0')
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Apply a conditional format to the cell range.
worksheet.conditional_format(1,1,1,1, {'type': '3_color_scale'}) ##CHANGES THE COLOR OF SECOND ROW
# Close the Pandas Excel writer and output the Excel file.
writer.close()
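Note that shifting the index only changes the labels shown in the first column; it does not change which worksheet row each dataframe row lands in. A rough sketch of the mapping from dataframe position to zero-indexed worksheet coordinates is given below. The function names are hypothetical, and it assumes a single header row and a single-level index, with `startrow`, `startcol`, `header` and `index` meaning the same as the `to_excel` arguments.

```python
def df_pos_to_excel_row(pos, startrow=0, header=True):
    """Zero-indexed worksheet row of the dataframe row at position `pos`."""
    return startrow + (1 if header else 0) + pos


def df_col_to_excel_col(col_pos, startcol=0, index=True):
    """Zero-indexed worksheet column of the dataframe column at position `col_pos`."""
    return startcol + (1 if index else 0) + col_pos


# With the default to_excel() settings the first data cell is at (1, 1),
# which is the cell targeted by conditional_format(1, 1, 1, 1, ...) above.
first_cell = (df_pos_to_excel_row(0), df_col_to_excel_col(0))
print(first_cell)  # (1, 1)
```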

How to color text in a cell containing a specific string using pandas

After running my algorithms I saved all the data in an excel file using pandas.
writer = pd.ExcelWriter('Diff.xlsx', engine='xlsxwriter')
Now, some of the cells contain strings that include "-->". I have the row and column numbers of those cells and convert them to cell references using:
xl_rowcol_to_cell(rows[i],cols[i])
But I couldn't figure out how to color those cells, or at least the whole text in them.
Any suggestions/tips?
You could use a conditional format in Excel like this:
import pandas as pd
# Create a Pandas dataframe from some data.
df = pd.DataFrame({'Data': ['foo', 'a --> b', 'bar']})
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_conditional.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Add a format. Light red fill with dark red text.
format1 = workbook.add_format({'bg_color': '#FFC7CE',
'font_color': '#9C0006'})
# Apply a conditional format to the cell range.
worksheet.conditional_format(1, 1, len(df), 1,
{'type': 'text',
'criteria': 'containing',
'value': '-->',
'format': format1})
# Close the Pandas Excel writer and output the Excel file.
writer.close()
See Adding Conditional Formatting to Dataframe output in the XlsxWriter docs.
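Alternatively, with the pandas Styler, assuming the column to check is named 'c':
def highlight(row):
    # Called once per row (axis=1); returns one CSS string per cell in that row.
    color = 'yellow' if '-->' in str(row['c']) else 'white'
    return ['background-color: ' + color] * len(row)
df.style.apply(highlight, axis=1)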
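The question mentions already having the row and column numbers of the matching cells. For completeness, here is one possible way to derive them with pandas. It is a sketch on made-up data, and the +1 offsets assume the default `to_excel()` layout (one header row and one index column).

```python
import pandas as pd

# Made-up sample data with two cells containing "-->".
df = pd.DataFrame({"a": ["foo", "x --> y"], "b": ["p --> q", "bar"]})

# Boolean mask with True wherever the cell text contains "-->" (plain substring match).
mask = df.apply(lambda col: col.astype(str).str.contains("-->", regex=False))

# Positions of the True cells, shifted by the header row and the index column
# to give zero-indexed worksheet coordinates.
rows, cols = mask.to_numpy().nonzero()
cells = [(int(r) + 1, int(c) + 1) for r, c in zip(rows, cols)]
print(cells)  # [(1, 2), (2, 1)]
```

Each (row, col) pair can then be converted with `xl_rowcol_to_cell` as in the question.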
