I am writing a DataFrame to Excel and using XlsxWriter to apply a custom format to my date columns, but the Excel file always contains a datetime value and ignores the custom format specified in my code. Here is the code:
writer = ExcelWriter(path+'test.xlsx', engine='xlsxwriter')
workbook = writer.book
df.to_excel(writer,sheet_name='sheet1', index=False, startrow = 1, header=False)
worksheet1 = writer.sheets['sheet1']
fmt = workbook.add_format({'num_format':'d-mmm-yy'})
worksheet1.set_column('C:C', None, fmt)
# Adjusting column width
worksheet1.set_column(0, 20, 12)
# Adding back the header row
column_list = df.columns
for idx, val in enumerate(column_list):
    worksheet1.write(0, idx, val)
writer.save()
Here I want the 'd-mmm-yy' format for column C, but the exported Excel file contains datetime values. I also don't want to use strftime to convert my columns to strings, because I want to keep date filtering easy in Excel.
Excel output:
This doesn't work as expected because Pandas applies a default datetime format to datetime objects, and it applies that format at the cell level. In XlsxWriter, and in Excel, a cell format overrides a column format, so your column format has no effect.
The easiest way to handle this is to specify the Pandas date (or datetime) format as a parameter in pd.ExcelWriter():
import pandas as pd
from datetime import date
df = pd.DataFrame({'Dates': [date(2020, 2, 1),
                             date(2020, 2, 2),
                             date(2020, 2, 3),
                             date(2020, 2, 4),
                             date(2020, 2, 5)]})
writer = pd.ExcelWriter('pandas_datetime.xlsx',
                        engine='xlsxwriter',
                        date_format='d-mmm-yy')
df.to_excel(writer, sheet_name='Sheet1')
# close() writes the file; ExcelWriter.save() was removed in pandas 2.0.
writer.close()
Output:
See also this Pandas Datetime example from the XlsxWriter docs.
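One caveat worth sketching: `date_format` only covers `datetime.date` objects. Values from a `datetime64` column reach the writer as timestamps (a `datetime` subclass), and for those Pandas uses the separate `datetime_format` argument. The snippet below sets both; the column names, sample data and output path are made up for illustration.

```python
import os
import tempfile
from datetime import date

import pandas as pd

# Hypothetical sample data: one column of plain date objects and one
# datetime64 column, which pandas formats with two different settings.
df = pd.DataFrame({
    'as_date': [date(2020, 2, 1), date(2020, 2, 2)],
    'as_datetime': pd.to_datetime(['2020-02-01', '2020-02-02']),
})

out_path = os.path.join(tempfile.mkdtemp(), 'pandas_datetime_both.xlsx')

# date_format applies to datetime.date values, datetime_format to
# datetime/Timestamp values; set both to get a uniform look.
with pd.ExcelWriter(out_path,
                    engine='xlsxwriter',
                    date_format='d-mmm-yy',
                    datetime_format='d-mmm-yy') as writer:
    df.to_excel(writer, sheet_name='Sheet1', index=False)
```

Because the cells still hold real dates rather than strings, Excel's date filters keep working.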
Here is the code which works fine:
import pandas as pd
# Create a Pandas dataframe from some data.
df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_conditional.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Apply a conditional format to the cell range.
worksheet.conditional_format(1,1,1,1, {'type': '3_color_scale'}) ##CHANGES THE COLOR OF SECOND ROW
# Close the Pandas Excel writer and output the Excel file.
writer.save()
This creates below output.
My question is: is it possible to include the header row in the pandas indexing? I want indexing to start from the first row, so the header row should have index 0. This would be useful because in XlsxWriter the first row has index 0.
First, the index is an object, and the default index starts from 0. You can shift it by typing:
df.index += 1
As for the header's index name, the pandas method to_excel takes an argument called index_label. So your code should be:
import pandas as pd
# Create a Pandas dataframe from some data.
df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})
df.index += 1
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_conditional.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1', index_label='0')
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Apply a conditional format to the cell range.
worksheet.conditional_format(1,1,1,1, {'type': '3_color_scale'}) ##CHANGES THE COLOR OF SECOND ROW
# Close the Pandas Excel writer and output the Excel file.
writer.close()  # ExcelWriter.save() was removed in pandas 2.0
Output:
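On the row-numbering confusion itself, it may be simpler to leave the DataFrame index alone and do the arithmetic when calling XlsxWriter. With the default `to_excel` layout the header occupies worksheet row 0, so the data row at position `i` lands on worksheet row `i + 1`, and the index occupies worksheet column 0. A minimal sketch under that assumption (the output path and variable names are made up):

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})
out_path = os.path.join(tempfile.mkdtemp(), 'pandas_conditional_rows.xlsx')

with pd.ExcelWriter(out_path, engine='xlsxwriter') as writer:
    df.to_excel(writer, sheet_name='Sheet1')
    worksheet = writer.sheets['Sheet1']

    # The header is worksheet row 0, so the data occupies rows 1..len(df).
    first_row = 1
    last_row = len(df)
    # Worksheet column 0 holds the index; 'Data' is the next column.
    data_col = df.columns.get_loc('Data') + 1

    # Zero-indexed (first_row, first_col, last_row, last_col): apply the
    # colour scale to every data cell in the 'Data' column.
    worksheet.conditional_format(first_row, data_col, last_row, data_col,
                                 {'type': '3_color_scale'})
```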
I want to format a column in a DataFrame so that large numbers have ',' as a thousands separator once I send the df to_excel. I have code that works, but it selects the column by position. I want code that selects the column by name instead. Can someone help me, please?
df.to_excel(writer, sheet_name = 'Final Trade List')
wb = writer.book
ws = writer.sheets['Final Trade List']
format = wb.add_format({'num_format': '#,##'})
ws.set_column('O:O', 12, format) # this code works but its based on position and not name
ws.set_column(df['$ to buy'], 12, format) # this gives me an error
writer.save()
TypeError: cannot convert the series to <class 'int'>
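This should do the trick, although it converts the values to strings (run it before `format` is reassigned, as it is in the question's code):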
import pandas as pd
df['columnname'] = pd.Series([format(val, ',') for val in df['columnname']], index = df.index)
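If the values should stay numeric in Excel, the column can instead be located by name and passed to `set_column` as an integer position. The sketch below assumes the index is written to the sheet (the `to_excel` default), which shifts every data column one place to the right; the sample data and output path are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the question's data.
df = pd.DataFrame({'ticker': ['AAA', 'BBB'],
                   '$ to buy': [1234567.0, 89012.5]})
out_path = os.path.join(tempfile.mkdtemp(), 'trade_list.xlsx')

with pd.ExcelWriter(out_path, engine='xlsxwriter') as writer:
    df.to_excel(writer, sheet_name='Final Trade List')
    wb = writer.book
    ws = writer.sheets['Final Trade List']

    # '#,##0' adds thousands separators and still displays a lone zero.
    num_fmt = wb.add_format({'num_format': '#,##0'})

    # Zero-based position of the column in the DataFrame, plus one because
    # to_excel wrote the index into worksheet column 0.
    col_idx = df.columns.get_loc('$ to buy') + 1

    # set_column also accepts (first_col, last_col) as integers.
    ws.set_column(col_idx, col_idx, 12, num_fmt)
```

With `index=False` in `to_excel`, the `+ 1` offset would be dropped.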
I am trying to get the following output. All rows and columns are text wrapped except the header though:
import pandas as pd
import pandas.io.formats.style
import os
from pandas import ExcelWriter
import numpy as np
from xlsxwriter.utility import xl_rowcol_to_cell
writer = pd.ExcelWriter('test1.xlsx',engine='xlsxwriter',options={'strings_to_numbers': True},date_format='mmmm dd yyyy')
df = pd.read_csv("D:\\Users\\u700216\\Desktop\\Reports\\CD_Counts.csv")
df.to_excel(writer,sheet_name='Sheet1',startrow=1 , startcol=1, header=True, index=False, encoding='utf8')
workbook = writer.book
worksheet = writer.sheets['Sheet1']
format = workbook.add_format()
format1 = workbook.add_format({'bold': True, 'align' : 'left'})
format.set_align('Center')
format1.set_align('Center')
format.set_text_wrap()
format1.set_text_wrap()
worksheet.set_row(0, 20, format1)
worksheet.set_column('A:Z', 30, format)
writer.save()
The format is applied to all rows and columns except the header. I don't know why the format is not applied to the header. Alternatively, I would like to manually add column header numbers such as 0, 1, 2, etc., so that I can turn off the header and have all rows and columns formatted.
In the above screenshot, wrap text is not applied to A1 to E1, and the header in C1 has a lot of space. If I manually click wrap text it gets aligned; otherwise none of the header is formatted with text wrap.
A couple of problems:
Your code is correctly attempting to format the header, but when you create your file using .to_excel() you are telling it to start at row/col 1, 1. The cells though are numbered from 0, 0. So if you change to:
df.to_excel(writer,sheet_name='Sheet1', startrow=0, startcol=0, header=True, index=False, encoding='utf8')
You will see col A and row 1 are both formatted:
i.e. Col A is 0 and Row 1 is 0
When using Pandas to write the header, it applies its own format which will overwrite the formatting you have provided. To get around this, turn off headers and get it to only write the data from row 1 onwards and write the header manually.
The following might be a bit clearer:
import pandas as pd
# XlsxWriter constructor options are passed via engine_kwargs (pandas >= 1.3;
# older versions accepted options={...} directly).
writer = pd.ExcelWriter('test1.xlsx', engine='xlsxwriter',
                        engine_kwargs={'options': {'strings_to_numbers': True}},
                        date_format='mmmm dd yyyy')
#df = pd.read_csv("D:\\Users\\u700216\\Desktop\\Reports\\CD_Counts.csv")
df = pd.read_csv("CD_Counts.csv")
# Write only the data, starting at row 1; row 0 is left for the manual header.
# (The encoding argument of to_excel() was removed in pandas 2.0.)
df.to_excel(writer, sheet_name='Sheet1', startrow=1, startcol=0, header=False, index=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
format_header = workbook.add_format()
format_header.set_align('center')
format_header.set_bold()
format_header.set_text_wrap()
format_header.set_border()
format_data = workbook.add_format()
format_data.set_align('center')
format_data.set_text_wrap()
worksheet.set_column('A:Z', 20, format_data)
worksheet.set_row(0, 40, format_header)
# Write the header manually
for colx, value in enumerate(df.columns.values):
    worksheet.write(0, colx, value)
# close() writes the file; ExcelWriter.save() was removed in pandas 2.0.
writer.close()
Which would give you:
Note: It is also possible to tell Pandas the style to use, or to force it to None so it will inherit your own style. The only drawback with that approach is that the method required to do that depends on the version of Pandas that is being used. This approach works for all versions.
I am working on a project where I am writing out onto an xlsx spreadsheet and need to format the one column for 'Date'. I get the program to run and all but the column format is still set to 'General'.
I am trying this again in a different way, with different code, to see if anyone answers:
for row in cur.execute('''SELECT `Mapline`,`Plant`,`Date`,`Action` from AEReport'''):
    lengthOfHeadings = len(row)
    output = '%s-%s.xlsx' % ("AEReport",now.strftime("%m%d%Y-%H%M"))
    workbook = xlsxwriter.Workbook(output, {'strings_to_numbers':True})
    worksheet = workbook.add_worksheet()
    format=workbook.add_format({'font_size':'8','border':True})
    format2=workbook.add_format({'font_size':'8','border':True,'num_format':'mm/dd/yy hh:mm'})
    count = 0
    for name in range(0,lengthOfHeadings):
        if name==row[2]:
            name=int(name)
            worksheet.write(counter, count, row[name],format2)
        else:
            worksheet.write(counter, count, row[name],format)
        count += 1
    counter += 1
To get the date-time format working, the date value has to be converted to an Excel serial date value.
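As a rough illustration of what a serial date value is, the conversion can be written with the standard library alone. This is a sketch for Excel's default 1900 date system, the helper name is made up, and XlsxWriter performs an equivalent conversion itself when it is handed a `datetime` object.

```python
from datetime import datetime

# Day zero of Excel's 1900 date system for any date from 1 March 1900 on.
# (Excel treats 1900 as a leap year, which shifts the effective epoch.)
EXCEL_EPOCH = datetime(1899, 12, 30)


def to_excel_serial(dt: datetime) -> float:
    """Return the Excel serial number (whole days plus day fraction) for dt."""
    delta = dt - EXCEL_EPOCH
    # Microseconds are ignored in this sketch.
    return delta.days + delta.seconds / 86400


print(to_excel_serial(datetime(2020, 1, 1)))       # 43831.0
print(to_excel_serial(datetime(2011, 1, 31, 12)))  # 40574.5
```

The integer part is the day count and the fractional part is the time of day, which is why a number format is all that is needed to display it as a date.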
Here is an example showing how it works with Pandas:
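import pandas as pd
# freq='M' (month end) is spelled 'ME' on pandas >= 2.2.
data = pd.DataFrame({'test_date': pd.date_range('1/1/2011', periods=12, freq='M')})
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
# Convert to Excel serial numbers: days since 1899-12-30, as plain floats,
# so that pandas adds no cell-level format of its own.
data.test_date = (data.test_date - pd.Timestamp(1899, 12, 30)) / pd.Timedelta(days=1)
data.to_excel(writer, sheet_name='test', index=False)
workbook = writer.book
worksheet = writer.sheets['test']
formatdict = {'num_format':'mm/dd/yyyy'}
fmt = workbook.add_format(formatdict)
worksheet.set_column('A:A', None, fmt)
# close() writes the file; ExcelWriter.save() was removed in pandas 2.0.
writer.close()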
This is how the output will look like:
from datetime import datetime
date_format = workbook.add_format({'num_format':'yyyy-mm-dd hh:mm:ss'})
worksheet.write(0, 0, datetime.today(),date_format)
result:
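from datetime import datetime
date_fmt = workbook.add_format({'num_format': 'dd-mm-yyyy'})
# Pass a date or datetime object rather than a bare literal.
worksheet.write(1, 1, datetime(1999, 12, 2), date_fmt)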
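For the question's case, where the value comes out of a database query, the cell stays 'General' as long as the value is written as a string. One way around this is to parse the text into a `datetime` first. In the sketch below the timestamp layout `'%Y-%m-%d %H:%M:%S'`, the sample rows and the output path are all assumptions, since the actual data is not shown.

```python
import os
import tempfile
from datetime import datetime

import xlsxwriter

# Hypothetical rows standing in for the query result
# (Mapline, Plant, Date, Action).
rows = [
    ('M1', 'Plant A', '2015-10-10 08:57:00', 'Inspect'),
    ('M2', 'Plant B', '2015-10-11 10:23:00', 'Repair'),
]
DATE_COL = 2  # position of the Date field in each row

out_path = os.path.join(tempfile.mkdtemp(), 'AEReport.xlsx')
workbook = xlsxwriter.Workbook(out_path)
worksheet = workbook.add_worksheet()
text_fmt = workbook.add_format({'font_size': 8, 'border': 1})
date_fmt = workbook.add_format({'font_size': 8, 'border': 1,
                                'num_format': 'mm/dd/yy hh:mm'})

parsed_dates = []
for row_idx, row in enumerate(rows):
    for col_idx, value in enumerate(row):
        if col_idx == DATE_COL:
            # Parse the text into a datetime so Excel stores a real date.
            value = datetime.strptime(value, '%Y-%m-%d %H:%M:%S')
            parsed_dates.append(value)
            worksheet.write_datetime(row_idx, col_idx, value, date_fmt)
        else:
            worksheet.write(row_idx, col_idx, value, text_fmt)

workbook.close()
```

Comparing the column position with `DATE_COL`, rather than comparing the position with the cell's value, is what routes the date column to the date format.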
I am trying to write out my pandas table using xlsxwriter. I have two columns:
Date | Time
10/10/2015 8:57
11/10/2015 10:23
But when I use xlsxwriter, the output is:
Date | Time
10/10/2015 0.63575435
11/10/2015 0.33256774
I tried using datetime_format = 'hh:mm:ss' but this didn't change it. How else can I get the time to format correctly without affecting the date column?
The following code works for me, but there are some caveats. Whether the custom formatting works depends on the Windows/Excel version you open the file with, because Excel's custom formatting depends on the language settings of the Windows OS.
Excel custom formatting
Windows date/time settings
So yeah, not the best solution... but the idea is to change the formatting for each column instead of changing how a type of data is interpreted for the whole Excel file that is being created.
import pandas as pd
from datetime import datetime, date
# Create a Pandas dataframe from some datetime data.
df = pd.DataFrame({'Date and time': [date(2015, 1, 1),
                                     date(2015, 1, 2),
                                     date(2015, 1, 3),
                                     date(2015, 1, 4),
                                     date(2015, 1, 5)],
                   'Time only': ["11:30:55",
                                 "1:20:33",
                                 "11:10:00",
                                 "16:45:35",
                                 "12:10:15"],
                   })
# Store the times as fractions of a day (plain floats). Pandas writes floats
# without a cell-level format, so the column format set below takes effect.
df['Time only'] = pd.to_timedelta(df['Time only']) / pd.Timedelta(days=1)
#df['Date and time'] = df['Date and time'].apply(pd.to_datetime)
# Create a Pandas Excel writer using XlsxWriter as the engine.
# Pandas formats date cells itself, so the date format is set here.
writer = pd.ExcelWriter("pandas_datetime.xlsx",
                        engine='xlsxwriter',
                        date_format='d-mmm-yy')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')
# Get the xlsxwriter workbook and worksheet objects in order to set the column
# widths, to make the dates clearer.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
#PLAY AROUND WITH THE NUM_FORMAT, IT DEPENDS ON YOUR WINDOWS AND EXCEL DATE/TIME SETTINGS WHAT WILL WORK
# Add a cell format for the time column.
format2 = workbook.add_format({'num_format': "h:mm:ss"})
# Set width and format in one call per column; a later set_column() call on
# the same column can replace the earlier settings.
worksheet.set_column('B:B', 20)
worksheet.set_column('C:C', 20, format2)
# Close the Pandas Excel writer and output the Excel file.
writer.close()
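The decimals in the question's output are how Excel stores a time of day: as a fraction of 24 hours, which only looks like a clock time once a time number format is applied to the cell. A small standard-library sketch of that relationship (the helper name is made up):

```python
from datetime import time


def time_to_day_fraction(t: time) -> float:
    """Return t as the fraction of a 24-hour day that Excel stores."""
    seconds = t.hour * 3600 + t.minute * 60 + t.second
    return seconds / 86400


print(time_to_day_fraction(time(12, 0)))            # 0.5
print(time_to_day_fraction(time(6, 0)))             # 0.25
print(round(time_to_day_fraction(time(8, 57)), 4))  # 0.3729
```

So a cell showing a value between 0 and 1 in a time column usually holds the right data and is only missing its number format.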