I am trying to write out my pandas table using xlsxwriter. I have two columns:
Date | Time
10/10/2015 | 8:57
11/10/2015 | 10:23
But when I use xlsxwriter, the output is:
Date | Time
10/10/2015 | 0.63575435
11/10/2015 | 0.33256774
I tried using datetime_format = 'hh:mm:ss' but this didn't change it. How else can I get the time column to format correctly without affecting the date column?
The following code works for me, but there are some caveats. Whether the custom formatting works depends on the Windows/Excel version you open the file with, because Excel's custom formatting depends on the language settings of the Windows OS.
Excel custom formatting
Windows date/time settings
So yeah, not the best solution... but the idea is to change the formatting for each column instead of changing how a type of data is interpreted for the whole Excel file being created.
import pandas as pd
from datetime import date
# Create a Pandas dataframe from some datetime data.
df = pd.DataFrame({'Date and time': [date(2015, 1, 1),
                                     date(2015, 1, 2),
                                     date(2015, 1, 3),
                                     date(2015, 1, 4),
                                     date(2015, 1, 5)],
                   'Time only': ["11:30:55",
                                 "1:20:33",
                                 "11:10:00",
                                 "16:45:35",
                                 "12:10:15"],
                   })
df['Time only'] = df['Time only'].apply(pd.to_timedelta)
#df['Date and time'] = df['Date and time'].apply(pd.to_datetime)
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter("pandas_datetime.xlsx",
                        engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')
# Get the xlsxwriter workbook and worksheet objects in order to set the column
# widths, to make the dates clearer.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Play around with the num_format: which one works depends on your Windows
# and Excel date/time settings.
# Add some cell formats.
format1 = workbook.add_format({'num_format': 'd-mmm-yy'})
format2 = workbook.add_format({'num_format': "h:mm:ss"})
# Set the width and format in one call per column (column A holds the index).
# A later set_column() without a format would replace the column format.
worksheet.set_column('B:B', 20, format1)
worksheet.set_column('C:C', 20, format2)
# Close the Pandas Excel writer and output the Excel file.
writer.close()
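A minimal sketch of a variation on the per-column idea, assuming a recent pandas and XlsxWriter. As far as I know, recent pandas versions write date objects with their own cell-level date_format and timedelta values with a "0" cell format, and a cell format wins over a set_column() format. Excel stores a time of day as a fraction of 24 hours (which is why a time without a time number format shows up as a bare decimal), so converting the times to plain floats leaves those cells without a pandas format and lets the column format apply. The file name and column names are illustrative.

```python
import pandas as pd
from datetime import date

df = pd.DataFrame({
    "Date": [date(2015, 10, 10), date(2015, 10, 11)],
    "Time": ["8:57:00", "10:23:00"],
})

# Excel stores a time of day as a fraction of a 24-hour day (0.5 == 12:00).
# Plain floats are written by pandas without a cell format, so the
# column format set below is what Excel uses to display them.
df["Time"] = pd.to_timedelta(df["Time"]).dt.total_seconds() / 86400

# date objects are written with pandas' cell-level date_format given here.
with pd.ExcelWriter("pandas_time.xlsx", engine="xlsxwriter",
                    date_format="dd/mm/yyyy") as writer:
    df.to_excel(writer, sheet_name="Sheet1", index=False)
    workbook = writer.book
    worksheet = writer.sheets["Sheet1"]

    time_fmt = workbook.add_format({"num_format": "hh:mm:ss"})
    # With index=False, column A holds the dates and column B the times.
    worksheet.set_column("A:A", 12)
    worksheet.set_column("B:B", 12, time_fmt)
```

The date column is handled by the writer-level parameter and the time column by the column format, so neither setting touches the other column.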
Example data:
A | B
1 | 2020/10/01
2 | 2021/10/01
I'm using pandas.to_excel like so:
df = pd.DataFrame(list(data))
writer = pd.ExcelWriter("excel.xlsx", engine='xlsxwriter', date_format="dd/mm/yyyy;#")
df.to_excel(writer_head, sheet_name='Sheet 1', index=False, startrow=4)
Then I create the formatting like this:
df.to_excel(writer, sheet_name='Sheet 1', index=False, startrow=1)
workbook = writer.book
worksheet = writer.sheets['Sheet 1']
# format
date_align = workbook.add_format({
'align': 'center',
'valign': 'vcenter',
'num_format': 'dd/mm/yyyy;#',
})
So I tried to apply the formatting like this:
worksheet.set_column('B:B', 13, date_align)
writer.close()
But it didn't work: the dates written by pd.to_excel() change neither their alignment nor their number format. However, if I write the data manually, it works:
from datetime import datetime
worksheet.write('B2', datetime.now())
worksheet.set_column('B:B', 13, date_align)
writer.close()
That works, but I want the data from pd.to_excel() to be formatted. I checked that the values in the list are indeed datetime.date, and the Excel output has the category 'Date', not 'Custom' or anything else. Also, the alignment works fine with pd.to_excel() as long as the values are not dates or datetimes.
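One way to get both the alignment and the date format, sketched under the assumption that re-writing cells after to_excel() is acceptable: because pandas gives every date cell its own number format, the column format (alignment included) is ignored for those cells. The sketch therefore writes the date cells again with XlsxWriter's write_datetime() and a single format that carries both the alignment and the number format. The data mirrors the example above; the sheet layout (header at startrow=1, index=False) follows the question, and the file name is illustrative.

```python
from datetime import date

import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [date(2020, 10, 1), date(2021, 10, 1)]})

startrow = 1  # 0-based row of the header, as in the question
with pd.ExcelWriter("excel.xlsx", engine="xlsxwriter") as writer:
    df.to_excel(writer, sheet_name="Sheet 1", index=False, startrow=startrow)
    workbook = writer.book
    worksheet = writer.sheets["Sheet 1"]

    # One format holding both the alignment and the date number format.
    date_align = workbook.add_format({
        "align": "center",
        "valign": "vcenter",
        "num_format": "dd/mm/yyyy",
    })

    # Width only: a column format would be hidden by the cell formats anyway.
    worksheet.set_column("B:B", 13)

    col = df.columns.get_loc("B")
    for offset, value in enumerate(df["B"]):
        # Data rows start directly below the header row; writing the same
        # cell again replaces the value and format pandas wrote there.
        worksheet.write_datetime(startrow + 1 + offset, col, value, date_align)
```

write_datetime() accepts datetime.date as well as datetime.datetime values, so the dates stay real Excel dates and date filtering keeps working.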
I am writing a dataframe to Excel and using XlsxWriter to format my date columns with a custom format, but the Excel file always contains datetime values and ignores the custom formatting specified in my code. Here is the code:
writer = ExcelWriter(path+'test.xlsx', engine='xlsxwriter')
workbook = writer.book
df.to_excel(writer,sheet_name='sheet1', index=False, startrow = 1, header=False)
worksheet1 = writer.sheets['sheet1']
fmt = workbook.add_format({'num_format':'d-mmm-yy'})
worksheet1.set_column('C:C', None, fmt)
# Adjusting column width
worksheet1.set_column(0, 20, 12)
# Adding back the header row
column_list = df.columns
for idx, val in enumerate(column_list):
    worksheet1.write(0, idx, val)
writer.close()
Here I want the 'd-mmm-yy' format for column C, but the exported Excel file contains datetime values. I also don't want to use strftime to convert my columns to strings, because I want to keep easy date filtering in Excel.
Excel output:
The reason this doesn't work as expected is that Pandas uses a default datetime format with datetime objects and applies this format at the cell level. In XlsxWriter, and in Excel, a cell format overrides a column format, so your column format has no effect.
The easiest way to handle this is to specify the Pandas date (or datetime) format as a parameter in pd.ExcelWriter():
import pandas as pd
from datetime import date
df = pd.DataFrame({'Dates': [date(2020, 2, 1),
date(2020, 2, 2),
date(2020, 2, 3),
date(2020, 2, 4),
date(2020, 2, 5)]})
writer = pd.ExcelWriter('pandas_datetime.xlsx',
engine='xlsxwriter',
date_format='d-mmm-yy')
df.to_excel(writer, sheet_name='Sheet1')
writer.close()
Output:
See also this Pandas Datetime example from the XlsxWriter docs.
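pd.ExcelWriter() takes a parameter for each kind of value: date_format is used for datetime.date objects and datetime_format for datetime.datetime / pandas Timestamp values (for example a datetime64 column). A small sketch with one column of each; the format strings and file name are illustrative.

```python
from datetime import date

import pandas as pd

df = pd.DataFrame({
    # datetime.date objects -> written with date_format
    "Dates": [date(2020, 2, 1), date(2020, 2, 2)],
    # datetime64 column (Timestamp values) -> written with datetime_format
    "Datetimes": pd.to_datetime(["2020-02-01 11:30:55", "2020-02-02 16:45:35"]),
})

with pd.ExcelWriter("pandas_datetime_formats.xlsx", engine="xlsxwriter",
                    date_format="d-mmm-yy",
                    datetime_format="d-mmm-yy hh:mm:ss") as writer:
    df.to_excel(writer, sheet_name="Sheet1", index=False)
    # A width-only set_column() does not touch the cell formats pandas applied.
    writer.sheets["Sheet1"].set_column("A:B", 20)
```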
Need guidance on how I can format a value to date format in Pandas before it prints out the value to an Excel sheet.
I am new to Pandas and had to edit an existing code when the values are output to an Excel sheet.
After some conditional/functional calculations are done, the value is then output to Excel.
My current value seems to be a string, which is not an Excel-friendly date format.
The output value looks like this:
I need to format the output as a date.
I did try strptime, but as far as I understand, it will also give the output as a string. The strange part is that I am not able to format the column as a date using Excel's own formatting options either.
Thank you for your time and help.
My code is something like this:
from dateutil.relativedelta import relativedelta
def calculate(snumber,owner,reason):
    #some if conditions and then
    date11 = Date + relativedelta(months = 1)
    return date11.strftime('%d %b %Y')
df['date1'] = df.apply(lambda x: calculate(x['snumber'], x['owner'], x['reason']), axis=1)
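A sketch of the likely root cause in this code: calculate() returns date11.strftime(...), which is a string, so Excel receives text and its own date formatting has nothing to work on. Returning the date itself keeps a real datetime column, and the display format can then be set through pd.ExcelWriter(). The input frame, its "Date" column, and the simplified calculate(row) signature are hypothetical stand-ins for the real data and conditional logic.

```python
import pandas as pd
from dateutil.relativedelta import relativedelta

# Hypothetical input standing in for the real data.
df = pd.DataFrame({
    "snumber": [1, 2],
    "owner": ["a", "b"],
    "reason": ["x", "y"],
    "Date": pd.to_datetime(["2020-01-31", "2020-06-15"]),
})

def calculate(row):
    # ...the original conditional logic would go here...
    # Return the date itself instead of date11.strftime('%d %b %Y'),
    # so the column keeps a datetime type instead of becoming text.
    return row["Date"] + relativedelta(months=1)

df["date1"] = pd.to_datetime(df.apply(calculate, axis=1))

# Excel's "dd mmm yyyy" is the counterpart of the original '%d %b %Y'.
with pd.ExcelWriter("dates.xlsx", engine="xlsxwriter",
                    datetime_format="dd mmm yyyy") as writer:
    df.to_excel(writer, sheet_name="Sheet1", index=False)
```

relativedelta clips to the end of the month, so 31 January plus one month becomes 29 February in 2020.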
To make sure the column really holds dates (not strings), convert it with pd.to_datetime():
df['date1'] = pd.to_datetime(df['date1'], format='%d %b %Y')
Once that is done, you can use Pandas ExcelWriter's xlsxwriter engine.
Please see more details about that in this article: https://xlsxwriter.readthedocs.io/example_pandas_column_formats.html
# Create a Pandas Excel writer using XlsxWriter as the engine.
# Pandas gives date/datetime cells their own cell-level number format,
# which overrides a set_column() format, so pass the date format here.
writer = pd.ExcelWriter("pandas_column_formats.xlsx", engine='xlsxwriter',
                        date_format='dd/mm/yy', datetime_format='dd/mm/yy')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')
# Get the xlsxwriter worksheet object.
worksheet = writer.sheets['Sheet1']
# Set the column width.
# Provide the proper column where you have date info.
worksheet.set_column('A:A', 18)
# Close the Pandas Excel writer and output the Excel file.
writer.close()
Convert the date format in the pandas dataframe itself using:
df['date1'] = pd.to_datetime(df['date1'])
Example:
I have a dataframe
PRODUCT PRICE PURCHSE date
0 ABC 5000 True 2020/06/01
1 ABB 2500 False 2020/06/01
Apply the conversion above to the date column in the dataframe:
df['date'] = pd.to_datetime(df['date'])
The output will look like:
PRODUCT PRICE PURCHSE date
0 ABC 5000 True 2020-06-01
1 ABB 2500 False 2020-06-01
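When the strings have a known layout such as 2020/06/01, passing format= to pd.to_datetime() makes the parsing explicit and raises an error on values that don't match. After conversion the column is datetime64, so pandas writes real Excel dates. A brief sketch reusing the example frame; the Excel format string and file name are assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "PRODUCT": ["ABC", "ABB"],
    "PRICE": [5000, 2500],
    "PURCHSE": [True, False],
    "date": ["2020/06/01", "2020/06/01"],
})

# Explicit format: no guessing between day-first and month-first layouts.
df["date"] = pd.to_datetime(df["date"], format="%Y/%m/%d")

with pd.ExcelWriter("products.xlsx", engine="xlsxwriter",
                    datetime_format="yyyy/mm/dd") as writer:
    df.to_excel(writer, sheet_name="Sheet1", index=False)
```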
I have an Excel sheet which came from a pandas dataframe. I then use XlsxWriter to add formulas, new columns and formatting. The problem is that I only seem to be able to format what I've written using XlsxWriter and nothing that came from the dataframe. So what I get is something like this half-formatted table:
As you can see from the image the two columns from the dataframe remain untouched. They must have some kind of default formatting that is overriding mine.
Since I don't know how to convert a worksheet back into a dataframe, the code below is obviously completely wrong, but it's just to give an idea of what I'm looking for.
export = "files/sharepointExtract.xlsx"
df = pd.read_excel(export)# df = dataframe
writer = pd.ExcelWriter('files/new_report-%s.xlsx' % (date.today()), engine = 'xlsxwriter')
workbook = writer.book
# Code to make the header red, this works fine because
# it's written in xlsxwriter using write.row()
colour_format = workbook.add_format()
colour_format.set_bg_color('#640000')
colour_format.set_font_color('white')
worksheet.set_row(0, 15, colour_format)
table_body_format = workbook.add_format()
table_body_format.set_bg_color('blue')
for row in worksheet.rows:
    row.set_row(0,15, table_body_format)
This code gives an AttributeError, but even without the for loop we just get what can be seen in the image.
The following should work:
import pandas as pd
from datetime import date
export = "files/sharepointExtract.xlsx"
df = pd.read_excel(export)
writer = pd.ExcelWriter('files/new_report-{}.xlsx'.format(date.today()), engine ='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', startrow=1, startcol=0, header=False, index=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Code to make the header red background with white text
colour_format = workbook.add_format()
colour_format.set_bg_color('#640000')
colour_format.set_font_color('white')
# Code to make the body blue
table_body_format = workbook.add_format()
table_body_format.set_bg_color('blue')
# Set the header (row 0) to height 15 using colour_format
worksheet.set_row(0, 15, colour_format)
# Set the default format for other rows
worksheet.set_column('A:Z', 15, table_body_format)
# Write the header manually
for colx, value in enumerate(df.columns.values):
    worksheet.write(0, colx, value)
writer.close()
When Pandas writes the header, it uses its own format style, which overrides the underlying XlsxWriter formatting. The simplest approach is to stop it from writing the header and have it write the rest of the data from row 1 onwards (not 0). This prevents the formatting from being altered. You can then easily write your own header using the column values from the dataframe.
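A different technique from the one used above, sketched with hypothetical data: XlsxWriter's conditional_format() is displayed on top of a cell's own format, so it can colour cells that pandas has already formatted without re-writing anything. Note that the no_blanks type used here leaves empty cells uncoloured.

```python
import pandas as pd

# Hypothetical data standing in for the SharePoint extract.
df = pd.DataFrame({"Name": ["a", "b", "c"], "Value": [1, 2, 3]})

with pd.ExcelWriter("report.xlsx", engine="xlsxwriter") as writer:
    df.to_excel(writer, sheet_name="Sheet1", index=False)
    workbook = writer.book
    worksheet = writer.sheets["Sheet1"]

    table_body_format = workbook.add_format({"bg_color": "blue"})

    # Row 0 is the pandas header; the data occupies rows 1..len(df)
    # and columns 0..len(df.columns) - 1 (0-based, inclusive).
    last_row = len(df)
    last_col = len(df.columns) - 1
    worksheet.conditional_format(1, 0, last_row, last_col,
                                 {"type": "no_blanks",
                                  "format": table_body_format})
```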
I have tried the example code found on the xlsxwriter webpage at http://xlsxwriter.readthedocs.org/en/latest/example_pandas_column_formats.html
import pandas as pd
# Create a Pandas dataframe from some data.
df = pd.DataFrame({'Numbers': [1010, 2020, 3030, 2020, 1515, 3030, 4545],
'Percentage': [.1, .2, .33, .25, .5, .75, .45 ],
})
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter("pandas_column_formats.xlsx", engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Add some cell formats.
format1 = workbook.add_format({'num_format': '#,##0.00'})
format2 = workbook.add_format({'num_format': '0%'})
# Note: It isn't possible to format any cells that already have a format such
# as the index or headers or any cells that contain dates or datetimes.
# Set the column width and format.
worksheet.set_column('B:B', 18, format1)
# Set the format but not the column width.
worksheet.set_column('C:C', None, format2)
# Close the Pandas Excel writer and output the Excel file.
writer.close()
However it does not format the columns as expected - they simply appear unformatted (no numeric rounding, percentages.)
I am using Pandas 0.15.2. Has this changed recently in Pandas, perhaps? Any ideas would be welcome.
This seems to be fixed in Pandas 0.16. See https://github.com/jmcnamara/XlsxWriter/issues/204