I have a DataFrame that contains a datetime64 and a timedelta64. Unfortunately, I can't export the latter to a properly formatted hh:mm:ss column in an Excel file:
import pandas as pd
data = {
"date": [
"2023-02-05",
"2023-02-05",
"2022-12-02",
"2022-11-29",
"2022-11-18",
],
"duration": [
"01:07:48",
"05:23:06",
"02:41:58",
"00:35:11",
"02:00:20",
],
}
df = pd.DataFrame(data)
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
df['duration'] = pd.to_timedelta(df['duration'])
with pd.ExcelWriter(
    "df.xlsx",
    datetime_format="YYYY-MM-DD",
    engine="xlsxwriter",
) as writer:
    workbook = writer.book
    time_format = workbook.add_format({"num_format": "HH:MM:SS"})
    df.to_excel(writer, sheet_name="sheet", index=False)
    worksheet = writer.sheets["sheet"]
    worksheet.set_column("A:A", 20)
    worksheet.set_column("B:B", 50, cell_format=time_format)
The resulting Excel file will display like this:
So the datetime_format passed to ExcelWriter is applied correctly to column A, and so is the width setting for column B, but the number format for column B has no effect.
What am I doing wrong?
The reason that the column format isn't being applied is that Pandas is applying a cell number format of "0" to the timedelta values. The cell format overrides the column format so that isn't applied. You can verify this by adding the following at the end of the with statement and you will see that it is formatted as expected:
worksheet.write(7, 1, .5)
I'm not sure what is the best way to work around but you could iterate over the timedelta values and rewrite them out to override the pandas formatted values. Something like this:
import pandas as pd
data = {
"date": [
"2023-02-05",
"2023-02-05",
"2022-12-02",
"2022-11-29",
"2022-11-18",
],
"duration": [
"01:07:48",
"05:23:06",
"02:41:58",
"00:35:11",
"02:00:20",
],
}
df = pd.DataFrame(data)
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
df['duration'] = pd.to_timedelta(df['duration'])
with pd.ExcelWriter(
    "df.xlsx",
    datetime_format="YYYY-MM-DD",
    engine="xlsxwriter",
) as writer:
    workbook = writer.book
    time_format = workbook.add_format({"num_format": "HH:MM:SS"})
    df.to_excel(writer, sheet_name="sheet", index=False)
    worksheet = writer.sheets["sheet"]
    worksheet.set_column("A:A", 20)
    worksheet.set_column("B:B", 50, cell_format=time_format)
    # Rewrite the timedelta cells so the column format applies to them.
    # Row 0 holds the header, so the data starts at row 1.
    col = df.columns.get_loc('duration')
    for row, timedelta in enumerate(df['duration'], 1):
        worksheet.write(row, col, timedelta)
Output:
You could also convert the timedelta back to a number (like Pandas does), since dates and times in Excel are just numbers with a format applied.
Something like this, which will give the same result as above:
df = pd.DataFrame(data)
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
df['duration'] = pd.to_timedelta(df['duration']).dt.total_seconds() / 86400
with pd.ExcelWriter(
    "df.xlsx",
    datetime_format="YYYY-MM-DD",
    engine="xlsxwriter",
) as writer:
    workbook = writer.book
    time_format = workbook.add_format({"num_format": "HH:MM:SS"})
    df.to_excel(writer, sheet_name="sheet", index=False)
    worksheet = writer.sheets["sheet"]
    worksheet.set_column("A:A", 20)
    worksheet.set_column("B:B", 50, cell_format=time_format)
The problem is that Excel measures time in units of days. For example, your first value (1:07:48 = 4068 s) is stored as a duration of 4068/(24*3600) ≈ 0.0471 days.
The possible solutions are described here:
formatting timedelta64 when using pandas.to_excel
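The day-fraction arithmetic above can be checked with the standard library alone. This is a minimal sketch; the helper name timedelta_to_excel_days is made up for illustration and is not part of pandas or XlsxWriter.

```python
from datetime import timedelta

SECONDS_PER_DAY = 24 * 3600  # Excel stores durations as fractions of a day


def timedelta_to_excel_days(td: timedelta) -> float:
    """Return a duration as a fraction of a day (hypothetical helper)."""
    return td.total_seconds() / SECONDS_PER_DAY


# First duration from the question: 1 h 7 min 48 s
first_duration = timedelta(hours=1, minutes=7, seconds=48)
seconds = first_duration.total_seconds()  # 4068.0
fraction = timedelta_to_excel_days(first_duration)
print(seconds, fraction)
```

A cell holding this fraction displays as 01:07:48 once a time number format such as HH:MM:SS is applied to it.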
Related
I have an Excel file like this, and I want the numbers in the date field to be converted to a date like (2021.7.22) and written back to the date field using Python.
You can try something like this:
import pandas as pd
dfs = pd.read_excel('Test.xlsx', sheet_name=None)
output = {}
for ws, df in dfs.items():
    if 'date' in df.columns:
        df['date'] = pd.to_datetime(df['date'].apply(lambda x: f'{str(x)[:4]}.{str(x)[4:6 if len(str(x)) > 7 else 5]}.{str(x)[-2:]}')).dt.date
    output[ws] = df
writer = pd.ExcelWriter('TestOutput.xlsx')
for ws, df in output.items():
    df.to_excel(writer, index=None, sheet_name=ws)
# close() also saves the workbook; writer.save() was removed in pandas 2.0
writer.close()
For each worksheet containing the column date in the input xlsx file, it will convert the integer it finds to a date, assuming that the month portion may be 1 or 2 digits and that the day portion is always a full 2 digits. If the actual month/day protocol in your data is different, you can adjust the logic accordingly.
The code creates a new output xlsx reflecting the above changes.
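The parsing rule described above (4-digit year, 1- or 2-digit month, 2-digit day) can also be written as a small standalone function. This is only a sketch under those same assumptions; parse_compact_date is a hypothetical name, and values of any other length are rejected rather than guessed at.

```python
from datetime import date


def parse_compact_date(value: int) -> date:
    """Parse integers such as 20210722 or 2021722 into a date.

    Assumes a 4-digit year, a 1- or 2-digit month and a 2-digit day.
    """
    text = str(value)
    if len(text) not in (7, 8):
        raise ValueError(f"unexpected date value: {value!r}")
    year = int(text[:4])
    month = int(text[4:-2])  # whatever sits between the year and the day
    day = int(text[-2:])
    return date(year, month, day)


print(parse_compact_date(20210722))
print(parse_compact_date(2021722))
```

It could be used in place of the lambda, for example df['date'].apply(parse_compact_date), if the column really follows that layout.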
I am writing a dataframe into excel and using xlsx writer to format my date columns to a custom format but the excel always contains a datetime value and ignores the custom formatting specified in my code. Here is the code:
writer = ExcelWriter(path+'test.xlsx', engine='xlsxwriter')
workbook = writer.book
df.to_excel(writer,sheet_name='sheet1', index=False, startrow = 1, header=False)
worksheet1 = writer.sheets['sheet1']
fmt = workbook.add_format({'num_format':'d-mmm-yy'})
worksheet1.set_column('C:C', None, fmt)
# Adjusting column width
worksheet1.set_column(0, 20, 12)
# Adding back the header row
column_list = df.columns
for idx, val in enumerate(column_list):
    worksheet1.write(0, idx, val)
writer.close()
Here I want 'd-mmm-yy' format for column C but the exported excel contains datetime values. I also don't want to use strftime to convert my columns to strings to ensure easy date filtering in excel.
Excel output:
This doesn't work as expected because Pandas uses a default datetime format for datetime objects and applies it at the cell level. In XlsxWriter, and in Excel, a cell format overrides a column format, so your column format has no effect.
The easiest way to handle this is to specify the Pandas date (or datetime) format as a parameter in pd.ExcelWriter():
import pandas as pd
from datetime import date
df = pd.DataFrame({'Dates': [date(2020, 2, 1),
date(2020, 2, 2),
date(2020, 2, 3),
date(2020, 2, 4),
date(2020, 2, 5)]})
writer = pd.ExcelWriter('pandas_datetime.xlsx',
engine='xlsxwriter',
date_format='d-mmm-yy')
df.to_excel(writer, sheet_name='Sheet1')
writer.close()
Output:
See also this Pandas Datetime example from the XlsxWriter docs.
I am currently using XlsxWriter to output the dataframe to Excel files.
My dataframe is as follows.
Num Des price percentage
321 pencil $23 24%
452 pen $12 29%
444 key $32 33%
111 eraser $49 14%
And the dollar and percent signs I used are by the codes:
df['price'] = df['price'].apply(lambda x: "${:,.0f}".format(x))
df['percentage'] = df['percentage'].apply(lambda x: format(x, '.0%'))
But after I output the dataframe into Excel by XlsxWriter, the values with signs turn into strings.
Is there a way that I can keep the number type?
Do not use pandas to format the file. Use each library for what it is best at: pandas for the data manipulation and XlsxWriter for the formatting.
So your code should be something like this:
import pandas as pd
# Create your dataframe
df = pd.DataFrame({'Num': [321,452,444,111],
'Des': ['pencil','pen','key','eraser'],
'price': [23,12,32,49],
'percentage': [24,29,33,14]})
# Divide by 100 the column with the percentages
df['percentage'] = df['percentage'] / 100
# Pass the df into the xlsxwriter
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', index=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Define the formats
cell_format1 = workbook.add_format({'num_format': '$#,##0'})
cell_format2 = workbook.add_format({'num_format': '0%'})
# Set the columns width and format
worksheet.set_column('C:C', 12, cell_format1)
worksheet.set_column('D:D', 12, cell_format2)
# Write the file
writer.close()
Output:
For more info about xlsxwriter's format class have a look here, it really has everything you need.
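To see why the values in the question arrive in Excel as text, note that Python's string formatting always returns a str. The sketch below uses made-up sample values (price, share) purely for illustration.

```python
price = 23
share = 0.24

# Formatting in Python produces strings, which Excel stores as text.
price_text = "${:,.0f}".format(price)
share_text = format(share, ".0%")

print(type(price_text).__name__, price_text)
print(type(share_text).__name__, share_text)

# Keeping the raw numbers and applying an Excel number format
# (as in the answer above) leaves the cells numeric.
print(type(price).__name__, type(share).__name__)
```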
I am working on a project where I am writing out onto an xlsx spreadsheet and need to format the one column for 'Date'. I get the program to run and all but the column format is still set to 'General'.
Try this in a different way with different code to see if anyone answers.:
for row in cur.execute('''SELECT `Mapline`,`Plant`,`Date`,`Action` from AEReport'''):
lengthOfHeadings = len(row)
output = '%s-%s.xlsx' % ("AEReport",now.strftime("%m%d%Y-%H%M"))
workbook = xlsxwriter.Workbook(output, {'strings_to_numbers':True})
worksheet = workbook.add_worksheet()
format=workbook.add_format({'font_size':'8','border':True})
format2=workbook.add_format({'font_size':'8','border':True,'num_format':'mm/dd/yy hh:mm'})
count = 0
for name in range(0,lengthOfHeadings):
if name==row[2]:
name=int(name)
worksheet.write(counter, count, row[name],format2)
else:
worksheet.write(counter, count, row[name],format)
count += 1
counter += 1
To get the datetime format working, you have to convert the date value to an Excel serial date value.
Here is an example showing how it works:
import pandas as pd
data = pd.DataFrame({'test_date':pd.date_range('1/1/2011', periods=12, freq='M') })
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
# Convert to Excel serial day numbers; 1899-12-30 is the effective epoch of Excel's 1900 date system
data['test_date'] = (data['test_date'] - pd.Timestamp(1899, 12, 30)).dt.days
data.to_excel(writer, sheet_name='test', index=False)
workbook = writer.book
worksheet = writer.sheets['test']
formatdict = {'num_format':'mm/dd/yyyy'}
fmt = workbook.add_format(formatdict)
worksheet.set_column('A:A', None, fmt)
writer.close()
This is how the output will look:
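The serial-date conversion mentioned above can be sketched without pandas. This assumes Excel's default 1900 date system and dates on or after 1900-03-01, for which 1899-12-30 acts as day zero; to_excel_serial is a hypothetical helper name.

```python
from datetime import date, datetime

# Effective day zero of Excel's 1900 date system for dates from 1900-03-01 on.
EXCEL_EPOCH = datetime(1899, 12, 30)


def to_excel_serial(value) -> float:
    """Convert a date or datetime to an Excel serial number (hypothetical helper)."""
    if isinstance(value, date) and not isinstance(value, datetime):
        # Promote plain dates to midnight datetimes.
        value = datetime(value.year, value.month, value.day)
    delta = value - EXCEL_EPOCH
    # Whole days plus the time of day as a fraction of a day.
    return delta.days + delta.seconds / 86400


print(to_excel_serial(date(2000, 1, 1)))
print(to_excel_serial(datetime(2011, 1, 1, 12, 0)))
```

The integer part is the day count and the fractional part is the time of day, which is why a date or time number format is all Excel needs to display the value correctly.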
from datetime import datetime
date_format = workbook.add_format({'num_format':'yyyy-mm-dd hh:mm:ss'})
worksheet.write(0, 0, datetime.today(),date_format)
result:
date = workbook.add_format({'num_format': 'dd-mm-yyyy'})
worksheet.write(1, 1 , 02-12-199, date)
I am trying to write out my pandas table using xlsxwriter. I have two columns:
Date | Time
10/10/2015 8:57
11/10/2015 10:23
But when I use xlsxwriter, the output is:
Date | Time
10/10/2015 0.63575435
11/10/2015 0.33256774
I tried using datetime_format = 'hh:mm:ss' but this didn't change it. How else can I get the time to format correctly without affecting the date column?
The following code works for me, but there are some caveats. Whether the custom formatting works depends on the Windows/Excel version you open the file with, because Excel's custom formatting depends on the language settings of the Windows OS.
Excel custom formatting
Windows date/time settings
So yeah, not the best solution... but the idea is to change the formatting for each column instead of changing how to interpret a type of data for the whole excel file that is being created.
import pandas as pd
from datetime import datetime, date
# Create a Pandas dataframe from some datetime data.
df = pd.DataFrame({'Date and time': [date(2015, 1, 1),
date(2015, 1, 2),
date(2015, 1, 3),
date(2015, 1, 4),
date(2015, 1, 5)],
'Time only': ["11:30:55",
"1:20:33",
"11:10:00",
"16:45:35",
"12:10:15"],
})
df['Time only'] = df['Time only'].apply(pd.to_timedelta)
#df['Date and time'] = df['Date and time'].apply(pd.to_datetime)
# Create a Pandas Excel writer using XlsxWriter as the engine.
# Also set the default datetime and date formats.
writer = pd.ExcelWriter("pandas_datetime.xlsx",
engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')
# Get the xlsxwriter workbook and worksheet objects in order to set the column
# widths, to make the dates clearer.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
#PLAY AROUND WITH THE NUM_FORMAT, IT DEPENDS ON YOUR WINDOWS AND EXCEL DATE/TIME SETTINGS WHAT WILL WORK
# Add some cell formats.
format1 = workbook.add_format({'num_format': 'd-mmm-yy'})
format2 = workbook.add_format({'num_format': "h:mm:ss"})
# Set the format
worksheet.set_column('B:B', None, format1)
worksheet.set_column('C:C', None, format2)
worksheet.set_column('B:C', 20)
# Close the Pandas Excel writer and output the Excel file.
writer.close()
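The decimal values shown in the question's Time column are fractions of a day, which is how Excel stores times. As a rough sketch, the standard library can turn such a fraction back into a clock string; fraction_to_hms is a hypothetical helper and it rounds to the nearest second.

```python
def fraction_to_hms(day_fraction: float) -> str:
    """Render a fraction of a day as HH:MM:SS (hypothetical helper)."""
    total_seconds = round(day_fraction * 86400)  # 86400 seconds per day
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


print(fraction_to_hms(0.5))
print(fraction_to_hms(4068 / 86400))
```

Excel does the same thing when a time number format such as h:mm:ss is applied to the cell, so the stored number does not need to change.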