I need to convert dates into Excel serial numbers for a data munging script I am writing. By playing with dates in my OpenOffice Calc workbook, I was able to deduce that '1-Jan 1899 00:00:00' maps to the number zero.
I wrote the following function to convert from a python datetime object into an Excel serial number:
def excel_date(date1):
    temp = dt.datetime.strptime('18990101', '%Y%m%d')
    delta = date1 - temp
    total_seconds = delta.days * 86400 + delta.seconds
    return total_seconds
However, when I try some sample dates, the numbers differ from those I get when I format the date as a number in Excel (well, OpenOffice Calc). For example, testing '2009-03-20' gives 3478032000 in Python, whilst Excel renders the serial number as 39892.
What is wrong with the formula above?
*Note: I am using Python 2.6.3, so I do not have access to timedelta.total_seconds()
It appears that the Excel "serial date" format is actually the number of days since 1900-01-00, with a fractional component that's a fraction of a day, based on http://www.cpearson.com/excel/datetime.htm. (I guess that date should actually be considered 1899-12-31, since there's no such thing as a 0th day of a month)
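So, it seems like it should be the following (the epoch ends up one day earlier again, on 30 December, because Excel also counts a nonexistent 29 February 1900):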
def excel_date(date1):
    temp = dt.datetime(1899, 12, 30)  # Note, not 31st Dec but 30th!
    delta = date1 - temp
    return float(delta.days) + (float(delta.seconds) / 86400)
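As a quick check against the date from the question, here is a self-contained sketch. It restates the function above with its import so that it runs on its own; the sample dates are only illustrative.

```python
import datetime as dt

def excel_date(date1):
    # Serial 0 corresponds to 1899-12-30 for dates from 1900-03-01 onwards.
    epoch = dt.datetime(1899, 12, 30)
    delta = date1 - epoch
    # Whole days plus the elapsed part of the day as a fraction.
    return float(delta.days) + float(delta.seconds) / 86400

print(excel_date(dt.datetime(2009, 3, 20)))         # 39892.0
print(excel_date(dt.datetime(2009, 3, 20, 12, 0)))  # 39892.5
```

The first value matches the 39892 that the spreadsheet shows for 2009-03-20.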
While this is not exactly relevant to the excel serial date format, this was the top hit for exporting python date time to Excel. What I have found particularly useful and simple is to just export using strftime.
import datetime
current_datetime = datetime.datetime.now()
current_datetime.strftime('%x %X')
This will output in the following format '06/25/14 09:59:29' which is accepted by Excel as a valid date/time and allows for sorting in Excel.
If the goal is the serial number that Excel's DATEVALUE() returns for a date, the toordinal() method can be used. Python ordinals start from 1 January of year 1, whereas Excel starts from 1 January 1900, so apply an offset. Also see the Excel 1900 leap year bug (https://support.microsoft.com/en-us/help/214326/excel-incorrectly-assumes-that-the-year-1900-is-a-leap-year)
from datetime import date

def convert_date_to_excel_ordinal(day, month, year):
    offset = 693594  # ordinal of 1899-12-30
    current = date(year, month, day)
    n = current.toordinal()
    return n - offset
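The fixed offset only matches Excel for dates from 1 March 1900 onwards, because Excel counts a 29 February 1900 that never existed. A minimal sketch of one way to handle the earlier dates too; the name date_to_excel_serial is hypothetical, and the 1900 date system is assumed:

```python
from datetime import date

EXCEL_OFFSET = date(1899, 12, 30).toordinal()  # 693594

def date_to_excel_serial(d):
    serial = d.toordinal() - EXCEL_OFFSET
    # Excel assigns serial 60 to the nonexistent 1900-02-29, so real dates
    # before 1900-03-01 sit one lower than the plain offset suggests.
    if d < date(1900, 3, 1):
        serial -= 1
    return serial

print(date_to_excel_serial(date(1900, 1, 1)))   # 1
print(date_to_excel_serial(date(1900, 3, 1)))   # 61
print(date_to_excel_serial(date(2009, 3, 20)))  # 39892
```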
With the 3rd party xlrd.xldate module, you can supply a tuple structured as (year, month, day, hour, minute, second) and, if necessary, calculate a day fraction from any microseconds component:
from datetime import datetime
from xlrd import xldate
from operator import attrgetter

def excel_date(input_date):
    components = ('year', 'month', 'day', 'hour', 'minute', 'second')
    frac = input_date.microsecond / (86400 * 10**6)  # divide by microseconds in one day
    return xldate.xldate_from_datetime_tuple(attrgetter(*components)(input_date), 0) + frac

res = excel_date(datetime(1900, 3, 1, 12, 0, 0, 5*10**5))
# 61.50000578703704
Following @akgood's answer: when the datetime is before 1/0/1900, the return value is wrong. The corrected return expression may be:
def excel_date(date1):
    temp = dt.datetime(1899, 12, 30)  # Note, not 31st Dec but 30th!
    delta = date1 - temp
    return float(delta.days) + ((-1.0 if delta.days < 0 else 1.0) * delta.seconds) / 86400
This worked when I tested using the csv package to create a spreadsheet:
from datetime import datetime
def excel_date(date1):
    return date1.strftime('%x %-I:%M:%S %p')
now = datetime.now()
current_datetime=now.strftime('%x %-I:%M:%S %p')
time_data.append(excel_date(datetime.now()))
...
I have a value, 38142, that I need to convert into a date using Python.
If I enter this number in Excel, right-click and choose Format Cells, the value is displayed as 04/06/2004. I need the same result in Python. How can I achieve this?
The offset in Excel is the number of days since 1900/01/01, with 1 being the first of January 1900, so add the number of days as a timedelta to 1899/12/31:
from datetime import datetime, timedelta

def from_excel_ordinal(ordinal: float, _epoch0=datetime(1899, 12, 31)) -> datetime:
    if ordinal >= 60:
        ordinal -= 1  # Excel leap year bug, 1900 is not a leap year!
    return (_epoch0 + timedelta(days=ordinal)).replace(microsecond=0)
You have to adjust the ordinal by one day for any date after 1900/02/28; Excel has inherited a leap year bug from Lotus 1-2-3 and treats 1900 as a leap year. The code above returns datetime(1900, 2, 28, 0, 0) for both 59 and 60 to correct for this, with fractional values in the range [59.0 - 61.0) all being a time between 00:00:00.0 and 23:59:59.999999 on that day.
The above also supports serials with a fraction to represent time, but since Excel doesn't support microseconds those are dropped.
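For serials of 61 and above (1 March 1900 onwards), the 1899/12/31 epoch and the one-day leap year correction collapse into a single epoch of 1899/12/30. A minimal sketch under that assumption; serial_to_datetime is a hypothetical helper name, not part of any library:

```python
from datetime import datetime, timedelta

def serial_to_datetime(serial):
    # Assumes the 1900 date system and serial >= 61, so the leap year
    # correction is already folded into the 1899-12-30 epoch.
    return datetime(1899, 12, 30) + timedelta(days=serial)

print(serial_to_datetime(38142))    # 2004-06-04 00:00:00
print(serial_to_datetime(39892.5))  # 2009-03-20 12:00:00
```

The first result is the 04/06/2004 (day/month/year) that Excel shows for 38142.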
from datetime import datetime, timedelta

def from_excel_ordinal(ordinal, epoch=datetime(1900, 1, 1)):
    # Adapted from above, thanks to @Martijn Pieters
    if ordinal > 59:
        ordinal -= 1  # Excel leap year bug, 1900 is not a leap year!
    inDays = int(ordinal)
    frac = ordinal - inDays
    inSecs = int(round(frac * 86400.0))
    return epoch + timedelta(days=inDays - 1, seconds=inSecs)  # epoch is day 1

excelDT = 42548.75001  # Float representation of 27/06/2016 6:00:01 PM in Excel format
pyDT = from_excel_ordinal(excelDT)
The above answer is fine for just a date value, but here I extend the above solution to include time and return a datetime value as well.
I would recommend the following:
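import pandas as pd

def convert_excel_time(excel_time):
    # Epoch is 1899-12-30, not 1900-01-01: serial 1 is 1900-01-01 and Excel also counts a nonexistent 1900-02-29
    return pd.to_datetime('1899-12-30') + pd.to_timedelta(excel_time, 'D')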
Or
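import datetime

def xldate_to_datetime(xldate):
    # Epoch is 1899-12-30, not 1900-01-01: serial 1 is 1900-01-01 and Excel also counts a nonexistent 1900-02-29
    temp = datetime.datetime(1899, 12, 30)
    delta = datetime.timedelta(days=xldate)
    return temp + delta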
Is taken from
https://gist.github.com/oag335/9959241
I came to this question when trying to do the same above, but for entire columns within a df. I made this function, which did it for me:
import pandas as pd
from datetime import datetime, timedelta
import copy as cp
def xlDateConv(df, *cols):
    tempDt = []
    fin = cp.deepcopy(df)
    for col in [*cols]:
        for i in range(len(fin[col])):
            tempDate = datetime(1899, 12, 30)  # serial 0 in the 1900 date system (valid from 1900-03-01 on)
            delta = timedelta(float(fin[col][i]))
            tempDt.append(pd.to_datetime(tempDate + delta))
        fin[col] = tempDt
        tempDt = []
    return fin
Note that you need to type each column, quoted (as string), as one parameter, which can most likely be improved (list of columns as input, for instance). Also, it returns a copy of the original df (doesn't change the original).
Btw, partly inspired by this (https://gist.github.com/oag335/9959241).
If you are working with Pandas this could be useful
import xlrd
import datetime as dt
def from_excel_datetime(x):
    return dt.datetime(*xlrd.xldate_as_tuple(x, datemode=0))
df['date'] = df.excel_date.map(from_excel_datetime)
If the dates seem to be about 4 years off, the workbook may use the 1904 date system; try datemode=1.
:param datemode: 0: 1900-based, 1: 1904-based.
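To illustrate what the two date modes mean without xlrd, here is a rough sketch that assumes serial 0 is 1899-12-30 in the 1900 system (for serials of 61 and above) and 1904-01-01 in the 1904 system. The names EPOCHS and serial_to_datetime are hypothetical:

```python
from datetime import datetime, timedelta

# Assumed epochs for the two Excel date systems (0: 1900-based, 1: 1904-based).
EPOCHS = {0: datetime(1899, 12, 30), 1: datetime(1904, 1, 1)}

def serial_to_datetime(serial, datemode=0):
    return EPOCHS[datemode] + timedelta(days=serial)

# The same calendar day has serials that differ by 1462 days between the systems.
print(serial_to_datetime(39892, datemode=0))  # 2009-03-20 00:00:00
print(serial_to_datetime(38430, datemode=1))  # 2009-03-20 00:00:00
```

Reading a 1904-based serial with datemode 0 therefore lands four years and one day too early.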
My code is the following:
date = datetime.datetime.now()- datetime.datetime.now()
print date
h, m , s = str(date).split(':')
When I print h the result is:
-1 day, 23
How do I get only the hour (the 23) from the subtraction using datetime?
Thanks.
If you subtract the current date from a past date, you would get a negative timedelta value.
You can get the seconds with td.seconds and corresponding hour value via just dividing by 3600.
from datetime import datetime
import time
date1 = datetime.now()
time.sleep(3)
date2 = datetime.now()
# timedelta object
td = date2 - date1
print(td.days, td.seconds // 3600, td.seconds)
# 0 0 3
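The "-1 day, 23" in the question comes from how a negative timedelta is normalised: days goes negative while seconds stays between 0 and 86399. A small sketch, assuming what is wanted is the magnitude of the difference; split_hms is a hypothetical helper name:

```python
from datetime import timedelta

def split_hms(td):
    # Work on the absolute duration so the days/seconds normalisation
    # of negative timedeltas does not matter.
    total = int(abs(td).total_seconds())
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return hours, minutes, seconds

td = timedelta(hours=-1)    # e.g. an earlier time minus a later one
print(td.days, td.seconds)  # -1 82800
print(td)                   # -1 day, 23:00:00
print(split_hms(td))        # (1, 0, 0)
```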
You're not too far off but you should just ask your question as opposed to a question with a "real scenario" later as those are often two very different questions. That way you get an answer to your actual question.
All that said, rather than jumping through hoops by splitting the string form of the datetime object and then searching the pieces for what you need, it's better to know what datetime can do, since it is such a common part of coding. You would also do well to look at timedelta (which is part of datetime) and, if you use pandas, Timestamp.
from datetime import datetime
date = datetime.now()
print(date)
print(date.hour)
I can get you the hour of datetime.datetime.now()
You could try indexing a list of a string of datetime.datetime.now():
print(list(str(datetime.datetime.now()))[11] + list(str(datetime.datetime.now()))[12])
Output (in my case when tested):
09
Hope I am of help!
I have spent some time trying to figure out how to get a time delta between time values. The only issue is that one of the times was stored in a file. So I have one string which is in essence str(datetime.datetime.now()) and datetime.datetime.now().
Specifically, I am having issues getting a delta because one of the objects is a datetime object and the other is a string.
I think the answer is that I need to get the string back in a datetime object for the delta to work.
I have looked at some of the other Stack Overflow questions relating to this including the following:
Python - Date & Time Comparison using timestamps, timedelta
Comparing a time delta in python
Convert string into datetime.time object
Converting string into datetime
Example code is as follows:
f = open('date.txt', 'r+')
line = f.readline()
date = line[:26]
now = datetime.datetime.now()
then = time.strptime(date)
delta = now - then # This does not work
Can anyone tell me where I am going wrong?
For reference, the first 26 characters are acquired from the first line of the file because this is how I am storing time e.g.
f.write(str(datetime.datetime.now()))
Which would write the following:
2014-01-05 13:09:42.348000
time.strptime returns a struct_time.
datetime.datetime.now() returns a datetime object.
The two can not be subtracted directly.
Instead of time.strptime you could use datetime.datetime.strptime, which returns a datetime object. Then you could subtract now and then.
For example,
import datetime as DT
now = DT.datetime.now()
then = DT.datetime.strptime('2014-1-2', '%Y-%m-%d')
delta = now - then
print(delta)
# 3 days, 8:17:14.428035
By the way, you need to supply a date format string to time.strptime or DT.datetime.strptime.
time.strptime(date)
should have raised a ValueError.
It looks like your date string is 26 characters long. That might mean you have a date string like 'Fri, 10 Jun 2011 11:04:17 '.
If that is true, you may want to parse it like this:
then = DT.datetime.strptime('Fri, 10 Jun 2011 11:04:17 '.strip(), "%a, %d %b %Y %H:%M:%S")
print(then)
# 2011-06-10 11:04:17
There is a table describing the available directives (like %Y, %m, etc.) in the "strftime() and strptime() Behavior" section of the Python datetime documentation.
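If the file really holds the output of str(datetime.datetime.now()), as in the question's example line, a format with %f parses it directly. A minimal sketch; parse_stored is a hypothetical helper, and the fixed "now" value is only there to make the result reproducible:

```python
import datetime as DT

def parse_stored(text):
    text = text.strip()
    # str(datetime) omits the fractional part when microsecond == 0,
    # so pick the format depending on whether a '.' is present.
    fmt = '%Y-%m-%d %H:%M:%S.%f' if '.' in text else '%Y-%m-%d %H:%M:%S'
    return DT.datetime.strptime(text, fmt)

then = parse_stored('2014-01-05 13:09:42.348000')
now = DT.datetime(2014, 1, 5, 14, 9, 42, 348000)  # stand-in for DT.datetime.now()
print(now - then)  # 1:00:00
```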
Try this:
import time
import datetime
d = datetime.datetime.now()
now = time.mktime(d.timetuple())
And then apply the delta
If you have the year, month and day of 'then', you may use:
year = 2013
month = 1
day = 1
now_date = datetime.datetime.now()
then_date = now_date.replace(year = year, month = month, day = day)
delta = now_date - then_date
I need to parse strings representing 6-digit dates in the format yymmdd where yy ranges from 59 to 05 (1959 to 2005). According to the time module docs, Python's default pivot year is 1969 which won't work for me.
Is there an easy way to override the pivot year, or can you suggest some other solution? I am using Python 2.7. Thanks!
I'd use datetime and parse it out normally. Then I'd use datetime.datetime.replace on the object if it is past your ceiling date -- Adjusting it back 100 yrs.:
import datetime

dd = datetime.datetime.strptime(date, '%y%m%d')
if dd.year > 2005:
    dd = dd.replace(year=dd.year - 100)
Prepend the century to your date using your own pivot:
year = int(date[0:2])
if 59 <= year <= 99:
    date = '19' + date
else:
    date = '20' + date
and then use strptime with the %Y directive instead of %y.
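Put together, the century-prefix approach might look like the sketch below. The function name parse_yymmdd and the pivot parameter are hypothetical; the default of 59 comes from the 1959 to 2005 range in the question.

```python
import datetime

def parse_yymmdd(text, pivot=59):
    # Two-digit years at or above the pivot are read as 19xx, the rest as 20xx.
    century = '19' if int(text[:2]) >= pivot else '20'
    return datetime.datetime.strptime(century + text, '%Y%m%d')

print(parse_yymmdd('590101'))  # 1959-01-01 00:00:00
print(parse_yymmdd('050630'))  # 2005-06-30 00:00:00
```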
import datetime

date = '20-Apr-53'
dt = datetime.datetime.strptime(date, '%d-%b-%y')  # %y parses '53' as 2053
if dt.year > 2000:
    dt = dt.replace(year=dt.year - 100)  # 2053 -> 1953
print(dt.strftime('%Y-%m-%d'))
You can also perform the following:
today=datetime.datetime.today().strftime("%m/%d/%Y")
today=today[:-4]+today[-2:]
Recently had a similar case, ended up with this basic calculation and logic:
pivotyear = 1969
century = int(str(pivotyear)[:2]) * 100

def year_2to4_digit(year):
    return century + year if century + year > pivotyear else (century + 100) + year
If you are dealing with very recent dates as well as very old dates and want to use the current date as a pivot (not just the current year), try this code:
import datetime

def parse_date(date_str):
    parsed = datetime.datetime.strptime(date_str, '%y%m%d')
    current_date = datetime.datetime.now()
    if parsed > current_date:
        parsed = parsed.replace(year=parsed.year - 100)
    return parsed
I have an excel file with dates formatted as such:
22.10.07 16:00
22.10.07 17:00
22.10.07 18:00
22.10.07 19:00
After using the parse method of pandas to read the data, the dates are read almost correctly:
In [55]: nts.data['Tid'][10000:10005]
Out[55]:
10000 2007-10-22 15:59:59.997905
10001 2007-10-22 16:59:59.997904
10002 2007-10-22 17:59:59.997904
10003 2007-10-22 18:59:59.997904
What do I need to do to either a) get it to work correctly, or b) is there a trick to fix this easily? (e.g. some kind of 'round' function for datetime)
I encountered the same issue and got around it by not parsing the dates using Pandas, but rather applying my own function (shown below) to the relevant column(s) of the dataframe:
import datetime as dt
import pandas as pd

def ExcelDateToDateTime(xlDate):
    epoch = dt.datetime(1899, 12, 30)
    delta = dt.timedelta(hours=round(xlDate * 24))
    return epoch + delta

df = pd.read_csv('path')  # DataFrame.from_csv was removed in pandas 1.0
df['Date'] = df['Date'].apply(ExcelDateToDateTime)
Note: This will ignore any time granularity below the hour level, but that's all I need, and it looks from your example that this could be the case for you too.
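If minutes and seconds matter, the same idea works with rounding to the nearest second instead of the nearest hour. A minimal sketch, assuming the 1900 date system; the name excel_to_datetime is hypothetical:

```python
import datetime as dt

def excel_to_datetime(xl_date):
    epoch = dt.datetime(1899, 12, 30)
    # Round the serial to whole seconds to absorb floating-point error
    # in the fractional (time-of-day) part.
    return epoch + dt.timedelta(seconds=round(xl_date * 86400))

print(excel_to_datetime(38961.666666666628))  # 2006-09-01 16:00:00
print(excel_to_datetime(38961.708333333292))  # 2006-09-01 17:00:00
```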
Excel serializes datetimes with a ddddd.tttttt format, where the d part is an integer number representing the offset from a reference day (effectively Dec 30th, 1899 in the default 1900 date system), and the t part is a fraction between 0.0 and 1.0 which stands for the part of the day at the given time (for example at 12:00 it's 0.5, at 18:00 it's 0.75 and so on).
I asked you to upload a file with sample data. .xlsx files are really ZIP archives which contain your XML-serialized worksheets. These are the dates I extracted from the relevant column. Excerpt:
38961.666666666628
38961.708333333292
38961.749999999956
When you try to manually deserialize, you get the same datetimes as pandas. Unfortunately, the way Excel stores times makes it impossible to represent some values exactly, so you have to round them for display purposes. I'm not sure if rounded data is needed for analysis, though.
This is the script I used to test that the deserialized datetimes are really the same ones pandas gives:
from datetime import date, datetime, time, timedelta
from urllib2 import urlopen

def deserialize(text):
    tokens = text.split(".")
    date_tok = tokens[0]
    time_tok = tokens[1] if len(tokens) == 2 else "0"
    # 1899-12-30 rather than 1899-12-31: Excel also counts a nonexistent 1900-02-29
    d = date(1899, 12, 30) + timedelta(int(date_tok))
    t = time(*helper(float("0." + time_tok), (24, 60, 60, 1000000)))
    return datetime.combine(d, t)

def helper(factor, units):
    # Split a fraction of a day into hours, minutes, seconds, microseconds
    result = list()
    for unit in units:
        value, factor = divmod(factor * unit, 1)
        result.append(int(value))
    return result

url = "https://gist.github.com/RaffaeleSgarro/877d7449bd19722b44cb/raw/" \
      "45d5f0b339d4abf3359fe673fcd2976374ed61b8/dates.txt"
for line in urlopen(url):
    print deserialize(line)