Python converting date from 5 digit code wrong [duplicate]

I have the value 38142 and need to convert it to a date using Python.
If I enter this number in Excel and format the cell as a date, it is shown as 04/06/2004. I need the same result in Python. How can I achieve this?

The offset in Excel is the number of days since 1900/01/01, with 1 being the first of January 1900, so add the number of days as a timedelta to 1899/12/31:
from datetime import datetime, timedelta
def from_excel_ordinal(ordinal: float, _epoch0=datetime(1899, 12, 31)) -> datetime:
    if ordinal >= 60:
        ordinal -= 1 # Excel leap year bug, 1900 is not a leap year!
    return (_epoch0 + timedelta(days=ordinal)).replace(microsecond=0)
You have to adjust the ordinal by one day for any date after 1900/02/28; Excel has inherited a leap year bug from Lotus 1-2-3 and treats 1900 as a leap year. The code above returns datetime(1900, 2, 28, 0, 0) for both 59 and 60 to correct for this, with fractional values in the range [59.0 - 61.0) all being a time between 00:00:00.0 and 23:59:59.999999 on that day.
The above also supports serials with a fraction to represent time, but since Excel doesn't support microseconds those are dropped.
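As a sketch of a common shortcut: for serials of 61 and above (1 March 1900 onward), the leap-year adjustment and the 1899/12/31 epoch collapse into a single epoch of 1899/12/30. The name excel_serial_to_datetime is made up for this illustration, and the shortcut assumes the 1900 date system and no dates in January or February 1900.

```python
from datetime import datetime, timedelta

# Assumption: serial >= 61, so Excel's fictitious 1900-02-29 has already
# been counted and can be absorbed into the epoch.
_EXCEL_EPOCH = datetime(1899, 12, 30)

def excel_serial_to_datetime(serial):
    """Convert an Excel 1900-system serial (>= 61) to a datetime."""
    if serial < 61:
        raise ValueError("shortcut is only valid for serials >= 61")
    return _EXCEL_EPOCH + timedelta(days=serial)

print(excel_serial_to_datetime(38142))  # 2004-06-04 00:00:00
```

For the value in the question this gives 4 June 2004, which matches what Excel displays.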

from datetime import datetime, timedelta
def from_excel_ordinal(ordinal, epoch=datetime(1900, 1, 1)):
    # Adapted from above, thanks to @Martijn Pieters
    if ordinal > 59:
        ordinal -= 1 # Excel leap year bug, 1900 is not a leap year!
    inDays = int(ordinal)
    frac = ordinal - inDays
    inSecs = int(round(frac * 86400.0))
    return epoch + timedelta(days=inDays - 1, seconds=inSecs) # epoch is day 1
excelDT = 42548.75001 # Float representation of 27/06/2016 6:00:01 PM in Excel format
pyDT = from_excel_ordinal(excelDT)
The above answer is fine for just a date value, but here I extend it to include the time and return a datetime value as well.

I would recommend the following:
import pandas as pd
def convert_excel_time(excel_time):
    # Origin is 1899-12-30 so that serials from 61 (1900-03-01) onward match Excel
    return pd.to_datetime('1899-12-30') + pd.to_timedelta(excel_time, 'D')
Or
import datetime
def xldate_to_datetime(xldate):
    # 1899-12-30 rather than 1900-01-01: Excel counts from 1 and includes a nonexistent 1900-02-29
    temp = datetime.datetime(1899, 12, 30)
    delta = datetime.timedelta(days=xldate)
    return temp + delta
Adapted from
https://gist.github.com/oag335/9959241

I came to this question while trying to do the same as above, but for entire columns within a df. I made this function, which did it for me:
import pandas as pd
from datetime import datetime, timedelta
import copy as cp
def xlDateConv(df, *cols):
    tempDt = []
    fin = cp.deepcopy(df)
    for col in [*cols]:
        for i in range(len(fin[col])):
            # 1899-12-30 so that serials from 61 onward match Excel's dates
            tempDate = datetime(1899, 12, 30)
            delta = timedelta(float(fin[col][i]))
            tempDt.append(pd.to_datetime(tempDate+delta))
        fin[col] = tempDt
        tempDt = []
    return fin
Note that you need to type each column, quoted (as string), as one parameter, which can most likely be improved (list of columns as input, for instance). Also, it returns a copy of the original df (doesn't change the original).
Btw, partly inspired by this (https://gist.github.com/oag335/9959241).
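A vectorized variant might look like the sketch below. It relies on the unit and origin parameters of pd.to_datetime; the function name excel_columns_to_datetime and the sample column name are invented for illustration, and the origin 1899-12-30 assumes the 1900 date system with serials of 61 or more.

```python
import pandas as pd

def excel_columns_to_datetime(df, cols):
    """Return a copy of df with the given Excel-serial columns converted."""
    out = df.copy()
    for col in cols:
        # Assumption: 1900 date system and serials >= 61
        out[col] = pd.to_datetime(out[col], unit="D", origin="1899-12-30")
    return out

# Hypothetical sample data: one date-only serial, one with a time fraction
df = pd.DataFrame({"opened": [38142, 42548.75]})
converted = excel_columns_to_datetime(df, ["opened"])
print(converted)
```

This takes the columns as a list and leaves the original frame untouched, which is the improvement suggested in the note above.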

If you are working with Pandas this could be useful
import xlrd
import datetime as dt
def from_excel_datetime(x):
    return dt.datetime(*xlrd.xldate_as_tuple(x, datemode=0))
df['date'] = df.excel_date.map(from_excel_datetime)
If the dates appear to be off by four years, try datemode=1.
:param datemode: 0: 1900-based, 1: 1904-based.
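For comparison, here is a minimal pure-Python sketch of the 1904-based system, assuming (as Microsoft documents) that its serial 0 is 1 January 1904 and that the same calendar day is 1462 lower than in the 1900 system. The function name from_excel_1904 is hypothetical.

```python
from datetime import datetime, timedelta

def from_excel_1904(serial):
    """Convert a serial from Excel's 1904 date system (day 0 = 1904-01-01)."""
    return datetime(1904, 1, 1) + timedelta(days=serial)

# 38142 in the 1900 system corresponds to 38142 - 1462 in the 1904 system.
print(from_excel_1904(38142 - 1462))  # 2004-06-04 00:00:00
```

Reading a 1904-based serial with the 1900-based rules shifts every date by those 1462 days, which is the four-year offset mentioned above.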

Related

Changing a string with dates into days after Jan. 1, 2010

For the function below, I am inputting strings like "6/29/2020" and "8/10/2010" and I want to get the number of days after Jan. 1, 2010. For example, if I input "1/29/2010", I want the integer 29 to be returned.
Currently, I have gotten "6/29/2020" to a string "2020-06-29". Now I just need help with converting that string into the days after Jan. 1, 2010.
I feel like I have posted everything needed for you to help, but if you need more information, let me know. Thank You for helping me with this problem.
def day_conversion(dates):
    import datetime
    i = 0
    for day in dates:
        day = day.split('/')
        if len(day[0]) == 1:
            day[0] = f"0{day[0]}"
        if len(day[1]) == 1:
            day[1] = f"0{day[1]}"
        day = f"{day[2]}-{day[0]}-{day[1]}"
        # day = date.format(day)
        # from datetime import date
        # day0 = date(2000, 1, 1)
        # day = day - day0
        dates[i] = day
        i += 1
    return dates

datetime has a function for parsing dates, and subtracting two datetime objects gives a timedelta object with a .days attribute:
from datetime import datetime
def days_since_jan1_2010(date):
    dt = datetime.strptime(date, '%m/%d/%Y')
    diff = dt - datetime(2010, 1, 1)
    return diff.days
def day_conversion(dates):
    return [days_since_jan1_2010(d) for d in dates]
print(day_conversion(['6/29/2020', '8/10/2010', '1/1/2010', '1/2/2010']))
Output:
[3832, 221, 0, 1]

Everything in the previous answer is correct, but just thought I'd point out that you were very nearly there if you include the commented out part in your code above except for the following points:
from datetime import date needs to come before you try to use date.
You want date.fromisoformat, not date.format.
Your code has Jan 1 2000 but you state in your question that you want the number of days from Jan 1 2010.
If you substitute the commented part of your original code for the following four lines you should get the result you are after.
from datetime import date
day = date.fromisoformat(day)
day0 = date(2010, 1, 1)
day = day - day0
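Putting those corrections together, the asker's approach could be completed roughly as follows. This is a sketch that returns a new list instead of modifying the input, and it needs Python 3.7+ for date.fromisoformat.

```python
from datetime import date

def day_conversion(dates):
    """Map 'M/D/YYYY' strings to the number of days elapsed since 2010-01-01."""
    day0 = date(2010, 1, 1)
    result = []
    for text in dates:
        month, day, year = text.split("/")
        iso = f"{year}-{int(month):02d}-{int(day):02d}"  # e.g. '2020-06-29'
        result.append((date.fromisoformat(iso) - day0).days)
    return result

print(day_conversion(["6/29/2020", "1/29/2010"]))  # [3832, 28]
```

Note that "1/29/2010" yields 28 elapsed days; add 1 to each result if Jan. 1 itself should count as day 1, as the question's example suggests.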

Get date from week number

What is wrong with my code?
import datetime
d = "2013-W26"
r = datetime.datetime.strptime(d, "%Y-W%W")
print(r)
It prints "2013-01-01 00:00:00". Thanks.

A week number is not enough to generate a date; you need a day of the week as well. Add a default:
import datetime
d = "2013-W26"
r = datetime.datetime.strptime(d + '-1', "%Y-W%W-%w")
print(r)
The -1 and -%w pattern tells the parser to pick the Monday in that week. This outputs:
2013-07-01 00:00:00
%W uses Monday as the first day of the week. While you can pick your own weekday, you may get unexpected results if you deviate from that.
See the strftime() and strptime() behaviour section in the documentation, footnote 4:
When used with the strptime() method, %U and %W are only used in calculations when the day of the week and the year are specified.
Note: if your week number is an ISO week date, you'll want to use %G-W%V-%u instead! Those directives require Python 3.6 or newer.

In Python 3.8 and later there is the handy datetime.date.fromisocalendar:
>>> from datetime import date
>>> date.fromisocalendar(2020, 1, 1) # (year, week, day of week)
datetime.date(2019, 12, 30)
In older Python versions (3.7 and earlier), the information from datetime.date.isocalendar can be used to find the Monday of an ISO 8601 week:
from datetime import date, timedelta
def monday_of_calenderweek(year, week):
    first = date(year, 1, 1)
    base = 1 if first.isocalendar()[1] == 1 else 8
    return first + timedelta(days=base - first.isocalendar()[2] + 7 * (week - 1))
Both also work with datetime.datetime.

To complete the other answers - if you are using ISO week numbers, this string is appropriate (to get the Monday of a given ISO week number):
import datetime
d = '2013-W26'
r = datetime.datetime.strptime(d + '-1', '%G-W%V-%u')
print(r)
%G, %V, %u are ISO equivalents of %Y, %W, %w, so this outputs:
2013-06-24 00:00:00
Available in Python 3.6+, according to the docs.

import datetime
res = datetime.datetime.strptime("2018 W30 w1", "%Y %W w%w")
print(res)
This prints:
2018-07-23 00:00:00
Using 1 as the weekday yields the start (Monday) of that week. Adding timedelta(days=6) gives the end of the week.
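A small sketch of the start/end idea, using the same %W convention (weeks start on Monday). The helper name week_bounds is hypothetical.

```python
from datetime import datetime, timedelta

def week_bounds(year, week):
    """Return (Monday, Sunday) of a %W-numbered week as datetimes."""
    start = datetime.strptime(f"{year} W{week} w1", "%Y W%W w%w")
    return start, start + timedelta(days=6)

start, end = week_bounds(2018, 30)
print(start.date(), end.date())  # 2018-07-23 2018-07-29
```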

If anyone is looking for a simple function that returns all working days (Mo-Fr) dates from a week number consider this (based on accepted answer)
import datetime
def weeknum_to_dates(weeknum):
    return [datetime.datetime.strptime("2021-W"+ str(weeknum) + str(x), "%Y-W%W-%w").strftime('%d.%m.%Y') for x in range(-5,0)]
weeknum_to_dates(37)
Output:
['17.09.2021', '16.09.2021', '15.09.2021', '14.09.2021', '13.09.2021']
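The function above hard-codes 2021 and returns Friday first. A variant that takes the year as a parameter and returns the days in ascending order could look like this sketch (working_days_of_week is a made-up name; it keeps the %W week numbering):

```python
from datetime import datetime

def working_days_of_week(year, weeknum):
    """Monday to Friday of a %W-numbered week, formatted as DD.MM.YYYY."""
    return [
        datetime.strptime(f"{year}-W{weeknum}-{weekday}", "%Y-W%W-%w").strftime("%d.%m.%Y")
        for weekday in range(1, 6)  # %w: 1 = Monday ... 5 = Friday
    ]

print(working_days_of_week(2021, 37))
```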

If you have the week number within the year, just add that number of weeks to the first day of the year.
>>> import datetime
>>> from dateutil.relativedelta import relativedelta
>>> week = 40
>>> year = 2019
>>> date = datetime.date(year,1,1)+relativedelta(weeks=+week)
>>> date
datetime.date(2019, 10, 8)

Another solution which worked for me that accepts series data as opposed to strptime only accepting single string values:
#fw_to_date
import datetime
import pandas as pd
# fw is input in format 'YYYY-WW'
# Add weekday number to string 1 = Monday
fw = fw + '-1'
# dt is output column
# Use %G-%V-%w if input is in ISO format
dt = pd.to_datetime(fw, format='%Y-%W-%w', errors='coerce')

Here's a handy function including the issue with zero-week.

Converting date formats python - Unusual date formats - Extract %Y%M%D

I have a large data set with a variety of Date information in the following formats:
DAYS since Jan 1, 1900 - ex: 41213 - I believe these are from Excel http://www.kirix.com/stratablog/jd-edwards-date-conversions-cyyddd
YYDayofyear - ex 2012265
I am familiar with Python's time module and the strptime() and strftime() methods. However, I am not sure what the date formats above are called, or whether there is a Python module I can use to convert these unusual date formats.
Any idea how to get the %Y%M%D format from these unusual date formats without writing my own calculator?
Thanks.

You can try something like the following:
In [1]: import datetime
In [2]: s = '2012265'
In [3]: datetime.datetime.strptime(s, '%Y%j')
Out[3]: datetime.datetime(2012, 9, 21, 0, 0)
In [4]: d = '41213'
In [5]: datetime.date(1900, 1, 1) + datetime.timedelta(int(d))
Out[5]: datetime.date(2012, 11, 2)
The first one is the trickier one, but it uses the %j parameter to interpret the day of the year you provide (after a four-digit year, represented by %Y). The second one is simply the number of days since January 1, 1900.
This is the general conversion - not sure of your input format but hopefully this can be tweaked to suit it.
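To get the YYYYMMDD strings the question asks for, both conversions can be wrapped as in the sketch below. The function names are invented. The second function assumes the integers really are Excel serials from the 1900 date system (serial 61 or higher), in which case counting starts at 1 and includes a nonexistent 29 February 1900, so it uses 1899-12-30 as the epoch and gives a result two days earlier than the plain "days since 1 January 1900" calculation above.

```python
from datetime import datetime, timedelta

def yyyyddd_to_yyyymmdd(text):
    """'2012265' (four-digit year + day of year) -> '20120921'."""
    return datetime.strptime(text, "%Y%j").strftime("%Y%m%d")

def excel_serial_to_yyyymmdd(serial):
    """Excel 1900-system serial (assumed >= 61) -> 'YYYYMMDD'."""
    return (datetime(1899, 12, 30) + timedelta(days=int(serial))).strftime("%Y%m%d")

print(yyyyddd_to_yyyymmdd("2012265"))     # 20120921
print(excel_serial_to_yyyymmdd("41213"))  # 20121031
```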

On the Excel integer to Python datetime bit:
Note that there are two Excel date systems (one 1-Jan-1900 based and another 1-Jan 1904 based); see https://support.microsoft.com/en-us/help/214330/differences-between-the-1900-and-the-1904-date-system-in-excel for more information.
Also note that the system is NOT zero-based. So, in the 1900 system, 1-Jan-1900 is day 1 (not day 0).
import datetime
EXCEL_DATE_SYSTEM_PC=1900
EXCEL_DATE_SYSTEM_MAC=1904
i = 42129 # Excel number for 5-May-2015
d = datetime.date(EXCEL_DATE_SYSTEM_PC, 1, 1) + datetime.timedelta(i-2)

Both of these formats seem pretty straightforward to work with. The first one, in fact, is just an integer, so why don't you just do something like this?
import datetime
def days_since_jan_1_1900_to_datetime(d):
    return datetime.datetime(1900,1,1) + \
        datetime.timedelta(days=d)
For the second one, the details depend on exactly how the format is defined (e.g. can you always expect 3 digits after the year even when the number of days is less than 100, or is it possible that there are 2 or 1 – and if so, is the year always 4 digits?) but once you've got that part down it can be done very similarly.

According to the documentation (http://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior), the day of the year is "%j", whereas the first case can be solved by toordinal() and fromordinal(): date.fromordinal(date(1900, 1, 1).toordinal() + x)

I'd think timedelta.
import datetime
d = datetime.timedelta(days=41213)
start = datetime.datetime(year=1900, month=1, day=1)
the_date = start + d
For the second one, you can use '2012265'[:4] to get the year and then use the same method.
edit: See the answer with %j for the second.

import pandas as pd
from datetime import datetime
df['timeelapsed'] = (pd.to_datetime(df['timeelapsed'], format='%H:%M:%S') - datetime(1900, 1, 1)).dt.total_seconds()

Converting Matlab's datenum format to Python

I just started moving from Matlab to Python 2.7 and I have some trouble reading my .mat-files. Time information is stored in Matlab's datenum format. For those who are not familiar with it:
A serial date number represents a calendar date as the number of days that has passed since a fixed base date. In MATLAB, serial date number 1 is January 1, 0000.
MATLAB also uses serial time to represent fractions of days beginning at midnight; for example, 6 p.m. equals 0.75 serial days. So the string '31-Oct-2003, 6:00 PM' in MATLAB is date number 731885.75.
(taken from the Matlab documentation)
I would like to convert this to Python's time format and I found this tutorial. In short, the author states that
If you parse this using python's datetime.fromordinal(731965.04835648148) then the result might look reasonable [...]
(before any further conversions), which doesn't work for me, since datetime.fromordinal expects an integer:
>>> datetime.fromordinal(731965.04835648148)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: integer argument expected, got float
While I could just round them down for daily data, I actually need to import minutely time series. Does anyone have a solution for this problem? I would like to avoid reformatting my .mat files since there's a lot of them and my colleagues need to work with them as well.
If it helps, someone else asked for the other way round. Sadly, I'm too new to Python to really understand what is happening there.
/edit (2012-11-01): This has been fixed in the tutorial posted above.

You link to the solution, but it has a small issue. The correct conversion is:
python_datetime = datetime.fromordinal(int(matlab_datenum)) + timedelta(days=matlab_datenum%1) - timedelta(days = 366)
A longer explanation can be found here
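Wrapped as a function and checked against the example from the MATLAB documentation quoted in the question (731885.75 is 31-Oct-2003, 6:00 PM), the one-liner might be used like this. The function name is hypothetical.

```python
from datetime import datetime, timedelta

def matlab_datenum_to_datetime(datenum):
    """Convert a MATLAB datenum (float days) to a Python datetime."""
    whole_days = datetime.fromordinal(int(datenum))
    day_fraction = timedelta(days=datenum % 1)
    # MATLAB counts from year 0, Python from year 1: 366 days apart
    return whole_days + day_fraction - timedelta(days=366)

print(matlab_datenum_to_datetime(731885.75))  # 2003-10-31 18:00:00
```

Fractions that are not exactly representable in binary may come out a few microseconds off, so rounding to the nearest second can be worthwhile for minutely data.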

Using pandas, you can convert a whole array of datenum values with fractional parts:
import numpy as np
import pandas as pd
datenums = np.array([737125, 737124.8, 737124.6, 737124.4, 737124.2, 737124])
timestamps = pd.to_datetime(datenums-719529, unit='D')
The value 719529 is the datenum value of the Unix epoch start (1970-01-01), which is the default origin for pd.to_datetime().
I used the following Matlab code to set this up:
datenum('1970-01-01') % gives 719529
datenums = datenum('06-Mar-2018') - linspace(0,1,6) % test data
datestr(datenums) % human readable format

Just in case it's useful to others, here is a full example of loading time series data from a Matlab mat file, converting a vector of Matlab datenums to a list of datetime objects using carlosdc's answer (defined as a function), and then plotting as time series with Pandas:
from scipy.io import loadmat
import pandas as pd
import datetime as dt
import urllib
# In Matlab, I created this sample 20-day time series:
# t = datenum(2013,8,15,17,11,31) + [0:0.1:20];
# x = sin(t)
# y = cos(t)
# plot(t,x)
# datetick
# save sine.mat
urllib.urlretrieve('http://geoport.whoi.edu/data/sine.mat','sine.mat');
# If you don't use squeeze_me = True, then Pandas doesn't like
# the arrays in the dictionary, because they look like an arrays
# of 1-element arrays. squeeze_me=True fixes that.
mat_dict = loadmat('sine.mat',squeeze_me=True)
# make a new dictionary with just dependent variables we want
# (we handle the time variable separately, below)
my_dict = { k: mat_dict[k] for k in ['x','y']}
def matlab2datetime(matlab_datenum):
    day = dt.datetime.fromordinal(int(matlab_datenum))
    dayfrac = dt.timedelta(days=matlab_datenum%1) - dt.timedelta(days = 366)
    return day + dayfrac
# convert Matlab variable "t" into list of python datetime objects
my_dict['date_time'] = [matlab2datetime(tval) for tval in mat_dict['t']]
# print df
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 201 entries, 2013-08-15 17:11:30.999997 to 2013-09-04 17:11:30.999997
Data columns (total 2 columns):
x 201 non-null values
y 201 non-null values
dtypes: float64(2)
# plot with Pandas
df = pd.DataFrame(my_dict)
df = df.set_index('date_time')
df.plot()

Here's a way to convert these using numpy.datetime64, rather than datetime.
import numpy as np
origin = np.datetime64('0000-01-01', 'D') - np.timedelta64(1, 'D')
date = serdate * np.timedelta64(1, 'D') + origin
This works when serdate is either a single integer or an integer array.
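For datenums with a fractional part, one possible extension is to convert to milliseconds first, as in this sketch. It assumes millisecond resolution is enough and uses 719529 (the datenum of 1970-01-01) as the offset. The names MATLAB_UNIX_EPOCH and datenum_to_datetime64 are made up for the example.

```python
import numpy as np

MATLAB_UNIX_EPOCH = 719529  # datenum('1970-01-01')

def datenum_to_datetime64(datenums):
    """Convert MATLAB datenums (floats allowed) to datetime64[ms] values."""
    days = np.asarray(datenums, dtype="float64") - MATLAB_UNIX_EPOCH
    # Round to whole milliseconds (assumed resolution) before casting
    millis = np.round(days * 86_400_000).astype("int64")
    return np.datetime64("1970-01-01", "ms") + millis.astype("timedelta64[ms]")

result = datenum_to_datetime64([731885.75, 737125])
print(result)
```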

Just building on and adding to previous comments. The key is in the day counting as carried out by the method toordinal and constructor fromordinal in the class datetime and related subclasses. For example, from the Python Library Reference for 2.7, one reads that fromordinal
Return the date corresponding to the proleptic Gregorian ordinal, where January 1 of year 1 has ordinal 1. ValueError is raised unless 1 <= ordinal <= date.max.toordinal().
However, year 0 AD is still one (leap) year to count in, so there are still 366 days that need to be taken into account. (Leap year it was, like 2016 that is exactly 504 four-year cycles ago.)
These are two functions that I have been using for similar purposes:
import datetime
def datetime_pytom(d,t):
    '''
    Input
        d   Date as an instance of type datetime.date
        t   Time as an instance of type datetime.time
    Output
        The fractional day count since 0-Jan-0000 (proleptic ISO calendar)
        This is the 'datenum' datatype in matlab
    Notes on day counting
        matlab: day one is 1 Jan 0000
        python: day one is 1 Jan 0001
        hence an increase of 366 days, for year 0 AD was a leap year
    '''
    dd = d.toordinal() + 366
    tt = datetime.timedelta(hours=t.hour,minutes=t.minute,
                            seconds=t.second)
    tt = datetime.timedelta.total_seconds(tt) / 86400
    return dd + tt
def datetime_mtopy(datenum):
    '''
    Input
        The fractional day count according to datenum datatype in matlab
    Output
        The date and time as an instance of type datetime in python
    Notes on day counting
        matlab: day one is 1 Jan 0000
        python: day one is 1 Jan 0001
        hence a reduction of 366 days, for year 0 AD was a leap year
    '''
    ii = datetime.datetime.fromordinal(int(datenum) - 366)
    ff = datetime.timedelta(days=datenum%1)
    return ii + ff
Hope this helps and happy to be corrected.

How to convert a python datetime.datetime to excel serial date number

I need to convert dates into Excel serial numbers for a data munging script I am writing. By playing with dates in my OpenOffice Calc workbook, I was able to deduce that '1-Jan 1899 00:00:00' maps to the number zero.
I wrote the following function to convert from a python datetime object into an Excel serial number:
def excel_date(date1):
    temp=dt.datetime.strptime('18990101', '%Y%m%d')
    delta=date1-temp
    total_seconds = delta.days * 86400 + delta.seconds
    return total_seconds
However, when I try some sample dates, the numbers are different from those I get when I format the date as a number in Excel (well OpenOffice Calc). For example, testing '2009-03-20' gives 3478032000 in Python, whilst excel renders the serial number as 39892.
What is wrong with the formula above?
*Note: I am using Python 2.6.3, so do not have access to datetime.total_seconds()

It appears that the Excel "serial date" format is actually the number of days since 1900-01-00, with a fractional component that's a fraction of a day, based on http://www.cpearson.com/excel/datetime.htm. (I guess that date should actually be considered 1899-12-31, since there's no such thing as a 0th day of a month)
So, it seems like it should be:
def excel_date(date1):
    temp = dt.datetime(1899, 12, 30) # Note, not 31st Dec but 30th!
    delta = date1 - temp
    return float(delta.days) + (float(delta.seconds) / 86400)
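As a quick check, the function is restated below with its import so that it runs on its own, and tested against the date from the question. This sketch targets Python 3; on Python 2 the float() calls in the original are still needed to avoid integer division.

```python
import datetime as dt

def excel_date(date1):
    """Convert a datetime to an Excel 1900-system serial number."""
    temp = dt.datetime(1899, 12, 30)  # epoch that absorbs Excel's 1900 leap-year bug
    delta = date1 - temp
    return delta.days + delta.seconds / 86400

print(excel_date(dt.datetime(2009, 3, 20)))         # 39892.0
print(excel_date(dt.datetime(2009, 3, 20, 18, 0)))  # 39892.75
```

The first result matches the serial number 39892 reported by the spreadsheet in the question.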

While this is not exactly relevant to the excel serial date format, this was the top hit for exporting python date time to Excel. What I have found particularly useful and simple is to just export using strftime.
import datetime
current_datetime = datetime.datetime.now()
current_datetime.strftime('%x %X')
This will output in the following format '06/25/14 09:59:29' which is accepted by Excel as a valid date/time and allows for sorting in Excel.

If the goal is the Excel serial number that DATEVALUE() returns for a date, the toordinal() function can be used. Python ordinals start from 1 Jan of year 1, whereas Excel starts from 1 Jan 1900, so apply an offset. Also see the Excel 1900 leap year bug (https://support.microsoft.com/en-us/help/214326/excel-incorrectly-assumes-that-the-year-1900-is-a-leap-year)
from datetime import date
def convert_date_to_excel_ordinal(day, month, year):
    offset = 693594
    current = date(year, month, day)
    n = current.toordinal()
    return (n - offset)
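The constant 693594 does not have to be hard-coded: it is the ordinal of 30 December 1899. A sketch that derives it (names are illustrative, and the result is only valid for dates from 1 March 1900 onward):

```python
from datetime import date

# Ordinal of the effective Excel epoch; equals 693594
EXCEL_EPOCH_ORDINAL = date(1899, 12, 30).toordinal()

def date_to_excel_serial(d):
    """Excel 1900-system serial for a date on or after 1900-03-01."""
    return d.toordinal() - EXCEL_EPOCH_ORDINAL

print(date_to_excel_serial(date(2009, 3, 20)))  # 39892
```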

With the 3rd party xlrd.xldate module, you can supply a tuple structured as (year, month, day, hour, minute, second) and, if necessary, calculate a day fraction from any microseconds component:
from datetime import datetime
from xlrd import xldate
from operator import attrgetter
def excel_date(input_date):
    components = ('year', 'month', 'day', 'hour', 'minute', 'second')
    frac = input_date.microsecond / (86400 * 10**6) # divide by microseconds in one day
    return xldate.xldate_from_datetime_tuple(attrgetter(*components)(input_date), 0) + frac
res = excel_date(datetime(1900, 3, 1, 12, 0, 0, 5*10**5))
# 61.50000578703704

Following @akgood's answer: when the datetime is before 1/0/1900, the return value is wrong. A corrected return expression may be:
def excel_date(date1):
    temp = dt.datetime(1899, 12, 30) # Note, not 31st Dec but 30th!
    delta = date1 - temp
    return float(delta.days) + (-1.0 if delta.days < 0 else 1.0) * delta.seconds / 86400

This worked when I tested using the csv package to create a spreadsheet:
from datetime import datetime
def excel_date(date1):
    return date1.strftime('%x %-I:%M:%S %p')
now = datetime.now()
current_datetime=now.strftime('%x %-I:%M:%S %p')
time_data.append(excel_date(datetime.now()))
...
