I just started moving from Matlab to Python 2.7 and I have some trouble reading my .mat-files. Time information is stored in Matlab's datenum format. For those who are not familiar with it:
A serial date number represents a calendar date as the number of days that has passed since a fixed base date. In MATLAB, serial date number 1 is January 1, 0000.
MATLAB also uses serial time to represent fractions of days beginning at midnight; for example, 6 p.m. equals 0.75 serial days. So the string '31-Oct-2003, 6:00 PM' in MATLAB is date number 731885.75.
(taken from the Matlab documentation)
I would like to convert this to Python's datetime format, and I found this tutorial. In short, the author states that
If you parse this using python's datetime.fromordinal(731965.04835648148) then the result might look reasonable [...]
(before any further conversions), which doesn't work for me, since datetime.fromordinal expects an integer:
>>> datetime.fromordinal(731965.04835648148)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: integer argument expected, got float
While I could just round them down for daily data, I actually need to import minute-resolution time series. Does anyone have a solution for this problem? I would like to avoid reformatting my .mat files, since there are a lot of them and my colleagues need to work with them as well.
If it helps, someone else asked for the other way round. Sadly, I'm too new to Python to really understand what is happening there.
/edit (2012-11-01): This has been fixed in the tutorial posted above.
The solution you link to is almost right; it just has a small issue. The corrected conversion is this:
python_datetime = datetime.fromordinal(int(matlab_datenum)) + timedelta(days=matlab_datenum%1) - timedelta(days = 366)
A longer explanation can be found here.
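Putting the pieces together, a minimal self-contained sketch (the example datenum is the one quoted from the tutorial above):

from datetime import datetime, timedelta

matlab_datenum = 731965.04835648148  # example value from the linked tutorial
python_datetime = (datetime.fromordinal(int(matlab_datenum))
                   + timedelta(days=matlab_datenum % 1)
                   - timedelta(days=366))
print(python_datetime)  # approximately 2004-01-19 01:09:38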
Using pandas, you can convert a whole array of datenum values with fractional parts:
import numpy as np
import pandas as pd
datenums = np.array([737125, 737124.8, 737124.6, 737124.4, 737124.2, 737124])
timestamps = pd.to_datetime(datenums-719529, unit='D')
The value 719529 is the datenum value of the Unix epoch start (1970-01-01), which is the default origin for pd.to_datetime().
I used the following Matlab code to set this up:
datenum('1970-01-01') % gives 719529
datenums = datenum('06-Mar-2018') - linspace(0,1,6) % test data
datestr(datenums) % human readable format
Just in case it's useful to others, here is a full example of loading time series data from a Matlab mat file, converting a vector of Matlab datenums to a list of datetime objects using carlosdc's answer (defined as a function), and then plotting as time series with Pandas:
from scipy.io import loadmat
import pandas as pd
import datetime as dt
import urllib
# In Matlab, I created this sample 20-day time series:
# t = datenum(2013,8,15,17,11,31) + [0:0.1:20];
# x = sin(t)
# y = cos(t)
# plot(t,x)
# datetick
# save sine.mat
urllib.urlretrieve('http://geoport.whoi.edu/data/sine.mat','sine.mat');
# If you don't use squeeze_me=True, then Pandas doesn't like
# the arrays in the dictionary, because they look like arrays
# of 1-element arrays. squeeze_me=True fixes that.
mat_dict = loadmat('sine.mat',squeeze_me=True)
# make a new dictionary with just dependent variables we want
# (we handle the time variable separately, below)
my_dict = { k: mat_dict[k] for k in ['x','y']}
def matlab2datetime(matlab_datenum):
    day = dt.datetime.fromordinal(int(matlab_datenum))
    dayfrac = dt.timedelta(days=matlab_datenum % 1) - dt.timedelta(days=366)
    return day + dayfrac
# convert Matlab variable "t" into list of python datetime objects
my_dict['date_time'] = [matlab2datetime(tval) for tval in mat_dict['t']]
# create the DataFrame and index it by time
df = pd.DataFrame(my_dict)
df = df.set_index('date_time')
# print df
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 201 entries, 2013-08-15 17:11:30.999997 to 2013-09-04 17:11:30.999997
Data columns (total 2 columns):
x 201 non-null values
y 201 non-null values
dtypes: float64(2)
# plot with Pandas
df.plot()
Here's a way to convert these using numpy.datetime64, rather than datetime.
import numpy as np

origin = np.datetime64('0000-01-01', 'D') - np.timedelta64(1, 'D')
date = serdate * np.timedelta64(1, 'D') + origin
This works for serdate either a single integer or an integer array.
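A quick check against the whole-day example from the question (a sketch; the array holds the datenums for 31-Oct-2003 and 01-Nov-2003):

import numpy as np

origin = np.datetime64('0000-01-01', 'D') - np.timedelta64(1, 'D')
serdate = np.array([731885, 731886])
print(serdate * np.timedelta64(1, 'D') + origin)  # ['2003-10-31' '2003-11-01']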
Just building on and adding to the previous comments. The key is in the day counting carried out by the method toordinal and the constructor fromordinal of the date and datetime classes. For example, the Python Library Reference for 2.7 says that fromordinal will
Return the date corresponding to the proleptic Gregorian ordinal, where January 1 of year 1 has ordinal 1. ValueError is raised unless 1 <= ordinal <= date.max.toordinal().
However, year 0 AD is still one (leap) year to count in, so there are still 366 days that need to be taken into account. (It was a leap year, just like 2016, which lies exactly 504 four-year cycles later.)
These are two functions that I have been using for similar purposes:
import datetime
def datetime_pytom(d, t):
    '''
    Input
        d   Date as an instance of type datetime.date
        t   Time as an instance of type datetime.time
    Output
        The fractional day count since 0-Jan-0000 (proleptic ISO calendar).
        This is the 'datenum' datatype in MATLAB.
    Notes on day counting
        MATLAB: day one is 1 Jan 0000
        Python: day one is 1 Jan 0001
        hence an increase of 366 days, for year 0 AD was a leap year
    '''
    dd = d.toordinal() + 366
    tt = datetime.timedelta(hours=t.hour, minutes=t.minute,
                            seconds=t.second)
    tt = tt.total_seconds() / 86400
    return dd + tt
def datetime_mtopy(datenum):
    '''
    Input
        The fractional day count according to the datenum datatype in MATLAB
    Output
        The date and time as an instance of type datetime.datetime in Python
    Notes on day counting
        MATLAB: day one is 1 Jan 0000
        Python: day one is 1 Jan 0001
        hence a reduction of 366 days, for year 0 AD was a leap year
    '''
    ii = datetime.datetime.fromordinal(int(datenum) - 366)
    ff = datetime.timedelta(days=datenum % 1)
    return ii + ff
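A quick round-trip check of the two functions, using the example from the question (731885.75 is '31-Oct-2003, 6:00 PM'):

d = datetime.date(2003, 10, 31)
t = datetime.time(18, 0)
mdn = datetime_pytom(d, t)
print(mdn)                  # 731885.75
print(datetime_mtopy(mdn))  # 2003-10-31 18:00:00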
Hope this helps and happy to be corrected.
Related
I have the value 38142 and I need to convert it into a date using Python.
If I use this number in Excel and right-click to format the cell, the value is converted to 04/06/2004, and I need the same result using Python. How can I achieve this?
The offset in Excel is the number of days since 1900/01/01, with 1 being the first of January 1900, so add the number of days as a timedelta to 1899/12/31:
from datetime import datetime, timedelta
def from_excel_ordinal(ordinal: float, _epoch0=datetime(1899, 12, 31)) -> datetime:
    if ordinal >= 60:
        ordinal -= 1  # Excel leap year bug, 1900 is not a leap year!
    return (_epoch0 + timedelta(days=ordinal)).replace(microsecond=0)
You have to adjust the ordinal by one day for any date after 1900/02/28; Excel has inherited a leap year bug from Lotus 1-2-3 and treats 1900 as a leap year. The code above returns datetime(1900, 2, 28, 0, 0) for both 59 and 60 to correct for this, with fractional values in the range [59.0, 61.0) all being a time between 00:00:00.0 and 23:59:59.999999 on that day.
The above also supports serials with a fraction to represent time, but since Excel doesn't support microseconds those are dropped.
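Applied to the serial value from the question (the dd/mm display 04/06/2004 corresponds to 4 June 2004):

print(from_excel_ordinal(38142))  # 2004-06-04 00:00:00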
The above answer is fine for just a date value, but here I extend that solution to include the time and return a datetime value as well:
from datetime import datetime, timedelta

def from_excel_ordinal(ordinal, epoch=datetime(1900, 1, 1)):
    # Adapted from above, thanks to Martijn Pieters
    if ordinal > 59:
        ordinal -= 1  # Excel leap year bug, 1900 is not a leap year!
    inDays = int(ordinal)
    frac = ordinal - inDays
    inSecs = int(round(frac * 86400.0))
    return epoch + timedelta(days=inDays - 1, seconds=inSecs)  # epoch is day 1

excelDT = 42548.75001  # float representation of 27/06/2016 6:00:01 PM in Excel format
pyDT = from_excel_ordinal(excelDT)
I would recommend the following:
import pandas as pd
def convert_excel_time(excel_time):
    return pd.to_datetime('1900-01-01') + pd.to_timedelta(excel_time, 'D')
Or
import datetime
def xldate_to_datetime(xldate):
    temp = datetime.datetime(1900, 1, 1)
    delta = datetime.timedelta(days=xldate)
    return temp + delta
It is taken from https://gist.github.com/oag335/9959241
I came to this question when trying to do the same as above, but for entire columns within a DataFrame. I made this function, which did it for me:
import pandas as pd
from datetime import datetime, timedelta
import copy as cp
def xlDateConv(df, *cols):
    tempDt = []
    fin = cp.deepcopy(df)
    for col in [*cols]:
        for i in range(len(fin[col])):
            tempDate = datetime(1900, 1, 1)
            delta = timedelta(float(fin[col][i]))
            tempDt.append(pd.to_datetime(tempDate + delta))
        fin[col] = tempDt
        tempDt = []
    return fin
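A minimal usage sketch (the DataFrame and its column names here are made up):

df = pd.DataFrame({'Start': [42548.75, 43886.42], 'End': [42549.0, 43887.0]})
converted = xlDateConv(df, 'Start', 'End')
print(converted)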
Note that you need to pass each column name, quoted (as a string), as a separate parameter, which could most likely be improved (accepting a list of columns as input, for instance). Also, it returns a copy of the original df (it doesn't change the original).
Btw, partly inspired by this (https://gist.github.com/oag335/9959241).
If you are working with Pandas, this could be useful:
import xlrd
import datetime as dt
def from_excel_datetime(x):
    return dt.datetime(*xlrd.xldate_as_tuple(x, datemode=0))
df['date'] = df.excel_date.map(from_excel_datetime)
If the dates seem to be delayed by 4 years, you can try datemode=1.
:param datemode: 0: 1900-based, 1: 1904-based.
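As a quick sanity check with the serial value from the earlier question (a sketch; requires the xlrd package):

print(from_excel_datetime(38142))  # 2004-06-04 00:00:00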
I am trying to import a dataframe from a spreadsheet using pandas and then carry out numpy operations with its columns. The problem is that I obtain the error specified in the title: TypeError: Cannot do inplace boolean setting on mixed-types with a non np.nan value.
The reason for this is that my dataframe contains a column with dates, like:
ID Date
519457 25/02/2020 10:03
519462 25/02/2020 10:07
519468 25/02/2020 10:12
... ...
And NumPy requires the values to be floating-point numbers, like so:
ID Date
519457 43886.41875
519462 43886.42153
519468 43886.425
... ...
How can I make this change without having to modify the spreadsheet itself?
I have seen a lot of posts on the forum asking the opposite, and asking about the error, and read the docs on xlrd.xldate, but have not managed to do this, which seems very simple.
I am sure this kind of problem has been dealt with before, but have not been able to find a similar post.
The code I am using is the following:
xls=pd.ExcelFile(r'/home/.../TwoData.xlsx')
xls.sheet_names
df=pd.read_excel(xls,"Hoja 1")
df["E_t"]=df["Date"].diff()
Any help or pointers would be really appreciated!
PS: I have seen solutions that require manually computing the exact number to be obtained, but this is not possible in this case due to the size of the dataframes.
You can convert the date into a Unix timestamp. In Python, if you have a datetime object in UTC, you can call timestamp() to get a UTC timestamp. This method returns the time since the epoch for that datetime object.
Please see an example below-
from datetime import datetime, timezone

dt = datetime(2015, 10, 19)
timestamp = dt.replace(tzinfo=timezone.utc).timestamp()
print(timestamp)
1445212800.0
Please check the datetime module for more info.
I think you need:
#https://stackoverflow.com/a/9574948/2901002
#rewritten to vectorized solution
def excel_date(date1):
    temp = pd.Timestamp(1899, 12, 30)  # Note, not 31st Dec but 30th!
    delta = date1 - temp
    return (delta.dt.days) + (delta.dt.seconds) / 86400
df["Date"] = pd.to_datetime(df["Date"]).pipe(excel_date)
print (df)
ID Date
0 519457 43886.418750
1 519462 43886.421528
2 519468 43886.425000
I have exported a list of AD Users out of AD and need to validate their login times.
The output from the PowerShell script gives lastlogin as LDAP/FILETIME.
EXAMPLE 130305048577611542
I am having trouble converting this to a readable time in pandas.
I'm using the following code:
df['date of login'] = pd.to_datetime(df['FileTime'], unit='ns')
The column FileTime contains time formatted like the EXAMPLE above.
I'm getting the following output in my new column date of login:
EXAMPLE 1974-02-17 03:50:48.577611542
I know this is being parsed incorrectly, as when I input this datetime into an online converter I get this output:
EXAMPLE:
Epoch/Unix time: 1386031258
GMT: Tuesday, December 3, 2013 12:40:58 AM
Your time zone: Monday, December 2, 2013 4:40:58 PM GMT-08:00
Does anyone have an idea of what is occurring here? Why are all my dates in the 1970s?
I know this answer is very late to the party, but for anyone else looking in the future.
The 18-digit Active Directory (LDAP) timestamps are also known as 'Windows NT time format', 'Win32 FILETIME or SYSTEMTIME', or 'NTFS file time'. They are used in Microsoft Active Directory for pwdLastSet, accountExpires, LastLogon, LastLogonTimestamp and LastPwdSet. The timestamp is the number of 100-nanosecond intervals (1 nanosecond = one billionth of a second) since Jan 1, 1601 UTC.
Therefore, 130305048577611542 does indeed relate to December 3, 2013.
When this value is put through the datetime conversion in Python with unit='ns', it is effectively read as only about 130305048 seconds counted from 1.1.1970, which does result in a 1974 date!
In order to get the correct Unix timestamp you need to do:
(130305048577611542 / 10000000) - 11644473600
Here's a solution I did in Python that worked well for me:
import datetime
import numpy as np

def ad_timestamp(timestamp):
    if timestamp != 0:
        return datetime.datetime(1601, 1, 1) + datetime.timedelta(seconds=timestamp / 10000000)
    return np.nan
So then if you need to convert a Pandas column:
df.lastLogonTimestamp = df.lastLogonTimestamp.fillna(0).apply(ad_timestamp)
Note: I needed to use fillna before using apply. Also, since I filled with 0's, I checked for that in the conversion function above with if timestamp != 0. Hope that makes sense. It's extra stuff, but you may need it to convert the column in question.
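As a vectorized alternative (a sketch, not part of the original answer; it assumes the FileTime column from the question holds integers), the same arithmetic from above can be applied to the whole column at once:

import pandas as pd

# FILETIME -> seconds since 1601, shift to the Unix epoch, then parse as seconds
df['date of login'] = pd.to_datetime(df['FileTime'] // 10**7 - 11644473600, unit='s')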
I've been stuck on this for a couple of days. But now I am ready to share a really working solution in an easier-to-use form:
import datetime
timestamp = 132375402928051110
value = datetime.datetime(1601, 1, 1) + datetime.timedelta(seconds=timestamp / 10000000)
print(value.strftime('%Y-%m-%d %H:%M:%S'))
How can one convert a serial date number, representing the number of days since epoch (1970), to the corresponding date string? I have seen multiple posts showing how to go from string to date number, but I haven't been able to find any posts on how to do the reverse.
For example, 15951 corresponds to "2013-09-02".
>>> import datetime
>>> (datetime.datetime(2013, 9, 2) - datetime.datetime(1970,1,1)).days + 1
15951
(The + 1 because whatever generated these date numbers followed the convention that Jan 1, 1970 = 1.)
TL;DR: Looking for something to do the following:
>>> serial_date_to_string(15951) # arg is number of days since 1970
"2013-09-02"
This is different from Python: Converting Epoch time into the datetime, because I am starting with days since 1970. I'm not sure if you can just multiply by 86,400 due to leap seconds, etc.
Use the datetime package as follows:
import datetime
def serial_date_to_string(srl_no):
    new_date = datetime.datetime(1970, 1, 1, 0, 0) + datetime.timedelta(srl_no - 1)
    return new_date.strftime("%Y-%m-%d")
This is a function which returns the string as required.
So:
serial_date_to_string(15951)
Returns
>> "2013-09-02"
And for a Pandas Dataframe:
df["date"] = pd.to_datetime(df["date"], unit="d")
... assuming that the "date" column contains values like 18687 which is days from Unix Epoch of 1970-01-01 to 2021-03-01.
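A quick demonstration with that value (sketch):

import pandas as pd

df = pd.DataFrame({"date": [18687]})
df["date"] = pd.to_datetime(df["date"], unit="d")
print(df["date"].iloc[0])  # 2021-03-01 00:00:00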
Also handles seconds and milliseconds since Unix Epoch, use unit="s" and unit="ms" respectively.
Also see my other answer with the exact reverse.
I have an excel file with dates formatted as such:
22.10.07 16:00
22.10.07 17:00
22.10.07 18:00
22.10.07 19:00
After using the parse method of pandas to read the data, the dates are read almost correctly:
In [55]: nts.data['Tid'][10000:10005]
Out[55]:
10000 2007-10-22 15:59:59.997905
10001 2007-10-22 16:59:59.997904
10002 2007-10-22 17:59:59.997904
10003 2007-10-22 18:59:59.997904
What do I need to do to either a) get it to parse correctly, or b) fix this easily afterwards (e.g. with some kind of 'round' function for datetimes)?
I encountered the same issue and got around it by not parsing the dates using Pandas, but rather applying my own function (shown below) to the relevant column(s) of the dataframe:
import datetime as dt
import pandas as pd

def ExcelDateToDateTime(xlDate):
    epoch = dt.datetime(1899, 12, 30)
    delta = dt.timedelta(hours=round(xlDate * 24))
    return epoch + delta

df = pd.DataFrame.from_csv('path')
df['Date'] = df['Date'].apply(ExcelDateToDateTime)
Note: This will ignore any time granularity below the hour level, but that's all I need, and it looks from your example that this could be the case for you too.
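If you only want to round the already-parsed values rather than re-parse them, newer pandas versions can round a datetime Series directly; a short sketch using two of the values from the question:

import pandas as pd

s = pd.Series(pd.to_datetime(['2007-10-22 15:59:59.997905', '2007-10-22 16:59:59.997904']))
print(s.dt.round('s'))  # 2007-10-22 16:00:00, 2007-10-22 17:00:00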
Excel serializes datetimes with a ddddd.tttttt format, where the d part is an integer number representing the offset from a reference day (like Dec 31st, 1899), and the t part is a fraction between 0.0 and 1.0 which stands for the part of the day at the given time (for example at 12:00 it's 0.5, at 18:00 it's 0.75 and so on).
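As a tiny illustration of that split (a sketch; the value 38961.75 is made up to give an 18:00 time part):

import math

frac, days = math.modf(38961.75)
print(int(days))   # 38961 -> day offset from the reference day
print(frac * 24)   # 18.0  -> hours into that day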
I asked you to upload a file with sample data. .xlsx files are really ZIP archives which contain your XML-serialized worksheets. These are the dates I extracted from the relevant column. Excerpt:
38961.666666666628
38961.708333333292
38961.749999999956
When you try to manually deserialize, you get the same datetimes as Pandas. Unfortunately, the way Excel stores times makes it impossible to represent some values exactly, so you have to round them for display purposes. I'm not sure if rounded data is needed for analysis, though.
This is the script I used to verify that the deserialized datetimes really are the same ones Pandas produces:
from datetime import date, datetime, time, timedelta
from urllib2 import urlopen
def deserialize(text):
    tokens = text.split(".")
    date_tok = tokens[0]
    time_tok = tokens[1] if len(tokens) == 2 else "0"
    d = date(1899, 12, 31) + timedelta(int(date_tok))
    t = time(*helper(float("0." + time_tok), (24, 60, 60, 1000000)))
    return datetime.combine(d, t)

def helper(factor, units):
    result = list()
    for unit in units:
        value, factor = divmod(factor * unit, 1)
        result.append(int(value))
    return result
url = "https://gist.github.com/RaffaeleSgarro/877d7449bd19722b44cb/raw/" \
"45d5f0b339d4abf3359fe673fcd2976374ed61b8/dates.txt"
for line in urlopen(url):
print deserialize(line)