Python: How to parse Timex3 into datetime or equivalent - python

In my Python 3 project, I use SUTime together with Stanford CoreNLP to retrieve normalized time expressions in the Timex3 standard. I access CoreNLP using pycorenlp. How can I parse the resulting Timex3 time expression (Timex3 is part of the TimeML standard) into a datetime or another temporal instance? I guess that datetime is probably not sufficient, since it cannot represent date ranges.
For instance, Timex3 expressions are:
2017-11 referring to the whole month November
2017-10-07 referring to the whole seventh day of October
I've already tried parsing them with the Python library dateparser, which is able to read almost any input containing a temporal expression, but the library is obviously not suited to the Timex3 format: it converts 2017-11 to a datetime for the current day of the month (here the 27th) rather than to the whole month:
>>> dateparser.parse('2017-11')
datetime.datetime(2017, 11, 27, 0, 0)
Isn't there a Python library for converting Timex3 expressions into datetime or something equivalent? Someone commented that 2017-11 cannot be converted into a datetime (as a datetime only represents a single point in time). But there is, for example, the pandas library, which supports date ranges as well.
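One possible direction, sketched here with the standard library only, is to treat a Timex3 value as a closed date range rather than a single instant. The helper name timex_value_to_range is hypothetical, and the sketch assumes the value is a plain YYYY, YYYY-MM or YYYY-MM-DD string; other Timex3 values (weeks, seasons, durations and so on) are not handled.

```python
import calendar
import re
from datetime import date

# Assumption: only plain YYYY, YYYY-MM and YYYY-MM-DD values are supported.
_VALUE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def timex_value_to_range(value):
    """Return (first_day, last_day) covered by a simple Timex3 value."""
    match = _VALUE.fullmatch(value)
    if match is None:
        raise ValueError(f"unsupported Timex3 value: {value!r}")
    year = int(match.group(1))
    month, day = match.group(2), match.group(3)
    if month is None:
        # Whole year
        return date(year, 1, 1), date(year, 12, 31)
    month = int(month)
    if day is None:
        # Whole month: monthrange gives the number of days in that month
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    # Single day: start and end coincide
    single = date(year, month, int(day))
    return single, single


print(timex_value_to_range("2017-11"))
# (datetime.date(2017, 11, 1), datetime.date(2017, 11, 30))
```

Returning a pair of date objects keeps the result usable with ordinary comparisons; a range type from another library could be swapped in if one is already a dependency.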

Related

How to detect the date format? (python)

I have a Dataframe in python, with the data coming from a csv.
In the column "Date" I have a date (:)) but I don't know the date format. How can I detect it?
e.g. I can have 05/05/2022, which can be M/D/Y or D/M/Y. I can work it out manually by looking at other entries, but I would like to do it automatically.
Is there a way to do so?
Thank you.
datetime.strptime requires you to know the format.
Trying try/except blocks isn't good, since there are so many different formats I can receive.
It would be nice to have something that recognizes the format...
Update:
Thank you for the first answers, but the output I would like to have is the FORMAT of the date used in the column, given that the format is the same for every entry within a column.
You can try the dateutil library.
To deal with dates and also with the diversity of timezones people often use external libraries such as pytz or dateutil.
dateutil has a very powerful parser.
from dateutil.parser import parse
parse('05/05/2022') # datetime.datetime(2022, 5, 5, 0, 0)
parse('2022-05-05') # datetime.datetime(2022, 5, 5, 0, 0)
Use the isinstance built-in function to check if a variable is a datetime object in Python, e.g. if isinstance(today, datetime): . The isinstance function returns True if the passed-in object is an instance or a subclass of the passed-in class.
check out here
https://www.folkstalk.com/2022/10/python-check-if-string-is-date-format-with-code-examples.html
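Since the question's update asks for the format itself, and states that each column uses a single format, one rough approach is to test a short list of candidate formats against every value and keep those that parse all of them. This is only a sketch: the function name matching_formats and the candidate list are assumptions and would need extending for real data.

```python
from datetime import datetime

# Assumed candidate list; extend it with the formats you expect to meet.
CANDIDATE_FORMATS = ["%m/%d/%Y", "%d/%m/%Y", "%Y-%m-%d", "%d-%m-%Y"]


def matching_formats(values, candidates=CANDIDATE_FORMATS):
    """Return the candidate formats that parse every value in the column."""
    matches = []
    for fmt in candidates:
        try:
            for value in values:
                datetime.strptime(value, fmt)
        except ValueError:
            # At least one value does not fit this format
            continue
        matches.append(fmt)
    return matches


print(matching_formats(["05/05/2022", "13/05/2022"]))  # ['%d/%m/%Y']
print(matching_formats(["05/05/2022"]))  # ['%m/%d/%Y', '%d/%m/%Y']
```

If more than one format survives, as in the second call, the column is genuinely ambiguous and no automatic method can decide between M/D/Y and D/M/Y from the data alone.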

Convert an RFC 3339 nano time to Python datetime

Is there an easy way to convert an RFC 3339 nano time into a regular Python timestamp?
For example, time = '2022-07-14T12:01:25.225089838+08:00',
I found a way using datetime
from datetime import datetime
time = '2022-07-14T12:01:25.225089+08:00'
date = datetime.fromisoformat(time) # good
time = '2022-07-14T12:01:25.225089838+08:00'
date = datetime.fromisoformat(time) # error
It works with string like '2022-07-14T12:01:25.225089+08:00', but it doesn't work with the time above.
There are a few ways to do it, depending on the input format and on what you consider easy.
There are actually many posts asking about similar issues; I'll list a few at the end for your reference, and please check before posting next time.
The main limitation of the datetime object is that it only holds 6 digits after the second (microsecond precision).
You will need a different data structure if you want to preserve all of the digits.
If you are OK with cutting off at 6 digits, FObersteiner's answer is perfect.
Another approach is plain datetime string parsing:
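from datetime import datetime
# Strip the UTC offset (str.removesuffix needs Python 3.9+); the result is a naive datetime.
date = '2022-07-14T12:01:25.225089838+08:00'.removesuffix('+08:00')
# Drop the last 3 fractional digits, since %f accepts at most 6.
x = datetime.strptime(date[:-3], '%Y-%m-%dT%H:%M:%S.%f')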
If you would like to preserve all the digits, you may want to create your own class extending the datetime class, or write a helper function for it.
Convert an RFC 3339 time to a standard Python timestamp
Parsing datetime strings containing nanoseconds
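As a rough illustration of the "helper function" idea for keeping all the digits, the sketch below returns an ordinary datetime (truncated to microseconds) together with the full fraction as an integer number of nanoseconds. The function name parse_rfc3339_nano is hypothetical, and the sketch assumes the input always has a fractional part and an offset written as Z or ±HH:MM.

```python
import re
from datetime import datetime

# Assumption: input looks like <date>T<time>.<fraction><offset>
_NANO = re.compile(r"(.+?)\.(\d+)(Z|[+-]\d{2}:\d{2})")


def parse_rfc3339_nano(text):
    """Return (datetime truncated to microseconds, fraction in nanoseconds)."""
    match = _NANO.fullmatch(text)
    if match is None:
        raise ValueError(f"unsupported timestamp: {text!r}")
    head, fraction, offset = match.groups()
    if offset == "Z":
        offset = "+00:00"
    # Pad or cut the fraction to exactly 9 digits, i.e. nanoseconds
    nanos = int(fraction.ljust(9, "0")[:9])
    # datetime stores microseconds only, so the last 3 digits are dropped here
    moment = datetime.fromisoformat(f"{head}.{nanos // 1000:06d}{offset}")
    return moment, nanos


moment, nanos = parse_rfc3339_nano("2022-07-14T12:01:25.225089838+08:00")
print(moment)  # 2022-07-14 12:01:25.225089+08:00
print(nanos)   # 225089838
```

Keeping the nanoseconds next to the datetime avoids subclassing, at the cost of having to carry two values around.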
from datetime.fromisoformat docs:
Caution: This does not support parsing arbitrary ISO 8601 strings - it is only intended as the inverse operation of datetime.isoformat(). A more full-featured ISO 8601 parser, dateutil.parser.isoparse is available in the third-party package dateutil.
dateutil's isoparse will do the job:
from dateutil.parser import isoparse
time = '2022-07-14T12:01:25.225089838+08:00'
date = isoparse(time)
print(date)
# 2022-07-14 12:01:25.225089+08:00
print(repr(date))
# datetime.datetime(2022, 7, 14, 12, 1, 25, 225089, tzinfo=tzoffset(None, 28800))
Note: it doesn't round to microseconds, it just slices off the last 3 decimal places. So basically, if you're dealing with a standardized format like RFC 3339, you can do the slicing yourself, like this:
from datetime import datetime
time = '2022-07-14T12:01:25.225089838+08:00'
date = datetime.fromisoformat(time[:-9] + time[-6:])
print(date)
# 2022-07-14 12:01:25.225089+08:00

How to format a datetime string as defined below?

How can I format a datetime string like 2020-04-30T22:30:00-04:00 to something like 2015-03-22T10:00:00+0900 in Python? The formatting, not the actual date.
The two examples you have provided are formatted almost identically: both are ISO 8601 timestamps, and the only visible difference is the UTC offset, written with a colon (-04:00) in the first and without one (+0900) in the second.
If you need to specify your own format, you can use the datetime method strftime with a format string. The reverse of this is strptime, which takes an input string and a format string and returns a datetime object.
If you need to change timezones, then the tzinfo object is what you're looking for.
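A minimal sketch of the strftime route, assuming Python 3.7 or later (for datetime.fromisoformat) and a hypothetical helper name reformat_offset. The %z directive writes the offset without a colon, which matches the target style in the question.

```python
from datetime import datetime, timedelta, timezone

TARGET_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def reformat_offset(text, tz=None):
    """Re-render an ISO 8601 timestamp with a colon-less UTC offset."""
    parsed = datetime.fromisoformat(text)
    if tz is not None:
        # Optionally express the same instant in another timezone
        parsed = parsed.astimezone(tz)
    return parsed.strftime(TARGET_FORMAT)


print(reformat_offset("2020-04-30T22:30:00-04:00"))
# 2020-04-30T22:30:00-0400
print(reformat_offset("2020-04-30T22:30:00-04:00", timezone(timedelta(hours=9))))
# 2020-05-01T11:30:00+0900
```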
You might also be interested in dateutil, which offers:
Computing of relative deltas (next month, next year, next Monday, last week of month, etc);
Computing of relative deltas between two given date and/or datetime objects;
Computing of dates based on very flexible recurrence rules, using a superset of the iCalendar specification.
Parsing of RFC strings is supported as well. Generic parsing of dates in almost any string format
Timezone (tzinfo) implementations for tzfile(5) format files (/etc/localtime, /usr/share/zoneinfo, etc), TZ environment string (in all known formats), iCalendar format files, given ranges (with help from relative deltas), local machine timezone, fixed offset timezone, UTC timezone, and Windows registry-based time zones. Internal up-to-date world timezone information based on Olson’s database.
Computing of Easter Sunday dates for any given year, using Western, Orthodox or Julian algorithms

python strptime vs dateutil - recommended use

I need to convert between strings and datetime objects quite often. Up until now I have always used strptime and strftime.
I started working with the Google Calendar API, where I receive strings like this: '2018-03-17T09:00:00+01:00'
It seems like I need to convert the +01:00 into 0100 for strptime, which is a little annoying.
While I don't have this issue with dateutil, there are a few other inconveniences. Also, I saw that the last update of dateutil was in 2016, which seems odd.
So my question is: which one would you recommend for adding and subtracting dates and datetimes and for switching between string and datetime objects?
Also, is dateutil still maintained or is it outdated?
Thanks a lot!
- Sally
"It seems like I need to convert the +01:00 into 0100 for strptime"
No you don't.
For one thing, the standard format is +0100, not 0100.
For another, strptime handles +01:00 just fine (since Python 3.7, %z accepts a colon in the offset):
>>> import datetime
>>> datetime.datetime.strptime('2018-08-13T11:18:24+00:00',
... '%Y-%m-%dT%H:%M:%S%z')
datetime.datetime(2018, 8, 13, 11, 18, 24, tzinfo=datetime.timezone.utc)
>>> datetime.datetime.strptime('2018-08-13T11:18:24+01:00',
... '%Y-%m-%dT%H:%M:%S%z')
datetime.datetime(2018, 8, 13, 11, 18, 24, tzinfo=datetime.timezone(datetime.timedelta(seconds=3600)))
So, the problem you're trying to solve doesn't exist in the first place.
"While I don't have this issue with dateutil, there are a few other inconveniences. Also, I saw that the last update of dateutil was in 2016, which seems odd."
As of 13 Aug 2018, the last update to dateutil was 2 days ago. And the last official release to PyPI, version 2.7.3, was 3 months ago.
So, your secondary problem doesn't exist either.
"So my question is: which one would you recommend for adding and subtracting dates and datetimes and for switching between string and datetime objects?"
Since dateutil just gives you the same datetime objects that datetime gives you, neither one is better for adding and subtracting dates and datetimes.
For converting to and from string format, sometimes dateutil is more convenient, and it also supports a wider range of formats that you don't care about—but for what you're doing, they both work fine, so there's no difference. If you expect to need other formats in the future, it might be worth bringing in dateutil, but if not, you might as well stick with the standard library.
The latest dateutil version (2.7.3) was released in May 2018. It just says "copyright 2016" somewhere in the credits. Moreover, the documentation talks about the policy for future versions, so the project seems to be quite active. I would suggest preferring it over strptime. However, be sure to get the latest version: previous versions had a bug with converting ISO dates, which is what you are working with.
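To make the standard-library option concrete, here is a small sketch of the full round trip: parse the Google Calendar style string, do arithmetic with timedelta, and convert back to a string. It assumes Python 3.7 or later, where %z accepts the colon in +01:00; the helper name shift is hypothetical.

```python
from datetime import datetime, timedelta

ISO_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def shift(text, delta):
    """Parse an offset-aware timestamp, add a timedelta, return ISO 8601 text."""
    parsed = datetime.strptime(text, ISO_FORMAT)
    # Arithmetic works the same whether the datetime came from strptime or dateutil
    return (parsed + delta).isoformat()


print(shift("2018-03-17T09:00:00+01:00", timedelta(days=1, hours=2)))
# 2018-03-18T11:00:00+01:00
```

isoformat() writes the offset back with a colon, so the output has the same shape as the input.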

How to convert these strings to datetime objects in Python or Django?

How do I convert these strings to datetime objects in Python or Django?
2010-08-17T19:00:00Z
2010-08-17T18:30:00Z
2010-08-17T17:05:00Z
2010-08-17T14:30:00Z
2010-08-10T22:20:00Z
2010-08-10T21:20:00Z
2010-08-10T20:25:00Z
2010-08-10T19:30:00Z
2010-08-10T19:00:00Z
2010-08-10T18:30:00Z
2010-08-10T17:30:00Z
2010-08-10T17:05:00Z
2010-08-10T17:05:00Z
2010-08-10T15:30:00Z
2010-08-10T14:30:00Z
When I do this: datestr=datetime.strptime( datetime, "%Y-%m-%dT%H:%M:%S" )
it tells me that unconverted data remains: Z
You can parse the strings as-is, without the need to slice, if you don't mind using the handy dateutil module. For example:
>>> from dateutil.parser import parse
>>> s = "2010-08-17T19:00:00Z"
>>> parse(s)
datetime.datetime(2010, 8, 17, 19, 0, tzinfo=tzutc())
>>>
Use slicing to remove the "Z" before supplying the string for conversion:
datestr=datetime.strptime( datetime[:-1], "%Y-%m-%dT%H:%M:%S" )
>>> test = "2010-08-17T19:00:00Z"
>>> test[:-1]
'2010-08-17T19:00:00'
Those seem to be ISO 8601 dates. If your timezone is always the same, just remove the last letter before parsing it with strptime (e.g by slicing).
The Z indicates the timezone, so be sure that you are taking that into account when converting it to a datetime of a different timezone. If the timezone can change in your application, you'll have to parse that information also and change the datetime object accordingly.
You could also use the pyiso8601 module to parse these ISO dates; it will most likely also work with slightly different ISO date formats. If your data may contain different timezones, I would suggest using this module.
Change your format string to "%Y-%m-%dT%H:%M:%SZ" so that it includes the trailing Z (which means it is no longer left unconverted). Note, however, that this Z is probably there to indicate that the time is in UTC, which is something you may need to account for separately.
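A short sketch of that last suggestion, assuming the trailing Z always means UTC (as it does in ISO 8601). Matching Z as a literal yields a naive datetime, so the UTC timezone is attached explicitly afterwards; the helper name parse_utc is hypothetical.

```python
from datetime import datetime, timezone


def parse_utc(text):
    """Parse 'YYYY-MM-DDTHH:MM:SSZ' into a timezone-aware UTC datetime."""
    # The literal Z in the format consumes the suffix but carries no tz info
    naive = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    return naive.replace(tzinfo=timezone.utc)


print(parse_utc("2010-08-17T19:00:00Z"))
# 2010-08-17 19:00:00+00:00
```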
