Why does dateutil.parser not work as expected? - python

I'm trying to get a datetime instance representing April 23rd, but I always get March 4th when the argument passed is '4.23'; setting dayfirst=False is of no use:
In [115]: from dateutil import parser
In [116]: parser.parse('4-23')  # √
Out[116]: datetime.datetime(2014, 4, 23, 0, 0)
In [117]: parser.parse('4/23')  # √
Out[117]: datetime.datetime(2014, 4, 23, 0, 0)
In [118]: parser.parse('4.23')  # ×
Out[118]: datetime.datetime(2014, 3, 4, 0, 0)
In [120]: parser.parse('4.23', dayfirst=False)  # ×
Out[120]: datetime.datetime(2014, 3, 4, 0, 0)
Is this a bug in the parser?

The simple answer is that the parse method does not support a dot as a separator between date components, since the dot is interpreted in the context of an ISO-format time string.
Try converting all dots to slashes (/) or hyphens (-)
parser.parse('4.23'.replace('.', '/'))
to resolve this problem.
EDIT (to address new comments):
Here is an example of this in action.
The parser works as expected below:
>>> parser.parse('4/11/2019', dayfirst=True)
datetime.datetime(2019, 11, 4, 0, 0)
>>> parser.parse('4/11/2019', dayfirst=False)
datetime.datetime(2019, 4, 11, 0, 0)
The parser also assumes an error on behalf of the invoker and attempts to auto-correct it by overriding the dayfirst parameter:
>>> parser.parse('3/13/2019', dayfirst=False)
datetime.datetime(2019, 3, 13, 0, 0)
>>> parser.parse('3/13/2019', dayfirst=True)
datetime.datetime(2019, 3, 13, 0, 0)
There cannot be a 13th month, so the parser assumes the invoker requested dayfirst incorrectly and resolves the conflict by ignoring dayfirst instead of throwing an exception. This is another issue with this parser.
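If the input format is known in advance, it can be safer to sidestep the guessing entirely. Here is a minimal sketch of my own (not part of the original answer; the parse_month_dot_day helper and the hard-coded default year are assumptions for illustration):
from datetime import datetime

def parse_month_dot_day(s, year=2014):
    # Assumes the string is 'month.day', e.g. '4.23' means April 23rd of the given year.
    month, day = (int(part) for part in s.split('.'))
    return datetime(year, month, day)

parse_month_dot_day('4.23')
# datetime.datetime(2014, 4, 23, 0, 0)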

Related

How to create a datetime object with tzinfo set as 'UTC'?

I have a list of datetime objects:
time_range = [datetime.datetime(2019, 7, 9, 0, 0, tzinfo=tzutc()),
datetime.datetime(2019, 7, 8, 0, 0, tzinfo=tzutc()),
datetime.datetime(2019, 7, 7, 0, 0, tzinfo=tzutc()),
datetime.datetime(2019, 7, 6, 0, 0, tzinfo=tzutc())
... ]
And I have another datetime object:
time = datetime(2019, 7, 7)
I have to test if time is in time_range.
But each time I test :
time in time_range
I get the output False, because I don't have the tzinfo.
Here's what I've tried:
I tried to add the tzinfo:
time = datetime(2019, 7, 7, tzinfo=tzutc())
but I can't find where the tzutc() function is.
I also tried to use pandas:
import pandas as pd
pd.to_datetime(str(time) + '+00:00')
I get the UTC:
Timestamp('2019-07-07 00:00:00+0000', tz='UTC')
But this is not a datetime.datetime object...
Do you have an idea how I could do this?
(Note: I'm compelled to use the form time in time_range because of the rest of my program.)
In the datetime constructor, the tzinfo parameter expects a timezone instance. It's not the clearest documentation. Try this:
from datetime import datetime, timezone
dt = datetime(2019, 7, 7, tzinfo=timezone.utc)
>>> from datetime import datetime, timezone
>>> time = datetime(2019, 7, 7, tzinfo=timezone.utc)
>>> print(time)
2019-07-07 00:00:00+00:00
>>> print(time.tzinfo)
UTC
After some research, I found another solution, using pandas:
utc_time = pd.to_datetime(str(time) + '+00:00').to_pydatetime()
returns a datetime.datetime object :
datetime.datetime(2019, 7, 7, 0, 0, tzinfo=<UTC>)
However, to avoid importing the pandas library, here's the solution I used:
>>> from datetime import datetime, timezone
>>> new_time = time.replace(tzinfo=timezone.utc)
>>> new_time in time_range
True

pandas representing datetime with midnight as hour 24

I am working in python pandas and I am doing the following:
StDt = datetime(2018, 1, 1, 1, 0)
EnDt = datetime(2020, 1, 1, 1, 0)
allHours = pd.date_range(StDt, EnDt, freq='H').to_pydatetime()
The midnight hours are represented as:
datetime(2018, 1, 3, 0, 0)
datetime(2018, 1, 5, 0, 0)
Is it possible to create the series in a way such that midnight is represented as hour 24 of the previous day?
i.e. the above two cases will look as:
datetime(2018, 1, 2, 24, 0)
datetime(2018, 1, 4, 24, 0)
i.e. I am looking for following:
datetime(2018, 1, 3, 0, 0) = datetime(2018, 1, 2, 24, 0)
datetime(2018, 1, 5, 0, 0) = datetime(2018, 1, 4, 24, 0)
Edit:
My particular situation requires working in an hour-ending world; that is the convention in the domain I am working in.
Using datetimes, this is not possible. Python simply doesn't accept datetime(2018, 1, 2, 24, 0) as a valid time.
There was a request in 2010 to allow this time to be accepted,
Issue 10427: 24:00 Hour in DateTime,
which was rejected.
My only suggestion would be to consider whether you really need this time depicted as you outlined. For actual data manipulation it should not make any difference, as any operations you'd like to do in pandas with datetimes will conform to the same restriction anyway.
I was working with similar data, and found it useful to consider that Hour Ending data labeled 1-24 is the equivalent of Hour Beginning data labeled 0-23.
So you'll have to change your rule-set notation, but it should be a straightforward change (see the sketch below).
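As an illustration of that relabeling, here is a minimal sketch of my own (not part of the original answer; the hour_ending_label helper is hypothetical) that produces an hour-24 label as a string while keeping the underlying datetimes unchanged:
from datetime import datetime, timedelta

def hour_ending_label(dt):
    # datetime itself cannot hold hour 24, so build the label as a string:
    # midnight is displayed as hour 24 of the previous day.
    if dt.hour == 0 and dt.minute == 0:
        return (dt - timedelta(days=1)).strftime('%Y-%m-%d 24:00')
    return dt.strftime('%Y-%m-%d %H:%M')

hour_ending_label(datetime(2018, 1, 3, 0, 0))
# '2018-01-02 24:00'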

Why does relativedelta with positive arguments return a date in the past?

I have some trouble understanding the behavior of dateutil.relativedelta.
I understand that relativedelta can return past dates if I use negative arguments, as specified in the relativedelta doc.
However, when I provide positive parameters, I expect it to always return a date in the future... that seems legit, right?
My use case is the following: it is Tuesday, 8:35, and I want to get the date of the closest Monday and the closest Tuesday at 6:00.
Here is what I did. The first result seems correct to me, while the second one is wrong.
>>> import datetime
>>> now = datetime.datetime.now()
>>> now
datetime.datetime(2016, 11, 29, 8, 35, 23, 786349)
>>> from dateutil import relativedelta
>>> now.weekday()
1
>>> now + relativedelta.relativedelta(weekday=0, hour=6, minute=0) # should give a time in the future
datetime.datetime(2016, 12, 5, 6, 0, 23, 786349) # here this is correct, in the future
>>> now + relativedelta.relativedelta(weekday=1, hour=6, minute=0) # should give a time in the future
datetime.datetime(2016, 11, 29, 6, 0, 23, 786349) # but this is in the past / I would expect result (2016, 12, 6, 6, 0, 23, 786349)
So, am I doing something wrong here?
So according to your initial date, you're actually at 8 AM, but you're targeting 6 AM by using the hour param. If you're trying to increment by one hour, you should use hours and minutes respectively:
>>> now
datetime.datetime(2016, 11, 29, 3, 5, 41, 763818)
>>> now.weekday()
1
>>> now + relativedelta.relativedelta(weekday=1, hour=1)
datetime.datetime(2016, 11, 29, 1, 5, 41, 763818) # Notice how it's in the past
>>> now + relativedelta.relativedelta(weekday=1, hours=1)
datetime.datetime(2016, 11, 29, 4, 5, 41, 763818) # Notice how it's one hour in the future
>>> now + relativedelta.relativedelta(weekday=1, hour=6, minute=0, weeks=1)
datetime.datetime(2016, 12, 6, 6, 0, 41, 763818)
I think it's in the doc:
Starting with the weekday parameter:
These instances may receive a parameter N, specifying the Nth weekday, which could be positive or negative (like MO(+1) or MO(-2)). Not specifying it is the same as specifying +1.
So by passing weekday=1, it's as if you had passed TU(+1).
Then, continuing with the doc, in the 7th bullet on the behaviour of operations with relativedelta:
Notice that if the calculated date is already Monday, for example, using (0, 1) or (0, -1) won’t change the day.
The 29th of November is already a Tuesday, and you're asking for a Tuesday, so nothing changes.
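A workaround sketch of my own (not from the answer above): apply the relativedelta first, and if the result lands in the past, push it one week forward:
from datetime import datetime
from dateutil.relativedelta import relativedelta, TU

now = datetime(2016, 11, 29, 8, 35, 23)

# Set the time of day absolutely, then snap to Tuesday (TU is equivalent to weekday=1).
target = now + relativedelta(weekday=TU, hour=6, minute=0, second=0, microsecond=0)
if target <= now:
    # Today is already Tuesday and 06:00 has passed, so take next week's Tuesday.
    target += relativedelta(weeks=1)

print(target)
# 2016-12-06 06:00:00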

How to change the date format of the whole column?

I am analyzing a .csv file in which the first column holds datetimes in the format "2016-09-15T00:00:13", and I want to convert them to standard Python datetime objects. I can change the format for one date, but I cannot do it for the whole column.
My code that I am using:
import numpy
import dateutil.parser
mydate = dateutil.parser.parse(numpy.mydata[1:,0])
print(mydate)
I am getting the error:
'module' object has no attribute 'mydata'
Here is the column for which I want the format to be changed.
print(mydata[1:,0])
['2016-09-15T00:00:13'
'2016-09-15T00:00:38'
'2016-09-15T00:00:53'
...,
'2016-09-15T23:59:28'
'2016-09-15T23:59:37'
'2016-09-15T23:59:52']
from datetime import datetime
date_objects = []
for date in mydata[1:, 0]:
    date_objects.append(datetime.strptime(date, '%Y-%m-%dT%H:%M:%S'))
See the documentation for datetime.strptime; it also lists the format codes.
Oh, and about the
'module' object has no attribute 'mydata'
error: you call numpy.mydata, which is a reference to a "mydata" attribute of the numpy module you imported. The problem is that "mydata" is just one of your variables, not something included with numpy.
Unless you have a compelling reason to avoid it, pandas is the way to go with this kind of analysis. You can simply do
import pandas
df = pandas.read_csv('myfile.csv', index_col=0, parse_dates=True)
This will use the first column as the index and parse the dates in it, which is probably what you want.
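If the datetime column should remain a regular column rather than the index, a hedged alternative (the column name 'timestamp' is a placeholder, not taken from the question) is to parse it explicitly after reading:
import pandas as pd

df = pd.read_csv('myfile.csv')
# Replace 'timestamp' with whatever the datetime column is actually called.
df['timestamp'] = pd.to_datetime(df['timestamp'], format='%Y-%m-%dT%H:%M:%S')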
Assuming you've dealt with that numpy.mydata[1:,0] attribute error, your data looks like:
In [268]: mydata=['2016-09-15T00:00:13' ,
...: '2016-09-15T00:00:38' ,
...: '2016-09-15T00:00:53' ,
...: '2016-09-15T23:59:28' ,
...: '2016-09-15T23:59:37' ,
...: '2016-09-15T23:59:52']
or in array form it is a 1d array of strings:
In [269]: mydata=np.array(mydata)
In [270]: mydata
Out[270]:
array(['2016-09-15T00:00:13', '2016-09-15T00:00:38', '2016-09-15T00:00:53',
'2016-09-15T23:59:28', '2016-09-15T23:59:37', '2016-09-15T23:59:52'],
dtype='<U19')
numpy has its own datetime type (datetime64) that is stored as a 64-bit integer and can be used numerically. Your dates convert to it readily with astype (your format is the standard ISO one):
In [271]: mydata.astype(np.datetime64)
Out[271]:
array(['2016-09-15T00:00:13', '2016-09-15T00:00:38', '2016-09-15T00:00:53',
'2016-09-15T23:59:28', '2016-09-15T23:59:37', '2016-09-15T23:59:52'],
dtype='datetime64[s]')
tolist converts this array to a list - and the dates to datetime objects:
In [273]: D = mydata.astype(np.datetime64)
In [274]: D.tolist()
Out[274]:
[datetime.datetime(2016, 9, 15, 0, 0, 13),
datetime.datetime(2016, 9, 15, 0, 0, 38),
datetime.datetime(2016, 9, 15, 0, 0, 53),
datetime.datetime(2016, 9, 15, 23, 59, 28),
datetime.datetime(2016, 9, 15, 23, 59, 37),
datetime.datetime(2016, 9, 15, 23, 59, 52)]
which could be turned back into an array of dtype object:
In [275]: np.array(D.tolist())
Out[275]:
array([datetime.datetime(2016, 9, 15, 0, 0, 13),
datetime.datetime(2016, 9, 15, 0, 0, 38),
datetime.datetime(2016, 9, 15, 0, 0, 53),
datetime.datetime(2016, 9, 15, 23, 59, 28),
datetime.datetime(2016, 9, 15, 23, 59, 37),
datetime.datetime(2016, 9, 15, 23, 59, 52)], dtype=object)
These objects can't be used in array calculations, so the plain list would be just as useful.
If your string format weren't standard, you'd have to use the datetime parser in a list comprehension, as @staples shows.

Pandas : first datetime field gets automatically converted to timestamp type

When creating a pandas dataframe object (python 2.7.9, pandas 0.16.2), the first datetime field gets automatically converted into a pandas timestamp. Why? Is it possible to prevent this so as to keep the field in the original type?
Please see code below:
import numpy as np
import datetime
import pandas
create a dict:
x = {'cusip': np.array(['10553M10', '67085120', '67085140'], dtype='|S8'),
     'vstart': np.array([datetime.datetime(2001, 11, 16, 0, 0),
                         datetime.datetime(2012, 2, 28, 0, 0),
                         datetime.datetime(2014, 12, 22, 0, 0)], dtype=object),
     'vstop': np.array([datetime.datetime(2012, 2, 28, 0, 0),
                        datetime.datetime(2014, 12, 22, 0, 0),
                        datetime.datetime(9999, 12, 31, 0, 0)], dtype=object),
     'id': np.array(['EQ0000000000041095', 'EQ0000000000041095', 'EQ0000000000041095'],
                    dtype='|S18')}
So, the vstart and vstop keys hold datetime objects so far. However, after:
df = pandas.DataFrame(data = x)
vstart automatically becomes a pandas Timestamp while vstop remains a datetime:
type(df.vstart[0])
#class 'pandas.tslib.Timestamp'
type(df.vstop[0])
#type 'datetime.datetime'
I don't understand why the first datetime column that the constructor comes across gets converted to Timestamp by pandas. And how to tell pandas to keep the data types as they are. Can you help? Thank you.
Actually, I've noticed something in your data; it has nothing to do with the column being first or second. In your vstop column there is a datetime with the value dt.datetime(9999, 12, 31, 0, 0); if you change the year of this date to a normal year, 2020 for example, both columns are treated the same.
Just note that I'm importing the datetime module as dt:
x = {'cusip': np.array(['10553M10', '67085120', '67085140'], dtype='|S8'),
     'vstop': np.array([dt.datetime(2012, 2, 28, 0, 0), dt.datetime(2014, 12, 22, 0, 0), dt.datetime(2020, 12, 31, 0, 0)], dtype=object),
     'vstart': np.array([dt.datetime(2001, 11, 16, 0, 0), dt.datetime(2012, 2, 28, 0, 0), dt.datetime(2014, 12, 22, 0, 0)], dtype=object),
     'id': np.array(['EQ0000000000041095', 'EQ0000000000041095', 'EQ0000000000041095'], dtype='|S18')}
In [27]:
df = pd.DataFrame(x)
df
Out[27]:
      cusip                  id      vstart       vstop
0  10553M10  EQ0000000000041095  2001-11-16  2012-02-28
1  67085120  EQ0000000000041095  2012-02-28  2014-12-22
2  67085140  EQ0000000000041095  2014-12-22  2020-12-31
In [25]:
type(df.vstart[0])
Out[25]:
pandas.tslib.Timestamp
In [26]:
type(df.vstop[0])
Out[26]:
pandas.tslib.Timestamp
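For context, here is a minimal sketch of my own (not part of the original answer) showing why the year-9999 sentinel matters: it lies outside the range a nanosecond-resolution pandas Timestamp can represent, so that column cannot be converted and is left as plain datetime.datetime objects:
import datetime as dt
import pandas as pd

# Nanosecond-resolution Timestamps only cover roughly the years 1677-2262.
print(pd.Timestamp.min, pd.Timestamp.max)

try:
    ts = pd.Timestamp(dt.datetime(9999, 12, 31))
    print(type(ts))  # newer pandas versions may accept this at a coarser resolution
except ValueError:  # pandas raises OutOfBoundsDatetime, a ValueError subclass
    print('out of bounds for datetime64[ns]')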
