parsing month year pairs into datetime - python

Let's say I have two strings, 'Jan-2010' and 'Mar-2010', and I want to parse them into two datetime objects: 1-Jan-2010 and 31-Mar-2010 (i.e. the last day of the month).
What would be the best strategy in Python? Should I just split the string into tokens, or use regular expressions, and then use the calendar functions to get, say, the last day of the month for 'Mar-2010'? (Getting the first day is trivial: it's always 1 in this case, unless I wanted the first working day of the month.)
Any suggestions? Thanks in advance.

strptime does the string parsing into dates on your behalf:
import datetime
def firstofmonth(MmmYyyy):
    return datetime.datetime.strptime(MmmYyyy, '%b-%Y').date()
much better than messing around with tokenization, regexp, &c!-).
To get the date of the last day of the month, you can indeed use the calendar module:
import calendar
def lastofmonth(MmmYyyy):
    first = firstofmonth(MmmYyyy)
    # monthrange returns (weekday of the 1st, number of days in the month)
    _, lastday = calendar.monthrange(first.year, first.month)
    return datetime.date(first.year, first.month, lastday)
You could ALMOST do it neatly with datetime alone, e.g., an ALMOST working approach:
def lastofmonth(MmmYyyy):
    first = firstofmonth(MmmYyyy)
    return first.replace(month=first.month+1, day=1
                         ) - datetime.timedelta(days=1)
but, alas!, this breaks for December (there is no month 13, so replace raises a ValueError), and the code needed to special-case December makes the overall approach goofier than calendar affords;-).
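For reference, here is a minimal self-contained sketch of the two helpers with their imports, plus one possible way to special-case December using datetime alone. The name lastofmonth_datetime_only is made up for illustration, and '%b' is assumed to run under an English locale so that 'Jan', 'Mar', etc. are recognised.

```python
import calendar
import datetime


def firstofmonth(MmmYyyy):
    # '%b' matches abbreviated month names; assumes an English locale.
    return datetime.datetime.strptime(MmmYyyy, '%b-%Y').date()


def lastofmonth(MmmYyyy):
    first = firstofmonth(MmmYyyy)
    # monthrange returns (weekday of the 1st, number of days in the month).
    _, lastday = calendar.monthrange(first.year, first.month)
    return first.replace(day=lastday)


def lastofmonth_datetime_only(MmmYyyy):
    # Hypothetical variant: roll December over into January of the next year.
    first = firstofmonth(MmmYyyy)
    if first.month == 12:
        next_first = first.replace(year=first.year + 1, month=1)
    else:
        next_first = first.replace(month=first.month + 1)
    return next_first - datetime.timedelta(days=1)


print(firstofmonth('Jan-2010'), lastofmonth('Mar-2010'))
```

Both variants should agree for every month; the calendar version simply avoids the extra branch.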

I highly recommend using the python timeseries module, which you can download and read about here:
http://pytseries.sourceforge.net/
You should also use the dateutil package for parsing the date string, which you can find here:
http://labix.org/python-dateutil
Then you can do something like this
import datetime
import dateutil.parser
import scikits.timeseries as TS
m1 = TS.Date('M', datetime=dateutil.parser.parse('Jan-2010'))
m2 = TS.Date('M', datetime=dateutil.parser.parse('Mar-2010'))
d1 = m1.asfreq('D', relation='START') # returns a TS.Date object
d2 = m2.asfreq('D', relation='END')
firstDay = d1.datetime
lastDay = d2.datetime
This solution depends on outside modules, but they're very powerful and well written.

from datetime import datetime, timedelta
def first_day(some_date):
    return some_date.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
def next_month(some_date):
    return first_day(first_day(some_date) + timedelta(days=31))
def last_day(some_date):
    return next_month(some_date) - timedelta(days=1)
# testing:
months = [('Jan-2010', 'Mar-2010'),  # your example
          ('Apr-2009', 'Apr-2009'),  # same month, 30 days
          ('Jan-2008', 'Dec-2008'),  # whole year
          ('Jan-2007', 'Feb-2007')]  # february involved
for date1, date2 in months:
    print(first_day(datetime.strptime(date1, '%b-%Y')),
          '-',
          last_day(datetime.strptime(date2, '%b-%Y')))
That prints:
2010-01-01 00:00:00 - 2010-03-31 00:00:00
2009-04-01 00:00:00 - 2009-04-30 00:00:00
2008-01-01 00:00:00 - 2008-12-31 00:00:00
2007-01-01 00:00:00 - 2007-02-28 00:00:00

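I know this question is old, but in case someone needs it:
from dateutil import rrule
from dateutil import parser
from datetime import datetime
# default= supplies the fields missing from the string, so the day becomes 1
first_day = parser.parse('Jan-2010', default=datetime(1, 1, 1))
# rrule() returns a rule object; [0] extracts its first occurrence as a datetime
last_day = rrule.rrule(rrule.MONTHLY, count=1, bymonthday=-1, bysetpos=1,
                       dtstart=parser.parse('Mar-2010', default=datetime(1, 1, 1)))[0]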

Riffing on Alex Martelli's:
import datetime
def lastofmonthHelper(MmmYyyy):  # Takes a date
    return MmmYyyy.replace(year=MmmYyyy.year+(MmmYyyy.month==12), month=MmmYyyy.month%12 + 1, day=1) - datetime.timedelta(days=1)
>>> for month in range(1,13):
...     t = datetime.date(2009,month,1)
...     print(t, lastofmonthHelper(t))
...
2009-01-01 2009-01-31
2009-02-01 2009-02-28
2009-03-01 2009-03-31
2009-04-01 2009-04-30
2009-05-01 2009-05-31
2009-06-01 2009-06-30
2009-07-01 2009-07-31
2009-08-01 2009-08-31
2009-09-01 2009-09-30
2009-10-01 2009-10-31
2009-11-01 2009-11-30
2009-12-01 2009-12-31
You don't have to use the first day of the month, BTW. I would have put this in a comment but we all know how the formatting would have turned out. Feel free to upvote Alex.
If you call with the result of a firstofmonth() call, you get the desired result:
>>> lastofmonthHelper(firstofmonth('Apr-2009'))
datetime.date(2009, 4, 30)

Related

how can I strip the 00:00:00 from a date in python

This might be a really simple question, but I'm using the code below to add 1 day to a date and then output the new date. I found this code online.
from datetime import datetime
from datetime import timedelta
# taking input as the date
Begindatestring = "2020-10-11"
# carry out conversion between string
# to datetime object
Begindate = datetime.strptime(Begindatestring, "%Y-%m-%d")
# print begin date
print("Beginning date")
print(Begindate)
# calculating end date by adding 1 day
Enddate = Begindate + timedelta(days=1)
# printing end date
print("Ending date")
print(Enddate)
This code works, but the output it gives me looks like this:
Beginning date
2020-10-11 00:00:00
Ending date
2020-10-12 00:00:00
But for the rest of my code to run properly I need to get rid of the 00:00:00,
so I need output that looks like this:
Beginning date
2020-10-11
Ending date
2020-10-12
It seems like there might be a simple solution but I can't find it.
Try using:
print(Begindate.strftime("%Y-%m-%d"))
Check https://www.programiz.com/python-programming/datetime/strftime to learn more.
Begindate = datetime.strptime(Begindatestring, "%Y-%m-%d").date()
Enddate = Enddate.date()
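Putting the .date() suggestion together, a minimal sketch might look like the following. The lowercase names begin and end are mine, not from the question; the point is that a date object has no time component, and adding a timedelta to a date yields another date.

```python
from datetime import datetime, timedelta

# Parse the string, then keep only the calendar date (no time part).
begin = datetime.strptime("2020-10-11", "%Y-%m-%d").date()
# date + timedelta is still a date, so no 00:00:00 appears when printed.
end = begin + timedelta(days=1)

print("Beginning date")
print(begin)  # 2020-10-11
print("Ending date")
print(end)    # 2020-10-12
```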

Calling a datetime object as string in Python

I use Python's slice to select a range of dates given as strings like "1960-01-01". I have tried to replace the literals with variables to make the code generic. However, when the periods are not defined like this:
calibration_period = slice('1960-01-01', '2000-12-31')
validation_period = slice('2001-01-01', '2014-12-31')
but instead like this:
calibration_period = slice(Base, Date[-1])
validation_period = slice(Date2[0],Date2[-1])
The last value is read as 2014-12-31 00:00:00, but I want to read it as "2014-12-31" so the calculations continue up to 2014-12-31 23:00:00.
I have used this:
from datetime import datetime
t=pd.to_datetime(str(Date2[-1]))
strg=t.strftime('%Y-%m-%d')
Although print shows strg as 2014-12-31, printing validation_period still gives:
slice(numpy.datetime64('2001-01-01T00:00:00.000000000'), Timestamp('2014-12-31 00:00:00'), None)
I would be really grateful if someone has a suggestion.
I think the issue is you're mixing up datetime formats and string formats.
from datetime import datetime
time = datetime.strptime('01/01/2010', '%d/%m/%Y')
newtime = datetime.strftime(time, '%d/%m/%Y')
print(time, newtime)
2010-01-01 00:00:00 01/01/2010
Convert your datetimes to strings in the format you want using datetime.strftime; you can then apply your logic to the strings.
For example, a datetime object with the value 2010-01-01 23:30:00 will always be converted to the string 2010-01-01 when using:
value = datetime.strftime(value, '%Y-%m-%d')
You can then compare the two strings:
if value == newtime:
    print(value)
Full example:
from datetime import datetime
time = datetime.strptime('01/01/2010 20:30:30', '%d/%m/%Y %H:%M:%S')
newtime = datetime.strftime(time, '%d/%m/%Y')
print(time)
print(newtime)
#Outputs:
2010-01-01 20:30:30
01/01/2010
This may be helpful:
>>> from datetime import datetime
>>> d = datetime.utcnow()
>>> d.date()
datetime.date(2018, 7, 10)
>>> str(d.date())
'2018-07-10'
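For the slicing problem in the question, the underlying issue is that a timestamp at midnight marks the start of the last day, not its end. A minimal stdlib-only sketch of one workaround (no pandas involved; the helper name in_period and the hourly sample data are made up for illustration) is to treat the end date as a whole day by comparing against the start of the following day:

```python
from datetime import datetime, timedelta


def in_period(stamps, start_day, end_day):
    """Keep stamps from start_day 00:00 through the end of end_day."""
    start = datetime.strptime(start_day, '%Y-%m-%d')
    # Exclusive upper bound: midnight at the start of the day after end_day.
    stop = datetime.strptime(end_day, '%Y-%m-%d') + timedelta(days=1)
    return [s for s in stamps if start <= s < stop]


# Hypothetical hourly data covering 2014-12-30 00:00 to 2015-01-01 23:00.
hourly = [datetime(2014, 12, 30) + timedelta(hours=h) for h in range(72)]
selected = in_period(hourly, '2014-12-31', '2014-12-31')
print(selected[0], selected[-1])
```

With an exclusive upper bound there is no need to guess the last timestamp of the day (23:00, 23:59:59, and so on).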

Previous month datetime pandas

I have a datetime instance declared as follows:
dtDate = datetime.datetime(2016,1,1,0,0)
How do I get the previous month and previous year from dtDate?
e.g. something like:
dtDate.minusOneMonth()
# to return datetime.datetime(2015,12,1,0,0)
You can use:
import datetime
import pandas as pd
dtDate = datetime.datetime(2016,1,1,0,0)
print (dtDate - pd.DateOffset(months=1))
2015-12-01 00:00:00
print (dtDate - pd.DateOffset(years=1))
2015-01-01 00:00:00
The trailing s is important: the singular year=1 replaces the year with 1 instead of subtracting one year:
print (dtDate - pd.DateOffset(year=1))
0001-01-01 00:00:00
You can use DateOffset:
In [32]:
dtDate = dt.datetime(2016,1,1,0,0)
dtDate - pd.DateOffset(months=1)
Out[32]:
Timestamp('2015-12-01 00:00:00')
To manipulate an entire pandas Series, use pd.DateOffset() with .dt.to_period("M"):
df['year_month'] = df['timestamp'].dt.to_period("M")
df['prev_year_month'] = (df['timestamp'] - pd.DateOffset(months=1)).dt.to_period("M")
To go forward a month instead, set months=-1 (subtracting a negative offset adds a month).
Use relativedelta from dateutil:
import datetime
import dateutil.relativedelta
dtDate = datetime.datetime(2016,1,1,0,0)
# get previous month
print ((dtDate+dateutil.relativedelta.relativedelta(months=-1)).month)
# get previous year
print ((dtDate+dateutil.relativedelta.relativedelta(years=-1)).year)
Output:
12
2015
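If neither pandas nor dateutil is available, the same month arithmetic can be sketched with the standard library alone. This is an illustrative alternative, not something from the answers above; the name previous_month is made up, and the day is clamped to the last day of the target month (so 31 March maps to the end of February).

```python
import calendar
import datetime


def previous_month(d):
    """Return d moved back one calendar month, clamping the day if needed."""
    if d.month == 1:
        year, month = d.year - 1, 12   # January wraps to December of last year
    else:
        year, month = d.year, d.month - 1
    last_day = calendar.monthrange(year, month)[1]
    return d.replace(year=year, month=month, day=min(d.day, last_day))


print(previous_month(datetime.datetime(2016, 1, 1, 0, 0)))  # 2015-12-01 00:00:00
print(previous_month(datetime.date(2016, 3, 31)))           # 2016-02-29
```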

How do I find the next 7am in a timezone [duplicate]

This question already has answers here:
How to convert tomorrows (at specific time) date to a timestamp
(2 answers)
Closed 6 years ago.
current_datetime = datetime.now(tz)
next_hour = datetime(current_datetime.year, current_datetime.month, current_datetime.day, 7, 0, 0, 0, tz)
timedelta_until_next_hour = next_hour - current_datetime
if timedelta_until_next_hour.total_seconds() < 0:
    timedelta_until_next_hour += timedelta(days=1)
return timedelta_until_next_hour.total_seconds()
I'm trying to find the next time it's 7am for a local timezone and return the number of seconds until that.
I'm having some daylight saving time issues. For example, in America/New_York, current_datetime has a UTC offset of -4 hours,
whereas next_hour has an offset of -5 hours, so the difference between the two is off by an hour.
Finding the next 7am
You can do this pretty easily with python-dateutil's relativedelta module:
from dateutil.relativedelta import relativedelta
def next_7am(dt):
    relative_days = (dt.hour >= 7)
    absolute_kwargs = dict(hour=7, minute=0, second=0, microsecond=0)
    return dt + relativedelta(days=relative_days, **absolute_kwargs)
The way it works is that relativedelta takes absolute arguments (denoted by being in the singular, e.g. month, year, day) and relative arguments (denoted by being in the plural, e.g. months, years, days). If you add a relativedelta object to a datetime, it will replace the absolute values in the datetime, then add the relative values. So what I've done above is specify that relative_days should be 1 if it's already 7am or later and 0 otherwise, while the absolute arguments say "replace the time with 7 AM". Add that to your datetime and it will give you the next 7am.
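The "replace the absolute fields, then add the relative ones" order can be mimicked with the standard library alone, which may make the two steps more concrete. This sketch assumes a naive datetime (time zones are discussed separately), and the name next_7am_stdlib is made up for illustration.

```python
from datetime import datetime, timedelta


def next_7am_stdlib(dt):
    # Step 1 (the "absolute" part): overwrite the time-of-day fields.
    at_seven = dt.replace(hour=7, minute=0, second=0, microsecond=0)
    # Step 2 (the "relative" part): move to tomorrow if 7am has been reached.
    return at_seven + timedelta(days=1 if dt.hour >= 7 else 0)


print(next_7am_stdlib(datetime(2011, 11, 5, 12, 30)))  # 2011-11-06 07:00:00
print(next_7am_stdlib(datetime(2011, 11, 5, 6, 59)))   # 2011-11-05 07:00:00
```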
Dealing with time zones
The next step depends on what you are using for your time zone. If you are using a dateutil time zone, then you can just use the function defined above:
dt_next_7am = next_7am(dt)
If you are using a pytz timezone, you should strip it off and do the calculation as a naive date-time, then re-localize the time zone, as below:
dt_next_7am = tz.localize(next_7am(dt.replace(tzinfo=None)))
If you want the actual elapsed time between those two moments, you should do the arithmetic in UTC:
time_between = dt_next_7am.astimezone(tz=UTC) - dt.astimezone(tz=UTC)
Where UTC has been defined as either dateutil.tz.tzutc() or pytz.UTC or equivalent.
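As a rough illustration of why the subtraction belongs in UTC, here is a sketch using the standard-library zoneinfo module (Python 3.9+) rather than dateutil or pytz. That substitution is my assumption, as is the name seconds_until_next_7am, and the system must have time zone data available. Subtracting two aware datetimes that share the same tzinfo object ignores their UTC offsets, so both values are converted to UTC first.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def seconds_until_next_7am(now):
    # Wall-clock step: today's 7am in the same zone, or tomorrow's if passed.
    target = now.replace(hour=7, minute=0, second=0, microsecond=0)
    if target <= now:
        target += timedelta(days=1)
    # Elapsed-time step: convert both to UTC so the offsets are honoured.
    elapsed = target.astimezone(timezone.utc) - now.astimezone(timezone.utc)
    return elapsed.total_seconds()


LA = ZoneInfo('America/Los_Angeles')
# DST ended on 2011-11-06 in this zone, so the night is one hour longer.
print(seconds_until_next_7am(datetime(2011, 11, 5, 12, 30, tzinfo=LA)))  # 70200.0
```

70200 seconds is 19 hours 30 minutes, one hour more than the naive wall-clock difference. This sketch does not handle an ambiguous or non-existent 7am.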
Examples across a DST transition
Here is an example using dateutil (with the result in the comment):
from datetime import datetime
from dateutil.tz import gettz, tzutc
LA = gettz('America/Los_Angeles')
dt = datetime(2011, 11, 5, 12, 30, tzinfo=LA)
dt7 = next_7am(dt)
print(dt7.astimezone(tzutc()) - dt.astimezone(tzutc())) # 19:30:00
And an example showing the wrong and right way to do this with pytz:
from datetime import datetime
import pytz
LA = pytz.timezone('America/Los_Angeles')
UTC = pytz.UTC
dt = LA.localize(datetime(2011, 11, 5, 12, 30))
dt7_bad = next_7am(dt) # pytz won't like this
dt7_good = LA.localize(next_7am(dt.replace(tzinfo=None)))
dt_utc = dt.astimezone(pytz.UTC)
print(dt7_bad.astimezone(pytz.UTC) - dt_utc) # 18:30:00 (Wrong)
print(dt7_good.astimezone(pytz.UTC) - dt_utc) # 19:30:00 (Right)
Ambiguous / Non-existent 7 AM
If you are dealing with certain dates in certain zones, specifically the ones on the following list (as of April 2016):
1901-12-13 07:00:00 (/Pacific/Fakaofo)
1901-12-14 07:00:00 (/Asia/Kamchatka)
1901-12-14 07:00:00 (/Asia/Ust-Nera)
1901-12-14 07:00:00 (/Pacific/Bougainville)
1901-12-14 07:00:00 (/Pacific/Kosrae)
1901-12-14 07:00:00 (/Pacific/Majuro)
1917-03-25 07:00:00 (/Antarctica/Macquarie)
1918-03-31 07:00:00 (/EST5EDT)
1919-03-31 07:00:00 (/Antarctica/Macquarie)
1952-01-13 07:00:00 (/Antarctica/DumontDUrville)
1954-02-13 07:00:00 (/Antarctica/Mawson)
1957-01-13 07:00:00 (/Antarctica/Davis)
1969-01-01 07:00:00 (/Antarctica/Casey)
1969-02-01 07:00:00 (/Antarctica/Davis)
1969-09-29 07:00:00 (/Kwajalein)
1969-09-29 07:00:00 (/Pacific/Kwajalein)
1979-09-30 07:00:00 (/Pacific/Enderbury)
1979-09-30 07:00:00 (/Pacific/Kiritimati)
2009-10-18 07:00:00 (/Antarctica/Casey)
2011-09-23 07:00:00 (/Pacific/Apia)
2011-10-28 07:00:00 (/Antarctica/Casey)
Then the resulting 7 AM value will be either ambiguous or non-existent. If you want to handle these edge cases, see this answer. It is probably worth noting that once PEP 495 has been implemented, ambiguous times will probably be handled slightly differently.
An alternative implementation, using python-dateutil's rrule module for generating recurrence rules together with an approach for pytz zones, is below (note that this will work with non-pytz zones, but it will not resolve ambiguous/non-existent times properly):
from datetime import datetime
from dateutil import rrule
import pytz
def next_real_7am(dt):
    tzi = dt.tzinfo
    dt_naive = dt.replace(tzinfo=None)
    # byminute/bysecond are pinned to 0; otherwise rrule copies them from dtstart
    rr = rrule.rrule(freq=rrule.DAILY, byhour=7, byminute=0, bysecond=0,
                     dtstart=dt_naive)
    for ndt in rr:
        localize = getattr(tzi, 'localize', None)
        if tzi is not None and localize is not None:
            try:
                ndt = localize(ndt, is_dst=None)
            except pytz.AmbiguousTimeError:
                # Ambiguous 7am: return the earlier of the two candidates
                return min([localize(ndt, is_dst=True),
                            localize(ndt, is_dst=False)])
            except pytz.NonExistentTimeError:
                # No 7am on this day: try the next occurrence
                continue
        else:
            ndt = ndt.replace(tzinfo=tzi)
        return ndt
KWA = pytz.timezone('Pacific/Kwajalein')
dtstart = KWA.localize(datetime(1969, 9, 29, 18))
dt7 = next_real_7am(dtstart)
print(dt7.tzname()) # Should be MHT, before the transition
dtstart = KWA.localize(datetime(1993, 8, 19, 18)) # There was no 8/20 in this zone
dt7 = next_real_7am(dtstart)
print(dt7) # Should be 1993-8-21 07:00:00
I'm trying to find the next time it's 7am for a local timezone and return the number of seconds until that.
Find dt7 using the same code as for dt6 (replace time(6) with time(7)).
Then the number of seconds until that is (dt7 - now).total_seconds().
See the bullet points that explain when other solutions may fail.

How can I get all the dates within a week of a certain day using datetime?

I have some measurements that happened on specific days in a dictionary. It looks like
date_dictionary['YYYY-MM-DD'] = measurement.
I want to calculate the variance between the measurements within 7 days from a given date. When I convert the date strings to a datetime.datetime, the result looks like a tuple or an array, but doesn't behave like one.
Is there an easy way to generate all the dates one week from a given date? If so, how can I do that efficiently?
You can do this using timedelta. Example:
>>> from datetime import datetime,timedelta
>>> d = datetime.strptime('2015-07-22','%Y-%m-%d')
>>> for i in range(1,8):
...     print(d + timedelta(days=i))
...
2015-07-23 00:00:00
2015-07-24 00:00:00
2015-07-25 00:00:00
2015-07-26 00:00:00
2015-07-27 00:00:00
2015-07-28 00:00:00
2015-07-29 00:00:00
You do not actually need to print it: adding a timedelta to a datetime returns a new datetime object, which you can use directly in your calculation.
Using datetime, to generate 7 consecutive dates starting with (and including) the given date, you can do:
import datetime
dt = datetime.datetime(...)
week_dates = [ dt + datetime.timedelta(days=i) for i in range(7) ]
There are libraries providing nicer APIs for performing datetime/date operations, most notably pandas (though it includes much much more). See pandas.date_range.
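To connect this back to the dictionary in the question, here is a minimal sketch that turns the generated dates back into 'YYYY-MM-DD' keys and computes a variance over whatever measurements exist in that window. The sample values and the helper name week_values are made up, and statistics.pvariance (population variance) is only one reasonable choice.

```python
from datetime import datetime, timedelta
from statistics import pvariance

# Hypothetical measurements keyed by 'YYYY-MM-DD' strings.
date_dictionary = {
    '2015-07-22': 10.0,
    '2015-07-24': 12.0,
    '2015-07-28': 14.0,
    '2015-08-05': 99.0,  # outside the 7-day window
}


def week_values(measurements, start, days=7):
    """Collect measurements for `start` and the following days - 1 dates."""
    first = datetime.strptime(start, '%Y-%m-%d').date()
    # date.isoformat() produces the same 'YYYY-MM-DD' form used as keys.
    keys = [(first + timedelta(days=i)).isoformat() for i in range(days)]
    return [measurements[k] for k in keys if k in measurements]


values = week_values(date_dictionary, '2015-07-22')
print(values, pvariance(values))
```

Looking up each generated key costs constant time, so the window scan stays cheap even for a large dictionary.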
