Datetime ints in fewer lines - python

Right now I am using the following functions to calculate a date and time int like this (ymd), (hms). I believe it is easier to do this for comparison.
def getDayAsInt():
    time = datetime.datetime.now()
    year = time.strftime("%Y")
    month = makeTimeTwoDigit(time.strftime("%m"))
    day = makeTimeTwoDigit(time.strftime("%d"))
    return year + month + day
def getTimeOfDay():
    time = datetime.datetime.now()
    hour = makeTimeTwoDigit(time.strftime("%H"))
    minute = makeTimeTwoDigit(time.strftime("%M"))
    second = makeTimeTwoDigit(time.strftime("%S"))
    return hour + minute + second
I initially tried something like this:
'date': str(datetime.now()),
However, I ran into the problem that it is harder to generate a date range to query against. For example, if today is 20140616, I can simply query dates between 20140601 and 20140616, whereas generating all of the possible datetime strings is harder. Does that make sense?
For example, I want to find events that happened today, but a datetime string stored in DynamoDB is harder to match (there are more things to match).
I'm wondering if there is an easier or more efficient way. Is breaking the date and time down like this a common approach? Should I take this:
year = time.strftime("%Y")
month=makeTimeTwoDigit(time.strftime("%m"))
day=makeTimeTwoDigit(time.strftime("%d"))
And do it in one line? For example, should I use time.strftime("%Y%m%d")?

If you are doing the comparisons in python, an easier solution would be to use builtin datetime objects and the normal comparison operators, like < and >.
from datetime import datetime
dt_object = datetime.strptime('Jun 1 2005 1:33PM', '%b %d %Y %I:%M%p')
if datetime(2006, 6, 5, 0, 0, 0) <= dt_object < datetime(2006, 6, 6, 0, 0, 0):
    # do something when date is anytime on June 5th, 2006
    pass
If you must do the comparison in the query, you can use regular string comparison as long as your dates are stored in ISO-8601 format. The advantage of ISO-8601 is that chronological sorting is equivalent to lexicographic sorting, i.e. you can treat the dates as normal strings.
The equivalent comparison using ISO-8601 format:
'2006-06-05T00:00:00Z' <= dt < '2006-06-06T00:00:00Z'
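A minimal sketch of that string comparison, assuming every timestamp is stored in UTC with the same fixed-width layout (mixed time zones or varying precision would break the ordering). The helper name to_iso_utc is hypothetical:

```python
from datetime import datetime, timezone

def to_iso_utc(moment):
    """Format an aware datetime as a fixed-width UTC ISO-8601 string."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

# Example value; in practice this is whatever was written to the database.
stored = to_iso_utc(datetime(2006, 6, 5, 13, 33, tzinfo=timezone.utc))

# Plain string comparison is enough to test "any time on June 5th, 2006".
in_range = "2006-06-05T00:00:00Z" <= stored < "2006-06-06T00:00:00Z"
print(stored, in_range)  # 2006-06-05T13:33:00Z True
```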

I think splitting the day (year/month/day) from the time (hour/minute/second) is the cleanest solution for you, since you want to query by day.
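As a rough sketch of the one-line version asked about in the question: strftime already zero-pads %m, %d, %H, %M and %S, so no padding helper should be needed. The function names are hypothetical, and the fixed timestamp is only there to make the output predictable:

```python
from datetime import datetime

def day_as_int(moment):
    """Return the date as an int such as 20140616 (year, month, day)."""
    return int(moment.strftime("%Y%m%d"))

def time_of_day_as_int(moment):
    """Return the time as an int such as 93005 for 09:30:05."""
    return int(moment.strftime("%H%M%S"))

moment = datetime(2014, 6, 16, 9, 30, 5)  # in real code: datetime.now()
print(day_as_int(moment))          # 20140616
print(time_of_day_as_int(moment))  # 93005
```

Converting to int drops leading zeros in the time value, which does not affect numeric comparison; keep the strftime result as a string if a fixed width matters.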

Related

How to iterate over range between two datetime objects in Python? [duplicate]

Okay, so I am relatively new to programming and this has me absolutely stumped. I'm scraping data from a website, and the data changes every week. I want to run my scraping process for each weekly change, starting from 09-09-2015 and running up to the current date.
I know how to do this easily by running through every number, like 0909, then 0910, then 0911, but that is not what I need, as it would send far too many pointless requests to the server.
Here is the format of the URL
http://www.myexamplesite.com/?date=09092015
I know the simple:
for i in range(startDate, endDate):
    url = 'http://www.myexamplesite.com/?date={}'.format(i)
    driver.get(url)
But one thing I've never been able to figure out is how to manipulate Python's datetime to accurately reflect the format the website uses.
i.e:
09092015
09162015
09232015
09302015
10072015
...
09272017
If all else fails, I only need to do this once, so it wouldn't take too long to skip the loop altogether, manually enter each date I wish to scrape, and then append all of my dataframes together. I'm mainly curious how to manipulate datetime in this way for future projects that may require more data.
A good place to start is the documentation for the datetime, date and timedelta objects.
First, let's construct our starting date and ending date (today):
>>> from datetime import date, timedelta
>>> start = date(2015, 9, 9)
>>> end = date.today()
>>> start, end
(datetime.date(2015, 9, 9), datetime.date(2017, 9, 27))
Now let's define the unit of increment -- one day:
>>> day = timedelta(days=1)
>>> day
datetime.timedelta(1)
A nice thing about dates (date/datetime) and time deltas (timedelta) is that they can be added:
>>> start + day
datetime.date(2015, 9, 10)
We can also use format() to get that date in the format the website uses (month, day, year):
>>> "{date.month:02}{date.day:02}{date.year}".format(date=start+day)
'09102015'
So, when we put all this together:
from datetime import date, timedelta
start = date(2015, 9, 9)
end = date.today()
week = timedelta(days=7)
mydate = start
while mydate < end:
    print("{date.month:02}{date.day:02}{date.year}".format(date=mydate))
    mydate += week
we get a simple iteration over dates starting with 2015-09-09 and ending with today, incremented by 7 days (a week):
09092015
09162015
09232015
09302015
10072015
...
Take a look at the strftime and strptime documentation:
https://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior
The table there lists the format codes for dates and times and shows how to use them.
Of course, if the format of the dates changes in the future or you are parsing different strings, you will have to make code changes. There really is no way around that.
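For illustration, here is a small sketch that uses strftime with those format codes to build the weekly URLs. It assumes the site's date parameter is month, day, year (as the 09162015 example in the question suggests). The function name weekly_urls is hypothetical, and the end date is fixed only to keep the output predictable:

```python
from datetime import date, timedelta

BASE_URL = "http://www.myexamplesite.com/?date={}"  # URL pattern from the question

def weekly_urls(start, end):
    """Yield one URL per week from start up to and including end."""
    current = start
    while current <= end:
        # %m%d%Y gives zero-padded month, day and four-digit year.
        yield BASE_URL.format(current.strftime("%m%d%Y"))
        current += timedelta(days=7)

urls = list(weekly_urls(date(2015, 9, 9), date(2015, 10, 7)))
print(urls[0])   # http://www.myexamplesite.com/?date=09092015
print(urls[-1])  # http://www.myexamplesite.com/?date=10072015
```

Passing date.today() as the end date would cover everything up to the current week.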

Generate randomly formatted date strings for machine learning

For an NLP project in Python I need to generate random dates for model training. In particular, the date format must be random and consistent with a set of language locales. The formats include purely numeric ones as well as ones with (partially) written-out day and month names, and various common punctuation.
My best solution so far is the following algorithm:
1) generate a datetime() object with random values (nice solution here)
2) randomly select a locale, i.e. pick one of ['en_US','fr_FR','it_IT','de_DE'], where in this case the list is well known and short, so not a problem.
3) randomly select a format string for strftime(), i.e. ['%Y-%m-%d','%d %B %Y',...]. In my case the list should reflect the date formats that may occur in the documents the NLP model will be exposed to in the future.
4) generate a string with strftime()
Especially for 3), I do not know a better approach than hardcoding the list of formats I saw manually in the training documents. I have not yet found a function that turns OCR'd dates into a format string, which would let me extend the list when previously unseen date formats come up.
Do you have any suggestions on how to come up with better randomly formatted dates, or how to improve this approach?
USE random.randrange() AND datetime.timedelta() TO GENERATE A RANDOM DATE BETWEEN TWO DATES
Call datetime.date(year, month, day) to create a date object for the given year, month, and day. Do this twice to define the start and end dates. Subtract the start date from the end date to get a datetime.timedelta holding the time between them. Read its days attribute to get the number of days. Call random.randrange(days) to get a random integer from 0 up to, but not including, days. Call datetime.timedelta(days=n) to turn that integer n into a datetime.timedelta, and add it to the start date.
import datetime
import random
start_date = datetime.date(2020, 1, 1)
end_date = datetime.date(2020, 2, 1)
time_between_dates = end_date - start_date
days_between_dates = time_between_dates.days
random_number_of_days = random.randrange(days_between_dates)
random_date = start_date + datetime.timedelta(days=random_number_of_days)
print(random_date)
Here is my solution. Concerning the locales, all of them need to be available on your computer to avoid an error.
import random
from datetime import datetime, timedelta
import locale
LOCALE = ['en_US','fr_FR','it_IT','de_DE'] # all need to be available on your computer to avoid error
DATE_FORMAT = ['%Y-%m-%d','%d %B %Y']
def gen_datetime(min_year=1900, max_year=datetime.now().year):
    # generate a randomly formatted date string between min_year and max_year
    start = datetime(min_year, 1, 1)
    years = max_year - min_year + 1
    end = start + timedelta(days=365 * years)
    format_date = random.choice(DATE_FORMAT)
    locale_date = random.choice(LOCALE)
    locale.setlocale(locale.LC_ALL, locale_date)  # raises locale.Error if the locale is not available on your computer
    return (start + (end - start) * random.random()).strftime(format_date)
date = gen_datetime()
print(date)
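If the generated strings are meant as labelled training data, it may help to return the chosen format and the underlying date along with the text, and to seed the generator so runs are reproducible. This is only a sketch under those assumptions. The names NUMERIC_FORMATS and random_date_sample are hypothetical, and it sticks to numeric formats so that it runs without any extra locales installed:

```python
import random
from datetime import date, timedelta

# Numeric-only formats, so the output does not depend on installed locales.
NUMERIC_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y", "%m/%d/%Y"]

def random_date_sample(rng, start=date(1900, 1, 1), end=date(2020, 12, 31)):
    """Return (text, format, date) for one randomly formatted date."""
    span_days = (end - start).days
    picked = start + timedelta(days=rng.randrange(span_days + 1))
    fmt = rng.choice(NUMERIC_FORMATS)
    return picked.strftime(fmt), fmt, picked

rng = random.Random(42)  # fixed seed for reproducible training data
for _ in range(3):
    text, fmt, picked = random_date_sample(rng)
    print(text, fmt, picked.isoformat())
```

Keeping the format string next to each sample also makes it possible to check the data by parsing the text back with strptime.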

Subtract "n" days from a list of format [month, day]

I have this list in Python:
dates= [6,15],[8,24]
I want to subtract days from the entries in this list. For example, [6,15] is [month,day], so subtracting 10 days should give [6,5]. The operation will repeat, so subtracting another 10 days should give [5,26]. How can I code this?
You probably want to use the builtin datetime module and the third-party dateutil module for this. Note that you need a year, since month lengths differ between years (i.e. February in leap years); parse will assume the current year if none is given:
from dateutil.parser import parse
from dateutil.relativedelta import relativedelta
print(parse('6/15') - relativedelta(days=10))
You should use the built-in datetime.datetime and datetime.timedelta objects to achieve this in a simple way:
>>> from datetime import datetime, timedelta
>>> my_date_list = [6, 15] # your current list in "[month, day]" format
# create `datetime` object using above values
# the year is arbitrary here (2018 for demonstration), but one is required
>>> datetime_obj = datetime(month=my_date_list[0], day=my_date_list[1], year=2018)
# timedelta object for the number of days you want the diff
>>> diff = timedelta(days=20)
# New datetime object
>>> new_datetime_obj = datetime_obj - diff
>>> new_datetime_obj
datetime.datetime(2018, 5, 26, 0, 0)
# Your desired format: a list of [month, day]
>>> [new_datetime_obj.month, new_datetime_obj.day]
[5, 26]
PS: You shouldn't even be storing your initial and final values in "[Month, Day]" format. Simply store a list of datetime objects and use it wherever you need. new_datetime_obj.month yields the month and new_datetime_obj.day yields the day.
Note: You must account for the year in your code. It is needed for the day arithmetic. For example, a calculation that crosses February yields different results in leap years and non-leap years.
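Putting the above into one helper, as a minimal sketch: the function name subtract_days is hypothetical, and the year 2018 is an arbitrary assumption that only matters when the subtraction crosses February.

```python
from datetime import date, timedelta

def subtract_days(month_day, days, year):
    """Return [month, day] after going back `days` days in the given year."""
    month, day = month_day
    moved = date(year, month, day) - timedelta(days=days)
    return [moved.month, moved.day]

dates = [[6, 15], [8, 24]]
once = [subtract_days(d, 10, 2018) for d in dates]
twice = [subtract_days(d, 10, 2018) for d in once]
print(once)   # [[6, 5], [8, 14]]
print(twice)  # [[5, 26], [8, 4]]
```

If a subtraction moves past January 1st, the result lands in the previous year, and that information is lost in the [month, day] form.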

Getting a list of times in Python from 0:0:0 to 23:59:59 and then comparing with datetime values

I was working on code to generate the time for an entire day with 30 second intervals. I tried using DT.datetime and DT.time but I always end up with either a datetime value or a timedelta value like (0,2970). Can someone please tell me how to do this.
So I need a list that has data like:
[00:00:00]
[00:00:01]
[00:00:02]
till [23:59:59] and needed to compare it against a datetime value like 6/23/2011 6:38:00 AM.
Thanks!
Is there a reason you want to use datetimes instead of just 3 for loops counting up? Similarly, do you want to do something fancy or do you want to just compare against the time? If you don't need to account for leap seconds or anything like that, just do it the easy way.
import datetime
now = datetime.datetime.now()
for h in range(24):
    for m in range(60):
        for s in range(60):
            time_string = '%02d:%02d:%02d' % (h, m, s)
            # %M is minutes; %m would be the month
            if time_string == now.strftime('%H:%M:%S'):
                print('you found it! %s' % time_string)
Can you give any more info about why you are doing this? It seems like you would be much much better off parsing the datetimes or using strftime to get what you need instead of looping through 60*60*24 times.
There's a great answer on how to get a list of incremental values for seconds for a 24-hour day. I reused a part of it.
Note 1. I'm not sure how you're thinking of comparing time with datetime. Assuming that you're just going to compare the time part and extracting that.
Note 2. The time.strptime call expects a 12-hour AM/PM-based time, as in your example. Its result is then passed to time.strftime that returns a 24-hour-based time.
Here's what I think you're looking for:
my_time = '6/23/2011 6:38:00 AM' # time you defined
from datetime import datetime, timedelta
from time import strftime, strptime
now = datetime(2013, 1, 1, 0, 0, 0)
last = datetime(2013, 1, 1, 23, 59, 59)
delta = timedelta(seconds=1)
times = []
while now <= last:
    times.append(now.strftime('%H:%M:%S'))
    now += delta
twenty_four_hour_based_time = strftime('%H:%M:%S', strptime(my_time, '%m/%d/%Y %I:%M:%S %p'))
twenty_four_hour_based_time in times # returns True
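A variation that compares datetime.time objects instead of strings, sketched for the 30-second interval mentioned in the question. The function name times_of_day is hypothetical, and the anchor date is arbitrary because only the time part is kept:

```python
from datetime import datetime, timedelta

def times_of_day(step_seconds=30):
    """Return every time of day from 00:00:00 in steps of step_seconds."""
    midnight = datetime(2000, 1, 1)  # arbitrary date; only the time part is kept
    steps = range(0, 24 * 60 * 60, step_seconds)
    return [(midnight + timedelta(seconds=s)).time() for s in steps]

times = times_of_day(30)
stamp = datetime.strptime("6/23/2011 6:38:00 AM", "%m/%d/%Y %I:%M:%S %p")
print(len(times))             # 2880
print(stamp.time() in times)  # True
```

With step_seconds=1 the list covers every second from 00:00:00 to 23:59:59, as in the example list in the question.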

Converting date formats python - Unusual date formats - Extract %Y%M%D

I have a large data set with a variety of Date information in the following formats:
DAYS since Jan 1, 1900 - ex: 41213 - I believe these are from Excel http://www.kirix.com/stratablog/jd-edwards-date-conversions-cyyddd
YYDayofyear - ex 2012265
I am familiar with Python's time module and the strptime() and strftime() methods. However, I am not sure what the date formats above are called, or whether there is a Python module I can use to convert them.
Any idea how to get the %Y%M%D format from these unusual date formats without writing my own calculator?
Thanks.
You can try something like the following:
In [1]: import datetime
In [2]: s = '2012265'
In [3]: datetime.datetime.strptime(s, '%Y%j')
Out[3]: datetime.datetime(2012, 9, 21, 0, 0)
In [4]: d = '41213'
In [5]: datetime.date(1900, 1, 1) + datetime.timedelta(int(d))
Out[5]: datetime.date(2012, 11, 2)
The first one is the trickier one, but it uses the %j parameter to interpret the day of the year you provide (after a four-digit year, represented by %Y). The second one is simply the number of days since January 1, 1900.
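To get from the year-plus-day-of-year string to the %Y%m%d form asked for, the parsed value can be passed straight to strftime. A small sketch, assuming the input always has a four-digit year followed by the day of the year. The function name year_day_to_ymd is hypothetical:

```python
from datetime import datetime

def year_day_to_ymd(value):
    """Convert 'YYYYDDD' (year plus day of year) to 'YYYYMMDD'."""
    return datetime.strptime(value, "%Y%j").strftime("%Y%m%d")

print(year_day_to_ymd("2012265"))  # 20120921
```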
This is the general conversion - not sure of your input format but hopefully this can be tweaked to suit it.
On the Excel integer to Python datetime bit:
Note that there are two Excel date systems (one 1-Jan-1900 based and another 1-Jan 1904 based); see https://support.microsoft.com/en-us/help/214330/differences-between-the-1900-and-the-1904-date-system-in-excel for more information.
Also note that the system is NOT zero-based. So, in the 1900 system, 1-Jan-1900 is day 1 (not day 0).
import datetime
EXCEL_DATE_SYSTEM_PC=1900
EXCEL_DATE_SYSTEM_MAC=1904
i = 42129 # Excel number for 5-May-2015
d = datetime.date(EXCEL_DATE_SYSTEM_PC, 1, 1) + datetime.timedelta(i-2)
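The i-2 offset above comes from two things: day 1 (not day 0) is 1-Jan-1900, and Excel's 1900 system also counts a 29-Feb-1900 that never existed. A sketch that folds both into a single epoch of 30-Dec-1899 follows. It covers only the 1900 system, the names are hypothetical, and it is only valid for serial numbers from 1-Mar-1900 (61) onward:

```python
from datetime import date, timedelta

# Assumed epoch for the 1900 date system: one day back because day 1 is
# 1-Jan-1900, and one more because Excel counts a nonexistent 29-Feb-1900.
EXCEL_1900_EPOCH = date(1899, 12, 30)

def excel_serial_to_date(serial):
    """Convert a 1900-system Excel serial number (61 or later) to a date."""
    if serial < 61:
        raise ValueError("serials before 1-Mar-1900 need special handling")
    return EXCEL_1900_EPOCH + timedelta(days=serial)

print(excel_serial_to_date(42129))  # 2015-05-05
print(excel_serial_to_date(41213))  # 2012-10-31
```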
Both of these formats seem pretty straightforward to work with. The first one, in fact, is just an integer, so why don't you just do something like this?
import datetime
def days_since_jan_1_1900_to_datetime(d):
    return datetime.datetime(1900, 1, 1) + \
        datetime.timedelta(days=d)
For the second one, the details depend on exactly how the format is defined (e.g. can you always expect 3 digits after the year even when the number of days is less than 100, or is it possible that there are 2 or 1 – and if so, is the year always 4 digits?) but once you've got that part down it can be done very similarly.
According to http://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior, the day of the year is "%j", whereas the first case can be solved with toordinal() and fromordinal(): date.fromordinal(date(1900, 1, 1).toordinal() + x)
I'd think timedelta.
import datetime
d = datetime.timedelta(days=41213)
start = datetime.datetime(year=1900, month=1, day=1)
the_date = start + d
For the second one, you can slice the string, e.g. '2012265'[:4], to get the year and use the same method with the remaining day count.
edit: See the answer with %j for the second.
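import pandas as pd
from datetime import datetime
# df is assumed to be an existing DataFrame with a 'timeelapsed' column of 'HH:MM:SS' strings
df['timeelapsed'] = (pd.to_datetime(df['timeelapsed'], format='%H:%M:%S') - datetime(1900, 1, 1)).dt.total_seconds()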
