As input, I have a date string that can take three general formats:
a) January 6, 2011
b) 4 days ago
c) 12 hours ago
I want the script to be able to recognize the format and call the appropriate function with the parameters.
So if a then convert_full_string("January 6, 2011")
if b then convert_days(4)
if c then convert_hours(12)
Once I can recognize the format and call the appropriate function, the rest will be relatively easy. I plan on using dateutil.
But I am not sure how to recognize the format.
Any suggestions with code samples much appreciated.
Using parsedatetime, you could parse all three date formats into datetime.datetime objects without having to code the logic yourself:
# Newer parsedatetime releases expose Calendar and Constants at the top level;
# very old releases needed parsedatetime.parsedatetime and parsedatetime_consts.
import parsedatetime as pdt
import datetime
c = pdt.Constants()
p = pdt.Calendar(c)
for text in ('january 6, 2011', '4 days ago', '12 hours ago'):
    # parse() returns (time_struct, status); the first six fields build a datetime
    date = datetime.datetime(*p.parse(text)[0][:6])
    # print(date.isoformat())
    # 2011-01-06T09:00:18
    # 2011-01-02T09:00:18
    # 2011-01-05T21:00:18
    print(date.strftime('%Y%m%dT%H%M%S'))
    # 20110106T090208
    # 20110102T090208
    # 20110105T210208
if 'days' in userinput:
    # Everything before the word "days" is the count, e.g. "4 days ago" -> 4
    convert_days(int(userinput[:userinput.index('days')].strip()))
elif 'hours' in userinput:
    convert_hours(int(userinput[:userinput.index('hours')].strip()))
else:
    convert_full_string(userinput)
This assumes that when "days" or "hours" is contained in userinput, you always want the chars that came immediately before those two words.
You can match with regular expressions:
import re
re.search(r".* [0-9]{1,2}, [0-9]{4}", tomatch)
Similarly for [0-9]{1,2} days ago, etc.
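The regex idea can be sketched as a small classifier. This is only an illustration under stated assumptions: the function name classify and the (kind, value) return shape are hypothetical, and the patterns cover just the three formats from the question.

```python
import re

# Hypothetical patterns for the three formats in the question.
_DAYS = re.compile(r"^(\d+)\s+days?\s+ago$", re.IGNORECASE)
_HOURS = re.compile(r"^(\d+)\s+hours?\s+ago$", re.IGNORECASE)
_FULL = re.compile(r"^[A-Za-z]+\s+\d{1,2},\s+\d{4}$")


def classify(text):
    """Return a (kind, value) pair describing which format `text` uses."""
    text = text.strip()
    match = _DAYS.match(text)
    if match:
        return ("days", int(match.group(1)))
    match = _HOURS.match(text)
    if match:
        return ("hours", int(match.group(1)))
    if _FULL.match(text):
        return ("full", text)
    raise ValueError("Unrecognized date format: {!r}".format(text))


for sample in ("January 6, 2011", "4 days ago", "12 hours ago"):
    print(classify(sample))
```

The caller can then dispatch on the first element, for example passing the integer to convert_days or convert_hours and the full string to convert_full_string.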
I'm using Python to parse some logfiles produced via jetty.net.ssl on an external platform running a JVM to which I have no access.
For reasons I don't understand (and nor can I find documented anywhere) the log timestamps have the first hour of each day expressed as 24 rather than 00 e.g.
javax.net.ssl|DEBUG|15|Mux|2022-07-01 24:00:11.298 UTC|SSLSocketOutputRecord.java:334|WRITE: TLSv1.3 application_data, length = 31
which corresponds to 2022-07-01 00:00:11.298 rather than 2022-07-02 00:00:11.298
This format breaks things like Python's datetime.datetime() and dateutil.parser.parse(). I can code around this, stripping out the various elements of the timestamp string using a regex and altering the hour where necessary, along the lines of
timere = re.compile(r"^(\d{4})-(\d{2})-(\d{2})\s+(\d{2}).(\d{2}).(\d{2})\.(\d{3}).*$")
if not (match:=timere.match(tstr)):
    raise ValueError(f"Time string {tstr} is not valid")
yy = int(match.groups()[0])
mm = int(match.groups()[1])
dd = int(match.groups()[2])
hr = int(match.groups()[3]) % 24
mi = int(match.groups()[4])
se = int(match.groups()[5])
us = int(match.groups()[6]) * 1000
d = datetime.datetime(yy, mm, dd, hr, mi, se, us, tzinfo=datetime.timezone.utc)
I am, however, intrigued as to why the timestamps are in that format. Is there some subtlety of which I am unaware? I'm assuming that the developers deliberately used "24" as a valid hour, for reasons I don't yet understand.
The OpenJDK sun.security.ssl.SSLLogger uses the following syntax to output the timestamp.
private static final String PATTERN = "yyyy-MM-dd kk:mm:ss.SSS z";
private static final DateTimeFormatter dateTimeFormat =
        DateTimeFormatter.ofPattern(PATTERN, Locale.ENGLISH)
                         .withZone(ZoneId.systemDefault());
This means the hour is represented by the kk portion, which according to java.time.format.DateTimeFormatter is "clock-hour-of-day (1-24)".
I'm not sure whether Python has an equivalent date/time pattern it can use.
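As far as I know, Python's strptime has no directive for a 1-24 clock hour, so one option is to parse the fields by hand and fold hour 24 back to 0 on the same date, as the question describes. A minimal sketch, assuming the timestamp is always in UTC and shaped like the log line above; the name parse_kk_timestamp is hypothetical.

```python
import datetime
import re

# Matches "yyyy-MM-dd kk:mm:ss.SSS" at the start of the string.
_KK_TIMESTAMP = re.compile(
    r"(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})\.(\d{3})"
)


def parse_kk_timestamp(text):
    """Parse a timestamp whose hour field runs 1-24 instead of 0-23."""
    match = _KK_TIMESTAMP.match(text)
    if match is None:
        raise ValueError("Time string {!r} is not valid".format(text))
    year, month, day, hour, minute, second, millis = map(int, match.groups())
    if not 1 <= hour <= 24:
        raise ValueError("Clock-hour-of-day must be 1-24, got {}".format(hour))
    # Hour 24 is the first hour of the same calendar day, so it maps to 0.
    return datetime.datetime(
        year, month, day, hour % 24, minute, second, millis * 1000,
        tzinfo=datetime.timezone.utc,  # assumption: the logs are in UTC
    )


print(parse_kk_timestamp("2022-07-01 24:00:11.298 UTC").isoformat())
```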
When using Python strftime, is there a way to remove the leading 0 of the date if it's before the 10th, i.e. so 01 is 1? I can't find a % directive for that.
Thanks!
Actually I had the same problem and I realized that, if you add a hyphen between the % and the letter, you can remove the leading zero.
For example %Y/%-m/%-d.
This only works on Unix (Linux, OS X), not Windows (including Cygwin). On Windows, you would use #, e.g. %Y/%#m/%#d.
We can do this sort of thing with the format method, available since Python 2.6:
>>> import datetime
>>> '{dt.year}/{dt.month}/{dt.day}'.format(dt = datetime.datetime.now())
'2013/4/19'
Though perhaps beyond the scope of the original question, for more interesting formats, you can do stuff like:
>>> '{dt:%A} {dt:%B} {dt.day}, {dt.year}'.format(dt=datetime.datetime.now())
'Wednesday December 3, 2014'
And as of Python 3.6, this can be expressed as an inline formatted string:
Python 3.6.0a2 (v3.6.0a2:378893423552, Jun 13 2016, 14:44:21)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import datetime
>>> dt = datetime.datetime.now()
>>> f'{dt:%A} {dt:%B} {dt.day}, {dt.year}'
'Monday August 29, 2016'
Some platforms may support width and precision specification between % and the letter (such as 'd' for day of month), according to http://docs.python.org/library/time.html -- but it's definitely a non-portable solution (e.g. doesn't work on my Mac;-). Maybe you can use a string replace (or RE, for really nasty format) after the strftime to remedy that? e.g.:
>>> y
(2009, 5, 7, 17, 17, 17, 3, 127, 1)
>>> time.strftime('%Y %m %d', y)
'2009 05 07'
>>> time.strftime('%Y %m %d', y).replace(' 0', ' ')
'2009 5 7'
Here is the documentation of the modifiers supported by strftime() in the GNU C library. (Like people said before, it might not be portable.) Of interest to you might be:
%e instead of %d will replace leading zero in day of month with a space
It works on my Python (on Linux). I don't know if it will work on yours.
>>> import datetime
>>> d = datetime.datetime.now()
>>> d.strftime('X%d/X%m/%Y').replace('X0','X').replace('X','')
'5/5/2011'
On Windows, add a '#', as in '%#m/%#d/%Y %#I:%M:%S %p'
For reference: https://msdn.microsoft.com/en-us/library/fe06s4ak.aspx
quite late to the party but %-d works on my end.
datetime.now().strftime('%B %-d, %Y') produces something like "November 5, 2014"
cheers :)
Take a look at the example below:
>>> from datetime import date, datetime
>>> datetime.now().strftime('%d-%b-%Y')
'08-Oct-2011'
>>> datetime.now().strftime('%-d-%b-%Y')
'8-Oct-2011'
>>> today = date.today()
>>> today.strftime('%d-%b-%Y')
>>> print(today)
I find the Django template date formatting filter to be quick and easy. It strips out leading zeros. If you don't mind importing the Django module, check it out.
http://docs.djangoproject.com/en/dev/ref/templates/builtins/#date
from django.template.defaultfilters import date as django_date_filter
print(django_date_filter(mydate, 'P, D M j, Y'))
Simply use replace like this:
datetime.date.today().strftime("%Y/%m/%d").replace("/0", "/")
it will output:
'2017/7/21'
For %d, you can convert the result to an integer with int(), which drops the leading 0. You can then convert it back to a string with str().
Using, for example, "%-d" is not portable even between different versions of the same OS.
A better solution would be to extract the date components individually, and choose between date specific formatting operators and date attribute access for each component.
e = datetime.date(2014, 1, 6)
"{date:%A} {date.day} {date:%B} {date.year}".format(date=e)
If we only want the day of the month without a leading zero, we can do:
from datetime import date
d = date.today()
day = int(d.strftime("%d"))
Because Python really just calls the C language strftime(3) function on your platform, it might be that there are format characters you could use to control the leading zero; try man strftime and take a look. But, of course, the result will not be portable, as the Python manual will remind you. :-)
I would try using a new-style datetime object instead, which has attributes like t.year and t.month and t.day, and put those through the normal, high-powered formatting of the % operator, which does support control of leading zeros. See http://docs.python.org/library/datetime.html for details. Better yet, use the "".format() operator if your Python has it and be even more modern; it has lots of format options for numbers as well. See: http://docs.python.org/library/string.html#string-formatting.
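A minimal sketch of the attribute-based approach described above, which avoids platform-specific strftime flags entirely. The helper names are hypothetical; both rely only on the year, month, and day attributes of a date object.

```python
import datetime


def format_with_percent(d):
    # %d applied to an int does not zero-pad unless a width such as %02d is given.
    return "%d/%d/%d" % (d.year, d.month, d.day)


def format_with_str_format(d):
    # str.format can read the attributes directly from the date object.
    return "{0.year}/{0.month}/{0.day}".format(d)


sample = datetime.date(2011, 5, 5)
print(format_with_percent(sample))     # 2011/5/5
print(format_with_str_format(sample))  # 2011/5/5
```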
Based on Alex's method, this will work for both the start-of-string and after-spaces cases:
re.sub('^0|(?<= )0', '', "01 January 2000 08:00am")
I like this better than .format or %-d because this is cross-platform and allows me to keep using strftime (to get things like "November" and "Monday").
Old question, but %l (lower-case L) worked for me in strftime: this may not work for everyone, though, as it's not listed in the Python documentation I found
import datetime
now = datetime.datetime.now()
print(now.strftime("%b %_d"))
Python 3.6+:
from datetime import date
today = date.today()
text = "Today it is " + today.strftime(f"%A %B {today.day}, %Y")
I am late, but simple string slicing will do the work:
from datetime import date
today_date = date.today().strftime('%d %b %Y')
if today_date[0] == '0':
    today_date = today_date[1:]
The standard library is good enough for most cases but for a really detailed manipulation with dates you should always look for some specialized third-party library.
Using Arrow:
>>> import arrow
>>> arrow.utcnow().format('dddd, D. M. YYYY')
'Friday, 6. 5. 2022'
Look at the full list of supported tokens.
A little bit tricky, but it works for me.
E.g. from 2021-02-01T00:00:00.000Z to 2021-02-1:
from datetime import datetime
dateObj = datetime.strptime('2021-02-01T00:00:00.000Z','%Y-%m-%dT%H:%M:%S.%fZ')
dateObj.strftime('%Y-%m-{}').format(dateObj.day)
How can I convert YYYY-MM-DD hh:mm:ss format to integer in python?
for example 2014-02-12 20:51:14 -> to integer.
I only know how to convert hh:mm:ss but not yyyy-mm-dd hh:mm:ss
def time_to_num(time_str):
    hh, mm, ss = map(int, time_str.split(':'))
    return ss + 60*(mm + 60*hh)
It depends on what the integer is supposed to encode. You could convert the date to a number of seconds (or milliseconds) elapsed since some earlier reference time. People often measure relative to 12:00 am on January 1, 1970 (or 1900, etc.) and store time as an integer count from that point. The datetime module (or others like it) has functions that do this for you: for example, int(datetime.datetime.now(datetime.timezone.utc).timestamp()) gives whole seconds since the 1970 epoch. Avoid calling .timestamp() on the result of datetime.datetime.utcnow(): that object is naive, so timestamp() interprets it as local time.
If you want to semantically encode the year, month, and day, one way to do it is to multiply those components by order-of-magnitude values large enough to juxtapose them within the integer digits:
2012-06-13 --> 20120613 = 10,000 * (2012) + 100 * (6) + 1*(13)
def to_integer(dt_time):
    return 10000*dt_time.year + 100*dt_time.month + dt_time.day
E.g.
In [1]: import datetime
In [2]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:def to_integer(dt_time):
:    return 10000*dt_time.year + 100*dt_time.month + dt_time.day
:    # Or take the appropriate chars from a string date representation.
:--
In [3]: to_integer(datetime.date(2012, 6, 13))
Out[3]: 20120613
If you also want minutes and seconds, then just include further orders of magnitude as needed to display the digits.
I've encountered this second method very often in legacy systems, especially systems that pull date-based data out of legacy SQL databases.
It is very bad. You end up writing a lot of hacky code for aligning dates, computing month or day offsets as they would appear in the integer format (e.g. resetting the month back to 1 as you pass December, then incrementing the year value), and boilerplate for converting to and from the integer format all over.
Unless such a convention lives in a deep, low-level, and thoroughly tested section of the API you're working on, such that everyone who ever consumes the data really can count on this integer representation and all of its helper functions, then you end up with lots of people re-writing basic date-handling routines all over the place.
It's generally much better to leave the value in a date context, like datetime.date, for as long as you possibly can, so that the operations upon it are expressed in a natural, date-based context, and not some lone developer's personal hack into an integer.
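To illustrate the point, here is a small sketch comparing arithmetic on the packed integer with arithmetic on a datetime.date. It reuses the to_integer idea from above; the variable names are only for illustration.

```python
import datetime


def to_integer(d):
    # Pack a date as YYYYMMDD.
    return 10000 * d.year + 100 * d.month + d.day


new_years_eve = datetime.date(2012, 12, 31)

# Adding 1 to the packed integer produces a value that is not a real date.
naive_next = to_integer(new_years_eve) + 1

# Staying in the date domain rolls over the month and year correctly.
real_next = new_years_eve + datetime.timedelta(days=1)

print(naive_next)             # 20121232
print(to_integer(real_next))  # 20130101
```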
I think I have a shortcut for that:
# Importing datetime.
from datetime import datetime
# Creating a datetime object so we can test.
a = datetime.now()
# Converting a to string in the desired format (YYYYMMDD) using strftime
# and then to int.
a = int(a.strftime('%Y%m%d'))
This is an example that can be used, for instance, to generate a database key; I sometimes use it instead of AUTOINCREMENT options.
import datetime
dt = datetime.datetime.now()
seq = int(dt.strftime("%Y%m%d%H%M%S"))
The other answers focused on a human-readable representation with int(mydate.strftime("%Y%m%d%H%M%S")). But this makes you lose a lot, including normal integer semantics and arithmetics, therefore I would prefer something like bash date's "seconds since the epoch (1970-01-01 UTC)".
As a reference, you could use the following bash command to get 1392234674 as a result:
date +%s --date="2014-02-12 20:51:14"
As ely hinted in the accepted answer, just a plain number representation is unmistakeable and by far easier to handle and parse, especially programmatically. Plus conversion from and to human-readable is an easy oneliner both ways.
To do the same thing in python, you can use datetime.timestamp() as djvg commented. For other methods you can consider the edit history.
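A minimal sketch of the datetime.timestamp() route. The function name to_epoch_seconds and the default of treating the input as UTC are assumptions for illustration. The date command interprets the string in the machine's local zone, so the 1392234674 quoted above corresponds to a zone one hour ahead of UTC.

```python
import datetime


def to_epoch_seconds(text, tz=datetime.timezone.utc):
    """Convert 'YYYY-MM-DD hh:mm:ss' to whole seconds since 1970-01-01 UTC."""
    naive = datetime.datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    # Attach an explicit zone so the result does not depend on the local machine.
    aware = naive.replace(tzinfo=tz)
    return int(aware.timestamp())


utc_plus_one = datetime.timezone(datetime.timedelta(hours=1))
print(to_epoch_seconds("2014-02-12 20:51:14"))                # 1392238274
print(to_epoch_seconds("2014-02-12 20:51:14", utc_plus_one))  # 1392234674
```

Going back is a one-liner as well: datetime.datetime.fromtimestamp(seconds, tz=datetime.timezone.utc).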
Here is a simple date -> second conversion tool:
import calendar
def time_to_int(dateobj):
    total = int(dateobj.strftime('%S'))
    total += int(dateobj.strftime('%M')) * 60
    total += int(dateobj.strftime('%H')) * 60 * 60
    total += (int(dateobj.strftime('%j')) - 1) * 60 * 60 * 24
    # Whole years since 1970, plus one extra day for each leap year in between
    year = int(dateobj.strftime('%Y'))
    total += ((year - 1970) * 365 + calendar.leapdays(1970, year)) * 60 * 60 * 24
    return total
(Effectively a UNIX timestamp calculator that treats the datetime as UTC. Without the leap-day term the result drifts by one day for every leap year since 1970.)
Example use:
from datetime import datetime
x = datetime(1970, 1, 1)
time_to_int(x)
Output: 0
x = datetime(2021, 12, 31)
time_to_int(x)
Output: 1640908800
x = datetime(2022, 1, 1)
time_to_int(x)
Output: 1640995200
x = datetime(2022, 1, 2)
time_to_int(x)
Output: 1641081600
When converting a datetime to an integer, keep the decimal place values (tens, hundreds, thousands, ...) in mind. For example,
"2018-11-03" becomes 20181103 as an int,
which you compute as
2018*10000 + 100*11 + 3
Similarly, as another example,
"2018-11-03 10:02:05" becomes 20181103100205 as an int.
Explanatory Code
from datetime import datetime
dt = datetime(2018,11,3,10,2,5)
print (dt)
#print (dt.timestamp()) # unix representation ... not useful when converting to int
print (dt.strftime("%Y-%m-%d"))
print (dt.year*10000 + dt.month* 100 + dt.day)
print (int(dt.strftime("%Y%m%d")))
print (dt.strftime("%Y-%m-%d %H:%M:%S"))
print (dt.year*10000000000 + dt.month* 100000000 +dt.day * 1000000 + dt.hour*10000 + dt.minute*100 + dt.second)
print (int(dt.strftime("%Y%m%d%H%M%S")))
General Function
To avoid doing that manually, use the function below:
def datetime_to_int(dt):
    return int(dt.strftime("%Y%m%d%H%M%S"))
df.Date = df.Date.str.replace('-', '').astype(int)
In my Python-based application, users can enter dates in dd/mm/yy format with varying date separators (they can use /, - or a space as a separator). Therefore all of these are valid dates:
10/02/2009
07 22 2009
09-08-2008
9-9/2008
11/4 2010
03/07-2009
09-01 2010
Now in order to test it, I need to create a list of such dates, but I am not sure how to auto-generate random combinations of these date strings with separators.
This is what I started doing:
date = ['10', '10', '2010']
separators = ['/', '-', ' ']
for s in separators:
    new_date = s.join(date)
I think the previous answers didn't really help much. If you choose "day" as any number from 1-31 and "month" as any number from 1-12 in your test data, your production code MUST raise exceptions somewhere - 02/31/2013 should not be accepted!
Therefore, you should create random, but valid dates and then create strings from them with arbitrarily chosen format strings. This is what my code does:
import datetime
import time
import random
separators = ["/", ",", "-", " "]
prefixes = ["", " "]
def random_datetime(min_date, max_date):
    # mktime() returns floats; randint() needs integer bounds
    since_epoch_min = int(time.mktime(min_date.timetuple()))
    since_epoch_max = int(time.mktime(max_date.timetuple()))
    random_time = random.randint(since_epoch_min, since_epoch_max)
    return datetime.datetime.fromtimestamp(random_time)
def random_date_string_with_random_separators(dt):
    prefix = random.choice(prefixes)
    sep1 = random.choice(separators)
    sep2 = random.choice(separators)
    format_string = "{}%m{}%d{}%Y".format(prefix, sep1, sep2)
    return dt.strftime(format_string)
min_date = datetime.datetime(2012, 1, 1)
max_date = datetime.datetime(2013, 1, 1)
for i in range(10):
    print(random_date_string_with_random_separators(
        random_datetime(min_date, max_date)
    ))
This should cover all cases (if you take more than ten values).
Nevertheless, I have two remarks:
Don't use random data as test-input
You never know whether your test will fail someday, and the generated data may not catch all possible problems. In your case it should be OK, but generally it's not good practice (if you have another choice). Alternatively, you could create a well-thought-out set of hard-coded input strings that covers all corner cases. Then, if your tests fail someday, you know it's not a random effect.
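One deterministic alternative is to enumerate every separator combination for a handful of hand-picked valid dates. A minimal sketch, assuming the dd/mm/yyyy order from the question; the name separator_variants and the SEPARATORS list are hypothetical.

```python
import datetime
import itertools

SEPARATORS = ["/", "-", " "]


def separator_variants(d, separators=SEPARATORS):
    """Return d as dd<sep>mm<sep>yyyy for every pair of separators."""
    day, month, year = "%02d" % d.day, "%02d" % d.month, "%04d" % d.year
    return [
        day + sep1 + month + sep2 + year
        for sep1, sep2 in itertools.product(separators, repeat=2)
    ]


for text in separator_variants(datetime.date(2009, 2, 10)):
    print(text)
```

With three separators this yields nine strings per date, and the same list is produced on every run, so a failing test can always be reproduced.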
Use well-tested code
For the task you describe, there's a library for that! Use dateutil. They have a fantastic datetime-parser that swallows almost everything you throw at it. Example:
from dateutil import parser
for i in range(10):
    date_string = random_date_string_with_random_separators(
        random_datetime(min_date, max_date)
    )
    parsed_datetime = parser.parse(date_string)
    print(date_string, parsed_datetime.strftime("%m/%d/%Y"))
Output:
01 05,2012 01/05/2012
05 17-2012 05/17/2012
06-07-2012 06/07/2012
10 31,2012 10/31/2012
10/04,2012 10/04/2012
11 16,2012 11/16/2012
03/23 2012 03/23/2012
02-26-2012 02/26/2012
01,12-2012 01/12/2012
12-21 2012 12/21/2012
Then you can be sure it works. dateutil has tons of unit tests and will "just work". And the best code you can write is code you don't have to test.
I suggest you state the expected format in the input prompt.
For example:
date = input("Enter date (mm/dd/yyyy): ")
Now use strptime() to check if it's correct or not:
import time
try:
    date = time.strptime(date, '%m/%d/%Y')
except ValueError:
    print('Invalid date!')
References:
http://docs.python.org/2/library/time.html#time.strptime
How can I validate a date in Python 3.x?
To create those dates automatically and add them to a list, you can use this:
from random import choice, randrange
dates = []
s = ' -/'
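for i in range(100):
    # Note: the day is drawn from 1-31 regardless of month, so some results (e.g. 2/31) are not real dates
    dates.append( "%i%s%i%s%i" % (randrange(1,13), choice(s), randrange(1,32), choice(s), randrange(2000,2019) ) )
print(dates)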
I am getting time data in string format like this, 'HH:MM', for example '13:33' would be 13 hours and 33 minutes.
So, I used this code to get time object and it works great
datetime.datetime.strptime('13:33', '%H:%M').time()
However, I now have a new problem. New strings started coming in that represent more than 24 hours, and datetime.datetime.strptime('25:33', '%H:%M').time() simply fails. What's your suggestion?
A datetime.time object represents a (local) time of day, independent of any particular day.
You shouldn't use it to represent an elapsed time, as you appear to be doing.
More appropriate might be a datetime.timedelta:
A timedelta object represents a duration, the difference between two dates or times.
class datetime.timedelta([days[, seconds[, microseconds[, milliseconds[, minutes[, hours[, weeks]]]]]]])
All arguments are optional and default to 0. Arguments may be ints, longs, or floats, and may be positive or negative.
An example:
>>> from datetime import timedelta
>>> d = timedelta(hours=25,minutes=10)
>>> d
datetime.timedelta(1, 4200) #Days, seconds
When you say "it will simply fail", I assume you want to know when it will fail. Here's one approach:
>>> import datetime
>>>
>>> time_strs = [
... "13:33",
... "25:33",
... "12:88"
... ]
>>>
>>> for s in time_strs:
...     try:
...         print(datetime.datetime.strptime(s, '%H:%M').time())
...     except ValueError:
...         print("Bad time: {0}".format(s))
...
13:33:00
Bad time: 25:33
Bad time: 12:88
You'll have to do this manually, alas.
import datetime
def parse_stopwatch(s):
    hours, minutes = s.split(':', 1)
    # A timedelta can hold more than 24 hours; datetime.time would reject hour=25
    return datetime.timedelta(hours=int(hours), minutes=int(minutes))
This is really dumb, granted. You could reject minute values above 59, or get all fancy with a regex, or add support for days or seconds. But you'll have to be a bit more specific about where this data is coming from and what it's supposed to represent. :)
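The regex and seconds ideas mentioned above might look like the following. This is only a sketch under assumptions: the name parse_duration is hypothetical, hours may be any non-negative integer, and minutes and seconds must stay within 00-59.

```python
import re
from datetime import timedelta

# Hours: one or more digits; minutes and optional seconds: 00-59.
_DURATION = re.compile(r"^(\d+):([0-5]\d)(?::([0-5]\d))?$")


def parse_duration(text):
    """Parse 'H:MM' or 'H:MM:SS' into a timedelta, allowing hours above 23."""
    match = _DURATION.match(text.strip())
    if match is None:
        raise ValueError("Bad duration: {!r}".format(text))
    hours, minutes, seconds = match.groups()
    return timedelta(
        hours=int(hours), minutes=int(minutes), seconds=int(seconds or 0)
    )


print(parse_duration("13:33"))
print(parse_duration("25:33").total_seconds())
```

Unlike strptime with '%H:%M', this accepts '25:33' but still rejects '12:88'.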