In my Python-based application, users can enter dates in dd/mm/yy format with varying date separators (they can use /, - or a space as a separator). Therefore all of these are valid dates:
10/02/2009
07 22 2009
09-08-2008
9-9/2008
11/4 2010
03/07-2009
09-01 2010
Now, in order to test it, I need to create a list of such dates, but I am not sure how to auto-generate random combinations of these date strings with separators.
This is what I started doing:
date = ['10', '10', '2010']
separators = ['/', '-', ' ']
for s in separators:
    new_date = s.join(date)
I think the previous answers didn't really help much. If you choose "day" as any number from 1-31 and "month" as any number from 1-12 in your test data, your production code MUST raise exceptions somewhere: 02/31/2013 should not be accepted!
Therefore, you should create random, but valid dates and then create strings from them with arbitrarily chosen format strings. This is what my code does:
import datetime
import time
import random

separators = ["/", ",", "-", " "]
prefixes = ["", " "]

def random_datetime(min_date, max_date):
    # time.mktime() returns a float; random.randint() needs integers.
    since_epoch_min = int(time.mktime(min_date.timetuple()))
    since_epoch_max = int(time.mktime(max_date.timetuple()))
    random_time = random.randint(since_epoch_min, since_epoch_max)
    return datetime.datetime.fromtimestamp(random_time)

def random_date_string_with_random_separators(dt):
    prefix = random.choice(prefixes)
    sep1 = random.choice(separators)
    sep2 = random.choice(separators)
    format_string = "{}%m{}%d{}%Y".format(prefix, sep1, sep2)
    return dt.strftime(format_string)

min_date = datetime.datetime(2012, 1, 1)
max_date = datetime.datetime(2013, 1, 1)
for i in range(10):
    print(random_date_string_with_random_separators(
        random_datetime(min_date, max_date)
    ))
This should cover all cases (if you take more than ten values).
Nevertheless, I have two remarks:
Don't use random data as test-input
You'll never know if someday your test will fail, maybe you don't catch all possible problems with the data generated. In your case it should be o.k., but generally it's not good practice (if you have another choice). Alternatively, you could create a well-thought set of hard-coded input strings where you cover all corner cases. And if someday your tests fail, you know it's no random effect.
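As a rough sketch of such a hard-coded set, the separator combinations can be enumerated exhaustively rather than sampled. The sample dates and the helper name all_separator_variants are illustrative assumptions, not part of the original code; the separators are the three from the question.

```python
import itertools

SEPARATORS = ["/", "-", " "]

# Hand-picked valid dates as (day, month, year) strings, including edge cases
# such as a leap day and single-digit fields. These are illustrative choices.
SAMPLE_DATES = [
    ("10", "02", "2009"),
    ("9", "9", "2008"),
    ("29", "02", "2012"),  # leap day
    ("31", "12", "2010"),
]

def all_separator_variants(day, month, year):
    """Return every string formed by joining the fields with each separator pair."""
    return [
        "{}{}{}{}{}".format(day, sep1, month, sep2, year)
        for sep1, sep2 in itertools.product(SEPARATORS, repeat=2)
    ]

# 4 dates x 9 separator pairs = 36 deterministic test inputs.
test_inputs = [
    variant
    for fields in SAMPLE_DATES
    for variant in all_separator_variants(*fields)
]
```

Because the list is the same on every run, a failing test always points at a specific input string.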
Use well-tested code
For the task you describe, there's a library for that! Use dateutil. They have a fantastic datetime-parser that swallows almost everything you throw at it. Example:
from dateutil import parser

for i in range(10):
    date_string = random_date_string_with_random_separators(
        random_datetime(min_date, max_date)
    )
    parsed_datetime = parser.parse(date_string)
    print(date_string, parsed_datetime.strftime("%m/%d/%Y"))
Output:
01 05,2012 01/05/2012
05 17-2012 05/17/2012
06-07-2012 06/07/2012
10 31,2012 10/31/2012
10/04,2012 10/04/2012
11 16,2012 11/16/2012
03/23 2012 03/23/2012
02-26-2012 02/26/2012
01,12-2012 01/12/2012
12-21 2012 12/21/2012
Then you can be sure it works. dateutil has tons of unit tests and will "just work". And the best code you can write is code you don't have to test.
I suggest you state the expected format in the input prompt.
For example:
import time
date = input("Enter date (mm/dd/yyyy): ")
Now use strptime() to check whether it's valid:
try:
    date = time.strptime(date, '%m/%d/%Y')
except ValueError:
    print('Invalid date!')
References:
http://docs.python.org/2/library/time.html#time.strptime
How can I validate a date in Python 3.x?
To create those dates automatically and add them to a list, you can use this:
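from random import choice, randrange
dates = []
s = ' -/'
for i in range(100):
    # Note: the day is drawn from 1-31 regardless of the month, so some strings (e.g. 2/31) are not real dates.
    dates.append("%i%s%i%s%i" % (randrange(1, 13), choice(s), randrange(1, 32), choice(s), randrange(2000, 2019)))
print(dates)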
Related
I'm using Python to parse some logfiles produced via jetty.net.ssl on an external platform running a JVM to which I have no access.
For reasons I don't understand (and nor can I find documented anywhere) the log timestamps have the first hour of each day expressed as 24 rather than 00 e.g.
javax.net.ssl|DEBUG|15|Mux|2022-07-01 24:00:11.298 UTC|SSLSocketOutputRecord.java:334|WRITE: TLSv1.3 application_data, length = 31
which corresponds to 2022-07-01 00:00:11.298 rather than 2022-07-02 00:00:11.298
This format breaks things like Python's datetime.datetime() and dateutil.parser.parse(). I can code around this, stripping out the various elements of the timestamp string using a regex and altering the hour where necessary, along the lines of
import datetime
import re

timere = re.compile(r"^(\d{4})-(\d{2})-(\d{2})\s+(\d{2}).(\d{2}).(\d{2})\.(\d{3}).*$")
if not (match := timere.match(tstr)):
    raise ValueError(f"Time string {tstr} is not valid")
yy = int(match.groups()[0])
mm = int(match.groups()[1])
dd = int(match.groups()[2])
hr = int(match.groups()[3]) % 24
mi = int(match.groups()[4])
se = int(match.groups()[5])
us = int(match.groups()[6]) * 1000
d = datetime.datetime(yy, mm, dd, hr, mi, se, us, tzinfo=datetime.timezone.utc)
I am, however, intrigued as to why the timestamps are in that format. Is there some subtlety of which I am unaware? I'm kind of assuming that the developers used "24" as a valid hour deliberately, for reasons I don't yet understand.
The OpenJDK sun.security.ssl.SSLLogger uses the following syntax to output the timestamp.
private static final String PATTERN = "yyyy-MM-dd kk:mm:ss.SSS z";
private static final DateTimeFormatter dateTimeFormat =
DateTimeFormatter.ofPattern(PATTERN, Locale.ENGLISH)
.withZone(ZoneId.systemDefault());
This means the hour is represented by the kk portion, which according to java.time.format.DateTimeFormatter is "clock-hour-of-day (1-24)", so the hour after midnight is printed as 24 rather than 00.
I'm not sure whether Python has an equivalent date/time pattern.
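As far as I know, Python's strptime() has no counterpart to kk: its %H directive only accepts hours 00 to 23. A minimal sketch of a workaround, assuming the timestamps always look like the sample log line and are in UTC (the function name parse_kk_timestamp is made up for illustration):

```python
from datetime import datetime, timezone

def parse_kk_timestamp(tstr):
    """Parse 'YYYY-MM-DD kk:mm:ss.SSS UTC', where kk runs from 01 to 24."""
    # Separate the date, the time, and the trailing zone name.
    date_part, time_part = tstr.split(" ")[:2]
    hour_text, rest = time_part.split(":", 1)
    # Map clock-hour 24 back to hour 0 of the same day; 1-23 are unchanged.
    hour = int(hour_text) % 24
    normalized = f"{date_part} {hour:02d}:{rest}"
    dt = datetime.strptime(normalized, "%Y-%m-%d %H:%M:%S.%f")
    # Assumption: the log is written in UTC, as in the sample line.
    return dt.replace(tzinfo=timezone.utc)

parsed = parse_kk_timestamp("2022-07-01 24:00:11.298 UTC")
```

If the JVM writes a different zone, the zone token would need to be handled instead of being ignored.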
I have a timestamp in the below format -
Fri 6Jun21 11:11:11.402 pm
and I'd like to do two things -
add a leading 0 before 6Jun21 so it reads 06Jun21, leaving double-digit days unchanged
convert the time into 24h format
Final output should look like
Fri 06Jun21 23:11:11.402
I tried traditional formatting but was unable to do so. I'd like to avoid parsing with a regex, since that adds overhead to my processing.
Any ideas?
You might try regex like this - I've included a timeit so you can see how long each modification takes.
import re
import timeit

ts = "Fri 6Jun21 11:11:11.402 pm"

def modifytimestamp(ts):
    tsre = r"^(\S+ )(\d+)(\S+ )(\d+)(:\d+:\d+\.\d+) ((p|a)m)"
    matches = re.match(tsre, ts)
    # for g in matches.groups():
    #     print(f"{g=}")
    # zero-pad the day of the month to two digits
    monthday = f"{int(matches.group(2)):02d}"
    # 12-hour to 24-hour: 12 am -> 00, 12 pm -> 12, other pm hours gain 12
    hour = int(matches.group(4)) % 12
    ampm = matches.group(6)
    if ampm == 'pm':
        hour += 12
    hour = f"{hour:02d}"
    # re-assemble the possibly-modified timestamp
    result = matches.group(1) + monthday + matches.group(3) + hour + matches.group(5)
    return result

if __name__ == '__main__':
    result = modifytimestamp(ts)
    print(f"{result=}")
    repeats = 1000000
    totaltime = timeit.timeit(stmt='modifytimestamp("Fri 6Jun21 11:11:11.402 pm")', globals=globals(), number=repeats)
    print(f"{totaltime=} for {repeats} iterations")
Results:
result='Fri 06Jun21 23:11:11.402'
totaltime=3.5960077 for 1000000 iterations
So modifying the timestamp takes about 3.6 microseconds per call (on my laptop, a Core i7 something-or-other).
I haven't pre-compiled the regexp, or optimised in any way - you can probably halve this time with some simple optimisations. Yes other methods of picking apart the string are likely a lot faster, but this may be enough to not slow down your other code too much.
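For comparison, here is a sketch that avoids a hand-written regex by letting datetime.strptime() do the parsing. It assumes the default (English) locale for the month abbreviation and copies the weekday token through unchanged rather than recomputing it; the function name is made up for illustration. It is not necessarily faster than the regex version.

```python
from datetime import datetime

def reformat_timestamp(ts):
    # Split off the weekday name; it is passed through untouched.
    weekday, rest = ts.split(" ", 1)
    # %d accepts one- or two-digit days; %I and %p handle the 12-hour clock.
    dt = datetime.strptime(rest, "%d%b%y %I:%M:%S.%f %p")
    # %f prints six digits (microseconds); drop the last three for milliseconds.
    body = dt.strftime("%d%b%y %H:%M:%S.%f")[:-3]
    return f"{weekday} {body}"

print(reformat_timestamp("Fri 6Jun21 11:11:11.402 pm"))
```

Keeping the weekday as plain text means the output cannot disagree with the input if the logged weekday and date ever differ.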
How can I convert YYYY-MM-DD hh:mm:ss format to integer in python?
for example 2014-02-12 20:51:14 -> to integer.
I only know how to convert hh:mm:ss but not yyyy-mm-dd hh:mm:ss
def time_to_num(time_str):
    hh, mm, ss = map(int, time_str.split(':'))
    return ss + 60*(mm + 60*hh)
It depends on what the integer is supposed to encode. You could convert the date to a number of seconds or milliseconds elapsed since some earlier reference time. People often anchor this to 12:00 am on January 1, 1970 (or 1900, etc.) and measure time as an integer count from that point. The datetime module (or others like it) has functions that do this for you: for example, int(datetime.datetime.now(datetime.timezone.utc).timestamp()) gives whole seconds since the 1970 epoch. (Calling .timestamp() on the naive result of utcnow() would interpret it as local time.)
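As a minimal sketch of the epoch approach for the string in the question, assuming the timestamp is meant as UTC (with a different time zone the result shifts by the corresponding offset); the function name is made up for illustration:

```python
from datetime import datetime, timezone

def datetime_string_to_epoch_seconds(text):
    # Parse the 'YYYY-MM-DD hh:mm:ss' layout into a naive datetime.
    naive = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    # Assumption: the value is in UTC. Attach that zone explicitly so the
    # result does not depend on the machine's local time zone.
    aware = naive.replace(tzinfo=timezone.utc)
    return int(aware.timestamp())

seconds = datetime_string_to_epoch_seconds("2014-02-12 20:51:14")
# Converting back recovers the original moment.
restored = datetime.fromtimestamp(seconds, tz=timezone.utc)
```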
If you want to semantically encode the year, month, and day, one way to do it is to multiply those components by order-of-magnitude values large enough to juxtapose them within the integer digits:
2012-06-13 --> 20120613 = 10,000 * (2012) + 100 * (6) + 1*(13)
def to_integer(dt_time):
    return 10000*dt_time.year + 100*dt_time.month + dt_time.day
E.g.
In [1]: import datetime
In [2]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:def to_integer(dt_time):
:    return 10000*dt_time.year + 100*dt_time.month + dt_time.day
:    # Or take the appropriate chars from a string date representation.
:--
In [3]: to_integer(datetime.date(2012, 6, 13))
Out[3]: 20120613
If you also want minutes and seconds, then just include further orders of magnitude as needed to display the digits.
I've encountered this second method very often in legacy systems, especially systems that pull date-based data out of legacy SQL databases.
It is very bad. You end up writing a lot of hacky code for aligning dates, computing month or day offsets as they would appear in the integer format (e.g. resetting the month back to 1 as you pass December, then incrementing the year value), and boilerplate for converting to and from the integer format all over the place.
Unless such a convention lives in a deep, low-level, and thoroughly tested section of the API you're working on, such that everyone who ever consumes the data really can count on this integer representation and all of its helper functions, then you end up with lots of people re-writing basic date-handling routines all over the place.
It's generally much better to leave the value in a date context, like datetime.date, for as long as you possibly can, so that the operations upon it are expressed in a natural, date-based context, and not some lone developer's personal hack into an integer.
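To illustrate the point, a small sketch comparing "add one day" in both representations (the helper to_integer is repeated from above so the snippet runs on its own):

```python
import datetime

def to_integer(dt_time):
    return 10000 * dt_time.year + 100 * dt_time.month + dt_time.day

new_years_eve = datetime.date(2012, 12, 31)

# Plain integer arithmetic produces a value that is not a real date.
broken = to_integer(new_years_eve) + 1

# Staying in the date type handles the month and year rollover for us.
next_day = new_years_eve + datetime.timedelta(days=1)
correct = to_integer(next_day)

print(broken, correct)
```

The first value encodes "December 32nd", while the second is the expected January 1st of the following year.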
I think I have a shortcut for that:
# Importing datetime.
from datetime import datetime
# Creating a datetime object so we can test.
a = datetime.now()
# Converting a to string in the desired format (YYYYMMDD) using strftime
# and then to int.
a = int(a.strftime('%Y%m%d'))
This is an example that can be used, for instance, to generate a database key; I sometimes use it instead of AUTOINCREMENT options.
import datetime
dt = datetime.datetime.now()
seq = int(dt.strftime("%Y%m%d%H%M%S"))
The other answers focused on a human-readable representation with int(mydate.strftime("%Y%m%d%H%M%S")). But this makes you lose a lot, including normal integer semantics and arithmetics, therefore I would prefer something like bash date's "seconds since the epoch (1970-01-01 UTC)".
As a reference, you could use the following bash command to get 1392234674 as a result:
date +%s --date="2014-02-12 20:51:14"
As ely hinted in the accepted answer, just a plain number representation is unmistakeable and by far easier to handle and parse, especially programmatically. Plus conversion from and to human-readable is an easy oneliner both ways.
To do the same thing in python, you can use datetime.timestamp() as djvg commented. For other methods you can consider the edit history.
Here is a simple date -> seconds conversion tool:
from datetime import date, datetime

def time_to_int(dateobj):
    # Whole days since 1970-01-01; toordinal() accounts for leap years.
    days = dateobj.toordinal() - date(1970, 1, 1).toordinal()
    total = days * 24 * 60 * 60
    total += dateobj.hour * 60 * 60
    total += dateobj.minute * 60
    total += dateobj.second
    return total
(Effectively a UNIX timestamp calculator, provided the naive datetime is taken to be UTC)
Example use:
x = datetime(1970, 1, 1)
time_to_int(x)
Output: 0
x = datetime(2021, 12, 31)
time_to_int(x)
Output: 1640908800
x = datetime(2022, 1, 1)
time_to_int(x)
Output: 1640995200
x = datetime(2022, 1, 2)
time_to_int(x)
Output: 1641081600
When converting a datetime to an integer, each component must occupy its own decimal places (tens, hundreds, thousands, and so on). For example,
"2018-11-03" should become 20181103 as an int
which you get from
2018*10000 + 100*11 + 3
Similarly,
"2018-11-03 10:02:05" should become 20181103100205 as an int
Explanatory Code
from datetime import datetime
dt = datetime(2018, 11, 3, 10, 2, 5)
print(dt)
# print(dt.timestamp())  # Unix representation ... not useful for this digit layout
print(dt.strftime("%Y-%m-%d"))
print(dt.year*10000 + dt.month*100 + dt.day)
print(int(dt.strftime("%Y%m%d")))
print(dt.strftime("%Y-%m-%d %H:%M:%S"))
print(dt.year*10000000000 + dt.month*100000000 + dt.day*1000000 + dt.hour*10000 + dt.minute*100 + dt.second)
print(int(dt.strftime("%Y%m%d%H%M%S")))
General Function
To avoid doing that manually, use the function below:
def datetime_to_int(dt):
    return int(dt.strftime("%Y%m%d%H%M%S"))
df.Date = df.Date.str.replace('-', '').astype(int)
I'm trying to increase the time.
I want to get an hour format like this: 13:30:45,123 (in Java: "HH:mm:ss,SSS"), but Python displays 13:30:45,123456 ("%H:%M:%S,%f")(microseconds of 6 digits).
I read on the web and found possible solutions like:
from datetime import datetime
hour = datetime.utcnow().strftime('%H:%M:%S,%f')[:-3]
print(hour)
The output is: 04:33:16,123
But it's a bad solution, because if the time is, for example, 01:49:56,020706, the output is 01:49:56,020, whereas the correct (rounded) value would be 01:49:56,021.
The real goal is to add milliseconds to a time, carrying over into the seconds when necessary.
Example: (I want to add 500 milliseconds)
If the Input: 00:01:48,557, the Output should be: 00:01:49,057
The code of the program in Java (working good) is:
SimpleDateFormat df = new SimpleDateFormat("HH:mm:ss,SSS");
System.out.print("Input the time: ");
t1 = in.next();
Date d = df.parse(t1);
Calendar cal = Calendar.getInstance();
cal.setTime(d);
cal.add(Calendar.MILLISECOND, 500);//here increase the milliseconds (microseconds)
t2 = df.format(cal.getTime());
System.out.print("The Output (+500): "+t2);
I don't know whether Python has something like Java's SimpleDateFormat.
As to addition, you can add 500ms to your datetime object, using a timedelta object:
from datetime import datetime, timedelta
t1 = datetime.utcnow()
t2 = t1 + timedelta(milliseconds=500)
So as long as you're working with datetime objects instead of strings, you can easily do all the time-operations you'd like.
So we're left with the question of how to format the time when you want to display it.
As you pointed out, the [:-3]-trick seems to be the common solution, and seems to me it should work fine. If you really care about rounding correctly to the closest round millisecond, you can use the following "rounding trick":
You must have seen this trick in the past, for floats:
def round(x):
    return int(x + 0.5)
The same idea (i.e. adding 0.5) can also be applied to datetimes:
def format_dt(t):
    tr = t + timedelta(milliseconds=0.5)
    return tr.strftime('%H:%M:%S,%f')[:-3]
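Putting the pieces together, a sketch of a Python counterpart to the Java snippet: parse the HH:MM:SS,mmm string, add 500 milliseconds, and format it back with rounding. The function names and the TIME_FORMAT constant are made up for illustration.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%H:%M:%S,%f"

def format_ms(t):
    # Add half a millisecond so that truncating to 3 digits rounds to nearest.
    rounded = t + timedelta(microseconds=500)
    return rounded.strftime(TIME_FORMAT)[:-3]

def add_milliseconds(text, ms):
    # %f accepts 1 to 6 fractional digits, so "557" is read as 557 ms.
    t = datetime.strptime(text, TIME_FORMAT)
    return format_ms(t + timedelta(milliseconds=ms))

print(add_milliseconds("00:01:48,557", 500))
```

Since only the time of day is formatted, a value that passes midnight simply wraps around to 00:00:00.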
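You can round off the digits using decimal:
from datetime import datetime
from decimal import Decimal
ts = datetime.utcnow()
sec = round(Decimal(ts.strftime('%S.%f')), 3)
# Pad the seconds to two integer digits; note that 59.9995 and above rounds up to 60.000.
print(ts.strftime('%H:%M:') + '{:06.3f}'.format(sec))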
As input, I have a date string that can take three general formats:
a) January 6, 2011
b) 4 days ago
c) 12 hours ago
I want the script to be able to recognize the format and call the appropriate function with the parameters.
So if a then convert_full_string("January 6, 2011")
if b then convert_days(4)
if c then convert_hours(12)
Once I can recognize the format and call the appropriate function, the rest will be relatively easy. I plan on using dateutil.
But I am not sure how to recognize the format.
Any suggestions with code samples much appreciated.
Using parsedatetime, you could parse all three date formats into datetime.datetime objects without having to code the logic yourself:
import parsedatetime.parsedatetime as pdt
import parsedatetime.parsedatetime_consts as pdc
import datetime
c = pdc.Constants()
p = pdt.Calendar(c)
for text in ('january 6, 2011', '4 days ago', '12 hours ago'):
    date = datetime.datetime(*p.parse(text)[0][:6])
    # print(date.isoformat())
    # 2011-01-06T09:00:18
    # 2011-01-02T09:00:18
    # 2011-01-05T21:00:18
    print(date.strftime('%Y%m%dT%H%M%S'))
    # 20110106T090208
    # 20110102T090208
    # 20110105T210208
if 'days' in userinput:
    convert_days(int(userinput[:userinput.index('days')].strip()))
elif 'hours' in userinput:
    convert_hours(int(userinput[:userinput.index('hours')].strip()))
else:
    convert_full_string(userinput)
This assumes that when "days" or "hours" is contained in userinput, you always want the chars that came immediately before those two words.
You can match with regular expressions:
import re
re.search(r".* [0-9]{1,2}, [0-9]{4}", tomatch)
Similar with [0-9]{1,2} days ago, etc.
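A sketch of how the regular-expression approach could drive the dispatch. The three convert_* functions here are hypothetical stand-ins for the asker's own functions and only report what they were called with; the patterns assume English month names and the exact "N days ago" / "N hours ago" wording.

```python
import re

# Hypothetical stand-ins for the asker's functions.
def convert_full_string(text):
    return ("full", text)

def convert_days(days):
    return ("days", days)

def convert_hours(hours):
    return ("hours", hours)

DAYS_AGO = re.compile(r"^(\d+) days? ago$")
HOURS_AGO = re.compile(r"^(\d+) hours? ago$")
FULL_DATE = re.compile(r"^[A-Za-z]+ \d{1,2}, \d{4}$")

def dispatch(text):
    """Recognize the format of text and call the matching converter."""
    text = text.strip()
    match = DAYS_AGO.match(text)
    if match:
        return convert_days(int(match.group(1)))
    match = HOURS_AGO.match(text)
    if match:
        return convert_hours(int(match.group(1)))
    if FULL_DATE.match(text):
        return convert_full_string(text)
    raise ValueError("Unrecognized date format: {!r}".format(text))
```

Anchoring each pattern with ^ and $ keeps a string such as "4 days ago" from being mistaken for a full date, and unknown input fails loudly instead of being passed to the wrong converter.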