I'm using Python to parse some logfiles produced via jetty.net.ssl on an external platform running a JVM to which I have no access.
For reasons I don't understand (and nor can I find documented anywhere) the log timestamps have the first hour of each day expressed as 24 rather than 00 e.g.
javax.net.ssl|DEBUG|15|Mux|2022-07-01 24:00:11.298 UTC|SSLSocketOutputRecord.java:334|WRITE: TLSv1.3 application_data, length = 31
which corresponds to 2022-07-01 00:00:11.298 rather than 2022-07-02 00:00:11.298
This format breaks things like Python's datetime.datetime() and dateutil.parser.parse(). I can code around this by pulling the elements of the timestamp string apart with a regex and altering the hour where necessary, along the lines of
import datetime
import re

# The separators between hour, minute and second are literal colons.
timere = re.compile(r"^(\d{4})-(\d{2})-(\d{2})\s+(\d{2}):(\d{2}):(\d{2})\.(\d{3}).*$")
if not (match := timere.match(tstr)):
    raise ValueError(f"Time string {tstr} is not valid")
yy, mm, dd, hr, mi, se, ms = map(int, match.groups())
hr %= 24  # the log writes hour 00 as 24
d = datetime.datetime(yy, mm, dd, hr, mi, se, ms * 1000, tzinfo=datetime.timezone.utc)
I am, however, intrigued as to why the timestamps are in that format: is there some subtlety of which I am unaware? I'm assuming that the developers used "24" as a valid hour deliberately, for reasons I don't yet understand.
The OpenJDK sun.security.ssl.SSLLogger uses the following syntax to output the timestamp.
private static final String PATTERN = "yyyy-MM-dd kk:mm:ss.SSS z";
private static final DateTimeFormatter dateTimeFormat =
    DateTimeFormatter.ofPattern(PATTERN, Locale.ENGLISH)
        .withZone(ZoneId.systemDefault());
This means the hour is represented by the kk portion of the pattern, which according to java.time.format.DateTimeFormatter is "clock-hour-of-day (1-24)".
I'm not sure whether Python has an equivalent date/time pattern it can use.
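For what it's worth, Python's strptime has no directive that matches kk: %H only accepts 00 to 23. Below is a minimal sketch of one workaround. It assumes, as the question describes, that hour 24 means hour 00 of the same date and that the zone is always UTC. parse_kk_timestamp is a hypothetical helper name, not part of any library.

```python
from datetime import datetime, timezone

def parse_kk_timestamp(text):
    """Parse 'yyyy-MM-dd kk:mm:ss.SSS ...' where the hour runs 1-24.

    Assumption: hour 24 stands for hour 00 of the SAME date, and the
    timestamp is in UTC (anything after the time field is ignored).
    """
    date_part, time_part = text.split(" ")[:2]
    hour, rest = time_part.split(":", 1)
    hour = int(hour) % 24  # map 24 back to 00, leave 1-23 alone
    fixed = f"{date_part} {hour:02d}:{rest}"
    parsed = datetime.strptime(fixed, "%Y-%m-%d %H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)
```

Called on the timestamp from the sample log line, parse_kk_timestamp("2022-07-01 24:00:11.298 UTC") yields 2022-07-01 00:00:11.298 UTC.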
I need to convert a time in UT combined with a longitude into local time. I am working on analyzing a future Earth observing satellite mission and need to do this conversion to continue.
I found a general solution to this problem here: T(lt) = T(ut) + lon/(360/24), however implementing it is driving me absolutely bonkers.
My data are datetime time objects:
In[9]: sattime[0]
Out[9]: datetime.time(18, 0)
and longitude coordinates from 0 to 360 degrees.
I need to take this object and use the above equation to convert to a local time. I ONLY care about the time relative to midnight and NOT about the date (in fact the rest of my code is currently using the datetime.time object only and preferably the local time output would be the same type of object).
I have tried the following from here but am stuck getting the local variable back into a time object.
test = 4
for (i, j) in zip(sattime, satloncor):
    td = datetime.datetime.combine(datetime.datetime.min, i) - datetime.datetime.min
    seconds = td // datetime.timedelta(milliseconds=1)
    local = (seconds + (j/(360/86400)))/1000
    print(local)
    if test < 0:
        break
    test -= 1
The test part of the code just makes sure I am not wasting time doing the conversion for all ~400,000 data points.
So to summarize, I want to take a datetime.time object in UT coupled with a corresponding longitude, and convert it to local solar time as a datetime.time object.
This all seems super convoluted and seems like there should be an easier way. Any help is greatly appreciated! Thanks!
I figured it out but it is not pretty.
import datetime as dt
import math

localtimes = []
for (i, j) in zip(sattime, satloncor):
    td = dt.datetime.combine(dt.datetime.min, i) - dt.datetime.min
    seconds = td // dt.timedelta(seconds=1)
    # 360 degrees of longitude correspond to 86400 seconds; the result is in hours
    local = (seconds + (j/(360/86400)))/3600
    if local >= 24:
        local -= 24
    hours, minutes = int(math.modf(local)[1]), int(math.modf(local)[0]*60)
    localtimes.append(dt.time(hours, minutes, 0))
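The formula from the question, T(lt) = T(ut) + lon/(360/24), can also be sketched with plain integer seconds and a modulo, which avoids the special case for hour 24. This is only a sketch under stated assumptions: longitude is in degrees east from 0 to 360, the result is mean (not apparent) solar time, and local_solar_time is a hypothetical name.

```python
import datetime as dt

def local_solar_time(ut, lon_deg):
    """Mean local solar time for a UT datetime.time and an east longitude in degrees."""
    ut_seconds = ut.hour * 3600 + ut.minute * 60 + ut.second
    # 360 degrees of longitude span 86400 seconds, i.e. 240 seconds per degree.
    local_seconds = int(ut_seconds + lon_deg * 240) % 86400
    hours, remainder = divmod(local_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return dt.time(hours, minutes, seconds)
```

For the whole data set this would be used as localtimes = [local_solar_time(t, lon) for t, lon in zip(sattime, satloncor)], with sattime and satloncor as in the question.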
Sorry, but it is super convoluted because it's a complicated subject. For just one example, did you know that every few years there is a day with 86401 seconds in it?
A couple of good web sites to start with:
http://www.ephemeris.com/books.html
I can recommend the "Practical Astronomy with your Calculator" book.
https://www.timeanddate.com/
Hope this helps.
How can I convert YYYY-MM-DD hh:mm:ss format to integer in python?
for example 2014-02-12 20:51:14 -> to integer.
I only know how to convert hh:mm:ss but not yyyy-mm-dd hh:mm:ss
def time_to_num(time_str):
    hh, mm, ss = map(int, time_str.split(':'))
    return ss + 60*(mm + 60*hh)
It depends on what the integer is supposed to encode. You could convert the date to a number of seconds (or milliseconds) elapsed since some reference time. People often anchor this to 12:00 am on January 1, 1970, or 1900, etc., and measure time as an integer count from that point. The datetime module (or others like it) has functions that do this for you: for example, int(datetime.datetime.now(datetime.timezone.utc).timestamp()) gives the current time in whole seconds since 1970-01-01 UTC.
If you want to semantically encode the year, month, and day, one way to do it is to multiply those components by order-of-magnitude values large enough to juxtapose them within the integer digits:
2012-06-13 --> 20120613 = 10,000 * (2012) + 100 * (6) + 1*(13)
def to_integer(dt_time):
    return 10000*dt_time.year + 100*dt_time.month + dt_time.day
E.g.
In [1]: import datetime
In [2]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:def to_integer(dt_time):
: return 10000*dt_time.year + 100*dt_time.month + dt_time.day
: # Or take the appropriate chars from a string date representation.
:--
In [3]: to_integer(datetime.date(2012, 6, 13))
Out[3]: 20120613
If you also want hours, minutes, and seconds, then just include further orders of magnitude as needed to display the digits.
I've encountered this second method very often in legacy systems, especially systems that pull date-based data out of legacy SQL databases.
It is very bad. You end up writing a lot of hacky code for aligning dates, computing month or day offsets as they would appear in the integer format (e.g. resetting the month back to 1 as you pass December, then incrementing the year value), and boilerplate for converting to and from the integer format all over.
Unless such a convention lives in a deep, low-level, and thoroughly tested section of the API you're working on, such that everyone who ever consumes the data really can count on this integer representation and all of its helper functions, then you end up with lots of people re-writing basic date-handling routines all over the place.
It's generally much better to leave the value in a date context, like datetime.date, for as long as you possibly can, so that the operations upon it are expressed in a natural, date-based context, and not some lone developer's personal hack into an integer.
I think I have a shortcut for that:
# Importing datetime.
from datetime import datetime
# Creating a datetime object so we can test.
a = datetime.now()
# Converting a to string in the desired format (YYYYMMDD) using strftime
# and then to int.
a = int(a.strftime('%Y%m%d'))
This is an example that can be used, for instance, to generate a database key; I sometimes use it instead of AUTOINCREMENT options.
import datetime
dt = datetime.datetime.now()
seq = int(dt.strftime("%Y%m%d%H%M%S"))
The other answers focused on a human-readable representation with int(mydate.strftime("%Y%m%d%H%M%S")). But this makes you lose a lot, including normal integer semantics and arithmetics, therefore I would prefer something like bash date's "seconds since the epoch (1970-01-01 UTC)".
As a reference, you could use the following bash command to get 1392234674 as a result (the exact number depends on the machine's local time zone):
date +%s --date="2014-02-12 20:51:14"
As ely hinted in the accepted answer, just a plain number representation is unmistakeable and by far easier to handle and parse, especially programmatically. Plus conversion from and to human-readable is an easy oneliner both ways.
To do the same thing in python, you can use datetime.timestamp() as djvg commented. For other methods you can consider the edit history.
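A minimal sketch of that approach for the string in the question. The helper name to_epoch_seconds and its tz parameter are illustrative assumptions; the zone is attached explicitly because timestamp() reads a naive datetime as local time.

```python
from datetime import datetime, timezone

def to_epoch_seconds(text, tz=timezone.utc):
    """Parse 'YYYY-MM-DD hh:mm:ss' and return whole seconds since 1970-01-01 UTC."""
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    # Attach a zone explicitly; a naive datetime would be read as local time.
    return int(parsed.replace(tzinfo=tz).timestamp())

print(to_epoch_seconds("2014-02-12 20:51:14"))  # 1392238274 when read as UTC
```

The bash figure quoted above, 1392234674, is 3600 seconds smaller, which is what you get when the same wall-clock string is interpreted in a UTC+1 zone.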
Here is a simple date -> second conversion tool:
from datetime import datetime
EPOCH_ORDINAL = datetime(1970, 1, 1).toordinal()
def time_to_int(dateobj):
    total = dateobj.second
    total += dateobj.minute * 60
    total += dateobj.hour * 60 * 60
    # Count whole days since 1970-01-01 so that leap years are included.
    total += (dateobj.toordinal() - EPOCH_ORDINAL) * 60 * 60 * 24
    return total
(Effectively a UNIX timestamp calculator, treating the datetime as UTC)
Example use:
x = datetime(1970, 1, 1)
time_to_int(x)
Output: 0
x = datetime(2021, 12, 31)
time_to_int(x)
Output: 1640908800
x = datetime(2022, 1, 1)
time_to_int(x)
Output: 1640995200
x = datetime(2022, 1, 2)
time_to_int(x)
Output: 1641081600
When converting a datetime to an integer, each component has to land in its own decimal places. For example,
"2018-11-03" must become 20181103 as an int,
for which you compute
2018*10000 + 11*100 + 3
Similarly another example,
"2018-11-03 10:02:05" must be like 20181103100205 in int
Explanatory Code
from datetime import datetime
dt = datetime(2018,11,3,10,2,5)
print (dt)
#print (dt.timestamp()) # unix representation ... not useful when converting to int
print (dt.strftime("%Y-%m-%d"))
print (dt.year*10000 + dt.month* 100 + dt.day)
print (int(dt.strftime("%Y%m%d")))
print (dt.strftime("%Y-%m-%d %H:%M:%S"))
print (dt.year*10000000000 + dt.month* 100000000 +dt.day * 1000000 + dt.hour*10000 + dt.minute*100 + dt.second)
print (int(dt.strftime("%Y%m%d%H%M%S")))
General Function
To avoid doing that manually, use the function below:
def datetime_to_int(dt):
    return int(dt.strftime("%Y%m%d%H%M%S"))
df.Date = df.Date.str.replace('-', '').astype(int)
I have a dataframe of session log-in data. Each entry is associated with a class (e, c, g, m). So rows look like this:
1: [session_start_time session_end_time class_id problems_completed student_id student_account_created student_previous_logins_total student_previous_class_logins duration]
2: [1/6/12 16:28 1/6/12 16:55 e 37 91 10/26/11 0:00 76 27 1/1/04 0:27]
3: [1/11/12 13:18 1/11/12 13:58 m 33 172 1/10/12 0:00 5 3 1/1/04 0:40]
I am trying to calculate the average "duration" for each class (e, c, g, etc.). I am having trouble finding the right command to calculate the average per class, rather than the mean of the whole column.
I am not sure exactly what data format/structure you mean your source data is in, since what you present is not an exact Python representation. But let's assume your rows are lists of strings (or can easily be converted into them):
rows = [
[ '1/6/12 16:28', '1/6/12 16:55', 'e' ],
[ '1/11/12 13:18', '1/11/12 13:58', 'm' ],
[ '1/13/12 13:20', '1/13/12 13:24', 'm' ]
]
Then, here's one way to compute the mean by class:
from collections import Counter
from datetime import datetime

# An explicit format is used instead of the locale-dependent %x.
def parse(s, format="%m/%d/%y %H:%M"):
    """
    Return parsed datetime in the given format.
    """
    return datetime.strptime(s, format)

total_items = Counter()
total_duration = Counter()
for start, end, kind in rows:
    duration = parse(end) - parse(start)
    total_items[kind] += 1
    total_duration[kind] += duration.total_seconds()
means = { k: total_duration[k] / total_items[k] for k in total_items }
print(means)
This uses collections.Counter objects to track both the count of each class in the log and the total duration. The duration must be computed first, by parsing the date/time string representations into an internal format like datetime.datetime. Once the counters are accumulated, a dictionary comprehension computes the mean per kind (what you call "class", but that's a technical Python construct, so I call it a kind).
The resulting means stores the computed values. means['m'] gives the mean for all of the 'm' entries, and so forth.
While the parse function will work for the few data samples you showed in your question, date/time parsing is pretty finicky. Instead of using the strptime method here, I recommend using a more expansive and inclusive parser, such as that found in the dateutil module. If you wanted to use that, delete or rename the parse function found here, and substitute:
from dateutil.parser import parse
That provides a drop-in replacement with a much broader range of accepted formats.
In my Python-based application, users can enter dates in the format dd/mm/yy with variations in the date separator (they can use /, - or a space as a separator). Therefore all of these are valid dates:
10/02/2009
07 22 2009
09-08-2008
9-9/2008
11/4 2010
03/07-2009
09-01 2010
Now in order to test it, I need to create a list of such dates, but I am not sure how to auto-generate random combinations of these date strings with separators.
This is what I started doing:
date = ['10', '10', '2010']
separators = ['/', '-', ' ']
for s in separators:
    new_date = s.join(date)
I think the previous answers didn't really help much. If you choose "day" as a number from 1-31 and "month" as any number from 1-12 in your test data, your production code MUST raise exceptions somewhere: 02/31/2013 should not be accepted!
Therefore, you should create random, but valid dates and then create strings from them with arbitrarily chosen format strings. This is what my code does:
import datetime
import time
import random

separators = ["/", ",", "-", " "]
prefixes = ["", " "]

def random_datetime(min_date, max_date):
    # mktime returns floats, but randint needs integer bounds
    since_epoch_min = int(time.mktime(min_date.timetuple()))
    since_epoch_max = int(time.mktime(max_date.timetuple()))
    random_time = random.randint(since_epoch_min, since_epoch_max)
    return datetime.datetime.fromtimestamp(random_time)

def random_date_string_with_random_separators(dt):
    prefix = random.choice(prefixes)
    sep1 = random.choice(separators)
    sep2 = random.choice(separators)
    format_string = "{}%m{}%d{}%Y".format(prefix, sep1, sep2)
    return dt.strftime(format_string)

min_date = datetime.datetime(2012, 1, 1)
max_date = datetime.datetime(2013, 1, 1)
for i in range(10):
    print(random_date_string_with_random_separators(
        random_datetime(min_date, max_date)
    ))
This should cover all cases (if you take more than ten values).
Nevertheless, I have two remarks:
Don't use random data as test-input
You'll never know if someday your test will fail, maybe you don't catch all possible problems with the data generated. In your case it should be o.k., but generally it's not good practice (if you have another choice). Alternatively, you could create a well-thought set of hard-coded input strings where you cover all corner cases. And if someday your tests fail, you know it's no random effect.
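A deterministic alternative along those lines could look like the sketch below: a handful of hand-picked dates, each rendered with every pair of separators. The chosen dates and the names SEPARATORS, CORNER_CASES and all_separator_variants are illustrative assumptions, not part of the code above.

```python
from datetime import date
from itertools import product

SEPARATORS = ["/", "-", " "]
# Hand-picked dates: an ordinary day, a leap day, and the last day of a year.
CORNER_CASES = [date(2009, 2, 10), date(2008, 2, 29), date(2010, 12, 31)]

def all_separator_variants(d):
    """Return every mm<sep>dd<sep>yyyy rendering of d, one per separator pair."""
    return [f"{d:%m}{s1}{d:%d}{s2}{d:%Y}" for s1, s2 in product(SEPARATORS, repeat=2)]

# 3 dates x 9 separator pairs = 27 fixed, reproducible inputs.
test_inputs = [text for d in CORNER_CASES for text in all_separator_variants(d)]
```

Because the list is fixed, a failing test always points at the same input string.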
Use well-tested code
For the task you describe, there's a library for that! Use dateutil. They have a fantastic datetime-parser that swallows almost everything you throw at it. Example:
from dateutil import parser
for i in range(10):
    date_string = random_date_string_with_random_separators(
        random_datetime(min_date, max_date)
    )
    parsed_datetime = parser.parse(date_string)
    print(date_string, parsed_datetime.strftime("%m/%d/%Y"))
Output:
01 05,2012 01/05/2012
05 17-2012 05/17/2012
06-07-2012 06/07/2012
10 31,2012 10/31/2012
10/04,2012 10/04/2012
11 16,2012 11/16/2012
03/23 2012 03/23/2012
02-26-2012 02/26/2012
01,12-2012 01/12/2012
12-21 2012 12/21/2012
Then you can be sure it works. dateutil has tons of unit tests and "just works". And the best code you can write is code you don't have to test.
I suggest you give certain information in input:
For example:
date = input("Enter date (mm/dd/yyyy): ")
Now use strptime() to check if it's correct or not:
import time
try:
    date = time.strptime(date, '%m/%d/%Y')
except ValueError:
    print('Invalid date!')
References:
http://docs.python.org/2/library/time.html#time.strptime
How can I validate a date in Python 3.x?
To create those dates automatically and add them to a list, you can use this:
from random import choice, randrange
dates = []
s = ' -/'
for i in range(100):
    dates.append("%i%s%i%s%i" % (randrange(1, 13), choice(s), randrange(1, 32), choice(s), randrange(2000, 2019)))
print(dates)
Is there a good method to convert a string representing time in the format of [m|h|d|s|w] (m= minutes, h=hours, d=days, s=seconds w=week) to number of seconds? I.e.
def convert_to_seconds(timeduration):
    ...
convert_to_seconds("1h")
-> 3600
convert_to_seconds("1d")
-> 86400
etc?
Thanks!
Yes, there is a good simple method that you can use in most languages without having to read the manual for a datetime library. This method can also be extrapolated to ounces/pounds/tons etc etc:
seconds_per_unit = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}
def convert_to_seconds(s):
    return int(s[:-1]) * seconds_per_unit[s[-1]]
I recommend using the timedelta class from the datetime module:
from datetime import timedelta
UNITS = {"s":"seconds", "m":"minutes", "h":"hours", "d":"days", "w":"weeks"}
def convert_to_seconds(s):
    count = int(s[:-1])
    unit = UNITS[ s[-1] ]
    td = timedelta(**{unit: count})
    return td.seconds + 60 * 60 * 24 * td.days
Internally, timedelta objects store everything as microseconds, seconds, and days. So while you can give it parameters in units like milliseconds, minutes, hours, or weeks (but not months or years), in the end you'll have to take the timedelta you created and convert back to seconds.
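A small illustration of that normalization, using only the standard library (the particular values are arbitrary):

```python
from datetime import timedelta

td = timedelta(weeks=1, hours=1, milliseconds=1500)
# Normalized storage: whole days, leftover seconds, leftover microseconds.
parts = (td.days, td.seconds, td.microseconds)
# total_seconds() folds all three fields back into a single float.
total = td.total_seconds()
print(parts, total)  # (7, 3601, 500000) 608401.5
```

Note that td.seconds alone is only the leftover part below one day, which is why the function above adds the days back in; total_seconds() does the same in one call.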
In case the ** syntax confuses you, it's Python's keyword-argument unpacking syntax. Basically, these function calls are all equivalent:
def f(x, y): pass
f(5, 6)
f(x=5, y=6)
f(y=6, x=5)
d = {"x": 5, "y": 6}
f(**d)
And another to add to the mix.
This solution is brief, but fairly tolerant, and allows for multiples, such as 10m 30s
from datetime import timedelta
import re
UNITS = {'s':'seconds', 'm':'minutes', 'h':'hours', 'd':'days', 'w':'weeks'}
def convert_to_seconds(s):
    return int(timedelta(**{
        UNITS.get(m.group('unit').lower(), 'seconds'): float(m.group('val'))
        for m in re.finditer(r'(?P<val>\d+(\.\d+)?)(?P<unit>[smhdw]?)', s, flags=re.I)
    }).total_seconds())
Test results:
>>> convert_to_seconds('10s')
10
>>> convert_to_seconds('1') # defaults to seconds
1
>>> convert_to_seconds('1m 10s') # chaining
70
>>> convert_to_seconds('1M10S') # case insensitive
70
>>> convert_to_seconds('1week 3days') # ignores 'eek' and 'ays'
864000
>>> convert_to_seconds('This will take 1.25min, probably.') # floats
75
Not perfect, though:
>>> convert_to_seconds('1month 3days') # actually 1minute + 3 days
259260
>>> convert_to_seconds('40s 10s') # 1st value clobbered by 2nd
10
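The clobbering in the last case comes from building a dict, where a repeated unit overwrites the earlier one. One possible variation, sketched here under the assumption that repeated units should add up, accumulates a timedelta per match instead. convert_to_seconds_summed and TOKEN are hypothetical names; the misreading of '1month' as one minute remains.

```python
import re
from datetime import timedelta

UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days", "w": "weeks"}
TOKEN = re.compile(r"(?P<val>\d+(?:\.\d+)?)(?P<unit>[smhdw]?)", flags=re.I)

def convert_to_seconds_summed(text):
    """Sum every '<number><unit>' token, so '40s 10s' yields 50 instead of 10."""
    total = timedelta()
    for m in TOKEN.finditer(text):
        # A bare number with no unit letter is treated as seconds.
        unit = UNITS.get(m.group("unit").lower(), "seconds")
        total += timedelta(**{unit: float(m.group("val"))})
    return int(total.total_seconds())
```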
I usually need to support raw numbers, string numbers and string numbers ending in [m|h|d|s|w].
This version will handle: 10, "10", "10s", "10m", "10h", "10d", "10w".
Hat tip to @Eli Courtwright's answer on the string conversion.
from datetime import timedelta
UNITS = {"s":"seconds", "m":"minutes", "h":"hours", "d":"days", "w":"weeks"}
def convert_to_seconds(s):
    if isinstance(s, int):
        # We are dealing with a raw number
        return s
    try:
        seconds = int(s)
        # We are dealing with an integer string
        return seconds
    except ValueError:
        # We are dealing with some other string or type
        pass
    # Expecting a string ending in [m|h|d|s|w]
    count = int(s[:-1])
    unit = UNITS[ s[-1] ]
    td = timedelta(**{unit: count})
    return td.seconds + 60 * 60 * 24 * td.days
I wrote an open-source library, MgntUtils, in Java (not PHP) that partly addresses this requirement. It contains a static method parsingStringToTimeInterval(String value). This method parses a string that is expected to hold a time interval value: a numeric value with an optional time unit suffix. For example, the string "38s" will be parsed as 38 seconds, "24m" as 24 minutes, "4h" as 4 hours, "3d" as 3 days, and "45" as 45 milliseconds. Supported suffixes are "s" for seconds, "m" for minutes, "h" for hours, and "d" for days. A string without a suffix is considered to hold a value in milliseconds. Suffixes are case insensitive. If the provided String contains an unsupported suffix, holds a negative numeric value or zero, or holds a non-numeric value, then an IllegalArgumentException is thrown. This method returns a TimeInterval, a class also defined in this library. Essentially, it holds two properties with relevant getters and setters: a long "value" and a java.util.concurrent.TimeUnit. In addition to getters and setters, this class has the methods toMillis(), toSeconds(), toMinutes(), toHours(), and toDays(). Those methods return a long value in the specified time scale (the same way as the corresponding methods in class java.util.concurrent.TimeUnit).
This method may be very useful for parsing time interval properties such as timeouts or waiting periods from configuration files. It eliminates unneeded calculations from different time scales to milliseconds back and forth. Consider that you have a methodInvokingInterval property that you need to set for 5 days. So in order to set the milliseconds value you will need to calculate that 5 days is 432000000 milliseconds (obviously not an impossible task but annoying and error prone) and then anyone else who sees the value 432000000 will have to calculate it back to 5 days which is frustrating. But using this method you will have a property value set to "5d" and invoking the code
long seconds = TextUtils.parsingStringToTimeInterval("5d").toSeconds();
will solve your conversion problem. Obviously, this is not overly complex feature, but it could add simplicity and clarity in your configuration files and save some frustration and "stupid" miscalculation into milliseconds bugs. Here is the link to the article that describes the MgntUtils library as well as where to get it: MgntUtils