python - parsing mystery date format [duplicate]

This question already has answers here:
Convert weird Python date format to readable date
(2 answers)
Closed 7 years ago.
I'm importing data from an Excel spreadsheet into Python. My dates are coming through in a bizarre format that I am not familiar with and cannot parse.
in excel: (7/31/2015)
42216
after I import it:
u'/Date(1438318800000-0500)/'
Two questions:
what format is this and how might I parse it into something more intuitive and easier to read?
is there a robust, swiss-army-knife-esque way to convert dates without specifying input format?

Timezones necessarily make this more complex, so let's ignore them...
As @SteJ remarked, what you get is (close to) the time in seconds since 1st January 1970. Here's a Wikipedia article on how that's normally used. Oddly, the string you get seems to have a timezone offset (-0500, e.g. EST in North America) attached. That makes no sense if it's properly UNIX time (which is always in UTC), but we'll pass on that...
Assuming you can get it reduced to a number (sans timezone), the conversion into something sensible in Python is really straightforward (note the reduction in precision; your original number is the number of milliseconds since the epoch, rather than the standard number of seconds since the epoch):
from datetime import datetime
time_stamp = 1438318800
time_stamp_dt = datetime.fromtimestamp(time_stamp)
You can then get time_stamp_dt into any format you think best using strftime, e.g., time_stamp_dt.strftime('%m/%d/%Y'), which pretty much gives you what you started with.
Now, assuming that the format of the string you provided is fairly regular, we can extract the relevant time quite simply like this:
s = '/Date(1438318800000-0500)/'
time_stamp = int(s[6:16])
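A slightly more defensive sketch than fixed slicing, assuming the string always has the shape /Date(<milliseconds><optional +HHMM or -HHMM>)/ and that the number counts milliseconds since the epoch in UTC, with the offset only indicating which zone to display. The name parse_slash_date is made up for this illustration:

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed shape: /Date(<milliseconds><optional +HHMM or -HHMM>)/
_SLASH_DATE = re.compile(r"/Date\((-?\d+)([+-]\d{4})?\)/")
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_slash_date(text):
    """Return an aware datetime for a '/Date(...)/' string (hypothetical helper)."""
    match = _SLASH_DATE.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognised date string: {text!r}")
    millis = int(match.group(1))
    offset = match.group(2)
    tz = timezone.utc
    if offset:
        sign = -1 if offset[0] == "-" else 1
        tz = timezone(sign * timedelta(hours=int(offset[1:3]), minutes=int(offset[3:5])))
    # Add the milliseconds to the UTC epoch, then view the instant in the given offset.
    return (_EPOCH + timedelta(milliseconds=millis)).astimezone(tz)


parsed = parse_slash_date("/Date(1438318800000-0500)/")
print(parsed.isoformat())            # 2015-07-31T00:00:00-05:00
print(parsed.strftime("%m/%d/%Y"))   # 07/31/2015
```

Unlike s[6:16], this keeps working if the number of digits changes, and it fails loudly on input that does not match.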

Related

How do I convert a struct_time output into DD/MM/YY, Hour:Minute:Second format?

I'm relatively uninitiated when it comes to Python, and I'm trying to figure out how to take an output I'm getting from a sensor into proper day, month, year and hour, minute, second format.
An example of the output, which also includes a basic counter (the first output), and a timestamp (the third output) is shown below:
(305, struct_time(tm_year=2022, tm_mon=11, tm_mday=9, tm_hour=16, tm_min=42, tm_sec=8, tm_wday=2, tm_yday=313, tm_isdst=-1), 7.036)
I've seen a lot of questions and answers for this, but I'm left feeling kind of stumped on all of them because I'm not sure how to take the output I have (real_time, which gives a struct_time output) and turn it into this format. Any help (and understanding about my lack of fluency in this field) would be really appreciated!

time.strftime exists for exactly this purpose:
import time
now_local = time.localtime()
fmt = "%d/%m/%Y %H:%M:%S"
out = time.strftime(fmt, now_local)
print(out)
However, two words of warning:
time.struct_time is not "timezone aware". This will turn out to matter when you least expect it. Unless you are very sure that you know the timezone of the incoming data, and have the correct safeguards in your application and database for managing time zone information, use the datetime.datetime class instead.
D/M/Y date format can be ambiguous. Y-M-D format is substantially safer. It is not ambiguous in any widely-used locale, and it has the extra benefit that lexical ordering of Y-M-D strings is also a correct ordering of the dates that they represent. This format is laid out by RFC 3339 and has become widely accepted as the standard, correct formatting for datetime strings.
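Both warnings can be illustrated with a short sketch. It assumes the sensor clock runs in UTC, which the question does not state, so substitute the real zone if you know it. The sample struct_time is rebuilt from the values shown in the question:

```python
import time
from datetime import datetime, timezone

# Rebuild a struct_time with the same fields as the one in the question.
st = time.strptime("2022-11-09 16:42:08", "%Y-%m-%d %H:%M:%S")

# Assumption: the sensor clock is UTC. The first six fields are
# year, month, day, hour, minute, second.
aware = datetime(*st[:6], tzinfo=timezone.utc)
iso_stamp = aware.isoformat()
print(iso_stamp)  # 2022-11-09T16:42:08+00:00

# Y-M-D strings sort correctly as plain text.
stamps = ["2022-11-09T16:42:08", "2021-12-31T23:59:59", "2022-02-01T00:00:00"]
in_order = sorted(stamps)
print(in_order)
```

The sorted list comes out in chronological order without any date parsing, which is not true of D/M/Y or M/D/Y strings.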

So as it turns out, I was able to find a solution after all. Essentially I just used this function:
def _format_datetime(st):
    # st is a time.struct_time; output is MM/DD/YYYY HH:MM:SS
    return "{:02}/{:02}/{} {:02}:{:02}:{:02}".format(
        st.tm_mon,
        st.tm_mday,
        st.tm_year,
        st.tm_hour,
        st.tm_min,
        st.tm_sec,
    )
And then applied it to the struct_time output as such (with real_time being said output):
real_time = time.localtime()
current_time = time.monotonic()
formatted_time = _format_datetime(real_time)
Hopefully this helps other people using CircuitPython for similar purposes!

What the meaning of T Z for date?

I am using an API that requires dates in this format:
2020-03-01T00:00:00Z
I googled around and couldn't understand what the T and Z mean.
For now I build this string in Python with this code:
dt_now.strftime('%Y-%m-%dT%H:%M:%SZ')
However, it looks a bit awkward and I am not sure it is correct.
Is there a better way to do this with Python's datetime?

This looks like the ISO 8601 time format. The T stands for time and separates the date from the time, while Z indicates the time offset and stands for Zulu, a name of military origin that is commonly used as an alias for the UTC+0 offset. Other offsets are written as HH:MM prefixed with + or -. Z is therefore equivalent to writing +00:00.
See https://en.wikipedia.org/wiki/ISO_8601 for more info.
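A minimal sketch of producing and reading that format with the standard library. It assumes the value really is UTC, because the literal Z in the format string is just text and is only truthful for a UTC datetime:

```python
from datetime import datetime, timezone

# Start from an aware UTC datetime so that the trailing Z is accurate.
dt = datetime(2020, 3, 1, tzinfo=timezone.utc)

stamp = dt.strftime("%Y-%m-%dT%H:%M:%SZ")
print(stamp)           # 2020-03-01T00:00:00Z
print(dt.isoformat())  # 2020-03-01T00:00:00+00:00 (same instant, offset spelled out)

# Reading it back: the Z is matched literally, so attach UTC explicitly.
parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
```

For the current time, datetime.now(timezone.utc) gives a suitable starting value; a naive datetime.now() is local time and would make the Z misleading.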

Is there an easy way to plot and manipulate time duration (hours/minutes/seconds) data in Python? NOT datetime data

I'm working with some video game speedrunning (basically, races where people try to beat a game as fast as they can) data, and I have many different run timings in HH:MM:SS format. I know it's possible to convert to seconds, but I want to keep in this format for the purposes of making the axes on any graphs easy to read.
I have all the data in a data frame already and tried converting the timing data to datetime format, with format = '%H:%M:%S', but it just uses this as the time on 1900-01-01.
import pandas as pd
data=[['Aggy','01:02:32'], ['Kirby','01:04:54'],['Sally','01:06:04']]
df=pd.DataFrame(data, columns=['Runner','Time'])
df['Time']=pd.to_datetime(df['Time'], format='%H:%M:%S')
I thought specifying the format to be just hours/minutes/seconds would strip away any date, but when I print out the header of my dataframe, it says that the time data is now 1900-01-01 01:02:32, as an example. 1:02:32 AM on January 1st, 1900. I want Python to recognize the 1:02:32 as a duration of time, not a datetime format. What's the best way to go about this?

The format argument defines the format of the input date, not the format of the resulting datetime object (reference).
For your needs you can either use the H:M:S part of the datetime, or use the pd.to_timedelta function.
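A short sketch of the to_timedelta route, reusing the data from the question. The Duration and Seconds column names are made up for this example:

```python
import pandas as pd

data = [["Aggy", "01:02:32"], ["Kirby", "01:04:54"], ["Sally", "01:06:04"]]
df = pd.DataFrame(data, columns=["Runner", "Time"])

# Parse the HH:MM:SS strings as durations rather than as clock times.
df["Duration"] = pd.to_timedelta(df["Time"])

# Durations support arithmetic and comparison directly.
gap = df["Duration"].max() - df["Duration"].min()

# A numeric column is convenient for plotting; the original strings
# can still be used as tick labels.
df["Seconds"] = df["Duration"].dt.total_seconds()
print(df)
print(gap)
```

Keeping the original Time strings next to the numeric column is one way to get readable HH:MM:SS axis labels while plotting on a numeric scale.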

Parsing date which may or may not contain milliseconds

So this question is more about the best way to handle this sort of input in Python. Here is an example input date: 2018-12-31 23:59:59.999999. The milliseconds part may or may not be present in the input.
I am currently using this code to convert it to a datetime:
input_ts = datetime.datetime.strptime(input_str, '%Y-%m-%d %H:%M:%S.%f')
But the problem is that it throws an exception if the input string doesn't contain the milliseconds part, i.e., 2018-12-31 23:59:59.
In Java, I could have approached this problem in two ways (this is a rough explanation that ignores small boundary checks):
(preferred approach) Check the input string length. If it is only 19 characters, the milliseconds are missing; append .000000 to it.
(not preferred) Let the main code parse the string; if it throws an exception, parse it again with the other time format, i.e., %Y-%m-%d %H:%M:%S.
A third approach could be to just strip off the milliseconds.
I am not sure if python has anything built-in to handle these kind of situations. Any suggestions?

You could use python-dateutil library, it is smart enough to parse most of the basic date formats.
import dateutil.parser
dateutil.parser.parse('2018-12-31 23:59:59.999999')
dateutil.parser.parse('2018-12-31 23:59:59')
In case you don't want to install any external libraries, you could iterate over list of different formats as proposed in this answer.
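A minimal standard-library sketch of the try-each-format idea. The FORMATS tuple and the parse_timestamp name are chosen for this example and are not taken from the linked answer:

```python
from datetime import datetime

# Candidate formats, most specific first.
FORMATS = (
    "%Y-%m-%d %H:%M:%S.%f",
    "%Y-%m-%d %H:%M:%S",
)


def parse_timestamp(text):
    """Try each known format in turn and return the first successful parse."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"no known format matches {text!r}")


with_fraction = parse_timestamp("2018-12-31 23:59:59.999999")
without_fraction = parse_timestamp("2018-12-31 23:59:59")
print(with_fraction, without_fraction)
```

On Python 3.7 and later, datetime.fromisoformat should also accept both of these particular strings, which may be enough if the input always follows this layout.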

from datetime import datetime  # import the datetime class from the datetime module
dt = datetime.now()  # get the current time
dt1 = dt.strftime('%Y-%m-%d %H:%M:%S')  # convert the datetime to a string
dt3 = datetime.strptime('2018/5/20', '%Y/%m/%d')  # parse a string into a datetime

Understanding difference in unix epoch time via Python vs. InfluxDB

I've been trying to figure out how to generate the same Unix epoch time that I see within InfluxDB next to measurement entries.
Let me start by saying I am trying to use the same date and time in all tests:
April 01, 2017 at 2:00AM CDT
If I view a measurement in InfluxDB, I see time stamps such as:
1491030000000000000
If I view that measurement in InfluxDB using the -precision rfc3339 it appears as:
2017-04-01T07:00:00Z
So I can see that InfluxDB used UTC
I cannot seem to generate that same timestamp through Python, however.
For instance, I've tried a few different ways:
>>> calendar.timegm(time.strptime('04/01/2017 02:00:00', '%m/%d/%Y %H:%M:%S'))
1491012000
>>> calendar.timegm(time.strptime('04/01/2017 07:00:00', '%m/%d/%Y %H:%M:%S'))
1491030000
>>> t = datetime.datetime(2017, 4, 1, 2, 0, 0)
>>> print("Epoch Seconds:", time.mktime(t.timetuple()))
Epoch Seconds: 1491030000.0
The last two samples above at least appear to give me the same number, but it's much shorter than what InfluxDB has. I am assuming that is related to the precision, InfluxDB does things down to nanosecond I think?
Python Result: 1491030000
Influx Result: 1491030000000000000
If I try to enter a measurement into InfluxDB using the result Python gives me it ends up showing as:
1491030000 = 1970-01-01T00:00:01.49103Z
So I have to add on the extra nine 0's.
I suppose there are a few ways to do this programmatically within Python if it's as simple as adding on nine 0's to the result. But I would like to know why I can't seem to generate the same precision level in just one conversion.
I have a CSV file with tons of old timestamps that are simply, "4/1/17 2:00". Every day at 2 am there is a measurement.
I need to be able to convert that to the proper format that InfluxDB needs "1491030000000000000" to insert all these old measurements.
A better understanding of what is going on and why is more important than how to programmatically solve this in Python. Although I would be grateful to responses that can do both; explain the issue and what I am seeing and why as well as ideas on how to take a CSV with one column that contains time stamps that appear as "4/1/17 2:00" and convert them to timestamps that appear as "1491030000000000000" either in a separate file or in a second column.

InfluxDB can be told to return epoch timestamps in second precision in order to work more easily with tools/libraries that do not support nanosecond precision out of the box, like Python.
Set epoch=s in query parameters to enable this.
See influx HTTP API timestamp format documentation.
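As a rough illustration, the query URL could be assembled like this. The sketch assumes the InfluxDB 1.x HTTP /query endpoint; the host, database name, and query text are placeholders, and build_query_url is a made-up helper. Nothing is sent over the network here:

```python
from urllib.parse import urlencode


def build_query_url(base_url, database, query, epoch="s"):
    """Build a /query URL that asks for epoch timestamps in the given precision."""
    params = urlencode({"db": database, "q": query, "epoch": epoch})
    return f"{base_url}/query?{params}"


# Placeholder host, database and query; replace with your own.
url = build_query_url("http://localhost:8086", "mydb", "SELECT * FROM cpu LIMIT 1")
print(url)
```

With epoch=s the timestamps in the response are plain seconds, which line up with what calendar.timegm and time.mktime return.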

Something like this should work to solve your current problem. I didn't have a test csv to try this on, but it will likely work for you. It will take whatever csv file you put where "old.csv" is and create a second csv with the timestamp in nanoseconds.
import time
import datetime
import csv

DATE_COLUMN = 0  # index of the column that holds the date; adjust for your file

def convertToNano(date):
    # Parse e.g. "4/1/17 2:00" (month/day/year) as local time
    parsed = datetime.datetime.strptime(date, "%m/%d/%y %H:%M")
    secondsTimestamp = int(time.mktime(parsed.timetuple()))
    # InfluxDB expects nanoseconds since the epoch
    return str(secondsTimestamp * 10**9)

with open('old.csv', newline='') as old_csv:
    csv_reader = csv.reader(old_csv)
    with open('new.csv', 'w', newline='') as new_csv:
        csv_writer = csv.writer(new_csv)
        for i, row in enumerate(csv_reader):
            if i != 0:  # skip the header row
                # Append the nanosecond timestamp as a new last column
                row.append(convertToNano(row[DATE_COLUMN]))
                csv_writer.writerow(row)
As to why this is happening, after reading this it seems like you aren't the only one getting frustrated by this issue. It seems as though InfluxDB just happens to use a different precision than most Python modules. Unfortunately, I didn't really see any way around it other than scaling the converted value yourself.
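The difference between the two numbers is only a matter of units: Python's helpers return seconds since the epoch, InfluxDB stores nanoseconds, so the value has to be multiplied by 10^9. A sketch of the conversion that does not depend on the machine's local timezone, assuming every timestamp is in CDT (a fixed UTC-5 offset, as in the question's example); to_influx_ns is a made-up name:

```python
from datetime import datetime, timedelta, timezone

# Assumption: all CSV timestamps are Central Daylight Time (UTC-5).
CDT = timezone(timedelta(hours=-5), "CDT")


def to_influx_ns(text, tz=CDT):
    """Convert e.g. '4/1/17 2:00' (month/day/year) to nanoseconds since the epoch."""
    local = datetime.strptime(text, "%m/%d/%y %H:%M").replace(tzinfo=tz)
    # timestamp() gives seconds since the epoch; scale to nanoseconds.
    return int(local.timestamp()) * 10**9


print(to_influx_ns("4/1/17 2:00"))  # 1491030000000000000
```

A fixed offset is wrong for dates outside daylight saving time. If the CSV spans the whole year, a real zone such as zoneinfo.ZoneInfo("America/Chicago") (Python 3.9+) passed as tz would handle the switch.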
