How do I properly convert the current date in Python, for example, into an InfluxDB timestamp? What is the correct formula for this? I have tried many solutions and I still get a result where the timestamp comes out as, for example, 1970-01-01T00:00:00.15436224Z.
Related
I am seeing a discrepancy when subtracting dates in PostgreSQL and SQLAlchemy. For instance, I have the following in PostgreSQL:
SELECT trunc(EXTRACT(EPOCH FROM ('2019-07-05 15:20:10.111497-07:00'::timestamp - '2019-07-04 11:45:17.293328-07:00'::timestamp)))
--99292
and the following query in SQLAlchemy:
date_diff = session.query(func.trunc(
    func.extract('epoch', func.date('2019-07-05 15:20:10.111497-07:00')) -
    func.extract('epoch', func.date('2019-07-04 11:45:17.293328-07:00'))
)).all()
print(date_diff)
#[(86400.0,)]
We can see that the more exact difference comes from the PostgreSQL query. How can I get the same result using SQLAlchemy? I have not been able to spot the cause of this difference. If you know, please let me know.
Thanks a lot.
I have never used SQLAlchemy before, but it looks like you are trying to truncate to a date instead of a timestamp or datetime.
Don't worry, this is an easy mistake to make. Date/time libraries can be confusing with their definitions (a date is literally just a date, i.e. YYYY-MM-DD, whereas a timestamp includes both the date and the time down to some precision).
This is why you get a difference of 86,400 (one day): it is comparing only the dates of the two values (2019-07-05 - 2019-07-04).
Try using the func.time.as_utc() or something similar to get a timestamp
You want to be comparing the WHOLE timestamp
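A minimal sketch of that idea (untested; it keeps the shape of the question's query and its session object, but casts the string literals to full timestamps with cast(..., DateTime) instead of truncating them with func.date, so EXTRACT sees the time of day as well):

from sqlalchemy import DateTime, cast, func

date_diff = session.query(
    func.trunc(
        func.extract('epoch', cast('2019-07-05 15:20:10.111497-07:00', DateTime)) -
        func.extract('epoch', cast('2019-07-04 11:45:17.293328-07:00', DateTime))
    )
).all()
print(date_diff)
# expected [(99292.0,)] - one day (86400) plus 3:34:52 (12892)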
EDIT: Sorry, didn't see your comment until after posting.
I have a question about using dates on pandas.
In the CSV I am importing (if I sort it), I find that the maximum date is 10/09/2019 18:22:00.
Immediately after importing (while the column is still an object), the maximum that appears is 31/12/2018 12:05.
And if I convert in this way to date and time:
df['Data_Abertura_Processo'] = pd.to_datetime(df['Data_Abertura_Processo'])
the value changes to: Timestamp('2019-12-08 18:40:00').
How do I get the same maximum date that I find in the CSV when filtering it in Excel?
Today I'm using:
df['Data_Abertura_Processo'].max()
Am I doing the conversion wrong, or am I using max() wrong?
df['Data_Abertura_Processo'] = pd.to_datetime(df['Data_Abertura_Processo'],format="%d/%m/%Y %H:%M:%S")
Make sure that your datetimes have all the same format.
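If the column mixes values with and without seconds, a sketch with dayfirst=True (instead of a fixed format) may also work; 'processos.csv' here is just a placeholder file name:

import pandas as pd

df = pd.read_csv('processos.csv')
df['Data_Abertura_Processo'] = pd.to_datetime(df['Data_Abertura_Processo'], dayfirst=True)
print(df['Data_Abertura_Processo'].max())
# should now report the 10/09/2019 18:22:00 maximum seen in Excel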
I am processing a dataset with a date column in it. But the date format is strange to me:
date
59:06.4
42:42.9
07:18.0
......
I have never seen this format before. Could anyone let me know what this format is? And if I use Python to process it, what functions should I use?
I think I know. This is a date + time format. When I read it in Python, it automatically converts into a datetime format.
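For what it's worth, here is a guess rather than a definitive answer: values like 59:06.4 look like minutes:seconds.tenths, which can be parsed with the "%M:%S.%f" format:

import pandas as pd

s = pd.Series(["59:06.4", "42:42.9", "07:18.0"])
parsed = pd.to_datetime(s, format="%M:%S.%f")
print(parsed)          # datetimes on the default date 1900-01-01
print(parsed.dt.time)  # just the minute/second/fraction parts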
I am using Python 2--I am behind moving over my code--so perhaps this issue has gone away.
Using pandas, I can create a datetime like this:
import pandas as pd
big_date= pd.datetime(9999,12,31)
print big_date
9999-12-31 00:00:00
big_date2 = pd.to_datetime(big_date)
. . .
Out of bounds nanosecond timestamp: 9999-12-31 00:00:00
I understand the reason for the error in that there are obviously too many nanoseconds in a date that big. I also know that big_date2 = pd.to_datetime(big_date, errors='ignore') would work. However, in my situation, I have a column of what are supposed to be dates (read from SQL server) and I do indeed want it to change invalid data/dates to NaT. In effect, I was using pd.to_datetime as a validity check. To Pandas, on the one hand, 9999-12-31 is a valid date, and on the other, it's not. That means I can't use it and have had to come up with something else.
I've played around with the arguments in pandas to_datetime and not been able to solve this.
I've looked at other questions/problems of this nature, and not found an answer.
I have a similar issue and was able to find a solution.
I have a pandas dataframe with one column that contains a datetime (retrieved from a database table where the column was a DateTime2 data type), but I need to be able to represent dates that are further in the future than the Timestamp.max value.
Fortunately, I didn't need to worry about the time part of the datetime column - it was actually always 00:00:00 (I didn't create the database design and, yes, it probably should have been a Date data type and not a DateTime2 data type). So I was able to get round the issue by converting the pandas dataframe column to just a date type. For example:
import datetime

for i, row in df.iterrows():
    df.set_value(i, 'DateColumn', datetime.datetime(9999, 12, 31).date())
sets all of the values in the column to the date 9999-12-31 and you don't receive any errors when using this column anymore.
So, if you can afford to lose the time part of the date you are trying to use you can work round the limitation of the datetime values in the dataframe by converting to a date.
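A sketch of that idea under the same assumption (the time part can be dropped): parse the raw values into plain datetime.date objects, which are not subject to the Timestamp nanosecond bounds, and turn anything unparseable into NaT. The column name and the "%Y-%m-%d" input format are placeholders; df is the dataframe from the question.

import datetime
import pandas as pd

def to_date_or_nat(value):
    try:
        return datetime.datetime.strptime(str(value), "%Y-%m-%d").date()
    except ValueError:
        return pd.NaT

df['DateColumn'] = df['DateColumn'].apply(to_date_or_nat)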
I've been trying to figure out how to generate the same Unix epoch time that I see within InfluxDB next to measurement entries.
Let me start by saying I am trying to use the same date and time in all tests:
April 01, 2017 at 2:00AM CDT
If I view a measurement in InfluxDB, I see time stamps such as:
1491030000000000000
If I view that measurement in InfluxDB using the -precision rfc3339 it appears as:
2017-04-01T07:00:00Z
So I can see that InfluxDB stores the timestamp in UTC.
I cannot seem to generate that same timestamp through Python, however.
For instance, I've tried a few different ways:
>>> calendar.timegm(time.strptime('04/01/2017 02:00:00', '%m/%d/%Y %H:%M:%S'))
1491012000
>>> calendar.timegm(time.strptime('04/01/2017 07:00:00', '%m/%d/%Y %H:%M:%S'))
1491030000
>>> t = datetime.datetime(2017,04,01,02,00,00)
>>> print "Epoch Seconds:", time.mktime(t.timetuple())
Epoch Seconds: 1491030000.0
The last two samples above at least appear to give me the same number, but it's much shorter than what InfluxDB has. I am assuming that is related to precision; InfluxDB stores things down to the nanosecond, I think?
Python Result: 1491030000
Influx Result: 1491030000000000000
If I try to enter a measurement into InfluxDB using the result Python gives me it ends up showing as:
1491030000 = 1970-01-01T00:00:01.49103Z
So I have to add on the extra nine 0's.
I suppose there are a few ways to do this programmatically within Python if it's as simple as adding on nine 0's to the result. But I would like to know why I can't seem to generate the same precision level in just one conversion.
I have a CSV file with tons of old timestamps that are simply, "4/1/17 2:00". Every day at 2 am there is a measurement.
I need to be able to convert that to the proper format that InfluxDB needs "1491030000000000000" to insert all these old measurements.
A better understanding of what is going on and why is more important than how to programmatically solve this in Python. Although I would be grateful to responses that can do both; explain the issue and what I am seeing and why as well as ideas on how to take a CSV with one column that contains time stamps that appear as "4/1/17 2:00" and convert them to timestamps that appear as "1491030000000000000" either in a separate file or in a second column.
InfluxDB can be told to return epoch timestamps in second precision in order to work more easily with tools/libraries that do not support nanosecond precision out of the box, like Python.
Set epoch=s in query parameters to enable this.
See influx HTTP API timestamp format documentation.
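A minimal sketch of such a query through the InfluxDB 1.x HTTP API using the requests library (the host, database, and measurement names are placeholders):

import requests

resp = requests.get(
    'http://localhost:8086/query',
    params={
        'db': 'mydb',
        'q': 'SELECT * FROM "measurement" LIMIT 5',
        'epoch': 's',  # return timestamps as epoch seconds
    },
)
print(resp.json())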
Something like this should work to solve your current problem. I didn't have a test csv to try this on, but it will likely work for you. It will take whatever csv file you put where "old.csv" is and create a second csv with the timestamp in nanoseconds.
import time
import datetime
import csv
def convertToNano(date):
    # Parse "4/1/17 2:00" style strings (month/day/year, interpreted as local
    # time by mktime) into epoch seconds, then pad to nanoseconds for InfluxDB.
    secondsTimestamp = time.mktime(datetime.datetime.strptime(date, "%m/%d/%y %H:%M").timetuple())
    nanoTimestamp = str(int(secondsTimestamp)) + "000000000"
    return nanoTimestamp

with open('old.csv', 'rb') as old_csv:
    csv_reader = csv.reader(old_csv)
    with open('new.csv', 'wb') as new_csv:
        csv_writer = csv.writer(new_csv)
        for i, row in enumerate(csv_reader):
            if i != 0:  # skip the header row
                # Put whatever column the date appears in here
                row.append(convertToNano(row[<location of date in the row>]))
            csv_writer.writerow(row)
As to why this is happening: after reading this, it seems you aren't the only one getting frustrated by the issue. InfluxDB just happens to use a different precision than most Python modules. Unfortunately, I didn't see any way around it other than the string manipulation in the date conversion.
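As an alternative sketch that avoids the string padding: treat the CSV value as US Central time (a fixed UTC-5 offset here, matching the CDT example in the question; a real solution would use a timezone library such as pytz), convert it to UTC, take epoch seconds with calendar.timegm, and multiply by 10**9 to get nanoseconds.

import calendar
import datetime

def to_influx_ns(date_string, utc_offset_hours=-5):
    local = datetime.datetime.strptime(date_string, "%m/%d/%y %H:%M")
    utc = local - datetime.timedelta(hours=utc_offset_hours)
    return calendar.timegm(utc.timetuple()) * 10**9

print(to_influx_ns("4/1/17 2:00"))  # 1491030000000000000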