Pandas: Read Timestamp from CSV in GMT then resample - python

I have a CSV with epoch GMT timestamps at irregular intervals, each paired with a value. I tried reading it from the CSV, but all the times are shifted to my local time zone. How can I read them in as-is (in GMT)? I would then like to resample to one-minute intervals, but skip gaps that are larger than a user-specified value. If that is not possible, is there a way to resample to one minute and fill the gaps with an arbitrary value like 0.0?
Data:
Time,Data
1354979750250,0.2343
1354979755250,2.3433
1354979710250,1.2343
def date_utc(s):
    return parse(s, tzinfos=tzutc)

x = read_csv("time2.csv", date_parser=date_utc, converters={'Time': lambda x: datetime.fromtimestamp(int(x)/1000.)}).set_index('Time')

Convert local datetime to GMT datetime like this:
gmtDatetime = localdatetime - datetime.timedelta(hours=8)
The time zone is +08 (China).
Or use datetime.utcfromtimestamp, which interprets the epoch value as UTC. By contrast, datetime.fromtimestamp(timestamp, tz=None) returns local time when tz is None, which is what shifts the values in your converter. From the Python documentation:
classmethod datetime.utcfromtimestamp(timestamp)
Return the UTC datetime corresponding to the POSIX timestamp, with
tzinfo None. This may raise OverflowError, if the timestamp is out of
the range of values supported by the platform C gmtime() function, and
OSError on gmtime() failure. It’s common for this to be restricted to
years in 1970 through 2038. See also fromtimestamp().
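A pandas-only route may also work for the reading and resampling parts of the question. The sketch below is not taken from the answer above. It assumes the Time and Data columns shown in the question, uses an inline csv_text string as a stand-in for time2.csv, and picks mean() as the per-minute aggregation, which is an assumption. Empty one-minute bins are filled with 0.0.

```python
import io
import pandas as pd

# Stand-in for time2.csv, using the rows from the question.
csv_text = """Time,Data
1354979750250,0.2343
1354979755250,2.3433
1354979710250,1.2343
"""

df = pd.read_csv(io.StringIO(csv_text))

# Interpret the epoch milliseconds as UTC; no local-time shift is applied.
df["Time"] = pd.to_datetime(df["Time"], unit="ms", utc=True)
df = df.set_index("Time").sort_index()

# One-minute bins. Bins without samples come back as NaN and are
# replaced with 0.0 here (an arbitrary fill value, as in the question).
per_minute = df["Data"].resample("1min").mean().fillna(0.0)
print(per_minute)
```

To skip large gaps instead of filling them, one option is to leave out the fillna call and drop the NaN rows afterwards.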

Related

How to solve datetime comparing issue in python

My goal is to compare the datetime now with another datetime given to my program from a json.
After comparing the two datetimes , the result is different from the reality.
The timezone is tz = pytz.timezone('Europe/Athens') which is UTC+3
The json time initially is in string format and after handling I turn the format into datetime
"start_time": "2020-08-11T20:13:00+03:00", the json data
start_time = data.get('start_time')
start_datetime = dateutil.parser.parse(start_time)  # datetime format
Now after calling a function in order to check which datetime is bigger than the other, with
the information that the date now is:
2020-08-11 14:51:21.713511+03:00
and start_date is :
2020-08-11 13:00:00+03:00
the function returns True which is wrong since the start_datetime is not bigger than the datetime now.
Here is the function:
def check_start_datetime_bigger_than_now(start_datetime):
    tz = pytz.timezone('Europe/Athens')
    dts = start_datetime.replace(tzinfo=tz)
    dtnow = datetime.now(pytz.timezone('Europe/Athens'))
    print(dts)
    print(dtnow)
    #compare the datetimes
    if dts >= dtnow:
        return True
    else:
        return False
Can anyone help me clarify what's happening?
Before the comparison, printing the datetimes gives:
2020-08-11 20:13:00+01:35
2020-08-11 15:06:55.397784+03:00
Why does the start date show +01:35?
You should not use datetime.replace to change the timezone of a datetime instance. It only swaps the tzinfo attribute without adjusting anything, and with a pytz timezone other than a simple one like UTC it attaches the zone's earliest offset (local mean time, which is where the +01:35 comes from). Use datetime.astimezone to convert an existing aware datetime to another timezone, or use tz.localize to add a timezone to a naïve datetime instance.
But really, if start_datetime already has a timezone, you do not need to change its timezone for it to be comparable to dtnow. Datetimes from two different timezones are still comparable. Only a mix of naïve and aware datetimes is not comparable.
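The difference between replace and localize can be seen in a small sketch. It assumes pytz is installed; the naive value and the fixed "now" value are made up for illustration so that the result does not depend on when the code runs.

```python
from datetime import datetime, timedelta
import pytz

athens = pytz.timezone("Europe/Athens")
naive = datetime(2020, 8, 11, 20, 13)  # illustrative naive value

# replace() attaches the zone object as-is, which for pytz means the
# zone's earliest recorded offset rather than the one valid on this date.
wrong = naive.replace(tzinfo=athens)
# localize() picks the offset that applies on that date (summer time here).
right = athens.localize(naive)

print(wrong.utcoffset())  # the question's printout shows this as +01:35
print(right.utcoffset())

# An aware datetime compares correctly against one in a different zone.
fixed_now_utc = datetime(2020, 8, 11, 12, 6, 55, tzinfo=pytz.utc)
print(right >= fixed_now_utc)
```

If start_datetime was parsed from a string that already carries +03:00, neither call is needed and it can be compared with datetime.now(tz) directly.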

Localise pandas timestamps with no DST transition

I'm trying to import CSV data from a file produced by a device which has a system clock which is set to 'Australia/Adelaide' time, but doesn't switch from standard to daylight time in summer. I can import it no problem as tz-naive but I need to correlate it with data which is tz-aware.
The following is incorrect as it assumes the data transitions to summer time on '2017-10-01'
data = pd.read_csv('~/dev/datasets/data.csv', parse_dates=['timestamp'], index_col=['timestamp'])
data.index.tz_localize('Australia/Adelaide')
tz_localize contains a number of arguments to deal with ambiguous dates - but I don't see any way to tell it that the data doesn't transition at all. Is there a way to specify a "custom" timezone that's 'Australia/Adelaide', no daylight savings?
Edit: I found this question - Create New Timezone in pytz - which has given me some ideas. In this case the timestamps are a constant offset from UTC, so I can probably add that to the date after importing, localise as UTC, then convert to 'Australia/Adelaide'. I'll report back...
The solution I came up with is as follows:
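Since the data is 'Australia/Adelaide' with no DST transition, the UTC offset is constant all year (+10:30 for this device; for reference, Adelaide standard time is UTC+9:30 and daylight time is UTC+10:30, so use whichever offset the device clock is actually fixed to). Hence a solution is to import the data as tz-naive, subtract 10 hours and 30 minutes, localise as UTC, then convert to 'Australia/Adelaide', i.e.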
data = pd.read_csv('~/dev/datasets/data.csv', parse_dates=['timestamp'], index_col=['timestamp'])
data.index = data.index - pd.DateOffset(hours=10) - pd.DateOffset(minutes=30)
data.index = data.index.tz_localize('UTC').tz_convert('Australia/Adelaide')
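An alternative that avoids the manual subtraction is to localise with a fixed-offset timezone and then convert. This is a sketch under the same assumption as above, namely that the device clock sits at a constant +10:30; the two naive readings are made up for illustration, and the second form is included only to check that both routes agree.

```python
from datetime import timedelta, timezone
import pandas as pd

# Assumed constant device offset; adjust if the clock is fixed at +9:30.
fixed = timezone(timedelta(hours=10, minutes=30))

# Made-up naive readings on either side of the 2017-10-01 transition.
idx = pd.DatetimeIndex(["2017-09-30 12:00", "2017-10-01 12:00"])

# Localise with the fixed offset, then convert to the real zone.
via_fixed = idx.tz_localize(fixed).tz_convert("Australia/Adelaide")

# The subtract-then-localise route from the solution above.
shifted = idx - pd.Timedelta(hours=10, minutes=30)
via_shift = shifted.tz_localize("UTC").tz_convert("Australia/Adelaide")

print(via_fixed)
print((via_fixed == via_shift).all())
```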

Python - How to convert datetime data using toordinal considering the time

Let's assume that I have the following data:
25/01/2000 05:50
When I convert it using datetime.toordinal, it returns this value:
730144
That's nice, but this value just considers the date itself. I also want it to consider the hour and minutes (05:50). How can I do it using datetime?
EDIT:
I want to convert a whole Pandas Series.
An ordinal date by definition considers only the year and the day of the year, i.e. its resolution is 1 day.
You can get the seconds since the epoch (as a float, so the time of day is included) using
datetime.datetime.strptime('25/01/2000 05:50', '%d/%m/%Y %H:%M').timestamp()
For a pandas Series you can do
s = pd.Series(['25/01/2000 05:50', '25/01/2000 05:50', '25/01/2000 05:50'])
s = pd.to_datetime(s, format='%d/%m/%Y %H:%M') # make sure you're dealing with datetime instances; the explicit format avoids day/month ambiguity
s.apply(lambda v: v.timestamp())
If you use Python 3.x, you can get the date with time as seconds since 1/1/1970 00:00:
from datetime import datetime
dt = datetime.today() # Get timezone naive now
seconds = dt.timestamp()
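If the ordinal scale itself is wanted, one possible approach is to add the elapsed fraction of the day to toordinal(). This is a sketch rather than something either answer proposes; fractional_ordinal is a hypothetical helper name, and the second Series value is made up for illustration.

```python
from datetime import datetime
import pandas as pd

def fractional_ordinal(dt):
    """Proleptic Gregorian ordinal plus the elapsed fraction of the day."""
    midnight = dt.replace(hour=0, minute=0, second=0, microsecond=0)
    return dt.toordinal() + (dt - midnight).total_seconds() / 86400

dt = datetime.strptime("25/01/2000 05:50", "%d/%m/%Y %H:%M")
print(fractional_ordinal(dt))

# The same idea for a whole Series, without a Python-level loop for the time part.
s = pd.to_datetime(
    pd.Series(["25/01/2000 05:50", "26/01/2000 18:00"]),
    format="%d/%m/%Y %H:%M",
)
day_part = (s - s.dt.normalize()) / pd.Timedelta(days=1)
ordinals = s.map(lambda t: t.toordinal()) + day_part
print(ordinals)
```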

Python: How to compare two date/time?

I have the following two date/time which are date_time1 and date_time2 respectively:
2017-04-15 00:00:00
2017-04-17 15:35:19+00:00
parsed1 = dateutil.parser.parse(date_time1)
parsed2 = dateutil.parser.parse(date_time2)
If I were to receive another date/time called input_date_time (e.g. 2017-04-16 12:11:42+00:00), I would like to do the following:
# Would like to check if `input_date_time` is within the range
if parsed1 <= input_date_time <= parsed2:
    …
I got an error: TypeError: can't compare offset-naive and offset-aware datetimes
I thought of breaking it down into year, month, day, hour, minute, and second, and comparing each one.
What would be the proper way to do so?
Here is my edited (again) example.
I think we should provide timezone data to every datetime object.
Assume that date_time1 is a local time.
I think we should add timezone data to date_time1 instead of clearing the other tzinfo (my first example).
import dateutil.parser
import datetime
from pytz import utc
date_time1 ='2017-04-15 00:00:00'
date_time2 ='2017-04-17 15:35:19+00:00'
input_date_time = '2017-04-16 12:11:42+00:00'
parsed1 = dateutil.parser.parse(date_time1).astimezone(utc)
parsed2 = dateutil.parser.parse(date_time2)
input_parsed = dateutil.parser.parse(input_date_time)
if parsed1 <= input_parsed <= parsed2:
    print('input is between')
This checks whether the input is between parsed1 and parsed2.
Assuming you have Python datetime objects,
two objects in Python can be compared with the "<", "==", and ">" operators.
You don't need to parse them to compare them.
if date_time1 <= input_date_time <= date_time2:
    pass  # do work
If you don't have datetime objects, there is also a class called datetime in the datetime module, which will allow you to create datetime objects, if you find that useful.
You need to apply a timezone to the 'naive' datetime object (2017-04-15 00:00:00 in your example) to make it TZ aware, OR convert the 'aware' datetime objects (2017-04-17 15:35:19+00:00 in your example, and the date you are trying to compare) to 'naive' objects.
Then your TypeError will disappear.
Since your second date has a timezone offset of +00:00 and your input_datetime is also +00:00, let's apply UTC to the naive first date (assuming that it's the correct timezone) and then convert it to whatever timezone you need (you can skip the conversion if UTC is correct - the comparison will now work.)
parsed1 = dateutil.parser.parse(date_time1)
parsed2 = dateutil.parser.parse(date_time2)
# make parsed1 timezone aware (UTC)
parsed1 = parsed1.replace(tzinfo=pytz.utc)
Now your comparison should work.
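The error and the fix can be reproduced in a few lines. This sketch swaps in the standard library's datetime.fromisoformat and timezone.utc for dateutil and pytz so that it has no third-party dependencies; treating the naive start value as UTC is an assumption, as noted above.

```python
from datetime import datetime, timezone

start = datetime.fromisoformat("2017-04-15 00:00:00")        # naive
end = datetime.fromisoformat("2017-04-17 15:35:19+00:00")    # aware
probe = datetime.fromisoformat("2017-04-16 12:11:42+00:00")  # aware

# Ordering a naive datetime against an aware one raises TypeError.
try:
    start <= probe
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Assumption: the naive value was meant as UTC, so attach UTC to it.
start_utc = start.replace(tzinfo=timezone.utc)
in_range = start_utc <= probe <= end
print(mixed_ok, in_range)
```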
If you want to apply another timezone to any of the dates, you can use the astimezone function. Let's change the timezone to the one applicable to Sydney, Australia. Here is a list of timezones: https://gist.github.com/heyalexej/8bf688fd67d7199be4a1682b3eec7568
syd_tz = pytz.timezone('Australia/Sydney')
syd_parsed1 = parsed1.astimezone(syd_tz)
You can now check what timezone is applied to each of your datetime objects using the %z and %Z parameters for strftime. Using %c will print it in the local time format, as will %x and %X.
Using Python3+:
print("Local time: %s" % syd_parsed1.strftime('%c'))
print("Offset-Timezone-Date-Time: %s" % syd_parsed1.strftime("%z-%Z-%x-%X"))
Hope that helps, the timezone functions did my head in when I used them the first time when I didn't know about %c.

Pandas: Calculate the difference between two Datetime columns from different timezones

I have two different time series. One is a series of timestamps in ms-format from the CET timezone delivered as strings. The other are unix-timestamps in s-format in the UTC timezone.
Each of them is in a column in a larger dataframe, none of them is a DatetimeIndex and should not be one.
I need to convert the CET time to UTC and then calculate the difference between both columns and I'm lost between the Datetime functionalities of Python and Pandas, and the variety of different datatypes.
Here's an example:
import pandas as pd
import pytz
germany = pytz.timezone('Europe/Berlin')
D1 = ["2016-08-22 00:23:58.254","2016-08-22 00:23:58.254",
"2016-08-22 00:23:58.254","2016-08-22 00:40:33.260",
"2016-08-22 00:40:33.260","2016-08-22 00:40:33.260"]
D2 = [1470031195, 1470031195, 1470031195, 1471772027, 1471765890, 1471765890]
S1 = pd.to_datetime(pd.Series(D1))
S2 = pd.to_datetime(pd.Series(D2),unit='s')
First problem
is with the use of tz_localize. I need the program to understand, that the data in S1 is not in UTC, but in CET. However using tz_localize like this seems to interpret the given datetime as CET assuming it's UTC to begin with:
F1 = S1.apply(lambda x: x.tz_localize(germany)).to_frame()
Trying tz_convert always throws something like:
TypeError: index is not a valid DatetimeIndex or PeriodIndex
Second problem
is that even with both of them having the same format I'm stuck because I can't calculate the difference between the two columns now:
F1 = S1.apply(lambda x: x.tz_localize(germany)).to_frame()
F1.columns = ["CET"]
F2 = S2.apply(lambda x: x.tz_localize('UTC')).to_frame()
F2.columns = ["UTC"]
FF = pd.merge(F1,F2,left_index=True,right_index=True)
FF.CET-FF.UTC
ValueError: Incompatbile tz's on datetime subtraction ops
I need a way to do these calculation with tz-aware datetime objects that are no DatetimeIndex objects.
Alternatively I need a way to make my CET-column to just look like this:
2016-08-21 22:23:58.254
2016-08-21 22:23:58.254
2016-08-21 22:23:58.254
2016-08-21 22:40:33.260
2016-08-21 22:40:33.260
2016-08-21 22:40:33.260
That is, I don't need my datetime to be tz-aware, I just want to convert it automatically by adding/subtracting the necessary amount of time with an awareness for daylight saving times.
If it weren't for DST I could just do a simple subtraction on two integers.
First you need to convert the CET timestamps to datetime and specify the timezone:
S1 = pd.to_datetime(pd.Series(D1))
T1_cet = pd.DatetimeIndex(S1).tz_localize('Europe/Berlin')
Then convert the UTC timestamps to datetime and specify the timezone to avoid confusion:
S2 = pd.to_datetime(pd.Series(D2), unit='s')
T2_utc = pd.DatetimeIndex(S2).tz_localize('UTC')
Now convert the CET timestamps to UTC:
T1_utc = T1_cet.tz_convert('UTC')
And finally calculate the difference between the timestamps:
diff = pd.Series(T1_utc) - pd.Series(T2_utc)
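Since the question asks to keep both series as ordinary columns, the same steps can probably be done with the .dt accessor instead of building a DatetimeIndex. This is a sketch using the first and fourth sample values from the question; the column names CET_as_UTC, diff, and CET_as_UTC_naive are made up for illustration.

```python
import pandas as pd

D1 = ["2016-08-22 00:23:58.254", "2016-08-22 00:40:33.260"]
D2 = [1470031195, 1471772027]

df = pd.DataFrame({
    "CET": pd.to_datetime(pd.Series(D1)),
    "UTC": pd.to_datetime(pd.Series(D2), unit="s", utc=True),
})

# Declare the wall-clock values as Berlin time, then convert them to UTC.
df["CET_as_UTC"] = df["CET"].dt.tz_localize("Europe/Berlin").dt.tz_convert("UTC")

# Both columns now share one timezone, so subtraction is allowed.
df["diff"] = df["CET_as_UTC"] - df["UTC"]

# Optional: drop the tz info to get plain UTC wall-clock values.
df["CET_as_UTC_naive"] = df["CET_as_UTC"].dt.tz_localize(None)
print(df)
```

The naive column corresponds to the "just look like this" output requested in the question, with the daylight-saving offset already accounted for.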
