Pandas: Calculate the difference between two Datetime columns from different timezones - python

I have two different time series. One is a series of CET timestamps with millisecond resolution, delivered as strings. The other is a series of Unix timestamps in seconds, in UTC.
Each of them is a column in a larger dataframe; neither is a DatetimeIndex, and neither should become one.
I need to convert the CET times to UTC and then calculate the difference between the two columns, and I'm lost between the datetime functionality of Python and Pandas and the variety of datatypes involved.
Here's an example:
import pandas as pd
import pytz
germany = pytz.timezone('Europe/Berlin')
D1 = ["2016-08-22 00:23:58.254","2016-08-22 00:23:58.254",
"2016-08-22 00:23:58.254","2016-08-22 00:40:33.260",
"2016-08-22 00:40:33.260","2016-08-22 00:40:33.260"]
D2 = [1470031195, 1470031195, 1470031195, 1471772027, 1471765890, 1471765890]
S1 = pd.to_datetime(pd.Series(D1))
S2 = pd.to_datetime(pd.Series(D2),unit='s')
First problem
is with the use of tz_localize. I need the program to understand that the data in S1 is not in UTC but in CET. However, using tz_localize like this seems to interpret the given datetime as CET while assuming it's UTC to begin with:
F1 = S1.apply(lambda x: x.tz_localize(germany)).to_frame()
Trying tz_convert always throws something like:
TypeError: index is not a valid DatetimeIndex or PeriodIndex
Second problem
is that even with both of them in the same format, I'm stuck because I can't calculate the difference between the two columns:
F1 = S1.apply(lambda x: x.tz_localize(germany)).to_frame()
F1.columns = ["CET"]
F2 = S2.apply(lambda x: x.tz_localize('UTC')).to_frame()
F2.columns = ["UTC"]
FF = pd.merge(F1,F2,left_index=True,right_index=True)
FF.CET-FF.UTC
ValueError: Incompatible tz's on datetime subtraction ops
I need a way to do these calculations with tz-aware datetime objects that are not DatetimeIndex objects.
Alternatively, I need a way to make my CET column simply look like this:
2016-08-21 22:23:58.254
2016-08-21 22:23:58.254
2016-08-21 22:23:58.254
2016-08-21 22:40:33.260
2016-08-21 22:40:33.260
2016-08-21 22:40:33.260
That is, I don't need my datetimes to be tz-aware; I just want them converted automatically, adding or subtracting the necessary offset with awareness of daylight saving time.
If it weren't for DST I could just do a simple subtraction on two integers.

First you need to convert the CET timestamps to datetime and specify the timezone:
S1 = pd.to_datetime(pd.Series(D1))
T1_cet = pd.DatetimeIndex(S1).tz_localize('Europe/Berlin')
Then convert the UTC timestamps to datetime and specify the timezone to avoid confusion:
S2 = pd.to_datetime(pd.Series(D2), unit='s')
T2_utc = pd.DatetimeIndex(S2).tz_localize('UTC')
Now convert the CET timestamps to UTC:
T1_utc = T1_cet.tz_convert('UTC')
And finally calculate the difference between the timestamps:
diff = pd.Series(T1_utc) - pd.Series(T2_utc)
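If you would rather avoid DatetimeIndex objects entirely, the same steps can be done column-wise through the Series .dt accessor (a minimal sketch, assuming a reasonably recent pandas where Series.dt.tz_localize and Series.dt.tz_convert are available); dropping the timezone at the end gives the plain UTC column shown in the question:
S1_utc = pd.to_datetime(pd.Series(D1)).dt.tz_localize('Europe/Berlin').dt.tz_convert('UTC')
S2_utc = pd.to_datetime(pd.Series(D2), unit='s').dt.tz_localize('UTC')
diff = S1_utc - S2_utc  # timedelta64[ns] Series
naive_utc = S1_utc.dt.tz_localize(None)  # e.g. 2016-08-21 22:23:58.254, tz-naive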

Related

Python Convert UTC Datetime in string to unix time

I have a column called 'created_at' in dataframe df; its values look like '2/3/15 2:00' in UTC. Now I want to convert it to Unix time. How can I do that?
I tried the script like:
time.mktime(datetime.datetime.strptime(df['created_at'], "%m/%d/%Y, %H:%MM").timetuple())
It returns an error; I guess the tricky part is that the year is '15' instead of '2015'.
Is there an efficient way to deal with this?
Thanks!
Since you mention that you're working with a pandas DataFrame, you can simplify this to
import pandas as pd
import numpy as np
df = pd.DataFrame({'times': ['2/3/15 2:00']})
# to datetime, format is inferred correctly
df['datetime'] = pd.to_datetime(df['times'])
# df['datetime']
# 0 2015-02-03 02:00:00
# Name: datetime, dtype: datetime64[ns]
# to Unix time / seconds since 1970-1-1 Z
# .astype(np.int64) on datetime Series gives you nanoseconds, so divide by 1e9 to get seconds
df['unix'] = df['datetime'].astype(np.int64) / 1e9
# df['unix']
# 0 1.422929e+09
# Name: unix, dtype: float64
%Y is for 4-digit years. Since you have 2-digit years (assuming they're 20##), use the %y specifier instead (note the lowercase y): %y is the year without century, while %Y is the year with century.
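As a minimal sketch of the plain strptime route (assuming month-first dates, as in the example, and that the values are meant as UTC):
from datetime import datetime, timezone

# %y parses the 2-digit year; attach UTC before taking the POSIX timestamp
dt = datetime.strptime('2/3/15 2:00', '%m/%d/%y %H:%M').replace(tzinfo=timezone.utc)
print(dt.timestamp())  # 1422928800.0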

How to solve datetime comparing issue in python

My goal is to compare the current datetime with another datetime given to my program from a JSON payload.
After comparing the two datetimes, the result doesn't match reality.
The timezone is tz = pytz.timezone('Europe/Athens') which is UTC+3
The JSON time is initially a string, which I then parse into a datetime:
"start_time": "2020-08-11T20:13:00+03:00"  (the JSON data)
start_time = data.get('start_time')
start_datetime = dateutil.parser.parse(start_time)  # datetime object
Now I call a function to check which datetime is later than the other. Given that the datetime now is:
2020-08-11 14:51:21.713511+03:00
and start_datetime is:
2020-08-11 13:00:00+03:00
the function returns True, which is wrong, since start_datetime is not later than the datetime now.
Here is the function:
def check_start_datetime_bigger_than_now(start_datetime):
    tz = pytz.timezone('Europe/Athens')
    dts = start_datetime.replace(tzinfo=tz)
    dtnow = datetime.now(pytz.timezone('Europe/Athens'))
    print(dts)
    print(dtnow)
    # compare the datetimes
    if dts >= dtnow:
        return True
    else:
        return False
Can anyone help me clarify what's happening?
Before the comparison, the prints of the datetimes give:
2020-08-11 20:13:00+01:35
2020-08-11 15:06:55.397784+03:00
Why is the start date showing +01:35?
You should not use datetime.replace to change the timezone of a datetime instance. It is not smart and cannot handle anything other than simple timezones like UTC. Use datetime.astimezone to convert an existing aware datetime to another timezone, or use tz.localize to add a timezone to a naïve datetime instance.
But really, if start_datetime already has a timezone, you do not need to change its timezone for it to be comparable to dtnow. Datetimes from two different timezones are still comparable. Only a mix of naïve and aware datetimes aren't comparable.
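A minimal sketch of that advice, reusing the values and libraries from the question:
from datetime import datetime
import dateutil.parser
import pytz

start_datetime = dateutil.parser.parse("2020-08-11T20:13:00+03:00")  # already aware (+03:00)
dtnow = datetime.now(pytz.timezone('Europe/Athens'))  # aware "now"
# aware datetimes from different timezones compare correctly; no replace() needed
print(start_datetime >= dtnow)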

ValueError: time data '10/11/2006 24:00' does not match format '%d/%m/%Y %H:%M'

I tried:
df["datetime_obj"] = df["datetime"].apply(lambda dt: datetime.strptime(dt, "%d/%m/%Y %H:%M"))
but got this error:
ValueError: time data '10/11/2006 24:00' does not match format
'%d/%m/%Y %H:%M'
How do I solve this correctly?
The reason this does not work is that the %H directive only accepts values in the range 00 to 23 (inclusive). This means that 24:00 is, as the error says, not a valid time string.
We therefore have little choice but to convert the string to a valid format: first replace 24:00 with 00:00, and then increment the day for those timestamps.
Like:
from datetime import timedelta
import pandas as pd
df['datetime_zero'] = df['datetime'].str.replace('24:00', '0:00')
df['datetime_er'] = pd.to_datetime(df['datetime_zero'], format='%d/%m/%Y %H:%M')
selrow = df['datetime'].str.contains('24:00')
df['datetime_obj'] = df['datetime_er'] + selrow * timedelta(days=1)
The last line thus adds one day to the rows that contained 24:00, so that '10/11/2006 24:00' ends up as '11/11/2006 00:00'. Note however that the above is rather unsafe, since depending on the timestamp format it may or may not work. For the format above it will (probably) work, since there is only one colon. But if the datetimes have seconds as well, the substring filter could also be triggered by something like 00:24:00, so some extra care is needed to get it working.
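One way to make the replacement safer is to anchor it to the end of the string, so that only the hour field is matched (and not e.g. a 00:24:00 with seconds); a sketch, assuming a pandas version where str.replace accepts the regex= keyword, with the rest of the recipe unchanged:
df['datetime_zero'] = df['datetime'].str.replace(r' 24:00$', ' 00:00', regex=True)
selrow = df['datetime'].str.endswith(' 24:00')  # the leading space keeps e.g. '00:24:00' from matching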
Your data doesn't follow the conventions used by Python / Pandas datetime objects. There should be only one way of storing a particular datetime, i.e. '10/11/2006 24:00' should be rewritten as '11/11/2006 00:00'.
Here's one way to approach the problem:
# find datetimes which have '24:00' and rewrite
twenty_fours = df['strings'].str[-5:] == '24:00'
df.loc[twenty_fours, 'strings'] = df['strings'].str[:-5] + '00:00'
# construct datetime series
df['datetime'] = pd.to_datetime(df['strings'], format='%d/%m/%Y %H:%M')
# add one day where applicable
df.loc[twenty_fours, 'datetime'] += pd.DateOffset(1)
Here's some data to test:
dateList = ['10/11/2006 24:00', '11/11/2006 00:00', '12/11/2006 15:00']
df = pd.DataFrame({'strings': dateList})
Result after transformations described above:
print(df['datetime'])
0 2006-11-11 00:00:00
1 2006-11-11 00:00:00
2 2006-11-12 15:00:00
Name: datetime, dtype: datetime64[ns]
As indicated in the documentation (https://docs.python.org/2/library/datetime.html#strftime-strptime-behavior), hours go from 00 to 23. 24:00 is then an error.

Python - How to convert datetime data using toordinal considering the time

Let's assume that I have the following data:
25/01/2000 05:50
When I convert it using datetime.toordinal, it returns this value:
730144
That's nice, but this value just considers the date itself. I also want it to consider the hour and minutes (05:50). How can I do it using datetime?
EDIT:
I want to convert a whole Pandas Series.
An ordinal date by definition only considers the year and the day of the year, i.e. its resolution is 1 day.
You can get the seconds since the epoch (a float, with up to microsecond precision depending on your platform) using
datetime.datetime.strptime('25/01/2000 05:50', '%d/%m/%Y %H:%M').timestamp()
For a pandas Series you can do
s = pd.Series(['25/01/2000 05:50', '25/01/2000 05:50', '25/01/2000 05:50'])
s = pd.to_datetime(s) # make sure you're dealing with datetime instances
s.apply(lambda v: v.timestamp())
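For a large Series, a vectorized sketch that avoids apply (assuming the values are naive and meant as UTC, and a reasonably recent pandas):
unix_seconds = (s - pd.Timestamp('1970-01-01')) // pd.Timedelta('1s')  # integer seconds since the epoch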
If you use Python 3.x, you can get the date and time as seconds since 1970-01-01 00:00:
from datetime import datetime
dt = datetime.today() # Get timezone naive now
seconds = dt.timestamp()
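Note that timestamp() on a naive datetime interprets it as local time; if the naive value is actually meant as UTC, attach the UTC timezone first (a small illustrative sketch):
from datetime import datetime, timezone

dt = datetime(2000, 1, 25, 5, 50)  # naive
print(dt.replace(tzinfo=timezone.utc).timestamp())  # 948779400.0, treating the value as UTC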

how to subtract date from date from sql in python

I run a SQL query that returns a date in the format '2015-03-01T17:09:00.000+0000'. I want to subtract this from today's date.
I am getting today's date with the following:
import datetime
now = datetime.datetime.now()
The formats don't seem to line up and I can't figure out a standardized format.
You can use strptime from the datetime module to get a Python datetime from your query result, using a format string. (You might have to play with the format string a bit to suit your case.)
Match ts = '2015-03-01T17:09:00.000+0000' to a format string like
f = '%Y-%m-%dT%H:%M:%S.%f%z'
date_from_sql = datetime.datetime.strptime(ts, f)  # timezone-aware, thanks to %z
now = datetime.datetime.now(datetime.timezone.utc)  # "now" must also be aware, or the subtraction raises TypeError
delta = date_from_sql - now
The .000 is probably microseconds (denoted by %f in the format string) and the +0000 is the utc offset (denoted by %z in the format string). Check this out for more formatting options: https://docs.python.org/2/library/datetime.html#strftime-strptime-behavior
Check out this thread for an example: what is the proper way to convert between mysql datetime and python timestamp?
Check out this for more on strptime: https://docs.python.org/2/library/datetime.html#datetime.datetime.strptime
Getting the delta between two datetime objects in Python is really simple: you just subtract them.
import datetime
d1 = datetime.datetime.now()
d2 = datetime.datetime.now()
delta = d2 - d1
print(delta.total_seconds())
d2 - d1 returns a datetime.timedelta object, from which you can get the total second difference between the two dates.
As for formatting the dates, you can read about parsing strings into datetime objects, and formatting datetime objects into strings, here.
You'll read about the strftime() and strptime() functions, and with them you can get yourself two datetime objects which you can subtract from each other.
