I'm trying to convert timestamps in EST to various localized timestamps in a pandas dataframe. I have a dataframe with timestamps in EST and a column holding the timezone into which each timestamp needs to be converted.
I know that there are already several threads on topics like this. However, they either start in UTC or I can't replicate their approach with my data.
Before writing, I consulted: How to convert GMT time to EST time using python
I imported the data:
import pandas as pd
import datetime as dt
import pytz
transaction_timestamp_est local_timezone
2013-05-28 05:18:00+00:00 America/Chicago
2013-06-12 05:23:20+00:00 America/Los_Angeles
2014-06-21 05:26:26+00:00 America/New_York
I converted to datetime and created the following function:
df.transaction_timestamp_est = pd.to_datetime(df.transaction_timestamp_est)

def db_time_to_local(row):
    db_tz = pytz.timezone('America/New_York')
    local_tz = pytz.timezone(row['local_timezone'])
    db_date = db_tz.localize(row['transaction_timestamp_est'])
    local_date = db_date.astimezone(local_tz)
    return local_date
I run it here:
df['local_timestamp'] = df.apply(db_time_to_local, axis=1)
And get this error:
ValueError: ('Not naive datetime (tzinfo is already set)', 'occurred at index 0')
I expect a new column in the dataframe called 'local_timestamp' that has the timestamp adjusted according to the data in the local_timezone column.
Any help is appreciated!
The error you see looks like it's because you are trying to localize a tz-aware timestamp. The '+00:00' in your timestamps indicates these are tz-aware, in UTC (or something like it).
Some terminology: a naive date/time has no concept of timezone; a tz-aware (or localized) one is associated with a particular timezone. Localizing refers to converting a tz-naive date/time to a tz-aware one. By definition you can't localize a tz-aware date/time: you either convert it to naive and then localize, or convert it directly to the target timezone.
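A minimal illustration of the difference with pd.Timestamp (the values are taken from your data):
import pandas as pd
ts = pd.Timestamp('2013-05-28 05:18:00')         # naive: no timezone attached
aware = ts.tz_localize('America/New_York')       # localize: attach a zone to a naive timestamp
converted = aware.tz_convert('America/Chicago')  # convert: same instant, different zone
# aware.tz_localize('UTC') raises TypeError, because aware already has a timezone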
To get that column into EST, convert to naive and then localize to EST:
In [98]: df['transaction_timestamp_est'] = df['transaction_timestamp_est'].dt.tz_localize(None).dt.tz_localize('EST')
In [99]: df
Out[99]:
0 2013-05-28 05:18:00-05:00
1 2013-06-12 05:23:20-05:00
2 2014-06-21 05:26:26-05:00
Name: transaction_timestamp_est, dtype: datetime64[ns, EST]
Note the 'EST' in the dtype.
Then, you can convert each timestamp to its target timezone:
In [100]: df['local_ts'] = df.apply(lambda x: x['transaction_timestamp_est'].tz_convert(x['local_timezone']), axis=1)
In [101]: df
Out[101]:
transaction_timestamp_est local_timezone local_ts
0 2013-05-28 05:18:00-05:00 America/Chicago 2013-05-28 05:18:00-05:00
1 2013-06-12 05:23:20-05:00 America/Los_Angeles 2013-06-12 03:23:20-07:00
2 2014-06-21 05:26:26-05:00 America/New_York 2014-06-21 06:26:26-04:00
To explain: each element in the first column is of type pd.Timestamp. Its tz_convert() method changes its timezone, converting the date/time to the new zone.
This produces a column of pd.Timestamps with a mixture of timezones, which is a pain to handle in pandas. Most (perhaps all) pandas functions that operate on columns of date/times require the whole column to have the same timezone.
If you prefer, convert to tz-naive:
In [102]: df['local_ts'] = df.apply(lambda x: x['transaction_timestamp_est'].tz_convert(x['local_timezone']).tz_convert(None), axis=1)
In [103]: df
Out[103]:
transaction_timestamp_est local_timezone local_ts
0 2013-05-28 05:18:00-05:00 America/Chicago 2013-05-28 10:18:00
1 2013-06-12 05:23:20-05:00 America/Los_Angeles 2013-06-12 10:23:20
2 2014-06-21 05:26:26-05:00 America/New_York 2014-06-21 10:26:26
If your data allows, it's better to keep columns of timestamps (or indices) in a single timezone. UTC is usually best, as it doesn't have DST transitions or other quirks that can result in missing or ambiguous times, as most other timezones do.
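For example, since the column above is already tz-aware, converting the whole of it to UTC is a one-liner (the new column name is just for illustration):
df['transaction_timestamp_utc'] = df['transaction_timestamp_est'].dt.tz_convert('UTC')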
To get the Unix timestamp of local midnight in a given timezone on the day containing a given epoch second:
from datetime import datetime, time, date
from pytz import timezone, utc

tz = timezone("Asia/Dubai")
d = datetime.fromtimestamp(1426017600, tz)  # epoch second -> tz-aware local datetime
print(d)
# local midnight of the same day; is_dst=None raises on ambiguous/nonexistent times
midnight = tz.localize(datetime.combine(date(d.year, d.month, d.day), time(0, 0)), is_dst=None)
# back to seconds since the epoch
print(int((midnight - datetime(1970, 1, 1, tzinfo=utc)).total_seconds()))
Based on code from python - datetime with timezone to epoch
date_cet col1
---------------------------------------
2021-10-31 02:00:00+02:00 7.0
2021-10-31 02:00:00+02:00 7.0
2021-10-31 02:00:00+02:00 8.0
2021-10-31 02:00:00+01:00 10.0
2021-10-31 02:00:00+01:00 11.0
I have a data frame with columns looking similar to this. The data is imported from SQL into a pandas data frame, and when I print the dtypes I can see that the date_cet column is object. Since I need it further on, I want to convert it to a datetime column. However, everything I've tried just doesn't work, and I think it might have something to do with 1) the timezone difference and 2) the fact that this date is when DST changes (i.e. the +01:00 and +02:00).
I've tried to do stuff like this:
import datetime as dt
df["new_date"] = [dt.datetime.strptime(str(x), "%Y-%m-%d %H:%M:%S %z") for x in df["date_cet"]]
df['new_date']= pd.to_datetime(df['date_cet'])
and a handful of other things.
The first gives an error of:
ValueError: time data '2021-10-31 02:00:00+02:00' does not match format '%Y-%m-%d %H:%M:%S %z'
And the last:
ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True
Basically, I have no idea how to fix this. I just need this column to become a datetime[ns, Europe/Copenhagen] type of column, but everything I've done so far doesn't work.
In the datetime string ('2021-10-31 02:00:00+02:00') there is no space between the seconds and the UTC offset, so the format must not have a space between %S and %z either.
Try changing the format to "%Y-%m-%d %H:%M:%S%z":
import datetime as dt
df["new_date"] = [dt.datetime.strptime(str(x), "%Y-%m-%d %H:%M:%S%z") for x in df["date_cet"]]
df['new_date']= pd.to_datetime(df['date_cet'])
Update:
To fix the second error, try adding utc=True:
import datetime as dt
df["new_date"] = [dt.datetime.strptime(str(x), "%Y-%m-%d %H:%M:%S%z") for x in df["date_cet"]]
df['new_date']= pd.to_datetime(df['date_cet'], utc=True)
You can do it all in one line:
df['new_date']= pd.to_datetime(df['date_cet'], format="%Y-%m-%d %H:%M:%S%z", utc=True)
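And since you want a datetime64[ns, Europe/Copenhagen] column rather than UTC, you can convert after parsing:
df['new_date'] = pd.to_datetime(df['date_cet'], utc=True).dt.tz_convert('Europe/Copenhagen')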
I have a column called 'created_at' in dataframe df; its values look like '2/3/15 2:00' and are in UTC. Now I want to convert them to Unix time. How can I do that?
I tried the script like:
time.mktime(datetime.datetime.strptime(df['created_at'], "%m/%d/%Y, %H:%MM").timetuple())
It returns an error. I guess the tricky part is that the year is '15' instead of '2015'.
Is there any efficient way that I am able to deal with it?
Thanks!
Since you mention that you're working with a pandas DataFrame, you can simplify things by using
import pandas as pd
import numpy as np
df = pd.DataFrame({'times': ['2/3/15 2:00']})
# to datetime, format is inferred correctly
df['datetime'] = pd.to_datetime(df['times'])
# df['datetime']
# 0 2015-02-03 02:00:00
# Name: datetime, dtype: datetime64[ns]
# to Unix time / seconds since 1970-1-1 Z
# .astype(np.int64) on datetime Series gives you nanoseconds, so divide by 1e9 to get seconds
df['unix'] = df['datetime'].astype(np.int64) / 1e9
# df['unix']
# 0 1.422929e+09
# Name: unix, dtype: float64
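If you'd rather have integer seconds than floats, floor-divide the nanoseconds instead:
df['unix'] = df['datetime'].astype(np.int64) // 10**9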
%Y is for 4-digit years.
Since you have 2-digit years (assuming they're 20##), you can use the %y specifier instead (notice the lower-case y).
You should use lowercase %y (year without century) rather than uppercase %Y (year with century)
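Putting that together, a sketch of the fix (using calendar.timegm instead of time.mktime, because the strings are in UTC and mktime interprets the tuple as local time):
import calendar
import datetime

ts = datetime.datetime.strptime('2/3/15 2:00', '%m/%d/%y %H:%M')
unix = calendar.timegm(ts.timetuple())  # 1422928800, seconds since the epoch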
I have two different time series. One is a series of timestamps with millisecond precision in the CET timezone, delivered as strings. The other is a series of Unix timestamps with second precision in UTC.
Each of them is a column in a larger dataframe; neither is a DatetimeIndex, and neither should become one.
I need to convert the CET times to UTC and then calculate the difference between the two columns, and I'm lost between the datetime functionality of Python and pandas and the variety of different datatypes.
Here's an example:
import pandas as pd
import pytz
germany = pytz.timezone('Europe/Berlin')
D1 = ["2016-08-22 00:23:58.254","2016-08-22 00:23:58.254",
"2016-08-22 00:23:58.254","2016-08-22 00:40:33.260",
"2016-08-22 00:40:33.260","2016-08-22 00:40:33.260"]
D2 = [1470031195, 1470031195, 1470031195, 1471772027, 1471765890, 1471765890]
S1 = pd.to_datetime(pd.Series(D1))
S2 = pd.to_datetime(pd.Series(D2),unit='s')
First problem
is with the use of tz_localize. I need the program to understand that the data in S1 is not in UTC but in CET. However, using tz_localize like this seems to interpret the given datetime as CET while assuming it's UTC to begin with:
F1 = S1.apply(lambda x: x.tz_localize(germany)).to_frame()
Trying tz_convert always throws something like:
TypeError: index is not a valid DatetimeIndex or PeriodIndex
Second problem
is that even with both of them having the same format I'm stuck because I can't calculate the difference between the two columns now:
F1 = S1.apply(lambda x: x.tz_localize(germany)).to_frame()
F1.columns = ["CET"]
F2 = S2.apply(lambda x: x.tz_localize('UTC')).to_frame()
F2.columns = ["UTC"]
FF = pd.merge(F1,F2,left_index=True,right_index=True)
FF.CET-FF.UTC
ValueError: Incompatible tz's on datetime subtraction ops
I need a way to do these calculation with tz-aware datetime objects that are no DatetimeIndex objects.
Alternatively, I need a way to make my CET column just look like this:
2016-08-21 22:23:58.254
2016-08-21 22:23:58.254
2016-08-21 22:23:58.254
2016-08-21 22:40:33.260
2016-08-21 22:40:33.260
2016-08-21 22:40:33.260
That is, I don't need my datetimes to be tz-aware; I just want them converted automatically, adding or subtracting the necessary amount of time with awareness of daylight saving time.
If it weren't for DST, I could just do a simple subtraction on two integers.
First you need to convert the CET timestamps to datetime and specify the timezone:
S1 = pd.to_datetime(pd.Series(D1))
T1_cet = pd.DatetimeIndex(S1).tz_localize('Europe/Berlin')
Then convert the UTC timestamps to datetime and specify the timezone to avoid confusion:
S2 = pd.to_datetime(pd.Series(D2), unit='s')
T2_utc = pd.DatetimeIndex(S2).tz_localize('UTC')
Now convert the CET timestamps to UTC:
T1_utc = T1_cet.tz_convert('UTC')
And finally calculate the difference between the timestamps:
diff = pd.Series(T1_utc) - pd.Series(T2_utc)
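If you'd rather avoid DatetimeIndex entirely, as the question asks, the same steps work on the Series through the .dt accessor:
T1_utc = S1.dt.tz_localize('Europe/Berlin').dt.tz_convert('UTC')
T2_utc = S2.dt.tz_localize('UTC')
diff = T1_utc - T2_utc  # a timedelta64[ns] Series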
I have some measurements that happened on specific days in a dictionary. It looks like
date_dictionary['YYYY-MM-DD'] = measurement.
I want to calculate the variance between the measurements within 7 days from a given date. When I convert the date strings to a datetime.datetime, the result looks like a tuple or an array, but doesn't behave like one.
Is there an easy way to generate all the dates one week from a given date? If so, how can I do that efficiently?
You can do this using timedelta. Example:
>>> from datetime import datetime,timedelta
>>> d = datetime.strptime('2015-07-22','%Y-%m-%d')
>>> for i in range(1,8):
... print(d + timedelta(days=i))
...
2015-07-23 00:00:00
2015-07-24 00:00:00
2015-07-25 00:00:00
2015-07-26 00:00:00
2015-07-27 00:00:00
2015-07-28 00:00:00
2015-07-29 00:00:00
You do not actually need to print it; adding a timedelta to a datetime returns a datetime object, which you can use directly in your calculation.
Using datetime, to generate the 7 dates starting from a given date, the given date included, you can do:
import datetime
dt = datetime.datetime(...)
week_dates = [ dt + datetime.timedelta(days=i) for i in range(7) ]
There are libraries providing nicer APIs for performing datetime/date operations, most notably pandas (though it includes much much more). See pandas.date_range.
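For instance, the pandas equivalent of the list comprehension above:
import pandas as pd
week_dates = pd.date_range('2015-07-22', periods=7)  # DatetimeIndex of 7 consecutive days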
I am converting all ISO-8601 formatted values into Unix values. For some inexplicable reason this line
a_col = pd.DatetimeIndex(a_col).astype(np.int64)/10**6
raises the error
ValueError: Unable to convert 0   2001-06-29
            ...
            Name: DateCol, dtype: datetime64[ns] to datetime dtype
(output of the column abbreviated)
This is very odd because I've guaranteed that each value is in datetime.datetime format as you can see here:
if a_col.dtypes is (np.dtype('object') or np.dtype('O')):
a_col = a_col.apply(lambda x: x if isinstance(x, datetime.datetime) else epoch)
a_col = pd.DatetimeIndex(a_col).astype(np.int64)/10**6
Here epoch is a datetime.datetime.
When I check the dtype of the column that gives me the error, it's "object", exactly what I'm checking for. Is there something I'm missing?
Assuming that your time zone is US/Eastern (based on your dataset) and that your DataFrame is named df, please try the following:
import datetime as dt
from time import mktime
import pytz
df['Job Start Date'] = df['Job Start Date'].apply(
    lambda x: mktime(pytz.timezone('US/Eastern').localize(x).astimezone(pytz.UTC).timetuple()))
>>> df['Job Start Date'].head()
0 993816000
1 1080824400
2 1052913600
3 1080824400
4 1075467600
Name: Job Start Date, dtype: float64
You first need to make your naive datetime objects timezone-aware (US/Eastern) and then convert them to UTC. Finally, pass the UTC-aware datetime's timetuple to the mktime function from the time module.
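A sketch of a vectorized alternative, assuming the column is already datetime64 and contains no ambiguous or nonexistent DST times; it also sidesteps mktime's dependence on the machine's local timezone:
import numpy as np
df['Job Start Date'] = (df['Job Start Date']
                        .dt.tz_localize('US/Eastern')
                        .astype(np.int64) // 10**9)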