pandas raises ValueError on DatetimeIndex Conversion

I am converting all ISO-8601 formatted values into Unix timestamps. For some inexplicable reason, this line
a_col = pd.DatetimeIndex(a_col).astype(np.int64)/10**6
raises the error
ValueError: Unable to convert 0    2001-06-29
... (abbreviated output of the column)
Name: DateCol, dtype: datetime64[ns] to datetime dtype
This is very odd because I've guaranteed that each value is in datetime.datetime format as you can see here:
if a_col.dtypes is (np.dtype('object') or np.dtype('O')):
    a_col = a_col.apply(lambda x: x if isinstance(x, datetime.datetime) else epoch)
    a_col = pd.DatetimeIndex(a_col).astype(np.int64)/10**6
Epoch is a datetime.datetime instance.
When I check the dtypes of the column that gives me the error, it's "object", exactly what I'm checking for. Is there something I'm missing?

Assuming that your time zone is US/Eastern (based on your dataset) and that your DataFrame is named df, please try the following:
import datetime as dt
from time import mktime
import pytz
df['Job Start Date'] = \
    df['Job Start Date'].apply(lambda x: mktime(pytz.timezone('US/Eastern').localize(x)
                                                .astimezone(pytz.UTC).timetuple()))
>>> df['Job Start Date'].head()
0 993816000
1 1080824400
2 1052913600
3 1080824400
4 1075467600
Name: Job Start Date, dtype: float64
You first need to make your 'naive' datetime objects timezone-aware (to US/Eastern) and then convert them to UTC. Finally, pass your new UTC-aware datetime object as a timetuple to the mktime function from the time module.
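As a side note, the same conversion can be done without the row-wise apply by using pandas' vectorized timezone handling. A minimal sketch, assuming the column holds tz-naive US/Eastern datetimes:
import pandas as pd

# localize the naive timestamps to US/Eastern, convert to UTC, then take
# seconds since the epoch; astype('int64') on a datetime column yields
# nanoseconds (ambiguous/nonexistent DST times may need the ambiguous= argument)
s = pd.to_datetime(df['Job Start Date'])
s = s.dt.tz_localize('US/Eastern').dt.tz_convert('UTC')
df['Job Start Date'] = s.astype('int64') / 10**9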

Related

pandas datetime column, comparing to datetime variables: TypeError: can't compare offset-naive and offset-aware datetimes

I'm trying to see if the 'DateTime_Added' rows in the df are within the last 2 days of the execution date.
execution_date = context.get("execution_date")
printed execution_date: `2022-02-14T01:00:00+00:00`
printed type execution_date: `<class 'pendulum.datetime.DateTime'>`
last_2_days = execution_date - timedelta(hours=48)
printed last_2_days: 2022-02-12T01:00:00+00:00
printed type last_2_days: <class 'pendulum.datetime.DateTime'>
I've converted the DateTime_Added col to datetime because it was a string before:
df['DateTime_Added'] = pd.to_datetime(df['DateTime_Added'])
print df.info()
DateTime_Added 2 non-null datetime64[ns]
comment 3080 non-null object
Then when I try to run this I see `can't compare offset-naive and offset-aware datetimes`:
if row['comment'] is not None and row['comment'] != '' and \
   row['DateTime_Added'] is not None and row['DateTime_Added'] != '' \
   and (last_2_days <= row['DateTime_Added'] <= execution_date):
Pendulum datetime enforces timezone by default (which is the 00:00 offset here), and the df['DateTime_Added'] series does not have a timezone. This means the two can't be compared, which is what the error is indicating.
Pendulum has the naive() helper method to remove offset from the datetime object.
Running last_2_days = pendulum.naive(last_2_days) before doing the comparison should resolve the error.
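Alternatively, you could go the other way and make the pandas column offset-aware instead. A sketch, assuming the strings in 'DateTime_Added' represent UTC times:
import pandas as pd

# make the column tz-aware (UTC) so it can be compared against the
# offset-aware pendulum datetimes directly
df['DateTime_Added'] = pd.to_datetime(df['DateTime_Added']).dt.tz_localize('UTC')

# pendulum.DateTime subclasses datetime.datetime, so this comparison works
mask = (df['DateTime_Added'] >= last_2_days) & (df['DateTime_Added'] <= execution_date)
recent_rows = df[mask]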

Python Convert UTC Datetime in string to unix time

I have a column called 'created_at' in dataframe df, its values are like '2/3/15 2:00' in UTC. Now I want to convert it to unix time. How can I do that?
I tried the script like:
time.mktime(datetime.datetime.strptime(df['created_at'], "%m/%d/%Y, %H:%MM").timetuple())
It returns an error. I guess the tricky part is that the year is '15' instead of '2015'.
Is there an efficient way to deal with it?
Thanks!
Since you mention that you're working with a pandas DataFrame, you can simplify this by using:
import pandas as pd
import numpy as np
df = pd.DataFrame({'times': ['2/3/15 2:00']})
# to datetime, format is inferred correctly
df['datetime'] = pd.to_datetime(df['times'])
# df['datetime']
# 0 2015-02-03 02:00:00
# Name: datetime, dtype: datetime64[ns]
# to Unix time / seconds since 1970-1-1 Z
# .astype(np.int64) on datetime Series gives you nanoseconds, so divide by 1e9 to get seconds
df['unix'] = df['datetime'].astype(np.int64) / 1e9
# df['unix']
# 0 1.422929e+09
# Name: unix, dtype: float64
%Y is for 4-digit years. Since you have 2-digit years (assuming they're 20##), use the lowercase %y specifier (year without century) instead of the uppercase %Y (year with century).
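For example, a minimal sketch of the strptime route with the corrected directive. Note that mktime interprets the timetuple as local time, so for input stated to be UTC, calendar.timegm is the safer choice:
import datetime
import calendar

dt_obj = datetime.datetime.strptime('2/3/15 2:00', '%m/%d/%y %H:%M')
# timegm treats the timetuple as UTC, matching the stated input timezone
unix_seconds = calendar.timegm(dt_obj.timetuple())
print(unix_seconds)  # 1422928800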

How to Convert EST to Local Time with Dataframe

I'm trying to convert timestamps in EST to various localized timestamps in a pandas dataframe. I have a dataframe with timestamps in EST and a timezone into which they need to be converted.
I know that there are several threads already on topics like this. However, they either start in UTC or I can't replicate with my data.
Before writing, I consulted: How to convert GMT time to EST time using python
I imported the data:
import pandas
import datetime as dt
import pytz
transaction_timestamp_est local_timezone
2013-05-28 05:18:00+00:00 America/Chicago
2013-06-12 05:23:20+00:00 America/Los_Angeles
2014-06-21 05:26:26+00:00 America/New_York
I converted to datetime and created the following function:
df.transaction_timestamp_est = pd.to_datetime(df.transaction_timestamp_est)
def db_time_to_local(row):
    db_tz = pytz.timezone('America/New_York')
    local_tz = pytz.timezone(row['local_timezone'])
    db_date = db_tz.localize(row['transaction_timestamp_est'])
    local_date = db_date.astimezone(local_tz)
    return local_date
I run it here:
df['local_timestamp'] = df.apply(db_time_to_local, axis=1)
And get this error:
ValueError: ('Not naive datetime (tzinfo is already set)', 'occurred at index 0')
I expect a new column in the dataframe called 'local_timestamp' that has the timestamp adjusted according to the data in the local_timezone column.
Any help is appreciated!
The error you see looks like it's because you are trying to localize a tz-aware timestamp. The '+00:00' in your timestamps indicates these are tz-aware, in UTC (or something like it).
Some terminology: a naive date/time has no concept of timezone, while a tz-aware (or localized) one is associated with a particular timezone. Localizing refers to converting a tz-naive date/time to a tz-aware one. By definition you can't localize a tz-aware date/time: you either convert it to naive and then localize, or convert directly to the target timezone.
To get that column into EST, convert to naive and then localize to EST:
In [98]: df['transaction_timestamp_est'] = df['transaction_timestamp_est'].dt.tz_localize(None).dt.tz_localize('EST')
In [99]: df
Out[99]:
0 2013-05-28 05:18:00-05:00
1 2013-06-12 05:23:20-05:00
2 2014-06-21 05:26:26-05:00
Name: transaction_timestamp_est, dtype: datetime64[ns, EST]
Note the 'EST' in the dtype.
Then, you can convert each timestamp to its target timezone:
In [100]: df['local_ts'] = df.apply(lambda x: x[0].tz_convert(x[1]), axis=1)
In [101]: df
Out[101]:
transaction_timestamp_est local_timezone local_ts
0 2013-05-28 05:18:00-05:00 America/Chicago 2013-05-28 05:18:00-05:00
1 2013-06-12 05:23:20-05:00 America/Los_Angeles 2013-06-12 03:23:20-07:00
2 2014-06-21 05:26:26-05:00 America/New_York 2014-06-21 06:26:26-04:00
To explain: each element in the first column is of type pd.Timestamp. Its tz_convert() method changes its timezone, converting the date/time to the new zone.
This produces a column of pd.Timestamps with a mixture of timezones, which is a pain to handle in pandas. Most (perhaps all) pandas functions that operate on columns of date/times require the whole column to have the same timezone.
If you prefer, convert to tz-naive:
In [102]: df['local_ts'] = df.apply(lambda x: x[0].tz_convert(x[1]).tz_convert(None), axis=1)
In [103]: df
Out[103]:
transaction_timestamp_est local_timezone local_ts
0 2013-05-28 05:18:00-05:00 America/Chicago 2013-05-28 10:18:00
1 2013-06-12 05:23:20-05:00 America/Los_Angeles 2013-06-12 10:23:20
2 2014-06-21 05:26:26-05:00 America/New_York 2014-06-21 10:26:26
If your data allows, it's better to try to keep columns of timestamps (or indices) in a single timezone. UTC is usually best as it doesn't have DST transitions or other issues that can result in missing / ambiguous times, as most other timezones do.
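For instance, a minimal sketch of normalizing the whole column to UTC up front (assuming the column is tz-aware, as above):
# convert once to UTC, then do all subsequent comparisons and arithmetic there
df['transaction_timestamp_utc'] = df['transaction_timestamp_est'].dt.tz_convert('UTC')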
from datetime import datetime, time, date
from pytz import timezone, utc

tz = timezone("Asia/Dubai")
d = datetime.fromtimestamp(1426017600, tz)
print(d)

# midnight of that local day, made tz-aware; is_dst=None raises on ambiguous times
midnight = tz.localize(datetime.combine(date(d.year, d.month, d.day), time(0, 0)), is_dst=None)
print(int((midnight - datetime(1970, 1, 1, tzinfo=utc)).total_seconds()))
Based on code from python - datetime with timezone to epoch

Calculate with different datetime formats, datetime and datetime64

I am trying to calculate the days between two dates:
First Date:
date = datetime.datetime.today() # {datetime} 2018-09-17 14:42:06.506541
Second date, extracted from a data frame:
date2 = data_audit.loc[(data_audit.Audit == audit), 'Erledigungsdatum'].values[0]
# {datetime64} 2018-07-23T00:00:00.000000000
The error:
ufunc subtract cannot use operands with types dtype('O') and dtype('M8[ns]')
My next try was:
date = np.datetime64(datetime.datetime.now()) # {datetime64} 2018-09-17T14:48:16.599541
Which resulted in the following error (I pass the date as a parameter in a function):
ufunc 'bitwise_and' not supported for the input types, and the inputs
could not be safely coerced to any supported types according to the
casting rule ''safe''
How should I approach this problem? The second one seems more logical to me, but I don't understand why I can't pass a simple date to a function.
I believe something like this should work for you:
import datetime
import numpy as np
# earlier being of type datetime64
earlier = np.datetime64(datetime.datetime.today())
# Converting datetime64 to datetime
earlier = earlier.astype(datetime.datetime)
now = datetime.datetime.today()
print(now - earlier)
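If you want the difference as a number of days while staying in numpy types, here's a small sketch (dividing a timedelta64 by a one-day unit yields a float):
import datetime
import numpy as np

now64 = np.datetime64(datetime.datetime.today())
date2 = np.datetime64('2018-07-23')  # example value from the question

days = (now64 - date2) / np.timedelta64(1, 'D')
print(days)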
Let's try this approach:
import datetime
date = datetime.datetime.today()
date2 = '2018-07-23'  # here could be your date, converted to the proper type
date2 = datetime.datetime.strptime(date2, '%Y-%m-%d')
difference = date - date2
difference = difference.days
And you can apply df.some_col_with_difference.astype('timedelta64[D]') to a whole column in the dataframe as well.
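A sketch of the whole-column version, using hypothetical column names; in recent pandas the .dt.days accessor is the most direct route:
import datetime
import pandas as pd

today = pd.Timestamp(datetime.datetime.today())
# assuming 'Erledigungsdatum' is a datetime64 column, as in the question
data_audit['days_since'] = (today - data_audit['Erledigungsdatum']).dt.days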

How to perform logical tests on time values in a pandas dataframe

I have an excel sheet where one column contains a time field, where the values are the time of day entered as four digits, e.g. 0845, 1630, 1000.
I've read this into a pandas dataframe for analysis, one piece of which is labeling each time as day or evening. To do this, I first changed the datatype and format:
# Get start time as time
df['START_TIME'] = pd.to_datetime(df['START_TIME'],format='%H%M').dt.time
Which gets the values looking like:
08:45:00
16:30:00
10:00:00
The new dtype is object.
When I try to perform a logical test on that field, i.e.
# Create indicator of whether course begins before or after 4:00 PM
df['DAY COURSE INDICATOR'] = df['START_TIME'] < '16:00:00'
I get a Type Error:
TypeError: '<' not supported between instances of 'datetime.time' and 'str'
or a syntax error if I remove the quotes.
What is the best way to create that indicator; how do I work with stand-alone time values? Or am I better off just leaving them as integers?
You can't compare a datetime.time and a str but you certainly can compare a datetime.time and a datetime.time:
import datetime
df['DAY COURSE INDICATOR'] = df['START_TIME'] < datetime.time(16, 0)
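Put together with the question's data, a minimal end-to-end sketch:
import datetime
import pandas as pd

df = pd.DataFrame({'START_TIME': ['0845', '1630', '1000']})
df['START_TIME'] = pd.to_datetime(df['START_TIME'], format='%H%M').dt.time

# datetime.time compares cleanly with datetime.time
df['DAY COURSE INDICATOR'] = df['START_TIME'] < datetime.time(16, 0)
# 0845 -> True, 1630 -> False, 1000 -> True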
You can do exactly what you did in the first place:
pd.to_datetime(df['START_TIME'], format='%H:%M:%S') < pd.to_datetime('16:00:00', format='%H:%M:%S')
Example:
df = pd.DataFrame({'START_TIME': ['08:45:00']})
>>> pd.to_datetime(df['START_TIME'], format='%H:%M:%S') < pd.to_datetime('16:00:00', format='%H:%M:%S')
0 True
Name: START_TIME, dtype: bool
