date_cet col1
---------------------------------------
2021-10-31 02:00:00+02:00 7.0
2021-10-31 02:00:00+02:00 7.0
2021-10-31 02:00:00+02:00 8.0
2021-10-31 02:00:00+01:00 10.0
2021-10-31 02:00:00+01:00 11.0
I have a data frame that has columns looking similar to this. This data is imported from SQL into a Pandas data frame, and when I print out the dtypes I can see that the date_cet column is object. Since I need it further on, I want to convert it to a datetime object. However, the stuff I've tried just doesn't work, and I think it might have something to do with 1) the timezone difference and 2) the fact that this date is where DST changes (i.e. the +01:00 and +02:00).
I've tried to do stuff like this:
import datetime as dt
df["new_date"] = [dt.datetime.strptime(str(x), "%Y-%m-%d %H:%M:%S %z") for x in df["date_cet"]]
df['new_date']= pd.to_datetime(df['date_cet'])
and a handful of other things.
The first gives an error of:
ValueError: time data '2021-10-31 02:00:00+02:00' does not match format '%Y-%m-%d %H:%M:%S %z'
And the last:
ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True
Basically, I have no idea how to fix this. I just need this column to become a datetime[ns, Europe/Copenhagen] type of column, but everything I've done so far doesn't work.
In the datetime string ('2021-10-31 02:00:00+02:00') there is no space between the seconds and the UTC offset, so the format must not have a space between %S and %z either.
Try changing the format to "%Y-%m-%d %H:%M:%S%z":
import datetime as dt
df["new_date"] = [dt.datetime.strptime(str(x), "%Y-%m-%d %H:%M:%S%z") for x in df["date_cet"]]
df['new_date']= pd.to_datetime(df['date_cet'])
Update:
To fix the second error, try passing utc=True:
import datetime as dt
df["new_date"] = [dt.datetime.strptime(str(x), "%Y-%m-%d %H:%M:%S%z") for x in df["date_cet"]]
df['new_date']= pd.to_datetime(df['date_cet'], utc=True)
You can do this in one line:
df['new_date']= pd.to_datetime(df['date_cet'], format="%Y-%m-%d %H:%M:%S%z", utc=True)
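Since the question asks for a Europe/Copenhagen column rather than UTC, one possible follow-up is to convert the UTC result with tz_convert. This is a minimal sketch, assuming the column holds strings shaped like the sample rows; the two-row frame below is made up for illustration.

```python
import pandas as pd

# Hypothetical two-row sample, one value on each side of the DST change.
df = pd.DataFrame({
    "date_cet": ["2021-10-31 02:00:00+02:00", "2021-10-31 02:00:00+01:00"],
    "col1": [7.0, 10.0],
})

# Parse the mixed offsets into a single UTC column first ...
as_utc = pd.to_datetime(df["date_cet"], format="%Y-%m-%d %H:%M:%S%z", utc=True)

# ... then express the same instants in the desired timezone.
df["new_date"] = as_utc.dt.tz_convert("Europe/Copenhagen")

print(df["new_date"])
```

Going through UTC avoids the ambiguity of the repeated 02:00 hour, because each string already carries its own offset.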
Sorry, I am new here and a total Python rookie.
I pull data from Jira with Python and put it into a DataFrame. There I have a datetime stored as a string in the following format: DD/MM/YY HH:MM AM (or PM). Now I want to convert it to a datetime so that it is comparable with other datetimes in the format DD/MM/YY HH:MM:SS. I wanted to use datetime.strptime but it always fails. Any ideas? Thanks in advance!
Use a custom format: https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior
from datetime import datetime
datetime.strptime("05/11/22 07:40 AM", "%d/%m/%y %I:%M %p")
# datetime.datetime(2022, 11, 5, 7, 40)
datetime.strptime("05/11/22 07:40 PM", "%d/%m/%y %I:%M %p")
# datetime.datetime(2022, 11, 5, 19, 40)
You mention "DataFrame", so I guess you are using pandas. In that case, you should not have to do much work yourself. Just point pandas' read_* function to the column(s) that contain datetime-like strings. With argument dayfirst=True you make sure that dates are inferred as DD/MM/YY, not MM/DD/YY:
import pandas as pd
from io import StringIO
df = pd.read_csv(StringIO("""\
when,value
11/05/22 07:40 AM,42
10/05/22 09:42 AM,8
09/05/22 08:41 AM,15
"""), parse_dates=['when'], dayfirst=True)
This yields the following DataFrame, df.dtypes containing a nice datetime64[ns] for column "when".
when value
0 2022-05-11 07:40:00 42
1 2022-05-10 09:42:00 8
2 2022-05-09 08:41:00 15
If you already have the DataFrame with a string column, you can convert it after the fact using
df["when"] = pd.to_datetime(df["when"], dayfirst=True)
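If you would rather not rely on inference at all, the same strptime format can be passed to pd.to_datetime. The sketch below assumes a hypothetical column named "created" holding strings like the ones in the question, and an English/C locale for the %p (AM/PM) directive.

```python
import pandas as pd

# Hypothetical column name and values, shaped like the strings in the question.
jira = pd.DataFrame({"created": ["05/11/22 07:40 AM", "05/11/22 07:40 PM"]})

# An explicit format leaves no room for day/month confusion.
jira["created"] = pd.to_datetime(jira["created"], format="%d/%m/%y %I:%M %p")

# Optional: render as DD/MM/YY HH:MM:SS text for display only.
jira["created_text"] = jira["created"].dt.strftime("%d/%m/%y %H:%M:%S")

print(jira)
```

For comparisons, keep working with the datetime column; the text column is only useful for output.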
I cannot find the correct format for this datetime. I have tried several formats, %Y/%m/%d%I:%M:%S%p is the closest format I can find for the example below.
df['datetime'] = '2019-11-13 16:28:05.779'
df['datetime'] = pd.to_datetime(df['datetime'], format="%Y/%m/%d%I:%M:%S%p")
Result:
ValueError: time data '2019-11-13 16:28:05.779' does not match format '%Y/%m/%d%I:%M:%S%p' (match)
Before guessing the format yourself, let pandas make the first guess:
df['datetime'] = pd.to_datetime(df['datetime'], infer_datetime_format=True)
0 2019-11-13 16:28:05.779
Name: datetime, dtype: datetime64[ns]
You can probably solve this with the parameter infer_datetime_format=True (deprecated since pandas 2.0, where format inference is the default behavior). Here's an example:
import pandas as pd

df = {}
df['datetime'] = '2019-11-13 16:28:05.779'
df['datetime'] = pd.to_datetime(df['datetime'], infer_datetime_format=True)
print(df['datetime'])
print(type(df['datetime']))
Output:
2019-11-13 16:28:05.779000
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
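Here is the pandas.to_datetime() call with the correct format string: pd.to_datetime(df['datetime'], format="%Y-%m-%d %H:%M:%S.%f")
Your format used slashes where the data has dashes, was missing the space between the date and the time, and had no directive for the fractional seconds (%f). Also, %I is for 12-hour time (the example time you gave is 16:28), and %p is, to quote the docs, the "Locale's equivalent of either AM or PM".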
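As a quick check against the sample value from the question, a format with dashes, a space, 24-hour %H and a fractional-seconds %f directive parses it. This is a small sketch; the one-element Series stands in for the real column.

```python
import pandas as pd

# Stand-in for df['datetime'] from the question.
raw = pd.Series(["2019-11-13 16:28:05.779"])

# %f accepts one to six fractional digits, so ".779" is read as 779 milliseconds.
parsed = pd.to_datetime(raw, format="%Y-%m-%d %H:%M:%S.%f")

print(parsed)
```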
We are struggling with formatting datetimes in Python 3, and we can't seem to figure it out on our own. So far, we have converted our dataframe column to datetime, so that it should be '%Y-%m-%d %H:%M:%S':
before
02-01-2011 22:00:00
after
2011-01-02 22:00:00
For some very odd reason, when datetime is
13-01-2011 00:00:00
it is changed to this
2011-13-01 00:00:00
And from there it's mixing months with days and is therefore counting months instead of days.
This is all of our code for this datetime formatting:
df['local_date']=df['local_date'] + ':00'
df['local_date'] = pd.to_datetime(df.local_date)
df['local_date']=df['local_date'].dt.strftime('%Y-%m-%d %H:%M:%S')
UPDATED CODE WHICH WORKS:
df['local_date']=df['local_date'] + ':00'
df['local_date'] = pd.to_datetime(df.local_date.str.strip(), format='%d-%m-%Y %H:%M:%S')
df['local_date']=df['local_date'].dt.strftime('%Y-%m-%d %H:%M:%S')
Can't say for sure, but I believe this has to do with the warning mentioned in the documentation of to_datetime:
dayfirst : boolean, default False
Specify a date parse order if arg is str or its list-likes. If True, parses dates with the day first, eg 10/11/12 is parsed as 2012-11-10. Warning: dayfirst=True is not strict, but will prefer to parse with day first (this is a known bug, based on dateutil behavior).
I think the way to get around this is by explicitly passing a format string to to_datetime:
df['local_date'] = pd.to_datetime(df.local_date, format='%d-%m-%Y %H:%M:%S')
This way it won't accidentally mix months and days (but it will raise an error if any line has a different format)
import pandas as pd
local_date = "13-01-2011 00:00"
local_date = local_date + ":00"
local_date = pd.to_datetime(local_date, format='%d-%m-%Y %H:%M:%S')
local_date = local_date.strftime('%Y-%m-%d %H:%M:%S')
print(local_date)
The output is:
2011-01-13 00:00:00
I'm trying to convert timestamps in EST to various localized timestamps in a pandas dataframe. I have a dataframe with timestamps in EST and a timezone into which they need to be converted.
I know that there are several threads already on topics like this. However, they either start in UTC or I can't replicate with my data.
Before writing, I consulted: How to convert GMT time to EST time using python
I imported the data:
import pandas as pd
import datetime as dt
import pytz
transaction_timestamp_est local_timezone
2013-05-28 05:18:00+00:00 America/Chicago
2013-06-12 05:23:20+00:00 America/Los_Angeles
2014-06-21 05:26:26+00:00 America/New_York
I converted to datetime and created the following function:
df.transaction_timestamp_est = pd.to_datetime(df.transaction_timestamp_est)

def db_time_to_local(row):
    db_tz = pytz.timezone('America/New_York')
    local_tz = pytz.timezone(row['local_timezone'])
    db_date = db_tz.localize(row['transaction_timestamp_est'])
    local_date = db_date.astimezone(local_tz)
    return local_date
I run it here:
df['local_timestamp'] = df.apply(db_time_to_local, axis=1)
And get this error:
ValueError: ('Not naive datetime (tzinfo is already set)', 'occurred at index 0')
I expect a new column in the dataframe called 'local_timestamp' that has the timestamp adjusted according to the data in the local_timezone column.
Any help is appreciated!
The error you see looks like it's because you are trying to localize a tz-aware timestamp. The '+00:00' in your timestamps indicates these are tz-aware, in UTC (or something like it).
Some terminology: a naive date/time has no concept of timezone, while a tz-aware (or localized) one is associated with a particular timezone. Localizing refers to converting a tz-naive date/time to a tz-aware one. By definition you can't localize a tz-aware date/time: you either convert it to naive and then localize, or convert directly to the target timezone.
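The terminology can be illustrated on a single pd.Timestamp. This is a minimal sketch with made-up values: tz_localize attaches a zone to a naive value, tz_convert moves an aware value to another zone, and calling tz_localize on an already aware value raises a TypeError in pandas.

```python
import pandas as pd

naive = pd.Timestamp("2013-05-28 05:18:00")        # no timezone attached
aware = naive.tz_localize("America/New_York")      # same wall time, now tz-aware
converted = aware.tz_convert("America/Chicago")    # same instant, other zone

error_message = None
try:
    # Localizing something that is already tz-aware is rejected.
    aware.tz_localize("America/Chicago")
except TypeError as exc:
    error_message = str(exc)

print(naive, aware, converted, sep="\n")
print(error_message)
```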
To get that column into EST, convert to naive and then localize to EST:
In [98]: df['transaction_timestamp_est'] = df['transaction_timestamp_est'].dt.tz_localize(None).dt.tz_localize('EST')
In [99]: df
Out [99]:
0 2013-05-28 05:18:00-05:00
1 2013-06-12 05:23:20-05:00
2 2014-06-21 05:26:26-05:00
Name: transaction_timestamp_est, dtype: datetime64[ns, EST]
Note the 'EST' in the dtype.
Then, you can convert each timestamp to its target timezone:
In [100]: df['local_ts'] = df.apply(lambda x: x[0].tz_convert(x[1]), axis=1)
In [101]: df
Out[101]:
transaction_timestamp_est local_timezone local_ts
0 2013-05-28 05:18:00-05:00 America/Chicago 2013-05-28 05:18:00-05:00
1 2013-06-12 05:23:20-05:00 America/Los_Angeles 2013-06-12 03:23:20-07:00
2 2014-06-21 05:26:26-05:00 America/New_York 2014-06-21 06:26:26-04:00
To explain: each element in the first column is of type pd.Timestamp. Its tz_convert() method changes its timezone, converting the date/time to the new zone.
This produces a column of pd.Timestamps with a mixture of timezones, which is a pain to handle in pandas. Most (perhaps all) pandas functions that operate on columns of date/times require the whole column to have the same timezone.
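If you prefer tz-naive values, note that tz_convert(None) converts to UTC and then drops the timezone, so the result below is UTC wall time (use tz_localize(None) instead to keep each row's local wall time):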
In [102]: df['local_ts'] = df.apply(lambda x: x[0].tz_convert(x[1]).tz_convert(None), axis=1)
In [103]: df
Out[103]:
transaction_timestamp_est local_timezone local_ts
0 2013-05-28 05:18:00-05:00 America/Chicago 2013-05-28 10:18:00
1 2013-06-12 05:23:20-05:00 America/Los_Angeles 2013-06-12 10:23:20
2 2014-06-21 05:26:26-05:00 America/New_York 2014-06-21 10:26:26
If your data allows, it's better to try to keep columns of timestamps (or indices) in a single timezone. UTC is usually best as it doesn't have DST transitions or other issues that can result in missing / ambiguous times, as most other timezones do.
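One way to follow that advice is to keep a single UTC column as the source of truth and derive a naive local wall-time column only for display. The sketch below assumes the '+00:00' timestamps really are UTC instants; the column names ts_utc and local_wall_time and the helper to_local_wall_time are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "transaction_timestamp": [
        "2013-05-28 05:18:00+00:00",
        "2013-06-12 05:23:20+00:00",
        "2014-06-21 05:26:26+00:00",
    ],
    "local_timezone": ["America/Chicago", "America/Los_Angeles", "America/New_York"],
})

# Single-timezone column: every pandas datetime operation works on it.
df["ts_utc"] = pd.to_datetime(df["transaction_timestamp"], utc=True)

def to_local_wall_time(ts, tz_name):
    """Convert one UTC Timestamp to the given zone, then drop the tz info."""
    return ts.tz_convert(tz_name).tz_localize(None)

# Naive wall-clock times, so the column keeps a plain datetime64 dtype.
df["local_wall_time"] = [
    to_local_wall_time(ts, tz) for ts, tz in zip(df["ts_utc"], df["local_timezone"])
]

print(df[["ts_utc", "local_timezone", "local_wall_time"]])
```

The derived column is only meaningful together with local_timezone, so arithmetic and comparisons are safer on ts_utc.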
from datetime import datetime, time, date
from pytz import timezone, utc
tz = timezone("Asia/Dubai")
# tz-aware datetime for the given epoch seconds
d = datetime.fromtimestamp(1426017600, tz)
print(d)
# local midnight of that day, made tz-aware
midnight = tz.localize(datetime.combine(date(d.year, d.month, d.day), time(0, 0)), is_dst=None)
# epoch seconds of that local midnight
print(int((midnight - datetime(1970, 1, 1, tzinfo=utc)).total_seconds()))
Based on code from python - datetime with timezone to epoch
I tried:
df["datetime_obj"] = df["datetime"].apply(lambda dt: datetime.strptime(dt, "%d/%m/%Y %H:%M"))
but got this error:
ValueError: time data '10/11/2006 24:00' does not match format '%d/%m/%Y %H:%M'
How to solve it correctly?
This does not work because the %H directive only accepts values in the range 00 to 23 (both inclusive). So 24:00 is, as the error says, not a valid time string for this format.
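The limit of %H can be checked directly with datetime.strptime. In this small sketch the helper name parses is made up for illustration; it simply reports whether a string matches the format from the question.

```python
from datetime import datetime

FMT = "%d/%m/%Y %H:%M"

def parses(text):
    """Return True if text matches FMT, False if strptime rejects it."""
    try:
        datetime.strptime(text, FMT)
        return True
    except ValueError:
        return False

# 23:59 is the last valid minute of a day; 24:00 is rejected.
results = {text: parses(text) for text in ("10/11/2006 23:59", "10/11/2006 24:00")}
print(results)
```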
I therefore think we have few options other than converting the string to a valid format. We can do this by first replacing 24:00 with 00:00, and then incrementing the day for these timestamps.
Like:
from datetime import timedelta
import pandas as pd
df['datetime_zero'] = df['datetime'].str.replace('24:00', '0:00')
df['datetime_er'] = pd.to_datetime(df['datetime_zero'], format='%d/%m/%Y %H:%M')
selrow = df['datetime'].str.contains('24:00')
df['datetime_obj'] = df['datetime_er'] + selrow * timedelta(days=1)
The last line adds one day to the rows that contain 24:00, so that '10/11/2006 24:00' gets converted to '11/11/2006 00:00'. Note however that the above is rather unsafe, since whether it works depends on the format of the timestamp. For the format above it will (probably) work, since there is only one colon. But if, for example, the datetimes have seconds as well, the filter could get triggered for 00:24:00, so it might require some extra work to get it working.
Your data doesn't follow the conventions used by Python / Pandas datetime objects. There should be only one way of storing a particular datetime, i.e. '10/11/2006 24:00' should be rewritten as '11/11/2006 00:00'.
Here's one way to approach the problem:
# find datetimes which have '24:00' and rewrite
twenty_fours = df['strings'].str[-5:] == '24:00'
df.loc[twenty_fours, 'strings'] = df['strings'].str[:-5] + '00:00'
# construct datetime series
df['datetime'] = pd.to_datetime(df['strings'], format='%d/%m/%Y %H:%M')
# add one day where applicable
df.loc[twenty_fours, 'datetime'] += pd.DateOffset(1)
Here's some data to test:
dateList = ['10/11/2006 24:00', '11/11/2006 00:00', '12/11/2006 15:00']
df = pd.DataFrame({'strings': dateList})
Result after transformations described above:
print(df['datetime'])
0 2006-11-11 00:00:00
1 2006-11-11 00:00:00
2 2006-11-12 15:00:00
Name: datetime, dtype: datetime64[ns]
As indicated in the documentation (https://docs.python.org/2/library/datetime.html#strftime-strptime-behavior), hours go from 00 to 23. 24:00 is then an error.