How can I parse this date format into datetime? Python/Pandas - python

The starting date format I currently have is 2019-09-04 16:00 UTC+3 and I'm trying to convert it into a datetime format of 2019-09-04 16:00:00+0300.
The format I thought would work was format='%Y-%m-%d %H:%M %Z%z', but when I run it I get the error message ValueError: Cannot parse both %Z and %z.
Does anyone know the correct format to use, or should I be trying a different method altogether? Thanks.
Edit
Sorry, I had a hard time putting into words what it is I am looking to do, hopefully I can clarify.
I'm looking to change all the date and times in a dataframe into the datetime format.
This is the method I was trying to use which presented me with an error
df['datepicker'] = pd.to_datetime(df['datepicker'], format='%Y-%m-%d %H:%M %Z%z')
And here is a sample of the data I currently have.
datepicker
2019-09-07 16:00 UTC+2
2019-09-04 18:30 UTC+4
2019-09-06 17:00 UTC±0
2019-09-10 16:00 UTC+1
2019-09-04 18:00 UTC+3
And this is what I'm looking to convert them into, a timestamp format.
datepicker
2019-09-07 16:00:00+0200
2019-09-04 18:30:00+0400
2019-09-06 17:00:00+0000
2019-09-10 16:00:00+0100
2019-09-04 18:00:00+0300

pandas.to_datetime should parse this happily if you tweak the strings slightly:
import pandas as pd
df = pd.DataFrame({"datepicker":[ "2019-09-07 16:00 UTC+2", "2019-09-04 18:30 UTC+4",
"2019-09-06 17:00 UTC±0", "2019-09-10 16:00 UTC+1",
"2019-09-04 18:00 UTC+3"]})
df['datetime'] = pd.to_datetime(df['datepicker'].str.replace('±', '+'))
# df['datetime']
# 0 2019-09-07 16:00:00-02:00
# 1 2019-09-04 18:30:00-04:00
# 2 2019-09-06 17:00:00+00:00
# 3 2019-09-10 16:00:00-01:00
# 4 2019-09-04 18:00:00-03:00
# Name: datetime, dtype: object
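Note that due to the mixed UTC offsets, the column's data type is 'object' (datetime objects). Also check the sign of the parsed offsets: in the output above, UTC+2 comes out as -02:00, which is the opposite of the +0200 the question asks for. If you wish, you can also convert to UTC straight away, to get a column of dtype datetime64[ns, UTC]: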
df['UTC'] = pd.to_datetime(df['datepicker'].str.replace('±', '+'), utc=True)
# df['UTC']
# 0 2019-09-07 18:00:00+00:00
# 1 2019-09-04 22:30:00+00:00
# 2 2019-09-06 17:00:00+00:00
# 3 2019-09-10 17:00:00+00:00
# 4 2019-09-04 21:00:00+00:00
# Name: UTC, dtype: datetime64[ns, UTC]
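If you need the offsets read exactly as written (UTC+2 meaning two hours ahead of UTC), one option is to parse the offset yourself. Below is a minimal standard-library sketch. It assumes every string has the shape "YYYY-MM-DD HH:MM UTC<sign><hours>" with whole-hour offsets; parse_utc_offset is a hypothetical helper name, not something pandas provides.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches strings like "2019-09-04 18:00 UTC+3" or "2019-09-06 17:00 UTC±0".
_PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}) UTC([+\-±])(\d{1,2})$")


def parse_utc_offset(text):
    """Parse the string, reading 'UTC+3' as three hours ahead of UTC."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unexpected format: {text!r}")
    stamp, sign, hours = match.groups()
    offset = timedelta(hours=int(hours))
    if sign == "-":
        offset = -offset
    naive = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
    return naive.replace(tzinfo=timezone(offset))


samples = ["2019-09-07 16:00 UTC+2", "2019-09-06 17:00 UTC±0", "2019-09-04 18:00 UTC+3"]
formatted = [parse_utc_offset(s).strftime("%Y-%m-%d %H:%M:%S%z") for s in samples]
print(formatted)
```

To apply it to the column, something like df['datepicker'].map(parse_utc_offset) should work; expect an object column again, since the offsets differ per row.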

When I define the datetime as below, formatting it with %Z%z works as you would expect.
from datetime import datetime, timedelta, timezone
UTC = timezone(timedelta(hours=+3))
dt = datetime(2019, 1, 1, 12, 0, 0, tzinfo=UTC)
timestampStr = dt.strftime("%Y-%m-%d %H:%M %Z%z")
print(timestampStr)
With the output of:
2019-01-01 12:00 UTC+03:00+0300

Related

Datetime Daylight savings transition problem [Python]

I get the following error for my timestamp field when converting from UTC to CET due to the transition to daylight savings last Saturday/Sunday:
AmbiguousTimeError: Cannot infer dst time from 2020-07-31 11:17:18+00:00, try using the 'ambiguous' argument
#converting timestamp fields to CET (Europe,Berlin)
df['timestamp_berlin_time'] = df['timestamp'].dt.tz_localize('Europe/Berlin')
I tried the following snippet:
df['timestamp_berlin_time'] = df['timestamp'].dt.tz_localize('CET',ambiguous='infer')
but this gives me then this error:
AmbiguousTimeError: 2020-07-31 11:17:18+00:00
Data sample:
0 2020-07-31 11:17:18+00:00
1 2020-07-31 11:17:18+00:00
2 2020-08-31 16:26:42+00:00
3 2020-10-20 07:28:46+00:00
4 2020-10-01 22:11:33+00:00
Name: timestamp, dtype: datetime64[ns, UTC]
If your input is UTC but UTC isn't set yet, you can localize to UTC first, here e.g.:
df['timestamp'] = df['timestamp'].dt.tz_localize('UTC')
If your input already is converted to UTC, you can simply tz_convert, e.g.:
s = pd.Series(pd.to_datetime(['2020-10-25 00:40:03.925000',
'2020-10-25 01:40:03.925000',
'2020-10-25 02:40:03.925000'], utc=True))
s.dt.tz_convert('Europe/Berlin')
# 0 2020-10-25 02:40:03.925000+02:00
# 1 2020-10-25 02:40:03.925000+01:00
# 2 2020-10-25 03:40:03.925000+01:00
# dtype: datetime64[ns, Europe/Berlin]
If your input timestamps represent local time (here: Europe/Berlin time zone), you can try to infer the DST transition based on order:
s = pd.Series(pd.to_datetime(['2020-10-25 02:40:03.925000',
'2020-10-25 02:40:03.925000',
'2020-10-25 03:40:03.925000']))
s.dt.tz_localize('Europe/Berlin', ambiguous='infer')
# 0 2020-10-25 02:40:03.925000+02:00
# 1 2020-10-25 02:40:03.925000+01:00
# 2 2020-10-25 03:40:03.925000+01:00
# dtype: datetime64[ns, Europe/Berlin]
Note: CET is not a time zone in a geographical sense. pytz can handle some of these for historical reasons but don't count on it. In any case, it might give you static tz offsets - which is not what you want if you expect it to include DST transitions.
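To see why a wall time like 02:40 is ambiguous on that date, here is a small sketch using the standard-library zoneinfo module instead of pandas. It assumes Python 3.9+ and that time zone data is available (from the system or the tzdata package). Two different UTC instants map to the same Berlin wall time, and only the offset (or the fold attribute) tells them apart.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

berlin = ZoneInfo("Europe/Berlin")

# Two UTC instants one hour apart, on either side of the end of DST.
utc_times = [
    datetime(2020, 10, 25, 0, 40, tzinfo=timezone.utc),
    datetime(2020, 10, 25, 1, 40, tzinfo=timezone.utc),
]

# Converting from UTC is never ambiguous: each instant has exactly one local time.
local_times = [t.astimezone(berlin) for t in utc_times]
for t in local_times:
    print(t.isoformat(), "fold =", t.fold)
```

Going the other way (attaching a zone to a naive 02:40) is where the ambiguity comes from, which is why tz_localize needs the 'ambiguous' argument and tz_convert does not.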

Pandas dataframe not including time of day when converting from UNIX

I am retrieving data from an API which is timestamped in UNIX millisecond time and am trying to save this data to a CSV file. The data is in daily intervals but represented in UNIX millisecond time as mentioned.
I am using pandas functions to convert from milliseconds to datetime, but it is still not saving the data with the time-of-day part. My code is as follows:
ticker = 'tBTCUSD'
r = requests.get(url, params = params)
data = pd.DataFrame(r.json())
data.set_index([0], inplace = True)
data.index = pd.to_datetime(data.index, unit = 'ms' )
data.to_csv('bitfinex_{}_usd_{}.csv'.format(ticker[1:-3].lower(), '1D'), mode='a', header=False)
It saves the data as 2020-08-21 instead of 2020-08-21 00:00:00. When I poll the API on say, an hourly or 15-minutely basis, that still includes the time but on daily intervals it doesn't. I was wondering if there is a step that I am missing to convert the time accordingly from UNIX millisecond to a %Y-%m-%d %H:%M:%S %Z format?
You can always explicitly specify the format:
data.index = pd.to_datetime(data.index, unit='ms').strftime('%Y-%m-%d %H:%M:%S UTC')
print(data)
1 2 3 4 5
0
2020-09-10 00:00:00 UTC 10241.000000 10333.862868 10516.00000 10233.087967 3427.178984
2020-09-09 00:00:00 UTC 10150.000000 10240.000000 10359.00000 10010.000000 2406.147398
2020-09-08 00:00:00 UTC 10400.000000 10148.000000 10464.00000 9882.400000 6761.138356
2020-09-07 00:00:00 UTC 10275.967600 10397.000000 10430.00000 9913.800000 6301.951492
2020-09-06 00:00:00 UTC 10197.000000 10276.000000 10365.07422 10031.000000 2755.663001
... ... ... ... ... ...
2020-05-18 00:00:00 UTC 9668.200000 9714.825163 9944.00000 9450.000000 9201.536549
2020-05-17 00:00:00 UTC 9386.000000 9668.200000 9883.50000 9329.700000 9663.262087
2020-05-16 00:00:00 UTC 9307.600000 9387.952090 9580.00000 9222.000000 4157.691762
2020-05-15 00:00:00 UTC 9791.000000 9311.200000 9848.90000 9130.200000 11340.269781
2020-05-14 00:00:00 UTC 9311.967387 9790.954158 9938.70000 9266.200000 12867.687617
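The same idea without pandas, as a rough sketch: the midnight timestamp still carries a time of day, and it only shows up in the output if you format it explicitly. format_ms is a hypothetical helper name, and the sample value is simply midnight UTC on 2020-08-21 expressed in milliseconds.

```python
from datetime import datetime, timezone


def format_ms(ms):
    """Format a UNIX timestamp in milliseconds as 'YYYY-MM-DD HH:MM:SS UTC'."""
    moment = datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S %Z")


label = format_ms(1597968000000)
print(label)  # 2020-08-21 00:00:00 UTC
```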

Python - Local Time

I have a dataframe that has entries like this, where the times are in UTC:
start_date_time timezone
1 2017-01-01 14:00:00 America/Los_Angeles
2 2017-01-01 14:00:00 America/Denver
3 2017-01-01 14:00:00 America/Phoenix
4 2017-01-01 14:30:00 America/Los_Angeles
5 2017-01-01 14:30:00 America/Los_Angeles
I need to be able to group by date (local date, not UTC date) and I need to be able to create indicators for whether the event happened between certain times (local times, not UTC times).
I have successfully done the above in R by:
Creating a time variable in each of the timezones
Converting those to strings
Pulling each of the string date/time variables into one column, which one I pull depends on the appropriate timezone
Then, splitting that column to get a string date column and a string time column
I can then convert everything back to datetime objects for comparisons. e.g. now I can say if something happened between 2 and 3pm and it will correctly identify everything that happened between 2 and 3pm locally.
I have tried a bunch in python and have the dates as
2017-01-02 04:30:00-08:00
but I can't figure out how to go from there to
2017-01-01 20:30:00
Thanks!
Your example is incorrect. Your timezone is eight hours behind UTC, which means you need to add eight hours to 4:30AM which is 12:30PM UTC time.
The datetime object function astimezone(...) will do the conversion for you. For ease of use, I recommend pytz.
However in pure python:
import datetime as dt
local_tz = dt.timezone(dt.timedelta(hours=-8))
utc = dt.timezone.utc
d = dt.datetime(2017, 1, 2, 4, 30, 0, 0, local_tz)
print(d, d.astimezone(utc))
Will print:
2017-01-02 04:30:00-08:00 2017-01-02 12:30:00+00:00
Here's an example using pytz to lookup time zones:
import datetime as dt
import pytz
dates = [("2017-01-01 14:00:00", "America/Los_Angeles"),
("2017-01-01 14:00:00", "America/Denver"),
("2017-01-01 14:00:00", "America/Phoenix"),
("2017-01-01 14:30:00", "America/Los_Angeles"),
]
for d, tz_str in dates:
    start = dt.datetime.strptime(d, "%Y-%m-%d %H:%M:%S")
    start = start.replace(tzinfo=pytz.utc)
    local_tz = pytz.timezone(tz_str) # convert to desired timezone
    print(start, local_tz.zone, "\t", start.astimezone(local_tz))
This produces:
2017-01-01 14:00:00+00:00 America/Los_Angeles 2017-01-01 06:00:00-08:00
2017-01-01 14:00:00+00:00 America/Denver 2017-01-01 07:00:00-07:00
2017-01-01 14:00:00+00:00 America/Phoenix 2017-01-01 07:00:00-07:00
2017-01-01 14:30:00+00:00 America/Los_Angeles 2017-01-01 06:30:00-08:00
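For the grouping part of the question (local date, and whether the event falls in a local time window), here is a sketch that swaps pytz for the standard-library zoneinfo module. It assumes Python 3.9+ with time zone data available. to_local is a hypothetical helper, and the third row and the 20:00 to 21:00 window are made up for illustration.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# (UTC timestamp, time zone name) pairs, as in the question's dataframe.
rows = [
    ("2017-01-01 14:00:00", "America/Los_Angeles"),
    ("2017-01-01 14:00:00", "America/Denver"),
    ("2017-01-02 04:30:00", "America/Los_Angeles"),
]


def to_local(utc_text, tz_name):
    """Interpret utc_text as UTC and convert it to the named time zone."""
    utc_dt = datetime.strptime(utc_text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return utc_dt.astimezone(ZoneInfo(tz_name))


local = [to_local(text, tz) for text, tz in rows]

# Local calendar date, usable as a group key.
local_dates = [d.date().isoformat() for d in local]

# Indicator: did the event happen between 20:00 and 21:00 local time?
in_window = [20 <= d.hour < 21 for d in local]

print(local_dates, in_window)
```

Note that the last row is 2017-01-02 in UTC but 2017-01-01 20:30 in Los Angeles, so it groups under the local date.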

How to convert normal string to datetime in Pandas

I have plain strings, millions of data points, from a .csv file in the format below:
Datetime
22/12/2015 17:00:00
22/12/2015 18:00:00
I loaded the file into pandas and tried to convert the column to datetime using pandas.to_datetime(df['Datetime']). However, the resulting time series is not correct: some datetimes appear that were never in the original data, for example 2016-12-11 23:30:00.
It has been a while since I worked with pandas, but in your example you have a different date format than in the example lines from the csv:
yyyy-mm-dd hh:mm:ss
instead of
dd/mm/yyyy hh:mm:ss
The to_datetime function takes a "format" parameter; this should help if that is the cause.
You want to use the option dayfirst=True
pd.to_datetime(df.Datetime, dayfirst=True)
This:
Datetime
22/12/2015 17:00:00
22/12/2015 18:00:00
11/12/2015 23:30:00
Gets converted to
0 2015-12-22 17:00:00
1 2015-12-22 18:00:00
2 2015-12-11 23:30:00
Name: Datetime, dtype: datetime64[ns]
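To illustrate why the order matters, here is a standard-library sketch with one of the ambiguous values: the same string is a valid date under both readings, so a parser that guesses month-first will silently produce a different date. Passing an explicit format (here via datetime.strptime; pd.to_datetime accepts a format argument as well) removes the guesswork.

```python
from datetime import datetime

text = "11/12/2015 23:30:00"

# Day first: 11 December 2015 (what the csv means).
day_first = datetime.strptime(text, "%d/%m/%Y %H:%M:%S")

# Month first: 12 November 2015 (a plausible but wrong reading).
month_first = datetime.strptime(text, "%m/%d/%Y %H:%M:%S")

print(day_first, month_first)
```

A value like 22/12/2015 can only be read day-first, which is why a guessing parser can end up mixing both interpretations within one column.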

python-dateutil - RRule - Different times for different weekdays

I'm using rrule as shown here:
https://labix.org/python-dateutil#head-470fa22b2db72000d7abe698a5783a46b0731b57
I'm wondering if it somehow possible to create a rule where different times are specified for different weekdays
e.g. WEEKLY Thursday 6pm and Saturday 10am
Hope someone can help :)
A single rrule cannot specify both pairs of days and hours, but you could use an rrule.rruleset to combine rrules:
import datetime as DT
import dateutil.rrule as RR
today = DT.date.today()
aset = RR.rruleset()
aset.rrule(RR.rrule(RR.WEEKLY, byweekday=RR.TH, byhour=18, count=3, dtstart=today))
aset.rrule(RR.rrule(RR.WEEKLY, byweekday=RR.SA, byhour=10, count=3, dtstart=today))
for date in aset:
    print(date)
yields
2015-03-26 18:00:00
2015-03-28 10:00:00
2015-04-02 18:00:00
2015-04-04 10:00:00
2015-04-09 18:00:00
2015-04-11 10:00:00
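If you would rather not depend on dateutil, the same schedule can be sketched with the standard library alone. This is a different technique from rruleset, just a day-by-day scan. weekly_slots is a hypothetical helper, weekdays use date.weekday() numbering (Monday is 0), and the fixed start date is chosen so the result lines up with the output above.

```python
from datetime import datetime, time, timedelta

THURSDAY, SATURDAY = 3, 5  # date.weekday() numbering, Monday == 0


def weekly_slots(start, slots, count):
    """Return the first `count` datetimes at or after `start` that fall on
    one of the (weekday, hour) pairs in `slots`."""
    results = []
    day = start.date()
    while len(results) < count:
        for weekday, hour in sorted(slots):
            candidate = datetime.combine(day, time(hour))
            if day.weekday() == weekday and candidate >= start:
                results.append(candidate)
        day += timedelta(days=1)
    return results[:count]


occurrences = weekly_slots(datetime(2015, 3, 25), [(THURSDAY, 18), (SATURDAY, 10)], 6)
for moment in occurrences:
    print(moment)
```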
