I have a dataframe that has entries like this, where the times are in UTC:
start_date_time timezone
1 2017-01-01 14:00:00 America/Los_Angeles
2 2017-01-01 14:00:00 America/Denver
3 2017-01-01 14:00:00 America/Phoenix
4 2017-01-01 14:30:00 America/Los_Angeles
5 2017-01-01 14:30:00 America/Los_Angeles
I need to be able to group by date (local date, not UTC date) and I need to be able to create indicators for whether the event happened between certain times (local times, not UTC times).
I have successfully done the above in R by:
Creating a time variable in each of the timezones
Converting those to strings
Pulling each of the string date/time variables into one column, which one I pull depends on the appropriate timezone
Then, splitting that column to get a string date column and a string time column
I can then convert everything back to datetime objects for comparisons. e.g. now I can say if something happened between 2 and 3pm and it will correctly identify everything that happened between 2 and 3pm locally.
I have tried a number of things in Python and have the dates as
2017-01-02 04:30:00-08:00
but I can't figure out how to go from there to
2017-01-01 20:30:00
Thanks!
Your example is incorrect. Your timezone is eight hours behind UTC, so you need to add eight hours to 4:30 AM, which gives 12:30 PM UTC.
The datetime method astimezone(...) will do the conversion for you. For ease of use, I recommend pytz.
However, in pure Python:
import datetime as dt
local_tz = dt.timezone(dt.timedelta(hours=-8))
utc = dt.timezone.utc
d = dt.datetime(2017, 1, 2, 4, 30, 0, 0, local_tz)
print(d, d.astimezone(utc))
Will print:
2017-01-02 04:30:00-08:00 2017-01-02 12:30:00+00:00
Here's an example using pytz to lookup time zones:
import datetime as dt
import pytz
dates = [("2017-01-01 14:00:00", "America/Los_Angeles"),
("2017-01-01 14:00:00", "America/Denver"),
("2017-01-01 14:00:00", "America/Phoenix"),
("2017-01-01 14:30:00", "America/Los_Angeles"),
]
for d, tz_str in dates:
    start = dt.datetime.strptime(d, "%Y-%m-%d %H:%M:%S")
    start = start.replace(tzinfo=pytz.utc)
    local_tz = pytz.timezone(tz_str) # convert to desired timezone
    print(start, local_tz.zone, "\t", start.astimezone(local_tz))
This produces:
2017-01-01 14:00:00+00:00 America/Los_Angeles 2017-01-01 06:00:00-08:00
2017-01-01 14:00:00+00:00 America/Denver 2017-01-01 07:00:00-07:00
2017-01-01 14:00:00+00:00 America/Phoenix 2017-01-01 07:00:00-07:00
2017-01-01 14:30:00+00:00 America/Los_Angeles 2017-01-01 06:30:00-08:00
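Since the question is about a dataframe, the same conversion can be sketched in pandas. This is only an illustration under some assumptions: the column names follow the question, while `local_time`, `local_date` and `in_6am_hour` are made-up names, and the per-row loop is chosen for clarity rather than speed.

```python
import pandas as pd

df = pd.DataFrame({
    "start_date_time": [
        "2017-01-01 14:00:00", "2017-01-01 14:00:00", "2017-01-01 14:00:00",
        "2017-01-01 14:30:00", "2017-01-01 14:30:00",
    ],
    "timezone": [
        "America/Los_Angeles", "America/Denver", "America/Phoenix",
        "America/Los_Angeles", "America/Los_Angeles",
    ],
})

# Interpret the naive strings as UTC instants.
utc_times = pd.to_datetime(df["start_date_time"], utc=True)

# Convert each instant to its own row's zone, then drop the offset so that
# every row holds a naive local wall-clock time in one datetime64 column.
local_times = [
    ts.tz_convert(tz).tz_localize(None)
    for ts, tz in zip(utc_times, df["timezone"])
]
df["local_time"] = pd.to_datetime(local_times)

# Local calendar date, usable as a group-by key.
df["local_date"] = df["local_time"].dt.date

# Example indicator (hypothetical): did the event fall in the 06:00-06:59 local hour?
df["in_6am_hour"] = df["local_time"].dt.hour == 6

print(df)
```

Dropping the offset is a deliberate choice here: a single column cannot hold several time zones as a datetime64 dtype, and for "what time was it locally" comparisons the naive wall-clock value is usually what is wanted.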
I am retrieving data from an API which is timestamped in UNIX millisecond time and am trying to save this data to a CSV file. The data is in daily intervals but represented in UNIX millisecond time as mentioned.
I am using pandas functions to convert from milliseconds to datetime, but it is still not saving the data with the time-of-day part. My code is as follows:
ticker = 'tBTCUSD'
r = requests.get(url, params = params)
data = pd.DataFrame(r.json())
data.set_index([0], inplace = True)
data.index = pd.to_datetime(data.index, unit = 'ms' )
data.to_csv('bitfinex_{}_usd_{}.csv'.format(ticker[1:-3].lower(), '1D'), mode='a', header=False)
It saves the data as 2020-08-21 instead of 2020-08-21 00:00:00. When I poll the API on say, an hourly or 15-minutely basis, that still includes the time but on daily intervals it doesn't. I was wondering if there is a step that I am missing to convert the time accordingly from UNIX millisecond to a %Y-%m-%d %H:%M:%S %Z format?
You can always explicitly specify the format:
data.index = pd.to_datetime(data.index, unit='ms').strftime('%Y-%m-%d %H:%M:%S UTC')
print(data)
1 2 3 4 5
0
2020-09-10 00:00:00 UTC 10241.000000 10333.862868 10516.00000 10233.087967 3427.178984
2020-09-09 00:00:00 UTC 10150.000000 10240.000000 10359.00000 10010.000000 2406.147398
2020-09-08 00:00:00 UTC 10400.000000 10148.000000 10464.00000 9882.400000 6761.138356
2020-09-07 00:00:00 UTC 10275.967600 10397.000000 10430.00000 9913.800000 6301.951492
2020-09-06 00:00:00 UTC 10197.000000 10276.000000 10365.07422 10031.000000 2755.663001
... ... ... ... ... ...
2020-05-18 00:00:00 UTC 9668.200000 9714.825163 9944.00000 9450.000000 9201.536549
2020-05-17 00:00:00 UTC 9386.000000 9668.200000 9883.50000 9329.700000 9663.262087
2020-05-16 00:00:00 UTC 9307.600000 9387.952090 9580.00000 9222.000000 4157.691762
2020-05-15 00:00:00 UTC 9791.000000 9311.200000 9848.90000 9130.200000 11340.269781
2020-05-14 00:00:00 UTC 9311.967387 9790.954158 9938.70000 9266.200000 12867.687617
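If you would rather keep the index as real datetimes and only control how it is written, a possible alternative is the date_format argument of to_csv. The sketch below uses two made-up rows (the timestamps are midnight UTC on 2020-09-10 and 2020-09-09) and writes to an in-memory buffer instead of the file from the question.

```python
import io

import pandas as pd

# Two illustrative daily rows; the millisecond values are midnight UTC.
millis = [1599696000000, 1599609600000]
data = pd.DataFrame({1: [10241.0, 10150.0]},
                    index=pd.to_datetime(millis, unit="ms"))

# date_format is applied when the datetime index is written out, so the
# time-of-day part is kept even when every timestamp is exactly midnight.
buffer = io.StringIO()
data.to_csv(buffer, header=False, date_format="%Y-%m-%d %H:%M:%S")
csv_text = buffer.getvalue()
print(csv_text)
```

The index stays a DatetimeIndex in memory, which keeps resampling and slicing available; only the CSV representation changes.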
The starting date format I currently have is 2019-09-04 16:00 UTC+3 and I'm trying to convert it into a datetime format of 2019-09-04 16:00:00+0300.
The format I thought would work was format='%Y-%m-%d %H:%M %Z%z', but when I run it I get the error message ValueError: Cannot parse both %Z and %z.
Does anyone know the correct format to use, or should I be trying a different method altogether? Thanks.
Edit
Sorry, I had a hard time putting into words what it is I am looking to do, hopefully I can clarify.
I'm looking to change all the date and times in a dataframe into the datetime format.
This is the method I was trying to use which presented me with an error
df['datepicker'] = pd.to_datetime(df['datepicker'], format='%Y-%m-%d %H:%M %Z%z')
And here is a sample of the data I currently have.
datepicker
2019-09-07 16:00 UTC+2
2019-09-04 18:30 UTC+4
2019-09-06 17:00 UTC±0
2019-09-10 16:00 UTC+1
2019-09-04 18:00 UTC+3
And this is what I'm looking to convert them into, a timestamp format.
datepicker
2019-09-07 16:00:00+0200
2019-09-04 18:30:00+0400
2019-09-06 17:00:00+0000
2019-09-10 16:00:00+0100
2019-09-04 18:00:00+0300
pandas.to_datetime should parse this happily if you tweak the strings slightly:
import pandas as pd
df = pd.DataFrame({"datepicker":[ "2019-09-07 16:00 UTC+2", "2019-09-04 18:30 UTC+4",
"2019-09-06 17:00 UTC±0", "2019-09-10 16:00 UTC+1",
"2019-09-04 18:00 UTC+3"]})
df['datetime'] = pd.to_datetime(df['datepicker'].str.replace('±', '+'))
# df['datetime']
# 0 2019-09-07 16:00:00-02:00
# 1 2019-09-04 18:30:00-04:00
# 2 2019-09-06 17:00:00+00:00
# 3 2019-09-10 16:00:00-01:00
# 4 2019-09-04 18:00:00-03:00
# Name: datetime, dtype: object
Note that due to the mixed UTC offsets, the column's data type is 'object' (datetime objects). If you wish, you can also convert to UTC straight away, to get a column of dtype datetime64[ns, UTC]:
df['UTC'] = pd.to_datetime(df['datepicker'].str.replace('±', '+'), utc=True)
# df['UTC']
# 0 2019-09-07 18:00:00+00:00
# 1 2019-09-04 22:30:00+00:00
# 2 2019-09-06 17:00:00+00:00
# 3 2019-09-10 17:00:00+00:00
# 4 2019-09-04 21:00:00+00:00
# Name: UTC, dtype: datetime64[ns, UTC]
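Note that in the output above "UTC+2" comes back as -02:00, which is the opposite sign from what the question asks for. If "UTC+2" is meant as two hours ahead of UTC (an assumption about the data), one way to make the sign explicit is to rewrite the suffix into a numeric offset and parse it with %z. The following is a minimal standard-library sketch; the helper name parse_with_utc_offset and the regular expression are illustrative, and only whole-hour offsets are handled.

```python
import re
from datetime import datetime

# Matches e.g. "2019-09-07 16:00 UTC+2" or "2019-09-06 17:00 UTC±0".
_PATTERN = re.compile(
    r"^(?P<stamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}) UTC(?P<sign>[+\-±])(?P<hours>\d{1,2})$"
)

def parse_with_utc_offset(text):
    """Parse 'YYYY-MM-DD HH:MM UTC+H', reading +H as H hours ahead of UTC."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"unexpected format: {text!r}")
    # Treat the plus-minus sign used for a zero offset as a plain plus.
    sign = "-" if match["sign"] == "-" else "+"
    offset = f"{sign}{int(match['hours']):02d}00"
    return datetime.strptime(f"{match['stamp']} {offset}", "%Y-%m-%d %H:%M %z")

samples = ["2019-09-07 16:00 UTC+2", "2019-09-06 17:00 UTC±0"]
for sample in samples:
    print(parse_with_utc_offset(sample).strftime("%Y-%m-%d %H:%M:%S%z"))
```

On a dataframe this could be applied with df['datepicker'].apply(parse_with_utc_offset); the result would still be an object column because the offsets differ per row.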
When I define it as below, it works as you expect.
from datetime import datetime, timedelta, timezone
UTC = timezone(timedelta(hours=+3))
dt = datetime(2019, 1, 1, 12, 0, 0, tzinfo=UTC)
timestampStr = dt.strftime("%Y-%m-%d %H:%M %Z%z")
print(timestampStr)
With the output of:
2019-01-01 12:00 UTC+03:00+0300
I have the following column
Time
2:00
00:13
1:00
00:24
in object format (strings). Each value is a number of hours and minutes before a reference time that I need to use as a start: 8:00 in this example (it might change).
Since the values in the Time column are offsets into the past, the result I expect is
Time
6:00
07:47
7:00
07:36
calculated as time difference (e.g. 8:00 - 2:00).
However, I am having difficulty doing this calculation and transforming the result into a datetime (keeping only hours and minutes).
I hope you can help me.
Since the Time column contains only Hour:Minute, I suggest using timedelta instead of datetime:
df['Time'] = pd.to_timedelta(df.Time+':00')
df['Start_Time'] = pd.to_timedelta('8:00:00') - df['Time']
Output:
Time Start_Time
0 02:00:00 06:00:00
1 00:13:00 07:47:00
2 01:00:00 07:00:00
3 00:24:00 07:36:00
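If only hours and minutes are wanted in the result, as the question asks, the timedelta can be turned back into "HH:MM" strings. This is a small sketch of one way to do that; the sample frame is rebuilt from the question, and zero-padding the hour is a choice made here rather than something the question specifies.

```python
import pandas as pd

df = pd.DataFrame({"Time": ["2:00", "00:13", "1:00", "00:24"]})

# "H:MM" becomes "H:MM:00" so that it parses as hours and minutes.
elapsed = pd.to_timedelta(df["Time"] + ":00")
remaining = pd.to_timedelta("8:00:00") - elapsed

# Reduce the timedelta to whole minutes, then format as zero-padded HH:MM.
total_minutes = (remaining.dt.total_seconds() // 60).astype(int)
hours = (total_minutes // 60).astype(str).str.zfill(2)
minutes = (total_minutes % 60).astype(str).str.zfill(2)
df["Start_Time"] = hours + ":" + minutes

print(df)
```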
You can do it using pd.to_datetime:
ref = pd.to_datetime('08:00') #here define the hour of reference
s = ref-pd.to_datetime(df['Time'])
print (s)
0 06:00:00
1 07:47:00
2 07:00:00
3 07:36:00
Name: Time, dtype: timedelta64[ns]
This returns a Series, which can be changed to a DataFrame with s.to_frame(), for example.
I'm using rrule as shown here:
https://labix.org/python-dateutil#head-470fa22b2db72000d7abe698a5783a46b0731b57
I'm wondering if it is somehow possible to create a rule where different times are specified for different weekdays
e.g. WEEKLY Thursday 6pm and Saturday 10am
Hope someone can help :)
A single rrule cannot specify separate day/hour pairs, but you could use an rrule.rruleset to combine rrules:
import datetime as DT
import dateutil.rrule as RR
today = DT.date.today()
aset = RR.rruleset()
aset.rrule(RR.rrule(RR.WEEKLY, byweekday=RR.TH, byhour=18, count=3, dtstart=today))
aset.rrule(RR.rrule(RR.WEEKLY, byweekday=RR.SA, byhour=10, count=3, dtstart=today))
for date in aset:
    print(date)
yields
2015-03-26 18:00:00
2015-03-28 10:00:00
2015-04-02 18:00:00
2015-04-04 10:00:00
2015-04-09 18:00:00
2015-04-11 10:00:00
This is my current code
class TimeSeries():
    def year(year):
        today = datetime.now()
        start_date = today+relativedelta(years=-1)
        mint, maxt = datetime.min.time(), datetime.max.time()
        for st in rrule(MONTHLY, count=24, bymonthday=(1,-1,), dtstart=start_date):
            yield st.combine(st, mint)
And this is output from this:
for y in TimeSeries().year():
    print(y)
2013-01-31 00:00:00
2013-02-01 00:00:00
2013-02-28 00:00:00
2013-03-01 00:00:00
2013-03-31 00:00:00
2013-04-01 00:00:00
2013-04-30 00:00:00
2013-05-01 00:00:00
2013-05-31 00:00:00
2013-06-01 00:00:00
2013-06-30 00:00:00
2013-07-01 00:00:00
2013-07-31 00:00:00
2013-08-01 00:00:00
2013-08-31 00:00:00
2013-09-01 00:00:00
2013-09-30 00:00:00
2013-10-01 00:00:00
2013-10-31 00:00:00
2013-11-01 00:00:00
2013-11-30 00:00:00
2013-12-01 00:00:00
2013-12-31 00:00:00
2014-01-01 00:00:00
The question is how I can make the sequence start at 2013-01-01 00:00:00, with each month end given as e.g. 2013-01-31 23:59:59, and so on.
The loop should also end on 2014-01-31 23:59:59 instead of 2014-01-01 00:00:00.
I would also like the start date and end date on one line:
2013-03-01 00:00:00 2013-03-31 23:59:59
2013-04-01 00:00:00 2013-04-30 23:59:59
...
...
2014-01-01 00:00:00 2014-01-31 23:59:59
Any suggestion?
First, are you really sure that you want 2013-03-31 23:59:59? Date intervals are traditionally specified as half-open intervals, just like ranges in Python. The reason is that 23:59:59 is not actually the end of a day.
Most obviously, 23:59:59.001 is later than that but on the same day. Python datetime objects include microseconds, so this isn't just a "meh, whatever" problem—if you, e.g., call now(), you can get a time that's incorrectly later than your "end of the day" on the same day.
Less obviously, on a day with a leap second, 23:59:60 is also later but on the same day.
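To make the half-open convention concrete, here is a minimal sketch. The helper names month_interval and in_month are made up for illustration: the end of the interval is the first instant of the next month, and membership is tested with start <= moment < end.

```python
from datetime import datetime

def month_interval(year, month):
    """Return (start, end), where end is the first instant of the next month."""
    start = datetime(year, month, 1)
    if month == 12:
        end = datetime(year + 1, 1, 1)
    else:
        end = datetime(year, month + 1, 1)
    return start, end

def in_month(moment, year, month):
    """Half-open membership test: start is included, end is excluded."""
    start, end = month_interval(year, month)
    return start <= moment < end

# Half a second after 23:59:59 on the last day of March 2013.
late = datetime(2013, 3, 31, 23, 59, 59, 500000)

print(in_month(late, 2013, 3))                    # half-open test
print(late <= datetime(2013, 3, 31, 23, 59, 59))  # closed "end of day" test
```

The half-open test still counts the late timestamp as part of March, while the closed comparison against 23:59:59 excludes it.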
But if you really want the 23:59:59 end time, there are two obvious ways to get it:
You're already iterating dates instead of datetimes and combining the times in manually. And it's obvious when you're dealing with a day 1 vs. day -1, because the date's day member will be 1 or it won't be. So:
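from datetime import datetime
from dateutil.relativedelta import relativedelta
from dateutil.rrule import rrule, MONTHLY

class TimeSeries():
    def year(self):
        today = datetime.now()
        start_date = today+relativedelta(years=-1)
        # maxt is 23:59:59.999999, the last representable time of a day
        mint, maxt = datetime.min.time(), datetime.max.time()
        for st in rrule(MONTHLY, count=24, bymonthday=(1, -1,), dtstart=start_date):
            # first of the month gets midnight, last of the month gets maxt
            yield st.combine(st, mint if st.day == 1 else maxt)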
Alternatively, instead of iterating both first and last days, just iterate first days, and subtract a second to get the last second of the previous month:
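from datetime import datetime, timedelta
from dateutil.relativedelta import relativedelta
from dateutil.rrule import rrule, MONTHLY

class TimeSeries():
    def year(self):
        today = datetime.now()
        start_date = today+relativedelta(years=-1)
        mint = datetime.min.time()
        for st in rrule(MONTHLY, count=24, bymonthday=(1,), dtstart=start_date):
            dt = st.combine(st, mint)
            # one second before midnight on the 1st is 23:59:59 on the last day of the previous month
            yield dt - timedelta(seconds=1)
            yield dt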
As far as printing these in pairs… well, as written, that's an underspecified problem. The first value in your list is the second value in a pair—except when you run this on the 1st of a month. And likewise, the last date is the first value in a pair, except when you run this on the 31st. So, what do you want to do with them?
If this isn't obvious, look at your example. Your first value is 2013-01-31 00:00:00, but your first pair doesn't start with 2013-01-31.
There are many things you could want here:
Start with the first of the month a year ago, rather than the first first-or-last of the month that happened within the last year. And likewise for the end. So you would have 2013-01-01 in your list, and there would always be pairs.
Start with the first month that started within the last year, and likewise for the end. So you wouldn't get 2013-01-31 in your list, and there would always be pairs.
Use your current rule, and where there isn't a pair, use None for the missing value.
etc.
Whatever rule you actually want can be coded up pretty easily. And then you'll probably want to yield in (start, end) tuples, so the print loop can just do this:
for start, end in TimeSeries().year():
    print(start, end)
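As one illustration of the first option in the list (always start on the first of a month, so every value has a partner), here is a standard-library sketch that yields (start, end) tuples. The function name month_bounds and its parameters are made up for this example, and it uses the closed 23:59:59 end from the question rather than a half-open interval.

```python
import calendar
from datetime import datetime

def month_bounds(start_year, start_month, months):
    """Yield (first instant, 23:59:59 on the last day) for consecutive months."""
    year, month = start_year, start_month
    for _ in range(months):
        # monthrange returns (weekday of the 1st, number of days in the month).
        last_day = calendar.monthrange(year, month)[1]
        start = datetime(year, month, 1)
        end = datetime(year, month, last_day, 23, 59, 59)
        yield start, end
        # Advance to the next month, rolling over the year after December.
        month += 1
        if month > 12:
            year, month = year + 1, 1

pairs = list(month_bounds(2013, 1, 13))
for start, end in pairs:
    print(start, end)
```

Starting at January 2013 and asking for 13 months gives pairs running from 2013-01-01 00:00:00 / 2013-01-31 23:59:59 through 2014-01-01 00:00:00 / 2014-01-31 23:59:59, which is the range the question describes.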