I am retrieving data from an API that is timestamped in UNIX millisecond time, and I am trying to save this data to a CSV file. The data is in daily intervals but, as mentioned, represented in UNIX millisecond time.
I am using pandas functions to convert from milliseconds to datetime, but the data is still not saved with the time-of-day part. My code is as follows:
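import pandas as pd
import requests
ticker = 'tBTCUSD'
# url and params are defined elsewhere (not shown in the question)
r = requests.get(url, params = params)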
data = pd.DataFrame(r.json())
data.set_index([0], inplace = True)
data.index = pd.to_datetime(data.index, unit = 'ms' )
data.to_csv('bitfinex_{}_usd_{}.csv'.format(ticker[1:-3].lower(), '1D'), mode='a', header=False)
It saves the data as 2020-08-21 instead of 2020-08-21 00:00:00. When I poll the API on say, an hourly or 15-minutely basis, that still includes the time but on daily intervals it doesn't. I was wondering if there is a step that I am missing to convert the time accordingly from UNIX millisecond to a %Y-%m-%d %H:%M:%S %Z format?
You can always explicitly specify the format:
data.index = pd.to_datetime(data.index, unit='ms').strftime('%Y-%m-%d %H:%M:%S UTC')
print(data)
1 2 3 4 5
0
2020-09-10 00:00:00 UTC 10241.000000 10333.862868 10516.00000 10233.087967 3427.178984
2020-09-09 00:00:00 UTC 10150.000000 10240.000000 10359.00000 10010.000000 2406.147398
2020-09-08 00:00:00 UTC 10400.000000 10148.000000 10464.00000 9882.400000 6761.138356
2020-09-07 00:00:00 UTC 10275.967600 10397.000000 10430.00000 9913.800000 6301.951492
2020-09-06 00:00:00 UTC 10197.000000 10276.000000 10365.07422 10031.000000 2755.663001
... ... ... ... ... ...
2020-05-18 00:00:00 UTC 9668.200000 9714.825163 9944.00000 9450.000000 9201.536549
2020-05-17 00:00:00 UTC 9386.000000 9668.200000 9883.50000 9329.700000 9663.262087
2020-05-16 00:00:00 UTC 9307.600000 9387.952090 9580.00000 9222.000000 4157.691762
2020-05-15 00:00:00 UTC 9791.000000 9311.200000 9848.90000 9130.200000 11340.269781
2020-05-14 00:00:00 UTC 9311.967387 9790.954158 9938.70000 9266.200000 12867.687617
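As an alternative, here is a minimal sketch that assumes the goal is only to control how the timestamps are written to the file. `to_csv` accepts a `date_format` argument, so the index can stay a real datetime index in memory. The sample millisecond values, the `close` column, and the in-memory buffer are illustrative assumptions, not data from the API.

```python
import io

import pandas as pd

# Illustrative daily timestamps in UNIX milliseconds (midnight UTC).
ms = [1597968000000, 1598054400000]
data = pd.DataFrame({"close": [11500.0, 11600.0]}, index=ms)
data.index = pd.to_datetime(data.index, unit="ms")

# date_format controls how datetimes are rendered, including a datetime index.
buffer = io.StringIO()
data.to_csv(buffer, header=False, date_format="%Y-%m-%d %H:%M:%S")
csv_text = buffer.getvalue()
print(csv_text)
```

Unlike `strftime` on the index, this leaves `data.index` as datetimes, so later resampling or slicing by date still works.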
I have a dataframe with columns:
time: time in UTC format
timezone: the corresponding timezone.
time timezone
0 2022-12-28T20:16:31.373Z Europe/Athens
1 2022-07-28T20:16:31.373Z Europe/Athens
2 2022-11-01T21:35:35.865Z Europe/Dublin
3 2022-08-03T19:44:07.611Z America/Los_Angeles
4 2022-08-02T12:44:44.360Z Europe/Minsk
I want to:
Convert UTC time to Local time (using timezone)
Remove the Timezone and just keep the datetime
It seems to me that this solution works, but I want to make sure that I am not missing something (e.g. that it doesn't handle daylight saving time).
import pandas as pd
# example dataframe
df = pd.DataFrame({
'time' : ['2022-12-28T20:16:31.373Z', '2022-07-28T20:16:31.373Z', '2022-11-01T21:35:35.865Z', '2022-08-03T19:44:07.611Z', '2022-08-02T12:44:44.360Z'],
'timezone': ['Europe/Athens', 'Europe/Athens', 'Europe/Dublin', 'America/Los_Angeles', 'Europe/Minsk']
})
# function
def get_local_time(timestamp: pd.Timestamp, timezone: str) -> pd.Timestamp:
    # parse the UTC string, convert to the row's timezone, then drop the tz info
    timestamp = pd.to_datetime(timestamp).tz_convert(timezone).replace(tzinfo=None)
    return timestamp
df['local_time'] = df.apply(lambda row: get_local_time(row['time'], row['timezone']), axis = 1).dt.round(freq='s')
print (df)
---
OUT:
time timezone local_time
0 2022-12-28T20:16:31.373Z Europe/Athens 2022-12-28 22:16:31
1 2022-07-28T20:16:31.373Z Europe/Athens 2022-07-28 23:16:31
2 2022-11-01T21:35:35.865Z Europe/Dublin 2022-11-01 21:35:36
3 2022-08-03T19:44:07.611Z America/Los_Angeles 2022-08-03 12:44:08
4 2022-08-02T12:44:44.360Z Europe/Minsk 2022-08-02 15:44:44
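One way to check the daylight saving concern is to compare a winter and a summer timestamp for the same zone. The output above already shows Athens at +2 hours in December and +3 hours in July. The sketch below makes that check explicit. The helper name `to_local_naive` is hypothetical, and `tz_localize(None)` is used here as an assumed equivalent of `replace(tzinfo=None)` for dropping the timezone.

```python
import pandas as pd


def to_local_naive(utc_string: str, tz_name: str) -> pd.Timestamp:
    """Parse a UTC timestamp string, convert to tz_name, and drop the tz info."""
    return pd.to_datetime(utc_string).tz_convert(tz_name).tz_localize(None)


winter = to_local_naive("2022-12-28T20:16:31.373Z", "Europe/Athens")
summer = to_local_naive("2022-07-28T20:16:31.373Z", "Europe/Athens")

# Offsets from UTC in hours, recovered from the local wall-clock results.
winter_offset = (winter - pd.Timestamp("2022-12-28 20:16:31.373")).total_seconds() / 3600
summer_offset = (summer - pd.Timestamp("2022-07-28 20:16:31.373")).total_seconds() / 3600
print(winter_offset, summer_offset)
```

If the two offsets differ by one hour, the conversion is picking up the zone's daylight saving rules rather than a fixed offset.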
The starting date format I currently have is 2019-09-04 16:00 UTC+3 and I'm trying to convert it into a datetime format of 2019-09-04 16:00:00+0300.
The format I thought would work was format='%Y-%m-%d %H:%M %Z%z', but when I run it I get the error message ValueError: Cannot parse both %Z and %z.
Does anyone know the correct format to use, or should I be trying a different method altogether? Thanks.
Edit
Sorry, I had a hard time putting into words what it is I am looking to do, hopefully I can clarify.
I'm looking to change all the date and times in a dataframe into the datetime format.
This is the method I was trying to use, which raised the error:
df['datepicker'] = pd.to_datetime(df['datepicker'], format='%Y-%m-%d %H:%M %Z%z')
And here is a sample of the data I currently have.
datepicker
2019-09-07 16:00 UTC+2
2019-09-04 18:30 UTC+4
2019-09-06 17:00 UTC±0
2019-09-10 16:00 UTC+1
2019-09-04 18:00 UTC+3
And this is what I'm looking to convert them into, a timestamp format.
datepicker
2019-09-07 16:00:00+0200
2019-09-04 18:30:00+0400
2019-09-06 17:00:00+0000
2019-09-10 16:00:00+0100
2019-09-04 18:00:00+0300
pandas.to_datetime should parse this happily if you tweak the strings slightly:
import pandas as pd
df = pd.DataFrame({"datepicker":[ "2019-09-07 16:00 UTC+2", "2019-09-04 18:30 UTC+4",
"2019-09-06 17:00 UTC±0", "2019-09-10 16:00 UTC+1",
"2019-09-04 18:00 UTC+3"]})
df['datetime'] = pd.to_datetime(df['datepicker'].str.replace('±', '+'))
# df['datetime']
# 0 2019-09-07 16:00:00-02:00
# 1 2019-09-04 18:30:00-04:00
# 2 2019-09-06 17:00:00+00:00
# 3 2019-09-10 16:00:00-01:00
# 4 2019-09-04 18:00:00-03:00
# Name: datetime, dtype: object
Note that due to the mixed UTC offsets, the column's data type is 'object' (it holds individual datetime objects). If you wish, you can also convert to UTC straight away, to get a column of dtype datetime64[ns, UTC]:
df['UTC'] = pd.to_datetime(df['datepicker'].str.replace('±', '+'), utc=True)
# df['UTC']
# 0 2019-09-07 18:00:00+00:00
# 1 2019-09-04 22:30:00+00:00
# 2 2019-09-06 17:00:00+00:00
# 3 2019-09-10 17:00:00+00:00
# 4 2019-09-04 21:00:00+00:00
# Name: UTC, dtype: datetime64[ns, UTC]
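Note that in the output above the parsed offsets have the opposite sign from the input ("UTC+2" became -02:00), which is consistent with a POSIX-style reading of "UTC+2". If the strings are meant in the ISO sense (UTC+2 is two hours ahead of UTC), one option is to rewrite the suffix into a plain `%z` offset before parsing. This is a minimal sketch under that assumption; `normalize_offset` is a hypothetical helper and it assumes whole-hour offsets only.

```python
import re
from datetime import datetime

import pandas as pd


def normalize_offset(text: str) -> str:
    """Turn '2019-09-07 16:00 UTC+2' into '2019-09-07 16:00 +0200'."""
    match = re.fullmatch(r"(.+) UTC([+\-±])(\d{1,2})", text)
    if match is None:
        raise ValueError(f"unexpected format: {text!r}")
    naive, sign, hours = match.groups()
    sign = "+" if sign == "±" else sign  # treat ±0 as +0
    return f"{naive} {sign}{int(hours):02d}00"


raw = pd.Series(["2019-09-07 16:00 UTC+2",
                 "2019-09-04 18:30 UTC+4",
                 "2019-09-06 17:00 UTC±0"])
normalized = raw.map(normalize_offset)

# Keep each row's own offset, formatted as a string ...
as_text = normalized.map(
    lambda s: datetime.strptime(s, "%Y-%m-%d %H:%M %z").strftime("%Y-%m-%d %H:%M:%S%z")
)
# ... or convert everything to UTC for a uniform datetime64[ns, UTC] column.
as_utc = pd.to_datetime(normalized, format="%Y-%m-%d %H:%M %z", utc=True)
print(as_text.tolist())
print(as_utc)
```

With this reading, 16:00 at UTC+2 corresponds to 14:00 UTC rather than 18:00 UTC, so it is worth confirming which convention the data source uses before choosing either approach.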
When I define the timezone as below, it works as you expect.
from datetime import datetime, timedelta, timezone
UTC = timezone(timedelta(hours=+3))
dt = datetime(2019, 1, 1, 12, 0, 0, tzinfo=UTC)
timestampStr = dt.strftime("%Y-%m-%d %H:%M %Z%z")
print(timestampStr)
With the output of:
2019-01-01 12:00 UTC+03:00+0300
I have the following column
Time
2:00
00:13
1:00
00:24
in object format (strings). This time refers to hours and minutes ago from a time that I need to use as a start: 8:00 (it might change; in this example is 8:00).
Since the times in the column Time are referring to hours/minutes ago, what I would like to expect should be
Time
6:00
07:47
7:00
07:36
calculated as time difference (e.g. 8:00 - 2:00).
However, I am having difficulty doing this calculation and transforming the result into a datetime (keeping only hours and minutes).
I hope you can help me.
Since the Time column contains only Hour:Minute values, I suggest using timedelta instead of datetime:
df['Time'] = pd.to_timedelta(df.Time+':00')
df['Start_Time'] = pd.to_timedelta('8:00:00') - df['Time']
Output:
Time Start_Time
0 02:00:00 06:00:00
1 00:13:00 07:47:00
2 01:00:00 07:00:00
3 00:24:00 07:36:00
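If only hours and minutes should remain, as in the expected output of the question, the timedelta result can be formatted back into HH:MM strings. This is a minimal sketch that assumes the result never goes negative or past 24 hours; the intermediate names `ago`, `start`, and `total_minutes` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Time": ["2:00", "00:13", "1:00", "00:24"]})

# "Hours:minutes ago" as timedeltas, subtracted from the 8:00 reference.
ago = pd.to_timedelta(df["Time"] + ":00")
start = pd.to_timedelta("8:00:00") - ago

# Rebuild zero-padded HH:MM strings from the total number of minutes.
total_minutes = (start.dt.total_seconds() // 60).astype(int)
hours = (total_minutes // 60).astype(str).str.zfill(2)
minutes = (total_minutes % 60).astype(str).str.zfill(2)
df["Start_Time"] = hours + ":" + minutes
print(df)
```

The column ends up as strings, which suits display or export; keep the timedelta version if further arithmetic is needed.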
You can also do it using pd.to_datetime:
ref = pd.to_datetime('08:00') #here define the hour of reference
s = ref-pd.to_datetime(df['Time'])
print (s)
0 06:00:00
1 07:47:00
2 07:00:00
3 07:36:00
Name: Time, dtype: timedelta64[ns]
This returns a Series, which can be converted to a DataFrame with s.to_frame(), for example.
I have a dataframe that has entries like this, where the times are in UTC:
start_date_time timezone
1 2017-01-01 14:00:00 America/Los_Angeles
2 2017-01-01 14:00:00 America/Denver
3 2017-01-01 14:00:00 America/Phoenix
4 2017-01-01 14:30:00 America/Los_Angeles
5 2017-01-01 14:30:00 America/Los_Angeles
I need to be able to group by date (local date, not UTC date) and I need to be able to create indicators for whether the event happened between certain times (local times, not UTC times).
I have successfully done the above in R by:
Creating a time variable in each of the timezones
Converting those to strings
Pulling each of the string date/time variables into one column, which one I pull depends on the appropriate timezone
Then, splitting that column to get a string date column and a string time column
I can then convert everything back to datetime objects for comparisons. e.g. now I can say if something happened between 2 and 3pm and it will correctly identify everything that happened between 2 and 3pm locally.
I have tried a number of things in Python and have the dates as
2017-01-02 04:30:00-08:00
but I can't figure out how to go from there to
2017-01-01 20:30:00
Thanks!
Your example is incorrect. Your timezone is eight hours behind UTC, which means you need to add eight hours to 4:30 AM, which gives 12:30 PM UTC.
The datetime method astimezone(...) will do the conversion for you. For ease of use, I recommend pytz.
However, in pure Python:
import datetime as dt
local_tz = dt.timezone(dt.timedelta(hours=-8))
utc = dt.timezone.utc
d = dt.datetime(2017, 1, 2, 4, 30, 0, 0, local_tz)
print(d, d.astimezone(utc))
Will print:
2017-01-02 04:30:00-08:00 2017-01-02 12:30:00+00:00
Here's an example using pytz to lookup time zones:
import datetime as dt
import pytz
dates = [("2017-01-01 14:00:00", "America/Los_Angeles"),
("2017-01-01 14:00:00", "America/Denver"),
("2017-01-01 14:00:00", "America/Phoenix"),
("2017-01-01 14:30:00", "America/Los_Angeles"),
]
for d, tz_str in dates:
    start = dt.datetime.strptime(d, "%Y-%m-%d %H:%M:%S")
    start = start.replace(tzinfo=pytz.utc)
    local_tz = pytz.timezone(tz_str) # convert to desired timezone
    print(start, local_tz.zone, "\t", start.astimezone(local_tz))
This produces:
2017-01-01 14:00:00+00:00 America/Los_Angeles 2017-01-01 06:00:00-08:00
2017-01-01 14:00:00+00:00 America/Denver 2017-01-01 07:00:00-07:00
2017-01-01 14:00:00+00:00 America/Phoenix 2017-01-01 07:00:00-07:00
2017-01-01 14:30:00+00:00 America/Los_Angeles 2017-01-01 06:30:00-08:00
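The same idea can be sketched directly on the dataframe from the question, which needs a local date to group by and a local time of day for indicators. This assumes `start_date_time` holds naive UTC strings, as the question states. The helper `to_local_naive` and the new column names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "start_date_time": ["2017-01-01 14:00:00", "2017-01-01 14:00:00", "2017-01-01 14:30:00"],
    "timezone": ["America/Los_Angeles", "America/Denver", "America/Los_Angeles"],
})

# Mark the naive timestamps as UTC so they can be converted per row.
utc = pd.to_datetime(df["start_date_time"]).dt.tz_localize("UTC")


def to_local_naive(ts: pd.Timestamp, tz_name: str) -> pd.Timestamp:
    """Convert an aware timestamp to tz_name and drop the tz info."""
    return ts.tz_convert(tz_name).tz_localize(None)


# Each row has its own zone, so convert row by row into local wall-clock time.
df["local"] = [to_local_naive(ts, tz) for ts, tz in zip(utc, df["timezone"])]
df["local_date"] = df["local"].dt.date
df["is_6am_hour"] = df["local"].dt.hour == 6  # example local-time indicator
print(df)
```

Dropping the timezone after conversion is what allows all rows to share one datetime64 column, at the cost of no longer being able to recover the UTC instant from `local` alone.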
I have plain strings, several million data points, loaded from a .csv file in the format below:
Datetime
22/12/2015 17:00:00
22/12/2015 18:00:00
I loaded them into pandas and tried to convert them to datetime using pandas.to_datetime(df['Datetime']). However, the resulting time series is not correct: some new datetimes are produced during the conversion. For example, 2016-12-11 23:30:00 is not in the original data.
It has been a while since I worked with pandas, but the date format in your example differs from the one in the sample lines from the CSV:
yyyy-mm-dd hh:mm:ss
instead of
dd/mm/yyyy hh:mm:ss
The to_datetime function takes a "format" parameter, which should help if that is the cause.
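As a minimal sketch of the `format` suggestion, assuming every value in the column follows the day-first layout shown in the question, an explicit format string removes the ambiguity for dates whose day is 12 or less. The sample values are taken from this thread.

```python
import pandas as pd

raw = pd.Series(["22/12/2015 17:00:00", "22/12/2015 18:00:00", "11/12/2015 23:30:00"])

# %d/%m/%Y states explicitly that the day comes before the month.
parsed = pd.to_datetime(raw, format="%d/%m/%Y %H:%M:%S")
print(parsed)
```

An explicit format also tends to be faster on millions of rows than letting pandas infer the layout, and it raises an error on rows that do not match instead of silently misreading them.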
You want to use the option dayfirst=True:
pd.to_datetime(df.Datetime, dayfirst=True)
This:
Datetime
22/12/2015 17:00:00
22/12/2015 18:00:00
11/12/2015 23:30:00
Gets converted to
0 2015-12-22 17:00:00
1 2015-12-22 18:00:00
2 2015-12-11 23:30:00
Name: Datetime, dtype: datetime64[ns]