I connect to an API that provides covid-19 data in Brazil organized by state and city, as follows:
#Bibliotecas
import pandas as pd
from pandas import Series, DataFrame, Panel
import matplotlib.pyplot as plt
from matplotlib.pyplot import plot_date, axis, show, gcf
import numpy as np
from urllib.request import Request, urlopen
import urllib
from http.cookiejar import CookieJar
import numpy as np
from datetime import datetime, timedelta
cj = CookieJar()
url_Bso = "https://brasil.io/api/dataset/covid19/caso_full/data?state=MG&city=Barroso"
req_Bso = urllib.request.Request(url_Bso, None, {"User-Agent": "python-urllib"})
opener_Bso = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
response_Bso = opener_Bso.open(req_Bso)
raw_response_Bso = response_Bso.read()
json_Bso = pd.read_json(raw_response_Bso)
results_Bso = json_Bso['results']
results_Bso = results_Bso.to_dict().values()
df_Bso = pd.DataFrame(results_Bso)
df_Bso.head(5)
This Api compiles the data released by the state health departments. However, there is a difference between the records of the state and city health departments, and the state records are out of date in relation to those of the cities. I would like to update Thursdays and Saturdays (the day when the epidemiological week ends). I'm trying the following:
saturday = datetime.today() + timedelta(days=-5)
yesterday = datetime.today() + timedelta(days=-1)
last_available_confirmed_day_Bso_saturday = 51
last_available_confirmed_day_Bso_yesterday = 54
df_Bso = df_Bso.loc[df_Bso['date'] == saturday, ['last_available_confirmed']] = last_available_confirmed_day_Bso_saturday
df_Bso = df_Bso.loc[df_Bso['date'] == yesterday, ['last_available_confirmed']] = last_available_confirmed_day_Bso_yesterday
df_Bso
However, I get the error:
> AttributeError: 'int' object has no attribute 'loc'
I need another dataframe with the values of these days updates. Can anyone help?
You have to adjust the date. Your data frame date column is a string. You can convert them to datetime.
today = datetime.now()
last_sat_num = (today.weekday() + 2) % 7
last_thu_num = (today.weekday() + 4) % 7
last_sat = today - timedelta(last_sat_num)
last_thu = today - timedelta(last_thu_num)
last_sat_str = last_sat.strftime('%Y-%m-%d')
last_thu_str = last_thu.strftime('%Y-%m-%d')
last_available_confirmed_day_Bso_sat = 51
last_available_confirmed_day_Bso_thu = 54
df_Bso2 = df_Bso.copy()
df_Bso2.loc[df_Bso2['date'] == last_sat_str, ['last_available_confirmed']] = last_available_confirmed_day_Bso_sat
df_Bso2.loc[df_Bso2['date'] == last_thu_str, ['last_available_confirmed']] = last_available_confirmed_day_Bso_thu
df_Bso2[['date', 'last_available_confirmed']].head(10)
Output
date last_available_confirmed
0 2020-07-15 44
1 2020-07-14 43
2 2020-07-13 40
3 2020-07-12 40
4 2020-07-11 51
5 2020-07-10 39
6 2020-07-09 36
7 2020-07-08 36
8 2020-07-07 27
9 2020-07-06 27
Related
I want to calculate difference between two time columns without considering non-business hours. I have used pyholidays, which worked totally fine. But even when i define starttime and endtime for Business-duration, Result still includes Non-Business Hours as you shown in attached photos.
for index, row in df.iterrows():
first=row['New']
second=row['Assigned']
third=row['In Progress']
if(pd.notnull(second)):
starttime = (8,0,0)
endtime = (17,0,0)
holidaylist = pyholidays.Germany()
unit='hour'
row['AP'] = businessDuration(first,second,holidaylist=holidaylist,unit=unit)
else:
starttime = (8,0,0)
endtime = (17,0,0)
holidaylist = pyholidays.Germany()
unit='hour'
row['AP'] = businessDuration(first,third,holidaylist=holidaylist,unit=unit)
ap.append(row['AP'])
DataFrame
Printed Result
Thank you for your suggestion. I have tried your method, i have also defined calendar instance. Later i was getting 'relativedelta' error which i have somehow solved by 'dateutil'. Now i am at final stage to compute business-hour difference between two columns.
`de_holidays = pyholidays.Germany()
cal = Calendar(holidays=de_holidays, weekdays=['Saturday', 'Sunday'])
df['rp'] = df.apply(lambda row: compute_bizhours_diff(row['Resolved'], row['Pending'], cal=cal, biz_open_time = time(8, 0, 0), biz_close_time = time(17, 0, 0)), axis=1)`
Now i am getting error about month number, which can not be nan. I have also attached photo of errors.
Pic1
Pic2
I do not know if this works, but try this:
# == Imports needed ===========================
from __future__ import annotations
from typing import Any
import pandas as pd
import holidays as pyholidays
from datetime import time
from bizdays import Calendar
from dateutil.relativedelta import relativedelta
# == Functions ==================================
def is_null_dates(*dates: Any) -> bool:
"""Determine whether objects are valid dates.
Parameters
----------
dates : Any
Variables to check whether they hold a valid date, or not.
Returns
-------
bool
True, if at least one informed value is not a date.
False otherwise.
"""
for date in dates:
if pd.isna(pd.to_datetime(date, errors='coerce')):
return True
return False
def compute_bizhours_diff(
start_date: str | pd.Timestamp,
end_date: str | pd.Timestamp,
biz_open_time: datetime.time | None = None,
biz_close_time: datetime.time | None = None,
cal: bizdays.Calendar | None = None,
) -> float:
"""Compute the number of business hours between two dates.
Parameters
----------
start_date : str | pd.Timestamp
The first date.
end_date : str | pd.Timestamp
The final date.
biz_open_time : datetime.time | None
The beginning hour/minute of a business day.
biz_close_time : datetime.time | None
The ending hour/minute of a business day.
cal : bizdays.Calendar | None
The calendar object used to figure out the number of days between `start_date`
and `end_date` that are not holidays. If None, consider every day as a business day,
except Saturdays, or Sundays.
Returns
-------
float
The total number of business hours between `start_date`, and `end_date`.
Examples
--------
>>> import holidays as pyholidays
>>> from datetime import time
>>> from bizdays import Calendar
>>> # 2022-09-07 is a national holiday in Brazil, therefore only
>>> # the hours between 2022-09-08 09:00:00, and 2022-09-08 15:48:00
>>> # should be considered. This should equal 6.8 hours.
>>> start_date = pd.to_datetime('2022-09-07 15:55:00')
>>> end_date = pd.to_datetime('2022-09-08 15:48:00')
>>> BR_holiday_list = pyholidays.BR(years={start_date.year, end_date.year}, state='RJ')
>>> cal = Calendar(holidays=BR_holiday_list, weekdays=['Saturday', 'Sunday'])
>>> print(compute_bizhours_diff(start_date, end_date, cal=cal))
6.8
>>> # Both dates in the next example are holidays, therefore, the result should be 0.0
>>> start_date = pd.to_datetime('2022-09-07 15:55:00')
>>> end_date = pd.to_datetime('2022-09-07 15:48:00')
>>> print(compute_bizhours_diff(start_date, end_date, cal=cal))
0.0
>>> # What if the end_date preceeds start_date by mistake?
>>> # In such cases, we switch start_date to end_date, and vice-versa.
>>> start_date = pd.to_datetime('2022-09-02 00:00:00')
>>> end_date = pd.to_datetime('2022-09-01 15:55:00')
>>> print(compute_bizhours_diff(start_date, end_date, cal=cal))
2.0833333333333335
>>> # What if the start_date, and end_date begin and finish on the same day, but they both have timestamps that end before
>>> # or after the business hours?
>>> # In such cases, the total number of hours is equal to 0.0
>>> start_date = pd.to_datetime('2022-09-02 00:00:00')
>>> end_date = pd.to_datetime('2022-09-02 8:00:00')
>>> print(compute_bizhours_diff(start_date, end_date, cal=cal))
0.0
"""
if is_null_dates(start_date, end_date):
return pd.NA
if biz_open_time is None:
biz_open_time = time(9, 0, 0)
if biz_close_time is None:
biz_close_time = time(18, 0, 0)
if cal is None:
cal = Calendar(weekdays=['Saturday', 'Sunday'])
open_delta = relativedelta(hour=biz_open_time.hour, minute=biz_open_time.minute)
end_delta = relativedelta(hour=biz_close_time.hour, minute=biz_close_time.minute)
start_date = pd.to_datetime(start_date)
end_date = pd.to_datetime(end_date)
_end_date = max(start_date, end_date)
_start_date = min(start_date, end_date)
start_date = _start_date
end_date = _end_date
start_date = (
start_date if cal.isbizday(start_date) else cal.following(start_date) + open_delta
)
end_date = (
end_date if cal.isbizday(end_date) else cal.preceding(end_date) + end_delta
)
if end_date < start_date:
return 0.00
start_date_biz = max(start_date, start_date + open_delta)
end_first_day = start_date_biz + end_delta
end_date_biz = min(
end_date,
end_date + end_delta
)
start_last_day = end_date_biz + open_delta
if start_last_day > end_date:
end_date_biz = start_last_day
if end_first_day < start_date:
end_first_day = start_date_biz
if end_first_day.date() == end_date_biz.date():
return (end_date_biz - start_date_biz).seconds / 3600
return (
(end_first_day - start_date_biz).seconds
+ (end_date_biz - start_last_day).seconds
+ (
max((len(list(cal.seq(start_date, end_date))) - 2), 0)
* (end_first_day - (start_date + open_delta)).seconds
)
) / 3600
Before running the preceding code, you need to install the following packages, if you do not already have them:
pip install holidays bizdays
Link to both packages' documentation:
bizdays
python-holidays
Examples
Here is how you can use compute_bizhours_diff:
import pandas as pd
import holidays as pyholidays
from datetime import time
from bizdays import Calendar
# OPTIONAL: define custom start, and end to your business hours.
biz_open_time = time(9, 0, 0)
biz_close_time = time(18, 0, 0)
# Define your start, and end dates.
start_date = pd.to_datetime('2022-09-07 04:48:00')
end_date = pd.to_datetime('2022-09-10 15:55:00')
# Create a list of holidays, and create a Calendar instance.
BR_holiday_list = pyholidays.BR(years={start_date.year, end_date.year}, state='RJ')
# For German holidays, you can use something like:
German_holiday_list = pyholidays.Germany(years={start_date.year, end_date.year})
# Define the Calendar instance. Here, we use the German holidays, excluding Saturday, and Sunday from weekdays.
cal = Calendar(holidays=German_holiday_list, weekdays=['Saturday', 'Sunday'])
# Finally, compute the total number of working hours between your two dates:
compute_bizhours_diff(start_date, end_date, cal=cal)
# Returns: 27.0
You can also use the function with pandas dataframes, using apply:
df['working_hours_delta'] = df.apply(lambda row: compute_bizhours_diff(row[START_DATE_COLNAME], row[END_DATE_COLNAME], cal=cal), axis=1)
Notes
The function compute_bizhours_diff is far from perfect. Before using it in any production environment, or for any serious use case, I strongly recommend refactoring it.
Edit
I made some changes to the original answer, to account for instances where start_date, or end_date have null or invalid representations of dates.
Using the example dataframe from your question it now runs fine:
de_holidays = pyholidays.Germany()
cal = Calendar(holidays=de_holidays, weekdays=['Saturday', 'Sunday'])
df = pd.DataFrame(
{
'Assigned': [None, '2022-07-28 10:53:00', '2022-07-28 18:08:00', None, '2022-07-29 12:56:00'],
'In Progress': ['2022-08-01 10:53:00', '2022-08-02 09:32:00', '2022-07-29 12:08:00', '2022-08-02 10:23:00', '2022-07-29 14:54:00'],
'New': ['2022-07-27 15:01:00', '2022-07-28 10:09:00', '2022-07-28 13:37:00', '2022-07-29 00:12:00', '2022-07-29 09:51:00'],
}
).apply(pd.to_datetime)
df['rp'] = df.apply(
lambda row: compute_bizhours_diff(
row['Assigned'], row['In Progress'], cal=cal, biz_open_time = time(8, 0, 0), biz_close_time = time(17, 0, 0)
), axis=1
)
print(df)
# Prints:
# Assigned In Progress New rp
# 0 NaT 2022-08-01 10:53:00 2022-07-27 15:01:00 <NA>
# 1 2022-07-28 10:53:00 2022-08-02 09:32:00 2022-07-28 10:09:00 25.65
# 2 2022-07-28 18:08:00 2022-07-29 12:08:00 2022-07-28 13:37:00 4.133333
# 3 NaT 2022-08-02 10:23:00 2022-07-29 00:12:00 <NA>
# 4 2022-07-29 12:56:00 2022-07-29 14:54:00 2022-07-29 09:51:00 1.966667
try:
# For Python 3.0 and later
from urllib.request import urlopen
except ImportError:
# Fall back to Python 2's urllib2
from urllib2 import urlopen
import certifi
import json
def get_jsonparsed_data(url):
response = urlopen(url, cafile=certifi.where())
data = response.read().decode("utf-8")
return json.loads(data)
url = ("https://financialmodelingprep.com/api/v3/ratios/AAPL?apikey=92a1dad5aef4eb31276c19417c31dfeb")
print(get_jsonparsed_data(URL))
import requests
import pandas as pd
url = (
"https://financialmodelingprep.com/api/v3/ratios/AAPL?"
"apikey=92a1dad5aef4eb31276c19417c31dfeb"
)
response = requests.get(url)
data = response.json()
df = pd.DataFrame(data)
df
prints:
symbol date period currentRatio quickRatio cashRatio daysOfSalesOutstanding daysOfInventoryOutstanding operatingCycle daysOfPayablesOutstanding ... priceToSalesRatio priceEarningsRatio priceToFreeCashFlowsRatio priceToOperatingCashFlowsRatio priceCashFlowRatio priceEarningsToGrowthRatio priceSalesRatio dividendYield enterpriseValueMultiple priceFairValue
0 AAPL 2021-09-25 FY 1.074553 0.909660 0.278449 51.390969 11.276593 62.667561 93.851071 ... 6.786117 26.219656 26.706799 23.861253 23.861253 0.367742 6.786117 0.005828 20.889553 39.348186
1 AAPL 2020-09-26 FY 1.363604 1.218195 0.360710 49.787534 8.741883 58.529418 91.048190 ... 7.272322 34.773150 27.211359 24.746031 24.746031 3.277438 7.272322 0.007053 25.558891 30.553901
2 AAPL 2019-09-28 FY 1.540126 1.384447 0.462022 64.258765 9.263639 73.522404 104.314077 ... 4.420394 20.813515 19.527159 16.573786 16.573786 -62.492578 4.420394 0.012277 14.772472 12.709658
3 AAPL 2018-09-29 FY 1.123843 0.986566 0.221733 67.332499 8.817631 76.150130 124.570214 ... 3.959898 17.666917 16.402259 13.582267 13.582267 0.597709 3.959898 0.013038 13.099961 9.815760
4 AAPL 2017-09-30 FY 1.276063 1.089670 0.201252 56.800671 12.563631 69.364302 126.927606 ... 3.794457 17.989671 17.121402 13.676823 13.676823 1.632758 3.794457 0.014680 12.605749 6.488908
import pandas
print(pandas.DataFrame(data))
I guess maybe what you are trying to do...
import requests
import pandas as pd
import pandas_ta as ta
def stochFourMonitor():
k_period = 14
d_period = 3
data = get_data('BTC-PERP',14400,1642935495,1643165895)
print(data)
data = data['result']
df = pd.DataFrame(data)
df['trailingHigh'] = df['high'].rolling(k_period).max()
df['trailingLow'] = df['low'].rolling(k_period).min()
df['%K'] = (df['close'] - df['trailingLow']) * 100 / (df['trailingHigh'] - df['trailingLow'])
df['%D'] = df['%K'].rolling(d_period).mean()
df.index.name = 'test'
df.set_index(pd.DatetimeIndex(df["startTime"]), inplace=True)
print(df)
df.drop(columns=['startTime'])
print(df)
df.ta.stoch(high='High', low='Low',close= 'Close', k=14, d=3, append=True)
#t = ta.stoch(close='close',high='high', low='low', k=14, d=3, append=True)
#df.ta.stoch(close='close',high='high', low='low', k=14, d=3, append=True)
def get_data(marketName,resolution,start_time,end_time):
data = requests.get('https://ftx.com/api/markets/' + marketName + '/candles?resolution=' + str(resolution) + '&start_time=' + str(start_time) + '&end_time=' + str(end_time)).json()
return data
I keep receiving the error 'NoneType' object has no attribute 'name'. See below for full exception. It seems like the code is not recognizing the pandas_ta module but I don't understand why. Any help troubleshooting would be much appreciated.
Exception has occurred: AttributeError (note: full exception trace is shown but execution is paused at: )
'NoneType' object has no attribute 'name'
File "C:\Users\Jason\Documents\TradingCode\FTX Websocket\testing21.py", line 21, in stochFourMonitor
df.ta.stoch(high='High', low='Low',close= 'Close', k=14, d=3, append=True)
File "C:\Users\Jason\Documents\TradingCode\FTX Websocket\testing21.py", line 31, in (Current frame)
stochFourMonitor()
You have to few values in your dataframe. You need at least 17 values (k=14, d=3)
>>> pd.Timestamp(1642935495, unit='s')
Timestamp('2022-01-23 10:58:15')
>>> pd.Timestamp(1643165895, unit='s')
Timestamp('2022-01-26 02:58:15')
>>> pd.DataFrame(get_data('BTC-PERP',14400,1642935495,1643165895)['result'])
0 2022-01-23T12:00:00+00:00 1.642939e+12 35690.0 36082.0 35000.0 35306.0 6.315513e+08
1 2022-01-23T16:00:00+00:00 1.642954e+12 35306.0 35460.0 34601.0 34785.0 7.246238e+08
2 2022-01-23T20:00:00+00:00 1.642968e+12 34785.0 36551.0 34712.0 36271.0 9.663773e+08
3 2022-01-24T00:00:00+00:00 1.642982e+12 36271.0 36283.0 35148.0 35351.0 6.007333e+08
4 2022-01-24T04:00:00+00:00 1.642997e+12 35351.0 35511.0 34821.0 34896.0 5.554126e+08
5 2022-01-24T08:00:00+00:00 1.643011e+12 34895.0 35610.0 33033.0 33709.0 1.676436e+09
6 2022-01-24T12:00:00+00:00 1.643026e+12 33709.0 34399.0 32837.0 34260.0 2.021096e+09
7 2022-01-24T16:00:00+00:00 1.643040e+12 34261.0 36493.0 33800.0 36101.0 1.989552e+09
8 2022-01-24T20:00:00+00:00 1.643054e+12 36101.0 37596.0 35990.0 36673.0 1.202684e+09
9 2022-01-25T00:00:00+00:00 1.643069e+12 36673.0 36702.0 35974.0 36431.0 4.538093e+08
10 2022-01-25T04:00:00+00:00 1.643083e+12 36431.0 36500.0 35719.0 36067.0 3.514587e+08
11 2022-01-25T08:00:00+00:00 1.643098e+12 36067.0 36824.0 36030.0 36431.0 5.830712e+08
12 2022-01-25T12:00:00+00:00 1.643112e+12 36431.0 37200.0 35997.0 36568.0 9.992247e+08
13 2022-01-25T16:00:00+00:00 1.643126e+12 36568.0 37600.0 36532.0 37079.0 8.225219e+08
14 2022-01-25T20:00:00+00:00 1.643141e+12 37077.0 37140.0 36437.0 36980.0 7.892745e+08
15 2022-01-26T00:00:00+00:00 1.643155e+12 36980.0 37242.0 36567.0 37238.0 3.226400e+08
>>> pd.DataFrame(get_data('BTC-PERP',14400,1642935495,1643165895)['result'])
...
AttributeError: 'NoneType' object has no attribute 'name'
Now change 1642935495 ('2022-01-23 10:58:15') by 1642845495 ('2022-01-22 10:58:15':
>>> pd.DataFrame(get_data('BTC-PERP',14400,1642845495,1643165895)['result']).ta.stoch()
STOCHk_14_3_3 STOCHd_14_3_3
13 NaN NaN
14 NaN NaN
15 80.824814 NaN
16 74.665546 NaN
17 72.970512 76.153624
18 73.930097 73.855385
19 80.993469 75.964693
20 84.814444 79.912670
21 89.775352 85.194422
I am trying to extract historical data between [curr_time - 2years, curr_time]. Time gap is 1 day. So, I expect about 700 items, but i received only 3 items.
How can I fix this problem?
My code
from binance.client import Client
# Binance test_key https://testnet.binance.vision/key/generate
API_KEY = "---"
API_SECRET = "---"
DAYS_IN_YEAR = 365
DB_NAME = "charts"
def GetHistoricalData(
timedelta_days=DAYS_IN_YEAR * 2,
ticker="BTCUSDT",
kline_interval=Client.KLINE_INTERVAL_1HOUR
):
start_time = time.time()
untilThisDate = datetime.datetime.now()
sinceThisDate = untilThisDate - datetime.timedelta(days=timedelta_days)
print("ZZZZZZZZZ_ ", str(sinceThisDate), str(untilThisDate)) # 2019-11-06 00:23:43.620016 2021-11-05 00:23:43.620016
client = Client(API_KEY, API_SECRET)
client.API_URL = 'https://testnet.binance.vision/api'
candle = client.get_historical_klines(ticker, kline_interval, str(sinceThisDate), str(untilThisDate))
print("CANDLE_", len(candle)) # 3
I tried this request:
candle = client.get_historical_klines(ticker, kline_interval, "01 January, 2019", "04 November 2021")
but received only 3 items again
dateTime ...
2021-11-02 00:00:00 61722.80000000 150535.61000000 ... 448.99018200 1635897599999
2021-11-03 00:00:00 63208.69000000 100000.00000000 ... 451.03367500 1635983999999
2021-11-04 00:00:00 62894.04000000 70000.00000000 ... 401.86212800 1636070399999
Well....
If you try to request this data with API call it will give you:
In [1]: import requests
...: len(requests.get('https://testnet.binance.vision/api/v3/klines?symbol=BTCUSDT&interval=1h&limit=1000').json())
Out[1]: 65
but if you try to run it with production env of binance (btw klines/candles is a public data and don't require apiKey):
In [2]: import requests
...: len(requests.get('https://api.binance.com/api/v3/klines?symbol=BTCUSDT&interval=1h&limit=1000').json())
Out[2]: 1000
So, to fix you example, you need replace BASE_URL
client.API_URL = 'https://api.binance.com/api'
It gives me:
ZZZZZZZZZ_ 2019-11-06 01:15:15.122873 2021-11-05 01:15:15.122873
CANDLE_ 17483
Try the code below. I get a bunch of data, but its not formatted:
import datetime
from binance.client import Client
import time
# Binance test_key https://testnet.binance.vision/key/generate
API_KEY = "---"
API_SECRET = "---"
DAYS_IN_YEAR = 365
DB_NAME = "charts"
def GetHistoricalData(
timedelta_days=DAYS_IN_YEAR * 2,
ticker="BTCUSDT",
kline_interval=Client.KLINE_INTERVAL_1HOUR
):
start_time = time.time()
untilThisDate = datetime.datetime.now()
sinceThisDate = untilThisDate - datetime.timedelta(days=timedelta_days)
print("ZZZZZZZZZ_ ", str(sinceThisDate),
str(untilThisDate)) # 2019-11-06 00:23:43.620016 2021-11-05 00:23:43.620016
client = Client(API_KEY, API_SECRET)
client.API_URL = 'https://testnet.binance.vision/api'
candle = client.get_historical_klines(ticker, kline_interval, str(sinceThisDate), str(untilThisDate))
print(candle)
GetHistoricalData()
The data in test.csv are like this:
TIMESTAMP POLYLINE
0 1408039037 [[-8.585676,41.148522],[-8.585712,41.148639],[...
1 1408038611 [[-8.610876,41.14557],[-8.610858,41.145579],[-...
2 1408038568 [[-8.585739,41.148558],[-8.58573,41.148828],[-...
3 1408039090 [[-8.613963,41.141169],[-8.614125,41.141124],[...
4 1408039177 [[-8.619903,41.148036],[-8.619894,41.148036]]
.. ... ...
315 1419171485 [[-8.570196,41.159484],[-8.570187,41.158962],[...
316 1419170802 [[-8.613873,41.141232],[-8.613882,41.141241],[...
317 1419172121 [[-8.6481,41.152536],[-8.647461,41.15241],[-8....
318 1419171980 [[-8.571699,41.156073],[-8.570583,41.155929],[...
319 1419171420 [[-8.574561,41.180184],[-8.572248,41.17995],[-...
[320 rows x 2 columns]
I read them from csv file in this way:
train = pd.read_csv("path/train.csv",engine='python',error_bad_lines=False)
So, I have this timestamp in Unix format. I want to convert in UTC time and then extract year, month, day and so on.
This is the code for the conversion from Unix timestamp to UTC date time:
train["TIMESTAMP"] = [float(time) for time in train["TIMESTAMP"]]
train["data_time"] = [datetime.datetime.fromtimestamp(time, datetime.timezone.utc) for time in train["TIMESTAMP"]]
To extract year and other information I do this:
train["year"] = train["data_time"].dt.year
train["month"] = train["data_time"].dt.month
train["day"] = train["data_time"].dt.day
train["hour"] = train["data_time"].dt.hour
train["min"] = train["data_time"].dt.minute
But I obtain this error when the execution arrives at the extraction point:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-30-d2249cabe965> in <module>()
67 train["TIMESTAMP"] = [float(time) for time in train["TIMESTAMP"]]
68 train["data_time"] = [datetime.datetime.fromtimestamp(time, datetime.timezone.utc) for time in train["TIMESTAMP"]]
---> 69 train["year"] = train["data_time"].dt.year
70 train["month"] = train["data_time"].dt.month
71 train["day"] = train["data_time"].dt.day
2 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/accessors.py in __new__(cls, data)
478 return PeriodProperties(data, orig)
479
--> 480 raise AttributeError("Can only use .dt accessor with datetimelike values")
AttributeError: Can only use .dt accessor with datetimelike values
I also read a lot of similiar discussion but I can't figure out why I obtain this error.
Edited:
So the train["TIMESTAMP"] data are like this:
1408039037
1408038611
1408039090
Then I do this with this data:
train["TIMESTAMP"] = [float(time) for time in train["TIMESTAMP"]]
train["data_time"] = [datetime.datetime.fromtimestamp(time, datetime.timezone.utc) for time in train["TIMESTAMP"]]
train["year"] = train["data_time"].dt.year
train["month"] = train["data_time"].dt.month
train["day"] = train["data_time"].dt.day
train["hour"] = train["data_time"].dt.hour
train["min"] = train["data_time"].dt.minute
train = train[["year", "month", "day", "hour","min"]]