Update row after comparing values on pandas dataframe - python

I connect to an API that provides covid-19 data in Brazil organized by state and city, as follows:
# Libraries
import pandas as pd
from pandas import Series, DataFrame
import matplotlib.pyplot as plt
from matplotlib.pyplot import plot_date, axis, show, gcf
import numpy as np
from urllib.request import Request, urlopen
import urllib
from http.cookiejar import CookieJar
from datetime import datetime, timedelta
cj = CookieJar()
url_Bso = "https://brasil.io/api/dataset/covid19/caso_full/data?state=MG&city=Barroso"
req_Bso = urllib.request.Request(url_Bso, None, {"User-Agent": "python-urllib"})
opener_Bso = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
response_Bso = opener_Bso.open(req_Bso)
raw_response_Bso = response_Bso.read()
json_Bso = pd.read_json(raw_response_Bso)
results_Bso = json_Bso['results']
results_Bso = results_Bso.to_dict().values()
df_Bso = pd.DataFrame(results_Bso)
df_Bso.head(5)
This API compiles the data released by the state health departments. However, the records of the state and city health departments differ, and the state records lag behind those of the cities. I would like to update the values for Thursdays and Saturdays (the day when the epidemiological week ends). I'm trying the following:
saturday = datetime.today() + timedelta(days=-5)
yesterday = datetime.today() + timedelta(days=-1)
last_available_confirmed_day_Bso_saturday = 51
last_available_confirmed_day_Bso_yesterday = 54
df_Bso = df_Bso.loc[df_Bso['date'] == saturday, ['last_available_confirmed']] = last_available_confirmed_day_Bso_saturday
df_Bso = df_Bso.loc[df_Bso['date'] == yesterday, ['last_available_confirmed']] = last_available_confirmed_day_Bso_yesterday
df_Bso
However, I get the error:
> AttributeError: 'int' object has no attribute 'loc'
I need another dataframe with the values for these days updated. Can anyone help?

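You have to adjust the date. The date column in your data frame holds strings, so either convert it to datetime or compare it against a formatted date string, as below. Also note that the chained assignment df_Bso = df_Bso.loc[...] = value first rebinds df_Bso to the integer and only then evaluates df_Bso.loc, which is what raises the AttributeError. Assign through .loc on its own line instead.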
today = datetime.now()
last_sat_num = (today.weekday() + 2) % 7
last_thu_num = (today.weekday() + 4) % 7
last_sat = today - timedelta(last_sat_num)
last_thu = today - timedelta(last_thu_num)
last_sat_str = last_sat.strftime('%Y-%m-%d')
last_thu_str = last_thu.strftime('%Y-%m-%d')
last_available_confirmed_day_Bso_sat = 51
last_available_confirmed_day_Bso_thu = 54
df_Bso2 = df_Bso.copy()
df_Bso2.loc[df_Bso2['date'] == last_sat_str, ['last_available_confirmed']] = last_available_confirmed_day_Bso_sat
df_Bso2.loc[df_Bso2['date'] == last_thu_str, ['last_available_confirmed']] = last_available_confirmed_day_Bso_thu
df_Bso2[['date', 'last_available_confirmed']].head(10)
Output
date last_available_confirmed
0 2020-07-15 44
1 2020-07-14 43
2 2020-07-13 40
3 2020-07-12 40
4 2020-07-11 51
5 2020-07-10 39
6 2020-07-09 36
7 2020-07-08 36
8 2020-07-07 27
9 2020-07-06 27
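The update pattern can be tried on a small hand-made frame. This is only a sketch: the rows below are illustrative values, not data returned by the API, and the names `df` and `updated` are made up for the example.

```python
import pandas as pd

# Illustrative rows only (not real API data); dates are strings, as in the API response.
df = pd.DataFrame({
    "date": ["2020-07-12", "2020-07-11", "2020-07-10"],
    "last_available_confirmed": [40, 40, 39],
})

# Work on a copy so the original frame stays available for comparison.
updated = df.copy()

# Plain (non-chained) assignment: .loc modifies `updated` in place,
# and the name `updated` keeps pointing at the DataFrame.
updated.loc[updated["date"] == "2020-07-11", "last_available_confirmed"] = 51

print(updated)
```

Because the comparison is string against string, no datetime conversion is needed here. If the column is converted with pd.to_datetime first, the right-hand side of the comparison would need to be a timestamp without a time-of-day component.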

Related

Time difference between two timedate columns without considering Non-business hours

I want to calculate the difference between two time columns without counting non-business hours. I have used pyholidays, which worked fine for holidays. But even when I define starttime and endtime for the business duration, the result still includes non-business hours, as shown in the attached photos.
for index, row in df.iterrows():
    first=row['New']
    second=row['Assigned']
    third=row['In Progress']
    if(pd.notnull(second)):
        starttime = (8,0,0)
        endtime = (17,0,0)
        holidaylist = pyholidays.Germany()
        unit='hour'
        row['AP'] = businessDuration(first,second,holidaylist=holidaylist,unit=unit)
    else:
        starttime = (8,0,0)
        endtime = (17,0,0)
        holidaylist = pyholidays.Germany()
        unit='hour'
        row['AP'] = businessDuration(first,third,holidaylist=holidaylist,unit=unit)
    ap.append(row['AP'])
DataFrame
Printed Result
Thank you for your suggestion. I have tried your method and also defined a calendar instance. Later I was getting a 'relativedelta' error, which I solved by importing it from 'dateutil'. Now I am at the final stage: computing the business-hour difference between two columns.
`de_holidays = pyholidays.Germany()
cal = Calendar(holidays=de_holidays, weekdays=['Saturday', 'Sunday'])
df['rp'] = df.apply(lambda row: compute_bizhours_diff(row['Resolved'], row['Pending'], cal=cal, biz_open_time = time(8, 0, 0), biz_close_time = time(17, 0, 0)), axis=1)`
Now I am getting an error saying that the month number cannot be NaN. I have also attached photos of the errors.
Pic1
Pic2
I do not know if this works, but try this:
# == Imports needed ===========================
from __future__ import annotations
from typing import Any
import pandas as pd
import holidays as pyholidays
from datetime import time
from bizdays import Calendar
from dateutil.relativedelta import relativedelta
# == Functions ==================================
def is_null_dates(*dates: Any) -> bool:
    """Determine whether objects are valid dates.
    Parameters
    ----------
    dates : Any
        Variables to check whether they hold a valid date, or not.
    Returns
    -------
    bool
        True, if at least one informed value is not a date.
        False otherwise.
    """
    for date in dates:
        if pd.isna(pd.to_datetime(date, errors='coerce')):
            return True
    return False
def compute_bizhours_diff(
    start_date: str | pd.Timestamp,
    end_date: str | pd.Timestamp,
    biz_open_time: datetime.time | None = None,
    biz_close_time: datetime.time | None = None,
    cal: bizdays.Calendar | None = None,
) -> float:
    """Compute the number of business hours between two dates.
    Parameters
    ----------
    start_date : str | pd.Timestamp
        The first date.
    end_date : str | pd.Timestamp
        The final date.
    biz_open_time : datetime.time | None
        The beginning hour/minute of a business day.
    biz_close_time : datetime.time | None
        The ending hour/minute of a business day.
    cal : bizdays.Calendar | None
        The calendar object used to figure out the number of days between `start_date`
        and `end_date` that are not holidays. If None, consider every day as a business day,
        except Saturdays, or Sundays.
    Returns
    -------
    float
        The total number of business hours between `start_date`, and `end_date`.
    Examples
    --------
    >>> import holidays as pyholidays
    >>> from datetime import time
    >>> from bizdays import Calendar
    >>> # 2022-09-07 is a national holiday in Brazil, therefore only
    >>> # the hours between 2022-09-08 09:00:00, and 2022-09-08 15:48:00
    >>> # should be considered. This should equal 6.8 hours.
    >>> start_date = pd.to_datetime('2022-09-07 15:55:00')
    >>> end_date = pd.to_datetime('2022-09-08 15:48:00')
    >>> BR_holiday_list = pyholidays.BR(years={start_date.year, end_date.year}, state='RJ')
    >>> cal = Calendar(holidays=BR_holiday_list, weekdays=['Saturday', 'Sunday'])
    >>> print(compute_bizhours_diff(start_date, end_date, cal=cal))
    6.8
    >>> # Both dates in the next example are holidays, therefore, the result should be 0.0
    >>> start_date = pd.to_datetime('2022-09-07 15:55:00')
    >>> end_date = pd.to_datetime('2022-09-07 15:48:00')
    >>> print(compute_bizhours_diff(start_date, end_date, cal=cal))
    0.0
    >>> # What if the end_date precedes start_date by mistake?
    >>> # In such cases, we switch start_date to end_date, and vice-versa.
    >>> start_date = pd.to_datetime('2022-09-02 00:00:00')
    >>> end_date = pd.to_datetime('2022-09-01 15:55:00')
    >>> print(compute_bizhours_diff(start_date, end_date, cal=cal))
    2.0833333333333335
    >>> # What if the start_date, and end_date begin and finish on the same day, but they both have timestamps that end before
    >>> # or after the business hours?
    >>> # In such cases, the total number of hours is equal to 0.0
    >>> start_date = pd.to_datetime('2022-09-02 00:00:00')
    >>> end_date = pd.to_datetime('2022-09-02 8:00:00')
    >>> print(compute_bizhours_diff(start_date, end_date, cal=cal))
    0.0
    """
    if is_null_dates(start_date, end_date):
        return pd.NA
    if biz_open_time is None:
        biz_open_time = time(9, 0, 0)
    if biz_close_time is None:
        biz_close_time = time(18, 0, 0)
    if cal is None:
        cal = Calendar(weekdays=['Saturday', 'Sunday'])
    open_delta = relativedelta(hour=biz_open_time.hour, minute=biz_open_time.minute)
    end_delta = relativedelta(hour=biz_close_time.hour, minute=biz_close_time.minute)
    start_date = pd.to_datetime(start_date)
    end_date = pd.to_datetime(end_date)
    _end_date = max(start_date, end_date)
    _start_date = min(start_date, end_date)
    start_date = _start_date
    end_date = _end_date
    start_date = (
        start_date if cal.isbizday(start_date) else cal.following(start_date) + open_delta
    )
    end_date = (
        end_date if cal.isbizday(end_date) else cal.preceding(end_date) + end_delta
    )
    if end_date < start_date:
        return 0.00
    start_date_biz = max(start_date, start_date + open_delta)
    end_first_day = start_date_biz + end_delta
    end_date_biz = min(
        end_date,
        end_date + end_delta
    )
    start_last_day = end_date_biz + open_delta
    if start_last_day > end_date:
        end_date_biz = start_last_day
    if end_first_day < start_date:
        end_first_day = start_date_biz
    if end_first_day.date() == end_date_biz.date():
        return (end_date_biz - start_date_biz).seconds / 3600
    return (
        (end_first_day - start_date_biz).seconds
        + (end_date_biz - start_last_day).seconds
        + (
            max((len(list(cal.seq(start_date, end_date))) - 2), 0)
            * (end_first_day - (start_date + open_delta)).seconds
        )
    ) / 3600
Before running the preceding code, you need to install the following packages, if you do not already have them:
pip install holidays bizdays
Link to both packages' documentation:
bizdays
python-holidays
Examples
Here is how you can use compute_bizhours_diff:
import pandas as pd
import holidays as pyholidays
from datetime import time
from bizdays import Calendar
# OPTIONAL: define custom start, and end to your business hours.
biz_open_time = time(9, 0, 0)
biz_close_time = time(18, 0, 0)
# Define your start, and end dates.
start_date = pd.to_datetime('2022-09-07 04:48:00')
end_date = pd.to_datetime('2022-09-10 15:55:00')
# Create a list of holidays, and create a Calendar instance.
BR_holiday_list = pyholidays.BR(years={start_date.year, end_date.year}, state='RJ')
# For German holidays, you can use something like:
German_holiday_list = pyholidays.Germany(years={start_date.year, end_date.year})
# Define the Calendar instance. Here, we use the German holidays, excluding Saturday, and Sunday from weekdays.
cal = Calendar(holidays=German_holiday_list, weekdays=['Saturday', 'Sunday'])
# Finally, compute the total number of working hours between your two dates:
compute_bizhours_diff(start_date, end_date, cal=cal)
# Returns: 27.0
You can also use the function with pandas dataframes, using apply:
df['working_hours_delta'] = df.apply(lambda row: compute_bizhours_diff(row[START_DATE_COLNAME], row[END_DATE_COLNAME], cal=cal), axis=1)
Notes
The function compute_bizhours_diff is far from perfect. Before using it in any production environment, or for any serious use case, I strongly recommend refactoring it.
Edit
I made some changes to the original answer, to account for instances where start_date, or end_date have null or invalid representations of dates.
Using the example dataframe from your question it now runs fine:
de_holidays = pyholidays.Germany()
cal = Calendar(holidays=de_holidays, weekdays=['Saturday', 'Sunday'])
df = pd.DataFrame(
    {
        'Assigned': [None, '2022-07-28 10:53:00', '2022-07-28 18:08:00', None, '2022-07-29 12:56:00'],
        'In Progress': ['2022-08-01 10:53:00', '2022-08-02 09:32:00', '2022-07-29 12:08:00', '2022-08-02 10:23:00', '2022-07-29 14:54:00'],
        'New': ['2022-07-27 15:01:00', '2022-07-28 10:09:00', '2022-07-28 13:37:00', '2022-07-29 00:12:00', '2022-07-29 09:51:00'],
    }
).apply(pd.to_datetime)
df['rp'] = df.apply(
    lambda row: compute_bizhours_diff(
        row['Assigned'], row['In Progress'], cal=cal, biz_open_time = time(8, 0, 0), biz_close_time = time(17, 0, 0)
    ), axis=1
)
print(df)
# Prints:
#              Assigned         In Progress                 New         rp
# 0                 NaT 2022-08-01 10:53:00 2022-07-27 15:01:00       <NA>
# 1 2022-07-28 10:53:00 2022-08-02 09:32:00 2022-07-28 10:09:00      25.65
# 2 2022-07-28 18:08:00 2022-07-29 12:08:00 2022-07-28 13:37:00   4.133333
# 3                 NaT 2022-08-02 10:23:00 2022-07-29 00:12:00       <NA>
# 4 2022-07-29 12:56:00 2022-07-29 14:54:00 2022-07-29 09:51:00   1.966667
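As a cross-check on the figures printed above, here is a pandas-only sketch that walks over each calendar day and clips the interval to the opening hours. It is a simplified alternative, not the implementation from the answer: the function name `business_hours_between` is hypothetical, weekends are assumed to be Saturday and Sunday, and holidays are assumed to be passed in as an iterable of dates.

```python
from datetime import time

import pandas as pd


def business_hours_between(start, end, open_time=time(8, 0), close_time=time(17, 0), holidays=()):
    """Sum the hours of [start, end] that fall inside opening hours on working days."""
    # Missing values propagate as NaN instead of raising.
    if pd.isna(start) or pd.isna(end):
        return float("nan")
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    if end < start:
        start, end = end, start
    holiday_dates = {pd.Timestamp(h).date() for h in holidays}
    total = pd.Timedelta(0)
    for day in pd.date_range(start.normalize(), end.normalize(), freq="D"):
        # Skip Saturdays (5), Sundays (6) and any listed holiday.
        if day.weekday() >= 5 or day.date() in holiday_dates:
            continue
        day_open = day + pd.Timedelta(hours=open_time.hour, minutes=open_time.minute)
        day_close = day + pd.Timedelta(hours=close_time.hour, minutes=close_time.minute)
        # Overlap between the requested interval and this day's opening hours.
        overlap_start = max(start, day_open)
        overlap_end = min(end, day_close)
        if overlap_end > overlap_start:
            total += overlap_end - overlap_start
    return total.total_seconds() / 3600


hours = business_hours_between("2022-07-28 10:53:00", "2022-08-02 09:32:00")
print(round(hours, 2))
```

Looping day by day is slower than a closed-form calculation, but every step is easy to inspect, which helps when validating results against a library.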

How to convert JSON to a table in Python

try:
    # For Python 3.0 and later
    from urllib.request import urlopen
except ImportError:
    # Fall back to Python 2's urllib2
    from urllib2 import urlopen
import certifi
import json
def get_jsonparsed_data(url):
    response = urlopen(url, cafile=certifi.where())
    data = response.read().decode("utf-8")
    return json.loads(data)
url = ("https://financialmodelingprep.com/api/v3/ratios/AAPL?apikey=92a1dad5aef4eb31276c19417c31dfeb")
print(get_jsonparsed_data(url))
import requests
import pandas as pd
url = (
"https://financialmodelingprep.com/api/v3/ratios/AAPL?"
"apikey=92a1dad5aef4eb31276c19417c31dfeb"
)
response = requests.get(url)
data = response.json()
df = pd.DataFrame(data)
df
prints:
symbol date period currentRatio quickRatio cashRatio daysOfSalesOutstanding daysOfInventoryOutstanding operatingCycle daysOfPayablesOutstanding ... priceToSalesRatio priceEarningsRatio priceToFreeCashFlowsRatio priceToOperatingCashFlowsRatio priceCashFlowRatio priceEarningsToGrowthRatio priceSalesRatio dividendYield enterpriseValueMultiple priceFairValue
0 AAPL 2021-09-25 FY 1.074553 0.909660 0.278449 51.390969 11.276593 62.667561 93.851071 ... 6.786117 26.219656 26.706799 23.861253 23.861253 0.367742 6.786117 0.005828 20.889553 39.348186
1 AAPL 2020-09-26 FY 1.363604 1.218195 0.360710 49.787534 8.741883 58.529418 91.048190 ... 7.272322 34.773150 27.211359 24.746031 24.746031 3.277438 7.272322 0.007053 25.558891 30.553901
2 AAPL 2019-09-28 FY 1.540126 1.384447 0.462022 64.258765 9.263639 73.522404 104.314077 ... 4.420394 20.813515 19.527159 16.573786 16.573786 -62.492578 4.420394 0.012277 14.772472 12.709658
3 AAPL 2018-09-29 FY 1.123843 0.986566 0.221733 67.332499 8.817631 76.150130 124.570214 ... 3.959898 17.666917 16.402259 13.582267 13.582267 0.597709 3.959898 0.013038 13.099961 9.815760
4 AAPL 2017-09-30 FY 1.276063 1.089670 0.201252 56.800671 12.563631 69.364302 126.927606 ... 3.794457 17.989671 17.121402 13.676823 13.676823 1.632758 3.794457 0.014680 12.605749 6.488908
import pandas
print(pandas.DataFrame(data))
I guess this may be what you are trying to do.
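If the JSON contains nested objects rather than a flat list of records, pd.json_normalize may be a better fit than the plain DataFrame constructor. The sketch below uses a small inline JSON string; its shape and values are invented for illustration and do not come from the API in the question.

```python
import json

import pandas as pd

# Invented sample payload: a list of records, each with one nested object.
raw = """
[
  {"symbol": "XYZ", "date": "2021-09-25", "ratios": {"currentRatio": 1.07, "quickRatio": 0.91}},
  {"symbol": "XYZ", "date": "2020-09-26", "ratios": {"currentRatio": 1.36, "quickRatio": 1.22}}
]
"""
records = json.loads(raw)

# The plain constructor keeps the nested dict as a single object column.
plain = pd.DataFrame(records)

# json_normalize expands nested keys into dotted column names.
table = pd.json_normalize(records)

print(table)
```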

python not recognizing pandas_ta module

import requests
import pandas as pd
import pandas_ta as ta
def stochFourMonitor():
    k_period = 14
    d_period = 3
    data = get_data('BTC-PERP',14400,1642935495,1643165895)
    print(data)
    data = data['result']
    df = pd.DataFrame(data)
    df['trailingHigh'] = df['high'].rolling(k_period).max()
    df['trailingLow'] = df['low'].rolling(k_period).min()
    df['%K'] = (df['close'] - df['trailingLow']) * 100 / (df['trailingHigh'] - df['trailingLow'])
    df['%D'] = df['%K'].rolling(d_period).mean()
    df.index.name = 'test'
    df.set_index(pd.DatetimeIndex(df["startTime"]), inplace=True)
    print(df)
    df.drop(columns=['startTime'])
    print(df)
    df.ta.stoch(high='High', low='Low',close= 'Close', k=14, d=3, append=True)
    #t = ta.stoch(close='close',high='high', low='low', k=14, d=3, append=True)
    #df.ta.stoch(close='close',high='high', low='low', k=14, d=3, append=True)
def get_data(marketName,resolution,start_time,end_time):
    data = requests.get('https://ftx.com/api/markets/' + marketName + '/candles?resolution=' + str(resolution) + '&start_time=' + str(start_time) + '&end_time=' + str(end_time)).json()
    return data
I keep receiving the error 'NoneType' object has no attribute 'name'. See below for full exception. It seems like the code is not recognizing the pandas_ta module but I don't understand why. Any help troubleshooting would be much appreciated.
Exception has occurred: AttributeError (note: full exception trace is shown but execution is paused at: )
'NoneType' object has no attribute 'name'
File "C:\Users\Jason\Documents\TradingCode\FTX Websocket\testing21.py", line 21, in stochFourMonitor
df.ta.stoch(high='High', low='Low',close= 'Close', k=14, d=3, append=True)
File "C:\Users\Jason\Documents\TradingCode\FTX Websocket\testing21.py", line 31, in (Current frame)
stochFourMonitor()
You have too few values in your dataframe. You need at least 17 values (k=14, d=3), but this time range returns only 16:
>>> pd.Timestamp(1642935495, unit='s')
Timestamp('2022-01-23 10:58:15')
>>> pd.Timestamp(1643165895, unit='s')
Timestamp('2022-01-26 02:58:15')
>>> pd.DataFrame(get_data('BTC-PERP',14400,1642935495,1643165895)['result'])
0 2022-01-23T12:00:00+00:00 1.642939e+12 35690.0 36082.0 35000.0 35306.0 6.315513e+08
1 2022-01-23T16:00:00+00:00 1.642954e+12 35306.0 35460.0 34601.0 34785.0 7.246238e+08
2 2022-01-23T20:00:00+00:00 1.642968e+12 34785.0 36551.0 34712.0 36271.0 9.663773e+08
3 2022-01-24T00:00:00+00:00 1.642982e+12 36271.0 36283.0 35148.0 35351.0 6.007333e+08
4 2022-01-24T04:00:00+00:00 1.642997e+12 35351.0 35511.0 34821.0 34896.0 5.554126e+08
5 2022-01-24T08:00:00+00:00 1.643011e+12 34895.0 35610.0 33033.0 33709.0 1.676436e+09
6 2022-01-24T12:00:00+00:00 1.643026e+12 33709.0 34399.0 32837.0 34260.0 2.021096e+09
7 2022-01-24T16:00:00+00:00 1.643040e+12 34261.0 36493.0 33800.0 36101.0 1.989552e+09
8 2022-01-24T20:00:00+00:00 1.643054e+12 36101.0 37596.0 35990.0 36673.0 1.202684e+09
9 2022-01-25T00:00:00+00:00 1.643069e+12 36673.0 36702.0 35974.0 36431.0 4.538093e+08
10 2022-01-25T04:00:00+00:00 1.643083e+12 36431.0 36500.0 35719.0 36067.0 3.514587e+08
11 2022-01-25T08:00:00+00:00 1.643098e+12 36067.0 36824.0 36030.0 36431.0 5.830712e+08
12 2022-01-25T12:00:00+00:00 1.643112e+12 36431.0 37200.0 35997.0 36568.0 9.992247e+08
13 2022-01-25T16:00:00+00:00 1.643126e+12 36568.0 37600.0 36532.0 37079.0 8.225219e+08
14 2022-01-25T20:00:00+00:00 1.643141e+12 37077.0 37140.0 36437.0 36980.0 7.892745e+08
15 2022-01-26T00:00:00+00:00 1.643155e+12 36980.0 37242.0 36567.0 37238.0 3.226400e+08
>>> pd.DataFrame(get_data('BTC-PERP',14400,1642935495,1643165895)['result']).ta.stoch()
...
AttributeError: 'NoneType' object has no attribute 'name'
Now change 1642935495 ('2022-01-23 10:58:15') to 1642845495 ('2022-01-22 09:58:15'):
>>> pd.DataFrame(get_data('BTC-PERP',14400,1642845495,1643165895)['result']).ta.stoch()
STOCHk_14_3_3 STOCHd_14_3_3
13 NaN NaN
14 NaN NaN
15 80.824814 NaN
16 74.665546 NaN
17 72.970512 76.153624
18 73.930097 73.855385
19 80.993469 75.964693
20 84.814444 79.912670
21 89.775352 85.194422
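To see why the row count matters, here is a plain-pandas sketch of the unsmoothed %K/%D calculation from the question, with an explicit length check. It is not pandas_ta's implementation (which also smooths %K), and the function name `stochastic_kd` and the synthetic candles are made up for the example.

```python
import numpy as np
import pandas as pd


def stochastic_kd(df, k=14, d=3):
    """Unsmoothed stochastic oscillator; needs k + d - 1 rows for one %D value."""
    needed = k + d - 1
    if len(df) < needed:
        raise ValueError(f"need at least {needed} rows, got {len(df)}")
    lowest = df["low"].rolling(k).min()
    highest = df["high"].rolling(k).max()
    pct_k = (df["close"] - lowest) * 100 / (highest - lowest)
    pct_d = pct_k.rolling(d).mean()
    return pd.DataFrame({"%K": pct_k, "%D": pct_d})


# Synthetic, steadily rising candles (illustrative values only).
close = np.arange(20, dtype=float) + 100
candles = pd.DataFrame({"high": close + 1, "low": close - 1, "close": close})

result = stochastic_kd(candles)
print(result.tail())
```

The first k - 1 rows of %K are NaN because the rolling window is not yet full, and %D needs a further d - 1 rows on top of that, so a short frame produces no usable values at all.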

Get historical data from binance

I am trying to extract historical data for the range [curr_time - 2 years, curr_time] with a time gap of 1 day. So I expect about 700 items, but I received only 3.
How can I fix this problem?
My code
from binance.client import Client
# Binance test_key https://testnet.binance.vision/key/generate
API_KEY = "---"
API_SECRET = "---"
DAYS_IN_YEAR = 365
DB_NAME = "charts"
def GetHistoricalData(
    timedelta_days=DAYS_IN_YEAR * 2,
    ticker="BTCUSDT",
    kline_interval=Client.KLINE_INTERVAL_1HOUR
):
    start_time = time.time()
    untilThisDate = datetime.datetime.now()
    sinceThisDate = untilThisDate - datetime.timedelta(days=timedelta_days)
    print("ZZZZZZZZZ_ ", str(sinceThisDate), str(untilThisDate)) # 2019-11-06 00:23:43.620016 2021-11-05 00:23:43.620016
    client = Client(API_KEY, API_SECRET)
    client.API_URL = 'https://testnet.binance.vision/api'
    candle = client.get_historical_klines(ticker, kline_interval, str(sinceThisDate), str(untilThisDate))
    print("CANDLE_", len(candle)) # 3
I tried this request:
candle = client.get_historical_klines(ticker, kline_interval, "01 January, 2019", "04 November 2021")
but received only 3 items again
dateTime ...
2021-11-02 00:00:00 61722.80000000 150535.61000000 ... 448.99018200 1635897599999
2021-11-03 00:00:00 63208.69000000 100000.00000000 ... 451.03367500 1635983999999
2021-11-04 00:00:00 62894.04000000 70000.00000000 ... 401.86212800 1636070399999
The testnet simply holds very little history. If you request this data with a direct API call against the testnet, you get:
In [1]: import requests
...: len(requests.get('https://testnet.binance.vision/api/v3/klines?symbol=BTCUSDT&interval=1h&limit=1000').json())
Out[1]: 65
But if you run it against Binance's production environment (by the way, klines/candles are public data and don't require an API key):
In [2]: import requests
...: len(requests.get('https://api.binance.com/api/v3/klines?symbol=BTCUSDT&interval=1h&limit=1000').json())
Out[2]: 1000
So, to fix your example, you need to replace the API URL:
client.API_URL = 'https://api.binance.com/api'
It gives me:
ZZZZZZZZZ_ 2019-11-06 01:15:15.122873 2021-11-05 01:15:15.122873
CANDLE_ 17483
Try the code below. I get a bunch of data, but it's not formatted:
import datetime
from binance.client import Client
import time
# Binance test_key https://testnet.binance.vision/key/generate
API_KEY = "---"
API_SECRET = "---"
DAYS_IN_YEAR = 365
DB_NAME = "charts"
def GetHistoricalData(
    timedelta_days=DAYS_IN_YEAR * 2,
    ticker="BTCUSDT",
    kline_interval=Client.KLINE_INTERVAL_1HOUR
):
    start_time = time.time()
    untilThisDate = datetime.datetime.now()
    sinceThisDate = untilThisDate - datetime.timedelta(days=timedelta_days)
    print("ZZZZZZZZZ_ ", str(sinceThisDate),
          str(untilThisDate)) # 2019-11-06 00:23:43.620016 2021-11-05 00:23:43.620016
    client = Client(API_KEY, API_SECRET)
    client.API_URL = 'https://testnet.binance.vision/api'
    candle = client.get_historical_klines(ticker, kline_interval, str(sinceThisDate), str(untilThisDate))
    print(candle)
GetHistoricalData()
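The raw klines come back as a list of lists, which is why the printed output looks unformatted. One way to make it readable is to load it into a DataFrame. The sketch below assumes the standard 12-field kline layout of the Binance REST API; the helper name `klines_to_frame`, the column labels, and the single sample row (made-up prices, real-looking timestamps) are illustrative.

```python
import pandas as pd

# Column labels are our own naming for the 12 fields of a kline row.
KLINE_COLUMNS = [
    "open_time", "open", "high", "low", "close", "volume",
    "close_time", "quote_asset_volume", "number_of_trades",
    "taker_buy_base_volume", "taker_buy_quote_volume", "ignore",
]


def klines_to_frame(klines):
    """Turn a list of kline rows into a DataFrame indexed by open time (UTC)."""
    frame = pd.DataFrame(klines, columns=KLINE_COLUMNS)
    # Timestamps arrive as milliseconds since the Unix epoch.
    for col in ("open_time", "close_time"):
        frame[col] = pd.to_datetime(frame[col], unit="ms", utc=True)
    # Prices and volumes arrive as strings.
    numeric = ["open", "high", "low", "close", "volume"]
    frame[numeric] = frame[numeric].astype(float)
    return frame.set_index("open_time")


# One made-up row in the assumed layout (values are not real market data).
sample = [[
    1635811200000, "61000.0", "62000.0", "60000.0", "61500.0", "100.0",
    1635897599999, "6100000.0", 1234, "50.0", "3050000.0", "0",
]]
frame = klines_to_frame(sample)
print(frame[["open", "high", "low", "close", "volume"]])
```

In the answer's code, the same helper could be applied to the `candle` list instead of printing it directly.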

Error with pandas.dt while extracting year from a date

The data in test.csv are like this:
TIMESTAMP POLYLINE
0 1408039037 [[-8.585676,41.148522],[-8.585712,41.148639],[...
1 1408038611 [[-8.610876,41.14557],[-8.610858,41.145579],[-...
2 1408038568 [[-8.585739,41.148558],[-8.58573,41.148828],[-...
3 1408039090 [[-8.613963,41.141169],[-8.614125,41.141124],[...
4 1408039177 [[-8.619903,41.148036],[-8.619894,41.148036]]
.. ... ...
315 1419171485 [[-8.570196,41.159484],[-8.570187,41.158962],[...
316 1419170802 [[-8.613873,41.141232],[-8.613882,41.141241],[...
317 1419172121 [[-8.6481,41.152536],[-8.647461,41.15241],[-8....
318 1419171980 [[-8.571699,41.156073],[-8.570583,41.155929],[...
319 1419171420 [[-8.574561,41.180184],[-8.572248,41.17995],[-...
[320 rows x 2 columns]
I read them from csv file in this way:
train = pd.read_csv("path/train.csv",engine='python',error_bad_lines=False)
So, I have this timestamp in Unix format. I want to convert it to UTC time and then extract the year, month, day and so on.
This is the code for the conversion from Unix timestamp to UTC date time:
train["TIMESTAMP"] = [float(time) for time in train["TIMESTAMP"]]
train["data_time"] = [datetime.datetime.fromtimestamp(time, datetime.timezone.utc) for time in train["TIMESTAMP"]]
To extract year and other information I do this:
train["year"] = train["data_time"].dt.year
train["month"] = train["data_time"].dt.month
train["day"] = train["data_time"].dt.day
train["hour"] = train["data_time"].dt.hour
train["min"] = train["data_time"].dt.minute
But I obtain this error when the execution arrives at the extraction point:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-30-d2249cabe965> in <module>()
67 train["TIMESTAMP"] = [float(time) for time in train["TIMESTAMP"]]
68 train["data_time"] = [datetime.datetime.fromtimestamp(time, datetime.timezone.utc) for time in train["TIMESTAMP"]]
---> 69 train["year"] = train["data_time"].dt.year
70 train["month"] = train["data_time"].dt.month
71 train["day"] = train["data_time"].dt.day
2 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/accessors.py in __new__(cls, data)
478 return PeriodProperties(data, orig)
479
--> 480 raise AttributeError("Can only use .dt accessor with datetimelike values")
AttributeError: Can only use .dt accessor with datetimelike values
I have also read a lot of similar discussions, but I can't figure out why I get this error.
Edited:
So the train["TIMESTAMP"] data are like this:
1408039037
1408038611
1408039090
Then I do this with this data:
train["TIMESTAMP"] = [float(time) for time in train["TIMESTAMP"]]
train["data_time"] = [datetime.datetime.fromtimestamp(time, datetime.timezone.utc) for time in train["TIMESTAMP"]]
train["year"] = train["data_time"].dt.year
train["month"] = train["data_time"].dt.month
train["day"] = train["data_time"].dt.day
train["hour"] = train["data_time"].dt.hour
train["min"] = train["data_time"].dt.minute
train = train[["year", "month", "day", "hour","min"]]
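One approach that may sidestep the error is to let pandas do the conversion, so the column gets a datetime dtype rather than holding Python datetime objects. This is a sketch under the assumption that TIMESTAMP holds Unix seconds; it uses the three sample values shown above in place of the CSV file.

```python
import pandas as pd

# Stand-in for the CSV: the three sample timestamps from the question.
train = pd.DataFrame({"TIMESTAMP": [1408039037, 1408038611, 1408039090]})

# Vectorised conversion from Unix seconds to timezone-aware UTC datetimes.
train["data_time"] = pd.to_datetime(train["TIMESTAMP"], unit="s", utc=True)

# The column now has a datetime dtype, so the .dt accessor is available.
train["year"] = train["data_time"].dt.year
train["month"] = train["data_time"].dt.month
train["day"] = train["data_time"].dt.day
train["hour"] = train["data_time"].dt.hour
train["min"] = train["data_time"].dt.minute

print(train[["year", "month", "day", "hour", "min"]])
```

Checking train["data_time"].dtype before calling .dt is a quick way to confirm whether the column is datetime-like or still plain objects.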
