pandas will not let me reindex?

I am creating a function to more easily manipulate similar data sets, but for some reason the function is not reindexing my data frame. Could someone tell me what is going on? I am trying to figure out how to reindex and interpolate the data and am wondering why it stops there.
CODE:
import pandas as pd
data2.rename(columns={'DATE':'DATE','DGS20':'Yd'},inplace = True)
data.rename(columns={'DATE':'DATE','DGS10':'Yd'},inplace = True)
def func(dat):
    dat.DATE = pd.to_datetime(dat.DATE)
    dat.Yd = pd.to_numeric(dat.Yd,errors = "coerce")
    dat.index = dat.DATE
    dat.drop('DATE',axis = 1,inplace = True)
    scale = pd.date_range(start = data.index[0],end = data.index[3774],freq = 'D')
    dat = dat.reindex(scale)  # <--- THIS LINE IS NOT EXECUTING
    dat.interpolate(method = 'time',inplace = True)
RESULT:
The function runs without raising an error, but the data frame I pass in is not reindexed: the manipulation appears to stop at the line I have marked above.
SAMPLE OF DATA:
DATE,DGS5
2004-01-02,3.36
2004-01-05,3.39
2004-01-06,3.26
2004-01-07,3.25
2004-01-08,3.24
2004-01-09,3.05
2004-01-12,3.04
2004-01-13,2.98
2004-01-14,2.96
2004-01-15,2.97
2004-01-16,3.03
2004-01-19,.
2004-01-20,3.05
2004-01-21,3.02
2004-01-22,2.96
2004-01-23,3.06
2004-01-26,3.13
2004-01-27,3.07
2004-01-28,3.22
2004-01-29,3.22
2004-01-30,3.17
2004-02-02,3.18
2004-02-03,3.12
2004-02-04,3.15
2004-02-05,3.21
2004-02-06,3.12
2004-02-09,3.08
2004-02-10,3.13
2004-02-11,3.03
2004-02-12,3.07
2004-02-13,3.01
2004-02-16,.
2004-02-17,3.02
2004-02-18,3.03
2004-02-19,3.02
2004-02-20,3.08
2004-02-23,3.03
2004-02-24,3.01
2004-02-25,2.98
2004-02-26,3.01
2004-02-27,3.01
2004-03-01,2.98
2004-03-02,3.04
2004-03-03,3.06
2004-03-04,3.02
2004-03-05,2.81
2004-03-08,2.74
2004-03-09,2.68
2004-03-10,2.71
2004-03-11,2.72
2004-03-12,2.73
2004-03-15,2.74
2004-03-16,2.65
2004-03-17,2.66
2004-03-18,2.72
2004-03-19,2.75
2004-03-22,2.69
2004-03-23,2.69
2004-03-24,2.68
2004-03-25,2.70
2004-03-26,2.81
2004-03-29,2.86
2004-03-30,2.86
2004-03-31,2.80
2004-04-01,2.87
2004-04-02,3.15
2004-04-05,3.24
2004-04-06,3.19
2004-04-07,3.19
2004-04-08,3.22
2004-04-09,.
2004-04-12,3.26
2004-04-13,3.37
2004-04-14,3.44
2004-04-15,3.45
2004-04-16,3.39
2004-04-19,3.42
2004-04-20,3.45
2004-04-21,3.52
2004-04-22,3.46
2004-04-23,3.58
2004-04-26,3.57
2004-04-27,3.52
2004-04-28,3.60
2004-04-29,3.66
2004-04-30,3.63
2004-05-03,3.63
2004-05-04,3.66
2004-05-05,3.71
2004-05-06,3.72
2004-05-07,3.96
2004-05-10,3.95
2004-05-11,3.94
2004-05-12,3.96
2004-05-13,4.01
2004-05-14,3.92
2004-05-17,3.83
2004-05-18,3.87
2004-05-19,3.93
2004-05-20,3.86
2004-05-21,3.91
2004-05-24,3.90
2004-05-25,3.89
2004-05-26,3.81
2004-05-27,3.74
2004-05-28,3.81
2004-05-31,.
2004-06-01,3.86
2004-06-02,3.91
2004-06-03,3.89
2004-06-04,3.97
2004-06-07,3.95
2004-06-08,3.96
2004-06-09,4.01
2004-06-10,4.00
2004-06-11,.
2004-06-14,4.10
2004-06-15,3.90
2004-06-16,3.96
2004-06-17,3.93
2004-06-18,3.94
2004-06-21,3.91
2004-06-22,3.92
2004-06-23,3.90
2004-06-24,3.85
2004-06-25,3.85
2004-06-28,3.97
2004-06-29,3.92
2004-06-30,3.81
2004-07-01,3.74
2004-07-02,3.62
2004-07-05,.
2004-07-06,3.65

From the v0.23.4 docs:
DataFrame.reindex supports two calling conventions
(index=index_labels, columns=column_labels, ...)
(labels, axis={'index', 'columns'}, ...)
We highly recommend using keyword arguments to clarify your intent.
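EDIT: The following code works for me. I added a return statement in my function. In the original, dat = dat.reindex(scale) only rebinds the local name dat inside func, so the caller never sees the reindexed frame unless the function returns it.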
import pandas as pd
raw_series = {'Yd': [3.36, 3.39, 3.26, 3.25, 3.24, 3.05, 3.04, 2.98, 2.96, 2.97, 3.03, '.']}
raw_index = ['2004-01-02', '2004-01-05', '2004-01-06', '2004-01-07', '2004-01-08', '2004-01-09', '2004-01-12', '2004-01-13', '2004-01-14', '2004-01-15', '2004-01-16', '2004-01-19']
dat = pd.DataFrame(raw_series, index=raw_index)
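def func(dat):
    # Plain column assignment, so the column is replaced by a float64 column
    dat['Yd'] = pd.to_numeric(dat['Yd'], errors="coerce")
    dat.index = pd.to_datetime(dat.index)
    scale = pd.date_range(raw_index[0], raw_index[-1], freq='D')
    # reindex returns a new DataFrame; it does not modify dat in place
    reindexed = dat.reindex(index=scale)
    return reindexed.interpolate(method='time')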
Output:
Yd
2004-01-02 3.360000
2004-01-03 3.370000
2004-01-04 3.380000
2004-01-05 3.390000
2004-01-06 3.260000
2004-01-07 3.250000
2004-01-08 3.240000
2004-01-09 3.050000
2004-01-10 3.046667
2004-01-11 3.043333
2004-01-12 3.040000
2004-01-13 2.980000
2004-01-14 2.960000
2004-01-15 2.970000
2004-01-16 3.030000
2004-01-17 3.035000
2004-01-18 3.040000
2004-01-19 3.045000
2004-01-20 3.050000
Verify the data types:
>>> func(dat).reset_index().dtypes
index datetime64[ns]
Yd float64
dtype: object
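To illustrate why the return statement matters, here is a minimal sketch. The function names and the two-row frame are made up for the demonstration; only reindex and interpolate come from the answer above.

```python
import pandas as pd

def reindex_without_return(frame, new_index):
    # Rebinds the local name only; the caller's object is left untouched.
    frame = frame.reindex(index=new_index)

def reindex_with_return(frame, new_index):
    # Hands the new, reindexed frame back to the caller.
    return frame.reindex(index=new_index)

# Illustrative data: two business days with a weekend gap between them.
original = pd.DataFrame(
    {"Yd": [3.36, 3.39]},
    index=pd.to_datetime(["2004-01-02", "2004-01-05"]),
)
daily = pd.date_range("2004-01-02", "2004-01-05", freq="D")

reindex_without_return(original, daily)
rows_after_no_return = len(original)  # the caller's frame still has 2 rows

result = reindex_with_return(original, daily).interpolate(method="time")
print(rows_after_no_return, len(result))
```

The same reasoning applies to any method that returns a new object instead of mutating in place: either return the result or assign it at the call site.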

Related

Removing utc info from yfinance dataframe

How can I remove the UTC portion of a DataFrame created from yfinance? Every example and approach I have seen has failed.
For example:
df = yf.download('2022-01-01', '2023-01-06', interval = '60m' )
pd.to_datetime(df['Datetime'])
error: 3806 #If we have a listlike key, _check_indexing_error will raise
KeyError: 'Datetime'
I also tried the following approaches:
df = df.reset_index()
df = pd.DataFrame(df, columns = ['Datetime', "Close"])
df.rename(columns = {'Date': 'ds'}, inplace = True)
df.rename(columns = {'Close':'y'}, inplace = True)
#df['ds'] = df['ds'].dt.date
#df['ds'] = datetime.fromtimestamp(df['ds'], tz = None)
#df['ds'] = df['ds'].dt.floor("Min")
#df['ds'] = pd.to_datetime(df['ds'].dt.tz_convert(None))
#df['ds'] = pd.to_datetime['ds']
#pd.to_datetime(df['ds'])
df['ds'].dt.tz_localize(None)
print(df)
All of these fail with similar errors. Any help or pointers would be greatly appreciated; I have spent the entire morning on this.
Thanks in advance
BTT

Your code interprets '2022-01-01' as the first and required argument tickers.
This date is not a valid ticker, so yf.download() does not return any price and volume data.
Try:
df = yf.download(tickers='AAPL', start='2022-01-01', end='2023-01-06', interval = '60m' )
df.index = df.index.tz_localize(None)
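The timezone step can be tried without a download. The sketch below builds a small timezone-aware index by hand (the timestamps, the America/New_York zone and the Close values are assumptions for illustration) and contrasts tz_localize(None), which keeps the local wall-clock time, with tz_convert(None), which converts to UTC before dropping the timezone.

```python
import pandas as pd

# Illustrative tz-aware hourly index, standing in for downloaded intraday data.
idx = pd.date_range("2022-01-03 09:30", periods=3, freq="60min", tz="America/New_York")
df = pd.DataFrame({"Close": [1.0, 2.0, 3.0]}, index=idx)

# Drop the timezone but keep the local wall-clock times (09:30, 10:30, ...).
naive_local = df.copy()
naive_local.index = naive_local.index.tz_localize(None)

# Convert to UTC first, then drop the timezone (14:30, 15:30, ... in January).
naive_utc = df.copy()
naive_utc.index = naive_utc.index.tz_convert(None)

print(naive_local.index[0], naive_utc.index[0])
```

Which of the two is appropriate depends on whether downstream code expects exchange-local times or UTC.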

How to make sure that the data in this PyTrends function comes out in YYYY-MM-DD format and not YYYY-MM-DD 00:00:00

I have the following function:
def my_funct(Keyword, Dates, Country, Col_name):
    KEYWORDS=[Keyword]
    KEYWORDS_CODES=[pytrend.suggestions(keyword=i)[0] for i in KEYWORDS]
    df_CODES= pd.DataFrame(KEYWORDS_CODES)
    EXACT_KEYWORDS=df_CODES['mid'].to_list()
    DATE_INTERVAL= Dates
    COUNTRY=[Country] #Use this link for iso country code
    CATEGORY=0 # Use this link to select categories
    SEARCH_TYPE='' #default is 'web searches',others include 'images','news','youtube','froogle' (google shopping)
    Individual_EXACT_KEYWORD = list(zip(*[iter(EXACT_KEYWORDS)]*1))
    Individual_EXACT_KEYWORD = [list(x) for x in Individual_EXACT_KEYWORD]
    dicti = {}
    i = 1
    for Country in COUNTRY:
        for keyword in Individual_EXACT_KEYWORD:
            try:
                pytrend.build_payload(kw_list=keyword,
                                      timeframe = DATE_INTERVAL,
                                      geo = Country,
                                      cat=CATEGORY,
                                      gprop=SEARCH_TYPE)
                dicti[i] = pytrend.interest_over_time()
                i+=1
                time.sleep(6)
            except requests.exceptions.Timeout:
                print("Timeout occurred")
    df_trends = pd.concat(dicti, axis=1)
    df_trends.columns = df_trends.columns.droplevel(0) #drop outside header
    df_trends = df_trends.drop('isPartial', axis = 1) #drop "isPartial"
    df_trends.reset_index(level=0,inplace=True) #reset_index
    df_trends.columns=['date', Col_name] #change column names
    return df_trends
Then I call the function using:
x1 = my_funct('Unemployment', '2004-01-04 2009-01-04', 'DK', 'Unemployment (Denmark)')
Then I put that into a df:
df1 = pd.DataFrame(x1)
Once I export that df to Excel, how do I ensure that the dates are in YYYY-MM-DD format without the trailing 00:00:00? Whenever I convert it, the output includes the time component.
I tried df1 = pd.DataFrame(x1).dt.strftime('%Y-%m-%d'), but I get an error saying that this cannot be used.
Please help
Thanks

You are trying to call dt.strftime on the entire DataFrame, but the .dt accessor exists only on a datetime-like Series, so you need to call it on the date column:
df1['date'] = df1['date'].dt.strftime('%Y-%m-%d')
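A small self-contained sketch of that conversion follows. The two rows and their values are invented for illustration and do not come from Google Trends.

```python
import pandas as pd

# Illustrative frame with a datetime64 column, shaped like the function's output.
df1 = pd.DataFrame(
    {
        "date": pd.to_datetime(["2004-01-04", "2004-02-01"]),
        "Unemployment (Denmark)": [47, 52],
    }
)

# .dt works on the Series (one column), not on the whole DataFrame.
df1["date"] = df1["date"].dt.strftime("%Y-%m-%d")

print(df1["date"].tolist())
```

After this step the column holds plain strings, so it is exported as text rather than as a date; that is the trade-off for getting a fixed YYYY-MM-DD rendering.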

How may I append new results from iterating through a list, into a new column in the dataframe

I'm attempting to create a table as follows, where each equity in a list is appended as a column of the DataFrame:
Fundamentals CTRP EBAY ...... MPNGF
price
dividend
five_year_dividend
pe_ratio
pegRatio
priceToBook
price_to_sales
book_value
ebit
net_income
EPS
DebtEquity
threeYearAverageReturn
At the moment, based on the code below, only the last equity in the list is showing up:
Fundamentals MPNGF
price
dividend
five_year_dividend
pe_ratio
pegRatio
priceToBook
price_to_sales
book_value
ebit
net_income
EPS
DebtEquity
threeYearAverageReturn
from yahoofinancials import YahooFinancials
import pandas as pd
import lxml
from lxml import html
import requests
import numpy as np
from datetime import datetime
def scrape_table(url):
    page = requests.get(url)
    tree = html.fromstring(page.content)
    table = tree.xpath('//table')
    assert len(table) == 1
    df = pd.read_html(lxml.etree.tostring(table[0], method='html'))[0]
    df = df.set_index(0)
    df = df.dropna()
    df = df.transpose()
    df = df.replace('-', '0')
    df[df.columns[0]] = pd.to_datetime(df[df.columns[0]])
    cols = list(df.columns)
    cols[0] = 'Date'
    df = df.set_axis(cols, axis='columns', inplace=False)
    numeric_columns = list(df.columns)[1::]
    df[numeric_columns] = df[numeric_columns].astype(np.float64)
    return df
ecommerce = ['CTRP', 'EBAY', 'GRUB', 'BABA', 'JD', 'EXPE', 'AMZN', 'BKNG', 'MPNGF']
price=[]
dividend=[]
five_year_dividend=[]
pe_ratio=[]
pegRatio=[]
priceToBook=[]
price_to_sales=[]
book_value=[]
ebit=[]
net_income=[]
EPS=[]
DebtEquity=[]
threeYearAverageReturn=[]
for i, symbol in enumerate(ecommerce):
    yahoo_financials = YahooFinancials(symbol)
    balance_sheet_url = 'https://finance.yahoo.com/quote/' + symbol + '/balance-sheet?p=' + symbol
    df_balance_sheet = scrape_table(balance_sheet_url)
    df_balance_sheet_de = pd.DataFrame(df_balance_sheet, columns = ["Total Liabilities", "Total stockholders' equity"])
    j= df_balance_sheet_de.loc[[1]]
    j['DebtEquity'] = j["Total Liabilities"]/j["Total stockholders' equity"]
    k= j.iloc[0]['DebtEquity']
    X = yahoo_financials.get_key_statistics_data()
    for d in X.values():
        PEG = d['pegRatio']
        PB = d['priceToBook']
        three_year_ave_return = d['threeYearAverageReturn']
    data = [['price', yahoo_financials.get_current_price()], ['dividend', yahoo_financials.get_dividend_yield()], ['five_year_dividend', yahoo_financials.get_five_yr_avg_div_yield()], ['pe_ratio', yahoo_financials.get_pe_ratio()], ['pegRatio', PEG], ['priceToBook', PB], ['price_to_sales', yahoo_financials.get_price_to_sales()], ['book_value', yahoo_financials.get_book_value()], ['ebit', yahoo_financials.get_ebit()], ['net_income', yahoo_financials.get_net_income()], ['EPS', yahoo_financials.get_earnings_per_share()], ['DebtEquity', mee], ['threeYearAverageReturn', three_year_ave_return]]
    data.append(symbol.text)
    df = pd.DataFrame(data, columns = ['Fundamentals', symbol])
df
Could you please advise where I may have gone wrong in the code above? Thank you very much!

You need to create your df outside of your for loop. As currently written, your code creates a new df on every iteration, so only the column for the last symbol in the list survives.
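One way to apply this advice is to collect the values for each symbol inside the loop and build the DataFrame once afterwards. The sketch below is an assumption-laden illustration: fetch_fundamentals is a hypothetical stand-in for the YahooFinancials calls and returns placeholder numbers, not real market data.

```python
import pandas as pd

def fetch_fundamentals(symbol):
    # Hypothetical helper: replace with the real YahooFinancials lookups.
    # The values are placeholders derived from the symbol length.
    return {"price": float(len(symbol)), "pe_ratio": 10.0 * len(symbol)}

symbols = ["CTRP", "EBAY", "MPNGF"]

# Accumulate one dict of fundamentals per symbol inside the loop ...
columns = {}
for symbol in symbols:
    columns[symbol] = fetch_fundamentals(symbol)

# ... and create the DataFrame a single time, after the loop.
# Outer keys become columns, inner keys become the row index.
df = pd.DataFrame(columns)
df.index.name = "Fundamentals"

print(df)
```

Each symbol ends up as its own column, with the fundamentals as row labels.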

Multiple variables loop and append dataframe

I am trying to loop over two lists to get every possible combination in the loop below. I have difficulty understanding why the first part works and the second does not. Basically, it queries the same data, but for every pair of symbols from the lists. Any help would be much appreciated.
THE CODE:
base = ['BTC', 'ETH']
quoted = ['USDT', 'AUD','USD']
def daily_volume_historical(symbol, comparison_symbol, all_data=False, limit=90, aggregate=1, exchange=''):
    url = 'https://min-api.cryptocompare.com/data/histoday?fsym={}&tsym={}&limit={}&aggregate={}'\
        .format(symbol.upper(), comparison_symbol.upper(), limit, aggregate)
    if exchange:
        url += '&e={}'.format(exchange)
    if all_data:
        url += '&allData=true'
    page = requests.get(url)
    data = page.json()['Data']
    df = pd.DataFrame(data)
    df.drop(df.index[-1], inplace=True)
    df['timestamp'] = [datetime.datetime.fromtimestamp(d) for d in df.time]
    df.set_index('timestamp')
    return df
## THIS CODE GIVES SOME DATA ##
volu = daily_volume_historical('BTC', 'USD', 'CCCAGG').set_index('timestamp').volumefrom
## THIS CODE GIVES EMPTY DATA FRAME ##
d_volu = []
for a,b in [(a,b) for a in base for b in quoted]:
    volu = daily_volume_historical(a, b, exchange= 'CCCAGG').volumefrom
    d_volu.append
d_volu = pd.concat(d_volu, axis=1)
volu output sample:
timestamp
2010-07-17 09:00:00 20.00
2010-07-18 09:00:00 75.01
2010-07-19 09:00:00 574.00
2010-07-20 09:00:00 262.00
2010-07-21 09:00:00 575.00
2010-07-22 09:00:00 2160.00
2010-07-23 09:00:00 2402.50
2010-07-24 09:00:00 496.32

The list stays empty because d_volu.append is only referenced, never called; call it with the Series you want to add. Likewise, df.set_index('timestamp') returns a new frame, so its result has to be assigned. itertools.product builds the pairs:
import datetime
import itertools
import pandas as pd
import requests
base = ['BTC', 'ETH']
quoted = ['USDT', 'AUD','USD']
combinations = list(itertools.product(base, quoted))
def daily_volume_historical(symbol, comparison_symbol, all_data=False, limit=90, aggregate=1, exchange=''):
    url = 'https://min-api.cryptocompare.com/data/histoday?fsym={}&tsym={}&limit={}&aggregate={}'\
        .format(symbol.upper(), comparison_symbol.upper(), limit, aggregate)
    if exchange:
        url += '&e={}'.format(exchange)
    if all_data:
        url += '&allData=true'
    page = requests.get(url)
    data = page.json()['Data']
    df = pd.DataFrame(data)
    df.drop(df.index[-1], inplace=True)
    df['timestamp'] = [datetime.datetime.fromtimestamp(d) for d in df.time]
    # set_index returns a new frame, so assign the result
    df = df.set_index('timestamp')
    return df
# Collect one volume Series per (base, quoted) pair
d_volu = []
for a,b in combinations:
    volu = daily_volume_historical(a, b, exchange= 'CCCAGG').volumefrom
    # Call append with the Series, named after its pair
    d_volu.append(volu.rename('{}-{}'.format(a, b)))
d_volu = pd.concat(d_volu, axis=1)
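The collect-then-concat pattern can be checked without hitting the API. In this sketch fake_volume is a hypothetical stand-in for daily_volume_historical that returns three made-up values per pair; the pair names used as column labels are also an illustrative choice.

```python
import itertools

import pandas as pd

base = ["BTC", "ETH"]
quoted = ["USDT", "AUD", "USD"]

def fake_volume(symbol, comparison_symbol):
    # Hypothetical replacement for the network call: fixed, invented volumes.
    index = pd.date_range("2010-07-17", periods=3, freq="D", name="timestamp")
    return pd.Series([1.0, 2.0, 3.0], index=index, name="volumefrom")

series_list = []
for a, b in itertools.product(base, quoted):
    # append must be *called* with the Series; naming it keeps columns distinct.
    series_list.append(fake_volume(a, b).rename(f"{a}-{b}"))

# One column per pair, aligned on the shared timestamp index.
d_volu = pd.concat(series_list, axis=1)

print(d_volu.columns.tolist())
```

Renaming each Series before concatenating avoids ending up with six columns that are all called volumefrom.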

Panda DataFrame Row Items IF Comparison doesnt return correct result

I retrieve data from Quandl and load it into a pandas DataFrame.
Afterwards I calculate SMA values (SMA21, SMA55) based on the "Last" price.
I add those SMA values as columns to my DataFrame.
I iterate through the DataFrame to catch a buy signal.
I know the buy condition holds true for some dates, but my code does not print anything. I expect it to print the buy signal at the very least.
The condition in question is the following:
kitem['SMA21'] >= kitem['Last']
My code:
import requests
import pandas as pd
import json
class URL_Params:
    def __init__ (self, endPoint, symboll, startDate, endDate, apiKey):
        self.endPoint = endPoint
        self.symboll = symboll
        self.startDate = startDate
        self.endDate = endDate
        self.apiKey = apiKey
    def createURL (self):
        return self.endPoint + self.symboll + '?start_date=' + self.startDate + '&end_date=' + self.endDate + '&api_key=' + self.apiKey
    def add_url(self, _url):
        self.url_list
my_portfolio = {'BTC':1.0, 'XRP':0, 'DSH':0, 'XMR':0, 'TotalBTCValue':1.0}
_endPoint = 'https://www.quandl.com/api/v3/datasets/BITFINEX/'
_symbolls = ['BTCEUR','XRPBTC','DSHBTC','IOTBTC','XMRBTC']
_startDate = '2017-01-01'
_endDate = '2019-03-01'
_apiKey = '' #needs to be set for quandl
my_data = {}
my_conns = {}
my_col_names = ['Date', 'High', 'Low', 'Mid', 'Last', 'Bid', 'Ask', 'Volume']
orderbook = []
#create connection and load data for each pair/market.
#load them in a dict for later use
for idx_symbol in _symbolls:
    my_url_params = URL_Params(_endPoint,idx_symbol,_startDate,_endDate,_apiKey)
    response = requests.get(my_url_params.createURL())
    my_data[idx_symbol] = json.loads(response.text)
#Prepare Data
my_raw_data_df_xrpbtc = pd.DataFrame(my_data['XRPBTC']['dataset']['data'], columns= my_data['XRPBTC']['dataset']['column_names'])
#Set Index to Date Column and Sort
my_raw_data_df_xrpbtc['Date'] = pd.to_datetime(my_raw_data_df_xrpbtc['Date'])
my_raw_data_df_xrpbtc.index = my_raw_data_df_xrpbtc['Date']
my_raw_data_df_xrpbtc = my_raw_data_df_xrpbtc.sort_index()
#Drop unrelated columns
my_raw_data_df_xrpbtc.drop(['Date'], axis=1, inplace=True)
my_raw_data_df_xrpbtc.drop(['Ask'], axis=1, inplace=True)
my_raw_data_df_xrpbtc.drop(['Bid'], axis=1, inplace=True)
my_raw_data_df_xrpbtc.drop(['Low'], axis=1, inplace=True)
my_raw_data_df_xrpbtc.drop(['High'], axis=1, inplace=True)
my_raw_data_df_xrpbtc.drop(['Mid'], axis=1, inplace=True)
#Calculate SMA values to create buy-sell signal
my_raw_data_df_xrpbtc['SMA21'] = my_raw_data_df_xrpbtc['Last'].rolling(21).mean()
my_raw_data_df_xrpbtc['SMA55'] = my_raw_data_df_xrpbtc['Last'].rolling(55).mean()
my_raw_data_df_xrpbtc['SMA200'] = my_raw_data_df_xrpbtc['Last'].rolling(200).mean()
#Check for each day if buy signal holds BUY if sell signal holds SELL
for idx,kitem in my_raw_data_df_xrpbtc.iterrows():
    if (kitem['SMA21'] >= kitem['Last']) is True: #buy signal
        print("buy0")
        if my_portfolio['BTC'] > 0 is True:
            print("buy1")
    if (kitem['Last'] * my_portfolio['XRP']) >= (my_portfolio['BTC'] * 1.05) is True: #sell signal
        print("sell0")
        if my_portfolio['XRP'] > 0 is True:
            print("sell1")
I know that there are lots of rows for which the condition holds true, but my code never enters this branch, so it does not print what I expect.
Could anyone please help or comment on what might be wrong?

The reason is that your comparison is wrong. The result of kitem['SMA21'] >= kitem['Last'] will be a numpy.bool_. When you use is to compare it to True this will fail as it is not the same object.
If you change the comparison to == it will work as expected:
if (kitem['SMA21'] >= kitem['Last']) == True:
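The difference can be seen with two NumPy scalars. This is a sketch with invented values; the variable names are illustrative only. It also shows a second pitfall present in the question's inner conditions: an expression of the form x > 0 is True is a chained comparison, evaluated as (x > 0) and (0 is True), which is always False.

```python
import numpy as np

# Invented stand-ins for kitem['SMA21'] and kitem['Last'].
sma21 = np.float64(0.5)
last = np.float64(0.4)

signal = sma21 >= last          # a numpy.bool_, not the Python singleton True
is_identity = signal is True    # identity check against True fails
equals_true = signal == True    # value comparison succeeds
plain_truth = bool(signal)      # what a bare "if signal:" relies on

btc = 1.0
threshold = 0
# Chained comparison: (btc > threshold) and (threshold is True)
chained = btc > threshold is True

print(type(signal).__name__, is_identity, bool(equals_true), plain_truth, chained)
```

In practice the simplest form is to drop the comparison altogether and write the condition directly, as in if kitem['SMA21'] >= kitem['Last']:.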
