pandas will not let me reindex? - python
I am creating a function to more easily manipulate similar data sets, but for some reason the function is not reindexing my data frame. Could someone tell me what is going on? I am trying to figure out how to reindex and interpolate the data and am wondering why it stops there.
CODE:
import pandas as pd
data2.rename(columns={'DATE':'DATE','DGS20':'Yd'},inplace = True)
data.rename(columns={'DATE':'DATE','DGS10':'Yd'},inplace = True)
def func(dat):
    dat.DATE = pd.to_datetime(dat.DATE)
    dat.Yd = pd.to_numeric(dat.Yd,errors = "coerce")
    dat.index = dat.DATE
    dat.drop('DATE',axis = 1,inplace = True)
    scale = pd.date_range(start = data.index[0],end = data.index[3774],freq = 'D')
    dat = dat.reindex(scale)  # <--- THIS LINE IS NOT EXECUTING
    dat.interpolate(method = 'time',inplace = True)
RESULT:
The function works, but the manipulation is stopping at the line I have pointed out above.
SAMPLE OF DATA:
DATE,DGS5
2004-01-02,3.36
2004-01-05,3.39
2004-01-06,3.26
2004-01-07,3.25
2004-01-08,3.24
2004-01-09,3.05
2004-01-12,3.04
2004-01-13,2.98
2004-01-14,2.96
2004-01-15,2.97
2004-01-16,3.03
2004-01-19,.
2004-01-20,3.05
2004-01-21,3.02
2004-01-22,2.96
2004-01-23,3.06
2004-01-26,3.13
2004-01-27,3.07
2004-01-28,3.22
2004-01-29,3.22
2004-01-30,3.17
2004-02-02,3.18
2004-02-03,3.12
2004-02-04,3.15
2004-02-05,3.21
2004-02-06,3.12
2004-02-09,3.08
2004-02-10,3.13
2004-02-11,3.03
2004-02-12,3.07
2004-02-13,3.01
2004-02-16,.
2004-02-17,3.02
2004-02-18,3.03
2004-02-19,3.02
2004-02-20,3.08
2004-02-23,3.03
2004-02-24,3.01
2004-02-25,2.98
2004-02-26,3.01
2004-02-27,3.01
2004-03-01,2.98
2004-03-02,3.04
2004-03-03,3.06
2004-03-04,3.02
2004-03-05,2.81
2004-03-08,2.74
2004-03-09,2.68
2004-03-10,2.71
2004-03-11,2.72
2004-03-12,2.73
2004-03-15,2.74
2004-03-16,2.65
2004-03-17,2.66
2004-03-18,2.72
2004-03-19,2.75
2004-03-22,2.69
2004-03-23,2.69
2004-03-24,2.68
2004-03-25,2.70
2004-03-26,2.81
2004-03-29,2.86
2004-03-30,2.86
2004-03-31,2.80
2004-04-01,2.87
2004-04-02,3.15
2004-04-05,3.24
2004-04-06,3.19
2004-04-07,3.19
2004-04-08,3.22
2004-04-09,.
2004-04-12,3.26
2004-04-13,3.37
2004-04-14,3.44
2004-04-15,3.45
2004-04-16,3.39
2004-04-19,3.42
2004-04-20,3.45
2004-04-21,3.52
2004-04-22,3.46
2004-04-23,3.58
2004-04-26,3.57
2004-04-27,3.52
2004-04-28,3.60
2004-04-29,3.66
2004-04-30,3.63
2004-05-03,3.63
2004-05-04,3.66
2004-05-05,3.71
2004-05-06,3.72
2004-05-07,3.96
2004-05-10,3.95
2004-05-11,3.94
2004-05-12,3.96
2004-05-13,4.01
2004-05-14,3.92
2004-05-17,3.83
2004-05-18,3.87
2004-05-19,3.93
2004-05-20,3.86
2004-05-21,3.91
2004-05-24,3.90
2004-05-25,3.89
2004-05-26,3.81
2004-05-27,3.74
2004-05-28,3.81
2004-05-31,.
2004-06-01,3.86
2004-06-02,3.91
2004-06-03,3.89
2004-06-04,3.97
2004-06-07,3.95
2004-06-08,3.96
2004-06-09,4.01
2004-06-10,4.00
2004-06-11,.
2004-06-14,4.10
2004-06-15,3.90
2004-06-16,3.96
2004-06-17,3.93
2004-06-18,3.94
2004-06-21,3.91
2004-06-22,3.92
2004-06-23,3.90
2004-06-24,3.85
2004-06-25,3.85
2004-06-28,3.97
2004-06-29,3.92
2004-06-30,3.81
2004-07-01,3.74
2004-07-02,3.62
2004-07-05,.
2004-07-06,3.65
From the v0.23.4 docs:
DataFrame.reindex supports two calling conventions
(index=index_labels, columns=column_labels, ...)
(labels, axis={'index', 'columns'}, ...)
We highly recommend using keyword arguments to clarify your intent.
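EDIT: The following code works for me. I added a return statement to the function: dat = dat.reindex(scale) only rebinds the local name dat to a new DataFrame, so the caller never sees the reindexed frame unless the function returns it.
import pandas as pd
raw_series = {'Yd': [3.36, 3.39, 3.26, 3.25, 3.24, 3.05, 3.04, 2.98, 2.96, 2.97, 3.03, '.']}
raw_index = ['2004-01-02', '2004-01-05', '2004-01-06', '2004-01-07', '2004-01-08', '2004-01-09', '2004-01-12', '2004-01-13', '2004-01-14', '2004-01-15', '2004-01-16', '2004-01-19']
dat = pd.DataFrame(raw_series, index=raw_index)
def func(dat):
    # Replace the column so it becomes float64 ('.' is coerced to NaN)
    dat['Yd'] = pd.to_numeric(dat['Yd'], errors="coerce")
    dat.index = pd.to_datetime(dat.index)
    # One row per calendar day between the first and last observation
    scale = pd.date_range(raw_index[0], raw_index[-1], freq='D')
    reindexed = dat.reindex(index=scale)
    # Return the new frame; reindex does not modify dat in place
    return reindexed.interpolate(method='time')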
Output:
Yd
2004-01-02 3.360000
2004-01-03 3.370000
2004-01-04 3.380000
2004-01-05 3.390000
2004-01-06 3.260000
2004-01-07 3.250000
2004-01-08 3.240000
2004-01-09 3.050000
2004-01-10 3.046667
2004-01-11 3.043333
2004-01-12 3.040000
2004-01-13 2.980000
2004-01-14 2.960000
2004-01-15 2.970000
2004-01-16 3.030000
2004-01-17 3.035000
2004-01-18 3.040000
2004-01-19 3.045000
2004-01-20 3.050000
Verify the data types:
>>> func(dat).reset_index().dtypes
index datetime64[ns]
Yd float64
dtype: object
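To see why the original function appeared to stop at the reindex line, here is a minimal sketch on a two-row frame. The function names and the tiny data set are made up for illustration; only reindex and interpolate come from the question.

```python
import pandas as pd


def reindex_without_return(frame, scale):
    # Rebinds the local name only; the caller's DataFrame is not changed.
    frame = frame.reindex(index=scale)
    frame = frame.interpolate(method="time")


def reindex_with_return(frame, scale):
    # Hands the new, reindexed frame back to the caller.
    return frame.reindex(index=scale).interpolate(method="time")


idx = pd.to_datetime(["2004-01-02", "2004-01-05"])
df = pd.DataFrame({"Yd": [3.36, 3.39]}, index=idx)
scale = pd.date_range(idx[0], idx[-1], freq="D")

reindex_without_return(df, scale)
rows_after_no_return = len(df)  # the caller still has the original rows

result = reindex_with_return(df, scale)
rows_after_return = len(result)  # one row per calendar day
print(rows_after_no_return, rows_after_return)
```

In-place operations such as drop(..., inplace=True) do reach the caller's frame, which is why everything before the reindex line seemed to work.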
Related
Removing utc info from yfinance dataframe
How can I remove the UTC portion of a DataFrame created from yfinance? Every example and approach I have seen has failed, e.g.:
df = yf.download('2022-01-01', '2023-01-06', interval = '60m' )
pd.to_datetime(df['Datetime'])
error:
3806 #If we have a listlike key, _check_indexing_error will raise
KeyError: 'Datetime'
As well as the following approaches:
df = df.reset_index()
df = pd.DataFrame(df, columns = ['Datetime', "Close"])
df.rename(columns = {'Date': 'ds'}, inplace = True)
df.rename(columns = {'Close':'y'}, inplace = True)
#df['ds'] = df['ds'].dt.date
#df['ds'] = datetime.fromtimestamp(df['ds'], tz = None)
#df['ds'] = df['ds'].dt.floor("Min")
#df['ds'] = pd.to_datetime(df['ds'].dt.tz_convert(None))
#df['ds'] = pd.to_datetime['ds']
#pd.to_datetime(df['ds'])
df['ds'].dt.tz_localize(None)
print(df)
with similar errors. Any help or pointers will be greatly appreciated; I have spent the entire morning on this. Thanks in advance, BTT
Your code interprets '2022-01-01' as the first and required argument, tickers. This date is not a valid ticker, so yf.download() does not return any price and volume data. Try:
df = yf.download(tickers='AAPL', start='2022-01-01', end='2023-01-06', interval = '60m' )
df.index = df.index.tz_localize(None)
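The tz_localize(None) step can be tried without downloading anything. The sketch below builds a small timezone-aware frame by hand; the timestamps and prices are invented, and it assumes the timestamps sit in the index, as in the answer.

```python
import pandas as pd

# Hand-made stand-in for a downloaded frame with a timezone-aware index.
idx = pd.date_range("2022-01-03 09:30", periods=3, freq="60min", tz="UTC")
prices = pd.DataFrame({"Close": [1.0, 2.0, 3.0]}, index=idx)

naive = prices.copy()
# Drop the timezone information while keeping the wall-clock times.
naive.index = naive.index.tz_localize(None)

print(prices.index.tz, naive.index.tz)
```

If the timestamps live in a column instead of the index, the same method is reached through the .dt accessor, and the result has to be assigned back to the column.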
How to make sure that the data in this PyTrends function comes out in YYYY-MM-DD format and not YYYY-MM-DD 00:00:00
I have the following function:
def my_funct(Keyword, Dates, Country, Col_name):
    KEYWORDS=[Keyword]
    KEYWORDS_CODES=[pytrend.suggestions(keyword=i)[0] for i in KEYWORDS]
    df_CODES= pd.DataFrame(KEYWORDS_CODES)
    EXACT_KEYWORDS=df_CODES['mid'].to_list()
    DATE_INTERVAL= Dates
    COUNTRY=[Country] #Use this link for iso country code
    CATEGORY=0 # Use this link to select categories
    SEARCH_TYPE='' #default is 'web searches',others include 'images','news','youtube','froogle' (google shopping)
    Individual_EXACT_KEYWORD = list(zip(*[iter(EXACT_KEYWORDS)]*1))
    Individual_EXACT_KEYWORD = [list(x) for x in Individual_EXACT_KEYWORD]
    dicti = {}
    i = 1
    for Country in COUNTRY:
        for keyword in Individual_EXACT_KEYWORD:
            try:
                pytrend.build_payload(kw_list=keyword, timeframe = DATE_INTERVAL, geo = Country, cat=CATEGORY, gprop=SEARCH_TYPE)
                dicti[i] = pytrend.interest_over_time()
                i+=1
                time.sleep(6)
            except requests.exceptions.Timeout:
                print("Timeout occured")
    df_trends = pd.concat(dicti, axis=1)
    df_trends.columns = df_trends.columns.droplevel(0) #drop outside header
    df_trends = df_trends.drop('isPartial', axis = 1) #drop "isPartial"
    df_trends.reset_index(level=0,inplace=True) #reset_index
    df_trends.columns=['date', Col_name] #change column names
    return df_trends
Then I call the function using:
x1 = my_funct('Unemployment', '2004-01-04 2009-01-04', 'DK', 'Unemployment (Denmark)')
Then I put that into a df:
df1 = pd.DataFrame(x1)
Once I convert that df to Excel, how do I ensure that it is in YYYY-MM-DD format without the dangling 00:00:00? Any time I convert it, it comes out with hours and seconds. I tried
df1 = pd.DataFrame(x1).dt.strftime('%Y-%m-%d')
but it says that this cannot be used. Please help. Thanks
You are trying to call dt.strftime on the entire DataFrame, but the .dt accessor only exists on a datetime column (a Series), so you need to call it on the date column:
df1['date'] = df1['date'].dt.strftime('%Y-%m-%d')
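As a quick check of that fix, here is a small sketch with a hand-made frame. The values are invented; the column names mirror the ones in the question.

```python
import pandas as pd

# Hand-made stand-in for the frame returned by my_funct.
df1 = pd.DataFrame(
    {
        "date": pd.to_datetime(["2004-01-04", "2004-02-01"]),
        "Unemployment (Denmark)": [10, 12],
    }
)

# Format only the datetime column; the result holds plain strings.
df1["date"] = df1["date"].dt.strftime("%Y-%m-%d")

print(df1["date"].tolist())
```

Note that the column now contains strings rather than datetimes, so Excel will treat the cells as text.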
How may I append new results from iterating through a list, into a new column in the dataframe
I'm attempting to create a table as follows, where equities in a list get appended as columns to the dataframe:
Fundamentals CTRP EBAY ...... MPNGF
price
dividend
five_year_dividend
pe_ratio
pegRatio
priceToBook
price_to_sales
book_value
ebit
net_income
EPS
DebtEquity
threeYearAverageReturn
At the moment, based on the code below, only the last equity in the list is showing up:
Fundamentals MPNGF
price
dividend
five_year_dividend
pe_ratio
pegRatio
priceToBook
price_to_sales
book_value
ebit
net_income
EPS
DebtEquity
threeYearAverageReturn
from yahoofinancials import YahooFinancials
import pandas as pd
import lxml
from lxml import html
import requests
import numpy as np
from datetime import datetime

def scrape_table(url):
    page = requests.get(url)
    tree = html.fromstring(page.content)
    table = tree.xpath('//table')
    assert len(table) == 1
    df = pd.read_html(lxml.etree.tostring(table[0], method='html'))[0]
    df = df.set_index(0)
    df = df.dropna()
    df = df.transpose()
    df = df.replace('-', '0')
    df[df.columns[0]] = pd.to_datetime(df[df.columns[0]])
    cols = list(df.columns)
    cols[0] = 'Date'
    df = df.set_axis(cols, axis='columns', inplace=False)
    numeric_columns = list(df.columns)[1::]
    df[numeric_columns] = df[numeric_columns].astype(np.float64)
    return df

ecommerce = ['CTRP', 'EBAY', 'GRUB', 'BABA', 'JD', 'EXPE', 'AMZN', 'BKNG', 'MPNGF']
price=[]
dividend=[]
five_year_dividend=[]
pe_ratio=[]
pegRatio=[]
priceToBook=[]
price_to_sales=[]
book_value=[]
ebit=[]
net_income=[]
EPS=[]
DebtEquity=[]
threeYearAverageReturn=[]
for i, symbol in enumerate(ecommerce):
    yahoo_financials = YahooFinancials(symbol)
    balance_sheet_url = 'https://finance.yahoo.com/quote/' + symbol + '/balance-sheet?p=' + symbol
    df_balance_sheet = scrape_table(balance_sheet_url)
    df_balance_sheet_de = pd.DataFrame(df_balance_sheet, columns = ["Total Liabilities", "Total stockholders' equity"])
    j= df_balance_sheet_de.loc[[1]]
    j['DebtEquity'] = j["Total Liabilities"]/j["Total stockholders' equity"]
    k= j.iloc[0]['DebtEquity']
    X = yahoo_financials.get_key_statistics_data()
    for d in X.values():
        PEG = d['pegRatio']
        PB = d['priceToBook']
        three_year_ave_return = d['threeYearAverageReturn']
    data = [['price', yahoo_financials.get_current_price()],
            ['dividend', yahoo_financials.get_dividend_yield()],
            ['five_year_dividend', yahoo_financials.get_five_yr_avg_div_yield()],
            ['pe_ratio', yahoo_financials.get_pe_ratio()],
            ['pegRatio', PEG],
            ['priceToBook', PB],
            ['price_to_sales', yahoo_financials.get_price_to_sales()],
            ['book_value', yahoo_financials.get_book_value()],
            ['ebit', yahoo_financials.get_ebit()],
            ['net_income', yahoo_financials.get_net_income()],
            ['EPS', yahoo_financials.get_earnings_per_share()],
            ['DebtEquity', mee],
            ['threeYearAverageReturn', three_year_ave_return]]
    data.append(symbol.text)
    df = pd.DataFrame(data, columns = ['Fundamentals', symbol])
df
Seeking your kind advice, please, as to where I may have gone wrong in the above table. Thank you so very much!
You need to create your df outside of your for loop. Your code as currently written creates a new df on every iteration, so only the frame for the last symbol survives.
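One way to follow that advice is to collect one set of values per symbol inside the loop and build the DataFrame once afterwards. This is only a sketch: fetch_fundamentals is a hypothetical stand-in for the yahoofinancials calls and returns placeholder numbers, not real data.

```python
import pandas as pd


def fetch_fundamentals(symbol):
    # Hypothetical stand-in for the API calls; the values are placeholders.
    return {"price": float(len(symbol)), "pe_ratio": 10.0 * len(symbol)}


symbols = ["CTRP", "EBAY", "MPNGF"]

# Collect one dict of fundamentals per symbol inside the loop...
columns = {}
for symbol in symbols:
    columns[symbol] = fetch_fundamentals(symbol)

# ...and build the table once, after the loop has finished.
fundamentals = pd.DataFrame(columns)
fundamentals.index.name = "Fundamentals"

print(fundamentals)
```

A dict of dicts gives one column per outer key and one row per inner key, which matches the layout asked for in the question.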
Multiple variables loop and append dataframe
I am trying to loop over 2 lists to get all possible combinations in the loop below. I have difficulty understanding why the first part works and the second does not. Basically it queries the same data, but with every pattern from the lists. Any help would be much appreciated.
THE CODE:
base = ['BTC', 'ETH']
quoted = ['USDT', 'AUD','USD']
def daily_volume_historical(symbol, comparison_symbol, all_data=False, limit=90, aggregate=1, exchange=''):
    url = 'https://min-api.cryptocompare.com/data/histoday?fsym={}&tsym={}&limit={}&aggregate={}'\
        .format(symbol.upper(), comparison_symbol.upper(), limit, aggregate)
    if exchange:
        url += '&e={}'.format(exchange)
    if all_data:
        url += '&allData=true'
    page = requests.get(url)
    data = page.json()['Data']
    df = pd.DataFrame(data)
    df.drop(df.index[-1], inplace=True)
    df['timestamp'] = [datetime.datetime.fromtimestamp(d) for d in df.time]
    df.set_index('timestamp')
    return df
## THIS CODE GIVES SOME DATA ##
volu = daily_volume_historical('BTC', 'USD', 'CCCAGG').set_index('timestamp').volumefrom
## THIS CODE GIVES EMPTY DATA FRAME ##
d_volu = []
for a,b in [(a,b) for a in base for b in quoted]:
    volu = daily_volume_historical(a, b, exchange= 'CCCAGG').volumefrom
    d_volu.append
d_volu = pd.concat(d_volu, axis=1)
volu output sample:
timestamp
2010-07-17 09:00:00 20.00
2010-07-18 09:00:00 75.01
2010-07-19 09:00:00 574.00
2010-07-20 09:00:00 262.00
2010-07-21 09:00:00 575.00
2010-07-22 09:00:00 2160.00
2010-07-23 09:00:00 2402.50
2010-07-24 09:00:00 496.32
import datetime
import itertools
import pandas as pd
import requests
base = ['BTC', 'ETH']
quoted = ['USDT', 'AUD','USD']
# Every (base, quoted) pair, e.g. ('BTC', 'USDT'), ('BTC', 'AUD'), ...
combinations = list(itertools.product(base, quoted))
def daily_volume_historical(symbol, comparison_symbol, all_data=False, limit=90, aggregate=1, exchange=''):
    url = 'https://min-api.cryptocompare.com/data/histoday?fsym={}&tsym={}&limit={}&aggregate={}'\
        .format(symbol.upper(), comparison_symbol.upper(), limit, aggregate)
    if exchange:
        url += '&e={}'.format(exchange)
    if all_data:
        url += '&allData=true'
    page = requests.get(url)
    data = page.json()['Data']
    df = pd.DataFrame(data)
    df.drop(df.index[-1], inplace=True)
    df['timestamp'] = [datetime.datetime.fromtimestamp(d) for d in df.time]
    df.set_index('timestamp')  # result is discarded; the caller sets the index
    return df
## THIS CODE GIVES SOME DATA ##
volu = daily_volume_historical('BTC', 'USD', 'CCCAGG').set_index('timestamp').volumefrom
## LOOP OVER EVERY PAIR ##
d_volu = []
for a,b in combinations:
    volu = daily_volume_historical(a, b, exchange= 'CCCAGG').set_index('timestamp').volumefrom
    d_volu.append(volu)  # append must be called; a bare d_volu.append adds nothing
d_volu = pd.concat(d_volu, axis=1)
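The collect-then-concatenate pattern can be checked without network access. In the sketch below, fake_volume is a hypothetical stand-in for the API call and returns made-up numbers. The point it illustrates is that list.append has to be called with the item, since a bare d_volu.append only references the method and leaves the list empty.

```python
import itertools

import pandas as pd

base = ["BTC", "ETH"]
quoted = ["USDT", "AUD", "USD"]


def fake_volume(symbol, comparison_symbol):
    # Hypothetical stand-in for the API request; the values are invented.
    idx = pd.date_range("2019-01-01", periods=3, freq="D")
    name = f"{symbol}/{comparison_symbol}"
    return pd.Series([1.0, 2.0, 3.0], index=idx, name=name)


collected = []
for a, b in itertools.product(base, quoted):
    # Call append with the Series so it is actually stored.
    collected.append(fake_volume(a, b))

# One column per pair, aligned on the shared date index.
volumes = pd.concat(collected, axis=1)
print(volumes.shape)
```

Naming each Series after its pair keeps the columns distinguishable after the concatenation.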
Panda DataFrame Row Items IF Comparison doesnt return correct result
I retrieve data from quandl and load it into a pandas DF object. Afterwards I calculate SMA values (SMA21, SMA55) based on "Last Price", adding those SMA values as columns to my DF object. I iterate through the DF to catch a buy signal. I know the buy condition holds true for some dates, but my code does not print anything. I am expecting it to print the buy condition at the very least. As you can see below, the condition is:
kitem['SMA21'] >= kitem['Last']
My code:
import requests
import pandas as pd
import json

class URL_Params:
    def __init__ (self, endPoint, symboll, startDate, endDate, apiKey):
        self.endPoint = endPoint
        self.symboll = symboll
        self.startDate = startDate
        self.endDate = endDate
        self.apiKey = apiKey
    def createURL (self):
        return self.endPoint + self.symboll + '?start_date=' + self.startDate + '&end_date=' + self.endDate + '&api_key=' + self.apiKey
    def add_url(self, _url):
        self.url_list

my_portfolio = {'BTC':1.0, 'XRP':0, 'DSH':0, 'XMR':0, 'TotalBTCValue':1.0}
_endPoint = 'https://www.quandl.com/api/v3/datasets/BITFINEX/'
_symbolls = ['BTCEUR','XRPBTC','DSHBTC','IOTBTC','XMRBTC']
_startDate = '2017-01-01'
_endDate = '2019-03-01'
_apiKey = '' #needs to be set for quandl
my_data = {}
my_conns = {}
my_col_names = ['Date', 'High', 'Low', 'Mid', 'Last', 'Bid', 'Ask', 'Volume']
orderbook = []
#create connection and load data for each pair/market.
#load them in a dict for later use
for idx_symbol in _symbolls:
    my_url_params = URL_Params(_endPoint,idx_symbol,_startDate,_endDate,_apiKey)
    response = requests.get(my_url_params.createURL())
    my_data[idx_symbol] = json.loads(response.text)
#Prepare Data
my_raw_data_df_xrpbtc = pd.DataFrame(my_data['XRPBTC']['dataset']['data'], columns= my_data['XRPBTC']['dataset']['column_names'])
#Set Index to Date Column and Sort
my_raw_data_df_xrpbtc['Date'] = pd.to_datetime(my_raw_data_df_xrpbtc['Date'])
my_raw_data_df_xrpbtc.index = my_raw_data_df_xrpbtc['Date']
my_raw_data_df_xrpbtc = my_raw_data_df_xrpbtc.sort_index()
#Drop unrelated columns
my_raw_data_df_xrpbtc.drop(['Date'], axis=1, inplace=True)
my_raw_data_df_xrpbtc.drop(['Ask'], axis=1, inplace=True)
my_raw_data_df_xrpbtc.drop(['Bid'], axis=1, inplace=True)
my_raw_data_df_xrpbtc.drop(['Low'], axis=1, inplace=True)
my_raw_data_df_xrpbtc.drop(['High'], axis=1, inplace=True)
my_raw_data_df_xrpbtc.drop(['Mid'], axis=1, inplace=True)
#Calculate SMA values to create buy-sell signal
my_raw_data_df_xrpbtc['SMA21'] = my_raw_data_df_xrpbtc['Last'].rolling(21).mean()
my_raw_data_df_xrpbtc['SMA55'] = my_raw_data_df_xrpbtc['Last'].rolling(55).mean()
my_raw_data_df_xrpbtc['SMA200'] = my_raw_data_df_xrpbtc['Last'].rolling(200).mean()
#Check for each day if buy signal holds BUY if sell signal holds SELL
for idx,kitem in my_raw_data_df_xrpbtc.iterrows():
    if (kitem['SMA21'] >= kitem['Last']) is True: #buy signal
        print("buy0")
        if my_portfolio['BTC'] > 0 is True:
            print("buy1")
    if (kitem['Last'] * my_portfolio['XRP']) >= (my_portfolio['BTC'] * 1.05) is True: #sell signal
        print("sell0")
        if my_portfolio['XRP'] > 0 is True:
            print("sell1")
I know that there are lots of rows for which the condition holds true, but my code never enters this branch, so it does not print what I expect. Could anyone please help or comment on what might be wrong?
The reason is that your comparison is wrong. The result of kitem['SMA21'] >= kitem['Last'] will be a numpy.bool_. When you use is to compare it to True, this will fail, as it is not the same object. If you change the comparison to == it will work as expected:
if (kitem['SMA21'] >= kitem['Last']) == True:
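The difference between identity and equality here can be checked in isolation. The sketch below uses made-up numbers in place of the row values. It also shows a second pitfall in the question's code that the answer does not mention: an expression such as x > 0 is True, without parentheses, is a chained comparison and evaluates as (x > 0) and (0 is True).

```python
import numpy as np

# Made-up stand-ins for kitem['SMA21'] and kitem['Last'].
sma21 = np.float64(0.5)
last = np.float64(0.4)

signal = sma21 >= last                  # a numpy boolean, not the built-in True
identity_check = signal is True         # compares object identity
equality_check = bool(signal == True)   # compares values
truthy = bool(signal)                   # what a plain `if signal:` relies on

btc = 1.0
zero = 0
# Chained comparison: evaluated as (btc > zero) and (zero is True).
chained = btc > zero is True
# With parentheses the comparison runs first and yields a built-in bool.
parenthesised = (btc > zero) is True

print(identity_check, equality_check, truthy, chained, parenthesised)
```

In practice the simplest form is to drop the comparison to True altogether and write the condition directly in the if statement.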