Deleting empty columns with a few columns between data

Deleting empty columns with a few columns between data - python

I'm fetching data from a Google sheet:
values1 = pd.DataFrame(values)
aux = values1.head(1)
values1.drop(index={0}, inplace=True)
senal1 = (values1[2] == "SEÑAL")
senal = values1[senal1]
senal.dropna(axis=1, inplace=True)
print(senal)
This is my result after running the code:

Related

Removing utc info from yfinance dataframe

How can I remove the utc portion of a DF created from a yfinance? Every example I and approach I seen has failed.
eg:
df = yf.download('2022-01-01', '2023-01-06', interval = '60m' )
pd.to_datetime(df['Datetime'])
error: 3806 #If we have a listlike key, _check_indexing_error will raise
KeyError: 'Datetime'
As well as the following approaches
enter code heredf = df.reset_index()
df = pd.DataFrame(df, columns = ['Datetime', "Close"])
df.rename(columns = {'Date': 'ds'}, inplace = True)
df.rename(columns = {'Close':'y'}, inplace = True)
#df['ds'] = df['ds'].dt.date
#df['ds'] = datetime.fromtimestamp(df['ds'], tz = None)
#df['ds'] = df['ds'].dt.floor("Min")
#df['ds'] = pd.to_datetime(df['ds'].dt.tz_convert(None))
#df['ds'] = pd.to_datetime['ds']
#pd.to_datetime(df['ds'])
df['ds'].dt.tz_localize(None)
print(df)
with similar errors, Any help or pointer will greatly appreciated I have spent the entire morning on this.
Thanks in advance
BTT

Your code interprets '2022-01-01' as the first and required argument tickers.
This date is not a valid ticker, so yf.download() does not return any price and volume data.
Try:
df = yf.download(tickers='AAPL', start='2022-01-01', end='2023-01-06', interval = '60m' )
df.index = df.index.tz_localize(None)

How to make sure that the data in this PyTrends function comes out in YYYY-MM-DD format and not YYYY-MM-DD 00:00:00

I have the following function:
def my_funct(Keyword, Dates, Country, Col_name):
KEYWORDS=[Keyword]
KEYWORDS_CODES=[pytrend.suggestions(keyword=i)[0] for i in KEYWORDS]
df_CODES= pd.DataFrame(KEYWORDS_CODES)
EXACT_KEYWORDS=df_CODES['mid'].to_list()
DATE_INTERVAL= Dates
COUNTRY=[Country] #Use this link for iso country code
CATEGORY=0 # Use this link to select categories
SEARCH_TYPE='' #default is 'web searches',others include 'images','news','youtube','froogle' (google shopping)
Individual_EXACT_KEYWORD = list(zip(*[iter(EXACT_KEYWORDS)]*1))
Individual_EXACT_KEYWORD = [list(x) for x in Individual_EXACT_KEYWORD]
dicti = {}
i = 1
for Country in COUNTRY:
for keyword in Individual_EXACT_KEYWORD:
try:
pytrend.build_payload(kw_list=keyword,
timeframe = DATE_INTERVAL,
geo = Country,
cat=CATEGORY,
gprop=SEARCH_TYPE)
dicti[i] = pytrend.interest_over_time()
i+=1
time.sleep(6)
except requests.exceptions.Timeout:
print("Timeout occured")
df_trends = pd.concat(dicti, axis=1)
df_trends.columns = df_trends.columns.droplevel(0) #drop outside header
df_trends = df_trends.drop('isPartial', axis = 1) #drop "isPartial"
df_trends.reset_index(level=0,inplace=True) #reset_index
df_trends.columns=['date', Col_name] #change column names
return df_trends
Then I call the function using:
x1 = my_funct('Unemployment', '2004-01-04 2009-01-04', 'DK', 'Unemployment (Denmark)')
Then I put that into a df:
df1 = pd.DataFrame(x1)
Once I convert that df to excel, how do I ensure that it is in YYYY-MM-DD format without the dangling 00:00:00? Anytime I convert it comes out with hours and seconds.
I tried df1 = pd.DataFrame(x1).dt.strftime('%Y-%m-%d') but it says that this cannot be used?
Please help
Thanks

You are trying pass dt.strftime on the entire dataframe, but you need to pass it on the date column:
df1['date'] = df1['date'].dt.strftime('%Y-%m-%d')

Looping through a list of names of dataframes

I have a list of dataframes, each created from a unique web query;
bngimp = parse_forecast_data(get_json('419524'), None)
belimp = parse_forecast_data(get_json('419525'), None)
braimp = parse_forecast_data(get_json('419635'), None)
chilimp = parse_forecast_data(get_json('419526'), None)
chinimp = parse_forecast_data(get_json('419527'), None)
domimp = parse_forecast_data(get_json('419633'), None)
fraimp = parse_forecast_data(get_json('419636'), None)
greimp = parse_forecast_data(get_json('419528'), None)
ghaimp = parse_forecast_data(get_json('419638'), None)
indimp = parse_forecast_data(get_json('419530'), None)
indoimp = parse_forecast_data(get_json('419639'), None)
itaimp = parse_forecast_data(get_json('419533'), None)
japimp = parse_forecast_data(get_json('419534'), None)
kuwimp = parse_forecast_data(get_json('419640'), None)
litimp = parse_forecast_data(get_json('419641'), None)
meximp = parse_forecast_data(get_json('419537'), None)
I need to format each dataframe in the same way as follows;
bngimp = bngimp[['From Date','Sales Volume']]
bngimp = bngimp.set_index('From Date')
bngimp.index = pd.to_datetime(bngimp.index)
bngimp = bngimp.groupby(by=[bngimp.index.year, bngimp.index.month]).sum()
bngimp.columns = ['bngimp']
Is there any way I could loop through the name of dataframes without having to copy and paste each dataframe name into the above code?
There will be multiple more dataframes so the copying and pasting quite time consuming!
Any help is much appreciated;

I suggest create dictionary for map numbers by DataFrame names and create dictionary of DataFrame called out:
d = {'419524': 'bngimp', '419525': 'belimp', ...}
out = {}
for k, v in d.items():
df = parse_forecast_data(get_json(k), None)
df = df[['From Date','Sales Volume']]
df = df.set_index('From Date')
df.index = pd.to_datetime(df.index)
df = df.groupby(by=[df.index.year, df.index.month]).sum()
df.columns = [v]
out[v] = df
then for get DataFrame select by key:
print (out['bngimp'])
Also if want create one big DataFrame is possible use:
df = pd.concat(out, axis=1)

How may I append new results from iterating through a list, into a new column in the dataframe

Im attempting to create a table as follows, where equities in a list get appended as columns to the dataframe:
Fundamentals CTRP EBAY ...... MPNGF
price
dividend
five_year_dividend
pe_ratio
pegRatio
priceToBook
price_to_sales
book_value
ebit
net_income
EPS
DebtEquity
threeYearAverageReturn
At the moment, based on the code below, only the last equity in the list is showing up:
Fundamentals MPNGF
price
dividend
five_year_dividend
pe_ratio
pegRatio
priceToBook
price_to_sales
book_value
ebit
net_income
EPS
DebtEquity
threeYearAverageReturn
from yahoofinancials import YahooFinancials
import pandas as pd
import lxml
from lxml import html
import requests
import numpy as np
from datetime import datetime
def scrape_table(url):
page = requests.get(url)
tree = html.fromstring(page.content)
table = tree.xpath('//table')
assert len(table) == 1
df = pd.read_html(lxml.etree.tostring(table[0], method='html'))[0]
df = df.set_index(0)
df = df.dropna()
df = df.transpose()
df = df.replace('-', '0')
df[df.columns[0]] = pd.to_datetime(df[df.columns[0]])
cols = list(df.columns)
cols[0] = 'Date'
df = df.set_axis(cols, axis='columns', inplace=False)
numeric_columns = list(df.columns)[1::]
df[numeric_columns] = df[numeric_columns].astype(np.float64)
return df
ecommerce = ['CTRP', 'EBAY', 'GRUB', 'BABA', 'JD', 'EXPE', 'AMZN', 'BKNG', 'MPNGF']
price=[]
dividend=[]
five_year_dividend=[]
pe_ratio=[]
pegRatio=[]
priceToBook=[]
price_to_sales=[]
book_value=[]
ebit=[]
net_income=[]
EPS=[]
DebtEquity=[]
threeYearAverageReturn=[]
for i, symbol in enumerate(ecommerce):
yahoo_financials = YahooFinancials(symbol)
balance_sheet_url = 'https://finance.yahoo.com/quote/' + symbol + '/balance-sheet?p=' + symbol
df_balance_sheet = scrape_table(balance_sheet_url)
df_balance_sheet_de = pd.DataFrame(df_balance_sheet, columns = ["Total Liabilities", "Total stockholders' equity"])
j= df_balance_sheet_de.loc[[1]]
j['DebtEquity'] = j["Total Liabilities"]/j["Total stockholders' equity"]
k= j.iloc[0]['DebtEquity']
X = yahoo_financials.get_key_statistics_data()
for d in X.values():
PEG = d['pegRatio']
PB = d['priceToBook']
three_year_ave_return = d['threeYearAverageReturn']
data = [['price', yahoo_financials.get_current_price()], ['dividend', yahoo_financials.get_dividend_yield()], ['five_year_dividend', yahoo_financials.get_five_yr_avg_div_yield()], ['pe_ratio', yahoo_financials.get_pe_ratio()], ['pegRatio', PEG], ['priceToBook', PB], ['price_to_sales', yahoo_financials.get_price_to_sales()], ['book_value', yahoo_financials.get_book_value()], ['ebit', yahoo_financials.get_ebit()], ['net_income', yahoo_financials.get_net_income()], ['EPS', yahoo_financials.get_earnings_per_share()], ['DebtEquity', mee], ['threeYearAverageReturn', three_year_ave_return]]
data.append(symbol.text)
df = pd.DataFrame(data, columns = ['Fundamentals', symbol])
df
Seeking your kind advice please as to where may i have gone wrong in the above table? Thank you so very much!

You need to call your df outside of your for loop. Your code as currently written will recreate a new df for every loop.

Panda DataFrame Row Items IF Comparison doesnt return correct result

I retrieve data from quandl and load it to a pandas DF object.
Afterwards I calculate SMA values (SMA21, SMA55) based on "Last Price".
Adding those SMA values as a column do my DF object.
I iterate through DF to catch a buy signal.
I know the buy condition is holding true for some dates but my code does not printing anything out. I am expecting to print the buy condition at the very least.
as below you can see the following condition:
kitem['SMA21'] >= kitem['Last']
My code:
import requests
import pandas as pd
import json
class URL_Params:
def __init__ (self, endPoint, symboll, startDate, endDate, apiKey):
self.endPoint = endPoint
self.symboll = symboll
self.startDate = startDate
self.endDate = endDate
self.apiKey = apiKey
def createURL (self):
return self.endPoint + self.symboll + '?start_date=' + self.startDate + '&end_date=' + self.endDate + '&api_key=' + self.apiKey
def add_url(self, _url):
self.url_list
my_portfolio = {'BTC':1.0, 'XRP':0, 'DSH':0, 'XMR':0, 'TotalBTCValue':1.0}
_endPoint = 'https://www.quandl.com/api/v3/datasets/BITFINEX/'
_symbolls = ['BTCEUR','XRPBTC','DSHBTC','IOTBTC','XMRBTC']
_startDate = '2017-01-01'
_endDate = '2019-03-01'
_apiKey = '' #needs to be set for quandl
my_data = {}
my_conns = {}
my_col_names = ['Date', 'High', 'Low', 'Mid', 'Last', 'Bid', 'Ask', 'Volume']
orderbook = []
#create connection and load data for each pair/market.
#load them in a dict for later use
for idx_symbol in _symbolls:
my_url_params = URL_Params(_endPoint,idx_symbol,_startDate,_endDate,_apiKey)
response = requests.get(my_url_params.createURL())
my_data[idx_symbol] = json.loads(response.text)
#Prepare Data
my_raw_data_df_xrpbtc = pd.DataFrame(my_data['XRPBTC']['dataset']['data'], columns= my_data['XRPBTC']['dataset']['column_names'])
#Set Index to Date Column and Sort
my_raw_data_df_xrpbtc['Date'] = pd.to_datetime(my_raw_data_df_xrpbtc['Date'])
my_raw_data_df_xrpbtc.index = my_raw_data_df_xrpbtc['Date']
my_raw_data_df_xrpbtc = my_raw_data_df_xrpbtc.sort_index()
#Drop unrelated columns
my_raw_data_df_xrpbtc.drop(['Date'], axis=1, inplace=True)
my_raw_data_df_xrpbtc.drop(['Ask'], axis=1, inplace=True)
my_raw_data_df_xrpbtc.drop(['Bid'], axis=1, inplace=True)
my_raw_data_df_xrpbtc.drop(['Low'], axis=1, inplace=True)
my_raw_data_df_xrpbtc.drop(['High'], axis=1, inplace=True)
my_raw_data_df_xrpbtc.drop(['Mid'], axis=1, inplace=True)
#Calculate SMA values to create buy-sell signal
my_raw_data_df_xrpbtc['SMA21'] = my_raw_data_df_xrpbtc['Last'].rolling(21).mean()
my_raw_data_df_xrpbtc['SMA55'] = my_raw_data_df_xrpbtc['Last'].rolling(55).mean()
my_raw_data_df_xrpbtc['SMA200'] = my_raw_data_df_xrpbtc['Last'].rolling(200).mean()
#Check for each day if buy signal holds BUY if sell signal holds SELL
for idx,kitem in my_raw_data_df_xrpbtc.iterrows():
if (kitem['SMA21'] >= kitem['Last']) is True: #buy signal
print("buy0")
if my_portfolio['BTC'] > 0 is True:
print("buy1")
if (kitem['Last'] * my_portfolio['XRP']) >= (my_portfolio['BTC'] * 1.05) is True: #sell signal
print("sell0")
if my_portfolio['XRP'] > 0 is True:
print("sell1")
I know that there are lots of rows that holds true but my code never enters this path of code so it does not print out what I expect.
Could anyone please help/comment what might be wrong?

The reason is that your comparison is wrong. The result of kitem['SMA21'] >= kitem['Last'] will be a numpy.bool_. When you use is to compare it to True this will fail as it is not the same object.
If you change the comparison to == it will work as expected:
if (kitem['SMA21'] >= kitem['Last']) == True:

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Deleting empty columns with a few columns between data - python

I'm fetching data from a Google sheet: values1 = pd.DataFrame(values) aux = values1.head(1) values1.drop(index={0}, inplace=True) senal1 = (values1[2] == "SEÑAL") senal = values1[senal1] senal.dropna(axis=1, inplace=True) print(senal) This is my result after running the code:

Related

Removing utc info from yfinance dataframe

How to make sure that the data in this PyTrends function comes out in YYYY-MM-DD format and not YYYY-MM-DD 00:00:00

Looping through a list of names of dataframes

How may I append new results from iterating through a list, into a new column in the dataframe

Panda DataFrame Row Items IF Comparison doesnt return correct result

Categories

Resources