Python Google Finance historical data foreign stock Codes eg ASX - python

The code below works for the American stocks APLE and BHP, but it crashes when I replace them with the ASX codes. I thought the colon was the cause and tried str(ASX:BHP), without success. Unfortunately, Yahoo no longer supplies historical data. Any thoughts, solutions, or alternatives would be greatly appreciated.
Thanks
import datetime
import pandas as pd
from pandas_datareader import data
tickers = ["APLE", "BHP"]
# tickers = ["ASX:AMP", "ASX:BHP"]
start = datetime.datetime(2016, 1, 1)
end = datetime.datetime(2017, 1, 1)
frames = []
for ticker in tickers:
    df_stock = data.DataReader(ticker, "google", start, end)
    df_stock['code'] = ticker  # tag each row with its own ticker
    frames.append(df_stock)
# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
df_all_stock = pd.concat(frames)
df_all_stock

Just use the ASX API
https://www.asx.com.au/asx/1/share/AMP
will return:
{"code":"AMP","isin_code":"AU000000AMP6","desc_full":"Ordinary Fully Paid","last_price":1.155,"open_price":1.115,"day_high_price":1.155,"day_low_price":1.11,"change_price":0.040,"change_in_percent":"3.587%","volume":24558498,"bid_price":1.15,"offer_price":1.16,"previous_close_price":1.115,"previous_day_percentage_change":"3.241%","year_high_price":1.77,"last_trade_date":"2021-08-13T00:00:00+1000","year_high_date":"2020-12-03T00:00:00+1100","year_low_price":1.038,"year_low_date":"2021-07-30T00:00:00+1000","year_open_price":4.97,"year_open_date":"2014-02-25T11:00:00+1100","year_change_price":-3.815,"year_change_in_percentage":"-76.761%","pe":32.08,"eps":0.036,"average_daily_volume":20511519,"annual_dividend_yield":0,"market_cap":3641708026,"number_of_shares":3266105853,"deprecated_market_cap":3772352000,"deprecated_number_of_shares":3266105853,"suspended":false}
Other sample queries that can give you price history, announcements, directors, dividends etc:
https://www.asx.com.au/asx/1/share/AMP/prices?interval=daily&count=255
https://www.asx.com.au/asx/1/company/AMP
https://www.asx.com.au/asx/1/company/AMP?fields=primary_share,latest_annual_reports,last_dividend,primary_share.indices
https://www.asx.com.au/asx/1/company/AMP/announcements?count=10&market_sensitive=true
https://www.asx.com.au/asx/1/company/AMP/dividends
https://www.asx.com.au/asx/1/company/AMP/dividends/history?years=10
https://www.asx.com.au/asx/1/company/AMP/people
https://www.asx.com.au/asx/1/company/AMP/options?count=1000
https://www.asx.com.au/asx/1/company/AMP/warrants?count=1000
https://www.asx.com.au/asx/1/chart/highcharts?asx_code=AMP&years=10
https://www.asx.com.au/asx/1/company/AMP/similar?compare=marketcap
I included some sample Python code here: https://stackoverflow.com/a/68790147/8459557
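A minimal sketch of calling the quote endpoint above from Python, using only the standard library. The helper names `asx_quote_url`, `parse_quote`, and `fetch_quote` are made up for illustration and are not part of any ASX client library. The fields read are the ones in the sample response above; whether the server needs a browser-like `User-Agent` header is an assumption.

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "https://www.asx.com.au/asx/1/share/"  # taken from the sample queries above


def asx_quote_url(code):
    """Build the quote URL for an ASX code such as 'AMP'."""
    return BASE_URL + code.upper()


def parse_quote(raw_json):
    """Pick a few fields out of a quote payload shaped like the sample above."""
    payload = json.loads(raw_json)
    return {
        "code": payload["code"],
        "last_price": float(payload["last_price"]),
        "volume": int(payload["volume"]),
        # percentages arrive as strings such as "3.587%"
        "change_pct": float(payload["change_in_percent"].rstrip("%")),
    }


def fetch_quote(code, timeout=10):
    """Download and parse one quote (requires network access)."""
    # Assumption: the server may reject requests without a browser-like User-Agent.
    request = Request(asx_quote_url(code), headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return parse_quote(response.read().decode("utf-8"))


# Offline check against an excerpt of the sample response shown above
sample = '{"code":"AMP","last_price":1.155,"change_in_percent":"3.587%","volume":24558498}'
quote = parse_quote(sample)
print(quote)
```

Calling `fetch_quote("AMP")` would do the live request; the parsing is kept separate so it can be checked without network access.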

You will need to build a scraper to get the data out of the HTML table and then build up a pandas DataFrame that resembles the one we get as output for the American stock data.
I determined the base URL for Canadian stocks on Google Finance to be: 'https://www.google.ca/finance/historical?q=TSE%3A'. To get data for a stock, we simply append its name to the end of that base URL. For example, to see the historical stock data for 'VCN' we would go to the page: https://www.google.ca/finance/historical?q=TSE%3AVCN
To do the above in Python, we simply need the following, where the stock variable can be changed to any TSE (Toronto Stock Exchange) stock of interest.
from datetime import datetime
from pandas import DataFrame
import pandas_datareader.data as web
google_historical_price_site = 'https://www.google.ca/finance/historical?q=TSE%3A'
stock = 'VCN'  # substitute any stock here
historical_price_page = google_historical_price_site + stock
print(historical_price_page)
from urllib.request import urlopen
from bs4 import BeautifulSoup
#open the historical_price_page link and acquire the source code
stock_dat = urlopen(historical_price_page)
#parse the code using BeautifulSoup
historical_page = BeautifulSoup(stock_dat,'lxml')
#scrape the table
table_dat = historical_page.find('table', {'class': 'gf-table historical_price'})
#find all the rows in the table
rows = table_dat.findAll('td',{'class':'lm'})
#get just the dates out of the table rows, strip the newline characters
dates = [x.get_text().rstrip() for x in rows]
#turn dates to python datetime format
datetime_dates = [datetime.strptime(x, '%b %d, %Y') for x in dates]
#next we build up the price dataframe rows
#iterate through the table, taking the siblings to the
#right of the dates and adding to the row's data
prices = []
for num, row in enumerate(rows):
    row_dat = [datetime_dates[num]]  # first column is the date
    # take only the <td> cells to the right of the date cell
    for i in row.find_next_siblings('td'):
        row_dat.append(i.get_text().rstrip())  # append each remaining column
    prices.append(row_dat)  # add the row to the list of rows
#turn the output into the dataframe
outdat = DataFrame(prices, columns=['Date', 'Open', 'High', 'Low', 'Close', 'Volume'])
#make the Volume column integers, in case we wish to use it later!
outdat["Volume"] = outdat["Volume"].apply(lambda x: int(x.replace(',','')))
#change the other columns to floating point values
for col in ['Open','High','Low','Close']:
    outdat[col] = outdat[col].apply(lambda x: float(x))
#set the index to match the american stock data
outdat = outdat.set_index('Date')
#sort the index so it is in the same orientation as the american data
outdat = outdat.sort_index()
#have a look
outdat
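The cleaning steps in the scraper above (date parsing, stripping thousands separators, float conversion) can be checked on their own without hitting the network. A minimal sketch: `parse_price_row` is a hypothetical helper, and the sample row is made up for illustration, not real market data.

```python
from datetime import datetime


def parse_price_row(cells):
    """Convert scraped cell texts [date, open, high, low, close, volume] to typed values."""
    date_text, *price_texts, volume_text = [c.strip() for c in cells]
    date = datetime.strptime(date_text, '%b %d, %Y')
    prices = [float(p.replace(',', '')) for p in price_texts]
    volume = int(volume_text.replace(',', ''))
    return [date, *prices, volume]


# Made-up row in the same layout as the scraped table
row = parse_price_row(['Jan 3, 2017\n', '30.10\n', '30.25\n', '29.95\n', '30.20\n', '1,234,567\n'])
print(row)
```

Note that `%b` matches month abbreviations in the current locale, so this assumes an English locale, as the scraper itself does.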

Example of downloading Hong Kong stock data as a CSV file (stock example: Tencent Holdings Ltd, HKG:0700)
from datetime import datetime
from pandas import DataFrame
import pandas_datareader.data as web
import os
google_historical_price_site='https://finance.google.com/finance/historical?q=HKG:0700'
print(google_historical_price_site)
from urllib.request import urlopen
from bs4 import BeautifulSoup
#open the historical_price_page link and acquire the source code
stock_dat = urlopen(google_historical_price_site)
#parse the code using BeautifulSoup
google_historical_price_site = BeautifulSoup(stock_dat,'lxml')
#scrape the table
table_dat = google_historical_price_site.find('table', {'class': 'gf-table historical_price'})
#find all the rows in the table
rows = table_dat.findAll('td',{'class':'lm'})
#get just the dates out of the table rows, strip the newline characters
dates = [x.get_text().rstrip() for x in rows]
#turn dates to python datetime format
datetime_dates = [datetime.strptime(x, '%b %d, %Y') for x in dates]
#next we build up the price dataframe rows
#iterate through the table, taking the siblings to the
#right of the dates and adding to the row's data
prices = []
for num, row in enumerate(rows):
    row_dat = [datetime_dates[num]]  # first column is the date
    # take only the <td> cells to the right of the date cell
    for i in row.find_next_siblings('td'):
        row_dat.append(i.get_text().rstrip())  # append each remaining column
    prices.append(row_dat)  # add the row to the list of rows
#turn the output into the dataframe
outdat = DataFrame(prices, columns=['Date', 'Open', 'High', 'Low', 'Close', 'Volume'])
#make the Volume column integers, in case we wish to use it later!
outdat["Volume"] = outdat["Volume"].apply(lambda x: int(x.replace(',','')))
#change the other columns to floating point values
for col in ['Open','High','Low','Close']:
    outdat[col] = outdat[col].apply(lambda x: float(x))
#set the index to match the american stock data
outdat = outdat.set_index('Date')
#sort the index so it is in the same orientation as the american data
outdat = outdat.sort_index()
#output CSV.file
df=outdat
path_d = r'C:\MA data'  # raw string so the backslash is not treated as an escape
df.to_csv(os.path.join(path_d, 'HKGstock700.csv'))

Related

How do I save each iteration of a for loop in one big DataFrame - Python

I want to gather all the historical prices of each stock in the S&P500 in Python. I'm using a package from IEX Cloud which gives me the historical prices of an individual stock. I want a for loop to run through a list of the tickers/symbols from the stock index so that I get all the data in a single DataFrame.
This is the code that produces a DataFrame - in this example I've chosen AAPL for a two year period:
import pyEX as p
sym = 'AAPL'
stock_list = stocks['Ticker'].tolist()
c = p.Client(api_token='TOKEN', version='stable')
timeframe = '2y'
df = c.chartDF(symbol=sym, timeframe=timeframe)[['close']]
df
This DataFrame contains the date and the daily closing price. Now do any of you have any ideas how to loop through my list of tickers, so that I get a comprehensive DataFrame of all the historical prices?
Thank you.
Create an empty list, append each ticker's frame to it as you iterate, and concat everything together once the loop is done:
import pyEX as p
import pandas as pd
stock_list = stocks['Ticker'].tolist()
c = p.Client(api_token='TOKEN', version='stable')
timeframe = '2y'
dfs = [] # create an empty list
for sym in stock_list:  # iterate over your ticker list
    df = c.chartDF(symbol=sym, timeframe=timeframe)[['close']]  # create your frame
    dfs.append(df)  # append frame to list
final_df = pd.concat(dfs)  # concat all your frames together into one
Update with Try-Except
import pyEX as p
import pandas as pd
stock_list = stocks['Ticker'].tolist()
c = p.Client(api_token='TOKEN', version='stable')
timeframe = '2y'
dfs = [] # create an empty list
for sym in stock_list:  # iterate over your ticker list
    try:
        df = c.chartDF(symbol=sym, timeframe=timeframe)[['close']]  # create your frame
        dfs.append(df)  # append frame to list
    except KeyError:
        print(f'KeyError for {sym}')
final_df = pd.concat(dfs)  # concat all your frames together into one
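One thing to watch with a plain `pd.concat(dfs)` is that the combined frame no longer records which ticker each row came from. A sketch of one way to keep that information is to pass a dict to `pd.concat`, so its keys become an outer index level. The two small frames and their prices below are made-up stand-ins for what `c.chartDF` returns, and the level names `symbol` and `date` are my own choice.

```python
import pandas as pd

# Made-up stand-ins for the per-ticker frames built in the loop
frames = {
    'AAPL': pd.DataFrame({'close': [150.0, 151.5]},
                         index=pd.to_datetime(['2021-01-04', '2021-01-05'])),
    'MSFT': pd.DataFrame({'close': [217.7, 217.9]},
                         index=pd.to_datetime(['2021-01-04', '2021-01-05'])),
}

# Passing a dict uses its keys as the outer index level
long_df = pd.concat(frames, names=['symbol', 'date'])

# One column per ticker, one row per date
wide_df = long_df['close'].unstack('symbol')
print(wide_df)
```

In the loop above, this would mean collecting frames with `dfs[sym] = df` into a dict instead of appending to a list.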

Python: How to Webscrape All Rows from a Specific Table

For practice, I am trying to webscrape financial data from one table in this url: https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue
I'd like to save the data from the "Tesla Quarterly Revenue" table into a data frame and return two columns: Date, Revenue.
As it stands, the code grabs data from the adjacent table, "Tesla Annual Revenue." Since the tables don't seem to have unique IDs to tell them apart, how would I select elements only from the "Tesla Quarterly Revenue" table?
Any help or insight on how to remedy this would be deeply appreciated.
import pandas as pd
import requests
from bs4 import BeautifulSoup
url = "https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue"
html_data = requests.get(url).text
soup = BeautifulSoup(html_data, 'html5lib')
rows = []
for row in soup.find("tbody").find_all("tr"):
    col = row.find_all("td")
    date = col[0].text
    revenue = col[1].text
    rows.append({"Date": date, "Revenue": revenue})
# DataFrame.append was removed in pandas 2.0, so build the frame in one step
tesla_revenue = pd.DataFrame(rows, columns=["Date", "Revenue"])
tesla_revenue.head()
Below are the results when I run this code:
You can let pandas do all the work
import pandas as pd
url = "https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue"
tables = pd.read_html(url)
for df in tables:
    # loop over all found tables
    pass
# quarterly revenue is the second table
df = tables[1]
df.columns = ['Date', 'Revenue'] # rename the columns if you want to
print(df)
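Picking the table by position works but breaks if the page layout changes. A hedged alternative is the `match` argument of `pd.read_html`, which keeps only tables containing the given text. The HTML below is a made-up miniature with placeholder values, and it assumes the table title sits inside the table itself (for example in a header cell), which I have not verified for the real page.

```python
import io
import pandas as pd

# Made-up miniature of a page with two similar tables (placeholder values)
html = """
<table>
  <thead><tr><th colspan="2">Tesla Annual Revenue</th></tr></thead>
  <tbody><tr><td>2020</td><td>$1,000</td></tr></tbody>
</table>
<table>
  <thead><tr><th colspan="2">Tesla Quarterly Revenue</th></tr></thead>
  <tbody>
    <tr><td>2020-12-31</td><td>$250</td></tr>
    <tr><td>2020-09-30</td><td>$300</td></tr>
  </tbody>
</table>
"""

# Only tables whose text matches the string (or regex) are returned
tables = pd.read_html(io.StringIO(html), match="Quarterly Revenue")
df = tables[0]
df.columns = ['Date', 'Revenue']
print(df)
```

`pd.read_html` needs an HTML parser installed (lxml, or BeautifulSoup with html5lib).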

Downloading stock data via Python to excel (missing date column)

I want to download stock data into a single Excel file, but I can't get the date column to appear in the output.
Goal - Picture example
Can someone help me find out why I am getting these results, and how I can fix this?
import pandas as pd
import pandas_datareader as pdr
import datetime as dt
file = str(input('Enter File Name - '))
download_sourrce = (r"C:\Users\vladt\OneDrive\Рабочий стол\Stock\{}.xlsx".format(file))
writer = pd.ExcelWriter(download_sourrce, engine='xlsxwriter')
should_continue = True
while should_continue:
    x = str(input('Stock? - '))
    start = dt.datetime(2020,1,1)
    end = dt.datetime.today()
    df = pdr.get_data_yahoo(x,start,end)
    df.to_excel(writer, sheet_name=x, index=False)
    should_continue = input('Add new stock? [Y/N]').upper() == 'Y'
writer.close()  # ExcelWriter.save() was removed in pandas 2.0
Try the following before saving to Excel:
df = df.reset_index()
Stock data from pandas-datareader (Yahoo or other providers), and from many broker APIs and packages, commonly comes with the date set as the row index rather than as a column.
Because to_excel is called with index=False, the index is not written. Resetting the index turns the dates into a regular column, so they are saved.
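A small sketch of what `reset_index` does here, using a made-up two-row frame in place of the downloaded data (the prices are placeholders):

```python
import pandas as pd

# Made-up prices; the dates sit in the index, as they do in datareader output
df = pd.DataFrame(
    {'Close': [10.0, 10.5]},
    index=pd.DatetimeIndex(['2020-01-02', '2020-01-03'], name='Date'),
)

print(list(df.columns))    # the index is not listed among the columns
flat = df.reset_index()    # moves the index into a regular 'Date' column
print(list(flat.columns))
```

Another option is to drop `index=False` from the `to_excel` call, since writing the index is the default.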

Scraping a table with row labels in Python using Beautiful Soup

I'm trying to scrape a table from a website that has row labels. I'm able to get the actual data from the table, but I have no idea how to get the row labels as well.
Here is my code right now:
import numpy as np
import pandas as pd
import urllib.request
from bs4 import BeautifulSoup
url = "http://www12.statcan.gc.ca/census-recensement/2016/dp-pd/dt-td/Rp-eng.cfm?TABID=2&LANG=E&A=R&APATH=3&DETAIL=0&DIM=0&FL=A&FREE=0&GC=01&GL=-1&GID=1341679&GK=1&GRP=1&O=D&PID=110719&PRID=10&PTYPE=109445&S=0&SHOWALL=0&SUB=0&Temporal=2017&THEME=125&VID=0&VNAMEE=&VNAMEF=&D1=0&D2=0&D3=0&D4=0&D5=0&D6=0"
res = urllib.request.urlopen(url)
html = res.read()
## parse with BeautifulSoup
bs = BeautifulSoup(html, "html.parser")
tables = bs.find_all("table")
table = tables[0]
df = pd.DataFrame()
rows = table.find_all("tr")
#extract the first column name (Employment income groups (18))
column_names = []
header_cells = rows[0].find_all("th")
for cell in header_cells:
    header = cell.text
    header = header.strip()
    header = header.replace("\n", " ")
    column_names.append(header)
#extract the rest of the column names
header_cells = rows[1].find_all("th")
for cell in header_cells:
    header = cell.text
    header = header.strip()
    header = header.replace("\n", " ")
    column_names.append(header)
#this is an extra label
column_names.remove('Main mode of commuting (10)')
#get the data from the table
data = []
for row in rows[2:]:
    ## create an empty tuple
    dt = ()
    cells = row.find_all("td")
    for cell in cells:
        ## dp stands for "data point"
        font = cell.find("font")
        if font is not None:
            dp = font.text
        else:
            dp = cell.text
        dp = dp.strip()
        dp = dp.replace("\n", " ")
        ## add to tuple
        dt = dt + (dp,)
    data.append(dt)
df = pd.DataFrame(data, columns = column_names)
Creating the dataframe will give an error because the code above only extracts the cells with data points but does not extract the first cell of each row that contains the row label.
That is, there are 11 column names, but the tuples only have 10 values because it is not extracting the row label (ie, Total - Employment income) because they are of "th" type.
How can I get the row label and put it into the tuple as I process the rest of the data in the table?
Thank you for your help.
(The table I am trying to scrape is on this site if it's not clear from the code)
Use table.findAll('th',{'headers':'col-0'}) to find the row labels:
lab = []
labels = table.findAll('th',{'headers':'col-0'})
for label in labels:
    data = str(label.text).strip()
    data = str(data).split("($)Footnote", 1)[0]
    lab.append(data)
    #print(data)
EDIT:
Using pandas.read_html
import numpy as np
import pandas as pd
import urllib.request
from bs4 import BeautifulSoup
url = "http://www12.statcan.gc.ca/census-recensement/2016/dp-pd/dt-td/Rp-eng.cfm?TABID=2&LANG=E&A=R&APATH=3&DETAIL=0&DIM=0&FL=A&FREE=0&GC=01&GL=-1&GID=1341679&GK=1&GRP=1&O=D&PID=110719&PRID=10&PTYPE=109445&S=0&SHOWALL=0&SUB=0&Temporal=2017&THEME=125&VID=0&VNAMEE=&VNAMEF=&D1=0&D2=0&D3=0&D4=0&D5=0&D6=0"
res = urllib.request.urlopen(url)
html = res.read()
## parse with BeautifulSoup
bs = BeautifulSoup(html, "html.parser")
tables = bs.find_all("table")
df = (pd.read_html(str(tables)))[0]
#print(df)
columns = ['Employment income groups (18)','Total - Main mode of commuting','Car, truck or van','Driver, alone',
'2 or more persons shared the ride to work','Driver, with 1 or more passengers',
'Passenger, 2 or more persons in the vehicle','Sustainable transportation',
'Public transit','Active transport','Other method']
df.columns = columns
Edit 2: The elements won't be accessible by index because the scraped column labels (e.g. Employment income groups (18)) are not clean strings. I have edited the code again.

Saving to_csv read just the columns, not the rows

I'm stuck trying to read all the rows of a CSV file and save the results to another CSV file (I'm using pandas 0.17.1).
I have a list of tickers in a CSV file, one ticker per column, like this:
Column A: AAPL / Column B:TSLA / Column C: EXPD... and so on.
Now I have to add 3000 new tickers to this list, so I changed the orientation of the CSV, putting one ticker per row in the first column, like this:
Column A
AAPL
TSLA
EXPD
...and so on.
The issue is: when I save the output to a CSV file, it reads only the first row and nothing else.
In my example, if the first row contains "AAPL", I get a CSV file that has only the data for AAPL.
This is my code:
symbols_list = pd.read_csv('/home/andrea/htrade/python/titoli_rows.csv')
symbols = []
for ticker in symbols_list:
    r = DataReader(ticker, "yahoo",
                   start=datetime.datetime.now() - BDay(20),
                   end=datetime.datetime.now())
    # add a symbol column
    r['Symbol'] = ticker
    symbols.append(r)
# concatenate all the dfs
df = pd.concat(symbols)
#define cell with the columns that i need
cell = df[['Symbol', 'Open', 'High', 'Low', 'Adj Close', 'Volume']]
cell.reset_index().sort_values(['Symbol', 'Date'], ascending=[1, 0]).set_index('Symbol').to_csv('/home/andrea/Dropbox/HT/stock20.csv', date_format='%d/%m/%Y')
Why is it that if I put a ticker in each column the CSV contains the data for every ticker, but if I put a ticker in each row, only the first row is read?
I already checked that read_csv reads the CSV correctly, and it does, so I don't understand why it isn't processing them all.
I just ran the below and with a short list of symbols imported via read_csv it seemed to work fine:
from datetime import datetime
import pandas as pd
import pandas_datareader.data as web  # pandas.io.data moved to the pandas-datareader package
from pandas.tseries.offsets import BDay
df = pd.read_csv(path_to_file).loc[:, ['symbols']].dropna().squeeze()
symbols = []
for ticker in df.tolist():
    r = web.DataReader(ticker, "yahoo",
                       start=datetime.now() - BDay(20),
                       end=datetime.now())
    r['Symbol'] = ticker
    symbols.append(r)
df = pd.concat(symbols).drop('Close', axis=1)
cell= df[['Symbol','Open','High','Low','Adj Close','Volume']]
cell.reset_index().sort_values(['Symbol', 'Date'], ascending=[1,0]).set_index('Symbol').to_csv(path_to_file, date_format='%d/%m/%Y')
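A likely reason the original loop saw only one ticker: `for ticker in symbols_list` iterates over a DataFrame, and that yields column labels, not row values. With one ticker per column, the header row holds every ticker; with one ticker per row and no header line, `read_csv` consumes the first ticker as the only column label. A small sketch, with an in-memory CSV standing in for the file (I am assuming the file has no header line):

```python
import io
import pandas as pd

csv_text = "AAPL\nTSLA\nEXPD\n"  # one ticker per row, no header line

# Default read: the first line is consumed as the header
df = pd.read_csv(io.StringIO(csv_text))
print(list(df))  # iterating a DataFrame yields its column labels

# Reading without a header keeps every ticker as data
tickers = pd.read_csv(io.StringIO(csv_text), header=None)[0].tolist()
print(tickers)
```

The answer's code avoids the problem by selecting a named column and calling `.tolist()` on it, which iterates over values.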
