API Request to Array - python

I'm using a crypto API that returns the Time, Open, High, Low and Close values for the last few weeks. I just need the first row.
The input:
[[1635260400000, 53744.5, 53744.5, 53430.71, 53430.71], [1635262200000, 53635.49, 53899.73, 53635.49, 53899.73], [1635264000000, 53850.63, 54258.62, 53779.11, 54242.25], [1635265800000, 54264.32, 54264.32, 53909.02, 54003.42]]
I've tried:
resp = pd.read_csv('https://api.coingecko.com/api/v3/coins/bitcoin/ohlc?vs_currency=eur&days=1')
resp = resp.astype(str)
Time = resp[resp.columns[0]]
Open = resp[resp.columns[1]]
High = resp[resp.columns[2]]
Low = resp[resp.columns[3]]
Close = resp[resp.columns[4]]
But this doesn't work: I can't process the values (I wanted to convert them from object to str and then to double or float). I want each value as a double in its own variable. I'm kind of stuck here.

The problem with using pandas is that the JSON array creates one row with several columns.
If you expect to just loop over the JSON array, I suggest using requests rather than pandas.
import requests
resp = requests.get('https://api.coingecko.com/api/v3/coins/bitcoin/ohlc?vs_currency=eur&days=1')
for row in resp.json():
    timestamp, open_price, high, low, close = row
    ...
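Since you said you only need the first row, a minimal sketch along the same lines (the variable names are just illustrative):
import requests
resp = requests.get('https://api.coingecko.com/api/v3/coins/bitcoin/ohlc?vs_currency=eur&days=1')
first = resp.json()[0]
# unpack the first [time, open, high, low, close] entry and cast each value to float
timestamp, open_price, high, low, close = (float(v) for v in first)
print(open_price, close)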

You just need to use read_json:
resp = pd.read_json('https://api.coingecko.com/api/v3/coins/bitcoin/ohlc?vs_currency=eur&days=1')
resp = resp.astype(float)
Time = resp[resp.columns[0]]
Open = resp[resp.columns[1]]
High = resp[resp.columns[2]]
Low = resp[resp.columns[3]]
Close = resp[resp.columns[4]]
But the previous solution is more compact and understandable.
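And if you only need the first row as floats (per the question), a small variation of the same idea:
import pandas as pd
resp = pd.read_json('https://api.coingecko.com/api/v3/coins/bitcoin/ohlc?vs_currency=eur&days=1')
# iloc[0] is the first row; unpacking the Series gives one float per variable
Time, Open, High, Low, Close = resp.astype(float).iloc[0]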

Related

HTTP Error 400 Bad request calling api with python

my_list = [i for i in range(2321)]
for i in range(0, len(my_list), 100):
    my_list[i:i+100]
    query_get_data_by_dea_schedule = 'https://api.fda.gov/drug/ndc.json?search=dea_schedule:"{}"&limit={}'.format('CII', i)
    print(query_get_data_by_dea_schedule)
    data_df = pd.DataFrame(pd.read_json(path_or_buf=query_get_data_by_dea_schedule, orient='values', typ='series', convert_dates=False)['results'])
    all_data_df = all_data_df.append(data_df)
I am trying to run this to get the data for 2321 records coming from the FDA for schedule 3 items. I need to read 100 at a time because that is the API limit. I am not sure what I am doing wrong here. Also, am I reading it correctly 100 at a time before saving it to the data frame? It stops and gives me: HTTPError: HTTP Error 400: Bad Request. Thanks in advance.
Based on the documentation, you should page with skip rather than growing limit - always use limit=100 and increase skip: limit=100&skip=0, limit=100&skip=100, limit=100&skip=200, limit=100&skip=300, etc.
Minimal code which works for me:
import pandas as pd
url = 'https://api.fda.gov/drug/ndc.json?search=dea_schedule:"{}"&limit={}&skip={}'
all_data_df = []
limit = 100
for skip in range(0, 2321, limit):
    query = url.format('CII', limit, skip)
    print('query:', query)
    data = pd.read_json(query, orient='values', typ='series', convert_dates=False)
    data = data['results']
    all_data_df.append(data)
print(all_data_df)
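If you then want everything in a single DataFrame rather than a list of chunks, one way to finish (a sketch, assuming each entry in all_data_df is the list of record dicts the API returns under 'results'):
import pandas as pd
# flatten the list of per-request 'results' chunks into one list of records
records = [record for chunk in all_data_df for record in chunk]
df = pd.DataFrame(records)
print(df.shape)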

Daily leaderboard or price tracking data

I'll just start from scratch since I feel like I'm lost with all the different possibilities. What I will be talking about is a leaderboard, but it could apply to price tracking as well.
My goal is to scrape data from a website (the all time leaderboard / hidden), put it in a .csv file and update it daily at noon.
What I have succeeded at so far: scraping the data.
I tried scraping with BS4, but since the data is hidden I couldn't be specific enough to get only the all-time points. I still call it a success because I can get a table with all the data I need and the date as a header. My problems with this solution are 1) useless data populating the csv and 2) the table is vertical, not horizontal.
I also scraped the data with a CSS selector, but I abandoned this idea because sometimes the page wouldn't load and the data wasn't scraped. I then found out that there's a JSON file containing the data right away.
JSON scraping seems to be the best option, but I'm having trouble creating a csv file that's suitable for making a graph.
This brings me to what I'm struggling with: storing the data in a table that looks like the one I picture, where the grey area is the points and DATE1 is the moment the data was scraped.
I'd rather not manipulate the data in the csv file too much. If the table looked like what I picture above, it would be easier to make a graph afterwards, but I'm having trouble. The best I've managed is a table that looks like this, and it's vertical rather than horizontal:
name,points,date
Dennis,52570,10-23-2020
Dinh,40930,10-23-2020
name,points,date
Dennis,52570,10-23-2020
Dinh,40930,10-23-2020
name,points,date
Dennis,52570,10-23-2020
Dinh,40930,10-23-2020
Thank you for your help.
Here's the code
import pandas as pd
import time
timestr = time.strftime("%Y-%m-%d %H:%M")
url_all_time = 'https://community.koodomobile.com/widget/pointsLeaderboard?period=allTime&maxResults=20&excludeRoles='
data = pd.read_json(url_all_time)
table = pd.DataFrame.from_records(data, index=['name'], columns=['points','name'])
table['date'] = pd.Timestamp.today().strftime('%m-%d-%Y')
table.to_csv('products.csv', index=True, encoding='utf-8')
If what I want is not possible, I might just scrape individually for each member, make one CSV file per member and make a graph that refers to those different files.
So, I've played around with your question a bit and here's what I came up with.
Basically, your best bet for data storage is a lightweight database, as suggested in the comments. However, with a bit of planning, a few hoops to jump through, and some hacky code, you could get away with a simple (sort of) JSON that eventually ends up as a .csv file that looks like this:
Note: the values are the same as I don't want to wait a day or two for the leader-board to actually update.
What I did was rearranging the data that came back from the request to the API and built a structure that looks like this:
"BobTheElectrician": {
"id": 7160010,
"rank": 14,
"score_data": {
"2020-10-24 18:45": 4187,
"2020-10-24 18:57": 4187,
"2020-10-24 19:06": 4187,
"2020-10-24 19:13": 4187
}
Every player is a main key that has, among other things, a score_data value. This in turn is a dict that holds the points value for each day you run the script.
Now, the trick is to get this JSON to look like the .csv you want. The question is - how?
Well, since you intend to update all players' data (I just assumed that) they all should have the same number of entries for score_data.
The keys for score_data are your timestamps. Grab any player's score_data keys and you have the date headers, right?
Having said that, you can build your .csv rows the same way: grab player's name and all their point values from score_data. This should get you a list of lists, right? Right.
Then, when you have all this, you just dump that to a .csv file and there you have it!
Putting it all together:
import csv
import json
import os
import random
import time
from urllib.parse import urlencode

import requests

API_URL = "https://community.koodomobile.com/widget/pointsLeaderboard?"
LEADERBOARD_FILE = "leaderboard_data.json"


def get_leaderboard(period: str = "allTime", max_results: int = 20) -> list:
    payload = {"period": period, "maxResults": max_results}
    return requests.get(f"{API_URL}{urlencode(payload)}").json()


def dump_leaderboard_data(leaderboard_data: dict) -> None:
    with open(LEADERBOARD_FILE, "w") as jf:
        json.dump(leaderboard_data, jf, indent=4, sort_keys=True)


def read_leaderboard_data(data_file: str) -> dict:
    with open(data_file) as f:
        return json.load(f)


def parse_leaderboard(leaderboard: list) -> dict:
    return {
        item["name"]: {
            "id": item["id"],
            "score_data": {
                time.strftime("%Y-%m-%d %H:%M"): item["points"],
            },
            "rank": item["leaderboardPosition"],
        } for item in leaderboard
    }


def update_leaderboard_data(target: dict, new_data: dict) -> dict:
    for player, stats in new_data.items():
        target[player]["rank"] = stats["rank"]
        target[player]["score_data"].update(stats["score_data"])
    return target


def leaderboard_to_csv(leaderboard: dict) -> None:
    data_rows = [
        [player] + list(stats["score_data"].values())
        for player, stats in leaderboard.items()
    ]
    random_player = random.choice(list(leaderboard.keys()))
    dates = list(leaderboard[random_player]["score_data"])
    with open("the_data.csv", "w") as output:
        w = csv.writer(output)
        w.writerow([""] + dates)
        w.writerows(data_rows)


def script_runner():
    if os.path.isfile(LEADERBOARD_FILE):
        fresh_data = update_leaderboard_data(
            target=read_leaderboard_data(LEADERBOARD_FILE),
            new_data=parse_leaderboard(get_leaderboard()),
        )
        leaderboard_to_csv(fresh_data)
        dump_leaderboard_data(fresh_data)
    else:
        dump_leaderboard_data(parse_leaderboard(get_leaderboard()))


if __name__ == "__main__":
    script_runner()
The script also checks if you have a JSON file that pretends to be a proper database. If not, it'll write the leader-board data. Next time you run the script, it'll update the JSON and spit out a fresh .csv file.
Hope this answer will nudge you in the right direction.
Hey, since you are loading it into a pandas DataFrame, the operations are fairly simple. I ran your code first:
import pandas as pd
import time
timestr = time.strftime("%Y-%m-%d %H:%M")
url_all_time = 'https://community.koodomobile.com/widget/pointsLeaderboard?period=allTime&maxResults=20&excludeRoles='
data = pd.read_json(url_all_time)
table = pd.DataFrame.from_records(data, index=['name'], columns=['points','name'])
table['date'] = pd.Timestamp.today().strftime('%m-%d-%Y')
Then I added a few more lines of code to reshape the pandas DataFrame table to your needs.
idxs = table['date'].index
for i, val in enumerate(idxs):
    table.at[val, table['date'][i]] = table['points'][i]
table = table.drop(['date', 'points'], axis=1)
In the above snippet I am using the pandas DataFrame's ability to assign values by index. First I get the index values for the date column, then I go through each of them, add a column for the required date (the value taken from the date column) and fill in the corresponding points according to the indexes pulled earlier.
This gives me the following output:
name 10-24-2020
Dennis 52570.0
Dinh 40930.0
Sophia 26053.0
Mayumi 25300.0
Goran 24689.0
Robert T 19843.0
Allan M 19768.0
Bernard Koodo 14404.0
nim4165 13629.0
Timo Tuokkola 11216.0
rikkster 7338.0
David AKU 5774.0
Ranjan Koodo 4506.0
BobTheElectrician 4170.0
Helen Koodo 3370.0
Mihaela Koodo 2764.0
Fred C 2542.0
Philosoraptor 2122.0
Paul Deschamps 1973.0
Emilia Koodo 1755.0
I can then save this to csv using the last line from your code. Similarly, you can pull data for more dates and format it to add to the same pandas DataFrame:
table.to_csv('products.csv', index=True, encoding='utf-8')
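If you run this daily and want each run to add a new dated column to the same file, one option (just a sketch; it assumes products.csv is indexed by name with one column per earlier date, and that table is the reshaped DataFrame built above):
import os
import pandas as pd

if os.path.isfile('products.csv'):
    # previous runs: one column per date, indexed by name
    old = pd.read_csv('products.csv', index_col='name')
    # keep today's values where present, fall back to the stored columns otherwise
    table = table.combine_first(old)
table.to_csv('products.csv', index=True, encoding='utf-8')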

Retry Single Iteration in For Loop (Python)

Python novice here (sorry if this is a dumb question)! I'm currently using a for loop to download and manipulate data. Unfortunately, I occasionally run into brief network issues that cause portions of the loop to fail.
Originally, I was doing something like this:
# Import Modules
import fix_yahoo_finance as yf
import pandas as pd
from stockstats import StockDataFrame as sdf
# Stock Tickers to Gather Data For - in my full code I have thousands of tickers
Ticker = ['MSFT','SPY','GOOG']
# Data Start and End Data
Data_Start_Date = '2017-03-01'
Data_End_Date = '2017-06-01'
# Create Data List to Append
DataList = pd.DataFrame([])
# Initialize Loop
for i in Ticker:
    # Download Data
    data = yf.download(i, Data_Start_Date, Data_End_Date)
    # Create StockDataFrame
    stock_df = sdf.retype(data)
    # Calculate RSI
    data['rsi'] = stock_df['rsi_14']
    DataList.append(pd.DataFrame(data))
DataList.to_csv('DataList.csv', header=True, index=True)
With that basic layout, whenever I had a network error, it caused the entire program to halt and spit out an error.
I did some research and tried modifying the for loop to the following:
for i in Ticker:
    try:
        # Download Data
        data = yf.download(i, Data_Start_Date, Data_End_Date)
        # Create StockDataFrame
        stock_df = sdf.retype(data)
        # Calculate RSI
        data['rsi'] = stock_df['rsi_14']
        DataList.append(pd.DataFrame(data))
    except:
        continue
With this, the code always ran without crashing, but whenever I encountered a network error, it skipped whichever tickers it was working on (their data was never downloaded).
I want this to download the data for each ticker once. If it fails, I want it to try again until it succeeds once and then move on to the next ticker. I tried using while True and variations of it, but it caused the loop to download the same ticker multiple times!
Any help or advice is greatly appreciated! Thank you!
If you can continue after you've hit a glitch (some protocols support it), then you're better off not using this exact approach. But for a slightly brute-force method:
for i in Ticker:
    incomplete = True
    tries = 10
    while incomplete and tries > 0:
        try:
            # Download Data
            data = yf.download(i, Data_Start_Date, Data_End_Date)
            incomplete = False
        except:
            tries -= 1
    if incomplete:
        print("Oops, it is really failing a lot, skipping: %r" % (i,))
        continue  # not technically needed, but in case you opt to add
                  # anything afterward ...
    else:
        # Create StockDataFrame
        stock_df = sdf.retype(data)
        # Calculate RSI
        data['rsi'] = stock_df['rsi_14']
        DataList.append(pd.DataFrame(data))
This is slightly different from Prune's in that it stops after 10 attempts ... if it fails that many times, that indicates you may want to divert some energy into fixing a different problem, such as network connectivity.
If it gets to that point, it will continue in the list of Tickers, so perhaps you can get most of what you need.
You can use a wrapper loop to continue until you get a good result.
for i in Ticker:
    fail = True
    while fail:  # Keep trying until it works
        try:
            # Download Data
            data = yf.download(i, Data_Start_Date, Data_End_Date)
            # Create StockDataFrame
            stock_df = sdf.retype(data)
            # Calculate RSI
            data['rsi'] = stock_df['rsi_14']
            DataList.append(pd.DataFrame(data))
        except:
            continue
        else:
            fail = False
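One refinement worth considering with either retry approach (not in the answers above, just a suggestion): pause briefly between attempts so a transient network problem has time to clear instead of hammering the API. A sketch that reuses the Ticker, yf, sdf and DataList variables from the question and combines bounded retries with a simple backoff:
import time

for i in Ticker:
    for attempt in range(10):
        try:
            data = yf.download(i, Data_Start_Date, Data_End_Date)
            break                      # success, stop retrying this ticker
        except Exception:
            time.sleep(2 ** attempt)   # wait 1s, 2s, 4s, ... before retrying
    else:
        print("Giving up on %r after 10 attempts" % (i,))
        continue                       # move on to the next ticker
    stock_df = sdf.retype(data)
    data['rsi'] = stock_df['rsi_14']
    DataList.append(pd.DataFrame(data))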

Checking HTTP Status (Python)

Is there a way to check the HTTP status code in the code below, given that I have not used the requests or urllib libraries, which would allow for this?
import pandas as pd
from pandas.io.excel import read_excel

url = 'http://www.bankofengland.co.uk/statistics/Documents/yieldcurve/uknom05_mdaily.xls'

# check the sheet number, spot: 9/9, short end 7/9
spot_curve = read_excel(url, sheetname=8)  # Creates the dataframes
short_end_spot_curve = read_excel(url, sheetname=6)

# do some cleaning, keep NaN for now, as forward fill NaN is not recommended for yield curve
spot_curve.columns = spot_curve.loc['years:']
valid_index = spot_curve.index[4:]
spot_curve = spot_curve.loc[valid_index]
# remove all maturities within 5 years as those are duplicated in short-end file
col_mask = spot_curve.columns.values > 5
spot_curve = spot_curve.iloc[:, col_mask]

# Providing correct names
short_end_spot_curve.columns = short_end_spot_curve.loc['years:']
valid_index = short_end_spot_curve.index[4:]
short_end_spot_curve = short_end_spot_curve.loc[valid_index]

# merge these two, time index are identical
combined_data = pd.concat([short_end_spot_curve, spot_curve], axis=1, join='outer')
# sort the maturity from short end to long end
combined_data.sort_index(axis=1, inplace=True)

def filter_func(group):
    return group.isnull().sum(axis=1) <= 50

combined_data = combined_data.groupby(level=0).filter(filter_func)
In pandas:
read_excel uses urllib2.urlopen (urllib.request.urlopen in Python 3) to open the url and calls .read() on the response immediately, without keeping the HTTP response around, roughly like:
data = urlopen(url).read()
Even though you only need part of the Excel file, pandas will download the whole file each time. So, I voted for #jonnybazookatone.
It's better to download the Excel file locally first; then you can check the status code and the file's md5 to verify data integrity, among other things.
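If switching to requests is acceptable, a minimal sketch of that approach: check the status code first, then hand the already-downloaded bytes to pandas (the sheet indexes come from the question; newer pandas versions call the argument sheet_name rather than sheetname):
import io

import pandas as pd
import requests

url = 'http://www.bankofengland.co.uk/statistics/Documents/yieldcurve/uknom05_mdaily.xls'

resp = requests.get(url)
print(resp.status_code)   # e.g. 200
resp.raise_for_status()   # raise an exception on 4xx/5xx responses

# parse the downloaded bytes instead of letting pandas fetch the URL again
spot_curve = pd.read_excel(io.BytesIO(resp.content), sheet_name=8)
short_end_spot_curve = pd.read_excel(io.BytesIO(resp.content), sheet_name=6)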

How do I add an ID attribute to each named tuple created from a CSV file?

I'm pulling stock-quotes from Yahoo into a named tuple using the CSV module.
YahooQuote = collections.namedtuple(
    'YahooQuote', 'date, open, high, low, close, volume, adj_close')

def prices(ticker):
    # make url given ticker
    csvfile = urllib2.urlopen(url)
    return map(YahooQuote._make, csv.reader(csvfile))
Yahoo's stock quote csv format does not include the stock ticker. If I adjusted my named tuple class to include a ticker attribute, how would I modify the map expression to make it add the value of the ticker argument to each of the named tuple instances?
I'm genetically incapable of understanding code with map() in it, so I'm just going to transform "map(f, i)" into "[f(x) for x in i]" so I don't have to:
return [YahooQuote._make(x) for x in csv.reader(csvfile)]
Then it's a simple matter to add ticker to the end of the lists returned by csv.reader:
YahooQuote = collections.namedtuple(
    'YahooQuote', 'date, open, high, low, close, volume, adj_close, ticker')

def prices(ticker):
    # make url given ticker
    ticker_list = [ticker]
    csvfile = urllib2.urlopen(url)
    return [YahooQuote._make(x + ticker_list) for x in csv.reader(csvfile)]
