my_list = [i for i in range(2321)]
for i in range(0, len(my_list), 100):
    my_list[i:i+100]
    query_get_data_by_dea_schedule = 'https://api.fda.gov/drug/ndc.json?search=dea_schedule:"{}"&limit={}'.format('CII', i)
    print(query_get_data_by_dea_schedule)
    data_df = pd.DataFrame(pd.read_json(path_or_buf=query_get_data_by_dea_schedule, orient='values', typ='series', convert_dates=False)['results'])
    all_data_df = all_data_df.append(data_df)
I am trying to run this to get the data for the 2321 lines that are coming from the FDA for Schedule 3 items. I need to read 100 at a time because that is the limit. I am not sure what I am doing wrong here. Also, am I reading it correctly, 100 at a time, and saving it into the data frame? It stops and gives me: HTTPError: HTTP Error 400: Bad Request. Thanks in advance.
Based on the documentation, you should paginate with skip instead of increasing limit: always keep limit=100 and step skip, i.e. limit=100&skip=0, limit=100&skip=100, limit=100&skip=200, limit=100&skip=300, etc.
Minimal code which works for me:
import pandas as pd

url = 'https://api.fda.gov/drug/ndc.json?search=dea_schedule:"{}"&limit={}&skip={}'

all_data_df = []
limit = 100

for skip in range(0, 2321, limit):
    query = url.format('CII', limit, skip)
    print('query:', query)

    data = pd.read_json(query, orient='values', typ='series', convert_dates=False)
    data = data['results']

    all_data_df.append(data)

print(all_data_df)
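If you want a single DataFrame at the end, one option (a minimal sketch, assuming each page's 'results' entry is the list of record dicts returned by the API) is to turn each page into a DataFrame and concatenate once after the loop:

import pandas as pd

url = 'https://api.fda.gov/drug/ndc.json?search=dea_schedule:"{}"&limit={}&skip={}'
pages = []  # one DataFrame per page

for skip in range(0, 2321, 100):
    query = url.format('CII', 100, skip)
    data = pd.read_json(query, orient='values', typ='series', convert_dates=False)
    # 'results' is assumed to hold the list of record dicts for this page
    pages.append(pd.DataFrame(data['results']))

# concatenate once at the end rather than appending to a DataFrame inside the loop
all_data_df = pd.concat(pages, ignore_index=True)
print(all_data_df.shape)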
This is my latest attempt:
Screenshot from my IDE
stockcode = pd.read_csv(r'D:\Book1.csv')
stockcode = stockcode.Symbol.to_list()
print(len(stockcode))

frames = []
for i in stockcode:
    data = isec.get_historical_data(
        interval=time_interval,
        from_date=from_date,
        to_date=to_date,
        stock_code=i,
        exchange_code="NSE",
        product_type="cash"
    )
    frames.append(pd.DataFrame(data["Success"]))

df = pd.concat(frames)
I am trying to fetch data for multiple stocks from a broker's API, but after a few minutes it throws an error.
How can I make it restart from the point where it stopped?
I tried wrapping the loop in exception handling, but it was of no use; the error thrown was "Output exceeds the size limit".
Screenshot from my IDE
If I understand you correctly, you want to restart without looping over the items you've already processed? In that case, I'd use a manual retry in the code and loop over a copy of your original list, removing each item once it has been handled successfully:
number_of_retries = 3
retry_counter = 0
is_success = False

while retry_counter < number_of_retries and not is_success:
    try:
        stockcode_process = stockcode.copy()
        for i in stockcode_process:
            # your for-loop body here, with this additional last line
            # to remove the item once it has been processed:
            stockcode.remove(i)
        is_success = True
    except Exception:
        retry_counter += 1
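Put together with the download loop from the question, the pattern could look like this. It is only a sketch: isec.get_historical_data, time_interval, from_date, to_date and stockcode are taken from the question and assumed to exist as shown.

import pandas as pd

number_of_retries = 3
retry_counter = 0
is_success = False
frames = []  # results collected so far survive across retries

while retry_counter < number_of_retries and not is_success:
    try:
        stockcode_process = stockcode.copy()
        for i in stockcode_process:
            data = isec.get_historical_data(   # broker client from the question
                interval=time_interval,
                from_date=from_date,
                to_date=to_date,
                stock_code=i,
                exchange_code="NSE",
                product_type="cash",
            )
            frames.append(pd.DataFrame(data["Success"]))
            stockcode.remove(i)                # don't re-fetch this ticker on retry
        is_success = True
    except Exception:
        retry_counter += 1                     # retry with the remaining tickers

df = pd.concat(frames)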
I'm using a crypto API that gives me the Time, Open, High, Low, and Close values for the last few weeks. I just need the first row.
The input:
[[1635260400000, 53744.5, 53744.5, 53430.71, 53430.71], [1635262200000, 53635.49, 53899.73, 53635.49, 53899.73], [1635264000000, 53850.63, 54258.62, 53779.11, 54242.25], [1635265800000, 54264.32, 54264.32, 53909.02, 54003.42]]
I've tried:
resp = pd.read_csv('https://api.coingecko.com/api/v3/coins/bitcoin/ohlc?vs_currency=eur&days=1')
resp = resp.astype(str)
Time = resp[resp.columns[0]]
Open = resp[resp.columns[1]]
High = resp[resp.columns[2]]
Low = resp[resp.columns[3]]
Close = resp[resp.columns[4]]
But this doesn't work, as I can't process it (I wanted to convert it from object to str and then to double or float). I want each value as a double in a separate variable. I'm kind of stuck on this.
The problem with using pandas is that the JSON array creates one row with several columns.
If you expect to just loop over the JSON array, I suggest using requests rather than pandas.
import requests

resp = requests.get('https://api.coingecko.com/api/v3/coins/bitcoin/ohlc?vs_currency=eur&days=1')

for row in resp.json():
    timestamp, open_price, high, low, close = row
    ...
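Since you only need the first row, you can skip the loop and unpack it directly. A small sketch; the API returns JSON numbers, so the values are already ints/floats and float() is only there to be explicit:

import requests

resp = requests.get('https://api.coingecko.com/api/v3/coins/bitcoin/ohlc?vs_currency=eur&days=1')

# take only the first [time, open, high, low, close] entry
timestamp, open_price, high, low, close = resp.json()[0]
open_price = float(open_price)  # each value as a plain float
print(timestamp, open_price, high, low, close)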
You just need to use read_json:
resp = pd.read_json('https://api.coingecko.com/api/v3/coins/bitcoin/ohlc?vs_currency=eur&days=1')
resp = resp.astype(float)
Time = resp[resp.columns[0]]
Open = resp[resp.columns[1]]
High = resp[resp.columns[2]]
Low = resp[resp.columns[3]]
Close = resp[resp.columns[4]]
But the previous solution is more compact and understandable.
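If you only want the first row as floats from that DataFrame, a short sketch:

import pandas as pd

resp = pd.read_json('https://api.coingecko.com/api/v3/coins/bitcoin/ohlc?vs_currency=eur&days=1')

# first row holds [time, open, high, low, close]
time_val, open_val, high_val, low_val, close_val = resp.astype(float).iloc[0]
print(time_val, open_val, high_val, low_val, close_val)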
I'm relatively new to Python and very new to multithreading and multiprocessing. I've been trying to send thousands of values (approx. 70,000) in chunks through a web-based API and want it to return the data associated with all those values. The API accepts 50 values per batch, so for now, as a test, I have 100 values I'd like to send in 2 chunks of 50. Without multithreading it would have taken me hours to finish the job, so I've tried to use multithreading to improve performance.
The issue: the code gets stuck at the pool.map() part after performing only one task (the first row, which is just the header, not even the main values), and I had to restart the notebook kernel. I've heard not to use multiprocessing in a notebook, so I coded the whole thing in Spyder and ran it there, but it behaved the same. Code is below:
#create df data frame with
#some code to get df of 100 values in
#2 chunks, each chunk contains 50 values.

output:

df =
                                                VAL
0   1166835704;1352357565;544477351;159345951;22...
1   354236462063;54666246046;13452466248...
import json

import pandas as pd
import requests
from multiprocessing.pool import ThreadPool

def get_val(df):
    data = []
    v_list = df
    s = requests.Session()
    url = 'https://website/'
    post_fields = {'format': 'json', 'data': v_list}
    r = s.post(url, data=post_fields)
    d = json.loads(r.text)
    sort = pd.json_normalize(d, ['Results'])
    return sort

if __name__ == "__main__":
    pool = ThreadPool(4)             # make the pool of workers
    results = pool.map(get_val, df)  # run get_val on the chunks in their own threads
    pool.close()                     # close the pool and wait for the work to finish
    pool.join()
Any suggestions would be helpful. Thanks!
Can you check once with the following:
with ThreadPool(4) as pool:
    results = pool.map(get_val, df)  # df should be an iterable
    print(results)
Also, please check whether a chunksize argument can be passed to pool.map, as that can affect performance.
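Note that mapping over a DataFrame iterates over its column names, not its rows. Assuming the batches live in a column named VAL, as in the output shown in the question, passing that column as a list is probably what you want. A sketch:

from multiprocessing.pool import ThreadPool

# df['VAL'] is assumed to hold one semicolon-joined batch of 50 values per row
batches = df['VAL'].tolist()

with ThreadPool(4) as pool:
    results = pool.map(get_val, batches)  # one batch per call to get_val

print(len(results))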
Python novice here (sorry if this is a dumb question)! I'm currently using a for loop to download and manipulate data. Unfortunately, I occasionally run into brief network issues that cause portions of the loop to fail.
Originally, I was doing something like this:
# Import Modules
import fix_yahoo_finance as yf
import pandas as pd
from stockstats import StockDataFrame as sdf

# Stock Tickers to Gather Data For - in my full code I have thousands of tickers
Ticker = ['MSFT','SPY','GOOG']

# Data Start and End Dates
Data_Start_Date = '2017-03-01'
Data_End_Date = '2017-06-01'

# Create Data List to Append
DataList = pd.DataFrame([])

# Initialize Loop
for i in Ticker:
    # Download Data
    data = yf.download(i, Data_Start_Date, Data_End_Date)
    # Create StockDataFrame
    stock_df = sdf.retype(data)
    # Calculate RSI
    data['rsi'] = stock_df['rsi_14']
    DataList = DataList.append(pd.DataFrame(data))

DataList.to_csv('DataList.csv', header=True, index=True)
With that basic layout, whenever I hit a network error, the entire program halted and spat out an error.
I did some research and tried modifying the for loop to the following:
for i in Ticker:
    try:
        # Download Data
        data = yf.download(i, Data_Start_Date, Data_End_Date)
        # Create StockDataFrame
        stock_df = sdf.retype(data)
        # Calculate RSI
        data['rsi'] = stock_df['rsi_14']
        DataList = DataList.append(pd.DataFrame(data))
    except:
        continue
With this, the code always ran without issue, but whenever I encountered a network error, it skipped the tickers it was working on (their data never got downloaded).
I want this to download the data for each ticker once. If it fails, I want it to try again until it succeeds once and then move on to the next ticker. I tried using while True and variations of it, but it caused the loop to download the same ticker multiple times!
Any help or advice is greatly appreciated! Thank you!
If you can continue after you've hit a glitch (some protocols support it), then you're better off not using this exact approach. But for a slightly brute-force method:
for i in Ticker:
    incomplete = True
    tries = 10
    while incomplete and tries > 0:
        try:
            # Download Data
            data = yf.download(i, Data_Start_Date, Data_End_Date)
            incomplete = False
        except:
            tries -= 1
    if incomplete:
        print("Oops, it is really failing a lot, skipping: %r" % (i,))
        continue  # not technically needed, but in case you opt to add
                  # anything afterward ...
    else:
        # Create StockDataFrame
        stock_df = sdf.retype(data)
        # Calculate RSI
        data['rsi'] = stock_df['rsi_14']
        DataList = DataList.append(pd.DataFrame(data))
This is slightly different from Prune's in that it stops after 10 attempts ... if it fails that many times, that indicates you may want to divert some energy into fixing a different problem, such as network connectivity.
If it gets to that point, it will continue through the list of Tickers, so perhaps you can still get most of what you need.
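If the failures really are brief network blips, it may also help to wait a little between attempts. This is only a variation on the code above, adding a fixed delay with time.sleep (the 5-second delay is an arbitrary choice):

import time

for i in Ticker:
    incomplete = True
    tries = 10
    while incomplete and tries > 0:
        try:
            data = yf.download(i, Data_Start_Date, Data_End_Date)
            incomplete = False
        except Exception:
            tries -= 1
            time.sleep(5)  # pause before retrying (arbitrary delay)
    if incomplete:
        print("Skipping %r after repeated failures" % (i,))
        continue
    stock_df = sdf.retype(data)
    data['rsi'] = stock_df['rsi_14']
    DataList = DataList.append(pd.DataFrame(data))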
You can use a wrapper loop to continue until you get a good result.
for i in Ticker:
    fail = True
    while fail:  # Keep trying until it works
        try:
            # Download Data
            data = yf.download(i, Data_Start_Date, Data_End_Date)
            # Create StockDataFrame
            stock_df = sdf.retype(data)
            # Calculate RSI
            data['rsi'] = stock_df['rsi_14']
            DataList = DataList.append(pd.DataFrame(data))
        except:
            continue
        else:
            fail = False
Is there a way to check the HTTP status code in the code below, given that I have not used the requests or urllib libraries, which would allow for this?
import pandas as pd
from pandas.io.excel import read_excel

url = 'http://www.bankofengland.co.uk/statistics/Documents/yieldcurve/uknom05_mdaily.xls'

# check the sheet number, spot: 9/9, short end 7/9
spot_curve = read_excel(url, sheetname=8)  # creates the dataframes
short_end_spot_curve = read_excel(url, sheetname=6)

# do some cleaning, keep NaN for now, as forward-filling NaN is not recommended for yield curves
spot_curve.columns = spot_curve.loc['years:']
valid_index = spot_curve.index[4:]
spot_curve = spot_curve.loc[valid_index]
# remove all maturities within 5 years as those are duplicated in the short-end file
col_mask = spot_curve.columns.values > 5
spot_curve = spot_curve.iloc[:, col_mask]

# provide correct names
short_end_spot_curve.columns = short_end_spot_curve.loc['years:']
valid_index = short_end_spot_curve.index[4:]
short_end_spot_curve = short_end_spot_curve.loc[valid_index]

# merge these two, time indexes are identical
# ==============================================
combined_data = pd.concat([short_end_spot_curve, spot_curve], axis=1, join='outer')
# sort the maturities from short end to long end
combined_data.sort_index(axis=1, inplace=True)

def filter_func(group):
    return group.isnull().sum(axis=1) <= 50

combined_data = combined_data.groupby(level=0).filter(filter_func)
In pandas:
read_excel uses urllib2.urlopen (urllib.request.urlopen in Python 3) to open the URL and immediately calls .read() on the response, without keeping the HTTP response object around, roughly like:
data = urlopen(url).read()
Even though you only need part of the Excel file, pandas will download the whole file each time. So I upvoted @jonnybazookatone's answer.
It's better to save the Excel file locally first; then you can check the status code and the file's md5 to verify data integrity, among other things.
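As a sketch of that approach, using requests (which is not in the original code, so treat this as an assumption about your environment): download the file yourself, check the status code, then hand the bytes to read_excel. Note that this uses the current sheet_name argument rather than the older sheetname spelling from the question.

import hashlib
from io import BytesIO

import pandas as pd
import requests

url = 'http://www.bankofengland.co.uk/statistics/Documents/yieldcurve/uknom05_mdaily.xls'

resp = requests.get(url)
if resp.status_code != 200:
    raise RuntimeError('Download failed with HTTP status %s' % resp.status_code)

# optional: record the md5 of the payload for a later integrity check
print('md5:', hashlib.md5(resp.content).hexdigest())

# keep a local copy so you can re-read it without downloading again
with open('uknom05_mdaily.xls', 'wb') as f:
    f.write(resp.content)

# read the sheets from the downloaded bytes instead of the URL
spot_curve = pd.read_excel(BytesIO(resp.content), sheet_name=8)
short_end_spot_curve = pd.read_excel(BytesIO(resp.content), sheet_name=6)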