Getting BTC historical data via the Kraken API - Python

I'm trying to get data from the Kraken exchange with the krakenex API, but I'm running into several problems because I want to fetch data over a longer time range than the API allows.
The API only returns a dataframe with up to 720 rows per request, so I need a while loop to fetch more data and concatenate it into another dataframe.
I've already read other topics about this, but I'm still not getting good results.
import time
import krakenex
import pandas as pd
from pykrakenapi import KrakenAPI
from datetime import datetime

k = krakenex.API()

start = '28/01/2021 00:00:00'
start = datetime.strptime(start, "%d/%m/%Y %H:%M:%S")
start = int(time.mktime(start.timetuple()))

stop = '03/02/2021 00:00:00'
stop = datetime.strptime(stop, "%d/%m/%Y %H:%M:%S")
stop = int(time.mktime(stop.timetuple()))

prices = pd.DataFrame()
while start < stop:
    time.sleep(5)
    data = k.query_public('OHLC', {'pair': 'XXBTZUSD', 'interval': 1, 'since': start})
    df = pd.DataFrame(data['result']['XXBTZUSD'])
    daily_prices = df[0].to_list()
    start = int(daily_prices[0])
    prices = pd.concat([prices, df])
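For reference, a minimal pagination sketch (untested; it assumes the 'last' field that Kraken's OHLC response includes is the right value to pass as the next 'since') would advance 'since' from the response rather than from the first row of each page. Note also that, as far as I know, the OHLC endpoint only serves roughly the most recent 720 candles per interval, so very old history may require the Trades endpoint instead:

import time
import krakenex
import pandas as pd

k = krakenex.API()

def fetch_ohlc_range(pair, interval, since, stop):
    # Fetch OHLC candles page by page until `stop`, concatenating the pieces.
    frames = []
    while since < stop:
        resp = k.query_public('OHLC', {'pair': pair, 'interval': interval, 'since': since})
        page = pd.DataFrame(resp['result'][pair])
        if page.empty:
            break
        frames.append(page)
        new_since = int(resp['result']['last'])  # id Kraken suggests for the next 'since'
        if new_since <= since:
            break  # no progress, stop to avoid an endless loop
        since = new_since
        time.sleep(2)  # stay well under the public rate limit
    return pd.concat(frames, ignore_index=True)

# Usage with the start/stop timestamps computed above:
# prices = fetch_ohlc_range('XXBTZUSD', 1, start, stop)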

For weeks I have been working on a script that does exactly that. In my case I collect all pairs with BTC and ETH, but you can use the script with any pair. To do this I used the REST API and defined some functions that automate everything. I download the data with a 1-minute timeframe, but it can be used for any timeframe.
First I defined a function that downloads the data either in full or from a specific date. This is necessary because on the first run it downloads all the data, and afterwards it only downloads the new data. The parameter 'interval' defines the number of minutes of the timeframe, while 'since' defines where the download should start.
import requests
import pandas as pd

def get_ohlc(pair, interval=1, since='last'):
    endpoint = 'https://api.kraken.com/0/public/OHLC'
    payLoad = {
        'pair': pair,
        'interval': interval,
        'since': since
    }
    response = requests.get(endpoint, payLoad)
    data = response.json()

    OHLC = data['result'][pair]
    data = pd.DataFrame.from_records(OHLC, columns=['Time', 'Open', 'High', 'Low', 'Close', 'vwap', 'volume', 'count'])
    data['Time'] = pd.to_datetime(data['Time'], unit='s')
    data.set_index('Time', inplace=True)
    data = data.drop(['vwap', 'volume', 'count'], axis=1)

    data['Open'] = data.Open.astype(float)
    data['High'] = data.High.astype(float)
    data['Low'] = data.Low.astype(float)
    data['Close'] = data.Close.astype(float)

    return data
Then I defined a function that loads the saved .json file into memory. The function returns the dataframe with the old data and a timestamp that indicates where the new download should start. I also created a helper function to compute that timestamp.
import datetime
import pandas as pd
from dateutil import tz

def load_data(pair, path):
    data = pd.read_json(path + pair + '.json', orient='split')
    tmp = data.tail(1).index
    tmp = tmp.strftime('%Y-%m-%d %H:%M:%S')
    dt = str_to_datetime(tmp[0])
    ts = dt.timestamp()
    return data, ts

def str_to_datetime(datestr):
    Y = int(datestr[0:4])
    M = int(datestr[5:7])
    D = int(datestr[8:10])
    H = int(datestr[11:13])
    m = int(datestr[14:16])
    return datetime.datetime(Y, M, D, H, m, 0, tzinfo=tz.gettz("Etc/GMT"))
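As a side note, essentially the same conversion could be written in one call with strptime (a sketch; it keeps the seconds, which are always zero for minute-aligned candles):

import datetime
from dateutil import tz

def str_to_datetime(datestr):
    # One-call equivalent of the field-by-field parsing above.
    parsed = datetime.datetime.strptime(datestr, '%Y-%m-%d %H:%M:%S')
    return parsed.replace(tzinfo=tz.gettz("Etc/GMT"))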
Now your main should be something like:
from countdown import countdown
import pandas as pd
import datetime
import os

path = os.getcwd() + '/historical_data/'
pair = 'XBTUSD'

while True:
    if os.path.exists(path + pair + '.json') == False:
        data = get_ohlc(pair, 1)  # 1 minute timeframe
        data.to_json(path + pair + '.json', orient='split')
    else:
        data1, ts = load_data(pair, path)
        data2 = get_ohlc(pair, 1, ts)
        data3 = pd.concat([data1, data2])
        data3.drop(data3.tail(1).index, inplace=True)  # delete last record because it's not ended
        data3.to_json(path + pair + '.json', orient='split')
    countdown(60)  # update every hour
I delete the last record because at download time that candle is not yet closed, so it will be downloaded again at the next update. I haven't tested this exact code because I took pieces from my program; if it doesn't work let me know and I'll fix it.
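The countdown helper imported above isn't defined anywhere in this snippet; a minimal stand-in (an assumption, treating the argument as minutes so that countdown(60) matches the 'update every hour' comment) could look like:

# countdown.py - hypothetical stand-in for the imported helper
import time

def countdown(minutes):
    # Sleep for the given number of minutes, printing a simple remaining-time counter.
    for remaining in range(minutes * 60, 0, -1):
        print('next update in {} s'.format(remaining), end='\r')
        time.sleep(1)
    print()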

Related

How do I specify a start date at a particular time of day with yfinance

I am trying to get stock data starting at a specific time of day, passing in the time from which I want the data to begin. So far I did something like the code below, but I keep getting an error saying: data - F: Data doesn't exist for startDate = 1666286770, endDate = 1666287074
This is my code:
import datetime as dt
import pytz
import pandas as pd
import yfinance as yf

# ticks is the list of ticker symbols, defined elsewhere in the script

def watchlist():
    timezone = pytz.timezone('US/Eastern')
    print(type(timezone))
    aware = dt.datetime.now(timezone).time()
    print(aware)
    global pastTime
    pastTime = dt.datetime.now(timezone) - dt.timedelta(minutes=5)  # time of 5 minutes ago
    print(pastTime)
    for x in ticks:
        toStr = str(x)
        syb = yf.Ticker(toStr)
        data = pd.DataFrame(syb.history(interval="1m", period='1d'))
        data2 = pd.DataFrame(syb.history(interval="1m", period='1d', start=pastTime))
        if data['Open'].sum() < data2['Open'].sum():
            print(data['Open'].sum())
            print(data2['Open'].sum())
            print('Watch stock')
        else:
            print(toStr, 'Proceed to sell with robinhood')

watchlist()
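For reference, a minimal sketch of requesting 1-minute bars over an explicit start/end window (not necessarily a fix for the error above, since whether data exists still depends on market hours and how far back Yahoo serves 1-minute data; the ticker here is just an example):

import datetime as dt
import pytz
import yfinance as yf

tz_ny = pytz.timezone('US/Eastern')
end = dt.datetime.now(tz_ny)
start = end - dt.timedelta(minutes=30)  # a wider window than 5 minutes

# Ask for an explicit start/end window instead of combining period with start.
bars = yf.Ticker("AAPL").history(interval="1m", start=start, end=end)
print(bars.tail())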

How to speed up downloading data from the Quandl/SHARADAR API

I've built a small download manager to get data for the SHARADAR tables in Quandl. GIT
This is functioning well, but the downloads are very slow for the larger files (up to 2 GB over 10 years).
I attempted to use asyncio, but this didn't speed up the downloads. This may be because Quandl doesn't allow concurrent downloads. Am I making an error in my code, or is this a restriction from Quandl that I will have to live with?
import asyncio
import math
import time
import pandas as pd
import quandl
import update

def segment_dates(table, date_start, date_end):
    # Determine the number of days per asyncio loop. Determined by the max size of the
    # range of data divided by the size of the files in 100 MB chunks.
    # Reduce this number for smaller, more frequent downloads.
    total_days = 40
    # Number of days per download should be:
    sizer = math.ceil(total_days / update.sharadar_tables[table][2])
    # Number of days between start and end.
    date_diff = date_end - date_start
    loop_count = int(math.ceil(date_diff.days / sizer))
    sd = date_start
    sync_li = []
    for _ in range(loop_count):
        ed = sd + pd.Timedelta(days=sizer)
        if ed > date_end:
            ed = date_end
        sync_li.append((sd, ed,))
        sd = ed + pd.Timedelta(days=1)
    return sync_li

async def get_data(table, kwarg):
    """
    Using the table name and kwargs, retrieves the most current data.
    :param table: Name of table to update.
    :param kwarg: Dictionary containing the parameters to send to Quandl.
    :return dataframe: Pandas dataframe containing the latest data for the table.
    """
    return quandl.get_table("SHARADAR/" + table.upper(), paginate=True, **kwarg)

async def main():
    table = "SF1"
    # Name of the column that has the date field for this particular table.
    date_col = update.sharadar_tables[table][0]
    date_start = pd.to_datetime("2020-03-15")
    date_end = pd.to_datetime("2020-04-01")
    apikey = "API Key"
    quandl.ApiConfig.api_key = apikey
    # Get a list containing the start and end times for the loops.
    times = segment_dates(table, date_start, date_end)
    wait_li = []
    for t in times:
        kwarg = {date_col: {"gte": t[0].strftime("%Y-%m-%d"), "lte": t[1].strftime("%Y-%m-%d")}}
        wait_li.append(loop.create_task(get_data(table, kwarg)))
    await asyncio.wait(wait_li)
    return wait_li

if __name__ == "__main__":
    starter = time.time()
    try:
        loop = asyncio.get_event_loop()
        res = loop.run_until_complete(main())
        for r in res:
            df = r.result()
            print(df.shape)
            print(df.head())
    except:
        raise ValueError("error")
    finally:
        # loop.close()
        print("Finished in {}".format(time.time() - starter))

Multiple API calls with a different variable in the URL

I am learning Python and have a question regarding for and if loops. This is my scenario:
I have an endpoint that I call with requests.get
I need to retrieve all the historical data
I have a start_date (2017-06-17)
I need to make multiple API calls because the API has a limit of a 60-day period, so I wrote my code like this:
from datetime import datetime, timedelta

date = datetime.strptime("2017-06-17", "%Y-%m-%d")  # start date
current_date = date.date()  # timedelta needs a date object, so I make it a date object
days_after = (current_date + timedelta(days=60)).isoformat()  # set to 60 days because of the API limit
date_string = current_date.strftime('%Y-%m-%d')  # back to a string, since the API needs a string, not a date object
This is how I build the dates for a 60-day period: starting from 2017-06-17 and going 60 days ahead.
This is how I make the API request:
response = requests.get("https://reporting-api/campaign?token=xxxxxxxxxx&format=json&fromDate=" + date_string + "&toDate=" + days_after)
response_data = response.json()  # added because I am temporarily writing to a JSON file
This is how i write to JSON file:
if response_data:
print("WE GOT DATA") # Debugging
data = response.json() # This is duplicate?
with open('data.json', 'w') as f: # Open my data.json file as write
json.dump(data, f) # dumps my json-data from API to the file
else:
print("NO DATA") # Debugging if no data / response. Should make a skip statement here
So my question is how can i proceed with my code so that every time i make a API-call starting from 2017-06-17 the date date_string and days_after should go 60 days forward for each API-call and append those data to data.json. I would maybe need some for loops or something?
Please note i have been using Python for 3 days now, be gentle.
Thanks!
You could use a while loop that shifts the start and end dates until a specified condition is met, and append the response to a file on every run. In the example below I used today's date as the stopping condition:
import os
import json
import requests
from datetime import datetime, timedelta

x = 0
y = 60
date = datetime.strptime("2017-06-17", "%Y-%m-%d")
current_date = date.date()
date_start = current_date + timedelta(days=x)

while date_start < datetime.now().date():
    date_start = current_date + timedelta(days=x)
    days_after = current_date + timedelta(days=y)
    x = x + 60
    y = y + 60
    response = requests.get("https://reporting-api/campaign?token=xxxxxxxxxx&format=json&fromDate=" + date_start.isoformat() + "&toDate=" + days_after.isoformat())
    response_data = response.json()
    if response_data:
        print("WE GOT DATA")
        data = response.json()
        # create the file if it does not exist, or append new data to it
        if os.path.exists('data.json'):
            append_write = 'a'  # append if the file already exists
        else:
            append_write = 'w'  # make a new file if not
        with open('data.json', append_write) as f:
            json.dump(data, f)
    else:
        print("NO DATA")
Basically, on every iteration the start and end dates are advanced by 60 days and the response is appended to the data.json file.
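As an aside, appending several json.dump outputs to one file gives a file that is no longer a single valid JSON document. A small sketch of one way around this, if that matters for your use case, is to write one JSON object per line (JSON Lines); the helper names here are just illustrative:

import json

def append_json_line(record, path='data.json'):
    # Write one JSON object per line so the file stays easy to read back.
    with open(path, 'a') as f:
        f.write(json.dumps(record) + '\n')

def read_json_lines(path='data.json'):
    # Read the file back as a list of records, one per non-empty line.
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]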

IndexError: single positional indexer is out-of-bounds error while downloading data

I am running code to download data and save it to a local drive, but I am getting the above-mentioned error. Note that I first convert the date to a different format, and the error occurs while saving the files.
Can you please help me with this error?
import quandl
import os
import pandas as pd
import datetime as dt
import glob

if __name__ == "__main__":
    # Creating a bucket to store the missing-data file.
    data_missing = []
    New_date = []

    # Defining a path to save CSV files after downloading, and deleting all existing CSV files in one go.
    extension = 'csv'
    path = "F:/Tradepoint/MyMkt/"
    if not os.path.exists(path):
        os.mkdir(path)
    os.chdir(path)
    csv_count = [forma for forma in glob.glob('*.{}'.format(extension))]
    for csv_coun in range(len(csv_count)):
        os.remove(r"F:/Tradepoint/MyMkt/" + csv_count[csv_coun][0:])

    # Setting up the quandl configuration, reading the ticker list, and setting the date range to download.
    quandl.ApiConfig.api_key = 'Hba3CzgNnEa2LMxR14FA'
    end_date = dt.date.today()
    diff_year = dt.timedelta(days=3650)
    start_date = end_date - diff_year
    stock_list = pd.read_csv(r"F:\Abhay_New\Abhay\Python\Project\SHARADAR_SF1.csv")

    # Looping through the quandl website to download data and renaming the files as required.
    for stock_lis in range(len(stock_list)):
        data = quandl.get_table('SHARADAR/SEP', date={'gte': start_date, 'lte': end_date}, ticker=stock_list.iloc[stock_lis])
        sort_by_date = data.sort_values('date')
        for sort_by_dat in range(len(sort_by_date['date'])):
            Date = dt.date.strftime(sort_by_date['date'][sort_by_dat], '%d-%m-%Y')
            New_date.append(Date)
        if len(data) > 1:
            Date = pd.Series(New_date).rename('Date').astype(str)
            OPEN = sort_by_date['open']
            HIGH = sort_by_date['high']
            LOW = sort_by_date['low']
            CLOSE = sort_by_date['close']
            VOLUME = sort_by_date['volume']
            final_data = pd.concat([Date, OPEN, HIGH, LOW, CLOSE, VOLUME], axis=1)
            stk = stock_list.iloc[sort_by_dat][0]
            final_data.to_csv(str(path + stk + '.csv'), sep=',', index=False, header=False)
        else:
            data_missing.append(stock_list.iloc[sort_by_dat])
            print(data_missing)
Thanks,
Abhay Dodiya
The counter of the inner for loop is reused after that loop has ended: stock_list.iloc[sort_by_dat] is indexed with the inner counter where the outer counter stock_lis was probably intended. This causes potentially unintended behavior:

for i in range(2):
    for i in range(3, 11):
        pass
    print(i)

gives

10
10

So even after exiting the inner loop, the last value of its counter is still there. Use the correct counting variable in those lines and your issue should be gone.
In your case you probably have more dates than stocks, so the out-of-bounds index triggers the error message you see.
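For illustration, a self-contained toy version of the fix (the lists here are hypothetical stand-ins for the real stock list and dates):

stock_list = ['AAPL', 'MSFT']                             # 2 stocks
dates = ['2024-01-0{}'.format(d) for d in range(1, 8)]    # 7 dates, i.e. more dates than stocks

for stock_idx in range(len(stock_list)):
    for date_idx in range(len(dates)):
        pass  # per-date work would go here
    # Use the outer counter here, not date_idx (which is now len(dates) - 1 = 6
    # and would be out of bounds for stock_list).
    print(stock_list[stock_idx])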

Python Pandas: appending dataframes built in a multiprocessing pool for loop to an existing dataframe

I have a dataframe called df3 with 5 columns,
and I am parsing tables from bittrex.com into a dataframe called df2 using a multiprocessing pool.
I reduced the number of pairs to 2 just to simplify my code as a test.
Here is my code:
import pandas as pd
import json
import urllib.request
import os
from urllib import parse
import csv
import datetime
from multiprocessing import Process, Pool
import time

df3 = pd.DataFrame(columns=['tickers', 'RSIS', 'CCIS', 'ICH', 'SMAS'])
tickers = ["BTC-1ST", "BTC-ADA"]

def http_get(url):
    result = {"url": url, "data": urllib.request.urlopen(url, timeout=60).read()}
    return result

urls = ["https://bittrex.com/Api/v2.0/pub/market/GetTicks?marketName=" + ticker + "&tickInterval=thirtyMin" for ticker in tickers]

pool = Pool(processes=200)
results = pool.map(http_get, urls)

for result in results:
    j = json.loads(result['data'].decode())
    df2 = pd.DataFrame(data=j['result'])
    df2.rename(columns={'BV': 'BaseVolume', 'C': 'Close', 'H': 'High', 'L': 'Low', 'O': 'Open', 'T': 'TimeStamp',
                        'V': 'Volume'}, inplace=True)

    # Tenkan-sen (Conversion Line): (9-period high + 9-period low)/2
    nine_period_high = df2['High'].rolling(window=50).max()
    nine_period_low = df2['Low'].rolling(window=50).min()
    df2['tenkan_sen'] = (nine_period_high + nine_period_low) / 2

    # Kijun-sen (Base Line): (26-period high + 26-period low)/2
    period26_high = df2['High'].rolling(window=250).max()
    period26_low = df2['Low'].rolling(window=250).min()
    df2['kijun_sen'] = (period26_high + period26_low) / 2

    TEN30L = df2.loc[df2.index[-1], 'tenkan_sen']
    TEN30LL = df2.loc[df2.index[-2], 'tenkan_sen']
    KIJ30L = df2.loc[df2.index[-1], 'kijun_sen']
    KIJ30LL = df2.loc[df2.index[-2], 'kijun_sen']

    if (TEN30LL < KIJ30LL) and (TEN30L > KIJ30L):
        df3.at[ticker, 'ICH'] = 'BUY'
    elif (TEN30LL > KIJ30LL) and (TEN30L < KIJ30L):
        df3.at[ticker, 'ICH'] = 'SELL'
    else:
        df3.at[ticker, 'ICH'] = 'NO'

pool.close()
pool.join()
print(df2)
My question is that I always get the error NameError: name 'ticker' is not defined, which is driving me mad.
Why do I get this error even though I used ticker as the loop variable in the line urls = ["https://bittrex.com/Api/v2.0/pub/market/GetTicks?marketName=" + ticker + "&tickInterval=thirtyMin" for ticker in tickers],
where Python already used it successfully?
I have googled for three days and tried several solutions without result.
Any ideas, please?
I don't think you are looking at the correct line; when I run your code, I get:
NameError Traceback (most recent call last)
<ipython-input-1-fd766f4a9b8e> in <module>()
49 df3.at[ticker, 'ICH'] = 'SELL'
50 else:
---> 51 df3.at[ticker, 'ICH'] = 'NO'
52
53 pool.close()
So the error is at line 51, not the line where you create the urls list. This makes sense, because ticker is not defined outside of the list comprehension on that line. The problem has nothing to do with your use of multiprocessing or pandas; it is due to Python scoping rules: the temporary variable of a list comprehension is not usable outside of it. It is hard to imagine how it could be, since it has iterated through several values, unless you only care about the last value it had, which is not what you want here.
You'll probably have to keep track of the ticker throughout the fetching process, so you can relate the results to the right ticker in the end, something like:
def http_get(ticker):
    url = "https://bittrex.com/Api/v2.0/pub/market/GetTicks?marketName=" + ticker + "&tickInterval=thirtyMin"
    result = {"url": url, "data": urllib.request.urlopen(url, timeout=60).read(), "ticker": ticker}
    return result

pool = Pool(processes=200)
results = pool.map(http_get, tickers)

for result in results:
    j = json.loads(result['data'].decode())
    df2 = pd.DataFrame(data=j['result'])
    ticker = result['ticker']
    ...
