How to detect failed downloads using yfinance - python

I am using the API yfinance: https://github.com/ranaroussi/yfinance
With the simple code below:
import yfinance as yf

data = yf.download("A AA AAA Z LOL KE QP")
I got the following output:
[*********************100%***********************] 7 of 7 completed
2 Failed downloads:
- LOL: 1d data not available for startTime=-2208988800 and endTime=1621954979. Only 100 years worth of day granularity data are allowed to be fetched per request.
- QP: 1d data not available for startTime=-2208988800 and endTime=1621954979. Only 100 years worth of day granularity data are allowed to be fetched per request.
I would like to know how can I detect in my code that "LOL" and "QP" failed?

This is the code in the yfinance package where the 'error' is reported. It is not an actual exception, so you might want to override the download function, which is quite large.
if shared._ERRORS:
    print('\n%.f Failed download%s:' % (
        len(shared._ERRORS), 's' if len(shared._ERRORS) > 1 else ''))
    # print(shared._ERRORS)
    print("\n".join(['- %s: %s' %
                     v for v in list(shared._ERRORS.items())]))
Edit
I found a way to get the failed downloads:
simply import the shared.py file and read the _ERRORS dict.
This dict stores the errors from the last call to download. It is reset before each download, so it can be read right after the call.
Simply use the following code:
import yfinance as yf
import yfinance.shared as shared

data = yf.download("A AA AAA Z LOL KE QP")
print(list(shared._ERRORS.keys()))

After playing a little more with the data output, I found a non-elegant way of checking for failed tickers, for example for "LOL":
import pandas as pd
all(pd.isna(v) for v in dict(data.Close["LOL"]).values())
It checks whether all closing prices for that ticker are NaN.
This works, but it is probably not optimal; there might be a better and simpler way of doing it. Let's hope someone finds it :)
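For what it's worth, a more compact variant of the same NaN check (a sketch, assuming the usual column layout yf.download returns for multiple tickers, with price fields on the first column level and tickers on the second, as the data.Close["LOL"] access above suggests):

import yfinance as yf

data = yf.download("A AA AAA Z LOL KE QP")
# Tickers whose Close column is entirely NaN are treated as failed downloads.
failed = data["Close"].columns[data["Close"].isna().all()].tolist()
print(failed)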

Related

Python get all stock Tickers

This question has been asked to death, but none of the answers provide an actual workable solution. I had previously found one in get-all-tickers:
pip install get-all-tickers
Recently, for whatever reason, the package get-all-tickers has stopped working:
from get_all_tickers import get_tickers as gt
list_of_tickers = gt.get_tickers()
gives me the error:
ParserError: Error tokenizing data. C error: Expected 1 fields in line 23, saw 46
As this is the only package I found that actually gave a complete ticker list (a good check is "NKLA" which is missing from 100% of all other "solutions" I've found on stackoverflow or elsewhere), I now either need a new way to get up-to-date ticker lists, or a fix to this...
Any ideas?
Another solution would be to load this data as CSV.
Get the CSV from:
https://plextock.com/us-symbols?utm_source=so
See this answer first: https://quant.stackexchange.com/a/1862/38968
NASDAQ makes this information available via FTP and they update it
every night. Log into ftp.nasdaqtrader.com anonymously. Look in the
directory SymbolDirectory. You'll notice two files: nasdaqlisted.txt
and otherlisted.txt. These two files will give you the entire list of
tradeable symbols, where they are listed, their name/description, and
an indicator as to whether they are an ETF.
Given this list, which you can pull each night, you can then query
Yahoo to obtain the necessary data to calculate your statistics.
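A rough sketch of that approach in Python (the host, directory, and file names are taken from the quote above; the pipe-delimited layout and the "Symbol" column name are assumptions worth verifying against the actual file):

import io
from ftplib import FTP

import pandas as pd

# Anonymous FTP login to NASDAQ's symbol directory.
ftp = FTP("ftp.nasdaqtrader.com")
ftp.login()
ftp.cwd("SymbolDirectory")

buf = io.BytesIO()
ftp.retrbinary("RETR nasdaqlisted.txt", buf.write)
ftp.quit()

# The file appears to be pipe-delimited with a "File Creation Time" footer row.
nasdaq = pd.read_csv(io.BytesIO(buf.getvalue()), sep="|")
tickers = nasdaq["Symbol"].iloc[:-1].tolist()  # drop the footer row
print(len(tickers), tickers[:10])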
Also: the New York Stock Exchange provides a search function:
https://www.nyse.com/listings_directory/stock
.. and this page seems to have a lot as well - it has Nikola/NKLA at least ;)
https://www.advfn.com/nasdaq/nasdaq.asp?companies=N
This pip package was recently broken; someone has already raised an issue on the project's GitHub (https://github.com/shilewenuw/get_all_tickers/issues/12).
It was caused by a recent update to the NASDAQ API.
Not perfect, but Kaggle has some:
https://www.kaggle.com/datasets/jacksoncrow/stock-market-dataset?resource=download
You could use the free Alpha Vantage API: https://www.alphavantage.co/documentation/
For example:
import requests
key = '2DHC1EFVR3EOQ33Z' # free key from https://www.alphavantage.co/support/#api-key -- no registration required
result = requests.get('https://www.alphavantage.co/query?function=GLOBAL_QUOTE&symbol=NKLA&apikey='+ key).json()
print(f'The price for NKLA right now is ${result["Global Quote"]["05. price"]}.')

Getting quantity of issues through github.api

My task is to get the number of open issues using the GitHub API. Unfortunately, for any repository I parse, I get the same number: 30.
import requests

r = requests.get('https://api.github.com/repos/grpc/grpc/issues')
count = 0
for item in r.json():
    if item['state'] == 'open':
        count += 1
print(count)
Is there any way to get the real number of issues?
See the documentation about the Link response header; you can also pass state or filter parameters.
https://developer.github.com/v3/guides/traversing-with-pagination/
https://developer.github.com/v3/issues/
You'll have to page through.
http://.../issues?page=1&state=open
http://.../issues?page=2&state=open
The /issues endpoint is paginated, which means you will have to iterate through several pages to get all the issues.
https://api.github.com/repos/grpc/grpc/issues?page=1
https://api.github.com/repos/grpc/grpc/issues?page=2
...
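A minimal sketch of that paging loop (note that this endpoint also returns pull requests, which carry a 'pull_request' key, so they are filtered out here):

import requests

count = 0
page = 1
while True:
    r = requests.get(
        'https://api.github.com/repos/grpc/grpc/issues',
        params={'state': 'open', 'per_page': 100, 'page': page},
    )
    items = r.json()
    if not items:
        break
    # Skip pull requests, which the issues endpoint also returns.
    count += sum(1 for item in items if 'pull_request' not in item)
    page += 1

print(count)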
But there is a better way to get what you want: the GET /repos/:owner/:repo endpoint directly gives the number of open issues on a repository.
For instance, on https://api.github.com/repos/grpc/grpc, you can see:
"open_issues_count": 1052,
Have a look at the documentation for this endpoint.
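In code, that looks roughly like this (keep in mind that open_issues_count also counts open pull requests):

import requests

repo = requests.get('https://api.github.com/repos/grpc/grpc').json()
print(repo['open_issues_count'])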

Python - How to parse smartctl program output?

I am writing a wrapper for smartctl in python 2.7.3...
I am having a hell of a time trying to wrap my head around how to parse the output from the smartctl program in Linux (Ubuntu x64 to be specific)
I am running smartctl -l selftest /dev/sdx via subprocess and grabbing the output into a variable
This variable is broken up into a list, then I drop the useless header data and blank lines from the output.
Now, I am left with a list of strings, which is great!
The data is sort-of tabular, and I want to parse it into a dict() full of lists (I think this is the correct way to represent tabular data in Python from reading the docs)
Here's a sample of the data:
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 44796 -
# 2 Short offline Completed without error 00% 44796 -
# 3 Short offline Completed without error 00% 44796 -
# 4 Short offline Completed without error 00% 44796 -
# 5 Short offline Completed without error 00% 44796 -
# 6 Extended offline Completed without error 00% 44771 -
# 7 Short offline Completed without error 00% 44771 -
# 8 Short offline Completed without error 00% 44741 -
# 9 Short offline Completed without error 00% 1 -
#10 Short offline Self-test routine in progress 70% 44813 -
I can see some issues with trying to parse this, and am open to solutions, but I may also just be doing this all wrong ;-):
The status text "Self-test routine in progress" flows past the first character of the "Remaining" column header.
In the Num column, the numbers after 9 are not separated from the # character by a space.
I might be way off-base here, but this is my first time trying to parse something this eccentric.
Thanks in advance to everyone who even bothers to read this wall of text!!!
Here's my code so far, if anyone feels it necessary or finds it useful:
#testStatus.py
#This module provides an interface for retrieving
#test status and results for ongoing and completed
#drive tests

import subprocess

#this function takes a list of strings and removes
#strings which do not have pertinent information
def cleanOutput(data):
    cleanedOutput = []
    del data[0:3]  #This deletes the first three items (header lines) from the list
    for item in data:
        if item == '':  #This removes blank items from the remaining list
            pass
        else:
            cleanedOutput.append(item)
    return cleanedOutput

def resultsOutput(data):
    headerLines = []
    resultsLines = []
    resultsTable = {}
    for line in data:
        if "START OF READ" in line or "log structure revision" in line:
            headerLines.append(line)
        else:
            resultsLines.append(line)
    nameLine = resultsLines[0].split()
    print nameLine

def getStatus(sdxPath):
    try:
        output = subprocess.check_output(["smartctl", "-l", "selftest", sdxPath])
    except subprocess.CalledProcessError:
        print ("smartctl command failed...")
        return
    except Exception as e:
        print (e)
        return
    splitOutput = output.split('\n')
    cleanedOutput = cleanOutput(splitOutput)
    resultsOutput(cleanedOutput)

#For Testing
getStatus("/dev/sdb")
For what it's worth (this is an old question): since version 7.0, smartctl has a --json flag, so you can parse the output as normal JSON.
release notes
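A rough sketch of that approach (the exact JSON key names below are an assumption; check the output of smartctl --json on your own drive):

import json
import subprocess

# Requires smartctl >= 7.0 for the --json flag.
output = subprocess.check_output(["smartctl", "--json", "-l", "selftest", "/dev/sdb"])
report = json.loads(output)

# Key names are assumed from typical smartctl JSON output for an ATA drive.
table = report.get("ata_smart_self_test_log", {}).get("standard", {}).get("table", [])
for entry in table:
    print("%s | %s | %s hours" % (entry["type"]["string"],
                                  entry["status"]["string"],
                                  entry["lifetime_hours"]))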
The main parsing problem seems to be the first three columns; the remaining data is more straightforward. Assuming the output uses blanks between fields (instead of tab characters, which would be much easier to parse), I'd go for fixed-length parsing, something like:
num = line[1:2]
desc = line[5:25]
status = line[25:54]
remain = line[54:58]
lifetime = line[60:68]
lba = line[77:99]
The header line would be handled differently. What structure you put the data into depends on what you want to do with it. A dictionary keyed by "num" might be appropriate if you mainly wanted to randomly access data by that "num" identifier. Otherwise a list might be better. Each entry (per line) could be a tuple, a list, a dictionary, a class instance, or maybe other things. If you want to access fields by name, then a dictionary or class instance per entry might be appropriate.
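As an illustration of that fixed-width idea, here is a hedged sketch that builds a dictionary keyed by the test number; the slice boundaries are illustrative and should be checked against real smartctl output:

#Sketch: parse the already-cleaned self-test lines into a dict keyed by test number.
def parse_selftest_table(lines):
    results = {}
    for line in lines:
        if not line.startswith('#'):
            continue
        num = int(line[1:3])  # handles "#10" and up, where there is no space after '#'
        results[num] = {
            'description': line[5:25].strip(),
            'status': line[25:54].strip(),
            'remaining': line[54:58].strip(),
            'lifetime_hours': line[60:68].strip(),
            'lba_of_first_error': line[77:].strip(),
        }
    return results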

quickfix: how to get Symbol (flag 55) from messages?

I'm running QuickFIX with the Python API and connecting to a TT FIX Adapter using FIX 4.2.
I am logging on and sending a market data request for two instruments. That works fine and data from the instruments comes in as expected. I can get all kinds of information from the messages.
However, I am having trouble getting the Symbol (flag 55) field.
import quickfix as fix

def fromApp(self, message, sessionID):
    ID = fix.Symbol()
    message.getField(ID)
    print ID
This works for the very first message [the initial Market Data Snapshot (flag 35 = W)] that comes to me. Once I start getting incremental refreshes (flag 35 = X), I can no longer get the Symbol field. Every message that arrives results in a Field Not Found error.
This is confusing me because in the logs, the Symbol field is always present, whether the message type is W or X.
Thinking the Symbol might be in the header of refresh messages, I tried getField(ID) when 35 = W and getHeader().getField(ID) when 35 = X; however, this did not work.
Can somebody help me figure out what is going on here? I would like to be able to explicitly tell my computer what instruments it is looking at.
Thanks
Your question is pretty simple, but you've mixed in some misconceptions as well.
1) Symbol will never be in the header. It is a body field.
2) In X messages, the symbol is in a repeating group. You first have to get a group object with msg.GetGroup(), then get the symbol from that. See this example code, from the repeating groups doc page.
3) In W messages, the symbol is not in a group. That's why it works for you there.
It seems clear you are pretty new to QuickFIX and FIX in general. I think you should take a few minutes and skim through the "Working with Messages" section of the docs.
Also, the FIXimate website can be your best friend.
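For reference, a hedged sketch of what that looks like with the Python bindings (the group and field class names are assumed from the standard FIX 4.2 data dictionary; verify them against the messages TT actually sends):

import quickfix as fix
import quickfix42 as fix42

def fromApp(self, message, sessionID):
    msgType = fix.MsgType()
    message.getHeader().getField(msgType)

    if msgType.getValue() == "W":
        # Snapshot: Symbol (55) is a top-level body field.
        symbol = fix.Symbol()
        message.getField(symbol)
        print symbol.getValue()
    elif msgType.getValue() == "X":
        # Incremental refresh: Symbol lives inside the NoMDEntries repeating group.
        numEntries = fix.NoMDEntries()
        message.getField(numEntries)
        group = fix42.MarketDataIncrementalRefresh.NoMDEntries()
        for i in range(1, numEntries.getValue() + 1):
            message.getGroup(i, group)
            symbol = fix.Symbol()
            if group.isSetField(symbol):
                group.getField(symbol)
                print symbol.getValue()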

elastic search performance using pyes

Sorry for cross-posting. The following question is also posted on Elasticsearch's Google group.
In short, I am trying to find out why I am not able to get optimal performance while doing searches on an ES index which contains about 1.5 million records.
Currently I am able to get about 500-1000 searches in 2 seconds. I would think that this should be orders of magnitude faster. Also, I am currently not using Thrift.
Here is how I am checking the performance.
Using pyes version 0.19.1 (tried both the stable and dev versions from GitHub)
Using requests version 0.13.8
import time

from pyes import ES
from pyes.query import TermQuery

conn = ES(['localhost:9201'], timeout=20, bulk_size=1000)

loop_start = time.clock()
q1 = TermQuery("tax_name", "cellvibrio")
for x in xrange(1000000):
    if x % 1000 == 0 and x > 0:
        loop_check_point = time.clock()
        print 'took %s secs to search %d records' % (loop_check_point - loop_start, x)
    results = conn.search(query=q1)
    if results:
        for r in results:
            pass
        # print len(results)
    else:
        pass
Appreciate any help that you can give to help me scaleup the searches.
Thanks!
Isn't it just a matter of concurrency?
You're doing all your queries in sequence, so a query has to finish before the next one can start. If you have a 1 ms RTT to the server, this alone will limit you to 1000 requests per second.
Try running a few instances of your script in parallel and see what kind of performance you get.
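A quick sketch of that experiment inside a single script, using threads instead of separate processes (assuming the same pyes imports as in the question):

import threading
import time

from pyes import ES
from pyes.query import TermQuery

def worker(n_queries):
    # One connection per thread, running the same term query as in the question.
    conn = ES(['localhost:9201'], timeout=20)
    q = TermQuery("tax_name", "cellvibrio")
    for _ in xrange(n_queries):
        conn.search(query=q)

start = time.time()
threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print 'took %.2f secs for %d searches' % (time.time() - start, 4 * 1000)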
There are several ways to improve this when using pyes.
First of all, try to get rid of the DottedDict class/object, which is used to turn every json/dict into an object for every result you get.
Second, switch the JSON encoder to ujson.
These two changes improved performance a lot.
The disadvantage is that you have to use plain dict access instead of the dotted version (instead of "result.facets.attribute.term" you have to use something like "result.facets['attribute']['term']" or "result.facets.get('attribute', {}).get('term', None)").
I did this by extending the ES class and replacing the "_send_request" function.
