I am at an absolutely basic level in Python 3 (everything I currently know is from TheMonkeyLords), and my main goal is to integrate Python 3 with XBRLware so that I can extract financial information from the SEC EDGAR database accurately and reliably.
How can I use the xbrlware framework with Python 3? I have absolutely no idea how to use a framework with Python 3.
Any suggestions on what I should learn, code I could study, or other clues would be a great help!
Thank you
Don't do it. Based on personal experience, it is very difficult to extract useful financial data from XBRL. XBRLWare does work, but there is a lot of work to do afterwards to extract the data into something useful.
XBRL has over 100 definitions of "revenue". Each industry reports differently. Each company makes 100s of filings and you have to string together data from different reports. It's an incredibly frustrating process.
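To make the "many definitions of revenue" problem concrete, here is a minimal sketch using only the standard library. The instance document, its namespace URI, and the figures are made up for illustration, and `find_facts` is a hypothetical helper, not part of XBRLWare. It simply lists every fact whose concept name contains a keyword, which is roughly where the manual sorting starts.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed XBRL instance; real filings are far larger
# and the namespace URI and values below are illustrative only.
SAMPLE_INSTANCE = """<xbrl xmlns="http://www.xbrl.org/2003/instance"
      xmlns:us-gaap="http://fasb.org/us-gaap/2012-01-31">
  <us-gaap:Revenues contextRef="Q1" unitRef="USD">1200</us-gaap:Revenues>
  <us-gaap:SalesRevenueNet contextRef="Q1" unitRef="USD">1150</us-gaap:SalesRevenueNet>
  <us-gaap:NetIncomeLoss contextRef="Q1" unitRef="USD">90</us-gaap:NetIncomeLoss>
</xbrl>"""


def find_facts(instance_xml, keyword):
    """Return (concept, contextRef, value) for facts whose name contains keyword."""
    root = ET.fromstring(instance_xml)
    facts = []
    for element in root.iter():
        # ElementTree renders namespaced tags as "{namespace-uri}LocalName".
        local_name = element.tag.rsplit("}", 1)[-1]
        if keyword.lower() in local_name.lower() and element.get("contextRef"):
            facts.append((local_name, element.get("contextRef"), float(element.text)))
    return facts


revenue_facts = find_facts(SAMPLE_INSTANCE, "revenue")
for concept, context, value in revenue_facts:
    print(concept, context, value)
```

Even in this toy document two different concepts match "revenue", and deciding which one a given company treats as its top line is the part that no parser does for you.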
I have used XBRLWare as a Ruby Gem on Windows. (It is no longer supported.) It does "work". It downloads and formats the reports nicely, but it operates as a viewer. Most filings contain two quarters of data. (Probably not the quarters you want either.)
You can use the XBRL viewer on the SEC's website to accomplish the same thing. Or you can go to the company's 10-Qs.
Also, XBRL uses CIK codes for the companies. As far as I know, the SEC doesn't have a central database to match CIK codes to ticker symbols (if you can believe that!). So it can be frustrating to find the companies you want to download.
If you want to download all the XBRL filings, I've been told it's something like 6TB a month.
You can't pull historical financial data from a single XBRL filing either. You have to string two quarters at a time together: pull every IBM filing, for example, and string together all the 10-Qs. XBRL is only three years old for the large accelerated filers, so historical data is limited.
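A rough sketch of what "stringing quarters together" might look like once the facts are already extracted. The period labels, the figures, and the `stitch_quarters` helper are all hypothetical; the rule that a later filing overrides an earlier one for the same period is an assumption, not something XBRL prescribes.

```python
def stitch_quarters(filings):
    """Merge per-filing {period: value} dicts into one series ordered by period.

    `filings` must be ordered oldest to newest. When a period appears more
    than once, the later filing wins (assumed to carry any restated figure).
    """
    series = {}
    for filing in filings:
        series.update(filing)
    # "YYYY-Qn" labels sort correctly as plain strings.
    return dict(sorted(series.items()))


# Made-up revenue figures; each dict stands for the two quarters in one 10-Q.
filings = [
    {"2011-Q1": 100.0, "2010-Q1": 90.0},
    {"2011-Q2": 110.0, "2010-Q2": 95.0},
    {"2012-Q1": 120.0, "2011-Q1": 101.0},  # restates 2011-Q1
]
history = stitch_quarters(filings)
for period, value in history.items():
    print(period, value)
```

The merge itself is trivial; the hard part in practice is getting every filing to report the same concept under the same name so the dictionaries line up at all.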
There is a reason why Wall Street still charges $25k/year for financial data. XBRL is very difficult to use and difficult to extract data from.
You could try: XBRLcloud.com or findynamics.com
Relative zipline newbie here (FYI, I have been learning mainly from the book Trading Evolved, plus trial and error). I have a few data bundles that pull historical daily price data from my MySQL database, and they work fine: I can run algos on them that transact orders (of any type) as expected.
However, if I fabricate a symbol and create a bunch of price history for it, the algos will analyze the data no problem but not fulfill my orders for it. E.g. I could invent the symbol ZYXW and give it all the data associated with my real stock data, but no order happens. I can submit the order but it will sit there forever until the algo ends. At first I wondered if this was to do with the "data.can_trade" check I was doing, but it is the same without that call.
Any clues? I am probably misunderstanding some basic principle. Please enlighten me!
I have a company's annual report (in .pdf format) and I want to extract the balance sheet and other related statements from it using Python. I tried the PyPDF2 library, but it extracts highly unstructured text. Is there any way to do this?
You should use textract
https://github.com/deanmalmgren/textract
It supports various file types for text extraction.
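A minimal sketch of how that might look. `textract.process` takes a file path and returns bytes; everything else here (`extract_text`, `lines_mentioning`, the file name, the sample text) is a hypothetical helper or stand-in for illustration. The extractor is passed in as a parameter so the sketch can be tried without textract or a real PDF.

```python
def extract_text(path, extractor=None):
    """Return the text of a document as a str.

    `extractor` is any callable taking a path and returning bytes. It
    defaults to textract.process, imported lazily so the rest of the
    sketch runs even when textract is not installed.
    """
    if extractor is None:
        import textract  # third-party: pip install textract
        extractor = textract.process
    return extractor(path).decode("utf-8", errors="replace")


def lines_mentioning(text, keywords):
    """Return the stripped lines containing any keyword (case-insensitive)."""
    lowered = [keyword.lower() for keyword in keywords]
    return [line.strip() for line in text.splitlines()
            if any(keyword in line.lower() for keyword in lowered)]


def fake_extractor(path):
    """Stand-in for textract.process; returns made-up report text."""
    return b"Chairman's statement\nTotal assets 1,250\nTotal liabilities 730\n"


text = extract_text("annual_report.pdf", extractor=fake_extractor)
balance_lines = lines_mentioning(text, ["total assets", "total liabilities"])
print(balance_lines)
```

With a real report you would call `extract_text("annual_report.pdf")` and let it use textract. Keyword matching like this only finds candidate lines; it does not recover the table structure.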
Your question is not very clear, but I understand it because I’ve done a lot of work on extracting data from UK annual reports. To explain to others: what you’re asking for sounds straightforward, but in reality it’s a nightmare. Annual reports come in PDF format, and none of the firms producing them follow any standard, which makes it difficult to analyse these reports even manually. PDFs lose structure when you convert them to text. I have a Java tool that reads and detects the structure of UK PDF annual reports (similar to the one you provided in the link). It took me 5 years to come up with a solution that can process up to 95% of all UK annual reports despite the huge differences between them. Have a look at https://github.com/drelhaj/CFIE-FRSE (there are links there to papers on how we did it).
I am not a software engineer, and my knowledge of Python is focused on using it for data wrangling and machine learning modeling. However, I need to learn how to get and post data to webpages and do web scraping as well. What are good online tutorials or courses that would teach me the necessary skills? I find it difficult to learn from reading the documentation.
The following series of courses is a very good start for learning Python:
https://www.class-central.com/mooc/4319/coursera-programming-for-everybody-getting-started-with-python
You might want to focus on the following course in the series:
https://www.class-central.com/mooc/4343/coursera-using-python-to-access-web-data
About the Course
This course will show how one can treat the Internet as a source of data. We will scrape, parse, and read web data as well as access data using web APIs. We will work with HTML, XML, and JSON data formats in Python. This course will cover Chapters 11-13 of the textbook “Python for Informatics”. To succeed in this course, you should be familiar with the material covered in Chapters 1-10 of the textbook and the first two courses in this specialization. These topics include variables and expressions, conditional execution (loops, branching, and try/except), functions, Python data structures (strings, lists, dictionaries, and tuples), and manipulating files. This course covers Python 2.
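To give a flavour of the topics listed in that description, here is a small standard-library sketch in Python 3 (the course itself uses Python 2). The HTML snippet, the URL, and the form fields are made up, and `collect_links` and `build_form_post` are hypothetical helper names. The POST request is only built, not sent, so nothing here touches the network.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def collect_links(html):
    """Scraping: return all link targets found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


def build_form_post(url, fields):
    """Posting: build (but do not send) a URL-encoded form POST request."""
    body = urlencode(fields).encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(url, data=body, headers=headers)


page = ('<html><body><a href="/reports/2015.html">2015</a> '
        '<a href="/reports/2016.html">2016</a></body></html>')
links = collect_links(page)
request = build_form_post("https://example.com/search", {"q": "balance sheet"})
record = json.loads('{"ticker": "IBM", "year": 2016}')  # parsing a JSON reply
print(links, request.get_method(), record["ticker"])
```

To actually send the request you would pass it to `urllib.request.urlopen`. Many people prefer third-party libraries for this kind of work, but the standard library is enough to follow along.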
I've seen a number of ways to script code (for example, in Python: ystockquote) that returns the stock price (or the historical closing prices) of a particular stock. Is there a way of scripting the information needed to calculate various fundamental quantities: enterprise value, EBITDA (I know this is included in that Python link), free cash flow, etc.?
I'm asking whether there is a command-line tool that can be pinged to return this kind of information (or enough of the relevant information to do the calculation oneself)? Something like a repository for earnings statements/debt/cash flow/taxes.
Thanks in advance for any suggestions!
JBW,
You can do this in multiple steps but you will have to invest some time to use the API and identify the financial data from the companies.
1) Identify company's CIK
2) Download XBRL file from SEC FTP
3) Use the numbers provided to compute the metric you are looking for
3*) Some companies will already contain those values in the XML
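Step 3 can be sketched as below, assuming the inputs have already been pulled from the filing. These are the common textbook definitions; analysts adjust them in various ways, and the figures are made up. Note that market capitalisation does not come from the filing itself, so it has to be supplied from price data.

```python
def ebitda(operating_income, depreciation_and_amortization):
    """EBITDA approximated as operating income plus D&A."""
    return operating_income + depreciation_and_amortization


def free_cash_flow(operating_cash_flow, capital_expenditures):
    """Free cash flow as cash from operations minus capital expenditures."""
    return operating_cash_flow - capital_expenditures


def enterprise_value(market_cap, total_debt, cash_and_equivalents):
    """Enterprise value as market cap plus debt minus cash."""
    return market_cap + total_debt - cash_and_equivalents


# Made-up figures (in millions), standing in for values read from a filing.
company_ebitda = ebitda(operating_income=500, depreciation_and_amortization=120)
company_fcf = free_cash_flow(operating_cash_flow=450, capital_expenditures=150)
company_ev = enterprise_value(market_cap=10000, total_debt=2000,
                              cash_and_equivalents=800)
print(company_ebitda, company_fcf, company_ev)
```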
FTP Info : http://www.sec.gov/edgar/searchedgar/ftpusers.htm
XBRL Spec: http://www.sec.gov/info/edgar/nsarxml1_d.htm
I hope this helps.
Vladimir
I am trying to apply knowledge I learnt during statistics courses to real-world datasets.
I am looking for some real databases or tables. It would be helpful if a link to the page were included as well. Format is not a constraint: I use Python and can easily convert to SQLite.
One example would be a medium-sized table for identifying the country of a given IP address: http://ip-to-country.webhosting.info/node/view/6
Well, since your profile says you're from India, I thought some Indian Government statistics would help, so a quick google search yields this site:
http://mospi.nic.in/dwh/index.htm
Click on 'Tables', and you'll have a list of more data/tables than you could possibly need.
...these files all seem to be in Microsoft XLS format, but another quick google search yields a free converter: http://download.cnet.com/XLS-Converter/3000-2077_4-10401513.html
...or you could use a Python library, xlrd ( http://pypi.python.org/pypi/xlrd ), and read the files directly.
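A minimal sketch of the xlrd route, going straight into SQLite since that was mentioned in the question. `xlrd.open_workbook`, `sheet_by_index`, `nrows`, and `row_values` are real xlrd calls; `read_xls_rows` and `load_rows` are hypothetical helpers, the file name is a placeholder, and the rows below are made-up stand-in data so the example runs without a spreadsheet. It assumes the first row of the sheet is a header.

```python
import sqlite3


def read_xls_rows(path, sheet_index=0):
    """Read every row of one sheet from an .xls file as a list of lists."""
    import xlrd  # third-party: pip install xlrd
    sheet = xlrd.open_workbook(path).sheet_by_index(sheet_index)
    return [sheet.row_values(i) for i in range(sheet.nrows)]


def load_rows(conn, table, rows):
    """Create `table` from rows[0] (the header) and insert the remaining rows.

    Table and column names are interpolated into the SQL, so only use this
    with names you trust.
    """
    header, data = rows[0], rows[1:]
    columns = ", ".join('"{}"'.format(name) for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute('CREATE TABLE "{}" ({})'.format(table, columns))
    conn.executemany(
        'INSERT INTO "{}" VALUES ({})'.format(table, placeholders), data)
    conn.commit()


# Stand-in rows with made-up numbers; with a real file you would use
# rows = read_xls_rows("some_table.xls") instead.
rows = [["state", "population"], ["Kerala", 33.4], ["Goa", 1.5]]
conn = sqlite3.connect(":memory:")
load_rows(conn, "census", rows)
total = conn.execute('SELECT COUNT(*) FROM "census"').fetchone()[0]
print(total)
```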