I've seen a number of ways to script code (for example, in Python: ystockquote) that return the stock price (or historical closing prices) of a particular stock. Is there a way of scripting the information needed to calculate various fundamental quantities: enterprise value, EBITDA (I know this is included in that Python link), free cash flow, etc.?
I'm asking whether there is a command-line tool that can be queried to return this kind of information (or enough of the relevant information to do the calculation oneself). Something like a repository of earnings statements, debt, cash flow, and taxes.
Thanks in advance for any suggestions!
JBW,
You can do this in multiple steps, but you will have to invest some time to learn the API and identify the financial data you need from the filings.
1) Identify the company's CIK
2) Download the XBRL file from the SEC FTP
3) Use the numbers provided to compute the metric you are looking for
3*) Some filings will already contain those values in the XML
FTP Info : http://www.sec.gov/edgar/searchedgar/ftpusers.htm
XBRL Spec: http://www.sec.gov/info/edgar/nsarxml1_d.htm
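The steps above can be sketched in Python. This is a minimal sketch, not a full EDGAR client: the `browse-edgar` URL pattern below is the standard public one, but verify the parameters against the SEC's own documentation, and note that you still need to find the CIK yourself (step 1) before you can build the request.

```python
# Sketch of step 2: build the EDGAR company-filings URL from a known CIK.
# CIKs are often written zero-padded to 10 digits; the query parameter
# accepts the unpadded integer form, so we strip leading zeros here.

def edgar_company_url(cik: str, filing_type: str = "10-K") -> str:
    """Return the EDGAR browse URL listing a company's filings of one type."""
    cik_number = str(int(cik))  # "0000320193" -> "320193"
    return (
        "https://www.sec.gov/cgi-bin/browse-edgar"
        f"?action=getcompany&CIK={cik_number}&type={filing_type}&count=40"
    )

# Example: Apple's CIK is 320193.
print(edgar_company_url("0000320193"))
```

From the page that URL returns you can locate the filing's XBRL instance document, download it, and pull out the tagged facts for step 3.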
I hope this helps.
Vladimir
I'm doing a work-related project in which I should study whether we could extract certain fields of information (e.g. contract parties, start and end dates) from contracts automatically.
I am quite new to working with text data and am wondering whether those pieces of information could be extracted with ML by having the whole contract as input and the target information as output, without tagging or annotating the whole text.
I understand that the extraction would need to be run separately for each targeted field.
Thanks!
First question - how are the contracts stored? Are they PDFs or text-based?
If they're PDFs, there are a handful of packages that can extract text from a PDF (e.g. pdftotext).
Second question - is the data you're looking for in the same place in every document?
If so, you can extract the information you're looking for (like start and end dates) from a known location in the contract. If not, you'll have to do something more sophisticated. For example you may need to do a text search for "start date", if the same terminology is used in every contract. If different terminology is used from contract to contract, you may need to work to extract meaning from the text, which can be done using some sophisticated natural language processing (NLP).
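If the same terminology really is used in every contract, the text-search approach can be as simple as a regular expression. This is a minimal sketch under that assumption; the field labels and the ISO date format here are hypothetical, and real contracts (varied date formats, line breaks, OCR noise) will need more robust patterns or full NLP.

```python
import re

# Minimal sketch: pull labelled dates out of contract text, assuming a
# consistent "Field: value" convention such as "Start Date: 2020-01-01".
FIELD_PATTERN = re.compile(
    r"(?P<field>Start Date|End Date)\s*:\s*(?P<value>\d{4}-\d{2}-\d{2})",
    re.IGNORECASE,
)

def extract_fields(text: str) -> dict:
    """Return {field: value} for every labelled date found in the text."""
    return {
        m.group("field").title(): m.group("value")
        for m in FIELD_PATTERN.finditer(text)
    }

sample = "This agreement... Start Date: 2020-01-01 ... End Date: 2021-12-31."
print(extract_fields(sample))
```

If this simple pattern covers most of your contracts, it also gives you a cheap way to bootstrap labelled training data for a learned extractor later.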
Without more knowledge of your problem or a concrete example, it's hard to say what your best option may be.
I am not 100% sure which Stack Overflow tag this question belongs under, so I simply chose this one.
I have been trying for hours just to download a dataset, and it is frustrating. So I decided to seek help from more experienced users.
The problem is like this:
How do I use Python (web scraping), R, bash/wget, clicking through the menus, or any other tool to download data from the Bureau of Transportation Statistics matching the following criteria:
year 2018
airports in New York
columns like departure, arrival, etc.
I looked at the BTS website (https://www.bts.gov/), but it is too overwhelming for me to find the data matching these criteria.
Help, instructions, or screenshots are much appreciated.
This could get you part of the way there: https://www.transtats.bts.gov/ONTIME/Departures.aspx
There's also https://www.flightstats.com/ (I've only heard of this, not done any digging; there's a historical data export option in the dropdown on their homepage).
On the first page, select:
'All Statistics'
the NYC airport (if you need all three, you may have to loop over them)
carrier: this is a huge list; not sure if you need every flight, but I selected UA out of EWR
'All Months'
'All Days'
'2018'
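Once you have the CSV export from either site, filtering it down to the 2018 / NYC criteria is straightforward with the standard library. A minimal sketch; the column names ("Year", "Origin") are assumptions, so check them against the actual file header before using this.

```python
import csv
import io

# Filter an on-time-performance CSV down to 2018 departures from NYC airports.
# Column names ("Year", "Origin") are assumptions; adjust to the real header.
NYC_AIRPORTS = {"JFK", "LGA", "EWR"}

def filter_rows(csv_text: str) -> list:
    """Return the rows (as dicts) for 2018 flights departing a NYC airport."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row for row in reader
        if row["Year"] == "2018" and row["Origin"] in NYC_AIRPORTS
    ]

sample = "Year,Origin,Dest\n2018,JFK,LAX\n2017,JFK,LAX\n2018,ORD,LAX\n"
print(filter_rows(sample))
```

For a real download you would pass the file object straight to `csv.DictReader` instead of wrapping a string in `io.StringIO`.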
I'm using the Alpha Vantage API to fetch stock market data. However, this API seems geared towards providing only series of data, which is also implied by its aptly named functions like TimeSeries. That means that if I request a quote from the API, I get a series of different dates, times, and so forth.
What I'm after is data from a specific date and nothing else. I could get today's date and then use an "in" membership check to find it, but that does not seem like a good solution and would waste quite a few resources, so I'm looking to see whether there is a better solution available. I have not seen any mention of getting a single entry from their API, and trying to take a slice of the returned dict does not work well since the dict is unordered.
Does someone know a good way of fetching stock market data for only a single date from the TimeSeries class?
I came across this problem and this question, and after looking into the documentation I saw this Quote Endpoint that returns just the latest info. Here is the description:
A lightweight alternative to the time series APIs, this service returns the latest price and volume information for a security of your choice.
Example from the documentation:
https://www.alphavantage.co/query?function=GLOBAL_QUOTE&symbol=MSFT&apikey=demo
If you need data from another day in the past you can download all historical data once and cache it.
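If you do cache the full daily series, picking out one past date is just a dictionary lookup, since Alpha Vantage keys the daily series by date string. A minimal sketch: the key names ("Time Series (Daily)", "4. close") follow the documented JSON response format, but verify them against a live response before relying on this.

```python
# Look up a single day's bar in a cached TIME_SERIES_DAILY response.
# Key names ("Time Series (Daily)", "4. close") follow Alpha Vantage's
# documented JSON layout; verify against a live response.

def close_on(data: dict, date: str):
    """Return the closing price for `date`, or None if there is no bar
    (weekend, holiday, or date outside the cached range)."""
    series = data.get("Time Series (Daily)", {})
    bar = series.get(date)
    return float(bar["4. close"]) if bar else None

cached = {
    "Time Series (Daily)": {
        "2018-01-02": {"1. open": "170.16", "4. close": "172.26"},
    }
}
print(close_on(cached, "2018-01-02"))  # 172.26
print(close_on(cached, "2018-01-01"))  # None (market holiday)
```

This avoids both the membership-check loop and slicing: one API call populates the cache, and every subsequent date query is O(1).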
I am at the absolute beginner level with Python 3 (everything I currently know is from TheMonkeyLords), and my main focus is to integrate Python 3 with xbrlware so that I can extract financial information from the SEC EDGAR database with accuracy and reliability.
How can I use the xbrlware framework with Python 3? I have absolutely no idea how to use a framework with Python 3.
Any suggestions on what I should learn, code to study, clues, etc. would be a great help!
Thank you
Don't do it. Based on personal experience, it is very difficult to extract useful financial data from XBRL. XBRLWare does work, but there is a lot of work to do afterwards to extract the data into something useful.
XBRL has over 100 definitions of "revenue". Each industry reports differently. Each company makes 100s of filings and you have to string together data from different reports. It's an incredibly frustrating process.
I have used XBRLWare as a Ruby Gem on Windows. (It is no longer supported.) It does "work". It downloads and formats the reports nicely, but it operates as a viewer. Most filings contain two quarters of data. (Probably not the quarters you want either.)
You can use the XBRL viewer on the SEC's website to accomplish the same thing. Or you can go to the company's 10-Qs.
Also, XBRL uses CIK codes for the companies. As far as I know, the SEC doesn't have a central database to match CIK codes to ticker symbols (if you can believe that!). So it can be frustrating to find the companies you want to download.
If you want to download all the XBRL filings, I've been told it's something like 6 TB a month.
You can't pull long histories of financial data from XBRL either; you have to string filings together two quarters at a time (e.g., pull every IBM filing and stitch all the 10-Qs together). XBRL is only about three years old for the large accelerated filers, so historical data is limited.
There is a reason why Wall Street still charges $25k/year for financial data. XBRL is very difficult to use and difficult to extract data from.
You could try: XBRLcloud.com or findynamics.com
What readily available algorithms could I use to data-mine Twitter to find the degrees of separation between two people on Twitter?
How does it change when the social graph keeps changing and updating constantly?
And is there any dump of Twitter social graph data that I could use, rather than making so many API calls to start over?
From the Twitter API:
What's the Data Mining Feed and can I have access to it?
The Data Mining Feed is an expanded version of our /statuses/public_timeline REST API method. It returns 600 recent public statuses, cached for a minute at a time. You can request it up to once per minute to get a representative sample of the public statuses on Twitter. We offer this for free (and with no quality of service guarantees) to researchers and hobbyists. All we ask is that you provide a brief description of your research or project and the IP address(es) you'll be requesting the feed from; just fill out this form. Note that the Data Mining Feed is not intended to provide a contiguous stream of all public updates on Twitter; please see above for more information on the forthcoming "firehose" solution.
and also see: Streaming API Documentation
There was a company offering a dump of the social graph, but it was taken down and no longer available. As you already realized - it is kind of hard, as it is changing all the time.
I would recommend checking out their social_graph api methods as they give the most info with the least API calls.
There might be other ways of doing it but I've just spent the past 10 minutes looking at doing something similar and stumbled upon this Q.
I'd use an undirected (and weighted, since I want to factor in location too) graph. Use JGraphT or similar in your language of choice; JGraphT is Java-based but includes various prewritten algorithms.
You could then use the Bellman-Ford algorithm, which finds shortest paths from a single source and, unlike Dijkstra's, still works when edge weights can be negative.
http://en.wikipedia.org/wiki/Bellman%E2%80%93Ford_algorithm
I used it recently in a project for flight routing, iterating up to find the shortest path with the fewest 'hops' (edges).
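For plain degrees of separation, where every follow relationship counts as one unweighted hop, breadth-first search is sufficient and simpler than Bellman-Ford. A minimal sketch over an in-memory adjacency dict (which you would populate from the API's social graph methods):

```python
from collections import deque

def degrees_of_separation(graph: dict, start: str, target: str):
    """BFS over an adjacency dict; returns the hop count, or None if
    the two users are not connected."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None

# Toy follower graph (usernames are made up for the example).
follows = {
    "alice": ["bob"],
    "bob": ["alice", "carol"],
    "carol": ["bob", "dave"],
    "dave": ["carol"],
}
print(degrees_of_separation(follows, "alice", "dave"))  # 3
```

Since the real graph changes constantly, treat any answer as a snapshot: rebuild or incrementally update the adjacency dict as new follow data arrives, rather than assuming a computed distance stays valid.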