I need to fetch historical Twitter data for a given set of keywords. The Twitter Search API only returns tweets that are at most 9 days old, so that will not do. I'm currently using the Tweepy library (http://code.google.com/p/tweepy/) to call the Streaming API, and it works fine except that it is too slow. For example, when I run a search for "$GOOG", it sometimes takes more than an hour between two results. There are definitely tweets containing that keyword, but it isn't returning results fast enough.
What could be the problem? Is the Streaming API slow, or is there a problem with how I am accessing it? Is there a better way to get that data free of charge?
How far back do you need to go? To fetch historical data, you might want to keep the stream on indefinitely (the Streaming API allows for this) and store the stream locally, then retrieve historical data from your own database.
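A minimal sketch of the "store the stream locally" idea, using SQLite from the standard library. The table layout and the function names (open_store, save_tweet, search) are made up for illustration, and it assumes each tweet arrives as a dict with "id", "created_at" and "text" keys. In practice save_tweet would be called from whatever callback your streaming client gives you.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a SQLite database with one table for raw tweets."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "id INTEGER PRIMARY KEY, created_at TEXT, text TEXT, raw TEXT)"
    )
    return conn


def save_tweet(conn, tweet):
    """Insert one tweet dict; a tweet whose id is already stored is ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO tweets (id, created_at, text, raw) "
        "VALUES (?, ?, ?, ?)",
        (tweet["id"], tweet["created_at"], tweet["text"], json.dumps(tweet)),
    )
    conn.commit()


def search(conn, keyword):
    """Return (id, created_at, text) rows containing the keyword.

    Rows are ordered by id, on the assumption that ids grow over time.
    """
    rows = conn.execute(
        "SELECT id, created_at, text FROM tweets WHERE text LIKE ? ORDER BY id",
        ("%" + keyword + "%",),
    )
    return rows.fetchall()
```

Once the stream has been running for a while, search(conn, "$GOOG") answers the historical query from your own data instead of from Twitter.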
I also use Tweepy for live Stream/Filtering and it works well. The latency is typically < 1s and Tweepy is able to handle large volume streams.
The Streaming API is very fast: you get a message as soon as it is posted. We use twitter4j. But the stream only delivers current messages, so if you are not listening at the moment a tweet is sent, that message is lost.
Related
We are trying to get the owned games of a lot of users, but our problem is that after a while the API call limit (100,000 a day) kicks in and we stop getting results.
We use 'IPlayerService/GetOwnedGames/v0001/?key=APIKEY&steamid=STEAMID' in our call and it works for the first entries.
There are several other queries like the GetPlayerSummaries query which take multiple Steam IDs, but according to the documentation, this one only takes one.
Is there any other way to combine or merge our queries? We are using Python and the urllib.request library to create the request.
Depending on the payload of the requests, you have the following possibilities:
If each request returns only the newest updates, you could serialize the remaining Steam IDs when the response tells you that you've hit the daily limit, and resume from them later.
If you can control via the request payload what data you receive, you could go for a multithreaded or multiprocessing approach in which workers consume the request queries and the Steam IDs from a couple of shared resources.
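The first option could look roughly like the sketch below. It assumes you wrap your urllib.request call in a function (here the hypothetical fetch argument) that raises DailyLimitReached when the API refuses further calls. How the Steam API actually signals that is not covered here, so that detection is left to you.

```python
import json
from collections import deque


class DailyLimitReached(Exception):
    """Raised by the fetch callable when the daily quota is used up."""


def process_ids(steam_ids, fetch, checkpoint_path):
    """Fetch each ID in order; on quota exhaustion, save the unprocessed IDs."""
    pending = deque(steam_ids)
    results = {}
    while pending:
        steam_id = pending[0]
        try:
            results[steam_id] = fetch(steam_id)
        except DailyLimitReached:
            # Persist everything not yet fetched, including the current ID.
            with open(checkpoint_path, "w") as fh:
                json.dump(list(pending), fh)
            break
        pending.popleft()
    return results, list(pending)


def load_checkpoint(checkpoint_path):
    """Read back the list of IDs that still need to be fetched."""
    with open(checkpoint_path) as fh:
        return json.load(fh)
```

The next day you would call process_ids(load_checkpoint(path), fetch, path) to continue where the previous run stopped.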
As @andreihondrari indirectly stated in his comment under his answer, you can request an API key that allows more than 100,000 calls per day. This is stated in the "License to Steam Web API & Steam Data" part of the documentation:
You are limited to one hundred thousand (100,000) calls to the Steam Web API per day. Valve may approve higher daily call limits if you adhere to these API Terms of Use.
This may be complicated, and there is of course the possibility that you won't get approved, but it is pretty much the only stable way to go.
Furthermore you could theoretically use multiple Steam Web API keys, BUT:
Each API key still has the limit of 100,000 calls per day, so you'll need to implement a fail-safe and a transition between keys, and possibly create lots of accounts.
As each user has their own specific friends list and blocked list, an API key can "see" a portion of the Steam Community exclusively (friends data is not public otherwise). So it could be that you are using one API key which can't "see" a certain user when another key could have "seen" it properly.
You'll need a unique email address for each created account.
Note: Having multiple accounts actually complies with Valve's ToS according to this post on Arqade.
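The "fail-safe and transition between keys" part might be sketched like this. KeyExhausted, AllKeysExhausted and the fetch callable are hypothetical names: fetch(key, steam_id) stands for your own request function, which is assumed to raise KeyExhausted once a key has hit its daily limit.

```python
class KeyExhausted(Exception):
    """Raised by the fetch callable when a key has hit its daily limit."""


class AllKeysExhausted(Exception):
    """Raised when no usable key is left."""


def fetch_with_rotation(steam_id, keys, fetch):
    """Try the first key; on exhaustion, drop it and move on to the next.

    `keys` is modified in place so later calls skip keys already used up.
    """
    while keys:
        key = keys[0]
        try:
            return fetch(key, steam_id)
        except KeyExhausted:
            keys.pop(0)
    raise AllKeysExhausted("every API key has reached its daily limit")
```

This only handles the quota side. It does not address the visibility problem described above, where a given key cannot "see" a certain user.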
I am trying to build a script that will take a Twitter handle and calculate its engagement rate based on the last 10 tweets or so. If I understand Twitter's API correctly I would have to make a q request for each calculation. If I understand Twitter's pricing correctly, I would be paying between $0.75 and $1 per request depending on my package. That seems very expensive for me to build such a simple tool. Am I missing something, is there a cheaper way of doing it?
I am working on a project which is going to consume data from the Twitter Streaming API and count certain hashtags. But I have difficulty understanding what kind of architecture I need in my case. Should I use Tornado, or are there more suitable frameworks for this?
It really depends on what you want to do with the Tweets. Simply reading a stream of Tweets has not been an issue that I've seen. In fact that can be done on an AWS Micro Instance. I even run more advanced regression algorithms on the real-time feed. The scalability problem arises if you try to process a set of historical Tweets. Since Tweets are produced so fast, processing historical Tweets can be very slow. That's when you should try to parallelize.
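For the plain counting case, something as small as the following may be enough before reaching for a framework. It is only a sketch: count_hashtags is a made-up helper, and it assumes you can get the tweet texts as an iterable of strings (a list, or a generator fed by your streaming client).

```python
import re
from collections import Counter

# A hashtag is taken to be "#" followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def count_hashtags(texts, tracked):
    """Count occurrences of the tracked hashtags, case-insensitively."""
    wanted = {tag.lower().lstrip("#") for tag in tracked}
    counts = Counter()
    for text in texts:
        for tag in HASHTAG_RE.findall(text):
            tag = tag.lower()
            if tag in wanted:
                counts[tag] += 1
    return counts
```

Because it consumes one text at a time, memory use stays flat no matter how long the stream runs.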
At first I wanted to find an API for this, but I searched the internet and didn't find anything really helpful.
By "real time" I mean streaming the live stock price to a web page without a refresh.
If there is no such API, would the following method be a good way to implement this?
1. On the Python side, call the Yahoo Finance API to get the most recent price.
2. On the browser side, use Ajax to repeatedly call the server to get the price and display it. More specifically, I am thinking of using setInterval together with jQuery to achieve this.
How does this approach look?
Actually, this is not specific to stock price data; any website that needs to constantly retrieve data from the server has to consider this problem, for example Google Chat, the Facebook news feed, and so on. Can anybody tell me, in general, how to stream live data from server to browser?
Another way would be to use a push architecture. You could take a look at APE - Ajax Push Engine.
You could also take a look at Socket.IO, a realtime application framework for Node.JS.
Hope this helps!
You should definitely use a Push API. These days you should probably use http://www.websocket.org/
You don't want to use a REST API for real time; it's inefficient to constantly "pull" the live price. Instead you want a service that will "push" changes to you whenever new trades are executed on the exchange. This is done with a websocket, which is a type of API but is definitely different from a REST API. This article discusses the difference.
Intrinio provides a real-time websocket and you can access it via Python using this SDK on Github. You can access the same data via the REST API using this package in Python. If you try them both, you will see that the architecture doesn't make sense with a REST API.
This video shows the trades coming in. Trades don't execute on the market at regular intervals; it's completely sporadic. Instead of constantly "asking" the server for the data, it's better to "listen". This is called top of the book, meaning you get the newest trades as they come in from the top.
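To make the "listen instead of ask" idea concrete, here is a small in-process sketch using asyncio. It is not a websocket client and does not use any vendor SDK. PriceFeed is a made-up stand-in for a push source, and the prices are invented. The point is only that the listener has no polling loop: it sleeps until something is pushed to it.

```python
import asyncio


class PriceFeed:
    """In-process stand-in for a push feed (hypothetical, for illustration)."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self):
        """Register a new listener and return the queue it should read from."""
        queue = asyncio.Queue()
        self._subscribers.append(queue)
        return queue

    def publish(self, trade):
        """Push one trade to every subscriber."""
        for queue in self._subscribers:
            queue.put_nowait(trade)


async def listen(queue, count):
    """Collect `count` pushed trades; waits idle while nothing arrives."""
    received = []
    for _ in range(count):
        received.append(await queue.get())
    return received


async def demo():
    feed = PriceFeed()
    queue = feed.subscribe()
    listener = asyncio.create_task(listen(queue, 3))
    for price in (101.5, 101.7, 101.6):
        feed.publish({"symbol": "GOOG", "price": price})
        await asyncio.sleep(0)  # let the listener run
    return await listener
```

Running asyncio.run(demo()) returns the three trades in the order they were published. With a real websocket the publish side lives on the server, but the listening side has the same shape.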
Is there a way for me to determine the total number of Twitter messages on a given trending topic (e.g. the frequency of Twitter messages about Haiti/#Haiti) at a given instant in time using the Twitter API? I'm writing a script in Python that will monitor Twitter traffic over a long period of time, and I was wondering how I could go about doing this.
Yes. Use the Twitter Streaming API to get a representative sample.
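Once you are receiving the sample, the frequency over time can be tallied along these lines. This is a sketch, assuming you have already turned each tweet into a (datetime, text) pair; tweets_per_minute is a made-up helper name. Since the stream is a sample, the numbers are relative rates and not absolute totals.

```python
from collections import Counter


def tweets_per_minute(tweets, keyword):
    """Count tweets mentioning the keyword, grouped by minute.

    `tweets` is an iterable of (datetime, text) pairs.
    """
    needle = keyword.lower()
    counts = Counter()
    for created_at, text in tweets:
        if needle in text.lower():
            # Truncate the timestamp to the start of its minute.
            minute = created_at.replace(second=0, microsecond=0)
            counts[minute] += 1
    return counts
```

Feeding the pairs in as they arrive and reading the counter periodically gives the frequency "at a given instant" at one-minute resolution.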
You might take a look at this site (see below). Drew has several sources of information and ways to look at the data.
Network of People who Twitter about R
http://www.drewconway.com/zia/?p=1471