Tweepy Search Keeps Returning Status Code 429

I am trying to run the following code to search for two pieces of text in tweets:
search_words = ['Samsung', 'Amazon']
tweets = []
for tweet in tweepy.Cursor(api.search, q=search_words).items():
    tweets.append(tweet.text)
But this keeps returning:
TweepError: Twitter error response: status code = 429
According to the documentation, this means I am exceeding the rate limit:
Rate limiting of the API is primarily on a per-user basis — or more accurately described, per user access token. If a method allows for 15 requests per rate limit window, then it allows 15 requests per window per access token.
Rate limits are divided into 15 minute intervals. All endpoints require authentication, so there is no concept of unauthenticated calls and rate limits.
Is my search term too broad? Even when I wait for 15 minutes, I still get the same error when I re-run the script, even when I narrow the search terms (just to test).
As an aside but related question: how many tweets / how far back in time will api.search return?
EDIT: Looking into this further, I think that the Cursor loop is making me hit the limit (180 per 15 minutes) after the 180th loop. Is there a more efficient way of searching all tweets in one block rather than having to iterate through?

You can try this and see if it helps; it tells Tweepy to handle the rate limit for you by waiting until the window resets:
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
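For context, here is a minimal sketch of how that option fits into the original search loop (a sketch, not the canonical fix: the auth object is assumed to be an already configured tweepy.OAuthHandler, and wait_on_rate_limit_notify applies to Tweepy 3.x). Note also that the standard search syntax for matching either term is a single 'Samsung OR Amazon' query string rather than a Python list:

import tweepy

# auth is assumed to be a previously configured tweepy.OAuthHandler
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

tweets = []
# 'Samsung OR Amazon' matches tweets containing either term
for tweet in tweepy.Cursor(api.search, q='Samsung OR Amazon', count=100).items(1000):
    tweets.append(tweet.text)

With wait_on_rate_limit=True, the Cursor simply pauses when the 429 would otherwise occur and resumes once the 15-minute window resets.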

Related

Twitter scraping in Python

I have to scrape tweets from Twitter for a specific user (#salvinimi), from January 2018. The issue is that there are a lot of tweets in this range of time, and so I am not able to scrape all the ones I need!
I tried multiple solutions:
1)
pip install twitterscraper
from twitterscraper import query_tweets_from_user as qtfu
tweets = qtfu(user='matteosalvinimi')
With this method, I get only a few tweets (500~600 more or less) instead of all the tweets... Do you know why?
2)
!pip install twitter_scraper
from twitter_scraper import get_tweets
tweets = []
for i in get_tweets('matteosalvinimi', pages=100):
    tweets.append(i)
With this method I get an error -> "ParserError: Document is empty"...
If I set "pages=40", I get the tweets without errors, but not all of them. Do you know why?
Three things for the first issue you encounter:
first of all, every API has its limits, and one like Twitter's can be expected to monitor usage and eventually stop a user from retrieving data beyond those limits. Trying to work around the limitations of the API might not be the best idea and might result in being banned from the site or worse (I'm guessing here, as I don't know Twitter's policy on the matter). That said, the documentation for the library you're using states:
With Twitter's Search API you can only send 180 requests every 15 minutes. With a maximum number of 100 tweets per request, this means you can mine for 4 x 180 x 100 = 72,000 tweets per hour.
By using TwitterScraper you are not limited by this number but by your internet speed/bandwidth and the number of instances of TwitterScraper you are willing to start.
then, the function you're using, query_tweets_from_user(), has a limit argument which you can set to an integer. One thing you can try is changing that argument and seeing whether you get what you want or not.
finally, if the above does not work, you could split your time range into two, three or more subsets, collect the data separately and merge it together afterwards (a combined sketch follows this list).
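Here is a minimal sketch of the last two suggestions combined. This is hedged on twitterscraper's interface at the time: query_tweets_from_user is assumed to take a limit argument, and query_tweets is assumed to accept begindate/enddate (check your installed version, as the interface has changed between releases):

import datetime as dt
from twitterscraper import query_tweets, query_tweets_from_user as qtfu

# suggestion 2: set the limit explicitly instead of relying on the default
tweets = qtfu('matteosalvinimi', limit=10000)

# suggestion 3: split the time range into subsets and merge the results
ranges = [(dt.date(2018, 1, 1), dt.date(2018, 7, 1)),
          (dt.date(2018, 7, 1), dt.date(2019, 1, 1))]
all_tweets = []
for begin, end in ranges:
    all_tweets += query_tweets('from:matteosalvinimi',
                               begindate=begin, enddate=end)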
The second issue you mention might be due to many different things, so I'll just take a broad guess here. Either pages=100 is too high and, one way or another, the program or the API is unable to retrieve the data, or you're asking for a hundred pages when in reality there are fewer than a hundred pages to look at, which results in the program trying to parse an empty document.
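If the latter is the case, a pragmatic workaround (a sketch, assuming the ParserError is raised by lxml, which is where the "Document is empty" message usually comes from) is to keep whatever was collected before the parser hit an empty page:

from lxml.etree import ParserError
from twitter_scraper import get_tweets

tweets = []
try:
    for tweet in get_tweets('matteosalvinimi', pages=100):
        tweets.append(tweet)
except ParserError:
    # ran out of pages to parse; keep what was already collected
    pass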

keep getting rate limited python api

OK, so I'm really new to Python, and I am trying to create a script to assist me in marketing my music via social media. I am trying to code it so that when I compare a user's followers with my followers, if I am not following one of their followers, it automatically follows them. Here is what I have:
import twitter
import time
now = time.time()
username = raw_input("whos followers")
api = twitter.Api(...)
friendslist = api.GetFollowersPaged(screen_name=username, count=1,)
myfollowers = api.GetFollowersPaged(user_id=821151801785405441, count=1)
for u in friendslist:
    if u not in myfollowers:
        api.CreateFriendship(u.friendslist)
        print 'you followed new people'
        time.sleep(15)
I am using Python 2.7 and the python-twitter API wrapper. My error seems to start at the api.CreateFriendship line. Also, I set the count to 1 to try to avoid rate limiting, but have had it as high as 150 (200 being the max).
The Twitter API has fairly subjective controls in place for write operations. There are daily follow limits, and they are designed to limit exactly the sort of thing you are doing.
see https://support.twitter.com/articles/15364
If you do reach a limit, we'll let you know with an error message telling you which limit you've hit. For limits that are time-based (like the direct messages, Tweets, changes to account email, and API request limits), you'll be able to try again after the time limit has elapsed.
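Beyond the limits themselves, the loop as posted has two likely bugs worth fixing. A sketch, under the assumption that python-twitter's GetFollowersPaged returns a (next_cursor, previous_cursor, users) tuple and that CreateFriendship takes a user_id or screen_name rather than u.friendslist:

import time
import twitter

api = twitter.Api(...)  # credentials elided as in the question

# GetFollowersPaged is assumed to return (next_cursor, prev_cursor, [twitter.User, ...])
_, _, their_followers = api.GetFollowersPaged(screen_name=username, count=200)
_, _, my_followers = api.GetFollowersPaged(user_id=821151801785405441, count=200)

my_ids = set(u.id for u in my_followers)
for u in their_followers:
    if u.id not in my_ids:
        api.CreateFriendship(user_id=u.id)  # follow by id, not u.friendslist
        print 'followed %s' % u.screen_name
        time.sleep(60)  # pace the writes; the daily follow limits still apply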

Fastest way to crawl Twitter data with Python - Followers' followers

I'm doing research on user social relations on Twitter, in Python.
The problem is: what is the fastest way to crawl the follower information of a certain user's followers?
I searched a lot of information and am currently using Tweepy:
c = tweepy.Cursor(api.followers_ids, id=centre, count=5000).items()
while True:
    try:
        followers_ids_list.append(c.next())
    except StopIteration:
        # cursor exhausted; all follower ids collected
        break
    except tweepy.TweepError:
        # hit rate limit, sleep for 15 minutes
        time.sleep(15 * 60 + 15)
        continue
and after that I am using /users/lookup to find the User() objects for the ids gained before.
However, this way is quite slow... I was wondering if there is anything faster than what I am doing currently.
I want to map the user relations, which means followers at depth 2 are not enough.
Say I have 100 followers, and those 100 followers each have 200 followers of their own; then the time needed to grab this social network (depth=3) would be:
(1 + 100 + 100*200) calls / 15 calls per window * 15 mins per window / 60 = about 335 hours = about 14 days! (A quick arithmetic check follows the breakdown below.)
1 call: request my follower ids (100 ids)
100 calls: request the 100 followers' follower ids (100*200 ids)
100*200 calls (at least): request the follower ids of those 100*200 followers' followers
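For reference, the same estimate as plain arithmetic (no API calls involved):

calls = 1 + 100 + 100 * 200   # 20,101 requests in total
windows = calls / 15.0        # 15 calls per 15-minute window
hours = windows * 15 / 60     # ~335 hours, i.e. roughly 14 days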
The alternative I can think of is to crawl the twitter.com website without the API (but, I figure, this way would get my IP or account banned from Twitter...).
The API limits prevent you from going any faster.
You could set up multiple apps and distribute the problem across them, but that's likely to get noticed by Twitter if they're all running from the same IP address.
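One place calls can still be saved is the /users/lookup step mentioned in the question: it accepts up to 100 ids per request, so hydrating ids in batches of 100 costs a hundredth of the calls of looking users up one at a time. A minimal sketch, assuming a Tweepy api object and the followers_ids_list collected above:

users = []
for i in range(0, len(followers_ids_list), 100):
    batch = followers_ids_list[i:i + 100]
    # users/lookup accepts up to 100 user ids per request
    users.extend(api.lookup_users(user_ids=batch))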
You can never do it quickly with the Twitter API because of the 15-minute rate windows.
I'm also doing some work related to one author's followers. However, I need millions of followers' names, which is even worse.
My solution was to write my own crawler, and it does work faster than the API: it could crawl 100*1000 profiles per night (tested on my local machine). This rate is lower than I expected, so I have to think about other ways to increase its speed.
Hope this gives you some inspiration.

How much data can I get with the Twitter Search API for one specific keyword?

I want to collect data from Twitter using the Python Tweepy library.
I looked up the rate limits for the Twitter API, which are 180 requests per 15-minute window.
What I want to know is how much data I can get for one specific keyword. Put another way: when I use tweepy.Cursor, when will it stop?
I am not asking about the maths (100 count * 180 requests * 4 times/hour, etc.) but about real experience. I found the following claim:
"With a specific keyword, you can typically only poll the last 5,000 tweets per keyword. You are further limited by the number of requests you can make in a certain time period. "
http://www.brightplanet.com/2013/06/twitter-firehose-vs-twitter-api-whats-the-difference-and-why-should-you-care/
Is this correct (if so, I only need to run the program for 5 minutes or so)? Or do I need to keep fetching as many tweets as there are (which might make the program run for a very long time)?
You will definitely not be getting as many tweets as exist. The way Twitter limits how far back you can go (and therefore how many tweets are available) is with a minimum since_id parameter passed to the GET search/tweets call to the Twitter API. In Tweepy, the API.search function interfaces with the Twitter API. Twitter's GET search/tweets documentation has a lot of good info:
There are limits to the number of Tweets which can be accessed through the API. If the limit of Tweets has occurred since the since_id, the since_id will be forced to the oldest ID available.
In practical terms, Tweepy's API.search should not take long to get all the available tweets. Note that not all tweets are available per the Twitter API, but I've never had a search take more than 10 minutes.
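To see this empirically, a minimal sketch that simply drains the cursor and reports how far back the results go (assuming an api created with wait_on_rate_limit=True so the 429s are absorbed automatically; 'your keyword' is a placeholder):

import tweepy

# api is assumed to be tweepy.API(auth, wait_on_rate_limit=True)
tweets = [status for status in
          tweepy.Cursor(api.search, q='your keyword', count=100).items()]
print('collected %d tweets' % len(tweets))
if tweets:
    print('oldest tweet from: %s' % tweets[-1].created_at)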

Twitter Random Rate Limiting

I am trying to retrieve a user friend network using the python-twitter API. I am using the GetFriendIDs() method, which retrieves the ids of all the accounts a particular Twitter user is following. The following is a small snippet of my test code:
for item in IdList:
    aDict[item] = api.GetFriendIDs(user_id=item, count=4999)
    print "sleeping 60"
    time.sleep(66)
    print str(api.MaximumHitFrequency()) + " The maximum hit frequency"
    print api.GetRateLimitStatus()['resources']['friends']['/friends/ids']['remaining']
There are 35 ids (of Twitter user accounts) in IdList, and for each item I am retrieving up to 4999 ids that the user with id item is following. I am aware of the new rate limiting by Twitter, wherein the rate-limit window has been changed from 60 minutes to 15 minutes, and of the advice not to make more than one request to the server per minute (api.MaximumHitFrequency()). So basically 15 requests in 15 minutes. That is exactly what I'm doing; in fact, I'm making a request to the server every 66 seconds, not 60, yet I get a rate-limit error after 6 requests. I am unable to figure out why this is happening. Please let me know if anyone else has had this problem.
Have a look at https://github.com/bear/python-twitter/wiki/Rate-Limited-API---How-to-deal-with.
Also, it might help to use a newer version of the python-twitter code. The MaximumHitFrequency and GetRateLimitStatus methods have been modified with https://github.com/bear/python-twitter/commit/25cccb81fbeb4c630a0024981bc98f7fb41f3933.
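As a further sketch, the pacing can be driven off the server's own counters rather than a fixed sleep, using the same GetRateLimitStatus call as in the question (this assumes the response carries a 'reset' epoch timestamp, as in Twitter's rate_limit_status payload; one possible explanation for hitting the limit after 6 requests, not confirmed here, is that a single GetFriendIDs call pages internally and so may issue several requests for users following more than count accounts):

import time

def wait_for_window(api):
    # check the remaining calls on the friends/ids resource before each request
    status = api.GetRateLimitStatus()['resources']['friends']['/friends/ids']
    if status['remaining'] == 0:
        # 'reset' is the epoch time at which the window reopens
        time.sleep(max(status['reset'] - time.time(), 0) + 5)

for item in IdList:
    wait_for_window(api)
    aDict[item] = api.GetFriendIDs(user_id=item, count=4999)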
