i'm collecting tweets withe thier replies from Twitter's API to build data set and i'm using tweepy library in python for that,but the problem is that I get this error so much (Rate limit reached. Sleeping for:(any number for sec)) that delays me and I have to collect as many data as possible in the shortest time
I read that twitter has a rate limit of i think 15 requests per 15 minutes or something like that, but on my situation I can only gather a tweet or two tweet until it stops again and sometimes it stops for 15 minutes and then stop again for 15 minutes without giving me give me time between them, I don't know what caused the problem whether it is my code or not?
# Import the necessary package to process data in JSON format
try:
import json
except ImportError:
import simplejson as json
# Import the tweepy library
import tweepy
import sys
# Variables that contains the user credentials to access Twitter API
ACCESS_TOKEN = '-'
ACCESS_SECRET = '-'
CONSUMER_KEY = '-'
CONSUMER_SECRET = '-'
# Setup tweepy to authenticate with Twitter credentials:
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)
# Create the api to connect to twitter with your creadentials
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True, compression=True)
file2 = open('replies.csv','w', encoding='utf-8-sig')
replies=[]
non_bmp_map = dict.fromkeys(range(0x10000, sys.maxunicode + 1), 0xfffd)
for full_tweets in tweepy.Cursor(api.search,q='#عربي',timeout=999999,tweet_mode='extended').items():
if (not full_tweets.retweeted) and ('RT #' not in full_tweets.full_text):
for tweet in tweepy.Cursor(api.search,q='to:'+full_tweets.user.screen_name,result_type='recent',timeout=999999,tweet_mode='extended').items(1000):
if hasattr(tweet, 'in_reply_to_status_id_str'):
if (tweet.in_reply_to_status_id_str==full_tweets.id_str):
replies.append(tweet.full_text)
print(full_tweets._json)
file2.write("{ 'id' : "+ full_tweets.id_str + "," +"'Replies' : ")
for elements in replies:
file2.write(elements.strip('\n')+" , ")
file2.write("}\n")
replies.clear()
file2.close()
$ python code.py > file.csv
Rate limit reached. Sleeping for: 262
Rate limit reached. Sleeping for: 853
Hoping this might help
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=False, compression=True)
Just add this line to the Python script to avoid the sleep:
sleep_on_rate_limit=False
Related
A little help please. This is what I am working with now (I got it from here on stackoverflow) and it works very well, but it seems to only work with the most recent accounts in the list of accounts that don't follow me back. I want to start unfollowing accounts from the oldest to the newest because I keep reaching the limit of the API. I thought to make a list of followers and reverse it then plug that in somewhere but not quite sure how to do that or what the syntax would be. Thanks in advance.
import tweepy
from cred import *
from config import QUERY, UNFOLLOW, FOLLOW, LIKE, RETWEET
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True)
def main():
try:
if UNFOLLOW:
my_screen_name = api.get_user(screen_name='YOUR_SCREEN_NAME')
for follower in my_screen_name.friends():
Status = api.get_friendship(source_id = my_screen_name.id , source_screen_name = my_screen_name.screen_name, target_id = follower.id, target_screen_name = follower.screen_name)
if Status [0].followed_by:
print('{} he is following You'.format(follower.screen_name))
else:
print('{} he is not following You'.format(follower.screen_name))
api.destroy_friendship(screen_name = follower.screen_name)
except tweepy.errors.TweepyException as e:
print(e)
while True:
main()
here is the config.py file
#config.py
UNFOLLOW = True
I recently assembled some pieces of code together to reach this, so I'll just copy paste what I already have here instead of updating your code, but I can point out the main points (and give some tips).
The full code:
import tweepy
from cred import *
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True)
def unfollower():
followers = api.get_follower_ids(screen_name=api.verify_credentials().screen_name)
friends = api.get_friend_ids(screen_name=api.verify_credentials().screen_name)
print("You follow:", len(friends))
for friend in friends[::-1]:
if friend not in followers:
api.destroy_friendship(user_id = friend)
else:
pass
friends = api.friends_ids(screen_name=api.me().screen_name)
print("Now you're following:", len(friends))
unfollower()
Now what happened here and what is different from your code
This two variables:
followers = api.followers_ids(screen_name=api.verify_credentials().screen_name)
friends = api.friends_ids(screen_name=api.verify_credentials().screen_name)
create a list with the ids from both the followers (follow you) and the friends (you are following), now all we need to do is compare both.
There is a discussion about the Twitter Rate limit and how using cursors have a smaller rate than not using, but I'm not qualified to explain the whys, so let's just assume that if we do not want small rate limits, the best way is not to use requests that have a intrinsic small rate limit like the api.get_friendship and them getting the screen_name, instead I'm using the get_friend_ids method.
the next part involves what you called "make a list of followers and reverse", well the list is already there in the variable "followers", so all we need to do now is reverse read it with the following command:
for friend in friends[::-1]:
this says: "read each element of the list, starting from index -1" roughtly "read the list backwards".
Well, I think the major points are these, I created a function but you really don't even need to, is just easier to update this to a class if you need to, and this way you don't need to use the while True: main(), just call the function unfollow() and it will automatically end the script when the unfollows are over.
Now some minor points that might improve your code:
Instead of using
screen_name='YOUR_SCREEN_NAME'
That you need a config file or to hardcode the screen_name, you can use
screen_name=api.verify_credentials().screen_name
This way it will automatically knows that you want the authenticating user information (note that I didn't used this part on my code, for the get_friend_ids method does not need the screen_name)
Now this part
from cred import *
from config import QUERY, UNFOLLOW, FOLLOW, LIKE, RETWEET
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True)
First I've eliminated the need for the config file
and you can eliminate all the extra info that comes imported from the cred file, so you don't need to import all in from cred import * updating cred.py with:
import tweepy
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True)
and now you can only impor the api function with from cred import api, this way the code can become cleaner:
import tweepy
from cred import api
def unfollower():
followers = api.get_follower_ids(screen_name=api.verify_credentials().screen_name)
friends = api.get_friend_ids(screen_name=api.verify_credentials().screen_name)
print("You follow:", len(friends))
for friend in friends[::-1]:
if friend not in followers:
api.destroy_friendship(user_id = friend)
else:
pass
friends = api.get_friend_ids(screen_name=api.verify_credentials().screen_name)
print("Now you're following:", len(friends))
unfollower()
Lastly, if anyone is having problems with the api.get_friend_ids or get_follower_ids remember that the tweepy update for versions 4.x.x changed the name of some methods, the ones I remember are:
followers_ids is now get_follower_ids
friends_ids is now get_friend_ids
me() is now verify_credentials()
Well, I guest that's it, you can check the rest on the docs.
Happy pythoning!
I am using the code time.sleep(3600) and it is tweeting more than every 3600 seconds. Why is this happening?
Currently it is tweeting at 9 minutes past, then 32 minutes past.
Edit:
Here is the code. The only other reason this could be happening is that this may be running in multiple instances accidentally. I will check that.
# tweepy will allow us to communicate with Twitter, time will allow us to set how often we tweet
import tweepy, time
#enter the corresponding information from your Twitter application management:
CONSUMER_KEY = 'mykey' #keep the quotes, replace this with your consumer key
CONSUMER_SECRET = 'mykey' #keep the quotes, replace this with your consumer secret key
ACCESS_TOKEN = 'my-my' #keep the quotes, replace this with your access token
ACCESS_SECRET = 'mykey' #keep the quotes, replace this with your access token secret
# configure our access information for reaching Twitter
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)
# access Twitter!
api = tweepy.API(auth)
# open our content file and read each line
filename=open('content.txt')
f=filename.readlines()
filename.close()
# for each line in our contents file, lets tweet that line out except when we hit a error
for line in f:
try:
api.update_status(line)
print("Tweeting!")
except tweepy.TweepError as err:
print(err)
time.sleep(3600) #Tweet every hour
print("All done tweeting!")
This may be caused by your module not being protected from running when imported.
That means every time your module is imported, (could happen on
from package import *
), your code is interpreted and a new loop is created.
You could ensure your code is run only when you want it to run with this :
Make a function from your code, let's name it main().
You can then check if your module is called as a script.
def main():
# tweepy will allow us to communicate with Twitter, time will allow us to set how often we tweet
import tweepy, time
#enter the corresponding information from your Twitter application management:
CONSUMER_KEY = 'mykey' #keep the quotes, replace this with your consumer key
CONSUMER_SECRET = 'mykey' #keep the quotes, replace this with your consumer secret key
ACCESS_TOKEN = 'my-my' #keep the quotes, replace this with your access token
ACCESS_SECRET = 'mykey' #keep the quotes, replace this with your access token secret
# configure our access information for reaching Twitter
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)
# access Twitter!
api = tweepy.API(auth)
# open our content file and read each line
filename=open('content.txt')
f=filename.readlines()
filename.close()
# for each line in our contents file, lets tweet that line out except when we hit a error
for line in f:
try:
api.update_status(line)
print("Tweeting!")
except tweepy.TweepError as err:
print(err)
time.sleep(3600) #Tweet every hour
print("All done tweeting!")
if __name__ == "__main__":
main()
If you have to use your code from another script, you can use
from your_module import main
main()
Or from a command line :
python -m your_module
i'm collecting tweets withe thier replies from Twitter's API to build data set and i'm using tweepy library in python for that,but the problem is that I get this error so much (Rate limit reached. Sleeping for:(any number for sec)) that delays me and I have to collect as many data as possible in the shortest time
I read that twitter has a rate limit of i think 15 requests per 15 minutes or something like that, but on my situation I can only gather a tweet or two tweet until it stops again and sometimes it stops for 15 minutes and then stop again for 15 minutes without giving me give me time between them, I don't know what caused the problem whether it is my code or not?
# Import the necessary package to process data in JSON format
try:
import json
except ImportError:
import simplejson as json
# Import the tweepy library
import tweepy
import sys
# Variables that contains the user credentials to access Twitter API
ACCESS_TOKEN = '-'
ACCESS_SECRET = '-'
CONSUMER_KEY = '-'
CONSUMER_SECRET = '-'
# Setup tweepy to authenticate with Twitter credentials:
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)
# Create the api to connect to twitter with your creadentials
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True, compression=True)
file2 = open('replies.csv','w', encoding='utf-8-sig')
replies=[]
non_bmp_map = dict.fromkeys(range(0x10000, sys.maxunicode + 1), 0xfffd)
for full_tweets in tweepy.Cursor(api.search,q='#عربي',timeout=999999,tweet_mode='extended').items():
if (not full_tweets.retweeted) and ('RT #' not in full_tweets.full_text):
for tweet in tweepy.Cursor(api.search,q='to:'+full_tweets.user.screen_name,result_type='recent',timeout=999999,tweet_mode='extended').items(1000):
if hasattr(tweet, 'in_reply_to_status_id_str'):
if (tweet.in_reply_to_status_id_str==full_tweets.id_str):
replies.append(tweet.full_text)
print(full_tweets._json)
file2.write("{ 'id' : "+ full_tweets.id_str + "," +"'Replies' : ")
for elements in replies:
file2.write(elements.strip('\n')+" , ")
file2.write("}\n")
replies.clear()
file2.close()
$ python code.py > file.csv
Rate limit reached. Sleeping for: 262
Rate limit reached. Sleeping for: 853
Hoping this might help
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=False, compression=True)
Just add this line to the Python script to avoid the sleep:
sleep_on_rate_limit=False
I am currently in the process of doing some research using sentiment analysis on twitter data regarding a certain topic (isn't necessarily important to this question) using python, of which I am a beginner at. I understand the twitter streaming API limits users to access only to the previous 7 days unless you apply for a full enterprise search which opens up the whole archive. I had recently been given access to the full archive for this research project from twitter but I am unable to specify a start and end date to the tweets I would like to stream into a csv file. This is my code:
import pandas as pd
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
ckey = 'xxxxxxxxxxxxxxxxxxxxxxx'
csecret = 'xxxxxxxxxxxxxxxxxxxxxxx'
atoken = 'xxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxx'
asecret = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxx'
# =============================================================================
# def sentimentAnalysis(text):
# output = '0'
# return output
# =============================================================================
class listener(StreamListener):
def on_data(self, data):
tweet = data.split(',"text":"')[1].split('","source')[0]
saveMe = tweet+'::'+'\n'
output = open('output.csv','a')
output.write(saveMe)
output.close()
return True
def on_error(self, status):
print(status)
auth = OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
twitterStream = Stream(auth, listener())
twitterStream.filter(track=["#weather"], languages = ["en"])
Now this code streams twitter date from the past 7 days perfectly. I tried changing the bottom line to
twitterStream.filter(track=["#weather"], languages = ["en"], since = ["2016-06-01"])
but this returns this error :: filter() got an unexpected keyword argument 'since'.
What would be the correct way to filter by a given date frame?
The tweepy does not provide the "since" argument, as you can check yourself here.
To achieve the desired output, you will have to use the api.user_timeline, iterating through pages until the desired date is reached, Eg:
import tweepy
import datetime
# The consumer keys can be found on your application's Details
# page located at https://dev.twitter.com/apps (under "OAuth settings")
consumer_key=""
consumer_secret=""
# The access tokens can be found on your applications's Details
# page located at https://dev.twitter.com/apps (located
# under "Your access token")
access_token=""
access_token_secret=""
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
page = 1
stop_loop = False
while not stop_loop:
tweets = api.user_timeline(username, page=page)
if not tweets:
break
for tweet in tweets:
if datetime.date(YEAR, MONTH, DAY) < tweet.created_at:
stop_loop = True
break
# Do the tweet process here
page+=1
time.sleep(500)
Note that you will need to update the code to fit your needs, this is just a general solution.
i'm trying to tweepy to reply to certain tweets, and my reply includes an image.
the twt variable holds the tweet i'm trying to reply to.
here's what i'm doing at the moment:
# -*- coding: utf-8 -*-
import tweepy, time, random
CONSUMER_KEY = 'XXXX'
CONSUMER_SECRET = 'XXXX'
ACCESS_KEY = 'XXXX'
ACCESS_SECRET = 'XXXX'
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_KEY, ACCESS_SECRET)
api = tweepy.API(auth)
query = ['aaa', 'bbb', 'ccc']
t0 = time.time()
count = 0
last_count = 0
f = open('last_replied.txt')
last_replied = int(f.readline().strip())
f.close()
print('starting time:', time.strftime('%X'))
while True:
if count > last_count:
print(time.strftime('%X'), ':', count, 'replies')
last_count = count
for i in range(3):
twts = api.search(query[i], since_id=last_replied)
if len(twts)>0:
for twt in twts:
sid = twt.id
sn = twt.user.screen_name
stat = "lalala" + "#" + sn
api.update_with_media('oscar1.gif',status=stat,in_reply_to_status_id=sid)
count += 1
last_replied = twt.id
f = open('last_replied.txt','w')
f.write(str(last_replied))
f.close()
pause = random.randint(50,90)
time.sleep(pause)
my tweet gets posted, but not as a reply to the original tweet (twt). instead, it just gets posted as a new, independent, tweet.
however, when instead of update_with_media as above, i use update_status, such as:
api.update_status(status=stat,in_reply_to_status_id=sid)
my new tweet does get posted as a reply to the original tweet (twt).
what am i missing?
thanks
the solution i ended up with was switching to the twython module, that has the functionality well documented and working perfectly.
thank you very much for your help.
The update_with_media endpoint is being deprecated by Twitter (https://dev.twitter.com/rest/reference/post/statuses/update_with_media) you shouldn't use it. Instead upload the media first with the media_upload method and add the media_ids you get back to the update_status method.
I glanced at the code for update_with_media and don't see any obvious bugs, but the new method of handling tweets with media was added to the Tweepy API in January - note the last two bullets here:
https://github.com/tweepy/tweepy/releases/tag/v3.2.0
If you download Tweepy 3.2.0 you should be able to switch over to the new regimen, which will hopefully fix your reply_to problem. (I can't say for sure if the new stuff works, I'm using an older version of Tweepy myself.)