I'm collecting tweets along with their replies from Twitter's API to build a dataset, using the tweepy library in Python. The problem is that I keep getting this message (Rate limit reached. Sleeping for: <some number of seconds>), which slows me down, and I need to collect as much data as possible in the shortest time.
I read that Twitter has a rate limit of, I think, 15 requests per 15 minutes or something like that, but in my case I can only gather one or two tweets before it stops again. Sometimes it stops for 15 minutes and then stops for another 15 minutes with no time in between. I don't know whether the problem is caused by my code or not.
# Import the necessary package to process data in JSON format
try:
    import json
except ImportError:
    import simplejson as json
# Import the tweepy library
import tweepy
import sys
# Variables that contain the user credentials to access the Twitter API
ACCESS_TOKEN = '-'
ACCESS_SECRET = '-'
CONSUMER_KEY = '-'
CONSUMER_SECRET = '-'
# Setup tweepy to authenticate with Twitter credentials:
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)
# Create the api to connect to twitter with your credentials
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True, compression=True)
file2 = open('replies.csv','w', encoding='utf-8-sig')
replies=[]
non_bmp_map = dict.fromkeys(range(0x10000, sys.maxunicode + 1), 0xfffd)
for full_tweets in tweepy.Cursor(api.search,q='#عربي',timeout=999999,tweet_mode='extended').items():
    # Skip retweets
    if (not full_tweets.retweeted) and ('RT @' not in full_tweets.full_text):
        # Search tweets addressed to the author and keep only replies to this tweet
        for tweet in tweepy.Cursor(api.search,q='to:'+full_tweets.user.screen_name,result_type='recent',timeout=999999,tweet_mode='extended').items(1000):
            if hasattr(tweet, 'in_reply_to_status_id_str'):
                if (tweet.in_reply_to_status_id_str==full_tweets.id_str):
                    replies.append(tweet.full_text)
        print(full_tweets._json)
        file2.write("{ 'id' : "+ full_tweets.id_str + "," +"'Replies' : ")
        for elements in replies:
            file2.write(elements.strip('\n')+" , ")
        file2.write("}\n")
        replies.clear()
file2.close()
$ python code.py > file.csv
Rate limit reached. Sleeping for: 262
Rate limit reached. Sleeping for: 853
Hoping this might help
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=False, compression=True)
Just add this line to the Python script to avoid the sleep:
sleep_on_rate_limit=False
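A plausible cause, stated here as an assumption rather than a diagnosis, is that the nested Cursor issues a fresh batch of search requests for every matching tweet, so the request budget is spent long before many tweets are saved. One way to cut requests is to fetch candidate replies once per author and match them to their parent tweets locally. The sketch below shows only the local grouping and CSV writing, using plain dicts with the field names from the question; group_replies and write_rows are hypothetical helper names, not part of tweepy.

```python
import csv
from collections import defaultdict


def group_replies(candidates):
    """Map each parent tweet id to the texts of the tweets replying to it.

    `candidates` is an iterable of dicts shaped like a tweet's JSON
    (an assumption for this sketch), e.g. what tweet._json holds.
    """
    grouped = defaultdict(list)
    for tweet in candidates:
        parent_id = tweet.get("in_reply_to_status_id_str")
        if parent_id:  # ignore tweets that are not replies
            grouped[parent_id].append(tweet["full_text"])
    return grouped


def write_rows(out, grouped):
    """Write one CSV row per parent tweet: id, then replies joined by ' | '."""
    writer = csv.writer(out)
    writer.writerow(["id", "replies"])
    for tweet_id, texts in grouped.items():
        joined = " | ".join(text.replace("\n", " ") for text in texts)
        writer.writerow([tweet_id, joined])
```

With this shape, the 'to:' search runs once per author, its results are passed to group_replies, and the file is written at the end with open('replies.csv', 'w', newline='', encoding='utf-8-sig').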
I'm creating a bot that posts COVID numbers every 60 minutes. It's finished, but I don't know how to make it repeat. Any ideas? It's a small project I have in mind, and this is the last thing I need to do. (It would be great if an answer included the code from this post with the solution in it.)
import sys
CONSUMER_KEY = 'XXXX'
CONSUMER_SECRET = 'XXXX'
ACCESS_TOKEN = 'XXXX'
ACCESS_TOKEN_SECRET = 'XXXX'
import tweepy
import requests
from lxml import html

def create_tweet():
    # Scrape the three headline counters from the page
    response = requests.get('https://www.worldometers.info/coronavirus/')
    doc = html.fromstring(response.content)
    total, deaths, recovered = doc.xpath('//div[@class="maincounter-number"]/span/text()')
    tweet = f'''Coronavirus Latest Updates
Total cases: {total}
Recovered: {recovered}
Deaths: {deaths}
Source: https://www.worldometers.info/coronavirus/
#coronavirus #covid19 #coronavirusnews #coronavirusupdates #COVID19
'''
    return tweet

if __name__ == '__main__':
    auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
    auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
    # Create API object
    api = tweepy.API(auth)
    try:
        api.verify_credentials()
        print('Authentication Successful')
    except Exception:
        print('Error while authenticating API')
        sys.exit(5)
    tweet = create_tweet()
    api.update_status(tweet)
    print('Tweet successful')
You can simply add this statement at the end of the code:
from time import sleep
sleep(3600)
If you want it to run endlessly, you can do it like this:
while True:
    # insert your main code here
    sleep(3600)
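One caveat with a fixed sleep is that the time the job itself takes is added to every cycle, so the posting time slowly drifts. A minimal sketch of one way around that, assuming the job runs faster than the interval, is to sleep until a precomputed deadline instead. The name run_every and its clock, sleep and iterations parameters are hypothetical, added only so the loop can be tested without waiting.

```python
import time


def run_every(interval, job, iterations=None,
              clock=time.monotonic, sleep=time.sleep):
    """Call job() every `interval` seconds, measured from the first call.

    Deadlines are computed from the start time, so the duration of job()
    does not accumulate. iterations=None means run forever.
    """
    next_run = clock()
    done = 0
    while iterations is None or done < iterations:
        delay = next_run - clock()
        if delay > 0:
            sleep(delay)  # wait only for the time that is still left
        job()
        done += 1
        next_run += interval


# Usage (commented out because it would block for hours):
# run_every(3600, lambda: api.update_status(create_tweet()))
```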
You might want to use a scheduler. Python already has a built-in one (sched); read more about it here: https://docs.python.org/3/library/sched.html
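As a rough illustration of the sched approach, the sketch below has an event re-enter itself after each run. The function name run_repeatedly and the repeats parameter are hypothetical; timefunc and delayfunc are passed through to sched.scheduler, whose constructor accepts them.

```python
import sched
import time


def run_repeatedly(action, interval, repeats,
                   timefunc=time.monotonic, delayfunc=time.sleep):
    """Call action() `repeats` times, `interval` seconds apart, using sched."""
    scheduler = sched.scheduler(timefunc, delayfunc)
    remaining = [repeats]

    def tick():
        action()
        remaining[0] -= 1
        if remaining[0] > 0:
            # Schedule the next run relative to the current time
            scheduler.enter(interval, 1, tick)

    scheduler.enter(0, 1, tick)  # first run happens immediately
    scheduler.run()              # blocks until the queue is empty


# Usage (commented out because it would block for hours):
# run_repeatedly(lambda: api.update_status(create_tweet()), 3600, 24)
```

Because each run is scheduled relative to the time the previous one finished, this variant drifts by the job's duration, just like a plain sleep loop.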
You have to use a timer like the one below.
(The function refresh reschedules itself every hour.)
from threading import Timer
def refresh(delay_repeat=3600): # 3600 sec for one hour
    # Your refresh script here
    # ....
    # timer
    Timer(delay_repeat, refresh, (delay_repeat, )).start()
Have a look at my runnable refresh graph here to have an idea :
How can I dynamically update my matplotlib figure as the data file changes?
You can use a task scheduler in Python. "schedule" is a library that you can install using pip or conda. It lets you rerun a function at a given interval (hourly, daily, weekly, etc.).
First, install the library using pip:
pip install schedule
Second, put your code in a function. For example:
def theCode():
    print("Do Something")
Third, set the schedule:
import time
import schedule
schedule.every(2).hours.do(theCode)
while 1:
    schedule.run_pending()
    time.sleep(10)
I am using time.sleep(3600), but it is tweeting more often than every 3600 seconds. Why is this happening?
Currently it is tweeting at 9 minutes past the hour, then at 32 minutes past.
Edit:
Here is the code. The only other reason I can think of is that it may accidentally be running in multiple instances. I will check that.
# tweepy will allow us to communicate with Twitter, time will allow us to set how often we tweet
import tweepy, time
#enter the corresponding information from your Twitter application management:
CONSUMER_KEY = 'mykey' #keep the quotes, replace this with your consumer key
CONSUMER_SECRET = 'mykey' #keep the quotes, replace this with your consumer secret key
ACCESS_TOKEN = 'my-my' #keep the quotes, replace this with your access token
ACCESS_SECRET = 'mykey' #keep the quotes, replace this with your access token secret
# configure our access information for reaching Twitter
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)
# access Twitter!
api = tweepy.API(auth)
# open our content file and read each line
filename=open('content.txt')
f=filename.readlines()
filename.close()
# for each line in our contents file, lets tweet that line out except when we hit a error
for line in f:
    try:
        api.update_status(line)
        print("Tweeting!")
    except tweepy.TweepError as err:
        print(err)
    time.sleep(3600) #Tweet every hour
print("All done tweeting!")
This may be caused by your module not being protected from running when it is imported.
That means every time your module is imported (which could happen with from package import *), your code is executed and a new loop is started.
You can ensure your code runs only when you want it to like this:
Make a function from your code; let's name it main().
You can then check whether your module is being run as a script.
def main():
    # tweepy will allow us to communicate with Twitter, time will allow us to set how often we tweet
    import tweepy, time
    #enter the corresponding information from your Twitter application management:
    CONSUMER_KEY = 'mykey' #keep the quotes, replace this with your consumer key
    CONSUMER_SECRET = 'mykey' #keep the quotes, replace this with your consumer secret key
    ACCESS_TOKEN = 'my-my' #keep the quotes, replace this with your access token
    ACCESS_SECRET = 'mykey' #keep the quotes, replace this with your access token secret
    # configure our access information for reaching Twitter
    auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
    auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)
    # access Twitter!
    api = tweepy.API(auth)
    # open our content file and read each line
    filename=open('content.txt')
    f=filename.readlines()
    filename.close()
    # for each line in our contents file, lets tweet that line out except when we hit a error
    for line in f:
        try:
            api.update_status(line)
            print("Tweeting!")
        except tweepy.TweepError as err:
            print(err)
        time.sleep(3600) #Tweet every hour
    print("All done tweeting!")

if __name__ == "__main__":
    main()
If you have to use your code from another script, you can use
from your_module import main
main()
Or from a command line :
python -m your_module
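Since the question also raises the possibility of several instances running at once, a simple lock file is one way to rule that out. This is only a sketch under stated assumptions: acquire_lock and release_lock are hypothetical helpers, the lock path is up to you, and a crash can leave a stale lock file behind that has to be removed by hand.

```python
import os


def acquire_lock(path):
    """Create the lock file atomically.

    Returns a file descriptor on success, or None if the file already
    exists (which suggests another instance is running).
    """
    try:
        # O_CREAT | O_EXCL makes the call fail if the file already exists
        return os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return None


def release_lock(fd, path):
    """Close and delete the lock file so the next run can start."""
    os.close(fd)
    os.remove(path)
```

In the script above, main() would be wrapped so that it only runs when acquire_lock returns a descriptor, with release_lock called in a finally block.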
I am currently doing some research using sentiment analysis on Twitter data regarding a certain topic (not necessarily important to this question) using Python, in which I am a beginner. I understand that the Twitter streaming API limits users to the previous 7 days unless you apply for full enterprise search, which opens up the whole archive. I was recently given access to the full archive for this research project by Twitter, but I am unable to specify a start and end date for the tweets I would like to stream into a CSV file. This is my code:
import pandas as pd
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
ckey = 'xxxxxxxxxxxxxxxxxxxxxxx'
csecret = 'xxxxxxxxxxxxxxxxxxxxxxx'
atoken = 'xxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxx'
asecret = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxx'
# =============================================================================
# def sentimentAnalysis(text):
# output = '0'
# return output
# =============================================================================
class listener(StreamListener):
    def on_data(self, data):
        tweet = data.split(',"text":"')[1].split('","source')[0]
        saveMe = tweet+'::'+'\n'
        output = open('output.csv','a')
        output.write(saveMe)
        output.close()
        return True
    def on_error(self, status):
        print(status)
auth = OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
twitterStream = Stream(auth, listener())
twitterStream.filter(track=["#weather"], languages = ["en"])
Now this code streams Twitter data from the past 7 days perfectly. I tried changing the bottom line to
twitterStream.filter(track=["#weather"], languages = ["en"], since = ["2016-06-01"])
but this returns the error: filter() got an unexpected keyword argument 'since'.
What would be the correct way to filter by a given date range?
Tweepy's filter() does not provide a "since" argument, as you can check yourself here.
To achieve the desired output, you will have to use api.user_timeline, iterating through pages until the desired date is reached, e.g.:
import tweepy
import datetime
import time
# The consumer keys can be found on your application's Details
# page located at https://dev.twitter.com/apps (under "OAuth settings")
consumer_key=""
consumer_secret=""
# The access tokens can be found on your applications's Details
# page located at https://dev.twitter.com/apps (located
# under "Your access token")
access_token=""
access_token_secret=""
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
# username, YEAR, MONTH and DAY are placeholders: set them yourself.
# Assumes tweet.created_at is a naive datetime, as in tweepy 3.x.
cutoff = datetime.datetime(YEAR, MONTH, DAY)
page = 1
stop_loop = False
while not stop_loop:
    tweets = api.user_timeline(username, page=page)
    if not tweets:
        break
    for tweet in tweets:
        # The timeline is returned newest first, so stop once tweets are older than the cutoff
        if tweet.created_at < cutoff:
            stop_loop = True
            break
        # Do the tweet process here
    page+=1
    time.sleep(500)
Note that you will need to update the code to fit your needs, this is just a general solution.
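Whichever endpoint supplies the tweets, the date check itself can be done on the raw JSON. The sketch below parses a payload with the json module (rather than splitting the string, as the listener in the question does) and tests whether it falls inside a date window. It assumes the v1.1 payload shape with "text" and "created_at" fields; parse_tweet and in_window are hypothetical helper names.

```python
import json
from datetime import datetime, timezone

# Format of "created_at" in v1.1 payloads, e.g. "Wed Jun 01 12:00:00 +0000 2016"
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_tweet(raw):
    """Return (text, created_at) from a raw JSON tweet string."""
    payload = json.loads(raw)
    created = datetime.strptime(payload["created_at"], TWITTER_TIME_FORMAT)
    return payload["text"], created


def in_window(created, start, end):
    """True if `created` falls in the half-open interval [start, end)."""
    return start <= created < end


# Example window: all of June 2016, in UTC
START = datetime(2016, 6, 1, tzinfo=timezone.utc)
END = datetime(2016, 7, 1, tzinfo=timezone.utc)
```

Parsing with json also keeps commas and quotes inside the tweet text intact, which the split-based approach does not.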
I'm trying to use tweepy to reply to certain tweets, and my reply includes an image.
The twt variable holds the tweet I'm trying to reply to.
Here's what I'm doing at the moment:
# -*- coding: utf-8 -*-
import tweepy, time, random
CONSUMER_KEY = 'XXXX'
CONSUMER_SECRET = 'XXXX'
ACCESS_KEY = 'XXXX'
ACCESS_SECRET = 'XXXX'
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_KEY, ACCESS_SECRET)
api = tweepy.API(auth)
query = ['aaa', 'bbb', 'ccc']
t0 = time.time()
count = 0
last_count = 0
f = open('last_replied.txt')
last_replied = int(f.readline().strip())
f.close()
print('starting time:', time.strftime('%X'))
while True:
    if count > last_count:
        print(time.strftime('%X'), ':', count, 'replies')
        last_count = count
    for i in range(3):
        twts = api.search(query[i], since_id=last_replied)
        if len(twts)>0:
            for twt in twts:
                sid = twt.id
                sn = twt.user.screen_name
                stat = "lalala" + "@" + sn
                api.update_with_media('oscar1.gif',status=stat,in_reply_to_status_id=sid)
                count += 1
                last_replied = twt.id
                # remember the last tweet we replied to
                f = open('last_replied.txt','w')
                f.write(str(last_replied))
                f.close()
    pause = random.randint(50,90)
    time.sleep(pause)
My tweet gets posted, but not as a reply to the original tweet (twt). Instead, it just gets posted as a new, independent tweet.
However, when I use update_status instead of update_with_media as above, such as:
api.update_status(status=stat,in_reply_to_status_id=sid)
my new tweet does get posted as a reply to the original tweet (twt).
What am I missing?
Thanks
The solution I ended up with was switching to the twython module, which has this functionality well documented and working perfectly.
Thank you very much for your help.
The update_with_media endpoint is being deprecated by Twitter (https://dev.twitter.com/rest/reference/post/statuses/update_with_media), so you shouldn't use it. Instead, upload the media first with the media_upload method and pass the media_ids you get back to the update_status method.
I glanced at the code for update_with_media and don't see any obvious bugs, but the new method of handling tweets with media was added to the Tweepy API in January - note the last two bullets here:
https://github.com/tweepy/tweepy/releases/tag/v3.2.0
If you download Tweepy 3.2.0 you should be able to switch over to the new regimen, which will hopefully fix your reply_to problem. (I can't say for sure if the new stuff works, I'm using an older version of Tweepy myself.)
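The two-step approach described in the answers above could look roughly like the sketch below. It assumes a tweepy 3.x style API object whose media_upload returns an object with a media_id attribute and whose update_status accepts media_ids; reply_with_media is a hypothetical wrapper name, and whether this fixes the reply threading has not been verified here.

```python
def reply_with_media(api, filename, status, reply_to_id):
    """Upload an image, then post `status` as a reply with the image attached.

    `api` is expected to behave like tweepy.API (an assumption): the media
    is uploaded first and referenced by id in the status update.
    """
    media = api.media_upload(filename)
    return api.update_status(
        status=status,
        in_reply_to_status_id=reply_to_id,
        media_ids=[media.media_id],
    )


# Usage inside the loop from the question (commented out, needs credentials):
# reply_with_media(api, 'oscar1.gif', stat, sid)
```

As in the question, the status text should mention the author of the original tweet so that the post can be treated as a reply.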