I am using tweepy to stream tweets and store them in a file, with Python and Flask. Clicking a button starts the stream. What I want is for a second button click to stop the stream.
I know about the answers that rely on a counter variable, but I don't want to fetch a fixed number of tweets.
Thanks in advance
def fetch_tweet():
    page_type = "got"
    lang_list = request.form.getlist('lang')
    print lang_list
    #return render_template("index.html",lang_list=lang_list,page_type=page_type)
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)
    with open('hashtags.txt') as hf:
        hashtag = [line.strip() for line in hf]
    print hashtag
    print request.form.getlist('fetchbtn')
    if (request.form.getlist('stopbtn')) == ['stop']:
        print "inside stop"
        stream.disconnect()
        return render_template("analyse.html")
    elif (request.form.getlist('fetchbtn')) == ['fetch']:
        stream.filter(track=lang_list, async=True)
        return render_template("fetching.html")
So I'm assuming your initial button links to the initializing of a tweepy stream (i.e. a call to stream.filter()).
If you're going to allow your application to run while tweet collection is happening, you'll need to collect tweets asynchronously (threaded). Otherwise once you call stream.filter() it will lock your program up while it collects tweets until it either reaches some condition you have provided it or you ctrl-c out, etc.
To take advantage of tweepy's built-in threading, you simply need to add the async parameter to your stream.filter() call (Tweepy 3.7 and later renamed it to is_async, because async became a reserved word in Python 3.7), like so:
stream.filter(track=['your_terms'], async=True)
This will thread your tweet collection and allow your application to continue to run.
Finally, to stop your tweet collection, link a flask call to a function that calls disconnect() on your stream object, like so:
stream.disconnect()
This will disconnect your stream and stop tweet collection. Here is an example of this exact approach in a more object oriented design (see the gather() and stop() methods in the Pyckaxe object).
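As a rough illustration of the start/stop pattern described above, here is a minimal sketch that runs without Twitter credentials. FakeStream and its filter_async() method are hypothetical stand-ins for a tweepy stream started with the async option; they are not tweepy's implementation, only a model of "collect in a background thread until disconnect() is called".

```python
import threading
import time


class FakeStream(object):
    """Hypothetical stand-in for a tweepy stream; it produces fake tweets."""

    def __init__(self):
        self._stop_flag = threading.Event()
        self._thread = None
        self.received = []

    def _run(self):
        # Keep "receiving" until disconnect() sets the stop flag.
        n = 0
        while not self._stop_flag.is_set():
            self.received.append("tweet %d" % n)
            n += 1
            time.sleep(0.01)

    def filter_async(self):
        # Start collection in a background thread and return immediately.
        self._stop_flag.clear()
        self._thread = threading.Thread(target=self._run)
        self._thread.daemon = True
        self._thread.start()

    def disconnect(self):
        # Signal the worker to stop and wait for it to finish.
        self._stop_flag.set()
        if self._thread is not None:
            self._thread.join()


stream = FakeStream()
stream.filter_async()   # returns at once; collection continues in the background
time.sleep(0.05)        # the application is free to do other work here
stream.disconnect()     # what the "stop" button handler would call
count_at_stop = len(stream.received)
time.sleep(0.05)        # nothing more arrives after disconnect()
```

The important detail is that disconnect() is called on the same object that was started, which is why the stream has to outlive a single request.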
EDIT - Ok, I can see your issue now, but I'm going to leave my original answer up for others who might find this. Your issue is where you are creating your stream object.
Every time fetch_tweet() gets called via flask, you are creating a new stream object, so when you call it the first time to start your stream it creates an initial object, but the second time it calls disconnect() on a different stream object. Creating a single instance of your stream will solve the issue:
l = StdOutListener()
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
stream = Stream(auth, l)
def fetch_tweet():
    lang_list = request.form.getlist('lang')
    with open('hashtags.txt') as hf:
        hashtag = [line.strip() for line in hf]
    print hashtag
    print request.form.getlist('fetchbtn')
    if (request.form.getlist('stopbtn')) == ['stop']:
        print "inside stop"
        stream.disconnect()
        return render_template("analyse.html")
    elif (request.form.getlist('fetchbtn')) == ['fetch']:
        stream.filter(track=lang_list, async=True)
        return render_template("fetching.html")
Long story short, you need to create your stream object outside of fetch_tweet(). Good luck!
I am using the code time.sleep(3600), but it is tweeting more often than every 3600 seconds. Why is this happening?
Currently it is tweeting at 9 minutes past, then 32 minutes past.
Edit:
Here is the code. The only other reason this could be happening is that this may be running in multiple instances accidentally. I will check that.
# tweepy will allow us to communicate with Twitter, time will allow us to set how often we tweet
import tweepy, time
#enter the corresponding information from your Twitter application management:
CONSUMER_KEY = 'mykey' #keep the quotes, replace this with your consumer key
CONSUMER_SECRET = 'mykey' #keep the quotes, replace this with your consumer secret key
ACCESS_TOKEN = 'my-my' #keep the quotes, replace this with your access token
ACCESS_SECRET = 'mykey' #keep the quotes, replace this with your access token secret
# configure our access information for reaching Twitter
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)
# access Twitter!
api = tweepy.API(auth)
# open our content file and read each line
filename=open('content.txt')
f=filename.readlines()
filename.close()
# for each line in our contents file, lets tweet that line out except when we hit a error
for line in f:
    try:
        api.update_status(line)
        print("Tweeting!")
    except tweepy.TweepError as err:
        print(err)
    time.sleep(3600) #Tweet every hour
print("All done tweeting!")
This may be caused by your module not being protected from running when it is imported.
That means every time your module is imported (which could happen through from package import *), your code is executed and a new loop is started.
You can make sure your code runs only when you want it to: turn it into a function, let's name it main(), and then check whether your module is being run as a script.
def main():
    # tweepy will allow us to communicate with Twitter, time will allow us to set how often we tweet
    import tweepy, time
    #enter the corresponding information from your Twitter application management:
    CONSUMER_KEY = 'mykey' #keep the quotes, replace this with your consumer key
    CONSUMER_SECRET = 'mykey' #keep the quotes, replace this with your consumer secret key
    ACCESS_TOKEN = 'my-my' #keep the quotes, replace this with your access token
    ACCESS_SECRET = 'mykey' #keep the quotes, replace this with your access token secret
    # configure our access information for reaching Twitter
    auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
    auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)
    # access Twitter!
    api = tweepy.API(auth)
    # open our content file and read each line
    filename=open('content.txt')
    f=filename.readlines()
    filename.close()
    # for each line in our contents file, lets tweet that line out except when we hit a error
    for line in f:
        try:
            api.update_status(line)
            print("Tweeting!")
        except tweepy.TweepError as err:
            print(err)
        time.sleep(3600) #Tweet every hour
    print("All done tweeting!")
if __name__ == "__main__":
    main()
If you have to use your code from another script, you can use
from your_module import main
main()
Or from a command line :
python -m your_module
I made a Python script which uses the tweepy streaming module to stream mentions of a Twitter account and carry out some functions based on the status text.
I want it to stream until a mention arrives, then stop streaming, carry out some functions based on the status text, and start streaming again.
This is my code:
class StdOutListener(tweepy.StreamListener):
    def on_data(self, data):
        tweet = json.loads(data.strip())
        global d
        d=tweet
        return False #stops streaming after a tweet is fed to it
    def on_error(self, status_code):
        print(status_code)
        time.sleep(120)
        return True # To continue listening
    def on_timeout(self):
        time.sleep(120)
        return True # To continue listening
while True:
    d={}
    listener = StdOutListener()
    stream = tweepy.Stream(twitter_auth(tokens), listener)
    stream.filter(track=['#xxx'])
    stream.disconnect()
    doSomething(d)
But it only works for one loop and then shows 420 (Exceeding Rate Limit) errors, even though I take in only a single tweet (per stream, if I'm not wrong).
Can anyone please explain what I'm doing wrong? And also, when should we use async mode in a tweepy stream listener?
PROBLEM SOLVED, SEE SOLUTION AT THE END OF THE POST
I need help to estimate running time for my tweepy program calling Twitter Stream API with location filter.
After I kicked it off, it has run for over 20 minutes, which is longer than I expected. I am new to the Twitter Stream API, and have only worked with the REST API for a couple of days. It looks to me that the REST API will give me 50 tweets in a few seconds, easily. But this Stream request is taking a lot more time. My program hasn't died on me or given any error, so I don't know if there's anything wrong with it. If there is, please do point it out.
In conclusion, if you think my code is correct, could you provide an estimate for the running time? If you think my code is wrong, could you help me to fix it?
Thank you in advance!
Here's the code:
# Import Tweepy, sys, sleep, credentials.py
import tweepy, sys
from time import sleep
from credentials import *
# Access and authorize our Twitter credentials from credentials.py
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
box = [-86.33,41.63,-86.20,41.74]
class CustomStreamListener(tweepy.StreamListener):
    def on_error(self, status_code):
        print >> sys.stderr, 'Encountered error with status code:', status_code
        return True # Don't kill the stream
    def on_timeout(self):
        print >> sys.stderr, 'Timeout...'
        return True # Don't kill the stream
stream = tweepy.streaming.Stream(auth, CustomStreamListener()).filter(locations=box).items(50)
stream
I tried the method from http://docs.tweepy.org/en/v3.4.0/auth_tutorial.html#auth-tutorial Apparently it is not working for me... Here is my code below. Would you mind giving any input? Let me know if you have some working code. Thanks!
# Import Tweepy, sys, sleep, credentials.py
import tweepy, sys
from time import sleep
from credentials import *
# Access and authorize our Twitter credentials from credentials.py
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
# Assign coordinates to the variable
box = [-74.0,40.73,-73.0,41.73]
import tweepy
#override tweepy.StreamListener to add logic to on_status
class MyStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        print(status.text)
    def on_error(self, status_code):
        if status_code == 420:
            #returning False in on_data disconnects the stream
            return False
myStreamListener = MyStreamListener()
myStream = tweepy.Stream(auth = api.auth, listener=myStreamListener())
myStream.filter(track=['python'], locations=(box), async=True)
Here is the error message:
Traceback (most recent call last):
File "test.py", line 26, in <module>
myStream = tweepy.Stream(auth = api.auth, listener=myStreamListener())
TypeError: 'MyStreamListener' object is not callable
PROBLEM SOLVED! SEE SOLUTION BELOW
After another round of debug, here is the solution for one who may have interest in the same topic:
# Import Tweepy, sys, sleep, credentials.py
try:
    import json
except ImportError:
    import simplejson as json
import tweepy, sys
from time import sleep
from credentials import *
# Access and authorize our Twitter credentials from credentials.py
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
# Assign coordinates to the variable
box = [-74.0,40.73,-73.0,41.73]
import tweepy
#override tweepy.StreamListener to add logic to on_status
class MyStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        print(status.text.encode('utf-8'))
    def on_error(self, status_code):
        if status_code == 420:
            #returning False in on_data disconnects the stream
            return False
myStreamListener = MyStreamListener()
myStream = tweepy.Stream(api.auth, listener=myStreamListener)
myStream.filter(track=['NYC'], locations=(box), async=True)
Core Problem:
I think you're misunderstanding what the Stream is here.
Tl;dr: Your code is working, you're just not doing anything with the data that gets back.
The REST API call is a single call for information. You make a request, Twitter sends back some information, and that gets assigned to your variable.
The Stream object from Tweepy (which you've created as stream) opens a connection to Twitter with your search parameters, and Twitter, well, streams Tweets to it. Forever.
From the Tweepy docs:
The streaming api is quite different from the REST api because the
REST api is used to pull data from twitter but the streaming api
pushes messages to a persistent session. This allows the streaming api
to download more data in real time than could be done using the REST
API.
So, you need to build a handler (a StreamListener, in tweepy's terminology), such as one that prints out the tweets.
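As a rough sketch of what such a handler does (this is not tweepy itself): the hypothetical CountingListener below receives each pushed item through on_status and returns False once it has seen a fixed number of tweets, which is the signal a tweepy 3.x listener uses to ask the stream to disconnect. The fake_stream function is only a stand-in for the live connection so the example runs without credentials; with the real library the callback receives a Status object and you would read status.text.

```python
class CountingListener(object):
    """Hypothetical listener: prints each tweet and stops after `limit`."""

    def __init__(self, limit):
        self.limit = limit
        self.texts = []

    def on_status(self, text):
        self.texts.append(text)
        print(text)
        # Returning False asks the stream to stop; True keeps it going.
        return len(self.texts) < self.limit


def fake_stream(listener, source):
    """Stand-in for the streaming connection: pushes items to the listener."""
    for text in source:
        if listener.on_status(text) is False:
            break


listener = CountingListener(limit=3)
# The source could be endless; the listener decides when to stop.
fake_stream(listener, ("tweet %d" % i for i in range(1000)))
```

How long it takes to collect 50 tweets this way depends entirely on how often matching tweets are posted, which is why a small bounding box can run for a long time.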
Additional
Word of warning, from bitter experience - if you're going to try and save the tweets to a database: Twitter can, and will, stream objects to you much faster than you can save them to the database. This will result in your Stream being disconnected, because the tweets back up at Twitter, and over a certain level of backed-up-ness (not an actual phrase), they'll just disconnect you.
I handled this by using django-rq to put save jobs into a job queue. This way, I could handle hundreds of tweets a second (at peak), and it would smooth out. You can see how I did this below. Python-rq would also work if you're not using Django as a framework around it. The read_both function simply reads from the tweet and saves it to a Postgres database. In my specific case, I did that via the Django ORM, and queued the call with the django_rq.enqueue function.
__author__ = 'iamwithnail'
from django.core.management.base import BaseCommand, CommandError
from django.db.utils import DataError
from harvester.tools import read_both
import django_rq
class Command(BaseCommand):
    args = '<search_string search_string>'
    help = "Opens a listener to the Twitter stream, and tracks the given string or list " \
           "of strings, saving them down to the DB as they are received."
    def handle(self, *args, **options):
        try:
            import urllib3.contrib.pyopenssl
            urllib3.contrib.pyopenssl.inject_into_urllib3()
        except ImportError:
            pass
        consumer_key = '***'
        consumer_secret = '****'
        access_token='****'
        access_token_secret_var='****'
        import tweepy
        import json
        # This is the listener, responsible for receiving data
        class StdOutListener(tweepy.StreamListener):
            def on_data(self, data):
                decoded = json.loads(data)
                try:
                    if decoded['lang'] == 'en':
                        # hand the save off to the job queue so on_data returns quickly
                        django_rq.enqueue(read_both, decoded)
                    else:
                        pass
                except KeyError,e:
                    print "Error on Key", e
                except DataError, e:
                    print "DataError", e
                return True
            def on_error(self, status):
                print status
        l = StdOutListener()
        auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
        auth.set_access_token(access_token, access_token_secret_var)
        stream = tweepy.Stream(auth, l)
        stream.filter(track=args)
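For readers who are not using Django, the same decoupling idea can be sketched with only the Python 3 standard library. This is a simplified illustration under stated assumptions, not a replacement for a real job queue: save_tweet, the saved list, and the module-level on_data function are hypothetical stand-ins, and an in-process thread gives none of the persistence that django-rq or python-rq provide.

```python
import queue
import threading

save_queue = queue.Queue()
saved = []  # stands in for the database table


def save_tweet(tweet):
    # Hypothetical slow save; a real version would write to the database.
    saved.append(tweet)


def worker():
    # Drain the queue until the None sentinel arrives.
    while True:
        tweet = save_queue.get()
        if tweet is None:
            save_queue.task_done()
            break
        save_tweet(tweet)
        save_queue.task_done()


def on_data(decoded):
    # Called for every incoming tweet: enqueue and return quickly,
    # so the connection is never blocked by a slow save.
    if decoded.get("lang") == "en":
        save_queue.put(decoded)
    return True


t = threading.Thread(target=worker)
t.start()
for i in range(5):
    on_data({"lang": "en" if i % 2 == 0 else "fr", "text": "tweet %d" % i})
save_queue.put(None)  # tell the worker to finish
t.join()
```

The listener callback only enqueues, so bursts of tweets pile up in the queue rather than backing up at Twitter's end.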
Edit: Your subsequent problem is caused by calling the listener wrongly.
myStreamListener = MyStreamListener() #creates an instance of your class
Where you have this:
myStream = tweepy.Stream(auth = api.auth, listener=myStreamListener())
You're trying to call the listener as a function when you use the (). So it should be:
myStream = tweepy.Stream(auth = api.auth, listener=myStreamListener)
And in fact, can probably just be more succinctly written as:
myStream = tweepy.Stream(api.auth,myStreamListener)
I'm using Flask and Tweepy to search for live tweets. On the front end I have a text input and a button called "Search". Ideally, when a user enters a search term and clicks the "Search" button, Tweepy should start listening for the new search term and stop the stream for the previous one. When the "Search" button is clicked, it executes this function:
@app.route('/search', methods=['POST'])
# gets search-keyword and starts stream
def streamTweets():
    search_term = request.form['tweet']
    search_term_hashtag = '#' + search_term
    # instantiate listener
    listener = StdOutListener()
    # stream object uses listener we instantiated above to listen for data
    stream = tweepy.Stream(auth, listener)
    if stream is not None:
        print "Stream disconnected..."
        stream.disconnect()
    stream.filter(track=[search_term or search_term_hashtag], async=True)
    redirect('/stream') # execute '/stream' sse
    return render_template('index.html')
The /stream route that is executed in the second to last line in above code is as follows:
@app.route('/stream')
def stream():
    # we will use Pub/Sub process to send real-time tweets to client
    def event_stream():
        # instantiate pubsub
        pubsub = red.pubsub()
        # subscribe to tweet_stream channel
        pubsub.subscribe('tweet_stream')
        # initiate server-sent events on messages pushed to channel
        for message in pubsub.listen():
            yield 'data: %s\n\n' % message['data']
    return Response(stream_with_context(event_stream()), mimetype="text/event-stream")
My code works fine, in the sense that it starts a new stream and searches for a given term whenever the "Search" button is clicked, but it does not stop the previous search. For example, if my first search term was "NYC" and then I wanted to search for a different term, say "Los Angeles", it will give me results for both "NYC" and "Los Angeles", which is not what I want. I want just "Los Angeles" to be searched. How do I fix this? In other words, how do I stop the previous stream? I looked through other previous threads, and I know I have to use stream.disconnect(), but I'm not sure how to implement this in my code. Any help or input would be greatly appreciated. Thanks so much!!
Below is some code that will cancel old streams when a new stream is created. It works by adding new streams to a global list, and then calling stream.disconnect() on all streams in the list whenever a new stream is created.
diff --git a/app.py b/app.py
index 1e3ed10..f416ddc 100755
--- a/app.py
+++ b/app.py
@@ -23,6 +23,8 @@ auth.set_access_token(access_token, access_token_secret)
 app = Flask(__name__)
 red = redis.StrictRedis()
+# Add a place to keep track of current streams
+streams = []
 @app.route('/')
 def index():
@@ -32,12 +34,18 @@ def index():
 @app.route('/search', methods=['POST'])
 # gets search-keyword and starts stream
 def streamTweets():
+    # cancel old streams
+    for stream in streams:
+        stream.disconnect()
+
     search_term = request.form['tweet']
     search_term_hashtag = '#' + search_term
     # instantiate listener
     listener = StdOutListener()
     # stream object uses listener we instantiated above to listen for data
     stream = tweepy.Stream(auth, listener)
+    # add this stream to the global list
+    streams.append(stream)
     stream.filter(track=[search_term or search_term_hashtag],
                   async=True) # make sure stream is non-blocking
     redirect('/stream') # execute '/stream' sse
What this does not solve is the problem of session management. With your current setup a search by one user will affect the searches of all users. This can be avoided by giving your users some identifier and storing their streams along with their identifier. The easiest way to do this is likely to use Flask's session support. You could also do this with a requestId as Pierre suggested. In either case you will also need code to notice when a user has closed the page and close their stream.
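One way the per-user bookkeeping could look is sketched below. Everything here is an assumption made for illustration: FakeStream stands in for tweepy.Stream, streams_by_user and start_search are hypothetical names, and in a Flask app the user id might be a value you keep in Flask's session object.

```python
class FakeStream(object):
    """Hypothetical stand-in for tweepy.Stream."""

    def __init__(self, term):
        self.term = term
        self.running = False

    def start(self):
        self.running = True

    def disconnect(self):
        self.running = False


streams_by_user = {}  # user id -> that user's current stream


def start_search(user_id, term):
    # Stop only this user's previous stream, if there is one.
    old = streams_by_user.pop(user_id, None)
    if old is not None:
        old.disconnect()
    stream = FakeStream(term)
    stream.start()
    streams_by_user[user_id] = stream
    return stream


nyc = start_search("alice", "NYC")
chicago = start_search("bob", "Chicago")
la = start_search("alice", "Los Angeles")  # replaces alice's NYC stream only
```

With this layout a new search by one user leaves other users' streams untouched. Cleaning up when a user closes the page still needs separate handling.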
Disclaimer: I know nothing about Tweepy, but this appears to be a design issue.
Are you trying to add state to a RESTful API? You may have a design problem.
As JRichardSnape answered, your API shouldn't be the one taking care of canceling a request; it should be done in the front-end. What I mean here is in the javascript / AJAX / etc calling this function, add another call, to the new function
@app.route('/cancelSearch', methods=['POST'])
With the "POST" that has the search terms. So long as you don't have state, you can't really do this safely in an async call: Imagine someone else makes the same search at the same time then canceling one will cancel both (remember, you don't have state so you don't know who you're canceling). Perhaps you do need state with your design.
If you must keep using this and don't mind breaking the "stateless" rule, then add a "state" to your request. In this case it's not so bad because you could launch a thread and name it with the userId, then kill the thread every new search
def streamTweets():
    search_term = request.form['tweet']
    userId = request.form['userId'] # If your limit is one request per user at a time. If multiple windows can be opened and you want to follow this limit, store userId in a cookie.
    #Look for any request currently running with this ID, and cancel them
Alternatively, you could return a requestId, which you would then keep in the front end and pass back by calling cancelSearch?requestId=$requestId. In cancelSearch, you would have to find the pending request (it sounds like that lives in tweepy, since you're not using your own threads) and disconnect it.
Out of curiosity I just watched what happens when you search on Google, and it uses a GET request. Have a look (debug tools -> Network; then enter some text and see the autofill). Google uses a token sent with every request (every time you type something). That doesn't mean the token is used for this purpose, but it is basically what I described. If you don't want a session, then use a unique identifier.
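A minimal sketch of the requestId idea, assuming the server keeps a dictionary of pending searches. The names pending, start_search and cancel_search are hypothetical, FakeStream is a stand-in for the real stream object, and uuid is used only as one convenient way to generate an identifier.

```python
import uuid


class FakeStream(object):
    """Hypothetical stand-in for the running stream."""

    def __init__(self):
        self.running = True

    def disconnect(self):
        self.running = False


pending = {}  # request id -> stream started for that request


def start_search():
    # Create the stream and hand its id back to the front end.
    request_id = uuid.uuid4().hex
    pending[request_id] = FakeStream()
    return request_id


def cancel_search(request_id):
    # Disconnect the matching stream; report whether one was found.
    stream = pending.pop(request_id, None)
    if stream is None:
        return False
    stream.disconnect()
    return True


first = start_search()
second = start_search()
first_stream = pending[first]
cancelled = cancel_search(first)
cancelled_again = cancel_search(first)  # already gone, so nothing to cancel
```

Because each request has its own id, two users making the same search at the same time cannot cancel each other.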
Well, I solved it by using a timer, but I'm still looking for a more Pythonic way.
from streamer import StreamListener
def stream():
    hashtag = input
    #assign each user an ID ( for pubsub )
    StreamListener.userid = random_user_id
    def handler(signum, frame):
        print("Forever is over")
        raise Exception("end of time")
    def main_stream():
        stream = tweepy.Stream(auth, StreamListener())
        stream.filter(track=track,async=True)
        redirect(url_for('map_stream'))
    def close_stream():
        # this is for closing client list in redis but don't know it's working
        obj = redis.client_list(tweet_stream)
        redis_client_list = obj[0]['addr']
        redis.client_kill(redis_client_list)
        stream = tweepy.Stream(auth, StreamListener())
        stream.disconnect()
    import signal
    signal.signal(signal.SIGALRM, handler)
    signal.alarm(300)
    try:
        main_stream()
    except Exception:
        close_stream()
        print("function terminate")
I'm trying to use tweepy to build a dataset of tweets. Right now, I have the stream running for a single search term but I would like to use the library to search for different queries at the same time. I know I am able to supply the twitterStream.filter function with a list instead of just the "Disney" term, however I am not sure how I can see which tweet is a result to which term returned in this case.
What would be a good extension of the following code to search for ["Disney", "Pandabears", "Polarbears"] instead of just "Disney" and know which query returned the hit?
I can think of two ways to do this in principle:
1: Search the resulting tweet for the search terms and tag them accordingly. However, this doesn't really solve the problem as a tweet might contain two of the search terms. Described here
2: Run as many of the streams as there are search terms. However, I'm not sure the API allows the same app to have multiple active streams at once?
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
import time
ckey = "secret"
csecret="secret"
atoken="secret"
asecret="secret"
searchterm = "Disney"
class listener(StreamListener):
    def on_data(self, data):
        try:
            tweet = data.split(',"text":"')[1].split('","source')[0]
            saveThis = str(time.time())+"::%::"+tweet
            saveFile = open("tweets.csv", "a")
            saveFile.write(saveThis)
            saveFile.write("\n")
            saveFile.close()
            return True
        except BaseException, e:
            print "Failed on data", str(e)
            time.sleep(10)
            return True # Don't kill the stream
    def on_error(self, status):
        print status
        time.sleep(5)
        return True # Don't kill the stream
try:
    auth = OAuthHandler(ckey, csecret)
    auth.set_access_token(atoken, asecret)
    twitterStream = Stream(auth, listener())
    twitterStream.filter(track=[searchterm])
except Exception:
    print "Failed in auth or streaming"
Is there a "good" way to solve this problem?
I have chosen to go with option 1 and run a single stream with multiple search terms, checking each tweet for matches manually...
tweet = "I am a tweet"
terms = ["am","tweet"]
matches = []
for i, term in enumerate(terms):
    if( term.lower() in tweet.lower() ):
        matches.append(i)
matches
Out: [0, 1]
...and adding the resulting matches list to the object returned by the stream listener. Of course, this results in a larger stream, increasing the risk of being rate limited.
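The same check can be wrapped in a small helper that returns the matching terms themselves instead of their indices, which makes the tags easier to store alongside each tweet. The name matching_terms is hypothetical, and the sketch keeps the plain case-insensitive substring test used above, so a short term such as "am" would also match inside a longer word.

```python
def matching_terms(text, terms):
    """Return every search term that occurs in the tweet text (case-insensitive)."""
    lowered = text.lower()
    return [term for term in terms if term.lower() in lowered]


terms = ["Disney", "Pandabears", "Polarbears"]
tags = matching_terms("Polarbears at Disney", terms)
no_tags = matching_terms("Nothing relevant here", terms)
```

A tweet that mentions two terms simply gets both tags, which sidesteps the ambiguity raised in option 1 of the question.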