Tweepy streamer stopping after a few seconds - python

Hi!
I am experiencing an issue with the tweepy library for Python. The first time I launched the script below, everything worked perfectly; the second time, the script stopped unexpectedly.
I could not find anything about this behavior: the listener stops after a few seconds, and I do not get any error code or message.
Here is the simple code:
import tweepy
import sys
import json
from textwrap import TextWrapper
from datetime import datetime
from elasticsearch import Elasticsearch
consumer_key = "hidden"
consumer_secret = "hidden"
access_token = "hidden"
access_token_secret = "hidden"
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
ES_HOST = {"host" : "localhost", "port" : 9200}
es = Elasticsearch(hosts = [ES_HOST])
class StreamListener(tweepy.StreamListener):
    print('Starting StreamListener')
    status_wrapper = TextWrapper(width=60, initial_indent=' ', subsequent_indent=' ')
    def on_status(self, status):
        try:
            print '\n%s %s' % (status.author.screen_name, status.created_at)
            json_data = status._json
            #print json_data['text']
            es.create(index="idx_twp",
                      doc_type="twitter_twp_nintendo",
                      body=json_data
                      )
        except Exception, e:
            print e
            pass
print('Starting Receiving')
streamer = tweepy.Stream(auth=auth, listener=StreamListener(), timeout=3000000000)
#Fill with your own Keywords bellow
terms = ['nintendo']
streamer.filter(None,terms)
#streamer.userstream(None)
print ('Ending program')
And here is the output (the run lasts only 2 seconds):
[root@localhost ~]# python projects/m/twitter/twitter_logs.py
Starting StreamListener
Starting Receiving
Ending program
I am using Python 2.7.5.
Any ideas?

Hi!
I solved this weird issue by switching to Python 3.5 via virtualenv. For now, it works well.
This may have been due to the Python version. Anyway, if anyone else runs into this, I recommend using virtualenv to test another Python version and seeing what happens.
FYI: I have already opened issue #759 on the GitHub project.
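Independently of the Python version, a long-running collector can treat an early return from the blocking filter() call as a disconnect and reconnect with exponential backoff. Below is a minimal sketch under that assumption: start_stream is a hypothetical zero-argument callable wrapping the blocking call, and sleep is a parameter only so the logic can be tested. It does not fix whatever made the original script exit; it keeps the collector alive and makes each early return visible in the log.

```python
import time


def run_with_reconnect(start_stream, max_attempts=5, base_delay=1.0,
                       max_delay=60.0, sleep=time.sleep):
    """Run start_stream() up to max_attempts times, backing off in between.

    start_stream is a hypothetical zero-argument callable wrapping the
    blocking streamer.filter(...) call. Returns the list of delays slept.
    """
    delays = []
    delay = base_delay
    for attempt in range(max_attempts):
        try:
            start_stream()
            print("stream returned without an error")
        except Exception as exc:  # log the failure, then retry
            print("stream raised: %r" % (exc,))
        if attempt < max_attempts - 1:
            sleep(delay)
            delays.append(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff, capped
    return delays
```

With the script above this could be called as run_with_reconnect(lambda: streamer.filter(track=terms), max_attempts=10).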

Related

How to make this script loop every hour

I'm creating a bot that posts COVID numbers every 60 minutes. I have it finished, but I don't know how to make it repeat. Any ideas? It's a small project I have in mind, and this is the last thing I have to do. (It would be great if your answer showed the solution inside the code from this post.)
import sys
CONSUMER_KEY = 'XXXX'
CONSUMER_SECRET = 'XXXX'
ACCESS_TOKEN = 'XXXX'
ACCESS_TOKEN_SECRET = 'XXXX'
import tweepy
import requests
from lxml import html
def create_tweet():
    response = requests.get('https://www.worldometers.info/coronavirus/')
    doc = html.fromstring(response.content)
    total, deaths, recovered = doc.xpath('//div[@class="maincounter-number"]/span/text()')
    tweet = f'''Coronavirus Latest Updates
Total cases: {total}
Recovered: {recovered}
Deaths: {deaths}
Source: https://www.worldometers.info/coronavirus/
#coronavirus #covid19 #coronavirusnews #coronavirusupdates #COVID19
'''
    return tweet
if __name__ == '__main__':
    auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
    auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
    # Create API object
    api = tweepy.API(auth)
    try:
        api.verify_credentials()
        print('Authentication Successful')
    except:
        print('Error while authenticating API')
        sys.exit(5)
    tweet = create_tweet()
    api.update_status(tweet)
    print('Tweet successful')
You can simply add this statement at the end of the code (it needs from time import sleep at the top of the file):
sleep(3600)
If you want it to run endlessly, you can do it like this (again with from time import sleep):
while True:
    # insert your main code here
    sleep(3600)
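A plain sleep(3600) after the work makes each cycle last one hour plus however long scraping and tweeting took, so the posting time slowly drifts. Below is a minimal sketch that sleeps only for the remainder of each interval; run_every is a hypothetical helper name, and the clock and sleep parameters exist only so the loop can be tested with a fake clock.

```python
import time


def run_every(interval, job, iterations=None, clock=time.monotonic,
              sleep=time.sleep):
    """Call job() every `interval` seconds, measured start to start.

    iterations=None runs forever; a number stops after that many runs.
    Returns the number of runs performed.
    """
    next_run = clock()
    runs = 0
    while True:
        job()
        runs += 1
        if iterations is not None and runs >= iterations:
            return runs
        next_run += interval
        # Sleep only for what is left of the interval after job() finished.
        remaining = next_run - clock()
        if remaining > 0:
            sleep(remaining)
```

In the bot, the two lines that build and post the tweet could move into a hypothetical post_update() function, and the __main__ block would end with run_every(3600, post_update).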
You might want to use a scheduler; Python already has a built-in one (sched). Read more about it here: https://docs.python.org/3/library/sched.html
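With sched, a job that should repeat has to re-enter itself after each run. Below is a minimal sketch; schedule_repeating is a hypothetical helper, and the times argument exists only so the example terminates.

```python
import sched
import time


def schedule_repeating(scheduler, interval, job, times):
    """Queue job() to run now and then every `interval` seconds, `times` runs in total."""
    def run(remaining):
        job()
        if remaining > 1:
            # enter() is relative to the current time, so each gap is
            # measured from the end of the previous run.
            scheduler.enter(interval, 1, run, (remaining - 1,))
    scheduler.enter(0, 1, run, (times,))
```

Usage: s = sched.scheduler(time.time, time.sleep), then schedule_repeating(s, 3600, post_update, times=24) and s.run(), where post_update is a hypothetical function that builds and posts the tweet. s.run() blocks until all queued runs have finished.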
You can use a timer like the one below.
(The function refresh schedules itself to run again every hour.)
from threading import Timer
def refresh(delay_repeat=3600): # 3600 sec for one hour
    # Your refresh script here
    # ....
    # timer
    Timer(delay_repeat, refresh, (delay_repeat, )).start()
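A chain of Timer objects starts a new thread for every run and is awkward to stop cleanly. As an alternative (a different technique from the Timer chain), a single background thread can wait on a threading.Event, which doubles as an interruptible sleep. repeat_in_background is a hypothetical helper; unlike refresh() above, it waits one interval before the first run.

```python
import threading


def repeat_in_background(interval, job, stop_event):
    """Run job() every `interval` seconds on one daemon thread until stop_event is set."""
    def loop():
        # Event.wait() returns False on timeout (time to run the job) and
        # True as soon as the event is set, so stopping is immediate.
        while not stop_event.wait(interval):
            job()

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

Calling stop_event.set() ends the loop without waiting out the rest of the hour.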
Have a look at my runnable refresh-graph example here to get an idea:
How can I dynamically update my matplotlib figure as the data file changes?
You can use a task scheduler. "schedule" is a third-party Python library which you can install using pip or conda. It lets you rerun a function at a given interval (daily, hourly, weekly, etc.).
First, install the library using pip:
pip install schedule
Second, put your code in a function, e.g.:
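def theCode():
    print("Do Something")
Third, set the schedule and keep checking for pending jobs:
import time
import schedule
schedule.every(2).hours.do(theCode)  # for hourly runs: schedule.every().hour.do(theCode)
while True:
    schedule.run_pending()
    time.sleep(10)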

Running Time Estimate for Stream Twitter with Location Filter in Tweepy

PROBLEM SOLVED, SEE SOLUTION AT THE END OF THE POST
I need help estimating the running time of my tweepy program, which calls the Twitter Stream API with a location filter.
After I kicked it off, it ran for over 20 minutes, which is longer than I expected. I am new to the Twitter Stream API and have only worked with the REST API for a couple of days. With the REST API I get 50 tweets in a few seconds, easily, but this Stream request is taking a lot more time. My program hasn't died on me or given any error, so I don't know whether anything is wrong with it. If there is, please point it out.
In conclusion, if you think my code is correct, could you provide an estimate for the running time? If you think my code is wrong, could you help me to fix it?
Thank you in advance!
Here's the code:
# Import Tweepy, sys, sleep, credentials.py
import tweepy, sys
from time import sleep
from credentials import *
# Access and authorize our Twitter credentials from credentials.py
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
box = [-86.33,41.63,-86.20,41.74]
class CustomStreamListener(tweepy.StreamListener):
    def on_error(self, status_code):
        print >> sys.stderr, 'Encountered error with status code:', status_code
        return True # Don't kill the stream
    def on_timeout(self):
        print >> sys.stderr, 'Timeout...'
        return True # Don't kill the stream
stream = tweepy.streaming.Stream(auth, CustomStreamListener()).filter(locations=box).items(50)
stream
I tried the method from http://docs.tweepy.org/en/v3.4.0/auth_tutorial.html#auth-tutorial, but apparently it is not working for me. Here is my code below. Would you mind giving any input? Let me know if you have some working code. Thanks!
# Import Tweepy, sys, sleep, credentials.py
import tweepy, sys
from time import sleep
from credentials import *
# Access and authorize our Twitter credentials from credentials.py
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
# Assign coordinates to the variable
box = [-74.0,40.73,-73.0,41.73]
import tweepy
#override tweepy.StreamListener to add logic to on_status
class MyStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        print(status.text)
    def on_error(self, status_code):
        if status_code == 420:
            #returning False in on_data disconnects the stream
            return False
myStreamListener = MyStreamListener()
myStream = tweepy.Stream(auth = api.auth, listener=myStreamListener())
myStream.filter(track=['python'], locations=(box), async=True)
Here is the error message:
Traceback (most recent call last):
  File "test.py", line 26, in <module>
    myStream = tweepy.Stream(auth = api.auth, listener=myStreamListener())
TypeError: 'MyStreamListener' object is not callable
PROBLEM SOLVED! SEE SOLUTION BELOW
After another round of debugging, here is the solution for anyone interested in the same topic:
# Import Tweepy, sys, sleep, credentials.py
try:
    import json
except ImportError:
    import simplejson as json
import tweepy, sys
from time import sleep
from credentials import *
# Access and authorize our Twitter credentials from credentials.py
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
# Assign coordinates to the variable
box = [-74.0,40.73,-73.0,41.73]
import tweepy
#override tweepy.StreamListener to add logic to on_status
class MyStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        print(status.text.encode('utf-8'))
    def on_error(self, status_code):
        if status_code == 420:
            #returning False in on_data disconnects the stream
            return False
myStreamListener = MyStreamListener()
myStream = tweepy.Stream(api.auth, listener=myStreamListener)
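myStream.filter(track=['NYC'], locations=(box), async=True)  # Tweepy < 3.7; Tweepy 3.7+ renamed this argument to is_async (async is a reserved word from Python 3.7)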
Core Problem:
I think you're misunderstanding what the Stream is here.
Tl;dr: Your code is working; you're just not doing anything with the data that comes back.
The REST API call is a single request for information. You make a request, Twitter sends back some information, and it gets assigned to your variable.
The Stream object from Tweepy (which you've created as stream) opens a connection to Twitter with your search parameters, and Twitter, well, streams Tweets to it. Forever.
From the Tweepy docs:
The streaming api is quite different from the REST api because the REST api is used to pull data from twitter but the streaming api pushes messages to a persistent session. This allows the streaming api to download more data in real time than could be done using the REST API.
So you need to build a handler (a StreamListener, in Tweepy's terminology) that does something with each tweet, such as printing it out.
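For example, a listener can collect a fixed number of tweets and then end the stream. In Tweepy 3.x, returning False from on_status makes on_data return False, which disconnects the stream (the same rule the on_error handlers elsewhere in this thread rely on). The sketch below is a plain class so that the counting logic runs without Tweepy; CollectNListener is a hypothetical name, and in real code it would subclass tweepy.StreamListener and call its __init__.

```python
class CollectNListener(object):
    """Collects statuses and asks the stream to stop after `limit` of them.

    In real code: class CollectNListener(tweepy.StreamListener), with
    super().__init__() called in __init__.
    """

    def __init__(self, limit=50):
        self.limit = limit
        self.tweets = []

    def on_status(self, status):
        self.tweets.append(status)
        # True keeps the stream open; False asks Tweepy 3.x to disconnect.
        return len(self.tweets) < self.limit
```

With listener = CollectNListener(limit=50), tweepy.Stream(auth, listener).filter(locations=box) blocks until 50 matching tweets have arrived, after which listener.tweets holds them. How long that takes depends entirely on how many geotagged tweets are posted inside the bounding box.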
Additional
A word of warning, from bitter experience: if you're going to try to save the tweets to a database, Twitter can, and will, stream objects to you much faster than you can save them. The tweets then back up on Twitter's side, and over a certain level of backed-up-ness (not an actual phrase), they'll just disconnect your Stream.
I handled this by using django-rq to put save jobs into a job queue. This way I could handle hundreds of tweets a second at peak, and the queue would smooth out the load. You can see how I did this below. Python-rq would also work if you're not using Django as a framework around it. The read_both method is just a function that reads from the tweet and saves it to a Postgres database; in my specific case I did that via the Django ORM, queuing it with the django_rq.enqueue function.
__author__ = 'iamwithnail'
from django.core.management.base import BaseCommand, CommandError
from django.db.utils import DataError
from harvester.tools import read_both
import django_rq
class Command(BaseCommand):
    args = '<search_string search_string>'
    help = "Opens a listener to the Twitter stream, and tracks the given string or list " \
           "of strings, saving them down to the DB as they are received."
    def handle(self, *args, **options):
        try:
            import urllib3.contrib.pyopenssl
            urllib3.contrib.pyopenssl.inject_into_urllib3()
        except ImportError:
            pass
        consumer_key = '***'
        consumer_secret = '****'
        access_token='****'
        access_token_secret_var='****'
        import tweepy
        import json
        # This is the listener, responsible for receiving data
        class StdOutListener(tweepy.StreamListener):
            def on_data(self, data):
                decoded = json.loads(data)
                try:
                    if decoded['lang'] == 'en':
                        django_rq.enqueue(read_both, decoded)
                    else:
                        pass
                except KeyError,e:
                    print "Error on Key", e
                except DataError, e:
                    print "DataError", e
                return True
            def on_error(self, status):
                print status
        l = StdOutListener()
        auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
        auth.set_access_token(access_token, access_token_secret_var)
        stream = tweepy.Stream(auth, l)
        stream.filter(track=args)
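If Django and Redis are not available, the same decoupling can be sketched with the standard library: on_data only puts the decoded tweet on an in-process queue and returns at once, while a worker thread does the slow database write. This is a swapped-in technique, not what the answer above uses; start_writer and save are hypothetical names (save plays the role of read_both), and the sketch is Python 3. Unlike a Redis-backed rq queue, anything still queued is lost if the process dies, and if writes are slower than arrivals on average the queue fills up, put() blocks, and the stream backs up again.

```python
import queue
import threading


def start_writer(save, maxsize=10000):
    """Start a worker thread that calls save(item) for each queued item.

    Returns the queue; put None on it to stop the worker.
    """
    q = queue.Queue(maxsize=maxsize)

    def worker():
        while True:
            item = q.get()
            try:
                if item is None:  # sentinel: stop the worker
                    return
                save(item)  # the slow part, e.g. a database insert
            finally:
                q.task_done()

    threading.Thread(target=worker, daemon=True).start()
    return q
```

Inside the listener, on_data would then call q.put(decoded) and return True.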
Edit: Your subsequent problem is caused by calling the listener wrongly.
myStreamListener = MyStreamListener() #creates an instance of your class
Where you have this:
myStream = tweepy.Stream(auth = api.auth, listener=myStreamListener())
You're trying to call the listener as a function when you use the (). So it should be:
myStream = tweepy.Stream(auth = api.auth, listener=myStreamListener)
And in fact, it can probably be written more succinctly as:
myStream = tweepy.Stream(api.auth, myStreamListener)
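The difference can be reproduced without Tweepy: an instance of a class that does not define __call__ cannot be called, so adding () to it raises exactly this TypeError.

```python
class MyStreamListener(object):
    def on_status(self, status):
        print(status)


myStreamListener = MyStreamListener()  # an instance: this is what Stream() expects

try:
    myStreamListener()  # calling the instance, as in listener=myStreamListener()
except TypeError as exc:
    print(exc)  # 'MyStreamListener' object is not callable
```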

Tweepy on_direct_message() never called

I've been running the script below using tweepy, but on_direct_message() is never called. I'd like to use this function so I can receive new direct messages. I've used tweepy for the past month without any issue until now. There seem to be others out there with a similar issue: Tweepy streaming: on_direct_message() is never called
I'm on Mac OS X 10.10.5 and I'm using Python 2.7.
Any help would be really appreciated.
class MyStreamListener(tweepy.StreamListener):
    def on_direct_message(self, status):
        print "status: "
        print status
myStreamListener = MyStreamListener()
myStream = tweepy.Stream(auth = api.auth, listener = MyStreamListener(), timeout = None, retry_count = None)
myStream.filter(track=["filter"], async=False)
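As far as I understand Tweepy 3.x, StreamListener.on_data inspects the keys of each incoming JSON message and only calls on_direct_message for messages carrying a direct_message key; the public statuses/filter endpoint behind filter() delivers tweets, not DMs (those came over user streams, which Twitter has since retired). The simplified, hypothetical route_message below mirrors that key-based dispatch so the behavior can be seen offline; it is not Tweepy's actual code.

```python
import json


class DemoListener(object):
    def __init__(self):
        self.statuses = []
        self.direct_messages = []

    def on_status(self, status):
        self.statuses.append(status)

    def on_direct_message(self, message):
        self.direct_messages.append(message)


def route_message(listener, raw_data):
    """Hypothetical, simplified key-based dispatch in the style of on_data."""
    data = json.loads(raw_data)
    if "in_reply_to_status_id" in data:  # an ordinary tweet
        return listener.on_status(data)
    if "direct_message" in data:  # only present on user streams
        return listener.on_direct_message(data["direct_message"])
    return None  # other message types are ignored in this sketch
```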

Tweepy Stream python, running error on IDLE

When I run my script, just a blank shell opens and nothing happens. It prints a restart line in the output shell and stops. Then, when I try to close the window, it asks me: "The program is still running, do you want to kill it?" I waited for over 15 minutes but nothing happened. Can you help me? I am using a Mac.
Here is my code:
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
ckey = ''
csecret = ''
atoken = ''
asecret = ''
class listener(StreamListener):
    def on_data(self, data):
        print (data)
        return True
    def on_error(self, status):
        print (status)
auth = OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
twitterStream = Stream(auth, listener())
twitterStream.filter(track=["car"])
There are 2 problems in the code snippet you provided. The first one is in the on_error method. I don't think it is causing any problems, since this method is invoked only when an error is encountered; still, you can rewrite the method as below (in Tweepy, status here is the numeric HTTP status code, so convert it with str() before concatenating):
def on_error(self, status):
    print("The error is: " + str(status))
The second issue is the line twitterStream.filter(track=["car"]). With this line, whether you see any output depends on how many public tweets contain the keyword car; if, unfortunately, none of the public tweets posted while your script runs use this keyword, nothing will be printed to the console. So I recommend using a larger set of keywords, such as vehicle, automobile, etc., to increase your chances.
For testing purposes you could remove this line and end your code at twitterStream = Stream(auth, listener()). If you still see no output in the IDLE console, you should try another text editor such as Sublime, Canopy, etc.
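Twitter's documentation for the track parameter treats the comma-separated phrases as a logical OR and the space-separated terms within one phrase as a logical AND, ignoring case and order. The hypothetical matches_track helper below is a rough local approximation of that rule, useful for sanity-checking a keyword list against sample text; Twitter's real matching also handles punctuation, URLs and mentions, which this sketch ignores.

```python
def matches_track(text, phrases):
    """Rough approximation of the streaming `track` rule.

    A text matches if ANY phrase matches; a phrase matches if ALL of its
    space-separated terms appear as words in the text, ignoring case.
    """
    words = set(text.lower().split())
    return any(all(term in words for term in phrase.lower().split())
               for phrase in phrases)
```

In Tweepy the wider keyword list would be passed as twitterStream.filter(track=["car", "vehicle", "automobile"]).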

Replying to a tweet with an image using tweepy

I'm trying to use tweepy to reply to certain tweets, and my reply includes an image.
The twt variable holds the tweet I'm trying to reply to.
Here's what I'm doing at the moment:
# -*- coding: utf-8 -*-
import tweepy, time, random
CONSUMER_KEY = 'XXXX'
CONSUMER_SECRET = 'XXXX'
ACCESS_KEY = 'XXXX'
ACCESS_SECRET = 'XXXX'
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_KEY, ACCESS_SECRET)
api = tweepy.API(auth)
query = ['aaa', 'bbb', 'ccc']
t0 = time.time()
count = 0
last_count = 0
f = open('last_replied.txt')
last_replied = int(f.readline().strip())
f.close()
print('starting time:', time.strftime('%X'))
while True:
    if count > last_count:
        print(time.strftime('%X'), ':', count, 'replies')
        last_count = count
    for i in range(3):
        twts = api.search(query[i], since_id=last_replied)
        if len(twts)>0:
            for twt in twts:
                sid = twt.id
                sn = twt.user.screen_name
                stat = "lalala" + "@" + sn
                api.update_with_media('oscar1.gif',status=stat,in_reply_to_status_id=sid)
                count += 1
                last_replied = twt.id
            f = open('last_replied.txt','w')
            f.write(str(last_replied))
            f.close()
    pause = random.randint(50,90)
    time.sleep(pause)
My tweet gets posted, but not as a reply to the original tweet (twt); instead, it just gets posted as a new, independent tweet.
However, when I use update_status instead of update_with_media, like this:
api.update_status(status=stat,in_reply_to_status_id=sid)
my new tweet does get posted as a reply to the original tweet (twt).
What am I missing?
Thanks
The solution I ended up with was switching to the twython module, which has this functionality well documented and working perfectly.
Thank you very much for your help.
The update_with_media endpoint is being deprecated by Twitter (https://dev.twitter.com/rest/reference/post/statuses/update_with_media), so you shouldn't use it. Instead, upload the media first with the media_upload method and pass the media_ids you get back to the update_status method.
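A minimal sketch of that two-step approach, assuming a Tweepy version (3.2.0 or later) where media_upload returns an object with a media_id and update_status accepts media_ids. reply_with_image is a hypothetical helper. Separately, Twitter's documentation for statuses/update said in_reply_to_status_id is ignored unless the status text mentions the author of the tweet being replied to, so the text should contain @screen_name.

```python
def reply_with_image(api, image_path, text, reply_to_id):
    """Upload the image first, then post a reply that references it.

    Assumes a Tweepy 3.2+ API object; text should mention the author of
    the tweet being replied to.
    """
    media = api.media_upload(image_path)
    return api.update_status(status=text,
                             in_reply_to_status_id=reply_to_id,
                             media_ids=[media.media_id])
```

In the loop above, the update_with_media call would become reply_with_image(api, 'oscar1.gif', '@' + sn + ' lalala', sid).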
I glanced at the code for update_with_media and don't see any obvious bugs, but the new method of handling tweets with media was added to the Tweepy API in January; note the last two bullets here:
https://github.com/tweepy/tweepy/releases/tag/v3.2.0
If you download Tweepy 3.2.0 you should be able to switch over to the new regimen, which will hopefully fix your reply_to problem. (I can't say for sure if the new stuff works, I'm using an older version of Tweepy myself.)
