I am learning Python and started a few weeks ago. I have written code that checks the streaming API for tweets with a particular hashtag and then replies to the tweet if no tweet has been posted to that handle previously. The code tries to stay within the rate limits so as not to get any errors, but once in a while Twitter raises a duplicate status error. I would like the code to keep running rather than stop when it encounters such an issue. Please help with this. The following is the code:
import tweepy
from tweepy import Stream
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
import json
import time

consumer_key =
consumer_secret =
access_token =
access_secret =

auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)

def check(status):
    datafile = file('C:\Users\User\Desktop\Growth Handles.txt', 'r')
    found = False
    for line in datafile:
        if status.user.screen_name in line:
            found = True
            break
    return found

class MyListener(StreamListener):
    def on_status(self, status):
        f = status.user.screen_name
        if check(status):
            pass
        else:
            Append = open('Growth Handles.txt', 'a')
            Append.write(f + "\n")
            Append.close()
            Reply = '#' + f + ' Check out Tomorrowland 2014 Setlist . http://.... '
            api = tweepy.API(auth)
            api.update_status(status=Reply)
            time.sleep(45)
        return True

    def on_error(self, status):
        print(status)
        return True

twitter_stream = Stream(auth, MyListener())
twitter_stream.filter(track=['#musiclovers'])
In case the update_status method throws an error (such as the duplicate status error), catch it and carry on:
try:
    api.update_status(status=Reply)
except tweepy.TweepError:
    # e.g. duplicate status: skip this reply and keep listening
    pass
In case twitter_stream gets disconnected, restart it in a loop:
twitter_stream = Stream(auth, MyListener())
while True:
    twitter_stream.filter(track=['#musiclovers'])
Warning: your app may get banned if it reaches certain limits or if Twitter's system catches you spamming. Check the Twitter Rules.
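A restart loop like this is usually gentler with a pause between attempts, so that a failing connection is not retried in a tight loop. The following is a minimal sketch, not part of tweepy: `run_with_restarts`, `max_attempts`, `delay`, and `sleep` are hypothetical names, and `connect` stands for any blocking call, for example `lambda: twitter_stream.filter(track=['#musiclovers'])`. Unlike `while True`, it gives up after a fixed number of attempts.

```python
import time


def run_with_restarts(connect, max_attempts=5, delay=1.0, sleep=time.sleep):
    """Call connect() repeatedly, restarting after it returns or raises.

    Hypothetical helper: connect is any blocking callable. Returns the
    number of attempts that ended in an exception.
    """
    failures = 0
    for attempt in range(max_attempts):
        try:
            connect()  # blocks until the stream disconnects
        except Exception as exc:  # KeyboardInterrupt still stops the loop
            failures += 1
            print("attempt %d failed: %s" % (attempt + 1, exc))
        sleep(delay * (attempt + 1))  # wait a little longer each time
    return failures
```

Passing `sleep` as a parameter is only there to make the helper easy to try out without real waiting; in normal use the default `time.sleep` applies.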
Related
The code below works: it streams tweets and stores them in a JSON file. I am running it on a virtual machine because I want to collect two months' worth of data. However, for some reason the code stopped running after around 48 hours with no error. Is this a tweepy limitation (streaming rate or something), or should I check my connection?
import os
import sys
from tweepy import API
from tweepy import OAuthHandler

consumer_key = ''
consumer_secret = ''
access_token = ''
access_secret = ''

auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)

def get_twitter_client():
    """Setup Twitter API client

    Return: tweepy.API object
    """
    auth = get_twitter_auth()
    client = API(auth)
    return client

from tweepy import Stream
from tweepy.streaming import StreamListener

class MyListener(StreamListener):
    def on_data(self, data):
        try:
            with open('strategyand_user.json', 'a') as f:
                f.write(data)
                return True
        except BaseException as e:
            print("Error on_data: %s" % str(e))
        return True

    def on_error(self, status):
        print(status)
        return True

twitter_stream = Stream(auth, MyListener())
twitter_stream.filter(follow=['xxxxxxx'])
I am trying to extract 1000 unique, fully expanded URIs from Twitter using Tweepy and Python. Specifically, I am interested in links that lead outside of Twitter (so not back to other tweets, retweets, or duplicates).
The code I wrote keeps raising a KeyError for "entities".
It gives me a few URLs before breaking; some are expanded, some are not. I have no idea how to go about fixing this.
Help me please!
Note: I left my credentials out.
Here is my code:
# Import the necessary methods from different libraries
import tweepy
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
import json

# Variables that contain the user credentials to access the Twitter API
access_token = "enter token here"
access_token_secret = "enter token here"
consumer_key = "enter key here"
consumer_secret = "enter key here"

# Accessing tweepy API
# api = tweepy.API(auth)

# This is a basic listener that just prints received tweets to stdout.
class StdOutListener(StreamListener):
    def on_data(self, data):
        # resource: http://code.runnable.com/Us9rrMiTWf9bAAW3/how-to-stream-data-from-twitter-with-tweepy-for-python
        # Twitter returns data in JSON format - we need to decode it first
        decoded = json.loads(data)
        # resource: http://socialmedia-class.org/twittertutorial.html
        # Print each tweet in the stream to the screen
        # Here we set it to stop after getting 1000 tweets.
        # You don't have to set it to stop, but can continue running
        # the Twitter API to collect data for days or even longer.
        count = 1000
        for url in decoded["entities"]["urls"]:
            count -= 1
            print "%s" % url["expanded_url"] + "\r\n\n"
            if count <= 0:
                break

    def on_error(self, status):
        print status

if __name__ == '__main__':
    # This handles Twitter authentication and the connection to the Twitter Streaming API
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)
    # This line filters Twitter Streams to capture data by the keyword: YouTube
    stream.filter(track=['YouTube'])
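It seems like the stream is hitting a rate limit: some of the messages it delivers are limit notices rather than tweets, so they have no "entities" key. One option is to catch the KeyError; when I print the keys of the offending message, I see [u'limit']. I also added a count display to verify that it does get to 1000: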
count = 1000  # moved outside of class definition to avoid getting reset

class StdOutListener(StreamListener):
    def on_data(self, data):
        decoded = json.loads(data)
        global count  # get the count
        if count <= 0:
            import sys
            sys.exit()
        else:
            try:
                for url in decoded["entities"]["urls"]:
                    count -= 1
                    print count, ':', "%s" % url["expanded_url"] + "\r\n\n"
            except KeyError:
                print decoded.keys()

    def on_error(self, status):
        print status

if __name__ == '__main__':
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)
    stream.filter(track=['YouTube'])
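A variant of the same idea avoids the KeyError altogether and also covers the "unique" and "outside of Twitter" requirements from the question. This is only a sketch, assuming Python 3 and the message layout used above (an "entities" dict holding a "urls" list whose items carry "expanded_url"). `external_urls` and the `seen` set are hypothetical names, and treating only twitter.com hosts as internal is a simplifying assumption.

```python
import json
from urllib.parse import urlparse


def external_urls(raw_message, seen):
    """Return new, non-Twitter expanded URLs found in one stream message.

    Messages without "entities" (for example limit notices) yield [].
    `seen` is a set that is updated in place to keep URLs unique.
    """
    decoded = json.loads(raw_message)
    found = []
    for url in decoded.get("entities", {}).get("urls", []):
        expanded = url.get("expanded_url")
        if not expanded or expanded in seen:
            continue  # missing or already collected
        host = urlparse(expanded).netloc.lower()
        if host == "twitter.com" or host.endswith(".twitter.com"):
            continue  # link points back into Twitter
        seen.add(expanded)
        found.append(expanded)
    return found
```

Called from on_data with one shared `seen` set, the listener could stop once `len(seen)` reaches 1000, which counts unique links rather than every URL that passes by.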
Using the code below, I'm trying to stream tweets for a hashtag. It works fine for popular hashtags like #StarWars, but when I ask for less popular ones it doesn't seem to return anything.
Ideas?
The string 'code' is used in place of the actual authentication strings.
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
from textwrap import TextWrapper
import json

access_token = "code"
access_token_secret = "code"
consumer_key = "code"
consumer_secret = "code"

class StdOutListener(StreamListener):
    ''' Handles data received from the stream. '''

    status_wrapper = TextWrapper(width=60, initial_indent=' ', subsequent_indent=' ')

    def on_status(self, status):
        try:
            print self.status_wrapper.fill(status.text)
            print '\n %s %s via %s\n' % (status.author.screen_name, status.created_at, status.source)
        except:
            # Catch any unicode errors while printing to console
            # and just ignore them to avoid breaking application.
            pass

    def on_error(self, status_code):
        print('Got an error with status code: ' + str(status_code))
        return True  # To continue listening

    def on_timeout(self):
        print('Timeout...')
        return True  # To continue listening

if __name__ == '__main__':
    listener = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, listener)
    stream.filter(track=['#TestingPythonTweet'])
OK, so I found the answer: I was expecting the stream to work retroactively. This was a fundamental error on my part. What actually happens is that it delivers what is currently being tweeted, not what has been tweeted previously.
I am trying the following code to extract tweets in French, but it returns tweets in English and some other languages. Is there a syntax error?
import sys
import tweepy
import config as config

consumer_key = config.CONSUMER_KEY
consumer_secret = config.CONSUMER_SECRET
access_key = config.OAUTH_TOKEN
access_secret = config.OAUTH_TOKEN_SECRET

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)

class CustomStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        print status.lang
        #, status.user.location, status.user.description

    def on_error(self, status_code):
        print >> sys.stderr, 'Encountered error with status code:', status_code
        return True  # Don't kill the stream

    def on_timeout(self):
        print >> sys.stderr, 'Timeout...'
        return True  # Don't kill the stream

sapi = tweepy.streaming.Stream(auth, CustomStreamListener())
sapi.filter(track=["rame", "lent", "long", "beug", "beugg", "beugge", "bug"], languages=["fr"])
A bit of an old post, but I've been experiencing the same problem lately: the keyword search does not seem to work properly and returns tweets even if they don't contain the keyword. The solution that worked for me was to use twython and the API v2...
Try language instead of languages.
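If the stream still delivers tweets in other languages or without the keyword, a client-side check is one possible fallback. The sketch below assumes each decoded tweet is a dict with "lang" and "text" fields (the question's code prints status.lang); `matches` is a hypothetical helper, not part of tweepy or twython.

```python
def matches(tweet, languages, keywords):
    """Return True if the tweet is in a wanted language and mentions a keyword.

    Keyword matching here is a plain case-insensitive substring test,
    which is cruder than the server-side track matching.
    """
    if tweet.get("lang") not in languages:
        return False
    text = tweet.get("text", "").lower()
    return any(keyword.lower() in text for keyword in keywords)
```

For example, `matches(tweet, ["fr"], ["rame", "bug"])` could be used inside the listener to decide whether a tweet is kept.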
I'm doing Twitter sentiment research at the moment. For this, I'm using the Twitter API to download all tweets containing certain keywords. But my current code takes a long time to build a large data file, so I was wondering whether there is a faster method.
This is what I'm using right now:
__author__ = 'gerbuiker'

import time

# Import the necessary methods from tweepy library
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream

# Variables that contain the user credentials to access the Twitter API
access_token = "XXXXXXXXXXXXX"
access_token_secret = "XXXXXXXX"
consumer_key = "XXXXX"
consumer_secret = "XXXXXXXXXXXXXX"

# This is a basic listener that just prints received tweets to stdout.
class StdOutListener(StreamListener):
    def on_data(self, data):
        try:
            # print data
            tweet = data.split(',"text":"')[1].split('","source')[0]
            print tweet
            saveThis = str(time.time()) + '::' + tweet  # saves time + actual tweet
            saveFile = open('twitiamsterdam.txt', 'a')
            saveFile.write(saveThis)
            saveFile.write('\n')
            saveFile.close()
            return True
        except BaseException, e:
            print 'failed ondata,', str(e)
            time.sleep(5)

    def on_error(self, status):
        print status

if __name__ == '__main__':
    # This handles Twitter authentication and the connection to the Twitter Streaming API
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)
    # This line filters Twitter Streams to capture data by the keywords: 'Amsterdam'
    stream.filter(track=['KEYWORD which i want to check'])
This gets me about 1500 tweets in one hour for a pretty popular keyword (Amsterdam). Does anyone know a faster method in Python?
To be clear: I want to download all tweets on a certain subject for the last month or year, for example. So the newest tweets don't have to keep coming in; the most recent ones for a given period would be sufficient. Thanks!
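Separately from the speed question, the string splitting in on_data above fails whenever a message has no "text" field or the tweet text itself contains `","source`. A more robust sketch is to parse each message as JSON. It assumes every message is a single JSON object; `format_record` and its `now` parameter are hypothetical names, and the `timestamp::text` layout is copied from the code above.

```python
import json
import time


def format_record(raw_message, now=None):
    """Return 'timestamp::text' for a tweet, or None for non-tweet messages."""
    try:
        decoded = json.loads(raw_message)
    except ValueError:  # keep-alive blank lines or truncated data
        return None
    text = decoded.get("text") if isinstance(decoded, dict) else None
    if text is None:
        return None  # e.g. a limit notice or a delete message
    if now is None:
        now = time.time()
    # Collapse newlines so that each record stays on one line in the file.
    return "%s::%s" % (now, " ".join(text.split()))
```

Inside on_data, the result can be written to the file when it is not None, which also removes the need for the broad try/except around the parsing.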
I need something similar to this for academic research.
Were you able to fix it?
Would it be possible to specify a custom range of time from which to pull the data?
Sorry for asking here, but couldn't send you private messages.