Tweepy error when trying to gather Tweets - python

I'm trying to get Tweets using the Tweepy module in Python. However, whenever I try to collect Tweets using the tweepy.Cursor function, it returns the error:
TweepyException Traceback (most recent call last)
<ipython-input-5-a26c992842dc> in <module>
----> 1 tweets_list = tweepy.Cursor(api.search_tweets(q="oranges", tweet_mode='extended', lang='en')).items()
~\anaconda3\lib\site-packages\tweepy\cursor.py in __init__(self, method, *args, **kwargs)
38 raise TweepyException('Invalid pagination mode.')
39 else:
---> 40 raise TweepyException('This method does not perform pagination')
41
42 def pages(self, limit=inf):
TweepyException: This method does not perform pagination
I do not know why this is. I can still post Tweets using the api.update_status() function. Please help. Here is my code. As you can see, the setup is correct; it is only this function that is returning the error.
from config import *
import tweepy
import datetime

consumer_key = 'CONSUMER KEY HERE'
consumer_secret = 'CONSUMER KEY SECRET HERE'
access_token = 'ACCESS TOKEN HERE'
access_token_secret = 'ACCESS TOKEN SECRET HERE'

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

try:
    api.verify_credentials()
    print("Authentication Successful")
except:
    print("Authentication Error")

tweets_list = tweepy.Cursor(api.search_tweets(q="oranges", tweet_mode='extended', lang='en')).items()
If there is an error in the function or the code itself, can you please write a correct version? I'm trying to gather Tweet data for a project.

You're passing the return value of calling API.search_tweets, rather than the method itself, to Cursor.
See Tweepy's Pagination documentation for an example of how to use Cursor.
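In other words, pass the method itself and its arguments to Cursor and let it make the calls. A minimal sketch of the corrected usage, assuming you also want to cap how many tweets are collected:

import tweepy

# Credentials as in the question
auth = tweepy.OAuthHandler('CONSUMER KEY HERE', 'CONSUMER KEY SECRET HERE')
auth.set_access_token('ACCESS TOKEN HERE', 'ACCESS TOKEN SECRET HERE')
api = tweepy.API(auth)

# Pass the method (api.search_tweets), not the result of calling it,
# and give the search arguments to Cursor as keyword arguments.
for tweet in tweepy.Cursor(api.search_tweets,
                           q="oranges",
                           tweet_mode='extended',
                           lang='en').items(100):  # e.g. stop after 100 tweets
    print(tweet.full_text)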

Related

tweepy API.user_timeline returns empty list

I am building a Telegram bot that will forward Elon Musk's latest tweets.
Even though it mostly works without problems, an IndexError occasionally occurs:
api.user_timeline returns an empty list.
To track down the problem, I added print() calls and ran it again; the error occurred in the second iteration of the while loop.
import telepot
import os
import tweepy
import time
import requests

token = ''        # telepot bot token
user_list = ['']  # user id
bot = telepot.Bot(token)

consumer_key = ""         # Twitter API Key
consumer_secret = ""      # Twitter API Secret Key
access_token = ""         # Twitter Access Key
access_token_secret = ""  # Twitter Access Secret Key

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True)

user_id = '44196397'
status = api.user_timeline(user_id=user_id, exclude_replies=True, include_rts=False, tweet_mode='extended')
print("[status] " + status[0].full_text)

while True:
    time.sleep(30)
    status_present = api.user_timeline(user_id=user_id, exclude_replies=True, include_rts=False, tweet_mode='extended')
    print("[status_present] " + status_present[0].full_text)
    if status[0].full_text != status_present[0].full_text:
        for i in user_list:
            bot.sendMessage(i, status_present[0].full_text)
        status = status_present
Here is the output of the print() calls, along with the error message, below.
I ran it in a Jupyter notebook.
[status] Given that Twitter serves as the de facto public town square, failing to adhere to free speech principles fundamentally undermines democracy.
[status_present] Given that Twitter serves as the de facto public town square, failing to adhere to free speech principles fundamentally undermines democracy.
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_9460/139022525.py in <module>
56 time.sleep(30)
57 status_present = api.user_timeline(user_id=user_id,exclude_replies=True, include_rts=False, tweet_mode = 'extended')
---> 58 print("[status_present] "+status_present[0].full_text)
59 if status[0].full_text!=status_present[0].full_text:
60 for i in user_list:
IndexError: list index out of range
I want to know why this happens. I would appreciate it if you could let me know whether there is anything to fix in the code. Thanks.
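One way to avoid the crash while you investigate, assuming the occasional empty response is transient: user_timeline can return an empty list for a given call (for example, when every tweet on the fetched page is filtered out by exclude_replies/include_rts), so guard the indexing and skip that iteration. A minimal sketch of the loop with that guard:

while True:
    time.sleep(30)
    status_present = api.user_timeline(user_id=user_id, exclude_replies=True,
                                       include_rts=False, tweet_mode='extended')
    if not status_present:  # empty list this round: skip and try again later
        print("[status_present] empty response, retrying")
        continue
    print("[status_present] " + status_present[0].full_text)
    if status[0].full_text != status_present[0].full_text:
        for i in user_list:
            bot.sendMessage(i, status_present[0].full_text)
        status = status_present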

I don't know how to use Tweepy; can you tell me?

I installed Tweepy and Python-Twitter and tried to run some code.
If I try this code:
import tweepy
import time

# insert your Twitter keys here
consumer_key = 'xxx'
consumer_secret = 'xxx'
access_token = 'xxx'
access_token_secret = 'xxx'
twitter_handle = 'handle'

auth = tweepy.auth.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

list = open('twitter_followers.txt', 'w')

if (api.verify_credentials):
    print('We successfully logged in')

user = tweepy.Cursor(api.followers, screen_name=twitter_handle).items()

while True:
    try:
        u = next(user)
        list.write(u.screen_name + ' \n')
    except:
        time.sleep(15*60)
        print('We got a timeout ... Sleeping for 15 minutes')
        u = next(user)
        list.write(u.screen_name + ' \n')

list.close()
I get this error when I run it:
File "C:\Users\xxx.py", line 19, in <module>
user = tweepy.Cursor(api.followers, screen_name=twitter_handle).items()
AttributeError: 'API' object has no attribute 'followers'
Is there an error in my code that I haven't caught yet?
Tweepy v4.0.0 renamed API.followers to API.get_followers.
Also, you're not actually calling API.verify_credentials; you're only checking that the method exists, so that if condition will always evaluate to True and print.
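With both fixes applied under Tweepy v4, the relevant lines might look roughly like this (reusing api and twitter_handle from the question):

# Actually call verify_credentials(); it raises on bad credentials in Tweepy v4
try:
    api.verify_credentials()
    print('We successfully logged in')
except tweepy.TweepyException:
    print('Login failed')

# API.followers was renamed to API.get_followers in Tweepy v4
user = tweepy.Cursor(api.get_followers, screen_name=twitter_handle).items()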

Running Time Estimate for Stream Twitter with Location Filter in Tweepy

PROBLEM SOLVED, SEE SOLUTION AT THE END OF THE POST
I need help estimating the running time of my Tweepy program, which calls the Twitter Stream API with a location filter.
After I kicked it off, it ran for over 20 minutes, which is longer than I expected. I am new to the Twitter Stream API and have only worked with the REST API for a couple of days. From what I've seen, the REST API will give me 50 tweets in a few seconds, easily. But this Stream request is taking a lot more time. My program hasn't died on me or given any error, so I don't know if there's anything wrong with it. If so, please do point it out.
In conclusion, if you think my code is correct, could you provide an estimate of the running time? If you think my code is wrong, could you help me fix it?
Thank you in advance!
Here's the code:
# Import Tweepy, sys, sleep, credentials.py
import tweepy, sys
from time import sleep
from credentials import *

# Access and authorize our Twitter credentials from credentials.py
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

box = [-86.33, 41.63, -86.20, 41.74]

class CustomStreamListener(tweepy.StreamListener):
    def on_error(self, status_code):
        print >> sys.stderr, 'Encountered error with status code:', status_code
        return True  # Don't kill the stream

    def on_timeout(self):
        print >> sys.stderr, 'Timeout...'
        return True  # Don't kill the stream

stream = tweepy.streaming.Stream(auth, CustomStreamListener()).filter(locations=box).items(50)
stream
I tried the method from http://docs.tweepy.org/en/v3.4.0/auth_tutorial.html#auth-tutorial, but apparently it is not working for me. Here is my code below. Would you mind giving any input? Let me know if you have some working code. Thanks!
# Import Tweepy, sys, sleep, credentials.py
import tweepy, sys
from time import sleep
from credentials import *

# Access and authorize our Twitter credentials from credentials.py
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

# Assign coordinates to the variable
box = [-74.0, 40.73, -73.0, 41.73]

import tweepy

# override tweepy.StreamListener to add logic to on_status
class MyStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        print(status.text)

    def on_error(self, status_code):
        if status_code == 420:
            # returning False in on_data disconnects the stream
            return False

myStreamListener = MyStreamListener()
myStream = tweepy.Stream(auth=api.auth, listener=myStreamListener())
myStream.filter(track=['python'], locations=(box), async=True)
Here is the error message:
Traceback (most recent call last):
File "test.py", line 26, in <module>
myStream = tweepy.Stream(auth = api.auth, listener=myStreamListener())
TypeError: 'MyStreamListener' object is not callable
PROBLEM SOLVED! SEE SOLUTION BELOW
After another round of debug, here is the solution for one who may have interest in the same topic:
# Import Tweepy, sys, sleep, credentials.py
try:
    import json
except ImportError:
    import simplejson as json

import tweepy, sys
from time import sleep
from credentials import *

# Access and authorize our Twitter credentials from credentials.py
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

# Assign coordinates to the variable
box = [-74.0, 40.73, -73.0, 41.73]

import tweepy

# override tweepy.StreamListener to add logic to on_status
class MyStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        print(status.text.encode('utf-8'))

    def on_error(self, status_code):
        if status_code == 420:
            # returning False in on_data disconnects the stream
            return False

myStreamListener = MyStreamListener()
myStream = tweepy.Stream(api.auth, listener=myStreamListener)
myStream.filter(track=['NYC'], locations=(box), async=True)
Core Problem:
I think you're misunderstanding what the Stream is here.
Tl;dr: your code is working; you're just not doing anything with the data that comes back.
The REST API call is a single request for information. You make a request, Twitter sends back some information, and it gets assigned to your variable.
The Stream object (which you've created as stream) from Tweepy opens a connection to Twitter with your search parameters, and Twitter, well, streams tweets to it. Forever.
From the Tweepy docs:
The streaming api is quite different from the REST api because the REST api is used to pull data from twitter but the streaming api pushes messages to a persistent session. This allows the streaming api to download more data in real time than could be done using the REST API.
So, you need to build a handler (a StreamListener, in Tweepy's terminology), like this one that prints out the tweets.
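A bare-bones listener along those lines, written against the same pre-v4 tweepy.StreamListener API used in the rest of this answer, might look like:

import tweepy

# Credentials as set up earlier in the question (placeholders here)
auth = tweepy.OAuthHandler('CONSUMER_KEY', 'CONSUMER_SECRET')
auth.set_access_token('ACCESS_TOKEN', 'ACCESS_TOKEN_SECRET')

class PrintStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        print(status.text)  # just print each incoming tweet

    def on_error(self, status_code):
        print('Stream error:', status_code)
        return True  # keep the stream alive

# Open the stream on the question's bounding box; filter() blocks and keeps
# delivering tweets to the listener until you disconnect.
stream = tweepy.Stream(auth, PrintStreamListener())
stream.filter(locations=[-86.33, 41.63, -86.20, 41.74])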
Additional
Word of warning, from bitter experience - if you're going to try and save the tweets to a database: Twitter can, and will, stream objects to you much faster than you can save them to the database. This will result in your Stream being disconnected, because the tweets back up at Twitter, and over a certain level of backed-up-ness (not an actual phrase), they'll just disconnect you.
I handled this by using django-rq to put save jobs into a job queue - this way, I could handle hundreds of tweets a second (at peak), and it would smooth out. You can see how I did this below. Python-rq would also work if you're not using Django as a framework around it. The read_both method is just a function that reads the tweet and saves it to a Postgres database. In my specific case, I did that via the Django ORM, using the django_rq.enqueue function.
__author__ = 'iamwithnail'

from django.core.management.base import BaseCommand, CommandError
from django.db.utils import DataError
from harvester.tools import read_both
import django_rq


class Command(BaseCommand):
    args = '<search_string search_string>'
    help = "Opens a listener to the Twitter stream, and tracks the given string or list" \
           "of strings, saving them down to the DB as they are received."

    def handle(self, *args, **options):
        try:
            import urllib3.contrib.pyopenssl
            urllib3.contrib.pyopenssl.inject_into_urllib3()
        except ImportError:
            pass

        consumer_key = '***'
        consumer_secret = '****'
        access_token = '****'
        access_token_secret_var = '****'

        import tweepy
        import json

        # This is the listener, responsible for receiving data
        class StdOutListener(tweepy.StreamListener):
            def on_data(self, data):
                decoded = json.loads(data)
                try:
                    if decoded['lang'] == 'en':
                        django_rq.enqueue(read_both, decoded)
                    else:
                        pass
                except KeyError, e:
                    print "Error on Key", e
                except DataError, e:
                    print "DataError", e
                return True

            def on_error(self, status):
                print status

        l = StdOutListener()
        auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
        auth.set_access_token(access_token, access_token_secret_var)
        stream = tweepy.Stream(auth, l)
        stream.filter(track=args)
Edit: Your subsequent problem is caused by calling the listener wrongly.
myStreamListener = MyStreamListener() #creates an instance of your class
Where you have this:
myStream = tweepy.Stream(auth = api.auth, listener=myStreamListener())
You're trying to call the listener as a function when you use the (). So it should be:
myStream = tweepy.Stream(auth = api.auth, listener=myStreamListener)
And in fact, can probably just be more succinctly written as:
myStream = tweepy.Stream(api.auth,myStreamListener)

count parameter is ignored when querying user_timeline in tweepy

I'm trying to use the tweepy library in one of my python projects. When I try the following code that creates a tweepy cursor to fetch a user's timeline status messages, the count parameter is always ignored.
def search(self, username, keyword, consumer_key, consumer_secret, access_token, access_token_secret):
    # start twitter auth
    try:
        auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
        auth.set_access_token(access_token, access_token_secret)
        api = tweepy.API(auth)
        user = api.get_user(username)
    except Exception as e:
        print(str(e))
        self.error = str(e)
        return

    self.followercount = user.followers_count
    self.screenname = user.screen_name
    results = []

    for status in tweepy.Cursor(api.user_timeline, id=username, count=2).items():
        try:
            tweet = status._json
In this instance, count is set to 2 on the Cursor, yet it retrieves all of the user's tweets. What am I doing wrong?
tweepy.Cursor() does not appear to recognize a count argument. In fact, count is not mentioned anywhere in tweepy/cursor.py, the module where tweepy.Cursor is defined. Instead, it looks like you might want to use:
for status in tweepy.Cursor(api.user_timeline, id=username).items(2):
passing the limit to items() instead of as the count keyword argument. See this section in the tweepy Cursor tutorial.
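Put back into the original loop, that looks like:

# Same loop as in the question, with the limit given to items()
# instead of being passed as count= to Cursor
for status in tweepy.Cursor(api.user_timeline, id=username).items(2):
    tweet = status._json
    # ... process the tweet as before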

LinkedIn API Python Key Error 2.7

This code is available online to build a map of your connections on LinkedIn.
It uses the LinkedIn API.
I'm able to connect fine, and everything runs okay until the last step of actually writing the data to a CSV.
Whenever I run the code
import oauth2 as oauth
import urlparse
import simplejson
import codecs

CONSUMER_KEY = "xxx"
CONSUMER_SECRET = "xxx"
OAUTH_TOKEN = "xxx"
OAUTH_TOKEN_SECRET = "xxx"
OUTPUT = "linked.csv"


def linkedin_connections():
    # Use your credentials to build the oauth client
    consumer = oauth.Consumer(key=CONSUMER_KEY, secret=CONSUMER_SECRET)
    token = oauth.Token(key=OAUTH_TOKEN, secret=OAUTH_TOKEN_SECRET)
    client = oauth.Client(consumer, token)

    # Fetch first degree connections
    resp, content = client.request('http://api.linkedin.com/v1/people/~/connections?format=json')
    results = simplejson.loads(content)

    # File that will store the results
    output = codecs.open(OUTPUT, 'w', 'utf-8')

    # Loop thru the 1st degree connections and see how they connect to each other
    for result in results["values"]:
        con = "%s %s" % (result["firstName"].replace(",", " "), result["lastName"].replace(",", " "))
        print >>output, "%s,%s" % ("John Henry", con)

        # This is the trick, use the search API to get related connections
        u = "https://api.linkedin.com/v1/people/%s:(relation-to-viewer:(related-connections))?format=json" % result["id"]
        resp, content = client.request(u)
        rels = simplejson.loads(content)
        try:
            for rel in rels['relationToViewer']['relatedConnections']['values']:
                sec = "%s %s" % (rel["firstName"].replace(",", " "), rel["lastName"].replace(",", " "))
                print >>output, "%s,%s" % (con, sec)
        except:
            pass


if __name__ == '__main__':
    linkedin_connections()
When I run this I get an error message:
Traceback (most recent call last):
File "linkedin-2-query.py", line 51, in <module>
linkedin_connections()
File "linkedin-2-query.py", line 35, in linkedin_connections
for result in results["values"]:
KeyError: 'values'
Any suggestions or help would be greatly appreciated!
I encountered the same issue working through the post Visualizing your LinkedIn graph using Gephi – Part 1.
Python raises a KeyError whenever a dict() object is accessed (using the form a = adict[key]) and the key is not in the dictionary. (KeyError - Python Wiki)
After searching a bit and adding some print statements, I realized that my OAuth session had expired, so the OAuth token in my linkedin-2-query.py script was no longer valid.
Since the OAuth token is invalid, the LinkedIn API does not return a dictionary with the key "values" as the script expects. Instead, the API returns the string 'N'. Python tries to find the dict key "values" in the string 'N', fails, and raises the KeyError: 'values'.
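If you want the failure to be more obvious the next time the token expires, a small guard (a hypothetical addition, not part of the original script) can check the response before the loop:

results = simplejson.loads(content)

# If the OAuth token has expired, the response is not the expected dict,
# so fail with a readable message instead of a bare KeyError further down.
if "values" not in results:
    raise RuntimeError("Unexpected API response (expired OAuth token?): %r" % results)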
So a new, valid OAuth token & secret should get the API to return a dict containing connection data.
I run the linkedin-1-oauth.py script again, and then visit the LinkedIn Application details page to find my new OAuth token. (The screenshot omits the values for my app. You should see alphanumeric values for each Key, Token, & Secret.)
...
I then update my linkedin-2-query.py script with the new OAuth User Token and OAuth User Secret
OAUTH_TOKEN = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" # your updated OAuth User Token
OAUTH_TOKEN_SECRET = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" # your updated OAuth User Secret
After updating the OAuth token & secret, I immediately run my linkedin-2-query.py script. Hooray, it runs without errors and retrieves my connection data from the API.
