Alternative to Twitter API - python

I'm working on a project where I stream tweets from the Twitter API, then apply sentiment analysis and visualize the results on an interactive, colorful map.
I've tried the 'tweepy' library in Python, but the problem is that it only retrieves a few tweets (10 or fewer).
Also, I'm going to specify the language and the location, which means I might get even fewer tweets! I need real-time streaming of hundreds or thousands of tweets.
This is the code I tried (just in case):
import os
import tweepy
from textblob import TextBlob
port = os.getenv('PORT', '8080')
host = os.getenv('IP', '0.0.0.0')
# Step 1 - Authenticate
consumer_key= 'xx'
consumer_secret= 'xx'
access_token='xx'
access_token_secret='xx'
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
#Step 3 - Retrieve Tweets
public_tweets = api.search('school')
for tweet in public_tweets:
    print(tweet.text)
    analysis = TextBlob(tweet.text)
    print(analysis)
Are there any better alternatives? I found "PubNub", which is a JavaScript API, but for now I want something in Python since it is easier for me.
Thank you

If you want a large number of tweets, I would recommend using Twitter's streaming API through tweepy:
# Create a stream listener:
import tweepy
tweets = []
class MyStreamListener(tweepy.StreamListener):
    # The next function defines what to do when a tweet is delivered by the streaming API
    def on_status(self, status):
        tweets.append(status.text)
# Create a stream:
myStreamListener = MyStreamListener()
myStream = tweepy.Stream(auth = api.auth, listener=myStreamListener)
# Filter streamed tweets by the keyword 'school':
myStream.filter(track=['school'], languages=['en'])
Note that the track filter used here belongs to the standard, free filtering API. There is a separate API called PowerTrack, built for enterprises that need more filtering rules.
Ref: https://developer.twitter.com/en/docs/tweets/filter-realtime/overview/statuses-filter
Otherwise, if you want to stick to the search method, you can request a maximum of 100 tweets per call by adding count, and pass the highest id seen so far as since_id to get only newer tweets. You can add those parameters to the search method as follows:
public_tweets = []
max_id = 0
for i in range(10): # This loop will run 10 times; you can play around with that
    public_tweets.extend(api.search(q='school', count=100, since_id=max_id))
    max_id = max([tweet.id for tweet in public_tweets])
# To make sure you only got unique tweets, you can do:
unique_tweets = list({tweet._json['id']:tweet._json for tweet in public_tweets}.values())
With this approach you have to be careful with the API's rate limits. You can handle them by enabling the wait_on_rate_limit option when you initialize the API: api = tweepy.API(auth, wait_on_rate_limit=True)
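The polling-and-deduplication idea can be tried without touching the API. This is only a sketch: `collect_new_tweets`, `fake_fetch` and `FAKE_TIMELINE` are made-up names, and the fetch function is a stand-in for whatever wraps your real search call. It assumes each tweet is a dict with an integer "id".

```python
def collect_new_tweets(fetch, rounds):
    """Poll fetch(since_id) several times, keeping one copy of each tweet.

    `fetch` is a hypothetical callable returning a list of dicts that each
    have an integer "id"; with tweepy it could wrap the search call.
    """
    seen = {}        # id -> tweet, so a repeated id overwrites instead of piling up
    since_id = None  # None means "no lower bound" on the first request
    for _ in range(rounds):
        batch = fetch(since_id)
        if not batch:
            continue  # nothing new this round; keep the old since_id
        for tweet in batch:
            seen[tweet["id"]] = tweet
        since_id = max(seen)  # highest id seen so far
    return list(seen.values())


# Stand-in for the API: returns every fake tweet newer than since_id.
FAKE_TIMELINE = [{"id": i, "text": "tweet %d" % i} for i in range(1, 6)]


def fake_fetch(since_id):
    return [t for t in FAKE_TIMELINE if since_id is None or t["id"] > since_id]
```

Keying the dictionary on the id does the same job as the unique_tweets line above, and skipping empty batches avoids calling max() on an empty list.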

Related

Obtain tweets with hashtags in them - Tweepy

I have a basic program working using the Tweepy API. It essentially grabs tweets from a user and outputs them to a terminal. Ideally, I'd like this to be automated, so when the user tweets, the program will see it and display the tweet. But that's a question for another time.
What I'd like to do now, however, is grab only the tweets that have a hashtag in them.
How do I go about this? I'm hoping it's a parameter I can add inside the timeline function..?
Here is a snippet of the code I have at the moment:
import tweepy
import twitter_credentials
auth = tweepy.OAuthHandler(twitter_credentials.CONSUMER_KEY, twitter_credentials.CONSUMER_SECRET)
auth.set_access_token(twitter_credentials.ACCESS_TOKEN, twitter_credentials.ACCESS_TOKEN_SECRET)
api = tweepy.API(auth)
stuff = api.user_timeline(screen_name = 'XXXXXX', count = 10, include_rts = False)
for status in stuff:
    print(status.text)
For a simple use case you can use # in the search string, for example:
api = tweepy.API(auth,wait_on_rate_limit=True)
for tweet in tweepy.Cursor(api.search,q="#",count=100).items():
    print(tweet)
This will give you tweets that contain any hashtags.
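If you would rather keep using the timeline call, another option is to filter the results locally. A minimal sketch, assuming each tweet is available as a dict (for example status._json) in the v1.1 shape, where hashtags are listed under entities -> hashtags. The helper names `hashtags_of` and `with_hashtags` are made up for this example.

```python
import re

HASHTAG_RE = re.compile(r"#\w+")


def hashtags_of(tweet):
    """Return the hashtags of a tweet given as a dict (e.g. status._json)."""
    entities = tweet.get("entities") or {}
    tags = [h["text"] for h in entities.get("hashtags", [])]
    if tags:
        return tags
    # Fallback: scan the text when no entities are present.
    return [m[1:] for m in HASHTAG_RE.findall(tweet.get("text", ""))]


def with_hashtags(tweets):
    """Keep only the tweets that carry at least one hashtag."""
    return [t for t in tweets if hashtags_of(t)]
```

With tweepy this could be called as with_hashtags([s._json for s in stuff]), where stuff is the list returned by user_timeline in the question.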

How to stream tweets using tweepy from a start date to end date using python?

I am currently doing some research using sentiment analysis on Twitter data regarding a certain topic (which isn't important to this question) using Python, in which I am a beginner. I understand that the Twitter streaming API limits users to the previous 7 days unless you apply for full enterprise search, which opens up the whole archive. I was recently given access to the full archive for this research project by Twitter, but I am unable to specify a start and end date for the tweets I would like to stream into a CSV file. This is my code:
import pandas as pd
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
ckey = 'xxxxxxxxxxxxxxxxxxxxxxx'
csecret = 'xxxxxxxxxxxxxxxxxxxxxxx'
atoken = 'xxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxx'
asecret = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxx'
# =============================================================================
# def sentimentAnalysis(text):
# output = '0'
# return output
# =============================================================================
class listener(StreamListener):
    def on_data(self, data):
        tweet = data.split(',"text":"')[1].split('","source')[0]
        saveMe = tweet+'::'+'\n'
        output = open('output.csv','a')
        output.write(saveMe)
        output.close()
        return True
    def on_error(self, status):
        print(status)
auth = OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
twitterStream = Stream(auth, listener())
twitterStream.filter(track=["#weather"], languages = ["en"])
Now this code streams Twitter data from the past 7 days perfectly. I tried changing the bottom line to
twitterStream.filter(track=["#weather"], languages = ["en"], since = ["2016-06-01"])
but this returns the following error: filter() got an unexpected keyword argument 'since'.
What would be the correct way to filter by a given date range?
Tweepy does not provide the "since" argument, as you can check yourself here.
To achieve the desired output, you will have to use api.user_timeline, iterating through pages until the desired date is reached, e.g.:
import datetime
import time
import tweepy
# The consumer keys can be found on your application's Details
# page located at https://dev.twitter.com/apps (under "OAuth settings")
consumer_key=""
consumer_secret=""
# The access tokens can be found on your application's Details
# page located at https://dev.twitter.com/apps (located
# under "Your access token")
access_token=""
access_token_secret=""
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
# username, YEAR, MONTH and DAY are placeholders: fill in your own values
page = 1
stop_loop = False
while not stop_loop:
    tweets = api.user_timeline(username, page=page)
    if not tweets:
        break
    for tweet in tweets:
        # The timeline comes newest first, so stop at the first tweet older
        # than the start date. created_at is a datetime, so compare it with a
        # datetime, not a date (timezone-aware in tweepy 4 and later).
        if tweet.created_at < datetime.datetime(YEAR, MONTH, DAY):
            stop_loop = True
            break
        # Do the tweet process here
    page+=1
    time.sleep(500)
Note that you will need to update the code to fit your needs, this is just a general solution.

Tweepy Streaming filter fields

I have this Python code that retrieves data from Twitter with Tweepy and the Streaming API, and it stops when it has found 1000 results (that is, the data of 1000 tweets).
It works well, but the problem is that when I try to run it in PyCharm, it cuts off part of the results. Since the code returns all the data of each tweet (ID, text, author, etc.), it probably generates too much data and the software crashes. So I'd like to modify the code in order to get only some fields of the Twitter data (e.g. I need only the text of the tweet, the author and the date).
Any suggestion is appreciated.
# Import the necessary package to process data in JSON format
try:
    import json
except ImportError:
    import simplejson as json
# Import the necessary methods from "twitter" library
from twitter import Twitter, OAuth, TwitterHTTPError, TwitterStream
# Variables that contains the user credentials to access Twitter API
ACCESS_TOKEN = ''
ACCESS_SECRET = ''
CONSUMER_KEY = ''
CONSUMER_SECRET = ''
oauth = OAuth(ACCESS_TOKEN, ACCESS_SECRET, CONSUMER_KEY, CONSUMER_SECRET)
# Initiate the connection to Twitter Streaming API
twitter_stream = TwitterStream(auth=oauth)
# Get a sample of the public data following through Twitter
#iterator = twitter_stream.statuses.sample() # simple Twitter streaming
# Twitter streaming based on a search track and a language. For other settings and
# search parameters see https://dev.twitter.com/streaming/overview/request-parameters
iterator = twitter_stream.statuses.filter(track="Euro2016", language="en")
# Print each tweet in the stream to the screen
# Here we set it to stop after getting 1000 tweets.
# You don't have to set it to stop, but can continue running
# the Twitter API to collect data for days or even longer.
tweet_count = 1000 # how many results to return
for tweet in iterator:
    tweet_count -= 1
    # Twitter Python Tool wraps the data returned by Twitter
    # as a TwitterDictResponse object.
    # We convert it back to the JSON format to print/score
    print(json.dumps(tweet))
    # The command below will do pretty printing for JSON data, try it out
    # print(json.dumps(tweet, indent=4))
    if tweet_count <= 0:
        break
I was able to run this in PyCharm without any issues for 1000 tweets. So try running it on another computer, or investigate whether there is a problem with your existing system.
Each result is a Python dictionary, so you can access individual elements by key, as below:
for tweet in iterator:
    tweet_count -= 1
    # access the elements such as 'text', 'created_at' ...
    print(tweet['text'])
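For the three fields asked about (text, author, date), a small helper can reduce each dictionary before printing. This is a sketch that assumes the v1.1 tweet layout, where the author's handle sits under user -> screen_name. The function name `pick_fields` is made up. A stream can also deliver messages that are not tweets (for example delete notices), so the helper returns None for anything without a "text" key.

```python
import json


def pick_fields(tweet):
    """Reduce a tweet dict to text, author and date; None for non-tweets."""
    if "text" not in tweet:
        return None  # e.g. delete or limit notices carry no "text" key
    return {
        "text": tweet["text"],
        "author": tweet.get("user", {}).get("screen_name"),
        "created_at": tweet.get("created_at"),
    }


# A hand-written sample in the same shape as a streamed tweet.
raw = '{"id": 1, "text": "Goal!", "user": {"screen_name": "fan"}, "created_at": "Fri Jun 10 19:00:00 +0000 2016"}'
small = pick_fields(json.loads(raw))
```

Inside the loop you would then print json.dumps(pick_fields(tweet)) instead of the whole tweet, skipping the None results.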

Filter Twitter feeds only by language

I am using the Tweepy API for extracting Twitter feeds. I want to extract all Twitter feeds of a specific language only. The language filter works only if a track filter is provided. The following code returns a 406 error:
l = StdOutListener()
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
stream = Stream(auth, l)
stream.filter(languages=["en"])
How can I extract all the tweets from certain language using Tweepy?
You can't (without special access). Streaming all the tweets (unfiltered) requires a connection to the firehose, which is granted only in specific use cases by Twitter. Honestly, the firehose isn't really necessary--proper use of track can get you more tweets than you know what to do with.
Try using something like this:
stream.filter(languages=["en"], track=["a", "the", "i", "you", "u"]) # etc
Filtering by words like that will get you many, many tweets. If you want real data for the most-used words, check out this article from Time: The 500 Most Frequently Used Words on Twitter. You can use up to 400 keywords, but that will likely approach the 1% limit of tweets at a given time interval. If your track parameter matches 60% of all tweets at a given time, you will still only get 1% (which is a LOT of tweets).
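If you build the keyword list from a word-frequency source, it helps to clean it up before passing it to track. A rough sketch: the helper name `build_track_list` is made up, the 400 cap is simply the figure quoted above (check the current documentation), and lowercasing assumes that track matching ignores case.

```python
MAX_TRACK_TERMS = 400  # limit quoted in the answer above; verify against current docs


def build_track_list(words, limit=MAX_TRACK_TERMS):
    """Lowercase, de-duplicate (keeping first-seen order) and cap a keyword list."""
    seen = set()
    terms = []
    for word in words:
        term = word.strip().lower()
        if term and term not in seen:
            seen.add(term)
            terms.append(term)
    return terms[:limit]
```

The result could then be passed along as stream.filter(languages=["en"], track=build_track_list(words)).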
Try the lang='en' parameter in Cursor(), e.g.
tweepy.Cursor(.. lang='en')
Instead of requesting filtered tweets directly, you can also fetch tweets in all languages and filter them afterwards:
tweets = api.search("python")
for tweet in tweets:
    if tweet.lang == "en":
        print(tweet.text)
        # Do the stuff here
Hope it helps.
You can see the arguments of the filter method in the GitHub source: https://github.com/tweepy/tweepy/blob/master/tweepy/streaming.py
They are:
filter(self, follow=None, track=None, is_async=False, locations=None,
stall_warnings=False, languages=None, encoding='utf8', filter_level=None):
Pass languages as a list of ISO 639-1 codes. So, to filter by language, just use:
import json
from tweepy import OAuthHandler, Stream
from tweepy.streaming import StreamListener
class Listener(StreamListener):
    def on_data(self, data):
        j = json.loads(data)
        t = {
            'screenName' : j['user']['screen_name'],
            'text': j['text']
        }
        print(t)
        return True
    def on_status(self, status):
        print(status.text)
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
stream = Stream(auth=auth, listener=Listener(),wait_on_rate_limit=True,wait_on_rate_limit_notify=True)
stream.filter(track=['Trump'],languages=["en","fr","es"])
Tweepy's search lets you fetch tweets in a specific language. Use an ISO 639-1 code as the value of the language parameter.
The following code fetches tweets with their full text in the specified language (English in this example):
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
tweets = api.search(q = keywordtosearch, lang = 'en', count = 100, truncated = False, tweet_mode = 'extended')
for tweet in tweets:
    print(tweet.full_text)
    # add your code
This worked for me.
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
a=input("Enter Tag: ")
tweets = api.search(a, count=200)
a=[]
for tweet in tweets:
    if tweet.lang == "en":
        a.append(tweet.text)
With the help of GetOldTweets3 (https://pypi.org/project/GetOldTweets3/), you can download tweets (even old ones) by filtering on a few criteria, as shown below:
import GetOldTweets3 as got
tweetCriteria = got.manager.TweetCriteria().setQuerySearch('Coronavirus')\
    .setSince("2020-02-15")\
    .setUntil("2020-03-29")\
    .setMaxTweets(5)\
    .setNear('India')\
    .setLang('en')
tweets = got.manager.TweetManager.getTweets(tweetCriteria)
for tweet in tweets:
    print(tweet.text)
    print(tweet.date)
    print(tweet.geo)
    print(tweet.id)
    print(tweet.permalink)
    print(tweet.username)
    print(tweet.retweets)
    print(tweet.favorites)
    print(tweet.mentions)
    print(tweet.hashtags)
    print('*'*50)

Return a users tweets with tweepy

I am using tweepy and python 2.7.6 to return the tweets of a specified user
My code looks like:
import tweepy
ckey = 'myckey'
csecret = 'mycsecret'
atoken = 'myatoken'
asecret = 'myasecret'
auth = tweepy.OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
api = tweepy.API(auth)
stuff = api.user_timeline(screen_name = 'danieltosh', count = 100, include_rts = True)
print stuff
However, this yields a set of messages which look like <tweepy.models.Status object at 0x7ff2ca3c1050>
Is it possible to print out useful information from these objects? Where can I find all of their attributes?
Unfortunately, the Status model is not really well documented in the tweepy docs.
The user_timeline() method returns a list of Status object instances. You can explore the available properties and methods using dir(), or look at the actual implementation.
For example, from the source code you can see that there are author, user and other attributes:
for status in stuff:
    print status.author, status.user
Or, you can print out the _json attribute value, which contains the actual response of an API call:
for status in stuff:
    print status._json
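The dir() approach can be narrowed down to just the data attributes. A small sketch, using a made-up FakeStatus class as a stand-in because the real Status class has many more attributes. The helper name `public_attributes` is also hypothetical.

```python
def public_attributes(obj):
    """Names of non-callable attributes that do not start with an underscore."""
    names = []
    for name in dir(obj):
        if name.startswith("_"):
            continue
        if callable(getattr(obj, name)):
            continue
        names.append(name)
    return names


class FakeStatus(object):
    """Stand-in for tweepy's Status; the real class has many more attributes."""

    def __init__(self):
        self.id = 1
        self.text = "hello"
        self._json = {"id": 1, "text": "hello"}

    def destroy(self):
        pass


attrs = public_attributes(FakeStatus())
```

Calling public_attributes(stuff[0]) on a real result should list names such as text and created_at, which you can then read with getattr or plain attribute access.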
import tweepy
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
# set parser=tweepy.parsers.JSONParser() if you want a nice printed json response.
userID = "userid"
user = api.get_user(userID)
tweets = api.user_timeline(screen_name=userID,
                           # 200 is the maximum allowed count
                           count=200,
                           include_rts = False,
                           # Necessary to keep full_text
                           # otherwise only the first 140 characters are extracted
                           tweet_mode = 'extended'
                           )
for info in tweets[:3]:
    print("ID: {}".format(info.id))
    print(info.created_at)
    print(info.full_text)
    print("\n")
Credit to https://fairyonice.github.io/extract-someones-tweet-using-tweepy.html
In Twitter API v2, getting the tweets of a specified user is fairly easy, provided that you don't exceed the limit of 3200 tweets. See the documentation for more info.
import tweepy
# create client object
client = tweepy.Client(
    bearer_token=TWITTER_BEARER_TOKEN,
    consumer_key=TWITTER_API_KEY,
    consumer_secret=TWITTER_API_KEY_SECRET,
    access_token=TWITTER_ACCESS_TOKEN,
    access_token_secret=TWITTER_TOKEN_SECRET,
)
# retrieve first n=`max_results` tweets
tweets = client.get_users_tweets(id=user_id, **kwargs)
# retrieve using pagination until no tweets left
tweets_list = []
while True:
    if not tweets.data:
        break
    tweets_list.extend(tweets.data)
    if not tweets.meta.get('next_token'):
        break
    tweets = client.get_users_tweets(
        id=user_id,
        pagination_token=tweets.meta['next_token'],
        **kwargs,
    )
The tweets_list is going to be a list of tweepy.tweet.Tweet objects.
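The same token-based loop can be written once as a generator and reused. This is a sketch, not tweepy's own code: `iter_all_pages`, `fake_get_tweets` and `FAKE_RESPONSES` are made-up names, and the fake responses only mimic the .data and .meta shape used above.

```python
from types import SimpleNamespace


def iter_all_pages(fetch, **kwargs):
    """Yield items from every page of a token-paginated endpoint.

    `fetch(pagination_token=..., **kwargs)` is a hypothetical callable returning
    an object with `.data` (list or None) and `.meta` (dict).
    """
    token = None
    while True:
        response = fetch(pagination_token=token, **kwargs)
        if not response.data:
            return
        for item in response.data:
            yield item
        token = response.meta.get("next_token")
        if not token:
            return


# Stand-in for the client: two pages linked by a next_token.
FAKE_RESPONSES = {
    None: SimpleNamespace(data=["t1", "t2"], meta={"next_token": "abc"}),
    "abc": SimpleNamespace(data=["t3"], meta={}),
}


def fake_get_tweets(pagination_token=None, **kwargs):
    return FAKE_RESPONSES[pagination_token]


all_tweets = list(iter_all_pages(fake_get_tweets))
```

With a real client the call would look roughly like iter_all_pages(client.get_users_tweets, id=user_id). Recent tweepy versions also ship a Paginator helper that covers the same pattern, so check whether your installed version has it before writing your own.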
