Tweepy Streaming filter fields - python

I have this Python code that retrieves data from Twitter using Tweepy and the Streaming API; it stops once it has collected 1000 results (that is, data for 1000 tweets).
It works well, but the problem is that when I run it in PyCharm, part of the results gets cut off. Since the code returns all the data of each tweet (ID, text, author, etc.), it probably generates too much data and the software crashes. So I'd like to modify the code to get only some fields of the Twitter data (e.g. I only need the text of the tweet, the author and the date).
Any suggestions are appreciated.
# Import the necessary package to process data in JSON format
try:
    import json
except ImportError:
    import simplejson as json
# Import the necessary methods from the "twitter" library
from twitter import Twitter, OAuth, TwitterHTTPError, TwitterStream
# Variables that contain the user credentials to access the Twitter API
ACCESS_TOKEN = ''
ACCESS_SECRET = ''
CONSUMER_KEY = ''
CONSUMER_SECRET = ''
oauth = OAuth(ACCESS_TOKEN, ACCESS_SECRET, CONSUMER_KEY, CONSUMER_SECRET)
# Initiate the connection to the Twitter Streaming API
twitter_stream = TwitterStream(auth=oauth)
# Get a sample of the public data flowing through Twitter
#iterator = twitter_stream.statuses.sample()  # simple Twitter streaming
# Twitter streaming filtered by a search track and language; for other settings see https://dev.twitter.com/streaming/overview/request-parameters
iterator = twitter_stream.statuses.filter(track="Euro2016", language="en")
# Print each tweet in the stream to the screen
# Here we set it to stop after getting 1000 tweets.
# You don't have to set it to stop, but can continue running
# the Twitter API to collect data for days or even longer.
tweet_count = 1000  # how many results to return
for tweet in iterator:
    tweet_count -= 1
    # Twitter Python Tool wraps the data returned by Twitter
    # as a TwitterDictResponse object.
    # We convert it back to the JSON format to print/store
    print(json.dumps(tweet))
    # The command below will do pretty printing for JSON data, try it out
    # print(json.dumps(tweet, indent=4))
    if tweet_count <= 0:
        break

I was able to run this in PyCharm without any issues for 1000 tweets, so try running it on another computer, or check whether your current setup has a problem.
Each tweet is a Python dictionary, so you can access individual fields by key, like this:
for tweet in iterator:
    tweet_count -= 1
    # access elements such as 'text', 'created_at', ...
    print(tweet['text'])
    if tweet_count <= 0:
        break
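A minimal sketch of that idea, assuming the standard v1.1 tweet JSON layout (the author's handle under user.screen_name, the date in created_at): each message is reduced to the three fields the question asks for before it is printed. extract_fields and collect are hypothetical helper names, and messages that are not tweets (such as delete notices) have no "text" key, so they are skipped rather than counted.

```python
import json


def extract_fields(tweet):
    """Reduce a v1.1 tweet dict to its text, author and date.

    Returns None for anything that is not a tweet (e.g. a delete notice
    or a None value), so callers can skip it.
    """
    if not isinstance(tweet, dict) or "text" not in tweet:
        return None
    return {
        "text": tweet["text"],
        # the author's handle lives in the nested "user" object
        "author": tweet.get("user", {}).get("screen_name"),
        "date": tweet.get("created_at"),
    }


def collect(iterator, limit=1000):
    """Print and return up to `limit` reduced tweets from any iterable."""
    results = []
    for message in iterator:
        fields = extract_fields(message)
        if fields is None:
            continue  # not a tweet, so it does not count towards the limit
        print(json.dumps(fields))
        results.append(fields)
        if len(results) >= limit:
            break
    return results


# Assumed usage with the stream iterator from the question:
# rows = collect(iterator, limit=1000)
```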

Related

Obtain tweets with hashtags in them - Tweepy

I have a basic program working using the Tweepy API. It essentially grabs tweets from a user and outputs them to a terminal. Ideally, I'd like this to be automated, so that when the user tweets, the program sees it and displays the tweet. But that's a question for another time.
What I'd like to do now, however, is grab only the tweets that contain a hashtag.
How do I go about this? I'm hoping it's a parameter I can add inside the timeline function?
Here is a snippet of the code I have at the moment:
import tweepy
import twitter_credentials
auth = tweepy.OAuthHandler(twitter_credentials.CONSUMER_KEY, twitter_credentials.CONSUMER_SECRET)
auth.set_access_token(twitter_credentials.ACCESS_TOKEN, twitter_credentials.ACCESS_TOKEN_SECRET)
api = tweepy.API(auth)
stuff = api.user_timeline(screen_name = 'XXXXXX', count = 10, include_rts = False)
for status in stuff:
    print(status.text)
For a simple use case you can use # in the search string, for example:
api = tweepy.API(auth, wait_on_rate_limit=True)
# api.search is the Tweepy 3.x name; in Tweepy 4.x it is api.search_tweets
for tweet in tweepy.Cursor(api.search, q="#", count=100).items():
    print(tweet)
This will give you tweets that contain any hashtag.
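Searching for "#" is one option; another, if you want to stay with user_timeline as in the question, is to filter the returned statuses yourself. The sketch below assumes v1.1-style Status objects whose entities attribute is a dict with a "hashtags" list (items like {"text": "python", "indices": [0, 7]}); has_hashtag and filter_hashtag_tweets are hypothetical names, and the demo uses SimpleNamespace stand-ins instead of real API results.

```python
from types import SimpleNamespace


def has_hashtag(status, tag=None):
    """Return True if the status has any hashtag, or the given one.

    Assumes a v1.1-style `entities` dict with a "hashtags" list.
    """
    entities = getattr(status, "entities", None) or {}
    hashtags = entities.get("hashtags", [])
    if tag is None:
        return bool(hashtags)
    wanted = tag.lstrip("#").lower()  # compare case-insensitively, without "#"
    return any(h.get("text", "").lower() == wanted for h in hashtags)


def filter_hashtag_tweets(statuses, tag=None):
    """Keep only statuses that carry a hashtag (optionally a specific one)."""
    return [s for s in statuses if has_hashtag(s, tag)]


if __name__ == "__main__":
    # stand-ins for Status objects returned by api.user_timeline
    demo = [
        SimpleNamespace(text="plain tweet", entities={"hashtags": []}),
        SimpleNamespace(text="#Python is fun",
                        entities={"hashtags": [{"text": "Python", "indices": [0, 7]}]}),
    ]
    for status in filter_hashtag_tweets(demo):
        print(status.text)
```

With the question's code this would be used as `for status in filter_hashtag_tweets(stuff): print(status.text)`. Note that it only filters the tweets already fetched, so count=10 may return fewer than 10 matches.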

How to stream tweets using tweepy from a start date to end date using python?

I am currently doing some research using sentiment analysis on Twitter data about a certain topic (which isn't important for this question) using Python, which I am new to. I understand that the Twitter streaming API limits users to the previous 7 days unless you apply for full enterprise search, which opens up the whole archive. Twitter recently gave me access to the full archive for this research project, but I am unable to specify a start and end date for the tweets I would like to stream into a CSV file. This is my code:
import json
import pandas as pd
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener  # Tweepy 3.x; StreamListener was removed in Tweepy 4.0
ckey = 'xxxxxxxxxxxxxxxxxxxxxxx'
csecret = 'xxxxxxxxxxxxxxxxxxxxxxx'
atoken = 'xxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxx'
asecret = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxx'
# =============================================================================
# def sentimentAnalysis(text):
#     output = '0'
#     return output
# =============================================================================
class listener(StreamListener):
    def on_data(self, data):
        # data is the raw JSON of one message; parsing it is more robust than
        # splitting the string on '"text":"' (which breaks on escaped quotes)
        tweet = json.loads(data).get('text')
        if tweet is None:
            return True  # not a tweet (e.g. a delete notice); keep streaming
        # one tweet per line, so replace any line breaks inside the text
        saveMe = tweet.replace('\n', ' ') + '::' + '\n'
        with open('output.csv', 'a', encoding='utf-8') as output:
            output.write(saveMe)
        return True

    def on_error(self, status):
        print(status)

auth = OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
twitterStream = Stream(auth, listener())
twitterStream.filter(track=["#weather"], languages=["en"])
Now this code streams Twitter data from the past 7 days perfectly. I tried changing the bottom line to
twitterStream.filter(track=["#weather"], languages = ["en"], since = ["2016-06-01"])
but this raises the error: filter() got an unexpected keyword argument 'since'.
What would be the correct way to filter by a given date frame?
Tweepy's filter() does not accept a "since" argument; the streaming endpoint only delivers tweets in real time, so it cannot go back to a past date.
To get the desired output, you will have to use api.user_timeline and iterate through pages until the desired date is reached, e.g.:
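import datetime
import tweepy
# The consumer keys can be found on your application's Details
# page located at https://dev.twitter.com/apps (under "OAuth settings")
consumer_key = ""
consumer_secret = ""
# The access tokens can be found on your application's Details
# page located at https://dev.twitter.com/apps (located
# under "Your access token")
access_token = ""
access_token_secret = ""
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
# wait_on_rate_limit makes Tweepy sleep when the rate limit is reached
api = tweepy.API(auth, wait_on_rate_limit=True)
username = "screen_name_here"  # placeholder: account whose tweets you want
start_date = datetime.datetime(2016, 6, 1)  # placeholder: earliest date to keep (UTC)
stop_loop = False
# The timeline is returned newest first; Cursor takes care of the paging
for page in tweepy.Cursor(api.user_timeline, screen_name=username, count=200).pages():
    for tweet in page:
        # created_at is UTC; drop any tzinfo so it compares with the naive start_date
        if tweet.created_at.replace(tzinfo=None) < start_date:
            stop_loop = True  # everything from here on is older than start_date
            break
        # Do the tweet processing here
    if stop_loop:
        break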
Note that you will need to adapt the code to your needs; this is just a general solution. Also keep in mind that user_timeline can only return up to 3,200 of a user's most recent tweets.
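The date check from the loop above can also be pulled out into a small generator that handles a start/end window and is easy to test. This is a sketch assuming newest-first input whose created_at values are UTC datetimes (naive or timezone-aware), as Tweepy returns them; tweets_between is a hypothetical name and the demo data is made up.

```python
import datetime
from types import SimpleNamespace


def tweets_between(tweets, start, end):
    """Yield tweets with start <= created_at < end from a newest-first iterable.

    Because timelines come back newest first, iteration stops at the first
    tweet older than `start`: nothing after it can fall inside the window.
    created_at is assumed to be UTC; any tzinfo is dropped so it compares
    with the naive `start` and `end` datetimes.
    """
    for tweet in tweets:
        created = tweet.created_at.replace(tzinfo=None)
        if created < start:
            break
        if created < end:
            yield tweet


if __name__ == "__main__":
    # made-up, newest-first stand-ins for Status objects
    timeline = [
        SimpleNamespace(id=3, created_at=datetime.datetime(2016, 7, 2)),
        SimpleNamespace(id=2, created_at=datetime.datetime(2016, 6, 15)),
        SimpleNamespace(id=1, created_at=datetime.datetime(2016, 5, 30)),
    ]
    window = tweets_between(timeline, datetime.datetime(2016, 6, 1),
                            datetime.datetime(2016, 7, 1))
    print([t.id for t in window])  # prints [2]
```

With Tweepy, the input could be something like `tweepy.Cursor(api.user_timeline, screen_name=username, count=200).items()`; since both the Cursor and the generator are lazy, no further pages should be requested once the window has been passed.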

Alternative to Twitter API

I'm working on a project where I stream tweets from the Twitter API, then apply sentiment analysis and visualize the results on an interactive, colorful map.
I've tried the 'tweepy' library in Python, but the problem is that it only retrieves a few tweets (10 or fewer).
Also, I'm going to specify the language and the location, which means I might get even fewer tweets! I need real-time streaming of hundreds or thousands of tweets.
This is the code I tried (just in case):
import os
import tweepy
from textblob import TextBlob
port = os.getenv('PORT', '8080')
host = os.getenv('IP', '0.0.0.0')
# Step 1 - Authenticate
consumer_key= 'xx'
consumer_secret= 'xx'
access_token='xx'
access_token_secret='xx'
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
#Step 3 - Retrieve Tweets
public_tweets = api.search('school')
for tweet in public_tweets:
    print(tweet.text)
    analysis = TextBlob(tweet.text)
    print(analysis)
Are there any better alternatives? I found "PubNub", which is a JavaScript API, but for now I want something in Python since it is easier for me.
Thank you
If you want a large number of tweets, I would recommend using Twitter's streaming API through Tweepy:
# Create a stream listener (tweepy.StreamListener exists in Tweepy 3.x; it was removed in 4.0):
import tweepy
tweets = []
class MyStreamListener(tweepy.StreamListener):
    # The next function defines what to do when a tweet is parsed by the streaming API
    def on_status(self, status):
        tweets.append(status.text)

# Create a stream:
myStreamListener = MyStreamListener()
myStream = tweepy.Stream(auth=api.auth, listener=myStreamListener)
# Filter streamed tweets by the keyword 'school':
myStream.filter(track=['school'], languages=['en'])
Note that the track filter used here is part of the standard (free) filtering API; there is another API, PowerTrack, built for enterprises that have more requirements and rules to filter on.
Ref: https://developer.twitter.com/en/docs/tweets/filter-realtime/overview/statuses-filter
Otherwise, if you want to stick with the search method, you can request up to 100 tweets per call by adding count, and pass the highest tweet ID seen so far as since_id to get only newer tweets. You can add those parameters to the search method as follows:
public_tweets = []
max_id = None  # no since_id on the first request
for i in range(10):  # This loop will run 10 times; you can play around with that
    # api.search is the Tweepy 3.x name; in Tweepy 4.x it is api.search_tweets
    public_tweets.extend(api.search(q='school', count=100, since_id=max_id))
    if public_tweets:  # max() would fail on an empty list
        max_id = max(tweet.id for tweet in public_tweets)
# To make sure you only got unique tweets, you can do:
unique_tweets = list({tweet._json['id']: tweet._json for tweet in public_tweets}.values())
With this approach you have to be careful with the API's rate limits; you can handle them by enabling the wait_on_rate_limit option when you initialize the API: api = tweepy.API(auth, wait_on_rate_limit=True)
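The since_id bookkeeping and the deduplication can be separated from the API call, as in this sketch. fetch is a hypothetical callable standing in for a search request; it is assumed to take the current since_id (None on the first call) and return a list of tweet dicts with an "id" key, for example the _json of Tweepy Status objects.

```python
def poll_new_tweets(fetch, rounds):
    """Call fetch(since_id) `rounds` times and return the unique tweets.

    fetch is a hypothetical callable that returns a list of tweet dicts,
    each with an "id" key, limited to tweets newer than since_id
    (None means no lower bound).
    """
    seen = {}  # tweet id -> tweet dict, so duplicates collapse
    since_id = None
    for _ in range(rounds):
        for tweet in fetch(since_id):
            seen[tweet["id"]] = tweet
        if seen:
            # the next round only asks for tweets newer than the newest seen so far
            since_id = max(seen)
    return list(seen.values())


# Assumed usage with Tweepy 3.x (api.search is api.search_tweets in Tweepy 4.x):
# fetch = lambda since_id: [t._json for t in api.search(q="school", count=100, since_id=since_id)]
# unique_tweets = poll_new_tweets(fetch, rounds=10)
```

One caveat of this design: the search returns the most recent matches first, so if more than count new tweets arrive between two calls, some of the older ones among them can be skipped.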

Twitter API: Pull Tweets by a user AND containing a keyword using Twython (Python)

I'm new to Python and Twython and could use some help with #3. I'm doing some research and want to pull tweets by a user AND containing a certain keyword. I'd also like to export them to CSV, but figuring out the query at all is the first part. Thanks :)
# Bring in the module Twython which pulls from Twitter's API
from twython import Twython, TwythonError
# Making variables for my twitter API keys
CONSUMER_KEY = 'my personal input here'
CONSUMER_SECRET = 'my personal input here'
ACCESS_KEY = 'my personal input here'
ACCESS_SECRET = 'my personal input here'
twitter = Twython(CONSUMER_KEY, CONSUMER_SECRET, ACCESS_KEY, ACCESS_SECRET)
## 1) This block works and searches for tweets containing a specific keyword
print(twitter.search(q='python'))
## 2) This block works as well and returns all tweets by username
try:
    user_timeline = twitter.get_user_timeline(screen_name='')
except TwythonError as e:
    print(e)
for tweets in user_timeline:
    print(tweets['text'])
## 3) I can't get this one to work. It is supposed to return all tweets by
# username AND containing keyword
try:
    user_timeline = twitter.get_user_timeline(screen_name='ThePSF') and twitter.search(q='lines')
except TwythonError as e:
    print(e)
for tweets in user_timeline and q:
    print(tweets['text'])
The Twitter API does not allow you to search within a user timeline; you would have to filter the results of #2 yourself.
An alternative to what you are doing in #3 is to just use the Search API (twitter.search in this case) and pass in a query that covers both cases. An example would be
twitter.search(q='from:ThePSF AND lines')
However, note that the search API is limited to 7 days of data.
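For #2, filtering the timeline yourself and exporting the matches to CSV (which the question also asks about) could look like this sketch. It assumes each timeline item is a dict with "id_str", "created_at" and "text" keys, as Twython returns the v1.1 JSON; filter_by_keyword and tweets_to_csv are hypothetical helper names, and the keyword match is a simple case-insensitive substring test.

```python
import csv
import io


def filter_by_keyword(timeline, keyword):
    """Keep tweet dicts whose text contains `keyword` (case-insensitive)."""
    needle = keyword.lower()
    return [t for t in timeline if needle in t.get("text", "").lower()]


def tweets_to_csv(tweets, fields=("id_str", "created_at", "text")):
    """Return CSV text with a header row and the chosen fields of each tweet."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops every key that is not in `fields`
    writer = csv.DictWriter(buffer, fieldnames=list(fields),
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for tweet in tweets:
        writer.writerow(tweet)
    return buffer.getvalue()


# Assumed usage with the #2 timeline from the question:
# matches = filter_by_keyword(user_timeline, "lines")
# with open("tweets.csv", "w", newline="", encoding="utf-8") as f:
#     f.write(tweets_to_csv(matches))
```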

Return a users tweets with tweepy

I am using Tweepy and Python 2.7.6 to return the tweets of a specified user.
My code looks like:
import tweepy
ckey = 'myckey'
csecret = 'mycsecret'
atoken = 'myatoken'
asecret = 'myasecret'
auth = tweepy.OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
api = tweepy.API(auth)
stuff = api.user_timeline(screen_name = 'danieltosh', count = 100, include_rts = True)
print stuff
However, this prints a list of objects that look like <tweepy.models.Status object at 0x7ff2ca3c1050>.
Is it possible to print out useful information from these objects? Where can I find all of their attributes?
Unfortunately, the Status model is not really well documented in the Tweepy docs.
The user_timeline() method returns a list of Status instances. You can explore the available properties and methods using dir(), or look at the actual implementation.
For example, from the source code you can see that there are author, user and other attributes:
for status in stuff:
    print status.author, status.user
Or, you can print out the _json attribute, which contains the actual response of the API call:
for status in stuff:
    print status._json
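To narrow dir() down to the data fields, a helper like the following sketch drops private names and methods. public_attributes is a hypothetical name; the function itself also runs on Python 2.7, but the demo uses types.SimpleNamespace, which needs Python 3. On a real Status object the result includes nested objects such as author and user.

```python
from types import SimpleNamespace


def public_attributes(obj):
    """Return the sorted names of obj's public, non-callable attributes.

    A narrowed version of dir(): names starting with "_" and anything
    callable (methods such as retweet) are left out.
    """
    return sorted(
        name for name in dir(obj)
        if not name.startswith("_") and not callable(getattr(obj, name))
    )


if __name__ == "__main__":
    # stand-in for a Status object; with Tweepy use an item of `stuff`
    status = SimpleNamespace(id=1, text="hi", retweet=lambda: None, _api=None)
    print(public_attributes(status))  # prints ['id', 'text']
```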
import tweepy
# consumer_key, consumer_secret, access_token and access_token_secret must be defined
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
# wait_on_rate_limit_notify exists only in Tweepy 3.x; drop it on Tweepy 4.x
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
# set parser=tweepy.parsers.JSONParser() if you want a nice printed json response.
userID = "userid"
user = api.get_user(userID)
tweets = api.user_timeline(screen_name=userID,
                           # 200 is the maximum allowed count
                           count=200,
                           include_rts=False,
                           # Necessary to keep full_text;
                           # otherwise only the first 140 characters are extracted
                           tweet_mode='extended'
                           )
for info in tweets[:3]:
    print("ID: {}".format(info.id))
    print(info.created_at)
    print(info.full_text)
    print("\n")
Credit to https://fairyonice.github.io/extract-someones-tweet-using-tweepy.html
In the Twitter API v2, getting the tweets of a specified user is fairly easy, as long as you don't need more than the 3,200 most recent tweets. See the documentation for more info.
import tweepy
# create the client object
client = tweepy.Client(
    bearer_token=TWITTER_BEARER_TOKEN,
    consumer_key=TWITTER_API_KEY,
    consumer_secret=TWITTER_API_KEY_SECRET,
    access_token=TWITTER_ACCESS_TOKEN,
    access_token_secret=TWITTER_TOKEN_SECRET,
)
user_id = 123456  # placeholder: numeric ID of the user
kwargs = {"max_results": 100}  # extra request parameters (100 is the per-page maximum)
tweets_list = []
# retrieve the first n=`max_results` tweets
tweets = client.get_users_tweets(id=user_id, **kwargs)
# keep paginating until no tweets are left
while True:
    if not tweets.data:
        break
    tweets_list.extend(tweets.data)
    if not tweets.meta.get('next_token'):
        break
    tweets = client.get_users_tweets(
        id=user_id,
        pagination_token=tweets.meta['next_token'],
        **kwargs,
    )
The tweets_list is going to be a list of tweepy.tweet.Tweet objects.
