Tweepy get tweets by date - python

I want to get tweets for a certain topic between dates. Is this possible in tweepy? (or any other API for twitter?)
I can get it working for user_timeline by using the tweet IDs, but when I change it to use api.search, the program just keeps running without producing any output:
def getTweets(username):
    tweets = []
    tmpTweets = api.user_timeline(username, tweet_mode='extended', include_rts=True)
    for tweet in tmpTweets:
        if tweet.created_at < endDate and tweet.created_at > startDate:
            tweets.append(tweet)
    while tmpTweets[-1].created_at > startDate:
        tmpTweets = api.user_timeline(username, max_id=tmpTweets[-1].id, tweet_mode='extended')
        for tweet in tmpTweets:
            if tweet.created_at < endDate and tweet.created_at > startDate:
                tweets.append(tweet)
    return tweets
tl;dr: Is there a way in Python to get tweets between two dates based on a keyword search?

What about using a cursor:
text_query = 'Coronavirus'
since_date = '2020-02-10'
until_date = '2020-08-10'
max_tweets = 150

# Create the query using the search parameters
tweets = tweepy.Cursor(api.search, q=text_query,
                       since=since_date, until=until_date).items(max_tweets)
Also described in this blog and this answer.
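Keep in mind that the v1.1 search index only covers roughly the last 7 days and the until bound is exclusive, so it can be worth re-checking the window client-side as the question's user_timeline version does. A minimal sketch of that check (the in_window helper and the sample dates are illustrative, not part of the original answer):

```python
from datetime import datetime

def in_window(created_at, start, end):
    """Return True if a tweet's timestamp falls inside [start, end)."""
    return start <= created_at < end

start = datetime(2020, 2, 10)
end = datetime(2020, 8, 10)

# Stand-ins for tweet.created_at values coming back from the Cursor
timestamps = [datetime(2020, 1, 1), datetime(2020, 3, 15), datetime(2020, 9, 1)]
kept = [t for t in timestamps if in_window(t, start, end)]
print(kept)  # only the March timestamp survives
```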

Related

Is there a faster way to count all hashtags using Tweepy and Twitter API?

I want to return all hashtags that match my search, but it is currently taking a very long time to return all the data. Ideally, I would return the data where the hashtag matches my search query, get the count of how many times it was mentioned, and then see who tweeted it. Currently, just counting the hashtags within a single day takes a long time. Here is my current code:
def main():
    consumer_key = 'key'
    consumer_secret = 'key'
    access_token = 'key'
    access_token_secret = 'key'

    auth = tw.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    api = tw.API(auth, wait_on_rate_limit=True)

    search_words = "#search"
    date_since = "2021-04-26"

    tweets = tw.Cursor(api.search,
                       q=search_words,
                       lang="en",
                       fromDate=date_since).items()
    count = 0
    for tweet in tweets:
        count = count + 1
        # print(tweet.text)
    print(count)

if __name__ == "__main__":
    main()
EDIT: I found out it is sleeping on the rate limit. Is there any way around the wait limit?
In the current version of Tweepy, counts are done like this:
import tweepy

client = tweepy.Client(bearer_token="TWITTER_API_BEARER")

# Replace with your own search query
query = 'search -is:retweet'

counts = client.get_recent_tweets_count(query=query, granularity='day')
for count in counts.data:
    print(count)
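The counts endpoint returns one bucket per granularity step, so the total over the whole range is just the sum of the buckets, with no tweets fetched at all. A sketch over hand-made sample buckets (the dict shape mirrors the tweet_count field of the v2 counts response; the numbers are made up):

```python
def total_count(buckets):
    """Sum the per-day tweet_count fields into one total."""
    return sum(b["tweet_count"] for b in buckets)

# Sample buckets shaped like entries of counts.data
sample = [
    {"start": "2023-01-01T00:00:00Z", "end": "2023-01-02T00:00:00Z", "tweet_count": 120},
    {"start": "2023-01-02T00:00:00Z", "end": "2023-01-03T00:00:00Z", "tweet_count": 80},
]
print(total_count(sample))  # 200
```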

How to get all tweets after a DateTime with Twitter API and Tweepy

I currently want to look for a keyword and return all mentions of that word, but only from within the last 15 minutes. I only know how to pass a date, not a datetime. My code is as follows:
def search_hash(api, since_id):
    logger.info("Retrieving tweets")
    search_words = "#keyword"
    date_since = "2021-04-28 13:48:01"
    new_since_id = since_id
    tweetss = tw.Cursor(api.search,
                        q=search_words,
                        result_type='latest',
                        lang="en",
                        count=100,
                        since=date_since,
                        since_id=since_id).items()
    count = 0
    for tweet in tweetss:
        count = count + 1
        print(tweet.text)
        print(tweet.created_at)
        new_since_id = max(tweet.id, new_since_id)
    print(count)
    return new_since_id
I tried using a datetime for date_since, but I do not think it is working correctly, as it returns tweets from before my filter. Is there any way to use a datetime?
I figured this out by getting the ID of the latest tweet, using that ID as the last one that should be checked, and having the script run every 15 minutes while updating last_id each time.
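The bookkeeping behind that approach is small enough to isolate: remember the highest ID seen so far, pass it back as since_id on the next poll, and only newer tweets come back. A pure-Python sketch of just that piece (the tweet IDs are made up):

```python
def advance_since_id(since_id, page_ids):
    """Return the highest tweet ID seen so far; feed it back as since_id."""
    return max([since_id, *page_ids])

since_id = 0
first_poll = [111, 113, 112]   # IDs returned by the first run
since_id = advance_since_id(since_id, first_poll)
print(since_id)  # 113

second_poll = [114, 115]       # the next run only yields tweets newer than 113
since_id = advance_since_id(since_id, second_poll)
print(since_id)  # 115
```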

How to remove duplicate tweets in Python?

I am trying to retrieve about 1000 tweets for a search term like 'NFL' using Tweepy and store them in a DataFrame using pandas. My issue is that I can't find a way to remove duplicated tweets. I have tried df.drop_duplicates, but it only leaves me about 100 tweets to work with. Help would be appreciated!
num_needed = 1000
tweet_list = []  # Lists to be added as columns (tweets, usernames, and screen names) in our dataframe
user_list = []
screen_name_list = []
last_id = -1  # ID of the last tweet seen

while len(tweet_list) < num_needed:
    try:
        # Criteria for collecting the tweets I want
        new_tweets = api.search(q='NFL', count=num_needed, max_id=str(last_id - 1),
                                lang='en', tweet_mode='extended')
    except tweepy.TweepError as e:
        print("Error", e)
        break
    else:
        if not new_tweets:
            print("Could not find any more tweets!")
            break
        else:
            for tweet in new_tweets:
                # Fetch the screen name and username
                screen_name = tweet.author.screen_name
                user_name = tweet.author.name
                tweet_text = tweet.full_text
                tweet_list.append(tweet_text)
                user_list.append(user_name)
                screen_name_list.append(screen_name)

df = pd.DataFrame()  # Create a new dataframe (df) with new columns
df['Screen name'] = screen_name_list
df['Username'] = user_list
df['Tweets'] = tweet_list
Well, yes: when you use .drop_duplicates(), you only get 100 tweets because that is how many unique ones there are. It doesn't matter what technique you use here; roughly 900 of the rows are duplicates the way your code runs.
So you might ask: why? The API returns at most 100 tweets per call, which you seem to be aware of, since you loop and try to get more via the max_id parameter. However, your last_id stays -1 forever: you never read the tweet IDs, so the parameter never changes and every iteration fetches the same page. One fix is to collect the IDs as you iterate through the tweets, then store the minimum ID as last_id so the next call pages further back:
Code:
num_needed = 1000
tweet_list = []  # Lists to be added as columns (tweets, usernames, and screen names) in our dataframe
user_list = []
screen_name_list = []
tw_id = []  # <-- ADDED THIS
last_id = -1  # ID of the last tweet seen

while len(tweet_list) < num_needed:
    try:
        new_tweets = api.search(q='NFL -filter:retweets', count=num_needed,
                                max_id=str(last_id - 1), lang='en', tweet_mode='extended')
    except tweepy.TweepError as e:
        print("Error", e)
        break
    else:
        if not new_tweets:
            print("Could not find any more tweets!")
            break
        else:
            for tweet in new_tweets:
                # Fetch the screen name and username
                screen_name = tweet.author.screen_name
                user_name = tweet.author.name
                tweet_text = tweet.full_text
                tweet_list.append(tweet_text)
                user_list.append(user_name)
                screen_name_list.append(screen_name)
                tw_id.append(tweet.id)  # <-- ADDED THIS
            last_id = min(tw_id)  # <-- ADDED THIS

df = pd.DataFrame({'Screen name': screen_name_list,
                   'Username': user_list,
                   'Tweets': tweet_list})
df = df.drop_duplicates()
This returns approximately 1000 tweets for me.
Output:
print (len(df))
1084
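If you would rather not round-trip through pandas, the same idea works in plain Python. This sketch keys duplicates on the tweet text alone (slightly stricter than drop_duplicates, which compares whole rows) and keeps the first occurrence; the helper name and sample rows are made up:

```python
def dedupe_rows(rows):
    """Drop rows whose tweet text was already seen, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        text = row[2]  # rows are (screen name, username, tweet text)
        if text not in seen:
            seen.add(text)
            unique.append(row)
    return unique

rows = [
    ("nflfan1", "Fan One", "Big game tonight"),
    ("nflfan2", "Fan Two", "Big game tonight"),   # duplicate text, different user
    ("nflfan3", "Fan Three", "Injury update"),
]
print(dedupe_rows(rows))  # the second "Big game tonight" row is dropped
```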

Why do I keep getting a line 93 error saying TwitterClient not defined when it is defined in a class above?

This is for Tweepy. The error says:
TwitterClient not defined.
import re
import tweepy
from tweepy import OAuthHandler
from textblob import TextBlob

class TwitterClient(object):
    '''
    Generic Twitter Class for sentiment analysis.
    '''
    def __init__(self):
        '''
        Class constructor or initialization method.
        '''
        # keys and tokens from the Twitter Dev Console
        consumer_key = 'remove'
        consumer_secret = 'remove'
        access_token = 'remove-remove'
        access_token_secret = 'remove'
        # attempt authentication
        try:
            # create OAuthHandler object
            self.auth = OAuthHandler(consumer_key, consumer_secret)
            # set access token and secret
            self.auth.set_access_token(access_token, access_token_secret)
            # create tweepy API object to fetch tweets
            self.api = tweepy.API(self.auth)
        except:
            print("Error: Authentication Failed")

    def clean_tweet(self, tweet):
        '''
        Utility function to clean tweet text by removing links and special
        characters using simple regex statements.
        '''
        return ' '.join(re.sub("(#[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", tweet).split())

    def get_tweet_sentiment(self, tweet):
        '''
        Utility function to classify the sentiment of a passed tweet
        using textblob's sentiment method.
        '''
        # create TextBlob object of passed tweet text
        analysis = TextBlob(self.clean_tweet(tweet))
        # set sentiment
        if analysis.sentiment.polarity > 0:
            return 'positive'
        elif analysis.sentiment.polarity == 0:
            return 'neutral'
        else:
            return 'negative'

    def get_tweets(self, query, count=10):
        '''
        Main function to fetch tweets and parse them.
        '''
        # empty list to store parsed tweets
        tweets = []
        try:
            # call twitter api to fetch tweets
            fetched_tweets = self.api.search(q=query, count=count)
            # parsing tweets one by one
            for tweet in fetched_tweets:
                # empty dictionary to store required params of a tweet
                parsed_tweet = {}
                # saving text of tweet
                parsed_tweet['text'] = tweet.text
                # saving sentiment of tweet
                parsed_tweet['sentiment'] = self.get_tweet_sentiment(tweet.text)
                # appending parsed tweet to tweets list
                if tweet.retweet_count > 0:
                    # if tweet has retweets, ensure that it is appended only once
                    if parsed_tweet not in tweets:
                        tweets.append(parsed_tweet)
                else:
                    tweets.append(parsed_tweet)
            # return parsed tweets
            return tweets
        except tweepy.TweepError as e:
            # print error (if any)
            print("Error : " + str(e))

    def main():
        # creating object of TwitterClient class
        api = TwitterClient()
        # calling function to get tweets
        tweets = api.get_tweets(query='ADF', count=200)
        # picking positive tweets from tweets
        ptweets = [tweet for tweet in tweets if tweet['sentiment'] == 'positive']
        # percentage of positive tweets
        print("Positive tweets percentage: {} %".format(100*len(ptweets)/len(tweets)))
        # picking negative tweets from tweets
        ntweets = [tweet for tweet in tweets if tweet['sentiment'] == 'negative']
        # percentage of negative tweets
        print("Negative tweets percentage: {} %".format(100*len(ntweets)/len(tweets)))
        # percentage of neutral tweets
        netweets = [tweet for tweet in tweets if tweet['sentiment'] == 'neutral']
        print("Neutral tweets percentage: {} %".format(100*(len(netweets)/len(tweets))))
        # printing first 10 positive tweets
        print("\n\nPositive tweets:")
        for tweet in ptweets[:10]:
            print(tweet['text'])
        # printing first 10 negative tweets
        print("\n\nNegative tweets:")
        for tweet in ntweets[:10]:
            print(tweet['text'])

    if __name__ == "__main__":
        # calling main function
        main()
Here is a cut-down version of your code which demonstrates the problem.
class TwitterClient(object):
    None
    def main():
        api = TwitterClient()
        print("main()")
    if __name__ == "__main__":
        main()
Note that the indentation of both main() and if __name__ == "__main__" places them under the definition of TwitterClient itself. Hence the error, in Python 3:
Traceback (most recent call last):
  File "twitter-55610165.py", line 2, in <module>
    class TwitterClient(object):
  File "twitter-55610165.py", line 11, in TwitterClient
    main()
  File "twitter-55610165.py", line 6, in main
    api = TwitterClient()
NameError: name 'TwitterClient' is not defined
TwitterClient is not defined because the class definition of TwitterClient has not finished: you're still inside it. The if statement is at class scope, so it runs while the class is being defined. Indentation determines a lot about scope in Python.
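The same rule in miniature: statements at class-body level execute while the class is still being defined, before its name exists in the enclosing scope:

```python
events = []

class Demo:
    # This line runs during class definition, before the name Demo is bound
    events.append("class body executed")
    try:
        Demo  # the class name does not exist yet
    except NameError:
        events.append("Demo not defined inside its own body")

print(events)
```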
With small but important changes in whitespace, to take main() and if __name__ ... out of the TwitterClient scope and put them back at the main scope, the problem goes away.
class TwitterClient(object):
    None

def main():
    api = TwitterClient()
    print("main()")

if __name__ == "__main__":
    main()
i.e., these constructs are now at the same indent level as TwitterClient itself, one level of indentation further left.
$ python3 twitter-55610165.py
main()
An easy solution is to remove main() from the TwitterClient definition.
The exact problem is that main() is inside TwitterClient(); in other words, you haven't finished defining TwitterClient, so Python throws the error.
How to fix
The easiest solution is to move the main() and if __name__... lines out of the TwitterClient() definition. That will get rid of your current error. This code should work:
import re
import tweepy
from tweepy import OAuthHandler
from textblob import TextBlob

class TwitterClient(object):
    '''
    Generic Twitter Class for sentiment analysis.
    '''
    def __init__(self):
        '''
        Class constructor or initialization method.
        '''
        # keys and tokens from the Twitter Dev Console
        consumer_key = 'remove'
        consumer_secret = 'remove'
        access_token = 'remove-remove'
        access_token_secret = 'remove'
        # attempt authentication
        try:
            # create OAuthHandler object
            self.auth = OAuthHandler(consumer_key, consumer_secret)
            # set access token and secret
            self.auth.set_access_token(access_token, access_token_secret)
            # create tweepy API object to fetch tweets
            self.api = tweepy.API(self.auth)
        except:
            print("Error: Authentication Failed")

    def clean_tweet(self, tweet):
        '''
        Utility function to clean tweet text by removing links and special
        characters using simple regex statements.
        '''
        return ' '.join(re.sub("(#[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", tweet).split())

    def get_tweet_sentiment(self, tweet):
        '''
        Utility function to classify the sentiment of a passed tweet
        using textblob's sentiment method.
        '''
        # create TextBlob object of passed tweet text
        analysis = TextBlob(self.clean_tweet(tweet))
        # set sentiment
        if analysis.sentiment.polarity > 0:
            return 'positive'
        elif analysis.sentiment.polarity == 0:
            return 'neutral'
        else:
            return 'negative'

    def get_tweets(self, query, count=10):
        '''
        Main function to fetch tweets and parse them.
        '''
        # empty list to store parsed tweets
        tweets = []
        try:
            # call twitter api to fetch tweets
            fetched_tweets = self.api.search(q=query, count=count)
            # parsing tweets one by one
            for tweet in fetched_tweets:
                # empty dictionary to store required params of a tweet
                parsed_tweet = {}
                # saving text of tweet
                parsed_tweet['text'] = tweet.text
                # saving sentiment of tweet
                parsed_tweet['sentiment'] = self.get_tweet_sentiment(tweet.text)
                # appending parsed tweet to tweets list
                if tweet.retweet_count > 0:
                    # if tweet has retweets, ensure that it is appended only once
                    if parsed_tweet not in tweets:
                        tweets.append(parsed_tweet)
                else:
                    tweets.append(parsed_tweet)
            # return parsed tweets
            return tweets
        except tweepy.TweepError as e:
            # print error (if any)
            print("Error : " + str(e))

def main():
    # creating object of TwitterClient class
    api = TwitterClient()
    # calling function to get tweets
    tweets = api.get_tweets(query='ADF', count=200)
    # picking positive tweets from tweets
    ptweets = [tweet for tweet in tweets if tweet['sentiment'] == 'positive']
    # percentage of positive tweets
    print("Positive tweets percentage: {} %".format(100*len(ptweets)/len(tweets)))
    # picking negative tweets from tweets
    ntweets = [tweet for tweet in tweets if tweet['sentiment'] == 'negative']
    # percentage of negative tweets
    print("Negative tweets percentage: {} %".format(100*len(ntweets)/len(tweets)))
    # percentage of neutral tweets
    netweets = [tweet for tweet in tweets if tweet['sentiment'] == 'neutral']
    print("Neutral tweets percentage: {} %".format(100*(len(netweets)/len(tweets))))
    # printing first 10 positive tweets
    print("\n\nPositive tweets:")
    for tweet in ptweets[:10]:
        print(tweet['text'])
    # printing first 10 negative tweets
    print("\n\nNegative tweets:")
    for tweet in ntweets[:10]:
        print(tweet['text'])

if __name__ == "__main__":
    # calling main function
    main()

Filter tweets using "filter_level" in Twitter streaming API

I am using statuses/filter and am trying to filter the tweets from the Twitter stream based on the "filter_level" parameter.
query = ["Donald Trump", "Cristiano Ronaldo"]
numberOfTweets = 1000
dictOfTweets = {}

twitter_api = oauth_login()
twitter_stream = twitter.TwitterStream(auth=twitter_api.auth)

for q in query:
    stream = twitter_stream.statuses.filter(track=q, max_count=numberOfTweets,
                                            languages=['en'], filter_level=['medium'])
    for tweet in stream:
        if tweet.get('text', 0) == 0:
            continue
        dictOfTweets.setdefault(q, []).append(tweet['text'])
I am still getting tweets with filter_level = "low". It would be really helpful if anyone could suggest what I am missing or doing wrong.
You need to pass (languages=['en'], filter_level=['medium']) when you create the stream object during authentication:
query = ["Donald Trump", "Cristiano Ronaldo"]
numberOfTweets = 1000
dictOfTweets = {}

twitter_api = oauth_login()
twitter_stream = twitter.TwitterStream(auth=twitter_api.auth,
                                       languages=['en'], filter_level=['medium'])

for q in query:
    stream = twitter_stream.statuses.filter(track=q, max_count=numberOfTweets)
    for tweet in stream:
        if tweet.get('text', 0) == 0:
            continue
        dictOfTweets.setdefault(q, []).append(tweet['text'])
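If the server-side parameter still lets low-level tweets through, you can also enforce the threshold client-side: each streamed payload carries a filter_level field, and the levels order as none < low < medium. A sketch over made-up payloads (the meets_level helper and the ranking dict are mine, not part of the API):

```python
LEVEL_RANK = {"none": 0, "low": 1, "medium": 2}

def meets_level(tweet, minimum="medium"):
    """True if the tweet's filter_level is at or above the minimum."""
    rank = LEVEL_RANK.get(tweet.get("filter_level", "none"), 0)
    return rank >= LEVEL_RANK[minimum]

# Stand-ins for parsed JSON payloads from the stream
stream = [
    {"text": "a", "filter_level": "low"},
    {"text": "b", "filter_level": "medium"},
    {"text": "c"},  # no filter_level field at all
]
kept = [t["text"] for t in stream if meets_level(t)]
print(kept)  # ['b']
```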
