Retrieving the tweets related to a specific search between two dates - python

I'm struggling to retrieve the tweets associated with a particular search between two dates. I looked at the answer here and used it as shown below but, as that answer mentions, the code only works for tweets that are at most 10-14 days old. Since I need tweets from 2014, tweets ends up as an empty list.
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
tweets = []
company_name = '#' + 'Apple'
date_strng = " since:2014-10-11 until:2015-10-14"
for tweet in tweepy.Cursor(api.search,q=company_name + date_strng,count=10000,lang="en").items():
    tweets.append(tweet)
I also tried the following, but it didn't work either (tweets is again an empty list). If I remove the until argument, however, I get the tweets since start_date:
start_date = datetime.datetime(2014,10,11)
end_date = datetime.datetime(2015,10,14)
for tweet in tweepy.Cursor(api.search,q=company_name,count=10000,lang="en", since=start_date,until=end_date).items():
    tweets.append(tweet)
I was wondering whether there is a solution to this.
Thanks

The list is empty because the standard search API only returns tweets from the last 7 days. Because you supplied since and until dates, the results are filtered to that range, which lies entirely outside the window, so nothing is returned.
Refer to the link below for retrieving old tweets:
https://stackoverflow.com/a/61737450/10703097
Also, you are requesting a full year of tweets, which is a huge corpus; try narrowing the range to what you actually need.
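A minimal sketch of why the list comes back empty, assuming the 7-day window stated above. The helper names `clamp_to_search_window` and `date_operators` are hypothetical and not part of Tweepy; they only do date arithmetic so that a range can be checked before a query is sent.

```python
from datetime import date, timedelta


def clamp_to_search_window(start, end, today, window_days=7):
    """Return the part of [start, end] the standard search can still reach.

    Assumes the standard search API only covers the last `window_days` days
    (7, per the answer above). Returns None when the whole range is too old.
    """
    oldest = today - timedelta(days=window_days)
    clamped_start = max(start, oldest)
    clamped_end = min(end, today)
    if clamped_start > clamped_end:
        return None
    return clamped_start, clamped_end


def date_operators(start, end):
    """Format the range as the since:/until: operators used in the question."""
    return " since:{} until:{}".format(start.isoformat(), end.isoformat())


today = date(2020, 5, 20)  # fixed date so the example is reproducible
old_range = clamp_to_search_window(date(2014, 10, 11), date(2015, 10, 14), today)
recent_range = clamp_to_search_window(date(2020, 5, 10), date(2020, 5, 18), today)
print(old_range)     # None: the 2014-2015 range is entirely outside the window
print(recent_range)  # the reachable part of the recent range
```

If the helper returns None, no query against the standard search endpoint can succeed for that range, and an archive-based approach like the one in the linked answer is needed instead.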

Related

Tweepy get Tweets related to a specific country

Context
I am working on a topic modeling project for Twitter.
The idea is to retrieve all tweets related to a specific country and analyze them in order to discover what people from a specific country are talking about on Twitter.
What I have tried
1. First Solution
I know that we can use the Twitter streaming API or a cursor to retrieve tweets from a specific country, and I have tried the following code to get all tweets within the geographic coordinates of a country.
I have written the following code:
def get_tweets(query_fname, auth, max_time, location=None):
    stop = datetime.now() + max_time
    twitter_stream = Stream(auth, CustomListener(query_fname))
    while datetime.now() < stop:
        if location:
            twitter_stream.filter(locations=[11.94,-13.64,30.54,5.19], is_async=True)
        else:
            twitter_stream.filter(track=query, is_async=True)
The problems with this approach
Not everyone has allowed Twitter to access their location details, so with this approach I can only get a few tweets, something like 300 for my location.
Some people are not in the country but tweet about it, and people within the country reply to them. Their tweets are not captured by this approach.
2. Second Solution
Another approach was to use a cursor to collect tweets with hashtags related to a country.
I have tried this code:
def query_tweet(client, query=[], max_tweets=2000, country=None):
    """
    Query tweets using the query list passed as a parameter.
    """
    query = ' OR '.join(query)
    name = 'by_hashtags_'
    now = datetime.now()
    today = now.strftime("%d-%m-%Y-%H-%M")
    with open('data/query_drc_{}_{}.jsonl'.format(name, today), 'w') as f:
        for status in Cursor(
                client.search,
                q=query,
                include_rts=True).items(max_tweets):
            f.write(json.dumps(status._json) + "\n")
Problem
This approach gives more results than the first one but, as you may notice, not everyone uses those hashtags to tweet about the country.
3. Third approach
I have tried to retrieve tweets using the place ID specific to a country, but it has the same problem as the first approach.
My questions
How can I retrieve all tweets about a specific country? I mean everything people are tweeting about a specific country, with or without country-specific hashtags.
Hint: For people who are not located in the country, it may be a good idea to get their tweets if they were replied to or retweeted by people within the country.
Regards.
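One way to act on the hint above is to scan the tweets already collected from inside the country and gather the ids of the tweets they reply to or retweet. The sketch below is only an illustration: the function name `related_tweet_ids` is hypothetical, and it assumes each item is a dict shaped like `status._json`, with the `in_reply_to_status_id` and `retweeted_status` fields of a standard tweet object.

```python
def related_tweet_ids(statuses):
    """Collect ids of tweets that the given statuses reply to or retweet.

    `statuses` is an iterable of dicts shaped like `status._json`.
    """
    related = set()
    for status in statuses:
        reply_to = status.get("in_reply_to_status_id")
        if reply_to is not None:
            related.add(reply_to)
        retweeted = status.get("retweeted_status")
        if retweeted:
            related.add(retweeted["id"])
    return related


# Made-up sample data standing in for tweets collected by location.
sample = [
    {"id": 1, "in_reply_to_status_id": 100},
    {"id": 2, "in_reply_to_status_id": None, "retweeted_status": {"id": 200}},
    {"id": 3, "in_reply_to_status_id": None},
]
found = related_tweet_ids(sample)
print(sorted(found))
```

The resulting ids could then be fetched in a separate lookup step, which would bring in tweets written outside the country that people inside it engaged with.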

Tweepy lookup of extended tweets for multiple tweets at a time?

I'm using tweepy to access a large number of tweets. Many tweets are truncated, so I want to get the full text of some tweets, which I have the id for.
My problem is: The tweepy api instance has one method of downloading multiple tweets at once (api.statuses_lookup), but this returns truncated tweets.
It also has a method that includes the full tweet text (api.get_status), but which afaik only takes one tweet at a time.
Is there a way of getting the full text for multiple tweets at once?
import tweepy
consumer_key = "XXX"
secret = "XXX"
auth = tweepy.AppAuthHandler(consumer_key, secret)
auth.secure = True
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
ids = [1108360183586140161, 1108474125486641153]
# Finds tweets (up to 100 at a time), but doesn't contain extended text
foo = api.statuses_lookup(ids)
# Returns tweet, including extended text, but only for one at a time
bar = api.get_status(1108449077937635328, tweet_mode='extended')
As pointed out by Andy Piper, the issue was fixed in a recent update of the Tweepy library, so running
pip install tweepy --upgrade
solves this.
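After upgrading, a batched lookup might look like the sketch below. It assumes, based on the answer above, that `api.statuses_lookup` accepts `tweet_mode='extended'` in the updated release; this is worth confirming against the installed version. The helpers `chunked` and `lookup_full_text` are hypothetical names, and the lookup function is passed in as a parameter so the example can run offline with a stand-in.

```python
from types import SimpleNamespace


def chunked(ids, size=100):
    """Yield successive slices of at most `size` ids (100 per call, as noted above)."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def lookup_full_text(lookup, ids):
    """Map tweet id -> full_text, calling `lookup` once per batch.

    `lookup` is expected to behave like `api.statuses_lookup`; passing
    tweet_mode="extended" to it is an assumption based on the answer above.
    """
    texts = {}
    for batch in chunked(ids):
        for status in lookup(batch, tweet_mode="extended"):
            texts[status.id] = status.full_text
    return texts


def fake_lookup(batch, tweet_mode=None):
    # Stand-in for api.statuses_lookup so the example runs without credentials.
    return [SimpleNamespace(id=i, full_text="tweet %d" % i) for i in batch]


texts = lookup_full_text(fake_lookup, list(range(250)))
print(len(texts))
```

With a real client, the call would be `lookup_full_text(api.statuses_lookup, ids)`.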

twitter tweet_mode = 'extended' not just giving me the text in the tweet

I'm trying to download tweets using tweepy. But the tweets keep getting cut off.
results = api.search(q=hashtag, lang="en", count=num, tweet_mode="extended")
for tweet in results:
    tweet_list.append(tweet.full_text)
I end up getting outputs looking like this:
RT #Acosta: Trump also said at the meeting “why do we need more Haitians? Take them out,” a person familiar with today’s meeting confirms t…
I just want the actual full text part of the tweet.
This has already been answered here.
Instead of full_text=True, you need tweet_mode="extended".
Then, instead of text, you should use full_text to get the full tweet text.
Your code should look like:
new_tweets = api.user_timeline(screen_name = screen_name,count=200, tweet_mode="extended")
Then, to get the full text of each tweet:
tweets = [[tweet.full_text] for tweet in new_tweets]
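Note that the truncated output in the question starts with "RT": for a retweet, full_text on the retweet itself can still end in "…", and the complete text is found on its retweeted_status. A minimal sketch of handling both cases follows; the helper name `full_text_of` is hypothetical, and the SimpleNamespace objects are stand-ins for statuses fetched with tweet_mode="extended".

```python
from types import SimpleNamespace


def full_text_of(status):
    """Return the untruncated text of a status fetched with tweet_mode="extended"."""
    original = getattr(status, "retweeted_status", None)
    if original is not None:
        # For a retweet, the complete text lives on the original tweet.
        return original.full_text
    return status.full_text


# Stand-in objects so the example runs without API access.
plain = SimpleNamespace(full_text="a normal tweet")
retweet = SimpleNamespace(
    full_text="RT @someone: a long tweet that gets cut o…",
    retweeted_status=SimpleNamespace(full_text="a long tweet that gets cut off no more"),
)
print(full_text_of(plain))
print(full_text_of(retweet))
```

The text returned for a retweet is that of the original tweet, so it does not carry the "RT @user:" prefix.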

Retweets of my tweets

I am trying to find the retweets of my tweets using tweepy.
# To get first tweet
firstTweet = api.user_timeline("zzaibis")[0]
# then getting the retweet data using tweet id and it gives me the results
resultsOfFirstTweet = api.retweets(firstTweet.id)
# but when I try to find any other tweet except first tweet, this returns nothing.
secondTweet = api.user_timeline("zzaibis")[1]
Any idea why this does not work beyond the first tweet? Also, what do I need to do to get the retweet data for all of my tweets, given that the number of tweets and retweets in my account is not small?
To find the number of retweets a user gets on each tweet from their timeline try the following code:
for tweet in api.user_timeline(screen_name = 'StackOverflow', count = 10):
    print(tweet.retweet_count)
To change the user being searched, set "screen_name" to the username of the person you want. To change the number of statuses it searches, change "count".
You can also print information such as the id of each tweet being searched using:
print(tweet.id)
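Putting the two attributes together, the loop above can collect the ids of only those tweets that were actually retweeted. This is a sketch under assumptions: `retweet_counts` is a hypothetical helper, and the SimpleNamespace objects stand in for the statuses returned by api.user_timeline.

```python
from types import SimpleNamespace


def retweet_counts(statuses):
    """Return {tweet id: retweet_count} for every status that was retweeted."""
    return {s.id: s.retweet_count for s in statuses if s.retweet_count > 0}


# Stand-in timeline so the example runs without API access.
timeline = [
    SimpleNamespace(id=11, retweet_count=3),
    SimpleNamespace(id=12, retweet_count=0),
    SimpleNamespace(id=13, retweet_count=1),
]
counts = retweet_counts(timeline)
print(counts)
```

Each id in the result could then be passed to api.retweets, as in the question, to fetch the retweet data for that tweet.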

how to get English tweets alone using python?

Here is my current code
from twitter import *
t = Twitter(auth=OAuth(TWITTER_CONSUMER_KEY, TWITTER_CONSUMER_SECRET,
                       ACCESS_TOKEN, ACCESS_TOKEN_SECRET))
t.statuses.home_timeline()
query=raw_input("enter the query \n")
data = t.search.tweets(q=query)
for i in range (0,1000):
    print data['statuses'][i]['text']
    print '\n'
Here, I fetch tweets from all the languages. Is there a way to restrict myself to fetching tweets only in English?
There are at least 4 ways... I put them in the order of simplicity.
After you collect the tweets, the JSON output has a key/value pair that identifies the language. So you can use something like this to fetch tweets in all languages and keep only the ones whose lang value is English:
for i in range (0,1000):
    if data['statuses'][i][u'lang']==u'en':
        print data['statuses'][i]['text']
        print '\n'
Another way to collect only tweets that are identified as English is to use the optional 'lang' parameter to request only English (self-identified) tweets from the API. See details here. If you are using the python-twitter library, you can set the 'lang' parameter in twitter.py.
Use a language recognition package like guess-language.
Or, if you want to recognize English text without using the self-identified Twitter data (e.g. a Chinese account that is writing in English), then you have to do Natural Language Processing. One option. This method will recognize common English words and then mark the text as English.
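The first approach above can be written so that it iterates over whatever statuses were actually returned, rather than assuming there are 1000 of them. This is a Python 3 sketch; `english_texts` is a hypothetical helper, and the sample dict only imitates the shape of the search response used in the question.

```python
def english_texts(search_result, lang="en"):
    """Return the text of every status whose 'lang' field matches `lang`."""
    statuses = search_result.get("statuses", [])
    return [s["text"] for s in statuses if s.get("lang") == lang]


# Made-up response with the same shape as `data` in the question.
data = {
    "statuses": [
        {"text": "hello world", "lang": "en"},
        {"text": "bonjour le monde", "lang": "fr"},
        {"text": "good morning", "lang": "en"},
    ]
}
for text in english_texts(data):
    print(text)
```

Looping over the list directly also avoids an IndexError when fewer results come back than expected.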
I tried this for Farsi:
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
res = api.search('lang','fa')
for i in res:
    print(i.lang)
