Using Python, how to collect tweets (using Tweepy) between two dates?

How can I use Python and Tweepy to collect tweets from Twitter that fall between two given dates?
Is there a way to pass from...until... values to the search API?
Note:
I need to be able to search back, but WITHOUT limiting the search to a specific user.
I am using Python, and I know the code should look something like this, but I need help making it work.
import tweepy

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token_key, access_token_secret)
api = tweepy.API(auth)

collection = {}  # must be a dict, not a list, to index by tweet ID
for tweet in tweepy.Cursor(api.search, ???????).items():
    collection[tweet.id] = tweet._json

After long hours of investigation and stabilization, I can gladly share my findings.
Search by geocode: pass the geocode operator inside the 'q' parameter in this format: geocode:"37.781157,-122.398720,500mi" (the double quotes are important). Note that the near parameter is no longer supported by this API; geocode gives more flexibility.
Search by timeline: use the since and until operators in the following format: since:2016-08-01 until:2016-08-02.
One more important note: Twitter does not allow queries with dates that are too old. I am not certain, but I believe it only goes back 10-14 days, so you cannot query this way for last month's tweets.
===================================
for status in tweepy.Cursor(api.search,
                            q='geocode:"37.781157,-122.398720,1mi" since:2016-08-01 until:2016-08-02 include:retweets',
                            result_type='recent',
                            include_entities=True,
                            monitor_rate_limit=False,
                            wait_on_rate_limit=False).items(300):
    tweet_id = status.id
    tweet_json = status._json

As of now, Tweepy is not the best solution. A better option is the Python library snscrape, which scrapes Twitter directly and can therefore retrieve tweets older than the roughly two-week cap Twitter sets on the standard search API. The code below only scrapes 100 English tweets between two dates and only collects the tweet ID, but it can easily be extended for more specific searches, more or fewer tweets, or additional tweet fields.
import snscrape.modules.twitter as sntwitter

tweets_list = []
query = 'lang:en since:2020-11-01 until:2021-03-13'
for i, tweet in enumerate(sntwitter.TwitterSearchScraper(query).get_items()):
    if i >= 100:  # stop after 100 tweets
        break
    tweets_list.append([tweet.id])
print(tweets_list)

You have to use the max_id parameter, as described in the Twitter documentation.
Tweepy is a wrapper around the Twitter API, so you should be able to pass this parameter through.
As for geolocation, take a look at The Search API: Tweets by Place. It uses the same search API with customized keys.
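As a sketch of how max_id pagination works: each request passes the smallest tweet ID seen so far, minus one, so successive pages walk backwards in time. The helper below is illustrative, not part of Tweepy itself; with Tweepy you would wrap a call like api.search(q=..., max_id=max_id) as the fetch function.

```python
def paginate_by_max_id(fetch_page, pages=3):
    """Walk a v1.1-style search backwards in time.

    fetch_page(max_id) must return a list of objects with an .id
    attribute, newest first, treating max_id as an inclusive upper
    bound -- the contract of Twitter's max_id parameter.
    """
    collected = []
    max_id = None
    for _ in range(pages):
        page = fetch_page(max_id)
        if not page:
            break
        collected.extend(page)
        # The next request starts strictly below the oldest ID seen.
        max_id = min(tweet.id for tweet in page) - 1
    return collected
```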

Related

Is there a way I can filter out bogus locations while scraping tweets from Twitter API using tweepy?

I am looking for a way to filter out bogus locations from tweets extracted using Tweepy. I use Tweepy to extract the tweets and then a custom function to get each tweet's user location. Here is the part of the function that gets the location:
for tweet in tweets:
    if hasattr(tweet, 'user') and hasattr(tweet.user, 'screen_name') and hasattr(tweet.user, 'location'):
        if tweet.user.location:
            location_data.append((tweet.user.screen_name, tweet.user.location))
            original_tweets.append(tweet.text)
Here is an example of the result in the data frame column that the get_user_location function writes to.
Is there a simple way to validate whether the locations are legitimate?

Is there a way to filter out Retweets from results of search_tweets using Tweepy?

I'm using Tweepy to query tweets with certain keywords and am able to get a list of Tweet objects (or Status objects?) (https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/tweet).
I'm wondering if there's a way to remove Retweets from the results, since I want to look specifically at original tweets and replies. An attribute on the Tweet object would solve my problem, but I haven't found one so far.
Thanks in advance!
If you're using the standard search API with API.search_tweets, you can add the -filter:retweets operator to your query.
Also, if you're using API.search_tweets, you're on Twitter API v1.1, and https://developer.twitter.com/en/docs/twitter-api/v1/data-dictionary/object-model/tweet is the corresponding documentation for the Tweet object. You can distinguish Retweets by the presence of a retweeted_status attribute.
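The retweeted_status check can also be applied client-side after fetching. A minimal sketch, where the sample objects are stand-ins for Tweepy Status objects rather than real API results:

```python
from types import SimpleNamespace

def drop_retweets(statuses):
    # A v1.1 Status representing a Retweet carries a retweeted_status
    # attribute; original tweets and replies do not.
    return [s for s in statuses if not hasattr(s, 'retweeted_status')]

# Stand-ins for Tweepy Status objects (not real API results):
original = SimpleNamespace(text="an original tweet")
retweet = SimpleNamespace(text="RT @x: an original tweet",
                          retweeted_status=original)
print([s.text for s in drop_retweets([original, retweet])])  # ['an original tweet']
```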

Any way to retrieve tweets based on a hashtag for a time frame of more than a year?

I'm looking for ways to retrieve tweets from Twitter which contain certain hashtags.
I tried to use the official API with the tweepy package in Python, but even with academic access I was only able to retrieve tweets that are at most 7 days old. I want to retrieve tweets from 2019 to 2020, but I'm not able to do so with tweepy.
I tried the packages GetOldTweets3 and twint, but neither of them seems to work due to changes Twitter made last year.
Can someone suggest a way to get old tweets with certain hashtags? Thanks in advance for any help or suggestions.
If you have academic access, you are able to use the full archive search API available in the Twitter API v2. Tweepy has support for this via the tweepy.Client class. There's a full tutorial on DEV, but the code will be something like this:
import tweepy

client = tweepy.Client(bearer_token='REPLACE_ME')

# Replace with your own search query
query = 'from:andypiper -is:retweet'

tweets = client.search_all_tweets(query=query,
                                  tweet_fields=['context_annotations', 'created_at'],
                                  max_results=100)
for tweet in tweets.data:
    print(tweet.text)
    if len(tweet.context_annotations) > 0:
        print(tweet.context_annotations)
You can use the start_time and end_time parameters of search_all_tweets to specify the date range.
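For example, the 2019-2020 range from the question can be passed as RFC 3339 timestamps. This is a sketch; the query string is a placeholder, and the search call is shown commented out because it needs a valid bearer token:

```python
from datetime import datetime, timezone

# Full-archive search accepts RFC 3339 timestamps as its time bounds.
start_time = datetime(2019, 1, 1, tzinfo=timezone.utc).isoformat()
end_time = datetime(2020, 12, 31, 23, 59, 59, tzinfo=timezone.utc).isoformat()
print(start_time)  # 2019-01-01T00:00:00+00:00

# Then, reusing the client from the snippet above:
# tweets = client.search_all_tweets(query='#python -is:retweet',
#                                   start_time=start_time,
#                                   end_time=end_time,
#                                   max_results=100)
```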

Tweets scraping - how to measure tweeting intensity?

I am looking for a method to get information about the "trend" of some hashtag/keyword on Twitter. Let's say I want to measure how often the hashtag/keyword "Python" is tweeted over time. For instance, today "Python" is tweeted on average every 1 minute, but yesterday it was tweeted on average every 2 minutes.
I have tried various options, but I keep bouncing off the Twitter API limitations, i.e., if I try to download all tweets for a hashtag during the last (for example) day, only a certain fraction of the tweets is downloaded (via tweepy.Cursor).
Do you have any ideas or script examples for achieving similar results? Libraries or guides to recommend? I did not find any help searching the internet. Thank you.
You should check out the twint repository. It:
Can fetch almost all tweets (the Twitter API limits you to the last 3200 tweets only);
Has a fast initial setup;
Can be used anonymously and without a Twitter sign-up.
Here is a sample code:
import twint

def scrapeData(search):
    c = twint.Config()
    c.Search = search
    c.Since = '2021-03-05 00:00:00'
    c.Until = '2021-03-06 00:00:00'
    c.Pandas = True
    c.Store_csv = True
    c.Hide_output = True
    c.Output = f'{search}.csv'
    c.Limit = 10  # number of tweets to fetch

    print(f"\n#### Scraping from {c.Since} to {c.Until}")
    twint.run.Search(c)

    print("\n#### Preview: ")
    print(twint.storage.panda.Tweets_df.head())

if __name__ == "__main__":
    scrapeData(search="python")
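Once tweets and their timestamps are collected (by whatever means), the intensity the question asks about can be computed as the average interval between consecutive tweets. A small sketch with made-up timestamps:

```python
from datetime import datetime

def average_interval_seconds(timestamps):
    """Mean gap in seconds between consecutive tweets, oldest to newest."""
    ts = sorted(timestamps)
    if len(ts) < 2:
        return None
    gaps = [(b - a).total_seconds() for a, b in zip(ts, ts[1:])]
    return sum(gaps) / len(gaps)

# Illustrative timestamps: three tweets two minutes apart.
sample = [datetime(2021, 3, 5, 12, 0),
          datetime(2021, 3, 5, 12, 2),
          datetime(2021, 3, 5, 12, 4)]
print(average_interval_seconds(sample))  # 120.0
```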
Try a library called GetOldTweets or GetOldTweets3.
Twitter Search, and by extension its API, is not meant to be an exhaustive source of tweets. The Twitter Streaming API places a limit of just one week on how far back matching tweets can be extracted. So, to extract all historical tweets relevant to a set of search parameters, the official Twitter API has to be bypassed and custom libraries that mimic the Twitter search engine used instead.

Tweepy - How to "tag" tweets with their respective tracking filter

I'm having a hard time formulating my question, but basically, imagine you're streaming Twitter with Tweepy and filtering the tweets on 2 keywords like this:
twitterStream = Stream(auth, listener())
twitterStream.filter(track=["keyword1", "keyword2"])
Basically, I would like to attach each keyword to its respective tweets, so, for example, I would get something like this:
some tweet about keyword 1 [keyword1]
another tweet about keyword 1 [keyword1]
some tweet about keyword 2 [keyword2]
etc...
Is this possible?
Thanks!
Tweepy uses the Twitter streaming API, and from the streaming API docs I believe it is impossible to get the result you expect directly. Possible solutions are:
If you have very few keywords to track, make a separate streaming track request for each keyword.
If you have many keywords to track in a single streaming request, you can perform a keyword search on the returned tweets to determine which keywords each tweet contains. Depending on your application, the search may need to cover several fields of the tweet, e.g., text, hashtags, URLs, etc.
Hope this helps. Thanks.
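The second approach can be sketched as a small tagging helper. The keywords and sample text here are made up, and the case-insensitive substring match is only an approximation of how Twitter's track filter matches terms:

```python
def tag_tweet(text, keywords):
    """Return the tracked keywords that appear in the tweet text
    (case-insensitive substring match -- a rough approximation of
    the streaming track filter's behaviour)."""
    lowered = text.lower()
    return [kw for kw in keywords if kw.lower() in lowered]

keywords = ["keyword1", "keyword2"]
print(tag_tweet("some tweet about keyword1", keywords))  # ['keyword1']
```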
