Using tweepy to get unique tweets - python

I am trying to get a corpus of Tweets using a number of search terms. One issue I am having is that I am not getting unique tweets: the results include retweets.
Is there a way to remove these beforehand without doing any text processing?
What I've got now:
api = tweepy.API(auth)
for search in hashtags:
    for tweet in tweepy.Cursor(api.search, q=search, count=1000, lang="en").items():
        text = repr(tweet.text.encode("utf-8"))
        out.write(text + "\n")

You can add " -filter:retweets" to your query to get only original tweets. It may not be the prettiest solution, but it works.
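api = tweepy.API(auth)
for search in hashtags:
    # " -filter:retweets" tells the search endpoint to leave out retweets;
    # count is capped at 100 tweets per page, so 1000 gains nothing
    for tweet in tweepy.Cursor(api.search, q=search + " -filter:retweets", count=100, lang="en").items():
        text = repr(tweet.text.encode("utf-8"))
        out.write(text + "\n")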
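If you would rather keep the query unchanged, another option is to drop retweets on the client side. The sketch below assumes tweepy's v1.1 Status objects, where a retweet carries a retweeted_status attribute; build_query and is_retweet are hypothetical helper names, not part of tweepy, and SimpleNamespace objects stand in for real statuses so it runs offline.

```python
from types import SimpleNamespace


def build_query(term: str, exclude_retweets: bool = True) -> str:
    """Append the standard-search operator that excludes retweets."""
    return f"{term} -filter:retweets" if exclude_retweets else term


def is_retweet(status) -> bool:
    """Client-side check: retweets carry a retweeted_status attribute."""
    return hasattr(status, "retweeted_status")


# Stand-ins for tweepy Status objects (hypothetical data)
original = SimpleNamespace(text="hello")
retweet = SimpleNamespace(text="RT @someone: hello", retweeted_status=original)

print(build_query("#python"))  # #python -filter:retweets
print([is_retweet(s) for s in (original, retweet)])  # [False, True]
```

In the loop above this would be an `if is_retweet(tweet): continue` before writing. Filtering in the query is usually preferable, since client-side filtering still spends requests on retweets that you then throw away.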

Related

Is it possible to set multiple strings in query for search method of tweepy? python

I want to use Python to search Twitter for tweets that contain several words of my choosing.
The official documentation does not say anything about this, but it seems that the search method takes only one query.
Source code:
import tweepy

# Credentials omitted; fill in your own keys and tokens
CK = ""  # consumer key
CS = ""  # consumer secret
AT = ""  # access token
AS = ""  # access token secret

auth = tweepy.OAuthHandler(CK, CS)
auth.set_access_token(AT, AS)
api = tweepy.API(auth)
for status in api.search(q='word', count=100):  # I want to set multiple words in q
    print(status.user.id)
    print(status.user.screen_name)
    print(status.user.name)
    print(status.text)
    print(status.created_at)
What I have tried is below. It didn't raise any error, but it searched only for the last word in the query: in this case, the results were only tweets containing "Python", not tweets containing both words.
for status in api.search(q='Java' and 'Python', count=100):
Official doc
https://developer.twitter.com/en/docs/twitter-api/v1/tweets/search/api-reference/get-search-tweets
So my question is: is it possible to set multiple words in the query?
Is the way I wrote it simply wrong?
If so, please let me know.
If it can't take multiple words, I would appreciate it if you could share simple Python code that does what I want.
Thank you in advance.
Use:
for status in api.search(q='Java Python', count=100):
    print(status.text)
From the Standard search operators section of the Search Tweets: Standard v1.1 documentation:
watching now - containing both “watching” and “now”. This is the default operator.
As Vlad Siv explained, just put all the words you want to look for, separated by spaces, inside the quotation marks of the q parameter. This returns tweets that contain all of these words.
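The original attempt fails because of Python, not tweepy: `'Java' and 'Python'` is evaluated before the call, and since a non-empty string is truthy, `and` returns its second operand, `'Python'`. Below is a minimal sketch of building the query string instead; the helper names are hypothetical, and the OR form assumes the `OR` operator listed in the same Standard search operators table.

```python
def all_words_query(words):
    """Tweets containing every word (a space is the default AND operator)."""
    return " ".join(words)


def any_word_query(words):
    """Tweets containing at least one of the words (OR operator)."""
    return " OR ".join(words)


# Python's `and` returns its second operand when the first is truthy
print('Java' and 'Python')                  # Python
print(all_words_query(["Java", "Python"]))  # Java Python
print(any_word_query(["Java", "Python"]))   # Java OR Python
```

The result is passed as before, e.g. `api.search(q=all_words_query(["Java", "Python"]), count=100)`.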

Excluding link at the end while pulling tweets in tweepy Streaming

I am pulling text or extended_text using tweepy streaming, but when I pull these tweets, there is always a t.co/randomletters link at the end that leads to nowhere. What is it and how do I get rid of it?
Here is an example:
"text": "To make room for more expression, we will now count all emojis as equal—including those with gender‍‍‍ ‍‍and skin tone modifiers https://t.co(forward slash)MkGjXf9aXm"
Please help
In my experience with Twitter and tweepy, these URLs are included in a tweet's text whenever the actual tweet contains a URL of some sort, so you can't really avoid getting them.
You can remove them after you get them. Here is a simple regex that replaces these URLs with an empty string:
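import re
tweet_text = "... skin tone modifiers https://t.co/MkGjXf9aXm"
# Drop a space followed by a t.co link with a 10-character code (the dot is escaped)
tweet_text = re.sub(r' https://t\.co/\w{10}', '', tweet_text)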
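The regex above assumes an https link with a code of exactly ten characters preceded by a space, so it misses a link at the very start of the text or one of a different length. A slightly looser sketch (my own variant, not something tweepy provides) removes any http or https t.co link together with the whitespace before it:

```python
import re

# Any http(s) t.co short link, plus the whitespace in front of it
TCO_LINK = re.compile(r"\s*https?://t\.co/\w+")


def strip_tco_links(text: str) -> str:
    """Remove t.co short links and trim leftover surrounding whitespace."""
    return TCO_LINK.sub("", text).strip()


print(strip_tco_links("skin tone modifiers https://t.co/MkGjXf9aXm"))
# skin tone modifiers
```

Note that this also removes t.co links that appear in the middle of a tweet, not only the trailing one.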

How to get many returned tweets from Twitter search API

I'm looking into the Twitter Search API, and apparently it has a count parameter that determines "The number of tweets to return per page, up to a maximum of 100." What does "per page" mean if I'm, for example, running a Python script like this:
import twitter  # python-twitter package
api = twitter.Api(consumer_key="mykey",
                  consumer_secret="mysecret",
                  access_token_key="myaccess",
                  access_token_secret="myaccesssecret")
results = api.GetSearch(raw_query="q=%23myHashtag&geocode=59.347937,18.072433,5km")
print(len(results))
This will only give me 15 tweets in results. I want more, preferably all tweets, if possible. So what should I do? Is there a "next page" option? Can't I just specify the search query in a way that gives me all tweets at once? Or if the number of tweets is too large, some maximum number of tweets?
Tweepy has a Cursor object that works like this:
# geocode is its own parameter, not part of the q string
for tweet in tweepy.Cursor(api.search, q="#myHashtag", geocode="59.347937,18.072433,5km", lang='en', tweet_mode='extended').items():
    # handle tweets here
    print(tweet.full_text)
You can find more info in the Tweepy Cursor docs.
With TwitterAPI you would access pages this way:
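from TwitterAPI import TwitterAPI, TwitterPager

api = TwitterAPI("mykey", "mysecret", "myaccess", "myaccesssecret")
pager = TwitterPager(api,
                     'search/tweets',
                     {'q': '#myHashtag', 'geocode': '59.347937,18.072433,5km'})
for item in pager.get_iterator():
    print(item['text'] if 'text' in item else item)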
A complete example is here: https://github.com/geduldig/TwitterAPI/blob/master/examples/page_tweets.py
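Both Cursor and TwitterPager hide the paging mechanics. As far as I understand the v1.1 search endpoint, paging works through max_id: each follow-up request asks for tweets whose id is no greater than the smallest id already seen, minus one. The sketch below shows that loop with a hypothetical fetch_page callable (in real code it could wrap a search call with count=100 and max_id); fake_fetch exists only so the example runs offline, and tweets are assumed to be dicts with an "id" key, like TwitterAPI's JSON items.

```python
from typing import Callable, Iterator, List, Optional


def paginate_by_max_id(
    fetch_page: Callable[[Optional[int]], List[dict]], limit: int
) -> Iterator[dict]:
    """Yield up to `limit` tweets, walking backwards in time via max_id."""
    max_id = None  # first request: no upper bound
    yielded = 0
    while yielded < limit:
        page = fetch_page(max_id)
        if not page:  # no older tweets left
            return
        for tweet in page:
            yield tweet
            yielded += 1
            if yielded >= limit:
                return
        # Next page: only tweets strictly older than the oldest one seen
        max_id = min(t["id"] for t in page) - 1


# Hypothetical stand-in for the API: 250 tweets, at most 100 per page
def fake_fetch(max_id: Optional[int]) -> List[dict]:
    ids = [i for i in range(250, 0, -1) if max_id is None or i <= max_id]
    return [{"id": i} for i in ids[:100]]


print(len(list(paginate_by_max_id(fake_fetch, 1000))))  # 250
print(len(list(paginate_by_max_id(fake_fetch, 150))))   # 150
```

Keep in mind that the standard search endpoint only covers a sample of recent tweets (roughly the past seven days), so paging cannot return every tweet ever posted.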

twitter tweet_mode = 'extended' not just giving me the text in the tweet

I'm trying to download tweets using tweepy, but the tweets keep getting cut off.
results = api.search(q=hashtag, lang="en", count=num, tweet_mode="extended")
for tweet in results:
    tweet_list.append(tweet.full_text)
I end up getting outputs looking like this:
RT #Acosta: Trump also said at the meeting “why do we need more Haitians? Take them out,” a person familiar with today’s meeting confirms t…
I just want the actual full text part of the tweet.
Already answered here
Instead of full_text=True, you need tweet_mode="extended".
Then, instead of text, use full_text to get the full tweet text.
Your code should look like:
new_tweets = api.user_timeline(screen_name=screen_name, count=200, tweet_mode="extended")
Then in order to get the full tweets text:
tweets = [[tweet.full_text] for tweet in new_tweets]
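The sample output in the question starts with "RT", so it is a retweet. In extended mode a retweet's own full_text is commonly still truncated, while the complete text sits on the embedded retweeted_status object. Below is a minimal sketch that falls back to it; get_full_text is a hypothetical helper, and SimpleNamespace objects stand in for tweepy Status objects.

```python
from types import SimpleNamespace


def get_full_text(status) -> str:
    """Return the untruncated text, following a retweet to its original."""
    original = getattr(status, "retweeted_status", None)
    if original is not None:
        return original.full_text
    return status.full_text


# Offline stand-ins for tweets fetched with tweet_mode="extended"
plain = SimpleNamespace(full_text="A complete tweet")
rt = SimpleNamespace(
    full_text="RT @someone: A long tweet that gets cut…",
    retweeted_status=SimpleNamespace(full_text="A long tweet that gets cut off here"),
)
print([get_full_text(s) for s in (plain, rt)])
```

In the question's loop this becomes `tweet_list.append(get_full_text(tweet))`. The "RT @user:" prefix is dropped, so prepend it yourself if you need to keep track of which tweets were retweets.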

Filtering in tweepy

I am new to tweepy and have run into a problem. I want to download tweets with specific hashtags, but it seems that
stream.filter(track=['word1', 'word2', 'word3'])
matches these words anywhere in the tweet, not just in its hashtags. How can I filter on hashtags?
You can actually filter tweets based on your special hashtag.
stream.filter(track=['#MySpecialHashtag', '#AlsoThisHashtag'])
This picks up only tweets that contain the hashtags you provide as part of the tweet text, and it saves you from collecting tweets arbitrarily and then checking whether the hashtags field contains yours.
You can find the tags in the status object, under its entities. That is where you compare them with the ones you are looking for.
Example:
for hashtag in status.entities['hashtags']:
    print(hashtag['text'])
example here: http://www.pythoncentral.io/introduction-to-tweepy-twitter-for-python/
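A small sketch of that comparison: each entry in status.entities['hashtags'] is a dict whose 'text' holds the tag without the leading '#'. has_any_hashtag is a hypothetical helper, and comparing in lowercase is my own design choice so that #Python and #python match.

```python
from typing import Iterable


def has_any_hashtag(entities: dict, wanted: Iterable[str]) -> bool:
    """True if the tweet's hashtag entities include any of `wanted`."""
    # Entity text has no '#', so normalise the wanted tags the same way
    wanted_norm = {tag.lstrip("#").lower() for tag in wanted}
    return any(
        h["text"].lower() in wanted_norm for h in entities.get("hashtags", [])
    )


# Shape of the hashtags entity in a v1.1 status
entities = {"hashtags": [{"text": "MySpecialHashtag", "indices": [0, 17]}]}
print(has_any_hashtag(entities, ["#myspecialhashtag"]))  # True
print(has_any_hashtag(entities, ["#AlsoThisHashtag"]))   # False
```

Inside a stream listener this would read `if has_any_hashtag(status.entities, ['#MySpecialHashtag', '#AlsoThisHashtag']): ...`.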
