Extracting tweets with tweepy - python

I have tried using tweepy to extract tweets for a specific keyword, but the count of tweets extracted through tweepy is lower than the number of tweets shown for the same keyword on Twitter search.
I also want to know how to effectively extract ALL the tweets for a specific keyword of interest using any Twitter data-extraction library (tweepy/twython).
In addition, I face a problem of irrelevant tweets with the same keyword coming up. Is there a way to fine-tune the search and perform accurate extraction so that I get all the tweets for the specific keyword?
I'm adding the code snippet as many asked for it, but I don't have a problem with the code itself, as it runs.
import tweepy
import pandas as pd

# auth setup omitted; `api` is assumed to be an authenticated tweepy.API instance
tweets = api.search('Mexican Food', count=500, tweet_mode='extended')
data = pd.DataFrame(data=[tweet.full_text for tweet in tweets], columns=['Tweets'])
data.head(10)
print(tweets[0].created_at)
My question is: how do I get ALL the tweets containing a particular keyword? For example, each time I run the above code I get a different count of tweets. I also cross-checked by doing a manual search on Twitter, and there appear to be far more tweets for the keyword than tweepy extracts.
I also want to know whether there is any way to fine-tune the keyword search through Python so that all the relevant tweets for my keyword of interest are fetched.

The thing is, tweepy has some limitations: it won't be able to fetch older tweets.
So I suggest you use
https://github.com/Jefferson-Henrique/GetOldTweets-python
in place of tweepy to fetch older tweets.

Since you refuse to help me with your question, I'll do the bare minimum with my answer:
You are probably not doing pagination correctly (see the sketch below).
PS: Check out the Stack Overflow guidelines. There is an important point about helping others reproduce the problem.
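For reference, here is a minimal sketch of cursor-based pagination with tweepy.Cursor against the standard (v1.1) search endpoint; the query and item count are placeholders, and note that standard search only indexes roughly the last 7 days, so it still won't return every tweet ever posted:

import tweepy

# `api` is assumed to be an authenticated tweepy.API instance
# Cursor handles the max_id bookkeeping between result pages for you
for tweet in tweepy.Cursor(api.search, q='Mexican Food',
                           tweet_mode='extended').items(1000):
    print(tweet.full_text)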

Related

Any ways to retrieve tweets based on a hashtag for a time frame of more than a year?

I'm looking for ways to retrieve tweets from Twitter which contain certain hashtags.
I tried to use the official API and the tweepy package in Python, but even with academic access I was only able to retrieve tweets that are at most 7 days old. I want to retrieve tweets from 2019 to 2020, but I'm not able to do so with tweepy.
I tried the packages GetOldTweets3 and twint, but none of them seem to work due to some changes Twitter made last year.
Can someone suggest a way to get old tweets with certain hashtags? Thanks in advance for any help or suggestions.
If you have academic access, you are able to use the full archive search API available in the Twitter API v2. Tweepy has support for this via the tweepy.Client class. There's a full tutorial on DEV, but the code will be something like this:
import tweepy

client = tweepy.Client(bearer_token='REPLACE_ME')

# Replace with your own search query
query = 'from:andypiper -is:retweet'

tweets = client.search_all_tweets(query=query,
                                  tweet_fields=['context_annotations', 'created_at'],
                                  max_results=100)

for tweet in tweets.data:
    print(tweet.text)
    if len(tweet.context_annotations) > 0:
        print(tweet.context_annotations)
You can use search query parameters to specify the date range.
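For example, search_all_tweets accepts start_time and end_time parameters (RFC 3339 timestamps), so a 2019-2020 window for a hashtag might look like this sketch (the hashtag is a placeholder):

import tweepy

client = tweepy.Client(bearer_token='REPLACE_ME')

# Full-archive search restricted to a date range (requires academic access)
tweets = client.search_all_tweets(query='#examplehashtag',
                                  start_time='2019-01-01T00:00:00Z',
                                  end_time='2020-12-31T23:59:59Z',
                                  max_results=100)

for tweet in tweets.data:
    print(tweet.text)

To collect more than 100 results, tweepy.Paginator can wrap the same call and follow the pagination tokens for you.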

Twitter API - Obtain user tweets and parse into a table/database

This is a small project I'd like to get started on in the near future. It's still in the planning stage, so this post is more about being steered in the right direction.
Essentially, I'd like to obtain tweets from a user and parse the tweets into a table/database, with the aim of being able to run this program in real time.
My initial plan to tackle this was to use Beautiful Soup, a Python-specific library; however, I believe the Twitter API is the better approach (advice on this subject would be appreciated).
There are still 3 unknowns:
Where do I store the tweets once obtained?
How to parse the tweets?
Where to store the parsed data?
To answer (3), I suppose it depends on what I want to do with the data. I still haven't decided how I'll use the parsed data, but I know that I'd like it put into categories, so my thinking is probably a database/table/Excel?
A few questions still to answer and I'd like you guys to steer me in the right direction. My programming language knowledge is limited to just C for now, but as this project means a great deal to me, I'm willing to put the effort in and learn the necessary languages/APIs.
What languages/APIs will I need to gain an understanding of to accomplish this project? From where I stand, it seems to be Twitter API and Python.
EDIT: So I have a basic script going which obtains a user's tweets. It works better than expected. However, I'd like to take it another step: I'd like to obtain the user's tweets only if a tweet contains a hashtag. All other tweets should be ignored. How best to do this?
Here is a snippet of the basic code I have going:
import tweepy
import twitter_credentials

auth = tweepy.OAuthHandler(twitter_credentials.CONSUMER_KEY, twitter_credentials.CONSUMER_SECRET)
auth.set_access_token(twitter_credentials.ACCESS_TOKEN, twitter_credentials.ACCESS_TOKEN_SECRET)
api = tweepy.API(auth)

stuff = api.user_timeline(screen_name='XXXXXXXXXX', count=10, include_rts=False)
for status in stuff:
    print(status.text)
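Regarding the EDIT: one way to keep only tweets that contain a hashtag is to check the entities field returned with each status. A minimal sketch under the same setup as above (the screen name remains a placeholder):

# Keep only tweets whose entities include at least one hashtag
for status in api.user_timeline(screen_name='XXXXXXXXXX', count=10, include_rts=False):
    if status.entities.get('hashtags'):
        print(status.text)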
Scraping Twitter (or any other social network) with, for example, Beautiful Soup, as you said, is not a good idea, for two reasons:
if the source pages change (name attributes, div ids...), you have to keep your code up to date
your script can be banned, because scraping is not "allowed".
To answer your questions:
1) You can store the tweets wherever you want: CSV, MySQL, SQLite, Redis, Neo4j...
2) With the official API, you get JSON. Here is a Tweet object: https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object.html . With tweepy, for example, status.text will give you the text of the tweet.
3) Same as #1. If you don't actually know yet what you will do with the data, store the full JSONs; you will be able to parse them later (see the sketch below).
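As a sketch of that last point: tweepy exposes each status's raw payload as status._json, so archiving everything for later parsing (the file name here is arbitrary, and `api` is assumed to be authenticated) could look like:

import json

# Append each tweet's raw JSON payload to a file, one object per line (JSON Lines)
with open('tweets.jsonl', 'a') as f:
    for status in api.user_timeline(screen_name='XXXXXXXXXX', count=10):
        f.write(json.dumps(status._json) + '\n')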
I suggest tweepy/Python (http://www.tweepy.org/) or twit/Node.js (https://www.npmjs.com/package/twit). And read the official docs: https://developer.twitter.com/en/docs/api-reference-index

How to predict twitter tweet reach using python?

How do I predict a tweet's reach in a specific area? For example, if a user tweets in China, how can I predict its reach there? There is a website named keyhole.co that tells the total reach of tweets containing the hashtag that a user searched for. Below is the screenshot of the result when I searched for the hashtag #Retweet.
Basically, they tell the reach of a hashtag.
But the question is how to predict a tweet's reach in a specific area (country or region).
Background research:
I know that I can check the count of how many times a tweet has been viewed through the Twitter Engagement API. I have also checked that there is an API called TwitterCounter that contains scripts for counting tweets; it uses the 'search/tweets' or 'statuses/filter' Twitter resource to get old or new tweets, respectively.
There are also ways to get counts of retweets, as discussed in this answer. Moreover, there are other ways to get the views/reach of old tweets, to track a specific word or hashtag in tweets, etc.
But my question is: how can I predict a tweet's reach?
I have been searching for this for the past several days but haven't been able to find a way or a solution.
Can anyone please help me by telling me how to get what I want, or suggest any solution with which I can do it?
Thanks in advance!

Fetching tweets through tweepy - how many tweets?

I am doing a project on aspect-based sentiment analysis of tweets from Twitter. I am using tweepy to fetch the tweets. I noticed that I get only a few tweets. What is the maximum number of tweets I can get using tweepy when a product name is searched? Is there any way I can increase this number? This is the code I am referring to: https://pastebin.com/vqbkFhtf
The number of tweets will depend on how many people are talking about the product.
Also, I would suggest you use the Twitter streaming API so that you get the tweets as they appear; a sketch is below.
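For reference, a minimal sketch of the streaming approach using tweepy v3's Stream/StreamListener classes (note that tweepy v4 changed the streaming classes); the tracked keyword is a placeholder, and `auth` is assumed to be a configured tweepy.OAuthHandler:

import tweepy

class ProductListener(tweepy.StreamListener):
    def on_status(self, status):
        # Called once for every new matching tweet
        print(status.text)

stream = tweepy.Stream(auth=auth, listener=ProductListener())
stream.filter(track=['product name'])  # replace with the product keyword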

How to extract all tweets from multiple users' timelines using R?

I am working on a project for which I want to extract the timelines of around 500 different Twitter users (I am using this for historical analysis, so I'll only need to retrieve them all once; no need to update with incoming tweets).
While I know the Twitter API only allows the last 3,200 tweets to be retrieved, when I use the basic userTimeline method of the R twitteR package, I only seem to fetch about 20 tweets every time I try (for users with significantly more recent tweets). Is this because of rate limiting, or because I am doing something wrong?
Does anyone have tips for doing this most efficiently? I realize it might take a lot of time because of rate limiting; is there a way of automating/iterating this process in R?
I am quite stuck, so thank you very much for any help/tips you may have!
(I have some experience using the Twitter API/twitteR package to extract tweets using a certain hashtag over a couple of days. I have basic Python skills, if it turns out to be easier/quicker to do in Python).
It looks like the twitteR documentation suggests using the maxID argument for pagination. So, when you get the first batch of results, you could use the minimum ID in that set minus one as the maxID for the next request, and repeat until you get no more results back (meaning you've reached the beginning of the user's timeline).
