How to scrape Twitter for a whole country using the Twint library - Python

Hello, I want to ask a question about scraping tweets from Twitter using the Twint library.
Basically, to scrape tweets from a specific location, you need to provide geocode data consisting of 'latitude, longitude, radius'.
So my question is: how do I scrape tweets from the whole of Indonesia?
If I need to use the geocode, the coordinates will be coor_ind = '4.2105, 101.9758, radius(km)'.
How do I determine the radius that covers the whole of Indonesia?
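One way to size a single circle is with the haversine great-circle formula: take an approximate bounding box of the country, centre the circle on the box's midpoint, and use the largest distance from that centre to the box's edges as the radius. The sketch below assumes a rough bounding box for Indonesia of about 11°S to 6.1°N and 95°E to 141°E (approximate values, not official boundaries) and formats the result as a Twint `c.Geo` string of the form `lat,lon,radiuskm`. Note that the coordinates in the question, 4.2105, 101.9758, are the commonly quoted centre of Malaysia rather than Indonesia. The helper names are illustrative, not part of Twint.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def covering_circle(lat_min, lat_max, lon_min, lon_max, steps=100):
    """Centre a circle on the bounding-box midpoint and return (lat, lon, radius_km),
    where the radius reaches every sampled point on the box's four edges."""
    clat = (lat_min + lat_max) / 2
    clon = (lon_min + lon_max) / 2
    radius = 0.0
    for i in range(steps + 1):
        t = i / steps
        lat = lat_min + t * (lat_max - lat_min)
        lon = lon_min + t * (lon_max - lon_min)
        # Two edges of constant latitude and two of constant longitude
        for plat, plon in ((lat_min, lon), (lat_max, lon), (lat, lon_min), (lat, lon_max)):
            radius = max(radius, haversine_km(clat, clon, plat, plon))
    return clat, clon, radius


def twint_geo(lat, lon, radius_km):
    """Format a Twint c.Geo value such as '-2.4500,118.0000,2725km' (radius rounded up)."""
    return f"{lat:.4f},{lon:.4f},{math.ceil(radius_km)}km"


# Assumed, approximate bounding box of Indonesia: 11°S to 6.1°N, 95°E to 141°E
lat_c, lon_c, r_km = covering_circle(-11.0, 6.1, 95.0, 141.0)
print(twint_geo(lat_c, lon_c, r_km))
```

For this box the radius comes out at roughly 2,700 km. A circle that large also covers Malaysia, Singapore, and parts of the Philippines and northern Australia, so the results would still need filtering afterwards.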

First, you might not get every tweet, because some tweets don't have a location attached.
But you can use c.Near with cities in Indonesia.
An example is available in this Medium article.
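Following the c.Near suggestion, one approach is to run a separate Twint search per city and write each to its own CSV. This is a sketch that assumes Twint is installed and still works against Twitter; the city list, the keyword, the output file names, and the helper names `city_configs` and `run_twint` are hypothetical choices, not part of Twint itself.

```python
# Hypothetical list of major Indonesian cities to query one by one
CITIES = ["Jakarta", "Surabaya", "Bandung", "Medan", "Makassar"]


def city_configs(keyword, cities, limit=1000):
    """Build one set of Twint settings per city as plain dicts (easy to inspect or log)."""
    return [
        {
            "Search": keyword,
            "Near": city,  # Twint turns this into a near:"city" search
            "Limit": limit,
            "Store_csv": True,
            "Output": f"tweets_{city.lower()}.csv",
            "Hide_output": True,
        }
        for city in cities
    ]


def run_twint(configs):
    """Run one Twint search per settings dict (requires `pip install twint`)."""
    import twint  # imported lazily so city_configs works without Twint installed

    for settings in configs:
        c = twint.Config()
        for name, value in settings.items():
            setattr(c, name, value)
        twint.run.Search(c)
```

For example, `run_twint(city_configs("your search terms", CITIES))` runs five searches. A tweet may show up in more than one city's file, so deduplicate by tweet id when merging the CSVs.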

Related

Extracting tweets with tweepy

I have tried using tweepy to extract tweets for a specific keyword, but the number of tweets extracted with tweepy is lower than the number of tweets for the same keyword shown in Twitter search.
I also want to know how to effectively extract ALL the tweets for a specific keyword of interest using any Twitter data extraction library (tweepy/twython).
I also face a problem with irrelevant tweets containing the same keyword coming up. Is there a way to fine-tune the search and perform accurate extraction so that I get all the tweets for the specific keyword?
I'm adding the code snippet since many people asked for it, but I don't have a problem with the code itself, as it runs.
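import pandas as pd

# Assumes `api = tweepy.API(auth)` has already been set up with valid credentials.
# Standard search returns at most 100 tweets per request, so count=500 is capped at 100.
# (In Tweepy v4, api.search was renamed to api.search_tweets.)
tweets = api.search('Mexican Food', count=500, tweet_mode='extended')
data = pd.DataFrame(data=[tweet.full_text for tweet in tweets], columns=['Tweets'])
data.head(10)
print(tweets[0].created_at)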
My question is how to get ALL the tweets with a particular keyword. For example, each time I run the above code, I get a different number of tweets. I also cross-checked by searching manually on Twitter, and there seem to be many more tweets for that keyword than the ones extracted through tweepy.
I also want to know whether there is any way to fine-tune the keyword search in Python so that all the relevant tweets for my keyword of interest are fetched.
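For the irrelevant-results part, Twitter's search operators can narrow a query: quotes match an exact phrase, a leading `-` excludes a word, `lang:` restricts the language, and `-filter:retweets` drops retweets. `build_query` below is a hypothetical helper that only assembles these operators into one string; the excluded word in the usage example is made up.

```python
def build_query(phrase, exclude=(), lang=None, no_retweets=True):
    """Combine Twitter search operators into a single query string."""
    parts = [f'"{phrase}"']                     # exact phrase instead of loose keywords
    parts += [f"-{word}" for word in exclude]   # drop tweets containing these words
    if lang:
        parts.append(f"lang:{lang}")            # restrict to one language
    if no_retweets:
        parts.append("-filter:retweets")        # retweets repeat the same text
    return " ".join(parts)
```

For example, `build_query("Mexican Food", exclude=["recipe"], lang="en")` gives `"Mexican Food" -recipe lang:en -filter:retweets`, which can be passed as the query, e.g. `api.search_tweets(q=query, count=100, tweet_mode='extended')` in Tweepy v4 (`api.search` in older versions).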
The thing is, tweepy has some limitations when used with the standard search API: that API only searches a sample of tweets from roughly the past 7 days, so it won't be able to fetch older tweets.
So I suggest using
https://github.com/Jefferson-Henrique/GetOldTweets-python
instead of tweepy to fetch the older tweets.
Since you haven't given me much to work with in your question, I'll do the bare minimum with my answer:
You are probably not doing pagination correctly.
P.S.: Check out the Stack Overflow guidelines. There is an important point about "Helping others reproduce the problem".
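Pagination here means paging backwards with `max_id`: after each page, ask for tweets whose id is at most the smallest id seen minus one, and stop when a page comes back empty. The sketch keeps the API call abstract: `fetch_page(max_id)` is a hypothetical callable you would wrap around your search call, returning tweets newest first, and `fetch_all` is an illustrative name.

```python
def fetch_all(fetch_page, get_id=lambda t: t.id, max_pages=50):
    """Page backwards through results using max_id.

    fetch_page(max_id) returns a list of tweets, newest first;
    max_id=None means "start from the newest".
    """
    results = []
    max_id = None
    for _ in range(max_pages):
        page = fetch_page(max_id)
        if not page:
            break                                   # no older results left
        results.extend(page)
        max_id = min(get_id(t) for t in page) - 1   # next request: strictly older tweets
    return results
```

In Tweepy, `tweepy.Cursor(api.search_tweets, q='Mexican Food', count=100, tweet_mode='extended').items(1000)` does this bookkeeping for you. Either way, pagination cannot reach past the roughly 7-day window that the standard search API covers.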

Twitter API - Obtain user tweets and parse into a table/database

This is a small project I'd like to get started on in the near future. It's still in the planning stage, so this post is more about being steered in the right direction.
Essentially, I'd like to obtain a user's tweets and parse them into a table/database, with the aim of being able to run this program in real time.
My initial plan was to use Beautiful Soup, a Python library; however, I believe the Twitter API is the better approach (advice on this would be appreciated).
There are still 3 unknowns:
Where do I store the tweets once obtained?
How to parse the tweets?
Where to store the parsed data?
To answer (3), I suppose it depends on what I want to do with the data. I still haven't decided how I'll use the parsed data, but I know I'd like it put into categories, so my thinking is probably a database, a table, or Excel?
A few questions still need answering, and I'd like you to steer me in the right direction. My programming knowledge is limited to C for now, but as this project means a great deal to me, I'm willing to put in the effort and learn the necessary languages/APIs.
Which languages/APIs will I need to understand to accomplish this project? From where I stand, it seems to be the Twitter API and Python.
EDIT: I now have a basic script that obtains a user's tweets, and it works better than expected. However, I'd like to take it a step further: I'd like to obtain only the user's tweets that contain a hashtag. All other tweets should be ignored. What is the best way to do this?
Here is a snippet of the basic code I have going:
import tweepy
import twitter_credentials
auth = tweepy.OAuthHandler(twitter_credentials.CONSUMER_KEY, twitter_credentials.CONSUMER_SECRET)
auth.set_access_token(twitter_credentials.ACCESS_TOKEN, twitter_credentials.ACCESS_TOKEN_SECRET)
api = tweepy.API(auth)
# Fetch the 10 most recent tweets from the user, excluding retweets
stuff = api.user_timeline(screen_name='XXXXXXXXXX', count=10, include_rts=False)
for status in stuff:
    print(status.text)
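For the EDIT, each status returned by the API carries an `entities` dictionary whose `hashtags` list holds the tweet's hashtags (each with a `text` field), so filtering does not require parsing the tweet text. A minimal sketch, assuming tweepy-style status objects with an `entities` attribute; `hashtags_of` and `with_hashtags` are illustrative names:

```python
def hashtags_of(status):
    """Return the hashtags (without '#') listed in a tweet's entities."""
    entities = getattr(status, "entities", None) or {}
    return [tag["text"] for tag in entities.get("hashtags", [])]


def with_hashtags(statuses, wanted=None):
    """Keep only statuses that contain at least one hashtag.

    If `wanted` is given, keep only statuses containing one of those
    hashtags (case-insensitive, leading '#' optional).
    """
    wanted_lower = {w.lower().lstrip("#") for w in wanted} if wanted else None
    kept = []
    for status in statuses:
        tags = {t.lower() for t in hashtags_of(status)}
        if tags and (wanted_lower is None or tags & wanted_lower):
            kept.append(status)
    return kept
```

For example, `with_hashtags(api.user_timeline(screen_name='XXXXXXXXXX', count=200, include_rts=False))` keeps only the hashtagged tweets, and adding `wanted=['#python']` keeps only those with that specific hashtag.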
Scraping Twitter (or any other social network) with, for example, Beautiful Soup, as you said, is not a good idea for 2 reasons:
if the source pages change (name attributes, div ids...), you have to keep your code up to date;
your script can get banned, because scraping is not "allowed".
To answer your questions:
1) You can store the tweets wherever you want: CSV, MySQL, SQLite, Redis, Neo4j...
2) With the official API, you get JSON. Here is the Tweet object: https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object.html . With tweepy, for example, status.text gives you the text of the tweet.
3) Same as #1. If you don't yet know what you will do with the data, store the full JSON. You will be able to parse it later.
I suggest tweepy/Python (http://www.tweepy.org/) or twit/Node.js (https://www.npmjs.com/package/twit). And read the official docs: https://developer.twitter.com/en/docs/api-reference-index
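As a concrete version of point 3, the sketch below stores each tweet's full JSON in SQLite (part of Python's standard library) alongside a few columns you are likely to query. It assumes the tweets are plain dicts, such as tweepy's `status._json`; the table layout and the helper names are hypothetical choices.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open a database and create a table holding key columns plus the raw JSON."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        " id INTEGER PRIMARY KEY,"
        " created_at TEXT,"
        " text TEXT,"
        " raw_json TEXT)"
    )
    return conn


def save_tweets(conn, tweets):
    """Insert tweet dicts; saving the same tweet id again overwrites the old row."""
    rows = [
        # Extended-mode tweets use "full_text", older/compat ones use "text"
        (t["id"], t.get("created_at"), t.get("full_text", t.get("text")), json.dumps(t))
        for t in tweets
    ]
    with conn:  # commits the transaction on success
        conn.executemany("INSERT OR REPLACE INTO tweets VALUES (?, ?, ?, ?)", rows)
    return len(rows)
```

Pass a file path such as `open_store('tweets.db')` to keep the data between runs. Keying the table on the tweet id means fetching the same tweet twice does not create duplicates, and the `raw_json` column can be re-parsed once you decide on your categories.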

How to predict a tweet's reach on Twitter using Python?

How can I predict a tweet's reach in a specific area? For example, if a user tweets in China, how can I predict its reach there? There is a website, keyhole.co, that reports the total reach of tweets containing the hashtag the user searched for. Below is a screenshot of the result when I searched for the hashtag #Retweet.
Basically, they report the reach of a hashtag.
But my question is how to predict a tweet's reach in a specific area (country or region).
Background research:
I know that I can check how many times a tweet has been viewed through the Twitter Engagement API. I have also checked that there is an API called TwitterCounter that contains scripts for counting tweets; it uses the 'search/tweets' or 'statuses/filter' Twitter resource to get old or new tweets, respectively.
There are also ways to get retweet counts, as discussed in this answer. Moreover, there are other ways to get the views/reach of old tweets, to track a specific word or hashtag in tweets, etc.
But my question is: how can I predict a tweet's reach?
I have been searching for this for the past several days but haven't been able to find a way or solution to do it.
Can anyone please help me by explaining how to get what I want, or point me to any solution that can do it?
Thanks in advance!
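No answer is given here, and Keyhole's exact method isn't described. One crude starting point, assumed here rather than taken from any source, is to attribute tweets to a country through the Tweet object's `place.country_code` field and to sum the follower counts of the distinct authors as a "potential reach" upper bound. This measures exposure after the fact rather than predicting it, though per-country counts like these could serve as inputs to a prediction model. The function name is illustrative.

```python
from collections import defaultdict


def potential_reach_by_country(tweets):
    """Group tweet dicts by place.country_code and sum distinct authors' follower counts.

    Crude upper-bound proxy only: followers may live anywhere, and each
    author is counted once per country however many times they tweeted.
    """
    authors = defaultdict(dict)  # country -> {user_id: followers_count}
    counts = defaultdict(int)    # country -> number of tweets
    for t in tweets:
        place = t.get("place") or {}
        country = place.get("country_code")
        if not country:
            continue  # many tweets carry no place, so they cannot be attributed
        user = t["user"]
        authors[country][user["id"]] = user.get("followers_count", 0)
        counts[country] += 1
    return {
        c: {"tweets": counts[c], "potential_reach": sum(authors[c].values())}
        for c in counts
    }
```

The input is assumed to be tweet JSON dicts (for example tweepy's `status._json`) collected for the hashtag of interest.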

Dynamic extraction of tweets using Python and tweepy: whenever a new tweet is posted, it should be extracted immediately.

I am using Python and the tweepy API to extract tweets containing specific keywords. Now, however, I want to extract tweets from a single page, and each tweet should be extracted automatically whenever that page posts it. Can anyone please tell me how to do that, given that I don't know when a new tweet will be posted?
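Since there is no way to know when the account will post, one option is to poll: request the timeline periodically with `since_id` set to the newest id already seen, so each call returns only tweets posted since the last check. The sketch keeps the API call abstract: `fetch_new(since_id)` is a hypothetical callable returning tweet objects with an integer `.id`, and `poll_once` / `poll_forever` are illustrative names.

```python
import time


def poll_once(fetch_new, since_id=None):
    """Fetch tweets newer than since_id.

    Returns the new tweets oldest-first and the since_id for the next call.
    """
    new = sorted(fetch_new(since_id), key=lambda t: t.id)
    if new:
        since_id = new[-1].id  # newest id seen so far
    return new, since_id


def poll_forever(fetch_new, handle, interval_s=60):
    """Check for new tweets every interval_s seconds and pass each one to handle()."""
    since_id = None
    while True:
        new, since_id = poll_once(fetch_new, since_id)
        for tweet in new:
            handle(tweet)
        time.sleep(interval_s)
```

With Tweepy, `fetch_new` could wrap `api.user_timeline(screen_name='XXXXXXXXXX', since_id=since_id)`, passing `since_id` only once it is known. Keep the interval long enough to stay within the API rate limits, and combine this with `max_id` paging if more tweets can arrive between polls than one call returns. A push-based alternative is Twitter's streaming API, which Tweepy also wraps, but its class names differ between Tweepy versions.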

How can I scrape tweets that go back 2 years?

I've been looking into scraping, and I can't manage to scrape Twitter searches that date a long way back using Python code. I can do it with an add-on for Chrome, but it falls short, since it only lets me obtain a very limited number of tweets. Can anybody point me in the right direction?
I found a related answer here:
https://stackoverflow.com/a/6082633/1449045
You can get up to 1500 tweets for a search, and 3200 for a particular user's timeline.
Source: https://dev.twitter.com/docs/things-every-developer-should-know
See "There are pagination limits".
Here you can find a list of libraries that simplify the use of the APIs in different languages, including Python:
https://dev.twitter.com/docs/twitter-libraries
You can use snscrape. It doesn't require a Twitter Developer Account, and it can go back many years.
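snscrape's search scraper takes a Twitter search query, so going back two years comes down to adding `since:` and `until:` dates to the query. A minimal sketch, assuming snscrape is installed (`pip install snscrape`) and its Twitter module still works against the site; the helper names are hypothetical, and only `TwitterSearchScraper(...).get_items()` comes from snscrape.

```python
from datetime import date, timedelta


def dated_query(terms, since, until):
    """Append Twitter's since:/until: date operators to a search."""
    return f"{terms} since:{since.isoformat()} until:{until.isoformat()}"


def last_two_years(terms, today=None):
    """Build a query covering roughly the last two years (730 days)."""
    today = today or date.today()
    return dated_query(terms, today - timedelta(days=730), today)


def scrape(query, limit=1000):
    """Collect up to `limit` tweet objects with snscrape."""
    import snscrape.modules.twitter as sntwitter  # lazy import: only needed when scraping

    tweets = []
    for i, tweet in enumerate(sntwitter.TwitterSearchScraper(query).get_items()):
        if i >= limit:
            break
        tweets.append(tweet)
    return tweets
```

For example, `scrape(last_two_years('python'), limit=500)` returns up to 500 tweet objects. Their attribute names (such as the field holding the text) have changed between snscrape versions, so check the version you have installed.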
