Timespan for Elevated Access to Historical Twitter Data - python

I have a developer account as an academic and my profile page on the Twitter developer portal shows Elevated at the top, but when I use Tweepy to access tweets, it only retrieves tweets from the past 7 days. How can I extend my access back to 2006?
This is my code:
import tweepy
from tweepy import OAuthHandler
import pandas as pd

access_token = '#'
access_token_secret = '#'
API_key = '#'
API_key_secret = '#'

auth = tweepy.OAuthHandler(API_key, API_key_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True)

tweets = []
count = 1
# the standard search endpoint caps count at 100 per page
for tweet in tweepy.Cursor(api.search_tweets, q="#SEARCHQUERY", count=100).items(50000):
    print(count)
    count += 1
    try:
        data = (tweet.created_at, tweet.id, tweet.text,
                tweet.user._json['screen_name'], tweet.user._json['name'],
                tweet.user._json['created_at'], tweet.entities['urls'])
        tweets.append(data)
    except tweepy.TweepyException as e:  # TweepError was removed in Tweepy 4.x
        print(e)
        continue
    except StopIteration:
        break

df = pd.DataFrame(tweets, columns=['created_at', 'tweet_id', 'tweet_text', 'screen_name',
                                   'name', 'account_creation_date', 'urls'])
df.to_csv(path_or_buf='local address/file.csv', index=False)

The Search All endpoint is available in Twitter API v2, which is represented by the tweepy.Client object (you are using tweepy.API, the v1.1 interface).
The most important thing is that you require Academic Research access from Twitter. Elevated access grants additional request volume and access to the v1.1 APIs on top of v2 (Essential) access, but you will need an account and Project with Academic Research access to call this endpoint. There's a process to apply for that in the Twitter Developer Portal.
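Once an Academic Research Project is set up, the full archive is reached through Client.search_all_tweets. A minimal sketch, assuming a Bearer Token from such a Project (the token value and query here are placeholders):

import tweepy

client = tweepy.Client(bearer_token='#', wait_on_rate_limit=True)

# search_all_tweets covers the full archive back to 2006, but only for
# Projects with Academic Research access
for response in tweepy.Paginator(client.search_all_tweets,
                                 query='#SEARCHQUERY',
                                 start_time='2006-03-21T00:00:00Z',
                                 max_results=500):
    for tweet in response.data or []:
        print(tweet.id, tweet.text)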

Related

Tweepy API Failing For Every ID

I'm running the code below, which was given to me by an instructor, to grab the status for each tweet_id in a DataFrame I've already imported. When I run it, every ID comes back "Fail". I don't receive any errors, so I'm not sure what I'm missing. When I requested my Twitter developer access I didn't have to answer a ton of questions like I've seen other people say they had to, so I'm curious whether I just don't have enough access?
import tweepy
from tweepy import OAuthHandler
import json
from timeit import default_timer as timer

# Query Twitter API for each tweet in the Twitter archive and save JSON in a text file
# These are hidden to comply with Twitter's API terms and conditions
consumer_key = 'HIDDEN'
consumer_secret = 'HIDDEN'
access_token = 'HIDDEN'
access_secret = 'HIDDEN'

auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth, wait_on_rate_limit=True)

# NOTE TO STUDENT WITH MOBILE VERIFICATION ISSUES:
# df_1 is a DataFrame with the twitter_archive_enhanced.csv file. You may have to
# change line 17 to match the name of your DataFrame with twitter_archive_enhanced.csv
# NOTE TO REVIEWER: this student had mobile verification issues so the following
# Twitter API code was sent to this student from a Udacity instructor

# Tweet IDs for which to gather additional data via Twitter's API
tweet_ids = twitter_archive.tweet_id.values
len(tweet_ids)

# Query Twitter's API for JSON data for each tweet ID in the Twitter archive
count = 0
fails_dict = {}
start = timer()
# Save each tweet's returned JSON as a new line in a .txt file
with open('tweet_json.txt', 'w') as outfile:
    # This loop will likely take 20-30 minutes to run because of Twitter's rate limit
    for tweet_id in tweet_ids:
        count += 1
        print(str(count) + ": " + str(tweet_id))
        try:
            tweet = api.get_status(tweet_id, tweet_mode='extended')
            print("Success")
            json.dump(tweet._json, outfile)
            outfile.write('\n')
        except tweepy.TweepError as e:
            print("Fail")
            fails_dict[tweet_id] = e
            pass
end = timer()
print(end - start)
print(fails_dict)
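One way to narrow this down is to print the exception itself instead of a bare "Fail"; Twitter's error code usually distinguishes an auth problem from deleted tweets. A minimal diagnostic sketch, assuming the same api object and Tweepy 3.x (in Tweepy 4.x, catch tweepy.TweepyException instead):

try:
    tweet = api.get_status(tweet_id, tweet_mode='extended')
except tweepy.TweepError as e:
    # e.g. code 32 = could not authenticate, 88 = rate limit exceeded,
    # 144 = no status found with that ID, 179 = protected account
    print("Fail for {}: {}".format(tweet_id, e))

If every single ID fails with code 32 or 89, the credentials rather than the access level are the problem.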

twitter python developer codes

In the process of extracting tweets from a Twitter account, do the access keys need to be regenerated again and again?
For example, I have tried this code:
import tweepy as tw
import pandas as pd

consumer_key = "xyz"
consumer_secret = "xyz"
access_token = "xyz"
access_token_secret = "xyz"

auth = tw.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tw.API(auth)

tweets = []
likes = []
time = []
for i in tw.Cursor(api.user_timeline, id='CirusFoundation', tweet_mode="extended").items(200):
    tweets.append(i.full_text)
    likes.append(i.favorite_count)  # note: 'favorite_count', not 'favourite_count'
    time.append(i.created_at)

df = pd.DataFrame({'tweets': tweets, 'likes': likes, 'time': time})
Now I get a Forbidden: 403 error as output.
Thanks in advance.
There are several possible reasons for a 403:
You are requesting a feature that is not covered by your Twitter developer account; the endpoint may be predicated on having Elevated access.
You are trying to access the API over a non-SSL/HTTPS connection.
You have exceeded your request quota.
Your access token was not generated with the right permissions (e.g. if you are trying to write, you need to generate it with read/write permissions).
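To see which case applies, you can catch the error and inspect Twitter's error payload. A minimal sketch, assuming Tweepy 4.x, where a 403 is raised as tweepy.errors.Forbidden:

import tweepy as tw

try:
    timeline = api.user_timeline(id='CirusFoundation', tweet_mode="extended")
except tw.errors.Forbidden as e:
    # api_codes / api_messages carry Twitter's own error codes, e.g.
    # 453 = the endpoint is not included in your current access level
    print(e.response.status_code, e.api_codes, e.api_messages)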

Followers network with tweepy

I'm trying to build a network of my Twitter followers with Python and tweepy. My problem is that I'm not obtaining all the followers for each user, only a few. This is the code:
import tweepy

# Copy the api key, the api secret, the access token and the access token secret
# from the relevant page on your Twitter app
api_key = 'xxxx'
api_secret = 'xxxx'
access_token = 'x-x'
access_token_secret = 'xxxx'

# You don't need to make any changes below here
# This bit authorises you to ask for information from Twitter
auth = tweepy.OAuthHandler(api_key, api_secret)
auth.set_access_token(access_token, access_token_secret)
# The api object gives you access to all of the http calls that Twitter accepts
api = tweepy.API(auth)

# User we want to use as initial node
user = 'xxxx'

import csv
import time

# This creates a csv file and defines that each new entry will be in a new line
csvfile = open(user + 'network.csv', 'w')
spamwriter = csv.writer(csvfile, delimiter=' ', quotechar='|', quoting=csv.QUOTE_MINIMAL)

# This is the function that takes a node (user), looks for all its followers,
# prints them into a CSV file... and looks for the followers of each follower...
def fib(n, user, spamwriter):
    if n > 0:
        # There is a limit to the traffic you can have with the API, so you need to wait
        # a few seconds per call or after a few calls it will restrict your traffic
        # for 15 minutes. This parameter can be tweaked
        time.sleep(40)
        try:
            users = api.followers(user)
            for follower in users:
                print(follower.screen_name)
                spamwriter.writerow([user + ';' + follower.screen_name])
                fib(n - 1, follower.screen_name, spamwriter)
        except tweepy.TweepError:
            print("Failed to run the command on that user, skipping...")

# n defines the recursion depth
n = 2
fib(n, user, spamwriter)
API.followers(id/screen_name) only returns one page of followers at a time (20 by default, up to 200 with the count parameter).
Try:
API.followers_ids(id/screen_name/user_id)
It returns the IDs of the people following the specified user, up to 5,000 per request. Just put your ID in the parameters.
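A minimal sketch of paging through the complete list with tweepy.Cursor, assuming api is the authenticated client from the question (method names as in Tweepy 3.x; in 4.x, followers_ids was renamed get_follower_ids):

import tweepy

follower_ids = []
# each page holds up to 5,000 IDs, so one request covers what would take
# many calls to api.followers()
for page in tweepy.Cursor(api.followers_ids, screen_name=user, count=5000).pages():
    follower_ids.extend(page)
print(len(follower_ids))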

Python Twitter Streaming Timeline

I am trying to obtain information from the Twitter timeline of a specific user and print the output in JSON format, but I am getting an AttributeError: 'str' object has no attribute '_json'. I am new to Python, so I'm having trouble resolving this; any help would be greatly appreciated.
Below shows the code that I have at the moment:
from __future__ import absolute_import, print_function
import tweepy
import twitter

def oauth_login():
    # credentials for OAuth
    CONSUMER_KEY = 'woIIbsmhE0LJhGjn7GyeSkeDiU'
    CONSUMER_SECRET = 'H2xSc6E3sGqiHhbNJjZCig5KFYj0UaLy22M6WjhM5gwth7HsWmi'
    OAUTH_TOKEN = '306848945-Kmh3xZDbfhMc7wMHgnBmuRLtmMzs6RN7d62o3x6i8'
    OAUTH_TOKEN_SECRET = 'qpaqkvXQtfrqPkJKnBf09b48TkuTufLwTV02vyTW1kFGunu'
    # Creating the authentication
    auth = twitter.oauth.OAuth(OAUTH_TOKEN,
                               OAUTH_TOKEN_SECRET,
                               CONSUMER_KEY,
                               CONSUMER_SECRET)
    # Twitter instance
    twitter_api = twitter.Twitter(auth=auth)
    return twitter_api

# LogIn
twitter_api = oauth_login()

# Get statuses
statuses = twitter_api.statuses.user_timeline(screen_name='#ladygaga')

# Print text
for status in statuses:
    print(status['text']._json)
You seem to be mixing up tweepy with twitter, and are possibly getting a bit confused with methods as a result. The auth process for tweepy, from your code, should go as follows:
import tweepy

def oauth_login():
    # credentials for OAuth
    consumer_key = 'YOUR_KEY'
    consumer_secret = 'YOUR_KEY'
    access_token = 'YOUR_KEY'
    access_token_secret = 'YOUR_KEY'
    # Creating the authentication
    auth = tweepy.OAuthHandler(consumer_key,
                               consumer_secret)
    # Twitter instance
    auth.set_access_token(access_token, access_token_secret)
    return tweepy.API(auth)

# LogIn
twitter_api = oauth_login()

# Get statuses (note: screen names don't include the '#')
statuses = twitter_api.user_timeline(screen_name='ladygaga')

# Print text
for status in statuses:
    print(status._json['text'])
If, as previously mentioned, you want to create a list of tweets, you could do the following rather than everything after # Print text
# Create a list
statuses_list = [status._json['text'] for status in statuses]
And, as mentioned in the comments, you shouldn't ever give out your keys publicly. Twitter lets you reset them, which I'd recommend you do as soon as possible - editing your post isn't enough, as people can still read your edit history.
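If you need more than the 20 statuses user_timeline returns by default, a Cursor can page through the timeline; a sketch, assuming the same twitter_api object as above:

statuses_list = [status._json['text']
                 for status in tweepy.Cursor(twitter_api.user_timeline,
                                             screen_name='ladygaga').items(200)]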

Downloading all Tweets about certain subject in Python

I'm doing Twitter sentiment research at the moment. For this reason, I'm using the Twitter API to download all tweets on certain keywords. But my current code is taking a lot of time to build a large data file, so I was wondering if there's a faster method.
This is what I'm using right now:
__author__ = 'gerbuiker'

import time
# Import the necessary methods from tweepy library
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream

# Variables that contain the user credentials to access Twitter API
access_token = "XXXXXXXXXXXXX"
access_token_secret = "XXXXXXXX"
consumer_key = "XXXXX"
consumer_secret = "XXXXXXXXXXXXXX"

# This is a basic listener that just prints received tweets to stdout.
class StdOutListener(StreamListener):
    def on_data(self, data):
        try:
            # print(data)
            tweet = data.split(',"text":"')[1].split('","source')[0]
            print(tweet)
            saveThis = str(time.time()) + '::' + tweet  # saves time + actual tweet
            saveFile = open('twitiamsterdam.txt', 'a')
            saveFile.write(saveThis)
            saveFile.write('\n')
            saveFile.close()
            return True
        except BaseException as e:
            print('failed ondata,', str(e))
            time.sleep(5)

    def on_error(self, status):
        print(status)

if __name__ == '__main__':
    # This handles Twitter authentication and the connection to Twitter Streaming API
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)
    # This line filters Twitter Streams to capture data by keyword, e.g. 'Amsterdam'
    stream.filter(track=['KEYWORD which i want to check'])
This gets me about 1500 tweets in one hour for a pretty popular keyword (Amsterdam). Does anyone know a faster method in Python?
To be clear: I want to download all tweets on a certain subject for, say, the last month or year. So the newest tweets don't have to keep coming in; the most recent ones for a period would be sufficient. Thanks!
I need something similar to this for academic research.
Were you able to fix it?
Would it be possible to specify a custom range of time from which to pull the data?
Sorry for asking here, but I couldn't send you private messages.
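For pulling past tweets on a subject, rather than waiting for new ones to stream in, the REST search endpoint is the usual route. A minimal sketch with tweepy.Cursor, assuming auth is set up as in the question (the method is api.search in Tweepy 3.x, renamed search_tweets in 4.x); note that standard search only reaches back about 7 days, and the full archive needs the academic access discussed at the top of this page:

import tweepy

api = tweepy.API(auth, wait_on_rate_limit=True)

with open('amsterdam_search.txt', 'a') as f:
    for tweet in tweepy.Cursor(api.search, q='Amsterdam', count=100).items(10000):
        f.write('{}::{}\n'.format(tweet.created_at, tweet.text))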
