I'm running the code below, which was given to me by an instructor, to grab the status for each tweet_id in a dataframe I've already imported. When I run it, every tweet comes back "Fail", but I don't receive any errors, so I'm not sure what I'm missing. When I requested my Twitter developer access I didn't have to answer a ton of questions like other people say they've had to, so I'm wondering whether my access level is simply not sufficient.
import tweepy
from tweepy import OAuthHandler
import json
from timeit import default_timer as timer
# Query Twitter API for each tweet in the Twitter archive and save JSON in a text file
# These are hidden to comply with Twitter's API terms and conditions
consumer_key = 'HIDDEN'
consumer_secret = 'HIDDEN'
access_token = 'HIDDEN'
access_secret = 'HIDDEN'
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth, wait_on_rate_limit=True)
# NOTE TO STUDENT WITH MOBILE VERIFICATION ISSUES:
# df_1 is a DataFrame with the twitter_archive_enhanced.csv file. You may have to
# change line 17 to match the name of your DataFrame with twitter_archive_enhanced.csv
# NOTE TO REVIEWER: this student had mobile verification issues so the following
# Twitter API code was sent to this student from a Udacity instructor
# Tweet IDs for which to gather additional data via Twitter's API
tweet_ids = twitter_archive.tweet_id.values
len(tweet_ids)
# Query Twitter's API for JSON data for each tweet ID in the Twitter archive
count = 0
fails_dict = {}
start = timer()
# Save each tweet's returned JSON as a new line in a .txt file
with open('tweet_json.txt', 'w') as outfile:
    # This loop will likely take 20-30 minutes to run because of Twitter's rate limit
    for tweet_id in tweet_ids:
        count += 1
        print(str(count) + ": " + str(tweet_id))
        try:
            tweet = api.get_status(tweet_id, tweet_mode='extended')
            print("Success")
            json.dump(tweet._json, outfile)
            outfile.write('\n')
        except tweepy.TweepError as e:
            print("Fail")
            fails_dict[tweet_id] = e
end = timer()
print(end - start)
print(fails_dict)
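To see why every call fails, print the exceptions the loop collects in fails_dict; a 401/403 authorization error there would confirm an access-level problem. A minimal diagnostic sketch, using the fails_dict built above:
# Each value in fails_dict is the exception tweepy raised for that tweet ID
for failed_id, err in fails_dict.items():
    print(failed_id, repr(err))
Note also that tweepy.TweepError only exists in Tweepy 3.x; if you are on Tweepy 4.x, the except clause needs tweepy.errors.TweepyException instead.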
I have a developer account as an academic, and my profile page on Twitter shows "Elevated" at the top, but when I use Tweepy to access tweets it only retrieves tweets from the last 7 days. How can I extend my access back to 2006?
This is my code:
import tweepy
from tweepy import OAuthHandler
import pandas as pd
access_token = '#'
access_token_secret = '#'
API_key = '#'
API_key_secret = '#'
auth = tweepy.OAuthHandler(API_key, API_key_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True)
tweets = []
count = 1
# Note: the standard search endpoint returns at most 100 tweets per request
for tweet in tweepy.Cursor(api.search_tweets, q="#SEARCHQUERY", count=100).items(50000):
    print(count)
    count += 1
    try:
        data = (tweet.created_at, tweet.id, tweet.text,
                tweet.user._json['screen_name'], tweet.user._json['name'],
                tweet.user._json['created_at'], tweet.entities['urls'])
        tweets.append(data)
    except tweepy.TweepyException as e:  # tweepy.TweepError was removed in Tweepy 4.x
        print(e)
        continue
    except StopIteration:
        break
df = pd.DataFrame(tweets, columns = ['created_at','tweet_id', 'tweet_text', 'screen_name', 'name', 'account_creation_date', 'urls'])
df.to_csv(path_or_buf = 'local address/file.csv', index=False)
The Search All endpoint is available in Twitter API v2, which is represented by the tweepy.Client object (you are using tweepy.API, i.e. the v1.1 interface).
The most important thing is that you require Academic Research access from Twitter. Elevated access grants additional request volume and access to the v1.1 APIs on top of v2 (Essential) access, but you will need an account and Project with Academic Research access to call this endpoint. There's a process to apply for that in the Twitter Developer Portal.
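Once the Project has Academic Research access, the full-archive search can be paged through tweepy.Client with tweepy.Paginator. A minimal sketch, with placeholder bearer token, query, and start date:
import tweepy

# Client is the v2 interface; search_all_tweets requires Academic Research access
client = tweepy.Client(bearer_token='#')

# Paginator follows the pagination tokens; max_results caps at 500 per page
for tweet in tweepy.Paginator(client.search_all_tweets,
                              query='#SEARCHQUERY',
                              start_time='2006-03-21T00:00:00Z',
                              max_results=500).flatten(limit=50000):
    print(tweet.id, tweet.text)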
I am trying to save only Urdu-language tweets using Tweepy in Python 3.6. The Urdu text is not being saved to the file; when I open the file I can only see the usernames, not the Urdu tweets. The same code works for English. This is my code:
import re
import io
import csv
import tweepy
from tweepy import OAuthHandler
#from textblob import TextBlob
consumer_key = "xxxxxxxxxxxxxxxxxxxxx"
consumer_secret = "xxxxxxxxxxxxxxxxxx"
access_key = "xxxxxxxxxxxxxxxxxxxxx"
access_secret = "xxxxxxxxxxxxxxxxxxxx"
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
# set access token and secret
auth.set_access_token(access_key, access_secret)
# create tweepy API object to fetch tweets
api = tweepy.API(auth)
def get_tweets(query, count=300):
    # empty list to store parsed tweets
    tweets = []
    target = io.open("newsfileurdu.csv", 'w', encoding='utf-8')
    # call twitter api to fetch tweets
    q = str(query)
    a = str(q + " اردو")
    b = str(q + " خبریں")
    c = str(q + " خبریں اردو")
    fetched_tweets = api.search(a, count=count) + api.search(b, count=count) + api.search(c, count=count)
    # parsing tweets one by one
    print(len(fetched_tweets))
    for tweet in fetched_tweets:
        # empty dictionary to store required params of a tweet
        parsed_tweet = {}
        # saving text of tweet
        parsed_tweet['text'] = tweet.text
        if "http" not in tweet.text:
            line = re.sub("[^A-Za-z]", " ", tweet.text)
            target.write(line + "\n")
    return tweets

# calling function to get tweets
tweets = get_tweets(query="", count=20000)
The language of the desired tweets (Urdu in this case) can be selected by passing lang='ur' to the api.search function, as in the snippet below:
api.search(q=<query goes here>, lang='ur', tweet_mode='extended', count=<tweets_per_query>)
The Tweepy search API specifies the lang parameter as an ISO 639-1 code (for Urdu it is "ur"). So for any desired language, look up its language code and pass that code to the lang parameter.
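Separately, note that re.sub("[^A-Za-z]", " ", tweet.text) in the question's code replaces every character that is not a Latin letter, which wipes out the Urdu script before it reaches the file. A minimal sketch that keeps the Urdu text, assuming the api object from the question:
import io

with io.open("newsfileurdu.csv", 'w', encoding='utf-8') as target:
    # lang='ur' restricts results to Urdu; full_text requires tweet_mode='extended'
    for tweet in api.search(q="خبریں", lang='ur', tweet_mode='extended', count=100):
        # write the tweet text unchanged so the Urdu characters survive
        target.write(tweet.full_text + "\n")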
I have specified that tweets should be extracted since a given date, but I also need to extract tweets before a given date. The since keyword extracts tweets since the given date, so there must be a keyword that extracts tweets before a specified date. What is that keyword and how do I use it?
import tweepy
import csv
import pandas as pd
####input your credentials here
consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth,wait_on_rate_limit=True)
csvFile = open('demon4.csv', 'a')
csvWriter = csv.writer(csvFile)
for tweet in tweepy.Cursor(api.search, q="#unitedAIRLINES", count=100, lang="en",
                           since="2017-04-03").items():
    print(tweet.created_at, tweet.text)
    csvWriter.writerow([tweet.created_at, tweet.text.encode('utf-8')])
In the "q" parameter you can use "since" and "until" like this :
q="#unitedAIRLINES since:2017-04-02 until:2017-04-03"
The result shoud be the same as this advanced search on the official web site :
https://twitter.com/search?f=tweets&vertical=default&q=%23unitedAIRLINES%20since%3A2017-04-02%20until%3A2017-04-03&src=typd
Except that with the public search API you just can get 7 days past.
You can either use a specific tweet id as a starting point. The parameter is "since_id". And a "max_id" to delimit the period. For more information, see : https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets.html
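As a sketch, the date operators go directly into the query string passed through the Cursor (same api object as in the question):
# until is exclusive, so this returns tweets from 2017-04-02 only
for tweet in tweepy.Cursor(api.search,
                           q="#unitedAIRLINES since:2017-04-02 until:2017-04-03",
                           count=100, lang="en").items():
    print(tweet.created_at, tweet.text)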
I'm trying to build a network of my followers on Twitter with Python and Tweepy. My problem is that I'm not obtaining all the followers for each user, only a few. This is the code:
import tweepy
# Copy the api key, the api secret, the access token and the access token secret from the relevant page on your Twitter app
api_key = 'xxxx'
api_secret = 'xxxx'
access_token = 'x-x'
access_token_secret = 'xxxx'
# You don't need to make any changes below here # This bit authorises you to ask for information from Twitter
auth = tweepy.OAuthHandler(api_key, api_secret)
auth.set_access_token(access_token, access_token_secret)
# The api object gives you access to all of the http calls that Twitter accepts
api = tweepy.API(auth)
#User we want to use as initial node
user='xxxx'
import csv
import time
#This creates a csv file and defines that each new entry will be in a new line
csvfile=open(user+'network.csv', 'w')
spamwriter = csv.writer(csvfile, delimiter=' ',quotechar='|', quoting=csv.QUOTE_MINIMAL)
#This is the function that takes a node (user) and looks for all its followers #and print them into a CSV file... and look for the followers of each follower...
def fib(n, user, spamwriter):
    if n > 0:
        # There is a limit to the traffic you can have with the API, so you need to wait
        # a few seconds per call or after a few calls it will restrict your traffic
        # for 15 minutes. This parameter can be tweaked
        time.sleep(40)
        try:
            users = api.followers(user)
            for follower in users:
                print(follower.screen_name)
                spamwriter.writerow([user + ';' + follower.screen_name])
                fib(n - 1, follower.screen_name, spamwriter)
        except tweepy.TweepError:
            print("Failed to run the command on that user, Skipping...")
# n defines the recursion depth
n = 2
fib(n, user, spamwriter)
API.followers([id/screen_name]) only returns followers a page at a time: 20 by default, and at most 200 per page via the count parameter.
Try:
API.followers_ids(id/screen_name/user_id)
It will return a list of IDs for the people following the specified user, up to 5000 per call. Just put your user's ID or screen name in the parameters.
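To collect more than one page, Tweepy's Cursor can drive followers_ids and follow the pagination cursors automatically. A minimal sketch, assuming the api and user objects from the question:
# Gather every follower ID, up to 5000 per request
follower_ids = []
for page in tweepy.Cursor(api.followers_ids, screen_name=user).pages():
    follower_ids.extend(page)
print(len(follower_ids))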
I'm doing Twitter sentiment research at the moment. For this reason, I'm using the Twitter API to download all tweets on certain keywords. But my current code is taking a long time to build a large data file, so I was wondering if there's a faster method.
This is what I'm using right now:
__author__ = 'gerbuiker'
import time
#Import the necessary methods from tweepy library
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
#Variables that contains the user credentials to access Twitter API
access_token = "XXXXXXXXXXXXX"
access_token_secret = "XXXXXXXX"
consumer_key = "XXXXX"
consumer_secret = "XXXXXXXXXXXXXX"
# This is a basic listener that just prints received tweets to stdout.
class StdOutListener(StreamListener):
    def on_data(self, data):
        try:
            # print(data)
            tweet = data.split(',"text":"')[1].split('","source')[0]
            print(tweet)
            saveThis = str(time.time()) + '::' + tweet  # saves time + actual tweet
            saveFile = open('twitiamsterdam.txt', 'a')
            saveFile.write(saveThis)
            saveFile.write('\n')
            saveFile.close()
            return True
        except BaseException as e:
            print('failed ondata,', str(e))
            time.sleep(5)

    def on_error(self, status):
        print(status)

if __name__ == '__main__':
    # This handles Twitter authentication and the connection to the Twitter Streaming API
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)
    # This line filters Twitter Streams to capture data by keyword, e.g. 'Amsterdam'
    stream.filter(track=['KEYWORD which i want to check'])
This gets me about 1,500 tweets in one hour for a pretty popular keyword (Amsterdam). Does anyone know a faster method in Python?
To be clear: I want to download all tweets on a certain subject for, say, the last month or year. So the newest tweets don't have to keep coming in; the most recent ones for a given period would be sufficient. Thanks!
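Since the goal here is historical tweets rather than a live feed, the REST search endpoint is a better fit than the Streaming API, with the caveat (noted in an earlier answer above) that the standard search only reaches back about 7 days; older periods need the premium or full-archive endpoints. A minimal sketch, assuming the same credentials as above:
import tweepy

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True)

# Page through recent matching tweets instead of waiting for new ones to arrive
for tweet in tweepy.Cursor(api.search, q='Amsterdam', count=100).items(10000):
    print(tweet.created_at, tweet.text)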
I need something similar to this for academic research. Were you able to fix it?
Would it be possible to specify a custom range of time from which to pull the data?
Sorry for asking here, but I couldn't send you private messages.