I am trying to save only Urdu-language tweets using tweepy in Python (version 3.6). The Urdu data is not being saved to the file: when I open the file, I can only see the usernames, not the Urdu tweets. The same code works for English tweets. This is my code:
import re
import io
import csv
import tweepy
from tweepy import OAuthHandler
#from textblob import TextBlob
consumer_key = "xxxxxxxxxxxxxxxxxxxxx"
consumer_secret = "xxxxxxxxxxxxxxxxxx"
access_key = "xxxxxxxxxxxxxxxxxxxxx"
access_secret = "xxxxxxxxxxxxxxxxxxxx"
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
# set access token and secret
auth.set_access_token(access_key, access_secret)
# create tweepy API object to fetch tweets
api = tweepy.API(auth)
def get_tweets(query, count = 300):
    # empty list to store parsed tweets
    tweets = []
    target = io.open("newsfileurdu.csv", 'w', encoding='utf-8')
    # call twitter api to fetch tweets
    q=str(query)
    a=str(q+" اردو")
    b=str(q+" خبریں")
    c=str(q+" خبریں اردو")
    fetched_tweets = api.search(a, count = count)+ api.search(b, count = count)+ api.search(c, count = count)
    # parsing tweets one by one
    print(len(fetched_tweets))
    for tweet in fetched_tweets:
        # empty dictionary to store required params of a tweet
        parsed_tweet = {}
        # saving text of tweet
        parsed_tweet['text'] = tweet.text
        if "http" not in tweet.text:
            line = re.sub("[^A-Za-z]", " ", tweet.text)
            target.write(line+"\n")
    return tweets
# creating object of TwitterClient Class
# calling function to get tweets
tweets = get_tweets(query ="", count = 20000)
You can restrict the results to a particular language (Urdu in this case) by passing lang="ur" to api.search, as in this snippet:
api.search(q=<query goes here>, lang='ur', tweet_mode='extended', count=<tweets_per_query>)
The search API expects the "lang" parameter as an ISO 639-1 code (for Urdu it is "ur"). So for any language, look up its ISO 639-1 code and pass that code as lang. Note that with tweet_mode='extended', tweepy exposes the tweet body as tweet.full_text rather than tweet.text.
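Independently of lang, the cleanup step in the question's code removes the Urdu text itself: re.sub("[^A-Za-z]", " ", tweet.text) replaces every character that is not a Latin letter with a space, so the Urdu script is wiped out and only Latin-script fragments such as @usernames survive. A minimal sketch of a cleanup that keeps Urdu, assuming all you want is to drop URLs and collapse whitespace (latin_only reproduces the question's regex; clean_tweet is a hypothetical helper, not part of tweepy):

```python
import re

# Matches http:// and https:// links up to the next whitespace.
URL_RE = re.compile(r"https?://\S+")


def latin_only(text):
    """The question's cleanup: every non-Latin character becomes a space."""
    return re.sub("[^A-Za-z]", " ", text)


def clean_tweet(text):
    """Hypothetical helper: drop URLs and extra whitespace, keep Urdu letters."""
    text = URL_RE.sub(" ", text)
    return " ".join(text.split())


sample = "@user خبریں اردو https://t.co/abc"
print(latin_only(sample).split())  # ['user', 'https', 't', 'co', 'abc']
print(clean_tweet(sample))         # @user خبریں اردو
```

The file itself was already opened with encoding='utf-8', so once the regex stops discarding the Urdu characters they can be written as-is.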
Related
I'm running the code below, which an instructor gave me, to fetch each status by its tweet_id from another DataFrame I've already imported. When I run it, every tweet comes back "Fail". I don't receive any errors, so I'm not sure what I'm missing. When I requested Twitter developer access, I didn't have to answer as many questions as other people say they had to, so I'm wondering whether I simply don't have enough access.
import tweepy
from tweepy import OAuthHandler
import json
from timeit import default_timer as timer
# Query Twitter API for each tweet in the Twitter archive and save JSON in a text file
# These are hidden to comply with Twitter's API terms and conditions
consumer_key = 'HIDDEN'
consumer_secret = 'HIDDEN'
access_token = 'HIDDEN'
access_secret = 'HIDDEN'
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth, wait_on_rate_limit=True)
# NOTE TO STUDENT WITH MOBILE VERIFICATION ISSUES:
# df_1 is a DataFrame with the twitter_archive_enhanced.csv file. You may have to
# change line 17 to match the name of your DataFrame with twitter_archive_enhanced.csv
# NOTE TO REVIEWER: this student had mobile verification issues so the following
# Twitter API code was sent to this student from a Udacity instructor
# Tweet IDs for which to gather additional data via Twitter's API
tweet_ids = twitter_archive.tweet_id.values
len(tweet_ids)
# Query Twitter's API for JSON data for each tweet ID in the Twitter archive
count = 0
fails_dict = {}
start = timer()
# Save each tweet's returned JSON as a new line in a .txt file
with open('tweet_json.txt', 'w') as outfile:
    # This loop will likely take 20-30 minutes to run because of Twitter's rate limit
    for tweet_id in tweet_ids:
        count += 1
        print(str(count) + ": " + str(tweet_id))
        try:
            tweet = api.get_status(tweet_id, tweet_mode='extended')
            print("Success")
            json.dump(tweet._json, outfile)
            outfile.write('\n')
        except tweepy.TweepError as e:
            print("Fail")
            fails_dict[tweet_id] = e
            pass
end = timer()
print(end - start)
print(fails_dict)
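No error is raised because every tweepy.TweepError is caught in the except branch and stored in fails_dict; the actual reasons for the failures are only visible in the final print(fails_dict). A small sketch, assuming fails_dict maps tweet IDs to exception objects as in the code above, that groups the failures by message so the most common cause stands out (summarize_failures is a hypothetical helper, and the example data is made up):

```python
from collections import Counter


def summarize_failures(fails_dict):
    """Count how often each distinct error message appears in fails_dict."""
    return Counter(str(err) for err in fails_dict.values())


# Stand-in data: in the real script the values are tweepy.TweepError objects.
example = {
    101: Exception("stand-in error A"),
    102: Exception("stand-in error A"),
    103: Exception("stand-in error B"),
}
for message, n in summarize_failures(example).most_common():
    print(n, message)
```

If every tweet fails with the same message, that message (rather than the loop) is what to investigate, for example an authentication or access-level problem.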
I'm extracting tweets since a given date, but I also need to limit them to tweets before a given date. The since keyword returns tweets from the given date onward, so there must be a keyword that returns tweets before a specified date. What is that keyword, and how do I use it?
import tweepy
import csv
import pandas as pd
####input your credentials here
consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth,wait_on_rate_limit=True)
csvFile = open('demon4.csv', 'a')
csvWriter = csv.writer(csvFile)
for tweet in tweepy.Cursor(api.search,q="#unitedAIRLINES",count=100,lang="en",\
                           since="2017-04-03").items():
    print (tweet.created_at, tweet.text)
    csvWriter.writerow([tweet.created_at, tweet.text.encode('utf-8')])
You can use the "since:" and "until:" operators inside the "q" parameter, like this:
q="#unitedAIRLINES since:2017-04-02 until:2017-04-03"
The result should be the same as this advanced search on the official website:
https://twitter.com/search?f=tweets&vertical=default&q=%23unitedAIRLINES%20since%3A2017-04-02%20until%3A2017-04-03&src=typd
The only difference is that the public (standard) search API only goes back about 7 days.
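If the dates come from variables, the query string can be assembled from datetime.date objects. A minimal sketch (build_query is a hypothetical helper); the resulting string would be passed as q to tweepy.Cursor(api.search, q=..., ...) instead of using the since= keyword:

```python
from datetime import date


def build_query(terms, since, until):
    """Append since:/until: operators (YYYY-MM-DD) to a search query string."""
    return "{} since:{} until:{}".format(terms, since.isoformat(), until.isoformat())


q = build_query("#unitedAIRLINES", date(2017, 4, 2), date(2017, 4, 3))
print(q)  # #unitedAIRLINES since:2017-04-02 until:2017-04-03
```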
Alternatively, you can use a specific tweet ID as a starting point with the "since_id" parameter, and "max_id" to bound the period at the other end. For more information, see: https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets.html
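The usual way to combine the two is to page backwards: after each page, set max_id to one less than the oldest ID received, and stop once you reach since_id. A sketch of that logic, assuming a hypothetical fetch_page(max_id) callable that stands in for one search request (in a real script it would wrap api.search with the max_id argument) and returns tweet IDs newest first; the since_id cut-off is applied client-side here only to keep the example self-contained:

```python
def paginate_backwards(fetch_page, since_id=None, max_pages=10):
    """Collect IDs from newest to oldest, moving max_id down after each page."""
    collected = []
    max_id = None
    for _ in range(max_pages):
        page = fetch_page(max_id)
        if since_id is not None:
            # Keep only tweets strictly newer than since_id.
            page = [tid for tid in page if tid > since_id]
        if not page:
            break
        collected.extend(page)
        # Next request: everything strictly older than the oldest ID seen.
        max_id = min(page) - 1
    return collected


ALL_IDS = list(range(100, 0, -1))  # fake tweet IDs, newest first


def fake_fetch(max_id, page_size=10):
    """Stand-in for a search request: up to page_size IDs that are <= max_id."""
    eligible = [t for t in ALL_IDS if max_id is None or t <= max_id]
    return eligible[:page_size]


result = paginate_backwards(fake_fetch, since_id=70)
print(len(result), result[0], result[-1])  # 30 100 71
```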
I'm doing Twitter sentiment research at the moment, so I'm using the Twitter API to download all tweets containing certain keywords. However, my current code takes a long time to build a large data file, so I was wondering whether there's a faster method.
This is what I'm using right now:
__author__ = 'gerbuiker'
import time
#Import the necessary methods from tweepy library
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
#Variables that contain the user credentials to access Twitter API
access_token = "XXXXXXXXXXXXX"
access_token_secret = "XXXXXXXX"
consumer_key = "XXXXX"
consumer_secret = "XXXXXXXXXXXXXX"
#This is a basic listener that just prints received tweets to stdout.
class StdOutListener(StreamListener):
    def on_data(self, data):
        try:
            #print data
            tweet = data.split(',"text":"')[1].split('","source')[0]
            print(tweet)
            saveThis = str(time.time())+ '::'+ tweet #saves time+actual tweet
            saveFile = open('twitiamsterdam.txt','a')
            saveFile.write(saveThis)
            saveFile.write('\n')
            saveFile.close()
            return True
        except BaseException as e:
            print('failed ondata,', str(e))
            time.sleep(5)
    def on_error(self, status):
        print(status)
if __name__ == '__main__':
    #This handles Twitter authentication and the connection to Twitter Streaming API
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)
    #This line filters Twitter Streams to capture data by the keywords: 'Amsterdam'
    stream.filter(track=['KEYWORD which i want to check'])
This gets me about 1,500 tweets per hour for a fairly popular keyword (Amsterdam). Does anyone know a faster method in Python?
To be clear: I want to download all tweets on a certain subject for, say, the last month or year. New tweets don't have to keep coming in; the most recent ones for a given period would be sufficient. Thanks!
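Separately from speed, the on_data handler above extracts the text by splitting the raw string between ',"text":"' and '","source', which leaves JSON escapes (\" and \uXXXX) in the saved text and fails on stream messages that have no text field. A sketch that parses each message with json.loads instead; extract_text is a hypothetical helper, and the extended_tweet/full_text lookup assumes the v1.1 streaming payload format for long tweets:

```python
import json


def split_text(raw):
    """The question's approach: cut the raw string between two markers."""
    return raw.split(',"text":"')[1].split('","source')[0]


def extract_text(raw):
    """Parse one raw stream message as JSON and return its text, or None."""
    try:
        message = json.loads(raw)
    except ValueError:
        return None
    if not isinstance(message, dict):
        return None
    # Long tweets may carry the untruncated body under extended_tweet.
    extended = message.get("extended_tweet")
    if isinstance(extended, dict) and "full_text" in extended:
        return extended["full_text"]
    return message.get("text")  # None for non-tweet messages such as deletes


raw = json.dumps({"id": 1, "text": 'He said "hi", café', "source": "web"},
                 separators=(",", ":"))
print(split_text(raw))    # He said \"hi\", caf\u00e9
print(extract_text(raw))  # He said "hi", café
```

Parsing does not change the volume, though: the streaming API delivers tweets as they are posted, so it cannot return tweets from past months.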
I need something similar to this for academic research.
Were you able to fix it?
Would it be possible to specify a custom time range from which to pull the data?
Sorry for asking here, but I couldn't send you a private message.
1) When I run it, the terminal keeps printing "23851" on new lines, which is the number of followers of the first Twitter name in my file f. I believe this means the pointer is not moving through file f, but I'm not sure how to do this properly in Python. 2) When I check my file f1, there's nothing in it, i.e. the program is not writing to f1 as expected.
import tweepy
from tweepy import Stream
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
CONSUMER_KEY = 'xxx'
CONSUMER_SECRET = 'xxx'
ACCESS_KEY = 'xxx'
ACCESS_SECRET = 'xxx'
auth = OAuthHandler(CONSUMER_KEY,CONSUMER_SECRET)
api = tweepy.API(auth)
auth.set_access_token(ACCESS_KEY, ACCESS_SECRET)
#Create Class First
class TweetListener(StreamListener):
    # A listener handles tweets that are received from the stream.
    #This is a basic listener that just prints received tweets to standard output
    def on_data(self, data): # indented inside the class
        print(data)
        return True
    def on_error(self, status):
        print(status)
# open both files outside the loop
with open('Twitternames.txt') as f,open('followers_number.txt', 'a') as f1:
    for x in f:
        #search
        api = tweepy.API(auth)
        twitterStream = Stream(auth,TweetListener())
        test = api.lookup_users(screen_names=['x'])
        for user in test:
            print(user.followers_count)
            #print it out and also write it into a file
            s = user.followers_count
            f1.write(str(s) +"\n") # add a newline with +
#end of stackoverflow
f.close()
There are a few things to consider, and some lines are unnecessary, so I will go through the code step by step. First, the main bug: screen_names=['x'] passes a list containing the literal string 'x', not the variable x, so every iteration looks up the same account. Second, we don't need any streaming data to count followers, so we only need to import tweepy and OAuthHandler:
import tweepy
from tweepy import OAuthHandler
Next, set the four keys required for authentication, as before:
CONSUMER_KEY = 'xxxxxxxx' #Replace with the original values.
CONSUMER_SECRET = 'xxx' #Replace with the original values.
ACCESS_KEY = 'xxx' #Replace with the original values.
ACCESS_SECRET = 'xxx' #Replace with the original values.
auth = OAuthHandler(CONSUMER_KEY,CONSUMER_SECRET)
auth.set_access_token(ACCESS_KEY, ACCESS_SECRET)
api = tweepy.API(auth)
I don't think you need a StreamListener just to log the followers_count of various users, so I'm skipping that part; you can add that snippet back afterwards if you need it.
usernames_file = open('Twitternames.txt').readlines()
I am assuming Twitternames.txt has the following format (one username per line, without the @ symbol):
user_name_1
user_name_2
user_name_3
...
Now usernames_file is a list of strings, usernames_file = ['user_name_1\n', 'user_name_2\n', 'user_name_3\n']. We have extracted the usernames from the text file, but we still need to remove the \n character at the end of each name, which the .strip() method does:
usernames = []
for i in usernames_file:
    usernames.append(i.strip())
# usernames is now ['user_name_1', 'user_name_2', 'user_name_3']
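The same step can be written as a list comprehension. This sketch also skips blank lines (for example an empty line at the end of the file), which the loop above would keep as empty strings; parse_usernames is a hypothetical helper name:

```python
def parse_usernames(lines):
    """Strip whitespace/newlines from each line and drop empty lines."""
    return [line.strip() for line in lines if line.strip()]


usernames_file = ['user_name_1\n', 'user_name_2\n', 'user_name_3\n', '\n']
usernames = parse_usernames(usernames_file)
print(usernames)  # ['user_name_1', 'user_name_2', 'user_name_3']
```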
Now we are ready to call lookup_users, which takes a list of usernames as input.
It may look something like this:
test = api.lookup_users(screen_names=usernames)
for user in test:
    print(user.followers_count)
If you want to log the results to a .txt file then you can use:
log_file = open("log.txt", 'a')
test = api.lookup_users(screen_names=usernames)
for user in test:
    print(user.followers_count)
    log_file.write(user.name+" has "+str(user.followers_count)+" followers.\n")
log_file.close()
So the short and final code would look something like this:
import tweepy
from tweepy import OAuthHandler
CONSUMER_KEY = 'xxx'
CONSUMER_SECRET = 'xxx'
ACCESS_KEY = 'xxx'
ACCESS_SECRET = 'xxx'
auth = OAuthHandler(CONSUMER_KEY,CONSUMER_SECRET)
auth.set_access_token(ACCESS_KEY, ACCESS_SECRET)
api = tweepy.API(auth)
usernames_file = open('Twitternames.txt').readlines()
usernames = []
for i in usernames_file:
    usernames.append(i.strip())
log_file = open("log.txt", 'a')
test = api.lookup_users(screen_names=usernames)
for user in test:
    print(user.followers_count)
    log_file.write(user.name+" has "+str(user.followers_count)+" followers.\n")
log_file.close()
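Twitter's users/lookup endpoint accepts at most 100 users per request, and as far as I know tweepy 3.x sends the whole screen_names list in a single request, so a longer Twitternames.txt would need to be split into batches. A minimal sketch of the batching, where chunks is a hypothetical helper and the commented loop assumes the api and log_file objects from the code above:

```python
def chunks(items, size=100):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


usernames = ["user_%d" % i for i in range(250)]  # stand-in list
print([len(batch) for batch in chunks(usernames)])  # [100, 100, 50]

# With a real `api` and `log_file` as in the code above:
# for batch in chunks(usernames):
#     for user in api.lookup_users(screen_names=batch):
#         log_file.write(user.name + " has " + str(user.followers_count) + " followers.\n")
```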
I've put together this short script to search Twitter. Most of the tweets for this search date from about a year ago; they were connected with a Kickstarter campaign. When I run this script, though, I only get newer tweets that are no longer relevant to the term. Could anybody tell me what I need to do to get the results I want? When I search for the terms on Twitter itself, I get the right results.
import tweepy
import csv
consumer_key = 'x'
consumer_secret = 'x'
access_token = 'x'
access_token_secret = 'x'
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
# Open/Create a file to append data
csvFile = open('tweets.csv', 'a')
#Use csv Writer
csvWriter = csv.writer(csvFile)
api = tweepy.API(auth)
results = api.search(q="kickstarter campaign")
for result in results:
    csvWriter.writerow([result.created_at, result.text.encode('utf-8')])
    print result.created_at, result.text
csvFile.close()
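The scripts in this thread write text.encode('utf-8') into the CSV, which is what Python 2's csv module needs. Under Python 3, csv.writer expects str, and a bytes value ends up in the file as its repr (b'...'). A sketch of the Python 3 pattern, assuming rows of (created_at, text) pairs like the ones written above; write_rows is a hypothetical helper:

```python
import csv
import os
import tempfile
from datetime import datetime


def write_rows(path, rows):
    """Append (created_at, text) rows to a UTF-8 CSV file (Python 3)."""
    # The csv docs recommend newline='' for files handed to csv.writer.
    with open(path, "a", encoding="utf-8", newline="") as csv_file:
        writer = csv.writer(csv_file)
        for created_at, text in rows:
            writer.writerow([created_at, text])  # pass str, not bytes


path = os.path.join(tempfile.mkdtemp(), "tweets.csv")
write_rows(path, [(datetime(2017, 4, 3, 12, 0), "kickstarter campaign café")])
with open(path, encoding="utf-8", newline="") as csv_file:
    rows = list(csv.reader(csv_file))
print(rows)  # [['2017-04-03 12:00:00', 'kickstarter campaign café']]
```

The file object handles the encoding, so the text is passed through unchanged and non-ASCII characters survive the round trip.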