I am using Tweepy to get all tweets made by @UserName. This is the code:
import urllib, json
import sys
import tweepy
from tweepy import OAuthHandler
def twitter_fetch(screen_name="prateek", maxnumtweets=10):
    consumer_token = ""  # Keys removed for security
    consumer_secret = ""
    access_token = ""
    access_secret = ""
    auth = tweepy.OAuthHandler(consumer_token, consumer_secret)
    auth.set_access_token(access_token, access_secret)
    api = tweepy.API(auth)
    for status in tweepy.Cursor(api.user_timeline, id=screen_name).items(1):
        print status['statuses_count']
        print '\n'

if __name__ == '__main__':
    twitter_fetch('BarackObama', 200)
How do I parse the JSON properly to read the number of statuses made by that particular user?
How about something that keeps track of how many statuses you've iterated through? I'm not positive how Tweepy works, but using something like this:
statuses = 0
for status in tweepy.Cursor(api.user_timeline, id=screen_name).items(1):
    print status['statuses_count']
    statuses += 1
    print '\n'
return statuses
Usually JSON data has a nice, clearly formatted structure, which makes it easier to understand. So when I want to iterate through such a list to find whether an x exists (an achievement, in this case), I use this function, which adds 1 to index on every iteration it goes through:
def achnamefdr(appid, mykey, steamid64, achname):
    playerachurl = 'http://api.steampowered.com/ISteamUserStats/GetPlayerAchievements/v0001/?appid=' + str(appid) + '&key=' + mykey + '&steamid=' + steamid64 + '&l=name'
    achjson = json.loads(urllib.request.urlopen(playerachurl).read().decode('utf-8'))
    achjsonr = achjson['playerstats']['achievements']
    index = 0
    for ach in achjsonr:
        if not ach['name'].lower() == achname.lower():
            index += 1
            continue
        else:
            achnamef = ach['name']
            return achnamef, index, True
    return 'Invalid Achievement!', index, False
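The same lookup pattern can be tried offline. In this sketch the achievement list and the helper name `find_ach` are made up for illustration; the real function above hits the Steam Web API instead:

```python
# Hypothetical offline version of the loop above: the same linear scan
# with a manual index counter, but over a local list instead of the API.
achievements = [{"name": "First Blood"}, {"name": "Headshot"}]

def find_ach(achlist, achname):
    index = 0
    for ach in achlist:
        if ach['name'].lower() != achname.lower():
            index += 1
            continue
        return ach['name'], index, True
    return 'Invalid Achievement!', index, False

print(find_ach(achievements, "headshot"))  # ('Headshot', 1, True)
print(find_ach(achievements, "missing"))   # ('Invalid Achievement!', 2, False)
```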
It can be done by getting the JSON object from status._json and then parsing it:
print status._json["statuses_count"]
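For illustration, here is a minimal stand-in for Tweepy's Status object. Only the `._json` attribute name comes from Tweepy; the `FakeStatus` class and its data are fabricated so the lookup runs offline:

```python
# Tweepy's Status objects expose the raw API payload via ._json; this
# fake class mimics just that attribute.
class FakeStatus(object):
    def __init__(self, raw):
        self._json = raw

status = FakeStatus({"statuses_count": 3078, "screen_name": "BarackObama"})
print(status._json["statuses_count"])  # 3078
```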
I am using code that I found on GitHub; I had to modify some things. It works, but sometimes (even while working) it gives an IndexError and then stops working:
File "bot.py", line 36, in <module>
    imageSource = pageTable[arrayNum]["file_url"]
IndexError: list index out of range
Here is my code
import time
import requests
import tweepy
import urllib
import os
import random
page = 1
url = 'https://danbooru.donmai.us/posts.json?tags=shimakaze_(kantai_collection) rating:s&limit=1000&page='
consumer_key = ''
consumer_secret = ''
access_key = ''
access_secret = ''
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)
while True:
    try:
        random.seed()
        jsURL = url + str(random.randint(1, 1000))
        response = requests.get(jsURL)
        pageTable = response.json()
        arrayNum = random.randint(0, 5)
        print arrayNum
        imageSource = pageTable[arrayNum]["file_url"]
        imageURL = imageSource
        print imageURL
        sourceURL = "http://danbooru.donmai.us/posts/" + str(pageTable[arrayNum]["id"])
        print sourceURL
        urllib.urlretrieve(imageURL, 'image.jpg')
        tweetString = sourceURL + " "
        api.update_with_media('image.jpg', status=tweetString)
        os.remove('image.jpg')
        # post; limited to 500 requests/hour
        time.sleep(600)
    except tweepy.error.TweepError:
        print "Image too large, finding a different image.."
The arrayNum = random.randint(0,5) line gives the error. That code generates a number from 0 to 5 and uses it as an index into the Danbooru page, so I don't know why it gives an IndexError.
It seems the response you get may sometimes be empty. I tried
https://danbooru.donmai.us/posts.json?tags=shimakaze_(kantai_collection)%20rating:s&limit=1000&page=796
(which falls within your random range) and got a response with an empty list. Indexing into an empty list gives exactly that IndexError, so check that the response is not empty before indexing.
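One way to add that guard is a small helper. The function name `pick_post` and the sample data are illustrative; in the bot this would wrap the `pageTable[arrayNum]` access:

```python
import random

# Skip empty JSON pages instead of indexing into them, and clamp the
# random index so pages shorter than six posts cannot overflow either.
def pick_post(page_table):
    if not page_table:
        return None
    index = random.randint(0, min(5, len(page_table) - 1))
    return page_table[index]

print(pick_post([]))                       # None, no IndexError
print(pick_post([{"file_url": "a.jpg"}]))  # the only post on the page
```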
You can only retrieve 100 user objects per request with the
api.lookup_users() method. Is there an easy way to retrieve more than 100 using Tweepy and Python? I have read this post: User ID to Username tweepy, but it does not help with the more-than-100 problem. I am pretty novice in Python, so I cannot come up with a solution myself. What I have tried is this:
users = []
i = 0
num_pages = 2
while i < num_pages:
    try:
        # Look up a collection of ids
        users.append(api.lookup_users(user_ids=ids[100*i:100*(i+1)-1]))
    except tweepy.TweepError:
        # We get a tweep error
        print('Something went wrong, quitting...')
    i = i + 1
where ids is a list containing the IDs, but I get IndexError: list index out of range when I try to get a user with an index higher than 100. If it helps, I am only interested in getting the screen names from the user IDs.
You're right that you need to send the user IDs to the API in batches of 100, but you're ignoring the fact that you might not have an exact multiple of 100 IDs. Try the following:
import tweepy

def lookup_user_list(user_id_list, api):
    full_users = []
    users_count = len(user_id_list)
    try:
        for i in range((users_count / 100) + 1):
            full_users.extend(api.lookup_users(user_ids=user_id_list[i*100:min((i+1)*100, users_count)]))
        return full_users
    except tweepy.TweepError:
        print 'Something went wrong, quitting...'

results = lookup_user_list(ids, api)
By taking the minimum of (i+1)*100 and users_count, we ensure the final request only asks for the users left over. results will be a list of the looked-up users.
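The batching arithmetic can be checked without touching the API. This standalone sketch (the helper name `batch` is made up) slices any ID list into chunks of at most 100, which is the same slicing the lookup loop performs:

```python
# Yield consecutive slices of at most `size` items from a sequence --
# the same arithmetic the lookup loop uses for its API batches.
def batch(seq, size=100):
    for start in range(0, len(seq), size):
        yield seq[start:start + size]

ids = list(range(250))
print([len(chunk) for chunk in batch(ids)])  # [100, 100, 50]
```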
You may also hit rate limits. When setting up your API object, you should take care to let Tweepy handle these gracefully and remove some of the hard work, like so:
consumer_key = 'X'
consumer_secret = 'X'
access_token = 'X'
access_token_secret = 'X'
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
I haven't tested this since I don't have access to the API, but if you have a collection of user IDs in any range, it should fetch all of them. It fetches any remainder first: with a list of 250 IDs, it first fetches 50 users using the last 50 IDs in the list, then fetches the remaining 200 users in batches of 100.
from tweepy import api, TweepError

users = []
user_ids = []  # collection of user ids
count_100 = int(len(user_ids) / 100)  # number of full hundreds of user ids
if len(user_ids) % 100 > 0:
    for i in range(0, count_100 + 1):
        try:
            if i == 0:
                remainder = len(user_ids) % 100
                # fetch the trailing remainder first (the last `remainder` ids)
                users.append(api.lookup_users(user_ids=user_ids[-remainder:]))
            else:
                end_at = i * 100
                start_at = end_at - 100
                users.append(api.lookup_users(user_ids=user_ids[start_at:end_at]))
        except TweepError:
            print('Something went wrong, quitting...')
I have written a script using Tweepy where, when I enter a keyword like bse or nse, it searches for those keywords. But when I enter a hashtag (#bse, #nse) or a user name (@skirani, @mukund), it gives no results. I tried both result_type values, recent as well as popular, with no change. Is there any other result type?
def getTweets(self, data):
    import pdb
    # pdb.set_trace()
    tweets = {}
    if "query" in data:
        tweets = self.twitter_api.search(data['query'], result_type='recent')
        # print type(tweets)
        result = tweets['statuses']
        count = 1
        for val in result:
            print count
            print val['text']
            count += 1
            print
    return tweets
Tweepy requires that search entities are URL encoded. That means that @ needs to be %40 and # must be %23.
A simple way to do this in Python is:
import urllib2
search = urllib2.quote("@skirani")
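The same call exists on Python 3 under urllib.parse. A version that runs on either interpreter and shows both encodings:

```python
# urllib2 only exists on Python 2; urllib.parse.quote is the Python 3 name.
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib2 import quote       # Python 2

print(quote("@skirani"))  # %40skirani
print(quote("#nse"))      # %23nse
```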
import requests
import json

# initial message
message = "if i can't let it go out of my mind"
# split into list
split_message = message.split()

def decrementList(words):
    for w in [words] + [words[:-x] for x in range(1, len(words))]:
        url = 'http://ws.spotify.com/search/1/track.json?q='
        request = requests.get(url + "%20".join(w))
        json_dict = json.loads(request.content)
        num_results = json_dict['info']['num_results']
        if num_results > 0:
            num_removed = len(words) - len(w)
            # track_title = ' '.join(words)
            track_title = "If I Can't Take It with Me"
            for value in json_dict.items():
                if value == track_title:
                    print "match found"
            return num_removed, json_dict

num_words_removed, json_dict = decrementList(split_message)
In the code above, I am trying to match the name of a song to my search query. In this particular query the song will not match, but I have added a variable that will match a song in the returned query. The for loop at the end of the function is supposed to match the track title variable, but I can't figure out why it isn't working. Is there a simple way to find all values for a known key? In this case, the key is "name".
You have to search for the track title within the "tracks" dictionary. So just change your code like this:
for value in json_dict["tracks"]:
    if value["name"] == track_title:
and it will print
match found
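As a self-contained illustration of the fix (the response dict below is fabricated, shaped like the old Spotify search payload):

```python
# The search response keeps its tracks as a list of dicts under "tracks",
# so the title lives at track["name"], not as a top-level dict value.
json_dict = {
    "info": {"num_results": 1},
    "tracks": [{"name": "If I Can't Take It with Me"}],
}
track_title = "If I Can't Take It with Me"

for value in json_dict["tracks"]:
    if value["name"] == track_title:
        print("match found")
```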
I'm trying to find a way to NOT get the same tweets using search API.
That's what I'm doing:
Make a request to Twitter
Store the tweets
Make another request to Twitter
Store the tweets
Compare the results from steps 2 and 4
Ideally in step 5 I would get 0, meaning that no overlapping tweets were received and I'm not asking the Twitter server for the same information more than once.
But I think I'm stuck at step 3, where I have to make another call. I'm trying to use the 'since_id' argument to get tweets after a certain point, but I'm not sure if the value I'm using is correct.
Code:
import twitter

class Test():
    def __init__(self):
        self.t_auth()
        self.hashtag = ['justinbieber']
        self.tweets_1 = []
        self.ids_1 = []
        self.created_at_1 = []
        self.tweet_text_1 = []
        self.last_id_1 = ''
        self.page_1 = 1
        self.tweets_2 = []
        self.ids_2 = []
        self.created_at_2 = []
        self.tweet_text_2 = []
        self.last_id_2 = ''
        self.page_2 = 1

        for i in range(1, 16):
            self.tweets_1.extend(self.api.GetSearch(self.hashtag, per_page=100, since_id=self.last_id_1, page=self.page_1))
            self.page_1 += 1
        print len(self.tweets_1)
        for t in self.tweets_1:
            self.ids_1.insert(0, t.id)
            self.created_at_1.insert(0, t.created_at)
            self.tweet_text_1.insert(0, t.text)
            self.last_id_1 = t.id
        self.last_id_2 = self.last_id_1

        for i in range(1, 16):
            self.tweets_2.extend(self.api.GetSearch(self.hashtag, per_page=100, since_id=self.last_id_2, page=self.page_2))
            self.page_2 += 1
        print len(self.tweets_2)
        for t in self.tweets_2:
            self.ids_2.insert(0, t.id)
            self.created_at_2.insert(0, t.created_at)
            self.tweet_text_2.insert(0, t.text)
            self.last_id_2 = t.id

        print 'Total number of tweets in test 1: ', len(self.tweets_1)
        print 'Last id of test 1: ', self.last_id_1
        print 'Total number of tweets in test 2: ', len(self.tweets_2)
        print 'Last id of test 2: ', self.last_id_2
        print '##################################'
        print '#############OVERLAPING###########'

        ids_overlap = set(self.ids_1).intersection(self.ids_2)
        tweets_text_overlap = set(self.tweet_text_1).intersection(self.tweet_text_2)
        created_at_overlap = set(self.created_at_1).intersection(self.created_at_2)

        print 'Ids: ', len(ids_overlap)
        print 'Text: ', len(tweets_text_overlap)
        print 'Created_at: ', len(created_at_overlap)
        print ids_overlap
        print tweets_text_overlap
        print created_at_overlap

    def t_auth(self):
        consumer_key = "xxx"
        consumer_secret = "xxx"
        access_key = "xxx"
        access_secret = "xxx"
        self.api = twitter.Api(consumer_key, consumer_secret, access_key, access_secret)
        self.api.VerifyCredentials()
        return self.api

if __name__ == "__main__":
    Test()
In addition to 'since_id', you can use 'max_id'. From the Twitter API documentation:
Iterating in a result set: parameters such as count, until, since_id, and max_id allow you to control how you iterate through search results, since a search could return a large set of tweets.
By setting these values dynamically, you can restrict your search results so they do not overlap. For example, if max_id is set to 1100 and since_id is set to 1000, you will get tweets with IDs between those two values.
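The since_id/max_id windowing can be sketched without hitting Twitter at all. Everything below is illustrative: `page_without_overlap` and `fake_fetch` are made-up names, and the fake fetcher just returns integer IDs in the half-open window (since_id, max_id], newest first, as the real search API does:

```python
# Walk a result set newest-to-oldest: since_id fixes the lower bound,
# and after each page max_id is moved below the oldest ID already seen,
# so no ID is ever requested twice.
def page_without_overlap(fetch, since_id):
    max_id = None
    seen = []
    while True:
        batch = fetch(since_id=since_id, max_id=max_id)
        if not batch:
            return seen
        seen.extend(batch)
        max_id = min(batch) - 1  # next page: strictly older tweets

# Fake "API": IDs 1001..1100, returned newest first, 30 per page.
all_ids = list(range(1001, 1101))
def fake_fetch(since_id, max_id, per_page=30):
    pool = [i for i in all_ids if i > since_id and (max_id is None or i <= max_id)]
    return sorted(pool, reverse=True)[:per_page]

ids = page_without_overlap(fake_fetch, since_id=1000)
print(len(ids), len(set(ids)))  # 100 100 -> all tweets fetched, no duplicates
```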