I would like to follow certain users just to monitor their tweets in my program, but I don't know where to find the serial numbers that identify Twitter users. I have
follow_list = []
streamer.filter(follow = follow_list)
I know that the users are identified by strings like '1234567890', but I don't know where a list of these serial numbers is...
You need to use Twitter's users/lookup API method. It takes usernames and returns a list of dicts with extended user data.
In tweepy there is a lookup_users method that wraps this API call. According to the tweepy source, it should be something like:
users = tweepy.api.lookup_users(screen_names=['twitter', 'cleg'])
for user in users:
    print(user)
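To tie this back to the question: the streaming follow filter expects user IDs as strings, so the lookup result still has to be reduced to a list of ID strings. A minimal sketch, assuming each returned user object exposes an id attribute; build_follow_list is a hypothetical helper name, and the stand-in objects below only mimic a lookup result with made-up IDs:

```python
from types import SimpleNamespace


def build_follow_list(users):
    """Return user IDs as strings, the form the streaming `follow` filter expects."""
    return [str(user.id) for user in users]


# Stand-ins for the objects a user lookup would return; the IDs are made up.
users = [
    SimpleNamespace(id=1234567890, screen_name='example_a'),
    SimpleNamespace(id=987654321, screen_name='example_b'),
]

follow_list = build_follow_list(users)
print(follow_list)
```

The resulting follow_list can then be passed as streamer.filter(follow=follow_list).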
I have been downloading tonnes of tweets for the last few weeks. In order to reduce download time, I only saved tweet user ids not the user account. I need to pass them through a bot check but have now realised that 90% of the user ids are huge numbers (e.g. 1.25103113308656E+018) and cannot be used to search for the account.
Is there a way to convert these back to an account number?
Notes:
The tweet_id column is an equally huge, different number meaning they haven't been read into the wrong column.
When I raise them from the e notation into their raw number it still doesn't work.
I am limited by the week window of the twitter api so I must find a way of linking the data I have already got to individual accounts. This work is for a charitable cause and your help would be greatly appreciated.
The Tweepy API call returns a Response which contains the data in the _json field. You can parse the user key of that JSON, extract the ID and the screen name of the user, and store them.
Then you can query the Tweepy API again, as per their docs, to get the user information.
Note that when you store the ID field, you have to cast it to a string. Stored as a floating-point number, an ID that large loses its trailing digits and no longer matches any account.
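A short sketch of why the string cast matters. Twitter IDs can be 19-digit integers, which a 64-bit float cannot hold exactly, so a value that has passed through E-notation cannot be turned back into the original ID. The ID below is made up for illustration:

```python
# A made-up 19-digit ID of the same size as the ones in the question.
original_id = 1251031133086560257

# Passing through a float silently rounds the value.
round_tripped = int(float(original_id))
print(round_tripped == original_id)  # the trailing digits are gone

# The spreadsheet-style export keeps even fewer digits.
exported = "1.25103113308656E+018"
recovered = int(float(exported))
print(recovered == original_id)

# Keeping the ID as a string preserves every digit.
id_str = str(original_id)
print(int(id_str) == original_id)
```

If the API response includes an id_str field, storing that avoids the conversion altogether.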
I've been using the Twitter API (with Python) for quite some time, but I'm unable to search for Twitter users matching specific criteria. For example, the API has several user attributes in the user JSON data it returns, like statuses_count or profile_link_color. But how do I do a reverse search using such parameters, like searching for users who have tweeted more than 1000 times, or users who created their accounts last week?
Based on the documentation, it looks like you can search for users that fulfill certain criteria with a GET users/search query:
Provides a simple, relevance-based search interface to public user
accounts on Twitter. Try querying by topical interest, full name,
company name, location, or other criteria.
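As far as I can tell, that search is relevance-based on text and does not take attribute filters such as statuses_count, so one workaround is to fetch candidate users first and filter them on the client. A minimal sketch, assuming the users are already available as dicts and that created_at uses the "Mon Nov 29 21:18:15 +0000 2010" style timestamp; filter_users is a hypothetical helper and the sample data is made up:

```python
from datetime import datetime, timezone

# Assumed timestamp layout of the created_at field.
TWITTER_TIME_FORMAT = '%a %b %d %H:%M:%S %z %Y'


def filter_users(users, min_statuses=0, created_after=None):
    """Keep users with at least min_statuses tweets, created on or after created_after."""
    matches = []
    for user in users:
        if user['statuses_count'] < min_statuses:
            continue
        created = datetime.strptime(user['created_at'], TWITTER_TIME_FORMAT)
        if created_after is not None and created < created_after:
            continue
        matches.append(user)
    return matches


# Made-up user records standing in for API results.
sample = [
    {'screen_name': 'example_a', 'statuses_count': 1500,
     'created_at': 'Mon Nov 29 21:18:15 +0000 2010'},
    {'screen_name': 'example_b', 'statuses_count': 12,
     'created_at': 'Wed Jan 01 00:00:00 +0000 2020'},
]

heavy = filter_users(sample, min_statuses=1000)
recent = filter_users(sample, created_after=datetime(2019, 1, 1, tzinfo=timezone.utc))
print([u['screen_name'] for u in heavy])
print([u['screen_name'] for u in recent])
```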
I'm using Google App Engine (python) for the backend of a mobile social game. The game uses Twitter integration to allow people to follow relative leaderboards and play against their friends or followers.
By far the most expensive piece of the puzzle is the background (push) task that hits the Twitter API to query for the friends and followers of a given user, and then stores that data within our datastore. I'm trying to optimize that to reduce costs as much as possible.
The Data Model:
There are three main models related to this portion of the app:
class User(ndb.Model):
    '''General user info, like scores and stats'''
    # key id => randomly generated string that uniquely identifies a user,
    # along the lines of user_kdsgj326
    # (I realize I probably should have just used the integer ID that GAE
    # creates, but it's too late for that)

class AuthAccount(ndb.Model):
    '''Authentication mechanism.
    A user may have multiple auth accounts, one for each provider'''
    # key id => concatenation of the auth provider and the auth provider's unique
    # ID for that user, i.e. "tw:555555", where '555555' is their twitter ID
    auth_id = ndb.StringProperty(indexed=True)  # i.e. '555555'
    user = ndb.KeyProperty(kind=User, indexed=True)
    extra_data = ndb.JsonProperty(indexed=False)  # twitter picture url, name, etc.

class RelativeUserScore(ndb.Model):
    '''Denormalization for quickly generated relative leaderboards'''
    # key id => same as their User id, i.e. user_kdsgj326, so that we can quickly
    # retrieve the object for each user
    follower_ids = ndb.StringProperty(indexed=True, repeated=True)
    # misc properties for the user's score, name, etc. needed for leaderboard
I don't think it's necessary for this question, but just in case, here is a more detailed discussion that led to this design.
The Task
The background thread receives the twitter authentication data and requests a chunk of friend IDs from the Twitter API, via tweepy. Twitter sends up to 5000 friend IDs by default, and I'd rather not arbitrarily limit that more if I can avoid it (you can only make so many requests to their API per minute).
Once I get the list of friend IDs, I can easily translate it into "tw:" AuthAccount key IDs and use get_multi to retrieve the AuthAccounts. Then I remove all of the None results for Twitter users not in our system, and get the user IDs for the Twitter friends that are in our system. Those IDs are also the keys of the RelativeUserScores, so I use a bunch of transactional_tasklets to add this user's ID to each RelativeUserScore's followers list.
The Optimization Questions
The first thing that happens is a call to Twitter's API. Given that this is required for everything else in the task, I'm assuming I would not get any gains in making this asynchronous, correct? (GAE is already smart enough to use the server for handling other tasks while this one blocks?)
When determining if a twitter friend is playing our game, I currently convert all twitter friend ids to auth account IDs, and retrieve by get_multi. Given that this data is sparse (most twitter friends will most likely not be playing our game), would I be better off with a projection query that just retrieves the user ID directly? Something like...
twitter_friend_ids = twitter_api.friend_ids() # potentially 5000 values
friend_system_ids = AuthAccount\
    .query(AuthAccount.auth_id.IN(twitter_friend_ids))\
    .fetch(projection=[AuthAccount.user_id])
(I can't remember or find where, but I read this is better because you don't waste time attempting to read model objects that don't exist.)
Whether I end up using get_multi or a projection query, is there any benefit to breaking up the request into multiple async queries, instead of trying to get / query for potentially 5000 objects at once?
I would organize the task like this:
Make an asynchronous fetch call to the Twitter feed
Use memcache to hold all the AuthAccount->User data:
Request the data from memcache, if it doesn't exist then make a fetch_async() call to the AuthAccount to populate memcache and a local dict
Run each of the twitter IDs through the dict
Here is some sample code:
future = twitter_api.friend_ids()  # make this asynchronous
auth_users = memcache.get('auth_users')
if auth_users is None:
    # projection query: only read the two properties we need
    auth_accounts = AuthAccount.query().fetch(
        projection=[AuthAccount.auth_id, AuthAccount.user_id])
    auth_users = dict((a.auth_id, a.user_id) for a in auth_accounts)
    memcache.add('auth_users', auth_users, 60)  # cache for 60 seconds
twitter_friend_ids = future.get_result()  # get async twitter results
friend_system_ids = []
for id in twitter_friend_ids:
    # auth_id holds the bare Twitter ID (e.g. '555555'), without the "tw:" prefix
    friend_id = auth_users.get(str(id))
    if friend_id:
        friend_system_ids.append(friend_id)
This is optimized for a relatively small number of users and a high rate of requests. Your comments above indicate a larger number of users and a lower rate of requests, so I would only make this change to your code:
twitter_friend_ids = twitter_api.friend_ids() # potentially 5000 values
auth_account_keys = [ndb.Key("AuthAccount", "tw:%s" % id) for id in twitter_friend_ids]
friend_system_ids = filter(None, ndb.get_multi(auth_account_keys))
This will use ndb's built-in memcache to hold data when using get_multi() with keys.
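On the question of splitting the lookup into several smaller requests: a minimal sketch of the batching step alone, in plain Python so it runs anywhere. The batch size of 500 is an arbitrary assumption, chunks is a hypothetical helper, and the friend IDs are stand-ins. On App Engine, each batch of key IDs could be turned into ndb.Key objects and handed to ndb.get_multi_async, collecting the futures afterwards.

```python
def chunks(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Stand-in for the (up to 5000) friend IDs returned by Twitter.
twitter_friend_ids = list(range(1, 5001))

# Same "tw:<id>" key naming as in the question.
key_ids = ["tw:%s" % friend_id for friend_id in twitter_friend_ids]

batches = list(chunks(key_ids, 500))
print(len(batches), len(batches[0]))
```

Whether batching actually lowers cost or latency is something to measure; the sketch only shows how the work would be split.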
I'm trying to use the Twitter API in order to automatically fetch the last tweet of a given user, but I'm having trouble :/
I am using this library : https://code.google.com/p/python-twitter/
I installed it, everything seems working, but when I try to fetch the timeline of a user, I only get all my timeline :(
Here is my code:
import twitter
api = twitter.Api(consumer_key='***', consumer_secret='****', access_token_key='***', access_token_secret='****')
statuses = api.GetUserTimeline('#twitterapi')
print [s.text for s in statuses]
Is there something I missed ?
I believe you have to provide the user_id rather than the screen_name in order for GetUserTimeline to work.
Also, although you might expect that this would return the equivalent of that user's home status page, it does not. Instead, it returns just the tweets from that user.
The twitter API documentation mentions another method - GetFriendsTimeline, but, despite being listed in the documentation, it doesn't seem to exist as far as I can tell.
You must explicitly enter screen_name= or user_id= otherwise the value defaults to the authenticated user.
Examples:
statuses = api.GetUserTimeline(screen_name='some_handle')
or
statuses = api.GetUserTimeline(user_id=22233344)
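Putting that together for the original goal of fetching a user's last tweet: a small sketch, assuming the timeline comes back newest first and that the handle is passed without a leading '#'. latest_tweet_text is a hypothetical helper, and FakeApi is only a stand-in so the example runs without credentials; with the real library you would pass your twitter.Api instance instead.

```python
from types import SimpleNamespace


def latest_tweet_text(api, handle):
    """Return the text of the most recent tweet for `handle`, or None if there is none."""
    screen_name = handle.lstrip('@')  # accept '@some_handle' as well as 'some_handle'
    statuses = api.GetUserTimeline(screen_name=screen_name, count=1)
    return statuses[0].text if statuses else None


class FakeApi:
    """Stand-in for twitter.Api, returning canned statuses (newest first)."""

    def GetUserTimeline(self, screen_name=None, user_id=None, count=None):
        timeline = [SimpleNamespace(text='newest tweet'),
                    SimpleNamespace(text='older tweet')]
        return timeline[:count]


result = latest_tweet_text(FakeApi(), '@some_handle')
print(result)
```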
I'm wondering if someone could help guide the approach to this fairly common problem:
I'm building a simple site which a user connects their twitter account to sign up. I'd like to create an interface which shows them which of their twitter friends are already using the site.
So I can get a list of the user's Twitter friends, and a list of the site's users (which all have the Twitter screen name as username), but I'm wondering what the most efficient method is to compare these lists and create a variable with the commonalities.
As an aside, given the Twitter API returns IDs, should I save the twitter user's ID (in addition to their username) when they create an account?
Thanks in advance!
Create Sets out of them, and use the intersection method:
intersection_set = set(list_a).intersection(list_b)
You should store the Twitter user's ID because the username can change at any time, but the ID will always be the same. You should be comparing the IDs, not the usernames, in the intersection_set that Ofri recommends.
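A minimal sketch of the two suggestions combined, assuming the site stores each member's Twitter ID alongside the username. All IDs and names below are made up:

```python
# Twitter IDs of the people the signed-in user follows (made-up values).
friend_ids = [111, 222, 333, 444]

# Site members, keyed by the Twitter ID saved at signup (made-up values).
site_users = {222: 'alice', 444: 'bob', 555: 'carol'}

# Intersecting with a dict compares against its keys, i.e. the stored IDs.
common_ids = set(friend_ids).intersection(site_users)

# Look up the site usernames for the friends who are already members.
friends_on_site = sorted(site_users[twitter_id] for twitter_id in common_ids)
print(friends_on_site)
```

Set membership checks are constant time on average, so this stays fast even with thousands of friend IDs.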