I've been using the Twitter API (with Python) for quite some time, but I'm unable to search for Twitter users matching a specific criterion. For example, the API returns several user attributes in the user JSON data, like statuses_count or profile_link_color. But how do I do a reverse search using such parameters, like searching for users who have tweeted more than 1000 times, or users who created their accounts last week?
Based on the documentation, it looks like you can search for users that fulfill certain criteria with a GET users/search query:
Provides a simple, relevance-based search interface to public user
accounts on Twitter. Try querying by topical interest, full name,
company name, location, or other criteria.
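For example, something along these lines with tweepy (untested sketch; the keys and search term are placeholders). Note that users/search only matches relevance-style fields like name, bio, and location; attributes such as statuses_count aren't server-side filters, so you'd filter the returned users client-side:
import tweepy

# Placeholder credentials - substitute your own app's keys
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
api = tweepy.API(auth)

# users/search: relevance-based search on name, bio, location, etc.
users = api.search_users("python developer")

# Client-side filtering on attributes the endpoint can't filter by
active_users = [u for u in users if u.statuses_count > 1000]
for u in active_users:
    print(u.screen_name, u.statuses_count)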
I've created an API using Flask-RESTFUL package for Python 3.7.
I'd like to know what the proper approach would be for returning data to a user based on which columns he should have access to.
For example, if I have an "orders" table with (order_id, order_date, price, ebay_name, revenue), but want User A and User B to have access to different data. Let's say that on route /get_data, I return all fields, but User A should have access to all data, while User B only can see the revenue field.
My current approach:
While building the JWT token when a user logs in and authenticates, would it be acceptable to store the column names of the "orders" table in the actual token? Then, when the user goes to the /get_data route, I would basically check the column names stored in the JWT and build the MySQL query with the column names found in the token (select all_columns_in_jwt from orders). I worry that exposing the table columns in the JWT token is not the best approach.
Another idea would be to check within a user permissions table each time the /get_data route is hit.
Does anyone have a suggestion for implementing this in a more efficient way?
I would do it using user permissions, but you could also create separate stored procedures, for each individual user or for groups of users based on their roles/permissions, that create or replace specific views of the tables, letting each user see only what you want them to see. For example, upon any change to the table (or, alternatively, upon login by a user or a member of a user group) you could run the procedure to generate (or replace, if it already exists) a view on the table for each user or user group whose access you want to restrict. Your API would then select data from the view rather than directly from the table.
With user permissions (this is very similar to how I implement it in my own apps), you could create a permissions table and then a table of the permissions each user possesses. Then, on each API call, you would query the user's permissions and, using a map stored somewhere in your code base, look up the columns for the combination of the relevant permission and the relevant table. The resulting column set is what you select in your query, as sketched below.
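A rough sketch of that permissions-map approach (the permission names and the structure of the map are hypothetical; the point is that every column name comes from a whitelist in your code, never from the client or the JWT):
# Hypothetical map from (permission, table) to the columns that
# permission is allowed to select. Kept in the code base, not in the JWT.
PERMISSION_COLUMNS = {
    ("orders_full", "orders"): ["order_id", "order_date", "price", "ebay_name", "revenue"],
    ("orders_revenue_only", "orders"): ["revenue"],
}

def build_orders_query(user_permissions):
    """Collect every column the user's permissions grant on 'orders'."""
    columns = []
    for perm in user_permissions:
        for col in PERMISSION_COLUMNS.get((perm, "orders"), []):
            if col not in columns:
                columns.append(col)
    if not columns:
        return None  # no access at all
    # Safe to interpolate: every name comes from our own whitelist above.
    return "SELECT %s FROM orders" % ", ".join(columns)

# e.g. User B, whose row in the permissions table grants only revenue access:
query = build_orders_query(["orders_revenue_only"])
# -> "SELECT revenue FROM orders"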
I'm using Google App Engine (python) for the backend of a mobile social game. The game uses Twitter integration to allow people to follow relative leaderboards and play against their friends or followers.
By far the most expensive piece of the puzzle is the background (push) task that hits the Twitter API to query for the friends and followers of a given user, and then stores that data within our datastore. I'm trying to optimize that to reduce costs as much as possible.
The Data Model:
There are three main models related to this portion of the app:
User
'''General user info, like scores and stats'''
# key id => randomly generated string that uniquely identifies a user
# along the lines of user_kdsgj326
# (I realize I probably should have just used the integer ID that GAE
# creates, but it's too late for that)
AuthAccount
'''Authentication mechanism.
A user may have multiple auth accounts- one for each provider'''
# key id => concatenation of the auth provider and the auth provider's unique
# ID for that user, ie, "tw:555555", where '555555' is their twitter ID
auth_id = ndb.StringProperty(indexed=True) # ie, '555555'
user = ndb.KeyProperty(kind=User, indexed=True)
extra_data = ndb.JsonProperty(indexed=False) # twitter picture url, name, etc.
RelativeUserScore
'''Denormalization for quickly generated relative leaderboards'''
# key id => same as their User id, ie, user_kdsgj326, so that we can quickly
# retrieve the object for each user
follower_ids = ndb.StringProperty(indexed=True, repeated=True)
# misc properties for the user's score, name, etc. needed for leaderboard
I don't think it's necessary for this question, but just in case, here is a more detailed discussion that led to this design.
The Task
The background thread receives the twitter authentication data and requests a chunk of friend IDs from the Twitter API, via tweepy. Twitter sends up to 5000 friend IDs by default, and I'd rather not arbitrarily limit that more if I can avoid it (you can only make so many requests to their API per minute).
Once I get the list of friend IDs, I can easily translate them into "tw:" AuthAccount key IDs and use get_multi to retrieve the AuthAccounts. Then I remove the None results (twitter users who aren't in our system) and collect the user IDs of the twitter friends who are. Those IDs are also the keys of the RelativeUserScores, so I use a bunch of transactional_tasklets to add this user's ID to each RelativeUserScore's follower_ids list.
The Optimization Questions
The first thing that happens is a call to Twitter's API. Given that this is required for everything else in the task, I'm assuming I would not get any gains in making this asynchronous, correct? (GAE is already smart enough to use the server for handling other tasks while this one blocks?)
When determining if a twitter friend is playing our game, I currently convert all twitter friend ids to auth account IDs, and retrieve by get_multi. Given that this data is sparse (most twitter friends will most likely not be playing our game), would I be better off with a projection query that just retrieves the user ID directly? Something like...
twitter_friend_ids = twitter_api.friend_ids()  # potentially 5000 values
# auth_id is a StringProperty, so convert the integer IDs to strings
friend_system_ids = AuthAccount \
    .query(AuthAccount.auth_id.IN([str(i) for i in twitter_friend_ids])) \
    .fetch(projection=[AuthAccount.user])
(I can't remember or find where, but I read this is better because you don't waste time attempting to read model objects that don't exist.)
Whether I end up using get_multi or a projection query, is there any benefit to breaking up the request into multiple async queries, instead of trying to get / query for potentially 5000 objects at once?
I would organize the task like this:
Make an asynchronous fetch call to the Twitter feed
Use memcache to hold all the AuthAccount->User data:
Request the data from memcache; if it doesn't exist, make a fetch_async() call on the AuthAccount query to populate memcache and a local dict
Run each of the twitter IDs through the dict
Here is some sample code:
from google.appengine.api import memcache

future = twitter_api.friend_ids()  # make this asynchronous (pseudocode)

auth_users = memcache.get('auth_users')
if auth_users is None:
    # Project just the two properties needed for an auth_id -> user-ID map
    auth_accounts = AuthAccount.query() \
        .fetch(projection=[AuthAccount.auth_id, AuthAccount.user])
    auth_users = dict((a.auth_id, a.user.id()) for a in auth_accounts)
    memcache.add('auth_users', auth_users, 60)

twitter_friend_ids = future.get_result()  # get async twitter results

friend_system_ids = []
for twitter_id in twitter_friend_ids:
    # auth_id is stored without the "tw:" prefix (that only appears in the key)
    friend_id = auth_users.get(str(twitter_id))
    if friend_id:
        friend_system_ids.append(friend_id)
This is optimized for a relatively small number of users and a high rate of requests. Your comments above indicate a larger number of users and a lower rate of requests, so I would only make this change to your code:
twitter_friend_ids = twitter_api.friend_ids()  # potentially 5000 values
auth_account_keys = [ndb.Key("AuthAccount", "tw:%s" % twitter_id)
                     for twitter_id in twitter_friend_ids]
# get_multi returns None for keys with no entity; drop those, then pull
# out the User key IDs (which double as RelativeUserScore key IDs)
friend_accounts = [a for a in ndb.get_multi(auth_account_keys) if a is not None]
friend_system_ids = [a.user.id() for a in friend_accounts]
This will use ndb's built-in memcache to hold data when using get_multi() with keys.
I have a large number of users (over 400k) who have been sent a survey to complete. As part of logging into my site, I'm using the SurveyMonkey API to check whether they completed their assigned survey. I'm keying on email address. I'm thinking of using:
https://developer.surveymonkey.com/mashery/get_respondent_list
however, I don't want to page through all 400k users to find a specific email. Is there any way to do this search more efficiently?
I'm using a Django backend to ping the SurveyMonkey API.
get_respondent_list allows you to search for respondents by modified date/time range. For 400K respondents, you should store the results in a local database and only query the API when the email address you're looking for isn't found locally.
To avoid having to parse the whole list every time, you should only fetch the respondents added since the last time you checked, using that date/time range feature, and add the new respondents to your DB. There is some example code which illustrates polling for new respondents based on date/time range on SurveyMonkey's public GitHub here:
https://github.com/SurveyMonkey/python_guides/blob/master/guides/polling.py
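A rough sketch of that sync step, loosely based on the polling guide above (the endpoint URL, the start_modified_date parameter format, and the save_respondents helper are assumptions to adapt to your setup):
import requests

API_BASE = "https://api.surveymonkey.net/v2/surveys"

def sync_new_respondents(api_key, access_token, survey_id, last_checked):
    """Fetch only respondents modified since last_checked and store them locally."""
    response = requests.post(
        "%s/get_respondent_list?api_key=%s" % (API_BASE, api_key),
        headers={"Authorization": "Bearer %s" % access_token,
                 "Content-Type": "application/json"},
        json={"survey_id": survey_id,
              # assumed parameter per the date/time range feature above
              "start_modified_date": last_checked},
    )
    respondents = response.json()["data"]["respondents"]
    save_respondents(respondents)  # hypothetical: write into your local DB
At login you would then look the email up in your own table first, and only fall back to a fresh sync when it's missing.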
I had read about an app called citycounds.fm, which is no longer active, where they made city-based playlists. Unfortunately, I can't seem to find any way to search for tracks by city in the soundcloud api documentation.
Anyone know if this is possible?
You can't filter tracks by city. The city is actually stored with the user. So you would have to search for the tracks you want, then perform an additional step to check if the user for each of the tracks is from the city you want.
I wanted to do something similar, but too many users do not have their city saved in their profile so the results are very limited.
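If you want to try it anyway, something along these lines with the soundcloud Python client should work (the client_id, search term, and city are placeholders):
import soundcloud

client = soundcloud.Client(client_id="YOUR_CLIENT_ID")  # placeholder

# Step 1: search for tracks normally
tracks = client.get("/tracks", q="house", limit=50)

# Step 2: look up each track's user and filter on their city
for track in tracks:
    user = client.get("/users/%d" % track.user_id)
    if user.city and user.city.lower() == "berlin":  # placeholder city
        print(track.title, "-", user.username)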
I'm wondering if someone could help guide the approach to this fairly common problem:
I'm building a simple site where a user connects their twitter account to sign up. I'd like to create an interface which shows them which of their twitter friends are already using the site.
So I can get a list of the user's twitter friends and a list of the site's users (which all have the twitter screen name as username), but I'm wondering about the most efficient method to compare these lists and create a variable with the commonalities.
As an aside, given the Twitter API returns IDs, should I save the twitter user's ID (in addition to their username) when they create an account?
Thanks in advance!
Create Sets out of them, and use the intersection method:
intersection_set = set(list_a).intersection(list_b)
You should store the twitter user's ID, because the username can change at any time while the ID will always be the same. You should be comparing the IDs, not the usernames, in the intersection_set that Ofri recommends.
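Putting both answers together, something like this (the two ID lists are placeholders for what you'd pull from the Twitter API and your own user table):
# IDs of the user's friends, as returned by the Twitter API
friend_ids = [5555551, 5555552, 5555553]
# Twitter IDs you stored for your site's registered users
site_user_ids = [5555552, 9999999]

# Friends who are already on the site, matched by stable numeric ID
mutual_ids = set(friend_ids).intersection(site_user_ids)
print(mutual_ids)  # {5555552}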