I'm a real coding n00b so apologies if this is a simple or basic question.
I'm coding in Python, using webapp on App Engine.
My question: is it possible to carry on working after I've written out the page? And is that the best way to do this kind of thing? Basically, when someone creates a list on my site (www.7bks.com) I want to carry on working for a little bit to do some 'post-processing' on the books they've just chosen.
Currently I have something like this (pseudocode!)
class InputList(webapp.RequestHandler):
    def post(self):
        # get the books data from the POST and put it in the datastore
        book_list = List(books=self.request.get('books'))
        book_list.put()
        # redirect the user to the new list
        self.redirect("http://www.7bks.com/list/%s" % book_list.key().id())
Now, I have some slow post-processing (involving API calls) that I want to do on each book in the list. I don't want to slow down redirecting the user and generating the list page, because the post-processing doesn't directly affect the list page. So could I do this?
class InputList(webapp.RequestHandler):
    def post(self):
        # get the books data from the POST and put it in the datastore
        book_list = List(books=self.request.get('books'))
        book_list.put()
        # redirect the user to the new list
        self.redirect("http://www.7bks.com/list/%s" % book_list.key().id())
        # carry on working behind the scenes, independently of the user
        for book in book_list.books:
            data = heavyprocessing(book)
Would that cause my application to effectively serve the redirect and then carry on working behind the scenes?
Are there better ways of doing this? I'm aware I could use cron, but I'd like this heavily processed data fairly soon after I create the list, though not immediately. It feels like cron might not be the right answer (unless I have a cron script run every minute or so and check for new books to process?).
I know my question isn't really about async, but I couldn't think of a better way to phrase it. I'm sure there's some standard terminology for this kind of thing, but I don't know what it is. Thanks :)
Tom
Check out the Task Queue API.
Task or message queues tend to be the way to go about doing some sort of work that is initiated by a user request, but not necessarily completed within the time frame of that request's execution.
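For example, on App Engine you could hand the slow loop to the deferred library, which is a thin wrapper over the Task Queue API. A minimal sketch, assuming the List model and heavyprocessing function from the question:

from google.appengine.ext import db, deferred, webapp

def process_books(list_key):
    # runs on the task queue, in a separate request from the user's
    book_list = db.get(list_key)
    for book in book_list.books:
        heavyprocessing(book)  # the slow, API-calling work

class InputList(webapp.RequestHandler):
    def post(self):
        book_list = List(books=self.request.get('books'))
        book_list.put()
        # enqueue the slow work, then redirect immediately
        deferred.defer(process_books, book_list.key())
        self.redirect("http://www.7bks.com/list/%s" % book_list.key().id())

The redirect is served as soon as the task is enqueued; the task itself runs (and is automatically retried on failure) independently of the user's request.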
I'd like to make a database of all the games I play, with their developers/publishers/platforms/etc., and I'm sure the RAWG API is the way to do that.
I'm experienced with Python, but I've never used an API before. Here is the code I used from the quickstart guide:
import rawgpy

rawg = rawgpy.RAWG("User-Agent, this should identify your app")
results = rawg.search("Warframe")  # defaults to returning the top 5 results
game = results[0]
game.populate()  # get additional info for the game
print(game.name)
print(game.description)
for store in game.stores:
    print(store.url)

rawg.login("someemail@example.com", "somepassword")
me = rawg.current_user()
print(me.name)  # print my name, equivalent to print(self.username)
me.populate()  # gets additional info for the user
for game in me.playing:
    print(game.name)  # prints all the games I'm currently playing
However, I don't know what to use as my user agent in the second line. Any help would be much appreciated.
Here is the link to the quickstart guide
It's a bit tricky to tell from their documentation, but typically this just means you need to put in a user agent that shows the API that you're an app calling their API. Lots of APIs block default user agents to prevent spam and abuse, hence the requirement.
So you could literally put:
rawg = rawgpy.RAWG("My first app")
However, it's better practice to use something unique and descriptive that identifies your app. For your use case this could be "game-database-app-01", as in the line below.
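That is, just the quickstart line with a more descriptive string (the string itself is arbitrary):

rawg = rawgpy.RAWG("game-database-app-01")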
There probably aren't any syntax requirements on what you can put in, but I wouldn't be surprised if they only accept alphanumeric entries.
It's probably a good idea to always call the API with the same app name, to avoid throwing any errors on their end.
I hope this answers your question.
I have about 1000 user account entities like this:
class UserAccount(ndb.Model):
    email = ndb.StringProperty()
Some of these email values contain uppercase letters, like JohnathanDough@email.com. I want to select the email value from every UserAccount entity and apply Python's .lower() to it. How can I do this efficiently and, most importantly, without errors?
Note: The email values are important for login, so I cannot afford to mess this up. Is there a way to backup this data in case of the event that I do make a mistake?
Thank you.
Yes, of course. Even though Datastore Administration is an experimental feature, you can back up and restore data without writing any code. Follow these instructions for the backup flow: Backing up data.
To process your data, the most efficient way is to use the MapReduce library.
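For instance, the mapper could be as small as this (a sketch in the style of the MapReduce library's examples, assuming the UserAccount model above; check the library docs for the exact ndb setup):

from mapreduce import operation as op

def lower_email(entity):
    # called by the mapper once per UserAccount entity
    entity.email = entity.email.lower()
    yield op.db.Put(entity)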
MapReduce works, but it's an excessive complication if you've never used it before.
Use task queues instead: each task handles one page of query results, stores the cursor for the next page, and enqueues another task for that page.
This is slower than MapReduce if you run the tasks sequentially, but 1000 entities is not much; it will probably be done in about a minute.
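A sketch of that pattern using ndb query cursors with the deferred library (the batch size and function name are my own choices):

from google.appengine.ext import deferred, ndb

BATCH_SIZE = 100

def lower_emails(cursor=None):
    accounts, next_cursor, more = UserAccount.query().fetch_page(
        BATCH_SIZE, start_cursor=cursor)
    for account in accounts:
        account.email = account.email.lower()
    ndb.put_multi(accounts)  # one batched RPC per page
    if more:
        # hand the next page to a fresh task
        deferred.defer(lower_emails, next_cursor)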
Right now I think I'm stuck between two main choices for grabbing a user's friends list.
The first is a direct connection with Facebook, pulling the friends list out and creating a list of friend models from the JSON. (This takes quite a while whenever I try it out, around 2 seconds.)
The other is that whenever a user logs in, the program stores his or her entire friends list inside a big friends model (note that even if two people have exactly the same friends, two sets will still be stored; every friend model has an FK back to the person whose list it belongs to).
Whenever a user needs his or her friends list, I just use django's filter to grab them.
Right now this is pretty fast but that's because it hasn't been tested with many people yet.
Based on your experience, which of these two approaches would make the most sense long term?
Thank you
It depends a lot on what you plan on doing with the data. However, thinking long term you're going to have much more flexibility with breaking out the friends into distinct units than just storing them all together.
If the friend creation process is taking too long, you should consider off-loading it to a separate process that can finish it in the background, using something like Celery.
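A minimal sketch of that with Celery (the task body, the fetch_facebook_friends helper, and the Friend fields are my own placeholders, not your actual code):

from celery import shared_task

@shared_task
def import_friends(user_id, access_token):
    # runs in a Celery worker, outside the request/response cycle
    for data in fetch_facebook_friends(access_token):  # your Facebook call
        Friend.objects.get_or_create(
            owner_id=user_id,
            fb_id=data["id"],
            defaults={"name": data["name"]},
        )

In the login view you would then fire and forget with import_friends.delay(request.user.id, token), so the user never waits on Facebook.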
I'm sure a lot of services online today must perform a task similar to what I'm doing. A user has friends, and I want to get all status updates of all the user's friends posted after each friend's last status update date.
That was a mouthful, but here's what I have:
A user has, say, 10 friends. What I want to do is get new status updates for all of his friends. So, I prepare a dictionary for each friend with their last status date. Something like:
friends = []
for friend in user.friends:
    friends.append({'userId': friend.id,
                    'lastDate': friend.mostRecentStatusUpdate.date})
Then, on my server side, I do something like this:
for f in friends:
    user_id = f['userId']
    last_date = f['lastDate']
    # each query below launches its own RPC and does a separate table
    # lookup, so with 100 friends this seems extremely inefficient
    updates = StatusUpdate.query(StatusUpdate.userId == user_id,
                                 StatusUpdate.postDate > last_date).fetch()
The problem with the above approach is that on the server side each iteration of the for loop launches a new query, which launches an RPC. So if there are a lot of friends, it would seem to be really inefficient.
Is there a better way to design my structure to make this task more efficient? How does, say, Twitter do something like this when it fetches new timeline updates?
From a high level, I'd suggest you follow the prescribed App Engine mantra: make writes expensive to make reads cheap.
For each friend, you should keep a collection of known friends and their last status updates. This will allow you to update friends at write time. This is expensive for the write, but saves you processing and querying at read. This also assumes that you read more than you write.
Additionally, if you are just trying to display the N latest updates for each friend, I would suggest you use an NDB StructuredProperty to store the Friend objects; this way you can create a matching data structure. As part of the object, keep a collection of keys that correspond to the status updates. When a status update is written, add its key to the collection, and potentially remove older entries (if space is a concern).
This way, when you need to retrieve the updates, you are getting them by key instead of through more expensive queries.
An alternative that avoids any additional lookups is to store the entire update instead of just its key. However, this takes a lot more storage: 10 friends, all interconnected, means 100 versions of the same update.
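A sketch of the keys-only variant in ndb (all model and property names here are my own illustration, not an established schema):

from google.appengine.ext import ndb

class StatusUpdate(ndb.Model):
    author = ndb.KeyProperty()
    text = ndb.StringProperty()
    postDate = ndb.DateTimeProperty(auto_now_add=True)

class Timeline(ndb.Model):  # one per user, keyed by user id
    # keys of recent updates from all friends, appended at write
    # time (the "expensive write" that makes the read cheap)
    update_keys = ndb.KeyProperty(kind=StatusUpdate, repeated=True)

def latest_updates(user_id, n=20):
    # read path: one get for the timeline, one batched get for the updates
    timeline = Timeline.get_by_id(user_id)
    if timeline is None:
        return []
    return ndb.get_multi(timeline.update_keys[-n:])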
I'm working on a not-so-big project in django that will among other things incorporate a forum system.
I have most of the system at a more or less functioning state, but I'm still missing a feature to mark unread threads for the users when there are new posts.
The thing is I can't really think of a way to properly store that information. My first idea was to create another model that will store a list of threads with changes in them for each user. Something with one ForeignKey(User) and one ForeignKey(Thread) and just keep adding new entries each time a thread is posted or a post is added to a thread.
But then, I'm not sure how well that would scale with, say, several hundred threads after a while and maybe 50-200 users. Adding 200 rows for each new post, for the users who aren't logged on? Sounds like a lot.
How do other forum systems do it anyway? And how can I implement a system that works these things out in Django?
Thanks!
You're much better off storing the "read" bit, not the "unread" bit. And you can store it not as relational data, but in a giant bit-blob. Then you don't have to modify the read data at all when new posts are added, only when a user reads posts.
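A rough sketch of that idea in Django, assuming each post has a sequential position within its thread (the model, fields, and methods are my own illustration):

from django.db import models
from django.contrib.auth.models import User

class ThreadReadState(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    thread = models.ForeignKey('Thread', on_delete=models.CASCADE)
    read_bits = models.BinaryField(default=bytes)  # bit i set = post i read

    def mark_read(self, position):
        bits = bytearray(self.read_bits)
        byte_index, bit = divmod(position, 8)
        if byte_index >= len(bits):  # grow the blob as the thread grows
            bits.extend(b"\x00" * (byte_index - len(bits) + 1))
        bits[byte_index] |= 1 << bit
        self.read_bits = bytes(bits)

    def has_read(self, position):
        bits = self.read_bits
        byte_index, bit = divmod(position, 8)
        return byte_index < len(bits) and bool(bits[byte_index] & (1 << bit))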
You might also simply store the last time a user read a particular forum. Any posts that have been updated since that date are new. That way you store only one additional piece of information per user, as opposed to one piece of information per post per user.
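That variant is only a couple of lines in Django (the models and the last_post_at field are illustrative, not from the question's code):

from django.db import models
from django.contrib.auth.models import User

class ForumVisit(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    forum = models.ForeignKey('Forum', on_delete=models.CASCADE)
    last_read = models.DateTimeField(auto_now=True)

# threads with new posts since the user's last visit:
# Thread.objects.filter(forum=forum, last_post_at__gt=visit.last_read)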