I have a game in which users contact a server to find another user of their level who wants to play. Here is the basic architecture of a game request.
I am using ndb to store a waiting queue for each user level in the Google DataStore.
I am accessing these queues by their keys to ensure strong consistency (per this article). The entities are stored in the queue using a repeated (list of) LocalStructuredProperty.
Questions:
An entity is deleted from a waiting queue because it is matched to a request. The transaction is committed but not yet applied. That same entity is matched with another request and deleted. Will this throw an error?
These strongly consistent accesses are limited to ~1 write/sec. Is there a better architecture that would eliminate this constraint?
One thing I've considered for the latter question is to maintain multiple queues (whose number grows and shrinks with demand).
Not sure about your first question, but you might be able to simulate it with a sleep statement in your transaction.
For your second question, there is another architecture that you could use. If the waiting queue duration is relatively short (minutes instead of hours), you might want to use memcache. It will be a lot faster than writing to disk and you can avoid dealing with consistency issues.
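For illustration, here is a minimal sketch of a memcache-backed waiting queue using compare-and-set, assuming the App Engine memcache client and a hypothetical "queue:<level>" key scheme (none of these names come from the question):

from google.appengine.api import memcache

client = memcache.Client()

def find_match(level, user_id, retries=5):
    """Either returns a waiting opponent or enqueues user_id and returns None."""
    key = "queue:%d" % level  # hypothetical key scheme
    for _ in range(retries):
        waiting = client.gets(key)
        if waiting is None:
            client.add(key, [])               # initialize the queue if missing
            continue
        if waiting:
            opponent = waiting[0]
            if client.cas(key, waiting[1:]):  # pop an opponent atomically
                return opponent
        else:
            if client.cas(key, [user_id]):    # join the queue atomically
                return None
    return None

Since memcache entries can be evicted at any time, this only works if occasionally losing a waiting entry is acceptable (for example, if clients retry).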
1. If you do the entity get and the put inside a transaction, then the same entity cannot be matched to another game, so there is no error and it remains consistent.
2. The ~1 write per second is the limit for transactions inside the same entity group. If you need more, you can shard the queue entity.
You can also use a dedicated memcache or a Redis instance to avoid contention. These are much faster than the datastore.
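To get past the per-entity-group write limit from point 2, here is a rough sketch of sharding the waiting queue; the shard count, model, and property names are assumptions, not something from the question:

import random
from google.appengine.ext import ndb

NUM_SHARDS = 10  # grow/shrink this with demand

class WaitingPlayer(ndb.Model):
    user_id = ndb.StringProperty()

class WaitingQueueShard(ndb.Model):
    waiting = ndb.LocalStructuredProperty(WaitingPlayer, repeated=True)

def enqueue_player(level, player):
    # Each write goes to a random shard, so no single entity group
    # has to absorb more than ~1 write/sec.
    shard_id = "level-%d-shard-%d" % (level, random.randint(0, NUM_SHARDS - 1))
    key = ndb.Key(WaitingQueueShard, shard_id)

    @ndb.transactional
    def txn():
        shard = key.get() or WaitingQueueShard(key=key)
        shard.waiting.append(player)
        shard.put()

    txn()

Matching then means checking shards in random order until a waiting player is found, trading one hot entity group for a few extra strongly consistent reads.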
See how these guys use tree nodes to do the match making:
https://www.youtube.com/watch?v=9nWyWwY2Onc
In my company, we have an ingestion service written in Go whose job is to take messages from an HTTP endpoint and store them in Postgres. It receives a peak throughput of 50,000 messages/second. However, our database can handle a maximum of 30,000 messages/second.
Is it possible to write a middleware in Python to optimize this? If so, please explain.
It seems to be pretty unrelated to Python or any particular programming language.
These are the typical questions to ask and the usual answers:
Are there duplicates? If yes, don't save every message immediately; instead, wait and deduplicate (this requires some kind of in-memory cache, the simplest being a hashtable, possibly thread-safe depending on your concurrency).
Batch your messages into large enough packs and then dump them into PostgreSQL all at once (see the sketch after this list). You have to determine what "large enough" means based on load tests.
Can you drop some of those messages? If your data is not of critical importance, or at least not all of it, then you may detect overload by tracking number of pending messages and start to throw incoming stuff away until load becomes acceptable.
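For the batching point above, here is a minimal sketch assuming psycopg2 and a hypothetical messages(payload) table; the batch size is a placeholder you would tune with load tests:

import threading

import psycopg2
from psycopg2.extras import execute_values

BATCH_SIZE = 1000                         # tune via load tests
conn = psycopg2.connect("dbname=ingest")  # hypothetical DSN
buffer = []
lock = threading.Lock()

def enqueue(message):
    """Called by the HTTP handler; flushes only when a full batch is buffered."""
    with lock:
        buffer.append((message,))
        if len(buffer) >= BATCH_SIZE:
            flush_locked()

def flush_locked():
    """Dump all buffered messages into PostgreSQL in a single round trip."""
    global buffer
    if not buffer:
        return
    with conn.cursor() as cur:
        execute_values(cur, "INSERT INTO messages (payload) VALUES %s", buffer)
    conn.commit()
    buffer = []

In practice you would also flush on a timer so a partially filled batch doesn't sit around when traffic drops.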
Here is the problem I was working on this week, and I am kind of hitting a wall here.
Let's say I have 100 resources available to do some quick task.
What I want to do is for the client, as fast as possible:
fetch the first available resource
mark it as occupied
use it
mark it as free.
For this kind of thing, I think the use of a sorted set is the best.
But because my client is not very reliable and can sometimes fail in the middle of the code it runs, I really want to set an expiration time when I mark a resource as occupied, so resources can't stay in the occupied state forever.
It sounds like a very common problem and I'm sure there is a lot of literature out there on how to fix it with Redis but I could not find any.
I found many patterns and example for "Maintaining a global leaderboard" kind of problem, but none of those examples dealt with key expiration.
I currently have a solution like this:
import redis

r = redis.Redis()

for resource in resources:
    # 0 means free, 1 means occupied; the 10-second TTL guards against crashed clients
    if r.get(resource) == b"0":
        r.set(resource, 1, ex=10)
        use_resource(resource)
        r.set(resource, 0, ex=10)
    else:
        continue
The thing is, as soon as lots of resources are in use, it can take many operations to find the first free one, and although Redis is really fast, this snippet does not scale well.
Off the top of my head:
Maintain a set of free resources
Maintain a set of used resources
Set up a keyspace listener on the expired event notification
When a resource is needed, randomly select one with SRANDMEMBER and move it to the in-use resources set with SMOVE. In this same transaction, set up a simple expire key with a good prefix, the name/type of the resource, and required TTL with SETEX.
Set up a redis keyspace notification consumer (still new, but check out their newest tech Redis Gears for a super simplified version of this!) that listens to the expired events for your assigned prefix. When one of these events occurs, run the same SMOVE logic above but just move the resource back into the free resources set.
Regarding the actual resources themselves, when they finish, have them self-expire their tracking keys and the notification consumer can handle the state refreshing :)
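Here is a rough sketch of the acquire/release side of that pattern, assuming redis-py; the set names and the "lock:" prefix are made up for illustration:

import redis

r = redis.Redis()

FREE_SET = "resources:free"   # hypothetical set names
USED_SET = "resources:used"
TTL_SECONDS = 10

def acquire_resource():
    """Randomly pick a free resource, move it to the in-use set, and set a TTL key."""
    resource = r.srandmember(FREE_SET)
    if resource is None:
        return None                      # nothing free right now
    pipe = r.pipeline(transaction=True)
    pipe.smove(FREE_SET, USED_SET, resource)
    pipe.setex(b"lock:" + resource, TTL_SECONDS, 1)
    moved, _ = pipe.execute()
    return resource if moved else None   # another client may have grabbed it first

def release_resource(resource):
    """Called on success, or by the expired-event consumer on timeout."""
    pipe = r.pipeline(transaction=True)
    pipe.smove(USED_SET, FREE_SET, resource)
    pipe.delete(b"lock:" + resource)
    pipe.execute()

The keyspace-notification consumer just calls release_resource() with the resource name parsed out of the expired lock key.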
This should give you the flexibility you need!
Similar question here, and some answers may be of use: How to "EXPIRE" the "HSET" child key in redis?
I am writing an application that uses a remote API that serves up fairly static data (but can still update several times a day). The problem is that the API is quite slow, and I'd much rather import that data into my own datastore anyway, so that I can actually query the data on my end as well.
The problem is that the results contain ~700 records that need to be sync'd every 5 hours or so. This involves adding new records, updating old records and deleting stale ones.
I have a simple solution that works -- but it's slow as molasses, and uses 30,000 datastore read operations before it times out (after about 500 records).
The worst part about this is that the 700 records are for a single client, and I was doing it as a test. In reality, I would want to do the same thing for hundreds or thousands of clients with a similar number of records... you can see how that is not going to scale.
Here is my entity class definition:
class Group(ndb.Model):
    groupid = ndb.StringProperty(required=True)
    name = ndb.StringProperty(required=True)
    date_created = ndb.DateTimeProperty(required=True, auto_now_add=True)
    last_updated = ndb.DateTimeProperty(required=True, auto_now=True)
Here is my sync code (Python):
currentTime = datetime.now()
groups = get_list_of_groups_from_api(clientid)  # [{'name':'Group Name','id':'12341235'}, ...]

for group in groups:
    groupid = group["id"]
    groupObj = Group.get_or_insert(groupid, groupid=group["id"], name=group["name"])
    groupObj.put()

staleGroups = Group.query(Group.last_updated < currentTime)
for staleGroup in staleGroups:
    staleGroup.key.delete()
I can't tell you why you are getting 30,000 read operations.
You should start by running appstats and profiling this code, to see where the datastore operations are being performed.
That being said, I can see some real inefficiencies in your code.
For instance your delete stale groups code is horribly inefficient.
You should be doing a keys_only query, and then doing batch deletes.
What you are doing is really slow with lots of latency for each delete() in the loop.
Also, get_or_insert uses a transaction (and if the group didn't exist, a put has already been done, so your second put() is redundant); if you don't need transactions, you will find things run faster. The fact that you are not storing any additional data means you could just blind-write the groups (no initial get/read), unless you want to preserve date_created.
Other ways of making this faster would be doing batch gets/puts on the list of keys, then doing a single batch put() for all the entities that didn't exist.
Again this would be much faster than iterating over each key.
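A minimal sketch of those batched writes and the keys_only delete, reusing the names from the question (note that blind-writing resets date_created, per the caveat above):

from datetime import datetime
from google.appengine.ext import ndb

currentTime = datetime.now()
groups = get_list_of_groups_from_api(clientid)

# One batch put instead of get_or_insert + put per group.
ndb.put_multi([
    Group(id=g["id"], groupid=g["id"], name=g["name"])
    for g in groups
])

# keys_only query plus a single batch delete for the stale groups.
stale_keys = Group.query(Group.last_updated < currentTime).fetch(keys_only=True)
ndb.delete_multi(stale_keys)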
In addition, you should use a TaskQueue to run this code; you then have a 10-minute processing window.
After that further scaling can be achieved by splitting the process into two tasks. The first creates/updates the group entities. Once that completes you start the task that deletes stale groups - passing the datetime as an argument to the next task.
If you have even more entities than can be processed in this simple model then start looking at MapReduce.
But for starters just concentrate on making the job you are currently running more efficient.
I often have models that are a local copy of some remote resource, which needs to be periodically kept in sync.
from google.appengine.api import taskqueue

taskqueue.Task(
    url="/keep_in_sync",
    params={'entity_id': entity_id},
    name="sync-%s" % entity_id,
    countdown=3600
).add()
Inside keep_in_sync any changes are saved to the model and a new task is scheduled to happen again later.
Now, while superficially this seems like a nice solution, in practice you might become worried about whether all the necessary tasks have really been added. Maybe you have entities representing the level of food pellets inside your hamster cages, so that an automated email can be sent to your housekeeper to feed them. But then a few weeks later, when you come back from your holiday, you find several of your hamsters starving.
It then starts seeming like a good idea to make a script that goes through each entity and makes sure that the proper task really is in the queue for it. But neither Task nor Queue classes have any method for checking if a task exists or not.
Can you save the hamsters and come up with a nicer way to make sure that a method really is being periodically called for each entity?
Update
It seems that if you want to be really sure that tasks are scheduled, you need to keep track of your own tasks, as Nick Johnson suggests. I'm not ready to let go of the convenient task queue, so for the time being I will just tolerate the uncertainty of being unable to check whether tasks are really scheduled.
Instead of enqueueing a task per entity, handle multiple entities in a single task. This can be triggered by a daily cron job, for instance, which fans out to multiple tasks. As well as ensuring you execute your code for each entity, you can also take advantage of asynchronous URLFetch to synchronize with the external resource more efficiently, and batch puts and gets from the datastore to make the updates more efficient.
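A rough sketch of that fan-out, assuming ndb and hypothetical names (MyEntity, /keep_in_sync_batch, the batch size); the cron job would invoke a handler that runs something like this:

from google.appengine.api import taskqueue
from google.appengine.ext import ndb

BATCH_SIZE = 100  # hypothetical batch size

def fan_out():
    """Enqueue one worker task per batch of entity keys."""
    cursor = None
    more = True
    while more:
        keys, cursor, more = MyEntity.query().fetch_page(
            BATCH_SIZE, keys_only=True, start_cursor=cursor)
        if keys:
            taskqueue.add(
                url="/keep_in_sync_batch",   # hypothetical worker handler
                params={"keys": ",".join(k.urlsafe() for k in keys)})

The /keep_in_sync_batch worker can then issue asynchronous URLFetch calls for its slice of entities and write the results back with a batch put.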
You'll get an exception (TaskAlreadyExistsError) if such a task is already in the queue (same task name). So don't worry, just add all of them to the queue, and remember to catch the exception.
You can find full list of exceptions here: http://code.google.com/intl/en/appengine/docs/python/taskqueue/exceptions.html
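For example (a sketch; TombstonedTaskError covers the case where a task with that name ran recently):

from google.appengine.api import taskqueue

def schedule_sync(entity_id):
    try:
        taskqueue.Task(
            url="/keep_in_sync",
            params={'entity_id': entity_id},
            name="sync-%s" % entity_id,
            countdown=3600,
        ).add()
    except (taskqueue.TaskAlreadyExistsError, taskqueue.TombstonedTaskError):
        # A task with this name is already queued (or was recently executed).
        pass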
I'm developing software using the Google App Engine.
I have some considerations about the optimal design regarding the following issue: I need to create and save snapshots of some entities at regular intervals.
In the conventional relational db world, I would create db jobs which would insert new summary records.
For example, a job would insert a record for every active user that would contain his current score to the "userrank" table, say, every hour.
I'd like to know the best method to achieve this in Google App Engine. I know that there is the Cron service, but does it allow executing jobs which insert/update thousands of records?
I think you'll find that snapshotting every user's state every hour isn't something that will scale well no matter what your framework. A more ordinary environment will disguise this by letting you have longer running tasks, but you'll still reach the point where it's not practical to take a snapshot of every user's data, every hour.
My suggestion would be this: add a 'last snapshot' field, and override the put() method of your model (assuming you're using Python; the same is possible in Java, but I don't know the syntax), so that whenever you update a record, it checks whether it's been more than an hour since the last snapshot, and if so, creates and writes a snapshot record.
In order to prevent concurrent updates creating two identical snapshots, you'll want to give the snapshots a key name derived from the time at which the snapshot was taken. That way, if two concurrent updates try to write a snapshot, one will harmlessly overwrite the other.
To get the snapshot for a given hour, simply query for the oldest snapshot newer than the requested period. As an added bonus, since inactive records aren't snapshotted, you're saving a lot of space, too.
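A rough sketch of that idea, assuming ndb and hypothetical model names (the answer does not prescribe these); the snapshot key name encodes the hour so that concurrent writers overwrite each other harmlessly:

from datetime import datetime, timedelta
from google.appengine.ext import ndb

class ScoreSnapshot(ndb.Model):
    user = ndb.KeyProperty()
    score = ndb.IntegerProperty()
    taken = ndb.DateTimeProperty()

class UserScore(ndb.Model):
    score = ndb.IntegerProperty(default=0)
    last_snapshot = ndb.DateTimeProperty()

    def put(self, **kwargs):
        # Assumes the entity is keyed (e.g. by user id) before it is put.
        now = datetime.utcnow()
        if not self.last_snapshot or now - self.last_snapshot > timedelta(hours=1):
            # Key name derived from the hour, so two concurrent updates
            # write the same snapshot key instead of creating duplicates.
            snap_id = "%s-%s" % (self.key.id(), now.strftime("%Y%m%d%H"))
            ScoreSnapshot(id=snap_id, user=self.key, score=self.score,
                          taken=now).put()
            self.last_snapshot = now
        return super(UserScore, self).put(**kwargs)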
Have you considered using the remote api instead? This way you could get a shell to your datastore and avoid the timeouts. The Mapper class they demonstrate in that link is quite useful and I've used it successfully to do batch operations on ~1500 objects.
That said, cron should work fine too. You do have a limit on the time of each individual request so you can't just chew through them all at once, but you can use redirection to loop over as many users as you want, processing one user at a time. There should be an example of this in the docs somewhere if you need help with this approach.
I would use a combination of Cron jobs and a looping url fetch method detailed here: http://stage.vambenepe.com/archives/549. In this way you can catch your timeouts and begin another request.
To summarize the article, the cron job calls your initial process; you catch the timeout error and call the process again, masked as a second URL. You have to ping between two URLs to keep App Engine from thinking you are in an accidental loop. You also need to be careful that you do not loop infinitely: make sure there is an end state for your updating loop, since it would put you over your quotas pretty quickly if it never ended.
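A hedged sketch of that ping-pong pattern, with hypothetical handler paths and a hypothetical process_next_batch() helper:

from google.appengine.ext import webapp
from google.appengine.runtime import DeadlineExceededError

class UpdateHandler(webapp.RequestHandler):
    """Serves both /update_a and /update_b; alternating paths keeps App Engine
    from treating the redirects as an accidental loop."""

    def get(self):
        cursor = self.request.get("cursor") or None
        try:
            cursor, done = process_next_batch(cursor)  # hypothetical helper
            if done:
                return  # explicit end state so the loop terminates
        except DeadlineExceededError:
            pass  # hand the remaining work to the other URL below
        other = "/update_b" if self.request.path == "/update_a" else "/update_a"
        self.redirect("%s?cursor=%s" % (other, cursor or ""))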