Checking username availability - Handling of AJAX requests (Google App Engine) - python

I want to add a 'check username availability' feature to my signup page using AJAX. I have a few doubts about how I should implement it.
1. With which event should I register my AJAX requests? We can send the requests when the user focuses out of the 'username' input field (the blur event) or as he types (the keyup event). Which provides a better user experience?
2. On the server side, a simple way of dealing with the requests would be to query my main 'Accounts' database. But this could lead to a lot of requests hitting my database (even more if we POST on the keyup event). Should I maintain a separate model for registered usernames only and use that to get better results?
3. Is it possible to use Memcache in this case? For example, initializing the cache with every username as a key, updating it as users register, and using a sentinel key to check whether the cache is actually initialized, or else passing the queries directly to the db.

Answers -
Do the check on blur. If you do it on keyup, you will hammer your server with unnecessary queries, annoy the user who is not yet done typing, and likely lag the typing anyway.
If your Account entity is very large, you may want to create a separate AccountName entity, and create a matching AccountName whenever you create a real Account (but this is probably an unnecessary optimization). When you create the Account (or AccountName), be sure to assign id=name. Then you can do an AccountName.get_by_id(name) to quickly see whether the name has already been taken, and NDB will automatically pull it from memcache if it has been dealt with recently.
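A minimal sketch of that lookup, assuming NDB models named Account and AccountName (the names are illustrative):

from google.appengine.ext import ndb

class AccountName(ndb.Model):
    pass  # the key id itself (the username) is the only data needed

def username_taken(name):
    # get_by_id is a fast key lookup, and NDB's built-in caching
    # serves it from memcache when the entity was seen recently.
    return AccountName.get_by_id(name) is not None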
By default, GAE NDB will automatically populate memcache for you when you put or get entities. If you follow my advice in step 2, things will be very fast and you won't have to mess around with pre-populating memcache.
If you are concerned about two people simultaneously requesting the same username, put your create method in a transaction:
@classmethod
@ndb.transactional()
def create_account(cls, name, **other_params):
    # Running transactionally guarantees only one caller can
    # create the entity for a given id.
    acct = cls.get_by_id(name)
    if not acct:
        acct = cls(id=name, **other_params)
        acct.put()
    return acct

I would recommend the blur event of the username field, combined with some sort of inline error/warning display.
I would also suggest maintaining a memcache of registered usernames, to reduce DB hits and improve the user experience - although probably not populating it with a warm-up, but instead only as requests are made. This is sometimes called a "Repository" pattern.
BUT, you should only populate the cache with USED usernames - you should not store the "available" usernames here (or if you do, use a much lower timeout).
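A minimal sketch of that check, assuming the GAE memcache API and an Account model keyed by username (the key prefix is illustrative):

from google.appengine.api import memcache

def is_username_taken(name):
    # Cache only positive ("taken") results; an available name can
    # be registered at any moment, so don't cache it (or use a very
    # short timeout if you do).
    if memcache.get('username:' + name):
        return True
    if Account.get_by_id(name) is not None:
        memcache.set('username:' + name, True)
        return True
    return False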
You should always check directly against the DB/Datastore when actually performing the registration, ideally in some sort of transactional method, so that you don't have race conditions when multiple people register.
BUT, all of this work depends on several things, including how busy your app is and what data storage tech you are using!

Related

microservices and multiple databases

I have written microservices for auth, location, etc.
Each microservice has its own database, and some data, e.g. a user's location, exists in all of these databases. When any of my projects needs a user's location, it first looks in the cache; if it's not found there, it hits the database. So far so good. Now, when a location is changed in any one of the databases, I need to update it in the other databases as well as update my cache.
Currently I have a model (called Subscription) with a URL as its field; whenever a location is changed in any database, a Subscription object is created. A periodic task checks the Subscription model, and when it finds such objects it hits the APIs of the other services, updates the location, and updates the cache.
I am wondering if there is a better way to do this?
"better" is entirely subjective. if it meets your needs, it's fine.
something to consider, though: don't store the same information in more than one place.
if you need an address, look it up from the service that provides address, every time.
this may be a performance hit, but it eliminates the problem of replicating the data everywhere.
another option would be a more proactive approach, as suggested in comments.
instead of creating a task list for changes, and doing that periodically, send a message across rabbitmq immediately when the change happens. let every service that needs to know, get a copy of the message and update it's own cache of info.
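A minimal sketch of publishing such an event with the pika client and a fanout exchange (the exchange name and payload fields are illustrative):

import json

import pika

def publish_location_changed(user_id, new_location):
    # A fanout exchange delivers a copy of the message to every
    # bound queue, so each service can update its own cache.
    conn = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    channel = conn.channel()
    channel.exchange_declare(exchange='location_events',
                             exchange_type='fanout')
    channel.basic_publish(exchange='location_events',
                          routing_key='',
                          body=json.dumps({'user_id': user_id,
                                           'location': new_location}))
    conn.close()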
Just remember, though: every time you have more than one copy of the information, you reduce the "correctness" of the system as a whole. It will always be possible for the information found in one of your apps to be out of date, because it did not get an update from the official source.

Logging a snapshot of online users for Django website (postgresql backend, nginx webserver)

In a Django + PostgreSQL website of mine, I need to publicly show who is online at a point in time (it's a social website). How do I do this? For instance, is there a way to enumerate all logged-in users who hit my nginx webserver in the previous 10 minutes? Something like that could work. I'm a beginner and fishing for a viable solution at the moment.
Currently, to accomplish this, I store sessions in the database, using an external library to make sessions enumerable. This allows me to query how many unique users are online at a point in time.
But this scheme creates a lot of needless DB traffic, and as a result logging and pruning the logs have become ineffective. Moreover, pgFouine shows me that session-related DB calls are the biggest performance bottleneck my website currently has.
There's a proposed solution here, but it uses the database.
Use Django's cache framework to save the result of the DB query in memory. That way you don't need to run the expensive database query on every page render.
from django.core.cache import cache

def count_current_users():
    users = cache.get('users')
    if users is None:
        # last count has timed out
        users = do_expensive_db_query()
        cache.set('users', users, timeout=500)
    return users
https://docs.djangoproject.com/en/1.10/topics/cache/#basic-usage
You can also use template fragment caching and write a custom template tag that only runs the DB query if the cache is stale. The snippet below caches the result for 500 seconds.
{% cache 500 logged_in_users %}
{% expensive_query_db_for_logged_in_users %}
{% endcache %}
If you want your user count to be more real-time, you probably have to bypass Django's cache framework and communicate directly with Redis.
Store each logged-in user as a key with a set time to live. Getting a list of currently active keys from Redis is much cheaper than the equivalent query against a SQL database, and it can be implemented in just a few lines of Python.
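A minimal sketch with the redis-py client (the key prefix and 10-minute TTL are illustrative):

import redis

r = redis.Redis()

def mark_active(user_id, ttl=600):
    # Refresh the user's "online" marker on every request; the key
    # expires on its own after `ttl` seconds of inactivity.
    r.set('online:%s' % user_id, 1, ex=ttl)

def online_user_ids():
    # scan_iter avoids blocking the server the way KEYS would.
    return [key.split(b':', 1)[1] for key in r.scan_iter('online:*')]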
If you're using django-user-sessions, the Session model has a last_activity field.
You may be able to do something like:
from user_sessions.models import Session
from datetime import datetime, timedelta
time_threshold = datetime.now() - timedelta(minutes=10)
qs = Session.objects.filter(last_activity__gt=time_threshold)
Though, django-user-sessions does not have a database index on that field, which means that if you have a very large number of users/sessions, the query may be heavy and take a long time. A more complicated answer might involve creating a materialized view (if you're using Postgres) that refreshes via a cron job.
Currently, I'm trying a different approach. I've written a middleware where, upon each request, the user's user_id is stored in a global sorted set. I do this only if they're authenticated, and I use the Redis key-value store to keep everything blazingly fast.
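A minimal sketch of that middleware with redis-py (the key name, 10-minute window, and class name are illustrative):

import time

import redis

r = redis.Redis()
WINDOW = 600  # ten minutes

class ActiveUsersMiddleware(object):
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        if request.user.is_authenticated:
            # Score each member by timestamp so stale entries can
            # be pruned with a single ZREMRANGEBYSCORE call.
            r.zadd('online_users', {str(request.user.id): time.time()})
        return self.get_response(request)

def online_count():
    r.zremrangebyscore('online_users', '-inf', time.time() - WINDOW)
    return r.zcard('online_users')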
The solution isn't live yet. I'm going to report more here and give a full answer once I'm done. I'll also consider other answers given here before marking the correct solution.

update existing cache data with newer items in django

I want to use caching in Django and I am stuck on how to go about it. I have data in some write-intensive models; records get added to them continuously. Each user has some specific data in the model, similar to an orders table.
Since my model is write-intensive, I am not sure how effective Django's caching framework is going to be. I tried Django's per-view caching, and I am trying to develop a view that first picks up data from the cache and then makes another call to bring in the data that was added to the model after the cache was populated. What I want to do is add the updated data to the original cached data and store it again.
It's like I don't want to expire my cache; I just want to keep adding to my existing cache data. Maybe once every 3 hours I can clear it.
Is what I am doing right? Are there better ways? Can I really add items to an existing cache?
I will be very glad for your help.
You ask about "caching" which is a really broad topic, and the answer is always a mix of opinion, style and the specific app requirements. Here are a few points to consider.
If the data is per user, you can cache it per user:
from django.core.cache import cache
cache.set(request.user.id,"foo")
cache.get(request.user.id)
The common practice is to keep a database flag that tells you whether the user's data has changed since it was cached. Before you fetch the data from the cache, check only this flag in the DB. If the flag says nothing changed, get the data from the cache; if it did change, pull from the DB, replace the cache, and reset the flag.
The flag check should be fast and simple: one table, indexed by user.id, with a boolean flag field. This squeezes a lot of index rows into a single DB page and enables fast fetching of a single one-field row, yet you still get a persistent, up-to-date main store that prevents the use of stale cache data. You can check this flag in a middleware.
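A minimal sketch of that pattern, assuming a hypothetical UserDataFlag model (one row per user, indexed by user id, with a boolean dirty field) and a hypothetical fetch_user_data_from_db() query:

from django.core.cache import cache

def get_user_data(user):
    # Cheap single-row flag check before trusting the cache.
    dirty = UserDataFlag.objects.values_list('dirty', flat=True).get(
        user_id=user.id)
    data = cache.get('data:%s' % user.id)
    if data is None or dirty:
        data = fetch_user_data_from_db(user)  # the expensive query
        cache.set('data:%s' % user.id, data)
        UserDataFlag.objects.filter(user_id=user.id).update(dirty=False)
    return data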
You can run expiry in many ways: clear the cache when the user logs out, run a cron script that clears items, or let the cache backend expire items. If you check the flag before using the cache, there is no issue with keeping items in the cache except space, and caching backends handle that. If you use Django's simple file-based cache (which is easy, simple, and zero-config), you will have to clear the cache yourself; a simple cron script will do.

Django Rest Framework Viewsets / Database Locks

Hi there. I am using Django Rest Framework 3.1, and I was wondering whether it's possible to "protect" my viewsets/database against writes on a per-user basis?
In other words, if one user is saving something, another one cannot save; it either waits until the first user finishes or returns some kind of error.
I tried looking for this answer but couldn't find it.
Is this behavior already implemented? If not, how can I achieve it in practice?
UPDATE after some more thinking:
This is still just a theory and needs more thought, but if we use a queue (Redis or RabbitMQ), we could put all synchronization write requests in the queue instead of processing them right away. In conjunction with some user-specific lock variable (maybe in the user sessions DB table), we could check whether there are any users ahead of us belonging to the same proponent, and whether those users have finished writing their updates (using the lock).
cheers
Database transactions will provide some of the safety you're looking for, I think. If a number of database operations are wrapped in a transaction, they are applied to the database together, so a sequence of operations cannot fail midway through and leave the database in an invalid state.
Other users will see the results of the operations as if they were applied all at once, or not at all (in the case of an error).
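A minimal sketch in DRF, assuming a hypothetical ModelViewSet; transaction.atomic plus select_for_update makes a concurrent writer block until the first update commits:

from django.db import transaction
from rest_framework import viewsets

class AccountViewSet(viewsets.ModelViewSet):
    # queryset and serializer_class are assumed to be defined

    @transaction.atomic
    def perform_update(self, serializer):
        # Re-fetch the row with a lock; a second request for the
        # same row waits here until this transaction commits.
        type(serializer.instance).objects.select_for_update().get(
            pk=serializer.instance.pk)
        serializer.save()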

Google App Engine: Modifying 1000 entities

I have about 1000 user account entities like this:
class UserAccount(ndb.Model):
    email = ndb.StringProperty()
Some of these email values contain uppercase letters, like JohnathanDough@email.com. I want to select all the email values from all UserAccount entities and apply Python's .lower() to them. How can I do this efficiently and, most importantly, without errors?
Note: The email values are important for login, so I cannot afford to mess this up. Is there a way to backup this data in case of the event that I do make a mistake?
Thank you.
Yes, of course. Even though Datastore Administration is an experimental feature, we can back up and restore data without coding. Follow these instructions for the backup flow: Backing up data.
To process your data, the most efficient way is to use the MapReduce library.
MapReduce works, but it's an excessive complication if you've never done it before.
Use task queues: each task can handle one query result page, store the next page token, and start another task for the next page.
This is slower than MapReduce if you run the tasks sequentially, but 1000 entities is not much; it will probably be done in about a minute.
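A minimal sketch of that chained-task approach using NDB query cursors and the deferred library (the batch size is illustrative):

from google.appengine.ext import deferred, ndb

BATCH = 100

def lowercase_emails(cursor=None):
    # Process one page of entities, then chain the next page as a
    # new task so no single request runs too long.
    accounts, next_cursor, more = UserAccount.query().fetch_page(
        BATCH, start_cursor=cursor)
    for acct in accounts:
        if acct.email:
            acct.email = acct.email.lower()
    ndb.put_multi(accounts)
    if more:
        deferred.defer(lowercase_emails, next_cursor)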
