Dynamically select database based on request

Dynamically select database based on request - python

I'm trying to keep my RESTful site DRY, and I can't come up with a good way to factor out the code to dynamically select from each "user's" separate database. We've got a separate database for each client. This comes in as a part of the URL, and is passed into each view as a keyword arg. I want to give each and every view the behavior of accessing the corresponding database WITHOUT have to make sure each programmer writing a view remembers to use
Thing.objects.using(user).all()
and
t = Thing()
t.save(using=user)
every time. It seems like there ought to be some way to intercept the request and set the default database based on the args to the view before it hits the view, allowing us to use the usual
Thing.objects.all()
This would also have the advantage of factoring out all the user resolution code into a more appropriate place.

We do this by the following technique.
Apache picks off the first part of the path and routes this to a specific mod_wsgi Daemon.
Each mod_wsgi daemon is a different customer's installation.
We have many parallel customers, each with (nearly) identical code, all based off a single common installation of the base software.
Each customer has a separate settings.py with their unique configuration.
They don't (actually can't) know about each other because Apache has peeled off the top layer of the path for us.

Related

Handling repetitive content within django apps

I am currently building a tool in Django for managing the design information within an engineering department. The idea is to have a common catalogue of items accessible to all projects. However, the projects would be restricted based on user groups.
For each project, you can import items from the catalogue and change them within the project. There is a requirement that each project must be linked to a different database.
I am not entirely sure how to approach this problem. From what I read, the solution I came up with is to have multiple django apps. One represents the common catalogue of items (linked to its own database) and then an app for each project(which can write and read from its own database but it can additionally read also from the common items catalogue database). In this way, I can restrict what user can access what database/project. However, the problem with this solution is that it is not DRY. All projects look the same: same models, same forms, same templates. They are just linked to different database and I do not know how to do this in a smart way (without copy-pasting entire files cause I think managing this would be a pain).
I was thinking that this could be avoided by changing the database label when doing queries (employing the using attribute) depending on the group of the authenticated user. The problem with this is that an user can have access to multiple projects. So, I am again at a loss.

It looks for me that all you need is a single application that will manage its access properly.
If the requirement is to have separate DBs then I will not argue that, but ... there is always small chance that separate tables in 1 DB is what they will accept

Django apps don't segregate objects, they are a way of structuring your code base. The idea is that an app can be re-used in other projects. Having a separate app for your catalogue of items and your projects is a good idea, but having them together in one is not a problem if you have a small codebase.
If I have understood your post correctly, what you want is for the databases of different departments to be separate. This is essentially a multi-tenancy question which is a big topic in itself, there are a few options:
Code separation - all of your projects/departments exist in a single database and schema but are separate by code that filters departments depending on who the end user is (literally by using Django .filters()). This is easy to do but there is a risk that data could be leaked to the wrong user if you get your code wrong. I would recommend this one for your use-case.
Schema separation - you are still using a single database but each department has its own schema. You would need to use Postgresql for this but once a schema has been set, there is far less chance that data is going to be visible to the wrong user. There are some Django libraries such as django-tenants that can do a lot of the heavy lifting.
Database separation - each department has their own database. There is even less of a chance that data will be leaked but you have to manage multi-databases and it is more difficult to scale. You can manage this through django as there is support for multi-databases.
Application separation - each department not only has their own database but their own application instance. The separation is absolute but again you need to manage multiple applications on a host like Heroku, which is even less scalable.

Preserving value of variables between subsequent requests in Python Django

I have a Django application to log the character sequences from an autocomplete interface. Each time a call is made to the server, the parameters are added to a list and when the user submits the query, the list is written to a file.
Since I am not sure how to preserve the list between subsequent calls, I relied on a global variable say query_logger. Now I can preserve the list in the following way:
def log_query(query, completions, submitted=False):
global query_logger
if query_logger is None:
query_logger = list()
query_logger.append(query, completions, submitted)
if submitted:
query_logger = None
While this hack works for a single client sending requests I don't think this is a stable solution when requests come from multiple clients. My question is two-fold:
What is the order of execution of requests: Do they follow first come first serve (especially if the requests are asynchronous)?
What is a better approach for doing this?

If your django server is single-threaded, then yes, it will respond to requests as it receives them. If you're using wsgi or another proxy, that becomes more complicated. Regardless, I think you'll want to use a db to store the information.
I encountered a similar problem and ended up using sqlite to store the data temporarily, because that's super simple and easy to manage. You'll want to use IP addresses or create a unique ID passed as a url parameter in order to identify clients on subsequent requests.
I also scheduled a daily task (using cron on ubuntu) that goes through and removes any incomplete requests that haven't been completed (excluding those started in the last hour).

You must not use global variables for this.
The proper answer is to use the session - that is exactly what it is for.

Simplest (bad) solution would be to have a global variable. Which means you need some in memory location or a db to store this info

microservices and multiple databases

i have written MicroServices like for auth, location, etc.
All of microservices have different database, with for eg location is there in all my databases for these services.When in any of my project i need a location of user, it first looks in cache, if not found it hits the database. So far so good.Now when location is changed in any of my different databases, i need to update it in other databases as well as update my cache.
currently i made a model (called subscription) with url as its field, whenever a location is changed in any database, an object is created of this subscription. A periodic task is running which checks for subscription model, when it finds such objects it hits api of other services and updates location and updates the cache.
I am wondering if there is any better way to do this?

I am wondering if there is any better way to do this?
"better" is entirely subjective. if it meets your needs, it's fine.
something to consider, though: don't store the same information in more than one place.
if you need an address, look it up from the service that provides address, every time.
this may be a performance hit, but it eliminates the problem of replicating the data everywhere.
another option would be a more proactive approach, as suggested in comments.
instead of creating a task list for changes, and doing that periodically, send a message across rabbitmq immediately when the change happens. let every service that needs to know, get a copy of the message and update it's own cache of info.
just remember, though. every time you have more than one copy of the information, you reduce the "correctness" of the system, as a whole. it will always be possible for the information found in one of your apps to be out of date, because it did not get an update from the official source.

Django code organisation

I've recently started working with Django. I'm working on an existing Django/Python based site. In particular I'm implementing some functionality to create and display a PDF document when a particular URL is hit. I have an entry in the app's urls file that routes to a function in the views file and the PDF generation is working fine.
However, the view function is pretty big and I want to extract the code out somewhere to keep my view as thin as possible, but I'm not sure of the best/correct approach. I'll probably need to generate other PDFs in due course so would it make sense to create a 'pdfs' app and put code in there? If so, should it go in a model or view?
In a PHP/CodeIgniter environment for example I would put the code into a model, but models seem to be closely linked to database tables in Django and I don't need any db functionality for this.
Any pointers/advice from more experienced Django users would be appreciated.
Thanks

If you plan to scale your project, I would suggest moving it to a separate app. Generally speaking, generating PDFs based on an url hit directly is not the best thing to do performance-wise. Generating a PDF file is pretty heavy on you server, so if multiple people do it at the same time, the performance of your system will suffer.
As a first step, just put it in a separate class, and execute that code from the view. At some point you will probably want to do some permission checks etc - that stays in the view, while generation of the PDF itself will be cleanly separated.
Once you test your code, scale etc - then you can substitute that one line call in the view into putting the PDF generation in a queue and only pulling it once it's done - that will allow you to manage your computing powers better.

Yes you can in principle do it in an app (the concept of reusable apps is the basis for their existence)
However not many people do it/not many applications require it. It depends on how/if the functionality will be shared. In other words there must be a real benefit.
The code normally goes in both the view/s and in the models (to isolate code and for the model managers)

django vars in ram

I am implementing a really lightweight Web Project, which has just one page, showing data in a diagram. I use Django as a Webserver and d3.js as plotting routine for this diagram. As you can imagine, there are just a few simple time series which have to be responded by Django server, so I was wondering if I simply could hold this variable in ram. My first test was positive, I had something like this in my views.py:
X = np.array([123,23,1,32,123,1])
#csrf_exempt
def getGraph(request):
global X
return HttpResponse(json.dumps(X))
Notice, X is updated by another function every now and then, but all user access is read-only. Do I have to deal with
security issues by defining a global variable?
inconsistencies in general?
I found a thread discussing global variables in Django, but in that case, the difficulty is of handling multiple write-access.
To answer potential questions on why I don't want store data in database: All data I got in my X is already stored in a huge remote database and this web app just needs to display data.

Storing it in a variable does indeed have threading implications (and also scalibility - what if you have two Django servers running the same app?). The advice from the Django community is don't!.
This sounds like a good fit for the Django cache system though. Just cache your getGraph view with #cache_page and the job is done. No need to use memcache, the built-in in-memory memory-cache cache-backend* will work fine. Put a very high number as the time-out on the cache (years).
This way you are storing the HTTP response (JSON) not the value of X. But from your code sample, that is not a problem. If you need to re-calculate X you need to re-calculate the JSON, and if you need to re-calculate the JSON you will need to re-calculate X.
https://docs.djangoproject.com/en/dev/topics/cache/?from=olddocs/
1 or just 'built-in memory backend', I couldn't resist

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.