Querying multiple schemas with django-tenants

Querying multiple schemas with django-tenants - python

I'm writing this question because I have doubts about how to implement a scenario that we had not foreseen for my django developed application. Basically what I need is a way to allow some types of users, depending on their role, could access data that is in multiple schemas.
So the approach that I had thought of was to create relationships between the "tenants", which define the database schemas, and detecting which of them a certain user belongs to, to be able to query all the schemes that are related to it in a loop.
In the same way, this has some drawbacks, since this solution would require a lot of additional programming and concatenating querysets or working by adding the responses of the serialized data of each queryset of the aforementioned schemas could be inefficient... So, is there a different way to reach the explained objective with a different approach?
Thanks in advance.

Related

Handling repetitive content within django apps

I am currently building a tool in Django for managing the design information within an engineering department. The idea is to have a common catalogue of items accessible to all projects. However, the projects would be restricted based on user groups.
For each project, you can import items from the catalogue and change them within the project. There is a requirement that each project must be linked to a different database.
I am not entirely sure how to approach this problem. From what I read, the solution I came up with is to have multiple django apps. One represents the common catalogue of items (linked to its own database) and then an app for each project(which can write and read from its own database but it can additionally read also from the common items catalogue database). In this way, I can restrict what user can access what database/project. However, the problem with this solution is that it is not DRY. All projects look the same: same models, same forms, same templates. They are just linked to different database and I do not know how to do this in a smart way (without copy-pasting entire files cause I think managing this would be a pain).
I was thinking that this could be avoided by changing the database label when doing queries (employing the using attribute) depending on the group of the authenticated user. The problem with this is that an user can have access to multiple projects. So, I am again at a loss.

It looks for me that all you need is a single application that will manage its access properly.
If the requirement is to have separate DBs then I will not argue that, but ... there is always small chance that separate tables in 1 DB is what they will accept

Django apps don't segregate objects, they are a way of structuring your code base. The idea is that an app can be re-used in other projects. Having a separate app for your catalogue of items and your projects is a good idea, but having them together in one is not a problem if you have a small codebase.
If I have understood your post correctly, what you want is for the databases of different departments to be separate. This is essentially a multi-tenancy question which is a big topic in itself, there are a few options:
Code separation - all of your projects/departments exist in a single database and schema but are separate by code that filters departments depending on who the end user is (literally by using Django .filters()). This is easy to do but there is a risk that data could be leaked to the wrong user if you get your code wrong. I would recommend this one for your use-case.
Schema separation - you are still using a single database but each department has its own schema. You would need to use Postgresql for this but once a schema has been set, there is far less chance that data is going to be visible to the wrong user. There are some Django libraries such as django-tenants that can do a lot of the heavy lifting.
Database separation - each department has their own database. There is even less of a chance that data will be leaked but you have to manage multi-databases and it is more difficult to scale. You can manage this through django as there is support for multi-databases.
Application separation - each department not only has their own database but their own application instance. The separation is absolute but again you need to manage multiple applications on a host like Heroku, which is even less scalable.

django - model - what are the benefits?

I have customized my Django project to work with mariaDB (mySQL).
Works fine, however I have issues (or concerns) with models.
First of all - I am not sure why I should need them if for me (personally) its much easier to use SQL statements to get the data.
Using API for DB queries might be useful for people who do not know SQL, but for me its less flexible.
Can anybody explain me main benefits of using models?
Here is one of the issues I have. See the code below.
class Quotes(models.Model):
updated = models.DateTimeField()
tdate = models.DateField(default='1900-01-01')
ticker = models.CharField(max_length=15)
open = models.FloatField(default=0)
vol = models.BigIntegerField(default=0)
why program does not consider 'default' when DB table and fields are created?
why - what I define as FloatField on DB is 'double' and not 'float' (I checked this using phpMYAdmin)
How can I properly set default value?
My table will have at least 1 million of entries.
Do I need to concern about performance using API instead of direct SQL queries? Usually one query will select 700-800 entries.
Is it good approach to use MySQLdb and direct SQL's instead of models?
sorry that some questions might sound too simple, but I just started with Django. Before this I worked with PHP. Main reason I want to use Python for web page development is library which I have developed.

Question zero, i.e. why models: Django's models are a nice abstraction on top of relational database tables – most (if not all) web apps end up having (or being) some sort of CRUD where you manipulate objects or graphs of objects saved in the database, so an object-oriented approach is nice to work with there.
In addition, many features in Django (and libraries that work with Django) are built around models (such as the admin, ModelForms, serialization, etc.).
Question 1: That date should preferably be datetime.date(1900, 1, 1), not a string, but that aside, Django deals with defaults on model instantiation, not necessarily in the database.
Question 2: Because that's how it's mapped, presumably to avoid programmers accidentally losing floating-point precision (since MySQL is rather notorious about doing precision-losing conversions "behind your back").
Question 3: Django's ORM is, to be absolutely honest, not the fastest when it generates queries and instantiates model instances. Most of the time, in regular operations, that's not a problem. Depending on what you're doing with those 700 to 800 instances, you may be able to work around that anyway; for instance, using .values() or .values_list() on a queryset if you don't need the actual instances, just the data.
Regarding direct SQL, please don't hard-code any MySQLdb calls in a Django app though; Django has very nice "escape hatches" for doing raw SQL:
You can perform .raw() SQL queries that map into models, or if that's not enough,
you can just execute SQL against the database connection like you would with MySQLdb.
Oh, and one more thing: your model name should be singular (Quote) :)

How can I implement shadow editing (or revisions) across a model with relationships (django)?

While this is django+postgresql, the answer could be generic sql or from a "Databases for Dummies" book.
We have a database with several interrelated models (one to one, one to many, and many to many fields). We'd like to allow a user to shadow-edit the database, and only publish once he's happy with the changes.
For a single model, I could use something like django-reversions, and I could handle the relationships by hand in a hacky sort of way. But, this would have several side effects:
The models not in control of django could change, which would update the data immediately (no shadow copy)
Since external relationships are being stored, things will get strange if there are a lot of edits to them.
Huge amount of work 'catching' CRUD operations and routing them to published or draft entries (if particular user is editing)
Need to fix all pks on relations when publishing (more hack-titude)
What I'd really like is something that would do this:
Allow editing of many related tables at once, over many REST CRUD calls, and only updating after 'publishing'
Allow rolling back to previous version (versioning)
Any ideas?

Hybrid sql/nosql data store in Django for document-based fields?

I am creating an application that needs to allow users to create custom fields, which I think would be best stored in a document-based (basically a serialized dictionary) model field.
I am concerned that I would run into performance issues storing these potentially very large documents in a SQL database, so I thought instead of storing the documents in the SQL database, I would just store pointers in the SQL database to the documents. The documents themselves would then be stored in a separate NoSQL database.
Assuming this structure makes sense, what is the best way to go about constructing a field that stores custom data fields in this manner? Optimally, these custom fields would be accessible on the object as attributes and would be denoted as custom with a "_c" appended to the name. E.g. created_date would become created_date_c on the django model object. I'm thinking custom managers would be best to tackle this: https://docs.djangoproject.com/en/dev/topics/db/managers/
EDIT: My SQL database is MySQL. Also, documents could make more sense as columns (as e.g. in Cassandra). Thoughts on this would be helpful.

As far as I know, there is no best approach to this task, but I advise you to look into two other options, that has its own pros and contras:
Store your user defined data in a JSONfield or pickled field. This will save you a lot of efforts in writing custom managers and NoSQL. If you are worried about storing it along with your fixed structure data, store it in a separate model with one-to-one relationship in a separate InnoDB file for example.
Store yours user data in a generalized (field_id, field_name, content_type) and (object_id, field_id, field_value) dictionaries. They can be split by field types (i.e. int, string, float etc.). This approach won't give you well performing data model from scratch, but smart indexing and partitioning can make it worth noting. And your data query, model structure enforcement will be alot easier from other approaches.
If you consider using NoSQL or other db for your mutable content, be sure to choose one that has means for your data efficient querying, see this discussion and wiki.

Correct way of implementing database-wide functionality

I'm creating a small website with Django, and I need to calculate statistics with data taken from several tables in the database.
For example (nothing to do with my actual models), for a given user, let's say I want all birthday parties he has attended, and people he spoke with in said parties. For this, I would need a wide query, accessing several tables.
Now, from the object-oriented perspective, it would be great if the User class implemented a method that returned that information. From a database model perspective, I don't like at all the idea of adding functionality to a "row instance" that needs to query other tables. I would like to keep all properties and methods in the Model classes relevant to that single row, so as to avoid scattering the business logic all over the place.
How should I go about implementing database-wide queries that, from an object-oriented standpoint, belong to a single object? Should I have an external kinda God-object that knows how to collect and organize this information? Or is there a better, more elegant solution?

I recommend extending Django's Model-Template-View approach with a controller. I usually have a controller.py within my apps which is the only interface to the data sources. So in your above case I'd have something like get_all_parties_and_people_for_user(user).
This is especially useful when your "data taken from several tables in the database" becomes "data taken from several tables in SEVERAL databases" or even "data taken from various sources, e.g. databases, cache backends, external apis, etc.".

User.get_attended_birthday_parties() or Event.get_attended_parties(user) work fine: it's an interface that makes sense when you use it. Creating an additional "all-purpose" object will not make your code cleaner or easier to maintain.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.