Correct way of implementing database-wide functionality

Correct way of implementing database-wide functionality - python

I'm creating a small website with Django, and I need to calculate statistics with data taken from several tables in the database.
For example (nothing to do with my actual models), for a given user, let's say I want all birthday parties he has attended, and people he spoke with in said parties. For this, I would need a wide query, accessing several tables.
Now, from the object-oriented perspective, it would be great if the User class implemented a method that returned that information. From a database model perspective, I don't like at all the idea of adding functionality to a "row instance" that needs to query other tables. I would like to keep all properties and methods in the Model classes relevant to that single row, so as to avoid scattering the business logic all over the place.
How should I go about implementing database-wide queries that, from an object-oriented standpoint, belong to a single object? Should I have an external kinda God-object that knows how to collect and organize this information? Or is there a better, more elegant solution?

I recommend extending Django's Model-Template-View approach with a controller. I usually have a controller.py within my apps which is the only interface to the data sources. So in your above case I'd have something like get_all_parties_and_people_for_user(user).
This is especially useful when your "data taken from several tables in the database" becomes "data taken from several tables in SEVERAL databases" or even "data taken from various sources, e.g. databases, cache backends, external apis, etc.".

User.get_attended_birthday_parties() or Event.get_attended_parties(user) work fine: it's an interface that makes sense when you use it. Creating an additional "all-purpose" object will not make your code cleaner or easier to maintain.

Related

Querying multiple schemas with django-tenants

I'm writing this question because I have doubts about how to implement a scenario that we had not foreseen for my django developed application. Basically what I need is a way to allow some types of users, depending on their role, could access data that is in multiple schemas.
So the approach that I had thought of was to create relationships between the "tenants", which define the database schemas, and detecting which of them a certain user belongs to, to be able to query all the schemes that are related to it in a loop.
In the same way, this has some drawbacks, since this solution would require a lot of additional programming and concatenating querysets or working by adding the responses of the serialized data of each queryset of the aforementioned schemas could be inefficient... So, is there a different way to reach the explained objective with a different approach?
Thanks in advance.

How and where do I delete rows in Flask - SQLAlchemy

I'm building a project using flask and flask-SQLAlchemy (with sqlite3 and with python 3.8). I have models.py which holds all the tables of the db, and I want to have static method which deletes rows in Articles table depends on one of their attribute.
I thought about writing this funciton in the models class and wanted to ask if that's fine.
The function looks like:
def update_articles():
all_articles = Articles.query.all()
for article in all_articles:
if needs_to_be_deleted(article):
db.session.delete(article)
db.session.commit()
Is that funciton fine (don't mind the needs_to_be_deleted(article) thing). Is the way I delete the article good? and is the place for this function can be at the models.py file?

As per my experience its all good here. As a general rule you should design your application as:
Models should do the most powerful stuff
Views should carry out only logical work
Templates should be dumb. They should just render things.
Now by saying the above i mean to say all the stuff relating to the operations with database should be in the models. Just like you are doing. Most of the time all the transactions with your database are simple ones and can be written with a couple of lines with ORM. But sometimes if you feel that something needs to be done over and over again or you feel like having a method for doing something big. Then its better to have that in your models only. The method you mentioned should be a part of your Article model. Other things that can be there in your model could be operations like sending emails to users or some other routine tasks.
As far as your views are concerned. They should carry out all the logical stuff. Your business logic is basically implemented by the views.
Lastly templates should do the work of rendering things whatever is fed to them by the views.
Obviously there are no such hard and fast rule. There can be exceptions to the above. Or someone could find any other way of doing things better rather than that i mentioned. For that you need to use your own conscience and no one can teach you that. It comes with experience.
For your reference please go through the following articles:
https://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-i-hello-world
https://flask.palletsprojects.com/en/1.1.x/tutorial/
Also you might be using this but in case your aren't:
https://flask-marshmallow.readthedocs.io/en/latest/
Just one small suggestion too. though i am not clear about your requirements. But it would be better if you could commit after the for loop i.e at the very end of the method. A request should generally have only one commit.

What to do with runtime generated data spanning several classes?

I'm a self-taught programmer and a lot of the problems I encounter come from a lack of formal education (and often also experience).
My question it the following: How to rationalize where you store the data a class or function creates? I'll make a simple example:
Case: I have a webshop (SHOP) with a REST api and a product provider (PROVIDER) also with a REST API. I determine the product, I send that data to PROVIDER who sends me back formatted data that can be read by SHOP to make a working product on the webshop. PROVIDER also has a secondary REST api that provides generated images.
What I would come up with:
I'd make three classes: ProductBase, Shop and Provider
ProductBase would be the class from where I instantiate and store the individual product information.
Shop would be where I design the api interactions with the webshop.
Provider same as shop, but for interactions with provider api.
My problem: At some point you're creating data that's not clearly separated in concern. For example: Would I store the generated product data (from PROVIDER) in the ProductBase instance I created? It feels like I'm coupling the two classes this way. But it not there, then where?
What if I create product images with PROVIDER and I upload them to SHOP? Do I store the uploaded image-url in PRODUCT? How do you keep track of all this info?
The question I want answered:
I've read a lot on OOP and Design Patterns, and I have adopted a TDD approach which has greatly helped to improve my code but I haven't found anything on how to approach the flow of at runtime generated data within software engineering.
What would be a good way to solve above problem(s) and could you explain your rationale for it?

If I understand correctly, I think your current concern is that you have "raw" product data, which you want to store in objects, and you have "processed" (formatted) product data, which you also want to store in objects. Your question being should you mix them.
Let me just first point out the other obvious option. Namely, having two product classes: RawProduct and ProcessedProduct. Which to do?
(Edit: also, to be sure, product data should not be stored in provider. The provide performs the action of formatting but the data is product data. Not provider data).
It depends. There are a couple of considerations:
1) In general, in OOP, the idea is to couple actions on data with the data. So if possible, you have some method in ProductBase like "format()", where format will send the object off to the API to get formatted, and store the result in an instance variable. You can then also have a method like "find_image", that goes and fetches the image url from the API and then stores that in a field. An object's data is meant to be dynamic. It is meant to be altered by object methods.
2) If you need version control (if you want the full history of the object's state to be available), then you can't override fields with new data. So either you need to store a history of every object field in the object, or you need to create new objects.
3) Is RAM a concern? I sometimes create dataclasses that store only the final part of an object's life so that I can fit more of the objects into memory.
Personally I often find myself creating "RawObject" and "ProcessedObject" classes, it's just easier a lot of the time. But that's probably because I mostly work with document processing, so it's very clear. Usually You'll just update the objects data.
A benefit of having one object with the full history is that it is much easier to debug. Because the raw data and the API result are in the same object. So you can very easily probe what went wrong. If you start splitting things up it's harder to track. In general, the more information an object has about where it's been, the easier it is to figure out what went wrong with it.
Remember also though, since this is a Python question, Python is multi-paridigm. And if you're writing pipeline-style architectures (synchronous, linear processes), then a functional approach can also work well.
Once your data is stored in a product object, anything can hold a reference to that. So a shop can reference an object and a product can reference the object. Be clear on the difference between "has-a" relationships and "is-a" relationships.

python database abstraction to store datastructures unpickled

I am looking for a generic way to store python objects in a database. Of course I could just pickle the objects, but that way I would have binary blobs in my database. That way I can not search my objects. Also it seems to be easier to put it together with other applications.
So in my fantasy, I have on object like
class myClass
data1=1
data2='foobar'
data3=some_html_object
...
and could do something like
mydata=myClass()
mydata.add_data(various_things)
mydata.save_to_database()
and would end up with a database which has colums called data1,data2, data3, where I have the values of the of the objects attributes in the rows stored as text which would be searchable. Of course some inital setup would have to be done.
And of course it would be nice if I could plug any database I want (well, at least not just one database) and would not be bothered with the details.
Now of course I could program my own framework to let me do this, but I was hoping that this has been done bevore by someone else :)
Any suggestions?

Your fantasy in fact exists!
You describe something called the Active Record Pattern. It is usually implemented by using Object-Relational Mapping. One common solution for Python is SQLAlchemy, but Storm is somehow popular too:
See What are some good Python ORM solutions?
If your are developing for the Web, Django possess its own ORM.

It sounds like what you want is an Object Relational Mapper (ORM) to map SQL tables to objects.
The most popular ORMs that support different dialects by community are the following:
Python -- SQLAlchemy, Storm, Django (built into web framework)
Ruby -- ActiveRecord, Sequel
Node -- Sequelize
For a specific example of implementing what you described in Python using SQLAlchemy, check out this blog post that walks through a simple example

Building a DSL query language

i'm working on a project (written in Django) which has only a few entities, but many rows for each entity.
In my application i have several static "reports", directly written in plain SQL. The users can also search the database via a generic filter form. Since the target audience is really tech-savvy and at some point the filter doesn't fit their needs, i think about creating a query language for my database like YQL or Jira's advanced search.
I found http://sourceforge.net/projects/littletable/ and http://www.quicksort.co.uk/DeeDoc.html, but it seems that they only operate on in-memory objects. Since the database can be too large for holding it in-memory, i would prefer that the query is translated in SQL (or better a Django query) before doing the actual work.
Are there any library or best practices on how to do this?

Writing such a DSL is actually surprisingly easy with PLY, and what ho—there's already an example available for doing just what you want, in Django. You see, Django has this fancy thing called a Q object which make the Django querying side of things fairly easy.
At DjangoCon EU 2012, Matthieu Amiguet gave a session entitled Implementing Domain-specific Languages in Django Applications in which he went through the process, right down to implementing such a DSL as you desire. His slides, which include all you need, are available on his website. The final code (linked to from the last slide, anyway) is available at http://www.matthieuamiguet.ch/media/misc/djangocon2012/resources/compiler.html.
Reinout van Rees also produced some good comments on that session. (He normally does!) These cover a little of the missing context.
You see in there something very similar to YQL and JQL in the examples given:
groups__name="XXX" AND NOT groups__name="YYY"
(modified > 1/4/2011 OR NOT state__name="OK") AND groups__name="XXX"
It can also be tweaked very easily; for example, you might want to use groups.name rather than groups__name (I would). This modification could be made fairly trivially (allow . in the FIELD token, by modifying t_FIELD, and then replacing . with __ before constructing the Q object in p_expression_ID).
So, that satisfies simple querying; it also gives you a good starting point should you wish to make a more complex DSL.

I've faced exactly this problem - a large database which needs searching. I made some static reports and several fancy filters using django (very easy with django) just like you have.
However the power users were clamouring for more. I decided that there already was a DSL that they all knew - SQL. The question was how to make it secure enough.
So I used django permissions to give the power users permission to make SQL queries in a new table. I then made a view for the not-quite-so-power users to use these queries. I made them take optional parameters. The queries were run using Python's lower level DB-API which django is using under the hood for its ORM anyway.
The real trick was opening a read only database connection to run these queries just to make sure that no updates were ever run. I made a read only connection by creating a different user in the database with lower permissions and opening a specific connection for that in the view.
TL;DR - SQL is the way to go!

Depending on the form of your data, the types of queries your users need to use, and the frequency that your data is updated, an alternative to the pure SQL solution suggested by Nick Craig-Wood is to index your data in Solr and then run queries against it.
Solr is an added layer of complexity (configuration, data synchronization) but it is super-fast, can handle large datasets, and provides a (relatively) intuitive query language.

You could write your own SQL-ish language using pyparsing, actually. There is even pretty verbose example you could extend.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.