Unit-testing tightly-coupled models in Django

I'm new to both Django and unit testing, and I'm trying to build unit tests for my models but have been having some difficulty.
I have several models working closely together:
Resource: maintains a file resource.
MetadataField: represents a metadata field that can be added to resources; corresponds to a table full of fields.
MetadataValue: matches MetadataField IDs with Resource IDs and a corresponding value; this is the intermediary table for the Resource - MetadataField many-to-many relationship.
MetadataSchema: represents a schema consisting of many MetadataFields. Each Resource is assigned a MetadataSchema, which controls which MetadataFields it is represented by.
Relationships:
Resource - MetadataField : Many-to-Many through MetadataValue
MetadataValue - MetadataSchema : Many-to-Many
Resource - MetadataSchema : One-to-Many
I'm not sure how to write tests to deal with these models. The model testing in the Test Driven Django tutorial seems to mostly cover initializing the objects and verifying attributes. If I do any setting up of these objects though it requires the use of all the others, so the tests will all be dependent on code that they're not meant to be testing.
For example, if I wish to create a resource, I should also be assigning it a metadata schema and values for the fields in that schema.
I've looked around for good examples of unit-tested models in Django but haven't been able to find anything (the Django website doesn't seem to have unit tests, and these projects all either have poor or missing testing or, in a couple of cases, have good testing but almost no models).
Here are the possible approaches I see:
Do a lot of mocking, to ensure that I am only ever testing one class, and keep the unit tests on the models very simple, testing only their methods/attributes but not that the relationships are functioning correctly. Then rely on higher-level integration tests to pick up any problems in the relationships etc.
Design unit tests that DO rely on other functionality, and accept that a break in one function will break more than one test, provided it remains easy to see where the fault occurred. So I would perhaps have a method testing whether I can successfully add a MetadataValue to a resource, which would require setting up at least one MetadataSchema and Resource. I could then use a try - except block to ensure that if the test fails before the assertions dealing with what I'm actually meant to be testing, it gives a specific error message suggesting the fault lies elsewhere. This way I could quickly scan multiple failed test messages to find the real culprit. It wouldn't be possible to do this separation reliably in every test, though.
I'm having a hard time getting my head round this, so I don't know if this all makes sense, but if there are best practices for this sort of situation please point me to them! Thanks

You can use Django fixtures to load data for testing, but this can be very time-consuming and hard to maintain if your models change a lot.
I suggest using a library like Factory Boy, which lets you create objects on demand in your tests when you need them. You can define as many factories as you want. You can see some examples here, and here you can also see some examples of mocking with the mocker library and a lot of tips on testing Django apps.
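To make the factory idea concrete, here is a minimal hand-rolled sketch in plain Python. It is not Factory Boy's API: the dataclasses are hypothetical stand-ins for the question's models, and make_schema / make_resource are illustrative names. Factory Boy offers the same defaults-plus-overrides pattern declaratively for Django models.

```python
from dataclasses import dataclass, field
from itertools import count


# Plain-Python stand-ins for the question's models (hypothetical fields).
@dataclass
class MetadataSchema:
    name: str
    field_names: list = field(default_factory=list)


@dataclass
class Resource:
    title: str
    schema: MetadataSchema


_counter = count(1)  # gives every generated object a unique suffix


def make_schema(**overrides):
    """Build a valid MetadataSchema, letting a test override any field."""
    values = {"name": f"schema-{next(_counter)}", "field_names": ["author"]}
    values.update(overrides)
    return MetadataSchema(**values)


def make_resource(**overrides):
    """Build a valid Resource; a schema is created only if none is passed."""
    values = {"title": f"resource-{next(_counter)}"}
    values.update(overrides)
    if "schema" not in values:
        values["schema"] = make_schema()
    return Resource(**values)


# A test states only what it cares about; the rest gets sensible defaults.
resource = make_resource(title="report.pdf")
```

Each test then builds exactly the objects it needs, so a change to one model's required fields is fixed in one helper rather than in every test or fixture file.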

For me, the purpose of unit testing is to isolate UNITS of code and test ONLY them, without worrying about all their dependencies. If I understand your idea correctly, you want to create something that is more of an integration test (the relationship between two or more models), which is also very helpful but is still a different layer of testing :)
To test separate modules, especially when they rely on a lot of surrounding code, I prefer to mock the dependencies. A quick search returned Python mocks as the first option for you (I guess there are plenty of them out there).
The other thing is that if there are TOO MANY dependencies you have to mock, it probably means you have to rethink your architecture because of tight coupling :)
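As a rough illustration of mocking a dependency, the sketch below uses unittest.mock from the standard library. The Resource class, its set_value method, and the schema's allows method are all hypothetical stand-ins rather than the asker's real models; the point is only that the schema is replaced by a mock so the test exercises one class.

```python
import unittest
from unittest import mock


class Resource:
    """Hypothetical stand-in for the Resource model from the question."""

    def __init__(self, schema):
        self.schema = schema
        self.values = {}

    def set_value(self, field_name, value):
        # The unit under test: reject fields the schema does not allow.
        if not self.schema.allows(field_name):
            raise ValueError(f"{field_name!r} is not in the schema")
        self.values[field_name] = value


class ResourceSetValueTest(unittest.TestCase):
    def test_accepts_field_allowed_by_schema(self):
        schema = mock.Mock()
        schema.allows.return_value = True
        resource = Resource(schema)
        resource.set_value("author", "Ada")
        self.assertEqual(resource.values, {"author": "Ada"})
        schema.allows.assert_called_once_with("author")

    def test_rejects_field_missing_from_schema(self):
        schema = mock.Mock()
        schema.allows.return_value = False
        resource = Resource(schema)
        with self.assertRaises(ValueError):
            resource.set_value("author", "Ada")


# Run the two tests programmatically and keep the result object.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ResourceSetValueTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Neither test needs a real schema, so a bug in the schema code cannot make these tests fail.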
Good luck!

Use fixtures; they let you load model data without writing code.

Related

Handling repetitive content within django apps

I am currently building a tool in Django for managing the design information within an engineering department. The idea is to have a common catalogue of items accessible to all projects. However, the projects would be restricted based on user groups.
For each project, you can import items from the catalogue and change them within the project. There is a requirement that each project must be linked to a different database.
I am not entirely sure how to approach this problem. From what I have read, the solution I came up with is to have multiple Django apps: one represents the common catalogue of items (linked to its own database), and then there is an app for each project (which can read and write to its own database and can additionally read from the common catalogue database). In this way, I can restrict which user can access which database/project. However, the problem with this solution is that it is not DRY. All projects look the same: same models, same forms, same templates. They are just linked to different databases, and I do not know how to do this in a smart way (without copy-pasting entire files, because I think managing this would be a pain).
I was thinking that this could be avoided by changing the database label when doing queries (employing the using attribute) depending on the group of the authenticated user. The problem with this is that a user can have access to multiple projects. So, I am again at a loss.

It looks to me like all you need is a single application that manages access properly.
If the requirement is to have separate databases then I won't argue with that, but there is always a small chance that they will accept separate tables in one database.

Django apps don't segregate objects, they are a way of structuring your code base. The idea is that an app can be re-used in other projects. Having a separate app for your catalogue of items and your projects is a good idea, but having them together in one is not a problem if you have a small codebase.
If I have understood your post correctly, what you want is for the databases of different departments to be separate. This is essentially a multi-tenancy question, which is a big topic in itself. There are a few options:
Code separation - all of your projects/departments exist in a single database and schema but are separated by code that filters departments depending on who the end user is (literally by using Django's .filter()). This is easy to do but there is a risk that data could be leaked to the wrong user if you get your code wrong. I would recommend this one for your use case.
Schema separation - you are still using a single database but each department has its own schema. You would need to use PostgreSQL for this, but once a schema has been set, there is far less chance that data is going to be visible to the wrong user. There are some Django libraries such as django-tenants that can do a lot of the heavy lifting.
Database separation - each department has its own database. There is even less of a chance that data will be leaked, but you have to manage multiple databases and it is more difficult to scale. You can manage this through Django, as it has support for multiple databases.
Application separation - each department not only has their own database but their own application instance. The separation is absolute but again you need to manage multiple applications on a host like Heroku, which is even less scalable.
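For the database-separation option, Django's multi-database support lets a router class choose the database alias for each query. The following is only a sketch under stated assumptions: the app label "catalogue", the alias names, and the idea of storing the active project's alias in a ContextVar (set, for instance, by middleware after checking the user's access to the requested project) are all hypothetical. The method names and signatures follow Django's database router interface.

```python
from contextvars import ContextVar
from types import SimpleNamespace

# Alias of the project database for the current request (hypothetical default).
current_project_db = ContextVar("current_project_db", default="default")


class ProjectRouter:
    """Send the shared catalogue app to one database and everything else
    to whichever project database is active for the current request."""

    catalogue_app = "catalogue"  # hypothetical app label
    catalogue_db = "catalogue"   # hypothetical alias in settings.DATABASES

    def _alias_for(self, model):
        if model._meta.app_label == self.catalogue_app:
            return self.catalogue_db
        return current_project_db.get()

    def db_for_read(self, model, **hints):
        return self._alias_for(model)

    def db_for_write(self, model, **hints):
        return self._alias_for(model)

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Catalogue tables live only in the catalogue database;
        # everything else is created in every project database.
        if app_label == self.catalogue_app:
            return db == self.catalogue_db
        return db != self.catalogue_db


# Stand-ins exposing the one attribute the router reads from a model.
item = SimpleNamespace(_meta=SimpleNamespace(app_label="catalogue"))
design = SimpleNamespace(_meta=SimpleNamespace(app_label="projects"))

router = ProjectRouter()
token = current_project_db.set("project_a")  # e.g. after an access check
catalogue_alias = router.db_for_read(item)
project_alias = router.db_for_write(design)
current_project_db.reset(token)
```

Because the active project is chosen per request rather than per user, a user with access to several projects is not a problem: the same models, forms, and templates serve every project, and only the alias changes.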

How and where do I delete rows in Flask - SQLAlchemy

I'm building a project using Flask and Flask-SQLAlchemy (with sqlite3 and Python 3.8). I have a models.py which holds all the tables of the db, and I want to have a static method which deletes rows in the Articles table depending on one of their attributes.
I thought about writing this function in the models class and wanted to ask if that's fine.
The function looks like:
def update_articles():
    all_articles = Articles.query.all()
    for article in all_articles:
        if needs_to_be_deleted(article):
            db.session.delete(article)
            db.session.commit()
Is that function fine (never mind the needs_to_be_deleted(article) part)? Is the way I delete the articles good? And can this function live in the models.py file?

In my experience, it's all good here. As a general rule, you should design your application as follows:
Models should do the most powerful stuff.
Views should carry out only logical work.
Templates should be dumb. They should just render things.
By this I mean that everything relating to database operations should be in the models, just as you are doing. Most of the time, your database transactions are simple ones and can be written in a couple of lines with the ORM. But if you find that something needs to be done over and over again, or you want a method for doing something big, then it's better to have that in your models. The method you mentioned should be part of your Articles model. Other things that could live in your model are operations like sending emails to users or other routine tasks.
As far as your views are concerned, they should carry out all the logical stuff. Your business logic is basically implemented by the views.
Lastly, templates should render whatever is fed to them by the views.
Obviously, there is no hard and fast rule. There can be exceptions to the above, or someone could find a better way of doing things than the one I mentioned. For that you need to use your own judgment, and no one can teach you that. It comes with experience.
For your reference please go through the following articles:
https://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-i-hello-world
https://flask.palletsprojects.com/en/1.1.x/tutorial/
Also, you might already be using this, but in case you aren't:
https://flask-marshmallow.readthedocs.io/en/latest/
Just one small suggestion too, though I am not clear about your requirements: it would be better to commit after the for loop, i.e. at the very end of the method. A request should generally have only one commit.
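The single-commit suggestion could look roughly like the sketch below. The function name delete_matching and the RecordingSession test double are hypothetical; the function only relies on the session having delete() and commit(), which SQLAlchemy sessions provide. In the app, the call would presumably be delete_matching(db.session, Articles.query.all(), needs_to_be_deleted).

```python
def delete_matching(session, articles, needs_to_be_deleted):
    """Mark every matching article for deletion, then commit once."""
    deleted = 0
    for article in articles:
        if needs_to_be_deleted(article):
            session.delete(article)
            deleted += 1
    session.commit()  # a single commit, after the loop
    return deleted


class RecordingSession:
    """Test double that records calls instead of touching a database."""

    def __init__(self):
        self.deleted = []
        self.commits = 0

    def delete(self, obj):
        self.deleted.append(obj)

    def commit(self):
        self.commits += 1


# Strings stand in for article rows so the sketch runs without a database.
session = RecordingSession()
removed = delete_matching(
    session,
    ["keep", "old-1", "old-2"],
    lambda article: article.startswith("old"),
)
```

Passing the session in as a parameter also makes the function easy to unit-test without Flask or a database.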

What to test in a simple DRF API?

So, I'm not a test expert and sometimes, when using packages like DRF, I wonder what I should test in the code...
If I write custom functions for some endpoints, I understand I should test this because I've written this code and there are no tests for this... But the DRF codebase is pretty tested.
But if I'm writing a simple API that only extends ModelSerializer and ModelViewSet what should I be testing?
The keys in the JSON serialized?
The relations?
What should I be testing?

To test your ModelSerializer, check the request payload against your expected model fields.
To test your ModelViewSet, check the response HTTP status code against the expected status codes for your viewsets. You can also test your response data.
A good resource - https://realpython.com/test-driven-development-of-a-django-restful-api/
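The shape of such tests can be sketched without any framework. In the example below, get_article is a hypothetical stand-in for an endpoint, and the article fields are invented for illustration; in a real DRF project the same assertions would be made on response.status_code and response.data from rest_framework.test.APIClient.

```python
import json
import unittest


def get_article(article_id):
    """Hypothetical stand-in for an endpoint: returns (status_code, body)."""
    articles = {1: {"id": 1, "title": "Hello", "author": "ada"}}
    if article_id not in articles:
        return 404, json.dumps({"detail": "Not found."})
    return 200, json.dumps(articles[article_id])


class ArticleEndpointTest(unittest.TestCase):
    def test_existing_article_returns_expected_keys(self):
        status, body = get_article(1)
        self.assertEqual(status, 200)
        # The set of keys is the contract that API consumers rely on.
        self.assertEqual(set(json.loads(body)), {"id", "title", "author"})

    def test_missing_article_returns_404(self):
        status, _ = get_article(99)
        self.assertEqual(status, 404)


# Run the two tests programmatically and keep the result object.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ArticleEndpointTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```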

Even if you're only using automated features and added absolutely no customization on your serializer and viewset, and it's obvious to you that this part of the code works smoothly, you still need to write tests.
Code tends to get large, and some other person might be extending your code, or you might go back to your code a few months later and not remember how your implementation worked. Knowing that tests are passing will tell other people (or yourself in the distant future) that your code is working without having to read it and dive into the implementation details, which makes your code reliable.
The person using your API might be using it as a service and not even be interested in what framework or language you used for the implementation, but only want to be sure that the features they require work properly. How can we ensure this? One way is to write tests and pass them.
That's why it's very important to write complete and reliable tests so people can safely use or extend your code knowing that the tests are passing and everything is OK.

Correct way of implementing database-wide functionality

I'm creating a small website with Django, and I need to calculate statistics with data taken from several tables in the database.
For example (nothing to do with my actual models), for a given user, let's say I want all birthday parties he has attended, and people he spoke with in said parties. For this, I would need a wide query, accessing several tables.
Now, from the object-oriented perspective, it would be great if the User class implemented a method that returned that information. From a database model perspective, I don't like at all the idea of adding functionality to a "row instance" that needs to query other tables. I would like to keep all properties and methods in the Model classes relevant to that single row, so as to avoid scattering the business logic all over the place.
How should I go about implementing database-wide queries that, from an object-oriented standpoint, belong to a single object? Should I have an external kinda God-object that knows how to collect and organize this information? Or is there a better, more elegant solution?

I recommend extending Django's Model-Template-View approach with a controller. I usually have a controller.py within my apps which is the only interface to the data sources. So in your above case I'd have something like get_all_parties_and_people_for_user(user).
This is especially useful when your "data taken from several tables in the database" becomes "data taken from several tables in SEVERAL databases" or even "data taken from various sources, e.g. databases, cache backends, external apis, etc.".
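A controller function of this kind might look like the sketch below. It uses sqlite3 directly only so that it runs on its own; in a Django app the body would be ORM queries instead. The person, party, and attendance tables are a hypothetical schema, and "people he spoke with" is simplified to "other attendees of the same party".

```python
import sqlite3


def get_all_parties_and_people_for_user(conn, user_id):
    """Return {party name: names of the other attendees, sorted}."""
    rows = conn.execute(
        """
        SELECT p.name, u.name
        FROM attendance AS mine
        JOIN party AS p ON p.id = mine.party_id
        JOIN attendance AS theirs ON theirs.party_id = p.id
        JOIN person AS u ON u.id = theirs.person_id
        WHERE mine.person_id = ? AND theirs.person_id != ?
        ORDER BY p.name, u.name
        """,
        (user_id, user_id),
    ).fetchall()
    result = {}
    for party_name, person_name in rows:
        result.setdefault(party_name, []).append(person_name)
    return result


# Tiny in-memory data set (hypothetical) to exercise the function.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE party (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE attendance (person_id INTEGER, party_id INTEGER);
    INSERT INTO person VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO party VALUES (1, 'bday-a'), (2, 'bday-b');
    INSERT INTO attendance VALUES (1, 1), (2, 1), (3, 1), (2, 2), (3, 2);
    """
)
summary = get_all_parties_and_people_for_user(conn, 1)
```

Callers depend only on the function's name and return shape, so the data source behind it can change without touching views or models.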

User.get_attended_birthday_parties() or Event.get_attended_parties(user) work fine: it's an interface that makes sense when you use it. Creating an additional "all-purpose" object will not make your code cleaner or easier to maintain.

How to differentiate model classes between webapp and unittest

I've started looking into unit testing on Google App Engine, and it seems to be a bit tricky from what I've read, since you can't (and aren't supposed to) run your tests against the datastore.
I've written an abstract class to emulate a datastore model class. It works quite nicely, returning mock-up data on get, all, fetch and so on (only tried on a small scale) in db.Model-like results.
The one thing I haven't found a satisfying solution for is how to choose which model class to use. I want to use the mock-ups for unit tests and the actual db.Model when the webapp is running.
My current solution looks like this in my .py containing all db.Models:
if 'SERVER_SOFTWARE' in os.environ:
    class dbTest(db.Model):
        content = db.StringProperty()
        comments = db.ListProperty(str)
else:
    class dbTest(Abstract):
        content = 'Test'
        comments = ['test1', 'test2']
And it kinda feels like it could break any minute. Is this the way to go, or could one combine these into one class that uses db.Model if it is invoked properly and the mock-up otherwise?

Check out gaetestbed (docs). It stubs out the datastore (and all the other services like memcache) and makes testing from the command line very easy. It ensures a clean environment before every test runs.
I personally think it is nicer than the other solutions I have seen.

Instead of messing up your models.py, I would go with gaeunit.
I've used it with success in a couple of projects and the features I like are:
Just one file to add to your project (gaeunit.py) and you are almost done
Gaeunit isolates the test datastore from the development store (i.e. tests don't pollute your development db)

"Since you can't (and aren't supposed to) run your tests against the datastore."
This is not true. You can and should use the local datastore implementation as a test harness - there's no reason to waste your time creating mocks for every datastore behaviour. You can use a tool such as noseGAE or gaeunit, as suggested by other posters, but if you want to set it up yourself, see this snippet.

There's more than one problem you're trying to address here...
First, for running tests with GAE emulation you can take a look at gaeunit, which I like best. If you don't want to run them from the browser then you can take a look at noseGAE (part of nose). This should give you command-line testing.
Second, regarding your comment about 'creating an overhead of dependencies', it sounds like you're searching for a good unit testing and mocking framework. These will let you mock out the database for tests which don't need to hit it. Try mox and mockito for Python.
