I want to use the --keepdb option that allows keeping the database after running tests, because in my case the creation of the database takes a while (but running the tests is actually fast).
However, I would like to keep the database structure only, not the data.
Each time I run tests, I need an empty database.
I know I could use the tearDown method to delete each object created, but that is a tedious and error-prone way to do it.
I just need to find a way to tell Django to flush the whole database (not destroy it) at the end of the unit tests.
I have been thinking of making a very simple script that:
runs the tests, keeping the db: manage.py test --keepdb
runs manage.py flush --database test_fugodb
However, at the 2nd step, I get django.db.utils.ConnectionDoesNotExist: The connection test_fugodb doesn't exist. What is the name of this test db? I took the one displayed when running tests:
What's wrong?
Thanks!
Drop all the collections of the test database after the tests.
This way, the db doesn't need to be recreated, and it is empty after the tests.
As for the test database name problem, it is described in the Django docs:
https://docs.djangoproject.com/en/1.8/topics/testing/overview/#the-test-database
You could create your own test runner to replace your script:
django unit tests without a db
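A minimal sketch of such a runner, assuming Django >= 1.8's DiscoverRunner and that tests are run with --keepdb (the class name and module layout are made up):

from django.conf import settings
from django.core.management import call_command
from django.test.runner import DiscoverRunner

class KeepFlushTestRunner(DiscoverRunner):
    """Keep the test database between runs, but empty it after each run."""

    def teardown_databases(self, old_config, **kwargs):
        # At this point the connections still point at the test databases,
        # so flush hits the test db, not the real one.
        for alias in settings.DATABASES:
            call_command('flush', database=alias, interactive=False,
                         verbosity=self.verbosity)
        # With --keepdb the default teardown preserves the database itself.
        super(KeepFlushTestRunner, self).teardown_databases(old_config, **kwargs)

Point TEST_RUNNER at this class and run ./manage.py test --keepdb. This also sidesteps the test_fugodb naming problem, since the flush runs while the test connections are still active.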
Related
I have tests which have a huge variance in their runtime. Most will take much less than a second, some maybe a few seconds, some of them could take up to minutes.
Can I somehow specify that in my Nosetests?
In the end, I want to be able to run only a subset of my tests which take e.g. less than 1 second (via my specified expected runtime estimate).
Have a look at this write-up about the attribute plugin for nose tests, where you can manually tag tests as @attr('slow') and @attr('fast'). You can run nosetests -a '!slow' afterwards to run your tests quickly.
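For reference, a minimal sketch of the tagging (nose.plugins.attrib ships with nose; the test bodies here are placeholders):

from nose.plugins.attrib import attr

@attr('slow')
def test_full_import():
    pass  # stands in for a test that takes minutes

@attr('fast')
def test_parse_header():
    pass  # stands in for a sub-second test

nosetests -a '!slow' then skips everything tagged slow.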
It would be great if you could do it automatically, but I'm afraid you would have to write additional code to do it on the fly. If you are into rapid development, I would run nose with xunit XML output enabled (which tracks the runtime of each test). Your test module can then dynamically read in the XML output file from previous runs and set attributes on tests accordingly to filter out the quick ones. This way you do not have to do it manually, alas with more work (and you have to run all tests at least once).
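A rough sketch of the automatic variant, assuming a previous nosetests --with-xunit run left a nosetests.xml behind (the file name and 1-second threshold are assumptions):

import xml.etree.ElementTree as ET

def slow_test_ids(xml_path='nosetests.xml', threshold=1.0):
    """Return the set of 'classname.name' ids slower than threshold seconds."""
    root = ET.parse(xml_path).getroot()
    return set(
        '%s.%s' % (case.get('classname'), case.get('name'))
        for case in root.iter('testcase')
        if float(case.get('time', 0)) > threshold
    )

A module-level hook could then tag any test whose id shows up in this set as slow.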
I have a test in Django 1.5 that passes in these conditions:
when run by itself in isolation
when the full TestCase is run
when all of my app's tests are run
But it fails when the full test suite is run with python manage.py test. Why might this be happening?
The aberrant test uses django.test.Client to POST some data to an endpoint, and then the test checks that an object was successfully updated. Could some other app be modifying the test client or the data itself?
I have tried some print debugging and I see all of the data being sent and received as expected. The specific failure is a does-not-exist exception that is raised when I try to fetch the to-be-updated object from the db. Strangely, in the exception handler itself, I can query for all objects of that type and see that the target object does in fact exist.
Edit:
My issue was resolved when I found that I was querying for the target object by id and User and not id and UserProfile, but it's still confusing to me that this would work in some cases but fail in others.
I also found that the test would fail with python manage.py test auth <myapp>
It sounds like your problem does not involve mocks, but I just spent all day debugging an issue with similar symptoms, and your question is the first one that came up when I was searching for a solution, so I wanted to share my solution here, just in case it will prove helpful for others. In my case, the issue was as follows.
I had a single test that would pass in isolation, but fail when run as part of my full test suite. In one of my view functions I was using the Django send_mail() function. In my test, rather than having it send me an email every time I ran my tests, I patched send_mail in my test method:
from mock import patch
...
def test_stuff(self):
    ...
    with patch('django.core.mail.send_mail') as mocked_send_mail:
        ...
That way, after my view function is called, I can test that send_mail was called with:
self.assertTrue(mocked_send_mail.called)
This worked fine when running the test on its own, but failed when it was run with other tests in the suite. The reason it fails is that when it runs as part of the suite, other views are called beforehand, causing the views.py file to be loaded and send_mail to be imported before I get the chance to patch it. So when send_mail is called in my view, it is the actual send_mail that runs, not my patched version. When I run the test alone, the function is mocked before views.py is imported, so the patched version ends up being imported when views.py is loaded. This situation is described in the mock documentation, which I had read a few times before, but now understand quite well after learning the hard way...
The solution was simple: instead of patching django.core.mail.send_mail I just patched the version that had already been imported in my views.py - myapp.views.send_mail. In other words:
with patch('myapp.views.send_mail') as mocked_send_mail:
...
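Putting it together, a fuller sketch of the fixed test (the URL and module names are assumptions):

from mock import patch
from django.test import TestCase

class ContactViewTests(TestCase):
    def test_sends_notification_mail(self):
        # Patch the name where it is looked up (myapp.views),
        # not where it is defined (django.core.mail).
        with patch('myapp.views.send_mail') as mocked_send_mail:
            self.client.post('/contact/', {'message': 'hello'})
        self.assertTrue(mocked_send_mail.called)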
Try this to help you debug:
./manage.py test --reverse
In my case I realised that one test was updating certain data which would cause the following test to fail.
Another possibility is that you've disconnected signals in the setUp of a test class and did not reconnect them in the tearDown. This explained my issue.
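For illustration, a minimal sketch of that mistake and its fix (the model, receiver, and signal names are made up):

from django.db.models.signals import post_save
from django.test import TestCase

from myapp.models import Comment          # hypothetical model
from myapp.signals import notify_author   # hypothetical receiver

class CommentTests(TestCase):
    def setUp(self):
        # Silence the receiver for these tests...
        post_save.disconnect(notify_author, sender=Comment)

    def tearDown(self):
        # ...and reconnect it, or every test class that runs after this one
        # sees the signal silently disabled.
        post_save.connect(notify_author, sender=Comment)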
There is a lot of nondeterminism that can come from tests that involve the database.
For instance, most databases don't offer deterministic selects unless you do an order by. This leads to the strange behavior where when the stars align, the database returns things in a different order than you might expect, and tests that look like
result = pull_stuff_from_database()
assert result[0] == 1
assert result[1] == 2
will fail because result[0] == 2 and result[1] == 1.
Another source of strange nondeterministic behavior is the id autoincrement together with sorting of some kind.
Let's say each test creates two items, and you sort by item name before you do assertions. When you run it by itself, "Item 1" and "Item 2" work fine and pass the test. However, when you run the entire suite, one of the tests generates "Item 9" and "Item 10". "Item 10" sorts ahead of "Item 9", so your test fails because the order is flipped.
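The flip is plain lexicographic string sorting, as a quick interpreter session shows:

>>> sorted(["Item 1", "Item 2"])
['Item 1', 'Item 2']
>>> sorted(["Item 9", "Item 10"])
['Item 10', 'Item 9']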
So I first read @elethan's answer and went 'well, this is certainly not my problem, I'm not patching anything'. But it turned out I was indeed patching a method in a different test suite, which stayed permanently patched for the rest of the test run.
I had something of this sort going on:
from mock import MagicMock

send_comment_published_signal_mock = MagicMock()
comment_published_signal.send = send_comment_published_signal_mock
You can see why this would be a problem if some things are not cleaned up after the test suite runs. The solution in my case was to use a with statement to restrict the scope of the patch:
signal = 'authors.apps.comments.signals.comment_published_signal.send'
with patch(signal) as comment_published_signal_mock:
    do_your_test_stuff()
This is the easy part, though, once you know where to look. The guilty test could come from anywhere. The solution is to run the failing test together with other tests until you find the cause, then progressively narrow it down, module by module.
Something like:
./manage.py test A C.TestModel.test_problem
./manage.py test B C.TestModel.test_problem
./manage.py test D C.TestModel.test_problem
Then recursively, if for example B is the problem child:
./manage.py test B.TestModel1 C.TestModel.test_problem
./manage.py test B.TestModel2 C.TestModel.test_problem
./manage.py test B.TestModel3 C.TestModel.test_problem
This answer gives a good explanation for all this.
This answer is in the context of django, but can really apply to any python tests.
Good luck.
This was happening to me too.
When running the tests individually they passed, but running all tests with ./manage.py test it failed.
The problem in my case was that I had some tests inheriting from unittest.TestCase instead of django.test.TestCase, so some tests were failing because there were records left in the database from previous tests.
After making all tests inherit from django.test.TestCase, this problem went away.
I found the answer on https://stackoverflow.com/a/436795/6490637
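The difference matters because django.test.TestCase wraps each test in a transaction that is rolled back afterwards, while plain unittest.TestCase leaves whatever the test wrote in the database. The fix is just the import; for example (the model name is made up):

from django.test import TestCase   # not: from unittest import TestCase

from myapp.models import Widget    # hypothetical model

class WidgetTests(TestCase):
    def test_create(self):
        Widget.objects.create(name='w1')
        # Rolled back automatically after the test, so later tests start
        # from a clean database.
        self.assertEqual(Widget.objects.count(), 1)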
Scenario:
Suppose I have some Python scripts that do maintenance on the DB. If I use unittest to run tests on those scripts, they will interact with the DB and clobber it. So I'm trying to find a way to use the native Django test suite, which simulates the DB without clobbering the real one. From the docs it looks like you can run test scripts using manage.py test /path/to/someTests.py, but only on Django >= v1.6 (I'm on Django-Nonrel v1.5.5).
I found How do I run a django TestCase manually / against other database, which talks about running individual test cases, and it seems dated (it mentions Django v1.2). However, I'd like to manually kick off the entire suite of tests I've defined. Assume for now that I can't kick off the suite using manage.py test myApp (because that's not what I'm doing). I tried kicking off the tests using unittest.main(). This works, with one drawback... it completely clobbers the database. Thank goodness for backups.
someTest.py
import unittest
import myModule
from django.test import TestCase

class venvTest(TestCase):  # some test
    def test_outside_venv(self):
        self.failUnlessRaises(EnvironmentError, myModule.main)

if __name__ == '__main__':
    unittest.main()  # this fires off all tests, with the nasty side effect
                     # of totally clobbering the DB
How can I run python someTest.py and get it to fire off all the testcases using the Django test runner?
I can't even get this to work:
>>> venvTest.run()
Update
Since it's been mentioned in the comments, I asked a similar but different question here: django testing external script. That question concerns using manage.py test /someTestScript.py. This question concerns firing off test cases or a test suite separately from manage.py.
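For what it's worth, on Django 1.5 the way to get the test-database machinery from a plain script is to invoke the test runner yourself instead of unittest.main(). A hedged sketch (the settings module path is an assumption):

import os
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'myproject.settings')  # assumed path

from django.test.simple import DjangoTestSuiteRunner  # pre-1.8 runner

if __name__ == '__main__':
    # run_tests() creates the test database, runs the labelled tests
    # against it, and destroys it again; the real DB is untouched.
    runner = DjangoTestSuiteRunner(verbosity=1, interactive=True)
    failures = runner.run_tests(['myApp'])
    raise SystemExit(bool(failures))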
From someone who has a Django application in a non-trivial production environment: how do you handle database migrations? I know there is South, but it seems like that would miss quite a lot if anything substantial is involved.
The other two options (that I can think of or have used) are making the changes on a test database and then (taking the app offline) importing that SQL export; or, perhaps riskier, making the necessary changes on the production database in real time and, if anything goes wrong, reverting to the backup.
How do you usually handle your database migrations and schema changes?
I think there are two parts to this problem.
First is managing the database schema and its changes. We do this using South, keeping both the working models and the migration files in our SCM repository. For safety (or paranoia), we take a dump of the database before (and, if we are really scared, after) running any migrations. South has been adequate for all our requirements so far.
Second is deploying the schema change, which goes beyond just running the migration file generated by South. In my experience, a change to the database normally requires a change to the deployed code. If you have even a small web farm, keeping the deployed code in sync with the current version of your database schema may not be trivial; this gets worse if you consider the different caching layers and the effect on already active site users. Different sites handle this problem differently, and I don't think there is a one-size-fits-all answer.
Solving the second part of this problem is not necessarily straightforward. I don't believe there is a one-size-fits-all approach, and there is not enough information about your website and environment to suggest a solution that would be most suitable for your situation. However, I think there are a few considerations to keep in mind that can help guide deployment in most situations.
Taking the whole site (web servers and database) offline is an option in some cases. It is certainly the most straightforward way to manage updates. But frequent downtime (even when planned) can be a good way to go out of business quickly, makes it tiresome to deploy even small code changes, and might take many hours if you have a large dataset and/or a complex migration. That said, for the sites I help manage (which are all internal and generally only used during working hours on business days), this approach works wonders.
Be careful if you make the changes on a copy of your master database. The main problem here is that your site is still live and presumably accepting writes to the database. What happens to data written to the master database while you are busy migrating the clone for later use? Your site has to either be down the whole time or be put into some temporary read-only state, otherwise you'll lose those writes.
If your changes are backwards compatible and you have a web farm, sometimes you can get away with updating the live production database server (which I think is unavoidable in most situations) and then incrementally updating the nodes in the farm by taking them out of the load balancer for a short period. This can work OK; however, the main problem here is that if a node that has already been updated sends a request for a URL that isn't supported by an older node, you will get failures, as you can't manage that at the load-balancer level.
I've seen/heard a couple of other ways work well.
The first is wrapping all code changes in a feature flag, which is then configurable at run-time through some site-wide configuration option. This essentially means you can release code where all your changes are turned off, and then, after you have made all the necessary updates to your servers, you flip the configuration option to enable the feature. But this makes for quite heavyweight code...
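As an illustration, the flag can be as simple as a settings switch checked in the new code path (the setting name and profile field are made up):

from django.conf import settings

def user_phone(user):
    # The new code path ships disabled; flip ENABLE_PHONE_FIELD on once
    # every server and the schema are up to date.
    if getattr(settings, 'ENABLE_PHONE_FIELD', False):  # hypothetical flag
        return user.profile.phone  # relies on the new schema
    return None                    # old behaviour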
The second is letting the code manage the migration. I've heard of sites where the code changes are written in such a way that they handle the migration at runtime. The code is able to detect the version of the schema being used and the format of the data it got back; if the data is from the old schema it does the migration in place, and if the data is already from the new schema it does nothing. Through natural site usage a large portion of your data will be migrated by people using the site; the rest you can do with a migration script whenever you like.
But I think at this point Google becomes your friend, because, as I say, the solution is very context specific and I'm worried this answer will start to get meaningless... Search for something like "zero downtime deployment" and you'll get results such as this with plenty of ideas...
I use South for a production server with a codebase of ~40K lines and we have had no problems so far. We have also been through a couple of major refactors of some of our models, again with zero problems.
One thing we also have is version control on our model data, which helps us revert any changes we make to records on the application side, with South handling the schema. We use Django Reversion.
I have sometimes taken an unconventional approach to this problem (though reading the other answers, perhaps it's not that unconventional). I never tried it with Django, so I just did some experiments with it.
In short, I let the code catch the exception resulting from the old schema and apply the appropriate schema upgrade. I don't expect this to be the accepted answer - it is only appropriate in some cases (and some might argue never). But I think it has an ugly-duckling elegance.
Of course, I have a test environment which I can reset back to the production state at any point. Using that test environment, I update my schema and write code against it - as usual.
I then revert the schema change and test the new code again. I catch the resulting errors, perform the schema upgrade and then re-try the erring query.
The upgrade function must be written so it will "do no harm" so that if it's called multiple times (as may happen when put into production) it only acts once.
Actual python code - I put this at the end of my settings.py to test the concept, but you would probably want to keep it in a separate module:
from django.db.models.sql.compiler import SQLCompiler
from MySQLdb import OperationalError

orig_exec = SQLCompiler.execute_sql

def new_exec(self, *args, **kw):
    try:
        return orig_exec(self, *args, **kw)
    except OperationalError, e:
        if e[0] != 1054:  # unknown column
            raise
        upgradeSchema(self.connection)
        return orig_exec(self, *args, **kw)

SQLCompiler.execute_sql = new_exec

def upgradeSchema(conn):
    cursor = conn.cursor()
    try:
        cursor.execute("alter table users add phone varchar(255)")
    except OperationalError, e:
        if e[0] != 1060:  # duplicate column name
            raise
Once your production environment is up to date, you are free to remove this self-upgrade code from your codebase. But even if you don't, the code isn't doing any significant unnecessary work.
You would need to tailor the exception class (MySQLdb.OperationalError in my case) and numbers (1054 "unknown column" / 1060 "duplicate column" in my case) to your database engine and schema change, but that should be easy.
You might want to add some additional checks to ensure the SQL being executed is actually erring because of the schema change in question rather than some other problem; but even if you don't, this should re-raise unrelated exceptions. The only penalty is that, in that situation, you'd be trying the upgrade and the bad query twice before raising the exception.
One of my favorite things about python is one's ability to easily override system methods at run-time like this. It provides so much flexibility.
If your database is non-trivial and is PostgreSQL, you have a whole bunch of excellent options SQL-wise, including:
snapshotting and rollback
live replication to a backup server
trial upgrade then live
The trial-upgrade option is nice (but best done in combination with a snapshot):
su postgres
pg_dump <db> > $(date "+%Y%m%d_%H%M").sql
psql template1
# create database upgradetest template <db>;
# \c upgradetest
# \i upgrade_file.sql
...assuming all goes well...
# \q
pg_dump <db> > $(date "+%Y%m%d_%H%M").sql  # we're paranoid
psql <db>
# \i upgrade_file.sql
If you like the above arrangement but are worried about the time it takes to run the upgrade twice, you can lock the db for writes; then, if the upgrade to upgradetest goes well, you can rename db to dbold and upgradetest to db. There are lots of options.
If you have an SQL file listing all the changes you want to make, an extremely handy psql command is \set ON_ERROR_STOP 1. This stops the upgrade script in its tracks the moment something goes wrong. And, with lots of testing, you can make sure nothing does.
There are a whole host of database schema diffing tools available, with a number noted in this StackOverflow answer. But it is basically pretty easy to do by hand ...
pg_dump --schema-only production_db > production_schema.sql
pg_dump --schema-only upgraded_db > upgrade_schema.sql
vimdiff production_schema.sql upgrade_schema.sql
or
diff -Naur production_schema.sql upgrade_schema.sql > changes.patch
vim changes.patch (to check/edit)
South isn't used everywhere. In my organization, for example, we have 3 levels of code testing: a local dev environment, a staging environment, and production.
Local dev is in the developers' hands, where they can experiment according to their needs. Then comes staging, which is kept identical to production, of course, until a db change has to be made on the live site; we make the db changes on staging first, check that everything is working fine, and then manually change the production db, making it identical to staging again.
If it's not trivial, you should have a pre-production database/app that mimics the production one, to avoid downtime on production.
I want to be able to use an existing test database to run my tests against and not have Django create and delete a database every time I want to run the tests. Is this possible?
It's possible; here is a way:
1) Define your own test runner (look here to see how).
2) For your custom test runner, look at the default test runner: you can just copy and paste the code and comment out this line: connection.creation.destroy_test_db(old_name, verbosity), which is responsible for destroying the test database. I also think you should put the connection.creation.create_test_db(..) line in a try/except, something like this maybe:
try:
    # Create the database the first time.
    connection.creation.create_test_db(verbosity, autoclobber=not interactive)
except ..:  # Look at the error raised when creating a database that already exists
    # Test database already created.
    pass
3) Bind TEST_RUNNER in settings.py to your test runner (see the snippet after step 4).
4) Now run your tests like this: ./manage.py test
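For step 3, the settings entry is just a dotted path to your runner class (the path here is an assumption; point it at wherever you defined the runner):

# settings.py
TEST_RUNNER = 'myproject.testing.KeepTestDBRunner'  # hypothetical path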
For those using Django >= 1.8:
python manage.py test --keepdb
--keepdb Preserves the test database between test runs. This has the advantage of skipping both the create and destroy actions which can greatly decrease the time to run tests, especially those in a large test suite. If the test database does not exist, it will be created on the first run and then preserved for each subsequent run. Any unapplied migrations will also be applied to the test database before running the test suite.