I am working on testing database accessor methods using a DB API2 methods in Python with Pytest. Automated testing is new to me and I can't seem to figure out what should be done in the case of testing a database with fixtures. I would like to check whether getting fields in a table are successful. To be able to get the same result, I intend to add a row entry every time I run some tests and delete the row after each test that depends on it. The terms I have heard are 'setUp' and 'tearDown' although I have also read that using yield is newer syntax.
My conceptual question whose answer I would like to figure out before writing code is:
What happens when the 'tearDown' portion of the fixture fails? How do I return the database to the same state without the added row entry? Is there a way of recovering from this? I still need the rest of the data in the database?
I read this article [with unittest] that explains what runs when setting up and tearing down methods fail but it falls short on providing an answer to my question.
A common practice is to run each test in its own database transaction, so that no matter what happens any changes are rolled back and database gets returned to a clean state.
Related
I have a couple of SQL statements stored as files which get executed by a Python script. The database is hosted in Snowflake and I use Snowflake SQLAlchemy to connect to it.
How can I test those statements? I don't want to execute them, I just want to check if they could be executable.
One very basic thing to check if it is valid standard SQL. A better answer would be something that considers snowflake-specific stuff like
copy into s3://example from table ...
The best answer would be something that also checks permissions, e.g. for SELECT statements if the table is visible / readable.
An in-memory sqlite database is one option. But if you are executing raw SQL queries against snowflake in your code, your tests may fail if the same syntax isn't valid against sqlite. Recording your HTTP requests against a test snowflake database, and then replaying them for your unit tests suits this purpose better. There are two very good libraries that do this, check them out:
vcrpy
betamax
We do run integration tests on our Snowflake databases. We maintain clones of our production databases, for example, one of our production databases is called data_lake and we maintain a clone that gets cloned on a nightly basis called data_lake_test which is used to run our integration tests against.
Like Tim Biegeleisen mentioned, a "true" unittest would mock the response but our integration tests do run real Snowflake queries on our test cloned databases. There is the possibility that a test drastically alters the test database, but we run integration tests only during our CI/CD process so it is rare if there is ever a conflict between two tests.
I very much like this idea, however I can suggest a work around, as I often have to check my syntax and need help there. What I would recommend, if you plan on using the Snowflake interface would be to make sure to use the LIMIT 10 or LIMIT 1 on the SELECT statements that you would be needing to validate.
Another tip I would recommend is talking to a Snowflake representative about a trial if you are just getting started. They will also have alot of tips for more specific queries you are seeking to validate.
And finally, based on some comments, make sure it uses SQL: ANSI and the live in the https://docs.snowflake.net/manuals/index.html for reference.
As far as the validity of the sql statement is a concern you can run explain of the statement and it should give you error if syntax is incorrect or if you do not have permission to access the object/database. That being there still some exceptions which you cannot run explain for like 'use' command which I do not think is needed for validation.
Hope this helps.
I have a django app and I am running some unit tests on it. So the problem I am having is not when one test inserts into the test db. It is the tests that come after. Since each test is not saving the transaction, the entry from a previous test is not there which is fine, although the auto increment id's are increasing as if there are still entries into the database. Which I need to fix because I am inserting more data where I cannot control the id's given to it and need to be able to grab this specific data for the test. If I hard code the code to grab the objects I will have to change the code for every time I add a new test, which is not ideal.
I have multiple tests running, but for simplicity sake, I will show two.
from django.test import TestCase
from app.models import Model
class VersionMerge(TestCase):
fixtures = ['initial_test_data.json']
def test_model_test1(self):
*Insert new data*
*grab new data in*
*Check data*
def test_model_test2(self):
*Insert new data*
*grab new data*
*Check data*
The problem arises in test_model_test2 where when I try to grab the new data, I have to print the object out to see the id's to be able to grab it.
I have a solution on how I can fix this on the actual database but not the test one. For mine I need to be able to connect to a docker container and run a psql command to reset the table_id_seq.
docker exec -t $CONTAINER_ID psql --dbname=test_database_name -username=user -c "SELECT setval('modelName_appName_id_seq', 2, true)"
This will go to the table and set the last id value used to be 2 to make the next id 3. However whenever I try to run the command from inside python using
cmd = "command above"
os.system(cmd)
and when I run this I get the following error.
sh: 1: docker: not found
sh: 1: docker: not found
Looking for any help on this, either a new solution to the problem or improvements on mine.
TLDR; I need to be able to modify data in the database that the django unit tests create.
If you need a test to reset the primary key sequence, you can issue a RawSQL query doing that. How to do that exactly is answered in this StackOverflow question.
An easier option is available if you're using pytest. We're using pytest and pytest-django in all our Django projects and it makes testing a breeze. pytest-django provdes a database fixture that can take a boolean parameter to reset the sequences. Use it like so:
#pytest.mark.django_db(transaction=True, reset_sequences=True)
def mytest():
[...]
I got this to work by replacing TestCase with TransactionTestCase and set reset_sequences=True. However the tests are running slower.
from django.test import TransactionTestCase
class ViewTest(TransactionTestCase):
reset_sequences = True
def test_view_redirects(self):
...
Here's the doc
You are using a fixtures file - put whatever data you want in there; then edit it in your test as needed.
Although, that is harder to maintain. In my opinion - there are far better options that may work much more like what you intend.
You're better off using something like factory_boy and generating the models (and related foreign keys) at instantiation with dummy data you provide.
That way you know exactly what is being tested and it's completely independent of everything else. The nice part is that with factory_boy you will have a factories.py file you can keep up to date much easier than working with some fixture.
There are other options like Mixer or model_mommy, although I only have experience with factory_boy and mixer.
With factory_boy it might look something like this:
def test_model_test1(self):
factory.ModelFactory(
some_specific_attribute='some_specific_value'
)
model = Model.objects.all().first()
# Test against your model
I am close to finishing an ORM for RethinkDB in Python and I got stuck at writing tests. Particularly at those involving save(), get() and delete() operations. What's the recommended way to test whether my ORM does what it is supposed to do when saving or deleting or getting a document?
Right now, for each test in my suite I create a database, populate it with all tables needed by the test models (this takes a lot of time, almost 5 seconds/test!), run the operation on my model (e.g.: save()) and then manually run a query against the database (using RethinkDB's Python driver) to see whether everything has been updated in the database.
Now, I feel this isn't just right; maybe there is another way to write these tests or maybe I can design the tests without even running that many queries against the database. Any idea on how can I improve this or a suggestion on how this has to be really done?
You can create all your databases/tables just once for all your test.
You can also use the raw data directory:
- Start RethinkDB
- Create all your databases/tables
- Commit it.
Before each test, copy the data directory, start RethinkDB on the copy, then when your test is done, delete the copied data directory.
I have a script which cleans the database, and this is widely used in our tests.
First, we tried to use SQLAlchemy Metadata.drop_all() thing, but it didn't resolve some foreign keys on deletion, which caused errors. Then, I found this script from #zzzeek, which does almost the same, but in a "smart" way. It handles all the issues with foreign keys, but now there are several issues regarding changed custom types (ENUMs). The question is, how can I drop them all them using SQLAlchemy? Execute DROP TYPE by hand only?
Tables in database are created with Alembic, and even while script above deletes all tables successfully, some custom ENUMs are still there and everything fails on attempt to recreate them.
Recreating the whole database is not a preferred solution, because default DB user for application shouldn't normally have rights to create databases.
Are you sure your Metadata instance fully describes all the tables?
Try:
Metadata.reflect()
Metadata.drop_all()
This is an ancient question, but it's still possible to come across this problem if create_type=False is on any postgresql.ENUM definitions.
Per the SQLAlchemy docs on create_type,
When False, no check will be performed and no CREATE TYPE or DROP TYPE is emitted, unless ENUM.create() or ENUM.drop() are called directly.
That means when running tests, while there may be a setup & teardown with create_all() and drop_all(), neither will affect custom enum types.
The solution is simply to remove create_type=False, since True is the default. Then all custom types will be created at the beginning of testing and dropped at the end, resulting in a perfectly clean test database.
I have a problem with SQL Alchemy - my app works as a constantly working python application.
I have function like this:
def myFunction(self, param1):
s = select([statsModel.c.STA_ID, statsModel.c.STA_DATE)])\
.select_from(statsModel)
statsResult = self.connection.execute(s).fetchall()
return {'result': statsResult, 'calculation': param1}
I think this is clear example - one result set is fetched from database, second is just passed as argument.
The problem is that when I change data in my database, this function still returns data like nothing was changed. When I change data in input parameter, returned parameter "calculation" has proper value.
When I restart the app server, situation comes back to normal - new data are fetched from MySQL.
I know that there were several questions about SQLAlchemy caching like:
How to disable caching correctly in Sqlalchemy orm session?
How to disable SQLAlchemy caching?
but how other can I call this situation? It seems SQLAlchemy keeps the data fetched before and does not perform new queries until application restart. How can I avoid such behavior?
Calling session.expire_all() will evict all database-loaded data from the session. Any access of object attributes subsequent emits a new SELECT statement and gets new data back. Please see http://docs.sqlalchemy.org/en/latest/orm/session_state_management.html#refreshing-expiring for background.
If you still see so-called "caching" after calling expire_all(), then you need to close out transactions as described in my answer linked above.
A few possibilities.
You are reusing your session improperly or at improper time. Best practice is to throw away your session after you commit, and get a new one at the last possible moment before you use it. The behavior that appears to be caching may in fact be due to a session lifetime being very long in your application.
Objects that survive longer than the session are not being merged into a subsequent session. The "metadata" may not be able to update their state if you do not merge them back in. This is more a concern for the ORM API of SQLAlchemy, which you do not appear so far to be using.
Your changes are not committed. You say they are so we'll assume this is not it, but if none of the other avenues explain it you may want to look again.
One general debugging tip: if you want to know exactly what SQLAlchemy is doing in the database, pass echo=True to the create_engine function. The engine will print all queries it runs.
Also check out this suggestion I made to someone else, who was using ORM and had transactionality problems, which resolved their issue without ever pinpointing it. Maybe it will help you.
You need to change transaction isolation level to READ_COMMITTED
http://docs.sqlalchemy.org/en/rel_0_9/dialects/mysql.html#mysql-isolation-level