How can I create a unit test for SQL statements? - python

I have a couple of SQL statements stored as files which get executed by a Python script. The database is hosted in Snowflake and I use Snowflake SQLAlchemy to connect to it.
How can I test those statements? I don't want to execute them, I just want to check if they could be executable.
One very basic thing to check if it is valid standard SQL. A better answer would be something that considers snowflake-specific stuff like
copy into s3://example from table ...
The best answer would be something that also checks permissions, e.g. for SELECT statements if the table is visible / readable.

An in-memory sqlite database is one option. But if you are executing raw SQL queries against snowflake in your code, your tests may fail if the same syntax isn't valid against sqlite. Recording your HTTP requests against a test snowflake database, and then replaying them for your unit tests suits this purpose better. There are two very good libraries that do this, check them out:
vcrpy
betamax

We do run integration tests on our Snowflake databases. We maintain clones of our production databases, for example, one of our production databases is called data_lake and we maintain a clone that gets cloned on a nightly basis called data_lake_test which is used to run our integration tests against.
Like Tim Biegeleisen mentioned, a "true" unittest would mock the response but our integration tests do run real Snowflake queries on our test cloned databases. There is the possibility that a test drastically alters the test database, but we run integration tests only during our CI/CD process so it is rare if there is ever a conflict between two tests.

I very much like this idea, however I can suggest a work around, as I often have to check my syntax and need help there. What I would recommend, if you plan on using the Snowflake interface would be to make sure to use the LIMIT 10 or LIMIT 1 on the SELECT statements that you would be needing to validate.
Another tip I would recommend is talking to a Snowflake representative about a trial if you are just getting started. They will also have alot of tips for more specific queries you are seeking to validate.
And finally, based on some comments, make sure it uses SQL: ANSI and the live in the https://docs.snowflake.net/manuals/index.html for reference.

As far as the validity of the sql statement is a concern you can run explain of the statement and it should give you error if syntax is incorrect or if you do not have permission to access the object/database. That being there still some exceptions which you cannot run explain for like 'use' command which I do not think is needed for validation.
Hope this helps.

Related

What to do when database fixture teardown fails in testing?

I am working on testing database accessor methods using a DB API2 methods in Python with Pytest. Automated testing is new to me and I can't seem to figure out what should be done in the case of testing a database with fixtures. I would like to check whether getting fields in a table are successful. To be able to get the same result, I intend to add a row entry every time I run some tests and delete the row after each test that depends on it. The terms I have heard are 'setUp' and 'tearDown' although I have also read that using yield is newer syntax.
My conceptual question whose answer I would like to figure out before writing code is:
What happens when the 'tearDown' portion of the fixture fails? How do I return the database to the same state without the added row entry? Is there a way of recovering from this? I still need the rest of the data in the database?
I read this article [with unittest] that explains what runs when setting up and tearing down methods fail but it falls short on providing an answer to my question.
A common practice is to run each test in its own database transaction, so that no matter what happens any changes are rolled back and database gets returned to a clean state.

SQLite query restrictions

I am building a little interface where I would like users to be able to write out their entire sql statement and then see the data that is returned. However, I don't want a user to be able to do anything funny ie delete from user_table;. Actually, the only thing I would like users to be able to do is to run select statements. I know there aren't specific users for SQLite, so I am thinking what I am going to have to do, is have a set of rules that reject certain queries. Maybe a regex string or something (regex scares me a little bit). Any ideas on how to accomplish this?
def input_is_safe(input):
input = input.lower()
if "select" not in input:
return False
#more stuff
return True
I can suggest a different approach to your problem. You can restrict the access to your database as read-only. That way even when the users try to execute delete/update queries they will not be able to damage your data.
Here is the answer for Python on how to open a read-only connection:
db = sqlite3.connect('file:/path/to/database?mode=ro', uri=True)
Python's sqlite3 execute() method will only execute a single SQL statement, so if you ensure that all statements start with the SELECT keyword, you are reasonably protected from dumb stuff like SELECT 1; DROP TABLE USERS. But you should check sqlite's SQL syntax to ensure there is no way to embed a data definition or data modification statement as a subquery.
My personal opinion is that if "regex scares you a little bit", you might as well just put your computer in a box and mail it off to <stereotypical country of hackers>. Letting untrusted users write SQL code is playing with fire, and you need to know what you're doing or you'll get fried.
Open the database as read only, to prevent any changes.
Many statements, such as PRAGMA or ATTACH, can be dangerous. Use an authorizer callback (C docs) to allow only SELECTs.
Queries can run for a long time, or generate a large amount of data. Use a progress handler to abort queries that run for too long.

Testing an ORM for RethinkDB

I am close to finishing an ORM for RethinkDB in Python and I got stuck at writing tests. Particularly at those involving save(), get() and delete() operations. What's the recommended way to test whether my ORM does what it is supposed to do when saving or deleting or getting a document?
Right now, for each test in my suite I create a database, populate it with all tables needed by the test models (this takes a lot of time, almost 5 seconds/test!), run the operation on my model (e.g.: save()) and then manually run a query against the database (using RethinkDB's Python driver) to see whether everything has been updated in the database.
Now, I feel this isn't just right; maybe there is another way to write these tests or maybe I can design the tests without even running that many queries against the database. Any idea on how can I improve this or a suggestion on how this has to be really done?
You can create all your databases/tables just once for all your test.
You can also use the raw data directory:
- Start RethinkDB
- Create all your databases/tables
- Commit it.
Before each test, copy the data directory, start RethinkDB on the copy, then when your test is done, delete the copied data directory.

Deploying database changes with Python

I'm wondering if someone can recommend a good pattern for deploying database changes via python.
In my scenario, I've got one or more PostgreSQL databases and I'm trying to deploy a code base to each one. Here's an example of the directory structure for my SQL scripts:
my_db/
main.sql
some_directory/
foo.sql
bar.sql
some_other_directory/
baz.sql
Here's an example of what's in main.sql
/* main.sql has the following contents: */
BEGIN TRANSACTION
\i some_directory/bar.sql
\i some_directory/foo.sql
\i some_other_directory/baz.sql
COMMIT;
As you can see, main.sql defines a specific order of operations and a transaction for the database updates.
I've also got a python / twisted service monitoring SVN for changes in this db code, and I'd like to automatically deploy this code upon discovery of new stuff from the svn repository.
Can someone recommend a good pattern to use here?
Should I be parsing each file?
Should I be shelling out to psql?
...
What you're doing is actually a decent approach if you control all the servers and they're all postgresql servers.
A more general approach is to have a directory of "migrations" which are generally classes with an apply() and undo() that actually do the work in your database, and often come with
abstractions like .create_table() that generate the DDL instructions specific to whatever RDBMS you're using.
Generally, you have some naming convention that ensures the migrations run in the order they were created.
There's a migration library for python called South, though it appears to be geared specifically toward django development.
http://south.aeracode.org/docs/about.html
We just integrated sqlalchemy-migrate which has some pretty Rails-like conventions but with the power of SQLAlchemy. It's shaping up to be a really awesome product, but it does have some downfalls. It's pretty seamless to integrate though.

DB Permissions with Django unit testing

Disclaimer:
I'm very new to Django. I must say that so far I really like it. :)
(now for the "but"...)
But, there seems to be something I'm missing related to unit testing. I'm working on a new project with an Oracle backend. When you run the unit tests, it immediately gives a permissions error when trying to create the schema. So, I get what it's trying to do (create a clean sandbox), but what I really want is to test against an existing schema. And I want to run the test with the same username/password that my server is going to use in production. And of course, that user is NOT going to have any kind of DDL type rights.
So, the basic problem/issue that I see boils down to this: my system (and most) want to have their "app_user" account to have ONLY the permissions needed to run. Usually, this is basic "CRUD" permissions. However, Django unit tests seem to need more than this to do a test run.
How do other people handle this? Is there some settings/work around/feature of Django that I'm not aware (please refer to the initial disclaimer).
Thanks in advance for your help.
David
Don't force Django to do something unnatural.
Allow it to create the test schema. It's a good thing.
From your existing schema, do an unload to create .JSON dump files of the data. These files are your "fixtures". These fixtures are used by Django to populate the test database. This is The Greatest Testing Tool Ever. Once you get your fixtures squared away, this really does work well.
Put your fixture files into fixtures directories within each app package.
Update your unit tests to name the various fixtures files that are required for that test case.
This -- in effect -- tests with an existing schema. It rebuilds, reloads and tests in a virgin database so you can be absolutely sure that it works without destroying (or even touching) live data.
As you've discovered, Django's default test runner makes quite a few assumptions, including that it'll be able to create a new test database to run the tests against.
If you need to override this or any of these default assumptions, you probably want to write a custom test runner. By doing so you'll have full control over exactly how tests are discovered, bootstrapped, and run.
(If you're running Django's development trunk, or are looking forward to Django 1.2, note that defining custom test runners has recently gotten quite a bit easier.)
If you poke around, you'll find a few examples of custom test runners you could use to get started.
Now, keep in mind that once you've taken control of test running you'll need to ensure that you someone meet the same assumptions about environment that Django's built-in runner does. In particular, you'll need to someone guarantee that whatever test database you'll use is a clean, fresh one for the tests -- you'll be quite unhappy if you try to run tests against a database with unpredictable contents.
After I read David's (OP) question, I was curious about this too, but I don't see the answer I was hoping to see. So let me try to rephrase what I think at least part of what David is asking. In a production environment, I'm sure his Django models probably will not have access to create or drop tables. His DBA will probably not allow him to have permission to do this. (Let's assume this is True). He will only be logged into the database with regular user privileges. But in his development environment, the Django unittest framework forces him to have higher level privileges for the unittests instead of a regular user because Django requires it to create/drop tables for the model unittests. Since the unittests are now running at a higher privilege than will occur in production, you could argue that running the unittests in development are not 100% valid and errors could happen in production that might have been caught in development if Django could run the unittests with user privileges.
I'm curious if Django unittests will ever have the ability to create/drop tables with one user's (higher) privileges, and run the unittests with a different user's (lower) privileges. This would help more accurately simulate the production environment in development.
Maybe in practice this is really not an issue. And the risk is so minor compared to the reward that it not worth worrying about it.
Generally speaking, when unit tests depend on test data to be present, they also depend on it to be in a specific format/state. As such, your framework's policy is to not only execute DML (delete/insert test data records), but it also executes DDL (drop/create tables) to ensure that everything is in working order prior to running your tests.
What I would suggest is that you grant the necessary privileges for DDL to your app_user ONLY on your test_ database.
If you don't like that solution, then have a look at this blog entry where a developer also ran into your scenario and solved it with a workaround:
http://www.stopfinder.com/blog/2008/07/26/flexible-test-database-engine-selection-in-django/
Personally, my choice would be to modify the privileges for the test database. This way, I could rule out all other variables when comparing performance/results between testing/production environments.
HTH,
-aj
What you can do, is creating separate test settings.
As I've learned at http://mindlesstechnology.wordpress.com/2008/08/16/faster-django-unit-tests/ you can use the sqlite3 backend, which is created in memory by the Django unit test framework.
Quoting:
Create a new test-settings.py file next to your app’s settings.py
containing:
from projectname.settings import * DATABASE_ENGINE = 'sqlite3'
Then when you want to run tests real fast, instead of manage.py test,
you run
manage.py test --settings=test-settings
This runs my test suite in less than 5 seconds.
Obviously you still want to run tests on your real db backend, but
this is awesome for sanity checks, and while you’re doing test
development.
To load initial data, provide fixtures in your testcase.
class MyAppTestCase(TestCase):
fixtures = ['myapp/fixtures/filename']

Categories

Resources