How to retrieve the real SQL from the Django logger?

I am trying to analyse the SQL performance of our Django (1.3) web application. I have added a custom log handler attached to django.db.backends and set DEBUG = True, which lets me see all the database queries being executed.
However, the SQL is not valid SQL! The actual query is select * from app_model where name = %s with some parameters passed in (e.g. "admin"), but the logging message doesn't quote the parameters, so the SQL becomes select * from app_model where name = admin, which is wrong. The same happens with django.db.connection.queries. AFAIK the Django Debug Toolbar uses a complex custom cursor to handle this.
Update: For those suggesting the Django Debug Toolbar: I am aware of that tool, and it is great. However, it does not do what I need. I want to run a sample interaction with our application and aggregate the SQL that's used. DjDT is great for displaying queries and for shallow learning, but not for aggregating and summarizing the interaction across dozens of pages.
Is there any easy way to get the real, legit, SQL that is run?

Check out django-debug-toolbar. Open a page, and a sidebar will be displayed with all SQL queries plus other information.

select * from app_model where name = %s is a prepared statement. I would recommend logging the statement and the parameters separately. To get a well-formed query you need to do something like "select * from app_model where name = %s" % quote_string("user") or, more generally, query % tuple(map(quote_string, params)).
Please note that quote_string is DB-specific, and the DB-API 2.0 specification does not define a quote_string function, so you need to write one yourself. For logging purposes I'd recommend keeping the queries and parameters separate, as that allows for far better profiling: you can easily group the queries without taking the actual values into account.
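One possible shape for the quote_string helper mentioned above, as a sketch only: quote_string and interpolate are hypothetical names, the rules follow standard SQL string literals (an embedded single quote is doubled), and real databases add their own escaping rules, so the result is meant for reading logs, not for sending back to the database.

```python
from datetime import date


def quote_string(value):
    """Render a Python value as an SQL literal for display in logs.

    Hypothetical helper: follows standard SQL literal rules only, so the
    output is for humans to read and must not be executed.
    """
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, date):  # also covers datetime, a subclass of date
        value = value.isoformat()
    # Standard SQL escapes an embedded single quote by doubling it
    return "'" + str(value).replace("'", "''") + "'"


def interpolate(query, params):
    """Fill the %s placeholders of a pyformat-style query with quoted params."""
    return query % tuple(quote_string(p) for p in params)


if __name__ == "__main__":
    print(interpolate("select * from app_model where name = %s", ["admin"]))
```

Keeping query and params apart in the log and calling interpolate only when displaying a statement preserves the grouping benefit described in the answer.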

The Django Docs state that this incorrect quoting only happens for SQLite.
https://docs.djangoproject.com/en/dev/ref/databases/#sqlite-connection-queries
Have you tried another Database Engine?

Every QuerySet object has a query attribute. One way to do what you want (admittedly perhaps not an ideal one) is to chain the lookups each view produces into a kind of scripted user story using Django's test client. For each lookup in your user story, append the query to a file-like object that you write out at the end; for example (using a list instead, for brevity):
queries = []
qs = Object.objects.all()  # Object stands in for one of your models
queries.append(str(qs.query))
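A variation on collecting queries in a list is to let a logging handler do the bookkeeping while the scripted user story runs. The sketch below assumes each query log record carries sql and duration attributes supplied through extra=, which matches what Django's django.db.backends logger passes in many versions (verify against yours); whether record.sql holds the %s template or an already interpolated string also varies by version. QueryAggregator is a hypothetical name.

```python
import logging
from collections import defaultdict


class QueryAggregator(logging.Handler):
    """Tally SQL statements and their durations as they are logged.

    Assumes query records carry ``sql`` and ``duration`` attributes
    (supplied via ``extra=``); records without ``sql`` are ignored.
    """

    def __init__(self):
        super().__init__(level=logging.DEBUG)
        self.timings = defaultdict(list)  # sql text -> list of durations

    def emit(self, record):
        sql = getattr(record, "sql", None)
        if sql is None:
            return
        self.timings[sql].append(getattr(record, "duration", 0.0))

    def summary(self):
        """Return (sql, count, total_seconds) tuples, largest total first."""
        rows = [(sql, len(d), sum(d)) for sql, d in self.timings.items()]
        return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    # Stand-in logger; in a Django project you would attach the handler to
    # logging.getLogger("django.db.backends") with DEBUG = True.
    log = logging.getLogger("demo.db")
    log.setLevel(logging.DEBUG)
    aggregator = QueryAggregator()
    log.addHandler(aggregator)

    template = "select * from app_model where name = %s"
    for name, seconds in [("admin", 0.002), ("bob", 0.003)]:
        log.debug("(%.3f) %s; args=%s", seconds, template, (name,),
                  extra={"sql": template, "params": (name,), "duration": seconds})

    for sql, count, total in aggregator.summary():
        print(f"{count:4d}  {total:.3f}s  {sql}")
```

Writing the summary to a file at the end of a test-client run gives one line per distinct statement, which is the kind of aggregate the question asks for.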

Related

Why does Django not allow a user to extract a usable query from a QuerySet as a standard feature?

In Django, you can extract a plain-text SQL query from a QuerySet object like this:
queryset = MyModel.objects.filter(**filters)
sql = str(queryset.query)
In most cases, this query itself is not valid: you can't paste it into the SQL interface of your choice or pass it to MyModel.objects.raw() without exceptions, since quoting (and possibly other features of the query) is performed not by Django but by the database interface at execution time. So at best, this is a useful debugging tool.
Coming from a data science background, I often need to write a lot of complex SQL queries to aggregate data into a reporting format. The Django ORM can be awkward at best and impossible at worst when queries need to be very complex. However, it does offer some security and convenience with respect to limiting SQL injection attacks and providing a way to dynamically build a query - for example, generating the WHERE clause for the query using the .filter() method of a model. I want to be able to use the ORM to generate a base data set in the form of a query, then take that query and use it as a subquery/CTE in a larger query that handles more complex logic. For example:
queryset = MyModel.objects.filter(**filters)
sql = str(queryset.query)
more_complex_query = f"""
with filtered_table as ({sql})
select
*
/* add other stuff */
from
filtered_table
"""
results = MyModel.objects.raw(more_complex_query)
In this case, the ORM generates a query that can be used to filter the base table, then the CTE/raw sql can take that result and do whatever calculations need to be done with a tool that is more common among people working with data (SQL) than the Django ORM, while still getting the ORM benefits of stripping bad actors out.
However, this method requires a way to generate a usable SQL query from a QuerySet object. I've found a workaround for postgres databases using the psycopg2 cursor:
from django.db import connections
# Whatever the key is in your settings.DATABASES for the reporting db
WAREHOUSE_CONNECTION_NAME = 'default'
# Get the Query object and separate it into the query and params
filtered_table_query = MyModel.objects.filter(**filters).query
raw_query, params = filtered_table_query.sql_with_params()
# Create a cursor from the relevant connection
cursor = connections[WAREHOUSE_CONNECTION_NAME].cursor()
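# Call .mogrify() on the query/params to get an executable query string.
# psycopg2 returns bytes here, so decode it (assuming a UTF-8 client encoding)
# before embedding it in an f-string or passing it to raw().
usable_sql = cursor.mogrify(raw_query, params).decode()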
cursor.execute(usable_sql) # This works
cursor.fetchall() # This works
# Have not tried this yet
MyModel.objects.raw(usable_sql)
# Or this
wrapper_query = f"""
with base_table as ({usable_sql})
select
*
from
base_table
"""
cursor.execute(wrapper_query)
# or
MyModel.objects.raw(wrapper_query)
This method is dependent on the psycopg2 cursor method .mogrify() - I am not sure if this works for other back ends or if the DB API 2.0 spec takes care of that.
Other people have suggested creating a view in the database and then using an unmanaged Django model on top of the view, but I think this does not really work when your queries are dynamic in nature, i.e. need to be filtered differently based on some user input, since often the fields a user wants to filter on are not present in the result set after some aggregation.
So overall, I have two questions:
Is there a reason why Django does not let you extract a usable SQL query as a standard offering?
What other methods do people use when the ORM makes your elegant SQL into an ugly mess?
The Django developers tend to frown on features that aren't cross-compatible across all the databases they support. I can only imagine that one of the supported database engines doesn't have this capability and so they don't provide it as a standard, documented feature of the ORM.
But that's just a guess. You'd really have to ask one of the devs :)
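A backend-agnostic alternative to .mogrify() is to avoid producing a literal SQL string at all: keep the placeholders in the ORM's SQL, wrap that SQL in the larger query, and pass the same parameter list when executing the wrapper, so the driver still does the quoting. In Django that would mean something like raw_query, params = queryset.query.sql_with_params() followed by cursor.execute(wrapped_sql, params), or MyModel.objects.raw(wrapped_sql, params) when the result still contains the primary key. The runnable sketch below uses the standard-library sqlite3 module as a stand-in, so it uses sqlite3's ? placeholders where Django's SQL would contain %s; the table, the data and the wrap_in_cte helper are hypothetical.

```python
import sqlite3


def wrap_in_cte(base_sql, outer_sql):
    """Embed parameterised SQL as a CTE named filtered_table.

    Placeholders inside base_sql are left untouched, so the values are
    still bound by the driver when the combined query is executed.
    """
    return f"WITH filtered_table AS ({base_sql}) {outer_sql}"


def demo():
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE mymodel (id INTEGER PRIMARY KEY, category TEXT, amount REAL)"
        )
        conn.executemany(
            "INSERT INTO mymodel (category, amount) VALUES (?, ?)",
            [("books", 10.0), ("books", 5.5), ("games", 20.0)],
        )
        # Stand-in for the (sql, params) pair returned by sql_with_params()
        base_sql = "SELECT id, category, amount FROM mymodel WHERE category = ?"
        params = ("books",)
        sql = wrap_in_cte(
            base_sql,
            "SELECT category, SUM(amount) FROM filtered_table GROUP BY category",
        )
        return conn.execute(sql, params).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(demo())  # [('books', 15.5)]
```

If the outer query adds placeholders of its own, their values go after the CTE's parameters, because the CTE text comes first in the combined statement.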

Is there a way to see the raw SQL executed by a bulk_create on django?

I'm using Django's ORM to insert thousands of objects into a PostgreSQL database. It works fine, but sometimes one of those records has the wrong format and the insert fails.
I can't do this kind of insert while ignoring errors, so I'd like to see the SQL executed by the operation, but bulk_create only returns a list of the objects.
When in debug mode, you can use the django.db.backends logger.
https://docs.djangoproject.com/en/1.8/topics/logging/#django-db-backends
In production I would use PostgreSQL's own logging instead, because saving these queries from within the Django process will probably have a major impact on performance.
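For reference, a minimal settings.py fragment that sends the django.db.backends query log to the console, following the pattern in the Django logging documentation linked above. Django only emits these query records when DEBUG = True; the handler name console is arbitrary.

```python
# settings.py (fragment): print every SQL statement Django executes.
# Query logging from django.db.backends only happens when DEBUG = True.
DEBUG = True

LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "console": {"class": "logging.StreamHandler"},
    },
    "loggers": {
        "django.db.backends": {
            "handlers": ["console"],
            "level": "DEBUG",
        },
    },
}
```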

Project Structure for Python projects with MySQL queries?

I have a Python project that makes a variety of MySQL queries, which take some variables and insert them into the query itself. So for example:
number_of_rows = 50
query1 = '''select *
from some_db
limit %s''' % (number_of_rows)
However, since I have a lot of long queries, within a script that manipulates and cleans the data, it's making my script less readable. What is a reasonable way to structure my program so that it is both readable and makes these query calls? One way that has worked so far is to have another python file, let's call it my_query_file.py, with something along the lines of
def my_first_query(number_of_rows):
    query1 = '''select *
    from some_db
    limit %s''' % (number_of_rows)
    return query1
and then importing my_query_file from within my main project file, and calling my_first_query. Is there a better way to do this, though?
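One way to keep the separate-module layout from the question while avoiding % formatting is to have each function return the SQL together with its parameters and let the driver bind them. The sketch below uses sqlite3 so it runs anywhere and assumes a table named some_db as in the question; the run helper is hypothetical, and MySQL drivers such as MySQLdb use %s placeholders where sqlite3 uses ?.

```python
import sqlite3

# Functions like this would live in my_query_file.py: each returns
# (sql, params) instead of a pre-formatted string, so the driver binds
# the values. sqlite3 uses "?" placeholders; MySQL drivers use "%s".


def my_first_query(number_of_rows):
    sql = """
        SELECT *
        FROM some_db
        LIMIT ?
    """
    return sql, (number_of_rows,)


def run(conn, query):
    """Execute a (sql, params) pair and return all rows."""
    sql, params = query
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE some_db (id INTEGER)")
    conn.executemany("INSERT INTO some_db VALUES (?)", [(i,) for i in range(100)])
    print(len(run(conn, my_first_query(50))))  # 50
    conn.close()
```

The main script then reads as a sequence of named calls such as run(conn, my_first_query(50)), with the SQL text kept out of the data-cleaning code.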
Consider using an existing query builder such as python-sql:
https://code.google.com/p/python-sql/
Or, if the complexity of your application justifies it, you might try a full-blown ORM such as SQLAlchemy:
http://www.sqlalchemy.org/
Have you thought about using a template system like Jinja2? I've used it to store and use templates for a variety of situations (ranging from web development to automatic code generation) and it works very well.
You could have your SQL queries in template files, and load and fill them as you wish.

How to use database view in test cases

I am unable to use a database view in my test cases. On the other hand, I am able to use those database views in front-end functions, but when I try to get data from a view in a test case, it returns nothing.
Please suggest how to use database views in test cases.
By database view do you mean you are using an unmanaged model which represents an underlying database view (as described here)?
If so, I have found that, during unit testing, Django ignores the managed = False setting in the model meta and creates an actual table. Unless you explicitly populate this in your setUp this will be empty.
A quick-and-dirty way of getting around this is to explicitly drop the table and create the view in your test case's setUp method, like this:
# Imports
from django.db import connection
...
# Inside your test case setUp method
# Drop the table
with connection.cursor() as cursor:
    # See note 1 (MySQL user variables persist across these separate calls)
    cursor.execute("SET @OLD_SQL_NOTES=@@SQL_NOTES, SQL_NOTES=0")
    cursor.execute("DROP TABLE IF EXISTS myproject_myview")
    cursor.execute("SET SQL_NOTES=@OLD_SQL_NOTES")
# Create the view
# See note 2
with open('/full/path/to/myproject/sql/create_myview.sql', 'r') as sql_file:
    sql = sql_file.read()
with connection.cursor() as cursor:
    cursor.execute(sql)
Notes:
1. This is to get around a MySQL problem, so it might not apply to your case. The table will only exist the first time setUp is run; if you try to drop it on a subsequent pass, MySQL will generate warnings, which this code suppresses.
2. This file contains the creation code for a single view, in the format CREATE OR REPLACE VIEW myproject_myview AS.... I've found that trying to execute a file containing multiple commands with the same cursor also causes problems.
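The drop-then-create idea can be tried outside Django and MySQL with Python's built-in sqlite3 module. This is a sketch under assumptions: the orders table, its data and the replace_table_with_view helper are hypothetical, the empty myproject_myview table mimics the one the answer describes, and SQLite has no CREATE OR REPLACE VIEW, so the helper drops the table first and then creates the view.

```python
import sqlite3


def replace_table_with_view(conn, name, select_sql):
    """Drop the (empty) table `name` and create a view with the same name.

    `name` and `select_sql` must come from trusted test code, never user
    input, because identifiers cannot be bound as parameters.
    """
    conn.execute(f"DROP TABLE IF EXISTS {name}")
    conn.execute(f"CREATE VIEW {name} AS {select_sql}")


def demo():
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (customer TEXT, total REAL)")
        conn.executemany(
            "INSERT INTO orders VALUES (?, ?)",
            [("alice", 10.0), ("alice", 5.0), ("bob", 7.0)],
        )
        # The empty stand-in table that shadows the view
        conn.execute("CREATE TABLE myproject_myview (customer TEXT, total REAL)")
        query = "SELECT customer, total FROM myproject_myview ORDER BY customer"
        before = conn.execute(query).fetchall()
        replace_table_with_view(
            conn,
            "myproject_myview",
            "SELECT customer, SUM(total) AS total FROM orders GROUP BY customer",
        )
        after = conn.execute(query).fetchall()
        return before, after
    finally:
        conn.close()


if __name__ == "__main__":
    print(demo())  # ([], [('alice', 15.0), ('bob', 7.0)])
```

The same query returns nothing against the empty table and the aggregated rows once the view is in place, which is the symptom and the fix described in the answer.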
I'm guessing that by a database view you mean accessing the database inside a view.
That being said, I think your problem is that the test database Django runs your tests against contains no data.
The way to provide that data is with fixtures. (You could do this with SQL as well, but I think fixtures are easier.)
The easiest way to create one is with the dumpdata command provided by Django:
python manage.py dumpdata myCoreApp --indent 2 > myCoreApp/fixtures/myCoreApp_views_testdata.json
dumpdata prints to standard output, so redirect it into a file in your app's fixtures directory, where Django looks for fixtures when running your tests.
For example:
myDjangoProject/myCoreApp/fixtures/myCoreApp_views_testdata.json
NOTE: Your app won't actually be named myCoreApp.
You can also set the FIXTURE_DIRS setting in your settings.py to tell Django about additional directories to search for fixtures.
To then use the fixture in your tests, do the following:
from django.test import TestCase  # Note: you must use django.test.TestCase

class SomeViewThatIWantToTest(TestCase):
    fixtures = ['myCoreApp_views_testdata.json']
After this you should be able to access your data in your views as normal.
This might require some tuning to fit your exact example so I added a link to the official docs at the bottom!
Good luck and please do correct me if I'm wrong! :)
Read more about this here

Sanitizing user-provided SQL with Python?

I'm working on a small app which will help browse the data generated by vim-logging, and I'd like to allow people to run arbitrary SQL queries against the datasets.
How can I do that safely?
For example, I'd like to let someone enter, say, SELECT file_type, count(*) FROM commands GROUP BY file_type, then send the result back to their web browser.
Do this:
cmd = "update people set name=%s where id=%s"
curs.execute(cmd, (name, id))
Note that the placeholder syntax depends on the database you are using.
Source and more info here:
http://bobby-tables.com/python.html
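To make the placeholder point concrete: every DB-API 2.0 module exposes a module-level paramstyle string. The sketch below uses the standard-library sqlite3 module (paramstyle "qmark", although it also accepts named placeholders); the people table and the rename functions are hypothetical. Because the value is bound rather than spliced into the string, a name full of quotes is stored as plain text.

```python
import sqlite3


def rename_person(conn, person_id, name):
    # qmark style: the driver binds the values; nothing is spliced into the SQL
    conn.execute("UPDATE people SET name = ? WHERE id = ?", (name, person_id))


def rename_person_named(conn, person_id, name):
    # named style, also accepted by sqlite3
    conn.execute(
        "UPDATE people SET name = :name WHERE id = :id",
        {"name": name, "id": person_id},
    )


if __name__ == "__main__":
    print(sqlite3.paramstyle)  # qmark
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO people (id, name) VALUES (1, 'Bob')")
    rename_person(conn, 1, "Robert'); DROP TABLE people;--")
    print(conn.execute("SELECT name FROM people WHERE id = 1").fetchone())
    conn.close()
```

Note that binding protects values only; it does not make arbitrary user-written statements safe, which is what the following answers address.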
Allowing expressive power while preventing destruction is a difficult job. If you let them enter "SELECT .." themselves, you need to prevent them from entering "DELETE .." instead. You can require the statement to begin with "SELECT", but then you also have to be sure it doesn't contain "; DELETE" somewhere in the middle.
The safest thing to do might be to connect to the database with read-only user credentials.
In MySQL, you can create a limited user (create a new user and grant it limited privileges) that can only access certain tables.
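The read-only idea can be sketched with the standard-library sqlite3 module, assuming the vim-logging data lives in an SQLite file (the question doesn't say which database it uses). Opening the file with the URI parameter mode=ro plays the role of read-only credentials; build_demo_db, run_user_query and the sample rows are hypothetical.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def build_demo_db(path):
    """Create a small, hypothetical 'commands' table to query."""
    conn = sqlite3.connect(path)
    try:
        conn.execute("CREATE TABLE commands (file_type TEXT, command TEXT)")
        conn.executemany(
            "INSERT INTO commands VALUES (?, ?)",
            [("python", ":w"), ("python", ":q"), ("markdown", ":w")],
        )
        conn.commit()
    finally:
        conn.close()


def run_user_query(path, user_sql):
    """Run untrusted SQL on a connection opened read-only."""
    # mode=ro opens the file read-only, so any write attempt fails
    uri = Path(path).as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        return conn.execute(user_sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db = os.path.join(tmp, "vimlog.db")
        build_demo_db(db)
        print(run_user_query(
            db,
            "SELECT file_type, count(*) FROM commands "
            "GROUP BY file_type ORDER BY file_type",
        ))
        try:
            run_user_query(db, "DELETE FROM commands")
        except sqlite3.OperationalError as exc:
            print("rejected:", exc)
```

Read-only access stops destructive statements but not expensive ones; a slow or enormous SELECT would still need a separate timeout or row limit.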
Consider using SQLAlchemy. While SQLAlchemy is arguably the greatest Object Relational Mapper ever, you certainly don't need to use any of the ORM stuff to take advantage of all of the great Python/SQL work that's been done.
As the introductory documentation suggests:
Most importantly, SQLAlchemy is not just an ORM. Its data abstraction layer allows construction and manipulation of SQL expressions in a platform agnostic way, and offers easy to use and superfast result objects, as well as table creation and schema reflection utilities. No object relational mapping whatsoever is involved until you import the orm package. Or use SQLAlchemy to write your own!
Using SQLAlchemy will give you input sanitation "for free" and let you use standard Python logic to analyze statements for safety without having to do any messy text-parsing/pattern-matching.
