I needed to perform an UPDATE ... JOIN query, but Django has no built-in support for that, so I wrote a raw SQL query. Now I need to build the WHERE clause of that query dynamically, so I would like to know how to reuse Django's SQL compiler.
I could take the Query object from Model.objects.filter(...).query, then generate the raw SQL of the WHERE clause with query.where.as_sql(query.get_compiler(using='default'), None), but the tables in my current raw SQL are aliased, so the lookup fields will be wrong.
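For reference, a minimal sketch of that compiler-reuse approach (assuming a model named MyModel and the default database; note that as_sql() wants the real connection object rather than None):

from django.db import connection

# Build a Query object from an ordinary filter call
query = MyModel.objects.filter(status='active').query
compiler = query.get_compiler(using='default')

# Compile only the WHERE node into a SQL fragment plus its parameters
where_sql, where_params = query.where.as_sql(compiler, connection)

# where_sql references the unaliased table name, e.g. "myapp_mymodel"."status" = %s,
# which is why it clashes with aliased tables in a hand-written query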
Related
In Django, you can extract a plain-text SQL query from a QuerySet object like this:
queryset = MyModel.objects.filter(**filters)
sql = str(queryset.query)
In most cases, this query by itself is not valid: you can't paste it into a SQL interface of your choice or pass it to MyModel.objects.raw() without exceptions, since quoting of parameter values (and possibly other parts of the query) is not performed by Django but rather by the database interface at execution time. So at best, this is a useful debugging tool.
Coming from a data science background, I often need to write complex SQL queries to aggregate data into a reporting format. The Django ORM can be awkward at best and impossible at worst when queries need to be very complex. However, it does offer some security and convenience: it limits SQL injection attacks and provides a way to build a query dynamically, for example generating the WHERE clause using the .filter() method of a model. I want to be able to use the ORM to generate a base data set in the form of a query, then take that query and use it as a subquery/CTE in a larger query that handles more complex logic. For example:
queryset = MyModel.objects.filter(**filters)
sql = str(queryset.query)
more_complex_query = f"""
with filtered_table as ({sql})
select
    *
    /* add other stuff */
from
    filtered_table
"""
results = MyModel.objects.raw(more_complex_query)
In this case, the ORM generates a query that filters the base table, and the CTE/raw SQL can then take that result and do whatever calculations are needed with a tool that is more common among people working with data (SQL) than the Django ORM, while still getting the ORM's benefit of stripping bad actors out.
However, this method requires a way to generate a usable SQL query from a QuerySet object. I've found a workaround for postgres databases using the psycopg2 cursor:
from django.db import connections
# Whatever the key is in your settings.DATABASES for the reporting db
WAREHOUSE_CONNECTION_NAME = 'default'
# Get the Query object and separate it into the query and params
filtered_table_query = MyModel.objects.filter(**filters).query
raw_query, params = filtered_table_query.sql_with_params()
# Create a cursor from the relevant connection
cursor = connections[WAREHOUSE_CONNECTION_NAME].cursor()
# Call .mogrify() on the query/params to get an executable query string;
# mogrify() returns bytes, so decode before interpolating into another query
usable_sql = cursor.mogrify(raw_query, params).decode()
cursor.execute(usable_sql) # This works
cursor.fetchall() # This works
# Have not tried this yet
MyModel.objects.raw(usable_sql)
# Or this
wrapper_query = f"""
with base_table as ({usable_sql})
select
    *
from
    base_table
"""
cursor.execute(wrapper_query)
# or
MyModel.objects.raw(wrapper_query)
This method depends on the psycopg2 cursor method .mogrify(), which is a psycopg2 extension rather than part of the DB API 2.0 spec, so I am not sure whether an equivalent exists for other backends.
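As a defensive sketch of my own (not part of the original workaround), you could feature-detect mogrify() so the code fails loudly on backends whose driver doesn't provide it, reusing raw_query and params from the snippet above:

from django.db import connections

cursor = connections[WAREHOUSE_CONNECTION_NAME].cursor()

# hasattr() reaches through Django's CursorWrapper to the underlying driver cursor
if hasattr(cursor, 'mogrify'):  # a psycopg2 extension, not part of DB API 2.0
    usable_sql = cursor.mogrify(raw_query, params).decode()
else:
    raise NotImplementedError("This database driver does not provide mogrify()")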
Other people have suggested creating a view in the database and then putting an unmanaged Django model on top of the view, but I think this does not really work when your queries are dynamic in nature, i.e. need to be filtered differently based on user input, since the fields a user wants to filter on are often no longer present in the result set after aggregation.
So overall, I have two questions:
Is there a reason why Django does not let you extract a usable SQL query as a standard offering?
What other methods do people use when the ORM makes your elegant SQL into an ugly mess?
The Django developers tend to frown on features that aren't compatible across all the databases they support. I can only imagine that one of the supported database engines doesn't have this capability, and so they don't provide it as a standard, documented feature of the ORM.
But that's just a guess. You'd really have to ask one of the devs :)
I am new to SQLAlchemy.
I have a local Postgres server, and I want to use SQLAlchemy to create a database.
I have the following code:
from sqlalchemy import text, bindparam

connection = engine.connect()
connection.execute(
    text("CREATE DATABASE :database_name").bindparams(
        bindparam('database_name', quote=False)
    ),
    database_name="test_db"
)
But this unfortunately single-quotes the database name parameter, which does not work in Postgres. The logs from SQLAlchemy:
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.SyntaxError) syntax error at or near "'test_db'"
LINE 1: CREATE DATABASE 'test_db'
[SQL: CREATE DATABASE %(database_name)s]
[parameters: {'database_name': 'test_db'}]
In the Postgres logs, it executes the following statement, which is invalid because of the single quotes; a valid statement would use double quotes (CREATE DATABASE "test_db"):
CREATE DATABASE 'test_db'
Is there a way for the bind parameter not to be quoted in the resulting statement? I do not want to do the parameter quoting and string construction myself, as I think this abstraction should be handled by SQLAlchemy (in case I change my underlying database engine, for example), and bind parameters look to be the mechanism SQLAlchemy promotes to avoid SQL injection too.
The same question applies to other Postgres statements, like creating a user with a password or granting privileges to an existing user, which all need Postgres-specific quoting.
You cannot have bind parameters in statements other than SELECT, INSERT, UPDATE, or DELETE.
You'll have to construct the CREATE DATABASE statement as a string containing the database name. Something like
from psycopg2 import sql

# `cursor` must come from a connection with autocommit enabled,
# since CREATE DATABASE cannot run inside a transaction block
cursor.execute(
    sql.SQL("CREATE DATABASE {}").format(sql.Identifier('test_db'))
)
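If you want to stay on the SQLAlchemy side, a minimal sketch (the connection URL is illustrative) is to let the dialect's identifier preparer do the backend-specific quoting, which also addresses the portability concern in the question:

from sqlalchemy import create_engine, text

engine = create_engine("postgresql://localhost/postgres")  # illustrative URL
db_name = "test_db"

# The dialect's identifier preparer applies the correct quoting rules for the backend
quoted_name = engine.dialect.identifier_preparer.quote(db_name)

# CREATE DATABASE is refused inside a transaction block, so use autocommit
with engine.connect().execution_options(isolation_level="AUTOCOMMIT") as conn:
    conn.execute(text(f"CREATE DATABASE {quoted_name}"))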
Summary
In SQL Server, synonyms are often used to abstract a remote table into the current database context. Normal DML operations work just fine on such a construct, but SQL Server does track synonyms as their own object type separately from tables.
I'm attempting to leverage the pandas DataFrame#to_sql method to facilitate loading a synonym, and while it works well when the table is local to the database, it is unable to locate the table via synonym and instead attempts to create a new table matching the DataFrame's structure, which results in an object name collision and undesirable behavior.
Tracking through the source, it looks like pandas leverages the dialect's has_table method, which in this case resolves to SQLAlchemy's MSSQL dialect implementation, which then queries the INFORMATION_SCHEMA.columns view to verify whether the table exists.
Unfortunately, synonym tables don't appear in INFORMATION_SCHEMA views like this. In the answer for "How to find all column names of a synonym", the answerer provides a technique for establishing a synonym's columns, which may be applicable here.
The Question
Is there any method available that can optionally skip the table-existence check during DataFrame#to_sql? If not, is there any way to force pandas or SQLAlchemy to recognize a synonym? I couldn't find any similar questions on SO, and neither project's GitHub issues had anything resembling this.
I've accepted my own answer, but if anyone has a better technique for loading DataFrames to SQL Server synonyms, please post it!
SQLAlchemy on SQL Server doesn't currently support synonym tables, which means that the DataFrame#to_sql method cannot insert into them and another technique must be employed.
As of SQLAlchemy 1.2, the Oracle dialect supports Synonym/DBLINK Reflection, but no similar feature is available for SQL Server, even in the upcoming SQLAlchemy 1.4 release.
For those trying to solve this in different ways, if your situation meets the following criteria:
Your target synonym is already declared in the ORM as a table
The table's column names match the column names in the DataFrame
The table's column data types either match the DataFrame or can be cast without error
You can perform the following bulk_insert_mappings operation, where TargetTable is your target declared in the ORM model and df is your DataFrame:
# Convert the DataFrame rows to dicts and insert them through the ORM mapping
db.session.bulk_insert_mappings(
    TargetTable, df.to_dict('records')
)
db.session.commit()  # persist the inserts
As a bonus, this is substantially faster than the DataFrame#to_sql operation as well!
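For context, a minimal sketch (all names are placeholders) of what declaring the synonym in the ORM might look like; the synonym name is simply used where a table name would go, since SQL Server resolves it at execution time:

from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class TargetTable(Base):
    # The synonym name goes where a table name normally would
    __tablename__ = 'MySynonym'
    id = Column(Integer, primary_key=True)
    name = Column(String(50))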
Are there any Python libraries that provide an abstraction of SQL DDL?
I have an application that needs to dynamically add/adjust database columns, and I don't want to have to model CREATE TABLE and all the datatypes.
I am looking for something relatively lightweight; full ORMs like SQLAlchemy will unfortunately not be available.
Have you looked at SQLAlchemy?
It's an object-relational mapper (abstraction layer) that sits between your Python code and the (relational) database.
It handles DDL such as CREATE TABLE.
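For illustration, a minimal sketch using SQLAlchemy Core (the table definition and SQLite URL are placeholders):

from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String

engine = create_engine("sqlite:///example.db")  # placeholder URL
metadata = MetaData()

# The Table object models the DDL; no hand-written CREATE TABLE needed
users = Table(
    "users", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(50)),
)

# Emits CREATE TABLE for any tables that don't exist yet
metadata.create_all(engine)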
I know that you can get the SQL of a given QuerySet using
print query.query
but as we know from a previous question (Potential Django Bug In QuerySet.query?), the returned SQL is not properly quoted. See http://code.djangoproject.com/browser/django/trunk/django/db/models/sql/query.py
Is there any way to get the raw, executable SQL (quoted) for a given QuerySet without actually executing it?
Django never creates the raw SQL, so no. To prevent SQL injection, Django passes the parameters separately to the database driver at the last step. The best way to get the actual SQL is to look at your query log, which you cannot do before you execute the query.
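To illustrate the query-log approach, a sketch assuming DEBUG = True in settings (Django only records queries in debug mode) and a model named MyModel:

from django.db import connection, reset_queries

reset_queries()
list(MyModel.objects.filter(name='x'))  # force the queryset to execute

# connection.queries holds the SQL as logged after execution
print(connection.queries[-1]['sql'])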