Add custom SQL into WHERE clause in Django, without using .raw() - python

I have a complex Django ORM query that I'd really rather not convert to raw SQL: it's a very non-trivial query, I want consistency, I use a number of ORM features to generate it, and it's been thoroughly tested as it stands.
I want to add a single filter to the WHERE clause on a datetime field. However, I want to test against the date part only, not the time.
Here's a simplified version of my existing query:
MyTable.objects.filter(date_field__gte=datetime.now().date())
But I've converted the date_field to datetime_field for more precision in some scenarios. In this scenario, however, I still want a date-only comparison. Something like:
MyTable.objects.filter(datetime_field__datepartonly__gte=datetime.now().date())
In postgres, my database of choice, that's simple:
SELECT * FROM mytable WHERE DATE(datetime_field) >= ...
How can I do this in django, without converting the entire query to raw SQL?
I tried using F(), but you can only specify field names, not custom SQL.
I tried using Q(), but same deal.
I tried Django's SQL functions (Sum, etc.), but there are only a few, and they look like they're designed solely for aggregate queries.
I tried using an alias, but you can't reference aliases in a WHERE clause, either in Django or in SQL.

The year, month and day lookups on the datetime field are available to you, but after testing here, they don't seem to allow chaining an additional __gte onto the field.
This will work:
from datetime import datetime

now = datetime.now()
results = MyTable.objects.filter(
    datetime_field__year=now.year,
    datetime_field__month=now.month,
    datetime_field__day=now.day,
)
But it doesn't allow __gte.
You can always just create a datetime starting at 00:00 instead:
now = datetime.now().replace(hour=0, minute=0, second=0, microsecond=0)
results = MyTable.objects.filter(datetime_field__gte=now)
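For what it's worth, newer Django versions (1.9 and later) provide a __date transform on DateTimeField that does exactly this, so the lookup the question wished for works almost verbatim (the import path for MyTable is hypothetical):

from datetime import datetime

from myapp.models import MyTable  # hypothetical app/model path

# __date extracts the date part in SQL (roughly DATE(datetime_field) on
# PostgreSQL), and it chains with __gte for a date-only comparison.
results = MyTable.objects.filter(datetime_field__date__gte=datetime.now().date())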

Related

How do you best handle access from Django to a database that has some DateTime fields stored in the local timezone while others are stored in UTC?

What would be the best approach to handle the following case with Django?
Django needs access to a database (in MariaDB) in which datetime values are stored in the UTC timezone, except for one table that has all values for all of its datetime columns stored in local time (obviously different from UTC). This particular table is populated by a different system, not Django, and for various reasons we do not have the option to convert the timestamps in that table to UTC or to change that system to start storing values in UTC. The queries involving that table are read-only, but may join data from other tables. The table itself has no foreign keys, but other tables have foreign keys to it. The table is very big (millions of rows), and one of its datetime columns is part of more than one index that helps with optimized queries.
I am asking your opinion on an approach to the above case that would be as seamless as possible, preferably without doing conversions here and there in various parts of the codebase while accessing and filtering on the datetime fields of this "problematic" table / model. I think an approach at the model layer, which would let the Django ORM work as if the values in that table were stored in UTC, would be preferable. Perhaps a solution based on a custom model field that does the conversions from and back to the database "transparently". Am I thinking along the right lines? Or is there a better approach?
It is what it is: if you have mixed timezones, you need to convert the outliers to the timezone you prefer somewhere. And "for reasons we cannot convert the timestamps in that table to UTC" is rarely an absolute - this is programming, after all; almost everything can be changed. But if the constraint really does stand, you will have to deal with it on your side.
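If you do go the custom-field route the question suggests, a minimal sketch could look like the following (LOCAL_TZ is a hypothetical zone for the legacy table, and DST edge cases are ignored):

from datetime import timezone
from zoneinfo import ZoneInfo

from django.db import models

LOCAL_TZ = ZoneInfo("Europe/Athens")  # hypothetical local zone of the legacy table

class LocalTZDateTimeField(models.DateTimeField):
    """Presents naive local-time database values as aware UTC datetimes."""

    def from_db_value(self, value, expression, connection):
        # Called on read: attach the local zone, then normalize to UTC.
        if value is None:
            return value
        if value.tzinfo is None:
            value = value.replace(tzinfo=LOCAL_TZ)
        return value.astimezone(timezone.utc)

    def get_prep_value(self, value):
        # Called on write/filter: convert aware values back to naive local time.
        value = super().get_prep_value(value)
        if value is not None and value.tzinfo is not None:
            value = value.astimezone(LOCAL_TZ).replace(tzinfo=None)
        return value

With this sketch, WHERE-clause comparisons still happen in the table's native local time, so the existing indexes keep working; only the Python-side values come out normalized to UTC.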

Why does Django not allow a user to extract a usable query from a QuerySet as a standard feature?

In Django, you can extract a plain-text SQL query from a QuerySet object like this:
queryset = MyModel.objects.filter(**filters)
sql = str(queryset.query)
In most cases this query is not valid as-is - you can't paste it into a SQL interface of your choice or pass it to MyModel.objects.raw() without exceptions, since parameter quoting (and possibly other parts of the query) is performed not by Django but by the database driver at execution time. So at best, this is a useful debugging tool.
Coming from a data science background, I often need to write a lot of complex SQL queries to aggregate data into a reporting format. The Django ORM can be awkward at best and impossible at worst when queries need to be very complex. However, it does offer some security and convenience with respect to limiting SQL injection attacks and providing a way to dynamically build a query - for example, generating the WHERE clause for the query using the .filter() method of a model. I want to be able to use the ORM to generate a base data set in the form of a query, then take that query and use it as a subquery/CTE in a larger query that handles more complex logic. For example:
queryset = MyModel.objects.filter(**filters)
sql = str(queryset.query)
more_complex_query = f"""
with filtered_table as ({sql})
select
*
/* add other stuff */
from
filtered_table
"""
results = MyModel.objects.raw(more_complex_query)
In this case, the ORM generates a query that can be used to filter the base table, then the CTE/raw sql can take that result and do whatever calculations need to be done with a tool that is more common among people working with data (SQL) than the Django ORM, while still getting the ORM benefits of stripping bad actors out.
However, this method requires a way to generate a usable SQL query from a QuerySet object. I've found a workaround for postgres databases using the psycopg2 cursor:
from django.db import connections
# Whatever the key is in your settings.DATABASES for the reporting db
WAREHOUSE_CONNECTION_NAME = 'default'
# Get the Query object and separate it into the query and params
filtered_table_query = MyModel.objects.filter(**filters).query
raw_query, params = filtered_table_query.sql_with_params()
# Create a cursor from the relevant connection
cursor = connections[WAREHOUSE_CONNECTION_NAME].cursor()
# Call .mogrify() on the query/params to get an executable query string
usable_sql = cursor.mogrify(raw_query, params)
cursor.execute(usable_sql) # This works
cursor.fetchall() # This works
# Have not tried this yet
MyModel.objects.raw(usable_sql)
# Or this
wrapper_query = f"""
with base_table as ({usable_sql})
select
*
from
base_table
"""
cursor.execute(wrapper_query)
# or
MyModel.objects.raw(wrapper_query)
This method is dependent on the psycopg2 cursor method .mogrify() - I am not sure if this works for other back ends or if the DB API 2.0 spec takes care of that.
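A more portable variant, sketched under the assumption that the outer query adds no placeholders of its own, is to keep the parameters separate and let .raw() bind them at execution time, which avoids mogrify() entirely:

inner_sql, params = MyModel.objects.filter(**filters).query.sql_with_params()

wrapper_query = f"""
with filtered_table as ({inner_sql})
select
    *
from
    filtered_table
"""
# RawQuerySet accepts the params separately, so the driver does the quoting.
results = MyModel.objects.raw(wrapper_query, params)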
Other people have suggested creating a view in the database and then using an unmanaged Django model on top of the view, but I think this does not really work when your queries are dynamic in nature, i.e. need to be filtered differently based on some user input, since often the fields a user wants to filter on are not present in the result set after some aggregation.
So overall, I have two questions:
Is there a reason why Django does not let you extract a usable SQL query as a standard offering?
What other methods do people use when the ORM makes your elegant SQL into an ugly mess?
The Django developers tend to frown on features that aren't compatible across all the databases they support. I can only imagine that one of the supported database engines doesn't have this capability, so they don't provide it as a standard, documented feature of the ORM.
But that's just a guess. You'd really have to ask one of the devs :)

SQLite Queries for dates

I have a SQLite database from which I am pulling data for a specific range of dates (let's say 01-01-2011 to 01-01-2011). What is the best way to implement this query in SQL? Ideally I would like the following line to run:
SELECT * FROM database where start_date < date_stamp and end_date > date_stamp
This obviously does not work when I store the dates as strings.
My solution (which I think is messy, and I am hoping for a better one) is to convert the dates into integers in the following format:
YYYYMMDD
which makes the above line able to run (theoretically). Is there a better method?
Using python sqlite3
Would the answer be any different if I were using another SQL database rather than SQLite?
For SQLite this is the best approach, as comparing integers is much faster than comparing strings or doing any date-and-time manipulation.
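A minimal sketch of that integer scheme, assuming the dates arrive as datetime.date objects:

import sqlite3
from datetime import date

def to_int(d):
    # 2011-01-05 -> 20110105; integer order matches date order.
    return d.year * 10000 + d.month * 100 + d.day

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spans (start_date INTEGER, end_date INTEGER)")
conn.execute("INSERT INTO spans VALUES (?, ?)",
             (to_int(date(2011, 1, 1)), to_int(date(2011, 3, 15))))

stamp = to_int(date(2011, 2, 1))
print(conn.execute("SELECT * FROM spans WHERE start_date < ? AND end_date > ?",
                   (stamp, stamp)).fetchall())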
You should store the dates in one of the supported date/time datatypes, then comparisons will work without conversions, and you would be able to use the built-in date/time functions on them.
(Whether you use strings or numbers does not matter for speed; database performance is mostly determined by the amount of I/O needed.)
In other SQL databases that have a built-in date datatype, you could use that.
(However, this is usually not portable.)
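To illustrate this answer's advice with a matching sketch: ISO-8601 'YYYY-MM-DD' strings compare correctly as plain text in SQLite, and the built-in date functions understand them, so no integer conversion is needed:

import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spans (start_date TEXT, end_date TEXT)")
conn.execute("INSERT INTO spans VALUES ('2011-01-01', '2011-03-15')")

stamp = date(2011, 2, 1).isoformat()  # '2011-02-01'
print(conn.execute("SELECT * FROM spans WHERE start_date < ? AND end_date > ?",
                   (stamp, stamp)).fetchall())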

SQLAlchemy: limit in the same string as where

We're trying to enable a SQL query front-end to our Web application, which is WSGI and uses Python, with SQLAlchemy (core, not ORM) to query a PostgreSQL database. We have several data layer functions set up to assist in query construction, and we are now trying to set something up that allows this type of query:
select id from <table_name> where ... limit ...
In the front end, we have a text box which lets the user type in the where clause and the limit clause, so that the data can be queried flexibly and dynamically from the front end; that is, we want to enable ad hoc querying. So, the only thing we know beforehand is:
select id from <table_name>
And the user will type in, for example:
where date > <some_date>
where location is not null limit 10 order by location desc
using the same back-end function. The select, columns and table should be managed by the data layer (i.e. it knows what they are, and the user should not need to know). However, I'm not aware of any way to get SQLAlchemy to parse both the where clause and the limit clause automatically. What we have right now is a function which can return the table name and the name of the id column; we then use that to create a textual query which is passed to SQLAlchemy as the input to a text() call.
Is there any way I can do this with SQLAlchemy, or some other library? Or is there a better pattern of which I should be aware, which does not involve parsing the SQL while still allowing this functionality from the front-end?
Thanks a lot! All suggestions will be greatly appreciated.
I'm not sure I follow, but general SQLAlchemy usage looks like this:
results = db.session.query(User).filter(User.name == "Bob").order_by(User.age.desc()).limit(10)
That will query the User table and return the ten oldest members named "Bob".
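For the core (non-ORM) case the question describes, one option is to let the data layer own the select and splice the user's fragment in via text(), with the limit parsed out separately. A sketch, assuming the front end splits the input into a bare where expression and a numeric limit (and noting that text() does not sanitize anything, so the fragment is still trusted input):

from sqlalchemy import MetaData, Table, create_engine, select, text

engine = create_engine("postgresql+psycopg2://user:pass@localhost/mydb")  # hypothetical DSN
my_table = Table("my_table", MetaData(), autoload_with=engine)  # name known to the data layer

user_where = "location is not null"  # hypothetical user input, already split out
user_limit = 10

stmt = select(my_table.c.id).where(text(user_where)).limit(user_limit)
with engine.connect() as conn:
    rows = conn.execute(stmt).fetchall()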

Dynamic column formatting in SQL - and a backend to store the formatting

I'm trying to create a system in Python in which one can select a number of rows from a set of tables, which are to be formatted in a user-defined way. Let's say a table has a set of columns, some of which contain date or timestamp values. The user-defined format for each column should be stored in another table, then queried and applied to the main query at runtime.
Let me give you an example: There are different ways of formatting a date column, e.g. using
SELECT to_char(column, 'YYYY-MM-DD') FROM table;
in PostgreSQL.
For example, I'd like the second parameter of the to_char() builtin to be queried dynamically from another table at runtime, and then applied if it has a value.
Reading the definition from a table is not that much of a problem; the harder part is creating a database schema that can receive, from a user interface, the user's choice of which formatting instructions to apply to the different columns. The user should be able to pick which columns to include in the query, as well as a user-defined format for each of those columns.
I've been thinking about doing this in an elegant and efficient way for some days now, but to no avail. Having the user put the desired definition in a text field and including it in a query would pretty much be an invitation for SQL injection attacks (although I could use escape() functions), and storing every possible combination doesn't seem feasible either.
It seems to me a stored procedure or a sub-select would work well here, though I haven't tested it. Let's say you store a date_format for each user in the users table.
SELECT to_char(column, (SELECT date_format FROM users WHERE users.id = 123)) FROM table;
Your mileage may vary.
Pull the dates out as Unix timestamps and format them in Python:
SELECT DATE_PART('epoch', my_col::timestamp) FROM my_table;
my_date = datetime.datetime.fromtimestamp(row[0]) # Or equivalent for your toolkit
I've found a couple of advantages to this approach: Unix timestamps are the most space-efficient common format, the approach is effectively language-neutral, and the language you're querying the database from is richer than the underlying database, giving you plenty of options if you start wanting friendlier formatting like "today", "yesterday", "last week" or "June 23rd".
I don't know what sort of application you're developing, but if it's something like a web app that will be used by multiple people, I'd also consider storing your database values in UTC, so you can apply user-specific timezone settings when formatting without having to account for them in all of your database operations.
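A rough sketch of that idea, assuming a hypothetical users.date_format column holding strftime-style patterns (the question's example used to_char() patterns, which differ):

import datetime

import psycopg2

conn = psycopg2.connect("dbname=mydb")  # hypothetical connection string
cur = conn.cursor()

# Fetch this user's stored format, falling back to ISO if none is set.
cur.execute("SELECT date_format FROM users WHERE id = %s", (123,))
row = cur.fetchone()
user_format = (row and row[0]) or "%Y-%m-%d"

cur.execute("SELECT DATE_PART('epoch', my_col::timestamp) FROM my_table")
for (epoch,) in cur.fetchall():
    print(datetime.datetime.fromtimestamp(epoch).strftime(user_format))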
