Best way to save Raw SQL queries in Django - python

What is the best way to make raw SQL queries in Django?
I have to search one table for the mode of a column in another table. I could not find a way to express this in Django's ORM, so I turned to raw SQL queries.
Yet building all these very long queries in Python is very unreadable and does not feel like a proper way to do this. Is there a way to save these queries in a neat format, perhaps in the database?
I have to join three separate tables and compute the mode of a few columns on the last table. The queries are getting very long, and the code that builds them is becoming unreadable. An example query would be:
SELECT * FROM "core_assembly" INNER JOIN (SELECT * FROM "core_taxonomy" INNER JOIN(SELECT "core_phenotypic"."taxonomy_id" , \
array_agg("core_phenotypic"."isolation_host_NCBI_tax_id") FILTER (WHERE "core_phenotypic"."isolation_host_NCBI_tax_id" IS NOT NULL) \
AS super_set_isolation_host_NCBI_tax_ids FROM core_phenotypic GROUP BY "core_phenotypic"."taxonomy_id") "mode_table" ON \
"core_taxonomy"."id"="mode_table"."taxonomy_id") "tax_mode" ON "core_assembly"."taxonomy_id"="tax_mode"."id" WHERE ( 404=ANY(super_set_isolation_host_NCBI_tax_ids));
Here I would have a very big parse function that builds all the WHERE clauses based on user input.

You can try this:
from django.db import connection

# The context manager closes the cursor when the block exits.
with connection.cursor() as cursor:
    raw_query = "write your query here"
    cursor.execute(raw_query)
    rows = cursor.fetchall()
You can also run raw queries through models, e.g. MyModel.objects.raw('my query').
See Performing raw SQL queries in the Django documentation for more.
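To address the "save these queries in a neat format" part of the question: one common approach is to keep each long query in its own .sql file and load it at runtime. A minimal sketch, assuming a sql/ directory next to the module (the layout and the run_named_query helper are illustrative, not part of the answer above):
from pathlib import Path
from django.db import connection

SQL_DIR = Path(__file__).resolve().parent / "sql"

def run_named_query(name, params=None):
    # Load <name>.sql from SQL_DIR and execute it with optional parameters.
    query = (SQL_DIR / f"{name}.sql").read_text()
    with connection.cursor() as cursor:
        cursor.execute(query, params)
        return cursor.fetchall()

This keeps the SQL readable and under version control, and any user input still goes through the driver's placeholder mechanism via params rather than string concatenation.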

Related

SQL query builder with seed-query Parser

Is there an SQL query builder (in Python) which allows me to "parse" an initial SQL query, add certain operators, and then get the resulting SQL text?
My use case is the following:
Start with a query like: "SELECT * from my_table"
I want to be able to do something like query_object = Query.parse("SELECT * from my_table") to get a query object I can manipulate, and then write something like query_object.where('column < 10').limit(10) or similar (columns and operators could also be part of the library; it may also have to consider existing WHERE clauses).
And finally get the resulting query string with str(query_object).
Is this something that can be achieved with any of the ORMs? I don't need the database connections to specific DB engines or the object mappings (although having them is not a limitation).
I've seen pypika, which allows one to create an SQL query from code, but it doesn't allow parsing an existing query and continuing from there.
I've also seen sqlparse, which allows me to parse an SQL query into tokens. But because it does not create a tree, it is non-trivial to add additional elements to an existing statement. (It is close to what I am looking for, if only it created an actual tree.)
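For what it's worth, sqlglot is one Python library that parses SQL into an expression tree and exposes builder methods on the parsed object, which is close to the workflow described above. A minimal sketch (names taken from the question; exact rendering may differ by sqlglot version):
import sqlglot

query_object = sqlglot.parse_one("SELECT * FROM my_table")
# Builder methods return a new expression, so reassign.
query_object = query_object.where("column < 10").limit(10)
print(query_object.sql())  # e.g. SELECT * FROM my_table WHERE column < 10 LIMIT 10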

How do I write my Django query when the WHERE clause is meant to have a function?

I'm using Django and Python 3.7 along with Postgres 9.5. I have a column of type text in my Postgres table, which records URLs for articles. I want to run a query that compares everything before the query string, e.g.
SELECT * FROM article where regexp_replace(url, '\?.*$', '') = :url_wo_query_info
but I'm not sure how to pull this off in Django. Normally, if I want to query straight up on just a URL, I could write
Article.objects.filter(url=url)
But I'm unsure how to do the above in Django's lingo because there is a more complicated function involved.
You can use Func together with F expressions to apply database functions to model fields. Your query would look like this in the Django ORM:
from django.db.models import F, Func, Value

Article.objects.annotate(
    processed_url=Func(
        F('url'),
        Value(r'\?.*$'), Value(''),
        function='regexp_replace',
    )
).filter(processed_url=url_wo_query_info)
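Note that the pattern is written as a raw string (r'\?.*$'): a plain string literal would trigger Python's invalid-escape-sequence warning for \?, and the raw string keeps the pattern exactly as Postgres should see it. regexp_replace is invoked simply by name via the function argument, so the same approach works for any database function Django has no built-in wrapper for.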

How to do general maths in sql query in django?

The following query I'd love to do in Django, ideally without iterating; I just want the database call to return the result of the query below. Unfortunately, according to the docs this doesn't seem to be possible; only general aggregate functions like Avg, Max, and Min are available. Currently I'm using Django 1.4, but I'm happy to rewrite things for Django 1.8 (hence the docs page; I've heard that 1.8 does a lot of these things much better than 1.4).
select sum(c.attr1 * fs.attr2)/ sum(c.attr1) from fancyStatistics as fs
left join superData s on fs.super_id=s.id
left join crazyData c on s.crazy_id=c.id;
Note:
The main reason for doing this in django directly is that if we ever want to change our database from MySQL to something more appropriate for django, it would be good not to have to rewrite all the queries.
You should be able to combine aggregates with F expressions to do most of what you want without dropping into raw SQL.
https://docs.djangoproject.com/en/1.8/topics/db/aggregation/#joins-and-aggregates
from django.db.models import F, FloatField, Sum

aggregate_dict = FancyStatistics.objects.aggregate(
    sum1=Sum(
        F('superdata__crazydata__attr1') * F('attr2'),
        output_field=FloatField(),
    ),
    sum2=Sum('superdata__crazydata__attr1'),
)
result = aggregate_dict['sum1'] / aggregate_dict['sum2']
You need to specify the output fields if the data types used are different.
Alternatively, you can run that query in Django directly using your SQL expression; check the docs on performing raw SQL queries.
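For completeness, a minimal raw-SQL version of that fallback, reusing the exact query from the question:
from django.db import connection

with connection.cursor() as cursor:
    cursor.execute("""
        SELECT sum(c.attr1 * fs.attr2) / sum(c.attr1)
        FROM fancyStatistics AS fs
        LEFT JOIN superData s ON fs.super_id = s.id
        LEFT JOIN crazyData c ON s.crazy_id = c.id
    """)
    result = cursor.fetchone()[0]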

Get All of Single Column from Every Table in Schema

In our system, we have 1000+ tables, each of which has a 'date' column containing a DateTime object. I want to get a list of every date that exists across all of the tables. I'm sure there should be an easy way to do this, but I have very limited knowledge of either PostgreSQL or SQLAlchemy.
In PostgreSQL, I can do a full join on two tables, but there doesn't seem to be a way to join every table in a schema on a single common field.
I then tried to solve this programmatically in Python with SQLAlchemy. For each table, I created a SELECT DISTINCT for the 'date' column, then set that list of selects as the selects property of a CompoundSelect object, and executed it. As one might expect from an ugly brute-force query, it has been running for an hour or so now, and I am unsure whether it has broken silently somewhere and will never return.
Is there a clean and better way to do this?
You definitely want to do this on the server, not at the application level, due to the many round trips between application and server and likely duplication of data in intermediate results.
Since you need to process 1,000+ tables, you should use the system catalogs and dynamically query the tables. You need a function to do that efficiently:
CREATE FUNCTION get_all_dates() RETURNS SETOF date AS $$
DECLARE
    tbl name;
BEGIN
    FOR tbl IN SELECT 'public.' || tablename FROM pg_tables WHERE schemaname = 'public' LOOP
        RETURN QUERY EXECUTE 'SELECT DISTINCT date::date FROM ' || tbl;
    END LOOP;
END; $$ LANGUAGE plpgsql;
This will process all the tables in the public schema; change as required. If the tables are spread across multiple schemas, you need to add your own logic for where the tables are stored, or you can make the schema name a parameter of the function, call it multiple times, and UNION the results.
Note that you may get duplicate dates from multiple tables. You can weed these duplicates out in the statement calling the function:
SELECT DISTINCT * FROM get_all_dates() ORDER BY 1;
The function creates a result set in memory, but if the number of distinct dates in the rows in the 1,000+ tables is very large, the results will be written to disk. If you expect this to happen, then you are probably better off creating a temporary table at the beginning of the function and inserting the dates into that temp table.
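Since the question mentions SQLAlchemy, here is a minimal sketch of calling the function from Python once it has been created in the database (the connection URL is a placeholder, not something from the thread):
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://user:password@localhost/mydb")  # placeholder DSN
with engine.connect() as conn:
    # The function does the per-table work server-side; this is one round trip.
    rows = conn.execute(text("SELECT DISTINCT * FROM get_all_dates() ORDER BY 1"))
    dates = [row[0] for row in rows]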
Ended up reverting to a previous solution of using SQLAlchemy to run the queries. This allowed me to parallelize things and run a little faster, since it really was a very large query.
I knew a few things about the dataset that helped with this query: I only wanted distinct dates from each table, and the dates were the PK in my set. I ended up using the approach from this wiki page. The code sent in the query looked like the following:
WITH RECURSIVE t AS (
    (SELECT date FROM schema.tablename ORDER BY date LIMIT 1)
    UNION ALL
    SELECT (SELECT date FROM schema.tablename WHERE date > t.date ORDER BY date LIMIT 1)
    FROM t WHERE t.date IS NOT NULL
)
SELECT date FROM t WHERE date IS NOT NULL;
I pulled the results of that query into a list of all my dates (skipping any that were already in the list), then saved that for later use. It possibly takes just as long as running it all in the psql console, but it was easier for me to save the results locally than to have to query a temp table in the db.

SQL query taking input from a file

I have the following SQL query:
SELECT * FROM users where id=X
Here 'X' is a set of values which need to be read from a file, say ~/ids.lst. I have the following approach in hand:
Read ids.lst and load all the values.
For each value, form and fire an SQL query, and concatenate the results.
I have working code in Python, but the problem is that if I have n ids, then n queries are made to the server. Is there any way I could achieve the same with a single query to the server?
You can use the IN operator to do this in a single query:
SELECT * FROM users WHERE id IN (<ids read from the file, joined with commas>)
This will be a single query, and faster compared to issuing individual queries.
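A minimal sketch in Python of building that single query with parameter placeholders, so the driver handles quoting (assumes a DB-API driver that uses %s placeholders, such as psycopg2 or MySQLdb, and an already-open cursor):
import os

# Read one id per line from ~/ids.lst.
with open(os.path.expanduser("~/ids.lst")) as f:
    ids = [int(line.strip()) for line in f if line.strip()]

# One placeholder per id; the driver substitutes the values safely.
placeholders = ", ".join(["%s"] * len(ids))
cursor.execute(f"SELECT * FROM users WHERE id IN ({placeholders})", ids)
rows = cursor.fetchall()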
