I am kind of new to sql and database & currently developing website in django framework.
During my reading of django documentation I have read about raw sql queries which are executed using Manager.raw() like below.
for p in Person.objects.raw('SELECT * FROM myapp_person'):
Manager.raw(raw_query, params=None, translations=None)
How does raw queries differes from normal sql queries & when should I use raw sql queries instead of Django ORM ?
Django (like other similar ORM tools) is a connection between relational databases and object-oriented programming. One of the very important functions that it implements is providing a uniform interface to the database -- regardless of the underlying database.
When you use underlying Django functionality, the code should be supported on any database (there may be specific limits on this). This makes it particularly easy to port to another database. It also helps ensure that the generated queries do what you intend.
When you use raw SQL, the code is likely to be specific to one database (creating a porting problem). The code is also not checked, which can result in hard-to-understand errors.
I have a strong preference for using SQL directly -- but that is because I am not a programmer using an ORM framework. If you are going to use such a framework, it is probably better to use the built-in functionality wherever possible.
This is a borderline opinion question so might get flagged, but it is a good point. Essentially the raw SQL queries are intended to only be used for the edge cases where the Django ORM does not fulfil your needs (and with each new version of Django it support more and more query types so raw becomes less useful).
In general I would suggest using the ORM for the more helpful error messages, maintainability, and plain ease of use, and only use raw as a last-resort
Related
I am new to working with databases and couldn't find any relevant answers for this.
What are the uses of SQLAlchemy over MYSQL CONNECTOR for python.
I do not have much experience with MYSQL CONNECTOR for Python. However, from what I know SQLAlchemy primarily uses ORM (Object-Relational Mapping) in order to abstract the details of handling the database. This can help avoid errors some times (and also introduce possibly introduce others). You might want to have a look at the ORM technique and see if it is for you (but don't use it as a way to avoid learning SQL). Generally, ORMs tend not to be as scalable as raw SQL either.
I am also a newbie. In my understanding SQLAlchemy is an ORM (Object-Relational Mapping) that allows you to abstract the database and query data from the DB more easily in your coding language treating query data as another object. Pros is that that you can more easily switch your DB under the hood. But it has some learning curve.
Whereas MySQL Connector is "just" a plain simple direct connection to the DBMS at your database and you write SQL queries to get the data.
For now I am sticking with the mysql connector to just train SQL queries more. But later on I will definitely test out SQLAlchemy.
I'm learning Django and its ORM data access methodology and there is something that I'm curious about. In one particular endpoint, I'm making a number of database calls (to Postgres) - below is an example of one:
projects = Project.objects\
.filter(Q(first_appointment_scheduled=True) | (Q(active=True) & Q(phase=ProjectPhase.meet.value)))\
.select_related('customer__first_name', 'customer__last_name',
'lead_designer__user__first_name', 'lead_designer__user__last_name')\
.values('id')\
.annotate(project=F('name'),
buyer=Concat(F('customer__first_name'), Value(' '), F('customer__last_name')),
designer=Concat(F('lead_designer__user__first_name'), Value(' '), F('lead_designer__user__last_name')),
created=F('created_at'),
meeting=F('first_appointment_date'))\
.order_by('id')[:QUERY_SIZE]
As you can see, that's not a small query - I'm pulling in a lot of specific, related data and doing some string manipulation. I'm relatively concerned with performance so I'm doing the best I can to make things more efficient by using select_related() and values() to only get exactly what I need.
The question I have is, conceptually and in broad terms, at what point does it become faster to just write my queries using parameterized SQL instead of using the ORM (since the ORM has to first "translate" the above "mess")? At what approximate level of query complexity should I switch over to raw SQL?
Any insight would be helpful. Thanks!
The question I have is, conceptually and in broad terms, at what point
does it become faster to just write my queries using parameterized SQL
instead of using the ORM (since the ORM has to first "translate" the
above "mess")?
If you are asking about performance, Never.
The time taken to convert the ORM query into SQL will be very small compared to the time taken to actually execute that query. Brain cells are irreplaceable, servers are cheap.
If you are really do have performance issues the first place to look at is the your indexes in your models. Try printing out each of the queries generated by the ORM and run them in your psql console by prefixing EXPLAIN ANALYSE.
You can also use the django-debug-toolbar to automate this. In fact django-debug toolbar is an essential tool to hunt down bottlenecks. You will be surprised to note how often you have missed a simple select_related and how that causes hundreds of additional queries to be executed.
At what approximate level of query complexity should I switch over to
raw SQL?
if you are asking about the ease of coding, it depends.
If the query is very very hard to write using the ORM and it's unreadable, yes, then it's perfectly fine to use a raw query. For example a query that has multiple aggregations, uses common table expressions, multiple joins etc can sometimes be hard to write as an ORM query, in that case if you are comfortable with raw sql writing it that way is fine.
Agreed with what #e4c5 said .
Additional translation layer for converting an ORM query to raw SQL query will effect performance.
However, this effect will depend on how much complex your query is?
When you use ORM, you can control the load on DB by increasing the processing in the app. In addition, this gives the opportunity to cache the result in the application itself.
At last, It totally depends on your schema , how complex your queries can be and how are you scaling your DB(Indices, replicas etc .)
For more read here
i'm working on a project (written in Django) which has only a few entities, but many rows for each entity.
In my application i have several static "reports", directly written in plain SQL. The users can also search the database via a generic filter form. Since the target audience is really tech-savvy and at some point the filter doesn't fit their needs, i think about creating a query language for my database like YQL or Jira's advanced search.
I found http://sourceforge.net/projects/littletable/ and http://www.quicksort.co.uk/DeeDoc.html, but it seems that they only operate on in-memory objects. Since the database can be too large for holding it in-memory, i would prefer that the query is translated in SQL (or better a Django query) before doing the actual work.
Are there any library or best practices on how to do this?
Writing such a DSL is actually surprisingly easy with PLY, and what ho—there's already an example available for doing just what you want, in Django. You see, Django has this fancy thing called a Q object which make the Django querying side of things fairly easy.
At DjangoCon EU 2012, Matthieu Amiguet gave a session entitled Implementing Domain-specific Languages in Django Applications in which he went through the process, right down to implementing such a DSL as you desire. His slides, which include all you need, are available on his website. The final code (linked to from the last slide, anyway) is available at http://www.matthieuamiguet.ch/media/misc/djangocon2012/resources/compiler.html.
Reinout van Rees also produced some good comments on that session. (He normally does!) These cover a little of the missing context.
You see in there something very similar to YQL and JQL in the examples given:
groups__name="XXX" AND NOT groups__name="YYY"
(modified > 1/4/2011 OR NOT state__name="OK") AND groups__name="XXX"
It can also be tweaked very easily; for example, you might want to use groups.name rather than groups__name (I would). This modification could be made fairly trivially (allow . in the FIELD token, by modifying t_FIELD, and then replacing . with __ before constructing the Q object in p_expression_ID).
So, that satisfies simple querying; it also gives you a good starting point should you wish to make a more complex DSL.
I've faced exactly this problem - a large database which needs searching. I made some static reports and several fancy filters using django (very easy with django) just like you have.
However the power users were clamouring for more. I decided that there already was a DSL that they all knew - SQL. The question was how to make it secure enough.
So I used django permissions to give the power users permission to make SQL queries in a new table. I then made a view for the not-quite-so-power users to use these queries. I made them take optional parameters. The queries were run using Python's lower level DB-API which django is using under the hood for its ORM anyway.
The real trick was opening a read only database connection to run these queries just to make sure that no updates were ever run. I made a read only connection by creating a different user in the database with lower permissions and opening a specific connection for that in the view.
TL;DR - SQL is the way to go!
Depending on the form of your data, the types of queries your users need to use, and the frequency that your data is updated, an alternative to the pure SQL solution suggested by Nick Craig-Wood is to index your data in Solr and then run queries against it.
Solr is an added layer of complexity (configuration, data synchronization) but it is super-fast, can handle large datasets, and provides a (relatively) intuitive query language.
You could write your own SQL-ish language using pyparsing, actually. There is even pretty verbose example you could extend.
I have a script with several functions that all need to make database calls. I'm trying to get better at writing clean code rather than just throwing together scripts with horrible style. What is generally considered the best way to establish a global database connection that can be accessed anywhere in the script but is not susceptible to errors such as accidentally redefining the variable holding a connection. I'd imagine I should be putting everything in a module? Any links to actual code would be very useful as well. Thanks.
If you are working with Python and databases, you cannot afford not to look at SQLAlchemy:
SQLAlchemy is the Python SQL toolkit
and Object Relational Mapper that
gives application developers the full
power and flexibility of SQL.
It provides a full suite of well known
enterprise-level persistence patterns,
designed for efficient and
high-performing database access,
adapted into a simple and Pythonic
domain language.
I have built very complex databases with a surprisingly small amount of code (a few hundred lines). The schema definition is almost self-documenting, the objects used for the Object Relational Mapper are Plain Old Python Objects (i.e., what you already have), and the querying API is almost obvious. In addition, the documentation is excellent: many online examples, fully documented API, and an O'Reilly book which, while far from perfect, does take you from zero to dangerous in a few evenings.
If you don't want to use the Object Relational Mapper, you can always fall back to plain connections and literal SQL. Also, the code is portable and database independent (the same code will work with MySQL, Oracle, SQLite, and other database managers).
The Session object will automatically take care of the pooling (what you mention as your concern).
The best way to understand its power is probably to follow the tutorials obtained in the first result page of the Google query sqlalchemy tutorial.
Use a model system/ORM system.
Due to performance reasons I can't use the ORM query methods of Django and I have to use raw SQL for some complex questions. I want to find a way to map the results of a SQL query to several models.
I know I can use the following statement to map the query results to one model, but I can't figure how to use it to be able to map to related models (like I can do by using the select_related statement in Django).
model_instance = MyModel(**dict(zip(field_names, row_data)))
Is there a relatively easy way to be able to map fields of related tables that are also in the query result set?
First, can you prove the ORM is stopping your performance? Sometimes performance problems are simply poor database design, or improper indexes. Usually this comes from trying to force-fit Django's ORM onto a legacy database design. Stored procedures and triggers can have adverse impact on performance -- especially when working with Django where the trigger code is expected to be in the Python model code.
Sometimes poor performance is an application issue. This includes needless order-by operations being done in the database.
The most common performance problem is an application that "over-fetches" data. Casually using the .all() method and creating large in-memory collections. This will crush performance. The Django query sets have to be touched as little as possible so that the query set iterator is given to the template for display.
Once you choose to bypass the ORM, you have to fight out the Object-Relational Impedance Mismatch problem. Again. Specifically, relational "navigation" has no concept of "related": it has to be a first-class fetch of a relational set using foreign keys. To assemble a complex in-memory object model via SQL is simply hard. Circular references make this very hard; resolving FK's into collections is hard.
If you're going to use raw SQL, you have two choices.
Eschew "select related" -- it doesn't exist -- and it's painful to implement.
Invent your own ORM-like "select related" features. A common approach is to add stateful getters that (a) check a private cache to see if they've fetched the related object and if the object doesn't exist, (b) fetch the related object from the database and update the cache.
In the process of inventing your own stateful getters, you'll be reinventing Django's, and you'll probably discover that it isn't the ORM layer, but a database design or an application design issue.