I worked on a PHP project earlier where prepared statements made the SELECT queries 20% faster.
I'm wondering if it works on Python? I can't seem to find anything that specifically says it does or does NOT.
Most languages provide a way to do generic parameterized statements, Python is no different. When a parameterized query is used databases that support preparing statements will automatically do so.
In python a parameterized query looks like this:
cursor.execute("SELECT FROM tablename WHERE fieldname = %s", [value])
The specific style of parameterization may be different depending on your driver, you can import your db module and then do a print yourmodule.paramstyle.
From PEP-249:
paramstyle
String constant stating the type of parameter marker
formatting expected by the interface. Possible values are
[2]:
'qmark' Question mark style,
e.g. '...WHERE name=?'
'numeric' Numeric, positional style,
e.g. '...WHERE name=:1'
'named' Named style,
e.g. '...WHERE name=:name'
'format' ANSI C printf format codes,
e.g. '...WHERE name=%s'
'pyformat' Python extended format codes,
e.g. '...WHERE name=%(name)s'
Direct answer, no it doesn't.
joshperry's answer is a good explanation of what it does instead.
From eugene y answer to a similar question,
Check the MySQLdb Package Comments:
"Parameterization" is done in MySQLdb by escaping strings and then blindly interpolating them into the query, instead of using the
MYSQL_STMT API. As a result unicode strings have to go through two
intermediate representations (encoded string, escaped encoded string)
before they're received by the database.
So the answer is: No, it doesn't.
After a quick look through an execute() method of a Cursor object of a MySQLdb package (a kind of de-facto package for integrating with mysql, I guess), it seems, that (at least by default) it only does string interpolation and quoting and not the actual parametrized query:
if args is not None:
query = query % db.literal(args)
If this isn't string interpolation, then what is?
In case of executemany it actually tries to execute the insert/replace as a single statement, as opposed to executing it in a loop. That's about it, no magic there, it seems. At least not in its default behaviour.
EDIT: Oh, I've just realized, that the modulo operator could be overriden, but I've felt like cheating and grepped the source. Didn't find an overriden mod anywhere, though.
For people just trying to figure this out, YES you can use prepared statements with Python and MySQL. Just use MySQL Connector/Python from MySQL itself and instantiate the right cursor:
https://dev.mysql.com/doc/connector-python/en/index.html
https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursorprepared.html
Using the SQL Interface as suggested by Amit can work if you're only concerned about performance. However, you then lose the protection against SQL injection that a native Python support for prepared statements could bring. Python 3 has modules that provide prepared statement support for PostgreSQL. For MySQL, "oursql" seems to provide true prepared statement support (not faked as in the other modules).
Not directly related, but this answer to another question at SO includes the syntax details of 'templated' queries. I'd say that the auto-escaping would be their most important feature...
As for performance, note the method executemany on cursor objects. It bundles up a number of queries and executes them all in one go, which does lead to better performance.
There is a Solution!
You can use them if you put them into a stored procedure on the server and call them like this from python...
cursor.callproc(Procedurename, args)
Here is a nice little tutorial on Stored procedures in mysql and python.
http://www.mysqltutorial.org/calling-mysql-stored-procedures-python/
Related
I took a slight peek behind the curtain at the MySQLdb python driver, and to my horror I saw it was simply escaping the parameters and putting them directly into the query string. I realize that escaping inputs should be fine in most cases, but coming from PHP, I have seen bugs where, given certain database character sets and versions of the MySQL driver, SQL injection was still possible.
This question had some incredibly detailed responses regarding the edge cases of string escaping in PHP, and has led me to the belief that prepared statements should be used whenever possible.
So then my questions are: Are there any known cases where the MySQLdb driver has been successfully exploited due to this? When a query needs to be run in a loop, say in the case of an incremental DB migration script, will this degrade performance? Are my concerns regarding escaped input fundamentally flawed?
I can't point to any known exploit cases, but I can say that yes, this is terrible.
The Python project calling itself MySQLdb is no longer maintained. It's been piling up unresolved Github issues since 2014, and just quickly looking at the source code, I can find more bugs not yet reported - for example, it's using a regex to parse queries in execute_many, leading it to mishandle any query with the string " values()" in it where values isn't the keyword.
Instead of MySQLdb, you should be using MySQL's Connector/Python. That is still maintained, and it's hopefully less terrible than MySQLdb. (Hopefully. I didn't check that one.)
...prepared statements should be used whenever possible.
Yes. That's the best advice. If the prepared statement system is broken there will be klaxons blaring from the rooftops and everyone in the Python world will pounce on the problem to fix it. If there's a mistake in your own code that doesn't use prepared statements you're on your own.
What you're seeing in the driver is probably prepared statement emulation, that is the driver is responsible for inserting data into the placeholders and forwarding the final, composed statement to the server. This is done for various reasons, some historical, some to do with compatibility.
Drivers are generally given a lot of serious scrutiny as they're the foundation of most systems. If there is a security bug in there then there's a lot of people that are going to be impacted by it, the stakes are very high.
The difference between using prepared statements with placeholder values and your own interpolated code is massive even if behind the scenes the same thing happens. This is because the driver, by design, always escapes your data. Your code might not, you may omit the escaping on one value and then you have a catastrophic hole.
Use placeholder values like your life depends on it, because it very well might. You do not want to wake up to a phone call or email one day saying your site got hacked and now your database is floating around on the internet.
Using prepared statements is faster than concatenating a query. The database can precompile the statement, so only the parameters will be changed when iterating in a loop.
I have a painted-myself-into-a-corner question that hopefully has a sensible solution I'm overlooking. I had a Python project using sqlite3, which I like a lot and use all the time, and I wanted to try to also support running it on postgres, in case scaling becomes an issue.
Some initial research suggested that there wasn't really a single de facto Python database abstraction layer (hopefully I didn't get this wrong), but psycopg2 fortunately seemed to have very similar structure and methods to sqlite3, and I was able to get away with only adding a couple helper functions and switch cases to my existing code to allow it to support both database libraries with the same queries.
The only exception, unbelievably enough, is the replacement character for variables; sqlite3 needs ? and psycopg2 needs %s. These are probably inherent to sqlite and postgres themselves for all I know.
This means that a function like this:
cur.execute("INSERT INTO repositories (repository_url, repository_name, repository_type, repository_thumbnail, last_crawl_timestamp, item_url_pattern) VALUES (%s,%s,%s,%s,%s,%s)", (repository_url, repository_name, repository_type, repository_thumbnail, time.time(), item_url_pattern))
Will only work for postgres, and if I change the %s's, to ?'s, it'll only work for sqlite. This defies any kind of elegant solution -- I don't really want to rig up some kind of string replacement to construct my queries, as that'll get dumb pretty quickly -- and mostly I'm just astonished that this has turned out to be my blocker.
Any thoughts?
The API in use by both implementations is the Python Database API Specification v2.0, documented in PEP 249. The module global paramstyle tells you what style of parameters a particular implementation expects. The possible values and meanings are documented here.
I'm quite new to Python and Flask, and while working through the examples, couldn't help noticing cursors. Before this I programmed in PHP, where I never needed cursors. So I got to wondering: What are cursors and why are they used so much in these code examples?
But no matter where I turned, I saw no clear verdict and lots of warnings:
Wikipedia: "Fetching a row from the cursor may result in a network round trip each time", and "Cursors allocate resources on the server, such as locks, packages, processes, and temporary storage."
StackOverflow: See the answer by AndreasT.
The Island of Misfit Cursors: "A good developer is never reluctant to use a tool only because it's often misused by others."
And to top it all, I learned that MySQL does NOT support cursors!
It looks like the only code that doesn't use cursors in the mysqlclient library is the _msql module, and the author repeatedly warns not to use it for compatibility reasons: "If you want to write applications which are portable across databases, use MySQLdb, and avoid using this module directly."
Well, I hope I have explained and supported my dilemma sufficiently well. Here are two big questions troubling me:
Since MySQL doesn't support cursors, what's the whole point of building the entire thing on a Cursor class hierarchy?
Why aren't cursors optional in mysqlclient?
Your are confusing database-engine level cursors and Python db-api cursors. The second ones only exists at the Python code level and are not necessarily tied to database-level ones.
At the Python level, cursors are a way to encapsulate a query and it's results. This abstraction level allow to have a simple, usable and common api for different vendors. Whether the actual implementation for a given vendor relies on database-level cursors or not is a totally different problem.
To make a long story short: there are two distinct concepts here:
database (server) cursors, a feature that exists in some but not all SQL engines
db api (client) cursors (as defined in pep 249), which are used to execute a query and eventually fetch the results.
db api cursors are named so because they conceptually have some similarity with database cursors but are technically totally unrelated.
As to why mysqlclient works this way, it's plain simple: it implements pep 249, which is the community-defined API for Python SQL databases clients.
What is the most reliable way to determine what statements are "querying" versus "modifying"? For example, SELECT versus UPDATE / INSERT / CREATE.
Parsing the statement myself seems the obvious first attempt, but I can't help but think that this would be a flaky solution. Just looking for SELECT at the beginning doesn't work, as PRAGMA can also return results, and I'm sure there are a multitude of ways that strategy could fail. Testing for zero rows returned from the cursor doesn't work either, as a SELECT can obviously return zero results.
I'm working with SQLite via the Python sqlite3 module.
Use the sqlite3_changes API call, which is also available from SQL using the changes function.
As TokenMacGuy mentioned, you can rollback the transaction containing the statement that caused the changes; the sqlite3_changes function will let you know if that is necessary.
There is also the update_hook callback if you need more fine grained information abouth the tables and rows affected.
Take this simple C# LINQ query, and imagine that db.Numbers is an SQL table with one column Number:
var result =
from n in db.Numbers
where n.Number < 5
select n.Number;
This will run very efficiently in C#, because it generates an SQL query something like
select Number from Numbers where Number < 5
What it doesn't do is select all the numbers from the database, and then filter them in C#, as it might appear to do at first.
Python supports a similar syntax:
result = [n.Number for n in Numbers if n.Number < 5]
But it the if clause here does the filtering on the client side, rather than the server side, which is much less efficient.
Is there something as efficient as LINQ in Python? (I'm currently evaluating Python vs. IronPython vs. Boo, so an answer that works in any of those languages is fine.)
sqlsoup in sqlalchemy gives you the quickest solution in python I think if you want a clear(ish) one liner . Look at the page to see.
It should be something like...
result = [n.Number for n in db.Numbers.filter(db.Numbers.Number < 5).all()]
LINQ is a language feature of C# and VB.NET. It is a special syntax recognized by the compiler and treated specially. It is also dependent on another language feature called expression trees.
Expression trees are a little different in that they are not special syntax. They are written just like any other class instantiation, but the compiler does treat them specially under the covers by turning a lambda into an instantiation of a run-time abstract syntax tree. These can be manipulated at run-time to produce a command in another language (i.e. SQL).
The C# and VB.NET compilers take LINQ syntax, and turn it into lambdas, then pass those into expression tree instantiations. Then there are a bunch of framework classes that manipulate these trees to produce SQL. You can also find other libraries, both MS-produced and third party, that offer "LINQ providers", which basically pop a different AST processer in to produce something from the LINQ other than SQL.
So one obstacle to doing these things in another language is the question whether they support run-time AST building/manipulation. I don't know whether any implementations of Python or Boo do, but I haven't heard of any such features.
Look closely at SQLAlchemy. This can probably do much of what you want. It gives you Python syntax for plain-old SQL that runs on the server.
I believe that when IronPython 2.0 is complete, it will have LINQ support (see this thread for some example discussion). Right now you should be able to write something like:
Queryable.Select(Queryable.Where(someInputSequence, somePredicate), someFuncThatReturnsTheSequenceElement)
Something better might have made it into IronPython 2.0b4 - there's a lot of current discussion about how naming conflicts were handled.
Boo supports list generator expressions using the same syntax as python. For more information on that, check out the Boo documentation on Generator expressions and List comprehensions.
A key factor for LINQ is the ability of the compiler to generate expression trees.
I am using a macro in Nemerle that converts a given Nemerle expression into an Expression tree object.
I can then pass this to the Where/Select/etc extension methods on IQueryables.
It's not quite the syntax of C# and VB, but it's close enough for me.
I got the Nemerle macro via a link on this post:
http://groups.google.com/group/nemerle-dev/browse_thread/thread/99b9dcfe204a578e
It should be possible to create a similar macro for Boo. It's quite a bit of work however, given the large set of possible expressions you need to support.
Ayende has given a proof of concept here:
http://ayende.com/Blog/archive/2008/08/05/Ugly-Linq.aspx