My query-string parameters arrive as a dict, which I use to filter data in the WHERE clause.
parameters = {
    "manufacturerId": "1",
    "fileName": "abc1234 ",
    "categoryName": "normal"
}
And the SQL query is:
fileSql = """select * from file_table as a
left join category_table as b
on a.fId = b.fId
left join manufacturer_table as c
on c.mId = a.mId
where c.manufacturerId = %(manufacturerId)s and
a.file_name = %(fileName)s and
b.name = %(categoryName)s ;"""
cursor.execute(fileSql, parameters)
This works well: the parameterized query binds each dict value to the SQL query by key.
But this approach is not flexible. If my query string changes to
{
    "manufacturerId": "1",
    "fileName": "abc1234 "
}
then the code fails, because the query still expects categoryName.
Only manufacturerId is required; the other key-value pairs are optional and further filter the results.
How can I optimize the code?
The simple, obvious answer is to build your query dynamically, i.e.:
fileSql = """
select * from file_table as a
left join category_table as b on a.fId = b.fId
left join manufacturer_table as c on c.mId = a.mId
where c.manufacturerId = %(manufacturerId)s
"""
if "fileName" in parameters:
    fileSql += " and a.file_name = %(fileName)s "
if "categoryName" in parameters:
    fileSql += " and b.name = %(categoryName)s "
Note that this is still not optimal, since we keep the join on category_table even when we don't need it. This can be solved in a similar way by dynamically building the "from" clause too, and that's OK if you only have a couple of such cases in your project. But most database-driven apps require a lot of dynamic queries, and building them by hand using plain strings quickly becomes tedious and error-prone, so you may want to check what an ORM (Peewee comes to mind) can do for you.
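Building the "from" clause dynamically as well might look roughly like the sketch below. The helper build_file_query and the OPTIONAL_FILTERS mapping are hypothetical names introduced for illustration; the table and column names come from the question, and the %(name)s placeholders assume a driver that uses the pyformat parameter style.

```python
# Hypothetical mapping: optional parameter -> (extra join or None, WHERE condition).
OPTIONAL_FILTERS = {
    "fileName": (None, "a.file_name = %(fileName)s"),
    "categoryName": (
        "left join category_table as b on a.fId = b.fId",
        "b.name = %(categoryName)s",
    ),
}


def build_file_query(parameters):
    """Return (sql, params), adding joins and conditions only for keys that are present."""
    if "manufacturerId" not in parameters:
        raise ValueError("manufacturerId is required")
    joins = ["left join manufacturer_table as c on c.mId = a.mId"]
    conditions = ["c.manufacturerId = %(manufacturerId)s"]
    for key, (join, condition) in OPTIONAL_FILTERS.items():
        if key in parameters:
            if join:
                joins.append(join)
            conditions.append(condition)
    sql = "select * from file_table as a\n{}\nwhere {}".format(
        "\n".join(joins), "\n  and ".join(conditions)
    )
    # Hand the driver only the keys the query actually references.
    known = ("manufacturerId", *OPTIONAL_FILTERS)
    return sql, {k: parameters[k] for k in known if k in parameters}


sql, params = build_file_query({"manufacturerId": "1", "fileName": "abc1234"})
```

With a real cursor the result would be passed on as cursor.execute(sql, params). Only placeholders are ever concatenated into the SQL text; the values themselves stay in the parameter dict.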
Related
I have inherited some code similar to the example below. I understand the concept of passing values to make a query dynamic (in this case field_id), but I don't understand the benefit of taking the passed-in field_id list and putting it into a dictionary, parameters = {"logical_field_id": field_id}, before accessing the newly created dictionary to build the SQL statement. Along the same lines, why return parameters=parameters rather than just listing parameters in the return? I assume this is all to make the request more secure, but I would like to better understand why and how, as I need to take on a similar task for the slightly more complex query further below.
def get_related_art(self, field_id):
    parameters = {"logical_field_id": field_id}
    sql = (
        "SELECT a.id AS id,"
        " a.name AS name,"
        " a.description AS description,"
        " a.type AS type,"
        " a.subtype AS subtype "
        " FROM ArtclTbl AS a INNER JOIN ("
        " SELECT article_id AS id FROM LogFldArtclTbl"
        " WHERE logical_field_id = %(logical_field_id)s"
        " ORDER BY a.name"
    )
    return self.query(sql, parameters=parameters)
My reason for asking this question is that I was asked to parameterize this:
def get_group_fields(self, ebytes=None):
    parameters = {}
    where_clause = (
        f"WHERE eig_eb.ebyte in ({', '.join(str(e) for e in ebytes)})" if ebytes else ""
    )
    sql = (
        "SELECT l.id AS id, "
        " eig_eb.ebyte AS ebyte, "
        " eig.id AS instrument_group_id, "
        " eig_lf.relationship_type AS relationship "
        ....
        f" {where_clause}"
    )
I started to modify the code to build the value when setting the parameters and then to access that value in the original location. This 'works', except that the query string now contains ([ebyte1, ebyte2] instead of (ebyte1, ebyte2). I could modify the string to work around this, but I really wanted to understand the why of this first.
parameters = {"exbytes": ', '.join(str(e) for e in exbytes)}
...
where_clause = (
    f"WHERE eig_eb.exbyte in " + str(exbytes) if exbytes else ""
)
The benefit of using named parameter placeholders is so you can pass the parameter values as a dict, and you can add values to that dict in any order. There's no benefit in the first example you show, because you only have one entry in the dict.
There's no benefit in the second example either, because the parameters are part of an IN() list, and there are no other parameterized parts of the query. The order of values in an IN() list is irrelevant. So you could just use positional parameters instead of named parameters.
where_clause = (
    f"WHERE eig_eb.ebyte in ({', '.join('%s' for e in ebytes)})" if ebytes else ""
)
Then you don't need a dict at all, you can just pass the ebytes list as the parameters.
Using the syntax parameters=parameters looks like a usage of keyword arguments to a Python function. I don't know the function self.query() in your example, but I suppose it accepts keyword arguments to implement optional arguments. The fact that your local variable is the same name as the keyword argument name is a coincidence.
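Returning to the IN() list, the positional-placeholder approach described above could be wrapped in a small helper along these lines. This is a sketch only: build_ebyte_filter is a hypothetical name, and the %s placeholder style assumes the same driver as in the question.

```python
def build_ebyte_filter(ebytes=None):
    """Return (where_clause, params) for an optional IN () filter."""
    if not ebytes:
        # No filter requested: empty clause, nothing to bind.
        return "", []
    # One %s per value; the values themselves never enter the SQL text.
    placeholders = ", ".join("%s" for _ in ebytes)
    return f"WHERE eig_eb.ebyte in ({placeholders})", list(ebytes)


where_clause, params = build_ebyte_filter([3, 7])
```

The returned list can then be passed as the query parameters as-is, with no dict involved.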
While reading this question: SQL Multiple Updates vs single Update performance
I was wondering how I could dynamically implement an update for several variables at the same time using a connector like MariaDB's. Reading the official documentation, I did not find anything similar.
This question is similar, and it has helped me to understand how to use parametrized queries with custom connectors but it does not answer my question.
Let's suppose that, from one of the views of the project, we receive a dictionary.
This dictionary has the following structure (simplified example):
{'form-0-input_file_name': 'nofilename', 'form-0-id': 'K0944', 'form-0-gene': 'GJXX', 'form-0-mutation': 'NM_0040(p.Y136*)', 'form-0-trix': 'ZSSS4'}
Assuming that each key in the dictionary corresponds to a column in a table of the database, if I'm not mistaken we would have to iterate over the dictionary and build the query in each iteration.
Something like this (semi pseudo-code, probably it's not correct):
query = "UPDATE `db-dummy`.info "
for key in a_dict:
query += "SET key = a_dict[key]"
It is not clear to me how to construct said query within a loop.
What is the most pythonic way to achieve this?
Although this could work:
query = "UPDATE `db-dummy`.info SET "
for index, key in enumerate(a_dict):
    query = query + (", " if index != 0 else "") + "{0} = '{1}'".format(key, a_dict[key])
you should consider parameterized queries for safety and security. Moreover, a dynamic dictionary raises other concerns: it is best to verify or filter the keys against an agreed set before attempting such an operation.
query = "UPDATE `db-dummy`.info SET "
for index, key in enumerate(a_dict):
    # SET appears once; each column contributes "name = ?"
    query = query + (", " if index != 0 else "") + "{0} = ?".format(key)
# Then execute with your connection/cursor
cursor.execute(query, tuple(a_dict.values()))
This is what I did (inspired by @ggordon's answer):
query = "UPDATE `db-dummy`.info "
for index, key in enumerate(a_dict):
    if index == 0:
        query = query + "SET {0} = ?".format(key)
    else:
        query = query + ", {0} = ?".format(key)
query += " WHERE record_id = " + record_id
And it works!
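Putting the two suggestions together (an agreed set of keys plus placeholders, including one for the WHERE value), a minimal sketch could look like the following. ALLOWED_COLUMNS, build_update and the in-memory SQLite table are assumptions made for illustration; SQLite's ? placeholder style matches the one used above.

```python
import sqlite3

# Assumed whitelist of updatable columns (hypothetical).
ALLOWED_COLUMNS = {"input_file_name", "gene", "mutation", "trix"}


def build_update(table, values, record_id):
    """Return (sql, params); `table` must come from trusted code, not user input."""
    columns = [c for c in values if c in ALLOWED_COLUMNS]
    if not columns:
        raise ValueError("nothing to update")
    assignments = ", ".join(f"{c} = ?" for c in columns)
    sql = f"UPDATE {table} SET {assignments} WHERE record_id = ?"
    return sql, [values[c] for c in columns] + [record_id]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE info (record_id TEXT, input_file_name TEXT, gene TEXT, mutation TEXT, trix TEXT)"
)
conn.execute("INSERT INTO info (record_id) VALUES ('K0944')")

# The unexpected key is dropped by the whitelist instead of reaching the SQL text.
sql, params = build_update("info", {"gene": "GJXX", "trix": "ZSSS4", "bogus; --": 1}, "K0944")
conn.execute(sql, params)
row = conn.execute("SELECT gene, trix FROM info WHERE record_id = ?", ("K0944",)).fetchone()
```

Column names cannot be bound as parameters, which is why they are checked against the whitelist while every value goes through a placeholder.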
All I want to do is send a query like
SELECT * FROM table WHERE col IN (110, 130, 90);
So I prepared the following statement
SELECT * FROM table WHERE col IN (:LST);
Then I use
sqlite3_bind_text(stmt, 1, "110, 130, 90", -1, SQLITE_STATIC);
Unfortunately this becomes
SELECT * FROM table WHERE col IN ('110, 130, 90');
and is useless (note the two additional single quotes). I already tried putting extra ' in the string but they get escaped. I didn't find an option to turn off the escaping or prevent the text from being enclosed by single quotes. The last thing I can think of is not using a prepared statement, but I'd only take it as last option. Do you have any ideas or suggestions?
Thanks
Edit:
The number of parameters is dynamic, so it might be three numbers, as in the example above, one or twelve.
You can dynamically build a parameterized SQL statement of the form
SELECT * FROM TABLE WHERE col IN (?, ?, ?)
and then call sqlite3_bind_int once for each "?" you added to the statement.
There is no way to directly bind a text parameter to multiple integer (or, for that matter, multiple text) parameters.
Here's pseudo code for what I have in mind:
-- Args is an array of parameter values
for i = Lo(Args) to Hi(Args)
    paramlist = paramlist + ', ?'
-- skip the leading ', ' (the first two characters of paramlist)
sql = 'SELECT * FROM TABLE WHERE col IN (' + Mid(paramlist, 3) + ')'
for i = Lo(Args) to Hi(Args)
    sql_bind_int(sql, i, Args[i])
-- execute query here.
I just faced this question myself, but answered it by creating a temporary table and inserting all the values into that, so that I could then do:
SELECT * FROM TABLE WHERE col IN (SELECT col FROM temporarytable);
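For illustration, the temporary-table approach could be sketched with Python's sqlite3 module as follows. The table names items and wanted_values are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (col INTEGER)")
conn.executemany("INSERT INTO items VALUES (?)", [(90,), (100,), (110,), (130,)])

wanted = [110, 130, 90]

# Load the values into a temporary table, one bound parameter per row.
conn.execute("CREATE TEMP TABLE wanted_values (col INTEGER)")
conn.executemany("INSERT INTO wanted_values VALUES (?)", [(v,) for v in wanted])

rows = conn.execute(
    "SELECT col FROM items WHERE col IN (SELECT col FROM wanted_values) ORDER BY col"
).fetchall()

# Clean up so the next query can reuse the name.
conn.execute("DROP TABLE wanted_values")
```

The query text stays fixed regardless of how many values there are, at the cost of the extra inserts.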
Even simpler, build your query like this:
"SELECT * FROM TABLE WHERE col IN (" + ",".join(["?"] * len(lst)) + ")"
Depending on your build of SQLite (carray is not part of the default build), you may be able to use:
SELECT * FROM table WHERE col IN carray(?42);
and then bind ?42 using (assuming the C API):
int32_t data[] = {110, 130, 90};
sqlite3_carray_bind(
stmtPtr, 42,
data, sizeof(data)/sizeof(data[0]),
CARRAY_INT32, SQLITE_TRANSIENT);
I haven't actually tested that, I just read the docs: https://sqlite.org/carray.html
You cannot pass an array as one parameter, but you can pass each array value as a separate parameter (IN (?, ?, ?)).
The safe way to do this for dynamic number parameters (you should not use string concatenation, .format(), etc. to insert the values themselves into the query, it can lead to SQL injections) is to generate the query string with the needed number of ? placeholders and then bind the array elements. Use array concatenation or spread syntax (* or ... in most languages) if you need to pass other parameters too.
Here is an example for Python 3:
c.execute('SELECT * FROM TABLE WHERE col IN ({}) LIMIT ?'
.format(', '.join(['?'] * len(values))), [*values, limit])
One solution (which I haven't tried yet in code, but only on the SQLite shell) is to use json_each function from SQLite.
So you could do something like:
SELECT * FROM table
WHERE col IN (SELECT value FROM json_each(?));
The caveat is that you'd have to manually assemble a valid JSON array with the values you're trying to bind.
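From Python, the JSON array can be assembled with json.dumps rather than by hand. The sketch below assumes the SQLite library behind the sqlite3 module was built with the JSON functions (json_each), which is not guaranteed for every build; the items table is made up for the example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (col INTEGER)")
conn.executemany("INSERT INTO items VALUES (?)", [(90,), (100,), (110,), (130,)])

values = [110, 130, 90]

# The whole list travels as ONE bound text parameter containing a JSON array.
rows = conn.execute(
    "SELECT col FROM items WHERE col IN (SELECT value FROM json_each(?)) ORDER BY col",
    (json.dumps(values),),
).fetchall()
```

Because the statement has a single placeholder, the same prepared statement works for any number of values.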
A much simpler and safer answer is to generate only the mask (the placeholders, as opposed to the data part of the query) and let the driver's parameter binding guard against SQL injection.
Suppose we have some ids in an array, and some cb callback:
/* we need to generate a '?' for each item in our mask */
const mask = Array(ids.length).fill('?').join();
db.get(`
SELECT *
FROM films f
WHERE f.id
IN (${mask})
`, ids, cb);
Working on the same functionality led me to this approach:
(nodejs, es6, Promise)
var deleteRecords = function (tblName, data) {
return new Promise((resolve, reject) => {
var jdata = JSON.stringify(data);
this.run(`DELETE FROM ${tblName} WHERE id IN (?)`, jdata.substr(1, jdata.length - 2), function (err) {
err ? reject('deleteRecords failed with : ' + err) : resolve();
});
});
};
This works fine as well (JavaScript ES6):
let myList = [1, 2, 3];
`SELECT * FROM table WHERE col IN (${myList.join()});`
You can try this
RSQLite in R:
lst <- c("a", "b", "c")
dbGetQuery(db_con, paste0("SELECT * FROM table WHERE col IN (", paste0(shQuote(lst), collapse=", ") , ");"))
My solution for node (ES6, Promises):
let records = await db.all(`
SELECT * FROM table
WHERE (column1 = ?) and column2 IN ( ${[...val2s].fill('?').join(',')} )
`, [val1, ...val2s])
Works with a variable number of possible values.
This uses sqlite-async but you can modify it for the callback style version trivially.
If you are using Python, the easiest way to handle this in practice is to create a local function that tests against a string value of the list (which can be passed as a bind variable).
I used this when providing "Query By Example" functionality in a Python GUI app.
pros:
- can use a common approach in parsing and building the SQL across entries, as I would when parsing LIKE xxx and > xxx etc.
- just one extra call to set the function up, either at connection time or if the function call is detected in the created SQL
cons:
- function needs to parse the string list for each row, which is bad if the query is running against a large table
- embedded commas, blanks and other similar stuff may be difficult to handle
For example
user enters IN 18C, 356, 013 into Account field in application
application creates sql with ... WHERE inz( Account , ? ) ...
application creates string bind value 18C, 356, 013
application issues <sqlite3.connection>.create_function("inz", 2, inz ) to bind local python function inz (see below) to sqlite function inz.
application issues query
The code for the inz function is as follows:
def inz( val , possibles ) :
    """implements the IN list function allowing one bind variable
    use <sqlite3.connection>.create_function("inz", 2, inz )
    and ensure that the bind variable is a comma delimited list
    in string form (without quotes)
    matches can be string or integer but do not allow for leading
    or trailing spaces or contained commas, quotes etc or floating points
    """
    poss = [ x.strip() for x in possibles.split(',') ]
    if val in poss :
        return True
    if isinstance( val, int ) :
        ipos = [ int(x) for x in poss if x.isdecimal() ]
        if val in ipos :
            return True
    return False
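To make the registration step concrete, here is a minimal end-to-end sketch. It uses a simplified stand-in for the inz function above (string comparison only), and the accounts table and its rows are invented for the example.

```python
import sqlite3


def inz(val, possibles):
    # Simplified stand-in: compare the column value, as a string,
    # against the comma-delimited bind value.
    return str(val) in [x.strip() for x in possibles.split(",")]


conn = sqlite3.connect(":memory:")
# Bind the local Python function to the SQL function name "inz" (2 arguments).
conn.create_function("inz", 2, inz)

conn.execute("CREATE TABLE accounts (Account TEXT)")
conn.executemany("INSERT INTO accounts VALUES (?)", [("18C",), ("356",), ("999",)])

# The whole list is passed as a single string bind value.
rows = conn.execute(
    "SELECT Account FROM accounts WHERE inz(Account, ?) ORDER BY Account",
    ("18C, 356, 013",),
).fetchall()
```

As noted in the cons, the Python function is called once per row, so this suits small tables better than large ones.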
For example, if you want the sql query:
select * from table where col in (110, 130, 90)
What about:
my_list = [110, 130, 90]
my_list_str = repr(my_list).replace('[','(').replace(']',')')
cur.execute("select * from table where col in %s" % my_list_str )
Is there a way to write a multi-column IN clause query using SQLAlchemy?
Here is an example of the actual query:
SELECT url FROM pages WHERE (url_crc, url) IN ((2752937066, 'http://members.aye.net/~gharris/blog/'), (3799762538, 'http://www.coxandforkum.com/'));
I have a table with a two-column primary key, and I'm hoping to avoid adding one more key just to be used as an index.
PS: I'm using a MySQL DB.
Update: This query will be used for batch processing, so I would need to put a few hundred pairs into the IN clause. With the IN clause approach I hope to have a known, fixed limit on how many pairs I can put into one query, much as Oracle has a limit of 1000 enumerated items by default.
Using an AND/OR combination might be limited by the length of the query in characters, which would be variable and less predictable.
Assuming that you have your model defined in Page, here's an example using tuple_:
keys = [
(2752937066, 'http://members.aye.net/~gharris/blog/'),
(3799762538, 'http://www.coxandforkum.com/')
]
select([
Page.url
]).select_from(
Page
).where(
tuple_(Page.url_crc, Page.url).in_(keys)
)
Or, using the query API:
session.query(Page.url).filter(tuple_(Page.url_crc, Page.url).in_(keys))
I do not think this is currently possible in SQLAlchemy, and not all RDBMSs support this.
You can always transform this into an OR(AND...) condition though:
filter_rows = [
(2752937066, 'http://members.aye.net/~gharris/blog/'),
(3799762538, 'http://www.coxandforkum.com/'),
]
qry = session.query(Page)
qry = qry.filter(or_(*(and_(Page.url_crc == crc, Page.url == url) for crc, url in filter_rows)))
print(qry)
should produce something like (for SQLite):
SELECT pages.id AS pages_id, pages.url_crc AS pages_url_crc, pages.url AS pages_url
FROM pages
WHERE pages.url_crc = ? AND pages.url = ? OR pages.url_crc = ? AND pages.url = ?
-- (2752937066L, 'http://members.aye.net/~gharris/blog/', 3799762538L, 'http://www.coxandforkum.com/')
Alternatively, you can combine two columns into just one:
filter_rows = [
(2752937066, 'http://members.aye.net/~gharris/blog/'),
(3799762538, 'http://www.coxandforkum.com/'),
]
qry = session.query(Page)
qry = qry.filter((func.cast(Page.url_crc, String) + '|' + Page.url).in_(["{}|{}".format(*_frow) for _frow in filter_rows]))
print(qry)
which produces the below (for SQLite), so you can use IN:
SELECT pages.id AS pages_id, pages.url_crc AS pages_url_crc, pages.url AS pages_url
FROM pages
WHERE (CAST(pages.url_crc AS VARCHAR) || ? || pages.url) IN (?, ?)
-- ('|', '2752937066|http://members.aye.net/~gharris/blog/', '3799762538|http://www.coxandforkum.com/')
I ended up using the text() based solution: I generated "(a,b) in ((:a1, :b1), (:a2,:b2), ...)" with named bind vars, along with a dictionary holding the bind vars' values.
params = {}
enum_pairs = []
for counter, r in enumerate(records):
    a_param = "a%s" % counter
    params[a_param] = r['a']
    b_param = "b%s" % counter
    params[b_param] = r['b']
    pair_text = "(:%s,:%s)" % (a_param, b_param)
    enum_pairs.append(pair_text)
multicol_in_enumeration = ','.join(enum_pairs)
multicol_in_clause = text(
    " (a,b) in (" + multicol_in_enumeration + ")")
q = session.query(Table.id, Table.a,
                  Table.b).filter(multicol_in_clause).params(params)
Another option I thought about was using MySQL upserts, but this would make the whole thing even less portable to other DB engines than using a multi-column IN clause.
Update: SQLAlchemy has the sqlalchemy.sql.expression.tuple_(*clauses, **kw) construct, which can be used for the same purpose. (I haven't tried it yet.)
I am new to Python; I come here from the land of PHP. I constructed a SQL query like this in Python based on my PHP knowledge, and I get warnings and errors:
cursor_.execute("update posts set comment_count = comment_count + "+str(cursor_.rowcount)+" where ID = " + str(postid))
# rowcount here is int
What is the right way to form queries?
Also, how do I escape strings to form SQL safe ones? like if I want to escape -, ', " etc, I used to use addslashes. How do we do it in python?
Thanks
First of all, it's high time to learn to pass variables to queries safely, using the method Matus described. More clearly:
params = (foovar, barvar)
cursor.execute("QUERY WHERE foo = ? AND bar = ?", params)
If you only need to pass one variable, you must still make it a tuple: add a comma at the end to tell Python to treat it as a one-tuple: params = (onevar,)
Your example would be of the form:
cursor_.execute("update posts set comment_count = comment_count + ? where id = ?",
                (cursor_.rowcount, postid))
You can also use named parameters like this:
cursor_.execute("update posts set comment_count = comment_count + :count where id = :id",
{"count": cursor_.rowcount, "id": postid})
This time the parameters aren't a tuple, but a dictionary that is formed in pairs of "key": value.
From the Python manual:
t = (symbol,)
c.execute('select * from stocks where symbol=?', t)
This way you prevent SQL injection (I suppose this is the SQL safety you refer to) and also have the formatting solved.
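A small self-contained check of that claim, using an in-memory SQLite database (the table contents are invented for the example), might look like this:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stocks (symbol TEXT, price REAL)")
conn.executemany("INSERT INTO stocks VALUES (?, ?)", [("IBM", 1.0), ("O'Reilly", 2.0)])

# A value containing a quote needs no manual escaping when it is bound.
quoted = conn.execute(
    "SELECT price FROM stocks WHERE symbol = ?", ("O'Reilly",)
).fetchall()

# A hostile value is compared as plain data; it is never parsed as SQL.
hostile = "x' OR '1'='1"
injected = conn.execute(
    "SELECT * FROM stocks WHERE symbol = :symbol", {"symbol": hostile}
).fetchall()
```

The first query finds the row with the apostrophe in its name, and the second matches nothing, because the whole hostile string is treated as one symbol value.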