I have a JSON object containing pairs of a "keyX" and its corresponding "value".
data = {
    "key1": 10,
    "key2": 20,
    ...
}
I need to write each value into the database column named "keyX".
Unfortunately one can't format the SQL Query like this:
for key in data.keys():
    cur.execute('UPDATE table SET ?=? WHERE identifier=?;', (key, data[key], identifier))
Therefore I'm currently solving it like this:
for key in data.keys():
    cur.execute('UPDATE table SET ' + key + '=? WHERE identifier=?;', (data[key], identifier))
This is working perfectly, but SQL queries shouldn't be constructed string-based.
In this specific case, the keys are not set by the user, so SQL injection by the user is imo not possible.
Can this be solved better without string-based query construction?
You cannot set up placeholders for structural parts of the query, only for the slots where values are supposed to go.
That's by design. Placeholders are supposed to protect the integrity of the SQL from maliciously crafted values, i.e. to prevent SQL injection attacks. If you could set arbitrary parts of your query from dynamic inputs, placeholders would not be able to do this job anymore.
Column names are as much a structural part of the SQL as the SELECT keyword. You need to use string interpolation to make them dynamic. Formatted strings make this quite natural:
for column, value in data.items():
    cur.execute(f'UPDATE table SET {column} = ? WHERE identifier = ?;', (value, identifier))
but SQL queries shouldn't be constructed string-based.
That's meant to be a rule for values, though. String interpolation would work for values, too, and it does not even carry much of a risk when you already know the data you are processing, but it's a bad habit, and sooner or later you will take that shortcut once too often. Keep using placeholders wherever possible.
In this specific case, the keys are not set by the user, so SQL injection by the user is imo not possible
Correct. You can safely make parts of the SQL structure dynamic if you only use trusted parts. Placeholders are meant to guard against untrusted input.
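A minimal sketch of that idea, assuming Python's sqlite3 module, a hypothetical table called items, and a hard-coded allowlist (ALLOWED_COLUMNS and update_columns are names invented for this illustration). Column names are interpolated only after they have been checked against the allowlist, while every value still goes through a placeholder. As a side effect, all columns are written in one UPDATE instead of one statement per key.

```python
import sqlite3

# Hypothetical allowlist: the only column names that may be interpolated.
ALLOWED_COLUMNS = {"key1", "key2"}

def update_columns(cur, data, identifier):
    """Update several columns of one row in a single statement."""
    unknown = set(data) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"unexpected column names: {sorted(unknown)}")
    # Column names come from the allowlist; values stay in placeholders.
    assignments = ", ".join(f"{column} = ?" for column in data)
    sql = f"UPDATE items SET {assignments} WHERE identifier = ?"
    cur.execute(sql, (*data.values(), identifier))

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE items (identifier TEXT, key1 INTEGER, key2 INTEGER)")
cur.execute("INSERT INTO items VALUES ('a', 0, 0)")
update_columns(cur, {"key1": 10, "key2": 20}, "a")
row = cur.execute("SELECT key1, key2 FROM items WHERE identifier = 'a'").fetchone()
print(row)
```

A key that is not in the allowlist raises ValueError before any SQL is built.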
I am creating a Python Flask app that interfaces with an SQL database. One of the things it does is take user input and store it in a database. My current way of doing it looks something like this:
mycursor.execute(f"SELECT * FROM privileges_groups WHERE id = {PrivID}")
This is not a good or correct way of doing this. Not only can certain characters such as ' cause errors, it also leaves me susceptible to SQL injection. Could anyone inform me of a good way of doing this?
To protect against injection attacks you should use placeholders for values.
So change
mycursor.execute(f"SELECT * FROM privileges_groups WHERE id = {PrivID}")
to
mycursor.execute("SELECT * FROM privileges_groups WHERE id = ?", (PrivID,))
Placeholders can only store a value of the given type and not an arbitrary SQL fragment. This will help to guard against strange (and probably invalid) parameter values.
However, you can't use placeholders for table names and column names.
Note: the trailing comma is required only for one-element tuples; it is not needed for tuples with multiple elements. The comma distinguishes a tuple from an expression surrounded by parentheses.
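To make the difference concrete, here is a small sketch. It assumes an in-memory sqlite3 database rather than the asker's actual driver, and the table contents and the hostile input string are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE privileges_groups (id INTEGER, name TEXT)")
cur.executemany("INSERT INTO privileges_groups VALUES (?, ?)",
                [(1, "admin"), (2, "guest")])

priv_id = "1 OR 1=1"  # hostile input pretending to be an id

# Interpolation: the input becomes part of the SQL and matches every row.
unsafe = cur.execute(
    f"SELECT * FROM privileges_groups WHERE id = {priv_id}").fetchall()

# Placeholder: the input is bound as a single value and matches nothing.
safe = cur.execute(
    "SELECT * FROM privileges_groups WHERE id = ?", (priv_id,)).fetchall()

print(len(unsafe), len(safe))
```

With interpolation the OR 1=1 fragment is parsed as SQL; with the placeholder the whole string is compared to id as one value.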
Related: How do parameterized queries help against SQL injection?
So, if you want to avoid SQL injection, you need a secure query, i.e. one that cannot be made to do something it shouldn't.
queryRun = "SELECT * FROM privileges_groups WHERE id = %s"
When you use %s as a placeholder and pass the value separately, the driver quotes it for you, so the input cannot change what the query does. Don't fill it in yourself with the % operator ("..." % (PrivID)): that is ordinary string formatting and gives no protection. (%s is the placeholder style of drivers such as mysql.connector and psycopg2; sqlite3 uses ? instead.)
Then run the .execute() call with the value as a separate argument:
mycursor.execute(queryRun, (PrivID,))
Note: this can also be done in one step, with everything inside the .execute() call, but you may be better off splitting it into steps.
This isn't 100% but should help a lot.
I am working with python lists that have varying number of items and need to build out a dynamic string (database insert statements) based on the number of items.
For example:
l1 = ['one','two','three']
The desired output is something like this. Was hoping people had more pythonic suggestions on how to approach this in the string.
"insert into {table} values ('{c1}', '{c2}', '{c3}')".format(table='tabName', c1=l1[0], c2=l1[1], c3=l1[2])
But the above obviously won't work if I now have a list with a 4th element.
How would you suggest that I build out the string if I have a variable number of items in my list (and thus a variable number of database columns to insert into)?
Thanks in advance for any advice.
What you're trying to do is a very bad idea, as well as impossible to do the way you're attempting it.
Usually, you don't want SQL queries to be dynamic at all; you want a fixed string like this:
sql = 'INSERT INTO tabName VALUES(?, ?, ?)'
db.execute(sql, l1)
Also, you usually want to add the column names rather than rely on the column ordering (according to SQL standards, the order is completely arbitrary, although in practice nearly every database will always use the order in which the columns were created).
In rare cases (e.g., when you're building a database administration tool), you do need dynamic SQL, but you still want to use parameters. You do this by dynamically putting parameters into the query, and then passing the list. Something like this:
params = ', '.join('?' for _ in l1)
sql = 'INSERT INTO {table} VALUES({params})'.format(table=tabName, params=params)
db.execute(sql, l1)
(But of course if tabName is a user-entered string, this is at least as dangerous as not using parameters.)
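Combining both points (explicit column names and generated placeholders), a sketch could look like the following. It assumes sqlite3, and insert_row plus the column names c1, c2, c3 are hypothetical. The table and column names must come from trusted code, as noted above.

```python
import sqlite3

def insert_row(cur, table, columns, values):
    """Insert one row, generating one placeholder per value."""
    if len(columns) != len(values):
        raise ValueError("columns and values must have the same length")
    # table and columns must come from trusted code, never from user input.
    column_list = ", ".join(columns)
    params = ", ".join("?" for _ in values)
    sql = f"INSERT INTO {table} ({column_list}) VALUES ({params})"
    cur.execute(sql, values)

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE tabName (c1 TEXT, c2 TEXT, c3 TEXT)")
l1 = ["one", "two", "three"]
insert_row(cur, "tabName", ["c1", "c2", "c3"], l1)
rows = cur.execute("SELECT c1, c2, c3 FROM tabName").fetchall()
print(rows)
```

The same function handles a list with a fourth element, provided a fourth column name is passed along with it.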
I have a kinda unusual scenario but in addition to my sql parameters, I need to let the user / API define the table column name too. My problem with the params is that the query results in:
SELECT device_id, time, 's0' ...
instead of
SELECT device_id, time, s0 ...
Is there another way to do that through raw or would I need to escape the column by myself?
queryset = Measurement.objects.raw(
'''
SELECT device_id, time, %(sensor)s FROM measurements
WHERE device_id=%(device_id)s AND time >= to_timestamp(%(start)s) AND time <= to_timestamp(%(end)s)
ORDER BY time ASC;
''', {'device_id': device_id, 'sensor': sensor, 'start': start, 'end': end})
As with any potential for SQL injection, be careful.
But essentially this is a fairly common problem with a fairly safe solution. The problem, in general, is that query parameters are "the right way" to handle query values, but they're not designed for schema elements.
To dynamically include schema elements in your query, you generally have to resort to string concatenation. Which is exactly the thing we're all told not to do with SQL queries.
But the good news here is that you don't have to use the actual user input. This is because, while possible query values are infinite, the superset of possible valid schema elements is quite finite. So you can validate the user's input against that superset.
For example, consider the following process:
User inputs a value as a column name.
Code compares that value (raw string comparison) against a list of known possible values. (This list can be hard-coded, or can be dynamically fetched from the database schema.)
If no match is found, return an error.
If a match is found, use the matched known value directly in the SQL query.
So all you're ever using are the very strings you, as the programmer, put in the code. Which is the same as writing the SQL yourself anyway.
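The steps above can be sketched as follows. This assumes sqlite3 instead of the asker's Django/PostgreSQL setup, and known_columns and select_sensor are hypothetical helpers. The list of valid names is fetched from the schema with PRAGMA table_info, which is SQLite-specific.

```python
import sqlite3

def known_columns(conn, table):
    """Return the column names of a table whose name comes from trusted code."""
    # PRAGMA table_info yields one row per column; index 1 is the name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

def select_sensor(conn, sensor, device_id):
    columns = known_columns(conn, "measurements")
    if sensor not in columns:
        raise ValueError(f"unknown column: {sensor!r}")
    # Use the matched name from the schema, not the raw user string.
    column = columns[columns.index(sensor)]
    sql = (f"SELECT device_id, time, {column} FROM measurements "
           "WHERE device_id = ? ORDER BY time ASC")
    return conn.execute(sql, (device_id,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurements (device_id INTEGER, time INTEGER, s0 REAL, s1 REAL)")
conn.execute("INSERT INTO measurements VALUES (1, 100, 0.5, 1.5)")
result = select_sensor(conn, "s0", 1)
print(result)
```

The device id still travels through a placeholder; only the validated column name is interpolated.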
It doesn't look like you need raw() for the example query you posted. I think the following queryset is very similar.
measurements = Measurement.objects.filter(
    device_id=device_id,
    time__gte=start,  # assumes start and end are datetime objects
    time__lte=end,
).order_by('time')
for measurement in measurements:
    print(getattr(measurement, sensor))
If you need to optimise and avoid loading other fields, you can use values() or only().
I'm refactoring a little side project to use SQLite instead of a python data structure so that I can learn SQLite. The data structure I've been using is a list of dicts, where each dict's keys represent a menu item's properties. Ultimately, these keys should become columns in an SQLite table.
I first thought that I could create the table programmatically by creating a single-column table, iterating over the list of dictionary keys, and executing an ALTER TABLE, ADD COLUMN command like so:
# Various import statements and initializations
conn = sqlite3.connect(database_filename)
cursor = conn.cursor()
cursor.execute("CREATE TABLE menu_items (item_id text)")
# Here's the problem:
cursor.executemany("ALTER TABLE menu_items ADD COLUMN ? ?", [(key, type(value)) for key, value in menu_data[0].iteritems()])
After some more reading, I realized parameters cannot be used for identifiers, only for literal values. The PyMOTW on sqlite3 says
Query parameters can be used with select, insert, and update statements. They can appear in any part of the query where a literal value is legal.
Kreibich says on p. 135 of Using SQLite (ISBN 9780596521189):
Note, however, that parameters can only be used to replace literal
values, such as quoted strings or numeric values. Parameters
cannot be used in place of identifiers, such as table names or
column names. The following bit of SQL is invalid:
SELECT * FROM ?; -- INCORRECT: Cannot use a parameter as an identifier
I accept that positional or named parameters cannot be used in this way. Why can't they? Is there some general principle I'm missing?
Similar SO question:
Python sqlite3 string formatting
Identifiers are syntactically significant while variable values are not.
Identifiers need to be known at SQL compilation phase so that the compiled internal bytecode representation knows about the relevant tables, columns, indices and so on. Just changing one identifier in the SQL could result in a syntax error, or at least a completely different kind of bytecode program.
Literal values can be bound at runtime. Variables behave essentially the same in a compiled SQL program regardless of the values bound in them.
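A short sketch of that distinction with Python's sqlite3 module (the table content is made up for illustration): the same statement accepts any bound value, but a parameter in an identifier position is rejected when the statement is compiled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE menu_items (item_id TEXT)")
conn.execute("INSERT INTO menu_items VALUES ('burger')")

# A parameter in a value position: the statement compiles and runs.
value_ok = conn.execute(
    "SELECT item_id FROM menu_items WHERE item_id = ?", ("burger",)).fetchall()

# A parameter in an identifier position fails while the statement is compiled.
try:
    conn.execute("SELECT * FROM ?", ("menu_items",))
    identifier_error = None
except sqlite3.OperationalError as exc:
    identifier_error = exc

print(value_ok, identifier_error)
```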
I don't know why, but every database I ever used has the same limitation.
I think it would be analogous to use a variable to hold the name of another variable. Most languages do not allow that, PHP being the only exception I know of.
Regardless of the technical reasons, dynamically choosing table/column names in SQL queries is a design smell, which is why most databases do not support it.
Think about it; if you were coding a menu in Python, would you dynamically create a class for each combination of menu items? Of course not; you'd have one Menu class that contains a list of menu items. It's similar in SQL too.
Most of the time, when people ask about dynamically choosing table names, it's because they've split up their data into different tables, like collection1, collection2, ... and use the name to select which collection to query from. This isn't a very good design; it requires the service to repeat the schema for each table, including indexes, constraints, permissions, etc, and also makes altering the schema harder (Need to add a field? Now you need to do it across hundreds of tables instead of one).
The correct way of designing the database would be to have a single collection table and add a collection_id column; instead of querying collection4, you'd add a WHERE collection_id = 4 constraint to your SELECT queries. Note that the 4 is now a value, and can be replaced with a query parameter.
For your case, I would use this schema:
CREATE TABLE menu_items (
item_id TEXT,
key TEXT,
value NONE,
PRIMARY KEY(item_id, key)
);
Use executemany to insert a row for each entry in the dictionary. When you need to load the dictionary, run a SELECT filtering on item_id and recreate the dictionary one row/entry at a time.
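A minimal sketch of that round trip, assuming sqlite3 and made-up menu data (menu_data and load_item are hypothetical). One deliberate variation: the value column is declared without a type here, so that SQLite stores whatever is bound without converting it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE menu_items (
        item_id TEXT,
        key TEXT,
        value,
        PRIMARY KEY (item_id, key)
    )
""")

menu_data = [
    {"item_id": "1", "name": "Burger", "price": 9.5},
    {"item_id": "2", "name": "Salad", "price": 7.0},
]

# One row per dictionary entry; item_id itself is the row's identifier.
rows = [
    (item["item_id"], key, value)
    for item in menu_data
    for key, value in item.items()
    if key != "item_id"
]
conn.executemany("INSERT INTO menu_items VALUES (?, ?, ?)", rows)

def load_item(conn, item_id):
    """Rebuild one dictionary from its key/value rows."""
    item = {"item_id": item_id}
    query = "SELECT key, value FROM menu_items WHERE item_id = ?"
    for key, value in conn.execute(query, (item_id,)):
        item[key] = value
    return item

loaded = load_item(conn, "1")
print(loaded)
```

Keys and values are both plain data here, so every one of them goes through a placeholder.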
(Of course, as with everything in Software Engineering, there are exceptions. Tools that operate on schemas generically, such as ORMs, will need to specify table/column names dynamically.)
I have a table with three columns: id, word, essay. I want to run a query using a placeholder (?). The SQL statement is sql1 = "select id,? from training_data". My code is below:
def dbConnect(db_name,sql,flag):
    conn = sqlite3.connect(db_name)
    cursor = conn.cursor()
    if (flag == "danci"):
        itm = 'word'
    elif flag == "wenzhang":
        itm = 'essay'
    n = cursor.execute(sql,(itm,))
    res1 = cursor.fetchall()
    return res1
However, when I print dbConnect("data.db",sql1,"danci"),
the result I get is [(1,'word'),(2,'word'),(3,'word')...]. What I really want is [(1,'the content of word column'),(2,'the content of word column')...]. What should I do? Please give me some ideas.
You can't use placeholders for identifiers -- only for literal values.
I don't know what to suggest in this case, as your function takes a database name, an SQL string, and a flag to say how to modify that string. I think it would be better to pass just the first two, and write something like
sql = {
    "danci": "SELECT id, word FROM training_data",
    "wenzhang": "SELECT id, essay FROM training_data",
}
and then call it with one of
dbConnect("data.db", sql['danci'])
or
dbConnect("data.db", sql['wenzhang'])
But a lot depends on why you are asking dbConnect to decide on the columns to fetch based on a string passed in from outside; it's an unusual design.
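For completeness, here is a runnable sketch of that approach. It assumes sqlite3 with an in-memory database and a made-up row, and QUERIES and fetch_rows are hypothetical names. Unlike the calls above, it takes an open connection and looks the statement up from the flag itself; either way, only fixed, programmer-written SQL ever reaches the database.

```python
import sqlite3

# Fixed statements written by the programmer; the flag only selects one.
QUERIES = {
    "danci": "SELECT id, word FROM training_data",
    "wenzhang": "SELECT id, essay FROM training_data",
}

def fetch_rows(conn, flag):
    sql = QUERIES[flag]  # raises KeyError for an unknown flag
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE training_data (id INTEGER, word TEXT, essay TEXT)")
conn.execute(
    "INSERT INTO training_data VALUES (1, 'apple', 'An essay about apples.')")
words = fetch_rows(conn, "danci")
essays = fetch_rows(conn, "wenzhang")
print(words, essays)
```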
Update - SQL Injection
The problems with SQL injection and tainted data are well documented, but here is a summary.
The principle is that, in theory, a programmer can write safe and secure programs as long as all the sources of data are under their control. As soon as they use any information from outside the program without checking its integrity, security is under threat.
Such information ranges from the obvious -- the parameters passed on the command line -- to the obscure -- if the PATH environment variable is modifiable then someone could induce a program to execute a completely different file from the intended one.
Perl provides direct help to avoid such situations with Taint Checking, but SQL Injection is the open door that is relevant here.
Suppose you take the value for a database column from an unverified external source, and that value appears in your program as $val. Then, if you write
my $sql = "INSERT INTO logs (date) VALUES ('$val')";
$dbh->do($sql);
then it looks like it's going to be okay. For instance, if $val is set to 2014-10-27 then $sql becomes
INSERT INTO logs (date) VALUES ('2014-10-27')
and everything's fine. But now suppose that our data is being provided by someone less than scrupulous or downright malicious, and your $val, having originated elsewhere, contains this
2014-10-27'); DROP TABLE logs; SELECT COUNT(*) FROM security WHERE name != '
Now it doesn't look so good. $sql is set to this (with added newlines)
INSERT INTO logs (date) VALUES ('2014-10-27');
DROP TABLE logs;
SELECT COUNT(*) FROM security WHERE name != '')
which adds an entry to the logs table as before, and then goes ahead and drops the entire logs table and counts the number of records in the security table. That isn't what we had in mind at all, and it is something we must guard against.
The immediate solution is to use placeholders ? in a prepared statement, and later pass the actual values in a call to execute. This not only speeds things up, because the SQL statement can be prepared (compiled) just once, but also protects the database from malicious data by quoting every supplied value appropriately for the data type and escaping any embedded quotes, so that it is impossible to close one statement and open another.
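The same protection can be sketched in Python with the sqlite3 module (an assumption made for the sake of a runnable example; the Perl DBI code above is the original context). The hostile value from the example is bound through a placeholder and ends up stored as an ordinary string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (date TEXT)")

val = ("2014-10-27'); DROP TABLE logs; "
       "SELECT COUNT(*) FROM security WHERE name != '")

# The value is bound, never spliced into the SQL text,
# so it cannot end the INSERT statement.
conn.execute("INSERT INTO logs (date) VALUES (?)", (val,))

stored = conn.execute("SELECT date FROM logs").fetchall()
print(stored == [(val,)])
```

The logs table still exists afterwards and contains the entire hostile string as one date value.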
This whole concept was humourised in Randall Munroe's excellent XKCD comic "Exploits of a Mom" (the "Bobby Tables" strip).