Is the SQLAlchemy text function exposed to SQL Injection?

I'm learning how to use SQLAlchemy, and I'm trying to re-implement a previously defined API, but now using Python.
The REST API has the following query parameter:
myService/v1/data?range=time:2015-08-01:2015-08-02
So I want to map something like field:FROM:TO to filter a range of results, like a date range, for example.
This is what I'm using at this moment:
rangeStatement = range.split(':')
if len(rangeStatement) == 3:
    query = query.filter(text('{} BETWEEN "{}" AND "{}"'.format(*rangeStatement)))
So, this will produce the following WHERE condition:
WHERE time BETWEEN "2015-08-01" AND "2015-08-02"
I know SQLAlchemy is a powerful tool that allows creating queries like Query.filter_by(MyClass.temp), but I need the API request to be as open as possible.
So, I'm worried that someone could pass something like DROP TABLE in the range parameter and exploit the text function.

If queries are constructed using string formatting then sqlalchemy.text will not prevent SQL injection - the "injection" will already be present in the query text. However it's not difficult to build queries dynamically, in this case by using getattr to get a reference to the column. Assuming that you are using the ORM layer with model class Foo and table foos you can do
import sqlalchemy as sa
...
col, lower, upper = 'time:2015-08-01:2015-08-02'.split(':')
# Regardless of style, queries implement a fluent interface,
# so they can be built iteratively
# Classic/1.x style
q1 = session.query(Foo)
q1 = q1.filter(getattr(Foo, col).between(lower, upper))
print(q1)
or
# 2.0 style (available in v1.4+)
q2 = sa.select(Foo)
q2 = q2.where(getattr(Foo, col).between(lower, upper))
print(q2)
The respective outputs are (parameters will be bound at execution time):
SELECT foos.id AS foos_id, foos.time AS foos_time
FROM foos
WHERE foos.time BETWEEN ? AND ?
and
SELECT foos.id, foos.time
FROM foos
WHERE foos.time BETWEEN :time_1 AND :time_2
SQLAlchemy will delegate quoting of the values to the connector package being used by the engine, so your protection against injection will be as good as that provided by the connector package*.
* In general I believe correct quoting should be a good defence against SQL injections, however I'm not sufficiently expert to confidently state that it's 100% effective. It will be more effective than building queries from strings though.
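If a textual fragment really is needed, the values can at least be sent as bound parameters instead of being formatted into the string. A minimal sketch, assuming the column name is validated against the model's known columns (identifiers themselves cannot be bound parameters):
import sqlalchemy as sa

col, lower, upper = 'time:2015-08-01:2015-08-02'.split(':')

# The column name cannot be a bound parameter, so validate it first
# against the model's known columns (assumption: Foo is the ORM model).
if col not in Foo.__table__.c:
    raise ValueError('unknown column: {}'.format(col))

clause = sa.text('{} BETWEEN :lower AND :upper'.format(col)).bindparams(
    lower=lower, upper=upper)
q = session.query(Foo).filter(clause)
print(q)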

Related

Why does string concatenation create extra parameters using Access and PYODBC?

I created this procedure that I call with the cursor.execute method. The problem I'm having is that PYODBC sees more parameters than I've given.
In this example query, the "*-" and "-*" literals are being read as extra parameters by PYODBC. Does anyone know why this is the case? This is happening any time I do any string concatenation in Access.
def GetAccessResults(self):
    with pyodbc.connect(SQL.DBPath) as con:
        cursor = con.cursor()
        if self.parameters == None:
            cursor.execute('{{Call {}}}'.format(self.storedProc))
        else:
            callString = self.__CreateStoredProcString()
            cursor.execute(callString, self.parameters)
        returnValue = cursor.fetchall()
    return returnValue

def __CreateStoredProcString(self):
    questionMarks = ('?,' * len(self.parameters))[:-1]
    return '{{Call {} ({})}}'.format(self.storedProc, questionMarks)
As the OP found out, MS Access, being both a frontend GUI application and a backend database, runs SQL differently depending on which side is used. Generally, the backend mode is closer to standard SQL, namely:
Quotes: In the backend, single quotes are reserved for literals and double quotes for identifiers, as opposed to being interchangeable inside MSAccess.exe.
Wildcards: In the backend, the LIKE wildcard is % by default while the GUI uses *, unless the Access database runs in SQL Server compatible syntax (ANSI 92), which uses the standard %. For this reason, consider Access's ALIKE (ANSI-LIKE) with % to be compatible in both modes. Interestingly, this does not apply to stored queries such as the OP uses, but it does when writing queries in code.
Parameter: In the backend, any unquoted object not recognized in the query is treated as a named parameter and raises an error. Meanwhile, the GUI launches a pop-up Enter Parameter box (which beginners do not realize is actually a runtime error), allowing typed answers to be evaluated on the fly.
GUI Objects: While in the GUI Access queries can reference controls on open forms and reports, and even user-defined functions defined in standalone VBA modules, those same references will error out in the backend, which essentially runs Access in headless mode and only recognizes other tables and queries.
Optimization: See Allen Browne's notes on the differences in optimization that can occur when creating queries in the GUI vs. the backend, especially when referencing Access object library functions.
By the way, using LIKE against a subquery compares one scalar to another scalar. In fact, Access will error out if the subquery returns more than one row, which can potentially occur with the current setup:
Error 3354: At most one record can be returned by this subquery
In other databases, the evaluation runs against the first row of the subquery (which, without ORDER BY, can be an arbitrary row) rather than all records of the subquery. Instead, consider refactoring the SQL to use an EXISTS clause:
PARAMETERS prmServiceName Text(255);
SELECT c.*
FROM Charts c
WHERE EXISTS
    (SELECT 1
     FROM Services s
     WHERE s.ServiceName = prmServiceName
       AND c.FileName ALIKE '%-' & s.Service_Abbreviation & '-%');
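For completeness, a rough sketch of calling such a saved parameterized query from pyodbc, along the lines of the OP's helper; the saved query name GetCharts, the connection string, and the service name value are hypothetical:
import pyodbc

# conn_str is assumed to be an Access ODBC connection string.
with pyodbc.connect(conn_str) as con:
    cursor = con.cursor()
    # Saved parameterized queries can be called like stored procedures;
    # the value is passed as a bound parameter, never concatenated in.
    cursor.execute('{Call GetCharts (?)}', ['SomeServiceName'])
    rows = cursor.fetchall()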
Try using ampersands:
Select "*-" & Service_Abbreviation & "-*"
Also, Like expects a string wrapped in quotes, and your subquery doesn't return that. So perhaps:
Select "'*-" & Service_Abbreviation & "-*'"

How do I call a database function using SQLAlchemy in Flask?

I want to call a function that I created in my PostgreSQL database. I've looked at the official SQLAlchemy documentation as well as several questions here on SO, but nobody seems to explain how to set up the function in SQLAlchemy.
I did find this question, but am unsure how to compile the function as the answer suggests. Where does that code go? I get errors when I try to put this in both my view and model scripts.
Edit 1 (8/11/2016)
As per the community's requests and requirements, here are all the details I left out:
I have a table called books whose columns are arranged with information regarding the general book (title, author(s), publication date...).
I then have many tables all of the same kind whose columns contain information regarding all the chapters in each book (chapter name, length, short summary...). It is absolutely necessary for each book to have its own table. I have played around with one large table of all the chapters, and found it ill suited to my needs, not to mention extremely unwieldy.
My function that I'm asking about queries the table of books for an individual book's name, and casts the book's name to a regclass. It then queries the regclass object for all its data, returns all the rows as a table like the individual book tables, and exits. Here's the raw code:
CREATE OR REPLACE FUNCTION public.get_book(bookName character varying)
 RETURNS TABLE(/*columns of individual book table go here*/)
 LANGUAGE plpgsql
AS $function$
declare
    _tbl regclass;
begin
    for _tbl in
        select name::regclass
        from books
        where name = bookName
    loop
        return query execute '
            select * from ' || _tbl;
    end loop;
end;
$function$
This function has been tested several times in both the command line and pgAdmin. It works as expected.
My intention is to have a view in my Flask app whose route is @app.route('/book/<string:bookName>') and that calls the above function before rendering the template. The exact view is as follows:
@app.route('/book/<string:bookName>')
def book(bookName):
    chapterList = /*call function here*/
    return render_template('book.html', book=bookName, list=chapterList)
This is my question: how do I set up my app in such a way that SQLAlchemy knows about and can call the function I have in my database? I am open to other suggestions of achieving the same result as well.
P.S. I only omitted this information with the intention of keeping my question as abstract as possible, not knowing that the rules of the forum require a very specific question. Please forgive my lack of knowledge.
If you want to do it without raw sql, you can use func from sqlalchemy:
from sqlalchemy import func
data = db.session.query(func.your_schema.your_function_name()).all()
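Adapted to the get_book function from the question, a minimal sketch (assuming a Flask-SQLAlchemy db object; the function is in the public schema, so no schema prefix is needed). Note that each row comes back as a single composite value, so you may still need to unpack the columns:
from sqlalchemy import func

@app.route('/book/<string:bookName>')
def book(bookName):
    # func.get_book(...) renders as a call to the database function;
    # bookName is sent as a bound parameter, not interpolated.
    chapterList = db.session.query(func.get_book(bookName)).all()
    return render_template('book.html', book=bookName, list=chapterList)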
You can use func
Syntax:
from sqlalchemy import func
func.function_name(column)
Example:
from sqlalchemy import func
result = db.session.query(func.lower(Student.name)).all()
I found a solution to execute the function with raw SQL:
Create a connection
Call the function as you normally would in the database GUI. E.g. for the function add_apples():
select add_apples();
Execute this statement, which should be a string.
Example code:
transaction = connection.begin()

sql = list()  # Allows multiple queries
sql.append('select add_apples();')

print('Printing the queries.')
for i in sql:
    print(i)

# Now, we iterate through the sql statements, executing them one after another.
# If there is an exception on one of them, we stop the execution of the program.
for i in sql:
    # We execute the corresponding command
    try:
        r = connection.execute(i)
        print('Executed ----- %r' % i)
    except Exception as e:
        print('EXCEPTION!: {}'.format(e))
        transaction.rollback()
        exit(-1)
transaction.commit()
from sqlalchemy.sql import text

with engine.connect() as con:
    statement = text("""your function""")
    con.execute(statement)
You must execute the raw SQL through SQLAlchemy.
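If the function takes arguments, the same pattern works with bound parameters; a sketch assuming a hypothetical function add_apples(count integer):
from sqlalchemy.sql import text

with engine.connect() as con:
    # The value is sent as a bound parameter rather than being
    # formatted into the SQL string.
    statement = text("SELECT add_apples(:count)")
    result = con.execute(statement, {"count": 5})
    print(result.scalar())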

safe parameter bindings in sqlalchemy filter

I need to pass a partial raw sql query into sqlalchemy filter, like
s.query(account).filter("coordinate <#> point(%s,%s) < %s"%(lat,long,distance))
Yes, I'm trying to use earthdistance function in postgresql.
Of course, I could use PostGis and GeoAlchemy2, but I want to know the general solution to this kind of problems.
I know SQLAlchemy can safely pass parameters in a raw SQL query.
result = db.engine.execute("select * from account where coordinate <#> point(:lat,:long) < :distance", **params)
Is there any similar function that can be used to bind parameters to a partial SQL query? I guess someone who implements a custom SQL function like func.ll_to_earth would have used such a function.
There is .params() on query. Try this:
query = s.query(account).filter(
"coordinate <#> point(:lat, :long_) < :dist").params(
lat=lat, long_=long_, dist=distance)
And there is the documentation on it.
Note: I renamed your long param, because there is already a builtin named long (long int) in Python; it's good practice not to shadow already-used names, for obvious reasons.
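In SQLAlchemy 1.4+ filter() no longer accepts a bare string, so the same idea would be written with text() and bindparams(); a sketch using the same placeholder names:
from sqlalchemy import text

query = s.query(account).filter(
    text("coordinate <#> point(:lat, :long_) < :dist").bindparams(
        lat=lat, long_=long_, dist=distance))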

How to write a generative update in SQLAlchemy

I'm just using SQLAlchemy core, and cannot get the sql to allow me to add where clauses. I would like this very generic update code to work on all my tables. The intent is that this is part of a generic insert/update function that corresponds to every table. By doing it this way it allows for extremely brief test code and simple CLI utilities that can simply pass all args & options without the complexity of separate sub-commands for each table.
It'll take a few more tweaks to get it there, but it should be doing the updates just fine now. However, while the SQLAlchemy documentation refers to generative queries, it doesn't distinguish between selects and updates. I've reviewed the SQLAlchemy documentation, Essential SQLAlchemy, Stack Overflow, and several source code repositories, and have found nothing.
u = self._table.update()
non_key_kw = {}
for column in self._table.c:
    if column.name in self._table.primary_key:
        u.where(self._table.c[column.name] == kw[column.name])
    else:
        col_name = column.name
        non_key_kw[column.name] = kw[column.name]

print u
result = u.execute(kw)
Which fails - it doesn't seem to recognize the where clause:
UPDATE struct SET year=?, month=?, day=?, distance=?, speed=?, slope=?, temp=?
FAIL
And I can't find any examples of building up an update in this way. Any recommendations?
the "where()" method is generative in that it returns a new Update() object. The old one is not modified:
u = u.where(...)
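Applied to the loop from the question, that fix looks roughly like this (the rest of the surrounding code is unchanged; u.execute() still relies on the table being bound to an engine, as in the original):
u = self._table.update()
non_key_kw = {}
for column in self._table.c:
    if column.name in self._table.primary_key:
        # where() returns a new Update object, so reassign it
        u = u.where(self._table.c[column.name] == kw[column.name])
    else:
        non_key_kw[column.name] = kw[column.name]

result = u.execute(kw)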

Django ORM query to find all objects which don't have a recent related object

I have a repeating pattern in my code where a model has a related model (one-to-many) which tracks its history/status. This related model can have many objects representing a point-in-time snapshot of the model's state.
For example:
class Profile(models.Model):
    pass

class Subscription(models.Model):
    profile = models.ForeignKey(Profile)
    data_point = models.IntegerField()
    created = models.DateTimeField(default=datetime.datetime.now)

# Example objects
p = Profile()
subscription1 = Subscription(profile=p, data_point=32, created=datetime.datetime(2011, 7, 1))
subscription2 = Subscription(profile=p, data_point=2, created=datetime.datetime(2011, 8, 1))
subscription3 = Subscription(profile=p, data_point=3, created=datetime.datetime(2011, 9, 1))
subscription4 = Subscription(profile=p, data_point=302, created=datetime.datetime(2011, 10, 1))
I often need to query these models to find all of the "Profile" objects that haven't had a subscription update in the last 3 days or similar. I've been using subselect queries to accomplish this:
q = Subscription.objects.filter(created__gt=datetime.datetime.now()-datetime.timedelta(days=3)).values('id').query
Profile.objects.exclude(subscription__id__in=q).distinct()
The problem is that this is terribly slow when large tables are involved. Is there a more efficient pattern for a query such as this? Maybe some way to make Django use a JOIN instead of a SUBSELECT (seems like getting rid of all those inner nested loops would help)?
I'd like to use the ORM, but if needed I'd be willing to use the .extra() method or even raw SQL if the performance boost is compelling enough.
I'm running against Django 1.4alpha (SVN Trunk) and Postgres 9.1.
from django.db.models import Max
from datetime import datetime, timedelta
Profile.objects.annotate(last_update=Max('subscription__created')).filter(last_update__lt=datetime.now()-timedelta(days=3))
Aggregation (and annotation) is awesome-sauce, see: https://docs.djangoproject.com/en/dev/topics/db/aggregation/
Add a DB index to created:
created = models.DateTimeField(default=datetime.datetime.now, db_index=True)
As a rule of thumb, any column that is used in queries for lookup or sorting should be indexed, unless you are heavy on writing operations (in that case you should think about using a separate search index, maybe).
Queries using db columns without indexes are only so fast. If you want to analyze the query bottlenecks in more detail, turn on logging for longer running statements (e.g. 200ms and above), and do an explain analyze (postgres) on the long running queries.
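To feed explain analyze, you can print the SQL Django generates for the queryset and run it in psql; a sketch using the annotate() query from the answer above:
from django.db.models import Max
from datetime import datetime, timedelta

qs = Profile.objects.annotate(
    last_update=Max('subscription__created')
).filter(last_update__lt=datetime.now() - timedelta(days=3))

# str(qs.query) shows the generated SQL; prefix it with
# EXPLAIN ANALYZE in psql to inspect the query plan.
print(qs.query)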
EDIT:
I've only now seen in your comment that you already have an index on the field. In that case, all the more reason to look at the output of explain analyze:
- to make sure that the index is really used, and to its full extent
- to check whether postgres is unnecessarily writing to disk instead of using memory
See
- on query planning http://www.postgresql.org/docs/current/static/runtime-config-query.html
Maybe this helps as an intro: http://blog.it-agenten.com/2015/11/tuning-django-orm-part-2-many-to-many-queries/
