Sanitizing user-provided SQL with Python?

Sanitizing user-provided SQL with Python? - python

I'm working on a small app which will help browse the data generated by vim-logging, and I'd like to allow people to run arbitrary SQL queries against the datasets.
How can I do that safely?
For example, I'd like to let someone enter, say, SELECT file_type, count(*) FROM commands GROUP BY file_type, then send the result back to their web browser.

Do this:
cmd = "update people set name=%s where id=%s"
curs.execute(cmd, (name, id))
Note that the placeholder syntax depends on the database you are using.
Source and more info here:
http://bobby-tables.com/python.html

Allowing expressive power while preventing destruction is a difficult job. If you let them enter "SELECT .." themselves, you need to prevent them from entering "DELETE .." instead. You can require the statement to begin with "SELECT", but then you also have to be sure it doesn't contain "; DELETE" somewhere in the middle.
The safest thing to do might be to connect to the database with read-only user credentials.

In MySQL, you can create a limited user (create new user and grant limited access), which can only access certain table.

Consider using SQLAlchemy. While SQLAlchemy is arguably the greatest Object Relational Mapper ever, you certainly don't need to use any of the ORM stuff to take advantage of all of the great Python/SQL work that's been done.
As the introductory documentation suggests:
Most importantly, SQLAlchemy is not just an ORM. Its data abstraction layer allows construction and manipulation of SQL expressions in a platform agnostic way, and offers easy to use and superfast result objects, as well as table creation and schema reflection utilities. No object relational mapping whatsoever is involved until you import the orm package. Or use SQLAlchemy to write your own!
Using SQLAlchemy will give you input sanitation "for free" and let you use standard Python logic to analyze statements for safety without having to do any messy text-parsing/pattern-matching.

Related

Suggestions on how my end users can extract data using my SQL query without having to use SQL?

The problem is that I've ended up with the extra role of having to run these queries for different users everyday because no one in our factory knows how to use SQL and my IT wouldn't grant them database access anyway.
I've consulted some tech savvier friends who have suggested that i create a GUI (using python) that does the querying for me? That sounds like it would be ideal since all my end users would have to do is key into an input field someone on the interface and the query would run, versus them querying for the data directly and potentially messing up my SQL script.
I've read online as well of the possibility of doing something like mentioned before but in Excel instead. Right now I'm just really lost as to where to start. I probably would need to do alot of reading up anyway but it'd be at least nice to know what is a good direction to start. Any new suggestions would be appreciated.

This could be overkill for you, but Metabase is basically a GUI you can pretty easily set up on top of your SQL database.
It's target audience is "business analytics", so it enables people to create "questions" (aka. queries) using a GUI rather than writing SQL (although it allows raw SQL). More relevant to you, you probably just want to preconfigure some "questions" your users can log in and run in Metabase.
Without writing any code at all you could set up Metabase, write the query you want using either raw SQL or their GUI, and parameterize the query with some user_id or something the users can fill in. From there, you could give the users access to your Metabase (via the web), where they can go in and use the GUI to fill in whatever they need to and then run the query you've created for them.
Metabase also has a lot of customization, so you can make sure they only have permissions to see certain data, etc., and they don't need DB credentials. You, the Metabase admin, connect the database when you set up Metabase and give other users Metabase logins instead.

what is difference between raw sql queries & normal sql queries?

I am kind of new to sql and database & currently developing website in django framework.
During my reading of django documentation I have read about raw sql queries which are executed using Manager.raw() like below.
for p in Person.objects.raw('SELECT * FROM myapp_person'):
Manager.raw(raw_query, params=None, translations=None)
How does raw queries differes from normal sql queries & when should I use raw sql queries instead of Django ORM ?

Django (like other similar ORM tools) is a connection between relational databases and object-oriented programming. One of the very important functions that it implements is providing a uniform interface to the database -- regardless of the underlying database.
When you use underlying Django functionality, the code should be supported on any database (there may be specific limits on this). This makes it particularly easy to port to another database. It also helps ensure that the generated queries do what you intend.
When you use raw SQL, the code is likely to be specific to one database (creating a porting problem). The code is also not checked, which can result in hard-to-understand errors.
I have a strong preference for using SQL directly -- but that is because I am not a programmer using an ORM framework. If you are going to use such a framework, it is probably better to use the built-in functionality wherever possible.

This is a borderline opinion question so might get flagged, but it is a good point. Essentially the raw SQL queries are intended to only be used for the edge cases where the Django ORM does not fulfil your needs (and with each new version of Django it support more and more query types so raw becomes less useful).
In general I would suggest using the ORM for the more helpful error messages, maintainability, and plain ease of use, and only use raw as a last-resort

SQLite query restrictions

I am building a little interface where I would like users to be able to write out their entire sql statement and then see the data that is returned. However, I don't want a user to be able to do anything funny ie delete from user_table;. Actually, the only thing I would like users to be able to do is to run select statements. I know there aren't specific users for SQLite, so I am thinking what I am going to have to do, is have a set of rules that reject certain queries. Maybe a regex string or something (regex scares me a little bit). Any ideas on how to accomplish this?
def input_is_safe(input):
input = input.lower()
if "select" not in input:
return False
#more stuff
return True

I can suggest a different approach to your problem. You can restrict the access to your database as read-only. That way even when the users try to execute delete/update queries they will not be able to damage your data.
Here is the answer for Python on how to open a read-only connection:
db = sqlite3.connect('file:/path/to/database?mode=ro', uri=True)

Python's sqlite3 execute() method will only execute a single SQL statement, so if you ensure that all statements start with the SELECT keyword, you are reasonably protected from dumb stuff like SELECT 1; DROP TABLE USERS. But you should check sqlite's SQL syntax to ensure there is no way to embed a data definition or data modification statement as a subquery.
My personal opinion is that if "regex scares you a little bit", you might as well just put your computer in a box and mail it off to <stereotypical country of hackers>. Letting untrusted users write SQL code is playing with fire, and you need to know what you're doing or you'll get fried.

Open the database as read only, to prevent any changes.
Many statements, such as PRAGMA or ATTACH, can be dangerous. Use an authorizer callback (C docs) to allow only SELECTs.
Queries can run for a long time, or generate a large amount of data. Use a progress handler to abort queries that run for too long.

Insert row into database with unknown schema using Python module peewee

I am building a database interface using Python's peewee module. I am trying to figure out how to insert data into an existing database where I do not know the schema.
My idea is to use playhouse.reflection.Introspector to find out the database schema, then use that information to create class objects which can then be inserted into the existing database.
So far I've gotten to:
introspector = Introspector.from_database(database)
models = introspector.generate_models()
I'm don't know where to go from there.
1) Can I create database objects in this manner? What is the next step?
2) Is there an easier way to do this?

peewee includes an introspection tool called pwiz that can (basically) introspect a database and produce model definitions. It is run as a command line script and dumps the model definitions to stdout, so invokation is like any other unix tool. Here is an example from the docs:
python -m pwiz -e postgresql my_postgres_db > mymodels.py
From there edit mymodels.py to get what you need.
You could do this on the fly, but it would require a few steps and is hackish (not to mention pointless if you really don't know anything about the schema):
Run pwiz as an os command
Read it to pick out the model names
Import whatever you find
BUT
If you really don't know the schema to start with then you have no idea what the semantics of the database are anyway, which means whatever you find is literally meaningless. Unless you at least know some schema/table/column names you are hunting for (in which case you do know something about the schema) there isn't really much you can do with regard to inserting data (not in a sane way), though you could certainly dump data from the db. But if you just wanted a database dump then pg_dump would have been easier.
I suspect this is actually an X-Y problem. What problem is it you are trying to solve by using this technique? What effect is it supposed to achieve within the context of your system?

If you want to create a GUI, check out the sqlite_web project. It uses Peewee to create a web-based SQLite database manager.

Building a DSL query language

i'm working on a project (written in Django) which has only a few entities, but many rows for each entity.
In my application i have several static "reports", directly written in plain SQL. The users can also search the database via a generic filter form. Since the target audience is really tech-savvy and at some point the filter doesn't fit their needs, i think about creating a query language for my database like YQL or Jira's advanced search.
I found http://sourceforge.net/projects/littletable/ and http://www.quicksort.co.uk/DeeDoc.html, but it seems that they only operate on in-memory objects. Since the database can be too large for holding it in-memory, i would prefer that the query is translated in SQL (or better a Django query) before doing the actual work.
Are there any library or best practices on how to do this?

Writing such a DSL is actually surprisingly easy with PLY, and what ho—there's already an example available for doing just what you want, in Django. You see, Django has this fancy thing called a Q object which make the Django querying side of things fairly easy.
At DjangoCon EU 2012, Matthieu Amiguet gave a session entitled Implementing Domain-specific Languages in Django Applications in which he went through the process, right down to implementing such a DSL as you desire. His slides, which include all you need, are available on his website. The final code (linked to from the last slide, anyway) is available at http://www.matthieuamiguet.ch/media/misc/djangocon2012/resources/compiler.html.
Reinout van Rees also produced some good comments on that session. (He normally does!) These cover a little of the missing context.
You see in there something very similar to YQL and JQL in the examples given:
groups__name="XXX" AND NOT groups__name="YYY"
(modified > 1/4/2011 OR NOT state__name="OK") AND groups__name="XXX"
It can also be tweaked very easily; for example, you might want to use groups.name rather than groups__name (I would). This modification could be made fairly trivially (allow . in the FIELD token, by modifying t_FIELD, and then replacing . with __ before constructing the Q object in p_expression_ID).
So, that satisfies simple querying; it also gives you a good starting point should you wish to make a more complex DSL.

I've faced exactly this problem - a large database which needs searching. I made some static reports and several fancy filters using django (very easy with django) just like you have.
However the power users were clamouring for more. I decided that there already was a DSL that they all knew - SQL. The question was how to make it secure enough.
So I used django permissions to give the power users permission to make SQL queries in a new table. I then made a view for the not-quite-so-power users to use these queries. I made them take optional parameters. The queries were run using Python's lower level DB-API which django is using under the hood for its ORM anyway.
The real trick was opening a read only database connection to run these queries just to make sure that no updates were ever run. I made a read only connection by creating a different user in the database with lower permissions and opening a specific connection for that in the view.
TL;DR - SQL is the way to go!

Depending on the form of your data, the types of queries your users need to use, and the frequency that your data is updated, an alternative to the pure SQL solution suggested by Nick Craig-Wood is to index your data in Solr and then run queries against it.
Solr is an added layer of complexity (configuration, data synchronization) but it is super-fast, can handle large datasets, and provides a (relatively) intuitive query language.

You could write your own SQL-ish language using pyparsing, actually. There is even pretty verbose example you could extend.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Sanitizing user-provided SQL with Python? - python

Do this: cmd = "update people set name=%s where id=%s" curs.execute(cmd, (name, id)) Note that the placeholder syntax depends on the database you are using. Source and more info here: http://bobby-tables.com/python.html

In MySQL, you can create a limited user (create new user and grant limited access), which can only access certain table.

Related

Suggestions on how my end users can extract data using my SQL query without having to use SQL?

what is difference between raw sql queries & normal sql queries?

SQLite query restrictions

Insert row into database with unknown schema using Python module peewee

Building a DSL query language

Categories

Resources