Am I at risk for SQL injection in BigQuery? - python

I am working on accessing data in BigQuery with Python to do some data analysis. I access the data with a standard SQL query of:
"SELECT * FROM `project.dataset.table`"
I am using the same base code on multiple datasets so I took the approach of using environment variables for the project, dataset and table, giving me an actual query that looks like this:
f"SELECT * FROM `{PROJECT}.{DATASET}.{TABLE}`"
I did this in an effort to abstract my tables a little. I run Bandit in my CI/CD pipeline, and this query using variables fails the check with a warning about possible SQL injection. My query cannot be changed by user input, as there is no point where I take user input to build it. I'm trying to figure out whether this is a safe query to include in my code. I've tried using more variables, fewer variables, and Secret Manager, and all of them fail the Bandit check.
My gut is telling me that the usage of the variables "hides" some of my info since the table is at least separate from the query and that since no users can input anything there is no issue. But the failing test has me a bit concerned. Any thoughts on if this is safe?

SQL injection matters because it lets an attacker read or destroy sensitive data.
For your query you can use parameterized queries. Parameterized statements ensure that the parameters passed into the SQL statement are treated as data rather than as SQL.
BigQuery supports query parameters to help prevent SQL injection when queries are constructed using user input. This feature is only available with standard SQL syntax.
# Example
from google.cloud import bigquery
client = bigquery.Client()
query = """
SELECT word, word_count
FROM `bigquery-public-data.samples.shakespeare`
WHERE corpus = @corpus
AND word_count >= @min_word_count
ORDER BY word_count DESC;
"""
job_config = bigquery.QueryJobConfig(
query_parameters=[
bigquery.ScalarQueryParameter("corpus", "STRING", "romeoandjuliet"),
bigquery.ScalarQueryParameter("min_word_count", "INT64", 250),
]
)
query_job = client.query(query, job_config=job_config) # Make an API request.
Source: Google's BigQuery documentation on running parameterized queries.

Yes.
"since no users can input anything there is no issue"
If an attacker gains access to your environment variables, they can use them to perform SQL injection. This is privilege escalation.
Parameters normally don't work on identifiers such as table names, only on values. You can still protect yourself by validating the identifiers. Some libraries have a function to do this. At minimum, make sure they don't contain a backtick (`).
Consider using a SQL builder which will take care of this for you.
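A minimal sketch of the identifier check described above, assuming the project, dataset, and table names arrive as plain strings (for example from environment variables). The helper name safe_table_ref and both patterns are illustrative and not part of any BigQuery library; the patterns are deliberately stricter than BigQuery's real naming rules.

```python
import re

# Conservative allowlists (assumptions, stricter than BigQuery's actual rules).
_PROJECT_RE = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,1023}")


def safe_table_ref(project: str, dataset: str, table: str) -> str:
    """Return a backtick-quoted table reference, or raise ValueError."""
    if not _PROJECT_RE.fullmatch(project):
        raise ValueError(f"invalid project id: {project!r}")
    for label, value in (("dataset", dataset), ("table", table)):
        if not _NAME_RE.fullmatch(value):
            raise ValueError(f"invalid {label} name: {value!r}")
    # Only characters from the allowlists can reach this point,
    # so the identifier cannot contain a backtick or whitespace.
    return f"`{project}.{dataset}.{table}`"


# Hypothetical names used purely for illustration.
query = f"SELECT * FROM {safe_table_ref('my-project', 'my_dataset', 'my_table')}"
```

A static checker such as Bandit may still flag the f-string, since it cannot see the validation. The check narrows what the variables can contain; it does not change how the query is built.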

Related

Why does Django not allow a user to extract a usable query from a QuerySet as a standard feature?

In Django, you can extract a plain-text SQL query from a QuerySet object like this:
queryset = MyModel.objects.filter(**filters)
sql = str(queryset.query)
In most cases, this query itself is not valid - you can't pop this into a SQL interface of your choice or pass it to MyModel.objects.raw() without exceptions, since quotations (and possibly other features of the query) are not performed by Django but rather by the database interface at execution time. So at best, this is a useful debugging tool.
Coming from a data science background, I often need to write a lot of complex SQL queries to aggregate data into a reporting format. The Django ORM can be awkward at best and impossible at worst when queries need to be very complex. However, it does offer some security and convenience with respect to limiting SQL injection attacks and providing a way to dynamically build a query - for example, generating the WHERE clause for the query using the .filter() method of a model. I want to be able to use the ORM to generate a base data set in the form of a query, then take that query and use it as a subquery/CTE in a larger query that handles more complex logic. For example:
queryset = MyModel.objects.filter(**filters)
sql = str(queryset.query)
more_complex_query = f"""
with filtered_table as ({sql})
select
*
/* add other stuff */
from
filtered_table
"""
results = MyModel.objects.raw(more_complex_query)
In this case, the ORM generates a query that can be used to filter the base table, then the CTE/raw sql can take that result and do whatever calculations need to be done with a tool that is more common among people working with data (SQL) than the Django ORM, while still getting the ORM benefits of stripping bad actors out.
However, this method requires a way to generate a usable SQL query from a QuerySet object. I've found a workaround for postgres databases using the psycopg2 cursor:
from django.db import connections
# Whatever the key is in your settings.DATABASES for the reporting db
WAREHOUSE_CONNECTION_NAME = 'default'
# Get the Query object and separate it into the query and params
filtered_table_query = MyModel.objects.filter(**filters).query
raw_query, params = filtered_table_query.sql_with_params()
# Create a cursor from the relevant connection
cursor = connections[WAREHOUSE_CONNECTION_NAME].cursor()
# Call .mogrify() on the query/params to get an executable query string
# (psycopg2's mogrify() returns bytes, so decode before using it in an f-string)
usable_sql = cursor.mogrify(raw_query, params).decode()
cursor.execute(usable_sql) # This works
cursor.fetchall() # This works
# Have not tried this yet
MyModel.objects.raw(usable_sql)
# Or this
wrapper_query = f"""
with base_table as ({usable_sql})
select
*
from
base_table
"""
cursor.execute(wrapper_query)
# or
MyModel.objects.raw(wrapper_query)
This method is dependent on the psycopg2 cursor method .mogrify() - I am not sure if this works for other back ends or if the DB API 2.0 spec takes care of that.
Other people have suggested creating a view in the database and then using an unmanaged Django model on top of the view, but I think this does not really work when your queries are dynamic in nature, i.e. need to be filtered differently based on some user input, since often the fields a user wants to filter on are not present in the result set after some aggregation.
So overall, I have two questions:
Is there a reason why Django does not let you extract a usable SQL query as a standard offering?
What other methods do people use when the ORM makes your elegant SQL into an ugly mess?
The Django developers tend to frown on features that aren't cross-compatible across all the databases they support. I can only imagine that one of the supported database engines doesn't have this capability and so they don't provide it as a standard, documented feature of the ORM.
But that's just a guess. You'd really have to ask one of the devs :)

How does PyMySQL prevent SQL injection attacks?

Sorry for asking here, but I cannot find much in the way of a PyMySQL security guide on how to prevent SQL injection.
When I develop in PHP, I know to use MySQL prepared statements (also called parameterized queries or stmt), but I cannot find any reference to this in PyMySQL.
Simple code using PyMySQL looks like this:
sqls="select id from tables where name=%s"
attack="jason' and 1=1"
cursor.execute(sqls,attack)
How do I know whether this prevents an SQL injection attack? If it does, how does PyMySQL prevent it? Does cursor.execute already use a prepared statement by default?
By default, the Python MySQL drivers do not use real query parameters. In Python, the argument (the variable attack in your example) is interpolated into the SQL string before the SQL is sent to the database server.
This is not the same as using a query parameter. In a real parameterized query, the SQL string is sent to the database server with the parameter placeholder intact.
But the Python driver does properly escape the argument as it interpolates, which protects against SQL injection.
I can prove it when I turn on the query log:
mysql> SET GLOBAL general_log=ON;
And tail the log while I run the Python script:
$ tail -f /usr/local/var/mysql/bkarwin.log
...
180802 8:50:47 14 Connect root@localhost on test
14 Query SET @@session.autocommit = OFF
14 Query select id from tables where name='jason\' and 1=1'
14 Quit
You can see that the query has had the value interpolated into it, and the embedded quote character is preceded by a backslash, which prevents it from becoming an SQL injection vector.
I'm actually testing MySQL's Connector/Python, but pymysql does the same thing.
I disagree with this design decision for the Python connectors to avoid using real query parameters (i.e. real parameters work by sending the SQL query to the database with parameter placeholders, and sending the values for those parameters separately). The risk is that programmers will think that any string interpolation of parameters into the query string will work the same as it does when you let the driver do it.
Example of SQL injection vulnerability:
attack="jason' and '1'='1"
sqls="select id from tables where name='%s'" % attack
cursor.execute(sqls)
The log shows this has resulted in SQL injection:
180802 8:59:30 16 Connect root@localhost on test
16 Query SET @@session.autocommit = OFF
16 Query select id from tables where name='jason' and '1'='1'
16 Quit
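The same contrast can be reproduced without a MySQL server. The sketch below uses the standard-library sqlite3 module as a stand-in (an assumption: it is not pymysql, and its placeholder is ? rather than %s), with a hypothetical users table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("jason",), ("alice",)])

attack = "nobody' OR '1'='1"

# Unsafe: the value is pasted into the SQL text, so its quote ends the literal
# and the rest of the string is parsed as SQL.
unsafe_sql = "SELECT id FROM users WHERE name = '%s'" % attack
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the value is handed to the driver separately and is only ever data.
safe_rows = conn.execute("SELECT id FROM users WHERE name = ?", (attack,)).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

The interpolated query matches every row in the table, while the parameterized one matches none, because no user is literally named nobody' OR '1'='1.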

How to retrieve the real SQL from the Django logger?

I am trying to analyse the SQL performance of our Django (1.3) web application. I have added a custom log handler which attaches to django.db.backends and set DEBUG = True, this allows me to see all the database queries that are being executed.
However the SQL is not valid SQL! The actual query is select * from app_model where name = %s with some parameters passed in (e.g. "admin"), however the logging message doesn't quote the params, so the sql is select * from app_model where name = admin, which is wrong. This also happens using django.db.connection.queries. AFAIK the django debug toolbar has a complex custom cursor to handle this.
Update: For those suggesting the Django debug toolbar: I am aware of that tool, and it is great. However, it does not do what I need. I want to run a sample interaction of our application and aggregate the SQL that's used. DjDT is great for showing and shallow learning, but not great for aggregating and summarizing the interaction of dozens of pages.
Is there any easy way to get the real, legit, SQL that is run?
Check out django-debug-toolbar. Open a page, and a sidebar will be displayed with all SQL queries plus other information.
select * from app_model where name = %s is a parameterized statement. I would recommend logging the statement and the parameters separately. To get a well-formed query you need to do something like "select * from app_model where name = %s" % quote_string("user"), or more generally query % tuple(map(quote_string, params)).
Please note that quote_string is DB-specific and the DB-API 2.0 does not define a quote_string method, so you need to write one yourself. For logging purposes I'd recommend keeping the queries and parameters separate, as it allows for far better profiling: you can easily group the queries without taking the actual values into account.
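As a rough illustration of why keeping statements and parameters separate helps with profiling, the sketch below groups a hypothetical captured log by statement text alone. The captured list is made-up sample data, not output from Django's logger.

```python
from collections import Counter

# Hypothetical captured log: (statement with placeholders, parameters) pairs.
captured = [
    ("select * from app_model where name = %s", ("admin",)),
    ("select * from app_model where name = %s", ("guest",)),
    ("select * from app_other where id = %s", (7,)),
]

# Grouping on the statement alone ignores the parameter values,
# so the two lookups by name count as the same query shape.
counts = Counter(statement for statement, _params in captured)

for statement, n in counts.most_common():
    print(n, statement)
```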
The Django Docs state that this incorrect quoting only happens for SQLite.
https://docs.djangoproject.com/en/dev/ref/databases/#sqlite-connection-queries
Have you tried another Database Engine?
Every QuerySet object has a 'query' attribute. One way to do what you want (I accept perhaps not an ideal one) is to chain the lookups each view is producing into a kind of scripted user-story, using Django's test client. For each lookup your user story contains just append the query to a file-like object that you write at the end, for example (using a list instead for brevity):
l = []
o = Object.objects.all()
l.append(o.query)

SQL Injection Prevention in Python - is using parameterized query enough?

I have the following python code:
row = conn.execute('''SELECT admin FROM account WHERE password = ?''',
(request.headers.get('X-Admin-Pass'),)).fetchone()
My question is whether this code is secure for SQL injection? Since I use parameterized query it should be. However, since I am passing user information straight from the header, I am a little worried :)
Any thoughts about the issue?
Passing the data this way ensures that an SQL injection attack will not work: the execute method automatically escapes or binds the parameters that you pass as a tuple in its second argument, so they are never parsed as SQL.
You are doing that correctly.
If your module follows the DB-API spec, then you're parameterizing fine. Unless you want to do research into preventing specific SQL attacks, parameterizing your queries is a good umbrella against SQL injection.

Sanitizing user-provided SQL with Python?

I'm working on a small app which will help browse the data generated by vim-logging, and I'd like to allow people to run arbitrary SQL queries against the datasets.
How can I do that safely?
For example, I'd like to let someone enter, say, SELECT file_type, count(*) FROM commands GROUP BY file_type, then send the result back to their web browser.
Do this:
cmd = "update people set name=%s where id=%s"
curs.execute(cmd, (name, id))
Note that the placeholder syntax depends on the database you are using.
Source and more info here:
http://bobby-tables.com/python.html
Allowing expressive power while preventing destruction is a difficult job. If you let them enter "SELECT .." themselves, you need to prevent them from entering "DELETE .." instead. You can require the statement to begin with "SELECT", but then you also have to be sure it doesn't contain "; DELETE" somewhere in the middle.
The safest thing to do might be to connect to the database with read-only user credentials.
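A minimal sketch of the read-only idea, assuming the data lives in a SQLite file. SQLite has no user accounts, so opening the file with mode=ro stands in here for read-only credentials; the commands table and its contents are made up for illustration.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "log.db")

# Set up some sample data with a normal read-write connection.
rw = sqlite3.connect(path)
rw.execute("CREATE TABLE commands (file_type TEXT)")
rw.executemany("INSERT INTO commands VALUES (?)", [("py",), ("py",), ("vim",)])
rw.commit()
rw.close()

# Open the same file read-only and run user-supplied SQL on this connection.
ro = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
rows = ro.execute(
    "SELECT file_type, count(*) FROM commands GROUP BY file_type ORDER BY file_type"
).fetchall()

# A write attempt is rejected by the database itself, not by text filtering.
try:
    ro.execute("DELETE FROM commands")
    write_blocked = False
except sqlite3.OperationalError:
    write_blocked = True
```

The point of the design is that the database enforces the restriction, so there is no need to parse the user's statement to look for DELETE.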
In MySQL, you can create a limited user (create a new user and grant it limited access) that can only access certain tables.
Consider using SQLAlchemy. While SQLAlchemy is arguably the greatest Object Relational Mapper ever, you certainly don't need to use any of the ORM stuff to take advantage of all of the great Python/SQL work that's been done.
As the introductory documentation suggests:
Most importantly, SQLAlchemy is not just an ORM. Its data abstraction layer allows construction and manipulation of SQL expressions in a platform agnostic way, and offers easy to use and superfast result objects, as well as table creation and schema reflection utilities. No object relational mapping whatsoever is involved until you import the orm package. Or use SQLAlchemy to write your own!
Using SQLAlchemy will give you input sanitation "for free" and let you use standard Python logic to analyze statements for safety without having to do any messy text-parsing/pattern-matching.
