sqlalchemy using custom methods in filter - python

I'm having a problem with this sqlalchemy query:
def bvalue(value):
    if isinstance(value, unicode):
        value = re.sub('[^\w]', "", value).lower()
    return value

basicValue = bvalue(someVariable)
q = self.session.query(sheet.id).\
    filter(bvalue(sheet.column) == basicValue)
The bvalue function works. I'm trying to match values after stripping them of any special characters and lowercasing them. The stripped variable does match the stripped DB value, but the query still doesn't retrieve any results.
What am I doing wrong? Can't you use custom methods in filters?

You are aware that SQLAlchemy translates your queries into plain SQL statements that are then emitted to your configured database?
So naturally you can't simply use arbitrary Python functions, since they would have to be translated into SQL, which can't be done in a generic way.
Aside from this general issue, bvalue(sheet.column) will simply return sheet.column (since it's not a unicode instance) and it is evaluated before creating the query. So your query is in fact equivalent to:
q = self.session.query(sheet.id).\
    filter(sheet.column == basicValue)
How to get the regex into SQL depends on the database you're using. Check e.g.
REGEXP_LIKE in SQLAlchemy
for some suggestions.
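If your backend has a regex-replace function (PostgreSQL, for example), another option is to push the normalisation into SQL via func, so the database itself strips and lowercases the column. A minimal sketch, assuming PostgreSQL's regexp_replace and the names from the question:

from sqlalchemy import func

# A sketch, assuming PostgreSQL's regexp_replace with the 'g' (global) flag;
# other backends need their own regex function, as in the linked question.
normalised_column = func.lower(
    func.regexp_replace(sheet.column, '[^a-zA-Z0-9_]', '', 'g'))

q = self.session.query(sheet.id).\
    filter(normalised_column == basicValue)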

Related

Python strings problem storing single quote

I am trying to execute a MySQL query from Python. I want the output to be:
query = "UPDATE 'college_general' SET 'fees' = '180000' WHERE ('college_id' = '2')"
Below is a snippet of the code:
def update(table, column, value):
    return f"UPDATE '{table}' SET '{column}' = '{value}' WHERE ('college_id' = '{id}')"

query = update("college_general", "fees", fee)
cursor.execute(query)
Instead, Python is storing it like:
query = 'UPDATE \'college_general\' SET \'fees\' = \'180000\' WHERE (\'college_id\' = \'2\')'
which is causing the script to fail. How can I achieve the desired output?
Thanks in advance!
You can replace the single quotes around the identifiers with backticks (a sketch follows the list below). For more detailed answers, visit this question.
There are two types of quotes in MySQL:
' for enclosing string literals
` for enclosing identifiers such as table and column names
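For example, a sketch of the question's helper with backticks around the identifiers only (the college_id parameter is added here for illustration, since the original relied on an undefined id; the next answer covers the deeper quoting issues):

def update(table, column, value, college_id):
    # Backticks quote the identifiers; single quotes remain for string values.
    return (f"UPDATE `{table}` SET `{column}` = '{value}' "
            f"WHERE (`college_id` = '{college_id}')")

query = update("college_general", "fees", 180000, 2)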
There are multiple issues here:
First, I suspect that the string handling bit of your program is actually working, but you are being confused by the external representation of strings. For example, if you do
x = "O'Reilly"
Python will, in some circumstances, display the string as
'O\'Reilly'
Second, I think you are using the wrong kind of quotes. Single quotes in SQL are for strings; MySQL uses backticks for names when necessary, while other SQL implementations usually use double quotes for this.
Third, AND THIS IS IMPORTANT! Do not use string manipulation for building SQL queries. The database library almost certainly has a feature for parametrized queries and you should be using that. Your query should look something like this:
query = 'UPDATE college_general SET fees = ? WHERE college_ID = ?'
cursor.execute(query, [180000, '2'])
but the details will depend on the DB library you are using. For example, some use %s instead of ?. This saves you from all kinds of headaches with quoting strings.
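For instance, with a MySQL driver such as mysql-connector-python or PyMySQL, which use %s as the placeholder, the parametrized version of the question's query would look roughly like this (a sketch, not tied to your exact schema):

query = 'UPDATE college_general SET fees = %s WHERE college_id = %s'
cursor.execute(query, (180000, 2))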
A raw string is the simplest solution to your problem. I believe the code below will achieve what you wanted:
def update(table, column, value):
    return fr"UPDATE '{table}' SET '{column}' = '{value}' WHERE ('college_id' = '{id}')"

query = update("college_general", "fees", fee)
cursor.execute(query)

SQLAlchemy - how to get raw SQL of `.count()` queries?

The simplest possible way to get the "raw" SQL for any query is just to print it (i.e., convert it to str).
But this doesn't work for count() queries, because count() is a "firing" method - one whose documentation states "This results in an execution of the underlying query". Other "firing" methods include all(), first() and so on.
How do I get the SQL for such methods?
I'm especially interested in count() because it transforms the underlying query in some way (the transformation is described explicitly in the docs, but things may vary). Other methods can alter the resulting SQL as well, for example first().
So it is sometimes useful to get the raw SQL of such queries in order to investigate what happens under the hood.
I have read answers about "getting raw SQL", but this case is special because such methods don't return Query objects.
Note that I need the SQL of existing Query objects that have already been constructed in some way.
The following example will return a count of any query object, which you should then be able to convert to a string representation:
from sqlalchemy import func
...
existing_query = session.query(Something)\
    .join(OtherThing)\
    .filter(OtherThing.foo == 'FOO')\
    .subquery()

query = session.query(func.count(existing_query.c.bar).label('bar_count'))
print(query)
actual_count = query.scalar()  # Executes the query and returns the count
Notice that you have to specify a field from the query output to count; in this example, that is existing_query.c.bar.
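If you also want the bound parameter values rendered inline, a commonly used sketch is to compile the query's underlying statement with literal_binds (this works for simple literal types such as strings and numbers):

print(query.statement.compile(compile_kwargs={"literal_binds": True}))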

Add own literal string as additional field in result of query - SQLAlchemy

How do I write the equivalent of the following raw SQL in the SQLAlchemy ORM?
SELECT "http://example.com/page/"||table.pagename as pageUrl
I need to get a value from the table, modify it in ORM/Python terms (here just a string concatenation), and return it in the result of the SQLAlchemy query as an additional field.
The SQLAlchemy string types have operator overloads that allow you to treat them like you'd treat Python strings in this case (string concatenation), but produce SQL expressions:
session.query(
    Table,
    ("http://example.com/page/" + Table.pagename).label("pageUrl"))
You can read more about SQLAlchemy's operator paradigm here: http://docs.sqlalchemy.org/en/latest/core/tutorial.html#operators
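An equivalent sketch using an explicit literal(), which can read more clearly when the left-hand operand is a plain Python string (Table and pagename stand in for the real model and column, as above):

from sqlalchemy import literal

session.query(
    Table,
    (literal("http://example.com/page/") + Table.pagename).label("pageUrl"))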
This can also be done via select, though there is almost no ORM involved:
from sqlalchemy.sql import select, text
q = select([text('"http://example.com/page/"||pagename as pageUrl')]).select_from(Table)
session.execute(q).fetchall()
The results will be a list of RowProxy objects.
To me, the solution via session.query (the answer above) seems more convenient: it is shorter, and the results come back in a result class that can easily be converted to a dict.

Peewee execute_sql with escaped characters

I have written a query that uses string substitution. I am trying to update a URL in a table, but the URL contains % signs, which causes a tuple index out of range exception.
If I print the query and run it manually it works fine, but running it through peewee raises the error. How can I get around this? I'm guessing it's because of the percent signs?
query = """
update table
set url = '%s'
where id = 1
""" % 'www.example.com?colour=Black%26white'
db.execute_sql(query)
The code you are currently sharing is incredibly unsafe, probably for the same reason that is causing your bug. Please do not use it in production, or you will be hacked.
Generally: you practically never want to use normal string operations like %, +, or .format() to construct a SQL query. Rather, you should use your SQL API/ORM's specific built-in methods for providing dynamic values for a query. In your case of SQLite in peewee, that looks like this:
query = """
update table
set url = ?
where id = 1
"""
values = ('www.example.com?colour=Black%26white',)
db.execute_sql(query, values)
The database engine will automatically take care of any special characters in your data, so you don't need to worry about them. If you ever find yourself encountering issues with special characters in your data, it is a very strong warning sign that some kind of security issue exists.
This is mentioned in the Security and SQL Injection section of peewee's docs.
Wtf are you doing? Peewee supports updates.
Table.update(url=new_url).where(Table.id == some_id).execute()
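For completeness, a minimal sketch of the kind of model that one-liner assumes (the real model and table names in the question are unknown, and SqliteDatabase('app.db') is only a placeholder):

from peewee import CharField, Model, SqliteDatabase

db = SqliteDatabase('app.db')

class Table(Model):
    # peewee adds an auto-incrementing "id" primary key automatically
    url = CharField()

    class Meta:
        database = db

new_url = 'www.example.com?colour=Black%26white'
Table.update(url=new_url).where(Table.id == 1).execute()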

How to avoid multiple queries in one execute call

I've just realized that psycopg2 allows multiple queries in one execute call.
For instance, this code will actually insert two rows in my_table:
>>> import psycopg2
>>> connection = psycopg2.connect(database='testing')
>>> cursor = connection.cursor()
>>> sql = ('INSERT INTO my_table VALUES (1, 2);'
... 'INSERT INTO my_table VALUES (3, 4)')
>>> cursor.execute(sql)
>>> connection.commit()
Does psycopg2 have some way of disabling this functionality? Or is there some other way to prevent this from happening?
What I've come up with so far is to check whether the query contains a semicolon (;):
if ';' in sql:
    # Multiple queries not allowed!
But this solution is not perfect, because it wouldn't allow some valid queries like:
SELECT * FROM my_table WHERE name LIKE '%;'
EDIT: SQL injection attacks are not an issue here. I do want to give to the user full access of the database (he can even delete the whole database if he wants).
If you want a general solution to this kind of problem, the answer is always going to be "parse format X, or at least parse it well enough to handle your needs".
In this case, it's probably pretty simple. PostgreSQL doesn't allow semicolons in the middle of column or table names, etc.; the only places they can appear are inside strings, or as statement terminators. So, you don't need a full parser, just one that can handle strings.
Unfortunately, even that isn't completely trivial, because you have to know the rules for what counts as a string literal in PostgreSQL. For example, is "abc\"def" a string abc"def?
But once you write or find a parser that can identify strings in PostgreSQL, it's easy: skip all the strings, then see if there are any semicolons left over.
For example (this is probably not the correct logic,* and it's also written in a verbose and inefficient way, just to show you the idea):
def skip_quotes(sql):
    in_1, in_2 = False, False
    for c in sql:
        if in_1:
            if c == "'":
                in_1 = False
        elif in_2:
            if c == '"':
                in_2 = False
        else:
            if c == "'":
                in_1 = True
            elif c == '"':
                in_2 = True
            else:
                yield c
Then you can just write:
if ';' in skip_quotes(sql):
    # Multiple queries not allowed!
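A quick check of the idea against the two statements from the question (just a demonstration of the sketch above, with all the caveats from the footnote below):

safe = "SELECT * FROM my_table WHERE name LIKE '%;'"
unsafe = ("INSERT INTO my_table VALUES (1, 2);"
          "INSERT INTO my_table VALUES (3, 4)")

print(';' in skip_quotes(safe))    # False - the semicolon is inside a string literal
print(';' in skip_quotes(unsafe))  # True  - a real statement separator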
If you can't find a pre-made parser, the first things to consider are:
If it's so trivial that simple string operations like find will work, do that.
If it's a simple, regular language, use re.
If the logic can be explained descriptively (e.g., via a BNF grammar), use a parsing library or parser-generator library like pyparsing or pybison.
Otherwise, you will probably need to write a state machine, or even explicit iterative code (like my example above). But this is very rarely the best answer for anything but teaching purposes.
* This is correct for a dialect that accepts either single- or double-quoted strings, does not escape one quote type within the other, and escapes quotes by doubling them (we will incorrectly treat 'abc''def' as two strings abc and def, rather than one string abc'def, but since all we're doing is skipping the strings anyway, we get the right result), but does not have C-style backslash escapes or anything else. I believe this matches sqlite3 as it actually works, although not sqlite3 as it's documented, and I have no idea whether it matches PostgreSQL.
Allowing users to make arbitrary queries (even single queries) can open your program up to SQL injection attacks and denial-of-service (DOS) attacks. The safest way to deal with potentially malicious users is to enumerate exactly which queries are allowable and only allow the user to supply parameter values, not the entire SQL query itself.
So for example, you could define
sql = 'INSERT INTO my_table VALUES (%s, %s)'
args = [1, 2] # <-- Supplied by the user
and then safely execute the INSERT statement with:
cursor.execute(sql, args)
This is called parametrized SQL because the SQL uses %s as parameter placeholders, and the cursor.execute statement takes two arguments. The second argument is expected to be a sequence, and the database driver (e.g. psycopg2) will replace the parameter placeholders with properly quoted values supplied by args.
This will prevent SQL injection attacks.
The onus is still on you (when you write your allowable SQL) to prevent denial-of-service attacks. You can attempt to protect yourself from DOS attacks by making sure the arguments supplied by the user are within a reasonable range, for instance.
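For instance, a minimal sketch of such a check (the helper name and limits are invented for illustration, and psycopg2-style %s placeholders are assumed):

def insert_pair(cursor, a, b):
    # Reject values outside an application-defined range before touching the database.
    if not (0 <= a <= 1_000_000 and 0 <= b <= 1_000_000):
        raise ValueError("value out of allowed range")
    cursor.execute('INSERT INTO my_table VALUES (%s, %s)', (a, b))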
