Is there a way to dynamically build an sql query - python

I am coding a database manager in Python using SQLite3 and tkinter, and I am attempting to build a query function. Each element in my database has several attributes: 'name', 'path', 'seen', 'quality', 'franchise', 'genre' and 'tags'.
If possible, I want the user to be able to select options in the GUI and have them turned into a request to the database. The problem is that the user should be able to filter on any subset of those attributes, or all of them. For example, one query might ask for all objects with the name "Tony", franchise "Toy Story", and genre "Action", whereas another might just want all objects with seen "yes".
I've been having a lot of trouble with this. Though I've been tempted to, I can't hardcode every combination of parts of the SQL SELECT statement, and I feel like there's a better way that I can't see. I've tried setting a 'default statement':
SELECT * FROM objects WHERE
and then adding onto it with something like genre IS ?, and, if franchise matters, then AND franchise IS ?. But then I run into the problem that I don't know how to format the substitutions dynamically. I'm pretty sure this can be done, so I'd love any help. Thanks!

You're absolutely on the right path. Consider building up a list of WHERE clauses and a parallel list of substitutions. In this example, I'm using strings, but you get the idea:
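# Parallel lists: one condition per entry in `where`, one bound value per entry in `subs`.
where = []
subs = []
where.append("name = ?")
subs.append("Tony")
where.append("franchise = ?")
subs.append("Toy Story")
# Join the conditions with AND; the values are passed separately so the driver binds them.
sql = "SELECT * FROM objects WHERE " + " AND ".join(where) + ";"
query = cursor.execute(sql, subs)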
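The same idea can be wrapped in a small helper that also copes with the case where no filter is selected. This is only a sketch: the function name build_query and the ALLOWED_COLUMNS whitelist are made up for illustration, and the table layout is guessed from the attribute list in the question.

```python
import sqlite3

# Hypothetical whitelist of filterable columns, taken from the question's attribute list.
ALLOWED_COLUMNS = ("name", "path", "seen", "quality", "franchise", "genre", "tags")


def build_query(filters):
    """Return (sql, params) for a dict such as {"genre": "Action"}."""
    where = []
    params = []
    for column, value in filters.items():
        # Column names cannot be bound as parameters, so they are checked
        # against the whitelist before being placed in the SQL text.
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown column: {column}")
        where.append(f"{column} = ?")
        params.append(value)
    sql = "SELECT * FROM objects"
    if where:  # omit the WHERE keyword entirely when nothing is selected
        sql += " WHERE " + " AND ".join(where)
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (name, path, seen, quality, franchise, genre, tags)")
conn.executemany(
    "INSERT INTO objects (name, franchise, genre, seen) VALUES (?, ?, ?, ?)",
    [("Tony", "Toy Story", "Action", "yes"), ("Woody", "Toy Story", "Comedy", "no")],
)

sql, params = build_query({"name": "Tony", "franchise": "Toy Story"})
rows = conn.execute(sql, params).fetchall()
all_rows = conn.execute(*build_query({})).fetchall()
```

The GUI would then only need to collect the selected attributes into a dict and pass it to the helper.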

Related

Using multiple databases in SQL with Python

I'm working on a registration site and right now I am trying to work with multiple databases at the same time.
I have 2 databases, one called Tables and the other called Digit.
I want to use the SAME function for both of them, so that it manages either database depending on a choice made in the front end.
I tried to use %s as a placeholder for the table name, with a dict defined at the beginning that maps to the different databases (1: Tables, 2: Digit):
cursor.execute("SELECT * FROM %s WHERE active = %s AND isFull = %s ORDER BY minLevel" , [bases[DB], 1,0])
This is the code I wrote. I was hoping to switch databases based on the DB value given by the front end of the site.
It didn't work. I'm really stuck here, and I am not sure if this approach is even legal...
Thanks ahead for your help!
I figured it out!
Thanks to another post, by the way: cursor.query( 'select * from %s;', ('thistable',) ) throws syntax error 1064: ...near ' 'thistable' ' at
The problem is that you can't use %s for "database variables" such as column names, table names, database names, etc.
The workaround is to build the query as a string beforehand and then call execute in this format:
cursor.execute(q, [variable])
where q is the pre-built query, and the wanted database name is added while building the query.
So the code above should look like this (I have a pre-built dictionary):
q = "SELECT * FROM " + dict[Number] + " WHERE active = %s AND isFull = %s ORDER BY minLevel"
cursor.execute(q, [1, 0])
Here dict is the name of the dictionary and Number is the variable I got from the front end; active, isFull and minLevel are columns of mine.
Hope it helps somebody!
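Because the table name ends up concatenated into the SQL text, it is worth making sure it can only ever come from a fixed mapping. A minimal sketch of that, using sqlite3 (which uses ? rather than %s as its placeholder) so that it runs on its own; the BASES mapping and the function name fetch_open_rows are assumptions for illustration:

```python
import sqlite3

# Hypothetical mapping from a front-end choice to a known table name.
# Only names listed here can ever reach the SQL text.
BASES = {1: "Tables", 2: "Digit"}


def fetch_open_rows(conn, choice):
    try:
        table = BASES[choice]
    except KeyError:
        raise ValueError(f"unknown choice: {choice!r}")
    # The identifier is concatenated; the values are bound as parameters.
    sql = f'SELECT * FROM "{table}" WHERE active = ? AND isFull = ? ORDER BY minLevel'
    return conn.execute(sql, (1, 0)).fetchall()


conn = sqlite3.connect(":memory:")
for name in BASES.values():
    conn.execute(f'CREATE TABLE "{name}" (active INTEGER, isFull INTEGER, minLevel INTEGER)')
conn.execute('INSERT INTO "Digit" VALUES (1, 0, 5), (1, 0, 2), (0, 0, 1)')

digit_rows = fetch_open_rows(conn, 2)
```

Anything the front end sends that is not a key of the mapping is rejected before a query is built.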

Best practices to pass a subquery to a dynamic query in Python with Pandas

I'm currently working with Python and Pandas to query data from my database. I have one query to get some customer information (this is obviously not the real query; it's simplified, without joins etc.):
def customer_query(con, date):
    stmt = """
        SELECT
            first_name,
            last_name,
            dob
        FROM
            customer
    """
    return pd.read_sql(
        stmt,
        con
    )
I do it with pandas so that I can easily export the dataset to csv without any hassle.
The requirements changed and I need to generate two different datasets based on circumstances around the customer.
The original query still holds, we still need first_name, last_name etc, so I don't want to create two entirely separate queries.
I want to add a where-clause to my query like so:
def customer_query(con, date):
    stmt = """
        SELECT
            first_name,
            last_name,
            dob
        FROM
            customer
        WHERE id IN (:sub)
    """
    return pd.read_sql(
        stmt,
        con,
        params={"sub": "SELECT customer_id FROM different_table_1"},
    )
I cannot just put the subquery in the statement with a parameter. What I wish to do is put the subquery as a parameter.
This way I could pass the subquery as an argument and generate two different datasets. That doesn't work with pandas though.
The only thing I can come up with is to execute the subquery on its own, grab the customer ids from it and pass those to my "customer_query" function. This isn't as nice as executing everything in one SQL statement, but I don't have any other idea. I would also rather not build the SQL statement with f-strings or something similar.
EDIT:
I forgot to mention that I'm connecting to an Oracle DB and the "con" object is a cx_Oracle connection.
Apparently the solution I thought of is not valid either, since cx_Oracle does not really support passing a list as a parameter.
You may be better off creating two queries so the optimizer tunes each one. Like everything "it depends". How often will they run? Does the connection remain open (there is also the statement cache to think about), etc, etc.
In Oracle SQL, bind values are not used to build up SQL statements (hence their security benefits).
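Since a bind value cannot carry SQL text, one way to keep a single function is to choose between fixed subquery strings that are defined in code and selected by key. The following is a sketch under assumptions, not the asker's setup: sqlite3 stands in for the Oracle connection so the example runs on its own, different_table_2 and the SUBQUERIES mapping are hypothetical, and the sample rows are made up.

```python
import sqlite3

# Fixed SQL fragments defined in code, never taken from user input.
# different_table_2 is a hypothetical second source.
SUBQUERIES = {
    "set_1": "SELECT customer_id FROM different_table_1",
    "set_2": "SELECT customer_id FROM different_table_2",
}


def customer_query(con, which):
    # Only a key is passed in; the SQL text itself comes from the mapping above.
    stmt = f"""
        SELECT first_name, last_name, dob
        FROM customer
        WHERE id IN ({SUBQUERIES[which]})
        ORDER BY id
    """
    return con.execute(stmt).fetchall()


con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customer (id INTEGER, first_name TEXT, last_name TEXT, dob TEXT);
    CREATE TABLE different_table_1 (customer_id INTEGER);
    CREATE TABLE different_table_2 (customer_id INTEGER);
    INSERT INTO customer VALUES (1, 'A', 'One', '1990-01-01'),
                                (2, 'B', 'Two', '1991-02-02');
    INSERT INTO different_table_1 VALUES (1);
    INSERT INTO different_table_2 VALUES (1), (2);
""")

first = customer_query(con, "set_1")
second = customer_query(con, "set_2")
```

The string formatting here only ever inserts one of the predefined fragments, which is different from formatting caller-supplied values into the statement. Any actual values in the query would still be passed as bind parameters.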

Replacing substring in a string using python

I have a query string in Python as follows:
query = "select name from company where id = 13 order by name;"
I want to be able to change the id dynamically. Thus I want to find id = 13 and replace it with a new id.
I can do it as follows:
query.replace("id = 13", "id = {}".format(some_new_id))
But if the query contains id= 13, id=13, id =13, ... it will not work.
How can I avoid that?
Gluing variables directly into your query leaves you vulnerable to SQL injection.
If you are passing your query to a function to be executed in your database, that function should accept additional parameters.
For instance,
query = "select name from company where id = %s order by name"
cursor.execute(query, (some_other_id,))
It is better to use a parameterized query.
Example:
query = "select name from company where id = %s order by name;"
cursor.execute(query, (id,))
The usual solution when it comes to dynamically building strings is string formatting, i.e.
tpl = "Hello {name}, how are you"
for name in ("little", "bobby", "table"):
    print(tpl.format(name=name))
BUT (and that's a BIG "but"): you do NOT want to do this for SQL queries (assuming you want to pass this query to your db using your db's python api).
There are two reasons not to use string formatting here: the first is that correctly handling quoting and escaping is tricky at best; the second, and much more important, is that it makes your code vulnerable to SQL injection attacks.
So in this case, the proper solution is to use prepared statements instead:
# assuming MySQL, which uses "%s" as placeholder;
# consult your db-api module's documentation for
# the proper placeholder
sql = "select name from company where id=%s order by name"
cursor = yourdbconnection.cursor()
cursor.execute(sql, [your_id_here])
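To see the effect end to end, here is a small self-contained sketch using sqlite3, whose placeholder is ? rather than %s. The table contents and the helper name names_for are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE company (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO company VALUES (?, ?)", [(13, "Acme"), (14, "Globex")])

# sqlite3 uses "?" as its placeholder; the query text never changes.
query = "SELECT name FROM company WHERE id = ? ORDER BY name"


def names_for(company_id):
    # The id is passed separately from the SQL text and bound by the driver.
    return [row[0] for row in conn.execute(query, (company_id,))]


by_id = names_for(14)
# A hostile value is compared as plain data, not executed as SQL.
hostile = names_for("13 OR 1=1")
```

Changing the id is then just a matter of passing a different value; no string replacement on the query is needed, so the spacing around the = sign no longer matters.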

Django ORM: Get latest record for distinct field

I'm having loads of trouble translating some SQL into Django.
Imagine we have some cars, each with a unique VIN, and we record the dates that they are in the shop with some other data. (Please ignore the reason one might structure the data this way. It's specifically for this question. :-) )
class ShopVisit(models.Model):
    vin = models.CharField(...)
    date_in_shop = models.DateField(...)
    mileage = models.DecimalField(...)
    boolfield = models.BooleanField(...)
We want a single query to return a Queryset with the most recent record for each vin and update it!
special_vins = [...]
# Doesn't work
ShopVisit.objects.filter(vin__in=special_vins).annotate(max_date=Max('date_in_shop')).filter(date_in_shop=F('max_date')).update(boolfield=True)
# Distinct doesn't work with update
ShopVisit.objects.filter(vin__in=special_vins).order_by('vin', '-date_in_shop').distinct('vin').update(boolfield=True)
Yes, I could iterate over a queryset. But that's not very efficient and it takes a long time when I'm dealing with around 2M records. The SQL that could do this is below (I think!):
SELECT *
FROM cars
INNER JOIN (
SELECT MAX(dateInShop) as maxtime, vin
FROM cars
GROUP BY vin
) AS latest_record ON (cars.dateInShop= maxtime)
AND (latest_record.vin = cars.vin)
So how can I make this happen with Django?
This is somewhat untested, and relies on Django 1.11 for Subqueries, but perhaps something like:
latest_visits = Subquery(ShopVisit.objects.filter(id=OuterRef('id')).order_by('-date_in_shop').values('id')[:1])
ShopVisit.objects.filter(id__in=latest_visits)
I had a similar model, so went to test it but got an error of:
"This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery"
The SQL it generated looked reasonably like what you want, so I think the idea is sound. If you use PostgreSQL, perhaps it has support for that type of subquery.
Here's the SQL it produced (trimmed up a bit and replaced actual names with fake ones):
SELECT `mymodel_activity`.* FROM `mymodel_activity` WHERE `mymodel_activity`.`id` IN (SELECT U0.`id` FROM `mymodel_activity` U0 WHERE U0.`id` = (`mymodel_activity`.`id`) ORDER BY U0.`date_in_shop` DESC LIMIT 1)
I wonder if you found a solution yourself.
I could only come up with a raw query string (see the Django manual on performing raw SQL queries):
UPDATE "yourapplabel_shopvisit"
SET boolfield = True WHERE date_in_shop
IN (SELECT MAX(date_in_shop) FROM "yourapplabel_shopvisit" GROUP BY vin);
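Note that the IN subquery above returns the latest date of every vin, so a row whose date happens to equal another vin's latest date would also match. A correlated subquery avoids that. The sketch below uses sqlite3 so it can run on its own; the table name shopvisit and the sample rows are assumptions, and other databases may need slightly different syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shopvisit (id INTEGER PRIMARY KEY, vin TEXT,
                            date_in_shop TEXT, boolfield INTEGER DEFAULT 0);
    INSERT INTO shopvisit (vin, date_in_shop) VALUES
        ('A', '2017-01-01'), ('A', '2017-03-01'),
        ('B', '2017-03-01'), ('B', '2017-05-01'),
        ('C', '2017-02-01');
""")

special_vins = ["A", "B"]
placeholders = ", ".join("?" for _ in special_vins)

# The inner SELECT is correlated on vin, so each row is compared only
# against the latest date of its own vin.
sql = f"""
    UPDATE shopvisit
    SET boolfield = 1
    WHERE vin IN ({placeholders})
      AND date_in_shop = (
          SELECT MAX(s2.date_in_shop) FROM shopvisit AS s2
          WHERE s2.vin = shopvisit.vin
      )
"""
conn.execute(sql, special_vins)

flagged = conn.execute(
    "SELECT vin, date_in_shop FROM shopvisit WHERE boolfield = 1 ORDER BY vin"
).fetchall()
```

In the sample data, vin B has a visit on the same date as vin A's latest visit; with the correlation on vin that older B row is left untouched.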

find multiple patterns in a string

I know this might sound simple, but I want a second opinion.
I'm creating a form where the user can enter a database query which will run on a remote database. I want to prevent the user from entering any query that contains the following keywords: "drop, delete, update, insert, alter".
I know the simplest approach would be not to give the user write access to the database, but just for the sake of validation I need to add this filter to my form.
Here's what I have done so far:
import re

Query = "Select * from table_name"
validation = re.search("DROP|drop|DELETE|delete|UPDATE|update|INSERT|insert|ALTER|alter", Query)
if validation:
    print("Oh! you are not supposed to enter that!!")
else:
    print("We're cool")
Have I covered every possible scenario, or can the user still give me a hard time?
Edited
Okay, so apparently this validation also rejects the keywords when they appear without word boundaries:
validation = re.search("drop|delete|update|insert|alter", Query, flags=re.IGNORECASE)
I mean, if my query is something like
Query = "Select * from droplets"
it won't pass; similarly, anything like "Select * from Inserted_Value_Table" will not pass either.
validation = re.search(r"\bdrop\b|\bdelete\b|\bupdate\b|\binsert\b|\balter\b", Query, flags=re.IGNORECASE)
Now again I wonder if something like this would do the job?
You can alternatively use any(). But your approach seems to be sufficient:
t = Query.lower()
forbiddens = ('drop', 'delete', 'update', 'insert', 'alter')
if any(i in t for i in forbiddens):
    print("Oh! you are not supposed to enter that!!")
It has been a few years, and I had lost the excitement of running queries like the following; you know how system admins are nowadays, not very friendly to developers' queries. But you are my only friend, for providing users with such a great database interface :)
CREATE USER Hemraj WITH PASSWORD 'thanks_for_access';
TRUNCATE table_name;
CREATE TABLE project_financial_transaction (
myprofit text
);
CREATE DATABASE superman OWNER Hemraj
So check for the above queries too, with the following regex:
query = "Select * from table_name"
not_valid = re.search(r"\bdrop\b|\bdelete\b|\bupdate\b|\binsert\b|\balter\b|\btruncate\b|\bcreate\b", query, re.I)
if not_valid:
    print("Invalid Query")
else:
    # result: the output of running the query (not shown here)
    print(result)
If you are going to use this regex in many places in your code, just compile it first, like this:
not_valid = re.compile(r"\bdrop\b|\bdelete\b|\bupdate\b|\binsert\b|\balter\b|\btruncate\b|\bcreate\b", re.I)
if not_valid.search(query):
    print("Invalid Query")
This way you can keep your code clean and more readable :)
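One detail matters for the word-boundary patterns: in a normal Python string literal "\b" is a backspace character, so the pattern has to be a raw string for \b to act as a word boundary. A minimal sketch of the compiled check, with the helper name is_allowed made up for illustration:

```python
import re

# Raw string so that \b reaches the regex engine as a word boundary;
# in a normal string literal "\b" is a backspace character.
FORBIDDEN = re.compile(
    r"\b(?:drop|delete|update|insert|alter|truncate|create)\b",
    re.IGNORECASE,
)


def is_allowed(query):
    """Return True when none of the forbidden keywords appears as a whole word."""
    return FORBIDDEN.search(query) is None


checks = {
    "Select * from table_name": is_allowed("Select * from table_name"),
    "Select * from droplets": is_allowed("Select * from droplets"),
    "DROP TABLE table_name": is_allowed("DROP TABLE table_name"),
    "Select * from Inserted_Value_Table": is_allowed("Select * from Inserted_Value_Table"),
}
```

A keyword filter like this is a form-level convenience only. As the question itself notes, not giving the account write access is the simpler way to actually protect the database.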
