'Don't care' for a column in SQLite queries? - python

I've got a SQLite query which depends on 2 variables, gender and hand. Each of these can have 3 values: two of which actually mean something (male/female and left/right), and a third, 'all'. If a variable has the value 'all', then I don't care what the value of that column is.
Is it possible to achieve this with a single query, just changing the variables? I've looked for a wildcard or "don't care" operator but haven't been able to find one, except for %, which doesn't work in this situation.
Obviously I could make a bunch of if statements and have different queries to use for each case but that's not very elegant.
Code:
select_sql = """ SELECT * FROM table
WHERE (gender = ? AND hand = ?)
"""
cursor.execute(select_sql, (gender_var, hand_var))
I.e. this query works if gender_var = 'male' and hand_var = 'left', but not if gender_var or hand_var is 'all'.

You can indeed do this with a single query. Simply compare each variable to 'all' in your query.
select_sql = """ SELECT * FROM table
WHERE ((? = 'all' OR gender = ?) AND (? = 'all' OR hand = ?))
"""
cursor.execute(select_sql, (gender_var, gender_var, hand_var, hand_var))
Basically, when gender_var is 'all', the first operand of its OR expression (? = 'all') is true for every row, so that whole OR is true and drops out of the AND; the same applies to hand_var. In other words, an 'all' value turns its condition into a no-op.
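A small self-contained sketch of the same idea, assuming a hypothetical in-memory table named people (the question's table name is a placeholder) and a hypothetical helper find_people. It uses sqlite3 named parameters, so each value is passed once in a dict instead of being repeated in the tuple:

```python
import sqlite3

# "people" and its columns are made up for illustration.
QUERY = """
SELECT name FROM people
WHERE (:gender = 'all' OR gender = :gender)
  AND (:hand = 'all' OR hand = :hand)
ORDER BY name
"""


def find_people(conn, gender, hand):
    """Return names matching gender/hand; the value 'all' means "don't care"."""
    rows = conn.execute(QUERY, {"gender": gender, "hand": hand})
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, gender TEXT, hand TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("ann", "female", "left"), ("bob", "male", "right"), ("cat", "female", "right")],
)

print(find_people(conn, "female", "all"))  # ['ann', 'cat']
print(find_people(conn, "all", "right"))   # ['bob', 'cat']
```

A named parameter may appear several times in the statement and is bound to the same value each time, which keeps the call site from drifting out of sync with the query.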
However, it might be better to build the query dynamically in Python so that it contains only the conditions you actually need to test. That might be noticeably faster, but you'd have to benchmark it to be sure.

Related

How does set_group_by work in Django?

I was writing the following query:
claim_query = ClaimBillSum.objects.filter(claim__lob__in = lobObj)\
.annotate(claim_count = Count("claim__claim_id", distinct=True))\
.annotate(claim_bill_sum = Sum("bill_sum"))\
.values("claim__body_part", "claim_count", "claim_bill_sum")\
.order_by("claim__body_part")
When I inspected the query property, the GROUP BY clause included all the columns of the tables involved in this query, not only the ones selected in values(), whereas I only wanted to group by claim__body_part.
While searching for a way to change the GROUP BY clause, I found the query.set_group_by() function which, when applied, fixed the query the way I wanted:
claim_query.query.set_group_by()
SELECT
"CLAIM"."body_part",
COUNT(DISTINCT "claim_bill_sum"."claim_id") AS "claim_count",
SUM("claim_bill_sum"."bill_sum") AS "claim_bill_sum"
FROM
"claim_bill_sum"
INNER JOIN "CLAIM" ON
("claim_bill_sum"."claim_id" = "CLAIM"."claim_id")
WHERE
"CLAIM"."lob_id" IN (SELECT U0."lob_id" FROM "LOB" U0 WHERE U0."client_id" = 1)
GROUP BY
"CLAIM"."body_part"
ORDER BY
"CLAIM"."body_part" ASC
But I couldn't find any information in the Django documentation or anywhere else describing how this function works. Why does the default GROUP BY include all the columns, and how does .set_group_by() end up selecting exactly the column I wanted?

How to select all values in case of default in drop-down menu in sqlite3 query where clause

I want to display data in a QTableWidget according to the selections in several QComboBoxes. When "all genders" or "all ages" is selected, I want the sqlite3 query to match every value in that column.
For example, I want gender to be "all":
gender = "select all both male and female"
connection.execute("SELECT * FROM child where region=? and hospital=? and ageInMonths=? and gender=?", (region,hospital,ageInMonths,gender))
Welcome to Stackoverflow.
While it's a little tedious, the most sensible way to attack this problem is to build a list of the conditions you want to apply, and another of the data values that need to be inserted. Something like the following (untested) code, in which I assume that the variables are set to None if they aren't required in the search.
conditions = []
values = []
if region is not None:
    conditions.append('region=?')
    values.append(region)
# And similar logic for each other value ...
if gender is not None:
    conditions.append('gender=?')
    values.append(gender)
query = 'SELECT * FROM child'
if conditions:
    query = query + ' WHERE ' + ' AND '.join(conditions)
connection.execute(query, values)
This way, if you want to include all values of a column, you simply exclude it from the conditions by setting the variable to None.
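Here is a runnable version of that idea against an in-memory SQLite database. The child table's name column, the sample rows, and the select_children helper are hypothetical, made up for illustration. The column names come from a fixed dict in the code rather than from user input, so only the values are passed as ? parameters:

```python
import sqlite3


def select_children(conn, region=None, hospital=None, age_in_months=None, gender=None):
    """Return child names matching every filter that is not None."""
    # Column names come from this fixed mapping (never from user input),
    # so only the values are passed as ? parameters.
    filters = {
        "region": region,
        "hospital": hospital,
        "ageInMonths": age_in_months,
        "gender": gender,
    }
    conditions = []
    values = []
    for column, value in filters.items():
        if value is not None:  # None means "all": leave the column out
            conditions.append(f"{column} = ?")
            values.append(value)
    query = "SELECT name FROM child"
    if conditions:
        query += " WHERE " + " AND ".join(conditions)
    query += " ORDER BY name"
    return [name for (name,) in conn.execute(query, values)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE child (name TEXT, region TEXT, hospital TEXT, ageInMonths INTEGER, gender TEXT)"
)
conn.executemany(
    "INSERT INTO child VALUES (?, ?, ?, ?, ?)",
    [
        ("amy", "north", "h1", 12, "female"),
        ("ben", "north", "h1", 12, "male"),
        ("cal", "north", "h2", 24, "male"),
    ],
)

print(select_children(conn, region="north", hospital="h1"))  # ['amy', 'ben']
print(select_children(conn, gender="male"))                   # ['ben', 'cal']
```

Calling it with no filters at all produces a query without a WHERE clause, which returns every row.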
You can build your where clause and your parameter list conditionally.
Below I am assuming that the ageInMonths and gender variables actually contain the value 'all' when this is selected on your form. You can change this to whichever value is actually passed to your code, if it is something different.
When it comes to your actual query, the best way to get all values for a field is to simply exclude it from the where clause of your query entirely.
So something like:
query_parameters = []
query_string = "SELECT * FROM child where region=? and hospital=?"
query_parameters.append(region)
query_parameters.append(hospital)
if ageInMonths != 'all':
    query_string += " and ageInMonths=?"
    query_parameters.append(ageInMonths)
if gender != 'all':
    query_string += " and gender=?"
    query_parameters.append(gender)
connection.execute(query_string, query_parameters)
Basically, at the same time we are testing and building the dynamic parts of the SQL statement (in query_string), we are also dynamically defining the list of variables to pass to the query in query_parameters, which is a list object.

ORDER BY sometimes does not work in my query

I defined two classes:
class OrderEntryVacancyRenew(OrderEntry):
    ...
    vacancy_id = db.Column(db.Integer, db.ForeignKey('vacancy.id'), nullable=False)
    vacancy = db.relationship('Vacancy')
    remaining = db.Column(db.SmallInteger)

class Vacancy(db.Model):
    id = db.Column(db.Integer, autoincrement=True, primary_key=True)
    renew_at = db.Column(TZDateTime, index=True)
Then I defined the method to refresh OrderEntryVacancyRenew.remaining and Vacancy.renew_at.
def renew_vacancy():
    filters = [
        OrderEntryVacancyRenew.remaining,
        Vacancy.status == 0,
        or_(
            Vacancy.renew_at <= (utcnow() - INTERVAL),
            Vacancy.renew_at.is_(None))
    ]
    renew_vacancies = OrderEntryVacancyRenew.query.options(
        load_only('remaining', 'vacancy_id')
    ).order_by(
        OrderEntryVacancyRenew.id
    ).from_self().group_by(
        OrderEntryVacancyRenew.vacancy_id
    ).join(
        OrderEntryVacancyRenew.vacancy
    ).options(
        contains_eager(OrderEntryVacancyRenew.vacancy).load_only('renew_at')
    ).filter(*filters)
    for entry in renew_vacancies:
        entry.vacancy.renew_at = utcnow()
        entry.remaining -= 1
    db.session.commit()
I wrote a unit test to check renew_vacancy:
vacancy1 = Vacancy(id=10000)
vacancy2 = Vacancy(id=10001)
db.session.add_all([vacancy1, vacancy2])
vacancy_renew1 = OrderEntryVacancyRenew(
vacancy_id=vacancy1.id,
remaining=24)
# make sure vacancy_renew1.id < vacancy_renew2.id
db.session.add(vacancy_renew1)
db.session.commit()
vacancy_renew2 = OrderEntryVacancyRenew(
vacancy_id=vacancy1.id,
remaining=8)
vacancy_renew3 = OrderEntryVacancyRenew(
vacancy_id=vacancy2.id,
remaining=42)
db.session.add_all((vacancy_renew2, vacancy_renew3))
db.session.commit()
renew_vacancy()
self.assertEqual(
(vacancy_renew1.remaining, vacancy_renew2.remaining), (23, 8))
renew_vacancies is ordered by OrderEntryVacancyRenew.id and grouped by vacancy_id, so I expect it to select vacancy_renew1 and vacancy_renew3.
I used the following command to run the unit test 100 times:
for i in `seq 1 100`; do python test.py; done
In some rare cases, it selects vacancy_renew2 instead of vacancy_renew1.
Why does it happen that sometimes order by does not work as expected?
I tried printing vacancy_renew1.id and vacancy_renew2.id after renew_vacancy:
...
db.session.commit()
renew_vacancy()
print vacancy_renew1.id
print vacancy_renew2.id
self.assertEqual(
(vacancy_renew1.remaining, vacancy_renew2.remaining), (23, 8))
...
Why does it happen that sometimes ORDER BY does not work as expected?
Given standard SQL, the results of your query are indeterminate, so it is not very valuable to know why it works most of the time and fails rarely. There are two things that make the results vary:
Generally, you should not rely on the order of rows from a subquery in the enclosing query, even if the subquery applies an ordering. Some database implementations may offer additional guarantees, but on others the optimizer may deem the ORDER BY unnecessary; MySQL 5.7 and up does exactly that, removing the subquery entirely.
Databases that allow selecting non-aggregate items that are neither named in the GROUP BY clause nor functionally dependent on it, such as SQLite and MySQL (see note 1), usually leave it unspecified which row in the group the values are taken from:
SQLite:
... Otherwise, it is evaluated against a single arbitrarily chosen row from within the group. If there is more than one non-aggregate expression in the result-set, then all such expressions are evaluated for the same row.
MySQL:
... In this case, the server is free to choose any value from each group, so unless they are the same, the values chosen are nondeterministic, which is probably not what you want. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause.
Running the query on SQLite failed the assertion on this machine, while on MySQL it passed. This is probably due to how each implementation picks the row from within the group, but you should not rely on such details.
What you seem to be after is a greatest-n-per-group query, or more precisely top-1 per group. Since I don't know which database you are using, here's a somewhat generic way to do that using an EXISTS subquery expression:
renew_alias = db.aliased(OrderEntryVacancyRenew)
renew_vacancies = db.session.query(OrderEntryVacancyRenew).\
join(OrderEntryVacancyRenew.vacancy).\
options(
load_only('remaining'),
contains_eager(OrderEntryVacancyRenew.vacancy).load_only('renew_at')).\
filter(db.not_(db.exists().where(
db.and_(renew_alias.vacancy_id == OrderEntryVacancyRenew.vacancy_id,
renew_alias.id < OrderEntryVacancyRenew.id)))).\
filter(*filters)
This query passes the assertion on both SQLite and MySQL. Alternatively you could replace the EXISTS subquery expression with a LEFT JOIN and IS NULL predicate.
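The same top-1-per-group logic can be checked in plain SQL. This sketch uses Python's sqlite3 module with a hypothetical, trimmed-down renew table (only id, vacancy_id and remaining) whose rows mirror the ones created in the unit test, and runs both the NOT EXISTS form and the LEFT JOIN ... IS NULL alternative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE renew (id INTEGER PRIMARY KEY, vacancy_id INTEGER, remaining INTEGER);
INSERT INTO renew (id, vacancy_id, remaining) VALUES
    (1, 10000, 24),
    (2, 10000, 8),
    (3, 10001, 42);
""")

# Keep a row only if no row for the same vacancy has a smaller id.
NOT_EXISTS_SQL = """
SELECT r.id FROM renew AS r
WHERE NOT EXISTS (
    SELECT 1 FROM renew AS older
    WHERE older.vacancy_id = r.vacancy_id AND older.id < r.id
)
ORDER BY r.id
"""

# Same idea as an anti-join: rows for which the LEFT JOIN finds no older row.
LEFT_JOIN_SQL = """
SELECT r.id FROM renew AS r
LEFT JOIN renew AS older
    ON older.vacancy_id = r.vacancy_id AND older.id < r.id
WHERE older.id IS NULL
ORDER BY r.id
"""

first_per_vacancy = [row[0] for row in conn.execute(NOT_EXISTS_SQL)]
anti_join_result = [row[0] for row in conn.execute(LEFT_JOIN_SQL)]
print(first_per_vacancy, anti_join_result)  # [1, 3] [1, 3]
```

Unlike GROUP BY with a bare column, both forms are fully determined by the data, so the result does not depend on which row the engine happens to pick.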
P.S. I suppose you're using some flavour of MySQL and following advice such as this. You should read the commentary on that one as well, since many people there rightly point out the pitfalls. It does not work, at least for MySQL 5.7 and up.
1: Controllable with the ONLY_FULL_GROUP_BY SQL mode setting, enabled by default in MySQL 5.7.5 and up.

PostgreSQL WHERE EXISTS

I'm having trouble wrapping my head around the right way to use EXISTS (and whether there is a right way to use EXISTS for this particular case, or if I'm misunderstanding it).
I'm working against the Rigor schema (defined for SQLAlchemy here: https://github.com/blindsightcorp/rigor/blob/master/lib/types.py ).
The short of it is I have three tables I care about: "percept", "annotation", and "annotation_property". annotation_properties have an annotation_id, annotations have a percept_id.
I want to find all of the percepts that have annotations with a specific annotation_property (FOO=BAR).
Percepts may have many annotations that have a specific property, so it seems like an EXISTS should make things faster.
The (relatively slow) option is:
SELECT DISTINCT(percept.*) FROM percept, annotation, annotation_property
WHERE percept.id = annotation.percept_id AND
annotation_property.annotation_id = annotation.id AND
annotation_property.name = 'FOO' AND annotation_property.value = 'BAR';
How would I use EXISTS to optimize this?
It feels like the first step is something like:
SELECT percept.* FROM percept WHERE id IN (SELECT percept_id FROM
annotation, annotation_property WHERE
annotation.id = annotation_property.annotation_id AND
annotation_property.name = 'FOO' AND annotation_property.value = 'BAR');
But I don't see where to go from here....
To begin with, use ANSI JOIN syntax to distinguish your join conditions from your filter conditions. The result is easier to read, and it better displays the structure of your data:
SELECT DISTINCT(percept.*)
FROM
percept
JOIN annotation ON percept.id = annotation.percept_id
JOIN annotation_property ON annotation_property.annotation_id = annotation.id
WHERE
annotation_property.name = 'FOO'
AND annotation_property.value = 'BAR'
;
It would probably be an improvement to do as you said, and use distinct on the primary key column instead of on a whole percept row at a time, but that still likely involves computing a large result set and then merging it down. It is an alternative to an exists() condition, not a supplement to one.
Employing an EXISTS condition in the WHERE clause might look like this:
SELECT *
FROM percept p
WHERE EXISTS (
SELECT *
FROM
annotation ann
JOIN annotation_property anp
ON anp.annotation_id = ann.id
WHERE
anp.name = 'FOO'
AND anp.value = 'BAR'
AND ann.percept_id = p.id
)
;
The problem with your original query (apart from the implicit join syntax) is that you are bringing together lots of rows from the joins and then aggregating to remove the duplicates.
You can avoid removing duplicates altogether by selecting from just one table and moving the other tables into an EXISTS subquery:
SELECT p.*
FROM percept p
WHERE EXISTS (SELECT 1
FROM annotation a JOIN
annotation_property ap
ON ap.annotation_id = a.id AND
ap.name = 'FOO' AND ap.value = 'BAR'
WHERE p.id = a.percept_id
) ;
This assumes that the rows in percept do not have duplicates, but that seems like a reasonable assumption.
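To see the difference concretely, here is a small sketch that runs both forms against an in-memory SQLite database rather than PostgreSQL. The tables are cut down to the columns mentioned in the question and the rows are made up for illustration. The plain join (without DISTINCT) returns percept 1 twice because it has two matching annotations, while the EXISTS version returns each percept at most once:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE percept (id INTEGER PRIMARY KEY);
CREATE TABLE annotation (id INTEGER PRIMARY KEY, percept_id INTEGER);
CREATE TABLE annotation_property (annotation_id INTEGER, name TEXT, value TEXT);
INSERT INTO percept (id) VALUES (1), (2), (3);
-- percept 1 has two matching annotations, percept 2 has one, percept 3 none
INSERT INTO annotation (id, percept_id) VALUES (10, 1), (11, 1), (12, 2), (13, 3);
INSERT INTO annotation_property VALUES
    (10, 'FOO', 'BAR'), (11, 'FOO', 'BAR'), (12, 'FOO', 'BAR'), (13, 'FOO', 'BAZ');
""")

JOIN_SQL = """
SELECT percept.id FROM percept
JOIN annotation ON percept.id = annotation.percept_id
JOIN annotation_property ON annotation_property.annotation_id = annotation.id
WHERE annotation_property.name = 'FOO' AND annotation_property.value = 'BAR'
ORDER BY percept.id
"""

EXISTS_SQL = """
SELECT p.id FROM percept p
WHERE EXISTS (
    SELECT 1 FROM annotation a
    JOIN annotation_property ap ON ap.annotation_id = a.id
    WHERE ap.name = 'FOO' AND ap.value = 'BAR' AND a.percept_id = p.id
)
ORDER BY p.id
"""

join_ids = [row[0] for row in conn.execute(JOIN_SQL)]
exists_ids = [row[0] for row in conn.execute(EXISTS_SQL)]
print(join_ids)    # [1, 1, 2]
print(exists_ids)  # [1, 2]
```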

filter SqlAlchemy column value by number of resulting characters

How can I filter a SQLAlchemy column by the last few characters of its value?
Here is the kind of implementation I am looking for:
query = query.filter(Take_Last_7_Characters(column_1) == '0321334')
where Take_Last_7_Characters returns the last 7 characters of the value of column_1.
So how can I implement Take_Last_7_Characters(column_1)?
Use sqlalchemy.sql.expression.func to generate SQL functions.
Check its documentation for more info.
Please use func to generate SQL functions, as suggested by @tuxuday.
Note that the code is RDBMS-dependent. The code below runs on SQLite, which offers the SUBSTR and LENGTH functions. Your actual database might use different names for these (LEN, SUBSTRING, LEFT, RIGHT, etc.).
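from sqlalchemy import func

qry = session.query(Test)
# LENGTH - 6 is the start position of the last 7 characters (SQLite positions start at 1)
qry = qry.filter(func.SUBSTR(Test.column_1, func.LENGTH(Test.column_1) - 6, 7) == '0321334')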
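A quick way to check the SUBSTR/LENGTH arithmetic is to run the equivalent raw SQL with Python's sqlite3 module; the table and rows below are made up for illustration. SQLite also accepts a negative start position, which counts from the right, so SUBSTR(column_1, -7) gives the same result there (this is SQLite behaviour; other databases may differ):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (column_1 TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?)",
    [("AB0321334",), ("XYZ0321334",), ("0321335",), ("321334",)],
)

# Start at LENGTH - 6 and take 7 characters: the last 7 characters.
by_length = conn.execute(
    "SELECT column_1 FROM test "
    "WHERE SUBSTR(column_1, LENGTH(column_1) - 6, 7) = ? ORDER BY column_1",
    ("0321334",),
).fetchall()

# A negative start counts from the right in SQLite, so this is equivalent there.
by_negative_start = conn.execute(
    "SELECT column_1 FROM test WHERE SUBSTR(column_1, -7) = ? ORDER BY column_1",
    ("0321334",),
).fetchall()

print(by_length)          # [('AB0321334',), ('XYZ0321334',)]
print(by_negative_start)  # [('AB0321334',), ('XYZ0321334',)]
```

In SQLAlchemy the negative-start form would be written as func.substr(Test.column_1, -7) == '0321334', with the same SQLite-only caveat.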
