RIGHT OUTER JOIN in SQLAlchemy - python

I have two tables beard and moustache defined below:
+--------+---------+------------+-------------+
| person | beardID | beardStyle | beardLength |
+--------+---------+------------+-------------+
+--------+-------------+----------------+
| person | moustacheID | moustacheStyle |
+--------+-------------+----------------+
I have created a SQL query in PostgreSQL which combines these two tables and generates the following result:
+--------+---------+------------+-------------+-------------+----------------+
| person | beardID | beardStyle | beardLength | moustacheID | moustacheStyle |
+--------+---------+------------+-------------+-------------+----------------+
| bob | 1 | rasputin | 1 | | |
+--------+---------+------------+-------------+-------------+----------------+
| bob | 2 | samson | 12 | | |
+--------+---------+------------+-------------+-------------+----------------+
| bob | | | | 1 | fu manchu |
+--------+---------+------------+-------------+-------------+----------------+
Query:
SELECT * FROM beards LEFT OUTER JOIN mustaches ON (false) WHERE person = 'bob'
UNION ALL
SELECT * FROM beards RIGHT OUTER JOIN mustaches ON (false) WHERE person = 'bob'
However, I cannot create a SQLAlchemy representation of it. I tried several approaches, from from_statement to outerjoin, but none of them worked. Can anyone help me with it?

In SQL, A RIGHT OUTER JOIN B is equivalent to B LEFT OUTER JOIN A. So, technically, there is no need for a RIGHT OUTER JOIN API - the same result can be achieved by switching the places of the target "selectable" and the joined "selectable". SQLAlchemy provides an API for this:
# this **fictional** API:
query(A).join(B, right_outer_join=True) # right_outer_join doesn't exist in SQLA!
# can be implemented in SQLA like this:
query(A).select_entity_from(B).join(A, isouter=True)
See SQLA Query.join() doc, section "Controlling what to Join From".
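As a concrete sketch of that documented select_from() pattern (the A/B models here are hypothetical, with B.a_id referencing A.id; 1.x-style API as elsewhere on this page):

# A minimal runnable sketch, assuming toy models A and B; it prints the SQL
# so you can see the reversed LEFT OUTER JOIN standing in for a RIGHT OUTER JOIN.
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import Session

Base = declarative_base()

class A(Base):
    __tablename__ = "a"
    id = Column(Integer, primary_key=True)

class B(Base):
    __tablename__ = "b"
    id = Column(Integer, primary_key=True)
    a_id = Column(Integer, ForeignKey("a.id"))

session = Session(create_engine("sqlite://"))

# A RIGHT OUTER JOIN B, expressed as B LEFT OUTER JOIN A:
query = session.query(A).select_from(B).outerjoin(A, A.id == B.a_id)
print(query)  # SELECT a.id ... FROM b LEFT OUTER JOIN a ON a.id = b.a_id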

From @Francis P's suggestion I came up with this snippet:
q1 = session.\
    query(beard.person.label('person'),
          beard.beardID.label('beardID'),
          beard.beardStyle.label('beardStyle'),
          sqlalchemy.sql.null().label('moustacheID'),
          sqlalchemy.sql.null().label('moustacheStyle'),
          ).\
    filter(beard.person == 'bob')
q2 = session.\
    query(moustache.person.label('person'),
          sqlalchemy.sql.null().label('beardID'),
          sqlalchemy.sql.null().label('beardStyle'),
          moustache.moustacheID,
          moustache.moustacheStyle,
          ).\
    filter(moustache.person == 'bob')
result = q1.union(q2).all()
This works, but you can hardly call it an answer, because it is more of a hack. This is one more reason why there should be a RIGHT OUTER JOIN in SQLAlchemy.
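Note that the original SQL uses UNION ALL; Query has a matching method, so if you want to keep duplicate rows the last line can be (same models and session as above):

result = q1.union_all(q2).all()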

If A and B are tables, you can achieve:
SELECT * FROM A RIGHT JOIN B ON A.id = B.a_id WHERE B.id = my_id
by:
SELECT A.* FROM B JOIN A ON A.id = B.a_id WHERE B.id = my_id
in sqlalchemy:
from sqlalchemy import select
result = session.query(A).select_entity_from(select([B]))\
    .join(A, A.id == B.a_id)\
    .filter(B.id == my_id).first()
for example:
# import ...
class User(Base):
    __tablename__ = "user"
    id = Column(Integer, primary_key=True)
    group_id = Column(Integer, ForeignKey("group.id"))

class Group(Base):
    __tablename__ = "group"
    id = Column(Integer, primary_key=True)
    name = Column(String(100))
You can get a user's group name by user id with the following code:
# import ...
from sqlalchemy import select
user_group_name, = session.query(Group.name)\
    .select_entity_from(select([User]))\
    .join(Group, User.group_id == Group.id)\
    .filter(User.id == 1).first()
If you want an outer join, use outerjoin() instead of join().
This answer is a complement to the previous one (Timur's answer).
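For instance, the outer-join form of the snippet above would look like this (same assumptions as that snippet; a user row whose group_id is NULL then comes back with None for the name):

# Same query as above, but with a LEFT OUTER JOIN via outerjoin():
user_group_name, = session.query(Group.name)\
    .select_entity_from(select([User]))\
    .outerjoin(Group, User.group_id == Group.id)\
    .filter(User.id == 1).first()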

Here's what I've got, ORM style:
from sqlalchemy import outerjoin
from sqlalchemy.sql import select, false

stmt = (
    select([Beard, Moustache])
    .select_from(
        outerjoin(Beard, Moustache, false())
    ).apply_labels()
).union_all(
    select([Beard, Moustache])
    .select_from(
        outerjoin(Moustache, Beard, false())
    ).apply_labels()
)
session.query(Beard, Moustache).select_entity_from(stmt)
This seems to work on its own, but it seems to be impossible to join it with another select expression.
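One hedged workaround sketch: alias the compound select and join the alias at the Core level. Here other is a hypothetical Table, and the beard_person label assumes apply_labels() named the union's columns <table>_<column>:

sub = stmt.alias("united")
joined = sub.join(other, sub.c.beard_person == other.c.person)
rows = session.execute(
    select([sub, other.c.name]).select_from(joined)
).fetchall()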

Unfortunately, SQLAlchemy only provides an API for LEFT OUTER JOIN, as .outerjoin(). As mentioned above, we can get a RIGHT OUTER JOIN by reversing the operands of the LEFT OUTER JOIN; e.g. A RIGHT JOIN B is the same as B LEFT JOIN A.
In SQL, the following statements are equivalent:
SELECT * FROM A RIGHT OUTER JOIN B ON A.common = B.common;
SELECT * FROM B LEFT OUTER JOIN A ON A.common = B.common;
However, in SQLAlchemy, we need to query on a class and then perform the join. The tricky part is rewriting the SQLAlchemy statement to reverse the tables. For example, the results of the first two queries below differ because they return different objects.
# No such API (rightouterjoin()) but this is what we want.
# This should return the result of A RIGHT JOIN B in a list of object A
session.query(A).rightouterjoin(B).all() # SELECT A.* FROM A RIGHT OUTER JOIN B ...
# We could reverse A and B but this returns a list of object B
session.query(B).outerjoin(A).all() # SELECT B.* FROM B LEFT OUTER JOIN A ...
# This returns a list of object A by choosing the 'left' side to be B using select_from()
session.query(A).select_from(B).outerjoin(A).all() # SELECT A.* FROM B LEFT OUTER JOIN A ...
# For OP's example, assuming we want to return a list of beard objects:
session.query(beard).select_from(moustache).outerjoin(beard).all()
Just adding to the answers: you can find the usage of select_from in the SQLAlchemy docs.

Related

SQL query returns result multiple times

I'm pretty new to SQL and am trying to join some tables in SQL.
I'm using SQLite3 and Pandas and have the following table structure:
User
     |
Measurement - Environment - meas_device - Device
     |                           |
    Data                 Unit_of_Measurement
Why do I get the result of the following SQL query multiple times (4x)?
query = """
SELECT User.name, Measurement.id, Data.set_id, Data.subset_id, Data.data
FROM Measurement
JOIN Data ON Measurement.id = Data.measurement_id
JOIN User ON Measurement.user_id = User.id
JOIN Environment ON Measurement.Environment_id = Environment.id
JOIN meas_device ON Environment.meas_dev_ids = meas_device.id
JOIN Device ON meas_device.device_id = Device.id
JOIN Unit_of_Measurement ON meas_device.Unit_id = Unit_of_Measurement.id
WHERE User.name = 'nicola'
"""
pd.read_sql_query(query, conn)
My guess is that I did something wrong with the joining, but I cannot see what.
I hoped to be able to save a JOIN statement somewhere that works for every possible query; that's why more tables are joined than necessary for this particular query.
Update
I think the problem lies within the Environment table. Whenever I join this table the results get multiplied. As the Environment is a collection of meas_devices, there are multiple entries with the same Environment id.
(I could save the Environment table with the corresponding meas_device_id's as lists, but then I see no possibility to link the Environment table with the meas_device table.)
id | meas_device_id
1 | 1
1 | 2
1 | 5
2 | 3
2 | 4
Up until now I created the tables with pandas DataFrame.to_sql(), therefore the id is not marked as a primary key or anything like that. Could this be the reason for my problem?
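For illustration, the multiplication is easy to reproduce with a toy table (made-up data; the single Measurement row matches all three Environment rows that share id 1, so it comes back three times):

import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Measurement (id INTEGER, Environment_id INTEGER);
    INSERT INTO Measurement VALUES (10, 1);
    CREATE TABLE Environment (id INTEGER, meas_device_id INTEGER);
    INSERT INTO Environment VALUES (1, 1), (1, 2), (1, 5);
""")
# The single Measurement row is repeated once per matching Environment row:
print(pd.read_sql_query(
    "SELECT * FROM Measurement "
    "JOIN Environment ON Measurement.Environment_id = Environment.id",
    conn,
))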
Update 2
I found the problem. I don't think this will actually help somebody in the future, but for the sake of completeness, here is the explanation. It was not really a question of how to link the tables; rather, I had neglected a crucial link. Because the Environment table has multiple rows with the same id value, it created "open ends" that resulted in a multiplication of the results. I needed to add a cross-check between Environment.subset_id and Data.subset_id. The following query works fine:
query = f""" SELECT {SELECT}
FROM Data
JOIN Measurement ON Data.measurement_id = Measurement.id
JOIN User ON Measurement.user_id = User.id
JOIN Environment ON Measurement.Environment_id = Environment.id
JOIN meas_device ON Environment.meas_dev_ids = meas_device.id
JOIN Device ON meas_device.Device_id = Device.id
JOIN Unit_of_Measurement ON meas_device.Unit_id = Unit_of_Measurement.id
WHERE {WHERE} AND Environment.subset_id = Data.subset_id
"""
If you need to filter on tables that produce additional rows in the result when they are joined, don't join them; instead, include them in a sub-query in the WHERE clause.
E.g.
SELECT User.name, Measurement.id, Data.set_id, Data.subset_id, Data.data
FROM
Measurement
JOIN Data ON Measurement.id = Data.measurement_id
JOIN User ON Measurement.user_id = User.id
WHERE
Measurement.Environment_id IN (
SELECT Environment.id
FROM
Environment
JOIN meas_device ON Environment.meas_dev_ids = meas_device.id
JOIN Device ON meas_device.device_id = Device.id
JOIN Unit_of_Measurement ON meas_device.Unit_id = Unit_of_Measurement.id
WHERE Device.name = 'xy'
)
In this subquery you can join many tables without generating additional records.
If this is not an option because you want to select entries from other tables as well, you can simply add a DISTINCT to your original query.
SELECT DISTINCT
User.name, Measurement.id, Data.set_id, Data.subset_id, Data.data
FROM
Measurement
JOIN Data ON Measurement.id = Data.measurement_id
JOIN User ON Measurement.user_id = User.id
JOIN Environment ON Measurement.Environment_id = Environment.id
JOIN meas_device ON Environment.meas_dev_ids = meas_device.id
JOIN Device ON meas_device.device_id = Device.id
JOIN Unit_of_Measurement ON meas_device.Unit_id = Unit_of_Measurement.id
WHERE
User.name = 'nicola'

Complex query using Django QuerySets

I am working on a personal project and I am trying to write a complex query that:
1. Gets every device that belongs to a certain user
2. Gets every sensor belonging to every one of the user's devices
3. Gets the last recorded value and timestamp for each of the user's devices' sensors.
I am using SQLite, and I managed to write the query as plain SQL; however, for the life of me, I cannot figure out a way to do it in Django. I looked at other questions and tried going through the documentation, but to no avail.
My models:
class User(AbstractBaseUser):
    email = models.EmailField()

class Device(models.Model):
    user = models.ForeignKey(User)
    name = models.CharField()

class Unit(models.Model):
    name = models.CharField()

class SensorType(models.Model):
    name = models.CharField()
    unit = models.ForeignKey(Unit)

class Sensor(models.Model):
    gpio_port = models.IntegerField()
    device = models.ForeignKey(Device)
    sensor_type = models.ForeignKey(SensorType)

class SensorData(models.Model):
    sensor = models.ForeignKey(Sensor)
    value = models.FloatField()
    timestamp = models.DateTimeField()
And here is the SQL query:
SELECT acc.email,
dev.name as device_name,
stype.name as sensor_type,
sen.gpio_port as sensor_port,
sdata.value as sensor_latest_value,
unit.name as sensor_units,
sdata.latest as value_received_on
FROM devices_device as dev
INNER JOIN accounts_user as acc on dev.user_id = acc.id
INNER JOIN devices_sensor as sen on sen.device_id = dev.id
INNER JOIN devices_sensortype as stype on stype.id = sen.sensor_type_id
INNER JOIN devices_unit as unit on unit.id = stype.unit_id
LEFT JOIN (
SELECT MAX(sd.timestamp) latest, sd.value, sensor_id
FROM devices_sensordata as sd
INNER JOIN devices_sensor as s ON s.id = sd.sensor_id
GROUP BY sd.sensor_id) as sdata on sdata.sensor_id= sen.id
WHERE acc.id = 1
ORDER BY dev.id
I have been playing with the django shell in order to find a way to implement this query with the QuerySet API, but I cannot figure it out...
The closest I managed to get is with this:
>>> sub = SensorData.objects.values('sensor_id', 'value').filter(sensor_id=OuterRef('pk')).order_by('-timestamp')[:1]
>>> Sensor.objects.annotate(data_id=Subquery(sub.values('sensor_id'))).filter(id=F('data_id')).values(...)
However it has two problems:
It does not include the sensors that do not yet have any values in SensorData
If I include the SensorData.values field in .values() I start to get previously recorded values of the sensors
If someone could please show me how to do it, or at least tell me what I am doing wrong I will be very grateful!
Thanks!
P.S. Please excuse my grammar and spelling errors, I am writing this in the middle of the night and I am tired.
EDIT:
Based on the answers I should clarify:
I only want the latest sensor value for each sensor. For example, in sensordata I have:
id | sensor_id | value | timestamp   |
 1 |         1 |     2 | <today>     |
 2 |         1 |     5 | <yesterday> |
 3 |         2 |     3 | <yesterday> |
Only the latest should be returned for each sensor_id:
id | sensor_id | value | timestamp   |
 1 |         1 |     2 | <today>     |
 3 |         2 |     3 | <yesterday> |
Or, if the sensor does not yet have any data in this table, I want the query to return a record for it with "null" for value and timestamp (basically the LEFT JOIN in my SQL query).
EDIT2:
Based on @ivissani's answer, I managed to produce this:
>>> latest_sensor_data = Sensor.objects.annotate(is_latest=~Exists(SensorData.objects.filter(sensor=OuterRef('id'),timestamp__gt=OuterRef('sensordata__timestamp')))).filter(is_latest=True)
>>> user_devices = latest_sensor_data.filter(device__user=1)
>>> for x in user_devices.values_list('device__name','sensor_type__name', 'gpio_port','sensordata__value', 'sensor_type__unit__name', 'sensordata__timestamp').order_by('device__name'):
... print(x)
Which seems to do the job.
This is the SQL it produces:
SELECT
"devices_device"."name",
"devices_sensortype"."name",
"devices_sensor"."gpio_port",
"devices_sensordata"."value",
"devices_unit"."name",
"devices_sensordata"."timestamp"
FROM
"devices_sensor"
LEFT OUTER JOIN "devices_sensordata" ON (
"devices_sensor"."id" = "devices_sensordata"."sensor_id"
)
INNER JOIN "devices_device" ON (
"devices_sensor"."device_id" = "devices_device"."id"
)
INNER JOIN "devices_sensortype" ON (
"devices_sensor"."sensor_type_id" = "devices_sensortype"."id"
)
INNER JOIN "devices_unit" ON (
"devices_sensortype"."unit_id" = "devices_unit"."id"
)
WHERE
(
NOT EXISTS(
SELECT
U0."id",
U0."sensor_id",
U0."value",
U0."timestamp"
FROM
"devices_sensordata" U0
WHERE
(
U0."sensor_id" = ("devices_sensor"."id")
AND U0."timestamp" > ("devices_sensordata"."timestamp")
)
) = True
AND "devices_device"."user_id" = 1
)
ORDER BY
"devices_device"."name" ASC
Actually your query is rather simple; the only complex part is establishing which SensorData is the latest for each Sensor. I would use annotations and an Exists subquery, in the following way:
latest_data = SensorData.objects.annotate(
    is_latest=~Exists(
        SensorData.objects.filter(sensor=OuterRef('sensor'),
                                   timestamp__gt=OuterRef('timestamp'))
    )
).filter(is_latest=True)
Then it's just a matter of filtering this queryset by user in the following way:
certain_user_latest_data = latest_data.filter(sensor__device__user=certain_user)
Now, as you want to retrieve the sensors even if they don't have any data, this query will not suffice, since it retrieves only SensorData instances, and the Sensor and Device must be accessed through fields. Unfortunately, Django does not allow explicit joins through its ORM. Therefore I suggest the following (and let me say, this is far from ideal from a performance perspective).
The idea is to annotate the Sensors queryset with the specific values of the latest SensorData (value and timestamp) if any exists in the following way:
latest_data = SensorData.objects.annotate(
    is_latest=~Exists(
        SensorData.objects.filter(sensor=OuterRef('sensor'),
                                   timestamp__gt=OuterRef('timestamp'))
    )
).filter(is_latest=True, sensor=OuterRef('pk'))

sensors_with_value = Sensor.objects.annotate(
    latest_value=Subquery(latest_data.values('value')),
    latest_value_timestamp=Subquery(latest_data.values('timestamp'))
)  # This will generate two subqueries...

certain_user_sensors = sensors_with_value.filter(
    device__user=certain_user
).select_related('device__user')
If there aren't any instances of SensorData for a certain Sensor then the annotated fields latest_value and latest_value_timestamp will simply be set to None.
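A short usage sketch for the queryset above (each Sensor then carries its latest value, or None when it has no SensorData yet):

for sensor in certain_user_sensors:
    print(sensor.device.name, sensor.gpio_port,
          sensor.latest_value, sensor.latest_value_timestamp)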
For this kind of query, I strongly recommend using Q objects; here are the docs: https://docs.djangoproject.com/en/2.2/topics/db/queries/#complex-lookups-with-q-objects
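A minimal hypothetical example of combining and negating lookups with Q on the models above:

from django.db.models import Q

# Sensors for one user, excluding a given port (the values are made up):
Sensor.objects.filter(Q(device__user=user) & ~Q(gpio_port=4))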
It's perfectly fine to execute raw queries with Django, especially if they are that complex.
If you want to map the results to models, use this:
https://docs.djangoproject.com/en/2.2/topics/db/sql/#performing-raw-queries
Otherwise, see this: https://docs.djangoproject.com/en/2.2/topics/db/sql/#executing-custom-sql-directly
Note that in both cases, no checking is done on the query by Django.
This means that the security of the query is fully your responsibility; sanitize the parameters.
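A minimal sketch of a parameterized raw query mapped onto a model (the sensor_id value is hypothetical; passing a params list instead of formatting the string yourself lets the database driver do the quoting):

sensors = SensorData.objects.raw(
    "SELECT * FROM devices_sensordata WHERE sensor_id = %s",
    [sensor_id],
)
for row in sensors:
    print(row.value, row.timestamp)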
Something like this?:
Multiple Devices for 1 User
device_ids = Device.objects.filter(user=user).values_list("id", flat=True)
SensorData.objects.filter(
    sensor__device__id__in=device_ids
).values(
    "sensor__device__name", "sensor__sensor_type__name",
    "value", "timestamp",
).order_by("-timestamp")
1 Device, 1 User
SensorData.objects.filter(
    sensor__device__user=user
).values(
    "sensor__device__name", "sensor__sensor_type__name",
    "value", "timestamp",
).order_by("-timestamp")
This queryset will:
1. Get every device that belongs to a certain user
2. Get every sensor belonging to every one of the user's devices (but it returns sensor_type for every sensor, since there is no name field there, so I return sensor_type__name)
3. Get all recorded values and timestamps (ordered by latest timestamp) for each of the user's devices' sensors.
UPDATE
try this:
list_data = []
for _id in device_ids:
    # filter by device id (the loop iterates device ids)
    sensor_data = SensorData.objects.filter(sensor__device__id=_id)
    if sensor_data.exists():
        data = sensor_data.values(
            "sensor__id", "value", "timestamp", "sensor__device__user__id"
        ).latest("timestamp")
        list_data.append(data)

How to use a PostgreSQL BEFORE UPDATE trigger? This doesn't execute, it only compiles correctly

CREATE OR REPLACE FUNCTION get_issue_status_user_role() RETURNS TRIGGER AS
$issue_user_role$
DECLARE
    missue_id integer;
    mstatus integer;
    mcurr_user integer;
    mrole_descrp varchar;
    mcan_edit_tkts boolean;
    mqueue_id integer;
BEGIN
    raise notice 'howdya';
    mcan_edit_tkts := False;
    -- Check roles for the logged-in user on the 'before update' event:
    -- get the queue id of the issue being edited. If the user is present
    -- in user_group_id for that queue id in q_users_roles,
    -- check the role(s) against the status role of the current user.
    --IF (TG_OP = 'UPDATE') THEN
    --    if OLD.description != NEW.description then
    missue_id := OLD.id;
    mcurr_user := NEW.updated_by;
    mqueue_id := NEW.queue_id;
    mstatus := OLD.status;
    mrole_descrp := (
        SELECT roles.description AS mrole_desc FROM rt_issues
        LEFT OUTER JOIN queues ON rt_issues.queue_id = queues.id
        LEFT OUTER JOIN rt_status ON rt_issues.status = rt_status.id
        LEFT OUTER JOIN q_users_roles ON queues.id = q_users_roles.queue_id
        LEFT OUTER JOIN roles ON q_users_roles.role_id = roles.id
        LEFT OUTER JOIN users_groups ON q_users_roles.user_group_id = users_groups.id
        LEFT OUTER JOIN users ON users_groups."user" = users.id
        WHERE rt_issues.id = missue_id AND
              rt_issues.status = mstatus AND
              users_groups."user" = mcurr_user);
    --    end if;
    if mrole_descrp != 'can_change_status' then
        mcan_edit_tkts := False;
    else
        mcan_edit_tkts := True;
    end if;
    --END IF;
    if mcan_edit_tkts then
        raise notice 'Edit permitted';
        RETURN NEW;
    else
        raise notice 'No permission to edit this ticket';
        RETURN Null; -- in a BEFORE trigger, returning NULL skips the operation
    end if;
END;
$issue_user_role$ LANGUAGE plpgsql;

drop trigger if exists issue_user_role on rt_issues;
CREATE TRIGGER issue_user_role BEFORE UPDATE OR INSERT ON rt_issues
    FOR EACH ROW EXECUTE PROCEDURE get_issue_status_user_role();
The SELECT statement returns the matching role description from the roles master for the issue status of the queue being updated, associated with the role for the current user from the q_users_roles table belonging to the users_groups. The SQL gives the correct output (the role description) when executed using SQLAlchemy Core in a Python API call. This is my first trigger. Where is the syntax error?
db1=# select id, first_name from users;
id | first_name
----+------------
1 | ytxz
2 | abcd
(2 rows)
db1=# select * from users_groups;
id | user | group |
----+------+-------+
2 | 2 | 1 |
1 | 1 | 2 |
(2 rows)
db1=# select id, cc_user_ids, status, queue_id, updated_by from rt_issues where id=10;
 id | cc_user_ids  | status | queue_id | updated_by
----+--------------+--------+----------+------------
 10 | not#quack.om |      2 |        1 |          2
(1 row)
db1=# select * from rt_status;
id | description | role_id | queue_id |
----+---------------------+---------+----------+
2 | Initial check | 1 | 1 |
3 | Awaiting assignment | 1 | 1 |
1 | New Issue | 1 | 1 |
(3 rows)
db1=# select * from q_users_roles;
id | queue_id | user_group_id | role_id |
----+----------+---------------+---------+
9 | 16 | 1 | 2 |
25 | 21 | 1 | 2 |
26 | 24 | 1 | 2 |
16 | 1 | 1 | 1 |
(4 rows)
db1=# select * from roles;
id | description | xdata
----+----------------------+-------
1 | can_change_status |
2 | can_create_tkts |
(2 rows)
db1=# SELECT roles.description AS mrole_desc FROM rt_issues LEFT OUTER JOIN queues ON rt_issues.queue_id = queues.id LEFT OUTER JOIN rt_status ON rt_issues.status = rt_status.id LEFT OUTER JOIN q_users_roles ON queues.id = q_users_roles.queue_id LEFT OUTER JOIN roles ON q_users_roles.role_id = roles.id LEFT OUTER JOIN users_groups ON q_users_roles.user_group_id = users_groups.id LEFT OUTER JOIN users ON users_groups."user" = users.id WHERE rt_issues.id = 10 AND rt_issues.status = 2 AND users_groups."user" = 1;
mrole_desc
-------------------
can_change_status
(1 row)
db1=# SELECT roles.description AS mrole_desc FROM rt_issues LEFT OUTER JOIN queues ON rt_issues.queue_id = queues.id LEFT OUTER JOIN rt_status ON rt_issues.status = rt_status.id LEFT OUTER JOIN q_users_roles ON queues.id = q_users_roles.queue_id LEFT OUTER JOIN roles ON q_users_roles.role_id = roles.id LEFT OUTER JOIN users_groups ON q_users_roles.user_group_id = users_groups.id LEFT OUTER JOIN users ON users_groups."user" = users.id WHERE rt_issues.id = 10 AND rt_issues.status = 2 AND users_groups."user" = 2;
mrole_desc
------------
(0 rows)
One of the key problems with how you create your trigger is that you immediately do
drop trigger if exists issue_user_role on rt_issues;
So there won't be a trigger to execute after this.
A side problem is that, given the constraint you are trying to enforce, you probably want the trigger to fire on an insert too.
I had trouble figuring out exactly what your code is meant to be doing. So instead of directly answering your question, here is an example trigger for a basic schema and examples of how and when it fires. There is a table test_table which stores an operand (value), a unary operation (op_code) and the result. The trigger tries to ensure that the stored result is always correct for the given value and op_code.
Schema
DROP TABLE IF EXISTS test_table;
DROP TABLE IF EXISTS test_operations;
CREATE TABLE test_operations (
op_code TEXT PRIMARY KEY
);
INSERT INTO test_operations (op_code) VALUES
('double'),
('triple'),
('negative')
;
CREATE TABLE test_table (
id bigserial PRIMARY KEY,
op_code TEXT REFERENCES test_operations(op_code),
value INTEGER NOT NULL,
result INTEGER NOT NULL
)
;
Trigger Function
CREATE OR REPLACE FUNCTION test_table_update_trigger()
RETURNS TRIGGER AS $$
DECLARE
    expected_result INTEGER;
BEGIN
    expected_result := (
        SELECT CASE NEW.op_code
            WHEN 'double' THEN NEW.value * 2
            WHEN 'triple' THEN NEW.value * 3
            WHEN 'negative' THEN -NEW.value
        END
    );

    IF NEW.result != expected_result
    THEN
        IF NEW.value BETWEEN -10 AND 10
        THEN
            -- silently ignore the update or insert
            RETURN NULL;
        ELSIF NEW.value >= 100
        THEN
            -- modify the update
            NEW.result = expected_result;
        ELSE
            -- abort the transaction
            RAISE EXCEPTION
                'bad result (%) -- expected % for % %',
                NEW.result, expected_result, NEW.op_code, NEW.value;
        END IF;
    END IF;

    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
Actual Trigger
-- remove old trigger if it exists
DROP TRIGGER IF EXISTS my_trigger ON test_table;
-- best practice to create triggers after the function they use
CREATE TRIGGER my_trigger
BEFORE UPDATE OR INSERT
ON test_table
FOR EACH ROW
EXECUTE PROCEDURE test_table_update_trigger();
Data
INSERT INTO test_table (op_code, value, result) VALUES
('double', 2, 4),
('double', 3, 6),
('double', 14, 28),
('triple', 2, 2), -- this insert is ignored
-- ('triple', 14, 14), -- this would be an error
('triple', 120, 0), -- this insert is corrected to have result of 360
('negative', 8, -8)
;
-- this update targets the first two 'double' rows, but only one row
-- is updated as the trigger returns NULL in one instance
UPDATE test_table
SET
op_code = 'triple',
result = 6
WHERE
op_code = 'double'
AND value < 10 -- remove this clause to see an exception
;
If you need more information, the PostgreSQL docs are usually quite detailed.

Create a temporary table in Python to join with a SQL table

I have the following data in a Vertica DB, Mytable:
+----+-------+
| ID | Value |
+----+-------+
| A | 5 |
| B | 9 |
| C | 10 |
| D | 7 |
+----+-------+
I am trying to create a query in Python to access a Vertica database. In Python I have a list:
ID_list= ['A', 'C']
I would like to create a query that basically inner joins Mytable with the ID_list and then applies a WHERE filter. So it would basically be something like this:
SELECT *
FROM Mytable
INNER JOIN ID_list as temp_table
ON Mytable.ID = temp_table.ID
WHERE Value = 5
I don't have write permissions on the database, so the table needs to be created locally. Or is there an alternative way of doing this?
If you have a small table, then you can do as Tim suggested and create an in-list.
I kind of prefer to do this using Python ways, though. I would probably also make ID_list a set to keep it free of duplicates, etc.
in_list = '(%s)' % ','.join(str(id) for id in ID_list)
or better, use bind variables (this depends on the client you are using, and is probably not strictly necessary if you are dealing with a set of ints, since I can't imagine a way to inject SQL with that):
in_list = '(%s)' % ','.join(['%d'] * len(ID_list))
and send in your ID_list as a parameter list for your cursor.execute. This method is positional, so you'll need to arrange your bind parameters correctly.
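A sketch of that, assuming a DB-API cursor and an integer ID list (the %s placeholder style depends on the Vertica client in use):

id_list = [1, 3]
placeholders = ','.join(['%s'] * len(id_list))
query = ("SELECT * FROM Mytable WHERE ID IN (" + placeholders + ")"
         " AND Value = 5")
cursor.execute(query, id_list)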
If you have a very, very large list... you could create a local temp and load it before doing your query with join.
CREATE LOCAL TEMP TABLE mytable ( id INTEGER );
COPY mytable FROM STDIN;
-- Or however you need to load the data. Using python, you'll probably need to stream in a list using `cursor.copy`
Then join to mytable.
I wouldn't bother doing the latter with a very small number of rows, too much overhead.
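A sketch of the temp-table route, assuming the vertica_python client (its cursor.copy streams data into a COPY statement; other drivers expose different bulk-load APIs):

cursor.execute(
    "CREATE LOCAL TEMP TABLE ids (id INTEGER) ON COMMIT PRESERVE ROWS"
)
cursor.copy("COPY ids FROM STDIN DELIMITER ','", "1\n3\n")
cursor.execute(
    "SELECT m.* FROM Mytable m JOIN ids ON m.ID = ids.id WHERE m.Value = 5"
)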
So I used the approach from Tim:
# create a string of all the IDs so they can be inserted into a SQL query
# (note: string IDs would additionally need quoting inside the IN list)
Sql_string = '('
for ss in ID_list:
    Sql_string = Sql_string + " " + str(ss) + ","
Sql_string = Sql_string[:-1] + ")"

query = ("SELECT * FROM "
         "(SELECT * FROM Mytable WHERE ID IN " + Sql_string + ") as temp "
         "WHERE Value = 5")
It works surprisingly fast.

Inefficient SQL query while excluding results on QuerySet

I'm trying to figure out why the Django ORM has such strange (as I think) behaviour. I have 2 basic models (simplified to get the main idea across):
class A(models.Model):
    pass

class B(models.Model):
    name = models.CharField(max_length=15)
    a = models.ForeignKey(A)
Now I want to select rows from table a that are referenced from table b and that don't have a certain value in the name column.
Here is the sample SQL I expect the Django ORM to produce:
SELECT * FROM inefficient_foreign_key_exclude_a a
INNER JOIN inefficient_foreign_key_exclude_b b ON a.id = b.a_id
WHERE NOT (b.name = '123');
In the case of the filter() method of django.db.models.query.QuerySet, it works as expected:
>>> from inefficient_foreign_key_exclude.models import A
>>> print A.objects.filter(b__name='123').query
SELECT `inefficient_foreign_key_exclude_a`.`id`
FROM `inefficient_foreign_key_exclude_a`
INNER JOIN `inefficient_foreign_key_exclude_b` ON (`inefficient_foreign_key_exclude_a`.`id` = `inefficient_foreign_key_exclude_b`.`a_id`)
WHERE `inefficient_foreign_key_exclude_b`.`name` = 123
But if I use the exclude() method (a negated Q object in the underlying logic) it creates a really strange SQL query:
>>> print A.objects.exclude(b__name='123').query
SELECT `inefficient_foreign_key_exclude_a`.`id`
FROM `inefficient_foreign_key_exclude_a`
WHERE NOT ((`inefficient_foreign_key_exclude_a`.`id` IN (
SELECT U1.`a_id` FROM `inefficient_foreign_key_exclude_b` U1 WHERE (U1.`name` = 123 AND U1.`a_id` IS NOT NULL)
) AND `inefficient_foreign_key_exclude_a`.`id` IS NOT NULL))
Why does ORM make a subquery instead of just JOIN?
UPDATE:
I've made a test to prove that using a subquery is not efficient at all.
I created 500401 rows in both the a and b tables. And here is what I got:
For join:
mysql> SELECT count(*)
-> FROM inefficient_foreign_key_exclude_a a
-> INNER JOIN inefficient_foreign_key_exclude_b b ON a.id = b.a_id
-> WHERE NOT (b.name = 'abc');
+----------+
| count(*) |
+----------+
| 500401 |
+----------+
1 row in set (0.97 sec)
And for subquery:
mysql> SELECT count(*)
-> FROM inefficient_foreign_key_exclude_a a
-> WHERE NOT ((a.id IN (
-> SELECT U1.`a_id` FROM `inefficient_foreign_key_exclude_b` U1 WHERE (U1.`name` = 'abc' AND U1.`a_id` IS NOT NULL)
-> ) AND a.id IS NOT NULL));
+----------+
| count(*) |
+----------+
| 500401 |
+----------+
1 row in set (3.76 sec)
Join is almost 4 times faster.
It looks like it's a kind of optimization.
While filter() can take 'any' condition, it makes the join and then applies the restriction.
exclude() is more restrictive, so you are not forced to join the tables, and the ORM can build the query using subqueries, which I suppose it expects to make the query faster (due to index usage).
If you are using MySQL you could use the EXPLAIN command on the queries and see if my suggestion is right.
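If the subquery form does prove too slow in practice, one hedged fallback is a raw query that keeps the plain JOIN (table names as Django generates them for the models in the question):

rows = A.objects.raw(
    "SELECT a.id FROM inefficient_foreign_key_exclude_a a "
    "INNER JOIN inefficient_foreign_key_exclude_b b ON a.id = b.a_id "
    "WHERE NOT (b.name = %s)",
    ['123'],
)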
