Django: IntegrityError during Many To Many add() - python

We are running into a known issue in Django:
IntegrityError during Many To Many add()
There is a race condition if several processes/requests try to add the same row to a ManyToManyRelation.
How can we work around this?
Environment:
Django 1.9
Linux Server
Postgres 9.3 (An update could be made, if necessary)
Details
How to reproduce it:
my_user.groups.add(foo_group)
The above fails if two requests try to execute this code at once. Here is the database table and the failing constraint:
myapp_egs_d=> \d auth_user_groups
  Column  |  Type   |       Modifiers
----------+---------+-----------------------
 id       | integer | not null default ...
 user_id  | integer | not null
 group_id | integer | not null
Indexes:
    "auth_user_groups_pkey" PRIMARY KEY, btree (id)
    fails ==> "auth_user_groups_user_id_group_id_key" UNIQUE CONSTRAINT, btree (user_id, group_id)
Environment
Since this only happens on production machines, and all production machines in my context run Postgres, a Postgres-only solution would work.

Can the error be reproduced?
Yes, let us use the famed Publication and Article models from the Django docs. Then, let's create a few threads.
import threading
import random

def populate():
    for i in range(100):
        Article.objects.create(headline='headline{0}'.format(i))
        Publication.objects.create(title='title{0}'.format(i))
    print 'created objects'

class MyThread(threading.Thread):
    def run(self):
        for q in range(1, 100):
            for i in range(1, 5):
                pub = Publication.objects.all()[random.randint(1, 2)]
                for j in range(1, 5):
                    article = Article.objects.all()[random.randint(1, 15)]
                    pub.article_set.add(article)
            print self.name

Article.objects.all().delete()
Publication.objects.all().delete()
populate()

thrd1 = MyThread()
thrd2 = MyThread()
thrd3 = MyThread()
thrd1.start()
thrd2.start()
thrd3.start()
You are sure to see unique key constraint violations of the type reported in the bug report. If you don't see them, try increasing the number of threads or iterations.
Is there a work around?
Yes. Use a through model and get_or_create. Here is models.py, adapted from the example in the Django docs.
class Publication(models.Model):
    title = models.CharField(max_length=30)

    def __str__(self):              # __unicode__ on Python 2
        return self.title

    class Meta:
        ordering = ('title',)

class Article(models.Model):
    headline = models.CharField(max_length=100)
    publications = models.ManyToManyField(Publication, through='ArticlePublication')

    def __str__(self):              # __unicode__ on Python 2
        return self.headline

    class Meta:
        ordering = ('headline',)

class ArticlePublication(models.Model):
    article = models.ForeignKey('Article', on_delete=models.CASCADE)
    publication = models.ForeignKey('Publication', on_delete=models.CASCADE)

    class Meta:
        unique_together = ('article', 'publication')
Here is the new threading class which is a modification of the one above.
class MyThread2(threading.Thread):
    def run(self):
        for q in range(1, 100):
            for i in range(1, 5):
                pub = Publication.objects.all()[random.randint(1, 2)]
                for j in range(1, 5):
                    article = Article.objects.all()[random.randint(1, 15)]
                    ap, c = ArticlePublication.objects.get_or_create(article=article, publication=pub)
            print 'Get or create', self.name
You will find that the exception no longer shows up. Feel free to increase the number of iterations. I only went up to 1,000 iterations with get_or_create and it didn't throw the exception, whereas add() usually threw one within 20 iterations.
Why does this work?
Because get_or_create is atomic.
This method is atomic assuming correct usage, correct database
configuration, and correct behavior of the underlying database.
However, if uniqueness is not enforced at the database level for the
kwargs used in a get_or_create call (see unique or unique_together),
this method is prone to a race-condition which can result in multiple
rows with the same parameters being inserted simultaneously.
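For intuition, here is roughly the pattern an atomic get-or-create boils down to when uniqueness is enforced at the database level (a sketch for illustration, not Django's actual source; the helper name add_publication is made up):

from django.db import IntegrityError, transaction

def add_publication(article, pub):
    # Optimistically insert; if a concurrent request wins the race, the
    # unique constraint fires and we treat that as "the row already exists".
    try:
        with transaction.atomic():
            ArticlePublication.objects.create(article=article, publication=pub)
    except IntegrityError:
        pass  # another process/request inserted the same pair first

Either way, the database constraint is what makes the operation safe; the application code just has to expect and absorb the IntegrityError instead of letting it propagate.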
Update:
Thanks @louis for pointing out that the through model can in fact be eliminated. Thus the get_or_create in MyThread2 can be changed to:
ap, c = article.publications.through.objects.get_or_create(
    article=article, publication=pub)
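Applied to the original auth_user_groups problem, the same idea would look roughly like this (a sketch; Django's auto-created through model for User.groups has user and group fields, and 'foo' is a placeholder group name):

from django.contrib.auth.models import Group

foo_group = Group.objects.get(name='foo')  # placeholder group name
# race-safe equivalent of my_user.groups.add(foo_group)
my_user.groups.through.objects.get_or_create(user=my_user, group=foo_group)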

If you are willing to solve it at the PostgreSQL level, you can do the following in psql:
-- Create a RULE to intercept all INSERT attempts to the table and check whether the row already exists:
CREATE RULE auth_user_group_ins AS
    ON INSERT TO auth_user_groups
    WHERE (EXISTS (SELECT 1
                   FROM auth_user_groups
                   WHERE user_id = NEW.user_id AND group_id = NEW.group_id))
    DO INSTEAD NOTHING;
Then it will silently ignore duplicates and only perform genuinely new inserts into the table:
db=# TRUNCATE auth_user_groups;
TRUNCATE TABLE
db=# INSERT INTO auth_user_groups (user_id, group_id) VALUES (1,1);
INSERT 0 1 -- added
db=# INSERT INTO auth_user_groups (user_id, group_id) VALUES (1,1);
INSERT 0 0 -- no insert no error
db=# INSERT INTO auth_user_groups (user_id, group_id) VALUES (1,2);
INSERT 0 1 -- added
db=# SELECT * FROM auth_user_groups; -- check
id | user_id | group_id
----+---------+----------
14 | 1 | 1
16 | 1 | 2
(2 rows)
db=#
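If upgrading is an option, PostgreSQL 9.5+ also offers INSERT ... ON CONFLICT DO NOTHING, which turns a duplicate insert into a no-op without needing a rule. A sketch of what that could look like from the Django side, using raw SQL over the default connection (user_id/group_id are placeholders):

from django.db import connection

def add_user_to_group(user_id, group_id):
    # Requires PostgreSQL 9.5 or later; a duplicate (user_id, group_id) is silently skipped.
    with connection.cursor() as cur:
        cur.execute(
            "INSERT INTO auth_user_groups (user_id, group_id) "
            "VALUES (%s, %s) ON CONFLICT DO NOTHING",
            [user_id, group_id],
        )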

From what I'm seeing in the code provided, I believe you have a uniqueness constraint on the pair (user_id, group_id) in groups. That's why running the same query twice fails: both requests try to add a row with the same user_id and group_id; the first one to execute passes, but the second raises an exception.

Related

How to execute two update statements in one transaction so they won't run into unique constraint in Django ORM?

Given models
from django.db import models

class RelatedTo(models.Model):
    pass

class Thing(models.Model):
    n = models.IntegerField()
    related_to = models.ForeignKey(RelatedTo, on_delete=models.CASCADE)

    class Meta:
        constraints = [
            models.UniqueConstraint(
                fields=['n', 'related_to'],
                name='unique_n_per_related_to'
            )
        ]
and
>>> r = RelatedTo.objects.create()
>>> thing_zero = Thing.objects.create(related_to=r, n=0)
>>> thing_one = Thing.objects.create(related_to=r, n=1)
I want to switch their numbers (n).
In the update method of my serializer (DRF) I was trying to:
@transaction.atomic
def update(self, instance, validated_data):
    old_n = instance.n
    new_n = validated_data['n']
    Thing.objects.filter(
        related_to=instance.related_to,
        n=new_n
    ).update(n=old_n)
    return super().update(instance, validated_data)
but it still runs into the constraint.
select_for_update doesn't help either.
Is it possible to avoid hitting this DB constraint using the Django ORM, or do I have to run raw SQL to achieve that?
Django==3.1.2
postgres:12.5
Error
duplicate key value violates unique constraint "unique_n_per_related_to"
DETAIL: Key (n, related_to)=(1, 1) already exists.
I wasn't able to resolve this issue with either bulk_update or raw SQL.
stmt = f"""
    update {to_update._meta.db_table} as t
    set n = i.n
    from (values
        ('{to_update.id}'::uuid, {n}),
        ('{method.id}'::uuid, {n})
    ) as i(id, n)
    where t.id = i.id
"""
with connection.cursor() as cur:
    cur.execute(stmt)
The only solution I found for this problem is making the column nullable and writing to the table three times, which physically hurts.
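One avenue that may avoid the intermediate violation, given Django 3.1 and PostgreSQL 12.5 as stated above, is declaring the unique constraint as deferrable so it is only checked at commit. This is a sketch, not a tested fix for the question: it requires a migration to recreate the constraint and only works on databases that support deferrable constraints.

from django.db import models, transaction

class Thing(models.Model):
    n = models.IntegerField()
    related_to = models.ForeignKey('RelatedTo', on_delete=models.CASCADE)

    class Meta:
        constraints = [
            models.UniqueConstraint(
                fields=['n', 'related_to'],
                name='unique_n_per_related_to',
                # checked at COMMIT rather than after each statement
                deferrable=models.Deferrable.DEFERRED,
            )
        ]

# The swap then stays inside a single transaction; both rows are updated
# before the deferred constraint is evaluated at commit time.
with transaction.atomic():
    Thing.objects.filter(pk=thing_zero.pk).update(n=1)
    Thing.objects.filter(pk=thing_one.pk).update(n=0)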

Peewee created a new entry successfully but lost the foreign-key value in the select() query

Python version 3.6.8,
peewee version 3.10.0
I have 3 tables set up in a sqlite database using peewee.
Plan:
- id : int (primary key)
- plan_name : varchar (unique)
- status : int (foreign key for PlanStatus)
- category : int (foreign key for PlanCategory)
PlanStatus:
- id : int (primary key)
- value : varchar (unique)
PlanCategory:
- id : int (primary key)
- value : varchar (unique)
PlanStatus is an enum reference table, and PlanCategory is another enum reference table. In the code below, PlanStatus is implemented naively with much boilerplate that would have to be duplicated for each other enum table.
In contrast, PlanCategory inherits from parent class EnumBaseModel, including 2 classmethods. The goal is to reduce boilerplate with inheritance.
The result is that both enum tables were populated successfully, and you can access values from them with queries. However, in creating a Plan entry, a row is added in the database (inspected in sqlite), but a select query returns the row with a missing value for the PlanCategory foreign key.
Creating tables and adding rows:
from peewee import *

DATABASE = SqliteDatabase('test.db')

# Base class with the inner Meta class defined
class BaseModel(Model):
    class Meta:
        database = DATABASE

# PlanStatus class, used with the following 2 methods
class PlanStatus(BaseModel):
    value = CharField(unique=True)

# Helper function for PlanStatus
def init_plan_status_values(values):
    for value in values:
        if not PlanStatus.select().where(PlanStatus.value == value).exists():
            PlanStatus.create(value=value)

# Helper function for PlanStatus
def get_plan_status(value):
    try:
        return PlanStatus.get(PlanStatus.value == value)
    except DoesNotExist as err:
        return None

# Base class with 2 classmethods
class EnumBaseModel(BaseModel):
    value = CharField(unique=True)

    @classmethod
    def init_values(cls, values):
        for value in values:
            if not cls.select().where(cls.value == value).exists():
                cls.create(value=value)

    @classmethod
    def get(cls, value):
        try:
            return cls.select().where(cls.value == value).get()
        except DoesNotExist as err:
            return None

# PlanCategory inherits EnumBaseModel class and its 2 classmethods
class PlanCategory(EnumBaseModel):
    pass

# Plan has 2 foreign keys
class Plan(BaseModel):
    plan_name = CharField(unique=True)
    status = ForeignKeyField(model=PlanStatus, backref='plans')
    category = ForeignKeyField(model=PlanCategory, backref='plans')

DATABASE.connect()
DATABASE.create_tables(
    [
        PlanStatus,
        PlanCategory,
        Plan
    ],
    safe=True
)

# Populating the enum values PlanStatus the explicit way above
init_plan_status_values(('STATUS-1', 'STATUS-2', 'STATUS-3'))
# Find status_2 the explicit way above
status_2 = get_plan_status('STATUS-2')

# Populating the enum values in PlanCategory using the inherited classmethod above
PlanCategory.init_values(('CATEGORY-1', 'CATEGORY-2', 'CATEGORY-3'))
# Find category_3 using the inherited classmethod above
category_3 = PlanCategory.get('CATEGORY-3')

# Add one plan
try:
    Plan.create(
        plan_name='not bad plan',
        status=status_2,
        category=category_3,
    )
except IntegrityError as err:
    print(err)
Now we see in sqlite3 that the rows were added successfully:
SQLite version 3.22.0 2018-01-22 18:45:57
Enter ".help" for usage hints.
sqlite> .tables
plan plancategory planstatus
sqlite> select * from planstatus;
1|STATUS-1
2|STATUS-2
3|STATUS-3
sqlite> select * from plancategory;
1|CATEGORY-1
2|CATEGORY-2
3|CATEGORY-3
sqlite> select * from plan;
1|not bad plan|2|3
sqlite>
Now checking the plan entry from the select() query, 'a_plan.status' is valid, but 'a_plan.category' is None.
# We see the references status_2 and category_3 are valid
print('status_2 = ', type(status_2), status_2, status_2.value)
print('category_3 = ', type(category_3), category_3, category_3.value)
print()
# We check the one plan in the table and see now the foreign-key value "category" is missing
a_plan = Plan.get()
print('a_plan: plan_name={}, status={}, category={}'.format(
    a_plan.plan_name,
    a_plan.status,
    a_plan.category
))
print()
Printed results:
status_2 = <Model: PlanStatus> 2 STATUS-2
category_3 = <Model: PlanCategory> 3 CATEGORY-3
a_plan: plan_name=not bad plan, status=2, category=None
Additionally, I found attributes 'status_id' and 'category_id' created by peewee. At least 'category_id' still retains the foreign key int value.
# After inspecting dir(a_plan), found these attributes:
print('status = ', type(a_plan.status), a_plan.status)
print('status_id = ', type(a_plan.status_id), a_plan.status_id)
print('category = ', type(a_plan.category), a_plan.category)
print('category_id = ', type(a_plan.category_id), a_plan.category_id)
Printed results:
status = <Model: PlanStatus> 2
status_id = <class 'int'> 2
category = <class 'NoneType'> None
category_id = <class 'int'> 3
Is there any way to fix the problem so it can resolve 'a_plan.category'?
You're overriding methods (.get) that are used by Peewee. Don't do that! I think you are making things too magical (and introducing queries all over the place in the process).
Try simplifying. I can almost guarantee the issue is in the overrides you're doing of classmethods that Peewee depends on.
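As a sketch of that suggestion: keep peewee's own Model.get() untouched and give the helper a non-conflicting name (get_by_value and the use of get_or_create below are my naming choices, not part of the original code):

class EnumBaseModel(BaseModel):
    value = CharField(unique=True)

    @classmethod
    def init_values(cls, values):
        for value in values:
            # get_or_create also avoids the select-then-create pattern
            cls.get_or_create(value=value)

    @classmethod
    def get_by_value(cls, value):
        try:
            # cls.get() is still peewee's unmodified Model.get() here
            return cls.get(cls.value == value)
        except cls.DoesNotExist:
            return None

If the overridden get() was indeed what broke the lazy foreign-key lookup, a_plan.category should resolve normally once the override is gone.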

Cassandra not setting default value for new column added later in Python model

I have code like below.
from uuid import uuid4
from uuid import uuid1
from cassandra.cqlengine import columns, connection
from cassandra.cqlengine.models import Model
from cassandra.cqlengine.management import sync_table

class BaseModel(Model):
    __abstract__ = True
    id = columns.UUID(primary_key=True, default=uuid4)
    created_timestamp = columns.TimeUUID(primary_key=True,
                                         clustering_order='DESC',
                                         default=uuid1)
    deleted = columns.Boolean(required=True, default=False)

class OtherModel(BaseModel):
    __table_name__ = 'other_table'

if __name__ == '__main__':
    connection.setup(hosts=['localhost'],
                     default_keyspace='test')
    sync_table(OtherModel)
    OtherModel.create()
After the first execution, I can see the record in the db when running this query:
cqlsh> select * from test.other_table;
id | created_timestamp | deleted
--------------------------------------+--------------------------------------+---------
febc7789-5806-44d8-bbd5-45321676def9 | 840e1b66-cc73-11e6-a66c-38c986054a88 | False
(1 rows)
After this, I added a new column, name, to OtherModel and ran the same program.
class OtherModel(BaseModel):
    __table_name__ = 'other_table'
    name = columns.Text(required=True, default='')

if __name__ == '__main__':
    connection.setup(hosts=['localhost'],
                     default_keyspace='test')
    sync_table(OtherModel)
    OtherModel.create(name='test')
Checking the db entries:
cqlsh> select * from test.other_table;
id | created_timestamp | deleted | name
--------------------------------------+--------------------------------------+---------+------
936cfd6c-44a4-43d3-a3c0-fdd493144f4b | 4d7fd78c-cc74-11e6-bb49-38c986054a88 | False | test
febc7789-5806-44d8-bbd5-45321676def9 | 840e1b66-cc73-11e6-a66c-38c986054a88 | False | null
(2 rows)
There is one row with name as null.
But I can't query on the null value.
cqlsh> select * from test.other_table where name=null;
InvalidRequest: code=2200 [Invalid query] message="Unsupported null value for indexed column name"
I found the reference How Can I Search for Records That Have A Null/Empty Field Using CQL?.
When I set default='' in the Model, why is it not set for all the null values in the table?
Is there any way to set the null values in name to the default value '' with a query?
The null cell is actually just the column never having been set. And the absence of data isn't something you can query on, since it's a filtering operation: it's not scalable or possible to do efficiently, so it's not something C* will encourage (or in this case even allow).
Going back and retroactively setting values on all the previously created rows would be very expensive (it has to read everything, then do just as many writes). It's pretty easy on the application side to just treat a null name as ''.
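As a minimal application-side sketch of that last point (assuming the OtherModel defined above), you simply substitute the default when reading:

# Treat an unset/null name as the model's default when reading rows.
for row in OtherModel.objects.all():
    name = row.name if row.name is not None else ''
    # ... use `name` instead of row.name ...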

Mapping a class against multiple tables in SQLAlchemy

# ! /usr/bin/env python
# -*- coding: utf-8 -*-
# login_frontend.py
""" Python 2.7.3
Cherrypy 3.2.2
PostgreSQL 9.1
psycopg2 2.4.5
SQLAlchemy 0.7.10
"""
I'm having a problem joining four tables in one Python/SQLAlchemy class. I'm trying this so I can iterate over an instance of this class instead of the named tuple I get from joining the tables with the ORM.
Why all of this? Because I already started that way and came too far to just drop it. Also, it has to be possible, so I want to know how it's done.
For this project (a CherryPy web frontend) I got an already completed module with the table classes. I moved it to the bottom of this post, because maybe it isn't even necessary for you.
The following is just one example of an attempt at a class joining multiple tables. I picked a simple case with more than two tables and a junction table. Here I don't write into these joined tables, but that is necessary somewhere else, which is why classes would be a nice solution to this problem.
My attempt at a join class,
which is a combination of the given table classes module and the examples from these two websites:
-Mapping a Class against Multiple Tables
-SQLAlchemy: one classes – two tables
class JoinUserGroupPerson(Base):
    persons = md.tables['persons']
    users = md.tables['users']
    user_groups = md.tables['user_groups']
    groups = md.tables['groups']

    user_group_person = (
        join(persons, users, persons.c.id == users.c.id).
        join(user_groups, users.c.id == user_groups.c.user_id).
        join(groups, groups.c.id == user_groups.c.group_id))

    __table__ = user_group_person

    """ I expanded the redefinition of 'id' to three tables,
    and removed this following one, since it made no difference:
    users_id = column_property(users.c.id, user_groups.c.user_id)
    """
    id = column_property(persons.c.id, users.c.id, user_groups.c.user_id)
    groups_id = column_property(groups.c.id, user_groups.c.group_id)
    groups_name = groups.c.name

    def __init__(self, group_name, login, name, email=None, phone=None):
        self.groups_name = group_name
        self.login = login
        self.name = name
        self.email = email
        self.phone = phone

    def __repr__(self):
        return (
            "<JoinUserGroupPerson('%s', '%s', '%s', '%s', '%s')>" % (
                self.groups_name, self.login, self.name, self.email, self.phone))
Different table accesses with this join class
This is how I tried to query this class in another module:
pg = sqlalchemy.create_engine(
    'postgresql://{}:{}@{}:{}/{}'.
    format(user, password, server, port, data))
Session = sessionmaker(bind=pg)
s1 = Session()

query = (s1.query(JoinUserGroupPerson).
         filter(JoinUserGroupPerson.login == user).
         order_by(JoinUserGroupPerson.id))

record = {}
for rowX in query:
    for colX in rowX.__table__.columns:
        record[colX.name] = getattr(rowX, colX.name)
""" RESULT:
"""
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/cherrypy/_cprequest.py", line 656, in respond
response.body = self.handler()
File "/usr/local/lib/python2.7/dist-packages/cherrypy/lib/encoding.py", line 228, in __call__
ct.params['charset'] = self.find_acceptable_charset()
File "/usr/local/lib/python2.7/dist-packages/cherrypy/lib/encoding.py", line 134, in find_acceptable_charset
if encoder(encoding):
File "/usr/local/lib/python2.7/dist-packages/cherrypy/lib/encoding.py", line 86, in encode_string
for chunk in self.body:
File "XXX.py", line YYY, in ZZZ
record[colX.name] = getattr(rowX,colX.name)
AttributeError: 'JoinUserGroupPerson' object has no attribute 'user_id'
Then I checked the table attributes:
for rowX in query:
    return (u'{}'.format(rowX.__table__.columns))
""" RESULT:
"""
['persons.id',
'persons.name',
'persons.email',
'persons.phone',
'users.id',
'users.login',
'user_groups.user_id',
'user_groups.group_id',
'groups.id',
'groups.name']
Then I checked whether the query or my class wasn't working at all, by using a counter.
I got up to (count == 5), i.e. the first two joined tables. But when I set the condition to (count == 6), I got the first error message again: AttributeError: 'JoinUserGroupPerson' object has no attribute 'user_id':
list = []
for rowX in query:
    for count, colX in enumerate(rowX.__table__.columns):
        list.append(getattr(rowX, colX.name))
        if count == 5:
            break
    return (u'{}'.format(list))
""" RESULT:
"""
[4, u'user real name', None, None, 4, u'user']
""" which are these following six columns:
persons[id, name, email, phone], users[id, login]
"""
Then I checked each column:
list = []
for rowX in query:
    for colX in rowX.__table__.columns:
        list.append(colX)
    return (u'{}'.format(list))
""" RESULT:
"""
[Column(u'id', INTEGER(), table=<persons>, primary_key=True, nullable=False, server_default=DefaultClause(..., for_update=False)),
Column(u'name', VARCHAR(length=252), table=<persons>, nullable=False),
Column(u'email', VARCHAR(), table=<persons>),
Column(u'phone', VARCHAR(), table=<persons>),
Column(u'id', INTEGER(), ForeignKey(u'persons.id'), table=<users>, primary_key=True, nullable=False),
Column(u'login', VARCHAR(length=60), table=<users>, nullable=False),
Column(u'user_id', INTEGER(), ForeignKey(u'users.id'), table=<user_groups>, primary_key=True, nullable=False),
Column(u'group_id', INTEGER(), ForeignKey(u'groups.id'), table=<user_groups>, primary_key=True, nullable=False),
Column(u'id', INTEGER(), table=<groups>, primary_key=True, nullable=False),
Column(u'name', VARCHAR(length=60), table=<groups>, nullable=False)]
Then I tried another two direct accesses, which got me both KeyErrors for 'id' and 'persons.id':
for rowX in query:
    return (u'{}'.format(rowX.__table__.columns['id'].name))

for rowX in query:
    return (u'{}'.format(rowX.__table__.columns['persons.id'].name))
Conclusion
I tried a few other things, which were even more confusing. Since they didn't reveal any more information, I didn't add them. I don't see where my class is wrong.
I guess somehow I must have set up the class in a way that only joins the first two tables correctly. But the join works at least partially, because when the 'user_groups' table was empty, I got an empty query result as well.
Or maybe I did something wrong with the mapping of the 'user_groups' table. Since some columns appear twice in the join, they need an additional definition. And the 'user_id' is already part of the persons and users tables, so I had to map it twice.
I even tried to remove the 'user_groups' table from the join, because it's in the relationships (with secondary). That got me a foreign key error message. But maybe I just did it wrong.
Admittedly, I don't even know why ...
rowX.__table__.columns # column names as table name suffix
... has different attribute names than ...
colX in rowX.__table__.columns # column names without table names
Extra Edits
Another thought! Would all of this be possible with inheritance? Each class would keep its own mapping, but then a user_groups class may be necessary, and the joins would have to be between the single classes instead. The __init__() and __repr__() would still have to be redefined.
It probably has something to do with the 'user_groups' table, because I couldn't even join it with the 'groups' or 'users' table alone. And it always says that the class object has no attribute 'user_id'. Maybe it's something about the many-to-many relationship.
Attachment
Here is the already given SQLAlchemy module, with header, without specific information about the database, and the classes of the joined tables:
#!/usr/bin/python
# vim: set fileencoding=utf-8 :

import sqlalchemy
from sqlalchemy import join
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship, backref, column_property

pg = sqlalchemy.create_engine(
    'postgresql://{}@{}:{}/{}'.format(user, host, port, data))
md = sqlalchemy.MetaData(pg, True)
Base = declarative_base()

""" ... following, three of the four joined tables.
UserGroups isn't necessary, so it wasn't part of the module.
And the other six classes shouldn't be important for this ...
"""

class Person(Base):
    __table__ = md.tables['persons']

    def __init__(self, name, email=None, phone=None):
        self.name = name
        self.email = email
        self.phone = phone

    def __repr__(self):
        return (
            "<Person(%s, '%s', '%s', '%s')>" % (
                self.id, self.name, self.email, self.phone))

class Group(Base):
    __table__ = md.tables['groups']

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return ("<Group(%s, '%s')>" % (self.id, self.name))

class User(Base):
    __table__ = md.tables['users']
    person = relationship('Person')
    groups = relationship(
        'Group', secondary=md.tables['user_groups'], order_by='Group.id',
        backref=backref('users', order_by='User.login'))

    def __init__(self, person, login):
        if isinstance(person, Person):
            self.person = person
        else:
            self.id = person
        self.login = login

    def __repr__(self):
        return ("<User(%s, '%s')>" % (self.id, self.login))
Maybe the following script, which created the database and was also already given, will prove useful here. The last part of it is some test data - between the columns there are supposed to be tabs, not spaces. Because of that, this script can also be found as a gist on GitHub:
-- file create_str.sql
-- database creation script
-- central script for creating all database objects
-- set the database name
\set strdbname logincore
\c admin
BEGIN;
\i str_roles.sql
COMMIT;
DROP DATABASE IF EXISTS :strdbname;
CREATE DATABASE :strdbname TEMPLATE template1 OWNER str_db_owner
ENCODING 'UTF8';
\c :strdbname
SET ROLE str_db_owner;
BEGIN;
\i str.sql
COMMIT;
RESET ROLE;
-- file str_roles.sql
-- create roles for the database
-- owner of the database objects
SELECT create_role('str_db_owner', 'NOINHERIT');
-- role for using
SELECT create_role('str_user');
-- make str_db_owner member in all relevant roles
GRANT str_user TO str_db_owner WITH ADMIN OPTION;
-- file str.sql
-- creation of database
-- prototypes
\i str_prototypes.sql
-- domain for non empty text
CREATE DOMAIN ntext AS text CHECK (VALUE<>'');
-- domain for email addresses
CREATE DOMAIN email AS varchar(252) CHECK (is_email_address(VALUE));
-- domain for phone numbers
CREATE DOMAIN phone AS varchar(60) CHECK (is_phone_number(VALUE));
-- persons
CREATE TABLE persons (
id serial PRIMARY KEY,
name varchar(252) NOT NULL,
email email,
phone phone
);
GRANT SELECT, INSERT, UPDATE, DELETE ON persons TO str_user;
GRANT USAGE ON SEQUENCE persons_id_seq TO str_user;
CREATE TABLE groups (
id integer PRIMARY KEY,
name varchar(60) UNIQUE NOT NULL
);
GRANT SELECT ON groups TO str_user;
-- database users
CREATE TABLE users (
id integer PRIMARY KEY REFERENCES persons(id) ON UPDATE CASCADE,
login varchar(60) UNIQUE NOT NULL
);
GRANT SELECT ON users TO str_user;
-- user <-> groups
CREATE TABLE user_groups (
user_id integer NOT NULL REFERENCES users(id)
ON UPDATE CASCADE ON DELETE CASCADE,
group_id integer NOT NULL REFERENCES groups(id)
ON UPDATE CASCADE ON DELETE CASCADE,
PRIMARY KEY (user_id, group_id)
);
-- functions
\i str_functions.sql
-- file str_prototypes.sql
-- prototypes for database
-- simple check for correct email address
CREATE FUNCTION is_email_address(email varchar) RETURNS boolean
AS $CODE$
SELECT FALSE
$CODE$ LANGUAGE sql IMMUTABLE STRICT;
-- simple check for correct phone number
CREATE FUNCTION is_phone_number(nr varchar) RETURNS boolean
AS $CODE$
SELECT FALSE
$CODE$ LANGUAGE sql IMMUTABLE STRICT;
-- file str_functions.sql
-- functions for database
-- simple check for correct email address
CREATE OR REPLACE FUNCTION is_email_address(email varchar) RETURNS boolean
AS $CODE$
SELECT $1 ~ E'^[A-Za-z0-9.!#$%&\'\*\+\-/=\?\^_\`{\|}\~\.]+@[-a-z0-9\.]+$'
$CODE$ LANGUAGE sql IMMUTABLE STRICT;
-- simple check for correct phone number
CREATE OR REPLACE FUNCTION is_phone_number(nr varchar) RETURNS boolean
AS $CODE$
SELECT $1 ~ E'^[-+0-9\(\)/ ]+$'
$CODE$ LANGUAGE sql IMMUTABLE STRICT;
-- file fill_str_test.sql
-- test data for database
-- between the columns are supposed to be tabs, no spaces !!!
BEGIN;
COPY persons (id, name, email) FROM STDIN;
1 Joseph Schneider jschneid@lab.uni.de
2 Test User jschneid@lab.uni.de
3 Hans Dampf \N
\.
SELECT setval('persons_id_seq', (SELECT max(id) FROM persons));
COPY groups (id, name) FROM STDIN;
1 IT
2 SSG
\.
COPY users (id, login) FROM STDIN;
1 jschneid
2 tuser
3 dummy
\.
COPY user_groups (user_id, group_id) FROM STDIN;
1 1
2 1
3 2
\.
COMMIT;
Regarding the KeyError: The strings that are printed in the repr of the __table__.columns object are NOT the keys, and because you have multiple id columns there is some name munging going on. You probably want to do "persons_id" rather than "persons.id" but I recommend printing __table__.columns.keys() to be sure.
Regarding the AttributeError: SQLAlchemy maps column names directly to attributes by default, unless you define attribute mappings yourself, which you are. The fact that you are defining the id attribute as a column_property on persons.c.id, users.c.id, user_groups.c.user_id means that none of those columns is being directly mapped to an attribute on the ORM class anymore, but they will still be in the columns collection. So you just can't use columns as an iterable of attribute names.
I did not reproduce all of your code/data, but I put together a simpler test case with 3 tables (including a m2m relationship) to verify these items.
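As a sketch of the practical consequence for the loop in the question: iterate the mapper's properties (the actual ORM attribute names) instead of __table__.columns, so the combined column_property attributes such as id and groups_id are the ones read (class_mapper and iterate_properties are available in SQLAlchemy 0.7):

from sqlalchemy.orm import class_mapper

record = {}
for rowX in query:
    # prop.key is the mapped attribute name ('id', 'groups_id', 'login', ...)
    for prop in class_mapper(JoinUserGroupPerson).iterate_properties:
        record[prop.key] = getattr(rowX, prop.key)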

SQLAlchemy: One-Way Relationship, Correlated Subquery

thanks in advance for your help.
I have two entities, Human and Chimp. Each has a collection of metrics, which can contain subclasses of a MetricBlock, for instance CompleteBloodCount (with fields WHITE_CELLS, RED_CELLS, PLATELETS).
So my object model looks like (forgive the ASCII art):
---------  metrics    ---------------      ----------------------
| Human | ----------> | MetricBlock | <|-- | CompleteBloodCount |
---------             ---------------      ----------------------
                             ^
---------  metrics          |
| Chimp | -------------------
---------
This is implemented with the following tables:
Chimp (id, …)
Human (id, …)
MetricBlock (id, dtype)
CompleteBloodCount (id, white_cells, red_cells, platelets)
CholesterolCount (id, hdl, ldl)
ChimpToMetricBlock(chimp_id, metric_block_id)
HumanToMetricBlock(human_id, metric_block_id)
So a human knows its metric blocks, but a metric block does not know its human or chimp.
I would like to write a query in SQLAlchemy to find all CompleteBloodCounts for a particular human. In SQL I could write something like:
SELECT cbc.id
FROM complete_blood_count cbc
WHERE EXISTS (
SELECT 1
FROM human h
INNER JOIN human_to_metric_block h_to_m on h.id = h_to_m.human_id
WHERE
h_to_m.metric_block_id = cbc.id
)
I'm struggling though to write this in SQLAlchemy. I believe correlate(), any(), or an aliased join may be helpful, but the fact that a MetricBlock doesn't know its Human or Chimp is a stumbling block for me.
Does anyone have any advice on how to write this query? Alternately, are there other strategies to define the model in a way that works better with SQLAlchemy?
Thank you for your assistance.
Python 2.6
SQLAlchemy 0.7.4
Oracle 11g
Edit:
HumanToMetricBlock is defined as:
humanToMetricBlock = Table(
    "human_to_metric_block",
    metadata,
    Column("human_id", Integer, ForeignKey("human.id")),
    Column("metric_block_id", Integer, ForeignKey("metric_block.id")),
)
per the manual.
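For reference, a minimal sketch of how the EXISTS query from the question might be expressed with SQLAlchemy's exists() construct against these tables; it assumes mapped classes Human and CompleteBloodCount, the humanToMetricBlock table above, and an ordinary Session named session (this is only an illustration, not the approach taken in the answer below):

from sqlalchemy import and_, exists

# Correlated EXISTS: complete_blood_count stays in the outer query, while
# human and the association table form the subquery (auto-correlated).
stmt = exists().where(and_(
    Human.id == humanToMetricBlock.c.human_id,
    humanToMetricBlock.c.metric_block_id == CompleteBloodCount.id,
))
# To restrict to one particular human, also add
# Human.id == some_human_id (a hypothetical bound value) inside the and_().
cbc_ids = session.query(CompleteBloodCount.id).filter(stmt).all()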
Each primate should have a unique ID, regardless of what type of primate it is. I'm not sure why each set of attributes (MB, CBC, CC) is a separate table, but I assume they have more than one dimension (primate), such as time; otherwise I would only have one giant table.
Thus, I would structure this problem in the following manner:
Create a parent object Primate and derive humans and chimps from it. This example is using single table inheritance, though you may want to use joined table inheritance based on their attributes.
class Primate(Base):
    __tablename__ = 'primate'
    id = Column(Integer, primary_key=True)
    genus = Column(String)
    # ...attributes all primates have...
    __mapper_args__ = {'polymorphic_on': genus, 'polymorphic_identity': 'primate'}

class Chimp(Primate):
    __mapper_args__ = {'polymorphic_identity': 'chimp'}
    # ...attributes...

class Human(Primate):
    __mapper_args__ = {'polymorphic_identity': 'human'}
    # ...attributes...

class MetricBlock(Base):
    id = ...
Then you create a single many-to-many table (you can use an association proxy instead):
class PrimateToMetricBlock(Base):
    id = Column(Integer, primary_key=True)  # primary key is needed!
    primate_id = Column(Integer, ForeignKey('primate.id'))
    primate = relationship('Primate')  # If you care for relationships.
    metricblock_id = Column(Integer, ForeignKey('metric_block.id'))
    metricblock = relationship('MetricBlock')
Then I would structure the query like so (note that the on clause is not necessary since SQLAlchemy can infer the relationships automatically since there's no ambiguity):
query = DBSession.query(CompleteBloodCount).\
join(PrimateToMetricBlock, PrimateToMetricBlock.metricblock_id == MetricBlock.id)
If you want to filter by primate type, join the Primate table and filter:
query = query.join(Primate, Primate.id == PrimateToMetricBlock.primate_id).\
filter(Primate.genus == 'human')
Otherwise, if you know the ID of the primate (primate_id), no additional join is necessary:
query = query.filter(PrimateToMetricBlock.primate_id == primate_id)
If you're only retrieving one object, end the query with:
return query.first()
Otherwise:
return query.all()
Forming your model like this should eliminate any confusion and actually make everything simpler. If I'm missing something, let me know.
