I am trying to validate a field of an ORM model when an instance is created. If the value is valid, I would like the constructor to return the model object; if validation fails, it should return None. My current approach is to use the __new__ method and carry out the checks there, but I get errors when doing so with a model class that inherits from Base. Here is my code:
class Thing(Base):
    __tablename__ = "things"

    id = sa.Column(sa.Integer, autoincrement=True, primary_key=True)
    name = sa.Column(sa.String(), unique=True, nullable=False)

    def __new__(cls, name: str) -> Thing:
        """
        Validates input before creating instance

        :param name: Thing name. E.g. orange, banana
        """
        is_valid = cls.check_if_valid_name(name)
        if is_valid:
            instance = super(Thing, cls).__new__(cls)
            instance.name = name
            return instance
        else:
            return None
I find it quite hard to debug SQLAlchemy, but it looks like the error is occurring when I assign a value to the name field. I'm guessing it's because whatever SQLAlchemy does under the hood hasn't run yet, so the field hasn't been registered properly before the assignment. Has anyone managed to implement something similar? If so, what was your approach?
Returning None from the constructor (__new__) of a class is a monumentally bad idea and design choice. You expect two things from invoking a constructor:
Either getting an instance of the class or
getting an exception raised.
If you want validation checks, run them in the initializer (__init__) and raise e.g. a ValueError if they fail. Then the caller can take appropriate action, such as defaulting a variable to None.
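For instance, a minimal sketch of that approach with the same model (assuming check_if_valid_name is defined on the class, as in the question):

class Thing(Base):
    __tablename__ = "things"

    id = sa.Column(sa.Integer, autoincrement=True, primary_key=True)
    name = sa.Column(sa.String(), unique=True, nullable=False)

    def __init__(self, name: str, **kwargs):
        # Validate before handing off to SQLAlchemy's default constructor
        if not self.check_if_valid_name(name):
            raise ValueError(f"invalid name: {name!r}")
        super().__init__(name=name, **kwargs)

try:
    thing = Thing(name="orange")
except ValueError:
    thing = None

SQLAlchemy also provides the validates() decorator (sqlalchemy.orm.validates), which lets a per-column validator raise on bad values and may be a cleaner fit for this kind of check.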
In the end I've added a function to handle the validation and initialization of the model.
def make_thing(name):
    if check_if_valid_name(name):
        thing = Thing(name=name)
        return thing
    else:
        return None
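Callers then check the result for None; for example (a sketch, assuming a session is available in the calling scope):

thing = make_thing("orange")
if thing is not None:
    session.add(thing)
else:
    # handle the invalid name
    ...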
I have plenty of Hardware models which have a HardwareType with various characteristics. Like so:
# models.py
from django.db import models

class HardwareType(models.Model):
    name = models.CharField(max_length=32, unique=True)
    # some characteristics of this particular piece of hardware
    weight = models.DecimalField(max_digits=12, decimal_places=3)
    # and more [...]

class Hardware(models.Model):
    type = models.ForeignKey(HardwareType)
    # some attributes
    is_installed = models.BooleanField()
    location_installed = models.TextField()
    # and more [...]
If I wish to add a new Hardware object, I first have to retrieve the HardwareType every time, which is not very DRY:
tmp_hd_type = HardwareType.objects.get(name='NG35001')
new_hd = Hardware.objects.create(type=tmp_hd_type, is_installed=True, ...)
Therefore, I have tried to override the HardwareManager.create() method to automatically import the type when creating new Hardware like so:
# models.py
from django.db import models

class HardwareType(models.Model):
    name = models.CharField(max_length=32, unique=True)
    # some characteristics of this particular piece of hardware
    weight = models.DecimalField(max_digits=12, decimal_places=3)
    # and more [...]

class HardwareManager(models.Manager):
    def create(self, *args, **kwargs):
        if 'type' in kwargs and kwargs['type'] is str:
            kwargs['type'] = HardwareType.objects.get(name=kwargs['type'])
        super(HardwareManager, self).create(*args, **kwargs)

class Hardware(models.Model):
    objects = HardwareManager()

    type = models.ForeignKey(HardwareType)
    # some attributes
    is_installed = models.BooleanField()
    location_installed = models.TextField()
    # and more [...]

# so then I should be able to do:
new_hd = Hardware.objects.create(type='ND35001', is_installed=True, ...)
But I keep getting errors and really strange behaviors from the ORM (I don't have them right here, but I can post them if needed). I've searched in the Django documentation and the SO threads, but mostly I end up on solutions where:
the Hardware.save() method is overridden (should I get the HardwareType there?) or,
the manager defines a new create_something method which calls self.create().
I also started digging into the code and saw that the Manager is some special kind of QuerySet, but I don't know how to continue from there. I'd really like to override the create method in place, but I can't seem to manage it. What is preventing me from doing what I want to do?
The insight from Alasdair's answer helped a lot to catch both strings and unicode strings, but what was actually missing was a return statement before the call to super(HardwareManager, self).create(*args, **kwargs) in the HardwareManager.create() method.
The errors I was getting in my tests yesterday evening (being tired when coding is not a good idea :P) were ValueError: Cannot assign None: [...] does not allow null values. They happened because the new_hd I had create()d was None, since my create() method had no return. What a stupid mistake!
Final corrected code:
class HardwareManager(models.Manager):
    def create(self, *args, **kwargs):
        if 'type' in kwargs and isinstance(kwargs['type'], basestring):
            kwargs['type'] = HardwareType.objects.get(name=kwargs['type'])
        return super(HardwareManager, self).create(*args, **kwargs)
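With this in place, both call styles should work; a usage sketch (the field values here are made up, and a HardwareType named 'NG35001' is assumed to exist):

# Pass the type by name; the manager resolves it to a HardwareType
hd1 = Hardware.objects.create(type='NG35001', is_installed=True, location_installed='rack 4')

# Or pass a HardwareType instance directly; the isinstance() check leaves it untouched
ng = HardwareType.objects.get(name='NG35001')
hd2 = Hardware.objects.create(type=ng, is_installed=False, location_installed='warehouse')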
Without seeing the traceback, I think the problem is on this line.
if 'type' in kwargs and kwargs['type'] is str:
This is checking whether kwargs['type'] is the same object as str, which will always be false.
In Python 3, to check whether kwargs['type'] is a string, you should do:
if 'type' in kwargs and isinstance(kwargs['type'], str):
If you are using Python 2, you should use basestring, to catch byte strings and unicode strings.
if 'type' in kwargs and isinstance(kwargs['type'], basestring):
I was researching the same problem as you and decided not to use an override.
In my case making just another method made more sense given my constraints.
class HardwareManager(models.Manager):
    def create_hardware(self, type):
        # get_or_create returns an (object, created) tuple, so unpack it
        _type, _ = HardwareType.objects.get_or_create(name=type)
        return self.create(type=_type, ...)
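A usage sketch (the fields elided above would be passed through the same way):

# get_or_create ensures the HardwareType exists before the Hardware row is created
hd = Hardware.objects.create_hardware('NG35001')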
I've actually never encountered this error before:
sqlalchemy.exc.InvalidRequestError: stale association proxy, parent object has gone out of scope
After doing some research, it looks like it's because the parent object is being garbage collected while the association proxy is still working. Fantastic.
However, I'm not sure where it's happening.
Relevant code:
# models.py
class Artist(db.Model):
    # ...
    tags = association_proxy('_tags', 'tag',
                             creator=lambda t: ArtistTag(tag=t))
    # ...

class Tag(db.Model):
    # ...
    artist = association_proxy('_artists', 'artist',
                               creator=lambda a: ArtistTag(artist=a))
    # ...

class ArtistTag(db.Model):
    # ...
    artist_id = db.Column(db.Integer, ForeignKey('artists.id'))
    artist = db.relationship('Artist', backref='_tags')
    tag_id = db.Column(db.Integer, ForeignKey('tags.id'))
    tag = db.relationship('Tag', backref='_artists')

# api/tag.py
from flask.ext.restful import Resource
from ..

class ListArtistTag(Resource):
    def get(self, id):
        # much safer in actual app
        return TagSchema(many=True)\
            .dump(Artist.query.get(id).tags)\
            .data
I know it's an old question, but I haven't found a clear solution to a similar problem anywhere on the web, so I've decided to reply here.
The key here is to assign the object that holds the association proxy to a variable before performing any further operations on it. Association proxies aren't regular object properties that would force the GC to keep a reference to the parent object. In fact, a call of the form:
tags = association_proxy('_tags', 'tag', creator=lambda t: ArtistTag(tag=t))
will result in the creation of a new AssociationProxy object that holds only a weak reference to the parent's collection. Under memory pressure, the GC may collect the result of Artist.query.get(id), leaving just its tags collection (the AssociationProxy object) behind, but SQLAlchemy requires the owning object to still be alive, due to its implementation (the lazy loading mechanism precisely, I believe).
To fix this, we need to make sure that the Artist object returned from the Artist.query.get(id) call is assigned to a variable, so that its reference count is explicitly non-zero. So this:
class ListArtistTag(Resource):
    def get(self, id):
        # much safer in actual app
        return TagSchema(many=True)\
            .dump(Artist.query.get(id).tags)\
            .data
becomes this:
class ListArtistTag(Resource):
    def get(self, id):
        artist = Artist.query.get(id)
        return TagSchema(many=True)\
            .dump(artist.tags)\
            .data
And it will work as expected. Simple, right?
I've created a series of custom ModelFields that are simply restricted ForeignKeys. Below you'll find CompanyField. When instantiated, you may provide a type (e.g., Client, Vendor). With a type provided, the field ensures that only values that have the appropriate type are allowed.
The crm app, the one that defines the custom fields, compiles and runs just fine. Eventually I added references to the fields to a different app (incidents) using "from crm import fields". Now I'm seeing a whole bunch of errors like this:
incidents.incident: 'group' has a relation with model Company, which has either not been installed or is abstract.
Here are all the gory details. Please let me know if there's any more information I could provide which may be helpful.
## crm/fields.py
import models as crmmods

class CompanyField(models.ForeignKey):
    def __init__(self, *args, **kwargs):
        # This is a hack to get South working. In either case, we just need to
        # make sure the FK refers to Company.
        try:
            # kwargs['to'] == crmmods.company doesn't work for some reason I
            # still haven't figured out
            if str(kwargs['to']) != str(crmmods.Company):
                raise AttributeError("Only crm.models.Company is accepted " + \
                                     "for keyword argument 'to'")
        except:
            kwargs['to'] = 'Company'

        # See if a CompanyType was provided and, if so, store it as self.type
        if len(args) > 0:
            company_type = args[0]
            # Type is expected to be a string or CompanyType
            if isinstance(company_type, str):
                company_type = company_type.upper()
                if hasattr(crmmods.CompanyType, company_type):
                    company_type = getattr(crmmods.CompanyType, company_type)
                else:
                    raise AttributeError(
                        "%s is not a valid CompanyType." % company_type)
            elif not isinstance(company_type, crmmods.CompanyType):
                raise AttributeError(
                    "Expected str or CompanyType for first argument.")
            self.type = company_type
        else:
            self.type = None

        super(CompanyField, self).__init__(**kwargs)

    def formfield(self, **kwargs):
        # Restrict the formfield so it only displays Companies with the correct
        # type.
        if self.type:
            kwargs['queryset'] = \
                crmmods.Company.objects.filter(companytype__role=self.type)
        return super(CompanyField, self).formfield(**kwargs)

    def validate(self, value, model_instance):
        super(CompanyField, self).validate(value, model_instance)
        # No type set, nothing to check.
        if not value or not self.type:
            return
        # Ensure that value is correct type.
        if not value.companytype_set.filter(role=self.type).exists():
            raise ValidationError("Company does not have the " + \
                                  "required roles.")
## crm/models.py
import fields

class CompanyType(models.Model):
    name = models.CharField(max_length=25)

class Company(models.Model):
    type = models.ForeignKey(CompanyType)

class Person(models.Model):
    name = models.CharField(max_length=50)
    company = fields.CompanyField("Client")
## incidents/models.py
from crm import fields as crmfields

class Incident(models.Model):
    company = crmfields.CompanyField("Client")
You have a circular package dependency. fields imports models which imports fields which imports models which imports fields . . .
Circular package dependencies are A BAD IDEA(tm). Although they may work in some cases, they don't work in your case, for a complicated reason involving metaclasses, which I will spare you the details of.
EDIT
The reason is that the Django ORM module uses metaclasses to turn class variables (the fields of the Model) into property descriptors on the object. This is done at class definition time by the metaclass, and a class is defined when its module is loaded, so its attributes must also be resolvable at module load time. This is unlike the body of a method, where names are resolved only when the method is executed.
Now, since your model classes refer to field objects from fields at class definition time, and fields in turn imports models, this cannot work.
If you place all three in the same package, your problem will be solved.
This was fixed, after much toil, simply by changing
kwargs['to'] = 'Company'
to
kwargs['to'] = 'crm.Company'
It seems that when the 'to' argument was evaluated outside of the crm app, Django resolved it in the context of the incidents app. That is, it was looking for incidents.Company which, as the error message suggested, didn't exist.
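The general rule this illustrates: a lazy string reference like 'Company' is resolved relative to the app of the model that declares the field, so references used across apps should always take the app-qualified 'app_label.ModelName' form. A small sketch of the difference:

# Unqualified: resolved in the app of the model that declares the field,
# so on incidents.Incident this looks for incidents.Company and fails.
company = models.ForeignKey('Company')

# App-qualified: always resolves to crm.models.Company, from any app.
company = models.ForeignKey('crm.Company')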
I want to get an object from the database if it already exists (based on provided parameters) or create it if it does not.
Django's get_or_create does this. Is there an equivalent shortcut in SQLAlchemy?
I'm currently writing it out explicitly like this:
def get_or_create_instrument(session, serial_number):
    instrument = session.query(Instrument).filter_by(serial_number=serial_number).first()
    if instrument:
        return instrument
    else:
        instrument = Instrument(serial_number)
        session.add(instrument)
        return instrument
Following the solution of @WoLpH, this is the code that worked for me (simple version):
def get_or_create(session, model, **kwargs):
    instance = session.query(model).filter_by(**kwargs).first()
    if instance:
        return instance
    else:
        instance = model(**kwargs)
        session.add(instance)
        session.commit()
        return instance
With this, I'm able to get_or_create any object of my model.
Suppose my model object is:
class Country(Base):
    __tablename__ = 'countries'

    id = Column(Integer, primary_key=True)
    name = Column(String, unique=True)
To get or create my object I write:
myCountry = get_or_create(session, Country, name=countryName)
That's basically the way to do it, there is no shortcut readily available AFAIK.
You could generalize it of course:
from sqlalchemy.sql import ClauseElement

def get_or_create(session, model, defaults=None, **kwargs):
    instance = session.query(model).filter_by(**kwargs).one_or_none()
    if instance:
        return instance, False
    else:
        params = {k: v for k, v in kwargs.items() if not isinstance(v, ClauseElement)}
        params.update(defaults or {})
        instance = model(**params)
        try:
            session.add(instance)
            session.commit()
        # The actual exception depends on the specific database, so we catch all exceptions.
        # This is similar to the official documentation:
        # https://docs.sqlalchemy.org/en/latest/orm/session_transaction.html
        except Exception:
            session.rollback()
            instance = session.query(model).filter_by(**kwargs).one()
            return instance, False
        else:
            return instance, True
2020 update (Python 3.9+ ONLY)
Here is a cleaner version using Python 3.9's new dict union operator (|=):
def get_or_create(session, model, defaults=None, **kwargs):
    instance = session.query(model).filter_by(**kwargs).one_or_none()
    if instance:
        return instance, False
    else:
        kwargs |= defaults or {}
        instance = model(**kwargs)
        try:
            session.add(instance)
            session.commit()
        # The actual exception depends on the specific database, so we catch all exceptions.
        # This is similar to the official documentation:
        # https://docs.sqlalchemy.org/en/latest/orm/session_transaction.html
        except Exception:
            session.rollback()
            instance = session.query(model).filter_by(**kwargs).one()
            return instance, False
        else:
            return instance, True
Note:
Similar to the Django version, this will catch duplicate key constraints and similar errors. If your get-or-create is not guaranteed to return a single result, it can still result in race conditions.
To alleviate some of that issue you would need to add another one_or_none() style fetch right after the session.commit(). This still is no 100% guarantee against race conditions unless you also use a with_for_update() or serializable transaction mode.
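A usage sketch with the Country model from the earlier answer (the population column is made up here, just to show what defaults is for: values applied only when creating, never used for the lookup):

country, created = get_or_create(session, Country, name='France')
if created:
    print('inserted a new country')

# Look up by name only, but fill extra columns on insert
# (assumes Country also had a population column)
country, created = get_or_create(
    session, Country,
    defaults={'population': 67_000_000},
    name='France',
)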
I've been playing with this problem and have ended up with a fairly robust solution:
from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm.exc import NoResultFound

def get_one_or_create(session,
                      model,
                      create_method='',
                      create_method_kwargs=None,
                      **kwargs):
    try:
        return session.query(model).filter_by(**kwargs).one(), False
    except NoResultFound:
        kwargs.update(create_method_kwargs or {})
        created = getattr(model, create_method, model)(**kwargs)
        try:
            session.add(created)
            session.flush()
            return created, True
        except IntegrityError:
            session.rollback()
            return session.query(model).filter_by(**kwargs).one(), False
I just wrote a fairly expansive blog post on all the details, but here are a few quick notes on why I used this approach:
It unpacks to a tuple that tells you if the object existed or not. This can often be useful in your workflow.
The function gives the ability to work with @classmethod-decorated creator functions (and attributes specific to them).
The solution protects against race conditions when you have more than one process connected to the datastore.
EDIT: I've changed session.commit() to session.flush() as explained in this blog post. Note that these decisions are specific to the datastore used (Postgres in this case).
EDIT 2: I've updated the function to avoid using {} as a default argument value, as this is a typical Python gotcha. Thanks for the comment, Nigel! If you're curious about this gotcha, check out this StackOverflow question and this blog post.
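For reference, the gotcha in question is that a mutable default such as {} is created once, when the function is defined, and is then shared across calls; a minimal illustration:

def remember(value, seen={}):   # the same dict object is reused on every call
    seen[value] = True
    return seen

print(remember('a'))  # {'a': True}
print(remember('b'))  # {'a': True, 'b': True} (note 'a' leaking in from the first call)

# The usual fix, as used in get_one_or_create above:
def remember_fixed(value, seen=None):
    seen = seen or {}
    seen[value] = True
    return seen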
A modified version of erik's excellent answer
def get_one_or_create(session,
                      model,
                      create_method='',
                      create_method_kwargs=None,
                      **kwargs):
    try:
        return session.query(model).filter_by(**kwargs).one(), True
    except NoResultFound:
        kwargs.update(create_method_kwargs or {})
        try:
            with session.begin_nested():
                created = getattr(model, create_method, model)(**kwargs)
                session.add(created)
            return created, False
        except IntegrityError:
            return session.query(model).filter_by(**kwargs).one(), True
Use a nested transaction to only roll back the addition of the new item instead of rolling back everything (See this answer to use nested transactions with SQLite)
Move create_method inside the with block. If the created object has relations and it is assigned members through those relations, it is automatically added to the session. E.g. create a book that has user_id and a corresponding user relationship; then doing book.user = <user object> inside create_method will add book to the session. This means that create_method must be inside the with block to benefit from an eventual rollback. Note that begin_nested automatically triggers a flush.
Note that if using MySQL, the transaction isolation level must be set to READ COMMITTED rather than REPEATABLE READ for this to work. Django's get_or_create (and here) uses the same stratagem, see also the Django documentation.
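Usage is the same as with the original version, keeping in mind that here the flag is True when the row already existed; for example, with a hypothetical Book model that has a unique title:

book, already_existed = get_one_or_create(session, Book, title='Dune')
if not already_existed:
    print('created a new Book row')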
This SQLAlchemy recipe does the job nicely and elegantly.
The first thing to do is to define a function that is given a Session to work with, and associates a dictionary with the Session() which keeps track of current unique keys.
def _unique(session, cls, hashfunc, queryfunc, constructor, arg, kw):
    cache = getattr(session, '_unique_cache', None)
    if cache is None:
        session._unique_cache = cache = {}
    key = (cls, hashfunc(*arg, **kw))
    if key in cache:
        return cache[key]
    else:
        with session.no_autoflush:
            q = session.query(cls)
            q = queryfunc(q, *arg, **kw)
            obj = q.first()
            if not obj:
                obj = constructor(*arg, **kw)
                session.add(obj)
        cache[key] = obj
        return obj
An example of utilizing this function would be in a mixin:
class UniqueMixin(object):
    @classmethod
    def unique_hash(cls, *arg, **kw):
        raise NotImplementedError()

    @classmethod
    def unique_filter(cls, query, *arg, **kw):
        raise NotImplementedError()

    @classmethod
    def as_unique(cls, session, *arg, **kw):
        return _unique(
            session,
            cls,
            cls.unique_hash,
            cls.unique_filter,
            cls,
            arg, kw
        )
And finally creating the unique get_or_create model:
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()
engine = create_engine('sqlite://', echo=True)
Session = sessionmaker(bind=engine)

class Widget(UniqueMixin, Base):
    __tablename__ = 'widget'

    id = Column(Integer, primary_key=True)
    name = Column(String, unique=True, nullable=False)

    @classmethod
    def unique_hash(cls, name):
        return name

    @classmethod
    def unique_filter(cls, query, name):
        return query.filter(Widget.name == name)

Base.metadata.create_all(engine)

session = Session()

w1, w2, w3 = Widget.as_unique(session, name='w1'), \
    Widget.as_unique(session, name='w2'), \
    Widget.as_unique(session, name='w3')

w1b = Widget.as_unique(session, name='w1')

assert w1 is w1b
assert w2 is not w3
assert w2 is not w1

session.commit()
The recipe goes deeper into the idea and provides different approaches but I've used this one with great success.
The closest semantically is probably:
def get_or_create(model, **kwargs):
    """SqlAlchemy implementation of Django's get_or_create.
    """
    session = Session()
    instance = session.query(model).filter_by(**kwargs).first()
    if instance:
        return instance, False
    else:
        instance = model(**kwargs)
        session.add(instance)
        session.commit()
        return instance, True
not sure how kosher it is to rely on a globally defined Session in sqlalchemy, but the Django version doesn't take a connection so...
The tuple returned contains the instance and a boolean indicating if the instance was created (i.e. it's False if we read the instance from the db).
Django's get_or_create is often used to make sure that global data is available, so I'm committing at the earliest point possible.
I slightly simplified @Kevin.'s solution to avoid wrapping the whole function in an if/else statement. This way there's only one return, which I find cleaner:
def get_or_create(session, model, **kwargs):
    instance = session.query(model).filter_by(**kwargs).first()
    if not instance:
        instance = model(**kwargs)
        session.add(instance)
    return instance
There is a Python package that has @erik's solution as well as a version of update_or_create(): https://github.com/enricobarzetti/sqlalchemy_get_or_create
Depending on the isolation level you adopted, none of the above solutions would work.
The best solution I have found is raw SQL in the following form:
INSERT INTO table(f1, f2, unique_f3)
SELECT 'v1', 'v2', 'v3'
WHERE NOT EXISTS (SELECT 1 FROM table WHERE unique_f3 = 'v3')
This is transactionally safe whatever the isolation level and the degree of parallelism are.
Beware: in order to make it efficient, it would be wise to have an INDEX for the unique column.
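With SQLAlchemy, one way to issue such a statement is via text() with bound parameters (a sketch; the table and column names follow the example above, and session is assumed to be an open Session):

from sqlalchemy import text

stmt = text(
    "INSERT INTO table(f1, f2, unique_f3) "
    "SELECT :v1, :v2, :v3 "
    "WHERE NOT EXISTS (SELECT 1 FROM table WHERE unique_f3 = :v3)"
)
session.execute(stmt, {"v1": "v1", "v2": "v2", "v3": "v3"})
session.commit()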
One problem I regularly encounter: when a field has a max length (say, String(40)) and you perform a get-or-create with a longer string, the above solutions will fail.
Building off of the above solutions, here's my approach:
from sqlalchemy import Column, String
from sqlalchemy.exc import IntegrityError

def get_or_create(self, add=True, flush=True, commit=False, **kwargs):
    """
    Get an entity based on the kwargs or create an entity with those kwargs.

    Params:
        add: (default True) should the instance be added to the session?
        flush: (default True) flush the instance to the session?
        commit: (default False) commit the session?
        kwargs: key, value pairs of parameters to lookup/create.

    Ex: SocialPlatform.get_or_create(**{'name': 'facebook'})
        returns --> existing record or will create a new record
    ---------
    NOTE: I like to add this as a classmethod in the base class of my tables, so that
    all data models inherit the base class --> functionality is transmitted across
    all orm defined models.
    """
    # Truncate values if necessary
    for key, value in kwargs.items():
        # Only use strings
        if not isinstance(value, str):
            continue
        # Only use if it's a column
        my_col = getattr(self.__table__.columns, key)
        if not isinstance(my_col, Column):
            continue
        # Skip non strings again here
        if not isinstance(my_col.type, String):
            continue
        # Get the max length
        max_len = my_col.type.length
        if value and max_len and len(value) > max_len:
            # Update the value
            value = value[:max_len]
            kwargs[key] = value

    # -------------------------------------------------
    # Make the query...
    instance = session.query(self).filter_by(**kwargs).first()

    if instance:
        return instance
    else:
        # Max length isn't accounted for here.
        # The assumption is that auto-truncation will happen on the child-model
        # or directly in the db
        instance = self(**kwargs)

        # You'll usually want to add to the session
        if add:
            session.add(instance)

        # Navigate these with caution
        if add and commit:
            try:
                session.commit()
            except IntegrityError:
                session.rollback()
        elif add and flush:
            session.flush()

        return instance
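Attached to the declarative base as a classmethod, as the docstring suggests, usage might look like this (SocialPlatform is the hypothetical model from the docstring, and a module-level session is assumed as in the code above):

# Plain lookup-or-create
platform = SocialPlatform.get_or_create(name='facebook')

# A value longer than the column's String(...) length is truncated first,
# so the lookup and the insert both use the same (valid) value
platform = SocialPlatform.get_or_create(name='x' * 500)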