I implemented the storage of one-to-many models through a view. The model structure is as follows:
A DagRun model
Many DagRunModel, which have a FK to a DagRun
Many DagRunParam, which have a FK to a DagRunModel
As shown below, I create all the instances first, to make sure there are no errors, and only at the very end do I persist them using save(). But this returns:
django.db.utils.IntegrityError: NOT NULL constraint failed:
[28/Jul/2017 11:56:12] "POST /task/run HTTP/1.1" 500 19464
And this is the code of how I create the models and persist them in the end
def task_run(request):
    dag_task = None
    dag_tasks = DagTask.objects.filter(dag_id=request.POST["task"])
    if len(dag_tasks) > 0:
        dag_task = dag_tasks[0]
    else:
        raise ValueError("Task name is not a valid dag id")
    dag_run = DagRun(
        dag_task=dag_task,
        database=request.POST["database"],
        table=request.POST["table"],
        label=request.POST["label"],
        features=request.POST["features"],
        user=request.user,
        date=timezone.now()
    )
    dag_params = []
    dag_models = []
    models = json.loads(request.POST["models"])
    for model in models:
        dag_run_model = DagRunModel(
            dag_run=dag_run,
            dag_model=model["name"]
        )
        dag_models.append(dag_run_model)
        for param in model["params"]:
            dag_param = DagRunParam(
                dag_run_model=dag_run_model,
                dag_param=param["name"],
                param_value=param["value"]
            )
            dag_params.append(dag_param)
    dag_run.save()
    for dag_model in dag_models:
        dag_model.save()
    for dag_param in dag_params:
        dag_param.save()
If I instead save them as I create them, this code works fine. So it seems foreign keys can only be assigned once the related models are persisted, which is risky if a model further down the hierarchy fails to be created.
Is there any safer approach?
You may want to use a transaction so that you can enforce an "everything gets saved or nothing does" type of behavior. This would be the safest approach in my opinion.
There really isn't, beyond finding ways to simplify the relationships between your models.
A ForeignKey relationship can't be safely established until the primary key of the related object is set. Otherwise, you could potentially kiss data integrity goodbye.
If you are deeply concerned about the possibility of failure somewhere along the chain, remember that you have access to the objects you just saved, and add error handling that deletes the entire chain of new objects on a failure and throws a useful error pointing out where the chain failed.
I have a foreign key relationship in my Django (v3) models:
class Example(models.Model):
    title = models.CharField(max_length=200) # this is irrelevant for the question here
    not_before = models.DateTimeField(auto_now_add=True)
    ...

class ExampleItem(models.Model):
    myParent = models.ForeignKey(Example, on_delete=models.CASCADE)
    execution_date = models.DateTimeField(auto_now_add=True)
    ....
Can I have code running/triggered whenever an ExampleItem is "added to the list of items in an Example instance"? What I would like to do is run some checks and, depending on the concrete Example instance possibly alter the ExampleItem before saving it.
To illustrate
Let's say the Example's not_before date dictates that the ExampleItem's execution_date must not be earlier than not_before. I would like to check whether the to-be-saved ExampleItem's execution_date violates this condition. If so, I would want to either change the execution_date to make it valid or throw an exception (whichever is easier). The same goes for a duplicate execution_date (i.e. if the respective Example already has an ExampleItem with the same execution_date).
So, in a view, I have code like the following:
def doit(request, example_id):
    # get the relevant `Example` object
    example = get_object_or_404(Example, pk=example_id)
    # create a new `ExampleItem`
    itm = ExampleItem()
    # set the item's parent
    itm.myParent = example # <- this should trigger my validation code!
    itm.save() # <- (or this???)
The thing is, this view is not the only place where new ExampleItems are created; I also have an API that can do the same (not to mention that a user could add ExampleItems manually via the REPL). Preferably the validation code should not be duplicated in every place where new ExampleItems can be created.
I was looking into Signals (Django docs), specifically pre_save and post_save (of ExampleItem), but I think pre_save is too early while post_save is too late... m2m_changed also looks interesting, but I do not have a many-to-many relationship.
What would be the best/correct way to handle these requirements? They seem to be rather common, I imagine. Do I have to restructure my model?
The obvious solution here is to put this code in the ExampleItem.save() method - just beware that Model.save() is not invoked by some queryset bulk operations.
Using signal handlers on your own app's models is actually an antipattern - the goal of signals is to allow your app to hook into other apps' lifecycles without having to change those other apps' code.
Also (unrelated but), you can populate your newly created models instances directly via their initializers ie:
itm = ExampleItem(myParent=example)
itm.save()
and you can even save them directly:
# creates a new instance, populate it AND save it
itm = ExampleItem.objects.create(myParent=example)
This will still invoke your model's save method so it's safe for your use case.
This question has been asked before, but the answers there do not solve my problem.
I am using a legacy database; nothing can be changed.
Here are my Django models, with all but the relevant fields stripped out; obviously class Meta has managed = False in my actual code:
class AppCosts(models.Model):
    id = models.CharField(primary_key=True)
    cost = models.DecimalField()

class AppDefs(models.Model):
    id = models.CharField(primary_key=True)
    data = models.TextField()
    appcost = models.OneToOneField(AppCosts, db_column='id')

class JobHistory(models.Model):
    job_name = models.CharField(primary_key=True)
    job_application = models.CharField()
    appcost = models.OneToOneField(AppCosts, to_field='id', db_column='job_application')
    app = models.OneToOneField(AppDefs, to_field='id', db_column='job_application')
The OneToOne fields work fine for querying, and I get the correct result using select_related()
But when I create a new record for the JobHistory table and call save(), I get:
DatabaseError: (1110, "Column 'job_application' specified twice")
I am using Django 1.4 and I do not quite get how this OneToOneField works. I can't find any example where the primary keys are named differently and have these particular semantics.
I need the django model that would let me do this SQL:
select job_history.job_name, job_history.job_application, app_costs.cost from job_history, app_costs where job_history.job_application = app_costs.id;
You have defined appcost and app to have the same underlying database column, job_application, which is also the name of another existing field: so three fields share the same column. That makes no sense at all.
OneToOneFields are just foreign keys constrained to a single value on both ends. If you have foreign keys from JobHistory to AppCost and AppDef, then presumably you have actual columns in your database that contain those foreign keys. Those are the values you should be using for db_column on those fields, not "job_application".
Edit I'm glad you said you didn't design this schema, because it is pretty horrible: you won't have any foreign key constraints, for example, which makes referential integrity impossible. But never mind, we can actually achieve what you want, more or less.
There are various issues with what you have, but the main one is that you don't need the separate "job_application" field at all. That is, as I said earlier, the foreign key, so let it be that. Also note it should be an actual foreign key field, not a one-to-one, since there are many histories to one app.
One constraint that we can't achieve easily in Django is to have the same field acting as FK for two tables. But that doesn't really matter, since we can get to AppCosts via AppDefs.
So the models could just look like this:
class AppCosts(models.Model):
    app = models.OneToOneField('AppDefs', primary_key=True, db_column='id')
    cost = models.DecimalField()

class AppDefs(models.Model):
    id = models.CharField(primary_key=True)
    data = models.TextField()

class JobHistory(models.Model):
    job_name = models.CharField(primary_key=True)
    app = models.ForeignKey(AppDefs, db_column='job_application')
Note that I've moved the one-to-one between Costs and Defs onto AppCosts, since it seems to make sense to have the canonical ID in Defs.
Now, given a JobHistory instance, you can do history.app to get the app instance, history.app.appcosts.cost to get the app cost, and use history.app_id to get the underlying app ID from the job_application column.
If you wanted to reproduce that SQL output more exactly, something like this would now work:
JobHistory.objects.values_list('job_name', 'app_id', 'app__appcosts__cost')
In my Django app very often I need to do something similar to get_or_create(). E.g.,
A user submits a tag. I need to see whether that tag is already in the database. If not, create a new record for it. If it is, just update the existing record.
But looking into the doc for get_or_create() it looks like it's not threadsafe. Thread A checks and finds Record X does not exist. Then Thread B checks and finds that Record X does not exist. Now both Thread A and Thread B will create a new Record X.
This must be a very common situation. How do I handle it in a threadsafe way?
Since 2013 or so, get_or_create is atomic, so it handles concurrency nicely:
This method is atomic assuming correct usage, correct database configuration, and correct behavior of the underlying database. However, if uniqueness is not enforced at the database level for the kwargs used in a get_or_create call (see unique or unique_together), this method is prone to a race condition which can result in multiple rows with the same parameters being inserted simultaneously.

If you are using MySQL, be sure to use the READ COMMITTED isolation level rather than REPEATABLE READ (the default), otherwise you may see cases where get_or_create will raise an IntegrityError but the object won't appear in a subsequent get() call.
From: https://docs.djangoproject.com/en/dev/ref/models/querysets/#get-or-create
Here's an example of how you could do it:
Define a model with either unique=True:
class MyModel(models.Model):
    slug = models.SlugField(max_length=255, unique=True)
    name = models.CharField(max_length=255)

MyModel.objects.get_or_create(slug=<user_slug_here>, defaults={"name": <user_name_here>})
... or by using unique_together:
class MyModel(models.Model):
    prefix = models.CharField(max_length=3)
    slug = models.SlugField(max_length=255)
    name = models.CharField(max_length=255)

    class Meta:
        unique_together = ("prefix", "slug")

MyModel.objects.get_or_create(prefix=<user_prefix_here>, slug=<user_slug_here>, defaults={"name": <user_name_here>})
Note how the non-unique fields are in the defaults dict, NOT among the unique fields in get_or_create. This will ensure your creates are atomic.
Here's how it's implemented in Django: https://github.com/django/django/blob/fd60e6c8878986a102f0125d9cdf61c717605cf1/django/db/models/query.py#L466 - try creating the object, catch the IntegrityError if one occurs, and return the existing copy in that case. In other words: handle atomicity in the database.
This must be a very common situation. How do I handle it in a threadsafe way?
Yes.
The "standard" solution in SQL is to simply attempt to create the record. If it works, that's good. Keep going.
If an attempt to create a record gets a "duplicate" exception from the RDBMS, then do a SELECT and keep going.
Django, however, has an ORM layer, with its own cache. So the logic is inverted to make the common case work directly and quickly and the uncommon case (the duplicate) raise a rare exception.
Try the transaction.commit_on_success decorator on the callable where you call get_or_create(**kwargs):
"Use the commit_on_success decorator to use a single transaction for all the work done in a function. If the function returns successfully, then Django will commit all work done within the function at that point. If the function raises an exception, though, Django will roll back the transaction."
Apart from that: in concurrent calls to get_or_create, both threads try to get the object with the arguments passed to it (except for the "defaults" arg, which is a dict used during the create call in case get() fails to retrieve any object). If the get fails, both threads try to create the object, resulting in duplicate objects, unless a unique/unique_together constraint is enforced at the database level on the field(s) used in the get() call.
It is similar to this post:
How do I deal with this race condition in django?
So many years have passed, but nobody has written about threading.Lock. If, for legacy reasons, you don't have the opportunity to add a unique or unique_together constraint via migrations, you can use locks or threading.Semaphore objects. Here is the pseudocode:
from concurrent.futures import ThreadPoolExecutor
from threading import Lock

_lock = Lock()

def get_staff(data: dict):
    _lock.acquire()
    try:
        staff, created = MyModel.objects.get_or_create(**data)
        return staff
    finally:
        _lock.release()

with ThreadPoolExecutor(max_workers=50) as pool:
    pool.map(get_staff, get_list_of_some_data())
I have the following in my model:
class info(models.Model):
    add = models.CharField(max_length=255)
    name = models.CharField(max_length=255)
And in the views, when I say
info_l = info.objects.filter(id=1)
logging.debug(info_l.name)
I get an error at the debug statement saying name doesn't exist:
'QuerySet' object has no attribute 'name'
1. How can this be resolved?
2. Also, how do I query for only one field instead of selecting everything, like select name from info?
1. Selecting Single Items
It looks like you're trying to get a single object. Using filter will return a QuerySet object (as is happening in your code), which behaves more like a list (and, as you've noticed, lacks the name attribute).
You have two options here. First, you can just grab the first element:
info_l = info.objects.filter(id=1)[0]
You could also use the objects.get method instead, which will return a single object (and raise an exception if it doesn't exist):
info_l = info.objects.get(id=1)
Django has some pretty good documentation on QuerySets, and it may be worth taking a look at it:
Docs on using filters
QuerySet reference
2. Retrieving Specific Fields
Django provides the defer and only methods, which let you choose specific fields from the database rather than fetching everything at once. These don't actually prevent the fields from being read; rather, they load them lazily. defer is an "opt-in" mode, which lets you specify which fields should be lazily loaded. only is "opt-out" -- you call it, and only the fields you pass will be eagerly loaded.
So in your example, you'd want to do something like this:
info_l = info.objects.filter(id=1).only('name')[0]
Though with a model as simple as the example you give, I wouldn't worry much at all about limiting fields.
In django, I'm trying to do something like this:
# if form is valid ...
article = form.save(commit=False)
article.author = req.user
product_name = form.cleaned_data['product_name']
try:
    article.product = Component.objects.get(name=product_name)
except Component.DoesNotExist:
    article.product = Component(name=product_name)
article.save()
# do some more form processing ...
# do some more form processing ...
But then it tells me:
null value in column "product_id" violates not-null constraint
But I don't understand why this is a problem. When article.save() is called, it should be able to create the product then (and generate an id).
I can get around this problem by using this code in the except block:
product = Component(name=product_name)
product.save()
article.product = product
But the reason this concerns me is because if article.save() fails, it will already have created a new component/product. I want them to succeed or fail together.
Is there a nice way to get around this?
The way the Django ManyToManyField works is that it creates an extra table. So say you have two models, ModelA and ModelB. If you did...
ModelA.model_b = models.ManyToManyField(ModelB)
What Django actually does behind the scenes is it creates a table... app_modela_modelb with three columns: id, model_a_id, model_b_id.
Hold that thought in your mind. Regarding the saving of ModelB, Django does not assign it an ID until it's saved. You could technically assign it an ID manually and avoid this problem. It seems you're letting Django handle that, which is perfectly acceptable.
Django has a problem then doing the M2M. Why? If ModelB doesn't have an id yet, what goes in the model_b_id column on the M2M table? The error for null product_id is more than likely a null constraint error on the M2M field, not the ModelB record id.
If you would like them to "succeed together" or "fail together", perhaps it's time to look into transactions. You could, for example, wrap the whole thing in a transaction and do a rollback in the case of a partial failure. I haven't done a whole lot of work personally in this area, so hopefully someone else will be of assistance on that topic.
You could get around this by using:
target_product, created_flag = Component.objects.get_or_create(name=product_name)
article.product = target_product
as I'm pretty sure get_or_create() will set the id of an object, if it has to create one.
Alternatively, if you don't mind empty FK relations on the Article table, you could add null=True to the definition.
There's little value in including a code snippet on transactions, as you should read the Django documentation to gain a good understanding.