Django Many to Many Relationship Add Not Working - python

I'm using Django's ManyToManyField for one of my models.
class Requirement(models.Model):
    name = models.CharField(max_length=200)

class Course(models.Model):
    requirements = models.ManyToManyField(Requirement)
I want to be able to assign requirements to my courses. To do that, I get a course that is already saved (or that I have just saved) and run the following:
c = Course.objects.get(title="STACK 100")
req = Requirement.objects.get(name="XYZ")
c.requirements.add(req)
While this works when I do it through the Django manage.py shell, it does not work when I do it programmatically in a script. I work with other models in this script and that all works fine. I even know it successfully retrieves the current course and the requirement, as I check both. I can't figure out what the problem is!
EDIT:
What I mean by not working is that the requirements field of the course remains empty. For example, if I do c.requirements.all(), I'll get an empty list. However, if I do the same thing through the shell, the list will be populated. The script is a crawler that uses BeautifulSoup to crawl a website. I try to add requirements to courses in the following function:
def create_model_object(self, course_dict, req, sem):
    semester = Semester.objects.get(season=sem)
    # Checks if the course already exists in the database
    existing_matches = Course.objects.filter(number=course_dict["number"])
    if len(existing_matches) > 0:
        existing_course = existing_matches[0]
        if sem == "spring":
            existing_course.spring = semester
        else:
            existing_course.fall = semester
        existing_course.save()
        c = existing_course
    # Creates a new Course in the database
    else:
        if sem == "spring":
            new_course = Course(title=course_dict["title"],
                                spring=semester)
        else:
            new_course = Course(title=course_dict["title"],
                                fall=semester)
        new_course.save()
        c = new_course
    curr_req = Requirement.objects.get(name=req)
    c.requirements.add(curr_req)
    print(c.id)
EDIT 2:
After stepping into the function, this is what I found:
def __get__(self, instance, instance_type=None):
    if instance is None:
        return self

    rel_model = self.related.related_model
    manager = self.related_manager_cls(
        model=rel_model,
        query_field_name=self.related.field.name,
        prefetch_cache_name=self.related.field.related_query_name(),
        instance=instance,
        symmetrical=False,
        source_field_name=self.related.field.m2m_reverse_field_name(),
        target_field_name=self.related.field.m2m_field_name(),
        reverse=True,
        through=self.related.field.rel.through,
    )
    return manager
And according to my debugger, manager is of type planner.Course.None (planner is my project name).

Check the last value in the sequence of the join table and update it if it is behind. Django does not throw an error when the pk it tries to insert already exists, so the add can fail silently.
ALTER SEQUENCE schema.table_id_seq RESTART <last_number + 1>;

I think you need to call c.save() after c.requirements.add(curr_req).

Related

Factory-boy / Django - Factory instance not reflecting changes to model instance

I am writing tests for a website I'm working on and I'm representing the models with factoryboy Factory objects.
However, I've run into some behavior I found somewhat confusing and I was wondering if anyone here would be so kind to explain it to me
I'm running a test that contains the following model:
STATUS = (
    ('CALCULATING'),
    ('PENDING'),
    ('BUSY'),
    ('SUCCESS'),
    ('FAILED')
)

class SchoolImport(models.Model):
    date = models.DateTimeField(auto_now_add=True)
    status = models.CharField(
        verbose_name=_('status'), choices=STATUS,
        max_length=50, default='CALCULATING'
    )
For which I've created the following factory. As you can see, the status is set to its default value, which I found more realistic than a randomly selected value:
class SchoolImportFactory(factory.DjangoModelFactory):
    class Meta:
        model = models.SchoolImport

    status = 'CALCULATING'
    school = factory.SubFactory(SchoolFactory)

    @factory.lazy_attribute
    def date(self):
        return timezone.now() - datetime.timedelta(days=10)
Below you'll see both a (simplified) version of the function being tested and the test itself. (I've currently commented out all other code on my laptop, so the function you see below is an accurate representation.)
The gist of it is that the function receives an id value that it will use to fetch an SchoolImport object from the database and change its status. The function will be run in celery and thus returns nothing.
When I run this test through the debugger I can see that the value is changed correctly. However, when the test runs its final assertion it fails as self.school_import.status is still equal to CALCULATING.
# app/utils.py
def process_template_objects(school_import_pk):
    school_import = models.SchoolImport.objects.get(id=school_import_pk)
    school_import.status = 'BUSY'
    school_import.save()
# app/tests/test_utils.py
class Test_process_template_objects_function(TestCase):
    def setUp(self):
        self.school = SchoolFactory()
        self.school_import = SchoolImportFactory(
            school=self.school
        )

    def test_function_alters_school_import_status(self):
        self.assertEqual(
            self.school_import.status, 'CALCULATING'
        )
        utils.process_template_objects(self.school_import.id)
        self.assertNotEqual(
            self.school_import.status, 'CALCULATING'
        )
When I run this test through a debugger (with a breakpoint just before the failing assertion) and run SchoolImport.objects.get(id=self.school_import.id).status it does return the correct BUSY value.
So though the object being represented by the FactoryInstance is being updated correctly, the changes are not reflected in the factory instance itself.
Though I realize I'm probably doing something wrong here / encountering expected behavior, I was wondering how people who write tests using factoryboy get around this behavior, or if perhaps there is a way to 'refresh' the factoryboy instance to reflect changes to the model instance.
The issue comes from the fact that, in process_template_objects, you work with a different instance of the SchoolImport object than the one in the test.
If you run:
a = models.SchoolImport.objects.get(pk=1)
b = models.SchoolImport.objects.get(pk=1)
assert a == b  # True: both refer to the same object in the database
assert a is b  # False: different Python objects, each with its own memory
a.status = 'SUCCESS'
a.save()
assert a.status == 'SUCCESS'  # True: it was indeed changed in this Python object
assert b.status == 'SUCCESS'  # False: the 'b' object hasn't seen the change
In order to fix this, you should refetch the instance from the database after calling process_template_objects:
utils.process_template_objects(self.school_import.id)
self.school_import.refresh_from_db()
See https://docs.djangoproject.com/en/2.2/ref/models/instances/#refreshing-objects-from-database for a more detailed explanation!
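The same equal-but-not-identical behavior can be reproduced without Django; the Row class and dict-backed "database" below are hypothetical stand-ins for model instances fetched by pk:

```python
class Row:
    """Hypothetical stand-in for a Django model instance."""
    def __init__(self, pk, status):
        self.pk, self.status = pk, status

    def __eq__(self, other):
        # Django models compare equal when their pks match
        return isinstance(other, Row) and self.pk == other.pk

db = {1: 'CALCULATING'}  # fake table: pk -> status

def get(pk):
    # each .get() builds a brand-new Python object from stored data
    return Row(pk, db[pk])

a, b = get(1), get(1)
a.status = 'BUSY'
db[a.pk] = a.status  # a.save(): writes through to the "database"

print(a == b, a is b, b.status)  # equal rows, distinct objects, stale b
```

refresh_from_db() is essentially a fresh get() that copies the stored values back onto the existing Python object.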
If you delete a field from a model instance, accessing it again reloads the value from the database.
obj = MyModel.objects.first()
del obj.field
obj.field # Loads the field from the database
See https://docs.djangoproject.com/en/2.2/ref/models/instances/#refreshing-objects-from-database

Django save behaving randomly

I have a Story model with a M2M relationship to some Resource objects. Some of the Resource objects are missing a name so I want to copy the title of the Story to the assigned Resource objects.
Here is my code:
from collector import models
from django.core.paginator import Paginator

paginator = Paginator(models.Story.objects.all(), 1000)

def fix_issues():
    for page in range(1, paginator.num_pages + 1):
        for story in paginator.page(page).object_list:
            name_story = story.title
            for r in story.resources.select_subclasses():
                if r.name != name_story:
                    r.name = name_story
                    r.save()
                if len(r.name) == 0:
                    print("Something went wrong: " + name_story)
        print("done processing page %s out of %s" % (page, paginator.num_pages))

fix_issues()
I need to use a paginator because I'm dealing with a million objects. The weird part is that after calling fix_issues(), about half of my resources that had no name now have the correct name, while the other half still has no name. I can call fix_issues() again and again, and every time more objects receive a name. This seems really weird to me: why would an object not be updated the first time but only the second time?
Additional information:
The "Something went wrong: " message is never printed.
I'm using select_subclasses from django-model-utils to iterate over all resources (any type).
The story.title is never empty.
No error message is printed, when I run these commands.
I did not override the save method of the Resource model (only the save method of the Story model).
I tried to use @transaction.atomic but the result was the same.
My Model:
class Resource(models.Model):
    name = models.CharField(max_length=200)
    # Important for retrieving the correct subtype.
    objects = InheritanceManager()

    def __str__(self):
        return str(self.name)

class CustomResource(Resource):
    homepage = models.CharField(max_length=3000, default="", blank=True, null=True)

class Story(models.Model):
    url = models.URLField(max_length=3000)
    resources = models.ManyToManyField(Resource)
    popularity = models.FloatField()

    def _update_popularity(self):
        self.popularity = 3

    def save(self, *args, **kwargs):
        super(Story, self).save(*args, **kwargs)
        self._update_popularity()
        super(Story, self).save(*args, **kwargs)
Documentation for the select_subclasses:
http://django-model-utils.readthedocs.io/en/latest/managers.html#inheritancemanager
Further investigation:
I thought that maybe select_subclasses did not return all the objects. Right now every story has exactly one resource. So it was easy enough to check that select_subclasses always returns one item. This is the function I used:
def find_issues():
    for page in range(1, paginator.num_pages + 1):
        for story in paginator.page(page).object_list:
            assert(len(story.resources.select_subclasses()) == 1)
        print("done processing page %s out of %s" % (page, paginator.num_pages))
But again, this executes without any problems, so I don't think select_subclasses is to blame. I also checked that paginator.num_pages is right, and it is: if I divide by 1000 (items per page) I get exactly the number of stories I have in my database.
I think I know what is happening:
The Paginator loads a queryset and gives me the first n items. I process these and update some of the values. But for the next iteration the order of the items in the queryset changes (because I updated some of them and did not define an order), so I'm skipping over items that are now on the first page. I can avoid it by specifying an order (pk, for example).
If you think I'm wrong, please let me know. Otherwise I will accept this as the correct answer. Thank you.
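The skipping can be reproduced in plain Python; paginate_and_update below is a hypothetical stand-in for the loop above, with a sort() standing in for the database reordering updated rows in an unordered queryset:

```python
def paginate_and_update(rows, page_size):
    """Fix rows page by page while the backing order shifts underneath us."""
    num_pages = -(-len(rows) // page_size)  # ceiling division
    for page in range(num_pages):
        for r in rows[page * page_size:(page + 1) * page_size]:
            r["name"] = "fixed"
        # An unordered queryset may move updated rows; model that by
        # re-sorting so untouched rows bubble to the front of later pages.
        rows.sort(key=lambda r: r["name"] == "fixed")
    return sum(r["name"] == "fixed" for r in rows)

# 4 unnamed resources, pages of 2: only half ever get fixed,
# exactly the symptom described above.
fixed = paginate_and_update([{"name": ""} for _ in range(4)], 2)
print(fixed)  # 2
```

Ordering the queryset, e.g. Paginator(models.Story.objects.order_by('pk'), 1000), keeps page boundaries stable, so every row is visited exactly once.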

Caching in Django's ManyToManyField

I'm struggling with a caching issue inside Django. So far I've seen this issue only when running the test suite. The problem is that sometimes (it seems to happen always on the second invocation of the code), Django does not update its cache, or the cache becomes inconsistent.
The extracted code with some debugging is:
class Source(models.Model):
    name = models.CharField(max_length=50)
    quality = models.IntegerField(default=0)

class Reference(models.Model):
    url = models.URLField()
    source = models.ForeignKey(Source)

    class Meta:
        ordering = ['-source__quality']

class Issue(models.Model):
    references = models.ManyToManyField(Reference)
    master = models.ForeignKey(Reference, related_name='mastered_issue_set')

def auto_create(instance):
    issue = Issue.objects.create(master=instance)
    print issue.references.count(), issue.references.all()
    issue.references.add(instance)
    print issue.references.count(), issue.references.all()
At the first invocation I correctly get the following output:
0 []
1 [<Reference: test>]
However on the second call to auto_create, Django thinks there is one reference, but does not give it to me:
0 []
1 []
This behavior of course breaks further code. Any idea what can be going wrong here or at least how to debug it?
PS: It looks like ordering on Reference class is causing this. But it is still unclear to me why.
I was not able to reproduce this with sqlite3. Could it be that the instance of Reference passed in is not saved? The following ran without a hiccup:
def auto_create(instance):
    issue = Issue.objects.create(master=instance)
    print issue.references.count(), issue.references.all()
    assert issue.references.count() == 0, "initial ref count is not null"
    assert len(issue.references.all()) == 0, "initial ref array is not empty"
    issue.references.add(instance)
    print issue.references.count(), issue.references.all()
    assert issue.references.count() == 1, "ref count is not incremented"
    assert len(issue.references.all()) == 1, "ref array is not populated"

def test_auto():
    s = Source()
    s.save()
    r = Reference(source=s)
    r.save()
    auto_create(r)
In the end I found what was causing this problem. It was my own caching code rather than Django's.
I had a custom Source manager in place, which returned and cached a standard source:
class SourceManager(models.Manager):
    url_source = None

    def get_generic(self):
        if self.url_source is None:
            self.url_source, created = self.get_or_create(name='URL', quality=0)
        return self.url_source

class Source(models.Model):
    name = models.CharField(max_length=50)
    quality = models.IntegerField(default=0)
    objects = SourceManager()
This works perfectly fine in the application: once the source is created, the manager remembers it for its lifetime, as sources do not change once created. In tests, however, the cached row goes away, because the whole test runs in a single transaction that is then rolled back.
What I find strange is that models.ForeignKey did not complain about getting a non-existent object; the error appeared later, while sorting by source__quality, as the underlying JOIN SELECT could not find a matching Source row.
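The pitfall generalizes: any cache held on a long-lived object survives a rolled-back test transaction. A minimal sketch, with a dict standing in for the database and the rollback modeled by clearing it (all names here are hypothetical):

```python
class SourceManager:
    """Caches the generic source on the class, like the manager above."""
    _generic = None  # class-level: outlives any single "transaction"

    def __init__(self, db):
        self.db = db

    def get_generic(self):
        if SourceManager._generic is None:
            # get_or_create: insert the row if missing, then cache it
            SourceManager._generic = self.db.setdefault(
                'URL', {'name': 'URL', 'quality': 0})
        return SourceManager._generic

db = {}
manager = SourceManager(db)
first = manager.get_generic()  # creates and caches the row

db.clear()  # the test's transaction is rolled back

stale = manager.get_generic()  # still returns the cached object...
print(stale is first, 'URL' in db)  # ...even though the row is gone
```

In a test suite, such a cache needs to be reset between tests (or keyed to something that the rollback also resets).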

App engine datastore query issue

I have a weird problem with a couple of queries I am trying to run.
I have built a method which returns a list of tuples from the query:
def get_activeproducts():
    query = Product.gql("WHERE active = True")
    choices = []
    for obj in query:
        choices.append((str(obj.key()), obj.name))
    return choices
The problem is that the result is the same for each call, even if products are deleted or have their 'active' attribute changed to False. The result is only refreshed when I restart the SDK server. In production, it just doesn't change until I change versions.
I have seen a similar issue with one more query where the queried property is a BooleanProperty.
Any idea on how this could be fixed?
EDIT:
I am using the method in a tipfy application. It is used to populate a select field in wtforms; 'choices' takes a list of (value, name) tuples.
class InvoiceForm(Form):
    product = SelectField('Product', choices=get_activeproducts())
I don't have any issue with editing. When I check from the admin end, I can see that certain products are set to False. And even if I empty (delete) the whole list of products, I still get the same list I got the first time.
I am not using caching anywhere in the application.
Your class definition is evaluated once, when the App Engine instance starts, so the choices default is frozen at whatever the query returned at that moment. To make the choices dynamic, you need to set them at runtime.
Here is an example from the wtforms docs (which, IIRC, is what tipfy uses); it will need to be adjusted for App Engine queries:
class UserDetails(Form):
    group_id = SelectField(u'Group', coerce=int)

def edit_user(request, id):
    user = User.query.get(id)
    form = UserDetails(request.POST, obj=user)
    form.group_id.choices = [(g.id, g.name) for g in Group.query.order_by('name')]
When you define your form class, the function is called once. You can overload the form's __init__ method to do this cleanly:
class InvoiceForm(Form):
    product = SelectField(u'Group', choices=[])

    def __init__(self, product_select, *args, **kwargs):
        super(InvoiceForm, self).__init__(*args, **kwargs)
        self.product.choices = product_select

form = InvoiceForm(product_select=get_activeproducts())
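The underlying pitfall can be shown without App Engine or wtforms: an expression in a class body runs once, at import time, while code in __init__ runs per instance (all names here are hypothetical):

```python
PRODUCTS = ['widget']  # stands in for the datastore contents

def get_activeproducts():
    return list(PRODUCTS)

class StaticForm:
    # evaluated exactly once, when the class statement executes
    choices = get_activeproducts()

class DynamicForm:
    def __init__(self):
        # evaluated on every instantiation, so it sees fresh data
        self.choices = get_activeproducts()

PRODUCTS.append('gadget')  # the "datastore" changes after import

print(StaticForm.choices)     # frozen at class definition
print(DynamicForm().choices)  # reflects current data
```

This is why the select field only updates when the SDK server restarts or a new version is deployed: that is the only time the class body is re-executed.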

Django admin list_display weirdly slow with foreign keys

Django 1.2.5
Python: 2.5.5
My admin list for a sports model has just become really slow (5 minutes for 400 records). It was returning in a second or so until we got to 400 games, 50-odd teams and 2 sports.
I have fixed it in an awful way, so I'd like to see if anyone has seen this before. My app looks like this:
models:
Sport(models.Model)
    name

Venue(models.Model)
    name

Team(models.Model)
    name

Fixture(models.Model)
    date
    sport = models.ForeignKey(Sport)
    venue = models.ForeignKey(Venue)

TeamFixture(Fixture)
    team1 = models.ForeignKey(Team, related_name="Team 1")
    team2 = models.ForeignKey(Team, related_name="Team 2")
admin:
TeamFixture_ModelAdmin(ModelAdmin)
    list_display = ('date', 'sport', 'venue', 'team1', 'team2',)
If I remove all foreign keys from list_display then it's quick; as soon as I add any foreign key it is slow.
I fixed it by using non-foreign-key attributes, calculated in the model's __init__, so this works:
models:
TeamFixture(Fixture)
    team1 = models.ForeignKey(Team, related_name="Team 1")
    team2 = models.ForeignKey(Team, related_name="Team 2")
    sport_name = ""
    venue_name = ""
    team1_name = ""
    team2_name = ""

    def __init__(self, *args, **kwargs):
        super(TeamFixture, self).__init__(*args, **kwargs)
        self.sport_name = self.sport.name
        self.venue_name = self.venue.name
        self.team1_name = self.team1.name
        self.team2_name = self.team2.name
admin:
TeamFixture_ModelAdmin(ModelAdmin)
    list_display = ('date', 'sport_name', 'venue_name', 'team1_name', 'team2_name',)
Administration for all other models is fine with several thousand records at the moment, and all views in the actual site are functioning fine.
It's driving me crazy. list_select_related is set to True, yet adding a foreign key (to User, for instance) to list_display generates one query per row in the admin, which makes the listing slow. Since select_related is in effect, the Django admin shouldn't issue that query for each row.
What is going on ?
The first thing I would look at is the database calls. If you haven't already, install django-debug-toolbar. That awesome tool lets you inspect all SQL queries made for the current request. I assume there are lots of them; if you look at them, you will know where to look for the problem.
One problem I myself have run into: when the __unicode__ method of a model uses a foreign key, that leads to one database hit per instance. I know of two ways to overcome this problem:
use select_related, which is usually your best bet.
make your __unicode__ return a static string and override the save method to update this string accordingly.
This is a very old problem with the Django admin and foreign keys: whenever it loads an object, it also fetches the objects behind its foreign keys. So let's say you are trying to load a fixture with some teams (say about 100): it will keep pulling in all 100 teams in one go. You can optimize this with raw_id_fields. Instead of fetching everything at once, it limits the number of calls and only makes one when an event is triggered (i.e. when you are selecting a team).
If that seems a bit like a UI mess you can try using this class:
"""
For Raw_id_field to optimize django performance for many to many fields
"""
class RawIdWidget(ManyToManyRawIdWidget):
def label_for_value(self, value):
values = value.split(',')
str_values = []
key = self.rel.get_related_field().name
for v in values:
try:
obj = self.rel.to._default_manager.using(self.db).get(**{key: v})
x = smart_unicode(obj)
change_url = reverse(
"admin:%s_%s_change" % (obj._meta.app_label, obj._meta.object_name.lower()),
args=(obj.pk,)
)
str_values += ['<strong>%s</strong>' % (change_url, escape(x))]
except self.rel.to.DoesNotExist:
str_values += [u'No input or index in the db']
return u', '.join(str_values)
class ImproveRawId(admin.ModelAdmin):
raw_id_fields = ('created_by', 'updated_by')
def formfield_for_dbfield(self, db_field, **kwargs):
if db_field.name in self.raw_id_fields:
kwargs.pop("request", None)
type = db_field.rel.__class__.__name__
kwargs['widget'] = RawIdWidget(db_field.rel, site)
return db_field.formfield(**kwargs)
return super(ImproveRawId, self).formfield_for_dbfield(db_field, **kwargs)
Just make sure that you inherit the class properly, e.g. TeamFixture_ModelAdmin(ImproveRawId). This will most likely give you a pretty good performance boost in your Django admin.
I fixed my problem by setting list_select_related to the list of related model fields instead of just True.
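A sketch of that fix, with field names taken from the question (note this is an assumption about the poster's setup, and passing a sequence to list_select_related requires a newer Django than the 1.2.5 mentioned above, which only accepts True/False):

```python
from django.contrib import admin

class TeamFixture_ModelAdmin(admin.ModelAdmin):
    list_display = ('date', 'sport', 'venue', 'team1', 'team2')
    # Join exactly these FK tables in the changelist query instead of
    # letting the admin issue one extra query per displayed row:
    list_select_related = ('sport', 'venue', 'team1', 'team2')
```

Listing the fields explicitly also avoids the over-eager joins that a bare select_related() can produce on models with many foreign keys.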
