Django save behaving randomly - python

I have a Story model with a M2M relationship to some Resource objects. Some of the Resource objects are missing a name so I want to copy the title of the Story to the assigned Resource objects.
Here is my code:
from collector import models
from django.core.paginator import Paginator

paginator = Paginator(models.Story.objects.all(), 1000)

def fix_issues():
    for page in range(1, paginator.num_pages + 1):
        for story in paginator.page(page).object_list:
            name_story = story.title
            for r in story.resources.select_subclasses():
                if r.name != name_story:
                    r.name = name_story
                    r.save()
                if len(r.name) == 0:
                    print("Something went wrong: " + name_story)
        print("done processing page %s out of %s" % (page, paginator.num_pages))

fix_issues()
I need to use a paginator because I'm dealing with a million objects. The weird part is that after calling fix_issues() about half of my resources that had no name, now have the correct name, while the other half still has no name. I can call fix_issues() again and again and every time more objects receive a name. This seems really weird to me, why would an object not be updated the first time but only the second time?
Additional information:
The "Something went wrong: " message is never printed.
I'm using select_subclasses from django-model-utils to iterate over all resources (any type).
The story.title is never empty.
No error message is printed, when I run these commands.
I did not override the save method of the Resource model (only the save method of the Story model).
I tried to use @transaction.atomic but the result was the same.
My Model:
class Resource(models.Model):
    name = models.CharField(max_length=200)
    # Important for retrieving the correct subtype.
    objects = InheritanceManager()

    def __str__(self):
        return str(self.name)

class CustomResource(Resource):
    homepage = models.CharField(max_length=3000, default="", blank=True, null=True)

class Story(models.Model):
    url = models.URLField(max_length=3000)
    resources = models.ManyToManyField(Resource)
    popularity = models.FloatField()

    def _update_popularity(self):
        self.popularity = 3

    def save(self, *args, **kwargs):
        super(Story, self).save(*args, **kwargs)
        self._update_popularity()
        super(Story, self).save(*args, **kwargs)
Documentation for the select_subclasses:
http://django-model-utils.readthedocs.io/en/latest/managers.html#inheritancemanager
Further investigation:
I thought that maybe select_subclasses did not return all the objects. Right now every story has exactly one resource. So it was easy enough to check that select_subclasses always returns one item. This is the function I used:
def find_issues():
    for page in range(1, paginator.num_pages + 1):
        for story in paginator.page(page).object_list:
            assert(len(story.resources.select_subclasses()) == 1)
        print("done processing page %s out of %s" % (page, paginator.num_pages))
But again, this executes without any problems, so I don't think select_subclasses is to blame. I also checked whether paginator.num_pages is right, and it is: if I divide the number of stories in my database by 1000 (items per page), I get exactly that number of pages.

I think I know what is happening:
The Paginator loads a Queryset and gives me the first n items. I process these and update some of the values. But for the next iteration the order of the items in the queryset changes (because I updated some of them and did not define an order). So I'm skipping over items that are now on the first page. I can avoid it by specifying an order (pk for example).
If you think I'm wrong, please let me know. Otherwise I will accept this as the correct answer. Thank you.
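The skipping described above can be reproduced without a database. What follows is a minimal sketch, not Django code: it assumes a hypothetical storage engine that moves a row to the end of its default (unordered) scan whenever the row is updated, and it pages through the rows with offset slicing the way a paginator does. The names paginate and run are made up for this illustration.

```python
def paginate(rows, page, per_page):
    """Return one page of rows using offset slicing, like LIMIT/OFFSET."""
    start = (page - 1) * per_page
    return rows[start:start + per_page]


def run(ordered, total=10, per_page=2):
    """Fix every row page by page and return the pks that were never fixed."""
    # Simulated table; list position stands for the unordered scan order.
    table = [{"pk": i, "name": ""} for i in range(1, total + 1)]
    num_pages = total // per_page

    for page in range(1, num_pages + 1):
        # Each page is a fresh query: either explicitly ordered by pk,
        # or in whatever order the table currently happens to be in.
        if ordered:
            view = sorted(table, key=lambda row: row["pk"])
        else:
            view = list(table)
        for row in paginate(view, page, per_page):
            row["name"] = "fixed"
            # Assumption: an update moves the row to the end of the scan order.
            table.remove(row)
            table.append(row)

    return sorted(row["pk"] for row in table if row["name"] == "")


print(run(ordered=False))  # some rows are never visited
print(run(ordered=True))   # every row is visited exactly once
```

In Django terms, the ordered case corresponds to building the paginator from an explicitly ordered queryset, for example Paginator(models.Story.objects.order_by('pk'), 1000).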

Related

How to resolve name error on variable in class

I am making my first model, and I'm creating an upload system which uploads to a folder with the name of the user uploading it.
For some reason, I get this error when I try to create an object from the model:
NameError at /admin/tracks/track/add/
name '_Track__user_name' is not defined
Here's my models.py
from django.core.exceptions import ValidationError
from django.db import models
from django.core.files.images import get_image_dimensions

# Create your models here.
class Track(models.Model):
    user_name = "no_user"

    def get_username():
        user_name = "no_user"
        if request.user.is_authenticated():
            user_name = request.user.username
        else:
            user_name = "DELETE"

    def generate_user_folder_tracks(instance, filename):
        return "uploads/users/%s/tracks/%s" % (user_name, filename)

    def is_mp3(value):
        if not value.name.endswith('.mp3'):
            raise ValidationError(u'You may only upload mp3 files for tracks!')

    def generate_user_folder_art(instance, filename):
        return "uploads/users/%s/art/%s" % (user_name, filename)

    def is_square_png(self):
        if not self.name.endswith('.png'):
            raise ValidationError("You may only upload png files for album art!")
        else:
            w, h = get_image_dimensions(self)
            if not h == w:
                raise ValidationError("This picture is not square! Your picture must be equally wide as its height.")
            else:
                if not (h + w) >= 1000:
                    raise ValidationError("This picture is too small! The minimum dimensions are 500 by 500 pixels.")
        return self

    # Variables
    track_type_choices = [
        ('ORG', 'Original'),
        ('RMX', 'Remix'),
        ('CLB', 'Collab'),
        ('LIV', 'Live'),
    ]

    # Model Fields
    name = models.CharField(max_length=100)
    desc = models.TextField(max_length=7500)
    track_type = models.CharField(max_length=3,
                                  choices=track_type_choices,
                                  default='ORG')
    track_type_content = models.CharField(max_length=100, blank=True)
    created = models.TimeField(auto_now=True, auto_now_add=False)
    upload = models.FileField(upload_to=generate_user_folder_tracks, validators=[is_mp3])
    albumart = models.ImageField(upload_to=generate_user_folder_art, validators=[is_square_png])
As you can see from the first line after the class is defined, there is clearly a variable called "user_name", and when using my upload functions, it is supposed to use this variable for the folder name.
I am very confused as to why this is throwing an error. What am I doing wrong?
You have some serious problems with variable scope here. Just defining an attribute called "user_name" at the top of the class does not automatically give you access to it elsewhere in the class; you would need to access it via the class itself. Usually you do that through the self variable that is the first parameter to every method.
However, many of your methods do not even accept a self parameter, so they would raise a TypeError when called. On top of that, your user_name attribute is actually a class attribute, which would be shared by all instances of Track - this would clearly be a bad thing. You should really make it a Django field, like the other attributes.
Finally, your scope issues worsen when you try and access request in one of those methods. Again, you can't access a variable unless it has been passed to that method (or is available in global scope, which the request is definitely not). So get_username cannot work at all.
I must say though that all that is irrelevant, as the error you get does not even match your code: you must have accessed Track.__user_name somewhere to get that error.
You do have a variable user_name, but it's not a field, which means the queryset you appear to be building won't find it.
user_name = "no_user"
should be one of the following
user_name = models.CharField(default='no_user')
user = models.ForeignKey(settings.AUTH_USER_MODEL, null=True)
The only reason I've suggested a CharField here is in case you don't use some form of authorisation user model in your app. If you do, then you should use a foreign key to that model.
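Once the user is stored on the model, the upload path can be derived from the instance itself. Django calls an upload_to callable with the model instance and the original filename, so a module-level function can read the user from there. The following is a minimal sketch under stated assumptions: it assumes the user foreign key suggested above, and it uses SimpleNamespace objects as stand-ins for a real Track instance so that it runs without a Django project.

```python
from types import SimpleNamespace


def generate_user_folder_tracks(instance, filename):
    """Build the upload path from the instance Django passes to upload_to."""
    # `instance.user` is the hypothetical ForeignKey suggested above;
    # it may be None if the field is nullable and no user was set.
    user = getattr(instance, "user", None)
    user_name = user.username if user is not None else "no_user"
    return "uploads/users/%s/tracks/%s" % (user_name, filename)


# Stand-ins for model instances, for illustration only.
track = SimpleNamespace(user=SimpleNamespace(username="alice"))
orphan = SimpleNamespace(user=None)

print(generate_user_folder_tracks(track, "song.mp3"))
print(generate_user_folder_tracks(orphan, "song.mp3"))
```

Defining the function at module level, above the model class, avoids the scope problems described in the first answer, because the function no longer depends on names defined inside the class body.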

Django Many to Many Relationship Add Not Working

I'm using Django's ManyToManyField for one of my models.
class Requirement(models.Model):
    name = models.CharField(max_length=200)

class Course(models.Model):
    requirements = models.ManyToManyField(Requirement)
I want to be able to assign requirements for my classes, so to do that, I try the following: I get a class, course, that is already saved or that I have just saved, and I run the following:
c = Course.objects.get(title="STACK 100")
req = Requirement.objects.get(name="XYZ")
c.requirements.add(req)
While this works when I do it through the Django manage.py shell, it does not work when I do it programmatically in a script. I work with other models in this script and that all works fine. I even know that it successfully retrieves the current course and the requirement, as I check both. I can't figure out what the problem is!
EDIT:
What I mean by not working is that the requirements field of the course remains empty. For example, if I do c.requirements.all(), I'll get an empty list. However, if I do this through the shell, the list will be populated. The script is a crawler that uses BeautifulSoup to crawl a website. I try to add requirements to courses in the following function:
def create_model_object(self, course_dict, req, sem):
    semester = Semester.objects.get(season=sem)
    # Checks if the course already exists in the database
    existing_matches = Course.objects.filter(number=course_dict["number"])
    if len(existing_matches) > 0:
        existing_course = existing_matches[0]
        if sem == "spring":
            existing_course.spring = semester
        else:
            existing_course.fall = semester
        existing_course.save()
        c = existing_course
    # Creates a new Course in the database
    else:
        if sem == "spring":
            new_course = Course(title=course_dict["title"],
                                spring=semester)
        else:
            new_course = Course(title=course_dict["title"],
                                fall=semester)
        new_course.save()
        c = new_course
    curr_req = Requirement.objects.get(name=req)
    c.requirements.add(curr_req)
    print(c.id)
EDIT 2:
After stepping into the function, this is what I found:
def __get__(self, instance, instance_type=None):
    if instance is None:
        return self
    rel_model = self.related.related_model
    manager = self.related_manager_cls(
        model=rel_model,
        query_field_name=self.related.field.name,
        prefetch_cache_name=self.related.field.related_query_name(),
        instance=instance,
        symmetrical=False,
        source_field_name=self.related.field.m2m_reverse_field_name(),
        target_field_name=self.related.field.m2m_field_name(),
        reverse=True,
        through=self.related.field.rel.through,
    )
    return manager
And according to my debugger, manager is of type planner(my project name).Course.None.
Check the last value of the sequence for the join table and update it if necessary. Django does not throw an error if the pk already exists.
ALTER SEQUENCE schema.table_id_seq RESTART <last_number + 1>;
I think you need to call c.save() after c.requirements.add(curr_req)

Is there a sane way to simulate virtual inheritance in Django models?

I want to log actions made by users. In most OO languages, I would implement this via a LoggedAction class with several child classes like LoginAction and LogoutAction. I could then iterate over a list of LoggedActions and get the specific child behaviour through virtual inheritance. This does not work using Django models, however.
Example models.py:
class LoggedAction(models.Model):
    user = models.ForeignKey(User)
    timestamp = models.DateTimeField(auto_now_add=True)

    def __unicode__(self):
        return "%s: %s %s" % (unicode(self.timestamp), unicode(self.user), unicode(self.action()))

    def action(self):
        return ""

class LoginAction(LoggedAction):
    def action(self):
        return "logged in"

class LogoutAction(LoggedAction):
    def action(self):
        return "logged out"
Then I'd like to do [unicode(l) for l in LoggedAction.objects.all()] and get a list of messages like u'2012-02-18 18:47:09.105840: knatten logged in'.
As expected, this does not work, since what I get from all() is a list of LoggedAction objects having either a loginaction member or a logoutaction member. (The output is a list of messages like u'2012-02-18 18:47:09.105840: knatten ', with no mention of the action.)
Is there a sane way to get the behaviour I'm after, or am I trying to apply the wrong paradigm here? (I guess I am, and that I should just have the specific action as a member in LoggedAction)
Yes, this is probably the wrong paradigm. It's easy to be misled by the object-relational mapper (ORM) - database tables don't really map all that well to objects, and this difference is known as the object-relational impedance mismatch.
What you actually need is to make action a field. This field can take a choices parameter which represents the possible values of that field - ie logged in or logged out:
class LoggedAction(models.Model):
    ACTIONS = (
        ('I', 'logged in'),
        ('O', 'logged out')
    )
    user = models.ForeignKey(User)
    timestamp = models.DateTimeField(auto_now_add=True)
    action = models.CharField(max_length=1, choices=ACTIONS)

    def __unicode__(self):
        return u"%s: %s %s" % (self.timestamp, self.user, self.get_action_display())
Note that I've used arbitrary single-character strings to represent the actions, and the get_action_display() magic method to get the full description.
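To make the role of get_action_display() concrete, here is a rough plain-Python sketch of the lookup it performs: the stored single-character value is mapped to its label through the choices, and the raw value is returned when no choice matches. The function name action_display is made up for illustration and is not part of Django.

```python
ACTIONS = (
    ('I', 'logged in'),
    ('O', 'logged out'),
)


def action_display(value, choices=ACTIONS):
    """Return the human-readable label for a stored choice value."""
    # Fall back to the raw value when it is not among the choices.
    return dict(choices).get(value, value)


print(action_display('I'))
print(action_display('O'))
```

Only the short code is stored in the database, so the labels can be reworded later without touching existing rows.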
Have a look at InheritanceManager from django-model-utils. It allows you to get the concrete subclasses.

Django admin list_display weirdly slow with foreign keys

Django 1.2.5
Python: 2.5.5
My admin list of a sports model has just gone really slow (5 minutes for 400 records). It was returning in a second or so until we got 400 games, 50 odd teams and 2 sports.
I have fixed it in an awful way so I'd like to see if anyone has seen this before. My app looks like this:
models:
    Sport( models.Model )
        name
    Venue( models.Model )
        name
    Team( models.Model )
        name
    Fixture( models.Model )
        date
        sport = models.ForeignKey(Sport)
        venue = models.ForeignKey(Venue)
    TeamFixture( Fixture )
        team1 = models.ForeignKey(Team, related_name="Team 1")
        team2 = models.ForeignKey(Team, related_name="Team 2")
admin:
    TeamFixture_ModelAdmin (ModelAdmin)
        list_display = ('date','sport','venue','team1','team2',)
If I remove any foreign keys from list_display then it's quick. As soon as I add any foreign key then slow.
I fixed it by using non foreign keys but calculating them in the model init so this works:
models:
    TeamFixture( Fixture )
        team1 = models.ForeignKey(Team, related_name="Team 1")
        team2 = models.ForeignKey(Team, related_name="Team 2")
        sport_name = ""
        venue_name = ""
        team1_name = ""
        team2_name = ""

        def __init__(self, *args, **kwargs):
            super(TeamFixture, self).__init__(*args, **kwargs)
            self.sport_name = self.sport.name
            self.venue_name = self.venue.name
            self.team1_name = self.team1.name
            self.team2_name = self.team2.name
admin:
    TeamFixture_ModelAdmin (ModelAdmin)
        list_display = ('date','sport_name','venue_name','team1_name','team2_name',)
The admin for all other models is fine with several thousand records at the moment, and all views on the actual site are functioning fine.
It's driving me crazy. list_select_related is set to True, however adding a foreign key to User in the list_display generates one query per row in the admin, which makes the listing slow. Select_related is True, so the Django admin shouldn't call this query on each row.
What is going on?
The first thing I would look at is the database calls. If you haven't done so already, install django-debug-toolbar. That awesome tool lets you inspect all SQL queries made for the current request. I assume there are lots of them. If you look at them, you will know where to look for the problem.
One problem I myself have run into: When the __unicode__ method of a model uses a foreign key, that leads to one database hit per instance. I know of two ways to overcome this problem:
use select_related, which usually is your best bet.
make your __unicode__ return a static string and override the save method to update this string accordingly.
This is a very old problem with the Django admin and foreign keys. What happens here is that whenever you try to load an object, it tries to get all the objects for that foreign key. So let's say you are trying to load a fixture with some teams (say the number of teams is about 100): it's going to keep on including all 100 teams in one go. You can try to optimize this by using raw_id_fields. Instead of fetching everything at once, this limits the number of calls and makes sure that a call is only made when an event is triggered (i.e. when you are selecting a team).
If that seems a bit like a UI mess you can try using this class:
# Import paths below are the ones used by old Django 1.x releases.
from django.contrib import admin
from django.contrib.admin.sites import site
from django.contrib.admin.widgets import ManyToManyRawIdWidget
from django.core.urlresolvers import reverse
from django.utils.encoding import smart_unicode
from django.utils.html import escape

"""
For Raw_id_field to optimize django performance for many to many fields
"""
class RawIdWidget(ManyToManyRawIdWidget):
    def label_for_value(self, value):
        values = value.split(',')
        str_values = []
        key = self.rel.get_related_field().name
        for v in values:
            try:
                obj = self.rel.to._default_manager.using(self.db).get(**{key: v})
                x = smart_unicode(obj)
                change_url = reverse(
                    "admin:%s_%s_change" % (obj._meta.app_label, obj._meta.object_name.lower()),
                    args=(obj.pk,)
                )
                # Link each label to the change page of the related object
                str_values += ['<strong><a href="%s">%s</a></strong>' % (change_url, escape(x))]
            except self.rel.to.DoesNotExist:
                str_values += [u'No input or index in the db']
        return u', '.join(str_values)

class ImproveRawId(admin.ModelAdmin):
    raw_id_fields = ('created_by', 'updated_by')

    def formfield_for_dbfield(self, db_field, **kwargs):
        if db_field.name in self.raw_id_fields:
            kwargs.pop("request", None)
            type = db_field.rel.__class__.__name__
            kwargs['widget'] = RawIdWidget(db_field.rel, site)
            return db_field.formfield(**kwargs)
        return super(ImproveRawId, self).formfield_for_dbfield(db_field, **kwargs)
Just make sure that you inherit from the class properly. I am guessing something like TeamFixture_ModelAdmin(ImproveRawId). This will most likely give you a noticeable performance boost in your Django admin.
I fixed my problem by setting list_select_related to the list of related model fields instead of just True

Django, making a page activate for a fixed time

Greetings
I am hacking on Django and trying to build the following:
Like woot.com, I want to sell one item per day, so only one item will be available on a given day (say the default www.mysite.com redirects to that item).
Assume the URLs for these items look like this: www.mysite.com/item/<number>
my model for item:
class Item(models.Model):
    item_name = models.CharField(max_length=30)
    price = models.FloatField()
    content = models.TextField()  # keeps all the html content
    start_time = models.DateTimeField()
    end_time = models.DateTimeField()
And my view for rendering this:
def results(request, item_id):
    item = get_object_or_404(Item, pk=item_id)
    now = datetime.now()
    if item.start_time > now:
        pass  # render and return some "not started yet" error template
    elif item.end_time < now:
        pass  # render and return some "item selling ended" error template
    else:
        pass  # render the real template for selling this item
What would be an efficient and clever model and template for achieving this?
It seems you've got the basics figured out, so I'm assuming you're asking for polishing suggestions... A few ideas in this vein:
I think I'd have a separate URL like /items/today/ for this, or perhaps just /today/.
You'll want to use only the date component of datetime.datetime.now(). The whole thing is an object containing the current time specified to a microsecond's precision.
How about using a single base template for all three cases and inheriting from it to change a block containing either the button to click on when buying, the price etc., or a note saying that the item is not being sold yet / any more? Then people can still use the numbered URLs when saying things like "See what I bought yesterday, you have to go to that site" in an e-mail. ;-)
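The date-only comparison suggested above could look roughly like this. It is a minimal sketch in plain Python rather than a Django view: item_state is a hypothetical helper, and the start_time and end_time arguments stand in for the fields of the Item model from the question.

```python
from datetime import date, datetime


def item_state(start_time, end_time, today):
    """Classify an item as 'upcoming', 'ended' or 'active' by calendar date."""
    # Compare calendar dates only, ignoring the time of day.
    if start_time.date() > today:
        return "upcoming"
    if end_time.date() < today:
        return "ended"
    return "active"


# Example values standing in for Item.start_time and Item.end_time.
start = datetime(2024, 1, 10, 0, 0)
end = datetime(2024, 1, 10, 23, 59)

print(item_state(start, end, date(2024, 1, 9)))
print(item_state(start, end, date(2024, 1, 10)))
print(item_state(start, end, date(2024, 1, 11)))
```

A view could call such a helper with date.today() and pick which block of the shared base template to render from the returned state.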
I have a photo of the day feature on my site. I have a model that represents today's photo, and a cron job runs a custom management command at midnight to update it with the next photo in the sequence (also a model). So all my view has to do is retrieve the current photo from the database.
