Background:
I scrape data from 2 sources for upcoming properties for sale; let's call one SaleAnnouncement and the other SellerMaintainedData. They share many of the same field names (although some data can only be found in one and not the other). If an item is coming up for sale, there is guaranteed to be a SaleAnnouncement, but not necessarily SellerMaintainedData. In fact, only about 10% of the "sellers" maintain their own site with relevant data. However, those that do always have more information, and that data is more up to date than the data in the announcement. Also, the "announcement" is free-form text which needs to go through several processing steps before the relevant data is extracted. As such, that model has some fields to store data from intermediate processing steps (part of the reason I opted for 2 models as opposed to combining them into 1), while the "seller" data is scraped in a neat tabular format.
Problem:
I would ultimately like to combine them into one SaleItem and have implemented a model which is related to the previous 2 models and relies heavily on properties to prioritize which model the data comes from. Something like:
@property
def sale_datetime(self):
    # Prefer the seller-maintained value; fall back to the latest announcement
    if self.sellermaintaineddata and self.sellermaintaineddata.sale_datetime:
        return self.sellermaintaineddata.sale_datetime
    else:
        return self.latest_announcement and self.latest_announcement.sale_datetime
However, I obviously won't be able to query those fields, which is my end goal when listing upcoming sales. One solution suggested to me involves creating a custom manager that overrides the filter/exclude methods. It sounds promising, but I would have to duplicate all the property logic in the model manager.
Summary (for clarity)
I have:
class SourceA(Model):
    sale_datetime = ...
    address = ...
    parcel_number = ...
    # other attrs...

class SourceB(Model):
    sale_datetime = ...
    address = ...
    # no parcel number here
    # other attrs...
I want:
class Combined(Model):
    sale_datetime = ...  # from SourceB if SourceB exists, else from SourceA
    ...
I want a unified model where common fields between SourceA and SourceB are prioritized so that if SourceB exists it derives the value of that field from SourceB or else it comes from SourceA. I would also like to query those fields so maybe using properties is not the best way...
Question
Is there a better way, should I consider restructuring my models (possibly combining those 2), or is the custom manager solution the way to go?
I would suggest another solution: what about using inheritance? You could create an abstract base class (https://docs.djangoproject.com/en/1.9/topics/db/models/#abstract-base-classes), put all the common fields there, and then create separate models for SaleAnnouncement and SellerMaintainedData. Since both of them inherit from your base model, you only have to define the fields specific to each model.
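An abstract base class removes the duplicated field definitions, but the prioritized values still need to be queryable. One database-level option is COALESCE over a LEFT JOIN, which Django exposes as Coalesce in django.db.models.functions. The following is only a minimal sketch of that idea using the standard-library sqlite3 module rather than the ORM; the table names, column names, and sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical, simplified tables standing in for the two scraped sources.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_a (id INTEGER PRIMARY KEY, sale_datetime TEXT, address TEXT);
    CREATE TABLE source_b (
        id INTEGER PRIMARY KEY,
        source_a_id INTEGER REFERENCES source_a(id),
        sale_datetime TEXT
    );
    INSERT INTO source_a VALUES (1, '2016-05-01 10:00', '1 Main St');
    INSERT INTO source_a VALUES (2, '2016-05-02 10:00', '2 Oak Ave');
    -- Only the first item has seller-maintained data (made-up values).
    INSERT INTO source_b VALUES (1, 1, '2016-05-03 09:00');
""")

# COALESCE returns its first non-NULL argument, so source_b wins when present.
COMBINED = """
    SELECT a.id,
           a.address,
           COALESCE(b.sale_datetime, a.sale_datetime) AS sale_datetime
    FROM source_a AS a
    LEFT JOIN source_b AS b ON b.source_a_id = a.id
"""


def upcoming_sales(after):
    """Return combined rows whose prioritized sale_datetime is later than `after`."""
    query = (
        "SELECT id, address, sale_datetime FROM (" + COMBINED + ") "
        "WHERE sale_datetime > ? ORDER BY sale_datetime"
    )
    return conn.execute(query, (after,)).fetchall()
```

Because the fallback happens inside the query, the combined value can be filtered and sorted like any other column, which a Python property cannot offer.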
Related
For some contrived reason I have two QuerySets which match up in row-order, but don't have a shared foreign key, so a join is not possible. I don't have the ability to add a key here, so I'd like to do a "hstack" of the two results and display as a table. This is easy with jinja templating, but I'd like to use the convenience functions of tables2 (e.g. sorting, etc) and I would like to still retain the ability to do foreign key traversal on each queryset.
Equivalently, consider providing a QuerySet and a list of external data that is the result of an operation on that QuerySet.
qs = ModelA.objects.filter( ... ) # queryset
ext_data = get_metadata_from_elsewhere(qs) # returns a list of dict
# len(qs) == len(ext_data)
For example, with two models I can create a Mixin:
class ModelATable(tables.Table):
    class Meta:
        model = ModelA

class ModelBTable(ModelATable, tables.Table):
    class Meta:
        model = ModelB
Which produces a rendered table with the fields from both models. If I supply ModelBTable(query_model_b) then only those fields are displayed as expected, and similarly for ModelBTable(query_model_a). How do I provide both query_model_a and query_model_b?
Also if there's an easy way to do hstack(query_a, query_b) then that seems like it'd be easier. Providing a dictionary of the combined results isn't great because I lose access to foreign keys, but I suppose I could add some logic to do that while generating the merged dictionary? But it's nice that tables2 automatically infers things based on the model field type, and I'd lose that.
I assume internally tables.Table just iterates through the provided data and tries to access by key. So I think I need to provide a data object that can resolve names from either model?
EDIT: Seems like defining the columns to be returned and providing a custom accessor might do the job, but it seems like overkill and I don't know which functions ought to be overridden (resolve, at least).
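One way to picture "a data object that can resolve names from either model" is a thin wrapper that tries the model instance first and the external dict second. This is a plain-Python sketch, not django-tables2 API: PairedRow and hstack are hypothetical names, SimpleNamespace objects stand in for model instances, and whether the table class accepts such rows as data is an assumption that would need checking.

```python
from types import SimpleNamespace


class PairedRow:
    """Wrap a model instance plus a dict of extra data as one row object."""

    def __init__(self, instance, extra):
        self._instance = instance
        self._extra = extra

    def __getattr__(self, name):
        # Called only when normal attribute lookup on the wrapper fails.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return getattr(self._instance, name)
        except AttributeError:
            pass
        try:
            return self._extra[name]
        except KeyError:
            raise AttributeError(name) from None


def hstack(instances, ext_data):
    """Pair two equally long, row-aligned sequences into PairedRow objects."""
    instances, ext_data = list(instances), list(ext_data)
    if len(instances) != len(ext_data):
        raise ValueError("sequences must have the same length")
    return [PairedRow(obj, extra) for obj, extra in zip(instances, ext_data)]


# Stand-ins for model instances; related objects stay reachable by attribute.
owner = SimpleNamespace(name="Ann")
qs = [SimpleNamespace(title="first", owner=owner),
      SimpleNamespace(title="second", owner=owner)]
ext_data = [{"score": 0.9}, {"score": 0.4}]
rows = hstack(qs, ext_data)
```

Since the wrapper delegates to the original instance, traversals such as rows[0].owner.name keep working, which a merged dictionary would lose.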
I'm trying to learn Django from a background of coding the database schema directly myself. I want to understand how I should be effectively using the database abstraction tools to normalize.
As a contrived example, let's say I have a conversation that can ask questions on 3 subjects, and each question is complicated enough to warrant its own Class.
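from django.db import models

class Conversation(models.Model):
    partner = models.CharField(max_length=100)  # max_length is required; 100 is arbitrary

class Weather_q(models.Model):
    pass  # stuff

class Health_q(models.Model):
    pass  # stuff

class Family_q(models.Model):
    pass  # stuff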
So let's say I want to have 2 conversations:
Conversation 1 with Bob: ask two different weather questions and one question about his health
Conversation 2 with Alice: ask about the weather and her family
Usually, I would code myself a normalization table for this:
INSERT INTO Conversation (partner) VALUES ('Bob'), ('Alice'); -- primary keys = 1 and 2
INSERT INTO NormalizationTable (fk_Conversation, fk_Weather_q, fk_Health_q, fk_Family_q) VALUES
(1,1,0,0), -- Bob weather#1
(1,2,0,0), -- Bob weather#2
(1,0,1,0), -- Bob health#1
(2,1,0,0), -- Alice weather#1
(2,0,0,1); -- Alice family#1
Do I need to explicitly create this normalization table or is that discouraged?
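class NormalizationTable(models.Model):
    fk_Conversation = models.ForeignKey(Conversation, on_delete=models.CASCADE)
    # Each row points at only one question, so the question keys are nullable
    fk_Weather_q = models.ForeignKey(Weather_q, null=True, blank=True, on_delete=models.CASCADE)
    fk_Health_q = models.ForeignKey(Health_q, null=True, blank=True, on_delete=models.CASCADE)
    fk_Family_q = models.ForeignKey(Family_q, null=True, blank=True, on_delete=models.CASCADE)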
I then wanted to execute the conversations. I wrote a view like this (skipping exception catching and logic to iterate through multiple questions per conversation):
from django.shortcuts import render
from myapp.models import Conversation, NormalizationTable

def converse(request, mypartner):
    # get this conversation
    # assuming "mypartner" is provided by the URL dispatcher
    conversation = Conversation.objects.filter(partner=mypartner)[0]
    # get the relevant rows of the NormalizationTable
    questions = NormalizationTable.objects.filter(fk_Conversation=conversation)
    for question in questions:
        # each foreign key attribute already holds the related question object
        if question.fk_Weather_q:
            return render(request, "weather.html", {"question": question.fk_Weather_q})
        if question.fk_Health_q:
            return render(request, "health.html", {"question": question.fk_Health_q})
        if question.fk_Family_q:
            return render(request, "family.html", {"question": question.fk_Family_q})
Considered holistically, is this the "Django" way to solve this kind of normalization problem (N objects associated with a container object)? Can I make better use of Django's inbuilt ORM or other tools?
Leaving aside "normalization tables" (the term is unfamiliar to me), this is what I think is a "djangish" way of solving your problem. Please note that I went with your statement "each question is complicated enough to warrant its own Class". For me this means that every type of question necessitates its own unique fields and methods. Otherwise I would create a single Question model connected to a Category model by a ForeignKey.
from django.db import models

class Partner(models.Model):
    name = models.CharField(max_length=100)  # max_length is required; 100 is arbitrary

class Question(models.Model):
    # Fields and methods common to all kinds of questions
    partner = models.ForeignKey(Partner, on_delete=models.CASCADE)
    label = models.CharField(max_length=100)  # example field

class WeatherQuestion(Question):
    # Fields and methods for weather questions only
    pass

class HealthQuestion(Question):
    # Fields and methods for health questions only
    pass

class FamilyQuestion(Question):
    # Fields and methods for family questions only
    pass
This way you would have a base Question model for all the fields and methods common to all questions, and a bunch of child models describing the different kinds of questions. There is an implicit relation between the base model and its child models, maintained by Django. This gives you the ability to create a single queryset with different questions, no matter their type. Items in this queryset are of the Question type by default, but can be converted to a particular question type by accessing a special attribute (for example, the healthquestion attribute for a HealthQuestion). This is described in detail in the "Multi-table inheritance" section of the Django documentation.
Then in a view you can get a list of (different types of) questions and then detect their particular type:
from django.shortcuts import render
from myapp.models import Question

def converse(request, partner_id):
    question = Question.objects.filter(partner=partner_id).first()
    # Detect question type
    question_type = "other"
    question_obj = question
    # in real life the list of types below would probably live in the settings
    for current_type in ['weather', 'health', 'family']:
        if hasattr(question, current_type + 'question'):
            question_type = current_type
            question_obj = getattr(question, current_type + 'question')
            break
    return render(
        request,
        "questions/{}.html".format(question_type),
        {'question': question_obj}
    )
The code for detecting the question type is quite ugly and complicated. You could make it much simpler and more generic using the InheritanceManager from the django-model-utils package. You would need to install the package and add this line to the Question model:
objects = InheritanceManager()
The view would then look something like this:
from django.shortcuts import render
from myapp.models import Question

def converse(request, partner_id):
    question = Question.objects.filter(partner=partner_id).select_subclasses().first()
    question_type = question._meta.object_name.lower()
    return render(
        request,
        "questions/{}.html".format(question_type),
        {'question': question}
    )
Both views select only a single question - the first one. That's how the view in your example behaved, so I went with it. You could easily convert those examples to return a list of questions (of different types).
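To return a list instead, each question has to be paired with its own template. The sketch below shows that in plain Python, with stand-in classes instead of Django models; question_type and render_plan are hypothetical helper names. Note that a lowercased class name such as weatherquestion differs from the weather/health/family names used in the first view, so the sketch strips the suffix; that naming choice is an assumption.

```python
class Question:
    """Stand-in for the base model; a real one would be a Django model."""

    def __init__(self, label):
        self.label = label


class WeatherQuestion(Question):
    pass


class HealthQuestion(Question):
    pass


def question_type(question, suffix="question"):
    """Derive a short type name from the class name, e.g. 'weather'."""
    name = type(question).__name__.lower()
    if name.endswith(suffix) and name != suffix:
        return name[: -len(suffix)]
    # Plain Question instances fall back to a generic template.
    return "other"


def render_plan(questions):
    """Pair every question with the template path chosen for its type."""
    return [
        ("questions/{}.html".format(question_type(q)), q)
        for q in questions
    ]
```

A view could then loop over render_plan(...) and render each fragment, or pass the pairs to a single template that includes the right partial per question.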
I'm not familiar with the term normalization table, but I see what you're trying to do.
What you've described is not, in my opinion, a very satisfactory way to model a database. The simplest approach would be to make all questions part of the same table, with a "type" field, and maybe some other optional fields that vary between the types. In that case, this becomes very simple in Django.
But, OK, you said "let's say... each question is complicated enough to warrant its own class." Django does have a solution for that, which is generic relations. It would look something like this:
from django.contrib.contenttypes.fields import GenericForeignKey
from django.contrib.contenttypes.models import ContentType
from django.db import models

class ConversationQuestion(models.Model):
    conversation = models.ForeignKey(Conversation, on_delete=models.CASCADE)
    content_type = models.ForeignKey(ContentType, on_delete=models.CASCADE)
    question_id = models.PositiveIntegerField()
    question = GenericForeignKey('content_type', 'question_id')

# you can use prefetch_related("question") for efficiency
cqs = ConversationQuestion.objects.filter(conversation=conversation)
for cq in cqs:
    # do something with the question
    # you can look at the content_type if, as above, you need to choose
    # a separate template for each type.
    print(cq.question)
Because it's part of Django, you get some (but not total) support in terms of the admin, forms, etc.
Or you could do what you've done above, but, as you noticed, it's ugly and doesn't seem to capture the advantages of working with an ORM.
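Conceptually, a generic relation replaces the one-column-per-question-type link table with rows of (conversation, content type, object id). The following plain-Python sketch mirrors the Bob/Alice rows from the question; the dictionaries stand in for database tables, and the question texts and the questions_for helper are made up for illustration.

```python
# Hypothetical in-memory "tables", keyed by primary key.
TABLES = {
    "weather_q": {1: "Is it raining?", 2: "Is it windy?"},
    "health_q": {1: "How do you feel?"},
    "family_q": {1: "How is your sister?"},
}

# One link row per question: (conversation id, content type, question id).
CONVERSATION_QUESTIONS = [
    (1, "weather_q", 1),  # Bob weather#1
    (1, "weather_q", 2),  # Bob weather#2
    (1, "health_q", 1),   # Bob health#1
    (2, "weather_q", 1),  # Alice weather#1
    (2, "family_q", 1),   # Alice family#1
]


def questions_for(conversation_id):
    """Resolve each link row to (content type, question) via its type's table."""
    return [
        (content_type, TABLES[content_type][question_id])
        for conv_id, content_type, question_id in CONVERSATION_QUESTIONS
        if conv_id == conversation_id
    ]
```

Adding a fourth question type then means adding a table, not a new nullable column on every link row.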
I'd like to create a directed graph in Django, but each node could be a separate model, with separate fields, etc.
Here's what I've got so far:
from bannergraph.apps.banners.models import *

class Node(models.Model):
    uuid = UUIDField(db_index=True, auto=True)

    class Meta:
        abstract = True

class FirstNode(Node):
    field_name = models.CharField(max_length=100)
    next_node = UUIDField()

class SecondNode(Node):
    is_something = models.BooleanField(default=False)
    first_choice = UUIDField()
    second_choice = UUIDField()
(obviously FirstNode and SecondNode are placeholders for the more domain-specific models, but hopefully you get the point.)
So what I'd like to do is query all the subclasses at once, returning all of the ones that match. I'm not quite sure how to do this efficiently.
Things I've tried:
1. Iterating over the subclasses with queries - I don't like this, as it could get quite heavy with the number of queries.
2. Making Node concrete. Apparently I still have to check for each subclass, which goes back to #1.
Things I've considered:
Making Node the class, and sticking a JSON blob in it. I don't like this.
Storing pointers in an external table or system. This would mean 2 queries per UUID, where I'd ideally want to have 1, but it would probably do OK in a pinch.
So, am I approaching this wrong, or forgetting about some neat feature of Django? I'd rather not use a schemaless DB if I don't have to (the Django admin is almost essential for this project). Any ideas?
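The InheritanceManager from django-model-utils is what you are looking for. Note that it works with multi-table inheritance, so Node would need to be a concrete model (not abstract) with objects = InheritanceManager() declared on it.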
You can iterate over all your Nodes with:
nodes = Node.objects.filter(foo="bar").select_subclasses()
for node in nodes:
    pass  # logic
I'm building a Django site. I need to model many different product categories such as TV, laptops, women's apparel, men's shoes, etc.
Since different product categories have different product attributes, each category has its own separate Model: TV, Laptop, WomensApparel, MensShoes, etc.
And for each Model I created a ModelForm. Hence I have TVForm, LaptopForm, WomensApparelForm, MensShoesForm, etc.
Users can enter product details by selecting a product category through multi-level drop-down boxes. Once a user has selected a product category, I need to display the corresponding product form.
The obvious way to do this is to use a giant if-elif structure:
# category is the product category selected by the user
if category == "TV":
    form = TVForm()
elif category == "Laptop":
    form = LaptopForm()
elif category == "WomensApparel":
    form = WomensApparelForm()
...
Unfortunately, there could be hundreds of categories, if not more, so the above method is going to be error-prone and tedious.
Is there any way I could use the value of the variable category to directly select and initialize the appropriate ModelForm without resorting to a giant if-elif statement?
Something like:
# This doesn't work
model_form_name = category + "Form"
form = model_form_name()
Is there any way to do this?
If all your *Form classes are in the one module (let's call it forms), you can do this:
import forms
form = getattr(forms, category + "Form")()
(Obviously, add whatever verification is necessary, such as catching AttributeError. Security-wise, if you are using a named module rather than the global namespace, it's that little bit harder for someone to inject a new *Form class.)
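The verification mentioned above could look roughly like the following sketch. It uses stand-in classes and a SimpleNamespace in place of a real forms module; BaseForm and form_class_for are hypothetical names, and the check assumes all product forms share a common base class.

```python
from types import SimpleNamespace


class BaseForm:
    """Stand-in for a shared form base class (hypothetical)."""


class TVForm(BaseForm):
    pass


class LaptopForm(BaseForm):
    pass


# Stand-in for `import forms`; a real module works the same way with getattr.
forms = SimpleNamespace(BaseForm=BaseForm, TVForm=TVForm,
                        LaptopForm=LaptopForm, helper=len)


def form_class_for(category, namespace=forms, base=BaseForm):
    """Look up '<category>Form' and accept it only if it is a real form class."""
    candidate = getattr(namespace, category + "Form", None)
    is_form = (
        isinstance(candidate, type)
        and issubclass(candidate, base)
        and candidate is not base
    )
    if not is_form:
        raise LookupError("unknown category: {!r}".format(category))
    return candidate


form = form_class_for("TV")()
```

Checking the type of what getattr returns matters because a module usually contains helpers and imports whose names could otherwise be reached through user input.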
One simple way to do this is to maintain a dictionary mapping category names to form classes. For example:
categories_and_classes = dict(TV = TVForm, Laptop = LaptopForm, ...)
And then you can use the category to look up the form class:
form = categories_and_classes.get(category, DefaultForm)()
Alternately you can use convention, as @Zooba said in his answer. This would work if your forms are uniformly named, say <category name> + Form.
Sounds like what you need is a mapping, or dictionary in Python. For example, create a dictionary that maps model category names to ModelForm classes. You could have another that maps them to Model classes. Either way you can map the string with the Model name in it to whatever you want.
A more "object-oriented" approach would be to use a Model class dictionary and add (possibly static) method(s) to each class that return or do whatever you need, such as returning the appropriate ModelForm. I mean something like this:
class TV:
    @staticmethod
    def getform():
        return TVForm
    ...

class Laptop:
    @staticmethod
    def getform():
        return LaptopForm
    ...

class WomensApparel:
    ...  # etc.

Models = {'TV': TV, 'Laptop': Laptop, 'WomensApparel': WomensApparel}  # ...etc
form = Models[category].getform()
With these two techniques, you won't need to write those kinds of giant if-elif structures -- and if you ever start to, it's a sign it's time to re-think your design.
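With hundreds of categories, keeping the dictionary in sync by hand becomes its own chore. One common pattern is to let each class register itself through a decorator. The sketch below uses plain stand-in classes; register, FORM_REGISTRY, and make_form are hypothetical names, not part of Django.

```python
FORM_REGISTRY = {}


def register(category):
    """Class decorator that files a form class under a category name."""
    def decorator(form_class):
        if category in FORM_REGISTRY:
            raise ValueError("duplicate category: {!r}".format(category))
        FORM_REGISTRY[category] = form_class
        return form_class
    return decorator


class DefaultForm:
    """Fallback used when a category has no dedicated form (stand-in)."""


@register("TV")
class TVForm:
    pass


@register("Laptop")
class LaptopForm:
    pass


def make_form(category):
    """Instantiate the form registered for `category`, or the default."""
    return FORM_REGISTRY.get(category, DefaultForm)()
```

The mapping is then built as a side effect of defining the classes, so adding a category touches exactly one place.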
I've got two models. One represents a piece of equipment, the other represents a possible attribute the equipment has. Semantically, this might look like:
Equipment: tractor, Attributes: wheels, towing
Equipment: lawnmower, Attributes: wheels, blades
Equipment: hedgetrimmer, Attributes: blades
I want to make queries like,
wheels = Attributes.objects.get(name='wheels')
blades = Attributes.objects.get(name='blades')
Equipment.objects.filter(has_attribute=wheels) \
    .exclude(has_attribute=blades)
How can I create Django models to do this?
This seems simple, but I'm just too dense to see the right solution.
One solution that popped into my head is to encode the list of Attribute IDs in an integer list like |109|14|3| and test for attributes using Equipment.objects.filter(attributes__contains='|%d|' % id) -- but this seems really wrong.
Your second example is pretty close, but you need to understand how the QuerySet API works across relationships (i.e. joins).
from django.db import models

class Attribute(models.Model):
    name = models.CharField(max_length=20)

class Equipment(models.Model):
    name = models.CharField(max_length=20)
    attributes = models.ManyToManyField(Attribute)

equips = Equipment.objects.filter(
    attributes__name='wheels').exclude(attributes__name='blades')
You can use Q objects in your QuerySet to do more interesting queries.
And keep in mind you can always dump the SQL from a QuerySet like this:
print(equips.query)
Sometimes you'll want to see the exact SQL being generated to make sure you're using the API correctly.
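To make the join behind the many-to-many filter concrete, here is a small sketch using the standard-library sqlite3 module with the tractor/lawnmower/hedgetrimmer data from the question. The table layout and the equipment_with_but_without helper are assumptions for illustration, and the SQL is hand-written, not necessarily what Django generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attribute (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE equipment (id INTEGER PRIMARY KEY, name TEXT);
    -- Link table: one row per (equipment, attribute) pair.
    CREATE TABLE equipment_attributes (
        equipment_id INTEGER REFERENCES equipment(id),
        attribute_id INTEGER REFERENCES attribute(id)
    );
    INSERT INTO attribute VALUES (1, 'wheels'), (2, 'towing'), (3, 'blades');
    INSERT INTO equipment VALUES (1, 'tractor'), (2, 'lawnmower'), (3, 'hedgetrimmer');
    INSERT INTO equipment_attributes VALUES (1, 1), (1, 2), (2, 1), (2, 3), (3, 3);
""")

# Subquery template: does this equipment row have an attribute with this name?
HAS_ATTRIBUTE = """
    SELECT 1
    FROM equipment_attributes AS ea
    JOIN attribute AS a ON a.id = ea.attribute_id
    WHERE ea.equipment_id = e.id AND a.name = ?
"""


def equipment_with_but_without(wanted, unwanted):
    """Names of equipment that have `wanted` but do not have `unwanted`."""
    query = (
        "SELECT e.name FROM equipment AS e "
        "WHERE EXISTS (" + HAS_ATTRIBUTE + ") "
        "AND NOT EXISTS (" + HAS_ATTRIBUTE + ") "
        "ORDER BY e.name"
    )
    return [name for (name,) in conn.execute(query, (wanted, unwanted))]
```

The exclusion has to be a NOT EXISTS (or equivalent) rather than a plain `name != 'blades'` on the joined rows, since the lawnmower's wheels row would otherwise still match.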