How to handle 300 parameters in Django Model / Form? - python

I am developing an app for creating products in an online shop. Suppose I have 50 categories of products, and each of them has some required product parameters (like color, size, etc.).
Some parameters appear in all categories, and some are unique. That gives me around 300 parameters (fields) that would have to be defined in the Django model.
I suppose it is not a good idea to create one big database with 300 fields and add products that have 1-15 parameters there (leaving the remaining fields empty). What would be the best way to handle this?
What would be the best way to display a form that asks only for the parameters required in a given category?

If you have to keep the model structure as you have described it here, I would create three tables: "Product", "Category" and "ProductCategory".
The Product table is as follows:
    ProductID  ProductName
    1          Shirt
    2          Table
    3          Vase
The Category table is as follows:
    CategoryID  CategoryName
    1           Size
    2           Color
    3           Material
The ProductCategory table is as follows:
    ID  ProductID  CategoryID    CategoryValue
    1   1 (Shirt)  1 (Size)      Medium
    2   2 (Table)  2 (Color)     Dark Oak
    3   3 (Vase)   3 (Material)  Glass
    4   3 (Vase)   3 (Material)  Plastic
This would be the easiest way: it avoids creating 300 columns and lets you reuse categories across different types of products. With many products, however, the database queries would start to slow down, because you would be joining two big tables, Product and ProductCategory.
You could also split it up further into major categories such as "Plants", "Kitchenware", etc.
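The three tables above can be tried out without Django. The following is a minimal sketch using Python's built-in sqlite3 module; the lowercase table and column names and the parameters_for helper are assumptions that mirror the tables above, and the last ProductCategory row is given ID 4 so that the primary key stays unique. It is only meant to show the join that reads a product's parameters back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product_category (
        id INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES product(id),
        category_id INTEGER REFERENCES category(id),
        value TEXT
    );
    INSERT INTO product VALUES (1, 'Shirt'), (2, 'Table'), (3, 'Vase');
    INSERT INTO category VALUES (1, 'Size'), (2, 'Color'), (3, 'Material');
    INSERT INTO product_category VALUES
        (1, 1, 1, 'Medium'),
        (2, 2, 2, 'Dark Oak'),
        (3, 3, 3, 'Glass'),
        (4, 3, 3, 'Plastic');
""")


def parameters_for(product_name):
    """Return (parameter name, value) pairs stored for one product."""
    rows = conn.execute(
        """
        SELECT c.name, pc.value
        FROM product_category AS pc
        JOIN product AS p ON p.id = pc.product_id
        JOIN category AS c ON c.id = pc.category_id
        WHERE p.name = ?
        ORDER BY pc.id
        """,
        (product_name,),
    )
    return rows.fetchall()
```

Every lookup goes through two joins, which is the cost the answer warns about once both tables grow large.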

1-1. Create Models for each category
You can build 50 classes, one per category, to describe the products in your shop.
This is a simple and basic solution. It can also be the optimal solution if the domain logic varies from category to category.
    class Furniture(Model):
        price
        size
        color
        ...
    class Clothes(Model):
        price
        gender
        texture
        ...
1-2. Aggregate common fields into base class
If you have many common fields, introducing inheritance would be a great idea.
    class Base(Model):
        price
        ...
        class Meta:
            abstract = True
    class Furniture(Base):
        size
        color
        ...
    class Clothes(Base):
        gender
        texture
        ...
2-1. One BigTable
I guess this is what you were going to do.
"I suppose it is not good idea to create one big database with 300 fields and add products that have 1-15 parameters there (leaving remaining fields empty)."
As you said, the remaining fields will stay empty, but it is not a bad idea unless the domain logic differs by category.
    class Product(Model):
        price
        size
        color
        gender
        texture
        ...
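Whichever table layout is chosen, the form only needs to ask for the fields of the selected category. A minimal sketch in plain Python (no Django), assuming a hypothetical REQUIRED_FIELDS mapping and hypothetical helper names; the category and field names are taken from the examples above:

```python
# Hypothetical mapping: category name -> parameters the form must ask for.
COMMON_FIELDS = ("price",)
REQUIRED_FIELDS = {
    "furniture": ("size", "color"),
    "clothes": ("gender", "texture"),
}


def fields_for(category):
    """Return the field names a form for this category should show."""
    return COMMON_FIELDS + REQUIRED_FIELDS[category]


def missing_fields(category, submitted):
    """Return required fields that are absent or empty in the submitted data."""
    return [name for name in fields_for(category) if not submitted.get(name)]
```

For example, missing_fields("furniture", {"price": "10", "size": "L"}) reports that only color is still needed.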
2-2. One Table, but several models
Tables belong to the data layer and models belong to the domain layer, so a table does not have to correspond one-to-one to a model.
You can build a proxy model to describe each category type.
Pros:
- simple data layer
- able to deal with complex domain logic that differs across categories
Cons:
- code complexity due to the proxy handling
- various difficulties arising from tables and models not being one-to-one
    import enum
    from typing import Type
    from django.db.models import Manager, Model
    class ProductProxyManager(Manager):
        def get_queryset(self):
            # Restrict each proxy to its own rows and to the columns it uses.
            return (
                super()
                .get_queryset()
                .filter(type=self.model.product_type)
                .only(*(self.model.required_fields + self.model.base_fields))
            )
    class ProductType(enum.Enum):
        Furniture = "furniture"
        Clothes = "clothes"
    class Product(Model):
        # Model fields (types elided, as in the other snippets).
        type: ProductType
        price
        size
        color
        gender
        texture
        ...
        # Assumption: fields shared by every category; the manager above expects this name.
        base_fields = ("type", "price")
        def __new__(cls, *args, **kwargs):
            # get the product type, either from kwargs or from args
            type: ProductType = kwargs.get("type")
            if type is None:
                type_field_index = cls._meta.fields.index(cls._meta.get_field("type"))
                proxy_name = args[type_field_index]
            else:
                proxy_name = type
            # look up the proxy class registered for this type
            instance_class = Product.get_instance_class(proxy_name)
            o = super().__new__(instance_class)
            return o
        @staticmethod
        def get_instance_class(type: ProductType) -> Type["Product"]:
            return {
                ProductType.Furniture: Furniture,
                ProductType.Clothes: Clothes,
            }[type]
    class Furniture(Product):
        class Meta:
            proxy = True
        # Assumption: product_type is the value the manager filters on.
        product_type = ProductType.Furniture
        required_fields = ("size", "color")
        objects = ProductProxyManager()
    class Clothes(Product):
        class Meta:
            proxy = True
        # Assumption: product_type is the value the manager filters on.
        product_type = ProductType.Clothes
        required_fields = ("gender", "texture")
        objects = ProductProxyManager()
You can see further steps here. (I followed up to step 3.)
https://stackoverflow.com/a/60894618/8614565
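The dispatch done in __new__ does not depend on Django. A stripped-down sketch in plain Python, using hypothetical stand-in classes and a hypothetical registry attribute rather than real models, shows how constructing the base class can hand back a category-specific subclass:

```python
import enum


class ProductType(enum.Enum):
    FURNITURE = "furniture"
    CLOTHES = "clothes"


class Product:
    # Filled in below, once the subclasses exist.
    registry = {}

    def __new__(cls, type, **fields):
        # Pick the subclass registered for this type; fall back to cls.
        target = Product.registry.get(ProductType(type), cls)
        return super().__new__(target)

    def __init__(self, type, **fields):
        self.type = ProductType(type)
        self.fields = fields


class Furniture(Product):
    required_fields = ("size", "color")


class Clothes(Product):
    required_fields = ("gender", "texture")


Product.registry = {
    ProductType.FURNITURE: Furniture,
    ProductType.CLOTHES: Clothes,
}

# Constructing the base class yields a Furniture instance.
item = Product("furniture", size="L", color="oak")
```

ProductType(type) accepts both the raw string and the enum member, so the same lookup works for values coming from user code and for values read back from storage.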

Related

I'm trying to build a Django query that will sum multiple categories in to one distinct category

I have a model called Actuals with a category field, which is unique, and another model called Budget, which is referenced by a many-to-many field in the Actuals model. A user can select a unique category in Budget and select it in Actuals, so there can be many actuals for one budget. I am trying to create a query that groups by category and sums 'transactions_amount' in the Actuals model.
    class Actuals(models.Model):
        category = models.ForeignKey(Category, on_delete=models.CASCADE)
        date = models.DateTimeField(auto_now_add=False)
        transactions_amount = models.IntegerField()
        vendor = models.CharField(max_length=255, default="")
        details = models.CharField(max_length=255)
        budget = models.ManyToManyField('budget')
        def __str__(self):
            return self.category.category_feild
This is the query that I currently have. However, it still gives me multiple rows per category:
    lub = (
        Actuals.objects
        .filter(category__income_or_expense='Expense', date__year="2022", date__month="01")
        .values('category__category_feild', 'date')
        .order_by('category__category_feild')
        .annotate(total_actuals=Sum('transactions_amount'))
        .annotate(total_budget=Sum('budget__budget_amt'))
    )
This is the output. There should only be one line for "Fun" and one line for "Paycheck".
    <QuerySet [<Actuals: Fun>, <Actuals: Fun>, <Actuals: Paycheck>, <Actuals: Paycheck>]>
The annotate method only adds another attribute to each object returned in the queryset. If you want a single result, use the aggregate queryset method instead:
    lub = (
        Actuals.objects
        .filter(category__income_or_expense='Expense', date__year="2022", date__month="01")
        .order_by('category__category_feild')
        .aggregate(total_actuals=Sum('transactions_amount'), total_budget=Sum('budget__budget_amt'))
    )
To get the result values use:
    total_actuals = lub['total_actuals']
    total_budget = lub['total_budget']
This code is not tested, so let me know if it works.
Also, is category__category_feild a typo?
PS: if you're wondering why I didn't use values, see this antipattern.
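If one row per category is wanted rather than a single grand total, note that in Django a values() call placed before annotate() determines the grouping, so listing 'date' there groups by category and date together. A minimal sketch with the standard-library sqlite3 module, using a hypothetical actuals table and made-up amounts, illustrates the difference in plain SQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actuals (category TEXT, date TEXT, amount INTEGER);
    INSERT INTO actuals VALUES
        ('Fun', '2022-01-03', 20),
        ('Fun', '2022-01-10', 30),
        ('Paycheck', '2022-01-01', 1000),
        ('Paycheck', '2022-01-15', 1000);
""")

# Grouping by category AND date: one row per distinct pair.
by_category_and_date = conn.execute(
    "SELECT category, date, SUM(amount) FROM actuals "
    "GROUP BY category, date ORDER BY category, date"
).fetchall()

# Grouping by category only: one row per category.
by_category = conn.execute(
    "SELECT category, SUM(amount) FROM actuals "
    "GROUP BY category ORDER BY category"
).fetchall()
```

With these rows the first query still returns two lines for each category, while the second returns exactly one line for "Fun" and one for "Paycheck".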

How to calculate between fields of two different models without a connection

Hi, I need to do a calculation between fields of two different models that have no connection to each other.
Imagine I have two models (tables) and I want to get the profit and income of a storage: Model1 is for selling and the other is for the costs of the company. I need to know the profit and income; field_1 holds all selling prices and field_2 holds all costs of the company.
    class Model1(models.Model):
        field_1 = models.IntegerField()
    class Model2(models.Model):
        field_2 = models.IntegerField()
Can I calculate something like model1__field1 - model2__field2?
I much appreciate your help.
First you need to get an object from each model.
    obj1 = Model1.objects.get(pk=1)
    obj2 = Model2.objects.get(pk=1)
Now you can calculate the difference.
    diff = obj1.field_1 - obj2.field_2
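If the aim is total sales minus total costs rather than the difference for a single pair of rows, each table can be summed on its own and the totals subtracted. A minimal sketch with sqlite3, using hypothetical table names model1 and model2, made-up values and a hypothetical total helper (in Django, calling aggregate(Sum(...)) on each model plays the same role):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE model1 (field_1 INTEGER);  -- selling prices
    CREATE TABLE model2 (field_2 INTEGER);  -- company costs
    INSERT INTO model1 VALUES (100), (250), (50);
    INSERT INTO model2 VALUES (80), (120);
""")


def total(table, column):
    """Sum one column; COALESCE turns the NULL of an empty table into 0."""
    # table and column come from trusted code here, never from user input.
    row = conn.execute(f"SELECT COALESCE(SUM({column}), 0) FROM {table}").fetchone()
    return row[0]


# Total selling prices minus total costs.
profit = total("model1", "field_1") - total("model2", "field_2")
```

No relation between the two tables is needed, because each total is computed independently before the subtraction.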

Django: How can I add an aggregated field to a queryset based on data from the row and data from another Model?

I have a Django App with the following models:
    CURRENCY_CHOICES = (('USD', 'US Dollars'), ('EUR', 'Euro'))
    class ExchangeRate(models.Model):
        currency = models.CharField(max_length=3, default='USD', choices=CURRENCY_CHOICES)
        rate = models.FloatField()
        exchange_date = models.DateField()
    class Donation(models.Model):
        donation_date = models.DateField()
        donor = models.CharField(max_length=250)
        amount = models.FloatField()
        currency = models.CharField(max_length=3, default='USD', choices=CURRENCY_CHOICES)
I also have a form I use to filter donations based on some criteria:
    class DonationFilterForm(forms.Form):
        min_amount = forms.FloatField(required=False)
        max_amount = forms.FloatField(required=False)
The min_amount and max_amount fields will always represent values in US Dollars.
I need to be able to filter a queryset based on min_amount and max_amount, but for that all the amounts must be in USD. To convert the donation amount to USD I need to multiply by the ExchangeRate of the donation currency and date.
The only way I have found of doing this so far is by iterating over dict(queryset) and adding a new value called usd_amount, but that may perform very poorly in the future.
Reading the Django documentation, it seems the same thing can be done using aggregation, but so far I haven't been able to come up with logic that gives me the same result.
I knew I had to use annotate to solve this, but I didn't know exactly how because it involved getting data from an unrelated Model.
Upon further investigation I found what I needed in the Django Documentation. I needed to use the Subquery and the OuterRef expressions to get the values from the outer queryset so I could filter the inner queryset.
The final solution looks like this:
    from django.db.models import F, OuterRef, Subquery
    # Prepare the filter with dynamic fields using OuterRef
    # (Donation stores its date in donation_date)
    rates = ExchangeRate.objects.filter(exchange_date=OuterRef('donation_date'), currency='EUR')
    # Get the exchange rate for every donation made in Euros
    qs = Donation.objects.filter(currency='EUR').annotate(exchange_rate=Subquery(rates.values('rate')[:1]))
    # Get the equivalent amount in USD
    qs = qs.annotate(usd_amount=F('amount') * F('exchange_rate'))
So, finally, I could filter the resulting queryset like so:
    final_qs = qs.filter(usd_amount__gte=min_amount, usd_amount__lte=max_amount)
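Subquery combined with OuterRef corresponds to a correlated subquery in SQL. A minimal sketch with sqlite3, using hypothetical table names, a hypothetical donations_in_usd helper and made-up rates, shows the shape of query this approach relies on:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE exchange_rate (currency TEXT, rate REAL, exchange_date TEXT);
    CREATE TABLE donation (donor TEXT, amount REAL, currency TEXT, donation_date TEXT);
    INSERT INTO exchange_rate VALUES
        ('EUR', 1.5, '2022-01-01'),
        ('EUR', 2.0, '2022-01-02');
    INSERT INTO donation VALUES
        ('Ann', 100.0, 'EUR', '2022-01-01'),
        ('Bob', 100.0, 'EUR', '2022-01-02');
""")


def donations_in_usd(min_amount, max_amount):
    """Return (donor, usd_amount) for EUR donations inside the USD range."""
    return conn.execute(
        """
        SELECT donor, usd_amount FROM (
            SELECT d.donor,
                   d.amount * (SELECT r.rate
                               FROM exchange_rate AS r
                               WHERE r.currency = 'EUR'
                                 AND r.exchange_date = d.donation_date
                               LIMIT 1) AS usd_amount
            FROM donation AS d
            WHERE d.currency = 'EUR'
        )
        WHERE usd_amount BETWEEN ? AND ?
        ORDER BY donor
        """,
        (min_amount, max_amount),
    ).fetchall()
```

The inner SELECT refers to d.donation_date from the outer row, which is the role OuterRef plays in the ORM version; LIMIT 1 mirrors the [:1] slice.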

Django: How to get objects by date from many models?

Three different models each have a date field:
    class ModelA(models.Model):
        # some fields here
        date = models.DateField()
    class ModelB(models.Model):
        # some fields here
        date = models.DateField()
    class ModelC(models.Model):
        # some fields here
        date = models.DateField()
I'd like to get the 50 most recent objects by date, whatever their class.
For now it works, but I'm doing it in a very inefficient way, as you can see:
    from itertools import chain
    from operator import attrgetter
    all_a = ModelA.objects.all()
    all_b = ModelB.objects.all()
    all_c = ModelC.objects.all()
    last_50_events = sorted(
        chain(all_a, all_b, all_c),
        key=attrgetter('date'),
        reverse=True)[:50]
How can I do it in an efficient way (i.e. without loading useless data)?
The easy solution, which I recommend: load the 50 most recent objects of each type, sort them together, and take the first 50. This loads at most three times more rows than needed.
A "proper" solution can't be achieved in the ORM with your current schema.
Probably the easiest way is to add a new model with the date and a generic relation to the actual object.
Theoretically you could also do some magic with union and raw queries, but everything like that is dirty and needs non-trivial manual processing.
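The easy solution works because the overall 50 newest objects can only come from the 50 newest objects of each model. A minimal sketch in plain Python, using hypothetical in-memory lists and a hypothetical Event type in place of the three querysets (with Django, each list would be something like an order_by('-date')[:50] slice of the model's queryset):

```python
import datetime
import heapq
from itertools import chain
from operator import attrgetter
from typing import NamedTuple


class Event(NamedTuple):
    source: str
    date: datetime.date


def newest(events, limit):
    """Stand-in for an ORDER BY date DESC LIMIT query on one model."""
    return sorted(events, key=attrgetter("date"), reverse=True)[:limit]


def last_events(limit, *sources):
    """Take the newest `limit` items per source, then the newest overall."""
    candidates = chain.from_iterable(newest(src, limit) for src in sources)
    return heapq.nlargest(limit, candidates, key=attrgetter("date"))


start = datetime.date(2022, 1, 1)
# Each source gets dates that do not collide with the others.
a = [Event("A", start + datetime.timedelta(days=3 * i)) for i in range(100)]
b = [Event("B", start + datetime.timedelta(days=3 * i + 1)) for i in range(100)]
c = [Event("C", start + datetime.timedelta(days=3 * i + 2)) for i in range(100)]

result = last_events(50, a, b, c)
```

Only 150 candidates are sorted in memory here, instead of all 300 objects.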

Using Django with MySQL for storing and looking up Large DNA Microarray Results

I'm trying to set up a Django app that allows me to store and look up the results of a DNA microarray with ~500k unique probes for a large number of subjects.
The model setup I've been toying with is as follows:
    class Subject(models.Model):
        name = models.CharField()
    class Chip(models.Model):
        chip_name = models.CharField()
    class Probe(models.Model):
        chips = models.ManyToManyField(Chip, related_name="probes")
        rs_name = models.CharField(unique=True)
        chromosome = models.IntegerField()
        location = models.IntegerField()
    class Genotype(models.Model):
        probe = models.ForeignKey(Probe, related_name='genotypes')
        subject = models.ForeignKey(Subject, related_name='genotypes')
        genotype = models.CharField()
I was wondering if there's a better way to set this up? I was just thinking that for each subject I would be creating 500k rows in the Genotype table.
If I'm using a MySQL db, will it be able to handle a large number of subjects each adding 500k rows to that table?
Well, if you need a result (genotype) per Probe for every Subject, then a standard many-to-many intermediary table (Genotype) is going to get huge indeed.
With 1000 Subjects you'd have 500 million records.
If you could save the values of the genotype field encoded/serialized in one or more columns, that would reduce the number of records drastically. Saving 500k results encoded in a single column would be a problem, but if you can split them into groups, it should be workable. This would reduce the number of records to the number of Subjects. Another possibility would be to group Probes into ProbeGroups, so that the number of ProbeResults = number of Subjects * number of ProbeGroups.
The first option would be something like:
    class SubjectProbeResults(models.Model):
        subject = models.ForeignKey(Subject, related_name='probe_results')
        pg_a_genotypes = models.TextField()
        ..
        pg_n_genotypes = models.TextField()
This will of course make it more difficult to search/filter results, but it shouldn't be too hard if the saved format is simple.
You can use the following format in the genotype columns: "probe1_id|genotype1,probe2_id|genotype2,probe3_id|genotype3,..."
To retrieve a queryset of subjects for a specific genotype + probe:
a. Determine which group the probe belongs to, e.g. "Group C" -> pg_c_genotypes
b. Query the respective column for the probe_id + genotype combination.
    from django.db.models import Q
    qstring = "%s|%s" % (probe_id, genotype)
    subjects = Subject.objects.filter(
        Q(probe_results__pg_c_genotypes__contains=',%s,' % qstring)
        | Q(probe_results__pg_c_genotypes__startswith='%s,' % qstring)
        | Q(probe_results__pg_c_genotypes__endswith=',%s' % qstring)
    )
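The three Q conditions cover the cases where the "probe_id|genotype" pair sits in the middle, at the start, or at the end of the stored string. A minimal sketch of the same format in plain Python, with hypothetical helper names and made-up probe IDs, makes the matching rule explicit:

```python
def encode(results):
    """Serialize {probe_id: genotype} as 'id|genotype,id|genotype,...'."""
    return ",".join(f"{probe_id}|{genotype}" for probe_id, genotype in results.items())


def decode(text):
    """Parse the serialized string back into a {probe_id: genotype} dict."""
    if not text:
        return {}
    return dict(pair.split("|", 1) for pair in text.split(","))


def has_pair(text, probe_id, genotype):
    """Mirror the contains/startswith/endswith lookups, plus an exact match."""
    pair = f"{probe_id}|{genotype}"
    return (
        f",{pair}," in text
        or text.startswith(f"{pair},")
        or text.endswith(f",{pair}")
        or text == pair  # single-pair string, not covered by the three lookups
    )


stored = encode({"rs1": "AA", "rs12": "AG", "rs2": "GG"})
```

The surrounding commas matter: without them a search for probe rs1 would also match rs12.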
The other option I mentioned is to have a ProbeGroup model as well, with each Probe having a ForeignKey to ProbeGroup. And then:
    class SubjectProbeResults(models.Model):
        subject = models.ForeignKey(Subject, related_name='probe_results')
        probe_group = models.ForeignKey(ProbeGroup, related_name='probe_results')
        genotypes = models.TextField()
You can query the genotypes field in the same way, except that now you can filter on the group directly instead of determining which column you need to search.
This way, if you have for example 1000 probes per group, you get 500 groups. Then for 1000 Subjects you'll have 500K SubjectProbeResults rows, which is still a lot but certainly more manageable than 500M. You could also use fewer groups; you'd have to test what works best.
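The row counts quoted above follow from simple arithmetic. A small sketch with a hypothetical row_counts helper, where the group size is the only tunable assumption:

```python
import math


def row_counts(probes, subjects, probes_per_group):
    """Return (rows with one row per genotype, rows with one row per group)."""
    groups = math.ceil(probes / probes_per_group)
    return probes * subjects, groups * subjects


# 500k probes, 1000 subjects, 1000 probes per group.
per_genotype, per_group = row_counts(500_000, 1_000, 1_000)
```

Changing probes_per_group trades the number of rows against the length of each genotypes string.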
