Does a StructuredProperty reference the parent or child?
class Invoices(ndb.Model):  # Child
    pass

class Customers(ndb.Model):  # Parent
    invoice = ndb.StructuredProperty(Invoices)

or...

class Customers(ndb.Model):  # Parent
    pass

class Invoices(ndb.Model):  # Child
    customer = ndb.StructuredProperty(Customers)
To answer your question in the context of "what is the better practice for a NoSQL Datastore",
here's what I can offer.
First, you probably want to name your models in the singular, as they should describe a single
Invoice or Customer entity, not several.
Next, using a StructuredProperty implies that you'd like to keep all of this information in a
single entity - this will reduce write/read ops, but can introduce some limitations.
(See the docs or this related question.)
The most common relationship would be a one(Customer) to many(Invoice) relationship,
which you can structure as below:
class Invoice(ndb.Model):  # Child
    invoice_id = ndb.StringProperty(required=True)

class Customer(ndb.Model):  # Parent
    invoices = ndb.StructuredProperty(Invoice, repeated=True)

    def get_invoice_by_id(self, invoice_id):
        """Returns a customer Invoice by invoice_id. Raises KeyError if invoice is not present."""
        invoice_matches = [iv for iv in self.invoices if iv.invoice_id == invoice_id]
        if not invoice_matches:
            raise KeyError("Customer has no Invoice with ID %s" % invoice_id)
        return invoice_matches[0]  # this could be changed to return all matches
Keep in mind the following restrictions of this implementation:
A repeated StructuredProperty cannot itself contain repeated properties.
Keeping invoice_id globally unique is more complex than if Invoice were in its own entity group. (invoice_key.get() is always better than the query this requires.)
You would need an instance method on Customer to find an Invoice by invoice_id.
You would need logic to prevent invoices with the same ID from existing on a single Customer.
Here are some of the advantages:
You can query for Customer
Querying an Invoice by invoice_id will return the Customer instance, along with all invoices. (This could be a pro and a con, actually - you need logic to return the invoice from the customer)
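To make the last two restrictions concrete, here is a minimal plain-Python sketch of the lookup and the duplicate-ID check. It uses dataclasses as a stand-in for ndb models, so nothing here is ndb API, and add_invoice is a hypothetical helper name I made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Invoice:
    invoice_id: str


@dataclass
class Customer:
    # Stand-in for a repeated StructuredProperty: invoices live inside the customer.
    invoices: List[Invoice] = field(default_factory=list)

    def add_invoice(self, invoice: Invoice) -> None:
        """Append an invoice, rejecting a duplicate invoice_id on this customer."""
        if any(iv.invoice_id == invoice.invoice_id for iv in self.invoices):
            raise ValueError("Duplicate invoice ID %s" % invoice.invoice_id)
        self.invoices.append(invoice)

    def get_invoice_by_id(self, invoice_id: str) -> Invoice:
        """Linear scan over the embedded invoices; raises KeyError if absent."""
        for iv in self.invoices:
            if iv.invoice_id == invoice_id:
                return iv
        raise KeyError("Customer has no Invoice with ID %s" % invoice_id)


customer = Customer()
customer.add_invoice(Invoice("A-1"))
customer.add_invoice(Invoice("A-2"))
```

Note that this only guards uniqueness within one customer. Keeping IDs unique across all customers would still need a query, which is the extra complexity mentioned above.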
Here is a more common solution, but by no means is it necessarily the "right solution."
This solution uses ancestor relationships, which allow you to keep writes to Invoice and
the related Customer atomic - so you could maintain aggregate invoice statistics on the
Customer level. (total_orders, total_gross, etc.)
class Invoice(ndb.Model):
    # When not indexed, this is essentially a @property.
    customer = ndb.ComputedProperty(lambda self: self.key.parent(), indexed=False)

class Customer(ndb.Model):
    def get_invoice_by_id(self, invoice_id):
        """Returns a customer Invoice by invoice_id, or None if the invoice is not present."""
        invoice_key = ndb.Key(Invoice._get_kind(), invoice_id, parent=self.key)
        return invoice_key.get()

    def query_invoices(self):
        """Returns ndb.Query for invoices by this Customer."""
        return Invoice.query(ancestor=self.key)

invoice = Invoice(parent=customer.key, **invoice_properties)
Good luck with Appengine! Once you get the hang of all of this, it is a truly rewarding platform.
Update:
Here is some additional code for transactionally updating customer aggregate totals as I mentioned above.
def create_invoice(customer_key, gross_amount_paid):
    """Creates an invoice for a given customer.

    Args:
        customer_key: (ndb.Key) Customer key
        gross_amount_paid: (int) Gross amount paid by customer
    """
    @ndb.transactional
    def _txn():
        customer = customer_key.get()
        invoice = Invoice(parent=customer.key, gross_amount=gross_amount_paid)
        # Keep an atomic, transactional count of customer aggregates
        customer.total_gross += gross_amount_paid
        customer.total_orders += 1
        # batched put for API optimization
        ndb.put_multi([customer, invoice])
        return invoice
    return _txn()
The above code works in a single entity group transaction (e.g. ndb.transactional(xg=False)) because Invoice is a child entity to Customer. If that connection is lost, you would need xg=True. (I'm not sure if it's more expensive, but it is less optimized)
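If the all-or-nothing behaviour is unclear, here is a rough plain-Python sketch of the idea. The store dict is a hypothetical stand-in for the Datastore and the Rollback exception is made up for the example; this is not how ndb implements transactions, it only shows that the invoice and the aggregates change together or not at all.

```python
import copy


class Rollback(Exception):
    """Raised inside the sketch to abort the pretend transaction."""


def create_invoice(store, customer_id, gross_amount_paid):
    """Add an invoice and update the customer's totals, all or nothing.

    `store` is a hypothetical dict standing in for the Datastore.
    """
    working = copy.deepcopy(store[customer_id])  # work on a private copy
    if gross_amount_paid < 0:
        raise Rollback("negative amount")  # nothing has been written yet
    working["invoices"].append({"gross_amount": gross_amount_paid})
    working["total_gross"] += gross_amount_paid
    working["total_orders"] += 1
    store[customer_id] = working  # single "commit" step
    return working["invoices"][-1]


store = {"c1": {"invoices": [], "total_gross": 0, "total_orders": 0}}
create_invoice(store, "c1", 250)
try:
    create_invoice(store, "c1", -10)
except Rollback:
    pass  # the failed call leaves the stored customer untouched
```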
Related
I have a model in my django app like below:
models.py
class Profit(models.Model):
    client = models.ForeignKey(Client, null=True, on_delete=models.CASCADE)
    month = models.CharField(max_length=100)
    amount = models.IntegerField()
    total_profit = models.IntegerField()
Whenever a new instance of this class is created, the user enters the month and the amount of profit for that month. I also want it to calculate the total profit the user has earned up to and including the current entry, by adding up all the profits entered in the past.
For example, if the user is adding the profit for April, it should add up the values in the amount field of the previously added objects (March, February, January and so on) and put the result in the total_profit field, so that the user can see the total_profit earned as of each new entry.
My views.py where I am printing the list of profits is given below:
views.py
class ProfitListView(ListView):
    model = Profit
    template_name = 'client_management_system/profit_detail.html'
    context_object_name = 'profits'

    # pk=self.kwargs['pk'] is to get the client id/pk from URL
    def get_queryset(self):
        user = get_object_or_404(Client, pk=self.kwargs['pk'])
        return Profit.objects.filter(client=user)
Client is another model in my models.py, to which the Profit class is connected via a ForeignKey.
I also don't exactly know how to use window functions inside this view.
As stated in the comments, one should generally not store things in the database that can be calculated from other data, since that leads to duplication and makes the data difficult to update. That said, if the data will not change and it is financial data, one might store it anyway for record-keeping purposes.
First, month as a CharField is not very suitable for your schema: such values are not easily ordered, and it would be better to work with a DateTimeField instead:
class Profit(models.Model):
    month = models.CharField(max_length=100)  # Remove this
    made_on = models.DateTimeField()  # A `DateTimeField` is better suited
    amount = models.IntegerField()
    total_profit = models.IntegerField()
Next, since you want to print all the Profit instances along with the running total, you should use a Window function [Django docs] ordered by made_on. We also use a frame (all rows up to and including the current row) in case made_on is the same for two entries; without it, rows that tie on made_on would all get the same total:
from django.db.models import F, RowRange, Sum, Window

queryset = Profit.objects.annotate(
    total_amount=Window(
        expression=Sum('amount'),
        order_by=F('made_on').asc(),
        frame=RowRange(end=0)
    )
)

for profit in queryset:
    print(f"Date: {profit.made_on}, Amount: {profit.amount}, Total amount: {profit.total_amount}")
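To see what the annotation computes without touching a database, here is a small plain-Python sketch of the same running total. The rows are made-up sample data, and itertools.accumulate stands in for the window function, so treat it as an illustration of the result rather than of what Django does internally.

```python
from itertools import accumulate

# Made-up (made_on, amount) rows standing in for Profit instances.
rows = [
    ("2021-03-15", -50),
    ("2021-01-15", 100),
    ("2021-04-15", 300),
    ("2021-02-15", 250),
]

# Same idea as order_by=F('made_on').asc(): ISO dates sort correctly as strings.
rows.sort(key=lambda row: row[0])

# Same idea as Sum('amount') over all rows up to and including the current one.
running = list(accumulate(amount for _, amount in rows))

for (made_on, amount), total in zip(rows, running):
    print(f"Date: {made_on}, Amount: {amount}, Total amount: {total}")
```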
I wrote my own billing software, but I don't know how to approach this problem.
Right now I have 3 models:
Number
Receipt
MilageReceipt
The point is that I need to write two kinds of receipts for my customers. For the ministry of finance, however, they have to share one continuous ID. So Number just contains an auto field, and Receipt and MilageReceipt each have a foreign key to it. This way I have one ID across two different models.
But now I want to expand this to also handle multiple companies. So there are two different types of Receipts that need to have a continuous number, but there will be multiple users who all need their own continuous numbers.
I want to have something that results in:
Receipt: company:1, id:1
Receipt: company:2, id:1
Receipt: company:1, id:2
MilageReceipt: company:1, id:3
Receipt: company:2, id:2
MilageReceipt: company:1, id:4
Receipt: company:1, id:5
MilageReceipt: company:2, id:3
I hope it is somewhat clear what I want to achieve. Can you please point me in the direction on how to set up models to get this behavior?
I want to keep the admin as original as possible so I'd like to do this on the model level, not the views - if possible. Right now, for example, I create the Number automatically every time a Receipt or a MilageReceipt is created. So the user doesn't even notice.
Thank you for your help!
For continuous sequences you really can't use database sequences, since they can leave gaps (for example, when a transaction rolls back). The easiest solution is to have your own table (Model) to hold the sequences. Here's some code I use to do that:
from django.db import models, transaction


class Sequence(models.Model):
    class Meta:
        verbose_name = 'Sequence'
        permissions = (('view_sequence', 'Can View Sequence'),)

    name = models.CharField(max_length=20)
    value = models.IntegerField()

    def __str__(self):
        return 'Sequence %s=%s' % (self.name, self.value)

    def __unicode__(self):
        return self.__str__()

    @classmethod
    def set(cls, name, value=None, increment=0):
        # Lock the row so concurrent callers cannot read the same value.
        with transaction.atomic():
            seq = cls.objects.select_for_update().filter(name=name).first()
            if not seq:
                seq = cls(name=name, value=0)
            seq.value = increment + (value if value is not None else seq.value)
            seq.save()
            return seq.value

    @classmethod
    def get_next(cls, name):
        return cls.set(name, increment=1)
To get the next sequence you would form a key string that names the sequence, eg:
nextnumber = Sequence.get_next('receipt_%d' % company.pk)
Since every company would have a different key, they will be separate sequences.
If you were to use this in let's say your Receipt save method, like so:
class Receipt(models.Model):
    company = models.ForeignKey(Company, null=False, blank=False, on_delete=models.CASCADE)
    # Unique per company rather than globally, since every company has its own sequence.
    seq_num = models.IntegerField(blank=True)
    # etc etc

    class Meta:
        unique_together = (('company', 'seq_num'),)

    def save(self, *args, **kwargs):
        if not self.seq_num:
            self.seq_num = Sequence.get_next('receipt_%d' % self.company.pk)
        super(Receipt, self).save(*args, **kwargs)
Whenever you save a Receipt, if it doesn't have a seq_num, one will be assigned to it.
There's just one gotcha here: the way the transaction code is structured, a number can be skipped if saving the Receipt fails after the sequence has already been incremented. To avoid that, move the transaction.atomic() block into your receipt save() method, so that the increment and the save succeed or fail together.
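To show how the key naming gives every company its own counter while both receipt types share it, here is a plain-Python sketch. SequenceStore is a hypothetical in-memory stand-in for the Sequence model (a lock instead of select_for_update), and the events list simply replays the order from the question.

```python
import threading
from collections import defaultdict


class SequenceStore:
    """In-memory stand-in for the Sequence table: one counter per name."""

    def __init__(self):
        self._values = defaultdict(int)
        self._lock = threading.Lock()

    def get_next(self, name):
        # The lock plays the role of select_for_update(): one caller at a time.
        with self._lock:
            self._values[name] += 1
            return self._values[name]


store = SequenceStore()

# (receipt type, company pk) in the order given in the question.
events = [
    ("Receipt", 1), ("Receipt", 2), ("Receipt", 1), ("MilageReceipt", 1),
    ("Receipt", 2), ("MilageReceipt", 1), ("Receipt", 1), ("MilageReceipt", 2),
]

# Both receipt types use the same 'receipt_<company pk>' key, so they share a counter.
numbered = [
    (kind, company, store.get_next("receipt_%d" % company))
    for kind, company in events
]

for kind, company, seq_num in numbered:
    print("%s: company:%d, id:%d" % (kind, company, seq_num))
```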
I'm having a dilemma on choosing whether to use ContentType or using ManyToManyField.
Consider the following example:
class Book(Model):
    identifiers = ManyToManyField('Identifier')
    title = CharField(max_length=10)

class Series(Model):
    identifiers = ManyToManyField('Identifier')
    book = ForeignKey('Book')
    name = CharField(max_length=10)

class Author(Model):
    identifiers = ManyToManyField('Identifier')
    name = CharField(max_length=10)

class Identifier(Model):
    id_type = ForeignKey('IdType')
    value = CharField(max_length=10)

class IdType(Model):
    # Sample Value:
    # Book: ISBN10, ISBN13, LCCN
    # Serial: ISSN
    # Author: DAI, AIS
    name = CharField(max_length=10)
As you can see, Identifier is used in many places; in fact, it is so generic that many business-related objects require an Identifier, similar to how TaggedItem from the Django examples is used.
An alternative approach is to generalize this using a generic relation:
class Book(Model):
    identifiers = GenericRelation('Identifier')
    title = CharField(max_length=10)
    authors = ManyToManyField('Author')  # string reference, since Author is defined below

class Series(Model):
    identifiers = GenericRelation('Identifier')
    book = ForeignKey('Book')
    name = CharField(max_length=10)

class Author(Model):
    identifiers = GenericRelation('Identifier')
    name = CharField(max_length=10)

class Identifier(Model):
    content_type = models.ForeignKey(ContentType, on_delete=models.CASCADE)
    object_id = models.PositiveIntegerField()
    content_object = GenericForeignKey('content_type', 'object_id')
    id_type = ForeignKey('IdType')
    value = CharField(max_length=10)

class IdType(Model):
    # Sample Value:
    # Book: ISBN10, ISBN13, LCCN, MyLibrary, YourLibrary, XYZLibrary, etc.
    # Serial: ISSN, XYZSerial, etc...
    # Author: DAI, AIS, XYZAuthor, etc...
    name = CharField(max_length=10)
I'm unsure if I'm raising the right concern regarding Generic Relation.
I'm worried about the data growth of the Identifier table, as it will grow very fast on one table, for example:
100,000 books, average 4 identifiers each. (total 400,000 identifier records)
average 2 authors per book (total 200,000 identifier records)
For each record in Book, the Identifier table grows by 4-6 records, so it will soon hold millions of records. Will queries become very slow in the long run? Moreover, I believe the identifier is a field that is queried and used in the application.
Is this generalization done correctly? An author identifier is completely unrelated to a book identifier, so perhaps each should have its own model (BookIdentifier and AuthorIdentifier). Although they both follow the IdType.name and Identifier.value pattern, the domains are unrelated: one is an author, the other is a book. Should they be generalized? Why not?
What problem could there be if I were to implement under GenericRelation model?
We have product like:
class Product(Model):
    """
    Base Product Model
    """
    shop_id = columns.UUID(primary_key=True, required=True)
    product_id = columns.UUID(primary_key=True, required=True, default=uuid.uuid4)
    wikimart_id = columns.Integer(index=True)  # Convert to user defined type?
    yandex_id = columns.Integer(index=True)
Periodically (once a day) we update products from list.
Currently we have to use constructions like
if Product.filter(wikimart_id=external_id):
    p = Product.get(shop_id=shop_id, wikimart_id=external_id)
    d['product_id'] = p.product_id  # Setting key in dict from which model will be updated
Is this OK for Cassandra, or should we think about creating models that have external_id as the primary key for updating products?
Like:
class ProductWikimart(Model):
    """
    Wikimart Product Model
    """
    shop_id = columns.UUID(primary_key=True, required=True)
    wikimart_id = columns.Integer(primary_key=True)
    product_id = columns.UUID(index=True)

class ProductYandex(Model):
    """
    Yandex Product Model
    """
    shop_id = columns.UUID(primary_key=True, required=True)
    yandex_id = columns.Integer(primary_key=True)
    product_id = columns.UUID(index=True)
Which way is more preferable?
UPD This question is about generic modelling for NoSQL. Not only about cassandra :)
Maybe this article would be helpful for you.
I don't think product_id is a good candidate for a clustering key, because it changes relatively frequently. So I think the second version of the product model (with ProductWikimart and ProductYandex) would be better. But then you run into new problems: for instance, how do you match ProductWikimart and ProductYandex product ids?
Speaking of data modeling for Cassandra in general, there is the "Model Around Your Queries" rule. So to tell which table structure would be better, we need to know how the data will be queried.
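As a rough illustration of modelling around the query "find a product by shop and external id", here is a plain-Python sketch in which each dict plays the role of one table keyed by exactly what the query knows. The dict names and upsert_from_wikimart are hypothetical, and no Cassandra driver is involved.

```python
import uuid

# "Table" keyed like the base Product model: (shop_id, product_id) -> product data.
products = {}
# Lookup "table" keyed by what the daily import knows: (shop_id, wikimart_id) -> product_id.
by_wikimart_id = {}


def upsert_from_wikimart(shop_id, wikimart_id, data):
    """Create or update a product using only the external id as the entry point."""
    key = (shop_id, wikimart_id)
    product_id = by_wikimart_id.get(key)  # direct key lookup, no secondary index scan
    if product_id is None:
        product_id = uuid.uuid4()
        by_wikimart_id[key] = product_id
    products[(shop_id, product_id)] = dict(data, wikimart_id=wikimart_id)
    return product_id


shop_id = uuid.uuid4()
first_id = upsert_from_wikimart(shop_id, 42, {"name": "Kettle"})
second_id = upsert_from_wikimart(shop_id, 42, {"name": "Kettle v2"})
```

The trade-off is the one raised above: every write now touches two tables, and matching ids across several external sources needs one lookup table per source.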
I'm trying to use a modified version of the django-oscar import_oscar_catalogue class to import a bunch of products from a CSV, and on the first encounter of a product (defined by title), create a canonical parent product, and then for all future encounters create a child product under that parent product.
This seems to work, but the canonical product does not reflect the combined stock levels of the child product, nor display the correct attributes for that product. It does correctly list them as variations within the django dashboard though.
How can I programmatically create this child/parent relationship in products, with the correct stock records?
Relevant code:
def _create_item(self, upc, title, product_class, other_product_attributes):
    product_class, __ \
        = ProductClass.objects.get_or_create(name=product_class)
    try:
        parent = Product.objects.get(title=title)
        item = Product()
        item.parent = parent
    except Product.DoesNotExist:
        # Here is where I think it might need to be changed
        # Maybe pitem = ParentProduct() or something?
        pitem = Product()
        pitem.upc = upc
        pitem.title = title
        pitem.other_product_attributes = other_product_attributes
        # Here parent item is saved to db
        pitem.save()
        # Create item because no parent was found
        item = Product()
        parent = Product.objects.get(title=title)
        # Set parent
        item.parent = parent
    # Customize child attributes
    item.product_class = product_class
    item.title = title
    item.other_product_attributes = other_product_attributes
    # Save the child item
    item.save()
def _create_stockrecord(self, item, partner_name, partner_sku, price_excl_tax,
                        num_in_stock, stats):
    # Create partner and stock record
    partner, _ = Partner.objects.get_or_create(
        name=partner_name)
    try:
        stock = StockRecord.objects.get(partner_sku=partner_sku)
    except StockRecord.DoesNotExist:
        stock = StockRecord()
        stock.num_in_stock = 0
    # General attributes
    stock.product = item
    stock.partner = partner
    # SKU will be unique for every object
    stock.partner_sku = partner_sku
    stock.price_excl_tax = D(price_excl_tax)
    stock.num_in_stock += int(num_in_stock)
    # Save the object to database
    stock.save()
_create_stockrecord() creates a stock record with a stock of 1 for each unique item variation, but these variations' stock records don't carry over to the parent item.
EDIT:
I have updated the class with a method that explicitly calls ProductClass.objects.track_stock() against the ProductClass instance, and I'm calling it after looping through all rows of the CSV file (passing it the name of the one product class I use currently). However, when looking at the stock in dashboard, none of the child/variations stock is being counted against the parent.
def track_stock(self, class_name):
    self.logger.info("ProductClass name: %s" % class_name)
    product_class = ProductClass.objects.get_or_create(name=class_name)
    self.logger.info("ProductClass: %s" % str(product_class))
    self.logger.info("TrackStock: %s" % str(product_class[0].track_stock))
    product_class[0].track_stock = True
    self.logger.info("TrackStock: %s" % str(product_class[0].track_stock))
    product_class[0].save()
INFO Starting catalogue import
INFO - Importing records from 'sample_inventory.csv'
INFO - Flushing product data before import
INFO Parent items: 6, child items: 10
INFO ProductClass name: ClassName
INFO ProductClass: (<ProductClass: ClassName>, False)
INFO TrackStock: True
INFO TrackStock: True
I have checked the admin page, only 1 ProductClass is created, and it has the same name as is being passed to track_stock(). Is there something else that needs to be done to enable this feature? track_stock() documentation is kind of sparse. In the output, track_stock looks like it is true in both instances. Does it have to be False while the child_objects are created, and then flipped to True?
To be able to correctly reflect the stock levels for any Product you need to have a Partner that will supply the Product and then you need to have StockRecord that links the Partner and the Products together.
First make sure that you have all that information in the database for each one of your Product variations.
Then you need to update your ProductClass and set the "track_stock" attribute to True, since it is None by default.
You also need to remove the ProductClass from your child products since they inherit the ProductClass from their Parent Product.
EDIT 1:
To add attributes to a Product you have to add a ProductAttribute for the ProductClass and then you can set the attributes directly on the Product like this example.
EDIT 2:
You also need to set the "net_stock_level" on the StockRecord.
To get a more in depth look into how Oscar gets the stock levels look into Selector. This class determines which pricing, tax and stock level strategies to use which you might need to customize in the future if you want to charge tax or offer different pricing based on the user.
After some research from the test factory, I solved the issue by specifying
product.structure = 'parent'
on the parent object, and
product.structure = 'child'
on the child object. I also needed to change the custom attributes of my objects to a dict product_attributes, and then set each value on the object:
if product_attributes:
    for code, value in product_attributes.items():
        product_class.attributes.get_or_create(name=code, code=code)
        setattr(product.attr, code, value)
It was not necessary to create a stock record for each parent object, as they track the stock records of the child objects to which they are associated. It was also not necessary to set track_stock = True, as it is set to True by default when creating a Product().
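For anyone trying to picture the resulting shape, here is a plain-Python sketch of the import logic: the first row with a given title creates the parent, every row becomes a child under it, and only children carry stock. The dicts, the sample rows, build_catalogue and total_stock are all made up for illustration; none of this is Oscar's API or how Oscar computes stock internally.

```python
from collections import OrderedDict

# Made-up CSV rows; in the real import these come from the catalogue file.
rows = [
    {"title": "T-Shirt", "upc": "ts-red", "num_in_stock": "3"},
    {"title": "T-Shirt", "upc": "ts-blue", "num_in_stock": "2"},
    {"title": "Mug", "upc": "mug-1", "num_in_stock": "5"},
]


def build_catalogue(rows):
    """Group rows by title: one parent per title, one child per row."""
    parents = OrderedDict()
    for row in rows:
        title = row["title"]
        if title not in parents:
            # First encounter of this title: create the canonical parent.
            parents[title] = {"title": title, "structure": "parent", "children": []}
        parents[title]["children"].append({
            "upc": row["upc"],
            "structure": "child",
            "num_in_stock": int(row["num_in_stock"]),  # stock lives on the child only
        })
    return parents


def total_stock(parent):
    """A parent has no stock record of its own; sum over its children."""
    return sum(child["num_in_stock"] for child in parent["children"])


catalogue = build_catalogue(rows)
for parent in catalogue.values():
    print("%s: %d children, %d in stock"
          % (parent["title"], len(parent["children"]), total_stock(parent)))
```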
This answer was posted as an edit to the question Create a canonical "parent" product in Django Oscar programmatically by the OP Tui Popenoe under CC BY-SA 3.0.