I have this data:
Firefox 3.6
There are 3 items
name
max version
min version
I am storing it this way:
class MyModel(models.Model):
    browser_name = models.CharField(...)
    browser_max_version = models.IntegerField(...)
    browser_min_version = models.IntegerField(...)
or, alternatively:
class Browser(models.Model):
    name = models.CharField(...)
    max_version = models.IntegerField(...)
    min_version = models.IntegerField(...)

class MyModel(models.Model):
    browser = models.ForeignKey(Browser)
Is there any clever way to store the value in 1 field and make it parsable at the same time?
I know this might sound weird, but I wonder if there is any alternative to building 1 million models to represent data.
Any ideas? :)
You could make it parseable, but probably not indexable. For example, you could concatenate the values together separated by semicolons (or some other character), then simply split the string to get the values back. "Firefox 3.6" would become "Firefox;3;6". While this is somewhat easier to parse, it doesn't provide much of an advantage over the original formatting.
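A minimal sketch of the delimiter approach in plain Python. The helper names pack_browser and unpack_browser are made up for illustration, and the sketch assumes the browser name never contains a semicolon:

```python
SEPARATOR = ";"


def pack_browser(name, max_version, min_version):
    """Join the three values into one string, e.g. 'Firefox;3;6'."""
    if SEPARATOR in name:
        raise ValueError("name must not contain the separator")
    return SEPARATOR.join([name, str(max_version), str(min_version)])


def unpack_browser(packed):
    """Split the stored string back into (name, max_version, min_version)."""
    name, max_version, min_version = packed.split(SEPARATOR)
    return name, int(max_version), int(min_version)


packed = pack_browser("Firefox", 3, 6)
```

The packed string can live in a single CharField, but every read has to go through unpack_browser before the parts are usable.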
The big caveat with this approach is that the column wouldn't be indexable in a very granular way. For example, you couldn't ask for all versions of Firefox. PostgreSQL allows for some very advanced indexing which, I believe, would allow you to create the required indexes, but I don't know of any way you could access the indexes via Django's ORM.
What is the purpose of MyModel in the second example? The one table Browser is all you need. Why on earth would you need 'millions' of models? Or are you talking about rows in a table?
class Browser(models.Model):
    name = models.CharField(...)
    max_version = models.IntegerField(...)
    min_version = models.IntegerField(...)
is fine.
Let's say I want to ask a User a Question: "Order the following animals from biggest to smallest". Here's a little simplified django:
class Question(models.Model):
    text = models.CharField()  # e.g. "Order the following animals..."

class Image(models.Model):
    image = models.ImageField()  # pictures of animals
    fk_question = models.ForeignKey(Question)
Now I can assign a variable number of Images to each Question, and customize the question text. Yay.
What would be the appropriate way to record the responses? Obviously I'll need foreign keys to the User and the Question:
class Response(models.Model):
    fk_user = models.ForeignKey(User)
    fk_question = models.ForeignKey(Question)
But now I'm stuck. How do I elegantly record the order of the Image objects that this User specified?
Edit: I'm using Postgres 9.5
I am generally strongly opposed to storing comma separated data in a column. However this seems like an exception to the rule! May I propose CommaSeparatedIntegerField?
class CommaSeparatedIntegerField(max_length=None, **options)
A field of integers separated by commas. As in CharField, the max_length argument is required and the note about database portability mentioned there should be heeded.
This is essentially a CharField, so the order in which you enter the values is preserved in the database.
You haven't mentioned your database. If you are fortunate enough to be on PostgreSQL and using Django 1.9, you can use ArrayField as well.
Using ArrayField would be much better, because it removes the conversion back and forth between strings and lists. The case against comma-separated fields is that searching is hard and you can't easily pull the Nth element. PostgreSQL arrays remove the latter difficulty.
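To make the trade-off concrete, here is a sketch of the string round trip that a comma-separated column forces on you and that an array column avoids. The function names and the image ids are invented for illustration:

```python
def encode_order(image_ids):
    """Turn [7, 2, 5] into the string '7,2,5' for a character column."""
    return ",".join(str(int(i)) for i in image_ids)


def decode_order(stored):
    """Turn '7,2,5' back into [7, 2, 5]; an empty string means no answer."""
    return [int(part) for part in stored.split(",")] if stored else []


stored = encode_order([7, 2, 5])
# Pulling the Nth element requires decoding the whole string first.
second_choice = decode_order(stored)[1]
```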
I want to create a database of dislike items where each category of item has its own columns, which I'd like to show when all you're looking at is that category, say cars. In fact, I'd like the columns to be dynamic based on the category, so we can easily add a property to cars in the future and have that column show up too.
For example:
But when you filter on car or person, additional rows show up for filtering.
All the examples that I can find about using django models aren't giving me a very clear picture on how I might accomplish this behavior in a clean, simple web interface.
I would probably go for a model describing a "dislike criterion":
class DislikeElement(models.Model):
    item = models.ForeignKey(Item)  # Item is the model corresponding to your first table
    field_name = models.CharField()  # e.g. "Model", "Year born"...
    value = models.CharField()  # e.g. "Mustang", "1960"...
You would have quite a lot of flexibility in what data you can retrieve. For example, to get all the dislike elements for a given item, you would just have to do something like item.dislikeelement_set.all().
The only problem with this solution is that you would have to store numbers, strings, dates... in value under the same data type. But maybe that's not an issue for you.
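As a rough illustration of how such rows could drive dynamic columns, here is a plain-Python sketch. The tuples stand in for DislikeElement rows, and both the sample data and the pivot helper are made up:

```python
from collections import defaultdict

# Stand-ins for DislikeElement rows: (item, field_name, value).
rows = [
    ("Ford", "Model", "Mustang"),
    ("Ford", "Year", "1967"),
    ("Bob", "Year born", "1960"),
]


def pivot(rows):
    """Group rows per item and collect the column names in first-seen order."""
    table = defaultdict(dict)
    columns = []
    for item, field_name, value in rows:
        table[item][field_name] = value
        if field_name not in columns:
            columns.append(field_name)
    return columns, dict(table)


columns, table = pivot(rows)
```

Adding a new property to cars then means adding rows, not changing the schema; the column list grows on its own.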
Given a class:
from django.db import models

class Person(models.Model):
    name = models.CharField(max_length=20)
Is it possible, and if so how, to have a QuerySet that filters based on dynamic arguments? For example:
# Instead of:
Person.objects.filter(name__startswith='B')
# ... and:
Person.objects.filter(name__endswith='B')
# ... is there some way, given:
filter_by = '{0}__{1}'.format('name', 'startswith')
filter_value = 'B'
# ... that you can run the equivalent of this?
Person.objects.filter(filter_by=filter_value)
# ... which will throw an exception, since `filter_by` is not
# an attribute of `Person`.
Python's argument expansion may be used to solve this problem:
kwargs = {
    '{0}__{1}'.format('name', 'startswith'): 'A',
    '{0}__{1}'.format('name', 'endswith'): 'Z'
}
Person.objects.filter(**kwargs)
This is a very common and useful Python idiom.
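For readers new to the idiom, this sketch shows what the ** expansion does without needing a database. show_kwargs is a hypothetical stand-in for QuerySet.filter(), and build_lookup is an invented helper:

```python
def build_lookup(field, operation, value):
    """Return a one-entry dict such as {'name__startswith': 'B'}."""
    return {"{0}__{1}".format(field, operation): value}


def show_kwargs(**kwargs):
    """Stand-in for QuerySet.filter(): just reports what it received."""
    return kwargs


lookup = build_lookup("name", "startswith", "B")
# The dict keys become keyword-argument names at call time.
received = show_kwargs(**lookup)
```

With Django, Person.objects.filter(**lookup) would then be equivalent to Person.objects.filter(name__startswith='B').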
A simplified example:
In a Django survey app, I wanted an HTML select list showing registered users. But because we have 5000 registered users, I needed a way to filter that list based on query criteria (such as just people who completed a certain workshop). In order for the survey element to be re-usable, I needed for the person creating the survey question to be able to attach those criteria to that question (don't want to hard-code the query into the app).
The solution I came up with isn't 100% user friendly (requires help from a tech person to create the query) but it does solve the problem. When creating the question, the editor can enter a dictionary into a custom field, e.g.:
{'is_staff':True,'last_name__startswith':'A',}
That string is stored in the database. In the view code, it comes back in as self.question.custom_query . The value of that is a string that looks like a dictionary. We turn it back into a real dictionary with eval() and then stuff it into the queryset with **kwargs:
kwargs = eval(self.question.custom_query)
user_list = User.objects.filter(**kwargs).order_by("last_name")
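One caveat: eval() executes whatever expression is stored in that field, so anyone who can edit a question can run arbitrary code on the server. A more defensive variant, sketched here as a suggestion rather than as the code above, uses ast.literal_eval from the standard library, which accepts only Python literals. The function name parse_custom_query is invented:

```python
import ast


def parse_custom_query(text):
    """Parse a dict literal without executing arbitrary code."""
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError) as exc:
        raise ValueError("custom query is not a valid literal") from exc
    if not isinstance(value, dict):
        raise ValueError("custom query must be a dictionary")
    return value


kwargs = parse_custom_query("{'is_staff':True,'last_name__startswith':'A',}")
```

The resulting dictionary can be passed to filter(**kwargs) exactly as before.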
To extend the previous answer, which drew some requests for further code, here is some working code that I use with Q. Let's say a request may or may not filter on fields like:
publisher_id
date_from
date_until
Those fields can appear in the query, but they may also be absent.
This is how I build filters based on those fields for an aggregated query that cannot be filtered further after the initial queryset execution:
from django.db.models import Q

# prepare filters to apply to queryset
filters = {}
if publisher_id:
    filters['publisher_id'] = publisher_id
if date_from:
    filters['metric_date__gte'] = date_from
if date_until:
    filters['metric_date__lte'] = date_until
filter_q = Q(**filters)
queryset = Something.objects.filter(filter_q)...
Hope this helps since I've spent quite some time to dig this up.
Edit:
As an additional benefit, you can use lists too. For the previous example, if instead of publisher_id you have a list called publisher_ids, then you could use this piece of code:
if publisher_ids:
    filters['publisher_id__in'] = publisher_ids
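The same logic can be wrapped in a small function so it is easy to test without a database. This is only a sketch: build_filters is an invented name, and the lookup keys are copied from the snippets above:

```python
def build_filters(publisher_id=None, publisher_ids=None,
                  date_from=None, date_until=None):
    """Collect only the lookups whose value was actually supplied."""
    filters = {}
    if publisher_id:
        filters["publisher_id"] = publisher_id
    if publisher_ids:
        filters["publisher_id__in"] = publisher_ids
    if date_from:
        filters["metric_date__gte"] = date_from
    if date_until:
        filters["metric_date__lte"] = date_until
    return filters


filters = build_filters(publisher_ids=[3, 8], date_from="2016-01-01")
```

An empty result simply means Q(**filters) matches everything, so the queryset is left unfiltered.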
django.db.models.Q is exactly what you want, done the Django way.
This looks much more understandable to me:
kwargs = {
    'name__startswith': 'A',
    'name__endswith': 'Z',
    # (Add more filters here)
}
Person.objects.filter(**kwargs)
A really complex search form usually indicates that a simpler model is trying to dig its way out.
How, exactly, do you expect to get the values for the column name and operation?
Where do you get the values of 'name' and 'startswith'?
filter_by = '%s__%s' % ('name', 'startswith')
A "search" form? You're going to -- what? -- pick the name from a list of names? Pick the operation from a list of operations? While open-ended, most people find this confusing and hard-to-use.
How many columns have such filters? 6? 12? 18?
A few? A complex pick-list doesn't make sense. A few fields and a few if-statements make sense.
A large number? Your model doesn't sound right. It sounds like the "field" is actually a key to a row in another table, not a column.
Specific filter buttons. Wait... That's the way the Django admin works. Specific filters are turned into buttons. And the same analysis as above applies. A few filters make sense. A large number of filters usually means a kind of first normal form violation.
A lot of similar fields often means there should have been more rows and fewer fields.
I am trying to create a Django model which has as one of its fields a reference to some sort of Python type, which could be either an integer, string, date, or decimal.
class MyTag(models.Model):
    name = models.CharField(max_length=50)
    object = (what goes here??)
I know that if I want a foreign key to any other model, I can use GenericForeignKeys and content_types. How can I have a model field that references any python type? The only idea I have come up with so far is to create models that are simple wrappers on objects and use GenericForeignKeys.
Is there any way to do this?
Since you want to filter, you would need some kind of DB support for your field, and a JSON field won't be that valuable for you. You can use solutions of different complexity according to your actual need. One suggestion is to serialize your data to a string. Pad with enough zeros for string/integer/float sorting. Now you can filter everything as strings (make sure you pad the value you are filtering by as well). Add a data_type column for fetching the right Python object.
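from datetime import date
from decimal import Decimal
from django.db import models

# Maps each supported Python type to the integer code stored in data_type.
TYPES = [(int, 1), (Decimal, 2), (date, 3), (str, 4)]

class MyTag(models.Model):
    name = models.CharField(max_length=50)
    # Django choices must be (stored value, label) pairs, so derive them from TYPES.
    data_type = models.IntegerField(choices=[(code, t.__name__) for t, code in TYPES])
    value = models.CharField(max_length=100)

    def set_the_value(self, value):
        choices = dict(TYPES)
        self.data_type = choices[type(value)]
        if type(value) is int:
            self.value = "%010d" % value
        # else... repeat for other data types

    def get_the_value(self):
        choices = dict([(y, x) for x, y in TYPES])
        # Works for int, Decimal and str; date needs its own parsing step.
        return choices[self.data_type](self.value)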
(Disclaimer: this is a hack, but probably not the worst one).
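Why the padding matters can be seen in a few lines of plain Python. This is a sketch with made-up numbers; pad_int is an invented helper, and the scheme as written only covers non-negative integers of up to 10 digits:

```python
def pad_int(value):
    """Zero-pad a non-negative integer so text order equals numeric order."""
    if value < 0:
        raise ValueError("this simple scheme does not handle negatives")
    return "%010d" % value


numbers = [900, 7, 45]
# Without padding, strings sort character by character: '45' < '7' < '900'.
unpadded = sorted(str(n) for n in numbers)
# With padding, the lexicographic order matches the numeric order.
padded = sorted(pad_int(n) for n in numbers)
```

The same reasoning applies to database comparisons on the value column, which is why the value you filter by has to be padded too.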
JSONField would work, but it won't handle Decimal as is.
If you don't need to filter through this field you could either use JSONField or you could pickle your objects like this
The second approach would allow you to store nearly any Python data type, though it would be usable only from Python code.
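One possible shape for the pickling approach, sketched with the standard library only. The helper names to_text and from_text are invented, and the base64 step is an assumption made so the result fits in an ordinary text column:

```python
import base64
import pickle
from datetime import date
from decimal import Decimal


def to_text(obj):
    """Pickle any Python object and encode it as ASCII text for a text column."""
    return base64.b64encode(pickle.dumps(obj)).decode("ascii")


def from_text(text):
    """Reverse of to_text(); only call this on data you stored yourself."""
    return pickle.loads(base64.b64decode(text))


stored = to_text(Decimal("19.99"))
restored = from_text(stored)
```

Unpickling can execute code, so this is only reasonable when the column is never filled from untrusted input.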
If you need to filter through this data you should just create separate fields with one data type.
I've built a product database that is divided into 3 parts, and each part has a "sub" part containing labels. But the more I work with it, the more unstable it feels, and each addition I make takes more and more code to get working.
A product is built of parts, and each part is of a type. Each product, part and type has a label. And there's a label for each language.
A product contains parts in 2 lists: one list for default parts (one of each type) and one for optional parts.
Now I want to add currency in the mix and have come to the decision to re-model the entire way I handle this.
The result I want to get is a list of all product objects that contains the name, description, price, all parts and all types that match the parts. And for these the correct language labels.
Like so:
product
- name
- description (by language)
- price (by currency)
- parts
- part (type name and part name by language)
- partPrice (by currency)
The problem with my current setup is that it is a wild mix of db.ReferenceProperty and db.ListProperty(db.Key).
Getting all the data is a bit of a hassle that requires multiple for-loops, dict matching and datastore calls. Well, it's a bit of a mess.
The re-model (untested) looks like this:
class Products(db.Model):
    name = db.StringProperty()
    imageUrl = db.StringProperty()
    optionalParts = db.ListProperty(db.Key)
    defaultParts = db.ListProperty(db.Key)
    active = db.BooleanProperty(default=True)

    @property
    def itemId(self):
        return self.key().id()

class ProductPartTypes(db.Model):
    name = db.StringProperty()

    @property
    def itemId(self):
        return self.key().id()

class ProductParts(db.Model):
    name = db.StringProperty()
    type = db.ReferenceProperty(ProductPartTypes)
    imageUrl = db.StringProperty()
    parts = db.ListProperty(db.Key)

    @property
    def itemId(self):
        return self.key().id()

class Labels(db.Model):
    key = db.StringProperty()  # want to store a key here
    language = db.StringProperty()
    label = db.StringProperty()

class Price(db.Model):
    key = db.StringProperty()  # want to store a key here
    language = db.StringProperty()
    price = db.IntegerProperty()
The major thing here is that I've split the Labels and Price out, so these can contain labels and prices for any products, parts or types.
So what I am curious about is: is this a solid solution from an architectural point of view? Will this hold even if there are thousands of entries in each model?
Also, any tips for retrieving data in a good manner are welcome. My current solution of getting all the data first, for-looping over it and sticking it in dicts works, but feels like it could fail any minute.
..fredrik
You need to keep in mind that App Engine's datastore requires you to rethink your usual way of designing databases. It goes against intuition at first but you must denormalize your data as much as possible if you want your application to be scalable. The datastore has been designed this way.
The approach I usually take is to consider first what kind of queries will need to be done in different use cases, e.g. what data do I need to retrieve at the same time? In what order? What properties should be indexed?
If I understand correctly, your main goal is to fetch a list of products with complete details. BTW, if you have other query scenarios, i.e. filtering on price, type, etc., you should take them into account too.
In order to fetch all the data you need with only one query, I suggest you create one model, which could look like this:
class ProductPart(db.Model):
    product_name = db.StringProperty()
    product_image_url = db.StringProperty()
    product_active = db.BooleanProperty(default=True)
    product_description = db.StringListProperty(indexed=False)  # Contains product description in all languages
    part_name = db.StringProperty()
    part_image_url = db.StringProperty()
    part_type = db.StringListProperty(indexed=False)  # Contains part type in all languages
    part_label = db.StringListProperty(indexed=False)  # Contains part label in all languages
    part_price = db.ListProperty(float, indexed=False)  # Contains part price in all currencies
    part_default = db.BooleanProperty()
    part_optional = db.BooleanProperty()
About this solution:
- ListProperties are set to indexed=False in order to avoid exploding indexes if you don't need to filter on them.
- In order to get the right description, label or type, you will have to set list values always in the same order. For example: part_label[0] is English, part_label[1] is Spanish, etc. Same idea for prices and currencies.
- After fetching entities from this model you will have to do some in-memory manipulations in order to get the data nicely structured the way you want, maybe in a new dictionary.
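The in-memory step could look roughly like the sketch below. A plain dict stands in for one fetched ProductPart entity, and the LANGUAGES and CURRENCIES orderings, the sample values and the localize helper are all assumptions made for illustration:

```python
LANGUAGES = ["en", "es"]      # assumed fixed order for every per-language list
CURRENCIES = ["USD", "EUR"]   # assumed fixed order for every price list


def localize(entity, language, currency):
    """Pick the list slots matching the requested language and currency."""
    lang = LANGUAGES.index(language)
    cur = CURRENCIES.index(currency)
    return {
        "product": entity["product_name"],
        "description": entity["product_description"][lang],
        "part": entity["part_label"][lang],
        "type": entity["part_type"][lang],
        "price": entity["part_price"][cur],
    }


# A plain dict standing in for one fetched ProductPart entity.
entity = {
    "product_name": "Bike",
    "product_description": ["A city bike", "Una bicicleta urbana"],
    "part_label": ["Front wheel", "Rueda delantera"],
    "part_type": ["Wheel", "Rueda"],
    "part_price": [25.0, 23.5],
}
view = localize(entity, "es", "EUR")
```

Because the position in each list is the only link to a language or currency, the ordering has to be defined in one place and never change once data has been written.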
Obviously, there will be a lot of redundancy in the datastore with such a design - but that's okay, since it allows you to query the datastore in a scalable fashion.
Besides, this is not meant as a replacement for the architecture that you had in mind, but rather an additional Model designed specifically for the user-facing kind of queries that you need to do, ie. retrieving lists of complete product/parts information.
These ProductPart entities could be populated by background tasks, replicating data located in your other normalized entities which would be the authoritative data source. Since you have plenty of data storage on App Engine, this should not be a problem.
IMO your design mostly makes sense. I came up with almost the same design after reading your problem statement, with a few differences:
I had prices with Product and ProductPart, not as a separate table.
The other difference was part_types. If there are not many part types, you can simply have them as a Python list/tuple.
part_types = ('wheel', 'break', 'mirror')
It also depends on the kind of queries you are anticipating. If many of the queries are price calculations (independent of the rest of the product and part info), then it might make sense to design it the way you have done.
You have mentioned that you will get all the data first. Isn't querying possible? If you load the whole data set into your app and then sort/filter in Python, it will be slow. Which database are you considering? To me MongoDB looks like a good option here.
Finally, why are you suspicious about even 1000 records? You can run a few tests on your db beforehand.
Bests