Today I was refactoring some code and revisited an old friend, an Address class (see below). It occurred to me that, in our application, we don't do anything special with addresses: no queries, only lightweight validation and frequent serialization to JSON. The only "useful" properties from a developer's point of view are the label and person.
So, I considered refactoring the Address model to use a custom AddressProperty (see below), which strikes me as a good thing to do, but off the top of my head I don't see any compelling advantages.
Which approach would you choose, why, and what tradeoffs guide that decision?
# a lightweight Address for form-based CRUD
# many-to-one relationship with Person objects
class Address(db.Model):
    label = db.StringProperty()
    person = db.ReferenceProperty(collection_name='addresses')
    address1 = db.StringProperty()
    address2 = db.StringProperty()
    city = db.StringProperty()
    zipcode = db.StringProperty()
# an alternate representation -- worthwhile? tradeoffs?
class Address(db.Model):
    label = db.StringProperty()
    person = db.ReferenceProperty(collection_name='addresses')
    details = AddressProperty()  # like db.PostalAddressProperty, with more methods
Given that you don't need to query on addresses, and given that they tend to be fairly small (as opposed to, say, a large binary blob), I would suggest going with the latter. It'll save space and time when fetching; the only real downside is that you have to implement the property yourself.
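For reference, the custom property is not much work. Here is a minimal sketch of what an AddressProperty could look like with the classic db API, serializing the address fields to JSON in a single text value; the AddressDetails class and its fields are illustrative, not a prescribed design:

import json
from google.appengine.ext import db

class AddressDetails(object):
    def __init__(self, address1='', address2='', city='', zipcode=''):
        self.address1 = address1
        self.address2 = address2
        self.city = city
        self.zipcode = zipcode

class AddressProperty(db.Property):
    data_type = AddressDetails

    def get_value_for_datastore(self, model_instance):
        # Serialize the whole address to one JSON text value for storage
        details = super(AddressProperty, self).get_value_for_datastore(model_instance)
        return db.Text(json.dumps(details.__dict__)) if details else None

    def make_value_from_datastore(self, value):
        # Rebuild the AddressDetails object when the entity is fetched
        return AddressDetails(**json.loads(value)) if value is not None else None

    def validate(self, value):
        value = super(AddressProperty, self).validate(value)
        if value is not None and not isinstance(value, AddressDetails):
            raise db.BadValueError('Expected AddressDetails, got %r' % (value,))
        return value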
If you wanted to be a little different, you could always store the data in one table as structures and then have another table for lookups and metadata.
I'm a self-taught programmer and a lot of the problems I encounter come from a lack of formal education (and often also experience).
My question is the following: how do you decide where to store the data a class or function creates? I'll give a simple example:
Case: I have a webshop (SHOP) with a REST API and a product provider (PROVIDER), also with a REST API. I determine the product and send that data to PROVIDER, which sends me back formatted data that SHOP can read to make a working product on the webshop. PROVIDER also has a secondary REST API that provides generated images.
What I would come up with:
I'd make three classes: ProductBase, Shop and Provider
ProductBase would be the class from which I instantiate and store the individual product information.
Shop would be where I design the API interactions with the webshop.
Provider would be the same as Shop, but for interactions with the provider API.
My problem: at some point you're creating data that's not clearly separated in concern. For example: would I store the generated product data (from PROVIDER) in the ProductBase instance I created? It feels like I'm coupling the two classes this way. But if not there, then where?
What if I create product images with PROVIDER and upload them to SHOP? Do I store the uploaded image URL in the product? How do you keep track of all this info?
The question I want answered:
I've read a lot on OOP and design patterns, and I have adopted a TDD approach which has greatly helped to improve my code, but I haven't found anything on how to approach the flow of runtime-generated data within software engineering.
What would be a good way to solve the above problem(s), and could you explain your rationale for it?
If I understand correctly, I think your current concern is that you have "raw" product data, which you want to store in objects, and you have "processed" (formatted) product data, which you also want to store in objects. Your question is whether you should mix them.
Let me first point out the other obvious option: having two product classes, RawProduct and ProcessedProduct. Which should you do?
(Edit: also, to be sure, product data should not be stored in Provider. The provider performs the action of formatting, but the data is product data, not provider data.)
It depends. There are a few considerations:
1) In general, in OOP, the idea is to couple actions on data with the data. So if possible, you have some method in ProductBase like format(), which sends the object off to the API to get formatted and stores the result in an instance variable. You can then also have a method like find_image() that fetches the image URL from the API and stores that in a field. An object's data is meant to be dynamic; it is meant to be altered by object methods. (See the sketch after this list.)
2) If you need version control (if you want the full history of the object's state to be available), then you can't overwrite fields with new data. So either you need to store a history of every object field in the object, or you need to create new objects.
3) Is RAM a concern? I sometimes create dataclasses that store only the final part of an object's life so that I can fit more of the objects into memory.
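To illustrate point 1, here is a rough sketch of coupling the actions with the data; the provider methods format_product and generate_image are stand-ins for whatever wraps the PROVIDER API, not a prescribed design:

class ProductBase:
    def __init__(self, raw_data, provider):
        self.raw_data = raw_data
        self.provider = provider        # collaborator wrapping the PROVIDER REST API
        self.formatted_data = None
        self.image_url = None

    def format(self):
        # Send the raw data to the provider and keep the formatted result
        self.formatted_data = self.provider.format_product(self.raw_data)

    def find_image(self):
        # Fetch the generated image URL and remember it on the product
        self.image_url = self.provider.generate_image(self.raw_data)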
Personally, I often find myself creating "RawObject" and "ProcessedObject" classes; it's just easier a lot of the time. But that's probably because I mostly work with document processing, where the split is very clear. Usually you'll just update the object's data.
A benefit of having one object with the full history is that it is much easier to debug, because the raw data and the API result are in the same object, so you can very easily probe what went wrong. If you start splitting things up, it's harder to track. In general, the more information an object has about where it's been, the easier it is to figure out what went wrong with it.
Remember also, though, since this is a Python question, that Python is multi-paradigm. If you're writing pipeline-style architectures (synchronous, linear processes), then a functional approach can also work well.
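For example, a functional version of your flow might look like this, where each step is a plain function and data just passes through; the step methods here are placeholders:

def process_product(raw_data, provider, shop):
    # Each step takes data in and returns data out; no object state to manage
    formatted = provider.format_product(raw_data)
    image_url = provider.generate_image(raw_data)
    return shop.upload_product(formatted, image_url)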
Once your data is stored in a product object, anything can hold a reference to it: the shop can reference the object, and so can the provider. Be clear on the difference between "has-a" relationships and "is-a" relationships.
As a caveat: I am an utter novice here. I wouldn't be surprised to learn that a) this has already been answered, but I can't find it because I lack the vocabulary to describe my problem, or b) my question is basically silly to begin with, because what I want to do is silly.
Is there some way to store a reference to a class instance that is defined and kept in active memory, not stored in NDB? I'm trying to write an app that would help manage a number of characters/guilds in an MMO. I have a class, CharacterClass, that includes properties such as armor, name, etc. I define it in main.py as a plain Python object, and then define the properties for each of the classes in the game. Each Character, which would be stored in the Datastore, would have a property charClass, which would be a reference to one of those instances of CharacterClass. In theory I would be able to do things like
if character.charClass.armor == "Cloth":
while storing the potentially hundreds of unique characters and their specific data in the Datastore, but without creating a copy of "Cloth" for every cloth-armor character, or querying the Datastore thousands of times a day for what kind of armor a mage wears.
I don't know what kind of NDB property to use in Character to store the reference to the applicable CharacterClass, or whether that's even the right way to do it. Thanks for taking the time to puzzle through my confused question.
A string is all you need. You just need to fetch the class based on the string value. You could create a custom property that automatically instantiates the class on reference.
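A minimal sketch of the string-based approach, assuming a module-level registry of CharacterClass instances (the registry and the example classes are illustrative):

from google.appengine.ext import ndb

class CharacterClass(object):
    def __init__(self, name, armor):
        self.name = name
        self.armor = armor

# Built once at module load; lives in instance memory, not the Datastore
CHARACTER_CLASSES = {
    'mage': CharacterClass('mage', 'Cloth'),
    'warrior': CharacterClass('warrior', 'Plate'),
}

class Character(ndb.Model):
    name = ndb.StringProperty()
    char_class_name = ndb.StringProperty()

    @property
    def charClass(self):
        # Resolve the in-memory instance from the stored string
        return CHARACTER_CLASSES[self.char_class_name]

With that in place, character.charClass.armor == "Cloth" works as in your example, without duplicating the class data on every Character entity.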
However, I have a feeling that hard-coding the values in code might be a bit unwieldy. Maybe your character class instances should be datastore entities as well; that way you can adjust these parameters without deploying new code.
If you want these objects in memory, you can pre-cache them on warmup.
I'm working on creating a Python program to interact with many different types of conceptual objects. For example, it might represent a person, in which case it'd have something like this:
type = "person"
name = "Bono"
profession = "performer"
nationality = "Irish"
However, it might also represent a magazine, in which case it'd look something like this:
type = "publication"
name = "Rolling Stone"
editor = ("Jann Wenner", "Will Dana")
founding_year = "1967"
Aside from type and name, all of the other fields are optional. Here's the tricky bit: it's part of code written for a scraper, so all of the other fields are determined/created on the fly. In other words, we won't know that we need an "editor" field until the scraper spits back "editor" to the code.
Ideally, this would be implemented fairly straightforwardly as a Python dictionary of lists. However, we'll be working with a large number of records, too many to keep in memory at the same time. As a result, I'd like to have database compatibility, something like Django's MVC, so we can easily query the record set.
One option I had considered was Django fieldsets, but it looks like they're still in beta, and I worry that I'll lose some generality in what I can store; ideally, I'd be able to store any type of data with a key, (value_list) pair. I'd love any input on the feasibility of fieldsets or example code.
Another option I had considered was a combination of the Django MVC and JSON. In this case, I'd have three columns for each object: type, name, and attributes. Attributes would be a JSON serialization (or other appropriate pickling method) of all of the other attributes, so that once you had the object, you could reconstitute its attributes and query the set. I'd store something like this or this (links). With this method, I'd lose the ability to easily do a search for any of the attributes in the dict.
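Roughly, I picture that second option looking something like this (field names are just for illustration):

import json
from django.db import models

class Record(models.Model):
    type = models.CharField(max_length=64)
    name = models.CharField(max_length=255)
    attributes = models.TextField(default='{}')  # JSON-serialized dict of everything else

    def get_attributes(self):
        return json.loads(self.attributes)

    def set_attributes(self, attrs):
        self.attributes = json.dumps(attrs)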
I'd very much appreciate any input or guidance. If anyone knows of similar projects, I'd love to know.
This seems like an excellent opportunity to use NoSQL databases. Something like MongoDB doesn't rely on a fixed schema, so it might be suitable for your scenario.
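For instance, with pymongo (assuming a local MongoDB instance; the database and collection names here are arbitrary), documents in the same collection can carry completely different fields:

from pymongo import MongoClient

client = MongoClient('localhost', 27017)
records = client.scraper_db.records

records.insert_one({'type': 'person', 'name': 'Bono',
                    'profession': 'performer', 'nationality': 'Irish'})
records.insert_one({'type': 'publication', 'name': 'Rolling Stone',
                    'editor': ['Jann Wenner', 'Will Dana'],
                    'founding_year': '1967'})

# Any attribute is queryable, even ones absent from other documents
for doc in records.find({'editor': 'Jann Wenner'}):
    print(doc['name'])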
In my app, I have the following process:
1. Get a very long list of people
2. Create an entity for each person
3. Send an email to each person (step 2 must be completed before step 3 starts)
Because the list of people is very large, I don't want to put them in the same entity group.
In doing step 3, I can query the list of people like this:
Person.all()
Because of eventual consistency, I might miss some people in step 3. What is a good way to ensure that I am not missing anyone in step 3?
Is there a better solution than this?
while Person.all().count() < N:
    pass
for p in Person.all():
    pass  # do whatever
EDIT:
Another possible solution came to mind: I could create a linked list of the people. I can store a link to the first one, who links to the second one, and so on. It seems the performance would be poor, however, because you'd be doing each get separately and wouldn't have the efficiencies of a query.
UPDATE: I reread your post and saw that you don't want to put them all in the same entity group. I'm not sure how to guarantee strong consistency without doing so. You might want to restructure your data so that you don't have to put them all in one entity group, but can split them into several, perhaps depending on some aspect of a group of Person entities (e.g., the mailing list they are on, or the type of email being sent). Does each Person contain only a name and an email address, or are there other properties involved?
Google suggests a few other alternatives:
If your application is likely to encounter heavier write usage, you may need to consider using other means: for example, you might put recent posts in a memcache with an expiration and display a mix of recent posts from the memcache and the Datastore, or you might cache them in a cookie, put some state in the URL, or something else entirely. The goal is to find a caching solution that provides the data for the current user for the period of time in which the user is posting to your application. Remember, if you do a get, a put, or any operation within a transaction, you will always see the most recently written data.
So it looks like you may want to investigate those possibilities, although I'm not sure how well they would translate to what your app needs.
ORIGINAL POST: Use ancestor queries.
From Google's "Structuring Data for Strong Consistency":
To obtain strongly consistent query results, you need to use an ancestor query limiting the results to a single entity group. This works because entity groups are a unit of consistency as well as transactionality. All data operations are applied to the entire group; an ancestor query won't return its results until the entire entity group is up to date. If your application relies on strongly consistent results for certain queries, you may need to take this into consideration when designing your data model. This page discusses best practices for structuring your data to support strong consistency.
So when you create a Person entity, set a parent for it. I believe you could even just have a specific entity be the "parent" of all the others, and it should give you strong consistency. (Although I like to structure my data a bit with ancestors anyway.)
# Gives you the ancestor key
def ancestor_key(kind, id_or_name):
    return db.Key.from_path(kind, id_or_name)

# kind is the db model you're using (should be 'Person' in this case) and
# id_or_name should be the key id or name for the parent
new_person = Person(your_params, parent=ancestor_key('Kind', id_or_name))
You could even do queries at that point for all the entities with the same parent, which is nice. But that should help you get more consistent results regardless.
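A sketch of what step 3 could then look like, reusing ancestor_key from above; the parent kind and name are made up, and send_email stands in for your step 3 logic:

# All Person entities created with this parent form one entity group
parent = ancestor_key('PersonList', 'default')

# An ancestor query is strongly consistent within that group
for person in Person.all().ancestor(parent):
    send_email(person)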
I'm creating a small website with Django, and I need to calculate statistics with data taken from several tables in the database.
For example (nothing to do with my actual models), for a given user, let's say I want all birthday parties he has attended, and people he spoke with in said parties. For this, I would need a wide query, accessing several tables.
Now, from the object-oriented perspective, it would be great if the User class implemented a method that returned that information. From a database model perspective, I don't like at all the idea of adding functionality to a "row instance" that needs to query other tables. I would like to keep all properties and methods in the Model classes relevant to that single row, so as to avoid scattering the business logic all over the place.
How should I go about implementing database-wide queries that, from an object-oriented standpoint, belong to a single object? Should I have an external kind of God object that knows how to collect and organize this information? Or is there a better, more elegant solution?
I recommend extending Django's Model-Template-View approach with a controller. I usually have a controller.py within my apps that is the only interface to the data sources. So in your case above, I'd have something like get_all_parties_and_people_for_user(user).
This is especially useful when your "data taken from several tables in the database" becomes "data taken from several tables in SEVERAL databases" or even "data taken from various sources, e.g. databases, cache backends, external APIs, etc.".
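A minimal sketch of such a controller, with made-up model names and relations:

# controller.py
from myapp.models import Party, User

def get_all_parties_and_people_for_user(user):
    # The wide, multi-table query lives here, not on the models
    parties = Party.objects.filter(kind='birthday', attendees=user)
    people = (User.objects
              .filter(conversations__party__in=parties)
              .exclude(pk=user.pk)
              .distinct())
    return parties, people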
User.get_attended_birthday_parties() or Event.get_attended_parties(user) work fine: each is an interface that makes sense when you use it. Creating an additional "all-purpose" object will not make your code cleaner or easier to maintain.
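For example (again with made-up relations), such a method can delegate to the ORM without turning anything into a God object:

from django.db import models

class Party(models.Model):
    kind = models.CharField(max_length=32)
    attendees = models.ManyToManyField('User', related_name='parties')

class User(models.Model):
    name = models.CharField(max_length=100)

    def get_attended_birthday_parties(self):
        # A wide query, but exposed through a method that reads naturally
        return self.parties.filter(kind='birthday')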