Move or copy an entity to another kind - python

Is there a way to move an entity to another kind in App Engine?
Say you have a kind defined, and you want to keep a record of deleted entities of that kind.
But you want to separate the storage of live objects and archived objects.
Entities of a kind are basically just serialized dicts in Bigtable anyway, and maybe you don't need to index the archive in the same way as the live data.
So how would you move or copy an entity of one kind to another kind?

No - once created, the kind is a part of the entity's immutable key. You need to create a new entity and copy everything across. One way to do this would be to use the low-level google.appengine.api.datastore interface, which treats entities as dicts.
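If it helps, here's a minimal sketch of that approach using the low-level API; the archive kind name and the delete_original flag are placeholders, not anything prescribed:

from google.appengine.api import datastore

def archive_entity(key, archive_kind='ArchivedRecord', delete_original=False):
    # Low-level entities behave like dicts, so copying properties is a dict update.
    original = datastore.Get(key)
    archived = datastore.Entity(archive_kind)
    archived.update(original)
    datastore.Put(archived)           # the new entity gets its own key of the new kind
    if delete_original:
        datastore.Delete(key)
    return archived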

Unless someone's written utilities for this kind of thing, the way to go is to read entities from one kind and write them to the other!

What to do with runtime generated data spanning several classes?

I'm a self-taught programmer and a lot of the problems I encounter come from a lack of formal education (and often also experience).
My question is the following: how do you rationalize where you store the data a class or function creates? I'll make a simple example:
Case: I have a webshop (SHOP) with a REST api and a product provider (PROVIDER) also with a REST API. I determine the product, I send that data to PROVIDER who sends me back formatted data that can be read by SHOP to make a working product on the webshop. PROVIDER also has a secondary REST api that provides generated images.
What I would come up with:
I'd make three classes: ProductBase, Shop and Provider
ProductBase would be the class from where I instantiate and store the individual product information.
Shop would be where I design the api interactions with the webshop.
Provider same as shop, but for interactions with provider api.
My problem: At some point you're creating data that's not clearly separated in concern. For example: Would I store the generated product data (from PROVIDER) in the ProductBase instance I created? It feels like I'm coupling the two classes this way. But if not there, then where?
What if I create product images with PROVIDER and I upload them to SHOP? Do I store the uploaded image-url in PRODUCT? How do you keep track of all this info?
The question I want answered:
I've read a lot on OOP and design patterns, and I have adopted a TDD approach which has greatly helped improve my code, but I haven't found anything on how to approach the flow of runtime-generated data within software engineering.
What would be a good way to solve above problem(s) and could you explain your rationale for it?
If I understand correctly, I think your current concern is that you have "raw" product data, which you want to store in objects, and you have "processed" (formatted) product data, which you also want to store in objects. Your question is whether you should mix them.
Let me just first point out the other obvious option. Namely, having two product classes: RawProduct and ProcessedProduct. Which to do?
(Edit: also, to be sure, product data should not be stored in Provider. The provider performs the action of formatting, but the data is product data, not provider data.)
It depends. There are a couple of considerations:
1) In general, in OOP, the idea is to couple actions on data with the data. So if possible, you have some method in ProductBase like "format()", where format sends the object off to the API to get formatted and stores the result in an instance variable. You can then also have a method like "find_image" that fetches the image URL from the API and stores that in a field (see the sketch after these points). An object's data is meant to be dynamic; it is meant to be altered by object methods.
2) If you need version control (if you want the full history of the object's state to be available), then you can't override fields with new data. So either you need to store a history of every object field in the object, or you need to create new objects.
3) Is RAM a concern? I sometimes create dataclasses that store only the final part of an object's life so that I can fit more of the objects into memory.
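A rough sketch of point 1), with the provider/shop client methods invented purely for illustration:

class ProductBase:
    def __init__(self, name, raw_attributes):
        self.name = name
        self.raw_attributes = raw_attributes   # the "raw" data you start from
        self.formatted = None                  # filled in by format()
        self.image_url = None                  # filled in by find_image()

    def format(self, provider):
        # Ask the provider API to format the raw data and keep the result on the product.
        self.formatted = provider.format_product(self.raw_attributes)
        return self.formatted

    def find_image(self, provider):
        # Fetch a generated image URL from the provider and keep it on the product.
        self.image_url = provider.generate_image(self.name)
        return self.image_url

    def publish(self, shop):
        # Push the formatted data (and image URL) to the shop API.
        return shop.create_product(self.formatted, image_url=self.image_url)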
Personally I often find myself creating "RawObject" and "ProcessedObject" classes; it's just easier a lot of the time. But that's probably because I mostly work with document processing, so the split is very clear. Usually you'll just update the object's data.
A benefit of having one object with the full history is that it is much easier to debug. Because the raw data and the API result are in the same object. So you can very easily probe what went wrong. If you start splitting things up it's harder to track. In general, the more information an object has about where it's been, the easier it is to figure out what went wrong with it.
Remember also, though, that since this is a Python question, Python is multi-paradigm. And if you're writing pipeline-style architectures (synchronous, linear processes), then a functional approach can also work well.
Once your data is stored in a product object, anything can hold a reference to that. So a shop can reference an object and a product can reference the object. Be clear on the difference between "has-a" relationships and "is-a" relationships.

Store reference to non-NDB object in an NDB model

As a caveat: I am an utter novice here. I wouldn't be surprised to learn a) this is already answered, but I can't find it because I lack the vocabulary to describe my problem or b) my question is basically silly to begin with, because what I want to do is silly.
Is there some way to store a reference to a class instance that is defined and stored in active memory and not stored in NDB? I'm trying to write an app that would help manage a number of characters/guilds in an MMO. I have a class, CharacterClass, that includes properties such as armor, name, etc., that I define in main.py as a base Python object, and then define the properties for each of the classes in the game. Each Character, which would be stored in Datastore, would have a property charClass, which would be a reference to one of those instances of CharacterClass. In theory I would be able to do things like
if character.charClass.armor == "Cloth":
while storing the potentially hundreds of unique characters and their specific data in Datastore, but without creating a copy of "Cloth" for every cloth-armor character, or querying Datastore for what kind of armor a mage wears thousands of times a day.
I don't know what kind of NDB property to use in Character to store the reference to the applicable CharacterClass. Or if that's the right way to do it, even. Thanks for taking the time to puzzle through my confused question.
A string is all you need. You just need to fetch the class based on the string value. You could create a custom property that automatically instantiates the class on reference.
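As a sketch of the string approach (the registry and model fields here are assumptions, not part of your code):

from google.appengine.ext import ndb

class CharacterClass(object):
    def __init__(self, name, armor):
        self.name = name
        self.armor = armor

# In-memory instances defined once in code, keyed by name.
CHARACTER_CLASSES = {
    'mage': CharacterClass('mage', 'Cloth'),
    'warrior': CharacterClass('warrior', 'Plate'),
}

class Character(ndb.Model):
    name = ndb.StringProperty()
    char_class_name = ndb.StringProperty()   # only the string is stored in Datastore

    @property
    def charClass(self):
        # Resolve the stored string to the in-memory CharacterClass on access,
        # so character.charClass.armor == "Cloth" works without a Datastore query.
        return CHARACTER_CLASSES[self.char_class_name]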
However, I have a feeling that hard-coding the values in code might be a bit unwieldy. Maybe your character class instances should be datastore entities as well. It means you can adjust these parameters without deploying new code.
If you want these objects in memory then you can pre-cache them on warmup.

Proper way to save python dictionaries and retrieve them at a later stage

Following an earlier question I asked here (Most appropriate way to combine features of a class to another?), I got an answer that I have finally grown to understand. In short, what I intend to do now is have a bunch of dictionaries, each of which will look somewhat like this:
{ "url": "http://....", "parser": SomeParserClass }
though more properties might be added later; they will include either strings or some other classes.
Now my question is: what's the best way to save these objects?
I thought up 3 solutions; I'm not sure which one is the best, or if there are other, more acceptable solutions.
Use pickle: while it seems efficient, it would make editing any of these dictionaries a pain, since it's saved in a binary format.
Save each dictionary in a separate module and import these modules dynamically from a single directory; each module would either have a function inside it to return the dictionary or a specially crafted variable name to hold it, so I could call it from my loading code. This seems the easiest to edit but doesn't sound very efficient or Pythonic.
Use some sort of database like MongoDB or Riak to save these objects. My problems with this one are editing (which is doable but doesn't sound like fun) and the fact that, while the former two are equipped with means to correctly save my parser class inside the dictionary, I have no idea how these databases serialize or 'pickle' such objects.
As you can see, my main concerns are how easy it would be to edit them, the efficiency of saving and retrieving the data (though not a huge concern since I only have a couple of hundred of these), and the correctness of the solution.
So, any thoughts?
Thank you in advance for any help you might be able to provide.
Use JSON. It supports Python dictionaries and can be easily edited.
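One hedged way to make that work with the parser class inside the dictionary is to store the class's dotted import path as a string and resolve it on load (the function names and file path below are placeholders):

import importlib
import json

def dump_entries(entries, path):
    serializable = []
    for entry in entries:
        item = dict(entry)
        cls = item['parser']
        item['parser'] = '%s.%s' % (cls.__module__, cls.__name__)
        serializable.append(item)
    with open(path, 'w') as f:
        json.dump(serializable, f, indent=2)     # stays human-editable

def load_entries(path):
    with open(path) as f:
        entries = json.load(f)
    for entry in entries:
        module_name, _, class_name = entry['parser'].rpartition('.')
        entry['parser'] = getattr(importlib.import_module(module_name), class_name)
    return entries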
You can try shelve. It's built on top of pickle and lets you serialize objects and associate them with string keys.
Because it is based on dbm, it will only access keys/values as you need them. So if you only need to access a few items from a large dictionary, shelve may be a better choice than JSON, which has to load the entire file into a dictionary first.
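A small sketch of the shelve option; SomeParserClass stands in for your real parser class:

import shelve

class SomeParserClass(object):
    pass

store = shelve.open('parsers.db')
store['example'] = {'url': 'http://example.com', 'parser': SomeParserClass}
store.close()

store = shelve.open('parsers.db')
entry = store['example']          # only this value is read and unpickled
parser = entry['parser']()        # instantiate the stored class
store.close()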

python object save and reload

I've been working on a Python program that basically creates 5 different types of objects that are in a hierarchy. For example, my program might create 1 Region object that contains 2000 Column objects that contain 8000 Cell objects (4 Cells in each Column), where all the objects are interacting with each other based on video input.
Now, I want to be able to save all the objects states after the video input changes each of their states over a period of time. So my question is how can I save and reload thousands of objects in Python efficiently? Thanks in advance!
I'm not sure how efficient pickle is at large scales, but I think what you're looking for is object serialization. But are you trying to 'refresh' the information in these objects, or save and load them? Also read the section on 'Persistence of External Objects', since you will need to create an alphanumeric id that is associated with each object for the relations/associations.
One totally hacky way could also be to json-ify the objects and store that. You would still need the alphanumeric id or some sort of usable identifier to associate each of the objects.
Have you looked at Shelve, Pickle or cPickle?
http://docs.python.org/release/2.5/lib/persistence.html
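For reference, a minimal pickle sketch assuming the Region/Column/Cell classes from the question; dumping the top-level Region captures the whole hierarchy, including objects that reference each other:

import pickle

def save_region(region, path='region_state.pkl'):
    # Serialize the entire object graph reachable from the Region.
    with open(path, 'wb') as f:
        pickle.dump(region, f, pickle.HIGHEST_PROTOCOL)

def load_region(path='region_state.pkl'):
    with open(path, 'rb') as f:
        return pickle.load(f)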
I think you need to look into the ZODB.
The ZODB is an object database that uses pickle to serialize data and is very adept at handling hierarchies of objects; if your objects use the included persistent.Persistent base class, it will detect and only save the objects that changed when you commit, i.e. there is no need to write out the whole hierarchy on every little change.
Included in the ZODB project is a package called BTrees, which is ZODB-aware and makes storing thousands of objects in one place efficient. Use these for your Region object to store the Columns. We use BTrees to store millions of datapoints at times.
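A rough sketch of that layout; the class and attribute names just mirror the question and aren't a fixed schema:

import persistent
import transaction
from BTrees.IOBTree import IOBTree
from ZODB.DB import DB
from ZODB.FileStorage import FileStorage

class Cell(persistent.Persistent):
    def __init__(self):
        self.state = 0

class Column(persistent.Persistent):
    def __init__(self, cells_per_column=4):
        self.cells = [Cell() for _ in range(cells_per_column)]

class Region(persistent.Persistent):
    def __init__(self):
        self.columns = IOBTree()     # integer-keyed BTree for thousands of Columns

storage = FileStorage('region.fs')
db = DB(storage)
conn = db.open()
root = conn.root()

root['region'] = Region()
region = root['region']
for i in range(2000):
    region.columns[i] = Column()
transaction.commit()                 # only changed/new objects are written out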

Getting newest S3 keys first

I am writing an app that stores (potentially millions of) objects in an S3 bucket. My app will take the most recent object (roughly), process it, and write it back to the same bucket. I need a way of accessing keys and naming new objects so that the app can easily get to the newest objects.
I know I can do this properly by putting metadata in SimpleDB, but I don't need hard consistency. It's ok if the app grabs an object that isn't quite the newest. I just need the app to tend to grab new-ish keys instead of old ones. So I'm trying to keep it simple by using S3 alone.
Is there a way to access and sort on S3 metadata? Or might there be a scheme for naming the objects that would get what I need (since I know S3 lists keys in lexicographic order and boto can handle paging)?
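For what it's worth, one naming scheme along those lines (the key format here is just an assumption) is to prefix keys with a zero-padded reverse timestamp, so a plain lexicographic listing returns the newest keys first:

import time
import uuid

def new_key_name():
    # Smaller numbers for newer objects, zero-padded so string order matches numeric order.
    reverse_ts = 10 ** 13 - int(time.time() * 1000)
    return '%013d-%s' % (reverse_ts, uuid.uuid4())

# Keys written later sort before keys written earlier, so the first page of a
# bucket listing is roughly the newest data.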
S3 versioning really helps out here. If these are really the same "thing", you can turn on versioning for your bucket, get the data from your key, modify it, and store it back to the same key.
You'll need to use boto's
bucket.get_all_versions(prefix='yourkeynamehere')
You get versions out, most recent first, so while this function doesn't handle paging, you can just take the first index and you've got the most recent version.
If you want to go back further and need paging, boto also offers a list_versions() function that takes a prefix as well and will give you a result set that iterates through all the versions without you needing to worry about it.
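A hedged sketch of those calls in boto 2.x style (bucket and key names are placeholders):

import boto

conn = boto.connect_s3()
bucket = conn.get_bucket('my-bucket')
bucket.configure_versioning(True)              # one-time setup for the bucket

# Most recent version first; index 0 is the newest.
versions = bucket.get_all_versions(prefix='yourkeynamehere')
newest = versions[0]
data = newest.get_contents_as_string()

# Or iterate the full history without handling paging yourself.
for version in bucket.list_versions(prefix='yourkeynamehere'):
    print('%s %s' % (version.name, version.version_id))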
If these objects really aren't the "same" object, it really doesn't matter, because S3 doesn't store diffs -- it stores the whole thing every time. If you have multiple 'types' of objects, you can have multiple version sets from which you can pull the most recent.
I've been using versioning and I'm pretty happy with it.
