I've been working on a Python program that basically creates 5 different types of objects that are in a hierarchy. For example, my program might create 1 Region object that contains 2000 Column objects, which contain 8000 Cell objects (4 Cells in each Column), where all the objects interact with each other based on video input.
Now, I want to be able to save all the objects' states after the video input has changed them over a period of time. So my question is: how can I save and reload thousands of objects in Python efficiently? Thanks in advance!
Not sure how efficient pickle is at large scale, but I think what you're looking for is object serialization. But are you trying to 'refresh' the information in these objects, or to save and load them? Also read the section on 'Persistence of External Objects' in the pickle documentation, since you will need to create an alphanumeric id associated with each object to preserve the relations/associations between them.
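As a rough illustration of plain pickle serialization, the sketch below uses hypothetical, stripped-down Region, Column and Cell classes (not the asker's real ones) and round-trips a 2000-column region through a temporary file. One pickle.dump call writes the whole hierarchy, and an object referenced from several places in the same dump comes back as a single shared object.

```python
import os
import pickle
import tempfile

# Hypothetical stand-ins for the question's classes; the real ones
# would carry whatever state the video input updates.
class Cell:
    def __init__(self, cell_id):
        self.cell_id = cell_id
        self.active = False

class Column:
    def __init__(self, column_id, cells_per_column=4):
        self.column_id = column_id
        self.cells = [Cell(f"{column_id}-{i}") for i in range(cells_per_column)]

class Region:
    def __init__(self, num_columns):
        self.columns = [Column(c) for c in range(num_columns)]
        self.active_cells = []  # references to Cells that also live in columns

def save_region(region, path):
    # A single dump writes the whole object graph; an object referenced
    # from several places is stored once and restored as one shared object.
    with open(path, "wb") as f:
        pickle.dump(region, f, protocol=pickle.HIGHEST_PROTOCOL)

def load_region(path):
    with open(path, "rb") as f:
        return pickle.load(f)

# Usage: 2000 columns x 4 cells = 8000 cells, saved to a temporary file.
region = Region(2000)
first_cell = region.columns[0].cells[0]
first_cell.active = True
region.active_cells.append(first_cell)

path = os.path.join(tempfile.mkdtemp(), "region.pkl")
save_region(region, path)
restored = load_region(path)
```

Note that pickle stores classes by module and name, so the same class definitions must be importable when the file is loaded again.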
One totally hacky way could also be to JSON-ify the objects and store that. You would still need the alphanumeric id, or some other usable identifier, to associate the objects with each other.
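A minimal sketch of the JSON approach, assuming a hypothetical Cell class whose only relations are links to other cells: references are written out as id strings and re-linked in a second pass after loading.

```python
import json

# Hypothetical minimal model: a cell can link to other cells. JSON cannot
# hold object references, so links are written as id strings and
# resolved back into objects after loading.
class Cell:
    def __init__(self, cell_id, active=False):
        self.cell_id = cell_id
        self.active = active
        self.links = []  # other Cell objects

def cells_to_json(cells):
    data = [
        {"id": c.cell_id, "active": c.active, "links": [t.cell_id for t in c.links]}
        for c in cells
    ]
    return json.dumps(data)

def cells_from_json(text):
    data = json.loads(text)
    # First pass: create every object so that every id can be resolved.
    by_id = {d["id"]: Cell(d["id"], d["active"]) for d in data}
    # Second pass: turn the stored ids back into object references.
    for d in data:
        by_id[d["id"]].links = [by_id[t] for t in d["links"]]
    return list(by_id.values())

# Usage
a, b = Cell("c0"), Cell("c1", active=True)
a.links.append(b)
text = cells_to_json([a, b])
restored = cells_from_json(text)
```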
Have you looked at Shelve, Pickle or cPickle?
http://docs.python.org/release/2.5/lib/persistence.html
I think you need to look into the ZODB.
The ZODB is an object database that uses pickle to serialize data, is very adept at handling hierarchies of objects, and, if your objects use the included persistent.Persistent base class, will detect and save only the objects that changed when you commit; i.e., there is no need to write out the whole hierarchy on every little change.
Included in the ZODB project is a package called BTrees, which are ZODB-aware and make storing thousands of objects in one place efficient. Use these in your Region object to store the Columns. We use BTrees to store millions of datapoints at times.
I'm a self-taught programmer and a lot of the problems I encounter come from a lack of formal education (and often also experience).
My question is the following: how do you decide where to store the data a class or function creates? I'll make a simple example:
Case: I have a webshop (SHOP) with a REST API and a product provider (PROVIDER), also with a REST API. I determine the product and send that data to PROVIDER, which sends me back formatted data that SHOP can read to create a working product on the webshop. PROVIDER also has a secondary REST API that provides generated images.
What I would come up with:
I'd make three classes: ProductBase, Shop and Provider
ProductBase would be the class from where I instantiate and store the individual product information.
Shop would be where I design the API interactions with the webshop.
Provider: same as Shop, but for interactions with the PROVIDER API.
My problem: at some point you're creating data that isn't clearly separated in concern. For example, would I store the generated product data (from PROVIDER) in the ProductBase instance I created? It feels like I'm coupling the two classes this way. But if not there, then where?
What if I create product images with PROVIDER and upload them to SHOP? Do I store the uploaded image URL in PRODUCT? How do you keep track of all this info?
The question I want answered:
I've read a lot on OOP and design patterns, and I have adopted a TDD approach which has greatly improved my code, but I haven't found anything on how to approach the flow of runtime-generated data in software engineering.
What would be a good way to solve above problem(s) and could you explain your rationale for it?
If I understand correctly, I think your current concern is that you have "raw" product data, which you want to store in objects, and you have "processed" (formatted) product data, which you also want to store in objects. Your question being should you mix them.
Let me first point out the other obvious option, namely having two product classes: RawProduct and ProcessedProduct. Which should you choose?
(Edit: also, to be clear, product data should not be stored in the provider. The provider performs the action of formatting, but the data is product data, not provider data.)
It depends. There are a couple of considerations:
1) In general, in OOP, the idea is to couple actions on data with the data. So if possible, you have a method in ProductBase like "format()", which sends the object off to the API to get formatted and stores the result in an instance variable. You can then also have a method like "find_image" that fetches the image URL from the API and stores it in a field. An object's data is meant to be dynamic; it is meant to be altered by the object's methods.
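A sketch of point 1, assuming a hypothetical FakeProvider that stands in for the real PROVIDER REST client (its method names and the example.com image URL are made up): the product owns both the actions and the data they produce, while the provider only performs the work.

```python
class FakeProvider:
    """Hypothetical stand-in for the PROVIDER REST client (no real HTTP)."""

    def format_product(self, raw):
        return {"title": raw["name"].title(), "sku": raw["sku"].upper()}

    def generate_image_url(self, formatted):
        return f"https://images.example.com/{formatted['sku']}.png"

class ProductBase:
    def __init__(self, raw, provider):
        self.raw = raw            # data we determined ourselves
        self.provider = provider  # has-a: the product uses a provider
        self.formatted = None     # filled in by format()
        self.image_url = None     # filled in by find_image()

    def format(self):
        # The action and the data it produces live on the product.
        self.formatted = self.provider.format_product(self.raw)
        return self.formatted

    def find_image(self):
        if self.formatted is None:
            self.format()
        self.image_url = self.provider.generate_image_url(self.formatted)
        return self.image_url

# Usage
product = ProductBase({"name": "blue mug", "sku": "mug-01"}, FakeProvider())
product.find_image()
```

Because the raw data, the formatted data and the image URL all sit on one object, this also gives the debugging benefit described further down.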
2) If you need version control (if you want the full history of the object's state to be available), then you can't overwrite fields with new data. So either you need to store a history of every object field in the object, or you need to create new objects.
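For point 2, one option is to keep earlier states inside the object. A minimal sketch, using a hypothetical VersionedProduct class that snapshots its fields before every update:

```python
import copy

class VersionedProduct:
    """Hypothetical product that keeps every past state instead of overwriting."""

    def __init__(self, **fields):
        self._state = dict(fields)
        self.history = []  # earlier states, oldest first

    def update(self, **changes):
        # Save a deep copy of the current state before mutating it.
        self.history.append(copy.deepcopy(self._state))
        self._state.update(changes)

    def __getitem__(self, key):
        return self._state[key]

    def version(self, n):
        """Return state n (0 = original; len(history) = current)."""
        if n == len(self.history):
            return dict(self._state)
        return dict(self.history[n])

# Usage
product = VersionedProduct(name="mug")
product.update(formatted={"title": "Mug"})
product.update(image_url="https://images.example.com/mug.png")
```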
3) Is RAM a concern? I sometimes create dataclasses that store only the final part of an object's life so that I can fit more of the objects into memory.
Personally, I often find myself creating "RawObject" and "ProcessedObject" classes; it's just easier a lot of the time. But that's probably because I mostly work with document processing, where the split is very clear. Usually you'll just update the object's data.
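A sketch of the RawObject/ProcessedObject split using dataclasses; the field names and the process() function are hypothetical. The processed class keeps only the fields the shop needs, which is the memory point from 3).

```python
from dataclasses import dataclass

@dataclass
class RawProduct:
    name: str
    sku: str
    notes: str = ""  # hypothetical bulky input that is not needed later

@dataclass(frozen=True)
class ProcessedProduct:
    # Holds only what the shop needs; fewer fields per object means more
    # of them fit in memory. frozen=True makes the final state read-only.
    title: str
    sku: str
    image_url: str = ""

def process(raw: RawProduct, image_url: str = "") -> ProcessedProduct:
    # Hypothetical formatting step standing in for the PROVIDER call.
    return ProcessedProduct(title=raw.name.title(), sku=raw.sku.upper(), image_url=image_url)

# Usage
raw = RawProduct("blue mug", "mug-01", notes="long supplier description ...")
processed = process(raw)
```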
A benefit of having one object with the full history is that it is much easier to debug, because the raw data and the API result are in the same object, so you can very easily probe what went wrong. If you start splitting things up, it's harder to track. In general, the more information an object has about where it's been, the easier it is to figure out what went wrong with it.
Remember also, since this is a Python question, that Python is multi-paradigm. If you're writing pipeline-style architectures (synchronous, linear processes), then a functional approach can also work well.
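In the functional style, each processing step can be a plain function that returns a new dict rather than mutating an object. A sketch, where format_product and attach_image are hypothetical steps and the image URL is made up:

```python
from functools import reduce

# Each step takes a product dict and returns a new one, leaving its input
# untouched, so every intermediate result can be kept or inspected.
def format_product(product):
    return {**product, "title": product["name"].title()}

def attach_image(product):
    return {**product, "image_url": f"https://images.example.com/{product['sku']}.png"}

def run_pipeline(product, steps):
    # Apply the steps in order, feeding each output into the next step.
    return reduce(lambda acc, step: step(acc), steps, product)

# Usage
raw = {"name": "blue mug", "sku": "MUG-01"}
done = run_pipeline(raw, [format_product, attach_image])
```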
Once your data is stored in a product object, anything can hold a reference to it. So a shop can reference the object, and a product can reference it too. Be clear on the difference between "has-a" relationships and "is-a" relationships.
Following an earlier question I asked here (Most appropriate way to combine features of a class to another?), I got an answer that I have finally grown to understand. In short, what I intend to do now is have a bunch of dictionaries, each of which will look somewhat like this:
{ "url": "http://....", "parser": SomeParserClass }
More properties might be added later, but they will be either strings or other classes.
Now my question is: what's the best way to save these objects?
I thought of 3 solutions, but I'm not sure which one is best, or whether there are other, more acceptable solutions.
Use pickle: while it seems efficient, it would make editing any of these dictionaries a pain, since it's saved in a binary format.
Save each dictionary in a separate module and import these modules dynamically from a single directory. Each module would have either a function that returns the dictionary or a specially named variable holding it, so I could access it from my loading code. This seems the easiest to edit, but it doesn't sound very efficient or Pythonic.
Use some sort of database like MongoDB or Riak to save these objects. My problems with this one are that editing, while doable, doesn't sound like fun, and that the first two options are equipped to correctly save the parser class inside the dictionary, whereas I have no idea how these databases serialize or 'pickle' such objects.
As you can see, my main concerns are how easy it would be to edit them, the efficiency of saving and retrieving the data (though not a huge concern, since I only have a couple of hundred of these), and the correctness of the solution.
So, any thoughts?
Thank you in advance for any help you might be able to provide.
Use JSON. It supports Python dictionaries and can be easily edited by hand.
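One caveat: the json module cannot serialize a class object such as SomeParserClass. A common workaround, sketched here with hypothetical parser classes, is to store the class name and map it back through a registry when loading.

```python
import json

# Hypothetical parser classes; json cannot serialize a class object itself,
# so each entry stores the class *name* and this registry maps it back.
class RssParser:
    pass

class HtmlParser:
    pass

PARSERS = {cls.__name__: cls for cls in (RssParser, HtmlParser)}

def dump_sources(sources):
    serializable = [{**s, "parser": s["parser"].__name__} for s in sources]
    return json.dumps(serializable, indent=2)  # indented so it stays hand-editable

def load_sources(text):
    return [{**s, "parser": PARSERS[s["parser"]]} for s in json.loads(text)]

# Usage
sources = [{"url": "http://example.com/feed", "parser": RssParser}]
text = dump_sources(sources)
restored = load_sources(text)
```

A name that is not in the registry raises KeyError on load, which at least fails loudly when a hand edit contains a typo.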
You can try shelve. It's built on top of pickle and lets you serialize objects and associate them with string keys.
Because it is based on dbm, it will only access key/values as you need them. So if you only need to access a few items from a large dictionary, shelve may be a better choice than json, which has to load the entire JSON file into a dictionary first.
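A minimal shelve sketch, assuming a hypothetical RssParser class and a throwaway file in a temporary directory:

```python
import os
import shelve
import tempfile

# Hypothetical parser class; shelve pickles values, so the class reference
# is stored by its module and name, just as pickle would store it.
class RssParser:
    pass

path = os.path.join(tempfile.mkdtemp(), "sources")

with shelve.open(path) as db:
    # Keys must be strings; values can be any picklable object.
    db["example-feed"] = {"url": "http://example.com/feed", "parser": RssParser}

with shelve.open(path) as db:
    # Only the requested key is looked up and unpickled.
    entry = db["example-feed"]
```

As with pickle, the parser class must still be importable under the same module and name when the shelf is reopened.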
I'm creating a game mod for Counter-Strike in Python, and it's basically all done. The only thing left is to code a REAL database, and I don't have any experience with sqlite, so I need quite a lot of help.
I have a Player class with an attribute self.steamid, which is unique for every Counter-Strike player (received from the game engine), and self.entity, which holds an "Entity" for the player. Entity is a self-made Python class with lots and lots more attributes, such as level and name, and loads of methods.
What would be the best way to implement a database? First of all, how can I save instances of Player, with another instance of Entity as its attribute, into a database efficiently?
Also, I will need to get that user's data every time he connects to the game server (I have a player_connect event), so how would I retrieve the data?
All the tutorials I found only taught how to save strings or integers, nothing about whole instances. Will I have to save every attribute of every instance (the Entity instance has a few more instances as its attributes, and all of them have huge numbers of attributes...), or is there a faster, easier way?
Also, it's going to be a locally saved database, so I can't really use any language other than SQL.
You need an ORM. Either you roll your own (which I never suggest), or you use an existing one. Probably the two most popular in Python are SQLAlchemy and the ORM bundled with Django.
SQL databases typically hold only fundamental datatypes. You can use SQLAlchemy if you want your models' attributes to be mapped to SQL types automatically, but it would require a lot of study and trial and error with SQLite on your part.
I think you are not entirely correct when you say "it has to be SQL": if you are running Python code, you can save in whatever format you like.
However, Python lets you serialize your instance data to a byte string, which can be persisted in a database.
So you can create a BLOB column in the SQL table (in Python 3, pickle.dumps returns bytes, so a binary column fits better than a varchar(65535) field), along with an ID field (which could be the player ID you mentioned, for example), and persist to it the value returned by:
import pickle
value = pickle.dumps(my_instance)
When retrieving the value you do the reverse:
my_instance = pickle.loads(value)
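Putting this together with the standard-library sqlite3 module gives a sketch like the following. The Player and Entity classes are hypothetical minimal versions of the asker's, and the table layout (steamid as primary key, pickled Entity in a BLOB column) is an assumption.

```python
import pickle
import sqlite3

# Hypothetical minimal versions of the question's classes.
class Entity:
    def __init__(self, name, level=1):
        self.name = name
        self.level = level

class Player:
    def __init__(self, steamid, entity):
        self.steamid = steamid
        self.entity = entity

conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
conn.execute("CREATE TABLE players (steamid TEXT PRIMARY KEY, data BLOB)")

def save_player(conn, player):
    # pickle.dumps returns bytes, which sqlite3 stores as a BLOB.
    blob = pickle.dumps(player.entity)
    conn.execute("INSERT OR REPLACE INTO players VALUES (?, ?)", (player.steamid, blob))
    conn.commit()

def load_player(conn, steamid):
    # e.g. called from the player_connect event handler
    row = conn.execute("SELECT data FROM players WHERE steamid = ?", (steamid,)).fetchone()
    if row is None:
        return None  # first connection: nothing saved yet
    return Player(steamid, pickle.loads(row[0]))

# Usage
save_player(conn, Player("STEAM_0:1:12345", Entity("alice", level=7)))
loaded = load_player(conn, "STEAM_0:1:12345")
```

Only unpickle data you wrote yourself: loading a pickle can execute arbitrary code, so the database should not accept values from untrusted sources.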
I need to store a Python set in a database for later access. What's the best way to do this? My initial plan was to use a TextField on my model and store the set as a comma- or pipe-delimited string; then, when I need it in my app, I could initialize a set by calling split on the string. Obviously, if there is a simple way to serialize the set so I can pull it back out as a set later, that would be best.
If your database is better at storing blobs of binary data, you can pickle your set. (In Python 2, pickle's default protocol produced ASCII text, which could go in a text column; in Python 3, pickle.dumps returns bytes, so a binary column is the natural fit.) Just store pickle.dumps(your_set) and later read it back with unpickled = pickle.loads(database_value).
There are a number of options here, depending on what kind of data you wish to store in the set.
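If it's regular integers, CommaSeparatedIntegerField might work fine, although it often feels like a clumsy storage method to me. (In newer Django versions this field was deprecated and then removed; the replacement is a CharField with the validate_comma_separated_integer_list validator.)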
If it's other kinds of Python objects, you can try pickling it before saving it to the database, and unpickling it when you load it again. That seems like a good approach.
If you want something human-readable in your database though, you could even JSON-encode it into a TextField, as long as the data you're storing doesn't include Python objects.
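A sketch of the JSON option for a set of strings; the helper names are hypothetical. JSON has no set type, so the set is stored as a sorted list.

```python
import json

def set_to_text(values):
    # JSON has no set type; store a sorted list so the text is stable.
    return json.dumps(sorted(values))

def text_to_set(text):
    return set(json.loads(text))

# Usage
text = set_to_text({"b", "a", "c"})
restored = text_to_set(text)
```

Sorting assumes the elements are mutually comparable. Unlike a hand-rolled comma- or pipe-delimited string, JSON escapes the elements, so values containing the delimiter survive the round trip.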
Redis natively stores sets (as well as other data structures such as lists, dicts and queues) and provides set operations, and it's rocket fast too. I find it's the Swiss Army knife for Python development.
I know it's not a relational database per se, but it does solve this problem very concisely.
What about CommaSeparatedIntegerField?
If you need another type (strings, for example), you can create your own field that works like CommaSeparatedIntegerField but stores strings (which must not contain commas).
Or, if you need another type, a probably better way of doing it: keep a dictionary that maps integers to your values.
Is there a way to move an entity to another kind in App Engine?
Say you have a kind defined, and you want to keep a record of deleted entities of that kind.
But you want to separate the storage of live objects and archived objects.
Kinds are basically just serialized dicts in Bigtable anyway, and maybe you don't need to index the archive in the same way as the live data.
So how would you move or copy an entity of one kind to another kind?
No: once created, the kind is part of the entity's immutable key. You need to create a new entity and copy everything across. One way to do this is to use the low-level google.appengine.api.datastore interface, which treats entities as dicts.
Unless someone has written utilities for this kind of thing, the way to go is to read from one kind and write to the other!