dictionary | class | namedtuple from YAML - python

I have a large-ish YAML file (~40 lines) that I'm loading using PyYAML. This is of course parsed into a large-ish dictionary plus a couple of arrays.
My question is: how best to manage the data. I can of course leave it in the output dictionary and work through the data, but I was wondering whether it's better to wrap the data in a class or use a namedtuple to hold it.
Any first-hand experience about that?

Whether you post-process the data structure into a class primarily depends on how you are using that data. The same applies to the decision whether to use a tag and load (some of) the data from the YAML file into a specific instance of a class that way.
The primary advantage of using a class in both cases (post-processing, tagging) is that you can do additional consistency tests during initialisation, tests that are not done on the key-value pairs of a dict or on the items of a list.
A class also allows you to provide methods to check values before they are set, e.g. to make sure they are of the right type.
Whether that overhead is necessary depends on the project, who is using and/or updating the data, etc., and how long the project and its data are going to live (i.e. will you still understand the data and its implicit structure a year from now?). These are all issues for which a well-designed (and documented) class can help, at the cost of some extra work up-front.
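As a minimal sketch of the post-processing approach, assuming PyYAML's safe_load() has already produced a dict (the keys here are hypothetical, for illustration only):

```python
from collections import namedtuple

# Suppose yaml.safe_load() on the file has already produced this dict:
raw = {"host": "example.com", "port": 8080}

Config = namedtuple("Config", ["host", "port"])

def to_config(data):
    """Post-process the parsed dict into a namedtuple, adding the
    consistency checks a plain dict would not give you."""
    if not isinstance(data.get("port"), int):
        raise TypeError("port must be an integer")
    return Config(**data)

cfg = to_config(raw)
print(cfg.host)  # dot access instead of raw["host"]
```

The check in to_config() is exactly the kind of validation that is hard to enforce if you keep passing the raw dict around.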

Related

How can I implement a structure array like MATLAB's in Python?

MATLAB code:
cluster.c=[]
cluster.indiv=[]
Although you can do this in Python (as I explain below), it might not be the best or most pythonic approach. For other users that have to look at your code (including yourself in 3 months) this syntax is extremely confusing. Think for example about how this deals with name conflicts, undefined values and iterating over properties.
Instead consider storing the data in a data structure that is better suited for this such as a dictionary. Then you can just store everything in
cluster = {'c': [], 'indiv': []}
Imitating MATLAB in a bad way:
In Python you can assign arbitrary attributes to instances of ordinary classes.
If you need an object just for data storage, then you can define a custom class without any functionality in the following way:
class CustomStruct:
    pass
Then you can have
struct = CustomStruct()
struct.c = []
and change or query attributes of the instance this way.
Better approach:
If you really want to store these things as attributes of an object, then it might be best to define the variables in the __init__ of that class.
class BetterStruct:
    def __init__(self):
        self.c = []
        self.indiv = []
In this way, users looking at your code can immediately see the expected attributes, and you can guarantee that they are initialised in a proper fashion.
Allowing data control
If you want to verify the data when it is stored, or if it has to be calculated only when the user requests it (instead of being stored constantly), then consider using the Python property decorator.
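A minimal sketch of the property approach, continuing the cluster example above (the type check is just one possible constraint):

```python
class Cluster:
    def __init__(self):
        self._c = []

    @property
    def c(self):
        # reads go through here, so values could also be computed on demand
        return self._c

    @c.setter
    def c(self, value):
        # writes are validated before being stored
        if not isinstance(value, list):
            raise TypeError("c must be a list")
        self._c = value

cluster = Cluster()
cluster.c = [1, 2, 3]   # goes through the setter
```

Callers still use plain attribute syntax (cluster.c), so you can start with a bare attribute and add the property later without changing any calling code.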

Using attrs to turn JSONs into Python classes

I was wondering if it is possible to use the attrs library to convert nested JSON to Python class instances, so that I can access attributes in that JSON via dot notation (object.attribute.nested_attribute).
My JSONs have a fixed schema, and I would be fine with having to define the classes for that schema manually, but I'm not sure if it would be possible to turn the JSON into the nested class structure without having to instantiate every nested object individually. I'm basically looking for a fromdict() function that knows (based on the keys) which class to turn a JSON object into.
(I also know that there are other ways to build 'DotDicts', but these seem always a bit hacky to me and would probably need thorough testing to verify that they work correctly.)
The attrs wiki currently lists two serialization libraries:
cattrs
and related,
with cattrs being maintained by one of attrs' most prolific contributors.
I know that some people mention integrations with other systems too. At this point it's unlikely that attrs will grow its own solution, since the externally developed ones look pretty good.
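To illustrate what such a fromdict() does, here is a stdlib-only toy sketch using dataclasses (the class names are made up; cattrs.structure() provides this far more robustly, with converters, unions, etc.):

```python
from dataclasses import dataclass, fields, is_dataclass

@dataclass
class Engine:
    horsepower: int

@dataclass
class Car:
    model: str
    engine: Engine

def fromdict(cls, data):
    """Recursively build nested dataclass instances from a parsed
    JSON dict, a toy version of what cattrs.structure() provides."""
    kwargs = {}
    for f in fields(cls):
        value = data[f.name]
        if is_dataclass(f.type):        # nested object: recurse
            value = fromdict(f.type, value)
        kwargs[f.name] = value
    return cls(**kwargs)

car = fromdict(Car, {"model": "T", "engine": {"horsepower": 20}})
print(car.engine.horsepower)  # dot access into the nested JSON
```

The recursion keys off the field annotations, which is exactly how the fixed schema lets you avoid instantiating every nested object by hand.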

Proper way to save python dictionaries and retrieve them at a later stage

Following an earlier question I asked here (Most appropriate way to combine features of a class to another?), I got an answer that I finally grew to understand. In short, what I intend to do now is have a bunch of dictionaries, each looking somewhat like this:
{ "url": "http://....", "parser": SomeParserClass }
More properties might be added later, but each will hold either a string or some other class.
Now my question is: what's the best way to save these objects?
I thought up of 3 solutions, not sure which one is the best and if there are any other more acceptable solutions.
Use pickle. While it seems efficient, it would make editing any of these dictionaries a pain, since the data is saved in a binary format.
Save each dictionary in a separate module and import these modules dynamically from a single directory. Each module would either have a function inside it that returns the dictionary, or a specially crafted variable name to hold it, so I could call it from my loading code. This seems the easiest to edit, but doesn't sound very efficient or Pythonic.
Use some sort of database like MongoDB or Riak to save these objects. My problem with this one is editing, which is doable but doesn't sound like fun, and the fact that, while the former two are equipped to correctly save my parser class inside the dictionary, I have no idea how these databases serialize or 'pickle' such objects.
As you can see, my main concerns are how easy it would be to edit them, the efficiency of saving and retrieving the data (though not a huge concern, since I only have a couple of hundred of these), and the correctness of the solution.
So, any thoughts?
Thank you in advance for any help you might be able to provide.
Use JSON. It supports python dictionaries and can be easily edited.
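A round-trip sketch of the JSON approach; note that a value like SomeParserClass is not JSON-serializable as-is, so you would store a reference to it (e.g. its dotted import path, shown here as a hypothetical name) instead:

```python
import json

# The parser class itself cannot go into JSON, so store a
# reference (its hypothetical dotted name) instead.
record = {"url": "http://example.com", "parser": "parsers.SomeParserClass"}

text = json.dumps(record, indent=2)   # human-readable, hand-editable text
restored = json.loads(text)
assert restored == record
```

On load you would resolve the dotted name back to the class, e.g. with importlib.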
You can try shelve. It's built on top of pickle and lets you serialize objects and associate them with string keys.
Because it is based on dbm, it will only access key/values as you need them. So if you only need to access a few items from a large dictionary, shelve may be a better choice than json, which has to load the entire JSON file into a dictionary first.
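A small sketch of the shelve approach (the file path and keys are made up for illustration):

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "store")

# Keys are strings; values are pickled transparently on assignment.
with shelve.open(path) as db:
    db["site"] = {"url": "http://example.com", "retries": 3}

# Later: only the requested entry is unpickled, not the whole file.
with shelve.open(path) as db:
    print(db["site"]["retries"])
```

Since values go through pickle, a dictionary containing a parser class would round-trip too, as long as the class is importable when you read the shelf back.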

Storing a python set in a database with django

I have a need to store a Python set in a database for accessing later. What's the best way to go about doing this? My initial plan was to use a TextField on my model and store the set as a comma- or pipe-delimited string; then, when I need to pull it back out for use in my app, I could initialise a set by calling split on the string. Obviously, if there is a simple way to serialize the set so I can pull it back out of the db as a set when I need it later, that would be best.
If your database is better at storing blobs of binary data, you can pickle your set. (Pickle's oldest protocol, protocol 0, even stores data as ASCII text, so it could go into a text column and might still beat the delimited-string approach.) Just pickle.dumps(your_set) and unpickled = pickle.loads(database_blob) later.
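The round trip looks like this (the base64 step is only needed if the column is text-only):

```python
import base64
import pickle

tags = {"django", "python", "sets"}

blob = pickle.dumps(tags)        # bytes, suitable for a binary/BLOB column
restored = pickle.loads(blob)
assert restored == tags

# If the column is text-only, encode the bytes first:
text = base64.b64encode(blob).decode("ascii")
back = pickle.loads(base64.b64decode(text))
assert back == tags
```

The usual pickle caveat applies: only unpickle data you wrote yourself, since loads() can execute arbitrary code on untrusted input.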
There are a number of options here, depending on what kind of data you wish to store in the set.
If it's regular integers, CommaSeparatedIntegerField might work fine, although it often feels like a clumsy storage method to me.
If it's other kinds of Python objects, you can try pickling it before saving it to the database, and unpickling it when you load it again. That seems like a good approach.
If you want something human-readable in your database though, you could even JSON-encode it into a TextField, as long as the data you're storing doesn't include Python objects.
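JSON has no set type, so the trick is to round-trip through a list (a small sketch, with a made-up set of values):

```python
import json

permissions = {"read", "write"}

# Encode as a sorted list for a stable, human-readable TextField value
encoded = json.dumps(sorted(permissions))

# Decode by rebuilding the set from the list
decoded = set(json.loads(encoded))
assert decoded == permissions
```

Sorting before dumping is optional, but it keeps the stored string deterministic, which makes diffs and manual edits easier.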
Redis natively stores sets (as well as other data structures such as lists, hashes and sorted sets) and provides set operations - and it's rocket fast too. I find it's the Swiss Army knife of Python development.
I know it's not a relational database per se, but it does solve this problem very concisely.
What about CommaSeparatedIntegerField?
If you need another type (strings, for example), you can create your own field which would work like CommaSeparatedIntegerField but use strings (without commas).
Or, if you need another type, a probably better way of doing it: keep a dictionary which maps integers to your values.

Python: Should I put my data in lists or object attributes?

I am looking for an appropriate data structure in Python for processing variably structured forms. By variably structured forms I mean that the number of form fields and the types of the form's contents are not known in advance. They are defined by the user who populates the forms with his input.
What are the pros and cons of putting data in A) object attributes (e.g. of an otherwise empty "form"-class) or B) simply lists/dicts? Consider that I have to preserve the sequence of form fields, the form field names and the types.
(Strangely, it has been difficult to find conclusive information on this topic. As I am still new to Python, it's possible that I have searched for the wrong terms. If my question is not clear enough, please ask in the comments and I will try to clarify.)
In Python, as in all object-oriented languages, the purpose of classes is to associate data and closely-related methods that act on that data. If there's no real encapsulation going on (i.e. the methods help define the ways you can interact with the data), the best choice is a conglomeration of builtin types like lists and dictionaries as you mention and perhaps some utility functions that act on those sorts of data structures.
Python classes are essentially just two dicts (one for functions, one for data), a name, and the rules for how Python looks up keys. When you access existing keys, there is no practical difference from a dict (unless you override the access rules, of course).
That means that there is no drawback (besides more code) to using classes at all and you should never be afraid to write a class.
In your particular case I think you should go with classes, for one simple reason: You might want to extend them later. Maybe you want to add constraints on the name (length, allowed letters, uniqueness, ...) or the value (not empty, length, type, ...) of a field one day. Maybe you want to validate all fields in a form. If you use a class you can do this without changing any code outside the class! And as I said before, even if you don't, there are no drawbacks!
I guess my rule of thumb for classes is: don't use a class only if you're absolutely sure that there is nothing to add to it. If not, just write those few extra lines.
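To make the "you can extend it later" point concrete, here is a hypothetical Field class that starts as bare data storage and later gains a constraint, without any change to calling code:

```python
class Field:
    """Holds one form field. Started as plain storage; the name
    check below was added later without touching any callers."""
    def __init__(self, name, value):
        if not name:  # constraint added later: name must not be empty
            raise ValueError("field name must not be empty")
        self.name = name
        self.value = value

# A form is just an ordered list of fields, preserving sequence
form = [Field("age", 42), Field("city", "Oslo")]
print(form[0].name, form[0].value)
```

With a bare dict, every place that builds a field would have had to repeat that validation.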
It's not very Pythonic to randomly add members to an object. It would be more Pythonic if you used member methods to do it, but still not the way things are usually done.
Every library I've seen for this kind of thing uses dictionaries or lists. So that is the idiomatically Python way to handle the problem. Sometimes they use an object that overrides __getitem__ so it can behave like a dictionary or list, but it's still dictionary syntax that's used to access the fields.
I think all the pros and cons have to do with people understanding your code, and since I've never seen code that handles this by having an object with members that can appear I don't think many people will find code that does do that to be very understandable.
A list of dictionaries (e.g. [{"type": "text", "name": "field_name", "value": "test value"}, ...]) would be a usable structure, if I understand your requirement correctly.
Whether object are better in this case depends on what you're doing later. If you use the objects just as data storage, you don't gain anything. Maybe a list of field objects, which implement some appropriate methods to deal with your data, would also be a good choice.
Maybe you could set up an object for each field and store those in a list, but that practically ends up as a glorified dictionary.
Then you could access it like
fields[2].name
fields[2].value
etc.
