Save objects in python

Save objects in python - python

I'm programming an animal guessing game in Python, as a binary tree with animals as leaves and discriminatory questions as intermediate nodes. Leaves and questions are objects. Now I want to be able to save the animals and the intermediate questions as pickle-file.
But I do not know how I can identify the various objects for pickling. Normally you would create an object like so: monkey = Animal('Is it a monkey?') so that you could refer to the object by the name monkey.
But as the tree grows the leaf-object monkey is changed into an intermediate node with question 'Does it like peanuts' with a yes-exit to a new monkey-node, and a no-exit to another (new) animal. So, how do I pickle these objects?

I would utilize a pre-order traversal starting at the root node and traversing down using the pre-order methodology.
Then, when you want to read the file, you can use the same type of traversal to read your tree back to your program.
All of your nodes can be reached from the root node, so these types of traversals are really handy for easy writing and reading of binary search trees. Last year in my Data Structures course I completed a very similar assignment using this method.

Related

How to save an AST generated by ANTLR

I have successfully generated an AST using ANTLR in python but I cannot figure out for the life of me how I can save this for later use. The only option I have been able to figure out is to use tree.toStringTree() method, but the output of this is messy and not overly convenient or easy to work with.
How do I save it and what format would be best/easiest to work with and be able to visualise and load it in in the future?
EDIT: I can see in the java documentation there is a DotGenerator() to generate a DOT file of the tree, but I can't find a way to do anything like this in python.

What you are looking for is a serializer/deserializer of the parse tree. Serialization was previously addressed in StackOverflow here. It isn't supported in the runtime (ASAIK) because it is not useful: one can reconstruct the tree very quickly by re-parsing the input. Even if you want to change the tree using a transformation, you can replace the nodes in the tree with sub-trees with node types that don't even exist in your parser, print out the tree, then re-parse to reconstruct the tree with the parse types for your grammar. It only makes sense if parsing with semantic analysis is very slow. So, you should consider the problem carefully.
However, it's not difficult to write a crude serializer/deserializer that does not consider "off-channel" content like spacing or comments. This C# program (which you could adapt to python) is an example that reconstructs the tree using the grammars-v4/sexpression.g4 grammar for a target grammar arithmetic.g4. Using toStringTree(rule-names), the tree is first serialized into a string. (Note, toStringTree() without the parser rule names is difficult to read, that is why I asked.) Then, the s-expression is parsed and a bottom-up reconstruction is performed using an Antlr visitor. Since toStringTree() does not mark the parse tree leaves with the type of the token (e.g., to distinguish between a number versus a symbol), the string is lexed to reconstruct the value. It also uses reflection to create the correct parse tree node type.
Outputting a Dot graph for the parse tree is also easy, which I included in the program, using a top-down recursive visitor. Here, the recursive function outputs each edge to a child for a particular node. Since each node name has to be unique (it's a tree), I added the pre-order tree number for the node to the name.
--Ken

Family tree in Python

I need to model a four generational family tree starting with a couple. After that if I input a name of a person and a relation like 'brother' or 'sister' or 'parent' my code should output the person's brothers or sisters or parents. I have a fair bit of knowledge of python and self taught in DSA. I think I should model the data as a dictionary and code for a tree DS with two root nodes(i.e, the first couple). But I am not sure how to start. I just need to know how to start modelling the family tree and the direction of how to proceed to code. Thank you in advance!

There's plenty of ways to skin a cat, but I'd suggest to create:
A Person class which holds relevant data about the individual (gender) and direct relationship data (parents, spouse, children).
A dictionary mapping names to Person elements.
That should allow you to answer all of the necessary questions, and it's flexible enough to handle all kinds of family trees (including non-tree-shaped ones).

How to create a nested data structure in Python?

Since I recently started a new project, I'm stuck in the "think before you code" phase. I've always done basic coding, but I really think I now need to carefully plan how I should organize the results that are produced by my script.
It's essentially quite simple: I have a bunch of satellite data I'm extracting from Google Earth Engine, including different sensors, different acquisition modes, etc. What I would like to do is to loop through a list of "sensor-acquisition_mode" couples, request the data, do some more processing, and finally save it to a variable or file.
Suppose I have the following example:
sensors = ['landsat','sentinel1']
sentinel_modes = ['ASCENDING','DESCENDING']
sentinel_polarization = ['VV','VH']
In the end, I would like to have some sort of nested data structure that at the highest level has the elements 'landsat' and 'sentinel1'; under 'landsat' I would have a time and values matrix; under 'sentinel1' I would have the different modes and then as well the data matrices.
I've been thinking about lists, dictionaries or classes with attributes, but I really can't make up my mind, also since I don't have that much of experience.
At this stage, a little help in the right direction would be much appreciated!

Lists: Don't use lists for nested and complex data structures. You're just shooting yourself in the foot- code you write will be specialized to the exact format you are using, and any changes or additions will be brutal to implement.
Dictionaries: Aren't bad- they'll nest nicely and you can use a dictionary whose value is a dictionary to hold named info about the keys. This is probably the easiest choice.
Classes: Classes are really really useful for this if you need a lot of behavior to go with them - you want the string of it to be represented a certain way, you want to be able to use primitive operators for some functionality, or you just want to make the code slightly more readable or reusable.
From there, it's all your choice- if you want to go through the extra code (it's good for you) of writing them as classes, do it! Otherwise, dictionaries will get you where you need to go. Notably the only thing a dictionary couldn't do would be if you have two things that should be at the key level in the dictionary with the same name (Dicts don't do repetition).

Tree of trees? Table of trees? What kind of data structure have I created?

I am creating a python module that creates and operates on data structures to store lots of semantically tagged data and metadata from real experiments. So in an experiment you have:
subjects
treatments
replicates
Enclosing these 3 categories is the experiment, and combinations of the three categories are what I am calling "units". Now there is no inherently correct hierarchy between the 3 (table-like) but for certain analyses it is useful to think of a certain permutation of the 3 as a hierarchy,
e.g. (subjects-->(treatments-->(replicates)))
or
(replicates-->(treatments-->(subjects)))
Moreover, when collecting data, files will be copy-pasted into a folder on a desktop, so data is at least coming in as a tree. I have thought a lot about which hierarchy is "better" but I keep coming up with use cases for most of the 6 possible permutations. I want my module to be flexible in that the user can think of the experiment or collect the data using whatever hierarchy, table, hierarchy-table hybrid makes sense to them.
Also the "units" or (table entries) are containers for arbitrary amounts of data (bytes to Gigabytes, whatever ideally) of any organizational complexity. This is why I didn't think a relational database approach was really the way to go and a NoSQL type solution makes more sense. But then i have the problem of how to order the three categories if none is "correct".
So my question is what is this multifaceted data structure?
Does some sort of fluid data structure or set of algorithms exist to easily inter-convert or produce structured views?

The short answer is that HDF5 addresses these fairly common concerns and I would suggest it. http://www.hdfgroup.org/HDF5/
In python: http://docs.h5py.org/en/latest/high/group.html
http://odo.pydata.org/en/latest/hdf5.html
will help.

Representing hierarchical relationships with "multiple inheritance" in a relational database

I'm working on a python program that allows the user to categorise files by attaching 'tags' to them. These tags can stand in hierarchical relationships to one another. For example, the 'cat' tag can be categorized as a "descendant" of the 'mammal' tag. As a consequence, once a file is tagged as 'dog', it can be accessed via the 'mammal' tag as well.
These tags and their relationships to each other and to files will obviously need to be stored in a database, and I'm most familiar with relational databases.
I very much like the Modified Pre-order Tree Traversal method for storing trees in a relational database because it removes the need for recursion and requires fewer database queries.
However, I also want to facilitate tags with multiple parents. For example, 'dog' could be a child of 'mammal' and also of 'four-legged-thing' where not all four legged things are mammals or even animals (e.g. tables), and the 'mammal' and 'four-legged-thing' tags have no "common ancestor".
Does anyone know of a method of representing such relationships in a database while maintaining some of the advantages of the MPTT method?
Thanks for any help.

What you are describing is an acyclic directed graph, not a tree, so you can't use any of the sql "tree-storage" methods like MPTT. Here is an article that demonstrates an adjacency-list approach to this problem.
I highly recommend that you do not go down this path, however, not because of the difficulty of implementation, but because you will end up confusing and frustrating your users. In my experience users make poor use of complex ontological systems and are easily confused by them. Either use a flat "tag" namespace with no parent-child relationships, or use a tree arrangement with at most one parent per node.
But if you want to have a graph, he most straightforward way is to have a table like this:
CREATE TABLE tag_relationships (
tag_child_id INTEGER NOT NULL REFERENCES tags (id) ON UPDATE CASCADE ON DELETE CASCADE,
tag_parent_id INTEGER NOT NULL REFERENCES tags (id) ON UPDATE CASCADE ON DELETE CASCADE,
PRIMARY KEY (tag_child_id, tag_parent_id)
);
You will probably not be able to avoid recursive queries. When you want to create a matching search, use the tags you have as search criteria and recursively add child tags until you have a complete tag list.
You will also have to be careful about creating cycles. When you add a relationship, you need to recursively visit parents and make sure you don't end up at the same node twice.
Something you can do to avoid recursive queries and help detect cycles is to denormalize your data a bit by making all relationships explicit for every node. What I mean is, suppose A is a child of B and C, and C is a child of D.
Instead of the minimum number of edges necessary to represent this fact:
tag_child_id tag_parent_id
A B
A C
C D
You would make all implicit relationships (ones you would have had to find via recursion) explicit:
A B
A C
A D
C D
Notice that I added (A, D).

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.