Visualizing the relationships of classes and methods in Python

I am reading the source code of a game written in Python, which involves a lot of methods under many classes tangled together. I want to start with a graph that gives an overview of the whole package, something like: Class1.methodA uses Class2.methodA and Class2.methodC; Class2.methodC uses Class2.methodB; and so on, presented as a graph with nodes and arrows so that I can see the dependencies clearly.
I could certainly do that manually, level by level, but that would take a lot of time and I might make mistakes once it gets complex.
I've seen a tool called "snakefood" which visualizes dependencies. I tried it but failed (it does not seem to work for Python 3? I am not sure why, and therefore also not sure whether it is what I am looking for). Any suggestions?
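Aside from dedicated tools (snakefood, pyan, pycallgraph and the like), a rough first pass can be scripted with the standard ast module. The sketch below is only an illustration: it catches calls written literally as self.method(...) or SomeClass.method(...), misses dynamic dispatch and aliasing, and the output is just Graphviz edge lines that can be pasted into a digraph { ... } block and rendered with dot.

# call_graph_sketch.py -- a rough static scan, not a full call-graph tool.
# It only catches calls written as self.foo(...) or SomeName.foo(...).
import ast
import sys

def method_edges(source, filename="<module>"):
    tree = ast.parse(source, filename)
    edges = []
    for cls in [n for n in ast.walk(tree) if isinstance(n, ast.ClassDef)]:
        for func in [n for n in cls.body if isinstance(n, ast.FunctionDef)]:
            caller = f"{cls.name}.{func.name}"
            for node in ast.walk(func):
                if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
                    target = node.func
                    if isinstance(target.value, ast.Name):
                        owner = cls.name if target.value.id == "self" else target.value.id
                        edges.append((caller, f"{owner}.{target.attr}"))
    return edges

if __name__ == "__main__":
    with open(sys.argv[1]) as f:
        for caller, callee in method_edges(f.read(), sys.argv[1]):
            print(f'"{caller}" -> "{callee}";')   # Graphviz dot edge lines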

Related

How to structure data clustering python project?

The project: Read in 2D data, cluster datapoints based on different cluster techniques/models, and evaluate how well the clustering has worked.
Since I am unhappy with my project structure so far, and have little experience with project structures, I hope to get some feedback on how to proceed. The structure is as follows:
/2Dclustering
    __init__.py
    __main__.py
    __2dcluster.py
    /cluster_forming
        __init__.py
        __cluster_models.py
    /evaluate_cluster
        __how_good_is_clustering.py
        __choose_the_better_cluster.py
In __main__.py, we read in the input data and create a 2dcluster object using __2dcluster.py, which is then saved as the output. The 2dcluster class uses the functions from cluster_forming and evaluate_cluster to form a cluster and add a metric to it (i.e. how well did it perform?). In both subfolders (cluster_forming and evaluate_cluster), we just have files with a bunch of functions instead of classes. My questions are:
1.) Does it in general make sense to break everything into so many subfolders?
2.) Would it make sense to have class objects for evaluate_cluster that evaluate the cluster? I feel like it is a little messy now, but I have no intuition about whether creating classes would over-complicate things.
3.) Is there a sensible way of creating classes that deal with all the subclasses, i.e. a class that just combines other classes, or is this nonsense?
If anyone has an intuition about a structure that would make more sense, I'd be really happy to hear it. As I said, as someone who has never written bigger projects, I am kind of at a loss as to what is considered a clever solution and what is over-complicating the project. Thanks!
Your project's structure should balance the tension between competing stakeholders' needs.
i) The Coder : This person (probably you) will want to have a number of small, composable functions spread across the codebase where they can be easily tested in isolation, and reused in lots of different ways. The code should be split logically into isolatable, functional blocks that provide clear themes of functionality. Over time, as the codebase gets larger, it pays to split it into ever smaller unit components, to support testing and debugging.
ii) The End-User : This person wants to be able to install your code and be able to run or import it with a minimum of fiddling about. Their priority will be utility, and as such they will want a single point of entry, with a simple interface without having to spend time learning about project structures to get stuff done.
The structure should split the codebase up into different, but meaningful blocks of code, each of which might exist as elements in their own right.
The user should be able to run or access your code via a handful of useful entry-points, and the tester/coder should be able to isolate any problem to a discrete function, the fixing of which won't impact anything else in the codebase.
Often, when building a project, it's common to start with a single, monolithic chunk of code, and then over time, split it out into separate units to support maintenance. As a project matures, splitting out commonly reused components into their own utility areas becomes a good strategy - having the foresight to do this from the start is laudable, but not always necessary.
If your project is to do with clustering, then there's likely a workflow that follows the steps outlined: process data, perform clustering, evaluate results - so there's likely going to be a functional split that develops along those lines - but they're all part of a fairly tightly coupled package of functionality, so I'd be tempted to arrange all of these into a single directory - maybe even a single .py file initially, depending on how much code you're likely to generate.
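Concretely, and purely as an illustration with placeholder module names, a flatter layout along those lines might look like:

/clustering
    __init__.py
    __main__.py       # entry point: read data, run the workflow, save output
    cluster2d.py      # the 2dcluster class tying things together
    models.py         # the cluster-forming functions
    evaluate.py       # the metric / comparison functions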
Possibly, if you're going to process data in lots of different ways (i.e. not just for clustering) then there might be a case for developing some utility data-reading/processing package that you can hook in for future processing tasks, which would warrant making a different package, or placing in its own sub-directory but that's highly speculative - and presupposes that you'll be bulking this package out with additional (non clustering) functionality/workflows.
I don't think you need to build your own classes on the fly as proposed; a cluster is just a set of associations between object identifiers and groups. Any clustering can be expressed as a set of tuples, where each tuple associates one index (i) with one group (g), with the i's drawn from the set I (all your data's indices) and the g's from the set G (the full collection of groups).
One cluster assignment boils down to (i,g) where i ∊ I and g ∊ G
So a full clustering would consist of a list of pairs [(i, g)], one for each i in I with its associated g in G, which is likely going to be the same for any cluster/grouping.
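In Python that representation can stay as plain data; a small made-up example (the points and the assignment are invented purely to show the shape of the data):

# A clustering as plain data: one (index, group) pair per datapoint.
points = [(0.1, 0.2), (0.9, 1.1), (0.2, 0.1), (1.0, 0.9)]

# e.g. the output of some clustering routine: index i -> group g
assignment = [(0, 0), (1, 1), (2, 0), (3, 1)]

# the same thing as a dict, handier for lookups...
groups = dict(assignment)

# ...and grouped the other way round, group g -> list of member indices
from collections import defaultdict
members = defaultdict(list)
for i, g in assignment:
    members[g].append(i)
print(dict(members))   # {0: [0, 2], 1: [1, 3]}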

Create a viewer/editor of directed acyclic graph with python

Until now I've been using the streamlit framework for most of my plotting and visualizing, but recently I've had some new ideas. Long story short, something like Unreal's Blueprint editor is what I need (that's a bit much, I know). For now I would be content with something at least remotely similar to it. Only a few people will be using it, so it is not a product, just a sketch.
Maybe we can omit some of the details and say that we have a Pipeline.
Meaning it has some Steps, which in turn have Inputs and Outputs.
Then we say that earlier steps do not have access to later outputs.
Now we have a picture. And that picture would be an acyclic graph!
But maybe you see other options. How would you approach such a problem?
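One way to sketch the data model before worrying about an editor UI is to keep the pipeline as a plain dependency mapping and let the standard library's graphlib (Python 3.9+) provide ordering and cycle detection; the step names here are made up:

# A pipeline as a DAG: each step lists the steps whose outputs it consumes.
from graphlib import TopologicalSorter, CycleError

pipeline = {
    "load":     set(),
    "clean":    {"load"},
    "features": {"clean"},
    "train":    {"features"},
    "report":   {"train", "clean"},
}

try:
    order = list(TopologicalSorter(pipeline).static_order())
    print("run order:", order)   # e.g. ['load', 'clean', 'features', 'train', 'report']
except CycleError as exc:
    print("not a valid pipeline, cycle found:", exc.args[1])

A drawing/editing layer (a streamlit component, a web canvas, or a desktop node-graph widget) can then sit on top of a model like this separately.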

What kind of algorithm should I look for in order to recognize the blueprint of a building?

I have a project in which I should analyze the layout of a building in order to navigate inside it. I was thinking about taking the blueprint of the building (or maybe an edited version of the blueprint, modified in some way I am still thinking about), transforming it into some kind of object, and then processing it.
Basically, I was thinking about doing something similar to OCR but limited (and I guess "limited" sounds pretty silly to most of you, but bear with me) to the recognition of, for example, walls and doors. My idea was to transform the whole image into a matrix of points - I guess, a lower-resolution version of the source - and then compute the route from point A to point B over that matrix.
This is the idea, but I guess that I'm actually looking at a problem far more complex than it looks to me; moreover, I don't really know whether this is the best (read: easiest) way to proceed.
In short, my question is:
Is this approach feasible? Are there any libraries for, say, Python, with similar functions? Is the recognition doable by working in some way with graphic design software (e.g. Photoshop)?
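Leaving the recognition step aside (thresholding and line detection are commonly done with libraries such as OpenCV or scikit-image), the second half of the idea, routing over a matrix of points, is standard grid pathfinding. A minimal breadth-first-search sketch over a hypothetical occupancy grid (0 = free, 1 = wall):

# Breadth-first search over a binary occupancy grid (0 = free, 1 = wall).
# The grid here is made up; in practice it would come from the
# downsampled blueprint image.
from collections import deque

def shortest_path(grid, start, goal):
    rows, cols = len(grid), len(grid[0])
    queue = deque([start])
    came_from = {start: None}
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:          # walk back to the start
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in came_from):
                came_from[(nr, nc)] = cell
                queue.append((nr, nc))
    return None  # no route between the two points

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
print(shortest_path(grid, (0, 0), (2, 0)))
# [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0)]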

Is there such a thing as a generic dependency resolution library for python?

I'm looking for something that might let me define some kind of 'fact', and 'dependencies' between facts, and define functions which will attempt to resolve these dependencies.
Sort of like the way that a package manager resolves dependencies between packages, but a library that abstracts that dependency resolution process.
As in I'd like to be able to declare that I'd like fact A to be true, and that fact A also requires fact B to be true, and given some set of inputs that would allow it to determine whether A and B are true, this system would do whatever it takes to make both A and B true.
I know it has been a very long time since this question was asked but I had exactly the same question and I could not find anything useful on Google...
TL;DR: Based on this post I started working on a multi-version dependency resolving algorithm, which you can find here. I cannot guarantee it is correct, but it seems complete. It was more of an experiment, but it runs successfully with a controlled data-set and produces a result with random data.
Now, what I have learned from this process is that it is fairly easy to implement a simple dependency resolution algorithm that handles most cases on known data and works on a "flat" graph. However, this gets a lot more complex once you take into consideration multiple versions, range-based dependencies (i.e. version > 2.0 and != 2.2) as opposed to absolute ones, multiple correct solutions, and incomplete or incorrect data.
The most noticeable problem, in complex cases, is circular dependencies (see here and here). These definitely exist in software/package managers. I have attempted to solve them with a round-based approach which will lead either to a solution, a failure, or a (detected) loop after a number of attempts.
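To make the "flat graph" baseline concrete, a toy resolver along those lines, with a simple guard that detects loops, could look like this (the requires/actions structures here are illustrative, not the names from my library):

# A toy resolver: to make a fact true, first make everything it requires
# true, then run its own action; a fact seen while still "in progress"
# signals a circular dependency.
def resolve(fact, requires, actions, done=None, in_progress=None):
    done = set() if done is None else done
    in_progress = set() if in_progress is None else in_progress
    if fact in done:
        return done
    if fact in in_progress:
        raise ValueError(f"circular dependency involving {fact!r}")
    in_progress.add(fact)
    for dep in requires.get(fact, []):
        resolve(dep, requires, actions, done, in_progress)
    actions[fact]()            # whatever it takes to make this fact true
    in_progress.discard(fact)
    done.add(fact)
    return done

requires = {"A": ["B"], "B": []}
actions = {name: (lambda n=name: print("established", n)) for name in requires}
resolve("A", requires, actions)   # prints "established B" then "established A"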
Finally, I tried to generalize the problem and create a reusable library. You will notice, though, that my main logic (_resolve()) is a lot more straightforward than satisfy_criteria(), which is the problem-/area-specific function for dependency acceptance. Also, the circular-dependency "resolution" is completely based on package management/versioning and most probably is not applicable to any other case/area.
The above led me to believe that, despite the fact that most solutions will involve graphs at one stage or another, there cannot be a high-level generic solution to the dependency resolution problem.
My next step (given time) would be to start reading apt-get, rpm or aptitude sources and see how other people solved this problem...

solving ODEs on networks with PyDSTool

After using scipy.integrate for a while, I am at the point where I need more functionality, like bifurcation analysis or parameter estimation. This is why I'm interested in using PyDSTool, but from the documentation I can't figure out how to work with ModelSpec and whether it is actually what will lead me to the solution.
Here is a toy example of what I am trying to do: I have a network with two nodes, both having the same (SIR) dynamic, described by two ODEs, but different initial conditions. The equations are coupled between nodes via the Epsilon (see formula below).
Formulas as a picture for better readability (the 'n' and 'm' are indices, not exponents):
http://image.noelshack.com/fichiers/2014/28/1404918182-odes.png
(could not use the image upload on Stack, sadly)
In the two node case my code (using PyDSTool) looks like this:
#multiple SIR metapopulations
#parameter and initial condition definition; a dict is a must
import PyDSTool as pdt
params={'alpha': 0.7, 'beta':0.1, 'epsilon1':0.5,'epsilon2':0.5}
ini={'s1':0.99,'s2':1,'i1':0.01,'i2':0.00}
DSargs=pdt.args(name='SIRtest_multi',
                ics=ini,
                pars=params,
                tdata=[0,20],
                #the for-macro generates formulas for s1,s2 and i1,i2;
                #sum works similar but sums over the expressions in it
                varspecs={'s[o]': 'for(o,1,2,-alpha*s[o]*sum(k,1,2,epsilon[k]*i[k]))',
                          'i[l]': 'for(l,1,2,alpha*s[l]*sum(m,1,2,epsilon[m]*i[m]))'})
#generator
DS = pdt.Generator.Vode_ODEsystem(DSargs)
#computation, a trajectory object is generated
trj=DS.compute('test')
#extraction of the points for plotting
pts=trj.sample()
#plotting; pylab is imported along with PyDSTool as plt
pdt.plt.plot(pts['t'],pts['s1'],label='s1')
pdt.plt.plot(pts['t'],pts['i1'],label='i1')
pdt.plt.plot(pts['t'],pts['s2'],label='s2')
pdt.plt.plot(pts['t'],pts['i2'],label='i2')
pdt.plt.legend()
pdt.plt.xlabel('t')
pdt.plt.show()
But in my original problem, there are more than 1000 nodes with 5 ODEs each, every node is coupled to a different number of other nodes, and the epsilon values are not equal for all the nodes. So tinkering with this syntax has not led me anywhere near the solution yet.
What I am actually thinking of is a way to construct separate sub-models/solvers(?) for every node, each having its own parameters (epsilons, since they are different for every node), and then link them to each other. And this is the point where I do not know whether it is possible in PyDSTool and whether it is the way to handle this kind of problem.
I looked through the examples and the docs of PyDSTool but could not figure out how to do it, so help is very much appreciated! If the way I'm trying to do things is unorthodox or plain stupid, you are welcome to make suggestions on how to do it more efficiently. (Which is actually the more efficient/faster/better way to solve problems like this: subdividing it into many small (still not decoupled) models/solvers, or one containing all the ODEs at once?)
(I'm neither a mathematician nor a programmer, but willing to learn, so please be patient!)
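One direction I keep coming back to, sketched below purely as an illustration, is to skip the for-macro and generate the varspecs strings programmatically from the coupling data; the neighbours dict is a made-up stand-in for the real network, and I do not know whether this is the intended way:

# Sketch: build varspecs/pars/ics for one big coupled system from the
# coupling data instead of writing every equation out by hand.
# 'neighbours' maps each node to (neighbour, epsilon) pairs; here it is a
# made-up two-node example standing in for the real network from a file.
neighbours = {
    1: [(2, 0.5)],
    2: [(1, 0.5)],
}

params = {'alpha': 0.7, 'beta': 0.1}
ini, varspecs = {}, {}
for n, links in neighbours.items():
    for m, eps in links:
        params[f'epsilon{n}_{m}'] = eps   # per-node coupling strengths
    force = ' + '.join(f'epsilon{n}_{m}*i{m}' for m, _ in links)
    varspecs[f's{n}'] = f'-alpha*s{n}*({force})'
    varspecs[f'i{n}'] = f'alpha*s{n}*({force})'
    ini[f's{n}'], ini[f'i{n}'] = 0.99, 0.01

# params, ini and varspecs can then be passed to pdt.args(...) exactly as in
# the two-node example above.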
The solution is definitely not to build separate simulation models. That won't work because so many variables will be continuously coupled between the sub-models. You absolutely must have all the ODEs in one place together.
It sounds like the solution you need is to use the ModelSpec object constructs. These let you hierarchically build the sub-model definitions out of symbolic pieces. They can have their own "epsilon" parameters, etc. You declare all the pieces and, when you're finished, let PyDSTool make the final strings containing the ODE definitions for you. I suggest you look at the tutorial example at:
http://www.ni.gsu.edu/~rclewley/PyDSTool/Tutorial/Tutorial_compneuro.html
and the provided examples: ModelSpec_test.py, MultiCompartments.py. But, remember that you still have to have a source for the parameters and coupling data (i.e., a big matrix or dictionary loaded from a file) to be able to automate the process of building the model, otherwise you'd still be writing it all out by hand.
You have to build some classes for the components that you want to have. You might also create a factory function (compare 'makeSoma' in the neuralcomp.py toolbox) that will take all your sub-components and create an ODE based on summing something up from each of the declared components. At the end, you can refer to the parameters by their position in the hierarchy. One might be 's1.epsilon' while another might be 'i4.epsilon'.
Unfortunately, to build models like this efficiently you will have to learn to do some more complex programming! So start by understanding all the steps in the tutorial. You can contact me directly through the SourceForge support discussions or by email once you've got started and have specific questions.
