Hi everybody. I'm building a DiGraph using NetworkX and iterating an algorithm over it. In each iteration, every node "n" changes a specific attribute, say "A_n". Every edge connecting this node "n" to a given predecessor "m" has another attribute of interest that depends on "A_n"; let's call it "B_mn". My question is: is it possible to update "B_mn" "automatically" by modifying "A_n", for all "n", "m" in my set of nodes? I mean, not iterating over the nodes and then over their predecessors, but using a kind of dynamic function "B_mn(A_n)" that changes its value at the very moment "A_n" changes. Is this possible?
I'm thinking of something like this:
Let X and Y be numbers, let's suppose that
G.node["n"]["A"]=X and G.edge["m"]["n"]["B"]= Y+G.node["n"]["A"]
I want the value of the attribute "B" on the edge to be updated as well whenever the value of X changes.
Thank you very much in advance for your help :)
One caveat with this approach up front: don't ever delete nodes.
In your example you are assigning X to G.node["n"]["A"]. If you write:
G.node["n"]["A"] = 5
G.node["n"]["A"] = 6
each assignment rebinds the attribute to a new object at a new memory location, so anything that referenced the old object no longer sees the change.
Instead of assignment with '=', you need to update X in place, which leaves the datatype and memory location intact. That means you need a datatype which supports in-place updates, like a dictionary with its ".update()" method.
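For instance, a minimal sketch of that idea (using the NetworkX 2.x attribute accessors G.nodes/G.edges; the principle is the same with the older G.node/G.edge):

import networkx as nx

G = nx.DiGraph()
G.add_edge("m", "n")

# Store A in a mutable container and share that same object with the
# edge attribute. An in-place update keeps the shared reference intact,
# so the edge "sees" the new value without any extra iteration.
G.nodes["n"]["A"] = {"value": 5}
G.edges["m", "n"]["A_ref"] = G.nodes["n"]["A"]  # same dict object

G.nodes["n"]["A"].update(value=6)           # update, not rebind
print(G.edges["m", "n"]["A_ref"]["value"])  # prints 6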
Everything past here is dependent on your use case:
If the node data is a plain value (like an int or float), then you don't have a problem adding values together. You can keep running calculations based on value changes, as long as they reach only one level deeper than where the calculation is performed.
However, if the node data is an expression of expressions...
for example G.node.get('n')['A'] + G.node.get('m')['A'] (where G.node.get('m')['A'] is itself an expression that needs to be evaluated),
then you have two options:
You will need a recursive function that does the evaluating, OR
you will need to keep a running list of dictionaries outside of the graph and perform the running evaluation there, updating the data values in the graph.
It is possible to do this all within the graph using something like ast.literal_eval() (warning this is not a GOOD idea)
If you only have one operation to perform (addition?), then there are some tricks you can use, like keeping a running list of the data locations and then doing a sum().
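Alternatively, you can sidestep the staleness problem entirely by not storing B at all and evaluating it on demand. A minimal sketch (the attribute names mirror the question; B is a plain function here rather than a stored edge attribute):

import networkx as nx

G = nx.DiGraph()
G.add_edge("m", "n", Y=2)   # the constant part of B_mn
G.nodes["n"]["A"] = 5

def B(G, m, n):
    # Evaluate B_mn lazily from the current value of A_n.
    return G.edges[m, n]["Y"] + G.nodes[n]["A"]

G.nodes["n"]["A"] = 7    # plain assignment is fine with this approach
print(B(G, "m", "n"))    # prints 9, always consistent with A_n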
I'm currently working on a project researching properties of some gas mixtures. Testing my code with different inputs, I came upon a bug(?) which I cannot explain. Basically, it concerns a computation on a numpy array in a for loop. The for loop yields a different (and wrong) result than the manual construction of the same result, using the exact same code snippets as in the for loop but indexing manually. I have no clue why it is happening, or whether it is my own mistake or a bug within numpy.
It's super weird that certain instances of the desired input objects run through the whole for loop without any problem, while others run perfectly up to a certain index, and still others fail to even compute the very first loop.
For instance, one input always stopped at index 16, throwing a:
ValueError: could not broadcast input array from shape (25,) into shape (32,)
Upon further investigation I could confirm that the previous 15 loops produced the correct results, while the results in the loop at index 16 were wrong and not even of the correct size. When running loop 16 manually through the console, no errors occurred...
[Two screenshots omitted: the first showed the (wrong, shorter) array produced at index 16 inside the loop; the second showed the expected array for index 16 when the same code was run manually in the console.]
The important part of the code is really only the np.multiply() in the for loop - I left the rest of it for context but am pretty sure it shouldn't interfere with my intentions.
import copy
import numpy as np
import cantera as ct

def thermic_dissociation(input_gas, pressure):
    # Copy of the input_gas object, which may not be altered out of scope
    gas = copy.copy(input_gas)
    # Temperature range
    T = np.logspace(2.473, 4.4, 1000)
    # Matrix containing the data over the whole range of interest
    moles = np.zeros((gas.gas_cantera.n_species, len(T)))
    # Array containing another property of interest
    sum_particles = np.zeros(len(T))
    # The troublesome for-loop:
    for index in range(len(T)):
        print(str(index) + ' start')
        # Set temperature and pressure of the gas
        gas.gas_cantera.TP = T[index], pressure
        # Set gas mixture to a state of chemical equilibrium
        gas.gas_cantera.equilibrate('TP')
        # Sum of particles = molar density * Avogadro constant for every temperature
        sum_particles[index] = gas.gas_cantera.density_mole * ct.avogadro
        # This multiplication is doing the weird stuff; printed to see what's
        # computed before it goes into the result matrix and throws the error
        print(np.multiply(list(gas.gas_cantera.mole_fraction_dict().values()), sum_particles[index]))
        # This is where the error is thrown, as the resulting array is of
        # smaller size than it should be
        moles[:, index] = np.multiply(list(gas.gas_cantera.mole_fraction_dict().values()), sum_particles[index])
        print(str(index) + ' end')
    # An array helping to handle the results
    molecule_order = list(gas.gas_cantera.mole_fraction_dict().keys())
    return [moles, sum_particles, T, molecule_order]
Help will be very much appreciated!
If you want the array of all species mole fractions, you should use the X property of the cantera.Solution object, which always returns the full array directly. See the documentation for that property: cantera.Solution.X.
The mole_fraction_dict method is specifically meant for cases where you want to refer to the species by name, rather than their order in the Solution object, such as when relating two different Solution objects that define different sets of species.
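Applied to the function in the question, this is a one-line change inside the loop (a sketch; X always has n_species entries, in the Solution's fixed species order):

# inside the for-loop of thermic_dissociation:
moles[:, index] = gas.gas_cantera.X * sum_particles[index]

# and the matching name order for the result handling:
molecule_order = gas.gas_cantera.species_names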
This particular issue is not related to numpy. The call to mole_fraction_dict returns a standard python dictionary. The number of elements in the dictionary depends on the optional threshold argument, which has a default value of 0.0.
The source code of Cantera can be inspected to see what happens exactly; the relevant pieces are mole_fraction_dict and getMoleFractionsByName.
In other words, a value ends up in the dictionary only if x > threshold. Maybe it would make more sense if >= were used here instead of >, and maybe that would have prevented the unexpected outcome in your case.
As confirmed in the comments, you can use mole_fraction_dict(threshold=-np.inf) to get all of the desired values in the dictionary. Or -float('inf') can also be used.
In your code you proceed to call .values() on the dictionary, but this would be problematic if the order of the values is not guaranteed. I'm not sure whether that's the case here. It might be better to make the order explicit by retrieving the values out of the dict using their keys.
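A sketch of both workarounds, meant to slot into the loop of the function in the question (species_names gives the Solution's fixed species ordering):

# Include zero-valued species by lowering the threshold:
fractions = gas.gas_cantera.mole_fraction_dict(threshold=-np.inf)

# Make the ordering explicit by looking each value up by name,
# instead of relying on the dict's iteration order:
moles[:, index] = [fractions[name] * sum_particles[index]
                   for name in gas.gas_cantera.species_names]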
I'm working with computer simulations and I use a lot of variables that change from one simulation to another. I have to run short but numerous simulations (1000+), so keeping track of these is important.
Up until now, I was simply adding new columns with the data inside, so my data would look something like:
DataX, DataY, DataZ, variable1, variable2, variable3, ....
So I was basically making 1 column per variable.
Every time I would need to get new variables I would add them as a new column.
Not effective, but at least everything was within the same file, which was quite handy tbh.
My internship at my lab is about to end and my tutor asked me to clean up the code and make it so that anyone could keep using it.
The thing is, each of these variables also has two further sub-variables.
So I made a new function that gathers all those variables and makes a neat little dataframe, which looks like this:
Parameter    Value  Lambda  Mod
temperature  10     1       0
VarE         1.5    5       0.5
etc.
To make it easily accessible, I also turned Parameter into the index, so I can use df_param.loc['VarE','Value'] for instance.
However, because of that, the parameters are no longer in the same file as the data, which isn't handy.
Since they'll have to use 1000+ data files when plotting, and have to filter everything, having the parameters separated from the data can lead to mistakes (which isn't possible at the moment, since everything is within the same file).
If I convert "Parameter" back into a column, I can easily do that:
Index DataX, DataY, DataZ, Parameter, Value, Lambda, Mod
The issue I have (mostly from a practical standpoint) is that since Parameter isn't the index anymore, I can't do df_param.loc['VarE','Value'] anymore. I would need to know exactly which row index 'VarE' sits at and do df_param.loc[index, 'Value'], and with well over 15 parameters to pick from, that's a bit sketchy.
Basically, is there a way to have two indexes? Like one index for DataX, DataY, DataZ (let's call it 'dt') and one for Value, Lambda and Mod, which would be 'Parameter'.
So two dataframes within one, basically.
Thank you in advance
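(A minimal sketch of one possibility, assuming pandas and the column layout above; the file name and the use of missing values to mark non-parameter rows are assumptions for illustration:)

import pandas as pd

df = pd.read_csv("simulation_0001.csv")  # hypothetical single data file

# Option 1: keep Parameter as a plain column and select rows by mask
value = df.loc[df["Parameter"] == "VarE", "Value"].iloc[0]

# Option 2: build a parameter-only view indexed by Parameter, restoring
# the df_param.loc['VarE', 'Value'] style of access
df_param = df.dropna(subset=["Parameter"]).set_index("Parameter")
value = df_param.loc["VarE", "Value"]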
So basically, I have a list of objects. Let's say each object has two properties: A and B. A is a tuple of 3 integers, (A1, A2, A3), and B is an integer. This list may contain objects that have identical A's, and I want to get rid of those duplicates. However, I want to do so in a way that, among the objects that share the same A, the one with the lowest B is kept. In the end, I want a list of objects with all unique A's and the lowest B's.
I thought about it for a while, and I think I could come up with a really janky way to do this with lots of for loops, but I feel like there must be a much better way built into Python or some library (to do at least part of this). Anyone have any ideas?
Thanks!
edit: For more detail, this is actually for a Tetris AI, for finding all possible moves with a given piece. My objects are nodes in a tree of possible Tetris moves. Each node has two values: A, the (x_position, y_position, rotation) tuple, and B, the number of frames it takes to reach that position.

I start with a root node at the starting position. At each step, I expand the tree by making children: one move to the left, one move to the right, one rotation left, one rotation right, or one soft drop downward. For each child I update both A, the XYR position, and B, the number of frames it took to get there, and I add all of these to a list of potential moves. After this, I merge all nodes that have the same XYR position, choosing the node with the fewest frames. In the next step, I expand each node inside the list of potential moves and repeat the process.

Sorry, I realize this explanation might be confusing, which is why I didn't include it in the original post. I think it's advantageous to do it this way because modern Tetris has a rather complicated rotation system called SRS (Super Rotation System) that allows you to perform complicated spins with various pieces. By building a pathfinder this way and simulating the piece making the moves according to SRS, you can tell whether a placement was a spin (sending more/less damage), and you also know the exact movement sequence to execute the placement with the fewest frames (I also store the series of moves needed to reach each position). Later, I want to figure out how to hash the states properly so I don't revisit them, but I'm still working that out.
d = {}
for obj in the_list:
    current_lowest = d.setdefault(obj.A, obj)
    if obj.B < current_lowest.B:
        d[obj.A] = obj

# Get the result
desired_list = list(d.values())
We have a dict d whose keys are the tuples (A) and whose values are the objects themselves. The .setdefault ensures that if the A of interest hasn't been seen yet, it is stored with the current object obj; if it has been seen already, it returns the object already recorded for that A. We then compare that object's B with the one at hand and act accordingly. At the end, the desired result lies in the values of d.
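A quick usage example (Node and the sample values are made up for illustration):

from collections import namedtuple

Node = namedtuple("Node", ["A", "B"])
the_list = [
    Node(A=(0, 1, 0), B=5),
    Node(A=(0, 1, 0), B=3),  # same A, lower B: this one should win
    Node(A=(2, 2, 1), B=4),
]

d = {}
for obj in the_list:
    current_lowest = d.setdefault(obj.A, obj)
    if obj.B < current_lowest.B:
        d[obj.A] = obj

desired_list = list(d.values())
# -> [Node(A=(0, 1, 0), B=3), Node(A=(2, 2, 1), B=4)]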
I'm working on a small project involving clusters, and I think the code given here https://www.ics.uci.edu/~eppstein/PADS/UnionFind.py might be a good starting point for my work. However, I have come across a few difficulties applying it:
If I make a set containing all my clusters, cluster = set([0,1,2,3,4,...,99]) (there are 100 points, with the numbers labelling them), and I would like to group the numbers into clusters, do I simply write cluster = UnionFind()? And then what is the data type of cluster?
How can I perform the usual set operations on cluster? For instance, I would like to read all the points (which may have been grouped together) in cluster, but typing print cluster results in <__main__.UnionFind instance at 0x00000000082F6408>. I would also like to keep adding new elements to cluster; how do I do that? Do I need to write specific methods for UnionFind()?
How do I get all the members of a group when one of its members is given? For instance, if 0, 1, 3, 4 are grouped together and I call 3, I want it to print 0, 1, 3, 4. How do I do this?
thanks
Here's a small code sample on how to use the provided UnionFind class.
Initialization
The only way to create a set using the provided class is to FIND a point, because the class creates a singleton set for a point the first time it fails to find it. You might want to add an explicit initialization method instead.
union_find = UnionFind()
clusters = set([0,1,2,3,4])
for i in clusters:
    union_find[i]
Union
# Merge clusters 0 and 1
union_find.union(0, 1)
# Add point 2 to the same set
union_find.union(0, 2)
Find
# Get the set for clusters 0 and 1
print union_find[0]
print union_find[1]
Getting all Clusters
# print all clusters and their sets
for cluster in union_find:
    print cluster, union_find[cluster]
Note:
There is no direct way to get all the points in a given cluster. You can loop over all the points and pick the ones that belong to the required cluster. You might want to modify the given class to support that operation more efficiently.
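For example, a minimal sketch of that loop, inverting the structure into a {root: members} mapping so that the whole group can be listed given any one member:

from collections import defaultdict

# Group every known point under the root of its set
groups = defaultdict(list)
for point in union_find:
    groups[union_find[point]].append(point)

# All members of the group containing point 3
print groups[union_find[3]]   # e.g. [0, 1, 3, 4]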
I'm currently writing a program in Python to track statistics on video games. An example of the dictionary I'm using to track the scores:
ten = 1
sec = 9
fir = 10
thi5 = 6
sec5 = 8
games = {
    'adom': [ten+fir+sec+sec5, "Ancient Domain of Mysteries"],
    'nethack': [fir+fir+fir+sec+thi5, "Nethack"]
}
Right now, I'm going about this the hard way, making a big long list of nested ifs, but I don't think that's the proper way to go about it. I was trying to figure out a way to sort the dictionary via the lists, and then find a way to display the first ten that come up, instead of having to work deep in the if statements.
So... basically, my question is: do you have any ideas I could use to make this easier, instead of wayyyy, way harder?
==== EDIT ====
the ten+fir produces numbers. I want to find a way to sort the lists (I lack the knowledge of proper terminology) by that number (basically, whichever ones have the highest number in the first part of the list go first).
Here's an example of my current way of going about it (though it's incomplete, due to it being very tiresome): Example Nests (paste2) (let's try this one?)
==== SECOND EDIT ====
In case someone doesn't see my comment below:
ten, fir, et cetera: these are just variables for scores. Basically, each turns a position in a top-ten list into a number.
ten = 1, nin = 2, fir = 10, fir5 = 10, sec5 = 8, sec = 9...
so: 'adom': [ten+fir+sec+sec5, "Ancient Domain of Mysteries"] actually registers as 'adom': [1+10+9+8, "Ancient Domain of Mysteries"], which ends up looking like:
'adom': [28, "Ancient Domain of Mysteries"]
So, basically, if I ended up doing the "top two" out of my example, it'd be:
((1)) Nethack (48)
((2)) ADOM (28)
I'd write the actual numbers, but I'm thinking of changing a few things up, so the numbers might be a touch different, and I wouldn't want to rewrite it.
== THIRD (AND HOPEFULLY THE FINAL) EDIT ==
Fixed my original code example.
How about something like this:
scores = games.items()
scores.sort(key=lambda item: item[1][0], reverse=True)
return scores[:10]
This will return the ten highest-scoring entries, sorted in descending order by the first item in each list. (Note that the key function takes a single (key, value) pair, not two separate arguments.)
I'm not sure if this is what you want, though; please update the question (and fix the example link) if you need something else...
import heapq
return heapq.nlargest(10, games.iteritems(), key=lambda item: item[1][0])
is the most direct way to get the top ten key/value pairs, sorted by the first item of each "value" list. If you can define more precisely what output you want (just the names, the name/value pairs, or something else?) and the sorting criterion, this is easy to adjust, of course.
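For instance, to print the result in the "((rank)) Name (score)" format from the question (a sketch, in Python 2 to match the iteritems usage above):

import heapq

top = heapq.nlargest(10, games.iteritems(), key=lambda item: item[1][0])
for rank, (key, (score, name)) in enumerate(top, 1):
    print '((%d)) %s (%d)' % (rank, name, score)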
Wim's solution is good, but I'd say that you should probably go the extra mile and push this work off onto a database, rather than relying on Python. Python interfaces well with most types of databases, where much of what you're exploring is already a solved problem.
For example, instead of worrying about shifting your dictionaries to various other data types in order to properly sort them, you can simply get all the data for each pertinent entry pre-sorted based on the criteria of your query. There goes the need for convoluted sorting and resorting right there.
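A minimal sketch of that idea, using the stdlib sqlite3 module here for brevity rather than MySQL (the schema and file name are made up for illustration):

import sqlite3

conn = sqlite3.connect('games.db')  # hypothetical database file
conn.execute('CREATE TABLE IF NOT EXISTS games (name TEXT, score INTEGER)')

# The database hands back the top ten already sorted; no dict reshuffling
top_ten = conn.execute(
    'SELECT name, score FROM games ORDER BY score DESC LIMIT 10'
).fetchall()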
While dictionaries are tempting to use, because they give the illusion of database-like abilities to access data based on its attributes, I still think they stumble quite a bit in practice. I don't really have any numbers to throw at you, but just from personal experience, anything you do in Python that involves manipulating large amounts of data can be done much faster and more efficiently, both in code and computation, with something like MySQL.
I'm not sure what you have planned as far as the structure of your data goes, but along with adding data, changing its structure is a lot easier using a database, too.