Exploring linked points with a recursive function inside a "for" loop in Python

I have a set of points in space, each linked to some of the others (for example: http://molview.org/?q=Decane).
For each point I need to find three other points:
One to form a bond: a first neighbor
A second to form an angle: a second neighbor
A third to form a dihedral angle: preferably a third neighbor, or a second neighbor if no third neighbor exists
I have a working algorithm:
    def search_and_build(index, neighbor):
        # index is the currently selected point, neighbor is a list containing all the connected points...
        # if the index is already done, return directly
        if is_done(index):
            return
        set_done(index)
        for i, j in enumerate(neighbor):
            # the add functions add data to dictionaries; the search functions look in the bonding dict for second and third neighbors
            add_bond(j, index)
            add_angle(j, search_angle(j))
            add_dihedral(j, search_dihedral(j))
            search_and_build(j, get_sorted_neighbors(j))
This algorithm uses recursion inside a for loop. I used it because I thought recursion was cool and also because it worked straight away. I assumed that Python would finish the for loop first and then run the next function call, but after some debugging I realized that it does not work like that. Sometimes the for loop runs several times before another function call starts, sometimes not.
I googled it and it is apparently bad practice to use such algorithms. Would someone be able to explain why?

Each time your for loop reaches its last line, it calls the function again, which starts a new for loop, and so on.
The issue is that the for loops in all of those function calls have not finished executing. Each one has run a single iteration and put a new call to search_and_build on the stack, and each of those calls does the same while there is still something left in your dict.
By the time you get back to the first for loops, the dict being iterated over no longer exists or has shrunk a lot, even though there was supposed to be something (or more of something) to iterate over when the loop first started.
Basically, recursion is cool, but it makes things pretty hard to get your head around or debug, even more so if you involve other loops inside each step of your recursion.
TL;DR: Mutating an iterable while looping over it is very bad.
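To see the interleaving described above, here is a minimal sketch that records the order in which loop bodies run. The graph and the `visit` function are hypothetical stand-ins for the asker's `search_and_build` and its helper functions, not the original code.

```python
def visit(node, graph, done, order, depth=0):
    """Depth-first walk that records when each loop body runs."""
    if node in done:
        return
    done.add(node)
    for neighbor in graph[node]:
        order.append((depth, node, neighbor))
        # The loop pauses here until the whole recursive call returns.
        visit(neighbor, graph, done, order, depth + 1)


# Hypothetical graph with bonds 0-1, 1-2 and 1-3.
graph = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
order = []
visit(0, graph, set(), order)
for depth, node, neighbor in order:
    print("  " * depth + f"loop of {node} handles {neighbor}")
```

The loop of node 1 handles neighbor 2, is suspended while node 2 is explored, and only then moves on to neighbor 3. That is why an outer loop sometimes appears to run several iterations in a row (when the recursive call returns immediately) and sometimes only one.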

Related

Appropriate to use repeated function calls to loop through something (i.e. a list) in Python?

Lets say I have the following Python script:
    def pop_and_loop():
        my_list.pop(0)
        my_func()

    def my_func():
        #do something with list item [0]
        if my_list[0] finished_with:
            pop_and_loop()
        #continued actions if not finished with
        if my_list[0] finished_with:
            pop_and_loop()

    my_list = [#list containing 100 items]
    my_func()
Is this an appropriate setup? Am I not leaving each function call open, in a way, because it has to hold a marker at the position where I left the function to go to another one? In theory it is waiting for me to come back, but I'm never coming back to that one. Does this create problems, and is there a different way you're meant to do this?
EDIT: My actual script is more complicated than this, with loads of different functions that I need to call while processing each item in the main list. Essentially my question is whether I need to convert this setup into an actual loop, bearing in mind that I will need to refill the main list and then loop through it again. So how would I keep looping? Should I instead have:
    my_list = []

    def my_func(item):
        #do something with list item
        if item finished_with:
            return output
        elif item_finished_now:
            return output

    while not len(my_list):
        while #there are items to fill the list with:
            #fill list
        for x in my_list:
            output = my_func(x)
            #deal with output and list popping here
        #sleep loop waiting for there to be things to put into the list again
        time.sleep(60)

Yours is simply an example of recursion.
Both the question and answer are borderline opinion-based, but in most cases you would prefer an iterative solution (loops) over recursion unless the recursive solution has a clear benefit of either being simpler or being easier to comprehend in code and in reasoning.
For various reasons, Python does not perform recursion optimizations such as tail-call elimination, and it creates a new stack frame for each new level (that is, each function call). That, among other things, is why an iterative solution is generally faster and why the overhead of extra recursive calls in Python is rather large: the stack takes more memory and more time is spent creating those frames. On top of that, there is a limit to the recursion depth, and most recursive algorithms can be converted to an iterative solution fairly easily.
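As a rough illustration of the depth limit, the sketch below recurses slightly past whatever `sys.getrecursionlimit()` reports and then does the same amount of work with a loop. The two `count_down_*` functions are made up for this example.

```python
import sys


def count_down_recursive(n):
    # One new stack frame per level; nothing is optimised away.
    if n == 0:
        return 0
    return count_down_recursive(n - 1)


def count_down_iterative(n):
    # Same result with a single frame.
    while n > 0:
        n -= 1
    return n


limit = sys.getrecursionlimit()
try:
    count_down_recursive(limit + 100)
    hit_limit = False
except RecursionError:
    hit_limit = True

print("recursion limit:", limit, "exceeded:", hit_limit)
print("iterative result:", count_down_iterative(limit + 100))
```

The recursive version fails once the depth passes the limit, while the loop is only bounded by how long you are willing to wait.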
Your specific example is simple enough to convert like so:
    while my_list:
        while my_list[0] != "finished":
            # do stuff
        my_list.pop(0)
On a side note, please don't use pop(0) on a list. Use a collections.deque and its popleft() instead, as that is O(1) rather than O(N).
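A minimal sketch of the deque variant follows. The `process_all` function and the upper-casing step are placeholders for whatever "do stuff" means in the real script.

```python
from collections import deque


def process_all(items):
    """Drain a work queue from the left without shifting elements."""
    queue = deque(items)
    handled = []
    while queue:
        item = queue.popleft()        # O(1); list.pop(0) is O(N)
        handled.append(item.upper())  # stand-in for "do stuff"
    return handled


print(process_all(["a", "b", "c"]))
```

Because the loop owns the queue, refilling it is just another `queue.extend(...)` before the next pass, with no nested function calls left open.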

Changing iteration order during nested iteration

Given two associative arrays, one contains pointers to 3D point coordinates and the other contains a hash/dictionary of surfaces. For every point in the first array, exactly one matching surface will be found in the second array (the surface on which the point lies).
We need to iterate through the points to find the matching surface, and then get the unit vector [ijk components] normal to the surface at that point.
Brute force could iterate through every item in each list, breaking out of the inner iteration once the surface for each point is found. However, I have already found while writing earlier versions of this program that an astronomical number of calculations will be performed, so I need to be efficient.
There will always be more points than surfaces, and the surfaces will be adjacent, meaning that as I iterate through the points in a certain order, it is more likely than not that the next point will be on the same surface as the last.
I'm wondering if I can run a loop which, for example,
    for point n:
        for surface i:
            does the point lie on the surface? if so, break
...and if the last 'i' value was 5, begin the next iteration at i=5 (and if the point wasn't on surface 5, continue iterating through each surface). It would be better if I could have it iterate in an order like: not 5? try 6; not 6? try 4; and so on.
Expanding on that idea, imagine that 'i' were organized in a 2D array, i.e.:
[1,2,3]
[4,5,6]
[7,8,9]
And for n points:
For i surfaces (continuing where I left off): not 4? try 2; not 2? try 8.
I'm wondering whether a 'for' loop will give me the versatility I need. (By the way, the program will likely be written in either Python or .NET.) I'm thinking that I can make a while loop and write some sort of logic that will iterate the way I want. Am I trying to reinvent the wheel? Am I on the right track?

This is only a partial answer, because your question doesn't have a lot of specifics on what your actual code is. But, that said, one thing to note is that the variable from a for loop retains its value even after the loop has ended. So this will print 5:
    for i in range(1000):
        if i == 5:
            break
    print(i)
So you can easily check after the inner loop what value it ended on. And then you can do whatever you want with that value. For instance, you could look at it on the next run through the outer loop and fix things up so that the inner loop starts at some other place.
A for loop will almost surely give you the versatility you need, because you can use a for loop to iterate over many kinds of things, including some custom iterator that you create. So for instance you could do something like this:
    def best_order_to_loop(all_surfaces, previous_stopping_point):
        # some logic here
        yield next_value

    previous_stopping_point = None
    for point in points:
        surfaces_in_order = best_order_to_loop(all_surfaces, previous_stopping_point)
        for surface in surfaces_in_order:
            # do whatever
            previous_stopping_point = surface
Again, this is just a sketch and I'm not sure I'm 100% understanding your setup. But it seems like you're saying "if the previous loop stopped at X, then based on that I want the next loop to loop in such-and-such order". If that is the case you can write a function like best_order_to_loop that determines how the next inner loop will go, based on the previous stopping point.
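One way the "some logic here" part might look, as a sketch only: the generator below yields surface indices starting at the previous hit and fanning outward (5, 6, 4, 7, 3, ...), which is the order the question asks for in the 1D case. The signature differs from the sketch above (it takes a count and an index), and `match_points` and the `lies_on` callback are assumptions made for this example.

```python
def best_order_to_loop(count, start):
    """Yield indices 0..count-1, nearest to `start` first."""
    if start is None:
        yield from range(count)
        return
    yield start
    for offset in range(1, count):
        for candidate in (start + offset, start - offset):
            if 0 <= candidate < count:
                yield candidate


def match_points(points, surfaces, lies_on):
    """Map each point to the index of the surface it lies on."""
    matches = {}
    previous = None
    for point in points:
        for i in best_order_to_loop(len(surfaces), previous):
            if lies_on(point, surfaces[i]):
                matches[point] = i
                previous = i  # the next point starts its search here
                break
    return matches


# Toy data: "surfaces" are intervals on a line, "points" are numbers.
surfaces = [(0, 1), (1, 2), (2, 3)]
points = [0.5, 0.7, 1.2, 2.9]
print(match_points(points, surfaces, lambda p, s: s[0] <= p < s[1]))
print(list(best_order_to_loop(10, 5)))
```

For the 2D grid layout, the same idea applies with a generator that yields grid neighbors of the previous hit first. Only the ordering function changes, not the loops.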

Using a method to create multiple, new instances of a generator

I've been learning about generators in Python recently and have a question. I've used iterators before when learning Java, so I know how they basically work.
So I understand what's going on here in this question: Python for loop and iterator behavior
Essentially, once the for loop has traversed the iterator, the iterator stays at its end, so a second for loop would resume from there and nothing would be printed. That is clear to me.
I'm also aware of the tee method from itertools, which lets me "duplicate" a generator. I found this to be helpful when I want to check if a generator is empty before doing anything to it (as I can check whether the duplicate in list form is empty).
In some code I'm writing, I need to create many identical generators at different points in the code, so my line of thought was: why don't I write a method that makes a generator? Then every time I need a new one, I can call that method. Maybe my misunderstanding has to do with this "generator creation" process, but that seemed right to me.
Here is the code I'm using. When I first call the method and duplicate it using tee, everything works fine, but then once I call it again after looping through it, the method returns an empty generator. Does this "using a method" workaround not work?
    from itertools import tee

    node_list = []
    generate_hash_2, temp = tee(generate_nodes(...))
    for node in list(temp):
        node_list.append(...)
    print("Generate_hash_2:{}".format(node_list))
    for node in generate_hash_2:
        if node.hash_value == x:
            print(x)
    node_list2 = []
    generate_hash_3, temp2 = tee(generate_nodes(...))  # exact same parameters as before
    for node in list(temp2):
        node_list2.append(...)
    print("Generate_hash_3:{}".format(node_list2))

    def generate_nodes(nodes, type):
        for node in nodes:
            if isinstance(node.type, type):
                yield node
Please ignore the poor variable name choices. The first list (from temp) prints out fine, but the second (from temp2) prints out as an empty list, despite the methods taking identical parameters :( Note that the inside of the for loop doesn't modify any of the items or anything. Any help or explanation would be great!
For a sample XML file, I have this:
<top></top>
For a sample output, I'm getting:
Generate_hash_2:["XPath:/*[name()='top'][1], Node Type:UnknownNode, Tag:top, Text:"]
Generate_hash_3:[]
If you are interested in helping me understand this further, I've been writing these methods to get an understanding of the files in here: https://github.com/mmoosstt/XmlXdiff/tree/master/lib/diffx , specifically the differ.py file
The code in that file constantly calls the _gen_dx_nodes() method (with the same parameters), which is a method that creates a generator. But the code's generator never "ends" and forces the writer to do something to reset it. So I'm confused why this happens to me (because I've been running into my problem when calling that method from different methods in succession). I've also been using the same test cases so I'm pretty lost here on how to fix this. Any help would be great!
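The call arguments are elided in the question, so the cause cannot be pinned down from the post alone. One possibility worth checking, offered here purely as an assumption, is that the `nodes` argument is itself a one-shot iterator rather than a list. The sketch below uses a simplified `generate_nodes` (it tests `isinstance(node, wanted)` on plain values) to show the difference.

```python
def generate_nodes(nodes, wanted):
    # Simplified stand-in for the generator in the question.
    for node in nodes:
        if isinstance(node, wanted):
            yield node


data = [1, "a", 2]

# Re-iterable input: every call yields a fresh, full generator.
first = list(generate_nodes(data, int))
second = list(generate_nodes(data, int))

# One-shot input: the first call drains it, the second sees nothing.
one_shot = iter(data)
third = list(generate_nodes(one_shot, int))
fourth = list(generate_nodes(one_shot, int))

print(first, second, third, fourth)
```

Calling a generator function always creates a new generator object, so the "method that makes a generator" idea is sound. If the second call still comes back empty, the input it iterates over is the first thing to inspect.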

Is it possible to (quickly) find only the first cycle in a networkx graph?

I have a directed network that may or may not have cycles in it. I need to find them and remove the cyclicity. If I have a networkx DiGraph (G), I can find all the cycles with
cycle_nodes = nx.simple_cycles(G)
which creates a cycle-returning generator.
However, I don't want to return all cycles a la list(cycle_nodes) because many of the cycles are subsets of each other, and fixing one will fix others. Instead, I would like to only find the first instance of a cycle. As cycle_nodes is a generator, I tried
next(cycle_nodes)
to return only the first instance. However, I found that the time required to return the first instance was not much smaller than the time required to return all instances:
list(cycle_nodes) : 58s
next(cycle_nodes) : 44s
Is this just due to the nature of my graph (i.e. the first cycle is far along the search order), or is there a more efficient way to return any cycle (doesn't necessarily need to be the first)?
The reason I suspect there may be a faster way is because when I run nx.is_directed_acyclic_graph(G), it takes only a second or two and returns False, so it's obviously finding at least one cycle in just a second or so.

The answer was sort of obvious. nx.find_cycle() with no starting node provided returns the first cycle it finds, and does so quickly. I was under the impression that a starting node had to be provided. RTFM!
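To illustrate why stopping at the first cycle can be so much cheaper than enumerating all simple cycles, here is a hand-rolled, standard-library sketch of a single depth-first search that returns as soon as it meets a node already on the current path. It is not networkx's implementation, and `find_first_cycle` and the adjacency-dict format are assumptions for this example.

```python
def find_first_cycle(graph):
    """Return one directed cycle as a list of nodes, or None.

    `graph` maps every node to a list of its successors.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    colour = {node: WHITE for node in graph}
    for root in graph:
        if colour[root] != WHITE:
            continue
        path = [root]
        stack = [iter(graph[root])]
        colour[root] = GREY
        while stack:
            for nxt in stack[-1]:
                if colour[nxt] == GREY:
                    # Back edge: the cycle is the tail of the current path.
                    return path[path.index(nxt):]
                if colour[nxt] == WHITE:
                    colour[nxt] = GREY
                    path.append(nxt)
                    stack.append(iter(graph[nxt]))
                    break
            else:
                # All successors explored: this node is finished.
                colour[path.pop()] = BLACK
                stack.pop()
    return None


print(find_first_cycle({"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": []}))
print(find_first_cycle({"a": ["b"], "b": []}))
```

Each node and edge is touched at most once, which is in line with the observation in the question that the acyclicity check returns within a second or two.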

Python does not return value but print value

I am new to Python, and relatively new to recursion.
Below is my code,
    def day_add(day,delta):
        if day_num(day) + delta >= 7:
            newNum = delta - 7
            day_add(day,newNum)
            return day_add(day,newNum)
        else:
            day = day_name(delta+day_num(day))
            return day
If I have the line
    return day_add(day,newNum)
the function behaves correctly and returns the correct value.
However, if I do not have this line, but have
    print(day)
the function may return None if it recurses, but it prints the correct value.
So why do I need to return the function call if I have recursion?

"Recursion is a method where the solution to a problem depends on solutions to smaller instances of the same problem..." - Wikipedia
So, trying to solve the "big" problem, you will use the same method but in a "smaller" problem. In other words, you will need the answer from the "smaller" problem in order to solve the "big" one.
Therefore, you must return that result because if not, you will just print it, and it couldn't be used to solve the "bigger" problem.

There are two paths through your function. One that returns day, and one that re-calls your function with a new set of arguments.
The else clause is the "uninteresting" one. It just returns a fixed value, so no mystery there.
The if clause is more "interesting". It has the effect of breaking up a computation into a series of linked steps. A little piece of the computation is done at each step and handed on down the line until the computation is complete. The else branch decides when the computation is complete by returning the final value. This value gets handed back up the line until the first recursive call finally returns it from your function at the top level.
So if it wasn't for the recursive returns, the final value couldn't get passed back up the line. With no explicit return statements, None would be returned by default.
Putting print(day) in the else clause allows you to "peek" at the final value before the recursive calls start to return. The computation steps are all complete at that stage - the only thing left to do is to retrace those steps and exit from the top-level function.
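A small sketch of the difference follows. The `DAYS` list replaces the asker's `day_num` and `day_name` helpers, which are not shown in the question, so treat it as an assumed stand-in.

```python
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def day_add_no_return(day, delta):
    if DAYS.index(day) + delta >= 7:
        day_add_no_return(day, delta - 7)  # value computed, then dropped
    else:
        return DAYS[DAYS.index(day) + delta]


def day_add(day, delta):
    if DAYS.index(day) + delta >= 7:
        return day_add(day, delta - 7)     # value handed back up
    return DAYS[DAYS.index(day) + delta]


print(day_add_no_return("Sat", 10))  # None: the inner result never travels up
print(day_add("Sat", 10))            # Tue
print(day_add_no_return("Mon", 2))   # Wed: no recursion, so the return is reached
```

The version without `return` only works when the first call happens to take the non-recursive branch, which matches the "may return None if it goes to recursion" behaviour described in the question.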

If you don't say to return something, it won't, recursive or not.

print displays to the screen. In other words, it gives the value to you (the user). But in this case, your code needs the answer to the smaller problem, and it can't read what is on the screen. It needs the value from the function it calls, even if in this case it calls itself. By returning the value, it passes the information back up the chain.
