Using a method to create multiple new instances of a generator - Python

I've been learning about generators in Python recently and have a question. I've used iterators before when learning Java, so I know how they basically work.
So I understand what's going on here in this question: Python for loop and iterator behavior
Essentially, once the for loop traverses the iterator, the iterator is left at its end, so a second for loop would pick up there (and print nothing). That much is clear to me.
I'm also aware of the tee function from itertools, which lets me "duplicate" a generator. I found this helpful when I want to check whether a generator is empty before doing anything with it (since I can check whether the duplicate, converted to a list, is empty).
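For example, here's the kind of emptiness check I mean, with a toy generator (a made-up example, not my real code):

from itertools import tee

gen = (x * x for x in range(3))
gen, probe = tee(gen)
if list(probe):              # materialize the duplicate to inspect it
    print("not empty; the original still works:", list(gen))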
In some code I'm writing, I need to create many of the same generators at different points throughout the code, so my line of thought was: why don't I write a method that makes a generator? Then every time I need a new one, I can call that method. Maybe my misunderstanding has to do with this "generator creation" process, but that seemed right to me.
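To check that the pattern itself is sound, I tried a toy version (made-up names, not my real code):

def gen_evens(numbers):
    for n in numbers:
        if n % 2 == 0:
            yield n

data = [1, 2, 3, 4]
print(list(gen_evens(data)))    # [2, 4]
print(list(gen_evens(data)))    # [2, 4] -- each call gives a fresh generator

stream = iter(data)
print(list(gen_evens(stream)))  # [2, 4]
print(list(gen_evens(stream)))  # [] -- the *input* iterator was used up

So the method approach seems fine when the input is a list; maybe my problem is that whatever I'm passing in is itself an iterator?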
Here is the code I'm using. When I first call the method and duplicate the result using tee, everything works fine, but once I call the method again after looping through the first result, it returns an empty generator. Does this "use a method" workaround not work?
node_list = []
generate_hash_2, temp = tee(generate_nodes(...))
for node in list(temp):
    node_list.append(...)
print("Generate_hash_2:{}".format(node_list))

for node in generate_hash_2:
    if node.hash_value == x:
        print(x)

node_list2 = []
generate_hash_3, temp2 = tee(generate_nodes(...))  # exact same parameters as before
for node in list(temp2):
    node_list2.append(...)
print("Generate_hash_3:{}".format(node_list2))

def generate_nodes(nodes, type):
    for node in nodes:
        if isinstance(node.type, type):
            yield node
Please ignore the poor variable name choices. The first list (Generate_hash_2) prints out fine, but the second (Generate_hash_3) prints out empty, despite the two calls taking identical parameters :( Note that the body of the first for loop doesn't modify any of the items. Any help or explanation would be great!
For a sample XML file, I have this:
<top></top>
For a sample output, I'm getting:
Generate_hash_2:["XPath:/*[name()='top'][1], Node Type:UnknownNode, Tag:top, Text:"]
Generate_hash_3:[]
If you are interested in helping me understand this further, I've been writing these methods to get an understanding of the files in here: https://github.com/mmoosstt/XmlXdiff/tree/master/lib/diffx , specifically the differ.py file
The code in that file constantly calls the _gen_dx_nodes() method (with the same parameters), which is a method that creates a generator, yet that code's generator never seems to "end" or force the author to do anything to reset it. So I'm confused about why this happens to me (I've been running into my problem when calling my method from different methods in succession). I've also been using the same test cases, so I'm pretty lost on how to fix this. Any help would be great!

Related

A generator runs only once, but why can this generator run multiple times?

Hi, I'm trying to wrap my head around the concept of generators in Python, specifically using spaCy.
As far as I understand, a generator runs only once, and nlp.pipe(list) returns a generator to use the machine efficiently.
The generator worked as I predicted, like below:
matches = ['one', 'two', 'three']
docs = nlp.pipe(matches)
type(docs)

for x in docs:
    print(x)
# First iteration, worked:
one
two
three

for x in docs:
    print(x)
# Nothing is printed this time
But a strange thing happened when I used nlp.pipe(...) directly in the for loop:
for things in nlp.pipe(example1):
    print(things)
# First iteration prints things:
a is something
b is other thing
c is new thing
d is extra

for things in nlp.pipe(example1):
    print(things)
# Second iteration prints things again!
a is something
b is other thing
c is new thing
d is extra
Why does this generator run multiple times? I tried several times, and it seems like it can run indefinitely.
Thank you!
I think you're confused because the term "generator" can be used to mean two different things in Python.
The first thing it can mean is a "generator object", which is a kind of iterator. The docs variable you created in your first example is a reference to one of these. A generator object can only be iterated once; after that it's exhausted, and you'll need to create another one if you want to do more iteration.
The other thing "generator" can mean is a "generator function". A generator function is a function that returns a generator object when you call it. Indeed, the term "generator" is sometimes sloppily used for functions that return iterators generally, even when that's not technically correct. A real generator function is implemented using the yield keyword, but from the caller's perspective, it doesn't really matter how the function is implemented, just that it returns some kind of iterator.
I don't know anything about the library you're using, but it seems like nlp.pipe returns an iterator, so in the loosest sense, at least, it can be called a generator function. The iterator it returns is (presumably) a generator object.
Generator objects are single-use, like all iterators are supposed to be. Generator functions on the other hand, can be called as many times as you find appropriate (some might have side effects). Each time you call the generator function, you'll get a new generator object. This is why your second code block works, as you're calling nlp.pipe once for each loop, rather than iterating on the same iterator for both loops.
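Here's a minimal sketch of the distinction, with made-up names:

def count_up(n):              # a generator function
    for i in range(n):
        yield i

gen = count_up(3)             # calling it returns a generator object
print(list(gen))              # [0, 1, 2]
print(list(gen))              # [] -- this object is exhausted
print(list(count_up(3)))      # [0, 1, 2] -- a new call, a new object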
for things in nlp.pipe(example1) creates a new instance of the nlp.pipe() generator (i.e. an iterator).
If you had assigned the generator to a variable and used the variable multiple times, then you would have seen the effect that you were expecting:
pipeGen = nlp.pipe(example1)

for things in pipeGen:
    print(things)
# First iteration will print things

for things in pipeGen:
    print(things)
# Second iteration will print nothing
In other words, nlp.pipe() returns a NEW iterator each time it is called, whereas pipeGen IS a particular iterator.

Exploring linked points with a recursive "for" function in Python

I have a set of points in space, each of which is linked to some others: http://molview.org/?q=Decane
For each point I need to find three other points:
One to form a bond: first neighbors
Second to form an angle: second neighbors
Third to form a dihedral angle: third neighbors are best, but second if none exist
I have a working algorithm:
def search_and_build(index, neighbor):
    # index is the currently selected point; neighbor is a list containing
    # all the connected points
    # if the index is already done, return directly
    if is_done(index):
        return
    set_done(index)
    for i, j in enumerate(neighbor):
        # the add_* functions store data in dictionaries; the search_*
        # functions look in the bonding dict for second and third neighbors
        add_bond(j, index)
        add_angle(j, search_angle(j))
        add_dihedral(j, search_dihedral(j))
        search_and_build(j, get_sorted_neighbors(j))
This algorithm uses recursion inside a for loop. I used it because I thought recursion was cool and also because it worked right away. I assumed that Python would finish the for loop first and then run the next call, but after some debugging I realized it doesn't work like that: sometimes the loop runs multiple times before another call, sometimes not.
I googled around, and apparently it's bad practice to use such algorithms; would someone be able to explain why?
Each time your for loop gets to the last line, it calls the function again, which starts the for loop again, and so on.
The issue is that the for loop in each of those function calls has not finished executing: it has executed once and put a new search_and_build call on the stack, and each search_and_build execution does the same while there's still something left in your dict.
By the time you get back to the first for loop, the dict being iterated over no longer exists or has shrunk a lot, even though there was supposed to be something (or more of something) to iterate over when the loop first started.
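Here's a tiny standalone sketch (made-up names, not your code) showing how the recursive call interleaves with the loop:

def rec(depth):
    for i in range(2):
        print("depth", depth, "step", i, "-> recursing")
        if depth < 1:
            rec(depth + 1)
        print("depth", depth, "step", i, "<- back")

rec(0)

The inner call's loop runs to completion before the outer loop ever reaches its second iteration, so any shared state the outer loop depends on may already have changed by the time it resumes.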
Basically, recursion is cool, but it makes things pretty hard to reason about or debug, even more so if you involve other loops inside each step of the recursion.
TL;DR: Mutating an iterable while looping over it is very bad.
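For instance, mutating a dict while iterating over it fails immediately in Python 3:

d = {"a": 1, "b": 2}
for key in d:
    del d[key]   # RuntimeError: dictionary changed size during iteration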

Calling non-pure function in list comprehension

I have the following code (simplified):
def send_issue(issue):
    message = bot.send_issue(issue)
    return message

def send_issues(issues):
    return [send_issue(issue) for issue in issues]
As you can see, send_issues and send_issue are non-pure functions. Is it considered good practice (and Pythonic) to call non-pure functions in list comprehensions? The reason I want to do this is convenience. The reason against it is that when you see a list comprehension, you expect the code to just generate a list and nothing more, but that's not the case here.
UPD: I actually do want to create and return the list, contrary to this question.
The question here is: do you really need to create the list?
If so, that's okay, but it's not the best design.
It is good practice for a function to do only one thing, especially if it has a side effect like I/O.
In your case, the function is both creating and sending a message.
To fix this, you can have one function that generates the message and another that sends it.
It is better to write it as:
msgs = [bot.get_message(issue) for issue in issues]
for msg in msgs:
    bot.send(msg)
This is clearer and widens the use of the API while keeping the side effect isolated.
If you don't want to create another function, you can at least use map, since it says "map this function to every element":

map(bot.send_issue, issues)  # note: in Python 3 this returns a lazy iterator, not a list
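One caveat worth spelling out: because map is lazy in Python 3, nothing is actually sent until you iterate over the result, so force it if you rely on the side effects:

results = map(bot.send_issue, issues)  # no messages sent yet
messages = list(results)               # the calls happen here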
Also, the function send_issue is not needed, because it just wraps bot.send_issue.
Adding such functions only makes the code noisy, which is not good practice.

How to write Python generator function that never yields anything

I want to write a Python generator function that never actually yields anything. Basically it's a "do-nothing" drop-in that can be used by other code which expects to call a generator (but doesn't always need results from it). So far I have this:
def empty_generator():
    # ... do some stuff, but don't yield anything
    if False:
        yield
Now, this works OK, but I'm wondering if there's a more expressive way to say the same thing, that is, declare a function to be a generator even if it never yields any value. The trick I've employed above is to show Python a yield statement inside my function, even though it is unreachable.
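One way I've verified that the trick works (standard library only):

import inspect

def empty_generator():
    if False:
        yield

print(inspect.isgeneratorfunction(empty_generator))  # True
print(list(empty_generator()))                       # []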
Another way is
def empty_generator():
    return
    yield
Not really "more expressive", but shorter. :)
Note that iter([]) or simply [] will do as well.
An even shorter solution:
def empty_generator():
    yield from []
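If you want to convince yourself that all three spellings behave identically, here's a quick check (illustrative only):

import types

def a():
    if False:
        yield

def b():
    return
    yield

def c():
    yield from []

for f in (a, b, c):
    gen = f()
    print(isinstance(gen, types.GeneratorType), list(gen))  # True [] each time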
For maximum readability and maintainability, I would prioritize a construct which goes at the top of the function. So either
your original if False: yield construct, but hoisted to the very first line, or
a separate decorator which adds generator behavior to a non-generator callable.
(That's assuming you didn't just need a callable which did something and then returned an empty iterable/iterator. If so then you could just use a regular function and return ()/return iter(()) at the end.)
Imagine the reader of your code sees:
def name_fitting_what_the_function_does():
    # We need this function to be an empty generator:
    if False: yield

    # ... that crucial stuff that this function exists to do
Having this at the top immediately cues in every reader of this function to this detail, which affects the whole function - affects the expectations and interpretations of this function's behavior and usage.
How long is your function body? More than a couple lines? Then as a reader, I will feel righteous fury and condemnation towards the author if I don't get a cue that this function is a generator until the very end, because I will probably have spent significant mental cost weaving a model in my head based on the assumption that this is a regular function - the first yield in a generator should ideally be immediately visible, when you don't even know to look for it.
Also, in a function longer than a few lines, a construct at the very beginning of the function is more trustworthy - I can trust that anyone who has looked at a function has probably seen its first line every time they looked at it. That means a higher chance that if that line was mistaken or broken, someone would have spotted it. That means I can be less vigilant for the possibility that this whole thing is actually broken but being used in a way that makes the breakage non-obvious.
If you're working with people who are sufficiently fluently familiar with the workings of Python, you could even leave off that comment, because to someone who immediately remembers that yield is what makes Python turn a function into a generator, it is obvious that this is the effect, and probably the intent since there is no other reason for correct code to have a non-executed yield.
Alternatively, you could go the decorator route:

@generator_that_yields_nothing
def name_fitting_what_the_function_does():
    ...  # that crucial stuff for which this exists

import functools

def generator_that_yields_nothing(wrapped):
    @functools.wraps(wrapped)
    def wrapper_generator():
        if False:
            yield
        wrapped()
    return wrapper_generator
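Assuming the decorator above, usage looks like this; note that the wrapped body only runs once the generator is actually iterated, not when the function is called (the function name below is a made-up example):

@generator_that_yields_nothing
def do_the_crucial_stuff():
    print("doing the crucial stuff")

gen = do_the_crucial_stuff()  # nothing has run yet
list(gen)                     # prints "doing the crucial stuff", yields nothing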

Function inside a function in Python

In a function, I need to perform some logic that requires me to call a function defined inside that function. Here is what I did:
def dfs(problem):
    stack.push(bache)
    search(root)
    while stack.isEmpty() != 0:

    def search(vertex):
        closed.add(vertex)
        for index in sars:
            stack.push(index)
            return stack
In the function dfs, I am calling search(root); is this the correct way to do it?
I am getting an error: local variable 'search' referenced before assignment
There are many mysterious bug-looking aspects in your code. The wrong order of definition (assuming you do need the search function to be a nested one) and the syntax error from the empty while loop have already been observed, but there are more...:
def dfs(problem):
    stack.push(bache)
    search(root)
what's bache, what's stack, what's root? If they're all global variables, then you're overusing globals -- and apparently nowhere ever using the argument problem (?!).
    while stack.isEmpty() != 0:
what's this weird-looking method isEmpty? IOW, what type is stack (clearly not a Python list, and that's weird enough, since they do make excellent LIFO stacks;-)...? And what's ever going to make it empty...?
    def search(vertex):
        closed.add(vertex)
...don't tell me: closed is yet another global? Presumably a set? (I remember from a few of your Qs back that you absolutely wanted to have a closed dict, not a set, even though I suggested that as a possibility...)
        for index in sars:
...and what's sars?!
            stack.push(index)
            return stack
What a weird "loop" -- one that executes exactly once, altering a global variable, and then immediately returns that global variable (?) without doing any of the other steps through the loop. Even if this is exactly what you mean (push the first item of sars, period), I don't recommend hiding it in a pseudo-loop -- it seriously looks like a mystery bug just waiting to happen ;-).
You need to de-indent your search function. The way you have it set up right now, you are defining search inside the body of dfs, after the point where you call it. Also, encapsulation in a class would help, as sketched below.
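A rough sketch of that class-based encapsulation (the graph layout and names here are placeholders, not your actual code):

class DepthFirstSearch:
    def __init__(self, graph):
        self.graph = graph     # e.g. {vertex: [neighbors, ...]}
        self.closed = set()

    def search(self, vertex):
        if vertex in self.closed:
            return
        self.closed.add(vertex)
        for neighbor in self.graph.get(vertex, []):
            self.search(neighbor)

dfs = DepthFirstSearch({1: [2, 3], 2: [4], 3: [], 4: []})
dfs.search(1)
print(dfs.closed)   # {1, 2, 3, 4}

This keeps the stack/closed state out of the global namespace entirely.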
That's the wrong order. Try this:

def dfs(problem):
    def search(vertex):
        closed.add(vertex)
        for index in sars:
            stack.push(index)
            return stack

    stack.push(bache)
    search(root)
    while stack.isEmpty() != 0:
Either define search before you call it, or define it outside of dfs.
You have to define the function before using it.
Also, root doesn't seem to be available in your scope; make sure it's reachable.
You don't have a body for your while loop; that is probably causing problems parsing the code. I would also suggest putting the local function definition before it is used, so the code is easier to follow.
