def strange_syntax(stuff):
return ".".join(item for item in stuff)
How (and why) works this code? What happens here? Normally I can't use this syntax. Also, this syntax doesn't exist if it's not inside some function as an argument.
I know, I could do the same with:
def normal_syntax(stuff):
return ".".join(stuff)
This is called a generator expression.
It works just like a list comprehension (but evaluating the iterated objects lazily and not building a new list from them), but you use parentheses instead of brackets to create them. And you can drop the parentheses in a function call that only has one argument.
In your specific case, there is no need for a generator expression (as you noted) - (item for item in stuff) gives the same result as stuff. Those expressions start making sense when doing something with the items like (item.strip() for item in stuff) (map) or (item for item in stuff if item.isdigit()) (filter) etc.
When used in a function call, the syntax:
f(a for a in b)
implicitly is compiled as a generator, meaning
f((a for a in b))
This is just syntactic sugar, to make the program look nicer. It doesn't make much sense to write directly in the console
>>>a for a in b
because it's unclear if you want to create a generator, or perform a regular loop. In this case you must use the outer ().
Related
I'm beginning to appreciate the value of lambda expressions in python, particularly when it comes to functional programming, map, functions returning functions, etc. However, I've also been naming lambdas within functions because:
I need the same functionality several times and don't want to repeat code.
The functionality is specific to the function in which it appears; its not needed elsewhere.
When I encounter a situation that meets the above criteria, I've been writing a named lambda expression in order to DRY and narrowly scope functionality. For example, I am writing a function that operates on some numpy arrays, and I need to do some moderately tedious indexing of all the arrays passed to the function (which can easily fit on a single line). I've written a named lambda expression to do the indexing instead of writing a whole other function or copy/pasting the indexing several times throughout the function definition.
def fcn_operating_on_arrays(array0, array1):
indexer = lambda a0, a1, idx: a0[idx] + a1[idx]
# codecodecode
indexed = indexer(array0, array1, indices)
# codecodecode in which other arrays are created and require `indexer`
return the_answer
Is this an abuse of python's lambdas? Should I just suck it up and define a separate function?
Edits
Probably worth linking function inside function.
This is not Pythonic and PEP8 discourages it:
Always use a def statement instead of an assignment statement that
binds a lambda expression directly to an identifier.
Yes:
def f(x): return 2*x
No:
f = lambda x: 2*x
The first form means that the name of the resulting function object is
specifically 'f' instead of the generic '<lambda>'. This is more
useful for tracebacks and string representations in general. The use
of the assignment statement eliminates the sole benefit a lambda
expression can offer over an explicit def statement (i.e. that it can
be embedded inside a larger expression)
A rule of thumb for this is to think on its definition: lambdas expressions are anonymous functions. If you name it, it isn't anonymous anymore. :)
I've written a named lambda expression to do the indexing instead of writing a whole other function
Well, you are writing a whole other function. You're just writing it with a lambda expression.
Why not use def? You get nicer stack traces and more syntactical flexibility, and you don't lose anything. It's not like def can't occur inside another function:
def fcn_operating_on_arrays(array0, array1):
def indexer(a0, a1, idx):
return a0[idx] + a1[idx]
...
Hi I'm trying to wrap my head around the concept of generator in Python specifically using Spacy.
As far as I understood, generator runs only once. and nlp.pipe(list) returns a generator to use
machine effectively.
And the generator worked as I predicted like below.
matches = ['one', 'two', 'three']
docs = nlp.pipe(matches)
type(docs)
for x in docs:
print(x)
# First iteration, worked
one
two
three
for x in docs:
print(x)
# Nothing is printed this time
But strange thing happened when I tried to make a list using the generator
for things in nlp.pipe(example1):
print(things)
#First iteration prints things
a is something
b is other thing
c is new thing
d is extra
for things in nlp.pipe(example1):
print(things)
#Second iteration prints things again!
a is something
b is other thing
c is new thing
d is extra
Why this generator runs infinitely? I tried several times and it seems like it runs infinitely.
Thank you
I think you're confused because the term "generator" can be used to mean two different things in Python.
The first thing it can mean is a "generator object" which kind of iterator. The docs variable you created in your first example is a reference to one of these. A generator object can only be iterated once, after that it's exhausted and you'll need to create another one if you want to do more iteration.
The other thing "generator" can mean is a "generator function". A generator function is a function that returns a generator object when you call it. Indeed, the term "generator" is sometimes sloppily used for functions that return iterators generally, even when that's not technically correct. A real generator function is implemented using the yield keyword, but from the caller's perspective, it doesn't really matter how the function is implemented, just that it returns some kind of iterator.
I don't know anything about the library you're using, but it seems like nlp.pipe returns an iterator, so in at least the loosest sense (at least) it can be called a generator function. The iterator it returns is (presumably) the generator object.
Generator objects are single-use, like all iterators are supposed to be. Generator functions on the other hand, can be called as many times as you find appropriate (some might have side effects). Each time you call the generator function, you'll get a new generator object. This is why your second code block works, as you're calling nlp.pipe once for each loop, rather than iterating on the same iterator for both loops.
for things in nlp.pipe(example1) creates a new instance of the nlp.pipe() generator (i.e. an iterator).
If you had assigned the generator to a variable and used the variable multiple times, then you would have seen the effect that you were expecting:
pipeGen = nlp.pipe(example1)
for things in pipeGen:
print(things)
#First iteration will things
for things in pipeGen:
print(things)
#Second iteration will print nothing
In other words nlp.pipe() returns a NEW iterator whereas pipeGen IS an iterator.
I have a list of objects and they have a method called process. In Python 2 one could do this
map(lambda x: x.process, my_object_list)
In Python 3 this will not work because map doesn't call the function until the iterable is traversed. One could do this:
list(map(lambda x: x.process(), my_object_list))
But then you waste memory with a throwaway list (an issue if the list is big). I could also use a 2-line explicit loop. But this pattern is so common for me that I don't want to, or think I should need to, write a loop every time.
Is there a more idiomatic way to do this in Python 3?
Don't use map or a list comprehension where simple for loop will do:
for x in list_of_objs:
x.process()
It's not significantly longer than any function you might use to abstract it, but it is significantly clearer.
Of course, if process returns a useful value, then by all means, use a list comprehension.
results = [x.process() for x in list_of_objs]
or map:
results = list(map(lambda x: x.process(), list_of_objs))
There is a function available that makes map a little less clunky, especially if you would reuse the caller:
from operator import methodcaller
processor = methodcaller('process')
results = list(map(processor, list_of_objs))
more_results = list(map(processor, another_list_of_objs))
If you are looking for a good name for a function to wrap the loop, Haskell has a nice convention: a function name ending with an underscore discards its "return value". (Actually, it discards the result of a monadic action, but I'd rather ignore that distinction for the purposes of this answer.)
def map_(f, *args):
for f_args in zip(*args):
f(*f_args)
# Compare:
map(f, [1,2,3]) # -- return value of [f(1), f(2), f(3)] is ignored
map_(f, [1,2,3]) # list of return values is never built
Since you're looking for a Pythonic solution, why would even bother trying to adapt map(lambda x: x.process, my_object_list) for Python 3 ?
Isn't a simple for loop enough ?
for x in my_object_list:
x.process()
I mean, this is concise, readable and avoid creating an unnecessary list if you don't need return values.
If I try to modify the 'board' list in-place in the way below, it doesn't work, it seems like it generate some new 'board' instead of modify in-place.
def func(self, board):
"""
:type board: List[List[str]]
"""
board = [['A' for j in range(len(board[0]))] for i in range(len(board))]
return
I have to do something like this to modify it in-place, what's the reason? Thanks.
for i in range(len(board)):
for j in range(len(board[0])):
board[i][j] = 'A'
You seem to understand the difference between these two cases, and want to know why Python makes you handle them differently?
I have to do something like this to modify it in-place, what's the reason?
Creating a new copy is something that has a value. So it makes sense for it to be an expression. In fact, list comprehensions would be useless if they weren't expressions.
Mutating a list in-place isn't something that has a value. So, there's no reason to make it an expression, and in fact, it would be weird to do so. Sure, you could come up with some kind of value (like, say, the list being mutated). But that would be at odds with everything else in the design of Python: spam.append(eggs) doesn't return spam, it returns nothing. spam = eggs doesn't have a value. And so on.
Secondarily, the comprehension style feeds very well into the iterable paradigm, which is fundamental to Python. For example, notice that you can turn a list comprehension into a generator comprehension (which gives you a lazy iterator over values that are computed on demand) just by changing the […] to (…). What useful equivalent could there be for mutation?
Making the transforming-copy more convenient also encourages people to use a non-mutating style, which often leads to better answers for many problems. When you want to know how to avoid writing three lines of nested statement to mutate some global, the answer is to stop mutating that global and instead pass in a parameter and return the new value.
Also, the syntax was copied from Haskell, where there is no mutation.
But of course all those "often" and "usually" don't mean "never". Sometimes (unless you're designing a language with no mutation), you need to do things in-place. That's why we have list.sort as well as sorted. (And a lot of work has gone into optimizing the hell out of list.sort; it's not just an afterthought.)
Python doesn't stop you from doing it. It just doesn't bend over quite as far to make it easy as it does for copying.
that is not modifying it in place. The list comprehension syntax [x for y in z] is creating a new list. The original list is not modified by this syntax. Making the name inside the function point to a new list won't change what list the name outside the function is pointing.
In other words, when calling a function python passes a reference to the object, not the name, so there is no easy way to change which object the variable name outside the function is refering to.
I know I’ve seen (perhaps exclusively in other languages) where you can use for loops in function arguments. I forget what it was called, but in an attempt to make my code smaller I want to try it. For those of you who don't know what I'm talking about, it goes something like this:
math.sum(for i in range(5)) # Just an example; code will probably not work
Or something like that? I'm not sure how it works yet, but I intend to learn. I know there is a name for this sort of thing, but I've forgotten what it is. Could anyone give me some pointers, or am I insane and this doesn't exist in python?
A "for loop as an expression" is usually called a "comprehension", at least in Haskell, Python, and other languages inspired by them.
Read List Comprehensions in the tutorial for an introduction to the idea. There are also set comprehensions and dict comprehensions, which are pretty obvious once you get list comprehensions.
Then there are generator expressions, which are a bit trickier—but a lot cooler. You're not going to understand those until you first read Iterators, and then Generators, and then Generator Expressions is the very next section.
It still probably won't be clear why generator expressions are cool, but David Beazley explains that masterfully.
To translate your code to real code, all you need is:
math.sum(i for i in range(5))
However, what you're asking for is "all of the elements of range(5), which you can do a lot more easily like this:
math.sum(range(5))
Why? Because a range is already an iterable object.* If it weren't, you couldn't use it in a for loop in the first place, by definition.
Unless you have either some expression to perform on each element, or an if clause to filter the loop, or multiple for clauses for nested looping, comprehensions don't buy you anything. So, here's some more useful examples:
math.sum(i*i for i in range(5))
math.sum(i for i in range(5) if i%3 != 0)
math.sum(j for i in range(5) for j in range(i))
* Technically speaking, you're asking for an iterator over all of the elements in range(5), not just any iterable over them. For a for loop it doesn't matter, but if you need something that you can call next on manually, have it remember its current position, etc., it does. In that case, what you want is iter(range(5)).
The fact that your comprehension happens to be a function argument is almost completely irrelevant here. You can use them anywhere you can use an expression:
squares_to_5 = (i*i for i in range(5)) # often useful
for square in (i*i for i in range(5)): # silly, but legal
However, notice that generator expressions need to be put inside parentheses. In the special case where a generator expression is the only argument to a function, so it's already in parentheses, you can leave the extra parentheses off.
You're thinking of list comprehensions and generator expressions.
This would work in Python with only a slight modification:
sum(i for i in range(5))
This is the seminal work on generators: http://www.dabeaz.com/generators/
Technically speaking they are completely unrelated to the fact that you're using them as function arguments:
x = (i for i in range(5))
evens = [i for i in range(100) if i % 2 == 0]
even_squares = [i**2 for i in evens]