What would be the difference between the following two statements in python?
l = [1,2,3,4]
a = {item:0 for item in l}
b = dict((item,0) for item in l)
a == b
# True
I believe the first is the proper way to initialize a dictionary via comprehension, per PEP 274, yet the second way seems to just create a generator expression and then build a dict from that (and so maybe it does the exact same thing as the first approach behind the scenes?). What actually is the difference between the two, and which one should be preferred?
a = {item:0 for item in l}
Directly constructs a dict, no intermediates.
b = dict((item,0) for item in l)
Generates a tuple for each item in the list and feeds that to the dict() constructor.
Without really digging into the guts of the resulting Python byte code, I doubt there's an easy way of finding out how exactly they differ. Performance-wise, they are likely to be very close as well.
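That said, the dis module makes it easy to peek at the bytecode if you're curious; a quick sketch (Python 3, purely illustrative):

import dis

# Bytecode for the dict comprehension
dis.dis("{item: 0 for item in l}")

# Bytecode for the generator expression fed to dict()
dis.dis("dict((item, 0) for item in l)")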
The main thing here I would consider is readability and maintainability. The first way only relies on the elements you need, without involving an intermediate data type (tuple) and without directly calling a type, but instead relying on the language itself to hook things up correctly. As a bonus, it's shorter and simpler - I don't see any advantage in using the second option, except maybe for the explicit use of dict, telling others what the expected type is. But if they don't get that from the {} in the first instance, I doubt they're much good anyway...
I figured I'd test the speed:
from timeit import timeit
from random import randint
l = [randint(0, 1000) for _ in range(1000)]
def first():
    return {item: 0 for item in l}

def second():
    return dict((item, 0) for item in l)
print(timeit(first, number=10000))
print(timeit(second, number=10000))
Result:
0.46899440000000003
1.0817516999999999
The first is consistently faster as well, so there seems to be no reason ever to use the second option. If anything is surprising here, it's how poorly optimised the second form is and how badly it performs.
Related
Think about a function that I'm calling for its side effects, not return values (like printing to screen, updating GUI, printing to a file, etc.).
def fun_with_side_effects(x):
    ...side effects...
    return y
Now, is it Pythonic to use list comprehensions to call this func:
[fun_with_side_effects(x) for x in y if (...conditions...)]
Note that I don't save the list anywhere
Or should I call this func like this:
for x in y:
    if (...conditions...):
        fun_with_side_effects(x)
Which is better and why?
It is very anti-Pythonic to do so, and any seasoned Pythonista will give you hell over it. The intermediate list is thrown away after it is created, and it could potentially be very, very large, and therefore expensive to create.
You shouldn't use a list comprehension, because, as people have said, that would build a large temporary list that you don't need. The following two methods are equivalent:
consume(side_effects(x) for x in xs)
for x in xs:
    side_effects(x)
with the definition of consume from the itertools recipes in the Python documentation:
import collections
from itertools import islice

def consume(iterator, n=None):
    "Advance the iterator n-steps ahead. If n is None, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)
Of course, the latter is clearer and easier to understand.
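If you do want the consume() route, here's a throwaway sanity-check sketch (Python 3, relying on the imports above):

consume(print(x) for x in [1, 2, 3])  # prints 1, 2, 3; no list is built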
List comprehensions are for creating lists. And unless you are actually creating a list, you should not use list comprehensions.
So I would go for the second option: just iterate over the list and call the function when the conditions apply.
Second is better.
Think of the person who would need to understand your code. You can get bad karma easily with the first :)
You could go middle between the two by using filter(). Consider the example:
y = [1, 2, 3, 4, 5, 6]

def func(x):
    print("call with %r" % x)

for x in filter(lambda x: x > 3, y):
    func(x)
Depends on your goal.
If you are trying to do some operation on each object in a list, the second approach should be adopted.
If you are trying to generate a list from another list, you may use a list comprehension.
Explicit is better than implicit.
Simple is better than complex. (Python Zen)
You can do
for z in (fun_with_side_effects(x) for x in y if (...conditions...)): pass
but it's not very pretty.
Using a list comprehension for its side effects is ugly, non-Pythonic, inefficient, and I wouldn't do it. I would use a for loop instead, because a for loop signals a procedural style in which side-effects are important.
But, if you absolutely insist on using a list comprehension for its side effects, you should avoid the inefficiency by using a generator expression instead. If you absolutely insist on this style, do one of these two:
any(fun_with_side_effects(x) and False for x in y if (...conditions...))
or:
all(fun_with_side_effects(x) or True for x in y if (...conditions...))
These are generator expressions, and they do not build a throwaway list that gets tossed out. I think the all form is perhaps slightly clearer, though I think both of them are confusing and shouldn't be used.
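For concreteness, here is a throwaway sketch of the all() form with a real condition and side effect (a hypothetical example, and again not recommended):

# Print the even numbers from 0-9; print() returns None, so
# "or True" keeps all() consuming the whole generator expression.
all(print(x) or True for x in range(10) if x % 2 == 0)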
I think this is ugly and I wouldn't actually do it in code. But if you insist on implementing your loops in this fashion, that's how I would do it.
I tend to feel that list comprehensions and their ilk should signal an attempt to use something at least faintly resembling a functional style. Putting things with side effects that break that assumption will cause people to have to read your code more carefully, and I think that's a bad thing.
I have a list of objects and they have a method called process. In Python 2 one could do this
map(lambda x: x.process(), my_object_list)
In Python 3 this will not work, because map() is lazy and doesn't call the function until the resulting iterator is traversed. One could do this:
list(map(lambda x: x.process(), my_object_list))
But then you waste memory with a throwaway list (an issue if the list is big). I could also use a 2-line explicit loop. But this pattern is so common for me that I don't want to, or think I should need to, write a loop every time.
Is there a more idiomatic way to do this in Python 3?
Don't use map or a list comprehension where a simple for loop will do:
for x in list_of_objs:
    x.process()
It's not significantly longer than any function you might use to abstract it, but it is significantly clearer.
Of course, if process returns a useful value, then by all means, use a list comprehension.
results = [x.process() for x in list_of_objs]
or map:
results = list(map(lambda x: x.process(), list_of_objs))
There is a function available that makes map a little less clunky, especially if you would reuse the caller:
from operator import methodcaller
processor = methodcaller('process')
results = list(map(processor, list_of_objs))
more_results = list(map(processor, another_list_of_objs))
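And if process() has no useful return value, a hedged variant of the same idea avoids building the result list entirely by draining the lazy map into a zero-length deque (the same trick the itertools consume() recipe uses):

from collections import deque
from operator import methodcaller

# Calls process() on every object at C speed, keeping no results.
deque(map(methodcaller('process'), list_of_objs), maxlen=0)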
If you are looking for a good name for a function to wrap the loop, Haskell has a nice convention: a function name ending with an underscore discards its "return value". (Actually, it discards the result of a monadic action, but I'd rather ignore that distinction for the purposes of this answer.)
def map_(f, *args):
    for f_args in zip(*args):
        f(*f_args)
# Compare:
list(map(f, [1, 2, 3]))  # builds [f(1), f(2), f(3)], then the list is ignored
map_(f, [1, 2, 3])       # the list of return values is never built
Since you're looking for a Pythonic solution, why would you even bother trying to adapt map(lambda x: x.process(), my_object_list) for Python 3?
Isn't a simple for loop enough?
for x in my_object_list:
    x.process()
I mean, this is concise, readable, and avoids creating an unnecessary list if you don't need the return values.
I have one or more unordered sequences of (immutable, hashable) objects with possible duplicates and I want to get a sorted sequence of all those objects without duplicates.
Right now I'm using a set to quickly gather all the elements discarding duplicates, convert it to a list and then sort that:
result = set()
for s in sequences:
    result = result.union(s)
result = list(result)
result.sort()
return result
It works but I wouldn't call it "pretty". Is there a better way?
This should work:
sorted(set(itertools.chain.from_iterable(sequences)))
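For example, with some made-up sample sequences:

import itertools

sequences = [[3, 1, 2], (2, 4), {1, 5}]
print(sorted(set(itertools.chain.from_iterable(sequences))))
# [1, 2, 3, 4, 5]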
I like your code just fine. It is straightforward and easy to understand.
We can shorten it just a little bit by chaining off the list():
result = set()
for s in sequences:
    result = result.union(s)
return sorted(result)
I really have no desire to try to boil it down beyond that, but you could do it with reduce():
result = reduce(lambda s, x: s.union(x), sequences, set())
return sorted(result)
Personally, I think this is harder to understand than the above, but people steeped in functional programming might prefer it.
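One caveat worth flagging: in Python 3, reduce() is no longer a builtin, so any of these reduce() snippets needs an import first:

from functools import reduce  # reduce() moved to functools in Python 3

result = reduce(lambda s, x: s.union(x), sequences, set())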
EDIT: #agf is much better at this reduce() stuff than I am. From the comments below:
return sorted(reduce(set().union, sequences))
I had no idea this would work. If I correctly understand how this works, we are giving reduce() a callable which is really a method bound to one instance of set() (call it x for the sake of discussion, but note that I am not saying that Python will bind the name x to this object). Since no initial value is supplied, reduce() starts by calling x.union() with the first two iterables from sequences; set.union() accepts arbitrary iterables and returns a new set rather than mutating x, so this produces the union of the first two sequences. Then reduce() will repeatedly call x.union() with the accumulated set and the next iterable from sequences, getting a new set back each time. Finally, reduce() returns the last of those sets, and we have the set we want.
This is a bit tricky for my personal taste. I had to think this through to understand it, while the itertools.chain() solution made sense to me right away.
EDIT: #agf made it less tricky:
return sorted(reduce(set.union, sequences, set()))
What this is doing is much simpler to understand! If we use the name x for the running result (initially the empty set returned by set(), and just like above with the understanding that I am not claiming Python will bind the name x); and if we use the name n to refer to each "next" value from sequences; then reduce() will be repeatedly calling set.union(x, n). And of course this is exactly the same thing as x.union(n), with the result becoming the new x. IMHO if you want a reduce() solution, this is the best one.
--
If you want it to be fast, ask yourself: is there any way we can apply itertools to this? There is a pretty good way:
from itertools import chain
return sorted(set(chain(*sequences)))
itertools.chain() called with *sequences serves to "flatten" the list of lists into a single iterable. It's a little bit tricky, but only a little bit, and it's a common idiom.
EDIT: As #Jbernardo wrote in the most popular answer, and as #agf observes in the comments, itertools.chain has an alternate constructor, chain.from_iterable(), and the documentation says it evaluates its iterable of iterables lazily. The * notation, by contrast, forces the whole sequence of sequences to be unpacked into arguments up front, which may consume considerable memory if that sequence is long. In fact, you could have a never-ending generator, and with itertools.chain.from_iterable() you would be able to pull values from it for as long as you want to run your program, while the * notation would just run out of memory.
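To make the laziness concrete, a small sketch (the never-ending generator is purely illustrative):

from itertools import chain, count, islice

def endless_sequences():
    # an infinite supply of small lists
    for n in count():
        yield [n, n + 1]

# chain.from_iterable() pulls lazily, so this terminates:
lazy = chain.from_iterable(endless_sequences())
print(list(islice(lazy, 6)))  # [0, 1, 1, 2, 2, 3]

# chain(*endless_sequences()) would try to unpack the entire
# generator into arguments first, and would never return.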
As #Jbernardo wrote:
sorted(set(itertools.chain.from_iterable(sequences)))
This is the best answer, and I already upvoted it.
Which of the following is better to use and why?
Method 1:
for k, v in os.environ.items():
    print "%s=%s" % (k, v)
Method 2:
print "\n".join(["%s=%s" % (k, v)
for k,v in os.environ.items()])
I tend to lead towards the first as more understandable, but that might just be because I'm new to Python and list comprehensions are still somewhat foreign to me. Is the second way considered more Pythonic? I'm assuming there's no performance difference, but I may be wrong. What would be the advantages and disadvantages of these 2 techniques?
(Code taken from Dive into Python)
If the iteration is being done for its side effect ( as it is in your "print" example ), then a loop is clearer.
If the iteration is executed in order to build a composite value, then list comprehensions are usually more readable.
The particular code examples you have chosen do not demonstrate any advantage of the list comprehension, because it is being (mis-)used for the trivial task of printing. In this simple case I would choose the simple for loop.
In many other cases, you will want to supply an actual list to another function or method, and the list comprehension is the easiest and most readable way to do that.
An example that clearly shows the advantage of the list comprehension replaces the print example with one that actually builds a list, by appending on each iteration of the for loop:
L = []
for x in range(10):
    L.append(x**2)
Gives the same L as:
L = [x**2 for x in range(10)]
I find the first example better - less verbose, clearer and more readable.
In my opinion, go with what best gets your intention across, after all:
Programs should be written for people to read, and only incidentally for machines to execute.
-- from "Structure and Interpretation of Computer Programs" by Abelson and Sussman
By the way, since you're just starting to learn Python, start learning the new String Formatting syntax right away:
for k, v in os.environ.items():
    print "{0}={1}".format(k, v)
List comprehension is more than twice as fast as the explicit loop. Based on Ben James's variation, but replacing the x**2 with a more trivial x+2, the two alternatives are:
def foo(n):
    L = []
    for x in xrange(n):
        L.append(x+2)
    return L

def bar(n):
    return [x+2 for x in xrange(n)]
Timing result:
In [674]: timeit foo(1000)
10000 loops, best of 3: 195 us per loop
In [675]: timeit bar(1000)
10000 loops, best of 3: 81.7 us per loop
List comprehension wins by a large margin.
I agree that readability should take priority over performance optimization. However, readability is in the eye of the beholder. When I first learned Python, list comprehensions were a weird thing I found hard to comprehend! :-O But once I got used to them, they became a really nice shorthand notation. If you are to become proficient in Python, you have to master list comprehensions.
The first one in my opinion, because:
It doesn't build a huge string.
It doesn't build a huge list (can easily be fixed with a generator, by removing the []).
In both cases, you access the items in the same way (using the dictionary iterator).
List comprehensions run their loop largely at C level, so when there is a huge loop, a list comprehension is a good choice.
I agree with #Ben, #Tim, #Steven:
readability is the most important thing (run "import this" to remind yourself of what that is)
a listcomp may or may not be much faster than an iterative-loop version... it depends on the total number of function calls that are made
if you do decide to go with listcomps with large datasets, it's better to use generator expressions instead
Example:
print "\n".join("%s=%s" % (k, v) for k,v in os.environ.iteritems())
in the code snippet above, I made two changes... I replaced the listcomp with a genexp, and I changed the method call to iteritems(). [This trend carries forward in Python 3, where iteritems() goes away and items() takes over its lazy behaviour.]
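For completeness, here is what that genexp looks like under Python 3, where print is a function and items() already returns a lazy view:

import os

print("\n".join("%s=%s" % (k, v) for k, v in os.environ.items()))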