Map vs list comprehension vs loop in Python 3.7

When reading articles about the speed of loop vs list comprehension vs map, I usually find that list comprehension is faster than map when using lambda functions.
Here is a test I am running:
import timeit
def square(numbers):
    squares = []
    for number in numbers:
        squares.append(number * number)
    return squares
print(timeit.timeit('map(lambda a: a*a, range(100))', number = 100000))
print(timeit.timeit('[a*a for a in range(100)]', number = 100000))
print(timeit.timeit('square(range(100))', 'from __main__ import square', number = 100000))
and the results:
0.03845796199857432
0.5889980600004492
0.9229458660011005
so map is the clear winner, even though it uses a lambda function. Has there been a change in Python 3.7 that causes this notable speed boost?

First of all, to have a fair comparison you have to convert the result of the map call to a list: in Python 3.x, map returns an iterator object, not a list, so the first timing above measures only the cost of creating that iterator, not of computing a single square. Second of all, in the CPython implementation, built-in functions are wrappers around C functions, which makes them faster than Python code with the same functionality; but when you pass a lambda to a built-in function you break that chain, since the lambda itself is Python code called once per element, and the whole thing ends up approximately as fast as plain Python.
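For example, a fairer version of the first timing looks like this (a sketch; only the list() wrapper is new) and typically comes out comparable to, or slightly slower than, the list comprehension:
import timeit

# list() forces the lazy map iterator to be consumed, so the
# squaring work is now actually measured.
print(timeit.timeit('list(map(lambda a: a*a, range(100)))', number=100000))
print(timeit.timeit('[a*a for a in range(100)]', number=100000))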
Another important point is that a list comprehension is just syntactic sugar around a regular loop, but one that skips the extra work a hand-written loop does, such as looking up and calling the list's append method on every iteration; the sketch below makes this visible.
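A minimal sketch using the dis module (the exact bytecode varies by version; this reflects CPython 3.7):
import dis

# The comprehension uses the dedicated LIST_APPEND opcode...
dis.dis('[a*a for a in range(100)]')

# ...while the explicit loop pays for a method lookup and call on
# every iteration (LOAD_METHOD/CALL_METHOD in 3.7).
dis.dis('result = []\nfor a in range(100):\n    result.append(a*a)')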

Related

Idiomatic way to call method on all objects in a list of objects Python 3

I have a list of objects, and they have a method called process. In Python 2 one could do this:
map(lambda x: x.process(), my_object_list)
In Python 3 this will not work because map doesn't call the function until the iterable is traversed. One could do this:
list(map(lambda x: x.process(), my_object_list))
But then you waste memory with a throwaway list (an issue if the list is big). I could also use a 2-line explicit loop. But this pattern is so common for me that I don't want to, or think I should need to, write a loop every time.
Is there a more idiomatic way to do this in Python 3?
Don't use map or a list comprehension where a simple for loop will do:
for x in list_of_objs:
    x.process()
It's not significantly longer than any function you might use to abstract it, but it is significantly clearer.
Of course, if process returns a useful value, then by all means, use a list comprehension:
results = [x.process() for x in list_of_objs]
or map:
results = list(map(lambda x: x.process(), list_of_objs))
There is a function available that makes map a little less clunky, especially if you will reuse the caller:
from operator import methodcaller
processor = methodcaller('process')
results = list(map(processor, list_of_objs))
more_results = list(map(processor, another_list_of_objs))
If you are looking for a good name for a function to wrap the loop, Haskell has a nice convention: a function name ending with an underscore discards its "return value". (Actually, it discards the result of a monadic action, but I'd rather ignore that distinction for the purposes of this answer.)
def map_(f, *args):
    for f_args in zip(*args):
        f(*f_args)

# Compare:
list(map(f, [1, 2, 3]))  # builds a list of the return values, then discards it
map_(f, [1, 2, 3])       # the list of return values is never built
Since you're looking for a Pythonic solution, why even bother trying to adapt map(lambda x: x.process(), my_object_list) for Python 3?
Isn't a simple for loop enough?
for x in my_object_list:
    x.process()
I mean, this is concise, readable, and avoids creating an unnecessary list if you don't need the return values.

Want to confirm the meaning of the name islice in Python itertools.islice

I'm new to Python and I'm not a native English speaker. Today I learned some functions in the itertools module. There is a function called islice. Does it stand for infinitive slice? As I understand it, it can be used to slice an infinitive sequence of objects and is commonly used with itertools.count().
I presume it stands for "iterable slice", since it takes the same arguments as the slice built-in but generates a sequence of results rather than returning a list.
You may be suffering from some slight misunderstanding of "infinitive," which is a part of speech (in English, "to fall" is the infinitive of the verb "fall"). You perhaps mean "infinite," which is never-ending or uncountable.
If so, you have correctly observed that one advantage of the functions in itertools is that they can be applied to infinite sequences. This is because they return iterators that yield results on demand, rather than building and returning whole lists.
slice is a built-in class. The prefix 'i' for 'iterator' is added to avoid confusion and a name clash if one does from itertools import *.
In Python 2, itertools also had imap and ifilter, to avoid clashing with the old versions of map and filter. In Python 3, the lazy behaviour of imap and ifilter became the behaviour of the built-in map and filter, so they were removed from itertools.
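A minimal sketch of the usual pattern, slicing an infinite counter without ever materializing it:
import itertools

# Take the first five elements of an endless sequence; islice stops
# the iteration, so nothing infinite is ever built.
first_five = list(itertools.islice(itertools.count(start=10), 5))
print(first_five)  # [10, 11, 12, 13, 14]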

iterating over a single list in parallel in python

The objective is to run several calculations over a single iterator in parallel, using the built-in sum and map functions concurrently, maybe with (something like) itertools instead of classic for loops, to analyze (LARGE) data that arrives via an iterator.
In one simple example case I want to calculate ilen, sum_x & sum_x_sq:
ilen, sum_x, sum_x_sq = iterlen(iter), sum(iter), sum(map(lambda x: x*x, iter))
But without converting the (large) iter to a list (as with iter=list(iter))
n.b. Do this using sum & map and without for loops, maybe using the itertools and/or threading modules?
import random

def example_large_data(n=100000000, mean=0, std_dev=1):
    for i in range(n):
        yield random.gauss(mean, std_dev)
-- edit --
Being VERY specific: I was taking a good look at itertools hoping that there was a dual function like map that could do it. For example: len_x,sum_x,sum_x_sq=itertools.iterfork(iter_x,iterlen,sum,sum_sq)
If I was to be very very specific: I am looking for just one answer, python source code for the "iterfork" procedure.
You can use itertools.tee to turn your single iterator into three iterators which you can pass to your three functions.
import itertools

iter0, iter1, iter2 = itertools.tee(input_iter, 3)
ilen, sum_x, sum_x_sq = iterlen(iter0), sum(iter1), sum(map(lambda x: x*x, iter2))
That will work, but the built-in function sum (and map in Python 2) is not implemented in a way that supports parallel iteration. The first function you call will consume its iterator completely, then the second one will consume the second iterator, then the third function will consume the third iterator. Since tee has to store every value that has been seen by one of its output iterators but not yet by the others, this is essentially the same as creating a list from the iterator and passing it to each function.
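You can see the buffering in a small sketch (input_iter here is just a stand-in):
import itertools

input_iter = iter(range(5))
a, b = itertools.tee(input_iter, 2)
print(sum(a))  # 10; consuming a fully forces tee to buffer all five values for b
print(sum(b))  # 10; b just drains that internal buffer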
Now, if you use generator functions that consume only a single value from their input for each value they output, you might be able to make parallel iteration work using zip. In Python 3, map and zip are both generators. The question is how to make sum into a generator.
I think you can get pretty much what you want by using itertools.accumulate (which was added in Python 3.2). It is a generator that yields a running sum of its input. Here's how you could make it work for your problem (I'm assuming your iterlen function was supposed to be an iterator-friendly version of len):
iter0, iter1, iter2 = itertools.tee(input_iter, 3)
len_gen = itertools.accumulate(map(lambda x: 1, iter0))
sum_gen = itertools.accumulate(iter1)
sum_sq_gen = itertools.accumulate(map(lambda x: x*x, iter2))
parallel_gen = zip(len_gen, sum_gen, sum_sq_gen)  # zip is lazy in Python 3
for ilen, sum_x, sum_x_sq in parallel_gen:
    pass  # the generators do all the work, so there's nothing for us to do here
# ilen, sum_x, sum_x_sq have the right values here!
If you're using Python 2, rather than 3, you'll have to write your own accumulate generator function (there's a pure-Python implementation in the itertools docs), and use itertools.imap and itertools.izip rather than the built-in map and zip functions.
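A sketch of such an accumulate, adapted from the pure-Python equivalent shown in the itertools docs:
def accumulate(iterable):
    # Yields running totals: accumulate([1, 2, 3]) -> 1, 3, 6
    it = iter(iterable)
    try:
        total = next(it)
    except StopIteration:
        return
    yield total
    for element in it:
        total += element
        yield total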

When does using list comprehension in Python become inefficient?

I see that using list comprehension provides a very simple way to create new lists in Python.
However, if instead of creating a new list I just want to call a void function for each element of a list, without expecting any sort of return value, should I use a list comprehension or just a plain for loop? Does the simplicity of the code justify creating a throwaway list (of None values) on every run? Even if this added cost is negligible in small programs, does it make sense in large-scale programs/production?
Thanks!
List comprehensions are the wrong way if you don't actually need a list. Use this instead:
for i in seq:
    some_function(i)
This is both more efficient and more expressive than using:
[some_function(i) for i in seq]
Note that there is something similar-looking that does not do the same thing (and in particular it's not a "tuple comprehension"):
(some_function(i) for i in seq)
because that only creates a generator; nothing is called until it is iterated. That said, if you currently build a list only to pass it somewhere that iterates it once, passing such a generator around instead is a much better solution.
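For example, when the consumer makes a single pass, a generator expression hands the values over one at a time (a sketch; some_function and seq are stand-ins):
def some_function(i):
    return i * i

seq = range(10)
total = sum(some_function(i) for i in seq)  # no intermediate list is built
print(total)  # 285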
for x in lst: f(x)
looks about as short (it's actually one character shorter) as
[f(x) for x in lst]
Or is that not what you were trying to do?
There are more possible solutions for calling a function on every member of a list:
numpy can vectorize functions:
import numpy as np

def func(i):
    print(i)

# Note: np.vectorize may call func an extra time on the first element
# to determine the output type.
v_func = np.vectorize(func)
v_func(['one', 'two', 'three'])
Python has a built-in map function that maps a function over every member of an iterable:
def func(i):
    print(i)

# In Python 3, map is lazy, so list() (or any full iteration) is needed
# to make the calls actually happen.
list(map(func, ['one', 'two', 'three']))
Are you asking if it is inefficient to create a list you don't need? Put that way, the answer should be obvious.
(To satisfy the answer police: yes, it is less efficient to create a list you don't need.)

In Python, is it better to use list comprehensions or for-each loops?

Which of the following is better to use and why?
Method 1:
for k, v in os.environ.items():
    print "%s=%s" % (k, v)
Method 2:
print "\n".join(["%s=%s" % (k, v)
for k,v in os.environ.items()])
I tend to lean towards the first as more understandable, but that might just be because I'm new to Python and list comprehensions are still somewhat foreign to me. Is the second way considered more Pythonic? I'm assuming there's no performance difference, but I may be wrong. What would be the advantages and disadvantages of these two techniques?
(Code taken from Dive into Python)
If the iteration is being done for its side effect (as it is in your "print" example), then a loop is clearer.
If the iteration is executed in order to build a composite value, then list comprehensions are usually more readable.
The particular code examples you have chosen do not demonstrate any advantage of the list comprehension, because it is being (mis-)used for the trivial task of printing. In this simple case I would choose the simple for loop.
In many other cases, you will want to supply an actual list to another function or method, and the list comprehension is the easiest and most readable way to do that.
An example that clearly shows the superiority of the list comp replaces the print example with one that builds another actual list, appending to it on each iteration of the for loop:
L = []
for x in range(10):
    L.append(x**2)
Gives the same L as:
L = [x**2 for x in range(10)]
I find the first example better: less verbose, clearer, and more readable.
In my opinion, go with what best gets your intention across, after all:
Programs must be written for people to read, and only incidentally for machines to execute.
-- from "Structure and Interpretation of Computer Programs" by Abelson and Sussman
By the way, since you're just starting to learn Python, start learning the new string-formatting syntax right away:
for k, v in os.environ.items():
    print "{0}={1}".format(k, v)
List comprehension is more than twice as fast as the explicit loop. Based on Ben James's variation, but replacing x**2 with the more trivial x+2, the two alternatives are:
def foo(n):
    L = []
    for x in xrange(n):
        L.append(x + 2)
    return L

def bar(n):
    return [x + 2 for x in xrange(n)]
Timing result:
In [674]: timeit foo(1000)
10000 loops, best of 3: 195 us per loop
In [675]: timeit bar(1000)
10000 loops, best of 3: 81.7 us per loop
List comprehension wins by a large margin.
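To reproduce the comparison on Python 3 (where xrange is gone), a sketch using timeit; absolute numbers will vary by machine:
import timeit

# Assumes foo and bar are defined as above, with xrange replaced by range.
print(timeit.timeit('foo(1000)', 'from __main__ import foo', number=10000))
print(timeit.timeit('bar(1000)', 'from __main__ import bar', number=10000))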
I agree that readability should be a priority over performance optimization. However, readability is in the eye of the beholder. When I first learned Python, list comprehensions were a weird thing I found hard to comprehend! :-O But once I got used to them, they became a really nice shorthand notation. If you are to become proficient in Python, you have to master list comprehension.
The first one in my opinion, because:
It doesn't build a huge string.
It doesn't build a huge list (can easily be fixed with a generator, by removing the []).
In both cases, you access the items in the same way (using the dictionary iterator).
List comprehensions run their loop in a single optimized bytecode construct, without the repeated append lookups and calls of an explicit loop, so if there is a huge loop that builds a list, a list comprehension is a good choice.
I agree with #Ben, #Tim, #Steven:
readability is the most important thing (do "import this" to remind yourself of what that is)
a listcomp may or may not be much faster than an iterative-loop version... it depends on the total number of function calls that are made
if you do decide to go with listcomps with large datasets, it's better to use generator expressions instead
Example:
print "\n".join("%s=%s" % (k, v) for k,v in os.environ.iteritems())
in the code snippet above, I made two changes: I replaced the listcomp with a genexp, and I changed the method call to iteritems(). [This trend carried forward: in Python 3, iteritems() was removed, and items() itself now returns a lazy view.]
