List comprehension functions in Python

There are the map, reduce, and filter functions to make list comprehensions.
What is the difference between passing an xrange argument or a range argument to each of these functions?
For example:
map(someFunc, range(1,11))
OR
map(someFunc,xrange(1,11))

In Python 2, range returns an actual list, while xrange returns a lazy sequence object which can be iterated over. Since map only cares that it can iterate over a sequence, both are applicable, though xrange uses less memory. In Python 3, range replaces xrange, so your only option is range, which behaves like the old xrange. (You can use list(range(10)) to generate the actual list if so desired.)
The modulus operator returns 0, a falsey value, for the even numbers (2 mod 2 is 0). As such, when a predicate like lambda x: x % 2 is passed to filter, the even numbers are filtered out.
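For instance, a filter call of the kind being described (a sketch, not from the original question) looks like this in Python 2:
>>> filter(lambda x: x % 2, range(1, 11))
[1, 3, 5, 7, 9]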

Actually, the map, reduce, and filter functions are an alternative to list comprehensions. The term "list comprehension" refers to the specific syntactic construct; anything that doesn't look like a list comprehension is necessarily not a list comprehension.
They actually predate list comprehensions, and are borrowed from other languages. But most of those languages have ways of constructing anonymous functions which are more powerful than Python's lambda, so functions such as these are more natural. List comprehensions are considered a more natural fit to Python.
The difference between range and xrange is that range actually constructs a list containing the numbers that form the range, whereas an xrange is an object that knows its endpoints and can iterate over itself without ever actually constructing the full list of values in memory. xrange(1,1000) takes up no more space than xrange(1,5), whereas range(1,1000) generates a 999-element list.
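A quick way to observe the memory difference in Python 2 (sys.getsizeof is illustrative here; exact byte counts vary by platform):
import sys
print sys.getsizeof(range(1, 1000))    # a real list; size grows with the element count
print sys.getsizeof(xrange(1, 1000))   # a small object of constant size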

If range() and xrange() were implemented in the Python language, they would look something like this:
def xrange(start, stop=None, step=1):
    if stop is None:
        stop, start = start, 0
    i = start
    while i < stop:
        yield i
        i += step

def range(start, stop=None, step=1):
    if stop is None:
        stop, start = start, 0
    acc, i = [], start
    while i < stop:
        acc.append(i)
        i += step
    return acc
As you can see, range() creates a list and returns it, while xrange() lazily generates the values in a range on demand. This has the advantage that the overhead for creating a list is avoided in xrange(), since it doesn't store the values or create a list object. For most instances, there is no difference in the end result.
One obvious difference is that xrange() doesn't support slicing:
>>> range(10)[2:5]
[2, 3, 4]
>>> xrange(10)[2:5]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sequence index must be integer, not 'slice'
>>>
It does, however, support indexing:
>>> xrange(11)[10]
10

Related

how to create own map() function in python

I am trying to create the built-in map() function in python.
Here is my attempt:
def mapper(func, *sequences):
    if len(sequences) > 1:
        while True:
            list.append(func(sequences[0][0], sequences[0][0],))
            return list
    return list
But I'm really stuck, because if the user gives e.g. 100 sequences, how do I deal with those?
You use the asterisk * when you call the function:
def mapper(func, *sequences):
    result = []
    if len(sequences) > 0:
        minl = min(len(subseq) for subseq in sequences)
        for i in range(minl):
            result.append(func(*[subseq[i] for subseq in sequences]))
    return result
This produces:
>>> import operator
>>> mapper(operator.add, [1,2,4], [3,6,9])
[4, 8, 13]
By using the asterisk, we unpack the iterable as separate parameters in the function call.
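A tiny illustration of that unpacking, with a hypothetical function and values:
>>> def add3(a, b, c):
...     return a + b + c
...
>>> args = [1, 2, 3]
>>> add3(*args)
6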
Note that this is still not fully equivalent, since:
the sequences should be iterables, not necessarily lists, so we cannot always index into them; and
the result of map in python-3.x is a lazy iterator as well, not a list.
A more python-3.x-like map function would be:
def mapper(func, *sequences):
    if not sequences:
        raise TypeError('Mapper should have at least two parameters')
    iters = [iter(seq) for seq in sequences]
    while True:
        try:
            values = [next(it) for it in iters]
        except StopIteration:
            # since PEP 479, a StopIteration escaping a generator raises
            # RuntimeError, so return explicitly when an input runs out
            return
        yield func(*values)
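Usage is the same, except that you now have to exhaust the generator yourself, e.g. (hypothetical values):
>>> list(mapper(lambda x, y: x * y, [1, 2, 3], [4, 5, 6]))
[4, 10, 18]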
Note, however, that most Python implementations provide map as a builtin written closer to the interpreter (in C, for CPython) than Python code, so it is definitely more efficient to use the builtin map than to write your own.
N.B.: it is better not to use variable names like list, set, dict, etc., since these will shadow (here locally) the reference to the list type. As a result, a call like list(some_iterable) will no longer work.
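A small illustration of why that shadowing bites:
>>> list = [1, 2, 3]          # the name list no longer refers to the type
>>> list('abc')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'list' object is not callable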
Separating out the logic that combines the sequence or sequences makes the code much easier to read and understand.
def mapper(func, *args):
    for i in zip(*args):
        yield func(*i)
Here we are using Python's built-in zip.
If you want to replace it entirely with your own implementation, substitute the zipper function below for zip:
def zipper(*args):
    for i in range(len(args[0])):
        index_elements = []
        for arg in args:
            index_elements.append(arg[i])
        yield tuple(index_elements)
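With zipper in place of zip, the mapper above behaves the same; reusing the earlier operator example:
>>> import operator
>>> list(mapper(operator.add, [1, 2, 4], [3, 6, 9]))
[4, 8, 13]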

Using comprehensions instead of a for loop

The following is a simplified example of my code.
>>> def action(num):
...     print "Number is", num
...
>>> items = [1, 3, 6]
>>> for i in [j for j in items if j > 4]:
...     action(i)
...
Number is 6
My question is the following: is it bad practice (for reasons such as code clarity) to simply replace the for loop with a comprehension which will still call the action function? That is:
>>> (action(j) for j in items if j > 2)
Number is 6
This shouldn't use a generator or comprehension at all.
def action(num):
    print "Number is", num

items = [1, 3, 6]

for j in items:
    if j > 4:
        action(j)
Generators evaluate lazily. The expression (action(j) for j in items if j > 2) will merely return a generator expression to the caller. Nothing will happen in it unless you explicitly exhaust it. List comprehensions evaluate eagerly, but, in this particular case, you are left with a list with no purpose. Just use a regular loop.
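You can see the laziness directly (using the items and action from the question, with the j > 4 condition of the original loop):
>>> gen = (action(j) for j in items if j > 4)
>>> gen
<generator object <genexpr> at 0x...>
>>> list(gen)          # only now does action actually run
Number is 6
[None]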
This is bad practice. Firstly, your code fragment does not produce the desired output. You would instead get something like: <generator object <genexpr> at 0x03D826F0>.
Secondly, a list comprehension is for creating sequences, and generators are for creating streams of objects. Typically, they do not have side effects. Your action function is a prime example of a side effect: it prints its input and returns nothing. Rather, a generator should, for each item it generates, take an input and compute some output. e.g.
doubled_odds = [x*2 for x in range(10) if x % 2 != 0]
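For reference, that comprehension evaluates to a plain list of values:
>>> doubled_odds
[2, 6, 10, 14, 18]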
By using a generator you are obfuscating the purpose of your code, which is to mutate global state (printing something), and not to create a stream of objects.
In contrast, just using a for loop makes the code slightly longer (basically just more whitespace), but you can immediately see that its purpose is to apply a function to a selection of items (as opposed to creating a new stream/list of items).
for i in items:
    if i > 4:
        action(i)
Remember that generators are still looping constructs and that the underlying bytecode is more or less the same (if anything, generators are marginally less efficient), and you lose clarity. Generators and list comprehensions are great, but this is not the right situation for them.
While I personally favour Tigerhawk's solution, there might be a middle ground between his and willywonkadailyblah's solution (now deleted).
One of willywonkadailyblah's points was:
Why create a new list instead of just using the old one? You already have the condition to filter out the correct elements, so why put them away in memory and come back for them?
One way to avoid this problem is to use lazy evaluation of the filtering i.e. have the filtering done only when iterating using the for loop by making the filtering part of a generator expression rather than a list comprehension:
for i in (j for j in items if j > 4):
    action(i)
Output
Number is 6
In all honesty, I think Tigerhawk's solution is the best for this, though. This is just one possible alternative.
The reason that I proposed this is that it reminds me a lot of LINQ queries in C#, where you define a lazy way to extract, filter and project elements from a sequence in one statement (the LINQ expression) and can then use a separate for each loop with that query to perform some action on each element.

Index of first differing element without loop

I have two lists say
A = [1,3]
B = [1,3,5,6]
I want to know the index of the first differing element between these lists (2 in this case).
Is there a simple way to do this, or do I need to write a loop?
You can use the following generator expression within the next() function, using enumerate() and zip(). Since zip stops at the end of the shorter list, give next() a default for the case where one list is a prefix of the other:
>>> next((ind for ind, (i, j) in enumerate(zip(A, B)) if i != j), min(len(A), len(B)))
2
Perhaps the loop you mentioned is the most obvious way, if not necessarily the prettiest. Still, any O(n) solution is fine by me.
lesser_length = min(len(A), len(B))
answer = lesser_length  # If one of the lists is shorter and a prefix of the
                        # other, this will be the answer, because the if
                        # condition will never be satisfied.
for i in xrange(lesser_length):
    if A[i] != B[i]:
        answer = i
        break
Use range instead of xrange in Python 3. A lazy approach is the best way, given that you don't know in advance where the difference between the lists will occur. (In Python 2, xrange is a lazy sequence object; in Python 3, it became the regular range() function.)
A list comprehension is also viable. I find this to be more readable.

Extend range in Python

I need a loop containing range(3,666,2) and 2 (for the sieve of Eratosthenes, by the way). This doesn't work ("AttributeError: 'range' object has no attribute 'extend'" ... or "append"):
primes = range(3,limit,2)
primes.extend(2)
How can I do it in the simple intuitive pythonesque way?
range() in Python 3 returns a dedicated immutable sequence object. You'll have to turn it into a list to extend it:
primes = list(range(3, limit, 2))
primes.append(2)
Note that I used list.append(), not list.extend() (which expects a sequence of values, not one integer).
However, you probably want to start your loop with 2, not end it. Moreover, materializing the whole range into a list requires some memory and kills the efficiency of the object. Use iterator chaining instead:
from itertools import chain
primes = chain([2], range(3, limit, 2))
Now you can loop over primes without materializing a whole list in memory, and still include 2 at the start of the loop.
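For example, with a hypothetical limit of 12:
>>> from itertools import chain
>>> list(chain([2], range(3, 12, 2)))
[2, 3, 5, 7, 9, 11]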
If you're only looping and don't want to materialise, then:
from itertools import chain
primes = chain([2], range(3, limit, 2))
I think the 2 makes more sense at the start, though...

Explain the use of yields in this Game of Life implementation

In this PyCon talk, Jack Diederich shows this "simple" implementation of Conway's Game of Life. I am not intimately familiar with either GoL or semi-advanced Python, but the code seems quite easy to grasp, if not for two things:
The use of yield. I have seen the use of yield to create generators before, but eight of them in a row is new... Does it return a list of eight generators, or how does this thing work?
set(itertools.chain(*map(neighbors, board))). The star unpacks the resulting list (?) from applying neighbours to board, and ... my mind just blew.
Could someone try to explain these two parts for a programmer that is used to hacking together some python code using map, filter and reduce, but that is not using Python on a daily basis? :-)
import itertools

def neighbors(point):
    x, y = point
    yield x + 1, y
    yield x - 1, y
    yield x, y + 1
    yield x, y - 1
    yield x + 1, y + 1
    yield x + 1, y - 1
    yield x - 1, y + 1
    yield x - 1, y - 1

def advance(board):
    newstate = set()
    recalc = board | set(itertools.chain(*map(neighbors, board)))
    for point in recalc:
        count = sum((neigh in board) for neigh in neighbors(point))
        if count == 3 or (count == 2 and point in board):
            newstate.add(point)
    return newstate

glider = set([(0,0), (1,0), (2, 0), (0,1), (1,2)])
for i in range(1000):
    glider = advance(glider)
    print glider
Generators operate on two principles: they produce a value each time a yield statement is encountered, and their code stays paused between those values until the generator is iterated over again.
It doesn't matter how many yield statements are used in a generator, the code is still run in normal python ordering. In this case, there is no loop, just a series of yield statements, so each time the generator is advanced, python executes the next line, which is another yield statement.
What happens with the neighbors generator is this:
Generators always start paused, so calling neighbors(position) returns a generator that hasn't done anything yet.
When it is advanced (next() is called on it), the code is run until the first yield statement. First x, y = point is executed, then x + 1, y is calculated and yielded. The code pauses again.
When advanced again, the code runs until the next yield statement is encountered. It yields x - 1, y.
etc. until the function completes.
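You can drive the neighbors generator by hand to watch this happen:
>>> g = neighbors((0, 0))
>>> next(g)
(1, 0)
>>> next(g)
(-1, 0)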
The set(itertools.chain(*map(neighbors, board))) line does:
map(neighbors, board) produces an iterator for each and every position in the board sequence. It simply loops over board, calls neighbors on each value, and returns a new sequence of the results. Each neighbors() function returns a generator.
The *parameter syntax expands the parameter sequence into a list of parameters, as if the function was called with each element in parameter as a separate positional parameter instead. param = [1, 2, 3]; foo(*param) would translate to foo(1, 2, 3).
itertools.chain(*map(..)) takes each and every generator produced by the map, and applies that as a series of positional parameters to itertools.chain(). Looping over the output of chain means that each and every generator for each and every board position is iterated over once, in order.
All the generated positions are added to a set, essentially removing duplicates.
You could expand the code to:
positions = set()
for board_position in board:
    for neighbor in neighbors(board_position):
        positions.add(neighbor)
In python 3, that line could be expressed a little more efficiently still by using itertools.chain.from_iterable() instead, because map() in Python 3 is a generator too; .from_iterable() doesn't force the map() to be expanded and will instead loop over the map() results one by one as needed.
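With that variant, the recalc line would read something like:
recalc = board | set(itertools.chain.from_iterable(map(neighbors, board)))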
Wow, that's a neat implementation, thanks for posting it!
For the yield, there is nothing to add to Martijn's answer.
As for the star: map returns a list or a lazy iterator (depending on Python 2 or 3), and each item of that sequence is a generator (from neighbors), so we have a sequence of generators.
chain takes many arguments that are iterables and chains them, meaning it returns a single iterable that iterates over all of them in turn.
Because we have a sequence of generators, and chain takes many arguments, we use a star to convert the sequence of generators into arguments. We could have done the same with chain.from_iterable.
It just yields a tuple for each of the cell's neighbours. If you understand what generators do, it is pretty clear that using them is good practice when working with a big amount of data: you do not need to store it all in memory, you calculate each value only when you need it.
