Why does map work like izip_longest with fill=None? - python

When map has different-length inputs, a fill value of None is used for the missing inputs:
>>> x = [[1,2,3,4],[5,6]]
>>> map(lambda *x:x, *x)
[(1, 5), (2, 6), (3, None), (4, None)]
This is the same behavior as:
>>> import itertools
>>> list(itertools.izip_longest(*x))
[(1, 5), (2, 6), (3, None), (4, None)]
What's the reason map provides this behavior, and not the following?
>>> map(lambda *x:x, *x)
[(1, 5), (2, 6), (3,), (4,)]
…and is there an easy way to get the latter behavior either with some flavor of zip or map?

I think this is a design decision that the core devs made at the time they implemented map. There's no universally defined behavior for map when it is used with multiple iterables. Quoting from Map (higher-order function):
Map with 2 or more lists encounters the issue of handling when the
lists are of different lengths. Various languages differ on this; some
raise an exception, some stop after the length of the shortest list
and ignore extra items on the other lists; some continue on to the
length of the longest list, and for the lists that have already ended,
pass some placeholder value to the function indicating no value.
So, Python core devs opted for None as placeholder for shorter iterables at the time map was introduced to Python in 1993.
itertools.imap, by contrast, stops at the shortest iterable, because its design is heavily inspired by languages like Standard ML, Haskell and APL. In Standard ML and Haskell, map ends with the shortest list (I am not sure about APL, though).
Python 3 also removed the map(None, ...) construct (or rather the itertools.imap form of it, since Python 3's map is almost exactly itertools.imap; see "Move map() from itertools to builtins"). The construct existed in Python 2 only because there was no zip() function at the time map was added to Python.
From Issue2186: map and filter shouldn't support None as first argument (in Py3k only):
I concur with Guido that we never would have created map(None, ...) if
zip() had existed. The primary use case is obsolete.
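As a quick check of the Python 3 behavior described above, here is a minimal sketch. It assumes a Python 3 interpreter; the variable names are illustrative only.

```python
# Python 3's built-in map stops at the shortest input, like itertools.imap did.
mapped = list(map(lambda *args: args, [1, 2, 3, 4], [5, 6]))
print(mapped)  # [(1, 5), (2, 6)]

# zip covers the old map(None, ...) use case for equal-length inputs.
zipped = list(zip([1, 2, 3, 4], [5, 6]))
print(zipped)  # [(1, 5), (2, 6)]

# map(None, ...) is no longer special-cased: None is simply not callable.
try:
    list(map(None, [1, 2]))
    none_supported = True
except TypeError:
    none_supported = False
print(none_supported)  # False
```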
To get the result you want, I would suggest using itertools.izip_longest with a sentinel value (object()) rather than the default None; None will break things if the iterables themselves contain None:
from itertools import izip_longest

def solve(seq):
    sentinel = object()
    return [tuple(x for x in item if x is not sentinel)
            for item in izip_longest(*seq, fillvalue=sentinel)]

print solve([[1,2,3,4],[5,6]])
# [(1, 5), (2, 6), (3,), (4,)]
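For readers on Python 3, the same sentinel approach can be sketched as follows. This assumes Python 3, where izip_longest is spelled itertools.zip_longest; the second call is an added illustration of why the sentinel matters when the data itself contains None.

```python
from itertools import zip_longest

def solve(seq):
    # A fresh object() can never be equal to real data, unlike None.
    sentinel = object()
    return [
        tuple(x for x in item if x is not sentinel)
        for item in zip_longest(*seq, fillvalue=sentinel)
    ]

print(solve([[1, 2, 3, 4], [5, 6]]))
# [(1, 5), (2, 6), (3,), (4,)]

# A None that belongs to the input survives, because only the sentinel is dropped.
print(solve([[1, None, 3], [5]]))
# [(1, 5), (None,), (3,)]
```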

Given that the first list is always longer and that there are only two lists, you would do something like this:
>>> x = [1,2,3,4,5]
>>> y = ['a','b']
>>> zip(x,y) + [(i,) for i in x[len(y):]]
[(1, 'a'), (2, 'b'), (3,), (4,), (5,)]

Related

How does itemgetter work? What happens when we pass it as key?

>>> from operator import itemgetter
>>> a = [(5, 3), (1, 3), (1, 2), (2, -1), (4, 9)]
>>> sorted(a, key=itemgetter(0))
[(1, 3), (1, 2), (2, -1), (4, 9), (5, 3)]
How does this work? Is key a function as well? I am confused about what goes on behind key=itemgetter(0). Could someone explain it step by step?
itemgetter(..) [python-doc] is a function that constructs a function (strictly speaking, it returns a callable object). Functions that build and return other functions are called higher-order functions, a concept closely related to currying [wiki] and very common in functional programming languages.
A simplified version of itemgetter would be implemented as follows:
def itemgetter(key):
    def f(item):
        return item[key]
    return f
So for example if we construct an itemgetter(1), we can then call that function, for example:
>>> f = itemgetter(1)
>>> f([1,4,2,5])
4
So here f(..) will take the second item of the list.
is key a function as well?
Yes, the key is a function. As the documentation of sorted(..) [python-doc] says:
key specifies a function of one argument that is used to extract a comparison key from each element in iterable (for example, key=str.lower). The default value is None (compare the elements directly).
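To make the role of key concrete, here is a rough sketch of the observable behavior of sorted(a, key=...). It is not how CPython implements sorting; sorted_with_key is a hypothetical helper written for illustration, and it assumes the key values themselves can be compared.

```python
from operator import itemgetter

def sorted_with_key(iterable, key):
    # Call key once per element; the index breaks ties so that the
    # elements themselves are never compared and equal keys keep their order.
    decorated = [(key(item), index, item) for index, item in enumerate(iterable)]
    decorated.sort()
    return [item for _, _, item in decorated]

a = [(5, 3), (1, 3), (1, 2), (2, -1), (4, 9)]
print(sorted_with_key(a, itemgetter(0)))
# [(1, 3), (1, 2), (2, -1), (4, 9), (5, 3)]
```

Here itemgetter(0) is called with each tuple in turn and returns its first item, and those returned values are what gets compared.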

Why is python's sorted called stable despite not preserving the original order?

Summary
Sorting in Python is guaranteed to be stable since Python 2.2, as documented here and here.
Wikipedia explains what the property of being stable means for the behavior of the algorithm:
A sorting algorithm is stable if whenever there are two records R and S with the same key, and R appears before S in the original list, then R will always appear before S in the sorted list.
However, when sorting objects, such as tuples, sorting appears to be unstable.
For example,
>>> a = [(1, 3), (3, 2), (2, 4), (1, 2)]
>>> sorted(a)
[(1, 2), (1, 3), (2, 4), (3, 2)]
However, to be considered stable, I thought the new sequence should've been
[(1, 3), (1, 2), (2, 4), (3, 2)]
because, in the original sequence, the tuple (1, 3) appears before tuple (1, 2). The sorted function is relying on the 2-ary "keys" when the 1-ary "keys" are equal. (To clarify, the 1-ary key of some tuple t would be t[0] and the 2-ary t[1].)
To produce the expected result, we have to do the following:
>>> sorted(a, key=lambda t: t[0])
[(1, 3), (1, 2), (2, 4), (3, 2)]
I'm guessing there's a false assumption on my part, either about sorted or maybe on how tuple and/or list types are treated during comparison.
Questions
Why is the sorted function said to be "stable" even though it alters the original sequence in this manner?
Wouldn't setting the default behavior to that of the lambda version be more consistent with what "stable" means? Why is it not set this way?
Is this behavior simply a side-effect of how tuples and/or lists are inherently compared (i.e. the false assumption)?
Thanks.
Please note that this is not about whether the default behavior is or isn't useful, common, or something else. It's about whether the default behavior is consistent with the definition of what it means to be stable (which, IMHO, does not appear to be the case) and the guarantee of stability mentioned in the docs.
Think about it - (1, 2) comes before (1, 3), does it not? Sorting a list by default does not automatically mean "just sort it based off the first element". Otherwise you could say that apple comes before aardvark in the alphabet. In other words, this has nothing to do with stability.
The docs also have a nice explanation about how data structures such as lists and tuples are sorted lexicographically:
In particular, tuples and lists are compared lexicographically by comparing corresponding elements. This means that to compare equal, every element must compare equal and the two sequences must be of the same type and have the same length.
Stable sort keeps the order of those elements which are considered equal from the sorting point of view. Because tuples are compared element by element lexicographically, (1, 2) precedes (1, 3), so it should go first:
>>> (1, 2) < (1, 3)
True
A tuple's key is made out of all of its items.
>>> (1,2) < (1,3)
True
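The difference between the two calls in the question can be sketched as follows, using the question's own list. Stability only matters for elements whose keys compare equal, and which elements tie depends on the key in use. The reversed input is an added illustration.

```python
from operator import itemgetter

a = [(1, 3), (3, 2), (2, 4), (1, 2)]

# Default: the whole tuple is the key. (1, 3) and (1, 2) are not equal,
# so stability says nothing about their relative order.
by_tuple = sorted(a)
print(by_tuple)        # [(1, 2), (1, 3), (2, 4), (3, 2)]

# key=itemgetter(0): only the first item is the key, so (1, 3) and (1, 2)
# tie, and a stable sort keeps them in their original order.
by_first = sorted(a, key=itemgetter(0))
print(by_first)        # [(1, 3), (1, 2), (2, 4), (3, 2)]

# Reversing the input reverses the order of the tied elements as well.
by_first_reversed = sorted(a[::-1], key=itemgetter(0))
print(by_first_reversed)  # [(1, 2), (1, 3), (2, 4), (3, 2)]
```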

Lambda function behavior

I'm trying to understand the behavior of the lambda below. What value is actually being passed to the argument pair? Knowing this will help me understand the return part, pair[1].
pairs = [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]
pairs.sort(key=lambda pair: pair[1])
print (pairs)
As I understand it, sort will sort the list pairs, and if a function is passed as the key parameter, it will be used for the comparison. So how am I getting the output below?
OUTPUT:
[(4, 'four'), (1, 'one'), (3, 'three'), (2, 'two')]
If you want to sort by the numeric value, change your lambda to
pairs.sort(key=lambda pair: pair[0])
Python is zero-indexed. The first element of each tuple has index 0. pair[1] would refer to the words in the tuple, not the numbers. So if you want to sort by the text, alphabetically, what you have works.
If you want to see what's being passed to the lambda, which was your question:
from __future__ import print_function  # needed if you're on Python 2
pairs = [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]
# print() returns None, so "or pair[1]" keeps the real sort key
pairs.sort(key=lambda pair: print(pair[1]) or pair[1])
Which prints
one
two
three
four
Verify this by checking the output if you print(pair[0]) or print(pair).
The lambda function receives one parameter, which in your case is a tuple of size 2, and returns its second element (the number spelled out as a word). The sort method sorts your pairs list according to the key you pass to it, which is the lambda function in your code. When Python sorts strings, it orders them lexicographically, so your code arranges the pairs so that their second elements are in lexicographic order.
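A small sketch of what happens, with the key function pulled out under the hypothetical name second_item so that its return values can be inspected directly:

```python
pairs = [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]

def second_item(pair):
    # pair is one whole tuple from the list, e.g. (1, 'one')
    return pair[1]

# The values that sort() actually compares:
keys = [second_item(pair) for pair in pairs]
print(keys)          # ['one', 'two', 'three', 'four']
print(sorted(keys))  # ['four', 'one', 'three', 'two']

pairs.sort(key=second_item)
print(pairs)         # [(4, 'four'), (1, 'one'), (3, 'three'), (2, 'two')]
```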
Try to avoid lambda expressions where possible. At one point, they were going to be eliminated from the language altogether, and various helper functions and classes were introduced to fill the void that would be left behind. (Ultimately, lambda expressions survived, but the helpers remained.)
One of those was the itemgetter class, which can be used to define functions suitable for use with sort.
from operator import itemgetter
pairs.sort(key=itemgetter(0))
(As @metropolis points out, you want to use 0, not 1, to sort by the initial integer component of each pair.)

Get all combinations of list elements not ignoring position in Python

I want to combine all elements of a list into sublists (not tuples) with a specified length.
The itertools.combinations_with_replacement generator does nearly what I want to achieve:
>>> list(itertools.combinations_with_replacement([1,2],2))
[(1, 1), (1, 2), (2, 2)]
There are only two things I dislike: it creates tuples instead of sublists (which I could change with a map), and it misses the element (2,1) in the example above.
Is there any built-in module in Python 2 that does what I want? If not, is there a simple way to at least get combinations_with_replacement (or any other module function) to generate the missing element in the provided example?
maybe:
>>> from itertools import product
>>> list(product([1, 2], repeat=2))
[(1, 1), (1, 2), (2, 1), (2, 2)]
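Since the question asks for sublists rather than tuples, the product call can be wrapped as in the sketch below. The helper name ordered_selections is hypothetical, not part of itertools.

```python
from itertools import product

def ordered_selections(items, length):
    # product(..., repeat=length) yields every ordered selection with
    # repetition as a tuple; list() turns each one into a sublist.
    return [list(combo) for combo in product(items, repeat=length)]

print(ordered_selections([1, 2], 2))
# [[1, 1], [1, 2], [2, 1], [2, 2]]
```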

Unexpected behavior zipping an iterator with a sequence

While trying to solve a particular code golf question, I came across a scenario whose behavior I had difficulty understanding.
The scenario was zipping an iterator with a sequence: after the transpose operation, the iterator was one past the expected element.
>>> l = range(10)
>>> it = iter(l)
>>> zip(it, range(5))
[(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
>>> next(it) #expecting 5 here
6
Am I missing something obvious?
Note: please provide credible references for answers that may not be obvious.
I suspect that the 5 is consumed when zip tries to fetch the next items. zip stops as soon as one of its arguments is exhausted:
>>> l = range(10)
>>> it = iter(l)
>>> zip(range(5),it)
[(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
>>> it.next()
5
By reversing the order, zip knows that it can stop and does not consume the next item from it.
If you want references you can check the izip documentation. It gives an equivalent implementation:
def izip(*iterables):
    iterators = map(iter, iterables)
    while iterators:
        yield tuple(map(next, iterators))
Since list(izip(*args)) is expected to have same behavior as zip(*args), the result you got is actually the logical behavior.
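The point at which the extra item gets lost can be made explicit with a more verbose sketch. This is written for Python 3 and is an illustration of the behavior, not the actual implementation of zip; zip_sketch is a hypothetical name.

```python
def zip_sketch(*iterables):
    iterators = [iter(iterable) for iterable in iterables]
    while iterators:
        row = []
        for iterator in iterators:
            try:
                row.append(next(iterator))
            except StopIteration:
                # Anything already pulled into row on this pass is dropped.
                return
        yield tuple(row)

# Iterator first: 5 is pulled from it before range(5) is found to be exhausted.
it = iter(range(10))
list(zip_sketch(it, range(5)))
after_iterator_first = next(it)

# Iterator last: range(5) runs out first, so nothing is pulled from it.
it = iter(range(10))
list(zip_sketch(range(5), it))
after_iterator_last = next(it)

print(after_iterator_first, after_iterator_last)  # 6 5
```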
