As zip yields as many values as the shortest iterable given, I would have expected passing zero arguments to zip to return an iterable yielding infinitely many tuples, instead of returning an empty iterable.
This would have been consistent with how other monoidal operations behave:
>>> sum([]) # sum
0
>>> math.prod([]) # product
1
>>> all([]) # logical conjunction
True
>>> any([]) # logical disjunction
False
>>> list(itertools.product()) # Cartesian product
[()]
For each of these operations, the value returned when given nothing to combine is the identity value for the operation, which is to say, one that does not modify the result when included in the operation:
sum(xs) == sum([*xs, 0]) == sum([*xs, sum([])])
math.prod(xs) == math.prod([*xs, 1]) == math.prod([*xs, math.prod([])])
all(xs) == all([*xs, True]) == all([*xs, all([])])
any(xs) == any([*xs, False]) == any([*xs, any([])])
Or at least, one that gives a trivially isomorphic result:
itertools.product(*xs, itertools.product())
    ≡ itertools.product(*xs, [()])
    ≡ ((*x, ()) for x in itertools.product(*xs))
In the case of zip, this would have been:
zip(*xs, zip()) ≡ (f(x) for x in zip(*xs))
Because zip returns an n-tuple when given n arguments, it follows that zip() with 0 arguments must yield 0-tuples, i.e. (). This forces f to return (*x, ()) and therefore zip() to be equivalent to itertools.repeat(()). Another, more general law is:
((*x, *y) for x, y in zip(zip(*xs), zip(*ys))) ≡ zip(*xs, *ys)
which would have then held for all xs and ys, including when either xs or ys is empty (and does hold for itertools.product).
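As a quick illustration (the sample xs and ys below are mine, chosen arbitrarily), the law holds for itertools.product even with an empty ys, but fails for zip precisely because zip() yields nothing:
xs = [[1, 2], [3, 4]]
ys = []
# Holds for itertools.product, since itertools.product() yields a single ():
lhs = [(*x, *y) for x, y in itertools.product(itertools.product(*xs), itertools.product(*ys))]
rhs = list(itertools.product(*xs, *ys))
assert lhs == rhs == [(1, 3), (1, 4), (2, 3), (2, 4)]
# Fails for zip, because zip() is empty rather than repeating ():
lhs = [(*x, *y) for x, y in zip(zip(*xs), zip(*ys))]
rhs = list(zip(*xs, *ys))
assert lhs == [] and rhs == [(1, 3), (2, 4)]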
Yielding empty tuples indefinitely is also the behaviour that falls out of this straightforward reimplementation:
def my_zip(*iterables):
    iterables = tuple(map(iter, iterables))
    while True:
        item = []
        for it in iterables:
            try:
                item.append(next(it))
            except StopIteration:
                return
        yield tuple(item)
which means that zip with no arguments must have been deliberately special-cased not to do that.
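A quick comparison of the two behaviours (illustrative):
>>> list(zip())
[]
>>> list(itertools.islice(my_zip(), 3))
[(), (), ()]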
Why is zip() not equivalent to itertools.repeat(()) despite all the above?
PEP 201 and related discussion show that zip() with no arguments originally raised an exception. It was changed to return an empty list because this is more convenient for some cases of zip(*s) where s turns out to be an empty list. No consideration was given to what might be the 'identity', which in any case appears difficult to define with respect to zip - there is nothing you can zip with arbitrary x that will return x.
The original reasons why certain commutative and associative mathematical functions return the identity by default when applied to an empty list are not clear, but may have been driven by convenience, the principle of least astonishment, and the history of earlier languages like Perl or ABC. Explicit reference to the concept of a mathematical identity is rarely if ever made (see e.g. Reason for "all" and "any" result on empty lists). So there is no reason to rely on functions in general to do this. In many cases it would be less surprising for them to raise an exception instead.
Related
Full question: Design a recursive algorithm that will display whether it is possible to choose two integers from a list of integers such that the difference of the two equals a given value. Hint: You may wish to invoke another algorithm (i.e., function) that accepts more parameters and performs the recursion. Submit a python function is_diff_two(values, diff) that takes a list and the desired difference as a non-negative integer and returns true or false. The only functions you may use are: print, str, int, float, bool, len, list, range, abs, round, pow. We will not deduct for the use of str(), but you do not actually need it. Do not import libraries.
Keywords allowed: if, elif, else, and, or, not, return, def, assert,
Keywords not allowed: for, while, in, import.
Note that you may not use slicing (i.e., colons in your indices).
I am having trouble implementing this algorithm; I'm not sure what the base case is, how to use a helper method, or how to do the problem without using loops or Python list slicing. What I currently have is a helper method called check_diff() that takes a list as a parameter and recursively iterates through it, appending to a new list all the possible differences between values in the original list. I was then going to call that method from is_diff_two(), checking in one line whether the diff parameter is in the list: if so, return True, otherwise False. This is what I have for my helper method thus far, but I cannot figure out how to correctly recurse through the list and get all the possible differences.
def check_diff(values):
    diff_list = []
    if len(values) == 1:
        return diff_list
    else:
        diff_list.append(values[0] - check_diff(values[1]))
        return values[0] - check_diff(values[1])
You can make a function that optionally takes two indices into the given list. It makes recursive calls with the second index incremented until it goes out of range, at which point it recurses with the first index incremented; it returns True as soon as the values at the two indices differ by the target value, and returns False if the first index goes out of range without any such pair being found:
def is_diff_two(values, diff, a=0, b=1):
    if a == len(values):
        return False
    if b == len(values):
        return is_diff_two(values, diff, a + 1, a + 2)
    return abs(values[a] - values[b]) == diff or is_diff_two(values, diff, a, b + 1)
so that is_diff_two([2, 4, 8], 3) returns False,
and that is_diff_two([2, 4, 8], 4) returns True.
CLARIFICATIONS:
I just realized my definition and code below might be wrong, because they don't take nested lists into account. I really want the ultimate result from concatenate to be either an object which is not a list, or a flat list of more than one non-list object (so no nested lists). And the empty list should become the object Empty.
But it is possible for the user to provide an input containing nested lists; in that case I need them to be flattened. Apologies for not being clear on this.
I have objects of a certain type (which can have Empty as value too), and I have a binary concatenation operator on these objects that satisfies the following axioms (here [A, B] means the list containing A and B):
concatenate2(Empty, A) = concatenate2(A, Empty) = A
concatenate2(A, [B, C]) = concatenate2([A, B], C) = [A, B, C]
concatenate2(A, B) = [A, B] (if A, B do not match any of the previous cases).
Now I want to also have a concatenation of arbitrarily many terms:
concatenate([]) = Empty
concatenate([A]) = A
concatenate([A, B]) = concatenate2(A, B)
concatenate([A, B, ...]) = concatenate([concatenate2(A, B), ...])
I would like to implement these operators in a way that minimizes the number of list copy operations, but I am not sure how to do this best in Python.
My current idea was to do something like this:
def concatenate2(A, B):
    if A == Empty:
        return B
    if B == Empty:
        return A
    if type(A) == list:
        return concatenate(A + [B])
    if type(B) == list:
        return concatenate([A] + B)
    return [A, B]
def concatenate(terms):
    if terms == []:
        return Empty
    if len(terms) == 1:
        return terms[0]
    if len(terms) == 2:
        return concatenate2(terms[0], terms[1])
    return concatenate(concatenate2(terms[0], terms[1]) + terms[2:])
This looks pretty nice and clear, but I don't know how well it stands in terms of performance and memory usage. I am worried it might cause too many list copies during each [...] + [...] operation.
Is there a better way to implement these operations?
Note that ultimately only the concatenate operation is really required. The concatenate2 operator was used to give a nice recursive definition, but if someone can propose a more efficient solution that does not use it, I would accept it too.
Using + for repeated concatenation is not ideal, as it keeps creating intermediate list objects for each binary concatenation, which results in quadratic worst-case time complexity with respect to the combined length. A simpler and better approach is a nested comprehension, which has linear complexity.
This also uses the * operator to unpack an arbitrary number of arguments:
def concatenate(*terms):
    return [x for t in terms for x in (t if isinstance(t, list) else [t])]
>>> concatenate([3, 4], 5, [], 7, [1])
[3, 4, 5, 7, 1]
>>> concatenate()
[]
What you seem to want is not just variadic, but also has a mixed-type signature.
Suppose that we want to define some concatenate_all(*args) function that concatenates all arguments thrown at it.
If you agree that all arguments of concatenate_all are sequences, we can form a single sequence out of them, and fold-left it with concatenate:
import itertools
from functools import reduce  # reduce lives in functools on Python 3

# Pretend that concatenate_all is [[A]] -> [A]
def concatenate_all(*seqs):
    all_seqs = itertools.chain(*seqs)
    return reduce(lambda acc, x: concatenate(acc, x), all_seqs, EMPTY)
If we assume that some of the args are scalars, and some are lists, we can wrap the scalars into lists and use the same trick.
def concatenate_all(*scalars_or_seqs):
    def to_list(x):
        # TODO: should make it work with generators, too.
        return x if isinstance(x, list) else [x]
    # Wrap scalar arguments into singleton lists first, then chain them lazily
    # to avoid creating intermediate lists.
    all_lists = map(to_list, scalars_or_seqs)
    all_items = itertools.chain.from_iterable(all_lists)
    return reduce(lambda acc, x: concatenate(acc, x), all_items, EMPTY)
If we assume that some of the args are also nested lists which we need to flatten, you can update the code above to also handle that.
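For instance, a recursive flattening helper (an illustrative sketch; flatten is a name introduced here, not part of the answer above) could take the place of to_list:
def flatten(x):
    # Recursively yield the non-list leaves of an arbitrarily nested list.
    if isinstance(x, list):
        for item in x:
            yield from flatten(item)
    else:
        yield x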
I want to warn you against making a function that is too smart about its arguments. Excessive magic may initially look neat, but in practice it becomes too hard to reason about, especially in a highly dynamic language like Python, with nearly zero static checks. It's better to push wrapping and flattening to the caller side and make them explicit.
When a Python list is known to always contain a single item, is there a way to access it other than:
mylist[0]
You may ask, 'Why would you want to?'. Curiosity alone. There seems to be an alternative way to do everything in Python.
Raises exception if not exactly one item:
Sequence unpacking:
singleitem, = mylist
# Identical in behavior (byte code produced is the same),
# but arguably more readable since a lone trailing comma could be missed:
[singleitem] = mylist
Rampant insanity, unpack the input to the identity lambda function:
# The only even semi-reasonable way to retrieve a single item and raise an exception on
# failure for too many, not just too few, elements as an expression, rather than a
# statement, without resorting to defining/importing functions elsewhere to do the work
singleitem = (lambda x: x)(*mylist)
All others silently ignore spec violation, producing first or last item:
Explicit use of iterator protocol:
singleitem = next(iter(mylist))
Destructive pop:
singleitem = mylist.pop()
Negative index:
singleitem = mylist[-1]
Set via a single-iteration for loop (the loop variable remains available with its last value when a loop terminates):
for singleitem in mylist: break
There are many others (combining or varying bits of the above, or otherwise relying on implicit iteration), but you get the idea.
I will add that the more_itertools library has a tool that returns one item from an iterable.
from more_itertools import one
iterable = ["foo"]
one(iterable)
# "foo"
In addition, more_itertools.one raises an error if the iterable is empty or has more than one item.
iterable = []
one(iterable)
# ValueError: not enough values to unpack (expected 1, got 0)
iterable = ["foo", "bar"]
one(iterable)
# ValueError: too many values to unpack (expected 1)
more_itertools is a third-party package: pip install more-itertools
(This is an adjusted repost of my answer to a similar question related to sets.)
One way is to use reduce with lambda x: x.
from functools import reduce
> reduce(lambda x: x, [3])
3
> reduce(lambda x: x, [1, 2, 3])
TypeError: <lambda>() takes 1 positional argument but 2 were given
> reduce(lambda x: x, [])
TypeError: reduce() of empty sequence with no initial value
Benefits:
Fails for multiple and zero values
Doesn't change the original list
Doesn't need a new variable and can be passed as an argument
Cons: "API misuse" (see comments).
I am wondering about the use of == when comparing two generators
For example:
x = ['1','2','3','4','5']
gen_1 = (int(ele) for ele in x)
gen_2 = (int(ele) for ele in x)
gen_1 and gen_2 are the same for all practical purposes, and yet when I compare them:
>>> gen_1 == gen_2
False
My guess here is that == falls back to behaving like is, and since gen_1 and gen_2 are located in different places in memory:
>>> gen_1
<generator object <genexpr> at 0x01E8BAA8>
>>> gen_2
<generator object <genexpr> at 0x01EEE4B8>
their comparison evaluates to False. Am I right on this guess? And any other insight is welcome.
And btw, I do know how to compare two generators:
>>> all(a == b for a,b in zip(gen_1, gen_2))
True
or even
>>> list(gen_1) == list(gen_2)
True
But if there is a better way, I'd love to know.
You are right with your guess – the fallback for comparison of types that don't define == is comparison based on object identity.
A better way to compare the values they generate would be
from itertools import zip_longest, tee
sentinel = object()
all(a == b for a, b in zip_longest(gen_1, gen_2, fillvalue=sentinel))
(For Python 2.x use izip_longest instead of zip_longest)
This can actually short-circuit without necessarily having to look at all values. As pointed out by larsmans in the comments, we can't use zip() here since it might give wrong results if the generators produce a different number of elements – zip() will stop on the shortest iterator. We use a newly created object instance as fill value for zip_longest(), since object instances compare unequal to any sane value that could appear in one of the generators (including other object instances).
Note that there is no way to compare generators without changing their state. You could store the items that were consumed if you need them later on:
gen_1, gen_1_teed = tee(gen_1)
gen_2, gen_2_teed = tee(gen_2)
all(a == b for a, b in zip_longest(gen_1, gen_2, fillvalue=sentinel))
This will leave the state of gen_1 and gen_2 essentially unchanged. All values consumed by all() are stored inside the tee objects.
At that point, you might ask yourself if it is really worth it to use lazy generators for the application at hand -- it might be better to simply convert them to lists and work with the lists instead.
Because generators generate their values on-demand, there isn't any way to "compare" them without actually consuming them. And if your generators generate an infinite sequence of values, such an equality test as you propose would be useless.
== is indeed the same as is on two generators, because that's the only check that can be made without changing their state and thus losing elements.
list(gen_1) == list(gen_2)
is the reliable and general way of comparing two finite generators (but obviously consumes both); your zip-based solution fails when they do not generate an equal number of elements:
>>> list(zip([1,2,3,4], [1,2,3]))
[(1, 1), (2, 2), (3, 3)]
>>> all(a == b for a, b in zip([1,2,3,4], [1,2,3]))
True
The list-based solution still fails when either generator generates an infinite number of elements. You can devise a workaround for that, but when both generators are infinite, you can only devise a semi-algorithm for non-equality.
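One such semi-algorithm for non-equality (an illustrative sketch; generators_differ is a name introduced here): it reports True as soon as the generators disagree, but never returns when both are infinite and item-wise equal.
from itertools import zip_longest

_sentinel = object()

def generators_differ(gen_1, gen_2):
    # True as soon as a pair of items differs (a length mismatch also counts,
    # thanks to the sentinel fill value); loops forever if both generators
    # are infinite and item-wise equal.
    return any(a != b for a, b in zip_longest(gen_1, gen_2, fillvalue=_sentinel))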
In order to do an item-wise comparison of two generators as with lists and other containers, Python would have to consume them both entirely (well, the shorter one, anyway). I think it's good that you must do this explicitly, especially since one or the other may be infinite.
Simply put: there is this list, say LST = [[12,1],[23,2],[16,3],[12,4],[14,5]], and I want to get all the minimum elements of this list according to the first element of each inner list. So for the above example the answer would be [12,1] and [12,4]. Is there a typical way of doing this in Python?
Thanks in advance.
Two passes:
minval = min(LST)[0]
return [x for x in LST if x[0] == minval]
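With the example list from the question this gives:
>>> LST = [[12, 1], [23, 2], [16, 3], [12, 4], [14, 5]]
>>> minval = min(LST)[0]
>>> [x for x in LST if x[0] == minval]
[[12, 1], [12, 4]]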
One pass:
def all_minima(iterable, key=None):
    if key is None:
        key = id
    hasminvalue = False
    minvalue = None
    minlist = []
    for entry in iterable:
        value = key(entry)
        if not hasminvalue or value < minvalue:
            minvalue = value
            hasminvalue = True
            minlist = [entry]
        elif value == minvalue:
            minlist.append(entry)
    return minlist
from operator import itemgetter
return all_minima(LST, key=itemgetter(0))
A compact single-pass solution requires sorting the list -- that's technically O(N log N) for an N-long list, but Python's sort is so good, and so many sequences "just happen" to have some embedded order in them (which timsort cleverly exploits to go faster), that sorting-based solutions sometimes have surprisingly good performance in the real world.
Here's a solution requiring 2.6 or better:
import itertools
import operator

f = operator.itemgetter(0)

def minima(lol):
    return list(next(itertools.groupby(sorted(lol, key=f), key=f))[1])
To understand this approach, looking "from the inside, outwards" helps.
f, i.e., operator.itemgetter(0), is a key-function that picks the first item of its argument for ordering purposes -- the very purpose of operator.itemgetter is to easily and compactly build such functions.
sorted(lol, key=f) therefore returns a sorted copy of the list-of-lists lol, ordered by increasing first item. If you omit the key=f the sorted copy will be ordered lexicographically, so it will also be in order of increasing first item, but that acts only as the "primary key" -- items with the same first sub-item will in turn be sorted among them by the values of their second sub-items, and so forth -- while with the key=f you're guaranteed to preserve the original order among items with the same first sub-item. You don't specify which behavior you require (and in your example the two behaviors happen to produce the same result, so we cannot distinguish from that example) which is why I'm carefully detailing both possibilities so you can choose.
itertools.groupby(sorted(lol, key=f), key=f) performs the "grouping" task that is the heart of the operation: it yields groups from the sequence (in this case, the sequence sorted provides) based on the key ordering criteria. That is, first a group of all adjacent items that produce the same value when f is called on them, then a group of all adjacent items producing a different value from the first group (but the same among themselves), and so forth. groupby respects the ordering of the sequence it takes as its argument, which is why we had to sort the lol first (and this behavior of groupby makes it very useful in many cases in which the sequence's ordering does matter).
Each result yielded by groupby is a pair k, g: a key k, which is the result of f(i) for each item i in the group, and an iterator g, which yields each item in the group in sequence.
The next built-in (the only bit in this solution which requires Python 2.6), given an iterator, produces its next item -- in particular, the first item when called on a fresh, newly made iterator (and every generator is of course an iterator, as is groupby's result). In earlier Python versions it would have to be groupby(...).next() (since next was only a method of iterators, not a built-in), a spelling that is discouraged since 2.6 introduced the built-in.
So, summarizing, the result of our next(...) is exactly the pair k, g where k is the minimum (i.e., first after sorting) value for the first sub-item, and g is an iterator for the group's items.
So, with that [1] we pick just the iterator, so we have an iterator yielding just the subitems we want.
Since we want a list, not an iterator (per your specs), the outermost list(...) call completes the job.
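With the sample LST from the question, this gives (an illustrative run):
>>> minima([[12, 1], [23, 2], [16, 3], [12, 4], [14, 5]])
[[12, 1], [12, 4]]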
Is all of this worth it, performance-wise? Not on the tiny example list you give -- minima is actually slower than either snippet in @Kenny's answer (of which the first, "two-pass" solution is speedier). I just think it's worth keeping the ideas in mind for the next sequence-processing problem you may encounter, where the details of typical inputs may be quite different (longer lists, rarer minima, partial ordering in the input, &c, &c ;-).
import operator

m = min(LST, key=operator.itemgetter(0))[0]
print([x for x in LST if x[0] == m])
minval = min(x[0] for x in LST)
result = [x for x in LST if x[0]==minval]