Python: implementing a variadic concatenation operator

CLARIFICATIONS:
I just realized my definition and code below might be wrong, because they don't take nested lists into account. I really want the ultimate result from concatenate to be either an object which is not a list, or a list of more than one object, none of which are lists (so no nested lists). And the empty list should become the object Empty.
But it is possible for the user to provide an input consisting of nested lists; in that case I need them to be denested. Apologies for not being clear on this.
I have objects of a certain type (which can have Empty as value too), and I have a binary concatenation operator on these objects that satisfies the following axioms (here [A, B] means the list containing A and B):
concatenate2(Empty, A) = concatenate2(A, Empty) = A
concatenate2(A, [B, C]) = concatenate2([A, B], C) = [A, B, C]
concatenate2(A, B) = [A, B] (if A, B do not match any of the previous cases).
Now I want to also have a concatenation of arbitrarily many terms:
concatenate([]) = Empty
concatenate([A]) = A
concatenate([A, B]) = concatenate2(A, B)
concatenate([A, B, ...]) = concatenate([concatenate2(A, B), ...])
I would like to implement these operators in a way that minimizes the number of list copy operations, but I am not sure how to do this best in Python.
My current idea was to do something like this:
def concatenate2(A, B):
    if A == Empty:
        return B
    if B == Empty:
        return A
    if type(A) == list:
        return concatenate(A + [B])
    if type(B) == list:
        return concatenate([A] + B)
    return [A, B]
def concatenate(terms):
    if terms == []:
        return Empty
    if len(terms) == 1:
        return terms[0]
    if len(terms) == 2:
        return concatenate2(terms[0], terms[1])
    # Wrap the binary result back in a list, matching the definition above;
    # otherwise a non-list result (e.g. when one side is Empty) breaks the +.
    return concatenate([concatenate2(terms[0], terms[1])] + terms[2:])
This looks pretty nice and clear, but I don't know how well it stands in terms of performance and memory usage. I am worried it might cause too many list copies during each [...] + [...] operation.
Is there a better way to implement these operations?
Note that ultimately only the concatenate operation is really required. The concatenate2 operator was used to give a nice recursive definition, but if someone can propose a more efficient solution that does not use it, I would accept it too.

Using + for repeated concatenation is not ideal, as it keeps creating intermediate list objects for each binary concatenation, which results in quadratic worst-case time complexity with respect to the combined length. A simpler and better approach is a nested comprehension, which has linear complexity.
This also uses the * operator to unpack an arbitrary number of arguments:
def concatenate(*terms):
    return [x for t in terms for x in (t if isinstance(t, list) else [t])]
>>> concatenate([3, 4], 5, [], 7, [1])
[3, 4, 5, 7, 1]
>>> concatenate()
[]
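If you also need the behaviour from the clarification (denest recursively, map an empty result to Empty, unwrap a single element), here is a minimal sketch along the same linear-time lines, using the question's single-list signature and assuming Empty is your own sentinel object:
Empty = object()  # stand-in for the question's Empty value

def flatten(terms):
    # Recursively yield every non-list, non-Empty leaf, in order.
    for t in terms:
        if isinstance(t, list):
            yield from flatten(t)
        elif t is not Empty:
            yield t

def concatenate(terms):
    flat = list(flatten(terms))
    if not flat:
        return Empty    # concatenate([]) = Empty
    if len(flat) == 1:
        return flat[0]  # concatenate([A]) = A
    return flat         # a flat list of two or more non-list objects
Each input element is visited exactly once, so the whole operation stays linear in the number of leaves.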

What you seem to want is not just variadic, but also has a mixed type signature.
Suppose that we want to define some concatenate_all(*args) function that concatenates all arguments thrown at it.
If you agree that all arguments of concatenate_all are sequences, we can form a single sequence out of them, and fold-left it with concatenate:
import itertools
from functools import reduce  # on Python 3, reduce lives in functools

# Pretend that concatenate_all is [[A]] -> [A]
def concatenate_all(*seqs):
    all_seqs = itertools.chain(*seqs)
    return reduce(lambda acc, x: concatenate(acc, x), all_seqs, EMPTY)
If we assume that some of the args are scalars, and some are lists, we can wrap the scalars into lists and use the same trick.
def concatenate_all(*scalars_or_seqs):
    def to_list(x):
        # TODO: should make it work with generators, too.
        return x if isinstance(x, list) else [x]
    # Wrap the scalars first, then chain; itertools avoids creating
    # intermediate lists.
    all_lists = map(to_list, scalars_or_seqs)
    all_items = itertools.chain.from_iterable(all_lists)
    return reduce(lambda acc, x: concatenate(acc, x), all_items, EMPTY)
If we assume that some of the args are also nested lists which we need to flatten, you can update the code above to also handle that.
I want to warn you against making a function that is too smart about its arguments. Excessive magic may look neat at first, but in practice it becomes too hard to reason about, especially in such a highly dynamic language as Python, with nearly zero static checks. It's better to push wrapping and flattening to the caller side and make them explicit.


Why does Python `zip()` yield nothing when given no iterables?

As zip yields as many values as the shortest iterable given, I would have expected passing zero arguments to zip to return an iterable yielding infinitely many tuples, instead of returning an empty iterable.
This would have been consistent with how other monoidal operations behave:
>>> sum([]) # sum
0
>>> math.prod([]) # product
1
>>> all([]) # logical conjunction
True
>>> any([]) # logical disjunction
False
>>> list(itertools.product()) # Cartesian product
[()]
For each of these operations, the value returned when given no arguments is the identity value for the operation, which is to say, one that does not modify the result when included in the operation:
sum(xs) == sum([*xs, 0]) == sum([*xs, sum()])
math.prod(xs) == math.prod([*xs, 1]) == math.prod([*xs, math.prod()])
all(xs) == all([*xs, True]) == all([*xs, all()])
any(xs) == any([*xs, False]) == any([*xs, any()])
Or at least, one that gives a trivially isomorphic result:
itertools.product(*xs, itertools.product()) ≡
≡ itertools.product(*xs, [()]) ≡
≡ (*x, ()) for x in itertools.product(*xs)
In the case of zip, this would have been:
zip(*xs, zip()) ≡ f(x) for x in zip(*xs)
Because zip returns an n-tuple when given n arguments, it follows that zip() with 0 arguments must yield 0-tuples, i.e. (). This forces f to return (*x, ()) and therefore zip() to be equivalent to itertools.repeat(()). Another, more general law is:
((*x, *y) for x, y in zip(zip(*xs), zip(*ys))) ≡ zip(*xs, *ys)
which would have then held for all xs and ys, including when either xs or ys is empty (and does hold for itertools.product).
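As a quick sanity check, the analogous law can be verified for itertools.product in the REPL (the xs and ys values here are arbitrary examples):
>>> import itertools
>>> xs, ys = [[1, 2]], []   # ys is empty
>>> lhs = [(*x, *y) for x, y in itertools.product(itertools.product(*xs), itertools.product(*ys))]
>>> lhs == list(itertools.product(*xs, *ys))
True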
Yielding empty tuples indefinitely is also the behaviour that falls out of this straightforward reimplementation:
def my_zip(*iterables):
    iterables = tuple(map(iter, iterables))
    while True:
        item = []
        for it in iterables:
            try:
                item.append(next(it))
            except StopIteration:
                return
        yield tuple(item)
which means that the case of zip with no arguments must have been specifically special-cased not to do that.
Why is zip() not equivalent to itertools.repeat(()) despite all the above?
PEP 201 and related discussion show that zip() with no arguments originally raised an exception. It was changed to return an empty list because this is more convenient for some cases of zip(*s) where s turns out to be an empty list. No consideration was given to what might be the 'identity', which in any case appears difficult to define with respect to zip - there is nothing you can zip with arbitrary x that will return x.
The original reasons why certain commutative and associative mathematical functions return the identity when applied to an empty list are not clear, but may have been driven by convenience, the principle of least astonishment, and the history of earlier languages like Perl or ABC. Explicit reference to the concept of mathematical identity is rarely if ever made (see e.g. Reason for "all" and "any" result on empty lists). So there is no reason to rely on functions in general to do this. In many cases it would be less surprising for them to raise an exception instead.
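For reference, the actual behaviour in current CPython, next to what the 'identity' version would yield:
>>> list(zip())
[]
>>> import itertools
>>> list(itertools.islice(itertools.repeat(()), 3))
[(), (), ()]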

Why does b+=(4,) work and b = b + (4,) doesn't work when b is a list?

If we take b = [1,2,3] and try doing b += (4,), b becomes [1,2,3,4]; but if we try doing b = b + (4,), it doesn't work.
b = [1, 2, 3]
b += (4,)     # b is now [1, 2, 3, 4]
b = b + (4,)  # TypeError: can only concatenate list (not "tuple") to list
I expected b+=(4,) to fail as you can't add a list and a tuple, but it worked. So I tried b = b + (4,) expecting to get the same result, but it didn't work.
The problem with "why" questions is that usually they can mean multiple different things. I will try to answer each one I think you might have in mind.
"Why is it possible for it to work differently?" which is answered by e.g. this. Basically, += tries to use different methods of the object: __iadd__ (which is only checked on the left-hand side), vs __add__ and __radd__ ("reverse add", checked on the right-hand side if the left-hand side doesn't have __add__) for +.
"What exactly does each version do?" In short, the list.__iadd__ method does the same thing as list.extend (but because of the language design, there is still an assignment back).
This also means for example that
>>> a = [1,2,3]
>>> b = a
>>> a += [4] # uses the .extend logic, so it is still the same object
>>> b # therefore a and b are still the same list, and b has the `4` added
[1, 2, 3, 4]
>>> b = b + [5] # makes a new list and assigns back to b
>>> a # so now a is a separate list and does not have the `5`
[1, 2, 3, 4]
+, of course, creates a new object, but explicitly requires another list instead of trying to pull elements out of a different sequence.
"Why is it useful for += to do this? It's more efficient; the extend method doesn't have to create a new object. Of course, this has some surprising effects sometimes (like above), and generally Python is not really about efficiency, but these decisions were made a long time ago.
"What is the reason not to allow adding lists and tuples with +?" See here (thanks, #splash58); one idea is that (tuple + list) should produce the same type as (list + tuple), and it's not clear which type the result should be. += doesn't have this problem, because a += b obviously should not change the type of a.
They are not equivalent:
b += (4,)
is shorthand for:
b.extend((4,))
while + concatenates lists, so by:
b = b + (4,)
you're trying to concatenate a tuple to a list
When you do this:
b += (4,)
it is converted to this:
b.__iadd__((4,))
Under the hood it calls b.extend((4,)). extend accepts any iterable, which is why this also works:
b = [1,2,3]
b += range(2)  # b is now [1, 2, 3, 0, 1]
but when you do this:
b = b + (4,)
it is converted to this:
b = b.__add__((4,))
and list.__add__ accepts only another list object.
From the official docs, for mutable sequence types both:
s += t
s.extend(t)
are defined as:
extends s with the contents of t
Which is different than being defined as:
s = s + t # not equivalent in Python!
This also means any sequence type will work for t, including a tuple like in your example.
But it also works for ranges and generators! For instance, you can also do:
s += range(3)
The "augmented" assignment operators like += were introduced in Python 2.0, which was released in October 2000. The design and rationale are described in PEP 203. One of the declared goals of these operators was the support of in-place operations. Writing
a = [1, 2, 3]
a += [4, 5, 6]
is supposed to update the list a in place. This matters if there are other references to the list a, e.g. when a was received as a function argument.
However, the operation can't always happen in place, since many Python types, including integers and strings, are immutable, so e.g. i += 1 for an integer i can't possibly operate in place.
In summary, augmented assignment operators were supposed to work in place when possible, and create a new object otherwise. To facilitate these design goals, the expression x += y was specified to behave as follows:
If x.__iadd__ is defined, x.__iadd__(y) is evaluated.
Otherwise, if x.__add__ is implemented x.__add__(y) is evaluated.
Otherwise, if y.__radd__ is implemented y.__radd__(x) is evaluated.
Otherwise raise an error.
The first result obtained by this process will be assigned back to x (unless that result is the NotImplemented singleton, in which case the lookup continues with the next step).
This process allows types that support in-place modification to implement __iadd__(). Types that don't support in-place modification don't need to add any new magic methods, since Python will automatically fall back to essentially x = x + y.
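To illustrate the fallback, here is a minimal sketch with a hypothetical Box class that defines only __add__, so += has to create a new object:
class Box:
    def __init__(self, items):
        self.items = items
    def __add__(self, other):
        # No __iadd__ defined, so x += y falls back to x = x + y.
        return Box(self.items + other.items)

b = Box([1])
alias = b
b += Box([2])
print(alias.items)  # [1]    -- the alias is untouched: += created a new Box
print(b.items)      # [1, 2]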
So let's finally come to your actual question – why you can add a tuple to a list with an augmented assignment operator. From memory, the history of this was roughly like this: The list.__iadd__() method was implemented to simply call the already existing list.extend() method in Python 2.0. When iterators were introduced in Python 2.1, the list.extend() method was updated to accept arbitrary iterators. The end result of these changes was that my_list += my_tuple worked starting from Python 2.1. The list.__add__() method, however, was never supposed to support arbitrary iterators as the right-hand argument – this was considered inappropriate for a strongly typed language.
I personally think the implementation of augmented operators ended up being a bit too complex in Python. It has many surprising side effects, e.g. this code:
t = ([42], [43])
t[0] += [44]
The second line raises TypeError: 'tuple' object does not support item assignment, but the operation is successfully performed anyway – t will be ([42, 44], [43]) after executing the line that raises the error.
Most people would expect X += Y to be equivalent to X = X + Y. Indeed, the Python Pocket Reference (4th ed) by Mark Lutz says on page 57 "The following two formats are roughly equivalent: X = X + Y , X += Y". However, the people who specified Python did not make them equivalent. Possibly that was a mistake which will result in hours of debugging time by frustrated programmers for as long as Python remains in use, but it's now just the way Python is. If X is a mutable sequence type, X += Y is equivalent to X.extend( Y ) and not to X = X + Y.
As explained here, if list didn't implement the __iadd__ method, b += (4,) would just be shorthand for b = b + (4,); but obviously it's not, so list does implement __iadd__. Apparently the implementation of the __iadd__ method is something like this:
def __iadd__(self, x):
    self.extend(x)
    return self  # __iadd__ must return the object that gets assigned back
However, we know that the above code is not the actual implementation of the __iadd__ method, but we can assume and accept that there is something like the extend method behind it, which accepts tuple inputs.

how to create own map() function in python

I am trying to create the built-in map() function in python.
Here is my attempt:
def mapper(func, *sequences):
    if len(sequences) > 1:
        while True:
            list.append(func(sequences[0][0], sequences[0][0],))
            return list
    return list
But I'm really stuck, because if the user gives e.g. 100 sequences, how do I deal with those?
You use the asterisk * when you call the function:
def mapper(func, *sequences):
    result = []
    if len(sequences) > 0:
        minl = min(len(subseq) for subseq in sequences)
        for i in range(minl):
            result.append(func(*[subseq[i] for subseq in sequences]))
    return result
This produces:
>>> import operator
>>> mapper(operator.add, [1,2,4], [3,6,9])
[4, 8, 13]
By using the asterisk, we unpack the iterable as separate parameters in the function call.
Note that this is still not fully equivalent, since:
the sequences should be iterables, not necessarily lists, so we cannot always index; and
the result of map in python-3.x is an iterable as well, not a list.
A more python-3.x-like map function would be:
def mapper(func, *sequences):
    if not sequences:
        raise TypeError('Mapper should have at least two parameters')
    iters = [iter(seq) for seq in sequences]
    while True:
        try:
            args = [next(it) for it in iters]
        except StopIteration:
            # PEP 479: StopIteration must not leak out of a generator
            return
        yield func(*args)
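Used the same way as before (reusing the operator import from the first example):
>>> import operator
>>> list(mapper(operator.add, [1, 2, 4], [3, 6, 9]))
[4, 8, 13]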
Note, however, that most Python interpreters implement map closer to the interpreter than Python code, so it is definitely more efficient to use the built-in map than to write your own.
N.B.: it is better not to use variable names like list, set, dict, etc. since these will override (here locally) the reference to the list type. As a result a call like list(some_iterable) will no longer work.
Separating out the logic that combines the sequence or sequences makes this much easier to read and understand.
def mapper(func, *args):
    for i in zip(*args):
        yield func(*i)
Here we are using Python's built-in zip.
If you want to replace it entirely with your own implementation, swap zip for the zipper function below:
def zipper(*args):
    # Stop at the shortest argument, like the built-in zip does.
    shortest = min(len(arg) for arg in args)
    for i in range(shortest):
        index_elements = []
        for arg in args:
            index_elements.append(arg[i])
        yield tuple(index_elements)
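A quick check that it behaves like the built-in zip:
>>> list(zipper([1, 2], [3, 4, 5]))
[(1, 3), (2, 4)]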

Naming variables within nested list comprehensions in Python?

Like the title says, is there any way to name variables (i.e., lists) used within a nested list comprehension in Python?
I could come up with a fitting example, but I think the question is clear enough.
Here is an example of pseudo code:
[... [r for r in some_list if r.some_attribute == something_from_within_this_list_comprehension] ... [r for r in some_list if r.some_attribute == something_from_within_this_list_comprehension] ...]
Is there any way to avoid the repetition here and simply add a variable for this temporary list only for use within the list comprehension?
CLARIFICATION:
The list comprehension is already working fine, so it's not a question of 'can it be done with a list comprehension'. And it is quicker than its original form of a for statement too, so it's not one of those 'for statements vs list comprehensions' questions either. It is simply a question of making the list comprehension more readable by giving names to variables internal to the list comprehension alone. Just googling around, I haven't really found an answer. I found this and this, but that's not really what I am after.
Based on my understanding of what you want to do, no, you cannot do it.
You cannot carry out assignments in list comprehensions because a list comprehension is essentially of the form
[expression(x, y) for x in expression_that_creates_a_container
for y in some_other_expression_that_creates_a_container(x)
if predicate(y, x)]
Granted there are a few other cases but they're all about like that. Note that nowhere does there exist room for a statement which is what a name assignment is. So you cannot assign to a name in the context of a list comprehension except by using the for my_variable in syntax.
If you have the list comprehension working, you could post it and see if it can be simplified. Solutions based on itertools are often a good alternative to burly list comprehensions.
I think I understand exactly what you mean, and I came up with a "partial solution" to this problem. The solution works fine, but is not efficient.
Let me explain with an example:
I was just trying to find a Pythagorean triplet whose sum is 1000. The Python code to solve it is just:
def pythagoreanTriplet(sum):
    for a in range(1, sum // 2):
        for b in range(1, sum // 3):
            c = sum - a - b
            if c > 0 and c**2 == a**2 + b**2:
                return a, b, c
But I wanted to code it in a functional programming-like style:
def pythagoreanTriplet2(sum):
    return next((a, b, sum-a-b) for a in range(1, sum // 2) for b in range(1, sum // 3) if (sum-a-b) > 0 and (sum-a-b)**2 == a**2 + b**2)
As can be seen in the code, I calculate (sum-a-b) three times, and I wanted to store the result in an internal variable to avoid the redundant calculation. The only way I found to do that was by adding another loop over a single value to declare an internal variable:
def pythagoreanTriplet3(sum):
    return next((a, b, c) for a in range(1, sum // 2) for b in range(1, sum // 3) for c in [sum-a-b] if c > 0 and c**2 == a**2 + b**2)
It works fine... but as I said at the beginning of the post, it is not an efficient method. Comparing the 3 methods with cProfile, the time required by each method is:
First method: 0.077 seconds
Second method: 0.087 seconds
Third method: 0.109 seconds
Some people could classify the following as a "hack", but it is definitely useful in some cases.
f = lambda i, j: int(i == j)  # A dummy function (here Kronecker's delta)
a = tuple(tuple(i + (2 + f_ij)*j + (i + (1 + f_ij)*j)**2
                for j in range(4)
                for f_ij in (f(i, j),))  # "Assign" value f(i,j) to f_ij.
          for i in range(4))
print(a)
# Output: ((0, 3, 8, 15), (2, 13, 14, 23), (6, 13, 44, 33), (12, 21, 32, 93))
This approach is particularly convenient if the function f is costly to evaluate. Because it is somewhat unusual, it may be a good idea to document the "assignment" line, as I did above.
I'm just gonna go out on a limb here, because I have no idea what you are really trying to do. I'm just going to guess that you are trying to shoehorn more than you should into a single expression. Don't do that; just assign subexpressions to variables:
sublist = [r for r in some_list if r.some_attribute == something_from_within_this_list_comprehension]
composedlist = [... sublist ... sublist ...]
This feature was added in Python 3.8 (see PEP 572); it's called "assignment expressions" and the operator is :=.
Examples from the documentation:
results = [(x, y, x/y) for x in input_data if (y := f(x)) > 0]
stuff = [[y := f(x), x/y] for x in range(5)]
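Applied to the Pythagorean example above, the walrus operator removes the need for the extra single-value for clause (a sketch for Python 3.8+, with the parameter renamed to total to avoid shadowing the built-in sum):
def pythagorean_triplet(total):
    return next((a, b, c)
                for a in range(1, total // 2)
                for b in range(1, total // 3)
                if (c := total - a - b) > 0 and c**2 == a**2 + b**2)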

how to rewrite this loop in a more efficient way in python

I have a loop of the following type:
a = range(10)
b = [something]
for i in range(len(a)-1):
    b.append(someFunction(b[-1], a[i], a[i+1]))
However, the for-loop is killing a lot of performance. I have tried to write a window generator that gives me two elements each time, but it still requires an explicit for-loop in the end. Is there a way to make this shorter and more efficient in a Pythonic way?
Thanks
edit: I forgot the initial element in b, sorry. However, the solution to my previous problem is very helpful in another problem I have too. Thanks.
Consider this
def make_b(a, seed):
    yield seed
    for x, y in zip(a[:-1], a[1:]):
        seed = someFunction(seed, x, y)
        yield seed
Which lets you do this
a = range(10)  # range, not xrange: make_b slices a, which xrange doesn't support
b = list(make_b(a, something))
Note that you can often use this:
b = make_b(a, something)
Instead of actually creating b as a list. b as a generator function saves you considerable storage (and some time) because you may not really need a list object in the first place. Often, you only need something iterable.
Similarly for a. It does not have to be a list, merely something iterable -- like a generator function with a yield statement.
For your initially stated problem of mapping a function over pairs of an input sequence, the following will work, and is about as efficient as it gets while staying in Python land.
from itertools import tee

a = range(10)
a1, a2 = tee(a)
next(a2)
b = list(map(someFunction, a1, a2))
As for the expanded problem where you need to access the result of the previous iteration: this kind of inner state is captured by the functional concept of unfold. But Python doesn't include an unfold construct, and for a good reason: for loops are more readable in this case and most likely faster too. As for making it more Pythonic, I suggest lifting the pairwise iteration out into a function and creating an explicit loop variable.
def pairwise(seq):
    a, b = tee(seq)
    next(b)
    return zip(a, b)

def unfold_over_pairwise(unfolder, seq, initial):
    state = initial
    for cur_item, next_item in pairwise(seq):
        state = unfolder(state, cur_item, next_item)
        yield state
b = [something]
b.extend(unfold_over_pairwise(someFunction, a, initial=b[-1]))
If the looping overhead really is a problem, then someFunction must be something really simple. In that case it probably is best to write the whole loop in a faster language, such as C.
Some loop or other will always be around, but one possibility that might reduce overhead is:
import itertools

def generate(a, item):
    a1, a2 = itertools.tee(a)
    next(a2)
    for x1, x2 in zip(a1, a2):
        item = someFunction(item, x1, x2)
        yield item
to be used as:
b.extend(generate(a, b[-1]))
Try something like this:
a = range(10)
b = [something]
s = len(b)
b += [0] * (len(a) - 1)
[b.__setitem__(i, someFunction(b[i-1], a[i-s], a[i-s+1])) for i in range(s, len(b))]
Also:
- using functions from itertools should be useful too (see the earlier posts)
- maybe you can rewrite someFunction and use map instead of a list comprehension
