Efficient enumeration of ordered subsets in Python

I'm not sure of the appropriate mathematical terminology for the code I'm trying to write. I'd like to generate combinations of unique integers, where "ordered subsets" of each combination are used to exclude certain later combinations.
Hopefully an example will make this clear:
from itertools import chain, combinations

mylist = range(4)
max_depth = 3
rev = chain.from_iterable(combinations(mylist, i) for i in xrange(max_depth, 0, -1))
for el in list(rev):
    print el
That code results in output that contains all the subsets I want, but also some extra ones that I do not. I have manually inserted comments to indicate which elements I don't want.
(0, 1, 2)
(0, 1, 3)
(0, 2, 3)
(1, 2, 3)
(0, 1) # Exclude: (0, 1, _) occurs as part of (0, 1, 2) above
(0, 2) # Exclude: (0, 2, _) occurs above
(0, 3) # Keep
(1, 2) # Exclude: (1, 2, _) occurs above
(1, 3) # Keep: (_, 1, 3) occurs above, but (1, 3, _) does not
(2, 3) # Keep
(0,) # Exclude: (0, _, _) occurs above
(1,) # Exclude: (1, _, _) occurs above
(2,) # Exclude: (2, _) occurs above
(3,) # Keep
Thus, the desired output of my generator or iterator would be:
(0, 1, 2)
(0, 1, 3)
(0, 2, 3)
(1, 2, 3)
(0, 3)
(1, 3)
(2, 3)
(3,)
I know I could make a list of all the (wanted and unwanted) combinations and then filter out the ones I don't want, but I was wondering if there was a more efficient, generator or iterator based way.

You are trying to exclude any combination that is a prefix of a previously-returned combination. Doing so is straightforward.
If a tuple t has length max_depth, it can't be a prefix of a previously-returned tuple, since any tuple it's a prefix of would have to be longer.
If a tuple t ends with mylist[-1], then it can't be a prefix of a previously-returned tuple, since there are no elements that could legally be added to the end of t to extend it.
If a tuple t has length less than max_depth and does not end with mylist[-1], then t is a prefix of the previously-returned tuple t + (mylist[-1],), and t should not be returned.
Thus, the combinations you should generate are exactly the ones of length max_depth and the shorter ones that end with mylist[-1]. The following code does so, in exactly the same order as your original code, and correctly handles cases like max_depth > len(mylist):
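from itertools import combinations

def nonprefix_combinations(iterable, maxlen):
    iterable = list(iterable)
    if not (iterable and maxlen):
        return
    # Every combination of the maximum length is kept.
    for comb in combinations(iterable, maxlen):
        yield comb
    # Shorter combinations are kept only when they end with the last element.
    for length in xrange(maxlen-2, -1, -1):
        for comb in combinations(iterable[:-1], length):
            yield comb + (iterable[-1],)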
(I've assumed here that when max_depth == 0 you still don't want the empty tuple in your output, even though in that case it isn't a prefix of a previously-returned tuple. If you do want the empty tuple in this case, change if not (iterable and maxlen) to if not iterable.)
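One way to gain confidence in the three-case argument is to compare the generator with the brute-force filter the question mentions. The sketch below is a Python 3 port of the generator (range instead of xrange, yield from) plus a hypothetical brute_force helper, not from the original posts, that keeps a combination only if it is not a prefix of one already kept:

```python
from itertools import chain, combinations

def nonprefix_combinations(iterable, maxlen):
    # Python 3 port of the generator above.
    items = list(iterable)
    if not (items and maxlen):
        return
    # Every combination of length maxlen is kept.
    yield from combinations(items, maxlen)
    # A shorter combination is kept only if it ends with items[-1].
    for length in range(maxlen - 2, -1, -1):
        for comb in combinations(items[:-1], length):
            yield comb + (items[-1],)

def brute_force(iterable, maxlen):
    # Hypothetical reference: generate everything (longest first, as in the
    # question) and drop any tuple that is a prefix of one already kept.
    items = list(iterable)
    kept = []
    kept_prefixes = set()
    candidates = chain.from_iterable(
        combinations(items, i) for i in range(maxlen, 0, -1))
    for comb in candidates:
        if comb in kept_prefixes:
            continue
        kept.append(comb)
        # Record every proper prefix of the tuple just kept.
        for k in range(1, len(comb)):
            kept_prefixes.add(comb[:k])
    return kept

print(list(nonprefix_combinations(range(4), 3)))
print(list(nonprefix_combinations(range(4), 3)) == brute_force(range(4), 3))  # True
```

The brute-force version needs the whole list of candidates plus a set of prefixes, which is exactly the overhead the direct generator avoids.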

I noticed an interesting pattern in your desired output and I have a generator that produces that. Does this work for all your cases?
from itertools import combinations

def orderedSetCombination(iterable, r):
    # iterable must support indexing and slicing (e.g. a list)
    # Get the last element of the iterable
    last = (iterable[-1], )
    # yield all the combinations of the iterable without the
    # last element
    for comb in combinations(iterable[:-1], r):
        yield comb
    # while r > 1, reduce r by 1 and yield all the combinations
    # of the other elements with the last element appended
    while r > 1:
        r -= 1
        for comb in combinations(iterable[:-1], r):
            yield comb + last
    # yield the last item
    yield last

items = [0, 1, 2, 3]
for el in list(orderedSetCombination(items, 3)):
    print(el)
Here is my explanation of the logic:
# All combinations that do not include the last element of the iterable,
# taking r = max_depth items at a time
(0,1,2)
# From here on, it's the combinations of all the elements except
# the last one, with the last element appended to each,
# so here we take r = r - 1 items at a time and add the last element
# combinations([0,1,2], r=2)
(0,1,3)
(0,2,3)
(1,2,3)
# the only possible value right now at index r = 2 is the last element (3)
# since all possible values of (0,1,_) (0,2,_) (1,2,_) are already listed
# So reduce r by 1 again and continue: combinations([0,1,2], r=1)
(0, 3)
(1, 3)
(2, 3)
# continue until r == 0 and then yield the last element
(3,)

Related

I'm not able to understand this tuple code

init_tuple = [(0, 1), (1, 2), (2, 3)]
result = sum(n for _, n in init_tuple)
print(result)
The output for this code is 6. Could someone explain how it worked?
Your code extracts each tuple and sums all values in the second position (i.e. [1]).
If you rewrite it in loops, it may be easier to understand:
init_tuple = [(0, 1), (1, 2), (2, 3)]
result = 0
for (val1, val2) in init_tuple:
    result = result + val2
print(result)
The expression (n for _, n in init_tuple) is a generator expression. You can iterate on such an expression to get all the values it generates. In that case it reads as: generate the second component of each tuple of init_tuple.
(Note on _: The _ here stands for the first component of the tuple. It is common in python to use this name when you don't care about the variable it refers to (i.e., if you don't plan to use it) as it is the case here. Another way to write your generator would then be (tup[1] for tup in init_tuple))
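Because a generator expression is lazy, one way to see what it produces is to materialize it with list(). This short Python 3 sketch does that for both spellings mentioned above and then sums the values; note that a generator can only be consumed once, so a fresh expression is needed for each pass:

```python
init_tuple = [(0, 1), (1, 2), (2, 3)]

# Materialize the generator to see every value it yields.
second_items = list(n for _, n in init_tuple)
print(second_items)                 # [1, 2, 3]

# Indexing each tuple gives the same values.
same_items = list(tup[1] for tup in init_tuple)
print(same_items == second_items)   # True

# A generator is exhausted after one pass: the second list() is empty.
gen = (n for _, n in init_tuple)
first_pass = list(gen)
second_pass = list(gen)
print(first_pass, second_pass)      # [1, 2, 3] []

print(sum(n for _, n in init_tuple))  # 6
```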
You can iterate over a generator expression using a for loop. For example:
>>> for x in (n for _, n in init_tuple):
...     print(x)
1
2
3
And of course, since you can iterate on a generator expression, you can sum it as you have done in your code.
To get a better understanding, first look at this:
init_tuple = [(0, 1), (1, 2), (2, 3)]
total = 0
for x, y in init_tuple:
    total = total + y
print(total)
This code calculates the sum of the second elements of the tuples, so it is equivalent to your code: both do the same job.
for x, y in init_tuple:
Here x holds the first value of each tuple and y holds the second. In the first iteration:
x = 0, y = 1,
then in the second iteration:
x = 1, y = 2, and so on.
In your case you don't need the first element of the tuple, so you just use _ instead of a named variable.

Generating 3-tuples from a set of 2-tuples

In an earlier question:
Generating maximum number of 3-tuples from a list of 2-tuples
I got an answer from @AChampion that seems to work if the number of 2-tuples is divisible by 3. However, the solution fails if we have, for example, 10 2-tuples. After fumbling with it for a while, I'm under the impression that it is impossible to find a perfect solution for, say:
(1,2),(1,3),(1,4),(2,3),(2,4),(3,4)
So I'm interested in finding one solution that minimizes the number of remainder tuples. In the example above the result could be:
(1,2,3) # derived from (1,2), (1,3), (2,3)
(1,4),(2,4),(3,4) # remainder tuples
The rule for generating 3-tuple from 3 2-tuple is:
(a,b), (b,c), (c,a) -> (a, b, c)
i.e. the three 2-tuples form a cycle of length 3. The order of the elements in a 3-tuple is not important, i.e.:
(a,b,c) == (c,a,b)
I'm actually interested in the case where we have a number n:
a = []
for x in range(1,n+1):
    for y in range(1,n+1):
        if x!=y:
            a.append((x,y))
# a = [ (1,2),...,(1,n), (2,1),(2,3),...,(2,n),...(n,1),...,(n,n-1) ]
From a, minimize the number of 2-tuples that are left over when producing 3-tuples. Each 2-tuple can only be used once.
I wrapped my brain around this for several hours but I can't seem to come up with an elegant solution (well, neither have I found an ugly one:-) for the general case. Any thoughts?
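As an aside, the nested loops that build a produce exactly the 2-permutations of 1..n, in the same order, so itertools.permutations can build the list directly. A small Python 3 check (n = 4 is an arbitrary example size):

```python
from itertools import permutations

n = 4  # arbitrary example size

# The question's nested loops
a = []
for x in range(1, n + 1):
    for y in range(1, n + 1):
        if x != y:
            a.append((x, y))

# The same list built with itertools.permutations
b = list(permutations(range(1, n + 1), 2))
print(a == b)   # True
print(len(b))   # n * (n - 1) = 12
```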
To do this, first create the number combinations that will be used as replacements. Then loop over your data looking for three items that match one of those combinations, and replace them.
I have done this in several steps.
from itertools import combinations

# create replacement elements
number_combinations_raw = list(combinations(range(1, 5), 3))

# keep only runs of consecutive numbers, e.g. (1, 2, 3)
number_combinations = []
for item in number_combinations_raw:
    if (item[0] + 1 == item[1]) and (item[1] + 1 == item[2]):
        number_combinations.append(item)

# create test data
data = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4)]

# reduce data: find number sets for which all three 2-tuples are present
reduce_data = []
for number_set in number_combinations:
    count = 0
    merged_data = []
    for item in data:
        if (number_set[0] in item and number_set[1] in item) or (number_set[1] in item and number_set[2] in item) \
                or (number_set[0] in item and number_set[2] in item):
            merged_data.append(item)
            count += 1
    if count == 3:
        reduce_data.append((number_set, merged_data))

# delete merged elements from data list and add replacement
# (done once per reduce_item, not inside a loop over data)
for reduce_item in reduce_data:
    for element in reduce_item[1]:
        if element in data:
            data.remove(element)
    data = [reduce_item[0]] + data

# remove any duplicated replaced elements
final_list = list(dict.fromkeys(data))
print(final_list)
Output:
[(1, 2, 3), (1, 4), (2, 4)]
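For the general n case in the question, where a holds every ordered pair (x, y) with x != y and a 3-tuple needs the directed cycle (a, b), (b, c), (c, a), a greedy heuristic is one possible starting point. The sketch below (greedy_triangles is a hypothetical name) consumes each cycle it finds and reports the leftover pairs. Being greedy, it is not guaranteed to minimize the remainder:

```python
from itertools import permutations

def greedy_triangles(pairs):
    # Greedy heuristic (not guaranteed optimal): scan the pairs in sorted order
    # and, for each unused (a, b), look for c such that (b, c) and (c, a) are
    # also unused; if found, consume all three and record (a, b, c).
    unused = set(pairs)
    triples = []
    for a, b in sorted(pairs):
        if (a, b) not in unused:
            continue
        # sorted() materializes the candidates before unused is modified
        for c in sorted(y for (x, y) in unused if x == b):
            if c != a and (c, a) in unused:
                unused -= {(a, b), (b, c), (c, a)}
                triples.append((a, b, c))
                break
    return triples, sorted(unused)

n = 4  # example size
a = list(permutations(range(1, n + 1), 2))
triples, leftover = greedy_triangles(a)
print(triples)
print(leftover)
```

For an exact minimum the problem has to be treated as a packing problem; the greedy pass only gives an upper bound on the remainder.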

How to iterate over lists from the middle out

I would like to iterate over all lists/tuples of length n with elements from -s...s. Currently I do this with:
for k in itertools.product(range(-s,s+1), repeat = n):
    #process k and maybe print out the result
However, this is not useful for me, as there are a huge number of such tuples and my code may never terminate. I would really like to start with the most interesting ones first. In this case, the order I would like for the iteration is:
All tuples that contain only 0 (there is only one)
All tuples that contain only 0, 1 and -1 excluding those tuples we have already seen.
All tuples that contain only 0, 1, -1, 2 and -2, excluding those tuples we have already seen.
And so on...
How can one do this?
How about this:
import itertools

def sorted_tuples(length, max_s):
    nums = [0]
    for s in range(max_s):
        for p in itertools.combinations_with_replacement(nums, length):
            if s in p or -s in p:
                yield p
        nums = [-(s+1)] + nums + [s+1]

for i in sorted_tuples(3,2):
    print(i)
# prints the following
(0, 0, 0)
(-1, -1, -1)
(-1, -1, 0)
(-1, -1, 1)
(-1, 0, 0)
(-1, 0, 1)
(-1, 1, 1)
(0, 0, 1)
(0, 1, 1)
(1, 1, 1)
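Note that combinations_with_replacement yields each multiset only once, in sorted order, so tuples such as (0, -1, 0) or (1, 0, 0) never appear, and max_s is exclusive here (sorted_tuples(3, 2) only reaches ±1). If every tuple from the question's itertools.product loop is wanted, the same layer-by-layer filter can be applied to product itself. A Python 3 sketch (middle_out is a hypothetical name); with s_max=None it never stops on its own, so it is meant to be combined with itertools.islice or a break:

```python
from itertools import count, islice, product

def middle_out(n, s_max=None):
    # Layer s holds the tuples whose largest absolute value is exactly s.
    # With s_max=None the layers continue forever.
    layers = count() if s_max is None else range(s_max + 1)
    for s in layers:
        for t in product(range(-s, s + 1), repeat=n):
            # All entries lie in [-s, s]; keep only tuples that reach -s or s,
            # i.e. the ones not already produced by an earlier layer.
            if s in t or -s in t:
                yield t

print(list(middle_out(2, 1)))
# [(0, 0), (-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
print(list(islice(middle_out(3), 10)))  # first 10 tuples of the unbounded stream
```

Each layer regenerates the inner tuples and discards them, which costs a constant factor but keeps the generator simple and memory-free.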
So the work your code does on each list is much more expensive than sorting?
Then you can sort the list of these lists with a key argument. The things you call lists are actually tuples, right? At least they are in my Python 2.7 itertools. abs cannot be applied to a tuple directly, so I would convert them to NumPy arrays. Then the sort call is:
import numpy as np
lists.sort(key = lambda t: np.max(np.abs(np.array(t))))
Does this work fast enough?
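The same key can also be written in plain Python by applying abs to each element rather than to the tuple, which avoids the NumPy dependency. A sketch, with the caveat that sorting still requires building the full list of (2s+1)^n tuples first. Python's sort is stable, so tuples within one layer keep their product order:

```python
from itertools import product

s, n = 2, 3  # example bounds
lists = list(product(range(-s, s + 1), repeat=n))
# Key: largest absolute value in the tuple, so the all-zero tuple comes
# first, then the ±1 layer, then the ±2 layer.
lists.sort(key=lambda t: max(abs(v) for v in t))
print(lists[:3])  # [(0, 0, 0), (-1, -1, -1), (-1, -1, 0)]
```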

Remove duplicate tuples from a list if they are exactly the same including order of items

I know questions similar to this have been asked many, many times on Stack Overflow, but I need to remove duplicate tuples from a list only when their elements match and are in the same order. In other words, (4,3,5) and (3,4,5) would both be present in the output, while if there were two copies of (3,3,5), only one would be in the output.
Specifically, my code is:
import itertools
x = [1,1,1,2,2,2,3,3,3,4,4,5]
y = []
for t in itertools.combinations(x,3):
    y.append(t)
print(y)
of which the output is quite lengthy. For example, in the output, there should be both (1,2,1) and (1,1,2). But there should only be one (1,2,2).
set will take care of that:
>>> a = [(1,2,2), (2,2,1), (1,2,2), (4,3,5), (3,3,5), (3,3,5), (3,4,5)]
>>> set(a)
set([(1, 2, 2), (2, 2, 1), (3, 4, 5), (3, 3, 5), (4, 3, 5)])
>>> list(set(a))
[(1, 2, 2), (2, 2, 1), (3, 4, 5), (3, 3, 5), (4, 3, 5)]
>>>
set will remove only exact duplicates.
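A set does not keep the original order, as the output above shows. If the first-seen order matters, one option in Python 3.7+ (where dicts preserve insertion order) is dict.fromkeys, which likewise removes only exact duplicates:

```python
a = [(1,2,2), (2,2,1), (1,2,2), (4,3,5), (3,3,5), (3,3,5), (3,4,5)]
# dict keys are unique and keep insertion order, so this keeps the first
# occurrence of each tuple and drops later exact duplicates.
unique = list(dict.fromkeys(a))
print(unique)  # [(1, 2, 2), (2, 2, 1), (4, 3, 5), (3, 3, 5), (3, 4, 5)]
```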
What you need is unique permutations rather than combinations:
y = list(set(itertools.permutations(x,3)))
That is, (1,2,2) and (2,1,2) are considered the same combination, and only one of them will be returned. They are, however, different permutations. Use set() to remove duplicates.
If afterwards you want to sort elements within each tuple and also have the whole list sorted, you can do:
y = [tuple(sorted(q)) for q in y]
y.sort()
No need for a for loop; combinations already returns an iterator that can be passed straight to set().
import itertools
x = [1,1,1,2,2,2,3,3,3,4,4,5]
y = list(set(itertools.combinations(x,3)))
This will probably do what you want, but it's vast overkill. It's a low-level prototype for a generator that may be added to itertools some day. It's low level to ease re-implementing it in C. Where N is the length of the iterable input, it requires worst-case space O(N) and does at most N*(N-1)//2 element comparisons, regardless of how many anagrams are generated. Both of those are optimal ;-)
You'd use it like so:
>>> x = [1,1,1,2,2,2,3,3,3,4,4,5]
>>> for t in anagrams(x, 3):
...     print(t)
(1, 1, 1)
(1, 1, 2)
(1, 1, 3)
(1, 1, 4)
(1, 1, 5)
(1, 2, 1)
...
There will be no duplicates in the output. Note: this is Python 3 code. It needs a few changes to run under Python 2.
import operator

class ENode:
    def __init__(self, initial_index=None):
        self.indices = [initial_index]
        self.current = 0
        self.prev = self.next = self

    def index(self):
        "Return current index."
        return self.indices[self.current]

    def unlink(self):
        "Remove self from list."
        self.prev.next = self.next
        self.next.prev = self.prev

    def insert_after(self, x):
        "Insert node x after self."
        x.prev = self
        x.next = self.next
        self.next.prev = x
        self.next = x

    def advance(self):
        """Advance the current index.
        If we're already at the end, remove self from list.
        .restore() undoes everything .advance() did."""
        assert self.current < len(self.indices)
        self.current += 1
        if self.current == len(self.indices):
            self.unlink()

    def restore(self):
        "Undo what .advance() did."
        assert self.current <= len(self.indices)
        if self.current == len(self.indices):
            self.prev.insert_after(self)
        self.current -= 1

def build_equivalence_classes(items, equal):
    ehead = ENode()
    for i, elt in enumerate(items):
        e = ehead.next
        while e is not ehead:
            if equal(elt, items[e.indices[0]]):
                # Add (index of) elt to this equivalence class.
                e.indices.append(i)
                break
            e = e.next
        else:
            # elt not equal to anything seen so far: append
            # new equivalence class.
            e = ENode(i)
            ehead.prev.insert_after(e)
    return ehead

def anagrams(iterable, count=None, equal=operator.__eq__):
    def perm(i):
        if i:
            e = ehead.next
            assert e is not ehead
            while e is not ehead:
                result[count - i] = e.index()
                e.advance()
                yield from perm(i-1)
                e.restore()
                e = e.next
        else:
            yield tuple(items[j] for j in result)

    items = tuple(iterable)
    if count is None:
        count = len(items)
    if count > len(items):
        return
    ehead = build_equivalence_classes(items, equal)
    result = [None] * count
    yield from perm(count)
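For comparison, a much shorter sketch (a different and less economical technique than the one above, with the hypothetical name distinct_permutations) produces the same distinct r-permutations by recursing over a collections.Counter of the remaining values. Unlike anagrams, it assumes the values are hashable and mutually orderable, and it does not have the comparison-count guarantee:

```python
from collections import Counter
from itertools import permutations

def distinct_permutations(iterable, r=None):
    # How many copies of each value are still available.
    counts = Counter(iterable)
    values = sorted(counts)  # assumes the values can be ordered
    if r is None:
        r = sum(counts.values())
    prefix = []

    def extend(depth):
        if depth == r:
            yield tuple(prefix)
            return
        # Try each distinct value once per position, so no duplicates arise.
        for v in values:
            if counts[v]:
                counts[v] -= 1
                prefix.append(v)
                yield from extend(depth + 1)
                prefix.pop()
                counts[v] += 1

    yield from extend(0)

x = [1,1,1,2,2,2,3,3,3,4,4,5]
result = list(distinct_permutations(x, 3))
print(result[:6])  # [(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 1, 4), (1, 1, 5), (1, 2, 1)]
print(result == sorted(set(permutations(x, 3))))  # True
```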
You were really close. Just get permutations, not combinations. Order matters in permutations, and it does not in combinations. Thus (1, 2, 2) is a distinct permutation from (2, 2, 1). However, (1, 2, 2) is considered a single combination of one 1 and two 2s, so (2, 2, 1) is not considered a distinct combination from (1, 2, 2).
You can convert your list y to a set so that you remove duplicates...
import itertools
x = [1,1,1,2,2,2,3,3,3,4,4,5]
y = []
for t in itertools.permutations(x,3):
    y.append(t)
print(set(y))
And voila, you are done. :)
Using a set should probably work. A set is basically a container that doesn't contain any duplicated elements.
Python also includes a data type for sets. A set is an unordered collection with no duplicate elements. Basic uses include membership testing and eliminating duplicate entries. Set objects also support mathematical operations like union, intersection, difference, and symmetric difference.
import itertools
x = [1,1,1,2,2,2,3,3,3,4,4,5]
y = set()
for t in itertools.combinations(x,3):
    y.add(t)
print(y)

Python: fast dictionary of big int keys

I have a list of more than 10,000 int items. The values of the items can be very large, up to 10^27. I want to create all pairs of the items and calculate their sums. Then I want to look for different pairs with the same sum.
For example:
l[0] = 4
l[1] = 3
l[2] = 6
l[3] = 1
...
pairs[10] = [(0,2)] # 10 is the sum of the values of l[0] and l[2]
pairs[7] = [(0,1), (2,3)] # 7 is the sum of the values of l[0] and l[1] or l[2] and l[3]
pairs[5] = [(0,3)]
pairs[9] = [(1,2)]
...
The contents of pairs[7] is what I am looking for. It gives me two pairs with the same value sum.
I have implemented it as follows, and I wonder if it can be done faster. Currently, for 10,000 items it takes more than 6 hours on a fast machine. (As I said, the values of l, and therefore the keys of pairs, are ints up to 10^27.)
l = [4,3,6,1]
pairs = {}
for i in range( len( l ) ):
    for j in range(i+1, len( l ) ):
        s = l[i] + l[j]
        if s not in pairs:
            pairs[s] = []
        pairs[s].append((i,j))
# pairs = {9: [(1, 2)], 10: [(0, 2)], 4: [(1, 3)], 5: [(0, 3)], 7: [(0, 1), (2, 3)]}
Edit: I want to add some background, as asked by Simon Stelling.
The goal is to find Formal Analogies like
lays : laid :: says : said
within a list of words like
[ lays, lay, laid, says, said, foo, bar ... ]
I already have a function analogy(a,b,c,d) giving True if a : b :: c : d. However, I would need to check all possible quadruples created from the list, which would be a complexity of around O((n^4)/2).
As a pre-filter, I want to use the char-count property: every character has the same count in (a,d) as in (b,c). For instance, "layssaid" contains 2 a's, and so does "laidsays".
So the idea until now was:
for every word, create a "char count vector" and represent it as an integer (the items in the list l)
create all pairings in pairs and see if there are "pair clusters", i.e. more than one pair for a particular char count vector sum.
And it works, it's just slow. The complexity is down to around O((n^2)/2), but this is still a lot, especially since the dictionary lookup and insert are performed that many times.
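The question doesn't say how the char count vector is packed into an integer. One plausible encoding, purely an assumption for illustration, stores each letter's count as one digit in a fixed base. As long as no digit overflows when two keys are added, the char-count condition on (a, d) and (b, c) becomes the integer equality l[a] + l[d] == l[b] + l[c], which is exactly what grouping pair sums detects:

```python
from collections import Counter
from string import ascii_lowercase

# Assumed base: if no letter occurs 8 or more times in a single word, the
# sum of two counts is at most 14 < 16, so adding two keys never carries
# from one letter's digit into the next.
BASE = 16

def char_count_key(word):
    # Hypothetical helper: one base-16 digit per letter a..z;
    # other characters are ignored.
    counts = Counter(word)
    return sum(counts[ch] * BASE ** i for i, ch in enumerate(ascii_lowercase))

words = ["lays", "laid", "says", "said"]
l = [char_count_key(w) for w in words]
# lays + said uses the same letters as laid + says
print(l[0] + l[3] == l[1] + l[2])  # True
```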
There are the trivial optimizations like caching constant values in a local variable and using xrange instead of range:
pairs = {}
len_l = len(l)
for i in xrange(len_l):
    for j in xrange(i+1, len_l):
        s = l[i] + l[j]
        res = pairs.setdefault(s, [])
        res.append((i,j))
However, it is probably far wiser not to pre-calculate the list and instead to optimize the method on a conceptual level. What is the intrinsic goal you want to achieve? Do you really just want to calculate what you do? Or are you going to use that result for something else? What is that something else?
Just a hint: have a look at itertools.combinations.
This is not exactly what you are looking for (because it stores pair of values, not of indexes), but it can be a starting code:
from itertools import combinations
l = [4,3,6,1]
pairs = {}
for (a, b) in combinations(l, 2):
    pairs.setdefault(a + b, []).append((a, b))
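To store index pairs rather than values, as the question wants, one sketch is to run combinations over enumerate(l). Using collections.defaultdict(list) also avoids building a throwaway empty list on every setdefault call. This is still O(n^2) in the number of items:

```python
from collections import defaultdict
from itertools import combinations

l = [4, 3, 6, 1]
pairs = defaultdict(list)
# combinations over enumerate(l) yields ((i, l[i]), (j, l[j])) with i < j
for (i, a), (j, b) in combinations(enumerate(l), 2):
    pairs[a + b].append((i, j))

# Only sums shared by more than one pair are interesting.
clusters = {s: idx for s, idx in pairs.items() if len(idx) > 1}
print(clusters)  # {7: [(0, 1), (2, 3)]}
```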
The comment above from Simon Stelling is correct: generating all possible pairs is fundamentally slow, and there's nothing you can do about it aside from altering your algorithm. One option from itertools is product (note that, unlike combinations, product(l, repeat=2) also yields each pair in both orders, plus every element paired with itself), and you can get some minor improvements from not creating extra variables or doing unnecessary list indexing, but under the hood these are all still O(n^2). Here's how I would do it:
from itertools import product
l = [4,3,6,1]
pairs = {}
for (m,n) in product(l,repeat=2):
    pairs.setdefault(m+n, []).append((m,n))
Finally, I came up with my own solution, which takes only about half the calculation time on average.
The basic idea: Instead of reading and writing into the growing dictionary n^2 times, I first collect all the sums in a list. Then I sort the list. Within the sorted list, I then look for same neighbouring items.
This is the code:
from operator import itemgetter

def getPairClusters( l ):
    # first, we just store all possible pairs sequentially
    # clustering will happen later
    pairs = []
    for i in xrange( len( l) ):
        for j in xrange(i+1, len( l ) ):
            pair = l[i] + l[j]
            pairs.append( ( pair, i, j ) )
    pairs.sort(key=itemgetter(0))
    # pairs = [ (4, 1, 3), (5, 0, 3), (7, 0, 1), (7, 2, 3), (9, 1, 2), (10, 0, 2)]
    # a list item of pairs now contains a tuple (like (4, 1, 3)) with
    # * the sum of two l items: 4
    # * the index of the two l items: 1, 3
    # now clustering starts
    # we want to find neighbouring items as
    # (7, 0, 1), (7, 2, 3)
    # (since 7=7)
    pairClusters = []
    # flag if we are within a cluster
    # while iterating over pairs list
    withinCluster = False
    # iterate over pair list
    for i in xrange(len(pairs)-1):
        if not withinCluster:
            if pairs[i][0] == pairs[i+1][0]:
                # if not within a cluster
                # and found 2 neighbouring same numbers:
                # init new cluster
                pairCluster = [ ( pairs[i][1], pairs[i][2] ) ]
                withinCluster = True
        else:
            # if still within cluster
            if pairs[i][0] == pairs[i+1][0]:
                pairCluster.append( ( pairs[i][1], pairs[i][2] ) )
            # else cluster has ended
            # (next neighbouring item has different number)
            else:
                pairCluster.append( ( pairs[i][1], pairs[i][2] ) )
                pairClusters.append(pairCluster)
                withinCluster = False
    # a cluster that runs to the end of the list has not been closed yet
    if withinCluster:
        pairCluster.append( ( pairs[-1][1], pairs[-1][2] ) )
        pairClusters.append(pairCluster)
    return pairClusters

l = [4,3,6,1]
print getPairClusters(l)
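The same sort-then-scan idea can be expressed more compactly with itertools.groupby, which also takes care of a cluster that runs to the very end of the sorted list. This is a Python 3 sketch, not the poster's code; pair_clusters is a hypothetical name:

```python
from itertools import combinations, groupby
from operator import itemgetter

def pair_clusters(values):
    # (sum, i, j) for every pair of indices i < j.
    sums = [(values[i] + values[j], i, j)
            for i, j in combinations(range(len(values)), 2)]
    # Stable sort on the sum only, so equal sums keep their (i, j) order.
    sums.sort(key=itemgetter(0))
    clusters = []
    for _, group in groupby(sums, key=itemgetter(0)):
        members = [(i, j) for _, i, j in group]
        if len(members) > 1:  # keep only sums shared by two or more pairs
            clusters.append(members)
    return clusters

print(pair_clusters([4, 3, 6, 1]))  # [[(0, 1), (2, 3)]]
print(pair_clusters([1, 5, 5, 5]))  # [[(0, 1), (0, 2), (0, 3)], [(1, 2), (1, 3), (2, 3)]]
```

The second call has its largest sum shared by several pairs, which is the case a manual index-based scan can easily miss.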
