Python: removing tuples from a list that satisfy given conditions

I have a list of tuples and I want to remove tuples so that, for any given length and sum, only one tuple in the list has that length and sum.
That's a bad explanation, so here is an example:
[(0,1,2), (0,2,1), (0,0,1)]
Here I want to remove either (0,1,2) or (0,2,1), because both have length 3 and sum 3.
I want to be able to iterate through the list and remove any tuples that satisfy the following condition:
len(tuple1) == len(tuple2) and sum(tuple1) == sum(tuple2)
but keep either tuple1 or tuple2 in the list.
I tried:
for t1 in list:
    for t2 in list:
        if len(t1) == len(t2) and sum(t1) == sum(t2):
            list.remove(t1)
but I'm pretty sure this removes all the tuples, and the console crashed.
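Two things go wrong in the attempt above. First, the inner loop also compares each tuple with itself, so the condition is true for every tuple. Second, removing items from a list while looping over it shifts the remaining items, so some of them get skipped. The following minimal sketch uses illustrative variable names. It checks the first point. It then avoids mutating the list: a tuple is kept only if no *earlier* tuple has the same length and sum.

```python
data = [(0, 1, 2), (0, 2, 1), (0, 0, 1)]

# Every tuple matches itself, so this condition holds for all of them.
matches = [t1 for t1 in data
           if any(len(t1) == len(t2) and sum(t1) == sum(t2) for t2 in data)]
print(matches == data)  # True

# Compare each tuple only with the tuples before it and build a new list.
kept = [t for i, t in enumerate(data)
        if not any(len(t) == len(u) and sum(t) == sum(u) for u in data[:i])]
print(kept)  # [(0, 1, 2), (0, 0, 1)]
```

This is quadratic in the length of the list. Remembering the (len, sum) keys already seen in a set scales better.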

In essence this is a "uniqueness filter", but one where we specify a function f and filter out an element whenever its f(x) value has already occurred.
We can implement such a uniqueness filter, provided f(x) produces hashable values, with:
def uniq(iterable, key=lambda x: x):
    seen = set()
    for item in iterable:
        u = key(item)
        if u not in seen:
            yield item
            seen.add(u)
Then we can use this filter as:
result = list(uniq(data, lambda x: (len(x), sum(x))))
For example, with data = [(0,1,2), (0,2,1), (0,0,1)]:
>>> list(uniq(data, lambda x: (len(x), sum(x))))
[(0, 1, 2), (0, 0, 1)]
Here we will always retain the first occurrence of the "duplicates".

Let me offer a slightly different solution. It is not something I'd use for a one-off script, but it suits a real project, where your tuples such as (0, 0, 1) actually represent something logical or physical.
set(...) removes duplicates. How about we use that? The only thing to keep in mind is that the hash value and equality of the elements need to be modified so that they depend only on (len, sum).
class Converted(object):
    def __init__(self, tup):
        self.tup = tup
        self.transformed = len(tup), sum(tup)
    def __eq__(self, other):
        return self.transformed == other.transformed
    def __hash__(self):
        return hash(self.transformed)
inp = [(0,1,2), (0,2,1), (0,0,1)]
out = [x.tup for x in set(map(Converted, inp))]
print(out)
# [(0, 0, 1), (0, 1, 2)]  (the iteration order of a set is not guaranteed)
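On Python 3.7+, dataclasses can generate the same __eq__ and __hash__ pair. In the following minimal sketch, only the key field takes part in comparison and hashing. The class name LenSumKey and the from_tuple helper are illustrative and do not come from the answer.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class LenSumKey:
    key: tuple                          # (len, sum): used by __eq__ and __hash__
    tup: tuple = field(compare=False)   # original tuple, ignored when comparing

    @classmethod
    def from_tuple(cls, tup):
        return cls((len(tup), sum(tup)), tup)

inp = [(0, 1, 2), (0, 2, 1), (0, 0, 1)]
out = [x.tup for x in set(map(LenSumKey.from_tuple, inp))]
print(sorted(out))  # [(0, 0, 1), (0, 1, 2)]
```

With frozen=True the generated __hash__ uses only the fields that take part in comparison. As with the original class, set iteration order is not guaranteed, hence the sorted() call.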

You can also use groupby to group elements by sum and len and fetch 1 element from each group to create a new list:
from itertools import groupby
def _key(t):
    return (len(t), sum(t))
data = [(0, 1, 2), (0, 2, 1), (0, 0, 1), (1, 0, 0), (0, 1, 0), (3, 0, 0, 0)]
result = []
for k, g in groupby(sorted(data, key=_key), key=_key):
    result.append(next(g))
print(result)
# [(0, 0, 1), (0, 1, 2), (3, 0, 0, 0)]

The complexity of your problem comes mainly from the fact that you have two independent filters you want to implement. A good way to filter data with this sort of requirement is to use groupby. However, groupby only groups consecutive items with equal keys, so you need to sort first. Since you normally sort over one key, you'll need to sort twice before you can group:
from itertools import groupby
def lensumFilter(data):
    return [next(g) for _, g in groupby(sorted(sorted(data, key=len), key=sum),
                                        key=lambda x: (len(x), sum(x)))]
>>> print(lensumFilter([(0, 1, 2), (0, 2, 1), (0, 0, 1)]))
[(0, 0, 1), (0, 1, 2)]
>>> print(lensumFilter([(0, 1, 2), (0, 2, 1), (0, 0, 0, 3), (0, 0, 1)]))
[(0, 0, 1), (0, 1, 2), (0, 0, 0, 3)]
>>> print(lensumFilter([(0, 1, 2), (0, 2, 2), (0, 4), (0, 0, 0, 5), (0, 0, 3)]))
[(0, 1, 2), (0, 4), (0, 2, 2), (0, 0, 0, 5)]
Note that if you change how the sorts work, you change how the output will look. For instance, I sorted on length and then sum so my results are in order with respect to sum (smallest sum first) and then in order with respect to length (fewest number of elements first) within sum-groupings. That's why (0, 1, 2) comes before (0, 4) but (0, 4) comes before (0, 2, 2).
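Python's sort is stable, so the two sorts can also be folded into a single sort on a (sum, len) key, which gives the same order. A sketch (lensum_filter is just an illustrative name):

```python
from itertools import groupby

def lensum_filter(data):
    # One stable sort on (sum, len) orders items like sorting by len, then by sum.
    ordered = sorted(data, key=lambda t: (sum(t), len(t)))
    # groupby sees equal (len, sum) keys next to each other; keep the first of each run.
    return [next(g) for _, g in groupby(ordered, key=lambda t: (len(t), sum(t)))]

print(lensum_filter([(0, 1, 2), (0, 2, 2), (0, 4), (0, 0, 0, 5), (0, 0, 3)]))
# [(0, 1, 2), (0, 4), (0, 2, 2), (0, 0, 0, 5)]
```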

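If you want to do something concise and more Pythonic, you could use the built-in filter function.
It keeps all the elements that match your requirements (here: not having both the same sum and the same length as tup_remove), where tup_list is your list of tuples. Note that this also drops tup_remove itself, because it has its own sum and length, so add it back if you want to keep one of the matching tuples:
tup_remove = (0,2,1)
list(filter(lambda current_tup: not (sum(tup_remove) == sum(current_tup) and len(tup_remove) == len(current_tup)), tup_list))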
For better readability and extensibility, I would encourage you to use a function:
def not_same_sum_len_tuple(tup_to_check, current_tuple):
    """Return False only when both the sum and the length are the same."""
    same_sum = sum(tup_to_check) == sum(current_tuple)  # Check the sum
    same_len = len(tup_to_check) == len(current_tuple)  # Check the length
    return not (same_sum and same_len)
tup_remove = (0,2,1)
list(filter(lambda current_tup: not_same_sum_len_tuple(tup_remove, current_tup), tup_list))

It's probably easier to just make a new list that meets your conditions.
old_list = [(0,1,2), (0,2,1), (0,0,1)]
new_list = []
for old_t in old_list:
    for new_t in new_list:
        if len(old_t) == len(new_t) and sum(old_t) == sum(new_t):
            break
    else:
        # No break: no kept tuple has the same length and sum yet
        new_list.append(old_t)
# new_list == [(0, 1, 2), (0, 0, 1)]

This is a simpler solution but may not be performant. Just make a dict with (len(t), sum(t)) as keys and the tuples as values. The last tuple stays.
lst = [(0,1,2), (0,2,1), (0,0,1)]
d = {(len(t), sum(t)): t for t in lst}
list(d.values())
In one line:
list({(len(t), sum(t)): t for t in lst}.values())
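If the same tuples occur many times, you can also memoize len and sum. Note that len is already O(1), so the cache mainly saves repeated sum calls, and the cache itself adds some overhead: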
from functools import lru_cache
mlen, msum = (lru_cache(maxsize=None)(f) for f in (len, sum))
list({(mlen(t), msum(t)): t for t in lst}.values())
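If you would rather keep the first tuple of each group instead of the last, dict.setdefault only stores a value when the key is not present yet. A minimal sketch:

```python
lst = [(0, 1, 2), (0, 2, 1), (0, 0, 1)]

first = {}
for t in lst:
    # Stores t only if no tuple with the same (len, sum) has been seen.
    first.setdefault((len(t), sum(t)), t)
print(list(first.values()))  # [(0, 1, 2), (0, 0, 1)]

# For comparison, the dict comprehension keeps the last tuple of each group.
last = {(len(t), sum(t)): t for t in lst}
print(list(last.values()))   # [(0, 2, 1), (0, 0, 1)]
```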

Related

Python, permutation to permutation-index function

I have some permutations of a list:
>>> import itertools
>>> perms = list(itertools.permutations([0,1,2,3]))
>>> perms
[(0, 1, 2, 3), (0, 1, 3, 2), (0, 2, 1, 3), (0, 2, 3, 1), (0, 3, 1, 2), (0, 3, 2, 1), (1, 0, 2, 3), (1, 0, 3, 2), (1, 2, 0, 3), (1, 2, 3, 0), (1, 3, 0, 2), (1, 3, 2, 0), (2, 0, 1, 3), (2, 0, 3, 1), (2, 1, 0, 3), (2, 1, 3, 0), (2, 3, 0, 1), (2, 3, 1, 0), (3, 0, 1, 2), (3, 0, 2, 1), (3, 1, 0, 2), (3, 1, 2, 0), (3, 2, 0, 1), (3, 2, 1, 0)]
>>> len(perms)
24
What function can I use (without access to the list perms) to get the index of an arbitrary permutation, e.g. (0, 2, 3, 1) -> 3?
(You can assume that permuted elements are always an ascending list of integers, starting at zero.)
Hint: The factorial number system may be involved. https://en.wikipedia.org/wiki/Factorial_number_system
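To make the hint concrete: the index of a permutation in lexicographic order is its Lehmer code, read as a number in the factorial number system. The Lehmer code records, for each position, the number of later elements that are smaller. The following sketch uses illustrative function names. It is quadratic in the length of the permutation, which is fine for short permutations.

```python
from math import factorial

def lehmer_code(perm):
    # Digit i counts how many later elements are smaller than perm[i].
    return [sum(1 for later in perm[i + 1:] if later < x) for i, x in enumerate(perm)]

def perm_index(perm):
    # Read the Lehmer code as a factorial-base number: digit i has weight (n - 1 - i)!
    n = len(perm)
    return sum(d * factorial(n - 1 - i) for i, d in enumerate(lehmer_code(perm)))

print(lehmer_code((0, 2, 3, 1)))  # [0, 1, 1, 0]
print(perm_index((0, 2, 3, 1)))   # 3
```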
Off the top of my head I came up with the following; I didn't test it thoroughly.
from math import factorial
elements = list(range(4))
permutation = (3, 2, 1, 0)
index = 0
nf = factorial(len(elements))
for n in permutation:
    nf //= len(elements)
    index += elements.index(n) * nf
    elements.remove(n)
print(index)
EDIT: replaced nf /= len(elements) with nf //= len(elements)
I suppose this is a challenge, so here is my (recursive) answer:
import math
import itertools
def get_index(l):
    # In a real function, there should be more tests to validate that the input is valid, e.g. len(l)>0
    # Terminal case
    if len(l)==1:
        return 0
    # Number of possible permutations starting with l[0]
    span = math.factorial(len(l)-1)
    # Slightly modifying l[1:] to use the function recursively
    new_l = [ val if val < l[0] else val-1 for val in l[1:] ]
    # Actual solution
    return get_index(new_l) + span*l[0]
get_index((0,1,2,3))
# 0
get_index((0,2,3,1))
# 3
get_index((3,2,1,0))
# 23
get_index((4,2,0,1,5,3))
# 529
list(itertools.permutations((0,1,2,3,4,5))).index((4,2,0,1,5,3))
# 529
You need to write your own function. Something like this would work:
import math
def perm_loc(P):
    N = len(P)
    assert set(P) == set(range(N))
    def rec(perm):
        nums = set(perm)
        if not perm:
            return 0
        else:
            sub_res = rec(perm[1:])  # Result for tail of permutation
            sub_size = math.factorial(len(nums) - 1)  # How many tail permutations exist
            sub_index = sorted(nums).index(perm[0])  # Location of first element of the permutation
                                                     # in the sorted list of numbers
            return sub_index * sub_size + sub_res
    return rec(P)
The function that does all the work is rec, with perm_loc just serving as a wrapper around it. Note that this algorithm relies on the order in which itertools.permutations emits its results, which is lexicographic when the input is sorted.
The following code tests the above function. First on your sample, and then on all permutations of range(7):
print(perm_loc([0,2,3,1]))  # Print the result from the example
import itertools
def test(N):
    correct = 0
    perms = list(itertools.permutations(range(N)))
    for (i, p) in enumerate(perms):
        pl = perm_loc(p)
        if i == pl:
            correct += 1
        else:
            print(":: Incorrect", p, perms.index(p), pl)
    print(":: Found %d correct results" % correct)
test(7)  # Test on all permutations of range(7)
from math import factorial
def perm_to_permidx(perm):
    # Extract info
    n = len(perm)
    # "Gone"s will be the elements of the given perm that have already been fixed
    gones = []
    # According to each number in perm, we add the respective offsets
    offset = 0
    for i, num in enumerate(perm[:-1], start=1):
        idx = num - sum(num > gone for gone in gones)
        offset += idx * factorial(n - i)
        gones.append(num)
    return offset
the_perm = (0, 2, 3, 1)
print(perm_to_permidx(the_perm))
# 3
Explanation: All permutations of a given range can be considered as groups of permutations. For example, for the permutations of 0, 1, 2, 3 we first "fix" 0 and permute the rest, then fix 1 and permute the rest, and so on. Once we fix a number, the rest is again a set of permutations, so we again fix one number at a time from the remaining numbers and permute the rest. This goes on until only one number is left. Each level of fixing corresponds to (n-i)! permutations.
So this code finds the "offsets" for each level of permutation. The offset corresponds to where the given permutation starts when we fix the numbers of perm in order. For the given example (0, 2, 3, 1), we first look at the first number in the given perm, which is 0, and find the offset to be 0. This number then goes into the gones list (we will see its usage). At the next level of permutation we see 2 as the fixing number. To calculate the offset for it, we need the "order" of this 2 among the remaining three numbers. This is where gones comes into play: for each already-fixed number that is less than the current fixer (in this case 0), we subtract 1 to find the new order. Then the offset is calculated and accumulated. For the next number 3, the new order is 3 - (1 + 1) = 1, because both previous fixers 0 and 2 are to the "left" of 3.
This goes on till the last number of the given perm since there is no need to look at it; it will have been determined anyway.

Generating all possible combinations of n-sized vector that follow certain conditions on each element

I have a list d of length r such that d = (d_1, d_2,..., d_r).
I would like to generate all possible vectors v of length r such that for every i (from 1 to r), v_i is between 0 and d_i.
For example,
if r =2 and d= (1,2), v_1 can be 0 or 1 and v_2 can be 0,1 or 2.
Hence there are 6 possible vectors:
[0,0] , [0,1], [0,2], [1,0] , [1,1], [1,2]
I have looked into itertools and combinations, and I have a feeling I will have to use recursion; however, I have not managed to solve it yet and was hoping for some help or advice in the right direction.
Edit:
I have written the following code for my problem and it works; however, I did it in a very inefficient way, by disregarding the condition, generating all possible vectors and then pruning the invalid ones. I took the largest d_i, generated all vectors of size r from (0,0,...,0) all the way to (max_d_i,max_d_i,...,max_d_i), and then eliminated those that were invalid.
Code:
import itertools
import copy
def main(d):
    arr = []
    correct_list = []
    curr = []
    r = len(d)
    greatest = max(d)
    for i in range(0, greatest+1):
        arr = arr + [i]
    # all_poss_arr is a list that holds all possible vectors of length r from (0,0,...,0) to (max,max,...,max)
    # for example if greatest was 3 and r = 4, all_poss_arr would have (0,0,0,0), then (0,0,0,1) and so on,
    # all the way to (3,3,3,3)
    all_poss_arr = list(itertools.product(arr, repeat=r))
    # Now I am going to remove all the vectors that don't follow "v_i is between 0 and d_i"
    for i in range(0, len(all_poss_arr)):
        curr = all_poss_arr[i]
        cnt = 0
        for j in range(0, len(curr)):
            if curr[j] <= d[j]:
                cnt = cnt + 1
        if cnt == r:
            curr = list(curr)
            currcopy = copy.copy(curr)
            correct_list = correct_list + [currcopy]
        cnt = 0
    return correct_list
If anyone knows a better way, let me know, it is much appreciated.
You basically want a Cartesian product. I'll demonstrate a basic, functional and iterative approach.
Given
import operator as op
import functools as ft
import itertools as it
def compose(f, g):
    """Return a function composed of two functions."""
    def h(*args, **kwargs):
        return f(g(*args, **kwargs))
    return h
d = (1, 2)
Code
Option 1: Basic - Manual Unpacking
list(it.product(range(d[0] + 1), range(d[1] + 1)))
# [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
Option 2: Functional - Automated Mapping
def vector_combs(v):
    """Return a Cartesian product of unpacked elements from `v`."""
    plus_one = ft.partial(op.add, 1)
    range_plus_one = compose(range, plus_one)
    res = list(it.product(*map(range_plus_one, v)))
    return res
vector_combs(d)
# [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
Option 3: Iterative - Range Replication (Recommended)
list(it.product(*[range(x + 1) for x in d]))
# [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
Details
Option 1
The basic idea is illustrated in Option 1:
Make a Cartesian product using a series of modified ranges.
Note that each range bound is taken from d by index and incremented manually. The last two options automate this.
Option 2
We apply a functional approach to handle the various arguments and functions:
Use functools.partial to bind 1 as the first argument of operator.add(). This returns a function that increments any number.
Compose this function with range. This gives a modified range function that automatically increments the integer passed in.
Finally, map the composed function over each element of the tuple d, so d can have any length r.
Example (d = (1, 2, 1), r = 3):
vector_combs((1, 2, 1))
# [(0, 0, 0),
# (0, 0, 1),
# (0, 1, 0),
# (0, 1, 1),
# (0, 2, 0),
# (0, 2, 1),
# (1, 0, 0),
# (1, 0, 1),
# (1, 1, 0),
# (1, 1, 1),
# (1, 2, 0),
# (1, 2, 1)]
Option 3
Perhaps most elegantly, just use a list comprehension to create r ranges. ;)
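The question mentions recursion. itertools.product is the simplest choice, but the same vectors can be produced by a small recursive generator that fixes the first component and recurses on the remaining bounds. A sketch (bounded_vectors is an illustrative name) that yields lists to match the question's example:

```python
def bounded_vectors(d):
    """Yield every list v with 0 <= v[i] <= d[i], in the same order as itertools.product."""
    if not d:
        yield []          # one empty vector when there are no bounds left
        return
    for first in range(d[0] + 1):
        # Fix the first component, then recurse on the remaining bounds.
        for rest in bounded_vectors(d[1:]):
            yield [first] + rest

print(list(bounded_vectors((1, 2))))
# [[0, 0], [0, 1], [0, 2], [1, 0], [1, 1], [1, 2]]
```

The total number of vectors is the product of (d_i + 1), e.g. 2 * 3 * 2 = 12 for d = (1, 2, 1).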

Create a list of adjacent elements of another list in Python

I am looking to take as input a list and then create another list which contains tuples (or sub-lists) of adjacent elements from the original list, wrapping around for the beginning and ending elements. The input/output would look like this:
l_in = [0, 1, 2, 3]
l_out = [(3, 0, 1), (0, 1, 2), (1, 2, 3), (2, 3, 0)]
My question is closely related to another one titled "getting successive adjacent elements of a list", but that question does not take wrapping around the end elements into account and only handles pairs of elements rather than triplets.
I have a somewhat longer approach to do this involving rotating deques and zipping them together:
from collections import deque
l_in = [0, 1, 2, 3]
deq = deque(l_in)
deq.rotate(1)
deq_prev = deque(deq)
deq.rotate(-2)
deq_next = deque(deq)
deq.rotate(1)
l_out = list(zip(deq_prev, deq, deq_next))
# l_out is [(3, 0, 1), (0, 1, 2), (1, 2, 3), (2, 3, 0)]
However, I feel like there is probably a more elegant (and/or efficient) way to do this using other built-in Python functionality. If, for instance, the rotate() function of deque returned the rotated list instead of modifying it in place, this could be a one- or two-liner (though this approach of zipping together rotated lists is perhaps not the most efficient). How can I accomplish this more elegantly and/or efficiently?
One approach may be to use itertools combined with more_itertools.windowed:
import itertools as it
import more_itertools as mit
l_in = [0, 1, 2, 3]
n = len(l_in)
list(it.islice(mit.windowed(it.cycle(l_in), 3), n-1, 2*n-1))
# [(3, 0, 1), (0, 1, 2), (1, 2, 3), (2, 3, 0)]
Here we generated an infinite cycle of sliding windows and sliced the desired subset.
FWIW, here is an abstraction of the latter code for a general, flexible solution given any iterable input e.g. range(5), "abcde", iter([0, 1, 2, 3]), etc.:
def get_windows(iterable, size=3, offset=-1):
    """Return an iterable of windows including an optional offset."""
    it1, it2 = it.tee(iterable)
    n = mit.ilen(it1)
    return it.islice(mit.windowed(it.cycle(it2), size), n+offset, 2*n+offset)
list(get_windows(l_in))
# [(3, 0, 1), (0, 1, 2), (1, 2, 3), (2, 3, 0)]
list(get_windows("abc", size=2))
# [('c', 'a'), ('a', 'b'), ('b', 'c')]
list(get_windows(range(5), size=2, offset=-2))
# [(3, 4), (4, 0), (0, 1), (1, 2), (2, 3)]
Note: more-itertools is a separate library, easily installed via:
> pip install more_itertools
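If you'd rather avoid the third-party dependency, modular indexing gives the same windows using only built-ins. The following sketch uses an illustrative name, cyclic_windows. Unlike get_windows, it assumes a sequence that supports len() and indexing, not an arbitrary iterable.

```python
def cyclic_windows(seq, size=3, offset=-1):
    """Return one wrap-around window of `size` items per position of `seq`."""
    n = len(seq)
    # (i + offset + k) % n wraps around both ends, since % is non-negative for n > 0.
    return [tuple(seq[(i + offset + k) % n] for k in range(size)) for i in range(n)]

print(cyclic_windows([0, 1, 2, 3]))
# [(3, 0, 1), (0, 1, 2), (1, 2, 3), (2, 3, 0)]
print(cyclic_windows("abc", size=2))
# [('c', 'a'), ('a', 'b'), ('b', 'c')]
```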
This can be done with slices:
l_in = [0, 1, 2, 3]
l_in = [l_in[-1]] + l_in + [l_in[0]]
l_out = [l_in[i:i+3] for i in range(len(l_in)-2)]
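Or, as a rather perverse alternative (starting again from the original list, since the snippet above reassigns l_in):
l_in = [0, 1, 2, 3]
div = len(l_in)
n = 3
l_out = [l_in[i % div: i % div + 3]
         if len(l_in[i % div: i % div + 3]) == 3
         else l_in[i % div: i % div + 3] + l_in[:3 - len(l_in[i % div: i % div + 3])]
         for i in range(3, len(l_in) + 3 * n + 2)]
The bounds of the range control how many windows are produced (here 12, cycling around the list).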
Well, I figured out a better solution while I was writing the question, but I had already gone through the work of writing it, so here goes. This solution is at least much more concise:
l_out = list(zip(l_in[-1:] + l_in[:-1], l_in, l_in[1:] + l_in[:1]))
See this post for different answers on how to rotate lists in Python.
The one-line solution above should be at least as efficient as the solution in the question (based on my understanding) since the slicing should not be more expensive than the rotating and copying of the deques (see https://wiki.python.org/moin/TimeComplexity).
Other answers with more efficient (or elegant) solutions are still welcome though.
As you found, there is a slicing-based list rotation idiom: lst[i:] + lst[:i]
Using it inside a comprehension, with a variable n for the number of adjacent elements wanted, is more general: [lst[i:] + lst[:i] for i in range(n)]
So everything can be parameterized: the number of adjacent elements n in the cyclic rotation, and the "phase" p, i.e. the starting point if it is not the natural 0-based index. The default p=-1 is chosen to fit the apparent desired output.
tst = list(range(4))
def rot(lst, n, p=-1):
    return list(zip(*([lst[i+p:] + lst[:i+p] for i in range(n)])))
rot(tst, 3)
# [(3, 0, 1), (0, 1, 2), (1, 2, 3), (2, 3, 0)]
(This shows the shortened code, as suggested in a comment.)

Descartes product with repetition

So I wanted to make a function that takes a positive integer n and returns a bunch of n-tuples filled with all possible combinations of True/False (1/0), for example:
f(1) = (0,),(1,)
f(2) = (0, 0), (0, 1), (1, 0), (1, 1)
My code was:
from typing import Tuple
def fill(n: int) -> Tuple[Tuple[int, ...], ...]:
    if n == 1:
        return (0,),(1,)
    return tuple((i + j) for i in fill(n-1) for j in fill(1))
I've heard Python isn't very good with recursion, and I generally feel this isn't an efficient solution.
It seemed like using the powerset of a range of a given number (the powerset recipe is from the itertools documentation) and then some kind of indicator function would do the thing.
from itertools import chain, combinations
from typing import Iterable
def range_powerset(n: int):
    s = list(range(n))
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))
def indicator(A: Iterable, B: Iterable):
    return tuple(i in A for i in B)
def fill2(n: int):
    return (indicator(i, range(n)) for i in range_powerset(n))
Yet it seems like too much work for a pretty basic thing.
Is there a better way to do it?
What you describe is not a powerset but a Cartesian product with repetition. Use itertools.product:
import itertools
def fill(n):
    return itertools.product((0,1), repeat=n)
print(list(fill(3)))
# [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)]
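The order produced by itertools.product((0, 1), repeat=n) is simply counting from 0 to 2**n - 1 in binary. The following sketch makes this explicit with bit shifts. The name fill_bits is illustrative, and product remains the simpler choice.

```python
from itertools import product

def fill_bits(n):
    # Write each k in 0 .. 2**n - 1 as n binary digits, most significant first.
    return [tuple((k >> (n - 1 - i)) & 1 for i in range(n)) for k in range(2 ** n)]

print(fill_bits(2))
# [(0, 0), (0, 1), (1, 0), (1, 1)]
print(fill_bits(3) == list(product((0, 1), repeat=3)))
# True
```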

Generating a list of repetitions regardless of the order

I want to generate combinations that associate indices in a list with "slots". For instance, (0, 0, 1) means that 0 and 1 belong to the same slot while 2 belongs to another. (0, 1, 1, 1) means that 1, 2, 3 belong to the same slot while 0 is by itself. In this example, 0 and 1 are just ways of identifying these slots but do not carry information for my usage.
Consequently, (0, 0, 0) is absolutely identical to (1, 1, 1) for my purposes, and (0, 0, 1) is equivalent to (1, 1, 0).
The classical cartesian product generates a lot of these repetitions I'd like to get rid of.
This is what I obtain with itertools.product:
>>> LEN, SIZE = (3,1)
>>> list(itertools.product(range(SIZE+1), repeat=LEN))
[(0, 0, 0),
 (0, 0, 1),
 (0, 1, 0),
 (0, 1, 1),
 (1, 0, 0),
 (1, 0, 1),
 (1, 1, 0),
 (1, 1, 1)]
And this is what I'd like to get:
[(0, 0, 0),
 (0, 0, 1),
 (0, 1, 0),
 (0, 1, 1)]
It is easy with small lists but I don't quite see how to do this with bigger sets. Do you have a suggestion?
If it's unclear, please tell me so that I can clarify my question. Thank you!
Edit: based on Sneftel's answer, this function seems to work, but I don't know if it actually yields all the results:
from itertools import product
def test():
    for p in product(range(2), repeat=3):
        j = -1
        good = True
        for k in p:
            if k > j and (k-j) > 1:
                good = False
            elif k > j:
                j = k
        if good:
            yield p
I would start by making the following observations:
The first element of each combination must be 0.
The second element must be 0 or 1.
The third element must be 0, 1 or 2, but it can only be 2 if the second element was 1.
These observations suggest the following algorithm:
def assignments(n, m, used=0):
    """Generate assignments of `n` items to `m` indistinguishable
    buckets, where `used` buckets have been used so far.
    >>> list(assignments(3, 1))
    [(0, 0, 0)]
    >>> list(assignments(3, 2))
    [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)]
    >>> list(assignments(3, 3))
    [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (0, 1, 2)]
    """
    if n == 0:
        yield ()
        return
    aa = list(assignments(n - 1, m, used))
    for first in range(used):
        for a in aa:
            yield (first,) + a
    if used < m:
        for a in assignments(n - 1, m, used + 1):
            yield (used,) + a
This handles your use case (12 items, 5 buckets) in a few seconds:
>>> from timeit import timeit
>>> timeit(lambda:list(assignments(12, 5)), number=1)
4.513746023178101
>>> sum(1 for _ in assignments(12, 5))
2079475
This is substantially faster than the function you give at the end of your question (the one that calls product and then drops the invalid assignments) would be if it were modified to handle the (12, 5) use case:
>>> timeit(lambda:list(test(12, 5)), number=1)
540.693009853363
Before checking for duplicates, you should harmonize the notation (assuming you don't want to set up some fancy AI): iterate through each tuple and relabel its elements in order of first appearance, starting at 0 and counting upwards. That is, you create a temporary dictionary for each tuple that you are processing.
An exemplary output would be
(0,0,0) -> (0,0,0)
(0,1,0) -> (0,1,0)
but
(1,0,1) -> (0,1,0)
Removing the duplicates can then easily be performed as the problem is reduced to the problem of the solved question at Python : How to remove duplicate lists in a list of list?
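The relabelling step can be written with a dictionary that hands out the next unused number to each new label. In the following minimal sketch, normalize is an illustrative name. Note that the sketch still enumerates the whole Cartesian product. For larger inputs, a generator that only produces canonical assignments will be much faster.

```python
from itertools import product

def normalize(labels):
    """Relabel slots in order of first appearance, so (1, 0, 1) becomes (0, 1, 0)."""
    mapping = {}
    # len(mapping) is evaluated before insertion, so each new label gets the next number.
    return tuple(mapping.setdefault(x, len(mapping)) for x in labels)

unique = sorted({normalize(p) for p in product(range(2), repeat=3)})
print(unique)
# [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)]
```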
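If you only consider the elements of the Cartesian product where the first occurrences of all indices are sorted and consecutive from zero, that should be sufficient. itertools.combinations_with_replacement() will eliminate those that are not sorted, so you'll only need to check that indices aren't being skipped. Note, however, that combinations_with_replacement() yields only non-decreasing tuples, so it also drops valid assignments such as (0, 1, 0); checking the first-occurrence condition on the full product avoids that.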
In your specific case you could simply take the first or the second half of the list of those items produced by a cartesian product.
import itertools
alphabet = '01'
words3Lettered = [''.join(letter) for letter in itertools.product(alphabet,repeat=3)]
For n-letter words, use repeat=n.
words3Lettered looks like this:
['000', '001', '010', '011', '100', '101', '110', '111']
Next:
usefulWords = words3Lettered[:len(words3Lettered)//2]
which looks like this:
['000', '001', '010', '011']
You might be interested in the other half, i.e. words3Lettered[len(words3Lettered)//2:], though the other half is supposed to "fold" onto the first half.
Most probably you want the combinations of letters in numeric form, so:
indexes = [tuple(int(j) for j in word) for word in usefulWords]
which gives us:
[(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)]
