I have a list of tuples and I want to remove tuples so that only one tuple with a given length and sum remains in the list.
That's not a great explanation, so here is an example:
[(0,1,2), (0,2,1), (0,0,1)]
remove either (0,1,2) or (0,2,1)
I want to be able to iterate through the list and remove any tuples that satisfy the following condition:
len(tuple1) == len(tuple2) and sum(tuple1) == sum(tuple2)
but keep either tuple1 or tuple2 in the list.
I tried:
for t1 in list:
    for t2 in list:
        if len(t1) == len(t2) and sum(t1) == sum(t2):
            list.remove(t1)
but I'm pretty sure this removes all of the tuples, and the console crashed.
In essence this is a "uniqueness filter", but one where we specify a function f and filter out an element x only if the value f(x) has already occurred for an earlier element.
We can implement such a uniqueness filter, given that f(x) produces hashable values, with:
def uniq(iterable, key=lambda x: x):
    seen = set()
    for item in iterable:
        u = key(item)
        if u not in seen:
            yield item
            seen.add(u)
Then we can use this filter as:
result = list(uniq(data, lambda x: (len(x), sum(x))))
For example:
>>> data = [(0, 1, 2), (0, 2, 1), (0, 0, 1)]
>>> list(uniq(data, lambda x: (len(x), sum(x))))
[(0, 1, 2), (0, 0, 1)]
Here we will always retain the first occurrence of the "duplicates".
Let me offer a slightly different solution. Note that this is not something I'd use in a one-off script, but rather in a real project, because your [(0, 0, 1)] actually represents something logical/physical.
set(..) removes duplicates, so how about we use that? The only thing to keep in mind is that the hash value and the equality of the elements need to be adapted.
class Converted(object):
    def __init__(self, tup):
        self.tup = tup
        self.transformed = len(tup), sum(tup)

    def __eq__(self, other):
        return self.transformed == other.transformed

    def __hash__(self):
        return hash(self.transformed)

inp = [(0,1,2), (0,2,1), (0,0,1)]
out = [x.tup for x in set(map(Converted, inp))]
print(out)
# [(0, 0, 1), (0, 1, 2)]
You can also use groupby to group elements by sum and len and fetch 1 element from each group to create a new list:
from itertools import groupby

def _key(t):
    return (len(t), sum(t))

data = [(0, 1, 2), (0, 2, 1), (0, 0, 1), (1, 0, 0), (0, 1, 0), (3, 0, 0, 0)]

result = []
for k, g in groupby(sorted(data, key=_key), key=_key):
    result.append(next(g))

print(result)
# [(0, 0, 1), (0, 1, 2), (3, 0, 0, 0)]
The complexity of your problem comes mainly from the fact that you have two independent filters you want to implement. A good way to go about filtering on data with this sort of requirement is to use groupby. However, before you can do that you need to sort first. Since you normally sort over one key, you'll need to sort twice before you can group:
from itertools import groupby
def lensumFilter(data):
    return [next(g) for _, g in groupby(sorted(sorted(data, key=len), key=sum),
                                        key=lambda x: (len(x), sum(x)))]
>>> print(lensumFilter([(0, 1, 2), (0, 2, 1), (0, 0, 1)]))
[(0, 0, 1), (0, 1, 2)]
>>> print(lensumFilter([(0, 1, 2), (0, 2, 1), (0, 0, 0, 3), (0, 0, 1)]))
[(0, 0, 1), (0, 1, 2), (0, 0, 0, 3)]
>>> print(lensumFilter([(0, 1, 2), (0, 2, 2), (0, 4), (0, 0, 0, 5), (0, 0, 3)]))
[(0, 1, 2), (0, 4), (0, 2, 2), (0, 0, 0, 5)]
Note that if you change how the sorts work, you change how the output will look. For instance, I sorted on length and then sum so my results are in order with respect to sum (smallest sum first) and then in order with respect to length (fewest number of elements first) within sum-groupings. That's why (0, 1, 2) comes before (0, 4) but (0, 4) comes before (0, 2, 2).
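To illustrate, here is a minimal variation (the function and its name are mine, not part of the answer above) that swaps the two sorts, so the result is ordered primarily by length instead of by sum; the data is the same as in the last example:
from itertools import groupby

def lensumFilterByLen(data):
    # sort by sum first and by length last, so the final order is primarily by length
    return [next(g) for _, g in groupby(sorted(sorted(data, key=sum), key=len),
                                        key=lambda x: (len(x), sum(x)))]

>>> print(lensumFilterByLen([(0, 1, 2), (0, 2, 2), (0, 4), (0, 0, 0, 5), (0, 0, 3)]))
[(0, 4), (0, 1, 2), (0, 2, 2), (0, 0, 0, 5)]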
If you want something concise and more pythonic, you could use the built-in filter function.
It will keep all the elements that match the requirement (here: keep a tuple unless it has the same sum and the same length as the tuple you want to remove):
tup_remove = (0,2,1)
list(filter(lambda current_tup: not (sum(tup_remove) == sum(current_tup) and len(tup_remove) == len(current_tup)), tup_list))
For better readability and extensibility, I would encourage you to use a function:
def not_same_sum_len_tuple(tup_to_check, current_tuple):
    """Return True unless both the sum and the length match."""
    same_sum = sum(tup_to_check) == sum(current_tuple)  # Check the sum
    same_len = len(tup_to_check) == len(current_tuple)  # Check the length
    return not (same_sum and same_len)
tup_remove = (0,2,1)
list(filter(lambda current_tup: not_same_sum_len_tuple(tup_remove, current_tup), tup_list))
It's probably easier to just make a new list that meets your conditions.
old_list = [(0,1,2), (0,2,1), (0,0,1)]
new_list = []

for old_t in old_list:
    for new_t in new_list:
        if len(old_t) == len(new_t) and sum(old_t) == sum(new_t):
            break
    else:
        new_list.append(old_t)
# new_list == [(0, 1, 2), (0, 0, 1)]
This is a simpler solution but may not be performant. Just make a dict with (len(t), sum(t)) as keys and the tuples as values. The last tuple stays.
lst = [(0,1,2), (0,2,1), (0,0,1)]
d = {(len(t), sum(t)): t for t in lst}
list(d.values())
In one line:
list({(len(t), sum(t)): t for t in lst}.values())
To make it performant, just memoize len and sum.
from functools import lru_cache
mlen, msum = (lru_cache(maxsize=None)(f) for f in (len, sum))
list({(mlen(t), msum(t)): t for t in lst}.values())
I am looking to take as input a list and then create another list which contains tuples (or sub-lists) of adjacent elements from the original list, wrapping around for the beginning and ending elements. The input/output would look like this:
l_in = [0, 1, 2, 3]
l_out = [(3, 0, 1), (0, 1, 2), (1, 2, 3), (2, 3, 0)]
My question is closely related to another titled getting successive adjacent elements of a list, but this other question does not take into account wrapping around for the end elements and only handles pairs of elements rather than triplets.
I have a somewhat longer approach to do this involving rotating deques and zipping them together:
from collections import deque
l_in = [0, 1, 2, 3]
deq = deque(l_in)
deq.rotate(1)
deq_prev = deque(deq)
deq.rotate(-2)
deq_next = deque(deq)
deq.rotate(1)
l_out = list(zip(deq_prev, deq, deq_next))
# l_out is [(3, 0, 1), (0, 1, 2), (1, 2, 3), (2, 3, 0)]
However, I feel like there is probably a more elegant (and/or efficient) way to do this using other built-in Python functionality. If, for instance, the rotate() function of deque returned the rotated list instead of modifying it in place, this could be a one- or two-liner (though this approach of zipping together rotated lists is perhaps not the most efficient). How can I accomplish this more elegantly and/or efficiently?
One approach may be to use itertools combined with more_itertools.windowed:
import itertools as it
import more_itertools as mit
l_in = [0, 1, 2, 3]
n = len(l_in)
list(it.islice(mit.windowed(it.cycle(l_in), 3), n-1, 2*n-1))
# [(3, 0, 1), (0, 1, 2), (1, 2, 3), (2, 3, 0)]
Here we generated an infinite cycle of sliding windows and sliced the desired subset.
FWIW, here is an abstraction of the latter code for a general, flexible solution given any iterable input e.g. range(5), "abcde", iter([0, 1, 2, 3]), etc.:
def get_windows(iterable, size=3, offset=-1):
    """Return an iterable of windows including an optional offset."""
    it1, it2 = it.tee(iterable)
    n = mit.ilen(it1)
    return it.islice(mit.windowed(it.cycle(it2), size), n+offset, 2*n+offset)
list(get_windows(l_in))
# [(3, 0, 1), (0, 1, 2), (1, 2, 3), (2, 3, 0)]
list(get_windows("abc", size=2))
# [('c', 'a'), ('a', 'b'), ('b', 'c')]
list(get_windows(range(5), size=2, offset=-2))
# [(3, 4), (4, 0), (0, 1), (1, 2), (2, 3)]
Note: more-itertools is a separate library, easily installed via:
> pip install more_itertools
This can be done with slices:
l_in = [0, 1, 2, 3]
l_in = [l_in[-1]] + l_in + [l_in[0]]
l_out = [l_in[i:i+3] for i in range(len(l_in)-2)]
Or, if you prefer something more convoluted:
div = len(l_in)
n = 3
l_out = [l_in[i % div: i % div + 3]
         if len(l_in[i % div: i % div + 3]) == 3
         else l_in[i % div: i % div + 3] + l_in[:3 - len(l_in[i % div: i % div + 3])]
         for i in range(3, len(l_in) + 3 * n + 2)]
You can specify the number of iterations.
Well I figured out a better solution as I was writing the question, but I already went through the work of writing it, so here goes. This solution is at least much more concise:
l_out = list(zip(l_in[-1:] + l_in[:-1], l_in, l_in[1:] + l_in[:1]))
See this post for different answers on how to rotate lists in Python.
The one-line solution above should be at least as efficient as the solution in the question (based on my understanding) since the slicing should not be more expensive than the rotating and copying of the deques (see https://wiki.python.org/moin/TimeComplexity).
Other answers with more efficient (or elegant) solutions are still welcome though.
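If you want to check the efficiency claim yourself, here is a minimal timing sketch (the function names and the list size are arbitrary choices of mine) comparing the deque approach from the question with the slicing one-liner:
from collections import deque
from timeit import timeit

l_in = list(range(1000))

def with_deques(lst):
    # deque-rotation approach from the question
    deq = deque(lst)
    deq.rotate(1)
    deq_prev = deque(deq)
    deq.rotate(-2)
    deq_next = deque(deq)
    deq.rotate(1)
    return list(zip(deq_prev, deq, deq_next))

def with_slices(lst):
    # slicing one-liner from this answer
    return list(zip(lst[-1:] + lst[:-1], lst, lst[1:] + lst[:1]))

assert with_deques(l_in) == with_slices(l_in)
print(timeit(lambda: with_deques(l_in), number=1000))
print(timeit(lambda: with_slices(l_in), number=1000))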
As you found, there is a list-rotation idiom based on slicing: lst[i:] + lst[:i].
Using it inside a comprehension, with a variable n for the number of adjacent elements wanted, makes it more general: [lst[i:] + lst[:i] for i in range(n)].
So everything can be parameterized: the number of adjacent elements n in the cyclic rotation, and the "phase" p, the starting point if it is not the natural base index 0. The default p=-1 is chosen to fit the apparent desired output.
tst = list(range(4))

def rot(lst, n, p=-1):
    return list(zip(*[lst[i+p:] + lst[:i+p] for i in range(n)]))

rot(tst, 3)
# [(3, 0, 1), (0, 1, 2), (1, 2, 3), (2, 3, 0)]
This shows the shortened code, as suggested in the comments.
Say I have a list of tuples [(0, 1, 2, 3), (4, 5, 6, 7), (3, 2, 1, 0)]. I would like to remove every tuple that is the reverse of a tuple already in the list, e.g. removing (3, 2, 1, 0) from the above list.
My current (rudimentary) method is:
L = list(itertools.permutations(np.arange(x), 4))

for ll in L:
    if ll[::-1] in L:
        L.remove(ll[::-1])
The time taken increases rapidly with increasing x, so if x is large this takes ages! How can I speed this up?
Using set comes to mind:
L = set()

for ll in itertools.permutations(np.arange(x), 4):
    if ll[::-1] not in L:
        L.add(ll)
or even, for slightly better performance:
L = set()

for ll in itertools.permutations(np.arange(x), 4):
    if ll not in L:
        L.add(ll[::-1])
The need to keep the first occurrence looks like it forces you to iterate with a conditional.
a = [(0, 1, 2, 3), (4, 5, 6, 7), (3, 2, 1, 0)]
s = set(); a1 = []
for t in a:
    if t not in s:
        a1.append(t)
        s.add(t[::-1])
Edit: The accepted answer addresses the example code (i.e. the itertools permutations sample). This answers the generalized question for any list (or iterable).
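Here is a sketch of the same idea written as a generator, so it works for any iterable and not just a list (the function name is mine):
import itertools

def keep_first_unreversed(iterable):
    """Yield each tuple unless its reverse has already been yielded."""
    seen = set()
    for t in iterable:
        if t not in seen:
            seen.add(t[::-1])
            yield t

# e.g. list(keep_first_unreversed(itertools.permutations(range(4), 4)))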
I want to print all possible combinations of 3 numbers from the set (0 ... n-1), where each one of those combinations is unique. I get the variable n via this code:
n = raw_input("Please enter n: ")
But I'm stuck at coming up with the algorithm. Any help please?
from itertools import combinations
list(combinations(range(n),3))
This works as long as you are using Python 2.6 or later. Note that raw_input returns a string, so convert n to an integer first (e.g. n = int(n)).
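For example, with n = 4 (a value picked just for illustration):
>>> from itertools import combinations
>>> list(combinations(range(4), 3))
[(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]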
If you want all possible combinations that allow repeated values and treat different orderings as distinct, you need to use product like this:
from itertools import product
t = range(n)
print set(product(set(t),repeat = 3))
For example, if n = 2, the output will be:
set([(0, 1, 1), (1, 1, 0), (1, 0, 0), (0, 0, 1), (1, 0, 1), (0, 0, 0), (0, 1, 0), (1, 1, 1)])
hope this helps
itertools is your friend here, specifically permutations.
Demo:
from itertools import permutations

for item in permutations(range(n), 3):
    print item
This is assuming you have Python 2.6 or newer.
combos = []

for x in xrange(n):
    for y in xrange(n):
        for z in xrange(n):
            combos.append([x, y, z])
I want to generate combinations that associate indices in a list with "slots". For instance, (0, 0, 1) means that 0 and 1 belong to the same slot while 2 belongs to another. (0, 1, 1, 1) means that 1, 2, 3 belong to the same slot while 0 is by itself. In this example, 0 and 1 are just ways of identifying these slots and do not carry information for my usage.
Consequently, (0, 0, 0) is absolutely identical to (1, 1, 1) for my purposes, and (0, 0, 1) is equivalent to (1, 1, 0).
The classical cartesian product generates a lot of these repetitions I'd like to get rid of.
This is what I obtain with itertools.product:
>>> LEN, SIZE = (3,1)
>>> list(itertools.product(range(SIZE+1), repeat=LEN))
[(0, 0, 0),
(0, 0, 1),
(0, 1, 0),
(0, 1, 1),
(1, 0, 0),
(1, 0, 1),
(1, 1, 0),
(1, 1, 1)]
And this is what I'd like to get:
>>> [(0, 0, 0),
(0, 0, 1),
(0, 1, 0),
(0, 1, 1)]
It is easy with small lists but I don't quite see how to do this with bigger sets. Do you have a suggestion?
If it's unclear, please tell me so that I can clarify my question. Thank you!
Edit: based on Sneftel's answer, this function seems to work, but I don't know if it actually yields all the results:
def test():
    for p in product(range(2), repeat=3):
        j = -1
        good = True
        for k in p:
            if k > j and (k - j) > 1:
                good = False
            elif k > j:
                j = k
        if good:
            yield p
I would start by making the following observations:
The first element of each combination must be 0.
The second element must be 0 or 1.
The third element must be 0, 1 or 2, but it can only be 2 if the second element was 1.
These observations suggest the following algorithm:
def assignments(n, m, used=0):
    """Generate assignments of `n` items to `m` indistinguishable
    buckets, where `used` buckets have been used so far.

    >>> list(assignments(3, 1))
    [(0, 0, 0)]
    >>> list(assignments(3, 2))
    [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)]
    >>> list(assignments(3, 3))
    [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (0, 1, 2)]

    """
    if n == 0:
        yield ()
        return
    aa = list(assignments(n - 1, m, used))
    for first in range(used):
        for a in aa:
            yield (first,) + a
    if used < m:
        for a in assignments(n - 1, m, used + 1):
            yield (used,) + a
This handles your use case (12 items, 5 buckets) in a few seconds:
>>> from timeit import timeit
>>> timeit(lambda:list(assignments(12, 5)), number=1)
4.513746023178101
>>> sum(1 for _ in assignments(12, 5))
2079475
This is substantially faster than the function you give at the end of your question (the one that calls product and then drops the invalid assignments) would be, if it were modified to handle the (12, 5) use case:
>>> timeit(lambda:list(test(12, 5)), number=1)
540.693009853363
Before checking for duplicates, you should harmonize the notation (assuming you don't want to set up some fancy AI): iterate through the tuples and assign set-affiliation numbers to the distinct elements, starting at 0 and counting upwards. That is, you create a temporary dictionary for each tuple that you are processing.
An exemplary output would be
(0,0,0) -> (0,0,0)
(0,1,0) -> (0,1,0)
but
(1,0,1) -> (0,1,0)
Removing the duplicates can then easily be performed, as the problem has been reduced to the one solved in Python : How to remove duplicate lists in a list of list?
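Here is a minimal sketch of that idea (the helper name canonicalize is mine): relabel slot ids in order of first appearance, then keep only the first tuple seen for each canonical form.
import itertools

def canonicalize(tup):
    """Relabel slot ids in order of first appearance, e.g. (1, 0, 1) -> (0, 1, 0)."""
    mapping = {}
    out = []
    for x in tup:
        if x not in mapping:
            mapping[x] = len(mapping)
        out.append(mapping[x])
    return tuple(out)

seen = set()
unique = []
for tup in itertools.product(range(2), repeat=3):
    c = canonicalize(tup)
    if c not in seen:
        seen.add(c)
        unique.append(c)

print(unique)
# [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)]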
If you only consider the elements of the cartesian product where the first occurrences of all indices are sorted and consecutive from zero, that should be sufficient. itertools.combinations_with_replacement() will eliminate those that are not sorted, so you'll only need to check that indices aren't being skipped.
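For example, a minimal sketch of the "no skipped indices" check applied directly to the cartesian product (the helper name no_skips is mine):
from itertools import product

def no_skips(tup):
    """True if each value exceeds the maximum seen so far by at most 1."""
    top = -1
    for v in tup:
        if v > top + 1:
            return False
        top = max(top, v)
    return True

print([t for t in product(range(2), repeat=3) if no_skips(t)])
# [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)]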
In your specific case you could simply take the first or the second half of the list of those items produced by a cartesian product.
import itertools
alphabet = '01'
words3Lettered = [''.join(letter) for letter in itertools.product(alphabet,repeat=3)]
For n-lettered words, use repeat=n.
words3Lettered looks like this:
['000', '001', '010', '011', '100', '101', '110', '111']
Next,
usefulWords = words3Lettered[:len(words3Lettered)//2]
which looks like this:
['000', '001', '010', '011']
You might be interested in the other half, i.e. words3Lettered[len(words3Lettered)//2:], though the other half is meant to "fold" onto the first half.
Most probably you want to use the combinations of letters in numeric form, so...
indexes = [tuple(int(j) for j in word) for word in usefulWords]
which gives us:
[(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)]