How to get rid of sub-tuples in this list? - python

list_of_tuple = [(0,2), (0,6), (4,6), (6,7), (8,9)]
Since (0,2) and (4,6) both lie within the range of (0,6), I want to remove them. The resulting list would be:
list_of_tuple = [(0,6), (6,7), (8,9)]
It seems I need to sort this list of tuples somehow to make the removal easier. But how do I sort a list of tuples?
Given two tuples of array indexes, [m,n] and [a,b], if:
m >= a and n <= b
then [m,n] is included in [a,b], so [m,n] should be removed from the list.

To remove all tuples from list_of_tuple whose range lies within a specified tuple:
list_of_tuple = [(0,2), (0,6), (4,6), (6,7), (8,9)]
def rm(lst,tup):
    return [tup]+[t for t in lst if t[0] < tup[0] or t[1] > tup[1]]
print(rm(list_of_tuple,(0,6)))
Output:
[(0, 6), (6, 7), (8, 9)]

Here's a dead-simple solution, but it's O(n^2):
intervals = [(0, 2), (0, 6), (4, 6), (6, 7), (8, 9)] # list_of_tuple
result = [
    t for t in intervals
    if not any(t != u and t[0] >= u[0] and t[1] <= u[1] for u in intervals)
]
It filters out intervals that are not equal to, but contained in, any other intervals.

Seems like an opportunity to abuse both reduce() and Python's logical operators! The solution assumes the list is sorted as in the OP's example: primarily on the second element of each tuple, and secondarily on the first:
from functools import reduce
list_of_sorted_tuples = [(0, 2), (0, 6), (4, 6), (6, 7), (8, 9)]
def contains(a, b):
    return a[0] >= b[0] and a[1] <= b[1] and [b] or b[0] >= a[0] and b[1] <= a[1] and [a] or [a, b]
reduced_list = reduce(lambda x, y: x[:-1] + contains(x[-1], y) if x else [y], list_of_sorted_tuples, [])
print(reduced_list)
OUTPUT
> python3 test.py
[(0, 6), (6, 7), (8, 9)]
>

You could try something like this to check if both ends of the (half-open) interval are contained within another interval:
list_of_tuple = [(0,2), (0,6), (4,6), (6,7), (8,9)]
reduced_list = []
for t in list_of_tuple:
    add = True
    for o in list_of_tuple:
        if t is not o:
            r = range(*o)
            if t[0] in r and (t[1] - 1) in r:
                add = False
    if add:
        reduced_list.append(t)
print(reduced_list) # [(0, 6), (6, 7), (8, 9)]
Note: This assumes that your tuples are half-open intervals, i.e. [0, 6) where 0 is inclusive but 6 is exclusive, similar to how range would treat the start and stop parameters. A couple of small changes would have to be made for the case of fully closed intervals:
range(*o) -> range(o[0], o[1] + 1)
and
if t[0] in r and (t[1] - 1) in r: -> if t[0] in r and t[1] in r:
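For reference, here is a minimal sketch of the closed-interval variant with both changes applied. The function name drop_contained_closed is only illustrative, and it compares list positions rather than object identity so that it does not depend on how tuples are stored:

```python
def drop_contained_closed(intervals):
    """Keep only intervals not contained in another one (closed intervals)."""
    kept = []
    for i, t in enumerate(intervals):
        contained = False
        for j, o in enumerate(intervals):
            if i != j:
                # Closed interval: the end point o[1] itself is included.
                r = range(o[0], o[1] + 1)
                if t[0] in r and t[1] in r:
                    contained = True
                    break
        if not contained:
            kept.append(t)
    return kept


print(drop_contained_closed([(0, 2), (0, 6), (4, 6), (6, 7), (8, 9)]))
# [(0, 6), (6, 7), (8, 9)]
```

Note that with this check two identical intervals each count as contained in the other, so both would be dropped.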

Here is the first step towards a solution that can be done in O(n log(n)):
def non_cont(lot):
    s = sorted(lot, key = lambda t: (t[0], -t[1]))
    i = 1
    while i < len(s):
        if s[i][0] >= s[i - 1][0] and s[i][1] <= s[i - 1][1]:
            del s[i]
        else:
            i += 1
    return s
The idea is that after sorting with the special key function, each element that is contained in some other element ends up after an element that contains it. Then we sweep the list, removing elements that are contained in the element that precedes them. The sweep-and-delete loop is itself O(n^2), because each del shifts the rest of the list. The version above is meant for clarity more than anything else. We can move to the next implementation:
def non_cont_on(lot):
    s = sorted(lot, key = lambda t: (t[0], -t[1]))
    result = s[:1]
    for t in s[1:]:
        # keep t only if the last kept interval does not contain it
        if not (t[0] >= result[-1][0] and t[1] <= result[-1][1]):
            result.append(t)
    return result
There is no quadratic sweep and delete loop here, only a nice, linear process of constructing the result. Space complexity is O(n). It is possible to perform this algorithm without extra, non-constant, space, but I will leave this out.
A side effect of both algorithms is that the intervals end up sorted.
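As a rough sketch of the variant without an extra result list, the kept intervals can be compacted to the front of the sorted list and the tail deleted. The name non_cont_inplace is hypothetical, the input list is modified, and list.sort may still use auxiliary memory internally, so this only avoids the separate result list:

```python
def non_cont_inplace(lot):
    """Remove contained intervals from lot in place and return it."""
    lot.sort(key=lambda t: (t[0], -t[1]))
    write = 0  # number of intervals kept so far; they live in lot[:write]
    for t in lot:
        last = lot[write - 1] if write else None
        # Keep t unless the most recently kept interval contains it.
        if last is None or not (t[0] >= last[0] and t[1] <= last[1]):
            lot[write] = t
            write += 1
    del lot[write:]  # drop the leftover tail in one step
    return lot


data = [(0, 2), (0, 6), (4, 6), (6, 7), (8, 9)]
print(non_cont_inplace(data))
# [(0, 6), (6, 7), (8, 9)]
```

Writing to lot[write] is safe during iteration because write never runs ahead of the position being read.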

If you want to preserve the information about the inclusion-structure (by which enclosing interval an interval of the original set is consumed) you can build a "one-level tree":
def contained(tpl1, tpl2):
    return tpl1[0] >= tpl2[0] and tpl1[1] <= tpl2[1]
def interval_hierarchy(lst):
    if not lst:
        return
    root = lst.pop()
    children_dict = {root: []}
    while lst:
        t = lst.pop()
        curr_children = list(children_dict.keys())
        for k in curr_children:
            if contained(k, t):
                children_dict[t] = (children_dict[t] if t in children_dict else []) +\
                                   [k, *children_dict[k]]
                children_dict.pop(k)
            elif contained(t, k):
                children_dict[k].append(t)
                if t in children_dict:
                    children_dict[k] += children_dict[t]
                    children_dict.pop(t)
            else:
                if not t in children_dict:
                    children_dict[t] = []
    # return whatever information you might want to use
    return children_dict, list(children_dict.keys())

It appears you are trying to merge intervals which are overlapping. For example, (9,11), (10,12) are merged in the second example below to produce (9,12).
In that case, a simple sort using sorted will automatically handle tuples.
Approach: Store the next interval to be added. Keep extending the end of the interval until you encounter a value whose "start" comes after (>=) the "end" of the next value to add. At that point, that stored next interval can be appended to the results. Append at the end to account for processing all values.
def merge_intervals(val_input):
    if not val_input:
        return []
    vals_sorted = sorted(val_input) # sorts by tuple values "natural ordering"
    result = []
    x0, x1 = vals_sorted[0] # store next interval to be added as (x0, x1)
    for start, end in vals_sorted[1:]:
        if start >= x1: # reached next separate interval
            result.append((x0, x1))
            x0, x1 = (start, end)
        elif end > x1:
            x1 = end # extend length of next interval to be added
    result.append((x0, x1))
    return result
print(merge_intervals([(0,2), (0,6), (4,6), (6,7), (8,9)]))
print(merge_intervals([(1,2), (9,11), (10,12), (1,7)]))
Output:
[(0, 6), (6, 7), (8, 9)]
[(1, 7), (9, 12)]

Related

Count the maximum number of 0s between two 1s in list python

I want to count the 0s between two 1s from a list.
For example:
l = [0,1,0,0,0,0,1,1,1,0,0,1,0,1,1,0]
I want the output to be [4,2,1]. How can I do that in python?
A slightly different way using itertools.groupby, relying on the fact that any entries before the first 1 or after the last 1 are irrelevant to us:
from itertools import groupby
first_one = l.index(1) # index of the first "1"
last_one = len(l) - l[::-1].index(1) - 1 # index of the last "1"
out = [len(list(g)) for k, g in groupby(l[first_one:last_one], key=lambda x: x == 0) if k]
Output
[4, 2, 1]
My one-liner
Just for fun (not that I encourage doing this in a real project), here is a one-liner (a long one) that uses several of the iterators in itertools (nowhere near all of them; there really are a lot). Note that itertools.pairwise requires Python 3.10 or later.
(y for (x,y),(z,t) in itertools.pairwise(itertools.dropwhile(lambda x: x[0]==0, ((x,sum(1 for _ in y)) for x,y in itertools.groupby(l)))) if x==0)
It's an iterator, and nowhere in the process do I build a list. So it would work even if l were itself an iterator yielding billions of 1s and 0s, without using more than constant memory.
Explanation
itertools.groupby(l)
is an iterator giving subiterators for each new value of l.
So
for v,it in itertools.groupby(l):
    for x in it:
        print(x)
This just prints all elements of l, but in 9 iterations of the outer loop: one whose group holds a single 0, then one with a single 1, then one with four 0s, and so on.
If y is an iterator, then sum(1 for _ in y) is the number of iterations.
So
((x,sum(1 for _ in y)) for x,y in itertools.groupby(l))
iterates pairs (value0or1, numberOfSuchInGroup), with alternating value0or1
For example
list((x,sum(1 for _ in y)) for x,y in itertools.groupby(l))
here is
[(0, 1), (1, 1), (0, 4), (1, 3), (0, 2), (1, 1), (0, 1), (1, 2), (0, 1)]
I want to drop the first pair, at least if it is a group of 0s, since leading 0s do not count. Plus, I want to play with another iterator, dropwhile. So
itertools.dropwhile(lambda x: x[0]==0, ((x,sum(1 for _ in y)) for x,y in itertools.groupby(l)))
is the same iterator as before, but without the first pair if it is a group of 0s.
list(itertools.dropwhile(lambda x: x[0]==0, ((x,sum(1 for _ in y)) for x,y in itertools.groupby(l))))
is
[(1, 1), (0, 4), (1, 3), (0, 2), (1, 1), (0, 1), (1, 2), (0, 1)]
I also want to drop the last pair (at least if it is a group of 0s, but it doesn't hurt to drop it if it is a group of 1s). And I haven't played with pairwise yet, which iterates over pairs of consecutive elements.
list(itertools.pairwise(range(5)))
is
[(0, 1), (1, 2), (2, 3), (3, 4)]
for example.
Here, I use it for a very silly reason: just to drop the last item, since of course there is one less item in a pairwise iteration. In this last example, we have 4 items:
list(x for x,y in itertools.pairwise(range(5)))
is
[0,1,2,3]
So, strange usage of pairwise, but it drops the last iteration used that way.
So in our case
((x,y) for (x,y),(z,t) in itertools.pairwise(itertools.dropwhile(lambda x: x[0]==0, ((x,sum(1 for _ in y)) for x,y in itertools.groupby(l)))))
is the same iterator as before, but without the last pair
list((x,y) for (x,y),(z,t) in itertools.pairwise(itertools.dropwhile(lambda x: x[0]==0, ((x,sum(1 for _ in y)) for x,y in itertools.groupby(l)))))
is
[(1, 1), (0, 4), (1, 3), (0, 2), (1, 1), (0, 1), (1, 2)]
Now that we have only groups of 0 that are valid, we can filter out the 1s
list((x,y) for (x,y),(z,t) in itertools.pairwise(itertools.dropwhile(lambda x: x[0]==0, ((x,sum(1 for _ in y)) for x,y in itertools.groupby(l)))) if x==0)
is
[(0, 4), (0, 2), (0, 1)]
Plus, we don't need the 0s because at this stage they are all 0 anyway.
So, keep just y not (x,y)
list(y for (x,y),(z,t) in itertools.pairwise(itertools.dropwhile(lambda x: x[0]==0, ((x,sum(1 for _ in y)) for x,y in itertools.groupby(l)))) if x==0)
Is
[4,2,1]
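For comparison, the same streaming behaviour can be sketched as a plain generator function, which is easier to read than the one-liner and does not need itertools.pairwise. This is only an illustration: the name zeros_between_ones is made up, and it assumes the input contains only 0s and 1s.

```python
from itertools import groupby


def zeros_between_ones(bits):
    """Yield the length of each run of 0s that has a 1 on both sides."""
    seen_one = False  # becomes True once the first 1 has been read
    pending = 0       # length of the most recent run of 0s
    for value, group in groupby(bits):
        if value == 1:
            # A run of 0s only counts if a 1 came before it as well.
            if seen_one and pending:
                yield pending
            seen_one = True
            pending = 0
        else:
            pending = sum(1 for _ in group)  # count without building a list


l = [0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
print(list(zeros_between_ones(iter(l))))
# [4, 2, 1]
```

Because it only keeps two small variables, it also works when the input is itself a one-shot iterator.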
One option using pandas:
import pandas as pd
l = [0,1,0,0,0,0,1,1,1,0,0,1,0,1,1,0]
s = pd.Series(l)
out = (s[s.eq(0)&s.cummax()&s.loc[::-1].cummax()]
       .rsub(1).groupby(s.ne(0).cumsum()).sum()
       .tolist()
      )
With pure python and itertools.groupby:
from itertools import groupby
l = [0,1,0,0,0,0,1,1,1,0,0,1,0,1,1,0]
out = []
start = False
for k, g in groupby(l):
    if k == 1:
        if start:
            out.append(count)
        start = True
    else:
        count = len(list(g))
output: [4, 2, 1]
An old-school answer.
def count(l: list[int]) -> list[int]:
    res = []
    counter = 0
    lastx = l[0]
    for x in l[1:]:
        rising = (x-lastx) > 0
        if rising and counter != 0:
            res.append(counter)
        counter = counter+1 if x==0 else 0
        lastx = x
    return res
count([0,1,0,0,0,0,1,1,1,0,0,1,0,1,1,0]) # [4, 2, 1]
How about this, explanation is all in the code:
l = [0,1,0,0,0,0,1,1,1,0,0,1,0,1,1,0]
output = []
for i in range(len(l)): #this goes through every index in l (0, 1, 2, ..., 14, 15)
    if l[i] == 1: #if the list holds a 1 at the current index, it
        zeros = 0 #sets zeros to 0
        while l[i+1+zeros] == 0: #and runs a while loop as long as the element at current index+1+'the amount of zeros after the last number 1' is a zero (so it stops when it reaches another 1)
            zeros += 1 #because the while loop is still running, it counts another 0
            if i+1+zeros == len(l): #the current index + 1 + 'the amount of zeros' = the length of our list
                zeros = 0 #it sets zeros back to 0 so the program doesn't add them to the output (else the output would be [4, 2, 1, 1])
                break #breaks out of the loop
        if zeros > 0: #if the zeros counted between two 1s are more than 0:
            output.append(zeros) #it adds them to our final output
print(output) #prints [4, 2, 1] to the terminal

Efficient way to find the index of repeated sequence in a list?

I have a large list of numbers in python, and I want to write a function that finds sections of the list where the same number is repeated more than n times. For example, if n is 3 then my function should return the following results for the following examples:
When applied to example = [1,2,1,1,1,1,2,3] the function should return [(2,6)], because example[2:6] is a sequence containing all the same value.
When applied to example = [0,0,0,7,3,2,2,2,2,1] the function should return [(0,3), (5,9)] because both example[0:3] and example[5:9] contain repeated sequences of the same value.
When applied to example = [1,2,1,2,1,2,1,2,1,2] the function should return [] because there is no sequence of three or more elements that are all the same number.
I know I could write a bunch of loops to get what I want, but that seems kind of inefficient, and I was wondering if there was an easier option to obtain what I wanted.
Use itertools.groupby and enumerate:
>>> from itertools import groupby
>>> n = 3
>>> x = [1,2,1,1,1,1,2,3]
>>> grouped = (list(g) for _,g in groupby(enumerate(x), lambda t:t[1]))
>>> [(g[0][0], g[-1][0] + 1) for g in grouped if len(g) >= n]
[(2, 6)]
>>> x = [0,0,0,7,3,2,2,2,2,1]
>>> grouped = (list(g) for _,g in groupby(enumerate(x), lambda t:t[1]))
>>> [(g[0][0], g[-1][0] + 1) for g in grouped if len(g) >= n]
[(0, 3), (5, 9)]
To understand groupby: just realize that each iteration returns the value of the key, which is used to group the elements of the iterable, along with a new lazy-iterable that will iterate over the group.
>>> list(groupby(enumerate(x), lambda t:t[1]))
[(0, <itertools._grouper object at 0x7fc90a707bd0>), (7, <itertools._grouper object at 0x7fc90a707ad0>), (3, <itertools._grouper object at 0x7fc90a707950>), (2, <itertools._grouper object at 0x7fc90a707c10>), (1, <itertools._grouper object at 0x7fc90a707c50>)]
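You can do this in a single loop with the following algorithm:
def find_pairs(array, n):
    result_pairs = []
    prev = None  # value of the current run
    idx = 0      # start index of the current run
    count = 0    # length of the current run
    for i, value in enumerate(array):
        if i > 0 and value == prev:
            count += 1
        else:
            # the previous run ended just before i; record it if long enough
            if count >= n:
                result_pairs.append((idx, i))
            # start a new run at i, whether or not the old one was recorded
            prev = value
            idx = i
            count = 1
    # the last run has no following element, so check it after the loop
    if count >= n:
        result_pairs.append((idx, len(array)))
    return result_pairs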
You call the function like this: find_pairs(array, n). This is asymptotically as efficient as the task allows, since it runs in O(len(array)) time. I think it is pretty simple to understand, but if you have any doubts just ask.
You could use this. Note that your question is ambiguous as to the role of n. I assume here that a series of n equal values should be matched. If it should have at least n+1 values, then replace >= by >:
def monotoneRanges(a, n):
    idx = [i for i, v in enumerate(a) if not i or a[i-1] != v] + [len(a)]
    return [r for r in zip(idx, idx[1:]) if r[1] >= r[0]+n]
# example call
res = monotoneRanges([0,0,0,7,3,2,2,2,2,1], 3)
print(res)
Outputs:
[(0, 3), (5, 9)]

Organizing list of tuples

I have a list of tuples which I create dynamically.
The list appears as:
List = [(1,4), (8,10), (19,25), (10,13), (14,16), (25,30)]
Each tuple (a, b) in the list represents a range of indexes from a certain table.
In my situation, the ranges (a, b) and (b, d) together are equivalent to (a, d).
I want to merge the tuples where the 2nd element matches the first of any other.
So, in the example above, I want to merge (8,10) and (10,13) to obtain (8,13), and remove (8,10) and (10,13).
Merging (19,25) and (25,30) should yield (19,30).
I don't have a clue where to start. The tuples are non overlapping.
Edit: I have been trying to just avoid any kind of for loop as I have a pretty large list
If you need to take into account things like skovorodkin's example in the comment,
[(1, 4), (4, 8), (8, 10)]
(or even more complex examples), then one way to do efficiently would be using graphs.
Say you create a digraph (possibly using networkx), where each pair is a node, and there is an edge from (a, b) to node (c, d) if b == c. Now run topological sort, iterate according to the order, and merge accordingly. You should take care to handle nodes with two (or more) outgoing edges properly.
I realize your question states you'd like to avoid loops on account of the long list size. Conversely, for long lists, I doubt you'll find even an efficient linear time solution using list comprehension (or something like that). Note that you cannot sort the list in linear time, for example.
Here is a possible implementation:
Say we start with
l = [(1,4), (8,10), (19,25), (10,13), (14,16), (25,30)]
It simplifies the following to remove duplicates, so let's do:
l = list(set(l))
Now to build the digraph:
import networkx as nx
import collections
g = nx.DiGraph()
The vertices are simply the pairs:
g.add_nodes_from(l)
To build the edges, we need a dictionary:
froms = collections.defaultdict(list)
for p in l:
    froms[p[0]].append(p)
Now we can add the edges:
for p in l:
    for from_p in froms[p[1]]:
        g.add_edge(p, from_p)
Next two lines are unneeded - they're just here to show what the graph looks like at this point:
>>> g.nodes()
[(25, 30), (14, 16), (10, 13), (8, 10), (1, 4), (19, 25)]
>>> g.edges()
[((8, 10), (10, 13)), ((19, 25), (25, 30))]
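Now, let's sort the pairs by topological sort (in networkx 2.0 and later, topological_sort returns a generator, so we materialize it as a list to be able to iterate over it more than once):
l = list(nx.topological_sort(g))
Finally, here's the tricky part. The result will be a DAG. We have to traverse things recursively, but remember what we visited already.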
Let's create a dict of what we visited:
visited = {p: False for p in l}
Now a recursive function, that given a node, returns the maximum range edge from any node reachable from it:
def visit(p):
    neighbs = list(g.neighbors(p))  # neighbors() returns an iterator in networkx 2.x
    if visited[p] or not neighbs:
        visited[p] = True
        return p[1]
    mx = max([visit(neighb_p) for neighb_p in neighbs])
    visited[p] = True
    return mx
We're all ready. Let's create a list for the final pairs:
final_l = []
and visit all nodes:
for p in l:
    if visited[p]:
        continue
    final_l.append((p[0], visit(p)))
Here's the final result:
>>> final_l
[(1, 4), (8, 13), (14, 16)]
If they don't overlap, then you can sort them, and then just combine adjacent ones.
Here's a generator that yields the new tuples:
def combine_ranges(L):
    L = sorted(L) # Make a copy as we're going to remove items!
    while L:
        start, end = L.pop(0) # Get the first item
        while L and L[0][0] == end:
            # While the first of the rest connects to it, adjust
            # the end and remove the first of the rest
            _, end = L.pop(0)
        yield (start, end)
print(list(combine_ranges(List)))
If speed is important, use a collections.deque instead of a list, so that removing the first item (popleft() on a deque) takes constant time.
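A minimal sketch of that deque variant could look like the following. The name combine_ranges_deque is only illustrative, and like the list version it assumes the ranges do not overlap:

```python
from collections import deque


def combine_ranges_deque(ranges):
    """Yield merged ranges, joining (a, b) and (b, c) into (a, c)."""
    queue = deque(sorted(ranges))  # sorted copy; the input is left untouched
    while queue:
        start, end = queue.popleft()  # O(1), unlike list.pop(0)
        # Absorb every following range that starts where this one ends.
        while queue and queue[0][0] == end:
            _, end = queue.popleft()
        yield (start, end)


List = [(1, 4), (8, 10), (19, 25), (10, 13), (14, 16), (25, 30)]
print(list(combine_ranges_deque(List)))
# [(1, 4), (8, 13), (14, 16), (19, 30)]
```

The sort still dominates at O(n log n); the deque only removes the extra cost of shifting the list on every pop(0).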
Non-recursive approach, using sorting (I've added more nodes to handle a more complex case):
l = [(1,4), (8,10), (19,25), (10,13), (14,16), (25,30), (30,34), (38,40)]
l = sorted(l)
r=[]
idx=0
while idx<len(l):
    local=idx+1
    previous_value = l[idx][1]
    # find the longest chain of connected ranges
    while local<len(l):
        if l[local][0]!=previous_value:
            break
        previous_value = l[local][1]
        local+=1
    # store tuple
    r.append((l[idx][0],l[local-1][1]))
    idx = local
print(r)
result:
[(1, 4), (8, 13), (14, 16), (19, 34), (38, 40)]
The only drawback is that original sort order is not preserved. I don't know if it's a problem.
Here is one optimized recursion approach:
In [44]: def find_intersection(m_list):
    ...:     for i, (v1, v2) in enumerate(m_list):
    ...:         for j, (k1, k2) in enumerate(m_list[i + 1:], i + 1):
    ...:             if v2 == k1:
    ...:                 m_list[i] = (v1, m_list.pop(j)[1])
    ...:                 return find_intersection(m_list)
    ...:     return m_list
Demo:
In [45]: lst = [(1,4), (8,10), (19,25), (10,13), (14,16), (25,30)]
In [46]: find_intersection(lst)
Out[46]: [(1, 4), (8, 13), (19, 30), (14, 16)]
You can use a dictionary to map the different end indices to the range ending at that index; then just iterate the list sorted by start index and merge the segments accordingly:
def join_lists(lst):
    ending = {} # will map end position to range
    for start, end in sorted(lst): # iterate in sorted order
        if start in ending:
            ending[end] = (ending[start][0], end) # merge
            del ending[start] # remove old value
        else:
            ending[end] = (start, end)
    return list(ending.values()) # return remaining values from dict
Alternatively, as pointed out by Tomer W in comments, you can do without the sorting, by iterating the list twice, making this solution take only linear time (O(n)) w.r.t. the length of the list.
def join_lists(lst):
    ending = {} # will map end position to range
    # first pass: add to dictionary
    for start, end in lst:
        ending[end] = (start, end)
    # second pass: lookup and merge
    for start, end in lst:
        if start in ending:
            ending[end] = (ending[start][0], end)
            del ending[start]
    # return remaining values from dict
    return list(ending.values())
Example output, for both versions:
>>> join_lists([(1,4), (8,10), (19,25), (10,13), (14,16), (25,30)])
[(1, 4), (8, 13), (14, 16), (19, 30)]
>>> join_lists(lst = [(1, 4), (4, 8), (8, 10)])
[(1, 10)]
The list is first sorted and adjacent pairs of (min1, max1), (min2, max2) are merged together if they overlap.
MIN=0
MAX=1
def normalize(intervals):
    isort = sorted(intervals)
    for i in range(len(isort) - 1):
        if isort[i][MAX] >= isort[i + 1][MIN]:
            vmin = isort[i][MIN]
            vmax = max(isort[i][MAX], isort[i + 1][MAX])
            isort[i] = None
            isort[i + 1] = (vmin, vmax)
    return [r for r in isort if r is not None]
List1 = [(1,4), (8,10), (19,25), (10,13), (14,16), (25,30)]
List2 = [(1, 4), (4, 8), (8, 10)]
print(normalize(List1))
print(normalize(List2))
#[(1, 4), (8, 13), (14, 16), (19, 30)]
#[(1, 10)]
The following should work. It breaks tuples into individual numbers, then finds the tuple bound on each cluster. This should work even with difficult overlaps, like [(4, 10), (9, 12)]
It's a very simple fix.
# First turn your list of tuples into a list of numbers:
my_list = []
for item in List: my_list = my_list + [i for i in range(item[0], item[1]+1)]
# Then create tuple pairs:
output = []
a = False
for x in range(max(my_list)+1):
    if (not a) and (x in my_list): a = x
    if (a) and (x+1 not in my_list):
        output.append((a, x))
        a = False
print(output)

how to find the max number of items in a list such that certain pairs are not together in the output?

I have a list of numbers
l = [1,2,3,4,5]
and a list of tuples which describe which items should not be in the output together.
gl_distribute = [(1, 2), (1,4), (1, 5), (2, 3), (3, 4)]
the possible lists are
[1,3]
[2,4,5]
[3,5]
and I want my algorithm to give me the second one [2,4,5]
I was thinking to do it recursively.
In the first case (t1) I call my recursive algorithm with all the items except the 1st, and in the second case (t2) I call it again removing the pairs from gl_distribute where the 1st item appears.
Here is my algorithm
def check_distribute(items, distribute):
    i = sorted(items[:])
    d = distribute[:]
    if not i:
        return []
    if not d:
        return i
    if len(remove_from_distribute(i, d)) == len(d):
        return i
    first = i[0]
    rest = items[1:]
    distr_without_first = remove_from_distribute([first], d)
    t1 = check_distribute(rest, d)
    t2 = check_distribute(rest, distr_without_first)
    t2.append(first)
    if len(t1) >= len(t2):
        return t1
    else:
        return t2
The remove_from_distribute(items, distr_list) removes the pairs from distr_list that include any of the items in items.
def remove_from_distribute(items, distribute_list):
    new_distr = distribute_list[:]
    for item in items:
        for pair in distribute_list:
            x, y = pair
            if x == item or y == item and pair in new_distr:
                new_distr.remove((x,y))
    if new_distr:
        return new_distr
    else:
        return []
My output is [4, 5, 3, 2, 1] which obviously is not correct. Can you tell me what I am doing wrong here? Or can you give me a better way to approach this?
I will suggest an alternative approach.
This assumes your list and your distribution are both sorted, with the list of length n and the distribution of length m.
First, create a list of 2-tuples with all valid combinations. This should be an O(n^2) solution.
Once you have the list, simply loop through the valid combinations and find the longest list. There are probably better solutions that reduce the complexity further.
Here is my sample code:
def get_valid():
    seq = [1, 2, 3, 4, 5]
    gl_dist = [(1, 2), (1,4), (1, 5), (2, 3), (3, 4)]
    gl_index = 0
    valid = []
    for i in range(len(seq)):
        for j in range(i+1, len(seq)):
            if gl_index < len(gl_dist):
                if (seq[i], seq[j]) != gl_dist[gl_index] :
                    valid.append((seq[i], seq[j]))
                else:
                    gl_index += 1
            else:
                valid.append((seq[i], seq[j]))
    return valid
>>> get_valid()
[(1, 3), (2, 4), (2, 5), (3, 5), (4, 5)]
def get_list():
    total = get_valid()
    start = total[0][0]
    result = [start]
    for i, j in total:
        if i == start:
            result.append(j)
        else:
            start = i
            return_result = list(result)
            result = [i, j]
            yield return_result
    yield list(result)
    # the generator simply ends here; an explicit raise StopIteration
    # is a RuntimeError in Python 3.7+
>>> list(get_list())
[[1, 3], [2, 4, 5], [3, 5], [4, 5]]
I am not sure I fully understand your output as I think 4,5 and 5,2 should be possible lists as they are not in the list of tuples:
If so you could use itertools to get the combinations and filter based on the gl_distribute list using sets to see if any two numbers in the different combinations in combs contains two elements that should not be together, then get the max
from itertools import combinations
combs = (combinations(l,r) for r in range(2,len(l)))
final = []
for x in combs:
    final += x
res = max(filter(lambda x: not any(len(set(x).intersection(s)) == 2 for s in gl_distribute),final),key=len)
print(res)
(2, 4, 5)
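As a variation on the same brute-force idea, here is a sketch that tries subset sizes from largest to smallest and stops at the first subset that contains no forbidden pair, so it never materializes all combinations. The function name largest_compatible_subset is made up for illustration, and the search is still exponential in the worst case, so it only suits short lists:

```python
from itertools import combinations


def largest_compatible_subset(items, forbidden_pairs):
    """Return one largest subset of items containing no forbidden pair."""
    forbidden = [set(pair) for pair in forbidden_pairs]
    # Try the biggest subsets first so the first hit is a maximum.
    for size in range(len(items), 0, -1):
        for combo in combinations(items, size):
            chosen = set(combo)
            # pair <= chosen is True when both members of the pair are chosen
            if not any(pair <= chosen for pair in forbidden):
                return list(combo)
    return []


l = [1, 2, 3, 4, 5]
gl_distribute = [(1, 2), (1, 4), (1, 5), (2, 3), (3, 4)]
print(largest_compatible_subset(l, gl_distribute))
# [2, 4, 5]
```

Unlike the range(2, len(l)) loop, this also considers the full list and single-item subsets.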

Python: determine length of sequence of equal items in list

I have a list as follows:
l = [0,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,2,2,2]
I want to determine the length of a sequence of equal items, i.e for the given list I want the output to be:
[(0, 6), (1, 6), (0, 4), (2, 3)]
(or a similar format).
I thought about using a defaultdict but it counts the occurrences of each item and accumulates it for the entire list, since I cannot have more than one key '0'.
Right now, my solution looks like this:
out = []
cnt = 0
last_x = l[0]
for x in l:
    if x == last_x:
        cnt += 1
    else:
        out.append((last_x, cnt))
        cnt = 1
    last_x = x
out.append((last_x, cnt))
print(out)
I am wondering if there is a more pythonic way of doing this.
You almost surely want to use itertools.groupby:
import itertools
l = [0,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,2,2,2]
answer = []
for key, group in itertools.groupby(l):
    answer.append((key, len(list(group))))
# answer is [(0, 6), (1, 6), (0, 4), (2, 3)]
If you want to make it more memory efficient, yet add more complexity, you can add a length function:
def length(l):
    if hasattr(l, '__len__'):
        return len(l)
    else:
        i = 0
        for _ in l:
            i += 1
        return i
l = [0,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,2,2,2]
answer = []
for key, iter in itertools.groupby(l):
    answer.append((key, length(iter)))
# answer is [(0, 6), (1, 6), (0, 4), (2, 3)]
Note though that I have not benchmarked the length() function, and it's quite possible it will slow you down.
Mike's answer is good, but the itertools._grouper returned by groupby will never have a __len__ method, so there is no point testing for it.
I use sum(1 for _ in i) to get the length of the itertools._grouper:
>>> import itertools as it
>>> L = [0,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,2,2,2]
>>> [(k, sum(1 for _ in i)) for k, i in it.groupby(L)]
[(0, 6), (1, 6), (0, 4), (2, 3)]
