Detect ranges in Python

Detect ranges in Python - python

I'm trying to solve this exercise in my coursework:
Create a function named detect_ranges that gets a list of integers as a parameter.
The function should then sort this list, and transform the list into another list where pairs are used for all the detected intervals.
So 3,4,5,6 is replaced by the pair (3,7).
Numbers that are not part of any interval result just single numbers.
The resulting list consists of these numbers and pairs, separated by commas. An example of how this function works:
print(detect_ranges([2,5,4,8,12,6,7,10,13]))
[2,(4,9),10,(12,14)]
I couldn't comprehend the exercise topic and can't think of how I can detect range. Do you guys have any hints or tips?

Another way of doing this. Although this method will not be as efficient as the other one, but since its an exercise, it will be easier to follow.
I have used zip function in python to do some stuff I explained below, you can check it here to know more about it.
1. First sort the list data, so you get: [2, 4, 5, 6, 7, 8, 10, 12, 13]
2. Then find the differences of increasing values in list. Like (4-2),(5-4), .. If the difference is <=1, then it will be part of a range:
(Also, insert a 0 in the front, just to account for the 1st element and make the obtained list's length equal to original list)
>>> diff = [j-i for i, j in zip(lst[:-1], lst[1:])]
>>> diff.insert(0, 0)
>>> diff
[0, 2, 1, 1, 1, 1, 2, 2, 1]
3. Now get positions in above list where difference is >= 2. This is to detect the ranges:
(Again, insert a 0 in the front, just to account for the 1st element, and make sure it gets picked in range detection)
>>> ind = [i for i,v in enumerate(diff) if v >= 2]
>>> ind.insert(0, 0)
>>> ind
[0, 1, 6, 7]
So the ranges are 0 to 1, 1 to 6, and 6 to 7 in your original list.
4. Group the elements together that will form ranges, using the ind list obtained:
>>> groups = [lst[i:j] for i,j in zip(ind, ind[1:]+[None])]
>>> groups
[[2], [4, 5, 6, 7, 8], [10], [12, 13]]
5. Finally obtain your desired ranges:
>>> ranges = [(i[0],i[-1]+1) if len(i)>1 else i[0] for i in groups]
>>> ranges
[2, (4, 9), 10, (12, 14)]
Putting it all in a function detect_ranges:
def detect_ranges(lst):
lst = sorted(lst)
diff = [j-i for i, j in zip(lst[:-1], lst[1:])]
diff.insert(0, 0)
ind = [i for i,v in enumerate(diff) if v >= 2]
ind.insert(0, 0)
groups = [lst[i:j] for i,j in zip(ind, ind[1:]+[None])]
ranges = [(i[0],i[-1]+1) if len(i)>1 else i[0] for i in groups]
return ranges
Examples:
>>> lst = [2,6,1,9,3,7,12,45,46,13,90,14,92]
>>> detect_ranges(lst)
[(1, 4), (6, 8), 9, (12, 15), (45, 47), 90, 92]
>>> lst = [12,43,43,11,4,3,6,6,9,9,10,78,32,23,22,98]
>>> detect_ranges(lst)
[(3, 5), (6, 7), (9, 13), (22, 24), 32, (43, 44), 78, 98]

Iterate through the elements and save the start of each interval.
def detect_ranges(xs):
it = iter(xs)
try:
start = next(it)
except StopIteration:
return
prev = start
for x in it:
if prev + 1 != x:
yield start, prev + 1
start = x
prev = x
yield start, prev + 1
Usage:
>>> xs = [2, 4, 5, 6, 7, 8, 10, 12, 13]
>>> ranges = list(detect_ranges(xs))
>>> ranges
[(2, 3), (4, 9), (10, 11), (12, 14)]
If you want to reduce single item intervals like (2, 3) to 2, you can do:
>>> ranges = [a if a + 1 == b else (a, b) for a, b in ranges]
>>> ranges
[2, (4, 9), 10, (12, 14)]

Related

How to find a variable sublist in a list with conditions?

I am trying to figure out the following problem. I have a list with integers.
list = [1, 2, 3, 5, 6, 9, 10]
The goal is to find the longest sub-list within the list. The sub-list is defined by having the difference between two integers not being more than 1 (or -1). In this example, the longest sub-list respecting this condition is:
lista = [1, 2, 3, 5, 6, 9, 10]
difference = []
i = 0
for number in range(len(lista)-1):
diff = lista[i]-lista[i+1]
difference.append(diff)
i += 1
print(difference)
winner = 0
ehdokas = 0
for a in difference:
if a == 1 or a == -1:
ehdokas += 1
else:
if ehdokas > winner:
winner = ehdokas
ehdokas = 0
if ehdokas > winner:
winner = ehdokas
print(winner)
Now, the "print(winner)" will print "2" whereas I wish that it would print "3" since the first three integers are "adjacent" to each other (1-2 = -1 , 2-3 = -1)
Basically I am trying to iterate through the list and calculate the difference between the adjacent integers and the calculate the consecutive number of "1" and "-1" in the "difference" list.
This code works sometimes, depending on the list fed through, sometimes it doesn't. Any improvement proposals would be highly appreciated.

Given:
lista = [1, 2, 3, 5, 6, 9, 10]
You can construct a new list of tuples that have the index and difference in the tuple:
diffs=[(i,f"{lista[i]}-{lista[i+1]}={lista[i]-lista[i+1]}",lista[i]-lista[i+1])
for i in range(len(lista)-1)]
>>> m
[(0, '1-2=-1', -1), (1, '2-3=-1', -1), (2, '3-5=-2', -2), (3, '5-6=-1', -1), (4, '6-9=-3', -3), (5, '9-10=-1', -1)]
Given a list like that, you can use groupby, max to find the longest length of the sub lists that satisfy that condition:
from itertools import groupby
lista = [1, 2, 3, 5, 6, 9, 10]
m=max((list(v) for k,v in groupby(
((i,lista[i]-lista[i+1]) for i in range(len(lista)-1)),
key=lambda t: t[1] in (-1,0,1)) if k),key=len)
>>> m
[(0, -1), (1, -1)]

A nice and simple solution based on more_itertools:
#!pip install more_itertools
l = [1, 2, 3, 5, 6, 9, 10]
import more_itertools as mit
sublists = []
for group in mit.consecutive_groups(l):
sublists.append(list(group))
max(sublists, key=len)
This outputs:
[1,2,3]
Which is the longest sublist of consecutive numbers.

Here's a solution without the use of libraries:
l = [1, 2, 3, 4, 10, 11, 20, 19, 18, 17, 16, 30, 29, 28, 27, 40, 41]
r = []
if len(l) > 1:
t = [l[0]]
for i in range(0, len(l)-1):
if abs(l[i]-l[i+1]) == 1:
t.append(l[i+1])
if len(t) > len(r):
r = t.copy()
else:
t.clear()
t.append(l[i+1])
print(r)
Which will print:
[20, 19, 18, 17, 16]

You can get differences between items and their predecessor with zip(). This will allow you to generate a list of break positions (the indexes of items that cannot be combined with their predecessor). Using zip on these breaking positions will allow you to get the start and end indexes of subsets of the list that form groups of consecutive "compatible" items. The difference between start and end is the size of the corresponding group.
L = [1, 2, 3, 5, 6, 9, 10]
breaks = [i for i,(a,b) in enumerate(zip(L,L[1:]),1) if abs(a-b)>1 ]
winner = max( e-s for s,e in zip([0]+breaks,breaks+[len(L)]) )
print(winner) # 3
If you want to see how the items are grouped, you can use the start/end indexes to get the subsets:
[ L[s:e] for s,e in zip([0]+breaks,breaks+[len(L)]) ]
[[1, 2, 3], [5, 6], [9, 10]]

Removing groups of duplicate elements in a list, but keeping the first in Python. Two Lists

Basically I have two lists (lengths around 4000). one has integers that represent state, the other has time values. I have groups of repeating integers in the state list that I want to remove, but keep the first, for each grouping. At the same time for any element removed in the state list I want to remove the element of the same index in the time list. I cannot use dictionaries.
(first time on this site please forgive me if i do this wrong)
This is the code I have tried so far, it cut my list in half, but I still have repeating states.
for i in range (len(state)):
if state[i] == state[i-1]:
state[i] = 0
tt_time[i] = 0
while 0 in state:
state.remove(0)
while 0 in tt_time:
tt_time.remove(0)
Example of what I want:
[4,4,4,5,5,5,4,4,3,3,5,5] => [4,5,4,3,5] (for state list)
at the same time:
[1,2,3,4,5,6,7,8,9,10,11,12] => [1,4,7,8,11] (for time list)
please note both lists are the same length

I would use groupby in this case:
from itertools import groupby
state = [4,4,4,5,5,5,4,4,3,3,5,5]
time = [1,2,3,4,5,6,7,8,9,10,11,12]
res1 = []
res2 = [time[0]]
for k, v in groupby(state):
res1.append(k)
res2.append(res2[-1] + len(list(v)))
res2.pop()
which produces:
# res1 -> [4, 5, 4, 3, 5]
# res2 -> [1, 4, 7, 9, 11]

Using zip
Ex:
state = [4,4,4,5,5,5,4,4,3,3,5,5]
time = [1,2,3,4,5,6,7,8,9,10,11,12]
result_state = []
result_time = []
for s, t in zip(state, time): #Iterate both lists
if not result_state: #Check if result lists are empty.
result_state.append(s)
result_time.append(t)
else:
if result_state[-1] != s: #Check if last element in result is not same as s
result_state.append(s)
result_time.append(t)
print(result_state)
print(result_time)
Output:
[4, 5, 4, 3, 5]
[1, 4, 7, 9, 11]

You can pair adjacent items of the state list by zipping it with itself but with a padding of a item different from the first item, so that you can use a list comprehension to filter items that are the same as the adjacent items. Zip the list with the tt_time list to pair the result with items in tt_time:
states, times = map(list, zip(*((a, t) for ((a, b), t) in zip(zip(state, [state[0] + 1] + state), tt_time) if a != b)))
states becomes:
[4, 5, 4, 3, 5]
times becomes:
[1, 4, 7, 9, 11]

Another solution using itertools.groupby:
from itertools import groupby
from operator import itemgetter
l1 = [4,4,4,5,5,5,4,4,3,3,5,5]
l2 = [1,2,3,4,5,6,7,8,9,10,11,12]
grouped = list(map(itemgetter(0), (list(g) for _, g in groupby(zip(l1, l2), key=itemgetter(0)))))
# [(4, 1), (5, 4), (4, 7), (3, 9), (5, 11)]
print(list(map(itemgetter(0), grouped)))
# [4, 5, 4, 3, 5]
print(list(map(itemgetter(1), grouped)))
# [1, 4, 7, 9, 11]

Indices of intersection of lists

Given two lists of equal length:
_list = [1, 4, 8, 7, 3, 15, 5, 0, 6]
_list2 = [7, 4, 0, 1, 5, 5, 7, 2, 2]
How do I try getting an output like this:
output = [(0,3), (1,1), (3,0), (6,4), (6,5), (7,2)]
Here the intersection of two lists are obtained and the common elements' indices are arranged in the list:
output = list of (index of an element in _list, where it appears in _list2)
Trying intersection with sets is not an option since the set removes the repeating elements.

Basic-Intermediate: As a generator:
def find_matching_indices(a, b):
for i, x in enumerate(a):
for j, y in enumerate(b):
if x == y:
yield i, j
list(find_matching_indices(list1_, list2_))
# [(0, 3), (1, 1), (3, 0), (3, 6), (6, 4), (6, 5), (7, 2)]
Basic-Intermediate: As a list comprehension:
[(i, j) for i, x in enumerate(list1_) for j, y in enumerate(list2_) if x == y]
# [(0, 3), (1, 1), (3, 0), (3, 6), (6, 4), (6, 5), (7, 2)]
These solutions involve two loops.
Intermediate-Advanced: For fun, a dictionary is another data structure you might consider:
import collections as ct
import more_itertools as mit
def locate_indices(a, b):
"""Return a dictionary of `a` index keys found at `b` indices."""
dd = ct.defaultdict(list)
for i, y in enumerate(a):
idxs = list(mit.locate(b, lambda z: z == y))
if idxs: dd[i].extend(idxs)
return dd
locate_indices(list1_, list2_)
# defaultdict(list, {0: [3], 1: [1], 3: [0, 6], 6: [4, 5], 7: [2]})
Note the index of list a is the key in the dictionary. All indices in list b that share the same value are appended.
A defaultdict was used since it is helpful in building dictionaries with list values. See more on the third-party tool more_itertools.locate(), which simply yields all indices that satisfy the lambda condition - an item in list a is also found in b.

from itertools import product
from collections import defaultdict
def mathcing_indices(*lists):
d = defaultdict(lambda: tuple([] for _ in range(len(lists))))
for l_idx, l in enumerate(lists):
for i, elem in enumerate(l):
d[elem][l_idx].append(i)
return sorted([tup for _, v in d.items() for tup in product(*v)])
This solution builds a dictionary that tracks the indices that values appear at in the input lists. So if the value 5 appears at indices 0 and 2 of the first list and index 3 of the second, the value for 5 in the dictionary would be ([0, 2], [3])
It then uses itertools.product to build all the combinations of those indices.
This looks more complicated than the other answers here, but because it is O(nlogn) and not O(n**2) it is significantly faster, especially for large inputs. Two length 1000 lists of random numbers 0-1000 complete 100 tests in ~.4 seconds using the above algorithm and 6-13 seconds using some of the others here

Here is a solution that runs in O(n log n):
ind1 = numpy.argsort(_list)
ind2 = numpy.argsort(_list2)
pairs = []
i = 0
j = 0
while i<ind1.size and j<ind2.size:
e1 = _list[ind1[i]]
e2 = _list2[ind2[j]]
if e1==e2:
pairs.append((ind1[i],ind2[j]))
i = i + 1
j = j + 1
elif e1<e2:
i = i +1
elif e2<e1:
j = j + 1
print(pairs)

group clusters of numbers in array

I have an array like:
A = [1,3,8,9,3,7,2,1,3,9,6,8,3,8,8,1,2]
And I want to count the number of "entry clusters" that are >5. In this case the result should be 4, because:
[1, 3, (8,9), 3, (7), 2, 1, 3, (9,6,8), 3, (8,8), 1, 2]
Given L length of the array, I can do:
A = [1,3,8,9,3,7,2,1,3,9,6,8,3,8,8,1,2]
A = np.array(A)
for k in range(0,L):
if A[k]>5:
print k, A[k]
and this gives me all entries greater than 5. But how could I group every cluster of numbers?

You could use the groupby function from itertools.
from itertools import groupby
A = [1,3,8,9,3,7,2,1,3,9,6,8,3,8,8,1,2]
result = [tuple(g) for k, g in groupby(A, lambda x: x > 5) if k]
print(result)
# [(8, 9), (7,), (9, 6, 8), (8, 8)]
print(len(result))
# 4

Detecting consecutive integers in a list [duplicate]

This question already has answers here:
Identify groups of consecutive numbers in a list
(19 answers)
Closed 4 years ago.
I have a list containing data as such:
[1, 2, 3, 4, 7, 8, 10, 11, 12, 13, 14]
I'd like to print out the ranges of consecutive integers:
1-4, 7-8, 10-14
Is there a built-in/fast/efficient way of doing this?

From the docs:
>>> from itertools import groupby
>>> from operator import itemgetter
>>> data = [ 1, 4,5,6, 10, 15,16,17,18, 22, 25,26,27,28]
>>> for k, g in groupby(enumerate(data), lambda (i, x): i-x):
... print map(itemgetter(1), g)
...
[1]
[4, 5, 6]
[10]
[15, 16, 17, 18]
[22]
[25, 26, 27, 28]
You can adapt this fairly easily to get a printed set of ranges.

A short solution that works without additional imports. It accepts any iterable, sorts unsorted inputs, and removes duplicate items:
def ranges(nums):
nums = sorted(set(nums))
gaps = [[s, e] for s, e in zip(nums, nums[1:]) if s+1 < e]
edges = iter(nums[:1] + sum(gaps, []) + nums[-1:])
return list(zip(edges, edges))
Example:
>>> ranges([2, 3, 4, 7, 8, 9, 15])
[(2, 4), (7, 9), (15, 15)]
>>> ranges([-1, 0, 1, 2, 3, 12, 13, 15, 100])
[(-1, 3), (12, 13), (15, 15), (100, 100)]
>>> ranges(range(100))
[(0, 99)]
>>> ranges([0])
[(0, 0)]
>>> ranges([])
[]
This is the same as #dansalmo's solution which I found amazing, albeit a bit hard to read and apply (as it's not given as a function).
Note that it could easily be modified to spit out "traditional" open ranges [start, end), by e.g. altering the return statement:
return [(s, e+1) for s, e in zip(edges, edges)]

This will print exactly as you specified:
>>> nums = [1, 2, 3, 4, 7, 8, 10, 11, 12, 13, 14]
>>> ranges = sum((list(t) for t in zip(nums, nums[1:]) if t[0]+1 != t[1]), [])
>>> iranges = iter(nums[0:1] + ranges + nums[-1:])
>>> print ', '.join([str(n) + '-' + str(next(iranges)) for n in iranges])
1-4, 7-8, 10-14
If the list has any single number ranges, they would be shown as n-n:
>>> nums = [1, 2, 3, 4, 5, 7, 8, 9, 12, 15, 16, 17, 18]
>>> ranges = sum((list(t) for t in zip(nums, nums[1:]) if t[0]+1 != t[1]), [])
>>> iranges = iter(nums[0:1] + ranges + nums[-1:])
>>> print ', '.join([str(n) + '-' + str(next(iranges)) for n in iranges])
1-5, 7-9, 12-12, 15-18

Built-In: No, as far as I'm aware.
You have to run through the array. Start off with putting the first value in a variable and print it, then as long as you keep hitting the next number do nothing but remember the last number in another variable. If the next number is not in line, check the last number remembered versus the first number. If it's the same, do nothing. If it's different, print "-" and the last number. Then put the current value in the first variable and start over.
At the end of the array you run the same routine as if you had hit a number out of line.
I could have written the code, of course, but I don't want to spoil your homework :-)

I had a similar problem and am using the following for a sorted list. It outputs a dictionary with ranges of values listed in a dictionary. The keys separate each run of consecutive numbers and are also the running total of non-sequential items between numbers in sequence.
Your list gives me an output of {0: [1, 4], 1: [7, 8], 2: [10, 14]}
def series_dictf(index_list):
from collections import defaultdict
series_dict = defaultdict(list)
sequence_dict = dict()
list_len = len(index_list)
series_interrupts = 0
for i in range(list_len):
if i == (list_len - 1):
break
position_a = index_list[i]
position_b = index_list[i + 1]
if position_b == (position_a + 1):
sequence_dict[position_a] = (series_interrupts)
sequence_dict[position_b] = (series_interrupts)
if position_b != (position_a + 1):
series_interrupts += 1
for position, series in sequence_dict.items():
series_dict[series].append(position)
for series, position in series_dict.items():
series_dict[series] = [position[0], position[-1]]
return series_dict

Using set operation, the following algorithm can be executed
def get_consecutive_integer_series(integer_list):
integer_list = sorted(integer_list)
start_item = integer_list[0]
end_item = integer_list[-1]
a = set(integer_list) # Set a
b = range(start_item, end_item+1)
# Pick items that are not in range.
c = set(b) - a # Set operation b-a
li = []
start = 0
for i in sorted(c):
end = b.index(i) # Get end point of the list slicing
li.append(b[start:end]) # Slice list using values
start = end + 1 # Increment the start point for next slicing
li.append(b[start:]) # Add the last series
for sliced_list in li:
if not sliced_list:
# list is empty
continue
if len(sliced_list) == 1:
# If only one item found in list
yield sliced_list[0]
else:
yield "{0}-{1}".format(sliced_list[0], sliced_list[-1])
a = [1, 2, 3, 6, 7, 8, 4, 14, 15, 21]
for series in get_consecutive_integer_series(a):
print series
Output for the above list "a"
1-4
6-8
14-15
21

Here is another basic solution without using any module, which is good for interview, generally in the interview they asked without using any modules:
#!/usr/bin/python
def split_list(n):
"""will return the list index"""
return [(x+1) for x,y in zip(n, n[1:]) if y-x != 1]
def get_sub_list(my_list):
"""will split the list base on the index"""
my_index = split_list(my_list)
output = list()
prev = 0
for index in my_index:
new_list = [ x for x in my_list[prev:] if x < index]
output.append(new_list)
prev += len(new_list)
output.append([ x for x in my_list[prev:]])
return output
my_list = [1, 3, 4, 7, 8, 10, 11, 13, 14]
print get_sub_list(my_list)
Output:
[[1], [3, 4], [7, 8], [10, 11], [13, 14]]

You can use collections library which has a class called Counter. Counter can come in handy if trying to poll the no of distinct elements in any iterable
from collections import Counter
data = [ 1, 4,5,6, 10, 15,16,17,18, 22, 25,26,27,28]
cnt=Counter(data)
print(cnt)
the output for this looks like
Counter({1: 1, 4: 1, 5: 1, 6: 1, 10: 1, 15: 1, 16: 1, 17: 1, 18: 1, 22: 1, 25: 1, 26: 1, 27: 1, 28: 1})
which just like any other dictionary can be polled for key values

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Detect ranges in Python - python

Related

How to find a variable sublist in a list with conditions?

Removing groups of duplicate elements in a list, but keeping the first in Python. Two Lists

Indices of intersection of lists

group clusters of numbers in array

Detecting consecutive integers in a list [duplicate]

Categories

Resources