Peak finder in Python in O(log n) complexity - python

I'm completely new to Python, thus the question. I'm trying to solve a standard interview question, which is finding a peak in an array. A peak is defined as a number which is greater than its left and right neighbors. I'm trying to find the largest such peak.
This is my code:
def main():
    arr = [7, 12, 13, 8, 2, 16, 24, 11, 5, 1]
    print(find_peak(arr))

def find_peak(arr):
    return _find_peak(arr, 0, len(arr))

def _find_peak(arr, start, stop):
    mid = (start + stop) // 2
    if arr[mid] > arr[mid - 1] and arr[mid] > arr[mid + 1]:
        return arr[mid]
    elif arr[mid] < arr[mid - 1]:
        _find_peak(arr, 0, mid - 1)
    elif arr[mid] < arr[mid + 1]:
        _find_peak(arr, mid + 1, stop)

if __name__ == '__main__':
    main()
The output of this program is None, whereas the expected output is 24. Any help appreciated.

Data
arr = [7, 12, 13, 8, 2, 16, 24, 11, 5, 1]
A one-liner:
One line should be enough:
max_peak = max(x2 for x1, x2, x3 in zip(arr, arr[1:], arr[2:]) if x1 < x2 > x3)
In a loop
Maybe easier to understand when you are new to Python:
peak = float('-inf')
for x1, x2, x3 in zip(arr, arr[1:], arr[2:]):
    if x1 < x2 > x3:
        peak = max(peak, x2)
print(peak)
Output:
24
All peaks
You can also use a one-liner to get all peaks:
>>> [x2 for x1, x2, x3 in zip(arr, arr[1:], arr[2:]) if x1 < x2 > x3]
[13, 24]
and get the greatest one with max() on the result.
Explanation
Let's have a look at some of the components of the solution. I am working with Python 3 here, as everybody should. ;)
You can slice lists.
>>> arr = [7, 12, 13, 8, 2, 16, 24, 11, 5, 1]
This gives you all of the list but the first element:
>>> arr[1:]
[12, 13, 8, 2, 16, 24, 11, 5, 1]
Here it starts with the third element:
>>> arr[2:]
[13, 8, 2, 16, 24, 11, 5, 1]
The zip() function zips multiple sequences together. To visualize what happens, you can convert the zip object into a list:
>>> list(zip(arr, arr[1:], arr[2:]))
[(7, 12, 13),
(12, 13, 8),
(13, 8, 2),
(8, 2, 16),
(2, 16, 24),
(16, 24, 11),
(24, 11, 5),
(11, 5, 1)]
Python supports tuple unpacking. This allows you to assign individual names to all members of a tuple:
>>> x1, x2, x3 = (7, 12, 13)
>>> x1
7
>>> x2
12
>>> x3
13
Another nice feature is the comparison of more than two objects:
>>> 10 < 12 > 8
True
This is equivalent to:
>>> (10 < 12) and (12 > 8)
True
Python offers list comprehensions:
>>> [x * 2 for x in range(2, 6)]
[4, 6, 8, 10]
Generator expressions work in a similar way, but they produce an iterator instead of a list, so they can be consumed without using lots of memory:
>>> sum(x * 2 for x in range(2, 6))
28

You are missing a return statement in your two elif branches, so the result of each recursive call is discarded and the function returns None.
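With the returns in place, the recursion could look like the sketch below. This is not the asker's exact code: it assumes inclusive start/stop bounds, treats the first and last elements as peaks when they exceed their single neighbor, and only guarantees some local peak in O(log n), not the largest one. Finding the largest peak still requires looking at every element.

```python
def find_peak(arr):
    """Return some local peak of arr (not necessarily the largest one)."""
    if not arr:
        raise ValueError("arr must not be empty")
    return _find_peak(arr, 0, len(arr) - 1)


def _find_peak(arr, start, stop):
    # start and stop are inclusive bounds of the search window.
    if start == stop:
        return arr[start]
    mid = (start + stop) // 2
    if arr[mid] < arr[mid + 1]:
        # Values rise to the right, so a peak exists in mid+1..stop.
        return _find_peak(arr, mid + 1, stop)
    # arr[mid] >= arr[mid + 1], so a peak exists in start..mid.
    return _find_peak(arr, start, mid)


print(find_peak([7, 12, 13, 8, 2, 16, 24, 11, 5, 1]))  # 24
print(find_peak([1, 9, 1, 0, 5, 0, -1]))               # 5, although 9 is the largest peak
```

The second call shows why this approach cannot answer the "largest peak" question: each step discards half of the list without inspecting it.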

I think 13 is also a peak (greater than 12 and 8).
Try this approach:
def main():
    arr = [7, 12, 13, 8, 2, 16, 24, 11, 5, 1]
    print(find_peaks(arr))

def find_peaks(arr):
    return list(_search(arr))

def _search(arr):
    last = len(arr) - 1
    for i, e in enumerate(arr):
        if not any((i > 0 and arr[i-1] > e, i < last and arr[i+1] > e)):
            yield e

if __name__ == '__main__':
    main()
If you don’t understand anything, ask!

Another approach – using only one function:
def main():
    arr = [7, 12, 13, 8, 2, 16, 24, 11, 5, 1]
    print(find_peaks(arr))

def find_peaks(arr):
    last = len(arr) - 1
    return [
        e for i, e in enumerate(arr)
        if not any((i > 0 and arr[i-1] > e, i < last and arr[i+1] > e))
    ]

if __name__ == '__main__':
    main()

I don't think you can find the largest peak in O(log N) time, because by definition the items are not in order, and there is no way to predict the peaky-ness of any item in a list from the other items. The most you learn from comparing item N with item N+1 is which of the two might still be a peak. That gets you to N/2 compares, which must then be followed by N/2 more compares to check the other side of each candidate.
Here's a local_maxima(iterable) function that you can use with max() to find peaks. It treats start/end elements as peaks if they are greater than their one neighbor.
data = [7, 12, 13, 8, 2, 16, 24, 11, 5, 1, None, 2, None, 3, 4, None, 5, 1, None]
firstpeak = [12, 7, 9, 8]
lastpeak = [1, 2, 3, 4]
def local_maxima(it):
    """Find local maxima in iterable _it_. Compares with None using
    `is (not) None`, and using operator `<`."""
    peaking = False
    last = None
    for item in it:
        # Deal with last item peaking
        if peaking and (item is None or item < last):
            yield last
            peaking = False
        elif item is None:
            peaking = False
        elif last is None or last < item:
            peaking = True
        else:
            peaking = False
        last = item
    if peaking:
        yield last
print([x for x in local_maxima(data)])
print("Highest:", max(local_maxima(data)))
print([x for x in local_maxima(firstpeak)])
print("Highest:", max(local_maxima(firstpeak)))
print([x for x in local_maxima(lastpeak)])
print("Highest:", max(local_maxima(lastpeak)))

Related

Using list comp to print pairs(x,y) which add up to a certain number,. Like if i enter 5 as input,then i should get the output as [[1,4],[2,3],[3,2]] [duplicate]

I have a list of numbers, e.g.
numbers = [1, 2, 3, 7, 7, 9, 10]
As you can see, numbers may appear more than once in this list.
I need to get all combinations of these numbers that have a given sum, e.g. 10.
The items in the combinations may not be repeated, but each item in numbers has to be treated uniquely, that means e.g. the two 7 in the list represent different items with the same value.
The order is unimportant, so that [1, 9] and [9, 1] are the same combination.
There are no length restrictions for the combinations, [10] is as valid as [1, 2, 7].
How can I create a list of all combinations meeting the criteria above?
In this example, it would be [[1,2,7], [1,2,7], [1,9], [3,7], [3,7], [10]]
You could use itertools to iterate through every combination of every possible size, and filter out everything that doesn't sum to 10:
import itertools
numbers = [1, 2, 3, 7, 7, 9, 10]
target = 10
result = [seq for i in range(len(numbers), 0, -1)
          for seq in itertools.combinations(numbers, i)
          if sum(seq) == target]
print(result)
Result:
[(1, 2, 7), (1, 2, 7), (1, 9), (3, 7), (3, 7), (10,)]
Unfortunately this is something like O(2^N) complexity, so it isn't suitable for input lists larger than, say, 20 elements.
The solution @kgoodrick offered is great, but I think it is more useful as a generator:
def subset_sum(numbers, target, partial=[], partial_sum=0):
    if partial_sum == target:
        yield partial
    if partial_sum >= target:
        return
    for i, n in enumerate(numbers):
        remaining = numbers[i + 1:]
        yield from subset_sum(remaining, target, partial + [n], partial_sum + n)
Output:
print(list(subset_sum([1, 2, 3, 7, 7, 9, 10], 10)))
# [[1, 2, 7], [1, 2, 7], [1, 9], [3, 7], [3, 7], [10]]
This question has been asked before, see @msalvadores' answer here. I updated the Python code given there to run in Python 3:
def subset_sum(numbers, target, partial=[]):
    s = sum(partial)
    # check if the partial sum is equal to the target
    if s == target:
        print("sum(%s)=%s" % (partial, target))
    if s >= target:
        return # if we reach the number why bother to continue
    for i in range(len(numbers)):
        n = numbers[i]
        remaining = numbers[i + 1:]
        subset_sum(remaining, target, partial + [n])

if __name__ == "__main__":
    subset_sum([3, 3, 9, 8, 4, 5, 7, 10], 15)
# Outputs:
# sum([3, 8, 4])=15
# sum([3, 5, 7])=15
# sum([8, 7])=15
# sum([5, 10])=15
#qasimalbaqali
This may not be exactly what the post is looking for, but if you wanted to:
find all combinations of exactly N elements taken from a list of numbers lst that sum up to K, use this:
# Python3 program to find all pairs in a list of integers with given sum
from itertools import combinations
def findPairs(lst, K, N):
    return [pair for pair in combinations(lst, N) if sum(pair) == K]
#monthly cost range; unique numbers
lst = list(range(10, 30))
#sum of annual revenue per machine/customer
K = 200
#number of months (12 - 9 = num months free)
N = 9
print('Possible monthly subscription costs that still equate to $200 per year:')
#print(findPairs(lst, K, N))
findPairs(lst,K,N)
Results:
Possible monthly subscription costs that still equate to $200 per year:
Out[27]:
[(10, 11, 20, 24, 25, 26, 27, 28, 29),
(10, 11, 21, 23, 25, 26, 27, 28, 29),
(10, 11, 22, 23, 24, 26, 27, 28, 29),
The idea/question behind this was "how much can we charge per month if we give x number of months free and still meet revenue targets".
This works...
from itertools import combinations
def SumTheList(thelist, target):
    arr = []
    p = []
    if len(thelist) > 0:
        for r in range(0,len(thelist)+1):
            arr += list(combinations(thelist, r))
        for item in arr:
            if sum(item) == target:
                p.append(item)
    return p
Append: including zero.
import random as rd
def combine(soma, menor, maior):
    """All combinations of 'n' sticks and '3' plus signs.
    seq = [menor, menor+1, ..., maior]
    menor = min(seq); maior = max(seq)"""
    lista = []
    while len(set(lista)) < 286: # This number is defined by the combination
        # of (sum + #summands - 1, #summands - 1) -> choose(13, 3)
        zero = rd.randint(menor, maior)
        if zero == soma and (zero, 0, 0, 0) not in lista:
            lista.append((zero, 0, 0, 0))
        else:
            # You can add more summands!
            um = rd.randint(0, soma - zero)
            dois = rd.randint(0, soma - zero - um)
            tres = rd.randint(0, soma - zero - um - dois)
            if (zero + um + dois + tres == soma and
                    (zero, um, dois, tres) not in lista):
                lista.append((zero, um, dois, tres))
    return sorted(lista)
>>> result_sum = 10
>>> combine(result_sum, 0, 10)
Output
[(0,0,0,10), (0,0,1,9), (0,0,2,8), (0,0,3,7), ...,
(9,1,0,0), (10,0,0,0)]

Checking if n elements in an array are increasing

I have written code for SPC and I am attempting to highlight certain out-of-control runs.
So I was wondering if there is a way to pick out runs of n (in my case 7) increasing elements in an array, so that I can color them red when I go to plot them.
This is what I attempted, but I obviously get an indexing error.
import numpy as np
import matplotlib.pyplot as plt
y = np.linspace(0,10,15)
x = np.array([1,2,3,4,5,6,7,8,9,1,4,6,4,6,8])
col =[]
for i in range(len(x)):
    if x[i]<x[i+1] and x[i+1]<x[i+2] and x[i+2]<x[i+3] and x[i+3]<x[i+4] and x[i+4]<x[i+5] and x[i+5]<x[i+6] and x[i+6]<x[i+7]:
        col.append('red')
    elif x[i]>x[i+1] and x[i+1]>x[i+2] and x[i+2]>x[i+3] and x[i+3]>x[i+4] and x[i+4]>x[i+5] and x[i+5]>x[i+6] and x[i+6]>x[i+7]:
        col.append('red')
    else:
        col.append('blue')
for i in range(len(x)):
    # plotting the corresponding x with y
    # and respective color
    plt.scatter(y[i], x[i], c = col[i], s = 10,
                linewidth = 0)
Any help would be greatly appreciated!
As Andy said in his comment, you get the index error because at i=8 the expression x[i+7] asks for index 15, which is the length of x and therefore one past the last valid index.
Either you loop only over len(x)-7 positions and repeat the last entry in col 7 times, or you could do something like this:
import numpy as np
import matplotlib.pyplot as plt
y = np.linspace(0,10,20)
x = np.array([1,2,3,4,5,6,1,2,3,1,0,-1,-2,-3,-4,-5,-6,4,5])
col =[]
diff = np.diff(x) # get diff to see if x inc + or dec - // len(x)-1
diff_sign = np.diff(np.sign(diff)) # get difference of the signs to get either 1 (true) or 0 (false) // len(x)-2
zero_crossings = np.where(diff_sign)[0] + 2 # get indices (-2 from len(x)-2) where a zero crossing occurs
diff_zero_crossings = np.diff(np.concatenate([[0],zero_crossings,[len(x)]])) # get how long the periods are till next zero crossing
for i in diff_zero_crossings:
    if i >= 6:
        for _ in range(i):
            col.append("r")
    else:
        for _ in range(i):
            col.append("b")
for i in range(len(x)):
    # plotting the corresponding x with y
    # and respective color
    plt.scatter(y[i], x[i], c = col[i], s = 10,
                linewidth = 0)
plt.show()
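The other option mentioned above, looping only over positions where a full window still fits, can be sketched without numpy. The helper name run_colors and its run_length parameter are made up for this illustration. Note that it colors every point of a qualifying run rather than repeating the last color, and it uses windows of 7 points as described in the question (the original loop compared 8).

```python
def run_colors(values, run_length=7):
    """Return 'red' for every point that belongs to a strictly increasing
    or strictly decreasing run of at least run_length points, else 'blue'."""
    colors = ['blue'] * len(values)
    # Only start a window where run_length points still fit,
    # so no index ever goes past the end of the list.
    for i in range(len(values) - run_length + 1):
        window = values[i:i + run_length]
        pairs = list(zip(window, window[1:]))
        increasing = all(a < b for a, b in pairs)
        decreasing = all(a > b for a, b in pairs)
        if increasing or decreasing:
            for j in range(i, i + run_length):
                colors[j] = 'red'
    return colors


x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 4, 6, 4, 6, 8]
print(run_colors(x))
```

For the x from the question, the first nine points come out red and the remaining six blue; the resulting list can be passed to plt.scatter point by point as in the original plotting loop.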
To determine if all integer elements of a list are ascending, you could do this:-
def ascending(arr):
    _rv = True
    for i in range(len(arr) - 1):
        if arr[i + 1] <= arr[i]:
            _rv = False
            break
    return _rv
a1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 8, 10, 11, 12, 13, 14, 16]
a2 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16]
print(ascending(a1))
print(ascending(a2))
If you want to limit the sequence of ascending values then you could just use nested loops. It may look inelegant but it's surprisingly efficient and much simpler than bringing dataframes into the mix:-
def ascending(arr, seq):
    for i in range(len(arr) - seq + 1):
        state = True
        for j in range(i, i + seq - 1):
            if arr[j] >= arr[j + 1]:
                state = False
                break
        if state:
            return True
    return False
a1 = [100, 99, 98, 6, 7, 8, 10, 11, 12, 13, 14, 13]
a2 = [9, 8, 7, 6, 5, 4, 3, 2, 1]
print(ascending(a1, 7))
print(ascending(a2, 7))

How to get the position range in a list for any given number?

I have a list [1, 2, 5, 6, 9, 10, 14, 19]. How can I get the index range that any given number falls into?
For example:
l = [1, 2, 5, 6, 9, 10, 14, 19]
value = 11
range_index = get_range_index(l)
range_index = (5, 6) # need like this
# given value = 11, I need the index pair (5, 6), because 10 < value < 14.
# the list may be very long; is there an efficient method?
This is my attempt: I find the left value recursively and then calculate the index from the returned left value.
It is not very good and does not perform well.
def get_left_point(self, data, value):
    if len(data) == 1:
        return data[0]
    mid_index, mid_value = len(data) // 2, data[len(data) // 2]
    if value >= float(mid_value):
        ret = self.get_left_point(data[mid_index:], value)
    else:
        ret = self.get_left_point(data[:mid_index], value)
    return ret
my_range = (100, 101)
[x for x in l if (my_range[0] <= x) and (x <= my_range[1])]
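For a long sorted list, the standard library's bisect module does the binary search without slicing or recursion. The sketch below is one possible get_range_index; the handling of values outside the list (returning None) and of values equal to an element (they fall into the pair to their left) are assumptions, since the question does not specify them.

```python
from bisect import bisect_left


def get_range_index(sorted_values, value):
    """Return (i, j) with sorted_values[i] < value <= sorted_values[j],
    or None if value lies outside the list. Assumes sorted_values is sorted."""
    pos = bisect_left(sorted_values, value)  # O(log n) binary search
    if pos == 0 or pos == len(sorted_values):
        return None  # value is not strictly inside the list's range
    return pos - 1, pos


l = [1, 2, 5, 6, 9, 10, 14, 19]
print(get_range_index(l, 11))  # (5, 6), because 10 < 11 < 14
```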

how to split an array or a list according to the values [duplicate]

I'd like to identify groups of consecutive numbers in a list, so that:
myfunc([2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20])
Returns:
[(2,5), (12,17), 20]
And was wondering what the best way to do this was (particularly if there's something inbuilt into Python).
Edit: Note I originally forgot to mention that individual numbers should be returned as individual numbers, not ranges.
EDIT 2: To answer the OP's new requirement
ranges = []
for key, group in groupby(enumerate(data), lambda (index, item): index - item):
    group = map(itemgetter(1), group)
    if len(group) > 1:
        ranges.append(xrange(group[0], group[-1]))
    else:
        ranges.append(group[0])
Output:
[xrange(2, 5), xrange(12, 17), 20]
You can replace xrange with range or any other custom class.
Python docs have a very neat recipe for this:
from operator import itemgetter
from itertools import groupby
data = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17]
for k, g in groupby(enumerate(data), lambda (i,x):i-x):
    print(map(itemgetter(1), g))
Output:
[2, 3, 4, 5]
[12, 13, 14, 15, 16, 17]
If you want to get the exact same output, you can do this:
ranges = []
for k, g in groupby(enumerate(data), lambda (i,x):i-x):
    group = map(itemgetter(1), g)
    ranges.append((group[0], group[-1]))
output:
[(2, 5), (12, 17)]
EDIT: The example is already explained in the documentation, but maybe I should explain it more:
The key to the solution is differencing with a range so that consecutive numbers all appear in same group.
If the data was: [2, 3, 4, 5, 12, 13, 14, 15, 16, 17]
Then groupby(enumerate(data), lambda (i,x):i-x) is equivalent to the following:
groupby(
    [(0, 2), (1, 3), (2, 4), (3, 5), (4, 12),
     (5, 13), (6, 14), (7, 15), (8, 16), (9, 17)],
    lambda (i,x):i-x
)
The lambda function subtracts the element value from the element index. So when you apply the lambda to each item, you get the following keys for groupby:
[-2, -2, -2, -2, -8, -8, -8, -8, -8, -8]
groupby groups elements by equal key value, so the first 4 elements will be grouped together and so forth.
I hope this makes it more readable.
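The keys and groups described above can be checked with a small Python 3 sketch. Tuple-unpacking lambdas such as lambda (i,x) are Python 2 only, so the pair is indexed here instead; the variable names keys and groups are just for illustration.

```python
from itertools import groupby

data = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17]

# Key for each element: index minus value, which stays constant
# for as long as the values increase by exactly one.
keys = [index - value for index, value in enumerate(data)]

# groupby collects neighbouring (index, value) pairs that share a key.
groups = [
    [value for _, value in group]
    for _, group in groupby(enumerate(data), key=lambda pair: pair[0] - pair[1])
]

print(keys)    # [-2, -2, -2, -2, -8, -8, -8, -8, -8, -8]
print(groups)  # [[2, 3, 4, 5], [12, 13, 14, 15, 16, 17]]
```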
A Python 3 version may be helpful for beginners.
Import the required libraries first:
from itertools import groupby
from operator import itemgetter
ranges =[]
for k,g in groupby(enumerate(data),lambda x:x[0]-x[1]):
    group = (map(itemgetter(1),g))
    group = list(map(int,group))
    ranges.append((group[0],group[-1]))
more_itertools.consecutive_groups was added in version 4.0.
Demo
import more_itertools as mit
iterable = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
[list(group) for group in mit.consecutive_groups(iterable)]
# [[2, 3, 4, 5], [12, 13, 14, 15, 16, 17], [20]]
Code
Applying this tool, we make a generator function that finds ranges of consecutive numbers.
def find_ranges(iterable):
    """Yield range of consecutive numbers."""
    for group in mit.consecutive_groups(iterable):
        group = list(group)
        if len(group) == 1:
            yield group[0]
        else:
            yield group[0], group[-1]
iterable = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
list(find_ranges(iterable))
# [(2, 5), (12, 17), 20]
The source implementation emulates a classic recipe (as demonstrated by @Nadia Alramli).
Note: more_itertools is a third-party package installable via pip install more_itertools.
The "naive" solution which I find somewhat readable at least.
x = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 22, 25, 26, 28, 51, 52, 57]
def group(L):
    first = last = L[0]
    for n in L[1:]:
        if n - 1 == last: # Part of the group, bump the end
            last = n
        else: # Not part of the group, yield current group and start a new
            yield first, last
            first = last = n
    yield first, last # Yield the last group
>>> print(list(group(x)))
[(2, 5), (12, 17), (22, 22), (25, 26), (28, 28), (51, 52), (57, 57)]
Assuming your list is sorted:
>>> from itertools import groupby
>>> def ranges(lst):
        pos = (j - i for i, j in enumerate(lst))
        t = 0
        for i, els in groupby(pos):
            l = len(list(els))
            el = lst[t]
            t += l
            yield range(el, el+l)
>>> lst = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17]
>>> list(ranges(lst))
[range(2, 6), range(12, 18)]
Here is something that should work, without any import needed:
def myfunc(lst):
    ret = []
    a = b = lst[0] # a and b are range's bounds
    for el in lst[1:]:
        if el == b+1:
            b = el # range grows
        else: # range ended
            ret.append(a if a==b else (a,b)) # is a single or a range?
            a = b = el # let's start again with a single
    ret.append(a if a==b else (a,b)) # corner case for last single/range
    return ret
Please note that the code using groupby doesn't work as given in Python 3 so use this.
for k, g in groupby(enumerate(data), lambda x:x[0]-x[1]):
    group = list(map(itemgetter(1), g))
    ranges.append((group[0], group[-1]))
This doesn't use a standard function - it just iterates over the input, but it should work:
def myfunc(l):
    r = []
    p = q = None
    for x in l + [-1]:
        if x - 1 == q:
            q += 1
        else:
            if p:
                if q > p:
                    r.append('%s-%s' % (p, q))
                else:
                    r.append(str(p))
            p = q = x
    return '(%s)' % ', '.join(r)
Note that it requires that the input contains only positive numbers in ascending order. You should validate the input, but this code is omitted for clarity.
import numpy as np
myarray = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
sequences = np.split(myarray, np.array(np.where(np.diff(myarray) > 1)[0]) + 1)
l = []
for s in sequences:
    if len(s) > 1:
        l.append((np.min(s), np.max(s)))
    else:
        l.append(s[0])
print(l)
Output:
[(2, 5), (12, 17), 20]
I think this way is simpler than any of the answers I've seen here (Edit: fixed based on comment from Pleastry):
data = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
starts = [x for x in data if x-1 not in data and x+1 in data]
ends = [x for x in data if x-1 in data and x+1 not in data and x not in starts]
singles = [x for x in data if x-1 not in data and x+1 not in data]
list(zip(starts, ends)) + singles
Output:
[(2, 5), (12, 17), 20]
Edited:
As @dawg notes, this is O(n**2). One option to improve performance would be to convert the original list to a set (and also the starts list to a set), i.e.
data = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
data_as_set = set(data)
# iterate over the list to keep its order; use the sets only for membership tests
starts = [x for x in data if x-1 not in data_as_set and x+1 in data_as_set]
startset = set(starts)
ends = [x for x in data if x-1 in data_as_set and x+1 not in data_as_set and x not in startset]
singles = [x for x in data if x-1 not in data_as_set and x+1 not in data_as_set]
print(list(zip(starts, ends)) + singles)
Using groupby and count from itertools gives us a short solution. The idea is that, in an increasing sequence, the difference between the index and the value will remain the same.
In order to keep track of the index, we can use an itertools.count, which makes the code cleaner than using enumerate:
from itertools import groupby, count
def intervals(data):
    out = []
    counter = count()
    for key, group in groupby(data, key = lambda x: x-next(counter)):
        block = list(group)
        out.append([block[0], block[-1]])
    return out
Some sample output:
print(intervals([0, 1, 3, 4, 6]))
# [[0, 1], [3, 4], [6, 6]]
print(intervals([2, 3, 4, 5]))
# [[2, 5]]
This is my method in which I tried to prioritize readability. Note that it returns a tuple of the same values if there is only one value in a group. That can be fixed easily in the second snippet I'll post.
def group(values):
    """return the first and last value of each continuous set in a list of sorted values"""
    values = sorted(values)
    first = last = values[0]
    for index in values[1:]:
        if index - last > 1: # triggered if in a new group
            yield first, last
            first = index # update first only if in a new group
        last = index # update last on every iteration
    yield first, last # this is needed to yield the last set of numbers
Here is the result of a test:
values = [0, 5, 6, 7, 12, 13, 21, 22, 23, 24, 25, 26, 30, 44, 45, 50]
result = list(group(values))
print(result)
result = [(0, 0), (5, 7), (12, 13), (21, 26), (30, 30), (44, 45), (50, 50)]
If you want to return only a single value in the case of a single value in a group, just add a conditional check to the yields:
def group(values):
    """return the first and last value of each continuous set in a list of sorted values"""
    values = sorted(values)
    first = last = values[0]
    for index in values[1:]:
        if index - last > 1: # triggered if in a new group
            if first == last:
                yield first
            else:
                yield first, last
            first = index # update first only if in a new group
        last = index # update last on every iteration
    if first == last:
        yield first
    else:
        yield first, last
result = [0, (5, 7), (12, 13), (21, 26), 30, (44, 45), 50]
Here's the answer I came up with. I'm writing the code for other people to understand, so I'm fairly verbose with variable names and comments.
First a quick helper function:
def getpreviousitem(mylist,myitem):
    '''Given a list and an item, return previous item in list'''
    for position, item in enumerate(mylist):
        if item == myitem:
            # First item has no previous item
            if position == 0:
                return None
            # Return previous item
            return mylist[position-1]
And then the actual code:
def getranges(cpulist):
    '''Given a sorted list of numbers, return a list of ranges'''
    rangelist = []
    inrange = False
    for item in cpulist:
        previousitem = getpreviousitem(cpulist,item)
        if previousitem == item - 1:
            # We're in a range
            if inrange == True:
                # It's an existing range - change the end to the current item
                newrange[1] = item
            else:
                # We've found a new range.
                newrange = [item-1,item]
                # Update to show we are now in a range
                inrange = True
        else:
            # We were in a range but now it just ended
            if inrange == True:
                # Save the old range
                rangelist.append(newrange)
            # Update to show we're no longer in a range
            inrange = False
    # Add the final range found to our list
    if inrange == True:
        rangelist.append(newrange)
    return rangelist
Example run:
getranges([2, 3, 4, 5, 12, 13, 14, 15, 16, 17])
returns:
[[2, 5], [12, 17]]
Using numpy + list comprehensions:
With numpy's diff function, you can identify consecutive entries of the input vector whose difference is not equal to one. The start and end of the input vector need to be handled separately.
import numpy as np
data = np.array([2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20])
d = [i for i, df in enumerate(np.diff(data)) if df!= 1]
d = np.hstack([-1, d, len(data)-1]) # add first and last elements
d = np.vstack([d[:-1]+1, d[1:]]).T
print(data[d])
Output:
[[ 2 5]
[12 17]
[20 20]]
Note: The requirement that individual numbers be returned as individual numbers rather than ranges was omitted. This can be achieved by further post-processing the results. Usually this will make things more complex without gaining any benefit.
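If single numbers are wanted after all, the post-processing step can be as small as the sketch below. The helper name collapse_singles is hypothetical, and it assumes the result has already been turned into plain (start, end) pairs.

```python
def collapse_singles(pairs):
    """Turn (start, end) pairs with start == end into a bare number."""
    return [start if start == end else (start, end) for start, end in pairs]


print(collapse_singles([(2, 5), (12, 17), (20, 20)]))  # [(2, 5), (12, 17), 20]
```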
One-liner in Python 2.7 if interested:
x = [2, 3, 6, 7, 8, 14, 15, 19, 20, 21]
d = iter(x[:1] + sum(([i1, i2] for i1, i2 in zip(x, x[1:] + x[:1]) if i2 != i1+1), []))
print zip(d, d)
>>> [(2, 3), (6, 8), (14, 15), (19, 21)]
A short solution that works without additional imports. It accepts any iterable, sorts unsorted inputs, and removes duplicate items:
def ranges(nums):
    nums = sorted(set(nums))
    gaps = [[s, e] for s, e in zip(nums, nums[1:]) if s+1 < e]
    edges = iter(nums[:1] + sum(gaps, []) + nums[-1:])
    return list(zip(edges, edges))
Example:
>>> ranges([2, 3, 4, 7, 8, 9, 15])
[(2, 4), (7, 9), (15, 15)]
>>> ranges([-1, 0, 1, 2, 3, 12, 13, 15, 100])
[(-1, 3), (12, 13), (15, 15), (100, 100)]
>>> ranges(range(100))
[(0, 99)]
>>> ranges([0])
[(0, 0)]
>>> ranges([])
[]
This is the same as @dansalmo's solution which I found amazing, albeit a bit hard to read and apply (as it's not given as a function).
Note that it could easily be modified to spit out "traditional" open ranges [start, end), by e.g. altering the return statement:
return [(s, e+1) for s, e in zip(edges, edges)]
I copied this answer over from another question that was marked as a duplicate of this one with the intention to make it easier findable (after I just now searched again for this topic, finding only the question here at first and not being satisfied with the answers given).
The versions by Mark Byers, Andrea Ambu, SilentGhost, Nadia Alramli, and truppo are simple and fast. The 'truppo' version encouraged me to write a version that retains the same nimble behavior while handling step sizes other than 1 (and lists as singletons elements that don't extend more than 1 step with a given step size). It is given here.
>>> list(ranges([1,2,3,4,3,2,1,3,5,7,11,1,2,3]))
[(1, 4, 1), (3, 1, -1), (3, 7, 2), 11, (1, 3, 1)]
Not the best approach, but here are my 2 cents:
def getConsecutiveValues2(arr):
    x = ""
    final = []
    end = 0
    start = 0
    for i in range(1,len(arr)) :
        if arr[i] - arr[i-1] == 1 :
            end = i
        else :
            print(start,end)
            final.append(arr[start:end+1])
            start = i
        if i == len(arr) - 1 :
            final.append(arr[start:end+1])
    return final
x = [1,2,3,5,6,8,9,10,11,12]
print(getConsecutiveValues2(x))
>> [[1, 2, 3], [5, 6], [8, 9, 10, 11, 12]]
This implementation works for regular or irregular steps.
I needed to achieve the same thing, but with the slight difference that steps can be irregular. This is my implementation:
def ranges(l):
    if not len(l):
        return range(0,0)
    elif len(l)==1:
        return range(l[0],l[0]+1)
    # get steps
    sl = sorted(l)
    steps = [i-j for i,j in zip(sl[1:],sl[:-1])]
    # get unique steps indexes range
    groups = [[0,0,steps[0]],]
    for i,s in enumerate(steps):
        if s==groups[-1][-1]:
            groups[-1][1] = i+1
        else:
            groups.append( [i+1,i+1,s] )
            g2 = groups[-2]
            if g2[0]==g2[1]:
                if sl[i+1]-sl[i]==s:
                    _=groups.pop(-2)
                    groups[-1][0] = i
    # create list of ranges
    return [range(sl[i],sl[j]+s,s) if s!=0 else [sl[i]]*(j+1-i) for i,j,s in groups]
Here's an example
from timeit import timeit
# for regular ranges
l = list(range(1000000))
ranges(l)
>>> [range(0, 1000000)]
l = list(range(10)) + list(range(20,25)) + [1,2,3]
ranges(l)
>>> [range(0, 2), range(1, 3), range(2, 4), range(3, 10), range(20, 25)]
sorted(l);[list(i) for i in ranges(l)]
>>> [0, 1, 1, 2, 2, 3, 3, 4, 5, 6, 7, 8, 9, 20, 21, 22, 23, 24]
>>> [[0, 1], [1, 2], [2, 3], [3, 4, 5, 6, 7, 8, 9], [20, 21, 22, 23, 24]]
# for irregular steps list
l = [1, 3, 5, 7, 10, 11, 12, 100, 200, 300, 400, 60, 99, 4000,4001]
ranges(l)
>>> [range(1, 9, 2), range(10, 13), range(60, 138, 39), range(100, 500, 100), range(4000, 4002)]
## Speed test
timeit("ranges(l)","from __main__ import ranges,l", number=1000)/1000
>>> 9.303160999934334e-06
Yet another solution if you expect your input to be a set:
def group_years(years):
    consecutive_years = []
    for year in years:
        close = {y for y in years if abs(y - year) == 1}
        for group in consecutive_years:
            if len(close.intersection(group)):
                group |= close
                break
        else:
            consecutive_years.append({year, *close})
    return consecutive_years
Example:
group_years({2016, 2017, 2019, 2020, 2022})
Out[54]: [{2016, 2017}, {2019, 2020}, {2022}]

How do I split a list when items (ints) are 1 away from each other? [duplicate]

I'd like to identify groups of consecutive numbers in a list, so that:
myfunc([2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20])
Returns:
[(2,5), (12,17), 20]
And was wondering what the best way to do this was (particularly if there's something inbuilt into Python).
Edit: Note I originally forgot to mention that individual numbers should be returned as individual numbers, not ranges.
EDIT 2: To answer the OP new requirement
ranges = []
for key, group in groupby(enumerate(data), lambda (index, item): index - item):
group = map(itemgetter(1), group)
if len(group) > 1:
ranges.append(xrange(group[0], group[-1]))
else:
ranges.append(group[0])
Output:
[xrange(2, 5), xrange(12, 17), 20]
You can replace xrange with range or any other custom class.
Python docs have a very neat recipe for this:
from operator import itemgetter
from itertools import groupby
data = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17]
for k, g in groupby(enumerate(data), lambda (i,x):i-x):
print(map(itemgetter(1), g))
Output:
[2, 3, 4, 5]
[12, 13, 14, 15, 16, 17]
If you want to get the exact same output, you can do this:
ranges = []
for k, g in groupby(enumerate(data), lambda (i,x):i-x):
group = map(itemgetter(1), g)
ranges.append((group[0], group[-1]))
output:
[(2, 5), (12, 17)]
EDIT: The example is already explained in the documentation but maybe I should explain it more:
The key to the solution is
differencing with a range so that
consecutive numbers all appear in same
group.
If the data was: [2, 3, 4, 5, 12, 13, 14, 15, 16, 17]
Then groupby(enumerate(data), lambda (i,x):i-x) is equivalent of the following:
groupby(
[(0, 2), (1, 3), (2, 4), (3, 5), (4, 12),
(5, 13), (6, 14), (7, 15), (8, 16), (9, 17)],
lambda (i,x):i-x
)
The lambda function subtracts the element index from the element value. So when you apply the lambda on each item. You'll get the following keys for groupby:
[-2, -2, -2, -2, -8, -8, -8, -8, -8, -8]
groupby groups elements by equal key value, so the first 4 elements will be grouped together and so forth.
I hope this makes it more readable.
python 3 version may be helpful for beginners
import the libraries required first
from itertools import groupby
from operator import itemgetter
ranges =[]
for k,g in groupby(enumerate(data),lambda x:x[0]-x[1]):
group = (map(itemgetter(1),g))
group = list(map(int,group))
ranges.append((group[0],group[-1]))
more_itertools.consecutive_groups was added in version 4.0.
Demo
import more_itertools as mit
iterable = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
[list(group) for group in mit.consecutive_groups(iterable)]
# [[2, 3, 4, 5], [12, 13, 14, 15, 16, 17], [20]]
Code
Applying this tool, we make a generator function that finds ranges of consecutive numbers.
def find_ranges(iterable):
"""Yield range of consecutive numbers."""
for group in mit.consecutive_groups(iterable):
group = list(group)
if len(group) == 1:
yield group[0]
else:
yield group[0], group[-1]
iterable = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
list(find_ranges(iterable))
# [(2, 5), (12, 17), 20]
The source implementation emulates a classic recipe (as demonstrated by #Nadia Alramli).
Note: more_itertools is a third-party package installable via pip install more_itertools.
The "naive" solution which I find somewhat readable atleast.
x = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 22, 25, 26, 28, 51, 52, 57]
def group(L):
first = last = L[0]
for n in L[1:]:
if n - 1 == last: # Part of the group, bump the end
last = n
else: # Not part of the group, yield current group and start a new
yield first, last
first = last = n
yield first, last # Yield the last group
>>>print list(group(x))
[(2, 5), (12, 17), (22, 22), (25, 26), (28, 28), (51, 52), (57, 57)]
Assuming your list is sorted:
>>> from itertools import groupby
>>> def ranges(lst):
pos = (j - i for i, j in enumerate(lst))
t = 0
for i, els in groupby(pos):
l = len(list(els))
el = lst[t]
t += l
yield range(el, el+l)
>>> lst = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17]
>>> list(ranges(lst))
[range(2, 6), range(12, 18)]
Here is something that should work, without any imports:
def myfunc(lst):
    ret = []
    a = b = lst[0]                            # a and b are range's bounds
    for el in lst[1:]:
        if el == b+1:
            b = el                            # range grows
        else:                                 # range ended
            ret.append(a if a==b else (a,b))  # is a single or a range?
            a = b = el                        # let's start again with a single
    ret.append(a if a==b else (a,b))          # corner case for last single/range
    return ret
Please note that the code using groupby doesn't work as given in Python 3, where map returns an iterator, so use this instead:
for k, g in groupby(enumerate(data), lambda x: x[0] - x[1]):
    group = list(map(itemgetter(1), g))
    ranges.append((group[0], group[-1]))
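The failing snippet is not repeated here, so the following is an assumption about the likely cause: in Python 3, map returns a lazy iterator that cannot be indexed, so group[0] and group[-1] raise a TypeError unless the result is turned into a list first. A minimal sketch with made-up sample pairs:

```python
from operator import itemgetter

# Sample (index, value) pairs, as enumerate would produce them.
pairs = list(enumerate([2, 3, 4]))

lazy = map(itemgetter(1), pairs)  # Python 3: a map object (an iterator)
try:
    lazy[0]
    indexable = True
except TypeError:
    indexable = False             # map objects do not support indexing

values = list(map(itemgetter(1), pairs))  # materialise before indexing
print(indexable, values[0], values[-1])   # False 2 4
```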
This doesn't use a standard function, it just iterates over the input, but it should work:
def myfunc(l):
    r = []
    p = q = None
    for x in l + [-1]:
        if x - 1 == q:
            q += 1
        else:
            if p:
                if q > p:
                    r.append('%s-%s' % (p, q))
                else:
                    r.append(str(p))
            p = q = x
    return '(%s)' % ', '.join(r)
Note that it requires that the input contains only positive numbers in ascending order. You should validate the input, but this code is omitted for clarity.
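The omitted validation could look something like the sketch below. The function name `validate_ascending_positive` is hypothetical, and treating duplicates as invalid (strictly ascending) is an assumption on my part, not something the answer specifies.

```python
def validate_ascending_positive(numbers):
    """Return numbers unchanged, or raise ValueError unless they are
    positive integers in strictly ascending order (assumed requirement)."""
    previous = 0
    for n in numbers:
        # bool is a subclass of int, so reject it explicitly
        if not isinstance(n, int) or isinstance(n, bool):
            raise ValueError(f"not an integer: {n!r}")
        if n <= previous:
            raise ValueError(f"not positive and strictly ascending at {n!r}")
        previous = n
    return numbers

print(validate_ascending_positive([2, 3, 4, 7, 11, 12]))
```

Calling it before the range-building function keeps the sentinel trick (appending -1) safe, because no valid element can collide with the sentinel.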
import numpy as np
myarray = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
sequences = np.split(myarray, np.array(np.where(np.diff(myarray) > 1)[0]) + 1)
l = []
for s in sequences:
    if len(s) > 1:
        l.append((np.min(s), np.max(s)))
    else:
        l.append(s[0])
print(l)
Output:
[(2, 5), (12, 17), 20]
I think this way is simpler than any of the answers I've seen here (Edit: fixed based on comment from Pleastry):
data = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
starts = [x for x in data if x-1 not in data and x+1 in data]
ends = [x for x in data if x-1 in data and x+1 not in data and x not in starts]
singles = [x for x in data if x-1 not in data and x+1 not in data]
list(zip(starts, ends)) + singles
Output:
[(2, 5), (12, 17), 20]
Edited:
As @dawg notes, this is O(n**2). One option to improve performance is to test membership against a set built from the original list (and a set built from the starts list), while still iterating over the sorted list so that starts and ends stay in matching order, i.e.
data = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
data_as_set = set(data)
starts = [x for x in data if x-1 not in data_as_set and x+1 in data_as_set]
startset = set(starts)
ends = [x for x in data if x-1 in data_as_set and x+1 not in data_as_set and x not in startset]
singles = [x for x in data if x-1 not in data_as_set and x+1 not in data_as_set]
print(list(zip(starts, ends)) + singles)
Using groupby and count from itertools gives us a short solution. The idea is that, in an increasing sequence, the difference between the index and the value will remain the same.
In order to keep track of the index, we can use an itertools.count, which makes the code cleaner than using enumerate:
from itertools import groupby, count
def intervals(data):
    out = []
    counter = count()
    for key, group in groupby(data, key = lambda x: x-next(counter)):
        block = list(group)
        out.append([block[0], block[-1]])
    return out
Some sample output:
print(intervals([0, 1, 3, 4, 6]))
# [[0, 1], [3, 4], [6, 6]]
print(intervals([2, 3, 4, 5]))
# [[2, 5]]
This is my method in which I tried to prioritize readability. Note that it returns a tuple of the same values if there is only one value in a group. That can be fixed easily in the second snippet I'll post.
def group(values):
    """return the first and last value of each continuous set in a list of sorted values"""
    values = sorted(values)
    first = last = values[0]
    for index in values[1:]:
        if index - last > 1:  # triggered if in a new group
            yield first, last
            first = index  # update first only if in a new group
        last = index  # update last on every iteration
    yield first, last  # this is needed to yield the last set of numbers
Here is the result of a test:
values = [0, 5, 6, 7, 12, 13, 21, 22, 23, 24, 25, 26, 30, 44, 45, 50]
result = list(group(values))
print(result)
result = [(0, 0), (5, 7), (12, 13), (21, 26), (30, 30), (44, 45), (50, 50)]
If you want to return only a single value in the case of a single value in a group, just add a conditional check to the yields:
def group(values):
    """return the first and last value of each continuous set in a list of sorted values"""
    values = sorted(values)
    first = last = values[0]
    for index in values[1:]:
        if index - last > 1:  # triggered if in a new group
            if first == last:
                yield first
            else:
                yield first, last
            first = index  # update first only if in a new group
        last = index  # update last on every iteration
    if first == last:
        yield first
    else:
        yield first, last
result = [0, (5, 7), (12, 13), (21, 26), 30, (44, 45), 50]
Here's the answer I came up with. I'm writing the code for other people to understand, so I'm fairly verbose with variable names and comments.
First a quick helper function:
def getpreviousitem(mylist,myitem):
    '''Given a list and an item, return previous item in list'''
    for position, item in enumerate(mylist):
        if item == myitem:
            # First item has no previous item
            if position == 0:
                return None
            # Return previous item
            return mylist[position-1]
And then the actual code:
def getranges(cpulist):
    '''Given a sorted list of numbers, return a list of ranges'''
    rangelist = []
    inrange = False
    for item in cpulist:
        previousitem = getpreviousitem(cpulist,item)
        if previousitem == item - 1:
            # We're in a range
            if inrange == True:
                # It's an existing range - change the end to the current item
                newrange[1] = item
            else:
                # We've found a new range.
                newrange = [item-1,item]
                # Update to show we are now in a range
                inrange = True
        else:
            # We were in a range but now it just ended
            if inrange == True:
                # Save the old range
                rangelist.append(newrange)
            # Update to show we're no longer in a range
            inrange = False
    # Add the final range found to our list
    if inrange == True:
        rangelist.append(newrange)
    return rangelist
Example run:
getranges([2, 3, 4, 5, 12, 13, 14, 15, 16, 17])
returns:
[[2, 5], [12, 17]]
Using numpy + list comprehensions:
NumPy's diff function identifies consecutive entries of the input vector whose difference is not equal to one. The start and end of the input vector need to be handled separately.
import numpy as np
data = np.array([2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20])
d = [i for i, df in enumerate(np.diff(data)) if df!= 1]
d = np.hstack([-1, d, len(data)-1]) # add first and last elements
d = np.vstack([d[:-1]+1, d[1:]]).T
print(data[d])
Output:
[[ 2  5]
 [12 17]
 [20 20]]
Note: The request that individual numbers be treated differently (returned as single values, not ranges) was omitted. This can be achieved by post-processing the results, but it usually makes things more complex without any real benefit.
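If the single-value form is wanted anyway, the post-processing could look like this minimal sketch. The helper name `collapse_singletons` is hypothetical; it accepts any iterable of [start, end] pairs, which should include the rows of the 2-D array printed above.

```python
def collapse_singletons(pairs):
    """Turn [start, end] pairs into (start, end) tuples,
    or into a bare value when start == end."""
    result = []
    for start, end in pairs:
        start, end = int(start), int(end)  # also converts NumPy integers to plain ints
        result.append(start if start == end else (start, end))
    return result

print(collapse_singletons([[2, 5], [12, 17], [20, 20]]))
# [(2, 5), (12, 17), 20]
```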
One-liner in Python 2.7 if interested:
x = [2, 3, 6, 7, 8, 14, 15, 19, 20, 21]
d = iter(x[:1] + sum(([i1, i2] for i1, i2 in zip(x, x[1:] + x[:1]) if i2 != i1+1), []))
print zip(d, d)
>>> [(2, 3), (6, 8), (14, 15), (19, 21)]
A short solution that works without additional imports. It accepts any iterable, sorts unsorted inputs, and removes duplicate items:
def ranges(nums):
    nums = sorted(set(nums))
    gaps = [[s, e] for s, e in zip(nums, nums[1:]) if s+1 < e]
    edges = iter(nums[:1] + sum(gaps, []) + nums[-1:])
    return list(zip(edges, edges))
Example:
>>> ranges([2, 3, 4, 7, 8, 9, 15])
[(2, 4), (7, 9), (15, 15)]
>>> ranges([-1, 0, 1, 2, 3, 12, 13, 15, 100])
[(-1, 3), (12, 13), (15, 15), (100, 100)]
>>> ranges(range(100))
[(0, 99)]
>>> ranges([0])
[(0, 0)]
>>> ranges([])
[]
This is the same as @dansalmo's solution, which I found amazing, albeit a bit hard to read and apply (as it's not given as a function).
Note that it could easily be modified to return "traditional" half-open ranges [start, end), e.g. by altering the return statement:
return [(s, e+1) for s, e in zip(edges, edges)]
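One reason the half-open form can be convenient is that it maps directly onto Python's built-in range. A small sketch, where the helper name `to_range_objects` is hypothetical and the input is the inclusive (start, end) form returned by the function above:

```python
def to_range_objects(closed_pairs):
    """Convert inclusive (start, end) pairs into built-in range objects."""
    return [range(start, end + 1) for start, end in closed_pairs]

closed = [(2, 4), (7, 9), (15, 15)]
expanded = [list(r) for r in to_range_objects(closed)]
print(expanded)
# [[2, 3, 4], [7, 8, 9], [15]]
```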
I copied this answer over from another question that was marked as a duplicate of this one, to make it easier to find (when I searched for this topic again just now, I found only this question at first and was not satisfied with the answers given).
The versions by Mark Byers, Andrea Ambu, SilentGhost, Nadia Alramli, and truppo are simple and fast. The 'truppo' version encouraged me to write a version that retains the same nimble behavior while handling step sizes other than 1 (and lists as singletons elements that don't extend more than 1 step with a given step size). It is given here.
>>> list(ranges([1,2,3,4,3,2,1,3,5,7,11,1,2,3]))
[(1, 4, 1), (3, 1, -1), (3, 7, 2), 11, (1, 3, 1)]
Not the best approach, but here are my 2 cents:
def getConsecutiveValues2(arr):
    final = []
    start = 0
    for i in range(1, len(arr) + 1):
        # close the current run at the end of the list or when the step is not 1
        if i == len(arr) or arr[i] - arr[i-1] != 1:
            final.append(arr[start:i])
            start = i
    return final
x = [1,2,3,5,6,8,9,10,11,12]
print(getConsecutiveValues2(x))
>> [[1, 2, 3], [5, 6], [8, 9, 10, 11, 12]]
This implementation works for regular or irregular steps.
I needed to achieve the same thing, with the slight difference that steps can be irregular. This is my implementation:
def ranges(l):
    if not len(l):
        return range(0,0)
    elif len(l)==1:
        return range(l[0],l[0]+1)
    # get steps
    sl = sorted(l)
    steps = [i-j for i,j in zip(sl[1:],sl[:-1])]
    # get unique steps indexes range
    groups = [[0,0,steps[0]],]
    for i,s in enumerate(steps):
        if s==groups[-1][-1]:
            groups[-1][1] = i+1
        else:
            groups.append( [i+1,i+1,s] )
            g2 = groups[-2]
            if g2[0]==g2[1]:
                if sl[i+1]-sl[i]==s:
                    _=groups.pop(-2)
                    groups[-1][0] = i
    # create list of ranges
    return [range(sl[i],sl[j]+s,s) if s!=0 else [sl[i]]*(j+1-i) for i,j,s in groups]
Here's an example
from timeit import timeit
# for regular ranges
l = list(range(1000000))
ranges(l)
>>> [range(0, 1000000)]
l = list(range(10)) + list(range(20,25)) + [1,2,3]
ranges(l)
>>> [range(0, 2), range(1, 3), range(2, 4), range(3, 10), range(20, 25)]
sorted(l);[list(i) for i in ranges(l)]
>>> [0, 1, 1, 2, 2, 3, 3, 4, 5, 6, 7, 8, 9, 20, 21, 22, 23, 24]
>>> [[0, 1], [1, 2], [2, 3], [3, 4, 5, 6, 7, 8, 9], [20, 21, 22, 23, 24]]
# for irregular steps list
l = [1, 3, 5, 7, 10, 11, 12, 100, 200, 300, 400, 60, 99, 4000,4001]
ranges(l)
>>> [range(1, 9, 2), range(10, 13), range(60, 138, 39), range(100, 500, 100), range(4000, 4002)]
## Speed test
timeit("ranges(l)","from __main__ import ranges,l", number=1000)/1000
>>> 9.303160999934334e-06
Yet another solution if you expect your input to be a set (iterating in sorted order keeps a run from being split into separate groups):
def group_years(years):
    consecutive_years = []
    for year in sorted(years):
        close = {y for y in years if abs(y - year) == 1}
        for group in consecutive_years:
            if len(close.intersection(group)):
                group |= close
                break
        else:
            consecutive_years.append({year, *close})
    return consecutive_years
Example:
group_years({2016, 2017, 2019, 2020, 2022})
Out[54]: [{2016, 2017}, {2019, 2020}, {2022}]
