I have a list containing data as such:
[1, 2, 3, 4, 7, 8, 10, 11, 12, 13, 14]
I'd like to print out the ranges of consecutive integers:
1-4, 7-8, 10-14
Is there a built-in/fast/efficient way of doing this?
From the docs (adapted here to Python 3 syntax):
>>> from itertools import groupby
>>> from operator import itemgetter
>>> data = [ 1, 4,5,6, 10, 15,16,17,18, 22, 25,26,27,28]
>>> for k, g in groupby(enumerate(data), lambda ix: ix[0] - ix[1]):
...     print(list(map(itemgetter(1), g)))
...
[1]
[4, 5, 6]
[10]
[15, 16, 17, 18]
[22]
[25, 26, 27, 28]
You can adapt this fairly easily to get a printed set of ranges.
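One possible adaptation, as a minimal sketch in Python 3: take the first and last value of each group and join them with a hyphen. The helper name format_ranges is made up for this illustration, and the input is assumed to be sorted and free of duplicates.

```python
from itertools import groupby

def format_ranges(data):
    """Return a string such as '1-4, 7-8, 10-14' for a sorted list of ints."""
    parts = []
    # Inside a run of consecutive values, index - value is constant.
    for _, group in groupby(enumerate(data), lambda pair: pair[0] - pair[1]):
        values = [value for _, value in group]
        parts.append(f"{values[0]}-{values[-1]}")
    return ", ".join(parts)

print(format_ranges([1, 2, 3, 4, 7, 8, 10, 11, 12, 13, 14]))  # 1-4, 7-8, 10-14
```

A run that contains a single number comes out as n-n with this sketch.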
A short solution that works without additional imports. It accepts any iterable, sorts unsorted inputs, and removes duplicate items:
def ranges(nums):
    nums = sorted(set(nums))
    gaps = [[s, e] for s, e in zip(nums, nums[1:]) if s+1 < e]
    edges = iter(nums[:1] + sum(gaps, []) + nums[-1:])
    return list(zip(edges, edges))
Example:
>>> ranges([2, 3, 4, 7, 8, 9, 15])
[(2, 4), (7, 9), (15, 15)]
>>> ranges([-1, 0, 1, 2, 3, 12, 13, 15, 100])
[(-1, 3), (12, 13), (15, 15), (100, 100)]
>>> ranges(range(100))
[(0, 99)]
>>> ranges([0])
[(0, 0)]
>>> ranges([])
[]
This is the same as @dansalmo's solution, which I found amazing, albeit a bit hard to read and apply (as it's not given as a function).
Note that it could easily be modified to return "traditional" half-open ranges [start, end), e.g. by altering the return statement:
    return [(s, e+1) for s, e in zip(edges, edges)]
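Put together, the half-open variant might look like the sketch below. The name ranges_half_open is invented here for illustration; everything else follows the ranges() function above.

```python
def ranges_half_open(nums):
    """Like ranges(), but each pair is (start, end) with end excluded."""
    nums = sorted(set(nums))
    # Every gap contributes the end of one run and the start of the next.
    gaps = [[s, e] for s, e in zip(nums, nums[1:]) if s + 1 < e]
    edges = iter(nums[:1] + sum(gaps, []) + nums[-1:])
    # zip(edges, edges) pairs the edges up as (start, end), (start, end), ...
    return [(s, e + 1) for s, e in zip(edges, edges)]

print(ranges_half_open([2, 3, 4, 7, 8, 9, 15]))  # [(2, 5), (7, 10), (15, 16)]
```

Each resulting pair can be passed straight to range(start, end) to regenerate the run.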
This will print exactly as you specified:
>>> nums = [1, 2, 3, 4, 7, 8, 10, 11, 12, 13, 14]
>>> ranges = sum((list(t) for t in zip(nums, nums[1:]) if t[0]+1 != t[1]), [])
>>> iranges = iter(nums[0:1] + ranges + nums[-1:])
>>> print(', '.join([str(n) + '-' + str(next(iranges)) for n in iranges]))
1-4, 7-8, 10-14
If the list has any single number ranges, they would be shown as n-n:
>>> nums = [1, 2, 3, 4, 5, 7, 8, 9, 12, 15, 16, 17, 18]
>>> ranges = sum((list(t) for t in zip(nums, nums[1:]) if t[0]+1 != t[1]), [])
>>> iranges = iter(nums[0:1] + ranges + nums[-1:])
>>> print(', '.join([str(n) + '-' + str(next(iranges)) for n in iranges]))
1-5, 7-9, 12-12, 15-18
Built-In: No, as far as I'm aware.
You have to run through the array. Start off with putting the first value in a variable and print it, then as long as you keep hitting the next number do nothing but remember the last number in another variable. If the next number is not in line, check the last number remembered versus the first number. If it's the same, do nothing. If it's different, print "-" and the last number. Then put the current value in the first variable and start over.
At the end of the array you run the same routine as if you had hit a number out of line.
I could have written the code, of course, but I don't want to spoil your homework :-)
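For readers who are not doing homework, the routine described above could be sketched roughly as follows. It assumes a sorted list of distinct integers; the function name format_runs and the choice to collect the pieces in a list instead of printing them one by one are assumptions of this sketch.

```python
def format_runs(nums):
    """Format a sorted list of distinct ints, e.g. '1-4, 7-8, 10-14'."""
    if not nums:
        return ""
    parts = []
    first = last = nums[0]

    def flush():
        # A run of one number is written as "n", longer runs as "first-last".
        parts.append(str(first) if first == last else f"{first}-{last}")

    for n in nums[1:]:
        if n == last + 1:
            last = n          # still in line: just remember the last number
        else:
            flush()           # out of line: close the current run
            first = last = n  # and start over with the current value
    flush()                   # end of the list: same routine as a break
    return ", ".join(parts)

print(format_runs([1, 2, 3, 4, 7, 8, 10, 11, 12, 13, 14]))  # 1-4, 7-8, 10-14
```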
I had a similar problem and am using the following for a sorted list. It outputs a dictionary whose values are the [first, last] pair of each run of consecutive numbers. The keys distinguish the runs; each key is the running count of breaks in the sequence seen before that run. Isolated numbers that belong to no run are left out.
Your list gives me an output of {0: [1, 4], 1: [7, 8], 2: [10, 14]}
def series_dictf(index_list):
    from collections import defaultdict
    series_dict = defaultdict(list)
    sequence_dict = dict()
    list_len = len(index_list)
    series_interrupts = 0
    for i in range(list_len):
        if i == (list_len - 1):
            break
        position_a = index_list[i]
        position_b = index_list[i + 1]
        if position_b == (position_a + 1):
            sequence_dict[position_a] = (series_interrupts)
            sequence_dict[position_b] = (series_interrupts)
        if position_b != (position_a + 1):
            series_interrupts += 1
    for position, series in sequence_dict.items():
        series_dict[series].append(position)
    for series, position in series_dict.items():
        series_dict[series] = [position[0], position[-1]]
    return series_dict
Using set operations, you can do the following:
def get_consecutive_integer_series(integer_list):
    integer_list = sorted(integer_list)
    start_item = integer_list[0]
    end_item = integer_list[-1]
    a = set(integer_list)  # Set a
    b = range(start_item, end_item+1)
    # Pick the values of the full range that are missing from the list.
    c = set(b) - a  # Set operation b-a
    li = []
    start = 0
    for i in sorted(c):
        end = b.index(i)  # Get end point of the list slicing
        li.append(b[start:end])  # Slice list using values
        start = end + 1  # Increment the start point for next slicing
    li.append(b[start:])  # Add the last series
    for sliced_list in li:
        if not sliced_list:
            # list is empty
            continue
        if len(sliced_list) == 1:
            # If only one item found in list
            yield sliced_list[0]
        else:
            yield "{0}-{1}".format(sliced_list[0], sliced_list[-1])
a = [1, 2, 3, 6, 7, 8, 4, 14, 15, 21]
for series in get_consecutive_integer_series(a):
    print(series)
Output for the above list "a":
1-4
6-8
14-15
21
Here is another basic solution that uses no modules, which is handy for interviews, where you are often asked to avoid imports:
#!/usr/bin/python
def split_list(n):
    """Return, for each gap, the value just past the end of the run before it."""
    return [(x+1) for x,y in zip(n, n[1:]) if y-x != 1]
def get_sub_list(my_list):
    """Split the list into runs, using those values as upper bounds."""
    my_index = split_list(my_list)
    output = list()
    prev = 0
    for index in my_index:
        new_list = [ x for x in my_list[prev:] if x < index]
        output.append(new_list)
        prev += len(new_list)
    output.append([ x for x in my_list[prev:]])
    return output
my_list = [1, 3, 4, 7, 8, 10, 11, 13, 14]
print(get_sub_list(my_list))
Output:
[[1], [3, 4], [7, 8], [10, 11], [13, 14]]
You can use the collections library, which has a class called Counter. Counter comes in handy when you need to count how often each distinct element occurs in an iterable:
from collections import Counter
data = [ 1, 4,5,6, 10, 15,16,17,18, 22, 25,26,27,28]
cnt=Counter(data)
print(cnt)
The output looks like this:
Counter({1: 1, 4: 1, 5: 1, 6: 1, 10: 1, 15: 1, 16: 1, 17: 1, 18: 1, 22: 1, 25: 1, 26: 1, 27: 1, 28: 1})
which, like any other dictionary, can be queried by key.
Related
How can I split a list based on neighboring elements? Say I have a list such as
test = [3,5,7,1,10,17]
and I want to split the list where elements 10 and 17 are next to each other, so that the split happens between [3,5,7,1] and [10,17].
I know there is groupby, but I could only figure out how to use it to check whether one element is present and then split, not two elements in a row.
Pseudocode:
for i in list:
    if element[i] == 10 and element[i+1] == 17:
        splitlist() # split before element 10
You can zip() the list with an offset of itself to get pairs. Then find the index of the pair you are looking for (assuming this happens once or you only care about the first occurrence). Then slice the list:
test = [3,5,7,1,10,17]
def partition_on_pair(test, pair):
    index = next((i for i, n in enumerate(zip(test, test[1:])) if n == pair), len(test))
    return test[:index], test[index:]
partition_on_pair(test, (10, 17))
# ([3, 5, 7, 1], [10, 17])
partition_on_pair(test, (10, 19)) # doesn't exist, so the second list is empty
#([3, 5, 7, 1, 10, 17], [])
partition_on_pair(test, (5, 7))
#([3], [5, 7, 1, 10, 17])
partition_on_pair(test, (3,5))
#([], [3, 5, 7, 1, 10, 17])
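Here is an example based on your output:
def split_list(test, match):
    # position of each element of match within test
    idx = [test.index(i) for i in match]
    # for neighbors, the offsets from the first position add up to 0 + 1 + ... + len(match)-1
    if sum([i - min(idx) for i in idx]) == sum(range(len(match))):
        return [
            test[0:idx[0]],
            test[idx[0]:idx[-1]+1]
        ]
split_list(test=[3, 5, 7, 1, 10, 17], match=[10, 17])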
Here is a simple working example:
test = [3,5,7,1,10,17]
def neighbor_splitting():
    for index, x in enumerate(test[:-1]):
        # split only where 10 is directly followed by 17
        if x == 10 and test[index + 1] == 17:
            list1 = test[:index]
            list2 = test[index:]
            return list1, list2
# [3, 5, 7, 1]
# [10, 17]
I am trying to figure out the following problem. I have a list with integers.
list = [1, 2, 3, 5, 6, 9, 10]
The goal is to find the longest sub-list within the list. A sub-list qualifies if the difference between each pair of adjacent integers is no more than 1 (or -1). In this example, the longest sub-list respecting this condition is [1, 2, 3]. Here is my attempt:
lista = [1, 2, 3, 5, 6, 9, 10]
difference = []
i = 0
for number in range(len(lista)-1):
    diff = lista[i]-lista[i+1]
    difference.append(diff)
    i += 1
print(difference)
winner = 0
ehdokas = 0
for a in difference:
    if a == 1 or a == -1:
        ehdokas += 1
    else:
        if ehdokas > winner:
            winner = ehdokas
        ehdokas = 0
if ehdokas > winner:
    winner = ehdokas
print(winner)
Now print(winner) prints 2, whereas I want it to print 3, since the first three integers are adjacent to each other (1-2 = -1, 2-3 = -1).
Basically, I am trying to iterate through the list, calculate the difference between adjacent integers, and then count the consecutive occurrences of 1 and -1 in the difference list.
This code works for some input lists but not for others. Any suggestions for improvement would be highly appreciated.
Given:
lista = [1, 2, 3, 5, 6, 9, 10]
You can construct a new list of tuples that hold the index, a description of the subtraction, and the difference:
diffs=[(i,f"{lista[i]}-{lista[i+1]}={lista[i]-lista[i+1]}",lista[i]-lista[i+1])
       for i in range(len(lista)-1)]
>>> diffs
[(0, '1-2=-1', -1), (1, '2-3=-1', -1), (2, '3-5=-2', -2), (3, '5-6=-1', -1), (4, '6-9=-3', -3), (5, '9-10=-1', -1)]
Given a list like that, you can use groupby and max to find the longest run of differences that satisfy the condition:
from itertools import groupby
lista = [1, 2, 3, 5, 6, 9, 10]
m=max((list(v) for k,v in groupby(
        ((i,lista[i]-lista[i+1]) for i in range(len(lista)-1)),
        key=lambda t: t[1] in (-1,0,1)) if k),key=len)
>>> m
[(0, -1), (1, -1)]
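Each tuple in m describes the gap between two neighbors, so a run of len(m) differences covers len(m) + 1 list items. A small sketch of recovering the sub-list itself from the stored indexes follows; the names diffs, start, end and longest are introduced here, and the input is assumed to contain at least one qualifying pair (otherwise max() raises ValueError).

```python
from itertools import groupby

lista = [1, 2, 3, 5, 6, 9, 10]
# Pair each index with the difference to the next item.
diffs = [(i, lista[i] - lista[i + 1]) for i in range(len(lista) - 1)]
# Keep only the runs where neighbors differ by at most 1, then take the longest.
m = max((list(v) for k, v in groupby(diffs, key=lambda t: t[1] in (-1, 0, 1)) if k),
        key=len)
start, end = m[0][0], m[-1][0] + 1  # indexes of the first and last item of the run
longest = lista[start:end + 1]
print(longest, len(longest))  # [1, 2, 3] 3
```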
A nice and simple solution based on more_itertools:
#!pip install more_itertools
l = [1, 2, 3, 5, 6, 9, 10]
import more_itertools as mit
sublists = []
for group in mit.consecutive_groups(l):
    sublists.append(list(group))
max(sublists, key=len)
This outputs:
[1, 2, 3]
Which is the longest sublist of consecutive numbers.
Here's a solution without the use of libraries:
l = [1, 2, 3, 4, 10, 11, 20, 19, 18, 17, 16, 30, 29, 28, 27, 40, 41]
r = []
if len(l) > 1:
    t = [l[0]]
    for i in range(0, len(l)-1):
        if abs(l[i]-l[i+1]) == 1:
            t.append(l[i+1])
            if len(t) > len(r):
                r = t.copy()
        else:
            t.clear()
            t.append(l[i+1])
print(r)
Which will print:
[20, 19, 18, 17, 16]
You can get differences between items and their predecessor with zip(). This will allow you to generate a list of break positions (the indexes of items that cannot be combined with their predecessor). Using zip on these breaking positions will allow you to get the start and end indexes of subsets of the list that form groups of consecutive "compatible" items. The difference between start and end is the size of the corresponding group.
L = [1, 2, 3, 5, 6, 9, 10]
breaks = [i for i,(a,b) in enumerate(zip(L,L[1:]),1) if abs(a-b)>1 ]
winner = max( e-s for s,e in zip([0]+breaks,breaks+[len(L)]) )
print(winner) # 3
If you want to see how the items are grouped, you can use the start/end indexes to get the subsets:
[ L[s:e] for s,e in zip([0]+breaks,breaks+[len(L)]) ]
[[1, 2, 3], [5, 6], [9, 10]]
I want to store even numbers and odd numbers in separate lists, but I am facing a problem: I am able to store them in sets but not in lists. Is there a way to store them in a list without repetition?
I have tried this in a Jupyter notebook:
list_loop=[1,2,3,4,5,6,7,8,9,10,11,12,13,1,4,1,51,6,17,]
for i in list_loop:
    if i % 2 == 0 :
        list_even = list_even + [i]
    else:
        list_odd = list_odd + [i]
print(set(list_even))
print(set(list_odd))
Expected output:
[2,4,6,8,10,12]
[1,3,5,7,9,11,13,17,51]
Define list_odd and list_even as lists and don't convert them to sets before printing. Note that you can use list comprehension to fill list_odd and list_even:
list_odd = []
list_even = []
list_loop=[1,2,3,4,5,6,7,8,9,10,11,12,13,1,4,1,51,6,17,]
list_odd = [elem for elem in list_loop if elem % 2 != 0]
list_even = [elem for elem in list_loop if elem % 2 == 0]
print(list_even)
print(list_odd)
Output:
[2, 4, 6, 8, 10, 12, 4, 6]
[1, 3, 5, 7, 9, 11, 13, 1, 1, 51, 17]
Edit: for uniqueness, turn list_loop into a set:
list_loop=set([1,2,3,4,5,6,7,8,9,10,11,12,13,1,4,1,51,6,17,])
Output:
[2, 4, 6, 8, 10, 12]
[1, 3, 5, 7, 9, 11, 13, 17, 51]
Use a comprehension
>>> list_loop=[1,2,3,4,5,6,7,8,9,10,11,12,13,1,4,1,51,6,17,]
>>> print(list(set(_ for _ in list_loop if _ % 2)))
[1, 3, 5, 7, 9, 11, 13, 17, 51]
Similarly for even numbers.
There are a couple of ways you could do this. You could use the OrderedDict in the collections library, or you could just sort the set and get a list,
...
print(sorted(set(list_even)))
print(sorted(set(list_odd)))
Also, I would personally create those lists using a set comprehension
list_even = sorted({x for x in list_loop if x % 2 == 0})
list_odd = sorted({x for x in list_loop if x % 2 == 1})
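The OrderedDict option mentioned above is not spelled out, so here is a minimal sketch of it. Unlike the sorted() variants, it keeps the numbers in first-seen order; the intermediate name unique is made up for this example. On Python 3.7 and later a plain dict.fromkeys behaves the same way.

```python
from collections import OrderedDict

list_loop = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1, 4, 1, 51, 6, 17]

# OrderedDict.fromkeys drops duplicates while keeping first-seen order.
unique = list(OrderedDict.fromkeys(list_loop))
list_even = [x for x in unique if x % 2 == 0]
list_odd = [x for x in unique if x % 2 == 1]

print(list_even)  # [2, 4, 6, 8, 10, 12]
print(list_odd)   # [1, 3, 5, 7, 9, 11, 13, 51, 17]
```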
You can solve this using a list comprehension with a filter condition, but then you iterate over your list twice.
With a simple for loop you only need to touch each number once, and it preserves the original order, which putting your numbers through a set might not do, because order in a set is not guaranteed.
Keep a set of seen numbers, and only add a number if it has not been seen yet.
list_loop = [1,2,3,4,5,6,7,8,9,10,11,12,13,1,4,1,51,6,17,]
list_even = []
list_odd = []
seen = set()
trick = [list_even, list_odd] # even list is at index 0, odd list at index 1
for i in list_loop:
    if i in seen:
        continue
    else:
        seen.add(i)
        # the trick eliminates the need for an if-clause
        trick[i%2].append(i) # you use i%2 to get either the even or odd index
print(list_even)
print(list_odd)
Output:
[2, 4, 6, 8, 10, 12]
[1, 3, 5, 7, 9, 11, 13, 51, 17]
You can apply the list function to your set object in order to
convert it to a list.
list_from_set = list(set(list_even))
>>> print(list_from_set)
[2, 4, 6, 8, 10, 12]
My list, for example, is
my_list = [1,2,3,4,5, 9,10,11,12,13,14, 20,21,22,23,24,25,26,27]
I would like to keep the first two and the last two elements of each run of consecutive values. So what I need to get is:
output = [1,2,4,5, 9,10,13,14, 20,21,26,27]
How can I simply or efficiently get this result?
Use more_itertools.consecutive_groups:
import more_itertools as mit
my_list = [1,2,3,4,5,9,10,11,12,13,14,15]
x = [list(group) for group in mit.consecutive_groups(my_list)]
output = []
for i in x:
    temp = [i[0],i[1],i[-2],i[-1]]
    output.extend(temp)
Output:
[1, 2, 4, 5, 9, 10, 14, 15]
Use groupby and itemgetter:
from operator import itemgetter
from itertools import groupby
my_list = [1,2,3,4,5,9,10,11,12,13,14,20,21,22,23,24,25,26,27]
output = []
for k, g in groupby(enumerate(my_list), lambda x: x[0]-x[1]):
    lst = list(map(itemgetter(1), g))
    output.extend([lst[0], lst[1], lst[-2], lst[-1]])
print(output)
# [1, 2, 4, 5, 9, 10, 13, 14, 20, 21, 26, 27]
Using only the standard itertools module, you can do:
from itertools import count, groupby
def remove_middle_of_seq(lst):
    out = []
    index = count()
    for _, sequence in groupby(lst, lambda value: value - next(index)):
        seq = list(sequence)
        out.extend([seq[0], seq[1], seq[-2], seq[-1]])
    return out
my_list = [1,2,3,4,5, 9,10,11,12,13,14, 20,21,22,23,24,25,26,27]
print(remove_middle_of_seq(my_list))
# [1, 2, 4, 5, 9, 10, 13, 14, 20, 21, 26, 27]
In groups of consecutive values, the difference between the values and their index is constant, so groupby can group them using this difference as key.
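To see why that key works, here is a short illustration that computes value minus index for a shortened version of the sample list (the variable name keys is introduced only for this example):

```python
my_list = [1, 2, 3, 4, 5, 9, 10, 11, 12, 13, 14, 20, 21]

# value - index stays the same inside a run and jumps at every gap
keys = [value - index for index, value in enumerate(my_list)]
print(keys)  # [1, 1, 1, 1, 1, 4, 4, 4, 4, 4, 4, 9, 9]
```

groupby starts a new group each time this key changes, which is exactly where a run of consecutive values ends.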
There isn't really a function that does this kind of thing in the standard library, so you have to write most of it manually. It's easiest to first group all ascending numbers, and then delete the middle of each group:
import itertools
def group_consecutive(sequence):
    """
    Aggregates consecutive integers into groups.
    >>> group_consecutive([8, 9, 1, 3, 4, 5])
    [[8, 9], [1], [3, 4, 5]]
    """
    result = []
    prev_num = None
    for num in sequence:
        if prev_num is None or num != prev_num + 1:
            group = [num]
            result.append(group)
        else:
            group.append(num)
        prev_num = num
    return result
def drop_consecutive(sequence, keep_left=2, keep_right=2):
    """
    Groups consecutive integers and then keeps only the 2 first and last numbers
    in each group. The result is then flattened.
    >>> drop_consecutive([1, 2, 3, 4, 5, 8, 9])
    [1, 2, 4, 5, 8, 9]
    """
    grouped_seq = group_consecutive(sequence)
    for group in grouped_seq:
        del group[keep_left:-keep_right]
    return list(itertools.chain.from_iterable(grouped_seq))
>>> my_list = [1,2,3,4,5, 9,10,11,12,13,14, 20,21,22,23,24,25,26,27]
>>> drop_consecutive(my_list)
[1, 2, 4, 5, 9, 10, 13, 14, 20, 21, 26, 27]
See also:
itertools.chain and itertools.chain.from_iterable
You can pair adjacent list items by zipping the list with itself with an offset of 1, but pad the shifted list with a non-consecutive value, so that you can iterate through the pairings and determine that there is a separate group when the difference of a pair is not 1:
def consecutive_groups(l):
    o = []
    for a, b in zip([l[0] - 2] + l, l):
        if b - a != 1:
            o.append([])
        o[-1].append(b)
    return [s[:2] + s[-2:] for s in o]
Given your sample input, consecutive_groups(my_list) returns:
[[1, 2, 4, 5], [9, 10, 13, 14], [20, 21, 26, 27]]
I have a long list of float numbers ranging from 1 to 5, called "average", and I want to return the list of indices for elements that are smaller than a or larger than b
def find(lst,a,b):
    result = []
    for x in lst:
        if x<a or x>b:
            i = lst.index(x)
            result.append(i)
    return result
matches = find(average,2,4)
But surprisingly, the output for "matches" has a lot of repetitions in it, e.g. [2, 2, 10, 2, 2, 2, 19, 2, 10, 2, 2, 42, 2, 2, 10, 2, 2, 2, 10, 2, 2, ...].
Why is this happening?
You are using .index() which will only find the first occurrence of your value in the list. So if you have a value 1.0 at index 2, and at index 9, then .index(1.0) will always return 2, no matter how many times 1.0 occurs in the list.
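A tiny demonstration of the difference, using a made-up list with a repeated value (the names via_index and via_enumerate are only for this sketch):

```python
lst = [3.0, 2.5, 1.0, 3.5, 1.0, 1.0]

# .index() scans from the left and stops at the first match,
# so every 1.0 reports position 2.
via_index = [lst.index(x) for x in lst if x < 2]
# Tracking the position while looping gives the real index of each element.
via_enumerate = [i for i, x in enumerate(lst) if x < 2]

print(via_index)      # [2, 2, 2]
print(via_enumerate)  # [2, 4, 5]
```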
Use enumerate() to add indices to your loop instead:
def find(lst, a, b):
    result = []
    for i, x in enumerate(lst):
        if x<a or x>b:
            result.append(i)
    return result
You can collapse this into a list comprehension:
def find(lst, a, b):
    return [i for i, x in enumerate(lst) if x<a or x>b]
If you're doing a lot of this kind of thing, you should consider using numpy.
In [56]: import random, numpy
In [57]: lst = numpy.array([random.uniform(0, 5) for _ in range(1000)]) # example list
In [58]: a, b = 1, 3
In [59]: numpy.flatnonzero((lst > a) & (lst < b))[:10]
Out[59]: array([ 0, 12, 13, 15, 18, 19, 23, 24, 26, 29])
In response to Seanny123's question, I used this timing code:
import numpy, timeit, random
a, b = 1, 3
lst = numpy.array([random.uniform(0, 5) for _ in range(1000)])
def numpy_way():
    numpy.flatnonzero((lst > 1) & (lst < 3))[:10]
def list_comprehension():
    [e for e in lst if 1 < e < 3][:10]
print(timeit.timeit(numpy_way))
print(timeit.timeit(list_comprehension))
The numpy version is over 60 times faster.
>>> average = [1,3,2,1,1,0,24,23,7,2,727,2,7,68,7,83,2]
>>> matches = [i for i in range(0,len(average)) if average[i]<2 or average[i]>4]
>>> matches
[0, 3, 4, 5, 6, 7, 8, 10, 12, 13, 14, 15]