I have a long list of float numbers ranging from 1 to 5, called "average", and I want to return the list of indices for elements that are smaller than a or larger than b:
def find(lst,a,b):
    result = []
    for x in lst:
        if x<a or x>b:
            i = lst.index(x)
            result.append(i)
    return result

matches = find(average,2,4)
But surprisingly, the output for "matches" has a lot of repetitions in it, e.g. [2, 2, 10, 2, 2, 2, 19, 2, 10, 2, 2, 42, 2, 2, 10, 2, 2, 2, 10, 2, 2, ...].
Why is this happening?
You are using .index() which will only find the first occurrence of your value in the list. So if you have a value 1.0 at index 2, and at index 9, then .index(1.0) will always return 2, no matter how many times 1.0 occurs in the list.
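A quick way to see this behaviour for yourself is the small sketch below. The list `values` is made-up sample data, not the asker's "average" list:

```python
# Hypothetical sample data: the value 1.0 appears at indices 2 and 4.
values = [3.0, 2.5, 1.0, 4.0, 1.0]

# list.index() always reports the first position holding an equal value,
# so both occurrences of 1.0 map to index 2.
first_hits = [values.index(x) for x in values if x < 2]

# enumerate() pairs every element with its own position instead.
true_hits = [i for i, x in enumerate(values) if x < 2]

print(first_hits)  # [2, 2]
print(true_hits)   # [2, 4]
```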
Use enumerate() to add indices to your loop instead:
def find(lst, a, b):
    result = []
    for i, x in enumerate(lst):
        if x<a or x>b:
            result.append(i)
    return result
You can collapse this into a list comprehension:
def find(lst, a, b):
    return [i for i, x in enumerate(lst) if x<a or x>b]
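If you're doing a lot of this kind of thing, you should consider using numpy. Note that the session below selects the values strictly between a and b; for the question's condition, use (lst < a) | (lst > b) as the mask instead.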
In [56]: import random, numpy
In [57]: lst = numpy.array([random.uniform(0, 5) for _ in range(1000)]) # example list
In [58]: a, b = 1, 3
In [59]: numpy.flatnonzero((lst > a) & (lst < b))[:10]
Out[59]: array([ 0, 12, 13, 15, 18, 19, 23, 24, 26, 29])
In response to Seanny123's question, I used this timing code:
import numpy, timeit, random

a, b = 1, 3
lst = numpy.array([random.uniform(0, 5) for _ in range(1000)])

def numpy_way():
    numpy.flatnonzero((lst > 1) & (lst < 3))[:10]

def list_comprehension():
    [e for e in lst if 1 < e < 3][:10]

print(timeit.timeit(numpy_way))
print(timeit.timeit(list_comprehension))
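The numpy version is over 60 times faster. Note that list_comprehension iterates over a numpy array and collects values rather than indices, so this is not a like-for-like comparison with a comprehension over a plain list.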
>>> average = [1,3,2,1,1,0,24,23,7,2,727,2,7,68,7,83,2]
>>> matches = [i for i in range(0,len(average)) if average[i]<2 or average[i]>4]
>>> matches
[0, 3, 4, 5, 6, 7, 8, 10, 12, 13, 14, 15]
I have the list below:
l = [1, 2, 3, 4, 10, 11, 12]
Looking at the list, we can see that it is not consecutive. To check that in Python, we can use the line below:
print(sorted(l) == list(range(min(l), max(l)+1)))
# Output: False
This gives False because 5, 6, 7, 8, 9 are missing. I want to extend this check to report how many integers are missing in each gap. Note that duplicates are not allowed in the list. For example:
l = [1, 2, 3, 4, 10, 11, 12, 14]
The output for the list above should be [5, 1], because 5 integers are missing between 4 and 10 and 1 is missing between 12 and 14.
This answers the question from the comments of how to find out how many are missing at multiple points in the list. Here we assume the list arr is sorted and has no duplicates:
it1, it2 = iter(arr), iter(arr)
next(it2, None) # advance past the first element
counts_of_missing = [j - i - 1 for i, j in zip(it1, it2) if j - i > 1]
total_missing = sum(counts_of_missing)
The iterators allow us to avoid making an extra copy of arr. If we can be wasteful of memory, omit the first two lines and change zip(it1, it2) to zip(arr, arr[1:]):
counts_of_missing = [j - i - 1 for i, j in zip(arr, arr[1:]) if j - i > 1]
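As a worked check, here is the slicing variant applied to the list from the question. This is only a sketch and assumes, as stated above, that `arr` is already sorted and free of duplicates:

```python
# Sample list from the question: sorted, no duplicates.
arr = [1, 2, 3, 4, 10, 11, 12, 14]

# Pair each element with its successor; a difference above 1 marks a gap,
# and the gap holds (difference - 1) missing integers.
counts_of_missing = [j - i - 1 for i, j in zip(arr, arr[1:]) if j - i > 1]
total_missing = sum(counts_of_missing)

print(counts_of_missing)  # [5, 1]
print(total_missing)      # 6
```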
I think this will help you:
L = [1, 2, 3, 4, 10, 11, 12, 14]
C = []      # sizes of the gaps found
D = True    # stays True only if the list is fully consecutive
for i in range(1, len(L)):
    if L[i] - 1 != L[i-1]:
        C.append(L[i] - L[i-1] - 1)
        D = False
print(D)
print(C)
Here I check whether each number minus 1 equals the element before it. If not, D is set to False and the size of the gap is appended to C.
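Here is my attempt:
from itertools import groupby
l = [1, 2, 3, 4, 10, 11, 12, 14]
# True for every integer in the full range that is absent from l
not_in = [i not in l for i in range(min(l),max(l)+1)]
# each run of True values is one gap; summing the run gives its length
missed = [sum(g) for i,g in groupby(not_in) if i]
print(missed)
# [5, 1]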
I have a list [0, 1, 2, 3, 4, 5, 6] and I sum its parts so that:
l = [0, 1, 2, 3, 4, 5, 6] -> 21
l = [1, 2, 3, 4, 5, 6] -> 21
l = [2, 3, 4, 5, 6] -> 20
l = [3, 4, 5, 6] -> 18
l = [4, 5, 6] -> 15
l = [5, 6] -> 11
l = [6] -> 6
l = [] -> 0
So, I get the corresponding sums of the list's parts: [21, 21, 20, 18, 15, 11, 6, 0]
The code I use is:
[sum(l[i:]) for i in range(len(l) + 1)]
But, for lists with range greater than 100000 the code slows down significantly.
Any idea why and how to optimize it?
I would suggest itertools.accumulate for this (which, as I recall, is faster than np.cumsum), with some list reversing to get your desired output:
>>> from itertools import accumulate
>>> lst = [0, 1, 2, 3, 4, 5, 6]
>>> list(accumulate(reversed(lst)))[::-1]
[21, 21, 20, 18, 15, 11, 6]
(you can trivially add 0 to the end if needed)
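On the "why" part of the question: each `sum(l[i:])` copies the tail of the list and adds it up again, so the total work grows roughly quadratically with the length of the list, whereas accumulating from the right touches each element once. A minimal sketch that also appends the trailing 0 follows; the function name `suffix_sums` is made up for illustration:

```python
from itertools import accumulate

def suffix_sums(values):
    """Return the same list as [sum(values[i:]) for i in range(len(values) + 1)]."""
    running = list(accumulate(reversed(values)))  # running sums built from the right
    running.reverse()                             # put them back in left-to-right order
    running.append(0)                             # sum of the empty tail
    return running

print(suffix_sums([0, 1, 2, 3, 4, 5, 6]))  # [21, 21, 20, 18, 15, 11, 6, 0]
```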
This might help to reduce calculation time for big lists:
import numpy as np
l = [0, 1, 2, 3, 4, 5, 6]
output = list(np.cumsum(l[::-1]))[::-1]+[0]
Output :
[21, 21, 20, 18, 15, 11, 6, 0]
Here is a performance comparison of four different methods, all of which do the same thing:
from timeit import timeit

def sum10(l):
    from itertools import accumulate
    return list(accumulate(reversed(l)))[::-1]+[0]

def sum11(l):
    from itertools import accumulate
    return list(accumulate(l[::-1]))[::-1]+[0]

def sum20(l):
    from numpy import cumsum
    return list(cumsum(l[::-1]))[::-1]+[0]

def sum21(l):
    from numpy import cumsum
    return list(cumsum(list(reversed(l))))[::-1]+[0]

l = list(range(1000000))
iter_0 = timeit(lambda: sum10(l), number=10) #0.14102990700121154
iter_1 = timeit(lambda: sum11(l), number=10) #0.1336850459993002
nump_0 = timeit(lambda: sum20(l), number=10) #0.6019859320003889
nump_1 = timeit(lambda: sum21(l), number=10) #0.3818727100006072
There is no clean way of doing it with list comprehensions as far as I know.
This code will work without any other libraries:
def cumulative_sum(a):
    total = 0
    for item in a:
        total += item
        yield total

list(cumulative_sum(listname))
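From Python 3.8 on, there is a new operator (the assignment expression, :=) that might help. Note that total has to be initialised before the comprehension runs:
total = 0
[(x, total := total + x) for x in items]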
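For the suffix sums asked about in the question, the same operator can be applied to the reversed list. This is a rough sketch that needs Python 3.8 or later; the function names `running_totals` and `suffix_sums_walrus` are made up for illustration:

```python
def running_totals(items):
    """Running sums of items, built with an assignment expression."""
    total = 0  # must exist before the comprehension uses it
    return [(total := total + x) for x in items]

def suffix_sums_walrus(values):
    """Sums of every tail of values, ending with 0 for the empty tail."""
    # Accumulate from the right, then flip back to left-to-right order.
    return running_totals(reversed(values))[::-1] + [0]

print(running_totals([0, 1, 2, 3, 4, 5, 6]))      # [0, 1, 3, 6, 10, 15, 21]
print(suffix_sums_walrus([0, 1, 2, 3, 4, 5, 6]))  # [21, 21, 20, 18, 15, 11, 6, 0]
```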
My list, for example, is
my_list = [1,2,3,4,5, 9,10,11,12,13,14, 20,21,22,23,24,25,26,27]
For each run of consecutive values, I would like to keep only the first two and the last two elements. So what I need to get is:
output = [1,2,4,5, 9,10,13,14, 20,21,26,27]
How can I simply or efficiently get this result?
Use more_itertools.consecutive_groups:
import more_itertools as mit
my_list = [1,2,3,4,5,9,10,11,12,13,14,15]
x = [list(group) for group in mit.consecutive_groups(my_list)]
output = []
for i in x:
    # first two and last two elements of each run
    temp = [i[0],i[1],i[-2],i[-1]]
    output.extend(temp)
print(output)
Output:
[1,2,4,5,9,10,14,15]
Use groupby and itemgetter:
from operator import itemgetter
from itertools import groupby

my_list = [1,2,3,4,5,9,10,11,12,13,14,20,21,22,23,24,25,26,27]
output = []
for k, g in groupby(enumerate(my_list), lambda x: x[0]-x[1]):
    lst = list(map(itemgetter(1), g))
    output.extend([lst[0], lst[1], lst[-2], lst[-1]])
print(output)
# [1, 2, 4, 5, 9, 10, 13, 14, 20, 21, 26, 27]
Using only the standard itertools module, you can do:
from itertools import count, groupby

def remove_middle_of_seq(lst):
    out = []
    index = count()
    for _, sequence in groupby(lst, lambda value: value - next(index)):
        seq = list(sequence)
        out.extend([seq[0], seq[1], seq[-2], seq[-1]])
    return out

my_list = [1,2,3,4,5, 9,10,11,12,13,14, 20,21,22,23,24,25,26,27]
print(remove_middle_of_seq(my_list))
# [1, 2, 4, 5, 9, 10, 13, 14, 20, 21, 26, 27]
In groups of consecutive values, the difference between the values and their index is constant, so groupby can group them using this difference as key.
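To make that key concrete, the sketch below prints the value-minus-index difference for a small made-up list and then groups on it. The sample data and variable names are illustrative only:

```python
from itertools import groupby

# Hypothetical sample with three runs of consecutive integers.
my_list = [1, 2, 3, 9, 10, 20, 21, 22]

# Within a run, value and index both grow by 1, so their difference is constant.
keys = [value - index for index, value in enumerate(my_list)]
print(keys)  # [1, 1, 1, 6, 6, 15, 15, 15]

# Grouping on that difference therefore splits the list into its runs.
groups = [
    [value for _, value in run]
    for _, run in groupby(enumerate(my_list), key=lambda pair: pair[1] - pair[0])
]
print(groups)  # [[1, 2, 3], [9, 10], [20, 21, 22]]
```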
There isn't really a function that does this kind of thing in the standard library, so you have to write most of it manually. It's easiest to first group all ascending numbers, and then delete the middle of each group:
import itertools

def group_consecutive(sequence):
    """
    Aggregates consecutive integers into groups.

    >>> group_consecutive([8, 9, 1, 3, 4, 5])
    [[8, 9], [1], [3, 4, 5]]
    """
    result = []
    prev_num = None
    for num in sequence:
        if prev_num is None or num != prev_num + 1:
            group = [num]
            result.append(group)
        else:
            group.append(num)
        prev_num = num
    return result

def drop_consecutive(sequence, keep_left=2, keep_right=2):
    """
    Groups consecutive integers and then keeps only the 2 first and last numbers
    in each group. The result is then flattened.

    >>> drop_consecutive([1, 2, 3, 4, 5, 8, 9])
    [1, 2, 4, 5, 8, 9]
    """
    grouped_seq = group_consecutive(sequence)
    for group in grouped_seq:
        del group[keep_left:-keep_right]
    return list(itertools.chain.from_iterable(grouped_seq))
>>> my_list = [1,2,3,4,5, 9,10,11,12,13,14, 20,21,22,23,24,25,26,27]
>>> drop_consecutive(my_list)
[1, 2, 4, 5, 9, 10, 13, 14, 20, 21, 26, 27]
See also:
itertools.chain and itertools.chain.from_iterable
You can pair adjacent list items by zipping the list with itself with an offset of 1, but pad the shifted list with a non-consecutive value, so that you can iterate through the pairings and determine that there is a separate group when the difference of a pair is not 1:
def consecutive_groups(l):
    o = []
    for a, b in zip([l[0] - 2] + l, l):
        if b - a != 1:
            o.append([])
        o[-1].append(b)
    return [s[:2] + s[-2:] for s in o]
Given your sample input, consecutive_groups(my_list) returns:
[[1, 2, 4, 5], [9, 10, 13, 14], [20, 21, 26, 27]]
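One caveat, as far as I can tell from the slicing: when a run has fewer than four items, `s[:2]` and `s[-2:]` overlap, so some elements are repeated. A possible guard is sketched below; the helper name `trim_run` and its `keep` parameter are made up for illustration and are not part of the answer above:

```python
def trim_run(run, keep=2):
    """Keep the first and last `keep` items of a run without repeating any."""
    if len(run) <= 2 * keep:
        # Short run: the head and tail slices would overlap, so return it whole.
        return list(run)
    return run[:keep] + run[-keep:]

short = [1, 2, 3]
print(short[:2] + short[-2:])     # [1, 2, 2, 3]  (overlap repeats the 2)
print(trim_run(short))            # [1, 2, 3]
print(trim_run([1, 2, 3, 4, 5]))  # [1, 2, 4, 5]
```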
I have a base list [1,4,10] that needs to be converted, efficiently, into a list in which each base element is followed by its consecutive successors.
Examples:
If I need 2 consecutive numbers then [1,4,10] will be [1,2,4,5,10,11].
If 3 consecutive numbers then [1,4,10] will be [1,2,3,4,5,6,10,11,12].
arr=[1,4,10]
con=3
[r + i for r in arr for i in range(con)]
# [1, 2, 3, 4, 5, 6, 10, 11, 12]
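Here's a one-liner, assuming the list is x and the number of 'consecutives' is c. On Python 3, reduce has to be imported from functools and each range has to be turned into a list before it can be concatenated:
from functools import reduce
reduce(lambda a, b: a + b, map(lambda v: list(range(v, v+c)), x))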
a = [1,4,10]
k = 3 #no of consecutive
x=[range(b,b+k) for b in a]
output = [m for d in x for m in d]
Here is one way. itertools.chain removes the need for explicit nested loops.
from itertools import chain

def consecutiver(lst, n=3):
    return list(chain.from_iterable(range(i, i+n) for i in lst))

res = consecutiver([1, 4, 10], 2)
# [1, 2, 4, 5, 10, 11]
res2 = consecutiver([1, 4, 10], 3)
# [1, 2, 3, 4, 5, 6, 10, 11, 12]
I have a list containing data as such:
[1, 2, 3, 4, 7, 8, 10, 11, 12, 13, 14]
I'd like to print out the ranges of consecutive integers:
1-4, 7-8, 10-14
Is there a built-in/fast/efficient way of doing this?
From the docs:
>>> from itertools import groupby
>>> from operator import itemgetter
>>> data = [ 1, 4,5,6, 10, 15,16,17,18, 22, 25,26,27,28]
>>> for k, g in groupby(enumerate(data), lambda ix: ix[0] - ix[1]):
...     print(list(map(itemgetter(1), g)))
...
[1]
[4, 5, 6]
[10]
[15, 16, 17, 18]
[22]
[25, 26, 27, 28]
You can adapt this fairly easily to get a printed set of ranges.
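One way that adaptation might look, as a sketch rather than the only option: the function name `format_ranges` is made up, and the input is assumed to be sorted with no duplicates.

```python
from itertools import groupby

def format_ranges(data):
    """Format sorted integers as 'start-end' runs separated by commas."""
    parts = []
    # index - value is constant inside a run of consecutive integers
    for _, group in groupby(enumerate(data), key=lambda pair: pair[0] - pair[1]):
        run = [value for _, value in group]
        parts.append(f"{run[0]}-{run[-1]}")
    return ", ".join(parts)

print(format_ranges([1, 2, 3, 4, 7, 8, 10, 11, 12, 13, 14]))  # 1-4, 7-8, 10-14
```

A run holding a single number comes out as n-n with this version.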
A short solution that works without additional imports. It accepts any iterable, sorts unsorted inputs, and removes duplicate items:
def ranges(nums):
    nums = sorted(set(nums))
    gaps = [[s, e] for s, e in zip(nums, nums[1:]) if s+1 < e]
    edges = iter(nums[:1] + sum(gaps, []) + nums[-1:])
    return list(zip(edges, edges))
Example:
>>> ranges([2, 3, 4, 7, 8, 9, 15])
[(2, 4), (7, 9), (15, 15)]
>>> ranges([-1, 0, 1, 2, 3, 12, 13, 15, 100])
[(-1, 3), (12, 13), (15, 15), (100, 100)]
>>> ranges(range(100))
[(0, 99)]
>>> ranges([0])
[(0, 0)]
>>> ranges([])
[]
This is the same as @dansalmo's solution, which I found amazing, albeit a bit hard to read and apply (as it's not given as a function).
Note that it could easily be modified to return "traditional" half-open ranges [start, end), e.g. by altering the return statement:
return [(s, e+1) for s, e in zip(edges, edges)]
This will print exactly as you specified:
>>> nums = [1, 2, 3, 4, 7, 8, 10, 11, 12, 13, 14]
>>> ranges = sum((list(t) for t in zip(nums, nums[1:]) if t[0]+1 != t[1]), [])
>>> iranges = iter(nums[0:1] + ranges + nums[-1:])
>>> print(', '.join([str(n) + '-' + str(next(iranges)) for n in iranges]))
1-4, 7-8, 10-14
If the list has any single number ranges, they would be shown as n-n:
>>> nums = [1, 2, 3, 4, 5, 7, 8, 9, 12, 15, 16, 17, 18]
>>> ranges = sum((list(t) for t in zip(nums, nums[1:]) if t[0]+1 != t[1]), [])
>>> iranges = iter(nums[0:1] + ranges + nums[-1:])
>>> print(', '.join([str(n) + '-' + str(next(iranges)) for n in iranges]))
1-5, 7-9, 12-12, 15-18
Built-In: No, as far as I'm aware.
You have to run through the array. Start off with putting the first value in a variable and print it, then as long as you keep hitting the next number do nothing but remember the last number in another variable. If the next number is not in line, check the last number remembered versus the first number. If it's the same, do nothing. If it's different, print "-" and the last number. Then put the current value in the first variable and start over.
At the end of the array you run the same routine as if you had hit a number out of line.
I could have written the code, of course, but I don't want to spoil your homework :-)
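For readers who do want to see it spelled out, the routine described above could look roughly like the sketch below. The function name `format_runs` is made up, the input is assumed to be sorted, and the pieces are collected into a string instead of being printed one at a time:

```python
def format_runs(nums):
    """Format a sorted list of integers as comma-separated runs."""
    if not nums:
        return ""
    pieces = []
    first = last = nums[0]  # start of the current run and last number seen in it
    for n in nums[1:]:
        if n == last + 1:
            last = n  # still in line: just remember the last number
            continue
        # Out of line: emit the finished run, then start over at n.
        pieces.append(str(first) if first == last else f"{first}-{last}")
        first = last = n
    # End of the list: same routine as for a number out of line.
    pieces.append(str(first) if first == last else f"{first}-{last}")
    return ", ".join(pieces)

print(format_runs([1, 2, 3, 4, 7, 8, 10, 11, 12, 13, 14]))  # 1-4, 7-8, 10-14
```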
I had a similar problem and am using the following for a sorted list. It outputs a dictionary whose values are the [first, last] bounds of each run of consecutive numbers. The keys separate the runs and are also the running count of breaks in the sequence seen so far. Isolated numbers with no consecutive neighbour are left out.
Your list gives me an output of {0: [1, 4], 1: [7, 8], 2: [10, 14]}
def series_dictf(index_list):
    from collections import defaultdict

    series_dict = defaultdict(list)
    sequence_dict = dict()
    list_len = len(index_list)
    series_interrupts = 0

    for i in range(list_len):
        if i == (list_len - 1):
            break
        position_a = index_list[i]
        position_b = index_list[i + 1]
        if position_b == (position_a + 1):
            sequence_dict[position_a] = (series_interrupts)
            sequence_dict[position_b] = (series_interrupts)
        if position_b != (position_a + 1):
            series_interrupts += 1

    for position, series in sequence_dict.items():
        series_dict[series].append(position)
    for series, position in series_dict.items():
        series_dict[series] = [position[0], position[-1]]
    return series_dict
Using set operations, the following approach works:
def get_consecutive_integer_series(integer_list):
    integer_list = sorted(integer_list)
    start_item = integer_list[0]
    end_item = integer_list[-1]

    a = set(integer_list)  # Set a
    b = range(start_item, end_item+1)

    # Pick the items of the full range that are missing from the input.
    c = set(b) - a  # Set operation b-a

    li = []
    start = 0
    for i in sorted(c):
        end = b.index(i)  # Get end point of the slice
        li.append(b[start:end])  # Slice the range up to the missing value
        start = end + 1  # Move the start point past the missing value
    li.append(b[start:])  # Add the last series

    for sliced_list in li:
        if not sliced_list:
            # slice is empty
            continue
        if len(sliced_list) == 1:
            # If only one item found in the slice
            yield sliced_list[0]
        else:
            yield "{0}-{1}".format(sliced_list[0], sliced_list[-1])

a = [1, 2, 3, 6, 7, 8, 4, 14, 15, 21]
for series in get_consecutive_integer_series(a):
    print(series)
Output for the above list "a"
1-4
6-8
14-15
21
Here is another basic solution that uses no modules, which is handy for interviews, where you are often asked to avoid library helpers:
#!/usr/bin/python

def split_list(n):
    """Return, for each gap, the value just after the element preceding it."""
    return [(x+1) for x,y in zip(n, n[1:]) if y-x != 1]

def get_sub_list(my_list):
    """Split the list into runs using the values from split_list."""
    my_index = split_list(my_list)
    output = list()
    prev = 0
    for index in my_index:
        new_list = [ x for x in my_list[prev:] if x < index]
        output.append(new_list)
        prev += len(new_list)
    output.append([ x for x in my_list[prev:]])
    return output

my_list = [1, 3, 4, 7, 8, 10, 11, 13, 14]
print(get_sub_list(my_list))
Output:
[[1], [3, 4], [7, 8], [10, 11], [13, 14]]
You can use the collections library, which has a class called Counter. Counter comes in handy when you want to count how often each distinct element occurs in an iterable:
from collections import Counter
data = [ 1, 4,5,6, 10, 15,16,17,18, 22, 25,26,27,28]
cnt=Counter(data)
print(cnt)
The output for this looks like:
Counter({1: 1, 4: 1, 5: 1, 6: 1, 10: 1, 15: 1, 16: 1, 17: 1, 18: 1, 22: 1, 25: 1, 26: 1, 27: 1, 28: 1})
which, just like any other dictionary, can be queried by key.
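A short sketch of such lookups, reusing the same data (the variable names are only illustrative):

```python
from collections import Counter

data = [1, 4, 5, 6, 10, 15, 16, 17, 18, 22, 25, 26, 27, 28]
cnt = Counter(data)

count_of_4 = cnt[4]      # number of times 4 occurs
count_of_7 = cnt[7]      # missing keys count as 0 instead of raising KeyError
distinct = len(cnt)      # number of distinct values

print(count_of_4, count_of_7, distinct)  # 1 0 14
```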