Split an array according to user defined rules - python

I am new to Python and am struggling with how to split an array according to user-defined rules.
Say I have an array with elements inside:
A=[0,1,2,6,7,8,9,14,15,16]
I want to use python to split the above array into several groups.
So if the interval between the current element and the next element is smaller than 2, then they belong to one group, else belong to a different group.
So essentially I want my new array to look like this:
B = [[0,1,2], [6,7,8,9], [14,15,16]]
I was thinking to use for loop and loop through array A, but not quite sure how to do so....

A basic for loop works
a=[0,1,2,6,7,8,9,14,15,16]
g = []
tmp = []
for i in range(len(a)):
    tmp.append(a[i])
    if i==len(a)-1 or a[i+1] - a[i] >= 2:
        g.append(tmp)
        tmp = []
print(g)
Output
[[0, 1, 2], [6, 7, 8, 9], [14, 15, 16]]

You can just iterate over your A list, starting a new sub-list in B each time the difference between the current value in A and the previous one is more than 1:
A=[0,1,2,6,7,8,9,14,15,16]
B = [[A[0]]]
Bindex = 0
last = A[0]
for i in range(1, len(A)):
    if A[i] - last >= 2:
        B.append([A[i]])
        Bindex += 1
    else:
        B[Bindex].append(A[i])
    last = A[i]
print(B)
Output:
[[0, 1, 2], [6, 7, 8, 9], [14, 15, 16]]

A = [0, 1, 2, 6, 7, 8, 9, 14, 15, 17]
temp = []
B = []
for index, value in enumerate(A):
    if index != 0 and value - A[index-1] >= 2:
        B.append(temp)
        temp = [value]
        continue
    temp.append(value)
B.append(temp)
How the code works:
enumerate wraps an iterable and yields (index, value) tuples.
So enumerate will turn your list into this sequence:
(0, 0) (1, 1) (2, 2) (3, 6) (4, 7) etc....
The index != 0 condition ensures that you do not access an element before index 0 in the next part of the if statement, i.e. A[index-1]. If index were 0, this would be A[0-1] = A[-1], which is NOT what we want (because it gives the last element).
temp = [value] makes a new list. I.e. the next list to be appended into B.
continue goes back to the for loop skipping any code below it for that particular iteration.
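As a rough sketch, the same loop can be wrapped in a reusable function. The name split_by_gap and the min_gap parameter are assumptions made for illustration and are not part of the answer above; a new group starts whenever the gap to the previous element is at least min_gap.

```python
def split_by_gap(values, min_gap=2):
    """Group consecutive items; start a new group when the gap is >= min_gap."""
    groups = []
    for index, value in enumerate(values):
        # The first element always opens a group; later ones open a new
        # group only when they are far enough from their predecessor.
        if index == 0 or value - values[index - 1] >= min_gap:
            groups.append([value])
        else:
            groups[-1].append(value)
    return groups


print(split_by_gap([0, 1, 2, 6, 7, 8, 9, 14, 15, 16]))
```

Appending to groups[-1] avoids the separate temp list and the final B.append(temp) after the loop, and an empty input simply returns an empty list.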

I think an elegant and much faster way is to use itertools.groupby like:
from itertools import groupby
from operator import itemgetter
A=[0,1,2,6,7,8,9,14,15,16]
B = [list(map(itemgetter(1), group)) for key, group in groupby(enumerate(A), lambda x: x[0] - x[1])]
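To see why this grouping works, it may help to look at the key values themselves. The sketch below (variable names are illustrative) prints index minus value for every element: within a run of integers that increase by exactly 1 the key stays constant, and it changes at every gap. Note that this trick assumes sorted integers without duplicates, where "same group" means a difference of exactly 1.

```python
from itertools import groupby

A = [0, 1, 2, 6, 7, 8, 9, 14, 15, 16]

# index - value is constant inside a run of consecutive integers
keys = [i - v for i, v in enumerate(A)]
print(keys)

# group on that key and keep only the values
groups = [[v for _, v in grp]
          for _, grp in groupby(enumerate(A), key=lambda pair: pair[0] - pair[1])]
print(groups)
```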

A = [0, 1, 2, 6, 7, 8, 9, 14, 15, 17]
B = []
start = 0
end = 0
for i, val in enumerate(A):
    if i != 0 and val - A[i-1] >= 2:
        B.append(A[start:end])
        start = end
    end += 1
B.append(A[start:end])

Related

How to check if list contains consecutive numbers Python

I have below list:
l = [1, 2, 3, 4, 10, 11, 12]
By looking at the above list, we can say it's not consecutive. In order to find that using python, we can use below line of code:
print(sorted(l) == list(range(min(l), max(l)+1)))
# Output: False
This gives output False because 5, 6, 7, 8, 9 are missing. I want to further extend this functionality to check how many integers are missing. Also to note, no duplicates are allowed in the list. For ex:
l = [1, 2, 3, 4, 10, 11, 12, 14]
output of above list should be [5, 1] because 5 integers are missing between 4 and 10 and 1 is missing between 12 and 14
This answers the question from the comments of how to find out how many are missing at multiple points in the list. Here we assume the list arr is sorted and has no duplicates:
it1, it2 = iter(arr), iter(arr)
next(it2, None) # advance past the first element
counts_of_missing = [j - i - 1 for i, j in zip(it1, it2) if j - i > 1]
total_missing = sum(counts_of_missing)
The iterators allow us to avoid making an extra copy of arr. If we can be wasteful of memory, omit the first two lines and change zip(it1, it2) to zip(arr, arr[1:]):
counts_of_missing = [j - i - 1 for i, j in zip(arr, arr[1:]) if j - i > 1]
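A minimal usage sketch of the zip-based version, applied to the list from the question. The function name missing_counts is an assumption for illustration; as above, the input is assumed to be sorted and free of duplicates.

```python
def missing_counts(arr):
    """Return the size of each gap between neighbouring values of a sorted list."""
    return [j - i - 1 for i, j in zip(arr, arr[1:]) if j - i > 1]


l = [1, 2, 3, 4, 10, 11, 12, 14]
gaps = missing_counts(l)
print(gaps)            # size of every gap, in order
print(sum(gaps))       # total number of missing integers
print(not gaps)        # True only if the list is consecutive
```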
I think this will help you
L = [1, 2, 3, 4, 10, 11, 12, 14]
C = []
D = True
for _ in range(1,len(L)):
    if L[_]-1!=L[_-1]:
        C.append(L[_]-L[_-1]-1)
        D = False
print(D)
print(C)
Here I check whether each number minus 1 equals the previous element. If not, D is set to False and the size of the gap is added to the list C.
here is my attempt:
from itertools import groupby
l = [1, 2, 3, 4, 10, 11, 12, 14]
not_in = [i not in l for i in range(min(l),max(l)+1)]
missed = [sum(g) for i,g in groupby(not_in) if i]
>>> missed
[5, 1]

Finding all consecutive indices that sum X number on list

I'm currently practicing some Python and came across this problem. Let's say we have a list of integers and we want to find out all the indices of its elements that sum a certain number (in particular, the first index and the last index). Here's an example:
arr = [6, 7, 5, 4, 3, 1, 2, 3, 5, 6, 7, 9, 0, 0, 1, 2, 4, 1, 2, 3, 5, 1, 2]
sum_to_find = 13
So we have the array, and we want to find all elements that sum 13, and save the indexes (first and last of the interval) each time. The answer for this problem would be:
answer = [[0, 1], [2, 5], [3, 7], [9, 10], [12, 19], [13, 19], [14, 19], [18, 22]]
Below is the code I've tried:
def find_sum_range(array):
    summ = 0
    lst_sum = []
    lst_ix = []
    i = 0
    j = -1
    while i < len(array):
        if summ < 13:
            val = array[i]
            summ += val
            lst_ix.append(i)
        elif summ == 13:
            lst_sum.append(lst_ix)
            j += 1
            i = lst_sum[j][1]
            summ = 0
            lst_ix = []
        i += 1
    return lst_sum
But it's only returning the first two answers, mostly because I can't seem to properly backtrack the i iterator to start again from the first index of the last sum it correctly identified.
This approach is unnecessarily complicated. Utilizing list slicing produces much simpler code. Try this:
def find_sum_range(array):
    result = []
    for begin in range(len(array)):
        for end in range(begin, len(array)):
            if sum(array[begin:end+1]) == 13:
                result.append([begin,end])
    return result
or, with list comprehension:
def find_sum_range(array):
    return [ [begin,end]
             for begin in range(len(array))
             for end in range(begin, len(array))
             if sum(array[begin:end+1]) == 13 ]
A simple approach to this problem would be to use nested loops. So, go over each element in the list and for each of them iterate over the elements in the list that come after that element.
As soon as the sum equals or exceeds req_sum, we can break out of the nested for loop and continue with the main loop. If the sum is equal, we append a list with the corresponding indices to the answer.
arr = [6, 7, 5, 4, 3, 1, 2, 3, 5, 6, 7, 9, 0, 0, 1, 2, 4, 1, 2, 3, 5, 1, 2]
req_sum = 13
answer = []
for i in range(len(arr)):
    curr_s = arr[i]
    for k in range(i+1, len(arr)):
        curr_s += arr[k]
        if curr_s >= req_sum:
            if curr_s == req_sum:
                answer.append([i, k])
            break
print(answer)
Note that you use i = lst_sum[j][1] to try to backtrack. This is the second element in the list you just saved. You should use i = lst_sum[j][0] instead.
Also, you need to treat the case where you go above 13.
You can reduce the number of operations needed by moving a start index instead of keeping all the potential list indexes and deleting everything each time you arrive at 13 or above:
def find_sum_range(array):
    summ = 0
    lst_sum = []
    start = 0
    end = -1
    for element in array:
        summ += element
        end += 1
        while summ >= 13:
            if summ == 13:
                lst_sum.append([start, end])
            summ -= array[start]
            start += 1
    return lst_sum
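One caveat with the moving-start approach: once start has moved past a match, a zero that directly follows the matching range is not reported as an extended range (for example, [6, 7, 0] yields [0, 1] but not [0, 2]). If that matters, a different technique, prefix sums with a dictionary lookup, finds every range in a single pass. The sketch below is not the method from the answers above; find_sum_ranges and its target parameter are names chosen for illustration.

```python
from collections import defaultdict


def find_sum_ranges(values, target):
    """Return every [start, end] (inclusive) whose elements sum to target."""
    ranges = []
    # Maps a prefix sum to the list of start indices that follow that prefix.
    starts_by_prefix = defaultdict(list)
    starts_by_prefix[0].append(0)  # empty prefix: a range may start at index 0
    running = 0
    for end, value in enumerate(values):
        running += value
        # A range [start, end] sums to target when the prefix sum just
        # before start equals running - target.
        for start in starts_by_prefix.get(running - target, []):
            ranges.append([start, end])
        starts_by_prefix[running].append(end + 1)
    return ranges


arr = [6, 7, 5, 4, 3, 1, 2, 3, 5, 6, 7, 9, 0, 0, 1, 2, 4, 1, 2, 3, 5, 1, 2]
print(find_sum_ranges(arr, 13))
```

Because it compares prefix sums rather than shrinking a window, this variant should also cope with negative numbers, at the cost of the extra dictionary.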

Remove the first N items that match a condition in a Python list

If I have a function matchCondition(x), how can I remove the first n items in a Python list that match that condition?
One solution is to iterate over each item, mark it for deletion (e.g., by setting it to None), and then filter the list with a comprehension. This requires iterating over the list twice and mutates the data. Is there a more idiomatic or efficient way to do this?
n = 3
def condition(x):
    return x < 5
data = [1, 10, 2, 9, 3, 8, 4, 7]
out = do_remove(data, n, condition)
print(out) # [10, 9, 8, 4, 7] (1, 2, and 3 are removed, 4 remains)
One way using itertools.filterfalse and itertools.count:
from itertools import count, filterfalse
data = [1, 10, 2, 9, 3, 8, 4, 7]
output = filterfalse(lambda L, c=count(): L < 5 and next(c) < 3, data)
Then list(output), gives you:
[10, 9, 8, 4, 7]
Write a generator that takes the iterable, a condition, and an amount to drop. Iterate over the data and yield items that don't meet the condition. If the condition is met, increment a counter and don't yield the value. Always yield items once the counter reaches the amount you want to drop.
def iter_drop_n(data, condition, drop):
    dropped = 0
    for item in data:
        if dropped >= drop:
            yield item
            continue
        if condition(item):
            dropped += 1
            continue
        yield item
data = [1, 10, 2, 9, 3, 8, 4, 7]
out = list(iter_drop_n(data, lambda x: x < 5, 3))
This does not require an extra copy of the list, only iterates over the list once, and only calls the condition once for each item. Unless you actually want to see the whole list, leave off the list call on the result and iterate over the returned generator directly.
The accepted answer was a little too magical for my liking. Here's one where the flow is hopefully a bit clearer to follow:
def matchCondition(x):
    return x < 5
def my_gen(L, drop_condition, max_drops=3):
    count = 0
    iterator = iter(L)
    for element in iterator:
        if drop_condition(element):
            count += 1
            if count >= max_drops:
                break
        else:
            yield element
    yield from iterator
example = [1, 10, 2, 9, 3, 8, 4, 7]
print(list(my_gen(example, drop_condition=matchCondition)))
It's similar to the logic in davidism's answer, but instead of checking on every step whether the drop count has been reached, we just short-circuit the rest of the loop.
Note: If you don't have yield from available, just replace it with another for loop over the remaining items in iterator.
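For reference, a sketch of that variant without yield from. The function name my_gen_no_yield_from is made up for this illustration; the only change from the generator above is the trailing for loop that drains the shared iterator.

```python
def my_gen_no_yield_from(items, drop_condition, max_drops=3):
    count = 0
    iterator = iter(items)
    for element in iterator:
        if drop_condition(element):
            count += 1
            if count >= max_drops:
                break  # enough items dropped; stop testing the condition
        else:
            yield element
    # Same iterator object, so this resumes right after the last dropped item.
    for element in iterator:
        yield element


example = [1, 10, 2, 9, 3, 8, 4, 7]
print(list(my_gen_no_yield_from(example, lambda x: x < 5)))
```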
If mutation is required:
def do_remove(ls, N, predicate):
    i, delete_count, l = 0, 0, len(ls)
    while i < l and delete_count < N:
        if predicate(ls[i]):
            ls.pop(i) # remove item at i
            delete_count, l = delete_count + 1, l - 1
        else:
            i += 1
    return ls # for convenience
assert do_remove([1, 10, 2, 9, 3, 8, 4, 7], 3, matchCondition) == [10, 9, 8, 4, 7]
Straightforward Python:
N = 3
data = [1, 10, 2, 9, 3, 8, 4, 7]
def matchCondition(x):
    return x < 5
c = 1
l = []
for x in data:
    if c > N or not matchCondition(x):
        l.append(x)
    else:
        c += 1
print(l)
This can easily be turned into a generator if desired:
def filter_first(n, func, iterable):
    c = 1
    for x in iterable:
        if c > n or not func(x):
            yield x
        else:
            c += 1
print(list(filter_first(N, matchCondition, data)))
Starting Python 3.8, and the introduction of assignment expressions (PEP 572) (:= operator), we can use and increment a variable within a list comprehension:
# items = [1, 10, 2, 9, 3, 8, 4, 7]
total = 0
[x for x in items if not (x < 5 and (total := total + 1) <= 3)]
# [10, 9, 8, 4, 7]
This:
Initializes a variable total to 0, which counts the previously matched occurrences within the list comprehension.
Checks for each item whether it both matches the exclusion condition (x < 5) and has not yet pushed us past the number of items we want to discard. The second check increments total via an assignment expression (total := total + 1) and, at the same time, compares the new value of total to the maximum number of items to discard (3).
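A small sketch that wraps the comprehension in a function so the counter does not leak into module scope. The name drop_first_matches and its parameters are assumptions for illustration, and Python 3.8 or later is required for the := operator.

```python
def drop_first_matches(items, predicate, limit):
    """Return items without the first `limit` elements that satisfy predicate."""
    total = 0  # number of matching items seen so far
    # The walrus target binds in this function's scope, so the count
    # persists across iterations of the comprehension.
    return [x for x in items
            if not (predicate(x) and (total := total + 1) <= limit)]


items = [1, 10, 2, 9, 3, 8, 4, 7]
print(drop_first_matches(items, lambda x: x < 5, 3))
```

Because and short-circuits, total is only incremented for items that match the predicate.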
Using list comprehensions:
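n = 3
data = [1, 10, 2, 9, 3, 8, 4, 7]
count = 0
def counter():
    # record one more dropped item; always truthy so it can sit in the condition
    global count
    count += 1
    return True
def condition(x):
    return x < 5
# drop an item only while fewer than n matching items have been dropped
filtered = [x for x in data if not (count < n and condition(x) and counter())]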
This will also stop checking the condition after n elements are found thanks to boolean short-circuiting.

Identify varying consecutive multiples in a sorted list

I have a sorted list and would like to identify consecutive multiple numbers in that list. The list can contain consecutive multiples of different order, which makes it more difficult.
Some test cases:
[1,3,4,5] -> [[1], [3,4,5]]
[1,3,5,6,7] -> [[1], [3], [5,6,7]]
# consecutive multiples of 1 and 2 (or n)
[1,2,3,7,9,11] -> [[1,2,3], [7,9,11]]
[1,2,3,7,10,12,14,25] -> [[1,2,3], [7], [10,12,14], [25]]
# overlapping consecutives !!!
[1,2,3,4,6,8,10] -> [[1,2,3,4], [6,8,10]]
Now, I have no idea what I'm doing. What I have done is to group pairwise by the distance between numbers, which was a good start, but then I am having a lot of issues identifying which element in each pair goes where, i.e.
# initial list
[1,3,4,5]
# pairs of same distance
[[[1,3]], [[3,4], [4,5]]]
# algo to get the final result ?
[[1], [3,4,5]]
Any help is greatly appreciated.
EDIT: Maybe mentioning what I want this for would make it more clear.
I want to transform something like:
[1,5,10,11,12,13,14,15,17,20,22,24,26,28,30]
into
1, 5, 10 to 15 by 1, 17, 20 to 30 by 2
Here is a version that incorporates @Bakuriu's optimization:
MINIMAL_MATCH = 3
def find_some_sort_of_weird_consecutiveness(data):
    """
    >>> find_some_sort_of_weird_consecutiveness([1,3,4,5])
    [[1], [3, 4, 5]]
    >>> find_some_sort_of_weird_consecutiveness([1,3,5,6,7])
    [[1, 3, 5], [6], [7]]
    >>> find_some_sort_of_weird_consecutiveness([1,2,3,7,9,11])
    [[1, 2, 3], [7, 9, 11]]
    >>> find_some_sort_of_weird_consecutiveness([1,2,3,7,10,12,14,25])
    [[1, 2, 3], [7], [10, 12, 14], [25]]
    >>> find_some_sort_of_weird_consecutiveness([1,2,3,4,6,8,10])
    [[1, 2, 3, 4], [6, 8, 10]]
    >>> find_some_sort_of_weird_consecutiveness([1,5,10,11,12,13,14,15,17,20,22,24,26,28,30])
    [[1], [5], [10, 11, 12, 13, 14, 15], [17], [20, 22, 24, 26, 28, 30]]
    """
    def pair_iter(series):
        from itertools import tee
        _first, _next = tee(series)
        next(_next, None)
        for i, (f, n) in enumerate(zip(_first, _next), start=MINIMAL_MATCH - 1):
            yield i, f, n
    result = []
    while len(data) >= MINIMAL_MATCH:
        test = data[1] - data[0]
        if (data[2] - data[1]) == test:
            for i, f, n in pair_iter(data):
                if (n - f) != test:
                    i -= 1
                    break
        else:
            i = 1
        data, match = data[i:], data[:i]
        result.append(match)
    for d in data:
        result.append([d])
    return result
if __name__ == '__main__':
    from doctest import testmod
    testmod()
It handles all your current test cases. Give me new failing test cases if you have any.
As mentioned in comments below, I am assuming that the shortest sequence is now three elements since a sequence of two is trivial.
See http://docs.python.org/2/library/itertools.html for an explanation of the pairwise iterator.
I'd start out with a difference list.
length_a = len(list1)
diff_v = [list1[j+1] - list1[j] for j in range(length_a-1)]
so [1,2,3,7,11,13,15,17] becomes [1,1,4,4,2,2,2]
now it is easy
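One possible way to finish from here, as a rough sketch: walk the list greedily and extend a run for as long as the difference between neighbours stays the same. It compares adjacent differences directly instead of materialising diff_v. The function name arithmetic_runs and the min_len parameter are assumptions for illustration, and a run is only kept when it has at least three elements, as assumed earlier in this thread.

```python
def arithmetic_runs(values, min_len=3):
    """Greedily split sorted values into runs with a constant step."""
    result = []
    i, n = 0, len(values)
    while i < n:
        j = i + 1
        if j < n:
            step = values[j] - values[i]
            # extend the run while the next difference equals the first one
            while j + 1 < n and values[j + 1] - values[j] == step:
                j += 1
        if j < n and j - i + 1 >= min_len:
            result.append(values[i:j + 1])   # run covers indexes i..j
            i = j + 1
        else:
            result.append([values[i]])       # too short: emit a single item
            i += 1
    return result


print(arithmetic_runs([1, 5, 10, 11, 12, 13, 14, 15, 17, 20, 22, 24, 26, 28, 30]))
```

Like any greedy scan, this commits to the first run it finds, so [1,3,5,6,7] comes out as [[1, 3, 5], [6], [7]] rather than the grouping listed in the question.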
You can just keep track of your last output value as you go along:
in_ = [1, 2, 3, 4, 5]
out = [[in_[0]]]
for item in in_[1:]:
    if out[-1][-1] != item - 1:
        out.append([])
    out[-1].append(item)
I would group the list by its difference between index and value:
from itertools import groupby
lst = [1,3,4,5]
result = []
for key, group in groupby(enumerate(lst), key = lambda pair: pair[1] - pair[0]):
    result.append([value for i, value in group])
print(result)
[[1], [3, 4, 5]]
What did I do?
# at first I enumerate every item of list:
print(list(enumerate(lst)))
[(0, 1), (1, 3), (2, 4), (3, 5)]
# Then I subtract the index of each item from the item itself:
print([ value - i for i, value in enumerate(lst)])
[1, 2, 2, 2]
# As you see, consecutive numbers turn out to have the same difference between index and value
# We can use this feature and group the list by the difference of value minus index
print(list( groupby(enumerate(lst), key = lambda pair: pair[1] - pair[0]) ))
[(1, <itertools._grouper object at 0x104bff050>), (2, <itertools._grouper object at 0x104bff410>)]
# Now you can see how it works. Now I just want to add how to write this in one logical line:
result = [ [value for i, value in group]
           for key, group in groupby(enumerate(lst), key = lambda pair: pair[1] - pair[0])]
print(result)
[[1], [3, 4, 5]]
Approach for identifying consecutive multiples of n
Let's have a look at this list,
lst = [1,5,10,11,12,13,14,15,17,21,24,26,28,30]
especially at the differences between neighbor elements and the differences of differences of three consecutive elements:
1, 5, 10, 11, 12, 13, 14, 15, 17, 21, 24, 26, 28, 30
4, 5, 1, 1, 1, 1, 1, 2, 4, 3, 2, 2, 2
1, -4, 0, 0, 0, 0, 1, 2, -1, -1, 0, 0
We see that there are zeros in the third row whenever there are consecutive multiples in the first row. Mathematically, the 2nd derivative of a function's linear sections is also zero. So let's use this property...
The "2nd derivative" of a list lst can be calculated like this
lst[i+2]-2*lst[i+1]+lst[i]
Note that this definition of the second order difference "looks" two indexes ahead.
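As a quick check of that formula, the following sketch recomputes the third row of the table above. The helper name second_differences is hypothetical.

```python
def second_differences(lst):
    """Second-order difference at each index i, looking two items ahead."""
    return [lst[i + 2] - 2 * lst[i + 1] + lst[i] for i in range(len(lst) - 2)]


lst = [1, 5, 10, 11, 12, 13, 14, 15, 17, 21, 24, 26, 28, 30]
print(second_differences(lst))
```

Each zero at index i means that lst[i], lst[i+1] and lst[i+2] are evenly spaced.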
Now here is the code detecting the consecutive multiples:
from itertools import groupby
# We have to keep track of the indexes in the list, that have already been used
available_indexes = set(range(len(lst)))
for second_order_diff, grouper in groupby(range(len(lst)-2), key = lambda i: lst[i+2]-2*lst[i+1]+lst[i]):
    # store all not-consumed indexes in a list
    grp_indexes = [i for i in grouper if i in available_indexes]
    if grp_indexes and second_order_diff == 0:
        # There are consecutive multiples
        min_index, max_index = grp_indexes[0], grp_indexes[-1] + 2
        print("Group from", lst[min_index], "to", lst[max_index], "by", lst[min_index+1]-lst[min_index])
        available_indexes -= set(range(min_index, max_index+1))
    else:
        # The not "consumed" indexes in this group are not consecutive
        for i in grp_indexes:
            print(lst[i])
            available_indexes.discard(i)
# The last two elements could be lost without the following two lines
for i in sorted(available_indexes):
    print(lst[i])
Output:
1
5
Group from 10 to 15 by 1
17
21
Group from 24 to 30 by 2

Detecting consecutive integers in a list [duplicate]

I have a list containing data as such:
[1, 2, 3, 4, 7, 8, 10, 11, 12, 13, 14]
I'd like to print out the ranges of consecutive integers:
1-4, 7-8, 10-14
Is there a built-in/fast/efficient way of doing this?
From the docs:
>>> from itertools import groupby
>>> from operator import itemgetter
>>> data = [ 1, 4,5,6, 10, 15,16,17,18, 22, 25,26,27,28]
>>> for k, g in groupby(enumerate(data), lambda ix: ix[0] - ix[1]):
...     print(list(map(itemgetter(1), g)))
...
[1]
[4, 5, 6]
[10]
[15, 16, 17, 18]
[22]
[25, 26, 27, 28]
You can adapt this fairly easily to get a printed set of ranges.
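One possible adaptation, as a sketch: keep only the first and last value of each group and join them into the requested text. The function name consecutive_ranges is an assumption for illustration, and the input is assumed to be sorted integers without duplicates.

```python
from itertools import groupby


def consecutive_ranges(data):
    """Return (first, last) for each run of consecutive integers."""
    ranges = []
    # index - value is constant within a run of consecutive integers
    for _, group in groupby(enumerate(data), key=lambda pair: pair[0] - pair[1]):
        run = [value for _, value in group]
        ranges.append((run[0], run[-1]))
    return ranges


data = [1, 2, 3, 4, 7, 8, 10, 11, 12, 13, 14]
print(", ".join(f"{first}-{last}" for first, last in consecutive_ranges(data)))
```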
A short solution that works without additional imports. It accepts any iterable, sorts unsorted inputs, and removes duplicate items:
def ranges(nums):
    nums = sorted(set(nums))
    gaps = [[s, e] for s, e in zip(nums, nums[1:]) if s+1 < e]
    edges = iter(nums[:1] + sum(gaps, []) + nums[-1:])
    return list(zip(edges, edges))
Example:
>>> ranges([2, 3, 4, 7, 8, 9, 15])
[(2, 4), (7, 9), (15, 15)]
>>> ranges([-1, 0, 1, 2, 3, 12, 13, 15, 100])
[(-1, 3), (12, 13), (15, 15), (100, 100)]
>>> ranges(range(100))
[(0, 99)]
>>> ranges([0])
[(0, 0)]
>>> ranges([])
[]
This is the same as @dansalmo's solution which I found amazing, albeit a bit hard to read and apply (as it's not given as a function).
Note that it could easily be modified to spit out "traditional" open ranges [start, end), by e.g. altering the return statement:
return [(s, e+1) for s, e in zip(edges, edges)]
This will print exactly as you specified:
>>> nums = [1, 2, 3, 4, 7, 8, 10, 11, 12, 13, 14]
>>> ranges = sum((list(t) for t in zip(nums, nums[1:]) if t[0]+1 != t[1]), [])
>>> iranges = iter(nums[0:1] + ranges + nums[-1:])
>>> print(', '.join([str(n) + '-' + str(next(iranges)) for n in iranges]))
1-4, 7-8, 10-14
If the list has any single number ranges, they would be shown as n-n:
>>> nums = [1, 2, 3, 4, 5, 7, 8, 9, 12, 15, 16, 17, 18]
>>> ranges = sum((list(t) for t in zip(nums, nums[1:]) if t[0]+1 != t[1]), [])
>>> iranges = iter(nums[0:1] + ranges + nums[-1:])
>>> print(', '.join([str(n) + '-' + str(next(iranges)) for n in iranges]))
1-5, 7-9, 12-12, 15-18
Built-In: No, as far as I'm aware.
You have to run through the array. Start off with putting the first value in a variable and print it, then as long as you keep hitting the next number do nothing but remember the last number in another variable. If the next number is not in line, check the last number remembered versus the first number. If it's the same, do nothing. If it's different, print "-" and the last number. Then put the current value in the first variable and start over.
At the end of the array you run the same routine as if you had hit a number out of line.
I could have written the code, of course, but I don't want to spoil your homework :-)
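For readers who are past the homework stage, the routine described above could look roughly like the sketch below. The function name format_ranges is made up for this illustration, the input is assumed to be a sorted list of distinct integers, and the pieces are collected into a string instead of being printed one by one.

```python
def format_ranges(nums):
    """Format sorted distinct integers as e.g. '1-4, 7-8, 10-14'."""
    if not nums:
        return ""
    parts = []
    first = last = nums[0]
    for n in nums[1:]:
        if n == last + 1:
            last = n          # still in line: just remember the last number
            continue
        # out of line: emit the finished range, then start over
        parts.append(str(first) if first == last else f"{first}-{last}")
        first = last = n
    # end of the list: same routine as for a number out of line
    parts.append(str(first) if first == last else f"{first}-{last}")
    return ", ".join(parts)


print(format_ranges([1, 2, 3, 4, 7, 8, 10, 11, 12, 13, 14]))
```

Unlike the zip-based one-liner above, a run of a single number is shown as n rather than n-n.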
I had a similar problem and am using the following for a sorted list. It returns a dictionary that maps each run of consecutive numbers to its [first, last] values. The keys separate the runs and are also the running total of breaks in the sequence seen before each run.
Your list gives me an output of {0: [1, 4], 1: [7, 8], 2: [10, 14]}
def series_dictf(index_list):
    from collections import defaultdict
    series_dict = defaultdict(list)
    sequence_dict = dict()
    list_len = len(index_list)
    series_interrupts = 0
    for i in range(list_len):
        if i == (list_len - 1):
            break
        position_a = index_list[i]
        position_b = index_list[i + 1]
        if position_b == (position_a + 1):
            sequence_dict[position_a] = (series_interrupts)
            sequence_dict[position_b] = (series_interrupts)
        if position_b != (position_a + 1):
            series_interrupts += 1
    for position, series in sequence_dict.items():
        series_dict[series].append(position)
    for series, position in series_dict.items():
        series_dict[series] = [position[0], position[-1]]
    return series_dict
Using set operations, the following algorithm can be used:
def get_consecutive_integer_series(integer_list):
    integer_list = sorted(integer_list)
    start_item = integer_list[0]
    end_item = integer_list[-1]
    a = set(integer_list) # Set a
    b = range(start_item, end_item+1)
    # Pick items of the full range that are not in the input list.
    c = set(b) - a # Set operation b-a
    li = []
    start = 0
    for i in sorted(c):
        end = b.index(i) # Get end point of the list slicing
        li.append(b[start:end]) # Slice list using values
        start = end + 1 # Increment the start point for next slicing
    li.append(b[start:]) # Add the last series
    for sliced_list in li:
        if not sliced_list:
            # list is empty
            continue
        if len(sliced_list) == 1:
            # If only one item found in list
            yield sliced_list[0]
        else:
            yield "{0}-{1}".format(sliced_list[0], sliced_list[-1])
a = [1, 2, 3, 6, 7, 8, 4, 14, 15, 21]
for series in get_consecutive_integer_series(a):
    print(series)
Output for the above list "a"
1-4
6-8
14-15
21
Here is another basic solution that does not use any modules, which is useful for interviews, where you are generally asked to solve this without imports:
#!/usr/bin/python
def split_list(n):
    """Return the value just after each run ends (x+1 for every gap)."""
    return [(x+1) for x,y in zip(n, n[1:]) if y-x != 1]
def get_sub_list(my_list):
    """Split the list into runs using the boundaries from split_list."""
    my_index = split_list(my_list)
    output = list()
    prev = 0
    for index in my_index:
        new_list = [ x for x in my_list[prev:] if x < index]
        output.append(new_list)
        prev += len(new_list)
    output.append([ x for x in my_list[prev:]])
    return output
my_list = [1, 3, 4, 7, 8, 10, 11, 13, 14]
print(get_sub_list(my_list))
Output:
[[1], [3, 4], [7, 8], [10, 11], [13, 14]]
You can use the collections library, which has a class called Counter. Counter comes in handy when you want to count the occurrences of each distinct element in any iterable.
from collections import Counter
data = [ 1, 4,5,6, 10, 15,16,17,18, 22, 25,26,27,28]
cnt=Counter(data)
print(cnt)
the output for this looks like
Counter({1: 1, 4: 1, 5: 1, 6: 1, 10: 1, 15: 1, 16: 1, 17: 1, 18: 1, 22: 1, 25: 1, 26: 1, 27: 1, 28: 1})
which just like any other dictionary can be polled for key values
