Going through a list and checking in Python

Going through a list and checking in Python - python

I'm currently looking for an algorithm to be able to go through a list such as the following one: [1,1,1,1,2,3,4,5,5,5,3,2]
I want, in this example, to be able to select the first "1" as there's a duplicate next to it, and keep going through the list until finding the next number having a duplicate next to it, and then select the last number of this one (ie. "5" in this example).
Eventually, make the difference between these 2 numbers (ie. 5-1)
I have this code at the moment:
i=0
for i in range(len(X)):
if (X[i] == X[i+1]):
first_number = X[i]
elif (X[i] != X[i+1]):
i+=1
I'd like to add a further condition to my question. Suppose you have the following list: lst=[1,1,1,1,2,3,4,5,5,5,3,3,3,3,2,2,2,4,3] In this case, I'll get the following differences according to your code = lst = [4,-2,-1] and then stops. However, I'd like "4-2" to be added to the list afterwards because "4" is followed by a number less than "4" (thus, going to the opposite direction - up - of what "2" followed "4" were following). I hope this is clear enough. Many thanks

You can use enumerate with a starting index of 1. Duplicates are detected if the current value is equal to the value at the previous index:
l = [1,1,1,1,2,3,4,5,5,5,3,2]
r = [v for i, v in enumerate(l, 1) if i < len(l) and v == l[i]]
result = r[-1] - r[0]
# 4
The list r is a list of all duplicates. r[-1] is the last item and r[0] is the first.
More trials:
>>> l= [1,1,5,5,5,2,2]
>>> r = [v for i, v in enumerate(l, 1) if i < len(l) and v == l[i]]
>>> r[-1] - r[0]
1

Solution:
def subDupeLimits( aList ):
dupList = []
prevX = None
for x in aList:
if x == prevX:
dupList.append(x) # track duplicates
prevX = x # update previous x
# return last duplicate minus first
return dupList[-1] - dupList[0]
# call it
y = subDupeLimits( [1,1,1,1,2,3,4,5,5,5,3,2] )
# y = 4

You can use itertools.groupby to find groups of repeating numbers, then find the difference between the first two of those:
>>> import itertools
>>> lst = [1,1,1,1,2,3,4,5,5,5,3,2]
>>> duplicates = [k for k, g in itertools.groupby(lst) if len(list(g)) > 1]
>>> duplicates[1] - duplicates[0]
4
Or use duplicates[-1] - duplicates[0] if you want the difference between the first and the last repeated number.
In the more general case, if you want the difference between all pairs of consecutive repeated numbers, you could combine that with zip:
>>> lst = [1,1,1,1,2,3,4,5,5,5,3,3,3,3,2,2,2]
>>> duplicates = [k for k, g in itertools.groupby(lst) if len(list(g)) > 1]
>>> duplicates
[1, 5, 3, 2]
>>> [x - y for x,y in zip(duplicates, duplicates[1:])]
[-4, 2, 1]
I think now I got what you want: You want the difference between any consecutive "plateaus" in the list, where a plateau is either a repeated value, or a local minimum or maximum. This is a bit more complicated and will take several steps:
>>> lst=[1,1,1,1,2,3,4,5,5,5,3,3,3,3,2,2,2,4,3]
>>> plateaus = [lst[i] for i in range(1, len(lst)-1) if lst[i] == lst[i-1]
... or lst[i-1] <= lst[i] >= lst[i+1]
... or lst[i-1] >= lst[i] <= lst[i+1]]
>>> condensed = [k for k, g in itertools.groupby(plateaus)]
>>> [y-x for x, y in zip(condensed, condensed[1:])]
[4, -2, -1, 2]

Related

What is the most efficient way of getting the intersection of k sorted arrays?

Given k sorted arrays what is the most efficient way of getting the intersection of these lists
Example
INPUT:
[[1,3,5,7], [1,1,3,5,7], [1,4,7,9]]
Output:
[1,7]
There is a way to get the union of k sorted arrays based on what I read in the Elements of programming interviews book in nlogk time. I was wondering if there is a way to do something similar for the intersection as well
## merge sorted arrays in nlogk time [ regular appending and merging is nlogn time ]
import heapq
def mergeArys(srtd_arys):
heap = []
srtd_iters = [iter(x) for x in srtd_arys]
# put the first element from each srtd array onto the heap
for idx, it in enumerate(srtd_iters):
elem = next(it, None)
if elem:
heapq.heappush(heap, (elem, idx))
res = []
# collect results in nlogK time
while heap:
elem, ary = heapq.heappop(heap)
it = srtd_iters[ary]
res.append(elem)
nxt = next(it, None)
if nxt:
heapq.heappush(heap, (nxt, ary))
EDIT: obviously this is an algorithm question that I am trying to solve so I cannot use any of the inbuilt functions like set intersection etc

Exploiting sort order
Here is a single pass O(n) approach that doesn't require any special data structures or auxiliary memory beyond the fundamental requirement of one iterator per input.
from itertools import cycle, islice
def intersection(inputs):
"Yield the intersection of elements from multiple sorted inputs."
# intersection(['ABBCD', 'BBDE', 'BBBDDE']) --> B B D
n = len(inputs)
iters = cycle(map(iter, inputs))
try:
candidate = next(next(iters))
while True:
for it in islice(iters, n-1):
while (value := next(it)) < candidate:
pass
if value != candidate:
candidate = value
break
else:
yield candidate
candidate = next(next(iters))
except StopIteration:
return
Here's a sample session:
>>> data = [[1,3,5,7], [1,1,3,5,7], [1,4,7,9]]
>>> list(intersection(data))
[1, 7]
>>> data = [[1,1,2,3], [1,1,4,4]]
>>> list(intersection(data))
[1, 1]
Algorithm in words
The algorithm starts by selecting the next value from the next iterator to be a candidate.
The main loop assumes a candidate has been selected and it loops over the next n - 1 iterators. For each of those iterators, it consumes values until it finds a value that is a least as large as the candidate. If that value is larger than the candidate, that value becomes the new candidate and the main loop starts again. If all n - 1 values are equal to the candidate, then the candidate is emitted and a new candidate is fetched.
When any input iterator is exhausted, the algorithm is complete.
Doing it without libraries (core language only)
The same algorithm works fine (though less beautifully) without using itertools. Just replace cycle and islice with their list based equivalents:
def intersection(inputs):
"Yield the intersection of elements from multiple sorted inputs."
# intersection(['ABBCD', 'BBDE', 'BBBDDE']) --> B B D
n = len(inputs)
iters = list(map(iter, inputs))
curr_iter = 0
try:
it = iters[curr_iter]
curr_iter = (curr_iter + 1) % n
candidate = next(it)
while True:
for i in range(n - 1):
it = iters[curr_iter]
curr_iter = (curr_iter + 1) % n
while (value := next(it)) < candidate:
pass
if value != candidate:
candidate = value
break
else:
yield candidate
it = iters[curr_iter]
curr_iter = (curr_iter + 1) % n
candidate = next(it)
except StopIteration:
return

Yes, it is possible! I've modified your example code to do this.
My answer assumes that your question is about the algorithm - if you want the fastest-running code using sets, see other answers.
This maintains the O(n log(k)) time complexity: all the code between if lowest != elem or ary != times_seen: and unbench_all = False is O(log(k)). There is a nested loop inside the main loop (for unbenched in range(times_seen):) but this only runs times_seen times, and times_seen is initially 0 and is reset to 0 after every time this inner loop is run, and can only be incremented once per main loop iteration, so the inner loop cannot do more iterations in total than the main loop. Thus, since the code inside the inner loop is O(log(k)) and runs at most as many times as the outer loop, and the outer loop is O(log(k)) and runs n times, the algorithm is O(n log(k)).
This algorithm relies upon how tuples are compared in Python. It compares the first items of the tuples, and if they are equal it, compares the second items (i.e. (x, a) < (x, b) is true if and only if a < b).
In this algorithm, unlike in the example code in the question, when an item is popped from the heap, it is not necessarily pushed again in the same iteration. Since we need to check if all sub-lists contain the same number, after a number is popped from the heap, it's sublist is what I call "benched", meaning that it is not added back to the heap. This is because we need to check if other sub-lists contain the same item, so adding this sub-list's next item is not needed right now.
If a number is indeed in all sub-lists, then the heap will look something like [(2,0),(2,1),(2,2),(2,3)], with all the first elements of the tuples the same, so heappop will select the one with the lowest sub-list index. This means that first index 0 will be popped and times_seen will be incremented to 1, then index 1 will be popped and times_seen will be incremented to 2 - if ary is not equal to times_seen then the number is not in the intersection of all sub-lists. This leads to the condition if lowest != elem or ary != times_seen:, which decides when a number shouldn't be in the result. The else branch of this if statement is for when it still might be in the result.
The unbench_all boolean is for when all sub-lists need to be removed from the bench - this could be because:
The current number is known to not be in the intersection of the sub-lists
It is known to be in the intersection of the sub-lists
When unbench_all is True, all the sub-lists that were removed from the heap are re-added. It is known that these are the ones with indices in range(times_seen) since the algorithm removes items from the heap only if they have the same number, so they must have been removed in order of index, contiguously and starting from index 0, and there must be times_seen of them. This means that we don't need to store the indices of the benched sub-lists, only the number that have been benched.
import heapq
def mergeArys(srtd_arys):
heap = []
srtd_iters = [iter(x) for x in srtd_arys]
# put the first element from each srtd array onto the heap
for idx, it in enumerate(srtd_iters):
elem = next(it, None)
if elem:
heapq.heappush(heap, (elem, idx))
res = []
# the number of tims that the current number has been seen
times_seen = 0
# the lowest number from the heap - currently checking if the first numbers in all sub-lists are equal to this
lowest = heap[0][0] if heap else None
# collect results in nlogK time
while heap:
elem, ary = heap[0]
unbench_all = True
if lowest != elem or ary != times_seen:
if lowest == elem:
heapq.heappop(heap)
it = srtd_iters[ary]
nxt = next(it, None)
if nxt:
heapq.heappush(heap, (nxt, ary))
else:
heapq.heappop(heap)
times_seen += 1
if times_seen == len(srtd_arys):
res.append(elem)
else:
unbench_all = False
if unbench_all:
for unbenched in range(times_seen):
unbenched_it = srtd_iters[unbenched]
nxt = next(unbenched_it, None)
if nxt:
heapq.heappush(heap, (nxt, unbenched))
times_seen = 0
if heap:
lowest = heap[0][0]
return res
if __name__ == '__main__':
a1 = [[1, 3, 5, 7], [1, 1, 3, 5, 7], [1, 4, 7, 9]]
a2 = [[1, 1], [1, 1, 2, 2, 3]]
for arys in [a1, a2]:
print(mergeArys(arys))
An equivalent algorithm can be written like this, if you prefer:
def mergeArys(srtd_arys):
heap = []
srtd_iters = [iter(x) for x in srtd_arys]
# put the first element from each srtd array onto the heap
for idx, it in enumerate(srtd_iters):
elem = next(it, None)
if elem:
heapq.heappush(heap, (elem, idx))
res = []
# collect results in nlogK time
while heap:
elem, ary = heap[0]
lowest = elem
keep_elem = True
for i in range(len(srtd_arys)):
elem, ary = heap[0]
if lowest != elem or ary != i:
if ary != i:
heapq.heappop(heap)
it = srtd_iters[ary]
nxt = next(it, None)
if nxt:
heapq.heappush(heap, (nxt, ary))
keep_elem = False
i -= 1
break
heapq.heappop(heap)
if keep_elem:
res.append(elem)
for unbenched in range(i+1):
unbenched_it = srtd_iters[unbenched]
nxt = next(unbenched_it, None)
if nxt:
heapq.heappush(heap, (nxt, unbenched))
if len(heap) < len(srtd_arys):
heap = []
return res

You can use builtin sets and sets intersections :
d = [[1,3,5,7],[1,1,3,5,7],[1,4,7,9]]
result = set(d[0]).intersection(*d[1:])
{1, 7}

You can use reduce:
from functools import reduce
a = [[1,3,5,7],[1,1,3,5,7],[1,4,7,9]]
reduce(lambda x, y: x & set(y), a[1:], set(a[0]))
{1, 7}

I've come up with this algorithm. It doesn't exceed O(nk) I don't know if it's good enough for you. the point of this algorithm is that you can have k indexes for each array and each iteration you find the indexes of the next element in the intersection and increase every index until you exceed the bounds of an array and there are no more items in the intersection. the trick is since the arrays are sorted you can look at two elements in two different arrays and if one is bigger than the other you can instantly throw away the other because you know you cant have a smaller number than the one you are looking at. the worst case of this algorithm is that every index will be increased to the bound which takes kn time since an index cannot decrease its value.
inter = []
for n in range(len(arrays[0])):
if indexes[0] >= len(arrays[0]):
return inter
for i in range(1,k):
if indexes[i] >= len(arrays[i]):
return inter
while indexes[i] < len(arrays[i]) and arrays[i][indexes[i]] < arrays[0][indexes[0]]:
indexes[i] += 1
while indexes[i] < len(arrays[i]) and indexes[0] < len(arrays[0]) and arrays[i][indexes[i]] > arrays[0][indexes[0]]:
indexes[0] += 1
if indexes[0] < len(arrays[0]):
inter.append(arrays[0][indexes[0]])
indexes = [idx+1 for idx in indexes]
return inter

You said we can't use sets but how about dicts / hash tables? (yes I know they're basically the same thing) :D
If so, here's a fairly simple approach (please excuse the py2 syntax):
arrays = [[1,3,5,7],[1,1,3,5,7],[1,4,7,9]]
counts = {}
for ar in arrays:
last = None
for i in ar:
if (i != last):
counts[i] = counts.get(i, 0) + 1
last = i
N = len(arrays)
intersection = [i for i, n in counts.iteritems() if n == N]
print intersection

Same as Raymond Hettinger's solution but with more basic python code:
def intersection(arrays, unique: bool=False):
result = []
if not len(arrays) or any(not len(array) for array in arrays):
return result
pointers = [0] * len(arrays)
target = arrays[0][0]
start_step = 0
current_step = 1
while True:
idx = current_step % len(arrays)
array = arrays[idx]
while pointers[idx] < len(array) and array[pointers[idx]] < target:
pointers[idx] += 1
if pointers[idx] < len(array) and array[pointers[idx]] > target:
target = array[pointers[idx]]
start_step = current_step
current_step += 1
continue
if unique:
while (
pointers[idx] + 1 < len(array)
and array[pointers[idx]] == array[pointers[idx] + 1]
):
pointers[idx] += 1
if (current_step - start_step) == len(arrays):
result.append(target)
for other_idx, other_array in enumerate(arrays):
pointers[other_idx] += 1
if pointers[idx] < len(array):
target = array[pointers[idx]]
start_step = current_step
if pointers[idx] == len(array):
return result
current_step += 1

Here's an O(n) answer (where n = sum(len(sublist) for sublist in data)).
from itertools import cycle
def intersection(data):
result = []
maxval = float("-inf")
consecutive = 0
try:
for sublist in cycle(iter(sublist) for sublist in data):
value = next(sublist)
while value < maxval:
value = next(sublist)
if value > maxval:
maxval = value
consecutive = 0
continue
consecutive += 1
if consecutive >= len(data)-1:
result.append(maxval)
consecutive = 0
except StopIteration:
return result
print(intersection([[1,3,5,7], [1,1,3,5,7], [1,4,7,9]]))
[1, 7]

Some of the above methods are not covering the examples when there are duplicates in every subset of the list. The Below code implements this intersection and it will be more efficient if there are lots of duplicates in the subset of the list :) If not sure about duplicates it is recommended to use Counter from collections from collections import Counter. The custom counter function is made for increasing the efficiency of handling large duplicates. But still can not beat Raymond Hettinger's implementation.
def counter(my_list):
my_list = sorted(my_list)
first_val, *all_val = my_list
p_index = my_list.index(first_val)
my_counter = {}
for item in all_val:
c_index = my_list.index(item)
diff = abs(c_index-p_index)
p_index = c_index
my_counter[first_val] = diff
first_val = item
c_index = my_list.index(item)
diff = len(my_list) - c_index
my_counter[first_val] = diff
return my_counter
def my_func(data):
if not data or not isinstance(data, list):
return
# get the first value
first_val, *all_val = data
if not isinstance(first_val, list):
return
# count items in first value
p = counter(first_val) # counter({1: 2, 3: 1, 5: 1, 7: 1})
# collect all common items and calculate the minimum occurance in intersection
for val in all_val:
# collecting common items
c = counter(val)
# calculate the minimum occurance in intersection
inner_dict = {}
for inner_val in set(c).intersection(set(p)):
inner_dict[inner_val] = min(p[inner_val], c[inner_val])
p = inner_dict
# >>>p
# {1: 2, 7: 1}
# Sort by keys of counter
sorted_items = sorted(p.items(), key=lambda x:x[0]) # [(1, 2), (7, 1)]
result=[i[0] for i in sorted_items for _ in range(i[1])] # [1, 1, 7]
return result
Here are the sample Examples
>>> data = [[1,3,5,7],[1,1,3,5,7],[1,4,7,9]]
>>> my_func(data=data)
[1, 7]
>>> data = [[1,1,3,5,7],[1,1,3,5,7],[1,1,4,7,9]]
>>> my_func(data=data)
[1, 1, 7]

You can do the following using the functions heapq.merge, chain.from_iterable and groupby
from heapq import merge
from itertools import groupby, chain
ls = [[1, 3, 5, 7], [1, 1, 3, 5, 7], [1, 4, 7, 9]]
def index_groups(lst):
"""[1, 1, 3, 5, 7] -> [(1, 0), (1, 1), (3, 0), (5, 0), (7, 0)]"""
return chain.from_iterable(((e, i) for i, e in enumerate(group)) for k, group in groupby(lst))
iterables = (index_groups(li) for li in ls)
flat = merge(*iterables)
res = [k for (k, _), g in groupby(flat) if sum(1 for _ in g) == len(ls)]
print(res)
Output
[1, 7]
The idea is to give an extra value (using enumerate) to differentiate between equal values within the same list (see the function index_groups).
The complexity of this algorithm is O(n) where n is the sum of the lengths of each list in the input.
Note that the output for (an extra 1 en each list):
ls = [[1, 1, 3, 5, 7], [1, 1, 3, 5, 7], [1, 1, 4, 7, 9]]
is:
[1, 1, 7]

You can use bit-masking with one-hot encoding. The inner lists become maxterms. You and them together for the intersection and or them for the union. Then you have to convert back, for which I've used a bit hack.
problem = [[1,3,5,7],[1,1,3,5,8,7],[1,4,7,9]];
debruijn = [0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9];
u32 = accum = (1 << 32) - 1;
for vec in problem:
maxterm = 0;
for v in vec:
maxterm |= 1 << v;
accum &= maxterm;
# https://graphics.stanford.edu/~seander/bithacks.html#IntegerLogDeBruijn
result = [];
while accum:
power = accum;
accum &= accum - 1; # Peter Wegner CACM 3 (1960), 322
power &= ~accum;
result.append(debruijn[((power * 0x077CB531) & u32) >> 27]);
print result;
This uses (simulates) 32-bit integers, so you can only have [0, 31] in your sets.
*I am inexperienced at Python, so I timed it. One should definitely use set.intersection.

Here is the single-pass counting algorithm, a simplified version of what others have suggested.
def intersection(iterables):
target, count = None, 0
for it in itertools.cycle(map(iter, iterables)):
for value in it:
if count == 0 or value > target:
target, count = value, 1
break
if value == target:
count += 1
break
else: # exhausted iterator
return
if count >= len(iterables):
yield target
count = 0
Binary and exponential search haven't come up yet. They're easily recreated even with the "no builtins" constraint.
In practice, that would be much faster, and sub-linear. In the worst case - where the intersection isn't shrinking - the naive approach would repeat work. But there's a solution for that: integrate the binary search while splitting the arrays in half.
def intersection(seqs):
seq = min(seqs, key=len)
if not seq:
return
pivot = seq[len(seq) // 2]
lows, counts, highs = [], [], []
for seq in seqs:
start = bisect.bisect_left(seq, pivot)
stop = bisect.bisect_right(seq, pivot, start)
lows.append(seq[:start])
counts.append(stop - start)
highs.append(seq[stop:])
yield from intersection(lows)
yield from itertools.repeat(pivot, min(counts))
yield from intersection(highs)
Both handle duplicates. Both guarantee O(N) worst-case time (counting slicing as atomic). The latter will approach O(min_size) speed; by always splitting the smallest in half it essentially can't suffer from the bad luck of uneven splits.

I couldn't help but notice that this is seems to be a variation on the Welfare Crook problem; see David Gries's book, The Science of Programming. Edsger Dijkstra also wrote an EWD about this, see Ascending Functions and the Welfare Crook.
The Welfare Crook
Suppose we have three long magnetic tapes, each containing a list of names in alphabetical order:
all people working for IBM Yorktown
students at Columbia University
people on welfare in New York City
Practically speaking, all three lists are endless, so no upper bounds are given. It is know that at least one person is on all three lists. Write a program to locate the first such person.
Our intersection of the ordered lists problem is a generalization of the Welfare Crook problem.
Here's a (rather primitive?) Python solution to the Welfare Crook problem:
def find_welfare_crook(f, g, h, i, j, k):
"""f, g, and h are "ascending functions," i.e.,
i <= j implies f[i] <= f[j] or, equivalently,
f[i] < f[j] implies i < j, and the same goes for g and h.
i, j, k define where to start the search in each list.
"""
# This is an implementation of a solution to the Welfare Crook
# problems presented in David Gries's book, The Science of Programming.
# The surprising and beautiful thing is that the guard predicates are
# so few and so simple.
i , j , k = i , j , k
while True:
if f[i] < g[j]:
i += 1
elif g[j] < h[k]:
j += 1
elif h[k] < f[i]:
k += 1
else:
break
return (i,j,k)
# The other remarkable thing is how the negation of the guard
# predicates works out to be: f[i] == g[j] and g[j] == c[k].
Generalization to Intersection of K Lists
This generalizes to K lists, and here's what I devised; I don't know how Pythonic this is, but it pretty compact:
def findIntersectionLofL(lofl):
"""Generalized findIntersection function which operates on a "list of lists." """
K = len(lofl)
indices = [0 for i in range(K)]
result = []
#
try:
while True:
# idea is to maintain the indices via a construct like the following:
allEqual = True
for i in range(K):
if lofl[i][indices[i]] < lofl[(i+1)%K][indices[(i+1)%K]] :
indices[i] += 1
allEqual = False
# When the above iteration finishes, if all of the list
# items indexed by the indices are equal, then another
# item common to all of the lists must be added to the result.
if allEqual :
result.append(lofl[0][indices[0]])
while lofl[0][indices[0]] == lofl[1][indices[1]]:
indices[0] += 1
except IndexError as e:
# Eventually, the foregoing iteration will advance one of the
# indices past the end of one of the lists, and when that happens
# an IndexError exception will be raised. This means the algorithm
# is finished.
return result
This solution does not keep repeated items. Changing the program to include all of the repeated items by changing what the program does in the conditional at the end of the "while True" loop is an exercise left to the reader.
Improved Performance
Comments from #greybeard prompted refinements shown below, in the
pre-computation of the "array index moduli" (the "(i+1)%K" expressions) and further investigation also brought about changes to the inner iteration's structure, to further remove overhead:
def findIntersectionLofLunRolled(lofl):
"""Generalized findIntersection function which operates on a "list of lists."
Accepts a list-of-lists, lofl. Each of the lists must be ordered.
Returns the list of each element which appears in all of the lists at least once.
"""
K = len(lofl)
indices = [0] * K
result = []
lt = [ (i, (i+1) % K) for i in range(K) ] # avoids evaluation of index exprs inside the loop
#
try:
while True:
allUnEqual = True
while allUnEqual:
allUnEqual = False
for i,j in lt:
if lofl[i][indices[i]] < lofl[j][indices[j]]:
indices[i] += 1
allUnEqual = True
# Now all of the lofl[i][indices[i]], for all i, are the same value.
# Store that value in the result, and then advance all of the indices
# past that common value:
v = lofl[0][indices[0]]
result.append(v)
for i,j in lt:
while lofl[i][indices[i]] == v:
indices[i] += 1
except IndexError as e:
# Eventually, the foregoing iteration will advance one of the
# indices past the end of one of the lists, and when that happens
# an IndexError exception will be raised. This means the algorithm
# is finished.
return result

Accessing a tuple in enumerate?

I want to iterate through a list and sum all the elements. Except, if the number is a 5, I want to skip the number following that 5. So for example:
x=[1,2,3,4,5,6,7,5,4,3] #should results in 30.
I'm just not sure how I can access the index of a tuple, when I use enumerate. What I want to do, is use an if statement, that if the number at the previous index == 5, continue the loop.
Thanks you

The itertools documentation has a recipe for this called pairwise. You can either copy-paste the function or import it from more_itertools (which needs to be installed).
Demo:
>>> from more_itertools import pairwise
>>>
>>> x = [1,2,3,4,5,6,7,5,4,3]
>>> x[0] + sum(m for n, m in pairwise(x) if n != 5)
30
edit:
But what if my datastructure is iterable, but does not support indexing?
In this case, the above solution needs a minor modification.
>>> from itertools import tee
>>> from more_itertools import pairwise
>>>
>>> x = (n for n in [1,2,3,4,5,6,7,5,4,3]) # generator, no indices!
>>> it1, it2 = tee(x)
>>> next(it1, 0) + sum(m for n, m in pairwise(it2) if n != 5)
30

Not a fan of bug-ridden one-liners that get upvoted.
So here's the answer with a for-loop.
x=[1,2,3,4,5,6,7,5,4,3, 5] #should results in 35.
s = 0
for i, v in enumerate(x):
if i != 0 and x[i-1] == 5:
continue
s += v
print(s)

Using sum with enumerate
Ex:
x=[1,2,3,4,5,6,7,5,4,3]
print(sum(v for i, v in enumerate(x) if (i == 0) or (x[i-1] != 5)))
Output:
30

Simple, verbose way:
SKIP_PREV = 5
x = [1,2,3,4,5,6,7,5,4,3]
prev = -1
s = 0
for num in x:
if prev != SKIP_PREV:
s += num
prev = num
print(s)
# 30
Compact, maybe less clear way:
SKIP_PREV = 5
x = [1,2,3,4,5,6,7,5,4,3]
s = sum(num for i, num in enumerate(x) if i == 0 or x[i - 1] != SKIP_PREV)
print(s)
# 30

If you are happy to use a 3rd party library, you can use NumPy with integer indexing:
import numpy as np
x = np.array([1,2,3,4,5,6,7,5,4,3])
res = x.sum() - x[np.where(x == 5)[0]+1].sum() # 30
See also What are the advantages of NumPy over regular Python lists?

You can pair the list with a shifted version of itself. This should work:
sum(val for (prev, val)
in zip(itertools.chain((None,), x), x)
if prev != 5 )

The longest code so far. Anyway, it does not need enumerate, it is a simple FSM.
x = [1,2,3,4,5,6,7,5,4,3]
skip = False
s = 0
for v in x:
if skip:
skip = False
else:
s += v
skip = v == 5
print(s)

Finding the subsequence that starts and ends with the same number with the maximum sum

I have to make a program that takes as input a list of numbers and returns the sum of the subsequence that starts and ends with the same number which has the maximum sum (including the equal numbers in the beginning and end of the subsequence in the sum). It also has to return the placement of the start and end of the subsequence, that is, their index+1. The problem is that my current code runs smoothly only while the length of list is not that long. When the list length extends to 5000 the program does not give an answer.
The input is the following:
6
3 2 4 3 5 6
The first line is for the length of the list. The second line is the list itself, with list items separated by space. The output will be 12, 1, 4, because as you can see there is 1 equal pair of numbers (3): the first and fourth element, so the sum of elements between them is 3 + 2 + 4 + 3 = 12, and their placement is first and fourth.
Here is my code.
length = int(input())
mass = raw_input().split()
for i in range(length):
mass[i]=int(mass[i])
value=-10000000000
b = 1
e = 1
for i in range(len(mass)):
if mass[i:].count(mass[i])!=1:
for j in range(i,len(mass)):
if mass[j]==mass[i]:
f = mass[i:j+1]
if sum(f)>value:
value = sum(f)
b = i+1
e = j+1
else:
if mass[i]>value:
value = mass[i]
b = i+1
e = i+1
print value
print b,e

This should be faster than your current approach.
Rather than searching through mass looking for pairs of matching numbers we pair each number in mass with its index and sort those pairs. We can then use groupby to find groups of equal numbers. If there are more than 2 of the same number we use the first and last, since they will have the greatest sum between them.
from operator import itemgetter
from itertools import groupby
raw = '3 5 6 3 5 4'
mass = [int(u) for u in raw.split()]
result = []
a = sorted((u, i) for i, u in enumerate(mass))
for _, g in groupby(a, itemgetter(0)):
g = list(g)
if len(g) > 1:
u, v = g[0][1], g[-1][1]
result.append((sum(mass[u:v+1]), u+1, v+1))
print(max(result))
output
(19, 2, 5)
Note that this code will not necessarily give the maximum sum between equal elements in the list if the list contains negative numbers. It will still work correctly with negative numbers if no group of equal numbers has more than two members. If that's not the case, we need to use a slower algorithm that tests every pair within a group of equal numbers.
Here's a more efficient version. Instead of using the sum function we build a list of the cumulative sums of the whole list. This doesn't make much of a difference for small lists, but it's much faster when the list size is large. Eg, for a list of 10,000 elements this approach is about 10 times faster. To test it, I create an array of random positive integers.
from operator import itemgetter
from itertools import groupby
from random import seed, randrange
seed(42)
def maxsum(seq):
total = 0
sums = [0]
for u in seq:
total += u
sums.append(total)
result = []
a = sorted((u, i) for i, u in enumerate(seq))
for _, g in groupby(a, itemgetter(0)):
g = list(g)
if len(g) > 1:
u, v = g[0][1], g[-1][1]
result.append((sums[v+1] - sums[u], u+1, v+1))
return max(result)
num = 25000
hi = num // 2
mass = [randrange(1, hi) for _ in range(num)]
print(maxsum(mass))
output
(155821402, 21, 24831)
If you're using a recent version of Python you can use itertools.accumulate to build the list of cumulative sums. This is around 10% faster.
from itertools import accumulate
def maxsum(seq):
sums = [0] + list(accumulate(seq))
result = []
a = sorted((u, i) for i, u in enumerate(seq))
for _, g in groupby(a, itemgetter(0)):
g = list(g)
if len(g) > 1:
u, v = g[0][1], g[-1][1]
result.append((sums[v+1] - sums[u], u+1, v+1))
return max(result)
Here's a faster version, derived from code by Stefan Pochmann, which uses a dict, instead of sorting & groupby. Thanks, Stefan!
def maxsum(seq):
total = 0
sums = [0]
for u in seq:
total += u
sums.append(total)
where = {}
for i, x in enumerate(seq, 1):
where.setdefault(x, [i, i])[1] = i
return max((sums[j] - sums[i - 1], i, j)
for i, j in where.values())
If the list contains no duplicate items (and hence no subsequences bound by duplicate items) it returns the maximum item in the list.
Here are two more variations. These can handle negative items correctly, and if there are no duplicate items they return None. In Python 3 that could be handled elegantly by passing default=None to max, but that option isn't available in Python 2, so instead I catch the ValueError exception that's raised when you attempt to find the max of an empty iterable.
The first version, maxsum_combo, uses itertools.combinations to generate all combinations of a group of equal numbers and thence finds the combination that gives the maximum sum. The second version, maxsum_kadane uses a variation of Kadane's algorithm to find the maximum subsequence within a group.
If there aren't many duplicates in the original sequence, so the average group size is small, maxsum_combo is generally faster. But if the groups are large, then maxsum_kadane is much faster than maxsum_combo. The code below tests these functions on random sequences of 15000 items, firstly on sequences with few duplicates (and hence small mean group size) and then on sequences with lots of duplicates. It verifies that both versions give the same results, and then it performs timeit tests.
from __future__ import print_function
from itertools import groupby, combinations
from random import seed, randrange
from timeit import Timer
seed(42)
def maxsum_combo(seq):
total = 0
sums = [0]
for u in seq:
total += u
sums.append(total)
where = {}
for i, x in enumerate(seq, 1):
where.setdefault(x, []).append(i)
try:
return max((sums[j] - sums[i - 1], i, j)
for v in where.values() for i, j in combinations(v, 2))
except ValueError:
return None
def maxsum_kadane(seq):
total = 0
sums = [0]
for u in seq:
total += u
sums.append(total)
where = {}
for i, x in enumerate(seq, 1):
where.setdefault(x, []).append(i)
try:
return max(max_sublist([(sums[j] - sums[i-1], i, j)
for i, j in zip(v, v[1:])], k)
for k, v in where.items() if len(v) > 1)
except ValueError:
return None
# Kadane's Algorithm to find maximum sublist
# From https://en.wikipedia.org/wiki/Maximum_subarray_problem
def max_sublist(seq, k):
max_ending_here = max_so_far = seq[0]
for x in seq[1:]:
y = max_ending_here[0] + x[0] - k, max_ending_here[1], x[2]
max_ending_here = max(x, y)
max_so_far = max(max_so_far, max_ending_here)
return max_so_far
def test(num, hi, loops):
print('\nnum = {0}, hi = {1}, loops = {2}'.format(num, hi, loops))
print('Verifying...')
for k in range(5):
mass = [randrange(-hi // 2, hi) for _ in range(num)]
a = maxsum_combo(mass)
b = maxsum_kadane(mass)
print(a, b, a==b)
print('\nTiming...')
for func in maxsum_combo, maxsum_kadane:
t = Timer(lambda: func(mass))
result = sorted(t.repeat(3, loops))
result = ', '.join([format(u, '.5f') for u in result])
print('{0:14} : {1}'.format(func.__name__, result))
loops = 20
num = 15000
hi = num // 4
test(num, hi, loops)
loops = 10
hi = num // 100
test(num, hi, loops)
output
num = 15000, hi = 3750, loops = 20
Verifying...
(13983131, 44, 14940) (13983131, 44, 14940) True
(13928837, 27, 14985) (13928837, 27, 14985) True
(14057416, 40, 14995) (14057416, 40, 14995) True
(13997395, 65, 14996) (13997395, 65, 14996) True
(14050007, 12, 14972) (14050007, 12, 14972) True
Timing...
maxsum_combo : 1.72903, 1.73780, 1.81138
maxsum_kadane : 2.17738, 2.22108, 2.22394
num = 15000, hi = 150, loops = 10
Verifying...
(553789, 21, 14996) (553789, 21, 14996) True
(550174, 1, 14992) (550174, 1, 14992) True
(551017, 13, 14991) (551017, 13, 14991) True
(554317, 2, 14986) (554317, 2, 14986) True
(558663, 15, 14988) (558663, 15, 14988) True
Timing...
maxsum_combo : 7.29226, 7.34213, 7.36688
maxsum_kadane : 1.07532, 1.07695, 1.10525
This code runs on both Python 2 and Python 3. The above results were generated on an old 32 bit 2GHz machine running Python 2.6.6 on a Debian derivative of Linux. The speeds for Python 3.6.0 are similar.
If you want to include groups that consist of a single non-repeated number, and also want to include the numbers that are in groups as a "subsequence" of length 1, you can use this version:
def maxsum_kadane(seq):
if not seq:
return None
total = 0
sums = [0]
for u in seq:
total += u
sums.append(total)
where = {}
for i, x in enumerate(seq, 1):
where.setdefault(x, []).append(i)
# Find the maximum of the single items
m_single = max((k, v[0], v[0]) for k, v in where.items())
# Find the maximum of the subsequences
try:
m_subseq = max(max_sublist([(sums[j] - sums[i-1], i, j)
for i, j in zip(v, v[1:])], k)
for k, v in where.items() if len(v) > 1)
return max(m_single, m_subseq)
except ValueError:
# No subsequences
return m_single
I haven't tested it extensively, but it should work. ;)

Finding if the next element is smaller than the one before it and deleting it from the list python

I am having trouble with my code, I am writing a method that will check if the next element is smaller than the previous element and if it is, it will delete it.
Example:
Input: [1, 20, 10, 30]
Desired output: [1,20,30]
Actual output: [30]
def findSmaller(s):
i = -1
y = []
while i <= len(s):
for j in range(len(s)):
if s[i+1] <= s[i]:
del s[i + 1]
y.append(s[i])
i += 1
return y

If you are uncertain about how your loops work I recommend adding in some print statements. That way you can see what your loop is actually doing, especially in more complicated problems this is useful.
Something like this would solve your problem.
a = [1,2,3,2,4]
for k in range(0,len(a)-2): #-2 so that one don't go past the loops length
#print(k)
y = a
if(a[k]>a[k+1]):
del y[k+1] #delete the k+1 element if it is

>>> s = [5, 20, 10, 15, 30]
>>> max_so_far = s[0]
>>> result = []
>>> for x in s:
if x >= max_so_far:
result.append(x)
max_so_far = x
>>> result
[5, 20, 30]

Depending whether you need to do some calculation later with the list you can use a generator
s = [1, 20, 10, 30]
def find_smaller_generator(l: list):
last_item = None
for item in l:
if last_item is None or item >= last_item:
last_item = item
yield item
def find_smaller_list(l: list):
return list(find_smaller_generator(l))
print(find_smaller_list(s))
for i in find_smaller_generator(s):
print(i)
print([i**2 for i in find_smaller_generator(s)])
this returns:
[1, 20, 30]
1
20
30
[1, 400, 900]

You can try something like this
def findSmaller(s):
# sets p (previous) as the first value in s
p = s[0]
# initializes y to be an array and sets the first value to p
y = [p]
# iterates over s, starting with the second element
for i in s[1::]:
# checks if i is greater than or equal to the previous element
if i >= p:
# if it is, i is appended to the list y
y.append(i)
# also set the previous value to i, so the next iteration can check against p
p = i
#returns the list
return y
What this does is iterate over s and checks if the current item in the list is greater than or equal to the previous element in the list. If it is then it appends it to y, and y is returned.
Try out the code here.

Python for() loop with math operators

Say I have a list of numbers such as:
my_list = [1, 17, 2]
And I wanted to add those together. I know I can use print(sum(my_list)). However I wanted to see if there was another way of doing so, so I tried the following:
b = len(my_list)
for m in range(my_list[0], my_list[b-1]):
m += m
print(m)
I am sure something like this should work, but I am obviously doing it wrong. The output of this is 2. After I tried:
result = 0
b = len(my_list)
for m in range(my_list[0], my_list[b-1]):
result = result + m
print(result)
This outputs 1.
Please explain what I am doing wrong and how I can correct it.

Since you are using range function defining range between 1 and 2. The only data generated in m is 1 hence result is 1.
In Python, you can iterate over the elements of a sequence directly:
m = [1, 17, 2]
res = 0
for i in m:
res += i
print res

First, you should put a correct range: 0..2 in your case (since your list items' indexes starts from 0 and has 2 items)
for i in range(0, b):
result = result + my_list[i];
Or if you prefer "for each" style you should itterate by list you are summing:
for m in my_list:
result = result + m;
Finally if you want to print a final sum only you should correct print indent:
for m in my_list:
result = result + m;
print(result) # <- mind indent
Wrapping up:
my_list = [1, 17, 2]
result = 0
for m in my_list:
result = result + m;
print(result)

from operator import add
my_list = [1, 17, 2]
result=reduce(add, my_list)

import functools
print(functools.reduce(lambda x,y: x+y, my_list))

try this
my_list = [1, 17, 2]
reduce(lambda x, y: x+y, my_list)

to get the values from my_list you can use this syntax:
for m in my_list:
print m
If you use range it will give you a range from 1 ( first value of your list ) to 2 (length of your list -1)
To add the values of your list you can try this code:
out = 0
for m in my_list:
out = out + m
print(out)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Going through a list and checking in Python - python

Solution: def subDupeLimits( aList ): dupList = [] prevX = None for x in aList: if x == prevX: dupList.append(x) # track duplicates prevX = x # update previous x # return last duplicate minus first return dupList[-1] - dupList[0] # call it y = subDupeLimits( [1,1,1,1,2,3,4,5,5,5,3,2] ) # y = 4

Related

What is the most efficient way of getting the intersection of k sorted arrays?

Accessing a tuple in enumerate?

Finding the subsequence that starts and ends with the same number with the maximum sum

Finding if the next element is smaller than the one before it and deleting it from the list python

Python for() loop with math operators

Categories

Resources