How to Compare sublists - python

I'm trying to find the smallest number out of a list made up of sublists.
The output for the program should go like this:
least([[2,4,3],[1,7,9,4]])
in [[2,4,3],[1,7,9,4]] the least number is 1 found in sublist [1,7,9,4]
So far I have code that finds the smallest number in a list and code that prints out sublists, but I don't know how to combine them; that's really my issue.
# finds smallest number in list
def test(list1):
    x = list1[0]
    for i in list1:
        if i < x:
            x = i
    print(x)

# prints out sublists
def test2(num):
    for x in num:
        for y in x:
            print(y, end=" ")
        print("")
Does the body of "test" go before or after the line
for y in x:

Python has a built-in min function. But I guess it's a good learning exercise to write your own.
We can write a function to find the sublist containing the minimum element by creating a modified version of your test function.
The key idea is to find the minimum of each sublist, and when we find a new minimum we store the sublist that that minimum came from.
In the code below I've changed the function name from test to minimum to make it more meaningful.
def minimum(list1):
    ''' Finds smallest item in list1 '''
    x = list1[0]
    for i in list1:
        if i < x:
            x = i
    return x

def least(list2d):
    minseq = list2d[0]
    x = minimum(minseq)
    for seq in list2d[1:]:
        i = minimum(seq)
        if i < x:
            x = i
            minseq = seq
    print('In {} the least number is {} found in sublist {}'.format(list2d, x, minseq))

# Test
data = [[2, 4, 3], [1, 7, 9, 4], [6, 7, 5]]
least(data)
Output:
In [[2, 4, 3], [1, 7, 9, 4], [6, 7, 5]] the least number is 1 found in sublist [1, 7, 9, 4]
However, we can write this more compactly by using the built-in min function to find the sublist containing the smallest item for us. The trick is to pass min itself as the key function: the key finds the minimum item in each sublist, and the outer min then uses those minima to decide which sublist is the minimal one.
def least(list2d):
    minseq = min(list2d, key=min)
    x = min(minseq)
    print('In {} the least number is {} found in sublist {}'.format(list2d, x, minseq))
This version is slightly inefficient since it computes the minimum of the sublist with the smallest item twice. To avoid that we can pass min a generator expression:
def least(list2d):
    x, minseq = min((min(seq), seq) for seq in list2d)
    print('In {} the least number is {} found in sublist {}'.format(list2d, x, minseq))
That generator expression creates a tuple of each sublist's minimum and the sublist itself; those tuples are then passed to the outer min call to find the tuple containing the smallest minimum. If two or more tuples tie for the minimum, the sublists themselves are compared to decide on the winner.
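A consequence of that tie-breaking rule is that, on a tie, the generator-expression version returns whichever tied sublist compares smallest as a list, not necessarily the first one. A minimal sketch of a variant that pairs each minimum with the sublist's index instead, so ties go to the earliest sublist and the sublists are never compared; the name least_with_index is hypothetical:

```python
def least_with_index(list2d):
    # Pair each sublist's minimum with its position. On a tie the smaller
    # index wins, so the sublists themselves are never compared.
    x, index = min((min(seq), k) for k, seq in enumerate(list2d))
    return x, index, list2d[index]

data = [[2, 4, 3], [1, 7, 9, 4], [6, 7, 5]]
x, index, minseq = least_with_index(data)
print('In {} the least number is {} found in sublist {} (index {})'.format(data, x, minseq, index))
```

For [[1, 5], [1, 2]] this returns the first sublist, [1, 5], whereas the tuple-of-sublists version would pick [1, 2].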

Assuming that the sublists are only one level deep, you can do this by keeping one variable with the least value seen so far and another with the sublist that contained it:
lists = [[2, 4, 3], [1, 7, 9, 4]]
min_list = lists[0]          # start with the first sublist...
min_value = min(lists[0])    # ...and its smallest item
for sublist in lists[1:]:
    min_ = min(sublist)
    if min_ < min_value:
        min_value = min_
        min_list = sublist
print("the least number is {} found in sublist {}".format(min_value, min_list))

Short and simple:
>>> lst = [[2, 4, 3], [1, 7, 9, 4]]
>>> min_value_in_lst = min(min(sublist) for sublist in lst)
>>> min_value_in_lst
1
If you also want to know which sublist that minimum came from, loop over the sublists until you find it:
for sublist in lst:
    if min_value_in_lst in sublist:
        print(sublist)
        break
The full function:
def least(lst):
    min_value_in_lst = min(min(sublist) for sublist in lst)
    that_sublist = None
    for sublist in lst:
        if min_value_in_lst in sublist:
            that_sublist = sublist
            break
    # print out the result
    print('In {} the least number is {} found in sublist {}'.format(lst, min_value_in_lst, that_sublist))

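If you want a shorter way, try this simple code (note that it only finds the value, not the sublist it came from):
from functools import reduce  # reduce is no longer a builtin in Python 3
a = [[2,4,3],[1,7,9,4]]
min(reduce(lambda x, y: x + y, a))
Output:
1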
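A side note on the reduce version: each x + y builds a brand-new list, so the accumulated list is copied again at every step, which gets slow with many sublists. A sketch using itertools.chain.from_iterable, which walks the sublists lazily without building a flattened copy; like the reduce version, it returns only the value:

```python
from itertools import chain

a = [[2, 4, 3], [1, 7, 9, 4]]
# chain.from_iterable yields the items of each sublist in turn,
# without building an intermediate flattened list.
smallest = min(chain.from_iterable(a))
print(smallest)
```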

Related

How to compare each value from list with each value from another list with specific condition?

I have two lists with values: list1 = [3,4,5,6,7,8] and list2 = [8,9,10,12,14,16,18,20].
I want to compare each value from list1 with each value from list2, and if their sum is even, I want to add the value from list2 to another list, list3. HOWEVER, I don't want too many repeated values in list3. So once a value from list2 has been added to list3, I don't want it to be used again, UNLESS there is no other option.
So in this example the output would be [9, 8, 8, 10, 9, 12]. As you can see, 8 was added again only because no other value from list2 gave an even sum with 5. The same goes for 7 from list1, but since 8 from list2 had already been used twice, 9 is used instead.
How could I do that? Is there a better solution to get the desired output?
If your condition is only that the sum is even, then, as @jl-peyret suggested, you can partition the list by parity and efficiently build the resulting list in O(len(list1) + len(list2)) time.
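A sketch of that parity partition, assuming the fallback rule from the question: when no unused value of the right parity is left, repeat the least-used value so far, with ties going to the value that appears first in list2. Apart from that fallback scan, each list is walked once. The name pair_even_sums is hypothetical:

```python
from collections import Counter

def pair_even_sums(list1, list2):
    # An even sum needs two numbers of the same parity, so split list2 by parity.
    by_parity = {0: [n for n in list2 if n % 2 == 0],
                 1: [n for n in list2 if n % 2 == 1]}
    next_unused = {0: 0, 1: 0}   # index of the next unused value per parity
    counts = Counter()           # how often each value has been placed in list3
    list3 = []
    for num1 in list1:
        p = num1 % 2
        if next_unused[p] < len(by_parity[p]):
            val = by_parity[p][next_unused[p]]
            next_unused[p] += 1
        else:
            # Fallback (assumed rule): least-used value so far, first in list2 order.
            # If nothing has been used yet, fall back to list2 itself.
            used = [n for n in list2 if counts[n] > 0] or list2
            val = min(used, key=lambda n: counts[n])
        counts[val] += 1
        list3.append(val)
    return list3

print(pair_even_sums([3, 4, 5, 6, 7, 8], [8, 9, 10, 12, 14, 16, 18, 20]))
```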
If the condition is something generic, then here's a generic method to generate the result -
def condition(x, y):
    # sum is even
    return not bool((x + y) % 2)

def get_default(list2, current_default, used):
    # get next default value, in case none of the values match condition
    # TODO: for OP to implement
    return list2[0], current_default

list1 = [3,4,5,6,7,8]
list2 = [8,9,10,12,14,16,18,20]
list3 = []
used = [False] * len(list2) # for keeping track of used items in list2
current_default = [0, 0] # index, and count of times the value is used
for i, num1 in enumerate(list1):
    # check if condition is true for any item in list2 which is not used yet
    for j, num2 in enumerate(list2):
        if not used[j] and condition(num1, num2):
            list3.append(num2)
            used[j] = True
            break
    # if none of the items in list2 match with the condition, use default logic
    if len(list3) < i + 1:
        val, current_default = get_default(list2, current_default, used)
        list3.append(val)
print(list3)
This prints:
[9, 8, 8, 10, 8, 12]
Currently, get_default() uses list2[0] as the default value when no item in list2 matches the given condition. I will leave implementing get_default() to you; given your rule that a value can be used at most twice, it should be straightforward with the help of the current_default tracking pointer.
For a generic condition, I don't think there is a way to do this more efficiently than O(n^2), but as a further optimisation you could use a linked list instead of a list for list2, removing used values so that the search space for the condition matching shrinks.
To get the expected output listed in the question, the following code iterates through the first list checking each number against the numbers in the second list until an even sum is found that has not yet been added to the third list. Once found, it is added, but if a repetition becomes necessary, then a counter dictionary is used to add the first number from the second list which has been repeated in the third list the least.
from collections import Counter

def listComp(list1, list2):
    list3 = []
    for list1NUM in list1:
        # values in list2 that give an even sum with list1NUM
        preList = []
        for list2NUM in list2:
            if (list1NUM + list2NUM) % 2 == 0:
                preList.append(list2NUM)
        postList = []
        for prelistNum in preList:
            if prelistNum not in list3:
                list3.append(prelistNum)
                break
            else:
                postList.append(prelistNum)
        # every candidate was already used: repeat the least-used value,
        # taking the first such value in list2 order to break ties
        if len(preList) == len(postList):
            itemCount = Counter(list3)
            list3.append(min((n for n in list2 if n in itemCount), key=itemCount.get))
    return list3
list1 = [3,4,5,6,7,8]
list2 = [8,9,10,12,14,16,18,20]
list3 = listComp(list1, list2)
print (list3)
The preceding code produces the following output:
[9, 8, 8, 10, 9, 12]

Updating the range of list in function argument

Problem to solve: Define a Python function remdup(l) that takes a non-empty list of integers l
and removes all duplicates in l, keeping only the last occurrence of each number. For instance,
remdup([3,1,3,5]) should give the result [1,3,5].
def remdup(l):
    for last in reversed(l):
        pos=l.index(last)
        for search in reversed(l[pos]):
            if search==last:
                l.remove(search)
    print(l)
remdup([3,5,7,5,3,7,10])
# intended output [5, 3, 7, 10]
In the for loop on line 4, I want reversed() to go over every number except the one at index pos, but the way I wrote it above, it takes the value at pos, not the index. How can I solve this?
You need to reverse the entire slice, not merely one element:
for search in reversed(l[:pos]):
Note that you will likely run into problems because you are modifying the list while iterating over it.
It took me a few minutes to figure out the clunky logic. Instead, you need the rest of the list after pos:
for search in reversed(l[pos+1:]):
Output:
[5, 3, 7, 10]
Your original algorithm could be improved. The nested loop leads to some unnecessary complexity.
Alternatively, you can do this:
def remdup(l):
    seen = set()
    for i in reversed(l):
        if i in seen:
            l.remove(i)
        else:
            seen.add(i)
    print(l)
I use the 'seen' set to keep track of the numbers that have already appeared.
However, this would be more efficient:
def remdup(l):
    seen = set()
    for i in range(len(l)-1, -1, -1):
        if l[i] in seen:
            del l[i]
        else:
            seen.add(l[i])
    print(l)
In the second algorithm, we iterate over the list in reverse order using a range and delete any item that is already in 'seen'. Both versions are still quadratic in the worst case, because deleting from a list shifts the items after it, but list.remove() also has to search from the front of the list for the value, whereas del l[i] goes straight to the index. It is also easier to see exactly what is happening in the second algorithm, so I would say that it is the safer option.
This is a fairly inefficient way of accomplishing this:
def remdup(l):
    i = 0
    while i < len(l):
        v = l[i]
        removed = False
        scan = i + 1
        while scan < len(l):
            if l[scan] == v:
                l.remove(v)   # drops the earliest remaining copy of v
                scan -= 1     # later items shifted left by one
                removed = True
            scan += 1
        if not removed:
            i += 1            # otherwise the next element has shifted into position i
l = [3,5,7,5,3,7,10]
remdup(l)
print(l)
It essentially walks through the list (indexed by i). For each element, it scans forward in the list for a match, and for each match it finds, it removes the earliest remaining copy of that value. Since removing an element shifts the later indices left, it adjusts scan after every removal, and it only advances i when nothing was removed (otherwise a new element has already moved into position i).
It takes advantage of the built-in list.remove: "Remove the first item from the list whose value is equal to x."
Here is another solution, iterating backward and popping the index of a previously encountered item:
def remdup(l):
    visited = []
    for i in range(len(l)-1, -1, -1):
        if l[i] in visited:
            l.pop(i)
        else:
            visited.append(l[i])
    print(l)
remdup([3,5,7,5,3,7,10])
#[5, 3, 7, 10]
Using dictionary:
def remdup(ar):
    d = {}
    for i, v in enumerate(ar):
        d[v] = i
    return [pair[0] for pair in sorted(d.items(), key=lambda x: x[1])]

if __name__ == "__main__":
    test_case = [3, 1, 3, 5]
    output = remdup(test_case)
    expected_output = [1, 3, 5]
    assert output == expected_output, f"Error in {test_case}"
    test_case = [3, 5, 7, 5, 3, 7, 10]
    output = remdup(test_case)
    expected_output = [5, 3, 7, 10]
    assert output == expected_output, f"Error in {test_case}"
Explanation
Keep the last index of each number in a dictionary, i.e. we store d[number] = last_occurrence.
Sort the dictionary by values and use list comprehension to make a new list from the keys of the dictionary.
In addition to the other correct answers, here's one more, using the third-party iteration_utilities package:
from iteration_utilities import duplicates
list1 = [3,5,7,5,3,7,10]
# duplicates() yields a value once for every repeat occurrence
# (materialised first, because list1 is modified below)
for dup in list(duplicates(list1)):
    list1.remove(dup)   # drop the earliest remaining copy, keeping the last
print(list1)
How about this one-liner? (Converting it to a function is left as an easy exercise.)
# one-liner version; relies on dicts preserving insertion order (Python 3.7+)
lst = [3,5,7,5,3,7,10]
list(dict.fromkeys(reversed(lst)))[::-1]
# [5, 3, 7, 10]
If you don't want a new list, you can modify lst in place instead:
lst[:] = list(dict.fromkeys(reversed(lst)))[::-1]
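Wrapping the one-liner in a function is short. This sketch modifies the list in place via slice assignment, matching the problem's wording that remdup "removes all duplicates in l", and it relies on dicts preserving insertion order (guaranteed from Python 3.7):

```python
def remdup(l):
    # Walk the list backwards so the first key dict.fromkeys sees for each
    # value is its LAST occurrence in l, then restore the original order.
    l[:] = list(dict.fromkeys(reversed(l)))[::-1]
    return l

nums = [3, 5, 7, 5, 3, 7, 10]
remdup(nums)
print(nums)
```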

Find the permutations that sum to the three smallest numbers

I asked the same thing yesterday but had a hard time finding the right words to describe my problem, so I deleted it. But here it is again.
Let us say that we have 3 lists:
list1 = [1, 2]
list2 = [2, 3]
list3 = [1]
Let us say I want to find the 3 permutations of these lists whose elements, added together, give the smallest sums possible. Here, the permutations we want would be:
1,2,1
2,2,1
1,3,1
because these three have the smallest sums (4, 5 and 5).
2,3,1
will not be part of the solution, since its sum (6) is larger than those of the other three.
Of course, using itertools to list all the permutations and adding up the numbers in each one would be the most obvious solution, but I was wondering if there is a more efficient algorithm, since it should be able to handle 1000 lists.
NOTE: If the number of lists is N, then I need to find N permutations. Thus, if there are 3 lists, I find the 3 smallest permutations.
PRECONDITIONS:
- All of the lists are sorted from smallest to largest.
- The number of elements in all the lists is 2N-1, to deal with the case where only one list has more than 1 element.
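For reference, the itertools approach mentioned in the question can be written with heapq.nsmallest over itertools.product. It enumerates every combination (the product of the list lengths), so it is only practical for small inputs, but it is handy for checking a faster algorithm. The function name smallest_sum_combinations_bruteforce is hypothetical:

```python
import heapq
from itertools import product

def smallest_sum_combinations_bruteforce(lists, n=None):
    # Baseline only: product(*lists) yields every way of picking one item
    # per list, which grows exponentially with the number of lists.
    if n is None:
        n = len(lists)  # N lists -> N results, as in the question
    # nsmallest(..., key=sum) keeps the n picks with the smallest sums.
    return heapq.nsmallest(n, product(*lists), key=sum)

for combo in smallest_sum_combinations_bruteforce([[1, 2], [2, 3], [1]]):
    print(combo, sum(combo))
```

heapq.nsmallest is documented as equivalent to sorted(iterable, key=key)[:n], so ties keep the order in which product generates them.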
Since the lists are sorted, the smallest element in each list is the first one, and the sum of these gives us the "minimal sum permutation". Picking any element other than the first one is going to increase the sum.
We start off by calculating, for each list, the difference between element i and the first one. For example, for the lists [1, 3, 4, 8] and [3, 9, 12, 15], these differences would be [2, 3, 7] and [6, 9, 12] respectively. We keep them separate in cost_local, because they will be needed later on. In cost_global, however, we pool them all together; sorting that pool in ascending order gives the cheapest solutions in which every list but one contributes its minimal value. To keep track of which element from which list gives the next minimum sum, we store each difference together with the index of the list it comes from and the position of the element within that list.
However, this is not a complete approach. It is possible, for example, that taking the next value from two lists costs less than taking a later value from a single list. So we also have to search the products of the combinations for k = 2, 3, ..., N. Doing that naively would result in N**N complexity, but we can take some really good shortcuts.
From the partial solution above, we have a list of the minimal costs in order. Since we want only the first N minimal sums, we check the cost value of the Nth permutation (the threshold). When we search for a group of two next values, we can safely ignore any group whose sum exceeds the current threshold. And since the difference values within each list are in ascending order, once we cross the threshold we can exit the loop immediately. Similarly, if we haven't found any new combinations within the threshold for k = 2, it is pointless to look for k > 2. Since the smallest sum costs will most likely come from a single nonminimal value, or a few small ones (unless most lists have massive differences between sequential values), we are bound to exit these loops rather quickly. The code I came up with to achieve this is fairly ugly, but it effectively does the same as
for k in xrange(2, len(lists)):
    for comb in itertools.combinations(cost_lists, k):
        for group in itertools.product(*comb):
            if sum(g[0] for g in group) <= threshold:
                cost_global.append(group)
except that we exit the loops as soon as we can guarantee not to find any more results, lest we pointlessly sift through an innumerable number of combinations/products that are over the threshold.
def filter_cost(cost_lists, threshold):
    cost = [[i for i in ilist if i[0] <= threshold] for ilist in cost_lists]
    # the algorithm requires that we remove any lists that have become empty
    return [ilist for ilist in cost if ilist]

def _combi(cost_lists, k, start, depth, subtotal, threshold):
    if depth == k:
        for i in xrange(start, len(cost_lists)):
            for value in cost_lists[i]:
                if value[0] + subtotal > threshold:
                    break
                yield (value,)
    else:
        for i in xrange(start, len(cost_lists)):
            for value in cost_lists[i]:
                if value[0] + subtotal > threshold:
                    break
                for c in _combi(cost_lists, k, i+1, depth+1,
                                value[0]+subtotal, threshold):
                    yield (value,) + c

def combinations_product(cost_lists, k, threshold):
    for i in xrange(len(cost_lists)-k+1):
        for value in cost_lists[i]:
            if value[0] > threshold:
                break
            for comb in _combi(cost_lists, k, i+1, 2, value[0], threshold):
                temp = (value,) + comb
                cost, ilists, ith_items = zip(*temp)
                yield sum(cost), ilists, ith_items

def find_smallest_sum_permutations(lists):
    N = len(lists)  # number of lists = number of permutations to return
    minima = [min(x) for x in lists]
    cost_local = []
    cost_global = []
    for i, ilist in enumerate(lists):
        if len(ilist) > 1:
            first = ilist[0]
            diff = [(num-first, i, j) for j, num in enumerate(ilist[1:], 1)]
            cost_local.append(diff)
            cost_global.extend(diff)
    cost_global.sort()
    threshold_index = len(lists) - 2
    cost_threshold = cost_global[threshold_index][0]
    cost_local = filter_cost(cost_local, cost_threshold)
    for k in xrange(2, len(lists)):
        group_combinations = tuple(combinations_product(cost_local, k,
                                                        cost_threshold))
        if group_combinations:
            cost_global.extend(group_combinations)
            cost_global.sort()
            cost_threshold = cost_global[threshold_index][0]
            cost_local = filter_cost(cost_local, cost_threshold)
        else:
            break
    permutations = [minima]
    for k in xrange(N-1):
        _, ilist, ith_item = cost_global[k]
        if type(ilist) == int:
            permutation = [minima[i]
                           if i != ilist else lists[ilist][ith_item]
                           for i in xrange(N)]
        else:
            # multiple nonminimal values combination
            mapping = dict(zip(ilist, ith_item))
            permutation = [minima[i]
                           if i not in mapping else lists[i][mapping[i]]
                           for i in xrange(N)]
        permutations.append(permutation)
    return permutations
Examples
Example in the question.
>>> lists = [
[1, 2],
[2, 3],
[1],
]
>>> for p in find_smallest_sum_permutations(lists):
... print p, sum(p)
[1, 2, 1] 4
[2, 2, 1] 5
[1, 3, 1] 5
Example I had generated with random lists.
>>> import random
>>> N = 5
>>> random.seed(1024)
>>> lists = [sorted(random.sample(range(10*N), 2*N-1)) for _ in xrange(N)]
>>> for p in find_smallest_sum_permutations(lists):
... print p, sum(p)
[4, 4, 1, 6, 0] 15
[4, 6, 1, 6, 0] 17
[4, 4, 3, 6, 0] 17
[4, 4, 1, 6, 4] 19
[4, 6, 3, 6, 0] 19
Example by user2357112 which had caught a glaring error in my previous iteration.
>>> lists = [
[1, 2, 30, 40],
[1, 2, 30, 40],
[10, 20, 30, 40],
[10, 20, 30, 40],
]
>>> for p in find_smallest_sum_permutations(lists):
... print p, sum(p)
[1, 1, 10, 10] 22
[2, 1, 10, 10] 23
[1, 2, 10, 10] 23
[2, 2, 10, 10] 24
The trick is to only generate the combinations that might possibly be needed, and store them in a heap. Each one that you pull out is the smallest one you have not yet seen. And the fact that THAT combination has been pulled out tells you that there are new ones which might also be small.
See https://docs.python.org/2/library/heapq.html for how to use a heap. We also need code for generating combinations. And with that, here is working code for getting the first n combinations for any list of lists:
import heapq
import itertools

# Helper class for storing combinations.
class ListSelector:
    def __init__(self, lists, indexes):
        self.lists = lists
        self.indexes = indexes

    def value(self):
        answer = 0
        for i in range(0, len(self.lists)):
            answer = answer + self.lists[i][self.indexes[i]]
        return answer

    def values(self):
        return [self.lists[i][self.indexes[i]] for i in range(0, len(self.lists))]

    # These are the next combinations. We are willing to increment any
    # leading 0, or the first non-zero value. This will provide one and
    # only one path to each possible combination.
    def next_selectors(self):
        lists = self.lists
        indexes = self.indexes
        selectors = []
        for i in range(0, len(lists)):
            if len(lists[i]) <= indexes[i] + 1:
                if 0 == indexes[i]:
                    continue
                else:
                    break
            new_indexes = [
                indexes[j] + (0 if j != i else 1)
                for j in range(0, len(lists))]
            selectors.append(ListSelector(lists, new_indexes))
            if 0 < indexes[i]:
                break
        return selectors

# This will just return an iterator over all combinations, from smallest
# to largest. It does NOT generate them until needed.
def combinations(lists):
    # The counter breaks ties between equal values, so heapq never has to
    # compare two ListSelector objects (which Python 3 cannot order).
    counter = itertools.count()
    sel = ListSelector(lists, [0 for _ in range(len(lists))])
    upcoming = [(sel.value(), next(counter), sel)]
    while len(upcoming):
        value, _, sel = heapq.heappop(upcoming)
        yield sel
        for next_sel in sel.next_selectors():
            heapq.heappush(upcoming, (next_sel.value(), next(counter), next_sel))

# This just gets the first n of them. (It returns fewer if there are fewer.)
def smallest_n_combinations(n, lists):
    i = 0
    for sel in combinations(lists):
        yield sel
        i = i + 1
        if i == n:
            break

# Example usage
lists = [
    [1, 2, 5],
    [2, 3, 4],
    [1]]
for sel in smallest_n_combinations(3, lists):
    print(sel.value(), sel.values(), sel.indexes)
(This could be made more efficient for a long list of lists with tricks like caching the value inside of ListSelector and calculating it incrementally for new ones.)
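The caching idea in that last remark can be sketched without the helper class: each heap entry carries its running total next to a tuple of indexes, and a child's total is derived from its parent's with one subtraction and one addition instead of re-summing every list. Using plain tuples also means ties are broken by comparing index tuples rather than objects. The function name smallest_n_sums is hypothetical:

```python
import heapq

def smallest_n_sums(n, lists):
    # Each heap entry is (total, indexes); the total is cached so it never
    # has to be recomputed from scratch.
    start = tuple(0 for _ in lists)
    heap = [(sum(l[0] for l in lists), start)]
    results = []
    while heap and len(results) < n:
        total, idx = heapq.heappop(heap)
        results.append((total, [lists[i][k] for i, k in enumerate(idx)]))
        # Same successor rule as next_selectors(): bump any leading zero index
        # or the first non-zero index, so each combination has exactly one parent.
        for i, k in enumerate(idx):
            if k + 1 < len(lists[i]):
                child = idx[:i] + (k + 1,) + idx[i + 1:]
                # Incremental update: swap one item's contribution for the next one's.
                heapq.heappush(heap, (total - lists[i][k] + lists[i][k + 1], child))
            if k > 0:
                break
    return results

for total, values in smallest_n_sums(3, [[1, 2, 5], [2, 3, 4], [1]]):
    print(total, values)
```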

Python list recursive changes

I have a bug in my attempt to recursively add a sequence of numbers to a list. E.g. if the input is [5,3,9], I do [5+1,3+2,9+3] and output [6,5,12]. I want to do this recursively, so I go through the list adding one to a smaller and smaller part of it, as below:
def add_position_recur(lst, number_from=0):
    length = len(lst)
    # base case
    if (length <= 1):
        lst = [x+1 for x in lst]
        print "last is", lst
    else:
        lst = [x+1 for x in lst]
        print "current list is", lst
        add_position_recur(lst[1:], number_from)
    return lst
The problem, though, is that all this does is add 1 to every element of the list. Where is the bug? Is it to do with the way I return the list in the base case?
When you recurse down your call stack, you slice lst, which creates a new list. That is not the list you return, so you only ever return the changes applied in the first call to the function and lose all the changes made further down the stack:
>>> add_position_recur([1,2,3])
[2, 3, 4]
This should have returned [2, 4, 6].
You need to consider reassembling the list on the way out to get the changes.
return [lst[0]] + add_position_recur(lst[1:], number_from)
and you need to return lst in your base case:
def add_position_recur(lst, number_from=0):
    length = len(lst)
    # base case
    if (length <= 1):
        lst = [x+1 for x in lst]
        return lst
    else:
        lst = [x+1 for x in lst]
        return [lst[0]] + add_position_recur(lst[1:], number_from)
>>> add_position_recur([1,2,3])
[2, 4, 6]
However, this is quite a complicated approach to this recursion. It is idiomatic for the base case to be the empty list, otherwise take the head and recurse down the tail. So something to consider which uses the number_from:
def add_position_recur(lst, number_from=1):
    if not lst:
        return lst
    return [lst[0]+number_from] + add_position_recur(lst[1:], number_from+1)
>>> add_position_recur([1,2,3])
[2, 4, 6]
This also has the advantage(?) of not changing the passed in lst
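Each lst[1:] slice copies the rest of the list, and so does each [head] + rest concatenation, so the slicing versions do quadratic work overall. A minimal sketch that recurses on an index and appends to one shared output list instead; it still recurses once per element, so Python's default recursion limit (1000 frames) applies. The name add_position_acc is hypothetical:

```python
def add_position_acc(lst, i=0, out=None):
    # Create a fresh output list on the first call; a mutable default
    # argument would be shared between calls.
    if out is None:
        out = []
    if i == len(lst):           # base case: every position handled
        return out
    out.append(lst[i] + i + 1)  # element at position i gets i + 1 added
    return add_position_acc(lst, i + 1, out)

print(add_position_acc([5, 3, 9]))
```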
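Why don't you instead do something like this:
def func(lon, after=None):
    # use None rather than [] as the default: a default list would be
    # shared between calls and keep growing
    if after is None:
        after = []
    if lon:
        v = len(lon) + lon[-1]   # the last element gets its 1-based position added
        after.append(v)
        func(lon[:-1], after)
    return after[::-1]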
The output of the function for the example you provided matches what you want.
Currently, you are simply adding 1 to each value of your list.
lst = [x+1 for x in lst]
Rather, you should add to each x a number that grows with its position in the list. enumerate gives you that position directly (lst.index(x) would return the position of the first equal value, which goes wrong when the list contains duplicates):
lst = [x + i for i, x in enumerate(lst, 1)]
This solution assumes that you want the number added to x to depend on its position relative to the start of the list, rather than on its position relative to the first element greater than 1. In other words, do you want to add 1 or 3 to the value 2 in the following list? The solution above adds 3.
lst = [0.5, 0.1, 2, 3]

In Python, how can I get the intersection of two lists, preserving the order of the intersection?

I have a list of lists ("sublists") and I want to see if the same sequence of any unspecified length occurs in more than one sublist. To clarify, the order of items must be preserved - I do not want the intersection of each sublist as a set. There must be at least 2 items that match sequentially. Please see example below.
Input:
someList = [[0,1,3,4,3,7,2],[2,3,4,3],[0,3,4,3,7,3]]
Desired Output: (will be printed to file but don't worry about this detail)
sublist0_sublist1 = [3,4,3] #intersection of 1st and 2nd sublists
sublist0_sublist2 = [3,4,3,7] #intersection of 1st and 3rd sublists
sublist1_sublist2 = [3,4,3] #intersection of 2nd and 3rd sublists
Whipped this up for you (including your comment that equal-length maximum sublists should all be returned in a list):
def sublists(list1, list2):
    subs = []
    for i in range(len(list1)-1):
        for j in range(len(list2)-1):
            if list1[i]==list2[j] and list1[i+1]==list2[j+1]:
                m = i+2
                n = j+2
                while m<len(list1) and n<len(list2) and list1[m]==list2[n]:
                    m += 1
                    n += 1
                subs.append(list1[i:m])
    return subs

def max_sublists(list1, list2):
    subls = sublists(list1, list2)
    if len(subls)==0:
        return []
    else:
        max_len = max(len(subl) for subl in subls)
        return [subl for subl in subls if len(subl)==max_len]
This works all right for these cases:
In [10]: max_sublists([0,1,3,4,3,7,2],[0,3,4,3,7,3])
Out[10]: [[3, 4, 3, 7]]
In [11]: max_sublists([0,1,2,3,0,1,3,5,2],[1,2,3,4,5,1,3,5,3,7,3])
Out[11]: [[1, 2, 3], [1, 3, 5]]
It's not pretty though, nor is it really fast.
You only have to figure out how to compare every sublist in your original list of sublists, but that should be easy.
[Edit: I fixed a bug and prevented your error from occurring.]
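Comparing every pair of sublists, as suggested above, can be done with itertools.combinations over enumerate(...). To keep the block self-contained, this sketch swaps in the standard longest-common-substring dynamic programming in place of the nested scan used by sublists(); like max_sublists, it returns every longest run of at least two items, with duplicates removed. The helper name longest_common_runs is hypothetical:

```python
from itertools import combinations

def longest_common_runs(a, b, min_len=2):
    # run[i][j] is the length of the common run ending at a[i-1] and b[j-1].
    run = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best, ends = 0, []
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                run[i][j] = run[i - 1][j - 1] + 1
                if run[i][j] > best:
                    best, ends = run[i][j], [i]
                elif run[i][j] == best:
                    ends.append(i)
    if best < min_len:
        return []
    runs = []
    for end in ends:
        r = a[end - best:end]
        if r not in runs:  # the same run can end at several (i, j) pairs
            runs.append(r)
    return runs

some_list = [[0, 1, 3, 4, 3, 7, 2], [2, 3, 4, 3], [0, 3, 4, 3, 7, 3]]
for (i, a), (j, b) in combinations(enumerate(some_list), 2):
    print("sublist{}_sublist{} = {}".format(i, j, longest_common_runs(a, b)))
```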
