Comparing two lists of lists to find common values in Python

I have two lists of lists. I want to iterate through them and compare the values in each bracket (sublist), bracket by bracket:
List_1
[[42, 43, 45, 48, 155, 157], [37, 330, 43, 47, 157], [258, 419, 39, 40, 330, 47], [419, 39, 44, 589, 599, 188].....
List_2
[[37, 330, 43, 47, 157], [258, 419, 39, 40, 330, 47], [419, 39, 44, 589, 599, 188], [41, 44, 526, 602, 379, 188]....
I need to compare the first bracket in List_1, [42, 43, 45, 48, 155, 157],
with the first bracket in List_2, [37, 330, 43, 47, 157].
The desired result is the numbers that appear in both brackets; for the first pair, that is 43 and 157.
Then I need to continue with the second bracket in List_1 and the second bracket in List_2, and so on.
The number of values in each bracket may vary.
Each bracket in one list must be compared with the corresponding bracket in the other list.
I don't need the results to be separate.
I'm at a very basic level, but I've tried a few different things, including set intersection and list matching. I'm sure there is a simple way, but I'm only just getting started.
set_x = set([i[1] for i in list_1])
print(set_x)
set_y = set([i[0] for i in list_2])
matches = set_x.intersection(set_y)
print(matches)
This gives an answer that is way off, {3, 8, 396, 12,}, and I can't really work out what it's doing.
I also tried this:
import itertools
common_elements = []
for i in list(itertools.product(coords_list_1, coords_list_2)):
    if i[0] == i[1]:
        common_elements.append(i[0])
print(common_elements)
but it produces a mass of results.
Thanks for your help!

Use zip and set's intersection:
for x, y in zip(List_1, List_2):
    print(set(x).intersection(y))
# {43, 157}
# {330, 47}
# {419, 39}
# {188, 44}

Your approach tackles the elements in the wrong "axis". For instance:
set_x = set([i[1] for i in list_1])
creates a set of the second element of each sublist.
In a case like this, you can forget about indexes entirely.
You just want to zip the sublists together and intersect each pair:
List_1 = [[42, 43, 45, 48, 155, 157], [37, 330, 43, 47, 157], [258, 419, 39, 40, 330, 47], [419, 39, 44, 589, 599, 188]]
List_2 = [[37, 330, 43, 47, 157], [258, 419, 39, 40, 330, 47], [419, 39, 44, 589, 599, 188], [41, 44, 526, 602, 379, 188]]
result = [set(x) & set(y) for x,y in zip(List_1,List_2)]
result:
>>> result
[{43, 157}, {330, 47}, {419, 39}, {188, 44}]
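The question also says the results do not need to stay separate. If that means one combined collection is wanted, the pairwise intersections can be merged. This is a minimal sketch under that reading; the helper name common_values and the shortened sample lists are illustrative, not from the original posts.

```python
list_1 = [[42, 43, 45, 48, 155, 157], [37, 330, 43, 47, 157]]
list_2 = [[37, 330, 43, 47, 157], [258, 419, 39, 40, 330, 47]]


def common_values(first, second):
    """Union of the intersections of corresponding sublists (hypothetical helper)."""
    merged = set()
    # zip pairs sublists by position and stops at the shorter outer list.
    for x, y in zip(first, second):
        merged |= set(x) & set(y)
    return merged


combined = sorted(common_values(list_1, list_2))
print(combined)  # [43, 47, 157, 330]
```

Sorting is only there to make the printed order predictable, since sets are unordered.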

Related

How do I match similar bounding boxes that are in two separate lists?

I have two lists of bounding boxes. One list is the expected location of the bounding boxes, and the second list is the value of the bounding boxes that are returned by an OCR program. The bounding box lists (below) are in the format of [Top, Left, Width, Height]
Expected_Boxes= [[96, 752, 784, 172],
[876, 754, 674, 174],
[1536, 756, 620, 170],
[2146, 754, 318, 176],
[1136, 960, 66, 70],
[1406, 928, 906, 112],
[184, 1076, 60, 56],
[442, 1192, 812, 132],
[1710, 1232, 62, 54],
[2012, 1228, 58, 58],
[176, 1332, 1062, 128],
[1302, 1334, 1128, 126],
[128, 1526, 950, 106],
[1098, 1532, 402, 98],
[1534, 1538, 450, 88],
[2010, 1512, 434, 110],
[804, 1680, 62, 62],
[992, 1684, 56, 60],
[742, 1816, 62, 60],
[1158, 1814, 64, 60],
[100, 1994, 776, 102],
[910, 1996, 748, 98],
[1728, 1994, 714, 96],
[1728, 1994, 714, 96],
[2218, 2302, 58, 62],
[2072, 2486, 60, 60],
[2218, 2486, 60, 62],
[56, 1430, 336, 66]]
OCR_Boxes = [[793, 1660, 248, 81],
[806, 223, 215, 85],
[812, 1009, 219, 67],
[812, 2248, 86, 53],
[947, 1563, 556, 80],
[970, 1143, 44, 44],
[1080, 188, 46, 46],
[1208, 651, 406, 82],
[1234, 2015, 47, 46],
[1235, 1710, 46, 47],
[1364, 1422, 827, 96],
[1375, 338, 602, 93],
[1536, 1523, 516, 102],
[1550, 2115, 180, 76],
[1562, 429, 648, 70],
[1691, 991, 48, 47],
[1692, 808, 47, 46],
[1822, 1765, 46, 48],
[1823, 1166, 47, 47],
[1824, 746, 46, 45],
[2007, 195, 374, 91],
[2011, 1858, 380, 82],
[2014, 1019, 339, 81],
[2304, 2223, 49, 50],
[2305, 2078, 47, 46],
[2492, 2224, 46, 47],
[2492, 2081, 46, 47],
[2553, 485, 1124, 48],
[2790, 1168, 1269, 210],
[2906, 193, 391, 89]]
As you can tell, the expected list might have more or fewer boxes than the OCR list, and the values will not be identical. I attempted to solve this with the following code:
def intersection_over_union(boxA, boxB):
    # determine the (x, y)-coordinates of the intersection rectangle
    xA = max(boxA[0], boxB[0])
    yA = max(boxA[1], boxB[1])
    xB = min(boxA[2], boxB[2])
    yB = min(boxA[3], boxB[3])
    # compute the area of intersection rectangle
    interArea = max(0, xB - xA + 1) * max(0, yB - yA + 1)
    # compute the area of both the prediction and ground-truth
    # rectangles
    boxAArea = (boxA[2] - boxA[0] + 1) * (boxA[3] - boxA[1] + 1)
    boxBArea = (boxB[2] - boxB[0] + 1) * (boxB[3] - boxB[1] + 1)
    # compute the intersection over union by taking the intersection
    # area and dividing it by the sum of prediction + ground-truth
    # areas - the intersection area
    iou = interArea / float(boxAArea + boxBArea - interArea)
    # return the intersection over union value
    return iou

def match_bounding_boxes(image1, image2):
    matches = []
    for box1 in image1:
        best_iou = 0
        best_box = None
        for box2 in image2:
            iou = intersection_over_union(box1, box2)
            if iou > best_iou:
                best_iou = iou
                best_box = box2
        matches.append((box1, best_box))
    return matches
However, all matches return "None"... meaning something is logically wrong with the code. Can anyone spot it?
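One likely cause: the function above reads indices 2 and 3 as the bottom-right corner, while the question states the boxes are [Top, Left, Width, Height]. With widths and heights in those slots, xB - xA + 1 comes out negative for these boxes, the intersection is clamped to 0, and iou never exceeds the starting best_iou of 0. The following is a sketch that converts to corners first, assuming both lists really use the stated format; to_corners and min_iou are illustrative names, and the +1 pixel convention is dropped for simplicity.

```python
def to_corners(box):
    """Convert [top, left, width, height] into (x1, y1, x2, y2)."""
    top, left, width, height = box
    return left, top, left + width, top + height


def intersection_over_union(box_a, box_b):
    ax1, ay1, ax2, ay2 = to_corners(box_a)
    bx1, by1, bx2, by2 = to_corners(box_b)
    # Overlap along each axis, clamped at zero when the boxes are disjoint.
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter_area = inter_w * inter_h
    union_area = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter_area
    return inter_area / union_area if union_area > 0 else 0.0


def match_bounding_boxes(expected, detected, min_iou=0.0):
    """Pair each expected box with its best-overlapping detected box, or None."""
    matches = []
    for box1 in expected:
        best_iou, best_box = min_iou, None
        for box2 in detected:
            iou = intersection_over_union(box1, box2)
            if iou > best_iou:
                best_iou, best_box = iou, box2
        matches.append((box1, best_box))
    return matches


same = intersection_over_union([0, 0, 10, 10], [0, 0, 10, 10])      # identical boxes
half = intersection_over_union([0, 0, 10, 10], [0, 5, 10, 10])      # half overlap
apart = intersection_over_union([0, 0, 10, 10], [50, 50, 10, 10])   # disjoint
pairs = match_bounding_boxes([[0, 0, 10, 10]],
                             [[50, 50, 10, 10], [0, 5, 10, 10]])
```

A box with no overlapping candidate still maps to None, so if the two lists use different scales or axis orders, some matches may remain empty even with this change.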

Divide list into sublist following certain pattern

Given an example list a = [311, 7426, 3539, 2077, 13, 558, 288, 176, 6, 196, 91, 54, 5, 202, 116, 95] with n = 16 elements (it will be in general a list of an even number of elements).
I wish to create n/4 lists that would be:
list1 = [311, 13, 6, 5]
list2 = [7426, 558, 196, 202]
list3 = [3539, 288, 91, 116]
list4 = [2077, 176, 54, 95]
(The solution is not taking an element every n such as a[i::3] in a for loop because values are excluded as the sliding window moves to the left)
Thanks for the tips!
UPDATE:
Thanks for the solutions, which work well for this particular example. I realized, however, that my problem is a bit more complex than this.
The list a is generated dynamically, so it can shrink or grow. Say the list grows by another group, i.e. to 20 elements. Now there should be 5 output lists, built using the same concept. Example:
a = [311, 7426, 3539, 2077, 1 ,13, 558, 288, 176, 1, 6, 196, 91, 54, 1, 5, 202, 116, 95, 1]
Now the output should be:
list1 = [311, 13, 6, 5]
list2 = [7426, 558, 196, 202]
list3 = [3539, 288, 91, 116]
list4 = [2077, 176, 54, 95]
list5 = [1, 1, 1, 1]
And so on for whatever size of the list.
Thanks again!
I'm assuming the length of the list a is a multiple of 4. You can use numpy for your problem.
import numpy as np
a = [...]
desired_shape = (-1, len(a)//4)
arr = np.array(a).reshape(desired_shape).transpose().tolist()
Output:
[[311, 13, 6, 5],
[7426, 558, 196, 202],
[3539, 288, 91, 116],
[2077, 176, 54, 95],
[1, 1, 1, 1]]
Unpack the list into variables or iterate over them as desirable.
Consult numpy.transpose, and reshape to understand their usage.
One option: nested list comprehension.
Split into n/4 lists of 4 items (the stride between picked items is n/4, not a fixed 4):
step = len(a) // 4
out = [[a[i + step*j] for j in range(4)]
       for i in range(step)]
Output:
[[311, 13, 6, 5],
 [7426, 558, 196, 202],
 [3539, 288, 91, 116],
 [2077, 176, 54, 95],
 [1, 1, 1, 1]]
Split into 4 lists of n/4 items:
out = [[a[i+4*j] for j in range(len(a)//4)]
       for i in range(4)]
Output:
[[311, 1, 176, 91, 202],
[7426, 13, 1, 54, 116],
[3539, 558, 6, 1, 95],
[2077, 288, 196, 5, 1]]
To unpack into separate variables (here for the 4-list result):
list1, list2, list3, list4 = out
Creating such variables programmatically is not easy, though, and using many numbered variables is not recommended.
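For comparison, here is a sketch of the grouping the question asks for using only extended slices, with no NumPy. It assumes, as the answers do, that the list length is a multiple of 4; split_by_stride and group_size are illustrative names.

```python
def split_by_stride(values, group_size=4):
    """Split values into len(values) // group_size lists of group_size items."""
    step, remainder = divmod(len(values), group_size)
    if remainder:
        raise ValueError("length must be a multiple of group_size")
    # values[i::step] takes item i, then every step-th item after it.
    return [values[i::step] for i in range(step)]


a = [311, 7426, 3539, 2077, 1, 13, 558, 288, 176, 1,
     6, 196, 91, 54, 1, 5, 202, 116, 95, 1]
groups = split_by_stride(a)
print(groups)
```

The stride is recomputed from the list length, so the same call handles the 16-element and the 20-element versions of a.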

python numpy ndArray replace values in first n-1 columns based on value in nth column

I have an ndarray of shape (800, 1280, 4): image data with 4 channels. In the 4th channel, i.e. the alpha channel, some values are 1 (transparent) and some are 255 (opaque). I want to replace the R, G, B channel values with zero wherever the alpha channel value is 1.
To illustrate this I took an example array as below and tried following code:
>>> import numpy as np
>>>
>>> a = np.random.randint(255, size=(3,5,4))
>>> a
array([[[165, 200, 80, 149],
[247, 126, 88, 2],
[ 35, 24, 59, 167],
[105, 69, 98, 78],
[138, 224, 50, 32]],
[[ 90, 53, 113, 39],
[105, 153, 60, 101],
[139, 249, 105, 79],
[171, 127, 81, 240],
[133, 22, 62, 172]],
[[197, 163, 253, 62],
[193, 57, 208, 247],
[241, 80, 100, 249],
[181, 118, 72, 52],
[221, 121, 89, 138]]])
>>> # I want to replace cell values with zero where 4th column value is < 100
>>> b = np.where(a[...,-1]<100,0,a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<__array_function__ internals>", line 6, in where
ValueError: operands could not be broadcast together with shapes (3,5) () (3,5,4)
I get a ValueError. What is the right approach for replacing the first three cell values based on the 4th cell value in a NumPy ndarray?
Try this
a[np.where(a[...,-1]<100)] = np.array([0,0,0,100])
This modifies a in place. Note that it also overwrites the alpha value of every matching pixel with 100.
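The np.where call in the question fails because the condition has shape (3, 5), which cannot broadcast against the (3, 5, 4) array. If the alpha values should stay untouched, one option is a boolean mask combined with a slice over the first three channels. This is a sketch; the function name and the small sample array are illustrative.

```python
import numpy as np


def zero_rgb_where_alpha_below(image, threshold):
    """Return a copy with R, G, B set to 0 where alpha < threshold."""
    result = image.copy()
    mask = result[..., -1] < threshold   # boolean array, shape (rows, cols)
    result[mask, :3] = 0                 # first three channels only
    return result


pixels = np.array([[[165, 200, 80, 149],
                    [247, 126, 88, 2]]])
cleaned = zero_rgb_where_alpha_below(pixels, 100)
print(cleaned.tolist())  # [[[165, 200, 80, 149], [0, 0, 0, 2]]]
```

Dropping the copy and assigning directly to image[mask, :3] would give the in-place behaviour instead.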

convert a numbered list into an order list

I have a python list question:
Input:
l=[2, 5, 6, 7, 10, 11, 12, 19, 20, 26, 28, 33, 34, 45, 46, 47, 50, 57, 59, 64, 67, 77, 79, 87, 93, 97, 106, 110, 111, 113, 115, 120, 125, 126, 133, 135, 142, 148, 160, 166, 169, 176, 202, 228, 234, 253, 274, 365, 433, 435, 436, 468, 476, 529, 570, 575, 577, 581, 614, 766, 813, 944, 1058, 1079, 1245, 1363, 1389, 1428, 1758, 2129, 2336, 2402, 2405, 2576, 3013, 3993, 7687, 8142, 8455, 8456]
Now I want to mark these numbers in a [0]*10000 list, so that the beginning looks like:
Output:
lp=[0,1,0,0,1,...]
The second and fifth elements are marked since they appeared in the input.
lp = [0] * 10000
for index in l:
    lp[index - 1] = 1
You could use the following list comprehension
lp = [1 if i in l else 0 for i in range(1, 10001)]
Though since l could be long, I'd recommend converting it to a set first, so that each membership test is fast:
set_l = set(l)
lp = [1 if i in set_l else 0 for i in range(1, 10001)]

One-liner to calculate multiples of a certain number between two values in python

I have the following code to do what the title says:
def multiples(small, large, multiple):
    multiples = []
    for k in range(small, large+1):
        if k % multiple == 0:
            multiples.append(k)
    return multiples
What it outputs:
>>> multiples(39, 51, 12)
[48]
>>> multiples(39, 51, 11)
[44]
>>> multiples(39, 51, 10)
[40, 50]
>>> multiples(39, 51, 9)
[45]
>>> multiples(39, 51, 8)
[40, 48]
>>> multiples(39, 51, 7)
[42, 49]
>>> multiples(39, 51, 6)
[42, 48]
>>> multiples(39, 51, 5)
[40, 45, 50]
>>> multiples(39, 51, 4)
[40, 44, 48]
>>> multiples(39, 51, 3)
[39, 42, 45, 48, 51]
>>> multiples(39, 51, 2)
[40, 42, 44, 46, 48, 50]
However, this is a lot of code to write, and I was looking for a pythonic one-liner to do what this does. Is there anything out there?
Just change your code to a List Comprehension, like this
return [k for k in range(small, large+1) if k % multiple == 0]
If you are just going to iterate through the results, then you can simply return a generator expression (xrange is Python 2; on Python 3, use range), like this
return (k for k in xrange(small, large+1) if k % multiple == 0)
If you really want to get all the multiples as a list, then you can convert that to a list like this
list(multiples(39, 51, 12))
You can do it as:
def get_multiples(low, high, num):
    return [i for i in range(low, high+1) if i % num == 0]
Examples:
>>> print(get_multiples(4, 345, 56))
[56, 112, 168, 224, 280, 336]
>>> print(get_multiples(39, 51, 2))
[40, 42, 44, 46, 48, 50]
>>> print(get_multiples(2, 1234, 43))
[43, 86, 129, 172, 215, 258, 301, 344, 387, 430, 473, 516, 559, 602, 645, 688, 731, 774, 817, 860, 903, 946, 989, 1032, 1075, 1118, 1161, 1204]
range((small+multiple-1)//multiple * multiple, large+1, multiple)
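The stepped range above skips the modulo test altogether: it rounds small up to the nearest multiple and then steps by multiple. A minimal sketch of that idea wrapped in a function, assuming Python 3 and a positive multiple:

```python
def multiples(small, large, multiple):
    """Multiples of `multiple` in [small, large]; assumes multiple > 0."""
    # Round small up to the nearest multiple, then step directly.
    first = (small + multiple - 1) // multiple * multiple
    return list(range(first, large + 1, multiple))


results = {m: multiples(39, 51, m) for m in (12, 10, 3)}
print(results)  # {12: [48], 10: [40, 50], 3: [39, 42, 45, 48, 51]}
```

This visits only the multiples themselves rather than every integer between small and large.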
Perfect application for a generator expression:
>>> sm=31
>>> lg=51
>>> mult=5
>>> (m for m in xrange(sm,lg+1) if not m%mult)
<generator object <genexpr> at 0x101e3f2d0>
>>> list(_)
[35, 40, 45, 50]
If on Python3+, use range instead of xrange...
