Indices of intersection of lists - python

Given two lists of equal length:
_list = [1, 4, 8, 7, 3, 15, 5, 0, 6]
_list2 = [7, 4, 0, 1, 5, 5, 7, 2, 2]
How can I get an output like this:
output = [(0,3), (1,1), (3,0), (3,6), (6,4), (6,5), (7,2)]
Here the output lists every pair of indices at which the two lists hold equal values:
output = list of (index of an element in _list, index where the same value appears in _list2)
Using set intersection is not an option, since a set removes repeated elements.

Basic-Intermediate: As a generator:
def find_matching_indices(a, b):
    """Yield (i, j) for every pair of positions where a[i] == b[j]."""
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if x == y:
                yield i, j

list(find_matching_indices(_list, _list2))
# [(0, 3), (1, 1), (3, 0), (3, 6), (6, 4), (6, 5), (7, 2)]
Basic-Intermediate: As a list comprehension:
[(i, j) for i, x in enumerate(_list) for j, y in enumerate(_list2) if x == y]
# [(0, 3), (1, 1), (3, 0), (3, 6), (6, 4), (6, 5), (7, 2)]
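These solutions use two nested loops, so they compare every element of one list with every element of the other: len(a) * len(b) comparisons.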
Intermediate-Advanced: For fun, a dictionary is another data structure you might consider:
import collections as ct
import more_itertools as mit

def locate_indices(a, b):
    """Return a dictionary of `a` index keys found at `b` indices."""
    dd = ct.defaultdict(list)
    for i, y in enumerate(a):
        # All indices in b whose item equals the current item of a
        idxs = list(mit.locate(b, lambda z: z == y))
        if idxs:
            dd[i].extend(idxs)
    return dd

locate_indices(_list, _list2)
# defaultdict(list, {0: [3], 1: [1], 3: [0, 6], 6: [4, 5], 7: [2]})
Note that each index of list a is a key in the dictionary; its value collects every index of list b that holds the same value.
A defaultdict is used because it makes building dictionaries with list values easy. See also the third-party more_itertools.locate(), which yields the indices of all items that satisfy the predicate (here, equality with the current item of a).

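from itertools import product
from collections import defaultdict

def matching_indices(*lists):
    # value -> tuple of index lists, one list per input list
    d = defaultdict(lambda: tuple([] for _ in range(len(lists))))
    for l_idx, l in enumerate(lists):
        for i, elem in enumerate(l):
            d[elem][l_idx].append(i)
    # Every combination of indices that share a value, in sorted order
    return sorted([tup for _, v in d.items() for tup in product(*v)])

matching_indices(_list, _list2)
# [(0, 3), (1, 1), (3, 0), (3, 6), (6, 4), (6, 5), (7, 2)]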
This solution builds a dictionary that records, for each value, the indices at which it appears in each input list. For example, if the value 5 appears at indices 0 and 2 of the first list and at index 3 of the second, its dictionary entry is ([0, 2], [3]).
It then uses itertools.product to build all the combinations of those indices.
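This looks more complicated than the other answers here, but it avoids comparing every pair of elements: building the dictionary takes one pass over each list, and the final sort is O(k log k) in the number k of matching pairs, instead of O(n**2) comparisons. That makes it significantly faster, especially for large inputs. Two length-1000 lists of random numbers 0-1000 complete 100 tests in about 0.4 seconds using the above algorithm and 6-13 seconds using some of the others here.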
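To make the intermediate dictionary concrete, here is a sketch that builds it for the two example lists (fixed to two lists rather than *lists) and inspects a couple of entries. It also shows why values present in only one list contribute nothing: the product over an empty index list is empty.

```python
from collections import defaultdict
from itertools import product

_list = [1, 4, 8, 7, 3, 15, 5, 0, 6]
_list2 = [7, 4, 0, 1, 5, 5, 7, 2, 2]

# value -> ([indices in _list], [indices in _list2]), as in matching_indices
d = defaultdict(lambda: ([], []))
for l_idx, lst in enumerate((_list, _list2)):
    for i, elem in enumerate(lst):
        d[elem][l_idx].append(i)

print(d[7])                  # ([3], [0, 6])
print(list(product(*d[7])))  # [(3, 0), (3, 6)]
print(d[8])                  # ([2], []): 8 never appears in _list2
print(list(product(*d[8])))  # []: an empty index list yields no pairs
```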

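Here is a sort-and-merge solution. Sorting costs O(n log n), and the merge is linear apart from the matching pairs it emits. Equal values are handled as runs, so duplicated values produce every pair:
import numpy

ind1 = numpy.argsort(_list, kind="stable")
ind2 = numpy.argsort(_list2, kind="stable")
pairs = []
i = 0
j = 0
while i < ind1.size and j < ind2.size:
    e1 = _list[ind1[i]]
    e2 = _list2[ind2[j]]
    if e1 == e2:
        # Find the end of the run of this value in each sorted order
        i_end = i
        while i_end < ind1.size and _list[ind1[i_end]] == e1:
            i_end += 1
        j_end = j
        while j_end < ind2.size and _list2[ind2[j_end]] == e2:
            j_end += 1
        # Pair every index of the run in _list with every index of the run in _list2
        for a in ind1[i:i_end]:
            for b in ind2[j:j_end]:
                pairs.append((int(a), int(b)))
        i, j = i_end, j_end
    elif e1 < e2:
        i += 1
    else:
        j += 1
print(sorted(pairs))
# [(0, 3), (1, 1), (3, 0), (3, 6), (6, 4), (6, 5), (7, 2)]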

Related

Detect ranges in Python

I'm trying to solve this exercise in my coursework:
Create a function named detect_ranges that gets a list of integers as a parameter.
The function should then sort this list and transform it into another list in which every detected interval is replaced by a pair.
So 3,4,5,6 is replaced by the pair (3,7) (the end value is exclusive).
Numbers that are not part of any interval remain single numbers.
The resulting list consists of these numbers and pairs, separated by commas. An example of how this function works:
print(detect_ranges([2,5,4,8,12,6,7,10,13]))
[2,(4,9),10,(12,14)]
I don't fully understand the exercise and can't think of how to detect the ranges. Do you have any hints or tips?
Here is another way of doing this. It is not as efficient as the other answer, but since this is an exercise, it may be easier to follow.
I use Python's built-in zip function in the steps below; see its documentation to learn more about it.
1. First sort the list, so you get lst = [2, 4, 5, 6, 7, 8, 10, 12, 13]
2. Then compute the differences between consecutive values, e.g. (4-2), (5-4), and so on. If a difference is <= 1, the two values belong to the same range
(also insert a 0 at the front, to account for the first element and to make the list of differences as long as the original list):
>>> diff = [j-i for i, j in zip(lst[:-1], lst[1:])]
>>> diff.insert(0, 0)
>>> diff
[0, 2, 1, 1, 1, 1, 2, 2, 1]
3. Now find the positions in the list above where the difference is >= 2. Each of these marks the start of a new range
(again, insert a 0 at the front, so that the first element also starts a range):
>>> ind = [i for i,v in enumerate(diff) if v >= 2]
>>> ind.insert(0, 0)
>>> ind
[0, 1, 6, 7]
So the groups are the slices 0:1, 1:6, 6:7, and 7 to the end of the sorted list.
4. Group the elements that form each range, using the ind list:
>>> groups = [lst[i:j] for i,j in zip(ind, ind[1:]+[None])]
>>> groups
[[2], [4, 5, 6, 7, 8], [10], [12, 13]]
5. Finally obtain your desired ranges:
>>> ranges = [(i[0],i[-1]+1) if len(i)>1 else i[0] for i in groups]
>>> ranges
[2, (4, 9), 10, (12, 14)]
Putting it all in a function detect_ranges:
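def detect_ranges(lst):
    lst = sorted(lst)
    if not lst:
        return []  # nothing to group; avoids indexing an empty group below
    # Differences between neighbours, with a leading 0 for the first element
    diff = [j-i for i, j in zip(lst[:-1], lst[1:])]
    diff.insert(0, 0)
    # Positions where a new range starts (a gap of 2 or more)
    ind = [i for i,v in enumerate(diff) if v >= 2]
    ind.insert(0, 0)
    groups = [lst[i:j] for i,j in zip(ind, ind[1:]+[None])]
    ranges = [(i[0],i[-1]+1) if len(i)>1 else i[0] for i in groups]
    return ranges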
Examples:
>>> lst = [2,6,1,9,3,7,12,45,46,13,90,14,92]
>>> detect_ranges(lst)
[(1, 4), (6, 8), 9, (12, 15), (45, 47), 90, 92]
>>> lst = [12,43,43,11,4,3,6,6,9,9,10,78,32,23,22,98]
>>> detect_ranges(lst)
[(3, 5), (6, 7), (9, 13), (22, 24), 32, (43, 44), 78, 98]
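A different technique, often used for consecutive runs: after sorting, value minus position is constant within a run of consecutive integers, so itertools.groupby can split the runs in one pass. This sketch (detect_ranges_groupby is a hypothetical name) removes duplicates first, so for the second example it returns 6 and 43 where the function above returns (6, 7) and (43, 44).

```python
from itertools import groupby


def detect_ranges_groupby(lst):
    """Sketch: group consecutive integers using the value-minus-position trick.

    Duplicates are removed first, so [6, 6] becomes the single value 6.
    """
    values = sorted(set(lst))
    result = []
    # pair is (position, value); within a consecutive run, value - position is constant
    for _, group in groupby(enumerate(values), key=lambda pair: pair[1] - pair[0]):
        run = [v for _, v in group]
        # Same output convention as above: (start, end + 1), or a single number
        result.append((run[0], run[-1] + 1) if len(run) > 1 else run[0])
    return result


print(detect_ranges_groupby([2, 5, 4, 8, 12, 6, 7, 10, 13]))
# [2, (4, 9), 10, (12, 14)]
```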
Iterate through the elements and save the start of each interval.
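def detect_ranges(xs):
    # Assumes xs is already sorted (the usage below passes a sorted list)
    it = iter(xs)
    try:
        start = next(it)
    except StopIteration:
        return  # empty input: no ranges
    prev = start
    for x in it:
        if prev + 1 != x:
            # Gap found: close the current range and start a new one at x
            yield start, prev + 1
            start = x
        prev = x
    yield start, prev + 1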
Usage:
>>> xs = [2, 4, 5, 6, 7, 8, 10, 12, 13]
>>> ranges = list(detect_ranges(xs))
>>> ranges
[(2, 3), (4, 9), (10, 11), (12, 14)]
If you want to reduce single item intervals like (2, 3) to 2, you can do:
>>> ranges = [a if a + 1 == b else (a, b) for a, b in ranges]
>>> ranges
[2, (4, 9), 10, (12, 14)]

Python. How to conveniently count the frequency of lists in a collection of lists

I have a list of lists.
e.g. list_a = [[1,2,3], [2,3], [4,3,2], [2,3]]
I want to count them like
[1,2,3]: 1
[2,3]: 2
[4,3,2]: 1
There is Counter in collections, but it does not work with unhashable elements such as lists. Currently I use indirect workarounds, for example converting the list [1,2,3] into the string "1_2_3". Is there a way to count the lists directly?
Not the prettiest way to do it, but this works:
list_a = [[1,2,3], [2,3], [4,3,2], [2,3]]
counts = {}
for x in list_a:
    # Lists are unhashable, so use the equivalent tuple as the key
    counts.setdefault(tuple(x), list()).append(1)
for a, b in counts.items():
    counts[a] = sum(b)
print(counts)
{(1, 2, 3): 1, (2, 3): 2, (4, 3, 2): 1}
A possible approach to this job is to use a dict:
Create an empty dict.
Iterate over the list using a for loop.
For each element, check whether the dict already contains it.
If it doesn't, store it in the dict as a key, with an occurrence count of 1 as the value.
If it does, increment its value.
Possible implementation:
occurrence_dict = {}
for sub in list_a:
    key = str(sub)  # lists are unhashable, so use their string form as the key
    if key in occurrence_dict:
        occurrence_dict[key] += 1
    else:
        occurrence_dict[key] = 1
print(occurrence_dict)
# {'[1, 2, 3]': 1, '[2, 3]': 2, '[4, 3, 2]': 1}
You can achieve this easily by converting each list to a tuple, which is hashable:
from collections import Counter

c = Counter(tuple(item) for item in list_a)
# or
c = Counter(map(tuple, list_a))
# Counter({(2, 3): 2, (1, 2, 3): 1, (4, 3, 2): 1})

for key, count in c.items():
    print(key, count)
# exactly what you expected:
(1, 2, 3) 1
(2, 3) 2
(4, 3, 2) 1
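Since the Counter keys are tuples, you look up a list's count by converting it the same way, and you can convert keys back to lists when you need them. A short sketch using Counter's indexing and most_common():

```python
from collections import Counter

list_a = [[1, 2, 3], [2, 3], [4, 3, 2], [2, 3]]
c = Counter(map(tuple, list_a))

# Look up a list by converting it to a tuple first; missing keys count as 0
print(c[tuple([2, 3])])  # 2
print(c[tuple([9])])     # 0

# most_common() orders by count; turn the keys back into lists for display
pairs = [(list(key), count) for key, count in c.most_common()]
print(pairs)  # [([2, 3], 2), ([1, 2, 3], 1), ([4, 3, 2], 1)]
```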
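Way 1
Through the indices of repeated lists (this compares every list with every other one, so it is quadratic in the number of lists):
list_a = [[1,2,3], [2,3], [4,3,2], [2,3], [1,2,3]] # one more [1,2,3] added for illustration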
# step 1
dd = {i:v for i, v in enumerate(list_a)}
print(dd)
Out[1]:
{0: [1, 2, 3], 1: [2, 3], 2: [4, 3, 2], 3: [2, 3], 4: [1, 2, 3]}
# step 2
tpl = [tuple(x for x,y in dd.items() if y == b) for a,b in dd.items()]
print(tpl)
Out[2]:
[(0, 4), (1, 3), (2,), (1, 3), (0, 4)] # here is the tuple of indexes of matching lists
# step 3
result = {tuple(list_a[a[0]]):len(a) for a in set(tpl)}
print(result)
Out[3]:
{(4, 3, 2): 1, (2, 3): 2, (1, 2, 3): 2}
Way 2
By converting the nested lists to tuples:
{i:[tuple(a) for a in list_a].count(i) for i in [tuple(a) for a in list_a]}
Out[1]:
{(1, 2, 3): 2, (2, 3): 2, (4, 3, 2): 1}

Convert flattened 1 dimensional indices to 2 dimensional indices

Say I have a list of lists, e.g:
x = [[0,1,2,3],[4,5],[6,7,8,9,10]]
And I have the 'flat' indices of the elements I wish to target, i.e, the indices of the elements I want to select from the list if it were flattened into a 1d list:
flattened_indices = [0,1,4,9]
flattened_list = [0,1,2,3,4,5,6,7,8,9,10]
How do I convert the 1-D indices into 2-D indices that would allow me to recover the elements from the original nested list? I.e. in this example:
indices_2d = [(0,0), (0,1), (1,0), (2,3)]
Here is a way to do that:
from bisect import bisect
import itertools

# Accumulated sum of list lengths
def len_cumsum(x):
    return list(itertools.accumulate(map(len, x)))

# Find 2D index from accumulated list of lengths
def find_2d_idx(c, idx):
    # bisect (bisect_right) gives the first sublist whose cumulative end exceeds idx
    i1 = bisect(c, idx)
    # Offset inside that sublist: subtract the lengths of all earlier sublists
    i2 = (idx - c[i1 - 1]) if i1 > 0 else idx
    return (i1, i2)

# Test
x = [[0, 1, 2, 3], [4, 5], [6, 7, 8, 9, 10]]
indices = [0, 4, 9]
flattened_list = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
c = len_cumsum(x)
idx_2d = [find_2d_idx(c, i) for i in indices]
print(idx_2d)
# [(0, 0), (1, 0), (2, 3)]
print([x[i1][i2] for i1, i2 in idx_2d])
# [0, 4, 9]
If you have many "flat" indices, this is more efficient than walking the nested list for each index: after the one-off cumulative sum, each lookup is a binary search, O(log k) for k sublists.
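The cumulative lengths also give the reverse direction. A minimal sketch, where find_flat_idx is a hypothetical helper: the flat index is the start offset of sublist i1 plus i2. The round trip below checks every valid flat index. Note that bisect returns len(c) for an index at or past the total length, which is not a valid sublist, so out-of-range indices need a separate check.

```python
from bisect import bisect
from itertools import accumulate

x = [[0, 1, 2, 3], [4, 5], [6, 7, 8, 9, 10]]
c = list(accumulate(map(len, x)))  # cumulative lengths: [4, 6, 11]


def find_2d_idx(c, idx):
    # Same as above: bisect picks the sublist, the offset within it follows
    i1 = bisect(c, idx)
    i2 = (idx - c[i1 - 1]) if i1 > 0 else idx
    return (i1, i2)


def find_flat_idx(c, i1, i2):
    # Hypothetical inverse: start offset of sublist i1 plus the position inside it
    return (c[i1 - 1] if i1 > 0 else 0) + i2


total = c[-1]
round_trip = [find_flat_idx(c, *find_2d_idx(c, k)) for k in range(total)]
print(round_trip == list(range(total)))  # True
```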
I guess you could put these index pairs in a dict keyed by flat index, then look up each index and create a new list:
x = [[0,1,2,3],[4,5],[6,7,8,9,10]]
indices = [0,4,9]
# Number the (sublist, position) pairs in flattened order: flat index -> (i, j)
idx_map = dict(enumerate((i, j) for i, l in enumerate(x) for j in range(len(l))))
result = [idx_map[k] for k in indices]
print(result)
Which results in:
[(0, 0), (1, 0), (2, 3)]
Building idx_map takes time and memory proportional to the total number of elements, so if you only need a few indices, jdehesa's solution using bisect is more efficient.

How to check if a column has different numbers in a list of lists?

I am trying to identify all the columns that contain different numbers:
for i in range(len(f)):
    for j in range(len(f[i])):
        if(f[j][i] != f[j][i+1]):
            print(f[j][i+1])
For example, if the list is f = [[3, 5, 6, 7], [7, 5, 6, 3]],
I would like to obtain col 0 and col 3, but I am getting "list index out of range".
Any help would be appreciated.
Using zip gives a better solution:
for i, (a, b) in enumerate(zip(*f)):
    if a != b:
        print(i)
zip(*f) gives you (in Python 3 zip returns an iterator, so wrap it in list() to see it):
In [18]: list(zip(*f))
Out[18]: [(3, 7), (5, 5), (6, 6), (7, 3)]
And now you can easily compare the "columns".
If you're a one-liner guy:
[i for i, (a, b) in enumerate(zip(*f)) if a != b]
You swapped the indices. j runs over 0, 1, 2, 3 (the column positions), but you use it as the row index, so f[j] fails as soon as j reaches 2, because f has only two rows. Remember: the first index selects the sublist, and the second selects the item within that sublist.
This correctly prints 0 and 3:
for i in range(len(f)-1):
    for j in range(len(f[i])):
        if(f[i][j] != f[i+1][j]):
            print(j)
You can use zip:
f = [[3, 5, 6, 7], [7, 5, 6, 3]]
for n, (i, j) in enumerate(zip(*f)):
    if i != j:
        print(n)
The expression zip(*f) iterates over a 'transposed' version of your list f.
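The zip(*f) loops above unpack each column into exactly two names, so they assume f has two rows. A sketch for any number of rows (the third row below is made-up data for illustration): a column differs when it contains more than one distinct value. It assumes the rows have equal length, since zip stops at the shortest row, and that the values are hashable.

```python
f = [[3, 5, 6, 7], [7, 5, 6, 3], [3, 5, 0, 7]]  # third row is illustrative only

# A column differs if it holds more than one distinct value
differing = [n for n, column in enumerate(zip(*f)) if len(set(column)) > 1]
print(differing)  # [0, 2, 3]
```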

Find most common element

How can I print the most common element of a list without importing a library?
l=[1,2,3,4,4,4]
So I want the output to be 4.
You can get the unique values first:
l = [1, 2, 3, 4, 4, 4]
s = set(l)
then you can create a list of (occurrences, value) tuples:
freq = [(l.count(i), i) for i in s] # [(1, 1), (1, 2), (1, 3), (3, 4)]
get the "biggest" tuple (the highest number of occurrences, and the biggest value if several values share that count):
result = max(freq) # (3, 4)
and print the value:
print(result[1]) # 4
or as a "one-liner" way:
l = [1, 2, 3, 4, 4, 4]
print(max((l.count(i), i) for i in set(l))[1]) # 4
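l.count(i) rescans the whole list for every distinct value. Here is a sketch that still needs no imports but counts everything in a single pass with a plain dict. Note that the tie-breaking differs: max(counts, key=counts.get) returns the first element encountered among equal counts, whereas max over (count, value) tuples returns the biggest value.

```python
l = [1, 2, 3, 4, 4, 4]

# Count every element in one pass with a plain dict (no imports needed)
counts = {}
for item in l:
    counts[item] = counts.get(item, 0) + 1

# Compare keys by their count; on ties, max keeps the first key seen
most_common = max(counts, key=counts.get)
print(most_common)  # 4
```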
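from collections import Counter
lst=[1,2,2,2,3,3,4,4,5,6]
Counter(lst).most_common(1)[0][0]  # 2
Counter(lst) returns a dict subclass that maps each element to its number of occurrences. most_common(n) returns a list of the n most common (element, count) pairs, here [(2, 3)], so [0][0] picks the element itself. collections is part of the standard library, so nothing has to be installed, although it is still an import.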
