Get list of lists of duplicates' indices - python

I have a list that contains duplicate elements. For all duplicate elements, I would like to obtain a list of their indices. The final output should be a list of lists of duplicate indices.
I have already come up with a working solution, but I have the feeling that there might be a more computationally efficient and/or more concise way (using less code) to solve this problem:
# set up a list that contains duplicate elements
a = ['bar','foo','bar','foo','foobar','barfoo']
# get list of elements that appear more than one time in the list
seen = {}
dupes = []
for x in a:
    if x not in seen:
        seen[x] = 1
    else:
        if seen[x] == 1:
            dupes.append(x)
        seen[x] += 1
# for each of those elements, return list of indices of matching elements
# in original list
dupes_indices = []
for dupe in dupes:
    indices = [i for i, x in enumerate(a) if x == dupe]
    dupes_indices.append(indices)
where dupes_indices is [[0, 2], [1, 3]] ('bar' appears at indices 0 and 2 and 'foo' appears at indices 1 and 3)
I used code from two other answers on Stack Overflow.

You could try this nested list comprehension one-liner:
a = ['bar','foo','bar','foo','foobar','barfoo']
print([y for y in [[i for i, v in enumerate(a) if v == x] for x in set(a)] if len(y) > 1])
Output (the order of the two groups may vary, because set(a) is unordered):
[[0, 2], [1, 3]]
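Both approaches above rescan the whole list once per distinct element. As a rough sketch of a single-pass alternative, the indices can be grouped in a dictionary while iterating. The function name duplicate_indices is made up for this illustration, and the sketch assumes the elements are hashable:

```python
from collections import defaultdict

def duplicate_indices(items):
    """Return index lists for every value that occurs more than once."""
    positions = defaultdict(list)  # value -> indices where it occurs
    for index, value in enumerate(items):
        positions[value].append(index)
    # dicts keep insertion order (Python 3.7+), so the groups come out
    # in order of first appearance
    return [indices for indices in positions.values() if len(indices) > 1]

a = ['bar', 'foo', 'bar', 'foo', 'foobar', 'barfoo']
print(duplicate_indices(a))  # [[0, 2], [1, 3]]
```

Unlike the set-based one-liner, the order of the groups here is deterministic.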

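pandas boolean selection also works here; note that this prints the indices of every element, not only the duplicated ones:
import pandas as pd
df = pd.DataFrame(['bar','foo','bar','foo','foobar','barfoo'])
df.columns = ["elements"]
elements = set(df.elements.tolist())
for e in elements:
    x = df.loc[df.elements == e]
    print(e, x.index.tolist())
Output (order may vary, since elements is a set):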
bar [0, 2]
foobar [4]
foo [1, 3]
barfoo [5]

Related

Truncate sublists to the lowest length

I have:
l = [[1,2,3],[3,4],[1,6,8,3]]
I want:
[[1,2],[3,4],[1,6]]
Which is the list l with all sublists truncated to the lowest length found for the sublists in l.
I tried:
shortest = 1000 # named shortest rather than min to avoid shadowing the built-in
for x in l:
    if len(x) < shortest: shortest = len(x)
r = []
for x in l:
    s = []
    for i in range(shortest):
        s.append(x[i])
    r.append(s.copy())
This works, but it is quite slow and long to write. I'd like to make it more efficient using a list comprehension or similar.
Using del (this truncates the sublists of l in place):
n = min(map(len, l))
for a in l:
    del a[n:]
You can find the length of each item in the list and then pick the minimum. You can then use this value to truncate all the items in the list:
l = [[1,2,3],[3,4],[1,6,8,3]]
min_length = min(map(len,l)) # using map with len function to get the length of each item and then using min to find the min value.
l = [item[:min_length] for item in l] # list comprehension to truncate the list
One-liner:
l = [item[:min(map(len,l))] for item in l]
One fun thing about zip is that it is its own inverse, so list(zip(*zip(*x))) gives back x in a similar structure.
And zip stops iterating as soon as any input is exhausted.
Although the results are tuples and the nested lists are not truncated in place, one can make use of this to build the following output:
l = [[1, 2, 3], [3, 4], [1, 6, 8, 3]]
print(list(zip(*zip(*l))))
Output:
[(1, 2), (3, 4), (1, 6)]
With list comprehension, one-liner:
l = [[1,2,3],[3,4],[1,6,8,3]]
print ([[s[i] for i in range(min([len(x) for x in l]))] for s in l])
Or:
print ([s[:min([len(s) for s in l])] for s in l])
Output:
[[1, 2], [3, 4], [1, 6]]
We compute the minimal length of the sublists inside range() so that we iterate over each sublist only that many times and build a new sublist. The outer list comprehension rebuilds the nested list.
If you have a large nested list, you should use this two-line version, which computes the minimum only once:
m = min([len(x) for x in l])
print ([[s[i] for i in range(m)] for s in l])
Or:
print ([s[:m] for s in l])
Using zip and converting the resulting tuples back to lists:
print([list(x) for x in zip(*zip(*l))])
Output:
[[1, 2], [3, 4], [1, 6]]
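The slicing answers above build new sublists, whereas the del version changes the originals. A minimal sketch of the slicing idea wrapped in a function follows; the name truncate_to_shortest is made up for this illustration, and the guard is there because min() raises ValueError when given an empty sequence:

```python
def truncate_to_shortest(lists):
    """Return new sublists, each cut to the length of the shortest one."""
    if not lists:  # min() raises ValueError on an empty sequence
        return []
    shortest = min(map(len, lists))  # computed once, not once per sublist
    return [sub[:shortest] for sub in lists]

l = [[1, 2, 3], [3, 4], [1, 6, 8, 3]]
print(truncate_to_shortest(l))  # [[1, 2], [3, 4], [1, 6]]
print(l)                        # unchanged: [[1, 2, 3], [3, 4], [1, 6, 8, 3]]
```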

Pythonic popping from a list to another list

I have two lists
a = [1,2,3]
b = []
I want to move an element from list a, if it meets a certain condition.
a = [1,3]
b = [2]
The below code shows an example, however, I would like to do this inside of a single loop. How do I do this more efficiently?
a = [1,2,3]
b = []
pop_list = []
for i in range(len(a)):
    if a[i] == 2:
        print("pop:", a[i])
        pop_list.append(i)
for i in range(len(pop_list)):
    b.append(a.pop(pop_list[i]))
# Reset pop_list
pop_list=[]
Ideally, I would not generate a new list b.
A pair of list comprehensions would do the job: one to select the desired elements for b, the other to remove them from a:
b = [i for i in a if i == 2]
a = [i for i in a if i != 2]
You can use filter and itertools.filterfalse with the same filtering function for both:
from itertools import filterfalse
a = [1,2,3]
print(list(filterfalse(lambda x: x == 2, a)))
print(list(filter(lambda x: x == 2, a)))
Output:
[1, 3]
[2]
See the itertools.filterfalse docs.
If the element x exists, you could just remove it from a and append it to b.
a = [1, 2, 3]
b = []
x = 2
def remove_append(a, b, x):
    if x in a:
        a.remove(x)
        b.append(x)
remove_append(a, b, x)
print(a)
print(b)
Output:
[1, 3]
[2]
We must pass through all elements; however, you can use this trick to append each element to the appropriate list in one loop:
(Appending to a list is more efficient than deleting an element at an arbitrary position.)
a = [1,2,3]
condition_false, condition_true = [], []
for v in a:
    # Add to the right list
    (condition_false, condition_true)[v == 2].append(v)
# [1, 3]
print(condition_false)
# [2]
print(condition_true)
Here is a single loop way that's similar to your initial method:
length = len(a)
popped = 0
for i in range(length):
    if i == length - popped:
        break
    if a[i] == 2:
        b.append(a.pop(i))
        popped += 1
If we keep track of how many elements we pop from a, we can just stop our loop that many elements early since there are fewer elements left in a.
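One caveat with popping by index inside a for loop over range: after a.pop(i), the next element slides into position i, so two matching elements in a row would leave the second one behind. A possible variant, sketched here with a while loop that only advances the index when nothing was popped (the helper name move_matching and its predicate parameter are assumptions made for this example):

```python
def move_matching(source, target, predicate):
    """Move items that satisfy predicate from source to target, in place."""
    i = 0
    while i < len(source):
        if predicate(source[i]):
            target.append(source.pop(i))  # the next item slides into slot i
        else:
            i += 1  # advance only when nothing was popped

a = [1, 2, 2, 3]
b = []
move_matching(a, b, lambda x: x == 2)
print(a)  # [1, 3]
print(b)  # [2, 2]
```

Each pop from the middle of a list still shifts the remaining elements, so for long lists the two-comprehension or single-loop partition approaches are likely cheaper.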

Find matches between many Lists with same length at the same Position

I have two lists (there may be more later), and I want to figure out which values match at the same position.
The code below returns the matched values, but it does not return the position of the match.
a = [5,0]
b = [5,1]
print(list(set(a).intersection(set(b))))
>> [5]
Use zip and enumerate and check for unique values:
lists = [a, b] # add more lists here if need be...
for idx, items in enumerate(zip(*lists)):
    unique = set(items)
    if len(unique) == 1:
        # idx = position, unique.pop() == the value
        print(idx, unique.pop())
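The zip-and-enumerate idea can also be packaged as a generator that yields the position together with the value. This is only a sketch in Python 3: the name matching_positions is invented here, and it compares with == rather than building a set, so it also works when the elements are not hashable:

```python
def matching_positions(*lists):
    """Yield (index, value) wherever all lists hold the same value."""
    # zip stops at the shortest list, so extra trailing items are ignored
    for index, items in enumerate(zip(*lists)):
        first = items[0]
        if all(item == first for item in items[1:]):
            yield index, first

a = [5, 0]
b = [5, 1]
print(list(matching_positions(a, b)))  # [(0, 5)]
```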
def check_equal(lst):
    # True when every element equals its neighbour, i.e. all elements are equal
    return lst[1:] == lst[:-1]
def get_position_and_matches(*lists):
    # only positions present in every list can match
    shortest_list = min(lists, key=len)
    for index, item in enumerate(shortest_list):
        matching = [l[index] for l in lists]
        if check_equal(matching):
            print("Index: {0}, Value: {1}".format(index, item))
one = [1, 3, 4, 6, 2]
two = [1, 3, 4, 2, 9, 9]
three = [2, 3, 4]
get_position_and_matches(one, two, three)
This will show you the position of each match (assuming the value None is not a valid element):
from functools import reduce
a=[1,2,3]
b=[0,2,3]
c=[3,2,1]
l = [a, b, c] # add any number of lists
z = list(zip(*l)) # materialize so z can be iterated more than once
pos = 0
for i in z:
    # compare against None explicitly so that falsy matches such as 0 still count
    if reduce(lambda x, y: x if x == y else None, i) is not None:
        print(pos)
    pos += 1
or, if you wanted to keep the match for each position:
matches = [reduce(lambda x, y: x if x == y else None, i) for i in z]
would produce
[None, 2, None]
You could write your own method:
a = [1, 2, 3, 4, 5]
b = [5, 4, 3, 2, 1]
c = [3, 3, 3, 3, 3]
allLists = [b, c] # all lists but the first
for i in range(len(a)):
    good = True
    for l in allLists:
        if l[i] != a[i]:
            good = False
            break
    if good:
        print(i, a[i])
edited to make it easier to add more lists
A simple list comprehension can be used to test each individual value of a, b and c:
matching_indexes = [i for i in range(len(a)) if a[i] == b[i] == c[i]]
Add or remove == comparisons to match the number of lists being compared. However, this assumes that all the lists are the same length, or that a is the shortest list.
This works for as many lists as you want to use (it assumes -1 never occurs as a value, since -1 marks a mismatch):
a = [5,0,1,2]
b = [5,2,3,2]
lists = [a,b,b,a,a]
d = dict()
for l in lists:
    for i in range(len(a)):
        if i not in d:
            d[i] = l[i] # the first list seen sets the reference value
        elif d[i] != l[i]:
            d[i] = -1 # mark this position as mismatched
for i in d:
    if d[i] != -1:
        print(d[i], i)

List comprehension won't return expected output

I'm trying to solve Google's Python Basic Exercises, and I tried solving this particular one about lists with a list comprehension:
# D. Given a list of numbers, return a list where
# all adjacent == elements have been reduced to a single element,
# so [1, 2, 2, 3] returns [1, 2, 3]. You may create a new list or
# modify the passed in list.
def remove_adjacent(nums):
    newList = []
    newList = [i for i in nums if len(newList) == 0 or nums[i] != newList[-1]]
    return newList
Obviously, output is not what I expected, and the author-made test function underlines this:
got: [2, 2, 3, 3, 3] expected [2, 3]
got: [1, 2, 2, 3] expected [1, 2, 3]
What's wrong with my function?
The problem with your code is that the newList you are referring to inside the list comprehension expression always stays the same empty list [] as you assigned it initially. The expression [i for i in nums if len(newList) == 0 or nums[i] != newList[-1]] is calculated first using the existing variable, and only then the result is assigned to newList.
In other words, your code is equivalent to
def remove_adjacent(nums):
    newList = []
    otherList = [i for i in nums if len(newList) == 0 or nums[i] != newList[-1]]
    return otherList
You don't have to use list comprehension to solve this problem (and personally I wouldn't because it gets tricky in this case).
def adj(l):
    if len(l) in {0,1}: # check for empty or list with 1 element
        return l
    return [ele for ind, ele in enumerate(l[:-1]) if ele != l[ind+1]] + [l[-1]]
The condition if ele != l[ind+1] checks the current element against the element at the next index in the list. We iterate only over l[:-1] so that l[ind+1] does not raise an IndexError; because of this, we need to add the last element, l[-1], to the result.
In [44]: l = [1, 2, 2, 3]
In [45]: adj(l)
Out[45]: [1, 2, 3]
In [46]: l = [1, 2, 2, 3,2]
In [47]: adj(l)
Out[47]: [1, 2, 3, 2]
In [48]: l = [2,2,2,2,2]
In [49]: adj(l)
Out[49]: [2]
Using your own code, you would need a for loop. When newList is assigned the result of the list comprehension, you are not updating the list from your original assignment; you have rebound the name to the list the comprehension produces, which is a completely new object:
def remove_adjacent(nums):
    if len(nums) in {0,1}: # catch empty and single element list
        return nums
    newList = [nums[0]] # add first element to avoid index error with `newList[-1]`
    for i in nums[1:]: # start at second element and iterate over the element
        if i != newList[-1]:
            newList.append(i)
    return newList
In [1]: l = [] # assign l to empty list
In [2]: id(l)
Out[2]: 140592635860536 # object id
In [3]: l = [x for x in range(2)] # reassign
In [4]: id(l)
Out[4]: 140592635862264 # new id new object
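For completeness, the standard library already groups consecutive equal items: itertools.groupby yields one key per run, so keeping only the keys collapses adjacent duplicates. A short sketch of that alternative (it is not the approach from the exercise or from the answers above):

```python
from itertools import groupby

def remove_adjacent(nums):
    """Collapse each run of equal adjacent elements to a single element."""
    # groupby yields (key, group) once per run of consecutive equal items
    return [key for key, _group in groupby(nums)]

print(remove_adjacent([1, 2, 2, 3]))     # [1, 2, 3]
print(remove_adjacent([2, 2, 3, 3, 3]))  # [2, 3]
print(remove_adjacent([]))               # []
```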

In Python, how can I get the intersection of two lists, preserving the order of the intersection?

I have a list of lists ("sublists") and I want to see if the same sequence of any unspecified length occurs in more than one sublist. To clarify, the order of items must be preserved - I do not want the intersection of each sublist as a set. There must be at least 2 items that match sequentially. Please see example below.
Input:
someList = [[0,1,3,4,3,7,2],[2,3,4,3],[0,3,4,3,7,3]]
Desired Output: (will be printed to file but don't worry about this detail)
sublist0_sublist1 = [3,4,3] #intersection of 1st and 2nd sublists
sublist0_sublist2 = [3,4,3,7] #intersection of 1st and 3rd sublists
sublist1_sublist2 = [3,4,3] #intersection of 2nd and 3rd sublists
Whipped this up for you (including your comment that equal-length maximum sublists should all be returned in a list):
def sublists(list1, list2):
    subs = []
    for i in range(len(list1)-1):
        for j in range(len(list2)-1):
            if list1[i]==list2[j] and list1[i+1]==list2[j+1]:
                m = i+2
                n = j+2
                while m<len(list1) and n<len(list2) and list1[m]==list2[n]:
                    m += 1
                    n += 1
                subs.append(list1[i:m])
    return subs
def max_sublists(list1, list2):
    subls = sublists(list1, list2)
    if len(subls)==0:
        return []
    else:
        max_len = max(len(subl) for subl in subls)
        return [subl for subl in subls if len(subl)==max_len]
This works all right for these cases:
In [10]: max_sublists([0,1,3,4,3,7,2],[0,3,4,3,7,3])
Out[10]: [[3, 4, 3, 7]]
In [11]: max_sublists([0,1,2,3,0,1,3,5,2],[1,2,3,4,5,1,3,5,3,7,3])
Out[11]: [[1, 2, 3], [1, 3, 5]]
It's not pretty though, nor is it really fast.
You only have to figure out how to compare every sublist in your original list of sublists, but that should be easy.
[Edit: I fixed a bug and prevented your error from occurring.]
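The remaining step, comparing every pair of sublists, could be sketched with itertools.combinations. To keep the example self-contained it uses its own compact helper, common_runs, which is a hypothetical name and not the answer's max_sublists; it follows the same brute-force idea, keeps ties, and skips duplicate runs:

```python
from itertools import combinations

def common_runs(list1, list2, min_len=2):
    """Return the longest contiguous runs shared by both lists (ties kept)."""
    runs = []
    for i in range(len(list1)):
        for j in range(len(list2)):
            k = 0  # length of the run starting at list1[i] and list2[j]
            while (i + k < len(list1) and j + k < len(list2)
                   and list1[i + k] == list2[j + k]):
                k += 1
            if k >= min_len and list1[i:i + k] not in runs:
                runs.append(list1[i:i + k])
    if not runs:
        return []
    longest = max(map(len, runs))
    return [run for run in runs if len(run) == longest]

some_list = [[0, 1, 3, 4, 3, 7, 2], [2, 3, 4, 3], [0, 3, 4, 3, 7, 3]]
# compare every pair of sublists exactly once
pairs = {
    (i, j): common_runs(some_list[i], some_list[j])
    for i, j in combinations(range(len(some_list)), 2)
}
for (i, j), runs in pairs.items():
    print(f"sublist{i}_sublist{j} = {runs}")
# sublist0_sublist1 = [[3, 4, 3]]
# sublist0_sublist2 = [[3, 4, 3, 7]]
# sublist1_sublist2 = [[3, 4, 3]]
```

Like the answer above, this is a brute-force comparison and is not meant to be fast on long lists.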
