How to remove duplicate lists from a 'list of lists'?

I have a list of lists (I'm relatively new to Python, so excuse me if the terms are inaccurate, but see the example below) and want to remove any duplicate lists.
In this example, entries 1 & 4 and 3 & 5 are identical, and one of each pair should be removed.
List = [[1, 'A', 6, 2], [8, 'C', 6, 2], [3, 'G', 3, 4], [1, 'A', 6, 2], [3, 'G', 3, 4], [3, 'B', 3, 4]]
Desired result:
[[1, 'A', 6, 2], [8, 'C', 6, 2], [3, 'G', 3, 4], [3, 'B', 3, 4]]
I currently have the following for loop that reads through the list and removes duplicates, but it is very slow. My real code is much longer and the input list is much more complicated than in this example, so the code runs for days.
unique = []
for i in cohesiveFaceNodes:
    if i not in unique:
        unique.append(i)
cohesiveFaceNodes = unique

De-duping while preserving order (dicts keep insertion order in CPython 3.6+, which is guaranteed by the language from Python 3.7):
>>> lst = [[1, 'A', 6, 2], [8, 'C', 6, 2], [3, 'G', 3, 4],
... [1, 'A', 6, 2], [3, 'G', 3, 4], [3, 'B', 3, 4]]
>>> [list(x) for x in dict.fromkeys(map(tuple, lst))]
[[1, 'A', 6, 2], [8, 'C', 6, 2], [3, 'G', 3, 4], [3, 'B', 3, 4]]

If you can convert the inner lists into tuples, there is a very simple one-liner for this:
# use a list of tuples instead of a list of lists for this method to work
input_list = [(1, 'A', 6, 2), (8, 'C', 6, 2), (3, 'G', 3, 4), (1, 'A', 6, 2), (3, 'G', 3, 4), (3, 'B', 3, 4)]
deduped_list = list(dict.fromkeys(input_list))  # remove dupes, return new list of tuples
Edit: a quick way to convert your existing list of lists to a list of tuples is a list comprehension:
input_list = [tuple(e) for e in input_list]
Edit 2: if for some reason you really need a list of lists afterwards, once again list comprehensions come to the rescue:
final_list = [list(e) for e in deduped_list]

Testing whether something is an element of a list (i in unique) is quite expensive (it iterates the list element by element until it finds a match or the list is exhausted). To check for element membership a data structure such as a set is much more efficient. So making unique a set rather than a list would help.
Now there's a small hurdle: Python sets don't support lists as members, because lists are mutable and not hashable. Assuming the elements in each of the inner lists are hashable, though, you can convert them to Python tuples (which are similar to lists but immutable) and then they can be elements of sets.
So one solution could be (I'm reusing the original variable names, though I think they're not ideal and I recommend changing them):
unique = set()
result = []
for i in cohesiveFaceNodes:
    i_as_tuple = tuple(i)
    if i_as_tuple not in unique:
        unique.add(i_as_tuple)
        result.append(i)
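As a rough check of the approach above, the sketch below wraps the original list-based loop and the set-based loop in two functions and confirms that they return the same result. The function names dedupe_with_list and dedupe_with_set are made up for illustration, and the set version assumes every element of the inner lists is hashable.

```python
def dedupe_with_list(rows):
    """Original approach: each membership test scans a list."""
    unique = []
    for row in rows:
        if row not in unique:
            unique.append(row)
    return unique


def dedupe_with_set(rows):
    """Set-based approach: membership test on a set of tuples."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row)  # lists are unhashable, tuples are hashable
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


rows = [[1, 'A', 6, 2], [8, 'C', 6, 2], [3, 'G', 3, 4],
        [1, 'A', 6, 2], [3, 'G', 3, 4], [3, 'B', 3, 4]]

print(dedupe_with_set(rows))
# [[1, 'A', 6, 2], [8, 'C', 6, 2], [3, 'G', 3, 4], [3, 'B', 3, 4]]
print(dedupe_with_list(rows) == dedupe_with_set(rows))  # True
```

Both versions keep the first occurrence of each inner list and preserve the input order; only the cost of the membership test differs.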

For better coding practice and readability, it may be better to use a dataclass to store this data. You can explicitly name each entry of the inner list for more clarity. A dataclass offers built-in equality comparison, like the tuple-based methods in the other answers.
from dataclasses import dataclass

@dataclass(unsafe_hash=True)
class AClass:
    some_int: int
    some_chr: str
    int2: int
    int3: int

lst = [[1, 'A', 6, 2], [8, 'C', 6, 2], [3, 'G', 3, 4],
       [1, 'A', 6, 2], [3, 'G', 3, 4], [3, 'B', 3, 4]]
new_lst = [AClass(*x) for x in lst]
deduped_list = list(dict.fromkeys(new_lst))
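A possible variation on the dataclass idea, offered as a sketch rather than a replacement: declaring the class with frozen=True makes instances immutable and lets dataclass generate __hash__ without needing unsafe_hash. The class name Row and its field names are hypothetical; dataclasses.astuple is used here to turn the results back into plain lists.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)  # frozen plus the default eq=True yields a generated __hash__
class Row:
    # Hypothetical field names; pick names that describe your real data.
    some_int: int
    some_chr: str
    int2: int
    int3: int


lst = [[1, 'A', 6, 2], [8, 'C', 6, 2], [3, 'G', 3, 4],
       [1, 'A', 6, 2], [3, 'G', 3, 4], [3, 'B', 3, 4]]

rows = [Row(*x) for x in lst]
deduped = list(dict.fromkeys(rows))             # order-preserving de-duplication
as_lists = [list(astuple(r)) for r in deduped]  # back to a list of lists
print(as_lists)
# [[1, 'A', 6, 2], [8, 'C', 6, 2], [3, 'G', 3, 4], [3, 'B', 3, 4]]
```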

Related

Slicing 2D Python List

Let's say I have a list:
list = [[1, 2, 3, 4],
['a', 'b', 'c', 'd'],
[9, 8, 7, 6]]
and I would like to get something like:
newList = [[2, 3, 4],
['b', 'c', 'd'],
[8, 7, 6]]
hence I tried going with this solution
print(list[0:][1:])
But I get this output
[['a', 'b', 'c', 'd'],
[9, 8, 7, 6]]
Therefore I tried
print(list[1:][0:])
but I get precisely the same result.
I did some research and experimenting on this specific subject, but without any result.

You want every element from index 1 to the end of each row in your matrix.
mylist = [[1, 2, 3, 4],
['a', 'b', 'c', 'd'],
[9, 8, 7, 6]]
new_list = [row[1:] for row in mylist]

Let me explain what these two lines actually do:
print(list[0:][1:])
print(list[1:][0:])
First, note that Python indices start at 0, i.e. [1, 2, 3] has a 0th, a 1st and a 2nd element.
[0:] means "take the list elements starting at the 0th element", which gives you a copy of the list; [1:] means "take the list elements starting at the 1st element", which gives you the list with all but the 0th element. Therefore both lines are equivalent to each other and to
print(list[1:])
You can get the desired output using a comprehension or map, as follows:
list1 = [[1, 2, 3, 4], ['a', 'b', 'c', 'd'], [9, 8, 7, 6]]
list2 = list(map(lambda x:x[1:],list1))
print(list2)
Output:
[[2, 3, 4], ['b', 'c', 'd'], [8, 7, 6]]
lambda here creates a nameless (anonymous) function. Note that the comprehension is more readable, but map might be easier to digest if you have previously worked with a language that has a similar feature, e.g. JavaScript's map.
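The chained-slice behaviour described above can be checked directly. This is a small sketch; the variable is called rows (a name chosen here) so that the built-in list is not shadowed.

```python
rows = [[1, 2, 3, 4],
        ['a', 'b', 'c', 'd'],
        [9, 8, 7, 6]]

# Slicing the outer list only selects rows; it never looks inside them.
print(rows[0:][1:] == rows[1:])  # True
print(rows[1:][0:] == rows[1:])  # True

# To drop the first column, slice each row individually.
print([row[1:] for row in rows])
# [[2, 3, 4], ['b', 'c', 'd'], [8, 7, 6]]
```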

First - don't name your list "list"!
a = [[1, 2, 3, 4],
['a', 'b', 'c', 'd'],
[9, 8, 7, 6]]
b = [x[1:] for x in a]
print(b)
[[2, 3, 4], ['b', 'c', 'd'], [8, 7, 6]]

How to prepend an element to each list in a list

I have a list of lists:
x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
I want to prepend a string 'a' to each list in the list such that x becomes:
[['a', 1, 2, 3], ['a', 4, 5, 6], ['a', 7, 8, 9]]
What is the most pythonic way of achieving this?
What I've got so far:
[l.insert(0, 'a') for l in x]

Just concatenate what you want to prepend:
[['a'] + l for l in x]
# [['a', 1, 2, 3], ['a', 4, 5, 6], ['a', 7, 8, 9]]
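One difference worth noting, shown here as a small sketch: the concatenation above builds new inner lists and leaves x untouched, whereas insert(0, ...) modifies each inner list in place. A plain for loop is used for the in-place version because a list comprehension run only for its side effects builds a throwaway list of None values.

```python
x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Concatenation returns new lists; the originals are unchanged.
y = [['a'] + inner for inner in x]
print(y)  # [['a', 1, 2, 3], ['a', 4, 5, 6], ['a', 7, 8, 9]]
print(x)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# insert() mutates each inner list in place and returns None.
for inner in x:
    inner.insert(0, 'a')
print(x)  # [['a', 1, 2, 3], ['a', 4, 5, 6], ['a', 7, 8, 9]]
```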

Prepending to a list is expensive (O(n)), because you need to move every element of the list to make room for the new item. (Appending, by contrast, is cheap.)
If this is something you need to do often for these lists, consider using a deque instead, which is optimized to support appending and prepending efficiently.
from collections import deque
x = [deque([1,2,3]), deque([4,5,6]), deque([7,8,9])]
for d in x:
    d.appendleft('a')
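For reference, a runnable version of the deque approach with its result, plus a conversion back to plain lists in case later code expects them. The name as_lists is chosen here for illustration.

```python
from collections import deque

x = [deque([1, 2, 3]), deque([4, 5, 6]), deque([7, 8, 9])]
for d in x:
    d.appendleft('a')  # prepends without shifting the existing elements

print(x[0])  # deque(['a', 1, 2, 3])

# Convert back if plain lists are needed afterwards.
as_lists = [list(d) for d in x]
print(as_lists)
# [['a', 1, 2, 3], ['a', 4, 5, 6], ['a', 7, 8, 9]]
```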

Create a new list in a list of lists that has the same length as the longest one

If I have the following list of lists:
data = [[1,2,3], [1,2,3,4,5], [1,2,3,4,5,6,7]]
Is there a way to create a list between the second and third one (preferably at any position) that has the same length as the longest list in this list of lists?
For example, in my case create a list between the second and third/last list that has the same length as the last one (since this is the longest list with length 7):
data = [[1,2,3], [1,2,3,4,5], [1,2,3,4,5,6,7], [1,2,3,4,5,6,7]]
I'm using this data in a dataframe with pandas. Maybe pandas can help me accomplish my goal?

First get the longest sublist, then create a new list of that length (by copying it or whatever you need). Then insert the new list into data at your desired position.
data = [[1,2,3], [1,2,3,4,5], [1,2,3,4,5,6,7]]
longest_sublist = max(data, key=len)
new_list = longest_sublist.copy()
desired_position = 2
data.insert(desired_position, new_list)
After which, data becomes:
[[1, 2, 3], [1, 2, 3, 4, 5], [1, 2, 3, 4, 5, 6, 7], [1, 2, 3, 4, 5, 6, 7]]

Since no specific elements for the new list have been provided, you could simply create a placeholder list with the same length as the longest sublist and insert it into data just before the last (longest) one:
data = sorted(data, key=len) # for when lists are not organized by len
new = ['x']*len(data[-1])
data.insert(-1, new)
# [[1, 2, 3], [1, 2, 3, 4, 5], ['x', 'x', 'x', 'x', 'x', 'x', 'x'], [1, 2, 3, 4, 5, 6, 7]]

It's actually quite simple:
def insert_sublist(list_of_lists, position):
    ideal_sized_list = list(range(max(len(i) for i in list_of_lists)))
    return list_of_lists[:position] + [ideal_sized_list] + list_of_lists[position:]
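A usage sketch for the function above, repeated here so the block runs on its own. The local names longest and filler are chosen for illustration. Note that this version returns a new outer list and leaves the input unchanged, and that the range() values are only filler for a list of the right length.

```python
def insert_sublist(list_of_lists, position):
    # Build a list as long as the longest sublist; range() is only filler.
    longest = max(len(sub) for sub in list_of_lists)
    filler = list(range(longest))
    # Slicing and concatenating returns a new outer list (input unchanged).
    return list_of_lists[:position] + [filler] + list_of_lists[position:]


data = [[1, 2, 3], [1, 2, 3, 4, 5], [1, 2, 3, 4, 5, 6, 7]]
result = insert_sublist(data, 2)
print(result)
# [[1, 2, 3], [1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6, 7]]
print(len(data))  # 3
```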

Sort two lists of lists by index of inner list [duplicate]

Assume I want to sort a list of lists as explained here:
>>> from operator import itemgetter
>>> L = [[0, 1, 'f'], [4, 2, 't'], [9, 4, 'afsd']]
>>> sorted(L, key=itemgetter(2))
[[9, 4, 'afsd'], [0, 1, 'f'], [4, 2, 't']]
(Or with lambda.) Now I have a second list which I want to sort in the same order, so I need the new order of the indices. sorted() or .sort() do not return indices. How can I do that?
Actually in my case both lists contain numpy arrays. But the numpy sort/argsort aren't intuitive for that case either.

If I understood you correctly, you want to order B in the example below, based on a sorting rule you apply on L. Take a look at this:
L = [[0, 1, 'f'], [4, 2, 't'], [9, 4, 'afsd']]
B = ['a', 'b', 'c']
result = [i for _, i in sorted(zip(L, B), key=lambda x: x[0][2])]
print(result) # ['c', 'a', 'b']
# that corresponds to [[9, 4, 'afsd'], [0, 1, 'f'], [4, 2, 't']]

If I understand correctly, you want to know how the list has been rearranged. i.e. where is the 0th element after sorting, etc.
If so, you are one step away:
L2 = [L.index(x) for x in sorted(L, key=itemgetter(2))]
which gives:
[2, 0, 1]
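As tobias points out, this is needlessly complex compared to sorting the enumerated list (in Python 3, wrap map in list() to get a list rather than an iterator):
list(map(itemgetter(0), sorted(enumerate(L), key=lambda x: x[1][2])))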
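Another way to get the new order of the indices, sketched here in plain Python, is to sort the index positions themselves by the key of the row they point to, similar in spirit to an argsort. The names order and B are chosen for illustration; B stands in for the second list.

```python
L = [[0, 1, 'f'], [4, 2, 't'], [9, 4, 'afsd']]
B = ['a', 'b', 'c']

# Sort the positions 0..len(L)-1 by the key of the row at each position.
order = sorted(range(len(L)), key=lambda i: L[i][2])
print(order)  # [2, 0, 1]

# Apply the same order to both lists.
print([L[i] for i in order])  # [[9, 4, 'afsd'], [0, 1, 'f'], [4, 2, 't']]
print([B[i] for i in order])  # ['c', 'a', 'b']
```

Unlike L.index(x), this does not rescan the list for every element and it still works when L contains equal rows.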

NumPy
Setup:
import numpy as np
L = np.array([[0, 1, 'f'], [4, 2, 't'], [9, 4, 'afsd']])
S = np.array(['a', 'b', 'c'])
Solution:
print(S[L[:, 2].argsort()])
Output:
['c' 'a' 'b']
Just Python
You could combine both lists, sort them together, and separate them again.
>>> L = [[0, 1, 'f'], [4, 2, 't'], [9, 4, 'afsd']]
>>> S = ['a', 'b', 'c']
>>> L, S = zip(*sorted(zip(L, S), key=lambda x: x[0][2]))
>>> L
([9, 4, 'afsd'], [0, 1, 'f'], [4, 2, 't'])
>>> S
('c', 'a', 'b')
I guess you could do something similar in NumPy as well...

How do you create a dictionary from nested lists in Python?

I was hoping that there might be some way to use a comprehension to do this. Say I have data that looks like this:
data = [['a', 'b', 'c'], [1, 2, 3], [4, 5, 6]]
My ultimate goal is to create a dictionary where the first nested list holds the keys and the remaining lists hold the values:
{'a': [1, 4], 'b': [2, 5], 'c': [3, 6]}
I have tried something like this that gets me close, but as you can tell I am having trouble appending the list in the dictionary values and this code is just overwriting:
d = {data[0][c]: [] + [col] for r, row in enumerate(data) for c, col in enumerate(row)}
>>> d
{'c': [6], 'a': [4], 'b': [5]}

You can use zip in a dict comprehension:
{z[0]: list(z[1:3]) for z in zip(*data)}
Out[16]: {'a': [1, 4], 'b': [2, 5], 'c': [3, 6]}
How it works:
zip will take the transpose:
list(zip(['a', 'b', 'c'], [1, 2, 3], [4, 5, 6]))
Out[19]: [('a', 1, 4), ('b', 2, 5), ('c', 3, 6)]
However, your data is a list of lists, so to make sure Python sees three separate lists rather than a single list, you need zip(*data) instead of zip(data).
list(zip(*data))
Out[13]: [('a', 1, 4), ('b', 2, 5), ('c', 3, 6)]
And in the dict comprehension you are taking the first elements as the keys and the remaining two as the values.
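The slice z[1:3] above fits exactly two value rows. As a hedged sketch of a more general form, starred unpacking can split each column into its key and however many values follow. The extra row [7, 8, 9] is an assumption added here only to show that behaviour.

```python
data = [['a', 'b', 'c'], [1, 2, 3], [4, 5, 6], [7, 8, 9]]

# zip(*data) yields one column at a time: ('a', 1, 4, 7), ('b', 2, 5, 8), ...
# Starred unpacking splits each column into its key and the remaining values.
d = {key: values for key, *values in zip(*data)}
print(d)  # {'a': [1, 4, 7], 'b': [2, 5, 8], 'c': [3, 6, 9]}
```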
