Extract unique list from nested list in Python

I want to extract the unique sublists from a nested list (see below). I implemented this in two ways. The first works, but the second fails. Is new_data empty while the comprehension is being evaluated? And how do I fix it?
data = [
['a', 'b'],
['a', 'c'],
['a', 'b'],
['b', 'a']
]
# working
new_data = []
for d in data:
    if d not in new_data:
        new_data.append(d)
print(new_data)
# [['a', 'b'], ['a','c'], ['b','a']]
# Failed to extract unique list
new_data = []
new_data = [d for d in data if d not in new_data]
print(new_data)
# [['a', 'b'], ['a', 'c'], ['a', 'b'], ['b', 'a']]

Just try:
new_data = [list(y) for y in set([tuple(x) for x in data])]
You cannot use set() on a list of lists because lists are not hashable. Instead, convert the list of lists into a list of tuples, apply set() to remove the duplicates, and then convert the deduplicated tuples back into lists. Note that a set does not preserve the original order of the rows.

You could use enumerate to check that there are no copies before the current value, so that only the first instance of each item is taken:
new_data = [item for index, item in enumerate(data) if item not in data[:index]]
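For context on why the comprehension in the question returns every row: the right-hand side is evaluated in full while new_data still refers to the empty list, and the name is rebound only afterwards. Below is a minimal sketch of an order-preserving alternative that avoids rescanning the list for every item. It assumes the inner elements are hashable, and the names seen and key are introduced here for illustration only.

```python
data = [
    ['a', 'b'],
    ['a', 'c'],
    ['a', 'b'],
    ['b', 'a'],
]

seen = set()      # tuples of the rows already kept (hypothetical helper name)
new_data = []
for row in data:
    key = tuple(row)          # lists are not hashable, tuples are
    if key not in seen:
        seen.add(key)
        new_data.append(row)  # keep the first occurrence, in original order

print(new_data)
# [['a', 'b'], ['a', 'c'], ['b', 'a']]
```

The set lookup is constant time on average, so this should scale better than the slice-based check when data is long.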

Related

I have a list and numpy array of lists. How to find the index and extract the value from index?

I have a list,
list = ["A","B","C","D","E"]
values = np.array([[1,0,0,1,1],[0,1,0,0,1]])
where values is of type numpy array.
I want my output to look like,
["A","D","E"]
["B","E"]
I want to loop through every list inside values and find the indices of the elements equal to 1. Using those indices, I want to get the names at the same positions in the list and store them as a list inside a DataFrame. This has to be done for every list inside values.
Kindly help. Thanks
Try a list comprehension:
l = ["A","B","C","D","E"]
values = ([[1,0,0,1,1],[0,1,0,0,1]])
print([[x for x, y in zip(l, i) if y] for i in values])
Output:
[['A', 'D', 'E'], ['B', 'E']]
Try itertools.compress:
>>> import numpy as np
>>> from itertools import compress
>>> list_ = ["A","B","C","D","E"]
>>> values = np.array([[1,0,0,1,1],[0,1,0,0,1]])
>>> result = [[*compress(list_,val)] for val in values]
>>> print(*result, sep = '\n')
['A', 'D', 'E']
['B', 'E']
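The question also asks for the matching indices themselves. A minimal pure-Python sketch that collects the indices first and then looks up the names follows. The names names, rows, index_lists and selected are illustrative, and plain nested lists stand in for the NumPy array here (iterating over a 2-D array yields its rows in the same way).

```python
names = ["A", "B", "C", "D", "E"]
rows = [[1, 0, 0, 1, 1], [0, 1, 0, 0, 1]]

# Positions whose flag equals 1, one list of indices per row.
index_lists = [[i for i, flag in enumerate(row) if flag == 1] for row in rows]

# Map each index back to the name at the same position.
selected = [[names[i] for i in idxs] for idxs in index_lists]

print(index_lists)  # [[0, 3, 4], [1, 4]]
print(selected)     # [['A', 'D', 'E'], ['B', 'E']]
```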

combining list of list with addition of value when list items are missing

I need to combine lists of lists and add items when they are not present in the other lists.
I need a way to represent that a value is missing from a list across three lists of lists. In the example below, ['e','f'] is only in list one (l1), so a placeholder would be added to list two (l2) and list three (l3) to show that it is in list one but not in lists two and three. The placeholder would be something like ['e','-'].
l1 = [['a', 'b'],['e','f']]
l2 = [['a', 'b'],['c', 'd']]
l3 = [['a', 'b'],['c', 'd']]
So in the end, every list would have an entry for any list item that had a unique value in the first position of a list item.
l1 = [['a', 'b'],['c','-'],['e','f']]
l2 = [['a', 'b'],['c','d'],['e','-']]
l3 = [['a', 'b'],['c','d'],['e','-']]
I tried converting the list of lists to sets and could find common objects, for example
l1_set = set(tuple(row) for row in l1)
l2_set = set(tuple(row) for row in l2)
l3_set = set(tuple(row) for row in l3)
print (l1_set & l2_set & l3_set)
set([('a', 'b')])
I am not sure how to manage the sets so I can find different values and modify the lists to include those different values, while keeping position in the list.
The order does matter. I don't want to simply append missing list items like this:
l1 = [['a', 'b'],['e','f'],['c','-']]
l2 = [['a', 'b'],['c','d'],['e','-']]
l3 = [['a', 'b'],['c','d'],['e','-']]
The function add_missing_items() accepts any number (>0) of lists and adds to them the items that are missing from the other lists. It assumes that the items inside the lists can be sorted alphabetically:
l1 = [['a', 'b'],['e','f']]
l2 = [['a', 'b'],['c', 'd']]
l3 = [['a', 'b'],['c', 'd']]
def add_missing_items(*lists):
    l_sets = [set(tuple(row) for row in l) for l in lists]
    u = l_sets[0].union(*l_sets[1:])
    for lst, slst in zip(lists, l_sets):
        l = [list(v) for v in u.difference(slst)]
        for missing_value in l:
            missing_value[1::1] = '-' * (len(missing_value)-1)
        lst[:] = sorted(lst + l)
add_missing_items(l1, l2, l3)
print(l1)
print(l2)
print(l3)
Prints:
[['a', 'b'], ['c', '-'], ['e', 'f']]
[['a', 'b'], ['c', 'd'], ['e', '-']]
[['a', 'b'], ['c', 'd'], ['e', '-']]
You can test if a couple in a list is in another list like this:
l1 = [['a', 'b'],['e','f']]
l2 = [['a', 'b'],['c', 'd']]
l3 = [['a', 'b'],['c', 'd']]
for couple in l1:
    if couple not in l2:
        print(couple)
The example above only shows how to find the elements of l1 that are not in l2. You can modify a list at that point instead of just printing.
If you do want to modify your lists, I advise you to copy them first rather than modifying the originals directly.
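A sketch of a variant that follows the copy-first advice and matches rows by their first element only, as the question describes. It assumes each first element appears at most once per list and that the first elements can be sorted. The function name fill_missing_by_key and its placeholder parameter are hypothetical, not taken from the answers above.

```python
def fill_missing_by_key(*lists, placeholder="-"):
    """Return new lists in which every first-position key appears once per list."""
    # Every key seen in any list, in sorted order.
    all_keys = sorted({row[0] for lst in lists for row in lst})
    filled = []
    for lst in lists:
        by_key = {row[0]: row for row in lst}  # assumes keys are unique within a list
        filled.append([by_key.get(key, [key, placeholder]) for key in all_keys])
    return filled


l1 = [['a', 'b'], ['e', 'f']]
l2 = [['a', 'b'], ['c', 'd']]
l3 = [['a', 'b'], ['c', 'd']]

new_l1, new_l2, new_l3 = fill_missing_by_key(l1, l2, l3)
print(new_l1)  # [['a', 'b'], ['c', '-'], ['e', 'f']]
print(new_l2)  # [['a', 'b'], ['c', 'd'], ['e', '-']]
print(new_l3)  # [['a', 'b'], ['c', 'd'], ['e', '-']]
```

Because new outer lists are returned, l1, l2 and l3 keep their original contents.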

Find index element in a list of lists and strings

I have a list containing strings and lists. Something like:
l = ['a', 'b', ['c', 'd'], 'e']
I need to find the index of an element I'm looking for in this nested list. For instance, if I need to find c, the function should return 2, and if I'm looking for d, it should return 2 too. Consider that I have to do this for a large number of elements. Before I was simply using
idx = list.index(element)
but this does not work anymore, because of the nested lists. I cannot simply flatten the list, as I then shall use the index in another list with the same shape as this one.
Any suggestion?
This is one approach: iterate over the list.
Ex:
l = ['a', 'b', ['c', 'd'], 'e']
toFind = "c"
toFind1 = "d"
for i, v in enumerate(l):
    if isinstance(v, list):
        if toFind1 in v:
            print(i)
    else:
        if toFind1 == v:
            print(i)
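Since the question mentions a large number of lookups, one option is to build a dictionary once and reuse it. This is a minimal sketch assuming the elements are hashable and that only one level of nesting occurs. The function name build_index is hypothetical.

```python
def build_index(nested):
    """Map every element to the top-level index where it first appears."""
    index = {}
    for i, item in enumerate(nested):
        members = item if isinstance(item, list) else [item]
        for member in members:
            index.setdefault(member, i)  # keep the first position on repeats
    return index


l = ['a', 'b', ['c', 'd'], 'e']
lookup = build_index(l)
print(lookup['c'], lookup['d'], lookup['e'])  # 2 2 3
```

Each later lookup is then a single dictionary access instead of a scan over the whole list.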

restructure lists of lists by iteration in Python

I have a 2D list or list of lists.
Input file is
A 58.76-65.9
B 58.76-65.9
C 58.76-65.9
A 24.8-62.8
I then created a list of lists:
with open("Input.txt", "r") as file:
    raw = [[str(x) for x in line.split()] for line in file]
print (raw)
which returns
[['A', '58.76-65.9'], ['B', '58.76-65.9'], ['C', '58.76-65.9'], ['A', '24.8-62.8']]
My aim is now to create a new list of lists with a new structure. How can I obtain a new list of lists like this?
[['58.76-65.9', 'A', 'B', 'C'], ['A', '24.8-62.8']]
I first tried unioning sets, but that creates one large list and I need lists of lists. Therefore my plan is to (1) Create a new empty list of lists,
(2) iterate through the original list of lists,
(3) check if the 2nd element (i.e. 58.76-65.9) exists in the new list of lists. If it does not, add both elements. If it does, add just the first element (i.e. A)
# Defining empty list
matches=[]
# Accesing each row in the 2d list
for r in raw:
    if r[1] not in matches[0][]:
        matches.append([r[1], r[0]])
I realize that matches[0][] is not correct, what is the correct way to access it?
Use the grouping idiom:
>>> data = [['A', '58.76-65.9'], ['B', '58.76-65.9'], ['C', '58.76-65.9'], ['A', '24.8-62.8']]
>>> from collections import defaultdict
>>> grouper = defaultdict(list)
>>> for x, y in data:
...     grouper[y].append(x)
...
>>> grouper
defaultdict(<class 'list'>, {'24.8-62.8': ['A'], '58.76-65.9': ['A', 'B', 'C']})
Now, I honestly think the above data-structure is much more practical, but you can easily convert into a list-of-lists if you really want:
>>> [[k] + v for k, v in grouper.items()]
[['24.8-62.8', 'A'], ['58.76-65.9', 'A', 'B', 'C']]
Or even nicer:
>>> [[k, *v] for k, v in grouper.items()]
[['24.8-62.8', 'A'], ['58.76-65.9', 'A', 'B', 'C']]
Just use the dictionary data structure. It does what you want:
# Load data:
my_array = [[1, 10], [2, 10], [3, 20]]
# Result as a dictionary:
result = {}
# Loop over data:
for value, key in my_array:
    if key not in result:
        # Create new list
        result[key] = []
    result[key].append(value)
# If you really need a flat list of lists as output, do something like:
result_l = [[key, *values] for key, values in result.items()]
# (in python3) gives [[10, 1, 2], [20, 3]]
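As for the literal question about matches[0][]: here is a minimal sketch of the plan described in the question, scanning the existing groups for one whose first element equals the interval. The for/else form and the names name, interval and group are illustrative choices, not taken from the answers above.

```python
raw = [['A', '58.76-65.9'], ['B', '58.76-65.9'],
       ['C', '58.76-65.9'], ['A', '24.8-62.8']]

matches = []
for name, interval in raw:
    for group in matches:
        if group[0] == interval:   # a group for this interval already exists
            group.append(name)
            break
    else:
        # The inner loop finished without break: start a new group.
        matches.append([interval, name])

print(matches)
# [['58.76-65.9', 'A', 'B', 'C'], ['24.8-62.8', 'A']]
```

This rescans matches for every row, so the dictionary-based grouping is likely the better choice for large inputs.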

Would like to prevent dupes in a python list of lists

I am creating a list of lists and want to prevent dupes. For example, I have:
mainlist = [[a,b],[c,d],[a,d]]
the next item (list) to be added is [b,a] which is considered a duplicate of [a,b].
UPDATE
mainlist = [[a,b],[c,d],[a,d]]
swap = [b,a]
for item in mainlist:
    if set(item) & set(swap):
        print "match was found", item
    else:
        mainlist.append(swap)
Any suggestions as to how I can test whether the next item to be added is already in the list?
Here's an approach using frozensets within a set to check for duplicates. It's a bit ugly since I'm invoking a function that works with global variables.
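def add_to_mainlist(new_list):
    # Uses the module-level mainlist and dups defined below.
    key = frozenset(new_list)
    if key not in dups:
        mainlist.append(new_list)
        dups.add(key)  # record it so a later duplicate is rejected too
mainlist = [['a', 'b'],['c', 'd'],['a', 'd']]
dups = set()
for l in mainlist:
    dups.add(frozenset(l))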
print("Before:", mainlist)
add_to_mainlist(['a', 'b'])
print("After:", mainlist)
This outputs:
Before: [['a', 'b'], ['c', 'd'], ['a', 'd']]
After: [['a', 'b'], ['c', 'd'], ['a', 'd']]
Showing that the new list was indeed not added to the original.
Here's a cleaner version that calculates the existing set on the fly inside a function that does everything locally:
def add_to_mainlist(mainlist, new_list):
    dups = set()
    for l in mainlist:
        dups.add(frozenset(l))
    if frozenset(new_list) not in dups:
        mainlist.append(new_list)
    return mainlist
mainlist = [['a', 'b'],['c', 'd'],['a', 'd']]
print("Before:", mainlist)
mainlist = add_to_mainlist(mainlist, ['a', 'b']) # the assignment isn't needed, but done anyway :-)
print("After:", mainlist)
Why doesn't your existing code work?
This is what you're doing:
...
for item in mainlist:
    if set(item) & set(swap):
        print "match was found", item
    else:
        mainlist.append(swap)
You're intersecting two sets and checking the truthiness of the result. That works when the lists share no elements, but if even one element is common (for example, ['a', 'b'] and ['b', 'd']), you'd still declare a match, which is wrong.
Ideally you'd check the length of the resulting set and make sure it is equal to 2:
dups = False
for item in mainlist:
    if len(set(item) & set(swap)) == 2:
        dups = True
        break
if not dups:
    mainlist.append(swap)
You'd also ideally want a flag to ensure that you did not find duplicates. Your previous code would add without checking all items first.
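The length check above assumes two-element lists. A sketch of a more general variant compares the sets for equality with any(), assuming that element order and repeats inside an item do not matter. The function name add_if_new is hypothetical.

```python
mainlist = [['a', 'b'], ['c', 'd'], ['a', 'd']]


def add_if_new(target, candidate):
    """Append candidate unless an existing item has the same set of elements."""
    if any(set(item) == set(candidate) for item in target):
        return False
    target.append(candidate)
    return True


added_swap = add_if_new(mainlist, ['b', 'a'])   # same elements as ['a', 'b']
added_other = add_if_new(mainlist, ['b', 'd'])  # only partial overlap with existing items
print(added_swap, added_other)  # False True
print(mainlist)
# [['a', 'b'], ['c', 'd'], ['a', 'd'], ['b', 'd']]
```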
If the order of your inner lists doesn't matter, then this can trivially be accomplished using frozenset()s:
>>> mainlist = [['a', 'b'],['c', 'd'],['a', 'd']]
>>> mainlist = [frozenset(sublist) for sublist in mainlist]
>>>
>>> def add_to_list(lst, sublist):
...     if frozenset(sublist) not in lst:
...         lst.append(frozenset(sublist))
...
>>> mainlist
[frozenset({'a', 'b'}), frozenset({'d', 'c'}), frozenset({'a', 'd'})]
>>> add_to_list(mainlist, ['b', 'a'])
>>> mainlist
[frozenset({'a', 'b'}), frozenset({'d', 'c'}), frozenset({'a', 'd'})]
>>>
If the order does matter, you can either do what @Coldspeed suggested (construct a set() from your list, construct a frozenset() from the list to be added, and test for membership) or use all() and sorted() to test whether the list to be added is equivalent to any of the other lists:
>>> def add_to_list(lst, sublist):
...     for l in lst:
...         if all(a == b for a, b in zip(sorted(sublist), sorted(l))):
...             return
...     lst.append(sublist)
...
>>> mainlist
[['a', 'b'], ['c', 'd'], ['a', 'd']]
>>> add_to_list(mainlist, ['b', 'a'])
>>> mainlist
[['a', 'b'], ['c', 'd'], ['a', 'd']]
>>>
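If many items are added over time, one possible arrangement is to keep the frozenset keys next to the list so that each check is a single set lookup and the inner lists keep their original order. This is only a sketch; the class name UniquePairs and its attributes are hypothetical. Note that frozenset ignores repeats, so ['a', 'a'] and ['a'] would count as the same item.

```python
class UniquePairs:
    """A list of lists that rejects items whose elements were already added."""

    def __init__(self, items=()):
        self.items = []     # the inner lists, in insertion order
        self._seen = set()  # frozenset keys of everything in self.items
        for item in items:
            self.add(item)

    def add(self, item):
        key = frozenset(item)
        if key in self._seen:
            return False    # duplicate regardless of element order
        self._seen.add(key)
        self.items.append(item)
        return True


pairs = UniquePairs([['a', 'b'], ['c', 'd'], ['a', 'd']])
first = pairs.add(['b', 'a'])   # rejected, same elements as ['a', 'b']
second = pairs.add(['b', 'd'])  # accepted
print(first, second)  # False True
print(pairs.items)
# [['a', 'b'], ['c', 'd'], ['a', 'd'], ['b', 'd']]
```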
