Consider the following simplified case:
lol = [['John','Polak',5,3,7,9],
['John','Polak',7,9,2,3],
['Mark','Eden' ,0,3,3,1],
['Mark','Eden' ,5,1,2,9]]
What would be a pythonic and memory+speed efficient way to transform this list-of-lists to a list-of-lists-of-lists based on the first two parameters:
lolol = [[['John','Polak',5,3,7,9],
['John','Polak',7,9,2,3]],
[['Mark','Eden' ,0,3,3,1],
['Mark','Eden' ,5,1,2,9]]]
Actually, any other data structure would also be OK, as long as I have the correct hierarchy. For example, the following dictionary structure comes to mind, but creating it doesn't seem speed-efficient enough, and its memory use would probably be higher than for the lolol solution.
dolol = {('John','Polak'):[[5,3,7,9],[7,9,2,3]],
('Mark','Eden') :[[0,3,3,1],[5,1,2,9]]}
List:
from itertools import groupby
lolol = [list(grp) for (match, grp) in groupby(lol, lambda lst: lst[:2])]
# [[['John', 'Polak', 5, 3, 7, 9], ['John', 'Polak', 7, 9, 2, 3]],
# [['Mark', 'Eden', 0, 3, 3, 1], ['Mark', 'Eden', 5, 1, 2, 9]]]
Dictionary:
dolol = dict((tuple(match), [x[2:] for x in grp]) for (match, grp) in
groupby(lol, lambda lst: lst[:2]))
# {('John', 'Polak'): [[5, 3, 7, 9], [7, 9, 2, 3]],
# ('Mark', 'Eden'): [[0, 3, 3, 1], [5, 1, 2, 9]]}
Since itertools.groupby only groups consecutive items with equal keys, this assumes that the input (lol) is sorted, or at least that rows sharing a key are adjacent.
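If the rows are not already grouped, one option is to sort by the same key before calling groupby. The following is a minimal sketch; the `rows` data and the `key` name are made up for illustration and are not part of the original question.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical unsorted input: rows with the same name are not adjacent.
rows = [['Mark', 'Eden', 0, 3, 3, 1],
        ['John', 'Polak', 5, 3, 7, 9],
        ['Mark', 'Eden', 5, 1, 2, 9],
        ['John', 'Polak', 7, 9, 2, 3]]

key = itemgetter(0, 1)  # (first name, surname)

# sorted() is stable, so rows within a group keep their original order.
grouped = [list(grp) for _, grp in groupby(sorted(rows, key=key), key=key)]
print(grouped)
```

Sorting costs O(n log n), so for large unsorted inputs a dictionary-based grouping may be the cheaper choice.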
If a dictionary is acceptable, this code will create one:
import collections
d = collections.defaultdict(list)
for name, surname, *stuff in lol:
    d[name, surname].append(stuff)
Note that this requires Python 3 (extended iterable unpacking). For Python 2, use
for x in lol:
    name = x[0]
    surname = x[1]
    stuff = x[2:]
    d[name, surname].append(stuff)
You may fold the variables to save lines.
To complement delnan's answer with a Python 2 equivalent:
from collections import defaultdict
dolol = defaultdict(list)
for data in lol:
    dolol[data[0], data[1]].append(data[2:])
I have a nested list as an example:
lst_a = [[1,2,3,5], [1,2,3,7], [1,2,3,9], [1,2,6,8]]
I'm trying to check if the first 3 indices of a nested list element are the same as other.
I.e.
if [1,2,3] exists in other lists, remove all the other nested list elements that contain that. So that the nested list is unique.
I'm not sure what the most pythonic way of doing this would be.
for i in range(0, len(lst_a)):
    if lst_a[i][:3] == lst_a[i-1][:3]:
        lst_a[i].pop()
Desired output:
lst_a = [[1,2,3,9], [1,2,6,8]]
If, as you said in comments, sublists that have the same first three elements are always next to each other (but the list is not necessarily sorted) you can use itertools.groupby to group those elements and then get the next from each of the groups.
>>> from itertools import groupby
>>> lst_a = [[1,2,3,5], [1,2,3,7], [1,2,3,9], [1,2,6,8]]
>>> [next(g) for k, g in groupby(lst_a, key=lambda x: x[:3])]
[[1, 2, 3, 5], [1, 2, 6, 8]]
Or use a list comprehension with enumerate and compare the current element with the last one:
>>> [x for i, x in enumerate(lst_a) if i == 0 or lst_a[i-1][:3] != x[:3]]
[[1, 2, 3, 5], [1, 2, 6, 8]]
This does not require any imports, but IMHO when using groupby it is much clearer what the code is supposed to do. Note, however, that unlike your method, both of those will create a new filtered list, instead of updating/deleting from the original list.
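If the original list object has to be updated rather than replaced, slice assignment is one way to do it. This is only a sketch; the `alias` name is introduced here to show that other references see the change.

```python
from itertools import groupby

lst_a = [[1, 2, 3, 5], [1, 2, 3, 7], [1, 2, 3, 9], [1, 2, 6, 8]]
alias = lst_a  # a second reference to the same list object

# The right-hand side is built completely first; the slice assignment
# then replaces the contents of the existing list object.
lst_a[:] = [next(g) for _, g in groupby(lst_a, key=lambda x: x[:3])]

print(alias)
```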
I think you are missing a second for loop if you want to check all combinations. I guess it should look like:
for i in range(0, len(lst_a)):
    for j in range(i, len(lst_a)):
        if lst_a[i][:3] == lst_a[j][:3]:
            lst_a[i].pop()
Deleting while going through the list is probably not the best idea; you should delete the unwanted elements at the end.
Going with your approach, find the code below:
lst = [lst_a[0]]
for li in lst_a[1:]:
    # compare against the most recently kept sublist
    if li[:3] != lst[-1][:3]:
        lst.append(li)
print(lst)
Hope this helps!
You can use a dictionary to filter a list:
dct = {tuple(i[:3]): i for i in lst_a}
# {(1, 2, 3): [1, 2, 3, 9], (1, 2, 6): [1, 2, 6, 8]}
list(dct.values())
# [[1, 2, 3, 9], [1, 2, 6, 8]]
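The comprehension above keeps the last sublist for each prefix, because later entries overwrite earlier ones. If the first one should be kept instead, a sketch using dict.setdefault could look like this (the `first_seen` and `result` names are made up here; insertion order of dicts is guaranteed from Python 3.7 on):

```python
lst_a = [[1, 2, 3, 5], [1, 2, 3, 7], [1, 2, 3, 9], [1, 2, 6, 8]]

first_seen = {}
for row in lst_a:
    # setdefault stores the row only if the key is not present yet
    first_seen.setdefault(tuple(row[:3]), row)

result = list(first_seen.values())
print(result)
```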
I have a sorted list with duplicate elements like
>>> randList = [1, 2, 2, 3, 4, 4, 5]
>>> randList
[1, 2, 2, 3, 4, 4, 5]
I need to create a list that removes the adjacent duplicate elements. I can do it like:
>>> dupList = []
>>> for num in randList:
...     if num not in dupList:
...         dupList.append(num)
But I want to do it with list comprehension. I tried the following code:
>>> newList = []
>>> newList = [num for num in randList if num not in newList]
But I get the result like the if condition isn't working.
>>> newList
[1, 2, 2, 3, 4, 4, 5]
Any help would be appreciated.
Thanks!!
Edit 1: The wording of the question does seem to be confusing given the data I have provided. The for loop that I am using will remove all duplicates, but since I am sorting the list beforehand, that shouldn't be a problem when removing adjacent duplicates.
Using itertools.groupby is the simplest approach to remove adjacent (and only adjacent) duplicates, even for unsorted input:
>>> from itertools import groupby
>>> [k for k, _ in groupby(randList)]
[1, 2, 3, 4, 5]
Removing all duplicates while maintaining the order of occurrence can be achieved efficiently with an OrderedDict. This also works for both sorted and unsorted input:
>>> from collections import OrderedDict
>>> list(OrderedDict.fromkeys(randList))
[1, 2, 3, 4, 5]
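On Python 3.7 and later, plain dicts also preserve insertion order, so the same idea should work without the import. A small sketch, using an unsorted sample list chosen here for illustration:

```python
rand_list = [1, 2, 2, 3, 4, 4, 2, 5, 1]

# dict.fromkeys keeps the first occurrence of each key, in order.
unique_in_order = list(dict.fromkeys(rand_list))
print(unique_in_order)
```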
I need to create a list that removes the adjacent duplicate elements
Note that your for loop based solution will remove ALL duplicates, not only adjacent ones. Test it with this:
rand_list = [1, 2, 2, 3, 4, 4, 2, 5, 1]
according to your spec the result should be:
[1, 2, 3, 4, 2, 5, 1]
but you'll get
[1, 2, 3, 4, 5]
instead.
A working solution to only remove adjacent duplicates is to use a generator:
def dedup_adjacent(seq):
    prev = seq[0]
    yield prev
    for current in seq[1:]:
        if current == prev:
            continue
        yield current
        prev = current
rand_list = [1, 2, 2, 3, 4, 4, 2, 5, 1]
list(dedup_adjacent(rand_list))
=> [1, 2, 3, 4, 2, 5, 1]
Python first evaluates the list comprehension and then assigns it to newList, so you cannot refer to it during execution of the list comprehension.
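A comprehension can still track what it has already emitted if the bookkeeping lives in a separate object that exists before the comprehension starts. This is a sketch of that trick, relying on the fact that set.add returns None; the `seen` name is an assumption for illustration, and many people find an explicit loop more readable.

```python
rand_list = [1, 2, 2, 3, 4, 4, 5]

seen = set()
# "x in seen" is checked first; if it is False, seen.add(x) runs and
# returns None, so the whole condition is True exactly once per value.
new_list = [x for x in rand_list if not (x in seen or seen.add(x))]
print(new_list)
```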
You can remove duplicates in two ways:
1. Using a for loop
rand_list = [1,2,2,3,3,4,5]
new_list = []
for i in rand_list:
    if i not in new_list:
        new_list.append(i)
2. Convert the list to a set, convert the set back to a list, and finally sort the new list.
Since a set does not keep its values in any particular order, you need to sort the resulting list to get the items in ascending order.
rand_list = [1,2,2,3,3,4,5]
sets = set(rand_list)
new_list = list(sets)
new_list.sort()
Update: Comparison of different Approaches
There have been three ways of achieving the goal of removing adjacent duplicate elements in a sorted list, i.e. removing all duplicates:
using groupby (only adjacent elements, requires initial sorting)
using OrderedDict (all duplicates removed)
using sorted(list(set(_))) (all duplicates removed, ordering restored by sorting).
I compared the running times of the different solutions using:
from timeit import timeit
print('groupby:', timeit('from itertools import groupby; l = [x // 5 for x in range(1000)]; [k for k, _ in groupby(l)]'))
print('OrderedDict:', timeit('from collections import OrderedDict; l = [x // 5 for x in range(1000)]; list(OrderedDict.fromkeys(l))'))
print('Set:', timeit('l = [x // 5 for x in range(1000)]; sorted(list(set(l)))'))
> groupby: 78.83623623599942
> OrderedDict: 94.54144410200024
> Set: 65.60372123999969
Note that the set approach is the fastest among all alternatives.
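The statements timed above also include the import and the construction of the test list, and timeit runs each statement 1,000,000 times by default. A hedged sketch of a measurement that moves the preparation into the setup argument and uses a smaller, arbitrarily chosen repetition count:

```python
from timeit import timeit

# setup runs once and is not included in the measured time
setup = 'from itertools import groupby; l = [x // 5 for x in range(1000)]'

elapsed = timeit('[k for k, _ in groupby(l)]', setup=setup, number=1000)
print('groupby:', elapsed)
```

The absolute numbers depend on the machine and Python version, so only the relative ranking is meaningful.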
Old Answer
Python first evaluates the list comprehension and then assigns it to newList, so you cannot refer to it during execution of the list comprehension. To illustrate, consider the following code:
randList = [1, 2, 2, 3, 4, 4, 5]
newList = []
newList = [num for num in randList if print(newList)]
> []
> []
> []
> …
This becomes even more evident if you try:
# Do not initialize newList2
newList2 = [num for num in randList if print(newList2)]
> NameError: name 'newList2' is not defined
You can remove duplicates by turning randList into a set:
sorted(list(set(randList)))
> [1, 2, 3, 4, 5]
Be aware that this does remove all duplicates (not just adjacent ones) and ordering is not preserved. The former also holds true for your proposed solution with the loop.
edit: added a sorted clause as to specification of required ordering.
In the line newList = [num for num in randList if num not in newList], the list on the right-hand side is created first and only then assigned to newList. That's why the check num not in newList returns True every time: newList remains empty until the assignment.
You can try this:
randList = [1, 2, 2, 3, 4, 4, 5]
new_list=[]
for i in randList:
    if i not in new_list:
        new_list.append(i)
print(new_list)
You cannot access the items in a list comprehension as you go along. The items in a list comprehension are only accessible once the comprehension is completed.
For large lists, checking for membership in a list will be expensive, albeit with minimal memory requirements. Instead, you can append to a set:
randList = [1, 2, 2, 3, 4, 4, 5]
def gen_values(L):
    seen = set()
    for i in L:
        if i not in seen:
            seen.add(i)
            yield i
print(list(gen_values(randList)))
[1, 2, 3, 4, 5]
This algorithm has been implemented in the 3rd party toolz library. It's also known as the unique_everseen recipe in the itertools docs:
from toolz import unique
res = list(unique(randList))
Since your list is sorted, using set will be the fastest way to achieve your goal. Note that a set does not guarantee any iteration order, so wrap the result in sorted() if the order matters:
>>> randList = [1, 2, 2, 3, 4, 4, 5]
>>> randList
[1, 2, 2, 3, 4, 4, 5]
>>> remove_dup_list = list(set(randList))
>>> remove_dup_list
[1, 2, 3, 4, 5]
>>>
Let's say I have a list of lists, for example:
[[0, 2], [0, 1], [2, 3], [4, 5, 7, 8], [6, 4]]
If at least one value in a list also appears in a different list, I would like to merge those lists, so in the example the final result would be:
[[0, 1, 2, 3], [4, 5, 6, 7, 8]]
I really don't care about the order of the values inside the list [0, 1, 2, 3] or [0, 2, 1, 3].
I tried to do it but it doesn't work. So have you got any ideas? Thanks.
Edit(sorry for not posting the code that i tried before):
What i tried to do was the following:
for p in llista:
for q in p:
for k in llista:
if p==k:
llista.remove(k)
else:
for h in k:
if p!=k:
if q==h:
k.remove(h)
for t in k:
if t not in p:
p.append(t)
llista_final = [x for x in llista if x != []]
Where llista is the list of lists.
I have to admit this is a tricky problem. I'm really curious what this problem represents and/or where you found it...
I initially thought this was just a graph connected-components problem, but I wanted to take a shortcut and avoid creating an explicit representation of the graph, running BFS, etc.
The idea of the solution is this: for every sublist, check if it has some common element with any other sublist, and replace that with their union.
Not very pythonic, but here it is:
def merge(l):
    l = list(map(tuple, l))
    for i, h in enumerate(l):
        sh = set(h)
        for j, k in enumerate(l):
            if i == j: continue
            sk = set(k)
            if sh & sk:  # h and k have some element in common
                l[j] = tuple(sh | sk)
    return list(map(list, set(l)))
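For comparison, the connected-components view mentioned above can be sketched with a small union-find structure instead of an explicit graph and BFS. This is a different technique from the merge function, and the `merge_connected` helper and its internals are hypothetical names introduced for this sketch. Like the set-based approaches, it assumes the items are hashable.

```python
def merge_connected(lists):
    """Merge lists that share at least one element (hypothetical helper)."""
    parent = {}

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for sub in lists:
        for item in sub:
            parent.setdefault(item, item)
        # Link every element of the sublist to the root of its first element.
        for item in sub[1:]:
            parent[find(item)] = find(sub[0])

    # Collect all items under their root.
    groups = {}
    for item in parent:
        groups.setdefault(find(item), []).append(item)
    return [sorted(g) for g in groups.values()]

merged = merge_connected([[0, 2], [0, 1], [2, 3], [4, 5, 7, 8], [6, 4]])
print(merged)
```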
Here is a function that does what you want. I tried to use self-documenting variable names and comments to help you understand how this code works. As far as I can tell, the code is pythonic. I used sets to speed up and simplify some of the operations. The downside of that is that the items in your input list-of-lists must be hashable, but your example uses integers which works perfectly well.
def cliquesfromlistoflists(inputlistoflists):
    """Given a list of lists, return a new list of lists that unites
    the old lists that have at least one element in common.
    """
    listofdisjointsets = []
    for inputlist in inputlistoflists:
        # Update the list of disjoint sets using the current sublist
        inputset = set(inputlist)
        unionofsetsoverlappinginputset = inputset.copy()
        listofdisjointsetsnotoverlappinginputset = []
        for aset in listofdisjointsets:
            # Unite set if overlaps the new input set, else just store it
            if aset.isdisjoint(inputset):
                listofdisjointsetsnotoverlappinginputset.append(aset)
            else:
                unionofsetsoverlappinginputset.update(aset)
        listofdisjointsets = (listofdisjointsetsnotoverlappinginputset
                              + [unionofsetsoverlappinginputset])
    # Return the information in a list-of-lists format
    return [list(aset) for aset in listofdisjointsets]
print(cliquesfromlistoflists([[0, 2], [0, 1], [2, 3], [4, 5, 7, 8], [6, 4]]))
# printout is [[0, 1, 2, 3], [4, 5, 6, 7, 8]]
This solution modifies the generic breadth-first search to gradually diminish the initial deque and update a result list with either a combination should a match be found or a list addition if no grouping is discovered:
from collections import deque
d = deque([[0,2] , [0,1] , [2,3] , [4,5,7,8] , [6,4]])
result = [d.popleft()]
while d:
    v = d.popleft()
    result = [list(set(i+v)) if any(c in i for c in v) else i for i in result] if any(any(c in i for c in v) for i in result) else result + [v]
Output:
[[0, 1, 2, 3], [8, 4, 5, 6, 7]]
I'm not sure how to write a comparator in Python 3 as the cmp parameter is removed. Considering the following code in Python 3, how do I rewrite the comparator using only key?
import functools
def my_cmp(x, y):
    return x*5-y*2
l = [50, 2, 1, 9]
print(sorted(l, key=functools.cmp_to_key(my_cmp)))
thanks.
This "comparison" function that you came up with is inconsistent: it should provide a definite (deterministic) order, meaning, if you change the order of the elements in the list and run sorted - you should get the same result!
In your case, the order of the elements affects the sorting:
import functools
def my_cmp(x, y):
    return x*5-y*2
l = [50, 2, 1, 9]
print(sorted(l, key=functools.cmp_to_key(my_cmp))) # [2, 1, 9, 50]
l = [50, 1, 2, 9]
print(sorted(l, key=functools.cmp_to_key(my_cmp))) # [1, 2, 9, 50]
which means that your "comparison" function is inconsistent. First provide a good ordering function; then it should not be very difficult to convert it to a key function.
Regarding the question that you raised in the comments: key accepts a function that takes a single argument and returns a "measurement" of "how big it is". The simplest example is comparing numbers, in which case your key function can simply be lambda x: x. For any number, the lambda expression returns the number itself, and the comparison is now trivial!
Modifying your example:
def my_key(x):
    return x
l = [50, 2, 1, 9]
print(sorted(l, key=my_key)) # [1, 2, 9, 50]
A shorter version of the above would be:
l = [50, 2, 1, 9]
print(sorted(l, key=lambda x: x)) # [1, 2, 9, 50]
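To illustrate the conversion with a comparator that is consistent, here is a sketch that sorts by last digit. The `by_last_digit` comparator is a made-up example, not the asker's function; it shows that a comparator of the form f(x) - f(y) corresponds to the key function f.

```python
import functools

def by_last_digit(x, y):
    # Consistent comparator: negative, zero, or positive, like the old cmp.
    return (x % 10) - (y % 10)

l = [50, 2, 1, 9]

via_cmp = sorted(l, key=functools.cmp_to_key(by_last_digit))
via_key = sorted(l, key=lambda x: x % 10)  # equivalent key function

print(via_cmp, via_key)
```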
I've refactored how the merged-dictionary (all_classes) below is created, but I'm wondering if it can be more efficient.
I have a dictionary of dictionaries, like this:
groups_and_classes = {'group_1': {'class_A': [1, 2, 3],
                                  'class_B': [1, 3, 5, 7],
                                  'class_C': [1, 2],  # ...many more items like this
                                  },
                      'group_2': {'class_A': [11, 12, 13],
                                  'class_C': [5, 6, 7, 8, 9]
                                  },  # ...and many more items like this
                      }
A function creates a new object from groups_and_classes like this (the function to create this is called often):
all_classes = {'class_A': [1, 2, 3, 11, 12, 13],
'class_B': [1, 3, 5, 7, 9],
'class_C': [1, 2, 5, 6, 7, 8, 9]
}
Right now, there is a loop that does this:
all_classes = {}
for group in groups_and_classes.values():
    for c, vals in group.iteritems():
        for v in vals:
            if all_classes.has_key(c):
                if v not in all_classes[c]:
                    all_classes[c].append(v)
            else:
                all_classes[c] = [v]
So far, I changed the code to use a set instead of a list since the order of the list doesn't matter and the values need to be unique:
all_classes = {}
for group in groups_and_classes.values():
    for c, vals in group.iteritems():
        try:
            all_classes[c].update(set(vals))
        except KeyError:
            all_classes[c] = set(vals)
This is a little nicer, and I didn't have to convert the sets to lists because of how all_classes is used in the code.
Question: Is there a more efficient way of creating all_classes (aside from building it at the same time groups_and_classes is built, and changing everywhere this function is called)?
Here's a tweak for conciseness, though I'm not sure about performance:
from collections import defaultdict
all_classes = defaultdict(set)
for group in groups_and_classes.values():
    for c, vals in group.iteritems():
        all_classes[c].update(set(vals))
Defaultdicts are not quite the greatest thing since sliced bread, but they're pretty cool. :)
One thing that might improve things slightly is to avoid the redundant conversion to a set, and just use:
all_classes[c].update(vals)
update can actually take an arbitrary iterable, as it essentially just iterates and adds, so you can avoid an extra conversion step.
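Putting both suggestions together, a Python 3 version might look like the sketch below. The snippets in this thread are Python 2 (iteritems, has_key); here items() is used instead, and the sample data is a trimmed copy of the question's dictionary.

```python
from collections import defaultdict

groups_and_classes = {
    'group_1': {'class_A': [1, 2, 3], 'class_B': [1, 3, 5, 7], 'class_C': [1, 2]},
    'group_2': {'class_A': [11, 12, 13], 'class_C': [5, 6, 7, 8, 9]},
}

all_classes = defaultdict(set)
for group in groups_and_classes.values():
    for c, vals in group.items():  # items() replaces Python 2's iteritems()
        all_classes[c].update(vals)  # update accepts any iterable

print(dict(all_classes))
```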
Combining Dictionaries Of Lists In Python.
def merge_dols(dol1, dol2):
    result = dict(dol1, **dol2)
    result.update((k, dol1[k] + dol2[k]) for k in set(dol1).intersection(dol2))
    return result
g1 = groups_and_classes['group_1']
g2 = groups_and_classes['group_2']
all_classes = merge_dols(g1,g2)
OR
all_classes = reduce(merge_dols,groups_and_classes.values())
--copied from Alex Martelli
If you have more than two groups, you can use reduce (a builtin in Python 2; functools.reduce in Python 3):
all_classes = reduce(merge_dols,groups_and_classes.values())
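For Python 3, where reduce has to be imported from functools, a self-contained sketch of this approach could look as follows. The sample data is a trimmed copy of the question's dictionary. Note that merge_dols concatenates lists without removing duplicates, and that dict(dol1, **dol2) requires string keys.

```python
from functools import reduce

def merge_dols(dol1, dol2):
    # Start with all keys from both dicts, then concatenate shared lists.
    result = dict(dol1, **dol2)
    result.update((k, dol1[k] + dol2[k]) for k in set(dol1).intersection(dol2))
    return result

groups_and_classes = {
    'group_1': {'class_A': [1, 2, 3], 'class_B': [1, 3, 5, 7], 'class_C': [1, 2]},
    'group_2': {'class_A': [11, 12, 13], 'class_C': [5, 6, 7, 8, 9]},
}

all_classes = reduce(merge_dols, groups_and_classes.values())
print(all_classes)
```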