Python remove duplicates in dictionary of lists - python

My dictionary looks something like this:
dictionary = {'apple': [3, 5], 'banana': [3, 3, 6], 'strawberry': [1, 2, 4, 5, 5]}
How can I remove all duplicates from each value/list (as if turning it into a set)?
I would like the new dictionary to look like this:
{'apple': [3, 5], 'banana': [3, 6], 'strawberry': [1, 2, 4, 5]}

Using a dict comprehension and sets to remove duplicates:
d = {'apple': [3, 5], 'banana': [3, 3, 6], 'strawberry': [1, 2, 4, 5, 5]}
print({k: list(set(j)) for k, j in d.items()})
results in (the element order coming out of a set is not guaranteed):
{'apple': [3, 5], 'banana': [3, 6], 'strawberry': [1, 2, 4, 5]}
If you want to preserve the list order:
d = {'apple': [3, 5, 5, 8, 4, 5], 'banana': [3, 3, 6, 1, 1, 3], 'strawberry': [5, 1, 1, 2, 4, 5, 5]}
print({k: sorted(set(j), key=j.index) for k, j in d.items()})
results in:
{'apple': [3, 5, 8, 4], 'banana': [3, 6, 1], 'strawberry': [5, 1, 2, 4]}
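On Python 3.7 and later, where plain dicts are guaranteed to keep insertion order, dict.fromkeys can preserve the original order in a single pass, instead of the repeated j.index lookups that sorted(..., key=j.index) performs. A minimal sketch, assuming Python 3.7+:

```python
d = {'apple': [3, 5, 5, 8, 4, 5], 'banana': [3, 3, 6, 1, 1, 3], 'strawberry': [5, 1, 1, 2, 4, 5, 5]}

# dict.fromkeys keeps only the first occurrence of each value, in order;
# list() turns those keys back into a list.
deduped = {k: list(dict.fromkeys(j)) for k, j in d.items()}
print(deduped)
# {'apple': [3, 5, 8, 4], 'banana': [3, 6, 1], 'strawberry': [5, 1, 2, 4]}
```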

for lst in dictionary.values():
    lst[:] = list(set(lst))
Going through a set might change the order, though. If that must not happen, OrderedDict is an option:
import collections
for lst in dictionary.values():
    lst[:] = list(collections.OrderedDict.fromkeys(lst))
Or, if the lists should be sorted, you can do that instead:
for lst in dictionary.values():
    lst[:] = sorted(set(lst))
Or, if the lists are already sorted, you could keep the first element and every element that isn't a duplicate of the element before it:
for lst in dictionary.values():
    lst[:] = lst[:1] + [b for a, b in zip(lst, lst[1:]) if a != b]
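For lists that are already sorted, itertools.groupby is an alternative to the zip-based comprehension: it groups runs of equal neighbours, so keeping one value per run drops the repeats. A sketch of that alternative; like the zip version, it only removes adjacent duplicates, so it assumes the lists are sorted:

```python
from itertools import groupby

dictionary = {'apple': [3, 5], 'banana': [3, 3, 6], 'strawberry': [1, 2, 4, 5, 5]}

for lst in dictionary.values():
    # groupby yields a (value, run) pair for each run of equal adjacent items;
    # keeping only the value collapses each run to a single element.
    lst[:] = [value for value, _run in groupby(lst)]

print(dictionary)
# {'apple': [3, 5], 'banana': [3, 6], 'strawberry': [1, 2, 4, 5]}
```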

dictionary= {"apple":[3,5], "banana":[3,3,6], "strawberry":[1,2,4,5,5]}
for key, item in dictionary.items():
    dictionary[key] = set(item)
print(dictionary)
Output (note that the values become sets, not lists):
{'apple': {3, 5}, 'banana': {3, 6}, 'strawberry': {1, 2, 4, 5}}

Related

How to return a string containing information about how many values exist for each key

I currently have the following:
mydict = {'a': [1, 2, 3], 'b': [1, 2]}
I want to return a string containing the number of items for each key in the dictionary. For example:
a: 3
b: 2
However, I want the output to update if I add another key-value pair to the dictionary, for example mydict['c'] = [1, 2, 3].
I have thought about how to do this and this is all that comes to mind:
def quantities() -> str:
    mydict = {'a': [1, 2, 3], 'b': [1, 2]}
    for k, v in mydict:
        print(f'{k}: {len(v)}')
But I am not sure if this is correct. Are there any other ways to do this?
The statement:
for <variable> in mydict:
iterates through only the keys of the dictionary. So you can either use the key to get the item, like:
mydict = {'a': [1, 2, 3], 'b': [1, 2]}
for k in mydict:
    print(f'{k}: {len(mydict[k])}')
Or use mydict.items(), which makes the loop iterate over every (key, value) pair. Use it as:
mydict = {'a': [1, 2, 3], 'b': [1, 2]}
for k, v in mydict.items():
    print(f'{k}: {len(v)}')
I don't think your sample code will work. Based on the documentation, I used sorted(); I think what you want is something like this:
mydict = {'a': [1, 2, 3, 4], 'b': [1, 2]}
def quantities():
    for k, v in sorted(mydict.items()):
        print(k, len(v))
quantities()
You can do this with str.join and a generator expression:
def quantities(mydict):
    return '\n'.join('{}: {}'.format(k, len(v)) for k, v in mydict.items())
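A short usage sketch of the str.join version, showing that the output picks up keys added later, because the function reads the dictionary each time it is called. The line order shown assumes Python 3.7+, where dicts keep insertion order:

```python
def quantities(mydict):
    # one "key: count" line per entry, joined with newlines
    return '\n'.join('{}: {}'.format(k, len(v)) for k, v in mydict.items())

mydict = {'a': [1, 2, 3], 'b': [1, 2]}
print(quantities(mydict))
# a: 3
# b: 2

mydict['c'] = [1, 2, 3]
print(quantities(mydict))
# a: 3
# b: 2
# c: 3
```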

List comprehension with early conditional check

For the given list l:
l = [{'k': [1, 2]}, {'k': [2, 8]}, {'k': [6, 32]}, {}, {'s': 0}]
I would like a single list of all the values stored under 'k':
r = [1, 2, 2, 8, 6, 32]
Currently I use this code:
r = []
for item in l:
    if 'k' in item:
        for i in item['k']:
            r += [i]
Is there an elegant list comprehension solution for this kind of list?
Obviously,
[i for i in item['k'] if 'k' in item for item in l]
fails, because item['k'] is accessed before the condition is checked. Any ideas?
Use get to provide an empty list to iterate over if 'k' doesn't exist:
r = [i for d in l for i in d.get('k', [])]
Or check for 'k' before you try to access its value:
r = [i for d in l if 'k' in d for i in d['k']]
You almost have the right solution with your list comprehension; the clauses are just in the wrong order. They are evaluated left to right, like nested loops, so the for that binds item has to come before anything that uses it. Please try the following:
l = [{'k': [1, 2]}, {'k': [2, 8]}, {'k': [6, 32]}, {}, {'s': 0}]
answer = [i for item in l if 'k' in item for i in item['k'] ]
print(answer)
Is this what you wanted?
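The clauses of a list comprehension run left to right, exactly like the nested loops they replace, which is why the if must come after the for that binds item. A sketch showing that equivalence, plus an itertools.chain.from_iterable variant (an alternative not used in the answers above) that flattens one level after get supplies an empty list for missing keys:

```python
from itertools import chain

l = [{'k': [1, 2]}, {'k': [2, 8]}, {'k': [6, 32]}, {}, {'s': 0}]

# Nested-loop form of [i for item in l if 'k' in item for i in item['k']]
loop_result = []
for item in l:
    if 'k' in item:
        for i in item['k']:
            loop_result.append(i)

comp_result = [i for item in l if 'k' in item for i in item['k']]

# chain.from_iterable flattens one level; .get returns [] when 'k' is missing
chain_result = list(chain.from_iterable(item.get('k', []) for item in l))

print(loop_result == comp_result == chain_result)  # True
print(comp_result)  # [1, 2, 2, 8, 6, 32]
```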

How do I merge into one sublist the two sublists with the same index 0?

It's not the same as flattening a list.
I have this list of lists:
listoflists = [[853, 'na'], [854, [1, 2, 3, 4, 5]], [854, [2, 4, 6, 8]]]
I want those sublists which have the same index 0 (in this case 854) to be combined but not flattened, like so:
listoflists_v2 = [[853, 'na'], [854, [1, 2, 3, 4, 5], [2, 4, 6, 8]]]
How do I do that?
If order is important, use an OrderedDict and collect values per key:
from collections import OrderedDict
d = OrderedDict()
for k, v in listoflists:
    d.setdefault(k, []).append(v)
listoflists_v2 = [[k, *v] for k, v in d.items()]
If not, use a defaultdict for slightly better performance:
from collections import defaultdict
d = defaultdict(list)
for k, v in listoflists:
    d[k].append(v)
listoflists_v2 = [[k, *v] for k, v in d.items()]
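Since Python 3.7, a plain dict also remembers insertion order, so the OrderedDict version can be written with an ordinary dict. A sketch, assuming Python 3.7+:

```python
listoflists = [[853, 'na'], [854, [1, 2, 3, 4, 5]], [854, [2, 4, 6, 8]]]

d = {}
for k, v in listoflists:
    # setdefault creates the list the first time a key is seen
    d.setdefault(k, []).append(v)

# put each key first, followed by all the values collected for it
listoflists_v2 = [[k, *v] for k, v in d.items()]
print(listoflists_v2)
# [[853, 'na'], [854, [1, 2, 3, 4, 5], [2, 4, 6, 8]]]
```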
Another option is using itertools.groupby:
from itertools import groupby
from operator import itemgetter
listoflists.sort(key=itemgetter(0)) # Do this if keys aren't consecutive.
listoflists_v2 = [
[k, *map(itemgetter(1), g)]
for k, g in groupby(listoflists, key=itemgetter(0))
]
print(listoflists_v2)
[[853, 'na'], [854, [1, 2, 3, 4, 5], [2, 4, 6, 8]]]
Here is another way of going about it, although I wouldn't recommend it. It's good for learning, I guess.
# original list
listoflists = [[853, 'na'], [854, [1, 2, 3, 4, 5]], [854, [2, 4, 6, 8]]]
# new list with combined data
new_list = []
# loop through all sublists
for sub_list in listoflists:
    # look for a list in new_list with the same first element
    for list_ in new_list:
        if sub_list[0] == list_[0]:
            # add the original list's values, skipping its first element
            list_.extend(sub_list[1:])
            break
    else:
        # no match (this also covers an empty new_list): add a copy of the sublist
        new_list.append(list(sub_list))
print(new_list)

How to sort a dictionary by value lists numerically

Assume a Python dictionary, in which the keys are strings and the corresponding values are lists of integers.
> my_dict = {}
> my_dict['key1'] = [1,2,3]
> my_dict['key2'] = [4,5]
> my_dict['key3'] = [3,4,5]
> my_dict
# Python dictionaries were unordered before Python 3.7
{'key2': [4, 5], 'key3': [3, 4, 5], 'key1': [1, 2, 3]}
What function would you write to obtain a dictionary-like object sorted by the first element of each value (i.e., the first element of each list)?
> from collections import OrderedDict
> def my_function(a_dict):
> return OrderedDict(sorted(a_dict.items(), magic_specified_here)) # Just an example
> sorted_dict = my_function(my_dict)
> sorted_dict
{'key1': [1, 2, 3], 'key3': [3, 4, 5], 'key2': [4, 5]}
You can use the following:
return OrderedDict(sorted(a_dict.items(), key=lambda x: x[1][0]))
This sorts the items of a_dict by the 0th element of each value. Since a_dict.items() yields (key, value) tuples, x[1] is the value.
You can use the key parameter of sorted: sorted(my_dict.items(), key=lambda x: x[1][0])
>>> my_dict = {}
>>> my_dict['key1'] = [1,2,3]
>>> my_dict['key2'] = [4,5]
>>> my_dict['key3'] = [3,4,5]
>>> my_dict
{'key3': [3, 4, 5], 'key2': [4, 5], 'key1': [1, 2, 3]}
>>> my_dict.items()
dict_items([('key3', [3, 4, 5]), ('key2', [4, 5]), ('key1', [1, 2, 3])])
>>> sorted(my_dict.items(), key=lambda x:x[1][0])
[('key1', [1, 2, 3]), ('key3', [3, 4, 5]), ('key2', [4, 5])]
>>>
Pretty simple:
from collections import OrderedDict
def my_function(a_dict):
    return OrderedDict(sorted(a_dict.items(), key=lambda x: x[1][0]))
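On Python 3.7+, a plain dict keeps the order it was built in, so dict(sorted(...)) is enough. A sketch, assuming Python 3.7+. The second call is a variation: sorting on the whole list (kv[1]) instead of kv[1][0] compares lists element by element, so ties on the first element fall back to the second, and an empty list sorts first instead of raising IndexError:

```python
my_dict = {'key1': [1, 2, 3], 'key2': [4, 5], 'key3': [3, 4, 5]}

# sort the (key, value) pairs by the first element of each value list
by_first = dict(sorted(my_dict.items(), key=lambda kv: kv[1][0]))
print(by_first)
# {'key1': [1, 2, 3], 'key3': [3, 4, 5], 'key2': [4, 5]}

# variation: compare the whole lists lexicographically
by_list = dict(sorted(my_dict.items(), key=lambda kv: kv[1]))
print(by_list)
```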

Python dictionary sum values

I'm using Python 2.7 and I have a large dictionary that looks a little like this
{'J': [92704, 238476902378, 32490872394, 234798327, 2390470], 'M': [32974097, 237407, 3248707, 32847987, 34879], 'Z': [8237, 328947, 239487, 234, 182673]}
How can I sum these by position to create a new dictionary that sums the first values in each list, then the second values, and so on? Like:
{'FirstValues': J[0] + M[0] + Z[0]}
etc.
In [4]: {'FirstValues': sum(e[0] for e in d.itervalues())}
Out[4]: {'FirstValues': 33075038}
where d is your dictionary.
print([sum(row) for row in zip(*yourdict.values())])
yourdict.values() gets all the lists, zip(*...) groups the first items together, then the second items, and so on, and sum adds up each group. Note that zip stops at the end of the shortest list.
I don't know why you need a dictionary as output, but here it is:
dict(enumerate( [sum(x) for x in zip(*d.values())] ))
from itertools import izip_longest  # Python 2.7; in Python 3 this is itertools.zip_longest
totals = (sum(vals) for vals in izip_longest(*mydict.itervalues(), fillvalue=0))
print tuple(totals)
In plain English:
Zip the lists (the dict values) together, padding the shorter ones with 0 (optional; you don't have to).
Sum each zipped group.
For example,
mydict = {
'J': [1, 2, 3, 4, 5],
'M': [1, 2, 3, 4, 5],
'Z': [1, 2, 3, 4]
}
## When zipped, this becomes...
((1, 1, 1), (2, 2, 2), (3, 3, 3), (4, 4, 4), (5, 5, 0))
## When summed becomes...
(3, 6, 9, 12, 10)
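A Python 3 version of the same idea, where izip_longest is named itertools.zip_longest and dict.values() replaces itervalues(). It also shows the difference from plain zip, which stops at the shortest list:

```python
from itertools import zip_longest

mydict = {
    'J': [1, 2, 3, 4, 5],
    'M': [1, 2, 3, 4, 5],
    'Z': [1, 2, 3, 4],
}

# zip stops at the shortest list, so the fifth position is dropped
truncated = tuple(sum(col) for col in zip(*mydict.values()))
# zip_longest pads the shorter lists with fillvalue instead
padded = tuple(sum(col) for col in zip_longest(*mydict.values(), fillvalue=0))

print(truncated)  # (3, 6, 9, 12)
print(padded)     # (3, 6, 9, 12, 10)
```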
It doesn't really make sense to create a new dictionary, as the new keys would (probably) be meaningless: the results don't relate to the original keys. A tuple is more appropriate, since results[0] holds the sum of all values at position 0 in the original dict values, and so on.
If you must have a dict, turn the totals generator into one (recreate it first if you already consumed it with print, since a generator can only be iterated once):
new_dict = dict(('Values%d' % idx, val) for idx, val in enumerate(totals))
Say you have some dict like:
d = {'J': [92704, 238476902378, 32490872394, 234798327, 2390470],
'M': [32974097, 237407, 3248707, 32847987, 34879],
'Z': [8237, 328947, 239487, 234, 182673]}
Make a defaultdict(int) and add each number to the running total for its index:
from collections import defaultdict
sum_by_index = defaultdict(int)
for alist in d.values():
    for index, num in enumerate(alist):
        sum_by_index[index] += num
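A sketch of reading the defaultdict back out as a list of sums ordered by index (Python 3). The final sorted(...) step is an addition of this sketch, since the indices are otherwise just dictionary keys:

```python
from collections import defaultdict

d = {'J': [92704, 238476902378, 32490872394, 234798327, 2390470],
     'M': [32974097, 237407, 3248707, 32847987, 34879],
     'Z': [8237, 328947, 239487, 234, 182673]}

sum_by_index = defaultdict(int)
for alist in d.values():
    for index, num in enumerate(alist):
        # a missing index starts at int() == 0
        sum_by_index[index] += num

# turn {index: total} into a list ordered by index
totals = [sum_by_index[i] for i in sorted(sum_by_index)]
print(totals[0])  # 33075038, i.e. J[0] + M[0] + Z[0]
```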
