I'm working on python 3.2.2.
I've been breaking my head for more than 3 hours trying to sort a dictionary by its keys.
I managed to make it a sorted list with 2 argument members, but can not make it a sorted dictionary in the end.
This is what I've figured:
myDic={10: 'b', 3:'a', 5:'c'}
sorted_list=sorted(myDic.items(), key=lambda x: x[0])
But no matter what I can not make a dictionary out of this sorted list. How do I do that? Thanks!
A modern and fast solution, for Python 3.7. May also work in some interpreters of Python 3.6.
TLDR
To sort a dictionary by keys use:
sorted_dict = {k: disordered[k] for k in sorted(disordered)}
Almost three times faster than the accepted answer; probably more when you include imports.
Comment on the accepted answer
The example in the accepted answer iterates over (key, value) tuples instead of iterating over the keys only (with the key parameter of sorted() or the default behaviour of dict iteration). Surprisingly, this turns out to be much slower than comparing the keys only and looking up the values in a comprehension.
How to sort by key in Python 3.7
The big change in Python 3.7 is that the dictionaries are now ordered by default.
You can generate sorted dict using dict comprehensions.
Using OrderedDict might still be preferable for the compatibility sake.
Do not use sorted(d.items()) without key.
See:
disordered = {10: 'b', 3: 'a', 5: 'c'}
# sort keys, then get values from original - fast
sorted_dict = {k: disordered[k] for k in sorted(disordered)}
# key = itemgetter - slower
from operator import itemgetter
key = itemgetter(0)
sorted_dict = {k: v for k, v in sorted(disordered.items(), key=key)}
# key = lambda - the slowest
key = lambda item: item[0]
sorted_dict = {k: v for k, v in sorted(disordered.items(), key=key)}
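As a quick check of the points above, here is a small sketch, assuming Python 3.7 or later where dicts keep insertion order. It shows that the rebuilt dict iterates in key order, while a key added afterwards still lands at the end:

```python
disordered = {10: 'b', 3: 'a', 5: 'c'}

# Rebuild the dict, inserting the keys in sorted order.
sorted_dict = {k: disordered[k] for k in sorted(disordered)}
keys_after_sort = list(sorted_dict)

# A later insertion is appended; the dict does not re-sort itself.
sorted_dict[4] = 'd'
keys_after_insert = list(sorted_dict)

print(keys_after_sort)    # [3, 5, 10]
print(keys_after_insert)  # [3, 5, 10, 4]
```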
Timing results:
Best for {k: d[k] for k in sorted(d)}: 7.507327548999456
Best for {k: v for k, v in sorted(d.items(), key=key_getter)}: 12.031082626002899
Best for {k: v for k, v in sorted(d.items(), key=key_lambda)}: 14.22885995300021
Best for dict(sorted(d.items(), key=key_getter)): 11.209122000000207
Best for dict(sorted(d.items(), key=key_lambda)): 13.289728325995384
Best for dict(sorted(d.items())): 14.231471302999125
Best for OrderedDict(sorted(d.items(), key=key_getter)): 16.609151654003654
Best for OrderedDict(sorted(d.items(), key=key_lambda)): 18.52622927199991
Best for OrderedDict(sorted(d.items())): 19.436101284998585
Testing code:
from timeit import repeat
setup_code = """
from operator import itemgetter
from collections import OrderedDict
import random
random.seed(0)
d = {i: chr(i) for i in [random.randint(0, 120) for repeat in range(120)]}
key_getter = itemgetter(0)
key_lambda = lambda item: item[0]
"""
cases = [
# fast
'{k: d[k] for k in sorted(d)}',
'{k: v for k, v in sorted(d.items(), key=key_getter)}',
'{k: v for k, v in sorted(d.items(), key=key_lambda)}',
# slower
'dict(sorted(d.items(), key=key_getter))',
'dict(sorted(d.items(), key=key_lambda))',
'dict(sorted(d.items()))',
# the slowest
'OrderedDict(sorted(d.items(), key=key_getter))',
'OrderedDict(sorted(d.items(), key=key_lambda))',
'OrderedDict(sorted(d.items()))',
]
for code in cases:
    times = repeat(code, setup=setup_code, repeat=3)
    print(f"Best for {code}: {min(times)}")
dict does not keep its elements' order. What you need is an OrderedDict: http://docs.python.org/library/collections.html#collections.OrderedDict
edit
Usage example:
>>> from collections import OrderedDict
>>> a = {'foo': 1, 'bar': 2}
>>> a
{'foo': 1, 'bar': 2}
>>> b = OrderedDict(sorted(a.items()))
>>> b
OrderedDict([('bar', 2), ('foo', 1)])
>>> b['foo']
1
>>> b['bar']
2
I don't think you want an OrderedDict. It sounds like you'd prefer a SortedDict, that is, a dict that maintains its keys in sorted order. The sortedcontainers module provides just such a data type. It is a pure-Python, fast-as-C implementation with 100% test coverage and hours of stress testing.
Installation is easy with pip:
pip install sortedcontainers
Note that if you can't pip install then you can simply pull the source files from the open-source repository.
Then your code is simply:
from sortedcontainers import SortedDict
myDic = SortedDict({10: 'b', 3:'a', 5:'c'})
sorted_list = list(myDic.keys())
The sortedcontainers module also maintains a performance comparison with other popular implementations.
Python's ordinary dicts cannot be made to provide the keys/elements in any specific order. For that, you could use the OrderedDict type from the collections module. Note that the OrderedDict type merely keeps a record of insertion order. You would have to sort the entries prior to initializing the dictionary if you want subsequent views/iterators to return the elements in order every time. For example:
>>> myDic={10: 'b', 3:'a', 5:'c'}
>>> sorted_list=sorted(myDic.items(), key=lambda x: x[0])
>>> myOrdDic = OrderedDict(sorted_list)
>>> myOrdDic.items()
[(3, 'a'), (5, 'c'), (10, 'b')]
>>> myOrdDic[7] = 'd'
>>> myOrdDic.items()
[(3, 'a'), (5, 'c'), (10, 'b'), (7, 'd')]
If you want to maintain proper ordering for newly added items, you really need to use a different data structure, e.g., a binary tree/heap. This approach of building a sorted list and using it to initialize a new OrderedDict() instance is just woefully inefficient unless your data is completely static.
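To illustrate the idea of keeping order as items are added, here is a minimal sketch using only the standard library. SortedKeyDict and sorted_items are hypothetical names made up for this example, and a sorted key list maintained with bisect stands in for a real tree structure. Only item assignment and deletion are tracked; update() and the constructor bypass the overridden methods.

```python
import bisect

class SortedKeyDict(dict):
    """Hypothetical dict subclass that also keeps a sorted list of its keys."""

    def __init__(self):
        super().__init__()
        self._keys = []  # always kept in sorted order

    def __setitem__(self, key, value):
        if key not in self:
            # Binary search for the position, then an O(n) list insert.
            bisect.insort(self._keys, key)
        super().__setitem__(key, value)

    def __delitem__(self, key):
        super().__delitem__(key)  # raises KeyError if the key is missing
        self._keys.remove(key)

    def sorted_items(self):
        return [(k, self[k]) for k in self._keys]


d = SortedKeyDict()
for k, v in [(10, 'b'), (3, 'a'), (5, 'c'), (7, 'd')]:
    d[k] = v
print(d.sorted_items())  # [(3, 'a'), (5, 'c'), (7, 'd'), (10, 'b')]
```

For anything performance-sensitive, a purpose-built container is likely a better fit than this sketch.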
Edit: So, if the object of sorting the data is merely to print it in order, in a format resembling a python dict object, something like the following should suffice:
def pprint_dict(d):
    strings = []
    for k in sorted(d):  # iterating a dict yields its keys (iterkeys() is Python 2 only)
        strings.append("%d: '%s'" % (k, d[k]))
    return '{' + ', '.join(strings) + '}'
Note that this function is not flexible w/r/t the types of the key, value pairs (i.e., it expects the keys to be integers and the corresponding values to be strings). If you need more flexibility, use something like strings.append("%s: %s" % (repr(k), repr(d[k]))) instead.
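A sketch of that more flexible variant might look like the following. The name pprint_dict_generic is made up for this example, and the keys are still assumed to be mutually comparable so that sorted() works:

```python
def pprint_dict_generic(d):
    # repr() renders any key or value type, so no %d / %s assumptions are needed.
    strings = ["%s: %s" % (repr(k), repr(d[k])) for k in sorted(d)]
    return '{' + ', '.join(strings) + '}'


print(pprint_dict_generic({10: 'b', 3: 'a', 5: 'c'}))  # {3: 'a', 5: 'c', 10: 'b'}
```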
With Python 3.7 I could do this:
>>> myDic={10: 'b', 3:'a', 5:'c'}
>>> sortDic = sorted(myDic.items())
>>> print(dict(sortDic))
{3: 'a', 5: 'c', 10: 'b'}
If you want a list of tuples:
>>> myDic={10: 'b', 3:'a', 5:'c'}
>>> sortDic = sorted(myDic.items())
>>> print(sortDic)
[(3, 'a'), (5, 'c'), (10, 'b')]
Dictionaries are unordered by definition. What would be the main reason for ordering by key? A list of tuples created by the sort method can be used for whatever the need may have been, but changing the list of tuples back into a dictionary will return an arbitrary order:
>>> myDic
{10: 'b', 3: 'a', 5: 'c'}
>>> sorted(myDic.items())
[(3, 'a'), (5, 'c'), (10, 'b')]
>>> print(dict(myDic.items()))
{10: 'b', 3: 'a', 5: 'c'}
Maybe not that good but I've figured this:
def order_dic(dic):
    ordered_dic = {}
    key_ls = sorted(dic.keys())
    for key in key_ls:
        ordered_dic[key] = dic[key]
    return ordered_dic
Any modern solution to this problem?
I worked around it with:
order = sorted([ job['priority'] for job in self.joblist ])
sorted_joblist = []
while order:
    min_priority = min(order)
    for job in self.joblist:
        if job['priority'] == min_priority:
            sorted_joblist += [ job ]
            order.remove(min_priority)
self.joblist = sorted_joblist
The joblist is formatted as:
joblist = [ { 'priority' : 3, 'name' : 'foo', ... }, { 'priority' : 1, 'name' : 'bar', ... } ]
Basically I create a list (order) with all the elements by which I want to sort the dict
then I iterate this list and the dict, when I find the item on the dict I send it to a new dict and remove the item from 'order'.
Seems to be working, but I suppose there are better solutions.
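One simpler possibility, sketched here on a standalone joblist rather than self.joblist, is to let sorted() do the work with a key function. The third job ('baz') is made up for the example:

```python
from operator import itemgetter

joblist = [
    {'priority': 3, 'name': 'foo'},
    {'priority': 1, 'name': 'bar'},
    {'priority': 2, 'name': 'baz'},  # extra entry added for illustration
]

# sorted() is stable, so jobs with equal priority keep their relative order.
sorted_joblist = sorted(joblist, key=itemgetter('priority'))
print([job['name'] for job in sorted_joblist])  # ['bar', 'baz', 'foo']
```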
I'm not sure whether this could help, but I had a similar problem and I managed to solve it, by defining an apposite function:
def sor_dic_key(diction):
    lista = []
    diction2 = {}
    for x in diction:
        lista.append([x, diction[x]])
    lista.sort(key=lambda x: x[0])
    for l in lista:
        diction2[l[0]] = l[1]
    return diction2
This function returns another dictionary with the same keys and relative values, but sorted by its keys.
Similarly, I defined a function that could sort a dictionary by its values. I just needed to use x[1] instead of x[0] in the lambda function. I find this second function mostly useless, but one never can tell!
I like python numpy for this kind of stuff! eg:
r=readData()
nsorted = np.lexsort((r.calls, r.slow_requests, r.very_slow_requests, r.stalled_requests))
I have an example of importing CSV data into a numpy and ordering by column priorities.
https://github.com/unixunion/toolbox/blob/master/python/csv-numpy.py
Kegan
The accepted answer definitely works, but somehow misses an important point.
The OP is asking for a dictionary sorted by its keys. This is just not really possible, and it is not what OrderedDict does.
OrderedDict is maintaining the content of the dictionary in insertion order. First item inserted, second item inserted, etc.
>>> d = OrderedDict()
>>> d['foo'] = 1
>>> d['bar'] = 2
>>> d
OrderedDict([('foo', 1), ('bar', 2)])
>>> d = OrderedDict()
>>> d['bar'] = 2
>>> d['foo'] = 1
>>> d
OrderedDict([('bar', 2), ('foo', 1)])
Hence I won't really be able to sort the dictionary in place, but merely create a new dictionary where the insertion order matches the key order. This is explicit in the accepted answer, where the new dictionary is b.
This may be important if you are keeping access to dictionaries through containers. It is also important if you intend to change the dictionary later by adding or removing items: they won't be inserted in key order but at the end of the dictionary.
>>> d = OrderedDict({'foo': 5, 'bar': 8})
>>> d
OrderedDict([('foo', 5), ('bar', 8)])
>>> d['alpha'] = 2
>>> d
OrderedDict([('foo', 5), ('bar', 8), ('alpha', 2)])
Now, what does it mean to have a dictionary sorted by its keys? It makes no difference when accessing elements by key; it only matters when you are iterating over the items. Making that a property of the dictionary itself seems like overkill. In many cases it's enough to sort keys() when iterating.
That means that it's equivalent to do:
>>> d = {'foo': 5, 'bar': 8}
>>> for k, v in d.items(): print(k, v)
on a hypothetical sorted-by-key dictionary, or:
>>> d = {'foo': 5, 'bar': 8}
>>> for k, v in ((k, d[k]) for k in sorted(d.keys())): print(k, v)
Of course it is not hard to wrap that behavior in an object by overloading iterators and maintaining a sorted keys list. But it is likely overkill.
Sorting dictionaries by value using comprehensions. I think it's nice as 1 line and no need for functions or lambdas
a = {'b':'foo', 'c':'bar', 'e': 'baz'}
a = {f:a[f] for f in sorted(a, key=a.__getitem__)}
Easy and straightforward way:
import operator
op = {'1': (1,0,6),'3': (0,45,8),'2': (2,34,10)}
lp3 = sorted(op.items(), key=operator.itemgetter(0), reverse=True)
print(lp3)
ref: https://blog.csdn.net/weixin_37922873/article/details/81210032
I'd like to find the amount of values within sequences of the same value from a list:
list = ['A','A','A','B','B','C','A','A']
The result should look like:
result_dic = {A: [3,2], B: [2], C: [1]}
I do not just want the counts of different values in a list as you can see in the result for A.
collections.defaultdict and itertools.groupby
from itertools import groupby
from collections import defaultdict
listy = ['A','A','A','B','B','C','A','A']
d = defaultdict(list)
for k, v in groupby(listy):
    d[k].append(len([*v]))
d
defaultdict(list, {'A': [3, 2], 'B': [2], 'C': [1]})
groupby will loop through an iterable and lump contiguous things together.
[(k, [*v]) for k, v in groupby(listy)]
[('A', ['A', 'A', 'A']), ('B', ['B', 'B']), ('C', ['C']), ('A', ['A', 'A'])]
So I loop through those results and append the length of each grouped thing to the values of a defaultdict
I'd suggest using a defaultdict and looping through the list.
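from collections import defaultdict
sample = ['A','A','A','B','B','C','A','A']
result_dic = defaultdict(list)
last_letter = None
num = 0
for l in sample:
    if last_letter == l or last_letter is None:
        num += 1
    else:
        result_dic[last_letter].append(num)
        num = 1  # start counting the new run
    last_letter = l
if last_letter is not None:
    result_dic[last_letter].append(num)  # record the final run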
Edit
This is my approach, although I'd have a look at @piRSquared's answer because they were keen enough to include groupby as well. Nice work!
I'd suggest looping through the list.
result_dic = {}
old_word = ''
for word in list:
    if word not in result_dic:
        result_dic[word] = [1]
    elif word == old_word:
        result_dic[word][-1] += 1
    else:
        result_dic[word].append(1)
    old_word = word
Given a dictionary like so:
my_map = {'a': 1, 'b': 2}
How can one invert this map to get:
inv_map = {1: 'a', 2: 'b'}
Python 3+:
inv_map = {v: k for k, v in my_map.items()}
Python 2:
inv_map = {v: k for k, v in my_map.iteritems()}
Assuming that the values in the dict are unique:
Python 3:
dict((v, k) for k, v in my_map.items())
Python 2:
dict((v, k) for k, v in my_map.iteritems())
If the values in my_map aren't unique:
Python 3:
inv_map = {}
for k, v in my_map.items():
    inv_map[v] = inv_map.get(v, []) + [k]
Python 2:
inv_map = {}
for k, v in my_map.iteritems():
    inv_map[v] = inv_map.get(v, []) + [k]
To do this while preserving the type of your mapping (assuming that it is a dict or a dict subclass):
def inverse_mapping(f):
    return f.__class__(map(reversed, f.items()))
Try this:
inv_map = dict(zip(my_map.values(), my_map.keys()))
(Note that the Python docs on dictionary views explicitly guarantee that .keys() and .values() have their elements in the same order, which allows the approach above to work.)
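A small sketch of that guarantee, assuming the dictionary is not modified between the two calls:

```python
my_map = {'a': 1, 'b': 2, 'c': 3}

# keys() and values() iterate in the same order, so zipping them
# reproduces the pairs that items() would give.
pairs = list(zip(my_map.keys(), my_map.values()))
same_as_items = pairs == list(my_map.items())

inv_map = dict(zip(my_map.values(), my_map.keys()))
print(same_as_items)  # True
print(inv_map)        # {1: 'a', 2: 'b', 3: 'c'}
```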
Alternatively:
inv_map = dict((my_map[k], k) for k in my_map)
or using python 3.0's dict comprehensions
inv_map = {my_map[k] : k for k in my_map}
Another, more functional, way:
my_map = { 'a': 1, 'b':2 }
dict(map(reversed, my_map.items()))
We can also reverse a dictionary with duplicate values using defaultdict:
from collections import Counter, defaultdict
def invert_dict(d):
    d_inv = defaultdict(list)
    for k, v in d.items():
        d_inv[v].append(k)
    return d_inv
text = 'aaa bbb ccc ddd aaa bbb ccc aaa'
c = Counter(text.split()) # Counter({'aaa': 3, 'bbb': 2, 'ccc': 2, 'ddd': 1})
dict(invert_dict(c)) # {1: ['ddd'], 2: ['bbb', 'ccc'], 3: ['aaa']}
See here:
This technique is simpler and faster than an equivalent technique using dict.setdefault().
This expands upon the answer by Robert, applying to when the values in the dict aren't unique.
class ReversibleDict(dict):
    # Ref: https://stackoverflow.com/a/13057382/
    def reversed(self):
        """
        Return a reversed dict, with common values in the original dict
        grouped into a list in the returned dict.
        Example:
        >>> d = ReversibleDict({'a': 3, 'c': 2, 'b': 2, 'e': 3, 'd': 1, 'f': 2})
        >>> d.reversed()
        {1: ['d'], 2: ['c', 'b', 'f'], 3: ['a', 'e']}
        """
        revdict = {}
        for k, v in self.items():
            revdict.setdefault(v, []).append(k)
        return revdict
The implementation is limited in that you cannot use reversed twice and get the original back. It is not symmetric as such. It is tested with Python 2.6. Here is a use case of how I am using to print the resultant dict.
If you'd rather use a set than a list, and there could exist unordered applications for which this makes sense, instead of setdefault(v, []).append(k), use setdefault(v, set()).add(k).
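As a standalone sketch of the set-based variant (reversed_as_sets is a hypothetical helper name, written as a plain function rather than a method):

```python
def reversed_as_sets(d):
    # Group the keys that share a value into a set instead of a list.
    revdict = {}
    for k, v in d.items():
        revdict.setdefault(v, set()).add(k)
    return revdict


result = reversed_as_sets({'a': 3, 'c': 2, 'b': 2, 'e': 3, 'd': 1, 'f': 2})
print(result[2] == {'b', 'c', 'f'})  # True
```

Sets have no defined element order, so the printed form of the full result can vary between runs.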
Combination of list and dictionary comprehension. Can handle duplicate values:
{v:[i for i in d.keys() if d[i] == v ] for k,v in d.items()}
A case where the dictionary values are sets, like:
some_dict = {"1":{"a","b","c"},
"2":{"d","e","f"},
"3":{"g","h","i"}}
The inverse would look like:
some_dict = {vi: k for k, v in some_dict.items() for vi in v}
The output is like this:
{'c': '1',
'b': '1',
'a': '1',
'f': '2',
'd': '2',
'e': '2',
'g': '3',
'h': '3',
'i': '3'}
For instance, you have the following dictionary:
my_dict = {'a': 'fire', 'b': 'ice', 'c': 'fire', 'd': 'water'}
And you wanna get it in such an inverted form:
inverted_dict = {'fire': ['a', 'c'], 'ice': ['b'], 'water': ['d']}
First Solution. For inverting key-value pairs in your dictionary use a for-loop approach:
# Use this code to invert dictionaries that have non-unique values
inverted_dict = dict()
for key, value in my_dict.items():
    inverted_dict.setdefault(value, list()).append(key)
Second Solution. Use a dictionary comprehension approach for inversion:
# Use this code to invert dictionaries that have unique values
inverted_dict = {value: key for key, value in my_dict.items()}
Third Solution. To revert the inversion, expand each list of values back into single entries (a nested variant of the second solution):
# Use this code to invert dictionaries that have lists of values
my_dict = {value: key for key in inverted_dict for value in inverted_dict[key]}
Lot of answers but didn't find anything clean in case we are talking about a dictionary with non-unique values.
A solution would be:
from collections import defaultdict
inv_map = defaultdict(list)
for k, v in my_map.items():
    inv_map[v].append(k)
Example:
If initial dict my_map = {'c': 1, 'd': 5, 'a': 5, 'b': 10}
then, running the code above will give:
{5: ['a', 'd'], 1: ['c'], 10: ['b']}
I found that this version is more than 10% faster than the accepted version for a dictionary with 10000 keys.
d = {i: str(i) for i in range(10000)}
new_d = dict(zip(d.values(), d.keys()))
In addition to the other functions suggested above, if you like lambdas:
invert = lambda mydict: {v:k for k, v in mydict.items()}
Or, you could do it this way too:
invert = lambda mydict: dict( zip(mydict.values(), mydict.keys()) )
I think the best way to do this is to define a class. Here is an implementation of a "symmetric dictionary":
class SymDict:
    def __init__(self):
        self.aToB = {}
        self.bToA = {}
    def assocAB(self, a, b):
        # Stores the pair and returns a tuple (currA, currB) of overwritten bindings
        currB = None
        if a in self.aToB: currB = self.aToB[a]
        currA = None
        if b in self.bToA: currA = self.bToA[b]
        self.aToB[a] = b
        self.bToA[b] = a
        return (currA, currB)
    def lookupA(self, a):
        if a in self.aToB:
            return self.aToB[a]
        return None
    def lookupB(self, b):
        if b in self.bToA:
            return self.bToA[b]
        return None
Deletion and iteration methods are easy enough to implement if they're needed.
This implementation is way more efficient than inverting an entire dictionary (which seems to be the most popular solution on this page). Not to mention, you can add or remove values from your SymDict as much as you want, and your inverse-dictionary will always stay valid -- this isn't true if you simply reverse the entire dictionary once.
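For the inverse to stay valid when a pair is re-associated, the old binding in the other direction has to be dropped as well. The sketch below shows one way this could be handled; BiMap and its method names are hypothetical and not part of the class above.

```python
class BiMap:
    """Hypothetical two-way map whose two directions always stay consistent."""

    def __init__(self):
        self._fwd = {}  # a -> b
        self._bwd = {}  # b -> a

    def assoc(self, a, b):
        # Drop any bindings that the new pair would otherwise leave stale.
        if a in self._fwd:
            del self._bwd[self._fwd[a]]
        if b in self._bwd:
            del self._fwd[self._bwd[b]]
        self._fwd[a] = b
        self._bwd[b] = a

    def lookup_a(self, a):
        return self._fwd.get(a)

    def lookup_b(self, b):
        return self._bwd.get(b)


m = BiMap()
m.assoc('x', 1)
m.assoc('y', 2)
m.assoc('x', 2)  # rebinding 'x' evicts both ('x', 1) and ('y', 2)
print(m.lookup_a('x'), m.lookup_b(2), m.lookup_a('y'), m.lookup_b(1))  # 2 x None None
```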
If the values aren't unique, and you're a little hardcore:
inv_map = dict(
    (v, [k for (k, xx) in filter(lambda kv: kv[1] == v, my_map.items())])
    for v in set(my_map.values())
)
Especially for a large dict, note that this solution is far less efficient than the answer Python reverse / invert a mapping because it loops over items() multiple times.
This handles non-unique values and retains much of the look of the unique case.
inv_map = {v:[k for k in my_map if my_map[k] == v] for v in my_map.itervalues()}
For Python 3.x, replace itervalues with values.
I am aware that this question already has many good answers, but I wanted to share this very neat solution that also takes care of duplicate values:
def dict_reverser(d):
    seen = set()
    # keep only the first key for each value; seen.add(v) records the value
    return {v: k for k, v in d.items() if v not in seen and not seen.add(v)}
This relies on the fact that set.add always returns None in Python.
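To see how that idiom behaves, here is a small sketch; keep_first_key is a made-up name. Because set.add returns None, `not seen.add(v)` is always true, and it is only evaluated for values that have not been seen yet, so the first key for each value wins:

```python
def keep_first_key(d):
    seen = set()
    # For a new value: the membership test passes, then add() records it.
    # For a repeated value: the membership test fails and the pair is skipped.
    return {v: k for k, v in d.items() if v not in seen and not seen.add(v)}


print(set().add(1))                               # None
print(keep_first_key({'a': 1, 'b': 2, 'c': 1}))   # {1: 'a', 2: 'b'}
```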
Here is another way to do it.
my_map = {'a': 1, 'b': 2}
inv_map= {}
for key in my_map.keys():
    val = my_map[key]
    inv_map[val] = key
dict([(value, key) for key, value in d.items()])
The function is symmetric for values of type list; tuples are converted to lists when performing reverse_dict(reverse_dict(dictionary)).
def reverse_dict(dictionary):
    reverse_dict = {}
    for key, value in dictionary.items():  # use iteritems() on Python 2
        if not isinstance(value, (list, tuple)):
            value = [value]
        for val in value:
            reverse_dict[val] = reverse_dict.get(val, [])
            reverse_dict[val].append(key)
    for key, value in reverse_dict.items():
        if len(value) == 1:
            reverse_dict[key] = value[0]
    return reverse_dict
Since dictionary keys must be unique while values need not be, the keys that share a value have to be collected into a list stored under that value in the inverted dictionary.
def r_maping(dictionary):
    Map = {}
    for z, x in dictionary.items():  # iterate through the keys and values
        # setdefault returns the list stored under x, first storing a new
        # empty list if x is not yet a key; we then append the original key.
        # A fresh list per key is needed, otherwise all keys share one list.
        Map.setdefault(x, []).append(z)
    return Map
Fast functional solution for non-bijective maps (values not unique):
from itertools import groupby  # on Python 2, also import imap and use it instead of map
def fst(s):
    return s[0]
def snd(s):
    return s[1]
def inverseDict(d):
    """
    input d: a -> b
    output : b -> set(a)
    """
    return {
        v: set(map(fst, kv_iter))
        for (v, kv_iter) in groupby(
            sorted(d.items(),
                   key=snd),
            key=snd
        )
    }
In theory this should be faster than adding to the set (or appending to the list) one by one like in the imperative solution.
Unfortunately the values have to be sortable, the sorting is required by groupby.
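When the values cannot be sorted (for example, mixed types on Python 3), a fallback that only needs hashable values is to accumulate into sets directly. This is a sketch of the imperative alternative, not the groupby approach; inverse_dict_unsorted is a hypothetical name.

```python
from collections import defaultdict

def inverse_dict_unsorted(d):
    # Needs hashable values only; no ordering between values is required.
    inv = defaultdict(set)
    for k, v in d.items():
        inv[v].add(k)
    return dict(inv)


mixed = {'a': 1, 'b': 'x', 'c': 1}
try:
    sorted(mixed.values())
    sortable = True
except TypeError:
    sortable = False  # int and str cannot be ordered against each other

inverted = inverse_dict_unsorted(mixed)
print(sortable)                     # False
print(inverted[1] == {'a', 'c'})    # True
```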
Try this for python 2.7/3.x
inv_map = {}
for i in my_map:
    inv_map[my_map[i]] = i
print(inv_map)
def invertDictionary(d):
    myDict = {}
    for i in d:
        value = d.get(i)
        myDict.setdefault(value, []).append(i)
    return myDict
print(invertDictionary({'a':1, 'b':2, 'c':3 , 'd' : 1}))
This will provide output as : {1: ['a', 'd'], 2: ['b'], 3: ['c']}
A lambda solution for current python 3.x versions:
d1 = dict(alice='apples', bob='bananas')
d2 = dict(map(lambda key: (d1[key], key), d1.keys()))
print(d2)
Result:
{'apples': 'alice', 'bananas': 'bob'}
This solution does not check for duplicates.
Some remarks:
The lambda construct can access d1 from the outer scope, so we only
pass in the current key. It returns a tuple.
The dict() constructor accepts a list of tuples. It
also accepts the result of a map, so we can skip the conversion to a
list.
This solution has no explicit for loop. It also avoids using a list comprehension for those who are bad at math ;-)
Taking up the highly voted answer starting If the values in my_map aren't unique:, I had a problem where not only the values were not unique, but in addition, they were a list, with each item in the list consisting again of a list of three elements: a string value, a number, and another number.
Example:
my_map['key1'] gives you:
[('xyz', 1, 2),
('abc', 5, 4)]
I wanted to switch only the string value with the key, keeping the two number elements at the same place. You simply need another nested for loop then:
inv_map = {}
for k, v in my_map.items():
    for x in v:
        # x[1:3] is the same as (x[1], x[2]); prepend the original key
        inv_map[x[0]] = inv_map.get(x[0], []) + [(k,) + tuple(x[1:3])]
Example:
inv_map['abc'] now gives you:
[('key1', 5, 4)]
This works even if you have non-unique values in the original dictionary.
def dict_invert(d):
    '''
    d: dict
    Returns an inverted dictionary
    '''
    # Your code here
    inv_d = {}
    for k, v in d.items():
        if v not in inv_d.keys():
            inv_d[v] = [k]
        else:
            inv_d[v].append(k)
        inv_d[v].sort()
        print(f"{inv_d[v]} are the values")
    return inv_d
I would do it that way in python 2.
inv_map = {my_map[x] : x for x in my_map}
Not something completely different, just a slightly rewritten recipe from the Cookbook. It is furthermore optimized by binding the setdefault method once, instead of looking it up on the instance each time:
def inverse(mapping):
    '''
    A function to inverse mapping, collecting keys with similar values
    in list. Careful to retain original type and to be fast.
    >> d = dict(a=1, b=2, c=1, d=3, e=2, f=1, g=5, h=2)
    >> inverse(d)
    {1: ['f', 'c', 'a'], 2: ['h', 'b', 'e'], 3: ['d'], 5: ['g']}
    '''
    res = {}
    setdef = res.setdefault
    for key, value in mapping.items():
        setdef(value, []).append(key)
    return res if mapping.__class__==dict else mapping.__class__(res)
Designed to be run under CPython 3.x, for 2.x replace mapping.items() with mapping.iteritems()
On my machine it runs a bit faster than the other examples here.
I would like to concatenate a list of strings into new strings grouped over values in a list. Here is an example of what I mean:
Input
key = ['1','2','2','3']
data = ['a','b','c','d']
Result
newkey = ['1','2','3']
newdata = ['a','b c','d']
I understand how to join text. But I don't know how to iterate correctly over the values of the list to aggregate the strings that are common to the same key value.
Any help or suggestions appreciated. Thanks.
from collections import defaultdict
d = defaultdict(list)
for k, v in zip(key, data):
    d[k].append(v)
print([(k, ' '.join(v)) for k, v in d.items()])
Output:
[('1', 'a'), ('3', 'd'), ('2', 'b c')]
And how to get new lists:
newkey, newvalue = d.keys(), [' '.join(v) for v in d.values()]
And with saved order:
newkey, newvalue = zip(*[(k, ' '.join(d.pop(k))) for k in key if k in d])
Use the itertools.groupby() function to combine elements; zip will let you group two input lists into two output lists:
import itertools
import operator
newkey, newdata = [], []
for key, items in itertools.groupby(zip(key, data), key=operator.itemgetter(0)):
    # key is the grouped key, items an iterable of key, data pairs
    newkey.append(key)
    newdata.append(' '.join(d for k, d in items))
You can turn this into a list comprehension with a bit more zip() magic:
from itertools import groupby
from operator import itemgetter
newkey, newdata = zip(*[(k, ' '.join(d for _, d in it)) for k, it in groupby(zip(key, data), key=itemgetter(0))])
Note that this does require the input to be sorted; groupby only groups elements based on the consecutive keys being the same. On the other hand, it does preserve that initial sorted order.
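If the input is not already grouped, one option is to sort the pairs by key first. A minimal sketch, using made-up unsorted input:

```python
from itertools import groupby
from operator import itemgetter

key = ['2', '1', '2', '3']   # deliberately unsorted for this example
data = ['b', 'a', 'c', 'd']

# sorted() is stable, so values sharing a key keep their original order.
pairs = sorted(zip(key, data), key=itemgetter(0))
grouped = [(k, ' '.join(d for _, d in group))
           for k, group in groupby(pairs, key=itemgetter(0))]
newkey, newdata = zip(*grouped)

print(newkey)   # ('1', '2', '3')
print(newdata)  # ('a', 'b c', 'd')
```

Note that the keys here are strings, so they sort lexicographically ('10' would come before '2').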
you can use itertools.groupby() on zip(key,data):
In [128]: from itertools import *
In [129]: from operator import *
In [133]: lis=[(k," ".join(x[1] for x in g)) for k,g in groupby(zip(key,data),key=itemgetter(0))]
In [134]: newkey,newdata=zip(*lis)
In [135]: newkey
Out[135]: ('1', '2', '3')
In [136]: newdata
Out[136]: ('a', 'b c', 'd')
If you don't feel like importing collections you can always use a regular dictionary.
key = ['1','2','2','3']
data = ['a','b','c','d']
newkeydata = {}
for k,d in zip(key,data):
    # list.append returns None, so its result must not be assigned back
    newkeydata.setdefault(k, []).append(d)
Just for the sake of variety, here is a solution that works without any external libraries and without dictionaries:
def group_vals(keys, vals):
    new_keys, new_vals = [], []
    for key, val in zip(keys, vals):
        if new_keys and new_keys[-1] == key:
            # same key as the previous element: extend the current group
            new_vals[-1] += ' ' + val
        else:
            new_keys.append(key)
            new_vals.append(val)
    return new_keys, new_vals
group_vals([1,2,2,3], ['a','b','c','d'])
# --> ([1, 2, 3], ['a', 'b c', 'd'])
But I know that it's quite ugly and probably not as performant as the other solutions. Just for demonstration purposes. :)
I'm supposed to program in Python, and I've only used Python for 3 weeks. I have to solve all kinds of problems and write functions as training. For one of my functions I use this line.
theDict = dict( [(k,v) for k,v in theDict.items() if len(v)>0])
However I can't use anything I don't fully understand or can't fully explain. I understand the gist of the line, but I can't really explain it. So my instructor told me that to use this, I must either learn everything about tuples and fully understand list comprehensions, or write it in pure Python.
The line basically looks into a dictionary, and inside the dictionary, its supposed to look for values that are equal to empty lists and delete those keys/values.
So, my question is, what would this line look like in pure, non list comprehension python?
I'll attempt to write it because I want to try my best, and this isn't a website where you get free answers, but you guys correct me and help me finish it if it doesn't work.
Also another problem is that, the empty lists inside the 'value' of the dictionary, if they are empty, then they won't be processed inside the loop. The loop is supposed to delete the key that is equal to the empty value. So how are you supposed to check if the list is empty, if the check is inside the loop, and the loop won't have the empty array in its body?
for key, value in theDict.items():  # I need to add 'if value:' somewhere,
    # but I don't know how to add it to make it work, because
    # this checks if the value exists or not, but if the value
    # doesn't exist, then it won't go through this area, so
    # there is no way to see if the value exists or not.
    theDict[key] = value
If there is a better method to remove dictionary values that have a value of an empty list. please let me know.
And how will
theDict = dict( [(k,v) for k,v in theDict.items() if len(v)>0])
look like if it didn't use a generator?
result = dict([(k,v) for k,v in theDict.items() if len(v)>0])
will look like this (if you want a new dictionary):
result = {}
for key, value in theDict.items():
    if len(value) > 0:
        result[key] = value
If you want to modify the existing dictionary:
for key, value in list(theDict.items()):  # copy, so deleting while looping is safe
    if not len(value) > 0:
        del theDict[key]
if v checks whether v has some value; if v doesn't have any value (for example, an empty string or an empty list), control will not enter the condition and the value is skipped.
In [25]: theDict={'1':'2','3':'', '4':[]}
In [26]: for k,v in theDict.items():
   ....:     if v:
   ....:         newDict[k]=v
....:
In [27]: newDict
Out[27]: {'1': '2'}
==========================
In [2]: theDict = { 1: ['e', 'f'], 2: ['a', 'b', 'c'], 4: ['d', ' '], 5: [] }
In [3]: newDict = {}
In [4]: for k,v in theDict.items():
   ...:     if v:
   ...:         newDict[k]=v
...:
In [5]: newDict
Out[5]: {1: ['e', 'f'], 2: ['a', 'b', 'c'], 4: ['d', ' ']}
Updated the answer as per your input...
Just for fun:
from operator import itemgetter
theDict = dict(filter(itemgetter(1), theDict.items()))
To remove an element from a dictionary, you can use the del keyword:
>>> d = {1: 2, 3: 4}
>>> d
{1: 2, 3: 4}
>>> del d[1]
>>> d
{3: 4}
>>>
This will probably be more efficient than generating a completely new dictionary. Then, you can use a similar structure to above:
for k in list(theDict):  # iterate over a copy of the keys; deleting while iterating the dict itself fails
    if len(theDict[k]) == 0:
        del theDict[k]
Does that make sense?
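One caveat worth checking on your own interpreter: in Python 3, deleting entries while looping directly over the dict typically raises RuntimeError ("dictionary changed size during iteration"). A minimal sketch of the safe pattern, using made-up sample data, iterates over a snapshot of the keys:

```python
theDict = {1: ['e', 'f'], 2: [], 4: ['d'], 5: []}

# list(theDict) copies the keys up front, so deletions do not disturb the loop.
for k in list(theDict):
    if len(theDict[k]) == 0:
        del theDict[k]

print(theDict)  # {1: ['e', 'f'], 4: ['d']}
```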
theDict = dict( [(k,v) for k,v in theDict.items() if len(v)>0])
However I can't use anything I don't fully understand or can't fully explain. I understand the gist of the line, but, I can't really explain it.
Background
The easiest way to understand or demo behaviour in python is using the interactive interpreter:
python -i
In the interactive interpreter there are two fabulously useful commands:
dir - takes an optional argument of an object, returns a list of the attributes on the object.
help - accesses inline documentation
You can use dir to find out, for example what methods an object has and then look at their documentation using help.
Explaining the line in question
Here's a sample dictionary:
>>> theDict = dict(a=[1,2],b=[3,4],c=[])
>>> theDict
{'a': [1, 2], 'c': [], 'b': [3, 4]}
The list comprehension returns a list of key-value pairs as tuples:
>>> [(k,v) for k,v in theDict.items()]
[('a', [1, 2]), ('c', []), ('b', [3, 4])]
The if statement filters the resulting list.
>>> [(k,v) for k,v in theDict.items() if len(v) > 0]
[('a', [1, 2]), ('b', [3, 4])]
The dict can be instantiated with a sequence of key-value pairs:
>>> dict([(k,v) for k,v in theDict.items() if len(v) > 0])
{'a': [1, 2], 'b': [3, 4]}
Putting it all together:
>>> theDict = dict(a=[1,2],b=[3,4],c=[])
>>> theDict
{'a': [1, 2], 'c': [], 'b': [3, 4]}
>>> theDict = dict([(k,v) for k,v in theDict.items() if len(v) > 0])
>>> theDict
{'a': [1, 2], 'b': [3, 4]}
The original dict object is replaced with a new one, instantiated from the list of its key-value pairs as filtered by the list comprehension.
If you follow all this (and play with it yourself in the interactive interpreter) you will understand what's going on in this line of code you've asked about.
What you probably want is a defaultdict with a list as the empty value.
Here's your function in a more or less readable way:
def clean_whitespace(dct):
    out = {}
    for key, val in dct.items():
        val = map(str.strip, val)
        val = list(filter(None, val))  # list() so the emptiness test also works on Python 3
        if val:
            out[key] = val
    return out
or, using comprehensions,
def clean_whitespace(dct):
    out = {}
    for key, val in dct.items():
        val = [x.strip() for x in val]
        val = [x for x in val if x]
        if val:
            out[key] = val
    return out
Let us know if you need comments or explanations.
the solution was under my nose. sorry guys. thank you for all your help +1 for everyone
def CleanWhiteSpace(theDict):
    for k, v in list(theDict.items()):  # iterate over a copy so deleting is safe
        if not v:
            del theDict[k]
    return theDict