I have this list made from a csv which is massive.
For every item in the list, I have broken it into its id and details. The id is at most 3 characters long, and details is of variable length.
I created an empty dictionary, D...(rest of code below):
D = {}
for v in list:
    id = v[0:3]
    details = v[3:]
    if id not in D:
        D[id] = {}
    if details not in D[id]:
        D[id][details] = 0
    D[id][details] += 1
aside: Can you help me understand what the two if statements are doing? Very new to python and programming.
Anyway, it produces something like this:
{'KEY1_1': {'key2_1' : value2_1, 'key2_2' : value2_2, 'key2_3' : value2_3},
'KEY1_2': {'key2_1' : value2_1, 'key2_2' : value2_2, 'key2_3' : value2_3},
and many more KEY1's with variable numbers of key2's
Each 'KEY1' is unique but each 'key2' isn't necessarily. The value2_x's are all different.
Ok so, right now I found a way to sort by the first KEY
for k, v in sorted(D.items()):
    print k, ':', v
I have done enough research to know that dictionaries can't really be sorted but I don't care about sorting, I care about ordering or more specifically frequencies of occurrence. In my code value2_x is the number of times its corresponding key2_x occurs for that particular KEY1_x. I am starting to think I should have used better variable names.
Question: How do I order the top-level/overall dictionary by the number in value2_x which is in the nested dictionary? I want to do some statistics to those numbers like...
How many times does the most frequent KEY1_x:key2_x pair show up?
What are the 10, 20, 30 most frequent KEY1_x:key2_x pairs?
Can I only do that by each KEY1 or can I do it overall? Bonus: If I could order it that way for presentation/sharing that would be very helpful because it is such a large data set. So much thanks in advance and I hope I've made my question and intent clear.
You could use Counter to order the key pairs based on their frequency. It also provides an easy way to get x most frequent items:
from collections import Counter
d = {
'KEY1': {
'key2_1': 5,
'key2_2': 1,
'key2_3': 3
},
'KEY2': {
'key2_1': 2,
'key2_2': 3,
'key2_3': 4
}
}
c = Counter()
for k, v in d.iteritems():
    c.update({(k, k1): v1 for k1, v1 in v.iteritems()})
print c.most_common(3)
Output:
[(('KEY1', 'key2_1'), 5), (('KEY2', 'key2_3'), 4), (('KEY2', 'key2_2'), 3)]
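The snippet above is Python 2 (`iteritems`, `print` statement). A minimal Python 3 sketch of the same flattening step, assuming the same sample data, could look like this. When two pairs have the same count (here two pairs have a count of 3), which one `most_common` lists first may differ between Python versions, so only the counts are shown as certain.

```python
from collections import Counter

d = {
    'KEY1': {'key2_1': 5, 'key2_2': 1, 'key2_3': 3},
    'KEY2': {'key2_1': 2, 'key2_2': 3, 'key2_3': 4},
}

# Flatten the nested dict into {(outer_key, inner_key): count}.
c = Counter({
    (outer, inner): count
    for outer, nested in d.items()
    for inner, count in nested.items()
})

top3 = c.most_common(3)
print(top3)

# The counts of the three most frequent pairs are 5, 4 and 3.
top3_counts = [count for _pair, count in top3]
```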
If you only care about the most common key pairs and have no other reason to build nested dictionary you could just use the following code:
from collections import Counter
l = ['foobar', 'foofoo', 'foobar', 'barfoo']
D = Counter((v[:3], v[3:]) for v in l)
print D.most_common() # [(('foo', 'bar'), 2), (('foo', 'foo'), 1), (('bar', 'foo'), 1)]
Short explanation: ((v[:3], v[3:]) for v in l) is a generator expression that will generate tuples where first item is the same as top level key in your original dict and second item is the same as key in nested dict.
>>> x = list((v[:3], v[3:]) for v in l)
>>> x
[('foo', 'bar'), ('foo', 'foo'), ('foo', 'bar'), ('bar', 'foo')]
Counter is a subclass of dict. It accepts an iterable as an argument: each unique element of the iterable becomes a key, and its value is the number of times that element occurs in the iterable.
>>> c = Counter(x)
>>> c
Counter({('foo', 'bar'): 2, ('foo', 'foo'): 1, ('bar', 'foo'): 1})
Since a generator expression is itself an iterable, there's no need to convert it to a list first, so the Counter can be built directly with Counter((v[:3], v[3:]) for v in l).
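To connect this back to the original questions ("how often does the most frequent pair show up?" and "per KEY1 or overall?"), here is a hedged Python 3 sketch. The names `rows`, `per_id` and `best_per_id` are made up for illustration; the sample data is the list used above.

```python
from collections import Counter, defaultdict

rows = ['foobar', 'foofoo', 'foobar', 'barfoo']

# Overall frequencies of (id, details) pairs.
pair_counts = Counter((v[:3], v[3:]) for v in rows)

# Overall: the single most frequent pair and its count.
top_pair, top_count = pair_counts.most_common(1)[0]

# Per id: regroup the same counts into one Counter per id.
per_id = defaultdict(Counter)
for (id_, details), n in pair_counts.items():
    per_id[id_][details] = n

# Most frequent details for each id, as (details, count).
best_per_id = {id_: counts.most_common(1)[0] for id_, counts in per_id.items()}

print(top_pair, top_count)
print(best_per_id)
```

For the top 10, 20 or 30 pairs overall, `pair_counts.most_common(10)` and so on would be the analogous call.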
The if statements you asked about are checking if the key exists in dict:
>>> d = {1: 'foo'}
>>> 1 in d
True
>>> 2 in d
False
So the following code will check if key with value of id exists in dict D and if it doesn't it will assign empty dict there.
if id not in D:
    D[id] = {}
The second if does exactly the same for nested dictionaries.
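As a rough sketch (assuming Python 3), both `if` checks can be replaced by a nested `collections.defaultdict`, which creates the missing entry the first time a key is accessed. The function name `count_pairs` and the sample rows are illustrative and not part of the original code.

```python
from collections import defaultdict

def count_pairs(rows):
    """Count how often each details string occurs for each 3-character id."""
    # Outer default: a new inner dict; inner default: a count of 0.
    counts = defaultdict(lambda: defaultdict(int))
    for v in rows:
        counts[v[:3]][v[3:]] += 1
    # Convert back to plain dicts for printing and comparison.
    return {id_: dict(inner) for id_, inner in counts.items()}

result = count_pairs(['foobar', 'foofoo', 'foobar', 'barfoo'])
print(result)
```

The explicit `if` version and this one build the same nested structure; the `defaultdict` just hides the "does the key exist yet?" bookkeeping.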
I'm working on python 3.2.2.
I've been breaking my head for more than 3 hours trying to sort a dictionary by its keys.
I managed to make it a sorted list with 2 argument members, but can not make it a sorted dictionary in the end.
This is what I've figured:
myDic={10: 'b', 3:'a', 5:'c'}
sorted_list=sorted(myDic.items(), key=lambda x: x[0])
But no matter what I can not make a dictionary out of this sorted list. How do I do that? Thanks!
A modern and fast solution, for Python 3.7. May also work in some interpreters of Python 3.6.
TLDR
To sort a dictionary by keys use:
sorted_dict = {k: disordered[k] for k in sorted(disordered)}
Almost three times faster than the accepted answer; probably more when you include imports.
Comment on the accepted answer
The example in the accepted answer iterates over (key, value) tuples instead of over the keys only (via the key parameter of sorted() or the default behaviour of dict iteration). Surprisingly, this turns out to be much slower than comparing only the keys and looking up the dictionary elements in a comprehension.
How to sort by key in Python 3.7
The big change in Python 3.7 is that the dictionaries are now ordered by default.
You can generate sorted dict using dict comprehensions.
Using OrderedDict might still be preferable for compatibility's sake.
Do not use sorted(d.items()) without key.
See:
disordered = {10: 'b', 3: 'a', 5: 'c'}
# sort keys, then get values from original - fast
sorted_dict = {k: disordered[k] for k in sorted(disordered)}
# key = itemgetter - slower
from operator import itemgetter
key = itemgetter(0)
sorted_dict = {k: v for k, v in sorted(disordered.items(), key=key)}
# key = lambda - the slowest
key = lambda item: item[0]
sorted_dict = {k: v for k, v in sorted(disordered.items(), key=key)}
Timing results:
Best for {k: d[k] for k in sorted(d)}: 7.507327548999456
Best for {k: v for k, v in sorted(d.items(), key=key_getter)}: 12.031082626002899
Best for {k: v for k, v in sorted(d.items(), key=key_lambda)}: 14.22885995300021
Best for dict(sorted(d.items(), key=key_getter)): 11.209122000000207
Best for dict(sorted(d.items(), key=key_lambda)): 13.289728325995384
Best for dict(sorted(d.items())): 14.231471302999125
Best for OrderedDict(sorted(d.items(), key=key_getter)): 16.609151654003654
Best for OrderedDict(sorted(d.items(), key=key_lambda)): 18.52622927199991
Best for OrderedDict(sorted(d.items())): 19.436101284998585
Testing code:
from timeit import repeat
setup_code = """
from operator import itemgetter
from collections import OrderedDict
import random
random.seed(0)
d = {i: chr(i) for i in [random.randint(0, 120) for repeat in range(120)]}
key_getter = itemgetter(0)
key_lambda = lambda item: item[0]
"""
cases = [
# fast
'{k: d[k] for k in sorted(d)}',
'{k: v for k, v in sorted(d.items(), key=key_getter)}',
'{k: v for k, v in sorted(d.items(), key=key_lambda)}',
# slower
'dict(sorted(d.items(), key=key_getter))',
'dict(sorted(d.items(), key=key_lambda))',
'dict(sorted(d.items()))',
# the slowest
'OrderedDict(sorted(d.items(), key=key_getter))',
'OrderedDict(sorted(d.items(), key=key_lambda))',
'OrderedDict(sorted(d.items()))',
]
for code in cases:
    times = repeat(code, setup=setup_code, repeat=3)
    print(f"Best for {code}: {min(times)}")
dict does not keep its elements' order. What you need is an OrderedDict: http://docs.python.org/library/collections.html#collections.OrderedDict
edit
Usage example:
>>> from collections import OrderedDict
>>> a = {'foo': 1, 'bar': 2}
>>> a
{'foo': 1, 'bar': 2}
>>> b = OrderedDict(sorted(a.items()))
>>> b
OrderedDict([('bar', 2), ('foo', 1)])
>>> b['foo']
1
>>> b['bar']
2
I don't think you want an OrderedDict. It sounds like you'd prefer a SortedDict, that is, a dict that maintains its keys in sorted order. The sortedcontainers module provides just such a data type. It is written in pure Python yet is as fast as C implementations, has 100% test coverage, and has undergone hours of stress testing.
Installation is easy with pip:
pip install sortedcontainers
Note that if you can't pip install then you can simply pull the source files from the open-source repository.
Then your code is simply:
from sortedcontainers import SortedDict
myDic = SortedDict({10: 'b', 3:'a', 5:'c'})
sorted_list = list(myDic.keys())
The sortedcontainers module also maintains a performance comparison with other popular implementations.
Python's ordinary dicts cannot be made to provide the keys/elements in any specific order. For that, you could use the OrderedDict type from the collections module. Note that the OrderedDict type merely keeps a record of insertion order. You would have to sort the entries prior to initializing the dictionary if you want subsequent views/iterators to return the elements in order every time. For example:
>>> myDic={10: 'b', 3:'a', 5:'c'}
>>> sorted_list=sorted(myDic.items(), key=lambda x: x[0])
>>> myOrdDic = OrderedDict(sorted_list)
>>> myOrdDic.items()
[(3, 'a'), (5, 'c'), (10, 'b')]
>>> myOrdDic[7] = 'd'
>>> myOrdDic.items()
[(3, 'a'), (5, 'c'), (10, 'b'), (7, 'd')]
If you want to maintain proper ordering for newly added items, you really need to use a different data structure, e.g., a binary tree/heap. This approach of building a sorted list and using it to initialize a new OrderedDict() instance is just woefully inefficient unless your data is completely static.
Edit: So, if the object of sorting the data is merely to print it in order, in a format resembling a python dict object, something like the following should suffice:
def pprint_dict(d):
    strings = []
    for k in sorted(d.iterkeys()):
        strings.append("%d: '%s'" % (k, d[k]))
    return '{' + ', '.join(strings) + '}'
Note that this function is not flexible w/r/t the types of the key, value pairs (i.e., it expects the keys to be integers and the corresponding values to be strings). If you need more flexibility, use something like strings.append("%s: %s" % (repr(k), repr(d[k]))) instead.
With Python 3.7 I could do this:
>>> myDic={10: 'b', 3:'a', 5:'c'}
>>> sortDic = sorted(myDic.items())
>>> print(dict(sortDic))
{3: 'a', 5: 'c', 10: 'b'}
If you want a list of tuples:
>>> myDic={10: 'b', 3:'a', 5:'c'}
>>> sortDic = sorted(myDic.items())
>>> print(sortDic)
[(3, 'a'), (5, 'c'), (10, 'b')]
Dictionaries are unordered by definition (before Python 3.7, which guarantees insertion order). What would be the main reason for ordering by key? A list of tuples created by sorted() can be used for whatever the need may have been, but in those older versions changing the list of tuples back into a dictionary returns the items in an arbitrary order:
>>> myDic
{10: 'b', 3: 'a', 5: 'c'}
>>> sorted(myDic.items())
[(3, 'a'), (5, 'c'), (10, 'b')]
>>> print(dict(myDic.items()))
{10: 'b', 3: 'a', 5: 'c'}
Maybe not that good but I've figured this:
def order_dic(dic):
    ordered_dic={}
    key_ls=sorted(dic.keys())
    for key in key_ls:
        ordered_dic[key]=dic[key]
    return ordered_dic
Any modern solution to this problem?
I worked around it with:
order = sorted([ job['priority'] for job in self.joblist ])
sorted_joblist = []
while order:
    min_priority = min(order)
    for job in self.joblist:
        if job['priority'] == min_priority:
            sorted_joblist += [ job ]
            order.remove(min_priority)
self.joblist = sorted_joblist
The joblist is formatted as:
joblist = [ { 'priority' : 3, 'name' : 'foo', ... }, { 'priority' : 1, 'name' : 'bar', ... } ]
Basically I create a list (order) with all the elements by which I want to sort the dict
then I iterate this list and the dict, when I find the item on the dict I send it to a new dict and remove the item from 'order'.
Seems to be working, but I suppose there are better solutions.
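One possible simplification, sketched under the assumption that each job is a dict with a 'priority' key as shown above: `sorted()` with a key function does the whole loop in one call. The sample jobs here are made up, and the elided fields (`...`) are left out.

```python
from operator import itemgetter

# Hypothetical sample data in the same shape as joblist above.
joblist = [
    {'priority': 3, 'name': 'foo'},
    {'priority': 1, 'name': 'bar'},
    {'priority': 2, 'name': 'baz'},
]

# sorted() returns a new list ordered by each job's 'priority' value.
sorted_joblist = sorted(joblist, key=itemgetter('priority'))

names_in_order = [job['name'] for job in sorted_joblist]
print(names_in_order)
```

Because `sorted()` is stable, jobs with equal priority keep their original relative order.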
I'm not sure whether this could help, but I had a similar problem and I managed to solve it, by defining an apposite function:
def sor_dic_key(diction):
    lista = []
    diction2 = {}
    for x in diction:
        lista.append([x, diction[x]])
    lista.sort(key=lambda x: x[0])
    for l in lista:
        diction2[l[0]] = l[1]
    return diction2
This function returns another dictionary with the same keys and relative values, but sorted by its keys.
Similarly, I defined a function that could sort a dictionary by its values. I just needed to use x[1] instead of x[0] in the lambda function. I find this second function mostly useless, but one never can tell!
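For reference, a compact sketch of the by-value variant described above. It assumes Python 3.7 or later, where a plain dict preserves insertion order; the name `sort_dict_by_value` and the sample data are illustrative only.

```python
def sort_dict_by_value(d):
    """Return a new dict whose items are inserted in ascending order of value."""
    # kv is a (key, value) pair; kv[1] selects the value as the sort key.
    return dict(sorted(d.items(), key=lambda kv: kv[1]))

scores = {'x': 3, 'y': 1, 'z': 2}
by_value = sort_dict_by_value(scores)
print(by_value)
```

Using `kv[0]` instead of `kv[1]` would give the by-key version.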
I like python numpy for this kind of stuff! eg:
r=readData()
nsorted = np.lexsort((r.calls, r.slow_requests, r.very_slow_requests, r.stalled_requests))
I have an example of importing CSV data into a numpy and ordering by column priorities.
https://github.com/unixunion/toolbox/blob/master/python/csv-numpy.py
Kegan
The accepted answer definitely works, but it somehow misses an important point.
The OP is asking for a dictionary sorted by its keys. That is not really possible, and it is not what OrderedDict does.
OrderedDict is maintaining the content of the dictionary in insertion order. First item inserted, second item inserted, etc.
>>> d = OrderedDict()
>>> d['foo'] = 1
>>> d['bar'] = 2
>>> d
OrderedDict([('foo', 1), ('bar', 2)])
>>> d = OrderedDict()
>>> d['bar'] = 2
>>> d['foo'] = 1
>>> d
OrderedDict([('bar', 2), ('foo', 1)])
Hence I won't really be able to sort the dictionary in place, but merely to create a new dictionary where insertion order matches key order. This is explicit in the accepted answer, where the new dictionary is b.
This may be important if you are keeping access to dictionaries through containers. It is also important if you intend to change the dictionary later by adding or removing items: they won't be inserted in key order but at the end of the dictionary.
>>> d = OrderedDict({'foo': 5, 'bar': 8})
>>> d
OrderedDict([('foo', 5), ('bar', 8)])
>>> d['alpha'] = 2
>>> d
OrderedDict([('foo', 5), ('bar', 8), ('alpha', 2)])
Now, what does it mean to have a dictionary sorted by its keys? It makes no difference when accessing elements by key; it only matters when you are iterating over items. Making that a property of the dictionary itself seems like overkill. In many cases it's enough to sort keys() when iterating.
That means that it's equivalent to do:
>>> d = {'foo': 5, 'bar': 8}
>>> for k,v in d.iteritems(): print k, v
on a hypothetical sorted-by-key dictionary, or:
>>> d = {'foo': 5, 'bar': 8}
>>> for k, v in iter((k, d[k]) for k in sorted(d.keys())): print k, v
Of course it is not hard to wrap that behavior in an object by overloading iterators and maintaining a sorted keys list. But it is likely overkill.
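As a rough illustration of such a wrapper (a sketch only, assuming Python 3; the class name `SortedKeyDict` is hypothetical and this is not a complete mapping implementation), a plain dict can be paired with a key list kept sorted via `bisect.insort`:

```python
import bisect

class SortedKeyDict:
    """Hypothetical wrapper: a dict plus a list of its keys kept in sorted order."""

    def __init__(self, items=()):
        self._data = {}
        self._keys = []
        for key, value in items:
            self[key] = value

    def __setitem__(self, key, value):
        # Only new keys need to be placed into the sorted key list.
        if key not in self._data:
            bisect.insort(self._keys, key)
        self._data[key] = value

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        # Iteration follows key order, not insertion order.
        return iter(self._keys)

    def items(self):
        return [(key, self._data[key]) for key in self._keys]

d = SortedKeyDict({10: 'b', 3: 'a', 5: 'c'}.items())
d[7] = 'd'  # lands between 5 and 10, not at the end
print(d.items())
```

Each insertion of a new key costs a list shift, so for large, frequently changing data a dedicated sorted container is likely a better fit.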
Sorting dictionaries by value using comprehensions. I think it's nice as 1 line and no need for functions or lambdas
a = {'b':'foo', 'c':'bar', 'e': 'baz'}
a = {f:a[f] for f in sorted(a, key=a.__getitem__)}
Easy and straightforward way:
import operator
op = {'1': (1,0,6),'3': (0,45,8),'2': (2,34,10)}
lp3 = sorted(op.items(), key=operator.itemgetter(0), reverse=True)
print(lp3)
ref: https://blog.csdn.net/weixin_37922873/article/details/81210032
Here's a minimal reproducible example:
from collections import OrderedDict
d = OrderedDict([('foo', 123),
('bar', 456)])
So I want to check if there's a foo key in d and if there's then I'd like to rewrite it as a single value of a list for a new hardcoded key:
print(d)
ordereddict([('bar', 456), ('newCoolHardcodedKey', [ordereddict([('foo', 123)])])])
You can use a generator expression (like a list comprehension, but it returns an iterator instead of storing a temporary list in memory) to do this:
d = OrderedDict(
(
("newCoolHardcodedKey", OrderedDict([item])) if item[0] == "foo" else item
for item in d.items()
)
)
print(d)
OrderedDict([('newCoolHardcodedKey', OrderedDict([('foo', 123)])), ('bar', 456)])
The dict being ordered, the new element is where foo was.
If you need the new element to go to the end, it might be easiest to test if d["foo"] exists, and if so append the new ordered dict with its hard-coded key and delete the original entry for foo:
if "foo" in d:
    d["newCoolHardcodedKey"] = OrderedDict([("foo", d["foo"])])
    del d["foo"]
print(d)
OrderedDict([('bar', 456), ('newCoolHardcodedKey', OrderedDict([('foo', 123)]))])
Performance considerations
If d is large in your real application, the second solution is much better since it changes d in place instead of making a copy.
Say I have a dictionary called word_counter_dictionary that counts how many words are in the document in the form {'word' : number}. For example, the word "secondly" appears one time, so the key/value pair would be {'secondly' : 1}. I want to make an inverted list so that the numbers will become keys and the words will become the values for those keys so I can then graph the top 25 most used words. I saw somewhere where the setdefault() function might come in handy, but regardless I cannot use it because so far in the class I am in we have only covered get().
inverted_dictionary = {}
for key in word_counter_dictionary:
    new_key = word_counter_dictionary[key]
    inverted_dictionary[new_key] = word_counter_dictionary.get(new_key, '') + str(key)
inverted_dictionary
So far, using this method above, it works fine until it reaches another word with the same value. For example, the word "saves" also appears once in the document, so Python will add the new key/value pair just fine. BUT it erases the {1 : 'secondly'} with the new pair so that only {1 : 'saves'} is in the dictionary.
So, bottom line, my goal is to get ALL of the words and their respective number of repetitions in this new dictionary called inverted_dictionary.
A defaultdict is perfect for this
word_counter_dictionary = {'first':1, 'second':2, 'third':3, 'fourth':2}
from collections import defaultdict
d = defaultdict(list)
for key, value in word_counter_dictionary.iteritems():
    d[value].append(key)
print(d)
Output:
defaultdict(<type 'list'>, {1: ['first'], 2: ['second', 'fourth'], 3: ['third']})
What you can do is convert the value in a list of words with the same key:
word_counter_dictionary = {'first':1, 'second':2, 'third':3, 'fourth':2}
inverted_dictionary = {}
for key in word_counter_dictionary:
    new_key = word_counter_dictionary[key]
    if new_key in inverted_dictionary:
        inverted_dictionary[new_key].append(str(key))
    else:
        inverted_dictionary[new_key] = [str(key)]
print inverted_dictionary
>>> {1: ['first'], 2: ['second', 'fourth'], 3: ['third']}
Python dicts do NOT allow repeated keys, so you can't use a simple dictionary to store multiple elements with the same key (1 in your case). For your example, I'd rather have a list as the value of your inverted dictionary, and store in that list the words that share the number of appearances, like:
inverted_dictionary = {}
for key in word_counter_dictionary:
    new_key = word_counter_dictionary[key]
    if new_key in inverted_dictionary:
        inverted_dictionary[new_key].append(key)
    else:
        inverted_dictionary[new_key] = [key]
In order to get the 25 most repeated words, you should iterate through the (sorted) keys in the inverted_dictionary and store the words:
common_words = []
for key in sorted(inverted_dictionary.keys(), reverse=True):
    if len(common_words) < 25:
        common_words.extend(inverted_dictionary[key])
    else:
        break
common_words = common_words[:25] # In case there are more than 25 words
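Since the question says only get() has been covered in class, the same inversion can also be sketched with get() alone (Python 3 syntax, using the sample dictionary from the other answers). The limit of 2 words and the variable `top_words` are just for illustration; for the real data it would be 25.

```python
word_counter_dictionary = {'first': 1, 'second': 2, 'third': 3, 'fourth': 2}

inverted_dictionary = {}
for word, count in word_counter_dictionary.items():
    # get() returns the list stored so far, or an empty list for a new count.
    inverted_dictionary[count] = inverted_dictionary.get(count, []) + [word]

# Walk the counts from highest to lowest and collect the words.
top_words = []
for count in sorted(inverted_dictionary, reverse=True):
    top_words.extend(inverted_dictionary[count])
top_words = top_words[:2]

print(inverted_dictionary)
print(top_words)
```

The bug in the question's version is that get() is called on word_counter_dictionary rather than on inverted_dictionary, so the previously stored words are never found.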
Here's a version that doesn't "invert" the dictionary:
>>> import operator
>>> A = {'a':10, 'b':843, 'c': 39, 'd': 10}
>>> B = sorted(A.iteritems(), key=operator.itemgetter(1), reverse=True)
>>> B
[('b', 843), ('c', 39), ('a', 10), ('d', 10)]
Instead, it creates a list that is sorted, highest to lowest, by value.
To get the top 25, you simply slice it: B[:25].
And here's one way to get the keys and values separated (after putting them into a list of tuples):
>>> [x[0] for x in B]
['b', 'c', 'a', 'd']
>>> [x[1] for x in B]
[843, 39, 10, 10]
or
>>> C, D = zip(*B)
>>> C
('b', 'c', 'a', 'd')
>>> D
(843, 39, 10, 10)
Note that if you only want to extract the keys or the values (and not both), you should do so earlier. These are just examples of how to handle the list of tuples.
For getting the largest elements of some dataset an inverted dictionary might not be the best data structure.
Either put the items in a sorted list (the example assumes you want the two most frequent words):
word_counter_dictionary = {'first':1, 'second':2, 'third':3, 'fourth':2}
counter_word_list = sorted((count, word) for word, count in word_counter_dictionary.items())
Result:
>>> print(counter_word_list[-2:])
[(2, 'second'), (3, 'third')]
Or use Python's included batteries (heapq.nlargest in this case):
import heapq, operator
print(heapq.nlargest(2, word_counter_dictionary.items(), key=operator.itemgetter(1)))
Result:
[('third', 3), ('second', 2)]
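Another standard-library option, offered here as a sketch rather than as part of the original answer: a `collections.Counter` can be built directly from an existing word-to-count mapping, and its `most_common(n)` method returns the n highest counts.

```python
from collections import Counter

word_counter_dictionary = {'first': 1, 'second': 2, 'third': 3, 'fourth': 2}

# Counter accepts a mapping of element -> count.
counter = Counter(word_counter_dictionary)

# The two most frequent words as (word, count) tuples.
top_two = counter.most_common(2)
print(top_two)
```

For the graph in the question, `counter.most_common(25)` would give the 25 most used words along with their counts.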
Given a string s, I want to know how many times each character at the string occurs. Here is the code:
def main() :
    while True :
        try :
            line=raw_input('Enter a string: ')
        except EOFError :
            break;
        mp={};
        for i in range(len(line)) :
            if line[i] in mp :
                mp[line[i]] += 1;
            else :
                mp[line[i]] = 1;
        for i in range(len(line)) :
            print line[i],': ',mp[line[i]];
if __name__ == '__main__' :
    main();
When I run this code and I enter abbba, I get:
a : 2
b : 3
b : 3
b : 3
a : 2
I would like to get only:
a : 2
b : 3
I understand why this is happening, but as I'm new to python, I don't know any other ways to iterate over the elements of a map. Could anyone tell me how to do this? Thanks in advance.
You could try a Counter (Python 2.7 and above; see below for a pre-2.7 option):
>>> from collections import Counter
>>> Counter('abbba')
Counter({'b': 3, 'a': 2})
You can then access the elements just like a dictionary:
>>> counts = Counter('abbba')
>>> counts['a']
2
>>> counts['b']
3
And to iterate, you can use @BurhanKhalid's suggestion (the Counter behaves as a dictionary, where you can iterate over the key/value pairs):
>>> for k, v in Counter('abbba').iteritems():
...     print k, v
...
a 2
b 3
If you're using a pre-2.7 version of Python, you can use a defaultdict to simplify your code a bit (process is still the same - only difference is that now you don't have to check for the key first - it will 'default' to 0 if a matching key isn't found). Counter has other features built into it, but if you simply want counts (and don't care about most_common, or being able to subtract, for instance), this should be fine and can be treated just as any other dictionary:
>>> from collections import defaultdict
>>> counts = defaultdict(int)
>>> for c in 'abbba':
...     counts[c] += 1
...
>>> counts
defaultdict(<type 'int'>, {'a': 2, 'b': 3})
When you use iteritems() on a dictionary (or the Counter/defaultdict here), a key and a value are returned for each iteration (in this case, the key being the letter and the value being the number of occurrences). One thing to note about using dictionaries is that they are inherently unordered, so you won't necessarily get 'a', 'b', ... while iterating. One basic way to iterate through a dictionary in a sorted manner would be to iterate through a sorted list of the keys (here alphabetical, but sorted can be manipulated to handle a variety of options), and return the dictionary value for that key (there are other ways, but this will hopefully be somewhat informative):
>>> mapping = {'some': 2, 'example': 3, 'words': 5}
>>> mapping
{'some': 2, 'example': 3, 'words': 5}
>>> for key in sorted(mapping.keys()):
...     print key, mapping[key]
...
example 3
some 2
words 5
Iterating over a mapping yields keys.
>>> d = {'foo': 42, 'bar': 'quux'}
>>> for k in d:
...     print k, d[k]
...
foo 42
bar quux
You need to look up the help for dict(). It's all there -- 'for k in mp' iterates over keys, 'for v in mp.values()' iterates over values, 'for k,v in mp.items()' iterates over key, value pairs.
Also, you don't need those semicolons. While they are legal in Python, nobody uses them, there's pretty much no reason to.
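Putting those points together, a minimal Python 3 sketch of the counting loop from the question might look like this. The function name `char_counts` is made up, and the input is hard-coded instead of read with `input()` to keep the example self-contained.

```python
def char_counts(line):
    """Return a dict mapping each character in line to its number of occurrences."""
    mp = {}
    for ch in line:
        # get() supplies 0 the first time a character is seen.
        mp[ch] = mp.get(ch, 0) + 1
    return mp

counts = char_counts('abbba')

# Iterate over the dict itself, so each character is printed only once.
for ch, n in counts.items():
    print(ch, ':', n)
```

Iterating over `counts.items()` rather than over the characters of the input string is what removes the repeated lines.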
Python 2.5 and above
import collections
dDict = collections.defaultdict(int)
for i in line:
    dDict[i] += 1
print dDict
Given a dictionary keyed by 2-element tuples, I want to return all the key-value pairs whose keys contain a given element.
For example, the dictionary can be:
tupled_dict = {('a',1):1, ('a',2):0, ('b',1):1, ('c',4):0}
and the given element is 'a', then the key-value pairs that should be returned would be:
('a',1):1, ('a',2):0
What is the fastest code to do this?
EDIT:
In addition, as a related sub-question, I am interested in the fastest way to delete all such key-value pairs given an element of the keys. Obviously, once I have the results of the above, I can use a loop to delete each dictionary item one by one, but I wonder if there is a short-cut way to do it.
To get those ones:
>>> {k: v for k, v in tupled_dict.iteritems() if 'a' in k}
{('a', 1): 1, ('a', 2): 0}
Similarly, to delete those pairs and keep only the other ones:
>>> tupled_dict = {k: v for k, v in tupled_dict.iteritems() if 'a' not in k}
>>> tupled_dict
{('b', 1): 1, ('c', 4): 0}
I haven't tested it for performance, but I suggest you start by getting a baseline using a for loop, and then another with dict comprehensions.
>>> {k:v for k, v in tupled_dict.iteritems() if k[0] == 'a'}
{('a', 1): 1, ('a', 2): 0}
This snippet will work even if 'a' isn't the first element in a key tuple:
from operator import methodcaller
contains_a = methodcaller('__contains__', 'a')
keys = filter(contains_a, tupled_dict)
new_dict = dict(zip(keys, map(tupled_dict.get, keys)))
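All of the approaches above scan every key on each lookup. If lookups by element happen often, one possible alternative (a sketch with no timing claims, assuming Python 3) is to maintain a secondary index from each element to the keys that contain it. The names `index`, `pairs_containing` and `delete_containing` are hypothetical.

```python
from collections import defaultdict

tupled_dict = {('a', 1): 1, ('a', 2): 0, ('b', 1): 1, ('c', 4): 0}

# Secondary index: element -> set of tuple keys that contain it.
index = defaultdict(set)
for key in tupled_dict:
    for element in key:
        index[element].add(key)

def pairs_containing(element):
    """Return the key-value pairs whose key contains element."""
    return {k: tupled_dict[k] for k in index.get(element, ())}

def delete_containing(element):
    """Delete every pair whose key contains element, keeping the index in sync."""
    for k in index.pop(element, set()):
        del tupled_dict[k]
        for other in k:
            if other != element:
                index[other].discard(k)

found = pairs_containing('a')
delete_containing('a')
print(found)
print(tupled_dict)
```

The trade-off is extra memory and the need to update the index whenever the dictionary changes.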