How to print dict sorted by keys? - python

My use case involves printing a json. To aid legibility I want to print it sorted by key. dict comes into the picture as in my case json.loads returns a dict.
Things I tried:
dict.__str__ = myStrFn, which results in TypeError: can't set attributes of built-in/extension type 'dict'.
Writing myDict along the lines of https://stackoverflow.com/a/931822/438758. This does not work for nested dictionaries, as the nested dictionaries are of type dict and not myDict.
What are my options here? I would prefer something which makes print(json.loads(json_str)) work. But would settle for print(str_func(json.loads(json_str))).
If you have a solution specific to my json use case, that would be great too. But I would prefer a generic answer. I am aware that dict keys only need to be hashable and not "comparable" (in the sense that there might not be a total order), so an absolutely generic solution might not be possible. But I am inclined to believe that we can have a solution for all valid JSON types.
I am using python3

print(json.dumps(your_dict, sort_keys=True))
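To illustrate the one-liner above, here is a minimal sketch with a made-up json_str (the sample data and variable names are assumptions, not from the question). It shows that sort_keys=True also sorts the keys of nested dictionaries, including ones inside lists, which is where the myDict subclass approach fell short.

```python
import json

# Hypothetical sample input; any valid JSON document works the same way.
json_str = '{"b": {"z": 1, "a": 2}, "a": [3, {"y": 4, "x": 5}]}'
data = json.loads(json_str)

# sort_keys=True sorts keys at every nesting level.
compact = json.dumps(data, sort_keys=True)
print(compact)

# indent is optional and only affects layout, which can help legibility.
print(json.dumps(data, sort_keys=True, indent=2))
```

Since JSON object keys are always strings, they are always mutually comparable, so sorting should not fail for anything that came out of json.loads.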

Related

Why django uses tuple of tuples to store static dictionaries and should i do the same?

Why django uses tuple of tuples to store for example choices instead of standard dict?
Example:
ORGINAL_MARKET = 1
SECONDARY_MARKET = 2
MARKET_CHOICES = (
    (ORGINAL_MARKET, _('Orginal Market')),
    (SECONDARY_MARKET, _('Secondary Market')),
)
And should I do it too when I know the dict won't change over time?
I reckon tuples are faster, but does that matter if I still need to convert the tuple to a dict whenever I look up a value?
UPDATE:
Clarification: if I keep it as a tuple of tuples, I will be getting the value using
dict(self.MARKET_CHOICES)[self.ORGINAL_MARKET]
Which will work faster, this or storing values in dict from the beginning?
The main reason is that ordering is preserved. If you used a dictionary, and called .items() on it to give the choices for a ChoiceField, for example, the ordering of items in the select box would be unreliable when you rendered the form.
If you want the dict, it is easy to create one from the tuple of tuples. The format is already one accepted by the constructor, so you can just call dict() on it.
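As a rough sketch of that conversion: the snippet below reuses the names and spelling from the question, but uses plain strings instead of Django's _() translation helper so it runs without Django. MARKET_LABELS is a hypothetical name introduced here. Building the dict once, rather than calling dict(...) on every lookup, avoids repeating the conversion.

```python
ORGINAL_MARKET = 1
SECONDARY_MARKET = 2

# Tuple of (value, label) pairs, in the order the choices should be shown.
MARKET_CHOICES = (
    (ORGINAL_MARKET, 'Orginal Market'),
    (SECONDARY_MARKET, 'Secondary Market'),
)

# Hypothetical helper name: build the lookup dict once, not per access.
MARKET_LABELS = dict(MARKET_CHOICES)

label = MARKET_LABELS[ORGINAL_MARKET]
print(label)
```

The tuple stays the single source of truth for ordering, and the dict is only a derived lookup table.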
I don't think the immutability is a correct reason - it is not strictly necessary for them to be a tuple of tuples, a list of tuples or even a list of lists would work as well in Django.
Tuples are immutable and slightly faster, and Django uses them for the choices parameter in fields because they are immutable.
If you're using Python 3.4 or later you can also use Enums, which are better than both tuples and dictionaries (but I'm not sure if Django supports them for the choices parameter).
To be clear: I'm not going to use it in choices=- I'm looking for most efficient method – Lord_JABA
If you want your choices to have a particular order (which is often the case with the choices parameter), then use tuples. If you don't care, use whatever literal you find easier to type (from the allowed datatypes). I doubt you will see any significant difference in memory or CPU footprint for this specific use case.

json dump with dict as value

This question clearly explains how to send a response using json and python dicts. The example, however, uses String as the value type in this dict. How would one do this with dict as the value type? That is, a dict with a dict as a value.
To clarify adityasdarma1's comment: this is not a limitation of Python or Django, but of JSON. In JSON, object keys must always be strings. There is no "tuple" type in JSON or JavaScript anyway; and arrays can't be keys because they are mutable. (In Python, tuples can be dict keys, but lists can't.)
I'm not sure why you would need that, though. You can either concatenate the values in some way to make a string key - eg "bar-baz" - or alternatively you might need a more complex nested structure, with bar as the key of the outer dict and baz as an inner one. Without seeing your full data structure, it's hard to advise further.
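A small sketch of both workarounds, assuming the data looked something like a dict keyed by a ('bar', 'baz') tuple (the sample data and variable names are made up for illustration). It first shows that json.dumps rejects the tuple key, then builds the joined-string and nested variants.

```python
import json

# Hypothetical data with a tuple as the key.
data = {('bar', 'baz'): 42}

# json.dumps raises TypeError for keys that are not str, int, float, bool or None.
try:
    json.dumps(data)
    tuple_keys_ok = True
except TypeError:
    tuple_keys_ok = False

# Option 1: join the tuple parts into a single string key.
joined = {'-'.join(key): value for key, value in data.items()}

# Option 2: nest, with the first part as the outer key and the second as the inner key.
nested = {}
for (outer, inner), value in data.items():
    nested.setdefault(outer, {})[inner] = value

joined_json = json.dumps(joined)
nested_json = json.dumps(nested)
print(joined_json)
print(nested_json)
```

The joined form is only safe if the separator cannot appear inside the parts; the nested form avoids that problem at the cost of a deeper structure.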
By giving it a dict with a dict as a value type.
>>> json.dumps({'foo': {'bar': 42}})
'{"foo": {"bar": 42}}'

How To Create a Unique Key For A Dictionary In Python

What is the best way to generate a unique key for the contents of a dictionary. My intention is to store each dictionary in a document store along with a unique id or hash so that I don't have to load the whole dictionary from the store to check if it exists already or not. Dictionaries with the same keys and values should generate the same id or hash.
I have the following code:
import hashlib
a={'name':'Danish', 'age':107}
b={'age':107, 'name':'Danish'}
print str(a)
print hashlib.sha1(str(a)).hexdigest()
print hashlib.sha1(str(b)).hexdigest()
The last two print statements generate the same string. Is this a good implementation, or are there any pitfalls with this approach? Is there a better way to do this?
Update
Combining suggestions from the answers below, the following might be a good implementation
import hashlib
a={'name':'Danish', 'age':107}
b={'age':107, 'name':'Danish'}
def get_id_for_dict(dict):
    unique_str = ''.join(["'%s':'%s';"%(key, val) for (key, val) in sorted(dict.items())])
    return hashlib.sha1(unique_str).hexdigest()
print get_id_for_dict(a)
print get_id_for_dict(b)
I prefer serializing the dict as JSON and hashing that:
import hashlib
import json
a={'name':'Danish', 'age':107}
b={'age':107, 'name':'Danish'}
# Python 2
print hashlib.sha1(json.dumps(a, sort_keys=True)).hexdigest()
print hashlib.sha1(json.dumps(b, sort_keys=True)).hexdigest()
# Python 3
print(hashlib.sha1(json.dumps(a, sort_keys=True).encode()).hexdigest())
print(hashlib.sha1(json.dumps(b, sort_keys=True).encode()).hexdigest())
Returns:
71083588011445f0e65e11c80524640668d3797d
71083588011445f0e65e11c80524640668d3797d
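The same idea can be wrapped in a function. This is only a sketch for Python 3: dict_id is a hypothetical name, the nested sample data is made up, and the compact separators are my own choice to keep whitespace out of the hashed string. It assumes everything in the dict is JSON-serializable.

```python
import hashlib
import json


def dict_id(d):
    """Return a SHA-1 hex digest of d's canonical JSON form."""
    # sort_keys=True makes the output independent of insertion order,
    # at every nesting level; separators removes optional whitespace.
    canonical = json.dumps(d, sort_keys=True, separators=(',', ':'))
    return hashlib.sha1(canonical.encode('utf-8')).hexdigest()


a = {'name': 'Danish', 'info': {'age': 107, 'tags': ['x', 'y']}}
b = {'info': {'tags': ['x', 'y'], 'age': 107}, 'name': 'Danish'}
c = {'name': 'Danish', 'info': {'age': 108, 'tags': ['x', 'y']}}

print(dict_id(a) == dict_id(b))  # same content, different key order
print(dict_id(a) == dict_id(c))  # different content
```

One pitfall to keep in mind: JSON turns non-string keys into strings, so {1: 'a'} and {'1': 'a'} would get the same id with this approach.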
No - you can't rely on particular order of elements when converting dictionary to a string.
You can, however, convert it to sorted list of (key,value) tuples, convert it to a string and compute a hash like this:
a_sorted_list = [(key, a[key]) for key in sorted(a.keys())]
print hashlib.sha1( str(a_sorted_list) ).hexdigest()
It's not foolproof, as the formatting of a list or tuple converted to a string could change in some future major Python version, the sort order may depend on locale, etc., but I think it can be good enough.
A possible option would be using a serialized representation of the list that preserves order. I am not sure whether the default list to string mechanism imposes any kind of order, but it wouldn't surprise me if it were interpreter-dependent. So, I'd basically build something akin to urlencode that sorts the keys beforehand.
Not that I believe your method would fail, but I'd rather play with predictable things and avoid undocumented and/or unpredictable behavior. It's true that despite being "unordered", dictionaries end up having an order that may even be consistent, but the point is that you shouldn't take that for granted.
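One way to read the "akin to urlencode" suggestion is to literally use urlencode on the sorted items. This is a sketch for Python 3 under that assumption: flat_dict_id is a hypothetical name, and it only suits flat dicts whose keys can be compared with each other (for example, all strings).

```python
import hashlib
from urllib.parse import urlencode


def flat_dict_id(d):
    """Hash a flat dict via a sorted, URL-encoded representation."""
    # Sorting the (key, value) pairs fixes the order before serializing.
    # Keys are unique, so the values never need to be compared.
    encoded = urlencode(sorted(d.items()))
    return hashlib.sha1(encoded.encode('utf-8')).hexdigest()


a = {'name': 'Danish', 'age': 107}
b = {'age': 107, 'name': 'Danish'}

encoded_a = urlencode(sorted(a.items()))
print(encoded_a)
print(flat_dict_id(a) == flat_dict_id(b))
```

Because values are converted with str(), the int 107 and the string '107' would hash the same here, so this is a weaker identity than the JSON-based one.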

Python, checksum of a dict

I'm thinking of creating a checksum of a dict to know whether it was modified or not.
For the moment I have this:
>>> import hashlib
>>> import pickle
>>> d = {'k': 'v', 'k2': 'v2'}
>>> z = pickle.dumps(d)
>>> hashlib.md5(z).hexdigest()
'8521955ed8c63c554744058c9888dc30'
Perhaps a better solution exists?
Note: I want to create a unique id for a dict in order to build a good ETag.
EDIT: I can have abstract data in the dict.
Something like this:
reduce(lambda x,y : x^y, [hash(item) for item in d.items()])
Take the hash of each (key, value) tuple in the dict and XOR them all together.
@katrielalex:
If the dict contains unhashable items you could do this:
hash(str(d))
or maybe even better
hash(repr(d))
In Python 3, the hash function is initialized with a random number, which is different for each python session. If that is not acceptable for the intended application, use e.g. zlib.adler32 to build the checksum for a dict:
import zlib
d={'key1':'value1','key2':'value2'}
checksum=0
for item in d.items():
    c1 = 1
    for t in item:
        c1 = zlib.adler32(bytes(repr(t),'utf-8'), c1)
    checksum=checksum ^ c1
print(checksum)
I would recommend an approach very similar to the one you propose, but with some extra guarantees:
import hashlib, json
hashlib.md5(json.dumps(d, sort_keys=True, ensure_ascii=True).encode('utf-8')).hexdigest()
sort_keys=True: keep the same hash if the order of your keys changes
ensure_ascii=True: in case you have some non-ascii characters, to make sure the representation does not change
We use this for our ETag.
I don't know whether pickle guarantees that the dict is serialized the same way every time.
If you only have dictionaries, I would go for a combination of calls to keys() and sorted(), build a string based on the sorted key/value pairs, and compute the checksum on that.
I think you may not realise some of the subtleties that go into this. The first problem is that the order that items appear in a dict is not defined by the implementation. This means that simply asking for str of a dict doesn't work, because you could have
str(d1) == "{'a':1, 'b':2}"
str(d2) == "{'b':2, 'a':1}"
and these will hash to different values. If you have only hashable items in the dict, you can hash them and then join up their hashes, as @Bart does, or simply
hash(tuple(sorted(hash(x) for x in d.items())))
Note the sorted, because you have to ensure that the hashed tuple comes out in the same order irrespective of which order the items appear in the dict. If you have dicts in the dict, you could recurse this, but it will be complicated.
BUT it would be easy to break any implementation like this if you allow arbitrary data in the dictionary, since you can simply write an object with a broken __hash__ implementation and use that. And you can't use id, because then you might have equal items which compare different.
The moral of the story is that hashing dicts isn't supported in Python for a reason.
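For what it's worth, the recursion mentioned above could look roughly like this. It is only a sketch: freeze is a hypothetical helper, it uses frozenset instead of sorted so that keys need not be comparable, and it assumes all leaf values are hashable. Since Python 3 randomizes string hashes per process, the resulting hash is only meaningful within one interpreter session.

```python
def freeze(obj):
    """Recursively convert dicts, lists and sets into hashable equivalents."""
    if isinstance(obj, dict):
        # A frozenset of (key, frozen value) pairs ignores insertion order.
        return frozenset((key, freeze(value)) for key, value in obj.items())
    if isinstance(obj, (list, tuple)):
        # Sequences keep their order, so a tuple is the natural counterpart.
        return tuple(freeze(item) for item in obj)
    if isinstance(obj, set):
        return frozenset(freeze(item) for item in obj)
    # Anything else is assumed to be hashable already.
    return obj


d1 = {'a': 1, 'b': {'x': [1, 2], 'y': 'z'}}
d2 = {'b': {'y': 'z', 'x': [1, 2]}, 'a': 1}

print(hash(freeze(d1)) == hash(freeze(d2)))
```

This still inherits the caveat about objects with a broken __hash__, so it is no more robust than the objects you put in the dict.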
As you said, you want to generate an ETag based on the dictionary content. An OrderedDict, which preserves the insertion order of the dictionary, may be a better candidate here. Just iterate through the key/value pairs and construct your ETag string.

Ordering in Python (2.4) dictionary

r_dict={'answer1': "value1",'answer11': "value11",'answer2': "value2",'answer3': "value3",'answer4': "value4",}
for i in r_dict:
    if("answer" in i.lower()):
        print i
Result is answer11,answer2,answer4,answer3
I am using Python 2.4.3. Is there any way to get the order in which it was populated?
Or is there a way to do this by regular expression since I am using the older Python version?
Dictionaries are unordered - that is, they do have some order, but it's influenced in nonobvious ways by the order of insertion and the hash of the keys. However, there is another implementation that remembers the order of insertion, collections.OrderedDict.
Edit: For Python 2.4, there are several third-party implementations. I haven't used any, but the one from voidspace looks promising.
A dictionary is by construction unordered. If you want an ordered one, use a collections.OrderedDict:
import collections
r_dict = collections.OrderedDict( [ ( 'answer1', "value1"), ('answer11', "value11"), ('answer2', "value2"), ('answer3', "value3"), ('answer4', "value4") ] )
for i in r_dict:
    if("answer" in i.lower()):
        print i
Not just by using the dictionary by itself. Dictionaries in Python (and a good portion of equivalent non-specialized data structures that involve mapping) are not sorted.
You could potentially subclass dict and override the __setitem__ and __delitem__ methods to add/remove each key to an internal list where you maintain your own sorting. You'd probably then have to override other methods, such as __iter__ to get the sorting you want out of your for loop.
...or just use the odict module as @delnan suggested
Short answer: no. Python dictionaries are fundamentally unordered.
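If no ordered dictionary implementation is available, one workaround is to keep the insertion order yourself in a list of pairs next to the dict. The sketch below is written in Python 3 syntax, and the names pairs and ordered_keys are made up for illustration.

```python
# Keep the data as a list of (key, value) pairs; a list preserves order.
pairs = [
    ('answer1', 'value1'),
    ('answer11', 'value11'),
    ('answer2', 'value2'),
    ('answer3', 'value3'),
    ('answer4', 'value4'),
]

# The dict is still available for fast lookups by key.
r_dict = dict(pairs)

# Iterate over the list, not the dict, to get the order of population.
ordered_keys = [key for key, _value in pairs if 'answer' in key.lower()]
for key in ordered_keys:
    print(key, r_dict[key])
```

The cost is that additions and removals have to be applied to both the list and the dict.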
