Python's json module, converts int dictionary keys to strings

Python's json module, converts int dictionary keys to strings - python

I have found that when the following is run, python's json module (included since 2.6) converts int dictionary keys to strings.
>>> import json
>>> releases = {1: "foo-v0.1"}
>>> json.dumps(releases)
'{"1": "foo-v0.1"}'
Is there any easy way to preserve the key as an int, without needing to parse the string on dump and load.
I believe it would be possible using the hooks provided by the json module, but again this still requires parsing.
Is there possibly an argument I have overlooked?
Sub-question:
Thanks for the answers. Seeing as json works as I feared, is there an easy way to convey key type by maybe parsing the output of dumps?
Also I should note the code doing the dumping and the code downloading the json object from a server and loading it, are both written by me.

This is one of those subtle differences among various mapping collections that can bite you. JSON treats keys as strings; Python supports distinct keys differing only in type.
In Python (and apparently in Lua) the keys to a mapping (dictionary or table, respectively) are object references. In Python they must be immutable types, or they must be objects which implement a __hash__ method. (The Lua docs suggest that it automatically uses the object's ID as a hash/key even for mutable objects and relies on string interning to ensure that equivalent strings map to the same objects).
In Perl, Javascript, awk and many other languages the keys for hashes, associative arrays or whatever they're called for the given language, are strings (or "scalars" in Perl). In perl $foo{1}, $foo{1.0}, and $foo{"1"} are all references to the same mapping in %foo --- the key is evaluated as a scalar!
JSON started as a Javascript serialization technology. (JSON stands for JavaScript Object Notation.) Naturally it implements semantics for its mapping notation which are consistent with its mapping semantics.
If both ends of your serialization are going to be Python then you'd be better off using pickles. If you really need to convert these back from JSON into native Python objects I guess you have a couple of choices. First you could try (try: ... except: ...) to convert any key to a number in the event of a dictionary look-up failure. Alternatively, if you add code to the other end (the serializer or generator of this JSON data) then you could have it perform a JSON serialization on each of the key values --- providing those as a list of keys. (Then your Python code would first iterate over the list of keys, instantiating/deserializing them into native Python objects ... and then use those for access the values out of the mapping).

No, there is no such thing as a Number key in JavaScript. All object properties are converted to String.
var a= {1: 'a'};
for (k in a)
alert(typeof k); // 'string'
This can lead to some curious-seeming behaviours:
a[999999999999999999999]= 'a'; // this even works on Array
alert(a[1000000000000000000000]); // 'a'
alert(a['999999999999999999999']); // fail
alert(a['1e+21']); // 'a'
JavaScript Objects aren't really proper mappings as you'd understand it in languages like Python, and using keys that aren't String results in weirdness. This is why JSON always explicitly writes keys as strings, even where it doesn't look necessary.

Answering your subquestion:
It can be accomplished by using json.loads(jsonDict, object_hook=jsonKeys2int)
def jsonKeys2int(x):
if isinstance(x, dict):
return {int(k):v for k,v in x.items()}
return x
This function will also work for nested dicts and uses a dict comprehension.
If you want to to cast the values too, use:
def jsonKV2int(x):
if isinstance(x, dict):
return {int(k):(int(v) if isinstance(v, unicode) else v) for k,v in x.items()}
return x
Which tests the instance of the values and casts them only if they are strings objects (unicode to be exact).
Both functions assumes keys (and values) to be integers.
Thanks to:
How to use if/else in a dictionary comprehension?
Convert a string key to int in a Dictionary

Alternatively you can also try converting dictionary to a list of [(k1,v1),(k2,v2)] format while encoding it using json, and converting it back to dictionary after decoding it back.
>>>> import json
>>>> json.dumps(releases.items())
'[[1, "foo-v0.1"]]'
>>>> releases = {1: "foo-v0.1"}
>>>> releases == dict(json.loads(json.dumps(releases.items())))
True
I believe this will need some more work like having some sort of flag to identify what all parameters to be converted to dictionary after decoding it back from json.

I've gotten bitten by the same problem. As others have pointed out, in JSON, the mapping keys must be strings. You can do one of two things. You can use a less strict JSON library, like demjson, which allows integer strings. If no other programs (or no other in other languages) are going to read it, then you should be okay. Or you can use a different serialization language. I wouldn't suggest pickle. It's hard to read, and is not designed to be secure. Instead, I'd suggest YAML, which is (nearly) a superset of JSON, and does allow integer keys. (At least PyYAML does.)

Here is my solution! I used object_hook, it is useful when you have nested json
>>> import json
>>> json_data = '{"1": "one", "2": {"-3": "minus three", "4": "four"}}'
>>> py_dict = json.loads(json_data, object_hook=lambda d: {int(k) if k.lstrip('-').isdigit() else k: v for k, v in d.items()})
>>> py_dict
{1: 'one', 2: {-3: 'minus three', 4: 'four'}}
There is filter only for parsing json key to int. You can use int(v) if v.lstrip('-').isdigit() else v filter for json value too.

Convert the dictionary to be string by using str(dict) and then convert it back to dict by doing this:
import ast
ast.literal_eval(string)

I made a very simple extension of Murmel's answer which I think will work on a pretty arbitrary dictionary (including nested) assuming it can be dumped by JSON in the first place. Any keys which can be interpreted as integers will be cast to int. No doubt this is not very efficient, but it works for my purposes of storing to and loading from json strings.
def convert_keys_to_int(d: dict):
new_dict = {}
for k, v in d.items():
try:
new_key = int(k)
except ValueError:
new_key = k
if type(v) == dict:
v = _convert_keys_to_int(v)
new_dict[new_key] = v
return new_dict
Assuming that all keys in the original dict are integers if they can be cast to int, then this will return the original dictionary after storing as a json.
e.g.
>>>d = {1: 3, 2: 'a', 3: {1: 'a', 2: 10}, 4: {'a': 2, 'b': 10}}
>>>convert_keys_to_int(json.loads(json.dumps(d))) == d
True

[NSFW] You can write your json.dumps by yourself, here is a example from djson: encoder.py. You can use it like this:
assert dumps({1: "abc"}) == '{1: "abc"}'

Related

Get reference to Python dict key

In Python (3.7 and above) I would like to obtain a reference to a dict key. More precisely, let d be a dict where the keys are strings. In the following code, the value of k is potentially stored at two distinct locations in memory (one pointed to by the dict and one pointed to by k), whereas the value of v is stored at only one location (the one pointed to by the dict).
# d is a dict
# k is a string dynamically constructed, in particular not from iterating over d's keys
if k in d:
v = d[k]
# Now store k and v in other data structures
In my case, the dict is very large and the string keys are very long. To keep memory usage down I would like to replace k with a pointer to the corresponding string used by d before storing k in other data structures. Is there a straightforward way of doing this, that is using the keys of the dict as a string pool?
(Footnote: this may seem as premature optimisation, and perhaps it is, but being an old-school C programmer I sleep better at night doing "memory tricks". Joke aside, I do genuinely would like to know the answer out of curiosity, and I am indeed going to run my code on a Raspberry Pi and will probably face memory issues.)

Where does the key k come from? Is it dynamically constructed by something like str.join, + , slicing another string, bytes.decode etc? Is it read from a file or input()? Did you get it from iterating over d at some point? Or does it originate from a literal somewhere in your source code?
In the last two cases, you don't need to worry about it since it is going to be a single instance anyway.
If not, you could use sys.intern to intern your keys. If a == b then sys.intern(a) is sys.intern(b).
Another possible solution, in case you might want to garbage collect the strings at some point or you want to intern some non-string values, like tuples of strings, you could do the following:
# create this dictionary once after `d` has all the right keys
canonical_keys = {key: key for key in d}
k = canonical_keys.get(k, k) # use the same instance if possible
I recommend reading up on Python's data model.

json dump with dict as value

This question clearly explains how to send a response using json and python dicts. The example however uses String as the value type in this dict. How would one do this with dict as value type? That is a dict with with dict as a value.

To clarify adityasdarma1's comment: this is not a limitation of Python or Django, but of JSON. In JSON, object keys must always be strings. There is no "tuple" type in JSON or JavaScript anyway; and arrays can't be keys because they are mutable. (In Python, tuples can be dict keys, but lists can't.)
I'm not sure why you would need that, though. You can either concatenate the values in some way to make a string key - eg "bar-baz" - or alternatively you might need a more complex nested structure, with bar as the key of the outer dict and baz as an inner one. Without seeing your full data structure, it's hard to advise further.

By giving it a dict with a dict as a value type.
>>> json.dumps({'foo': {'bar': 42}})
'{"foo": {"bar": 42}}'

How To Create a Unique Key For A Dictionary In Python

What is the best way to generate a unique key for the contents of a dictionary. My intention is to store each dictionary in a document store along with a unique id or hash so that I don't have to load the whole dictionary from the store to check if it exists already or not. Dictionaries with the same keys and values should generate the same id or hash.
I have the following code:
import hashlib
a={'name':'Danish', 'age':107}
b={'age':107, 'name':'Danish'}
print str(a)
print hashlib.sha1(str(a)).hexdigest()
print hashlib.sha1(str(b)).hexdigest()
The last two print statements generate the same string. Is this is a good implementation? or are there any pitfalls with this approach? Is there a better way to do this?
Update
Combining suggestions from the answers below, the following might be a good implementation
import hashlib
a={'name':'Danish', 'age':107}
b={'age':107, 'name':'Danish'}
def get_id_for_dict(dict):
unique_str = ''.join(["'%s':'%s';"%(key, val) for (key, val) in sorted(dict.items())])
return hashlib.sha1(unique_str).hexdigest()
print get_id_for_dict(a)
print get_id_for_dict(b)

I prefer serializing the dict as JSON and hashing that:
import hashlib
import json
a={'name':'Danish', 'age':107}
b={'age':107, 'name':'Danish'}
# Python 2
print hashlib.sha1(json.dumps(a, sort_keys=True)).hexdigest()
print hashlib.sha1(json.dumps(b, sort_keys=True)).hexdigest()
# Python 3
print(hashlib.sha1(json.dumps(a, sort_keys=True).encode()).hexdigest())
print(hashlib.sha1(json.dumps(b, sort_keys=True).encode()).hexdigest())
Returns:
71083588011445f0e65e11c80524640668d3797d
71083588011445f0e65e11c80524640668d3797d

No - you can't rely on particular order of elements when converting dictionary to a string.
You can, however, convert it to sorted list of (key,value) tuples, convert it to a string and compute a hash like this:
a_sorted_list = [(key, a[key]) for key in sorted(a.keys())]
print hashlib.sha1( str(a_sorted_list) ).hexdigest()
It's not fool-proof, as a formating of a list converted to a string or formatting of a tuple can change in some future major python version, sort order depends on locale etc. but I think it can be good enough.

A possible option would be using a serialized representation of the list that preserves order. I am not sure whether the default list to string mechanism imposes any kind of order, but it wouldn't surprise me if it were interpreter-dependent. So, I'd basically build something akin to urlencode that sorts the keys beforehand.
Not that I believe that you method would fail, but I'd rather play with predictable things and avoid undocumented and/or unpredictable behavior. It's true that despite "unordered", dictionaries end up having an order that may even be consistent, but the point is that you shouldn't take that for granted.

Python, checksum of a dict

I'm thinking to create a checksum of a dict to know if it was modified or not
For the moment i have that:
>>> import hashlib
>>> import pickle
>>> d = {'k': 'v', 'k2': 'v2'}
>>> z = pickle.dumps(d)
>>> hashlib.md5(z).hexdigest()
'8521955ed8c63c554744058c9888dc30'
Perhaps a better solution exists?
Note: I want to create an unique id of a dict to create a good Etag.
EDIT: I can have abstract data in the dict.

Something like this:
reduce(lambda x,y : x^y, [hash(item) for item in d.items()])
Take the hash of each (key, value) tuple in the dict and XOR them alltogether.
#katrielalex
If the dict contains unhashable items you could do this:
hash(str(d))
or maybe even better
hash(repr(d))

In Python 3, the hash function is initialized with a random number, which is different for each python session. If that is not acceptable for the intended application, use e.g. zlib.adler32 to build the checksum for a dict:
import zlib
d={'key1':'value1','key2':'value2'}
checksum=0
for item in d.items():
c1 = 1
for t in item:
c1 = zlib.adler32(bytes(repr(t),'utf-8'), c1)
checksum=checksum ^ c1
print(checksum)

I would recommend an approach very similar to the one your propose, but with some extra guarantees:
import hashlib, json
hashlib.md5(json.dumps(d, sort_keys=True, ensure_ascii=True).encode('utf-8')).hexdigest()
sort_keys=True: keep the same hash if the order of your keys changes
ensure_ascii=True: in case you have some non-ascii characters, to make sure the representation does not change
We use this for our ETag.

I don't know whether pickle guarantees you that the hash is serialized the same way every time.
If you only have dictionaries, I would go for o combination of calls to keys(), sorted(), build a string based on the sorted key/value pairs and compute the checksum on that

I think you may not realise some of the subtleties that go into this. The first problem is that the order that items appear in a dict is not defined by the implementation. This means that simply asking for str of a dict doesn't work, because you could have
str(d1) == "{'a':1, 'b':2}"
str(d2) == "{'b':2, 'a':1}"
and these will hash to different values. If you have only hashable items in the dict, you can hash them and then join up their hashes, as #Bart does or simply
hash(tuple(sorted(hash(x) for x in d.items())))
Note the sorted, because you have to ensure that the hashed tuple comes out in the same order irrespective of which order the items appear in the dict. If you have dicts in the dict, you could recurse this, but it will be complicated.
BUT it would be easy to break any implementation like this if you allow arbitrary data in the dictionary, since you can simply write an object with a broken __hash__ implementation and use that. And you can't use id, because then you might have equal items which compare different.
The moral of the story is that hashing dicts isn't supported in Python for a reason.

As you said, you wanted to generate an Etag based on the dictionary content, OrderedDict which preserves the order of the dictionary may be better candidate here. Just iterator through the key,value pairs and construct your Etag string.

Best way to encode tuples with json

In python I have a dictionary that maps tuples to a list of tuples. e.g.
{(1,2): [(2,3),(1,7)]}
I want to be able to encode this data use it with javascript, so I looked into json but it appears keys must be strings so my tuple does not work as a key.
Is the best way to handle this is encode it as "1,2" and then parse it into something I want on the javascript? Or is there a more clever way to handle this.

You might consider saying
{"[1,2]": [(2,3),(1,7)]}
and then when you need to get the value out, you can just parse the keys themselves as JSON objects, which all modern browsers can do with the built-in JSON.parse method (I'm using jQuery.each to iterate here but you could use anything):
var myjson = JSON.parse('{"[1,2]": [[2,3],[1,7]]}');
$.each(myjson, function(keystr,val){
var key = JSON.parse(keystr);
// do something with key and val
});
On the other hand, you might want to just structure your object differently, e.g.
{1: {2: [(2,3),(1,7)]}}
so that instead of saying
myjson[1,2] // doesn't work
which is invalid Javascript syntax, you could say
myjson[1][2] // returns [[2,3],[1,7]]

If your key tuples are truly integer pairs, then the easiest and probably most straightforward approach would be as you suggest.... encode them to a string. You can do this in a one-liner:
>>> simplejson.dumps(dict([("%d,%d" % k, v) for k, v in d.items()]))
'{"1,2": [[2, 3], [1, 7]]}'
This would get you a javascript data structure whose keys you could then split to get the points back again:
'1,2'.split(',')

My recommendation would be:
{ "1": [
{ "2": [[2,3],[1,7]] }
]
}
It's still parsing, but depending on how you use it, it may be easier in this form.

You can't use an array as a key in JSON. The best you can do is encode it. Sorry, but there's really no other sane way to do it.

Could it simply be a two dimensional array? Then you may use integers as keys

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.