In python I have a dictionary that maps tuples to a list of tuples. e.g.
{(1,2): [(2,3),(1,7)]}
I want to be able to encode this data to use it with JavaScript, so I looked into JSON, but it appears keys must be strings, so my tuple does not work as a key.
Is the best way to handle this to encode it as "1,2" and then parse it into something I want on the JavaScript side? Or is there a more clever way to handle this?
You might consider saying
{"[1,2]": [(2,3),(1,7)]}
and then when you need to get the value out, you can just parse the keys themselves as JSON arrays, which all modern browsers can do with the built-in JSON.parse method (I'm using jQuery.each to iterate here, but you could use anything):
var myjson = JSON.parse('{"[1,2]": [[2,3],[1,7]]}');
$.each(myjson, function(keystr, val){
    var key = JSON.parse(keystr);
    // do something with key and val
});
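On the Python side, producing those JSON-array keys could look like this (a minimal sketch of mine, not from the original answer; the dict d is the example from the question):
import json

d = {(1, 2): [(2, 3), (1, 7)]}

# encode each tuple key as its own small JSON array string, e.g. (1, 2) -> "[1, 2]"
encoded = {json.dumps(list(k)): v for k, v in d.items()}
print(json.dumps(encoded))  # {"[1, 2]": [[2, 3], [1, 7]]}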
On the other hand, you might want to just structure your object differently, e.g.
{1: {2: [(2,3),(1,7)]}}
so that instead of saying
myjson[1,2] // doesn't work
which doesn't do what you want (in JavaScript the comma operator makes it equivalent to myjson[2]), you could say
myjson[1][2] // returns [[2,3],[1,7]]
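Building that nested form from the original dict on the Python side could look like this (a sketch, assuming the integer-pair keys from the question):
d = {(1, 2): [(2, 3), (1, 7)]}

nested = {}
for (a, b), v in d.items():
    # the first element becomes the outer key, the second the inner key
    nested.setdefault(a, {})[b] = v

# nested == {1: {2: [(2, 3), (1, 7)]}}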
If your key tuples are truly integer pairs, then the easiest and probably most straightforward approach would be, as you suggest, to encode them to a string. You can do this in a one-liner:
>>> simplejson.dumps(dict([("%d,%d" % k, v) for k, v in d.items()]))
'{"1,2": [[2, 3], [1, 7]]}'
This would get you a JavaScript data structure whose keys you could then split to get the points back again:
'1,2'.split(',')
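And if the data ever comes back to Python in that shape, the keys can be split back into tuples the same way (my sketch, assuming the "x,y" key format above):
import json

parsed = json.loads('{"1,2": [[2, 3], [1, 7]]}')
# split each "x,y" key and rebuild the integer tuple
decoded = {tuple(int(p) for p in k.split(',')): v for k, v in parsed.items()}
# decoded == {(1, 2): [[2, 3], [1, 7]]}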
My recommendation would be:
{
    "1": [
        { "2": [[2,3],[1,7]] }
    ]
}
It's still parsing, but depending on how you use it, it may be easier in this form.
You can't use an array as a key in JSON. The best you can do is encode it. Sorry, but there's really no other sane way to do it.
Could it simply be a two-dimensional array? Then you could use integers as keys.
I have been using nested arrays parsed from JSON.
This ends up giving a gigantic line each time I try to access values in the data.
Let's say I have a nested array in the var data; when I try to reach the deeper values, I still have to respect the 80-character limit. All I want to do is read or modify the value.
self.data["name1"]["name2"][varWithNumber][varWithNumber2][varWithNumber3]
Now, I thought about two possible solutions I could use:
1- split it using temporary vars and then reassign it to the data once I am done, e.g.:
tempData=self.data["name1"]["name2"][varWithNumber]
tempData[varWithNumber2][varWithNumber3]+=1
self.data["name1"]["name2"][varWithNumber]=tempData
I guess this solution would use quite a bit of resources from all the memory copied around.
2- use Python's built-in exec function and split the string over multiple lines:
exec('self.data' +
     '["name1"]' +
     '["name2"]' +
     '[varWithNumber]' +
     '[varWithNumber2]' +
     '[varWithNumber3]+=1')
I have no idea how optimised the exec function is. What would be the most Pythonic/optimised way to do this? Is there any other/better way to reach the goal while respecting PEP 8?
(Bit long for a comment.) You don't need exec to do that... you can use the line continuation character:
self.data["name1"]\
["name2"]\
[varWithNumber]\
[varWithNumber2]\
[varWithNumber3]
Demo:
In [635]: x = [[[[1, 2, 3]]]]
In [636]: x[0]\
...: [0]\
...: [0]\
...: [0]
Out[636]: 1
This seems like the easiest and cleanest way to do it.
Don't use exec unless you have to. Actually, don't use it, ever.
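Worth noting as well (my addition, not from the original answers): wrapping the expression in parentheses gives you implicit line continuation, which PEP 8 generally prefers over backslashes:
# parentheses allow implicit line continuation: no backslashes needed
x = [[[[1, 2, 3]]]]
value = (x[0]
          [0]
          [0]
          [0])
print(value)  # 1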
In some cases, keeping a reference to a sub-dict works well if you are going to visit that part of your data structure frequently. It is a matter of deciding what the best solution is for the given situation and circumstances.
You are on the right track with your first option, and it's not as memory intensive as you might think. Most things in python are references to places in memory, so let's say we have this json blob (dict in python):
test = {
    "name1": {
        "name2": {
            "foo": {
                "count": 1,
                "color": "red"
            }
        }
    }
}
Now if you wanted to change both parts of that nested "foo" key, you could first make a reference to it with:
foo_ref = test['name1']['name2']['foo']
Then it's very simple to just
foo_ref['count'] += 1
foo_ref['color'] = 'green'
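Because foo_ref and test['name1']['name2']['foo'] are the same object, the changes show up in test itself; a quick check (continuing the example above):
print(test['name1']['name2']['foo'])
# {'count': 2, 'color': 'green'}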
I have a piece of code that allows me to print dictionary items returned using the .json() method on an XHR response from a website:
teamStatDicts = responser[u'teamTableStats']
for statDict in teamStatDicts:
    print("{seasonId},{tournamentRegionId},{minsPlayed},"
          .decode('cp1252').format(**statDict))
This prints in the following format:
9155,5,900
9155,5,820
...
...
...
9155,5,900
9155,5,820
The above method works fine, provided the keys in the dictionary never change. However, in some of the XHR submissions I am making, they do. Is there a way that I can print all dictionary values in exactly the same format as above? I've tried a few things but really didn't get anywhere.
In general, given a dict, you can do:
print(','.join(str(v) for v in dct.values()))
The problem here is that you don't know the order of the values, i.e. is the first value in the CSV data the seasonId? Is it the tournamentRegionId? The minsPlayed? Or is it something else entirely that you don't know about?
So, my point is that unless you know the field names, you can't put them in a string in any reliable order if the data comes to you as vanilla dicts.
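One way to stay reliable without knowing the field names (a sketch of mine, not from the original answer) is to print key=value pairs in sorted key order, so the output is stable and self-describing even when the schema changes:
# sample dict using the field names from the question
statDict = {'seasonId': 9155, 'tournamentRegionId': 5, 'minsPlayed': 900}

# sorted keys give a deterministic column order
print(','.join('{}={}'.format(k, statDict[k]) for k in sorted(statDict)))
# minsPlayed=900,seasonId=9155,tournamentRegionId=5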
If you're decoding the XHR elsewhere using json, you could make the object_pairs_hook an OrderedDict:
from collections import OrderedDict
import json
...
data = json.loads(datastring, object_pairs_hook=OrderedDict)
Now the data is guaranteed to be in the same order as the datastring was, but that only helps if the data in datastring was ordered in a particular way (which is usually not the case).
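A short demo of that, assuming the server sends the fields in a fixed order (the sample values here are made up):
from collections import OrderedDict
import json

datastring = '{"seasonId": 9155, "tournamentRegionId": 5, "minsPlayed": 900}'
data = json.loads(datastring, object_pairs_hook=OrderedDict)

# values come out in the same order they appeared in the JSON text
print(','.join(str(v) for v in data.values()))  # 9155,5,900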
What is the best way to generate a unique key for the contents of a dictionary? My intention is to store each dictionary in a document store along with a unique id or hash so that I don't have to load the whole dictionary from the store to check if it exists already or not. Dictionaries with the same keys and values should generate the same id or hash.
I have the following code:
import hashlib
a={'name':'Danish', 'age':107}
b={'age':107, 'name':'Danish'}
print str(a)
print hashlib.sha1(str(a)).hexdigest()
print hashlib.sha1(str(b)).hexdigest()
The last two print statements generate the same string. Is this a good implementation? Or are there any pitfalls with this approach? Is there a better way to do this?
Update
Combining suggestions from the answers below, the following might be a good implementation
import hashlib
a={'name':'Danish', 'age':107}
b={'age':107, 'name':'Danish'}
def get_id_for_dict(dict):
    unique_str = ''.join(["'%s':'%s';" % (key, val) for (key, val) in sorted(dict.items())])
    return hashlib.sha1(unique_str).hexdigest()
print get_id_for_dict(a)
print get_id_for_dict(b)
I prefer serializing the dict as JSON and hashing that:
import hashlib
import json
a={'name':'Danish', 'age':107}
b={'age':107, 'name':'Danish'}
# Python 2
print hashlib.sha1(json.dumps(a, sort_keys=True)).hexdigest()
print hashlib.sha1(json.dumps(b, sort_keys=True)).hexdigest()
# Python 3
print(hashlib.sha1(json.dumps(a, sort_keys=True).encode()).hexdigest())
print(hashlib.sha1(json.dumps(b, sort_keys=True).encode()).hexdigest())
Returns:
71083588011445f0e65e11c80524640668d3797d
71083588011445f0e65e11c80524640668d3797d
No - you can't rely on a particular order of elements when converting a dictionary to a string.
You can, however, convert it to sorted list of (key,value) tuples, convert it to a string and compute a hash like this:
a_sorted_list = [(key, a[key]) for key in sorted(a.keys())]
print hashlib.sha1( str(a_sorted_list) ).hexdigest()
It's not fool-proof, as the formatting of a list converted to a string or the formatting of a tuple could change in some future major Python version, sort order depends on locale, etc., but I think it can be good enough.
A possible option would be using a serialized representation of the list that preserves order. I am not sure whether the default list to string mechanism imposes any kind of order, but it wouldn't surprise me if it were interpreter-dependent. So, I'd basically build something akin to urlencode that sorts the keys beforehand.
Not that I believe your method would fail, but I'd rather play with predictable things and avoid undocumented and/or unpredictable behavior. It's true that despite being "unordered", dictionaries end up having an order that may even be consistent, but the point is that you shouldn't take that for granted.
I'm thinking of creating a checksum of a dict to know if it was modified or not.
For the moment I have this:
>>> import hashlib
>>> import pickle
>>> d = {'k': 'v', 'k2': 'v2'}
>>> z = pickle.dumps(d)
>>> hashlib.md5(z).hexdigest()
'8521955ed8c63c554744058c9888dc30'
Perhaps a better solution exists?
Note: I want to create a unique id of a dict to create a good ETag.
EDIT: I can have abstract data in the dict.
Something like this:
# in Python 3 you need: from functools import reduce
reduce(lambda x, y: x ^ y, [hash(item) for item in d.items()])
Take the hash of each (key, value) tuple in the dict and XOR them all together.
@katrielalex:
If the dict contains unhashable items you could do this:
hash(str(d))
or maybe even better
hash(repr(d))
In Python 3, the hash function is initialized with a random number which is different for each Python session. If that is not acceptable for the intended application, use e.g. zlib.adler32 to build the checksum for a dict:
import zlib

d = {'key1': 'value1', 'key2': 'value2'}

checksum = 0
for item in d.items():
    c1 = 1
    for t in item:
        c1 = zlib.adler32(bytes(repr(t), 'utf-8'), c1)
    checksum = checksum ^ c1

print(checksum)
I would recommend an approach very similar to the one you propose, but with some extra guarantees:
import hashlib, json
hashlib.md5(json.dumps(d, sort_keys=True, ensure_ascii=True).encode('utf-8')).hexdigest()
sort_keys=True: keep the same hash if the order of your keys changes
ensure_ascii=True: in case you have some non-ascii characters, to make sure the representation does not change
We use this for our ETag.
I don't know whether pickle guarantees that a dict is serialized the same way every time.
If you only have dictionaries, I would go for a combination of calls to keys() and sorted(), build a string based on the sorted key/value pairs, and compute the checksum on that.
I think you may not realise some of the subtleties that go into this. The first problem is that the order in which items appear in a dict is not defined by the implementation. This means that simply asking for str of a dict doesn't work, because you could have
str(d1) == "{'a':1, 'b':2}"
str(d2) == "{'b':2, 'a':1}"
and these will hash to different values. If you have only hashable items in the dict, you can hash them and then join up their hashes, as @Bart does, or simply
hash(tuple(sorted(hash(x) for x in d.items())))
Note the sorted, because you have to ensure that the hashed tuple comes out in the same order irrespective of which order the items appear in the dict. If you have dicts in the dict, you could recurse this, but it will be complicated.
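For what that recursion might look like, here is a rough sketch (my illustration, not part of the original answer) that recursively converts dicts and lists into hashable, order-independent tuples:
def freeze(obj):
    # recursively turn dicts and lists into hashable tuples;
    # dict items are sorted so the result is order-independent
    if isinstance(obj, dict):
        return tuple(sorted((k, freeze(v)) for k, v in obj.items()))
    if isinstance(obj, list):
        return tuple(freeze(v) for v in obj)
    return obj

d = {'b': [1, 2], 'a': {'c': 3}}
print(hash(freeze(d)))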
BUT it would be easy to break any implementation like this if you allow arbitrary data in the dictionary, since you can simply write an object with a broken __hash__ implementation and use that. And you can't use id, because then you might have equal items which compare different.
The moral of the story is that hashing dicts isn't supported in Python for a reason.
As you said, you want to generate an ETag based on the dictionary content; an OrderedDict, which preserves the order of the keys, may be a better candidate here. Just iterate through the key/value pairs and construct your ETag string.
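A minimal sketch of that idea (the names here are mine, just for illustration):
from collections import OrderedDict
import hashlib

od = OrderedDict([('name', 'Danish'), ('age', 107)])

# build a deterministic string from the ordered key/value pairs
etag_source = ';'.join('{}={}'.format(k, v) for k, v in od.items())
etag = hashlib.sha1(etag_source.encode('utf-8')).hexdigest()
print(etag)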
I have found that when the following is run, python's json module (included since 2.6) converts int dictionary keys to strings.
>>> import json
>>> releases = {1: "foo-v0.1"}
>>> json.dumps(releases)
'{"1": "foo-v0.1"}'
Is there any easy way to preserve the key as an int, without needing to parse the string on dump and load?
I believe it would be possible using the hooks provided by the json module, but again this still requires parsing.
Is there possibly an argument I have overlooked?
Sub-question:
Thanks for the answers. Seeing as json works as I feared, is there an easy way to convey key type by maybe parsing the output of dumps?
Also I should note the code doing the dumping and the code downloading the json object from a server and loading it, are both written by me.
This is one of those subtle differences among various mapping collections that can bite you. JSON treats keys as strings; Python supports distinct keys differing only in type.
In Python (and apparently in Lua) the keys to a mapping (dictionary or table, respectively) are object references. In Python they must be immutable types, or they must be objects which implement a __hash__ method. (The Lua docs suggest that it automatically uses the object's ID as a hash/key even for mutable objects and relies on string interning to ensure that equivalent strings map to the same objects).
In Perl, JavaScript, awk and many other languages the keys for hashes, associative arrays or whatever they're called in the given language, are strings (or "scalars" in Perl). In Perl, $foo{1}, $foo{1.0}, and $foo{"1"} are all references to the same mapping in %foo --- the key is evaluated as a scalar!
JSON started as a Javascript serialization technology. (JSON stands for JavaScript Object Notation.) Naturally it implements semantics for its mapping notation which are consistent with its mapping semantics.
If both ends of your serialization are going to be Python then you'd be better off using pickles. If you really need to convert these back from JSON into native Python objects I guess you have a couple of choices. First you could try (try: ... except: ...) to convert any key to a number in the event of a dictionary look-up failure. Alternatively, if you add code to the other end (the serializer or generator of this JSON data) then you could have it perform a JSON serialization on each of the key values --- providing those as a list of keys. (Then your Python code would first iterate over the list of keys, instantiating/deserializing them into native Python objects... and then use those to access the values in the mapping.)
No, there is no such thing as a Number key in JavaScript. All object properties are converted to String.
var a = {1: 'a'};
for (k in a)
    alert(typeof k); // 'string'
This can lead to some curious-seeming behaviours:
a[999999999999999999999]= 'a'; // this even works on Array
alert(a[1000000000000000000000]); // 'a'
alert(a['999999999999999999999']); // fail
alert(a['1e+21']); // 'a'
JavaScript Objects aren't really proper mappings as you'd understand it in languages like Python, and using keys that aren't String results in weirdness. This is why JSON always explicitly writes keys as strings, even where it doesn't look necessary.
Answering your subquestion:
It can be accomplished by using json.loads(jsonDict, object_hook=jsonKeys2int)
def jsonKeys2int(x):
    if isinstance(x, dict):
        return {int(k): v for k, v in x.items()}
    return x
This function will also work for nested dicts and uses a dict comprehension.
If you want to cast the values too, use:
def jsonKV2int(x):
    if isinstance(x, dict):
        return {int(k): (int(v) if isinstance(v, unicode) else v) for k, v in x.items()}
    return x
This tests the type of each value and casts it only if it is a string object (unicode, to be exact).
Both functions assume the keys (and values) to be integers.
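A quick usage sketch of the first hook (the sample JSON here is made up):
import json

def jsonKeys2int(x):
    if isinstance(x, dict):
        return {int(k): v for k, v in x.items()}
    return x

# object_hook runs bottom-up, so nested dicts get converted too
print(json.loads('{"1": "a", "2": {"3": "b"}}', object_hook=jsonKeys2int))
# {1: 'a', 2: {3: 'b'}}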
Thanks to:
How to use if/else in a dictionary comprehension?
Convert a string key to int in a Dictionary
Alternatively, you can also try converting the dictionary to a list in [(k1,v1),(k2,v2)] format while encoding it with json, and converting it back to a dictionary after decoding it.
>>> import json
>>> releases = {1: "foo-v0.1"}
>>> json.dumps(releases.items())
'[[1, "foo-v0.1"]]'
>>> releases == dict(json.loads(json.dumps(releases.items())))
True
I believe this will need some more work, like having some sort of flag to identify which parameters are to be converted back to a dictionary after decoding from JSON.
I've gotten bitten by the same problem. As others have pointed out, in JSON the mapping keys must be strings. You can do one of two things. You can use a less strict JSON library, like demjson, which allows integer strings. If no other programs (or none in other languages) are going to read it, then you should be okay. Or you can use a different serialization language. I wouldn't suggest pickle. It's hard to read and is not designed to be secure. Instead, I'd suggest YAML, which is (nearly) a superset of JSON and does allow integer keys. (At least PyYAML does.)
Here is my solution! I used object_hook; it is useful when you have nested JSON.
>>> import json
>>> json_data = '{"1": "one", "2": {"-3": "minus three", "4": "four"}}'
>>> py_dict = json.loads(json_data, object_hook=lambda d: {int(k) if k.lstrip('-').isdigit() else k: v for k, v in d.items()})
>>> py_dict
{1: 'one', 2: {-3: 'minus three', 4: 'four'}}
That filter only parses JSON keys to int; you can use the same int(v) if v.lstrip('-').isdigit() else v filter for the JSON values too.
Convert the dictionary to a string using str(dict), and then convert it back to a dict like this:
import ast
ast.literal_eval(string)
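A quick round-trip sketch (the dict here is my example); note that this handles int and even tuple keys, which JSON cannot:
import ast

d = {1: "foo-v0.1", (1, 2): "bar"}
s = str(d)                      # "{1: 'foo-v0.1', (1, 2): 'bar'}"
restored = ast.literal_eval(s)  # safely evaluates Python literals only
assert restored == d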
I made a very simple extension of Murmel's answer which I think will work on a pretty arbitrary dictionary (including nested) assuming it can be dumped by JSON in the first place. Any keys which can be interpreted as integers will be cast to int. No doubt this is not very efficient, but it works for my purposes of storing to and loading from json strings.
def convert_keys_to_int(d: dict):
    new_dict = {}
    for k, v in d.items():
        try:
            new_key = int(k)
        except ValueError:
            new_key = k
        if type(v) == dict:
            # recurse into nested dicts
            v = convert_keys_to_int(v)
        new_dict[new_key] = v
    return new_dict
Assuming that all keys in the original dict are integers if they can be cast to int, this will return the original dictionary after a round trip through JSON.
e.g.
>>> d = {1: 3, 2: 'a', 3: {1: 'a', 2: 10}, 4: {'a': 2, 'b': 10}}
>>> convert_keys_to_int(json.loads(json.dumps(d))) == d
True
[NSFW] You can write your own json.dumps yourself; here is an example from djson: encoder.py. You can use it like this:
assert dumps({1: "abc"}) == '{1: "abc"}'