Pickling dict in Python


Can I expect the string representation of the same pickled dict to be consistent across different machines/runs for the same Python version?
In the scope of one run on the same machine?
e.g.
# Python 2.7
import pickle
initial = pickle.dumps({'a': 1, 'b': 2})
for _ in xrange(1000**2):
    assert pickle.dumps({'a': 1, 'b': 2}) == initial
Does it depend on the actual structure of my dict object (nested values etc.)?
UPD:
The thing is, I can't actually make the code above fail within one run (Python 2.7), no matter what my dict object looks like (which keys, values, etc.).

You can't in the general case, for the same reasons you can't rely on the dictionary order in other scenarios; pickling is not special here. The string representation of a dictionary is a function of the current dictionary iteration order, regardless of how you loaded it.
Your own small test is too limited, because it doesn't do any mutation of the test dictionary and doesn't use keys that would cause collisions. You create dictionaries with the exact same Python source code, so those will produce the same output order because the editing history of the dictionaries is exactly the same, and two single-character keys that use consecutive letters from the ASCII character set are not likely to cause a collision.
Not that you actually test string representations being equal, you only test if their contents are the same (two dictionaries that differ in string representation can still be equal because the same key-value pairs, subjected to a different insertion order, can produce different dictionary output order).
Next, the most important factor in dictionary iteration order before CPython 3.6 is the hash function. It must be stable for the lifetime of a single Python process (otherwise you'd break all dictionaries), so a single-process test would never see the dictionary order change because of different hash function results.
Currently, all pickling protocol revisions store the data for a dictionary as a stream of key-value pairs; on loading the stream is decoded and key-value pairs are assigned back to the dictionary in the on-disk order, so the insertion order is at least stable from that perspective. BUT between different Python versions, machine architectures and local configuration, the hash function results absolutely will differ:
The PYTHONHASHSEED environment variable is used in the generation of hashes for str, bytes and datetime keys. The setting is available as of Python 2.6.8 and 3.2.3, and is enabled and set to random by default as of Python 3.3. So the setting varies from Python version to Python version, and can be set to something different locally.
The hash function produces an ssize_t integer, a platform-dependent signed integer type, so different architectures can produce different hashes just because they use a larger or smaller ssize_t type definition.
With different hash function output from machine to machine and from Python run to Python run, you will see different string representations of a dictionary.
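You can see the PYTHONHASHSEED effect by computing the hash of the same string in fresh interpreters started with different seeds. This is a sketch that assumes Python 3.7+ (for `subprocess.run(capture_output=...)`). The helper name `hash_in_fresh_interpreter` is hypothetical.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(text, seed):
    """Compute hash(text) in a new Python process started with PYTHONHASHSEED=seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        # With "python -c CODE ARG", sys.argv[1] inside CODE is ARG.
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


key = "first_string_key"
# The same seed reproduces the same hash in every new process...
assert hash_in_fresh_interpreter(key, 1) == hash_in_fresh_interpreter(key, 1)
# ...while different seeds give (almost certainly) different hashes.
print({seed: hash_in_fresh_interpreter(key, seed) for seed in range(3)})
```

Within one process the hash never changes. That is why a single-run loop like the one in the question cannot expose the difference.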
And finally, as of CPython 3.6, the implementation of the dict type changed to a more compact format that also happens to preserve insertion order. As of Python 3.7, the language specification has changed to make this behaviour mandatory, so other Python implementations have to implement the same semantics. So pickling and unpickling between different Python implementations or versions predating Python 3.7 can also result in a different dictionary output order, even with all other factors equal.
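On Python 3.7+, insertion order is part of what gets pickled. Two equal dicts built in a different order therefore produce different bytes, and unpickling restores the stored order. A minimal check:

```python
import pickle

ab = {'a': 1, 'b': 2}
ba = {'b': 2, 'a': 1}

# Dict equality ignores order...
assert ab == ba
# ...but the pickle stream stores key-value pairs in iteration (insertion) order,
# so the bytes differ.
assert pickle.dumps(ab) != pickle.dumps(ba)

# Unpickling re-inserts the pairs in stream order, so the order survives a round trip.
restored = pickle.loads(pickle.dumps(ba))
print(list(restored))  # ['b', 'a']
```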

No, you cannot. This depends on a lot of things, including key values, interpreter state and Python version.
If you need a consistent representation, consider using JSON in a canonical form.
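One simple way to get a stable JSON form in Python is to sort the keys and fix the separators. This is a sketch only. It is not a full canonicalization scheme such as RFC 8785: float formatting is left to the json module, and `sort_keys` assumes the keys are all strings. The helper name is hypothetical.

```python
import json


def canonical_json(obj):
    """Serialize obj to JSON with sorted keys and no optional whitespace."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


a = {"second_key_string": 2, "first_string_key": 1}
b = {"first_string_key": 1, "second_key_string": 2}
# Same content, different insertion order, same output string.
assert canonical_json(a) == canonical_json(b)
print(canonical_json(a))  # {"first_string_key":1,"second_key_string":2}
```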
EDIT
I'm not quite sure why people are downvoting this without any comments, but I'll clarify.
pickle is not meant to produce stable representations; it's a purely machine-readable (not human-readable) serializer.
Compatibility between Python versions exists, but it only covers the ability to deserialize an identical object inside the interpreter. In other words, when you dump in one version and load in another, the loaded object is guaranteed to behave the same through the same public interfaces. Neither the serialized representation nor the internal memory structure is claimed to be the same (and, IIRC, it never was).
The easiest way to check this is to dump the same data in versions that differ significantly in dict structure and/or hash-seed handling, while keeping your keys out of the cached range (no small integers or short strings):
Python 3.5.6 (default, Oct 26 2018, 11:00:52)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> d = {'first_string_key': 1, 'second_key_string': 2}
>>> pickle.dumps(d)
b'\x80\x03}q\x00(X\x11\x00\x00\x00second_key_stringq\x01K\x02X\x10\x00\x00\x00first_string_keyq\x02K\x01u.'
Python 3.6.7 (default, Oct 26 2018, 11:02:59)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> d = {'first_string_key': 1, 'second_key_string': 2}
>>> pickle.dumps(d)
b'\x80\x03}q\x00(X\x10\x00\x00\x00first_string_keyq\x01K\x01X\x11\x00\x00\x00second_key_stringq\x02K\x02u.'
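You can disassemble the two byte strings from the sessions above with the standard `pickletools` module. Doing so shows the same two key-value pairs stored in opposite order, and both still load to equal dicts. The helper `key_order` is a hypothetical name. It only collects `BINUNICODE` (string) arguments, which is enough for these two pickles.

```python
import pickle
import pickletools

# Bytes copied from the Python 3.5 and 3.6 sessions above (protocol 3).
py35 = b'\x80\x03}q\x00(X\x11\x00\x00\x00second_key_stringq\x01K\x02X\x10\x00\x00\x00first_string_keyq\x02K\x01u.'
py36 = b'\x80\x03}q\x00(X\x10\x00\x00\x00first_string_keyq\x01K\x01X\x11\x00\x00\x00second_key_stringq\x02K\x02u.'


def key_order(blob):
    """Return the string arguments in the order they occur in the pickle stream."""
    return [arg for opcode, arg, pos in pickletools.genops(blob) if opcode.name == 'BINUNICODE']


print(key_order(py35))  # ['second_key_string', 'first_string_key']
print(key_order(py36))  # ['first_string_key', 'second_key_string']

# Different bytes, equal dictionaries.
assert py35 != py36
assert pickle.loads(py35) == pickle.loads(py36)
```

For a full opcode listing, `pickletools.dis(py36)` prints every instruction in the stream.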

Python 2 dictionaries are unordered; the order depends on the hash values of the keys, as explained in this great answer by Martijn Pieters. I don't think you can use a dict here, but you could use an OrderedDict (requires Python 2.7 or higher), which maintains the order of the keys. For example,
from collections import OrderedDict
data = [('b', 0), ('a', 0)]
d = dict(data)
od = OrderedDict(data)
print(d)
print(od)
#{'a': 0, 'b': 0}
#OrderedDict([('b', 0), ('a', 0)])
You can pickle an OrderedDict just as you would pickle a dict, but the order is preserved, so pickling the same objects built in the same order gives the same string.
from collections import OrderedDict
import pickle
data = [('a', 1), ('b', 2)]
od = OrderedDict(data)
s = pickle.dumps(od)
print(s)
Note that you shouldn't pass a dict to OrderedDict's constructor, because the dict's keys would already be in their arbitrary order. If you have a dictionary, first convert it to a sequence of tuples in the desired order. OrderedDict is a subclass of dict and has all the dict methods, so you can also create an empty object and assign new keys.
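The difference between the two types is easy to check. OrderedDict equality is order-sensitive, and an OrderedDict built the same way pickles to the same bytes within one interpreter. A minimal sketch, assuming Python 2.7 or 3.x:

```python
import pickle
from collections import OrderedDict

od_ab = OrderedDict([('a', 1), ('b', 2)])
od_ba = OrderedDict([('b', 2), ('a', 1)])

# OrderedDict == OrderedDict compares order too; plain dict equality does not.
assert od_ab != od_ba
assert dict(od_ab) == dict(od_ba)

# Building the same OrderedDict the same way gives the same pickle...
assert pickle.dumps(od_ab) == pickle.dumps(OrderedDict([('a', 1), ('b', 2)]))
# ...and the key order survives a round trip.
assert list(pickle.loads(pickle.dumps(od_ba))) == ['b', 'a']
```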
Your test doesn't fail because you're using the same Python version under the same conditions, so the order of the dictionary will not change randomly between loop iterations. But we can check whether changing the order of the keys in the dictionary literal produces a different string:
import pickle
initial = pickle.dumps({'a': 1, 'b': 2})
assert pickle.dumps({'b': 2, 'a': 1}) != initial
You might expect the resulting string to be different when key 'b' comes first. It is in Python >= 3.6, so the assertion passes there. In Python 2, however, the assertion fails: the string is the same because key 'a' is placed before key 'b' either way.
To answer your main question: Python 2 dictionaries are unordered, but a dictionary is likely to have the same order when it is built by the same code on the same Python version. However, that order may not be the order in which you inserted the items. If the order is important, it's best to use an OrderedDict or upgrade your Python version.

As with a frustratingly large number of things in Python, the answer is "sort of". Straight from the docs,
The pickle serialization format is guaranteed to be backwards compatible across Python releases.
That's potentially ever so subtly different from what you're asking. If it's a valid pickled dictionary now, it'll always be a valid pickled dictionary, and it'll always deserialize to the correct dictionary. That leaves unspoken a few properties which you might expect and which don't have to hold:
Pickling doesn't have to be deterministic, even for the same object in the same Python instance on the same platform. The same dictionary could have infinitely many possible pickled representations (not that we would expect the format to ever be inefficient enough to support arbitrarily large degrees of extra padding). As the other answers point out, dictionaries don't have a defined sort order, and this can give at least n! string representations of a dictionary with n elements.
Going further with the last point, it isn't guaranteed that pickle is consistent even within a single Python instance. In practice the output doesn't currently change within one instance, but that behavior isn't guaranteed to remain in future versions of Python.
Future versions of Python don't need to serialize dictionaries in a way which is compatible with current versions. The only promise we have is that they will be able to correctly deserialize our dictionaries. Currently dictionaries are supported the same in all Pickle formats, but that need not remain the case forever (not that I suspect it would ever change).
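The "many representations, one value" point is easy to see with the protocols a single interpreter already supports. Each protocol writes the same dict differently, and all of them load back to an equal object. A minimal sketch:

```python
import pickle

d = {'x': 1, 'y': [1.5, 2.5]}

# One pickle per protocol supported by this interpreter (0 .. HIGHEST_PROTOCOL).
blobs = {p: pickle.dumps(d, protocol=p) for p in range(pickle.HIGHEST_PROTOCOL + 1)}

# The byte strings are not all the same...
assert len(set(blobs.values())) > 1
# ...yet every one of them deserializes to an equal dictionary.
assert all(pickle.loads(blob) == d for blob in blobs.values())

for protocol, blob in blobs.items():
    print(protocol, len(blob))
```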

If you don't modify the dict its string representation won't change during a given run of the program, and its .keys method will return the keys in the same order. However, the order can change from run to run (before Python 3.6).
Also, two different dict objects that have identical key-value pairs are not guaranteed to use the same order (pre Python 3.6).
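If you need byte-identical pickles for equal dicts, one workaround is to pickle the items in sorted key order instead of the dict itself. This is a sketch under stated assumptions. The helper name `canonical_pickle` is hypothetical, and the keys must be mutually comparable (e.g. all str). Nested dicts inside the values are not canonicalized. The protocol is pinned so the output doesn't follow the interpreter's default.

```python
import pickle


def canonical_pickle(d, protocol=4):
    """Pickle d's (key, value) pairs as a list sorted by key.

    Keys are unique, so sorting never needs to compare values."""
    return pickle.dumps(sorted(d.items()), protocol=protocol)


x = {'a': 1, 'b': 2}
y = {}
y['b'] = 2  # same contents, different insertion history
y['a'] = 1

assert x == y
assert canonical_pickle(x) == canonical_pickle(y)
# Rebuild the dict when loading.
assert dict(pickle.loads(canonical_pickle(y))) == y
```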
BTW, it's not a good idea to shadow a module name with your own variables, like you do with that lambda. It makes the code harder to read, and will lead to confusing error messages if you forget that you shadowed the module & try to access some other name from it later in the program.

Related

Pass Dictionary from LabVIEW to python script via a Python Node

TLDR: I am making a python wrapper around something for LabVIEW to use and I want to pass a dict (or even kwargs) [i.e. key/value pairs] to a python script so I can have more dynamic function arguments.
LabVIEW 2018 implemented a Python Node which allows LabVIEW to interact with python scripts by calling, passing, and getting returned variables.
The issue is it doesn't appear to have native support for the dict type:
Python Node Details Supported Data Types
The Python Node supports a large number of data types. You can use
this node to call the following data types:
Numerics
Arrays, including multi-dimensional arrays
Strings
Clusters
Calling Conventions
This node converts integers and strings to the corresponding data
types in Python, converts arrays to lists, and converts clusters to
tuples.
Of course, Python is built around dictionaries, but LabVIEW does not appear to support any way to pass a dictionary object.
Does anyone know of a way I can pass a cluster of named elements (or any other dictionary type) to a python script as a dict object?

There is no direct way to do it.
The simplest way on both sides would be to use JSON strings.
From LabVIEW to Python
LabVIEW clusters can be flattened to JSON (Strings > Flatten/Unflatten).
The resulting string can be converted to a dict in Python with just one line (plus an import):
>>> import json
>>> myDict=json.loads('{"MyString":"FooBar","MySubCluster":{"MyInt":42,"MyFloat":3.1410000000000000142},"myIntArray":[1,2,3]}')
>>> myDict
{u'MyString': u'FooBar', u'MySubCluster': {u'MyInt': 42, u'MyFloat': 3.141}, u'myIntArray': [1, 2, 3]}
>>> myDict['MySubCluster']['MyFloat']
3.141
From Python to LabVIEW
The Python side is easy again:
>>> MyJson = json.dumps(myDict)
In LabVIEW, unflatten JSON from string, and wire a cluster of the expected structure with default values.
This of course requires that the structure of the dict is fixed.
If it is not, you can still access single elements by giving the path to them as an array.
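On the Python side, the JSON string can also carry keyword arguments, which covers the "dict or even kwargs" part of the question. This is a hypothetical sketch. `scale_signal` stands in for whatever function you want to expose, and `call_with_json` is the entry point the Python Node would call with a string. It assumes the cluster element labels become the JSON keys (as in the example above) and match the parameter names.

```python
import json


def scale_signal(samples, gain=1.0, offset=0.0):
    """Hypothetical function to expose to LabVIEW."""
    return {"scaled": [s * gain + offset for s in samples], "count": len(samples)}


def call_with_json(json_kwargs):
    """Python Node entry point: takes a JSON string, returns a JSON string.

    The cluster flattened to JSON in LabVIEW becomes **kwargs here."""
    kwargs = json.loads(json_kwargs)
    result = scale_signal(**kwargs)
    return json.dumps(result)


print(call_with_json('{"samples": [1, 2, 3], "gain": 2.0, "offset": 0.5}'))
# {"scaled": [2.5, 4.5, 6.5], "count": 3}
```

Only strings cross the node boundary, so the types the node supports are not a limitation here.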
Limitations:
While this works like a charm (did you even notice that my locale uses comma as decimal sign?), not all datatypes are supported. For example, JSON itself does not have a time datatype, nor a dedicated path datatype, and so, the JSON VIs refuse to handle them. Use a numerical or string datatype, and convert it within LabVIEW.
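When Python sends data back, the same limitation applies: `json.dumps` refuses datetimes and paths unless you supply a fallback through its `default` parameter. This is a sketch in which the conversions are assumptions: datetimes become Unix-epoch seconds and paths become plain strings. Converting those back is left to the LabVIEW side, as described above.

```python
import json
from datetime import datetime, timezone
from pathlib import PurePath


def encode_for_labview(value):
    """json.dumps fallback for types JSON cannot represent (assumed conventions)."""
    if isinstance(value, datetime):
        return value.timestamp()  # seconds since the Unix epoch, as a float
    if isinstance(value, PurePath):
        return str(value)  # paths travel as plain strings
    raise TypeError(f"{type(value).__name__} is not JSON serializable")


payload = {
    "t": datetime(1970, 1, 1, 0, 0, 10, tzinfo=timezone.utc),
    "file": PurePath("data", "run1.csv"),
}
text = json.dumps(payload, default=encode_for_labview)
print(text)
```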
Excursus: A dict-ish datatype in LabVIEW
If you ever need a dynamic datatype in LabVIEW, have a look at attributes of variants.
These are pairs of keys (strings) and values (any datatype!), which can be added and read about as simply as in Python. But there is no built-in, simple way to use them to exchange data with Python.

Two identical dictionaries differ (by using diff) after being pickled

I have a dictionary whose keys are tuples like (int, str, int, str, int), and the corresponding values are lists of floats of the same size.
I pickled the dictionary twice by the same script:
import pickle
with open(name, 'wb') as source:
    pickle.dump(the_dict, source)
For the two resulting binary files test_1 and test_2, I run
diff test_1 test_2
in a terminal (I'm using macOS) to see whether I can use diff to tell the difference. However, I received
Binary files test_1 and test_2 differ
Why? Was the same dictionary being pickled in different ways? Does it mean I cannot use diff to tell whether two dictionaries are identical?

It depends on which Python version you are using: dictionaries in Python versions before 3.6 do not remember insertion order. CPython 3.6 preserved insertion order as an implementation detail, and Python 3.7 made it a language guarantee.
For backwards compatibility, you shouldn't depend on the dictionary remembering the order of inserted keys. Instead, you can use OrderedDict from the collections module.
Also, using diff on pickled dict data may show differences in the data even though the actual dictionaries are equivalent -- since dicts, unlike lists, generally make no assurances on order state (see above for when that is not the case).
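To compare two pickled dicts, load both files and compare the objects rather than diffing bytes. The sketch below writes two equal dicts with (int, str, int, str, int) keys, inserted in opposite orders, into a temporary directory. The helper name `same_pickled_dict` is hypothetical. Only load pickle files you trust, because unpickling can execute arbitrary code.

```python
import os
import pickle
import tempfile


def same_pickled_dict(path_a, path_b):
    """Compare two pickle files by the objects they contain, not by their bytes."""
    with open(path_a, 'rb') as fa, open(path_b, 'rb') as fb:
        return pickle.load(fa) == pickle.load(fb)


# Two equal dicts, inserted in opposite orders.
first = {(1, 'a', 2, 'b', 3): [0.1, 0.2], (4, 'c', 5, 'd', 6): [0.3, 0.4]}
second = dict(reversed(list(first.items())))

with tempfile.TemporaryDirectory() as tmp:
    path_1 = os.path.join(tmp, 'test_1')
    path_2 = os.path.join(tmp, 'test_2')
    for path, obj in ((path_1, first), (path_2, second)):
        with open(path, 'wb') as sink:
            pickle.dump(obj, sink)
    with open(path_1, 'rb') as f1, open(path_2, 'rb') as f2:
        bytes_identical = f1.read() == f2.read()
    dicts_equal = same_pickled_dict(path_1, path_2)

print('bytes identical:', bytes_identical)  # False on Python 3.7+
print('dicts equal:', dicts_equal)  # True
```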

python OrderedDict not iterating through dictionary in original order [duplicate]

I have a dictionary that I declared in a particular order and want to keep it in that order all the time. The keys/values can't really be kept in order based on their value, I just want it in the order that I declared it.
So if I have the dictionary:
d = {'ac': 33, 'gw': 20, 'ap': 102, 'za': 321, 'bs': 10}
It isn't in that order if I view it or iterate through it. Is there any way to make sure Python will keep the explicit order that I declared the keys/values in?

From Python 3.6 onwards, the standard dict type maintains insertion order by default.
Defining
d = {'ac':33, 'gw':20, 'ap':102, 'za':321, 'bs':10}
will result in a dictionary with the keys in the order listed in the source code.
This was achieved by using a simple array with integers for the sparse hash table, where those integers index into another array that stores the key-value pairs (plus the calculated hash). That latter array just happens to store the items in insertion order, and the whole combination actually uses less memory than the implementation used in Python 3.5 and before. See the original idea post by Raymond Hettinger for details.
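The two-array layout can be illustrated with a toy model. It is a simplified sketch, not CPython's actual code: the table has a fixed size (a power of two), there is no resizing or deletion, and collisions use plain linear probing instead of CPython's perturbed probe sequence. The class name `CompactDict` is hypothetical.

```python
class CompactDict:
    """Toy model of the compact dict layout: sparse indices + dense entries."""

    def __init__(self, size=8):  # size must be a power of two
        self.indices = [None] * size  # sparse hash table: slot -> position in entries
        self.entries = []  # dense array of (hash, key, value), in insertion order

    def _lookup(self, key):
        """Return (slot, hash, entry position or None) for key."""
        h = hash(key)
        mask = len(self.indices) - 1
        slot = h & mask
        while True:
            pos = self.indices[slot]
            if pos is None:
                return slot, h, None
            entry_hash, entry_key, _ = self.entries[pos]
            if entry_hash == h and entry_key == key:
                return slot, h, pos
            slot = (slot + 1) & mask  # linear probing (simplification)

    def __setitem__(self, key, value):
        slot, h, pos = self._lookup(key)
        if pos is not None:
            self.entries[pos] = (h, key, value)  # update in place; order unchanged
            return
        # Keep at least a third of the slots empty so _lookup always terminates.
        if len(self.entries) + 1 > len(self.indices) * 2 // 3:
            raise RuntimeError("toy table is full; a real dict would resize here")
        self.indices[slot] = len(self.entries)
        self.entries.append((h, key, value))

    def __getitem__(self, key):
        _, _, pos = self._lookup(key)
        if pos is None:
            raise KeyError(key)
        return self.entries[pos][2]

    def __iter__(self):
        # Walking the dense array yields keys in insertion order.
        return (key for _, key, _ in self.entries)


d = CompactDict()
for k, v in [('ac', 33), ('gw', 20), ('ap', 102), ('za', 321), ('bs', 10)]:
    d[k] = v
print(list(d))  # ['ac', 'gw', 'ap', 'za', 'bs']
```

The hash-dependent part (the slot numbers in `indices`) changes from run to run. The order of `entries` depends only on the insertion sequence.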
In 3.6 this was still considered an implementation detail; see the What's New in Python 3.6 documentation:
The order-preserving aspect of this new implementation is considered an implementation detail and should not be relied upon (this may change in the future, but it is desired to have this new dict implementation in the language for a few releases before changing the language spec to mandate order-preserving semantics for all current and future Python implementations; this also helps preserve backwards-compatibility with older versions of the language where random iteration order is still in effect, e.g. Python 3.5).
Python 3.7 elevates this implementation detail to a language specification, so it is now mandatory that dict preserves order in all Python implementations compatible with that version or newer. See the pronouncement by the BDFL. As of Python 3.8, dictionaries also support iteration in reverse.
You may still want to use the collections.OrderedDict() class in certain cases, as it offers some additional functionality on top of the standard dict type, such as being reversible (this extends to the view objects; plain dicts only became reversible in Python 3.8) and supporting reordering (via the move_to_end() method).
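A short sketch of the OrderedDict-only features mentioned above, plus its order-sensitive equality (assumes Python 3.6+ so that keyword-argument order is preserved):

```python
from collections import OrderedDict

od = OrderedDict.fromkeys(['ac', 'gw', 'ap'])  # values default to None

od.move_to_end('ac')  # move a key to the end
assert list(od) == ['gw', 'ap', 'ac']
od.move_to_end('ac', last=False)  # ...or back to the front
assert list(od) == ['ac', 'gw', 'ap']

assert list(reversed(od)) == ['ap', 'gw', 'ac']
assert od.popitem(last=False) == ('ac', None)  # pop from the front (FIFO)

# Equality between two OrderedDicts takes order into account; dict equality does not.
assert OrderedDict(a=1, b=2) != OrderedDict(b=2, a=1)
assert dict(a=1, b=2) == dict(b=2, a=1)
```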

from collections import OrderedDict
words = "He will be the winner".split()
OrderedDict((word, True) for word in words)
contains
OrderedDict([('He', True), ('will', True), ('be', True), ('the', True), ('winner', True)])
If the values are True (or any other immutable object), you can also use:
OrderedDict.fromkeys(words, True)

Rather than explaining the theoretical part, I'll give a simple example.
>>> from collections import OrderedDict
>>> my_dictionary=OrderedDict()
>>> my_dictionary['foo']=3
>>> my_dictionary['aol']=1
>>> my_dictionary
OrderedDict([('foo', 3), ('aol', 1)])
>>> dict(my_dictionary)
{'foo': 3, 'aol': 1}

Note that this answer applies to Python versions prior to Python 3.7. CPython 3.6 maintains insertion order under most circumstances as an implementation detail. From Python 3.7 onward, it has been declared that implementations MUST maintain insertion order to be compliant.
Python dictionaries are unordered. If you want an ordered dictionary, try collections.OrderedDict.
Note that OrderedDict was introduced into the standard library in python 2.7. If you have an older version of python, you can find recipes for ordered dictionaries on ActiveState.

Dictionaries use an order that makes searching efficient, and you can't change that.
You could just use a list of objects (a 2 element tuple in a simple case, or even a class), and append items to the end. You can then use linear search to find items in it.
Alternatively you could create or use a different data structure created with the intention of maintaining order.

I came across this post while trying to figure out how to get OrderedDict to work. PyDev for Eclipse couldn't find OrderedDict at all, so I ended up deciding to make a tuple of my dictionary's key values as I would like them to be ordered. When I needed to output my list, I just iterated through the tuple's values and plugged the iterated 'key' from the tuple into the dictionary to retrieve my values in the order I needed them.
example:
test_dict = dict( val1 = "hi", val2 = "bye", val3 = "huh?", val4 = "what....")
test_tuple = ( 'val1', 'val2', 'val3', 'val4')
for key in test_tuple: print(test_dict[key])
It's a tad cumbersome, but I'm pressed for time and it's the workaround I came up with.
note: the list of lists approach that somebody else suggested does not really make sense to me, because lists are ordered and indexed (and are also a different structure than dictionaries).

You can't really do what you want with a dictionary. You already have the dictionary d = {'ac':33, 'gw':20, 'ap':102, 'za':321, 'bs':10} created, and I found no way to keep it in order once it is created. What I did instead was make a JSON file with the object:
{"ac":33,"gw":20,"ap":102,"za":321,"bs":10}
I used:
import json
from collections import OrderedDict
with open('file.json') as f:
    r = json.load(f, object_pairs_hook=OrderedDict)
then used:
print(json.dumps(r))
to verify.

from collections import OrderedDict
list1 = ['k1', 'k2']
list2 = ['v1', 'v2']
new_ordered_dict = OrderedDict(zip(list1, list2))
print(new_ordered_dict)
# OrderedDict([('k1', 'v1'), ('k2', 'v2')])

Another alternative is to use Pandas dataframe as it guarantees the order and the index locations of the items in a dict-like structure.

I had a similar problem when developing a Django project. I couldn't use OrderedDict, because I was running an old version of python, so the solution was to use Django's SortedDict class:
https://code.djangoproject.com/wiki/SortedDict
e.g.,
from django.utils.datastructures import SortedDict
d2 = SortedDict()
d2['b'] = 1
d2['a'] = 2
d2['c'] = 3
Note: This answer is originally from 2011. If you have access to Python 2.7 or higher, you should use the now-standard collections.OrderedDict, of which many examples have been provided by others in this thread. SortedDict was deprecated in Django 1.7 and removed in Django 1.9.

Generally, you can design a class that behaves like a dictionary, mainly by implementing the methods __contains__, __getitem__, __delitem__, __setitem__ and a few more. That class can have any behaviour you like, for example providing a sorted iterator over the keys.
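One way to get most of those methods for free is to subclass `collections.abc.MutableMapping`. You implement five methods and inherit `__contains__`, `keys()`, `items()`, `update()` and the rest. The class name `SortedKeyDict` is hypothetical, and sorting on every iteration costs O(n log n), which is fine for small mappings.

```python
from collections.abc import MutableMapping


class SortedKeyDict(MutableMapping):
    """Dict-like class whose iteration always yields keys in sorted order."""

    def __init__(self, *args, **kwargs):
        self._data = {}
        self.update(*args, **kwargs)  # update() is provided by MutableMapping

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(sorted(self._data))  # keys must be mutually comparable

    def __len__(self):
        return len(self._data)

    def __repr__(self):
        return f"{type(self).__name__}({dict(self.items())!r})"


d = SortedKeyDict(b=1, a=2, c=3)
print(list(d))  # ['a', 'b', 'c']
```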

If you would like to have a dictionary in a specific order, you can also create a list of lists, where the first item is the key and the second item is the value.
It will look like this:
example
>>> pairs = [[1, 2], [2, 3]]
>>> for pair in pairs:
...     print(pair[0])
...     print(pair[1])
...
1
2
2
3

You can do what I did to fill a dictionary from an ordered list.
Create a list of key-value pairs and an empty dictionary:
dictionary_items = {}
fields = [['Name', 'Himanshu Kanojiya'], ['email id', 'hima#gmail.com']]
l = fields[0][0]
m = fields[0][1]
n = fields[1][0]
q = fields[1][1]
dictionary_items[l] = m
dictionary_items[n] = q
print(dictionary_items)


Are sets internally sorted, or is the str method displaying a sorted list?

I have a set, I add items (ints) to it, and when I print it, the items apparently are sorted:
a = set()
a.add(3)
a.add(2)
a.add(4)
a.add(1)
a.add(5)
print a
# set([1, 2, 3, 4, 5])
I have tried with various values, apparently it needs to be only ints.
I'm running Python 2.7.5 on Mac OS X. It also reproduces on repl.it (see http://repl.it/TpV).
The question is: is this documented somewhere (I haven't found it so far), is it normal, and can it be relied on?
Extra question: when is the sort done? during the print? is it internally stored sorted? (is that even possible given the expected constant complexity of insertion?)

This is a coincidence. The data is neither sorted nor does __str__ sort.
The hash values for integers equal their value (except for -1 and long integers outside the sys.maxint range), which increases the chance that integers are slotted in order, but that's not a given.
set uses a hash table to track items contained, and ordering depends on the hash value, and insertion and deletion history.
The how and why of the interaction between integers and sets are all implementation details, and can easily vary from version to version. Python 3.3 introduced hash randomisation for certain types, and Python 3.4 expanded on this, making ordering of sets and dictionaries volatile between Python process restarts too (depending on the types of values stored).
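A quick way to convince yourself that the set is not sorted is to pick two small ints that map to slots out of numeric order. This sketch assumes CPython: a new small set has 8 slots, and an int x goes into slot `hash(x) & 7` when that slot is free. Under that assumption 8 lands in slot 0 and 1 in slot 1, so 8 is iterated first. The layout is an implementation detail, so the set-order lines are guarded.

```python
import platform

# Small ints hash to themselves, which is why they often *look* sorted in a set.
assert hash(5) == 5
assert hash(1) == 1 and hash(8) == 8

if platform.python_implementation() == 'CPython':
    # -1 signals an error in CPython's C-level hash API, so hash(-1) is -2.
    assert hash(-1) == -2
    # Iteration follows slot order: 8 (slot 0) before 1 (slot 1),
    # whatever the insertion order.
    print(list({1, 8}))  # [8, 1]
    print(list({8, 1}))  # [8, 1]
    print(sorted({1, 8}))  # [1, 8]
```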
