Google said: primitive data types include Integers, Float, Strings and Boolean, the non-primitive data types are Array, List, Tuples, Dictionary, Sets and Files
But Google also said that Python don't have such thing called Primative or Non primative. I'm quited confusing right now.
When I run the code help(int), help(str), help(list), i found out that every data types in python have some build in methods in it, which looks like an object. Does it means that all data types in Python are non primative?
Imagine the following problem: you have a dictionary of some content in python and want to generate python code that would create this dict. (which is like eval but in reverse)
Is there something that can do this?
Scenario:
I am working with a remote python interpreter. I can give source files to it but no input. So I am now looking for a way to encode my input data into a python source file.
Example:
d = {'a': [1,4,7]}
str_d = reverse_eval(d)
# "{'a': [1, 4, 7]}"
eval(str_d) == d
repr(thing)
will output text that when executed will (in most cases) reproduce the dictionary.
Actually, it's important for which types of data do you want this reverse function to exist. If you're talking about built-in/standard classes, usually their .__repr__() method returns the code you want to access. But if your goal is to save something in a human-readable format, but to use an eval-like function to use this data in python, there is a json library.
It's better to use json for this reason because using eval is not safe.
Json's problem is that it can't save any type of data, it can save only standard objects, but if we're talking about not built-in types of data, you never know, what is at their .__repr__(), so there's no way to use repr-eval with this kind of data
So, there is no reverse function for all types of data, you can use repr-eval for built-in, but for built-in data the json library is better at least because it's safe
I have a dictionary whose keys are tuples like (int, str, int, str, int), and the corresponding values are lists of floats of the same size.
I pickled the dictionary twice by the same script:
import pickle
with open(name, 'wb') as source:
pickle.dump(the_dict, source)
For the two resulting binary files test_1 and test_2, I run
diff test_1 test_2
in a terminal (I'm using macOS) to see whether I can use diff to tell the difference. However, I received
Binary files test_1 and test_2 differ
Why? Was the same dictionary being pickled in different ways? Does it mean I cannot use diff to tell whether two dictionaries are identical?
Depending on what version of Python you are using, Python versions before v3.6 do not remember the order of insertion. Python v3.6 made this an implementation detail and v3.7 made it a language feature.
For backwards compatibility, you shouldn't depend on the dictionary remembering the order of inserted keys. Instead, you can use OrderedDict from the Collections module.
Also, using diff on pickled dict data may show differences in the data even though the actual dictionaries are equivalent -- since dicts, unlike lists, generally make no assurances on order state (see above for when that is not the case).
Can I expect the string representation of the same pickled dict to be consistent across different machines/runs for the same Python version?
In the scope of one run on the same machine?
e.g.
# Python 2.7
import pickle
initial = pickle.dumps({'a': 1, 'b': 2})
for _ in xrange(1000**2):
assert pickle.dumps({'a': 1, 'b': 2}) == initial
Does it depend on the actual structure of my dict object (nested values etc.)?
UPD:
The thing is - I can't actually make the code above fail in the scope of one run (Python 2.7) no matter how my dict object looks like (what keys/values etc.)
You can't in the general case, for the same reasons you can't rely on the dictionary order in other scenarios; pickling is not special here. The string representation of a dictionary is a function of the current dictionary iteration order, regardless of how you loaded it.
Your own small test is too limited, because it doesn't do any mutation of the test dictionary and doesn't use keys that would cause collisions. You create dictionaries with the exact same Python source code, so those will produce the same output order because the editing history of the dictionaries is exactly the same, and two single-character keys that use consecutive letters from the ASCII character set are not likely to cause a collision.
Not that you actually test string representations being equal, you only test if their contents are the same (two dictionaries that differ in string representation can still be equal because the same key-value pairs, subjected to a different insertion order, can produce different dictionary output order).
Next, the most important factor in the dictionary iteration order before cPython 3.6 is the hash key generation function, which must be stable during a single Python executable lifetime (or otherwise you'd break all dictionaries), so a single-process test would never see dictionary order change on the basis of different hash function results.
Currently, all pickling protocol revisions store the data for a dictionary as a stream of key-value pairs; on loading the stream is decoded and key-value pairs are assigned back to the dictionary in the on-disk order, so the insertion order is at least stable from that perspective. BUT between different Python versions, machine architectures and local configuration, the hash function results absolutely will differ:
The PYTHONHASHSEED environment variable, is used in the generation of hashes for str, bytes and datetime keys. The setting is available as of Python 2.6.8 and 3.2.3, and is enabled and set to random by default as of Python 3.3. So the setting varies from Python version to Python version, and can be set to something different locally.
The hash function produces a ssize_t integer, a platform-dependent signed integer type, so different architectures can produce different hashes just because they use a larger or smaller ssize_t type definition.
With different hash function output from machine to machine and from Python run to Python run, you will see different string representations of a dictionary.
And finally, as of cPython 3.6, the implementation of the dict type changed to a more compact format that also happens to preserve insertion order. As of Python 3.7, the language specification has changed to make this behaviour mandatory, so other Python implementations have to implement the same semantics. So pickling and unpickling between different Python implementations or versions predating Python 3.7 can also result in a different dictionary output order, even with all other factors equal.
No, you cannot. This depends on lot of things, including key values, interpreter state and python version.
If you need consistent representation, consider using JSON with canonical form.
EDIT
I'm not quite sure why people downvoting this without any comments, but I'll clarify.
pickle is not meant to produce reliable representations, its pure machine-(not human-) readable serializer.
Python version backward/forward compatibility is a thing, but it applies only for ability to deserialize identic object inside interpreter — i.e. when you dump in one version and load in another, it's guaranteed to have have same behaviour of same public interfaces. Neither serialized text representation or internal memory structure claimed to be the same (and IIRC, it never did).
Easiest way to check this is to dump same data in versions with significant difference in structure handling and/or seed handling while keeping your keys out of cached range (no short integers nor strings):
Python 3.5.6 (default, Oct 26 2018, 11:00:52)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> d = {'first_string_key': 1, 'second_key_string': 2}
>>> pickle.dump
>>> pickle.dumps(d)
b'\x80\x03}q\x00(X\x11\x00\x00\x00second_key_stringq\x01K\x02X\x10\x00\x00\x00first_string_keyq\x02K\x01u.'
Python 3.6.7 (default, Oct 26 2018, 11:02:59)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> d = {'first_string_key': 1, 'second_key_string': 2}
>>> pickle.dumps(d)
b'\x80\x03}q\x00(X\x10\x00\x00\x00first_string_keyq\x01K\x01X\x11\x00\x00\x00second_key_stringq\x02K\x02u.'
Python2 dictinaries are unordered; the order depends on the hash values of keys as explained in this great answer by Martijn Pieters. I don't think you can use a dict here, but you could use an OrderedDict (requires Python 2.7 or higher) which maintains the order of the keys. For example,
from collections import OrderedDict
data = [('b', 0), ('a', 0)]
d = dict(data)
od = OrderedDict(data)
print(d)
print(od)
#{'a': 0, 'b': 0}
#OrderedDict([('b', 0), ('a', 0)])
You can pickle an OrderedDict like you would pickle a dict, but order would be preserved, and the resulting string would be the same when pickling same objects.
from collections import OrderedDict
import pickle
data = [('a', 1), ('b', 2)]
od = OrderedDict(data)
s = pickle.dumps(od)
print(s)
Note that you shouldn't pass a dict in OrderedDict's constructor as the keys would be already placed. If you have a dictionary, you should first convert it to tuples with the desired order. OrderedDict is a subclass of dict and has all the dict methods, so you could create an empty object and assign new keys.
Your test doesn't fail because you're using the same Python version and the same conditions - the order of the dictionary will not change randomly between loop iterations. But we can demonstrate how your code fails to produce differend strings when we change the order of keys in the dictionary.
import pickle
initial = pickle.dumps({'a': 1, 'b': 2})
assert pickle.dumps({'b': 2, 'a': 1}) != initial
The resulting string should be different when we put key 'b' first (it would be different in Python >= 3.6), but in Python2 it's the same because key 'a' is placed before key 'b'.
To answer your main question, Python2 dictionaries are unordered, but a dictionary is likely to have the same order when using the same code and Python version. However that order may not be the same as the order in which you placed the items in the dictionary. If the order is important it's best to use an OrderedDict or update your Python version.
As with a frustratingly large number of things in Python, the answer is "sort of". Straight from the docs,
The pickle serialization format is guaranteed to be backwards compatible across Python releases.
That's potentially ever so subtly different from what you're asking. If it's a valid pickled dictionary now, it'll always be a valid pickled dictionary, and it'll always deserialize to the correct dictionary. That leaves unspoken a few properties which you might expect and which don't have to hold:
Pickling doesn't have to be deterministic, even for the same object in the same Python instance on the same platform. The same dictionary could have infinitely many possible pickled representations (not that we would expect the format to ever be inefficient enough to support arbitrarily large degrees of extra padding). As the other answers point out, dictionaries don't have a defined sort order, and this can give at least n! string representations of a dictionary with n elements.
Going further with the last point, it isn't guaranteed that pickle is consistent even in a single Python instance. In practice those changes don't currently happen, but that behavior isn't guaranteed to remain in future versions of Python.
Future versions of Python don't need to serialize dictionaries in a way which is compatible with current versions. The only promise we have is that they will be able to correctly deserialize our dictionaries. Currently dictionaries are supported the same in all Pickle formats, but that need not remain the case forever (not that I suspect it would ever change).
If you don't modify the dict its string representation won't change during a given run of the program, and its .keys method will return the keys in the same order. However, the order can change from run to run (before Python 3.6).
Also, two different dict objects that have identical key-value pairs are not guaranteed to use the same order (pre Python 3.6).
BTW, it's not a good idea to shadow a module name with your own variables, like you do with that lambda. It makes the code harder to read, and will lead to confusing error messages if you forget that you shadowed the module & try to access some other name from it later in the program.
I'm working on a Mpris V2.1 interface with python.
The interfaces are described in the document:
http://www.mpris.org/2.1/spec/Playlists.html#Property:ActivePlaylist
The signature shows it's complex type contains boolean, object and strings. I just wonder how to represent the type in python. Do I have a provider a list or tuple contains each element ? I've tested it but seems not work.
According to D-Bus specification, (b(oss)) is a struct of two elements, first is a boolean, second is a struct of three elements: an object path and two strings. In python this maps to something like:
dbus.Struct((dbus.Boolean(a_boolean),
dbus.Struct((dbus.ObjectPath(s1),
dbus.String(s2),
dbus.String(s3)))),
signature="(b(oss))")
but it can be used as if it was simply a python tuple like:
( a_boolean, (s1, s2, s3) )
Are you writing a client or a server? In the latter case you should also check this question which provides details on exporting properties using python dbus module.