How to deserialize a Python printed dictionary?

I have Python dictionaries stored as their str representations in a database (as VARCHARs), and I want to retrieve the original Python dictionaries.
How can I get a dictionary back from the str representation of a dictionary?
Example
>>> dic = {u'key-a':u'val-a', "key-b":"val-b"}
>>> dicstr = str(dic)
>>> dicstr
"{'key-b': 'val-b', u'key-a': u'val-a'}"
In the example, the goal would be to turn dicstr back into a usable Python dictionary.

Use ast.literal_eval(), and for such cases prefer repr() over str(), as str() doesn't guarantee that the string can be converted back to a useful object.
In [7]: import ast
In [10]: dic = {u'key-a':u'val-a', "key-b":"val-b"}
In [11]: strs = repr(dic)
In [12]: strs
Out[12]: "{'key-b': 'val-b', u'key-a': u'val-a'}"
In [13]: ast.literal_eval(strs)
Out[13]: {u'key-a': u'val-a', 'key-b': 'val-b'}
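A minimal Python 3 sketch of the same round trip (the dictionary contents are made up for illustration). It also shows that ast.literal_eval refuses input that is not a plain literal, which is why it is preferred over eval for strings read back from a database:

```python
import ast

# Example data (made up for illustration); only literal types are used.
original = {'key-a': 'val-a', 'key-b': 'val-b', 'nested': [1, 2.5, None, True]}

stored = repr(original)              # the text you would save in the VARCHAR column
restored = ast.literal_eval(stored)  # parse it back without executing any code
print(restored == original)          # True

# Anything that is not a plain literal (here, a function call) is rejected.
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError as exc:
    print("rejected:", exc)
```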

You can use eval() or ast.literal_eval(). Most repr() strings can be evaluated back into the original object:
>>> import ast
>>> ast.literal_eval("{'key-b': 'val-b', u'key-a': u'val-a'}")
{'key-b': 'val-b', u'key-a': u'val-a'}
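Applied to the database scenario from the question, a sketch using an in-memory SQLite table might look like this. The table and column names (settings, payload) are hypothetical; any driver that returns the VARCHAR column as a Python str should work the same way:

```python
import ast
import sqlite3

# Hypothetical table holding repr() strings in a VARCHAR column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE settings (id INTEGER PRIMARY KEY, payload VARCHAR(255))")

dic = {'key-a': 'val-a', 'key-b': 'val-b'}
conn.execute("INSERT INTO settings (payload) VALUES (?)", (repr(dic),))

# Read the string back and turn it into a dict again.
(payload,) = conn.execute("SELECT payload FROM settings WHERE id = 1").fetchone()
restored = ast.literal_eval(payload)
print(restored)  # {'key-a': 'val-a', 'key-b': 'val-b'}
conn.close()
```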

ast.literal_eval could be the way to do it for simple dicts, BUT you should probably rethink your design and NOT save such text in the database in the first place. For example:
import ast
import collections
d = {'a': 1, 'b': collections.defaultdict()}
print(ast.literal_eval(repr(d)))
This will not work and throws ValueError('malformed string') (in Python 3 the message starts with 'malformed node or string'), because repr() of a defaultdict is a constructor call such as defaultdict(None, {}), not a literal. Basically, you won't be able to convert the dict back if it contains any non-basic types.
A better way is to dump the dict using pickle, json or something like that, e.g.:
import collections
import json
d = {'a': 1, 'b': collections.defaultdict()}
print(json.loads(json.dumps(d)))  # {'a': 1, 'b': {}}; the defaultdict comes back as a plain dict
Summary: serializing with repr and deserializing with ast.literal_eval is BAD; serializing with json.dumps and deserializing with json.loads is GOOD.
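JSON does not preserve every Python type, though. A short sketch of what changes in a json.dumps/json.loads round trip (the sample dict is made up for illustration): a defaultdict comes back as a plain dict, a tuple as a list, and a non-string key as a string.

```python
import collections
import json

d = {'a': 1, 'b': collections.defaultdict(list), 'c': (1, 2), 3: 'three'}

text = json.dumps(d)
restored = json.loads(text)

print(text)                          # {"a": 1, "b": {}, "c": [1, 2], "3": "three"}
print(restored)                      # {'a': 1, 'b': {}, 'c': [1, 2], '3': 'three'}
print(type(restored['b']).__name__)  # dict, not defaultdict
```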

Related

Converting string with leading-zero integer to json

I convert a string to a json-object using the json-library:
a = '{"index":1}'
import json
json.loads(a)
{'index': 1}
However, if I instead change the string a to contain a leading 0, then it breaks down:
a = '{"index":01}'
import json
json.loads(a)
JSONDecodeError: Expecting ',' delimiter
I believe this is due to the fact that it is invalid JSON if an integer begins with a leading zero as described in this thread.
Is there a way to remedy this? If not, then I guess the best way is to remove any leading zeroes by a regex from the string first, then convert to json?
A leading 0 in a number literal in JSON is invalid unless the number literal is just the character 0 or starts with "0." (as in 0.5). The Python json module is quite strict in that it will not accept such number literals. This is partly because a leading 0 is sometimes used to denote octal notation rather than decimal notation, so deserialising such numbers could lead to unintended programming errors. That is, should 010 be parsed as the number 8 (in octal notation) or as 10 (in decimal notation)?
You can create a decoder that will do what you want, but you will need to heavily hack the json module or rewrite much of its internals. Either way, you will see a performance slowdown, as you will no longer be using the C implementation of the module.
Below is an implementation that can decode JSON which contains numbers with any number of leading zeros.
import json
import json.scanner
import re
import threading
# a more lenient number regex (modified from json.scanner.NUMBER_RE): any run of
# digits is accepted as the integer part, including runs with leading zeros
NUMBER_RE = re.compile(
    r'(-?\d+)(\.\d+)?([eE][-+]?\d+)?',
    (re.VERBOSE | re.MULTILINE | re.DOTALL))
# we are going to be messing with the internals of `json.scanner`. As such we
# want to return it to its initial state when we're done with it, but we need to
# do so in a thread safe way.
_LOCK = threading.Lock()
def thread_safe_py_make_scanner(context, *, number_re=json.scanner.NUMBER_RE):
    with _LOCK:
        original_number_re = json.scanner.NUMBER_RE
        try:
            # py_make_scanner reads the module-level NUMBER_RE when it builds the scanner
            json.scanner.NUMBER_RE = number_re
            return json.scanner._original_py_make_scanner(context)
        finally:
            json.scanner.NUMBER_RE = original_number_re
json.scanner._original_py_make_scanner = json.scanner.py_make_scanner
json.scanner.py_make_scanner = thread_safe_py_make_scanner
class MyJsonDecoder(json.JSONDecoder):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # overwrite the stricter scan_once implementation
        self.scan_once = json.scanner.py_make_scanner(self, number_re=NUMBER_RE)
d = MyJsonDecoder()
n = d.decode('010')
assert n == 10
json.loads('010')  # check that the normal route still raises an error (JSONDecodeError)
I would stress that you shouldn't rely on this as a proper solution. Rather, it's a quick hack to help you decode malformed JSON that is nearly, but not quite, valid. It's useful if recreating the JSON in a valid form is not possible for some reason.
First, using regex on JSON is evil, almost as bad as killing a kitten.
If you want to represent 01 as a valid JSON value, then consider using this structure:
a = '{"index" : "01"}'
import json
json.loads(a)
If you need the string literal 01 to behave like a number, then consider just casting it to an integer in your Python script.
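A sketch of doing that cast while decoding: json.loads accepts an object_hook callable, and the hypothetical helper int_fields below builds one that converts only the named fields, so other zero-padded strings (like the made-up "label" field) stay strings:

```python
import json


def int_fields(*names):
    """Return an object_hook that casts the named fields to int (hypothetical helper)."""
    def hook(obj):
        for name in names:
            if name in obj and isinstance(obj[name], str):
                obj[name] = int(obj[name])  # int("01") == 1
        return obj
    return hook


a = '{"index": "01", "label": "007"}'
data = json.loads(a, object_hook=int_fields("index"))
print(data)  # {'index': 1, 'label': '007'}
```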
How to convert string int JSON into real int with json.loads
Please see the post above
You need to use your own decoder.
More information can be found in the simplejson documentation on GitHub:
https://github.com/simplejson/simplejson/blob/master/index.rst
c = '{"value": 02}'
value= json.loads(json.dumps(c))
print(value)
This seems to work... It is strange.
>>> c = '{"value": 02}'
>>> import json
>>> value = json.loads(json.dumps(c))
>>> print(value)
{"value": 02}
>>> c = '{"value": 0002}'
>>> value = json.loads(json.dumps(c))
>>> print(value)
{"value": 0002}
As @Dunes pointed out, loads just returns the original string here (json.dumps encoded c as a JSON string), so this is not a valid solution.
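A quick Python 3 check of why the round trip only appears to work: json.dumps wraps the whole text in a JSON string literal, so json.loads hands back the same str rather than a dict.

```python
import json

c = '{"value": 02}'
encoded = json.dumps(c)      # c is encoded as a JSON *string* literal
value = json.loads(encoded)  # ...so decoding just gives the same str back

print(encoded)               # "{\"value\": 02}"
print(type(value).__name__)  # str
print(value == c)            # True
```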
However, DEMJSON seems to decode it properly.
https://pypi.org/project/demjson/ -- alternative way
>>> c = '{"value": 02}'
>>> import demjson
>>> demjson.decode(c)
{'value': 2}

Convert tuple-strings to tuple of strings

My Input is:
input = ['(var1, )', '(var2,var3)']
Expected Output is:
output = [('var1', ), ('var2','var3')]
Iterating over input and using eval/literal_eval on the tuple-strings is not possible:
>>> eval('(var1, )')
NameError: name 'var1' is not defined
How can I convert an item such as '(var1, )' to a tuple where the inner objects are treated as strings instead of variables?
Is there a simpler way than writing a parser or using regex?
For each occurrence of a variable, eval searches the symbol table for the name of the variable. It's possible to provide a custom mapping that will return the key name for every missing key:
class FakeNamespace(dict):
    def __missing__(self, key):
        return key
Example:
In [38]: eval('(var1,)', FakeNamespace())
Out[38]: ('var1',)
In [39]: eval('(var2, var3)', FakeNamespace())
Out[39]: ('var2', 'var3')
Note: if the globals dictionary passed to eval doesn't have a __builtins__ key, eval inserts a reference to the built-in module's dictionary under that key. That means that the expression can still reach built-in functions, exceptions and constants. You can try to prevent this by passing FakeNamespace(__builtins__=<None or some other value>) instead of just FakeNamespace(), but it won't make eval 100% safe (Python eval: is it still dangerous if I disable builtins and attribute access?)
Try this:
tuples = [tuple(filter(None, t.strip('()').strip().split(','))) for t in input]
For example:
In [16]: tuples = [tuple(filter(None, t.strip('()').strip().split(','))) for t in input]
In [17]: tuples
Out[17]: [('var1',), ('var2', 'var3')]
We're iterating through the list of tuple strings and, for each one, removing the parentheses, splitting the string into a list on ',', and then converting the list back into a tuple. We use filter() to remove empty elements.
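The one-liner keeps any spaces around the items (e.g. '(var2, var3)' would give ' var3'). A sketch of a slightly more forgiving variant that also strips whitespace around each item; parse_tuple_string is a hypothetical helper name, and the third input is made up to show the whitespace handling:

```python
def parse_tuple_string(s):
    """Parse '(a, b)' into ('a', 'b'), dropping empty items and surrounding spaces."""
    inner = s.strip().strip('()')
    return tuple(item.strip() for item in inner.split(',') if item.strip())


input_strings = ['(var1, )', '(var2,var3)', '( var4 , var5 )']
result = [parse_tuple_string(s) for s in input_strings]
print(result)  # [('var1',), ('var2', 'var3'), ('var4', 'var5')]
```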
I like vaultah's solution. Here's another one with ast.literal_eval and re if eval is not an option:
>>> import re
>>> from ast import literal_eval
>>> [literal_eval(re.sub(r'(?<=\(|,)(\w+)(?=\)|,)', r'"\1"', x)) for x in input]
[('var1',), ('var2', 'var3')]

Convert list to comma-delimited string

The following is the list that I have:
>>> issue_search
[<JIRA Issue: key=u'NEC-1519', id=u'991356'>, <JIRA Issue: key=u'NEC-1516', id=u'991344'>, <JIRA Issue: key=u'NEC-1518', id=u'990463'>]
>>>
I was using the following:
issue_string = ','.join(map(str, issue_search))
But the output is:
NEC-1519, NEC-1516, NEC-1518
I am confused by the output. Why is only the key displayed? How can I get the other text into the string too?
What you see in the list are the values returned by each object's __repr__ method. If you want these values, map the list to repr instead of str:
issue_string = ','.join(map(repr, issue_search))
Below is a demonstration with decimal.Decimal:
>>> from decimal import Decimal
>>> lst = [Decimal('1.2'), Decimal('3.4'), Decimal('5.6')]
>>> lst
[Decimal('1.2'), Decimal('3.4'), Decimal('5.6')]
>>> print ','.join(map(str, lst))
1.2,3.4,5.6
>>> print ','.join(map(repr, lst))
Decimal('1.2'),Decimal('3.4'),Decimal('5.6')
>>>
You are calling str on the objects within issue_search before joining them. So obviously, the call of str on a “JIRA Issue” will only result in the key.
The return value of str is determined by an object’s __str__ method which is likely defined in the described way for the “JIRA Issue” type. If you cannot change the method, you could also call repr on the objects instead, or specify a custom format function:
>>> ', '.join(map(lambda x: '{} ({})'.format(x.key, x.id), issue_search))
'NEC-1519 (991356), NEC-1516 (991344), NEC-1518 (990463)'
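To see both mechanisms side by side, here is a sketch with a hypothetical Issue class standing in for the JIRA issue type (it is not the jira library's real class; its __str__ and __repr__ are written to mimic the behaviour described above):

```python
class Issue:
    """Hypothetical stand-in for a JIRA issue (not the jira library's real class)."""

    def __init__(self, key, issue_id):
        self.key = key
        self.id = issue_id

    def __str__(self):
        # str() / map(str, ...) only shows the key
        return self.key

    def __repr__(self):
        # repr() is what the interactive prompt shows for items inside a list
        return f"<JIRA Issue: key={self.key!r}, id={self.id!r}>"


issues = [Issue('NEC-1519', '991356'), Issue('NEC-1516', '991344')]

by_str = ','.join(map(str, issues))
by_repr = ','.join(map(repr, issues))
custom = ', '.join(f'{x.key} ({x.id})' for x in issues)

print(by_str)   # NEC-1519,NEC-1516
print(by_repr)  # <JIRA Issue: key='NEC-1519', id='991356'>,<JIRA Issue: key='NEC-1516', id='991344'>
print(custom)   # NEC-1519 (991356), NEC-1516 (991344)
```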

How do I format a string using a dictionary in python-3.x?

I am a big fan of using dictionaries to format strings. It helps me read the string format I am using as well as let me take advantage of existing dictionaries. For example:
class MyClass:
    def __init__(self):
        self.title = 'Title'
a = MyClass()
print 'The title is %(title)s' % a.__dict__
path = '/path/to/a/file'
print 'You put your file here: %(path)s' % locals()
However I cannot figure out the python 3.x syntax for doing the same (or if that is even possible). I would like to do the following
# Fails, KeyError 'latitude'
geopoint = {'latitude':41.123,'longitude':71.091}
print '{latitude} {longitude}'.format(geopoint)
# Succeeds
print '{latitude} {longitude}'.format(latitude=41.123,longitude=71.091)
Is this good for you?
geopoint = {'latitude':41.123,'longitude':71.091}
print('{latitude} {longitude}'.format(**geopoint))
To unpack a dictionary into keyword arguments, use **. Also, new-style formatting supports referring to attributes of objects and items of mappings:
'{0[latitude]} {0[longitude]}'.format(geopoint)
'The title is {0.title}'.format(a) # the a from your first example
As Python 3.0 and 3.1 are EOL'ed and no one uses them, you can and should use str.format_map(mapping) (Python 3.2+):
Similar to str.format(**mapping), except that mapping is used directly and not copied to a dict. This is useful if for example mapping is a dict subclass.
What this means is that you can use for example a defaultdict that would set (and return) a default value for keys that are missing:
>>> from collections import defaultdict
>>> vals = defaultdict(lambda: '<unset>', {'bar': 'baz'})
>>> 'foo is {foo} and bar is {bar}'.format_map(vals)
'foo is <unset> and bar is baz'
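As a variation on the defaultdict example, a dict subclass with __missing__ can leave unknown placeholders in place instead of inserting new keys into the mapping. KeepMissing is a hypothetical name for this sketch:

```python
class KeepMissing(dict):
    """dict subclass that leaves unknown placeholders untouched (hypothetical helper)."""

    def __missing__(self, key):
        # Called by dict.__getitem__ for absent keys; nothing is stored.
        return '{' + key + '}'


vals = KeepMissing(bar='baz')
result = 'foo is {foo} and bar is {bar}'.format_map(vals)
print(result)  # foo is {foo} and bar is baz
```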
Even if the mapping provided is a dict, not a subclass, this would probably still be slightly faster.
The difference is not big though, given
>>> d = dict(foo='x', bar='y', baz='z')
then
>>> 'foo is {foo}, bar is {bar} and baz is {baz}'.format_map(d)
is about 10 ns (2 %) faster than
>>> 'foo is {foo}, bar is {bar} and baz is {baz}'.format(**d)
on my Python 3.4.3. The difference would probably be larger as more keys are in the dictionary.
Note that the format language is much more flexible than that, though; replacement fields can contain indexed expressions, attribute accesses and so on, so you can format a whole object, or two of them:
>>> p1 = {'latitude':41.123,'longitude':71.091}
>>> p2 = {'latitude':56.456,'longitude':23.456}
>>> '{0[latitude]} {0[longitude]} - {1[latitude]} {1[longitude]}'.format(p1, p2)
'41.123 71.091 - 56.456 23.456'
Starting from 3.6 you can use the interpolated strings too:
>>> f'lat:{p1["latitude"]} lng:{p1["longitude"]}'
'lat:41.123 lng:71.091'
You just need to remember to use the other quote characters within the nested quotes. Another upside of this approach is that it is much faster than calling a formatting method.
print("{latitude} {longitude}".format(**geopoint))
Since the question is specific to Python 3, here's using the new f-string syntax, available since Python 3.6:
>>> geopoint = {'latitude':41.123,'longitude':71.091}
>>> print(f'{geopoint["latitude"]} {geopoint["longitude"]}')
41.123 71.091
Note the outer single quotes and inner double quotes (you could also do it the other way around).
The Python 2 syntax works in Python 3 as well:
>>> class MyClass:
...     def __init__(self):
...         self.title = 'Title'
...
>>> a = MyClass()
>>> print('The title is %(title)s' % a.__dict__)
The title is Title
>>>
>>> path = '/path/to/a/file'
>>> print('You put your file here: %(path)s' % locals())
You put your file here: /path/to/a/file
geopoint = {'latitude':41.123,'longitude':71.091}
# working examples.
print(f'{geopoint["latitude"]} {geopoint["longitude"]}') # from above answer
print('{geopoint[latitude]} {geopoint[longitude]}'.format(geopoint=geopoint)) # alternate for format method (including dict name in string).
print('%(latitude)s %(longitude)s' % geopoint) # thanks @tcll
Use format_map to do what you want
print('{latitude} {longitude}'.format_map(geopoint))
This has the advantage that
the dictionary does not have to be blown up into parameters (compared to **geopoint) and that
the format string only has access to the provided map and not the entire scope of variables (compared to F-strings).
Most answers formatted only the values of the dict.
If you want to also format the key into the string you can use dict.items():
geopoint = {'latitude':41.123,'longitude':71.091}
print("{} {}".format(*geopoint.items()))
Output:
('latitude', 41.123) ('longitude', 71.091)
If you want to format in an arbitrary way, that is, without showing the key-value pairs as tuples:
from functools import reduce
print("{} is {} and {} is {}".format(*reduce((lambda x, y: x + y), [list(item) for item in geopoint.items()])))
Output:
latitude is 41.123 and longitude is 71.091
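The same output can be produced without reduce. A sketch of two alternatives: joining one formatted piece per pair, or flattening the pairs with itertools.chain.from_iterable for the original template:

```python
import itertools

geopoint = {'latitude': 41.123, 'longitude': 71.091}

# One formatted piece per key-value pair, joined together
joined = ' and '.join(f'{key} is {value}' for key, value in geopoint.items())
print(joined)  # latitude is 41.123 and longitude is 71.091

# Same template as above, flattening the (key, value) pairs without reduce()
flat = "{} is {} and {} is {}".format(*itertools.chain.from_iterable(geopoint.items()))
print(flat)    # latitude is 41.123 and longitude is 71.091
```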

How do I serialize a Python dictionary into a string, and then back to a dictionary?

How do I serialize a Python dictionary into a string, and then back to a dictionary? The dictionary will have lists and other dictionaries inside it.
It depends on what you're wanting to use it for. If you're just trying to save it, you should use pickle (or, if you’re using CPython 2.x, cPickle, which is faster).
>>> import pickle
>>> pickle.dumps({'foo': 'bar'})
b'\x80\x03}q\x00X\x03\x00\x00\x00fooq\x01X\x03\x00\x00\x00barq\x02s.'
>>> pickle.loads(_)
{'foo': 'bar'}
If you want it to be readable, you could use json:
>>> import json
>>> json.dumps({'foo': 'bar'})
'{"foo": "bar"}'
>>> json.loads(_)
{'foo': 'bar'}
json is, however, very limited in what it will support, while pickle can be used for arbitrary objects (if it doesn't work automatically, the class can define __getstate__ to specify precisely how it should be pickled).
>>> pickle.dumps(object())
b'\x80\x03cbuiltins\nobject\nq\x00)\x81q\x01.'
>>> json.dumps(object())
Traceback (most recent call last):
...
TypeError: <object object at 0x7fa0348230c0> is not JSON serializable
Pickle is great but I think it's worth mentioning literal_eval from the ast module for an even lighter weight solution if you're only serializing basic python types. It's basically a "safe" version of the notorious eval function that only allows evaluation of basic python types as opposed to any valid python code.
Example:
>>> d = {}
>>> d[0] = range(10)
>>> d['1'] = {}
>>> d['1'][0] = range(10)
>>> d['1'][1] = 'hello'
>>> data_string = str(d)
>>> print data_string
{0: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], '1': {0: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 1: 'hello'}}
>>> from ast import literal_eval
>>> d == literal_eval(data_string)
True
One benefit is that the serialized data is just python code, so it's very human friendly. Compare it to what you would get with pickle.dumps:
>>> import pickle
>>> print pickle.dumps(d)
(dp0
I0
(lp1
I0
aI1
aI2
aI3
aI4
aI5
aI6
aI7
aI8
aI9
asS'1'
p2
(dp3
I0
(lp4
I0
aI1
aI2
aI3
aI4
aI5
aI6
aI7
aI8
aI9
asI1
S'hello'
p5
ss.
The downside is that as soon as the data includes a type that is not supported by literal_eval, you'll have to transition to something else, like pickling.
Use Python's json module, or simplejson if you don't have python 2.6 or higher.
If you fully trust the string and don't care about code injection attacks, then this is a very simple solution:
d = {'method': "eval", 'safe': False, 'guarantees': None}
s = str(d)
d2 = eval(s)
for k in d2:
    print(k + "=" + str(d2[k]))  # str() needed: False and None can't be concatenated to a str
If you're more safety conscious then ast.literal_eval is a better bet.
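A sketch of the same loop with ast.literal_eval in place of eval; the dict is the one from the example above:

```python
import ast

d = {'method': "eval", 'safe': False, 'guarantees': None}
s = str(d)
d2 = ast.literal_eval(s)  # parses literals only; never executes code
for k in d2:
    print(k + "=" + str(d2[k]))
```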
One thing json cannot do is preserve numeric dict keys. The following snippet
import json
dictionary = dict({0:0, 1:5, 2:10})
serialized = json.dumps(dictionary)
unpacked = json.loads(serialized)
print(unpacked[0])
will throw
KeyError: 0
This is because JSON object keys are always strings, so the numeric keys are converted to strings. cPickle preserves the numeric type, so the unpacked dict can be used right away.
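A sketch of both options for the dictionary above: with json you can rebuild the int keys yourself, assuming you know they were ints, while pickle (which in Python 3 uses the C implementation automatically) keeps them as they were:

```python
import json
import pickle

dictionary = {0: 0, 1: 5, 2: 10}

# JSON turns the int keys into strings...
via_json = json.loads(json.dumps(dictionary))
print(via_json)  # {'0': 0, '1': 5, '2': 10}

# ...so convert them back explicitly (assumes every key was an int)
restored = {int(k): v for k, v in via_json.items()}

# pickle keeps the original key types
via_pickle = pickle.loads(pickle.dumps(dictionary))
print(via_pickle[0])  # 0
```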
pyyaml should also be mentioned here. It is both human readable and can serialize any python object.
pyyaml is hosted here:
https://pypi.org/project/PyYAML
While not strictly serialization, json may be a reasonable approach here. It will handle nested dicts and lists, as long as your data is "simple": strings and basic numeric types.
A new alternative to JSON or YaML is NestedText. It supports strings that are nested in lists and dictionaries to any depth. It conveys nesting through the use of indenting, and so has no need for either quoting or escaping. As such, the result tends to be very readable. The result looks like YaML, but without all the special cases. It is especially appropriate for serializing code snippets. For example, here is a single test case extracted from a much larger set that was serialized with NestedText:
base tests:
    -
        args: --quiet --config test7 files -N configs/subdir
        expected:
            > Archive: test7-\d\d\d\d-\d\d-\d\dT\d\d:\d\d:\d\d
            > «TESTS»/configs/subdir/
            > «TESTS»/configs/subdir/file
Be aware that integers, floats, and bools are converted to strings.
If you only need to serialize, then pprint may also be a good option. It takes the object to be serialized and a file stream.
Here's some code:
from pprint import pprint
my_dict = {1: 'a', 2: 'b'}
with open('test_results.txt', 'w') as f:  # text mode: pprint writes str
    pprint(my_dict, f)
I am not sure whether it can be deserialized easily. I was using json to serialize and deserialize earlier, which works correctly in most cases.
f.write(json.dumps(my_dict, sort_keys = True, indent = 2, ensure_ascii=True))
However, in one particular case, there were some errors writing non-unicode data to json.
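For the pprint approach above, when the dict contains only basic literal types, the pprint output is itself a valid Python literal, so one way to read it back is ast.literal_eval. A sketch, using pprint.pformat to get the same text as a string instead of writing a file (the nested list is made up so the output wraps over several lines):

```python
import ast
from pprint import pformat

my_dict = {1: 'a', 2: 'b', 'nested': {'xs': list(range(30))}}

text = pformat(my_dict)            # the same text pprint would write to the file
restored = ast.literal_eval(text)  # works because the text is a plain literal
print(restored == my_dict)         # True
```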
