Weird TypeError from json.dumps - python

In Python 3.4.0, json.dumps() throws a TypeError in one case but works like a charm in another case that I think is equivalent to the first.
I have a dict whose keys are strings and whose values are numbers and other dicts (i.e. something like {'x': 1.234, 'y': -5.678, 'z': {'a': 4, 'b': 0, 'c': -6}}).
This fails (the stack trace is not from this particular snippet but from my larger script, which I won't paste here, but it is essentially the same):
>>> x = dict(foo()) # obtain the data and make a new dict of it to really be sure
>>> import json
>>> json.dumps(x)
Traceback (most recent call last):
File "/mnt/data/gandalv/progs/pycharm-3.4/helpers/pydev/pydevd.py", line 1733, in <module>
debugger.run(setup['file'], None, None)
File "/mnt/data/gandalv/progs/pycharm-3.4/helpers/pydev/pydevd.py", line 1226, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/mnt/data/gandalv/progs/pycharm-3.4/helpers/pydev/_pydev_execfile.py", line 38, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc) #execute the script
File "/mnt/data/gandalv/School/PhD/Other work/Krachy/code/recalculate.py", line 54, in <module>
ls[1] = json.dumps(f)
File "/usr/lib/python3.4/json/__init__.py", line 230, in dumps
return _default_encoder.encode(obj)
File "/usr/lib/python3.4/json/encoder.py", line 192, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/usr/lib/python3.4/json/encoder.py", line 250, in iterencode
return _iterencode(o, 0)
File "/usr/lib/python3.4/json/encoder.py", line 173, in default
raise TypeError(repr(o) + " is not JSON serializable")
TypeError: 306 is not JSON serializable
The 306 is one of the values in one of the inner dicts in x. It is not always the same number; sometimes a different number from the dict is reported, apparently because dicts are unordered.
However, this works like a charm:
>>> x = foo() # obtain the data
>>> import ast
>>> import json
>>> x2 = ast.literal_eval(repr(x))
>>> x == x2
True
>>> json.dumps(x2)
"{...}" # the json representation of dict as it should be
Could anyone please tell me why this happens or what the cause could be? The most confusing part is that the two dicts (the original one and the one obtained by evaluating the repr of the original) are equal, yet dumps() behaves differently for each of them.

The cause was that the numbers inside the dict were not ordinary Python ints but numpy.int64s, which are apparently not supported by the json encoder.
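One way to serialize such values without converting the whole structure first is the default hook of json.dumps, which the encoder calls for any object it cannot handle itself. The following is a minimal sketch under stated assumptions, not the only possible fix: to_builtin and FakeInt64 are hypothetical names, and FakeInt64 is only a standard-library stand-in for numpy.int64 so that the example runs without NumPy. It relies on NumPy registering its integer scalar types with numbers.Integral, so the same hook should apply to real numpy.int64 values as well.

```python
import json
import numbers


class FakeInt64:
    """Hypothetical stand-in for numpy.int64: equal to an int, but not an int."""

    def __init__(self, value):
        self.value = value

    def __int__(self):
        return self.value

    def __eq__(self, other):
        return self.value == other

    def __repr__(self):
        return repr(self.value)


# NumPy performs the equivalent registration for its own integer scalar types.
numbers.Integral.register(FakeInt64)


def to_builtin(obj):
    """Called by json.dumps only for objects the encoder cannot handle itself."""
    if isinstance(obj, numbers.Integral):
        return int(obj)
    if isinstance(obj, numbers.Real):
        return float(obj)
    raise TypeError(repr(obj) + " is not JSON serializable")


data = {'x': 1.234, 'z': {'a': FakeInt64(306), 'b': 0}}

# The dict compares equal to one holding plain ints, yet json.dumps(data)
# without the hook would raise TypeError because of the value's type.
same_as_plain = data == {'x': 1.234, 'z': {'a': 306, 'b': 0}}

encoded = json.dumps(data, default=to_builtin)
decoded = json.loads(encoded)
```

This also fits the ast.literal_eval(repr(x)) observation in the question: the values print like plain integers there, so re-parsing the repr yields ordinary ints, and the two dicts compare equal although their value types differ.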

As you have seen, numpy int64 data types are not serializable into json directly:
>>> import numpy as np
>>> import json
>>> a=np.zeros(3, dtype=np.int64)
>>> a[0]=-9223372036854775808
>>> a[2]=9223372036854775807
>>> jstr=json.dumps(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/Cellar/python3/3.4.1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/json/__init__.py", line 230, in dumps
return _default_encoder.encode(obj)
File "/usr/local/Cellar/python3/3.4.1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/json/encoder.py", line 192, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/usr/local/Cellar/python3/3.4.1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/json/encoder.py", line 250, in iterencode
return _iterencode(o, 0)
File "/usr/local/Cellar/python3/3.4.1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/json/encoder.py", line 173, in default
raise TypeError(repr(o) + " is not JSON serializable")
TypeError: array([-9223372036854775808, 0, 9223372036854775807]) is not JSON serializable
However, Python integers -- including longer integers -- can be serialized and deserialized:
>>> json.loads(json.dumps(2**123))==2**123
True
So with numpy, you can convert directly to Python data structures then serialize:
>>> jstr=json.dumps(a.tolist())
>>> b=np.array(json.loads(jstr))
>>> np.array_equal(a,b)
True

Related

How to make the converters keyword of genfromtxt (numpy) work in Python 3?

From the Numpy User Guide I take the following example that uses the converters keyword to format the data
from io import BytesIO
convertfunc = lambda x: float(x.strip("%"))/100
data = "1, 2.3%, 45.\n6, 78.9%, 0."
names = ('i', 'p', 'n')
a = np.genfromtxt(BytesIO(data.encode()), delimiter = ',', names = names, converters = {1 : convertfunc})
print(a)
However, this does not work in Python 3. I get an error message
Traceback (most recent call last):
File "/Users/MacBookPro/Documents/workspace/python3/learnnumpy/importingdata.py", line 46, in <module>
a = np.genfromtxt(BytesIO(data.encode()), delimiter = ',', names = names, converters = {1 : convertfunc})
File "/Users/MacBookPro/anaconda/lib/python3.4/site-packages/numpy/lib/npyio.py", line 1708, in genfromtxt
for (i, conv) in enumerate(converters)]))
File "/Users/MacBookPro/anaconda/lib/python3.4/site-packages/numpy/lib/npyio.py", line 1708, in <listcomp>
for (i, conv) in enumerate(converters)]))
File "/Users/MacBookPro/anaconda/lib/python3.4/site-packages/numpy/lib/npyio.py", line 1707, in <listcomp>
zip(*[[conv._loose_call(_r) for _r in map(itemgetter(i), rows)]
File "/Users/MacBookPro/anaconda/lib/python3.4/site-packages/numpy/lib/_iotools.py", line 668, in _loose_call
return self.func(value)
File "/Users/MacBookPro/Documents/workspace/python3/learnnumpy/importingdata.py", line 43, in <lambda>
convertfunc = lambda x: float(x.strip("%"))/100
TypeError: 'str' does not support the buffer interface
How to make this work and as a matter of fact, why does it fail exactly?
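You need to change the string literal in your convertfunc to a byte string by changing "%" to b"%" or "%".encode().
The reason is that in Python 3 strings are Unicode by default, whereas in Python 2 strings are bytes by default. Here the converter receives bytes, and bytes.strip() does not accept a str argument, which is what the TypeError is complaining about.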
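If the converter should not care which type it is handed, it can decode bytes before doing any string work. The sketch below is only an illustration: percent_to_fraction is a hypothetical name, and whether a converter receives bytes or str may depend on the NumPy version and on how the data is read, so treat that as an assumption to check against your installation.

```python
def percent_to_fraction(field):
    """Convert a field such as b' 2.3%' or ' 2.3%' to a fraction (about 0.023)."""
    if isinstance(field, bytes):
        # Decode first so that the str methods below receive str arguments.
        field = field.decode('ascii')
    return float(field.strip().rstrip('%')) / 100


# The original failure in miniature: bytes.strip() rejects a str argument.
try:
    b" 2.3%".strip("%")
    mixing_failed = False
except TypeError:
    mixing_failed = True

from_bytes = percent_to_fraction(b" 2.3%")
from_text = percent_to_fraction(" 78.9%")
```

It would be passed the same way as the lambda in the question, i.e. converters={1: percent_to_fraction}.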

convert date from numpyarray into datetime type -> getting mystic error

I load a file f with the numpy.loadtxt function and wanted to extract some dates.
The date has a format like this: 15.08. - 21.08.2011
numpyarr = loadtxt(f, dtype=str, delimiter=";", skiprows = 1)
alldate = numpyarr[:,0][0]
dat = datetime.datetime.strptime(alldate,"%d.%m. - %d.%m.%Y")
And here is the whole error:
Traceback (most recent call last):
File "C:\PYTHON\Test DATETIME_2.py", line 52, in <module>
dat = datetime.datetime.strptime(alldate,"%d.%m. - %d.%m.%Y")
File "C:\Python27\lib\_strptime.py", line 308, in _strptime
format_regex = _TimeRE_cache.compile(format)
File "C:\Python27\lib\_strptime.py", line 265, in compile
return re_compile(self.pattern(format), IGNORECASE)
File "C:\Python27\lib\re.py", line 190, in compile
return _compile(pattern, flags)
File "C:\Python27\lib\re.py", line 242, in _compile
raise error, v # invalid expression
sre_constants.error: redefinition of group name 'd' as group 3; was group 1
Does somebody have an idea what is going on?
A datetime holds a single date and time, while your field contains two dates and your format string tries to extract them into a single variable. Specifically, the error you're getting is because you've used %d and %m twice.
You can try something along the lines of:
a = datetime.datetime.strptime(alldate.split('-')[0],"%d.%m. ")
b = datetime.datetime.strptime(alldate.split('-')[1]," %d.%m.%Y")
a = datetime.datetime(b.year, a.month, a.day)
(it's not the best code, but it demonstrates the fact that there are two dates in two different formats hiding in your string).
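The same idea can be wrapped in a small helper that returns both dates. This is a sketch, not a general parser: parse_date_range is a hypothetical name, and it assumes the exact "DD.MM. - DD.MM.YYYY" layout from the question with both dates in the same year.

```python
from datetime import datetime


def parse_date_range(text):
    """Parse '15.08. - 21.08.2011' into a (start, end) pair of datetimes."""
    start_raw, end_raw = (part.strip() for part in text.split('-'))
    end = datetime.strptime(end_raw, "%d.%m.%Y")
    # Assumption: the start date lies in the same year as the end date,
    # so the year is appended before parsing the start.
    start = datetime.strptime(start_raw + str(end.year), "%d.%m.%Y")
    return start, end


start, end = parse_date_range("15.08. - 21.08.2011")
```

Appending the year before parsing, rather than parsing "%d.%m." alone, avoids the default year 1900, which matters for a start date such as 29.02.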

How to parse JSON files with double-quotes inside strings in Python?

I'm trying to read a JSON file in Python. Some of the lines have strings with double quotes inside:
{"Height__c": "8' 0\"", "Width__c": "2' 8\""}
Using a raw string literal produces the right output:
json.loads(r"""{"Height__c": "8' 0\"", "Width__c": "2' 8\""}""")
{u'Width__c': u'2\' 8"', u'Height__c': u'8\' 0"'}
But my string comes from a file, ie:
s = f.readline()
Where:
>>> print repr(s)
'{"Height__c": "8\' 0"", "Width__c": "2\' 8""}'
And json throws the following exception:
json.loads(s) # s = """{"Height__c": "8' 0\"", "Width__c": "2' 8\""}"""
ValueError: Expecting ',' delimiter: line 1 column 21 (char 20)
Also,
>>> s = """{"Height__c": "8' 0\"", "Width__c": "2' 8\""}"""
>>> json.loads(s)
Fails, but assigning the raw literal works:
>>> s = r"""{"Height__c": "8' 0\"", "Width__c": "2' 8\""}"""
>>> json.loads(s)
{u'Width__c': u'2\' 8"', u'Height__c': u'8\' 0"'}
Do I need to write a custom Decoder?
The data file you have does not escape the nested quotes correctly; this can be hard to repair.
If the nested quotes follow a pattern, e.g. they always follow a digit and are the last character in each string, you can use a regular expression to fix them up. Given your sample data, if all you have is measurements in feet and inches, that's certainly doable:
import re
from functools import partial
repair_nested = partial(re.compile(r'(\d)""').sub, r'\1\\""')
json.loads(repair_nested(s))
Demo:
>>> import json
>>> import re
>>> from functools import partial
>>> s = '{"Height__c": "8\' 0"", "Width__c": "2\' 8""}'
>>> repair_nested = partial(re.compile(r'(\d)""').sub, r'\1\\""')
>>> json.loads(s)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/json/decoder.py", line 365, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/json/decoder.py", line 381, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Expecting , delimiter: line 1 column 21 (char 20)
>>> json.loads(repair_nested(s))
{u'Width__c': u'2\' 8"', u'Height__c': u'8\' 0"'}
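The difference between the raw and the non-raw literal in the question comes down to who consumes the backslash. The sketch below (the variable names are illustrative only) shows that Python itself turns \" into a bare quote in the non-raw literal, so the JSON parser never sees the escape, and that the regular expression from the answer puts it back.

```python
import json
import re

raw = r"""{"Height__c": "8' 0\"", "Width__c": "2' 8\""}"""
cooked = """{"Height__c": "8' 0\"", "Width__c": "2' 8\""}"""

# Python consumes each backslash in the non-raw literal, so two characters are lost.
lost = len(raw) - len(cooked)

try:
    json.loads(cooked)
    cooked_is_valid = True
except ValueError:
    cooked_is_valid = False

# Re-insert the backslash before a quote that follows a digit
# and is itself followed by the closing quote of the string.
repaired = re.sub(r'(\d)""', r'\1\\""', cooked)
parsed = json.loads(repaired)
```

The string read from the file looks like cooked, which is why it needs repairing before json.loads() can accept it.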

Is there a way to implement **kwargs behavior when calling a Python script from the command line

Say I have a function as follows:
def foo(**kwargs):
    print kwargs
When I call the function like this, I get this handy little dict of all kwargs:
>>> foo(a = 5, b = 7)
{'a': 5, 'b': 7}
I want to do this directly to scripts I call from command line. So entering this:
python script.py a = 5 b = 7
Would create a similar dict to the example above. Can this be done?
Here's what I have so far:
import sys
kwargs_raw = sys.argv[1:]
kwargs = {key:val for key, val in zip(kwargs_raw[::3], kwargs_raw[1::3])}
print kwargs
And here's what this produces:
Y:\...\Python>python test.py a = 5 b = 7
{'a': '5', 'b': '7'}
So you may be wondering why this isn't good enough:
It's very structured, and thus won't work if a or b are anything other than strings, ints, or floats.
I have no way of determining whether the user intended 5 to be an int, string, or float.
I've seen ast.literal_eval() around here before, but I couldn't figure out how to get that to work. Both my attempts failed:
>>> ast.literal_eval("a = 5")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "Y:\admin\Anaconda\lib\ast.py", line 49, in literal_eval
node_or_string = parse(node_or_string, mode='eval')
File "Y:\admin\Anaconda\lib\ast.py", line 37, in parse
return compile(source, filename, mode, PyCF_ONLY_AST)
File "<unknown>", line 1
a = 5
and
>>> ast.literal_eval("{a:5,b:7}")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "Y:\admin\Anaconda\lib\ast.py", line 80, in literal_eval
return _convert(node_or_string)
File "Y:\admin\Anaconda\lib\ast.py", line 63, in _convert
in zip(node.keys, node.values))
File "Y:\admin\Anaconda\lib\ast.py", line 62, in <genexpr>
return dict((_convert(k), _convert(v)) for k, v
File "Y:\admin\Anaconda\lib\ast.py", line 79, in _convert
raise ValueError('malformed string')
ValueError: malformed string
If it matters, I'm using Python 2.7.6 32-bit on Windows 7 64-bit. Thanks in advance
It seems what you're really looking for is a way to parse command-line arguments. Take a look at the argparse module: http://docs.python.org/2/library/argparse.html#module-argparse
Alternately, if you really want to give your arguments in dictionary-ish form, just use the json module:
import json, sys
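# Run your program as:
# python my_prog.py '{"foo": 1, "bar": 2}'
# (JSON requires double quotes around keys and strings, so wrap the whole
# argument in single quotes; in the Windows cmd shell, write
# "{\"foo\": 1, \"bar\": 2}" instead)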
data = json.loads(sys.argv[1])
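For the key/value style from the question, ast.literal_eval() can be applied to each value rather than to the whole argument. Both attempts in the question fail for reasons unrelated to the values: "a = 5" is a statement rather than an expression, and in "{a:5,b:7}" the keys a and b are bare names, not literals. The sketch below is one possible approach, with parse_kwargs as a hypothetical helper, and it assumes the arguments are written as key=value without spaces around the equals sign.

```python
import ast


def parse_kwargs(args):
    """Turn ['a=5', 'name=bob'] into {'a': 5, 'name': 'bob'}."""
    kwargs = {}
    for arg in args:
        key, sep, raw_value = arg.partition('=')
        if not sep or not key:
            raise ValueError("expected key=value, got %r" % arg)
        try:
            # Literals such as 5, 7.5 or [1, 2] become Python objects.
            value = ast.literal_eval(raw_value)
        except (ValueError, SyntaxError):
            # Anything that is not a literal (e.g. bob) stays a plain string.
            value = raw_value
        kwargs[key] = value
    return kwargs


example = parse_kwargs(['a=5', 'b=7.5', 'name=bob', 'items=[1, 2]'])
```

In a script this would be called as parse_kwargs(sys.argv[1:]) and run as python script.py a=5 b=7. A user who wants the string "5" rather than the int can quote it so that the quotes reach Python.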

Mongoengine - using icontains with all

I have seen this question but it does not answer my question, or even pose it very well.
I think that this is best explained with an example:
class Blah(Document):
    someList = ListField(StringField())
Blah.drop_collection()
Blah(someList=['lop', 'glob', 'hat']).save()
Blah(someList=['hello', 'kitty']).save()
# One of these should match the first entry
print(Blah.objects(someList__icontains__all=['Lo']).count())
print(Blah.objects(someList__all__icontains=['Lo']).count())
I assumed that this would print either 1, 0 or 0, 1 (or miraculously 1, 1) but instead it gives
0
Traceback (most recent call last):
File "metst.py", line 14, in <module>
print(Blah.objects(someList__all__icontains=['lO']).count())
File "/home/blah/.pythonbrew/pythons/Python-3.1.4/lib/python3.1/site-packages/mongoengine/queryset.py", line 1034, in count
return self._cursor.count(with_limit_and_skip=True)
File "/home/blah/.pythonbrew/pythons/Python-3.1.4/lib/python3.1/site-packages/mongoengine/queryset.py", line 608, in _cursor
self._cursor_obj = self._collection.find(self._query,
File "/home/blah/.pythonbrew/pythons/Python-3.1.4/lib/python3.1/site-packages/mongoengine/queryset.py", line 390, in _query
self._mongo_query = self._query_obj.to_query(self._document)
File "/home/blah/.pythonbrew/pythons/Python-3.1.4/lib/python3.1/site-packages/mongoengine/queryset.py", line 213, in to_query
query = query.accept(QueryCompilerVisitor(document))
File "/home/blah/.pythonbrew/pythons/Python-3.1.4/lib/python3.1/site-packages/mongoengine/queryset.py", line 278, in accept
return visitor.visit_query(self)
File "/home/blah/.pythonbrew/pythons/Python-3.1.4/lib/python3.1/site-packages/mongoengine/queryset.py", line 170, in visit_query
return QuerySet._transform_query(self.document, **query.query)
File "/home/blah/.pythonbrew/pythons/Python-3.1.4/lib/python3.1/site-packages/mongoengine/queryset.py", line 755, in _transform_query
value = field.prepare_query_value(op, value)
File "/home/blah/.pythonbrew/pythons/Python-3.1.4/lib/python3.1/site-packages/mongoengine/fields.py", line 594, in prepare_query_value
return self.field.prepare_query_value(op, value)
File "/home/blah/.pythonbrew/pythons/Python-3.1.4/lib/python3.1/site-packages/mongoengine/fields.py", line 95, in prepare_query_value
value = re.escape(value)
File "/home/blah/.pythonbrew/pythons/Python-3.1.4/lib/python3.1/re.py", line 246, in escape
return bytes(s)
TypeError: 'str' object cannot be interpreted as an integer
Neither query works!
Does MongoEngine support some way to search using icontains and all? Or some way to get around this?
Note: I want to use MongoEngine, not PyMongo.
Edit: The same issue exists with Python 2.7.3.
The only way to do this as of now (version 0.8.0) is to use a __raw__ query, possibly combined with re.compile(), like so:
import re
input_list = ['Lo']
converted_list = [re.compile(q, re.I) for q in input_list]
print(Blah.objects(__raw__={"someList": {"$all": converted_list}}).count())
There is currently no way in MongoEngine to combine all and icontains; the only operator that can be combined with other operators is not. The docs mention this only subtly:
not – negate a standard check, may be used before other operators (e.g. Q(age__not__mod=5))
emphasis mine
The docs do not say that other operators cannot be combined this way, but that is actually the case.
You can confirm this behavior by looking at the source:
version 0.8.0+ (in module - mongoengine/queryset/transform.py - lines 42-48):
if parts[-1] in MATCH_OPERATORS:
    op = parts.pop()
negate = False
if parts[-1] == 'not':
    parts.pop()
    negate = True
In older versions the above lines can be seen in mongoengine/queryset.py within the _transform_query method.
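When the search terms come from user input, it may be worth escaping them before compiling so that characters such as . or * are matched literally. The sketch below builds such a pattern list and imitates in pure Python the matching that an $all query with regular expressions is assumed to perform (every pattern must match at least one list element). icontains_patterns and matches_all are hypothetical helpers, not part of MongoEngine, and the imitation is only there so the example runs without a database.

```python
import re


def icontains_patterns(terms):
    """Build case-insensitive 'contains' regexes; re.escape keeps input literal."""
    return [re.compile(re.escape(term), re.IGNORECASE) for term in terms]


def matches_all(values, patterns):
    """Imitate $all: every pattern must match at least one element of values."""
    return all(any(p.search(v) for v in values) for p in patterns)


patterns = icontains_patterns(['LO', 'HAT'])

# 'lop' contains 'lo' and 'hat' contains 'hat', so both patterns are satisfied.
first = matches_all(['lop', 'glob', 'hat'], patterns)

# 'hello' contains 'lo', but no element contains 'hat'.
second = matches_all(['hello', 'kitty'], patterns)
```

The resulting list can then take the place of converted_list in the __raw__ query shown above. Note that with the single term 'Lo' both sample documents would qualify, since 'hello' also contains 'lo'.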
