JSON with Python: "object must be str, not dict"

I'm trying to interpret data from the Twitch API with Python. This is my code:
from twitch.api import v3
import json
streams = v3.streams.all(limit=1)
list = json.loads(streams)
print(list)
Then, when running, I get:
TypeError, "the JSON object must be str, not 'dict'"
Any ideas? Also, is this a method in which I would actually want to use data from an API?

Per the documentation, json.loads() will parse a string into a JSON hierarchy (which is often a dict). Therefore, if you don't pass it a string, it will fail.
json.loads(s, encoding=None, cls=None, object_hook=None,
parse_float=None, parse_int=None, parse_constant=None,
object_pairs_hook=None, **kw) Deserialize s (a str instance containing
a JSON document) to a Python object using this conversion table.
The other arguments have the same meaning as in load(), except
encoding which is ignored and deprecated.
If the data being deserialized is not a valid JSON document, a
JSONDecodeError will be raised.
From the Twitch API we see that the object returned by all() is a V3Query. Looking at the source and documentation for that, we see it is meant to return a list. Thus, you should treat it as a list rather than as a string that needs to be decoded.
Specifically, the V3Query is a subclass of ApiQuery, in turn a subclass of JsonQuery. That class explicitly runs the query and passes a function over the results, get_json. That source explicitly calls json.loads()... so you don't need to! Remember: never be afraid to dig through the source.
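The mistake is easy to reproduce with a plain dict standing in for the query result (the data below is made up for illustration):

```python
import json

# Stand-in for what the Twitch query returns: already-parsed data, not a JSON string.
already_parsed = {"streams": [{"game": "Rocket League"}]}

try:
    json.loads(already_parsed)  # reproducing the asker's mistake
except TypeError as e:
    print(e)  # the JSON object must be str, bytes or bytearray, not dict

# The fix: index into the structure directly, no decoding needed.
print(already_parsed["streams"][0]["game"])
```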

After
streams = v3.streams.all(limit=1)
try using
streams = json.dumps(streams)
so that streams is a JSON string of the form:
'{"key": value}'
instead of the plain dict form:
{"key": value}
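The round trip this answer relies on can be sketched with made-up data:

```python
import json

streams = {"stream": {"game": "Rocket League"}}  # stand-in for the API result

as_string = json.dumps(streams)     # now a JSON string: '{"stream": {"game": "Rocket League"}}'
round_trip = json.loads(as_string)  # json.loads accepts the string and returns a dict again

print(round_trip == streams)  # True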

Related

Python documentation on possibly inherited method

I am writing a program (Python 3.5.2) that uses an HTTPSConnection to get a JSON object as a response. I have it working using some example code, but am not sure where one method comes from.
My question is this: in the code below, the decode('utf-8') method doesn't exist in the documentation at https://docs.python.org/3.4/library/http.client.html#http.client.HTTPResponse under "21.12.2. HTTPResponse Objects". How would I know that the return value of the method response.read() has the method decode('utf-8') available?
Do Python objects inherit from a base class like C# objects do or am I missing something?
http = HTTPSConnection(get_hostname(token))
http.request('GET', uri_path, headers=get_authorization_header(token))
response = http.getresponse()
print(response.status, response.reason)
feed = json.loads(response.read().decode('utf-8'))
Thank you for your help.
The read method of the response object always returns a byte string (in Python 3, which I presume you are using since you use the print function). Byte strings do indeed have a decode method, so there should be no problem with this code. Of course, it assumes the response is encoded in UTF-8, which may or may not be correct.
[Technical note: email is a very difficult medium to handle: messages can be made up of different parts, each of which is differently encoded. At least with web traffic you stand a chance of reading the Content-Type header's charset attribute to find the correct encoding].
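The decode method lives on the bytes type itself, not on HTTPResponse, which is why it does not appear in the http.client documentation. A minimal sketch independent of HTTP:

```python
import json

raw = b'{"status": "ok"}'   # response.read() returns bytes like this
text = raw.decode('utf-8')  # bytes.decode() turns it into a str
data = json.loads(text)     # which json.loads can then parse

print(type(raw).__name__, type(text).__name__, data["status"])  # bytes str ok
```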

Manipulate JSON object with non-ASCII characters?

I'm trying to make an API call to Google CSE from Python and then turn the resulting object into a dictionary that I can manipulate. I don't think this question is a duplicate, because the issue here, I believe, is that there are non-ASCII characters, which leads to the resulting object being of type 'NoneType' and the resulting JSON object 'null'. I've played with the documented options for json, including ensure_ascii=False, but haven't been successful. Any help will be greatly appreciated!
Code:
import pprint, os, json
from googleapiclient.discovery import build

def search(searchkey, datekey, developkey, enginekey):
    service = build("customsearch", "v1",
                    developerKey=developkey).cse().list(
                        q=searchkey, dateRestrict=datekey,
                        cx=enginekey,
                    ).execute()
    pprint.pprint(service)

mykey = 'My_Private_Key'
myengine = '009333857041890623793:z_drq9obxp0'
object2write = search('narco', '20170101-20170201', mykey, myengine)
type(object2write)
jsonAbder = json.dumps(object2write, ensure_ascii=False, allow_nan=False)
print(jsonAbder)
The proximate cause of your error is that your search function doesn't have an explicit return statement. It therefore implicitly returns None, which gets encoded into JSON null. Your issue has nothing to do with character encodings.
Just add:
return service
at the end of your function.
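The implicit None is easy to see in isolation; these toy functions (not the Google client) show why the dumped result was null:

```python
import json

def search_without_return():
    result = {"name": "narco"}  # computed, then silently discarded

def search_with_return():
    result = {"name": "narco"}
    return result

print(json.dumps(search_without_return()))  # null
print(json.dumps(search_with_return()))     # {"name": "narco"}
```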

Different behavior of json.dump() on Windows and Linux

I wrote a Python script to retrieve data from a website in JSON format using the requests library, and then dump it into a JSON file. I have written a lot of code that uses this data, and have tested it only on Windows. Recently I moved to a Linux system, and when the same script is executed, the order of the keys in the JSON file is completely different.
This is the code I'm using:
import json
import requests

API_request = requests.get('https://www.abcd.com/datarequest')
alertJson_Data = API_request.json()  # convert the returned data to a dict
json.dump(alertJson_Data, jsonDataFile)  # add the alert's json data to the (already open) file
jsonDataFile.write('\n')
jsonDataFile.close()
A lot of my other scripts depend on the ordering of the keys in this json file, so is there any way to keep the same ordering on Linux that is used on Windows?
For example, on Windows the order is "id":, "src":, "dest":, whereas on Linux it's completely different. If I open the web link directly in my browser, it has the same ordering as the file saved on Windows. How do I retain this ordering?
Can you use collections.OrderedDict when loading the json?
e.g.
from collections import OrderedDict
alertJson_Data = API_request.json(object_pairs_hook=OrderedDict)
This should work, because the json() method implemented by requests takes the same optional arguments as json.loads:
json(**kwargs)
Returns the json-encoded content of a response, if any.
Parameters **kwargs – Optional arguments that json.loads takes. Raises
ValueError – If the response body does not contain valid json.
And the documentation of json.loads specifies:
object_hook, if specified, will be called with the result of every
JSON object decoded and its return value will be used in place of the
given dict. This can be used to provide custom deserializations (e.g.
to support JSON-RPC class hinting).
object_pairs_hook, if specified will be called with the result of
every JSON object decoded with an ordered list of pairs. The return
value of object_pairs_hook will be used instead of the dict. This
feature can be used to implement custom decoders that rely on the
order that the key and value pairs are decoded (for example,
collections.OrderedDict() will remember the order of insertion). If
object_hook is also defined, the object_pairs_hook takes priority.
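A self-contained illustration of object_pairs_hook, using key names from the question; sort_keys=True on the dump side is another way to make the file deterministic across platforms:

```python
import json
from collections import OrderedDict

raw = '{"id": 1, "src": "a", "dest": "b"}'

data = json.loads(raw, object_pairs_hook=OrderedDict)
print(list(data.keys()))  # ['id', 'src', 'dest'] -- document order is preserved

# Alternatively, sort keys when dumping so the output is identical on any OS:
print(json.dumps(data, sort_keys=True))  # {"dest": "b", "id": 1, "src": "a"}
```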

Is parsing JSON naively into a Python class or struct secure?

Some background first: I have a few rather simple data structures which are persisted as json files on disk. These json files are shared between applications of different languages and different environments (like web frontend and data manipulation tools).
For each of the files I want to create a Python "POPO" (Plain Old Python Object), and a corresponding data mapper class for each item should implement some simple CRUD like behavior (e.g. save will serialize the class and store as json file on disk).
I think a simple mapper (which only knows about basic types) will work. However, I'm concerned about security: some of the json files will be generated by a web frontend, so a user feeding me bad json is a possible security risk.
Finally, here is the simple mapping code (found at How to convert JSON data into a Python object):
class User(object):
    def __init__(self, name, username):
        self.name = name
        self.username = username

import json
j = json.loads(your_json)
u = User(**j)
What possible security issues do you see?
NB: I'm new to Python.
Edit: Thanks all for your comments. I've found that one of my json files has 2 arrays, each holding a map. Unfortunately, this starts to get cumbersome as I accumulate more of these.
I'm extending the question to mapping a json input to a recordtype. The original code is from here: https://stackoverflow.com/a/15882054/1708349.
Since I need mutable objects, I'd change it to use a namedlist instead of a namedtuple:
import json
from namedlist import namedlist

data = '{"name": "John Smith", "hometown": {"name": "New York", "id": 123}}'

# Parse JSON into an object with attributes corresponding to dict keys.
x = json.loads(data, object_hook=lambda d: namedlist('X', d.keys())(*d.values()))

print(x.name, x.hometown.name, x.hometown.id)
Is it still safe?
There's not much wrong that can happen in the first case. You're limiting what arguments can be provided and it's easy to add validation/conversion right after loading from JSON.
The second example is a bit worse. Packing things into records like this will not help you in any way. You don't inherit any methods, because each type you define is new. You can't compare values easily, because dicts are not ordered. You don't know if you have all arguments handled, or if there is some extra data, which can lead to hidden problems later.
So in summary: with User(**data), you're pretty safe. With namedlist there's space for ambiguity and you don't really gain anything. (compared to bare, parsed json)
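One reason User(**data) is fairly safe can be demonstrated directly: any key the constructor does not declare raises a TypeError immediately (class repeated here from the question):

```python
import json

class User(object):
    def __init__(self, name, username):
        self.name = name
        self.username = username

u = User(**json.loads('{"name": "John", "username": "jdoe"}'))
print(u.name)  # John

try:
    User(**json.loads('{"name": "John", "username": "jdoe", "is_admin": true}'))
except TypeError as e:
    print(e)  # ... got an unexpected keyword argument 'is_admin'
```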
If you blindly accept user json input without a sanity check, you are at risk of becoming a json injection victim.
See a detailed explanation of the json injection attack here: https://www.acunetix.com/blog/web-security-zone/what-are-json-injections/
Besides the security vulnerability, parsing JSON into a Python object this way is not type-safe.
With your example User class, I would assume you expect both fields, name and username, to be strings. What if the json input is like this:
{
    "name": "my name",
    "username": 1
}
j = json.loads(your_json)
u = User(**j)
type(u.username) # int
You get an object with an unexpected type.
One way to ensure type safety is to use JSON Schema to validate the input json. More about JSON Schema: https://json-schema.org/
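JSON Schema libraries are the general answer; for a class this small, an explicit stdlib check in the same spirit might look like this (EXPECTED and load_user are illustrative names, not part of any library):

```python
import json

# Illustrative sketch: declare the expected shape, check it after parsing.
EXPECTED = {"name": str, "username": str}

def load_user(raw):
    data = json.loads(raw)
    if set(data) != set(EXPECTED):
        raise ValueError("unexpected or missing fields: %s"
                         % sorted(set(data) ^ set(EXPECTED)))
    for field, typ in EXPECTED.items():
        if not isinstance(data[field], typ):
            raise TypeError("%s must be %s, got %s"
                            % (field, typ.__name__, type(data[field]).__name__))
    return data

print(load_user('{"name": "my name", "username": "user1"}'))
```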

In Python, have json not escape a string

I am caching some JSON data, and in storage it is represented as a JSON-encoded string. No work is performed on the JSON by the server before sending it to the client, other than collation of multiple cached objects, like this:
def get_cached_items():
    item1 = cache.get(1)
    item2 = cache.get(2)
    return json.dumps(dict(item1=item1, item2=item2, msg="123"))
There may be other items included with the return value, in this case represented by msg="123".
The issue is that the cached items are double-escaped. It would behoove the library to allow a pass-through of the string without escaping it.
I have looked at the documentation for json.dumps default argument, as it seems to be the place where one would address this, and searched on google/SO but found no useful results.
It would be unfortunate, from a performance perspective, if I had to decode the JSON of each cached item just to send it to the browser. It would be unfortunate from a complexity perspective not to be able to use json.dumps.
My inclination is to write a class that stores the cached string, such that when the default handler encounters an instance of this class it uses the string without performing escaping. I have yet to figure out how to achieve this, though, and I would be grateful for thoughts and assistance.
EDIT For clarity, here is an example of the proposed default technique:
class RawJSON(object):
    def __init__(self, str):
        self.str = str

class JSONEncoderWithRaw(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, RawJSON):
            return o.str  # but avoid call to `encode_basestring` (or ASCII equiv.)
        return super(JSONEncoderWithRaw, self).default(o)
Here is a degenerate example of the above:
>>> class M():
...     str = ''
>>> m = M()
>>> m.str = json.dumps(dict(x=123))
>>> json.dumps(dict(a=m), default=lambda o: o.str)
'{"a": "{\\"x\\": 123}"}'
The desired output would include the unescaped string m.str, being:
'{"a": {"x": 123}}'
It would be good if the json module did not encode/escape the return value of the default parameter, or if that could be avoided. In the absence of a method via the default parameter, one may have to achieve the objective here by overloading the encode and iterencode methods of JSONEncoder, which brings challenges in terms of complexity, interoperability, and performance.
A quick-n-dirty way is to patch json.encoder.encode_basestring*() functions:
import json

class RawJson(unicode):
    pass

# patch json.encoder module
for name in ['encode_basestring', 'encode_basestring_ascii']:
    def encode(o, _encode=getattr(json.encoder, name)):
        return o if isinstance(o, RawJson) else _encode(o)
    setattr(json.encoder, name, encode)

print(json.dumps([1, RawJson(u'["abc", 2]'), u'["def", 3]']))
# -> [1, ["abc", 2], "[\"def\", 3]"]
If you are caching JSON strings, you need to first decode them to python structures; there is no way for json.dumps() to distinguish between normal strings and strings that are really JSON-encoded structures:
return json.dumps({'item1': json.loads(item1), 'item2': json.loads(item2), 'msg': "123"})
Unfortunately, there is no option to include already-converted JSON data this way; the default function is expected to return Python values. You extract data from whatever object is passed in and return a value that can be converted to JSON, not a value that is already JSON itself.
The only other approach I can see is to insert "template" values, then use string replacement techniques to manipulate the JSON output to replace the templates with your actual cached data:
json_data = json.dumps({'item1': '==item1==', 'item2': '==item2==', 'msg': "123"})
return json_data.replace('"==item1=="', item1).replace('"==item2=="', item2)
A third option is to cache item1 and item2 in non-serialized form, as a Python structure instead of a JSON string.
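The decode-then-dump approach in miniature, with a made-up cached string:

```python
import json

item1 = '{"x": 123}'  # a cached, already-encoded JSON string

# Re-encoding the string double-escapes it:
print(json.dumps({"item1": item1}))              # {"item1": "{\"x\": 123}"}

# Decoding first embeds it as real structure:
print(json.dumps({"item1": json.loads(item1)}))  # {"item1": {"x": 123}}
```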
You can use the better-maintained simplejson instead of json, which provides this functionality.
import simplejson as json
from simplejson.encoder import RawJSON
print(json.dumps([1, RawJSON(u'["abc", 2]'), u'["def", 3]']))
# -> [1, ["abc", 2], "[\"def\", 3]"]
You get simplicity of code, plus all the C optimisations of simplejson.
