python jsonify dictionary in utf-8

I want to get JSON data encoded as UTF-8.
I have a list, my_list = [],
and then I append many unicode values to the list, like this:
my_list.append(u'ტესტ')
return jsonify(result=my_list)
and the output is:
{
"result": [
"\u10e2\u10d4\u10e1\u10e2",
"\u10e2\u10dd\u10db\u10d0\u10e8\u10d5\u10d8\u10da\u10d8"
]
}

Use the following config to add UTF-8 support:
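app.config['JSON_AS_ASCII'] = False
(Flask 2.2 deprecated this config key and Flask 2.3 removed it; on newer versions, set app.json.ensure_ascii = False instead.)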

Use the standard-library json module instead, and set the ensure_ascii keyword parameter to False when encoding, or do the same with flask.json.dumps():
>>> data = u'\u10e2\u10d4\u10e1\u10e2'
>>> import json
>>> json.dumps(data)
'"\\u10e2\\u10d4\\u10e1\\u10e2"'
>>> json.dumps(data, ensure_ascii=False)
u'"\u10e2\u10d4\u10e1\u10e2"'
>>> print json.dumps(data, ensure_ascii=False)
"ტესტ"
>>> json.dumps(data, ensure_ascii=False).encode('utf8')
'"\xe1\x83\xa2\xe1\x83\x94\xe1\x83\xa1\xe1\x83\xa2"'
Note that you still need to explicitly encode the result to UTF-8, because in that case the dumps() function returns a unicode object (the session above is Python 2).
You can make this the default (and use jsonify() again) by setting JSON_AS_ASCII to False in your Flask app config.
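For comparison, here is a minimal sketch of the same steps assuming Python 3, where json.dumps() always returns str, so there is no str/unicode split to worry about and you encode exactly once at the end:

```python
import json

data = {"result": ["ტესტ"]}

# Default (ensure_ascii=True): every non-ASCII character becomes a \uXXXX escape
ascii_json = json.dumps(data)

# ensure_ascii=False keeps the characters; in Python 3 this is still a str
text_json = json.dumps(data, ensure_ascii=False)

# Encode once, at the boundary where bytes are needed (e.g. an HTTP body)
utf8_bytes = text_json.encode("utf-8")

print(ascii_json)  # {"result": ["\u10e2\u10d4\u10e1\u10e2"]}
print(utf8_bytes)
```

Both forms decode back to the same data; they differ only in how the characters are represented on the wire.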
WARNING: do not include untrusted data in JSON that is not ASCII-safe and then interpolate it into an HTML template or use it in a JSONP API, as you can cause syntax errors or open a cross-site scripting vulnerability this way. That's because JSON was not a strict subset of JavaScript before ES2019, and when ASCII-safe encoding is disabled, the U+2028 and U+2029 separators will not be escaped to \u2028 and \u2029 sequences.
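One way to keep readable non-ASCII output while removing the specific U+2028/U+2029 hazard is to post-process the encoded text and escape just those two characters. This is a minimal sketch; the helper name dumps_js_safe is hypothetical, and it does not make JSON safe for HTML interpolation in general (for example, a "</script>" sequence inside a string value would still need separate escaping).

```python
import json


def dumps_js_safe(obj):
    """Hypothetical helper: ensure_ascii=False output with U+2028/U+2029 escaped."""
    text = json.dumps(obj, ensure_ascii=False)
    # These characters can only appear inside JSON string literals, so
    # replacing them with \u escapes still yields equivalent, valid JSON.
    return text.replace("\u2028", "\\u2028").replace("\u2029", "\\u2029")


print(dumps_js_safe({"msg": "line1\u2028line2"}))  # {"msg": "line1\u2028line2"}
```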

If you still want to use Flask's json and ensure the UTF-8 encoding, then you can do something like this:
from flask import Flask, Response, json
app = Flask(__name__)
@app.route("/")
def hello():
    my_list = []
    my_list.append(u'ტესტ')
    data = {"result": my_list}
    json_string = json.dumps(data, ensure_ascii=False)
    # creating a Response object to set the content type and the encoding
    response = Response(json_string, content_type="application/json; charset=utf-8")
    return response
I hope this helps.

In my case the above solution was not sufficient (running Flask on the GCP App Engine flexible environment). I ended up doing the following (Python 2; myDict and status come from the surrounding view function):
import json
import chardet
from flask import make_response
json_str = json.dumps(myDict, ensure_ascii=False, indent=4, sort_keys=True)
# guess the encoding of the byte string, then re-encode it as UTF-8
encoding = chardet.detect(json_str)['encoding']
json_unicode = json_str.decode(encoding)
json_utf8 = json_unicode.encode('utf-8')
response = make_response(json_utf8)
response.headers['Content-Type'] = 'application/json; charset=utf-8'
response.status_code = status

Related

How to return data in JSON format using FastAPI?

I have written the same API application with the same function in both FastAPI and Flask. However, when returning the JSON, the format of data differs between the two frameworks. Both use the same json library and even the same exact code:
import json
from fastapi import APIRouter, Request
from google.cloud import bigquery
router = APIRouter()
bigquery_client = bigquery.Client()
@router.get('/report')
async def report(request: Request):
    response = get_clicks_impression(bigquery_client, source_id)
    return response
def get_user(client, source_id):
    try:
        query = """ SELECT * FROM ....."""
        job_config = bigquery.QueryJobConfig(
            query_parameters=[
                bigquery.ScalarQueryParameter("source_id", "STRING", source_id),
            ]
        )
        query_job = client.query(query, job_config=job_config)  # Wait for the job to complete.
        result = []
        for row in query_job:
            result.append(dict(row))
        json_obj = json.dumps(result, indent=4, sort_keys=True, default=str)
    except Exception as e:
        return str(e)
    return json_obj
The returned data in Flask was a proper JSON list of dicts:
[
{
"User": "fasdf",
"date": "2022-09-21",
"count": 205
},
{
"User": "abd",
"date": "2022-09-27",
"count": 100
}
]
While in FastAPI it was a string:
"[\n {\n \"User\": \"aaa\",\n \"date\": \"2022-09-26\",\n \"count\": 840,\n]"
The reason I use json.dumps() is that date is not JSON serializable otherwise.
If you serialise the object before returning it (using, for instance, json.dumps(), as in your example), the object will end up being serialised twice, as FastAPI will automatically serialise the return value. That is the reason for the output string you ended up with:
"[\n {\n \"User\": \"aaa\",\n \"date\": \"2022-09-26\",\n ...
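The double serialisation can be reproduced with the standard library alone: calling json.dumps() on text that is already JSON turns the whole thing into a single JSON string literal. This is a sketch that stands in for what FastAPI does with the return value (FastAPI's JSONResponse uses compact separators, so real spacing differs); the sample record mirrors the data used in the options below.

```python
import json
from datetime import date

d = [{"User": "a", "date": date(2022, 10, 21), "count": 1}]

# First serialisation: Python list -> JSON text (default=str handles the date)
once = json.dumps(d, default=str)

# Second serialisation: roughly what happens when that *string* is returned;
# the JSON text becomes one string literal with its quotes escaped
twice = json.dumps(once)

print(once)   # [{"User": "a", "date": "2022-10-21", "count": 1}]
print(twice)  # "[{\"User\": \"a\", \"date\": \"2022-10-21\", \"count\": 1}]"
```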
Have a look at the available solutions below.
Option 1
You could normally return data such as dict, list, etc., and FastAPI would automatically convert that return value into JSON, after first converting the data into JSON-compatible data (e.g., a dict) using the jsonable_encoder. The jsonable_encoder ensures that objects that are not serializable, such as datetime objects, are converted to a str. Then, behind the scenes, FastAPI would put that JSON-compatible data inside of a JSONResponse, which will return an application/json encoded response to the client. The JSONResponse, as can be seen in Starlette's source code here, will use the Python standard json.dumps() to serialise the dict (for alternative/faster JSON encoders, see this answer).
Data:
from datetime import date
d = [{'User': 'a', 'date': date.today(), 'count': 1},
{'User': 'b', 'date': date.today(), 'count': 2}]
API Endpoint:
@app.get('/')
def main():
    return d
The above is equivalent to:
from fastapi.responses import JSONResponse
from fastapi.encoders import jsonable_encoder
@app.get('/')
def main():
    return JSONResponse(content=jsonable_encoder(d))
Output:
[{"User":"a","date":"2022-10-21","count":1},{"User":"b","date":"2022-10-21","count":2}]
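For intuition about what the jsonable_encoder step does to the date values in Option 1, here is a much-simplified, standard-library-only stand-in. The function name to_jsonable is hypothetical and it is not FastAPI's implementation; it only walks dicts and lists and turns date/datetime values into ISO 8601 strings, which reproduces the output above.

```python
import json
from datetime import date, datetime


def to_jsonable(obj):
    """Hypothetical, simplified stand-in for FastAPI's jsonable_encoder."""
    if isinstance(obj, dict):
        return {str(k): to_jsonable(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_jsonable(v) for v in obj]
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()  # e.g. "2022-10-21"
    return obj  # str, int, float, bool, None pass through unchanged


d = [{'User': 'a', 'date': date(2022, 10, 21), 'count': 1},
     {'User': 'b', 'date': date(2022, 10, 21), 'count': 2}]

# Compact separators, to match the compact output shown above
print(json.dumps(to_jsonable(d), separators=(",", ":")))
# [{"User":"a","date":"2022-10-21","count":1},{"User":"b","date":"2022-10-21","count":2}]
```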
Option 2
If, for any reason (e.g., trying to force some custom JSON format), you have to serialise the object before returning it, you can then return a custom Response directly, as described in this answer. As per the documentation:
When you return a Response directly its data is not validated, converted (serialized), nor documented automatically.
Additionally, as described here:
FastAPI (actually Starlette) will automatically include a Content-Length header. It will also include a Content-Type header, based on the media_type and appending a charset for text types.
Hence, you can also set the media_type to whatever type you are expecting the data to be; in this case, that is application/json. An example is given below.
Note 1: The JSON outputs posted in this answer (in both Options 1 & 2) are the result of accessing the API endpoint through the browser directly (i.e., by typing the URL in the address bar of the browser and then hitting the enter key). If you tested the endpoint through Swagger UI at /docs instead, you would see that the indentation differs (in both options). This is due to how Swagger UI formats application/json responses. If you needed to force your custom indentation on Swagger UI as well, you could avoid specifying the media_type for the Response in the example below. This would result in the content being displayed as text, as the Content-Type header would be missing from the response, and hence Swagger UI couldn't recognise the type of the data in order to format it.
Note 2: Setting the default argument to str in json.dumps() is what makes it possible to serialise the date object; if it wasn't set, you would get: TypeError: Object of type date is not JSON serializable. The default is a function that gets called for objects that can't otherwise be serialized, and it should return a JSON-encodable version of the object. In this case it is str, meaning that every object that is not serializable is converted to a string. You could also use a custom function or JSONEncoder subclass, as demonstrated here, if you would like to serialise an object in a custom way.
Note 3: FastAPI/Starlette's Response accepts as a content argument either a str or bytes object. As shown in the implementation here, if you don't pass a bytes object, Starlette will try to encode it using content.encode(self.charset). Hence, if, for instance, you passed a dict, you would get: AttributeError: 'dict' object has no attribute 'encode'. In the example below, a JSON str is passed, which will later be encoded into bytes (you could alternatively encode it yourself before passing it to the Response object).
API Endpoint:
from fastapi import Response
import json
@app.get('/')
def main():
    json_str = json.dumps(d, indent=4, default=str)
    return Response(content=json_str, media_type='application/json')
Output:
[
{
"User": "a",
"date": "2022-10-21",
"count": 1
},
{
"User": "b",
"date": "2022-10-21",
"count": 2
}
]
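Note 2 mentions a JSONEncoder subclass as an alternative to default=str. A minimal sketch (the class name ISODateEncoder is hypothetical) shows where the two actually differ: str() on a datetime uses a space separator, whereas a custom encoder can emit ISO 8601 with a "T". The resulting string could then be passed to Response(content=..., media_type='application/json') as in the endpoint above.

```python
import json
from datetime import date, datetime


class ISODateEncoder(json.JSONEncoder):
    """Hypothetical encoder: dates and datetimes become ISO 8601 strings."""

    def default(self, o):
        # datetime is a subclass of date, so this branch handles both
        if isinstance(o, date):
            return o.isoformat()
        # Anything else: the base class raises the usual TypeError
        return super().default(o)


row = {"date": date(2022, 10, 21), "seen": datetime(2022, 10, 21, 8, 30)}

print(json.dumps(row, default=str))
# {"date": "2022-10-21", "seen": "2022-10-21 08:30:00"}
print(json.dumps(row, cls=ISODateEncoder))
# {"date": "2022-10-21", "seen": "2022-10-21T08:30:00"}
```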

How to post a kafka schema using python

I am trying to post a kafka schema using python.
From the CLI I would use a syntax like:
curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"schema": "{\"type\":\"record\",\"name\":\"VisualDetections\",\"namespace\":\"com.namespace.something\",\"fields\":[{\"name\":\"vehicle_id\",\"type\":\"int\"},{\"name\":\"source\",\"type\":\"string\"},{\"name\":\"width\",\"type\":\"int\"},{\"name\":\"height\",\"type\":\"int\"},{\"name\":\"annotated_frame\",\"type\":[\"string\",\"null\"]},{\"name\":\"version\",\"type\":\"string\"},{\"name\":\"fps\",\"type\":\"int\"},{\"name\":\"mission_id\",\"type\":\"int\"},{\"name\":\"sequence\",\"type\":{\"type\":\"array\",\"items\":{\"type\":\"record\",\"name\":\"sequence_record\",\"fields\":[{\"name\":\"frame_id\",\"type\":\"int\"},{\"name\":\"timestamp\",\"type\":\"long\"},{\"name\":\"localization\",\"type\":{\"type\":\"array\",\"items\":{\"type\":\"record\",\"name\":\"localization_record\",\"fields\":[{\"name\":\"latitude\",\"type\":\"double\"},{\"name\":\"longitude\",\"type\":\"double\"},{\"name\":\"class\",\"type\":\"string\"},{\"name\":\"object_id\",\"type\":\"int\"},{\"name\":\"confidence\",\"type\":\"double\"},{\"name\":\"bbox\",\"type\":{\"type\":\"record\",\"name\":\"bbox\",\"fields\":[{\"name\":\"x_min\",\"type\":\"int\"},{\"name\":\"y_min\",\"type\":\"int\"},{\"name\":\"x_max\",\"type\":\"int\"},{\"name\":\"y_max\",\"type\":\"int\"}]}}]}}}]}}}]}"}' http://server_ip:8081/subjects/VisualDetections-value/versions/
When I tried to transfer this command to Python, I tried something like:
import requests
import json
topic = 'VisualDetections'
headers = {'Content-Type': 'application/vnd.schemaregistry.v1+json'}
with open(avro_path) as fp:
    data = {'schema': json.load(fp)}
data_json = json.dumps(data)
cmd = 'http://server_ip:8081/subjects/{}-value/versions/'.format(topic)
response = requests.post(cmd, headers=headers, data=data_json)
The above returns {"error_code":500,"message":"Internal Server Error"}. I have tried other options, like:
with open(avro_path) as fp:
    data = json.load(fp)
with the error:
{"error_code":422,"message":"Unrecognized field: name"}
In the above the avro_path just contains the avro schema in a json file (can be uploaded if useful also).
I am not sure how exactly I could post this data. Also, I did not take into consideration the -H argument of the CLI command, since I couldn't find an equivalent Python argument (not sure it plays any role, though). Can anyone provide a solution to this issue?
For the second error, the payload needs to be {'schema': "schema string"}.
For the first, I think it's a matter of the encoding: json.load will read the file into a dict rather than just a string.
Notice
>>> import json
>>> schema = {"type":"record"} # example when using json.load() ... other data excluded
>>> json.dumps({'schema': schema})
'{"schema": {"type": "record"}}' # the schema value is not a string
>>> json.dumps({'schema': json.dumps(schema)})
'{"schema": "{\\"type\\": \\"record\\"}"}' # here it is
Try just reading the file
url = 'http://server_ip:8081/subjects/{}-value/versions/'.format(topic)
with open(avro_path) as fp:
    data = {'schema': fp.read().strip()}
response = requests.post(url, headers=headers, data=json.dumps(data))
Otherwise, you could json.load the file and then use json.dumps twice, as shown above.
You may also try json=data rather than data=json.dumps(data)
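Both variants from this answer can be folded into one small helper that always sends the schema as a string. This is a sketch: the helper name schema_payload and the trimmed example schema are hypothetical, and the actual POST is left as a comment reusing the URL and header from the question.

```python
import json


def schema_payload(schema):
    """Hypothetical helper: build {"schema": "<schema as JSON text>"}.

    `schema` may be the raw text of the .avsc file or an already-parsed dict.
    """
    if not isinstance(schema, str):
        schema = json.dumps(schema)        # dict -> JSON text (first dumps)
    return json.dumps({"schema": schema})  # wrap as a string value (second dumps)


schema_dict = {"type": "record", "name": "VisualDetections",
               "fields": [{"name": "vehicle_id", "type": "int"}]}

body = schema_payload(schema_dict)
# Then, e.g.: requests.post(url, headers=headers, data=body)
print(body)
```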

Jsonify response data with backslash

I have a Flask API that sends responses in JSON format:
rep = {'application_id': 32657, 'business_rules_id': 20}  # a python dictionary
rep_json = json.dumps(rep, cls=CustomizedEncoder)  # converts to a json format string
return jsonify(rep_json), 200  # return the flask response (with headers etc)
I can see the flask response body data and the response is something like:
b'"{\\"application_id\\": 32567, \\"business_rules_id\\": 20}"\n'
or in postman body
"{\"application_id\": 32567, \"business_rules_id\": 20}"
Shouldn't I get a response in JSON format (without the backslashes)? I guess the reason is that json.dumps dumps the dict to a JSON string once, and then jsonify dumps that string a second time, which causes the double quotes to be escaped.
The reason I need to run the following is that I need a customized encoder, which jsonify probably does not support:
rep_json = json.dumps(rep, cls=CustomizedEncoder)
My other solution is to dumps and then loads, but that looks redundant. Is there a different approach to using a customized encoder while returning a Flask response?
This is another way that I tried, but it looks weird:
rep = {'application_id': 32657, 'business_rules_id': 20} # a python dictionary
rep_json = json.dumps(rep, cls=CustomizedEncoder) # converts to a json format string
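return jsonify(json.loads(rep_json)), 200  # return the flask response (with headers etc)
You can configure your app to use a custom encoder with app.json_encoder = CustomizedEncoder (Flask 2.2 deprecated this attribute and Flask 2.3 removed it, in favor of JSON providers configured through app.json_provider_class / app.json).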
https://kite.com/python/docs/flask.app.Flask.json_encoder

How to return this valid json data in Python?

I tested using Python to translate a curl command to get some data.
import requests
import json
username="abc"
password="123"
headers = {
    'Content-Type': 'application/json',
}
params = (
    ('version', '2017-05-01'),
)
data = '{"text":["This is message one."], "id":"en-es"}'
response = requests.post('https://somegateway.service/api/abc', headers=headers, params=params, data=data, auth=(username, password))
print(response.text)
The above works fine. It returns json data.
It seems ["This is message one."] is a list. I want to use a variable that loads a file to replace this list.
I tried:
with open(f, "r", encoding='utf-8') as fp:
    file_in_list = fp.read().splitlines()
toStr = str(file_in_list)
data = '{"text":' + toStr + ', "id":"en-es"}'
response = requests.post('https://somegateway.service/api/abc', headers=headers, params=params, data=data, auth=(username, password))
print(response.text)
But it returned error below.
{
"code" : 400,
"error" : "Mapping error, invalid JSON"
}
Can you help? How can I have valid response.text?
Thanks.
update:
The content of f contains only five lines below:
This is message one.
this is 2.
this is three.
this is four.
this is five.
The reason your existing code fails is that str applied to a list of strings will only rarely give you valid JSON. They're not intended to do the same thing. JSON only allows double-quoted strings; Python allows both single- and double-quoted strings. And, unless your strings all happen to include ' characters, Python will render them with single quotes:
>>> print(["abc'def"]) # gives you valid JSON, but only by accident
["abc'def"]
>>> print(["abc"]) # does not give you valid JSON
['abc']
If you want to get the valid JSON encoding of a list of strings, don't try to trick str into giving you valid JSON by accident, just use the json module:
toStr = json.dumps(file_in_list)
But, even more simply, you shouldn't be trying to figure out how to construct JSON strings in the first place. Just create a dict and json.dumps the whole thing:
data = {"text": file_in_list, "id": "en-es"}
data_str = json.dumps(data)
Being able to do this is pretty much the whole point of JSON: it's a simple way to automatically serialize all of the types that are common to all the major scripting languages.
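A quick way to see the difference is to try parsing both strings. In this sketch the sample lines are taken from the question's file; in real code they would come from fp.read().splitlines().

```python
import json

lines = ["This is message one.", "this is 2.", "this is three."]

# str() on a list uses Python's repr, i.e. single-quoted strings: not JSON
broken = '{"text":' + str(lines) + ', "id":"en-es"}'

# Building the dict and serialising it once gives valid JSON
data_str = json.dumps({"text": lines, "id": "en-es"})
print(data_str)
# {"text": ["This is message one.", "this is 2.", "this is three."], "id": "en-es"}

try:
    json.loads(broken)
except json.JSONDecodeError as exc:
    print("str() output is not JSON:", exc.msg)
```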
Or, even better, let requests do it for you by passing a json argument instead of a data argument:
data = {"text": file_in_list, "id": "en-es"}
response = requests.post('https://somegateway.service/api/abc', headers=headers, params=params, json=data, auth=(username, password))
This also automatically takes care of setting the Content-Type header to application/json for you. Many servers will accept a JSON body without that header, but some will reject it, so it is worth having it set either way.
For more details, see the section More complicated POST requests in the requests docs. But there really aren't many more details.
tl;dr
toStr = json.dumps(file_in_list)
Explanation
Assuming your file contains something like
String_A
String_B
You need to ensure that toStr is:
Enclosed by [ and ]
Every string in the list is enclosed in double quotation marks.
So your raw json (as a String) is equal to '{"text":["String_A", "String_B"], "id":"en-es"}'

BOM in server response screws up json parsing

I'm trying to write a Python script that posts some JSON to a web server and gets some JSON back. I patched together a few different examples on StackOverflow, and I think I have something that's mostly working.
import urllib2
import json
url = "http://foo.com/API.svc/SomeMethod"
payload = json.dumps( {'inputs': ['red', 'blue', 'green']} )
headers = {"Content-type": "application/json;"}
req = urllib2.Request(url, payload, headers)
f = urllib2.urlopen(req)
response = f.read()
f.close()
data = json.loads(response) # <-- Crashes
The last line throws an exception:
ValueError: No JSON object could be decoded
When I look at response, I see valid JSON, but the first few characters are a BOM:
>>> response
'\xef\xbb\xbf[\r\n {\r\n ... Valid JSON here
So, if I manually strip out the first three bytes:
data = json.loads(response[3::])
Everything works and response is turned into a dictionary.
My Question:
It seems kinda silly that json barfs when you give it a BOM. Is there anything different I can do with urllib or the json library to let it know this is a UTF8 string and to handle it as such? I don't want to manually strip out the first 3 bytes.
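You should probably yell at whoever's running this service, because a BOM on UTF-8 text makes little sense (RFC 8259 even says implementations must not add one to JSON transmitted over a network). The BOM exists to disambiguate byte order, and UTF-8 has no byte order to disambiguate: it is a sequence of single bytes, so it reads the same on big- and little-endian machines.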
That said, ideally you should decode bytes before doing anything else with them. Luckily, Python has a codec that recognizes and removes the BOM: utf-8-sig.
>>> '\xef\xbb\xbffoo'.decode('utf-8-sig')
u'foo'
So you just need:
data = json.loads(response.decode('utf-8-sig'))
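The snippets above are Python 2. Assuming Python 3, the same idea looks like this sketch (the sample body is hypothetical); note that on Python 3.6+ json.loads() also accepts bytes directly and recognises a leading UTF-8 BOM itself, while a str that still starts with U+FEFF is rejected.

```python
import codecs
import json

# A response body as raw bytes, starting with the UTF-8 BOM (EF BB BF)
raw = codecs.BOM_UTF8 + b'[{"colour": "red"}]'

# utf-8-sig drops a leading BOM if there is one and is harmless if there isn't
data = json.loads(raw.decode("utf-8-sig"))
print(data)  # [{'colour': 'red'}]

# Plain utf-8 keeps the BOM as the character U+FEFF, which json.loads rejects
text_with_bom = raw.decode("utf-8")
try:
    json.loads(text_with_bom)
except json.JSONDecodeError as exc:
    print("rejected:", exc.msg)

# Python 3.6+: passing the bytes directly also works
print(json.loads(raw))  # [{'colour': 'red'}]
```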
In case I'm not the only one who has experienced the same problem but is using the requests module instead of urllib2, here is a solution that works in Python 2.6 as well as 3.3:
import requests
r = requests.get(url, params=my_dict, auth=(user, password))  # 'pass' is a reserved word, so use another name
print(r.headers['content-type'])  # 'application/json; charset=utf8'
if r.text[0] == u'\ufeff':  # bytes \xef\xbb\xbf in utf-8 encoding
    r.encoding = 'utf-8-sig'  # r.text is re-decoded from r.content without the BOM
print(r.json())
Since I lack enough reputation for a comment, I'll write an answer instead.
I usually encounter this problem when I need to leave the underlying Stream of a StreamWriter open. The constructor overload that can leave the underlying Stream open also requires an encoding (which will be UTF-8 in most cases), so here's how to do it without emitting the BOM:
/* Since Encoding.UTF8 (the one you'd normally use in those cases) **emits**
 * the BOM, use what's below instead!
*/
// UTF8Encoding has an overload which enables / disables BOMs in the output
UTF8Encoding encoding = new UTF8Encoding(false);
using (MemoryStream ms = new MemoryStream())
using (StreamWriter sw = new StreamWriter(ms, encoding, 4096, true))
using (JsonTextWriter jtw = new JsonTextWriter(sw))
{
serializer.Serialize(jtw, myObject);
}
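On the Python side of this question, the same producer-side rule applies: pick an encoding that does not emit a BOM. The sketch below is a rough Python analogue, not a translation of the .NET API; io.TextIOWrapper.detach() plays a role similar to leaveOpen: true, in that it releases the wrapper without closing the underlying stream.

```python
import codecs
import io
import json

obj = {"inputs": ["red", "blue", "green"]}
text = json.dumps(obj)

# The "utf-8" codec never writes a BOM; "utf-8-sig" prepends EF BB BF
no_bom = text.encode("utf-8")
with_bom = text.encode("utf-8-sig")

# Writing through a text wrapper onto a stream we want to keep using afterwards
buf = io.BytesIO()
writer = io.TextIOWrapper(buf, encoding="utf-8")
json.dump(obj, writer)
writer.flush()
writer.detach()  # release the wrapper without closing buf
written = buf.getvalue()

print(no_bom[:3], with_bom[:3])
```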
