Flask testing, post file and nested dictionaries - python

I have created a Flask app and wanted to test it. In a single endpoint, I would like to post a multipart request, which includes a file and a complex JSON object. I thought at first of using werkzeug EnvironBuilder for this task, as it seems to provide a quite automated approach, handling content types, etc. My snippet of code for preparing the request is the following:
import io
import json
import os

from werkzeug.test import EnvironBuilder

# client is an instance of FlaskClient, produced by a pytest fixture that calls app.test_client()
def _post(endpoint, file_path=None, serialized_message=None):
    with open(file_path, 'rb') as fin:
        fil = io.BytesIO(fin.read())
    file_name = file_path.split(os.sep)[-1]
    builder = EnvironBuilder(path='/' + endpoint,
                             method='POST',
                             data=json.loads(
                                 serialized_message),
                             content_type="application/json")
    builder.files[file_name] = fil
    result = client.open(builder, buffered=True)
    return result
This failed with the following error:
    def _add_file_from_data(self, key, value):
        """Called in the EnvironBuilder to add files from the data dict."""
        if isinstance(value, tuple):
            self.files.add_file(key, *value)
        elif isinstance(value, dict):
            from warnings import warn
            warn(DeprecationWarning('it\'s no longer possible to pass dicts '
                                    'as `data`. Use tuples or FileStorage '
                                    'objects instead'), stacklevel=2)
            value = dict(value)
            mimetype = value.pop('mimetype', None)
            if mimetype is not None:
                value['content_type'] = mimetype
>           self.files.add_file(key, **value)
E           TypeError: add_file() got an unexpected keyword argument 'globalServiceOptionId'
Here globalServiceOptionId is a key of a nested dictionary inside the dictionary I am posting. I have thought about working around this by converting the inner dictionaries to JSON strings, but I would prefer a more robust solution, because I do not want the representation of the request to differ between testing and normal operation. Thank you.
Update 1
The exact form of the passed dictionary doesn't really matter, as long as it contains nested dictionaries. For example, this JSON triggers the error:
{
    "attachments": [],
    "Ids": [],
    "globalServiceOptions": [{
        "globalServiceOptionId": {
            "id": 2,
            "agentServiceId": {
                "id": 2
            },
            "serviceOptionName": "Time",
            "value": "T_last",
            "required": false,
            "defaultValue": "T_last",
            "description": "UTC Timestamp",
            "serviceOptionType": "TIME"
        },
        "name": "Time",
        "value": null
    }]
}
Update 2
I tested another snippet:
def _post(endpoint, file_path=None, serialized_message=None):
    with open(file_path, 'rb') as fin:
        fil = io.BytesIO(fin.read())
    files = {
        'file': (file_path, fil, 'application/octet-stream')
    }
    for key, item in json.loads(serialized_message).items():
        files[key] = (None, json.dumps(item), 'application/json')
    builder = EnvironBuilder(path='/' + endpoint,
                             method='POST', data=files,
                             )
    result = client.open(builder, buffered=True)
    return result
Although this runs without errors, Flask recognizes (as expected) the incoming jsons as files, which again requires different handling during testing and normal running.

I ran into a similar issue, and what ended up working for me was changing the data approach to exclude nested dicts. Taking your sample JSON, doing the following should allow it to clear the EnvironBuilder:
data_json = {
    "attachments": [],
    "Ids": [],
    "globalServiceOptions": [json.dumps({  # Dump all nested JSON to a string representation
        "globalServiceOptionId": {
            "id": 2,
            "agentServiceId": {
                "id": 2
            },
            "serviceOptionName": "Time",
            "value": "T_last",
            "required": False,  # Python literals here, not JSON's false/null
            "defaultValue": "T_last",
            "description": "UTC Timestamp",
            "serviceOptionType": "TIME"
        },
        "name": "Time",
        "value": None
    })
    ]
}
builder = EnvironBuilder(path='/' + endpoint,
                         method='POST',
                         data=data_json,
                         content_type="application/json")
Taking the approach above still allowed the nested dict/JSON to be passed appropriately while clearing the werkzeug limitation.
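To keep the request representation the same inside and outside of tests, the string-dumping step can be paired with a decoding step on the receiving side. The following is a minimal sketch using only the standard library. The names flatten_for_form and restore_from_form are hypothetical helpers (they are not part of Flask or werkzeug), and the payload is a shortened version of the one in the question with an assumed extra "label" field.

```python
import json


def flatten_for_form(payload):
    """Serialize every nested dict/list value to a JSON string.

    The result contains only flat values, which is what a form-style
    `data` mapping can carry.
    """
    flat = {}
    for key, value in payload.items():
        if isinstance(value, (dict, list)):
            flat[key] = json.dumps(value)
        else:
            flat[key] = value
    return flat


def restore_from_form(form):
    """Decode values that parse as JSON; leave everything else untouched."""
    restored = {}
    for key, value in form.items():
        try:
            restored[key] = json.loads(value)
        except (TypeError, ValueError):
            # Not a JSON document (or not a string at all): keep as-is.
            restored[key] = value
    return restored


payload = {
    "attachments": [],
    "globalServiceOptions": [
        {"globalServiceOptionId": {"id": 2}, "name": "Time", "value": None}
    ],
    "label": "demo",  # assumed plain-string field, for illustration only
}
flat = flatten_for_form(payload)
round_tripped = restore_from_form(flat)
```

One caveat of this sketch: a plain string field whose content happens to be valid JSON (for example "2" or "true") would be decoded into a non-string on the way back, so in practice it may be safer to decode only the fields known to be JSON-encoded.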

Related

How to use msearch() with "q" in ElasticSearch?

I've been using the standard Python ElasticSearch client to make single requests in the following format:
es.search(index='my_index', q=query, size=5, search_type='dfs_query_then_fetch')
I now want to make queries in batch for multiple strings q.
I've seen this question explaining how to use the msearch() functionality to do queries in batch. However, msearch requires the full json-formatted request body for each request. I'm not sure which parameters in the query API correspond to just the q parameter from search(), or size, or search_type, which seem to be API shortcuts specific to the single-example search().
How can I use msearch but specify q, size, and search_type?
I read through the API and figured out how to batch simple search queries:
from typing import List
from elasticsearch import Elasticsearch
import json

def msearch(
    es: Elasticsearch,
    max_hits: int,
    query_strings: List[str],
    index: str
):
    search_arr = []
    for q in query_strings:
        # Each search is a pair of lines: a header, then the request body.
        search_arr.append({'index': index})
        search_arr.append(
            {
                "query": {
                    "query_string": {
                        "query": q
                    }
                },
                'size': max_hits
            })
    # msearch expects newline-delimited JSON.
    request = '\n'.join([json.dumps(x) for x in search_arr])
    resp = es.msearch(body=request)
    return resp

msearch(es, query_strings=['query 1', 'query 2'], max_hits=1, index='my_index')
EDIT: For my use case, I made one more improvement: I didn't want to return the entire document in the result, since I only needed the document ID and its score.
So the final search request object part looked like this, including the '_source': False bit:
search_arr.append(
    {
        # Queries `q` using Lucene syntax.
        "query": {
            "query_string": {
                "query": q
            },
        },
        # Don't return the full profile string, etc. with the result.
        # We just want the ID and the score.
        '_source': False,
        # Only return `max_hits` documents.
        'size': max_hits
    }
)
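The question also asked about search_type, which the code above does not set. As far as I know, the multi-search API accepts search_type in the per-search header line next to index, but this is worth confirming against the documentation for your Elasticsearch version. The sketch below only builds the newline-delimited request string and does not contact a cluster; build_msearch_body is a hypothetical helper name.

```python
import json
from typing import List, Optional


def build_msearch_body(
    query_strings: List[str],
    index: str,
    max_hits: int,
    search_type: Optional[str] = None,
) -> str:
    """Build a newline-delimited msearch body: one header + one body per query."""
    lines = []
    for q in query_strings:
        header = {"index": index}
        if search_type is not None:
            # Assumption: search_type is honoured in the header line.
            header["search_type"] = search_type
        body = {
            "query": {"query_string": {"query": q}},  # counterpart of `q`
            "size": max_hits,                          # counterpart of `size`
            "_source": False,                          # IDs and scores only
        }
        lines.append(json.dumps(header))
        lines.append(json.dumps(body))
    # Newline-delimited JSON, terminated by a final newline.
    return "\n".join(lines) + "\n"


request_body = build_msearch_body(
    ["query 1", "query 2"],
    index="my_index",
    max_hits=1,
    search_type="dfs_query_then_fetch",
)
```

The resulting string could then be passed as es.msearch(body=request_body), in the same way as in the function above.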

Referencing Values in a List (syntax issue?) [duplicate]

I wrote some code to get data from a web API. I was able to parse the JSON data from the API, but the result I get looks quite complex. Here is one example:
>>> my_json
{'name': 'ns1:timeSeriesResponseType', 'declaredType': 'org.cuahsi.waterml.TimeSeriesResponseType', 'scope': 'javax.xml.bind.JAXBElement$GlobalScope', 'value': {'queryInfo': {'creationTime': 1349724919000, 'queryURL': 'http://waterservices.usgs.gov/nwis/iv/', 'criteria': {'locationParam': '[ALL:103232434]', 'variableParam': '[00060, 00065]'}, 'note': [{'value': '[ALL:103232434]', 'title': 'filter:sites'}, {'value': '[mode=LATEST, modifiedSince=null]', 'title': 'filter:timeRange'}, {'value': 'sdas01', 'title': 'server'}]}}, 'nil': False, 'globalScope': True, 'typeSubstituted': False}
Looking through this data, I can see the specific data I want: the 1349724919000 value that is labelled as 'creationTime'.
How can I write code that directly gets this value?
I don't need any searching logic to find this value. I can see what I need when I look at the response; I just need to know how to translate that into specific code to extract the specific value, in a hard-coded way. I read some tutorials, so I understand that I need to use [] to access elements of the nested lists and dictionaries; but I can't figure out exactly how it works for a complex case.
More generally, how can I figure out what the "path" is to the data, and write the code for it?
For reference, let's see what the original JSON would look like, with pretty formatting:
>>> print(json.dumps(my_json, indent=4))
{
    "name": "ns1:timeSeriesResponseType",
    "declaredType": "org.cuahsi.waterml.TimeSeriesResponseType",
    "scope": "javax.xml.bind.JAXBElement$GlobalScope",
    "value": {
        "queryInfo": {
            "creationTime": 1349724919000,
            "queryURL": "http://waterservices.usgs.gov/nwis/iv/",
            "criteria": {
                "locationParam": "[ALL:103232434]",
                "variableParam": "[00060, 00065]"
            },
            "note": [
                {
                    "value": "[ALL:103232434]",
                    "title": "filter:sites"
                },
                {
                    "value": "[mode=LATEST, modifiedSince=null]",
                    "title": "filter:timeRange"
                },
                {
                    "value": "sdas01",
                    "title": "server"
                }
            ]
        }
    },
    "nil": false,
    "globalScope": true,
    "typeSubstituted": false
}
That lets us see the structure of the data more clearly.
In the specific case, first we want to look at the corresponding value under the 'value' key in our parsed data. That is another dict; we can access the value of its 'queryInfo' key in the same way, and similarly the 'creationTime' from there.
To get the desired value, we simply put those accesses one after another:
my_json['value']['queryInfo']['creationTime'] # 1349724919000
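When the structure is too large to read comfortably, one way to discover the path is to let a small helper walk the data once and report where a key occurs; the reported path can then be hard-coded as above. This is only a sketch: find_paths is a hypothetical helper, and sample is a trimmed-down copy of the response shown earlier.

```python
def find_paths(data, target_key, path=()):
    """Yield every path (as a tuple of keys/indexes) that ends at target_key."""
    if isinstance(data, dict):
        for key, value in data.items():
            if key == target_key:
                yield path + (key,)
            # Keep descending: the key may also occur deeper down.
            yield from find_paths(value, target_key, path + (key,))
    elif isinstance(data, list):
        for index, item in enumerate(data):
            yield from find_paths(item, target_key, path + (index,))


sample = {
    "name": "ns1:timeSeriesResponseType",
    "value": {
        "queryInfo": {
            "creationTime": 1349724919000,
            "note": [
                {"value": "[ALL:103232434]", "title": "filter:sites"},
                {"value": "sdas01", "title": "server"},
            ],
        }
    },
}

creation_time_paths = list(find_paths(sample, "creationTime"))
title_paths = list(find_paths(sample, "title"))
```

Each element of a reported path corresponds to one pair of square brackets, so ('value', 'queryInfo', 'creationTime') translates to sample['value']['queryInfo']['creationTime'].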
You mentioned wanting to extract the value "in a hard-coded way". Keep in mind that if you access the API again, the new data might not match the code's expectations, so you may find it useful to add some error handling. For example, use .get() to access dictionaries in the data, rather than indexing:
name = my_json.get('name') # will return None if 'name' doesn't exist
Another way is to test for a key explicitly:
if 'name' in my_json:
    name = my_json['name']
else:
    pass  # handle the missing key here
However, these approaches may fail if further accesses are required. A placeholder result of None isn't a dictionary or a list, so attempts to access it that way will fail again (with TypeError). Since "Simple is better than complex" and "it's easier to ask for forgiveness than permission", the straightforward solution is to use exception handling:
try:
    creation_time = my_json['value']['queryInfo']['creationTime']
except (TypeError, KeyError):
    print("could not read the creation time!")
    # or substitute a placeholder, or raise a new exception, etc.
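If the same pattern is needed for several values, the exception handling can be wrapped in a small function that takes the path as a list. This is a sketch under the assumption that a default value is an acceptable substitute for missing data; get_path is a hypothetical name, and IndexError is added to the caught exceptions so that list indexes are covered too.

```python
def get_path(data, path, default=None):
    """Follow `path` (a sequence of keys/indexes) through nested dicts and lists.

    Returns `default` if any step is missing or has the wrong type.
    """
    current = data
    try:
        for step in path:
            current = current[step]
    except (KeyError, IndexError, TypeError):
        return default
    return current


sample = {
    "value": {
        "queryInfo": {
            "creationTime": 1349724919000,
            "note": [{"value": "sdas01", "title": "server"}],
        }
    }
}

creation_time = get_path(sample, ["value", "queryInfo", "creationTime"])
first_title = get_path(sample, ["value", "queryInfo", "note", 0, "title"])
missing = get_path(sample, ["value", "noSuchKey", "creationTime"], default="n/a")
```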
Here is an example of loading a single value from simple JSON data, and converting back and forth to JSON:
import json
# start with a plain Python dict
data = {"test1": "1", "test2": "2", "test3": "3"}
# serialize the dict to a JSON string
json_str = json.dumps(data)
# parse the JSON string back into a dict
resp = json.loads(json_str)
# print the resp
print(resp)
# extract an element in the response
print(resp['test1'])
Try this.
Here, I fetch only statecode from the COVID API (a JSON array).
import requests
r = requests.get('https://api.covid19india.org/data.json')
x = r.json()['statewise']
for i in x:
    print(i['statecode'])
Try this:
from functools import reduce
import re

def deep_get_imps(data, key: str):
    split_keys = re.split("[\\[\\]]", key)
    out_data = data
    for split_key in split_keys:
        if split_key == "":
            return out_data
        elif isinstance(out_data, dict):
            out_data = out_data.get(split_key)
        elif isinstance(out_data, list):
            try:
                sub = int(split_key)
            except ValueError:
                return None
            else:
                length = len(out_data)
                out_data = out_data[sub] if -length <= sub < length else None
        else:
            return None
    return out_data

def deep_get(dictionary, keys):
    return reduce(deep_get_imps, keys.split("."), dictionary)
Then you can use it like below:
res = {
    "status": 200,
    "info": {
        "name": "Test",
        "date": "2021-06-12"
    },
    "result": [{
        "name": "test1",
        "value": 2.5
    }, {
        "name": "test2",
        "value": 1.9
    }, {
        "name": "test1",
        "value": 3.1
    }]
}
>>> deep_get(res, "info")
{'name': 'Test', 'date': '2021-06-12'}
>>> deep_get(res, "info.date")
'2021-06-12'
>>> deep_get(res, "result")
[{'name': 'test1', 'value': 2.5}, {'name': 'test2', 'value': 1.9}, {'name': 'test1', 'value': 3.1}]
>>> deep_get(res, "result[2]")
{'name': 'test1', 'value': 3.1}
>>> deep_get(res, "result[-1]")
{'name': 'test1', 'value': 3.1}
>>> deep_get(res, "result[2].name")
'test1'

Save values from POST request of a list of dicts

I am trying to expose an API (if that's the correct way to say it). I am using Quart, a Python library modeled on Flask, and this is what my code looks like:
async def capture_post_request(request_json):
    for item in request_json:
        callbackidd = item['callbackid']
        print(callbackidd)

@app.route('/start_work/', methods=['POST'])
async def start_work():
    content_type = request.headers.get('content-type')
    if (content_type == 'application/json'):
        request_json = await request.get_json()
        loop = asyncio.get_event_loop()
        loop.create_task(capture_post_request(request_json))
        body = "Async Job Started"
        return body
    else:
        return 'Content-Type not supported!'
My schema looks like that:
[
    {
        "callbackid": "dd",
        "itemid": "234r",
        "input": [
            {
                "type": "thistype",
                "uri": "www.uri.com"
            }
        ],
        "destination": {
            "type": "thattype",
            "uri": "www.urino2.com"
        }
    },
    {
        "statusCode": "202"
    }
]
So far what I am getting is this error:
line 11, in capture_post_request
callbackidd = item['callbackid']
KeyError: 'callbackid'
I've tried so many stackoverflow posts to see how to iterate through my list of dicts but nothing worked. At one point in my start_work function I was using the get_data(as_text=True) method but still no results. In fact with the last method (or attr) I got:
TypeError: string indices must be integers
Any help on how to access those values is greatly appreciated. Cheers.
Your schema indicates there are two items in the request_json. The first indeed has the callbackid, the 2nd only has statusCode.
Debugging this should be easy:
async def capture_post_request(request_json):
    for item in request_json:
        print(item)
        callbackidd = item.get('callbackid')
        print(callbackidd)  # will be None in case of the 2nd 'item'
This will print two dicts:
{
    "callbackid": "dd",
    "itemid": "234r",
    "input": [
        {
            "type": "thistype",
            "uri": "www.uri.com"
        }
    ],
    "destination": {
        "type": "thattype",
        "uri": "www.urino2.com"
    }
}
And the 2nd, the cause of your KeyError:
{
    "statusCode": "202"
}
I included the 'fix' of sorts already:
callbackidd = item.get('callbackid')
This will default to None if the key isn't in the dict.
Hopefully this will get you further!
Edit
How to work with only the dict containing your key? There are two options.
First, using filter. Something like this:
def has_callbackid(dict_to_test):
    return 'callbackid' in dict_to_test

list_with_only_list_callbackid_items = list(filter(has_callbackid, request_json))
# Still a list at this point! With dicts which have the `callbackid` key
filter takes two arguments: a function that is called on each item and returns True if the item should be kept, and the iterable you want to filter.
You could also use a lambda function. It's a bit evil, but it serves the purpose just as well:
list_with_only_list_callbackid_items = list(filter(lambda x: 'callbackid' in x, request_json))
# Still a list at this point! With dict(s) which have the `callbackid` key
Option 2, simply loop over the result and only grab the one you want to use.
found_item = None  # default
for item in request_json:
    if 'callbackid' in item:
        found_item = item
        break  # found what we're looking for, stop now
# Do stuff with the found_item from this point.
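If more than one item may carry a callbackid, a list comprehension collects all of them in one pass. The sketch below uses an abbreviated copy of the payload from the question, with an assumed second callback item added for illustration; collect_callback_ids is a hypothetical helper name.

```python
def collect_callback_ids(items):
    """Return the callbackid of every dict in `items` that has one."""
    return [
        item["callbackid"]
        for item in items
        if isinstance(item, dict) and "callbackid" in item
    ]


request_json = [
    {"callbackid": "dd", "itemid": "234r"},
    {"statusCode": "202"},                   # no callbackid: skipped
    {"callbackid": "ee", "itemid": "567x"},  # assumed extra item
]

callback_ids = collect_callback_ids(request_json)
```

The isinstance check is there so that unexpected entries such as strings are skipped rather than being tested with the in operator by accident.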

How do I automate finding and replacing a JSON attribute?

This is an example of a JSON database that I will work with in my Python code.
{
    "name1": {
        "file": "abc",
        "delimiter": "n"
    },
    "name2": {
        "file": "def",
        "delimiter": "n"
    }
}
Pretend that a user of my code presses a GUI button that is supposed to change the name of "name1" to whatever the user typed into a textbox.
How do I change "name1" to a custom string without manually copying and pasting the entire JSON database into my actual code? I want the code to load the JSON database and change the name by itself.
Load the JSON object into a dict. Grab the name1 entry. Create a new entry with the desired key and the same value. Delete the original entry. Dump the dict back to your JSON file.
This is likely not the best way to perform the task. Use sed on Linux or its Windows equivalent (depending on your loaded apps) to make the simple stream-edit change.
If I understand the task correctly, here is an example:
import json
user_input = input('Name: ')
with open("db.json") as f:
    db = json.load(f)
# Move the old entry to the new key.
db[user_input] = db.pop('name1')
with open("db.json", 'w') as f:
    json.dump(db, f)
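Popping and re-inserting moves the renamed entry to the end of the dictionary, and silently overwrites an entry that already uses the new name. If either matters, the dictionary can be rebuilt instead. This is a sketch, not a required approach: rename_key is a hypothetical helper, and the choice to raise on a missing or colliding key is an assumption about the desired behavior.

```python
def rename_key(mapping, old_key, new_key):
    """Return a copy of `mapping` with old_key renamed, keeping the key order."""
    if old_key not in mapping:
        raise KeyError(old_key)
    if new_key != old_key and new_key in mapping:
        raise ValueError(f"key already exists: {new_key!r}")
    return {
        (new_key if key == old_key else key): value
        for key, value in mapping.items()
    }


db = {
    "name1": {"file": "abc", "delimiter": "n"},
    "name2": {"file": "def", "delimiter": "n"},
}

renamed = rename_key(db, "name1", "custom string")
renamed_keys = list(renamed)
```

The function works on the loaded dictionary, so it would sit between the json.load and json.dump calls.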
You can use the object_hook parameter that json.loads() accepts to detect JSON objects (dictionaries) that contain the old key and re-associate its value with the new key as they're encountered.
This can be implemented as a function, as follows:
import json

def replace_key(json_repr, old_key, new_key):
    def decode_dict(a_dict):
        try:
            entry = a_dict.pop(old_key)
        except KeyError:
            pass  # Old key not present - no change needed.
        else:
            a_dict[new_key] = entry
        return a_dict

    return json.loads(json_repr, object_hook=decode_dict)

data = '''{
    "name1": {
        "file": "abc",
        "delimiter": "n"
    },
    "name2": {
        "file": "def",
        "delimiter": "n"
    }
}
'''
new_data = replace_key(data, 'name1', 'custom string')
print(json.dumps(new_data, indent=4))
Output:
{
    "name2": {
        "file": "def",
        "delimiter": "n"
    },
    "custom string": {
        "file": "abc",
        "delimiter": "n"
    }
}
I got the basic idea from @Mike Brennan's answer to another JSON-related question, How to get string objects instead of Unicode from JSON?

json dumps not including attribute name

I am writing a python lambda function that reads in a json file from s3 and then will take one of the nodes and send it to another lambda function. Here is my code:
The JSON snippet I want:
"jobstreams": [
    {
        "jobname": "team-summary",
        "bucket": "aaa-bbb",
        "key": "team-summary.json"
    }
step 1 – convert JSON to python objects for processing
note: these I got from another Stack Overflow guru - thanks!!
import json
from collections import namedtuple
def _json_object_hook(d): return namedtuple('X', d.keys())(*d.values())
def json2obj(data): return json.loads(data, object_hook=_json_object_hook)
routes = json2obj(jsonText)
step 2 - I then traverse the python objects and find the json I need and dump it
for jobstream in jobstreams:
    x = json.dumps(jobstream, ensure_ascii=False)
However, when I print it out, I only get the values, not the attribute names. Why is that?
print(json.dumps(jobstream, ensure_ascii=False))
yields
["team-summary", "aaa-bbb", "team-summary.json"]
I'm assuming your full json file looks somewhat like what I have in my example
import json

js = {"jobstreams": [
    {
        "jobname": "team-summary",
        "bucket": "aaa-bbb",
        "key": "team-summary.json"
    },
    {
        "jobname": "team-2222",
        "bucket": "aaa-2222",
        "key": "team-222.json"
    }
]}

def extract_by_jobname(jobname):
    for d in js['jobstreams']:
        if d['jobname'] == jobname:
            return d

json.dumps(extract_by_jobname("team-summary"))
# '{"jobname": "team-summary", "bucket": "aaa-bbb", "key": "team-summary.json"}'
I ended up creating a new dictionary from the list that json.dumps gave me:
["team-summary", "aaa-bbb", "team-summary.json"]
Once I had the new (flat) dictionary, I converted that to JSON. Probably not the most efficient approach, but I have other fish to fry. Thanks to all for your help!
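As for why only the values appeared: a namedtuple instance is still a tuple, and json.dumps serializes tuples as JSON arrays, so the field names are lost. Converting the namedtuple back to a dictionary with its _asdict() method before dumping keeps them. The sketch below reuses the object hook from the question; to_plain is a hypothetical helper that applies _asdict() recursively, and json_text is an assumed minimal version of the file.

```python
import json
from collections import namedtuple


def _json_object_hook(d):
    return namedtuple('X', d.keys())(*d.values())


def to_plain(obj):
    """Recursively turn namedtuples back into dicts (and tuples into lists)."""
    if isinstance(obj, tuple) and hasattr(obj, '_asdict'):
        return {key: to_plain(value) for key, value in obj._asdict().items()}
    if isinstance(obj, (list, tuple)):
        return [to_plain(value) for value in obj]
    return obj


json_text = (
    '{"jobstreams": [{"jobname": "team-summary", '
    '"bucket": "aaa-bbb", "key": "team-summary.json"}]}'
)
routes = json.loads(json_text, object_hook=_json_object_hook)
jobstream = routes.jobstreams[0]

values_only = json.dumps(jobstream)           # tuple -> JSON array
with_keys = json.dumps(to_plain(jobstream))   # dict  -> JSON object
```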
