Python conditional get value from dict? - python

Given this json string, how can I pull out the value of id if code equals 4003?
error_json = '''{
'error': {
'meta': {
'code': 4003,
'message': 'Tracking already exists.',
'type': 'BadRequest'
},
'data': {
'tracking': {
'slug': 'fedex',
'tracking_number': '442783308929',
'id': '5b59ea69a9335baf0b5befcf',
'created_at': '2018-07-26T15:36:09+00:00'
}
}
}
}'''
I can't assume that anything other than the error element exists at the beginning, so the meta and code elements may or may not be there. The data, tracking, and id may or may not be there either.
The question is how to extract the value of id if the elements are all there. If any of the elements are missing, the value of id should be None

A python dictionary has a get(key, default) method that supports returning a default value if a key is not found. You can chain empty dictionaries to reach nested elements.
# use get method to access keys without raising exceptions
# see https://docs.quantifiedcode.com/python-anti-patterns/correctness/not_using_get_to_return_a_default_value_from_a_dictionary.html
code = error_json.get('error', {}).get('meta', {}).get('code', None)
if code == 4003:
# change id with _id to avoid overriding builtin methods
# see https://docs.quantifiedcode.com/python-anti-patterns/correctness/assigning_to_builtin.html
_id = error_json.get('error', {}).get('data', {}).get('tracking', {}).get('id', None)
Now, given a string that looks like a JSON you can parse it into a dictionary using json.loads(), as shown in Parse JSON in Python

I would try this:
import json
error_json = '''{
"error": {
"meta": {
"code": 4003,
"message": "Tracking already exists.",
"type": "BadRequest"
},
"data": {
"tracking": {
"slug": "fedex",
"tracking_number": "442783308929",
"id": "5b59ea69a9335baf0b5befcf",
"created_at": "2018-07-26T15:36:09+00:00"
}
}
}
}'''
parsed_json = json.loads(error_json)
try:
if parsed_json["error"]["meta"]["code"] == int(parsed_json["error"]["meta"]["code"]):
print(str(parsed_json["error"]["data"]["tracking"]["id"]))
except:
print("no soup for you")
Output:
5b59ea69a9335baf0b5befcf
A lot of python seems to be it's better to ask for forgiveness instead of permission. You could look up to see if that key in the dictionary is there, but really it's easier to just try. I'm specifically doing a check to make sure that code is an int, but you could change it around any way you'd like. If it can be other things you'd have to adjust it. There are several different solutions to this, it's really whatever you feel the most comfortable doing and maintaining.

You can add a check for key errors to the whole operation:
def get_id(j)
try:
if j['error']['meta’]['code'] == 4003:
return j['error']['data']['tracking']['id']
except KeyError:
pass
return None
If any element is missing, this function will quietly return None. If all the required elements are present, it will return the required ID. The only time it could really fail is if one of the intermediate keys does not refer to a dictionary. That could potentially result in a TypeError.

Related

For a DynamoDB item, how can i append a list which is a value of a key only if the element is not exists in that list?

I have an item in DynamoDB that has a key which has values as a list. I want to append that list with new elements, only if they are not already exists in that list. I don't want to duplicate any element in that list. Item's structure is like below:
{
"username": "blabla",
"my_list": ["element1","element2"]
}
I use boto3 library in Python and this is my code block for the update:
response = my_table.update_item(
Key = {
'username': "blabla"
},
UpdateExpression="SET my_list = list_append(my_list, :i)",
ExpressionAttributeValues={
':i': ["element1"],
},
ReturnValues="UPDATED_NEW"
)
I tried to use if_not_exist() in UpdateExpression but always got syntax errors. How can i properly achieve this goal? Thank you.
You should not use a list you need to use a String Set like defined as datatype here:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.NamingRulesDataTypes.html
For adding values to a Set you need to use ADD UpdateOperation like described here
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Expressions.UpdateExpressions.html#Expressions.UpdateExpressions.ADD
According to this you can change your code like
{
"username": "blabla",
"my_list": set(["element1","element2"])
}
# ---
response = my_table.update_item(
Key = {
'username': "blabla"
},
UpdateExpression="ADD my_list :i",
ExpressionAttributeValues={
':i': set(["element1"]),
},
ReturnValues="UPDATED_NEW"
)
This answer shows howto remove items from the set again
https://stackoverflow.com/a/56861830/13055947

Referencing Values in a List (syntax issue?) [duplicate]

I wrote some code to get data from a web API. I was able to parse the JSON data from the API, but the result I gets looks quite complex. Here is one example:
>>> my_json
{'name': 'ns1:timeSeriesResponseType', 'declaredType': 'org.cuahsi.waterml.TimeSeriesResponseType', 'scope': 'javax.xml.bind.JAXBElement$GlobalScope', 'value': {'queryInfo': {'creationTime': 1349724919000, 'queryURL': 'http://waterservices.usgs.gov/nwis/iv/', 'criteria': {'locationParam': '[ALL:103232434]', 'variableParam': '[00060, 00065]'}, 'note': [{'value': '[ALL:103232434]', 'title': 'filter:sites'}, {'value': '[mode=LATEST, modifiedSince=null]', 'title': 'filter:timeRange'}, {'value': 'sdas01', 'title': 'server'}]}}, 'nil': False, 'globalScope': True, 'typeSubstituted': False}
Looking through this data, I can see the specific data I want: the 1349724919000 value that is labelled as 'creationTime'.
How can I write code that directly gets this value?
I don't need any searching logic to find this value. I can see what I need when I look at the response; I just need to know how to translate that into specific code to extract the specific value, in a hard-coded way. I read some tutorials, so I understand that I need to use [] to access elements of the nested lists and dictionaries; but I can't figure out exactly how it works for a complex case.
More generally, how can I figure out what the "path" is to the data, and write the code for it?
For reference, let's see what the original JSON would look like, with pretty formatting:
>>> print(json.dumps(my_json, indent=4))
{
"name": "ns1:timeSeriesResponseType",
"declaredType": "org.cuahsi.waterml.TimeSeriesResponseType",
"scope": "javax.xml.bind.JAXBElement$GlobalScope",
"value": {
"queryInfo": {
"creationTime": 1349724919000,
"queryURL": "http://waterservices.usgs.gov/nwis/iv/",
"criteria": {
"locationParam": "[ALL:103232434]",
"variableParam": "[00060, 00065]"
},
"note": [
{
"value": "[ALL:103232434]",
"title": "filter:sites"
},
{
"value": "[mode=LATEST, modifiedSince=null]",
"title": "filter:timeRange"
},
{
"value": "sdas01",
"title": "server"
}
]
}
},
"nil": false,
"globalScope": true,
"typeSubstituted": false
}
That lets us see the structure of the data more clearly.
In the specific case, first we want to look at the corresponding value under the 'value' key in our parsed data. That is another dict; we can access the value of its 'queryInfo' key in the same way, and similarly the 'creationTime' from there.
To get the desired value, we simply put those accesses one after another:
my_json['value']['queryInfo']['creationTime'] # 1349724919000
I just need to know how to translate that into specific code to extract the specific value, in a hard-coded way.
If you access the API again, the new data might not match the code's expectation. You may find it useful to add some error handling. For example, use .get() to access dictionaries in the data, rather than indexing:
name = my_json.get('name') # will return None if 'name' doesn't exist
Another way is to test for a key explicitly:
if 'name' in resp_dict:
name = resp_dict['name']
else:
pass
However, these approaches may fail if further accesses are required. A placeholder result of None isn't a dictionary or a list, so attempts to access it that way will fail again (with TypeError). Since "Simple is better than complex" and "it's easier to ask for forgiveness than permission", the straightforward solution is to use exception handling:
try:
creation_time = my_json['value']['queryInfo']['creationTime']
except (TypeError, KeyError):
print("could not read the creation time!")
# or substitute a placeholder, or raise a new exception, etc.
Here is an example of loading a single value from simple JSON data, and converting back and forth to JSON:
import json
# load the data into an element
data={"test1": "1", "test2": "2", "test3": "3"}
# dumps the json object into an element
json_str = json.dumps(data)
# load the json to a string
resp = json.loads(json_str)
# print the resp
print(resp)
# extract an element in the response
print(resp['test1'])
Try this.
Here, I fetch only statecode from the COVID API (a JSON array).
import requests
r = requests.get('https://api.covid19india.org/data.json')
x = r.json()['statewise']
for i in x:
print(i['statecode'])
Try this:
from functools import reduce
import re
def deep_get_imps(data, key: str):
split_keys = re.split("[\\[\\]]", key)
out_data = data
for split_key in split_keys:
if split_key == "":
return out_data
elif isinstance(out_data, dict):
out_data = out_data.get(split_key)
elif isinstance(out_data, list):
try:
sub = int(split_key)
except ValueError:
return None
else:
length = len(out_data)
out_data = out_data[sub] if -length <= sub < length else None
else:
return None
return out_data
def deep_get(dictionary, keys):
return reduce(deep_get_imps, keys.split("."), dictionary)
Then you can use it like below:
res = {
"status": 200,
"info": {
"name": "Test",
"date": "2021-06-12"
},
"result": [{
"name": "test1",
"value": 2.5
}, {
"name": "test2",
"value": 1.9
},{
"name": "test1",
"value": 3.1
}]
}
>>> deep_get(res, "info")
{'name': 'Test', 'date': '2021-06-12'}
>>> deep_get(res, "info.date")
'2021-06-12'
>>> deep_get(res, "result")
[{'name': 'test1', 'value': 2.5}, {'name': 'test2', 'value': 1.9}, {'name': 'test1', 'value': 3.1}]
>>> deep_get(res, "result[2]")
{'name': 'test1', 'value': 3.1}
>>> deep_get(res, "result[-1]")
{'name': 'test1', 'value': 3.1}
>>> deep_get(res, "result[2].name")
'test1'

Save values from POST request of a list of dicts

I a trying to expose an API (if that's the correct way to say it). I am using Quart, a python library made out of Flask and this is what my code looks like:
async def capture_post_request(request_json):
for item in request_json:
callbackidd = item['callbackid']
print(callbackidd)
#app.route('/start_work/', methods=['POST'])
async def start_work():
content_type = request.headers.get('content-type')
if (content_type == 'application/json'):
request_json = await request.get_json()
loop = asyncio.get_event_loop()
loop.create_task(capture_post_request(request_json))
body = "Async Job Started"
return body
else:
return 'Content-Type not supported!'
My schema looks like that:
[
{
"callbackid": "dd",
"itemid": "234r",
"input": [
{
"type": "thistype",
"uri": "www.uri.com"
}
],
"destination": {
"type": "thattype",
"uri": "www.urino2.com"
}
},
{
"statusCode": "202"
}
]
So far what I am getting is this error:
line 11, in capture_post_request
callbackidd = item['callbackid']
KeyError: 'callbackid'
I've tried so many stackoverflow posts to see how to iterate through my list of dicts but nothing worked. At one point in my start_work function I was using the get_data(as_text=True) method but still no results. In fact with the last method (or attr) I got:
TypeError: string indices must be integers
Any help on how to access those values is greatly appreciated. Cheers.
Your schema indicates there are two items in the request_json. The first indeed has the callbackid, the 2nd only has statusCode.
Debugging this should be easy:
async def capture_post_request(request_json):
for item in request_json:
print(item)
callbackidd = item.get('callbackid')
print(callbackidd) # will be None in case of the 2nd 'item'
This will print two dicts:
{
"callbackid": "dd",
"itemid": "234r",
"input": [
{
"type": "thistype",
"uri": "www.uri.com"
}
],
"destination": {
"type": "thattype",
"uri": "www.urino2.com"
}
}
And the 2nd, the cause of your KeyError:
{
"statusCode": "202"
}
I included the 'fix' of sorts already:
callbackidd = item.get('callbackid')
This will default to None if the key isn't in the dict.
Hopefully this will get you further!
Edit
How to work with only the dict containing your key? There are two options.
First, using filter. Something like this:
def has_callbackid(dict_to_test):
return 'callbackid' in dict_to_test
list_with_only_list_callbackid_items = list(filter(has_callbackid, request_json))
# Still a list at this point! With dicts which have the `callbackid` key
Filter accepts some arguments:
Function to call to determine if the value being tested should be filtered out or not.
The iterable you want to filter
Could also use a 'lambda function', but it's a bit evil. But serves the purpose just as well:
list_with_only_list_callbackid_items = list(filter(lambda x: 'callbackid' in x, request_json))
# Still a list at this point! With dict(s) which have the `callbackid` key
Option 2, simply loop over the result and only grab the one you want to use.
found_item = None # default
for item in request_json:
if 'callbackid' in item:
found_item = item
break # found what we're looking for, stop now
# Do stuff with the found_item from this point.

How to remove all the “$oid” and "$date" in a .json file?

I have a .json file saved in my computer that contains things like $oid or $date which will later cause me trouble in BigQuery. For example:
{
"_id": {
"$oid": "5e7511c45cb29ef48b8cfcff"
},
"about": "some text",
"creationDate": {
"$date": "2021-01-05T14:59:58.046Z"
}
}
I want it to look like (so it’s not just removing some letters from the string):
{
"_id": "5e7511c45cb29ef48b8cfcff",
"about": "some text",
"creationDate": "2021-01-05T14:59:58.046Z"
}
With Pymongo, one can do something like:
my_file['id']=my_file['id']['$oid']
my_file['creationDate']=my_file['creationDate']['$date']
How would this look without using Pymongo, since I want to first find such keys and remove all the problematic $oid or $date?
Edit: sorry for the bad wording, what I meant to say was whether it was possible to find the keys that contain these problematic $ without writing down every key in the dictionary. In reality, there are more files with huge tables and many of them can contain this.
The $oid and $date fields appear when you use the default encoder using bson.json_util.dumps().
If you have control over where these files come from, you might want to fix the "problem" at source rather than having to code around it. The following code snippet shows how you can implement a custom encoder to format the output how you need it:
import json
import datetime
from pymongo import MongoClient
class MyJsonEncoder(json.JSONEncoder):
def default(self, obj):
if isinstance(obj, datetime.datetime):
return obj.isoformat()
if hasattr(obj, '__str__'): # This will handle ObjectIds
return str(obj)
return super(MyJsonEncoder, self).default(obj)
db = MongoClient()['mydatabase']
db.mycollection.insert_one({'Date': datetime.datetime.now()})
record = db.mycollection.find_one()
print(json.dumps(record, indent=4, cls=MyJsonEncoder))
prints:
{
"_id": "60a55e3cea5bf57c79177871",
"Date": "2021-05-19T19:51:40.808000"
}
I would try something as shown below.
import json
file = open('data.json','r')
data = json.load(file)
for k,v in data.items():
#check if key has dict value
if type(v) == dict:
#find id with $
r = list(data[k].keys())[0]
#change value if $ occurs
if r[0] == '$':
data[k] = data[k][r]
print(data)
seems like we get this output.
{'_id': '5e7511c45cb29ef48b8cfcff', 'about': 'some text', 'creationDate': '2021-01-05T14:59:58.046Z'}

The best way to transform a response to a json format in the example

Appreciate if you could help me for the best way to transform a result into json as below.
We have a result like below, where we are getting an information on the employees and the companies. In the result, somehow, we are getting a enum like T, but not for all the properties.
[ {
"T.id":"Employee_11",
"T.category":"Employee",
"node_id":["11"]
},
{
"T.id":"Company_12",
"T.category":"Company",
"node_id":["12"],
"employeecount":800
},
{
"T.id":"id~Employee_11_to_Company_12",
"T.category":"WorksIn",
},
{
"T.id":"Employee_13",
"T.category":"Employee",
"node_id":["13"]
},
{
"T.id":"Parent_Company_14",
"T.category":"ParentCompany",
"node_id":["14"],
"employeecount":900,
"childcompany":"Company_12"
},
{
"T.id":"id~Employee_13_to_Parent_Company_14",
"T.category":"Contractorin",
}]
We need to transform this result into a different structure and grouping based on the category, if category in Employee, Company and ParentCompany, then it should be under the node_properties object, else, should be in the edge_properties. And also, apart from the common properties(property_id, property_category and node), different properties to be added if the category is company and parent company. There are few more logic also where we have to get the from and to properties of the edge object based on the 'to' . the expected response is,
"node_properties":[
{
"property_id":"Employee_11",
"property_category":"Employee",
"node":{node_id: "11"}
},
{
"property_id":"Company_12",
"property_category":"Company",
"node":{node_id: "12"},
"employeecount":800
},
{
"property_id":"Employee_13",
"property_category":"Employee",
"node":{node_id: "13"}
},
{
"property_id":"Company_14",
"property_category":"ParentCompany",
"node":{node_id: "14"},
"employeecount":900,
"childcompany":"Company_12"
}
],
"edge_properties":[
{
"from":"Employee_11",
"to":"Company_12",
"property_id":"Employee_11_to_Company_12",
},
{
"from":"Employee_13",
"to":"Parent_Company_14",
"property_id":"Employee_13_to_Parent_Company_14",
}
]
In java, we have used the enhanced for loop, switch etc. How we can write the code in the python to get the structure as above from the initial result structure. ( I am new to python), thank you in advance.
Regards
Here is a method that I quickly made, you can adjust it to your requirements. You can use regex or your own function to get the IDs of the edge_properties then assign it to an object like the way I did. I am not so sure of your full requirements but if that list that you gave is all the categories then this will be sufficient.
def transform(input_list):
node_properties = []
edge_properties = []
for input_obj in input_list:
# print(obj)
new_obj = {}
if input_obj['T.category'] == 'Employee' or input_obj['T.category'] == 'Company' or input_obj['T.category'] == 'ParentCompany':
new_obj['property_id'] = input_obj['T.id']
new_obj['property_category'] = input_obj['T.category']
new_obj['node'] = {input_obj['node_id'][0]}
if "employeecount" in input_obj:
new_obj['employeecount'] = input_obj['employeecount']
if "childcompany" in input_obj:
new_obj['childcompany'] = input_obj['childcompany']
node_properties.append(new_obj)
else: # You can do elif == to as well based on your requirements if there are other outliers
# You can use regex or whichever method here to split the string and add the values like above
edge_properties.append(new_obj)
return [node_properties, edge_properties]

Categories

Resources