I have a .json file with many entries looking like this:
{
"name": "abc",
"time": "20220607T190731.442",
"id": "123",
"relatedIds": [
{
"id": "456",
"source": "sourceA"
},
{
"id": "789",
"source": "sourceB"
}
],
}
I am saving each entry in a python object, however, I only need the related ID from source A. Problem is, the related ID from source A is not always first place in that nested list.
So data['relatedIds'][0]['id'] is not reliable to yield the right Id.
Currently I am solving the issue like this:
import json
with open("filepath", 'r') as file:
data = json.load(file)
for value in data['relatedIds']:
if(value['source'] == 'sourceA'):
id_from_a = value['id']
entry = Entry(data['name'], data['time'], data['id'], id_from_a)
I don't think this approach is the optimal solution though, especially if relatedIds list gets longer and more entries appended to the JSON file.
Is there a more sophisticated way of singling out this 'id' value from a specified source without looping through all entries in that nested list?
For a cleaner solution, you could try using python's filter() function with a simple lambda:
import json
with open("filepath", 'r') as file:
data = json.load(file)
filtered_data = filter(lambda a : a["source"] == "sourceA", data["relatedIds"])
id_from_a = next(filtered_data)['id']
entry = Entry(data['name'], data['time'], data['id'], id_from_a)
Correct me if I misunderstand how your json file looks, but it seems to work for me.
One step at a time, in order to get to all entries:
>>> data["relatedIds"]
[{'id': '789', 'source': 'sourceB'}, {'id': '456', 'source': 'sourceA'}]
Next, in order to get only those entries with source=sourceA:
>>> [e for e in data["relatedIds"] if e["source"] == "sourceA"]
[{'id': '456', 'source': 'sourceA'}]
Now, since you don't want the whole entry, but just the ID, we can go a little further:
>>> [e["id"] for e in data["relatedIds"] if e["source"] == "sourceA"]
['456']
From there, just grab the first ID:
>>> [e["id"] for e in data["relatedIds"] if e["source"] == "sourceA"][0]
'456'
Can you get whatever generates your .json file to produce the relatedIds as an object rather than a list?
{
"name": "abc",
"time": "20220607T190731.442",
"id": "123",
"relatedIds": {
"sourceA": "456",
"sourceB": "789"
}
}
If not, I'd say you're stuck looping through the list until you find what you're looking for.
Related
I wrote some code to get data from a web API. I was able to parse the JSON data from the API, but the result I gets looks quite complex. Here is one example:
>>> my_json
{'name': 'ns1:timeSeriesResponseType', 'declaredType': 'org.cuahsi.waterml.TimeSeriesResponseType', 'scope': 'javax.xml.bind.JAXBElement$GlobalScope', 'value': {'queryInfo': {'creationTime': 1349724919000, 'queryURL': 'http://waterservices.usgs.gov/nwis/iv/', 'criteria': {'locationParam': '[ALL:103232434]', 'variableParam': '[00060, 00065]'}, 'note': [{'value': '[ALL:103232434]', 'title': 'filter:sites'}, {'value': '[mode=LATEST, modifiedSince=null]', 'title': 'filter:timeRange'}, {'value': 'sdas01', 'title': 'server'}]}}, 'nil': False, 'globalScope': True, 'typeSubstituted': False}
Looking through this data, I can see the specific data I want: the 1349724919000 value that is labelled as 'creationTime'.
How can I write code that directly gets this value?
I don't need any searching logic to find this value. I can see what I need when I look at the response; I just need to know how to translate that into specific code to extract the specific value, in a hard-coded way. I read some tutorials, so I understand that I need to use [] to access elements of the nested lists and dictionaries; but I can't figure out exactly how it works for a complex case.
More generally, how can I figure out what the "path" is to the data, and write the code for it?
For reference, let's see what the original JSON would look like, with pretty formatting:
>>> print(json.dumps(my_json, indent=4))
{
"name": "ns1:timeSeriesResponseType",
"declaredType": "org.cuahsi.waterml.TimeSeriesResponseType",
"scope": "javax.xml.bind.JAXBElement$GlobalScope",
"value": {
"queryInfo": {
"creationTime": 1349724919000,
"queryURL": "http://waterservices.usgs.gov/nwis/iv/",
"criteria": {
"locationParam": "[ALL:103232434]",
"variableParam": "[00060, 00065]"
},
"note": [
{
"value": "[ALL:103232434]",
"title": "filter:sites"
},
{
"value": "[mode=LATEST, modifiedSince=null]",
"title": "filter:timeRange"
},
{
"value": "sdas01",
"title": "server"
}
]
}
},
"nil": false,
"globalScope": true,
"typeSubstituted": false
}
That lets us see the structure of the data more clearly.
In the specific case, first we want to look at the corresponding value under the 'value' key in our parsed data. That is another dict; we can access the value of its 'queryInfo' key in the same way, and similarly the 'creationTime' from there.
To get the desired value, we simply put those accesses one after another:
my_json['value']['queryInfo']['creationTime'] # 1349724919000
I just need to know how to translate that into specific code to extract the specific value, in a hard-coded way.
If you access the API again, the new data might not match the code's expectation. You may find it useful to add some error handling. For example, use .get() to access dictionaries in the data, rather than indexing:
name = my_json.get('name') # will return None if 'name' doesn't exist
Another way is to test for a key explicitly:
if 'name' in resp_dict:
name = resp_dict['name']
else:
pass
However, these approaches may fail if further accesses are required. A placeholder result of None isn't a dictionary or a list, so attempts to access it that way will fail again (with TypeError). Since "Simple is better than complex" and "it's easier to ask for forgiveness than permission", the straightforward solution is to use exception handling:
try:
creation_time = my_json['value']['queryInfo']['creationTime']
except (TypeError, KeyError):
print("could not read the creation time!")
# or substitute a placeholder, or raise a new exception, etc.
Here is an example of loading a single value from simple JSON data, and converting back and forth to JSON:
import json
# load the data into an element
data={"test1": "1", "test2": "2", "test3": "3"}
# dumps the json object into an element
json_str = json.dumps(data)
# load the json to a string
resp = json.loads(json_str)
# print the resp
print(resp)
# extract an element in the response
print(resp['test1'])
Try this.
Here, I fetch only statecode from the COVID API (a JSON array).
import requests
r = requests.get('https://api.covid19india.org/data.json')
x = r.json()['statewise']
for i in x:
print(i['statecode'])
Try this:
from functools import reduce
import re
def deep_get_imps(data, key: str):
split_keys = re.split("[\\[\\]]", key)
out_data = data
for split_key in split_keys:
if split_key == "":
return out_data
elif isinstance(out_data, dict):
out_data = out_data.get(split_key)
elif isinstance(out_data, list):
try:
sub = int(split_key)
except ValueError:
return None
else:
length = len(out_data)
out_data = out_data[sub] if -length <= sub < length else None
else:
return None
return out_data
def deep_get(dictionary, keys):
return reduce(deep_get_imps, keys.split("."), dictionary)
Then you can use it like below:
res = {
"status": 200,
"info": {
"name": "Test",
"date": "2021-06-12"
},
"result": [{
"name": "test1",
"value": 2.5
}, {
"name": "test2",
"value": 1.9
},{
"name": "test1",
"value": 3.1
}]
}
>>> deep_get(res, "info")
{'name': 'Test', 'date': '2021-06-12'}
>>> deep_get(res, "info.date")
'2021-06-12'
>>> deep_get(res, "result")
[{'name': 'test1', 'value': 2.5}, {'name': 'test2', 'value': 1.9}, {'name': 'test1', 'value': 3.1}]
>>> deep_get(res, "result[2]")
{'name': 'test1', 'value': 3.1}
>>> deep_get(res, "result[-1]")
{'name': 'test1', 'value': 3.1}
>>> deep_get(res, "result[2].name")
'test1'
I have some json data similar to this...
{
"people": [
{
"name": "billy",
"age": "12"
...
...
},
{
"name": "karl",
"age": "31"
...
...
},
...
...
]
}
At the moment I can do this to get a entry from the people list...
wantedPerson = "karl"
for person in people:
if person['name'] == wantedPerson:
* I have the persons entry *
break
Is there a better way of doing this? Something similar to how we can .get('key') ?
Thanks,
Chris
Assuming you load that json data using the standard library for it, you're fairly close to optimal, perhaps you were looking for something like this:
from json import loads
text = '{"people": [{"name": "billy", "age": "12"}, {"name": "karl", "age": "31"}]}'
data = loads(text)
people = [p for p in data['people'] if p['name'] == 'karl']
If you frequently need to access this data, you might just do something like this:
all_people = {p['name']: p for p in data['people']}
print(all_people['karl'])
That is, all_people becomes a dictionary that uses the name as a key, so you can access any person in it quickly by accessing them by name. This assumes however that there are no duplicate names in your data.
First, there's no problem with your current 'naive' approach - it's clear and efficient since you can't find the value you're looking for without scanning the list.
It seems that you refer to better as shorter, so if you want a one-liner solution, consider the following:
next((person for person in people if person.name == wantedPerson), None)
It gets the first person in the list that has the required name or None if no such person was found.
similarly
ps = {
"people": [
{
"name": "billy",
"age": "12"
},
{
"name": "karl",
"age": "31"
},
]
}
print([x for x in ps['people'] if 'karl' in x.values()])
For possible alternatives or details see e.g. # Get key by value in dictionary
I've been messing around with JSON for some time. I want to get the values of "box" and "text" in this format using python can someone help me out how to resolve this example output:[92,197,162,215,AUTHORS,...!]
{ "form": [ { "box": [ 92,162,197,215], "text": "AUTHORS", "label": "question", "words": [ { "box": [ 92,197,162,215 ],"text": "AUTHORS"} ], "linking": [[0,13]],"id": 0 },
import os
import json
# Directory name consisting of json
file = open('033.json')
data = json.load(file)
result = []
for value in data['form']:
my_dict=[]
my_dict=value.get('box')
print(my_dict)
result.append(my_dict)
Probably like this:
collector = []
for obj in form:
collector.append({"box": obj["box"], "text": obj["text"]})
print(collector)
Okay, few issues with your code -
Why is your list named my_dict? A name should indicate what the object is/ what it contains. Your name does the opposite and if someone works with that code in the future then it will most likely confuse them.
Why are you initializing a list before doing this value.get('box')?
As for the solution, it is a short piece of code that would require 2 lines of code.
result = []
for form_dict in data['form']:
result.append(tuple(form_dict[key]
for key in ('box', 'text') if key in form_dict))
That piece of code would result in this: [([92, 162, 197, 215], 'AUTHORS')] based on the data you provided.
This is assuming that there can be more items in the data['form'] list, otherwise the for loop is not needed.
I'm pretty new to Python, so just working my way through understanding the data sets.
I'm having a little trouble producing the JSON output that is required for the API I am working with.
I am using
import json
json.load(data_file)
Working with Python dictionary and then doing
json.dump(dict, json_data)
My data needs to look like the following when it is output.
{
"event":{
"id":10006,
"event_name":"My Event Name",
},
"sub event":[
],
"attendees":[
{
"id":11201,
"first_name":"Jeff",
"last_name":"Smith",
},
{
"id":10002,
"first_name":"Victoria",
"last_name":"Baker",
},
]
}
I have been able to create the arrays in python and dump to json, but I am having difficulty creating the event "object" in the dictionary. I am using the below:
attendees = ['attendees']
attendeesdict = {}
attendeesdict['first_name'] = "Jeff"
attendees.append(attendeesdict.copy())
Can anyone help me add the "event" object properly?
In general, going from JSON to dictionary is almost no work because the two are very similar, if not identical:
attendees = [
{
"first_name": "Jeff"
# Add other fields which you need here
},
{
"first_name": "Victoria"
}
]
In this instance, attendees is a list of dictionaries. For the event:
event = {
"id": 10006,
"event_name": "My Event Name"
}
Putting it all together:
data = {
"event": event,
"sub event": [],
"attendees": attendees
}
Now, you can convert it to a JSON object, ready to send to your API:
json_object = json.dumps(data)
Assuming you have built all the values elsewhere and now you're just putting them together:
result = {'event':event_dict, 'sub event':subevent_list, 'attendees':attendees_list}
If you want just to statically create a nested dict, you can use a single literal. If you paste the JSON above into python code, you would get a valid dict literal.
Construct your dicts and add like below
{
"event":"add your dict"
"sub event":["add your dict"],
"attendees":["add your dict"]
}
i'm having trouble with a simple question.
a = {
"apiVersion": "2.1",
"data": {
"startIndex": 1,
"items": [{
"id": "YVA3UoZM0zU",
"title": "Trailer - Lisbela eo Prisioneiro"
}]
}
}
i don't know how to get the info id.
this is a string.
so, i tried to make this
import simplejson as json
>>> type(js)
<type 'dict'>
js = json.loads(a)
print js['data'{'items'[{'id'}]}]
>>> syntax error
this syntax is invalid, how could I get this info? it's supposed to be easy. where I'm making wrong?
Try:
js['data']['items'][0]['id']
It would appear that there may be multiple items in this structure. If you'd like to extract all item ids as a list, the following will do it:
[item['id'] for item in js['data']['items']]