Retrieve JSON Keys In Python - python

My goal is to iterate through every element in classes and add the value of class in classes into a new list.
JSON Structure:
{
"images": [
{
"classifiers": [
{
"classes": [
{
"class": "street",
"score": 0.846
},
{
"class": "road",
"score": 0.85
}
]
}
]
}
]
}
In the above JSON example, the new list should contain:
{'street','road'}
I tried to iterate over json_data['images'][0]['classifiers']['classes'] and for each one list.append() the value of class.
list = list()
def func(json_data):
    for item in json_data['images'][0]['classifiers']['classes']:
        list.append(item['class'])
    return(list)
I am receiving a TypeError which is:
TypeError: list indices must be integers or slices, not str
What I am getting from this TypeError is that it attempts to add a string to the list but list.append() does not accept a string as a parameter. I need to somehow convert the string into something else.
My question is what should I convert the string into so that list.append() will accept it as a parameter?

The first issue is that classifiers is also a list, so you need an additional index to get at classes. This will work:
for item in json_data['images'][0]['classifiers'][0]['classes']:
Second, you want the result as a set, not a list, so you can do: return set(lst), where lst is the list you built (avoid calling it list, which shadows the built-in type). Note that sets are unordered, so don't expect the ordering of items to match that of the list.
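Putting both points together, a minimal sketch might look like the following. The function name collect_classes is hypothetical, and looping over every image and classifier (rather than only index 0) is an assumption that goes slightly beyond the question.

```python
import json

# Sample data copied from the question
raw = '''
{"images": [{"classifiers": [{"classes": [
    {"class": "street", "score": 0.846},
    {"class": "road", "score": 0.85}
]}]}]}
'''
json_data = json.loads(raw)

def collect_classes(json_data):
    """Return the set of 'class' values found under images -> classifiers -> classes."""
    found = []
    for image in json_data['images']:             # 'images' is a list
        for classifier in image['classifiers']:   # 'classifiers' is a list too
            for item in classifier['classes']:    # each item is a dict
                found.append(item['class'])
    return set(found)

print(collect_classes(json_data))
```

Because the result is a set, the printed order of 'street' and 'road' may vary between runs.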

Related

how to get access inner array of an object in dictionary

I have a json file with this structure:
[
{
"_id": "62b2ebff955fe1001d225781",
"datasetName": "comments",
"action": "dataset",
"comment": "Initial data!",
"instances": [
"62b2eb94955fe1001d22576a",
"62b2eba1955fe1001d22576e",
"62b2eba9955fe1001d225770",
"62b2ebb9955fe1001d225772",
"62b2ebcc955fe1001d225774",
"62b2ebe2955fe1001d225778"
],
"label": [
"Contemporary",
"Tango"
]
}
]
I wanted to know how I can access the values of "label" and how I can count the length of the "instances" list.
So your JSON is a list of dicts. You can load it with the json standard library and then iterate over the elements in the list. As each of these elements is a dict, you can get the instances list by its key. As it is a list, you can simply call the len() function to get the length.
See the code below as an example:
import json
with open("somefile.json") as infile:
    data = json.load(infile)
for element in data:
    print(len(element["instances"]))
Note that if your json file contains more elements in the root list, it will of course print out the length of the instances list for each of these elements.
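The other half of the question, reading the values of "label", works the same way, since "label" is just another key holding a list. A short sketch, using a trimmed copy of the question's data inlined as a string so that it runs without a file (the variable names are only illustrative):

```python
import json

# Trimmed copy of the question's data; only two instances are kept
raw = '''
[
  {
    "_id": "62b2ebff955fe1001d225781",
    "instances": ["62b2eb94955fe1001d22576a", "62b2eba1955fe1001d22576e"],
    "label": ["Contemporary", "Tango"]
  }
]
'''
data = json.loads(raw)

for element in data:
    labels = element["label"]                 # a plain Python list of strings
    n_instances = len(element["instances"])   # number of entries in "instances"
    print(labels, n_instances)

# Direct access to the first element's labels
first_labels = data[0]["label"]
```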

Filter JSON using Jmespath and return value if expression exist, if it doesn't return None/Null (python)

How can I get JMESPath to return a value from a JSON document only if it exists, and return None/null if it doesn't? I am using JMESPath in a Python application; below is an example of simple JSON data.
{
"name": "Sarah",
"region": "south west",
"age": 21,
"occupation": "teacher",
"height": 145,
"education": "university",
"favouriteMovie": "matrix",
"gender": "female",
"country": "US",
"level": "medium",
"tags": [],
"data": "abc",
"moreData": "xyz",
"logging" : {
"systemLogging" : [ {
"enabled" : true,
"example" : [ "this", "is", "an", "example", "array" ]
} ]
}
}
For example, I want it to check whether the key "occupation" contains the word "banker" and, if it doesn't, return null.
In this case, if I run the JMESPath query "occupation == 'banker'" I get false. However, a more complicated query like "logging.systemLogging[?enabled == `false`]" results in an empty array [] because nothing matches, which is what I want.
The reason I want it to return None or null is that in another part of the application (my base class) I have code that checks whether the dictionary/JSON data returns a value or not; this code iterates through an array of dictionaries/JSON data like the one above.
One thing I've noticed with JMESPath is that its return values are inconsistent. With more complicated dictionaries I am able to achieve what I want, but with simple dictionaries I can't. Also, if you use a function, e.g. starts_with, it returns a boolean, but if you just use an expression it returns the value you are looking for if it exists, and otherwise None or an empty array.
This is traditionally accomplished by:
import json
dictionary = json.loads(my_json)
dictionary.get(key, None) # None is the default value returned when key is missing.
That will work if you know the exact structure to expect from the JSON. Alternatively, you can make two calls to JMESPath: one to try to get the value / None / empty list, and one to run the query you want.
The problem is that JMESPath is trying to answer your query: does this structure contain this information pattern? It makes sense that the result of such a query should be True/False. If you want to get something like an empty list back, you need to modify your query to ask "Give me back all instances where this structure contains the information I'm looking for" or "Give me back the first instance where this structure contains the information I'm looking for."
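For the simple flat case, the dict.get approach can be wrapped in a small helper that returns the value only when it matches and None otherwise. This is a plain-Python sketch, not a JMESPath feature; the helper name value_if and the trimmed record are assumptions made for illustration.

```python
import json

# Trimmed version of the question's record
raw = '{"name": "Sarah", "occupation": "teacher", "age": 21}'
record = json.loads(raw)

def value_if(record, key, expected):
    """Return record[key] when it equals expected, otherwise None."""
    value = record.get(key)  # None when the key is missing
    return value if value == expected else None

print(value_if(record, "occupation", "teacher"))
print(value_if(record, "occupation", "banker"))
```

The first call returns the stored value and the second returns None, which is the "value or nothing" behavior the calling code in the question expects.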
Filters in JMESPath apply to arrays (or lists, in Python terms).
So, indeed, your case is not a really common one.
That said, you can create an array out of a hash (or dictionary, in Python terms) using the to_array function.
Then, since you know you started from a hash, you can select the first element of the created array; if the array ends up being empty, this returns null.
To me, at least, this looks consistent: an array can be empty [], but an empty object is a null.
To use this trick, though, you will also have to reset the projection you created out of the array, with a pipe expression:
Projections are an important concept in JMESPath. However, there are times when projection semantics are not what you want. A common scenario is when you want to operate of the result of a projection rather than projecting an expression onto each element in the array.
For example, the expression people[*].first will give you an array containing the first names of everyone in the people array. What if you wanted the first element in that list? If you tried people[*].first[0] that you just evaluate first[0] for each element in the people array, and because indexing is not defined for strings, the final result would be an empty array, []. To accomplish the desired result, you can use a pipe expression, <expression> | <expression>, to indicate that a projection must stop.
Source: https://jmespath.org/tutorial.html#pipe-expressions
And so, with all this, the expression ends up being:
to_array(@)[?occupation == `banker`]|[0]
which gives
null
on your example JSON, while the expression
to_array(@)[?occupation == `teacher`]|[0]
would return your existing object:
{
"name": "Sarah",
"region": "south west",
"age": 21,
"occupation": "teacher",
"height": 145,
"education": "university",
"favouriteMovie": "matrix",
"gender": "female",
"country": "US",
"level": "medium",
"tags": [],
"data": "abc",
"moreData": "xyz",
"logging": {
"systemLogging": [
{
"enabled": true,
"example": [
"this",
"is",
"an",
"example",
"array"
]
}
]
}
}
Following this trick, your other tests will probably start to work, e.g.
to_array(@)[?starts_with(occupation, `tea`)]|[0]
will give you back your object, while
to_array(@)[?starts_with(occupation, `ban`)]|[0]
will give you null.
And if you only need the value of the occupation property, since you are back to a hash now, it is as simple as, e.g.
to_array(@)[?starts_with(occupation, `tea`)]|[0].occupation
which gives
"teacher"
and
to_array(@)[?starts_with(occupation, `ban`)]|[0].occupation
which gives
null

How to extract data from json and add additional values to the extracted values using python?

I want to parse the value from json response using python and assign additional value to the list
{ "form": [{ "box": [60,120,260,115], "text": "hello", "label": "question", "words": [{ "box": [90,190,160,215 ],"text": "hello"} ], "linking": [[0,13]],"id": 0 }]}
I am trying to parse the value and assign it to a variable using Python. What I am trying to achieve is:
If the actual output is ([60,120,260,115],hello), I want to add a few more values to the list, so the expected output should be:
([60,120,260,120,260,115,60,115],hello)
Try this:
tmp_json = { "form": [{ "box": [60,120,260,115], "text": "hello", "label": "question", "words": [{ "box": [90,190,160,215 ],"text": "hello"} ], "linking": [[0,13]],"id": 0 }]}
# Then do whatever you need to do with the list by accessing it as follows
# tmp_json["form"][0]["box"]
You can iterate through all elements of the list and, if an item matches the required condition, extend its existing list with the required values.
for item in tmp_json["form"]:
    # Check that the item's box holds the expected values and its text is "hello"
    if item["box"] == [60, 120, 260, 115] and item["text"] == "hello":
        # list.extend() appends several values in place; choose the values you need
        item["box"].extend([120, 115, 260])
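The expected output in the question looks like the four corners of the box listed in order, so instead of extending the list it may be simpler to build a new one. The sketch below assumes the box is stored as [x1, y1, x2, y2] and that the target order is (x1, y1), (x2, y1), (x2, y2), (x1, y2); this is inferred from the single example given, and the helper name box_to_corners is hypothetical.

```python
# Response trimmed to the fields this sketch uses
response = {"form": [{"box": [60, 120, 260, 115], "text": "hello", "id": 0}]}

def box_to_corners(box):
    """Expand [x1, y1, x2, y2] into eight values, one x/y pair per corner."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2, y1, x2, y2, x1, y2]

# Build (corners, text) tuples without modifying the original data
results = [(box_to_corners(item["box"]), item["text"]) for item in response["form"]]
print(results)
```

Building a new list leaves the parsed response untouched, which helps if the original four-value box is still needed elsewhere.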

How can I get the dates from this nested list in Python

My data contains nested lists and I am trying to create a list that contains only the date information from the second layer of nested lists.
"DateMap": {
"2020-12-04:0": {
"55.0": [
{
}]},
"2020-12-11:7": {
"60.0": [
{
}]}
}
I want to get a list that is like this mylist = ["2020-12-04:0", "2020-12-11:7"]
I have looked into using regex and list comprehensions and this is the expression I have found to match the dates ^\d{4}-\d\d-\d\d:\d\d?$
How can I make this work?
Use the .keys() method. It returns all the keys of a dictionary, which is exactly what you're looking for. If DateMap is inside a dictionary, say dic, just do the same thing for dic["DateMap"].
DateMap = {
    "2020-12-04:0": {
        "55.0": [
            {
            }]},
    "2020-12-11:7": {
        "60.0": [
            {
            }]}
}
mylist = list(DateMap.keys())  # in Python 3, keys() returns a view, so wrap it in list()
print(mylist)
# Prints ['2020-12-04:0', '2020-12-11:7']
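If the dictionary could also contain keys that are not dates, the regular expression from the question can be used to filter them. A small sketch, where the extra "meta" key is a hypothetical entry added only to show the filtering:

```python
import re

date_map = {
    "2020-12-04:0": {"55.0": [{}]},
    "2020-12-11:7": {"60.0": [{}]},
    "meta": {},  # hypothetical non-date key, not part of the question's data
}

# Same pattern as in the question; fullmatch makes the ^ and $ anchors unnecessary
date_key = re.compile(r"\d{4}-\d\d-\d\d:\d\d?")

# Iterating over a dict yields its keys in insertion order
mylist = [key for key in date_map if date_key.fullmatch(key)]
print(mylist)
```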

Getting specific field values from Json Python

I have a JSON file, and what I am trying to do is getting this specific field '_id'. Problem is that when I use json.load('input_file'), it says that my variable data is a list, not a dictionary, so I can't do something like:
for value in data['_id']:
    print(data['_id'][i])
because I keep getting this error: TypeError: list indices must be integers or slices, not str
What I also tried to do is:
data = json.load(input_file)[0]
It kinda works. Now, my type is a dictionary, and I can access like this: data['_id']
But I only get the first '_id' from the archive...
So, what I would like to do is add all '_id' 's values into a list, to use later.
input_file = open('input_file.txt')
data = json.load(input_file)[0]
print(data['_id'])# only shows me the first '_id' value
Thanks for the help!
[{
"_id": "5436e3abbae478396759f0cf",
"name": "ISIC_0000000",
"updated": "2015-02-23T02:48:17.495000+00:00"
},
{
"_id": "5436e3acbae478396759f0d1",
"name": "ISIC_0000001",
"updated": "2015-02-23T02:48:27.455000+00:00"
},
{
"_id": "5436e3acbae478396759f0d3",
"name": "ISIC_0000002",
"updated": "2015-02-23T02:48:37.249000+00:00"
},
{
"_id": "5436e3acbae478396759f0d5",
"name": "ISIC_0000003",
"updated": "2015-02-23T02:48:46.021000+00:00"
}]
You want to print the _id of each element of your JSON list, so simply iterate over the elements:
import json
with open('input_file.txt') as input_file:
    data = json.load(input_file)  # get the data list
for element in data:  # iterate over each element of the list
    # element is a dict
    element_id = element['_id']  # get the id
    print(element_id)  # print it
If you want to transform the list of elements into a list of ids for later use, you can use a list comprehension:
ids = [ e['_id'] for e in data ] # get id from each element and create a list of them
As you can see, the data is a list of dictionaries.
To loop over the data, use the following code:
for each in data:
    print(each['_id'])
    print(each['name'])
    print(each['updated'])
it says that my variable data is a list, not a dictionary, so I can't do something like:
for value in data['_id']:
    print(data['_id'][i])
Yes, but you can loop over all the dictionaries in your list and get the values for their '_id' keys. This can be done in a single line using list comprehension:
data = json.load(input_file)
ids = [value['_id'] for value in data]
print(ids)
['5436e3abbae478396759f0cf', '5436e3acbae478396759f0d1', '5436e3acbae478396759f0d3', '5436e3acbae478396759f0d5']
Another way to achieve this is with Python's built-in map function:
ids = list(map(lambda value: value['_id'], data))
The lambda expression creates a function that returns the value of the key _id from a dictionary. map applies it to every item in data, and list() collects the results, since in Python 3 map returns a lazy iterator rather than a list.
