Unable to get one value from multiple values in JSON (Python)

I'm working on a project and this is driving me nuts. I have searched online and found a few answers that worked for my other JSON-related queries, but this one is a bit of a nightmare: I keep getting a traceback error.
This is my JSON:
ServerReturnJson = {
"personId":"59c16cab-9f28-454e-8c7c-213ac6711dfc",
"persistedFaceIds":["3aaafe27-9013-40ae-8e2a-5803dad90d04"],
"name":"ramsey,",
"userData":null
}
data = responseIdentify.read()
print("The following data return : " + data)
# Parse the JSON data to print just the name
load = json.loads(data)
print(load[0]['name'])
And that's where my problem is: I am unable to get the value from name. When I try a for statement next, I get this error:
Traceback (most recent call last):
File "C:\Python-Windows\random_test\cogT2.py", line 110, in <module>
for b in load[0]['name']:
KeyError: 0
using this for loop:
for b in load[0]['name']:
    print b[load]
Any support would be most welcome. I'm sure it's something simple; I just cannot figure it out.

Understanding how to reference nested dicts and lists in JSON is the hardest part. Here are a few things to consider.
Using your original data:
ServerReturnJson = {
"personId":"59c16cab-9f28-454e-8c7c-213ac6711dfc",
"persistedFaceIds":["3aaafe27-9013-40ae-8e2a-5803dad90d04"],
"name":"ramsey,",
"userData": None  # JSON null becomes None once parsed in Python
}
# No index here, just the dictionary key
print(ServerReturnJson['name'])
Adding a second person by making a list of dicts:
ServerReturnJson = [{
"personId":"59c16cab-9f28-454e-8c7c-213ac6711dfc",
"persistedFaceIds":["3aaafe27-9013-40ae-8e2a-5803dad90d04"],
"name":"ramsey",
"userData": None
},
{
"personId": "234123412341234234",
"persistedFaceIds": ["1241234123423"],
"name": "miller",
"userData": None
}
]
# You can use the index here since you have a list of dictionaries
print(ServerReturnJson[1]['name'])
# You can iterate like this
for item in ServerReturnJson:
    print(item['name'])
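When it is unclear whether a response will be a single object or a list of objects, one option is to normalize it before indexing. The following is only a minimal sketch: the helper name `extract_names` and the shortened sample strings are made up for illustration and are not part of the Face API.

```python
import json

def extract_names(raw_json):
    """Return the 'name' values from a JSON object or a JSON array of objects."""
    parsed = json.loads(raw_json)
    # A single JSON object parses to a dict; wrap it so both shapes are handled alike
    records = parsed if isinstance(parsed, list) else [parsed]
    # .get() avoids a KeyError when a record has no 'name' key
    return [record.get("name") for record in records]

# Hypothetical sample payloads (shortened for illustration)
single = '{"personId": "59c16cab", "name": "ramsey", "userData": null}'
several = '[{"name": "ramsey"}, {"name": "miller"}]'

print(extract_names(single))
print(extract_names(several))
```

The `isinstance` check is what removes the need to guess whether `[0]` is required.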

Thanks for your support. Basically, the Microsoft Face API returns JSON with no index, as Chris said in the first example.
The above example works only if you add the following:
data = responseIdentify.read()  # read the incoming response from the server
ServerReturnJson = json.loads(data)
So the complete answer is as follows:
import json
# dataJson stands in for the raw JSON text returned by the server
dataJson = '''{
"personId":"59c16cab-9f28-454e-8c7c-213ac6711dfc",
"persistedFaceIds":["3aaafe27-9013-40ae-8e2a-5803dad90d04"],
"name":"ramsey,",
"userData":null
}'''
# add json load here
ServerReturnJson = json.loads(dataJson)
# No index here, just the dictionary key
print(ServerReturnJson['name'])
Credit to Chris, thanks. One last thing: Chris mentioned that "Understanding how to reference nested dicts and lists in JSON is the hardest part". 100% agreed.

Related

How can I best convert an API JSON object to a single row for SQL server?

I have a script set up to pull JSON from an API, and I need to convert objects into different columns of a single-row layout for SQL Server. The raw body layout of an example object is shown below:
"answers": {
"agent_star_rating": {
"question_id": 145,
"question_text": "How satisfied are you with the service you received from {{ employee.first_name }} today?",
"comment": "John was exceptionally friendly and knowledgeable.",
"selected_options": {
"1072": {
"option_id": 1072,
"option_text": "5",
"integer_value": 5
}
}
},
In said example I need the output for all parts of agent_star_rating to be individual columns so all data spits out 1 row for the entire survey on our SQL server. I have tried mapping several keys like so:
agent_star_rating = [list(response['answers']['agent_star_rating']['selected_options'].values())[0]['integer_value']]
agent_question = (response['answers']['agent_star_rating']['question_text'])
agent_comment = (response['answers']['agent_star_rating']['comment'])
response['agent_question'] = agent_question
response['agent_comment'] = agent_comment
response['agent_star_rating'] = agent_star_rating
I get the expected result until we reach a point where some surveys have skipped a field like ['question_text'], and we get a missing-key error. This happens with other objects as well, and I am failing to come up with a solution for these missing keys. If there is a better way to format the output than the key-mapping approach I've used, I'd love to hear ideas! I'm new to Python/pandas, so pardon any improper terminology!
I would do something like this:
# values that you always capture
row = ['value1', 'value2', ...]
# defaults for the fields that may be missing from a response
gottem_attrs = {'question_id': '',
'question_text': '',
'comment': '',
'selected_options': ''}
# find and save the values that the response has
for attr in response['answers']['agent_star_rating']:
    gottem_attrs[attr] = response['answers']['agent_star_rating'][attr]
# then you have your final row (dict values must be converted to a list first)
final_row = row + list(gottem_attrs.values())
If the response has a value for an attribute, this code will save it. Otherwise, it will save an empty string for that value.
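Another way to tolerate skipped fields is to read every key with `dict.get`, which returns a default instead of raising `KeyError`. This is only a sketch under assumptions: the function name `flatten_answer`, the column-name prefix, and the trimmed sample response are invented for illustration and are not part of the survey API.

```python
def flatten_answer(response, answer_key):
    """Flatten one entry of response['answers'] into a flat dict of columns."""
    answer = response.get("answers", {}).get(answer_key, {})
    # selected_options is keyed by option id; take the first option if any exists
    options = list(answer.get("selected_options", {}).values())
    first_option = options[0] if options else {}
    return {
        f"{answer_key}_question_id": answer.get("question_id"),
        f"{answer_key}_question_text": answer.get("question_text", ""),
        f"{answer_key}_comment": answer.get("comment", ""),
        f"{answer_key}_value": first_option.get("integer_value"),
    }

# Hypothetical response in which question_text was skipped
response = {
    "answers": {
        "agent_star_rating": {
            "question_id": 145,
            "comment": "John was exceptionally friendly and knowledgeable.",
            "selected_options": {
                "1072": {"option_id": 1072, "option_text": "5", "integer_value": 5}
            },
        }
    }
}

row = flatten_answer(response, "agent_star_rating")
print(row)
```

Because every column is always present in the returned dict, each survey yields a row of the same width even when fields are missing.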

Normalizing Cloudwatch Log JSON in Python

I'm trying to clean AWS Cloudwatch's log data, which is delivered in JSON format when queried via boto3. Each log line is stored as an array of dictionaries. For example, one log line takes the following form:
[
{
"field": "field1",
"value": "abc"
},
{
"field": "field2",
"value": "def"
},
{
"field": "field3",
"value": "ghi"
}
]
If this were in a standard key-value format (e.g., {'field1':'abc'}), I would know exactly what to do with it. I'm just getting stuck on untangling the extra layer of hierarchy introduced by the field/value keys. The ultimate goal is to convert the entire response object into a data frame like the following:
| field1 | field2 | field3 |
|--------|--------|--------|
| abc    | def    | ghi    |
(and so on for the rest of the response object, one row per log line.)
Last bit of info: each array has the same set of fields, and there is no nesting deeper than the example I've provided here. Thank you in advance :)
I was able to do this using nested loops. Not my favorite - I always feel like there has to be a more elegant solution than crawling through every single inch of the object, but this data is simple enough that it's still very fast.
import pandas as pd
logList = []  # Empty list to store dictionaries (i.e., log lines)
for line in logs:  # logs = response object
    line_dict = {}
    # Flatten each field/value dict into a single key-value pair
    for entry in line:
        line_dict[entry['field']] = entry['value']
    logList.append(line_dict)
df = pd.json_normalize(logList)
For anyone else working with CloudWatch logs, the actual log lines (like those I displayed above) are nested in an array called 'results' in the boto3 response object. So you'd need to extract that array first, or point the outer loop to it (i.e., for line in response['results']).
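The same flattening can also be written as a nested comprehension. This is a sketch on made-up sample data in the field/value layout from the question; the second log line and its values are invented for illustration.

```python
# Hypothetical sample: two log lines in the field/value layout shown above
logs = [
    [{"field": "field1", "value": "abc"},
     {"field": "field2", "value": "def"},
     {"field": "field3", "value": "ghi"}],
    [{"field": "field1", "value": "jkl"},
     {"field": "field2", "value": "mno"},
     {"field": "field3", "value": "pqr"}],
]

# One flat dict per log line: {field name: value}
rows = [{entry["field"]: entry["value"] for entry in line} for line in logs]
print(rows[0])
```

The resulting list of flat dicts can be passed to `pandas.DataFrame(rows)` to get one row per log line.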

Why is my For loop in python printing three times?

I am trying to loop through a JSON file to get specific values, however, when doing so the loop is printing three times. I only want the value to print once and have tried breaking the loop but it still has not worked.
Python Code:
with open(filename) as json_filez:
    dataz = json.load(json_filez)
    for i in dataz:
        for i in dataz['killRelated']:
            print(i["SteamID"])
            break
and a snippet of my json file is
{
"killRelated": [
{
"SteamID": "76561198283763531",
"kill": "15,302",
"shotacc": "16.1%"
}
],
"metaData": [
{
"test": "lol"
}
],
"miscData": [
{
"damageGiven": "2,262,638",
"gamePlayed": "1,292",
"moneyEarned": "50,787,000",
"score": "31,122",
"timePlayed": "22d 11h 56m"
}
]
}
and this is my output:
76561198283763531
76561198283763531
76561198283763531
Expected output:
76561198283763531
The return from json.load is a dictionary, and you are only interested in one entry in that, keyed by 'killRelated'. Now the "values" against each dictionary entry are lists, so that is what you need to be iterating through. And each element of such a list is a dictionary that you can again access via a key.
So your code could be:
with open(filename) as json_filez:
    dataz = json.load(json_filez)
    for kr in dataz['killRelated']:  # iterate through the list under the top-level keyword
        print(kr["SteamID"])
Now in your sample data, there's only one entry in the dataz['killRelated'] list, so you'll only get that one printed. But in general, you should expect multiple entries, and cater for the possibility of none. You can handle that with try/except or by checking key existence; here's the latter:
with open(filename) as json_filez:
    dataz = json.load(json_filez)
    if 'killRelated' in dataz:  # check for the top keyword
        for kr in dataz['killRelated']:  # iterate through the list under this keyword
            if 'SteamID' in kr:  # check for the next level keyword (keys are case-sensitive)
                print(kr["SteamID"])  # report it
You were getting three output lines because your outer loop iterated across all keyword entries in dataz (although without examining them), and then each time within that also iterated across the dataz['killRelated'] list. Your addition of break only stopped that inner loop, which for the particular data you had was redundant anyway because it was only going to print one entry.
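The explanation above can be checked with a small sketch. The JSON string here is a trimmed-down stand-in for the file contents, and the variable names `outer_passes` and `steam_ids` are introduced only for this illustration.

```python
import json

# Trimmed-down stand-in for the file contents
dataz = json.loads('''
{
  "killRelated": [{"SteamID": "76561198283763531"}],
  "metaData": [{"test": "lol"}],
  "miscData": [{"score": "31,122"}]
}
''')

# Iterating over a dict yields its keys, so an outer loop body runs once per key
outer_passes = [key for key in dataz]
print(outer_passes)

# Looping over the one list of interest visits each entry exactly once
steam_ids = [entry["SteamID"] for entry in dataz.get("killRelated", [])]
print(steam_ids)
```

With three top-level keys the outer loop makes three passes, which matches the three printed lines in the question.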
Your code is correct. You should check your JSON file, or share your full JSON text, since that may be where the problem is. I ran your code with the JSON snippet you provided and it works as expected.
import json
with open("test.json") as json_filez:
    dataz = json.load(json_filez)
    for i in dataz:
        for i in dataz['killRelated']:
            print(i["SteamID"])
and the result is as below:
76561198283763531

my loop is only printing the second part of the dictionary , i'm using json

import json
data ='''
{
"names": {"first_boy" : "khaled"},
"names": {"second_boy" : "waseem"}
}
'''
info = json.loads(data)
for line in info:
    print(info["names"])
I expected it to print the first_boy and the second_boy dictionaries, but it's printing
{'second_boy': 'waseem'}
Dicts in Python can hold only one value per key. Similarly, the JSON specification says that keys within an object should be unique, and parsers differ in how they treat duplicates. The way Python handles this when using json.loads() (or anything else that constructs a dict) is to simply keep the most recent definition of any given key.
In this case, {"second_boy":"waseem"} overwrites {"first_boy":"khaled"}.
The problem here is that the key "names" exists 2 times.
Maybe you can do this:
import json
data ='''
{
"names": {"first_boy" : "khaled",
"second_boy" : "waseem"}
}
'''
info = json.loads(data)
for key, value in info['names'].items():
    print(key, value)
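The overwrite behavior described above can be observed directly, and duplicates can be detected with the `object_pairs_hook` parameter of `json.loads`. This is a sketch: the hook function `reject_duplicates` and the flag `duplicate_found` are names chosen for this example, not part of the `json` module.

```python
import json

data = '''
{
  "names": {"first_boy": "khaled"},
  "names": {"second_boy": "waseem"}
}
'''

def reject_duplicates(pairs):
    """Build a dict from (key, value) pairs, raising if a key repeats."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result

# Default behaviour: the last value for a repeated key is kept
print(json.loads(data))

# With the hook, the duplicate is reported instead of silently dropped
try:
    json.loads(data, object_pairs_hook=reject_duplicates)
    duplicate_found = False
except ValueError as exc:
    duplicate_found = True
    print(exc)
```

The hook receives each JSON object as a list of key/value pairs before a dict is built, which is why it can still see both "names" entries.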

Parsing JSON output efficiently in Python?

The block of code below works, but I'm not satisfied that it is optimal given my limited understanding of JSON, and I can't seem to figure out a more efficient method.
The steam_game_db is like this:
{
"applist": {
"apps": [
{
"appid": 5,
"name": "Dedicated Server"
},
{
"appid": 7,
"name": "Steam Client"
},
{
"appid": 8,
"name": "winui2"
},
{
"appid": 10,
"name": "Counter-Strike"
}
]
}
}
and my Python code so far is
i = 0
x = 570
req_name_from_id = requests.get(steam_game_db)
j = req_name_from_id.json()
while j["applist"]["apps"][i]["appid"] != x:
    i += 1
returned_game = j["applist"]["apps"][i]["name"]
print(returned_game)
Instead of looping through the entire app list, is there a smarter way to search for it? Ideally, the elements in the data structure with 'appid' and 'name' would be numbered the same as their corresponding 'appid',
i.e.
appid 570 in the list is Dota2.
However, element 570 in the data structure is appid 5069, Red Faction.
Also what type of data structure is this? Perhaps it has limited my searching ability for this answer already. (I.e. seems like a dictionary of 'appid' and 'element' to me for each element?)
EDIT: Changed to a for loop as suggested
# returned_id string for appid from another query
req_name_from_id = requests.get(steam_game_db)
j_2 = req_name_from_id.json()
for app in j_2["applist"]["apps"]:
    if app["appid"] == int(returned_id):
        returned_game = app["name"]
print(returned_game)
The most convenient way to access things by a key (like the app ID here) is to use a dictionary.
You pay a little extra performance cost up-front to fill the dictionary, but after that pulling out values by ID is basically free.
However, it's a trade-off. If you only want to do a single look-up during the life-time of your Python program, then paying that extra performance cost to build the dictionary won't be beneficial, compared to a simple loop like you already did. But if you want to do multiple look-ups, it will be beneficial.
# build dictionary
app_by_id = {}
for app in j["applist"]["apps"]:
    app_by_id[app["appid"]] = app["name"]
# use it (the app IDs in the JSON are integers, so look up with an int)
print(app_by_id[570])
Also think about caching the JSON file on disk. This will save time during your program's startup.
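The trade-off between a one-off search and a prebuilt dictionary can be sketched as follows. The sample `j` reuses the four entries shown in the question; the names `single` and `app_by_id` are just illustrative.

```python
# Sample in the same shape as the app list from the question
j = {
    "applist": {
        "apps": [
            {"appid": 5, "name": "Dedicated Server"},
            {"appid": 7, "name": "Steam Client"},
            {"appid": 8, "name": "winui2"},
            {"appid": 10, "name": "Counter-Strike"},
        ]
    }
}

# One-off lookup: stop at the first match, or fall back to None
single = next(
    (app["name"] for app in j["applist"]["apps"] if app["appid"] == 10),
    None,
)

# Repeated lookups: build the index once, then each lookup is a dict access
app_by_id = {app["appid"]: app["name"] for app in j["applist"]["apps"]}

print(single)
print(app_by_id.get(7))
print(app_by_id.get(570))  # 570 is not in this small sample
```

Since the keys are integers, an ID that arrives as a string (such as `returned_id` in the question) needs `int()` before the lookup.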
It's better to have the JSON file on disk; you can load it directly into a dictionary and start building up your lookup table. As an example, I've tried to maintain your logic while using the dict for lookups. Don't forget to specify the encoding when opening the JSON file if it has special characters in it.
Setup:
import json
apps = {}
with open('bigJson.json', encoding="utf-8") as handle:
    dictdump = json.load(handle)
for item in dictdump['applist']['apps']:
    apps.setdefault(item['appid'], item['name'])
Usage 1: That's the way you have used it.
for appid in range(0, 570):
    if appid in apps:
        print(appid, apps[appid].encode("utf-8"))
Usage 2: That's how you can query a key; using get instead of [] prevents a KeyError exception if the appid isn't recorded.
print(apps.get(570, 0))
