I just need contexts to be an Array ie., 'contexts' :[{}] instead of 'contexts':{}
Below is my python code which helps in converting python data-frame to required JSON format
This is the sample df for one row
name type aim context
xxx xxx specs 67646546 United States of America
data = {'entities':[]}
for key,grp in df.groupby('name'):
for idx, row in grp.iterrows():
temp_dict_alpha = {'name':key,'type':row['type'],'data' :{'contexts':{'attributes':{},'context':{'dcountry':row['dcountry']}}}}
attr_row = row[~row.index.isin(['name','type'])]
for idx2,row2 in attr_row.iteritems():
dict_temp = {}
dict_temp[idx2] = {'values':[]}
dict_temp[idx2]['values'].append({'value':row2,'source':'internal','locale':'en_Us'})
temp_dict_alpha['data']['contexts']['attributes'].update(dict_temp)
data['entities'].append(temp_dict_alpha)
print(json.dumps(data, indent = 4))
Desired output:
{
"entities": [{
"name": "XXX XXX",
"type": "specs",
"data": {
"contexts": [{
"attributes": {
"aim": {
"values": [{
"value": 67646546,
"source": "internal",
"locale": "en_Us"
}
]
}
},
"context": {
"country": "United States of America"
}
}
]
}
}
]
}
However I am getting below output
{
"entities": [{
"name": "XXX XXX",
"type": "specs",
"data": {
"contexts": {
"attributes": {
"aim": {
"values": [{
"value": 67646546,
"source": "internal",
"locale": "en_Us"
}
]
}
},
"context": {
"country": "United States of America"
}
}
}
}
]
}
Can any one please suggest ways for solving this problem using Python.
I think this does it:
import pandas as pd
import json
df = pd.DataFrame([['xxx xxx','specs','67646546','United States of America']],
columns = ['name', 'type', 'aim', 'context' ])
data = {'entities':[]}
for key,grp in df.groupby('name'):
for idx, row in grp.iterrows():
temp_dict_alpha = {'name':key,'type':row['type'],'data' :{'contexts':[{'attributes':{},'context':{'country':row['context']}}]}}
attr_row = row[~row.index.isin(['name','type'])]
for idx2,row2 in attr_row.iteritems():
if idx2 != 'aim':
continue
dict_temp = {}
dict_temp[idx2] = {'values':[]}
dict_temp[idx2]['values'].append({'value':row2,'source':'internal','locale':'en_Us'})
temp_dict_alpha['data']['contexts'][0]['attributes'].update(dict_temp)
data['entities'].append(temp_dict_alpha)
print(json.dumps(data, indent = 4))
Output:
{
"entities": [
{
"name": "xxx xxx",
"type": "specs",
"data": {
"contexts": [
{
"attributes": {
"aim": {
"values": [
{
"value": "67646546",
"source": "internal",
"locale": "en_Us"
}
]
}
},
"context": {
"country": "United States of America"
}
}
]
}
}
]
}
The problem is here in the following code
temp_dict_alpha = {'name':key,'type':row['type'],'data' :{'contexts':{'attributes':{},'context':{'dcountry':row['dcountry']}}}}
As you can see , you are already creating a contexts dict and assigning values to it. What you could do is something like this
contextObj = {'attributes':{},'context':{'dcountry':row['dcountry']}}
contextList = []
for idx, row in grp.iterrows():
temp_dict_alpha = {'name':key,'type':row['type'],'data' :{'contexts':{'attributes':{},'context':{'dcountry':row['dcountry']}}}}
attr_row = row[~row.index.isin(['name','type'])]
for idx2,row2 in attr_row.iteritems():
dict_temp = {}
dict_temp[idx2] = {'values':[]}
dict_temp[idx2]['values'].append({'value':row2,'source':'internal','locale':'en_Us'})
contextObj['attributes'].update(dict_temp)
contextList.append(contextObj)
Please Note - This code will have logical errors and might not run ( as it is difficult for me , to understand the logic behind it). But here is what you need to do .
You need to create a list of objects, which is not what you are doing. You are trying to manipulate an object and when its JSON dumped , you are getting an object back instead of a list. What you need is a list. You create context object for each and every iteration and keep on appending them to the local list contextList that we created earlier.
Once when the for loop terminates, you can update your original object by using the contextList and you will have a list of objects instead of and object which you are having now.
Related
I am writing a parser that goes through a list of data that is roughly formatted:
{
"teachers": [
{
"fullName": "Testing",
"class": [
{
"className": "Counselor",
"school": {
"id": "2b6671cb-617d-48d6-b0b5-3d44ce4da21c"
}
}
]
},
...
}
The parser is supposed to check for duplicate names within this json object, and when it stumbles upon said duplicate name, append the class to the class array.
So for example:
{
"teachers": [
{
"fullName": "Testing",
"class": [
{
"className": "Counselor",
"school": {
"id": "2b6671cb-617d-48d6-b0b5-3d44ce4da21c"
}
}
]
},
{
"fullName": "Testing",
"class": [
{
"className": "Math 8",
"school": {
"id": "2b6671cb-617d-48d6-b0b5-3d44ce4da21c"
}
}
]
},
...
}
Would return
{
"teachers": [
{
"fullName": "Testing",
"class": [
{
"className": "Counselor",
"school": {
"id": "2b6671cb-617d-48d6-b0b5-3d44ce4da21c"
}
},
{
"className": "Math 8",
"school": {
"id": "2b6671cb-617d-48d6-b0b5-3d44ce4da21c"
}
},
]
},
...
}
My current parser works just fine for most objects, however for some reason it doesn't catch some of the duplicates despite the names being the exact same, and also is appending the string
}7d-48d6-b0b5-3d44ce4da21c"
}
}
]
}
]
to the end of the json document. I am not sure why it would do this considering I am just dumping the modified json (which only is modified within the array).
My parser code is:
i_duplicates = []
name_duplicates = []
def converter():
global i_duplicates
file = open("final2.json", "r+")
infinite = json.load(file)
for i, teacher in enumerate(infinite["teachers"]):
class_name = teacher["class"][0]["className"]
class_data = {
"className": class_name,
"school": {
"id": "2b6671cb-617d-48d6-b0b5-3d44ce4da21c"
}
}
d = {
"fullName": teacher["fullName"],
"index": i
}
c = {
"fullName": teacher["fullName"]
}
if c in name_duplicates:
infinite["teachers"][search(i_duplicates, c["fullName"])]["class"].append(class_data)
infinite["teachers"].pop(i)
file.seek(0)
json.dump(infinite, file, indent=4)
else:
i_duplicates.append(d)
name_duplicates.append(c)
def search(a, t):
for i in a:
if i["fullName"] == t:
return i["index"]
print(Fore.RED + "not found" + Fore.RESET)
I know I am going about this inefficiently, but I am not sure how to fix the issues the current algorithm is having. Any feedback appreciated.
This is the first time I'm working with JSON, and I'm trying to pull url out of the JSON below.
{
"name": "The_New11d112a_Company_Name",
"sections": [
{
"name": "Products",
"payload": [
{
"id": 1,
"name": "TERi Geriatric Patient Skills Trainer,
"type": "string"
}
]
},
{
"name": "Contact Info",
"payload": [
{
"id": 1,
"name": "contacts",
"url": "https://www.a3bs.com/catheterization-kits-8000892-3011958-3b-scientific,p_1057_31043.html",
"contacts": [
{
"name": "User",
"email": "Company Email",
"phone": "Company PhoneNumber"
}
],
"type": "contact"
}
]
}
],
"tags": [
"Male",
"Airway"
],
"_id": "0e4cd5c6-4d2f-48b9-acf2-5aa75ade36e1"
}
I have been able to access description and _id via
data = json.loads(line)
if 'xpath' in data:
xpath = data["_id"]
description = data["sections"][0]["payload"][0]["description"]
However, I can't seem to figure out a way to access url. One other issue I have is there could be other items in sections, which makes indexing into Contact Info a non starter.
Hope this helps:
import json
with open("test.json", "r") as f:
json_out = json.load(f)
for i in json_out["sections"]:
for j in i["payload"]:
for key in j:
if "url" in key:
print(key, '->', j[key])
I think your JSON is damaged, it should be like that.
{
"name": "The_New11d112a_Company_Name",
"sections": [
{
"name": "Products",
"payload": [
{
"id": 1,
"name": "TERi Geriatric Patient Skills Trainer",
"type": "string"
}
]
},
{
"name": "Contact Info",
"payload": [
{
"id": 1,
"name": "contacts",
"url": "https://www.a3bs.com/catheterization-kits-8000892-3011958-3b-scientific,p_1057_31043.html",
"contacts": [
{
"name": "User",
"email": "Company Email",
"phone": "Company PhoneNumber"
}
],
"type": "contact"
}
]
}
],
"tags": [
"Male",
"Airway"
],
"_id": "0e4cd5c6-4d2f-48b9-acf2-5aa75ade36e1"
}
You can check it on http://json.parser.online.fr/.
And if you want to get the value of the url.
import json
j = json.load(open('yourJSONfile.json'))
print(j['sections'][1]['payload'][0]['url'])
I think it's worth to write a short function to get the url(s) and make a decision whether or not to use the first found url in the returned list, or skip processing if there's no url available in your data.
The method shall looks like this:
def extract_urls(data):
payloads = []
for section in data['sections']:
payloads += section.get('payload') or []
urls = [x['url'] for x in payloads if 'url' in x]
return urls
This should print out the URL
import json
# open json file to read
with open('test.json','r') as f:
# load json, parameter as json text (file contents)
data = json.loads(f.read())
# after observing format of JSON data, the location of the URL key
# is determined and the data variable is manipulated to extract the value
print(data['sections'][1]['payload'][0]['url'])
The exact location of the 'url' key:
1st (position) of the array which is the value of the key 'sections'
Inside the array value, there is a dict, and the key 'payload' contains an array
In the 0th (position) of the array is a dict with a key 'url'
While testing my solution, I noticed that the json provided is flawed, after fixing the json flaws(3), I ended up with this.
{
"name": "The_New11d112a_Company_Name",
"sections": [
{
"name": "Products",
"payload": [
{
"id": 1,
"name": "TERi Geriatric Patient Skills Trainer",
"type": "string"
}
]
},
{
"name": "Contact Info",
"payload": [
{
"id": 1,
"name": "contacts",
"url": "https://www.a3bs.com/catheterization-kits-8000892-3011958-3b-scientific,p_1057_31043.html",
"contacts": [
{
"name": "User",
"email": "Company Email",
"phone": "Company PhoneNumber"
}
],
"type": "contact"
}
]
}
],
"tags": [
"Male",
"Airway"
],
"_id": "0e4cd5c6-4d2f-48b9-acf2-5aa75ade36e1"}
After utilizing the JSON that was provided by Vincent55.
I made a working code with exception handling and with certain assumptions.
Working Code:
## Assuming that the target data is always under sections[i].payload
from json import loads
line = open("data.json").read()
data = loads(line)["sections"]
for x in data:
try:
# With assumption that there is only one payload
if x["payload"][0]["url"]:
print(x["payload"][0]["url"])
except KeyError:
pass
I have some data from a project where the variables can change from motorcycle and car. I need to get the name out of them and that value is inside the variable.
This is not the data i will be using but it has the same structure, the "official" data is some persional information so i changed it to some random values. I can not change the structure of the JSON data since this is the way the serveradmins decided to structure it for some reason.
This is my python code:
import json
with open('exampleData.json') as j:
data = json.load(j)
name = 0
Vehicle = 0
for x in data:
print(data['persons'][x]['name'])
for i in data['persons'][x]['things']["Vehicles"]:
print(data['persons'][x]['things']['Vehicles'][i]['type']['name'])
print("\n")
This is my Json data i extracted from the file "ExampleData.json"(sorry for long but it is kinda complex and necessary to understand the problem):
{
"total": 2,
"persons": [
{
"name": "Sven Svensson",
"things": {
"House": "apartment",
"Vehicles": [
{
"id": "46",
"type": {
"name": "Kawasaki ER6N",
"type": "motorcyle"
},
"Motorcycle": {
"plate": "aaa111",
"fields": {
"brand": "Kawasaki",
"status": "in shop"
}
}
},
{
"id": "44",
"type": {
"name": "BMW m3",
"type": "Car"
},
"Car": {
"plate": "bbb222",
"fields": {
"brand": "BMW",
"status": "in garage"
}
}
}
]
}
},
{
"name": "Eric Vivian Matthews",
"things": {
"House": "House",
"Vehicles": [
{
"id": "44",
"type": {
"name": "Volvo XC90",
"type": "Car"
},
"Car": {
"plate": "bbb222",
"fields": {
"brand": "Volvo",
"status": "in garage"
}
}
}
]
}
}
]
}
I want it to print out something like this :
Sven Svensson
Bmw M3
Kawasaki ER6n
Eric Vivian Matthews
Volvo XC90
but i get this error:
print(data['persons'][x]['name'])
TypeError: list indices must be integers or slices, not str
Process finished with exit code 1
What you need is
for person in data["persons"]:
for vehicle in person["things"]["vehicles"]:
print(vehicle["type"]["name"])
type = vehicle["type"]["type"]
print(vehicle[type]["plate"])
Python for loop does not return the key but rather an object here:
for x in data:
Referencing an object as key
print(data['persons'][x]['name'])
Is causing the error
What you need is to use the returning json object and iterate over them like so:
for x in data['persons']:
print(x['name'])
for vehicle in x['things']['Vehicles']:
print(vehicle['type']['name'])
print('\n')
I am receiving a large json from Google Assistant and I want to retrieve some specific details from it. The json is the following:
{
"responseId": "************************",
"queryResult": {
"queryText": "actions_intent_DELIVERY_ADDRESS",
"action": "delivery",
"parameters": {},
"allRequiredParamsPresent": true,
"fulfillmentMessages": [
{
"text": {
"text": [
""
]
}
}
],
"outputContexts": [
{
"name": "************************/agent/sessions/1527070836044/contexts/actions_capability_screen_output"
},
{
"name": "************************/agent/sessions/1527070836044/contexts/more",
"parameters": {
"polar": "no",
"polar.original": "No",
"cardinal": 2,
"cardinal.original": "2"
}
},
{
"name": "************************/agent/sessions/1527070836044/contexts/actions_capability_audio_output"
},
{
"name": "************************/agent/sessions/1527070836044/contexts/actions_capability_media_response_audio"
},
{
"name": "************************/agent/sessions/1527070836044/contexts/actions_intent_delivery_address",
"parameters": {
"DELIVERY_ADDRESS_VALUE": {
"userDecision": "ACCEPTED",
"#type": "type.googleapis.com/google.actions.v2.DeliveryAddressValue",
"location": {
"postalAddress": {
"regionCode": "US",
"recipients": [
"Amazon"
],
"postalCode": "NY 10001",
"locality": "New York",
"addressLines": [
"450 West 33rd Street"
]
},
"phoneNumber": "+1 206-266-2992"
}
}
}
},
{
"name": "************************/agent/sessions/1527070836044/contexts/actions_capability_web_browser"
}
],
"intent": {
"name": "************************/agent/intents/86fb2293-7ae9-4bed-adeb-6dfe8797e5ff",
"displayName": "Delivery"
},
"intentDetectionConfidence": 1,
"diagnosticInfo": {},
"languageCode": "en-gb"
},
"originalDetectIntentRequest": {
"source": "google",
"version": "2",
"payload": {
"isInSandbox": true,
"surface": {
"capabilities": [
{
"name": "actions.capability.MEDIA_RESPONSE_AUDIO"
},
{
"name": "actions.capability.SCREEN_OUTPUT"
},
{
"name": "actions.capability.AUDIO_OUTPUT"
},
{
"name": "actions.capability.WEB_BROWSER"
}
]
},
"inputs": [
{
"rawInputs": [
{
"query": "450 West 33rd Street"
}
],
"arguments": [
{
"extension": {
"userDecision": "ACCEPTED",
"#type": "type.googleapis.com/google.actions.v2.DeliveryAddressValue",
"location": {
"postalAddress": {
"regionCode": "US",
"recipients": [
"Amazon"
],
"postalCode": "NY 10001",
"locality": "New York",
"addressLines": [
"450 West 33rd Street"
]
},
"phoneNumber": "+1 206-266-2992"
}
},
"name": "DELIVERY_ADDRESS_VALUE"
}
],
"intent": "actions.intent.DELIVERY_ADDRESS"
}
],
"user": {
"lastSeen": "2018-05-23T10:20:25Z",
"locale": "en-GB",
"userId": "************************"
},
"conversation": {
"conversationId": "************************",
"type": "ACTIVE",
"conversationToken": "[\"more\"]"
},
"availableSurfaces": [
{
"capabilities": [
{
"name": "actions.capability.SCREEN_OUTPUT"
},
{
"name": "actions.capability.AUDIO_OUTPUT"
},
{
"name": "actions.capability.WEB_BROWSER"
}
]
}
]
}
},
"session": "************************/agent/sessions/1527070836044"
}
This large json returns amongst other things to my back-end the delivery address details of the user (here I use Amazon's NY locations details as an example). Therefore, I want to retrieve the location dictionary which is near the end of this large json. The location details appear also near the start of this json but I want to retrieve specifically the second location dictionary which is near the end of this large json.
For this reason, I had to read through this json by myself and manually test some possible "paths" of the location dictionary within this large json to find out finally that I had to write the following line to retrieve the second location dictionary:
location = json['originalDetectIntentRequest']['payload']['inputs'][0]['arguments'][0]['extension']['location']
Therefore, my question is the following: is there any concise way to retrieve automatically the "path" of the parent keys and indices of the second location dictionary within this large json?
Hence, I expect that the general format of the output from a function which does this for all the occurrences of the location dictionary in any json will be the following:
[["path" of first `location` dictionary], ["path" of second `location` dictionary], ["path" of third `location` dictionary], ...]
where for the json above it will be
[["path" of first `location` dictionary], ["path" of second `location` dictionary]]
as there are two occurrences of the location dictionary with
["path" of second `location` dictionary] = ['originalDetectIntentRequest', 'payload', 'inputs', 0, 'arguments', 0, 'extension', 'location']
I have in my mind relevant posts on StackOverflow (Python--Finding Parent Keys for a specific value in a nested dictionary) but I am not sure that these apply exactly to my problem since these are for parent keys in nested dictionaries whereas here I am talking about the parent keys and indices in dictionary with nested dictionaries and lists.
I solved this by using recursive search
# result and path should be outside of the scope of find_path to persist values during recursive calls to the function
result = []
path = []
from copy import copy
# i is the index of the list that dict_obj is part of
def find_path(dict_obj,key,i=None):
for k,v in dict_obj.items():
# add key to path
path.append(k)
if isinstance(v,dict):
# continue searching
find_path(v, key,i)
if isinstance(v,list):
# search through list of dictionaries
for i,item in enumerate(v):
# add the index of list that item dict is part of, to path
path.append(i)
if isinstance(item,dict):
# continue searching in item dict
find_path(item, key,i)
# if reached here, the last added index was incorrect, so removed
path.pop()
if k == key:
# add path to our result
result.append(copy(path))
# remove the key added in the first line
if path != []:
path.pop()
# default starting index is set to None
find_path(di,"location")
print(result)
# [['queryResult', 'outputContexts', 4, 'parameters', 'DELIVERY_ADDRESS_VALUE', 'location'], ['originalDetectIntentRequest', 'payload', 'inputs', 0, 'arguments', 0, 'extension', 'location']]
this is a sample of my json file:
{
"pops": [{
"name": "pop_a",
"subnets": {
"Public": ["1.1.1.0/24,2.2.2.0/24"],
"Private": ["192.168.0.0/24,192.168.1.0/24"],
"more DATA":""
}
},
{
"name": "pop_b",
"subnets": {
"Public": ["3.3.3.0/24,4.4.4.0/24"],
"Private": ["192.168.2.0/24,192.168.3.0/24"],
"more DATA":""
}
}
]
}
after i read it, i want to make a dic object and store some of the things that i need from this file.
i want my object to be like this ..
[{
"name": "pop_a",
"subnets": {"Public": ["1.1.1.0/24,2.2.2.0/24"],"Private": ["192.168.0.0/24,192.168.1.0/24"]}
},
{
"name": "pop_b",
"subnets": {"Public": ["3.3.3.0/24,4.4.4.0/24"],"Private": ["192.168.2.0/24,192.168.3.0/24"]}
}]
then i want to be able to access some of the public/private values
here is what i tried, and i know there is update(), setdefault() that gave also same unwanted results
def my_funckion():
nt_json = [{'name':"",'subnets':[]}]
Pname = []
Psubnet= []
for pop in pop_json['pops']: # it print only the last key/value
nt_json[0]['name']= pop['name']
nt_json[0]['subnet'] = pop['subnet']
pprint (nt_json)
for pop in pop_json['pops']:
"""
it print the names in a row then all of the ipss
"""
Pname.append(pop['name'])
Pgre.append(pop['subnet'])
nt_json['pop_name'] = Pname
nt_json['subnet']= Psubnet
pprint (nt_json)
Here's a quick solution using list comprehension. Note that this approach can be taken only with enough knowledge of the json structure.
>>> import json
>>>
>>> data = ... # your data
>>> new_data = [{ "name" : x["name"], "subnets" : {"Public" : x["subnets"]["Public"], "Private" : x["subnets"]["Private"]}} for x in data["pops"]]
>>>
>>> print(json.dumps(new_data, indent=2))
[
{
"name": "pop_a",
"subnets": {
"Private": [
"192.168.0.0/24,192.168.1.0/24"
],
"Public": [
"1.1.1.0/24,2.2.2.0/24"
]
}
},
{
"name": "pop_b",
"subnets": {
"Private": [
"192.168.2.0/24,192.168.3.0/24"
],
"Public": [
"3.3.3.0/24,4.4.4.0/24"
]
}
}
]