Finding the headers for values in JSON files - Python

Hi, I've got JSON data that looks something like this:
{
  "content": {
    "Header 1": [
      {
        "name": "Name1"
      },
      {
        "name": "Name2"
      }
    ],
    "Header 2": [
      {
        "name": "Name3"
      }
    ]
  }
}
I'm looking to convert this into lists that look something like this:
header1 = ["Name1", "Name2"]
header2 = ["Name3"]
So far I've been able to get all the names that I want using objectpath.
import json
import objectpath

path = r"C:\Users\path\example.json"
with open(path) as json_file:
    data = json.load(json_file)

tree_obj = objectpath.Tree(data)
names = list(tree_obj.execute('$..name'))
print(names)
But I've been unable to get the appropriate header for each name, as the headers are nested under the 'content' key.
Any help would be appreciated. Thanks :)

This does what you ask. Just iterate through the keys of "content" and grab the names in the sub-objects.
import json

jsonx = """{
    "content": {
        "Header 1": [
            {
                "name": "Name1"
            },
            {
                "name": "Name2"
            }
        ],
        "Header 2": [
            {
                "name": "Name3"
            }
        ]
    }
}"""

data = json.loads(jsonx)
gather = {}
for k, v in data["content"].items():
    k1 = k.lower().replace(' ', '')
    v1 = [vv['name'] for vv in v]
    gather[k1] = v1
print(gather)
Output:
{'header1': ['Name1', 'Name2'], 'header2': ['Name3']}
And for those who like one-liners:
gather = dict(
    (k.lower().replace(' ', ''), [vv['name'] for vv in v])
    for k, v in data["content"].items())
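On Python 3 the same thing also reads naturally as a dict comprehension (a sketch using the sample data from above); the lists the question asks for are then just `gather['header1']` and `gather['header2']`:

```python
import json

jsonx = """{"content": {"Header 1": [{"name": "Name1"}, {"name": "Name2"}],
            "Header 2": [{"name": "Name3"}]}}"""
data = json.loads(jsonx)

# One key per header, normalized to lowercase without spaces
gather = {k.lower().replace(' ', ''): [item['name'] for item in v]
          for k, v in data["content"].items()}
header1 = gather['header1']
print(gather)  # {'header1': ['Name1', 'Name2'], 'header2': ['Name3']}
```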

Related

How to select multiple JSON Objects using python

I have JSON data as follows. The JSON has many such objects with the same NameId:
[
    {
        "NameId": "name1",
        "exp": {
            "exp1": "test1"
        }
    },
    {
        "NameId": "name1",
        "exp": {
            "exp2": "test2"
        }
    }
]
Now, what I am after is to create a new JSON object that has a merged exp, and write a file something like below, so that I do not have multiple NameId entries:
[
    {
        "NameId": "name1",
        "exp": {
            "exp1": "test1",
            "exp2": "test2"
        }
    }
]
Is there a way I can achieve this using Python?
You can do the manual work, merging the entries while rebuilding the structure. Keep a dictionary of the exp dicts seen so far to merge into.
jsonData = [{
    "NameId": "name1",
    "exp": {
        "exp1": "test1"
    }
}, {
    "NameId": "name1",
    "exp": {
        "exp2": "test2"
    }
}, {
    "NameId": "name2",
    "exp": {
        "exp3": "test3"
    }
}]

result = []
expsDict = {}
for entry in jsonData:
    nameId = entry["NameId"]
    exp = entry["exp"]
    if nameId in expsDict:
        # Merge exp into resultExp.
        # Note that resultExp belongs to both result and expsDict;
        # changes made are reflected in both containers!
        resultExp = expsDict[nameId]
        for expName, expValue in exp.items():
            resultExp[expName] = expValue
    else:
        # Copy, otherwise merging would modify jsonData too!
        exp = exp.copy()
        entry = entry.copy()
        entry["exp"] = exp
        # Add a new item to the result.
        result.append(entry)
        # Store exp to later merge other entries with the same name.
        expsDict[nameId] = exp

print(result)
You can use itertools.groupby and functools.reduce
from functools import reduce
from itertools import groupby

d = [{
    "NameId": "name1",
    "exp": {
        "exp1": "test1"
    }
}, {
    "NameId": "name1",
    "exp": {
        "exp2": "test2"
    }
}]

# The empty-dict initializer makes the fold work for groups of any size
[{'NameId': k, 'exp': reduce(lambda acc, item: {**acc, **item["exp"]}, v, {})}
 for k, v in groupby(sorted(d, key=lambda x: x["NameId"]), lambda x: x["NameId"])]
#output
[{'NameId': 'name1', 'exp': {'exp1': 'test1', 'exp2': 'test2'}}]
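A shorter standard-library alternative (a sketch, not from either answer above) folds each entry's exp dict into a defaultdict keyed by NameId, then rebuilds the list:

```python
from collections import defaultdict

data = [
    {"NameId": "name1", "exp": {"exp1": "test1"}},
    {"NameId": "name1", "exp": {"exp2": "test2"}},
    {"NameId": "name2", "exp": {"exp3": "test3"}},
]

# Accumulate every entry's exp dict under its NameId
merged_exps = defaultdict(dict)
for entry in data:
    merged_exps[entry["NameId"]].update(entry["exp"])

result = [{"NameId": name_id, "exp": exp} for name_id, exp in merged_exps.items()]
print(result)
```

Unlike the groupby approach, this needs no sorting and keeps the first-seen order of the NameIds.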

Pandas to JSON not respecting DataFrame format

I have a Pandas DataFrame which I need to transform into a JSON object. I thought grouping it would achieve this, but it does not seem to yield the correct results. Further, I wouldn't know how to name the sub-group.
My data frame as follows:
parent  name  age
nick    stef  10
nick    rob   12
And I do a groupby as I would like all children together under one parent in json:
df = df.groupby(['parent', 'name'])['age'].min()
And I would like it to yield the following:
{
    "parent": "nick",
    "children": [
        {
            "name": "stef",
            "age": 10
        },
        {
            "name": "rob",
            "age": 12
        },
        ...
    ]
}
When I do .to_json() it seems to regroup everything on age etc.
df.groupby(['parent'])[['name', 'age']].apply(list).to_json()
Given I wanted to add some styling, I ended up solving it as follows:
import json

df_grouped = df.groupby('parent')
new = []
for group_name, df_group in df_grouped:
    base = {}
    base['parent'] = group_name
    children = []
    for row_index, row in df_group.iterrows():
        temp = {}
        temp['name'] = row['name']
        temp['age'] = int(row['age'])  # cast from numpy int64 so json.dumps can serialize it
        children.append(temp)
    base['children'] = children
    new.append(base)
json_format = json.dumps(new)
print(json_format)
Which yielded the following results:
[
    {
        "parent": "fee",
        "children": [
            {"name": "bob", "age": 9},
            {"name": "stef", "age": 10}
        ]
    },
    {
        "parent": "nick",
        "children": [
            {"name": "stef", "age": 10},
            {"name": "tobi", "age": 2},
            {"name": "ralf", "age": 12}
        ]
    },
    {
        "parent": "patrick",
        "children": [
            {"name": "marion", "age": 10}
        ]
    }
]
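The loop above can also be written more compactly with a groupby comprehension and DataFrame.to_dict (a sketch, assuming the same three-column frame; the sample values are hypothetical):

```python
import pandas as pd

# Same three-column frame as in the question (hypothetical sample values)
df = pd.DataFrame({
    'parent': ['nick', 'nick'],
    'name': ['stef', 'rob'],
    'age': [10, 12],
})

# For each parent, turn the child rows into a list of {"name", "age"} records
nested = [
    {'parent': parent, 'children': group[['name', 'age']].to_dict(orient='records')}
    for parent, group in df.groupby('parent')
]
print(nested)
```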

Is there any way to convert specific JSON data to CSV?

I have JSON data which looks like this.
Here is the link https://drive.google.com/file/d/1RqU2s0dqjd60dcYlxEJ8vnw9_z2fWixd/view?usp=sharing
result = [
    {
        "ERROR": [],
        "LinkSetDbHistory": [],
        "LinkSetDb": [
            {
                "Link": [
                    {"Id": "8116078"},
                    {"Id": "7654180"},
                    {"Id": "7643601"},
                    {"Id": "7017037"},
                    {"Id": "6190213"},
                    {"Id": "5902265"},
                    {"Id": "5441934"},
                    {"Id": "5417587"},
                    {"Id": "5370323"},
                    {"Id": "5362514"},
                    {"Id": "4818642"},
                    {"Id": "4330602"}
                ],
                "DbTo": "pmc",
                "LinkName": "pubmed_pmc_refs"
            }
        ],
        "DbFrom": "pubmed",
        "IdList": ["25209241"]
    },
    {
        "ERROR": [],
        "LinkSetDbHistory": [],
        "LinkSetDb": [
            {
                "Link": [
                    {"Id": "7874507"},
                    {"Id": "7378719"},
                    {"Id": "6719480"},
                    {"Id": "5952809"},
                    {"Id": "4944516"}
                ],
                "DbTo": "pmc",
                "LinkName": "pubmed_pmc_refs"
            }
        ],
        "DbFrom": "pubmed",
        "IdList": ["25209630"]
    },
I want to fetch each IdList value together with the number of Ids in its Link array, so the final output will be:
IdList: length
25209241: 12 (total number of Ids in the Link array)
25209630: 5 (total number of Ids in the Link array)
I have tried this code, but it is not working for either single or multiple values:
pmc_ids = [link["Id"] for link in results["LinkSetDb"]["Link"]]
len(pmc_ids)
How can I make it work with a large dataset?
You have "LinkSetDb" as a list containing a single dictionary, but you are indexing it as if it were a dictionary. Use:
pmc_ids = [link["Id"] for link in result["LinkSetDb"][0]["Link"]]
len(pmc_ids)
The 'Link' key is inside a list. So, change pmc_ids = [link["Id"] for link in results["LinkSetDb"]["Link"]] to pmc_ids = [link["Id"] for link in results["LinkSetDb"][0]["Link"]].
To generate csv file, the code would be something like this:
import json
import csv

with open('Citation_with_ID.json', 'r') as f_json:
    json_dict = json.load(f_json)

csv_headers = ["IdList", "length"]
csv_values = []
for i in json_dict:
    if len(i["LinkSetDb"]) > 0:
        pmc_ids = [link["Id"] for link in i["LinkSetDb"][0]["Link"]]
    else:
        pmc_ids = []
    length = len(pmc_ids)
    if len(i['IdList']) == 1:
        IdList = i['IdList'][0]
    else:
        IdList = None
    csv_values.append([IdList, length])

# newline='' prevents blank rows in the CSV on Windows
with open('mycsvfile.csv', 'w', newline='') as f_csv:
    w = csv.writer(f_csv)
    w.writerow(csv_headers)
    w.writerows(csv_values)
If you want to store the values in a dictionary instead, something like this can be used:
values_list = list(zip(*csv_values))
dict(zip(values_list[0],values_list[1]))
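Equivalently (a small sketch, assuming csv_values holds [IdList, length] rows as built above), a dict comprehension avoids the zip/unzip round trip:

```python
# Sample rows in the same shape the loop above produces
csv_values = [["25209241", 12], ["25209630", 5]]

# Map each IdList value directly to its length
lengths = {id_list: length for id_list, length in csv_values}
print(lengths)  # {'25209241': 12, '25209630': 5}
```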

Update a specific key in JSON Array using PYTHON

I have a JSON file which has some key-value pairs in arrays. I need to update/replace the value for the key id with a value stored in a variable called var1.
The problem is that when I run my Python code, it adds a new key-value pair outside the inner array instead of replacing the existing one:
PYTHON SCRIPT:
import json

var1 = "abcdefghi"

with open('C:\\Projects\\scripts\\input.json', 'r+') as f:
    json_data = json.load(f)
    json_data['id'] = var1
    f.seek(0)
    f.write(json.dumps(json_data))
    f.truncate()
INPUT JSON:
{
    "channel": "AT",
    "username": "Maintenance",
    "attachments": [
        {
            "fallback": "[Urgent]:",
            "pretext": "[Urgent]:",
            "color": "#D04000",
            "fields": [
                {
                    "title": "SERVERS:",
                    "id": "popeye",
                    "short": false
                }
            ]
        }
    ]
}
OUTPUT:
{
    "username": "Maintenance",
    "attachments": [
        {
            "color": "#D04000",
            "pretext": "[Urgent]:",
            "fallback": "[Urgent]:",
            "fields": [
                {
                    "short": false,
                    "id": "popeye",
                    "title": "SERVERS:"
                }
            ]
        }
    ],
    "channel": "AT",
    "id": "abcdefghi"
}
Below will update the id inside fields :
json_data['attachments'][0]['fields'][0]['id'] = var1
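If the position of "id" can vary, a small recursive helper can replace every existing occurrence of a key anywhere in the nested structure (a sketch; update_key is a hypothetical name, not from the answer above):

```python
def update_key(obj, key, value):
    """Recursively replace the value of `key` wherever it already exists."""
    if isinstance(obj, dict):
        for k in obj:
            if k == key:
                obj[k] = value
            else:
                update_key(obj[k], key, value)
    elif isinstance(obj, list):
        for item in obj:
            update_key(item, key, value)

data = {"attachments": [{"fields": [{"title": "SERVERS:", "id": "popeye"}]}]}
update_key(data, "id", "abcdefghi")
print(data["attachments"][0]["fields"][0]["id"])  # abcdefghi
```

Note this updates only keys that are already present, so it cannot accidentally add a top-level "id" the way the original script did.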

Filtering out desired data from a JSON file (Python)

This is a sample of my JSON file:
{
  "pops": [
    {
      "name": "pop_a",
      "subnets": {
        "Public": ["1.1.1.0/24,2.2.2.0/24"],
        "Private": ["192.168.0.0/24,192.168.1.0/24"],
        "more DATA": ""
      }
    },
    {
      "name": "pop_b",
      "subnets": {
        "Public": ["3.3.3.0/24,4.4.4.0/24"],
        "Private": ["192.168.2.0/24,192.168.3.0/24"],
        "more DATA": ""
      }
    }
  ]
}
After I read it, I want to make a dict object and store some of the things that I need from this file.
I want my object to look like this:
[{
"name": "pop_a",
"subnets": {"Public": ["1.1.1.0/24,2.2.2.0/24"],"Private": ["192.168.0.0/24,192.168.1.0/24"]}
},
{
"name": "pop_b",
"subnets": {"Public": ["3.3.3.0/24,4.4.4.0/24"],"Private": ["192.168.2.0/24,192.168.3.0/24"]}
}]
Then I want to be able to access some of the Public/Private values.
Here is what I tried; I know there are also update() and setdefault(), which gave the same unwanted results:
def my_funckion():
    nt_json = [{'name': "", 'subnets': []}]
    Pname = []
    Psubnet = []
    for pop in pop_json['pops']:  # it prints only the last key/value
        nt_json[0]['name'] = pop['name']
        nt_json[0]['subnet'] = pop['subnet']
    pprint(nt_json)
    for pop in pop_json['pops']:
        """
        it prints the names in a row, then all of the ips
        """
        Pname.append(pop['name'])
        Pgre.append(pop['subnet'])
    nt_json['pop_name'] = Pname
    nt_json['subnet'] = Psubnet
    pprint(nt_json)
Here's a quick solution using a list comprehension. Note that this approach requires knowing the JSON structure in advance.
>>> import json
>>>
>>> data = ... # your data
>>> new_data = [{ "name" : x["name"], "subnets" : {"Public" : x["subnets"]["Public"], "Private" : x["subnets"]["Private"]}} for x in data["pops"]]
>>>
>>> print(json.dumps(new_data, indent=2))
[
{
"name": "pop_a",
"subnets": {
"Private": [
"192.168.0.0/24,192.168.1.0/24"
],
"Public": [
"1.1.1.0/24,2.2.2.0/24"
]
}
},
{
"name": "pop_b",
"subnets": {
"Private": [
"192.168.2.0/24,192.168.3.0/24"
],
"Public": [
"3.3.3.0/24,4.4.4.0/24"
]
}
}
]
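To then pull out specific Public/Private values, it helps to index the filtered list by pop name first (a sketch against the new_data list built above; the comma-joined subnet strings are split into individual CIDRs):

```python
data = {"pops": [
    {"name": "pop_a", "subnets": {"Public": ["1.1.1.0/24,2.2.2.0/24"],
                                  "Private": ["192.168.0.0/24,192.168.1.0/24"],
                                  "more DATA": ""}},
    {"name": "pop_b", "subnets": {"Public": ["3.3.3.0/24,4.4.4.0/24"],
                                  "Private": ["192.168.2.0/24,192.168.3.0/24"],
                                  "more DATA": ""}},
]}

# Same filtering as the answer above
new_data = [{"name": x["name"],
             "subnets": {"Public": x["subnets"]["Public"],
                         "Private": x["subnets"]["Private"]}}
            for x in data["pops"]]

# Index by pop name for easy lookup, then split the comma-joined strings
by_name = {pop["name"]: pop["subnets"] for pop in new_data}
public_a = by_name["pop_a"]["Public"][0].split(',')
print(public_a)  # ['1.1.1.0/24', '2.2.2.0/24']
```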
