Extracting specific elements from multiple JSON files and adding into single Excel - python

So, basically I have two JSON files and from them I need to extract only "value" and add it to a single Excel sheet.
JSON file 1
{
"flower": {
"price": {
"type": "good",
"value": 5282.0,
"direction": "up"
}
},
"furniture": {
"price": {
"type": "comfy",
"value": 9074.0,
"direction": "down"
}
}
}
JSON file 2
{
"flower": {
"price": {
"type": "good",
"value": 827.0,
"direction": "up"
}
},
"furniture": {
"price": {
"type": "comfy",
"value": 468.0,
"direction": "down"
}
}
}
Now, the output should look like this in the Excel sheet
therefore, for solving this question here's the code so far , where JSON file 1 is json.json and file 2 is json12.json
import json
import pandas as pd
with open('json.json', 'r') as f: data = json.load(f)
with open('json12.json', 'r') as f: data1 = json.load(f)
data = [{'key': k, 'value1': v['price']['value']} for k, v in data.items() if k in ['flower' , 'furniture']]
print(data)
data1 = [{'key': k, 'value2': v['price']['value']} for k, v in data.items() if k in ['flower' , 'furniture']]
print(data1)
df = pd.DataFrame(data).set_index('key')
df = pd.DataFrame(data1).set_index('key')
df.to_excel('xcel.xlsx')
after running this I'm not getting the desired output...so, plz help me in this as I'm new in learning python so, it's very hard to address the correct approach..

I believe this code does what you are requesting (if j1 and j2 are the jsons you are showing):
v1s = [j1['flower']['price']['value'], j1['furniture']['price']['value']]
v2s = [j2['flower']['price']['value'], j2['furniture']['price']['value']]
index = ['flower', 'furniture']
pd.DataFrame({'value1': v1s, 'value2': v2s, 'key': index}).set_index('key')

Related

How to select multiple JSON Objects using python

I have a Json data as following. The Json has many such objects with same NameId's:
[{
"NameId": "name1",
"exp": {
"exp1": "test1"
}
}, {
"NameId": "name1",
"exp": {
"exp2": "test2"
}
}
]
Now, what I am after is to create a new Json Object that has a merged exp and create a file something like below, so that I do not have multiple NameId:
[{
"NameId": "name1",
"exp": {
"exp1": "test1",
"exp2": "test2"
}
}
]
Is there a possibility I can achive it using Python?
You can do the manual work, merging the entries while rebuilding the structure. You can keep a dictionary with the exp to merge them.
import json
jsonData = [{
"NameId": "name1",
"exp": {
"exp1": "test1"
}
}, {
"NameId": "name1",
"exp": {
"exp2": "test2"
}
}, {
"NameId": "name2",
"exp": {
"exp3": "test3"
}
}]
result = []
expsDict = {}
for entry in jsonData:
nameId = entry["NameId"]
exp = entry["exp"]
if nameId in expsDict:
# Merge exp into resultExp.
# Note that resultExp belongs to both result and expsDict,
# changes made will be reflected in both containers!
resultExp = expsDict[nameId]
for (expName, expValue) in exp.items():
resultExp[expName] = expValue
else:
# Copy copy copy, otherwise merging would modify jsonData too!
exp = exp.copy()
entry = entry.copy()
entry["exp"] = exp
# Add a new item to the result
result.append(entry)
# Store exp to later merge other entries with the same name.
expsDict[nameId] = exp
print(result)
You can use itertools.groupby and functools.reduce
d = [{
"NameId": "name1",
"exp": {
"exp1": "test1"
}
}, {
"NameId": "name1",
"exp": {
"exp2": "test2"
}
}]
from itertools import groupby
[ {'NameId': k, 'exp': reduce(lambda x,y : {**x["exp"], **y["exp"]} , v) } for k,v in groupby(sorted(d, key=lambda x: x["NameId"]), lambda x: x["NameId"]) ]
#output
[{'NameId': 'name1', 'exp': {'exp1': 'test1', 'exp2': 'test2'}}]

Applying formula to the first 2 columns of dataframes and showing the result in the 3 column through python

So, i had two jsons files from where values were extracted into a dataframe and saved in an excel sheet
JSON file1
{
"flower": {
"price": {
"type": "good",
"value": 5282.0,
"direction": "up"
}
},
"furniture": {
"price": {
"type": "comfy",
"value": 9074.0,
"direction": "down"
}
}
}
JSON file2
{
"flower": {
"price": {
"type": "good",
"value": 827.0,
"direction": "up"
}
},
"furniture": {
"price": {
"type": "comfy",
"value": 468.0,
"direction": "down"
}
}
}
to create a dataframe and store into the excel sheet the code was
import json
import pandas as pd
with open('jsonfile1.json', 'r') as f1:
data1 = json.load(f1)
with open('jsonfile2.json', 'r') as f2:
data2 = json.load(f2)
col1 = [data1['flower']['price']['value'], data1['furniture']['price']['value']]
col2 = [data2['flower']['price']['value'], data2['furniture']['price']['value']]
index = ['flower' , 'furniture' ]
df = pd.DataFrame({'value 1': col1, 'value2': col2, 'Test': index}).set_index('Test')
# storing into the excel file
df.to_excel('file.xlsx')
now the output I want is: to apply formula [{(value 1- value2)/value1}*100] on column 1 and column 2 , where the result should be displayed in the third column of the dataframe , like this in excel sheet
as, i am quite new in learning python, therefore I'm confused that how to apply formulas to the columns in dataframe. so , I'd be grateful if someone could help me!
You can simply use the columns for your calculation:
df["col3"] = ((df["value 1"]-df["value2"])/df["value 1"])*100

Convert Pandas Dataframe or csv file to Custom Nested JSON

I have a csv file with a DF with structure as follows:
my dataframe:
I want to enter the data to the following JSON format using python. I looked to couple of links (but I got lost in the nested part). The links I checked:
How to convert pandas dataframe to uniquely structured nested json
convert dataframe to nested json
"PHI": 2,
"firstname": "john",
"medicalHistory": {
"allergies": "egg",
"event": {
"inPatient":{
"hospitalized": {
"visit" : "7-20-20",
"noofdays": "5",
"test": {
"modality": "xray"
}
"vitalSign": {
"temperature": "32",
"heartRate": "80"
},
"patientcondition": {
"headache": "1",
"cough": "0"
}
},
"icu": {
"visit" : "",
"noofdays": "",
},
},
"outpatient": {
"visit":"5-20-20",
"vitalSign": {
"temperature": "32",
"heartRate": "80"
},
"patientcondition": {
"headache": "1",
"cough": "1"
},
"test": {
"modality": "blood"
}
}
}
}
If anyone can help me with the nested array, that will be really helpful.
You need one or more helper functions to unpack the data in the table like this. Write main helper function to accept two arguments: 1. df and 2. schema. The schema will be used to unpack the df into a nested structure for each row in the df. The schema below is an example of how to achieve this for a subset of the logic you describe. Although not exactly what you specified in example, should be enough of hint for you to complete the rest of the task on your own.
from operator import itemgetter
groupby_idx = ['PHI', 'firstName']
groups = df.groupby(groupby_idx, as_index=False, drop=False)
schema = {
"event": {
"eventType": itemgetter('event'),
"visit": itemgetter('visit'),
"noOfDays": itemgetter('noofdays'),
"test": {
"modality": itemgetter('test')
},
"vitalSign": {
"temperature": itemgetter('temperature'),
"heartRate": itemgetter('heartRate')
},
"patientCondition": {
"headache": itemgetter('headache'),
"cough": itemgetter('cough')
}
}
}
def unpack(obj, schema):
tmp = {}
for k, v in schema.items():
if isinstance(v, (dict,)):
tmp[k] = unpack(obj, v)
if callable(v):
tmp[k] = v(obj)
return tmp
def apply_unpack(groups, schema):
results = {}
for gidx, df in groups:
events = []
for ridx, obj in df.iterrows():
d = unpack(obj, schema)
events.append(d)
results[gidx] = events
return results
unpacked = apply_unpack(groups, schema)

Pandas json normalize an object property containing an array of objects

If I have json data formatted like this:
{
"result": [
{
"id": 878787,
"name": "Testing",
"schema": {
"id": 3463463,
"smartElements": [
{
"svKey": "Model",
"value": {
"type": "type1",
"value": "ThisValue"
}
},
{
"svKey": "SecondKey",
"value": {
"type": "example",
"value": "ThisValue2"
}
}
]
}
},
{
"id": 333,
"name": "NameName",
"schema": {
"id": 1111,
"smartElements": [
{
"svKey": "Model",
"value": {
"type": "type1",
"value": "NewValue"
}
},
{
"svKey": "SecondKey",
"value": {
"type": "example",
"value": "ValueIs"
}
}
]
}
}
]
}
is there a way to normalize it so I end up with records:
name Model SecondKey
Testing ThisValue ThisValue2
NameName NewValue ValueIs
I can get the smartElements to a pandas series but I can't figure out a way to break out smartElements[x].svKey to a column header and smartElements[x].value.value to the value for that column and/or merge it.
I'd skip trying to use a pre-baked solution and just navigate the json yourself.
import json
import pandas as pd
data = json.load(open('my.json'))
records = []
for d in data['result']:
record = {}
record['name'] = d['name']
for ele in d['schema']['smartElements']:
record[ele['svKey']] = ele['value']['value']
records.append(record)
pd.DataFrame(records)
name Model SecondKey
0 Testing ThisValue ThisValue2
1 NameName NewValue ValueIs
My solution
import pandas as pd
import json
with open('test.json') as f:
a = json.load(f)
d = pd.json_normalize(data=a['result'], errors='ignore', record_path=['schema', 'smartElements'], meta=['name'])
print(d)
produces
svKey value.type value.value name
0 Model type1 ThisValue Testing
1 SecondKey example ThisValue2 Testing
2 Model type1 NewValue NameName
3 SecondKey example ValueIs NameName

Python - Problem extracting data from nested json

I have a problem extracting data from json, I tried n different ways. I was able to extract the ID itself, unfortunately I can't manage to show the details of the field.
Below is my json
{
"params": {
"cid": "15482782896",
"datemax": "20190831",
"datemin": "20190601",
"domains": [
"url.com"
],
},
"results": {
"59107": {
"url.com": {
"1946592": {
"data": {
"2019-06-01": {
"ENGINE": {
"DEVICE": {
"": {
"position": 21,
"url": "url3.com"
}
}
}
},
"2019-07-01": {
"ENGINE": {
"DEVICE": {
"": {
"position": 4,
"url": "url3.com"
}
}
}
},
"2019-08-01": {
"ENGINE": {
"DEVICE": {
"": {
"position": 2,
"url": "url3.com"
}
}
}
}
},
"keyword": {
"title": "python_1",
"volume": 10
}
},
"1946602": {
"data": {
"2019-06-01": {
"ENGINE": {
"DEVICE": {
"": {
"position": 5,
"url": "url1.com"
}
}
}
},
"2019-07-01": {
"ENGINE": {
"DEVICE": {
"": {
"position": 12,
"url": "url1.com"
}
}
}
},
"2019-08-01": {
"ENGINE": {
"DEVICE": {
"": {
"position": 10.25,
"url": "url1.com"
}
}
}
}
},
"keyword": {
"title": "python_2",
"volume": 20
}
}
}
}
}
}
I tried the following code but I got the result in the form of id itself
import json
import csv
def get_leaves(item, key=None):
if isinstance(item, dict):
leaves = {}
for i in item.keys():
leaves.update(get_leaves(item[i], i))
return leaves
elif isinstance(item, list):
leaves = {}
for i in item:
leaves.update(get_leaves(i, key))
return leaves
else:
return {key : item}
with open('me_filename') as f_input:
json_data = json.load(f_input)
fieldnames = set()
for entry in json_data:
fieldnames.update(get_leaves(entry).keys())
with open('output.csv', 'w', newline='') as f_output:
csv_output = csv.DictWriter(f_output, fieldnames=sorted(fieldnames))
csv_output.writeheader()
csv_output.writerows(get_leaves(entry) for entry in json_data)
I also tried to use the pandas but also failed to parse properly
import io
import json
import pandas as pd
with open('me_filename', encoding='utf-8') as f_input:
df = pd.read_json(f_input , orient='None')
df.to_csv('output.csv', encoding='utf-8')
The result I'd need to get it :
ID Name page volume url 2019-06-01 2019-07-01 2019-08-01 2019-09-01
1946592 python_1 url.com 10 url3.com 21 4 2 null
1946602 python_2 url.com 20 url1.com 5 12 10,25 null
What could I do wrong?
Hmm this is a bit of a convoluted solution and it looks very messy and no-longer looks like the code provided however I believe it will resolve your issue.
First of all I had a problem with the provided Json (due to the trailing ',' on line 8) however have managed to generate:
Output (temp.csv)
ID,Name,Page,Volume,Url,2019-08-01,2019-07-01,2019-06-01,
1946592,python_1,url.com,10,url3.com,2,4,21,
1946602,python_2,url.com,20,url1.com,10.25,12,5,
using the following:
import json
dates: set = set()
# Collect the data
def get_breakdown(json):
collected_data = []
for result in json['results']:
for page in json['results'][result]:
for _id in json['results'][result][page]:
data_struct = {
'ID': _id,
'Name': json['results'][result][page][_id]['keyword']['title'],
'Page': page,
'Volume': json['results'][result][page][_id]['keyword']['volume'],
'Dates': {}
}
for date in dates:
if date in json['results'][result][page][_id]['data']:
data_struct['URL'] = json['results'][result][page][_id]['data'][date]['ENGINE']['DEVICE']['']['url']
data_struct['Dates'][date] = {'Position' : json['results'][result][page][_id]['data'][date]['ENGINE']['DEVICE']['']['position']}
else:
data_struct['Dates'][date] = {'Position' : 'null'}
collected_data.append(data_struct)
return collected_data
# Collect all dates across the whole data
# structure and save them to a set
def get_dates(json):
for result in json['results']:
for page in json['results'][result]:
for _id in json['results'][result][page]:
for date in json['results'][result][page][_id]['data']:
dates.add(date)
# Write to .csv file
def write_csv(collected_data, file_path):
f = open(file_path, "w")
# CSV Title
date_string = ''
for date in dates:
date_string = '{0}{1},'.format(date_string, date)
f.write('ID,Name,Page,Volume,Url,{0}\n'.format(date_string))
# Data
for data in collected_data:
position_string = ''
for date in dates:
position_string = '{0}{1},'.format(position_string, data['Dates'][date]['Position'])
f.write('{0},{1},{2},{3},{4},{5}\n'.format(
data['ID'],
data['Name'],
data['Page'],
data['Volume'],
data['URL'],
position_string
))
# Code Body
with open('me_filename.json') as f_input:
json_data = json.load(f_input)
get_dates(json_data)
write_csv(get_breakdown(json_data), "output.csv")
Hopefully you can follow the code and it does what is expected. I am sure that it can be made much more reliable - however as previously mentioned I couldn't make it work with the base code you provided.
After a small modification your code works great, but I noticed that showing the date as the next line would be a better solution in the format.
I tried to modify your solution to this form, but I'm still too weak in python to easily deal with it. Can you still tell me how you can do it to achieve this csv file format?
Output(temp.csv)
ID,Name,Page,Volume,Url,data,value,
1946592,python_1,url.com,10,url3.com,2019-08-01,2
1946592,python_1,url.com,10,url3.com,2019-07-01,4
1946592,python_1,url.com,10,url3.com,2019-06-01,21
1946602,python_2,url.com,20,url1.com,2019-08-01,10.25,
1946602,python_2,url.com,20,url1.com,2019-07-01,12,
1946602,python_2,url.com,20,url1.com,2019-06-01,5,

Categories

Resources