Python - Problem extracting data from nested json

I have a problem extracting data from JSON, and I have tried several different approaches. I was able to extract the ID itself, but I can't manage to get the details of the fields beneath it.
Below is my JSON:
{
"params": {
"cid": "15482782896",
"datemax": "20190831",
"datemin": "20190601",
"domains": [
"url.com"
],
},
"results": {
"59107": {
"url.com": {
"1946592": {
"data": {
"2019-06-01": {
"ENGINE": {
"DEVICE": {
"": {
"position": 21,
"url": "url3.com"
}
}
}
},
"2019-07-01": {
"ENGINE": {
"DEVICE": {
"": {
"position": 4,
"url": "url3.com"
}
}
}
},
"2019-08-01": {
"ENGINE": {
"DEVICE": {
"": {
"position": 2,
"url": "url3.com"
}
}
}
}
},
"keyword": {
"title": "python_1",
"volume": 10
}
},
"1946602": {
"data": {
"2019-06-01": {
"ENGINE": {
"DEVICE": {
"": {
"position": 5,
"url": "url1.com"
}
}
}
},
"2019-07-01": {
"ENGINE": {
"DEVICE": {
"": {
"position": 12,
"url": "url1.com"
}
}
}
},
"2019-08-01": {
"ENGINE": {
"DEVICE": {
"": {
"position": 10.25,
"url": "url1.com"
}
}
}
}
},
"keyword": {
"title": "python_2",
"volume": 20
}
}
}
}
}
}
I tried the following code, but the result contains only the IDs:
import json
import csv
def get_leaves(item, key=None):
    if isinstance(item, dict):
        leaves = {}
        for i in item.keys():
            leaves.update(get_leaves(item[i], i))
        return leaves
    elif isinstance(item, list):
        leaves = {}
        for i in item:
            leaves.update(get_leaves(i, key))
        return leaves
    else:
        return {key : item}
with open('me_filename') as f_input:
    json_data = json.load(f_input)
fieldnames = set()
for entry in json_data:
    fieldnames.update(get_leaves(entry).keys())
with open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.DictWriter(f_output, fieldnames=sorted(fieldnames))
    csv_output.writeheader()
    csv_output.writerows(get_leaves(entry) for entry in json_data)
I also tried pandas, but it failed to parse the file properly as well:
import io
import json
import pandas as pd
with open('me_filename', encoding='utf-8') as f_input:
    df = pd.read_json(f_input , orient='None')
df.to_csv('output.csv', encoding='utf-8')
The result I need is:
ID Name page volume url 2019-06-01 2019-07-01 2019-08-01 2019-09-01
1946592 python_1 url.com 10 url3.com 21 4 2 null
1946602 python_2 url.com 20 url1.com 5 12 10,25 null
What am I doing wrong?

This is a somewhat convoluted solution, and it no longer resembles the code you provided, but I believe it will resolve your issue.
First of all, I had a problem with the provided JSON (the trailing ',' on line 8, after the "domains" list). After removing it, I managed to generate:
Output (temp.csv)
ID,Name,Page,Volume,Url,2019-08-01,2019-07-01,2019-06-01,
1946592,python_1,url.com,10,url3.com,2,4,21,
1946602,python_2,url.com,20,url1.com,10.25,12,5,
using the following:
import json
dates: set = set()
# Collect the data
def get_breakdown(json_data):
    collected_data = []
    for result in json_data['results']:
        for page in json_data['results'][result]:
            for _id in json_data['results'][result][page]:
                entry = json_data['results'][result][page][_id]
                data_struct = {
                    'ID': _id,
                    'Name': entry['keyword']['title'],
                    'Page': page,
                    'Volume': entry['keyword']['volume'],
                    'Dates': {}
                }
                for date in dates:
                    if date in entry['data']:
                        leaf = entry['data'][date]['ENGINE']['DEVICE']['']
                        data_struct['URL'] = leaf['url']
                        data_struct['Dates'][date] = {'Position' : leaf['position']}
                    else:
                        data_struct['Dates'][date] = {'Position' : 'null'}
                collected_data.append(data_struct)
    return collected_data
# Collect all dates across the whole data
# structure and save them to a set
def get_dates(json_data):
    for result in json_data['results']:
        for page in json_data['results'][result]:
            for _id in json_data['results'][result][page]:
                for date in json_data['results'][result][page][_id]['data']:
                    dates.add(date)
# Write to .csv file
def write_csv(collected_data, file_path):
    # The with block closes the file even if writing fails
    with open(file_path, "w") as f:
        # CSV Title
        date_string = ''
        for date in dates:
            date_string = '{0}{1},'.format(date_string, date)
        f.write('ID,Name,Page,Volume,Url,{0}\n'.format(date_string))
        # Data
        for data in collected_data:
            position_string = ''
            for date in dates:
                position_string = '{0}{1},'.format(position_string, data['Dates'][date]['Position'])
            f.write('{0},{1},{2},{3},{4},{5}\n'.format(
                data['ID'],
                data['Name'],
                data['Page'],
                data['Volume'],
                data['URL'],
                position_string
            ))
# Code Body
with open('me_filename.json') as f_input:
    json_data = json.load(f_input)
get_dates(json_data)
write_csv(get_breakdown(json_data), "output.csv")
Hopefully you can follow the code and it does what you expect. I am sure it can be made much more robust; however, as mentioned, I couldn't make it work with the base code you provided.

After a small modification your code works great, but I realized that putting each date on its own row would be a better format.
I tried to modify your solution to produce this, but I'm still too inexperienced in Python to manage it. Could you show me how to get a CSV file in this format?
Output (temp.csv)
ID,Name,Page,Volume,Url,data,value,
1946592,python_1,url.com,10,url3.com,2019-08-01,2
1946592,python_1,url.com,10,url3.com,2019-07-01,4
1946592,python_1,url.com,10,url3.com,2019-06-01,21
1946602,python_2,url.com,20,url1.com,2019-08-01,10.25,
1946602,python_2,url.com,20,url1.com,2019-07-01,12,
1946602,python_2,url.com,20,url1.com,2019-06-01,5,
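One possible way to get the one-row-per-date layout asked for above is sketched here. It is not the original answerer's code: the function names `iter_long_rows` and `write_long_csv` are made up for illustration, and the sketch assumes every date entry follows the same `ENGINE` / `DEVICE` / `""` path as the sample JSON in the question.

```python
import csv
import io

def iter_long_rows(json_data):
    """Yield one (ID, Name, Page, Volume, Url, date, value) tuple per date."""
    for pages in json_data["results"].values():
        for page, keywords in pages.items():
            for _id, entry in keywords.items():
                for date, by_engine in entry["data"].items():
                    # Assumption: same fixed path as in the question's sample JSON.
                    leaf = by_engine["ENGINE"]["DEVICE"][""]
                    yield (_id, entry["keyword"]["title"], page,
                           entry["keyword"]["volume"], leaf["url"],
                           date, leaf["position"])

def write_long_csv(json_data, f_output):
    """Write the header and one CSV row per (keyword, date) pair."""
    writer = csv.writer(f_output)
    writer.writerow(["ID", "Name", "Page", "Volume", "Url", "date", "value"])
    writer.writerows(iter_long_rows(json_data))

# Small in-memory sample shaped like the question's JSON.
sample = {
    "results": {
        "59107": {
            "url.com": {
                "1946592": {
                    "data": {
                        "2019-06-01": {"ENGINE": {"DEVICE": {"": {"position": 21, "url": "url3.com"}}}},
                        "2019-07-01": {"ENGINE": {"DEVICE": {"": {"position": 4, "url": "url3.com"}}}},
                    },
                    "keyword": {"title": "python_1", "volume": 10},
                }
            }
        }
    }
}

buffer = io.StringIO()
write_long_csv(sample, buffer)
print(buffer.getvalue())
```

To write a real file instead of the in-memory buffer, the same function could be called with a handle opened via `open('output.csv', 'w', newline='')`.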

Related

Parse complex JSON in Python

EDITED WITH LARGER JSON:
I have the following JSON and I need to get id element: 624ff9f71d847202039ec220
results": [
{
"id": "62503d2800c0d0004ee4636e",
"name": "2214524",
"settings": {
"dataFetch": "static",
"dataEntities": {
"variables": [
{
"id": "624ffa191d84720202e2ed4a",
"name": "temp1",
"device": {
"id": "624ff9f71d847202039ec220",
"name": "282c0240ea4c",
"label": "282c0240ea4c",
"createdAt": "2022-04-08T09:01:43.547702Z"
},
"chartType": "line",
"aggregationMethod": "last_value"
},
{
"id": "62540816330443111016e38b",
"device": {
"id": "624ff9f71d847202039ec220",
"name": "282c0240ea4c",
},
"chartType": "line",
}
]
}
...
Here is my code (EDITED)
url = "API_URL"
response = urllib.urlopen(url)
data = json.loads(response.read().decode("utf-8"))
print url
all_ids = []
for i in data['results']: # i is a dictionary
    for variable in i['settings']['dataEntities']['variables']:
        print(variable['id'])
        all_ids.append(variable['id'])
But I have the following error:
for variable in i['settings']['dataEntities']['variables']:
KeyError: 'dataEntities'
Could you please help?
Thanks!!

What is it printing when you print(fetc)? If you format the json, it will be easier to read, the current nesting is very hard to comprehend.
fetc is a string, not a dict. If you want the dict, you have to use the key.
Try:
url = "API_URL"
response = urllib.urlopen(url)
data = json.loads(response.read().decode("utf-8"))
print url
for i in data['results']:
    print(json.dumps(i['settings']))
    print(i['settings']['dataEntities'])
EDIT: To get to the id field, you'll need to dive further.
i['settings']['dataEntities']['variables'][0]['id']
So if you want all the ids you'll have to loop over the variables (assuming the list has more than one entry), and if you want them for all the settings, you'll need to loop over those too.
Full solution for you to try (EDITED after you uploaded the full JSON):
url = "API_URL"
response = urllib.urlopen(url)
data = json.loads(response.read().decode("utf-8"))
print url
all_ids = []
for i in data['results']: # i is a dictionary
    for variable in i['settings']['dataEntities']['variables']:
        print(variable['id'])
        all_ids.append(variable['id'])
        all_ids.append(variable['device']['id'])
Let me know if that works.
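The `KeyError: 'dataEntities'` in the question suggests that at least one entry in `results` has a `settings` object without that key, although the truncated JSON does not confirm this. Under that assumption, a defensive variant of the loop could use `dict.get` with defaults so that such entries are skipped rather than raising. The function name `collect_ids` and the sample data are illustrative only, and the sketch is written for Python 3.

```python
def collect_ids(data):
    """Collect variable ids and device ids, skipping entries that lack the nested keys."""
    all_ids = []
    for result in data.get("results", []):
        # .get() returns the default instead of raising KeyError when a key is absent.
        entities = result.get("settings", {}).get("dataEntities", {})
        for variable in entities.get("variables", []):
            if "id" in variable:
                all_ids.append(variable["id"])
            device_id = variable.get("device", {}).get("id")
            if device_id is not None:
                all_ids.append(device_id)
    return all_ids

# Hypothetical sample: the second entry has no "dataEntities" key.
sample = {
    "results": [
        {"settings": {"dataFetch": "static", "dataEntities": {"variables": [
            {"id": "624ffa191d84720202e2ed4a",
             "device": {"id": "624ff9f71d847202039ec220"}}
        ]}}},
        {"settings": {"dataFetch": "static"}},
    ]
}

print(collect_ids(sample))
```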

The shared JSON is not valid. A valid JSON similar to yours is:
{
"results": [
{
"settings": {
"dataFetch": "static",
"dataEntities": {
"variables": [
{
"id": "624ffa191d84720202e2ed4a",
"name": "temp1",
"span": "inherit",
"color": "#2ccce4",
"device": {
"id": "624ff9f71d847202039ec220"
}
}
]
}
}
}
]
}
To get a list of ids from your JSON you need a nested loop. A Pythonic way to write it is:
all_ids = [y["device"]["id"] for x in my_json["results"] for y in x["settings"]["dataEntities"]["variables"]]
Where my_json is your initial JSON.

Is there any way to convert specific JSON data to CSV?

I have JSON format which looks like
Here is the link https://drive.google.com/file/d/1RqU2s0dqjd60dcYlxEJ8vnw9_z2fWixd/view?usp=sharing
result =
{
"ERROR":[
],
"LinkSetDbHistory":[
],
"LinkSetDb":[
{
"Link":[
{
"Id":"8116078"
},
{
"Id":"7654180"
},
{
"Id":"7643601"
},
{
"Id":"7017037"
},
{
"Id":"6190213"
},
{
"Id":"5902265"
},
{
"Id":"5441934"
},
{
"Id":"5417587"
},
{
"Id":"5370323"
},
{
"Id":"5362514"
},
{
"Id":"4818642"
},
{
"Id":"4330602"
}
],
"DbTo":"pmc",
"LinkName":"pubmed_pmc_refs"
}
],
"DbFrom":"pubmed",
"IdList":[
"25209241"
]
},
{
"ERROR":[
],
"LinkSetDbHistory":[
],
"LinkSetDb":[
{
"Link":[
{
"Id":"7874507"
},
{
"Id":"7378719"
},
{
"Id":"6719480"
},
{
"Id":"5952809"
},
{
"Id":"4944516"
}
],
"DbTo":"pmc",
"LinkName":"pubmed_pmc_refs"
}
],
"DbFrom":"pubmed",
"IdList":[
"25209630"
]
},
For each record, I want the number of Ids in the Link array (12 for the first record) together with the record's IdList value:
"IdList":"25209241"
so the final output will be
IDList: length
25209241: 12 (Total number of Id in link array)
25209630 : 5 (Total number of Id in link array)
I have tried this code, but it does not work for either a single record or multiple records:
pmc_ids = [link["Id"] for link in results["LinkSetDb"]["Link"]]
len(pmc_ids)
How can I make this work for a large dataset?

You have "LinkSetDb" as a list containing a single dictionary but you are indexing it as if it is a dictionary. Use:
pmc_ids = [link["Id"] for link in result["LinkSetDb"][0]["Link"]]
len(pmc_ids)

The 'Link' key is inside a list. So, change pmc_ids = [link["Id"] for link in results["LinkSetDb"]["Link"]] to pmc_ids = [link["Id"] for link in results["LinkSetDb"][0]["Link"]].
To generate csv file, the code would be something like this:
import json
import csv
with open('Citation_with_ID.json', 'r') as f_json:
    json_data = f_json.read()
json_dict = json.loads(json_data)
csv_headers = ["IdList", "length"]
csv_values = []
for i in json_dict:
    # Records without any link set count as zero links
    if len(i["LinkSetDb"])>0:
        pmc_ids = [link["Id"] for link in i["LinkSetDb"][0]["Link"]]
    else:
        pmc_ids = []
    length = len(pmc_ids)
    if len(i['IdList'])==1:
        IdList = i['IdList'][0]
    else:
        IdList = None
    csv_values.append([IdList,length])
# newline='' avoids blank rows between records on Windows
with open('mycsvfile.csv', 'w', newline='') as f_csv:
    w = csv.writer(f_csv)
    w.writerow(csv_headers)
    w.writerows(csv_values)
If you want to store the values in a dictionary then something like this can be used:
values_list = list(zip(*csv_values))
dict(zip(values_list[0],values_list[1]))
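As an alternative to the `zip` step, the dictionary can also be built directly from the records. The following is a minimal sketch, assuming the file holds a list of records shaped like the ones in the question; `count_links` and the shortened sample are hypothetical names and data used only for illustration.

```python
def count_links(records):
    """Map each id in IdList to the number of Link entries in the first LinkSetDb item."""
    counts = {}
    for record in records:
        link_sets = record.get("LinkSetDb", [])
        # An empty LinkSetDb list means the record has no links.
        n_links = len(link_sets[0]["Link"]) if link_sets else 0
        for pubmed_id in record.get("IdList", []):
            counts[pubmed_id] = n_links
    return counts

# Shortened sample in the shape shown in the question.
records = [
    {"LinkSetDb": [{"Link": [{"Id": "8116078"}, {"Id": "7654180"}],
                    "DbTo": "pmc", "LinkName": "pubmed_pmc_refs"}],
     "DbFrom": "pubmed", "IdList": ["25209241"]},
    {"LinkSetDb": [], "DbFrom": "pubmed", "IdList": ["25209630"]},
]

print(count_links(records))
```

Unlike the `zip` version, this does not fail when the list of records is empty; it simply returns an empty dictionary.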

Extracting specific elements from multiple JSON files and adding into single Excel

So, basically I have two JSON files and from them I need to extract only "value" and add it to a single Excel sheet.
JSON file 1
{
"flower": {
"price": {
"type": "good",
"value": 5282.0,
"direction": "up"
}
},
"furniture": {
"price": {
"type": "comfy",
"value": 9074.0,
"direction": "down"
}
}
}
JSON file 2
{
"flower": {
"price": {
"type": "good",
"value": 827.0,
"direction": "up"
}
},
"furniture": {
"price": {
"type": "comfy",
"value": 468.0,
"direction": "down"
}
}
}
Now, the output should look like this in the Excel sheet
To solve this, here is my code so far, where JSON file 1 is json.json and file 2 is json12.json:
import json
import pandas as pd
with open('json.json', 'r') as f: data = json.load(f)
with open('json12.json', 'r') as f: data1 = json.load(f)
data = [{'key': k, 'value1': v['price']['value']} for k, v in data.items() if k in ['flower' , 'furniture']]
print(data)
data1 = [{'key': k, 'value2': v['price']['value']} for k, v in data.items() if k in ['flower' , 'furniture']]
print(data1)
df = pd.DataFrame(data).set_index('key')
df = pd.DataFrame(data1).set_index('key')
df.to_excel('xcel.xlsx')
After running this I'm not getting the desired output. Please help me with this; I'm new to Python, so it's hard for me to find the correct approach.

I believe this code does what you are requesting (if j1 and j2 are the jsons you are showing):
v1s = [j1['flower']['price']['value'], j1['furniture']['price']['value']]
v2s = [j2['flower']['price']['value'], j2['furniture']['price']['value']]
index = ['flower', 'furniture']
pd.DataFrame({'value1': v1s, 'value2': v2s, 'key': index}).set_index('key')
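Two things in the question's code appear to prevent the desired result: the second comprehension iterates over `data.items()` (which by then is a list) instead of `data1.items()`, and the second `df = ...` assignment replaces the first DataFrame rather than combining the two. A minimal standard-library sketch of the combining step follows; `merge_values` is a hypothetical helper, and `j1` and `j2` stand for the two parsed JSON files.

```python
def merge_values(j1, j2, keys=("flower", "furniture")):
    """Build one row per key holding the price value from each JSON object."""
    return [
        {"key": k,
         "value1": j1[k]["price"]["value"],
         "value2": j2[k]["price"]["value"]}
        for k in keys
    ]

# The two documents from the question, already parsed into dicts.
j1 = {
    "flower": {"price": {"type": "good", "value": 5282.0, "direction": "up"}},
    "furniture": {"price": {"type": "comfy", "value": 9074.0, "direction": "down"}},
}
j2 = {
    "flower": {"price": {"type": "good", "value": 827.0, "direction": "up"}},
    "furniture": {"price": {"type": "comfy", "value": 468.0, "direction": "down"}},
}

rows = merge_values(j1, j2)
print(rows)
```

The resulting rows could then be handed to pandas, for example `pd.DataFrame(rows).set_index('key').to_excel('xcel.xlsx')`, assuming pandas and an Excel writer such as openpyxl are installed.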

Problem with extracting data out of a list of JSONs

I want to evaluate JSON data using Python. However, I get the error message: string indices must be integers
I tested it with the following code, but it does not work. I tried Elasticsearch before, but I'm not that good at programming, so I decided to try a homemade solution.
(Sorry for the bad formatting, I am new to Stack Overflow.)
```
import requests, json, os
from elasticsearch import Elasticsearch
directory = "C:\\Users\\Felix Bildstein\\Desktop\\Test1"
Dateien = os.listdir(directory)
index_len = len(Dateien) - 2
n = 1
# Create your dictionary class
class my_dictionary(dict):
    # __init__ function
    def __init__(self):
        self = dict()
    # Function to add key:value
    def add(self, key, value):
        self[key] = value
# Main Function
dict_obj = my_dictionary()
dict_obj.add(1, 'Geeks')
while n <= index_len:
    try:
        a, *Dateien = Dateien
        n += 1
        f = open(a, "r")
        file_contents = (f.read())
        f.close
        dict_obj.add(n, file_contents)
    finally:
        print(n)
        print(file_contents['unitPrice'])
output = dict_obj
print(output)
with open('result.json', 'w') as fp:
    json.dump(output, fp)
f = open("dict.txt","w")
f.write( str(dict_obj) )
f.close()
```
It should output the corresponding value.
This is my test JSON:
{
"merchantOrderId": "302-08423880-89823764",
"creationTime": {
"usecSinceEpochUtc": "15555040944000000",
"granularity": "MICROSECOND"
},
"transactionMerchant": {
"name": "Amazon.de"
},
"lineItem": [{
"purchase": {
"status": "ACCEPTED",
"unitPrice": {
"amountMicros": "4690000",
"currencyCode": {
"code": "EUR"
},
"displayString": "EUR 4,20"
},
"returnsInfo": {
"isReturnable": true,
"daysToReturn": 30
},
"fulfillment": {
"location": {
"address": [""]
},
"timeWindow": {
"startTime": {
"usecSinceEpochUtc": "155615040000000",
"granularity": "DAY"
},
"endTime": {
"usecSinceEpochUtc": "155615040000000",
"granularity": "DAY"
}
}
},
"landingPageUrl": {
"link": "https://www.amazon.de/gp/r.html?C\u003d3ILR4VQSVD3HI\u0026K\u0026M\u003durn:rtn:msg:20190417124222996c5bb6751e45b5ba12aff8d350p0eu\u0026R\u003d2FEGGCJMDBAOF\u0026T\u003dC\u0026U\u003dhttps%3A%2F%2Fwww.amazon.de%2Fdp%2FB001J8I7VG%2Fref%3Dpe_3044161_185740101_TE_item\u0026H\u003d6EXBPJA679MVNLICLRRO4K1XPFCA\u0026ref_\u003dpe_3044161_185740101_TE_item"
},
"productInfo": {
"name": "tesa Powerstrips Bildernagel, selbstklebend, weiß, 2 Stück"
}
},
"name": "tesa Powerstrips Bildernagel, selbstklebend, weiß, 2 Stück"
}],
"priceline": [{
"type": "SUBTOTAL",
"amount": {
"amountMicros": "4690000",
"currencyCode": {
"code": "EUR"
}
}
}, {
"type": "DELIVERY",
"amount": {
"amountMicros": "0",
"currencyCode": {
"code": "EUR"
},
"displayString": "EUR 0,00"
}
}]
}
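The error message is consistent with `file_contents` being the raw text returned by `f.read()`: indexing a `str` with `'unitPrice'` raises "string indices must be integers". In the test JSON, `unitPrice` also sits below `lineItem` and `purchase` rather than at the top level. A minimal sketch of parsing the text first and then following that path is shown below; `unit_prices` is a hypothetical helper and the sample is a shortened copy of the test JSON.

```python
import json

def unit_prices(json_text):
    """Parse one order document and return the displayString of each line item's unitPrice."""
    order = json.loads(json_text)  # turns the str into nested dicts and lists
    return [item["purchase"]["unitPrice"]["displayString"]
            for item in order.get("lineItem", [])]

# Shortened copy of the test JSON from the question.
sample_text = """
{
  "merchantOrderId": "302-08423880-89823764",
  "lineItem": [{
    "purchase": {
      "status": "ACCEPTED",
      "unitPrice": {
        "amountMicros": "4690000",
        "currencyCode": {"code": "EUR"},
        "displayString": "EUR 4,20"
      }
    }
  }]
}
"""

print(unit_prices(sample_text))
```

When reading from a directory listing as in the question, each file name would also need to be joined with the directory (for example with `os.path.join(directory, a)`) before opening it, since `os.listdir` returns bare names.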

Converting a Text file to JSON format using Python

I am not new to programming, but I am not good with Python data structures. I would like to know how to convert a text file into JSON format using Python, since I heard the task is much easier in Python with the json module (import json).
The file looks like
Source Target Value
B cells Streptococcus pneumoniae 226
B cells Candida albicans 136
B cells Mycoplasma 120
For the first line, "B cells" is the source, "Streptococcus pneumoniae" is the target and "226" is the value. I have started on the code but could not finish it. Please help.
import json
prot2_names = {}
tmpfil = open("file.txt", "r");
for lin in tmpfil.readlines():
    flds = lin.rstrip().split("\t")
    prot2_names[flds[0]] = "\"" + flds[1] + "\""
    print prot2_names+"\t",
tmpfil.close()
I want the output to look like this:
{
"nodes": [
{
"name": "B cells"
},
{
"name": "Streptococcus pneumoniae"
},
{
"name": "Candida albicans"
},
{
"name": "Mycoplasma"
},
{
"links": [
{
"source": 0,
"target": 1,
"value": "226"
},
{
"source": 0,
"target": 2,
"value": "136"
},
{
"source": 0,
"target": 3,
"value": "120"
}
]
}

You can read it as a CSV file and convert it to JSON. Be careful, though: you have used spaces as the separator, so values that themselves contain spaces need special handling. If possible, use ',' as the separator instead of a space.
Here is working code for what you're trying to do:
import csv
import json
with open('file.txt', 'rb') as csvfile:
    filereader = csv.reader(csvfile, delimiter=' ')
    i = 0
    header = []
    out_data = []
    for row in filereader:
        row = [elem for elem in row if elem]
        if i == 0:
            i += 1
            header = row
        else:
            row[0:2] = [row[0]+" "+row[1]]
            _dict = {}
            for elem, header_elem in zip(row, header):
                _dict[header_elem] = elem
            out_data.append(_dict)
print json.dumps(out_data)
output,
[
{
"Source":"B cells",
"Target":"Streptococcus",
"Value":"pneumoniae"
},
{
"Source":"B cells",
"Target":"Candida",
"Value":"albicans"
},
{
"Source":"B cells",
"Target":"Mycoplasma",
"Value":"120"
},
{
"Source":"B cells",
"Target":"Neisseria",
"Value":"111"
},
{
"Source":"B cells",
"Target":"Pseudomonas",
"Value":"aeruginosa"
}
]
UPDATE: I just noticed your updated question with the JSON sample you require. I hope you can build it from the example I've written above.
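For the nodes/links structure requested in the question, one possible approach is sketched below. It rests on two assumptions that the thread does not confirm: the columns are tab-separated (the question's own code splits on "\t"), and "links" is meant to be a top-level sibling of "nodes" rather than an element inside it. `build_graph` is a hypothetical name, and the sketch targets Python 3.7+, where dicts keep insertion order.

```python
import json

def build_graph(lines):
    """Turn tab-separated Source/Target/Value lines into a nodes/links dict."""
    index = {}   # node name -> position in the "nodes" list
    links = []
    for line in lines[1:]:  # skip the header row
        if not line.strip():
            continue
        source, target, value = line.rstrip("\n").split("\t")
        for name in (source, target):
            # Give each new name the next free position.
            index.setdefault(name, len(index))
        links.append({"source": index[source],
                      "target": index[target],
                      "value": value})
    nodes = [{"name": name} for name in index]
    return {"nodes": nodes, "links": links}

# The sample rows from the question, assumed to be tab-separated.
sample = [
    "Source\tTarget\tValue",
    "B cells\tStreptococcus pneumoniae\t226",
    "B cells\tCandida albicans\t136",
    "B cells\tMycoplasma\t120",
]

graph = build_graph(sample)
print(json.dumps(graph, indent=2))
```

With a real file, the lines could come from `open('file.txt').readlines()`. Splitting on tabs keeps multi-word names such as "Streptococcus pneumoniae" intact, which splitting on spaces does not.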
