I am trying to convert JSON data into a CSV in Python and found this code on Stack Exchange from a while back (link: How can I convert JSON to CSV?). It no longer works in Python 3, giving me different errors. Does anyone know how to fix it for Python 3? Thanks.
Below is my JSON data:
{ "fruit": [
{ "name": "Apple",
"binomial name": "Malus domestica",
"major_producers": [ "China", "United States", "Turkey" ],
"nutrition":
{ "carbohydrates": "13.81g",
"fat": "0.17g",
"protein": "0.26g"
}
},
{ "name": "Orange",
"binomial name": "Citrus x sinensis",
"major_producers": [ "Brazil", "United States", "India" ],
"nutrition":
{ "carbohydrates": "11.75g",
"fat": "0.12g",
"protein": "0.94g"
}
},
{ "name": "Mango",
"binomial name": "Mangifera indica",
"major_producers": [ "India", "China", "Thailand" ],
"nutrition":
{ "carbohydrates": "15g",
"fat": "0.38g",
"protein": "0.82g"
}
}
] }
The output CSV should look like
The easiest way would be to throw the desired dict into a pandas DataFrame and use its .to_csv() method:
import pandas as pd

json_data = { "fruit": [ { "name": "Apple", "binomial name": "Malus domestica", "major_producers": [ "China", "United States", "Turkey" ], "nutrition": { "carbohydrates": "13.81g", "fat": "0.17g", "protein": "0.26g" } }, { "name": "Orange", "binomial name": "Citrus x sinensis", "major_producers": [ "Brazil", "United States", "India" ], "nutrition": { "carbohydrates": "11.75g", "fat": "0.12g", "protein": "0.94g" } }, { "name": "Mango", "binomial name": "Mangifera indica", "major_producers": [ "India", "China", "Thailand" ], "nutrition": { "carbohydrates": "15g", "fat": "0.38g", "protein": "0.82g" } } ] }
df = pd.DataFrame(json_data['fruit'])
df.to_csv('/wherever/file/shall/roam/test.csv')
which leads to a CSV file in which the major_producers and nutrition columns still hold the raw list and dict values.
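If the nested nutrition dict should become separate columns instead, a minimal sketch using pd.json_normalize (assuming pandas 1.0 or later, where json_normalize is exposed at the top level; the output path is just an example) would be:

import pandas as pd

# json_data as defined in the snippet above
df = pd.json_normalize(json_data['fruit'])
# columns now include nutrition.carbohydrates, nutrition.fat and nutrition.protein;
# major_producers remains a single column holding a Python list
df.to_csv('test.csv', index=False)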
Still using pandas, but with a slightly different approach, treating your JSON as a dictionary:
import pandas as pd
from pprint import pprint
x = { "fruit": [ { "name": "Apple", "binomial name": "Malus domestica", "major_producers": [ "China", "United States", "Turkey" ], "nutrition": { "carbohydrates": "13.81g", "fat": "0.17g", "protein": "0.26g" } }, { "name": "Orange", "binomial name": "Citrus x sinensis", "major_producers": [ "Brazil", "United States", "India" ], "nutrition": { "carbohydrates": "11.75g", "fat": "0.12g", "protein": "0.94g" } }, { "name": "Mango", "binomial name": "Mangifera indica", "major_producers": [ "India", "China", "Thailand" ], "nutrition": { "carbohydrates": "15g", "fat": "0.38g", "protein": "0.82g" } } ] }
Add some additional information to the dict that will give headers closer to the desired output:
for item in x['fruit']:
    for index, country in enumerate(item['major_producers']):
        new_key = 'major_producers' + str(index + 1)
        item[new_key] = country
    item['carbs'] = item['nutrition']['carbohydrates']
    item['fat'] = item['nutrition']['fat']
    item['protein'] = item['nutrition']['protein']
Pretty-print the updated dict:
pprint(x['fruit'])
Create the pandas dataframe from the list of dicts as in:
xdf = pd.DataFrame.from_dict(x['fruit'])
Use only the headers you require
xdf = xdf[['name', 'binomial name', 'major_producers1','major_producers2','major_producers3','carbs','fat','protein']]
Then, as @SpghttCd mentions, you can use the DataFrame's to_csv(). No need for the index in this case:
xdf.to_csv('filename.csv',index=False)
The csv file should look like this:
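Reconstructed from the sample data and the column selection above, it should come out roughly as:

name,binomial name,major_producers1,major_producers2,major_producers3,carbs,fat,protein
Apple,Malus domestica,China,United States,Turkey,13.81g,0.17g,0.26g
Orange,Citrus x sinensis,Brazil,United States,India,11.75g,0.12g,0.94g
Mango,Mangifera indica,India,China,Thailand,15g,0.38g,0.82g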
This time I have some strange JSON that I want to convert to something more readable, but I don't know how to do it in Python:
Current json format:
{'data': [{'VALUE': '{"filters":[ {"field":"example1","operation":"like","values":["Completed"]},{"field":"example2","operation":"like","values":["value1","value2","value3"]}]}'}]}
Json that I want to obtain for further data processing:
{
"filters": [
{
"field": "example1",
"operation": "like",
"values": [
"Completed"
]
},
{
"field": "example2",
"operation": "like",
"values": [
"value1",
"value2",
"value3",
]
}
]
}
Try:
import json
data = {
"data": [
{
"VALUE": '{"filters":[ {"field":"example1","operation":"like","values":["Completed"]},{"field":"example2","operation":"like","values":["value1","value2","value3"]}]}'
}
]
}
dct = json.loads(data["data"][0]["VALUE"])
print(dct)
Prints:
{
"filters": [
{"field": "example1", "operation": "like", "values": ["Completed"]},
{
"field": "example2",
"operation": "like",
"values": ["value1", "value2", "value3"],
},
]
}
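If the goal is the indented multi-line form shown in the question, json.dumps with an indent produces it, and json.dump writes it to a file for further processing (filters.json is just an example name):

import json

# dct is the parsed dict from the snippet above
print(json.dumps(dct, indent=4))

with open('filters.json', 'w') as f:
    json.dump(dct, f, indent=4)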
Currently I am using ParseHub to scrape some basic data about a list of countries; the JSON file for this can be seen below. I also want to scrape the current time of each country, which means going to another website where such information can be found, but the list of countries on that website is in a completely different order, meaning each country would end up with the incorrect time.
Is there a way I can scrape the time of each country and have it appended to the correct country's JSON object, or am I thinking about this the wrong way?
country.json
{
"country": [
{
"name": "China",
"pop": "1,438,801,917",
"area": "9,706,961 km²",
"growth": "0.39%",
"worldPer": "18.47%",
"rank": "1"
},
{
"name": "India",
"pop": "1,378,687,736",
"area": "3,287,590 km²",
"growth": "0.99%",
"worldPer": "17.70%",
"rank": "2"
},
{
"name": "United States",
"pop": "330,812,025",
"area": "9,372,610 km²",
"growth": "0.59%",
"worldPer": "4.25%",
"rank": "3"
}
{
time.json
{
"country": [
{
"name": "china",
"time": "18:36"
}
{
How would I go about adding this data to the China object in country.json?
Try this:
import json
with open('country.json') as f1, open('time.json') as f2:
    country = json.loads(f1.read())
    time = json.loads(f2.read())

country = {x['name'].lower(): x for x in country['country']}

for y in time['country']:
    if y['name'].lower() in country:
        country[y['name'].lower()]['time'] = y['time']

country = {'country': list(country.values())}

with open('country.json', 'w') as fw:
    json.dump(country, fw)
Output:
country.json
{
"country": [
{
"name": "China",
"pop": "1,438,801,917",
"area": "9,706,961 km²",
"growth": "0.39%",
"worldPer": "18.47%",
"rank": "1",
"time": "18:36"
},
{
"name": "India",
"pop": "1,378,687,736",
"area": "3,287,590 km²",
"growth": "0.99%",
"worldPer": "17.70%",
"rank": "2"
},
{
"name": "United States",
"pop": "330,812,025",
"area": "9,372,610 km²",
"growth": "0.59%",
"worldPer": "4.25%",
"rank": "3"
}
]
}
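One thing to note about the approach above: the lookup keys on the lowercased country name, so any time entry whose name does not match a country entry after lowercasing is silently skipped, and only China gains a time field here because time.json only contains an entry for china.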
{
"type": "Data",
"version": "1.0",
"box": {
"identifier": "abcdef",
"serial": "12345678"
},
"payload": {
"Type": "EL",
"Version": "1",
"Result": "Successful",
"Reference": null,
"Box": {
"Identifier": "abcdef",
"Serial": "12345678"
},
"Configuration": {
"EL": "1"
},
"vent": [
{
"ventType": "Arm",
"Timestamp": "2020-03-18T12:17:04+10:00",
"Parameters": [
{
"Name": "Arm",
"Value": "LT"
},
{
"Name": "Status",
"Value": "LD"
}
]
},
{
"ventType": "Arm",
"Timestamp": "2020-03-18T12:17:24+10:00",
"Parameters": [
{
"Name": "Arm",
"Value": "LT"
},
{
"Name": "Status",
"Value": "LD"
}
]
},
{
"EventType": "TimeUpdateCompleted",
"Timestamp": "2020-03-18T02:23:21.2979668Z",
"Parameters": [
{
"Name": "ActualAdjustment",
"Value": "PT0S"
},
{
"Name": "CorrectionOffset",
"Value": "PT0S"
},
{
"Name": "Latency",
"Value": "PT0.2423996S"
}
]
}
]
}
}
If you're looking to transfer information from a JSON file to a CSV, then you can use the following code to read in a JSON file into a dictionary in Python:
import json
with open('data.txt') as json_file:
    data_dict = json.load(json_file)
You could then convert this dictionary into a list with either data_dict.items() or data_dict.values().
Then you just need to write this list to a CSV file, which you can do by looping through it.
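For the JSON shown above, a minimal sketch of that loop could write one row per event parameter. The input name data.txt comes from the snippet above, events.csv is just an example, and it assumes every entry under payload -> vent carries a Timestamp and a Parameters list of Name/Value pairs (note the sample mixes the keys ventType and EventType):

import csv
import json

with open('data.txt') as json_file:
    data_dict = json.load(json_file)

with open('events.csv', 'w', newline='') as out_file:
    writer = csv.writer(out_file)
    writer.writerow(['event_type', 'timestamp', 'name', 'value'])
    for event in data_dict['payload']['vent']:
        # the sample data uses both ventType and EventType for the event type
        event_type = event.get('ventType') or event.get('EventType')
        for param in event['Parameters']:
            writer.writerow([event_type, event['Timestamp'], param['Name'], param['Value']])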
Below is the dataframe
df = pd.DataFrame([['xxx xxx','specs','67646546','TEST 123','United States of America']], columns = ['name', 'type', 'aim', 'aimd','context' ])
I am trying to add an object 'aimd' under 'data'.
Below is the format
{
"entities": [{
"name": "xxx xxx",
"type": "specs",
"data": {
"attributes": {
"aimd": {
"values": [{
"value": "xxxxx",
"source": "internal",
"locale": "en_Us"
}
]
}
},
"contexts": [{
"attributes": {
"aim": {
"values": [{
"value": "67646546",
"source": "internal",
"locale": "en_Us"
}
]
}
},
"context": {
"country": "United States of America"
}
}
]
}
}
]
}
It's just a matter of inserting an additional array:
import pandas as pd
import json
df = pd.DataFrame([['xxx xxx','specs','67646546','TEST 123','R12',43,'789S','XXX','SSSS','GGG','TTT','United States of America']], columns = ['name', 'type', 'aim', 'aimd','aim1','aim2','alim1','alim2','alim3','apim','asim','context' ])
exclude_list = ['name','type','aimd','context']
data = {'entities':[]}
for key, grp in df.groupby('name'):
    for idx, row in grp.iterrows():
        temp_dict_alpha = {'name': key, 'type': row['type'], 'data': {'attributes': {}, 'contexts': [{'attributes': {}, 'context': {'country': row['context']}}]}}
        attr_row = row[~row.index.isin(['name', 'type'])]
        for idx2, row2 in attr_row.items():  # Series.iteritems() was removed in pandas 2.0; items() behaves the same
            if idx2 not in exclude_list:
                dict_temp = {}
                dict_temp[idx2] = {'values': []}
                dict_temp[idx2]['values'].append({'value': row2, 'source': 'internal', 'locale': 'en_Us'})
                temp_dict_alpha['data']['contexts'][0]['attributes'].update(dict_temp)
            if idx2 == 'aimd':
                dict_temp = {}
                dict_temp[idx2] = {'values': []}
                dict_temp[idx2]['values'].append({'value': row2, 'source': 'internal', 'locale': 'en_Us'})
                temp_dict_alpha['data']['attributes'].update(dict_temp)
        data['entities'].append(temp_dict_alpha)
print(json.dumps(data, indent = 4))
Output:
{
"entities": [
{
"name": "xxx xxx",
"type": "specs",
"data": {
"attributes": {
"aimd": {
"values": [
{
"value": "TEST 123",
"source": "internal",
"locale": "en_Us"
}
]
}
},
"contexts": [
{
"attributes": {
"aim": {
"values": [
{
"value": "67646546",
"source": "internal",
"locale": "en_Us"
}
]
},
"aim1": {
"values": [
{
"value": "R12",
"source": "internal",
"locale": "en_Us"
}
]
},
"aim2": {
"values": [
{
"value": 43,
"source": "internal",
"locale": "en_Us"
}
]
},
"alim1": {
"values": [
{
"value": "789S",
"source": "internal",
"locale": "en_Us"
}
]
},
"alim2": {
"values": [
{
"value": "XXX",
"source": "internal",
"locale": "en_Us"
}
]
},
"alim3": {
"values": [
{
"value": "SSSS",
"source": "internal",
"locale": "en_Us"
}
]
},
"apim": {
"values": [
{
"value": "GGG",
"source": "internal",
"locale": "en_Us"
}
]
},
"asim": {
"values": [
{
"value": "TTT",
"source": "internal",
"locale": "en_Us"
}
]
}
},
"context": {
"country": "United States of America"
}
}
]
}
}
]
}
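If the generated structure should also land in a file rather than just being printed, json.dump writes it out directly; entities.json is just an example name:

# json is already imported in the snippet above
with open('entities.json', 'w') as f:
    json.dump(data, f, indent=4)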
NIST recently released all CVE data in JSON format, and I am trying to parse it out to add to a MySQL database so I can compare my security findings to what NIST shows.
The data is very confusing to parse because there is a lot of nesting, with some lists included.
Here is a snippet of the JSON.
{
"CVE_data_type": "CVE",
"CVE_data_format": "MITRE",
"CVE_data_version": "4.0",
"CVE_data_numberOfCVEs": "600",
"CVE_data_timestamp": "Fri Apr 28 16:00:10 EDT 2017",
"CVE_Items": [
{
"CVE_data_meta": {
"CVE_ID": "CVE-2007-6761"
},
"CVE_affects": {
"CVE_vendor": {
"CVE_data_version": "4.0",
"CVE_vendor_data": [
{
"CVE_vendor_name": "linux",
"CVE_product": {
"CVE_product_data": [
{
"CVE_data_version": "4.0",
"CVE_product_name": "linux_kernel",
"CVE_version": {
"CVE_version_data": [
{
"CVE_version_value": "2.6.23",
"CVE_version_affected": "<="
}
]
}
}
]
}
}
]
}
},
"CVE_configurations": {
"CVE_data_version": "4.0",
"CVE_configuration_data": [
{
"operator": "OR",
"cpe": [
{
"vulnerable": true,
"previousVersions": true,
"cpeMatchString": "cpe:/o:linux:linux_kernel:2.6.23",
"cpe23Uri": "cpe:2.3:o:linux:linux_kernel:2.6.23:*:*:*:*:*:*:*"
}
]
}
]
},
"CVE_description": {
"CVE_data_version": "4.0",
"CVE_description_data": [
{
"lang": "en",
"value": "drivers/media/video/videobuf-vmalloc.c in the Linux kernel before 2.6.24 does not initialize videobuf_mapping data structures, which allows local users to trigger an incorrect count value and videobuf leak via unspecified vectors, a different vulnerability than CVE-2010-5321."
}
]
},
"CVE_references": {
"CVE_data_version": "4.0",
"CVE_reference_data": [
{
"url": "http://www.linuxgrill.com/anonymous/kernel/v2.6/ChangeLog-2.6.24",
"name": "CONFIRM",
"publish_date": "04/24/2017"
},
{
"url": "http://www.securityfocus.com/bid/98001",
"name": "BID",
"publish_date": "04/26/2017"
},
{
"url": "https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=827340",
"name": "MISC",
"publish_date": "04/24/2017"
},
{
"url": "https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0b29669c065f60501e7289e1950fa2a618962358",
"name": "CONFIRM",
"publish_date": "04/24/2017"
},
{
"url": "https://github.com/torvalds/linux/commit/0b29669c065f60501e7289e1950fa2a618962358",
"name": "CONFIRM",
"publish_date": "04/24/2017"
}
]
},
"CVE_impact": {
"CVE_impact_cvssv2": {
"bm": {
"av": "LOCAL",
"ac": "LOW",
"au": "NONE",
"c": "PARTIAL",
"i": "PARTIAL",
"a": "PARTIAL",
"score": "4.6"
}
},
"CVE_impact_cvssv3": {
"bm": {
"av": "LOCAL",
"ac": "LOW",
"pr": "LOW",
"ui": "NONE",
"scope": "UNCHANGED",
"c": "HIGH",
"i": "HIGH",
"a": "HIGH",
"score": "7.8"
}
}
},
"CVE_problemtype": {
"CVE_data_version": "4.0",
"CVE_problemtype_data": [
{
"description": [
{
"lang": "en",
"value": "CWE-119"
}
]
}
]
}
}
]
}
When I try to parse it to get the info I want, I run into errors. Here is the test code:
import json

with open('/tmp/nvdcve-1.0-recent.json') as data_file:
    cve_data = json.load(data_file)

product_list = []

for data_list in cve_data["CVE_Items"]:
    for cve_tag,cve_id in data_list["CVE_data_meta"].items():
        cve = str(cve_id)

    for vendor_data in data_list["CVE_affects"]["CVE_vendor"]["CVE_vendor_data"]["CVE_product"]:
        for data_version,product_name,version_set in vendor_data["CVE_product_data"].items():
            print(product_name)
The Error
TypeError Traceback (most recent call last)
<ipython-input-10-81b0239327c1> in <module>()
10 cve = str(cve_id)
11
---> 12 for vendor_data in data_list["CVE_affects"]["CVE_vendor"]["CVE_vendor_data"]["CVE_product"]:
13 for data_version,product_name,version_set in vendor_data["CVE_product_data"].items():
14 print data_version
TypeError: list indices must be integers, not str
This is confusing to me because there are nests within nests, and lists within these nests. I am having a hard time figuring out how to get at some of this deeply nested info.
I feel your pain, but after closer inspection, "CVE_vendor_data" is not a dictionary but a list of dictionaries. Notice the "[]" after the colon. That is why it needs integers to index the list. The same goes for "CVE_product_data"; it is also a list of dictionaries.
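A sketch of those loops with the lists iterated rather than indexed by key (based only on the snippet above, so it assumes every CVE item follows the same structure):

import json

with open('/tmp/nvdcve-1.0-recent.json') as data_file:
    cve_data = json.load(data_file)

product_list = []

for item in cve_data["CVE_Items"]:
    cve = item["CVE_data_meta"]["CVE_ID"]
    # CVE_vendor_data is a list of dicts, one per vendor
    for vendor_data in item["CVE_affects"]["CVE_vendor"]["CVE_vendor_data"]:
        vendor_name = vendor_data["CVE_vendor_name"]
        # CVE_product_data is also a list of dicts, one per product
        for product in vendor_data["CVE_product"]["CVE_product_data"]:
            product_list.append((cve, vendor_name, product["CVE_product_name"]))
            print(product["CVE_product_name"])

Each entry in product_list then carries the CVE id, vendor name and product name, which maps naturally onto a row for the MySQL table.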