How to parse JSON in Python and Bash? - python

I need to parse the JSON below with both Bash and Python, but I keep getting errors.
From the JSON I want to extract the name and objectID fields and put them into an array, but I don't know how to do this.
Example of the JSON:
{
"aliases": [],
"localizations": {},
"name": "Super DX-Ball",
"popularity": 0,
"objectID": "7781",
"_highlightResult": {
"name": {
"value": "Super DX-<em>Ba</em>ll",
"matchLevel": "full",
"fullyHighlighted": false,
"matchedWords": [
"ba"
]
}
}
},
{
"aliases": [],
"localizations": {},
"name": "Katekyo Hitman Reborn! DS Flame Rumble X - Mirai Chou-Bakuhatsu!!",
"popularity": 0,
"objectID": "77522",
"_highlightResult": {
"name": {
"value": "Katekyo Hitman Reborn! DS Flame Rumble X - Mirai Chou-<em>Ba</em>kuhatsu!!",
"matchLevel": "full",
"fullyHighlighted": false,
"matchedWords": [
"ba"
]
}
}
},
{
"aliases": [],
"localizations": {},
"name": "Bagitman",
"popularity": 0,
"objectID": "7663",
"_highlightResult": {
"name": {
"value": "<em>Ba</em>gitman",
"matchLevel": "full",
"fullyHighlighted": false,
"matchedWords": [
"ba"
]
}
}
},
{
"aliases": [],
"localizations": {},
"name": "Virtual Bart",
"popularity": 0,
"objectID": "7616",
"_highlightResult": {
"name": {
"value": "Virtual <em>Ba</em>rt",
"matchLevel": "full",
"fullyHighlighted": false,
"matchedWords": [
"ba"
]
}
}
}
I'm getting an error because the input consists of several independent JSON objects. Here is an example:
cat /tmp/out | jq ".name"
"Fortnite"
parse error: Expected value before ',' at line 35, column 4

The input JSON looks like an array but lacks brackets. Try to add them:
$ (echo '['; cat /tmp/out; echo ']') | jq 'map({ name, objectID })'
[
{
"name": "Super DX-Ball",
"objectID": "7781"
},
{
"name": "Katekyo Hitman Reborn! DS Flame Rumble X - Mirai Chou-Bakuhatsu!!",
"objectID": "77522"
},
{
"name": "Bagitman",
"objectID": "7663"
},
{
"name": "Virtual Bart",
"objectID": "7616"
}
]
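For the Python half of the question, the same bracket-wrapping idea can be sketched with the standard json module. This is a minimal sketch, assuming the input really is a comma-separated run of objects as in the question; the raw string is a shortened stand-in for the contents of /tmp/out, and parse_fragment is a hypothetical helper name.

```python
import json

# Shortened stand-in for the contents of /tmp/out (assumption: objects
# separated by commas, with no enclosing brackets).
raw = '''
{"name": "Super DX-Ball", "popularity": 0, "objectID": "7781"},
{"name": "Bagitman", "popularity": 0, "objectID": "7663"}
'''


def parse_fragment(text):
    """Wrap a comma-separated run of JSON objects in brackets and parse it."""
    return json.loads("[" + text + "]")


# Keep only the two fields the question asks for.
games = [{"name": g["name"], "objectID": g["objectID"]} for g in parse_fragment(raw)]
print(games)
```

With a real file, raw would come from reading /tmp/out instead of the inline string.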

Related

how do I access this json data in python?

Hi, I'm pretty new to coding. I'm trying to write a Python program that reads a JSON file and saves part of its data (only the fields I want) to another file. I googled how to parse the data, but there's something I don't understand.
Here is part of the JSON file:
`
{
"profileRevision": 548789,
"profileId": "campaign",
"profileChangesBaseRevision": 548789,
"profileChanges": [
{
"changeType": "fullProfileUpdate",
"profile": {
"_id": "2da4f079f8984cc48e84fc99dace495d",
"created": "2018-03-29T11:02:15.190Z",
"updated": "2022-10-31T17:34:43.284Z",
"rvn": 548789,
"wipeNumber": 9,
"accountId": "63881e614ef543b2932c70fed1196f34",
"profileId": "campaign",
"version": "refund_teddy_perks_september_2022",
"items": {
"8ec8f13f-6bf6-4933-a7db-43767a055e66": {
"templateId": "Quest:heroquest_loadout_constructor_2",
"attributes": {
"quest_state": "Claimed",
"creation_time": "min",
"last_state_change_time": "2019-05-18T16:09:12.750Z",
"completion_complete_pve03_diff26_loadout_constructor": 300,
"level": -1,
"item_seen": true,
"sent_new_notification": true,
"quest_rarity": "uncommon",
"xp_reward_scalar": 1
},
"quantity": 1
},
"6940c71b-c74b-4581-9f1e-c0a87e246884": {
"templateId": "Worker:workerbasic_sr_t01",
"attributes": {
"gender": "2",
"personality": "Homebase.Worker.Personality.IsDreamer",
"level": 1,
"item_seen": true,
"squad_slot_idx": -1,
"portrait": "WorkerPortrait:IconDef-WorkerPortrait-Dreamer-F02",
"building_slot_used": -1,
"set_bonus": "Homebase.Worker.SetBonus.IsMeleeDamageLow"
}
}
}
]
}
`
I can access profileChanges. I wrote this to create another JSON file containing only the profileChanges data:
`
import json

# Load the whole file; the context manager closes it afterwards.
with open("file.json", "r") as myjsonfile:
    obj = json.load(myjsonfile)

ciso = obj['profileChanges']
for i in ciso:
    print(i)

# Write only the profileChanges list to a second file.
with open("file2", "w") as outfile:
    json.dump(ciso, outfile, indent=1)
`
The issue is that I can't access "profile" (inside profileChanges) the same way when I parse the new file, and I have no idea how to do it.
A JSON array becomes a Python list, which is indexed by integer position, and a JSON object becomes a dict, which is indexed by key. Please look at the example below:
a = [
    {
        "friends": [
            {
                "id": 0,
                "name": "Reba May"
            }
        ],
        "greeting": "Hello, Doris Gallagher! You have 2 unread messages.",
        "favoriteFruit": "strawberry"
    },
]
b = a[0]['friends'][0]['id']  # b = 0
I've added a couple of closing braces to make your snippet valid json:
import json

s = '''{
"profileRevision": 548789,
"profileId": "campaign",
"profileChangesBaseRevision": 548789,
"profileChanges": [
{
"changeType": "fullProfileUpdate",
"profile": {
"_id": "2da4f079f8984cc48e84fc99dace495d",
"created": "2018-03-29T11:02:15.190Z",
"updated": "2022-10-31T17:34:43.284Z",
"rvn": 548789,
"wipeNumber": 9,
"accountId": "63881e614ef543b2932c70fed1196f34",
"profileId": "campaign",
"version": "refund_teddy_perks_september_2022",
"items": {
"8ec8f13f-6bf6-4933-a7db-43767a055e66": {
"templateId": "Quest:heroquest_loadout_constructor_2",
"attributes": {
"quest_state": "Claimed",
"creation_time": "min",
"last_state_change_time": "2019-05-18T16:09:12.750Z",
"completion_complete_pve03_diff26_loadout_constructor": 300,
"level": -1,
"item_seen": true,
"sent_new_notification": true,
"quest_rarity": "uncommon",
"xp_reward_scalar": 1
},
"quantity": 1
},
"6940c71b-c74b-4581-9f1e-c0a87e246884": {
"templateId": "Worker:workerbasic_sr_t01",
"attributes": {
"gender": "2",
"personality": "Homebase.Worker.Personality.IsDreamer",
"level": 1,
"item_seen": true,
"squad_slot_idx": -1,
"portrait": "WorkerPortrait:IconDef-WorkerPortrait-Dreamer-F02",
"building_slot_used": -1,
"set_bonus": "Homebase.Worker.SetBonus.IsMeleeDamageLow"
}
}
}
}
}
]
}
'''
d = json.loads(s)
print(d['profileChanges'][0]['profile']['version'])
This prints refund_teddy_perks_september_2022
Explanation:
d is a dict
d['profileChanges'] is a list of dicts
d['profileChanges'][0] is the first dict in the list
d['profileChanges'][0]['profile'] is a dict
d['profileChanges'][0]['profile']['version'] is the value of the version key in the profile dict, which sits in the first entry of the profileChanges list.
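The same indexing chain extends one level further to the items under profile. The following is a minimal sketch, assuming a trimmed copy of the question's data (only the keys used here are kept); template_ids is a name introduced for illustration.

```python
import json

# Trimmed stand-in for the question's file (only the keys used below).
s = '''{
  "profileChanges": [
    {
      "changeType": "fullProfileUpdate",
      "profile": {
        "version": "refund_teddy_perks_september_2022",
        "items": {
          "8ec8f13f-6bf6-4933-a7db-43767a055e66": {
            "templateId": "Quest:heroquest_loadout_constructor_2",
            "quantity": 1
          },
          "6940c71b-c74b-4581-9f1e-c0a87e246884": {
            "templateId": "Worker:workerbasic_sr_t01"
          }
        }
      }
    }
  ]
}'''

d = json.loads(s)
profile = d["profileChanges"][0]["profile"]

# "items" is a dict keyed by item id, so iterate over its key/value pairs.
template_ids = {item_id: item["templateId"] for item_id, item in profile["items"].items()}
for item_id, template_id in template_ids.items():
    print(item_id, template_id)
```

Because items is a dict rather than a list, it is walked with .items() instead of an integer index.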

Modify the value of a field of a specific nested object (its index) depending on a condition

I would like to modify the value of a field at a specific index of a nested field, depending on another value in the same nested object or on a field outside the nested object.
As an example, here is the current mapping of my index feed:
{
"feed": {
"mappings": {
"properties": {
"attacks_ids": {
"type": "keyword"
},
"created_by": {
"type": "keyword"
},
"date": {
"type": "date"
},
"groups_related": {
"type": "keyword"
},
"indicators": {
"type": "nested",
"properties": {
"date": {
"type": "date"
},
"description": {
"type": "text"
},
"role": {
"type": "keyword"
},
"type": {
"type": "keyword"
},
"value": {
"type": "keyword"
}
}
},
"malware_families": {
"type": "keyword"
},
"published": {
"type": "boolean"
},
"references": {
"type": "keyword"
},
"tags": {
"type": "keyword"
},
"targeted_countries": {
"type": "keyword"
},
"title": {
"type": "text"
},
"tlp": {
"type": "keyword"
}
}
}
}
}
Take the following document as an example:
{
"took": 194,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "feed",
"_type": "_doc",
"_id": "W3CS7IABovFpcGfZjfyu",
"_score": 1,
"_source": {
"title": "Test",
"date": "2022-05-22T16:21:09.159711",
"created_by": "finch",
"tlp": "white",
"published": true,
"references": [
"test",
"test"
],
"tags": [
"tag1",
"tag2"
],
"targeted_countries": [
"Italy",
"Germany"
],
"malware_families": [
"family1",
"family2"
],
"groups_related": [
"group1",
"griup2"
],
"attacks_ids": [
""
],
"indicators": [
{
"value": "testest",
"description": "This is a test",
"type": "sha256",
"role": "file",
"date": "2022-05-22T16:21:09.159560"
},
{
"value": "testest2",
"description": "This is a test 2",
"type": "ipv4",
"role": "c2",
"date": "2022-05-22T16:21:09.159699"
}
]
}
}
]
}
}
I would like to make this update: indicators[0].value = 'changed'
if _id == 'W3CS7IABovFpcGfZjfyu'
or if title == 'some_title'
or if indicators[0].role == 'c2'
I already tried with a script, but I can't get it to work. I hope the explanation is clear; please ask if anything is not. Thank you.
Edit 1:
I managed to make it work, but it needs the _id. I am still looking for a way to do it without the _id.
My partial solution:
update = Pulse.get(id="XHCz7IABovFpcGfZWfz9") #Pulse is my document
update.update(script="for (indicator in ctx._source.indicators) {if (indicator.value=='changed2') {indicator.value='changed3'}}")
# Modify depending on the value of a field inside the same nested object
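One way to drop the dependency on _id might be Elasticsearch's _update_by_query endpoint, which runs a script on every document matching a query. The sketch below only builds the request body as a Python dict. build_update_by_query is a hypothetical helper, the match query on title is an assumption based on the mapping above (title is a text field), and nothing here has been run against a real cluster.

```python
import json


def build_update_by_query(query, new_value):
    """Return a request body that sets indicators[0].value on matching documents."""
    return {
        "query": query,
        "script": {
            "lang": "painless",
            # Guard against documents whose indicators list is empty.
            "source": (
                "if (ctx._source.indicators.size() > 0) {"
                " ctx._source.indicators[0].value = params.new_value;"
                " }"
            ),
            "params": {"new_value": new_value},
        },
    }


# Select documents by title instead of by _id.
body = build_update_by_query({"match": {"title": "some_title"}}, "changed")
print(json.dumps(body, indent=2))
```

The resulting body would be sent as POST feed/_update_by_query. A condition on a field inside indicators would likely need a nested query, or a check inside the script, instead of the simple match used here.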

Read a JSON file and select node like in Pandas dataframe

I need to read a JSON config file like the example below and change some of its values with a querying structure like in Pandas.
Ex:
[
{
"_id": "5d1f5d0289725ba2c32695ac",
"index": 0,
"guid": "d1a8c2e2-1011-4db2-97a8-b68777c2d18b",
"isActive": false,
"name": {
"first": "Barnett",
"last": "Obrien"
},
"latitude": "-76.327744",
"longitude": "-131.003501",
"friends": [
{
"friend_id": 0,
"name": "Burnett Burke"
},
{
"friend_id": 1,
"name": "Lawrence Hunt"
},
{
"friend_id": 2,
"name": "Nola Benjamin"
}
]
},
{
"_id": "5d1f5d023ef4523b5e326ae2",
"index": 1,
"guid": "6b0ad8a7-2b10-4892-9b91-fc7445038aca",
"isActive": true,
"name": {
"first": "Valerie",
"last": "Preston"
},
"latitude": "27.995886",
"longitude": "170.930419",
"friends": [
{
"friend_id": 0,
"name": "Gretchen Hobbs"
},
{
"friend_id": 1,
"name": "Irene Fox"
},
{
"friend_id": 2,
"name": "Porter King"
}
]
}
]
I then want to change the name of the friend with friend_id == 1, inside the object with guid == 6b0ad8a7-2b10-4892-9b91-fc7445038aca, from Irene Fox to something else.
With Pandas I could write something like this:
valerie = dataframe['guid'] == '6b0ad8a7-2b10-4892-9b91-fc7445038aca'
friend1 = dataframe['friend_id'] == 1
dataframe.loc[valerie & friend1, 'name'] = 'Karen Smith'
How can I achieve this without adding Pandas as a dependency?
With a simple loop:
import json
import sys

with open('input.json') as f_in:
    data = json.load(f_in)

for d in data:
    if d["guid"] == "6b0ad8a7-2b10-4892-9b91-fc7445038aca":
        for f in d["friends"]:
            if f["friend_id"] == 1:
                f["name"] = "Karen Smith"
                # break  <- uncomment if only one match is implied

# replace sys.stdout with output file pointer
json.dump(data, sys.stdout, indent=4)
You may also break out of the outer for loop if the items/dicts have unique guids.
The output (for demonstration):
[
{
"_id": "5d1f5d0289725ba2c32695ac",
"index": 0,
"guid": "d1a8c2e2-1011-4db2-97a8-b68777c2d18b",
"isActive": false,
"name": {
"first": "Barnett",
"last": "Obrien"
},
"latitude": "-76.327744",
"longitude": "-131.003501",
"friends": [
{
"friend_id": 0,
"name": "Burnett Burke"
},
{
"friend_id": 1,
"name": "Lawrence Hunt"
},
{
"friend_id": 2,
"name": "Nola Benjamin"
}
]
},
{
"_id": "5d1f5d023ef4523b5e326ae2",
"index": 1,
"guid": "6b0ad8a7-2b10-4892-9b91-fc7445038aca",
"isActive": true,
"name": {
"first": "Valerie",
"last": "Preston"
},
"latitude": "27.995886",
"longitude": "170.930419",
"friends": [
{
"friend_id": 0,
"name": "Gretchen Hobbs"
},
{
"friend_id": 1,
"name": "Karen Smith"
},
{
"friend_id": 2,
"name": "Porter King"
}
]
}
]
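The loop above can also be wrapped in a small reusable function. This is a minimal sketch, assuming the same record layout as the question; rename_friend is a hypothetical name, and the inline data is a shortened stand-in for input.json.

```python
import json


def rename_friend(records, guid, friend_id, new_name):
    """Set `name` on matching friends in place and return how many were changed."""
    changed = 0
    for record in records:
        if record.get("guid") != guid:
            continue
        for friend in record.get("friends", []):
            if friend.get("friend_id") == friend_id:
                friend["name"] = new_name
                changed += 1
    return changed


# Shortened stand-in for input.json.
data = json.loads('''[
  {"guid": "d1a8c2e2-1011-4db2-97a8-b68777c2d18b",
   "friends": [{"friend_id": 1, "name": "Lawrence Hunt"}]},
  {"guid": "6b0ad8a7-2b10-4892-9b91-fc7445038aca",
   "friends": [{"friend_id": 0, "name": "Gretchen Hobbs"},
               {"friend_id": 1, "name": "Irene Fox"}]}
]''')

count = rename_friend(data, "6b0ad8a7-2b10-4892-9b91-fc7445038aca", 1, "Karen Smith")
print(count, data[1]["friends"][1]["name"])
```

Returning the number of changes makes it easy to notice when no record matched the given guid and friend_id.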

Python parse large JSON nests and lists - string indices must be integers

NIST recently released all CVE data in JSON format, and I am trying to parse it out to add to a MySQL database so I can compare my security findings to what NIST shows.
The data is very confusing to parse because there is a lot of nesting, with some lists included.
Here is a snippet of the JSON.
{
"CVE_data_type": "CVE",
"CVE_data_format": "MITRE",
"CVE_data_version": "4.0",
"CVE_data_numberOfCVEs": "600",
"CVE_data_timestamp": "Fri Apr 28 16:00:10 EDT 2017",
"CVE_Items": [
{
"CVE_data_meta": {
"CVE_ID": "CVE-2007-6761"
},
"CVE_affects": {
"CVE_vendor": {
"CVE_data_version": "4.0",
"CVE_vendor_data": [
{
"CVE_vendor_name": "linux",
"CVE_product": {
"CVE_product_data": [
{
"CVE_data_version": "4.0",
"CVE_product_name": "linux_kernel",
"CVE_version": {
"CVE_version_data": [
{
"CVE_version_value": "2.6.23",
"CVE_version_affected": "<="
}
]
}
}
]
}
}
]
}
},
"CVE_configurations": {
"CVE_data_version": "4.0",
"CVE_configuration_data": [
{
"operator": "OR",
"cpe": [
{
"vulnerable": true,
"previousVersions": true,
"cpeMatchString": "cpe:/o:linux:linux_kernel:2.6.23",
"cpe23Uri": "cpe:2.3:o:linux:linux_kernel:2.6.23:*:*:*:*:*:*:*"
}
]
}
]
},
"CVE_description": {
"CVE_data_version": "4.0",
"CVE_description_data": [
{
"lang": "en",
"value": "drivers/media/video/videobuf-vmalloc.c in the Linux kernel before 2.6.24 does not initialize videobuf_mapping data structures, which allows local users to trigger an incorrect count value and videobuf leak via unspecified vectors, a different vulnerability than CVE-2010-5321."
}
]
},
"CVE_references": {
"CVE_data_version": "4.0",
"CVE_reference_data": [
{
"url": "http://www.linuxgrill.com/anonymous/kernel/v2.6/ChangeLog-2.6.24",
"name": "CONFIRM",
"publish_date": "04/24/2017"
},
{
"url": "http://www.securityfocus.com/bid/98001",
"name": "BID",
"publish_date": "04/26/2017"
},
{
"url": "https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=827340",
"name": "MISC",
"publish_date": "04/24/2017"
},
{
"url": "https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0b29669c065f60501e7289e1950fa2a618962358",
"name": "CONFIRM",
"publish_date": "04/24/2017"
},
{
"url": "https://github.com/torvalds/linux/commit/0b29669c065f60501e7289e1950fa2a618962358",
"name": "CONFIRM",
"publish_date": "04/24/2017"
}
]
},
"CVE_impact": {
"CVE_impact_cvssv2": {
"bm": {
"av": "LOCAL",
"ac": "LOW",
"au": "NONE",
"c": "PARTIAL",
"i": "PARTIAL",
"a": "PARTIAL",
"score": "4.6"
}
},
"CVE_impact_cvssv3": {
"bm": {
"av": "LOCAL",
"ac": "LOW",
"pr": "LOW",
"ui": "NONE",
"scope": "UNCHANGED",
"c": "HIGH",
"i": "HIGH",
"a": "HIGH",
"score": "7.8"
}
}
},
"CVE_problemtype": {
"CVE_data_version": "4.0",
"CVE_problemtype_data": [
{
"description": [
{
"lang": "en",
"value": "CWE-119"
}
]
}
]
}
}
]
}
When I try to parse it to get the info I want, I run into errors. Here is my test code.
import json

with open('/tmp/nvdcve-1.0-recent.json') as data_file:
    cve_data = json.load(data_file)

product_list = []

for data_list in cve_data["CVE_Items"]:
    for cve_tag,cve_id in data_list["CVE_data_meta"].items():
        cve = str(cve_id)

    for vendor_data in data_list["CVE_affects"]["CVE_vendor"]["CVE_vendor_data"]["CVE_product"]:
        for data_version,product_name,version_set in vendor_data["CVE_product_data"].items():
            print(product_name)
The Error
TypeError Traceback (most recent call last)
<ipython-input-10-81b0239327c1> in <module>()
10 cve = str(cve_id)
11
---> 12 for vendor_data in data_list["CVE_affects"]["CVE_vendor"]["CVE_vendor_data"]["CVE_product"]:
13 for data_version,product_name,version_set in vendor_data["CVE_product_data"].items():
14 print data_version
TypeError: list indices must be integers, not str
This is confusing to me because there are nests within nests, and lists within these nests. I am having a hard time figuring out how to get at some of this deeply nested info.
I feel your pain, but on closer inspection "CVE_vendor_data" is not a dictionary but a list of dictionaries. Notice the "[" right after the colon. That is why it needs an integer index (or a loop over the list). The same goes for "CVE_product_data": it is also a list of dictionaries.
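Putting that observation into practice, each list level gets its own for loop while each dict level is indexed by key. The following is a minimal sketch, assuming a trimmed copy of the question's snippet in place of the real /tmp/nvdcve-1.0-recent.json file; the tuple layout of product_list is an illustrative choice.

```python
import json

# Trimmed stand-in for the NIST file (only the keys used below).
cve_data = json.loads('''{
  "CVE_Items": [
    {
      "CVE_data_meta": {"CVE_ID": "CVE-2007-6761"},
      "CVE_affects": {
        "CVE_vendor": {
          "CVE_vendor_data": [
            {
              "CVE_vendor_name": "linux",
              "CVE_product": {
                "CVE_product_data": [
                  {
                    "CVE_product_name": "linux_kernel",
                    "CVE_version": {
                      "CVE_version_data": [
                        {"CVE_version_value": "2.6.23", "CVE_version_affected": "<="}
                      ]
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    }
  ]
}''')

product_list = []
for item in cve_data["CVE_Items"]:                                      # list
    cve_id = item["CVE_data_meta"]["CVE_ID"]                            # dict -> dict
    for vendor in item["CVE_affects"]["CVE_vendor"]["CVE_vendor_data"]:  # list
        for product in vendor["CVE_product"]["CVE_product_data"]:        # list
            for version in product["CVE_version"]["CVE_version_data"]:   # list
                product_list.append((
                    cve_id,
                    vendor["CVE_vendor_name"],
                    product["CVE_product_name"],
                    version["CVE_version_value"],
                ))

print(product_list)
```

Each tuple is one flat row, which is a convenient shape for a later database insert.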

Convert deeply nested json from facebook to dataframe in python

I am trying to get the details of users who have liked or commented on Facebook posts. I am using the Python facebook-sdk package. The code is as follows.
import facebook as fi
import json
graph = fi.GraphAPI('Access Token')
data = json.dumps(graph.get_object('DSIfootcandy/posts'))
From the above, I get a highly nested JSON. Here I include only the JSON string for one post.
{
"paging": {
"next": "https://graph.facebook.com/v2.0/425073257683630/posts?access_token=&limit=25&until=1449201121&__paging_token=enc_AdD0DL6sN3aDZCwfYY25rJLW9IZBZCLM1QfX0venal6rpjUNvAWZBOoxTjbOYZAaFiBImzMqiv149HPH5FBJFo0nSVOPqUy78S0YvwZDZD",
"previous": "https://graph.facebook.com/v2.0/425073257683630/posts?since=1450843741&access_token=&limit=25&__paging_token=enc_AdCYobFJpcNavx6STzfPFyFe6eQQxRhkObwl2EdulwL7mjbnIETve7sJZCPMwVm7lu7yZA5FoY5Q4sprlQezF4AlGfZCWALClAZDZD&__previous=1"
},
"data": [
{
"picture": "https://fbcdn-photos-e-a.akamaihd.net/hphotos-ak-xfa1/v/t1.0-0/p130x130/1285_5066979392443_n.png?oh=b37a42ee58654f08af5abbd4f52b1ace&oe=570898E7&__gda__=1461440649_aa94b9ec60f22004675c4a527e8893f",
"is_hidden": false,
"likes": {
"paging": {
"cursors": {
"after": "MTU3NzQxODMzNTg0NDcwNQ==",
"before": "MTU5Mzc1MjA3NDE4ODgwMA=="
}
},
"data": [
{
"id": "1593752074188800",
"name": "Maduri Priyadarshani"
},
{
"id": "427605680763414",
"name": "Darshi Mashika"
},
{
"id": "599793563453832",
"name": "Shakeer Nimeshani Shashikala"
},
{
"id": "1577418335844705",
"name": "Däzlling Jalali Muishu"
}
]
},
"from": {
"category": "Retail and Consumer Merchandise",
"name": "Footcandy",
"category_list": [
{
"id": "2239",
"name": "Retail and Consumer Merchandise"
}
],
"id": "425073257683630"
},
"name": "Timeline Photos",
"privacy": {
"allow": "",
"deny": "",
"friends": "",
"description": "",
"value": ""
},
"is_expired": false,
"comments": {
"paging": {
"cursors": {
"after": "WTI5dGJXVnVkRjlqZFhKemIzSUVXdNVFExTURRd09qRTBOVEE0TkRRNE5EVT0=",
"before": "WTI5dGJXVnVkRjlqZFhKemIzNE16Y3dNVFExTVRFNE9qRTBOVEE0TkRRME5UVT0="
}
},
"data": [
{
"from": {
"name": "NiFû Shafrà",
"id": "1025030640553"
},
"like_count": 0,
"can_remove": false,
"created_time": "2015-12-23T04:20:55+0000",
"message": "wow lovely one",
"id": "50018692683829_500458145118",
"user_likes": false
},
{
"from": {
"name": "Shamnaz Lukmanjee",
"id": "160625809961884"
},
"like_count": 0,
"can_remove": false,
"created_time": "2015-12-23T04:27:25+0000",
"message": "Nice",
"id": "500186926838929_500450145040",
"user_likes": false
}
]
},
"actions": [
{
"link": "https://www.facebook.com/425073257683630/posts/5001866838929",
"name": "Comment"
},
{
"link": "https://www.facebook.com/42507683630/posts/500186926838929",
"name": "Like"
}
],
"updated_time": "2015-12-23T04:27:25+0000",
"link": "https://www.facebook.com/DSIFootcandy/photos/a.438926536298302.1073741827.4250732576630/50086926838929/?type=3",
"object_id": "50018692838929",
"shares": {
"count": 3
},
"created_time": "2015-12-23T04:09:01+0000",
"message": "Reach new heights in the cute and extremely comfortable \"Silviar\" www.focandy.lk",
"type": "photo",
"id": "425077683630_50018926838929",
"status_type": "added_photos",
"icon": "https://www.facebook.com/images/icons/photo1.gif"
}
]
}
Now I need to get this data into a dataframe as follows (not all fields are needed).
item | Like_id | Like_username | comments_userid | comments_username | comment(msg) |
-----+---------+---------------+-----------------+-------------------+--------------+
Bag  | 45546   | noel          | 641             | James             | nice work    |
-----+---------+---------------+-----------------+-------------------+--------------+
Any help will be highly appreciated.
Not exactly your intended format, but here are the makings of a solution:
import pandas

# mydict: the parsed response dict (assumed to be defined earlier)
DictionaryObject_as_List = str(mydict).replace("{","").replace("}","").replace("[","").replace("]","").split(",")
newlist = []
for row in DictionaryObject_as_List :
    row = row.replace('https://',' ').split(":")
    exec('newlist.append ( ' + "[" + " , ".join(row)+"]" + ')')
DataFrame_Object = pandas.DataFrame(newlist)
print(DataFrame_Object)
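An alternative that avoids string manipulation and exec is to walk the nested structure directly and build one flat row per like/comment pair. This is a minimal sketch using only the standard library, assuming posts is the dict returned by graph.get_object before json.dumps is applied. The sample below is a shortened stand-in for the question's data, and pairing likes with comments by position is an illustrative choice, since the two lists are unrelated.

```python
import itertools

# Shortened stand-in for the dict returned by graph.get_object(...).
posts = {
    "data": [
        {
            "name": "Timeline Photos",
            "likes": {"data": [
                {"id": "1593752074188800", "name": "Maduri Priyadarshani"},
                {"id": "427605680763414", "name": "Darshi Mashika"},
            ]},
            "comments": {"data": [
                {"from": {"name": "Shamnaz Lukmanjee", "id": "160625809961884"},
                 "message": "Nice"},
            ]},
        }
    ]
}

rows = []
for post in posts["data"]:
    likes = post.get("likes", {}).get("data", [])
    comments = post.get("comments", {}).get("data", [])
    # Pair likes and comments by position; the shorter list is padded with None.
    for like, comment in itertools.zip_longest(likes, comments):
        rows.append({
            "item": post.get("name"),
            "like_id": like["id"] if like else None,
            "like_username": like["name"] if like else None,
            "comments_userid": comment["from"]["id"] if comment else None,
            "comments_username": comment["from"]["name"] if comment else None,
            "comment": comment["message"] if comment else None,
        })

for row in rows:
    print(row)
```

Passing rows to pandas.DataFrame would then give a table with one column per key.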
