Read a JSON file and select node like in Pandas dataframe - python

I need to read a JSON config file like the example below and change some of its values with a querying structure like in Pandas.
Ex:
[
{
"_id": "5d1f5d0289725ba2c32695ac",
"index": 0,
"guid": "d1a8c2e2-1011-4db2-97a8-b68777c2d18b",
"isActive": false,
"name": {
"first": "Barnett",
"last": "Obrien"
},
"latitude": "-76.327744",
"longitude": "-131.003501",
"friends": [
{
"friend_id": 0,
"name": "Burnett Burke"
},
{
"friend_id": 1,
"name": "Lawrence Hunt"
},
{
"friend_id": 2,
"name": "Nola Benjamin"
}
]
},
{
"_id": "5d1f5d023ef4523b5e326ae2",
"index": 1,
"guid": "6b0ad8a7-2b10-4892-9b91-fc7445038aca",
"isActive": true,
"name": {
"first": "Valerie",
"last": "Preston"
},
"latitude": "27.995886",
"longitude": "170.930419",
"friends": [
{
"friend_id": 0,
"name": "Gretchen Hobbs"
},
{
"friend_id": 1,
"name": "Irene Fox"
},
{
"friend_id": 2,
"name": "Porter King"
}
]
}
]
Then I wanted to change the value for the friend_id == 1 and object with guid == 6b0ad8a7-2b10-4892-9b91-fc7445038aca from Irene Fox to something else.
With Pandas I can have something like this:
valerie = dataframe['guid'] == '6b0ad8a7-2b10-4892-9b91-fc7445038aca'
friend1 = dataframe['friend_1'] == 1
dataframe[valerie & friend1]['name'] = 'Karen Smith'
How can I achieve this without having to add Pandas dependency?

With simple loop:
import json
import sys
data = json.load(open('input.json'))
for d in data:
if d["guid"] == "6b0ad8a7-2b10-4892-9b91-fc7445038aca":
for f in d["friends"]:
if f["friend_id"] == 1:
f["name"] = "Karen Smith"
# break <- uncomment if only one match is implied
# replace sys.stdout with output file pointer
json.dump(data, sys.stdout, indent=4)
you may also break the outer for loop if items/dicts have unique guids.
The output (for demonstration):
[
{
"_id": "5d1f5d0289725ba2c32695ac",
"index": 0,
"guid": "d1a8c2e2-1011-4db2-97a8-b68777c2d18b",
"isActive": false,
"name": {
"first": "Barnett",
"last": "Obrien"
},
"latitude": "-76.327744",
"longitude": "-131.003501",
"friends": [
{
"friend_id": 0,
"name": "Burnett Burke"
},
{
"friend_id": 1,
"name": "Lawrence Hunt"
},
{
"friend_id": 2,
"name": "Nola Benjamin"
}
]
},
{
"_id": "5d1f5d023ef4523b5e326ae2",
"index": 1,
"guid": "6b0ad8a7-2b10-4892-9b91-fc7445038aca",
"isActive": true,
"name": {
"first": "Valerie",
"last": "Preston"
},
"latitude": "27.995886",
"longitude": "170.930419",
"friends": [
{
"friend_id": 0,
"name": "Gretchen Hobbs"
},
{
"friend_id": 1,
"name": "Karen Smith"
},
{
"friend_id": 2,
"name": "Porter King"
}
]
}
]

Related

how do I access this json data in python?

hi I'm pretty new at coding and I was trying to create a program in python that reads and save in another file the data inside a json file (not everything, just what I want). I googled how to parse data but there's something I don't understand.
that's a part of the json file:
`
{
"profileRevision": 548789,
"profileId": "campaign",
"profileChangesBaseRevision": 548789,
"profileChanges": [
{
"changeType": "fullProfileUpdate",
"profile": {
"_id": "2da4f079f8984cc48e84fc99dace495d",
"created": "2018-03-29T11:02:15.190Z",
"updated": "2022-10-31T17:34:43.284Z",
"rvn": 548789,
"wipeNumber": 9,
"accountId": "63881e614ef543b2932c70fed1196f34",
"profileId": "campaign",
"version": "refund_teddy_perks_september_2022",
"items": {
"8ec8f13f-6bf6-4933-a7db-43767a055e66": {
"templateId": "Quest:heroquest_loadout_constructor_2",
"attributes": {
"quest_state": "Claimed",
"creation_time": "min",
"last_state_change_time": "2019-05-18T16:09:12.750Z",
"completion_complete_pve03_diff26_loadout_constructor": 300,
"level": -1,
"item_seen": true,
"sent_new_notification": true,
"quest_rarity": "uncommon",
"xp_reward_scalar": 1
},
"quantity": 1
},
"6940c71b-c74b-4581-9f1e-c0a87e246884": {
"templateId": "Worker:workerbasic_sr_t01",
"attributes": {
"gender": "2",
"personality": "Homebase.Worker.Personality.IsDreamer",
"level": 1,
"item_seen": true,
"squad_slot_idx": -1,
"portrait": "WorkerPortrait:IconDef-WorkerPortrait-Dreamer-F02",
"building_slot_used": -1,
"set_bonus": "Homebase.Worker.SetBonus.IsMeleeDamageLow"
}
}
}
]
}
`
I can access profileChanges. I wrote this to create another json file with only the profileChanges things:
`
myjsonfile= open("file.json",'r')
jsondata=myjsonfile.read()
obj=json.loads(jsondata)
ciso=obj['profileChanges']
for i in ciso:
print(i)
with open("file2", "w") as outfile:
json.dump( ciso, outfile, indent=1)
the issue I have is that I can't access "profile" (inside profileChanges) in the same way by parsing the new file and I have no idea on how to do it
Access to JSON or dict element is realized by list indexes, please look at below example:
a = [
{
"friends": [
{
"id": 0,
"name": "Reba May"
}
],
"greeting": "Hello, Doris Gallagher! You have 2 unread messages.",
"favoriteFruit": "strawberry"
},
]
b = a['friends']['id] # b = 0
I've added a couple of closing braces to make your snippet valid json:
s = '''{
"profileRevision": 548789,
"profileId": "campaign",
"profileChangesBaseRevision": 548789,
"profileChanges": [
{
"changeType": "fullProfileUpdate",
"profile": {
"_id": "2da4f079f8984cc48e84fc99dace495d",
"created": "2018-03-29T11:02:15.190Z",
"updated": "2022-10-31T17:34:43.284Z",
"rvn": 548789,
"wipeNumber": 9,
"accountId": "63881e614ef543b2932c70fed1196f34",
"profileId": "campaign",
"version": "refund_teddy_perks_september_2022",
"items": {
"8ec8f13f-6bf6-4933-a7db-43767a055e66": {
"templateId": "Quest:heroquest_loadout_constructor_2",
"attributes": {
"quest_state": "Claimed",
"creation_time": "min",
"last_state_change_time": "2019-05-18T16:09:12.750Z",
"completion_complete_pve03_diff26_loadout_constructor": 300,
"level": -1,
"item_seen": true,
"sent_new_notification": true,
"quest_rarity": "uncommon",
"xp_reward_scalar": 1
},
"quantity": 1
},
"6940c71b-c74b-4581-9f1e-c0a87e246884": {
"templateId": "Worker:workerbasic_sr_t01",
"attributes": {
"gender": "2",
"personality": "Homebase.Worker.Personality.IsDreamer",
"level": 1,
"item_seen": true,
"squad_slot_idx": -1,
"portrait": "WorkerPortrait:IconDef-WorkerPortrait-Dreamer-F02",
"building_slot_used": -1,
"set_bonus": "Homebase.Worker.SetBonus.IsMeleeDamageLow"
}
}
}
}
}
]
}
'''
d = json.loads(s)
print(d['profileChanges'][0]['profile']['version'])
This prints refund_teddy_perks_september_2022
Explanation:
d is a dict
d['profileChanges'] is a list of dicts
d['profileChanges'][0] is the first dict in the list
d['profileChanges'][0]['profile'] is a dict
d['profileChanges'][0]['profile']['version'] is the value of version key in the profile dict in the first entry of the profileChanges list.

Modify the value of a field of a specific nested object (its index) depending on a condition

I would like to modify the value of a field on a specific index of a nested type depending on another value of the same nested object or a field outside of the nested object.
As example, I have the current mapping of my index feed:
{
"feed": {
"mappings": {
"properties": {
"attacks_ids": {
"type": "keyword"
},
"created_by": {
"type": "keyword"
},
"date": {
"type": "date"
},
"groups_related": {
"type": "keyword"
},
"indicators": {
"type": "nested",
"properties": {
"date": {
"type": "date"
},
"description": {
"type": "text"
},
"role": {
"type": "keyword"
},
"type": {
"type": "keyword"
},
"value": {
"type": "keyword"
}
}
},
"malware_families": {
"type": "keyword"
},
"published": {
"type": "boolean"
},
"references": {
"type": "keyword"
},
"tags": {
"type": "keyword"
},
"targeted_countries": {
"type": "keyword"
},
"title": {
"type": "text"
},
"tlp": {
"type": "keyword"
}
}
}
}
}
Take the following document as example:
{
"took": 194,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "feed",
"_type": "_doc",
"_id": "W3CS7IABovFpcGfZjfyu",
"_score": 1,
"_source": {
"title": "Test",
"date": "2022-05-22T16:21:09.159711",
"created_by": "finch",
"tlp": "white",
"published": true,
"references": [
"test",
"test"
],
"tags": [
"tag1",
"tag2"
],
"targeted_countries": [
"Italy",
"Germany"
],
"malware_families": [
"family1",
"family2"
],
"groups_related": [
"group1",
"griup2"
],
"attacks_ids": [
""
],
"indicators": [
{
"value": "testest",
"description": "This is a test",
"type": "sha256",
"role": "file",
"date": "2022-05-22T16:21:09.159560"
},
{
"value": "testest2",
"description": "This is a test 2",
"type": "ipv4",
"role": "c2",
"date": "2022-05-22T16:21:09.159699"
}
]
}
}
]
}
}
I would like to make this update: indicators[0].value = 'changed'
if _id == 'W3CS7IABovFpcGfZjfyu'
or if title == 'some_title'
or if indicators[0].role == 'c2'
I already tried with a script, but it seems I can't manage to get it work, I hope the explanation is clear, ask any question if not, thank you.
Edit 1:
I managed to make it work, however it needs the _id, still looking for a way to do that without it.
My partial solution:
update = Pulse.get(id="XHCz7IABovFpcGfZWfz9") #Pulse is my document
update.update(script="for (indicator in ctx._source.indicators) {if (indicator.value=='changed2') {indicator.value='changed3'}}")
# Modify depending on the value of a field inside the same nested object

Best way to build denormilazed dataframe with pandas from spotify API

I just downloaded some json from spotify and took a look into the pd.normalize_json().
But if I normalise the data i still have dictionaries within my dataframe. Also setting the level doesnt help.
DATA I want to have in my dataframe:
{
"collaborative": false,
"description": "",
"external_urls": {
"spotify": "https://open.spotify.com/playlist/5"
},
"followers": {
"href": null,
"total": 0
},
"href": "https://api.spotify.com/v1/playlists/5?additional_types=track",
"id": "5",
"images": [
{
"height": 640,
"url": "https://i.scdn.co/image/a",
"width": 640
}
],
"name": "Another",
"owner": {
"display_name": "user",
"external_urls": {
"spotify": "https://open.spotify.com/user/user"
},
"href": "https://api.spotify.com/v1/users/user",
"id": "user",
"type": "user",
"uri": "spotify:user:user"
},
"primary_color": null,
"public": true,
"snapshot_id": "M2QxNTcyYTkMDc2",
"tracks": {
"href": "https://api.spotify.com/v1/playlists/100&additional_types=track",
"items": [
{
"added_at": "2020-12-13T18:34:09Z",
"added_by": {
"external_urls": {
"spotify": "https://open.spotify.com/user/user"
},
"href": "https://api.spotify.com/v1/users/user",
"id": "user",
"type": "user",
"uri": "spotify:user:user"
},
"is_local": false,
"primary_color": null,
"track": {
"album": {
"album_type": "album",
"artists": [
{
"external_urls": {
"spotify": "https://open.spotify.com/artist/1dfeR4Had"
},
"href": "https://api.spotify.com/v1/artists/1dfDbWqFHLkxsg1d",
"id": "1dfeR4HaWDbWqFHLkxsg1d",
"name": "Q",
"type": "artist",
"uri": "spotify:artist:1dfeRqFHLkxsg1d"
}
],
"available_markets": [
"CA",
"US"
],
"external_urls": {
"spotify": "https://open.spotify.com/album/6wPXmlLzZ5cCa"
},
"href": "https://api.spotify.com/v1/albums/6wPXUJ9LzZ5cCa",
"id": "6wPXUmYJ9zZ5cCa",
"images": [
{
"height": 640,
"url": "https://i.scdn.co/image/ab676620a47",
"width": 640
},
{
"height": 300,
"url": "https://i.scdn.co/image/ab67616d0620a47",
"width": 300
},
{
"height": 64,
"url": "https://i.scdn.co/image/ab603e6620a47",
"width": 64
}
],
"name": "The (Deluxe ",
"release_date": "1920-07-17",
"release_date_precision": "day",
"total_tracks": 15,
"type": "album",
"uri": "spotify:album:6m5cCa"
},
"artists": [
{
"external_urls": {
"spotify": "https://open.spotify.com/artist/1dg1d"
},
"href": "https://api.spotify.com/v1/artists/1dsg1d",
"id": "1dfeR4HaWDbWqFHLkxsg1d",
"name": "Q",
"type": "artist",
"uri": "spotify:artist:1dxsg1d"
}
],
"available_markets": [
"CA",
"US"
],
"disc_number": 1,
"duration_ms": 21453,
"episode": false,
"explicit": false,
"external_ids": {
"isrc": "GBU6015"
},
"external_urls": {
"spotify": "https://open.spotify.com/track/5716J"
},
"href": "https://api.spotify.com/v1/tracks/5716J",
"id": "5716J",
"is_local": false,
"name": "Another",
"popularity": 73,
"preview_url": null,
"track": true,
"track_number": 3,
"type": "track",
"uri": "spotify:track:516J"
},
"video_thumbnail": {
"url": null
}
}
],
"limit": 100,
"next": null,
"offset": 0,
"previous": null,
"total": 1
},
"type": "playlist",
"uri": "spotify:playlist:fek"
}
So what are best practices to read nested data like this into one dataframe in pandas?
I'm glad for any advice.
EDIT:
so basically I want all keys as columns in my dataframe. But with normalise it stops at "tracks.items" and if I normalise this again i have the recursive problem again.
It depends on the information you are looking for. Take a look at pandas.read_json() to see if that can work. Also you can select data as such
json_output = {"collaborative": 'false',"description": "", "external_urls": {"spotify": "https://open.spotify.com/playlist/5"}}
df['collaborative'] = json_output['collaborative'] #set value of your df to value of returned json values

How to parse JSON in Python and Bash?

I need to parse via Bash and Python the JSON below. I am getting different errors.
From JSON I want to get name and ObjectID information and put it on array. But don't know how to do this.
Example of JSON :
{
"aliases": [],
"localizations": {},
"name": "Super DX-Ball",
"popularity": 0,
"objectID": "7781",
"_highlightResult": {
"name": {
"value": "Super DX-<em>Ba</em>ll",
"matchLevel": "full",
"fullyHighlighted": false,
"matchedWords": [
"ba"
]
}
}
},
{
"aliases": [],
"localizations": {},
"name": "Katekyo Hitman Reborn! DS Flame Rumble X - Mirai Chou-Bakuhatsu!!",
"popularity": 0,
"objectID": "77522",
"_highlightResult": {
"name": {
"value": "Katekyo Hitman Reborn! DS Flame Rumble X - Mirai Chou-<em>Ba</em>kuhatsu!!",
"matchLevel": "full",
"fullyHighlighted": false,
"matchedWords": [
"ba"
]
}
}
},
{
"aliases": [],
"localizations": {},
"name": "Bagitman",
"popularity": 0,
"objectID": "7663",
"_highlightResult": {
"name": {
"value": "<em>Ba</em>gitman",
"matchLevel": "full",
"fullyHighlighted": false,
"matchedWords": [
"ba"
]
}
}
},
{
"aliases": [],
"localizations": {},
"name": "Virtual Bart",
"popularity": 0,
"objectID": "7616",
"_highlightResult": {
"name": {
"value": "Virtual <em>Ba</em>rt",
"matchLevel": "full",
"fullyHighlighted": false,
"matchedWords": [
"ba"
]
}
}
}
I'm getting error due that few independends jsons. Here is an example :
cat /tmp/out | jq ".name"
"Fortnite"
parse error: Expected value before ',' at line 35, column 4
The input JSON looks like an array but lacks brackets. Try to add them:
$ (echo '['; cat /tmp/out; echo ']') | jq 'map({ name, objectID })'
[
{
"name": "Super DX-Ball",
"objectID": "7781"
},
{
"name": "Katekyo Hitman Reborn! DS Flame Rumble X - Mirai Chou-Bakuhatsu!!",
"objectID": "77522"
},
{
"name": "Bagitman",
"objectID": "7663"
},
{
"name": "Virtual Bart",
"objectID": "7616"
}
]

Merging two json files using python

I'm new to python, I want to merge two JSON files
there should not be any duplicate:
if the values and name are same then I will add both the keys and maintain a single record, otherwise, I will keep the record
File 1:
[ {
"key": 1,
"name": "test",
"value": "NY"
},
{
"key": 1,
"name": "test",
"value": "CA"
},
{
"key": 1,
"name": "test",
"value": "MA"
},
{
"key": 1,
"name": "test",
"value": "MA"
}
]
File 2:
[ {
"key": 1,
"name": "test",
"value": "NJ"
},
{
"key": 1,
"name": "test",
"value": "CA"
},
{
"key": 1,
"name": "test",
"value": "TX"
},
{
"key": 1,
"name": "test",
"value": "MA"
}
]
and the merged file output should be:
[
{
"key": 1,
"name": "test",
"value": "NY"
},
{
"key": 3,
"name": "test",
"value": "MA"
},
{
"key": 1,
"name": "test",
"value": "NJ"
},
{
"key": 2,
"name": "test",
"value": "CA"
},
{
"key": 1,
"name": "test",
"value": "TX"
}
]
order of the record does not matter.
I have tried several approaches, like merging the files and then iterating over then, parsing both files separately but I'm facing issues, being new to python.
This should help.
# -*- coding: utf-8 -*-
f1 = [ {
"key": 1,
"value": "NY"
},
{
"key": 1,
"value": "CA"
},
{
"key": 1,
"value": "MA"
}
]
f2 = [ {
"key": 1,
"value": "NJ"
},
{
"key": 1,
"value": "CA"
},
{
"key": 1,
"value": "TX"
}
]
check = [i["value"] for i in f1] #check list to see if the value already exist in f1.
for i in f2:
if i['value'] not in check:
f1.append(i)
print(f1)
Output:
[{'value': 'NY', 'key': 1}, {'value': 'CA', 'key': 1}, {'value': 'MA', 'key': 1}, {'value': 'NJ', 'key': 1}, {'value': 'TX', 'key': 1}]

Categories

Resources