In Python parse a generic JSON structure looking for specific values

In Python parse a generic JSON structure looking for specific values - python

I'm trying to parse a JSON structure and looking for specific values.
Here an example of a possibile data to parse:
{
"objects": [
{
"from": 44,
"my_ref": "http://192.168.10.1/index.php",
"allow": "no",
"to": 10
},
{
"from": 20,
"my_ref": "http://192.168.10.2/index.php",
"allow": "mandatory",
"to": 0
}
],
"comment": "My PHP",
"identifiable_with_user": true,
"key": 10,
"link": [
{
"href": "http://192.168.10.1/index.php",
"method": "GET",
"rel": "self",
"type": "website"
},
{
"href": "http://192.168.10.5/identifiable.php",
"method": "GET",
"rel": "account_info"
}
],
"name": "Accounts",
"read_only": true,
"system": true,
"system_key": 20,
"tls_match_ref": "http://192.168.10.5/accounts.php"
}
I need to extract all the values identified by the key *ref (like my_ref, href, etc...) in a generic JSON structure. I just need to extract the URL and then perform some actions:
for entities in JSON_structure
URL= "take the URL corresponding to the ref key"
so something with the URL
I tried using .items() filtering "http://" string but didn't work
URL = {k:v for (k,v) in element_detail.items() if "http://" in k or v}

You should parse the data then look for the keys with ref in it.
That does the trick :
import json
def parse(root):
for k,v in root.items():
if isinstance(v, list):
for e in v:
parse(e)
else:
if 'ref' in k:
print(k + ':' + v)
json_data = '{\
"objects": [\
{ "from": 44, "my_ref": "http://192.168.10.1/index.php", "allow": "no", "to": 10 },\
{ "from": 20, "my_ref": "http://192.168.10.2/index.php", "allow": "mandatory", "to": 0 }\
],\
"comment": "My PHP",\
"identifiable_with_user": true,\
"key": 10,\
"link": [\
{ "href": "http://192.168.10.1/index.php", "method": "GET", "rel": "self", "type": "website" },\
{ "href": "http://192.168.10.5/identifiable.php", "method": "GET", "rel": "account_info" }\
],\
"name": "Accounts",\
"read_only": true,\
"system": true,\
"system_key": 20,\
"tls_match_ref": "http://192.168.10.5/accounts.php"\
}'
data = json.loads(json_data)
parse(data)
Output :
href:http://192.168.10.1/index.php
href:http://192.168.10.5/identifiable.php
my_ref:http://192.168.10.1/index.php
my_ref:http://192.168.10.2/index.php
tls_match_ref:http://192.168.10.5/accounts.php

Related

Writing resilient recursive code that will return results from a big json file

I have written a recursive code. I want more experienced people to tell me how resillient and fail-safe is my code:
I have a json file (Json file can be as big as 300MB):
[
{
"modules": {
"webpages": []
},
"webpages": {
"ip_addr": {
"value": "127.0.0.1",
"tags": []
},
"http": {
"status": {
"value": "Unavailable",
"tags": []
},
"title": {
"value": "403 Forbidden",
"tags": [
{
"category": "Server Code",
"match": "403"
},
{
"category": "Interesting Words",
"match": "Forbidden"
}
]
},
"server": {
"value": "Apache",
"tags": [
{
"category": "Apache Server",
"match": "Apache"
}
]
}
},
"redirects": [],
"robottxt": null
}
},
{
"modules": {
"webpages": []
}
}
]
I want to return value keys where tags are populated.
So I want to ignore:
"status": {
"value": "Unavailable",
"tags": []
},
But I want to return the title and server values. I also want to return ip_addr.value
I have written this code:
def getAllValues(nestedDictionary, firstArray, firstObj, firstUseful):
returnedArray = firstArray
tempValue = firstObj
useful = firstUseful
for key, value in nestedDictionary.items():
ipString = nestedDictionary.get("ip_addr")
if ipString is not None:
ipValue = ipString.get("value")
useful = {"ip_add": ipValue}
if isinstance(value, dict):
temp = {
"Key": key,
"useful": useful,
}
getAllValues(value, returnedArray, temp, useful)
else:
if key == "value":
tempValue["value"] = value
if key == "tags" and isinstance(value, list) and len(value) > 0:
tempValue["tags"] = value
returnedArray.append(tempValue)
return returnedArray
The above code should return:
[
{
"Key": "title",
"value": "403 Forbidden",
"useful": { "ip_addr": "127.0.0.1" },
"tags": [
{
"category": "Server Code",
"match": "403"
},
{
"category": "Interesting Words",
"match": "Forbidden"
}
]
},
{
"Key": "server",
"value": "Apache",
"useful": { "ip_addr": "127.0.0.1" },
"tags": [
{
"category": "Apache Server",
"match": "Apache"
}
]
}
]
Its a long post, but hopefully, someone can give me some assurance :)

Spotify API: Error parsing through JSON with Python

I have had trouble appending id's to a separate list as I parse through the JSON I receive from Spotify's "Users Saved Tracks" endpoint.
The JSON received looks like this:
{
"href": "https://api.spotify.com/v1/me/tracks?offset=0&limit=20",
"items": [
{
"added_at": "2021-11-16T13:56:51Z",
"track": {
"album": {
"album_type": "single",
"artists": [
{
"external_urls": {
"spotify": "https://open.spotify.com/artist/3iKDeO8yaOiWz7vkeljunk"
},
"href": "https://api.spotify.com/v1/artists/3iKDeO8yaOiWz7vkeljunk",
"id": "3iKDeO8yaOiWz7vkeljunk",
"name": "Heavenward",
"type": "artist",
"uri": "spotify:artist:3iKDeO8yaOiWz7vkeljunk"
}
],
"available_markets": [
],
"disc_number": 1,
"duration_ms": 224838,
"explicit": false,
"external_ids": {
"isrc": "QZK6P2040977"
},
"external_urls": {
"spotify": "https://open.spotify.com/track/6mJ1nbmQOm6iNClo71K5O6"
},
"href": "https://api.spotify.com/v1/tracks/6mJ1nbmQOm6iNClo71K5O6",
"id": "6mJ1nbmQOm6iNClo71K5O6",
"is_local": false,
"name": "Hole",
"popularity": 33,
"preview_url": "https://p.scdn.co/mp3-preview/c425dc91bdb19f1cddf2b35df08e30a03290c3c0?cid=8c9ee97b95854163a250399fda32d350",
"track_number": 1,
"type": "track",
"uri": "spotify:track:6mJ1nbmQOm6iNClo71K5O6"
}
}
Right now my code that I am using to parse looks like this:
def getLikedTrackIds(session):
url = 'https://api.spotify.com/v1/me/tracks'
payload = makeGetRequest(session, url)
if payload == None:
return None
liked_tracks_ids = []
for track in payload['items']:
for attribute in track['track']:
if (attribute == 'id'):
app.logger.info(f"\n\nTrack ID: {attribute}")
liked_tracks_ids.append(attribute)
return liked_tracks_ids
My liked_track_ids is filled with the string "id", for each song:
[ "id", "id", "id", "id"....]
Can anyone provide insight as to what I am doing wrong?

Already commented under the question but your code can be simplified by getting rid of the loop:
def getLikedTrackIds(session):
url = 'https://api.spotify.com/v1/me/tracks'
payload = makeGetRequest(session, url)
if payload == None:
return None
liked_tracks_ids = []
for track in payload['items']:
liked_id = track['track'].get('id', None)
if liked_id:
app.logger.info(f"\n\nTrack ID: {liked_id}")
liked_tracks_ids.append(liked_id)
return liked_tracks_ids

Getting specific lines from a print in Python with Spotipy

I am writing some code with Python and Spotipy and I'm relatively new to coding. I have some code that get all the info about a Spotify playlist and prints it out for me:
from spotipy.oauth2 import SpotifyClientCredentials
import spotipy
import json
client_credentials_manager = SpotifyClientCredentials()
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)
playlist_id = 'spotify:playlist:76CVeJDw2b90up5PgkZXyU'
results = sp.playlist(playlist_id)
#print(json.dumps(results, indent=4))
print((json.dumps(results, indent=4)))
It works well and gives me all the info. My problem is that I only need specifics from the print:
"collaborative": false,
"description": "",
"external_urls": {
"spotify": "https://open.spotify.com/playlist/76CVeJDw2b90up5PgkZXyU"
},
"followers": {
"href": null,
"total": 0
},
"href": "https://api.spotify.com/v1/playlists/76CVeJDw2b90up5PgkZXyU?additional_types=track",
"id": "76CVeJDw2b90up5PgkZXyU",
"images": [
{
"height": 640,
"url": "https://i.scdn.co/image/ab67616d0000b2734a052b99c042dc15f933145b",
"width": 640
}
],
"name": "TEST",
"owner": {
"display_name": "Name",
"external_urls": {
"spotify": "https://open.spotify.com/user/myname"
},
"href": "https://api.spotify.com/v1/users/myname",
"id": "Myname",
"type": "user",
"uri": "spotify:user:kovizsombor"
},
"primary_color": null,
"public": true,
"snapshot_id": "MixmMGE0MDgxNDQ1ZGVlNmE4MThiMmQwODMwYWU0OTI3YzkyOGJhOWIz",
"tracks": {
"href": "https://api.spotify.com/v1/playlists/76CVeJDw2b90up5PgkZXyU/tracks?offset=0&limit=100&additional_types=track",
"items": [
{
"added_at": "2020-05-17T10:00:11Z",
"added_by": {
"external_urls": {
"spotify": "https://open.spotify.com/user/kovizsombor"
},
"href": "https://api.spotify.com/v1/users/kovizsombor",
"id": "kovizsombor",
"type": "user",
"uri": "spotify:user:kovizsombor"
},
"is_local": false,
"primary_color": null,
"track": {
"album": {
"album_type": "album",
"artists": [
{
"external_urls": {
"spotify": "https://open.spotify.com/artist/0PFtn5NtBbbUNbU9EAmIWF"
},
"href": "https://api.spotify.com/v1/artists/0PFtn5NtBbbUNbU9EAmIWF",
"id": "0PFtn5NtBbbUNbU9EAmIWF",
"name": "TOTO",
"type": "artist",
"uri": "spotify:artist:0PFtn5NtBbbUNbU9EAmIWF"
}
],
],
"external_urls": {
"spotify": "https://open.spotify.com/album/62U7xIHcID94o20Of5ea4D"
},
"href": "https://api.spotify.com/v1/albums/62U7xIHcID94o20Of5ea4D",
"id": "62U7xIHcID94o20Of5ea4D",
"images": [
{
"height": 640,
"url": "https://i.scdn.co/image/ab67616d0000b2734a052b99c042dc15f933145b",
"width": 640
},
{
"height": 300,
"url": "https://i.scdn.co/image/ab67616d00001e024a052b99c042dc15f933145b",
"width": 300
},
{
"height": 64,
"url": "https://i.scdn.co/image/ab67616d000048514a052b99c042dc15f933145b",
"width": 64
}
],
"name": "Toto IV",
"release_date": "1982-04-08",
"release_date_precision": "day",
"total_tracks": 10,
"type": "album",
"uri": "spotify:album:62U7xIHcID94o20Of5ea4D"
},
"artists": [
{
"external_urls": {
"spotify": "https://open.spotify.com/artist/0PFtn5NtBbbUNbU9EAmIWF"
},
"href": "https://api.spotify.com/v1/artists/0PFtn5NtBbbUNbU9EAmIWF",
"id": "0PFtn5NtBbbUNbU9EAmIWF",
"name": "TOTO",
"type": "artist",
"uri": "spotify:artist:0PFtn5NtBbbUNbU9EAmIWF"
}
],
"available_markets": [
],
"disc_number": 1,
"duration_ms": 295893,
"episode": false,
"explicit": false,
"external_ids": {
"isrc": "USSM19801941"
},
"external_urls": {
"spotify": "https://open.spotify.com/track/2374M0fQpWi3dLnB54qaLX"
},
"href": "https://api.spotify.com/v1/tracks/2374M0fQpWi3dLnB54qaLX",
"id": "2374M0fQpWi3dLnB54qaLX",
"is_local": false,
"name": "Africa",
"popularity": 83,
"preview_url": "https://p.scdn.co/mp3-preview/dd78dafe31bb98f230372c038a126b8808f9349b?cid=d568e7073a38465bba48268ea9f10153",
"track": true,
"track_number": 10,
"type": "track",
"uri": "spotify:track:2374M0fQpWi3dLnB54qaLX"
},
"video_thumbnail": {
"url": null
}
}
],
"limit": 100,
"next": null,
"offset": 0,
"previous": null,
"total": 1
},
"type": "playlist",
"uri": "spotify:playlist:76CVeJDw2b90up5PgkZXyU"
}
From this long print I somehow need to extract the Artist and the song title and preferably make it into a variable. Also not sure how this would work if there are multiple songs in a playlist.
It's also a solution if I can print out only the Artist and the title of the song without printing out all the information.

Based on your posted example it appears that you have only 1 song named "Africa" by artist "TOTO". Copying the track to have two songs, I added another track with two artists for testing the arrays.
If that json is loaded into a variable named results then (as #xcmkz said) you have a python dictionary and can process accordingly.
Try working with the following to transverse through your dict appending artists and songs to lists:
song_dict = {}
for track in results['tracks']['items']:
song_name = track["track"]["name"]
a2 = []
for t1 in track['track']['artists']:
a2.append(t1['name'])
song_dict.update({song_name: a2})
print(f'Dictionary of Songs and Artists:')
for k, v in song_dict.items():
print(f'Song --> {k}, by --> {", ".join(v)}')
Results:
Dictionary of Songs and Artists:
Song --> Africa, by --> TOTO
Song --> Just Another Silly Song, by --> Artist 2, Artist 3

sp.playlist returns a dictionary, so you can simply access its values by their keys. For example:
>>> results['name']
'TEST'
JSON is a data serialization format, ie a standardized way of representing objects as pure text and parsing them back from the text. json.dumps therefore converts the dictionary object to a string of text. This is useful if you want to for example save the results to a file and load it back later. You don't need it to access contents from results.
(This is a playlist though—you will need to get information on each song/track to get its artist and name.)

How to flatten JSON response from Surveymonkey API

I'm setting up a Python function to use the Surveymonkey API to get survey responses from Surveymonkey.
The API returns responses in a JSON format with a deep recursive file structure.
I'm having issues trying to flatten this JSON so that it can go into Google Cloud Storage.
I have tried to flatten the response using the following code. Which works; however, it does not transform it to the format that I am looking for.
{
"per_page": 2,
"total": 1,
"data": [
{
"total_time": 0,
"collection_mode": "default",
"href": "https://api.surveymonkey.com/v3/responses/5007154325",
"custom_variables": {
"custvar_1": "one",
"custvar_2": "two"
},
"custom_value": "custom identifier for the response",
"edit_url": "https://www.surveymonkey.com/r/",
"analyze_url": "https://www.surveymonkey.com/analyze/browse/",
"ip_address": "",
"pages": [
{
"id": "73527947",
"questions": [
{
"id": "273237811",
"answers": [
{
"choice_id": "1842351148"
},
{
"text": "I might be text or null",
"other_id": "1842351149"
}
]
},
{
"id": "273240822",
"answers": [
{
"choice_id": "1863145815",
"row_id": "1863145806"
},
{
"text": "I might be text or null",
"other_id": "1863145817"
}
]
},
{
"id": "273239576",
"answers": [
{
"choice_id": "1863156702",
"row_id": "1863156701"
},
{
"text": "I might be text or null",
"other_id": "1863156707"
}
]
},
{
"id": "296944423",
"answers": [
{
"text": "I might be text or null"
}
]
}
]
}
],
"date_modified": "1970-01-17T19:07:34+00:00",
"response_status": "completed",
"id": "5007154325",
"collector_id": "50253586",
"recipient_id": "0",
"date_created": "1970-01-17T19:07:34+00:00",
"survey_id": "105723396"
}
],
"page": 1,
"links": {
"self": "https://api.surveymonkey.com/v3/surveys/123456/responses/bulk?page=1&per_page=2"
}
}
answers_df = json_normalize(data=response_json['data'],
record_path=['pages', 'questions', 'answers'],
meta=['id', ['pages', 'questions', 'id'], ['pages', 'id']])
Instead of returning a row for each question id, I need it to return a column for each question id, choice_id, and text field.
The columns I would like to see are total_time, collection_mode, href, custom_variables.custvar_1, custom_variables.custvar_2, custom_value, edit_url, analyze_url, ip_address, pages.id, pages.questions.0.id, pages.questions.0.answers.0.choice_id, pages.questions.0.answers.0.text, pages.questions.0.answers.0.other_id
Instead of the each Question ID, Choice_id, text and answer being on a separate row. I would like a column for each one. So that there is only 1 row per survey_id or index in data

Convert deeply nested json from facebook to dataframe in python

I am trying to get user details of persons who has put likes, comments on Facebook posts. I am using python facebook-sdk package. Code is as follows.
import facebook as fi
import json
graph = fi.GraphAPI('Access Token')
data = json.dumps(graph.get_object('DSIfootcandy/posts'))
From the above, I am getting a highly nested json. Here I will put only a json string for one post in the fb.
{
"paging": {
"next": "https://graph.facebook.com/v2.0/425073257683630/posts?access_token=&limit=25&until=1449201121&__paging_token=enc_AdD0DL6sN3aDZCwfYY25rJLW9IZBZCLM1QfX0venal6rpjUNvAWZBOoxTjbOYZAaFiBImzMqiv149HPH5FBJFo0nSVOPqUy78S0YvwZDZD",
"previous": "https://graph.facebook.com/v2.0/425073257683630/posts?since=1450843741&access_token=&limit=25&__paging_token=enc_AdCYobFJpcNavx6STzfPFyFe6eQQxRhkObwl2EdulwL7mjbnIETve7sJZCPMwVm7lu7yZA5FoY5Q4sprlQezF4AlGfZCWALClAZDZD&__previous=1"
},
"data": [
{
"picture": "https://fbcdn-photos-e-a.akamaihd.net/hphotos-ak-xfa1/v/t1.0-0/p130x130/1285_5066979392443_n.png?oh=b37a42ee58654f08af5abbd4f52b1ace&oe=570898E7&__gda__=1461440649_aa94b9ec60f22004675c4a527e8893f",
"is_hidden": false,
"likes": {
"paging": {
"cursors": {
"after": "MTU3NzQxODMzNTg0NDcwNQ==",
"before": "MTU5Mzc1MjA3NDE4ODgwMA=="
}
},
"data": [
{
"id": "1593752074188800",
"name": "Maduri Priyadarshani"
},
{
"id": "427605680763414",
"name": "Darshi Mashika"
},
{
"id": "599793563453832",
"name": "Shakeer Nimeshani Shashikala"
},
{
"id": "1577418335844705",
"name": "Däzlling Jalali Muishu"
}
]
},
"from": {
"category": "Retail and Consumer Merchandise",
"name": "Footcandy",
"category_list": [
{
"id": "2239",
"name": "Retail and Consumer Merchandise"
}
],
"id": "425073257683630"
},
"name": "Timeline Photos",
"privacy": {
"allow": "",
"deny": "",
"friends": "",
"description": "",
"value": ""
},
"is_expired": false,
"comments": {
"paging": {
"cursors": {
"after": "WTI5dGJXVnVkRjlqZFhKemIzSUVXdNVFExTURRd09qRTBOVEE0TkRRNE5EVT0=",
"before": "WTI5dGJXVnVkRjlqZFhKemIzNE16Y3dNVFExTVRFNE9qRTBOVEE0TkRRME5UVT0="
}
},
"data": [
{
"from": {
"name": "NiFû Shafrà",
"id": "1025030640553"
},
"like_count": 0,
"can_remove": false,
"created_time": "2015-12-23T04:20:55+0000",
"message": "wow lovely one",
"id": "50018692683829_500458145118",
"user_likes": false
},
{
"from": {
"name": "Shamnaz Lukmanjee",
"id": "160625809961884"
},
"like_count": 0,
"can_remove": false,
"created_time": "2015-12-23T04:27:25+0000",
"message": "Nice",
"id": "500186926838929_500450145040",
"user_likes": false
}
]
},
"actions": [
{
"link": "https://www.facebook.com/425073257683630/posts/5001866838929",
"name": "Comment"
},
{
"link": "https://www.facebook.com/42507683630/posts/500186926838929",
"name": "Like"
}
],
"updated_time": "2015-12-23T04:27:25+0000",
"link": "https://www.facebook.com/DSIFootcandy/photos/a.438926536298302.1073741827.4250732576630/50086926838929/?type=3",
"object_id": "50018692838929",
"shares": {
"count": 3
},
"created_time": "2015-12-23T04:09:01+0000",
"message": "Reach new heights in the cute and extremely comfortable \"Silviar\" www.focandy.lk",
"type": "photo",
"id": "425077683630_50018926838929",
"status_type": "added_photos",
"icon": "https://www.facebook.com/images/icons/photo1.gif"
}
]
}
Now I need to get this data into a dataframe as follows(no need to get all).
item | Like_id |Like_username | comments_userid |comments_username|comment(msg)|
-----+---------+--------------+-----------------+-----------------+------------+
Bag | 45546 | noel | 641 | James | nice work |
-----+---------+--------------+-----------------+-----------------+------------+
Any Help will be Highly Appreciated.

Not exactly like your intended format, but here is the making of a solution :
import pandas
DictionaryObject_as_List = str(mydict).replace("{","").replace("}","").replace("[","").replace("]","").split(",")
newlist = []
for row in DictionaryObject_as_List :
row = row.replace('https://',' ').split(":")
exec('newlist.append ( ' + "[" + " , ".join(row)+"]" + ')')
DataFrame_Object = pandas.DataFrame(newlist)
print DataFrame_Object

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

In Python parse a generic JSON structure looking for specific values - python

Related

Writing resilient recursive code that will return results from a big json file

Spotify API: Error parsing through JSON with Python

Getting specific lines from a print in Python with Spotipy

How to flatten JSON response from Surveymonkey API

Convert deeply nested json from facebook to dataframe in python

Categories

Resources