I have some JSON data like:
{
"status": "200",
"msg": "",
"data": {
"time": "1515580011",
"video_info": [
{
"announcement": "{\"announcement_id\":\"6\",\"name\":\"INS\\u8d26\\u53f7\",\"icon\":\"http:\\\/\\\/liveme.cms.ksmobile.net\\\/live\\\/announcement\\\/2017-08-18_19:44:54\\\/ins.png\",\"icon_new\":\"http:\\\/\\\/liveme.cms.ksmobile.net\\\/live\\\/announcement\\\/2017-10-20_22:24:38\\\/4.png\",\"videoid\":\"15154610218328614178\",\"content\":\"FOLLOW ME PLEASE\",\"x_coordinate\":\"0.22\",\"y_coordinate\":\"0.23\"}",
"announcement_shop": "",
etc.
How do I grab the content "FOLLOW ME PLEASE"? I tried using
replay_data = raw_replay_data['data']['video_info'][0]
announcement = replay_data['announcement']
But now announcement is a string representing more JSON data. I can't continue indexing announcement['content'] results in TypeError: string indices must be integers.
How can I get the desired string in the "right" way, i.e. respecting the actual structure of the data?
In a single line -
>>> json.loads(data['data']['video_info'][0]['announcement'])['content']
'FOLLOW ME PLEASE'
To help you understand how to access data (so you don't have to ask again), you'll need to stare at your data.
First, let's lay out your data nicely. You can either use json.dumps(data, indent=4), or you can use an online tool like JSONLint.com.
{
'data': {
'time': '1515580011',
'video_info': [{
'announcement': ( # ***
"""{
"announcement_id": "6",
"name": "INS\\u8d26\\u53f7",
"icon": "http:\\\\/\\\\/liveme.cms.ksmobile.net\\\\/live\\\\/announcement\\\\/2017-08-18_19:44:54\\\\/ins.png",
"icon_new": "http:\\\\/\\\\/liveme.cms.ksmobile.net\\\\/live\\\\/announcement\\\\/2017-10-20_22:24:38\\\\/4.png",
"videoid": "15154610218328614178",
"content": "FOLLOW ME PLEASE",
"x_coordinate": "0.22",
"y_coordinate": "0.23"
}"""),
'announcement_shop': ''
}]
},
'msg': '',
'status': '200'
}
*** Note that the data in the announcement key is actually more json data, which I've laid out on separate lines.
First, find out where your data resides. You're looking for the data in the content key, which is accessed by the announcement key, which is part of a dictionary inside a list of dicts, which can be accessed by the video_info key, which is in turn accessed by data.
So, in summary, "descend" the ladder that is "data" using the following "rungs" -
data, a dictionary
video_info, a list of dicts
announcement, a dict in the first dict of the list of dicts
content residing as part of json data.
First,
i = data['data']
Next,
j = i['video_info']
Next,
k = j[0] # since this is a list
If you only want the first element, this suffices. Otherwise, you'd need to iterate:
for k in j:
...
Next,
l = k['announcement']
Now, l is JSON data. Load it -
import json
m = json.loads(l)
Lastly,
content = m['content']
print(content)
'FOLLOW ME PLEASE'
This should hopefully serve as a guide should you have future queries of this nature.
You have nested JSON data; the string associated with the 'annoucement' key is itself another, separate, embedded JSON document.
You'll have to decode that string first:
import json
replay_data = raw_replay_data['data']['video_info'][0]
announcement = json.loads(replay_data['announcement'])
print(announcement['content'])
then handle the resulting dictionary from there.
The content of "announcement" is another JSON string. Decode it and then access its contents as you were doing with the outer objects.
Related
I have a JSON object that looks as such, stored in a variabled called results:
{
"users":[
{
"pk":54297756964,
"username":"zach_nga_test",
"full_name":"",
"is_private":false,
"profile_pic_url":"https://instagram.fcxl1-1.fna.fbcdn.net/v/t51.2885-19/44884218_345707102882519_2446069589734326272_n.jpg?efg=eyJybWQiOiJpZ19hbmRyb2lkX21vYmlsZV9uZXR3b3JrX3N0YWNrX2JhY2t0ZXN0X3YyNDI6Y29udHJvbCJ9&_nc_ht=instagram.fcxl1-1.fna.fbcdn.net&_nc_cat=1&_nc_ohc=bdMav5vmkj0AX9lHIND&edm=ALdCaaIBAAAA&ccb=7-5&ig_cache_key=YW5vbnltb3VzX3Byb2ZpbGVfcGlj.2-ccb7-5&oh=00_AT9BXADjYsi8BkryuLhynAbrTCGmgjucZ6CJUxW4VS49QA&oe=62D6BC4F&_nc_sid=bfcd0c",
"is_verified":false,
"has_anonymous_profile_picture":true,
"has_highlight_reels":false,
"account_badges":[
],
"latest_reel_media":0
},
{
"pk":182625349,
"username":"joesmith123",
"full_name":"Joe Smith",
"is_private":false,
"profile_pic_url":"https://scontent-lga3-1.cdninstagram.com/v/t51.2885-19/264887407_23285995fef5595973_2438487768932865121_n.jpg?stp=dst-jpg_s150x150&_nc_ht=scontent-lga3-1.cdninstagram.com&_nc_cat=105&_nc_ohc=8YpjD2OKeoEAX_gXrh3&edm=APQMUHMBAAAA&ccb=7-5&oh=00_AT9UuhNl9LL_ANffkCcyNNFPv5_yK7J2FKQpRPmqEIri3w&oe=62D60038&_nc_sid=e5d0a6",
"profile_pic_id":"2725348120205619717_182625349",
"is_verified":false,
"has_anonymous_profile_picture":false,
"has_highlight_reels":false,
"account_badges":[
],
"latest_reel_media":0
},
{
"pk":7324707263,
"username":"Mike Jones",
"full_name":"Mike",
"is_private":false,
"profile_pic_url":"https://scontent-lga3-1.cdninstagram.com/v/t51.2885-19/293689497_4676376015169654_5558066294974198168_n.jpg?stp=dst-jpg_s150x150&_nc_ht=scontent-lga3-1.cdninstagram.com&_nc_cat=110&_nc_ohc=ErZ3qsP0LsAAX96OT9a&edm=APQMUHMBAAAA&ccb=7-5&oh=00_AT8pFSsMfJz4Wpq5ulTCpou-4jPs3_GBqIT_SQA6YMaQ0Q&oe=62D61C4D&_nc_sid=e5d0a6",
"profile_pic_id":"28817013445468058864_7324707263",
"is_verified":false,
"has_anonymous_profile_picture":false,
"has_highlight_reels":false,
"account_badges":[
],
"latest_reel_media":1657745524
}
]
}
What I'm trying to do is extract the pk variable from all of them.
I know how to do it on a one by one basis as such:
ids = results['users'][1]['pk']
ids = results['users'][2]['pk']
ids = results['users'][3]['pk']
And so on. But let's say I wanted to extract all of those values in one swoop. I also won't necessarily know how many there are in each JSON object (while the example I used had three, it could be hundreds).
I'm so used to R where you can just doing something like ids = results$pk but don't know how to do this with Python.
EDIT BASED ON rv.kvetch answer
data = results
data = json.loads(data)
ids = [u['pk'] for u in data['users']]
print(ids)
But when I run it I get TypeError: the JSON object must be str, bytes or bytearray, not dict
You can load the json string to a dict object, then use a list comprehension to retrieve a list of ids:
import json
# data is a string
data: str = """
...
"""
data: dict = json.loads(data)
ids = [u['pk'] for u in data['users']]
print(ids)
Output:
[54297756964, 182625349, 7324707263]
I have some JSON data like:
{
"status": "200",
"msg": "",
"data": {
"time": "1515580011",
"video_info": [
{
"announcement": "{\"announcement_id\":\"6\",\"name\":\"INS\\u8d26\\u53f7\",\"icon\":\"http:\\\/\\\/liveme.cms.ksmobile.net\\\/live\\\/announcement\\\/2017-08-18_19:44:54\\\/ins.png\",\"icon_new\":\"http:\\\/\\\/liveme.cms.ksmobile.net\\\/live\\\/announcement\\\/2017-10-20_22:24:38\\\/4.png\",\"videoid\":\"15154610218328614178\",\"content\":\"FOLLOW ME PLEASE\",\"x_coordinate\":\"0.22\",\"y_coordinate\":\"0.23\"}",
"announcement_shop": "",
etc.
How do I grab the content "FOLLOW ME PLEASE"? I tried using
replay_data = raw_replay_data['data']['video_info'][0]
announcement = replay_data['announcement']
But now announcement is a string representing more JSON data. I can't continue indexing announcement['content'] results in TypeError: string indices must be integers.
How can I get the desired string in the "right" way, i.e. respecting the actual structure of the data?
In a single line -
>>> json.loads(data['data']['video_info'][0]['announcement'])['content']
'FOLLOW ME PLEASE'
To help you understand how to access data (so you don't have to ask again), you'll need to stare at your data.
First, let's lay out your data nicely. You can either use json.dumps(data, indent=4), or you can use an online tool like JSONLint.com.
{
'data': {
'time': '1515580011',
'video_info': [{
'announcement': ( # ***
"""{
"announcement_id": "6",
"name": "INS\\u8d26\\u53f7",
"icon": "http:\\\\/\\\\/liveme.cms.ksmobile.net\\\\/live\\\\/announcement\\\\/2017-08-18_19:44:54\\\\/ins.png",
"icon_new": "http:\\\\/\\\\/liveme.cms.ksmobile.net\\\\/live\\\\/announcement\\\\/2017-10-20_22:24:38\\\\/4.png",
"videoid": "15154610218328614178",
"content": "FOLLOW ME PLEASE",
"x_coordinate": "0.22",
"y_coordinate": "0.23"
}"""),
'announcement_shop': ''
}]
},
'msg': '',
'status': '200'
}
*** Note that the data in the announcement key is actually more json data, which I've laid out on separate lines.
First, find out where your data resides. You're looking for the data in the content key, which is accessed by the announcement key, which is part of a dictionary inside a list of dicts, which can be accessed by the video_info key, which is in turn accessed by data.
So, in summary, "descend" the ladder that is "data" using the following "rungs" -
data, a dictionary
video_info, a list of dicts
announcement, a dict in the first dict of the list of dicts
content residing as part of json data.
First,
i = data['data']
Next,
j = i['video_info']
Next,
k = j[0] # since this is a list
If you only want the first element, this suffices. Otherwise, you'd need to iterate:
for k in j:
...
Next,
l = k['announcement']
Now, l is JSON data. Load it -
import json
m = json.loads(l)
Lastly,
content = m['content']
print(content)
'FOLLOW ME PLEASE'
This should hopefully serve as a guide should you have future queries of this nature.
You have nested JSON data; the string associated with the 'annoucement' key is itself another, separate, embedded JSON document.
You'll have to decode that string first:
import json
replay_data = raw_replay_data['data']['video_info'][0]
announcement = json.loads(replay_data['announcement'])
print(announcement['content'])
then handle the resulting dictionary from there.
The content of "announcement" is another JSON string. Decode it and then access its contents as you were doing with the outer objects.
I have some JSON data like:
{
"status": "200",
"msg": "",
"data": {
"time": "1515580011",
"video_info": [
{
"announcement": "{\"announcement_id\":\"6\",\"name\":\"INS\\u8d26\\u53f7\",\"icon\":\"http:\\\/\\\/liveme.cms.ksmobile.net\\\/live\\\/announcement\\\/2017-08-18_19:44:54\\\/ins.png\",\"icon_new\":\"http:\\\/\\\/liveme.cms.ksmobile.net\\\/live\\\/announcement\\\/2017-10-20_22:24:38\\\/4.png\",\"videoid\":\"15154610218328614178\",\"content\":\"FOLLOW ME PLEASE\",\"x_coordinate\":\"0.22\",\"y_coordinate\":\"0.23\"}",
"announcement_shop": "",
etc.
How do I grab the content "FOLLOW ME PLEASE"? I tried using
replay_data = raw_replay_data['data']['video_info'][0]
announcement = replay_data['announcement']
But now announcement is a string representing more JSON data. I can't continue indexing announcement['content'] results in TypeError: string indices must be integers.
How can I get the desired string in the "right" way, i.e. respecting the actual structure of the data?
In a single line -
>>> json.loads(data['data']['video_info'][0]['announcement'])['content']
'FOLLOW ME PLEASE'
To help you understand how to access data (so you don't have to ask again), you'll need to stare at your data.
First, let's lay out your data nicely. You can either use json.dumps(data, indent=4), or you can use an online tool like JSONLint.com.
{
'data': {
'time': '1515580011',
'video_info': [{
'announcement': ( # ***
"""{
"announcement_id": "6",
"name": "INS\\u8d26\\u53f7",
"icon": "http:\\\\/\\\\/liveme.cms.ksmobile.net\\\\/live\\\\/announcement\\\\/2017-08-18_19:44:54\\\\/ins.png",
"icon_new": "http:\\\\/\\\\/liveme.cms.ksmobile.net\\\\/live\\\\/announcement\\\\/2017-10-20_22:24:38\\\\/4.png",
"videoid": "15154610218328614178",
"content": "FOLLOW ME PLEASE",
"x_coordinate": "0.22",
"y_coordinate": "0.23"
}"""),
'announcement_shop': ''
}]
},
'msg': '',
'status': '200'
}
*** Note that the data in the announcement key is actually more json data, which I've laid out on separate lines.
First, find out where your data resides. You're looking for the data in the content key, which is accessed by the announcement key, which is part of a dictionary inside a list of dicts, which can be accessed by the video_info key, which is in turn accessed by data.
So, in summary, "descend" the ladder that is "data" using the following "rungs" -
data, a dictionary
video_info, a list of dicts
announcement, a dict in the first dict of the list of dicts
content residing as part of json data.
First,
i = data['data']
Next,
j = i['video_info']
Next,
k = j[0] # since this is a list
If you only want the first element, this suffices. Otherwise, you'd need to iterate:
for k in j:
...
Next,
l = k['announcement']
Now, l is JSON data. Load it -
import json
m = json.loads(l)
Lastly,
content = m['content']
print(content)
'FOLLOW ME PLEASE'
This should hopefully serve as a guide should you have future queries of this nature.
You have nested JSON data; the string associated with the 'annoucement' key is itself another, separate, embedded JSON document.
You'll have to decode that string first:
import json
replay_data = raw_replay_data['data']['video_info'][0]
announcement = json.loads(replay_data['announcement'])
print(announcement['content'])
then handle the resulting dictionary from there.
The content of "announcement" is another JSON string. Decode it and then access its contents as you were doing with the outer objects.
I need to pull data in from elasticsearch, do some cleaning/munging and export as table/rds.
To do this I have a long list of variable names required to pull from elasticsearch. This list of variables is required for the pull, but the issue is that not all fields may be represented within a given pull, meaning that I need to add the fields after the fact. I can do this using a schema (in nested json format) of the same list of variable names.
To try and [slightly] future proof this work I would ideally like to only maintain the list/schema in one place, and convert from list to schema (or vice-versa).
Is there a way to do this in python? Please see example below of input and desired output.
Small part of schema:
{
"_source": {
"filters": {"group": {"filter_value": 0}},
"user": {
"email": "",
"uid": ""
},
"status": {
"date": "",
"active": True
}
}
}
Desired string list output:
[
"_source.filters.group.filter_value",
"_source.user.email",
"_source.user.uid",
"_source.status.date",
"_source.status.active"
]
I believe that schema -> list might be an easier transformation than list -> schema, though am happy for it to be the other way round if that is simpler (though need to ensure the schema variables have the correct type, i.e. str, bool, float).
I have explored the following answers which come close, but I am struggling to understand since none appear to be in python:
Convert dot notation to JSON
Convert string with dot notation to JSON
Where d is your json as a dictionary,
def full_search(d):
arr = []
def dfs(d, curr):
if not type(d) == dict or curr[-1] not in d or type(d[curr[-1]]) != dict:
arr.append(curr)
return
for key in d[curr[-1]].keys():
dfs(d[curr[-1]], curr + [key])
for key in d.keys():
dfs(d, [key])
return ['.'.join(x) for x in arr]
If d is in json form, use
import json
res = full_search(json.loads(d))
Im trying to compare two json files. One is a json.dump and the other is a new json object coming into the method.
We generate a JSON file using results of tests from a server, they are then passed into this method:
def save_json_results(data):
if not os.listdir('/opt/Code/JSON'):
with open('/opt/Code/JSON/current_json.json','w') as outfile:
json.dump(data,outfile)
send_json_results(data)
else:
f = open('/opt/Code/JSON/current_json.json')
file_data = json.loads(f)
d = json.loads(data)
if sorted(file_data.items()) == sorted(d.items()) and file_data.get("timestamp",{}) != d.get("timestamp",{}):
with open('/opt/Code/JSON/current_json.json','w') as outfile:
json.dump(data,outfile)
else:
os.rename('/opt/Code/JSON/current_json.json','/opt/cfm_health_checks/Code/JSON/prev_json.json')
with open('/opt/Code/JSON/current_json.json','w') as outfile:
json.dump(data,outfile)
send_json_results(data)
The premise of this method is to take the JSON object which is being passed in (data) and then check a directory if it is empty then store this file as the most up to date JSON, else it will attempt to open the file which is already there (I might be wrong in assuming I have to reload it as a JSON object) then try and compare the files, if they are the same and its only the timestamp which is different then dont send the JSON onto the database (as it is just wasting network resources as no information about the test results are different) however it will still make the new data = the most current JSON, else it will send it to the database and rename the files accordingly.
{
"acronym": "",
"api": [],
"cell_version": "",
"cfm": [],
"city": "Belfast",
"country": "UK",
"latitude": 54.614443,
"longitude": -5.899953,
"nas": [],
"node": [],
"number_of_cells": "",
"switch": [],
"timestamp": "2015-01-19 15:04:13.489097"
}
Example JSON above.
The d and file_data objects returned by json.loads should have the standard python dictionary interface.
Therefore you should be able to access the acronym value in d using d["acronym"].
You could also do for keyvaluepair in file_data.items(): to iterate through each JSON entry and apply a separate comparison to each key value pair based on the key name.
You could also do
[d[k[0]] for k in sorted(d.items()) if k[0] != "timestamp"] == [file_data[k[0]] for k in sorted(file_data.items()) if k[0] != "timestamp"]
Although that's a very long line, and I wouldn't personally let that past a code review.