Navigation through json (python)

I'm trying to navigate through a JSON file but cannot properly parse the 'headliner' node.
Here is my JSON file:
{
"resultsPage":{
"results":{
"calendarEntry":[
{
"event":{
"id":38862824,
"artistName":"Raphael",
},
"performance":[
{
"id":73632729,
"headlinerName":"Top-Secret",
}
}
],
"venue":{
"id":4285819,
"displayName":"Sacré"
}
}
}
}
Here is what I'm trying to do:
for item in data["resultsPage"]["results"]["calendarEntry"]:
    artistname = item["event"]["artistName"]
    headliner = item["performance"]["headlinerName"]
I don't understand why it works for 'artistName' but not for 'headlinerName'. Thanks for your help and your explanation.

Notice your performance key:
"performance":[
{
"id":73632729,
"headlinerName":"Top-Secret",
}
}
],
The JSON you posted is malformed. Assuming the structure is like this:
"performance":[
{
"id":73632729,
"headlinerName":"Top-Secret"
}
],
You can do:
for performance in item["performance"]:
    headliner = performance["headlinerName"]
or, as @UltraInstinct suggested:
item["performance"][0]["headlinerName"]
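A minimal runnable sketch of both options, assuming "performance" is a list of objects as in the corrected fragment above. The `data` dict is a trimmed, hand-written copy of the question's data, not the full file. The point is that "event" is an object, so a key lookup works, while "performance" is a list, so you must loop over it or index into it first:

```python
# Trimmed copy of the question's data, with "performance" as a list of objects.
data = {
    "resultsPage": {
        "results": {
            "calendarEntry": [
                {
                    "event": {"id": 38862824, "artistName": "Raphael"},
                    "performance": [
                        {"id": 73632729, "headlinerName": "Top-Secret"}
                    ],
                    "venue": {"id": 4285819, "displayName": "Sacré"},
                }
            ]
        }
    }
}

for item in data["resultsPage"]["results"]["calendarEntry"]:
    # "event" is a dict, so a direct key lookup works.
    artistname = item["event"]["artistName"]
    # "performance" is a list of dicts: collect every headliner...
    headliners = [perf["headlinerName"] for perf in item["performance"]]
    # ...or take only the first one.
    first_headliner = item["performance"][0]["headlinerName"]
    print(artistname, headliners, first_headliner)
```

Looping is the safer choice if an entry can have more than one performance; `[0]` raises an IndexError if the list is empty.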

A few problems here. First, your JSON is incorrectly formatted: your square brackets don't match up. Maybe you meant something like this? I am assuming "calendarEntry" is a list and everything else is an object. Lists are usually given plural names, e.g. "calendarEntries".
{
"resultsPage": {
"results": {
"calendarEntries": [
{
"event": {
"id": 38862824,
"artistName": "Raphael"
},
"performance": {
"id": 73632729,
"headlinerName": "Top-Secret"
},
"venue": {
"id": 4285819,
"displayName": "Sacré"
}
}
]
}
}
}
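With that structure, the question's original lookup works unchanged, because "performance" is now an object rather than a list. A short sketch, assuming the JSON text is held in a string (here a condensed copy of the object above; note the key is now "calendarEntries"):

```python
import json

# Condensed copy of the corrected JSON above.
raw = """
{"resultsPage": {"results": {"calendarEntries": [
  {"event": {"id": 38862824, "artistName": "Raphael"},
   "performance": {"id": 73632729, "headlinerName": "Top-Secret"},
   "venue": {"id": 4285819, "displayName": "Sacré"}}
]}}}
"""

data = json.loads(raw)
for item in data["resultsPage"]["results"]["calendarEntries"]:
    artistname = item["event"]["artistName"]
    # "performance" is a dict here, so direct key access works.
    headliner = item["performance"]["headlinerName"]
    print(artistname, headliner)
```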

Related

What happens to a $match term in a pipeline?

I'm a newbie to MongoDB and Python scripts. I'm confused about how a $match term is handled in a pipeline.
Let's say I manage a library, where books are tracked as JSON documents in MongoDB. There is one JSON document for each copy of a book. The book.JSON files look like this:
{
"Title": "A Tale of Two Cities",
"subData":
{
"status": "Checked In"
...more data here...
}
}
Here, status will be one string from a finite set of strings, perhaps just: { "Checked In", "Checked Out", "Missing", etc. }. But also note that there may not be a status field at all:
{
"Title": "Great Expectations",
"subData":
{
...more data here...
}
}
Okay: I am trying to write a MongoDB pipeline within a Python script that does the following:
For each book in the library:
Groups and counts the different instances of the status field
So my target output from my Python script would be something like this:
{ "A Tale of Two Cities" 'Checked In' 3 }
{ "A Tale of Two Cities" 'Checked Out' 4 }
{ "Great Expectations" 'Checked In' 5 }
{ "Great Expectations" '' 7 }
Here's my code:
mydatabase = client.JSON_DB
mycollection = mydatabase.JSON_all_2
listOfBooks = mycollection.distinct("bookname")
for book in listOfBooks:
    match_variable = {
        "$match": { 'Title': book }
    }
    group_variable = {
        "$group": {
            '_id': '$subdata.status',
            'categories': { '$addToSet': '$subdata.status' },
            'count': { '$sum': 1 }
        }
    }
    project_variable = {
        "$project": {
            '_id': 0,
            'categories': 1,
            'count': 1
        }
    }
    pipeline = [
        match_variable,
        group_variable,
        project_variable
    ]
    results = mycollection.aggregate(pipeline)
    for result in results:
        print(str(result['Title'])+" "+str(result['categories'])+" "+str(result['count']))
As you can probably tell, I have very little idea what I'm doing. When I run the code, I get an error because I'm trying to reference my $match term:
Traceback (most recent call last):
File "testScript.py", line 34, in main
print(str(result['Title'])+" "+str(result['categories'])+" "+str(result['count']))
KeyError: 'Title'
So a $match term is not included in the pipeline? Or am I failing to include it in group_variable or project_variable?
And on a general note, the above seems like a lot of code to do something relatively easy. Does anyone see a better way? It's easy to find simple examples online, but this is one step of complexity beyond anything I can find. Thank you.

Here's one aggregation pipeline to "$group" all the books by "Title" and "subData.status".
db.collection.aggregate([
{
"$group": {
"_id": {
"Title": "$Title",
"status": {"$ifNull": ["$subData.status", ""]}
},
"count": { "$count": {} }
}
},
{ // not really necessary, but puts output in predictable order
"$sort": {
"_id.Title": 1,
"_id.status": 1
}
},
{
"$replaceWith": {
"$mergeObjects": [
"$_id",
{"count": "$count"}
]
}
}
])
Example output for one of the "books":
{
"Title": "mumblecore",
"count": 3,
"status": ""
},
{
"Title": "mumblecore",
"count": 3,
"status": "Checked In"
},
{
"Title": "mumblecore",
"count": 8,
"status": "Checked Out"
},
{
"Title": "mumblecore",
"count": 6,
"status": "Missing"
}
Try it on mongoplayground.net.
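Since the question uses a Python script, here is a sketch of the same pipeline written as a Python list for PyMongo's `Collection.aggregate`. Assumptions: it swaps `"$count": {}` for `"$sum": 1`, because the `$count` accumulator needs MongoDB 5.0 or later while `$sum` also works on older servers (`$replaceWith` itself needs 4.2 or later), and `build_pipeline` / `print_status_counts` are hypothetical helper names. This also explains the original KeyError: after `$group`, each output document only contains `_id` and the accumulated fields, so `Title` is gone unless it is part of `_id`, as it is here.

```python
def build_pipeline():
    """Count copies per (Title, status) across the whole collection."""
    return [
        {
            "$group": {
                "_id": {
                    "Title": "$Title",
                    # Documents without subData.status are grouped under "".
                    "status": {"$ifNull": ["$subData.status", ""]},
                },
                # $sum: 1 counts documents; works on servers older than 5.0.
                "count": {"$sum": 1},
            }
        },
        # Optional: sort for a predictable output order.
        {"$sort": {"_id.Title": 1, "_id.status": 1}},
        # Lift Title and status out of _id and keep count next to them.
        {"$replaceWith": {"$mergeObjects": ["$_id", {"count": "$count"}]}},
    ]


def print_status_counts(collection):
    """Print one line per (Title, status) pair from a PyMongo collection."""
    for doc in collection.aggregate(build_pipeline()):
        print(doc["Title"], repr(doc["status"]), doc["count"])
```

With the question's objects this would be called as `print_status_counts(mycollection)`. No per-title loop or `$match` stage is needed, because `$group` already splits the counts by title.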

Efficient way to calculate metrics from nested JSON, via python?

What is the most efficient way to calculate metrics from nested JSON, via python?
Given the JSON blob below, how can I find the user (i.e. profileId) with the most events, without using the pandas library and without multiple nested for loops? I am having trouble writing code that doesn't rely on O(N²).
{
"kind":"admin#reports#activities",
"etag":"\"5g8\"",
"nextPageToken":"A:1651795128914034:-4002873813067783265:151219070090:C02f6wppb",
"items":[
{
"kind":"admin#reports#activity",
"id":{
"time":"2022-05-05T23:59:39.421Z",
"uniqueQualifier":"5526793068617678141",
"applicationName":"token",
"customerId":"cds"
},
"etag":"\"jkYcURYoi8\"",
"actor":{
"email":"blah@blah.net",
"profileId":"1323"
},
"ipAddress":"107.178.193.87",
"events":[
{
"type":"auth",
"name":"activity",
"parameters":[
{
"name":"api_name",
"value":"admin"
},
{
"name":"method_name",
"value":"directory.users.list"
},
{
"name":"client_id",
"value":"722230783769-dsta4bi9fkom72qcu0t34aj3qpcoqloq.apps.googleusercontent.com"
},
{
"name":"num_response_bytes",
"intValue":"7158"
},
{
"name":"product_bucket",
"value":"GSUITE_ADMIN"
},
{
"name":"app_name",
"value":"Untitled project"
},
{
"name":"client_type",
"value":"WEB"
}
]
}
]
},
{
"kind":"admin#reports#activity",
"id":{
"time":"2022-05-05T23:58:48.914Z",
"uniqueQualifier":"-4002873813067783265",
"applicationName":"token",
"customerId":"df"
},
"etag":"\"5T53xK7dpLei95RNoKZd9uz5Xb8LJpBJb72fi2HaNYM/9DTdB8t7uixvUbjo4LUEg53_gf0\"",
"actor":{
"email":"blah.blah@bebe.net",
"profileId":"1324"
},
"ipAddress":"54.80.168.30",
"events":[
{
"type":"auth",
"name":"activity",
"parameters":[
{
"name":"api_name",
"value":"gmail"
},
{
"name":"method_name",
"value":"gmail.users.messages.list"
},
{
"name":"client_id",
"value":"927538837578.apps.googleusercontent.com"
},
{
"name":"num_response_bytes",
"intValue":"2"
},
{
"name":"product_bucket",
"value":"GMAIL"
},
{
"name":"app_name",
"value":"Zapier"
},
{
"name":"client_type",
"value":"WEB"
}
]
}
]
}
]
}

This would be a good place to start:
data = # your dict.
ids = [x['actor']['profileId'] for x in data['items']]
print(ids)
Output:
['1323', '1324']
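To go from the list of ids to the user with the most events in a single pass (O(N) in the number of items), a `collections.Counter` can add up `len(item["events"])` per profileId and then report the largest total. A sketch, assuming the response has been parsed into a dict named `data` with the structure shown in the question (trimmed here to the fields the code reads); the `.get("events", [])` default is an added assumption for items without events:

```python
from collections import Counter

# Trimmed copy of the report: only the fields the code below reads.
data = {
    "items": [
        {"actor": {"profileId": "1323"}, "events": [{"type": "auth", "name": "activity"}]},
        {"actor": {"profileId": "1324"}, "events": [{"type": "auth", "name": "activity"}]},
    ]
}

event_counts = Counter()
for item in data["items"]:
    # One pass over items; each item's event count is added to its actor.
    event_counts[item["actor"]["profileId"]] += len(item.get("events", []))

# most_common(1) returns a list holding the single (profileId, count) pair.
top_user, top_count = event_counts.most_common(1)[0]
print(top_user, top_count)
```

In this sample both users have one event, so it is a tie; `most_common` orders equal counts by first insertion. If `data["items"]` is empty, `most_common(1)` returns an empty list, so check for that before unpacking.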

Python seems to only read one item per top-level array

I have a JSON file that I read in Python. The JSON (see below) contains two top-level items, both arrays with a complex structure, including other arrays at lower levels. For some reason, Python seems to read only one item from each top-level array.
This is the JSON:
{
"deliverables": [
{
"name": "<uvCode>gadget1",
"objects": [
{ "name": "handler-plate" },
{ "name": "Cone" }
]
},
{
"name": "<uvCode>gadget2",
"objects": [
{ "name": "handler-plate" },
{ "name": "Cone" }
]
}
],
"uvCombinations": [
{
"name": "st01",
"uvMapping": [
{
"objectNameContains": "handler-plate",
"uvLayer": "UVMap1"
},
{
"objectNameContains": "Cone",
"uvLayer": "UVMap1"
}
]
},
{
"name": "st02",
"uvMapping": [
{
"objectNameContains": "handler-plate",
"uvLayer": "UVMap3"
},
{
"objectNameContains": "Cone",
"uvLayer": "UVMap2"
}
]
}
]
}
This is my code to read and dump the JSON file:
import json
import logging

with open("file.json") as configFile:
    configuration = json.load(configFile)
logging.debug("CONFIG: %s", json.dumps(configuration, indent=4))
And this is the output:
CONFIG: {
"deliverables": [
{
"name": "<uvCode>gadget1",
"objects": [
{
"name": "handler-plate"
},
{
"name": "Cone"
}
]
}
],
"uvCombinations": [
{
"name": "st02",
"uvMapping": [
{
"objectNameContains": "handler-plate",
"uvLayer": "UVMap3"
},
{
"objectNameContains": "Cone",
"uvLayer": "UVMap2"
}
]
}
]
}
The second item of the deliverables array (named <uvCode>gadget2) and the first item of the uvCombinations array (named st01) are somehow missing.
I'm not a Python expert, but I think this should work like a charm, and it's strange that the missing items don't even have the same index. It gets even more interesting when you notice that the objects and uvMapping arrays are read properly.
What am I doing wrong? (the poor guy asks)

Oh guys, you saved my life! Two of you quickly reported that you couldn't repro it, and Jordan suggested that maybe my file did not contain what I thought it did. I first started ROTL, then took a look at the files and found that the file name had not been updated... I had been editing another file for hours... :D
Thanks, guys, really. If you hadn't said you couldn't repro it, I would never have realized this, since I had completely forgotten about the other copy of the file.
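A cheap guard against this kind of mix-up is to log the absolute path of the file actually being opened, along with the size of each top-level array, right after loading it. A minimal sketch; the function name `load_config` is hypothetical, and the keys are the ones from the question:

```python
import json
import logging
import os


def load_config(path):
    """Load a JSON config and log which file was read and what it contains."""
    abs_path = os.path.abspath(path)
    with open(abs_path, encoding="utf-8") as config_file:
        configuration = json.load(config_file)
    # Logging the resolved path makes it obvious when a stale copy is read.
    logging.debug("Loaded config from %s", abs_path)
    for key in ("deliverables", "uvCombinations"):
        logging.debug("%s: %d item(s)", key, len(configuration.get(key, [])))
    return configuration
```

Call `logging.basicConfig(level=logging.DEBUG)` first, otherwise the debug messages are not shown; then `load_config("file.json")` prints the full path, which points straight at the wrong copy.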

Getting linked documents in single lookup query in Elastic Search

To provide some context:
I want to write a bulk update query (possibly affecting 0.5M-1M docs). The update would be to the aspects field (shown below), whose values are mostly duplicated.
My thinking was that if I normalised it into a separate entity (aspect_label), the number of docs to update would drop drastically (say 500-1000 max).
Query: I want to find out whether there is a way to get linked documents by id in Elasticsearch.
E.g., suppose I have documents in the index my_db with the mapping below.
Note: processed_reviews is a child of aspect_label.
{
"my_db":{
"mappings":{
"processed_reviews":{
"_all":{
"enabled":false
},
"_parent":{
"type":"aspect_label"
},
"_routing":{
"required":true
},
"properties":{
"data":{
"properties":{
"insights":{
"type":"nested",
"properties":{
"aspects":{
"type":"nested",
"properties":{
"aspect_label_id":{
"type":"keyword"
},
"aspect_term_frequency":{
"type":"long"
}
}
}
}
},
"preprocessed_text":{
"type":"text"
},
"preprocessed_title":{
"type":"text"
}
}
}
}
}
}
}
}
And the other entity, aspect_label:
{
"my_db": {
"mappings": {
"aspect_label": {
"_all": {
"enabled": false
},
"properties": {
"aspect": {
"type": "keyword"
},
"aspect_label_new": {
"type": "keyword"
},
"aspect_label_old": {
"type": "text"
}
}
}
}
}
}
Now, I want to write a search query on the processed_reviews type such that the aspect_label_id field is replaced with either the value of aspect_label_new or the entire aspect_label doc matching that id.
{
"_index":"my_db",
"_type":"processed_reviews",
"_id":"191b3bff-4915-4404-a05a-10e6bd2b19d4",
"_score":1,
"_routing":"5",
"_parent":"5",
"_source":{
"data":{
"preprocessed_text":"Good product I really like so comfortable and so light wait and looks good",
"preprocessed_title":"Good choice",
"insights":[
{
"aspects":[
{
"aspect_label":"color",
"aspect_term_frequency":1
}
]
}
]
}
}
}
Also, please let me know if there is a better way to approach this problem, if something is wrong with my approach, or whether this is possible at all.

How to make an 'outer' JSON key for a JSON object with Python

I would like to produce the following JSON output with Python:
data={
"timestamp": "1462868427",
"sites": [
{
"name": "SiteA",
"zone": 1
},
{
"name": "SiteB",
"zone": 7
}
]
}
But I cannot manage to get the 'outer' data key there.
So far I got this output without the data key:
{
"timestamp": "1462868427",
"sites": [
{
"name": "SiteA",
"zone": 1
},
{
"name": "SiteB",
"zone": 7
}
]
}
I have tried this Python code:
import json

sites = [
{
"name":"nameA",
"zone":123
},
{
"name":"nameB",
"zone":324
}
]
data = {
"timestamp": 123456567,
"sites": sites
}
print(json.dumps(data, indent = 4))
But how do I manage to get the outer 'data' key there?

Once you have your data ready, you can simply do this:
data = {'data': data}

JSON doesn't have =; it's all key:value pairs.
What you're looking for is
data = {
"data": {
"timestamp": 123456567,
"sites": sites
}
}
print(json.dumps(data, indent=4))
json.dumps() doesn't know or care what name you gave the object in Python. You have to add the "data" key manually inside the object, as a string.
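Putting both answers together, a small runnable sketch that wraps the existing dict under an explicit "data" key before serialising it (the site values are the placeholders from the question's code):

```python
import json

sites = [
    {"name": "nameA", "zone": 123},
    {"name": "nameB", "zone": 324},
]
data = {"timestamp": 123456567, "sites": sites}

# The Python variable name is irrelevant to json.dumps(); the outer key
# only appears because it is added here as an ordinary string key.
wrapped = {"data": data}
output = json.dumps(wrapped, indent=4)
print(output)
```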
