Filter with jsonpath-ng - python

Working with the following json data:
{
"data":
{
"level1":
[
{
"levelName": "level11",
"cost": 1,
"child":
{
"childName": "first",
"status": "running"
}
},
{
"levelName": "level12",
"cost": 2,
"child":
{
"childName": "second",
"status": "asleep"
}
}
]
}
}
A jsonpath search/filter using the expression
"$.data.level1[*][?(childName=='first')]"
correctly locates the data.
However, using the expression
"$.data.level1[*][?(levelName=='level11')]"
returns blank
How do I search at the "levelName": "level11" level?
In the latter case, if I have the "levelName": "level11" in the json data at the same level as "childName": "first", the search works successfully.

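If I understand correctly, you only need to change your syntax slightly to select the node in question:
$.data.level1[?(@.levelName=="level11")]
We are already in the level1 array and can filter its elements directly. In the original expression, [*] first steps into each element, so the filter is then tested against that element's members (such as child) rather than against the element itself, which is why childName matches and levelName does not.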
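For comparison, here is a minimal plain-Python sketch of what that filter is meant to select. It does not use jsonpath-ng at all; it only illustrates that the condition is applied to the elements of the level1 array. The variable names `doc` and `matches` are assumptions made for this example.

```python
import json

# Trimmed copy of the sample data from the question.
doc = json.loads("""
{"data": {"level1": [
  {"levelName": "level11", "cost": 1,
   "child": {"childName": "first", "status": "running"}},
  {"levelName": "level12", "cost": 2,
   "child": {"childName": "second", "status": "asleep"}}
]}}
""")

# Keep every element of the level1 array whose levelName is "level11".
matches = [item for item in doc["data"]["level1"]
           if item.get("levelName") == "level11"]

print(matches[0]["child"]["childName"])
```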


Get field value in MongoDB without parent object name

I'm trying to find a way to retrieve some data from MongoDB through Python scripts,
but I got stuck on the following situation:
I have to retrieve some data, check a field value and compare it with other data (MongoDB documents).
But the object's name may vary from one module to another, see below:
Document 1
{
"_id": "001",
"promotion": {
"Avocado": {
"id": "01",
"timestamp": "202005181407",
},
"Banana": {
"id": "02",
"timestamp": "202005181407",
}
},
"product" : {
"id" : "11"
}
}
Document 2
{
"_id": "002",
"promotion": {
"Grape": {
"id": "02",
"timestamp": "202005181407",
},
"Dragonfruit": {
"id": "02",
"timestamp": "202005181407",
}
},
"product" : {
"id" : "15"
}
}
I'll always have an object called promotion, but the child's name may vary: sometimes it's an ordered number, sometimes it is not. The field whose value I need is the id inside promotion, which will always have the same name.
So if the document matches the criteria I'll retrieve with python and get the rest of the work done.
PS.: I'm not the one responsible for this kind of Document Structure.
I've already tried these docs, but couldn't get them to work the way I need.
$all
$elemMatch
Try this python pipeline:
[
{
'$addFields': {
'fruits': {
'$objectToArray': '$promotion'
}
}
}, {
'$addFields': {
'FruitIds': '$fruits.v.id'
}
}, {
'$project': {
'_id': 0,
'FruitIds': 1
}
}
]
Output produced:
{FruitIds:["01","02"]},
{FruitIds:["02","02"]}
Is this the desired output?
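To sanity-check that result without a database, the same transformation can be sketched in plain Python. This assumes the documents have already been fetched as dicts; `promotion_ids` is a hypothetical helper written for this example, not a PyMongo function.

```python
def promotion_ids(document):
    """Return the id of every child of "promotion", whatever the child is called."""
    # Mirrors $objectToArray followed by '$fruits.v.id': the child names (keys)
    # are ignored and only the "id" of each value is kept.
    return [child["id"]
            for child in document.get("promotion", {}).values()
            if "id" in child]


doc1 = {
    "_id": "001",
    "promotion": {
        "Avocado": {"id": "01", "timestamp": "202005181407"},
        "Banana": {"id": "02", "timestamp": "202005181407"},
    },
    "product": {"id": "11"},
}
doc2 = {
    "_id": "002",
    "promotion": {
        "Grape": {"id": "02", "timestamp": "202005181407"},
        "Dragonfruit": {"id": "02", "timestamp": "202005181407"},
    },
    "product": {"id": "15"},
}

for doc in (doc1, doc2):
    print(promotion_ids(doc))
```

A match criterion can then be written as a membership test, for example `"01" in promotion_ids(doc1)`.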

Only getting string while iterating through json data

JSON:
{
"status": "success",
"data": {
"9": {
"1695056": {
"id": "1695056",
[...]
},
"csevents": {
"2807": {
"id": "2807",
"startdate": "2019-01-24 18:45:00",
"service_texts": [],
"eventTemplate": "1"
},
"2810": {
"id": "2810",
"startdate": "2019-01-31 18:45:00",
"service_texts": [],
"eventTemplate": "1"
}
}
},
"1695309": {
"id": "1695309",
[...]
},
"csevents": {
"3601": {
"id": "3601",
"startdate": "2019-05-17 18:45:00",
"service_texts": [],
"eventTemplate": "1"
}
I'm trying to get the members of "csevents" ("2807", "2810", "3601") with Python. The problem is that I don't know the IDs inside "9" ("1695056", "1695309") while coding.
So I tried to iterate through "9" and then through "csevents", but when I iterate through "9" I only get a string, so I can't iterate through "csevents" anymore.
Python:
for whatever in json_object['data']['9']:
    for id in whatever['csevents']:
        print(id)
So that doesn't work. Does anybody know how I can solve that?
Thanks
I had to clean up your JSON string to get it to work. Looking at your code, it seems you're iterating directly over the dict, which yields only its keys (strings). What you should be using is .items() or .values():
for key, value in json_object['data']['9'].items():
    # We can use .keys() here since we only need the IDs from csevents
    csevent_keys = list(value['csevents'].keys())
    print(csevent_keys)
# Output
['2807', '2810']
['3601']

Update document if there is no match

In Mongodb, how do you skip an update if one field of the document exists?
To give an example, I have the following document structure, and I'd like to only update it if the link key is not matching.
{
"_id": {
"$oid": "56e9978732beb44a2f2ac6ae"
},
"domain": "example.co.uk",
"good": [
{
"crawled": true,
"added": {
"$date": "2016-03-16T17:27:17.461Z"
},
"link": "/url-1"
},
{
"crawled": false,
"added": {
"$date": "2016-03-16T17:27:17.461Z"
},
"link": "url-2"
}
]
}
My update query is:
links.update({
"domain": "example.co.uk"
},
{'$addToSet':
{'good':
{"crawled": False, 'link':"/url-1"} }}, True)
Part of the problem is that the crawled field could be set to True or False and the date will always be different. I don't want to add to the array if the URL exists, regardless of the crawled status.
Update:
Just for clarity, if the URL is not within the document, I want it to be added to the existing array, for example, if /url-3 was introduced, the document would look like this:
{
"_id": {
"$oid": "56e9978732beb44a2f2ac6ae"
},
"domain": "example.co.uk",
"good": [
{
"crawled": true,
"added": {
"$date": "2016-03-16T17:27:17.461Z"
},
"link": "/url-1"
},
{
"crawled": false,
"added": {
"$date": "2016-03-16T17:27:17.461Z"
},
"link": "url-2"
},
{
"crawled": false,
"added": {
"$date": "2016-04-16T17:27:17.461Z"
},
"link": "url-3"
}
]
}
The domain will be unique and specific to the link and I want it to insert the link within the good array if it doesn't exist and do nothing if it does exist.
The only way to do this is to first check, using the find_one method, whether any document in the collection matches your criteria; the filter must include the "good.link" field. If no document matches, run your update with the update_one method, this time without the "good.link" field in the query criteria. You also don't need the $addToSet operator, as it's not doing anything useful here; simply use the $push update operator, which makes your intention clear. You don't need the "upsert" option here either.
if not links.find_one({"domain": "example.co.uk", "good.link": "/url-1"}):
    links.update_one({"domain": "example.co.uk"},
                     {"$push": {"good": {"crawled": False, "link": "/url-1"}}})
In the find section of your query you are matching all documents where
"domain": "example.co.uk"
You need to add that you don't want to match
"good.link": "/url-1"
So try
{
    "domain": "example.co.uk",
    "good.link": {"$ne": "/url-1"}
}
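Combined with $push, that filter gives a single conditional update. Below is a minimal sketch, assuming PyMongo's update_one is used to send it. `build_add_link` is a hypothetical helper that only builds the two documents, so they can be inspected without a running server.

```python
def build_add_link(domain, link):
    """Build (filter, update) that appends `link` only if it is not present yet."""
    # The filter matches the domain document only when no element of "good"
    # already has this link, so the $push is skipped for existing URLs.
    query = {"domain": domain, "good.link": {"$ne": link}}
    update = {"$push": {"good": {"crawled": False, "link": link}}}
    return query, update


query, update = build_add_link("example.co.uk", "/url-1")
print(query)
print(update)

# With a PyMongo collection named `links` (as in the question), this would be
# sent as:
#     links.update_one(query, update)
```

The sketch deliberately leaves upsert off: with this filter, an upsert would insert a second document for the domain whenever the link is already present, because the existing document no longer matches.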
The accepted answer is incorrect in saying that the only way to do this is to use findOne first.
You can do it in a single database call with the pipelined update feature (available since MongoDB 4.2), which lets you use aggregation operators within an update. The strategy is to concatenate two arrays: the first is always the existing "good" array, and the second is either [new link] or an empty array, chosen with $cond depending on whether the link already exists, like so:
links.update_one({
"domain": "example.co.uk"
},
[
{
"$set": {
"good": {
"$ifNull": [
"$good",
[]
]
}
}
},
{
"$set": {
"good": {
"$concatArrays": [
"$good",
{
"$cond": [
{
"$in": [
"/url-1",
"$good.link"
]
},
[],
[
{
"crawled": False,
"link": "/url-1"
}
]
]
}
]
}
}
}
], True)

Issues decoding Collections+JSON in Python

I've been trying to decode a JSON response in Collections+JSON format using Python for a while now but I can't seem to overcome a small issue.
First of all, here is the JSON response:
{
"collection": {
"href": "http://localhost:8000/social/messages-api/",
"items": [
{
"data": [
{
"name": "messageID",
"value": 19
},
{
"name": "author",
"value": "mike"
},
{
"name": "recipient",
"value": "dan"
},
{
"name": "pm",
"value": "0"
},
{
"name": "time",
"value": "2015-03-31T15:04:01.165060Z"
},
{
"name": "text",
"value": "first message"
}
]
}
],
"version": "1.0",
"links": []
}
}
And here is how I am attempting to extract data:
response = urllib2.urlopen('myurl')
responseData = response.read()
jsonData = json.loads(responseData)
test = jsonData['collection']['items']['data']
When I run this code I get the error:
list indices must be integers, not str
If I use an integer, e.g. 0, instead of a string it merely shows 'data' instead of any useful information, unlike if I were to simply output 'items'. Similarly, I can't seem to access the data within a data child, for example:
test = jsonData['collection']['items'][0]['name']
This will argue that there is no element called 'name'.
What is the proper method of accessing JSON data in this situation? I would also like to iterate over the collection, if that helps.
I'm aware of a package that can be used to simplify working with Collections+JSON in Python, collection-json, but I'd rather be able to do this without using such a package.
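One way to read this structure without an extra package is sketched below. It assumes the response has already been parsed with json.loads into jsonData, as in the question. 'items' is a list, each item holds a 'data' list, and each entry of that list is a name/value pair, so the name is found by looping over the pairs rather than by indexing with 'name'. The helper name `item_to_dict` is hypothetical.

```python
import json

# Trimmed copy of the response shown in the question.
response_text = """
{"collection": {
  "href": "http://localhost:8000/social/messages-api/",
  "items": [
    {"data": [
      {"name": "messageID", "value": 19},
      {"name": "author", "value": "mike"},
      {"name": "recipient", "value": "dan"},
      {"name": "text", "value": "first message"}
    ]}
  ],
  "version": "1.0",
  "links": []
}}
"""
jsonData = json.loads(response_text)


def item_to_dict(item):
    """Turn one item's list of name/value pairs into an ordinary dict."""
    return {pair["name"]: pair["value"] for pair in item["data"]}


# 'items' is a list, so iterate over it instead of indexing it with a string.
messages = [item_to_dict(item) for item in jsonData["collection"]["items"]]

for message in messages:
    print(message["author"], message["text"])
```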

Extracting values from deeply nested JSON structures

This is a structure I'm getting from elsewhere, that is, a list of deeply nested dictionaries:
{
"foo_code": 404,
"foo_rbody": {
"query": {
"info": {
"acme_no": "444444",
"road_runner": "123"
},
"error": "no_lunch",
"message": "runner problem."
}
},
"acme_no": "444444",
"road_runner": "123",
"xyzzy_code": 200,
"xyzzy_rbody": {
"api": {
"items": [
{
"desc": "OK",
"id": 198,
"acme_no": "789",
"road_runner": "123",
"params": {
"bicycle": "2wheel",
"willie": "hungry",
"height": "1",
"coyote_id": "1511111"
},
"activity": "TRAP",
"state": "active",
"status": 200,
"type": "chase"
}
]
}
}
}
{
"foo_code": 200,
"foo_rbody": {
"query": {
"result": {
"acme_no": "260060730303258",
"road_runner": "123",
"abyss": "26843545600"
}
}
},
"acme_no": "260060730303258",
"road_runner": "123",
"xyzzy_code": 200,
"xyzzy_rbody": {
"api": {
"items": [
{
"desc": "OK",
"id": 198,
"acme_no": "789",
"road_runner": "123",
"params": {
"bicycle": "2wheel",
"willie": "hungry",
"height": "1",
"coyote_id": "1511111"
},
"activity": "TRAP",
"state": "active",
"status": 200,
"type": "chase"
}
]
}
}
}
Asking for different structures is out of question (legacy apis etc).
So I'm wondering if there's some clever way of extracting selected values from such a structure.
The candidates I was thinking of:
Flatten particular dictionaries, building composite keys, something like:
{
"foo_rbody.query.info.acme_no": "444444",
"foo_rbody.query.info.road_runner": "123",
...
}
Pro: getting every value with one access and if predictable key is not there, it means that the structure was not there (as you might have noticed, dictionaries may have different structures depending on whether it was successful operation, error happened, etc).
Con: what to do with lists?
Use some recursive function that would do successive key lookups, say by "foo_rbody", then by "query", "info", etc.
Any better candidates?
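The first candidate can be sketched as follows. One possible answer to the question about lists, assumed here rather than taken from the original post, is to put the list index into the composite key, for example "items[0]". The function name `flatten` and the key format are illustrative choices.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts and lists into {composite_key: leaf_value}."""
    if isinstance(obj, dict):
        children = ((f"{prefix}.{key}" if prefix else str(key), value)
                    for key, value in obj.items())
    elif isinstance(obj, list):
        # Lists contribute their index to the key: "items[0]", "items[1]", ...
        children = ((f"{prefix}[{index}]", value)
                    for index, value in enumerate(obj))
    else:
        # A leaf value: the accumulated prefix is its composite key.
        return {prefix: obj}

    flat = {}
    for key, value in children:
        flat.update(flatten(value, key))
    return flat


# Trimmed version of the first structure from the question.
sample = {
    "foo_code": 404,
    "foo_rbody": {"query": {"info": {"acme_no": "444444", "road_runner": "123"},
                            "error": "no_lunch"}},
    "xyzzy_rbody": {"api": {"items": [{"desc": "OK", "id": 198}]}},
}

flat = flatten(sample)
print(flat["foo_rbody.query.info.acme_no"])
print(flat.get("foo_rbody.query.result.acme_no"))  # missing structure gives None
```

Note that empty dicts and empty lists produce no keys at all in this sketch, so their presence cannot be detected from the flattened result.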
You can try this rather trivial function to access nested properties:
import re
def get_path(dct, path):
    for i, p in re.findall(r'(\d+)|(\w+)', path):
        dct = dct[p or int(i)]
    return dct
Usage:
value = get_path(data, "xyzzy_rbody.api.items[0].params.bicycle")
Maybe the function byPath in my answer to this post might help you.
You could create your own path mechanism and then query the complicated dict with paths. Example:
/ : get the root object
/key: get the value of root_object['key'], e.g. /foo_code --> 404
/key/key: nesting: /foo_rbody/query/info/acme_no -> 444444
/key[i]: get ith element of that list, e.g. /xyzzy_rbody/api/items[0]/desc --> "OK"
The path can also return a dict which you then run more queries on, etc.
It would be fairly easy to implement recursively.
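A recursive implementation of such a path mechanism might look like the sketch below. The name `resolve_path` and the exact parsing rules are assumptions made for this example; the sketch supports only the forms listed above (root, keys, nesting and list indexes).

```python
import re


def resolve_path(obj, path):
    """Resolve a slash-separated path such as "/xyzzy_rbody/api/items[0]/desc"."""
    path = path.strip("/")
    if not path:
        return obj  # "/" (or an exhausted path) refers to the current object

    head, _, rest = path.partition("/")
    # Split a segment like "items[0]" into the key "items" and the indexes ["0"].
    match = re.fullmatch(r"([^\[\]]*)((?:\[\d+\])*)", head)
    if match is None:
        raise ValueError(f"malformed path segment: {head!r}")
    key, indexes = match.groups()

    if key:
        obj = obj[key]
    for index in re.findall(r"\[(\d+)\]", indexes):
        obj = obj[int(index)]
    # Recurse on whatever is left of the path.
    return resolve_path(obj, rest)


data = {
    "foo_code": 404,
    "foo_rbody": {"query": {"info": {"acme_no": "444444"}}},
    "xyzzy_rbody": {"api": {"items": [{"desc": "OK", "id": 198}]}},
}

print(resolve_path(data, "/foo_code"))
print(resolve_path(data, "/xyzzy_rbody/api/items[0]/desc"))
```

A missing key or index raises KeyError or IndexError in this sketch; callers that expect varying structures may prefer to catch those and return a default.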
I can think of two more solutions:
You can try the Pynq package, described here - structured query language for JSON (in Python). As far as I understand, it's a kind of LINQ for Python.
You may also try converting your JSON to XML and then using the XQuery language to get data from it - XQuery library under Python
