Below is an example of my data:
{
"id": "2",
"items":
{
"3" : { "blocks" : { "3" : { "txt" : 'xx' } } },
"4" : { "blocks" : { "1" : { "txt" : 'yy'}, "2" : { "txt" : 'zz'} } }
}
}
I want to make it look like the example data below. Simply append a new value to items.3.blocks.3.txt while keeping its existing value:
{
"id": "2",
"items":
{
"3" : { "blocks" : { "3" : { "txt" : 'xx, tt' } } },
"4" : { "blocks" : { "1" : { "txt" : 'yy'}, "2" : { "txt" : 'zz'} } }
}
}
I ran the update below, but it did not make any difference:
dbx.test.update({"_id": ObjectId("5192264c02a03e374e67d7be")}, {'$addToSet': {'items.3.blocks.0.txt': 'tt'}}, )
What would the correct syntax be? Any help is appreciated...
Regards
You need to use '$push' instead of '$addToSet':
dbx.test.update({"_id": ObjectId("5192264c02a03e374e67d7be")}, {'$push': {'items.3.blocks.0.txt': 'tt'}}, )
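Note that both $addToSet and $push expect the target field to be an array, while txt in the example holds a plain string, so the value 'xx, tt' can't be produced by either operator alone. A minimal pymongo sketch, assuming MongoDB 4.2+ (which accepts an aggregation pipeline as the update document), that appends to the string with $concat:
```
from bson import ObjectId
from pymongo import MongoClient

client = MongoClient()  # assumes a local mongod; adjust the URI as needed
coll = client["dbx"]["test"]

# A pipeline-style update can rebuild the string from its current value.
coll.update_one(
    {"_id": ObjectId("5192264c02a03e374e67d7be")},
    [{"$set": {
        "items.3.blocks.3.txt": {"$concat": ["$items.3.blocks.3.txt", ", tt"]}
    }}],
)
```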
I did not realize that this was a mongodb|pymongo question to start with.
a = {
"id": "2",
"items": {
"3" : { "blocks" : { "3" : { "txt" : 'xx' } } },
"4" : { "blocks" : { "1" : { "txt" : 'yy'}, "2" : { "txt" : 'zz'} } }
}
}
a['items']['3']['blocks']['3']['txt'] += ', tt'
?
Shorter answer on how to modify dicts:
a = {'moo' : 'cow', 'quack' : 'duck'}
a['moo'] = 'bull'
a['quack'] = 'duckling'
print(str(a))
if a['moo'] == 'bull':
    print('Yes the bull says moo')
If your data is a JSON string, you first need to convert it to a Python dictionary:
import json
a = json.loads(<your string goes here>)
This is simple: $addToSet only works with arrays ([ ]), but you have your data as {"key": "value"} objects, e.g. { "blocks" : { "1" : { "txt" : 'yy'}, "2" : { "txt" : 'zz'} } }. I would recommend working with arrays instead. Example:
"items":
[
{ "algo" : { "blocks" : { "txt" : ['xx']} } } ,
{ "algoo" : { "blocks" : { "txt" : ['yy','zz']} } }
]
db.foo.update({"_id":1,"items.algo.blocks.txt":'xx'},{$addToSet:{"items.$.algo.blocks.txt":'tt'}});
Also, don't use numbers as field names. After the update, the document looks like this:
{
"_id" : 1,
"items" : [
{
"algo" : {
"blocks" : {
"txt" : [
"xx",
"tt"
]
}
}
},
{
"algoo" : {
"blocks" : {
"txt" : [
"yy",
"zz"
]
}
}
}
]
}
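The same update from pymongo, assuming coll is a handle to the foo collection:
```
from pymongo import MongoClient

coll = MongoClient()["mydb"]["foo"]  # database name is hypothetical

# $ refers to the array element matched by the query condition,
# so 'tt' is added to the txt array of the matching item only.
coll.update_one(
    {"_id": 1, "items.algo.blocks.txt": "xx"},
    {"$addToSet": {"items.$.algo.blocks.txt": "tt"}},
)
```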
I have an explicit JSON input that I want to transform into metadata-driven generic objects within an array. I have successfully done this on an individual basis; however, I want to be able to do it using a configuration file instead.
What I have below is an example of the input data, the configuration I want to apply and the output data.
Since it is outputting into a generic schema, I want the output to always be a string, no matter what the input value's data type is.
In addition, the origin data may not always exist in the origin payload. When I built an individual transform, I used try/except, which worked really well. When driving it from a configuration file, I expect the method is the same: loop through the configuration, create whatever it can, and otherwise skip to the next entry until completed (see the sketch after the current output below).
INPUT ORIGIN DATA
{
"activities_acceptance" : {
"contractors_sub_contractors" : {
"contractors_subcontractors_engaged" : "yes"
},
"cooking_deep_frying" : {
"deep_frying_engaged" : "yes",
"deep_fryer_vat_limit" : 10
}
},
"situation_acceptance" : {
"building_construction" : {
"wall_materials" : "CONCRETE"
}
}
}
CONFIGURATION PARAMETERS
{
"processiong_configuration" : [
{
"origin_path" : "activities_acceptance.contractors_sub_contractors",
"set_category" : "business-activity",
"set_type" : "contractors-subcontractors",
"set_value" : [
{
"use_value" : "activities_acceptance.contractors_sub_contractors.contractors_subcontractors_engaged",
"set_value" : "value"
}
]
},
{
"origin_path" : "activities_acceptance.cooking_deep_frying",
"set_category" : "business-activity",
"set_type" : "cooking-deep-frying",
"set_value" : [
{
"use_value" : "activities_acceptance.cooking_deep_frying.deep_frying_engaged",
"set_value" : "value"
},
{
"use_value" : "activities_acceptance.cooking_deep_frying.deep_fryer_vat_limit",
"set_value" : "details"
}
]
},
{
"origin_path" : "situation_acceptance.building_construction",
"set_category" : "situation-materials",
"set_type" : "wall-materials",
"set_value" : [
{
"use_value" : "situation_acceptance.building_construction.wall_materials",
"set_value" : "CONCRETE"
}
]
}
]
}
EXPECTED OUTPUT
{
"characteristics" : [
{
"category" : "business-activity",
"type" : "contractors-subcontractors",
"value" : "yes"
},
{
"category" : "business-activity",
"type" : "deep-frying",
"value" : "yes",
"details" : "10"
},
{
"category" : "situation-materials",
"type" : "wall-materials",
"value" : "CONCRETE"
}
]
}
What I currently have for a single transform without configuration is the following:
# Create Business Characteristics
business_characteristics = {
"characteristics" : []
}
# Create Characteristics - Business - Liability
# if liability section exists logic to go in here
try:
    acc_liability = {
        "category" : "business-activities",
        "type" : "contractors-sub-contractors-engaged",
        "description" : "",
        "value" : "",
        "details" : ""
    }
    acc_liability['value'] = d['line_of_businesses'][0]['assets']['commercial_operations'][0]['liability_asset']['acceptance']['contractors_and_subcontractors']['contractors_and_subcontractors_engaged']
    acc_liability['details'] = d['line_of_businesses'][0]['assets']['commercial_operations'][0]['liability_asset']['acceptance']['contractors_and_subcontractors']['types_of_work_contractors_performed']
    business_characteristics['characteristics'].append(acc_liability)
except (KeyError, IndexError):
    # the origin section is missing from the payload; skip this characteristic
    acc_liability = {}
CURRENT OUTPUT in Jupyter
{
"characteristics": [
{
"category": "business-activities",
"type": "contractors-sub-scontractors-engaged",
"description": "",
"value": "YES",
"details": ""
}
]
}
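A minimal sketch of the configuration-driven loop described above, assuming a hypothetical resolve() helper for dotted origin paths, and treating each mapping's set_value as the name of the output field, as in the first two rules (the third rule's "CONCRETE" looks like it was meant to be "value"):
```
def resolve(payload, dotted_path):
    # Walk a dotted path such as 'a.b.c' through nested dicts;
    # raises KeyError when a segment is missing.
    value = payload
    for part in dotted_path.split("."):
        value = value[part]
    return value

def transform(payload, configuration):
    characteristics = []
    # key spelled as in the configuration above
    for rule in configuration["processiong_configuration"]:
        characteristic = {
            "category": rule["set_category"],
            "type": rule["set_type"],
        }
        found = False
        for mapping in rule["set_value"]:
            try:
                # Coerce to str so the generic schema always holds strings.
                characteristic[mapping["set_value"]] = str(
                    resolve(payload, mapping["use_value"])
                )
                found = True
            except KeyError:
                continue  # origin field absent; skip just this mapping
        if found:
            characteristics.append(characteristic)
    return {"characteristics": characteristics}
```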
My English is poor, sorry.
I have a piece of JSON data with the structure shown below, and I want to use Python to aggregate it.
If the service.name is the same, the records need to be grouped together, and duplicate "url.path" values need to be removed.
I don't know what data structure to use to store the result: a list? A dict?
Can anyone help please? Thanks.
```
[
{
"_source" : {
"error" : {
"exception" : [
{
"handled" : false,
"type" : "Abp.UI.UserFriendlyException",
"message" : "未发现该用户的WeiXinUserRelation,粉丝编号447519"
}
]
},
"trace" : {
"id" : "a3e3796ca145b448829d0d0f96661e67"
},
"#timestamp" : "2021-06-21T06:57:52.603Z",
"service" : {
"name" : "Lonsid_AAA_Web_Host"
}, "url" : {
"path" : "/product/getAAA" }
}
},
{
"_source" : {
"error" : {
"exception" : [
{
"handled" : false,
"type" : "Abp.UI.UserFriendlyException",
"message" : "未发现该用户的WeiXinUserRelation,粉丝编号447519"
}
]
},
"trace" : {
"id" : "a3e3796ca145b448829d0d0f96661e67"
},
"#timestamp" : "2021-06-21T06:57:52.603Z",
"service" : {
"name" : "Lonsid_BBB_Web_Host"
}, "url" : {
"path" : "/product/getBBB" }
}
},
{
"_source" : {
"error" : {
"exception" : [
{
"handled" : false,
"type" : "Abp.UI.UserFriendlyException",
"message" : "未发现该用户的WeiXinUserRelation,粉丝编号447519"
}
]
},
"trace" : {
"id" : "a3e3796ca145b448829d0d0f96661e67"
},
"#timestamp" : "2021-06-21T06:57:52.603Z",
"service" : {
"name" : "Lonsid_AAA_Web_Host"
}, "url" : {
"path" : "/product/getAAA" }
}
}
]
```
This should get you started. It builds a cache of all previously seen service names and drops any record whose service name was already seen.
import pprint
import json
json_data = """[
{
"_source" : {
"error" : {
"exception" : [
{
"handled" : false,
"type" : "Abp.UI.UserFriendlyException",
"message" : "未发现该用户的WeiXinUserRelation,粉丝编号447519"
}
]
},
"trace" : {
"id" : "a3e3796ca145b448829d0d0f96661e67"
},
"#timestamp" : "2021-06-21T06:57:52.603Z",
"service" : {
"name" : "Lonsid_AAA_Web_Host"
}, "url" : {
"path" : "/product/getAAA" }
}
},
{
"_source" : {
"error" : {
"exception" : [
{
"handled" : false,
"type" : "Abp.UI.UserFriendlyException",
"message" : "未发现该用户的WeiXinUserRelation,粉丝编号447519"
}
]
},
"trace" : {
"id" : "a3e3796ca145b448829d0d0f96661e67"
},
"#timestamp" : "2021-06-21T06:57:52.603Z",
"service" : {
"name" : "Lonsid_BBB_Web_Host"
}, "url" : {
"path" : "/product/getBBB" }
}
},
{
"_source" : {
"error" : {
"exception" : [
{
"handled" : false,
"type" : "Abp.UI.UserFriendlyException",
"message" : "未发现该用户的WeiXinUserRelation,粉丝编号447519"
}
]
},
"trace" : {
"id" : "a3e3796ca145b448829d0d0f96661e67"
},
"#timestamp" : "2021-06-21T06:57:52.603Z",
"service" : {
"name" : "Lonsid_AAA_Web_Host"
}, "url" : {
"path" : "/product/getAAA" }
}
}
]"""
data = json.loads(json_data)
cache = {}
for item in data:
    # remember the first record seen for each service name; drop repeats
    name = item['_source']['service']['name']
    if name not in cache:
        cache[name] = item

pprint.pprint(list(cache.values()))
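If you also need to de-duplicate url.path per service, rather than keep only the first whole record, here is a small variant (reusing data and pprint from above) that collects the distinct paths per service name:
```
from collections import defaultdict

# One set of url paths per service.name; sets drop duplicates for free.
paths_by_service = defaultdict(set)
for item in data:
    source = item['_source']
    paths_by_service[source['service']['name']].add(source['url']['path'])

pprint.pprint({name: sorted(paths) for name, paths in paths_by_service.items()})
```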
I'm looking to update one field inside an array nested in another array:
db.germain.updateOne({}, {$set: { "items.$[elem].sub_items.price" : 2}}, {arrayFilters: [ { "elem.sub_item_name": "my_item_two_one" } ] } )
The filter finds a document, but nothing gets updated. The document looks like this:
{
"_id" : ObjectId("4faaba123412d654fe83hg876"),
"user_id" : 123456,
"items" : [
{
"item_name" : "my_item_one",
"sub_items" : [
{
"sub_item_name" : "my_item_one_one",
"price" : 20
}
]
},
{
"item_name" : "my_item_two",
"sub_items" : [
{
"sub_item_name" : "my_item_two_one",
"price" : 30
},
{
"sub_item_name" : "my_item_two_two",
"price" : 50
}
]
}
]
}
It's actually a nested array, so you need to specify which object in the parent array should be changed, and which object inside that parent object's array should be changed:
db.collection.update({},
{
$set: {
"items.$[parent].sub_items.$[child].price": 2
}
},
{
arrayFilters: [
{ "parent.item_name": "my_item_two" },
{ "child.sub_item_name": "my_item_two_one" }
]
})
If you need to change the whole object in the parent array, you can simply use the $ positional operator.
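For example, a hedged pymongo sketch with the positional $ operator; the replacement document here is made up for illustration:
```
from pymongo import MongoClient

coll = MongoClient()["mydb"]["germain"]  # database name is hypothetical

# $ is the first element of "items" matched by the query,
# so this replaces that whole object.
coll.update_one(
    {"items.item_name": "my_item_two"},
    {"$set": {"items.$": {"item_name": "my_item_two", "sub_items": []}}},
)
```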
I need to get all objects inside "posts" that have "published": true, using pymongo. I've already tried so many variants, but all I can do is:
for elt in db[collection].find({}, {"posts"}):
    print(elt)
And it'll show all "posts". I've tried something like this:
for elt in db[collection].find({}, {"posts", {"published": {"$eq": True}}}):
    print(elt)
But it doesn't work. Please help; I've been trying for 3 days already =\
What you want is to use the aggregation $filter operator, like so:
db[collection].aggregate([
{
"$match": { // only fetch documents with such posts
"posts.published": {"$eq": True}
}
},
{
"$project": {
"posts": {
"$filter": {
"input": "$posts",
"as": "post",
"cond": {"$eq": ["$$post.published", True]}
}
}
}
}
])
Note that the structure currently returned will be:
[
{posts: [post1, post2]},
{posts: [post3, post4]}
]
If you want to retrieve it as a flat list of posts, you'll need to add an $unwind stage to flatten the array; see the sketch below.
The find() projection options are quite limited: you can use $elemMatch (projection) or the $ operator, but both return only the first post matching the condition, which is not what you want.
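For example, a sketch of the flattened variant: $unwind emits one document per post, and a $replaceRoot stage (my choice here, not from the question) promotes each post to the top level:
```
pipeline = [
    {"$match": {"posts.published": True}},
    {"$project": {"posts": {"$filter": {
        "input": "$posts",
        "as": "post",
        "cond": {"$eq": ["$$post.published", True]},
    }}}},
    {"$unwind": "$posts"},                    # one document per post
    {"$replaceRoot": {"newRoot": "$posts"}},  # drop the wrapper document
]
for post in db[collection].aggregate(pipeline):
    print(post)
```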
------- EDIT --------
Realizing that posts is actually an object and not an array, you'll have to turn the object into an array, filter it, and then restore the original structure, like so:
db.collection.aggregate([
{
$project: {
"posts": {
"$arrayToObject": {
$filter: {
input: {
"$objectToArray": "$posts"
},
as: "post",
cond: {
$eq: [
"$$post.v.published",
true
]
}
}
}
}
}
}
])
Mongo Playground
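The same pipeline in pymongo, where all keys must be quoted and true becomes True:
```
results = db[collection].aggregate([
    {"$project": {"posts": {"$arrayToObject": {"$filter": {
        "input": {"$objectToArray": "$posts"},
        "as": "post",
        "cond": {"$eq": ["$$post.v.published", True]},
    }}}}}
])
for doc in results:
    print(doc)
```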
I assumed that your document looks like this:
{
"_id" : ObjectId("5f8570f8afdefd2cfe7473a7"),
"posts" : {
"a" : {
"p" : false,
"name" : "abhishek"
},
"k" : {
"p" : true,
"name" : "jack"
},
"c" : {
"p" : true,
"name" : "abhinav"
}
    }
}
You can try the following query, but the result format will be a bit different; adding it here for clarification:
db.getCollection('temp2').aggregate([
{
$project: {
subPost: { $objectToArray: "$posts" }
}
},
{
'$unwind' : '$subPost'
},
{
'$match' : {'subPost.v.p':true}
},
{
'$group': {_id:'$_id', subPosts: { $push: { subPost: "$subPost"} }}
}
])
The result format:
{
"_id" : ObjectId("5f8570f8afdefd2cfe7473a7"),
"subPosts" : [
{
"subPost" : {
"k" : "k",
"v" : {
"p" : true,
"name" : "jack"
}
}
},
{
"subPost" : {
"k" : "c",
"v" : {
"p" : true,
"name" : "abhinav"
}
}
}
]
}
I'm using imply-2.2.3. Here is my tranquility server configuration:
{
"dataSources" : [
{
"spec" : {
"dataSchema" : {
"dataSource" : "tutorial-tranquility-server",
"parser" : {
"type" : "string",
"parseSpec" : {
"timestampSpec" : {
"column" : "timestamp",
"format" : "auto"
},
"dimensionsSpec" : {
"dimensions" : [],
"dimensionExclusions" : [
"timestamp",
"value"
]
},
"format" : "json"
}
},
"granularitySpec" : {
"type" : "uniform",
"segmentGranularity" : "hour",
"queryGranularity" : "none"
},
"metricsSpec" : [
{
"type" : "count",
"name" : "count"
},
{
"name" : "value_sum",
"type" : "doubleSum",
"fieldName" : "value"
},
{
"fieldName" : "value",
"name" : "value_min",
"type" : "doubleMin"
},
{
"type" : "doubleMax",
"name" : "value_max",
"fieldName" : "value"
}
]
},
"ioConfig" : {
"type" : "realtime"
},
"tuningConfig" : {
"type" : "realtime",
"maxRowsInMemory" : "50000",
"windowPeriod" : "PT10M"
}
},
"properties" : {
"task.partitions" : "1",
"task.replicants" : "1"
}
},
{
"spec": {
"dataSchema" : {
"dataSource" : "test",
"parser" : {
"type" : "string",
"parseSpec" : {
"timestampSpec" : {
"column" : "timestamp",
"format" : "auto"
},
"dimensionsSpec" : {
"dimensions" : [
"a"
]
},
"format" : "json"
}
},
"granularitySpec" : {
"type" : "uniform",
"segmentGranularity" : "hour",
"queryGranularity" : "none"
},
"metricsSpec" : [
{
"type" : "count",
"name" : "count"
},
{
"type": "doubleSum",
"name": "b",
"fieldName": "b"
}
]
},
"ioConfig" : {
"type" : "realtime"
},
"tuningConfig" : {
"type" : "realtime",
"maxRowsInMemory" : "50000",
"windowPeriod" : "P1Y"
}
},
"properties": {
"task.partitions" : "1",
"task.replicants" : "1"
}
}
],
"properties" : {
"zookeeper.connect" : "localhost",
"druid.discovery.curator.path" : "/druid/discovery",
"druid.selectors.indexing.serviceName" : "druid/overlord",
"http.port" : "8200",
"http.threads" : "40",
"serialization.format" : "smile",
"druidBeam.taskLocator": "overlord"
}
}
I have trouble sending data to the second datasource, test, specifically. I tried to send the below data to Druid with Python requests:
{'b': 7, 'timestamp': '2017-01-20T03:32:54.586415', 'a': 't'}
The response I receive:
b'{"result":{"received":1,"sent":0}}'
If you read my config file, you will notice that I set the window period to one year. I would like to send data spanning a large time range to Druid using the Tranquility server. Is there something wrong with my config or data?
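For reference, a hedged reproduction of the send: Tranquility Server accepts events on /v1/post/<dataSource>, and a "sent": 0 response usually means the event was dropped, most often because its timestamp fell outside windowPeriod relative to the current time:
```
import json
from datetime import datetime, timezone

import requests

# http.port is 8200 in the config above; "test" is the second dataSource.
event = {"b": 7, "timestamp": datetime.now(timezone.utc).isoformat(), "a": "t"}
resp = requests.post(
    "http://localhost:8200/v1/post/test",
    data=json.dumps(event),
    headers={"Content-Type": "application/json"},
)
print(resp.content)  # expect {"result":{"received":1,"sent":1}} when accepted
```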