I'm trying to create a join query and exclude the _id field from my result:
stage_lookup_comments = {
"$lookup": {
"from": "products",
"localField": "product_codename",
"foreignField": "codename",
"as": "product",
}
}
pipeline = [
{ "$match": {
"category":category,
"archived_at":{"$eq": None}
}
},
stage_lookup_comments
]
array = await db[collection].aggregate(pipeline).to_list(CURSOR_LIMIT)
return array
I don't know the syntax for adding the "_id": 0 parameter to my query.
You should be able to use a MongoDB $project stage in your pipeline to select only the fields you want to return. In this particular case, you can exclude the _id field with "_id": 0, as you already mentioned.
See the MongoDB documentation on $project for more details.
I didn't test it, but your query should be something similar to the following:
stage_lookup_comments = {
"$lookup": {
"from": "products",
"localField": "product_codename",
"foreignField": "codename",
"as": "product",
}
}
pipeline = [
{
"$match": {
"category":category,
"archived_at":{"$eq": None}
}
},
stage_lookup_comments,
{
"$project": { "_id": 0 }
}
]
array = await db[collection].aggregate(pipeline).to_list(CURSOR_LIMIT)
return array
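For reference, here is the same pipeline built as plain Python data, so its shape can be checked without a database. It is a sketch: build_pipeline is a hypothetical helper name, and the extra "product._id": 0 assumes you also want to hide the _id of the documents embedded by $lookup, which a top-level "_id": 0 does not remove.

```python
def build_pipeline(category, hide_embedded_ids=True):
    """Return a $match -> $lookup -> $project pipeline (hypothetical helper)."""
    projection = {"_id": 0}  # drop the top-level _id
    if hide_embedded_ids:
        # Dotted path: removes _id from every document in the "product" array
        projection["product._id"] = 0
    return [
        {"$match": {"category": category, "archived_at": {"$eq": None}}},
        {
            "$lookup": {
                "from": "products",
                "localField": "product_codename",
                "foreignField": "codename",
                "as": "product",
            }
        },
        {"$project": projection},
    ]
```

You would then pass build_pipeline(category) to the same await db[collection].aggregate(...).to_list(CURSOR_LIMIT) call as in the question.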
EDIT:
Also, starting in MongoDB 4.2, you can use the $unset stage to explicitly remove a field from the documents (see the $unset documentation):
{ "$unset": ["_id"] }
You can read more about this in a very similar question on Stack Overflow.
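The MongoDB documentation describes $unset as an alias for a $project stage that excludes fields, so the two forms are interchangeable. The hypothetical helper below just spells that equivalence out in Python; it is an illustration, not something pymongo provides.

```python
def unset_as_project(fields):
    """Translate a $unset stage's field list into the equivalent $project exclusion."""
    if isinstance(fields, str):  # $unset also accepts a single field name
        fields = [fields]
    return {"$project": {name: 0 for name in fields}}


unset_stage = {"$unset": ["_id"]}
project_stage = unset_as_project(unset_stage["$unset"])
```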
I hope this works!
Related
I'm using Flask with the Jinja2 template engine and MongoDB via pymongo. These are my documents from two collections (phone and factory):
phone = db.get_collection("phone")
{
"_id": ObjectId("63d8d39206c9f93e68d27206"),
"brand": "Apple",
"model": "iPhone XR",
"year": NumberInt("2016"),
"image": "https://apple-mania.com.ua/media/catalog/product/cache/e026f651b05122a6916299262b60c47d/a/p/apple-iphone-xr-yellow_1.png",
"CPU": {
"manufacturer": "A12 Bionic",
"cores": NumberInt("10")
},
"misc": [
"Bluetooth 5.0",
"NFC",
"GPS"
],
"factory_id": ObjectId("63d8d42b7a4d7a7e825ef956")
}
factory = db.get_collection("factory")
{
"_id": ObjectId("63d8d42b7a4d7a7e825ef956"),
"name": "Foxconn",
"stock": NumberInt("1000")
}
To retrieve the data in my Python code, I do:
models = list(
phone.find({"brand": brand}, projection={"model": True, "image": True, "factory_id": True})
)
How can I retrieve the related factory document by factory_id and have it embedded in each document of the models list?
I think you are looking for a query that uses the $lookup aggregation stage.
This query does three things:
First, it uses $match to filter by your desired brand.
Then it does a "join" between the collections based on factory_id and stores the result in an array called "factories". The $lookup output is always an array, because there can be more than one match.
Finally, it uses $project to keep only the values you want. In this case, since _id is unique, you can get the factory with $arrayElemAt at position 0.
So the code could look like this (I'm not a Python expert):
models = list(
phone.aggregate([
{
"$match": {
"brand": brand
}
},
{
"$lookup": {
"from": "factory",
"localField": "factory_id",
"foreignField": "_id",
"as": "factories"
}
},
{
"$project": {
"model": True,
"image": True,
"factory": {
"$arrayElemAt": [
"$factories",
0
]
}
}
}
])
)
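To see what each stage contributes, here is a plain-Python model of the same three stages working on dicts. It only illustrates the semantics (string ids stand in for ObjectId, and the function name is made up); the real work should stay in the aggregation pipeline.

```python
def join_first_factory(phones, factories, brand):
    """In-memory sketch of $match -> $lookup -> $project with $arrayElemAt."""
    # Index factories by _id so each phone can find its matches
    factories_by_id = {}
    for factory in factories:
        factories_by_id.setdefault(factory["_id"], []).append(factory)

    models = []
    for phone in phones:
        if phone.get("brand") != brand:  # $match
            continue
        # $lookup: the matches always come back as a list
        matches = factories_by_id.get(phone.get("factory_id"), [])
        doc = {"_id": phone["_id"]}  # $project keeps _id unless told otherwise
        for key in ("model", "image"):
            if key in phone:  # missing fields are simply not projected
                doc[key] = phone[key]
        if matches:  # $arrayElemAt [..., 0]; no "factory" field when nothing matched
            doc["factory"] = matches[0]
        models.append(doc)
    return models
```

Note the last branch: when a phone has no matching factory, $arrayElemAt on an empty array yields a missing value, so the projected document has no factory field at all.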
Using Python, I'm trying to go row-by-row through an Elasticsearch index with 12 billion documents and add a field to each document. The field is named direction and will contain "e" for some values of the field src and "e" for others. For this particular _id, the field should contain an "e".
from elasticsearch import Elasticsearch
es = Elasticsearch(["https://myESserver:9200"],
http_auth=('myUsername', 'myPassword'))
query_to_add_direction_field = {
"script": {
"inline": "direction=\"e\"",
"lang": "painless"
},
"query": {"constant_score": {
"filter": {"bool": {"must": [{"match": {"_id": "YKReAoQBk7dLIXMBhYBF"}}]}}}}
}
results = es.update_by_query(index="myIndex-*", body=query_to_add_direction_field)
I'm getting this error:
elasticsearch.BadRequestError: BadRequestError(400, 'script_exception', 'compile error')
I'm new to Elasticsearch. How can I correct my query so that it does not throw an error?
UPDATE:
I updated the code like this:
query_find_id = {
"size": "1",
"query": {
"bool": {
"filter": {
"term": {
"_id": "YKReAoQBk7dLIXMBhYBF"
}
}
}
}
}
query_to_add_direction_field = {
"script": {
"source": "ctx._source['egress'] = true",
"lang": "painless"
},
"query": {
"bool": {
"filter": {
"term": {
"_id": "YKReAoQBk7dLIXMBhYBF"
}
}
}
}
}
results = es.search(index="traffic-*", body=query_find_id)
results = es.update_by_query(index="traffic-*", body=query_to_add_direction_field)
results_after_update = es.search(index="traffic-*", body=query_find_id)
The code now runs without errors... I think I may have fixed it.
I say "I think" because if I run the same code again, I get a version_conflict_engine_exception error on the call to update_by_query. I suspect that just means the big 12-billion-document index is still being updated with the change I made. Does that sound plausible?
Please try the following query:
{
"script": {
"source": "ctx._source.direction = 'e'",
"lang": "painless"
},
"query": {
"constant_score": {
"filter": {
"bool": {
"must": [
{
"match": {
"_id": "YKReAoQBk7dLIXMBhYBF"
}
}
]
}
}
}
}
}
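If the value has to vary (the question says it depends on src), one common approach is to pass it through script params instead of hard-coding it in the Painless source, so the script text stays the same for every value. The sketch below only builds the request body; the helper name is hypothetical, and it uses an ids query in place of the constant_score/match combination above to target a single _id.

```python
def direction_update_body(doc_id, direction):
    """Build an _update_by_query body that sets ctx._source.direction."""
    return {
        "script": {
            # Painless: write into the document source via ctx._source
            "source": "ctx._source.direction = params.direction",
            "lang": "painless",
            "params": {"direction": direction},
        },
        # ids query: match documents by their _id
        "query": {"ids": {"values": [doc_id]}},
    }


body = direction_update_body("YKReAoQBk7dLIXMBhYBF", "e")
# results = es.update_by_query(index="myIndex-*", body=body)
```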
The version_conflict_engine_exception happens when a document's version is not the one the update_by_query operation expects, for example because another process updated that document at the same time.
You can add conflicts=proceed (/_update_by_query?conflicts=proceed) so that the operation counts the conflicts and keeps going instead of aborting.
Read more about conflicts here:
https://www.elastic.co/guide/en/elasticsearch/reference/8.5/docs-update-by-query.html#docs-update-by-query-api-desc
If you think the conflict is temporary, note that retry_on_conflict, which retries after a conflict, is a parameter of the single-document Update API (_update) rather than of _update_by_query:
retry_on_conflict
(Optional, integer) Specify how many times should the operation be retried when a conflict occurs. Default: 0.
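In the Python client, conflicts is a keyword argument of update_by_query. The sketch below (hypothetical helper name) passes conflicts="proceed" and reads the updated and version_conflicts counters from the response, so you can see how many documents were skipped instead of updated and decide whether to run the request again.

```python
def update_skipping_conflicts(es, index, body):
    """Run _update_by_query without aborting on version conflicts.

    `es` is assumed to be an elasticsearch.Elasticsearch client (or any object
    with the same update_by_query method). Returns (updated, version_conflicts).
    """
    response = es.update_by_query(index=index, body=body, conflicts="proceed")
    # With conflicts="proceed", conflicting documents are counted, not raised
    return response["updated"], response["version_conflicts"]
```

Usage, with the body from the question: updated, conflicts = update_skipping_conflicts(es, "traffic-*", query_to_add_direction_field).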
Documents store in mongo db in following form
{
"_id" : ObjectId("54fa059ce4b01b3e086c83e9"),
"field1" : "value1",
"field2" : "value2",
"field3" : [
{
"abc123": ["somevalue", "somevalue"]
},
{
"xyz345": ["somevalue", "somevalue"]
}
]
}
Whenever I pass abc123 in a pymongo query, I want the result in the following form:
{
"abc123": ["somevalue", "somevalue"]
}
or
["somevalue", "somevalue"]
Please suggest a mongo query for it. Thanks
Maybe something like this:
db.collection.aggregate([
{
$project: {
field3: {
"$filter": {
"input": "$field3",
"as": "f",
"cond": {
$ne: [
"$$f.abc123",
undefined
]
}
}
}
}
},
{
$unwind: "$field3"
},
{
"$replaceRoot": {
"newRoot": "$field3"
}
}
])
Explained:
Use the aggregation framework with the following three stages:
$project with $filter to keep only the field3 elements that contain abc123
$unwind the field3 array
$replaceRoot to replace the root document with the content of field3
playground
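The shell version relies on undefined, which has no Python equivalent. A sketch of the same stages for pymongo, assuming the key arrives as a Python string, could compare $type against "missing" instead. The helper name is hypothetical, and the leading $match is an optional addition that only skips documents which cannot match.

```python
def field3_pipeline(key):
    """Return the field3 element(s) that contain `key`, promoted to the root."""
    return [
        # optional: only scan documents that have the key somewhere in field3
        {"$match": {f"field3.{key}": {"$exists": True}}},
        {
            "$project": {
                "field3": {
                    "$filter": {
                        "input": "$field3",
                        "as": "f",
                        # keep elements where f.<key> exists
                        "cond": {"$ne": [{"$type": f"$$f.{key}"}, "missing"]},
                    }
                }
            }
        },
        {"$unwind": "$field3"},
        {"$replaceRoot": {"newRoot": "$field3"}},
    ]
```

Calling list(db.collection.aggregate(field3_pipeline("abc123"))) should give documents of the form {"abc123": [...]}; if you only want the arrays, take doc["abc123"] from each result.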
I have a sample collection of documents in MongoDB like the one below:
[{"name":"hans","age":30,"test":"pass","pre":"no","calc":"no"},
{"name":"abs","age":20,"test":"not_pass","pre":"yes","calc":"no"},
{"name":"cdf","age":40,"test":"pass"},
{"name":"cvf","age":30,"test":"not_pass","pre":"no","calc":"yes"},
{"name":"cdf","age":23,"test":"pass"},
{"name":"asd","age":35,"test":"not_pass"}]
Some documents don't have the pre and calc fields. I want to add both fields to those documents, with the value "null" ("pre":"null", "calc":"null").
The final documents should look like this:
[{"name":"hans","age":30,"test":"pass","pre":"no","calc":"no"},
{"name":"abs","age":20,"test":"not_pass","pre":"yes","calc":"no"},
{"name":"cdf","age":40,"test":"pass","pre":"null","calc":"null"},
{"name":"cvf","age":30,"test":"not_pass","pre":"no","calc":"yes"},
{"name":"cdf","age":23,"test":"pass","pre":"null","calc":"null"},
{"name":"asd","age":35,"test":"not_pass","pre":"null","calc":"null"}]
I tried this, but it didn't work:
db.users.update({}, { "$set" : { "pre":"null","calc":"null" }}, false,true)
I think you need an update with an aggregation pipeline (MongoDB 4.2+), combined with the $ifNull operator, so that existing values are kept and only missing (or null) fields get "null":
db.users.update({},
[
{
"$set": {
"pre": {
$ifNull: [
"$pre",
"null"
]
},
"calc": {
$ifNull: [
"$calc",
"null"
]
}
}
}
],
false,
true
)
Sample Mongo Playground
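From pymongo (3.9 or later, which accepts a list as the update argument), the same update could be written with update_many. The sketch below builds that pipeline and also mirrors the $ifNull rule in plain Python so the effect on the sample data can be checked; both helper names are hypothetical.

```python
def fill_missing_pipeline(fields, default="null"):
    """Pipeline-style update: keep each field, or set `default` if missing/null."""
    return [{"$set": {name: {"$ifNull": [f"${name}", default]} for name in fields}}]


def apply_if_null(doc, fields, default="null"):
    """Plain-Python mirror of the $ifNull rule, for checking the expected result."""
    out = dict(doc)
    for name in fields:
        if out.get(name) is None:  # $ifNull treats missing and null alike
            out[name] = default
    return out


# db.users.update_many({}, fill_missing_pipeline(["pre", "calc"]))
```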
The easiest option is to run this query once for each missing field, for example for pre (note that this sets a real null rather than the string "null"):
db.collection.update({
pre: {
$exists: false
}
},
{
"$set": {
"pre": null
}
},
{
multi: true
})
Playground
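The same idea in pymongo is one update_many call per field; the sketch below (hypothetical helper name) generates the filter/update pairs. Like the shell query, it writes a real None (stored as null) by default; pass "null" instead if you want the string.

```python
def missing_field_updates(fields, value=None):
    """Yield (filter, update) pairs that set `value` only where a field is absent."""
    for name in fields:
        yield {name: {"$exists": False}}, {"$set": {name: value}}


# for flt, upd in missing_field_updates(["pre", "calc"]):
#     db.users.update_many(flt, upd)
```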
I have two collections: document and citation. Their structures are shown below:
# document
{id:001, title:'foo'}
{id:002, title:'bar'}
{id:003, title:'abc'}
# citation
{from_id:001, to_id:002}
{from_id:001, to_id:003}
For each document, I want to query the information of the documents it cites (its references, identified by to_id). In SQL, I would left join the document table with citation, and then left join document again to get the full information of the references (not just their ids).
However, I can only achieve the first step with $lookup in MongoDB. Here is my aggregate pipeline:
[
{'$lookup':{
'from': 'citation',
'localField': 'id',
'foreignField': 'from_id',
'as': 'references'
}}
]
I am able to get the following results with this pipeline:
{
id:001,
title:'foo',
references:[{from_id:001, to_id:002}, {from_id:001, to_id:003}]
}
The desired result is:
{
id:001,
title:'foo',
references:[{id:002, title:'bar'}, {id:003, title:'abc'}]
}
I have found this answer but it seems to be a one-to-one relationship that is not applicable in my case.
EDIT: Some people said that joins should be avoided in MongoDB because it's not a relational database. I chose MongoDB because it's much faster than MySQL in my case.
You need to $unwind the references, run a second $lookup on the document collection, and then $group by _id to get the desired result.
Try the following:
[
{
"$lookup": {
"from": "citation",
"localField": "id",
"foreignField": "from_id",
"as": "references"
}
},
{
"$unwind": "$references"
},
{
"$lookup": {
"from": "document",
"localField": "references.to_id",
"foreignField": "id",
"as": "map"
}
},
{
"$unwind": "$map"
},
{
"$project": {
"_id": 1,
"id": 1,
"title": 1,
"map_id": "$map.id",
"map_title": "$map.title"
}
},
{
"$group": {
"_id": "$_id",
"id": {
"$first": "$id"
},
"title": {
"$first": "$title"
},
"references": {
"$push": {
"id": "$map_id",
"title": "$map_title"
}
}
}
}
]
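To check what this pipeline produces, here is a plain-Python model of the same steps on the question's sample data, matching on the id field (string ids, hypothetical function name). Like the pipeline, it drops documents that cite nothing, because $unwind without preserveNullAndEmptyArrays removes documents whose array is empty; that differs from an SQL left join.

```python
def documents_with_references(documents, citations):
    """Model of $lookup -> $unwind -> $lookup -> $unwind -> $group."""
    docs_by_id = {d["id"]: d for d in documents}  # assumes id is unique
    result = {}
    for doc in documents:
        # first $lookup + $unwind: one row per citation of this document
        for cite in (c for c in citations if c["from_id"] == doc["id"]):
            target = docs_by_id.get(cite["to_id"])
            if target is None:  # second $unwind drops rows with no match
                continue
            # $group: collect the references per source document
            entry = result.setdefault(
                doc["id"], {"id": doc["id"], "title": doc["title"], "references": []}
            )
            entry["references"].append({"id": target["id"], "title": target["title"]})
    return list(result.values())
```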