How to join multiple collections in MongoDB (one to many relationship)?

I have two collections: document and citation. Their structures are shown below:
# document
{id:001, title:'foo'}
{id:002, title:'bar'}
{id:003, title:'abc'}
# citation
{from_id:001, to_id:002}
{from_id:001, to_id:003}
I want to retrieve, for each document, the full information of the documents it cites (its references, identified by to_id). In SQL, I would left join the document table with citation, and then left join the result with document again to get the full information of the references (not just their ids).
However, I can only achieve the first step with $lookup in MongoDB. Here is my aggregate pipeline:
[
{'$lookup':{
'from': 'citation',
'localField': 'id',
'foreignField': 'from_id',
'as': 'references'
}}
]
I am able to get the following results with this pipeline:
{
id:001,
title:'foo',
references:[{from_id:001, to_id:002}, {from_id:001, to_id:003}]
}
The desired result is:
{
id:001,
title:'foo',
references:[{id:002, title:'bar'}, {id:003, title:'abc'}]
}
I have found this answer, but it seems to cover a one-to-one relationship, which does not apply to my case.
EDIT: Some people said that joins should be avoided in MongoDB because it is not a relational database. I chose MongoDB because it is much faster than MySQL in my case.

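You need to $unwind the output of the first $lookup, run a second $lookup against the document collection, and then $group by _id to rebuild the references array.
Try the pipeline below. It assumes the document ids are stored in _id and that the document collection is named doc, so adjust those names to match your schema: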
[
{
"$lookup": {
"from": "citation",
"localField": "_id",
"foreignField": "from_id",
"as": "references"
}
},
{
"$unwind": "$references"
},
{
"$lookup": {
"from": "doc",
"localField": "references.to_id",
"foreignField": "_id",
"as": "map"
}
},
{
"$unwind": "$map"
},
{
"$project": {
"_id": 1,
"title": 1,
"map_id": "$map._id",
"map_title": "$map.title"
}
},
{
"$group": {
"_id": "$_id",
"title": {
"$first": "$title"
},
"references": {
"$push": {
"id": "$map_id",
"title": "$map_title"
}
}
}
}
]
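As a variation on the pipeline above, the following sketch skips $unwind and $group. When localField resolves to an array, $lookup matches each element against foreignField, so a second $lookup on citations.to_id can fetch all cited documents in one stage. It also keeps documents that cite nothing, which the $unwind stages above would drop. The collection names (document, citation) and the id field are taken from the question; the function names and the intermediate citations field are hypothetical. join_in_memory is only a plain-Python illustration of the intended result, not something MongoDB runs.

```python
def build_reference_pipeline(doc_coll="document", cite_coll="citation"):
    """Return an aggregation pipeline that embeds cited documents as `references`."""
    return [
        # Step 1: attach the citation rows whose from_id equals this document's id.
        {"$lookup": {
            "from": cite_coll,
            "localField": "id",
            "foreignField": "from_id",
            "as": "citations",
        }},
        # Step 2: citations.to_id is an array, so every element is matched
        # against document.id and all cited documents land in `references`.
        {"$lookup": {
            "from": doc_coll,
            "localField": "citations.to_id",
            "foreignField": "id",
            "as": "references",
        }},
        # Step 3: keep only the fields shown in the desired result.
        {"$project": {
            "_id": 0,
            "id": 1,
            "title": 1,
            "references.id": 1,
            "references.title": 1,
        }},
    ]


def join_in_memory(documents, citations):
    """Plain-Python equivalent of the pipeline, for illustration only."""
    by_id = {d["id"]: d for d in documents}
    result = []
    for doc in documents:
        refs = [
            {"id": c["to_id"], "title": by_id[c["to_id"]]["title"]}
            for c in citations
            if c["from_id"] == doc["id"] and c["to_id"] in by_id
        ]
        result.append({"id": doc["id"], "title": doc["title"], "references": refs})
    return result


# Usage, assuming `db` is a pymongo Database object:
#   results = list(db["document"].aggregate(build_reference_pipeline()))
```

The order of the elements in references is not guaranteed by $lookup, so sort them in a later stage or in Python if the order matters.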

Related

How to filter ElasticSearch results without having it affect the document score?

I am trying to filter my results on the "publication_year" field without affecting the document score. However, whether I add the "range" clause to the query or to "filter", it seems to affect the score: documents whose "publication_year" is closer to the "lte" upper limit of the range are scored higher.
My query:
query = {
'bool': {
'should': [
{
'match_phrase': {
"title": keywords
}
},
{
'match_phrase': {
"abstract": keywords
}
},
]
}
}
if publication_year_constraint:
range_query = {"range":{"publication_year":{"gte":publication_year_constraint, "lte": datetime.datetime.today().year}}}
query["bool"]["filter"] = [range_query]
I tried putting the "range" inside the "should" block as well, with similar results.

Try using the filter context.
In a filter context, a query clause answers the question “Does this
document match this query clause?” The answer is a simple Yes or
No — no scores are calculated.
Example:
{
"query": {
"bool": {
"must": [
{ "match": { "title": "Search" }},
{ "match": { "content": "Elasticsearch" }}
],
"filter": [
{ "term": { "status": "published" }},
{ "range": { "publish_date": { "gte": "2015-01-01" }}}
]
}
}
}
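Applied to the query from the question, the same idea could be sketched in Python as follows. The field names (title, abstract, publication_year) come from the question; the function name and its parameters are hypothetical. One detail worth stating: when a bool query contains a filter clause, minimum_should_match defaults to 0, so the sketch sets it to 1 to keep requiring at least one of the scoring clauses.

```python
import datetime


def build_search_query(keywords, min_year=None, max_year=None):
    """Build a bool query where `should` clauses score and `filter` only restricts."""
    query = {
        "bool": {
            # Scoring clauses: these contribute to the relevance score.
            "should": [
                {"match_phrase": {"title": keywords}},
                {"match_phrase": {"abstract": keywords}},
            ],
            # With a filter present the default would be 0, which would let
            # documents through that match none of the should clauses.
            "minimum_should_match": 1,
        }
    }
    if min_year is not None:
        if max_year is None:
            max_year = datetime.date.today().year
        # Filter context: a yes/no restriction that does not affect the score.
        query["bool"]["filter"] = [
            {"range": {"publication_year": {"gte": min_year, "lte": max_year}}}
        ]
    return query


# Usage, assuming `es` is an Elasticsearch client and "papers" a hypothetical index:
#   es.search(index="papers", query=build_search_query("graph neural networks", 2015))
```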

How can I retrieve relative document in MongoDB?

I'm using Flask with the Jinja2 template engine and MongoDB via pymongo. These are my documents from two collections (phone and factory):
phone = db.get_collection("phone")
{
"_id": ObjectId("63d8d39206c9f93e68d27206"),
"brand": "Apple",
"model": "iPhone XR",
"year": NumberInt("2016"),
"image": "https://apple-mania.com.ua/media/catalog/product/cache/e026f651b05122a6916299262b60c47d/a/p/apple-iphone-xr-yellow_1.png",
"CPU": {
"manufacturer": "A12 Bionic",
"cores": NumberInt("10")
},
"misc": [
"Bluetooth 5.0",
"NFC",
"GPS"
],
"factory_id": ObjectId("63d8d42b7a4d7a7e825ef956")
}
factory = db.get_collection("factory")
{
"_id": ObjectId("63d8d42b7a4d7a7e825ef956"),
"name": "Foxconn",
"stock": NumberInt("1000")
}
In my Python code, I retrieve the data like this:
models = list(
phone.find({"brand": brand}, projection={"model": True, "image": True, "factory_id": True})
)
How can I retrieve the related factory document by factory_id and have it as an embedded document in the models list?

I think you are looking for the $lookup aggregation stage.
The query works as follows:
First, $match by your desired brand.
Then do a "join" between the collections based on factory_id and store the result in an array called "factories". The $lookup output is always an array because there can be more than one match.
Last, $project only the values you want. In this case, because _id is unique, you can get the factory using $arrayElemAt at position 0.
So the code can look like this (I'm not a Python expert):
models = list(
phone.aggregate([
{
"$match": {
"brand": brand
}
},
{
"$lookup": {
"from": "factory",
"localField": "factory_id",
"foreignField": "_id",
"as": "factories"
}
},
{
"$project": {
"model": True,
"image": True,
"factory": {
"$arrayElemAt": [
"$factories",
0
]
}
}
}
])
)
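An alternative to $arrayElemAt is to flatten the joined array with $unwind. The sketch below assumes the same phone and factory collections as above; the function name is hypothetical. Setting preserveNullAndEmptyArrays to True keeps phones that have no matching factory, which a plain $unwind would drop.

```python
def build_phone_pipeline(brand):
    """Return a pipeline that embeds the matching factory in each phone document."""
    return [
        {"$match": {"brand": brand}},
        {"$lookup": {
            "from": "factory",
            "localField": "factory_id",
            "foreignField": "_id",
            "as": "factory",
        }},
        # Turn the one-element array into a single embedded document,
        # without discarding phones that have no factory.
        {"$unwind": {"path": "$factory", "preserveNullAndEmptyArrays": True}},
        {"$project": {"model": 1, "image": 1, "factory": 1}},
    ]


# Usage, assuming `phone` is the pymongo collection from the question:
#   models = list(phone.aggregate(build_phone_pipeline("Apple")))
```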

Exclude _id field during a join query

I am trying to create a join query and exclude the _id field from my result:
stage_lookup_comments = {
"$lookup": {
"from": "products",
"localField": "product_codename",
"foreignField": "codename",
"as": "product",
}
}
pipeline = [
{ "$match": {
"category":category,
"archived_at":{"$eq": None}
}
},
stage_lookup_comments
]
array = await db[collection].aggregate(pipeline).to_list(CURSOR_LIMIT)
return array
I don't know the syntax for adding the "_id": 0 parameter to my query.

You should be able to use the $project stage in your pipeline to select only the fields you want to return. In this particular case, you can exclude the _id field by setting "_id": 0, as you already mentioned.
Read documentation about $project here for more details.
I didn't test it, but your query should be something similar to the following:
stage_lookup_comments = {
"$lookup": {
"from": "products",
"localField": "product_codename",
"foreignField": "codename",
"as": "product",
}
}
pipeline = [
{
"$match": {
"category":category,
"archived_at":{"$eq": None}
}
},
stage_lookup_comments,
{
"$project": { "_id": 0 }
}
]
array = await db[collection].aggregate(pipeline).to_list(CURSOR_LIMIT)
return array
EDIT:
Also, starting in MongoDB 4.2, you can use the $unset stage to explicitly remove a field from the documents (see documentation here):
{ "$unset": ["_id"] }
You can read more about this in this very similar question here on Stack Overflow.
I hope this works!
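One thing to keep in mind: "_id": 0 only removes the top-level _id, while the documents joined into the product array still carry their own _id. A minimal sketch that excludes both, assuming the same collection and field names as in the question (the function name is hypothetical):

```python
def build_pipeline_without_ids(category):
    """Return a pipeline that drops _id from the root and from the joined products."""
    return [
        {"$match": {"category": category, "archived_at": {"$eq": None}}},
        {"$lookup": {
            "from": "products",
            "localField": "product_codename",
            "foreignField": "codename",
            "as": "product",
        }},
        # Exclusion-only projection: every other field is kept as it is.
        {"$project": {"_id": 0, "product._id": 0}},
    ]


# Usage inside an async function, assuming the motor `db` object,
# `collection` and CURSOR_LIMIT from the question:
#   array = await db[collection].aggregate(
#       build_pipeline_without_ids(category)
#   ).to_list(CURSOR_LIMIT)
```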

Get field value in MongoDB without parent object name

I'm trying to find a way to retrieve some data from MongoDB through Python scripts, but I got stuck on the following situation:
I have to retrieve some data, check a field value, and compare it with other data (MongoDB documents).
But the object's name may vary from module to module, see below:
Document 1
{
"_id": "001",
"promotion": {
"Avocado": {
"id": "01",
"timestamp": "202005181407",
},
"Banana": {
"id": "02",
"timestamp": "202005181407",
}
},
"product" : {
"id" : "11"
}
}
Document 2
{
"_id": "002",
"promotion": {
"Grape": {
"id": "02",
"timestamp": "202005181407",
},
"Dragonfruit": {
"id": "02",
"timestamp": "202005181407",
}
},
"product" : {
"id" : "15"
}
}
I'll always have an object called promotion, but the names of its children may vary: sometimes they are sequential numbers, sometimes not. The field whose value I need is the id inside each child of promotion; it will always have the same name.
So if the document matches the criteria, I'll retrieve it with Python and do the rest of the work there.
PS.: I'm not the one responsible for this kind of Document Structure.
I've already tried these docs, but couldn't get them to work the way I need.
$all
$elemMatch

Try this python pipeline:
[
{
'$addFields': {
'fruits': {
'$objectToArray': '$promotion'
}
}
}, {
'$addFields': {
'FruitIds': '$fruits.v.id'
}
}, {
'$project': {
'_id': 0,
'FruitIds': 1
}
}
]
Output produced:
{FruitIds:["01","02"]},
{FruitIds:["02","02"]}
Is this the desired output?
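To make the transformation easier to follow, here is a small sketch of what the $objectToArray step computes, written in plain Python, together with a variant of the pipeline that adds a $match stage so that only documents containing a given id are returned. The function names and the wanted_id parameter are hypothetical; the field names come from the question.

```python
def promotion_ids(document):
    """Plain-Python equivalent of `$objectToArray` followed by `$fruits.v.id`."""
    # The child names (Avocado, Banana, ...) are ignored; only their ids are kept.
    return [child["id"] for child in document.get("promotion", {}).values()]


def build_promotion_pipeline(wanted_id):
    """Return a pipeline keeping only documents where some promotion has `wanted_id`."""
    return [
        # Convert {"Avocado": {...}, ...} into [{"k": "Avocado", "v": {...}}, ...].
        {"$addFields": {"fruits": {"$objectToArray": "$promotion"}}},
        # Match on the id of any child, regardless of the child's name.
        {"$match": {"fruits.v.id": wanted_id}},
        {"$project": {"_id": 1, "product": 1, "FruitIds": "$fruits.v.id"}},
    ]


document_1 = {
    "_id": "001",
    "promotion": {
        "Avocado": {"id": "01", "timestamp": "202005181407"},
        "Banana": {"id": "02", "timestamp": "202005181407"},
    },
    "product": {"id": "11"},
}
print(promotion_ids(document_1))  # ['01', '02']
```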

Elasticsearch not returning result for single word query

I have a basic Elasticsearch index that consists of a variety of help articles. Users can search for them in my Python/Django app.
The index has the following mappings:
{
"mappings": {
"properties": {
"body": {
"type": "text"
},
"category": {
"type": "nested",
"properties": {
"category_id": {
"type": "long"
},
"category_title": {
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
},
"type": "text"
}
}
},
"title": {
"type": "keyword"
},
"date_updated": {
"type": "date"
},
"position": {
"type": "integer"
}
}
}
}
I basically want the user to be able to search for a query and get any results that match the article title or category.
Say I have an article called "I Can't Remember My Password" in the "Your Account" category.
If I search for the article title exactly, I see the result. If I search for the category title exactly, I also see the result.
But if I search for just "password", I get nothing. What do I need to change in my setup/query to make it so that this query (or similarly non-exact queries) also returns the result?
My query looks like:
{
"query": {
"bool": {
"should": [{
"multi_match": {
"fields": ["title"],
"query": "password"
}
},
{
"nested": {
"path": "category",
"query": {
"multi_match": {
"fields": ["category.category_title"],
"query": "password"
}
}
}
}
]
}
}
}
I have read other questions and experimented with various settings but no luck so far. I am not doing anything particularly special at index time in terms of preparing the fields so I don't know if that's something to look at. I'm just using the elasticsearch-dsl defaults.

The solution was to reindex the title field as text rather than keyword. The latter only allows exact matching.
Credit to LeBigCat for pointing that out in the comments. They haven't posted it as an answer so I'm doing it on their behalf to improve visibility.
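The difference between the two field types can be illustrated with a toy model. The two helper functions below are hypothetical and only approximate what Elasticsearch does (the real text analyzer is more sophisticated), but they show why a single word never matches a keyword field. The mapping fragment is a sketch of one possible fix: title becomes text, with a keyword sub-field kept for exact matching, mirroring the category_title mapping from the question.

```python
import re

# Sketch of a replacement mapping for the title field (an assumption, not the
# poster's actual mapping); title.keyword remains available for exact matches.
TITLE_MAPPING = {
    "title": {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    }
}


def toy_keyword_terms(value):
    """A keyword field indexes the whole string as one single term."""
    return [value]


def toy_text_terms(value):
    """A text field is analyzed: roughly, lowercased and split into words."""
    return re.findall(r"[a-z0-9']+", value.lower())


title = "I Can't Remember My Password"
print("password" in toy_keyword_terms(title))  # False: only the full title is a term
print("password" in toy_text_terms(title))     # True: "password" is its own term
```

Changing the type of an existing field requires creating a new index with the new mapping and reindexing the documents into it.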
