How can I retrieve a related document in MongoDB? - python

I'm using Flask with the Jinja2 template engine and MongoDB via pymongo. These are my documents from two collections (phone and factory):
phone = db.get_collection("phone")
{
"_id": ObjectId("63d8d39206c9f93e68d27206"),
"brand": "Apple",
"model": "iPhone XR",
"year": NumberInt("2016"),
"image": "https://apple-mania.com.ua/media/catalog/product/cache/e026f651b05122a6916299262b60c47d/a/p/apple-iphone-xr-yellow_1.png",
"CPU": {
"manufacturer": "A12 Bionic",
"cores": NumberInt("10")
},
"misc": [
"Bluetooth 5.0",
"NFC",
"GPS"
],
"factory_id": ObjectId("63d8d42b7a4d7a7e825ef956")
}
factory = db.get_collection("factory")
{
"_id": ObjectId("63d8d42b7a4d7a7e825ef956"),
"name": "Foxconn",
"stock": NumberInt("1000")
}
In my Python code, I retrieve the data like this:
models = list(
    phone.find({"brand": brand}, projection={"model": True, "image": True, "factory_id": True})
)
How can I retrieve the related factory document by factory_id and have it embedded in each document of the models list?

I think you are looking for an aggregation query that uses the $lookup stage.
The query works as follows:
First, $match by your desired brand.
Then do a "join" between the collections based on factory_id and store the result in an array called "factories". The $lookup output is always an array because there can be more than one match.
Last, $project only the values you want. In this case, since _id is unique, you can get the factory with $arrayElemAt at position 0.
So the code can look like this (I'm not a Python expert):
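models = list(
    phone.aggregate([
        # Keep only phones of the requested brand.
        {"$match": {"brand": brand}},
        # Join each phone with its factory document.
        {
            "$lookup": {
                "from": "factory",
                "localField": "factory_id",
                "foreignField": "_id",
                "as": "factories",
            }
        },
        # Keep the wanted fields and embed the first (only) matched factory.
        {
            "$project": {
                "model": True,
                "image": True,
                "factory": {"$arrayElemAt": ["$factories", 0]},
            }
        },
    ])
)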
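To make the shape of the result concrete, here is a minimal plain-Python sketch that emulates what the three stages do on in-memory dictionaries. The function name embed_factory and the shortened string ids are made up for illustration; in the real application MongoDB does this work, not this code.

```python
def embed_factory(phones, factories, brand):
    """Emulate $match + $lookup + $project/$arrayElemAt on plain dicts."""
    factory_by_id = {f["_id"]: f for f in factories}
    models = []
    for phone in phones:
        # $match: skip phones of other brands.
        if phone.get("brand") != brand:
            continue
        # $project: keep _id (included by default), model and image.
        doc = {k: phone[k] for k in ("_id", "model", "image") if k in phone}
        # $lookup + $arrayElemAt: the factory whose _id equals factory_id.
        factory = factory_by_id.get(phone.get("factory_id"))
        if factory is not None:
            doc["factory"] = factory  # the field is simply absent when nothing matched
        models.append(doc)
    return models


# Hypothetical sample data with shortened ids.
phones = [
    {"_id": "p1", "brand": "Apple", "model": "iPhone XR", "image": "xr.png", "factory_id": "f1"},
    {"_id": "p2", "brand": "Other", "model": "X", "image": "x.png", "factory_id": "f1"},
]
factories = [{"_id": "f1", "name": "Foxconn", "stock": 1000}]
models = embed_factory(phones, factories, "Apple")
```

Each item in models then carries its factory as a nested dictionary, so a Jinja2 template could presumably read it as model.factory.name.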

Related

Get field value in MongoDB without parent object name

I'm trying to find a way to retrieve some data from MongoDB through Python scripts,
but I got stuck on the following situation:
I have to retrieve some data, check a field value, and compare it with other data (MongoDB documents).
But the object's name may vary from module to module, see below:
Document 1
{
"_id": "001",
"promotion": {
"Avocado": {
"id": "01",
"timestamp": "202005181407",
},
"Banana": {
"id": "02",
"timestamp": "202005181407",
}
},
"product" : {
"id" : "11"
}
}
Document 2
{
"_id": "002",
"promotion": {
"Grape": {
"id": "02",
"timestamp": "202005181407",
},
"Dragonfruit": {
"id": "02",
"timestamp": "202005181407",
}
},
"product" : {
"id" : "15"
}
}
I'll always have an object called promotion, but the child's name may vary; sometimes it's a sequential number, sometimes it is not. The field whose value I need is the id inside promotion, and it will always have the same name.
So if the document matches the criteria, I'll retrieve it with Python and get the rest of the work done.
PS: I'm not the one responsible for this kind of document structure.
I've already tried these operators from the docs, but couldn't get them to work the way I need:
$all
$elemMatch
Try this Python pipeline:
[
{
'$addFields': {
'fruits': {
'$objectToArray': '$promotion'
}
}
}, {
'$addFields': {
'FruitIds': '$fruits.v.id'
}
}, {
'$project': {
'_id': 0,
'FruitIds': 1
}
}
]
Output produced:
{FruitIds:["01","02"]},
{FruitIds:["02","02"]}
Is this the desired output?
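If the goal is to keep only documents whose promotion contains a given id, one option is to add a $match stage on the computed array. The sketch below builds such a pipeline and also emulates the extraction in plain Python so the logic can be checked without a database. The names pipeline_for_id and promotion_ids, and the choice to keep _id in the output, are illustrative assumptions rather than part of the answer above.

```python
def pipeline_for_id(target_id):
    """Build a pipeline that keeps documents having target_id under promotion."""
    return [
        {"$addFields": {"fruits": {"$objectToArray": "$promotion"}}},
        {"$addFields": {"FruitIds": "$fruits.v.id"}},
        # A scalar compared with an array field matches if any element equals it.
        {"$match": {"FruitIds": target_id}},
        {"$project": {"FruitIds": 1}},
    ]


def promotion_ids(doc):
    """Plain-Python equivalent: collect the id of every child of promotion."""
    return [child["id"] for child in doc.get("promotion", {}).values()]


doc1 = {"_id": "001", "promotion": {"Avocado": {"id": "01"}, "Banana": {"id": "02"}}}
doc2 = {"_id": "002", "promotion": {"Grape": {"id": "02"}, "Dragonfruit": {"id": "02"}}}
matching = [d["_id"] for d in (doc1, doc2) if "01" in promotion_ids(d)]
```

With a pymongo collection, the pipeline would be passed to its aggregate method.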

How to join multiple collections in MongoDB (one to many relationship)?

I have two collections: document and citation. Their structures are shown below:
# document
{id:001, title:'foo'}
{id:002, title:'bar'}
{id:003, title:'abc'}
# citation
{from_id:001, to_id:002}
{from_id:001, to_id:003}
I want to query the information of the documents cited by each document (called references, denoted by to_id). In SQL, I would left join the document table with citation, and then left join document again to get the full information of the references (not just their ids).
However, I can only achieve the first step with $lookup in MongoDB. Here is my aggregate pipeline:
[
{'$lookup':{
'from': 'citation',
'localField': 'id',
'foreignField': 'from_id',
'as': 'references'
}}
]
I am able to get the following results with this pipeline:
{
id:001,
title:'foo',
references:[{from_id:001, to_id:002}, {from_id:001, to_id:003}]
}
The desired result is:
{
id:001,
title:'foo',
references:[{id:002, title:'bar'}, {id:003, title:'abc'}]
}
I have found this answer but it seems to be a one-to-one relationship that is not applicable in my case.
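EDIT: Some people said that joins should be avoided in MongoDB as it's not a relational database. I chose MongoDB because it's much faster than MySQL in my case.
You need to use $unwind and a second $lookup back on the document collection, then $group by _id to get the desired result.
Try the pipeline below. Note that it joins on _id and reads from a collection named doc; if your documents use a separate id field in a collection named document, as in the question, adjust localField, foreignField, and from accordingly: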
[
{
"$lookup": {
"from": "citation",
"localField": "_id",
"foreignField": "from_id",
"as": "references"
}
},
{
"$unwind": "$references"
},
{
"$lookup": {
"from": "doc",
"localField": "references.to_id",
"foreignField": "_id",
"as": "map"
}
},
{
"$unwind": "$map"
},
{
"$project": {
"_id": 1,
"title": 1,
"map_id": "$map._id",
"map_title": "$map.title"
}
},
{
"$group": {
"_id": "$_id",
"title": {
"$first": "$title"
},
"references": {
"$push": {
"id": "$map_id",
"title": "$map_title"
}
}
}
}
]
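The two joins can be easier to follow in plain Python. The sketch below emulates them on in-memory lists, using the sample data from the question with ids written as strings; attach_references is a hypothetical helper, not a MongoDB or pymongo function.

```python
def attach_references(documents, citations):
    """Emulate the two $lookup stages: document -> citation -> document."""
    by_id = {d["id"]: d for d in documents}
    result = []
    for doc in documents:
        # First $lookup: citations whose from_id equals this document's id.
        cited_ids = [c["to_id"] for c in citations if c["from_id"] == doc["id"]]
        # Second $lookup: resolve each to_id to the full cited document.
        references = [by_id[i] for i in cited_ids if i in by_id]
        result.append({**doc, "references": references})
    return result


documents = [
    {"id": "001", "title": "foo"},
    {"id": "002", "title": "bar"},
    {"id": "003", "title": "abc"},
]
citations = [
    {"from_id": "001", "to_id": "002"},
    {"from_id": "001", "to_id": "003"},
]
joined = attach_references(documents, citations)
```

Unlike the pipeline above, whose $unwind stages drop documents that have no citations, this sketch keeps them with an empty references list, which is closer to the left join described in the question.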

Update document if value there is no match

In MongoDB, how do you skip an update if one field of the document already exists?
To give an example, I have the following document structure, and I'd like to update it only if the link key does not match an existing entry.
{
"_id": {
"$oid": "56e9978732beb44a2f2ac6ae"
},
"domain": "example.co.uk",
"good": [
{
"crawled": true,
"added": {
"$date": "2016-03-16T17:27:17.461Z"
},
"link": "/url-1"
},
{
"crawled": false,
"added": {
"$date": "2016-03-16T17:27:17.461Z"
},
"link": "url-2"
}
]
}
My update query is:
links.update({
"domain": "example.co.uk"
},
{'$addToSet':
{'good':
{"crawled": False, 'link':"/url-1"} }}, True)
Part of the problem is that the crawled field could be set to True or False and the date will also always be different. I don't want to add to the array if the URL exists, regardless of the crawled status.
Update:
Just for clarity, if the URL is not within the document, I want it to be added to the existing array, for example, if /url-3 was introduced, the document would look like this:
{
"_id": {
"$oid": "56e9978732beb44a2f2ac6ae"
},
"domain": "example.co.uk",
"good": [
{
"crawled": true,
"added": {
"$date": "2016-03-16T17:27:17.461Z"
},
"link": "/url-1"
},
{
"crawled": false,
"added": {
"$date": "2016-03-16T17:27:17.461Z"
},
"link": "url-2"
},
{
"crawled": false,
"added": {
"$date": "2016-04-16T17:27:17.461Z"
},
"link": "url-3"
}
]
}
The domain will be unique and specific to the link and I want it to insert the link within the good array if it doesn't exist and do nothing if it does exist.
The only way to do this is to first check whether any document in the collection matches your criteria using the find_one method, including the "good.link" field in the filter. If no document matches, run your update with the update_one method, this time without the "good.link" field in the query criteria. You also don't need the $addToSet operator here, since the new subdocument never exactly equals an existing one; simply use the $push update operator, which makes your intention clear. The "upsert" option is not needed either.
# Push the link only when no entry in "good" already has it.
if not links.find_one({"domain": "example.co.uk", "good.link": "/url-1"}):
    links.update_one(
        {"domain": "example.co.uk"},
        {"$push": {"good": {"crawled": False, "link": "/url-1"}}},
    )
In the find section of your query you are matching all documents where
"domain": "example.co.uk"
You need to add that you don't want to match
"good.link": "/url-1"
So try
{
    "domain": "example.co.uk",
    "good.link": {"$ne": "/url-1"}
}
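Combined with $push, that filter gives a single conditional update. The following is a minimal sketch, assuming a pymongo collection named links as in the question; build_add_link_update is a hypothetical helper introduced only to keep the filter and the update together.

```python
def build_add_link_update(domain, link):
    """Return (filter, update) that appends link only if it is not present yet."""
    # $ne on an array of subdocuments matches only when no element has this link.
    query = {"domain": domain, "good.link": {"$ne": link}}
    update = {"$push": {"good": {"crawled": False, "link": link}}}
    return query, update


query, update = build_add_link_update("example.co.uk", "/url-3")

# With a pymongo collection named `links` (an assumption taken from the question):
# result = links.update_one(query, update)
# result.modified_count is 1 if the link was added, 0 if nothing matched
# (the link was already present, or no document has this domain).
```

Upsert is best left off with this filter: when the link already exists the filter matches nothing, so an upsert would insert a second document for the same domain.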
The accepted answer is not correct in saying that the only way to do this is to call findOne first.
You can do it in a single database call with the pipeline update feature (available since MongoDB 4.2), which lets you use aggregation operators within an update. The strategy is to concatenate two arrays: the first is always the existing "good" array, and the second is either [new link] or an empty array, chosen with $cond depending on whether the link already exists, like so:
links.update_one({
"domain": "example.co.uk"
},
[
{
"$set": {
"good": {
"$ifNull": [
"$good",
[]
]
}
}
},
{
"$set": {
"good": {
"$concatArrays": [
"$good",
{
"$cond": [
{
"$in": [
"/url-1",
"$good.link"
]
},
[],
[
{
"crawled": False,
"link": "/url-1"
}
]
]
}
]
}
}
}
], True)
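The logic of the two $set stages can be checked without a database. The sketch below emulates them in plain Python on a single dictionary; add_link_if_missing is an illustrative name, and the code mirrors the operators only approximately (for example, it does not model the upsert).

```python
def add_link_if_missing(doc, link):
    """Emulate $ifNull + $in + $cond + $concatArrays from the pipeline update."""
    # $ifNull: treat a missing or null "good" array as an empty list.
    good = doc.get("good") or []
    # "$good.link": the link of every element in the array.
    existing_links = [item.get("link") for item in good]
    # $cond with $in: append nothing if the link exists, else the new entry.
    extra = [] if link in existing_links else [{"crawled": False, "link": link}]
    # $concatArrays: existing array followed by the optional new entry.
    return {**doc, "good": good + extra}


doc = {"domain": "example.co.uk", "good": [{"crawled": True, "link": "/url-1"}]}
unchanged = add_link_if_missing(doc, "/url-1")
added = add_link_if_missing(doc, "/url-3")
```

The crawled flag of an existing entry is left untouched, which is the behavior the question asks for.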

Elastic search is not showing the fields

I am a newbie to Elasticsearch. I am trying to use it from Python for one of my college projects, as a resume indexer. Everything works fine except that it returns all the fields in the _source field. I don't want some of those fields, and I have tried many things, but nothing works. Below is my code:
es = Elasticsearch()
query = {
"_source":{
"exclude":["resume_content"]
},
"query":{
"match":{
"resume_content":{
"query":keyword,
"fuzziness":"Auto",
"operator":"and",
"store":"false"
}
}
}
}
res = es.search(size=es_conf["MAX_SEARCH_RESULTS_LIMIT"],index=es_conf["ELASTIC_INDEX_NAME"], body=query)
return res
where es_conf is my local dictionary.
Apart from the above code, I have also tried _source: false, _source: [names of my fields], and fields: [names of my fields]. I also tried store=False in my search method. Any ideas?
Did you try just using fields?
Here's a simple example. I set up a mapping with three fields, (imaginatively) named "field1", "field2", "field3":
PUT /test_index
{
"mappings": {
"doc": {
"properties": {
"field1": {
"type": "string"
},
"field2": {
"type": "string"
},
"field3": {
"type": "string"
}
}
}
}
}
Then I indexed three documents:
POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"field1":"text11","field2":"text12","field3":"text13"}
{"index":{"_id":2}}
{"field1":"text21","field2":"text22","field3":"text23"}
{"index":{"_id":3}}
{"field1":"text31","field2":"text32","field3":"text33"}
And let's say I want to find docs that contain "text22" in field "field2", but I only want to return the contents of "field1" and "field2". Here's the query:
POST /test_index/doc/_search
{
"fields": [
"field1", "field2"
],
"query": {
"match": {
"field2": "text22"
}
}
}
which returns:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1.4054651,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "2",
"_score": 1.4054651,
"fields": {
"field1": [
"text21"
],
"field2": [
"text22"
]
}
}
]
}
}
Here's the code I used: http://sense.qbox.io/gist/69dabcf9f6e14fb1961ec9f761645c92aa8e528b
It should be straightforward to set this up with the Python adapter.
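As a rough sketch of the Python side, the request body can be built as a plain dictionary and handed to the client's search method. This variant uses source filtering rather than fields, and it assumes the "excludes" spelling used in current Elasticsearch documentation (older releases accepted "exclude", so check your version). build_search_body is a hypothetical helper, and the sketch drops the "store" key from the question's match clause, which as far as I know is a mapping option rather than a query parameter.

```python
def build_search_body(keyword, excluded_fields):
    """Build a search body that hides the given fields from _source."""
    return {
        # Source filtering: return _source without the listed fields.
        "_source": {"excludes": list(excluded_fields)},
        "query": {
            "match": {
                "resume_content": {
                    "query": keyword,
                    "fuzziness": "AUTO",
                    "operator": "and",
                }
            }
        },
    }


body = build_search_body("python developer", ["resume_content"])

# With an elasticsearch-py client `es` and the asker's `es_conf` dictionary
# (both assumed to exist as in the question):
# res = es.search(index=es_conf["ELASTIC_INDEX_NAME"],
#                 size=es_conf["MAX_SEARCH_RESULTS_LIMIT"], body=body)
```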

Elastic Search: including #/hashtags in search results

Using Elasticsearch's query DSL, this is how I am currently constructing my query:
elastic_sort = [
{ "timestamp": {"order": "desc" }},
"_score",
{ "name": { "order": "desc" }},
{ "channel": { "order": "desc" }},
]
elastic_query = {
"fuzzy_like_this" : {
"fields" : [ "msgs.channel", "msgs.msg", "msgs.name" ],
"like_text" : search_string,
"max_query_terms" : 10,
"fuzziness": 0.7,
}
}
res = self.es.search(index="chat", body={
"from" : from_result, "size" : results_per_page,
"track_scores": True,
"query": elastic_query,
"sort": elastic_sort,
})
I've been trying to implement a filter or an analyzer that will allow the inclusion of "#" in searches (I want a search for "#thing" to return results that include "#thing"), but I am coming up short. The error messages I am getting are not helpful and just tell me that my query is malformed.
I attempted to incorporate the method found here: http://www.fullscale.co/blog/2013/03/04/preserving_specific_characters_during_tokenizing_in_elasticsearch.html but it doesn't make any sense to me in context.
Does anyone have a clue how I can do this?
Did you create a mapping for your index? You can specify within your mapping that certain fields should not be analyzed.
For example, a tweet mapping can be something like:
"tweet": {
"properties": {
"id": {
"type": "long"
},
"msg": {
"type": "string"
},
"hashtags": {
"type": "string",
"index": "not_analyzed"
}
}
}
You can then perform a term query on "hashtags" for an exact string match, including "#" character.
If you want "hashtags" to be tokenized as well, you can always create a multi-field for "hashtags".
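On the Python side this could look roughly as follows: extract the hashtags when indexing a message, store them in the not_analyzed "hashtags" field, and search that field with a term query. The helper names and the regular expression are assumptions for illustration; the field names follow the tweet mapping above.

```python
import re


def extract_hashtags(msg):
    """Return every '#word' token in the message, '#' included."""
    return re.findall(r"#\w+", msg)


def hashtag_query(tag):
    """Exact-match term query against the not_analyzed hashtags field."""
    return {"query": {"term": {"hashtags": tag}}}


text = "deploying #thing today"
doc = {"id": 1, "msg": text, "hashtags": extract_hashtags(text)}
query = hashtag_query("#thing")

# With an elasticsearch-py client `es` (assumed), `doc` would be indexed and
# `query` passed as the body of es.search on the same index.
```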
