Pymongo: Remove an element from array - python

I'm trying to remove the lowest-priced iPad from each document in my schema. I know how to find it using pymongo, but I don't know how to remove it.
Here's my schema:
{
"_id": "sjobs",
"items": [
{
"type": "iPod",
"price": 20.00
},
{
"type": "iPad",
"price": 399.99
},
{
"type": "iPad",
"price": 199.99
},
{
"type": "iPhone 5",
"price": 300.45
}
]
}
{
"_id": "bgates",
"items": [
{
"type": "MacBook",
"price": 2900.99
},
{
"type": "iPad",
"price": 399.99
},
{
"type": "iPhone 4",
"price": 100.00
},
{
"type": "iPad",
"price": 99.99
}
]
}
I've got a python loop that finds the lowest sale price for iPad:
cursor = db.sales.find({'items.type': 'iPad'}).sort([('items', pymongo.DESCENDING)])
for doc in cursor:
    cntr = 0
    for item in doc['items']:
        if item['type'] == 'iPad' and resetCntr == 0:
            cntr = 1
            sales.update(doc, {'$pull': {'items': {item['type']}}})
That doesn't work. What do I need to do to remove lowest iPad price item?

Your Python code isn't doing what you think it's doing (unless there is a lot of it you didn't include). You don't need to do the sorting and iterating on the client side - you should make the server do the work. Run this aggregation pipeline (I'm giving shell syntax, you can call it from your Python, of course):
> r = db.sales.aggregate( {"$match" : { "items.type":"iPad"} },
{"$unwind" : "$items"},
{"$match" : { "items.type":"iPad"} },
{"$group" : { "_id" : "$_id",
"lowest" : {"$min":"$items.price"},
"count":{$sum:1}
}
},
{"$match" : {count:{$gt:1}}}
);
{
"result" : [
{
"_id" : "bgates",
"lowest" : 99.99,
"count" : 2
},
{
"_id" : "sjobs",
"lowest" : 199.99,
"count" : 2
}
],
"ok" : 1
}
Now you can iterate over the "r.result" array and execute an update for each entry. For the first entry, for example:
db.sales.update( { "_id" : r.result[0]._id },
{ "$pull" : { "items" : { "type" : "iPad", "price" : r.result[0].lowest}}} );
Note that I only include records which have more than one iPad - since otherwise you may end up deleting the only iPad record in the array. If you want to delete all "non-highest" prices then you'd want to find the max and $pull all the elements $lt that price.
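A rough PyMongo translation of the shell steps above, offered as a sketch rather than tested production code. The helper names lowest_ipad_pipeline and pull_lowest_ipad are made up for illustration, and collection is assumed to be the sales collection (for example db.sales). Recent PyMongo versions return a cursor from aggregate(), so the rows are iterated directly instead of indexing a "result" array.

```python
def lowest_ipad_pipeline():
    """Aggregation stages mirroring the shell pipeline above."""
    return [
        {"$match": {"items.type": "iPad"}},
        {"$unwind": "$items"},
        {"$match": {"items.type": "iPad"}},
        {"$group": {
            "_id": "$_id",
            "lowest": {"$min": "$items.price"},
            "count": {"$sum": 1},
        }},
        # Keep only documents with more than one iPad
        {"$match": {"count": {"$gt": 1}}},
    ]


def pull_lowest_ipad(collection):
    """Remove the cheapest iPad from every matching document.

    `collection` is assumed to behave like a PyMongo Collection.
    Returns the number of documents an update was issued for.
    """
    updated = 0
    for row in collection.aggregate(lowest_ipad_pipeline()):
        collection.update_one(
            {"_id": row["_id"]},
            {"$pull": {"items": {"type": "iPad", "price": row["lowest"]}}},
        )
        updated += 1
    return updated
```

Usage would look like pull_lowest_ipad(db.sales). Note that $pull removes every array element matching the condition, so two iPads sharing the lowest price would both be removed.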

Disclaimer: The code below is not tested, as I do not have MongoDB installed locally. However, I took my time writing it, so I'm fairly confident it's close to working.
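def remove_lowest_price(collection):
    # Only the items array is needed; _id is returned by default
    cursor = collection.find({}, {'items': 1})
    for doc in cursor:
        # Start above any real price so the first iPad always wins
        lowest = float('inf')
        for item in doc.get('items', []):
            if item['type'] == 'iPad' and item['price'] < lowest:
                lowest = item['price']
        if lowest == float('inf'):
            continue  # this document has no iPad
        # lowest now contains the price of the cheapest iPad
        # update_one replaces the legacy update(), which PyMongo 4 removed
        collection.update_one(
            {'_id': doc['_id']},
            {'$pull': {'items': {'price': lowest}}}
        )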
Of course there will be a problem here if another item happens to have exactly the same price but I think it will be easy to improve from here

{'$pull': {'items': {item['type']}}}
This doesn't look like valid JSON, does it? (In Python, {item['type']} is a set literal, not a document.)
Shouldn't "sales.update(...)" be "db.sales.update(...)" in your example?
It may also be better to pass a query to the update operation:
db.sales.update({'_id': doc['_id']}, ...)
rather than the entire doc.
Finally, the update body itself might be:
{'$pull': {'items': {'type': item['type']}}}

Related

How to paginate subdocuments in a MongoDB collection?

I have a MongoDB collection with the following data structure;
[
{
"_id": "1",
"name": "businessName1",
"reviews": [
{
"_id": "1",
"comment": "comment1",
},
{
"_id": "2",
"comment": "comment1",
},
...
]
}
]
As you can see, the reviews for each business are a subdocument within the collection, where businessName1 has a total of 2 reviews. In my real MongoDB collection, each business has 100s of reviews. I want to view only 10 on one page using pagination.
I currently have a find_one() function in Python that retrieves this single business, but it also retrieves all of its reviews as well.
businesses.find_one( \
{ "_id" : ObjectId(1) }, \
{ "reviews" : 1, "_id" : 0 } )
I'm aware of the skip() and limit() methods in Python, where you can limit the number of results that are retrieved, but as far as I'm aware, you can only perform these methods on the find() method. Is this correct?
Option 1: You can use $slice for pagination as follows:
db.collection.find({
_id: 1
},
{
_id: 0,
reviews: {
$slice: [
3,
5
]
}
})
Option 2: Use an aggregation, which can also return the total array size and may therefore be the better choice:
db.collection.aggregate([
{
$project: {
_id: 0,
reviews: {
$slice: [
"$reviews",
3,
5
]
},
total: {
$size: "$reviews"
}
}
}
])
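To connect Option 1 back to the Python find_one() call in the question, here is a minimal sketch that turns a page number into a $slice projection. The helper name review_page_projection is hypothetical, pages are assumed to be numbered from 1, and the default page size of 10 comes from the question.

```python
def review_page_projection(page, page_size=10):
    """Build a find()/find_one() projection returning one page of reviews.

    $slice takes [skip, limit], so page 1 skips 0 reviews,
    page 2 skips page_size reviews, and so on.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    skip = (page - 1) * page_size
    return {"_id": 0, "reviews": {"$slice": [skip, page_size]}}


# Hypothetical usage, assuming `businesses` is a PyMongo collection:
#   businesses.find_one({"_id": "1"}, review_page_projection(2))
```

This keeps the paging arithmetic in one place. The projection works with find_one() as well as find(), so skip() and limit() on a cursor are not needed for paging inside a single document.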

What happens to a $match term in a pipeline?

I'm a newbie to MongoDB and Python scripts. I'm confused how a $match term is handled in a pipeline.
Let's say I manage a library, where books are tracked as JSON files in a MongoDB. There is one JSON for each copy of a book. The book.JSON files look like this:
{
"Title": "A Tale of Two Cities",
"subData":
{
"status": "Checked In"
...more data here...
}
}
Here, status will be one string from a finite set of strings, perhaps just: { "Checked In", "Checked Out", "Missing", etc. } But note also that there may not be a status field at all:
{
"Title": "Great Expectations",
"subData":
{
...more data here...
}
}
Okay: I am trying to write a MongoDB pipeline within a Python script that does the following:
For each book in the library:
Groups and counts the different instances of the status field
So my target output from my Python script would be something like this:
{ "A Tale of Two Cities" 'Checked In' 3 }
{ "A Tale of Two Cities" 'Checked Out' 4 }
{ "Great Expectations" 'Checked In' 5 }
{ "Great Expectations" '' 7 }
Here's my code:
mydatabase = client.JSON_DB
mycollection = mydatabase.JSON_all_2
listOfBooks = mycollection.distinct("bookname")
for book in listOfBooks:
    match_variable = {
        "$match": { 'Title': book }
    }
    group_variable = {
        "$group":{
            '_id': '$subdata.status',
            'categories' : { '$addToSet' : '$subdata.status' },
            'count': { '$sum': 1 }
        }
    }
    project_variable = {
        "$project": {
            '_id': 0,
            'categories' : 1,
            'count' : 1
        }
    }
    pipeline = [
        match_variable,
        group_variable,
        project_variable
    ]
    results = mycollection.aggregate(pipeline)
    for result in results:
        print(str(result['Title'])+" "+str(result['categories'])+" "+str(result['count']))
As you can probably tell, I have very little idea what I'm doing. When I run the code, I get an error because I'm trying to reference my $match term:
Traceback (most recent call last):
File "testScript.py", line 34, in main
print(str(result['Title'])+" "+str(result['categories'])+" "+str(result['count']))
KeyError: 'Title'
So a $match term is not included in the pipeline? Or am I not including it in the group_variable or project_variable ?
And on a general note, the above seems like a lot of code to do something relatively easy. Does anyone see a better way? It's easy to find simple examples online, but this is one step of complexity away from anything I can locate. Thank you.
Here's one aggregation pipeline to "$group" all the books by "Title" and "subData.status".
db.collection.aggregate([
{
"$group": {
"_id": {
"Title": "$Title",
"status": {"$ifNull": ["$subData.status", ""]}
},
"count": { "$count": {} }
}
},
{ // not really necessary, but puts output in predictable order
"$sort": {
"_id.Title": 1,
"_id.status": 1
}
},
{
"$replaceWith": {
"$mergeObjects": [
"$_id",
{"count": "$count"}
]
}
}
])
Example output for one of the "books":
{
"Title": "mumblecore",
"count": 3,
"status": ""
},
{
"Title": "mumblecore",
"count": 3,
"status": "Checked In"
},
{
"Title": "mumblecore",
"count": 8,
"status": "Checked Out"
},
{
"Title": "mumblecore",
"count": 6,
"status": "Missing"
}
Try it on mongoplayground.net.
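The pipeline above can be called from Python roughly as follows. This is a sketch under assumptions: the function names are made up, collection is assumed to behave like a PyMongo Collection, and the "$count" accumulator has been swapped for {"$sum": 1}, which should count the same way and also runs on servers that predate the "$count" accumulator. It also shows why the question's code raised KeyError: a document leaving "$group" contains only "_id" and the accumulator fields, so "Title" has to be part of the group key to survive.

```python
def status_counts_pipeline():
    """Group by (Title, status); a missing status is counted as ''."""
    return [
        {"$group": {
            "_id": {
                "Title": "$Title",
                "status": {"$ifNull": ["$subData.status", ""]},
            },
            # {"$sum": 1} adds one per document in the group
            "count": {"$sum": 1},
        }},
        # Optional: gives the output a predictable order
        {"$sort": {"_id.Title": 1, "_id.status": 1}},
        # Flatten {_id: {Title, status}, count} into {Title, status, count}
        {"$replaceWith": {"$mergeObjects": ["$_id", {"count": "$count"}]}},
    ]


def format_status_counts(collection):
    """Return one 'Title status count' line per (Title, status) group."""
    lines = []
    for row in collection.aggregate(status_counts_pipeline()):
        lines.append(f'{row["Title"]} {row["status"]!r} {row["count"]}')
    return lines
```

A single aggregate() call covers every book, so the distinct() lookup and the per-book loop from the question are no longer needed.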

Sort ElasticSearch results by a custom Compare function on field

I want to fetch Driver data from Elasticsearch sorted by rating, where rating can be one of ["good", "ok", "bad"]. How do I write a query that returns the data in sorted order, treating good > ok > bad?
Example of a sorted response list:
[{
"name": "driver1",
"rating": "good"
},
{
"name": "driver3",
"rating": "good"
},
{
"name": "driver2",
"rating": "ok"
},
{
"name": "driver4",
"rating": "bad"
}]
To change the score based on a field in your index, you can use a script score query. Your query should look like the example below:
GET /my-index-2/_search
{
"query": {
"script_score": {
"query": {
"match_all":{}
},
"script": {
"source": "if (doc['rating.keyword'].value == 'good') { return 2; } else if (doc['rating.keyword'].value == 'ok') { return 1; } return 0;"
}
}
}
}
For more information about the script score query, see the official Elasticsearch documentation.
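The ordering that the script encodes (good scores 2, ok scores 1, bad scores 0) can be illustrated client-side in Python. This is only a sketch of the ranking idea on an in-memory list, not a replacement for the Elasticsearch query. RATING_RANK and sort_by_rating are hypothetical names, and placing unknown ratings last is an assumption.

```python
# Higher rank sorts first, matching the scores used in the script
RATING_RANK = {"good": 2, "ok": 1, "bad": 0}


def sort_by_rating(drivers):
    """Return drivers ordered good > ok > bad; unknown ratings go last.

    sorted() is stable, so drivers with equal ratings keep their
    original relative order.
    """
    return sorted(drivers, key=lambda d: -RATING_RANK.get(d["rating"], -1))


drivers = [
    {"name": "driver1", "rating": "good"},
    {"name": "driver2", "rating": "ok"},
    {"name": "driver3", "rating": "good"},
    {"name": "driver4", "rating": "bad"},
]
ordered = sort_by_rating(drivers)
```

Sorting on the server is still preferable when results are paginated, because a client-side sort only sees the hits that were actually returned.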

Get field value in MongoDB without parent object name

I'm trying to find a way to retrieve some data from MongoDB through Python scripts,
but I got stuck on the following situation:
I have to retrieve some data, check a field value, and compare it with other data (MongoDB documents).
But the object's name may vary from module to module; see below:
Document 1
{
"_id": "001",
"promotion": {
"Avocado": {
"id": "01",
"timestamp": "202005181407",
},
"Banana": {
"id": "02",
"timestamp": "202005181407",
}
},
"product" : {
"id" : "11"
}
}
Document 2
{
"_id": "002",
"promotion": {
"Grape": {
"id": "02",
"timestamp": "202005181407",
},
"Dragonfruit": {
"id": "02",
"timestamp": "202005181407",
}
},
"product" : {
"id" : "15"
}
}
I'll always have an object called promotion, but the child's name may vary: sometimes it's an ordered number, sometimes it is not. The field whose value I need is the id inside promotion, which will always have the same name.
So if the document matches the criteria I'll retrieve with python and get the rest of the work done.
PS.: I'm not the one responsible for this kind of Document Structure.
I've already tried these docs, but couldn't get them to work the way I need.
$all
$elemMatch
Try this python pipeline:
[
{
'$addFields': {
'fruits': {
'$objectToArray': '$promotion'
}
}
}, {
'$addFields': {
'FruitIds': '$fruits.v.id'
}
}, {
'$project': {
'_id': 0,
'FruitIds': 1
}
}
]
Output produced:
{FruitIds:["01","02"]},
{FruitIds:["02","02"]}
Is this the desired output?
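If the documents are fetched into Python anyway, the same extraction can be sketched client-side without knowing the child names. This assumes each document has the shape shown in the question; promotion_ids is a hypothetical helper name.

```python
def promotion_ids(doc):
    """Collect the id of every child object under 'promotion'.

    The child keys (Avocado, Banana, ...) are ignored, which mirrors
    what $objectToArray followed by '$fruits.v.id' does in the pipeline.
    """
    return [promo["id"] for promo in doc.get("promotion", {}).values()]


document_1 = {
    "_id": "001",
    "promotion": {
        "Avocado": {"id": "01", "timestamp": "202005181407"},
        "Banana": {"id": "02", "timestamp": "202005181407"},
    },
    "product": {"id": "11"},
}
ids = promotion_ids(document_1)
```

The pipeline version has the advantage that only the ids cross the network, while this version needs the whole promotion object to be loaded first.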

Querying nested objects in Elasticsearch

I have a Product-Merchant mapping which looks like the following
catalog_map = {
"catalog": {
"properties": {
"merchant_id": {
"type": "string",
},
"products": {
"type": "object",
},
"merchant_name" :{
"type" : "string"
}
}
}
}
"product" holds objects with fields such as product_id, product_name, and product_price. Products and merchants are mapped as follows:
for merchant in Merchant.objects.all() :
    products = [{"product_name" : x.product.name, "product_price" : x.price, "product_id" : x.product.id , "product_category" : x.product.category.name} for x in MerchantProductMapping.objects.filter(merchant=merchant)]
    tab = {
        'merchant_id': merchant.id,
        'merchant_name': merchant.name,
        'product': products
    }
    res = es.index(index="my-index", doc_type='catalog', body=tab)
The data gets indexed smoothly, in the desired form. Now, when I query the data from given index, I do it in the following way :
GET /esearch-index/catalog/_search
{
"query": {
"bool" :{
"must": [
{"match": {
"merchant_name": {
"query": "Sir John"
}
}}],
"should": [
{"match": {
"product_name": {
"query": "Vanilla"
}
}}
]
}}
}
This query returns all the products in the index with merchant name "Sir John". However, I want it to return only the details of the product "Vanilla" sold by "Sir John".
On someone's recommendation, I used "_source" while querying, but that doesn't help.
How can I single out the information of one single object from the entire "catalog" index of the merchant?
Once your bool query has a must clause, all the conditions inside of it are required. The conditions inside of the should clause are not required. They will only boost the results. (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html#query-dsl-bool-query)
So, going back to your query, it will retrieve all catalogs matching merchant_name "Sir John". This is the only required (must) condition. The name "Vanilla" will only boost results with the name "Vanilla" to the top, because it is not required.
If you want to retrieve "Vanilla" sold by "Sir John", put both conditions inside of the must clause and change your query to this:
{
"query": {
"bool": {
"must": [
{
"match": {
"merchant_name": {
"query": "Sir John"
}
}
},
{
"match": {
"product_name": {
"query": "Vanilla"
}
}
}
]
}
}
}
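Since the question indexes from Python, the same request body can be built as a plain dict. This is a minimal sketch: merchant_product_query is a hypothetical helper, and the default field names are simply the ones used in the query above, so they may need adjusting to match the actual mapping.

```python
def merchant_product_query(merchant_name, product_name,
                           merchant_field="merchant_name",
                           product_field="product_name"):
    """Build a bool query in which both match clauses are required."""
    return {
        "query": {
            "bool": {
                # Every clause under "must" has to match
                "must": [
                    {"match": {merchant_field: {"query": merchant_name}}},
                    {"match": {product_field: {"query": product_name}}},
                ]
            }
        }
    }


body = merchant_product_query("Sir John", "Vanilla")
```

The resulting dict can be passed as the search body to whichever client call is already in use for the index.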
