What is the $project (aggregation) API for Pymongo? - python

I'm fairly new to MongoDB. I can find shell commands to run my queries, but I cannot find the corresponding functions in the PyMongo API manual.
For example, I would like to project some fields of a document into a new document. I think $project could do it, but I cannot find support for it in PyMongo. How can I run the same query in both the shell and Python? For example:
db.books.aggregate( [ { $project : { title : 1 , author : 1 } } ] )

For projection, you can use the following query (the quoted keys make it valid Python as well as valid shell syntax):
db.books.aggregate([{'$project':{ 'title':'$title', 'author':'$author'}}])
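As a minimal sketch of the same projection from PyMongo: the helper names `build_project_stage` and `project_fields` are hypothetical, and `collection` is assumed to be a PyMongo collection (for example `MongoClient().mydb.books`). The only driver call used is `Collection.aggregate`, which takes the pipeline as a plain list of dicts.

```python
def build_project_stage(fields):
    """Return a $project stage that keeps only the named fields (plus _id)."""
    return {"$project": {name: 1 for name in fields}}


def project_fields(collection, fields):
    """Run the projection on any object exposing PyMongo's aggregate()."""
    pipeline = [build_project_stage(fields)]
    # aggregate() returns a cursor; list() drains it into memory.
    return list(collection.aggregate(pipeline))
```

With a real collection, `project_fields(db.books, ["title", "author"])` should send the same pipeline as the shell command in the question.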

Related

pymongo update_one(), upsert=True without using $ operators

I have documents in the form:
{"hostname": "myhost1.com", "services": { ... } }
What I'd like to do is the following:
dataset = requests.get('http://endpoint.com/hardware.json').json()
for hostname, services in dataset[0].items():
    db.titleHardware.update_one({'hostname':hostname},
                                {services.keys()[0]: services.values()[0]},
                                True) #upsert
However, I'm getting the following error:
ValueError: update only works with $ operators
Is there a way to update the entire "services" chunk based on the "hostname" key (and ultimately insert a new document if the hostname doesn't exist)? I know I can write logic to compare what's in my MongoDB with what I'm trying to update or insert, but I was hoping there is already something in pymongo that I could use for this.
Did you look at the mongodb documentation for updateOne?
You have to specify an update operator such as $set:
for hostname, services in dataset[0].items():
    # dict views are not subscriptable in Python 3, so take the first item explicitly
    key, value = next(iter(services.items()))
    db.titleHardware.update_one({'hostname': hostname},
                                {'$set': {key: value}},
                                upsert=True)
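To replace the whole document instead, use replace_one:
for hostname, services in dataset[0].items():
    # dict views are not subscriptable in Python 3, so take the first item explicitly
    key, value = next(iter(services.items()))
    db.titleHardware.replace_one({'hostname': hostname},
                                 {'hostname': hostname, key: value},
                                 upsert=True)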
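A small sketch of the upsert wrapped in a function may make the shape of the call clearer. The helper name `upsert_services` is hypothetical, and it assumes the payload should be stored under a `services` field, as in the document shape shown in the question; `collection` is assumed to be a PyMongo collection.

```python
def upsert_services(collection, hostname, services):
    """Set the `services` field for a host, inserting the document if absent."""
    return collection.update_one(
        {"hostname": hostname},            # filter: match on hostname
        {"$set": {"services": services}},  # update_one requires an update operator
        upsert=True,                       # insert when no document matches
    )
```

On an insert, MongoDB builds the new document from the equality fields of the filter plus the `$set` fields, so `hostname` does not need to be repeated in the update.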

Mongoengine filter query on list embedded field based on last index

I'm using Mongoengine with Django.
My model has a list field of embedded documents:
import mongoengine
class OrderStatusLog(mongoengine.EmbeddedDocument):
    status_code = mongoengine.StringField()

class Order(mongoengine.DynamicDocument):
    incr_id = mongoengine.SequenceField()
    status = mongoengine.ListField(mongoengine.EmbeddedDocumentField(OrderStatusLog))
Now I want to filter the Order collection based on the last value in the status field,
e.g. Order.objects.filter(status__last__status_code="scode")
As far as I can tell, there is no such thing as __last. I tried the approach mentioned in the docs (http://docs.mongoengine.org/guide/querying.html#querying-lists), but it didn't work.
I can solve this by looping over all the documents in the collection, but that's not efficient. How can this query be written efficiently?
I'm not sure MongoEngine can do that (yet). AFAIK, you'd need to use the aggregation pipeline.
In the Mongo shell, using the '$slice' and the $arrayElemAt operators:
db.order.aggregate([{ $project: {last_status: { $arrayElemAt: [{ $slice: [ "$status", -1 ] }, 0 ]} }}, {$match: {'last_status.status_code':"scode"}} ])
And in Python:
pipeline = [
    {'$project': {'last_status': {'$arrayElemAt': [{'$slice': ["$status", -1]}, 0]}}},
    {'$match': {'last_status.status_code': 'scode'}}
]
agg_cursor = Order.objects.aggregate(*pipeline)
result = [Order.objects.get(id=order['_id']) for order in agg_cursor]
The trick here is that objects.aggregate provides a PyMongo cursor, not a MongoEngine cursor, so if you need MongoEngine objects, you can proceed in two steps: first filter using the aggregation framework to get the ids of matched items, then get them through a MongoEngine query.
This is what I do. In my tests, it has proven to be much more efficient than fetching everything and filtering in Python code.
If there is a simpler way, I'm interested to hear about it. Otherwise, this could be a feature request for MongoEngine. You may want to open an issue there.
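The two-step approach can be sketched with a pair of small helpers. The names `last_element_pipeline` and `matching_ids` are hypothetical. This variant relies on the fact that `$arrayElemAt` accepts a negative index (counted from the end of the array), so the inner `$slice` is not strictly needed.

```python
def last_element_pipeline(list_field, subfield, value):
    """Pipeline matching documents whose last list element has subfield == value."""
    return [
        # A negative index in $arrayElemAt counts from the end of the array.
        {"$project": {"last": {"$arrayElemAt": ["$" + list_field, -1]}}},
        {"$match": {"last." + subfield: value}},
    ]


def matching_ids(agg_cursor):
    """Collect the _id of every document returned by the aggregation."""
    return [doc["_id"] for doc in agg_cursor]
```

Assuming the models from the question, the ids could then be fetched in one MongoEngine query with `Order.objects(id__in=ids)` rather than one `get()` per id; the order of the results is not guaranteed to match the order of the aggregation.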

Simple MongoDB query slow

I am new to MongoDB. I am trying to write some data to a Mongo database from a Python script. The data structure is simple:
{"name":name, "first":"2016-03-01", "last":"2016-03-01"}
My script checks whether the "name" exists. If it does, it updates the "last" date; otherwise, it creates the document.
if db.collections.find_one({"name": the_name}):
The data set is actually very small: <5M bytes and <150k records.
It was fast at first (e.g. for the first 20,000 records) and then got slower and slower. I checked the profiler output; some queries took more than 50 milliseconds, but I don't see anything abnormal about those records.
Any ideas?
Update 1:
Seems there is no index for the "name" field:
> db.my_collection.getIndexes()
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "name" : "_id_",
        "ns" : "domains.my_collection"
    }
]
First, check whether the collection has an index on the "name" field. Look at the output of the following command in the mongo CLI:
db.my_collection.getIndexes();
If there is no index, create one (note: in a production environment, it is better to build the index in the background).
db.my_collection.createIndex({name:1},{unique:true});
If you want to insert a document when it does not exist, or update one field when it does, you can do it in one step without querying first. Use the update command with the upsert option and the $set/$setOnInsert operators (see https://docs.mongodb.org/manual/reference/operator/update/setOnInsert/).
db.my_collection.update(
    {name:"the_name"},
    {
        $set:{last:"current_date"},
        $setOnInsert:{first:"current_date"}
    },
    {upsert:true}
);
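Since the question uses a Python script, here is a rough PyMongo equivalent of the shell upsert above. The helper name `touch_name` is hypothetical, and `collection` is assumed to be a PyMongo collection; `update_one` with `upsert=True` is the driver call doing the work.

```python
def touch_name(collection, name, date):
    """Insert {name, first, last} if missing; otherwise only update `last`."""
    return collection.update_one(
        {"name": name},
        {
            "$set": {"last": date},           # applied on insert and on update
            "$setOnInsert": {"first": date},  # applied only when a document is inserted
        },
        upsert=True,
    )
```

The index from the previous step can be created once from Python with `collection.create_index("name", unique=True)`, after which each call to `touch_name` is a single indexed round trip instead of a `find_one` followed by a write.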

Check for existence of multiple fields in MongoDB document

I have a collection that holds one document per process, and I am trying to query it for the documents that have specific fields. For simplicity, imagine the following general document schema:
{
    "timestamp": ISODate("..."),
    "result1": "pass",
    "result2": "fail"
}
Now, when a process is started a new document is inserted with only the timestamp. When that process reaches certain stages the fields result1 and result2 are added over time. Some processes however do not reach the stages 1 or 2 and therefore have no result fields.
I would like to query the database to retrieve only those documents, which have BOTH result1 and result2.
I am aware of the $exists operator, but as far as I can tell this only works for one field at a time, i.e. db.coll.find({"result1": {$exists: true}}). The $exists operator cannot be used as a top level operator. E.g. this does not work:
db.coll.find({"$exists": {"result1": true, "result2": true}})
To check for both results I would need:
db.coll.find({"result1": {"$exists": true}, "result2": {"$exists": true}})
Now that already becomes tedious for more than one variable.
Is there a better way to do this?
(Also, I am doing this in Python, so if there is a solution for just the pymongo driver that would make me happy already.)
I don't know about better, but you can always process with JavaScript via $where:
jsStr = """var doc = this;
return ['result1','result2','result3']
    .every(function(key) {
        return doc.hasOwnProperty(key)
    });"""
coll.find({ "$where": jsStr })
But you are going to have to specify an array of "keys" to check for somewhere.
If you think you have a lot of keys to type out, then why not just "build" your query expression:
whitelist = [ "result1", "result2", "result3" ]
query = {}
for key in whitelist:
    query[key] = { "$exists": True }
coll.find(query)
That saves a bit of typing and since all MongoDB queries are just data structures anyway then using basic data manipulation to build queries makes sense.
How about using $and:
db.coll.find({"$and": [
    { "fld1": { "$exists": true }},
    { "fld2": { "$exists": true }},
    { "fld3": { "$exists": true }}
]})
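For the pymongo side, both forms can be generated from a list of field names. This is only a sketch with hypothetical helper names; when each condition is on a different field, a filter with one top-level key per field is an implicit AND and matches the same documents as the explicit `$and` form.

```python
def all_fields_exist(fields):
    """Implicit AND: one top-level key per field."""
    return {name: {"$exists": True} for name in fields}


def all_fields_exist_and(fields):
    """Explicit $and form of the same condition."""
    return {"$and": [{name: {"$exists": True}} for name in fields]}
```

Either result can be passed directly to `coll.find(...)`.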

python-eve array field contains query

I understand that Eve uses MongoDB as its default backend, and that MongoDB supports indexing and querying on array fields (doc), e.g.
db.inventory.find( { tags: { $in: [ /^be/, /^st/ ] } } )
Is the same supported in Eve? If not, how far away is it? (I want to estimate whether I need to change my schema, which would not be ideal.)
This is supported:
/?where={"tags": {"$in": ["programming"]}}
Regexes are not allowed there, though.
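When calling the endpoint from a client, the `where` value has to be JSON and URL-encoded. A minimal sketch using only the standard library follows; the helper name `eve_where_url` and the base URL in the usage note are hypothetical.

```python
import json
import urllib.parse


def eve_where_url(base_url, field, values):
    """Build a URL whose `where` parameter asks for `field` matching any of `values`."""
    where = {field: {"$in": list(values)}}
    # json.dumps produces the filter document; urlencode escapes it for the query string.
    return base_url + "?" + urllib.parse.urlencode({"where": json.dumps(where)})
```

For example, `eve_where_url("http://localhost:5000/inventory", "tags", ["programming"])` yields a URL that decodes to the same `where` filter as the one shown above.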
