MongoEngine Query Optimization - python

I have two collections, ScenarioDrivers and ModelDrivers, which have a one-to-many relationship with each other.
class ScenarioDrivers(Document):
    meta = {
        'collection': 'ScenarioDrivers'
    }
    ScenarioId = ReferenceField('ModelScenarios')
    DriverId = ReferenceField('ModelDrivers')
    DriverCalibrationMethod = StringField()
    SegmentName = StringField()
    DriverValue = ListField()
    CalibrationStatus = StringField()
    AdjustedValues = ListField(default=[])
    CreateDate = DateTimeField(default=ObjectId().generation_time)
    LastUpdateDate = DateTimeField(default=datetime.utcnow())

class ModelDrivers(Document):
    meta = {
        'collection': 'ModelDrivers'
    }
    PortfolioModelId = ReferenceField('PortfolioModels')
    DriverName = StringField()
    CreateDate = DateTimeField(default=ObjectId().generation_time)
    LastUpdateDate = DateTimeField(default=datetime.utcnow())
    FieldFormat = StringField()
    DriverData = ListField()
My query is like this.
class GetCalibratedDrivers(Resource):
    def get(self, scenario_id):
        scenario_drivers_list = []
        scenario_drivers = ScenarioDrivers.objects(ScenarioId=scenario_id).exclude('ScenarioId').select_related(1)
        for scenario_driver in scenario_drivers:
            scenario_driver_dict = {
                'id': str(scenario_driver.id),
                'DriverId': str(scenario_driver.DriverId.id),
                'SegmentName': scenario_driver.SegmentName,
                'CalibrationMethod': scenario_driver.DriverCalibrationMethod,
                'CalibratedValues': exchange(scenario_driver.DriverValue),
                'AdjustedValues': scenario_driver.AdjustedValues,
                'LastUpdateDate': formatted_date(scenario_driver.LastUpdateDate),
                'FieldFormat': scenario_driver.DriverId.FieldFormat
            }
            scenario_drivers_list.append(scenario_driver_dict)
        return {
            'DriverCalibrations': scenario_drivers_list
        }
The query matches 1140 records, and then I construct a dictionary for each and collect them into a list.
But this API call takes 30 seconds to process just 1140 records. What am I missing? Please help. I am using the latest versions of PyMongo and MongoEngine.

I think the problem is not with your query; it is with looping over 1140 records. I do not see much use of the referenced objects, so you should consider removing select_related(1). Once you do that, if you want to convert reference object ids to strings, you can use as_pymongo(), which returns plain dictionaries instead of document instances. And finally, if you must read some data in a specific format, like formatted_date or exchange, it is better to save that as part of your document, i.e. save FormattedLastUpdateDate alongside LastUpdateDate. In MongoDB, you have to think about your read-specific logic when you save the document.
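A rough sketch of what that might look like with the models above — hypothetical code, not the poster's, keeping the existing exchange and formatted_date helpers and assuming the references are stored as plain ObjectIds (dbref=False, the default in recent MongoEngine):

# Fetch plain dicts instead of dereferencing DriverId document-by-document.
rows = list(
    ScenarioDrivers.objects(ScenarioId=scenario_id)
    .exclude('ScenarioId')
    .as_pymongo()
)

# Resolve every referenced ModelDrivers document in one extra query,
# indexed by ObjectId, instead of one lookup per row.
driver_ids = list({row['DriverId'] for row in rows})
formats = {
    d['_id']: d.get('FieldFormat')
    for d in ModelDrivers.objects(id__in=driver_ids).only('FieldFormat').as_pymongo()
}

scenario_drivers_list = [
    {
        'id': str(row['_id']),
        'DriverId': str(row['DriverId']),
        'SegmentName': row.get('SegmentName'),
        'CalibrationMethod': row.get('DriverCalibrationMethod'),
        'CalibratedValues': exchange(row.get('DriverValue')),
        'AdjustedValues': row.get('AdjustedValues'),
        'LastUpdateDate': formatted_date(row.get('LastUpdateDate')),
        'FieldFormat': formats.get(row['DriverId']),
    }
    for row in rows
]

This issues two queries in total instead of one dereference per record, which is usually where most of the time goes.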

Related

Elasticsearch: search does not return documents

When I run the following:
search_doc = Document.search(
    using=client,
    index=custom_index
)
search_doc = search_doc.query("term", doc_field=f"string_{num}")
and then use .scan(), I get Hit objects. So, something seems to be wrong...
However, when I use the ids of those hit objects:
all_docs = [doc.meta.id for doc in search_doc.scan()]
for id in all_docs:
    loc = Document.get(
        using=client,
        index=custom_index,
        id=id
    )
    assert isinstance(loc, Document)  # <- always true!!
Why is there this difference? Why doesn't the search object return documents objects?
Here's the definition of my Document class:
class Document(dsl.Document):
    field_1 = dsl.Keyword()
    field_2 = dsl.Keyword(multi=True)
    data = dsl.Object(Inner_Document)  # data will have many other fields
    last_updated = dsl.Date()

    class Index:
        name = default_index  # different from custom_index
        settings = {
            "number_of_shards": 1
        }

    class Meta:
        dynamic = dsl.MetaField('strict')
The version of ES that I'm using is 7.17.6, the elasticsearch package is 7.17.3, and elasticsearch-dsl is 7.4.0.

How to add index on ListField with mongoengine in python?

I want to add an index on a ListField. Here is my code:
class Post(Document):
    meta = {"indexes": ["testcomments.comment_id"]}
    _id = StringField()
    txt = StringField()
    testcomments = EmbeddedDocumentField(Comment)
    comments = ListField(EmbeddedDocumentField(Comment))

class Comment(EmbeddedDocument):
    comment = StringField()
    comment_id = StringField()
    ...
    ...
I know how to add an index on an EmbeddedDocumentField (meta = {"indexes": ["testcomments.comment_id"]}), but how do I add an index on comments?
I believe it would work the same way for the list, thus:
meta = {
    "indexes": [
        "testcomments.comment_id",
        "comments.comment_id",  # or simply 'comments' if you want a multikey index
    ]
}
Note that you can check the indexes being created with:
col = Post._get_collection()
col.index_information()
If you use the dict form to define indexes, e.g. meta = {'indexes': [{'fields': ['comments.comment_id']}]}, you get more granularity in the index definition (and a syntax closer to pymongo/mongodb).
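For instance, a minimal sketch of the dict form on the Post class from the question (the sparse option here is purely illustrative, not something from the original post):

class Post(Document):
    meta = {
        'indexes': [
            {'fields': ['comments.comment_id'], 'sparse': True},  # per-index options
            'testcomments.comment_id',
        ]
    }
    testcomments = EmbeddedDocumentField(Comment)
    comments = ListField(EmbeddedDocumentField(Comment))

# Verify what was actually created on the collection:
print(Post._get_collection().index_information())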

Mongoengine query with date range

I'm trying to retrieve data from mongodb via mongoengine within a specified time span. Below is the db model used.
class DeviationReport(db.Document):
    meta = {'collection': 'DeviationReport'}
    created_at = db.DateTimeField()
    date = db.DateTimeField()
    author = db.StringField()
    read_by = db.ListField(default=[])
    prod_line = db.ReferenceField(ProductionLine)
    product = db.ReferenceField(Product)
    description = db.StringField()
What I've tried is the code below. However, it does not return any results. I've used a similar approach when I've needed to build dynamic queries depending on user input.
kwargs = {}
start = datetime.datetime(2018, 12, 11)
end = datetime.datetime(2019, 3, 13)
kwargs['created_at'] = {'$lt': end, '$gt': start}
DeviationReport.objects(**kwargs)
I've obviously made sure that there are objects within the date range, and I've read other similar posts where the query below has been successfully used. How do I get my query to return everything between 'start' and 'end', or how do I rewrite it to do as I wish?
Thank you.
I worked around/solved the problem by first getting my results without date filtering using **kwargs and then filtering that using Q. It may not be optimal, but it works for what I need it to do.
reports = DeviationReport.objects(**kwargs)
reports = reports.filter((Q(date__gte=start) & Q(date__lte=end)))
There are a number of ways to achieve your query; adjust the collection and params accordingly using the examples below:
date_to = datetime.datetime.utcnow()  # The end date
date_from = date_to - datetime.timedelta(days=120)  # The start date

query_a = Application.objects(category="rest_api").filter(
    date_created__gte=date_from,
    date_created__lte=date_to
)

query_b = Application.objects(
    date_created__gte=date_from,
    date_created__lte=date_to
).filter(category="rest_api")

query = {"category": "rest_api"}
query_c = Application.objects(
    date_created__gte=date_from,
    date_created__lte=date_to,
    **query
)
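Applied to the DeviationReport model from the question, the same pattern with a dynamically built kwargs dict might look like this (a sketch, not the poster's code; note the double-underscore operator suffixes instead of raw '$gt'/'$lt' keys):

start = datetime.datetime(2018, 12, 11)
end = datetime.datetime(2019, 3, 13)

kwargs = {}
kwargs['created_at__gt'] = start  # MongoEngine operator suffix, not {'$gt': start}
kwargs['created_at__lt'] = end

reports = DeviationReport.objects(**kwargs)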
Querying with Q as suggested above did not work for me, but a raw query did:
raw_query = {'date': {'$gte': start, '$lt': end}}
reports = DeviationReport.objects(__raw__=raw_query)

mongoengine (0.10.6) does not save the Document

I have a simple REST/JSON API. I am trying to update a mongoengine model Listing using a PUT call. I get one from MongoDB and update it using my own deserialize method with the incoming JSON. The update does not work, as the JSON has a few DBRefs and EmbeddedDocuments. The object does get updated with the correct values before the save is executed. There is no error, but the object does not get saved. Any thoughts?
obj = Listing.objects.get(pk=id)
obj.deserialize(**request.json)
obj.save()
obj.reload()
return obj
class Listing(db.Document):
    name = db.StringField()
    l_type = db.StringField(choices=listing_const.L_TYPE.choices())
    expiry = db.ComplexDateTimeField(auto_now=False, auto_now_add=False)
    a_data = db.EmbeddedDocumentField(Media)
    lcrr = db.ReferenceField('LCRR', reverse_delete_rule=3, dbref=False)
    meta = {
        'db_alias': config.get_config()['MONGODB_SETTINGS']['alias'],
        'cascade': True
    }

How can I search in MongoDB ISODate field from Python?

I have an API from which I receive a query. This API is in Python.
I call it from a django app (views.py). Then, I want to query my MongoDB collection, using mongoengine:
api_response = requests.get("http://*******", {'query':query}) #We call the API
json_resp = api_response.json()
person = Person.objects(__raw__=json_resp).to_json() #We search for the json_query in the DB (raw) and the output is JSON
It works fine, but I have a problem with dates... Indeed, my Person model is as follows:
class Person(DynamicDocument):
    # Meta variables
    meta = {
        'collection': 'personsExample'
    }
    # Document variables
    PersonID = models.CharField(max_length=6)
    FirstName = models.CharField(max_length=50)
    LastName = models.CharField(max_length=50)
    Gender = models.CharField(max_length=6)  # male/female
    BirthDate = models.DateField()
    CitizenCountryCode = models.CharField(max_length=2)
My personsExample collection was imported via mongoimport from a CSV file:
mongoimport --host localhost --db persons --collection personsExample --type csv --file reducedPersonsExtract.csv --headerline
As the birth dates were stored as strings, I converted them using:
db.personsExample.find().forEach(function(el){el.BirthDate = new ISODate(el.BirthDate); db.personsExample.save(el)})
The problem I have now is that the BirthDate field looks like this:
"BirthDate" : ISODate("1970-12-21T00:00:00Z")
But in my json query, date is stored as
datetime.datetime(1970,12,21,0,0,0).isoformat()
Which gives:
{
    "BirthDate": "1970-12-21T00:00:00"
}
Thus, the query doesn't work; I would need to make my query with:
{
    "BirthDate": ISODate("1970-12-21T00:00:00Z")
}
(But I can't create such objects (ISODate) with Python... )
Or to find another way to store the date in MongoDB.
Would you happen to know how I could solve my problem please?
I have succeeded in doing what I wanted to. In fact, I don't convert dates to ISODate in the DB; I store them as strings "YYYY-MM-DD". Then, I format the date in my API (which sends the JSON query to the app that uses MongoDB):
my_dict['BirthDate'] = datetime.datetime(YYYY,MM,DD).isoformat()
Then, in my app:
json_resp = api_response.json() #Deserialization of the response
json_resp['BirthDate'] = datetime.datetime.strptime(json_resp['BirthDate'], "%Y-%m-%dT%H:%M:%S")
It may not be the best solution but it works.
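As a side note, a raw query can also take a Python datetime directly, since PyMongo encodes datetime objects as BSON dates — a minimal sketch assuming the ISODate-converted collection and the Person model above:

import datetime

# datetime.datetime is serialized to a BSON date, so it matches
# ISODate("1970-12-21T00:00:00Z") produced by the forEach conversion above.
raw_query = {'BirthDate': datetime.datetime(1970, 12, 21)}
person = Person.objects(__raw__=raw_query).to_json()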
