MongoDB/Python - Date in collection (to use for query) - python

I just started using mongoDB (Version 3.6.8) today, and I like it.
I was reading that it should be possible to have a date object directly in the database, but I can't get it to work.
Also I was wondering if it is the best solution or if I should just store my dates as "Epoch millis" instead?
I am trying to use the $dateFromString keyword, which should work, but I receive this error:
bson.errors.InvalidDocument: key '$dateFromString' must not start with '$'
My code looks like this:
from datetime import date
import pymongo
dbcli = pymongo.MongoClient('mongodb://192.168.1.8:27017')
db = dbcli['washbase']
col = db['machine']
def conv(dato):
    return {
        '$dateFromString': {
            'dateString': dato,
            'format': '%Y-%m-%d',
            'timezone': 'Europe/Copenhagen',
        }
    }
today = date.today().isoformat()
data = {
    'day': conv(today),
    'time': 12,
    'room': '2B',
}
col.insert_one(data)
The reason I need something like a date object in the database is that I want to run a conditional query on the data, so that the database only sends the data I require. I expect to do something like this:
results = col.find(
    {
        'day': {
            '$gt': {
                '$date': '2020-01-01'
            }
        }
    }
)
for x in results:
    print(x)
But when I do this the app prints nothing.

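$dateFromString is an operator for MongoDB aggregations, so it is only evaluated inside an aggregation pipeline. In an insert it is just a field name, and the driver rejects it because field names must not start with '$'; that is the InvalidDocument error you are seeing. An aggregation is a powerful way to build complex queries in MongoDB, but it is probably not what you need here.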
I would recommend storing the dates as native date values. PyMongo maps a Python datetime.datetime to a BSON date (a plain datetime.date cannot be encoded), so your code should look something like this:
from datetime import datetime
import pymongo
dbcli = pymongo.MongoClient('mongodb://192.168.1.8:27017')
db = dbcli['washbase']
col = db['machine']
# Midnight today as a datetime.datetime, which PyMongo stores as a BSON date
today = datetime.now().replace(hour=0, minute=0, second=0, microsecond=0)
data = {
    'day': today,
    'time': 12,
    'room': '2B',
}
col.insert_one(data)
If you are concerned about timezones, MongoDB stores every date in UTC. PyMongo converts a timezone-aware datetime to UTC before storing it and assumes that a naive datetime is already in UTC. When reading the dates, you can then convert them to whatever timezone you need.
EDIT:
When writing your query, use an actual datetime object rather than a nested '$date' document or a string, so that the value reaches the database as a real date:
from datetime import datetime, timedelta
today = datetime.now().replace(hour=0, minute=0, second=0, microsecond=0)
col.find({'day': {'$gte': today}})
If you're trying to find entries that fall within a date range, you can do something like:
col.find({'day': {'$gte': today, '$lte': today + timedelta(hours=24)}})
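As a rough sketch of the date-range idea above, the bounds for one local calendar day can be computed with the standard library alone and then handed to col.find(). The helper name day_range_filter and its utc_offset_hours parameter are made up for illustration; a fixed offset stands in for a real time zone such as Europe/Copenhagen, so daylight saving time is ignored here.

```python
from datetime import date, datetime, time, timedelta, timezone


def day_range_filter(day, utc_offset_hours=0):
    """Build a filter for documents whose 'day' falls on the given local date.

    utc_offset_hours is a hypothetical parameter: a fixed UTC offset used
    instead of a real time zone, so daylight saving time is not handled.
    """
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    # Local midnight at the start of the day, as a timezone-aware datetime
    start_local = datetime.combine(day, time.min, tzinfo=local_tz)
    # Express both bounds in UTC, which is how BSON dates are stored
    start_utc = start_local.astimezone(timezone.utc)
    end_utc = start_utc + timedelta(days=1)
    # Half-open range: start <= day < start + 24 hours
    return {'day': {'$gte': start_utc, '$lt': end_utc}}


query = day_range_filter(date(2020, 1, 1), utc_offset_hours=1)
print(query)
```

The resulting dictionary could then be passed as col.find(query). The upper bound uses $lt rather than $lte so that midnight of the following day is not counted twice when consecutive days are queried.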

Related

How to prevent Pandas to_dict() from converting timestamps to string?

I have a dataframe with a date field which appears to be represented as Unix timestamps. When I call df.to_dict() on it, the dates are converted to strings like yyyy-mm-dd. How can I prevent this from happening?
I'm using the code to return a JSON in my FastAPI app ...
df_results = pd.read_sql_query(sql_query_str, _engine)
return_object["results"] = df_results.to_dict(orient='records')
# outputs "date": "2021-12-31" in the json
return_object["results"] = json.loads(df_results.to_json(orient='records'))
# outputs "date": 1640908800000 in the json
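You can convert the data type before using .to_dict(). Casting to an integer keeps the Unix timestamp. Casting the whole frame only works if every column can be converted, so it is usually safer to cast just the date column. Note that a datetime64[ns] column casts to nanoseconds since the epoch, so divide by 10**6 if you want the milliseconds that to_json produces, e.g.
df_results['date'] = df_results['date'].astype('int64') // 10**6
return_object["results"] = df_results.to_dict(orient='records')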
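To see how the two representations in the question relate, here is a small standard-library sketch (no pandas needed) that converts between a datetime and epoch milliseconds. The helper names to_epoch_millis and from_epoch_millis are illustrative, and the datetimes are assumed to be timezone-aware.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(dt):
    """Return whole milliseconds since the Unix epoch for an aware datetime."""
    # Integer division of timedeltas avoids floating-point rounding
    return (dt - EPOCH) // timedelta(milliseconds=1)


def from_epoch_millis(ms):
    """Return the timezone-aware UTC datetime for an epoch-milliseconds value."""
    return EPOCH + timedelta(milliseconds=ms)


d = datetime(2021, 12, 31, tzinfo=timezone.utc)
print(to_epoch_millis(d))                                   # 1640908800000
print(from_epoch_millis(1640908800000).date().isoformat())  # 2021-12-31
```

The value 1640908800000 from the to_json output is therefore the same instant as the "2021-12-31" string, just expressed as milliseconds since the epoch in UTC.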

How to use regex on date to extract entry by year?

My entries in MongoDB have a publishedDate field as follows:
publishedDate:"{'$date': '1999-08-01T00:00:00.000-0700'}"
How do I retrieve the entries via collection.find with $regex, using user's input for year?
From MongoDB 4.4 onward, we can write custom filters using the $function operator, so try this:
let yearRegex = /^1999/;
db.testCollection.find({
  $expr: {
    $function: {
      body: function(publishedDate, yearRegex) {
        return yearRegex.test(publishedDate);
      },
      args: [{ $toString: "$publishedDate" }, yearRegex],
      lang: "js"
    }
  }
});
Note: Instead of $toString we can also use $dateToString with timezone to cover edge cases.
"{'$date': '1999-08-01T00:00:00.000-0700'}" looks like MongoDB extended JSON notation for a Datetime object.
If the data in the collection is actually a date, note that the timezone in the database will be UTC, so the start/end would be off by a few hours if you intended to use any other timezone.
You can build a date object for the beginning of the year, and another for the beginning of the following year, and query for dates between:
let queryYear = 1999
db.collection.find({
  publishedDate: {
    $gte: new Date( queryYear + "-01-01T00:00:00-0700" ),
    $lt: new Date( (queryYear+1) + "-01-01T00:00:00-0700" )
  }
})
This allows you to build a date object with the desired timezone, and this query can also make use of an index on the publishedDate field.
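For readers using PyMongo rather than the shell, the same year range can be sketched in Python. This assumes publishedDate holds real dates; the helper name year_range_filter and its utc_offset_hours parameter are illustrative, with -7 mirroring the -0700 offset in the question.

```python
from datetime import datetime, timedelta, timezone


def year_range_filter(year, utc_offset_hours=-7):
    """Build a filter matching dates within the given year at a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    start = datetime(year, 1, 1, tzinfo=tz)    # first instant of the year
    end = datetime(year + 1, 1, 1, tzinfo=tz)  # first instant of the next year
    # Half-open range, so an index on publishedDate can still be used
    return {'publishedDate': {'$gte': start, '$lt': end}}


query_year = 1999
year_filter = year_range_filter(query_year)
print(year_filter)
```

The dictionary could be passed to collection.find(year_filter). PyMongo converts timezone-aware datetimes to UTC when it encodes them, so the -0700 boundaries end up as 07:00 UTC on January 1.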

PyMongo: Unable to update entries older than 7 days. Need to handle ISODate

I'm trying to create a script that will update entries in Mongo that have a pending status and that were created more than 7 days ago. However, I'm having issues that seem to stem from how the creation date is stored.
I run the following command in mongodb:
db.jobs.find( {"$and":[{"status": "PENDING"},{"createdOn":{"$lt":ISODate('2020-11-30T00:00:00.00000')}}]})
Where the ISODate is 7 days ago. I get the entries created before then (note: I'm not sure why $lt works for this, but when I use $gt, I do not get any results). The createdOn field in the entries returned looks like this:
"createdOn" : ISODate("2020-11-20T18:50:40.062Z")
When I run a similar batch of code in python:
from pymongo import MongoClient
import pymongo
from datetime import datetime, timedelta
newDate = datetime.utcnow() - timedelta(days=7)
pendingJobs = list(db.jobs.find( {"$and":[{"state": "PENDING"},{"createdOn":{"$lt":newDate}}]}))
print(pendingJobs)
The date is returned in the following format:
'createdOn': datetime.datetime(2020, 11, 20, 18, 50, 40, 62000)
This seems to be preventing me from using pymongo to update the status field with:
db.jobs.update( {"$and":[{"state": "PENDING"},{"createdOn":{"$lt":newDate}}]}, { "$set": {"status":"FAILED"} })
Since pymongo runs with datetime, and Mongo runs with ISODate. How can I account for this? I've already tried the following with no effect:
formatting newDate as an ISODate
isoDate = newDate.isoformat()
db.jobs.update( {"$and":[{"state": "PENDING"},{"createdOn":{"$lt":isoDate}}]}, { "$set": {"status":"FAILED"} })
Trying to incorporate "ISODate" into the string itself:
db.jobs.update( {"$and":[{"state": "PENDING"},{"createdOn":{"$lt":"ISODate('"+isoDate+"')"}}]}, { "$set": {"status":"FAILED"} })
The PyMongo driver maps a Python datetime.datetime to a BSON date, which you see represented in the shell as an ISODate. So your first approach is fine; there is no need to do anything fancy with the dates.
It is worth noting that MongoDB filters are ANDed by default, so you can simplify your query to:
db.jobs.find({"state": "PENDING", "createdOn": {"$lt": newDate}})
I suspect your root cause is that you are querying on state but updating status.
This code sample should work:
from pymongo import MongoClient
from datetime import datetime, timedelta
db = MongoClient()['mydatabase']
# Set up some sample data
for days in range(8):
    db.jobs.insert_one({'state': 'PENDING', 'createdOn': datetime.utcnow() - timedelta(days=days)})
newDate = datetime.utcnow() - timedelta(days=7)
db.jobs.update_many({"state": "PENDING", "createdOn": {"$lt": newDate}}, {'$set': {'state': 'FAILED'}})
pendingJobs = list(db.jobs.find({'state': 'FAILED'}))
print(pendingJobs)
prints:
[{'_id': ObjectId('5fce82c2916f9131fec02966'), 'state': 'FAILED', 'createdOn': datetime.datetime(2020, 11, 30, 19, 30, 10, 393000)}]
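A small sketch of how the filter and update documents might be built separately from the database call, which makes the state/status mismatch easier to spot. The helper name stale_pending_filter is hypothetical, and the "now" value is fixed only so the example is reproducible. It uses a timezone-aware UTC datetime because datetime.utcnow() is deprecated as of Python 3.12.

```python
from datetime import datetime, timedelta, timezone


def stale_pending_filter(now, days=7):
    """Build a filter for PENDING jobs created more than `days` days before `now`."""
    cutoff = now - timedelta(days=days)
    # Keys in one filter document are ANDed, so no explicit $and is needed
    return {"state": "PENDING", "createdOn": {"$lt": cutoff}}


# In real use this would be datetime.now(timezone.utc)
now = datetime(2020, 12, 7, tzinfo=timezone.utc)
job_filter = stale_pending_filter(now)
# The update touches the same field that the filter reads
update = {"$set": {"state": "FAILED"}}

print(job_filter["createdOn"]["$lt"].isoformat())  # 2020-11-30T00:00:00+00:00
```

These two dictionaries could then be passed as db.jobs.update_many(job_filter, update). The cutoff stays a datetime object throughout; converting it with isoformat() before querying would turn it into a string, which does not compare against stored dates.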

Pymongo date query in "DD/MM/YYYY" format

I've got a mongodb collection with one of the fields like:
u'Date': u'15/03/2016'
Now from my understanding u means that strings are simply unicode which is fine. I'm using Django and pymongo to allow users to select 2 dates and query my DB for stuff between those two dates like that:
number = coll.find({"Date": {'$gt': startDate}, "Date": {'$lt': endDate}}).count()
Where my startDate and endDate are both in formats "DD/MM/YYYY". What I receive back however is some rubbish data. How to correctly query for dates in python-mongo?
P.S From my understanding comma between those two results in 'AND' query right?
If the dates are stored as strings in the DB, then when you use the $gt or $lt operator, the operation is a string comparison, not a date comparison. This means that, for example, "15/03/2016" < "16/02/2016", because 5 comes before 6 in lexical order.
For the string comparison to work, the dates would need to be stored in a format in which an earlier date is always represented by a "smaller" string, for example YYYY/MM/DD.
So if you don't want to do the comparison in Python, you could either change the date format or store the date as a date in the DB. But in both cases, this means changing the DB...
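The ordering problem described above can be checked with plain Python, no database required. The sketch also illustrates a separate pitfall in the question's query: a Python dict literal keeps only the last value for a repeated key, so writing "Date" twice does not produce an AND. The helper name to_sortable is made up for illustration.

```python
from datetime import datetime

a, b = "15/03/2016", "16/02/2016"
date_format = "%d/%m/%Y"

# Compared as strings, the day digits decide, so 15 March sorts before 16 February
print(a < b)  # True

# Compared as dates, 15 March is after 16 February
print(datetime.strptime(a, date_format) < datetime.strptime(b, date_format))  # False


def to_sortable(s):
    """Rewrite DD/MM/YYYY as YYYY/MM/DD, whose string order matches date order."""
    return datetime.strptime(s, date_format).strftime("%Y/%m/%d")


print(to_sortable(a) < to_sortable(b))  # False

# A repeated key in a dict literal silently overwrites the earlier value
query = {"Date": {"$gt": a}, "Date": {"$lt": b}}
print(query)  # {'Date': {'$lt': '16/02/2016'}}
```

Putting both bounds under a single key, as in {"Date": {"$gt": startDate, "$lt": endDate}}, expresses the AND that the question intended, although the comparison is still only meaningful once the stored values sort correctly.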
If doing in Python is OK, then you can do it like so:
from datetime import datetime
from functools import reduce  # reduce is no longer a builtin in Python 3
date_format = "%d/%m/%Y"
start_date = datetime.strptime(startDate, date_format)
end_date = datetime.strptime(endDate, date_format)
# Fetch every document and compare the parsed dates on the client side
items = coll.find({})
def compare(c, item):
    item_date = datetime.strptime(item['Date'], date_format)
    if start_date < item_date < end_date:
        return c+1
    else:
        return c
count = reduce(compare, items, 0)

Django filter by datetime, after converting string to datetime

I'm trying to filter my query by a datetime. This datetime marks the value range the customer wants information for, and I'm trying to set it to the first of the month selected by the customer. I pass the month number, convert it to the correct string format, and then convert that to a datetime object, because simply looking for the string was returning no values and Django's documentation says you need to do it like:
pub_date__gte=datetime(2005, 1, 30)
This is the code I use to get the selected customer date and turn it into a datetime:
if 'billing-report' in request.POST:
    customer_id = int(post_data['selected_customer'])
    selected_date = int(post_data['month'])
    if selected_date < 10:
        selected_date = '0'+str(selected_date)
    year = datetime.now()
    year = year.year
    query_date = str(year) + '-' + str(selected_date) + '-01'
    query_date_filter = datetime.strptime(query_date, "%Y-%m-%d")
    compute_usages = ComputeUsages.objects.filter(customer_id = customer_id).filter(values_date = query_date_filter)
Django debug shows: datetime.datetime(2014, 10, 1, 0, 0)
query_date looks like: 2014-07-01 before it is converted.
No error, but no data is returned.
I used to use:
compute_usages = ComputeUsages.objects.filter(customer_id = customer_id).filter(values_date = datetime(query_date_filter))
which was causing the error. I'm sorry for changing my question as it evolved; that is why I'm re-including what I was doing before, so the comments make sense.
Almost all of that code is irrelevant to your question.
I don't understand why you are calling datetime on query_date_filter. That is already a datetime, as you know because you converted it to one with strptime earlier. So there's no need for any more conversion:
ComputeUsages.objects.filter(customer_id=customer_id).filter(values_date=query_date_filter)
Well, after spending some time exploring setting the query filter to datetime(year, month, day), I came to the realization that Django doesn't convert the value to a neutral datetime format; it has to match the stored value exactly. Also, my data in the database had been stored in year, day, month order.
Learning point:
The datetime() you pass has to match what is in the database exactly; Django does not convert to a neutral format and then compare. I had assumed it was like writing a query with to_date or to_timestamp, where the DB takes your format and converts it to a neutral format to compare against the rest of the DB. Note that Python's constructor signature is datetime(year, month, day), so passing the day in the month position only matches here because the stored values have day and month swapped.
Here is what worked for me:
compute_usages = ComputeUsages.objects.filter(customer_id = customer_id).filter(values_date = datetime(year, day, selected_month))
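As an alternative to building a date string and parsing it back, the first of the selected month can be constructed directly, and a half-open range avoids depending on an exact match. This is only a sketch: the helper name month_bounds is made up, and values_date__gte / values_date__lt assume the values_date field from the question together with Django's standard gte and lt lookups.

```python
from datetime import datetime


def month_bounds(year, month):
    """Return (start, end): the first of the month and the first of the next month."""
    start = datetime(year, month, 1)  # argument order is year, month, day
    if month == 12:
        end = datetime(year + 1, 1, 1)
    else:
        end = datetime(year, month + 1, 1)
    return start, end


start, end = month_bounds(2014, 7)
# Keyword arguments that could be unpacked into a Django filter() call
filter_kwargs = {'values_date__gte': start, 'values_date__lt': end}
print(filter_kwargs)
```

With a model like the one in the question, this might be used as ComputeUsages.objects.filter(customer_id=customer_id, **filter_kwargs). It only returns sensible results if the stored dates have month and day in the right positions.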
