I want to fetch newly added documents from MongoDB collections that have no timestamp field. I guess my only option is the ObjectId field. I am using the test dataset from GitHub: "https://raw.githubusercontent.com/mongodb/docs-assets/primer-dataset/primer-dataset.json"
For example, if I add new data to this collection, how can I fetch or find these new documents?
Some MongoDB collections have a timestamp field, and I use that timestamp to get new values. But I do not know how to find new documents without a timestamp.
Example dataset: (screenshot of sample documents omitted)
I want a filter like this, but it doesn't work:
{_id: {$gt: '622e04d69edb39455e06d4af'}}
If you don't want to create a new field in the document, you can keep an in-memory list of the most recently inserted IDs:
SomeGlobalObj = [] // list of the most recent ObjectIds, length limit is 10
// you will need Redis or other outside storage if you have multiple servers
SomeGlobalObj.unshift(newDocumentId)       // put the newest ID first
SomeGlobalObj = SomeGlobalObj.slice(0, 10) // make sure to keep only the latest 10 IDs
Now, if you want to retrieve the latest documents, you can query by the IDs in this array.
If a document should no longer count as "new" once it has been checked, you can remove its ID from this array after the query.
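For example, a minimal pymongo sketch of that lookup (it assumes SomeGlobalObj holds bson ObjectId values and that collection is your pymongo collection object):
from bson.objectid import ObjectId

# Hypothetical cache of recently inserted IDs, newest first
some_global_obj = [ObjectId("622e04d69edb39455e06d4af")]

# Fetch the cached "new" documents in one round trip
new_docs = list(collection.find({"_id": {"$in": some_global_obj}}))

# If "new" should expire once checked, forget the IDs after the query
some_global_obj.clear()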
In the comments you mentioned that you want to do this using Python, so I shall answer from that perspective.
In Mongo, an ObjectId is composed of 3 sections:
a 4-byte timestamp value, representing the ObjectId's creation, measured in seconds since the Unix epoch
a 5-byte random value generated once per process. This random value is unique to the machine and process.
a 3-byte incrementing counter, initialized to a random value
Because of this, we can use the ObjectId to sort or filter by creation timestamp. To construct an ObjectId for a specific date, we can use the following code:
import datetime
from bson.objectid import ObjectId

gen_time = datetime.datetime(2010, 1, 1)
dummy_id = ObjectId.from_datetime(gen_time)
result = collection.find({"_id": {"$lt": dummy_id}})
Source: objectid - Tools for working with MongoDB ObjectIds
This example will find all documents created before 2010/01/01. Substituting $gt for $lt makes the query work the way you want: it returns everything created after the given time.
If you need to get the timestamp from an ObjectId, you can use the following code:
created_at = myObjectId.generation_time
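Putting this together, here is a minimal polling sketch with pymongo; the connection string and the test/restaurants database and collection names are assumptions based on the primer dataset, and last_seen_id stands for whatever _id you stored from the previous run:
from pymongo import MongoClient
from bson.objectid import ObjectId

client = MongoClient("mongodb://localhost:27017")     # assumed connection string
collection = client["test"]["restaurants"]            # assumed database/collection names

last_seen_id = ObjectId("622e04d69edb39455e06d4af")   # newest _id seen in the previous poll

# Documents inserted after last_seen_id, oldest first
for doc in collection.find({"_id": {"$gt": last_seen_id}}).sort("_id", 1):
    last_seen_id = doc["_id"]                         # remember the newest _id for the next poll
    print(doc)
Note that the value compared against _id must be an ObjectId, not the plain string '622e04d69edb39455e06d4af'; comparing _id against a string is why the filter in the question did not behave as expected.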
Newbie working with Db2 here. I am developing a Python script using the ibm_db package. I have a SELECT query where I am binding params using ibm_db.bind_param(stmt, 1, param1) and then doing result = ibm_db.execute(stmt). How can I get the results from the query? The documentation is scarce on this topic. Would appreciate any example code.
After ibm_db.execute(stmt) you need to fetch the data from the result set.
Try this:
data = ibm_db.fetch_assoc(stmt)
Fetch data from a result set by calling one of the fetch functions.
ibm_db.fetch_tuple: Returns a tuple, which is indexed by column position, representing a row in a result set. The columns are 0-indexed.
ibm_db.fetch_assoc: Returns a dictionary, which is indexed by column name, representing a row in a result set.
ibm_db.fetch_both: Returns a dictionary, which is indexed by both column name and position, representing a row in a result set.
ibm_db.fetch_row: Sets the result set pointer to the next row or requested row. Use this function to iterate through a result set.
Study the examples of fetching result sets in Python with ibm_db in the Db2 Knowledge Center online.
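For example, a minimal sketch of the bind/execute/fetch flow; the connection string, table, and column names are made up for illustration:
import ibm_db

conn = ibm_db.connect("DATABASE=sample;HOSTNAME=localhost;PORT=50000;UID=db2user;PWD=secret;", "", "")

stmt = ibm_db.prepare(conn, "SELECT id, name FROM mytable WHERE id > ?")
ibm_db.bind_param(stmt, 1, 100)      # bind the first parameter marker
ibm_db.execute(stmt)

row = ibm_db.fetch_assoc(stmt)       # dictionary keyed by column name
while row:
    print(row["ID"], row["NAME"])    # Db2 usually reports unquoted column names in upper case
    row = ibm_db.fetch_assoc(stmt)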
I have been trying this for hours now, but it is not working as expected.
I am pushing data to Elasticsearch via a Python script. Below are some fields I want stored as integers, but they are not being stored as integers: sometimes they are of None type, otherwise they are strings. So I did this:
body['fuel_fee'] = int(rows[a][23] or 0)
body['late_fee'] = int(rows[a][24] or 0)
body['other_fee'] = int(rows[a][26] or 0)
But I see that they are still being stored as strings in Elasticsearch, and I want to run sum aggregations on these fields.
I even deleted the index and rewrote all the data, so I can confirm that there is no issue with previous mappings here.
Why am I not getting these fields as integers? How can I get this done?
EDIT - I am fetching the data from a Postgres database, where these fields are stored as strings, not integers. Can that have any effect? I think not, since I am type-casting here in Python.
The datatype of a field is determined in either of the following ways:
When you create mappings (before indexing any real data) and explicitly tell Elasticsearch about the field type. In your example, the field fuel_fee would be mapped to long, and any record containing a non-integral value would throw an error.
Based on the first document indexed, elasticsearch determines the field type. It tries to convert the subsequent document field values to the same type thereafter.
Coming back to your question: how do you know that all your fields are stored as strings and not integers? Try GET <your-index>/_mapping and check whether your assumption is correct.
If the problem persists, try either of the following:
Create mappings before indexing any data (see the sketch below).
Index only 1 document (with Kibana or through the curl API) and check the mapping output again.
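For reference, a minimal sketch of creating an explicit mapping and then indexing a document with the official elasticsearch Python client; the index name, field names, and the 7.x-style body argument are assumptions:
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")   # assumed local cluster

# Explicit mapping created before any data is indexed
es.indices.create(
    index="fees",
    body={
        "mappings": {
            "properties": {
                "fuel_fee": {"type": "integer"},
                "late_fee": {"type": "integer"},
                "other_fee": {"type": "integer"},
            }
        }
    },
)

# Index a document whose values have already been cast to int in Python
doc = {"fuel_fee": 12, "late_fee": 0, "other_fee": 3}
es.index(index="fees", body=doc)

# Verify what Elasticsearch actually recorded for the field types
print(es.indices.get_mapping(index="fees"))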
(Using Django 1.11.2, Python 2.7.10, MySQL 5.7.18.)
If we imagine a simple model:
class Event(models.Model):
    happened_datetime = models.DateTimeField()
    value = models.IntegerField()
What would be the most elegant (and quickest) way to run something similar to:
res = Event.objects.all().aggregate(
    Avg('happened_datetime')
)
But that would be able to extract the average time of day for all members of the queryset. Something like:
res = Event.objects.all().aggregate(
    AvgTimeOfDay('happened_datetime')
)
Would it be possible to do this on the DB directly, i.e., without running a long loop client-side over each queryset member?
EDIT:
There may be a solution, along those lines, using raw SQL:
select sec_to_time(avg(time_to_sec(extract(HOUR_SECOND from happened_datetime)))) from event_event;
Performance-wise, this runs in 0.015 seconds for ~23k rows on a laptop, not optimised, etc. Assuming it yields accurate/correct results, and since time is only a secondary factor here, could I use that?
Add another integer field to your model that contains only the hour of the day, extracted from happened_datetime.
When creating/updating a model instance you need to update this new field whenever happened_datetime is set or updated. You can extract the hour of the day, for example, by reading datetime.datetime.hour, or use strftime to create a value to your liking.
Aggregation should then work as you proposed; see the sketch below.
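A minimal sketch of that approach, assuming the extra column is called happened_hour (the save() override and field name are illustrative, not from the question):
from django.db import models
from django.db.models import Avg


class Event(models.Model):
    happened_datetime = models.DateTimeField()
    happened_hour = models.IntegerField()  # denormalised hour of day
    value = models.IntegerField()

    def save(self, *args, **kwargs):
        # Keep the extra column in sync with the datetime
        self.happened_hour = self.happened_datetime.hour
        super(Event, self).save(*args, **kwargs)


# Average hour of day over all events, computed by the database
res = Event.objects.aggregate(avg_hour=Avg('happened_hour'))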
EDIT:
Django's ORM has Extract() as a database function. Adapting the example from the docs to your use case:
>>> from django.db.models import Avg
>>> from django.db.models.functions import Extract
>>> # Average hour of day across all events
>>> Event.objects.aggregate(avg_hour=Avg(Extract('happened_datetime', 'hour')))
(Not tested!)
https://docs.djangoproject.com/en/1.11/ref/models/database-functions/#extract
So after a little searching and a few tries, the code below seems to work. Any comments on how to improve it (or hints as to why it is completely wrong) are welcome! :-)
res = Event.objects.raw('''
    SELECT id, sec_to_time(avg(time_to_sec(extract(HOUR_SECOND from happened_datetime)))) AS average_time_of_day
    FROM event_event
    WHERE happened_datetime BETWEEN %s AND %s;''', [start_datetime, end_datetime])
print res[0].__dict__
# {'average_time_of_day': datetime.time(18, 48, 10, 247700), '_state': <django.db.models.base.ModelState object at 0x0445B370>, 'id': 9397L}
Now, the id returned is that of the last object falling in the datetime range of the WHERE clause. I believe Django just requires it to be selected because of "InvalidQuery: Raw query must include the primary key".
Quick explanation of the chain of SQL function calls:
extract(HOUR_SECOND ...) pulls the HH:MM:SS part out of each datetime value.
time_to_sec converts those time values to seconds.
avg averages all the seconds values.
sec_to_time converts the averaged seconds back into time format (HH:MM:SS).
I don't know why Django insists on returning microseconds, but that is not really relevant (maybe the local microseconds at which the time object was instantiated?).
Performance note: this seems to be extremely fast, but then again I haven't benchmarked it. Any insight would be kindly appreciated :)
I have a Python program that uses historical data from a database and lets the user select the date inputs. However, not all possible dates are available in the database, since these are financial data: in other words, if the user inserts "02/03/2014" (which is a Sunday), he won't find any record in the database, because the stock exchange was closed that day.
This causes SQL problems: when the record is not found, the SQL statement fails and the user needs to adjust the date until he finds an existing record. To avoid this, I would like to build an algorithm that adjusts the date input itself, choosing the date closest to the original input. For example, if the user inserts "02/03/2014", the closest would be "03/03/2014".
I have thought about something like this, where the table MyDates contains date values only (I'm still working on the proper syntax, but it's just to show the idea):
import sqlite3 as lite
from datetime import datetime

con = lite.connect('C:/.../MyDatabase.db')
cur = con.cursor()
cur.execute('SELECT * FROM MyDates')
rowsD = cur.fetchall()
data = []
for row in rowsD:
    data.append(row[0])
>>> data
['01/01/2010', '02/01/2010', .... '31/12/2013']
inputDate = datetime.strptime('07/01/2010', '%d/%m/%Y')
differences = []
for i in range(0, len(data)):
    differences.append(abs(datetime.strptime(data[i], '%d/%m/%Y') - inputDate))
After that, I was thinking about:
getting the minimum value from the vector differences: mV = min(differences)
getting the corresponding date value into the list data
However, this approach is costly in two ways:
I need to load the whole database, which is huge;
I have to iterate many times (once to build the list data, then again for the list of differences, etc.).
Does anyone have a better idea for building this, or know a different approach to the problem?
Query the database for the dates that are smaller than the input date and take the maximum of those. This gives you the closest date before the input.
Symmetrically, query for the minimum of the larger dates to get the closest date after, and keep whichever of the two is nearer.
These should be efficient queries.
SELECT MAX(Date)
FROM MyDates
WHERE Date <= InputDate;
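A minimal sketch of combining the two queries in Python with sqlite3; it assumes the Date column is stored in a sortable format such as ISO 8601 ('YYYY-MM-DD'), which the DD/MM/YYYY strings above are not, so you may need to convert them first:
import sqlite3
from datetime import datetime

con = sqlite3.connect('MyDatabase.db')   # path assumed
cur = con.cursor()
input_date = '2014-03-02'

cur.execute("SELECT MAX(Date) FROM MyDates WHERE Date <= ?", (input_date,))
before = cur.fetchone()[0]
cur.execute("SELECT MIN(Date) FROM MyDates WHERE Date >= ?", (input_date,))
after = cur.fetchone()[0]

def distance(d):
    return abs(datetime.strptime(d, '%Y-%m-%d') - datetime.strptime(input_date, '%Y-%m-%d'))

# Keep whichever neighbouring date exists and is closer to the input
candidates = [d for d in (before, after) if d is not None]
closest = min(candidates, key=distance) if candidates else None
print(closest)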
I would try to get the record with the maximum date smaller than the given one directly from the database (this can be done in SQL). If you put an index on the date column, this can be done in O(log(n)). That is of course not quite the same as "being closest", but if you combine it with the minimum date bigger than the given one, you achieve it.
Also, if you know more or less the distribution of your data, for example that within any 7 consecutive days you always have some data, then you can restrict the search to a smaller range such as [-3 days, +3 days]; see the sketch below.
Combining both of these ideas should give you quite good performance.
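A small sketch of that range-restricted lookup with sqlite3; ISO-formatted dates and an index on the Date column are assumptions:
import sqlite3

con = sqlite3.connect('MyDatabase.db')   # path assumed
cur = con.cursor()
cur.execute("CREATE INDEX IF NOT EXISTS idx_mydates_date ON MyDates(Date)")

input_date = '2014-03-02'
# Only consider dates within +/- 3 days of the input and pick the nearest one
cur.execute("""
    SELECT Date
    FROM MyDates
    WHERE Date BETWEEN date(?, '-3 days') AND date(?, '+3 days')
    ORDER BY abs(julianday(Date) - julianday(?))
    LIMIT 1
""", (input_date, input_date, input_date))
row = cur.fetchone()
closest = row[0] if row else None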
So I am trying to set up a database whose rows will be modified frequently. Every hour, for instance, I want to append a number to a particular field of my database. So if self.checkmarks is stored in the database as 3, what is the best way to update this field by appending another number so that self.checkmarks now holds 3, 2? I tried declaring the column as db.Array but got an attribute error:
AttributeError: 'SQLAlchemy' object has no attribute 'Array'
I have found how to update a database, but not the best way to update by appending to a list rather than replacing the value. My approach was as follows, but I don't think append will work, because the column cannot be an array:
ven = data.query.filter_by(venid=ven['id']).first()
ven.totalcheckins = ven.totalcheckins.append(ven['stats']['checkinsCount'])
db.session.commit()
Many thanks in advance
If you really want to have a Python list as a column in SQLAlchemy, you will want to have a look at the PickleType:
array = db.Column(db.PickleType(mutable=True))
Please note that you will have to use the mutable=True parameter to be able to edit the column. SQLAlchemy will detect changes automatically, and they will be saved as soon as you commit.
If you want the pickled value to be human-readable, you can combine it with json or another converter that suits your purposes.
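As a sketch, here is how the append-and-commit pattern from the question might look with Flask-SQLAlchemy. Note that depending on your SQLAlchemy version the mutable=True flag may not be accepted by PickleType any more; in that case the MutableList wrapper from sqlalchemy.ext.mutable does the change tracking instead. The model name and the db instance are assumptions; venid, totalcheckins, and the ven dict come from the question:
from sqlalchemy.ext.mutable import MutableList

class Data(db.Model):                       # model name is illustrative
    id = db.Column(db.Integer, primary_key=True)
    venid = db.Column(db.String(64))
    # A pickled Python list whose in-place changes are tracked by MutableList
    totalcheckins = db.Column(MutableList.as_mutable(db.PickleType), default=list)

# Append to the list and commit; the mutation is detected and persisted
ven_row = Data.query.filter_by(venid=ven['id']).first()
ven_row.totalcheckins.append(ven['stats']['checkinsCount'])
db.session.commit()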