How to query a couchdb view using a composite key? - python

I have a couchdb view "record_by_date_product" with the following definition:
function(doc) {
  emit([doc.logtime, doc.product_id], doc);
}
I am trying to run a query which is something like:
(logtime > fromdate & logtime < todate) & product_id in (1,2,6)
Is this possible with this view?
I am also using couchdb python library to access couchdb. Here is a code snippet:
server = couchdb.Server()
db = server['mydb']
results = db.view('_design/record_by_date_product/_view/record_by_date_product')
This page http://packages.python.org/CouchDB/client.html#viewresults specifies that we can use a startkey and endkey. But I am not able to get it working.
Thanks

I think I just found the exact answer:
Create a design document 'sampleview' whose views look like this:
{
"records_by_date_product": {
"map": "function(doc) {\n emit([doc.prod_id, doc.logtime], doc);\n}"
}
}
Let us say that the query parameters are:
prod_id in [1,2]
from_date = '2010-01-01 00:00:00'
to_date = '2010-01-02 00:00:00'
Then you will have to run 2 separate queries on the same view:
http://localhost:5984/db/_design/sampleview/_view/records_by_date_product?startkey=[1,"2010-01-01%2000:00:00"]&endkey=[1,"2010-01-02%2000:00:00"]
http://localhost:5984/db/_design/sampleview/_view/records_by_date_product?startkey=[2,"2010-01-01%2000:00:00"]&endkey=[2,"2010-01-02%2000:00:00"]
Notice that the same query is run each time except that the prod_id is changed in the second query. The results have to be collated later. Hope this helps!
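The per-product requests above can also be generated from Python. The following is a minimal sketch using only the standard library; the function name `range_query_urls` is hypothetical, and the base URL is simply copied from the example requests. It builds one startkey/endkey pair per product ID and JSON-encodes each key, since view keys are passed as JSON.

```python
import json
from urllib.parse import quote, urlencode

# Base URL copied from the example requests above; adjust for your own database.
VIEW_URL = "http://localhost:5984/db/_design/sampleview/_view/records_by_date_product"


def range_query_urls(prod_ids, from_date, to_date, view_url=VIEW_URL):
    """Return one view URL per product ID, each covering from_date..to_date."""
    urls = []
    for prod_id in prod_ids:
        params = {
            # Keys are JSON arrays such as [1, "2010-01-01 00:00:00"].
            "startkey": json.dumps([prod_id, from_date]),
            "endkey": json.dumps([prod_id, to_date]),
        }
        # quote_via=quote encodes spaces as %20 rather than '+'.
        urls.append(view_url + "?" + urlencode(params, quote_via=quote))
    return urls


urls = range_query_urls([1, 2], "2010-01-01 00:00:00", "2010-01-02 00:00:00")
for url in urls:
    print(url)
```

With the couchdb Python library, the same key lists can presumably be passed as the `startkey` and `endkey` options that the client documentation linked in the question describes, one call per product, with the results collated afterwards.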

That exact query is not possible. As the documentation suggests, you can get everything in a view in a particular key range. Views are sorted data structures, so all CouchDB does to fulfill this request is locate the start key and begin returning items until you hit the end key.
The strategy you should use for this query depends on characteristics of the data itself. Most importantly, will you waste a lot of time weeding out items if you use only the first part of the key (logtime) and iterate through those in Python, weeding out items where product_id won't match? If so, you should consider writing another view that is primarily sorted by product_id. If not, go ahead and use the weed-out approach.
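As a rough illustration of the weed-out approach, the sketch below filters rows in Python after a range query on logtime alone. The row layout (a dict whose `key` is `[logtime, product_id]`) mirrors the view in the question; the sample rows and the function name `weed_out` are made up for illustration.

```python
def weed_out(rows, product_ids):
    """Keep only rows whose key's second element (product_id) is wanted."""
    wanted = set(product_ids)
    return [row for row in rows if row["key"][1] in wanted]


# Made-up rows, shaped like the output of a range query on logtime only.
rows = [
    {"key": ["2010-01-01 03:00:00", 1], "value": {"note": "a"}},
    {"key": ["2010-01-01 04:00:00", 4], "value": {"note": "b"}},
    {"key": ["2010-01-01 05:00:00", 6], "value": {"note": "c"}},
]

matching = weed_out(rows, [1, 2, 6])
```

For the range query itself, a common idiom with composite keys is `startkey=[fromdate]` and `endkey=[todate, {}]`, because an object sorts after every other value in CouchDB's collation; treat that as something to verify against your own data.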

How about this solution:
1. Create a view for each product, with logtime as the index.
2. Access each view as required and filter the results using the range [fromdate, todate].
3. Repeat step 2 for each product in the input parameters and collate the results.
The drawback is that a view has to be created for every product, which looks like a manual process.
Just a thought! Let me know your views.

Related

MongoDB & web2py: Working with ObjectIds

I'm working on a very simple application as a use case for integrating MongoDB with web2py. In one section of the application, I'm interested in returning a list of products:
My database table:
db.define_table('products',
    Field('brand', label='Brand'),
    Field('photo', label='Photo'),
    ...
    Field('options', label='Options'))
My controller:
def products():
    qset = db(db['products'])
    grid = qset.select()
    return dict(grid=grid)
My view:
{{extend 'layout.html'}}
<h2>Product List</h2>
{{=grid}}
The products are returned without issue. However, the products._id field returns values in the form '26086541625969213357181461154'. If I switch to the shell (or python) and attempt to query my database based on those _ids, I can't find any of the products.
As you would expect, the _ids in the database are ObjectIds that look like this: '544a481b2ceb7c3093a173a2'. I'd like my view to return the ObjectIds and not the long strings. Simple, but I'm having trouble with it.
When constructing the DAL Row object for a given MongoDB record, the ObjectId is represented by converting to a long integer via long(str(value), 16). To convert back to an ObjectId, you can use the object_id method of the MongoDB adapter:
object_id = db._adapter.object_id('26086541625969213357181461154')
Of course, if you use the DAL to query MongoDB, you don't have to worry about this, as it handles the conversion automatically.
Although it makes perfect sense, I wasn't able to make Anthony's answer work. So, I just hacked it:
hex(value).replace("0x","").replace("L","")
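Both directions of the conversion can be sketched in plain Python 3 without web2py. This assumes, as described above, that the DAL id is the ObjectId's 24-character hex string read as a base-16 integer; the helper names are hypothetical.

```python
def object_id_hex_to_int(hex_string):
    """Read a 24-character ObjectId hex string as a base-16 integer."""
    return int(hex_string, 16)


def int_to_object_id_hex(value):
    """Format the integer back as 24 zero-padded lowercase hex digits."""
    return format(value, "024x")


original = "544a481b2ceb7c3093a173a2"
as_int = object_id_hex_to_int(original)
round_tripped = int_to_object_id_hex(as_int)
```

Unlike the `hex(...)` hack, `format` never emits the `0x` prefix or the Python 2 `L` suffix, and the zero padding keeps ids that begin with `0` at their full length.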

How to Group by id and Order By count in Django

I'm having trouble writing the correct Python code to do what I can already accomplish in MySQL.
Below is the SQL query that does exactly what I want. Where I get tripped up in Python is the GROUP BY statement.
SELECT COUNT(story_id) AS theCount, `headline`, `url` from tracking
GROUP BY `story_id`
ORDER BY theCount DESC
LIMIT 20
Here's what I have in Python so far. This queries all of the articles just fine, but it lacks any kind of group by or order_by() based on COUNT.
articles = ArticleTracking.objects.all().filter(date__range=(start_date, end_date))[:20]
article_info = []
for article in articles:
    this_value = {
        "story_id": article.story_id,
        "url": article.url,
        "headline": article.headline,
    }
    article_info.append(this_value)
The right way to do this is to use aggregation.
from django.db.models import Count
articles = ArticleTracking.objects.filter(date__range=(start_date, end_date))
articles = articles.values('story_id', 'url', 'headline').annotate(count=Count('story_id')).order_by('-count')[:20]
Also go through the aggregation documentation in Django.
https://docs.djangoproject.com/en/dev/topics/db/aggregation/
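To make the intent of the `values().annotate().order_by()` chain concrete, here is a plain-Python sketch of the same group, count, sort, and limit steps. It does not use Django at all; the sample rows and the function name `top_stories` are invented for illustration.

```python
from collections import Counter


def top_stories(rows, limit=20):
    """Group rows by story_id, count them, and return the most frequent first."""
    counts = Counter(row["story_id"] for row in rows)
    # Keep one headline/url per story to report alongside its count.
    info = {row["story_id"]: row for row in rows}
    return [
        {
            "story_id": story_id,
            "headline": info[story_id]["headline"],
            "url": info[story_id]["url"],
            "count": count,
        }
        for story_id, count in counts.most_common(limit)
    ]


# Invented tracking rows: story 9 was viewed twice, story 7 once.
rows = [
    {"story_id": 7, "headline": "A", "url": "/a"},
    {"story_id": 9, "headline": "B", "url": "/b"},
    {"story_id": 9, "headline": "B", "url": "/b"},
]
ranked = top_stories(rows)
```

One difference worth keeping in mind: in Django, `values('story_id', 'url', 'headline')` groups by all three fields together, so rows sharing a `story_id` but differing in `url` or `headline` are counted separately.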
Don't try this at home.
You can add a group_by clause to a queryset like this:
qs = ArticleTracking.objects.all().filter(date__range=(start_date, end_date))
qs.query.group_by = ['story_id']
articles = qs[:20]
This is not part of the public api, so it may change, and it may work differently (or be unavailable) depending on the particular db backend you're using. Worth mentioning that I'm not sure if applying the group_by clause before or after the filter makes any difference. I have had success with this with a MySQL backend, though.

Mongoengine... How can I compare two fields?

For example:
class Example(Document):
    up = IntField()
    down = IntField()
I want to retrieve documents whose up field is greater than or equal to down, but I can't work out how to express it.
My (incorrect) query would be:
Example.objects(up__gte=down)
How can I use a field that lives in MongoDB, rather than a Python value, as the value in a queryset filter?
Simple answer: not possible. Something like WHERE A = B in SQL is not doable in an efficient way in MongoDB (apart from using the $where clause which should be avoided).
This may be what you want:
db.myCollection.find( { $where: "this.credits == this.debits" } );
Have a look at: http://docs.mongodb.org/manual/reference/operator/query/where/
However, I do not know how to use it in mongoengine.
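For what it's worth, a `$where` filter is just a document, so it can be built in Python and handed to whatever driver is in use. The sketch below only constructs the filter dicts for the `up` and `down` fields from the question. The helper names are hypothetical, the `$expr` variant assumes MongoDB 3.6 or later, and passing either dict through mongoengine's `__raw__` keyword is an assumption to check against your installed version rather than something tested here.

```python
def where_filter(left, op, right):
    """Build a $where filter comparing two fields with a JavaScript expression."""
    return {"$where": "this.{} {} this.{}".format(left, op, right)}


def expr_filter(left, right):
    """Build a $expr filter for left >= right (assumes MongoDB 3.6+)."""
    return {"$expr": {"$gte": ["$" + left, "$" + right]}}


up_gte_down_js = where_filter("up", ">=", "down")
up_gte_down_expr = expr_filter("up", "down")
```

The intended use would be something like `Example.objects(__raw__=up_gte_down_js)`. Since `$where` runs JavaScript for every document, the earlier caution about avoiding it on large collections still applies.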

What is the proper way to perform a contextual search against NoSQL key-value pairs?

With MySQL, I might search through a table "photos" looking for matching titles as follows:
SELECT *
FROM photos
WHERE title LIKE '[string]%';
If the field "title" is indexed, this would perform rather efficiently. I might even set a FULLTEXT index on the title field to perform substring matching.
What is a good strategy for performing a similar search against a NoSQL table of photos, like Amazon's DynamoDB, in the format:
{key} -> photo_id,
{value} -> {photo_id = 2332532532235,
title = 'this is a title'}
I suppose one way would be to search the contents of each entry's value and return matches. But this seems pretty inefficient, especially when the data set gets very large.
Thanks in advance.
I can give you a Mongo shell example.
From the basic tutorial on MongoDB site:
j = { name : "mongo" };
t = { x : 3 };
db.things.save(j);
db.things.save(t);
So you now have a collection called things and have stored two documents in it.
Suppose you now want to do the equivalent of
SELECT * FROM things WHERE name like 'mon%'
In SQL, this would have returned you the "mongo" record.
In the Mongo shell, you can do this (the ^ anchors the pattern so that, like 'mon%', it only matches at the start of the value):
db.things.find({name:{$regex:'^mon'}}).forEach(printjson);
This returns the "mongo" document.
Hope this helps.
Atish
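The difference between a prefix match (LIKE 'mon%') and a substring match can be checked locally with Python's `re` module, which handles the `^` anchor the same way for this purpose. This is only a sketch of the pattern logic, not a database query; the helper name `prefix_pattern` and the sample names are assumptions for illustration.

```python
import re


def prefix_pattern(prefix):
    """Return a regex string matching values that start with `prefix`."""
    # re.escape keeps characters such as '.' or '%' in user input literal.
    return "^" + re.escape(prefix)


names = ["mongo", "salmon", "monday"]

# Anchored: behaves like LIKE 'mon%'.
starts_with_mon = [n for n in names if re.search(prefix_pattern("mon"), n)]

# Unanchored: behaves like LIKE '%mon%'.
contains_mon = [n for n in names if re.search("mon", n)]
```

The anchored, case-sensitive form is also the one MongoDB can serve from an index on the field, which is what makes it comparable to the indexed LIKE query in the question.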

Django - SQL Query - Timestamp

Can anyone point me to a tutorial, code, or some other resource that will help me with the following problem?
I have a table in a MySQL database. It contains an ID, a timestamp, another ID, and a value. I'm passing it the 'main' ID, which uniquely identifies a piece of data. However, I want to do a time search on this piece of data (using the timestamp field). Ideally I could say: between the hours of 12 and 1, show me all the values logged for ID = 1987.
How would I go about querying this in Django? I know in MySQL it would be something like less than/greater than, but how do I do this in Django? I've been using objects.filter() for most of my database handling so far. Finally, I'd like to stress that I'm new to Django and I'm genuinely stumped!
If the table in question maps to a Django model MyModel, e.g.
class MyModel(models.Model):
    ...
    primaryid = ...
    timestamp = ...
    secondaryid = ...
    valuefield = ...
then you can use
MyModel.objects.filter(
    primaryid=1987
).exclude(
    timestamp__lt=<min_timestamp>
).exclude(
    timestamp__gt=<max_timestamp>
).values_list('valuefield', flat=True)
This selects entries with the primaryid 1987, with timestamp values between <min_timestamp> and <max_timestamp>, and returns the corresponding values in a list.
Update: Corrected bug in query (filter -> exclude).
I don't think Vinay Sajip's answer is correct. The closest correct variant based on his code is:
MyModel.objects.filter(
    primaryid=1987
).exclude(
    timestamp__lt=min_timestamp
).exclude(
    timestamp__gt=max_timestamp
).values_list('valuefield', flat=True)
That's "exclude the ones less than the minimum timestamp and exclude the ones greater than the maximum timestamp." Alternatively, you can do this:
MyModel.objects.filter(
    primaryid=1987
).filter(
    timestamp__gte=min_timestamp
).exclude(
    timestamp__gte=max_timestamp
).values_list('valuefield', flat=True)
exclude() and filter() are opposites: exclude() omits the identified rows and filter() includes them. You can use a combination of them to include/exclude whichever you prefer. In your case, you want to exclude() those below your minimum time stamp and to exclude() those above your maximum time stamp.
Here is the documentation on chaining QuerySet filters.
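The two variants above differ at the upper boundary: the first keeps rows where the timestamp equals max_timestamp, while the second drops them. A plain-Python sketch of the two conditions, with made-up timestamps, makes this visible. No Django is involved here, and the function names are hypothetical.

```python
from datetime import datetime


def in_closed_range(ts, lo, hi):
    """Mirror exclude(lt=lo).exclude(gt=hi): both ends are included."""
    return not (ts < lo) and not (ts > hi)


def in_half_open_range(ts, lo, hi):
    """Mirror filter(gte=lo).exclude(gte=hi): the upper end is excluded."""
    return ts >= lo and not (ts >= hi)


noon = datetime(2010, 1, 1, 12, 0)
one_pm = datetime(2010, 1, 1, 13, 0)
# Made-up log times: one before the window, two inside, one exactly at the end.
stamps = [datetime(2010, 1, 1, 11, 59), noon, datetime(2010, 1, 1, 12, 30), one_pm]

closed = [ts for ts in stamps if in_closed_range(ts, noon, one_pm)]
half_open = [ts for ts in stamps if in_half_open_range(ts, noon, one_pm)]
```

The half-open form is convenient for consecutive hourly buckets, because a value logged exactly on the hour lands in only one bucket.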
