I'm developing a web app using Django, and I'll need to add search functionality soon. Search will be implemented for two models, one being an extension of the auth user class and another one with the fields name, tags, and description. So I guess nothing too scary here in context of searching text.
For development I am using SQLite and as no database specific work has been done, I am at liberty to use any database in production. I'm thinking of choosing between PostgreSQL or MySQL.
I have gone through several posts on Internet about search solutions, nevertheless I'd like to get opinions for my simple case. Here are my questions:
is full-text search an overkill in my case?
is it better to rely on the database's full-text search support? If so, which database should I use?
should I use an external search library, such as Whoosh, Sphinx, or Xapian? If so, which one?
EDIT:
tags is a Tagfield (from the django-tagging app) that sits on a m2m relationship. description is a field that holds HTML and has a max_length of 1024 bytes.
If that field tags means what I think it means, i.e. you plan to store a string which concatenates multiple tags for an item, then you might need full-text search on it... but it's a bad design; rather, you should have a many-many relationship between items and a tags table (in another table, ItemTag or something, with 2 foreign keys that are the primary keys of the items table and tags table).
I can't tell whether you need full-text search on description as I have no indication of what it is -- nor whether you need the reasonable but somewhat rudimentary full-text search that MySQL 5.1 and PostgreSQL 8.3 provide, or the more powerful one in e.g. sphinx... maybe talk a bit more about the context of your app and why you're considering full-text search?
Edit: so it seems the only possible need for full-text search might be on description, and that looks like it's probably limited enough that either MySQL 5.1 or PostgreSQL 8.3 will serve it well. Me, I have a sweet spot for PostgreSQL (even though I'm reasonably expert at MySQL too), but that's a general preference, not specifically connected to full-text search issues. This blog does provide one reason to prefer PostgreSQL: you can have full-text search and still be transactional, while in MySQL full-text indexing only work on MyISAM tables, not InnoDB [[except if you add sphinx, of course]] (also see this follow-on for a bit more on full-text search in PostgreSQL and Lucene). Still, there are of course other considerations involved in picking a DB, and I don't think you'll be doing terribly with either (unless having to add sphinx for full-text plus transaction is a big problem).
Django has full text searching support in its QuerySet filters. Right now, if you only have two models that need searching, just make a view that searches the fields on both:
search_string = "+Django -jazz Python"
first_models = FirstModel.objects.filter(headline__search=search_string)
second_models = SecondModel.objects.filter(headline__search=search_string)
You could further filter them to make sure the results are unique, if necessary.
Additionally, there is a regex filter that may be even better for dealing with your html fields and tags since the regex can instruct the filter on exactly how to process any delimiters or markup.
Whether you need an external library depends on your needs. How much traffic are we talking about? The external libraries are generally better when it comes to performance, but as always there are advantages and disadvantages. I am using Sphinx with django-sphinx plugin, and I would recommend it if you will be doing a lot of searching.
Haystack looks promising. And it supports Whoosh on the back end.
Related
For working with data from a database inside of Python programmes, we generally use Object Relational Mappers, to translate database entries into python objects we can work with, with sqlAlchemy and Django Models probably being the most common and advanced ORMs.
Are there ORMs that do not connect to a database but to a third party (JSON) REST API instead? I would like to have a framework which lets me deal with Python objects to perform CRUD operations on the API. This should have all the well-established standard functionalities of an ORM, including Unit of Work and Lazy Loading. Actually, I would want my python code to be agnostic about whether the model is stored in a database or being fetched from a third party API.
It is hard for me to imagine that such a thing does not yet exist. But I am not able to find it. Maybe I am not knowing the right words to search for it?
ORMs Frameworks are frameworks that connect to databases. From your description you are talking about a DAO pattern, not about a Framework. This is a common programming pattern in other languages such as Java.
The right words or searches would be:
Search for the DAO pattern, what to expect from it and how to code it.
Check a couple of links on examples of DAO patterns in python such as this one, or this other one
Analyze your specific problem. You might not need all the code other solutions offer you. And you might be better off coding yourself your own class adjusted to your needs.
Remember KISS and DRY.
PS: Different languages use different paradigms, it is a common error to try to extrapolate patterns and coding uses from one language to another. So something that is solved in e.g. Java in a way, might not be the best option for Python. Keep that in mind too.
I am creating a food search. I want to simply be able to type a food into a search box and have it return results. I also want to be able to add priority to certain terms so that they show up. For example, searching for "orange" would bring up the fruit first as opposed to the juice.
I haven't been able to determine the better search solution for this scenario in django.
Let me know which is the better solution for this scenario.
I'm the current maintainer of Django-SphinxQL, an implementation for Sphinx in Django, and maintainer of the Xapian backend for Haystack.
I recommend using Haystack:
Haystack allows you to choose between different backends, support most standard features of search (e.g. highlight), and already stood the test of time on search engines for Django.
Django-SphinxQL is in pre-alpha (other implementations such has Django-Sphinx have stalled), and only support a minimal set of functionality.
The only reason I see to choose Sphinx search in detriment of Haystack (e.g. using Django-SphinxQL) is if you specifically have a use case where Sphinx is clearly superior to any Haystack backend.
For instance, Sphinx is known to be very fast indexing in the plain index, but it requires you to re-index everything when you update the database. This particular setup is very convenient for me because I'm using it to index a database that only changes once a day.
Here is my situation. I used Python, Django and MySQL for a web development.
I have several tables for form posting, whose fields may change dynamically. Here is an example.
Like a table called Article, it has three fields now, called id INT, title VARCHAR(50), author VARCHAR(20), and it should be able store some other values dynamically in the future, like source VARCHAR(100) or something else.
How can I implement this gracefully? Is MySQL be able to handle it? Anyway, I don't want to give up MySQL totally, for that I'm not really familiar with NoSQL databases, and it may be risky to change technique plan in the process of development.
Any ideas welcome. Thanks in advance!
You might be interested in this post about FriendFeed's schemaless SQL approach.
Loosely:
Store documents in JSON, extracting the ID as a column but no other columns
Create new tables for any indexes you require
Populate the indexes via code
There are several drawbacks to this approach, such as indexes not necessarily reflecting the actual data. You'll also need to hack up django's ORM pretty heavily. Depending on your requirements you might be able to keep some of your fields as pure DB columns and store others as JSON?
I've never actually used it, but django-not-eav looks like the tool for the job.
"This app attempts the impossible: implement a bad idea the right way." I already love it :)
That said, this question sounds like a "rethink your approach" situation, for sure. But yes, sometimes that is simply not an option...
i'm working on a project (written in Django) which has only a few entities, but many rows for each entity.
In my application i have several static "reports", directly written in plain SQL. The users can also search the database via a generic filter form. Since the target audience is really tech-savvy and at some point the filter doesn't fit their needs, i think about creating a query language for my database like YQL or Jira's advanced search.
I found http://sourceforge.net/projects/littletable/ and http://www.quicksort.co.uk/DeeDoc.html, but it seems that they only operate on in-memory objects. Since the database can be too large for holding it in-memory, i would prefer that the query is translated in SQL (or better a Django query) before doing the actual work.
Are there any library or best practices on how to do this?
Writing such a DSL is actually surprisingly easy with PLY, and what ho—there's already an example available for doing just what you want, in Django. You see, Django has this fancy thing called a Q object which make the Django querying side of things fairly easy.
At DjangoCon EU 2012, Matthieu Amiguet gave a session entitled Implementing Domain-specific Languages in Django Applications in which he went through the process, right down to implementing such a DSL as you desire. His slides, which include all you need, are available on his website. The final code (linked to from the last slide, anyway) is available at http://www.matthieuamiguet.ch/media/misc/djangocon2012/resources/compiler.html.
Reinout van Rees also produced some good comments on that session. (He normally does!) These cover a little of the missing context.
You see in there something very similar to YQL and JQL in the examples given:
groups__name="XXX" AND NOT groups__name="YYY"
(modified > 1/4/2011 OR NOT state__name="OK") AND groups__name="XXX"
It can also be tweaked very easily; for example, you might want to use groups.name rather than groups__name (I would). This modification could be made fairly trivially (allow . in the FIELD token, by modifying t_FIELD, and then replacing . with __ before constructing the Q object in p_expression_ID).
So, that satisfies simple querying; it also gives you a good starting point should you wish to make a more complex DSL.
I've faced exactly this problem - a large database which needs searching. I made some static reports and several fancy filters using django (very easy with django) just like you have.
However the power users were clamouring for more. I decided that there already was a DSL that they all knew - SQL. The question was how to make it secure enough.
So I used django permissions to give the power users permission to make SQL queries in a new table. I then made a view for the not-quite-so-power users to use these queries. I made them take optional parameters. The queries were run using Python's lower level DB-API which django is using under the hood for its ORM anyway.
The real trick was opening a read only database connection to run these queries just to make sure that no updates were ever run. I made a read only connection by creating a different user in the database with lower permissions and opening a specific connection for that in the view.
TL;DR - SQL is the way to go!
Depending on the form of your data, the types of queries your users need to use, and the frequency that your data is updated, an alternative to the pure SQL solution suggested by Nick Craig-Wood is to index your data in Solr and then run queries against it.
Solr is an added layer of complexity (configuration, data synchronization) but it is super-fast, can handle large datasets, and provides a (relatively) intuitive query language.
You could write your own SQL-ish language using pyparsing, actually. There is even pretty verbose example you could extend.
I am in need of a lightweight way to store dictionaries of data into a database. What I need is something that:
Creates a database table from a simple type description (int, float, datetime etc)
Takes a dictionary object and inserts it into the database (including handling datetime objects!)
If possible: Can handle basic references, so the dictionary can reference other tables
I would prefer something that doesn't do a lot of magic. I just need an easy way to setup and get data into an SQL database.
What would you suggest? There seems to be a lot of ORM software around, but I find it hard to evaluate them.
SQLAlchemy's SQL expression layer can easily cover the first two requirements. If you also want reference handling then you'll need to use the ORM, but this might fail your lightweight requirement depending on your definition of lightweight.
SQLAlchemy offers an ORM much like django, but does not require that you work within a web framework.
From it's description, perhaps Axiom is a pythonic tool for this .
Seeing as you have mentioned sql, python and orm in your tags, are you looking for Django? Of all the web frameworks I've tried, I like this one the best. You'd be looking at models, specifically. This could be too fancy for your needs, perhaps, but that shouldn't stop you looking at the code of Django itself and learning from it.