Make a "matrix" from Django query - python

I have a model similar to this one:
class MyModel(models.Model):
    name = models.CharField(max_length=30)
    a = models.ForeignKey(External)
    b = models.ForeignKey(External, related_name='MyModels_a')

    def __unicode__(self):
        return self.name + self.a.name + self.b.name
So when I query it I get something like this:
>>> MyModel.objects.all()
[<MyModel: Name1AB>,<MyModel: Name2AC>,<MyModel: Name3CB>,<MyModel: Name4BA>,<MyModel: Name5BA>]
And I'd like to represent this data similar to the following.
[[ [] , [Name1AB] , [Name2AC] ]
[ [Name4BA, Name5BA] , [] , [] ]
[ [] , [Name3CB] , [] ]]
As you can see the rows would be 'a' in the model; and the columns would be 'b'
I can do this, but it takes a long time because the real database holds a lot of data. I'd like to know if there's a built-in Django way to do this.
I'm doing it like this:
mymodel_list = MyModel.objects.all()
external_list = External.objects.all()
for i in external_list:
    for j in external_list:
        print(mymodel_list.filter(a=i).filter(b=j).all(), end='')
    print()
Thanks

There are three ways of doing it, but you will have to research a bit more. The third option may be the most suitable for what you are looking for.
1) Django queries
The reason it takes a long time is that you hit the database once per cell in this line:
print(mymodel_list.filter(a=i).filter(b=j).all(), end='')
You may want to start by reading what the Django documentation says about working with queries. For what you are doing, you have to write the algorithm so that it avoids the per-cell filters. Using MyModel.objects.order_by('a') may help you build an efficient algorithm.
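The ordering idea can be sketched in plain Python. The rows below are hypothetical (name, a, b) tuples standing in for what one query ordered by 'a' might return; the sample labels and the helper name rows_to_matrix are assumptions made for illustration, not part of the question.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows, already sorted by the row label 'a' (as order_by('a') would do).
rows = [
    ("Name1", "A", "B"),
    ("Name2", "A", "C"),
    ("Name4", "B", "A"),
    ("Name5", "B", "A"),
    ("Name3", "C", "B"),
]
columns = ["A", "B", "C"]  # assumed column labels, in display order


def rows_to_matrix(sorted_rows, columns):
    """Build one matrix line per distinct 'a', walking the sorted rows once."""
    matrix = {}
    for a_label, group in groupby(sorted_rows, key=itemgetter(1)):
        line = {col: [] for col in columns}
        for name, _, b_label in group:
            line[b_label].append(name)
        matrix[a_label] = [line[col] for col in columns]
    return matrix


matrix = rows_to_matrix(rows, columns)
for a_label in columns:
    # An 'a' with no rows at all gets a line of empty cells.
    print(a_label, matrix.get(a_label, [[] for _ in columns]))
```

Because the rows arrive grouped by 'a', each line of the matrix is finished before the next one starts, so nothing has to be filtered again.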
2) {% ifchanged ...%} tag
I suppose you are using print only to show the result here, but you probably need it in HTML. In that case, you may want to read about the ifchanged tag. It will allow you to build your matrix in HTML with just one database access.
3) Many to many relations
It seems you are sort of simulating a many to many relation in a very peculiar way. Django has support for many to many relations. You will need an extra field, so you will also have to read this.
Finally, to do what you are trying with just one access to the database, you will need to read about prefetch_related.

There's no built-in way (because what you need is not common). You should write it manually, but I'd recommend retrieving the full dataset (or at least the dataset for one table row) and processing it in Python instead of hitting the DB for each table cell.
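A minimal sketch of that "fetch once, process in Python" approach. The (name, a, b) tuples are hypothetical stand-ins for rows fetched in a single query, and build_matrix is an illustrative helper name, not a Django API.

```python
from collections import defaultdict


def build_matrix(rows, labels):
    """rows: iterable of (name, a_label, b_label); labels: ordered External labels."""
    cells = defaultdict(list)
    for name, a_label, b_label in rows:
        cells[(a_label, b_label)].append(name)
    # Rows of the matrix follow 'a', columns follow 'b'; missing cells stay empty.
    return [[cells.get((a, b), []) for b in labels] for a in labels]


# Hypothetical data matching the example in the question.
rows = [
    ("Name1", "A", "B"),
    ("Name2", "A", "C"),
    ("Name3", "C", "B"),
    ("Name4", "B", "A"),
    ("Name5", "B", "A"),
]
matrix = build_matrix(rows, ["A", "B", "C"])
for line in matrix:
    print(line)
```

The dictionary is filled in one pass over the data, so the cost no longer grows with the number of cells times the number of queries.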

Related

Sort by a value in a many to one field with a django queryset?

I have a data model like this:
class Post(models.Model):
    name = models.CharField(max_length=255)

class Tag(models.Model):
    name = models.CharField(max_length=255)
    rating = models.FloatField()
    parent = models.ForeignKey(Post, related_name="tags")
I want to get Posts that have a given tag, and order them by that tag's rating.
Something like:
Post.objects.filter(tags__name="exampletag").order_by("tags(name=exampletag)__rating")
Currently, I am thinking it makes sense to do something like
tags = Tag.objects.filter(name="sometagname").order_by("rating")[0:10]
posts = [t.parent for t in tags]
But I'd like to know if there is a better way, preferably querying Post and getting back a queryset.
Edit:
I don't think this: (Edit 2 - this does give the correct sorting!)
Post.objects.filter(tags__name="exampletag").order_by("tags__rating")
will give the correct sorting, as it does not sort only by the related item with name "exampletag".
Something like the following would be needed:
Post.objects.filter(tags__name="exampletag").order_by("tags(name=exampletag)__rating")
I've been looking over the Django docs, and it seems "annotate" nearly works, but I don't see a way to use it to select a tag by name.
Edit 2
Both answers are correct! See my comments to observe some epic brain-farts (in one test the results WERE in order; in the other I filtered and sorted by different tags!)
How it works
The queries
Post.objects.filter(tags__name="exampletag").order_by("tags__rating")
and
Post.objects.filter(tags__name="exampletag").filter(tags__name="someothertag").order_by("tags__rating")
will work correctly and be sorted by the rating of "exampletag".
It seems the tag (from a ForeignKey back-reference set) used for sorting when calling order_by is the one in the first filter.
You can do it like this:
tags = Tag.objects.filter(name="sometagname")
posts = Post.objects.filter(tags__in=tags).order_by('tags__rating')
Even shorter than Anush's, with a JOIN rather than a subquery:
Post.objects.filter(tags__name='exampletag').order_by('tags__rating')
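To see what "a JOIN rather than a subquery" is aiming for, here is a sketch using the standard library's sqlite3 module. The table and column names are assumptions chosen to mirror the models above, and the SQL is hand-written for illustration; it is not necessarily the exact statement Django generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT, rating REAL,
                      parent_id INTEGER REFERENCES post(id));
    INSERT INTO post VALUES (1, 'first'), (2, 'second'), (3, 'third');
    INSERT INTO tag (name, rating, parent_id) VALUES
        ('exampletag', 0.75, 1),
        ('othertag',   0.125, 1),
        ('exampletag', 0.25, 2),
        ('othertag',   0.5, 3);
""")

# Filter and sort on the same joined tag row, so only 'exampletag' ratings count.
rows = conn.execute("""
    SELECT post.name, tag.rating
    FROM post
    JOIN tag ON tag.parent_id = post.id
    WHERE tag.name = ?
    ORDER BY tag.rating
""", ("exampletag",)).fetchall()
print(rows)
```

The post tagged only with 'othertag' drops out, and the rating of 'othertag' on the first post does not influence the order.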

Best practices method of implementing a django OR query from an iterable?

I am implementing a one-off data importer where I need to search for existing slugs. The slugs are in an array. What is the accepted best-practice way of converting an array to an OR query?
I came up with the following, which works, but feels like way too much code to accomplish something this simple.
# slug might be an array or just a string
# ex:
slug = [ "snakes", "snake-s" ] # in the real world this is generated from directory structure on disk
# build the query
query = MyModel.objects
if hasattr(slug, "__iter__"):
    q = Q()
    for s in slug:
        q = q.__or__(Q(slug=s))
    query = query.filter(q)
else:
    query = query.filter(slug=slug)
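import operator
from functools import reduce  # reduce is a builtin only on Python 2
from django.db.models import Q

slug = ["snakes", "snake-s"]  # in the real world this is generated from directory structure on disk
# build the query
query = MyModel.objects
# note: on Python 3 a plain str also has __iter__, so check isinstance(slug, str) there
if hasattr(slug, "__iter__"):
    q_list = []
    for s in slug:
        q_list.append(Q(slug=s))
    query = query.filter(reduce(operator.or_, q_list))
else:
    query = query.filter(slug=slug)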
q_list = [] creates a list of Q clauses.
reduce(operator.or_, q_list) joins the list with OR operators.
read this: http://www.michelepasin.org/techblog/2010/07/20/the-power-of-djangos-q-objects/
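The reduce-by-operator technique can be tried without a database. In the sketch below, Match is a hypothetical stand-in for Q (it is not a Django class); it only needs to support the | operator for reduce(operator.or_, ...) to work the same way.

```python
import operator
from functools import reduce


class Match:
    """Hypothetical stand-in for Q: a predicate that supports `|`."""

    def __init__(self, test):
        self.test = test

    def __or__(self, other):
        # a | b matches when either side matches.
        return Match(lambda value: self.test(value) or other.test(value))

    def __call__(self, value):
        return self.test(value)


slugs = ["snakes", "snake-s"]
# s=s binds each slug at definition time, avoiding the late-binding closure trap.
clauses = [Match(lambda value, s=s: value == s) for s in slugs]
combined = reduce(operator.or_, clauses)  # clauses[0] | clauses[1] | ...

print(combined("snakes"), combined("snake-s"), combined("lizards"))
```

Note that reduce raises a TypeError on an empty list when no initial value is given, so an empty slug list needs its own branch.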
@MostafaR - sure, we could crush my entire code block down to one line if we wanted (see below). It's not very readable anymore at that level, though. Saying code isn't "Pythonic" just because it hasn't been reduced and obfuscated is silly. Readable code is king IMHO. It's also important to keep in mind that the purpose of my answer was to show the reduce-by-an-operator technique. The rest of my answer was fluff to show that technique in the context of the original question.
result = MyModel.objects.filter(reduce(operator.or_, [Q(slug=s) for s in slug])) if hasattr(slug, "__iter__") else MyModel.objects.filter(slug=slug)
result = MyModel.objects.filter(slug__in=slug).all() if isinstance(slug, list) else MyModel.objects.filter(slug=slug).all()
I believe in this case you should use django's __in field lookup like this:
slugs = [ "snakes", "snake-s" ]
objects = MyModel.objects.filter(slug__in=slugs)
The code that you posted will not work as written in several ways (though I am not sure whether it was meant as pseudocode), but from what I understand, this might help:
MyModel.objects.filter(slug__in=slug)
should do the job.
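Since slug may be either a single string or a collection, one option is to normalise it first and then always use the __in lookup. A small sketch; as_slug_list is a hypothetical helper name, not something from Django.

```python
def as_slug_list(slug):
    """Return a list of slugs whether `slug` is one string or an iterable of strings."""
    if isinstance(slug, str):
        return [slug]
    return list(slug)


# With Django this would feed a single lookup, e.g.
#   MyModel.objects.filter(slug__in=as_slug_list(slug))
print(as_slug_list("snakes"))
print(as_slug_list(("snakes", "snake-s")))
```

This removes the if/else around the query itself, at the cost of using IN even for a single value.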

Compare two files and make a list

I have two files that I want to compare with each other to form a list. Each file has its own class: Book and Person. In these, I have different attributes. The ones I want to compare are: person.personalcode == book.borrowed. From this I want a list of all the borrowed books. I have started like this:
for person in person_list:
    for book in booklibrary_list:
        if person.personalcode == book.borrowed:
            person.books.append(book, person)
for person in person_list:
    if len(person.books) > 0:
        print(person.personalcode + "," + person.firstname + person.lastname + "have borrowed the following books: ")
        for book in person.books:
            print(book)
for person in person_list:
    person.books = []
But it does not work, what have I missed or done wrong?
Posting as an answer as this is too long for a comment.
First: improve your question. Show how you construct the Person and the Book class, and how you populate them. Describe what the personalcode is and how come personalcode would be the same as a book code. Some sample data and a bit more code would make this easier to answer.
Second: reading your other question, you seem to be storing your data in a text file, then loading, querying, modifying and saving the data directly. This will lead to problems; instead, you should consider going down one of two routes:
Use an SQL database, possibly the easiest to start with is SQLite as it does not need a server to be set up and there is a module in the standard library that is very easy to use. Store your data there and you will find it easier in the long run.
Use Python objects (e.g. three classes: Person, Book, and BorrowedBook), manage lists of them within the program, and use shelve from the standard library to store and retrieve these lists of objects between queries.
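For the SQLite route, a minimal sketch with the standard library's sqlite3 module. The table layout and sample data are assumptions made for illustration; only personalcode and borrowed come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path instead to keep the data
conn.executescript("""
    CREATE TABLE person (personalcode TEXT PRIMARY KEY, firstname TEXT, lastname TEXT);
    CREATE TABLE book (title TEXT, borrowed TEXT REFERENCES person(personalcode));
    INSERT INTO person VALUES ('p1', 'Ada', 'Lovelace'), ('p2', 'Alan', 'Turing');
    INSERT INTO book VALUES ('Dune', 'p1'), ('Emma', NULL), ('Ulysses', 'p1');
""")

# One join replaces the nested loops: every borrowed book with its borrower.
borrowed = conn.execute("""
    SELECT person.firstname, person.lastname, book.title
    FROM book
    JOIN person ON person.personalcode = book.borrowed
    ORDER BY person.personalcode, book.title
""").fetchall()
print(borrowed)
```

Books whose borrowed column is NULL never match the join, so unborrowed books are left out automatically.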
The use of shelve would be easier if you have not used SQL before, and I hope you will forgive the pun when I say that it might be very appropriate for a book-related application!
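Without the class definitions the exact failure can only be guessed at, but two things stand out in the posted loop: list.append accepts exactly one argument, and person.books is only reset after the loops that use it. A sketch of the matching step under assumed, hypothetical Person and Book classes:

```python
class Person:
    def __init__(self, personalcode, firstname, lastname):
        self.personalcode = personalcode
        self.firstname = firstname
        self.lastname = lastname
        self.books = []  # every person starts with an empty list


class Book:
    def __init__(self, title, borrowed=None):
        self.title = title
        self.borrowed = borrowed  # personal code of the borrower, or None

    def __str__(self):
        return self.title


# Hypothetical sample data.
person_list = [Person("p1", "Ada", "Lovelace"), Person("p2", "Alan", "Turing")]
booklibrary_list = [Book("Dune", "p1"), Book("Emma"), Book("Ulysses", "p1")]

# Index people by code so each book is matched with one dictionary lookup.
people_by_code = {person.personalcode: person for person in person_list}
for book in booklibrary_list:
    borrower = people_by_code.get(book.borrowed)
    if borrower is not None:
        borrower.books.append(book)  # append takes a single argument

for person in person_list:
    if person.books:
        print(person.personalcode + ", " + person.firstname + " " + person.lastname
              + " has borrowed the following books:")
        for book in person.books:
            print(book)
```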

How to implement full text search in Django?

I would like to implement a search function in a django blogging application. The status quo is that I have a list of strings supplied by the user and the queryset is narrowed down by each string to include only those objects that match the string.
See:
if request.method == "POST":
    form = SearchForm(request.POST)
    if form.is_valid():
        posts = Post.objects.all()
        for string in form.cleaned_data['query'].split():
            posts = posts.filter(
                Q(title__icontains=string) |
                Q(text__icontains=string) |
                Q(tags__name__exact=string)
            )
        return archive_index(request, queryset=posts, date_field='date')
Now, what if I wanted to combine the search words with a logical OR instead of a logical AND? How would I do that? Is there a way to do that with Django's own QuerySet methods, or does one have to fall back to raw SQL queries?
In general, is it a proper solution to do full text search like this or would you recommend using a search engine like Solr, Whoosh or Xapian. What are their benefits?
I suggest adopting a search engine.
We've used Haystack search, a modular search application for django supporting many search engines (Solr, Xapian, Whoosh, etc...)
Advantages:
Faster
Performs search queries without even querying the database
Highlight searched terms
"More like this" functionality
Spelling suggestions
Better ranking
etc...
Disadvantages:
Search Indexes can grow in size pretty fast
One of the best search engines (Solr) runs as a Java servlet (Xapian does not)
We're pretty happy with this solution and it's pretty easy to implement.
Actually, the query you have posted does use OR rather than AND - you're using | to separate the Q objects. AND would be &.
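One nuance: the | in the question's code joins the three field lookups for a single word, while each pass through the loop calls filter() again, and chained filter() calls narrow the result, so the words themselves end up ANDed. A plain-Python sketch of that distinction, using hypothetical in-memory posts instead of a database:

```python
# Hypothetical sample posts.
posts = [
    {"title": "Django tips", "text": "querysets and Q objects"},
    {"title": "Cooking", "text": "django reinhardt playlist"},
    {"title": "Gardening", "text": "tomatoes"},
]


def word_matches(post, word):
    """A word matches if it appears in the title OR the text (the | inside the loop)."""
    word = word.lower()
    return word in post["title"].lower() or word in post["text"].lower()


def search(posts, query, require_all=True):
    """require_all=True mimics chained filters (AND); False ORs the words together."""
    words = query.split()
    combine = all if require_all else any
    return [p["title"] for p in posts if combine(word_matches(p, w) for w in words)]


and_hits = search(posts, "django objects")
or_hits = search(posts, "django objects", require_all=False)
print(and_hits, or_hits)
```

To get the OR behaviour in the ORM, the per-word Q objects would be combined with | into one Q and passed to a single filter() call.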
In general, I would highly recommend using a proper search engine. We have had good success with Haystack on top of Solr - Haystack manages all the Solr configuration, and exposes a nice API very similar to Django's own ORM.
Answer to your general question: Definitely use a proper application for this.
With your query, you always examine the whole content of the fields (title, text, tags). You gain no benefit from indexes, etc.
With a proper full text search engine, the words are indexed every time you insert new records, so queries will be a lot faster, especially as your database grows.
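To illustrate why indexing at insert time helps, here is a toy inverted index in plain Python. It is only a sketch of the idea (InvertedIndex is a hypothetical class); real engines add stemming, ranking and on-disk storage on top.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy word -> document-id index, updated whenever a record is added."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # The work happens here, once per insert.
        for word in text.lower().split():
            self.postings[word].add(doc_id)

    def search(self, word):
        # A dictionary lookup; no document text is scanned at query time.
        return sorted(self.postings.get(word.lower(), set()))


index = InvertedIndex()
index.add(1, "Django full text search")
index.add(2, "Searching text with Solr")
index.add(3, "Gardening notes")
print(index.search("text"), index.search("Solr"), index.search("missing"))
```

An icontains query, by contrast, has to look inside every row's text for each search.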
Solr is very easy to set up and integrate with Django. Haystack makes it even simpler.
For full text search in Python, look at PyLucene. It allows for very complex queries. The main problem here is that you must find a way to tell your search engine which pages changed and update the index eventually.
Alternatively, you can use Google Sitemaps to tell Google to index your site faster and then embed a custom query field in your site. The advantage here is that you just need to tell Google the changed pages and Google will do all the hard work (indexing, parsing the queries, etc). On top of that, most people are used to searching with Google, and it will keep your site current in the global Google searches, too.
I think full text search on an application level is more a matter of what you have and how you expect it to scale. If you run a small site with low usage, I think it might be more affordable to put some time into making a custom full text search rather than installing an application to perform the search for you. An application would create more dependencies, maintenance and extra effort when storing data. By writing the search yourself, you can build in nice custom features. For example, if the search text exactly matches one title, you can direct the user to that page instead of showing the results. Another would be to allow title: or author: prefixes to keywords.
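The title: and author: prefix idea could be parsed along these lines. This is a sketch only; parse_query and KNOWN_PREFIXES are hypothetical names introduced for illustration.

```python
import shlex

KNOWN_PREFIXES = ("title", "author")  # hypothetical set of supported prefixes


def parse_query(searchtext):
    """Split a query into free keywords and prefix-restricted keywords."""
    parsed = {"any": []}
    for token in shlex.split(searchtext):  # shlex keeps quoted phrases together
        prefix, sep, value = token.partition(":")
        if sep and prefix in KNOWN_PREFIXES and value:
            parsed.setdefault(prefix, []).append(value)
        else:
            parsed["any"].append(token)
    return parsed


parsed = parse_query('django title:"full text" author:bob')
print(parsed)
```

Each group of keywords can then be searched against its own field, with the unprefixed ones searched everywhere.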
Here is a method I've used for generating relevant search results from a web query.
import shlex

class WeightedGroup:
    def __init__(self):
        # using a dictionary will make the results not paginate
        # but it will be a lot faster when storing data
        self.data = {}

    def list(self, max_len=0):
        # returns a sorted list of the items with heaviest weight first
        # (max_len=0 means no limit)
        res = sorted(self.data, key=self.data.get, reverse=True)
        return res[:max_len] if max_len else res

    def append(self, weight, item):
        if item in self.data:
            self.data[item] += weight
        else:
            self.data[item] = weight

def search(searchtext):
    candidates = WeightedGroup()
    for arg in shlex.split(searchtext): # shlex understands quotes
        # Search TITLE
        # order by date so we get most recent posts
        query = Post.objects.filter(title__icontains=arg).order_by('-date')
        arg_hits = query.count() # count is cheap
        if arg_hits > 1000:
            continue # skip keywords which have too many hits
        # Each of these is expensive as it would transfer data
        # from the db and build a python object,
        for post in query[:50]: # so we limit it to 50 for example
            # the more hits a keyword has, the less relevant it is
            candidates.append(100.0 / arg_hits, post.post_id)
        # TODO add searches for other areas
        # Weight might also be adjusted with number of hits within the text
        # or perhaps you can find other metrics to value a post higher,
        # like number of views
    # candidates can contain a lot of stuff now, show most relevant only
    # (note: an __in lookup does not preserve the ranking order)
    sorted_result = Post.objects.filter(post_id__in=candidates.list(20))
    return sorted_result

Aggregating across columns in Django

I'm trying to figure out if there's a way to do a somewhat-complex aggregation in Django using its ORM, or if I'm going to have to use extra() to stick in some raw SQL.
Here are my object models (stripped to show just the essentials):
class Submission(models.Model):
    favorite_of = models.ManyToManyField(User, related_name="favorite_submissions")

class Response(models.Model):
    submission = models.ForeignKey(Submission)
    voted_up_by = models.ManyToManyField(User, related_name="voted_up_responses")
What I want to do is sum all the votes for a given submission: that is, all of the votes for any of its responses, and then also including the number of people who marked the submission as a favorite.
I have the first part working using the following code; this returns the total votes for all responses of each submission:
submission_list = Response.objects\
    .values('submission')\
    .annotate(votes=Count('voted_up_by'))\
    .filter(votes__gt=0)\
    .order_by('-votes')[:TOP_NUM]
(So after getting the vote total, I sort in descending order and return the top TOP_NUM submissions, to get a "best of" listing.)
That part works. Is there any way you can suggest to include the number of people who have favorited each submission in its votes? (I'd prefer to avoid extra() for portability, but I'm thinking it may be necessary, and I'm willing to use it.)
EDIT: I realized after reading the suggestions below that I should have been clearer in my description of the problem. The ideal solution would be one that allowed me to sort by total votes (the sum of voted_up_by and favorited) and then pick just the top few, all within the database. If that's not possible then I'm willing to load a few of the fields of each response and do the processing in Python; but since I'll be dealing with 100,000+ records, it'd be nice to avoid that overhead. (Also, to Adam and Dmitry: I'm sorry for the delay in responding!)
One possibility would be to re-arrange your current query slightly. What if you tried something like the following:
submission_list = Response.objects\
    .annotate(votes=Count('voted_up_by'))\
    .filter(votes__gt=0)\
    .order_by('-votes')[:TOP_NUM]
submission_list.query.group_by = ['submission_id']
This will return a queryset of Response objects (objects with the same Submission will be lumped together). In order to access the related submission and/or the favorite_of list/count, you have two options:
num_votes = submission_list[0].votes
submission = submission_list[0].submission
num_favorite = submission.favorite_of.count()
or...
submissions = []
for response in submission_list:
    submission = response.submission
    submission.votes = response.votes
    submissions.append(submission)
num_votes = submissions[0].votes
submission = submissions[0]
num_favorite = submission.favorite_of.count()
Basically the first option has the benefit of still being a queryset, but you have to be sure to access the submission object in order to get any info about the submission (since each object in the queryset is technically a Response). The second option has the benefit of being a list of the submissions with both the favorite_of list as well as the votes, but it is no longer a queryset (so be sure you don't need to alter the query anymore afterwards).
You can count favorites in another query, like this:
favorite_list = Submission.objects.annotate(favorites=Count('favorite_of'))
After that you add the values from two lists:
total_votes = {}
for item in submission_list:
    total_votes[item.submission.id] = item.votes
for item in favorite_list:
    has_votes = total_votes.get(item.id, 0)
    total_votes[item.id] = has_votes + item.favorites
I am using ids in the dictionary because Submission objects will not be identical. If you need the Submissions themselves, you may use one more dictionary or store a tuple (submission, votes) instead of just votes.
Added: this solution is better than the previous because you have only two DB requests.
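The merge step can be written with collections.Counter, which also gives the "top few" directly. The (submission_id, count) pairs below are hypothetical stand-ins for the results of the two queries.

```python
from collections import Counter

# Hypothetical results of the two queries, as (submission_id, count) pairs.
response_votes = [(1, 5), (2, 3), (3, 1)]
favorites = [(2, 4), (3, 1), (4, 2)]

total_votes = Counter()
for submission_id, count in response_votes:
    total_votes[submission_id] += count
for submission_id, count in favorites:
    total_votes[submission_id] += count

TOP_NUM = 2
best = total_votes.most_common(TOP_NUM)  # highest combined totals first
print(best)
```

This still pulls one row per submission into Python, so it trades database-side sorting for two simple queries.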
