I am implementing a one-off data importer where I need to search for existing slugs. The slugs are in an array. What is the accepted best-practice way of converting an array to an OR query?
I came up with the following, which works, but feels like way too much code to accomplish something this simple.
# slug might be an array or just a string
# ex:
slug = ["snakes", "snake-s"]  # in the real world this is generated from directory structure on disk
# build the query
query = MyModel.objects
if hasattr(slug, "__iter__"):
    q = Q()
    for s in slug:
        q = q.__or__(Q(slug=s))
    query = query.filter(q)
else:
    query = query.filter(slug=slug)
import operator
from functools import reduce  # a builtin on Python 2; needs this import on Python 3

from django.db.models import Q

slug = ["snakes", "snake-s"]  # in the real world this is generated from directory structure on disk
# build the query
query = MyModel.objects
if hasattr(slug, "__iter__"):
    q_list = []
    for s in slug:
        q_list.append(Q(slug=s))
    query = query.filter(reduce(operator.or_, q_list))
else:
    query = query.filter(slug=slug)
q_list = []: create a list of Q clauses
reduce(operator.or_, q_list): implode (join) the list with OR operators
read this: http://www.michelepasin.org/techblog/2010/07/20/the-power-of-djangos-q-objects/
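To see the reduce step on its own, here is a minimal sketch that runs without Django. Cond is a hypothetical stand-in for Q (it only records its conditions and supports |), not Django's class; in real code you would use Q from django.db.models instead.

```python
import operator
from functools import reduce  # reduce is not a builtin in Python 3


class Cond:
    """Hypothetical stand-in for django.db.models.Q; supports only `|`."""

    def __init__(self, **lookups):
        self.text = " AND ".join(f"{k}={v!r}" for k, v in sorted(lookups.items()))

    def __or__(self, other):
        combined = Cond()
        combined.text = f"({self.text} OR {other.text})"
        return combined


slugs = ["snakes", "snake-s", "snake_s"]
q_list = [Cond(slug=s) for s in slugs]

# reduce folds the list left to right: (q0 | q1) | q2
combined = reduce(operator.or_, q_list)
print(combined.text)
```

Note that reduce raises a TypeError on an empty list unless you pass an initial value, so it is worth guarding against an empty slug list.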
@MostafaR - sure, we could crush my entire code block down to one line if we wanted (see below). It's not very readable at that level, though. Saying code isn't "Pythonic" just because it hasn't been reduced and obfuscated is silly. Readable code is king, IMHO. It's also important to keep in mind that the purpose of my answer was to show the reduce-by-an-operator technique. The rest of my answer was fluff to show that technique in the context of the original question.
result = MyModel.objects.filter(reduce(operator.or_, [Q(slug=s) for s in slug])) if hasattr(slug, "__iter__") else MyModel.objects.filter(slug=slug)
result = MyModel.objects.filter(slug__in=slug).all() if isinstance(slug, list) else MyModel.objects.filter(slug=slug).all()
I believe in this case you should use Django's __in field lookup, like this:
slugs = [ "snakes", "snake-s" ]
objects = MyModel.objects.filter(slug__in=slugs)
The code you posted will not work as written, for several reasons (though perhaps it was meant as pseudocode?), but from what I understand, this might help:
MyModel.objects.filter(slug__in=slug)
should do the job.
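Since slug can be either a single string or a list, one option is to normalise it to a list first and then always use slug__in. A small sketch: as_slug_list is a hypothetical helper name, and it assumes Python 3, where strings also have __iter__, so a hasattr(slug, "__iter__") check would treat a lone string as a sequence.

```python
def as_slug_list(slug):
    """Return slug as a list, wrapping a lone string (hypothetical helper)."""
    if isinstance(slug, str):
        return [slug]
    return list(slug)


print(as_slug_list("snakes"))
print(as_slug_list(["snakes", "snake-s"]))

# With Django available, the query would then be (not run here):
# MyModel.objects.filter(slug__in=as_slug_list(slug))
```

This keeps the query itself to a single filter call whichever form the input takes.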
So I'm currently working on a project where I'm trying to improve the code.
Currently, I have this as my views.py
def home1(request):
    if request.user.is_authenticated():
        location = request.GET.get('location', request.user.profile.location)
        users = User.objects.filter(profile__location=location)
        print users
        matchesper = Match.objects.get_matches_with_percent(request.user)
        print matchesper
        matches = [match for match in matchesper if match[0] in users][:20]
Currently, users gives me back a list of users that have the same location as request.user, and matchesper gives me a match percentage with all users. Then matches uses these two lists to give me back a list of users, with their match percentage, whose location matches request.user's location.
This works perfectly; however, as soon as the number of users using the website increases, this will become very slow. I could add [:50] at the end of matchesper, for example, but this means you will never match with older users that have the same location as request.user.
My question is, is there not a way to just create matches with matchesper for only the users that have the same location? Could I use an if statement before matchesper or a for loop?
I haven't written this code but I do understand it; however, when trying to improve it I get very stuck. I hope my explanation and question make sense.
Thank you for any help in advance I'm very stuck!
(I'm assuming you're using the matchmaker project.)
In Django, you can chain QuerySet methods. You'll notice that the models.py file you're working from defines both a MatchQuerySet and a MatchManager. You might also notice that get_matches_with_percent is only defined on the Manager, not the QuerySet.
This is a problem, but not an insurmountable one. One way around it is to modify which QuerySet our manager method actually works on. We can do this by creating a new method that is basically a copy of get_matches_with_percent, but with some additional filtering.
class MatchManager(models.Manager):

    [...]

    def get_matches_with_percent_by_location(self, user, location=None):
        if location is None:
            location = user.profile.location
        user_a = Q(user_a__profile__location=location)
        user_b = Q(user_b__profile__location=location)
        qs = self.get_queryset().filter(user_a | user_b).matches(user).order_by('-match_decimal')
        matches = []
        for match in qs:
            if match.user_a == user:
                items_wanted = [match.user_b, match.get_percent]
                matches.append(items_wanted)
            elif match.user_b == user:
                items_wanted = [match.user_a, match.get_percent]
                matches.append(items_wanted)
            else:
                pass
        return matches
Note the repeated chaining in the line that builds qs (get_queryset, then filter, then matches, then order_by)! That's the magic.
Other notes:
Q objects are a way of doing complex queries, like multiple "OR" conditions.
An even better solution would factor out the elements that are common to get_matches_with_percent and get_matches_with_percent_by_location to keep the code "DRY", but this is good enough for now ;)
Be mindful of the fact that get_matches_with_percent returns a vanilla list instead of a Django QuerySet; it's a "terminal" method. Thus, you can't use any other QuerySet methods (like filter) after invoking get_matches_with_percent.
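The "DRY" factoring mentioned in the notes could look roughly like this. It is only a sketch, not code from the matchmaker project: pair_with_percent is a hypothetical helper name, and FakeMatch is a stand-in for the Match model so the example runs without Django. Each manager method would build its own queryset and then hand it to the helper.

```python
from collections import namedtuple

# Stand-in for the Match model (assumption: get_percent can be read as an
# attribute, as it is in the manager code above).
FakeMatch = namedtuple("FakeMatch", ["user_a", "user_b", "get_percent"])


def pair_with_percent(user, matches):
    """Return [other_user, percent] for every match that involves `user`."""
    pairs = []
    for match in matches:
        if match.user_a == user:
            pairs.append([match.user_b, match.get_percent])
        elif match.user_b == user:
            pairs.append([match.user_a, match.get_percent])
    return pairs


matches = [
    FakeMatch("alice", "bob", "75%"),
    FakeMatch("carol", "alice", "40%"),
    FakeMatch("bob", "carol", "10%"),
]
print(pair_with_percent("alice", matches))
```

Like the original method, the helper returns a plain list, so any location filtering still has to happen on the queryset before it is passed in.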
I use Wagtail search:
query = self.request.query_params
questions = models.Questions.objects.filter(
    answer__isnull=False,
    owner__isnull=False).exclude(answer__exact='')
s = get_search_backend()
results = s.search(query[u'question'], questions)
And this is how I set up the indexing of my Questions model:
search_fields = [
    index.SearchField('question', partial_match=True, boost=2),
    index.FilterField('answer'),
    index.FilterField('owner_id')
]
But it is case sensitive, so the queries how and How give different results.
I need to make my search behave this way:
When I type either how or How, it should return
how to...
How to...
The way how...
THE WAY HoW...
In other words, it should find all mentions of how in all possible cases.
How do I make it work?
P.S.: I'm using default backend, and I'm free to change it if needed.
With Wagtail's Elasticsearch backend, fields indexed with partial_match=True are tokenized in lowercase. So to get case-insensitive search, all you need to do is lowercase the query string:
results = s.search(query[u'question'].lower(), questions)
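To illustrate why lowercasing the query is enough when the index holds lowercase tokens, here is a toy sketch. It is not how Wagtail or Elasticsearch is implemented; build_index and search are hypothetical names used only for this example.

```python
def build_index(documents):
    """Map each lowercased word to the set of documents containing it."""
    index = {}
    for doc in documents:
        for word in doc.lower().split():
            # Drop trailing punctuation so "how..." is indexed as "how".
            index.setdefault(word.strip(".,!?"), set()).add(doc)
    return index


def search(index, query):
    # Lowercase the query so it lines up with the lowercased tokens.
    return index.get(query.lower(), set())


docs = ["how to...", "How to...", "The way how...", "THE WAY HoW..."]
index = build_index(docs)
print(sorted(search(index, "How")))
```

Both how and How return the same four entries here, because the comparison only ever happens between lowercase strings.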
I have a model similar to this one:
class MyModel(models.Model):
    name = models.CharField(max_length=30)
    a = models.ForeignKey(External)
    b = models.ForeignKey(External, related_name='MyModels_a')

    def __unicode__(self):
        return self.a + self.b.name + self.b.name
So when I query it I get something like this:
>>> MyModel.objects.all()
[<MyModel: Name1AB>,<MyModel: Name2AC>,<MyModel: Name3CB>,<MyModel: Name4BA>,<MyModel: Name5BA>]
And I'd like to represent this data similar to the following.
[[ [] , [Name1AB] , [Name2AC] ]
[ [Name4BA, Name5BA] , [] , [] ]
[ [] , [Name3CB] , [] ]]
As you can see the rows would be 'a' in the model; and the columns would be 'b'
I can do this, but it takes a lot of time because in the real database I have a lot of data. I'd like to know if there's a built-in Django way to do this.
I'm doing it like this:
mymodel_list = MyModel.objects.all()
external_list = External.objects.all()
for i in external_list:
    for j in external_list:
        print(mymodel_list.filter(a=i).filter(arrl=j).all(),end='')
    print()
Thanks
There are three ways of doing it, but you will have to research a bit more. The third option may be the most suitable for what you are looking for.
1) Django queries
The reason it is taking a long time is because you are constantly accessing the database in this line:
print(mymodel_list.filter(a=i).filter(arrl=j).all(),end='')
You may have to start by reading what the Django documentation says about working with queries. For what you are doing, you need an algorithm that avoids the repeated filters. Using MyModel.objects.order_by('a') may help you build an efficient algorithm.
2) {% ifchanged ...%} tag
I suppose you are using print to post your answer but you probably need it in html. In that case, you may want to read about the ifchanged tag. It will allow you to build your matrix in html with just one db access.
3) Many to many relations
It seems you are sort of simulating a many to many relation in a very peculiar way. Django has support for many to many relations. You will need an extra field, so you will also have to read this.
Finally, to do what you are trying with just one access to the database, you will need to read about prefetch_related.
There's no built-in way (because what you need is not common). You should write it manually, but I'd recommend retrieving the full dataset (or at least the dataset for one table row) and processing it in Python instead of hitting the DB for each table cell.
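Processing the full dataset in Python could look something like the sketch below. The rows are stand-in (name, a, b) tuples so it runs without Django; in a real project they might come from a single query such as MyModel.objects.values_list(...), but the exact field names depend on your schema and are an assumption here.

```python
from collections import defaultdict

# Stand-in rows as (name, a, b) tuples, matching the example in the question.
rows = [
    ("Name1AB", "A", "B"),
    ("Name2AC", "A", "C"),
    ("Name3CB", "C", "B"),
    ("Name4BA", "B", "A"),
    ("Name5BA", "B", "A"),
]
externals = ["A", "B", "C"]

# One pass over the data: bucket names by their (a, b) pair.
cells = defaultdict(list)
for name, a, b in rows:
    cells[(a, b)].append(name)

# Rows are 'a', columns are 'b'; pairs with no data become empty lists.
grid = [[cells.get((a, b), []) for b in externals] for a in externals]
for line in grid:
    print(line)
```

The work is one pass over the rows plus one pass over the grid, instead of one database query per cell.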
It seems that StringListProperty can only contain strings up to 500 chars each, just like StringProperty...
Is there a way to store longer strings than that? I don't need them to be indexed or anything. What I would need would be something like a "TextListProperty", where each string in the list can be any length and not limited to 500 chars.
Can I create a property like that? Or can you experts suggest a different approach? Perhaps I should use a plain list and pickle/unpickle it in a Blob field, or something like that? I'm a bit new to Python and GAE and I would greatly appreciate some pointers instead of spending days on trial and error...thanks!
Alex already answered long ago, but in case someone else comes along with the same issue:
You'd just make item_type equal to db.Text (as OP mentions in a comment).
Here's a simple example:
from google.appengine.ext import db

class LargeTextList(db.Model):
    large_text_list = db.ListProperty(item_type=db.Text)

# The two methods below presumably belong to a request handler class (not shown).
def post(self):
    # get value from a POST request,
    # split into list using some delimiter
    # add to datastore
    L = self.request.get('large_text_list').split()  # your delimiter here
    LTL = [db.Text(i) for i in L]
    new = LargeTextList()
    new.large_text_list = LTL
    new.put()

def get(self):
    # return one to make sure it's working
    query = LargeTextList.all()
    results = query.fetch(limit=1)
    self.render('index.html',
                {'results': results,
                 'title': 'LargeTextList Example',
                 })
You can use a generic ListProperty with an item_type as you require (str, or unicode, or whatever).
I'm trying to figure out if there's a way to do a somewhat-complex aggregation in Django using its ORM, or if I'm going to have to use extra() to stick in some raw SQL.
Here are my object models (stripped to show just the essentials):
class Submission(models.Model):
    favorite_of = models.ManyToManyField(User, related_name="favorite_submissions")

class Response(models.Model):
    submission = models.ForeignKey(Submission)
    voted_up_by = models.ManyToManyField(User, related_name="voted_up_responses")
What I want to do is sum all the votes for a given submission: that is, all of the votes for any of its responses, and then also including the number of people who marked the submission as a favorite.
I have the first part working using the following code; this returns the total votes for all responses of each submission:
submission_list = Response.objects\
    .values('submission')\
    .annotate(votes=Count('voted_up_by'))\
    .filter(votes__gt=0)\
    .order_by('-votes')[:TOP_NUM]
(So after getting the vote total, I sort in descending order and return the top TOP_NUM submissions, to get a "best of" listing.)
That part works. Is there any way you can suggest to include the number of people who have favorited each submission in its votes? (I'd prefer to avoid extra() for portability, but I'm thinking it may be necessary, and I'm willing to use it.)
EDIT: I realized after reading the suggestions below that I should have been clearer in my description of the problem. The ideal solution would be one that allowed me to sort by total votes (the sum of voted_up_by and favorited) and then pick just the top few, all within the database. If that's not possible then I'm willing to load a few of the fields of each response and do the processing in Python; but since I'll be dealing with 100,000+ records, it'd be nice to avoid that overhead. (Also, to Adam and Dmitry: I'm sorry for the delay in responding!)
One possibility would be to re-arrange your current query slightly. What if you tried something like the following:
submission_list = Response.objects\
    .annotate(votes=Count('voted_up_by'))\
    .filter(votes__gt=0)\
    .order_by('-votes')[:TOP_NUM]
submission_list.query.group_by = ['submission_id']
This will return a queryset of Response objects (objects with the same Submission will be lumped together). In order to access the related submission and/or the favorite_of list/count, you have two options:
num_votes = submission_list[0].votes
submission = submission_list[0].submission
num_favorite = submission.favorite_of.count()
or...
submissions = []
for response in submission_list:
    submission = response.submission
    submission.votes = response.votes
    submissions.append(submission)
num_votes = submissions[0].votes
submission = submissions[0]
num_favorite = submission.favorite_of.count()
Basically the first option has the benefit of still being a queryset, but you have to be sure to access the submission object in order to get any info about the submission (since each object in the queryset is technically a Response). The second option has the benefit of being a list of the submissions with both the favorite_of list as well as the votes, but it is no longer a queryset (so be sure you don't need to alter the query anymore afterwards).
You can count favorites in another query, like this:
favorite_list = Submission.objects.annotate(favorites=Count('favorite_of'))
After that you add the values from two lists:
total_votes = {}
for item in submission_list:
    total_votes[item.submission.id] = item.votes  # 'votes' is the annotation from the query above
for item in favorite_list:
    has_votes = total_votes.get(item.id, 0)
    total_votes[item.id] = has_votes + item.favorites
I am using ids in the dictionary because Submission objects will not be identical. If you need the Submissions themselves, you may use one more dictionary or store a tuple (submission, votes) instead of just votes.
Added: this solution is better than the previous because you have only two DB requests.
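Once both sets of counts are in Python, combining them and picking the top few can be sketched with collections.Counter. The numbers below are made-up sample data keyed by submission id, and TOP_NUM is set to 2 only for the example.

```python
from collections import Counter

# Hypothetical per-submission counts, keyed by submission id.
response_votes = Counter({1: 5, 2: 3, 4: 1})
favorites = Counter({1: 2, 3: 4})

# Adding two Counters sums the counts key by key.
total_votes = response_votes + favorites

TOP_NUM = 2
top = total_votes.most_common(TOP_NUM)
print(top)
```

This still loads every count into memory, so for 100,000+ records it only helps if each of the two queries fetches just the id and its count.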