I am trying to get all the posts in a thread that were created at or before a certain time. How do I get Django to let me run my own queries?
This is the closest I could come using Django's model functions.
# Need all the posts in Thread.post_set that were created at or before post_set[9]
posts = Thread.post_set.filter(created <= Thread.post_set.all()[9].created)
You can use raw SQL like so:
Thread.objects.raw('SELECT ... FROM myapp_thread WHERE ...')
If post_set is a foreign key, then use:
posts = Thread.objects.filter( post_set__created__lt=datetime.date(2013, 5, 10))
If you still want to go with a raw SQL query, as detailed here, please be careful, as no escaping is automatically performed.
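The escaping warning above is usually handled by passing values as query parameters instead of formatting them into the SQL string. Django's raw() accepts a params argument for this. The sketch below shows the same principle with the standard-library sqlite3 module so that it runs without a Django project; the post table, its columns, and the posts_up_to helper are assumptions made up for the illustration.

```python
import sqlite3

# Hypothetical schema standing in for the thread/post models.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE post (id INTEGER PRIMARY KEY, thread_id INTEGER, created TEXT)")
conn.executemany(
    "INSERT INTO post (thread_id, created) VALUES (?, ?)",
    [(1, "2013-05-01"), (1, "2013-05-10"), (1, "2013-05-20"), (2, "2013-05-02")])


def posts_up_to(conn, thread_id, cutoff):
    """Return (id, created) for posts in a thread created at or before cutoff."""
    # The ? placeholders let the driver quote the values. Never build the
    # statement with % formatting or string concatenation.
    sql = ("SELECT id, created FROM post "
           "WHERE thread_id = ? AND created <= ? ORDER BY created")
    return conn.execute(sql, (thread_id, cutoff)).fetchall()


rows = posts_up_to(conn, 1, "2013-05-10")
# A hostile value is treated as data, not as SQL, so it matches nothing.
hostile = posts_up_to(conn, "1 OR 1=1", "2013-05-10")
```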
I was wondering if there is a way in Django to tell if a related field, specifically the "many" part of a one-to-many relationship, has been fetched via, say, prefetch_related() without actually fetching it?
So, as an example, let's say I have these models:
class Question(Model):
    """Class that represents a question."""
class Answer(Model):
    """Class that represents an answer to a question."""
    question = ForeignKey('Question', related_name='answers')
Normally, to get the number of answers for a question, the most efficient way to get this would be to do the following (because the Django docs state that count() is more efficient if you just need a count):
# Note: "question" is an instance of class Question.
answer_count = question.answers.count()
However, in some cases the answers may already have been fetched via a prefetch_related() call (or in some other way, such as having previously iterated through the answers). In situations like that, it would be more efficient to do this (because we'd skip the extra count query):
# Answers were fetched via prefetch_related()
answer_count = len(question.answers.all())
So what I really want to do is something like:
if question.answers_have_been_prefetched:  # Does this exist?
    answer_count = len(question.answers.all())
else:
    answer_count = question.answers.count()
I'm using Django 1.4 if it matters. Thanks in advance.
Edit: added clarification that prefetch_related() isn't the only way the answers could've been fetched.
Yes, Django stores the prefetched results in the _prefetched_objects_cache attribute of the parent model instance.
So you can do something like:
instance = Parent.objects.prefetch_related('children').all()[0]
try:
    instance._prefetched_objects_cache[instance.children.prefetch_cache_name]
    # OK, it's prefetched
    child_count = len(instance.children.all())
except (AttributeError, KeyError):
    # Not prefetched
    child_count = instance.children.count()
See the relevant use in the django source trunk or the equivalent in v1.4.9
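The check above can be wrapped in a small helper. This is only a sketch: related_count is a hypothetical name, _prefetched_objects_cache is a private attribute that may change between Django versions, and falling back to the accessor name when the manager has no prefetch_cache_name is an assumption. FakeManager and FakeQuestion are stand-ins so the example runs without Django.

```python
def related_count(instance, manager_name):
    """Count related objects, reusing the prefetch cache when one exists."""
    manager = getattr(instance, manager_name)
    cache = getattr(instance, '_prefetched_objects_cache', {})
    # Assumption: fall back to the accessor name when the manager does not
    # expose a prefetch_cache_name attribute.
    cache_name = getattr(manager, 'prefetch_cache_name', manager_name)
    if cache_name in cache:
        # Already in memory, so len() costs no query.
        return len(manager.all())
    # Not prefetched: let the database do the counting.
    return manager.count()


# Hypothetical stand-ins that mimic only the attributes used above.
class FakeManager(object):
    prefetch_cache_name = 'answers'

    def __init__(self, rows):
        self.rows = rows
        self.count_queries = 0

    def all(self):
        return list(self.rows)

    def count(self):
        self.count_queries += 1
        return len(self.rows)


class FakeQuestion(object):
    def __init__(self, rows, prefetched):
        self.answers = FakeManager(rows)
        if prefetched:
            self._prefetched_objects_cache = {'answers': rows}


prefetched = FakeQuestion(['a', 'b'], prefetched=True)
plain = FakeQuestion(['a', 'b', 'c'], prefetched=False)
prefetched_total = related_count(prefetched, 'answers')
plain_total = related_count(plain, 'answers')
```

With a real model the call would look like related_count(question, 'answers').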
Say we have the following relationships:
a person can have many email addresses
an email service provider can (obviously) serve multiple email addresses
So, it's a many-to-many relationship. I have three tables: emails, providers, and users. Each email has two foreign keys, one for the provider and one for the user.
Now, given a specific person, I want to print every email provider together with the email address it hosts for this person, if one exists. (If the person does not have an email at Gmail, I still want Gmail to be in the result. Otherwise, I believe an inner join would be enough.)
I figured out how to do this with the following subqueries (following the sqlalchemy tutorial):
email_subq = db.session.query(Emails).\
    filter(Emails.user_id==current_user.id).\
    subquery()
provider_and_email = db.session.query(Provider, email_subq).\
    outerjoin(email_subq, Provider.emails).\
    all()
This works okay (it returns a 4-tuple of (Provider, user_id, provider_id, email_address), all the information that I want), but I later found out this is not using the Flask BaseQuery class, so that pagination provided by Flask-SQLAlchemy does not work. Apparently db.session.query() is not the Flask-SQLAlchemy Query instance.
I tried Emails.query.outerjoin[...], but that returns only the columns of the emails table, whereas I want both the provider info and the emails.
My question: how can I do the same thing with Flask-SQLAlchemy so that I do not have to re-implement pagination that is already there?
I guess the simplest option at this point is to implement my own paginate function, but I'd love to know if there is another proper way of doing this.
I'm not sure this will be the long-term solution, and it does not directly address my concern about not using Flask-SQLAlchemy's BaseQuery, but the simplest way to accomplish what I want is to reimplement the paginate function.
And, in fact, it is pretty easy to use the original Flask-SQLAlchemy routine to do this:
from flask import abort
from flask.ext.sqlalchemy import Pagination
def paginate(query, page, per_page=20, error_out=True):
    if error_out and page < 1:
        abort(404)
    items = query.limit(per_page).offset((page - 1) * per_page).all()
    if not items and page != 1 and error_out:
        abort(404)
    # No need to count if we're on the first page and there are fewer
    # items than we expected.
    if page == 1 and len(items) < per_page:
        total = len(items)
    else:
        total = query.order_by(None).count()
    return Pagination(query, page, per_page, total, items)
Modified from the paginate function found around line 376: https://github.com/mitsuhiko/flask-sqlalchemy/blob/master/flask_sqlalchemy.py
Your question is how to use Flask-SQLAlchemy's Pagination with regular SQLAlchemy queries.
Since Flask-SQLAlchemy's BaseQuery object holds no state of its own, and is derived from SQLAlchemy's Query, and is really just a container for methods, you can use this hack:
from flask.ext.sqlalchemy import BaseQuery
def paginate(sa_query, page, per_page=20, error_out=True):
    sa_query.__class__ = BaseQuery
    # We can now use BaseQuery methods like .paginate on our SA query
    return sa_query.paginate(page, per_page, error_out)
To use:
@route(...)
def provider_and_email_view(page):
    provider_and_email = db.session.query(...)  # any SQLAlchemy query
    paginated_results = paginate(provider_and_email, page)
    return render_template('...', paginated_results=paginated_results)
*Edit:
Please be careful doing this. It's really just a way to avoid copying/pasting the paginate function, as seen in the other answer. Note that BaseQuery has no __init__ method. See How dangerous is setting self.__class__ to something else?.
*Edit2:
If BaseQuery had an __init__, you could construct one using the SA query object, rather than hacking .__class__.
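To see why the .__class__ hack works only when the subclass adds methods and no state, here is a toy illustration in plain Python. Query and PagingQuery are made-up classes for this sketch; they are not SQLAlchemy or Flask-SQLAlchemy classes.

```python
class Query(object):
    """Toy base class that owns all of the instance state."""

    def __init__(self, rows):
        self.rows = rows


class PagingQuery(Query):
    """Adds behaviour only: no __init__ and no extra attributes."""

    def paginate(self, page, per_page):
        start = (page - 1) * per_page
        return self.rows[start:start + per_page]


q = Query(list(range(10)))
# Swap the class in place. This is safe here only because PagingQuery
# needs nothing that Query.__init__ did not already set up.
q.__class__ = PagingQuery
first_page = q.paginate(1, 3)
last_page = q.paginate(4, 3)
```

If PagingQuery defined its own __init__ that set extra attributes, the swapped instance would be missing them, which is the risk the edit above warns about.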
I found a quick fix for this:
provider_and_email = Provider.query.with_entities(email_subq).\
    outerjoin(email_subq, Provider.emails).paginate(page, POST_PER_PAGE_LONG, False)
I'm currently using this approach:
query = BaseQuery([Provider, email_subq], db.session())
to create my own BaseQuery. db is the SQLAlchemy instance.
Update: as @afilbert suggests, you can also do this:
query = BaseQuery(provider_and_email.subquery(), db.session())
How do you initialize your application with SQLAlchemy?
Your current SQLAlchemy connection probably has nothing to do with flask.ext.sqlalchemy, and you are using plain sqlalchemy instead.
Check this tutorial, and check that your imports really come from flask.ext.sqlalchemy:
http://pythonhosted.org/Flask-SQLAlchemy/quickstart.html#a-minimal-application
You can try to paginate the list with results.
my_list = [my_list[i:i + per_page] for i in range(0, len(my_list), per_page)][page]
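The one-liner above builds every page and then indexes with a zero-based page number, raising IndexError past the last page. A minimal sketch of the same idea that slices out only the requested page follows; paginate_list is a hypothetical helper name, and the 1-based page numbering is an assumption chosen to match the paginate functions shown earlier.

```python
def paginate_list(items, page, per_page=20):
    """Return the items on a 1-based page, or an empty list past the end."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    start = (page - 1) * per_page
    # Slicing never raises past the end; it just returns fewer items.
    return items[start:start + per_page]


results = list(range(1, 8))
page_one = paginate_list(results, 1, per_page=3)
page_three = paginate_list(results, 3, per_page=3)
page_four = paginate_list(results, 4, per_page=3)
```

Note that this still loads the whole result list into memory first, so it suits small result sets only.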
I did this and it works:
query = db.session.query(Table1, Table2, ...).filter(...)
if page_size is not None:
    query = query.limit(page_size)
if page is not None:
    query = query.offset(page*page_size)
query = query.all()
I could be wrong, but I think your problem may be the .all(). By using that, you're getting a list, not a query object.
Try leaving it off, and pass your query to the pagination method like so (I left off all the subquery details for clarity's sake):
email_query = db.session.query(Emails).filter_by(**filters)
email_query.paginate(page, per_page)
We know that update() is a thread-safe operation.
That means that when you do:
SomeModel.objects.filter(id=1).update(some_field=100)
Instead of:
sm = SomeModel.objects.get(id=1)
sm.some_field=100
sm.save()
your application is relatively thread-safe, and the operation SomeModel.objects.filter(id=1).update(some_field=100) will not overwrite data in other model fields.
My question: is there any way to do
SomeModel.objects.filter(id=1).update(some_field=100)
but with the object being created if it does not exist?
from django.db import IntegrityError
def update_or_create(model, filter_kwargs, update_kwargs):
    if not model.objects.filter(**filter_kwargs).update(**update_kwargs):
        kwargs = filter_kwargs.copy()
        kwargs.update(update_kwargs)
        try:
            model.objects.create(**kwargs)
        except IntegrityError:
            if not model.objects.filter(**filter_kwargs).update(**update_kwargs):
                raise  # re-raise IntegrityError
I think the code in the question is not a very good demonstration: who wants to set the id of a model by hand?
Let's assume we do need this, and that we have simultaneous operations:
def thread1():
    update_or_create(SomeModel, {'some_unique_field':1}, {'some_field': 1})
def thread2():
    update_or_create(SomeModel, {'some_unique_field':1}, {'some_field': 2})
With the update_or_create function, whichever thread comes first, the object will be created and then updated without an exception. This is thread-safe, but obviously of little use: depending on the race, the value of SomeModel.objects.get(some_unique_field=1).some_field could be 1 or 2.
Django provides F objects, so we can upgrade our code:
from django.db.models import F
def thread1():
    update_or_create(SomeModel,
                     {'some_unique_field':1},
                     {'some_field': F('some_field') + 1})
def thread2():
    update_or_create(SomeModel,
                     {'some_unique_field':1},
                     {'some_field': F('some_field') + 2})
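The update-then-create pattern with a relative update can be sketched at the SQL level, where the increment happens inside a single UPDATE statement. The example uses the standard-library sqlite3 module rather than the Django ORM so that it runs on its own; the counters table and this update_or_create signature are assumptions for the illustration, not the function defined above.

```python
import sqlite3


def update_or_create(conn, key, delta):
    """Add delta to the counter for key, creating the row if it is missing."""
    cur = conn.execute(
        "UPDATE counters SET value = value + ? WHERE key = ?", (delta, key))
    if cur.rowcount:
        return  # the row existed and was changed in one statement
    try:
        conn.execute(
            "INSERT INTO counters (key, value) VALUES (?, ?)", (key, delta))
    except sqlite3.IntegrityError:
        # Another writer inserted the row first, so apply the update again.
        conn.execute(
            "UPDATE counters SET value = value + ? WHERE key = ?", (delta, key))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE counters (key INTEGER PRIMARY KEY, value INTEGER NOT NULL)")
update_or_create(conn, 1, 1)  # creates the row with value 1
update_or_create(conn, 1, 2)  # adds 2 to the existing row
total = conn.execute("SELECT value FROM counters WHERE key = 1").fetchone()[0]
```

Because each call applies a relative change, the final value does not depend on which caller ran first.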
You want Django's select_for_update() method (and a backend that supports row-level locking, such as PostgreSQL) in combination with manual transaction management.
from django.db import IntegrityError, transaction
try:
    with transaction.commit_on_success():
        SomeModel.objects.create(pk=1, some_field=100)
except IntegrityError:  # unique id already exists, so update instead
    with transaction.commit_on_success():
        obj = SomeModel.objects.select_for_update().get(pk=1)
        obj.some_field = 100
        obj.save()
Note that if some other process deletes the object between the two queries, you'll get a SomeModel.DoesNotExist exception.
Django 1.7 and above also has atomic operation support and a built-in update_or_create() method.
You can use Django's built-in get_or_create, but that operates on the model itself, rather than a queryset.
It returns an (object, created) tuple, so you can use it like this:
me, created = SomeModel.objects.get_or_create(id=1)
me.some_field = 100
me.save()
If you have multiple threads, your app will need to determine which instance of the model is correct. Usually what I do is refresh the model from the database, make changes, and then save it, so you don't have a long time in a disconnected state.
Django cannot do such an upsert with update() alone. However, the queryset update() method returns the number of matched rows, so you can do:
from django.db import connections, models, router, transaction
class MySuperManager(models.Manager):
    def _lock_table(self, lock='ACCESS EXCLUSIVE'):
        cursor = connections[router.db_for_write(self.model)].cursor()
        cursor.execute(
            'LOCK TABLE %s IN %s MODE' % (self.model._meta.db_table, lock)
        )
    def create_or_update(self, id, **update_fields):
        with transaction.commit_on_success():
            self._lock_table()
            if not self.get_query_set().filter(id=id).update(**update_fields):
                self.model(id=id, **update_fields).save()
This example is for PostgreSQL. You can use it without the raw SQL, but then the update-or-insert operation will not be atomic. If you lock the table, you can be sure that two threads will not both create the object.
I think that if you have critical requirements for atomic operations, you had better design for them at the database level instead of the Django ORM level.
The Django ORM focuses on convenience rather than performance and safety. Sometimes you have to optimize the automatically generated SQL.
Transactions in most production databases provide locking and rollback well.
In mashup (hybrid) systems, or when your system includes third-party components such as logging or statistics, applications in a different framework or even a different language may access the database at the same time. Adding thread safety in Django alone is not enough in that case.
SomeModel.objects.filter(id=1).update(set__some_field=100)
In django, I'm trying to do something like this:
# if form is valid ...
article = form.save(commit=False)
article.author = req.user
product_name = form.cleaned_data['product_name']
try:
    article.product = Component.objects.get(name=product_name)
except:
    article.product = Component(name=product_name)
article.save()
# do some more form processing ...
But then it tells me:
null value in column "product_id" violates not-null constraint
But I don't understand why this is a problem. When article.save() is called, it should be able to create the product at that point (and generate an id).
I can get around this problem by using this code in the except block:
product = Component(name=product_name)
product.save()
article.product = product
But the reason this concerns me is because if article.save() fails, it will already have created a new component/product. I want them to succeed or fail together.
Is there a nice way to get around this?
The way the Django ManyToManyField works is that it creates an extra table. So say you have two models, ModelA and ModelB. If you did...
ModelA.model_b = models.ManyToManyField(ModelB)
What Django actually does behind the scenes is it creates a table... app_modela_modelb with three columns: id, model_a_id, model_b_id.
Hold that thought in your mind. Regarding the saving of ModelB, Django does not assign it an ID until it's saved. You could technically manually assign it an ID and avoid this problem. It seems you're letting django handle that which is perfectly acceptable.
Django has a problem then doing the M2M. Why? If ModelB doesn't have an id yet, what goes in the model_b_id column on the M2M table? The error for null product_id is more than likely a null constraint error on the M2M field, not the ModelB record id.
If you would like them to "succeed together" or "fail together" perhaps it's time to look into transactions. You, for example, wrap the whole thing in a transaction, and do a rollback in the case of a partial failure. I haven't done a whole lot of work personally in this area so hopefully someone else will be of assistance on that topic.
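The "succeed together or fail together" behaviour that a transaction gives can be sketched as follows. The example uses the standard-library sqlite3 module rather than Django's transaction API so that it runs on its own; the component and article tables and the save_article helper are assumptions modelled loosely on the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE component (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE article ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " product_id INTEGER NOT NULL REFERENCES component(id))")


def save_article(conn, title, product_name):
    """Insert a component and an article that points at it, atomically."""
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        cur = conn.execute(
            "INSERT INTO component (name) VALUES (?)", (product_name,))
        conn.execute(
            "INSERT INTO article (title, product_id) VALUES (?, ?)",
            (title, cur.lastrowid))


try:
    save_article(conn, None, "widget")  # title violates NOT NULL
except sqlite3.IntegrityError:
    pass
components_after_failure = conn.execute(
    "SELECT COUNT(*) FROM component").fetchone()[0]

save_article(conn, "Widget review", "widget")
components_after_success = conn.execute(
    "SELECT COUNT(*) FROM component").fetchone()[0]
```

The failed call leaves no orphaned component row behind, which is the property the question asks for.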
You could get around this by using:
target_product, created_flag = Component.objects.get_or_create(name=product_name)
article.product = target_product
as I'm pretty sure get_or_create() will set the id of an object, if it has to create one.
Alternatively, if you don't mind empty FK relations on the Article table, you could add null=True to the definition.
There's little value in including a code snippet on transactions, as you should read the Django documentation to gain a good understanding.
Say I have 2 models:
class Poll(models.Model):
    category = models.CharField(u"Category", max_length = 64)
    [...]
class Choice(models.Model):
    poll = models.ForeignKey(Poll)
    [...]
Given a Poll object, I can query its choices with:
poll.choice_set.all()
But is there a utility function to query all the choices for a set of Polls?
Actually, I'm looking for something like the following (which is not supported, and I don't see how it could be):
polls = Poll.objects.filter(category = 'foo').select_related('choice_set')
for poll in polls:
    print poll.choice_set.all()  # this shouldn't perform a SQL query at each iteration
I made an (ugly) function to help me achieve that:
def qbind(objects, target_name, model, field_name):
    objects = list(objects)
    objects_dict = dict([(object.id, object) for object in objects])
    for foreign in model.objects.filter(**{field_name + '__in': objects_dict.keys()}):
        id = getattr(foreign, field_name + '_id')
        if id in objects_dict:
            object = objects_dict[id]
            if hasattr(object, target_name):
                getattr(object, target_name).append(foreign)
            else:
                setattr(object, target_name, [foreign])
    return objects
which is used as follows:
polls = Poll.objects.filter(category = 'foo')
polls = qbind(polls, 'choices', Choice, 'poll')
# Now, each object in polls has a 'choices' member with the list of choices.
# This was achieved with 2 SQL queries only.
Is there something easier already provided by Django? Or at least, a snippet doing the same thing in a better way.
How do you handle this problem usually?
Time has passed, and this functionality is now available in Django 1.4 with the introduction of the prefetch_related() QuerySet function. This function effectively does what the suggested qbind function does, i.e., two queries are performed and the join occurs in Python land, but now this is handled by the ORM.
The original query request would now become:
polls = Poll.objects.filter(category = 'foo').prefetch_related('choice_set')
As is shown in the following code sample, the polls QuerySet can be used to obtain all Choice objects per Poll without requiring any further database hits:
for poll in polls:
    for choice in poll.choice_set.all():
        print choice
Update: Since Django 1.4, this feature is built in: see prefetch_related.
First answer: don't waste time writing something like qbind until you've already written a working application, profiled it, and demonstrated that N queries is actually a performance problem for your database and load scenarios.
But maybe you've done that. So second answer: qbind() does what you'll need to do, but it would be more idiomatic if packaged in a custom QuerySet subclass, with an accompanying Manager subclass that returns instances of the custom QuerySet. Ideally you could even make them generic and reusable for any reverse relation. Then you could do something like:
Poll.objects.filter(category='foo').fetch_reverse_relations('choices_set')
For an example of the Manager/QuerySet technique, see this snippet, which solves a similar problem but for the case of Generic Foreign Keys, not reverse relations. It wouldn't be too hard to combine the guts of your qbind() function with the structure shown there to make a really nice solution to your problem.
I think what you're saying is, "I want all Choices for a set of Polls." If so, try this:
polls = Poll.objects.filter(category='foo')
choices = Choice.objects.filter(poll__in=polls)
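The two queries above return the choices as one flat result, so they still have to be matched to their polls in Python. A minimal sketch of that grouping step follows, assuming each choice exposes a poll_id attribute as Django foreign keys do. The namedtuples are stand-ins for model instances so that the example runs without Django, and group_by_parent is a hypothetical helper name.

```python
from collections import defaultdict, namedtuple

# Stand-ins for model instances; real Poll and Choice objects expose the
# same id and poll_id attributes.
Poll = namedtuple('Poll', 'id category')
Choice = namedtuple('Choice', 'id poll_id text')


def group_by_parent(children, key='poll_id'):
    """Map each parent id to the list of children that point at it."""
    grouped = defaultdict(list)
    for child in children:
        grouped[getattr(child, key)].append(child)
    return grouped


polls = [Poll(1, 'foo'), Poll(2, 'foo')]
choices = [Choice(10, 1, 'yes'), Choice(11, 1, 'no'), Choice(12, 2, 'maybe')]
choices_by_poll = group_by_parent(choices)
texts = dict(
    (poll.id, [c.text for c in choices_by_poll[poll.id]]) for poll in polls)
```

A poll without any choices simply maps to an empty list, because the lookup goes through a defaultdict.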
I think what you are trying to do is called "eager loading" of child data, meaning you load the child list (choice_set) for each Poll, but all in the first query to the DB, so that you don't have to make a bunch of queries later on.
If this is correct, then what you are looking for is 'select_related' - see https://docs.djangoproject.com/en/dev/ref/models/querysets/#select-related
I noticed you tried 'select_related' but it didn't work. Can you try doing the 'select_related' and then the filter. That might fix it.
UPDATE: This doesn't work, see comments below.