I have some relationships in my database that I describe like this:
@property
def translations(self):
    """
    :return: QuerySet
    """
    if not hasattr(self, '_translations'):
        self._translations = ClientTranslation.objects.filter(base=self)
    return self._translations
The idea behind the hasattr() and self._translations is to hit the db only once, while on subsequent accesses the stored property is returned.
However, after reading the docs, I'm not sure the code is doing that, as queries only hit the db when the values are actually needed - which happens after my code runs.
What would a correct approach look like?
Yes, the DB is hit the first time someone needs the value. But as you pointed out, you are saving the query, not the results. Wrap the query with list(...) to save the results.
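For example, keeping the original property, the change is just the list(...) call (a sketch based on the question's code):

@property
def translations(self):
    if not hasattr(self, '_translations'):
        # list(...) forces the QuerySet to execute now, so the results
        # (not the lazy query) are what gets cached on the instance
        self._translations = list(ClientTranslation.objects.filter(base=self))
    return self._translations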
By the way, you can use the cached_property decorator to make it more elegant. It is not a built-in, though. It can be found here. You end up with:
@cached_property
def translations(self):
    return list(ClientTranslation.objects.filter(base=self))
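For reference, a minimal sketch of how such a decorator typically works (this mirrors Django's django.utils.functional.cached_property rather than whatever the linked recipe does): it computes the value on first access and stores it in the instance's __dict__, so later lookups bypass the descriptor entirely.

class cached_property(object):
    def __init__(self, func):
        self.func = func

    def __get__(self, obj, cls=None):
        if obj is None:
            return self
        # Caching under the method's name shadows this (non-data)
        # descriptor, so the next access reads the value directly
        value = obj.__dict__[self.func.__name__] = self.func(obj)
        return value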
I'm having a hard time changing the default _get_for_dict() method.
This is what my code looks like at the moment:
class ImageProperty(ndb.BlobKeyProperty):
    def _get_for_dict(self, entity):
        value = super(ImageProperty, self)._get_for_dict(entity)
        if value:
            return images.get_serving_url(value)
        else:
            return None
I'm not that familiar with the concepts behind overriding methods, and I'm having trouble with ndb itself...
Basically what I want to do: store my Datastore key as a BlobKeyProperty, but when retrieving the entity as a dict I want to get the image serving URL instead.
Thanks a lot
I haven't tried this, but I think that this would be better as a _from_base_type hook:
class ImageProperty(ndb.BlobKeyProperty):
    def _from_base_type(self, value):
        return images.get_serving_url(value)
If I understand the documentation correctly, this API "stacks", so you don't need to call _from_base_type on the superclass (BlobKeyProperty). I guess ndb handles that for you. Personally, I think this is a bit weird for an API when super seems like it would work just fine ... but that's how it is, I guess.
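For illustration, a hypothetical model using the property might look like the sketch below; note that, if the hook works as described, the conversion applies whenever the stored value is read back (including plain attribute access), not only when building a dict:

class Photo(ndb.Model):  # hypothetical model, purely for illustration
    image = ImageProperty()

photo = photo_key.get()
data = photo.to_dict()  # data['image'] is now the serving URL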
I have a datastore entity with several properties. Each property is updated using a separate method. However, every so often I find that one method overwrites a property it is not modifying with an old value (Null).
For example.
class SomeModel(ndb.Model):
    property1 = ndb.StringProperty()
    property2 = ndb.StringProperty()

def method1(self, entity_key_urlsafe):
    data1 = ndb.Key(urlsafe=entity_key_urlsafe).get()
    data1.property1 = "1"
    data1.put()
The data1 entity now has property1 with a value of "1".
def method2(self, entity_key_urlsafe):
    data1 = ndb.Key(urlsafe=entity_key_urlsafe).get()
    data1.property2 = "2"
    data1.put()
The data1 entity now has property2 with a value of "2".
However, if these methods are run too closely in succession, method2 seems to overwrite property1 with its initial value (Null).
To get around this issue, I've been using the deferred library; however, it's neither reliable (deferred tasks seem to disappear every now and then) nor predictable (the _countdown time seems to be guidance at best) enough.
My question is: Is there a way to only retrieve and modify one property of a datastore entity without overwriting the rest when you call data1.put()? I.e. In the case of method2 - could I only write to property2 without overwriting property1?
The way to prevent such overwrites is to make sure your updates are done inside transactions. With NDB this is really easy - just attach the @ndb.transactional decorator to your methods:
@ndb.transactional
def method1(self, entity_key_urlsafe):
    data1 = ndb.Key(urlsafe=entity_key_urlsafe).get()
    data1.property1 = "1"
    data1.put()
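Applied to the question's second method, the same pattern would be:

@ndb.transactional
def method2(self, entity_key_urlsafe):
    # get() and put() now run atomically; a concurrent commit on the
    # same entity causes this transaction to retry with fresh data
    data1 = ndb.Key(urlsafe=entity_key_urlsafe).get()
    data1.property2 = "2"
    data1.put()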
The documentation on transactions with NDB doesn't give as much background as the (older) DB version, so to familiarise yourself fully with the limitations and options, you should read both.
I say no.
I have never seen a reference to such a feature, nor a trick or a hack for it.
I also think that it would be quite difficult for such an operation to exist.
When you call .put() on an entity, the whole entity is serialised and then written.
An entity is an instance of a model class that you can save to or retrieve from the Datastore.
Imagine if you had a date property with auto_now. What would have to happen then? Which of the two saves should update that property?
Your problem, though, seems to be different: one of your functions commits first, and the other then nullifies its value because it retrieved an outdated copy, not the expected one.
@Greg's answer talks about transactions. You might want to take a look at them.
Transactions are meant for concurrent requests, not so much for succession.
Imagine two users pressing the save button to increase a counter at the same time. That is where transactions work.
@ndb.transactional
def increase_counter(entity_key_urlsafe):
    entity = ndb.Key(urlsafe=entity_key_urlsafe).get()
    entity.counter += 1
    entity.put()
Transactions will ensure that the counter is correct.
The first one that tries to commit the above transaction will succeed, and the later one will have to retry if retries are on (3 by default).
Succession, though, is something different. That said, @Greg and I advise you to change your logic towards using transactions if the problem you want to solve is something like the counter example.
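For completeness, I believe the decorator also accepts a retries argument if you want more than the default number of attempts (a sketch):

@ndb.transactional(retries=5)
def increase_counter(entity_key_urlsafe):
    entity = ndb.Key(urlsafe=entity_key_urlsafe).get()
    entity.counter += 1
    entity.put()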
Say, we have the following relationships:
a person can have many email addresses
an email service provider can (obviously) serve multiple email addresses
So, it's a many-to-many relationship. I have three tables: emails, providers, and users. The emails table has two foreign keys, one for the provider and one for the user.
Now, given a specific person, I want to print all the email providers and, if it exists, the email address each one hosts for this person. (If the person does not have an email address at Gmail, I still want Gmail to be in the result. I believe otherwise a plain inner join would solve this.)
I figured out how to do this with the following subqueries (following the SQLAlchemy tutorial):
email_subq = db.session.query(Emails).\
    filter(Emails.user_id == current_user.id).\
    subquery()

provider_and_email = db.session.query(Provider, email_subq).\
    outerjoin(email_subq, Provider.emails).\
    all()
This works okay (it returns a 4-tuple of (Provider, user_id, provider_id, email_address) - all the information that I want), but I later found out it does not use Flask-SQLAlchemy's BaseQuery class, so the pagination provided by Flask-SQLAlchemy does not work. Apparently db.session.query() does not return a Flask-SQLAlchemy Query instance.
I tried Emails.query.outerjoin(...), but that returns only the columns in the emails table, though I want both the provider info and the emails.
My question: how can I do the same thing with Flask-SQLAlchemy so that I do not have to re-implement pagination that is already there?
I guess the simplest option at this point is to implement my own paginate function, but I'd love to know if there is another proper way of doing this.
I'm not sure if this is going to end up being the long-term solution, and it does not directly address my concern about not using Flask-SQLAlchemy's BaseQuery, but the most trivial workaround for what I want is to reimplement the paginate function.
And, in fact, it is pretty easy to use the original Flask-SQLAlchemy routine to do this:
from flask import abort
from flask.ext.sqlalchemy import Pagination

def paginate(query, page, per_page=20, error_out=True):
    if error_out and page < 1:
        abort(404)
    items = query.limit(per_page).offset((page - 1) * per_page).all()
    if not items and page != 1 and error_out:
        abort(404)
    # No need to count if we're on the first page and there are fewer
    # items than we expected.
    if page == 1 and len(items) < per_page:
        total = len(items)
    else:
        total = query.order_by(None).count()
    return Pagination(query, page, per_page, total, items)
Modified from the paginate function found around line 376: https://github.com/mitsuhiko/flask-sqlalchemy/blob/master/flask_sqlalchemy.py
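Used with the outer-join query from the question (before calling .all() on it), this would look something like the following sketch:

provider_and_email = db.session.query(Provider, email_subq).\
    outerjoin(email_subq, Provider.emails)
page_obj = paginate(provider_and_email, page=1, per_page=20)
items = page_obj.items  # rows for this page
total = page_obj.total  # total row count across all pages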
Your question is how to use Flask-SQLAlchemy's Pagination with regular SQLAlchemy queries.
Since Flask-SQLAlchemy's BaseQuery object holds no state of its own, is derived from SQLAlchemy's Query, and is really just a container for methods, you can use this hack:
from flask.ext.sqlalchemy import BaseQuery

def paginate(sa_query, page, per_page=20, error_out=True):
    sa_query.__class__ = BaseQuery
    # We can now use BaseQuery methods like .paginate on our SA query
    return sa_query.paginate(page, per_page, error_out)
To use:
@route(...)
def provider_and_email_view(page):
    provider_and_email = db.session.query(...)  # any SQLAlchemy query
    paginated_results = paginate(provider_and_email, page)
    return render_template('...', paginated_results=paginated_results)
*Edit:
Please be careful doing this. It's really just a way to avoid copying/pasting the paginate function, as seen in the other answer. Note that BaseQuery has no __init__ method. See How dangerous is setting self.__class__ to something else?.
*Edit2:
If BaseQuery had an __init__, you could construct one using the SA query object, rather than hacking .__class__.
Hey, I have found a quick fix for this. Here it is:
provider_and_email = Provider.query.with_entities(email_subq).\
    outerjoin(email_subq, Provider.emails).paginate(page, POST_PER_PAGE_LONG, False)
I'm currently using this approach to create my own BaseQuery (db is the SQLAlchemy instance):
query = BaseQuery([Provider, email_subq], db.session())
Update: as @afilbert suggests, you can also do this:
query = BaseQuery(provider_and_email.subquery(), db.session())
How do you init your application with SQLAlchemy?
Probably your current SQLAlchemy connection has nothing to do with flask.ext.sqlalchemy and you are using the original sqlalchemy.
Check this tutorial and make sure your imports really come from flask.ext.sqlalchemy:
http://pythonhosted.org/Flask-SQLAlchemy/quickstart.html#a-minimal-application
You can try to paginate the list of results directly:
my_list = [my_list[i:i + per_page] for i in range(0, len(my_list), per_page)][page]
I did this and it works:
query = db.session.query(Table1, Table2, ...).filter(...)
if page_size is not None:
    query = query.limit(page_size)
if page is not None:
    query = query.offset(page * page_size)
query = query.all()
I could be wrong, but I think your problem may be the .all(). By using that, you're getting a list, not a query object.
Try leaving it off, and pass your query to the pagination method like so (I left off all the subquery details for clarity's sake):
email_query = db.session.query(Emails).filter(**filters)
email_query.paginate(page, per_page)
Currently I am developing a class which abstracts SQLAlchemy. This class will act as a helper tool to verify values from the database. It will be used in regression/load tests, and test cases will make hundreds of thousands of database queries. The layout of my class is as follows:
class MyDBClass:
    def __init__(self, dbName):
        self.dbName = dbName
        self.dbEngines = {}
        self.dbMetaData = {}
        self.dbSession = {}
        self.dbEngines[dbName] = create_engine()
        self.dbMetaData[dbName] = MetaData()
        self.dbMetaData[dbName].reflect(bind=self.dbEngines[dbName])
        self.dbSession[dbName] = sessionmaker(bind=self.dbEngines[dbName])

    def QueryFunction(self, dbName, tableName, some arguments):
        session = self.dbSession[dbName]()
        requiredTable = self.dbMetaData[dbName].tables[tableName]
        query = session.query(requiredTable)
        result = query.filter().all()
        session.close()
        return result

    def updateFunction(self, dbName, tableName, some arguments):
        session = self.dbSession[dbName]()
        requiredTable = self.dbMetaData[dbName].tables[tableName]
        session.query(requiredTable).filter().update()
        session.commit()
        session.close()

    def insertFunction(self, dbName, tableName, some arguments):
        connection = self.dbEngines[dbName].connect()
        requiredTable = self.dbMetaData[dbName].tables[tableName]
        connection.execute(requiredTable.insert(values=columnValuePair))
        connection.close()

    def cleanClose(self):
        # Code which will remove the connection/session/object from memory.
        # do some graceful work to clean close.
        pass
I want to write a cleanClose() method which removes any objects that might have been created by this class. This method should remove those objects from memory and provide a clean close. This may also help avoid memory leaks.
I am not able to figure out which objects should be removed from memory. Can someone suggest what method calls I need to make here?
Edit1:
Is there any way I can measure the performance of the different methods and their variants?
I was going through the documentation here and realized that I should not create a session in every method; rather, I should create a single session instance and use it throughout. Please provide your feedback on this, and let me know what the best way of doing things here would be.
Any kind of help will be greatly appreciated here.
To remove objects from memory in Python, you just need to stop referencing them. There is not usually any need to explicitly write or call any methods to destroy or clean up the objects. So, an instance of MyDBClass will be automatically cleaned up when it goes out of scope.
If you are talking about closing down an SQLAlchemy session, then you just need to call the close() method on it.
An SQLAlchemy session is designed for multiple transactions. You don't generally need to create and destroy it multiple times. Create one session in the __init__ function and then use that in QueryFunction, updateFunction, etc.
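A minimal sketch of that restructuring, assuming a single database per instance and a dbUrl parameter for the (elided) connection string:

from sqlalchemy import create_engine, MetaData
from sqlalchemy.orm import sessionmaker

class MyDBClass:
    def __init__(self, dbName, dbUrl):
        self.dbName = dbName
        self.engine = create_engine(dbUrl)
        self.metadata = MetaData()
        self.metadata.reflect(bind=self.engine)
        # One session reused across all methods, instead of one per call
        self.session = sessionmaker(bind=self.engine)()

    def cleanClose(self):
        # Return the session's connection to the pool, then dispose the pool
        self.session.close()
        self.engine.dispose()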
Say I have 2 models:
class Poll(models.Model):
    category = models.CharField(u"Category", max_length=64)
    [...]

class Choice(models.Model):
    poll = models.ForeignKey(Poll)
    [...]
Given a Poll object, I can query its choices with:
poll.choice_set.all()
But is there a utility function to query all choices from a set of Polls?
Actually, I'm looking for something like the following (which is not supported, and I don't see how it could be):
polls = Poll.objects.filter(category='foo').select_related('choice_set')

for poll in polls:
    print poll.choice_set.all()  # this shouldn't perform a SQL query at each iteration
I made an (ugly) function to help me achieve that:
def qbind(objects, target_name, model, field_name):
    objects = list(objects)
    objects_dict = dict([(object.id, object) for object in objects])
    for foreign in model.objects.filter(**{field_name + '__in': objects_dict.keys()}):
        id = getattr(foreign, field_name + '_id')
        if id in objects_dict:
            object = objects_dict[id]
            if hasattr(object, target_name):
                getattr(object, target_name).append(foreign)
            else:
                setattr(object, target_name, [foreign])
    return objects
which is used as follows:
polls = Poll.objects.filter(category='foo')
polls = qbind(polls, 'choices', Choice, 'poll')
# Now, each object in polls has a 'choices' member with the list of choices.
# This was achieved with 2 SQL queries only.
Is there something easier already provided by Django? Or at least, a snippet doing the same thing in a better way.
How do you handle this problem usually?
Time has passed and this functionality is now available in Django 1.4 with the introduction of the prefetch_related() QuerySet method. This method effectively does what the suggested qbind function performs, i.e. two queries are run and the join occurs in Python land, but now this is handled by the ORM.
The original query request would now become:
polls = Poll.objects.filter(category = 'foo').prefetch_related('choice_set')
As is shown in the following code sample, the polls QuerySet can be used to obtain all Choice objects per Poll without requiring any further database hits:
for poll in polls:
    for choice in poll.choice_set.all():  # .all() hits the prefetch cache, not the DB
        print choice
Update: Since Django 1.4, this feature is built in: see prefetch_related.
First answer: don't waste time writing something like qbind until you've already written a working application, profiled it, and demonstrated that N queries is actually a performance problem for your database and load scenarios.
But maybe you've done that. So second answer: qbind() does what you'll need to do, but it would be more idiomatic if packaged in a custom QuerySet subclass, with an accompanying Manager subclass that returns instances of the custom QuerySet. Ideally you could even make them generic and reusable for any reverse relation. Then you could do something like:
Poll.objects.filter(category='foo').fetch_reverse_relations('choices_set')
For an example of the Manager/QuerySet technique, see this snippet, which solves a similar problem but for the case of Generic Foreign Keys, not reverse relations. It wouldn't be too hard to combine the guts of your qbind() function with the structure shown there to make a really nice solution to your problem.
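A rough sketch of that packaging, using hypothetical names and the pre-Django-1.6 get_query_set spelling; the fully generic version would need the related model and field passed in (or derived from the relation name), and note that qbind() returns a plain list, so this call has to come last in the chain:

from django.db import models
from django.db.models.query import QuerySet

class ReverseRelationQuerySet(QuerySet):
    def fetch_reverse_relations(self, target_name, model, field_name):
        # Delegates to the question's qbind(): two queries total,
        # joined in Python. Returns a list, not a QuerySet.
        return qbind(self, target_name, model, field_name)

class PollManager(models.Manager):
    def get_query_set(self):
        return ReverseRelationQuerySet(self.model, using=self._db)

With objects = PollManager() on Poll, usage would be:

polls = Poll.objects.filter(category='foo').fetch_reverse_relations('choices', Choice, 'poll')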
I think what you're saying is, "I want all Choices for a set of Polls." If so, try this:
polls = Poll.objects.filter(category='foo')
choices = Choice.objects.filter(poll__in=polls)
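If you then need the choices grouped per poll in Python, a small regrouping step on top of those two queries works (a sketch):

from collections import defaultdict

choices_by_poll = defaultdict(list)
for choice in choices:
    choices_by_poll[choice.poll_id].append(choice)

for poll in polls:
    print poll, choices_by_poll[poll.id]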
I think what you are trying to do is known as "eager loading" of child data, meaning you load the child list (choice_set) for each Poll, but all in the first query to the DB, so that you don't have to make a bunch of queries later on.
If this is correct, then what you are looking for is 'select_related' - see https://docs.djangoproject.com/en/dev/ref/models/querysets/#select-related
I noticed you tried 'select_related' but it didn't work. Can you try doing the 'select_related' and then the filter? That might fix it.
UPDATE: This doesn't work, see comments below.