Same Kind Entities Groups - python

Let's take an example in which I run a blog that automatically updates its posts.
I would like to keep an entity of class(=model) BlogPost in two different "groups", one called "FutureBlogPosts" and one called "PastBlogPosts".
This is a reasonable division that will allow me to work with my blog posts efficiently (query them separately etc.).
Basically, the problem is that the "kind" of my model will always be "BlogPost". So how can I separate it into two different groups?
Here are the options I found so far:
Duplicating the same model class code twice (once as a FutureBlogPost class and once as a PastBlogPost class, so their kinds will be different) -- seems quite ridiculous.
Putting them under different ancestors (FutureBlogPost, "SomeConstantValue", BlogPost, #id), but this method also has its implications (1 write per second?), and the whole ancestor-child relationship doesn't seem to fit here. (And why do I have to use "SomeConstantValue" if I choose that option?)
Using different namespaces -- seems too radical for such a simple separation
What is the right way to do it?

Well, it seems like I finally found the relevant article.
As I understand it, pulling all entities of a specific kind and pulling them by a specific property makes no difference; both require the same kind of work in the background.
(However, querying by a specific full key is still faster.)
So basically, adding a property named "Type" (or any other property you want to use to split your entities into groups) is just as efficient as giving them a distinct kind.
Read more here: https://developers.google.com/appengine/articles/storage_breakdown
As you see, both EntitiesByKind and EntitiesByProperty are nothing but index tables to the original key.
Finally, an answer.

Why not just put a boolean in your "BlogPost" entity, False if it's past and True if it's future? That will let you query them separately easily.
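To illustrate, a minimal sketch of that boolean approach using the old google.appengine.ext.db API (the property and field names here are my own, not from the original post):

    from google.appengine.ext import db

    class BlogPost(db.Model):
        title = db.StringProperty()
        published = db.DateTimeProperty()
        is_future = db.BooleanProperty(default=False)  # the "group" flag

    # Each "group" is just a filter on the property; both hit the same
    # kind of EntitiesByProperty index discussed above.
    future_posts = BlogPost.all().filter('is_future =', True)
    past_posts = BlogPost.all().filter('is_future =', False)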

Related

How to keep multiple QAbstractItemModel classes in sync

I've been working really hard trying to find a solution to this for the past few weeks. I have committed to a direction now, but I am still not entirely satisfied with what I have come up with. Asking this now purely out of curiosity and in the hope of a more proper solution for next time.
How on earth do I keep multiple QAbstractItemModel classes in sync when they refer to the same source data but display it in different ways in their tree views?
One of the main reasons for using model/view is to keep multiple views in sync with one another. However, if each of my views requires different data to be displayed in the same column, as far as I can tell I need to subclass my model into two different models with different implementations that cater to each of those unique views of the same items.
Underlying source items are the same, but data displayed is different. Maybe the flags are different as well, so that the user can only select top level items in one view and then can only select child items in the other view.
I'll try to give an example:
Let's say my TreeItem has three properties: a, b, c.
I have two tree views: TreeView1, TreeView2. Each has two columns.
TreeView1 displays data as follows: column1 -> a, column2 -> b
TreeView2 displays data as follows: column1 -> a, column2 -> c
I then need to create two different models, one for TreeView1 and one for TreeView2, and override the data and flags methods appropriately for each.
Since they are now different models, even though they are both referring to the same TreeItem in the background, they no longer stay in sync. I have to manually call a refresh on TreeView2 whenever I change data on TreeView1, and vice versa.
Consider that column1, or property a, is editable and allows the user to set the name of the TreeItem. Desired behaviour would be for the edit that is done in TreeView1 to instantly be reflected in TreeView2.
I feel like I am missing some important design pattern or something when approaching this. Can anyone out there see where I am going wrong and correct me? Or is this a correct interpretation?
Thanks!
One way to do it is to use viewmodels. Have one QAbstractItemModel adapter to your underlying data model. All interaction must pass through that model. When you need to further adapt the data to a view, simply use a proxy view model class that refers to the adapter above and reformats/adapts the data for a view. All the view models will then be automagically synchronized. They can derive from QAbstractProxyModel, although that's not strictly necessary.
There is no other way of doing it if the underlying source of data doesn't provide change notification both for the contents and for the structure. If the underlying data source provides relevant notifications, it might as well be a QAbstractItemModel to begin with :)
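To make that concrete, here is a minimal PyQt5 sketch of the proxy-viewmodel idea; the custom role and the property names are assumptions for illustration, not from the original post:

    from PyQt5.QtCore import QIdentityProxyModel, Qt

    ROLE_C = Qt.UserRole + 3  # assumed custom role exposing property 'c' on the source model

    class TreeView2Model(QIdentityProxyModel):
        # Same rows and structure as the source; column 2 shows 'c' instead of 'b'.
        def data(self, index, role=Qt.DisplayRole):
            if role == Qt.DisplayRole and index.column() == 1:
                src = self.mapToSource(index)
                return self.sourceModel().data(src.sibling(src.row(), 0), ROLE_C)
            return super().data(index, role)

    # Both views share one underlying adapter model, so an edit made through
    # either view propagates automatically via the proxy's change notifications:
    #     proxy = TreeView2Model()
    #     proxy.setSourceModel(shared_model)
    #     tree_view2.setModel(proxy)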

How to explicitly define the query used in subqueryload_all?

I'm using subqueryload/subqueryload_all pretty heavily, and I've run into the edge case where I tend to need to very explicitly define the query that is used during the subqueryload. For example I have a situation where I have posts and comments. My query looks something like this:
posts_q = db.query(Post).options(subqueryload(Post.comments))
As you can see, I'm loading each Post's comments. The problem is that I don't want all of the posts' comments: I also need to take into account a deleted field, and the comments need to be ordered by creation time, descending. The only way I have observed this being done is by adding options to the relationship() declaration between posts and comments. I would prefer not to do this, because it means that relationship cannot be reused everywhere after that, as I have other places in the app where those constraints may not apply.
What I would love to do, is explicitly define the query that subqueryload/subqueryload_all uses to load the posts' comments. I read about DisjointedEagerLoading here, and it looks like I could simply define a special function that takes in the base query, and a query to load the specified relationship. Is this a good route to take for this situation? Anyone ever run into this edge case before?
The answer is that you can define multiple relationships between Posts and Comments:
class Post(...):
    active_comments = relationship(Comment,
        primaryjoin=and_(Comment.post_id == Post.post_id, Comment.deleted == False),
        order_by=Comment.created.desc())
Then you should be able to subqueryload by that relationship:
posts_q = db.query(Post).options(subqueryload(Post.active_comments))
You can still use the existing .comments relationship elsewhere.
I also had this problem, and it took me some time to realize that this is an issue by design. When you say Post.comments, you refer to the relationship that says "these are all the comments of that post". However, now you want to filter them. If you were to specify that condition somewhere on subqueryload, you would essentially be loading only a subset of values into Post.comments. Thus, there would be values missing, and you would have a faulty representation of your data in the model.
The question here is how to approach this then, because you obviously need this value somewhere. The way I go is building the subquery myself and specifying the special conditions there. That means you get two objects back: the list of posts and the list of comments. That is not a pretty solution, but at least it does not display data in a wrong way. If you were to access Post.comments for some reason, you could safely assume it contains all comments.
But there is room for improvement: you might want to have this attached to your class so you don't carry around two variables. The easy way might be to define a second relationship, e.g. published_comments, which specifies extra parameters. You could then also control that no one writes to it, e.g. with attribute events. In these events you could, instead of forbidding manipulation, handle how manipulation is allowed. The only problem might be updates: when you add a comment to Post.comments, published_comments won't be updated automatically because the two are not aware of each other. Again, I'd use events for this if it is a required feature (though with the above ugly solution you would not have that either).
As a last, hybrid, solution you could take the first approach and then just assign those values to your object, e.g. Post.deleted_comments = deleted_comments.
The thing to keep in mind here is that it is generally not a clever idea to manipulate the query the ORM makes as this could lead to problems later on. I have taken this approach and manipulated the queries (with contains_eager this is easily possible) but it has created problems on some points (while generally being functional) so I dropped that approach.
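A rough sketch of the two-query approach described above, combined with the hybrid idea of assigning the result to the objects (the session and attribute names are assumed from the question, and filtered_comments is an invented ad-hoc attribute):

    from collections import defaultdict

    posts = db.query(Post).all()
    active = (db.query(Comment)
                .filter(Comment.post_id.in_([p.post_id for p in posts]),
                        Comment.deleted == False)
                .order_by(Comment.created.desc())
                .all())

    # Group the filtered comments by post without touching Post.comments,
    # then attach them under a different, unmapped attribute.
    by_post = defaultdict(list)
    for c in active:
        by_post[c.post_id].append(c)
    for p in posts:
        p.filtered_comments = by_post[p.post_id]  # plain attribute, not a relationship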

GAE/P: Dealing with eventual consistency

In my app, I have the following process:
Get a very long list of people
Create an entity for each person
Send an email to each person (step 2 must be completed before step 3 starts)
Because the list of people is very large, I don't want to put them in the same entity group.
In doing step 3, I can query the list of people like this:
Person.all()
Because of eventual consistency, I might miss some people in step 3. What is a good way to ensure that I am not missing anyone in step 3?
Is there a better solution than this?:
while Person.all().count() < N:
    pass

for p in Person.all():
    # do whatever
EDIT:
Another possible solution came to mind. I could create a linked list of the people: I can store a link to the first one, it can link to the second one, and so on. It seems that the performance would be poor, however, because you'd be doing each get separately and wouldn't have the efficiencies of a query.
UPDATE: I reread your post and saw that you don't want to put them all in the same entity group. I'm not sure how to guarantee strong consistency without doing so. You might want to restructure your data so that you don't have to put them all in one entity group, but can split them into several, perhaps depending on some aspect of a group of Person entities (e.g., the mailing list they are on, the type of email being sent, etc.). Does each Person contain only a name and an email address, or are there other properties involved?
Google suggests a few other alternatives:
If your application is likely to encounter heavier write usage, you may need to consider using other means: for example, you might put recent posts in a memcache with an expiration and display a mix of recent posts from the memcache and the Datastore, or you might cache them in a cookie, put some state in the URL, or something else entirely. The goal is to find a caching solution that provides the data for the current user for the period of time in which the user is posting to your application. Remember, if you do a get, a put, or any operation within a transaction, you will always see the most recently written data.
So it looks like you may want to investigate those possibilities, although I'm not sure how well they would translate to what your app needs.
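For what it's worth, a rough sketch of the memcache idea from that quote might look like this (every name here is invented; it only shows the shape of the approach, and the read-modify-write on the cache is not safe under concurrent writers):

    from google.appengine.api import memcache
    from google.appengine.ext import db

    def save_person(person):
        person.put()
        # Remember recently written keys for a short window (invented cache key).
        keys = memcache.get('recent_person_keys') or []
        keys.append(person.key())
        memcache.set('recent_person_keys', keys, time=120)

    def all_people():
        keys = memcache.get('recent_person_keys') or []
        recent = [p for p in db.get(keys) if p is not None]  # key gets are strongly consistent
        seen = set(p.key() for p in recent)
        return recent + [p for p in Person.all() if p.key() not in seen]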
ORIGINAL POST: Use ancestor queries.
From Google's "Structuring Data for Strong Consistency":
To obtain strongly consistent query results, you need to use an ancestor query limiting the results to a single entity group. This works because entity groups are a unit of consistency as well as transactionality. All data operations are applied to the entire group; an ancestor query won't return its results until the entire entity group is up to date. If your application relies on strongly consistent results for certain queries, you may need to take this into consideration when designing your data model. This page discusses best practices for structuring your data to support strong consistency.
So when you create a Person entity, set a parent for it. I believe you could even just have a specific entity be the "parent" of all the others, and it should give you strong consistency. (Although I like to structure my data a bit with ancestors anyway.)
# Gives you the ancestor key
def ancestor_key(kind, id_or_name):
    return db.Key.from_path(kind, id_or_name)

# kind is the db model you're using (should be 'Person' in this case) and
# id_or_name should be the key id or name for the parent
new_person = Person(your_params, parent=ancestor_key('Kind', id_or_name))
You could even do queries at that point for all the entities with the same parent, which is nice. But that should help you get more consistent results regardless.
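For completeness, the strongly consistent query this enables might look like the following (the parent kind and name are made up, and send_email is a placeholder for step 3):

    parent = ancestor_key('PersonGroup', 'default')  # made-up parent kind/name

    # An ancestor query is strongly consistent, so every Person written
    # under this parent is guaranteed to show up before step 3 runs.
    for person in Person.all().ancestor(parent):
        send_email(person)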

In my MapperExtension.create_instance, how can I extract individual row data by column name?

I've got a query that returns a fair number of rows, and have found that:
we wind up throwing away most of the associated ORM instances; and
building up those soon-to-be-thrown-away instances is pretty slow.
So I'd like to build only the instances that I need!
Unfortunately, I can't do this by simply restricting the query; I need to do a fair bit of "business logic" processing on each row before I can tell if I'll throw it out; I can't do this in SQL.
So I was thinking that I could use a MapperExtension to handle this: I'd subclass MapperExtension, and then override create_instance; that method would examine the row data, and either return EXT_CONTINUE if the data is worth building into an instance, or ... something else (I haven't yet decided what) otherwise.
Firstly, does this approach even make sense?
Secondly, if it does make sense, I haven't figured out how to find the data I need in the arguments that get passed to create_instance. I suspect it's in there somewhere, but it's hard to find ... instead of getting a row that directly corresponds to the particular class I'm interested in, I'm getting a row that corresponds to the query that SQLalchemy generated, which happens to be a somewhat complex join between (say) tables A, B, and C.
The problem is that I don't know which elements of the row correspond to the fields in my ORM class: I want to be able to pluck out (e.g.) A.id, B.weight, and C.height.
I assume that somewhere inside the mapper, selectcontext, or class_ arguments is some sort of mapping between columns of my table, and offsets into the row. But I haven't yet found just the right thing. I've come tantalizingly close, though. For example, I've found that selectcontext.statement.columns contains the names of the generated columns ... but not those of the table I'm interested in. For example:
Column(u'A_id', UUID(), ...
...
Column(u'%(32285328 B)s_weight', MSInt(), ...
...
Column(u'%(32285999 C)s_height', MSInt(), ...
So: how do I map column names like C.height to offsets into the row?
The row accepts Column objects as indexes:
row[MyClass.some_element.__clause_element__()]
but that will only get you as far as the classes and aliased() constructs you have access to on the outside. It's very likely that would be all you'd need for that part of the issue (even though ultimately the idea won't work; read on).
If your statement has had subqueries wrapped around it, from using things like from_self() or join() to a polymorphic target, the create_instance() method doesn't give you access to the translation functions you'd need to accomplish that.
If you're trying to get at rows that are linked to an eagerload(), that's totally not something you should be doing. eagerload() is about optimizing the load of collections. If you want your query to join between two tables and you're looking to filter on the joined table, use join().
But above all, create_instance() is from version 0.1 of SQLAlchemy and I doubt anyone uses it for anything, and it has no capability to say, "skip this row". It has to return something or the mapper will create the instance on its own. So no matter how well you can interpret the row, there's no hook for what you want to do here.
If I really wanted to do such a thing, it would likely be easier to monkeypatch the fetchall() method of the returned ResultProxy to filter rows, and send it to Query.instances(). Any result can be sent to this method. Although, if the Query has done translations and such on the mapped selectables, it would need the original QueryContext as well to know how to translate. But this is nothing I'd be bothering with either.
Overall, if speed is so critical of an issue throughout all of this that creating the object is that big of a difference, I'd make it so that I don't need the mapped objects at all for the whole operation, or I'd use caching, or generate the objects I need manually from a result set. I also would make sure that I have access to all the targeted columns in the selectable I'm using so I can re-fetch from result rows, which means I either don't use automatic-subquery/alias generation functions in the ORM, or I use the expression language directly (if you're really hungry for speed and are in the mood to write large tracts of optimizing code, you should probably just be using the expression language).
So the real questions you have to ask here are:
Have you verified that the real difference in speed is creating the object from the row, i.e. not fetching the row, or fetching its columns, etc.?
Does the row just have some expensive columns that you don't need? Have you looked into deferred()? (See the sketch after this list.)
What are these business rules, and why can't they be done in SQL, as stored procedures, etc.?
How many thousands of rows are you really skipping here, that it's so "slow" not to skip them?
Have you investigated techniques for having the objects already present, like in-memory caches, preloads, etc. For many scenarios, this fits the bill.
None of this works, and you really want to hack up some home-rolled optimization code. So why not use the SQL expression language directly? If ultimately you're just dealing with a view layer, result rows are quite friendly (they allow "attribute" style access and such), or build some quick "generate an object" routine from it. The ORM presents a very specific use case of the SQL expression language, and if you really need something much more lightweight than it, you're better off skipping it.
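As a footnote to the deferred() option mentioned in point 2 above, a minimal sketch might look like this (the model and column names are assumed, not from the question):

    from sqlalchemy.orm import defer

    # 'payload' stands in for an expensive column; it is left out of the
    # initial SELECT and only loaded lazily if the attribute is accessed.
    rows = session.query(A).options(defer(A.payload)).all()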

How to handle user management in Django for groups that have the same access but different rules?

Background information:
I have created an internal site for a company. Most of the work has gone into making calculation tools that their salespeople can use to make offers for clients: create PDF offers and contracts that can be downloaded, compare prices, etc. All of this is working fine.
Now their salespeople have been divided into two groups.
One group consists of salespeople hired by the company.
The other group consists of people who are a company themselves (independent sellers).
The question:
My challenge now is that in some cases I need to display different things depending on the type of salesperson. Some of the calculation tools will have different rules as to which numbers are allowed, etc. But a big part of the site will still be the same for both groups.
What I would like to know is whether there is a good way of handling this problem.
My own thoughts:
I thought about managing this by using the groups that are available in contrib.auth. That way I could keep a single code base, but would have to add rules in a lot of different places. Rules for validating forms, to check whether the numbers entered are allowed, will depend on the group the user is in. Some things will have different names, or the workflow might be a bit different. Some tools will only be available to one of the groups. This seems like a quick solution here and now, but if the two groups need to change more and more, it seems like this would quickly become hard to manage.
I also thought about making two different sites. The idea here was to create apps that both groups use, so I would only need to write that code in one place. Then I could make the custom parts for each site and wouldn't need to check for the user in most templates and views. But I'm not sure if this is a good way to go about things. It will create a lot of extra work, and if the two groups can share a lot of the same code, it might not really be needed.
The biggest concern is that I don't really know how this will evolve, so it could end up with the two groups being entirely different, or with only very few differences. What I would like to do is write some code that can support both scenarios, so I won't end up regretting my choice half a year from now.
So, how do you handle this kind of user management? I'm looking for ideas, techniques, or reusable apps that address this problem, not a ready-made solution.
Clarifications:
My issue is not pure presentation that can be done with templates, but also that certain calculation tools (a form that is filled out) will have different rules/validation applied to them, and in some cases the calculations done will also be different. So the two groups might see the same form, but they won't be allowed to enter the same numbers, and the same numbers might not give the same result.
You could use proxy models on the Group and User models that come packed with Django.
Then write your authorization and calculation methods inside the proxy models. If a new group is added later, you only need to add/change the methods inside those two proxy models. Then make every instance of Group and User (obviously only where necessary, not literally every one) use the proxy model instead of the actual contrib model.
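A minimal sketch of that proxy-model idea (the group name and methods are invented for illustration):

    from django.contrib.auth.models import User

    class SalesUser(User):
        class Meta:
            proxy = True  # same database table as User, extra Python behaviour

        @property
        def is_internal(self):
            return self.groups.filter(name='internal_sales').exists()

        def max_offer_discount(self):
            # Keep all group-specific calculation rules in one place.
            return 0.20 if self.is_internal else 0.10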
If I'm understanding you correctly, it seems like you want two different groups to have access to all the same views, but they will see different numbers. You can achieve this effect by making separate templates for the different groups, and then loading the appropriate template in each view depending on the group of the current user.
Similarly you can use a context processor to put the current group into the context for every view, and then put conditionals in the templates to select which numbers to show.
The other option is to have two separate sets of views for the two different groups, and then use decorators on the views to make sure each group only reaches the views that are meant for it.
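For that last option, a hedged sketch of such a decorator using Django's user_passes_test (the group and view names are assumptions):

    from django.contrib.auth.decorators import user_passes_test

    def in_group(group_name):
        return user_passes_test(
            lambda u: u.is_authenticated and u.groups.filter(name=group_name).exists())

    @in_group('internal_sales')
    def internal_offer_view(request):
        ...  # only internally hired salespeople reach this view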
