User-defined derived data in Django - python

How do I let my users apply their own custom formula to a table of data to derive new fields?
I am working on a Django application which is going to store and process a lot of data for subscribed users on the open web. Think 100-10,000 sensor readings in one page request. I am going to be drawing graphs using this data and also showing tables containing it. I expect groups of sensors to be defined by my users, who will have registered themselves on my website (i.e they correspond with a django model).
I would like to allow the user to be able to create fields that are derived from their sensor data (as part of a setup process). For example, the user might know that their average house temperature is (temperature sensor1 + temperature sensor2) / 2 and want to show that on the graph. They might also want something more interesting like solar hot water heated is (temp out - temp in) * flow * conversion constant. I will then save these defined formulas for them and everyone else who views this page of sensor data.
The main question is how do I define the formula at the centre of the system. Do I just have a user-defined string to define the formula (say 100 chars long) and parse it myself - replace the user defined with an input sample and call it toast?
Update
In the end I got just the answer I asked for : A safe way to evaluate a stored user function on the server. Evaluating the same function also on the client when the function is being defined will be a great way to make the UI intuitive.

Depends on who your clients are.
If this is "open to the public" on the WWW, you have to parse expressions yourself. You can use the Python compiler to compile Python syntax. You can also invent your own compiler for a subset of Python syntax. There are lots of examples; start with the ply project.
If this is in-house ("behind the firewall") let the post a piece of Python code and exec that code.
Give them an environment from math import * functionality available.
Fold the following around their supplied line of code:
def userFunc( col1, col2, col3, ... ):
result1= {{ their code goes here }}
return result1
Then you can exec the function definition and use the defined function without bad things happening.
While some folks like to crow that exec is "security problem", it's no more a security problem than user's sharing passwords, and admin's doing intentionally stupid things like deleting important files or turning the power off randomly while your programming is running.
exec is only a security problem if you allow anyone access to it. For in-house applications, you know the users. Train them.

I would work out what operations you want to support [+,-,*,/,(,),etc] and develop client side (javascript) to edit and apply those values to new fields of the data. I don't see the need to do any of this server-side and you will end up with a more responsive and enjoyable user experience as a result.
If you allow the user to save their formulas and re-load them when they revisit the site, you can get their browser to do all the calculations. Just provide some generic methods to add columns of data which are generated by applying one of their forumla's to your data.
I imagine the next step would be to allow them to apply those operations to the newly generated columns.
Have you considered posting their data into a google spreadsheet? This would save a lot of the development work as they already allow you to define formulas etc. and apply it to the data. I'm not too sure of the data limit (how much data you can post and operate on) mind you.

Another user asked a similar question in C.
In that post, Warren suggested that the formula could be parsed and converted from
(a + c) / b
Into reverse polish notation
a c + b /
Which is easier to process.
In this case, you could intercept the formula model's save and generate the postfix notation from the user-defined formula. Once you have postfix notation, it is fairly straightforward to write a loop that evaluates the formula from left to right.
As for implementation in Django, the core question remaining is how to map different input fields into the formula. The simple solution would be a model representing the derived field uses a many-to-many relationship with the symbol name ("a", "b" or "c") defined per-input.
If performance is really critical, you might somehow further pre-process the postfix formula before applying it to the data.

Related

What to do with runtime generated data spanning several classes?

I'm a self-taught programmer and a lot of the problems I encounter come from a lack of formal education (and often also experience).
My question it the following: How to rationalize where you store the data a class or function creates? I'll make a simple example:
Case: I have a webshop (SHOP) with a REST api and a product provider (PROVIDER) also with a REST API. I determine the product, I send that data to PROVIDER who sends me back formatted data that can be read by SHOP to make a working product on the webshop. PROVIDER also has a secondary REST api that provides generated images.
What I would come up with:
I'd make three classes: ProductBase, Shop and Provider
ProductBase would be the class from where I instantiate and store the individual product information.
Shop would be where I design the api interactions with the webshop.
Provider same as shop, but for interactions with provider api.
My problem: At some point you're creating data that's not clearly separated in concern. For example: Would I store the generated product data (from PROVIDER) in the ProductBase instance I created? It feels like I'm coupling the two classes this way. But it not there, then where?
What if I create product images with PROVIDER and I upload them to SHOP? Do I store the uploaded image-url in PRODUCT? How do you keep track of all this info?
The question I want answered:
I've read a lot on OOP and Design Patterns, and I have adopted a TDD approach which has greatly helped to improve my code but I haven't found anything on how to approach the flow of at runtime generated data within software engineering.
What would be a good way to solve above problem(s) and could you explain your rationale for it?
If I understand correctly, I think your current concern is that you have "raw" product data, which you want to store in objects, and you have "processed" (formatted) product data, which you also want to store in objects. Your question being should you mix them.
Let me just first point out the other obvious option. Namely, having two product classes: RawProduct and ProcessedProduct. Which to do?
(Edit: also, to be sure, product data should not be stored in provider. The provide performs the action of formatting but the data is product data. Not provider data).
It depends. There are a couple of considerations:
1) In general, in OOP, the idea is to couple actions on data with the data. So if possible, you have some method in ProductBase like "format()", where format will send the object off to the API to get formatted, and store the result in an instance variable. You can then also have a method like "find_image", that goes and fetches the image url from the API and then stores that in a field. An object's data is meant to be dynamic. It is meant to be altered by object methods.
2) If you need version control (if you want the full history of the object's state to be available), then you can't override fields with new data. So either you need to store a history of every object field in the object, or you need to create new objects.
3) Is RAM a concern? I sometimes create dataclasses that store only the final part of an object's life so that I can fit more of the objects into memory.
Personally I often find myself creating "RawObject" and "ProcessedObject" classes, it's just easier a lot of the time. But that's probably because I mostly work with document processing, so it's very clear. Usually You'll just update the objects data.
A benefit of having one object with the full history is that it is much easier to debug. Because the raw data and the API result are in the same object. So you can very easily probe what went wrong. If you start splitting things up it's harder to track. In general, the more information an object has about where it's been, the easier it is to figure out what went wrong with it.
Remember also though, since this is a Python question, Python is multi-paridigm. And if you're writing pipeline-style architectures (synchronous, linear processes), then a functional approach can also work well.
Once your data is stored in a product object, anything can hold a reference to that. So a shop can reference an object and a product can reference the object. Be clear on the difference between "has-a" relationships and "is-a" relationships.

Django: how to transfer scattered data to a template?

Introduction
In Django, when the data you want to display on a template is included in one object, It's f**** easy. To sum up the steps (that everyone knows actually):
You Write the right method to get your object in your model class
You Call this method in your view, passing the result to the template
You Iterate on the result in the template with a for loop, to display your objects in a table, for example.
Now, let's take a more complex situation
Let's say that the data you want to display is widely spread over different objects of different classes. You need to call many methods to get these data.
Once you call these different methods, you got different variables (unsimilar objects, integers, list of strings, etc.)
Nevertheless, you still want to pass everything to a template and display a pretty table in the end.
The problem is:
If you're passing all the raw objects containing the data you need to your template, it is completely unorganised and you can't iterate on variables in a clean way to get what you need to display your table.
The question is:
How (which structure) and where (models? views?) should I organize my complex data before passing it to a template?
My idea on this (which can be totally wrong):
For each view that need "spread data" to pass to a template, I could create a method (like viewXXX_organize_data()) in views.py, that would take the raws objects and would return a data structure with organized data that would help me to display a table by iterating on it.
About the data structure to choose, I compared lists with dictionaries
dictionaries have key so it's cleaner to call {{dict.a-key-name}} rather than {{ tabl.3}} in the template.
lists can be sorted, so when you need to sort by date the elements you want to display, dictionary is not helpful, arghh, stuck again!
What do you think about all that? Thanks for reading until there, and sharing on this!
With your question you are entering in a conceptual/architectural domain rather than in a "this particular view of the data in my project is hard to represent in the template layer of django". So I will try to give you the birds view (when flying and not on the ground) of the problem and you can decide for yourself.
From the first philosophy box in the django template language documentation it's clearly stated that templates should have as little program logic as possible. This indicates that the representation of the data used in the template should be simple and totally adapted to the template you are trying to build (this is my interpretation of it). This approach indicates that you should have a layer responsible for intermediating the representation of your data (models or other sources) and the data that your template needs to achieve the final representation you want you users to see.
This layer can simple stay in your view, in viewXXX_organize_data, or in some other form respecting to a more complex/elaborated architecture (see DCI or Hexagonal).
In your case I would start by doing something like viewXXX_organize_data() where I would use the most appropriate data structures for the template you are trying to build, while keeping some independence from the way you obtain your data (through models other services etc).
You can even think of not using you model objects directly in the template and creating template specific objects to represent a certain view of the data.
Hope this helps you make a decision. It's not a concrete answer but will help you for sure make a decision and then staying coherent all trough your app.

Best strategy for error handling in an interface to a database and web display

I decided to ask this question after going back and forth 100s of times trying to place error handling routines to optimize data integrity while also taking into account speed and efficiency (and wasting 100s of hours in the process. So here's the setup.
Database -> python classes -> python code -> javascript
MongoDB | that represent | that serves | web interface
the data pages (pyramid)
I want data to be robust, that is the number one requirement. So right now I validate data on the javascript side of the page, but also validate in the python classes which more or so represent data structures. While most server routines run through python classes, sometimes that feel inefficient given that it have to pass through different levels of error checking.
EDIT: I guess I should clarify. I am not looking to unify validation of client and server side code. Sorry for the bad write-up. I'm looking more to figure out where the server side validation should be done. Should it be in the direct interface to the database, or in the web server code where the data is received.
for instance, if I have an object with a barcode, should I validate the barcode in the code that reviews the data through AJAX or should I just call the object's class and validate there?
Again, is there sort of guidelines on how to do error checking in general? I want to be sort of professional, and learn but hopefully not have to go through a whole book.
I am not a software engineer, but I hope those of you who are familiar with complex projects, can tell me where I can find few guidelines on how to model/error check in a situation like this.
I'm not necessarily looking for an answer, but more like pointing me to a short set of guidelines when creating projects with different layers like this. Hopefully not extremely long..
I don't even know what tags to use in the post. HELP!!
Validating on the client and validating on the server serve different purposes entirely. Validating on the server is to make sure your model invariants hold and has to be done to maintain data integrity. Validating on the client is so the user has a friendly error message telling him that his input would've validated data integrity instead of having a traceback blow up into his face.
So there's a subtle difference in that when validating on the server you only really care whether or not the data is valid. On the client you also care, on a finer-grained level, why the input could be invalid. (Another thing that has to be handled at the client is an input format error, i.e. entering characters where a number is expected.)
It is possible to meet in the middle a little. If your model validity constraints are specified declaratively, you can use that metadata to generate some of the client validations, but they're not really sufficient. A good example would be user registration. Commonly you want two password fields, and you want the input in both to match, but the model will only contain one attribute for the password. You might also want to check the password complexity, but it's not necessarily a domain model invariant. (That is, your application will function correctly even if users have weak passwords, and the password complexity policy can change over time without the data integrity breaking.)
Another problem specific to client-side validation is that you often need to express a dependency between the validation checks. I.e. you have a required field that's a number that must be lower than 100. You need to validate that a) the field has a value; b) that the field value is a valid integer; and c) the field value is lower than 100. If any of these checks fails, you want to avoid displaying unnecessary error messages for further checks in the sequence in order to tell the user what his specific mistake was. The model doesn't need to care about that distinction. (Aside: this is where some frameworks fail miserably - either JSF or Spring MVC or either of them first attempts to do data-type conversion from the input strings to the form object properties, and if that fails, they cannot perform any further validations.)
In conclusion, the above implies that if you care about data integrity, and usability, you necessarily have to validate data at least twice, since the validations achieve different purposes even if there's some overlap. Client-side validation will have more checks and more finer-grained checks than the model-layer validation. I wouldn't really try to unify them except where your chosen framework makes it easy. (I don't know about Pyramid - Django makes these concerns separate in that Forms are a different layer than your Models, both can be validated, and they're joined by ModelForms that let you add additional validations to the ones performed by the model.)
Not sure I fully understand your question, but error handling on pymongo can be found here -
http://api.mongodb.org/python/current/api/pymongo/errors.html
Not sure if you're using a particular ORM - the docs have links to what's available, and these individually have their own best usages:
http://api.mongodb.org/python/current/tools.html
Do you have a particular ORM that you're using, or implementing your own through pymongo?

Reverse Search Best Practices?

I'm making an app that has a need for reverse searches. By this, I mean that users of the app will enter search parameters and save them; then, when any new objects get entered onto the system, if they match the existing search parameters that a user has saved, a notification will be sent, etc.
I am having a hard time finding solutions for this type of problem.
I am using Django and thinking of building the searches and pickling them using Q objects as outlined here: http://www.djangozen.com/blog/the-power-of-q
The way I see it, when a new object is entered into the database, I will have to load every single saved query from the db and somehow run it against this one new object to see if it would match that search query... This doesn't seem ideal - has anyone tackled such a problem before?
At the database level, many databases offer 'triggers'.
Another approach is to have timed jobs that periodically fetch all items from the database that have a last-modified date since the last run; then these get filtered and alerts issued. You can perhaps put some of the filtering into the query statement in the database. However, this is a bit trickier if notifications need to be sent if items get deleted.
You can also put triggers manually into the code that submits data to the database, which is perhaps more flexible and certainly doesn't rely on specific features of the database.
A nice way for the triggers and the alerts to communicate is through message queues - queues such as RabbitMQ and other AMQP implementations will scale with your site.
The amount of effort you use to solve this problem is directly related to the number of stored queries you are dealing with.
Over 20 years ago we handled stored queries by treating them as minidocs and indexing them based on all of the must have and may have terms. A new doc's term list was used as a sort of query against this "database of queries" and that built a list of possibly interesting searches to run, and then only those searches were run against the new docs. This may sound convoluted, but when there are more than a few stored queries (say anywhere from 10,000 to 1,000,000 or more) and you have a complex query language that supports a hybrid of Boolean and similarity-based searching, it substantially reduced the number we had to execute as full-on queries -- often no more that 10 or 15 queries.
One thing that helped was that we were in control of the horizontal and the vertical of the whole thing. We used our query parser to build a parse tree and that was used to build the list of must/may have terms we indexed the query under. We warned the customer away from using certain types of wildcards in the stored queries because it could cause an explosion in the number of queries selected.
Update for comment:
Short answer: I don't know for sure.
Longer answer: We were dealing with a custom built text search engine and part of it's query syntax allowed slicing the doc collection in certain ways very efficiently, with special emphasis on date_added. We played a lot of games because we were ingesting 4-10,000,000 new docs a day and running them against up to 1,000,000+ stored queries on a DEC Alphas with 64MB of main memory. (This was in the late 80's/early 90's.)
I'm guessing that filtering on something equivalent to date_added could be done used in combination the date of the last time you ran your queries, or maybe the highest id at last query run time. If you need to re-run the queries against a modified record you could use its id as part of the query.
For me to get any more specific, you're going to have to get a lot more specific about exactly what problem you are trying to solve and the scale of the solution you are trying accomplishing.
If you stored the type(s) of object(s) involved in each stored search as a generic relation, you could add a post-save signal to all involved objects. When the signal fires, it looks up only the searches that involve its object type and runs those. That probably will still run into scaling issues if you have a ton of writes to the db and a lot of saved searches, but it would be a straightforward Django approach.

How to handle user mangement in Django for groups that has same access but different rules?

Background information:
I have created an internal site for a company. Most of the work has gone into making calculation tools that their sale persons can use to make offers for clients. Create pdf offers and contracts that can be downloaded, compare prices etc. All of this is working fine.
Now their sale persons have been divided into two groups.
One group is sale personal that is hired by the company.
The other group is persons a company themselves.
The question:
My challenge now is, that I in some cases need to display different things depending on the type of sales person. Some of the rules for the calculation tools will have different rules as to which numbers will be allowed etc. But a big part of the site will still be the same for both groups.
What I would like to know, is if there is a good way of handling this problem?
My own thoughts:
I thought about managing this by using the groups that is available in contrib.auth. That way I could keep a single code base, but would have to make rules a lot of different places. Rules for validating forms to check if the numbers entered is allowed, will depend on the group the user is in. Some things will have different names, or the workflow might be a bit different. Some tools will only be available to one of the groups. This seems like a quick solution here and now, but if the two groups will need to change more and more, it seems like this would quickly become hard to manage.
I also thought about making two different sites. The idea here was to create apps that both groups use, so I only would need to make the code for that 1 place. Then I could make the custom parts for each site and wouldn't need to check for the user in most templates and views. But I'm not sure if this is a good way to go about things. It will create a lot of extra work, and if the two groups can use a lot of the same code, this might not really be needed.
The biggest concern is that I don't really know how this evolve, so it could end up with the two groups being entire different or with only very few differences. What I would like to do, is write some code that can support both scenarios so I wont end up regretting my choice a half year from now.
So, how do you handle this case of user management. I'm looking for ideas techniques or reusable apps that address this problem, not a ready made solution.
Clarifications:
My issue is not pure presentation that can be done with templates, but also that certain calculation tools (a form that is filled out) will have different rules/validation applied to them, and in some cases the calculations done will also be different. So they might see the same form, but wont be allowed to enter the same numbers, and the same numbers might not give the same result.
you could use proxy models on the Group and User models that come packed with django.
then write your authorization and calculation methods inside the proxy model. if a new group is added later, you only need to add/change the methods inside of those two proxy models. then make every instance of Group and User (obviously only where necessary, not literally every one) find the proxy model instead of the actual contrib model.
If I'm understanding you correctly, it seems like you want to have two different groups have access to all the same views, but they will see different numbers. You can achieve this effect by making separate templates for the different groups, and then loading the appropriate template for each view depending on the group of the current user.
Similarly you can use a context processor to put the current group into the context for every view, and then put conditionals in the templates to select which numbers to show.
The other option is to have two separate sets of views for the two different groups. Then use decorators on the views to make sure the groups only go to the views that are for them.

Categories

Resources