How to validate, store and execute user rules?

How to validate, store and execute user rules? - python

In my application:
there are users.
emails come in to the system.
users can use a web browser to define filter rules for their inbound emails.
Filter rules are of the general form "if field operator value/regex then do action X with parameter Y"
I need to:
validate the user rules (which will be created by the user with
an HTML page of dropdowns and text fields)
store the user rules in some appropriate format (maybe as Python
source?)
execute the rules within my Python processing script.
My question is:
is this is a "solved problem"? I want to avoid reinventing the wheel and it sure seems like this or something conceptually very similar must have been done before. Maybe there's even some sort of open source library or application that already does it.
If it is not a solved problem, can anyone suggest an effective way to validate, store and execute the rules? I'm guessing somehow it makes sense for the rules to be Python statements but I sure as hell don't want to write a language syntax parser.
UPDATE: Although the users will be very technical/likely developers, I still want the entry of rules to be tightly constrained to the general form specified above. very similar to the email rules you probably have in your email client right now. For example "if subject contains "subscribe" then send reply email to fromaddress"
UPDATE 2: The filtering rules will not only be on the email fields but also on other non email stuff, for example, the content of an attachment, or evaluation of a custom field that relates to the email but is not part of the email, generated by my system.
Any input valued.
I'm using Python 3.

Related

Python Advice for Project: Taking input and writing to existing PDF?? WebApp?

I am somewhat of a beginner at Python and I am currently starting some brainstorming and planning for a project to simplify the tedious task of filling out a product order form for a friend's business. I am wanting to create a program with an interface that takes in input and writes to an already existing pdf form of the physical order form. I would also like to implement being able to then email that form to another coworker and having accessible information of previous orders from the current customer ordering.
I am most curious about how to write to an already existing pdf form and just fill in the blanks, potentially using a PDF Reader? Also, for product catalog data and customer order history, would using dictionaries suffice or would it be better to use some python database?
I know I could figure out how to individually do each of those tasks but I don't know how to tie it all together properly to distribute it, either making a WebApp or an executable file from the script and just use tkinter or another GUI library for the interface, or if there is a more obvious and convenient option?
If I were to do a WebApp, what would be my best option, I've seen options such as Anvil, Flask, Django, and I just don't know what would be best for what I'm trying to accomplish.
I know this is a longshot and vague but any advice or guidance in the right direction would be much appreciated!

I'd go with a web app here. Personally, I prefer Django, and I don't think that it would be too difficult to learn it well enough for your purposes!
Regarding saving product catalog data and customer order history: I would definitely recommend using a database for your backend, which Django helps you do! Database integration is almost ridiculously easy using Django.
Here's some resources for the specific things you asked about:
Django vs. Flask
Working with HTML forms (Django) (this will help with taking user input)
filling an existing PDF form in Python (this will help with using user input to fill out the pdfs)
How to send email attachments in Python? (this will help you email your coworker the filled out pdf)
If you need more help/guidance, definitely feel free to follow up!

Best strategy for error handling in an interface to a database and web display

I decided to ask this question after going back and forth 100s of times trying to place error handling routines to optimize data integrity while also taking into account speed and efficiency (and wasting 100s of hours in the process. So here's the setup.
Database -> python classes -> python code -> javascript
MongoDB | that represent | that serves | web interface
the data pages (pyramid)
I want data to be robust, that is the number one requirement. So right now I validate data on the javascript side of the page, but also validate in the python classes which more or so represent data structures. While most server routines run through python classes, sometimes that feel inefficient given that it have to pass through different levels of error checking.
EDIT: I guess I should clarify. I am not looking to unify validation of client and server side code. Sorry for the bad write-up. I'm looking more to figure out where the server side validation should be done. Should it be in the direct interface to the database, or in the web server code where the data is received.
for instance, if I have an object with a barcode, should I validate the barcode in the code that reviews the data through AJAX or should I just call the object's class and validate there?
Again, is there sort of guidelines on how to do error checking in general? I want to be sort of professional, and learn but hopefully not have to go through a whole book.
I am not a software engineer, but I hope those of you who are familiar with complex projects, can tell me where I can find few guidelines on how to model/error check in a situation like this.
I'm not necessarily looking for an answer, but more like pointing me to a short set of guidelines when creating projects with different layers like this. Hopefully not extremely long..
I don't even know what tags to use in the post. HELP!!

Validating on the client and validating on the server serve different purposes entirely. Validating on the server is to make sure your model invariants hold and has to be done to maintain data integrity. Validating on the client is so the user has a friendly error message telling him that his input would've validated data integrity instead of having a traceback blow up into his face.
So there's a subtle difference in that when validating on the server you only really care whether or not the data is valid. On the client you also care, on a finer-grained level, why the input could be invalid. (Another thing that has to be handled at the client is an input format error, i.e. entering characters where a number is expected.)
It is possible to meet in the middle a little. If your model validity constraints are specified declaratively, you can use that metadata to generate some of the client validations, but they're not really sufficient. A good example would be user registration. Commonly you want two password fields, and you want the input in both to match, but the model will only contain one attribute for the password. You might also want to check the password complexity, but it's not necessarily a domain model invariant. (That is, your application will function correctly even if users have weak passwords, and the password complexity policy can change over time without the data integrity breaking.)
Another problem specific to client-side validation is that you often need to express a dependency between the validation checks. I.e. you have a required field that's a number that must be lower than 100. You need to validate that a) the field has a value; b) that the field value is a valid integer; and c) the field value is lower than 100. If any of these checks fails, you want to avoid displaying unnecessary error messages for further checks in the sequence in order to tell the user what his specific mistake was. The model doesn't need to care about that distinction. (Aside: this is where some frameworks fail miserably - either JSF or Spring MVC or either of them first attempts to do data-type conversion from the input strings to the form object properties, and if that fails, they cannot perform any further validations.)
In conclusion, the above implies that if you care about data integrity, and usability, you necessarily have to validate data at least twice, since the validations achieve different purposes even if there's some overlap. Client-side validation will have more checks and more finer-grained checks than the model-layer validation. I wouldn't really try to unify them except where your chosen framework makes it easy. (I don't know about Pyramid - Django makes these concerns separate in that Forms are a different layer than your Models, both can be validated, and they're joined by ModelForms that let you add additional validations to the ones performed by the model.)

Not sure I fully understand your question, but error handling on pymongo can be found here -
http://api.mongodb.org/python/current/api/pymongo/errors.html
Not sure if you're using a particular ORM - the docs have links to what's available, and these individually have their own best usages:
http://api.mongodb.org/python/current/tools.html
Do you have a particular ORM that you're using, or implementing your own through pymongo?

Query regarding data entry form

I am curious about something I encountered when I was registering on the wakari website. I entered my username which was something like abc.def.ghi and all other information and submitted the form ( or at least tried to submit! ). It threw up an error which said "username must be a valid python variable", so they were obviously doing something in their back-end with usernames as python variables. Would anyone explain to me if this is some sort of design scheme that they are using wherein they store user information as python variables or something like that. Again I apologize since this is not really a specific programming question but this is eating me up and I must know why that happened.
The following is the URL:
https://www.wakari.io/usermgmt/loginorregister

This is pure conjecture. One thing I could see wakiri doing is using the usernames as a module name for your code. That might be interesting. So storing user code as wakiri.<username>. Then the application might be doing an import wakiri.<username> with some interesting stuff in the __init__.py that runs whatever it finds.
Maybe that's it. Or maybe they are storing user code in files on disk. Maybe user code is written out to a file that contains lots of dictionaries that contain code and are named after the username?
Maybe they aren't even using it and just think it is cute to restrict people to valid Python variables.

I'm a Wakari developer, and we've only just caught this question. The short version is that you are pretty safe with a valid UNIX username, and the "error" text should say something using better "plain english" to this effect.
The reason we say the username needs to be a valid Python module name is that we're imagining a day when users could have something like ~/public_python as a place to put directly-shareable code, and then other users could access this via something like from wakari.users import steve. We'd leave it up to you to figure out if you trust user steve enough to import his code directly.

design for handling exceptions - google app engine

I'm developing a project on google app engine (webapp framework). I need you people to assess how I handle exceptions.
There are 4 types of exceptions I am handling:
Programming exceptions
Bad user input
Incorrect URLs
Incorrect query strings
Here is how I handle them:
I have subclassed the webapp.requesthandler class and overrode the handle_exceptions method. Whenever an exception occurs, I take the user to a friendly "we're sorry" page and in the meantime send a message with the traceback to the admins.
On the client side I (will) use js and also validate on the server side.
Here I figure (as a coder with non-web experience) in addition to validate inputs according to programming logic (check: cash input is of the float type?) and business rules (check: user has enough points to take that action?), I also have to check against malicious intentions. What measures should I take against malicious actions?
I have a catch-all URL that handles incorrect URLs. That is to say, I take the user to a custom "page does not exist" page. Here I have no problems, I think.
Incorrect query strings presumably raise exceptions if left to themselves. If an ID does not exist, the method returns None (an exception is on the way). if the parameter is inconvenient, the code raises an exception. Here I think I must raise a 404 and take the user to the custom "page does not exist" page. What should I do?
What are your opinions? Thanks in advance..

You seem to have thought things through pretty well. The only thing I would add is that you might want to take a look at Bloog as an example. Bloog is a pretty well written and popular open source blog engine for App Engine written in Python.
Also, on Point #2, watch out for these types of Cross Scripting attacks.
As for #4, keep in mind that 404 pages are an opportunity to add some color and creativity to your design.

Ad. #4: I usually treat query strings as non-essential. If anything is wrong with query string, I'd just present bare resource page (as if no query was present), possibly with some information to user what was wrong with the query string.
This leads to the problem similar to your #3: how did the user got into this wrong query? Did my application produce wrong URL somewhere? Or was it outdated link in some external service, or saved bookmark? HTTP_REFERER might contain some clue, but of course is not authoritative, so I'd log the problematic query (with some additional HTTP headers) and try to investigate the case.

How do I prevent execution of arbitrary commands from a Django app making system calls?

I have a Django application I'm developing that must make a system call to an external program on the server. In creating the command for the system call, the application takes values from a form and uses them as parameters for the call. I suppose this means that one can essentially use bogus parameters and write arbitrary commands for the shell to execute (e.g., just place a semicolon and then rm -rf *).
This is bad. While most users aren't malicious, it is a potential security problem. How does one handle these potential points of exploit?
EDIT (for clarification): The users will see a form that is split up with various fields for each of the parameters and options. However some fields will be available as open text fields. All of these fields are combined and fed to subprocess.check_call(). Technically, though, this isn't separated too far from just handing the users a command prompt. This has got to be fairly common, so what do other developers do to sanitize input so that they don't get a Bobby Tables.

Based on my understanding of the question, I'm assuming you aren't letting the users specify commands to run on the shell, but just arguments to those commands. In this case, you can avoid shell injection attacks by using the subprocess module and not using the shell (i.e. specify use the default shell=False parameter in the subprocess.Popen constructor.
Oh, and never use os.system() for any strings containing any input coming from a user.

By never trusting users. Any data coming from the web browser should be considered tainted. And absolutely do not try to validate the data via JS or by limiting what can be entered in the FORM fields. You need to do the tests on the server before passing it to your external application.
Update after your edit: no matter how you present the form to users on your front-end the backend should treat it as though it came from a set of text boxes with big flashing text around them saying "insert whatever you want here!"

To do this, you must do the following. If you don't know what "options" and "arguments" are, read the optparse background.
Each "Command" or "Request" is actually an instance of a model. Define your Request model with all of the parameters someone might provide.
For simple options, you must provide a field with a specific list of CHOICES. For options that are "on" or "off" (-x in the command-line) you should provide a CHOICE list with two human-understandable values ("Do X" and "Do not do X".)
For options with a value, you must provide a field that takes the option's value. You must write a Form with the validation for this field. We'll return to option value validation in a bit.
For arguments, you have a second Model (with an FK to the first). This may be as simple as a single FilePath field, or may be more complex. Again, you may have to provide a Form to validate instances of this Model, also.
Option validation varies by what kind of option it is. You must narrow the acceptable values to be narrowest possible set of characters and write a parser that is absolutely sure of passing only valid characters.
Your options will fall into the same categories as the option types in optparse -- string, int, long, choice, float and complex. Note that int, long, float and complex have validation rules already defined by Django's Models and Forms. Choice is a special kind of string, already supported by Django's Models and Forms.
What's left are "strings". Define the allowed strings. Write a regex for those strings. Validate using the regex. Most of the time, you can never accept quotes (", ' or `) in any form.
Final step. Your Model has a method which emits the command as a sequence of strings all ready for subprocess.Popen.
Edit
This is the backbone of our app. It's so common, we have a single Model with numerous Forms, each for a special batch command that gets run. The Model is pretty generic. The Forms are pretty specific ways to build the Model object. That's the way Django is designed to work, and it helps to fit with Django's well-thought-out design patterns.
Any field that is "available as open text fields" is a mistake. Each field that's "open" must have a regex to specify what is permitted. If you can't formalize a regex, you have to rethink what you're doing.
A field that cannot be constrained with a regex absolutely cannot be a command-line parameter. Period. It must be stored to a file to database column before being used.
Edit
Like this.
class MySubprocessCommandClass( models.Model ):
myOption_1 = models.CharField( choice = OPTION_1_CHOICES, max_length=2 )
myOption_2 = models.CharField( max_length=20 )
etc.
def theCommand( self ):
return [ "theCommand", "-p", self.myOption_1, "-r", self.myOption_2, etc. ]
Your form is a ModelForm for this Model.
You don't have to save() the instances of the model. We save them so that we can create a log of precisely what was run.

The answer is, don't let users type in shell commands! There is no excuse for allowing arbitrary shell commands to be executed.
Also, if you really must allow users to supply arguments to external commands, don't use a shell. In C, you could use execvp() to supply arguments directly to the command, but with django, I'm not sure how you would do this (but I'm sure there is a way). Of course, you should still do some argument sanitation, especially if the command has the potential to cause any harm.

Depending on the range of your commands, you could customize the form, so that parameters are entered in separate form-fields. Those you can parse for fitting values more easily.
Also, beware of backticks and other shell specific stuff.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.