design for handling exceptions - google app engine - python

I'm developing a project on google app engine (webapp framework). I need you people to assess how I handle exceptions.
There are 4 types of exceptions I am handling:
Programming exceptions
Bad user input
Incorrect URLs
Incorrect query strings
Here is how I handle them:
I have subclassed the webapp.requesthandler class and overrode the handle_exceptions method. Whenever an exception occurs, I take the user to a friendly "we're sorry" page and in the meantime send a message with the traceback to the admins.
On the client side I (will) use js and also validate on the server side.
Here I figure (as a coder with non-web experience) in addition to validate inputs according to programming logic (check: cash input is of the float type?) and business rules (check: user has enough points to take that action?), I also have to check against malicious intentions. What measures should I take against malicious actions?
I have a catch-all URL that handles incorrect URLs. That is to say, I take the user to a custom "page does not exist" page. Here I have no problems, I think.
Incorrect query strings presumably raise exceptions if left to themselves. If an ID does not exist, the method returns None (an exception is on the way). if the parameter is inconvenient, the code raises an exception. Here I think I must raise a 404 and take the user to the custom "page does not exist" page. What should I do?
What are your opinions? Thanks in advance..

You seem to have thought things through pretty well. The only thing I would add is that you might want to take a look at Bloog as an example. Bloog is a pretty well written and popular open source blog engine for App Engine written in Python.
Also, on Point #2, watch out for these types of Cross Scripting attacks.
As for #4, keep in mind that 404 pages are an opportunity to add some color and creativity to your design.

Ad. #4: I usually treat query strings as non-essential. If anything is wrong with query string, I'd just present bare resource page (as if no query was present), possibly with some information to user what was wrong with the query string.
This leads to the problem similar to your #3: how did the user got into this wrong query? Did my application produce wrong URL somewhere? Or was it outdated link in some external service, or saved bookmark? HTTP_REFERER might contain some clue, but of course is not authoritative, so I'd log the problematic query (with some additional HTTP headers) and try to investigate the case.

Related

GET vs POST API for validating docker image is allowed

Unsure of whether to use a GET or POST for this situation. I'm creating an API that will return whether a docker image is compliant or not. What would be the better approach?
Scenario 1:
GET: https://<hostname>/api/checkImage?image=nginx:latest
Scenario 2
GET: https://<hostname>/api/checkImage/nginx:latest
Scenario 3
POST: https://<hostname>/api/checkImage
Payload: {"image": "nginx"}
Obviously would need to url encode the colon, but all of the above scenarios would return:
{"allowed": false}
The important semantic concept to understand is safe
Request methods are considered "safe" if their defined semantics are essentially read-only
Of the request methods defined by this specification, the GET, HEAD, OPTIONS, and TRACE methods are defined to be safe.
So in your case, the resource is something like "the compliance report for the image"; and the GET request asks that the server provide the current representation of the report. The server might choose to provide a previously cached representation, or it might generate a new compliance report and return that -- but those are implementation details unrelated to the semantics of the request.
On the other hand, if we want the server to update its representation of the resource to something more recent, then we would likely use POST, rather than GET, because we're trying to induce a change and we don't want a general-purpose cache to return a stale representation of the report.
From the perspective of the client, it might look like
GET /api/checkImage?image=nginx:latest
Oh, this representation is too old, I need something recent
POST /api/checkImage?image=nginx:latest

Wait for datastore changes before redirecting

Very similar to this question, except that the answer is not suitable.
I populate a table from a datastore query, then there is a link allowing the user to delete a specific row. Clicking the link goes to a url that deletes the row from the datastore then redirects back to the table.
Changes more often than not aren't shown in the table until reloading again.
Easy solution is to redirect to another page, that uses a javascript redirect to add a delay of a couple of seconds. Other alternative is to send details back to the page like action=delete&key=### and then make sure that item is missed from the table. That's a pain though.
The answer is with ancestor queries.
https://cloud.google.com/appengine/docs/python/datastore/queries#Python_Ancestor_queries
Create the entities with a parent. When one of the entities is deleted, you can run an ancestor query for your table list view which will have strong consistency when data is changed.
Example ancestor query:
tom = Person(key_name='Tom')
photo_query = Photo.all()
photo_query.ancestor(tom)
With the datastore, unless you can use ancestors, you cant guarantee when the indexes will be updated, only the entity itself (for getting by key later) by doing a put without async. Best is a combination of your suggestion where the client takes into account its action to patch the ui, plus maybe using memcache to remember recent actions and patch queries server-side before returning to client.
Here is a different approach. Use Javascript and AJAX. When the user clicks a link, you do two things:
Use Javascript/jQuery to remove the row from the DOM, and
Send an AJAX call to the server to do the appropriate datastore modifications.
It makes for a nice user experience because you are not reloading the page at all.
You might want to consider that there is always room for displaying outdated info in the table: for example displaying the table simultaneously in 2 different windows/tabs, then in one of them deletion is performed, the other will still display a delete link which will cause a 404 if followed.
With this in mind I'd 1st focus on managing the expectations (the user should know that the page may occasionally display outdated info) and then on the user's ability to get an outdated page in sync (refresh button?). Which might make the issue moot.
The delay-based "solutions" are bound to fail sooner or later in race condition scenarios, I wouldn't bother with the extra complexity. Especially for the document where the deletion is done: that's exactly where the user knows that the info is outdated (for free) and would be likely inclined to refresh until the recent change becomes visible.

Best strategy for error handling in an interface to a database and web display

I decided to ask this question after going back and forth 100s of times trying to place error handling routines to optimize data integrity while also taking into account speed and efficiency (and wasting 100s of hours in the process. So here's the setup.
Database -> python classes -> python code -> javascript
MongoDB | that represent | that serves | web interface
the data pages (pyramid)
I want data to be robust, that is the number one requirement. So right now I validate data on the javascript side of the page, but also validate in the python classes which more or so represent data structures. While most server routines run through python classes, sometimes that feel inefficient given that it have to pass through different levels of error checking.
EDIT: I guess I should clarify. I am not looking to unify validation of client and server side code. Sorry for the bad write-up. I'm looking more to figure out where the server side validation should be done. Should it be in the direct interface to the database, or in the web server code where the data is received.
for instance, if I have an object with a barcode, should I validate the barcode in the code that reviews the data through AJAX or should I just call the object's class and validate there?
Again, is there sort of guidelines on how to do error checking in general? I want to be sort of professional, and learn but hopefully not have to go through a whole book.
I am not a software engineer, but I hope those of you who are familiar with complex projects, can tell me where I can find few guidelines on how to model/error check in a situation like this.
I'm not necessarily looking for an answer, but more like pointing me to a short set of guidelines when creating projects with different layers like this. Hopefully not extremely long..
I don't even know what tags to use in the post. HELP!!
Validating on the client and validating on the server serve different purposes entirely. Validating on the server is to make sure your model invariants hold and has to be done to maintain data integrity. Validating on the client is so the user has a friendly error message telling him that his input would've validated data integrity instead of having a traceback blow up into his face.
So there's a subtle difference in that when validating on the server you only really care whether or not the data is valid. On the client you also care, on a finer-grained level, why the input could be invalid. (Another thing that has to be handled at the client is an input format error, i.e. entering characters where a number is expected.)
It is possible to meet in the middle a little. If your model validity constraints are specified declaratively, you can use that metadata to generate some of the client validations, but they're not really sufficient. A good example would be user registration. Commonly you want two password fields, and you want the input in both to match, but the model will only contain one attribute for the password. You might also want to check the password complexity, but it's not necessarily a domain model invariant. (That is, your application will function correctly even if users have weak passwords, and the password complexity policy can change over time without the data integrity breaking.)
Another problem specific to client-side validation is that you often need to express a dependency between the validation checks. I.e. you have a required field that's a number that must be lower than 100. You need to validate that a) the field has a value; b) that the field value is a valid integer; and c) the field value is lower than 100. If any of these checks fails, you want to avoid displaying unnecessary error messages for further checks in the sequence in order to tell the user what his specific mistake was. The model doesn't need to care about that distinction. (Aside: this is where some frameworks fail miserably - either JSF or Spring MVC or either of them first attempts to do data-type conversion from the input strings to the form object properties, and if that fails, they cannot perform any further validations.)
In conclusion, the above implies that if you care about data integrity, and usability, you necessarily have to validate data at least twice, since the validations achieve different purposes even if there's some overlap. Client-side validation will have more checks and more finer-grained checks than the model-layer validation. I wouldn't really try to unify them except where your chosen framework makes it easy. (I don't know about Pyramid - Django makes these concerns separate in that Forms are a different layer than your Models, both can be validated, and they're joined by ModelForms that let you add additional validations to the ones performed by the model.)
Not sure I fully understand your question, but error handling on pymongo can be found here -
http://api.mongodb.org/python/current/api/pymongo/errors.html
Not sure if you're using a particular ORM - the docs have links to what's available, and these individually have their own best usages:
http://api.mongodb.org/python/current/tools.html
Do you have a particular ORM that you're using, or implementing your own through pymongo?

Query regarding data entry form

I am curious about something I encountered when I was registering on the wakari website. I entered my username which was something like abc.def.ghi and all other information and submitted the form ( or at least tried to submit! ). It threw up an error which said "username must be a valid python variable", so they were obviously doing something in their back-end with usernames as python variables. Would anyone explain to me if this is some sort of design scheme that they are using wherein they store user information as python variables or something like that. Again I apologize since this is not really a specific programming question but this is eating me up and I must know why that happened.
The following is the URL:
https://www.wakari.io/usermgmt/loginorregister
This is pure conjecture. One thing I could see wakiri doing is using the usernames as a module name for your code. That might be interesting. So storing user code as wakiri.<username>. Then the application might be doing an import wakiri.<username> with some interesting stuff in the __init__.py that runs whatever it finds.
Maybe that's it. Or maybe they are storing user code in files on disk. Maybe user code is written out to a file that contains lots of dictionaries that contain code and are named after the username?
Maybe they aren't even using it and just think it is cute to restrict people to valid Python variables.
I'm a Wakari developer, and we've only just caught this question. The short version is that you are pretty safe with a valid UNIX username, and the "error" text should say something using better "plain english" to this effect.
The reason we say the username needs to be a valid Python module name is that we're imagining a day when users could have something like ~/public_python as a place to put directly-shareable code, and then other users could access this via something like from wakari.users import steve. We'd leave it up to you to figure out if you trust user steve enough to import his code directly.

Multi-language website - unique URLs required for different languages (to prevent caching)?

I have developed an AppEngine/Python/Django application that currently works in Spanish, and I am in the process of internationalizing with multi-language support. It is basically a dating website, in which people can browse other profiles and send messages. Viewing a profile in different languages will result in some of the text (menus etc) being displayed in whichever language is selected, but user-generated content (ie. user profile or message) will be displayed in the original language in which it was written.
My question is: is it necessary (or a good idea) to use unique URLs for the same page being displayed in different languages or is it OK to overload the same URL for a given page being displayed in different languages. In particular, I am worried that if I use the same URL for multiple languages, then some pages might be cached (either by Google, or by some other proxy that I might not be aware of), which could result in an incorrect language being displayed to a user.
Does anyone know if this is a legitimate concern, or if I am worrying about something that will not happen?
In principle, you can use the Content-Language and Vary response headers and the Accept-Language request header to control how caches behave and to prevent them serving up the wrong language to users.
In practice, however, Accept-Language is frequently set incorrectly in browsers, which is why most sites don't rely on it, or at least provide a secondary mechanism. Caches may be similarly unreliable about respecting the Vary header, but I'm not sure. Having language-specific URLs is certainly a practical way to do it, and avoids any potential issues with caching.
I don't know how this works with django, but looking at it from a general web-development perspective, you could:
use a query parameter to determine the language (e.g. /foo/bar/page.py?lang=en)
Add the language code to the url path (e.g. /foo/bar/en/page.py), and optionally use mod_rewrite so that that part of the path gets passed to your script as a query parameter.

Categories

Resources