GET vs POST API for validating docker image is allowed

I'm unsure whether to use a GET or a POST for this situation. I'm creating an API that will return whether a Docker image is compliant or not. What would be the better approach?
Scenario 1:
GET: https://<hostname>/api/checkImage?image=nginx:latest
Scenario 2:
GET: https://<hostname>/api/checkImage/nginx:latest
Scenario 3:
POST: https://<hostname>/api/checkImage
Payload: {"image": "nginx"}
Obviously I would need to URL-encode the colon, but all of the above scenarios would return:
{"allowed": false}

The important semantic concept to understand is "safe". The HTTP specification puts it this way:
Request methods are considered "safe" if their defined semantics are essentially read-only
Of the request methods defined by this specification, the GET, HEAD, OPTIONS, and TRACE methods are defined to be safe.
So in your case, the resource is something like "the compliance report for the image"; and the GET request asks that the server provide the current representation of the report. The server might choose to provide a previously cached representation, or it might generate a new compliance report and return that -- but those are implementation details unrelated to the semantics of the request.
On the other hand, if we want the server to update its representation of the resource to something more recent, then we would likely use POST, rather than GET, because we're trying to induce a change and we don't want a general-purpose cache to return a stale representation of the report.
From the perspective of the client, it might look like
GET /api/checkImage?image=nginx:latest
Oh, this representation is too old, I need something recent
POST /api/checkImage?image=nginx:latest
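A minimal sketch of that distinction, in plain Python rather than any particular web framework. The names `ComplianceReports`, `_run_scan`, and the allow-list are hypothetical and exist only for illustration: `get` stands in for the GET handler (it may serve a stored report or build one, which is an implementation detail), while `refresh` stands in for the POST handler and always replaces the stored report.

```python
class ComplianceReports:
    """Hypothetical in-memory store of compliance reports, keyed by image name."""

    def __init__(self, allowed_images):
        self._allowed = set(allowed_images)  # assumption: a simple allow-list
        self._reports = {}
        self._scans = 0

    def _run_scan(self, image):
        # Stand-in for the real compliance check.
        self._scans += 1
        return {"allowed": image in self._allowed, "version": self._scans}

    def get(self, image):
        """GET semantics: the client only asks for the current representation."""
        if image not in self._reports:
            # Generating a report on first read is a server-side detail;
            # the client did not ask for anything to change.
            self._reports[image] = self._run_scan(image)
        return self._reports[image]

    def refresh(self, image):
        """POST semantics: the client asks the server to update the report."""
        self._reports[image] = self._run_scan(image)
        return self._reports[image]


reports = ComplianceReports({"nginx:1.25"})
stale = reports.get("nginx:latest")      # served from (or added to) the store
fresh = reports.refresh("nginx:latest")  # forces a new report
```

Repeating `get` returns the same stored report, so a cache in front of it is harmless; `refresh` is the call a cache must not answer on the server's behalf.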

Besides automatic documentation, what's the rationale of providing a response model for FastAPI endpoints?

The question is basically in the title: Does providing a bespoke response model serve any further purpose besides clean and intuitive documentation? What's the purpose of defining all these response models for all the endpoints rather than just leaving it empty?
I've started working with FastAPI recently, and I really like it. I'm using FastAPI with a MongoDB backend. I've followed the following approach:
Create a router for a given endpoint-category
Write the endpoint with the decorator etc. This involves the relevant query and defining the desired output of the query.
Then, test and trial everything.
Usually, prior to finalising an endpoint, I would set the response_model in the decorator to something generic, like List (imported from typing). This would look something like this:
@MyRouter.get(
    '/the_thing/{the_id}',
    response_description="Returns the thing for target id",
    response_model=List,
    response_model_exclude_unset=True
)
In the swagger-ui documentation, this will result in an uninformative response model.
So, I end up defining a response-model, which corresponds to the fields I'm returning in my query in the endpoint function; something like this:
class the_thing_out(BaseModel):
    id: int
    name: str | None
    job: str | None
And then, I modify the following: response_model=List[the_thing_out]. This will give a preview of what I can expect to be returned from a given call from within the swagger ui documentation.

Well, to be fair, having an automatically generated OpenAPI-compliant description of your interface is very valuable in and of itself.
Other than that, there is the benefit of data validation in the broader sense, i.e. ensuring that the data that is actually sent to the client conforms to a pre-defined schema. This is why Pydantic is so powerful and FastAPI just utilizes its power to that end.
You define a schema with Pydantic, set it as your response_model and then never have to worry about wrong types or unexpected values or what have you accidentally being introduced in your response data.* If you try to return some invalid data from your route, you'll get an error, instead of the client potentially silently receiving garbage that might mess up the logic on its end.
Now, could you achieve the same thing by just manually instantiating your Pydantic model with the data you want to send yourself first, then generating the JSON and packaging that in an adequate HTTP response?
Sure. But that is just extra steps you have to make for each route. And if you do that three, four, five times, you'll probably come up with an idea to factor out that model instantiation, etc. in a function that is more or less generic over any Pydantic model and data you throw at it... and oh, look! You implemented your own version of the response_model logic. 😉
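To make the "factor it out into a generic function" idea concrete, here is a rough sketch using only the standard library. It is not Pydantic's or FastAPI's actual implementation: a dataclass stands in for the Pydantic model, and `build_response` and `_allowed_types` are hypothetical helpers that only check top-level field types.

```python
from dataclasses import asdict, dataclass, fields
from typing import Optional, Union, get_args, get_origin, get_type_hints


@dataclass
class TheThingOut:
    # Stand-in for the Pydantic model from the question.
    id: int
    name: Optional[str] = None
    job: Optional[str] = None


def _allowed_types(annotation):
    # Optional[str] is Union[str, None]; unpack it into (str, NoneType).
    if get_origin(annotation) is Union:
        return get_args(annotation)
    return (annotation,)


def build_response(model_cls, data):
    """Instantiate model_cls from data, check field types, return a plain dict."""
    instance = model_cls(**data)  # TypeError on missing or unexpected fields
    hints = get_type_hints(model_cls)
    for field in fields(model_cls):
        value = getattr(instance, field.name)
        if not isinstance(value, _allowed_types(hints[field.name])):
            raise TypeError(f"{field.name}={value!r} does not match the schema")
    return asdict(instance)


payload = build_response(TheThingOut, {"id": 1, "name": "Ada"})
```

Every route would have to call such a helper by hand, which is exactly the repetition that setting response_model removes.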
Now, all this becomes more and more important the more complex your schemas get. Obviously, if all your route does is return something like
{"exists": 1}
then neither validation nor documentation is all that worthwhile. But I would argue it's usually better to prepare in advance for potential growth of whatever application you are developing.
Since you are using MongoDB in the back, I would argue this becomes even more important. I know, people say that it is one of the "perks" of MongoDB that you need no schema for the data you throw at it, but as soon as you provide an endpoint for clients, it would be nice to at least broadly define what the data coming from that endpoint can look like. And once you have that "contract", you just need a way to safeguard yourself against messing up, which is where the aforementioned model validation comes in.
Hope this helps.
* This rests on two assumptions of course: 1) You took great care in defining your schema (incl. validation) and 2) Pydantic works as expected.

Best strategy for error handling in an interface to a database and web display

I decided to ask this question after going back and forth hundreds of times trying to place error handling routines to optimize data integrity while also taking into account speed and efficiency (and wasting hundreds of hours in the process). So here's the setup.
Database -> python classes -> python code     -> javascript
MongoDB     that represent    that serves        web interface
            the data          pages (pyramid)
I want the data to be robust; that is the number one requirement. So right now I validate data on the javascript side of the page, but also validate in the python classes, which more or less represent data structures. While most server routines run through the python classes, sometimes that feels inefficient given that the data has to pass through different levels of error checking.
EDIT: I guess I should clarify. I am not looking to unify validation of client and server side code. Sorry for the bad write-up. I'm looking more to figure out where the server side validation should be done. Should it be in the direct interface to the database, or in the web server code where the data is received.
For instance, if I have an object with a barcode, should I validate the barcode in the code that receives the data through AJAX, or should I just call the object's class and validate there?
Again, are there any guidelines on how to do error checking in general? I want to be reasonably professional and learn, but hopefully not have to go through a whole book.
I am not a software engineer, but I hope those of you who are familiar with complex projects, can tell me where I can find few guidelines on how to model/error check in a situation like this.
I'm not necessarily looking for an answer, but more like pointing me to a short set of guidelines when creating projects with different layers like this. Hopefully not extremely long..
I don't even know what tags to use in the post. HELP!!

Validating on the client and validating on the server serve different purposes entirely. Validating on the server is to make sure your model invariants hold and has to be done to maintain data integrity. Validating on the client is so the user has a friendly error message telling him that his input would've violated data integrity instead of having a traceback blow up into his face.
So there's a subtle difference in that when validating on the server you only really care whether or not the data is valid. On the client you also care, on a finer-grained level, why the input could be invalid. (Another thing that has to be handled at the client is an input format error, i.e. entering characters where a number is expected.)
It is possible to meet in the middle a little. If your model validity constraints are specified declaratively, you can use that metadata to generate some of the client validations, but they're not really sufficient. A good example would be user registration. Commonly you want two password fields, and you want the input in both to match, but the model will only contain one attribute for the password. You might also want to check the password complexity, but it's not necessarily a domain model invariant. (That is, your application will function correctly even if users have weak passwords, and the password complexity policy can change over time without the data integrity breaking.)
Another problem specific to client-side validation is that you often need to express a dependency between the validation checks. For example, you have a required field that's a number that must be lower than 100. You need to validate that a) the field has a value; b) the field value is a valid integer; and c) the field value is lower than 100. If any of these checks fails, you want to avoid displaying unnecessary error messages for further checks in the sequence in order to tell the user what his specific mistake was. The model doesn't need to care about that distinction. (Aside: this is where some frameworks fail miserably. JSF or Spring MVC, or possibly both, first attempt to do data-type conversion from the input strings to the form object properties, and if that fails, they cannot perform any further validations.)
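The dependent checks on that field (has a value, is an integer, is lower than 100) could be sketched roughly as follows. It is written in Python for consistency with the rest of the thread, although on the client it would be JavaScript; `first_error` and the message texts are hypothetical.

```python
def first_error(raw):
    """Return the first applicable error message for the field, or None if valid."""
    text = raw.strip() if raw is not None else ""
    # a) the field has a value
    if not text:
        return "This field is required."
    # b) the value is a valid integer
    try:
        value = int(text)
    except ValueError:
        return "Please enter a whole number."
    # c) the value is lower than 100
    if value >= 100:
        return "The value must be lower than 100."
    return None


for candidate in ["", "abc", "150", "42"]:
    print(repr(candidate), "->", first_error(candidate))
```

Each check runs only if the previous one passed, so the user sees one message about the actual mistake rather than a pile of follow-on errors.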
In conclusion, the above implies that if you care about data integrity and usability, you necessarily have to validate data at least twice, since the validations achieve different purposes even if there's some overlap. Client-side validation will have more checks, and more fine-grained checks, than the model-layer validation. I wouldn't really try to unify them except where your chosen framework makes it easy. (I don't know about Pyramid. Django keeps these concerns separate in that Forms are a different layer than your Models, both can be validated, and they're joined by ModelForms that let you add additional validations to the ones performed by the model.)

Not sure I fully understand your question, but error handling on pymongo can be found here -
http://api.mongodb.org/python/current/api/pymongo/errors.html
Not sure if you're using a particular ORM - the docs have links to what's available, and these individually have their own best usages:
http://api.mongodb.org/python/current/tools.html
Do you have a particular ORM that you're using, or implementing your own through pymongo?

What would be the equivalent of Pythons "pickle" in nodejs

One of Python's features is the pickle module, which lets you store almost any arbitrary object and restore it to its original form. One common usage is to take a fully instantiated object and pickle it for later use. In my case I have an AMQP Message object that is not serializable, and I want to be able to store it in a session store and retrieve it, which I can do with pickle. The primary difference is that I need to call a method on the object; I am not just looking for the data.
But this project is in nodejs and it seems like with all of node's low-level libraries there must be some way to save this object, so that it could persist between web calls.
The use case is that a web page picks up a RabbitMQ message and displays the info derived from it. I don't want to acknowledge the message until the message has been acted on. I would normally just save the data in session state, but that's not an option unless I can somehow save it in its original form.

See the pickle-js project: https://code.google.com/p/pickle-js/
Also, from findbestopensource.com:
pickle.js is a JavaScript implementation of the Python pickle format. It supports pickles containing a cross-language subset of the primitive types. Key differences between pickle.js and pickle.py:
text pickles only
some types are lossily converted (e.g. int)
some types are not supported (e.g. class)
More information available here: http://www.findbestopensource.com/product/pickle-js

As far as I am aware, there isn't an equivalent to pickle in JavaScript (or in the standard node libraries).

Check out https://github.com/carlos8f/hydration to see if it fits your needs. I'm not sure it's as complete as pickle but it's pretty terrific.
Disclaimer: The module author and I are coworkers.

Multi-language website - unique URLs required for different languages (to prevent caching)?

I have developed an AppEngine/Python/Django application that currently works in Spanish, and I am in the process of internationalizing with multi-language support. It is basically a dating website, in which people can browse other profiles and send messages. Viewing a profile in different languages will result in some of the text (menus etc) being displayed in whichever language is selected, but user-generated content (ie. user profile or message) will be displayed in the original language in which it was written.
My question is: is it necessary (or a good idea) to use unique URLs for the same page being displayed in different languages or is it OK to overload the same URL for a given page being displayed in different languages. In particular, I am worried that if I use the same URL for multiple languages, then some pages might be cached (either by Google, or by some other proxy that I might not be aware of), which could result in an incorrect language being displayed to a user.
Does anyone know if this is a legitimate concern, or if I am worrying about something that will not happen?

In principle, you can use the Content-Language and Vary response headers and the Accept-Language request header to control how caches behave and to prevent them serving up the wrong language to users.
In practice, however, Accept-Language is frequently set incorrectly in browsers, which is why most sites don't rely on it, or at least provide a secondary mechanism. Caches may be similarly unreliable about respecting the Vary header, but I'm not sure. Having language-specific URLs is certainly a practical way to do it, and avoids any potential issues with caching.
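As a rough illustration of the header-based approach, the sketch below picks a language from Accept-Language and returns the response headers a cache would need. `pick_language`, `language_headers`, and the supported-language set are assumptions made for the example, and the parsing is deliberately simplified (it only looks at the primary subtag and the q value).

```python
def pick_language(accept_language, supported, default):
    """Choose the supported language with the highest q value, else the default."""
    candidates = []
    for part in accept_language.split(","):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed q value as "not acceptable"
        candidates.append((quality, tag.strip().lower()))
    # sorted() is stable, so equal q values keep the order the client sent.
    for quality, tag in sorted(candidates, key=lambda item: -item[0]):
        primary = tag.split("-")[0]
        if quality > 0 and primary in supported:
            return primary
    return default


def language_headers(accept_language, supported=("es", "en"), default="es"):
    """Headers telling caches that the body depends on Accept-Language."""
    language = pick_language(accept_language, supported, default)
    return {"Content-Language": language, "Vary": "Accept-Language"}


headers = language_headers("en-US,en;q=0.9,es;q=0.8")
```

This only helps if every cache between the server and the user honours Vary, which is the uncertainty the answer points out.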

I don't know how this works with django, but looking at it from a general web-development perspective, you could:
use a query parameter to determine the language (e.g. /foo/bar/page.py?lang=en)
Add the language code to the url path (e.g. /foo/bar/en/page.py), and optionally use mod_rewrite so that that part of the path gets passed to your script as a query parameter.
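Both options can be read back on the server with the standard library. This is only a sketch: `language_from_url` and the `SUPPORTED` set are hypothetical, and matching any path segment against the language codes is a simplification that would misfire if a real page were named "en" or "es".

```python
from urllib.parse import parse_qs, urlsplit

SUPPORTED = {"en", "es"}  # assumption: the languages the site offers


def language_from_url(url, default="es"):
    """Return the language named in ?lang=... or in a path segment, else default."""
    parts = urlsplit(url)
    # Option 1: query parameter, e.g. /foo/bar/page.py?lang=en
    query_lang = parse_qs(parts.query).get("lang", [None])[0]
    if query_lang in SUPPORTED:
        return query_lang
    # Option 2: language code in the path, e.g. /foo/bar/en/page.py
    for segment in parts.path.split("/"):
        if segment in SUPPORTED:
            return segment
    return default


print(language_from_url("/foo/bar/page.py?lang=en"))
print(language_from_url("/foo/bar/en/page.py"))
```

Either way each language gets its own URL, so a cache never has to guess which version it is holding.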

design for handling exceptions - google app engine

I'm developing a project on google app engine (webapp framework). I need you people to assess how I handle exceptions.
There are 4 types of exceptions I am handling:
Programming exceptions
Bad user input
Incorrect URLs
Incorrect query strings
Here is how I handle them:
I have subclassed the webapp.RequestHandler class and overridden its handle_exception method. Whenever an exception occurs, I take the user to a friendly "we're sorry" page and in the meantime send a message with the traceback to the admins.
On the client side I (will) use js and also validate on the server side.
Here I figure (as a coder with non-web experience) in addition to validate inputs according to programming logic (check: cash input is of the float type?) and business rules (check: user has enough points to take that action?), I also have to check against malicious intentions. What measures should I take against malicious actions?
I have a catch-all URL that handles incorrect URLs. That is to say, I take the user to a custom "page does not exist" page. Here I have no problems, I think.
Incorrect query strings presumably raise exceptions if left to themselves. If an ID does not exist, the method returns None (an exception is on the way). If the parameter is invalid, the code raises an exception. Here I think I must raise a 404 and take the user to the custom "page does not exist" page. What should I do?
What are your opinions? Thanks in advance..

You seem to have thought things through pretty well. The only thing I would add is that you might want to take a look at Bloog as an example. Bloog is a pretty well written and popular open source blog engine for App Engine written in Python.
Also, on Point #2, watch out for Cross-Site Scripting (XSS) attacks.
As for #4, keep in mind that 404 pages are an opportunity to add some color and creativity to your design.

Ad. #4: I usually treat query strings as non-essential. If anything is wrong with the query string, I'd just present the bare resource page (as if no query was present), possibly with some information telling the user what was wrong with the query string.
This leads to a problem similar to your #3: how did the user get to this wrong query? Did my application produce a wrong URL somewhere? Or was it an outdated link in some external service, or a saved bookmark? HTTP_REFERER might contain some clue, but of course it is not authoritative, so I'd log the problematic query (with some additional HTTP headers) and try to investigate the case.
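A small sketch of that policy, using only the standard library. The `page` parameter, the `page_number` helper, and the logger name are hypothetical; the point is that a bad query string falls back to the bare resource and leaves a log entry for later investigation.

```python
import logging
from urllib.parse import parse_qs

log = logging.getLogger("querystrings")  # hypothetical logger name


def page_number(query_string, referer=None):
    """Return the requested page, or 1 (the bare resource) if the query is unusable."""
    values = parse_qs(query_string).get("page")
    if values is None:
        return 1  # no query at all: nothing to complain about
    try:
        page = int(values[0])
        if page < 1:
            raise ValueError("page must be positive")
    except ValueError:
        # Keep the evidence so the source of the bad link can be tracked down.
        log.warning("ignoring bad query %r (referer: %r)", query_string, referer)
        return 1
    return page


print(page_number("page=3"))
print(page_number("page=abc", referer="http://example.com/old-link"))
```

The referer is logged as a hint only, since the client controls it.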
