Best way to parse arguments in a REST api - python

I am building a REST API based on Flask-RESTful, and I'm unsure what the best way is to parse the arguments I expect.
The info:
Table Schema:
-----------------------------------------
| ID | NAME | IP | MAIL | OS | PASSWORD |
-----------------------------------------
Method that does the job:
def update_entry(data, id):
    ...
    ...
    pass
The resource that handles the request:
def put(self, id):
    json_data = request.get_json(force=True)
    update_entry(json_data, id)
    pass
Json format:
{"NAME": "john", "OS": "windows"}
I have to mention that I do not know if all the above are relevant to my question.
Now what I would like to know is: where is the proper place to check whether the client sent the arguments I want and whether the keys in the request are valid?
I have thought of a couple of alternatives, but I have the feeling that I'm missing a best practice here.
1. Pass whatever the client sends to the backend, let an error happen and catch it.
2. Create something like a JSON template and validate the client's request against it before passing it on.
Of course the first option is simpler, but the second doesn't create unnecessary load on my DB, although it might become quite complex.
Any opinion on either of the two options above, or any other suggestion, is welcome.

Why don't you consider using a library like marshmallow, since the Flask-RESTful documentation suggests it? It will solve your problem in a proper, non-custom way, unlike writing the validation from scratch.

It's not a good idea to let your database API catch errors. Repeated DB access will hamper performance.
Best case scenario, if you can error check the JSON at the client, do it.
Error check the JSON in Python anyhow. You must always assume that the network is compromised and that you will get garbage values or malicious requests.
A design principle I read somewhere (Clean Code, I think) is to be strict on output but go easy on the input.
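
The second option from the question (checking the request against a template before it reaches the database) can be sketched in plain Python. This is only a minimal illustration: the column names come from the table schema in the question, while `ALLOWED_FIELDS`, `validate_update` and the rule that every value must be a string are assumptions made for the example. A library such as marshmallow covers the same ground with less custom code.

```python
# Columns a client may update, taken from the table schema in the question.
# Treating ID as read-only and every value as a string are assumptions.
ALLOWED_FIELDS = {"NAME", "IP", "MAIL", "OS", "PASSWORD"}


def validate_update(payload):
    """Return a list of error messages; an empty list means the payload is acceptable."""
    if not isinstance(payload, dict) or not payload:
        return ["payload must be a non-empty JSON object"]
    errors = []
    for key, value in payload.items():
        if key not in ALLOWED_FIELDS:
            errors.append(f"unknown field: {key}")
        elif not isinstance(value, str):
            errors.append(f"{key} must be a string")
    return errors
```

In the resource, `put` could call `validate_update(json_data)` first and answer with a 400 status if the list is non-empty, so that `update_entry` only ever sees known keys.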

Related

Besides automatic documentation, what's the rationale of providing a response model for FastAPI endpoints?

The question is basically in the title: Does providing a bespoke response model serve any further purpose besides clean and intuitive documentation? What's the purpose of defining all these response models for all the endpoints rather than just leaving it empty?
I've started working with FastAPI recently, and I really like it. I'm using FastAPI with a MongoDB backend. I've followed the following approach:
Create a router for a given endpoint-category
Write the endpoint with the decorator etc. This involves the relevant query and defining the desired output of the query.
Then, test and trial everything.
Usually, prior to finalising an endpoint, I would set the response_model in the decorator to something generic, like List (imported from typing). This would look something like this:
@MyRouter.get(
    '/the_thing/{the_id}',
    response_description="Returns the thing for target id",
    response_model=List,
    response_model_exclude_unset=True
)
In the Swagger UI documentation, this will result in an uninformative response model.
So, I end up defining a response-model, which corresponds to the fields I'm returning in my query in the endpoint function; something like this:
class the_thing_out(BaseModel):
    id: int
    name: str | None
    job: str | None
And then, I modify the following: response_model=List[the_thing_out]. This will give a preview of what I can expect to be returned from a given call from within the swagger ui documentation.

Well, to be fair, having an automatically generated OpenAPI-compliant description of your interface is very valuable in and of itself.
Other than that, there is the benefit of data validation in the broader sense, i.e. ensuring that the data that is actually sent to the client conforms to a pre-defined schema. This is why Pydantic is so powerful and FastAPI just utilizes its power to that end.
You define a schema with Pydantic, set it as your response_model and then never have to worry about wrong types or unexpected values or what have you accidentally being introduced in your response data.* If you try to return some invalid data from your route, you'll get an error, instead of the client potentially silently receiving garbage that might mess up the logic on its end.
Now, could you achieve the same thing by just manually instantiating your Pydantic model with the data you want to send yourself first, then generating the JSON and packaging that in an adequate HTTP response?
Sure. But that is just extra steps you have to make for each route. And if you do that three, four, five times, you'll probably come up with an idea to factor out that model instantiation, etc. in a function that is more or less generic over any Pydantic model and data you throw at it... and oh, look! You implemented your own version of the response_model logic. 😉
Now, all this becomes more and more important the more complex your schemas get. Obviously, if all your route does is return something like
{"exists": 1}
then neither validation nor documentation is all that worthwhile. But I would argue it's usually better to prepare in advance for potential growth of whatever application you are developing.
Since you are using MongoDB in the back, I would argue this becomes even more important. I know, people say that it is one of the "perks" of MongoDB that you need no schema for the data you throw at it, but as soon as you provide an endpoint for clients, it would be nice to at least broadly define what the data coming from that endpoint can look like. And once you have that "contract", you just need a way to safeguard yourself against messing up, which is where the aforementioned model validation comes in.
Hope this helps.
* This rests on two assumptions of course: 1) You took great care in defining your schema (incl. validation) and 2) Pydantic works as expected.

Graphene/Flask/SQLAlchemy - What is the recommended method to retrieve data from a route entry point?

Given a basic project structure as follows:
/
    app.py      <-- Flask app startup and basic routing
    models.py
    schema.py   <-- Contains Graphene/SQLAlchemy schema definitions for queries/mutations
Say in my app.py I have some basic routes setup like so:
@app.route('/members/list', methods=['GET'])
def members():
    # What should I do here?
    pass
What is the "correct" way to retrieve data? I can see a few different approaches, but I'm not sure if there's a recommended way and I can't seem to find a straightforward answer. For example:
1. return jsonify(session.query(MembersModel).all()) I feel that this is probably the right way, but it feels weird plopping this down right at the route (feels like I'm missing some service layer architecture) or that I'm not using schema.py correctly. If I were to go with this method, does this sit within my schema.py? Or should I be making a different service-esque file elsewhere?
2. Running a GraphQL query directly by myself like schema.execute('{ allMembers { ... } }') via Graphene (as seen here) and then parsing my result back in a response. This feels ... wrong, having hardcoded GraphQL in my code when there's a better alternative in #1.
I have prior experience with Spring and I always did it with an MVC styled controller <-> service <-> dao, but I'm not sure what the Flask/Graphene/SQLAlchemy/GraphQL/SQLite equivalent is. I have this nagging feeling that I'm missing an obvious answer here, so if anyone could direct me to some resources or help out, I'd appreciate it.
Thanks!

Okay, after hours of reading I finally realized it: I'm not supposed to be switching between REST web APIs and GraphQL like this (disregarding legacy systems/migrations/etc). Essentially, GraphQL loosely competes with REST, in much the same way that JSON competes with XML.
I was under the impression that GraphQL was comparable to a higher-level SQL, wherein GraphQL sat above my SQLite layer and abstracted away traditional SQL with some new-fangled terminology and abstractions like relays and connections. Instead, GraphQL competes at an even higher level as mentioned earlier.
So, when dealing with Flask, GraphQL and Graphene, the most I should be doing is execute queries via the GraphiQL interface or POST'em to my server directly - and not do something like GET /my-path/some-resource just to manually hit a GraphQL query somewhere in the backend.
Of course, if I misinterpreted anything, please let me know!
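
POSTing a query to the server directly could look roughly like the sketch below, which only builds the request and does not send it. It assumes the server accepts the usual JSON body with a `query` key; the URL and the helper name `build_graphql_request` are hypothetical, and `{ __typename }` is just a placeholder query that any GraphQL schema can answer.

```python
import json
import urllib.request


def build_graphql_request(url, query):
    """Build (but do not send) a POST request carrying a GraphQL query as JSON."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint; replace the placeholder query with a real one.
req = build_graphql_request("http://localhost:5000/graphql", "{ __typename }")
# urllib.request.urlopen(req) would send it to a running server.
```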

Get variable from url in view

I want to make an HTTP GET request to ip:port/this1234 and use "1234" as a variable in Python code. The "1234" is an arbitrary int. How do I get the int as an argument to my view?
@app.route('/stop####')
def stop####():
    global stopped
    if stopped:
        call(["./stop.sh"], shell = True)
        stopped = False
        return "started"
    return "already started"

You'll want to take a look at the Quickstart guide.
Meanwhile, you can get POST data using
myvar = request.form["myvar"]
and GET using
myvar = request.args.get("myvar")
The guide then goes on to mention some error handling recommendations and references the more in depth request object page.
We recommend accessing URL parameters with get or by catching the KeyError because users might change the URL and presenting them a 400 bad request page in that case is not user friendly.
For a full list of methods and attributes of the request object, head over to the request documentation.
You might also want to look at routing a bit. I'm uncertain what you're trying to accomplish with the pound sign in your routing (EDIT: I see what you mean on re-reading; see edit at bottom). Note the quote below from a comment on SitePoint.
browsers don't send the #whatever part of the URL to the server in the HTTP request when requesting the page
OK, so if you want to pass the value in the URI, I recommend something more like example.com/this/1234, and your routing rule would look like @app.route('/this/<myVar>') above your def my_func(myVar):
Finally, at some level, killing any process based off of an HTTP request seems awfully daring, but you know your environment best, and for all I know, it might not even be exposed to the internet. Good luck, and don't forget to be safe about this.

I think the obvious way to do it would be:
@app.route('/kill/<int:pid>')
def kill(pid):
    ...
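
What an `<int:...>` converter does can be illustrated without Flask: match the digits at a fixed position in the path and convert them before the view runs. The sketch below uses only the standard library; `parse_stop_path` and the `/stop<digits>` layout are assumptions based on the question, and in a Flask app the routing rule does this work for you.

```python
import re

# Matches paths such as /stop1234 and captures the digits.
STOP_PATH = re.compile(r"/stop(\d+)")


def parse_stop_path(path):
    """Return the integer embedded in the path, or None if the path does not match."""
    match = STOP_PATH.fullmatch(path)
    return int(match.group(1)) if match else None
```

A path without digits yields `None`, which is the case where a real router would answer with a 404 instead of calling the view.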

python json response using schema

I have an app and a server communicating using json. I'm now trying to "pythonize" my server code as best as I can (I'm a long time C coder and I'm afraid my python code flow looks more C-like than pythonic).
I have a bunch of messages going back and forth. Thus far the message format was "implicit", and I didn't really define a schema to make it explicit/readable/validatable etc.
Searching around on the topic, I now have a good handle on how to define the incoming message schema, validate it etc. With colander, I might even be able to load it directly into a class.
However, on the outbound side (ie, responses from the server), I want to have a similar well defined structure and interface.
My question is:
How do I USE the defined outbound schema while CONSTRUCTING the response data? A 'C' analogy would be to use a struct.
Essentially, I don't want any place in my code to do something ugly like
r = dict(response_field=response_data)
HttpResponse(json.dumps(r))
Because then I'm implicitly creating my format on the fly...
I'd rather use the schema as the base to construct the response.
Any thoughts, suggestions, best practice pointers?
thanks

You can define your outbound data contracts with regular Python classes.
Or you might consider json-schema to define the public API interfaces (incoming and outgoing data contracts). There is a json-schema validator in Python that can be a good alternative to colander.
If you have structured data à la relational database, then you might consider XSD and XML. More on this on Stack Overflow.
If structures and constraints are simple, then Avro or Protocol Buffers might be enough.
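
The first suggestion, defining the outbound contract as a regular Python class, could look like the sketch below, where a dataclass plays the role of a C struct. `StatusResponse` and its fields are hypothetical, since the question does not list the real message fields.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class StatusResponse:
    # Hypothetical outbound message; the field names are placeholders.
    status: str
    message: str = ""

    def to_json(self):
        """Serialize the response in exactly the shape the class declares."""
        return json.dumps(asdict(self))


payload = StatusResponse(status="ok", message="saved").to_json()
```

The view would then pass `payload` to `HttpResponse`, and a misspelled field name fails with a `TypeError` when the object is constructed rather than producing a silently different message format.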

Public API with Private Elements in Python

I'm working on a web mapping service and would like to provide my users with a Python API that they can use to create custom plugins. These plugins would be running on my server so I'm trying to lock down Python as much as possible.
To ensure that users can't access files they are not supposed to, I'm planning on running the plugins inside of a PyPy sandboxed interpreter.
The last large hurdle that I'm trying to overcome is the API itself. I have a database API that will allow the user to make controlled queries to the database. For example, I have a select(column,condition) function that will allow users to retrieve items from the database, but they will be restricted to a table. Then, there is a private function _query(sql) that takes raw SQL commands and executes them on the server.
The problem is that Python (at least to my knowledge) provides no way of preventing users from calling the _query function directly and nuking my database. In order to prevent this, I was thinking of breaking my API into two parts:
-------------  pipe   --------------      ------------
| publicAPI | ------> | privateAPI | ---> | database |
-------------         --------------      ------------
So publicAPI would basically be a proxy API communicating with the main API via a pipe. Since publicAPI would only contain proxies to the public API functions, users would be unable to get hold of the private elements (such as _query(sql)).
Do you think this is a viable solution? Am I making way too much work for myself by overlooking a simpler solution?
Thanks so much for your help!

This looks like a clean way to implement this to me. I believe it's also sometimes referred to as the "Facade" design pattern.
In Python this is very easy to implement using explicit method delegation (a short snippet to give you a general idea):
class FacadingAPI:
    def __init__(self, fullapi_instance):
        self.select = fullapi_instance.select  # method delegating
        ...

There are many ways that you can do this. I tend to have a tuple with all the function names that are available and then check if the function being called is in the tuple; if not, throw an error.
e.g.
funcs = ("available", "functions", "here")
if calledFunction in funcs:
    pass  # do something
else:
    raise ValueError(calledFunction)  # throw error
or you can take the approach that Google describes on its App Engine page
http://code.google.com/appengine/articles/rpc.html
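
The tuple-based check can be turned into a small dispatcher, sketched below. `DatabaseAPI` is a hypothetical stand-in for the API described in the question (its methods only build strings), and `PUBLIC_FUNCS` and `call_public` are names invented for the example.

```python
class DatabaseAPI:
    """Hypothetical stand-in for the real API from the question."""

    def select(self, column, condition):
        return f"SELECT {column} WHERE {condition}"

    def _query(self, sql):
        return f"RAW {sql}"


# Only names listed here may be called by plugin code.
PUBLIC_FUNCS = ("select",)


def call_public(api, name, *args):
    """Dispatch to api.<name> only if the name is on the allow-list."""
    if name not in PUBLIC_FUNCS:
        raise PermissionError(f"{name} is not part of the public API")
    return getattr(api, name)(*args)
```

A check like this only helps if plugin code cannot reach the `api` object itself, which is what the pipe between publicAPI and privateAPI in the question is meant to guarantee.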
