python json response using schema

I have an app and a server communicating using JSON. I'm now trying to "pythonize" my server code as best I can (I'm a long-time C coder and I'm afraid my Python code flow looks more C-like than Pythonic).
I have a bunch of messages going back and forth. Thus far the message format was "implicit", and I didn't really define a schema to make it explicit/readable/validatable etc.
After searching around on the topic, I now have a good handle on how to define the incoming message schema, validate it, etc. With colander, I might even be able to load it directly into a class.
However, on the outbound side (i.e., responses from the server), I want a similarly well-defined structure and interface.
My question is:
How do I USE the defined outbound schema while CONSTRUCTING the response data? A 'C' analogy would be to use a struct.
Essentially, I don't want any place in my code to do something ugly like
r = dict(response_field=response_data)
HttpResponse(json.dumps(r))
Because then I'm implicitly creating my format on the fly...
I'd rather use the schema as the base to construct the response.
Any thoughts, suggestions, or best-practice pointers?
thanks

You can define your outbound data contracts with regular Python classes.
Or you might consider json-schema to define the public API interfaces (incoming and outgoing data contracts). There is a json-schema validator for Python that can be a good alternative to colander.
If you have structured data à la relational database, then you might consider XSD and XML. More on this elsewhere on Stack Overflow.
If structures and constraints are simple, then Avro or Protocol Buffers might be enough.
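
For illustration, here is a minimal sketch combining the first two ideas (all names are made up, and it assumes the third-party jsonschema package): a plain Python class acts as the C-style "struct" for every outbound message, and a json-schema document validates the serialized form before it leaves the server.

import json
from dataclasses import dataclass, asdict

import jsonschema  # third-party: pip install jsonschema

# The outbound "contract", written once and reused everywhere.
RESPONSE_SCHEMA = {
    "type": "object",
    "properties": {
        "status": {"type": "string"},
        "payload": {"type": "object"},
    },
    "required": ["status", "payload"],
}

@dataclass
class ServerResponse:
    """C-struct-like container for every outbound message."""
    status: str
    payload: dict

    def to_json(self) -> str:
        data = asdict(self)
        # Fail loudly on the server if we ever violate our own contract.
        jsonschema.validate(data, RESPONSE_SCHEMA)
        return json.dumps(data)

# Usage: construct responses through the class, never through ad-hoc dicts.
body = ServerResponse(status="ok", payload={"answer": 42}).to_json()

The point is that no call site builds a bare dict on the fly; the schema and the class are the single source of truth for the response shape.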

Related

Besides automatic documentation, what's the rationale of providing a response model for FastAPI endpoints?

The question is basically in the title: Does providing a bespoke response model serve any further purpose besides clean and intuitive documentation? What's the purpose of defining all these response models for all the endpoints rather than just leaving it empty?
I've started working with FastAPI recently, and I really like it. I'm using FastAPI with a MongoDB backend. I've followed the following approach:
Create a router for a given endpoint-category
Write the endpoint with the decorator etc. This involves the relevant query and defining the desired output of the query.
Then, test and trial everything.
Usually, prior to finalising an endpoint, I would set the response_model in the decorator to something generic, like List (imported from typing). This would look something like this:
@MyRouter.get(
    '/the_thing/{the_id}',
    response_description="Returns the thing for target id",
    response_model=List,
    response_model_exclude_unset=True
)
In the Swagger UI documentation, this results in an uninformative, generic response model.
So, I end up defining a response-model, which corresponds to the fields I'm returning in my query in the endpoint function; something like this:
class the_thing_out(BaseModel):
    id: int
    name: str | None
    job: str | None
And then, I modify the following: response_model=List[the_thing_out]. This will give a preview of what I can expect to be returned from a given call from within the swagger ui documentation.
Well, to be fair, having an automatically generated OpenAPI-compliant description of your interface is very valuable in and of itself.
Other than that, there is the benefit of data validation in the broader sense, i.e. ensuring that the data that is actually sent to the client conforms to a pre-defined schema. This is why Pydantic is so powerful and FastAPI just utilizes its power to that end.
You define a schema with Pydantic, set it as your response_model and then never have to worry about wrong types or unexpected values or what have you accidentally being introduced in your response data.* If you try to return some invalid data from your route, you'll get an error, instead of the client potentially silently receiving garbage that might mess up the logic on its end.
Now, could you achieve the same thing by just manually instantiating your Pydantic model with the data you want to send yourself first, then generating the JSON and packaging that in an adequate HTTP response?
Sure. But those are just extra steps you have to take for each route. And if you do that three, four, five times, you'll probably come up with the idea of factoring out that model instantiation, etc. into a function that is more or less generic over any Pydantic model and data you throw at it... and oh, look! You've implemented your own version of the response_model logic. 😉
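
As a rough sketch of what that buys you (assuming Python 3.10+ with FastAPI and Pydantic installed; the model and field names are placeholders):

from typing import List
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class TheThingOut(BaseModel):
    id: int
    name: str | None = None
    job: str | None = None

@app.get("/the_thing/{the_id}", response_model=List[TheThingOut])
async def read_thing(the_id: int):
    # Whatever the backend returns is filtered and validated against
    # TheThingOut before it leaves the server: extra keys are dropped,
    # and wrong types raise an error instead of reaching the client.
    return [{"id": the_id, "name": "Alice", "job": "dev", "internal_flag": True}]

If the route returned, say, a string where an int is expected, FastAPI would raise a server-side error rather than shipping malformed data to the client.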
Now, all this becomes more and more important the more complex your schemas get. Obviously, if all your route does is return something like
{"exists": 1}
then neither validation nor documentation is all that worthwhile. But I would argue it's usually better to prepare in advance for potential growth of whatever application you are developing.
Since you are using MongoDB in the back, I would argue this becomes even more important. I know, people say that it is one of the "perks" of MongoDB that you need no schema for the data you throw at it, but as soon as you provide an endpoint for clients, it would be nice to at least broadly define what the data coming from that endpoint can look like. And once you have that "contract", you just need a way to safeguard yourself against messing up, which is where the aforementioned model validation comes in.
Hope this helps.
* This rests on two assumptions of course: 1) You took great care in defining your schema (incl. validation) and 2) Pydantic works as expected.

Convert GraphQLResponse dictionary to python object

I am running a graphql query using aiographql-client and getting back a GraphQLResponse object, which contains a raw dict as part of the response json data.
This dictionary conforms to a schema, which I am able to parse into a graphql.type.schema.GraphQLSchema type using graphql-core's build_schema method.
I can also correctly get the GraphQLObjectType of the object that is being returned, however I am not sure how to properly deserialize the dictionary into a python object with all the appropriate fields, using the GraphQLObjectType as a reference.
Any help would be greatly appreciated!
I'd recommend using Pydantic to do the heavy lifting in the parsing.
You can then either generate the models beforehand and select the ones you need based on GraphQLObjectType or generate them at runtime based on the definition returned by build_schema.
If you really must define your models at runtime, you can do that with Pydantic's create_model function, described here: https://pydantic-docs.helpmanual.io/usage/models/#dynamic-model-creation
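
For instance, a minimal sketch of the dynamic route (the field definitions here are invented; in practice you would derive them from the GraphQLObjectType's fields):

from typing import List, Optional
from pydantic import create_model

# Field definitions you would normally derive from the GraphQLObjectType;
# the names and types here are purely illustrative.
Hero = create_model(
    "Hero",
    name=(str, ...),                      # required field
    friends=(Optional[List[str]], None),  # optional field with a default
)

# response_data stands in for the raw dict pulled from the GraphQLResponse.
response_data = {"name": "R2-D2", "friends": ["Luke", "Leia"]}
hero = Hero(**response_data)
print(hero.name, hero.friends)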
For the static model generation you can probably leverage something like https://jsontopydantic.com/
If you share some code samples, I'd be happy to give some more insight on the actual implementation.
I faced the same tedious problem while developing a personal project.
Because of that, I published a library whose purpose is to manage the mappings between Python objects and GraphQL objects (Python objects -> GraphQL query, and GraphQL response -> Python objects).
https://github.com/dapalex/py-graphql-mapper
So far it handles only basic queries and responses; if it turns out to be useful, I will keep implementing more features.
Have a look and see if it can help you.
Coming back to this, there are some projects out there trying to achieve this functionality:
https://github.com/enra-GmbH/graphql-codegen-ariadne
https://github.com/sauldom102/gql_schema_codegen

Repository pattern on Python

I'm new to Python and I'm coming from the C# world.
Over there it seemed like the repository pattern was the way to go, but I am having trouble finding any tutorials on how best to do this in Python.
Edit: I understand that it can be implemented; I'm just wondering if there is any reason why I'm finding close to nothing on how to go about doing this.
Thanks!
I wasn't immediately familiar with the "repository pattern", so I looked it up. It appears to be the idea of putting a more general API, like a dictionary-like key/value lookup, in front of a database or other data store. The idea is to add an additional layer of abstraction that allows multiple types of data sources (like both a relational database and a CSV file) to be accessed transparently via a common API.
Given this definition, I can think of no reason why this design pattern wouldn't be equally applicable to a problem addressed with Python vs any other programming language.
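
To make that concrete, here is a bare-bones sketch of the pattern in Python; the Member class and the in-memory backend are invented purely for illustration:

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class Member:
    id: int
    name: str

class MemberRepository(ABC):
    """The abstract data-access contract; callers never see storage details."""

    @abstractmethod
    def get(self, member_id: int) -> Optional[Member]: ...

    @abstractmethod
    def add(self, member: Member) -> None: ...

class InMemoryMemberRepository(MemberRepository):
    """One concrete backend; a SQL- or CSV-backed version would expose the same API."""

    def __init__(self) -> None:
        self._members: Dict[int, Member] = {}

    def get(self, member_id: int) -> Optional[Member]:
        return self._members.get(member_id)

    def add(self, member: Member) -> None:
        self._members[member.id] = member

# Calling code depends only on the abstract interface.
repo: MemberRepository = InMemoryMemberRepository()
repo.add(Member(id=1, name="Ada"))
print(repo.get(1))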

Graphene/Flask/SQLAlchemy - What is the recommended method to retrieve data from a route entry point?

Given a basic project structure as follows:
/
    app.py      <-- Flask app startup and basic routing
    models.py
    schema.py   <-- Contains Graphene/SQLAlchemy schema definitions for queries/mutations
Say in my app.py I have some basic routes setup like so:
@app.route('/members/list', methods=['GET'])
def members():
    # What should I do here?
What is the "correct" way to retrieve data? I can see a few different approaches, but I'm not sure if there's a recommended way and I can't seem to find a straightforward answer. For example:
1. return jsonify(session.query(MembersModel).all()) - I feel that this is probably the right way, but it feels weird plopping this down right at the route (it feels like I'm missing some service-layer architecture), or that I'm not using schema.py correctly. If I were to go this route, does this sit within my schema.py? Or should I be making a separate service-esque file elsewhere?
2. Running a GraphQL query directly myself, like schema.execute('{ allMembers { ... } }') via Graphene (as seen here), and then parsing my result back into a response. This feels... wrong, having hardcoded GraphQL in my code when there's a better alternative in #1.
I have prior experience with Spring and I always did it with an MVC styled controller <-> service <-> dao, but I'm not sure what the Flask/Graphene/SQLAlchemy/GraphQL/SQLite equivalent is. I have this nagging feeling that I'm missing an obvious answer here, so if anyone could direct me to some resources or help out, I'd appreciate it.
Thanks!
Okay, after hours of reading I finally realized it: I'm not supposed to be mixing REST web APIs and GraphQL like this (disregarding legacy systems/migrations/etc.). Essentially, GraphQL loosely competes with REST, sort of in the vein of how JSON competes with XML.
I was under the impression that GraphQL was comparable to a higher-level SQL, wherein GraphQL sat above my SQLite layer and abstracted away traditional SQL with some new-fangled terminology and abstractions like relays and connections. Instead, GraphQL competes at an even higher level as mentioned earlier.
So, when dealing with Flask, GraphQL and Graphene, the most I should be doing is executing queries via the GraphiQL interface or POSTing them to my server directly - and not doing something like GET /my-path/some-resource just to manually hit a GraphQL query somewhere in the backend.
Of course, if I misinterpreted anything, please let me know!
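
For reference, the usual wiring is a single /graphql endpoint rather than per-resource routes. A rough sketch, assuming the Flask-GraphQL package and the schema object exported by the schema.py above:

from flask import Flask
from flask_graphql import GraphQLView  # third-party: pip install Flask-GraphQL

from schema import schema  # the Graphene/SQLAlchemy schema defined in schema.py

app = Flask(__name__)

# One endpoint serves every query and mutation; graphiql=True enables the
# in-browser GraphiQL explorer for ad-hoc queries during development.
app.add_url_rule(
    "/graphql",
    view_func=GraphQLView.as_view("graphql", schema=schema, graphiql=True),
)

if __name__ == "__main__":
    app.run(port=5000)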

Serializing data and unpacking safely from untrusted source

I am using Pyramid as a basis for transfer of data for a turn-based video game. The clients use POST data to present their actions, and GET to retrieve serialized game board data. The game data can sometimes involve strings, but is almost always two integers and two tuples:
gamedata = (userid, gamenumber, (sourcex, sourcey), (destx, desty))
My general client-side flow was to pickle the data, convert it to base64, urlencode it, and submit the POST. The server then receives the POST, unpacks the single-item dictionary, decodes the base64, and unpickles the data object.
I want to use Pickle because I can use classes and values. Submitting game data as POST fields can only give me strings.
However, Pickle is regarded as unsafe. So I turned to PyYAML, which serves the same purpose. Using yaml.safe_load(data), I can deserialize data without exposing security flaws. However, safe_load is VERY safe: I cannot even deserialize harmless tuples or lists, even if they only contain integers.
Is there some middle ground here? Is there a way to serialize python structures without at the same time allowing execution of arbitrary code?
My first thought was to write a wrapper for my send and receive functions that uses underscores in value names to recreate tuples, e.g. sending would convert the dictionary value source : (x, y) to source_0 : x, source_1: y. My second thought was that it wasn't a very wise way to develop.
edit: Here's my implementation using JSON... it doesn't seem as powerful as YAML or Pickle, but I'm still concerned there may be security holes.
Client side was constructed a bit more visibly while I experimented:
import urllib, json, base64
arbitrarydata = { 'id':14, 'gn':25, 'sourcecoord':(10,12), 'destcoord':(8,14)}
jsondata = json.dumps(arbitrarydata)
b64data = base64.urlsafe_b64encode(jsondata)
transmitstring = urllib.urlencode( [ ('data', b64data) ] )
urllib.urlopen('http://127.0.0.1:9000/post', transmitstring).read()
Pyramid Server can retrieve the data objects:
json.loads(base64.urlsafe_b64decode(request.POST['data'].encode('ascii')))
On an unrelated note, I'd love to hear some other opinions about the acceptability of using POST data in this way; my game client is in no way browser-based at this time.
Why not use colander for your serialization and deserialization? Colander turns an object schema into a simple data structure and vice versa, and you can use JSON to send and receive this information.
For example:
import colander

class Item(colander.MappingSchema):
    thing = colander.SchemaNode(colander.String(),
                                validator=colander.OneOf(['foo', 'bar']))
    flag = colander.SchemaNode(colander.Boolean())
    language = colander.SchemaNode(colander.String(),
                                   validator=colander.OneOf(supported_languages))

class Items(colander.SequenceSchema):
    item = Item()
The above setup defines a list of item objects, but you can easily define game-specific objects too.
Deserialization becomes:
items = Items().deserialize(json.loads(jsondata))
and serialization is:
json.dumps(Items().serialize(items))
Apart from letting you round-trip python objects, it also validates the serialized data to ensure it fits your schema and hasn't been mucked about with.
How about json? The json module is part of the Python standard library, and it allows serialization of most generic data without arbitrary code execution.
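
One caveat worth knowing for the game data above: JSON has no tuple type, so tuples silently come back as lists. A quick illustration:

import json

gamedata = (14, 25, (10, 12), (8, 14))
roundtripped = json.loads(json.dumps(gamedata))
print(roundtripped)              # [14, 25, [10, 12], [8, 14]] -- tuples became lists
coords = tuple(roundtripped[2])  # convert back explicitly if you need tuples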
I don't see raw JSON providing the answer here, as I believe the question specifically mentioned pickling classes and values. I don't believe using straight JSON can serialize and deserialize python classes, while pickle can.
I use a pickle-based serialization method for almost all server-to-server communication, but always include very serious authentication mechanisms (e.g. RSA key-pair matching). However, that means I only deal with trusted sources.
If you absolutely need to work with untrusted sources, I would, at the very least, try to add (much like @MartijnPieters suggests) a schema to validate your transactions. I don't think there is a good way to work with arbitrary pickled data from an untrusted source. You'd have to do something like parse the byte-string with some disassembler and then only allow trusted patterns (or block untrusted patterns). I don't know of anything that can do this for pickle.
However, if your class is "simple enough"… you might be able to use the JSONEncoder, which essentially converts your python class to something JSON can serialize… and thus validate…
How to make a class JSON serializable
The catch, however, is that you have to provide a JSONEncoder subclass that knows how to handle your classes.
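
A minimal sketch of that idea, using the usual default() hook on a JSONEncoder subclass (the GameMove class is invented for illustration):

import json

class GameMove:
    def __init__(self, userid, gamenumber, source, dest):
        self.userid = userid
        self.gamenumber = gamenumber
        self.source = source
        self.dest = dest

class GameMoveEncoder(json.JSONEncoder):
    """Teach json how to serialize GameMove; everything else falls through."""

    def default(self, obj):
        if isinstance(obj, GameMove):
            return {
                "userid": obj.userid,
                "gamenumber": obj.gamenumber,
                "source": obj.source,
                "dest": obj.dest,
            }
        return super().default(obj)

move = GameMove(14, 25, (10, 12), (8, 14))
payload = json.dumps(move, cls=GameMoveEncoder)
# Deserialization stays plain JSON, so nothing executable comes back:
data = json.loads(payload)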
