I made a wrapper, and I've recently been reflecting on what I consider a poor design decision in it; however, I'm not sure what the better alternative would be.
Currently, my wrapper works like this (names changed):
import wrapper
api = wrapper.Api(params...)
# fetch fetches data from the API
objects = wrapper.model.fetch(api, params...)
# get_special might be only relevant to a certain object type
other_objects = wrapper.other_model.get_special(api, params...)
Obviously, passing the API object as a parameter every time isn't very OOP-y, and it just looks bad regardless. I considered just putting all of those classes inside the API class, but I don't think that's a good idea, either. What would be a better design choice?
Thanks in advance :D
As per request, I'm adding more details.
My application is a wrapper for the website launchlibrary.net, which provides data about rocket launches. It's available here - https://github.com/Plutoberth/python-launch-library
Basically, I'm not sure how to implement this in a way that is both tidy and doesn't require you to pass the API each time. My ideal way would look like this, but I think it would require adding all those functions to the Api class, which would make it quite huge.
import launchlibrary as ll # the wrapper
api = ll.Api(params...)
launches = api.Launch(params...)
I guess that this is more of a design question than an implementation one. Implementing that piece of code would be trivial, but it'd make my API class huge.
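For illustration, here is roughly the kind of delegation I have in mind so Api stays small and the models keep their own logic. This is just a sketch with made-up names, not the real wrapper code:

import functools


class Launch:
    @classmethod
    def fetch(cls, api, **params):
        # The real implementation would call the launch endpoint through `api`.
        return api.send_request("launch", params)


class Api:
    def __init__(self, base_url="https://launchlibrary.net/1.4"):
        self.base_url = base_url
        # Bind each model's fetch to this Api instance, so callers never pass it.
        self.Launch = functools.partial(Launch.fetch, self)

    def send_request(self, endpoint, params):
        ...  # HTTP details would live here


api = Api()
launches = api.Launch(name="Falcon")  # matches the "ideal" usage above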
Related
The question is basically in the title: Does providing a bespoke response model serve any further purpose besides clean and intuitive documentation? What's the purpose of defining all these response models for all the endpoints rather than just leaving it empty?
I've started working with FastAPI recently, and I really like it. I'm using FastAPI with a MongoDB backend. I've followed the following approach:
Create a router for a given endpoint-category
Write the endpoint with the decorator etc. This involves the relevant query and defining the desired output of the query.
Then, test and trial everything.
Usually, prior to finalising an endpoint, I would set the response_model in the decorator to something generic, like List (imported from typing). This would look something like this:
@MyRouter.get(
    '/the_thing/{the_id}',
    response_description="Returns the thing for target id",
    response_model=List,
    response_model_exclude_unset=True
)
In the Swagger UI documentation, this results in an uninformative response model that tells the consumer nothing about the shape of the data.
So, I end up defining a response-model, which corresponds to the fields I'm returning in my query in the endpoint function; something like this:
from pydantic import BaseModel

class the_thing_out(BaseModel):
    id: int
    name: str | None
    job: str | None
And then, I modify the following: response_model=List[the_thing_out]. This will give a preview of what I can expect to be returned from a given call from within the swagger ui documentation.
Well, to be fair, having an automatically generated OpenAPI-compliant description of your interface is very valuable in and of itself.
Other than that, there is the benefit of data validation in the broader sense, i.e. ensuring that the data that is actually sent to the client conforms to a pre-defined schema. This is why Pydantic is so powerful and FastAPI just utilizes its power to that end.
You define a schema with Pydantic, set it as your response_model and then never have to worry about wrong types or unexpected values or what have you accidentally being introduced in your response data.* If you try to return some invalid data from your route, you'll get an error, instead of the client potentially silently receiving garbage that might mess up the logic on its end.
Now, could you achieve the same thing by just manually instantiating your Pydantic model with the data you want to send yourself first, then generating the JSON and packaging that in an adequate HTTP response?
Sure. But those are just extra steps you have to take for each route. And if you do that three, four, five times, you'll probably get the idea to factor out that model instantiation and serialization into a function that is more or less generic over any Pydantic model and data you throw at it... and oh, look! You've implemented your own version of the response_model logic. 😉
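To make the comparison concrete, here is a rough, self-contained sketch of both approaches. The model mirrors the the_thing_out from the question, and the data is a placeholder, not a real MongoDB query:

from typing import Optional
from fastapi import FastAPI
from fastapi.responses import JSONResponse
from pydantic import BaseModel

app = FastAPI()

class TheThingOut(BaseModel):  # same shape as the_thing_out in the question
    id: int
    name: Optional[str] = None
    job: Optional[str] = None

fake_db = {1: {"id": 1, "name": "widget", "job": "demo"}}  # placeholder data

# Manual approach: validate and serialize the model yourself in every route.
@app.get("/manual/the_thing/{the_id}")
async def get_thing_manual(the_id: int):
    validated = TheThingOut(**fake_db[the_id])
    return JSONResponse(content=validated.dict())

# response_model approach: return raw data and let FastAPI do the same work.
@app.get("/the_thing/{the_id}", response_model=TheThingOut)
async def get_thing(the_id: int):
    return fake_db[the_id]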
Now, all this becomes more and more important the more complex your schemas get. Obviously, if all your route does is return something like
{"exists": 1}
then neither validation nor documentation is all that worthwhile. But I would argue it's usually better to prepare in advance for potential growth of whatever application you are developing.
Since you are using MongoDB in the back, I would argue this becomes even more important. I know, people say that it is one of the "perks" of MongoDB that you need no schema for the data you throw at it, but as soon as you provide an endpoint for clients, it would be nice to at least broadly define what the data coming from that endpoint can look like. And once you have that "contract", you just need a way to safeguard yourself against messing up, which is where the aforementioned model validation comes in.
Hope this helps.
* This rests on two assumptions of course: 1) You took great care in defining your schema (incl. validation) and 2) Pydantic works as expected.
So, I'm not a testing expert, and sometimes, when using packages like DRF, I wonder what I should actually be testing in my code...
If I write custom functions for some endpoints, I understand I should test them, because I wrote that code and there are no tests for it... But the DRF codebase itself is already well tested.
But if I'm writing a simple API that only extends ModelSerializer and ModelViewSet what should I be testing?
The keys in the serialized JSON?
The relations?
What should I be testing?
Testing your ModelSerializer: check the request payload against your expected model fields.
Testing your ModelViewSet: check the response HTTP status code against the expected status codes for your viewsets. You can also test your response data.
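A rough sketch of what both checks could look like, using a hypothetical Book model exposed at /api/books/ through a ModelSerializer and ModelViewSet (the names are placeholders, not your project's code):

from rest_framework import status
from rest_framework.test import APITestCase

from myapp.models import Book               # hypothetical app and model
from myapp.serializers import BookSerializer


class BookApiTests(APITestCase):
    def setUp(self):
        self.book = Book.objects.create(title="Dune", author="Frank Herbert")

    def test_serializer_fields(self):
        # Serializer check: the payload exposes exactly the expected model fields.
        data = BookSerializer(self.book).data
        self.assertEqual(set(data.keys()), {"id", "title", "author"})

    def test_list_endpoint(self):
        # ViewSet check: status code and response data match expectations
        # (assumes pagination is disabled, so response.data is a plain list).
        response = self.client.get("/api/books/")
        self.assertEqual(response.status_code, status.HTTP_200_OK)
        self.assertEqual(response.data[0]["title"], "Dune")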
A good resource - https://realpython.com/test-driven-development-of-a-django-restful-api/
Even if you're only using automated features and added absolutely no customization on your serializer and viewset, and it's obvious to you that this part of the code works smoothly, you still need to write tests.
Code tends to get large, and some other person might end up extending your code, or you might come back to it a few months later and not remember how your implementation works. Knowing that the tests pass tells other people (or your future self) that your code works without having to read it and dive into the implementation details, which makes your code reliable.
The person using your API might be consuming it as a service and not even be interested in what framework or language you used for the implementation; they only want to be sure that the features they require work properly. How can we ensure this? One way is to write tests and make them pass.
That's why it's very important to write complete and reliable tests so people can safely use or extend your code knowing that the tests are passing and everything is OK.
I'm a self-taught programmer and a lot of the problems I encounter come from a lack of formal education (and often also experience).
My question is the following: how do you rationalize where you store the data a class or function creates? I'll give a simple example:
Case: I have a webshop (SHOP) with a REST api and a product provider (PROVIDER) also with a REST API. I determine the product, I send that data to PROVIDER who sends me back formatted data that can be read by SHOP to make a working product on the webshop. PROVIDER also has a secondary REST api that provides generated images.
What I would come up with:
I'd make three classes: ProductBase, Shop and Provider
ProductBase would be the class from where I instantiate and store the individual product information.
Shop would be where I design the api interactions with the webshop.
Provider same as shop, but for interactions with provider api.
My problem: at some point you're creating data whose concerns aren't clearly separated. For example: would I store the generated product data (from PROVIDER) in the ProductBase instance I created? It feels like I'm coupling the two classes this way. But if not there, then where?
What if I create product images with PROVIDER and I upload them to SHOP? Do I store the uploaded image-url in PRODUCT? How do you keep track of all this info?
The question I want answered:
I've read a lot on OOP and design patterns, and I have adopted a TDD approach which has greatly helped improve my code, but I haven't found anything on how to approach the flow of runtime-generated data within software engineering.
What would be a good way to solve the above problem(s), and could you explain your rationale for it?
If I understand correctly, I think your current concern is that you have "raw" product data, which you want to store in objects, and you have "processed" (formatted) product data, which you also want to store in objects. Your question being should you mix them.
Let me just first point out the other obvious option. Namely, having two product classes: RawProduct and ProcessedProduct. Which to do?
(Edit: also, to be sure, product data should not be stored in the provider. The provider performs the action of formatting, but the data is product data, not provider data.)
It depends. There are a couple of considerations:
1) In general, in OOP, the idea is to couple actions on data with the data. So if possible, you have some method in ProductBase like "format()", where format sends the object off to the API to get formatted and stores the result in an instance variable. You can then also have a method like "find_image" that goes and fetches the image URL from the API and then stores that in a field. An object's data is meant to be dynamic; it is meant to be altered by object methods. (See the sketch after this list.)
2) If you need version control (if you want the full history of the object's state to be available), then you can't overwrite fields with new data. So either you need to store a history of every object field in the object, or you need to create new objects.
3) Is RAM a concern? I sometimes create dataclasses that store only the final part of an object's life so that I can fit more of the objects into memory.
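A minimal sketch of point 1). The provider object and its method names are hypothetical, just to show where the data ends up:

class ProductBase:
    def __init__(self, name, price):
        self.name = name
        self.price = price
        self.formatted = None   # filled in by format()
        self.image_url = None   # filled in by find_image()

    def format(self, provider):
        # The provider performs the action, but the result stays on the product.
        self.formatted = provider.format_product({"name": self.name, "price": self.price})

    def find_image(self, provider):
        self.image_url = provider.generate_image(self.name)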
Personally, I often find myself creating "RawObject" and "ProcessedObject" classes; it's just easier a lot of the time. But that's probably because I mostly work with document processing, so the split is very clear. Usually you'll just update the object's data.
A benefit of having one object with the full history is that it is much easier to debug. Because the raw data and the API result are in the same object. So you can very easily probe what went wrong. If you start splitting things up it's harder to track. In general, the more information an object has about where it's been, the easier it is to figure out what went wrong with it.
Remember also, though, since this is a Python question: Python is multi-paradigm. And if you're writing pipeline-style architectures (synchronous, linear processes), then a functional approach can also work well.
Once your data is stored in a product object, anything can hold a reference to that. So a shop can reference an object and a product can reference the object. Be clear on the difference between "has-a" relationships and "is-a" relationships.
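To illustrate the "has-a" relationship, here is a tiny sketch (class names are illustrative):

class Product:
    def __init__(self, name):
        self.name = name
        self.image_url = None


class Shop:
    def __init__(self):
        self.products = []   # Shop has-a collection of products

    def add(self, product):
        self.products.append(product)


p = Product("widget")
shop = Shop()
shop.add(p)
p.image_url = "https://example.com/widget.png"
print(shop.products[0].image_url)  # same object, so the shop sees the update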
How can I implement a structure array like MATLAB's in Python?
MATLAB code:
cluster.c=[]
cluster.indiv=[]
Although you can do this in Python (as I explain below), it might not be the best or most pythonic approach. For other users that have to look at your code (including yourself in 3 months) this syntax is extremely confusing. Think for example about how this deals with name conflicts, undefined values and iterating over properties.
Instead consider storing the data in a data structure that is better suited for this such as a dictionary. Then you can just store everything in
cluster = {'c':[],'indiv':[]}
Imitating Matlab in a bad way:
You can assign arbitrary attributes to instances of your own classes in Python.
If you need an object just for data storage, then you can define a custom class without any functionality in the following way:
class CustomStruct():
    pass
Then you can have
struct = CustomStruct()
struct.c = []
and set or read attributes on the instance in this way.
Better approach:
If you really want to store these things as attributes of an object, then it might be best to define the variables in the __init__ of that class.
class BetterStruct():
    def __init__(self):
        self.c = []
        self.indiv = []
In this way, users looking at your code can immediately understand the expected values, and you can guarantee that they are initialised in a proper fashion.
Allowing data control
If you want to verify the data when it is stored, or if it has to be calculated only when the user requests it (instead of being stored constantly), then consider using Python's property decorator.
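A small sketch of both uses of property; the validation rule and the computed value are made up for illustration:

class GuardedStruct:
    def __init__(self):
        self._c = []

    @property
    def c(self):
        return self._c

    @c.setter
    def c(self, value):
        # Validate the data at the moment it is stored.
        if not isinstance(value, list):
            raise TypeError("c must be a list")
        self._c = value

    @property
    def total(self):
        # Calculated on access instead of being stored.
        return sum(self._c)

s = GuardedStruct()
s.c = [1, 2, 3]
print(s.total)  # 6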
So I have a function which creates a dynamic model. I accomplish this in a way very similar to AuditTrail (see django wiki).
Sample of code is here:
https://gist.github.com/0212845ae00891efe555
Is there any way I can make a dynamically-generated class pickle-able? Ideally something thats not a crazy monkeypatch/hack?
I am aware of the problem where pickle can't store a generated or dynamic class. I worked around this by rigging my dynamic type into the module's dict like so:
new_class = type(name, (models.Model,), attrs)
mod = sys.modules[new_class.__module__]
mod.__dict__[new_class.__name__] = new_class
It's FAR from a clean or elegant solution, so if someone can think of a more django-friendly way to make this happen, I am all ears. However, the above code does work.
The reason there aren't answers for this is that the answer is likely hackish. I don't think you can unpickle an object in Python when the receiving end doesn't know the structure of the class, at least not without some sort of hackish solution. A big reason pickle doesn't support it is probably that it's a fantastic way to introduce malicious code into your application.
http://www.mofeel.net/871-comp-lang-python/2898.aspx explains a bit why dynamically created classes can't be unpickled.
In every case, I've either just serialized a dictionary of the object's attributes (its __dict__), or figured out some awful workaround. I hope you come up with something better.
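A minimal sketch of that attribute-dictionary workaround, using a plain dynamically created class rather than a Django model, just to show the idea:

import pickle

# Build a throwaway dynamic class the same way as in the question (via type()).
Dynamic = type("Dynamic", (object,), {})
obj = Dynamic()
obj.x, obj.y = 1, "two"

# Workaround: pickle only the attribute dict plus the class name, not the class.
payload = pickle.dumps({"class_name": type(obj).__name__, "attrs": obj.__dict__})

# On the receiving end, regenerate the class (e.g. with the same factory)
# and restore the state onto a fresh instance.
state = pickle.loads(payload)
restored = type(state["class_name"], (object,), {})()
restored.__dict__.update(state["attrs"])
print(restored.x, restored.y)  # 1 two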
Good Luck!