Flatten nested Pydantic model - python

from typing import Union
from pydantic import BaseModel, Field
class Category(BaseModel):
name: str = Field(alias="name")
class OrderItems(BaseModel):
name: str = Field(alias="name")
category: Category = Field(alias="category")
unit: Union[str, None] = Field(alias="unit")
quantity: int = Field(alias="quantity")
When instantiated like this:
OrderItems(**{'name': 'Test','category':{'name': 'Test Cat'}, 'unit': 'kg', 'quantity': 10})
It returns data like this:
OrderItems(name='Test', category=Category(name='Test Cat'), unit='kg', quantity=10)
But I want the output like this:
OrderItems(name='Test', category='Test Cat', unit='kg', quantity=10)
How can I achieve this?

You should try as much as possible to define your schema the way you actually want the data to look in the end, not the way you might receive it from somewhere else.
UPDATE: Generalized solution (one nested field or more)
To generalize this problem, let's assume you have the following models:
from pydantic import BaseModel
class Foo(BaseModel):
x: bool
y: str
z: int
class _BarBase(BaseModel):
a: str
b: float
class Config:
orm_mode = True
class BarNested(_BarBase):
foo: Foo
class BarFlat(_BarBase):
foo_x: bool
foo_y: str
Problem: You want to be able to initialize BarFlat with a foo argument just like BarNested, but the data to end up in the flat schema, wherein the fields foo_x and foo_y correspond to x and y on the Foo model (and you are not interested in z).
Solution: Define a custom root_validator with pre=True that checks if a foo key/attribute is present in the data. If it is, it validates the corresponding object against the Foo model, grabs its x and y values and then uses them to extend the given data with foo_x and foo_y keys:
from pydantic import BaseModel, root_validator
from pydantic.utils import GetterDict
...
class BarFlat(_BarBase):
foo_x: bool
foo_y: str
#root_validator(pre=True)
def flatten_foo(cls, values: GetterDict) -> GetterDict | dict[str, object]:
foo = values.get("foo")
if foo is None:
return values
# Assume `foo` must ba valid `Foo` data:
foo = Foo.validate(foo)
return {
"foo_x": foo.x,
"foo_y": foo.y,
} | dict(values)
Note that we need to be a bit more careful inside a root validator with pre=True because the values are always passed in the form of a GetterDict, which is an immutable mapping-like object. So we cannot simply assign new values foo_x/foo_y to it like we would to a dictionary. But nothing is stopping us from returning the cleaned up data in the form of a regular old dict.
To demonstrate, we can throw some test data at it:
test_dict = {"a": "spam", "b": 3.14, "foo": {"x": True, "y": ".", "z": 0}}
test_orm = BarNested(a="eggs", b=-1, foo=Foo(x=False, y="..", z=1))
test_flat = '{"a": "beans", "b": 0, "foo_x": true, "foo_y": ""}'
bar1 = BarFlat.parse_obj(test_dict)
bar2 = BarFlat.from_orm(test_orm)
bar3 = BarFlat.parse_raw(test_flat)
print(bar1.json(indent=4))
print(bar2.json(indent=4))
print(bar3.json(indent=4))
The output:
{
"a": "spam",
"b": 3.14,
"foo_x": true,
"foo_y": "."
}
{
"a": "eggs",
"b": -1.0,
"foo_x": false,
"foo_y": ".."
}
{
"a": "beans",
"b": 0.0,
"foo_x": true,
"foo_y": ""
}
The first example simulates a common situation, where the data is passed to us in the form of a nested dictionary. The second example is the typical database ORM object situation, where BarNested represents the schema we find in a database. The third is just to show that we can still correctly initialize BarFlat without a foo argument.
One caveat to note is that the validator does not get rid of the foo key, if it finds it in the values. If your model is configured with Extra.forbid that will lead to an error. In that case, you'll just need to have an extra line, where you coerce the original GetterDict to a dict first, then pop the "foo" key instead of getting it.
Original post (flatten single field)
If you need the nested Category model for database insertion, but you want a "flat" order model with category being just a string in the response, you should split that up into two separate models.
Then in the response model you can define a custom validator with pre=True to handle the case when you attempt to initialize it providing an instance of Category or a dict for category.
Here is what I suggest:
from pydantic import BaseModel, validator
class Category(BaseModel):
name: str
class OrderItemBase(BaseModel):
name: str
unit: str | None
quantity: int
class OrderItemCreate(OrderItemBase):
category: Category
class OrderItemResponse(OrderItemBase):
category: str
#validator("category", pre=True)
def handle_category_model(cls, v: object) -> object:
if isinstance(v, Category):
return v.name
if isinstance(v, dict) and "name" in v:
return v["name"]
return v
Here is a demo:
if __name__ == "__main__":
insert_data = '{"name": "foo", "category": {"name": "bar"}, "quantity": 1}'
insert_obj = OrderItemCreate.parse_raw(insert_data)
print(insert_obj.json(indent=2))
... # insert into DB
response_obj = OrderItemResponse.parse_obj(insert_obj.dict())
print(response_obj.json(indent=2))
Here is the output:
{
"name": "foo",
"unit": null,
"quantity": 1,
"category": {
"name": "bar"
}
}
{
"name": "foo",
"unit": null,
"quantity": 1,
"category": "bar"
}
One of the benefits of this approach is that the JSON Schema stays consistent with what you have on the model. If you use this in FastAPI that means the swagger documentation will actually reflect what the consumer of that endpoint receives. You could of course override and customize schema creation, but... why? Just define the model correctly in the first place and avoid headache in the future.

Try this when instantiating:
myCategory = Category(name="test cat")
OrderItems(
name="test",
category=myCategory.name,
unit="kg",
quantity=10)

Well, i was curious, so here's the insane way:
class Category(BaseModel):
name: str = Field(alias="name")
class OrderItems(BaseModel):
name: str = Field(alias="name")
category: Category = Field(alias="category")
unit: Union[str, None] = Field(alias="unit")
quantity: int = Field(alias="quantity")
def json(self, *args, **kwargs) -> str:
self.__dict__.update({'category': self.__dict__['category'].name})
return super().json(*args, **kwargs)
c = Category(name='Dranks')
m = OrderItems(name='sodie', category=c, unit='can', quantity=1)
m.json()
And you get:
'{"name": "sodie", "category": "Dranks", "unit": "can", "quantity": 1}'
The sane way would probably be:
class Category(BaseModel):
name: str = Field(alias="name")
class OrderItems(BaseModel):
name: str = Field(alias="name")
category: Category = Field(alias="category")
unit: Union[str, None] = Field(alias="unit")
quantity: int = Field(alias="quantity")
c = Category(name='Dranks')
m = OrderItems(name='sodie', category=c, unit='can', quantity=1)
r = m.dict()
r['category'] = r['category']['name']

Related

Pydantic: Create model with fixed and extended fields from a Dict[str, OtherModel], the Typescript [key: string] way

From a similar question, the goal is to create a model like this Typescript interface:
interface ExpandedModel {
fixed: number;
[key: string]: OtherModel;
}
However the OtherModel needs to be validated, so simply using:
class ExpandedModel(BaseModel):
fixed: int
class Config:
extra = "allow"
Won't be enough. I tried root (pydantic docs):
class VariableKeysModel(BaseModel):
__root__: Dict[str, OtherModel]
But doing something like:
class ExpandedModel(VariableKeysModel):
fixed: int
Is not possible due to:
ValueError: root cannot be mixed with other fields
Would something like #root_validator (example from another answer) be helpful in this case?
Thankfully, Python is not TypeScript. As mentioned in the comments here as well, an object is generally not a dictionary and dynamic attributes are considered bad form in almost all cases.
You can of course still set attributes dynamically, but they will for example never be recognized by a static type checker like Mypy or your IDE. This means you will not get auto-suggestions for those dynamic fields. Only attributes that are statically defined within the namespace of the class are considered members of that class.
That being said, you can abuse the extra config option to allow arbitrary fields to by dynamically added to the model, while at the same time enforcing all corresponding values to be of a specific type via a root_validator.
from typing import Any
from pydantic import BaseModel, root_validator
class Foo(BaseModel):
a: int
class Bar(BaseModel):
b: str
#root_validator
def validate_foo(cls, values: dict[str, Any]) -> dict[str, Any]:
for name, value in values.items():
if name in cls.__fields__:
continue # ignore statically defined fields here
values[name] = Foo.parse_obj(value)
return values
class Config:
extra = "allow"
Demo:
if __name__ == "__main__":
from pydantic import ValidationError
bar = Bar.parse_obj({
"b": "xyz",
"foo1": {"a": 1},
"foo2": Foo(a=2),
})
print(bar.json(indent=4))
try:
Bar.parse_obj({
"b": "xyz",
"foo": {"a": "string"},
})
except ValidationError as err:
print(err.json(indent=4))
try:
Bar.parse_obj({
"b": "xyz",
"foo": {"not_a_foo_field": 1},
})
except ValidationError as err:
print(err.json(indent=4))
Output:
{
"b": "xyz",
"foo2": {
"a": 2
},
"foo1": {
"a": 1
}
}
[
{
"loc": [
"__root__",
"a"
],
"msg": "value is not a valid integer",
"type": "type_error.integer"
}
]
[
{
"loc": [
"__root__",
"a"
],
"msg": "field required",
"type": "value_error.missing"
}
]
A better approach IMO is to just put the dynamic name-object-pairs into a dictionary. For example, you could define a separate field foos: dict[str, Foo] on the Bar model and get automatic validation out of the box that way.
Or you ditch the outer base model altogether for that specific case and just handle the data as a native dictionary with Foo values and parse them all via the Foo model.

FastAPI create a generic response model that would suit requirements

I've been working with FastAPI for some time, it's a great framework.
However real life scenarios can be surprising, sometimes a non-standard approach is necessary. There's a one case I'd like to ask your help with.
There's a strange external requirement that a model response should be formatted as stated in example:
Desired behavior:
GET /object/1
{status: ‘success’, data: {object: {id:‘1’, category: ‘test’ …}}}
GET /objects
{status: ‘success’, data: {objects: [...]}}}
Current behavior:
GET /object/1 would respond:
{id: 1,field1:"content",... }
GET /objects/ would send a List of Object e.g.,:
{
[
{id: 1,field1:"content",... },
{id: 1,field1:"content",... },
...
]
}
You can substitute 'object' by any class, it's just for description purposes.
How to write a generic response model that will suit those reqs?
I know I can produce response model that would contain status:str and (depending on class) data structure e.g ticket:Ticket or tickets:List[Ticket].
The point is there's a number of classes so I hope there's a more pythonic way to do it.
Thanks for help.
Generic model with static field name
A generic model is a model where one field (or multiple) are annotated with a type variable. Thus the type of that field is unspecified by default and must be specified explicitly during subclassing and/or initialization. But that field is still just an attribute and an attribute must have a name. A fixed name.
To go from your example, say that is your model:
{
"status": "...",
"data": {
"object": {...} # type variable
}
}
Then we could define that model as generic in terms of the type of its object attribute.
This can be done using Pydantic's GenericModel like this:
from typing import Generic, TypeVar
from pydantic import BaseModel
from pydantic.generics import GenericModel
M = TypeVar("M", bound=BaseModel)
class GenericSingleObject(GenericModel, Generic[M]):
object: M
class GenericMultipleObjects(GenericModel, Generic[M]):
objects: list[M]
class BaseGenericResponse(GenericModel):
status: str
class GenericSingleResponse(BaseGenericResponse, Generic[M]):
data: GenericSingleObject[M]
class GenericMultipleResponse(BaseGenericResponse, Generic[M]):
data: GenericMultipleObjects[M]
class Foo(BaseModel):
a: str
b: int
class Bar(BaseModel):
x: float
As you can see, GenericSingleObject reflects the generic type we want for data, whereas GenericSingleResponse is generic in terms of the type parameter M of GenericSingleObject, which is the type of its data attribute.
If we now want to use one of our generic response models, we would need to specify it with a type argument (a concrete model) first, e.g. GenericSingleResponse[Foo].
FastAPI deals with this just fine and can generate the correct OpenAPI documentation. The JSON schema for GenericSingleResponse[Foo] looks like this:
{
"title": "GenericSingleResponse[Foo]",
"type": "object",
"properties": {
"status": {
"title": "Status",
"type": "string"
},
"data": {
"$ref": "#/definitions/GenericSingleObject_Foo_"
}
},
"required": [
"status",
"data"
],
"definitions": {
"Foo": {
"title": "Foo",
"type": "object",
"properties": {
"a": {
"title": "A",
"type": "string"
},
"b": {
"title": "B",
"type": "integer"
}
},
"required": [
"a",
"b"
]
},
"GenericSingleObject_Foo_": {
"title": "GenericSingleObject[Foo]",
"type": "object",
"properties": {
"object": {
"$ref": "#/definitions/Foo"
}
},
"required": [
"object"
]
}
}
}
To demonstrate it with FastAPI:
from fastapi import FastAPI
app = FastAPI()
#app.get("/foo/", response_model=GenericSingleResponse[Foo])
async def get_one_foo() -> dict[str, object]:
return {"status": "foo", "data": {"object": {"a": "spam", "b": 123}}}
Sending a request to that route returns the following:
{
"status": "foo",
"data": {
"object": {
"a": "spam",
"b": 123
}
}
}
Dynamically created model
If you actually want the attribute name to also be different every time, that is obviously no longer possible with static type annotations. In that case we would have to resort to actually creating the model type dynamically via pydantic.create_model.
In that case there is really no point in genericity anymore because type safety is out of the window anyway, at least for the data model. We still have the option to define a GenericResponse model, which we can specify via our dynamically generated models, but this will make every static type checker mad, since we'll be using variables for types. Still, it might make for otherwise concise code.
We just need to define an algorithm for deriving the model parameters:
from typing import Any, Generic, Optional, TypeVar
from pydantic import BaseModel, create_model
from pydantic.generics import GenericModel
M = TypeVar("M", bound=BaseModel)
def create_data_model(
model: type[BaseModel],
plural: bool = False,
custom_plural_name: Optional[str] = None,
**kwargs: Any,
) -> type[BaseModel]:
data_field_name = model.__name__.lower()
if plural:
model_name = f"Multiple{model.__name__}"
if custom_plural_name:
data_field_name = custom_plural_name
else:
data_field_name += "s"
kwargs[data_field_name] = (list[model], ...) # type: ignore[valid-type]
else:
model_name = f"Single{model.__name__}"
kwargs[data_field_name] = (model, ...)
return create_model(model_name, **kwargs)
class GenericResponse(GenericModel, Generic[M]):
status: str
data: M
Using the same Foo and Bar examples as before:
class Foo(BaseModel):
a: str
b: int
class Bar(BaseModel):
x: float
SingleFoo = create_data_model(Foo)
MultipleBar = create_data_model(Bar, plural=True)
This also works as expected with FastAPI including the automatically generated schemas/documentations:
from fastapi import FastAPI
app = FastAPI()
#app.get("/foo/", response_model=GenericResponse[SingleFoo]) # type: ignore[valid-type]
async def get_one_foo() -> dict[str, object]:
return {"status": "foo", "data": {"foo": {"a": "spam", "b": 123}}}
#app.get("/bars/", response_model=GenericResponse[MultipleBar]) # type: ignore[valid-type]
async def get_multiple_bars() -> dict[str, object]:
return {"status": "bars", "data": {"bars": [{"x": 3.14}, {"x": 0}]}}
Output is essentially the same as with the first approach.
You'll have to see, which one works better for you. I find the second option very strange because of the dynamic key/field name. But maybe that is what you need for some reason.

Parse flat data into nested pydantic model

I have two models:
from pydantic import BaseModel
class Nested(BaseModel):
id: int
title: str
class Model(BaseModel):
id: int
nested_id: int
nested: Nested
Model references Nested.
I make a query with a JOIN to my database and get something like this response:
data = {'id': 5, 'nested_id': 1, 'id_1': 1, 'title': 'хлеб'}
I would like to parse this response and assign the right fields to the Nested model,
Are there any methods in BaseModel, with which I can parse data?
Of course I can work with dict which I already have. But I want to do it in a method of BaseModel .
I use parse_as_obj(List[Model], data).
Something like this?
from pydantic import BaseModel, root_validator
DataDict = dict[str, object]
class Nested(BaseModel):
id: int
title: str
class Model(BaseModel):
id: int
nested_id: int
nested: Nested
#root_validator(pre=True)
def parse_flat_nested_data(cls, values: DataDict) -> DataDict:
if "nested" not in values:
values["nested"] = {
"id": values.get("id_1"),
"title": values.get("title"),
}
return values
if __name__ == "__main__":
data = {"id": 5, "nested_id": 1, "id_1": 1, "title": "хлеб"}
instance = Model.parse_obj(data)
print(instance)
Output: id=5 nested_id=1 nested=Nested(id=1, title='хлеб')
See documentation on validators for more.

Is it possible to change the output alias in pydantic?

Setup:
# Pydantic Models
class TMDB_Category(BaseModel):
name: str = Field(alias="strCategory")
description: str = Field(alias="strCategoryDescription")
class TMDB_GetCategoriesResponse(BaseModel):
categories: list[TMDB_Category]
#router.get(path="category", response_model=TMDB_GetCategoriesResponse)
async def get_all_categories():
async with httpx.AsyncClient() as client:
response = await client.get(Endpoint.GET_CATEGORIES)
return TMDB_GetCategoriesResponse.parse_obj(response.json())
Problem:
Alias is being used when creating a response, and I want to avoid it. I only need this alias to correctly map the incoming data but when returning a response, I want to use actual field names.
Actual response:
{
"categories": [
{
"strCategory": "Beef",
"strCategoryDescription": "Beef is ..."
},
{
"strCategory": "Chicken",
"strCategoryDescription": "Chicken is ..."
}
}
Expected response:
{
"categories": [
{
"name": "Beef",
"description": "Beef is ..."
},
{
"name": "Chicken",
"description": "Chicken is ..."
}
}
Switch aliases and field names and use the allow_population_by_field_name model config option:
class TMDB_Category(BaseModel):
strCategory: str = Field(alias="name")
strCategoryDescription: str = Field(alias="description")
class Config:
allow_population_by_field_name = True
Let the aliases configure the names of the fields that you want to return, but enable allow_population_by_field_name to be able to parse data that uses different names for the fields.
An alternate option (which likely won't be as popular) is to use a de-serialization library other than pydantic. For example, the Dataclass Wizard library is one which supports this particular use case. If you need the same round-trip behavior that Field(alias=...) provides, you can pass the all param to the json_field function. Note that with such a library, you do lose out on the ability to perform complete type validation, which is arguably one of pydantic's greatest strengths; however it does, perform type conversion in a similar fashion to pydantic. There are also a few reasons why I feel that validation is not as important, which I do list below.
Reasons why I would argue that data validation is a nice to have
feature in general:
If you're building and passing in the input yourself, you can most likely trust that you know what you are doing, and are passing in the correct data types.
If you're getting the input from another API, then assuming that API has decent docs, you can just grab an example response from their documentation, and use that to model your class structure. You generally don't need any validation if an API documents its response structure clearly.
Data validation takes time, so it can slow down the process slightly, compared to if you just perform type conversion and catch any errors that might occur, without validating the input type beforehand.
So to demonstrate that, here's a simple example for the above use case using the dataclass-wizard library (which relies on the usage of dataclasses instead of pydantic models):
from dataclasses import dataclass
from dataclass_wizard import JSONWizard, json_field
#dataclass
class TMDB_Category:
name: str = json_field('strCategory')
description: str = json_field('strCategoryDescription')
#dataclass
class TMDB_GetCategoriesResponse(JSONWizard):
categories: list[TMDB_Category]
And the code to run that, would look like this:
input_dict = {
"categories": [
{
"strCategory": "Beef",
"strCategoryDescription": "Beef is ..."
},
{
"strCategory": "Chicken",
"strCategoryDescription": "Chicken is ..."
}
]
}
c = TMDB_GetCategoriesResponse.from_dict(input_dict)
print(repr(c))
# TMDB_GetCategoriesResponse(categories=[TMDB_Category(name='Beef', description='Beef is ...'), TMDB_Category(name='Chicken', description='Chicken is ...')])
print(c.to_dict())
# {'categories': [{'name': 'Beef', 'description': 'Beef is ...'}, {'name': 'Chicken', 'description': 'Chicken is ...'}]}
Measuring Performance
If anyone is curious, I've set up a quick benchmark test to compare deserialization and serialization times with pydantic vs. just dataclasses:
from dataclasses import dataclass
from timeit import timeit
from pydantic import BaseModel, Field
from dataclass_wizard import JSONWizard, json_field
# Pydantic Models
class Pydantic_TMDB_Category(BaseModel):
name: str = Field(alias="strCategory")
description: str = Field(alias="strCategoryDescription")
class Pydantic_TMDB_GetCategoriesResponse(BaseModel):
categories: list[Pydantic_TMDB_Category]
# Dataclasses
#dataclass
class TMDB_Category:
name: str = json_field('strCategory', all=True)
description: str = json_field('strCategoryDescription', all=True)
#dataclass
class TMDB_GetCategoriesResponse(JSONWizard):
categories: list[TMDB_Category]
# Input dict which contains sufficient data for testing (100 categories)
input_dict = {
"categories": [
{
"strCategory": f"Beef {i * 2}",
"strCategoryDescription": "Beef is ..." * i
}
for i in range(100)
]
}
n = 10_000
print('=== LOAD (deserialize)')
print('dataclass-wizard: ',
timeit('c = TMDB_GetCategoriesResponse.from_dict(input_dict)',
globals=globals(), number=n))
print('pydantic: ',
timeit('c = Pydantic_TMDB_GetCategoriesResponse.parse_obj(input_dict)',
globals=globals(), number=n))
c = TMDB_GetCategoriesResponse.from_dict(input_dict)
pydantic_c = Pydantic_TMDB_GetCategoriesResponse.parse_obj(input_dict)
print('=== DUMP (serialize)')
print('dataclass-wizard: ',
timeit('c.to_dict()',
globals=globals(), number=n))
print('pydantic: ',
timeit('pydantic_c.dict()',
globals=globals(), number=n))
And the benchmark results (tested on Mac OS Big Sur, Python 3.9.0):
=== LOAD (deserialize)
dataclass-wizard: 1.742989194
pydantic: 5.31538175
=== DUMP (serialize)
dataclass-wizard: 2.300118940
pydantic: 5.582638598
In their docs, pydantic claims to be the fastest library in general, but it's rather straightforward to prove otherwise. As you can see, for the above dataset pydantic is about 2x slower in both the deserialization and serialization process. It’s worth noting that pydantic is already quite fast, though.
Disclaimer: I am the creator (and maintener) of said library.
maybe you could use this approach
from pydantic import BaseModel, Field
class TMDB_Category(BaseModel):
name: str = Field(alias="strCategory")
description: str = Field(alias="strCategoryDescription")
data = {
"strCategory": "Beef",
"strCategoryDescription": "Beef is ..."
}
obj = TMDB_Category.parse_obj(data)
# {'name': 'Beef', 'description': 'Beef is ...'}
print(obj.dict())
I was trying to do something similar (migrate a field pattern to a list of patterns while gracefully handling old versions of the data). The best solution I could find was to do the field mapping in the __init__ method. In the terms of OP, this would be like:
class TMDB_Category(BaseModel):
name: str
description: str
def __init__(self, **data):
if "strCategory" in data:
data["name"] = data.pop("strCategory")
if "strCategoryDescription" in data:
data["description"] = data.pop("strCategoryDescription")
super().__init__(**data)
Then we have:
>>> TMDB_Category(strCategory="name", strCategoryDescription="description").json()
'{"name": "name", "description": "description"}'
If you need to use field aliases to do this but still use the name/description fields in your code, one option is to alter Hernán Alarcón's solution to use properties:
class TMDB_Category(BaseModel):
strCategory: str = Field(alias="name")
strCategoryDescription: str = Field(alias="description")
class Config:
allow_population_by_field_name = True
#property
def name(self):
return self.strCategory
#name.setter
def name(self, value):
self.strCategory = value
#property
def description(self):
return self.strCategoryDescription
#description.setter
def description(self, value):
self.strCategoryDescription = value
That's still a bit awkward, since the repr uses the "alias" names:
>>> TMDB_Category(name="name", description="description")
TMDB_Category(strCategory='name', strCategoryDescription='description')
Use the Config option by_alias.
from fastapi import FastAPI, Path, Query
from pydantic import BaseModel, Field
app = FastAPI()
class Item(BaseModel):
name: str = Field(..., alias="keck")
#app.post("/item")
async def read_items(
item: Item,
):
return item.dict(by_alias=False)
Given the request:
{
"keck": "string"
}
this will return
{
"name": "string"
}

Initialize Python dataclass from dictionary

Let's say I want to initialize the below dataclass
from dataclasses import dataclass
#dataclass
class Req:
id: int
description: str
I can of course do it in the following way:
data = make_request() # gives me a dict with id and description as well as some other keys.
# {"id": 123, "description": "hello", "data_a": "", ...}
req = Req(data["id"], data["description"])
But, is it possible for me to do it with dictionary unpacking, given that the keys I need is always a subset of the dictionary?
req = Req(**data) # TypeError: __init__() got an unexpected keyword argument 'data_a'
Here's a solution that can be used generically for any class. It simply filters the input dictionary to exclude keys that aren't field names of the class with init==True:
from dataclasses import dataclass, fields
#dataclass
class Req:
id: int
description: str
def classFromArgs(className, argDict):
fieldSet = {f.name for f in fields(className) if f.init}
filteredArgDict = {k : v for k, v in argDict.items() if k in fieldSet}
return className(**filteredArgDict)
data = {"id": 123, "description": "hello", "data_a": ""}
req = classFromArgs(Req, data)
print(req)
Output:
Req(id=123, description='hello')
UPDATE: Here's a variation on the strategy above which creates a utility class that caches dataclasses.fields for each dataclass that uses it (prompted by a comment by #rv.kvetch expressing performance concerns around duplicate processing of dataclasses.fields by multiple invocations for the same dataclass).
from dataclasses import dataclass, fields
class DataClassUnpack:
classFieldCache = {}
#classmethod
def instantiate(cls, classToInstantiate, argDict):
if classToInstantiate not in cls.classFieldCache:
cls.classFieldCache[classToInstantiate] = {f.name for f in fields(classToInstantiate) if f.init}
fieldSet = cls.classFieldCache[classToInstantiate]
filteredArgDict = {k : v for k, v in argDict.items() if k in fieldSet}
return classToInstantiate(**filteredArgDict)
#dataclass
class Req:
id: int
description: str
req = DataClassUnpack.instantiate(Req, {"id": 123, "description": "hello", "data_a": ""})
print(req)
req = DataClassUnpack.instantiate(Req, {"id": 456, "description": "goodbye", "data_a": "my", "data_b": "friend"})
print(req)
#dataclass
class Req2:
id: int
description: str
data_a: str
req2 = DataClassUnpack.instantiate(Req2, {"id": 123, "description": "hello", "data_a": "world"})
print(req2)
print("\nHere's a peek at the internals of DataClassUnpack:")
print(DataClassUnpack.classFieldCache)
Output:
Req(id=123, description='hello')
Req(id=456, description='goodbye')
Req2(id=123, description='hello', data_a='world')
Here's a peek at the internals of DataClassUnpack:
{<class '__main__.Req'>: {'description', 'id'}, <class '__main__.Req2'>: {'description', 'data_a', 'id'}}
You can possibly introduce a new function that will perform the given conversion from dict to dataclass:
import inspect
from dataclasses import dataclass
#dataclass
class Req:
id: int
description: str
def from_dict_to_dataclass(cls, data):
return cls(
**{
key: (data[key] if val.default == val.empty else data.get(key, val.default))
for key, val in inspect.signature(cls).parameters.items()
}
)
from_dict_to_dataclass(Req, {"id": 123, "description": "hello", "data_a": ""})
# Output: Req(id=123, description='hello')
Note, if val.default == val.empty condition is needed in order to check if your dataclass has a default value set. If it's true then we should take the given value into consideration when constructing a dataclass.
A workaround to this is by intercepting the __init__ of the dataclass and filter out the fields that are not recognized.
from dataclasses import dataclass, fields
#dataclass
class Req1:
id: int
description: str
#dataclass
class Req2:
id: int
description: str
def __init__(self, **kwargs):
for key, value in kwargs.items():
if key in REQ2_FIELD_NAMES:
setattr(self, key, value)
# To not re-evaluate the field names for each and every creation of Req2, list them here.
REQ2_FIELD_NAMES = {field.name for field in fields(Req2)}
data = {
"id": 1,
"description": "some",
"data_a": None,
}
try:
print("Call for Req1:", Req1(**data))
except Exception as error:
print("Call for Req1:", error)
try:
print("Call for Req2:", Req2(**data))
except Exception as error:
print("Call for Req2:", error)
Output:
Call for Req1: __init__() got an unexpected keyword argument 'data_a'
Call for Req2: Req2(id=1, description='some')
Related question:
How does one ignore extra arguments passed to a data class?

Categories

Resources