I am using the data from the League of Legends API to learn Python, JSON, and Data Classes. Using dacite, I have created parent and child classes that allow access to the data using this syntax: champs.data['Ahri']['key']. However, I wonder if there is a way to create a class that returns the keys as fields so one could access the data using this syntax: champs.data.Ahri.key.
Here is the working code:
from dataclasses import dataclass
from dacite import from_dict
j1 = {'type': 'champion',
'data': {'Aatrox': {'id': 'Aatrox', 'key': '266', 'name': 'Aatrox'},
'Ahri': {'id': 'Ahri', 'key': '103', 'name': 'Ahri'}}}
@dataclass
class C:
    type: str
    data: dict

@dataclass
class P:
    type: str
    data: dict
champs = from_dict(data_class=P, data=j1)
champs.data['Ahri']['key']
If it were me, I would probably leave/make champions a dictionary. Then access it like champions['Ahri'].key
Something like:
import dataclasses
@dataclasses.dataclass
class Champion:
    id: str
    key: str
    name: str
j1 = {
'type': 'champion',
'data': {
'Aatrox': {'id': 'Aatrox', 'key': '266', 'name': 'Aatrox'},
'Ahri': {'id': 'Ahri', 'key': '103', 'name': 'Ahri'}
}
}
champions = {
champion["id"]: Champion(**champion)
for champion in j1["data"].values()
}
print(champions['Ahri'].key)
resulting in 103
However if you were really keen on champions.Ahri.key then you can implement Champions as an empty class and use setattr()
import dataclasses
@dataclasses.dataclass
class Champion:
    id: str
    key: str
    name: str

@dataclasses.dataclass
class Champions:
    pass
j1 = {
'type': 'champion',
'data': {
'Aatrox': {'id': 'Aatrox', 'key': '266', 'name': 'Aatrox'},
'Ahri': {'id': 'Ahri', 'key': '103', 'name': 'Ahri'}
}
}
champions = Champions()
for champion in j1["data"].values():
    setattr(champions, champion["id"], Champion(**champion))
print(champions.Ahri.key)
again giving you 103
Note: The @dataclass decorator can likely be omitted from Champions(), since that class declares no fields; Champion still needs it for Champion(**champion) to work.
The closest you can probably get, at least in a safe enough manner, is as @JonSG suggests, using champs.data['Ahri'].key.
Here's a straightforward example using the dataclass-wizard. It doesn't do strict type checking, as I know dacite does.
Instead, it opts for implicit type coercion where possible, which is useful in some cases; you can see an example of this below, where a str is converted to the annotated int.
Note: This example should work for Python 3.7+ with the included __future__ import.
from __future__ import annotations
from dataclasses import dataclass
from dataclass_wizard import fromdict
data = {
'type': 'champion',
'data': {
'Aatrox': {'id': 'Aatrox', 'key': '266', 'name': 'Aatrox'},
'Ahri': {'id': 'Ahri', 'key': '103', 'name': 'Ahri'},
}
}
@dataclass
class P:
    type: str
    data: dict[str, Character]

@dataclass
class Character:
    id: str
    key: int
    name: str
champs = fromdict(P, data)
print(champs)
print(champs.data['Ahri'].key)
Output:
P(type='champion', data={'Aatrox': Character(id='Aatrox', key=266, name='Aatrox'), 'Ahri': Character(id='Ahri', key=103, name='Ahri')})
103
How to do this
d = {
"type": "champion",
"data": {
"Aatrox": {"id": "Aatrox", "key": "266", "name": "Aatrox"},
"Ahri": {"id": "Ahri", "key": "103", "name": "Ahri"},
},
}
def dict_to_class(d) -> object:
    if isinstance(d, dict):
        # build a fresh class and attach one attribute per key
        class C:
            pass
        for k, v in d.items():
            setattr(C, k, dict_to_class(v))
        return C
    else:
        return d
champ = dict_to_class(d)
print(champ.data.Ahri.key)
# 103
The key here is the built-in setattr function, which takes an object, a string, and a value, and creates an attribute (field) on that object, named according to the string and containing the value.
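As a minimal illustration of setattr on its own (the class name Box is made up for this sketch), the attribute name is passed as a string, and getattr and hasattr are the matching built-ins for reading and probing:

```python
class Box:
    """Empty container; attributes are attached at runtime."""


box = Box()

# Equivalent to `box.Ahri = {...}`, except the name comes from a string.
setattr(box, "Ahri", {"id": "Ahri", "key": "103"})

print(box.Ahri["key"])            # attribute access works like a normal field
print(getattr(box, "Ahri")["id"])  # getattr reads an attribute by string name
print(hasattr(box, "Aatrox"))     # nothing was set under this name
```

Because the names only exist at runtime, editors and type checkers cannot know that box.Ahri is there.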
Don't do this!
I must stress that there is almost never a good reason to do this. When dealing with JSON data of an unknown shape, the correct way to represent it is a dict.
If you do know the shape of the data, you should create a specialized dataclass, like so:
from dataclasses import dataclass
d = {
"type": "champion",
"data": {
"Aatrox": {"id": "Aatrox", "key": "266", "name": "Aatrox"},
"Ahri": {"id": "Ahri", "key": "103", "name": "Ahri"},
},
}
@dataclass
class Champion:
    id: str
    key: str
    name: str
champions = {name: Champion(**attributes) for name, attributes in d["data"].items()}
print(champions)
# {'Aatrox': Champion(id='Aatrox', key='266', name='Aatrox'), 'Ahri': Champion(id='Ahri', key='103', name='Ahri')}
print(champions["Aatrox"].key)
# 266
The dacite docs have a section about nested structures that is very close to what you want. The example they use, verbatim, is as follows:
@dataclass
class A:
    x: str
    y: int

@dataclass
class B:
    a: A
data = {
'a': {
'x': 'test',
'y': 1,
}
}
result = from_dict(data_class=B, data=data)
assert result == B(a=A(x='test', y=1))
We can access fields at arbitrary depth as e.g. result.a.x == 'test'.
The critical difference between this and your data is that the dictionary under the data key has arbitrary keys (Aatrox, Ahri, etc.). dacite isn't set up to create new field names on the fly, so the best you're going to get is something like the latter part of @JonSG's answer, which uses setattr to dynamically build new fields.
Let's imagine how you would use this data for a moment, though. Probably you'd want at some point to be able to iterate over your champions in order to perform a filter/transform/etc. operation. It's possible to iterate over fields in Python, but you have to dig into Python internals, which makes your code less readable and harder to follow.
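To make the iteration point concrete, here is a small sketch comparing the two containers. The Champions class and the variable names are illustrative; the attribute-based version has to go through vars() to reach the instance's __dict__, while the dict is iterated directly.

```python
class Champions:
    """Empty container whose attributes are added at runtime."""


raw = {
    "Aatrox": {"id": "Aatrox", "key": "266", "name": "Aatrox"},
    "Ahri": {"id": "Ahri", "key": "103", "name": "Ahri"},
}

champions = Champions()
for name, attributes in raw.items():
    setattr(champions, name, attributes)

# Attribute container: the names live in the instance __dict__, reached via vars().
keys_via_attributes = {name: champ["key"] for name, champ in vars(champions).items()}

# Plain dict: iteration is direct.
keys_via_dict = {name: champ["key"] for name, champ in raw.items()}

print(keys_via_attributes == keys_via_dict)
```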
Much better would be one of the following:
Preprocess j1 into a shape that fits the structure you want to use, and then use dacite with a dataclass that fits the new structure. For example, maybe it makes sense to pull the values of the data dict out into a list.
Process in steps using dacite. For example, something like the following:
from dataclasses import dataclass
from dacite import from_dict
@dataclass
class TopLevel:
    type: str
    data: dict
j1 = {
"type": "champion",
"data": {
"Aatrox": {"id": "Aatrox", "key": "266", "name": "Aatrox"},
"Ahri": {"id": "Ahri", "key": "103", "name": "Ahri"},
},
}
champions = from_dict(data_class=TopLevel, data=j1)
# champions.data is a dict of dicts
@dataclass
class Champion:
    id: str
    key: str
    name: str

# transform champions.data into a dict of Champions
for k, v in champions.data.items():
    champions.data[k] = from_dict(data_class=Champion, data=v)
# now, you can do interesting things like the following filter operation
start_with_a = [
champ for champ in champions.data.values() if champ.name.lower().startswith("a")
]
print(start_with_a)
# [Champion(id='Aatrox', key='266', name='Aatrox'), Champion(id='Ahri', key='103', name='Ahri')]
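The first option above (preprocessing j1 so that the champions form a list) could look roughly like the following. This is a sketch using only the standard library: ChampionList and reshaped are names invented here, and the dataclass constructors stand in for dacite, whose from_dict could presumably be given the reshaped dictionary instead.

```python
from dataclasses import dataclass


@dataclass
class Champion:
    id: str
    key: str
    name: str


@dataclass
class ChampionList:
    type: str
    champions: list[Champion]


j1 = {
    "type": "champion",
    "data": {
        "Aatrox": {"id": "Aatrox", "key": "266", "name": "Aatrox"},
        "Ahri": {"id": "Ahri", "key": "103", "name": "Ahri"},
    },
}

# Step 1: reshape. The arbitrary names are dropped as keys (each entry
# already carries its own "id"), leaving a list with a fixed field name.
reshaped = {"type": j1["type"], "champions": list(j1["data"].values())}

# Step 2: build the dataclasses from the reshaped data.
result = ChampionList(
    type=reshaped["type"],
    champions=[Champion(**entry) for entry in reshaped["champions"]],
)

keys = [champ.key for champ in result.champions]
print(keys)
```

A list keeps iteration and filtering simple; lookup by name then needs a search or a separately built index.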
Related
I've been working with FastAPI for some time, it's a great framework.
However, real-life scenarios can be surprising, and sometimes a non-standard approach is necessary. There's one case I'd like to ask your help with.
There's a strange external requirement that a model response should be formatted as stated in example:
Desired behavior:
GET /object/1
{status: 'success', data: {object: {id: '1', category: 'test' …}}}
GET /objects
{status: 'success', data: {objects: [...]}}
Current behavior:
GET /object/1 would respond:
{id: 1,field1:"content",... }
GET /objects/ would send a List of Object e.g.,:
{
[
{id: 1,field1:"content",... },
{id: 1,field1:"content",... },
...
]
}
You can substitute 'object' by any class, it's just for description purposes.
How to write a generic response model that will suit those reqs?
I know I can produce response model that would contain status:str and (depending on class) data structure e.g ticket:Ticket or tickets:List[Ticket].
The point is there's a number of classes so I hope there's a more pythonic way to do it.
Thanks for help.
Generic model with static field name
A generic model is a model where one field (or multiple) are annotated with a type variable. Thus the type of that field is unspecified by default and must be specified explicitly during subclassing and/or initialization. But that field is still just an attribute and an attribute must have a name. A fixed name.
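The "fixed name, variable type" idea can be illustrated without Pydantic. The following is only a sketch with a plain generic dataclass: SingleObject and Ticket are made-up names, and unlike a Pydantic model a dataclass does not validate anything at runtime.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass
class SingleObject(Generic[T]):
    # The attribute name is fixed ("object"); only its type varies.
    object: T


@dataclass
class Ticket:
    id: int


# The type argument is given explicitly when the generic class is used.
wrapped = SingleObject[Ticket](object=Ticket(id=1))
print(wrapped.object.id)
```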
To go from your example, say that is your model:
{
"status": "...",
"data": {
"object": {...} # type variable
}
}
Then we could define that model as generic in terms of the type of its object attribute.
This can be done using Pydantic's GenericModel like this:
from typing import Generic, TypeVar
from pydantic import BaseModel
from pydantic.generics import GenericModel
M = TypeVar("M", bound=BaseModel)
class GenericSingleObject(GenericModel, Generic[M]):
    object: M

class GenericMultipleObjects(GenericModel, Generic[M]):
    objects: list[M]

class BaseGenericResponse(GenericModel):
    status: str

class GenericSingleResponse(BaseGenericResponse, Generic[M]):
    data: GenericSingleObject[M]

class GenericMultipleResponse(BaseGenericResponse, Generic[M]):
    data: GenericMultipleObjects[M]

class Foo(BaseModel):
    a: str
    b: int

class Bar(BaseModel):
    x: float
As you can see, GenericSingleObject reflects the generic type we want for data, whereas GenericSingleResponse is generic in terms of the type parameter M of GenericSingleObject, which is the type of its data attribute.
If we now want to use one of our generic response models, we would need to specify it with a type argument (a concrete model) first, e.g. GenericSingleResponse[Foo].
FastAPI deals with this just fine and can generate the correct OpenAPI documentation. The JSON schema for GenericSingleResponse[Foo] looks like this:
{
"title": "GenericSingleResponse[Foo]",
"type": "object",
"properties": {
"status": {
"title": "Status",
"type": "string"
},
"data": {
"$ref": "#/definitions/GenericSingleObject_Foo_"
}
},
"required": [
"status",
"data"
],
"definitions": {
"Foo": {
"title": "Foo",
"type": "object",
"properties": {
"a": {
"title": "A",
"type": "string"
},
"b": {
"title": "B",
"type": "integer"
}
},
"required": [
"a",
"b"
]
},
"GenericSingleObject_Foo_": {
"title": "GenericSingleObject[Foo]",
"type": "object",
"properties": {
"object": {
"$ref": "#/definitions/Foo"
}
},
"required": [
"object"
]
}
}
}
To demonstrate it with FastAPI:
from fastapi import FastAPI
app = FastAPI()
@app.get("/foo/", response_model=GenericSingleResponse[Foo])
async def get_one_foo() -> dict[str, object]:
    return {"status": "foo", "data": {"object": {"a": "spam", "b": 123}}}
Sending a request to that route returns the following:
{
"status": "foo",
"data": {
"object": {
"a": "spam",
"b": 123
}
}
}
Dynamically created model
If you actually want the attribute name to also be different every time, that is obviously no longer possible with static type annotations. In that case we would have to resort to actually creating the model type dynamically via pydantic.create_model.
In that case there is really no point in genericity anymore because type safety is out of the window anyway, at least for the data model. We still have the option to define a GenericResponse model, which we can specify via our dynamically generated models, but this will make every static type checker mad, since we'll be using variables for types. Still, it might make for otherwise concise code.
We just need to define an algorithm for deriving the model parameters:
from typing import Any, Generic, Optional, TypeVar
from pydantic import BaseModel, create_model
from pydantic.generics import GenericModel
M = TypeVar("M", bound=BaseModel)
def create_data_model(
    model: type[BaseModel],
    plural: bool = False,
    custom_plural_name: Optional[str] = None,
    **kwargs: Any,
) -> type[BaseModel]:
    data_field_name = model.__name__.lower()
    if plural:
        model_name = f"Multiple{model.__name__}"
        if custom_plural_name:
            data_field_name = custom_plural_name
        else:
            data_field_name += "s"
        kwargs[data_field_name] = (list[model], ...)  # type: ignore[valid-type]
    else:
        model_name = f"Single{model.__name__}"
        kwargs[data_field_name] = (model, ...)
    return create_model(model_name, **kwargs)

class GenericResponse(GenericModel, Generic[M]):
    status: str
    data: M
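To see which names that helper derives, the naming rule can be isolated as a plain function. This is a sketch only: derive_names is a hypothetical helper that mirrors the branching in create_data_model but takes a class name as a string and creates no model.

```python
from typing import Optional


def derive_names(
    class_name: str,
    plural: bool = False,
    custom_plural_name: Optional[str] = None,
) -> tuple[str, str]:
    """Return (model_name, data_field_name) following the rule above."""
    data_field_name = class_name.lower()
    if plural:
        model_name = f"Multiple{class_name}"
        # A custom plural wins; otherwise a plain "s" is appended.
        data_field_name = custom_plural_name or data_field_name + "s"
    else:
        model_name = f"Single{class_name}"
    return model_name, data_field_name


print(derive_names("Foo"))
print(derive_names("Bar", plural=True))
print(derive_names("Category", plural=True, custom_plural_name="categories"))
```

The custom_plural_name parameter exists for names such as Category, where appending "s" would give the wrong field name.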
Using the same Foo and Bar examples as before:
class Foo(BaseModel):
    a: str
    b: int

class Bar(BaseModel):
    x: float
SingleFoo = create_data_model(Foo)
MultipleBar = create_data_model(Bar, plural=True)
This also works as expected with FastAPI including the automatically generated schemas/documentations:
from fastapi import FastAPI
app = FastAPI()
@app.get("/foo/", response_model=GenericResponse[SingleFoo])  # type: ignore[valid-type]
async def get_one_foo() -> dict[str, object]:
    return {"status": "foo", "data": {"foo": {"a": "spam", "b": 123}}}

@app.get("/bars/", response_model=GenericResponse[MultipleBar])  # type: ignore[valid-type]
async def get_multiple_bars() -> dict[str, object]:
    return {"status": "bars", "data": {"bars": [{"x": 3.14}, {"x": 0}]}}
Output is essentially the same as with the first approach.
You'll have to see which one works better for you. I find the second option very strange because of the dynamic key/field name, but maybe that is what you need for some reason.
In python 3, how can I deserialize an object structure from json?
Example json:
{ 'name': 'foo',
'some_object': { 'field1': 'bar', 'field2' : '0' },
'some_list_of_objects': [
{ 'field1': 'bar1', 'field2' : '1' },
{ 'field1': 'bar2', 'field2' : '2' },
{ 'field1': 'bar3', 'field2' : '3' },
]
}
Here's my python code:
import json
class A:
    name: str
    some_object: B
    some_list_of_objects: list(C)

    def __init__(self, file_name):
        with open(file_name, "r") as json_file:
            self.__dict__ = json.load(json_file)

class B:
    field1: int
    field2: str

class C:
    field1: int
    field2: str
How to force some_object to be of type B and some_list_of_objects to be of type list of C?
As you're using Python 3, I would suggest using dataclasses to model your classes. This should improve your overall code quality and also eliminate the need to explicitly declare an __init__ constructor method for your class, for example.
If you're on board with using a third-party library, I'd suggest looking into an efficient JSON serialization library like the dataclass-wizard that performs implicit type conversion - for example, string to annotated int as below. Note that I'm using StringIO here, which is a file-like object containing a JSON string to de-serialize into a nested class model.
Note: the following approach should work in Python 3.7+.
from __future__ import annotations
from dataclasses import dataclass
from io import StringIO
from dataclass_wizard import JSONWizard
json_data = StringIO("""
{ "name": "foo",
"some_object": { "field1": "bar", "field2" : "0" },
"some_list_of_objects": [
{ "field1": "bar1", "field2" : "1" },
{ "field1": "bar2", "field2" : "2" },
{ "field1": "bar3", "field2" : "3" }
]
}
""")
@dataclass
class A(JSONWizard):
    name: str
    some_object: B
    some_list_of_objects: list[C]

@dataclass
class B:
    field1: str
    field2: int

@dataclass
class C:
    field1: str
    field2: int
a = A.from_json(json_data.read())
print(f'{a!r}') # alternatively: print(repr(a))
Output
A(name='foo', some_object=B(field1='bar', field2=0), some_list_of_objects=[C(field1='bar1', field2=1), C(field1='bar2', field2=2), C(field1='bar3', field2=3)])
Loading from a JSON file
As per the suggestions in this post, I would discourage overriding the constructor method to pass the name of a JSON file to load the data from. Instead, I would suggest creating a helper class method as below, that can be invoked like A.from_json_file('file.json') if desired.
# goes inside class A; requires `import json` at the top of the module
@classmethod
def from_json_file(cls, file_name: str):
    """Deserialize json file contents into an A object."""
    with open(file_name, 'r') as json_file:
        return cls.from_dict(json.load(json_file))
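For readers who want to try the class-method pattern without installing anything, here is a self-contained sketch using only the standard library. It is an assumption-laden stand-in: from_dict is written by hand here (with dataclass-wizard it is provided by JSONWizard), the model is trimmed to two fields, and a temporary file plays the role of file.json.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class B:
    field1: str
    field2: int


@dataclass
class A:
    name: str
    some_object: B

    @classmethod
    def from_dict(cls, raw: dict) -> "A":
        # Hand-written conversion, including the str -> int coercion.
        obj = raw["some_object"]
        return cls(name=raw["name"], some_object=B(obj["field1"], int(obj["field2"])))

    @classmethod
    def from_json_file(cls, file_name: str) -> "A":
        """Alternative constructor: the regular __init__ stays untouched."""
        with open(file_name, "r") as json_file:
            return cls.from_dict(json.load(json_file))


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "a.json"
    path.write_text(json.dumps({"name": "foo", "some_object": {"field1": "bar", "field2": "0"}}))
    a = A.from_json_file(str(path))

print(a)
```

Keeping file loading in a class method means A can still be constructed directly from values, for example in tests.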
Suggestions
Note that variable annotations (or annotations in general) are subscripted using square brackets [] rather than parentheses as appears in the original version above.
some_list_of_objects: list(C)
In the above solution, I've instead changed that to:
some_list_of_objects: list[C]
This works because subscripting the standard collections was introduced in PEP 585 (Python 3.9). However, the from __future__ import annotations statement, available since Python 3.7, effectively converts all annotations to forward-declared string values, so new-style annotations that would normally only work in Python 3.9 or later can also be used on Python 3.7+.
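What "forward-declared string values" means can be seen by writing the string by hand, which is roughly what the __future__ import does for every annotation automatically. In this sketch the annotation is quoted explicitly, so it is stored as text and never evaluated when class A is defined, even though C does not exist yet at that point.

```python
from dataclasses import dataclass


@dataclass
class A:
    # Stored as the string "list[C]"; C is only defined further down.
    some_list_of_objects: "list[C]"


@dataclass
class C:
    field1: str


print(A.__annotations__["some_list_of_objects"])

a = A(some_list_of_objects=[C("bar1")])
print(a)
```

A library that wants the real types has to resolve these strings itself, which is why this only helps on older Python versions when the library supports it.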
One other change I made, was in regards to swapping out the order of declared class annotations. For example, note the below:
class B:
    field1: int
    field2: str
However, note the corresponding field in the JSON data, that would be deserialized to a B object:
'some_object': { 'field1': 'bar', 'field2' : '0' },
In the above implementation, I've swapped out the field annotations in such cases, so class B for instance is declared as:
class B:
    field1: str
    field2: int
I would like pydantic to choose the model to use for parsing the input dependent on the input value. Is this possible?
MVCE
I have a pydantic model which looks similar to this one:
from typing import List, Literal
from pydantic import BaseModel
class Animal(BaseModel):
    name: str
    type: Literal["mamal", "bird"]

class Bird(Animal):
    max_eggs: int

class Mamal(Animal):
    max_offspring: int

class Config(BaseModel):
    animals: List[Animal]
cfg = Config.parse_obj(
{
"animals": [
{"name": "eagle", "type": "bird", "max_eggs": 3},
{"name": "Human", "type": "mamal", "max_offspring": 3},
]
}
)
print(cfg.json(indent=4))
gives
{
"animals": [
{
"name": "eagle",
"type": "bird"
<-- missing max_eggs, as "Animal" was used instead of Bird
},
{
"name": "Human",
"type": "mamal"
<-- missing max_offspring, as "Animal" was used instead of Mamal
}
]
}
I know that I could set Config.extra="allow" in Animal, but that is not what I want. I would like pydantic to see that a dictionary with 'type': 'mamal' should use the Mamal model to parse.
Is this possible?
You could add a concrete literal to every child class to differentiate them, and put the classes in a Union ordered from more specific to less specific. Like so:
from typing import List, Literal, Union
from pydantic import BaseModel

class Animal(BaseModel):
    name: str
    type: str

class Bird(Animal):
    type: Literal["bird"]
    max_eggs: int

class Mamal(Animal):
    type: Literal["mamal"]
    max_offspring: int

class Config(BaseModel):
    animals: List[Union[Bird, Mamal, Animal]]  # From more specific to less
Context
I'm trying to validate/parse some data with pydantic.
I want to specify that the dict can have a key daytime, or not.
If it does, I want the value of daytime to include both sunrise and sunset.
e.g. These should be allowed:
{
'type': 'solar',
'daytime': {
'sunrise': 4, # 4am
'sunset': 18 # 6pm
}
}
And
{
'type': 'wind'
# daytime key is omitted
}
And
{
'type': 'wind',
'daytime': None
}
But I want to fail validation for
{
'type': 'solar',
'daytime': {
'sunrise': 4
}
}
Because this has a daytime value, but no sunset value.
MWE
I've got some code that does this.
If I run this script, it executes successfully.
from pydantic import BaseModel, ValidationError
from typing import List, Optional, Dict
class DayTime(BaseModel):
    sunrise: int
    sunset: int

class Plant(BaseModel):
    daytime: Optional[DayTime] = None
    type: str
p = Plant.parse_obj({'type': 'wind'})
p = Plant.parse_obj({'type': 'wind', 'daytime': None})
p = Plant.parse_obj({
'type': 'solar',
'daytime': {
'sunrise': 5,
'sunset': 18
}})
try:
    p = Plant.parse_obj({
        'type': 'solar',
        'daytime': {
            'sunrise': 5
        }})
except ValidationError:
    pass
else:
    raise AssertionError("Should have failed")
Question
What I'm wondering is,
is this how you're supposed to use pydantic for nested data?
I have lots of layers of nesting, and this seems a bit verbose.
Is there any way to do something more concise, like:
class Plant(BaseModel):
    daytime: Optional[Dict[('sunrise', 'sunset'), int]] = None
    type: str
Pydantic create_model function is what you need:
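from typing import Optional
from pydantic import BaseModel, create_model

class Plant(BaseModel):
    # the nested model is created inline instead of being declared as a separate class
    daytime: Optional[create_model('DayTime', sunrise=(int, ...), sunset=(int, ...))] = None
    type: str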
I have a dictionary with config info:
my_conf = {
'version': 1,
'info': {
'conf_one': 2.5,
'conf_two': 'foo',
'conf_three': False,
'optional_conf': 'bar'
}
}
I want to check if the dictionary follows the structure I need.
I'm looking for something like this:
conf_structure = {
'version': int,
'info': {
'conf_one': float,
'conf_two': str,
'conf_three': bool
}
}
is_ok = check_structure(conf_structure, my_conf)
Is there an existing solution to this problem, or a library that would make implementing check_structure easier?
You may use schema (available on PyPI).
schema is a library for validating Python data structures, such as those obtained from config-files, forms, external services or command-line parsing, converted from JSON/YAML (or something else) to Python data-types.
from schema import Schema, And, Use, Optional, SchemaError
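def check(conf_schema, conf):
    try:
        conf_schema.validate(conf)
        return True
    except SchemaError:
        return False

# Note: Use(int) converts the value with int() and only fails if that conversion
# raises; for a strict isinstance check, a bare type such as int can be used instead.
conf_schema = Schema({
    'version': And(Use(int)),
    'info': {
        'conf_one': And(Use(float)),
        'conf_two': And(Use(str)),
        'conf_three': And(Use(bool)),
        Optional('optional_conf'): And(Use(str))
    }
})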
conf = {
'version': 1,
'info': {
'conf_one': 2.5,
'conf_two': 'foo',
'conf_three': False,
'optional_conf': 'bar'
}
}
print(check(conf_schema, conf))
Without using libraries, you could also define a simple recursive function like this:
def check_structure(struct, conf):
    if isinstance(struct, dict) and isinstance(conf, dict):
        # struct is a dict of types or other dicts
        return all(k in conf and check_structure(struct[k], conf[k]) for k in struct)
    if isinstance(struct, list) and isinstance(conf, list):
        # struct is list in the form [type or dict]
        return all(check_structure(struct[0], c) for c in conf)
    elif isinstance(struct, type):
        # struct is the type of conf
        return isinstance(conf, struct)
    else:
        # struct is neither a dict, nor list, nor type
        return False
This assumes that the config can have keys that are not in your structure, as in your example.
Update: New version also supports lists, e.g. like 'foo': [{'bar': int}]
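A short usage sketch of the recursive check follows. The function is repeated so the snippet runs on its own, with the type branch written as isinstance(conf, struct), i.e. the value is tested against the type given in the structure. The sample data comes from the question, plus the list form mentioned in the update.

```python
def check_structure(struct, conf):
    if isinstance(struct, dict) and isinstance(conf, dict):
        # every key of the structure must exist in the config and match recursively
        return all(k in conf and check_structure(struct[k], conf[k]) for k in struct)
    if isinstance(struct, list) and isinstance(conf, list):
        # the single element of struct describes every element of conf
        return all(check_structure(struct[0], c) for c in conf)
    if isinstance(struct, type):
        # leaf: struct is the expected type of the value
        return isinstance(conf, struct)
    return False


conf_structure = {
    'version': int,
    'info': {'conf_one': float, 'conf_two': str, 'conf_three': bool},
}
my_conf = {
    'version': 1,
    'info': {'conf_one': 2.5, 'conf_two': 'foo', 'conf_three': False, 'optional_conf': 'bar'},
}

print(check_structure(conf_structure, my_conf))                      # extra key is tolerated
print(check_structure({'foo': [{'bar': int}]}, {'foo': [{'bar': 1}, {'bar': 2}]}))
print(check_structure(conf_structure, {'version': '1', 'info': {}}))  # wrong type, missing keys
```

Note that isinstance treats bool as a subclass of int, so True would pass an int check.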
Advice for the future: use Pydantic!
Pydantic enforces type hints at runtime, and provides user friendly errors when data is invalid. Define how data should be in pure, canonical python; validate it with pydantic, as simple as that:
from pydantic import BaseModel
class Info(BaseModel):
    conf_one: float
    conf_two: str
    conf_three: bool

    class Config:
        extra = 'forbid'

class ConfStructure(BaseModel):
    version: int
    info: Info
If validation fails pydantic will raise an error with a breakdown of what was wrong:
my_conf_wrong = {
'version': 1,
'info': {
'conf_one': 2.5,
'conf_two': 'foo',
'conf_three': False,
'optional_conf': 'bar'
}
}
my_conf_right = {
'version': 10,
'info': {
'conf_one': 14.5,
'conf_two': 'something',
'conf_three': False
}
}
model = ConfStructure(**my_conf_right)
print(model.dict())
# {'version': 10, 'info': {'conf_one': 14.5, 'conf_two': 'something', 'conf_three': False}}
res = ConfStructure(**my_conf_wrong)
# pydantic.error_wrappers.ValidationError: 1 validation error for ConfStructure
# info -> optional_conf
# extra fields not permitted (type=value_error.extra)
You can build structure using recursion:
def get_type(value):
    if isinstance(value, dict):
        return {key: get_type(value[key]) for key in value}
    else:
        return str(type(value))
And then compare required structure with your dictionary:
get_type(current_conf) == get_type(required_conf)
Example:
required_conf = {
'version': 1,
'info': {
'conf_one': 2.5,
'conf_two': 'foo',
'conf_three': False,
'optional_conf': 'bar'
}
}
get_type(required_conf)
{'info': {'conf_two': "<type 'str'>", 'conf_one': "<type 'float'>", 'optional_conf': "<type 'str'>", 'conf_three': "<type 'bool'>"}, 'version': "<type 'int'>"}
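A small sketch of the comparison in use, with made-up sample configs. On Python 3 the type strings read "<class 'str'>" rather than "<type 'str'>". The second comparison shows a consequence of comparing whole structures: a config that merely omits the optional key no longer matches.

```python
def get_type(value):
    if isinstance(value, dict):
        return {key: get_type(value[key]) for key in value}
    return str(type(value))


required_conf = {
    'version': 1,
    'info': {'conf_one': 2.5, 'conf_two': 'foo', 'conf_three': False, 'optional_conf': 'bar'},
}

# Same keys and value types, different values.
same_shape = {
    'version': 7,
    'info': {'conf_one': 0.1, 'conf_two': 'x', 'conf_three': True, 'optional_conf': 'y'},
}

# Identical except that the optional key is left out.
without_optional = {
    'version': 7,
    'info': {'conf_one': 0.1, 'conf_two': 'x', 'conf_three': True},
}

print(get_type(same_shape) == get_type(required_conf))
print(get_type(without_optional) == get_type(required_conf))
```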
Looks like the dict-schema-validator package does exactly what you need:
Here is a simple schema representing a Customer:
{
"_id": "ObjectId",
"created": "date",
"is_active": "bool",
"fullname": "string",
"age": ["int", "null"],
"contact": {
"phone": "string",
"email": "string"
},
"cards": [{
"type": "string",
"expires": "date"
}]
}
Validation:
from datetime import datetime
import json
from dict_schema_validator import validator
with open('models/customer.json', 'r') as j:
    schema = json.loads(j.read())

customer = {
    "_id": 123,
    "created": datetime.now(),
    "is_active": True,
    "fullname": "Jorge York",
    "age": 32,
    "contact": {
        "phone": "559-940-1435",
        "email": "york@example.com",
        "skype": "j.york123"
    },
    "cards": [
        {"type": "visa", "expires": "12/2029"},
        {"type": "visa"},
    ]
}
errors = validator.validate(schema, customer)
for err in errors:
print(err['msg'])
Output:
[*] "_id" has wrong type. Expected: "ObjectId", found: "int"
[+] Extra field: "contact.skype" having type: "str"
[*] "cards[0].expires" has wrong type. Expected: "date", found: "str"
[-] Missing field: "cards[1].expires"
You can also use dataclasses_json library. Here is how I would normally do it
from dataclasses import dataclass
from dataclasses_json import dataclass_json, Undefined
from dataclasses_json.undefined import UndefinedParameterError
from typing import Optional
#### define schema #######
@dataclass_json(undefined=Undefined.RAISE)
@dataclass
class Info:
    conf_one: float
    # conf_two: str
    conf_three: bool
    optional_conf: Optional[str]

@dataclass_json
@dataclass
class ConfStructure:
    version: int
    info: Info

####### test for compliance####
# my_conf is the dictionary from the question
try:
    ConfStructure.from_dict(my_conf).to_dict()
except KeyError as e:
    print("there's a missing parameter")
except UndefinedParameterError as e:
    print('extra parameters')
You can use dictify from https://pypi.org/project/dictify/.
Read docs here https://dictify.readthedocs.io/en/latest/index.html
This is how it can be done.
from dictify import Field, Model
class Info(Model):
    conf_one = Field(required=True).instance(float)
    conf_two = Field(required=True).instance(str)
    conf_three = Field(required=True).instance(bool)
    optional_conf = Field().instance(str)

class MyConf(Model):
    version = Field(required=True).instance(int)
    info = Field().model(Info)
my_conf = MyConf() # Invalid without required fields
# Valid
my_conf = MyConf({
'version': 1,
'info': {
'conf_one': 2.5,
'conf_two': 'foo',
'conf_three': False,
'optional_conf': 'bar'
}
})
my_conf['info']['conf_one'] = 'hi' # Invalid, won't be assigned
There is a standard for validating JSON files called JSON Schema.
Validators have been implemented in many languages, including Python. See the documentation for more details. In the following example I will use the Python package jsonschema, which I am familiar with.
Given the config data
my_conf = {
'version': 1,
'info': {
'conf_one': 2.5,
'conf_two': 'foo',
'conf_three': False,
'optional_conf': 'bar',
},
}
and the corresponding config schema
conf_structure = {
'type': 'object',
'properties': {
'version': {'type': 'integer'},
'info': {
'type': 'object',
'properties': {
'conf_one': {'type': 'number'},
'conf_two': {'type': 'string'},
'conf_three': {'type': 'boolean'},
'optional_conf': {'type': 'string'},
},
'required': ['conf_one', 'conf_two', 'conf_three'],
},
},
}
the actual code to validate this data is then as simple as this:
import jsonschema
jsonschema.validate(my_conf, schema=conf_structure)
A big advantage of this approach is that you can store both data and schema as JSON-formatted files.
@tobias_k beat me to it (both in time and quality probably) but here is another recursive function for the task that might be a bit easier for you (and me) to follow:
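def check_dict(my_dict, check_against):
    for k, v in check_against.items():
        if k not in my_dict:
            return False
        if isinstance(v, dict):
            # recurse, but keep checking the remaining keys afterwards
            if not check_dict(my_dict[k], v):
                return False
        elif not isinstance(my_dict[k], v):
            return False
    return True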
The nature of dictionaries, if they are being used in python and not exported as some JSON, is that the order of the dictionary need not be set. Instead, looking up keys returns values (hence a dictionary).
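The point about ordering can be checked directly. In this small sketch (variable names are illustrative), two dictionaries with the same items in a different insertion order compare equal, and so do their keys() views; only a list of the keys is order-sensitive.

```python
a = {"x": 1, "y": 2}
b = {"y": 2, "x": 1}

dicts_equal = a == b                  # dict equality ignores insertion order
views_equal = a.keys() == b.keys()    # keys() views compare like sets
lists_equal = list(a) == list(b)      # lists compare position by position

print(dicts_equal, views_equal, lists_equal)
```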
In either case, these functions should provide you with what you're looking for, given the level of nesting present in the samples you provided.
# assuming identical order of keys is required
def check_structure(conf_structure, my_conf):
    # list() makes the comparison order-sensitive; keys() views compare like sets
    if list(my_conf) != list(conf_structure):
        return False
    for key in my_conf:
        if isinstance(my_conf[key], dict):
            if not isinstance(conf_structure[key], dict):
                return False
            if list(my_conf[key]) != list(conf_structure[key]):
                return False
    return True

# assuming identical order of keys is not required
def check_structure(conf_structure, my_conf):
    if sorted(my_conf.keys()) != sorted(conf_structure.keys()):
        return False
    for key in my_conf.keys():
        # only nested dictionaries need a second comparison; plain values are fine
        if isinstance(my_conf[key], dict):
            if not isinstance(conf_structure[key], dict):
                return False
            if sorted(my_conf[key].keys()) != sorted(conf_structure[key].keys()):
                return False
    return True
This solution would obviously need to be changed if the level of nesting were greater: it compares dictionaries whose values may themselves be dictionaries, but it does not descend into dictionaries nested inside those inner dictionaries.
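One way to lift that restriction is to recurse. The following is a minimal sketch rather than part of the answer above: the name same_key_structure is made up, and it compares key sets only (order-insensitive, with the types of leaf values ignored).

```python
def same_key_structure(expected, actual):
    """Return True if both arguments have the same keys at every nesting level."""
    expected_is_dict = isinstance(expected, dict)
    if expected_is_dict != isinstance(actual, dict):
        return False  # a dict on one side, a plain value on the other
    if not expected_is_dict:
        return True   # two leaves: nothing left to compare
    if expected.keys() != actual.keys():
        return False
    return all(same_key_structure(expected[k], actual[k]) for k in expected)


template = {"version": int, "info": {"conf_one": float, "extra": {"deep": {"flag": bool}}}}
good = {"version": 1, "info": {"conf_one": 2.5, "extra": {"deep": {"flag": True}}}}
bad = {"version": 1, "info": {"conf_one": 2.5, "extra": {"deep": {"wrong": True}}}}

print(same_key_structure(template, good))
print(same_key_structure(template, bad))
```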