When using (hashable) objects as dictionary keys, calling .json() fails because while the values are encoded, the keys aren't:
from pydantic import BaseModel
from typing import Dict
from datetime import datetime

class Foo(BaseModel):
    date: datetime
    sdict: Dict[datetime, str]

    class Config:
        json_encoders = {
            datetime: repr
        }

foo = Foo(date=datetime.now(), sdict={datetime.now(): 'now'})
foo
# Foo(date=datetime.datetime(2021, 9, 3, 12, 9, 55, 36105), sdict={datetime.datetime(2021, 9, 3, 12, 9, 55, 36114): 'now'})
foo.json()
# TypeError: keys must be a string
# to prove the other way around works:
class Foo(BaseModel):
    date: datetime
    sdict: Dict[str, datetime]

    class Config:
        json_encoders = {
            datetime: repr
        }

foo = Foo(date=datetime.now(), sdict={'now': datetime.now()})
foo.json()
# '{"date": "datetime.datetime(2021, 9, 3, 12, 13, 30, 606880)", "sdict": {"now": "datetime.datetime(2021, 9, 3, 12, 13, 30, 606884)"}}'
This is because the default= parameter of json.dumps(), which is ultimately used to dump the model, is never applied to dictionary keys. Defining a full JSON encoder class does work, but that approach doesn't suit me for other reasons.
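A minimal demonstration of that asymmetry, using only the standard library:

import json
from datetime import datetime

# default= is consulted for unsupported *values* only, never for dict keys
json.dumps({"when": datetime.now()}, default=str)   # works
json.dumps({datetime.now(): "now"}, default=str)    # TypeError: keys must be a string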
I've seen TypedDict in pydantic, but it doesn't seem to fix the issue. Actually, I'm unsure what TypedDict is for, since as far as I can see you need to define every key in the dict, which makes it analogous to a static object?
My use-case is that I need to represent the following idea:
{
    "report": {
        "warehouses": {
            warehouse.id: {
                "name": warehouse.name,
                "address": warehouse.address,
            }
            for warehouse in warehouses
        }
    }
}
and warehouse.id is an Identifier object which can convert to different formats on demand, and which the json encoder will convert to a string.
Does anyone know of a way, other than a dictionary, to add arbitrary keys to an object such that they are affected by the JSON encoder, or some other way of serializing this?
One way of solving the problem is to use a custom json_dumps function for the pydantic model and do the custom serialization inside it; I did this by inheriting from JSONEncoder.
For example, like this:
import json
from pydantic import BaseModel
from typing import Dict
from datetime import datetime

class CustomEncoder(json.JSONEncoder):
    def _transform(self, v):
        res = v
        if isinstance(v, datetime):
            res = v.isoformat()
        # else: other variants
        return self._encode(res)

    def _encode(self, obj):
        if isinstance(obj, dict):
            return {self._transform(k): self._transform(v) for k, v in obj.items()}
        else:
            return obj

    def encode(self, obj):
        return super(CustomEncoder, self).encode(self._encode(obj))

def custom_dumps(values, *, default):
    return CustomEncoder().encode(values)

class Foo(BaseModel):
    date: datetime
    sdict: Dict[datetime, str]

    class Config:
        json_dumps = custom_dumps

foo = Foo(date=datetime.now(), sdict={datetime.now(): 'now'})
foo
# Foo(date=datetime.datetime(2021, 9, 3, 12, 9, 55, 36105), sdict={datetime.datetime(2021, 9, 3, 12, 9, 55, 36114): 'now'})
print(foo.json())
# {"date": "2021-09-07T16:02:51.070159", "sdict": {"2021-09-07T16:02:51.070164": "now"}}
I have a dataclass called Config that is created from the properties and values of a dictionary. Since this dictionary can contain nested dictionaries, I would like those nested dictionaries to become Config objects as well. Here is an example:
## Dummy example of a config dict
data = {
    'a': 1,
    'b': [2, 2, 2],
    'c': {
        'c_1': 3.1
    }
}
final_config = create_config(data)
# Expected result
# Config(a=1, b=[2, 2, 2], c=Config(c_1=3.1))
Here is what I've come up with, using dataclasses.make_dataclass:
def _Config(params_dict):
    config = make_dataclass('Config', params_dict.keys())
    return config(**params_dict)

def get_inner_dict(d):
    for _, v in d.items():
        if isinstance(v, dict):
            return get_inner_dict(v)
        else:
            return _Config(**d)
Unfortunately, this doesn't work, because the recursion will try to create a dataclass object when it finds a single value. I feel like I'm on the right track, but I couldn't figure out what needs to change.
It looks like you (technically) don't need to use dataclasses or make_dataclass in this scenario.
You can implement a custom class with a __dict__ update approach as mentioned by @Stef. Check out the following example:
from __future__ import annotations
## Dummy example of a config dict
data = {
    'a': 1,
    'b': [2, 2, 2],
    'c': {
        'c_1': 3.1
    },
    'd': [
        1,
        '2',
        {'k1': 'v1'}
    ]
}
_CONTAINER_TYPES = (dict, list)

class Config:
    def __init__(self, **kwargs):
        self.__dict__ = kwargs

    @classmethod
    def create(cls, data: dict | list) -> Config | list:
        if isinstance(data, list):
            return [cls.create(e) if isinstance(e, _CONTAINER_TYPES) else e
                    for e in data]
        new_data = {
            k: cls.create(v) if isinstance(v, _CONTAINER_TYPES) else v
            for k, v in data.items()
        }
        return cls(**new_data)

    def __repr__(self):
        return f"Config({', '.join([f'{name}={val!r}' for name, val in self.__dict__.items()])})"
final_config = Config.create(data)
print(final_config)
# Prints:
# Config(a=1, b=[2, 2, 2], c=Config(c_1=3.1), d=[1, '2', Config(k1='v1')])
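If you do want actual dataclasses, a recursive sketch along the same lines using make_dataclass is possible (create_config here is a hypothetical helper; note it does not descend into lists, unlike Config.create above):

from dataclasses import make_dataclass

def create_config(d: dict):
    # Recurse first so nested dicts become Config instances too
    fields = {k: create_config(v) if isinstance(v, dict) else v
              for k, v in d.items()}
    cls = make_dataclass('Config', fields.keys())
    return cls(**fields)

print(create_config(data))
# Config(a=1, b=[2, 2, 2], c=Config(c_1=3.1), d=[1, '2', {'k1': 'v1'}])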
I created a class of functions that provision some cloud infrastructure.
response = self.ecs_client.register_task_definition(
    containerDefinitions=[
        {
            "name": "redis-283C462837EF23AA",
            "image": "redis:3.2.7",
            "cpu": 1,
            "memory": 512,
            "essential": True,
        },
        ...
This is a very long JSON document; I show just the beginning.
Then I refactored the code to use parameters instead of the hard-coded hash, memory, and CPU values.
response = self.ecs_client.register_task_definition(
    containerDefinitions=[
        {
            "name": f"redis-{git_hash}",
            "image": "redis:3.2.7",
            "cpu": {num_cpu},
            "memory": {memory_size},
            "essential": True,
        },
        ...
I read the values of git_hash, num_cpu and memory_size from a config file prior to this code.
Now, I also want to read the entire JSON from a file.
The problem is that if I save {num_cpu} etc. in a file, the string interpolation won't work.
How can I extract the json from my logic and still use string interpolation or variables?
You can use Template from the string module.
{
    "name": "redis-${git_hash}",
    "image": "redis:3.2.7",
    "cpu": ${num_cpu},
    "memory": ${memory_size},
    "essential": true
}
from string import Template
import json

if __name__ == '__main__':
    data = dict(
        num_cpu=1,
        memory_size=1,
        git_hash=1
    )
    with open('test.json', 'r') as json_file:
        content = ''.join(json_file.readlines())
    template = Template(content)
    configuration = json.loads(template.substitute(data))
    print(configuration)
    # {'name': 'redis-1', 'image': 'redis:3.2.7', 'cpu': 1, 'memory': 1, 'essential': True}
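One caveat worth knowing: Template.substitute raises KeyError if any placeholder is missing from the mapping, while safe_substitute leaves unresolved placeholders untouched, which can be handy if you fill the template in stages:

# safe_substitute leaves any ${...} it cannot resolve intact instead of raising
partial = template.safe_substitute(git_hash='abc123')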
Opinion: I think the overall approach is wrong. There is a reason why this method is not as popular as others. You can separate your configuration into two files (1) a static list of options and (2) your compact changeable configuration, and compose them in your code.
EDIT: You can create an object that reads the configuration from a static or changeable JSON file (FileConfig), and then compose instances using another object, something like ComposedConfig.
This will allow you to extend the behaviour, and add, for example, a run-time configuration in the mix. This way the configuration from your JSON file no longer depends on the run-time params, and you can separate what is changeable from what is static in your system.
PS: The get method is just an example for explaining the composed behaviour; you can use other methods/designs.
import json
from abc import ABC, abstractmethod

class Configuration(ABC):
    @abstractmethod
    def get(self, key: str, default: str) -> str:
        pass

class FileConfig(Configuration):
    def __init__(self, file_path):
        self.__content = {}
        with open(file_path, 'r') as json_file:
            self.__content = json.load(json_file)

    def get(self, key: str, default: str) -> str:
        return self.__content.get(key, default)

class RunTimeConfig(Configuration):
    def __init__(self, option: str):
        self.__content = {'option': option}

    def get(self, key: str, default: str) -> str:
        return self.__content.get(key, default)

class ComposedConfig:
    def __init__(self, first: Configuration, second: Configuration):
        self.__first = first
        self.__second = second

    def get(self, key: str, default: str) -> str:
        return self.__first.get(key, self.__second.get(key, default))

if __name__ == '__main__':
    static = FileConfig("static.json")
    changeable = FileConfig("changeable.json")
    runTime = RunTimeConfig(option="a")
    config = ComposedConfig(static, changeable)
    alternative = ComposedConfig(static, runTime)
    print(config.get("image", "test"))        # redis:3.2.7
    print(alternative.get("option", "test"))  # a
I wish to perform static type checking (pylance in vscode) on some dictionaries. The "tricky" part is that I want some of the parameters to be optional and not show up at all in the dictionary. I've tried using dataclasses and TypedDict, but without luck so far.
from typing import Optional, List
from dataclasses import dataclass, asdict

@dataclass
class SubOrder:
    name: str

@dataclass
class Order:
    name: str
    sub_orders: Optional[List[SubOrder]]

assert asdict(Order(name="Pizza")) == {"name": "Pizza"}
assert asdict(Order(name="Pizza", sub_orders=[SubOrder(name="Pasta")])) == {
    "name": "Pizza",
    "sub_orders": [{"name": "Pasta"}],
}
Is that achievable with dataclasses? I basically just want my static type checker (pylance / pyright) to check my dictionaries, which is why I'm using dataclasses. I've tried with TypedDict as well, but the type checkers do not seem to behave the way I want: they always require me to set sub_orders. The following code passes at runtime, but pylance is not happy about the missing sub_orders.
from typing import Optional, List, TypedDict

class SubOrder(TypedDict):
    name: str

class Order(TypedDict):
    name: str
    sub_orders: Optional[List[SubOrder]]

assert Order(name="Pizza") == {"name": "Pizza"}
assert Order(name="Pizza", sub_orders=[SubOrder(name="Pasta")]) == {
    "name": "Pizza",
    "sub_orders": [{"name": "Pasta"}],
}
EDIT
I've added a bug report in pylance, since this might actually be a bug in pylance / pyright.
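For reference, PEP 655 addresses exactly this: a TypedDict key can be marked as potentially absent with NotRequired (available in typing from Python 3.11, or from typing_extensions before that). A sketch of how that would look here:

from typing import List, TypedDict
from typing_extensions import NotRequired  # typing.NotRequired on Python 3.11+

class SubOrder(TypedDict):
    name: str

class Order(TypedDict):
    name: str
    # NotRequired means the key may be missing entirely, which is what
    # Optional[...] alone does not express (Optional only allows None).
    sub_orders: NotRequired[List[SubOrder]]

order: Order = {"name": "Pizza"}  # accepted without sub_orders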
from dataclasses import asdict, dataclass
from typing import List, Optional

from validated_dc import ValidatedDC

@dataclass
class SubOrder(ValidatedDC):
    name: str

@dataclass
class Order(ValidatedDC):
    name: str
    sub_orders: Optional[List[SubOrder]] = None

    def as_dict(self):
        data = asdict(self)
        return {key: value for key, value in data.items() if value is not None}

data = {'name': 'pizza'}
order = Order(**data)
assert order.get_errors() is None
assert asdict(order) == {'name': 'pizza', 'sub_orders': None}
assert order.as_dict() == {'name': 'pizza'}

data = {'name': 'pizza', 'sub_orders': [{'name': 'pasta'}]}
order = Order(**data)
assert order.get_errors() is None
assert asdict(order) == {'name': 'pizza', 'sub_orders': [{'name': 'pasta'}]}
assert isinstance(order.sub_orders[0], SubOrder)
ValidatedDC - https://github.com/EvgeniyBurdin/validated_dc
You can make a parameter optional in a dataclass by giving it a default value, e.g. an empty string:
from dataclasses import dataclass

@dataclass
class SubOrder:
    name: str = ""
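For the Order case specifically, a None default is probably the more common idiom; a minimal sketch:

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Order:
    name: str
    # A default makes the field optional at the call site, so
    # Order(name="Pizza") type-checks without sub_orders.
    sub_orders: Optional[List["SubOrder"]] = None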
I have a model which is like so:
class CPUReading(models.Model):
    host = models.CharField(max_length=256)
    reading = models.IntegerField()
    created = models.DateTimeField(auto_now_add=True)
I am trying to get a result which looks like the following:
{
    "host 1": [
        {
            "created": DateTimeField(...),
            "value": 20
        },
        {
            "created": DateTimeField(...),
            "value": 40
        },
        ...
    ],
    "host 2": [
        {
            "created": DateTimeField(...),
            "value": 19
        },
        {
            "created": DateTimeField(...),
            "value": 10
        },
        ...
    ]
}
I need it grouped by host and ordered by created.
I have tried a bunch of stuff, including using values() and annotate() to create a GROUP BY statement, but I think I must be missing something: it seems that to use GROUP BY I need some aggregation function, which I don't really want. I need the actual values of the reading field grouped by the host field and ordered by the created field.
This is more-or-less how any charting library needs the data.
I know I can make it happen with either python code or with raw sql queries, but I'd much prefer to use the django ORM, unless it explicitly disallows this sort of query.
As far as I'm aware, there's nothing in the ORM that makes this easy. If you want to do it in the ORM without raw queries, and if you're willing and able to change your data structure, you can solve this mostly in the ORM, with Python code kept to a minimum:
class Host(models.Model):
    pass

class CPUReading(models.Model):
    host = models.ForeignKey(Host, related_name="readings", on_delete=models.CASCADE)
    reading = models.IntegerField()
    created = models.DateTimeField(auto_now_add=True)
With this you can use two queries with fairly clean code:
from collections import defaultdict

results = defaultdict(list)
hosts = Host.objects.prefetch_related("readings")

for host in hosts:
    for reading in host.readings.all():
        results[host.id].append(
            {"created": reading.created, "value": reading.reading}
        )
Or you can do it a little more efficiently with one query and a single loop:
from collections import defaultdict

results = defaultdict(list)
readings = CPUReading.objects.select_related("host")

for reading in readings:
    results[reading.host.id].append(
        {"created": reading.created, "value": reading.reading}
    )
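To also guarantee the per-host ordering the question asks for, you would presumably add an explicit order_by before iterating, e.g.:

readings = CPUReading.objects.select_related("host").order_by("host", "created")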
Assuming you are using PostgreSQL, you can use a combination of array_agg and json_object to achieve what you're after.
from django.contrib.postgres.aggregates import ArrayAgg
from django.contrib.postgres.fields import ArrayField, JSONField
from django.db.models import CharField
from django.db.models.expressions import Func, Value

class JSONObject(Func):
    function = 'json_object'
    output_field = JSONField()

    def __init__(self, **fields):
        fields, expressions = zip(*fields.items())
        super().__init__(
            Value(fields, output_field=ArrayField(CharField())),
            Func(*expressions, template='array[%(expressions)s]'),
        )

readings = dict(CPUReading.objects.values_list(
    'host',
    ArrayAgg(
        JSONObject(
            created_at='created_at',
            value='value',
        ),
        ordering='created_at',
    ),
))
If you want to stay close to the Django ORM, you just need to remember this doesn't return a queryset but a dictionary and is evaluated on the fly, so don't use this in declarative scope. However, the interface is similar to QuerySet.values() and has the additional requirement that it needs to be sorted first.
import itertools

from django.db import models

class PlotQuerySet(models.QuerySet):
    def grouped_values(self, key_field, *fields, **expressions):
        if key_field not in fields:
            fields += (key_field,)
        values = self.values(*fields, **expressions)
        data = {}
        for key, gen in itertools.groupby(values, lambda x: x.pop(key_field)):
            data[key] = list(gen)
        return data

PlotManager = models.Manager.from_queryset(PlotQuerySet, class_name='PlotManager')

class CpuReading(models.Model):
    host = models.CharField(max_length=255)
    reading = models.IntegerField()
    created_at = models.DateTimeField(auto_now_add=True)

    objects = PlotManager()
Example:
CpuReading.objects.order_by(
    'host', 'created_at'
).grouped_values(
    'host', 'created_at', 'reading'
)
Out[10]:
{'a': [{'created_at': datetime.datetime(2020, 7, 13, 16, 45, 23, 215005, tzinfo=<UTC>),
        'reading': 0},
       {'created_at': datetime.datetime(2020, 7, 13, 16, 45, 23, 223080, tzinfo=<UTC>),
        'reading': 1},
       {'created_at': datetime.datetime(2020, 7, 13, 16, 45, 23, 230218, tzinfo=<UTC>),
        'reading': 2},
       ...],
 'b': [{'created_at': datetime.datetime(2020, 7, 13, 16, 45, 23, 241476, tzinfo=<UTC>),
        'reading': 0},
       {'created_at': datetime.datetime(2020, 7, 13, 16, 45, 23, 242015, tzinfo=<UTC>),
        'reading': 1},
       {'created_at': datetime.datetime(2020, 7, 13, 16, 45, 23, 242537, tzinfo=<UTC>),
        'reading': 2},
       ...]}
I receive dicts such as (without knowing the exact structure in advance)
{
    'a': 1,
    'id': UUID('6b3acb30-08bf-400c-bc64-bf70489e388c'),
}
This dict is not directly serializable, but when casting the value of id to a str, it is:
import json
import uuid
print(json.dumps({
    'a': 1,
    'id': str(uuid.UUID('6b3acb30-08bf-400c-bc64-bf70489e388c')),
}))
# outputs {"a": 1, "id": "6b3acb30-08bf-400c-bc64-bf70489e388c"}
In the general case where I have elements that need to be cast to a str before being serializable, is there a generic (Pythonic) way to make the transformation automatically?
The best option is to override the JSONEncoder.default method:
class MyJSONEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, uuid.UUID):
            return str(o)
        return super().default(o)

print(MyJSONEncoder().encode(data))
If you want to stringify everything that the default encoder cannot handle, you may use the following trick, although I would recommend taking control over the types you want to support:
class MyJSONEncoder(json.JSONEncoder):
    def default(self, o):
        try:
            return super().default(o)
        except TypeError:
            return str(o)

print(json.dumps(data, cls=MyJSONEncoder))
DOCS: https://docs.python.org/3/library/json.html#json.JSONEncoder.default
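As a lighter-weight variant of the same fallback trick, default= accepts any callable, so for quick scripts you will sometimes see everything unserializable funneled through str:

print(json.dumps(data, default=str))  # str() is called for anything json can't serialize natively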
No, there is not, but you can check the data items individually and convert them when needed. This way you do not need to know the data structure in advance. Consider the following:
import json
import uuid

data = {
    'a': 1,
    'id': uuid.UUID('6b3acb30-08bf-400c-bc64-bf70489e388c')
}

for k, v in data.items():
    try:
        json.dumps(v)
    except TypeError:
        data[k] = str(v)  # fall back to the string representation

print(json.dumps(data))
# outputs {"a": 1, "id": "6b3acb30-08bf-400c-bc64-bf70489e388c"}