Is there any way to add a pointer to a data class structure?
from dataclasses import dataclass

@dataclass
class Point:
    x: int
    y: int
    pointer: pointer  # Here
If you want to create a singly-linked list of Point objects then you can do
from dataclasses import dataclass

@dataclass
class Point:
    x: int
    y: int
    next: 'Point' = None
And then you can do, for example:
a = Point(1, 1)
b = Point(4, 5)
a.next = b
Unless you have a specific need for a linked list, though, I would recommend using one of the standard Python data types like list. There's usually little need to roll your own, especially something like a linked list that takes a lot of management.
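For instance, if all you need is an ordered collection of points, a plain list already covers it. A minimal sketch using the Point class from above:

points = [Point(1, 1), Point(4, 5), Point(2, 3)]
for p in points:
    print(p.x, p.y)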
Assuming pointer is a class, sure, there are no restrictions.
@dataclass
class Point:
    x: int
    y: int
    pointer: pointer
If you don't know (or care) about the concrete type, you can always use Any as the type. Dataclasses really only care about the value.
from dataclasses import dataclass
from typing import Any

@dataclass
class Point:
    x: int
    y: int
    pointer: Any
Now pointer is a field in the dataclass which can contain anything at all.
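For example (a quick sketch using the Any-typed Point above; the attached object is arbitrary):

p = Point(1, 2, pointer=None)
q = Point(3, 4, pointer=p)  # q keeps a reference to p
print(q.pointer.x)          # prints 1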
Unfortunately I have to load a dictionary containing an invalid name (which I can't change):
dict = {..., "invalid-name": 0, ...}
I would like to cast this dictionary into a dataclass object, but I can't define an attribute with this name.
from dataclasses import dataclass

@dataclass
class Dict:
    ...
    invalid-name: int  # can't do this
    ...
The only solution I could find is to change the dictionary key into a valid one right before casting it into a dataclass object:
dict["valid_name"] = dict.pop("invalid-name")
But I would like to avoid using string literals...
Is there any better solution to this?
One solution would be using dict-to-dataclass. As mentioned in its documentation, it has two options:
1. Passing dictionary keys
It's probably quite common that your dataclass fields have the same names as the dictionary keys they map to, but in case they don't, you can pass the dictionary key as the first argument (or the dict_key keyword argument) to field_from_dict:
from dataclasses import dataclass
from dict_to_dataclass import DataclassFromDict, field_from_dict

@dataclass
class MyDataclass(DataclassFromDict):
    name_in_dataclass: str = field_from_dict("nameInDictionary")

origin_dict = {
    "nameInDictionary": "field value"
}

dataclass_instance = MyDataclass.from_dict(origin_dict)

>>> dataclass_instance.name_in_dataclass
"field value"
2. Custom converters
If you need to convert a dictionary value that isn't covered by the defaults, you can pass in a converter function using field_from_dict's converter parameter:
def yes_no_to_bool(yes_no: str) -> bool:
    return yes_no == "yes"

@dataclass
class MyDataclass(DataclassFromDict):
    is_yes: bool = field_from_dict(converter=yes_no_to_bool)

dataclass_instance = MyDataclass.from_dict({"is_yes": "yes"})

>>> dataclass_instance.is_yes
True
The following code allows you to filter out the keys that don't correspond to dataclass fields:
import dataclasses

@dataclasses.dataclass
class ClassDict:
    valid_name0: str
    valid_name1: int
    ...

dict = {..., "invalid-name": 0, ...}
field_names = tuple(f.name for f in dataclasses.fields(ClassDict))
dict = {k: v for k, v in dict.items() if k in field_names}
However, I'm sure there's a better way to do it, since this is a bit hacky.
I would define a from_dict class method anyway, which would be a natural place to make the change.
@dataclass
class MyDict:
    ...
    valid_name: int
    ...

    @classmethod
    def from_dict(cls, d):
        d['valid_name'] = d.pop('invalid-name')
        return cls(**d)

md = MyDict.from_dict({'invalid-name': 3, ...})
Whether you should modify d in place or do something to avoid unnecessary copies is another matter.
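If you would rather leave the caller's dictionary untouched, a possible non-mutating variant of from_dict inside MyDict (a sketch, not part of the original answer) builds a renamed copy instead:

    @classmethod
    def from_dict(cls, d):
        # build a new dict with the key renamed, leaving d unchanged
        renamed = {('valid_name' if k == 'invalid-name' else k): v for k, v in d.items()}
        return cls(**renamed)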
Another option could be to use the dataclass-wizard library, which is likewise a de/serialization library built on top of dataclasses. It should similarly support custom key mappings, as needed in this case.
I've also timed it with the builtin timeit module, and found it to be (on average) about 5x faster than a solution with dict_to_dataclass. I've added the code I used for comparison below.
from dataclasses import dataclass
from timeit import timeit

from typing_extensions import Annotated  # Note: in Python 3.9+, can import this from `typing` instead
from dataclass_wizard import JSONWizard, json_key
from dict_to_dataclass import DataclassFromDict, field_from_dict


@dataclass
class ClassDictWiz(JSONWizard):
    valid_name: Annotated[int, json_key('invalid-name')]


@dataclass
class ClassDict(DataclassFromDict):
    valid_name: int = field_from_dict('invalid-name')


my_dict = {"invalid-name": 0}

n = 100_000

print('dict-to-dataclass: ', round(timeit('ClassDict.from_dict(my_dict)', globals=globals(), number=n), 3))
print('dataclass-wizard:  ', round(timeit('ClassDictWiz.from_dict(my_dict)', globals=globals(), number=n), 3))

i1, i2 = ClassDict.from_dict(my_dict), ClassDictWiz.from_dict(my_dict)

# assert we get the same result with both approaches
assert i1.__dict__ == i2.__dict__
Results, on my Mac OS X laptop:
dict-to-dataclass: 0.594
dataclass-wizard: 0.098
I am learning about dataclasses but I am confused about the purpose of sort_index and how it actually works.
I can't seem to find any valuable information on it. The official Python documentation doesn't mention it, which is mind-boggling.
Here is an example:
from dataclasses import dataclass, field

@dataclass(order=True)
class Person:
    sort_index: int = field(init=False, repr=False)
    name: str
    age: int
    weight: int = 190

    def __post_init__(self):
        self.sort_index = self.weight
So, what is the purpose of sort_index? What is it used for? When do I use it?
Thanks again for taking the time to answer my question. I am new to Python.
Setting a sort_index attribute (the name itself is irrelevant) in the __post_init__ method simply fills in the value on which comparisons are performed first.
With order=True, the comparison methods (__lt__, __gt__, etc.; read about dunder methods if you're unfamiliar with them) are generated implicitly. They compare fields in the order they are declared, so sort_index is used first and, if it is equal, the remaining attributes are used to resolve the comparison.
Class constructor
from dataclasses import dataclass, field

@dataclass(order=True)
class Person:
    sort_index: int = field(init=False)
    age: int

    def __post_init__(self):
        self.sort_index = self.age
First example: the age attribute is equal.
>>> p1 = Person(age=10)
>>> p2 = Person(age=10)
>>> p1 == p2
True
Second example: age is greater.
>>> p1 = Person(age=10)
>>> p2 = Person(age=20)
>>> p2 > p1
True
More complex example:
from dataclasses import dataclass, field

@dataclass(order=True)
class Person:
    foo: int = field(init=False, repr=False)
    bar: int = field(init=False, repr=False)
    name: str
    age: int
    weight: int = 190

    def __post_init__(self):
        self.foo = self.weight
        self.bar = self.age
>>> p1 = Person('p1', 10)
>>> p2 = Person('p1', 11)
>>> p2 > p1
True
Reason
foo (weight) is equal for both instances, so comparison is done on bar (age)
Conclusion
The comparisons can be arbitrarily complex, and identifiers are not important.
I highly recommend this video on dataclasses, by ArjanCodes.
Apart from the video, here's a github link to example dataclass code (from the same video).
Hope this helped—I just learned about dataclasses myself.
Finally, I've found the simple truth about this.
First, sort_index (or whatever you want to call this attribute) is not useful unless you need to sort instances on an attribute that is only defined after __init__ has run (i.e. in __post_init__).
All the tricky behaviour comes from how @dataclass(order=True) works.
It is not only about direct comparisons like var1 > var2; its main use is to sort your objects, for example when you store them in an iterable that you can sort.
The sorting works like this (the objects must be instances of the same class, of course):
compare the first attribute to order the objects;
in case of equality, compare the second attribute, and so on.
So the order in which the attributes are declared matters. That is why one may use a sort_index attribute: simply to put that value in first place, even though it is not set in __init__ but after it (see the sorted list sketch after the example below).
(I found a good explanation in this video.)
from dataclasses import dataclass, field

@dataclass(order=True)
class Person:
    sort_index: int = field(init=False)  # <- not defined yet
    age: int
    name: str

    def __post_init__(self):
        self.sort_index = self.age  # <- definition's here

# if you try this:
print(person_1 == person_2)
# and get 'True', it means that all the values of the attributes of person_1
# and person_2 are strictly the same, not only 'sort_index'
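To make the sorting behaviour concrete, here is a quick sketch (not from the original answer, names are arbitrary) that sorts instances of the Person class above:

people = [Person(age=30, name="Alice"), Person(age=25, name="Bob")]
people.sort()  # uses the generated __lt__, which compares sort_index (the age) first
print([p.name for p in people])  # ['Bob', 'Alice']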
In this example, the first sorting attribute is sort_index, which is simply the age, so it is not a very good example. A better candidate would be an auto-generated id assigned after __init__, but even then it would be easier to do:
@dataclass(order=True)
class Person:
    id: int = field(init=False, default_factory=get_an_id_function)
    age: int
    name: str

# Where get_an_id_function is a function that returns an id
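get_an_id_function is left undefined in the answer; one possible implementation (an assumed sketch using sequential ids) is:

from itertools import count

_id_counter = count()

def get_an_id_function() -> int:
    # returns the next sequential id each time it is called
    return next(_id_counter)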
I have a data structure in my code that (for the sake of an MWE) is a list where the first element is a string, and the second element is an integer. For example:
foo: MyStructure = ["hello", 42].
Now, since there is an ordering to this structure, usually I would use a tuple and instead do:
foo: Tuple[str, int] = ("hello", 42).
But I explicitly want to be able to easily modify elements within the structure. In particular, I want to be able to set foo[0] = "goodbye", which cannot be done if foo is a tuple.
What is the best way to go about typing this structure?
(I don't think that this question is opinion-based, since I think there is likely clear rationale for how to handle this that would be preferred by most developers.)
Right now, the main solution I can think of is to not actually type the structure correctly, and instead to define my own structure whose true type is listed in a comment:
from typing import List, Union

# MyStructure = [str, int]
MyStructure = List[Union[str, int]]

foo: MyStructure = ["hello", 42]
Is there a better way?
You don't want a list or a tuple; you want a custom class representing the type-level product of str and int. A dataclass is particularly useful here.
from dataclasses import dataclass

@dataclass
class MyStructure:
    first: str
    second: int

foo: MyStructure = MyStructure("hello", 42)
assert foo.first == "hello"
assert foo.second == 42
If you really want to access the components using integer indices, you can add a __getitem__ method to the class:
from typing import Union

@dataclass
class MyStructure:
    first: str
    second: int

    def __getitem__(self, key) -> Union[str, int]:
        if key == 0:
            return self.first
        elif key == 1:
            return self.second
        else:
            raise IndexError(key)
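Since the question also wants to assign elements with foo[0] = "goodbye", a __setitem__ could be added in the same spirit (a sketch, not part of the original answer):

    # inside MyStructure, alongside __getitem__
    def __setitem__(self, key, value) -> None:
        if key == 0:
            self.first = value
        elif key == 1:
            self.second = value
        else:
            raise IndexError(key)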
In addition, an instance of MyStructure uses less memory than the corresponding list:
>>> foo = MyStructure("hello", 42)
>>> import sys
>>> sys.getsizeof(foo)
48
>>> sys.getsizeof(["hello", 42])
72
I want to create a data class instance and supply values later.
How can I do this?
def create_trade_data():
    trades = []
    td = TradeData()
    td.Symbol = 'New'
    trades.append(td)
    return trades
DataClass:
from dataclasses import dataclass

@dataclass
class TradeData:
    Symbol: str
    ExecPrice: float
You have to make the attributes optional by giving them a default value of None:
from dataclasses import dataclass

@dataclass
class TradeData:
    Symbol: str = None
    ExecPrice: float = None
Then your create_trade_data function would return
[TradeData(Symbol='New', ExecPrice=None)]
Now, I chose None as the default value to indicate a lack of content. Of course, you could choose more sensible defaults like in the other answer.
from dataclasses import dataclass

@dataclass
class TradeData:
    Symbol: str = ''
    ExecPrice: float = 0.0
With the = operator you can assign default values.
For mutable defaults such as list, use the field function with a default_factory (see the sketch below).
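For example, a minimal sketch (the Fills field is hypothetical, added only to illustrate default_factory):

from dataclasses import dataclass, field

@dataclass
class TradeData:
    Symbol: str = ''
    ExecPrice: float = 0.0
    Fills: list = field(default_factory=list)  # hypothetical field; each instance gets its own empty list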
Let's say I want to store some information about a conference schedule with a presentation time and a pause time. I can do this in a NamedTuple.
from typing import NamedTuple

class BlockTime(NamedTuple):
    t_present: float
    t_pause: float
However, if I also want to store how much each block would take such that t_each = t_pause + t_present, I can't just add it as an attribute:
class BlockTime(NamedTuple):
    t_present: float
    t_pause: float
    # this causes an error
    t_each = t_present + t_pause
What is the correct way to do this in Python? I could make an __init__(self) method and store it as an instance variable there, but the value would then be mutable.
If it's okay that the value isn't really stored but calculated dynamically, you could use a simple property for it.
from typing import NamedTuple

class BlockTime(NamedTuple):
    t_present: float
    t_pause: float

    @property
    def t_each(self):
        return self.t_present + self.t_pause

>>> b = BlockTime(10, 20)
>>> b.t_each  # only available as property, not in the representation nor by indexing or iterating
30
That has the advantage that you can never (not even accidentally) store a wrong value for it, at the expense of not actually storing it at all. To make it appear as if it were stored, you'd have to override at least __getitem__, __iter__, and __repr__, which is likely too much trouble.
For example the NamedTuple approach given by Patrick Haugh has the downside that it's still possible to create inconsistent BlockTimes or lose parts of the namedtuple convenience:
>>> b = BlockTime.factory(1.0, 2.0)
>>> b._replace(t_present=20)
BlockTime(t_present=20, t_pause=2.0, t_each=3.0)
>>> b._make([1, 2])
TypeError: Expected 3 arguments, got 2
The fact that you actually have a "computed" field that has to be in sync with other fields already indicates that you probably shouldn't store it at all to avoid inconsistent state.
You can make a classmethod that builds BlockTime objects
from typing import NamedTuple

class BlockTime(NamedTuple):
    t_present: float
    t_pause: float
    t_each: float

    @classmethod
    def factory(cls, present, pause):
        return cls(present, pause, present + pause)

print(BlockTime.factory(1.0, 2.0))
# BlockTime(t_present=1.0, t_pause=2.0, t_each=3.0)
EDIT:
Here's a solution using the new Python 3.7 dataclass
from dataclasses import dataclass, field

@dataclass(frozen=True)
class BlockTime:
    t_present: float
    t_pause: float
    t_each: float = field(init=False)

    def __post_init__(self):
        object.__setattr__(self, 't_each', self.t_present + self.t_pause)
Frozen dataclasses aren't totally immutable but they're pretty close, and this lets you have natural looking instance creation BlockTime(1.0, 2.0)
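A quick usage sketch of the frozen dataclass above:

>>> b = BlockTime(1.0, 2.0)
>>> b
BlockTime(t_present=1.0, t_pause=2.0, t_each=3.0)
>>> b.t_each = 99.0
dataclasses.FrozenInstanceError: cannot assign to field 't_each'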
Well... You can't override __new__ or __init__ of a class whose parent is NamedTuple. But you can override __new__ of a class that inherits from another class whose parent is NamedTuple.
So you can do something like this
from typing import NamedTuple

class BlockTimeParent(NamedTuple):
    t_present: float
    t_pause: float
    t_each: float

class BlockTime(BlockTimeParent):
    def __new__(cls, t_present, t_pause):
        return super().__new__(cls, t_present, t_pause, t_present + t_pause)

b = BlockTime(1, 2)
print(b)
# BlockTime(t_present=1, t_pause=2, t_each=3)