I know you can define data descriptors for a class instance using the __get__ and __set__ methods. Is it possible to define something similar for an imported module?
Use case:
I have a large test file with a lot of dictionaries defined in a test_data.py (legacy code). All of them are mutable, so they cannot be safely modified by individual tests without using deepcopy.
I want to be able to modify these dictionaries:
- without re-writing the data into classes
- without calling deepcopy in tests.
Test data:
expected_response_1 = dict(status=False)
Test case:
from test import test_data
data = test_data.expected_response_1
data['status'] = True
print(data)
# --> {'status': True}
print(test_data.expected_response_1)
# --> {'status': False}
Is there any Python magic I can use to always return a copy of expected_response_1?
This cannot be done directly, since descriptors need to be defined as class attributes (which means you'd have to add them to the builtin module type, which is not allowed).
BUT you can just use a simple wrapper around your test_data module and use the __getattr__() magic method:
import copy

class DataWrapper(object):
    def __init__(self, module):
        self._module = module

    def __getattr__(self, name):
        val = getattr(self._module, name)
        return copy.deepcopy(val)
from test import test_data
test_data = DataWrapper(test_data)
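Alternatively, since Python 3.7 (PEP 562) a module can define its own __getattr__, so the copying can live inside test_data.py itself. A minimal sketch, assuming you can rename the originals with a leading underscore:

# test_data.py (Python 3.7+, PEP 562)
import copy

_expected_response_1 = dict(status=False)  # master copy, kept "private"

def __getattr__(name):
    # Only called when normal module attribute lookup fails,
    # i.e. for the names without the leading underscore.
    try:
        return copy.deepcopy(globals()['_' + name])
    except KeyError:
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")

With this, test_data.expected_response_1 returns a fresh deep copy on every access and the test imports stay unchanged.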
I think you mean that dictionaries are mutable, and you want to change the dictionaries in the test case without modifying the original dictionary.
You can indeed use deepcopy, which is not a bad practice at all. You can also change the test_data module to expose the dictionaries as properties of a module-level instance: the getter builds a new dictionary with the original content on every access:
test_data.py:
class TestData:
    @property
    def expected_response_1(self):
        return dict(status=False)

test_data = TestData()  # module-level instance: properties only fire on instances
test_case.py:
from test.test_data import test_data
data = test_data.expected_response_1
data['status'] = True
print(data)
# --> {'status': True}
print(test_data.expected_response_1)
# --> {'status': False}
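If you would rather keep accessing the attribute on the class itself (no instance at all), a small hand-rolled descriptor works too; a sketch, not a stdlib feature:

class classproperty:
    def __init__(self, fget):
        self.fget = fget

    def __get__(self, obj, owner):
        # fires for both class and instance access, building a fresh dict each time
        return self.fget(owner)

class test_data:
    @classproperty
    def expected_response_1(cls):
        return dict(status=False)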
Using the following, I am able to successfully create a parser and add my arguments to self._parser through the __init__() method.
import argparse

class Parser:
    _parser_params = {
        'description': 'Generate a version number from the version configuration file.',
        'allow_abbrev': False
    }
    _parser = argparse.ArgumentParser(**_parser_params)
Now I wish to split the arguments into groups so I have updated my module, adding some classes to represent the argument groups (in reality there are several subclasses of the ArgumentGroup class), and updating the Parser class.
class ArgumentGroup:
    _title = None
    _description = None

    def __init__(self, parser) -> None:
        parser.add_argument_group(*self._get_args())

    def _get_args(self) -> list:
        return [self._title, self._description]
class ArgumentGroup_BranchType(ArgumentGroup):
    _title = 'branch type arguments'
class Parser:
    _parser_params = {
        'description': 'Generate a version number from the version configuration file.',
        'allow_abbrev': False
    }
    _parser = argparse.ArgumentParser(**_parser_params)
    _argument_groups = [cls(_parser) for cls in ArgumentGroup.__subclasses__()]
However, I'm now seeing an error.
Traceback (most recent call last):
...
File "version2/args.py", line 62, in <listcomp>
_argument_groups = [cls(_parser) for cls in ArgumentGroup.__subclasses__()]
NameError: name '_parser' is not defined
What I don't understand is why _parser_params exists when it is referred to by another class attribute, but _parser seemingly does not exist in the same scenario. How can I refactor my code to add the argument groups as required?
This comes from the confluence of two quirks of Python:
- class statements do not create a new local scope;
- list comprehensions do create a new local scope.
As a result, the name _parser is in a local scope whose closest enclosing scope is the global scope, so it cannot refer to the about-to-be class attribute.
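A minimal repro of the interaction:

class Demo:
    x = 1
    ok = x + 1                      # fine: plain reference in the class body
    broken = [x for _ in range(3)]  # NameError: name 'x' is not defined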
A simple workaround would be to replace the list comprehension with a regular for loop.
_argument_groups = []
for cls in ArgumentGroup.__subclasses__():
    _argument_groups.append(cls(_parser))
(A better solution would probably be to stop using class attributes where instance attributes make more sense.)
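A sketch of that refactor, reusing the ArgumentGroup classes from the question; inside __init__ the comprehension closes over the function's locals, so the name lookup succeeds:

import argparse

class Parser:
    def __init__(self):
        parser = argparse.ArgumentParser(
            description='Generate a version number from the version configuration file.',
            allow_abbrev=False,
        )
        # `parser` is a function local, visible to the comprehension's scope
        self._argument_groups = [
            cls(parser) for cls in ArgumentGroup.__subclasses__()
        ]
        self._parser = parser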
Relatively new to Python, and I'm curious about the best way to share and modify a dictionary between functions within a module. Example:
some_module.py
import requests

my_dict = { 'url': 'www.difficult.com', 'keys': None, 'params': {} }  # params starts as an empty dict so nested assignment works

def set_keys(keys):
    my_dict['keys'] = keys  # these are needed for every request

def set_limit_param(param):
    my_dict["params"]["limit"] = param  # not needed for every request

def make_request(added):
    r = requests.get(my_dict["url"] + added, headers=my_dict["keys"], params=my_dict["params"])

def do_thing1():
    make_request("/thing1")

def do_thing2():
    set_limit_param("75,000")
    make_request("/thing2")
In my use case some_module.py is imported within another script.
Calling some_module.do_thing2() modifies my dictionary with values I don't really want present when I call some_module.do_thing1().
However, I want the data added when calling some_module.set_keys(keys="blahblah") to persist in the dictionary.
I've experimented a bit with my_dict.copy() and copy.deepcopy(), but it seems cumbersome to do that in every function that modifies my_dict. Any guidance would be appreciated.
As you are modifying the dict, I would recommend an object (from a standard class):
import requests

class Requester:
    def __init__(self):
        self.config = { 'url': 'www.difficult.com', 'keys': None, 'params': {} }

    def set_keys(self, keys):
        self.config['keys'] = keys  # these are needed for every request

    def set_limit_param(self, param):
        self.config["params"]["limit"] = param  # not needed for every request

    def make_request(self, added):
        r = requests.get(
            self.config["url"] + added,
            headers=self.config["keys"],
            params=self.config["params"]
        )

    def do_thing1(self):
        self.make_request("/thing1")

    def do_thing2(self):
        self.set_limit_param("75,000")
        self.make_request("/thing2")

# create
my_requester = Requester()
Then, generally in your code, you use this object instead of the module.
Even if there will be only one instance of such a class, an object is generally the best place to store data together with the methods that operate on it.
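A usage sketch (the token value is a placeholder); each instance carries its own config, so state set on one requester never leaks into another:

requester_a = Requester()
requester_a.set_keys({"Authorization": "Bearer blahblah"})
requester_a.do_thing2()  # sets the limit param, but only on requester_a

requester_b = Requester()
requester_b.set_keys({"Authorization": "Bearer blahblah"})
requester_b.do_thing1()  # clean params: requester_a's limit is not here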
Another approach would be to leave the functions as they are and use a plain dict (as you did), but don't store it in the functions' module. Treat your module like a collection of stateless functions and store/define the data (the dict) separately:
# requester.py
import requests

def set_keys(my_dict, keys):
    my_dict['keys'] = keys  # these are needed for every request

def set_limit_param(my_dict, param):
    my_dict["params"]["limit"] = param  # not needed for every request

def make_request(my_dict, added):
    r = requests.get(my_dict["url"] + added, headers=my_dict["keys"], params=my_dict["params"])

def do_thing1(my_dict):
    make_request(my_dict, "/thing1")

def do_thing2(my_dict):
    set_limit_param(my_dict, "75,000")
    make_request(my_dict, "/thing2")

# my_app.py
import requester

my_dict = { 'url': 'www.difficult.com', 'keys': None, 'params': {} }
requester.set_keys(my_dict, 1)
requester.do_thing1(my_dict)
Calling some_module.do_thing2() modifies my dictionary with values I don't really want present when I call some_module.do_thing1().
This sounds like you need two different dicts, so that one call cannot modify what another sees. Depending on what you do in the do_thing1 and do_thing2 functions, I would also consider separating them into different classes/modules.
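As a sketch of a third option: keep the long-lived state (url, keys) in the dict, but pass the per-request params as a normal argument so they are never stored anywhere:

import requests

def make_request(my_dict, added, params=None):
    return requests.get(
        my_dict["url"] + added,
        headers=my_dict["keys"],
        params=params,  # per-call only, never written back into the shared dict
    )

def do_thing2(my_dict):
    return make_request(my_dict, "/thing2", params={"limit": "75,000"})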
I'm writing a unit test for a function that takes an array of dictionaries and ends up saving it in a CSV. I'm trying to mock it with pytest as usual:
csv_output = (
    "Name\tSurname\r\n"
    "Eve\tFirst\r\n"
)
with patch("builtins.open", mock_open()) as m:
    export_csv_func(array_of_dicts)
    assert m.assert_called_once_with('myfile.csv', 'wb') is None
[and here I want to gather all output sent to the mock "m" and assert it against "csv_output"]
I cannot find any simple way to gather all the data sent to the mock by csv during the open() phase, so that I can do the comparison in bulk instead of line by line. To simplify things, I verified that the following code mimics the operations that export_csv_func() performs on the mock:
with patch("builtins.open", mock_open()) as m:
    with open("myfile.csv", "wb") as f:
        f.write("Name\tSurname\r\n")
        f.write("Eve\tFirst\r\n")
When I dig into the mock, I see:
>>> m
<MagicMock name='open' spec='builtin_function_or_method' id='4380173840'>
>>> m.mock_calls
[call('myfile.csv', 'wb'),
call().__enter__(),
call().write('Name\tSurname\r\n'),
call().write('Eve\tFirst\r\n'),
call().__exit__(None, None, None)]
>>> m().write.mock_calls
[call('Name\tSurname\r\n'), call('Eve\tFirst\r\n')]
>>> dir(m().write.mock_calls[0])
['__add__'...(many methods), '_mock_from_kall', '_mock_name', '_mock_parent', 'call_list', 'count', 'index']
I don't see anything in the MagicMock interface where I can gather all the input that the mock has received.
I also tried calling m().write.call_args but it only returns the last call (the last element of the mock_calls attribute, i.e. call('Eve\tFirst\r\n')).
Is there any way of doing what I want?
You can create your own mock.call objects and compare them with what you have in the .call_args_list.
from unittest.mock import patch, mock_open, call

with patch("builtins.open", mock_open()) as m:
    with open("myfile.csv", "wb") as f:
        f.write("Name\tSurname\r\n")
        f.write("Eve\tFirst\r\n")

# Create your array of expected strings
expected_strings = ["Name\tSurname\r\n", "Eve\tFirst\r\n"]
write_calls = m().write.call_args_list

for expected_str in expected_strings:
    # assert that a mock.call(expected_str) exists in the write calls
    assert call(expected_str) in write_calls
Note that you can use the assert call of your choice. If you're in a unittest.TestCase subclass, prefer to use self.assertIn.
Additionally, if you just want the argument values, you can index a mock.call object like a tuple: index 0 holds the positional args, index 1 the kwargs. For example:

for write_call in write_calls:
    print('args: {}'.format(write_call[0]))
    print('kwargs: {}'.format(write_call[1]))
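And to get the bulk comparison the question asks for, you can join every written chunk back together; a sketch (call.args needs Python 3.8+, use write_call[0][0] on older versions):

written = "".join(write_call.args[0] for write_call in m().write.call_args_list)
assert written == csv_output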
Indeed you can't patch builtins.open.write directly, since write belongs to the file object that open() returns, not to open itself.
There are a bunch of solutions and the one I would think of first would be to use your own mock. See the example:
class MockOpenWrite:
    def __init__(self, *args, **kwargs):
        self.res = []

    # What's actually mocking the write. The name must match.
    def write(self, s: str):
        self.res.append(s)

    # These 2 methods are needed specifically for the use of `with`.
    # If you mock using a decorator, you don't need them anymore.
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        return

mock = MockOpenWrite
with patch("builtins.open", mock):
    with open("myfile.csv", "w") as f:
        f.write("Name\tSurname\r\n")
        f.write("Eve\tFirst\r\n")
        print(f.res)
In that case, the res attribute is tied to the instance, so it is gone once the with block closes.
You could instead store the results somewhere else, like a global list or a class-level attribute, and check them after the with block ends.
Feel free to adapt this to your actual method.
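For example, a sketch of the class-level variant mentioned above, where the results survive the with block:

from unittest.mock import patch

class MockOpenWrite:
    res = []  # class-level, shared across instances

    def __init__(self, *args, **kwargs):
        pass

    def write(self, s: str):
        MockOpenWrite.res.append(s)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        return

with patch("builtins.open", MockOpenWrite):
    with open("myfile.csv", "w") as f:
        f.write("Name\tSurname\r\n")
        f.write("Eve\tFirst\r\n")

assert MockOpenWrite.res == ["Name\tSurname\r\n", "Eve\tFirst\r\n"]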
I had to do it this way (Python 3.9). It was quite tedious just to get the mock args out of the function.
from unittest.mock import MagicMock, patch
from somewhere import my_thing

@patch("lib.function", return_value=MagicMock())
def test_my_thing(my_mock):
    my_thing(value1, value2)
    (value1_call_args, value2_call_args) = my_mock.call_args_list[0].args
I have an object that is used for fetching information from another service, and it is very simple. Since the object is simple and the initialization method could be easily patched, I thought I would try to write my code to be super reusable and extendable. But alas, I cannot figure out how to make it work. The code below is pretty much pseudocode and is super simplified, but it should get the point across.
class SimpleClient:
    def __init__(self):
        pass

    def read(self, key, path='some/path'):
        return value_from_get_on_another_service
I then have a request handler object that initializes a client via get_client() (seen below)
def get_client():
    return SimpleClient()
Then a method on the request handler uses the client.read() method a few times with different parameters (the second call dependent upon the first).
For my tests, I thought I could "patch" the get_client method to return my own simple object that could then be used "regularly" and eliminate the dependence on the third party service and actually use the values retrieved from the method execution. I was disappointed to find it was not that easy and clean. The test pattern is seen below.
class MockClient:
    def __init__(self, addr='someAddr', token='someToken'):
        pass

    def read(self, value, prefix):
        data = {}
        if prefix == 'path/1':
            data = self.p1_lookup(value)
        elif prefix == 'path/2':
            data = self.p2_lookup(value)
        return self.response_wrapper(data)

    def p2_lookup(self, key):
        data = {
            'key1': {
                'sub_key': {"55B3FE7D-9F43-4DD4-9090-9D89330C918A": "Dev2",
                            "7A1C2F4B-E91C-4659-A33E-1B18B0BEE2B3": "Dev"}
            }
        }
        return data.get(key, {})
@mock.patch('a.module.get_client')
def test_authorize_valid_request_no_body(mock_get_client):
    request = RequestMock()
    request.body = None
    handler = RequestHandler(Application(), request=request, logging_level='INFO')
    mock_get_client.return_value = MockClient()
    handler.authorize_request()
    assert handler.verified_headers is None
    assert handler.verified_body is None
    assert handler.user_authenticated is False
I have seen where I can mock the responses for the actual client.read() to return multiple values with a list. But this just seems like I will be doing lots of copy and paste and have to do the same thing over and over for each little test. Forgive me if this is simple, sadly I am just learning the art of testing. Is there a way to accomplish what I am trying to do? Maybe there is something super simple I am missing. Or maybe I am just totally on the wrong track for no good reason. Help?!
After a sleep, with fresh eyes I was able to figure this out relatively quickly thanks to a couple other similar questions/answers that I had not found before. Primarily this one, Python Mock Object with Method called Multiple Times.
Rather than needing to rebuild the client object completely, I need to let mock do that for me and then override the specific method on it with the side_effect attribute. Below is what the sanitized version of the code looks like.
def read_override(value, prefix):
    lookup_data1 = {"lookup1": {'key1': 'value1'}}
    lookup_data2 = {'some_id': {'akey': {'12345678': 'DEV'}}}
    data = {}
    if prefix == 'path1/1a':
        data = lookup_data1.get(value, {})
    elif prefix == 'path2/2a':
        data = lookup_data2.get(value, {})
    return {'data': data}

# Create a true Mock of the entire LookupClient object
VAULT_MOCK = mock.Mock(spec=LookupClient)
# make the read method work the way I want it to with an "override" of sorts
VAULT_MOCK.read.side_effect = read_override
Then the test simply looked like this...
@mock.patch('a.module.get_client')
def test_authorize_valid_request_no_body(get_client):
    get_client.return_value = VAULT_MOCK
    request = RequestMock()
    request.body = None
    handler = RequestHandler(Application(), request=request, logging_level='INFO')
    handler.authorize_request()
    assert handler.verified_headers is None
    assert handler.verified_body is None
    assert handler.user_authenticated is False
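As a side note, when the stubbed return values don't need to depend on the arguments, side_effect also accepts a plain list and the mock returns the items in order; a minimal sketch:

from unittest import mock

m = mock.Mock()
m.read.side_effect = [{'data': 'first'}, {'data': 'second'}]
m.read('a', 'path1/1a')  # -> {'data': 'first'}
m.read('b', 'path2/2a')  # -> {'data': 'second'}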
Let's say I have a (simplified) class as below. I am using it for a program configuration (hyperparameters).
# config.py
class Config(object):  # default configuration
    GPU_COUNT = 1
    IMAGES_PER_GPU = 2
    MAP = {1: 2, 2: 3}

    def display(self):
        pass
# experiment1.py
from config import Config as Default

class Config(Default):  # some overwritten configuration
    GPU_COUNT = 2
    NAME = '2'
# run.py
from experiment1 import Config

cfg = Config()
...
cfg.NAME = 'ABC'  # possible runtime overwriting
# Now I would like to save `cfg` at this moment
I'd like to save this configuration and restore it later. The member functions don't need to be preserved when restoring.
1. When I tried pickle:
import pickle

with open('cfg.pk', 'rb') as f:
    cfg = pickle.load(f)
# --> AttributeError: Can't get attribute 'Config' on <module '__main__'>
I saw a solution using the class definition of Config, but I wish to restore the configuration without knowing the class definition (e.g., export to dict and save as JSON).
2. I tried to convert the object to a dict (so that I can export it as JSON):
cfg.__dict__  # {'NAME': 'ABC'}
vars(cfg)     # {'NAME': 'ABC'}
In both cases, only the instance attribute NAME shows up; the class-level defaults are missing. Is it possible?
The question's title is "how to convert python class to dict", but I suspect you are really just looking for an easy way to represent (hyper)parameters.
By far the easiest solution is to not use classes for this. I've seen it done in some machine learning tutorials, but I consider it a pretty ugly hack: it breaks some semantics about classes vs. objects, and the difficulty pickling is a result of that. How about using a simple class like this one:
class Params(dict):
    __getattr__ = dict.__getitem__
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__

    def __getstate__(self):
        return self

    def __setstate__(self, state):
        self.update(state)

    def copy(self, **extra_params):
        return Params(**self, **extra_params)
It can do everything the class approach can. Predefined configs are then just objects you should copy before editing, as follows:
config = Params(
    GPU_COUNT=2,
    NAME='2',
)
other_config = config.copy()
other_config.GPU_COUNT = 4
Or alternatively in one step:
other_config = config.copy(
    GPU_COUNT=4
)
Works fine with pickle (although you will need to have the Params class somewhere in your source), and you could also easily write load and save methods for the Params class if you want to use JSON.
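For example, a minimal sketch of such JSON helpers (the method names are my own, not part of any library):

import json

class Params(dict):
    # ... attribute access, __getstate__/__setstate__ and copy as above ...

    def save_json(self, path):
        with open(path, 'w') as f:
            json.dump(self, f)  # a Params is a dict, so it serializes directly

    @classmethod
    def load_json(cls, path):
        with open(path) as f:
            return cls(**json.load(f))  # note: JSON keys always come back as strings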
In short, do not use a class for something that really is just an object.
Thankfully, @evertheylen's answer was great for me. However, the code raised an error when p.__class__ = Params, so I changed it slightly as below. I think it works the same way.
class Params(dict):
    __getattr__ = dict.__getitem__
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__

    def __getstate__(self):
        return self

    def __setstate__(self, state):
        self.update(state)

    def copy(self, **extra_params):
        lhs = Params()
        lhs.update(self)
        lhs.update(extra_params)
        return lhs
and you can do
config = Params(
    GPU_COUNT=2,
    NAME='2',
)
other_config = config.copy()
other_config.GPU_COUNT = 4