Mocking h5py to return a dictionary in Python

I have the following in a function:
import h5py

with h5py.File(path, 'r') as f:
    big = f['big']
    box = f['box']
I'm currently writing a test for the function where I try to mock it
by something like:
def test_function(mocker):
    mocker.patch("h5py.File", new=mocker.mock_open())
    ...
Where mocker comes from: https://pypi.org/project/pytest-mock/
What I want to achieve is for the mock to return a dict as f, so I'm able to interact with it as in the function above.
Is this possible? I'm prepared to use whatever brute-force solution is out there...
br.
KJ

As described here, you need to ensure that h5py.File().__enter__() returns an appropriate dictionary:
import h5py

def foo():
    with h5py.File('.', 'r') as f:
        big = f['big']
        box = f['box']
    return big, box

# mocker is the fixture provided by pytest-mock
def test_foo(mocker):
    d = {'big': 1, 'box': 2}
    m = mocker.MagicMock()
    m.__enter__.return_value = d
    mocker.patch("h5py.File", return_value=m)
    assert foo() == (1, 2)
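If you'd rather not depend on pytest-mock, here is a minimal equivalent sketch with plain unittest.mock, patching h5py.File the same way:
from unittest import mock

def test_foo_plain():
    m = mock.MagicMock()
    m.__enter__.return_value = {'big': 1, 'box': 2}
    # patch is active only for the duration of the with-block
    with mock.patch("h5py.File", return_value=m):
        assert foo() == (1, 2)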

Related

Mocking functions in Python called from a dictionary

I have a problem with the code below. I would like to mock the functions, defined in a different file, that are referenced in the FUNCTION_MAPPING part, for unit testing.
import module.module2 as module_name

FUNCTION_MAPPING = {
    1: module_name.foo,
    2: module_name.foo2,
    3: module_name.foo3
}

def my_func(number):
    function_call = FUNCTION_MAPPING[number]
    result = function_call()
    return result
For some reason I cannot mock those functions. I have tried every way I know of. If possible, I would like not to change the code above.
The bodies of foo, foo2 and foo3 can be anything: print(1), print(2), etc.
Code of the unit test:
@patch("module_of_the_code_above.module_name.foo", return_value="Test")
def test_my_func(self, mocked_foo):
    result = my_func(1)
    nose_tools.assert_equal(result, "Test")
Two simple options if you use MagicMock:
The issue in this case is that the lookup dictionary is created when you import your module, and the references to module_name.foo are bound at that moment, so any patching/mocking at a global level will not affect that explicit mapping; you have to replace/wrap the entries in that structure.
....
# In your test, manually replace the function mapping entries with either
# new functions you define, or (my preference) MagicMock, as it gives you
# all kinds of goodies.
# This is necessary because the dict for the function lookup is already
# initialized before any mocking can take place (it is built when the
# module is imported), so even patching the function globally will not
# affect that lookup.
# Using MagicMock means your original function is not called:
FUNCTION_MAPPING[1] = mock.MagicMock()
FUNCTION_MAPPING[2] = mock.MagicMock()
FUNCTION_MAPPING[3] = mock.MagicMock()
# Or, if you just want to spy/keep stats but still call the original
# function, wrap it instead:
FUNCTION_MAPPING[1] = mock.Mock(wraps=FUNCTION_MAPPING[1])
# Then you can do all the great things that mock supports:
assert FUNCTION_MAPPING[1].call_count == 3
PS: I did not have time to test/vet this for exact syntax, but I hope it points you in the right direction.
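As a concrete, runnable variant of replacing the entry, here is a minimal sketch using unittest.mock.patch.dict, which also restores the original mapping after the test; the module name module_of_the_code_above is assumed from the question:
from unittest import mock

import module_of_the_code_above as target  # hypothetical module name

def test_my_func():
    fake_foo = mock.MagicMock(return_value="Test")
    # patch.dict swaps the entry for the duration of the with-block
    # and restores the original mapping afterwards.
    with mock.patch.dict(target.FUNCTION_MAPPING, {1: fake_foo}):
        assert target.my_func(1) == "Test"
    fake_foo.assert_called_once()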
You should assign the function (without parentheses), not the result of calling it: {1: func}
def hello():
    print('Hello!')

def text(text):
    print(text)

function_map = {
    1: hello,
    2: text
}

func1 = function_map[1]
func1()
func2 = function_map[2]
func2('ABC123')
Output:
Hello!
ABC123
When working with imports:
import math

function_map = {
    'factorial': math.factorial,
    'gcd': math.gcd
}

func1 = function_map['factorial']
print(func1(5))
func2 = function_map['gcd']
print(func2(5, 25))
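Note that the map holds references captured at creation time, which is exactly why module-level patching does not reach it; a tiny illustrative sketch:
def greet():
    return "hi"

table = {1: greet}

def greet():           # rebinding the name...
    return "hello"

print(table[1]())      # ...still prints "hi": the dict kept the old reference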

Re-serializing values and functions into a new environment in Python

I'm interested in moving a "session" across a boundary where I can only pass text.
I've been using Dill and this ALMOST seems to work.
import dill
from base64 import urlsafe_b64encode
def fun1():
    value_one = "abcdef"

    def sub_fun_one(x):
        return x * x

    __context_export = {}
    for val in globals():
        if not val.startswith("_"):
            __context_export[val] = dill.dumps(globals()[val])
    for val in locals():
        if not val.startswith("_"):
            __context_export[val] = dill.dumps(locals()[val])
    # __context_export["globals"] = dill.dumps(globals())
    # __context_export["locals"] = dill.dumps(locals())
    b64_string = str(urlsafe_b64encode(dill.dumps(__context_export)), encoding="ascii")
    print(b64_string)
fun1()
This outputs gASVHAQAAAAAAAB9lCiMBGRpbGyUQzeABJUsAAAAAAAAAIwKZGlsbC5fZGlsbJSMDl9pbXBvcnRfbW9kdWxllJOUjARkaWxslIWUUpQulIwRdXJsc2FmZV9iNjRlbmNvZGWUQyuABJUgAAAAAAAAAIwGYmFzZTY0lIwRdXJsc2FmZV9iNjRlbmNvZGWUk5QulIwEZnVuMZRCYwIAAIAElVgCAAAAAAAAjApkaWxsLl9kaWxslIwQX2NyZWF0ZV9mdW5jdGlvbpSTlChoAIwMX2NyZWF0ZV9jb2RllJOUKEsASwBLAEsFSwVLQ0OGZAF9AGQCZAOEAH0BaQB9AnQAgwBEAF0ifQN8A6ABZAShAXMWdAKgA3QAgwB8AxkAoQF8AnwDPABxFnQEgwBEAF0ifQN8A6ABZAShAXNAdAKgA3QEgwB8AxkAoQF8AnwDPABxQHQFdAZ0AqADfAKhAYMBZAVkBo0CfQR0B3wEgwEBAGQAUwCUKE6MBmFiY2RlZpRoBChLAUsASwBLAUsCS1NDCHwAfAAUAFMAlE6FlCmMAXiUhZSMMS9ob21lL2RhYXJvbmNoL2NvZGUvc2FtZS1jbGkvZXhwZXJpbWVudGFsL2Z1bjEucHmUjAtzdWJfZnVuX29uZZRLCEMCAAGUKSl0lFKUjBlmdW4xLjxsb2NhbHM-LnN1Yl9mdW5fb25llIwBX5SMBWFzY2lplIwIZW5jb2RpbmeUhZR0lCiMB2dsb2JhbHOUjApzdGFydHN3aXRolIwEZGlsbJSMBWR1bXBzlIwGbG9jYWxzlIwDc3RylIwRdXJsc2FmZV9iNjRlbmNvZGWUjAVwcmludJR0lCiMCXZhbHVlX29uZZRoDIwQX19jb250ZXh0X2V4cG9ydJSMA3ZhbJSMCmI2NF9zdHJpbmeUdJRoC4wEZnVuMZRLBUMWAAEEAggDBAEKAQoBFgIKAQoBFgQWApQpKXSUUpRjX19idWlsdGluX18KX19tYWluX18KaCROTn2UTnSUUpQulIwJdmFsdWVfb25llEMVgASVCgAAAAAAAACMBmFiY2RlZpQulIwLc3ViX2Z1bl9vbmWUQ9SABJXJAAAAAAAAAIwKZGlsbC5fZGlsbJSMEF9jcmVhdGVfZnVuY3Rpb26Uk5QoaACMDF9jcmVhdGVfY29kZZSTlChLAUsASwBLAUsCS1NDCHwAfAAUAFMAlE6FlCmMAXiUhZSMMS9ob21lL2RhYXJvbmNoL2NvZGUvc2FtZS1jbGkvZXhwZXJpbWVudGFsL2Z1bjEucHmUjAtzdWJfZnVuX29uZZRLCEMCAAGUKSl0lFKUY19fYnVpbHRpbl9fCl9fbWFpbl9fCmgKTk59lE50lFKULpSMA3ZhbJRDEoAElQcAAAAAAAAAjAN2YWyULpR1Lg==" as expected.
But in the second function - mocked up here - "value_one" doesn't seem to work at all.
import dill
from base64 import urlsafe_b64decode
def fun2(import_string):
    __base64_decode = urlsafe_b64decode(import_string)
    __context_import_dict = dill.loads(__base64_decode)
    # __globals_import = dill.loads(__context_import_dict["globals"])
    # __locals_import = dill.loads(__context_import_dict["locals"])
    print(__context_import_dict)
    for k in __context_import_dict:
        print(f"local = {k}")
        if locals().get(k) is None:
            g = dill.loads(__context_import_dict[k])
            locals()[k] = g
    print(value_one)
    print(f"Square value: {sub_fun_one(5)}")
import_value = "gASVHAQAAAAAAAB9lCiMBGRpbGyUQzeABJUsAAAAAAAAAIwKZGlsbC5fZGlsbJSMDl9pbXBvcnRfbW9kdWxllJOUjARkaWxslIWUUpQulIwRdXJsc2FmZV9iNjRlbmNvZGWUQyuABJUgAAAAAAAAAIwGYmFzZTY0lIwRdXJsc2FmZV9iNjRlbmNvZGWUk5QulIwEZnVuMZRCYwIAAIAElVgCAAAAAAAAjApkaWxsLl9kaWxslIwQX2NyZWF0ZV9mdW5jdGlvbpSTlChoAIwMX2NyZWF0ZV9jb2RllJOUKEsASwBLAEsFSwVLQ0OGZAF9AGQCZAOEAH0BaQB9AnQAgwBEAF0ifQN8A6ABZAShAXMWdAKgA3QAgwB8AxkAoQF8AnwDPABxFnQEgwBEAF0ifQN8A6ABZAShAXNAdAKgA3QEgwB8AxkAoQF8AnwDPABxQHQFdAZ0AqADfAKhAYMBZAVkBo0CfQR0B3wEgwEBAGQAUwCUKE6MBmFiY2RlZpRoBChLAUsASwBLAUsCS1NDCHwAfAAUAFMAlE6FlCmMAXiUhZSMMS9ob21lL2RhYXJvbmNoL2NvZGUvc2FtZS1jbGkvZXhwZXJpbWVudGFsL2Z1bjEucHmUjAtzdWJfZnVuX29uZZRLCEMCAAGUKSl0lFKUjBlmdW4xLjxsb2NhbHM-LnN1Yl9mdW5fb25llIwBX5SMBWFzY2lplIwIZW5jb2RpbmeUhZR0lCiMB2dsb2JhbHOUjApzdGFydHN3aXRolIwEZGlsbJSMBWR1bXBzlIwGbG9jYWxzlIwDc3RylIwRdXJsc2FmZV9iNjRlbmNvZGWUjAVwcmludJR0lCiMCXZhbHVlX29uZZRoDIwQX19jb250ZXh0X2V4cG9ydJSMA3ZhbJSMCmI2NF9zdHJpbmeUdJRoC4wEZnVuMZRLBUMWAAEEAggDBAEKAQoBFgIKAQoBFgQWApQpKXSUUpRjX19idWlsdGluX18KX19tYWluX18KaCROTn2UTnSUUpQulIwJdmFsdWVfb25llEMVgASVCgAAAAAAAACMBmFiY2RlZpQulIwLc3ViX2Z1bl9vbmWUQ9SABJXJAAAAAAAAAIwKZGlsbC5fZGlsbJSMEF9jcmVhdGVfZnVuY3Rpb26Uk5QoaACMDF9jcmVhdGVfY29kZZSTlChLAUsASwBLAUsCS1NDCHwAfAAUAFMAlE6FlCmMAXiUhZSMMS9ob21lL2RhYXJvbmNoL2NvZGUvc2FtZS1jbGkvZXhwZXJpbWVudGFsL2Z1bjEucHmUjAtzdWJfZnVuX29uZZRLCEMCAAGUKSl0lFKUY19fYnVpbHRpbl9fCl9fbWFpbl9fCmgKTk59lE50lFKULpSMA3ZhbJRDEoAElQcAAAAAAAAAjAN2YWyULpR1Lg=="
fun2(import_value)
I'd prefer to do this more elegantly than exec-ing 'k = v' strings, because it would be hard to re-serialize functions and dictionaries that way.
Is this possible?
To be clear, I've read about the dangers of modifying locals() and I do NOT want to do this. But I also cannot rewrite the code that I'm accessing elsewhere to use a custom dict.
E.g. in fun2, I can NOT change print(value_one) to print(my_dict['value_one'])
Ok, so it APPEARS that doing the following works:
for k in __context_import_dict:
    if globals().get(k) is None:
        globals()[k] = dill.loads(__context_import_dict[k])
What kind of pain am I signing myself up for?
From the Python documentation about locals():
Note: The contents of this dictionary should not be modified; changes may not affect the values of local and free variables used by the interpreter.
Function namespaces are not common dictionaries like module namespaces, for performance reasons. I was bitten by this same quirk just yesterday.
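A minimal sketch of the quirk (CPython behavior: function locals live in a fast internal array, and the dict returned by locals() is only a snapshot of it):
def demo():
    x = 0
    # This writes into the snapshot dict, not into the function's
    # fast-locals storage, so the assignment is silently lost.
    locals()['x'] = 42
    return x

print(demo())  # 0: the locals() write never reached x

# At module level, globals() IS the real namespace dict, so this works:
globals()['y'] = 42
print(y)  # 42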

How to pickle a function in Python?

I defined a simple function and pickled it. However, when I deserialized it in another file, I couldn't load it back; I got an error.
Here is an example:
import pickle

def fnc(c=0):
    a = 1
    b = 2
    return a, b, c

f = open('example', 'ab')
pickle.dump(fnc, f)
f.close()

f = open('example', 'rb')
fnc = pickle.load(f)
print(fnc)
print(fnc())
print(fnc(1))
<function fnc at 0x7f06345d7598>
(1, 2, 0)
(1, 2, 1)
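Note that pickle serializes a plain function by reference (its module and qualified name), not by value, which is why loading it in another file fails unless that file can import fnc from the original module. A minimal sketch of a by-value alternative using the third-party dill package:
import dill

def fnc(c=0):
    a = 1
    b = 2
    return a, b, c

# recurse=True also captures globals the function depends on.
payload = dill.dumps(fnc, recurse=True)
restored = dill.loads(payload)
print(restored(1))  # (1, 2, 1)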
You can also do it using the shelve module. I believe it still uses pickle to store data, but a very convenient feature is that it stores data as key-value pairs. For example, if you store an ML model, you can keep the training data and/or feature column names along with the model itself, which makes it more convenient.
import shelve

def func(a, b):
    return a + b

# Store the function ('c' creates the file if it doesn't exist; note that
# shelve pickles the function by reference, so the loading side must be
# able to import it)
with shelve.open('foo.shlv', 'c') as shlv:
    shlv['function'] = func

# Load the function
with shelve.open('foo.shlv', 'r') as shlv:
    x = shlv['function']
    print(x(2, 3))

With Python, can I keep a persistent dictionary and modify it?

So, I want to store a dictionary in a persistent file. Is there a way to use regular dictionary methods to add, print, or delete entries from the dictionary in that file?
It seems that I would be able to use cPickle to store the dictionary and load it, but I'm not sure where to take it from there.
If your keys (not necessarily the values) are strings, the shelve standard library module does what you want pretty seamlessly.
Use JSON
Similar to Pete's answer, I like using JSON because it maps very well to python data structures and is very readable:
Persisting data is trivial:
>>> import json
>>> db = {'hello': 123, 'foo': [1,2,3,4,5,6], 'bar': {'a': 0, 'b':9}}
>>> fh = open("db.json", 'w')
>>> json.dump(db, fh)
and loading it is about the same:
>>> import json
>>> fh = open("db.json", 'r')
>>> db = json.load(fh)
>>> db
{'hello': 123, 'bar': {'a': 0, 'b': 9}, 'foo': [1, 2, 3, 4, 5, 6]}
>>> del db['foo'][3]
>>> db['foo']
[1, 2, 3, 5, 6]
In addition, JSON loading doesn't suffer from the same security issues that shelve and pickle do, although IIRC it is slower than pickle.
If you want to write on every operation:
If you want to save on every operation, you can subclass the Python dict object:
import os
import json

class DictPersistJSON(dict):
    def __init__(self, filename, *args, **kwargs):
        self.filename = filename
        self._load()
        self.update(*args, **kwargs)

    def _load(self):
        if (os.path.isfile(self.filename)
                and os.path.getsize(self.filename) > 0):
            with open(self.filename, 'r') as fh:
                self.update(json.load(fh))

    def _dump(self):
        with open(self.filename, 'w') as fh:
            json.dump(self, fh)

    def __getitem__(self, key):
        return dict.__getitem__(self, key)

    def __setitem__(self, key, val):
        dict.__setitem__(self, key, val)
        self._dump()

    def __repr__(self):
        dictrepr = dict.__repr__(self)
        return '%s(%s)' % (type(self).__name__, dictrepr)

    def update(self, *args, **kwargs):
        for k, v in dict(*args, **kwargs).items():
            self[k] = v
        self._dump()
Which you can use like this:
db = DictPersistJSON("db.json")
db["foo"] = "bar" # Will trigger a write
Which is woefully inefficient, but can get you off the ground quickly.
Unpickle from file when program loads, modify as a normal dictionary in memory while program is running, pickle to file when program exits? Not sure exactly what more you're asking for here.
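A minimal sketch of that pattern, assuming a hypothetical db.pkl path: load once at startup, mutate in memory, write back at exit:
import atexit
import os
import pickle

DB_FILE = "db.pkl"  # hypothetical filename

# Load once when the program starts.
if os.path.isfile(DB_FILE):
    with open(DB_FILE, "rb") as fh:
        db = pickle.load(fh)
else:
    db = {}

# Write back when the program exits normally.
@atexit.register
def _save():
    with open(DB_FILE, "wb") as fh:
        pickle.dump(db, fh)

# In between, db is just an ordinary in-memory dict.
db["answer"] = 42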
Assuming the keys and values have working implementations of repr, one solution is to save the string representation of the dictionary (repr(dict)) to a file and load it back with the eval function (eval(input_string)). There are two main disadvantages of this technique:
1) It will not work with types that have an unusable implementation of repr (or that may even seem to work, but fail). You'll need to pay at least some attention to what is going on.
2) Your file-load mechanism is basically executing Python code straight from the file. Not great for security unless you fully control the input.
It has one advantage: it is absurdly easy to do.
My favorite method (which does not use standard Python dictionary functions): read/write YAML files using PyYAML. See this answer for details, summarized here:
Create a YAML file, "employment.yml":
new jersey:
  mercer county:
    plumbers: 3
    programmers: 81
  middlesex county:
    salesmen: 62
    programmers: 81
new york:
  queens county:
    plumbers: 9
    salesmen: 36
Then read it in Python:
import yaml

file_handle = open("employment.yml")
my_dictionary = yaml.safe_load(file_handle)
file_handle.close()
and now my_dictionary has all the values. If you need to do this on the fly, create a string containing YAML and parse it with yaml.safe_load, as in the sketch below.
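A quick sketch of the on-the-fly variant, with hypothetical inline data:
import yaml

# Hypothetical inline YAML, mirroring the file above.
doc = """
new york:
  queens county:
    plumbers: 9
"""

data = yaml.safe_load(doc)
print(data['new york']['queens county']['plumbers'])  # 9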
If using only strings as keys (as the shelve module requires) is not enough, FileDict might be a good way to solve this problem.
Pickling has one disadvantage: it can be expensive if your dictionary is large and has to be read from and written to disk frequently, since pickle dumps and loads the whole structure at once.
If you only have to handle small dicts, pickle is fine. If you are going to work with something more complex, go for Berkeley DB; it is basically made to store key-value pairs.
Have you considered using dbm?
import dbm
import pandas as pd
import numpy as np

db = dbm.open('mydbm.db', 'n')

# create some data
df1 = pd.DataFrame(np.random.randint(0, 100, size=(15, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.randint(101, 200, size=(10, 3)), columns=list('EFG'))

# serialize the data and put it in the db dictionary
db['df1'] = df1.to_json()
db['df2'] = df2.to_json()

# in some other process:
db = dbm.open('mydbm.db', 'r')
df1a = pd.read_json(db['df1'])
df2a = pd.read_json(db['df2'])
This tends to work even without a db.close(), though closing explicitly is safer; see the sketch below.
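A minimal sketch of the safer variant; dbm objects support the context-manager protocol, so a with-block guarantees the database is closed:
import dbm

# 'c' opens for read/write, creating the database if needed.
with dbm.open('mydbm.db', 'c') as db:
    db['key'] = 'value'
# The database is closed here, even if an exception occurred.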
