memoize to disk - python - persistent memoization

Is there a way to memoize the output of a function to disk?
I have a function
def getHtmlOfUrl(url):
    ...  # expensive computation
and would like to do something like:
def getHtmlMemoized(url) = memoizeToFile(getHtmlOfUrl, "file.dat")
and then call getHtmlMemoized(url), so as to do the expensive computation only once for each url.

Python offers a very elegant way to do this - decorators. Basically, a decorator is a function that wraps another function to provide additional functionality without changing the function source code. Your decorator can be written like this:
import json

def persist_to_file(file_name):

    def decorator(original_func):

        try:
            cache = json.load(open(file_name, 'r'))
        except (IOError, ValueError):
            cache = {}

        def new_func(param):
            if param not in cache:
                cache[param] = original_func(param)
                json.dump(cache, open(file_name, 'w'))
            return cache[param]

        return new_func

    return decorator
Once you've got that, 'decorate' the function using the @-syntax and you're ready.
@persist_to_file('cache.dat')
def html_of_url(url):
    your function code...
Note that this decorator is intentionally simplified and may not work for every situation, for example, when the source function accepts or returns data that cannot be json-serialized.
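If you need to cache arguments or return values that JSON cannot handle, one option (a sketch of mine, not part of the original answer) is to swap json for pickle and binary file modes:
import os
import pickle

def persist_to_file_pickle(file_name):
    # Same idea as persist_to_file above, but pickle can store
    # non-JSON-serializable values and non-string keys.
    def decorator(original_func):
        if os.path.exists(file_name):
            with open(file_name, 'rb') as f:
                cache = pickle.load(f)
        else:
            cache = {}

        def new_func(param):
            if param not in cache:
                cache[param] = original_func(param)
                with open(file_name, 'wb') as f:
                    pickle.dump(cache, f)
            return cache[param]

        return new_func

    return decorator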
More on decorators: How to make a chain of function decorators?
And here's how to make the decorator save the cache just once, at exit time:
import json, atexit

def persist_to_file(file_name):

    try:
        cache = json.load(open(file_name, 'r'))
    except (IOError, ValueError):
        cache = {}

    atexit.register(lambda: json.dump(cache, open(file_name, 'w')))

    def decorator(func):
        def new_func(param):
            if param not in cache:
                cache[param] = func(param)
            return cache[param]
        return new_func

    return decorator
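Usage is the same as before; results are cached in memory during the run and written to disk only once, by the atexit handler:
@persist_to_file('cache.dat')
def html_of_url(url):
    ...  # your expensive function

html_of_url('http://example.com')  # computed and cached in memory
html_of_url('http://example.com')  # served from the in-memory cache
# cache.dat is written when the interpreter exits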

Check out joblib.Memory. It's a library for doing exactly that.
from joblib import Memory

memory = Memory("cachedir")

@memory.cache
def f(x):
    print('Running f(%s)' % x)
    return x
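For example, repeated calls with the same argument are served from the on-disk cache, so the print only fires on the first call:
f(1)  # prints "Running f(1)" and stores the result under cachedir/
f(1)  # returns 1 from the cache; nothing is printed
f(2)  # a new argument triggers a new computation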

A cleaner solution powered by Python's shelve module. The advantage is that the cache is updated in real time via the well-known dict syntax, and it is also exception-proof (no need to handle an annoying KeyError).
import shelve

def shelve_it(file_name):
    d = shelve.open(file_name)

    def decorator(func):
        def new_func(param):
            if param not in d:
                d[param] = func(param)
            return d[param]
        return new_func

    return decorator

@shelve_it('cache.shelve')
def expensive_function(param):
    pass
The function will be computed just once; subsequent calls will return the stored result.
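Two caveats worth noting (my additions, not part of the original answer): shelve keys must be strings, so this works as written only for string parameters such as URLs, and the shelf is never explicitly closed, so pending writes may not be flushed on every platform. A small sketch that reuses the atexit idea from above to close it cleanly:
import atexit
import shelve

def shelve_it(file_name):
    d = shelve.open(file_name)
    atexit.register(d.close)  # flush and close the shelf when the program exits

    def decorator(func):
        def new_func(param):
            if param not in d:
                d[param] = func(param)
            return d[param]
        return new_func

    return decorator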

There is also diskcache.
from diskcache import Cache

cache = Cache("cachedir")

@cache.memoize()
def f(x, y):
    print('Running f({}, {})'.format(x, y))
    return x, y
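Usage mirrors the joblib example; repeated calls with the same arguments are served from the cache directory instead of re-running the body:
f(1, 2)  # prints "Running f(1, 2)" and caches the result
f(1, 2)  # returns (1, 2) from the cache; nothing is printed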

The Artemis library has a module for this. (you'll need to pip install artemis-ml)
You decorate your function:
from artemis.fileman.disk_memoize import memoize_to_disk

@memoize_to_disk
def fcn(a, b, c=None):
    results = ...
    return results
Internally, it makes a hash out of input arguments and saves memo-files by this hash.

Check out Cachier. It supports additional cache configuration parameters like TTL etc.
Simple example:
from cachier import cachier
import datetime

@cachier(stale_after=datetime.timedelta(days=3))
def foo(arg1, arg2):
    """foo now has a persistent cache, triggering recalculation for values stored more than 3 days ago."""
    return {'arg1': arg1, 'arg2': arg2}
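Calling the decorated function works as usual; only the first call (or a call after the cached value has gone stale) actually executes the body:
foo('a', 'b')  # computes and persists the result
foo('a', 'b')  # within 3 days, returns the cached dict without recomputation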

Something like this should do:
import json

class Memoize(object):
    def __init__(self, func):
        self.func = func
        self.memo = {}

    def load_memo(self, filename):
        with open(filename) as f:
            self.memo.update(json.load(f))

    def save_memo(self, filename):
        with open(filename, 'w') as f:
            json.dump(self.memo, f)

    def __call__(self, *args):
        if args not in self.memo:
            self.memo[args] = self.func(*args)
        return self.memo[args]
Basic usage:
your_mem_func = Memoize(your_func)
your_mem_func.load_memo('yourdata.json')
# do your stuff with your_mem_func
If you want to write your "cache" to a file after using it -- to be loaded again in the future:
your_mem_func.save_memo('yournewdata.json')
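One caveat worth flagging (my note, not part of the original answer): self.memo is keyed by the args tuple, but JSON object keys must be strings, so save_memo/load_memo will only round-trip cleanly if you normalise the keys yourself. A minimal sketch of such a __call__:
    def __call__(self, *args):
        # JSON object keys must be strings, so serialise the args tuple first
        key = json.dumps(args)
        if key not in self.memo:
            self.memo[key] = self.func(*args)
        return self.memo[key]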

Assuming that your data is JSON-serializable, this code should work:
import os, json

def json_file(fname):
    def decorator(function):
        def wrapper(*args, **kwargs):
            if os.path.isfile(fname):
                with open(fname, 'r') as f:
                    ret = json.load(f)
            else:
                with open(fname, 'w') as f:
                    ret = function(*args, **kwargs)
                    json.dump(ret, f)
            return ret
        return wrapper
    return decorator
Decorate getHtmlOfUrl and then simply call it; if it has been run previously, you will get your cached data.
Checked with Python 2.x and Python 3.x.
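For example, applied to the question's function (a sketch; the cache file name is arbitrary):
@json_file('html_cache.json')
def getHtmlOfUrl(url):
    ...  # expensive computation

getHtmlOfUrl('http://example.com')  # first run computes and writes html_cache.json
getHtmlOfUrl('http://example.com')  # later runs load the cached result from disk
Note that the cache is per file rather than per argument, so each distinct URL needs its own cache file name.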

You can use the cache_to_disk package:
from cache_to_disk import cache_to_disk

@cache_to_disk(3)
def my_func(a, b, c, d=None):
    results = ...
    return results
This will cache the results for 3 days, specific to the arguments a, b, c and d. The results are stored in a pickle file on your machine, and unpickled and returned the next time the function is called. After 3 days, the pickle file is deleted until the function is re-run. The function will also be re-run whenever it is called with new arguments. More info here: https://github.com/sarenehan/cache_to_disk

Most answers here take the decorator approach. But maybe I don't want to cache the result every time I call the function.
I made one solution using a context manager, so the function can be called as
with DiskCacher('cache_id', myfunc) as myfunc2:
    res = myfunc2(...)
when you need the caching functionality.
The 'cache_id' string is used to distinguish data files, which are named [calling_script]_[cache_id].nc. So if you are doing this in a loop, you will need to incorporate the looping variable into this cache_id, otherwise data will be overwritten.
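For example, inside a loop (a sketch using the DiskCacher API defined further down):
for n in [10, 20, 30]:
    # include the loop variable in cache_id so each iteration gets its own file
    with DiskCacher('myfunc_n%d' % n, myfunc) as myfunc2:
        res = myfunc2(data, n)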
Alternatively:
myfunc2 = DiskCacher('cache_id')(myfunc)
res = myfunc2(...)
Alternatively (this is probably not that useful, as the same id is used all the time):
@DiskCacher('cache_id')
def myfunc(*args):
    ...
The complete code with examples (I'm using pickle to save/load, but this can be changed to whatever save/read methods you prefer. NOTE that this also assumes the function in question returns only one value):
from __future__ import print_function
import sys, os
import functools

def formFilename(folder, varid):
    '''Compose abspath for cache file

    Args:
        folder (str): cache folder path.
        varid (str): variable id to form file name and used as variable id.
    Returns:
        abpath (str): abspath for cache file, which is using the <folder>
            as folder. The file name is in the format:
                [script_file]_[varid].nc
    '''
    script_file = os.path.splitext(sys.argv[0])[0]
    name = '[%s]_[%s].nc' % (script_file, varid)
    abpath = os.path.join(folder, name)

    return abpath
def readCache(folder, varid, verbose=True):
    '''Read cached data

    Args:
        folder (str): cache folder path.
        varid (str): variable id.
    Keyword Args:
        verbose (bool): whether to print some text info.
    Returns:
        results (tuple): a tuple containing data read in from cached file(s).
    '''
    import pickle
    abpath_in = formFilename(folder, varid)
    if os.path.exists(abpath_in):
        if verbose:
            print('\n# <readCache>: Read in variable', varid,
                  'from disk cache:\n', abpath_in)
        with open(abpath_in, 'rb') as fin:
            results = pickle.load(fin)

    return results
def writeCache(results, folder, varid, verbose=True):
    '''Write data to disk cache

    Args:
        results (tuple): a tuple containing data to write to cache.
        folder (str): cache folder path.
        varid (str): variable id.
    Keyword Args:
        verbose (bool): whether to print some text info.
    '''
    import pickle
    abpath_out = formFilename(folder, varid)
    if verbose:
        print('\n# <writeCache>: Saving output to:\n', abpath_out)
    with open(abpath_out, 'wb') as fout:
        pickle.dump(results, fout)

    return
class DiskCacher(object):
    def __init__(self, varid, func=None, folder=None, overwrite=False,
                 verbose=True):
        '''Disk cache context manager

        Args:
            varid (str): string id used to save cache.
                Function <func> is assumed to return only 1 return value.
        Keyword Args:
            func (callable): function object whose return values are to be
                cached.
            folder (str or None): cache folder path. If None, use a default.
            overwrite (bool): whether to force a new computation or not.
            verbose (bool): whether to print some text info.
        '''
        if folder is None:
            self.folder = '/tmp/cache/'
        else:
            self.folder = folder

        self.func = func
        self.varid = varid
        self.overwrite = overwrite
        self.verbose = verbose

    def __enter__(self):
        if self.func is None:
            raise Exception("Need to provide a callable function to __init__() when used as context manager.")

        return _Cache2Disk(self.func, self.varid, self.folder,
                           self.overwrite, self.verbose)

    def __exit__(self, type, value, traceback):
        return

    def __call__(self, func=None):
        _func = func or self.func
        return _Cache2Disk(_func, self.varid, self.folder, self.overwrite,
                           self.verbose)
def _Cache2Disk(func, varid, folder, overwrite, verbose):
    '''Inner decorator function

    Args:
        func (callable): function object whose return values are to be
            cached.
        varid (str): variable id.
        folder (str): cache folder path.
        overwrite (bool): whether to force a new computation or not.
        verbose (bool): whether to print some text info.
    Returns:
        decorated function: if cache exists, the function is <readCache>
            which will read cached data from disk. If it needs to recompute,
            the function is wrapped so that the return values are saved to disk
            before returning.
    '''

    def decorator_func(func):
        abpath_in = formFilename(folder, varid)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if os.path.exists(abpath_in) and not overwrite:
                results = readCache(folder, varid, verbose)
            else:
                results = func(*args, **kwargs)
                if not os.path.exists(folder):
                    os.makedirs(folder)
                writeCache(results, folder, varid, verbose)
            return results
        return wrapper

    return decorator_func(func)
if __name__ == '__main__':

    data = range(10)  # dummy data

    #--------------Use as context manager--------------
    def func1(data, n):
        '''dummy function'''
        results = [i*n for i in data]
        return results

    print('\n### Context manager, 1st time call')
    with DiskCacher('context_mananger', func1) as func1b:
        res = func1b(data, 10)
        print('res =', res)

    print('\n### Context manager, 2nd time call')
    with DiskCacher('context_mananger', func1) as func1b:
        res = func1b(data, 10)
        print('res =', res)

    print('\n### Context manager, 3rd time call with overwrite=True')
    with DiskCacher('context_mananger', func1, overwrite=True) as func1b:
        res = func1b(data, 10)
        print('res =', res)

    #--------------Return a new function--------------
    def func2(data, n):
        results = [i*n for i in data]
        return results

    print('\n### Wrap a new function, 1st time call')
    func2b = DiskCacher('new_func')(func2)
    res = func2b(data, 10)
    print('res =', res)

    print('\n### Wrap a new function, 2nd time call')
    res = func2b(data, 10)
    print('res =', res)

    #----Decorate a function using the syntax sugar----
    @DiskCacher('pie_dec')
    def func3(data, n):
        results = [i*n for i in data]
        return results

    print('\n### pie decorator, 1st time call')
    res = func3(data, 10)
    print('res =', res)

    print('\n### pie decorator, 2nd time call.')
    res = func3(data, 10)
    print('res =', res)
The outputs:
### Context manager, 1st time call
# <writeCache>: Saving output to:
/tmp/cache/[diskcache]_[context_mananger].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
### Context manager, 2nd time call
# <readCache>: Read in variable context_mananger from disk cache:
/tmp/cache/[diskcache]_[context_mananger].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
### Context manager, 3rd time call with overwrite=True
# <writeCache>: Saving output to:
/tmp/cache/[diskcache]_[context_mananger].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
### Wrap a new function, 1st time call
# <writeCache>: Saving output to:
/tmp/cache/[diskcache]_[new_func].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
### Wrap a new function, 2nd time call
# <readCache>: Read in variable new_func from disk cache:
/tmp/cache/[diskcache]_[new_func].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
### pie decorator, 1st time call
# <writeCache>: Saving output to:
/tmp/cache/[diskcache]_[pie_dec].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
### pie decorator, 2nd time call.
# <readCache>: Read in variable pie_dec from disk cache:
/tmp/cache/[diskcache]_[pie_dec].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]

Here's a solution I came up with which can:
memoize mutable objects (memoized functions should have no side effects that change mutable parameters or it won't work as expected)
writes to a separate cache file for each wrapped function (easy to delete the file to purge that particular cache)
compresses the data to make it much smaller on disk (a LOT smaller)
It will create cache files like:
cache.__main__.function.getApiCall.db
cache.myModule.function.fixDateFormat.db
cache.myOtherModule.function.getOtherApiCall.db
Here's the code. You can use a compression library of your choice, but I've found LZMA works best for the pickle storage we are using.
import dbm
import hashlib
import json
import pickle
# import bz2
import lzma

# COMPRESSION = bz2
COMPRESSION = lzma  # better with pickle compression

# Create a @memoize_to_disk decorator to cache results in an on-disk cache
def memoize_to_disk(function, cache_filename=None):
    uniqueFunctionSignature = f'cache.{function.__module__}.{function.__class__.__name__}.{function.__name__}'
    if cache_filename is None:
        cache_filename = uniqueFunctionSignature
    # print(f'Caching to {cache_filename}')

    def wrapper(*args, **kwargs):
        # Convert the arguments into a JSON object (can't memoize mutable fields;
        # this gives us an immutable, hashable function signature)
        if cache_filename == uniqueFunctionSignature:
            # Cache file is function-specific, so don't include the function name in params
            params = {'args': args, 'kwargs': kwargs}
        else:
            # Add module.class.function name to params so no collisions occur if the user
            # overrides cache_filename with the same cache for multiple functions
            params = {'function': uniqueFunctionSignature, 'args': args, 'kwargs': kwargs}

        # Hash the JSON representation of the function signature (to avoid unhashable-type errors)
        params_json = json.dumps(params)
        key = hashlib.sha256(params_json.encode("utf-8")).hexdigest()  # store hash of key

        # Get the cache entry or create it if not found
        with dbm.open(cache_filename, 'c') as db:
            # Try to retrieve the result from the cache
            try:
                result = pickle.loads(COMPRESSION.decompress(db[key]))
                # print(f'CACHE HIT: Found {key[1:100]=} in {cache_filename=} with value {str(result)[0:100]=}')
                return result
            except KeyError:
                # If the result is not in the cache, call the function and store the result
                result = function(*args, **kwargs)
                db[key] = COMPRESSION.compress(pickle.dumps(result))
                # print(f'CACHE MISS: Stored {key[1:100]=} in {cache_filename=} with value {str(result)[0:100]=}')
                return result

    return wrapper
To use the code, use the @memoize_to_disk decorator (with an optional filename parameter if you don't like "cache." as a prefix):
@memoize_to_disk
def expensive_example(n):
    # expensive operation goes here
    return value
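A quick usage sketch (my addition): the first call with a given argument runs the function and stores the compressed pickle; repeated calls read it back from the dbm file instead.
@memoize_to_disk
def expensive_example(n):
    return sum(i * i for i in range(n))  # stand-in for an expensive operation

print(expensive_example(10000))  # computed and written to the cache.__main__.function.expensive_example database file
print(expensive_example(10000))  # read back from the dbm cache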

Related

Python unittest to create a mock .json file

I have a function that looks like this:
def file1_exists(directory):
    file1_path = os.path.join(directory, 'file1.json')
    return os.path.exists(file1_path)

def file2_exists(directory):
    file2_path = os.path.join(directory, 'file2.log')
    return os.path.exists(file2_path)

def create_file1(directory):
    if file1_exists(directory):
        return

    if not file2_exists(directory):
        return

    mod_time = os.stat(os.path.join(directory, 'file2.log')).st_mtime
    timestamp = {
        "creation_timestamp": datetime.datetime.fromtimestamp(mod_time).isoformat()
    }
    with open(os.path.join(directory, "file1.json"), "w") as f:
        json.dump(timestamp, f)
And I need to create a unittest that uses mock files.
The 3 Unittests that I need are:
A mock myfile.json file where I will assert that the function will return None (based on the 1st if statement, since the file exists)
A way to mock-hide the data.txt item in order to assert that the function will return None (based on the second if statement)
A mock myfile.json file where I write the required data and then assert that the return matches the expected outcome.
So far I've tried tests 1. and 2. with variations of this but I've been unsuccessful:
class TestAdminJsonCreation(unittest.TestCase):

    @patch('os.path.exists', return_value=True)
    def test_existing_admin_json(self):
        self.assertIsNone(postprocess_results.create_json_file())
I've also read about other solutions such as:
Python testing: using a fake file with mock & io.StringIO
But I haven't found a way to successfully do what I need...
You want to be able to provide different return values for each call to os.path.exists. Since you know the order of the calls, you can use side_effect to supply a list of values to be used in order.
class TestAdminJsonCreation(unittest.TestCase):

    # No JSON file
    @patch('os.path.exists', return_value=True)
    def test_existing_admin_json(self):
        self.assertIsNone(postprocess_results.create_json_file())

    # JSON file, log file
    @patch('os.path.exists', side_effect=[True, False])
    def test_existing_admin_json(self):
        self.assertIsNone(postprocess_results.create_json_file())

    # JSON file, no log file
    @patch('os.path.exists', side_effect=[True, True])
    def test_existing_admin_json(self):
        ...
The third test requires an actual file system, or for you to mock open.
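For that third route, one option (my sketch, not part of the original answers; it assumes create_file1 is importable) is to run the function against a real temporary directory instead of mocking:
import json
import os
import tempfile
import unittest

class TestCreateFile1WithRealFiles(unittest.TestCase):
    def test_write_data(self):
        with tempfile.TemporaryDirectory() as tmpdir:
            # create the log file the function expects to find
            with open(os.path.join(tmpdir, 'file2.log'), 'w') as f:
                f.write('log contents')

            create_file1(tmpdir)

            with open(os.path.join(tmpdir, 'file1.json')) as f:
                written = json.load(f)
            self.assertIn('creation_timestamp', written)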
So, I ended up breaking my original function into 3 different functions for easier testing.
The tests work by checking what create_file1 does when we feed it different return values from the other two functions, and when we add valid data.
class TestFile1JsonCreation(unittest.TestCase):
    @patch('builtins.open', new_callable=mock_open())
    @patch('os.stat')
    @patch('file1_exists', return_value=True)
    @patch('file2_exists', return_value=False)
    def test_existing_file1_json(self, file2_exists, file1_existsmock, stat, mopen):
        create_file1('.')
        # file1.json should not have been written
        mopen.assert_not_called()

    @patch('builtins.open', new_callable=mock_open())
    @patch('os.stat')
    @patch('file1_exists', return_value=False)
    @patch('file2_exists', return_value=False)
    def test_missing_file2(self, file2_exists, file1_existsmock, stat, mopen):
        create_file1('.')
        # file1.json should not have been written
        mopen.assert_not_called()

    @patch('builtins.open', new_callable=mock_open())
    @patch('os.stat')
    @patch('file1_exists', return_value=False)
    @patch('file2_exists', return_value=True)
    def test_write_data(self, file2_exists, file1_existsmock, stat, mopen):
        class FakeStat:
            st_mtime = 1641992788

        stat.return_value = FakeStat()

        create_file1('.')

        # file1.json should have been written
        mopen.assert_called_once_with('./file1.json', 'w')
        written_data = ''.join(
            c[1][0]
            for c in mopen().__enter__().write.mock_calls
        )
        expected_data = {"creation_timestamp": "2022-01-12T13:06:28"}
        written_dict_data = json.loads(written_data)
        self.assertEqual(written_dict_data, expected_data)

File is sometimes open, sometimes closed when I try to write data in it in Python

In my main file, I call the following function to write some data to a binary file:
main.py
writeOutputFile(param1, param2, param3)
In file_a.writeOutputFile I open my output file in a with statement
and call the function file_b.writeReference:
file_a.py
@singleton
class BitstreamToFile:
    def __init__(self, outfile):
        self.outfile = outfile
        self.cache = ''

    def add(self, data, length):
        s = ''
        if (type(data) == str):
            log.info("string %s \n " % data)
            for char in data:
                b = bin(ord(char))[2:]
                s = s + "{:0>8}".format(b)
        else:
            s = bin(data)[2:]
        if (len(s) < length):
            resto = length - len(s)
            for _ in range(0, resto):
                s = '0' + s
        s = s[0:length]
        self.cache = self.cache + s
        self.flush()

    def writeByteToFile(self):
        if (len(self.cache) < 8):
            raise ValueError("Not enough bits to make a byte")
        data = int(self.cache[:8], 2)
        log.info("writeByteToFile %s " % data)
        self.outfile.write(struct.pack('>B', data))
        self.cache = self.cache[8:]

    def flush(self, padding=False):
        while (len(self.cache) >= 8):
            log.info("BEF flush len(self.cache) %s" % len(self.cache))
            self.writeByteToFile()
            log.info("AFT flush len(self.cache) %s" % len(self.cache))
        if (padding):
            self.cache = "{:0<8}".format(self.cache)
            self.writeByteToFile()


def writeOutputFile(param1, param2, param3):
    [..]
    with open(OUTPUT_FILE, 'wb') as out_file:
        writeReference(out_file, param2, param1)
In file_B.writeReference I instantiate my BitstreamToFile object
file_b.py
def writeReference(out_file, param2, param1):
    bitstream = file_a.BitstreamToFile(out_file)
    log.debug("write key && length")
    bitstream.add("akey", 32)
    bitstream.add(0, 64)
    [..]
The first time I run it, I get no error. The second time, however, I get:
# log from `file_B.writeReference`
write key && length
# log from file_a.bitstream.flush
BEF flush len(self.cache) 32
#log from file_a.bitstream.writeByteToFile
writeByteToFile 114
then the code crashes:
Exception on /encode [POST]
[..]
File "/src/file_a.py", line 83, in flush
self.writeByteToFile()
File "/src/file_a.py", line 73, in writeByteToFile
self.outfile.write(struct.pack('>B', data))
ValueError: write to closed file
"POST /encode HTTP/1.1" 500 -
Any hints on where the error might be? I do not really understand why sometimes it works, sometimes it does not.
Thank you in advance
Not an answer, but a diagnostic tool:
Subclass io.FileIO and override the __enter__ and __exit__ methods, adding logging so you can see when the context manager enters and exits (is the file closed?). Maybe add more logging to other parts of the program for a finer-grained time history. Do some test runs with a fake file, or with something more isolated from your real code (I say this mainly because I don't know the consequences of using the subclass, so you should be careful). Here is an example:
import io

class B(io.FileIO):
    def __enter__(self):
        print(f'\tcontext manager entry - file:{self.name}')
        return super().__enter__()

    def __exit__(self, *args, **kwargs):
        print(f'\tcontext manager exiting - file:{self.name}')
        return super().__exit__(*args, **kwargs)
In [32]: with B('1.txt','wb') as f:
...: f.write(b'222')
...:
context manager entry - file:1.txt
context manager exiting - file:1.txt
In [33]:
The issue is related to the Docker container that runs the code I shared above.
I'm a newbie with Docker, so I was using the following command to bring up my containers (I have 3 micro-services):
$ docker-compose up -d --build
without realizing that, if my container is not re-created (no changes in the source code), the second time around I simply re-run the previously stopped container, in which my file had already been closed at the end.
If I force the container to be recreated (even when I do not need to change the source code):
$ docker-compose up -d --build --force-recreate
I have no more errors.

Pass kwargs to starmap while using Pool in Python

I'm using Pool to multithread my programme using starmap to pass arguments.
I'm stuck because I cannot seem to find a way to pass kwargs along with the zip arrays that I'm passing in the starmap function.
pool = Pool(NO_OF_PROCESSES)
branches = pool.starmap(fetch_api, zip(repeat(project_name), api_extensions))
The branches request is incomplete as I'm still not able to figure out how to pass keywords arguments.
def fetch_api(project_name, api_extension, payload={}, headers={}, API_LINK=API_LINK, key=False):
    headers[AUTH_STRING] = 'Gogo'
    call_api = API_LINK + project_name + api_extension
    response_api = requests.get(call_api, headers=headers, params=payload)
    if key:
        return project_name + ':' + response_api
    else:
        return response_api
When calling fetch_api() from the branches line, I want to pass payload as {'a': 1} and key=True.
Please point me in the right direction. Thanks. Using Python 3.3+.
You can create a wrapper around pool.starmap that also accepts an iterator over kwargs dictionaries.
from itertools import repeat

def starmap_with_kwargs(pool, fn, args_iter, kwargs_iter):
    args_for_starmap = zip(repeat(fn), args_iter, kwargs_iter)
    return pool.starmap(apply_args_and_kwargs, args_for_starmap)

def apply_args_and_kwargs(fn, args, kwargs):
    return fn(*args, **kwargs)
Then you can call it in your case as:
args_iter = zip(repeat(project_name), api_extensions)
kwargs_iter = repeat(dict(payload={'a': 1}, key=True))
branches = starmap_with_kwargs(pool, fetch_api, args_iter, kwargs_iter)

Input/output decorator to pickle function result

Given a function with a parameter a and two other parameters (pickle_from, pickle_to), I'd like to:
Load and return the pickled object located at pickle_from, if pickle_from is not None. If it is None, compute some function of a and return it.
Dump the result of the above to pickle_to if pickle_to is not None.
With a single function this is straightforward. If pickle_from isn't null, the function just loads the pickled result and returns it. Otherwise, it performs some time-intensive calculation with a, dumps that to pickle_to, and returns the calculation result.
try:
    import cPickle as pickle
except:
    import pickle

def somefunc(a, pickle_from=None, pickle_to=None):
    if pickle_from:
        with open(pickle_from + '.pickle', 'rb') as f:
            res = pickle.load(f)
    else:
        # Re-calculate some time-intensive func call
        res = a ** 2

    if pickle_to:
        # Update pickled data with newly calculated `res`
        with open(pickle_to + '.pickle', 'wb') as f:
            pickle.dump(res, f)

    return res
My question is regarding how to build a decorator so that this process can form a shell around multiple functions similar to somefunc, cutting down on source code in the process.
I'd like to be able to write something like:
@pickle_option
def somefunc(a, pickle_from=None, pickle_to=None):
    # or do params need to be in the decorator call?
    # remember, "the files are in the computer"
    res = a ** 2
    return res
Is this possible? Something about decorators makes my head explode, so I will politely decline to post here "what I have tried."
This decorator requires a little bit of introspection. Specifically, I've made use of inspect.Signature to extract the pickle_from and pickle_to parameters.
Other than that, it's a very straightforward decorator: It keeps a reference to the decorated function, and calls it if necessary.
import inspect
import pickle
from functools import wraps

def pickle_option(func):
    sig = inspect.signature(func)

    @wraps(func)
    def wrapper(*args, **kwargs):
        # get the value of the pickle_from and pickle_to parameters
        # introspection magic, don't worry about it or read the docs
        bound_args = sig.bind(*args, **kwargs)
        pickle_from = bound_args.arguments.get('pickle_from',
                                               sig.parameters['pickle_from'].default)
        pickle_to = bound_args.arguments.get('pickle_to',
                                             sig.parameters['pickle_to'].default)

        if pickle_from:
            with open(pickle_from + '.pickle', 'rb') as f:
                result = pickle.load(f)
        else:
            result = func(*args, **kwargs)

        if pickle_to:
            with open(pickle_to + '.pickle', 'wb') as f:
                pickle.dump(result, f)

        return result

    return wrapper
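Used on the function from the question (a sketch), the first call computes and pickles the result, and the second call skips the computation entirely and loads it back:
@pickle_option
def somefunc(a, pickle_from=None, pickle_to=None):
    return a ** 2  # stand-in for the time-intensive calculation

somefunc(3, pickle_to='squared')    # computes 9 and writes squared.pickle
somefunc(3, pickle_from='squared')  # loads 9 from squared.pickle; the body never runs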
Given your use case, I think it would be clearer to use just a generic wrapper:
def pickle_call(fun, *args, pickle_from=None, pickle_to=None, **kwargs):
    if pickle_from:
        with open(pickle_from + '.pickle', 'rb') as f:
            res = pickle.load(f)
    else:
        res = fun(*args, **kwargs)

    if pickle_to:
        # Update pickled data with newly calculated `res`
        with open(pickle_to + '.pickle', 'wb') as f:
            pickle.dump(res, f)

    return res
Then you'd use it like:
res = pickle_call(somefunc, a, pickle_from="from", pickle_to="to")
This avoids having to add a decorator everywhere you want to use this feature, and in fact works with any callable (not just functions), from your code or else.

Understanding Mocking and SideEffects

I am a newbie to Python. I understand testing, but I cannot wrap my head around working with mocked objects and side_effect.
Here is my method:
@retry(every=RETRY_EVERY, until=RETRY_UNTIL)
@unique()
@sessionized(0)
def record_click(session, queue, mailing_id, member_id, link_id, timestamp, user_agent):
    message = session.query(Message).get((mailing_id, member_id))
    mailing = session.query(Mailing).get(mailing_id)
    # More code here
Here is my test:
#mock.patch("audience.jobs.EventProvider")
#mock.patch("audience.jobs.enqueue_webhook")
#mock.patch("logging.exception")
#mock.patch("audience.jobs.audience_queues")
#mock.patch("audience.jobs.Session")
#mock.patch("audience.jobs.DatabaseConnector")
def test_track_click_publishes_event_to_sns(self, DatabaseConnector, Session, audience_queues, logger, enqueue_webhook, EventProvider):
message_mock = mock.Mock(account_id=77)
message_mock.record_open.return_value = True
mailing_mock = mock.Mock(mailing_id=123)
mailing_mock.recipient_groups.return_value = [111]
session_query = Session.return_value.query.return_value
session_query.side_effect = lambda arg: message_mock if isinstance(arg, tuple) else mailing_mock
result = jobs.record_click(
888,
9999,
2048,
datetime.datetime(1999, 12, 31, 23, 59, 59, 999999).isoformat(),
"Mozilla/5.0")
self.assertIsNone(result)
self.assertListEqual(EventProvider.mock_calls, [
mock.call(),
mock.call().publish_link_clicked(
headers={'User-Agent': 'Mozilla/5.0'},
mailing_id=888,
account_id=77,
contact_id=9999,
link_id=2048,
group_ids=[111]
)
])
self.assertListEqual(logger.mock_calls, [])
The error I keep receiving is:
Instead of
call().publish_link_clicked(group_ids=[111], account_id=77, **etc)
This is what is called in the UnitTest
call().publish_link_clicked(group_ids=<MagicMock name='Session().query().get().recipient_groups' id='4557662736'>, account_id=<MagicMock name='Session().query().get().account_id' id='4557652048'>, **etc)
What am I doing wrong?
Don't call Session() or query(); use the Mock.return_value attribute instead to traverse the call graph:
Session.return_value.query.return_value.side_effect = lambda arg: message_mock if isinstance(arg, tuple) else mailing_mock
I usually use intermediary names to hold a return value:
session_query = Session.return_value.query.return_value
session_query.side_effect = lambda arg: message_mock if isinstance(arg, tuple) else mailing_mock
You also need to patch the right Session class; this depends entirely on how your code produces the session argument to record_click. See Where to Patch for more details. If the @sessionized decorator produces this argument, and it doesn't live in the audience.jobs module, you are not patching the right location.
