Overwriting methods via mixin pattern does not work as intended - python

I am trying to introduce a mod/mixin for a problem. In particular I am focusing here on a SpeechRecognitionProblem. I intend to modify this problem and therefore I seek to do the following:
class SpeechRecognitionProblemMod(speech_recognition.SpeechRecognitionProblem):
def hparams(self, defaults, model_hparams):
SpeechRecognitionProblem.hparams(self, defaults, model_hparams)
vocab_size = self.feature_encoders(model_hparams.data_dir)['targets'].vocab_size
p = defaults
p.vocab_size['targets'] = vocab_size
def feature_encoders(self, data_dir):
# ...
So this one does not do much. It calls the hparams() function from the base class and then changes some values.
Now, there are already some ready-to-go problems e.g. Libri Speech:
#registry.register_problem()
class Librispeech(speech_recognition.SpeechRecognitionProblem):
# ..
However, in order to apply my modifications I am doing this:
#registry.register_problem()
class LibrispeechMod(SpeechRecognitionProblemMod, Librispeech):
# ..
This should, if I am not mistaken, overwrite everything (with identical signatures) in Librispeech and instead call functions of SpeechRecognitionProblemMod.
Since I was able to train a model with this code I am assuming that it's working as intended so far.
Now here comes the my problem:
After training I want to serialize the model. This usually works. However, it does not with my mod and I actually know why:
At a certain point hparams() gets called. Debugging to that point will show me the following:
self # {LibrispeechMod}
self.hparams # <bound method SpeechRecognitionProblem.hparams of ..>
self.feature_encoders # <bound method SpeechRecognitionProblemMod.feature_encoders of ..>
self.hparams should be <bound method SpeechRecognitionProblemMod.hparams of ..>! It would seem that for some reason hparams() of SpeechRecognitionProblem gets called directly instead of SpeechRecognitionProblemMod. But please note that it's the correct type for feature_encoders()!
The thing is that I know this is working during training. I can see that the hyper-paramaters (hparams) are applied accordingly simply because the model's graph node names change through my modifications.
There is one specialty I need to point out. tensor2tensor allows to dynamically load a t2t_usr_dir, which are additional python modules which get loaded by import_usr_dir. I make use of that function in my serialization script as well:
if usr_dir:
logging.info('Loading user dir %s' % usr_dir)
import_usr_dir(usr_dir)
This could be the only culprit I can see at the moment although I would not be able to tell why this may cause the problem.
If anybody sees something I do not I'd be glad to get a hint what I'm doing wrong here.
So what is the error you're getting?
For the sake of completeness, this is the result of the wrong hparams() method being called:
NotFoundError (see above for traceback): Restoring from checkpoint failed.
Key transformer/symbol_modality_256_256/softmax/weights_0 not found in checkpoint
symbol_modality_256_256 is wrong. It should be symbol_modality_<vocab-size>_256 where <vocab-size> is a vocabulary size which gets set in SpeechRecognitionProblemMod.hparams.

So, this weird behavior came from the fact that I was remote debugging and that the source files of the usr_dir were not correctly synchronized. Everything works as intended but the source files where not matching.
Case closed.

Related

objc.error: NSInternalInconsistencyException - readFromData:ofType:error: is a subclass responsibility but has not been overridden

I am using PyObj-C and am making some methods in a python file to read and write files using NSDocument, which uses the abstract NSFileCoordinater class. Accessing files this way instead of just using python's open let's these classes handle things for me such as preventing files from being edited from more than one program at a time or giving enough time for read/write operations to finish before it could get deadlocked.
These features are very important, and the app I ma building I want to be up to standard as much as I can here.
I have this code that instantiates a NSDocument object that contains the content of whatever file path you put into it, as a function:
#classmethod
def write(cls, file: str):
path = NSURL.fileURLWithPath_(file)
ext = file.split('.')[-1]
doc = NSDocument.alloc().initWithContentsOfURL_ofType_error_(path, ext, None)
When I call this function with a valid file path I get this error:
File "/Users/user123/PycharmProjects/shoutout/src/sutils/cfiles.py", line 27, in write
doc = NSDocument.alloc().initWithContentsOfURL_ofType_error_(path, ext, None)
objc.error: NSInternalInconsistencyException - readFromData:ofType:error: is a subclass responsibility but has not been overridden.
I have tried to find forums both objective-c, swift, or pyobj-c based as it were asking any keywords such as objective-c is a subclass responsibility but has not been overridden on google, and checked stackoverflow, and github for existing posts on this error but I could find none.
As I understand it Objective-C being polymorphic, has my method initWithContentsOfURL:ofType:error: call readFromData:ofType:error, among other ones at the same time. I don't understand exactly however what it means when it's saying that "is a subclass responsibility but has not been overridden." I am not sure also about what it means to override a class or a one being a responsibility so that doesn't help on my part.
A NSInternalInconsistencyException means a "when an internal assertion fails and implies an unexpected condition within the called code." Not sure what a internal "assertion" is either or what this could mean.
Any idea of what I could do to fix this?
NSDocument is an abstract class that requires you to subclass and implement a number of methods to make it usable. This is document in Apple's documentation for the class.
the older Document-Baed App Programming Guide for Mac gives more information on this.

What does __bases__ in __init__(self) mean?

In a certain library I found the following construction:
class PropertyHolder:
def __init__(self, raw):
__bases__ = raw # noqa
What can this be used for?
It does absolutely nothing, probably a bug in the code. Just sets a random local variable which immediately gets deleted.
Looks like it was added in issue #306 as an attempt to fix pickling a dynamic class. Needless to say, the pickling works because this code does nothing. They even ignored the linter (# noqa) when it told them it does nothing.
I've opened an appropriate issue.
Update:
Issue was fixed and code was removed 🙌

How to test complicated functions which use requests?

I want to test my code that is based on the API created by someone else, but im not sure how should I do this.
I have created some function to save the json into file so I don't need to send requests each time I run test, but I don't know how to make it work in situation when the original (check) function takes an input arg (problem_report) which is an instance of some class provided by API and it has this
problem_report.get_correction(corr_link) method. I just wonder if this is a sign of bad written code by me, beacuse I can't write a test to this, or maybe I should rewrite this function in my tests file like I showed at the end of provided below code.
# I to want test this function
def check(problem_report):
corrections = {}
for corr_link, corr_id in problem_report.links.items():
if re.findall(pattern='detailCorrection', string=corr_link):
correction = problem_report.get_correction(corr_link)
corrections.update({corr_id: correction})
return corrections
# function serves to load json from file, normally it is downloaded by API from some page.
def load_pr(pr_id):
print('loading')
with open('{}{}_view_pr.json'.format(saved_prs_path, pr_id)) as view_pr:
view_pr = json.load(view_pr)
...
pr_info = {'view_pr': view_pr, ...}
return pr_info
# create an instance of class MyPR which takes json to __init__
#pytest.fixture
def setup_pr():
print('setup')
pr = load_pr('123')
my_pr = MyPR(pr['view_pr'])
return my_pr
# test function
def test_check(setup_pr):
pr = setup_pr
checked_pr = pr.check(setup_rft[1]['problem_report_pr'])
assert checker_pr
# rewritten check function in test file
#mock.patch('problem_report.get_correction', side_effect=get_corr)
def test_check(problem_report):
corrections = {}
for corr_link, corr_id in problem_report.links.items():
if re.findall(pattern='detailCorrection', string=corr_link):
correction = problem_report.get_correction(corr_link)
corrections.update({corr_id: correction})
return corrections
Im' not sure if I provided enough code and explanation to underastand the problem, but I hope so. I wish you could tell me if this is normal that some function are just hard to test, and if this is good practice to rewritte them separately so I can mock functions inside the tested function. I also was thinking that I could write new class with similar functionality but API is very large and it would be very long process.
I understand your question as follows: You have a function check that you consider hard to test because of its dependency on the problem_report. To make it better testable you have copied the code into the test file. You will test the copied code because you can modify this to be easier testable. And, you want to know if this approach makes sense.
The answer is no, this does not make sense. You are not testing the real function, but completely different code. Well, the code may not start being completely different, but in short time the copy and the original will deviate, and it will be a maintenance nightmare to ensure that the copy always resembles the original. Improving code for testability is a different story: You can make changes to the check function to improve its testability. But then, exactly the same resulting function should be used both in the test and the production code.
How to better test the function check then? First, are you sure that using the original problem_report objects really can not be sensibly used in your tests? (Here are some criteria that help you decide: What to mock for python test cases?). Now, lets assume that you come to the conclusion you can not sensibly use the original problem_report.
In that case, here the interface is simple enough to define a mocked problem_report. Keep in mind that Python uses duck typing, so you only have to create a class that has a links member which has an items() method. Plus, your mocked problem_report class needs a method get_correction(). Beyond that, your mock does not have to produce types that are similar to the types used by problem_report. The items() method can return simply a list of lists, like [["a",2],["xxxxdetailCorrectionxxxx",4]]. The same argument holds for get_correction, which could for example simply return its argument or a derived value, like, its negative.
For the above example (items() returning [["a",2],["xxxxdetailCorrectionxxxx",4]] and get_correction returning the negative of its argument) the expected result would be {4: -4}. No need to simulate real correction objects. And, you can create your mocked versions of problem_report without need to read data from files - the mocks can be setup completely from within the unit-testing code.
Try patching the problem_report symbol in the module. You should put your tests in a separate class.
#mock.patch('some.module.path.problem_report')
def test_check(problem_report):
problem_report.side_effect = get_corr
corrections = {}
for corr_link, corr_id in problem_report.links.items():
if re.findall(pattern='detailCorrection', string=corr_link):
correction = problem_report.get_correction(corr_link)
corrections.update({corr_id: correction})
return corrections

How to construct an instance of _pytest.pytester.Testdir

I'm attempting to do some debugging (specifically on pytest/testing/test_doctest.py) and I want to step through some code in IPython. I have experience with pytest, but I never do anything too fancy with it, so I've never delved to deep into the more "magic" things it does.
In the test that I want to step through (potentially introspecting some of the objects), there is an argument called testdir, but nowhere in this file does it reference what testdir is or how I could possibly construct one.
After doing some digging it seems this is some magic fixture that automatically gets constructed and send to your function as a parameter, when you execute pytest with the pytester plugin. When I tracked down that class, it is constructed again via some magic request param, where the code is massively unhelpful in telling you what that magic request is or how to make one.
To make this concrete I simply want to take a test like this one:
def test_reportinfo(self, testdir):
'''
Test case to make sure that DoctestItem.reportinfo() returns lineno.
'''
p = testdir.makepyfile(test_reportinfo="""
def foo(x):
'''
>>> foo('a')
'b'
'''
return 'c'
""")
items, reprec = testdir.inline_genitems(p, '--doctest-modules')
reportinfo = items[0].reportinfo()
assert reportinfo[1] == 1
and run its logic in IPython. Looking at what the testdir object does, it seems pretty cool. It automatically makes a file for you and runs pytest problematically instead of via the command line. How can I make one of these? Is there some documentation I missed that makes how to do this clear and seem less obfuscated?
If I wanted to use something like this is my tests is there a way I could make what the magic testdir parameter is slightly more explicit so the next coder that looks at it isn't pulling his/her hair out like I am?
After much agonizing, I've figured out how to instantiate a fixture value.
import _pytest
config = _pytest.config._prepareconfig(['-s'], plugins=['pytester'])
session = _pytest.main.Session(config)
_pytest.tmpdir.pytest_configure(config)
_pytest.fixtures.pytest_sessionstart(session)
_pytest.runner.pytest_sessionstart(session)
def func(testdir):
return testdir
parent = _pytest.python.Module('parent', config=config, session=session)
function = _pytest.python.Function(
'func', parent, callobj=func, config=config, session=session)
_pytest.fixtures.fillfixtures(function)
testdir = function.funcargs['testdir']
The main idea is to create a dummy pytest session. This is a bit tricky. Its critical that the ['-s'] is passed into _prepareconfig otherwise this will not print stdout, or crash when run in IPython.
Given a barebones config and session, the next step is to manually load in whatever fixture functionality you are going to use. This amounts to manually calling the hooks that pluggy usually takes care of for you. I found these by looking at the attribute error I got when trying to run code without them. Usually its just due to session or config lacking a required attribute. There may be a better way to go about doing this (aka automatically via pluggy).
Next, we create a function that requests the specific fixture we are interested in. Its up to you to know what these names are. Finally we setup a dummy module / function tree structure and call fillfixtures, which does the magic. The funcargs then contains a dictionary of these objects ready for use. Be careful if you expect some teardown functionality. I'm not sure if this covers that, but I don't really need it for what I'm doing.
Hope this helps someone else. Note: this talk helped me understand what was happening in pytest under the hood a bit better: https://www.youtube.com/watch?v=zZsNPDfOoHU

dreaded "not the same object error" pickling a queryset.query object

I have a queryset that I need to pickle lazily and I am having some serious troubles. cPickle.dumps(queryset.query) throws the following error:
Can't pickle <class 'myproject.myapp.models.myfile.QuerySet'>: it's not the same object as myproject.myapp.models.myfile.QuerySet
Strangely (or perhaps not so strangely), I only get that error when I call cPcikle from another method or a view, but not when I call it from the command line.
I made the method below after reading PicklingError: Can't pickle <class 'decimal.Decimal'>: it's not the same object as decimal.Decimal and Django mod_wsgi PicklingError while saving object:
def dump_queryset(queryset, model):
from segment.segmentengine.models.segment import QuerySet
memo = {}
new_queryset = deepcopy(queryset, memo)
memo = {}
new_query = deepcopy(new_queryset.query, memo)
queryset = QuerySet(model=model, query=new_query)
return cPickle.dumps(queryset.query)
As you can see, I am getting extremely desperate -- that method still yields the same error. Is there a known, non-hacky solution to this problem?
EDIT: Tried using --noreload running on the django development server, but to no avail.
EDIT2: I had a typo in the error I displayed above -- it was models.QuerySet, not models.mymodel.QuerySet that it was complaining about. There is another nuance here, which is that my models file is broken out into multiple modules, so the error is ACTUALLY:
Can't pickle <class 'myproject.myapp.models.myfile.QuerySet'>: it's not the same object as myproject.myapp.models.myfile.QuerySet
Where myfile is one of the modules under models. I have an __ini__.py in models with the following line:
from myfile import *
I wonder if this is contributing to my issue. Is there some way to change my init to protect myself against this? Are there any other tests to try?
EDIT3: Here is a little more background on my use case: I have a model called Context that I use to populate a UI element with instances of mymodel. The user can add/remove/manipulate the objects on the UI side, changing their context, and when they return, they can keep their changes, because the context serialized everything. A context has a generic foreign key to different types of filters/ways the user can manipulate the object, all of which must implement a few methods that the context uses to figure out what it should display. One such filter takes a queryset that can be passed in and displays all of the objects in that queryset. This provides a way to pass in arbitrary querysets that are produced elsewhere and have them displayed in the UI element. The model that uses the Context is hierarchical (using mptt for this), and the UI element makes a request to get children each time the user clicks around, we can then take the children and determine if they should be displayed based on whether or not they are included in the Context. Hope that helps!
EDIT4: I am able to dump an empty queryset, but as soon as I add anything of value, it fails.
EDIT4: I am on Django 1.2.3
This may not be the case for everyone, but I was using Ipython notebook and having a similar issue pickling my own class.
The problem turned out to be from a reload call
from dir.my_module import my_class
reload(dir.my_module)
Removing the reload call and then re-running the import and the cell where the instance of that object was created then allowed it to be pickled.
not so elegant but perhaps it works:
add the directory of the myfile -module to os.sys.path and use only import myfile in each module where you use myfile. (remove any from segment.segmentengine.models.segment import, anywhere in your project)
According to this doc, pickling a QuerySet should be not a problem. Thus, the problem should come from other place.
Since you mentined:
EDIT2: I had a typo in the error I displayed above -- it was models.QuerySet, not models.mymodel.QuerySet that it was complaining about. There is another nuance here, which is that my models file is broken out into multiple modules, so the error is ACTUALLY:
The second error message you provided look like the same as previous one, is that what you mean?
The error message you provided looks weird. Since you are pickling "queryset.query", the error should related to the django.db.models.sql.Query class instead of the QuerySet class.
Some modules or classes may have the same name. They will override each other then cause this kind of issue. To make thing easier, I will recommend you to use "import ooo.xxx" instead of "from ooo import *".
Your could also try
import ooo.xxx as othername

Categories

Resources