Algorithms to achieve a reasonable result with the most economical effort - Python

Suppose there are six kinds of problems to be handled when reading a book;
I illustrate the details as follows:
while True:
    if encounter A:
        handle A
        # during handling the problem, it might spawn new problems of
        # A, B, C, D, E
        produce (A, B..E or null)
        continue
    if B occurs:
        handle B
        # during handling the problem, it might spawn new problems of
        # A, B, C, D, E
        produce (A, B..E or null)
        continue
    if C happens:
        handle C
        produce (A, B..E or null)
        continue
    ...
    if there are no problems:
        break
Assume I have three problems;
the above program might loop endlessly on the first one and never touch the second.
Take the example of me reading a book:
'Problem A' is defined as encountering a 'new word', and handling it means looking it up in a dictionary.
When looking it up, I might come across another new word, then another and another.
In this case, I will never finish reading even one sentence of the book.
As a solution,
I introduce a container to collect problems, weight them by value, and then determine which one to execute.
def solve_problems(problems):
    problem_bucket = list(problems)
    while True:
        if problem_bucket:
            # value-weight all the problems
            # and determine which one to handle
            value_weight problems
            problem = x
            if problem == A:
                handle A
                problem_bucket.append(new_problem)
                continue
            if problem == B:
                handle B
                problem_bucket.append(new_problem)
                continue
            ...
        if not problem_bucket:
            return
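To make the idea concrete, here is a runnable sketch of the bucket approach (the problem names, weights, and the spawning rule are all invented for the example):

```python
# hypothetical problems with value weights: higher means more urgent
WEIGHTS = {"new_word": 3, "unknown_grammar": 2, "unclear_reference": 1}

def handle(problem, _spawned=[0]):
    """Handle one problem; looking up a word may reveal more new words.
    The mutable default caps the simulated spawning at two follow-ups."""
    if problem == "new_word" and _spawned[0] < 2:
        _spawned[0] += 1
        return ["new_word"]
    return []

def solve_problems(problems):
    bucket = list(problems)
    handled = []
    while bucket:
        bucket.sort(key=lambda p: WEIGHTS[p])  # value-weight the problems
        problem = bucket.pop()                 # take the most urgent (highest weight)
        handled.append(problem)
        bucket.extend(handle(problem))         # handling may spawn new problems
    return handled

done = solve_problems(["unclear_reference", "new_word", "unknown_grammar"])
print(done)  # ['new_word', 'new_word', 'new_word', 'unknown_grammar', 'unclear_reference']
```

The loop terminates because the spawning is bounded, just as the number of new words is bounded by the dictionary size.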
I also tried a recursive alternative, to seek inspiration and improve efficiency.
def solve_problems(problems):
    global problem_bucket
    problem_bucket = list(problems)
    value_weight problems
    problem = x
    if problem == A:
        handle A
        problem_bucket.append(new_problem)
        solve_problems(problem_bucket)
    if problem == B:
        handle B
        problem_bucket.append(new_problem)
        solve_problems(problem_bucket)
    if problem == C:
        handle C
        problem_bucket.append(new_problem)
        solve_problems(problem_bucket)
    ...
    if not problem_bucket:
        return
Again, the value-weighting process consumes huge effort and time.
What is a proper algorithm to solve such a problem?

'Problem A' is defined as encountering a 'new word', and handling it means looking it up in a dictionary.
When looking it up, I might come across another new word, then another and another.
In this case, I will never finish reading even one sentence of the book.
Looks like it will eventually end up reading the sentence, since the number of new words is limited by the dictionary size. In general it sounds OK to me, unless there are some other restrictions not explicitly mentioned, like finishing the sentence within a limited time.
What is a proper algorithm to solve such a problem?
Well, if there is no "limited time" restriction, your original algorithm is almost perfect. To make it even better in terms of overall performance, we might handle all problems A first, then move to B and so on. That will increase the data locality and the overall performance of our algorithm.
But if there is a "limited time" restriction, we can end up reading the full sentence in that time (without full understanding), or end up reading part of the sentence (fully understanding that part), or something in between (as suggested by @Lauro Bravar).
From the example above it is not quite clear how you do the value_weight, but the proper name for this kind of problem is priority queueing. There is a variety of algorithms and implementations; please have a look at the Wikipedia page for the details: https://en.wikipedia.org/wiki/Priority_queue
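For reference, Python's standard heapq module already implements such a queue. A minimal sketch (the weights are invented for the example; heapq is a min-heap, so weights are negated to pop the heaviest problem first):

```python
import heapq

queue = []
# store (-weight, problem) pairs: heapq pops the smallest tuple first,
# so negating the weight makes the heaviest problem come out first
for weight, problem in [(3, "new word"), (1, "typo"), (2, "unknown grammar")]:
    heapq.heappush(queue, (-weight, problem))

order = []
while queue:
    neg_weight, problem = heapq.heappop(queue)  # O(lg n) per pop
    order.append(problem)

print(order)  # ['new word', 'unknown grammar', 'typo']
```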

You can do several things to approach such a problem.
One of them is to set a "max-iterations" or "max-effort" value, in a machine-learning style, that you can invest into reading a book. You will then execute (handle) only up to that number of actions, until the limit has been reached. This solution will look like:
while (effortRemaining > 0) {
    # Do your actions
}
The actions you perform should be the ones that report the most benefit for the least effort, according to some metric that you need to define.
When you perform a certain action (handle), you subtract the cost/effort of that action from effortRemaining and continue with your flow.
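A runnable sketch of that budget idea (the action names, costs, and benefit metric below are all invented for the example):

```python
# hypothetical actions as (name, benefit, effort_cost) tuples
actions = [("look up word", 5, 2), ("check grammar", 3, 1), ("reread sentence", 1, 1)]

effort_remaining = 3  # total effort we are willing to invest
performed = []
while effort_remaining > 0 and actions:
    # pick the action with the best benefit-to-effort ratio
    actions.sort(key=lambda a: a[1] / a[2], reverse=True)
    name, benefit, cost = actions.pop(0)
    if cost > effort_remaining:
        continue  # drop actions we can no longer afford
    performed.append(name)
    effort_remaining -= cost  # subtract the action's cost from the budget

print(performed)  # ['check grammar', 'look up word']
```

Once the budget runs out, the remaining low-value actions are simply never executed.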

You already have the algorithm (with the help of Andriy for the priority queue), but you lack the design. When I see your multiple ifs that check the type of the problem, I think of polymorphism.
Why not try OOP? You have two objects to define: a problem and a priority queue. Fortunately, the priority queue is provided by the heapq module. Let's focus on the problem:
in its core definition, it can be handled and may be compared to other problems (it is more or less urgent). Note that, guided by OOP principles, I do not talk of the structure or implementation of a problem,
but only of the functions of a problem:
class Problem:
    def handle(self, some args): # we may need a dictionary, a database connection, ...
        ...
    def compare(self, other):
        ...
But you said that when a problem is handled, it may add new problems to the queue. So let's add a precision to the definition of handle:
def handle(self, queue, some args): # we still may need a dictionary, a database connection, ...
    ...
In Python, compare is a special method named __lt__, for "less than". (There are other special comparison methods, but __lt__ will be sufficient here.)
Here's a basic implementation example:
import heapq
import random

class Problem:
    def __init__(self, name, weight):
        self.__name = name
        self.__weight = weight
    def handle(self, queue):
        print("handle the problem {}".format(self))
        while random.random() > 0.75: # randomly add new problems for the example
            new_problem = Problem(self.__name * 2, random.randint(0, 100))
            print("-> Add the problem {} to the queue".format(new_problem))
            heapq.heappush(queue, new_problem) # add the problem to the queue
    def __lt__(self, other):
        return self.__weight > other.__weight # note the >
    def __repr__(self): # to show in lists
        return "Problem({}, {})".format(self.__name, self.__weight)
Wait! Why "less than" and a >? That's because the heapq module is a min-heap: it returns the smallest element first. Thus, we define big weights as smaller than little weights.
Now we can build an initial queue with fake data for the example:
queue = []
for name in ["foo", "bar", "baz"]:
    problem = Problem(name, random.randint(0, 100))
    heapq.heappush(queue, problem) # add the problem to the queue
And run the main loop:
while queue:
    print("Current queue", queue)
    problem = heapq.heappop(queue) # the problem with the max weight in O(lg n)
    problem.handle(queue)
I guess you will be able to subclass the Problem class to represent the various problems you might want to handle.


Undo in Python with a very large state. Is it possible?

This appears simple, but I can't find a good solution.
It's the old 'pass by reference' / 'pass by value' / 'pass by object reference' problem. I understand what is happening, but I can't find a good workaround.
I am aware of solutions for small problems, but my state is very large and extremely expensive to save or recalculate. Given these constraints, I can't find a solution.
Here is some simple pseudocode to illustrate what I would like to do (if Python would let me pass by reference):
class A:
    def __init__(self, x):
        self.g = x
        self.changes = []
    def change_something(self, what, new): # I want to pass 'what' by reference
        old = what                         # and then de-reference it here to read the value
        self.changes.append([what, old])   # store a reference
        what = new                         # dereference and change the value
    def undo_changes(self):
        for c in self.changes:
            c[0] = c[1]                    # dereference and restore the old value
Edit: adding some more pseudocode to show how I would like to use the above:
test = A(1) # initialise test.g as 1
print(test.g)
out: 1
test.change_something(test.g, 2)
# if my imaginary code above functioned as described in its comments,
# this would change test.g to 2 and store the old value in the list of changes
print(test.g)
out: 2
test.undo_changes()
print(test.g)
out: 1
Obviously the above code doesn't work in Python, due to it being 'pass by object reference'. Also, I'd like to be able to undo a single change, not just all of them as in the code above.
The thing is... I can't seem to find a good work around. There are solutions out there like these:
Do/Undo using command pattern in Python
making undo in python
These involve storing a stack of commands. 'Undo' then means removing the last command and re-building the final state by taking the initial state and re-applying everything but the last command. My state is too large for this to be feasible; the issues are:
The state is very large. Saving it entirely is prohibitively expensive.
'Do' operations are costly (making recalculating from a saved state infeasible).
Do operations are also non-deterministic, relying on random input.
Undo operations are very frequent
I have one idea, which is to ensure that EVERYTHING is stored in lists, and writing my code so that everything is stored, read from and written to these lists. Then in the code above I can pass the list name and list index every time I want to read/write a variable.
Essentially this amounts to building my own memory architecture and C-style pointer system within Python!
This works, but seems a little... ridiculous? Surely there is a better way?
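For what it's worth, the list-plus-indices idea can be sketched as below (all the names here are invented for the illustration); it does work, and it also shows why it amounts to a hand-rolled pointer system:

```python
class ListState:
    """All mutable state lives in one list; 'pointers' are indices into it."""
    def __init__(self, x):
        self.data = [x, x * 10]   # slot 0 plays the role of g, slot 1 of h
        self.G, self.H = 0, 1     # index 'pointers' into self.data
        self.changes = []
    def change_something(self, index, new):
        self.changes.append((index, self.data[index]))  # remember (where, old value)
        self.data[index] = new
    def undo_last_change(self):
        index, old = self.changes.pop()
        self.data[index] = old    # 'dereference' the index and restore

state = ListState(1)
state.change_something(state.G, 2)
print(state.data[state.G])  # 2
state.undo_last_change()
print(state.data[state.G])  # 1
```

Every read and write must now go through data[index], which is exactly the awkwardness the question complains about.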
Please check if this helps....
class A:
    def __init__(self, x):
        self.g = x
        self.changes = {}
        self.changes[str(x)] = {'init': x, 'old': x, 'new': x} # or make a key of your choice (immutable)
    def change_something(self, what, new): # I want to pass 'what' by reference
        self.changes[what]['new'] = new                       # add the changed value to your dict
        what = new                                            # dereference and change the value
    def undo_changes(self, what):
        what = self.changes[what]['old']                      # retrieve/change to the old value
        self.changes[what]['new'] = self.changes[what]['old'] # revert the latest new value to the old value
For each change you can update the change dictionary. The only thing you have to figure out is how to create an entry for 'what' as a key in the self.changes dictionary; I just made it str(x). Check the type of 'what' and decide how to make it a key in your case.
Okay, so I have come up with an answer... but it's ugly! I doubt it's the best solution. It uses exec(), which I am told is bad practice and to be avoided if at all possible. EDIT: see below!
Old code using exec():
class A:
    def __init__(self, x):
        self.old = 0
        self.g = x
        self.h = x * 10
        self.changes = []
    def change_something(self, what, new):
        whatstr = 'self.' + what
        exec('self.old=' + whatstr)
        self.changes.append([what, self.old])
        exec(whatstr + '=new')
    def undo_changes(self):
        for c in self.changes:
            exec('self.' + c[0] + '=c[1]')
    def undo_last_change(self):
        c = self.changes[-1]
        exec('self.' + c[0] + '=c[1]')
        self.changes.pop()
Thanks to barny, here's a much nicer version using getattr and setattr:
class A:
    def __init__(self, x):
        self.g = x
        self.h = x * 10
        self.changes = []
    def change_something(self, what, new):
        self.changes.append([what, getattr(self, what)])
        setattr(self, what, new)
    def undo_changes(self):
        for c in self.changes:
            setattr(self, c[0], c[1])
    def undo_last_change(self):
        c = self.changes[-1]
        setattr(self, c[0], c[1])
        self.changes.pop()
To demonstrate, the input:
print("demonstrate changing one value")
b = A(1)
print('g=', b.g)
b.change_something('g', 2)
print('g=', b.g)
b.undo_changes()
print('g=', b.g)

print("\ndemonstrate changing two values and undoing both")
b.change_something('h', 3)
b.change_something('g', 4)
print('g=', b.g, 'h=', b.h)
b.undo_changes()
print('g=', b.g, 'h=', b.h)

print("\ndemonstrate changing two values and undoing one")
b.change_something('h', 30)
b.change_something('g', 40)
print('g=', b.g, 'h=', b.h)
b.undo_last_change()
print('g=', b.g, 'h=', b.h)
returns:
demonstrate changing one value
g= 1
g= 2
g= 1
demonstrate changing two values and undoing both
g= 4 h= 3
g= 1 h= 10
demonstrate changing two values and undoing one
g= 40 h= 30
g= 1 h= 30
EDIT 2: Actually... after further testing, my initial version with exec() has some advantages over the second. If the class contains another class, or a list, or whatever, the exec() version has no trouble updating a list within a class within a class, whereas the getattr/setattr version will fail.
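A middle ground (my own sketch, not from the original answers) handles nested attributes without exec() by splitting a dotted path and walking it with getattr; the dotted-path convention and all names below are assumptions for the illustration:

```python
from functools import reduce

class Undoable:
    def __init__(self):
        self.inner = type("Inner", (), {})()  # a nested object for the demo
        self.inner.g = 1
        self.changes = []
    def _resolve(self, path):
        *parents, leaf = path.split(".")
        # walk e.g. "inner.g" down to the object that owns the final attribute
        return reduce(getattr, parents, self), leaf
    def change_something(self, path, new):
        obj, leaf = self._resolve(path)
        self.changes.append((path, getattr(obj, leaf)))  # remember (path, old value)
        setattr(obj, leaf, new)
    def undo_last_change(self):
        path, old = self.changes.pop()
        obj, leaf = self._resolve(path)
        setattr(obj, leaf, old)

u = Undoable()
u.change_something("inner.g", 5)
print(u.inner.g)  # 5
u.undo_last_change()
print(u.inner.g)  # 1
```

A top-level attribute still works ("g" has no dots, so the walk returns the object itself), which is what the exec() version offered without the security and readability costs.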

Clearest way to initialize module-level lists in Python

I've got a situation where a module needs to do some simple, but slightly time consuming initialization. One of the end conditions is a pair of lists which will be filled out by the initialization; what's bugging me is that there's a conflict between the role of the lists - which are basically intended to be constants - and the need to actually initialize them.
I feel uneasy writing code like this:
CONSTANT_LIST = []
DIFFERENT_LIST = []
for item in get_some_data_slowly():
    if meets_criteria_one(item):
        CONSTANT_LIST.append(item)
        continue
    if meets_criteria_two(item):
        DIFFERENT_LIST.append(item)
Since the casual reader will see those lists in the position usually occupied by constants and may expect them to be empty.
OTOH, I'd be OK with the same under-the-hood facts if I could write this as a list comprehension:
CONSTANT_LIST = [i for i in some_data() if criterion(i)]
and so on... except that I need two lists drawn from the same (slightly time consuming) source, so two list comprehensions will make the code noticeably slower.
To make it worse, the application is such that hiding the constants behind methods:
__private_const_list = None
__other_private_list = None

def public_constant_list():
    if __private_const_list: return __private_const_list
    # or do the slow thing now and fill out both lists...
    # etc

def public_other_const_list():
    # same thing
is kind of silly since the likely use frequency is basically 1 per session.
As you can see, it's not a rocket-science issue, but my Python sense is not tingling at all. What's the appropriate Pythonic pattern here?
The loop is quite clear. Don't obfuscate it by being too clever. Just use comments to help explain:
CONSTANT_LIST = []  # Put a comment here to tell the reader that these
DIFFERENT_LIST = [] # are constants that are filled in elsewhere
"""
Here is an example of what CONSTANT_LIST looks like
...
Here is an example of what DIFFERENT_LIST looks like
...
"""
for item in get_some_data_slowly():
    if meets_criteria_one(item):
        CONSTANT_LIST.append(item)
    elif meets_criteria_two(item):
        DIFFERENT_LIST.append(item)
Maybe use elif instead of continue/if
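Another common variant (a sketch; the data source and criteria below are stand-ins for the question's slow ones) hides the loop in a helper function, so the module level reads as a single assignment:

```python
# stand-ins for the question's slow data source and criteria
def get_some_data_slowly():
    return range(10)

def meets_criteria_one(item):
    return item % 2 == 0

def meets_criteria_two(item):
    return item % 3 == 0

def _load_lists():
    """Build both lists in one pass over the slow data source."""
    constants, different = [], []
    for item in get_some_data_slowly():
        if meets_criteria_one(item):
            constants.append(item)
        elif meets_criteria_two(item):
            different.append(item)
    return constants, different

CONSTANT_LIST, DIFFERENT_LIST = _load_lists()
print(CONSTANT_LIST)   # [0, 2, 4, 6, 8]
print(DIFFERENT_LIST)  # [3, 9]
```

The casual reader sees the lists assigned once, never empty, while the single pass over the source is preserved.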

using function attributes to store results for lazy (potential) processing

I'm doing some collision detection and I would very much like to use the same function in two different contexts. In one context, I would like for it to be something like
def detect_collisions(item, others):
    return any(collides(item, other) for other in others)
and in another, I would like it to be
def get_collisions(item, others):
    return [other for other in others if collides(item, other)]
I really hate the idea of writing two functions here. Keeping their names straight is one turnoff, and complicating the interface to the collision detection is another. So I was thinking:
import itertools as it

def peek(gen):
    try:
        first = next(gen)
    except StopIteration:
        return False
    else:
        return it.chain((first,), gen)

def get_collisions(item, others):
    get_collisions.all = peek(other for other in others if collides(item, other))
    return get_collisions.all
Now when I just want to do a check, I can say:
if get_collisions(item, others):
    # aw snap
or
if not get_collisions(item, others):
    # w00t
and in the other context where I actually want to examine them, I can do:
if get_collisions(item, others):
    for collision in get_collisions.all:
        # fix it
and in both cases, I don't do any more processing than I need to.
I recognize that this is more code than the first two functions but it also has the advantage of:
Keeping my interface to the collision detection as a tree with the node at the top level instead of a mid level. This seems simpler.
Hooking myself up with a handy peek function. If I use it one other time, then I'm actually writing less code. (In response to YAGNI: if I have it, I will use it.)
So. If you were the proverbial homicidal maniac that knows where I live, would I be expecting a visit from you if I wrote the above code? If so, how would you approach this situation?
Just make get_collisions return a generator:
def get_collisions(item, others):
    return (other for other in others if collides(item, other))
Then, if you want to do a check:
for collision in get_collisions(item, others):
    print 'Collision!'
    break
else:
    print 'No collisions!'
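For the context where the collisions must actually be examined, the same generator can simply be materialized once (collides below is a stand-in predicate for the sketch):

```python
def collides(item, other):
    return item == other  # stand-in predicate for the sketch

def get_collisions(item, others):
    return (other for other in others if collides(item, other))

# boolean-style check: any() consumes only as much of the generator as it needs
assert any(get_collisions(3, [1, 2, 3, 4]))

# list-style use: materialize the generator once, then reuse the list freely
collisions = list(get_collisions(3, [1, 2, 3, 3]))
print(collisions)  # [3, 3]
```

This keeps a single lazy entry point while still supporting both calling contexts, without storing state on the function object.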
This is very similar to what we were discussing in "pythonic way to rewrite an assignment in an if statement", but this version only handles one positional or one keyword argument per call, so one can always retrieve the actual value cached (not just whether it is a true value in the Pythonic boolean sense or not).
I never really cared for your proposal to accept multiple keywords and have differently named functions depending on whether you wanted all the results put through any() or all() -- but I liked the idea of using keyword arguments, which allows a single function to be used in two or more spots simultaneously. Here's what I ended up with:
# can be called with a single unnamed value or a single named value
def cache(*args, **kwargs):
    if len(args) + len(kwargs) == 1:
        if args:
            name, value = 'value', args[0] # default attr name 'value'
        else:
            name, value = next(iter(kwargs.items()))
    else:
        raise NotImplementedError('"cache" calls require either a single value argument '
                                  'or a name=value argument identifying an attribute.')
    setattr(cache, name, value)
    return value

# add a sub-function to clear the cache attributes (optional and a little weird)
cache.clear = lambda: cache.__dict__.clear()
# you could then use it in either of these two ways
if get_collisions(item, others):
    # no cached value

if cache(collisions=get_collisions(item, others)):
    for collision in cache.collisions:
        # fix them
By putting all the ugly details in a separate function, it doesn't affect the code in get_collisions() one way or the other, and is also available for use elsewhere.

Basic Python: Exception raising and local variable scope / binding

I have a basic "best practices" Python question. I see that there are already StackOverflow answers tangentially related to this question but they're mired in complicated examples or involve multiple factors.
Given this code:
#!/usr/bin/python
def test_function():
    try:
        a = str(5)
        raise
        b = str(6)
    except:
        print b

test_function()
what is the best way to avoid the inevitable "UnboundLocalError: local variable 'b' referenced before assignment" that I'm going to get in the exception handler?
Does python have an elegant way to handle this? If not, what about an inelegant way? In a complicated function I'd prefer to avoid testing the existence of every local variable before I, for example, printed debug information about them.
Does Python have an elegant way to handle this?
To avoid exceptions from printing unbound names, the most elegant way is not to print them; the second most elegant is to ensure the names do get bound, e.g. by binding them at the start of the function (the placeholder None is popular for this purpose).
If not, what about an inelegant way?
try: print 'b is', b
except NameError: print 'b is not bound'
In a complicated function I'd prefer to avoid testing the existence of every local variable before I, for example, printed debug information about them.
Keeping your functions simple (i.e., not complicated) is highly recommended, too. As Hoare wrote 30 years ago (in his Turing acceptance lecture "The Emperor's old clothes", reprinted e.g. in this PDF):
There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.
Achieving and maintaining simplicity is indeed difficult: given that you have to implement a certain total functionality X, it's the most natural temptation in the world to do so via complicated accretion into a few complicated classes and functions of sundry bits and pieces, "clever" hacks, copy-and-paste-and-edit-a-bit episodes of "drive-by coding", etc, etc.
However, it's a worthwhile effort to strive instead to keep your functions "so simple that there are obviously no deficiencies". If a function's hard to completely unit-test, it's too complicated: break it up (i.e., refactor it) into its natural components, even though it will take work to unearth them. (That's actually one of the ways in which a strong focus on unit testing helps code quality: by spurring you relentlessly to keep all the code perfectly testable, it's at the same time spurring you to make it simple in its structure.)
You can initialize your variables outside of the try block:
a = None
b = None
try:
    a = str(5)
    raise
    b = str(6)
except:
    print b
You could check whether the variable is defined in the local scope using the built-in function locals():
http://docs.python.org/library/functions.html#locals
#!/usr/bin/python
def test_function():
    try:
        a = str(5)
        raise
        b = str(6)
    except:
        if 'b' in locals(): print b

test_function()
def test_function():
    try:
        a = str(5)
        raise
        b = str(6)
    except:
        print b
b = str(6) is never run; the program exits the try block just after raise. If you want to print some variable in the except block, evaluate it before raising the exception and put it into the exception you throw.
class MyException(Exception):
    def __init__(self, var):
        self.var = var

def test_function():
    try:
        a = str(5)
        b = str(6)
        raise MyException(b)
    except MyException as e:
        print e.var

A simple freeze behavior decorator

I'm trying to write a freeze decorator for Python.
The idea is as follows :
(In response to the two comments.)
I might be wrong, but I think there are two main uses of
test cases.
One is test-driven development:
ideally, developers write test cases before writing the code.
This usually helps define the architecture, because the discipline
forces you to define the real interfaces before development.
One may even consider that in some cases the person who
dispatches jobs between developers writes the test cases and
uses them to illustrate efficiently the specification he has in mind.
I don't have any experience with using test cases like that.
The second is the idea that every project of a decent
size with several programmers suffers from broken code.
Something that used to work may get broken by a change
that looked like an innocent refactoring.
Though good architecture and loose coupling between components may
help fight this phenomenon, you will sleep better
at night if you have written some test cases to make sure
that nothing will break your program's behavior.
HOWEVER,
nobody can deny the overhead of writing test cases. In the
first case one may argue that the test cases are actually guiding
development and are therefore not to be considered an overhead.
Frankly speaking, I'm a pretty young programmer and if I were
you, my word on this subject would not be really valuable...
Anyway, I think that most companies/projects are not working
like that, and that unit tests are mainly used in the second
case...
In other words, rather than ensuring that the program is
working correctly, they aim at checking that it will
work the same in the future.
This need can be met without the cost of writing tests,
by using this freezing decorator.
Let's say you have a function:
def pow(n, k):
    if k == 0: return 1
    else: return n * pow(n, k-1)
It is perfectly fine, and you want to rewrite it as an optimized version.
It is part of a big project, and you want it to give back the same result
for a few values.
Rather than going through the pain of test cases, one could use some
kind of freeze decorator.
Something such that the first time the decorator is run,
it runs the function with the defined args (see below)
and saves the result in a map (f -> args -> result).
@freeze(2,0)
@freeze(1,3)
@freeze(3,5)
@freeze(0,0)
def pow(n, k):
    if k == 0: return 1
    else: return n * pow(n, k-1)
The next time the program is executed, the decorator will load this map and check
that the result of the function for these args has not changed.
I already wrote the decorator quickly (see below), but hit a few problems about
which I need your advice...
from __future__ import with_statement
from collections import defaultdict
from types import GeneratorType
import cPickle

def __id_from_function(f):
    return ".".join([f.__module__, f.__name__])

def generator_firsts(g, N=100):
    try:
        if N == 0:
            return []
        else:
            return [g.next()] + generator_firsts(g, N-1)
    except StopIteration:
        return []

def __post_process(v):
    specialized_postprocess = [
        (GeneratorType, generator_firsts),
        (Exception, str),
    ]
    try:
        val_mro = v.__class__.mro()
        for (ancestor, specialized) in specialized_postprocess:
            if ancestor in val_mro:
                return specialized(v)
        raise TypeError("cannot post-process this value")
    except:
        print "Cannot accept this as a value"
        return None

def __eval_function(f):
    def aux(args, kargs):
        try:
            return (True, __post_process(f(*args, **kargs)))
        except Exception, e:
            return (False, __post_process(e))
    return aux

def __compare_behavior(f, past_records):
    for (args, kargs, result) in past_records:
        assert __eval_function(f)(args, kargs) == result

def __record_behavior(f, past_records, args, kargs):
    registered_args = [(a, k) for (a, k, r) in past_records]
    if (args, kargs) not in registered_args:
        res = __eval_function(f)(args, kargs)
        past_records.append((args, kargs, res))

def __open_frz():
    try:
        with open(".frz", "r") as frz_file:
            return cPickle.load(frz_file)
    except:
        return defaultdict(list)

def __save_frz(past_records):
    with open(".frz", "w") as frz_file:
        return cPickle.dump(past_records, frz_file)

def freeze_behavior(*args, **kvargs):
    def freeze_decorator(f):
        past_records = __open_frz()
        f_id = __id_from_function(f)
        f_past_records = past_records[f_id]
        __compare_behavior(f, f_past_records)
        __record_behavior(f, f_past_records, args, kvargs)
        __save_frz(past_records)
        return f
    return freeze_decorator
Dumping and comparing results is not trivial for all types. Right now I'm thinking about using a function (I call it postprocess here) to solve this problem.
Basically, instead of storing res I store postprocess(res), and I compare postprocess(res1) == postprocess(res2) instead of comparing res1 and res2.
It is important to let the user overload the predefined postprocess function.
My first question is:
do you know a way to check whether an object is dumpable or not?
Defining a key for the decorated function is a pain. In the snippet above
I am using the function's module and its name.
Can you think of a smarter way to do that?
The snippet above kind of works, but it opens and closes the file both when testing and when recording. This is just a stupid prototype... but do you know a nice way to open the file, process the decorator for all functions, and close the file?
I intend to add some functionality to this. For instance, the possibility to define
an iterable to browse a set of arguments, to record arguments from real use, etc.
What would you expect from such a decorator?
In general, would you use such a feature, knowing its limitations... especially when trying to use it with OOP?
"In general, would you use such a feature, knowing its limitations...?"
Frankly speaking -- never.
There are no circumstances under which I would "freeze" results of a function in this way.
The use case appears to be based on two wrong ideas: (1) that unit testing is hard, complex, or expensive; and (2) that it could be simpler to write the code, "freeze" the results, and somehow use the frozen results for refactoring. This isn't helpful. Indeed, the very real possibility of freezing wrong answers makes this a bad idea.
First, on "consistency vs. correctness". This is easier to preserve with a simple mapping than with a complex set of decorators.
Do this instead of writing a freeze decorator.
print "frozen_f=", dict( (i,f(i)) for i in range(100) )
The dictionary object that's created will work perfectly as a frozen result set. No decorator. No complexity to speak of.
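A sketch of how such a frozen mapping would then be used to check an optimized rewrite (f and fast_f here are invented stand-ins):

```python
def f(n):
    return n * n  # the original, trusted implementation

frozen_f = {i: f(i) for i in range(100)}  # the "frozen" result set

def fast_f(n):
    return n ** 2  # hypothetical optimized rewrite to validate

# regression check: the rewrite must reproduce every frozen result
assert all(fast_f(i) == expected for i, expected in frozen_f.items())
print("fast_f matches the frozen results")
```

The mapping is plain data: it can be committed, inspected by eye, and compared with a one-liner, with no decorator machinery at all.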
Second, on "unit testing".
The point of a unit test is not to "freeze" some random results. The point of a unit test is to compare real results with results developed another way (simpler, more obvious, perhaps poorly performing). Usually unit tests compare hand-developed results. Other times unit tests use obvious but horribly slow algorithms to produce a few key results.
The point of having test data around is not that it's a "frozen" result. The point of having test data is that it is an independent result. Done differently -- sometimes by different people -- that confirms that the function works.
Sorry. This appears to me to be a bad idea; it looks like it subverts the intent of unit testing.
"HOWEVER, nobody can deny the overhead of writing test cases"
Actually, many folks would deny the "overhead". It isn't "overhead" in the sense of wasted time and effort. For some of us, unit tests are essential. Without them, the code may work, but only by accident. With them, we have ample evidence that it actually works, and of the specific cases for which it works.
Are you looking to implement invariants or postconditions?
You should specify the result explicitly; this will remove most of your problems.
