I'm encountering some behaviour which I can't explain.
I have some expensive functions that get called repeatedly; I have decorated them with @lru_cache(None) to help speed things up. My run times were still quite slow after doing that, so I was a little confused.
I then realised that some of these functions had custom objects as parameters. My understanding is that by default, the hash for any custom object is based on its id(). So my theory was that some of these expensive functions were being re-evaluated even though their arguments contained identical data. My objects are only used to group immutable data, so I'm comfortable with the cached value being looked up whenever the data within those objects is the same.
So based on my understanding of the lru_cache function, I added a __hash__ method to my objects, just doing something very crude for starters:
def __hash__(self):
    return hash(str(self.__dict__))
So my theory is that my program should now be much quicker, as the caching will now take place on some of these expensive functions where it wasn't before.
To my dismay, my program is vastly slower; possibly it's getting stuck somewhere as I have not even had the patience to let it finish. For context, without the custom __hash__ methods a test case ran in about 16s; after adding the __hash__ methods the same test case was still running after about 10 minutes.
I don't have a deep understanding of how lru_cache works, but I have had a look at the source code and as far as I can tell it will just use my __hash__ function when it encounters those objects as parameters. Based on the drastic increase in run time, my current theory is that this is somehow causing the program to get stuck somewhere, rather than the cache lookups actually taking that long for some reason. But I can't see any reason why that would happen.
This feels like a bit of a wild goose chase to me but I can't imagine I'm the first person to try this. Does anybody know why this might happen?
Thanks
Edit:
I ran an even smaller test case to check if the program is actually terminating; it is. The smaller test case took 2.5s to run without the custom __hash__ functions, and 40s with them.
I have to stress that nothing else is changing between these two runs. The only difference is that I am adding the __hash__ function described above to three classes which take a journey around my code. Therefore I think the only possible conclusion is that my __hash__ function is somehow hugely slower than the default that would otherwise be used by lru_cache. That is, unless implementing a custom __hash__ function has other (invisible) costs that I'm not aware of.
I'm still at a loss to explain this. These are quite large objects which contain a lot of data, so str(self.__dict__) will be a pretty long string (probably thousands of characters). However, I don't believe that hashing should take appreciably longer for a longer string. Perhaps Python does huge amounts of hashing in the background in various places, and this small difference can add up? It seems far-fetched to me, but there don't seem to be many options - the only alternative I can see is some weird interaction with the lru_cache logic which leads to a big slow-down. I'll keep doing experiments, but hopefully someone will know the answer!
Edit 2:
I followed Samwise's suggestion and benchmarked this __hash__ function, and it does seem to be genuinely a lot slower; given the number of calls, I can believe that this is the entire reason for my issue. I'm guessing that the self.__dict__ part is the bottleneck, but my intuition about this doesn't have the best track record so far.
That still leaves me with the problem of trying to speed up my code, but at least I know what's going on now.
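A quick way to reproduce that kind of benchmark (the class below is a made-up stand-in with a large __dict__, not one of my real objects):

import timeit

class BigThing:
    def __init__(self):
        # simulate an object carrying a lot of data
        for i in range(1000):
            setattr(self, "field%d" % i, i)

    def __hash__(self):
        return hash(str(self.__dict__))

t = BigThing()
# every lru_cache lookup hashes each argument, so this cost is paid per call
print(timeit.timeit(lambda: hash(t), number=10000))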
Edit 3:
For anyone else who encounters this problem in the future - I decided to just pre-compute a hash value in my initialiser for my objects and return that in my __hash__ function, and that has sped things up massively. This solution does depend on the object not being mutated after creation.
The answer to this question ended up being quite simple - str(self.__dict__) is actually a pretty slow thing to run on every function call. I'm not sure why I didn't think of that in the first place.
Ultimately, what I decided to do was just add an attribute to my classes, _hash, which I set to hash(str(self.__dict__)) at the end of initialising a new object. Then my custom __hash__ method simply returns the value of _hash, so that lru_cache still works for functions whose arguments are instances of my classes, without having to call str(self.__dict__) on every function call.
I should make it clear that this only works under the assumption that the object has its entire state defined at initialisation and doesn't get mutated over its lifetime - if it does, the stored hash will go out of date and you'll end up getting hits from the cache that aren't appropriate.
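For concreteness, a minimal sketch of that pattern (the class and fields are made up; it assumes every attribute is set in __init__ and never mutated afterwards, and since the cache lookup also relies on equality, __eq__ is defined to match __hash__):

class FrozenData:
    """Illustrative stand-in for one of the classes described above."""

    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c
        # pre-compute once; only valid because the object is never mutated
        self._hash = hash(str(self.__dict__))

    def __hash__(self):
        return self._hash

    def __eq__(self, other):
        # cache lookups compare keys for equality as well as by hash
        return isinstance(other, FrozenData) and self.__dict__ == other.__dict__

With this in place, two distinct instances holding the same data hash and compare equal, so a function decorated with @lru_cache can return the cached result instead of re-running the expensive computation.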
Wondering about the performance impact of doing one iteration vs many iterations. I work in Python -- I'm not sure if that affects the answer or not.
Consider trying to perform a series of data transformations to every item in a list.
def one_pass(my_list):
    for i in xrange(0, len(my_list)):
        my_list[i] = first_transformation(my_list[i])
        my_list[i] = second_transformation(my_list[i])
        my_list[i] = third_transformation(my_list[i])
    return my_list
def multi_pass(my_list):
    range_end = len(my_list)
    for i in xrange(0, range_end):
        my_list[i] = first_transformation(my_list[i])
    for i in xrange(0, range_end):
        my_list[i] = second_transformation(my_list[i])
    for i in xrange(0, range_end):
        my_list[i] = third_transformation(my_list[i])
    return my_list
Now, apart from issues with readability, strictly in performance terms, is there a real advantage to one_pass over multi_pass? Assuming most of the work happens in the transformation functions themselves, wouldn't each iteration in multi_pass only take roughly 1/3 as long?
The difference will be how often the values and code you're reading are in the CPU's cache.
If the elements of my_list are large, but fit into the CPU cache, the first version may be beneficial. On the other hand, if the (byte)code of the transformations is large, caching the operations may be better than caching the data.
Both versions are probably slower than the way more readable:
def simple(my_list):
    return [third_transformation(second_transformation(first_transformation(e)))
            for e in my_list]
Timing it yields:
one_pass: 0.839533090591
multi_pass: 0.840938806534
simple: 0.569097995758
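For reference, a harness along these lines (with trivial stand-in transformations, so the absolute numbers will differ) reproduces the comparison; it assumes the three functions above are defined in the same module (swap xrange for range on Python 3):

import timeit

# trivial stand-in transformations; the real ones would dominate the runtime
def first_transformation(x):
    return x + 1

def second_transformation(x):
    return x * 2

def third_transformation(x):
    return x - 3

data = list(range(100000))

# one_pass, multi_pass and simple are the functions defined above
for fn in (one_pass, multi_pass, simple):
    elapsed = timeit.timeit(lambda: fn(list(data)), number=10)
    print("%s: %f" % (fn.__name__, elapsed))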
Assuming you're considering a program that can easily be one loop with multiple operations, or multiple loops doing one operation each, then it never changes the computational complexity (e.g. an O(n) algorithm is still O(n) either way).
One advantage of the single-pass approach is that you save on the "book-keeping" of the looping. Whether the iteration mechanism is incrementing and comparing counters, or retrieving "next" pointers and checking for null, or whatever, you do it less when you do everything in one pass. Assuming that your operations do any significant amount of work at all (and that your looping mechanism is simple and straightforward, not looping over an expensive generator or something), this "book-keeping" work will be dwarfed by the actual work of your operations, which makes this definitely a micro-optimisation that you shouldn't be doing unless you know your program is too slow and you've exhausted all more significant available optimisations.
Another advantage can be that applying all your operations to each element of the iteration before you move on to the next one tends to benefit better from the CPU cache, since each item could still be in the cache in subsequent operations on the same item, whereas using multiple passes makes that almost impossible (unless your entire collection fits in the cache). Python has so much indirection via dictionaries going on though that it's probably not hard for each individual operation to overflow the cache by reading hash buckets scattered all over the memory space. So this is still a micro-optimisation, but this analysis gives it more of a chance (though still no certainty) of making a significant difference.
One advantage of multi-pass can be that if you need to keep state between loop iterations, the single-pass approach forces you to keep the state of all operations around. This can hurt the CPU cache (maybe the state of each operation individually fits in the cache for an entire pass, but not the state of all the operations put together). In the extreme case this effect could theoretically make the difference between the program fitting in memory and not (I have encountered this once in a program that was chewing through very large quantities of data). But in the extreme cases you know that you need to split things up, and the non-extreme cases are again micro-optimisations that are not worth making in advance.
So performance generally favours single-pass by an insignificant amount, but can in some cases favour either single-pass or multi-pass by a significant amount. The conclusion you can draw from this is the same as the general advice applying to all programming: start by writing code in whatever way is most clear and maintainable and still adequately addresses your problem. Only once you've got a mostly finished program, and only if it turns out to be "not fast enough", measure the performance effects of the various parts of your code to find out where it's worth spending your time.
Time spent worrying about whether to write single-pass or multi-pass algorithms for performance reasons will almost always turn out to have been wasted. So unless you have unlimited development time available to you, you will get the "best" results from your total development effort (where best includes performance) by not worrying about this up-front, and addressing it on an as-needed basis.
Personally, I would no doubt prefer the one_pass option. It definitely performs better. You may be right that the difference wouldn't be huge. Python has optimized the xrange iterator really well, but you are still doing 3 times more iterations than needed.
You may get fewer cache misses in either version compared to the other. It depends on what those transform functions actually do.
If those functions have a lot of code and operate on different sets of data (besides the input and output), multipass may be better. Otherwise the single pass is likely to be better because the current list element will likely remain cached and the loop operations are only done once instead of three times.
This is a case where comparing actual run times would be very useful.
When I started learning Python, I created a few applications just using functions and procedural code. However, now that I know classes, I've realized that the code can be much more readable (and subjectively easier to understand) if I rewrite it with classes.
How much slower might the equivalent classes be compared to the functions in general? Will the initializer and methods in the classes make any considerable difference in speed?
To answer the question: yes, it is likely to be a little slower, all else being equal. Some things that used to be variables (including functions) are now going to be object attributes, and self.foo is always going to be slightly slower than foo regardless of whether foo was a global or local originally. (Local variables are accessed by index, and globals by name, but an attribute lookup on an object is either a local or a global lookup, plus an additional lookup by name for the attribute, possibly in multiple places.) Calling a method is also slightly slower than calling a function -- not only is it slower to get the attribute, it is also slower to make the call, because a method is a wrapper object that calls the function you wrote, adding an extra function call overhead.
Will this be noticeable? Usually not. In rare cases it might be, say if you are accessing an object attribute a lot (thousands or millions of times) in a particular method. But in that case you can just assign self.foo to a local variable foo at the top of the method, and reference it by the local name throughout, to regain 99.44% of the local variable's performance advantage.
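A rough illustration of that trick, with made-up names:

class Scaler:
    def __init__(self, factor):
        self.factor = factor

    def scale_slow(self, values):
        total = 0
        for v in values:
            total += v * self.factor    # attribute lookup on every iteration
        return total

    def scale_fast(self, values):
        factor = self.factor            # hoist the attribute into a local once
        total = 0
        for v in values:
            total += v * factor         # plain (fast) local lookup in the loop
        return total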
Beyond that there will be some overhead for allocating memory for instances that you probably didn't have before, but unless you are constantly creating and destroying instances, this is likely a one-time cost.
In short: there will be a likely-minor performance hit, and where the performance hit is more than minor, it is easy to mitigate. On the other hand, you could save hours in writing and maintaining the code, assuming your problem lends itself to an object-oriented solution. And saving time is likely why you're using a language like Python to begin with.
No.
In general you will not notice any difference in performance based on using classes or not. The different code structures implied may mean that one is faster than the other, but it's impossible to say which.
Always write code to be read, then if, and only if, it's not fast enough make it faster. Remember: Premature optimization is the root of all evil.
Donald Knuth, one of the grand old minds of computing, is credited with the observation that "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." Deciding to use procedural techniques rather than object-oriented ones on the basis of speed gains that may well not be realized anyway is not a sensible strategy.
If your code works and doesn't need to be modified then feel free to leave it alone. If it needs to be modified then you should consider a judicious refactoring to include classes, since program readability is far more important than speed during development. You will also see benefits in improved maintainability. An old saw from Kernighan and Plauger's "Elements of Programming Style" still applies:
First, make it work. Then (if it doesn't work fast enough) make it work faster.
But, first and foremost, go for readability. Seriously.
You probably don't care as much as you think you do.
Really.
Sure, code with classes might be a little slower through indirection. Maybe. That is what JIT compilation is for, right? I can never remember which versions of python do this and which don't, because:
Performance doesn't matter.
At least constant performance differences like this. Unless you are doing a hell of a lot of computations (you aren't!), you will spend more time developing/debugging/maintaining your code. Optimize for that.
Really. Because you will never ever be able to measure the difference, unless you are in a tight loop. And you don't want to be doing that in python anyway, unless you don't really care about time. It's not like you're trying to balance your segway in python, right? You just want to compute some numbers, right? Your computer is really good at this. Trust it.
That said, this doesn't mean classes are the way to go. Just that speed isn't the question you should be asking. Instead, try to figure out what representation will be the best for your code. It seems, now you know classes, you will write clean code in OO fashion. Go ahead. Learn. Iterate.
I'm working in the Google App Engine environment and programming in Python. I am creating a function that essentially generates a random number/letter string and then stores to the memcache.
def generate_random_string():
    # return a random 6-digit long string

def check_and_store_to_memcache():
    randomstring = generate_random_string()
    # check against memcache
    # if ok, then store key value with another value
    # if not ok, run generate_random_string() again and check again.
Does creating two functions instead of just one big one affect performance? I prefer two, as it better matches how I think, but don't mind combining them if that's "best practice".
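For what it's worth, a rough sketch of how those two functions might look (this assumes App Engine's memcache module; the extra value parameter and the key scheme are purely illustrative):

import random
import string

from google.appengine.api import memcache

def generate_random_string():
    # return a random 6-character letter/number string
    chars = string.ascii_letters + string.digits
    return ''.join(random.choice(chars) for _ in range(6))

def check_and_store_to_memcache(value):
    # keep generating keys until memcache.add() succeeds;
    # add() returns False if the key is already present
    while True:
        key = generate_random_string()
        if memcache.add(key, value):
            return key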
Focus on being able to read and easily understand your code.
Once you've done this, if you have a performance problem, then look into what might be causing it.
Most languages, python included, tend to have fairly low overhead for making method calls. Putting this code into a single function is not going to (dramatically) change the performance metrics - I'd guess that your random number generation will probably be the bulk of the time, not having 2 functions.
That being said, splitting functions does have a (very, very minor) impact on performance. However, I'd think of it this way - it may take you from going 80 mph on the highway to 79.99 mph (which you'll never really notice). The important things to watch for are avoiding stoplights and traffic jams, since they're going to make you have to stop altogether...
In almost all cases, "inlining" functions to increase speed is like getting a hair cut to lose weight.
Reed is right. For the change you're considering, the cost of a function call is a small number of cycles, and you'd have to be doing it 10^8 or so times per second before you'd notice.
However, I would caution that often people go to the other extreme, and then it is as if function calls were costly. I've seen this in over-designed systems where there were many layers of abstraction.
What happens is there is some human psychology that says if something is easy to call, then it is fast. This leads to writing more function calls than strictly necessary, and when this occurs over multiple layers of abstraction, the wastage can be exponential.
Following Reed's driving example, a function call can be like a detour, and if the detour contains detours, and if those also contain detours, soon there is tremendous time being wasted, for no obvious reason, because each function call looks innocent.
Like others have said, I wouldn't worry about it in this particular scenario. The very small overhead involved in function calls would pale in comparison to what is done inside each function. And as long as these functions don't get called in rapid succession, it probably wouldn't matter much anyway.
It is a good question though. In some cases it's best not to break code into multiple functions. For example, when working with math intensive tasks with nested loops it's best to make as few function calls as possible in the inner loop. That's because the simple math operations themselves are very cheap, and next to that the function-call-overhead can cause a noticeable performance penalty.
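As a rough illustration of that call overhead (a made-up micro-benchmark; the gap shrinks as soon as each call does real work):

import timeit

def square(x):
    return x * x

def with_call(n):
    total = 0
    for i in range(n):
        total += square(i)    # one function call per iteration
    return total

def inlined(n):
    total = 0
    for i in range(n):
        total += i * i        # same arithmetic, no call overhead
    return total

print(timeit.timeit(lambda: with_call(100000), number=100))
print(timeit.timeit(lambda: inlined(100000), number=100))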
Years ago I discovered the hypot (hypotenuse) function in the math library I was using in a VC++ app was very slow. It seemed ridiculous to me because it's such a simple set of functionality -- return sqrt(a * a + b * b) -- how hard is that? So I wrote my own and managed to improve performance 16X over. Then I added the "inline" keyword to the function and made it 3X faster than that (about 50X faster at this point). Then I took the code out of the function and put it in my loop itself and saw yet another small performance increase. So... yeah, those are the types of scenarios where you can see a difference.
I had an argument with a colleague about writing Python efficiently. He claimed that even though you are programming in Python, you still have to optimise the little bits of your software as much as possible, as if you were writing an efficient algorithm in C++.
Things like:
In an if statement with an or always put the condition most likely to fail first, so the second will not be checked.
Use the most efficient functions for manipulating strings in common use. Not code that grinds strings, but simple things like doing joins and splits, and finding substrings.
Call as few functions as possible, even if it comes at the expense of readability, because of the overhead this creates.
I say that in most cases it doesn't matter. I should also say that the context of the code is not a super-efficient NOC or missile-guidance system. We're mostly writing tests in Python.
What's your view of the matter?
My answer to that would be:
We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
(Quoting Knuth, Donald. "Structured Programming with go to Statements", ACM Computing Surveys, Vol. 6, No. 4, December 1974, p. 268.)
If your application is doing anything like a query to the database, that one query will take more time than anything you can gain with those kinds of small optimizations anyway...
And if you're chasing performance like that, why not code in assembly language, after all? Because Python is easier/faster to write and maintain? Well, if so, you are right :-)
The most important thing is that your code is easy to maintain; not a couple of microseconds of CPU time!
Well, maybe except if you have thousands of servers -- but is that your case?
The answer is really simple:
Follow Python best practices, not C++ best practices.
Readability in Python is more important than speed.
If performance becomes an issue, measure, then start optimizing.
This sort of premature micro-optimisation is usually a waste of time in my experience, even in C and C++. Write readable code first. If it's running too slowly, run it through a profiler, and if necessary, fix the hot-spots.
Fundamentally, you need to think about return on investment. Is it worth the extra effort in reading and maintaining "optimised" code for the couple of microseconds it saves you? In most cases it isn't.
(Also, compilers and runtimes are getting cleverer. Some micro-optimisations may become micro-pessimisations over time.)
I agree with others: readable code first ("Performance is not a problem until performance is a problem.").
I only want to add that when you absolutely need to write some unreadable and/or non-intuitive code, you can generally isolate it in a few specific methods, for which you can write detailed comments, and keep the rest of your code highly readable. If you do so, you'll end up with easy-to-maintain code, and you'll only have to go through the unreadable parts when you really need to.
I should also say that the context of the code is not a super-efficient NOC or missile-guidance system. We're mostly writing tests in Python.
Given this, I'd say that you should take your colleague's advice about writing efficient Python but ignore anything he says that goes against prioritizing readability and maintainability of the code, which will probably be more important than the speed at which it'll execute.
In an if statement with an or always put the condition most likely to fail first, so the second will not be checked.
This is generally good advice, and it also depends on the logic of your program. If it makes sense that the second statement is not evaluated when the first returns false, then do so; evaluating them in the opposite order could otherwise be a bug.
Use the most efficient functions for manipulating strings in common use. Not code that grinds strings, but simple things like doing joins and splits, and finding substrings.
I don't really get this point. Of course you should use the library provided functions, because they are probably implemented in C, and a pure python implementation is most likely to be slower. In any case, no need to reinvent the wheel.
Call as few functions as possible, even if it comes at the expense of readability, because of the overhead this creates.
$ cat withcall.py
def square(a):
    return a*a
for i in xrange(1,100000):
    i_square = square(i)
$ cat withoutcall.py
for i in xrange(1,100000):
    i_square = i*i
$ time python2.3 withcall.py
real 0m5.769s
user 0m4.304s
sys 0m0.215s
$ time python2.3 withcall.py
real 0m5.884s
user 0m4.315s
sys 0m0.206s
$ time python2.3 withoutcall.py
real 0m5.806s
user 0m4.172s
sys 0m0.209s
$ time python2.3 withoutcall.py
real 0m5.613s
user 0m4.171s
sys 0m0.216s
I mean... come on... please.
I think there are several related 'urban legends' here.
False: Putting the more often-checked condition first in a conditional, and similar optimizations, saves enough time for a typical program to be worthwhile for a typical programmer.
True: Some, but not many, people are using such styles in Python in the incorrect belief outlined above.
True: Many people use such a style in Python when they think that it improves the readability of a Python program.
About readability: I think it's indeed useful when you give the most useful conditional first, since this is what people notice first anyway. You should also use ''.join() if you mean concatenation of strings since it's the most direct way to do it (the s += x operation could mean something different).
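For example, building one string out of several pieces:

parts = ["usr", "local", "bin"]

# the direct way: one join over the whole sequence
path = "/".join(parts)

# the incremental way: builds a new intermediate string on every step
path = parts[0]
for p in parts[1:]:
    path = path + "/" + p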
"Call as less functions as possible" decreases readability and goes against Pythonic principle of code reuse. And so it's not a style people use in Python.
Before introducing performance optimizations at the expense of readability, look into modules like psyco that will do some JIT-ish compiling of distinct functions, often with striking results, with no impairment of readability.
Then if you really want to embark on the optimization path, you must first learn to measure and profile. Optimization MUST BE QUANTITATIVE - do not go with your gut. The hotspot profiler will show you the functions where your program is burning up the most time.
If profiling turns up that a function like this is being called frequently:
def get_order_qty(ordernumber):
    # look up order in database and return quantity
If there is any repetition of ordernumbers, then memoization would be a good optimization technique to learn, and it is easily packaged in a @memoize decorator so that there is little impact on program readability. The effect of memoizing is that values returned for a given set of input arguments are cached, so that the expensive function need only be called once, with subsequent calls resolved against the cache.
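A bare-bones version of such a decorator (this sketch ignores keyword and unhashable arguments, and the database lookup is faked for illustration) might look like:

import functools

def memoize(func):
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        # only positional, hashable arguments are handled here
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    return wrapper

@memoize
def get_order_qty(ordernumber):
    # stand-in for the real database lookup
    print("querying database for %s" % ordernumber)
    return 42

Calling get_order_qty("A1") twice then only performs the "lookup" once; the second call is answered from the cache. (On Python 3.2+, functools.lru_cache(maxsize=None) gives you essentially the same thing for free.)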
Lastly, consider lifting invariants out of loops. For large multi-dimensional structures, this can save a lot of time - in fact in this case, I would argue that this optimization improves readability, as it often serves to make clear that some expression can be computed at a high-level dimension in the nested logic.
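A small before/after example of that kind of hoisting (the functions are made up):

import math

def normalise_slow(rows):
    out = []
    for row in rows:
        for x in row:
            # the norm depends only on `row`, yet is recomputed for every element
            out.append(x / math.sqrt(sum(v * v for v in row)))
    return out

def normalise_fast(rows):
    out = []
    for row in rows:
        norm = math.sqrt(sum(v * v for v in row))   # hoisted out of the inner loop
        for x in row:
            out.append(x / norm)
    return out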
BTW, is this really what you meant?
In an if statement with an or always put the condition most likely to fail first, so the second will not be checked.
I should think this might be the case for "and", but an "or" will short-circuit if the first value is True, saving the evaluation of the second term of the conditional. So I would change this optimization "rule" to:
If testing "A and B", put A first if
it is more likely to evaluate to
False.
If testing "A or B", put A first if
it is more likely to evaluate to
True.
But often, the sequence of conditions is driven by the tests themselves:
if obj is not None and hasattr(obj,"name") and obj.name.startswith("X"):
You can't reorder these for optimization - they have to be in this order (or you can just let the exceptions fly and catch them later):
if obj.name.startswith("X"):
Sure, follow Python best practices (and in fact I agree with the first two recommendations), but maintainability and efficiency are not opposites; they are mostly togethers (if that's a word).
Statements like "always write your IF statements a certain way for performance" are a-priori, i.e. not based on knowledge of what your program spends time on, and are therefore guesses. The first (or second, or third, whatever) rule of performance tuning is don't guess.
If after you measure, profile, or in my case do this, you actually know that you can save much time by re-ordering tests, by all means, do. My money says that's at the 1% level or less.
My visceral reaction is this:
I've worked with guys like your colleague and in general I wouldn't take advice from them.
Ask him if he's ever even used a profiler.