I'm encountering some behaviour which I can't explain.
I have some expensive functions that get called repeatedly; I have decorated them with @lru_cache(None) to help speed things up. My run times were still quite slow after doing that, so I was a little confused.
I then realised that some of these functions had custom objects as parameters. My understanding is that by default, the hash for any custom object is based on its ID. So my theory was that some of these expensive functions were being re-evaluated despite these arguments containing identical data. My objects are only used to group immutable data, so I'm comfortable with looking up the cached value where the data within those objects is the same.
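To illustrate what I mean (Point here is a made-up stand-in for my objects):

```python
from functools import lru_cache

class Point:
    """Stand-in for one of the custom objects described above."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

@lru_cache(None)
def expensive(p):
    return p.x + p.y

a = Point(1, 2)
b = Point(1, 2)   # identical data, but a different object

expensive(a)
expensive(b)
# Default hash/eq are identity-based, so this is 2 misses, not 1 hit.
print(expensive.cache_info().misses)  # 2
```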
So based on my understanding of the lru_cache function, I added a __hash__ method to my objects, just doing something very crude for starters:
def __hash__(self):
    return hash(str(self.__dict__))
So my theory is that my program should now be much quicker, as the caching will now take place on some of these expensive functions where it wasn't before.
To my dismay, my program is vastly slower; possibly it's getting stuck somewhere as I have not even had the patience to let it finish. For context, without the custom __hash__ methods a test case ran in about 16s; after adding the __hash__ methods the same test case was still running after about 10 minutes.
I don't have a deep understanding of how lru_cache works, but I have had a look at the source code and as far as I can tell it will just use my __hash__ function when it encounters those objects as parameters. Based on the drastic increase in run time, my current theory is that this is somehow causing the program to get stuck somewhere, rather than the cache lookups actually taking that long for some reason. But I can't see any reason why that would happen.
This feels like a bit of a wild goose chase to me but I can't imagine I'm the first person to try this. Does anybody know why this might happen?
Thanks
Edit:
I ran an even smaller test case to check if the program is actually terminating; it is. The smaller test case took 2.5s to run without the custom __hash__ functions, and 40s with them.
I have to stress that nothing else is changing between these two runs. The only difference is that I am adding the __hash__ function described above to three classes which take a journey around my code. Therefore I think the only possible conclusion is that my __hash__ function is somehow hugely slower than the default that would otherwise be used by lru_cache. That is, unless implementing a custom __hash__ function has other (invisible) costs that I'm not aware of.
I'm still at a loss to explain this. These are quite large objects which contain a lot of data, so str(self.__dict__) will be a pretty long string (probably thousands of characters). However I don't believe that hashing should take appreciably longer for a longer string. Perhaps Python does huge amounts of hashing in the background in various places and this small difference can add up? It seems far-fetched to me but there don't seem to be many options - the only alternative I can see is some weird interaction with the lru_cache logic which leads to a big slow-down. I'll keep doing experiments but hopefully someone will know the answer!
Edit 2:
I followed Samwise's suggestion and benchmarked this __hash__ function and it does seem to be genuinely a lot slower, and given the number of calls I can believe that this is the entire reason for my issue. I'm guessing that the self.__dict__ part is the bottleneck but my intuition about this doesn't have the best track-record so far.
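For reference, the benchmark was along these lines (Big is a made-up stand-in for one of my large objects):

```python
import timeit

class Big:
    """Stand-in for a large object with lots of state."""
    def __init__(self):
        self.data = {f"key{i}": i for i in range(1000)}

    def __hash__(self):
        # str(self.__dict__) builds a large string on every single call
        return hash(str(self.__dict__))

b = Big()
# Time the custom __hash__ in isolation, away from lru_cache
t = timeit.timeit(lambda: hash(b), number=10_000)
```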
That still leaves me with the problem of trying to speed up my code, but at least I know what's going on now.
Edit 3:
For anyone else who encounters this problem in the future - I decided to just pre-compute a hash value in my initialiser for my objects and return that in my __hash__ function, and that has sped things up massively. This solution does depend on the object not being mutated after creation.
The answer to this question ended up being quite simple - str(self.__dict__) is actually a pretty slow thing to run on every function call. I'm not sure why I didn't think of that in the first place.
Ultimately what I decided to do was just add an attribute to my classes, _hash, which I set equal to hash(str(self.__dict__)) at the end of initialising a new object. Then my custom __hash__ method just returns the value of _hash, so that lru_cache now works for functions taking my objects as arguments, without having to build and hash str(self.__dict__) on every call.
I should make it clear that this only works under the assumption that the object has its entire state defined at initialisation and doesn't get mutated over its lifetime - if it does, then the hash will go out of date and you'll end up getting hits from the cache that aren't appropriate.
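In sketch form (the class name and attributes are made up, but this is the pattern):

```python
class Journey:
    def __init__(self, origin, destination, stops):
        self.origin = origin
        self.destination = destination
        self.stops = stops
        # Precompute once at the end of __init__; only safe because the
        # object is never mutated after creation.
        self._hash = hash(str(self.__dict__))

    def __hash__(self):
        return self._hash

    def __eq__(self, other):
        # lru_cache needs __eq__ too: equal hashes only produce a cache
        # hit if the objects also compare equal.
        return isinstance(other, Journey) and self.__dict__ == other.__dict__
```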
Related
I want to know: if I wrote, for example, 100 functions in a class (or even without a class) and only used one function each time I call the class, do all these uncalled and unused functions influence the performance or count for something negative?
The answer is practically no. Chunks of code that aren't executed don't influence the performance of the program. This is true for most / all programming languages - not just Python.
That being said, there are some scenarios where this is not accurate:
If your program is very large, it may take a while to load. Once it loads, the execution time with or without the redundant code is the same, but there's a difference in load time.
More code may impact memory organization, which in turn may impact the OS' ability to cache stuff in an effective manner. It's an indirect impact, and unless you know exactly what you're doing it's mostly theoretical.
If you have a very large number of methods in a class, looking up a given method in a class' dictionary may take longer. The average cost of getting an item from a dict is O(1), but worst case can be O(N). You'll have to do a lot of optimization to (maybe) get to a point where you care about this.
There might be some other obscure scenarios in which code size impacts performance - but again, it's more theory than practice.
I have a bit of a stylistic question about parameter validation.
(Using Python)
Say I have a method with a parameter a, which needs to be an int, and maybe needs to be in a certain range - i.e. a list index or something. I could use assertions/other validation to ensure this, but what if I only call the function from one or two places, and the parameter is validated to the proper value/type there? Maybe it's possible that the function could be called from other places in the future, but for now, it is 'basically' impossible to have an invalid parameter passed.
It feels unnecessary to add validation code to something that doesn't really need it, but it also seems sloppy to leave the function open to raising an uncaught error if it's called from somewhere different.
Sorry if this is too abstract - I expect the answer may just be "it depends" but I was curious if there was a general consensus about this.
In general, I think it does not hurt to validate input parameters to a method every time it is called, even if it is unlikely that the parameters are wrong. The computational overhead is negligible in most cases (say checking the type with if type(x) is not int: raise TypeError takes ~100 ns on my laptop when no exception is raised). Besides, I'm not sure that doing conditional validation is worth it with regards to code maintainability (it just makes things more complicated).
Of course, this is also problem specific. For instance, if you have a computationally critical function that is called repeatedly (say more than a million times) within a loop, it is probably worth skipping the validation step and checking the parameters beforehand.
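As a concrete sketch of that kind of up-front check (the function and its names are invented for illustration):

```python
def get_element(items, a):
    # Fail fast with a clear message rather than an obscure error later.
    # Note: bool is a subclass of int, so exclude it explicitly.
    if not isinstance(a, int) or isinstance(a, bool):
        raise TypeError(f"a must be an int, got {type(a).__name__}")
    if not 0 <= a < len(items):
        raise IndexError(f"a must be in range [0, {len(items)}), got {a}")
    return items[a]
```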
Captain Hindsight, reporting in:
After reading through the comment and answer and running a few tests, I found out that I had made a subtle error in my calculations. Turns out, I was comparing compiled lookups to interpreted calls. When I precompiled the call using plain timeit rather than the IPython line magic (i.e. timeit.timeit(codestr, setup_codestr)), I found that the function calls were indeed on the same order of magnitude as the lookups :)
Now there's a whole world of caching function results, precompiling functions, and precompiling types to explore! ..and that's nice :)
For posterity:
I realize that sounds like a strange question, but someone might know a way around this, and that would be great. So here goes:
If I do something like:
%timeit somelist[42]
Then I get times in the 90 nanosecond range. A slice will get it up to 190ish; and, to my pleasant surprise, even big crazy ones were still fast. This bad boy, for instance, weighs in at 385 nanoseconds:
%timeit some_nested_list[2:5][1][6:13]
Here's the thing. Function calls, it seems, are a lot slower than that. I like decomposing problems functionally, and am starting to give functional programming a bit more thought, but the speed difference is significant (3.34 microseconds vs 100-150 nanoseconds, realistic actual averages with conditionals, etc.). The following takes 3.34 micros:
def func():
    some_nested_list[2:5][1][6:13]

%timeit func()
So, there's presumably a lot of functional programmers out there? You all must have dealt with this little hiccup? Someone care to point me in the right direction?
Not really. Python function calls involve a certain amount of overhead for setting up the stack frame, etc., and you can't eliminate that overhead while still writing a Python function. The reason the operations in your example are fast is that you're doing them on a list, and lists are written in C.
One thing to keep in mind is that, in many practical situations, the function call overhead will be small relative to what the function actually does. See this question for some discussion. However, if you move toward a pure-functional style in which each function just evaluates one expression, you may indeed suffer a performance penalty.
An alternative is to look at PyPy, which makes many pure-Python operations faster. I don't know whether it improves function call speed specifically. Also, by using PyPy you restrict the set of libraries you can use.
Finally, there is Cython, which allows you to write code in a language that looks basically the same as Python, but actually compiles to C. This can be much faster than Python in some cases.
The bottom line is that how to speed up your functions depends on what your functions actually do. There is no way to magically make all function calls faster while still keeping everything else about Python the same. If there were, it probably would have already been added to Python.
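The gap is easy to reproduce with the non-magic timeit API (the data and sizes here are invented, and absolute numbers vary by machine, so treat this as illustrative):

```python
import timeit

some_nested_list = [list(range(20)) for _ in range(10)]

def func():
    return some_nested_list[2:5][1][6:13]

n = 100_000
inline_time = timeit.timeit("some_nested_list[2:5][1][6:13]",
                            globals=globals(), number=n)
call_time = timeit.timeit("func()", globals=globals(), number=n)
# call_time pays the frame-setup overhead on top of the same slicing work,
# so it comes out larger than inline_time.
```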
When I started learning Python, I created a few applications just using functions and procedural code. However, now I know classes and realized that the code can be much readable (and subjectively easier to understand) if I rewrite it with classes.
How much slower the equivalent classes may get compared to the functions in general? Will the initializer, methods in the classes make any considerable difference in speed?
To answer the question: yes, it is likely to be a little slower, all else being equal. Some things that used to be variables (including functions) are now going to be object attributes, and self.foo is always going to be slightly slower than foo regardless of whether foo was a global or local originally. (Local variables are accessed by index, and globals by name, but an attribute lookup on an object is either a local or a global lookup, plus an additional lookup by name for the attribute, possibly in multiple places.) Calling a method is also slightly slower than calling a function -- not only is it slower to get the attribute, it is also slower to make the call, because a method is a wrapper object that calls the function you wrote, adding an extra function call overhead.
Will this be noticeable? Usually not. In rare cases it might be, say if you are accessing an object attribute a lot (thousands or millions of times) in a particular method. But in that case you can just assign self.foo to a local variable foo at the top of the method, and reference it by the local name throughout, to regain 99.44% of the local variable's performance advantage.
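A sketch of that trick (the class and names are invented for illustration):

```python
class Tally:
    def __init__(self):
        self.total = 0

    def add_all_slow(self, values):
        for v in values:
            self.total += v        # attribute lookup + store on every iteration

    def add_all_fast(self, values):
        total = self.total         # hoist the attribute into a local once
        for v in values:
            total += v             # plain local access inside the loop
        self.total = total         # write back once at the end
```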
Beyond that there will be some overhead for allocating memory for instances that you probably didn't have before, but unless you are constantly creating and destroying instances, this is likely a one-time cost.
In short: there will be a likely-minor performance hit, and where the performance hit is more than minor, it is easy to mitigate. On the other hand, you could save hours in writing and maintaining the code, assuming your problem lends itself to an object-oriented solution. And saving time is likely why you're using a language like Python to begin with.
No.
In general you will not notice any difference in performance based on using classes or not. The different code structures implied may mean that one is faster than the other, but it's impossible to say which.
Always write code to be read, then if, and only if, it's not fast enough make it faster. Remember: Premature optimization is the root of all evil.
Donald Knuth, one of the grand old minds of computing, is credited with the observation that "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." Deciding to use procedural techniques rather than object-oriented ones on the basis of speed gains that may well not be realized anyway is not a sensible strategy.
If your code works and doesn't need to be modified then feel free to leave it alone. If it needs to be modified then you should consider a judicious refactoring to include classes, since program readability is far more important than speed during development. You will also see benefits in improved maintainability. An old saw from Kernighan and Plauger's "Elements of Programming Style" still applies:
First, make it work. Then (if it doesn't work fast enough) make it work faster.
But, first and foremost, go for readability. Seriously.
You probably don't care as much as you think you do.
Really.
Sure, code with classes might be a little slower through indirection. Maybe. That is what JIT compilation is for, right? I can never remember which versions of python do this and which don't, because:
Performance doesn't matter.
At least constant performance differences like this. Unless you are doing a hell of a lot of computations (you aren't!), you will spend more time developing/debugging/maintaining your code. Optimize for that.
Really. Because you will never ever be able to measure the difference, unless you are in a tight loop. And you don't want to be doing that in python anyway, unless you don't really care about time. It's not like you're trying to balance your Segway in python, right? You just want to compute some numbers, right? Your computer is really good at this. Trust it.
That said, this doesn't mean classes are the way to go. Just that speed isn't the question you should be asking. Instead, try to figure out what representation will be the best for your code. It seems, now you know classes, you will write clean code in OO fashion. Go ahead. Learn. Iterate.
I'm working in the Google App Engine environment and programming in Python. I am creating a function that essentially generates a random number/letter string and then stores to the memcache.
import random
import string

def generate_random_string():
    # return a random 6-character string of numbers/letters
    chars = string.ascii_letters + string.digits
    return ''.join(random.choice(chars) for _ in range(6))

def check_and_store_to_memcache():
    randomstring = generate_random_string()
    # check against memcache
    # if ok, then store key value with another value
    # if not ok, run generate_random_string() again and check again
Does creating two functions instead of just one big one affect performance? I prefer two, as it better matches how I think, but don't mind combining them if that's "best practice".
Focus on being able to read and easily understand your code.
Once you've done this, if you have a performance problem, then look into what might be causing it.
Most languages, python included, tend to have fairly low overhead for making method calls. Putting this code into a single function is not going to (dramatically) change the performance metrics - I'd guess that your random number generation will probably be the bulk of the time, not having 2 functions.
That being said, splitting functions does have a (very, very minor) impact on performance. However, I'd think of it this way - it may take you from going 80 mph on the highway to 79.99mph (which you'll never really notice). The important things to watch for are avoiding stoplights and traffic jams, since they're going to make you have to stop altogether...
In almost all cases, "inlining" functions to increase speed is like getting a hair cut to lose weight.
Reed is right. For the change you're considering, the cost of a function call is a small number of cycles, and you'd have to be doing it 10^8 or so times per second before you'd notice.
However, I would caution that often people go to the other extreme, and then it is as if function calls were costly. I've seen this in over-designed systems where there were many layers of abstraction.
What happens is there is some human psychology that says if something is easy to call, then it is fast. This leads to writing more function calls than strictly necessary, and when this occurs over multiple layers of abstraction, the wastage can be exponential.
Following Reed's driving example, a function call can be like a detour, and if the detour contains detours, and if those also contain detours, soon there is tremendous time being wasted, for no obvious reason, because each function call looks innocent.
Like others have said, I wouldn't worry about it in this particular scenario. The very small overhead involved in function calls would pale in comparison to what is done inside each function. And as long as these functions don't get called in rapid succession, it probably wouldn't matter much anyway.
It is a good question though. In some cases it's best not to break code into multiple functions. For example, when working with math intensive tasks with nested loops it's best to make as few function calls as possible in the inner loop. That's because the simple math operations themselves are very cheap, and next to that the function-call-overhead can cause a noticeable performance penalty.
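A toy comparison of the two shapes (pure illustration; the arithmetic is deliberately trivial so that the call overhead dominates):

```python
def add(a, b):
    return a + b

def total_with_calls(n):
    total = 0
    for i in range(n):
        total = add(total, i)   # a function call on every iteration
    return total

def total_inlined(n):
    total = 0
    for i in range(n):
        total = total + i       # the same math with no call overhead
    return total
```

Both compute the same sum; timing them over a large n shows the per-call cost of the inner-loop function call.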
Years ago I discovered the hypot (hypotenuse) function in the math library I was using in a VC++ app was very slow. It seemed ridiculous to me because it's such a simple set of functionality -- return sqrt(a * a + b * b) -- how hard is that? So I wrote my own and managed to improve performance 16X over. Then I added the "inline" keyword to the function and made it 3X faster than that (about 50X faster at this point). Then I took the code out of the function and put it in my loop itself and saw yet another small performance increase. So... yeah, those are the types of scenarios where you can see a difference.