Difference between PyInt_FromLong and Py_BuildValue - python

The Python/C API has a number of related functions that perform similar operations where one is usually for general use and another is somehow more efficient or convenient for a specific situation.
For example, PyDict_SetItem and PyDict_SetItemString are the same but the latter is specialized for a C string as a key. For PyList_SetItem and PyList_SET_ITEM, the second has no error checking and doesn't decref the existing item (explained here).
But what is the reason for having special conversions like PyInt_FromLong when there is always Py_BuildValue? Both return a new reference to a Python object and seem to do the exact same thing. This same question applies to functions like PyInt_FromSsize_t, PyTuple_Pack, etc. (all compared to Py_BuildValue with the appropriate format string of course).
When should you use one over the other?

For me, it is a trade-off between performance and simplicity. For code that I've written, I tend to use Py_BuildValue when I don't care about the running time (i.e. returning a set of values that won't change, or the running time is so long the extra overhead doesn't matter.) For performance critical functions, especially those that can be at the innermost level of a loop, I will use the use lowest level API call that I can.
Whether it is worth it or not is a different question. I'll never recoup the time I've spent trying to get the best performance but I know other applications that use my code are measurably faster.

Related

Is there any way to "call a function" in python without incurring the usual performance hit?

Captain Hindsight, reporting in:
After reading through the comment and answer and running a few tests, I found out that I had made a subtle error in my calculations. Turns out, I was
comparing compiled lookups to interpreted calls. When I precompiled
the call using the NON-IPython line magic version ( ie:
timeit.timeit(codestr, setup_codestr), I found that the function calls
were indeed on the same order of magnitude as the lookups :)
Now there's a whole world of caching function results, precompiling functions, and precompiling types to explore! ..and that's nice :)
For posterity:
I realize that sounds like a strange question, but someone might know a way around this, and that would be great. So here goes:
If I do something like:
%%timeit somelist[42]
Then I get times in the 90 nanosecond range. A slice will get it up to 190ish; and, to my pleasant surprise, even big crazy ones were still fast. This bad boy, for instance, weighs in at 385 nanseconds:
%%timeit some_nested_list[2:5][1][6:13]
Here's the thing. Function calls, it seems, are a lot slower than that. I like decomposing problems functionally, and am starting to give functional programming a bit more thought, but the speed difference is significant and (3.34 microseconds vs 100-150 nanoseconds (realistic actual avgs of conditionals, etc)). The following takes 3.34 micros:
def func():
some_nested_list[2:5][1][6:13]
%%timeit func()
So, there's presumably a lot of functional programmers out there? You all must have dealt with this little hiccup? Someone care to point me in the right direction?
Not really. Python function calls involve a certain amount of overhead for setting up the stack frame, etc., and you can't eliminate that overhead while still writing a Python function. The reason the operations in your example are fast is that you're doing them on a list, and lists are written in C.
One thing to keep in mind is that, in many practical situations, the function call overhead will be small relative to what the function actually does. See this question for some discussion. However, if you move toward a pure-functional style in which each function just evaluates one expression, you may indeed suffer a performance penalty.
An alternative is to look at PyPy, which makes many pure-Python operations faster. I don't know whether it improves function call speed specifically. Also, by using PyPy you restrict the set of libraries you can use.
Finally, there is Cython, which allows you to write code in a language that looks basically the same as Python, but actually compiles to C. This can be much faster than Python in some cases.
The bottom line is that how to speed up your functions depends on what your functions actually do. There is no magic way to just magically make all function calls faster while still keeping everything else about Python the same. If there were, they probably would have already added it to Python.

Is Python allowed to optimize a function definition to eliminate unused code?

If I defined a function like this:
def ccid_year(seq):
year, prefix, index, suffix = seq
return year
Is Python allowed to optimize it to be effectively:
def ccid_year(seq):
return seq[0]
I'd prefer to write the first function because it documents the format of the data being passed in but would hope that Python would generate code that is effectively as efficient as the second definition.
The two functions are not equivalent:
def ccid_year_1(seq):
year, prefix, index, suffix = seq
return year
def ccid_year_2(seq):
return seq[0]
arg = {1:'a', 2:'b', 0:'c', 3:'d'}
print ccid_year_1(arg)
print ccid_year_2(arg)
The first call prints 0 and the second prints c.
I'll answer the question at face value later, but first: When in doubt, benchmark it! But first, recall that most time is spent in a small portion of the code (i.e., most code is irrelevant to performance!) and, in CPython, function call overhead usually dominates small inefficiencies. Not to mention that large-scale algorithmic inefficiencies (a.k.a. freaking stupid code) dwarfs micro-optimization concerns.
So either don't worry about this at all, or if you have reason to worry about it, first benchmark alternatives and second don't put it in a function. Note that "reasons to worry about it" must be weighted against the time spent worrying, and the maintenance burden (if there is one) of the manual optimization.
CPython, the reference implementation you most like use, is very conservative about optimizing at this level. While there is a peephole optimizer operating on bytecode, it is limited in scale. More generally, you can't expect much optimization crossing a single statement. The problem with statically optimizing Python code is that there's a billion ways even the most innocently-looking program frament can call into arbitrary code, which might do anything at all, so you can't omit these calls.
While we're at it, your proposed optimization is invalid (in the sense that the program doesn't have the same behavior) if seq is of the wrong type (not a sequence, or a very weird sequence) or length (not exactly three items long)! Any program claiming to implement Python must maintain such differences, so it won't do the transformation you suggest literally. I assume this was just an off-hand illustration, but it does indicate you seriously underestimate how complex Python is (to implement, and doubly so to optimize). I and others have written about this at length before, so I'll stop now before this post becomes even larger.
PyPy on the other hand will, if this function is indeed called from a hot loop, probably optimize this and a million other things you didn't even think of, while compiling it down to a machine code loop that iterates faster than any Python loop could ever iterate on CPython. It will still contain a few checks to break out of the loop and take the proper action (e.g. raise an exception) if necessary, but they'll also be highly efficient if not triggered.
I do not know much about IronPython and Jython and other implementations, but if their lack of consistent several-times-faster-than-CPython benchmark results is any indicator, they do not perform significant optimizations. While the VMs IronPython and Jython include JIT compilers (not - but not quite - entirely unlike PyPy's), these JIT compilers are built for very different languages, and I'd be very surprised if they could look through the mess of code IronPython/Jython must execute to achieve Python semantics and perform such optimizations on it.

How much slower python classes are compared to their equivalent functions?

When I started learning Python, I created a few applications just using functions and procedural code. However, now I know classes and realized that the code can be much readable (and subjectively easier to understand) if I rewrite it with classes.
How much slower the equivalent classes may get compared to the functions in general? Will the initializer, methods in the classes make any considerable difference in speed?
To answer the question: yes, it is likely to be a little slower, all else being equal. Some things that used to be variables (including functions) are now going to be object attributes, and self.foo is always going to be slightly slower than foo regardless of whether foo was a global or local originally. (Local variables are accessed by index, and globals by name, but an attribute lookup on an object is either a local or a global lookup, plus an additional lookup by name for the attribute, possibly in multiple places.) Calling a method is also slightly slower than calling a function -- not only is it slower to get the attribute, it is also slower to make the call, because a method is a wrapper object that calls the function you wrote, adding an extra function call overhead.
Will this be noticeable? Usually not. In rare cases it might be, say if you are accessing an object attribute a lot (thousands or millions of times) in a particular method. But in that case you can just assign self.foo to a local variable foo at the top of the method, and reference it by the local name throughout, to regain 99.44% of the local variable's performance advantage.
Beyond that there will be some overhead for allocating memory for instances that you probably didn't have before, but unless you are constantly creating and destroying instances, this is likely a one-time cost.
In short: there will be a likely-minor performance hit, and where the performance hit is more than minor, it is easy to mitigate. On the other hand, you could save hours in writing and maintaining the code, assuming your problem lends itself to an object-oriented solution. And saving time is likely why you're using a language like Python to begin with.
No.
In general you will not notice any difference in performance based on using classes or not. The different code structures implied may mean that one is faster than the other, but it's impossible to say which.
Always write code to be read, then if, and only if, it's not fast enough make it faster. Remember: Premature optimization is the root of all evil.
Donald Knuth, one of the grand old minds of computing, is credited with the observation that "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." Deciding to use procedural techniques rather than object-oriented ones on the basis of speed gains that may well not be realized anyway is not a sensible strategy.
If your code works and doesn't need to be modified then feel free to leave it alone. If it needs to be modified then you should consider a judicious refactoring to include classes, since program readability is far more important than speed during development. You will also see benefits in improved maintainability. An old saw from Kernighan and Plauger's "Elements of Programming Style" still applies:
First, make it work. Then (if it doesn't work fast enough) make it work faster.
But, first and foremost, go for readability. Seriously.
You probably don't care as much as you think you do.
Really.
Sure, code with classes might be a little slower through indirection. Maybe. That is what JIT compilation is for, right? I can never remember which versions of python do this and which don't, because:
Performance doesn't matter.
At least constant performance differences like this. Unless you are doing a hell of a lot of computations (you aren't!), you will spend more time developing/debugging/maintaining your code. Optimize for that.
Really. Because you will never ever be able to measure the difference, unless you are in a tight loop. And you don't want to be doing that in python anyway, unless you don't really care about time. It's not like you're trying to balance your segway in python, right? You just want to compute some numbers, right? Your computer is really good at this. Trust it.
That said, this doesn't mean classes are the way to go. Just that speed isn't the question you should be asking. Instead, try to figure out what representation will be the best for your code. It seems, now you know classes, you will write clean code in OO fashion. Go ahead. Learn. Iterate.

how fast is python's slice

In order to save space and the complexity of having to maintain the consistency of data between different sources, I'm considering storing start/end indices for some substrings instead of storing the substrings themselves. The trick is that if I do so, it's possible I'll be creating slices ALL the time. Is this something to be avoided? Is the slice operator fast enough I don't need to worry? How about the new object creation/destruction overhead?
Okay, I learned my lesson. Don't optimize unless there's a real problem you're trying to fix. (Of course this doesn't mean to right needlessly bad code, but that's beside the point...) Also, test and profile before coming to stack overflow. =D Thanks everyone!
Fast enough as opposed to what? How do you do it right now? What exactly are you storing, what exactly are you retrieving? The answer probably highly depends on this. Which brings us to ...
Measure! Don't discuss and analyze theoretically; try and measure what is the more performant way. Then decide whether the possible performance gain justifies refactoring your database.
Edit: I just ran a test measuring string slicing versus lookup in a dict keyed on (start, end) tuples. It suggests that there's not much of a difference. It's a pretty naive test, though, so take it with a pinch of salt.
In a comment the OP mentions bloat "in the database" -- but no information regarding what database he's talking about; from the scant information in that comment it would seem that Python string slices aren't necessarily what's involved, rather, the "slicing" would be done by the DB engine upon retrieval.
If that's the actual situation then I would recommend on general principles against storing redundant information in the DB -- a "normal form" (maybe in a lax sense of the expression;-) whereby information is stored just once and derived information is recomputed (or cached charge of the DB engine, etc;-) should be the norm, and "denormalization" by deliberately storing derived information very much the exception and only when justified by specific, well measured retrieval-performance needs.
If the reference to "database" was a misdirection;-), or rather used in a lax sense as I did for "normal form" above;-), then another consideration may apply: since Python strings are immutable, it would seem to be natural to not have to do slices by copying, but rather have each slice reuse part of the memory space of the parent it's being sliced from (much as is done for numpy arrays' slices). However that's not currently part of the Python core. I did once try a patch to that purpose, but the problem of adding a reference to the big string and thus making it stay in memory just because a tiny substring thereof is still referenced loomed large for general-purpose adaptation. Still it would be possible to make a special purpose subclass of string (and one of unicode) for the case in which the big "parent" string needs to stay in memory anyway. Currently buffer does a tiny bit of that, but you can't call string methods on a buffer object (without explicitly copying it to a string object first), so it's only really useful for output and a few special cases... but there's no real conceptual block against adding string method (I doubt that would be adopted in the core, but it should be decently easy to maintain as a third party module anyway;-).
The worth of such an approach can hardly be solidly proven by measurement, one way or another -- speed would be very similar to the current implicitly-copying approach; the advantage would come entirely in terms of reducing memory footprint, which wouldn't so much make any given Python code faster, but rather allow a certain program to execute on a machine with a bit less RAM, or multi-task better when several instances are being used at the same time in separate processes. See rope for a similar but richer approach once experimented with in the context of C++ (but note it didn't make it into the standard;-).
I haven't done any measurements either, but since it sounds like you're already taking a C approach to a problem in Python, you might want to take a look at Python's built-in mmap library:
Memory-mapped file objects behave like both strings and like file objects. Unlike normal string objects, however, these are mutable. You can use mmap objects in most places where strings are expected; for example, you can use the re module to search through a memory-mapped file. Since they’re mutable, you can change a single character by doing obj[index] = 'a', or change a substring by assigning to a slice: obj[i1:i2] = '...'. You can also read and write data starting at the current file position, and seek() through the file to different positions.
I'm not sure from your question if that's exactly what you're looking for. And it bears repeating that you need to take some measurements. Python's timeit library is the easy one to use, but there's also cProfile or hotshot, although hotshot is at risk of being removed from the standard library as I understand it.
Would slices be ineffective because they create copies of the source string? This may or may not be an issue. If it turns out to be an issue, would it not be possible to simply implement a "String view"; an object that has a reference to the source string and has a start and end point.. Upon access/iteration, it just reads from the source string.
premature optimization is the rool of all evil.
Prove to yourself that you really have a need to optimize code, then act.

Does creating separate functions instead of one big one slow processing time?

I'm working in the Google App Engine environment and programming in Python. I am creating a function that essentially generates a random number/letter string and then stores to the memcache.
def generate_random_string():
# return a random 6-digit long string
def check_and_store_to_memcache():
randomstring = generate_random_string()
#check against memcache
#if ok, then store key value with another value
#if not ok, run generate_random_string() again and check again.
Does creating two functions instead of just one big one affect performance? I prefer two, as it better matches how I think, but don't mind combining them if that's "best practice".
Focus on being able to read and easily understand your code.
Once you've done this, if you have a performance problem, then look into what might be causing it.
Most languages, python included, tend to have fairly low overhead for making method calls. Putting this code into a single function is not going to (dramatically) change the performance metrics - I'd guess that your random number generation will probably be the bulk of the time, not having 2 functions.
That being said, splitting functions does have a (very, very minor) impact on performance. However, I'd think of it this way - it may take you from going 80 mph on the highway to 79.99mph (which you'll never really notice). The important things to watch for are avoiding stoplights and traffic jams, since they're going to make you have to stop altogether...
In almost all cases, "inlining" functions to increase speed is like getting a hair cut to lose weight.
Reed is right. For the change you're considering, the cost of a function call is a small number of cycles, and you'd have to be doing it 10^8 or so times per second before you'd notice.
However, I would caution that often people go to the other extreme, and then it is as if function calls were costly. I've seen this in over-designed systems where there were many layers of abstraction.
What happens is there is some human psychology that says if something is easy to call, then it is fast. This leads to writing more function calls than strictly necessary, and when this occurs over multiple layers of abstraction, the wastage can be exponential.
Following Reed's driving example, a function call can be like a detour, and if the detour contains detours, and if those also contain detours, soon there is tremendous time being wasted, for no obvious reason, because each function call looks innocent.
Like others have said, I wouldn't worry about it in this particular scenario. The very small overhead involved in function calls would pale in comparison to what is done inside each function. And as long as these functions don't get called in rapid succession, it probably wouldn't matter much anyway.
It is a good question though. In some cases it's best not to break code into multiple functions. For example, when working with math intensive tasks with nested loops it's best to make as few function calls as possible in the inner loop. That's because the simple math operations themselves are very cheap, and next to that the function-call-overhead can cause a noticeable performance penalty.
Years ago I discovered the hypot (hypotenuse) function in the math library I was using in a VC++ app was very slow. It seemed ridiculous to me because it's such a simple set of functionality -- return sqrt(a * a + b * b) -- how hard is that? So I wrote my own and managed to improve performance 16X over. Then I added the "inline" keyword to the function and made it 3X faster than that (about 50X faster at this point). Then I took the code out of the function and put it in my loop itself and saw yet another small performance increase. So... yeah, those are the types of scenarios where you can see a difference.

Categories

Resources