Is there any way short of reading and parsing a file to see whether a function uses a for loop vs. recursion to get its answer? Assume it can be only one or the other, that the function doesn't raise, and that the function is guaranteed to end.
There is no general method to prove that a computer program finishes (it is proven that there are programs for which you cannot tell). Fortunately, in practice there are static analysis tools that can do this for most small programs.
Also, for most Python functions it is either obvious (to an educated programmer) what complexity they have, or the documentation mentions which algorithm is used.
If you are interested in details, you should read a good book or two about algorithms. You can also take a look at https://cs.stackexchange.com/ , which is a site better suited for general discussion about computational complexity of different algorithms.
Under "basic" conditions, you can at least tell if a function will call itself directly:
def test(x):
    return test(x)

test.__code__.co_name in test.__code__.co_names  # True; co_name can't be changed
test.__name__ in test.__code__.co_names          # True, but __name__ can be changed
This checks if the name of the function is in the global names referenced by the function.
I say "basic" because there are a bunch of ways someone can probably get around this. Lambdas are all named <lambda>, so they won't match the global name they were bound to. A function could be renamed. The inner code could reference a different name for the function. The function could call a second function, which is the real recursive function. And on and on...
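As a minimal sketch of that last point, consider a hypothetical mutually recursive pair (the names are made up for illustration); the co_names check above reports False for both functions even though they are recursive together:

def is_even(n):
    # Calls is_odd, never itself, so its own name is not in its co_names.
    return True if n == 0 else is_odd(n - 1)

def is_odd(n):
    # Calls is_even, never itself.
    return False if n == 0 else is_even(n - 1)

print(is_even.__code__.co_name in is_even.__code__.co_names)  # False
print(is_odd.__code__.co_name in is_odd.__code__.co_names)    # False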
Basically, there is no way to know for sure that a function is recursive.
Related
My idea is to build some kind of caching mechanism for functions. To that end, I need to determine if a function needs to be evaluated. The result of a function depends - for the sake of this example - on its parameters and the actual code of the function. There may be calls to other functions inside the function. Therefore, only the fully inlined code of the function is a useful "value" for determining whether a function needs to be reevaluated.
Is there a good way to get this fully inlined code of a function in python?
Not possible. The "fully inlined code" of a Python function isn't a well-defined concept, for multiple reasons.
First, almost anything a Python function refers to can be redefined at runtime, which invalidates ahead-of-time inlining. You can even replace built-ins like print.
Second, even with no such rebinding, it is impossible to "fully inline" a recursive or indirectly recursive function.
Third, a function can and almost always will invoke code dynamically based on the provided parameters. def f(x): return x.something() requires a concrete value of x to determine what something is. Even something like def f(x, y): return x + y dynamically invokes an __add__ or __radd__ callback that can't be determined until the actual values of x and y are known.
Fourth, Python functions can invoke C code, which cannot be inlined at Python level.
Finally, even if all the problems with "fully inlining" a function didn't exist, a "fully inlined" version of a function still wouldn't be enough. There is no general way to determine if a Python function performs state mutation, or depends on mutable state that has been mutated, both of which are major issues for caching.
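To illustrate that last point with a made-up example, the function below has exactly the same code (and so would hash to the same value) before and after the call, yet it returns different results because it depends on mutable module-level state:

rates = {"EUR": 2.0}  # hypothetical mutable module-level state

def convert(amount):
    # The function's code never changes, but its result does.
    return amount * rates["EUR"]

print(convert(100))   # 200.0
rates["EUR"] = 1.25   # mutated elsewhere in the program
print(convert(100))   # 125.0 -- same code, same argument, different result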
Although it's possible in Python, code is not generally in a state of flux, if your worry is that it might change behind your back. There might be something in the dis module that could help you.
Otherwise you could use memoization to cache function results by mapping them to their parameters. You could use the functools.lru_cache() decorator, but writing a decorator to do that is pretty easy. See What is memoization and how can I use it in Python?
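As a rough sketch, a hand-written memoizing decorator (or functools.lru_cache, which does the same job) could look like this; slow_square and fib are made-up examples:

import functools

def memoize(func):
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        # Compute the result only the first time these arguments are seen.
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    return wrapper

@memoize
def slow_square(n):
    print("computing", n)
    return n * n

@functools.lru_cache(maxsize=None)  # the standard-library equivalent
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(slow_square(4))  # prints "computing 4", then 16
print(slow_square(4))  # cached: prints only 16
print(fib(30))         # 832040

Note that this simple version only works for hashable positional arguments, and it is exactly the kind of caching that breaks down in the presence of the mutable state described above.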
Apart from mutating code (which might even affect pure functions), the value of any function that relies on changing data is also indeterminate, e.g. some function that returns the latest stock price for a company. Memoization would not help here, nor would producing a function code signature/hash to detect changes in code.
I just want to check whether a particular function is called by another function or not. If it is, I have to store that function in one category, and functions that do not call the particular function will be stored in a different category.
I have 3 .py files with classes and functions in them. I need to check each and every function. For example, take a function trial(): if a function calls trial(), then that function is in the "example" category, otherwise it is in the "non-example" category.
I have no idea what you are asking, but even if it is technically possible, the one and only answer is: don't do that.
If your design is such that method A needs to know whether it was called from method B or C, then your design is most likely ... broken. Having such dependencies within your code will quickly turn the whole thing unmaintainable, simply because you will very soon be constantly asking yourself "that path seems fine, but what will happen over here?"
One way out of that: create different methods, so that B can call something other than what C calls; but of course, you should still extract the common parts into one method.
Long story short: my non-answer is: take your current design and have some other people review it. You should step back from whatever you are doing right now and find a way to do it differently! You know, most of the time, when you start thinking about strange/awkward ways to solve a problem within your current code, the real problem is your current code.
EDIT: given your comments ... The follow-up questions would be: what is the purpose of this classification? How often will it need to take place? You know, will it happen only once (then manual counting might be an option), or after each and every change? For the "manual" approach - IDEs such as PyCharm are pretty good at analyzing Python source code; you can do simple things like "find usages in workspace", so the IDE lists all the methods that invoke some method A. But of course, that works only one level deep.
The other option I see: write some test code that imports all your methods; and then see how far the inspect module helps you. Probably you could really iterate through a complete method body and simply search for matching method names.
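A minimal sketch of that idea, assuming your modules can be imported and the target function is called trial() as in the question (the module name my_module is made up); it simply searches each function's source text, so it can produce false positives (e.g. a call that only appears in a comment):

import inspect
import my_module  # assumption: one of your three .py files, importable

def categorize(module, target_name="trial"):
    example, non_example = [], []
    for name, func in inspect.getmembers(module, inspect.isfunction):
        if name == target_name:
            continue
        # Crude textual check: does the function's source mention "trial(" ?
        if target_name + "(" in inspect.getsource(func):
            example.append(name)
        else:
            non_example.append(name)
    return example, non_example

example, non_example = categorize(my_module)
print("call trial():", example)
print("do not call trial():", non_example)

Functions defined inside classes are not picked up by this module-level scan; you would need to walk the classes (or use the ast module) for a more thorough analysis.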
I was writing some Python code and, as usual, I try to make my functions small and give them a clear name (although sometimes a little too long). I get to the point where there are no global variables and everything a function needs is passed to it.
But I thought, in this case, every function has access to any other function. Why not limit their access to other functions just like we limit the access to other variables.
I was thinking to use nested functions but that implies closures and that's even worse for my purpose.
I was also thinking about using objects and I think this is the point of OOP, although it'll be a little too much boilerplate in my case.
Has anyone had this problem on her/his mind, and what's the solution?
It is not a good idea to have global mutable data, e.g. variables. The mutability is the key here. You can have constants and functions to your heart's content.
But as soon as you write functions that rely on globally mutable state it limits the reusability of your functions - they're always bound to that one shared state.
For the sake of everyone reading your code, grouping the functions into classes will help to mentally categorize them. Using the class self parameter helps to organize the variables, too, by grouping them in a class.
You can limit their access with a single leading underscore at the beginning of the function name.
Global variables are discouraged because they make it hard to keep track of the state of the program. If I'm debugging a 1,000-line file, and somewhere in the middle of a function I see some_well_named_flag = False, I'm going to have a lot of hunting to do to see what else in the program it affects.
Functions don't have state. The places where they can modify the program are more or less limited to the parameters and return value.
If you're still concerned about controlling access to functions, there are other languages like Java or C++ that can help you do that. One convention with Python is to prefix functions that shouldn't be used outside of the class with an underscore, and then trust people not to call them from outside the class.
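For illustration, the convention looks like this in a hypothetical module (the names are made up); a single leading underscore does not enforce anything, it just signals intent, and from module import * skips such names by default:

# utils.py (hypothetical module)

def _build_payload(data):
    # Leading underscore: internal helper, not meant to be called from outside.
    return {"wrapped": data}

def send(data):
    # Public entry point; other code is expected to use this one.
    payload = _build_payload(data)
    print("sending", payload)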
I read somewhere that it is bad to define functions inside of functions in Python, because it makes Python create a new function object every time the outer function is called. Someone basically said this:
# bad
def f():
    def h():
        return 4
    return h()

# faster
def h():
    return 4

def f(h=h):
    return h()
Is this true? Also, what about a case when I have a ton of constants like this:
x = (...)  # long tuple of strings
# and several more similar tuples
# which are used to build up data structures

def f(x):
    # parse x using the constants above
    return parsed_dictionary
Is it faster if I put all the constants inside the definition of f? Or should I leave them outside and bind them to local names in a keyword argument? I don't have any data to do timings with unfortunately, so I guess I'm asking about your experiences with similar things.
Short answers to your questions: it is true. Inner functions are created each time the outer function is called, and this takes some time. Also, access to objects defined outside of a function is slower compared to access to local variables.
However, you also asked a more important question - "should I care?". The answer to this is, almost always, NO. The performance difference will be really minor, and the readability of your code is much more valuable.
So, if you think that this function belongs in the body of the other function and makes no sense elsewhere - just put it inside and do not care about performance (at least, until your profiler tells you otherwise).
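If you do want to see the difference for yourself, a quick timeit comparison along these lines is enough (the exact numbers depend on your machine, so none are quoted here):

import timeit

def f_inner():
    def h():          # a new function object is created on every call
        return 4
    return h()

def h():
    return 4

def f_outer(h=h):     # reuses the prebuilt h via a default argument
    return h()

print("inner def:", timeit.timeit(f_inner))
print("outer def:", timeit.timeit(f_outer))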
When a function is executed, all the code that is inside it needs to be executed. So of course, very simply said, the more you put in the function, the more effort it takes Python to execute it. Especially when you have constant things that do not need to be constructed at the run time of the function, you can save quite a bit by putting them in an upper scope, so that Python only needs to look them up instead of generating them again and allocating (temporary) memory to hold them for the short run time of the function.
So in your case, if you have a large tuple or anything that does not depend on the input x to the function f, then yes, you should store it outside.
Now the other thing you mentioned is a scope lookup for functions or constants using a keyword argument. In general, looking up variables in an outer scope is more expensive than looking them up in the most local scope. So yes, when you define those constants at module level and access them inside a function, the lookup will be more expensive than if the constants were defined inside the function. However, actually defining them inside the function (with memory allocation and actual generation of the data) is likely to be more expensive, so it's really not a good option.
Now you could pass the constants as a keyword argument to the function, so the lookup inside the function would be a local scope lookup. But very often, you don't need those constants a lot. You maybe access them once or twice in the function, and that's absolutely not worth adding the overhead of another argument to the function and the possibility of passing something different/incompatible to it (breaking the function).
If you know that you access some global stuff multiple times, then create a local variable at the top of the function which looks that global stuff up once and then use the local variable in all further places. This also applies to member lookup, which can be expensive too.
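A minimal sketch of that local-alias trick (the constant, the function, and the data are made up for illustration):

SUFFIXES = (".txt", ".csv", ".json")   # hypothetical module-level constant

def count_matches(names):
    suffixes = SUFFIXES          # one global lookup instead of one per iteration
    endswith = str.endswith      # hoist the member lookup out of the loop as well
    return sum(1 for name in names if endswith(name, suffixes))

print(count_matches(["a.txt", "b.py", "c.json"]))  # 2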
In general though, these are all rather micro optimizations and you are unlikely to run into any problems if you do it one or the other way. So I’d suggest you to write clear code first and make sure that the rest of it works well, and if you indeed run into performance problems later, then you can check where the issues are.
In my tests, the fastest way to do what I needed was to define all the constants outside, then make the list of functions that need those constants outside, then pass the list of functions to the main function. I used dis.dis, cProfile.run, and timeit.timeit for my tests, but I can't find the benchmarking script and can't be bothered to rewrite it and put up the results.
I'm writing a Python script, and it needs to print a line to console. Simple enough, but I can't remember (or decide) whether the accepted practice is to print in the function, for example:
def exclaim(a):
    print(a + '!')
or have it return a string, like:
def exclaim(a):
    return a + '!'
and then print in the main script. I can make an argument for either method. Is there a generally accepted way to do this or is it up to preference?
EDIT: To clarify, this isn't the function I'm working with. I don't feel comfortable posting the code here, so I wrote those functions as an oversimplified example.
Normally a function should generate and return its output and let the caller decide what to do with it.
If you print the result directly in the function then it's difficult to reuse the function to do something else, and it's difficult to test as you have to intercept stdout rather than just testing the return value.
However in this trivial example it hardly seems necessary to either reuse or test the function, or even to have the function at all.
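To make the testing point concrete, here is a small sketch comparing the two styles using the exclaim example from the question; the returning version can be asserted directly, while the printing version requires capturing stdout:

import io
from contextlib import redirect_stdout

def exclaim_return(a):
    return a + '!'

def exclaim_print(a):
    print(a + '!')

# Testing the returning version is a one-liner.
assert exclaim_return('hi') == 'hi!'

# Testing the printing version means intercepting stdout.
buffer = io.StringIO()
with redirect_stdout(buffer):
    exclaim_print('hi')
assert buffer.getvalue() == 'hi!\n'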
One of the things I most appreciate about Python is that it encourages proper separation of concerns. There's so little friction in splitting a task down to its atomic components.
Calculating a value and displaying a value are two different things, they should not be mixed together.
One thing to keep in mind is that if you print it, your function will return nothing, so if you ever plan on using the line you wish to write, you should return it.
If you are sure that the function cannot be re-used, and you wrote it just to enhance code readability, you could go with printing within the function.
Otherwise, if there is any way you can re-use it later, you should return the value instead. The second approach is preferred, as one of the purposes of a function is re-use.
There is no real overhead.
Functions should, in my opinion, return values.
I mix both ways all the time, but I think it could boil down like this:
If the sole purpose of the whole function is output to terminal (print a message to the user), then the function should print. Optionally, you could use some return value as an error status or boolean value, but as a side step. Usually the printed string/message would not have other uses inside the program flow, because the user received the message and the string variable, having been consumed, can be garbage-collected.
If the function actually PERFORMS something, thus returning a real value (even if this value is a string), and you'd like to check that value on the terminal, then you call the function and print its result in another statement. In case of debugging, you just remove or comment out that print statement, and the function will continue to perform silently. Just for comparison, if the print statement were inside the function, you'd need to modify the function itself.
As a last consideration, both approaches are somewhat equivalent, but for anyone reading your code (even yourself some time later), the way it is laid out helps capture the INTENT of the code, from a readability point of view.
Hope this helps
If a function computes values and then prints them immediately, I regularly find that I will eventually want to display or use the values in a different place or in a different way.
I'll decide to write the results to a file, send them out as part of a web page, use them as a partial value for other functions, display them sorted or filtered differently, etc.
It's fine to have functions print computed values immediately when you're prototyping, or if the functions are very simple...but sooner or later, if a program grows into something larger and more valuable, you'll need to convert that simple "just print it!" function into a purer compute-and-return function.
So I try to just start with a return-the-results, print-them-later coding style. It seems cleaner and generally saves me time later. But as with all things stylistic, it's a judgment call, and YMMV.