I have a function of around 80 lines in a file. I need the same functionality in another file, so I am currently importing the first file to get the function.
My question is: in terms of running time, which technique is better: importing the complete file and calling the function, or copying the function as-is and running it from the same package?
I know it won't matter much in practice, but I want to understand this: in a large project, is it better to import a complete file in Python or to add the function to the current namespace?
Importing is how you're supposed to do it. That's why it's possible. Performance is a complicated question, but in general it really doesn't matter. People who really, really need performance, and can't be satisfied by just fixing the basic algorithm, are not using Python in the first place. :) (At least not for the tiny part of the project where the performance really matters. ;) )
Importing is good because it helps you manage things easily. What if you needed the same function again? Instead of making changes in multiple places, there is just one centralized location: your module.
In case the function is small and you won't need it anywhere else, put it in the file itself.
If it is complex and would require to be used again, separate it and put it inside a module.
Performance should not be your concern here. It should hardly matter. And even if it does, ask yourself - does it matter to you?
Copy/Paste cannot be better. Importing affects load-time performance, not run-time (if you import it at the top-level).
The whole point of importing is to allow code reuse and organization.
Remember too that you can do either
import MyModule
to get the whole file or
from MyModule import MyFunction
for when you only need to reference that one part of the module.
If the two modules are unrelated except for that common function, you may wish to consider extracting that function (and maybe other things that are related to that function) into a third module.
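As a minimal sketch of why the two import forms cost about the same at run time: both load the whole module once, cache it in sys.modules, and then only bind names. The standard-library json module is used here purely as a stand-in for MyModule.

```python
import sys
import json
from json import dumps

# Both forms refer to the very same function object; nothing is copied.
same_function = dumps is json.dumps

# The module is loaded once and cached in sys.modules.
cached = sys.modules["json"] is json

# A second import statement is a cache lookup, not a reload.
import json as json_again
same_module = json_again is json

print(same_function, cached, same_module)
```

The choice between the two forms is therefore about naming and readability rather than speed.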
This may be a subjective question so I understand if it gets shut down, but this is something I've been wondering about ever since I started to learn python in a more serious way.
Is there a generally accepted 'best practice' about whether importing an additional module to accomplish a task more cleanly is better than avoiding the call and 'working around it'?
For example, I had some feedback on a script I worked on recently, and the suggestion was that I could have replaced the code below with a glob.glob() call. I avoided this at the time, because it meant adding another import that seemed unnecessary to me (and the actual flow of filtering the lines just meshed with my thought process for the task).
    headers = []
    with open(hhresult_file) as result_fasta:
        for line in result_fasta:
            if line.startswith(">"):
                line = line.split("_")[0]
                headers.append(line.replace(">",""))
Similarly, I decided to use an os.rename() call later in the script for moving some files rather than import shutil.
Is there a right answer here? Are there any overheads associated with calling additional modules and creating more dependencies (let's say, for instance, that the module wasn't a built-in Python module) vs. writing slightly 'messier' code using modules that are already in your script?
This is quite a broad question, but I'll try to answer it succinctly.
There is no real best practice, however, it is generally a good idea to recycle code that's already been written by others. If you find a bug in the imported code, it's more beneficial than finding one in your own code because you can submit a ticket to the author and have it fixed for potentially a large group of people.
There are certainly considerations to be made when making additional imports, mostly when they are not part of the Python standard library.
Sometimes adding in a package that is a little bit too 'magical' makes code harder to understand, because it's another library or file that somebody has to look up to understand what is going on, versus just a few lines that might not be as sophisticated as the third party library, but get the job done regardless.
If you can get away with not making additional imports, you probably should, but if it would save you substantial amounts of time and headache, it's probably worth importing something that has been pre-written to deal with the problem you're facing.
It's a continual consideration that has to be made.
Python is an extremely elegant language. Well, except... except imports. I still can't get them to work the way that seems natural to me.
I have a class MyObjectA which is in file mypackage/myobjecta.py. This object uses some utility functions which are in mypackage/utils.py. So in my first lines in myobjecta.py I write:
from mypackage.utils import util_func1, util_func2
But some of the utility functions create and return new instances of MyObjectA. So I need to write in utils.py:
from mypackage.myobjecta import MyObjectA
Well, no I can't. This is a circular import and Python will refuse to do that.
There are many questions here regarding this issue, but none seems to give a satisfactory answer. From what I can read in all the answers:
1) Reorganize your modules, you are doing it wrong! But I do not know how better to organize my modules even in such a simple case as I presented.
2) Try just import ... rather than from ... import ... (personally I hate to write and potentially refactor all the full name qualifiers; I love to see what exactly I am importing into the module from the outside world). Would that help? I am not sure, still there are circular imports.
3) Do hacks like importing something in the inner scope of a function body just one line before you use something from the other module.
I am still hoping there is solution number 4) which would be Pythonic in the sense of being functional and elegant and simple and working. Or is there not?
Note: I am primarily a C++ programmer, and the example above is so easily solved by including the corresponding headers that I can't believe it is not possible in Python.
There is nothing hackish about importing something in a function body, it's an absolutely valid pattern:
    def some_function():
        import logging
        do_some_logging()
Usually ImportErrors are only raised because of the way import evaluates the top-level statements of the entire file when it runs.
As long as you do not have a logical circular dependency, nothing is impossible in Python.
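Here is a minimal sketch of the function-body import applied to the MyObjectA/utils situation from the question. The package name pkg_lazy and the helper make_default are hypothetical; the files are written to a temporary directory only so the example is self-contained.

```python
import importlib
import os
import sys
import tempfile
import textwrap

# Hypothetical package: pkg_lazy/myobjecta.py and pkg_lazy/utils.py.
FILES = {
    "__init__.py": "",
    "myobjecta.py": """
        from pkg_lazy.utils import make_default

        class MyObjectA:
            def __init__(self, name):
                self.name = name
    """,
    "utils.py": """
        def make_default():
            # Deferred import: runs at call time, once myobjecta is fully loaded.
            from pkg_lazy.myobjecta import MyObjectA
            return MyObjectA("default")
    """,
}

root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "pkg_lazy")
os.mkdir(pkg_dir)
for filename, body in FILES.items():
    with open(os.path.join(pkg_dir, filename), "w") as handle:
        handle.write(textwrap.dedent(body))

sys.path.insert(0, root)
importlib.invalidate_caches()

myobjecta = importlib.import_module("pkg_lazy.myobjecta")
obj = myobjecta.make_default()
print(type(obj).__name__, obj.name)
```

Because utils.py no longer imports myobjecta at its top level, there is no cycle while the modules are being loaded; the dependency is resolved only when make_default is called.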
There is a way around it if you positively want your imports on top:
From David Beazley's excellent talk Modules and Packages: Live and Let Die! - PyCon 2015, 1:54:00, here is a way to deal with circular imports in Python:
    try:
        from images.serializers import SimplifiedImageSerializer
    except ImportError:
        import sys
        SimplifiedImageSerializer = sys.modules[__package__ + '.SimplifiedImageSerializer']
This tries to import SimplifiedImageSerializer and, if an ImportError is raised (due to a circular import or because it does not exist), pulls it from the import cache (sys.modules).
PS: You have to read this entire post in David Beazley's voice.
Don't import mypackage.utils to your main module, it already exists in mypackage.myobjecta. Once you import mypackage.myobjecta the code from that module is being executed and you don't need to import anything to your current module, because mypackage.myobjecta is already complete.
What you want isn't possible. There's no way for Python to know in which order it needs to execute the top-level code in order to do what you ask.
Assume you import utils first. Python will begin by evaluating the first statement, from mypackage.myobjecta import MyObjectA, which requires executing the top level of the myobjecta module. Python must then execute from mypackage.utils import util_func1, util_func2, but it can't do that until it resolves the myobjecta import.
Instead of recursing infinitely, Python resolves this situation by allowing the innermost import to complete without finishing. Thus, the utils import completes without executing the rest of the file, and your import statement fails because util_func1 doesn't exist yet.
The reason import myobjecta works is that it allows the symbols to be resolved later, after the body of every module has executed. Personally, I've run into a lot of confusion even with this kind of circular import, and so I don't recommend using them at all.
If you really want to use a circular import anyway, and you want them to be "from" imports, I think the only way it can reliably work is this: Define all symbols used by another module before importing from that module. In this case, your definitions for util_func1 and util_func2 must be before your from mypackage.myobjecta import MyObjectA statement in utils, and the definition of MyObjectA must be before from mypackage.utils import util_func1, util_func2 in myobjecta.
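The ordering rule can be checked with a small experiment. This is only a sketch: the package names pkg_bad and pkg_ok are hypothetical, and the files are generated in a temporary directory so the example runs on its own. pkg_bad puts the "from" imports first, pkg_ok puts the definitions first.

```python
import importlib
import os
import sys
import tempfile
import textwrap

def write_package(root, package, files):
    """Create a throwaway package directory containing the given files."""
    pkg_dir = os.path.join(root, package)
    os.mkdir(pkg_dir)
    for filename, body in files.items():
        with open(os.path.join(pkg_dir, filename), "w") as handle:
            handle.write(textwrap.dedent(body))

root = tempfile.mkdtemp()

# Imports first: each module needs a name the other has not defined yet.
write_package(root, "pkg_bad", {
    "__init__.py": "",
    "myobjecta.py": """
        from pkg_bad.utils import util_func1
        class MyObjectA:
            pass
    """,
    "utils.py": """
        from pkg_bad.myobjecta import MyObjectA
        def util_func1():
            return MyObjectA()
    """,
})

# Definitions first: every name exists before the other module asks for it.
write_package(root, "pkg_ok", {
    "__init__.py": "",
    "myobjecta.py": """
        class MyObjectA:
            pass
        from pkg_ok.utils import util_func1
    """,
    "utils.py": """
        def util_func1():
            return MyObjectA()
        from pkg_ok.myobjecta import MyObjectA
    """,
})

sys.path.insert(0, root)
importlib.invalidate_caches()

try:
    importlib.import_module("pkg_bad.myobjecta")
    bad_result = "imported"
except ImportError:
    bad_result = "ImportError"

ok_module = importlib.import_module("pkg_ok.myobjecta")
made = ok_module.util_func1()
print(bad_result, type(made).__name__)
```

Putting imports at the bottom of a module works, but it is easy to break by accident, which is one more reason to prefer restructuring.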
Compiled languages like C# can handle situations like this because the top level is a collection of definitions, not instructions. They don't have to create every class and every function in the order given. They can work things out in whatever order is required to avoid any cycles. (C++ does it by duplicating information in prototypes, which I personally feel is a rather hacky solution, but that's also not how Python works.)
The advantage of a system like Python is that it's highly dynamic. Yes you can define a class or a function differently based on something you only know at runtime. Or modify a class after it's been created. Or try to import dependencies and go without them if they're not available. If you don't feel these things are worth the inconvenience of adhering to a strict dependency tree, that's totally reasonable, and maybe you'd be better served by a compiled language.
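As a small illustration of the "try to import dependencies and go without them" point, here is a sketch of an optional import with a fallback. The module name hypothetical_fast_json is made up and assumed not to be installed; the standard-library json module serves as the fallback.

```python
try:
    # Hypothetical optional dependency, assumed to be absent on this machine.
    import hypothetical_fast_json as json_impl
except ImportError:
    # Fall back to the standard library when the optional module is missing.
    import json as json_impl

encoded = json_impl.dumps({"answer": 42})
print(json_impl.__name__, encoded)
```

This works because import is an ordinary statement executed at run time, so it can sit inside try/except like any other code.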
Pythonistas frown upon importing from a function. Pythonistas usually frown upon global variables. Yet, I have seen both and don't think the projects that used them were any worse than others done by some strict Pythonistas. The feature does exist, and I am not going into a long argument over its utility.
There is one drawback to importing inside a function. When you import at the top of a file (or the bottom, really), the import takes some time (a small amount, but some), and Python caches the loaded module so that if another file needs the same import, Python can retrieve the module quickly without loading it again. If you import inside a function, things get more complicated: Python has to process the import line each time you call the function, which might, in a tiny way, slow your program down.
A solution to this is to cache the module independently. Okay, this uses imports inside function bodies AND global variables. Wow!
    _MODULEA = None

    def util1():
        global _MODULEA  # without this, the import below would make _MODULEA a local name
        if _MODULEA is None:
            from mymodule import modulea as _MODULEA
        obj = _MODULEA.ClassYouWant
        return obj
I saw this strategy adopted with a project using a flat API. Whether you like it or not (and I'm not sure about that myself), it works and is fast, because the import line is executed only once (when the function first executes). Still, I would recommend restructuring: problems with circular imports show a problem in structure, usually, and this is always worth fixing. I do agree, though, it would be nice if Python provided more useful errors when this kind of situation happens.
I have a bunch of Python modules I want to clean up, reorganize and refactor (there's some duplicate code, some unused code ...), and I'm wondering if there's a tool to make a map of which module uses which other module.
Ideally, I'd like a map like this:
    main.py
     -> task_runner.py
      -> task_utils.py
       -> deserialization.py
        -> file_utils.py
     -> server.py
      -> (deserialization.py)
      -> db_access.py
    checkup_script.py
    re_test.py
    main_bkp0.py
    unit_tests.py
... so that I could tell which files I can start moving around first (file_utils.py, db_access.py), which files are not used by my main.py and so could be deleted, etc. (I'm actually working with around 60 modules)
Writing a script that does this probably wouldn't be very complicated (though there are different syntaxes for import to handle), but I'd also expect that I'm not the first one to want to do this (and if someone made a tool for this, it might include other neat features such as telling me which classes and functions are probably not used).
Do you know of any tools (even simple scripts) that assist code reorganization?
Do you know of a more exact term for what I'm trying to do? Code reorganization?
Python's modulefinder does this. It is quite easy to write a script that will turn this information into an import graph (which you can render with e.g. graphviz): here's a clear explanation. There's also snakefood which does all the work for you (and using ASTs, too!)
You might want to look into pylint or pychecker for more general maintenance tasks.
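A minimal sketch of the modulefinder approach follows. The file names main_script.py and helper_mod.py are hypothetical and are created in a temporary directory so the example is self-contained; on a real project you would point run_script at your own entry point and pass a search path that includes your source tree.

```python
import os
import tempfile
from modulefinder import ModuleFinder

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical two-file project: main_script.py imports helper_mod.py.
    with open(os.path.join(tmp, "helper_mod.py"), "w") as handle:
        handle.write("VALUE = 1\n")
    script = os.path.join(tmp, "main_script.py")
    with open(script, "w") as handle:
        handle.write("import helper_mod\nprint(helper_mod.VALUE)\n")

    # Restrict the search path to the project directory to keep the result small.
    finder = ModuleFinder(path=[tmp])
    finder.run_script(script)  # analyses the script's imports without executing it
    found = sorted(finder.modules)

print(found)
```

The script itself is reported under the name __main__. Modules that could not be located on the given path end up in finder.badmodules, which is worth checking on a real code base.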
Writing a script that does this probably wouldn't be very complicated (though there are different syntaxes for import to handle),
It's trivial. There's import and from module import. Two syntaxes to handle.
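To make the "two syntaxes" point concrete, here is a sketch that collects imported module names with the standard-library ast module. The function name imported_modules is an assumption for illustration; relative imports are reported with their leading dots.

```python
import ast

def imported_modules(source):
    """Return the set of module names imported anywhere in the given source text."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a, b.c as d" -> a, b.c
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for "from . import x"; level counts the leading dots.
            names.add("." * node.level + (node.module or ""))
    return names

sample = "import os, sys\nfrom collections import OrderedDict\nfrom . import sibling\n"
found = imported_modules(sample)
print(sorted(found))
```

Running this over every .py file in a directory and recording (file, imported name) pairs gives the edges of the import map described in the question.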
Do you know of a more exact term for what I'm trying to do? Code reorganization?
Design. It's called design. Yes, you're refactoring an existing design, but...
Rule One
Don't start a design effort with what you have. If you do, you'll only "nibble around the edges" making small and sometimes inconsequential changes.
Rule Two
Start a design effort with what you should have had if you'd only been smarter. Think broadly and clearly about what you're really supposed to be doing. Ignore what you did.
Rule Three
Design from the ground up (or de novo as some folks say) with the correct package and module architecture.
Create a separate project for this.
Rule Four
Test First. Write unit tests for your new architecture. If you have existing unit tests, copy them into the new project. Modify the imports to reflect the new architecture and rewrite the tests to express your glorious new simplification.
All the tests fail, because you haven't moved any code. That's a good thing.
Rule Five
Move code into the new structure last. Stop moving code when the tests pass.
You don't need to analyze imports to do this, BTW. You're just using grep to find modules and classes. The old imports and the tangled relationships among the old imports don't matter, and don't need to be analyzed. You're throwing them away. You don't need tools smarter than grep.
If you feel an urge to move code, you must be very disciplined: (1) you must have test(s) that fail, and then (2) you can move some code to pass the failing test(s).
chuckmove is a tool that lets you recursively rewrite imports in your entire source tree to refer to a new location of a module.
chuckmove --old sound.utils --new media.sound.utils src
...this descends into src, and rewrites statements that import sound.utils to import media.sound.utils instead. It supports the whole range of Python import formats. I.e. from x import y, import x.y.z as w etc.
Modulefinder may not work with Python 3.5*, but pydeps worked very well:
Installation:
sudo apt install python-pygraphviz
pip install pydeps
Then, in the directory where you want to map from,
pydeps --max-bacon=0 .
..to create a map of maximum depth.
*An issue in Python 3.5 but not 3.6 caused the problems with modulefinder, similar to this
I'm just wondering, I often have really long python files and imports tend to stack quite quickly.
PEP8 says that the imports should always be written at the beginning of the file.
Do all the imported libraries get imported when calling a function coded in the file? Or do only the necessary libraries get called?
Does it make sense to worry about this? Is there no reason to import libraries within the functions or classes that need them?
Every time Python hits an import statement, it checks to see if that module has already been imported, and if not, imports it. So the imports at the top of your file will happen as soon as your file is run or imported by another module.
There is some overhead to this, so it's generally best to keep imports at the top of your file so that cost gets taken care of up front.
The best place for imports is at the top of your file. That documents the dependencies in one place and makes errors from their absence appear earlier. The import itself actually occurs at the time of the import statement, but this seldom matters much.
It is not typical that you have anything to gain by not importing a library until you are in a function or method that needs it. (There is never anything to gain by doing so inside the body of a class.) It is rare that you want optional dependencies and even rarer that this is the right technique to get them, though. Perhaps you can share a compelling use case?
Does it make sense to worry about this?
No.
There is no reason to import libraries within the functions or classes that need them.
It's just slow because the import statement has to check to see if it's been imported once, and realize that it has been imported.
If you put this in a function that's called frequently, you can waste some time with all the import checking.
Imports happen when the module that contains the imports gets executed or imported, not when the functions are called.
Ordinarily, I wouldn't worry about it. If you are encountering slowdowns, you might profile to see if your problem is related to this. If it is, you can check to see if your module can be divided up into smaller modules.
But if all the files are getting used by the same program, you'll just end up importing everything anyway.
If a function inside a module is the only one to import a given other module (say you have a function sending tweets, only if some configuration option is on), then it makes sense to import that specific module in the function.
Unless I see some profiling data proving otherwise, my guess is that the overhead of an import statement in a function is completely negligible.
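If you want profiling data of your own, a rough sketch with timeit is enough. The function names are made up for illustration, and the absolute numbers will vary by machine and Python version, so none are quoted here.

```python
import math
import timeit

def top_level_import(x):
    # Uses the module imported once at the top of the file.
    return math.sqrt(x)

def local_import(x):
    # The import statement runs on every call, but after the first time
    # it only has to find the already-loaded module in the cache.
    import math
    return math.sqrt(x)

calls = 100_000
t_top = timeit.timeit(lambda: top_level_import(2.0), number=calls)
t_local = timeit.timeit(lambda: local_import(2.0), number=calls)
print(f"top-level import: {t_top:.4f}s, import inside function: {t_local:.4f}s")
```

Comparing the two totals shows how much the repeated cache check costs per call for your own workload.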
Let from module import function be called the FMIF coding style.
Let import module be called the IM coding style.
Let from package import module be called the FPIM coding style.
Why is IM+FPIM considered a better coding style than FMIF? (See this post for the inspiration for this question.)
Here are some criteria which lead me to prefer FMIF over IM:
Shortness of code: It allows me to use shorter function names and thus help stick to the 80 columns-per-line convention.
Readability: chisquare(...) appears more readable than scipy.stats.stats.chisquare(...). Although this is a subjective criterion, I think most people would agree.
Ease of redirection: If I use FMIF and for some reason at some later time want to redirect python to define function from alt_module instead of module I need to change just one line: from alt_module import function. If I were to use IM, I'd need to change many lines of code.
I realize FPIM goes some way to nullifying the first two issues, but what about the third?
I am interested in all reasons why IM+FPIM may be better than FMIF,
but in particular, I'd be interested in elaboration on the following points mentioned here:
Pros for IM:
ease of mocking/injecting in tests. (I am not very familiar with mocking, though I recently learned what the term means. Can you show code which demonstrates how IM is better than FMIF here?)
ability for a module to change flexibly by redefining some entries. (I must be misunderstanding something, because this seems to be an advantage of FMIF over IM. See my third reason in favor of FMIF above.)
predictable and controllable behavior on serialization and recovery of your data. (I really don't understand how the choice of IM or FMIF affects this issue. Please elaborate.)
I understand that FMIF "pollutes my namespace", but beyond being a negative-sounding phrase, I don't appreciate how this hurts the code in any concrete way.
PS. While writing this question I received a warning that the question appears subjective and is likely to be closed. Please don't close it. I'm not looking for subjective opinion, but rather concrete coding situations where IM+FPIM is demonstrably better than FMIF.
Many thanks.
The negatives you list for IM/FPIM can often be ameliorated by appropriate use of an as clause. from some.package import mymodulewithalongname as mymod can usefully shorten your code and enhance its readability, and if you rename mymodulewithalongname to somethingcompletelydifferent tomorrow, the as clause can be used as a single statement to edit.
Consider your pro-FMIF point 3 (call it R for redirection) vs your pro-FPIM point 2 (call it F for flexibility): R amounts to facilitating the loss of integrity of module boundaries, while F strengthens it. Multiple functions, classes and variables in a module are often intended to work together: they should not be independently switched to different meanings. For example, consider module random and its functions seed and uniform: if you were to switch the import of just one of them to a different module, then you'd break the normal connection between calls to seed and results of calls to uniform. When a module is well designed, with cohesion and integrity, R's facilitation of breaking down the module's boundaries is actually a negative -- it makes it easier to do something you're better off not doing.
Vice versa, F is what enables coordinated switching of coupled functions, classes, and variables (so, generally, of entities that belong together, by modularity). For example, to make testing repeatable (FPIM pro-point 1), you mock both seed and random in the random module, and if your code follows FPIM, you're all set, coordination guaranteed; but if you have code that has imported the functions directly, you have to hunt down each such module and repeat the mocking over and over and over again. Making tests perfectly repeatable typically also requires "coordinated mocking" of date and time functions -- if you use from datetime import datetime in some modules, you need to find and mock them all (as well as all those doing from time import time, and so forth) to ensure that all the times received when the various parts of the system ask "so what time is it now?" are perfectly consistent (if you use FPIM, you just mock the two relevant modules).
I like FPIM, because there's really not much added value by using a multiply qualified name rather than a singly qualified one (while the difference between barenames and qualified names is huge -- you get so much more control with a qualified name, be it singly or multiply, than you possibly ever can with a barename!).
Ah well, can't devote all of the working day to responding to each and every one of your points -- your question should probably be half a dozen questions;-). I hope this at least addresses "why is F better than R" and some of the mocking/testing issues -- it boils down to preserving and enhancing well-designed modularity (via F) rather than undermining it (via R).
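The question asked for code showing the mocking difference, so here is a minimal sketch using unittest.mock from the standard library. The function names roll_im and roll_fmif are made up for illustration.

```python
import random                              # IM style
from random import random as bare_random   # FMIF style: a private reference to the function
from unittest import mock

def roll_im():
    # Looks up random.random at call time, so it sees whatever the module holds now.
    return random.random()

def roll_fmif():
    # Uses the reference captured at import time.
    return bare_random()

with mock.patch("random.random", return_value=0.5):
    patched_im = roll_im()      # goes through the mock
    patched_fmif = roll_fmif()  # still calls the real function

print(patched_im, patched_fmif)
```

Patching the one attribute on the module covers every caller written in the IM style. For the FMIF caller you would have to patch the name bare_random in each module that imported it, which is the "hunt down each such module" problem described above.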
The classic text on this, as so often, is from Fredrik Lundh, the effbot. His advice: always use import - except when you shouldn't.
In other words, be sensible. Personally I find that anything that's several modules deep tends to get imported via from x.y.z import a - the main example being Django models. But as much as anything else it's a matter of style, and you should have a consistent one - especially with modules like datetime, where both the module and the class it contains are called the same thing. Do you need to write datetime.datetime.now() or just datetime.now()? (In my code, always the former.)
Items 1 and 2 in your list of questions seem to be the same issue. Python's dynamic nature means it is fairly simple to replace an item in a module's namespace no matter which of the methods you use. The difficulty comes if one function in a module refers to another, which is the one you want to mock. In this case, importing the module rather than the functions means you can do module.function_to_replace = myreplacementfunc and everything works transparently - but that is as easy to do via FPIM as it is via IM.
I also don't understand how item 3 has anything to do with anything. I think your item 4, however, is based on a bit of a misunderstanding. None of the methods you give will 'pollute your namespace'. What does do that is from module import *, where you have no idea at all what you're importing and so functions can appear in your code with no clue given to the reader where they came from. That's horrible, and should be avoided at all costs.
Great answers here (I upvoted them all), and here are my thoughts on this matter:
First, addressing each of your bullets:
(Allegedly) Pros of FMIF:
Shortness of code: shorter function names help stick to the 80 columns-per-line.
Perhaps, but module names are usually short enough so this is not relevant. Sure, there's datetime, but also os, re, sys, etc. And Python has free line breaks inside { [ (. And for nested modules there's always as in both IM and FPIM
Readability: chisquare(...) appears more readable than scipy.stats.stats.chisquare(...).
Strongly disagree. When reading foreign code (or my own code after a few months) it's hard to know where each function comes from. Qualified names save me from going back and forth from line 2345 to the import header of the module. And it also gives you context: "chisquare? What's that? Oh, it's from scipy? Ok, some math-related stuff then". And, once again, you can always abbreviate scipy.stats.stats as scypyst. scypyst.chisquare(...) is short enough with all benefits of a qualified name.
import os.path as osp is another good example, considering it's very common to chain 3 or more of its functions together in a single call: join(expanduser(),basename(splitext())) etc.
Ease of redirection: one-line redefinition of a function from alt_module instead of module.
How often do you want to redefine a single function but not the whole module? Module boundaries and function coordination should be preserved, and Alex already explained this in great depth. For most (all?) real-world scenarios, if alt_module.x is a viable replacement for module.x, then probably alt_module itself is a drop-in alternative for module, so both IM and FPIM are one-liners just like FMIF, provided you use as.
I realize FPIM goes some way to nullifying the first two issues...
Actually, as is the one that mitigates the first 2 issues (and the 3rd), not FPIM. You can use IM for that too: import some.long.package.path.x as x for the same result as FPIM.
So none of the above are really pros of FMIF. And the reasons I prefer IM/FPIM are:
For the sake of simplicity and consistency, when I import something, either IM or FPIM, I'm always importing a module, not an object from a module. Remember FMIF can be (ab-)used to import functions, classes, variables, or even other modules! Think about the mess of from somemodule import sys, somevar, os, SomeClass, datetime, someFunc.
Also, if you want more than a single object from a module, FMIF will pollute your namespace more than IM or FPIM, which will use a single name no matter how many objects you want to use. And such objects will have a qualified name, which is a pro, not a con: as I've said in issue 2, IMHO it improves readability.
It all comes down to consistency, simplicity, organization. "Import modules, not objects" is a good, easy mental model to stick with.
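A short sketch of the as-alias style mentioned above, using only standard-library modules. The path /tmp/report.final.txt is an arbitrary example string; nothing is read from disk.

```python
import os.path as osp
import xml.etree.ElementTree as ET

# Chained os.path calls stay readable and still show where each function comes from.
stem = osp.splitext(osp.basename("/tmp/report.final.txt"))[0]

# A deeply nested module gets one short qualified name.
root = ET.fromstring("<config><item>1</item></config>")
first_item = root.find("item").text

print(stem, first_item)
```

Swapping in a different implementation later means editing only the import line, which is the one-line redirection FMIF was credited with.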
Like Alex Martelli, I am fond of using as when importing a function.
One thing I have done is to use some prefix on all the functions that were imported from the same module:
from random import seed as r_seed
from random import random as r_random
r_seed is shorter to type than random.seed but somewhat preserves the module boundaries. Someone casually looking at your code can see r_seed() and r_random() and have a chance to grok that they are related.
Of course, you can always simply do:
import random as r
and then use r.random() and r.seed(), which may be the ideal compromise for this case. I only use the prefix trick when I'm importing one or two functions from a module. When I want to use many functions from the same module, I'll import the module, perhaps with an as to shorten the name.
I agree with MestreLion the most here (and so an upvote).
My perspective: I review code frequently that I am unfamiliar with, and not knowing what module a function is coming from just looking at the function is quite frustrating.
Code is written once and read many times, and so readability and maintainability trump ease of typing.
In a similar vein, typically code is not being written for the benefit of the coder, but for the benefit of another entity.
Your code should be readable to someone who knows python better than you, but is unfamiliar with the code.
Full-path imports can also better help IDEs point you at the correct source of the function or object you're looking at.
For all of these reasons and the reasons MestreLion noted, I conclude that it is best practice to import and use the full path.