I have a file that has a thousand lines of code and I'd like to break it into several files. However, I found that those functions depend on each other, so I have no idea how to decouple them... Here is a simplified example:
import numpy as np

def tensor(data):
    return Tensor(data)

class Tensor:
    def __init__(self, data):
        self.data = data

    def __repr__(self):
        return f'Tensor({str(self.data)})'

    def mean(self):
        return mean(self.data)

def mean(data):
    value = np.mean(data)
    return tensor(value)
What is the best way to separate tensor, Tensor, and mean (put them into 3 different files)? Thanks for your help!!
Having a module that is thousands of lines long isn't that bad; you may not actually need to break it up into different modules. It is common for a module to have a function alongside a class, like your tensor and Tensor in the same module, and there is no reason for mean to be a separate function at all, since its body can be placed directly in Tensor.mean, as in the sketch below.
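For example, inlining mean would look like this (a minimal sketch based on your simplified code):

class Tensor:
    ...
    def mean(self):
        # the body of the module-level mean() moved directly here
        return tensor(np.mean(self.data))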
A module should have a specific purpose and be a self-contained unit around that purpose. If you are splitting things up just to have smaller files, that is only going to make your codebase worse. Large modules can, however, be a sign that something needs to be refactored. If you can find good ways of refactoring the ideas in the code into smaller, self-contained ideas, then those smaller units could be given their own modules; otherwise, keep everything as one bigger module.
As for how you can split up code that is coupled together: here is one way of splitting the code into the modules you indicated. Since you have a function, the tensor function, that you would like people to use to get an instance of your Tensor class, creating a Python package seemed sensible, since packages come with an __init__.py file that is used for establishing the API, i.e. your tensor function. I put the tensor function directly in the __init__.py file, but if the function is fairly large, it can be broken out into a separate module, since the __init__.py file is just supposed to give you an overview of the API being created.
# --- main.py ----
from tensor import tensor

print(tensor([1, 2, 3]).mean())

# --- tensor/__init__.py ----
'''
Add some documentation here
'''

def tensor(data):
    return Tensor(data)

from tensor.Tensor import Tensor

# --- tensor/Tensor.py ----
from tensor import helper

class Tensor:
    def __init__(self, data):
        self.data = data

    def __repr__(self):
        return f'Tensor({str(self.data)})'

    def mean(self):
        return helper.mean(self.data)

# --- tensor/helper.py ----
import numpy as np
from . import tensor

def mean(data):
    value = np.mean(data)
    return tensor(value)
About circular dependencies
Tensor and helper import each other, and this is OK. When the helper module imports Tensor, and Tensor in turn imports helper again, helper just continues loading normally, and when it is done, Tensor finishes loading. Problems only arise when code at module level (outside your functions/classes) is executed as the module first loads and depends on functionality in another module that is only partially loaded; that is when circular dependencies bite.
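For contrast, here is a minimal sketch (with hypothetical modules a.py and b.py, not part of your code) of module-level code that does break under a circular import:

# --- a.py ----
import b          # b starts loading while a is still half-initialised

def f():
    return "hello"

# --- b.py ----
from a import f   # ImportError: a is only partially loaded,
                  # so the name f does not exist yet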
Using classes that don't exist yet
I can add this to the __init__ file:
def some_function():
    return DoesntExist()
and your code would still run. Python doesn't look for a class named Tensor until it actually runs the tensor function. If we did the following, we would get an error about Tensor not existing:
def tensor(data):
    return Tensor(data)

tensor([1, 2, 3])

from tensor.Tensor import Tensor
because now we are running the tensor function before the import and it can't find anything named Tensor.
The order of stuff in __init__
If you switch the order around, you will have

__init__ imports Tensor, which imports helper, which imports __init__ again

as helper tries to grab the tensor function, but it can't, because __init__ can't proceed past the line that imports Tensor until that import has completed.
Now, with the current order, we have:

1. __init__ defines tensor, sees the import statement, and saves its current progress as a partial import.
2. The same chain of imports happens (__init__ imports Tensor, which imports helper, which imports __init__ looking for a tensor function).
3. This time the partial import already contains the tensor function, so helper finds it and is able to continue on using it.
I didn't think about any of that when I put things in that order. I just wrote out the code, got the circular import error, switched the order around, and didn't think about what was going on until you asked about it.
And now that I think about it, the following would have worked too. With it, the order of things in the __init__ file no longer matters:
from tensor.Tensor import Tensor

def tensor(data):
    return Tensor(data)
And then in helper.py:

import numpy as np
import tensor

def mean(data):
    value = np.mean(data)
    return tensor.tensor(value)
The difference is that instead of requiring the tensor function to exist at import time (from . import tensor), we are doing import tensor (which imports the tensor package, not the function). Then, whenever the mean function is run, we call tensor.tensor(value) to reach the tensor function inside our tensor package.
Related
I was trying to modify a function within a class. I was following the steps from this link. I want to understand why the changes are not working.
The function is:

def explain(self, test_df, row_index=None, row_num=None, class_id=None, background_size=50, nsamples=500)

from the module ktrain.
I am trying to get the SHAP values themselves instead of the plot. My changes are in:

def alternative_explain(self, test_df, row_index=None, row_num=None, class_id=None, background_size=50, nsamples=500)
Then I try:
import types
import ktrain

funcType = types.MethodType
predictor1 = TabularPredictor()
But I get the error "name 'TabularPredictor' is not defined". Similarly, I cannot make a new, inherited class from TabularPredictor. What am I doing wrong?
Update: I did import ktrain
It sounds like you're a little confused about the Python import statement, which has several alternative syntaxes.
Using import ktrain will only import a reference to the module ktrain in your code; if you want your code to refer to anything inside the ktrain module, you need to use dot-notation, e.g. ktrain.TabularPredictor(). Pros: everything from the ktrain module is now accessible from within your code. Cons: it might be a bit wordy to type out ktrain.TabularPredictor() every time you want to make an instance of the class, and you might only actually need one or two classes from that module.
Using from ktrain import TabularPredictor will make the TabularPredictor class accessible in your code's namespace, so there will be no need to use the dot-notation; you can just type TabularPredictor() when you want to create an instance. Pros: less wordy, you only import what you need (none of the other classes or functions from ktrain will be accessible from within your code). Cons: you might find out later on that some of the other classes/functions in the module are useful, which means you'll have to change your import statement. It can also be a pain to have to individually import 10 different classes from the same module.
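Side by side, and assuming (as your snippet implies) that TabularPredictor is exposed at the top level of ktrain, the two forms would look like this sketch:

import ktrain
predictor1 = ktrain.TabularPredictor()   # dot-notation required

from ktrain import TabularPredictor
predictor2 = TabularPredictor()          # name bound directly in your namespace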
You can read more here.
Context: I'm writing a translator from one Python API to another, both in Python 3.5+. I load the file to be translated with a class named FileLoader, defined in FileLoader.py. This file loader allows me to transfer the file's content to the other classes doing the translation job.

All of the .py files defining these classes are in the same folder.

I tried two different ways to import my FileLoader module inside the other modules containing the classes doing the translation job. One seems to work, but the other doesn't, and I don't understand why.
Here are two code examples illustrating both ways:
The working way
import FileLoader

class Parser:
    # ...
    def __init__(self, fileLoader):
        if isinstance(fileLoader, FileLoader.FileLoader):
            self._fileLoader = fileLoader
        else:
            pass  # raise a nice exception
The crashing way
class Parser:
    import FileLoader
    # ...
    def __init__(self, fileLoader):
        if isinstance(fileLoader, FileLoader.FileLoader):
            self._fileLoader = fileLoader
        else:
            pass  # raise a nice exception
I thought doing the import inside the class's scope (which is the only scope where FileLoader is used) would be enough, since the class would know how to relate to the FileLoader module and its content. I'm obviously wrong, since it's the first way that worked.
What am I missing about scopes in Python? Or is it about something different?
Two things: this won't work, and there is no benefit to doing it this way.
First, why not?
class Parser:
    # this assigns to the Parser namespace; to refer to it
    # within a method you need to use `self.FileLoader` or
    # `Parser.FileLoader`
    import FileLoader

    # `FileLoader` works fine here, under the Parser indentation
    # (in its namespace, but outside of any method)
    copy_of_FileLoader = FileLoader

    # ...
    def __init__(self, fileLoader):
        # you need to refer to modules in the Parser namespace
        # with that `self`, just like you would with any other
        # class or instance variable
        if isinstance(fileLoader, self.FileLoader.FileLoader):
            self._fileLoader = fileLoader
        else:
            pass  # raise a nice exception

    # works here again, since we are outside of any method,
    # in `Parser` scope/indent
    copy2_of_FileLoader = FileLoader
Second, it's not Pythonic and it doesn't help
The custom in the Python community is to put import FileLoader at the top of the module. Since it seems to be one of your own modules, it would go after standard library imports and after third-party module imports. You would not put it under a class declaration.
Unless... you had a good (probably actually bad) reason to.
My own code, and this doesn't reflect all that well on me, sometimes has stuff like:
class MainManager(batchhelper.BatchManager):
    ....
    def _load(self, *args, **kwargs):
        from pssystem.models import NotificationConfig
So, after stating this wasn't a good thing, why am I doing this?
Well, there are some specific circumstances to my code here. This is a batch, command-line script, usable within a Django context, and it uses some Django ORM models. In order for those to be used, Django needs to be imported first and then set up. But that often happens too early in the context of these types of batch programs, and I get circular import errors, with Django complaining that it hasn't been initialized yet.
The solution? Defer execution until the method is called, when all the other modules have been imported and Django has been set up elsewhere.

NotificationConfig is now available, but only within that method, as it is a local variable there. It works, but... it's really not great practice.
Remember: anything in the global scope gets executed at module load time, anything under classes at module load time as well, and anything within method/function bodies when the method/function is called.
# happens at module load time, you could have circular import errors
import X1

class DoImportsLater:

    # happens at module load time, you could have circular import errors
    import X2

    def _load(self, *args, **kwargs):
        # only happens when this method is called, if ever,
        # so you shouldn't be seeing circular imports
        import X3
import X1 is standard practice, Pythonic.
import X2, what you are doing, is not, and doesn't help.
import X3, what I did, is a hack that covers up circular import references. But it "fixes" the issue.
In my Python package I have an entry-point run.py file which takes the seed (e.g. 42) and the CUDA device (e.g. "cuda:0") as command line arguments.
Since both of these variables are used throughout the entire package in different places, I don't want to pass them as arguments from function to function. Hence, I did the following:
utils.py:
import random

import numpy as np
import torch

def set_device(device: str):
    global _DEVICE
    _DEVICE = torch.device(device)

def get_device() -> torch.device:
    return _DEVICE

def set_seed_number(seed: int):
    global _SEED
    _SEED = seed

def set_seeds():
    torch.manual_seed(_SEED)
    random.seed(_SEED)
    np.random.seed(_SEED)
And then within run.py I set these variables once by calling:
from package.utils import set_device, set_seed_number
...
set_device(device)
set_seed_number(seed=seed)
Now I can import and call the get_device() and set_seeds() functions from anywhere in my package, and I don't have to pass these variables as arguments.
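For illustration, a consumer module might then look like this (a sketch; it assumes run.py has already called set_device and set_seed_number):

import torch
from package.utils import get_device, set_seeds

set_seeds()                               # uses the seed stored by run.py
x = torch.zeros(3, device=get_device())   # any code can target the stored device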
So far this approach works fine, but after reading that using globals in Python is strongly discouraged, I am wondering if there is a more Pythonic way of achieving the goal described above?
I already thought of having a dedicated singleton class which would dynamically instantiate those constants, but I am not exactly sure if and how that would work, or whether it would be considered more "Pythonic" after all.
Thanks already for your answers, and maybe you can point me to some patterns that seem applicable in this situation. I can only guess that I am not the first one trying to achieve the goal described above.
I can't honestly see a problem with global if it is used sparingly and only when there is a strong reason to do so. (I think the strong discouragement against global is because it is often abused.)
But as regards your proposed custom class, there is no need to instantiate it -- you can just set class variables.
main.py
import settings
settings.set_foo(3)
print(settings.Settings.foo)
settings.py
class Settings:
    pass

def set_foo(x):
    Settings.foo = x
This is no different in principle from putting your data items inside some other mutable collection, e.g. a dictionary, and then setting them inside functions in the module that defines it (or another that imports it).
main.py
import settings
settings.set_foo(3)
print(settings.settings['foo'])
settings.py
settings = {}

def set_foo(x):
    settings['foo'] = x
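Applied to your device/seed example, the class-variable version might look like this (a sketch, reusing your function names):

import random

import numpy as np
import torch

class Settings:
    device = None
    seed = None

def set_device(device: str):
    Settings.device = torch.device(device)

def get_device() -> torch.device:
    return Settings.device

def set_seed_number(seed: int):
    Settings.seed = seed

def set_seeds():
    torch.manual_seed(Settings.seed)
    random.seed(Settings.seed)
    np.random.seed(Settings.seed)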
I'm working on Python packages that implement scientific models, and I'm wondering what is the best way to handle optional features.
Here's the behavior I'd like:
If some optional dependencies can't be imported (the plotting module on a headless machine, for example), I'd like to disable the functions using these modules in my classes, warn the user if they try to use them, and do all that without breaking the execution.

So the following script would work in any case:

mymodel.dostuff()       # always works
mymodel.plot()          # only plots if possible, else logs an error
mymodel.domorestuff()   # gets executed regardless of the previous statement
So far the options I see are the following:

- check in the __init__.py for available modules and keep a list of them (but how to properly use it in the rest of the package?)
- for each function relying on optional dependencies, have a try: import ... except ... statement
- put functions depending on a particular module in a separate file
These options should work, but they all seem rather hacky and hard to maintain. What if we want to drop a dependency completely? Or make it mandatory?
The easiest solution, of course, is to simply import the optional dependencies in the body of the function that requires them. But the always-right PEP 8 says:
Imports are always put at the top of the file, just after any module
comments and docstrings, and before module globals and constants.
Not wanting to go against the best wishes of the Python masters, I take the following approach, which has several benefits...
First, import with a try-except
Say one of my functions foo needs numpy, and I want to make it an optional dependency. At the top of the module, I put:
try:
    import numpy as _numpy
except ImportError:
    _has_numpy = False
else:
    _has_numpy = True
Here (in the except block) would be the place to print a warning, preferably using the warnings module.
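For example (the same block again, sketched with the warning added):

import warnings

try:
    import numpy as _numpy
except ImportError:
    _has_numpy = False
    warnings.warn("numpy is missing; functions that need it will raise ImportError")
else:
    _has_numpy = True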
Then raise the exception in the function
What if the user calls foo and doesn't have numpy? I raise the exception there and document this behaviour.
def foo(x):
    """Requires numpy."""
    if not _has_numpy:
        raise ImportError("numpy is required to do this.")
    ...
Alternatively, you can use a decorator and apply it to any function requiring that dependency:

@requires_numpy
def foo(x):
    ...
This has the benefit of preventing code duplication.
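The decorator itself is not shown above; a minimal sketch (requires_numpy is a name of our choosing, reusing the _has_numpy flag from before) could be:

import functools

def requires_numpy(func):
    """Raise ImportError at call time when numpy is unavailable."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if not _has_numpy:
            raise ImportError("numpy is required to do this.")
        return func(*args, **kwargs)
    return wrapper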
And add it as an optional dependency to your install script
If you're distributing code, look up how to add the extra dependency to the setup configuration. For example, with setuptools, I can write:
install_requires = ["networkx"],
extras_require = {
"numpy": ["numpy"],
"sklearn": ["scikit-learn"]}
This specifies that networkx is absolutely required at install time, but that the extra functionality of my module requires numpy and sklearn, which are optional.
Using this approach, here are the answers to your specific questions:
What if we want to make a dependency mandatory?
We can simply add our optional dependency to our setup tool's list of required dependencies. In the example above, we move numpy to install_requires. All of the code checking for the existence of numpy can then be removed, but leaving it in won't cause your program to break.
What if we want to drop a dependency completely?
Simply remove the check for the dependency in any function that previously required it. If you implemented the dependency check with a decorator, you could just change it so that it simply passes the original function through unchanged.
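That pass-through version is one line (a sketch of the same hypothetical requires_numpy):

def requires_numpy(func):
    # the dependency is now mandatory, so just return the function unchanged
    return func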
This approach has the benefit of placing all of the imports at the top of the module so that I can see at a glance what is required and what is optional.
I would use the mixin style of composing a class: keep optional behaviour in separate classes and subclass those classes in your main class. If you detect that the optional behaviour is not possible, create a dummy mixin class instead. For example:
model.py
import numpy
import plotting

class Model(PrimaryBaseclass, plotting.Plotter):
    def do_something(self):
        ...
plotting.py
from your_util_module import headless as _headless

__all__ = ["Plotter"]

if _headless:
    import warnings

    class Plotter:
        def plot(self):
            warnings.warn("Attempted to plot in a headless environment")
else:
    class Plotter:
        """Expects an attribute called `data` when plotting."""
        def plot(self):
            ...
Or, as an alternative, use decorators to describe when a function might be unavailable, e.g.:
import warnings

from your_util_module import headless

class unavailable:
    def __init__(self, *, when):
        self.when = when

    def __call__(self, func):
        if self.when:
            def dummy(self, *args, **kwargs):
                warnings.warn("{} unavailable with current setup"
                              .format(func.__qualname__))
            return dummy
        else:
            return func

class Model:
    @unavailable(when=headless)
    def plot(self):
        ...
I am new to Python and I am creating a module to re-use some code.
My module (impy.py) looks like this (it has only one function so far):
import numpy as np

def read_image(fname):
    ....
and it is stored in the following directory:

custom_modules/
    __init__.py
    impy.py
As you can see, it uses the module numpy. The problem is that when I import it from another script, like this...

import custom_modules.impy as im

and I type im., I get the option of calling not only the function read_image() but also the module np.

How can I make only the functions I write in my module available, and not the modules that my module is calling (numpy in this case)?
Thank you very much for your help.
I have a proposal that may answer the following concern: "I do not want to mix class/module attributes with class/module imports." (IDLE also proposes access to imported modules within a class or module.)

It simply consists of giving the import the conventional name that coders normally don't want to access and that IDEs don't propose: a name starting with an underscore. This is also known as the weak "internal use" indicator, as described in PEP 8 / Naming styles.
class C(object):
    import numpy as _np  # <-- here

    def __init__(self):
        # whatever we need
        ...

    def do(self, arg):
        # something useful
        ...
Now, in IDLE, auto-completion will only propose the do function; the imported module is not proposed.
By the way, you should change the title of your question: you do not want to avoid imports of your imported modules (that would make them unusable), so it should rather be "how to prevent the IDE from showing imported modules of an imported module" or something similar.
You could import numpy inside your function:
def read_image(fname):
    import numpy as np
    ....
making it locally available to the read_image code, but not globally available.

Warning though: this can cause a small performance hit, because the import statement executes on every call (after the first import, Python finds numpy already cached in sys.modules, so it is not fully re-loaded, but the lookup still costs a little), especially if you run read_image many times.
If you really want to hide it, then I suggest creating a new directory such that your structure looks like this:
custom_modules/
    __init__.py
    impy/
        __init__.py
        impy.py
and let the new impy/__init__.py contain:

from .impy import read_image
This way, you can control what ends up in the custom_modules.impy namespace.
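Usage then stays the same from the caller's side (a sketch; the file name is made up):

import custom_modules.impy as im

img = im.read_image("photo.png")   # read_image is re-exported; np is not visible here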