I am new to Azure Machine Learning and have been struggling with importing modules into my run script. I am using the AzureML SDK for Python. I think I somehow have to append the script location to PYTHONPATH, but have been unable to do so.
To illustrate the problem, assume I have the following project directory:
project/
    src/
        utilities.py
        test.py
    run.py
    requirements.txt
I want to run test.py on a compute instance on AzureML and I submit the run via run.py.
A simple version of run.py looks as follows:
from azureml.core import Workspace, Experiment, ScriptRunConfig, Environment
from azureml.core.compute import ComputeInstance
ws = Workspace.get(...) # my credentials here
exp = Experiment(workspace=ws, name='<experiment-name>') # experiment the run is submitted to
env = Environment.from_pip_requirements(name='test-env', file_path='requirements.txt')
instance = ComputeInstance(ws, '<instance-name>')
config = ScriptRunConfig(source_directory='./src', script='test.py', environment=env, compute_target=instance)
run = exp.submit(config)
run.wait_for_completion()
Now, test.py imports functions from utilities.py, e.g.:
from src.utilities import test_func
test_func()
Then, when I submit a run, I get the error:
Traceback (most recent call last):
File "src/test.py", line 13, in <module>
from src.utilities import test_func
ModuleNotFoundError: No module named 'src.utilities'; 'src' is not a package
This looks like a standard error where the directory is not appended to the Python path. I tried two things to get rid of it:
Include an __init__.py file in src. This didn't work, and for various reasons I would prefer not to use __init__.py files anyway.
Fiddle with the environment_variables passed to AzureML, like so:
env.environment_variables={'PYTHONPATH': f'./src:${{PYTHONPATH}}'}
That didn't work either, and I assume it is simply not the correct way to extend PYTHONPATH.
I would greatly appreciate any suggestions on extending PYTHONPATH or any other ways to import modules when running a script in AzureML.
The source directory set in ScriptRunConfig is automatically added to the PYTHONPATH, so remove the "src" prefix from the import line:
from utilities import test_func
Hope that helps
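To see why dropping the src prefix works, here is a small local sketch (plain Python, not AzureML itself). It recreates the layout in a temporary directory and runs test.py from the source directory. The file contents and the returned string are made up for illustration, and it assumes the remote run starts the script from the source directory, as described above.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_test_script(import_line):
    """Recreate project/src locally and run test.py using the given import line."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "project" / "src"
        src.mkdir(parents=True)
        # Hypothetical stand-ins for the files in the question.
        (src / "utilities.py").write_text(
            "def test_func():\n    return 'hello from utilities'\n"
        )
        (src / "test.py").write_text(f"{import_line}\nprint(test_func())\n")
        # Python puts the script's own directory (src) first on sys.path,
        # so modules inside src are importable by their bare name.
        return subprocess.run(
            [sys.executable, "test.py"], cwd=src, capture_output=True, text=True
        )


good = run_test_script("from utilities import test_func")
bad = run_test_script("from src.utilities import test_func")
print("bare import exit code:", good.returncode)
print("src-prefixed import exit code:", bad.returncode)
```

The second run fails because nothing on the search path contains a package named src; the script is already inside it.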
The directory I have looks like this:
repository
/src
/main.py
/a.py
/b.py
/c.py
I run my program via python ./main.py and within main.py there's an import statement, from a import some_func. I'm getting a ModuleNotFoundError: No module named 'a' every time I run the program.
I've tried running the Python shell and running the commands import b or import c and those work without any errors. There's nothing particularly special about a either, it just contains a few functions.
What's the problem and how can I fix this issue?
repository/
    __init__.py
    src/
        __init__.py
        main.py
        a.py
        b.py
        c.py
In the __init__.py in repository, add the following line:
from . import src
In the __init__.py in src, add the following lines:
from . import main
from . import a
from . import b
from . import c
Now from src.a import your_func is going to work in main.py.
Maybe you could try using a relative import, which allows you to import modules from other directories relative to the location of the current file.
Note that you will need to add a dot (.) before the module name when using a relative import. The dot indicates that the module is in the same directory as the current file:
from . import a
Or try running it from a different directory and appending the /src path like this:
import sys
sys.path.append('/src')
You could also try using the PYTHONPATH (environment variable) to add a directory to the search path:
Open your terminal and navigate to the directory containing the main.py file (/src).
Set the PYTHONPATH environment variable to include the current directory, by running the following command
export PYTHONPATH=$PYTHONPATH:$(pwd)
Finally, you could try to use the -m flag in your command, so that Python knows to look for the a module inside the /src directory:
python -m src.main
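The effect of the PYTHONPATH suggestion can be checked from Python itself. The sketch below is only an illustration: it builds a throwaway repository/src/a.py (the function body is made up), then tries the import from a different working directory, once without and once with src on PYTHONPATH.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "repository" / "src"
    src.mkdir(parents=True)
    # Hypothetical content for a.py.
    (src / "a.py").write_text("def some_func():\n    return 'a.some_func called'\n")

    command = [sys.executable, "-c", "from a import some_func; print(some_func())"]

    # Working directory is tmp, not src, and PYTHONPATH is unset:
    # nothing on the search path contains a.py, so the import fails.
    env_without = {k: v for k, v in os.environ.items() if k != "PYTHONPATH"}
    without = subprocess.run(
        command, cwd=tmp, env=env_without, capture_output=True, text=True
    )

    # Roughly what the export command above does, here with only src listed.
    env_with = dict(env_without, PYTHONPATH=str(src))
    with_path = subprocess.run(
        command, cwd=tmp, env=env_with, capture_output=True, text=True
    )

print("without PYTHONPATH, exit code:", without.returncode)
print("with PYTHONPATH, output:", with_path.stdout.strip())
```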
I've had similar problems in the past. Imports in Python depend on a lot of things, such as whether you run your program as a script or as a module, and what your current working directory is.
That is why I created a new import library: ultraimport. It gives the programmer more control over imports and lets you do file-system-based, relative imports.
Your main.py could look like this:
import ultraimport
a = ultraimport('__dir__/a.py')
This will always work, no matter how you run your code or what your sys.path is, and no __init__.py files are necessary.
I have a bunch of scripts for static code analysis.
They get a directory as the command line argument, and they run on all files inside that directory.
Here's the structure of my project:
__init__.py
run.py
message.py
globals.py
react
    __init__.py
    run.py
    check_imports.py
    analyze_states.py
next
    __init__.py
    check_routes.py
    analyze_images.py
git
    __init__.py
    check_size.py
    ensure_branch_name.py
    run.py => how can I call this and still access message.py?
Now, if I use top-level run.py as the orchestrator to call sub-modules inside sub-packages, everything works great and I can use import message from each sub-module.
But for the git package, I want to call it directly, and I want to use functions defined inside message.py. I'm stuck at this point.
I saw Python import from parent package and tried from .. import message but it does not work.
The way I found is to put the parent folder of git on your Python path.
In general, while developing, I would add these lines to your git/run.py:
import sys
sys.path.append('..')
Then you will be able to import message.
When you finish development and the project is ready to be pip-installed, make sure that message is an importable module, for example via your setup.py.
Update
The '..' will only work if you execute the git/run.py script or module from within the git directory, as '..' is a relative path.
If you want to execute run.py from anywhere you can do:
import os
import sys
parent_dir = os.path.dirname(os.path.dirname(os.path.realpath(__file__)))
sys.path.append(parent_dir)
Keep in mind that __file__ only exists when you execute run.py as a script, not from an interactive shell or a notebook.
I'm having trouble with a Python package that uses separate modules to structure code. The package itself is working; however, when imported from another environment it fails with a ModuleNotFoundError.
Here's the structure:
Project-root
|
|--src/
| __init__.py
| module_a.py
| module_b.py
| module_c.py
| module_d.py
|--tests
etc.
In module_a.py I have:
from module_a import function_a1,...
from module_b import function_b1,...
from module_c import function_c1,...
In module_c I import module_d like:
from module_d import function_d1,...
As mentioned above, executing module_a or module_c directly from the CLI works as expected, and the unit tests I've created in the test directory also work (with the help of sys.path.insert). However, if I create a new environment and import the package, I get the following error:
>>> import module_a
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/<abs_path>/.venv/lib/python3.9/site-packages/module_a.py", line 22, in <module>
from module_c import function_c1, function_c2
File "/<abs_path>/.venv/lib/python3.9/site-packages/module_c.py", line 9, in <module>
import module_d
ModuleNotFoundError: No module named 'module_d'
>>>
I've exhausted all ideas how to overcome this, besides combining the code of modules c and d in one file, which I'd hate to do, or rethink the flow so that all modules are imported from module_a.
Any suggestions how to approach this would be greatly appreciated.
Update: It turned out to be a typing mistake in the name of module_d in setup.py. For whatever reason python setup.py install was failing silently or I wasn't reading the logs carefully.
The problem comes down to understanding the basics of the import system and the PYTHONPATH.
When you try to import a module (import module_a), Python searches, in order, every directory listed in sys.path. If a directory matches the name (module_a; see note 1), it runs the __init__.py file if one exists.
When you get an ImportError (https://docs.python.org/3/library/exceptions.html#ImportError), it means that no directory in sys.path contains a directory with the requested name.
You said for your tests you did something like sys.path.insert(0, "some/path/"), but it is not a solution, just a broken fix.
What you should do is set your PYTHONPATH environment variable to contain the directory where your modules are located, Project-root/src in your case. That way, you never need to use sys.path.insert or fiddle with relative/absolute paths in import statements.
When you create your new environment, just set the PYTHONPATH environment variable to include Project-root/src and you are done. This is how installing regular Python modules (libraries) works: they are all put into a directory in site-packages.
1: this changed since old Python versions, it used to be required for the directory to contain an __init__.py file
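The search order described above can be inspected directly. The following is a minimal sketch, using the standard library's importlib.machinery.PathFinder, that walks sys.path in order and reports the first entry that provides a given module. The helper name locate is made up for this example, and it only covers path-based modules (not built-in ones such as sys).

```python
import sys
from importlib.machinery import PathFinder


def locate(module_name):
    """Return (sys.path entry, origin) for the first entry providing module_name.

    Returns None when no entry on sys.path contains the module, which is the
    situation that ends in a ModuleNotFoundError at import time.
    """
    for entry in sys.path:
        # Ask the path-based finder to look in this single entry only.
        spec = PathFinder.find_spec(module_name, [entry])
        if spec is not None:
            return entry, spec.origin
    return None


print(locate("json"))      # a standard-library package found on sys.path
print(locate("module_d"))  # None unless a module_d is reachable from sys.path
```

Running locate("module_d") inside the new environment is a quick way to confirm whether the module was installed where Python looks for it.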
SUMMARY
I am fairly new to designing full-fledged python projects, and all my Python work earlier has been with Jupyter Notebooks. Now that I am designing some application with Python, I am having considerable difficulty making it 'run'.
I have visited the following sites -
Relative imports in Python
Ultimate answer to relative python imports
python relative import example code does not work
But none of them seem to solve my issue.
PROBLEM
Here's my repo structure -
my_app/
__init__.py
code/
__init__.py
module_1/
some_code_1.py
module_2/
some_code_2.py
module_3/
some_code_3.py
main.py
tests/
__init__.py
module_1/
test_some_code_1.py
module_2/
test_some_code_2.py
module_3/
test_some_code_3.py
resources/
__init__.py
config.json
data.csv
I am primarily using PyCharm and VS Code for development and testing.
The main.py file has the following imports -
from code.module_1.some_code_1 import class_1
from code.module_2.some_code_2 import class_2
from code.module_3.some_code_3 import class_3
In the PyCharm run configuration, I have the working directory set to User/blah/blah/my_app/.
Whenever I run the main.py from PyCharm, it runs perfectly.
But if I run the program from terminal like -
$ python code/main.py
Traceback (most recent call last):
File "code/main.py", line 5, in <module>
from code.module_1.some_code_1 import class_1
ModuleNotFoundError: No module named 'code.module_1.some_code_1'; 'code' is not a package
I get the same error if I run the main.py from VS Code.
Is there a way to make this work for PyCharm as well as terminal?
If I change the imports to -
from module_1.some_code_1 import class_1
from module_2.some_code_2 import class_2
from module_3.some_code_3 import class_3
This works on the terminal but doesn't work in PyCharm. The test cases fail too.
Is there something I am missing, or some configuration that can be done to make all this work seamlessly?
Can someone help me with this?
Thanks!
The problem is that when you run python code/main.py, Python puts the script's directory, code/, at the front of sys.path instead of my_app/. Absolute imports that start with code. therefore fail, since Python doesn't look above that directory unless you explicitly change a setting like the PYTHONPATH environment variable.
Your best option is to rename main.py to __main__.py and then use python -m code (although do note that package name clashes with a module in the stdlib).
You also don't need the __init__.py in my_app/ unless you're going to treat that entire directory as a package.
And I would consider using relative imports instead of absolute ones (and I would also advise importing to the module and not the object/class in a module in your import statements to avoid circular import issues). For instance, for the from code.module_1.some_code_1 import class_1 line in code.main, I would make it from .module_1 import some_code_1.
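As a rough illustration of the __main__.py plus relative-import suggestion, the sketch below builds a cut-down copy of the layout in a temporary directory and runs it with -m. The package is named my_code here (a made-up name, chosen to sidestep the clash with the stdlib code module), and the class body is invented for the example.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # my_app/my_code/ plays the role of my_app/code/ from the question.
    pkg = Path(tmp) / "my_app" / "my_code"
    (pkg / "module_1").mkdir(parents=True)
    (pkg / "__init__.py").write_text("")
    (pkg / "module_1" / "some_code_1.py").write_text(
        "class class_1:\n    name = 'class_1'\n"
    )
    # __main__.py imports the module (not the class) with a relative import.
    (pkg / "__main__.py").write_text(
        "from .module_1 import some_code_1\n"
        "print(some_code_1.class_1.name)\n"
    )
    # Run from my_app/, the directory that contains the package.
    result = subprocess.run(
        [sys.executable, "-m", "my_code"],
        cwd=pkg.parent,
        capture_output=True,
        text=True,
    )

print("exit code:", result.returncode)
print("output:", result.stdout.strip())
```

Because -m imports the package by name from the working directory, the same command should behave the same from a terminal and from an IDE whose working directory is my_app/.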
Given the directory structure:
program/
setup.py
ilm/
__init__.py
app/
__init__.py
bin/
script.py
Note: the setup.py is not a typical setup.py, rather it is a custom-made setup uniquely for py2app.
program/ilm/app/__init__.py is non-empty: it contains a main() function, which instantiates a class in the same file. My question: In program/ilm/bin/script.py, if I want to import and execute the main() function in program/ilm/app/__init__.py, what are the valid ways of achieving this? The reason I ask is that script.py is doing so thus:
import ilm.app as app
if __name__ == '__main__':
app.main()
Based on my (admittedly limited) understanding of packaging and importing, this shouldn't work, since we have not explicitly told script.py where to look for project/ilm/app/__init__.py using ... And indeed, I get:
MacBook-Pro-de-Pyderman:program Pyderman$ python ./bin/script.py
Traceback (most recent call last):
File "./bin/script.py", line 5, in <module>
import ilm.app as app
ImportError: No module named ilm.app
In contrast, when the Python interpreter is started in /project, import ilm.app as app works fine.
This is apparently fully-functional production code which I should not have to change to get running.
Is the import statement valid, given the directory structure, and if so, what am I missing?
If not, what is the recommended way of getting the import to work? Add the path using sys.path.append() above the import statement? Or use .. notation in the import statement to explicitly point to program to pick up project/ilm/app/__init__.py? Is the fact that it is an __init__.py I am trying to import significant?
Two things. You need to make sure the ilm directory is on the Python path: either run python from the right directory or add the right path to the sys.path list. You also need to make sure that the ilm and app directories both have an __init__.py file, since Python needs to interpret the whole thing as a hierarchy of modules rather than just directories. Then you should be able to do
from ilm import app
The obvious conclusion would seem to be that the ilm directory has an __init__.py inside it, but why that would happen in your production setup is hard to say. Have you checked in the production environment whether this is the case?
Assuming that the production environment is importing the package at ilm/app (which you can check by examining app.__file__), the program will indeed execute the main function from the __init__.py file. However, __init__.py might easily be importing it from somewhere else rather than defining it locally.
The Python package needs a proper setup.py which defines the package structure. Furthermore, bin/ scripts should be defined as console_scripts entry points.
This all must be installed in a proper Python environment, not just any folder.