I have been researching how to structure the folders of a custom Python package. I found several approaches, but none of them seemed generally applicable, in particular regarding the use (or omission) of __init__.py files.
I have a package that consists of several sub-packages, each responsible for parsing files of a certain kind. I therefore currently adopted this structure:
Parsers/
├── __init__.py
│
├── ExternalPackages
│ ├── __init__.py
│ ├── package1
│ └── package2
│
├── FileType1_Parsers/
│ ├── __init__.py
│ ├── parsers1.py
│ └── containers1.py
│
└── FileType2_Parsers/
├── __init__.py
├── parsers2.py
└── containers2.py
But it does not seem very Pythonic that, when I import this package and want to use a class from one of its modules, I have to type something like
from Parsers.FileType1_Parsers.parsers1 import example_class
Is there any convention on how to structure such packages or any rules on how to avoid such long import lines?
You can add the following line to Parsers/__init__.py
from .FileType1_Parsers.parsers1 import example_class
Then you can import example_class by
from Parsers import example_class
This is a common practice in large packages.
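For illustration, here is a self-contained sketch of that trick: it builds a miniature version of the Parsers layout in a temporary directory, re-exports a class from the top-level __init__.py, and then uses the short import form (the class name ExampleClass is hypothetical):

```python
import sys
import tempfile
from pathlib import Path

# Throwaway demonstration of the re-export trick: build a miniature
# version of the Parsers layout on disk, re-export a class from the
# top-level __init__.py, then use the short import form.
root = Path(tempfile.mkdtemp())
sub = root / "Parsers" / "FileType1_Parsers"
sub.mkdir(parents=True)

(root / "Parsers" / "__init__.py").write_text(
    "from .FileType1_Parsers.parsers1 import ExampleClass\n"
)
(sub / "__init__.py").write_text("")
(sub / "parsers1.py").write_text("class ExampleClass:\n    pass\n")

sys.path.insert(0, str(root))
from Parsers import ExampleClass  # short form instead of the long dotted path

print(ExampleClass.__name__)  # -> ExampleClass
```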
You can modify sys.path at run time so that it contains a directory for each module you'll be using. For example, for package1 (which lives under ExternalPackages in the tree above), issue the following statements:
>>> import sys
>>> sys.path.append(r"[package directory path]\Parsers\ExternalPackages\package1")
You can do this for any other modules in the package as well. Now, you can just use this command:
from package1 import example_class
Hope this helps!
I am relatively new to protocol buffers and Python, so I stumbled upon the problem described below. (I've done plenty of research into this issue, but found no solution.)
The relevant hierarchy of my code:
.
└── MainFolder
├── main.py
├── create_python_proto.bat
├── protos
│ └── foo.proto
├── client
│ ├── protos
│   │   ├── __init__.py
│   │   ├── foo_pb2.py
│   │   └── foo_pb2_grpc.py
│ ├── __init__.py
│ ├── client.py
│ └── etc...
└── server
├── ...
(main.py is my starting point)
When I create the code from my protobuf with protoc, the files are placed correctly, as I want them to be.
But the auto-generated code inside my 'foo_pb2_grpc.py' is not correct, specifically the import of 'foo_pb2.py'.
Actual state:
from protos import foo_pb2 as protos_dot_foo__pb2
Target state:
from ..protos import foo_pb2 as protos_dot_foo__pb2
I create the proto files via my 'create_python_proto.bat', command below:
setlocal
python -m grpc_tools.protoc -I . --python_out=.\client --grpc_python_out=.\client protos\foo.proto
endlocal
So when I run my program from 'main.py' I get an "ImportError" at 'foo_pb2_grpc.py' where it says:
"cannot import name 'foo_pb2' from 'protos' (unknown location)"
What am I doing wrong? I am desperately searching for a solution, but I just cannot find one...
Thanks for helping me out in advance!
PS: both '__init__.py' files, in '.\MainFolder\client' and '.\MainFolder\client\protos', are empty.
In my experience, using packages with protoc-generated Python code is gnarly and best avoided.
The gRPC repo's examples avoid packages (possibly for this reason).
A solution is to:
Create a true package root, e.g. src
Manually revise the import path in foo_pb2_grpc.py
Use absolute or relative import references
With:
src
├── client
│ ├── __init__.py
│ └── main.py
├── __init__.py
├── protos
│ ├── foo_pb2_grpc.py
│ ├── foo_pb2.py
│ └── __init__.py
└── server
├── __init__.py
└── main.py
And:
python \
-m grpc_tools.protoc \
--proto_path ${PWD}/protos \
--python_out=${PWD}/src/protos \
--grpc_python_out=${PWD}/src/protos \
${PWD}/protos/foo.proto
And src/protos/foo_pb2_grpc.py:
from ..protos import foo_pb2 as foo__pb2
Or:
import src.protos.foo_pb2 as foo__pb2
And e.g. src/server/main.py:
from ..protos import foo_pb2
from ..protos import foo_pb2_grpc
Or from src.protos import ...
NOTE The protoc-generated packages represent the interface between peers (client|server). If these were provided by a third party, the generated files would be in an entirely distinct package (com.third.party). You may wish to follow a similar practice for in-house protos. At the least, place these alongside the client and server implementations that use them (as src.protos here).
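If you would rather not edit the generated file by hand after every protoc run, the import line can also be patched programmatically. Below is a sketch of such a post-processing step, demonstrated on a temporary stand-in for foo_pb2_grpc.py (the regex assumes the layout above, with generated files under src/protos):

```python
import re
import tempfile

# Sketch: rewrite the absolute import that protoc emits in *_pb2_grpc.py
# into the relative form needed inside the src.protos package.
def fix_grpc_import(path):
    with open(path) as f:
        text = f.read()
    fixed = re.sub(
        r"^from protos import (\w+_pb2)",
        r"from ..protos import \1",
        text,
        flags=re.MULTILINE,
    )
    with open(path, "w") as f:
        f.write(fixed)

# Demonstrate on a temporary stand-in for foo_pb2_grpc.py.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write("from protos import foo_pb2 as protos_dot_foo__pb2\n")
    stub = f.name

fix_grpc_import(stub)
print(open(stub).read())
# -> from ..protos import foo_pb2 as protos_dot_foo__pb2
```

Running this right after the protoc invocation keeps the generated files importable without manual edits.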
I copied Python project B into my project A as a model, but B's import paths are all relative to B itself. How can I call B's functions from A? For example,
.
└── mypackage A
├── subpackage_1
│ ├── test11.py
│ └── test12.py
├── subpackage_2
│ ├── test21.py
│ └── test22.py
└── subpackage B
├── test31.py
└── test32.py
test31.py may import test32.py with
import test32
But from A's perspective, I should import it with
import B.test32
In fact, B is more complex than this example. How can I refactor it?
First make B a package by adding an __init__.py file to it, then try this:
from B.test32 import <function name>
and then call it.
As @Wàlid Bachri said, you should start by placing __init__.py files, but it is a bit more complex than that.
You must put an __init__.py file in each directory, so it would look something like this:
.
└── mypackage A
├── __init__.py # here under A
├── subpackage_1
│   ├── __init__.py
│ ├── test11.py
│ └── test12.py
├── subpackage_2
│   ├── __init__.py # under package 2
│ ├── test21.py
│ └── test22.py
└── subpackage B
├── __init__.py # under package B
├── test31.py
└── test32.py
In each of the __init__.py files, you need to import all the files in the same directory. So for example in subpackage B you need to do this.
#/subpackage B/__init__.py
from . import test31
from . import test32
In mypackage A, where there are no files, just directories, you do the same thing (from . import subpackage_2, etc.).
Supposing that mypackage A is the "main" package (as can be seen from your diagram), and not a submodule, to run any file you will need to execute the following.
First cd to the parent directory of mypackage A and then
# Suppose you want to execute /subpackage_1/test11
python -m mypackage_A.subpackage_1.test11 # WARNING mypackage A should have no whitespace
You may get a RuntimeWarning about your sys.modules being modified, but I can assure you, from experience, that you can safely ignore it. This is how Python packages for pip are normally done, to ensure that pushing to production is easy.
EDIT: Also note that in all of your files you should switch to relative (package) imports, using the dot syntax.
from . import test32 # this imports .test32
If you were instead to write import test32 in test31, the interpreter would try to search for a global package named test32 rather than looking in the same directory.
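The advice above can be checked with a throwaway package built in a temporary directory (underscores replace the spaces in the diagram; the file contents, including the answer() function, are hypothetical):

```python
import sys
import tempfile
from pathlib import Path

# Throwaway demo of the layout above: subpackage_B/__init__.py pulls in
# its modules, and test31 reaches test32 via a relative import.
root = Path(tempfile.mkdtemp())
pkg_b = root / "mypackage_A" / "subpackage_B"
pkg_b.mkdir(parents=True)

(root / "mypackage_A" / "__init__.py").write_text(
    "from . import subpackage_B\n"
)
(pkg_b / "__init__.py").write_text(
    "from . import test32\nfrom . import test31\n"
)
(pkg_b / "test32.py").write_text("def answer():\n    return 42\n")
(pkg_b / "test31.py").write_text(
    "from . import test32  # relative import, not 'import test32'\n"
    "def call_b():\n    return test32.answer()\n"
)

sys.path.insert(0, str(root))
import mypackage_A

print(mypackage_A.subpackage_B.test31.call_b())  # -> 42
```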
Simply add an empty Python file called __init__.py to your B project to make Python treat this project as a normal Python package.
I would like to set up a Python namespace package containing several connected packages, which need to be installable independently unless dependencies are explicitly specified. Existing solutions, however, seem more or less messy to me.
One of the packages contains most of the problem's logic, for example, and the others contain auxiliary functionality such as plotting and data export. The logic package needs to stay slim and cannot import more than numpy, whereas the other packages can use more complex packages like pandas and matplotlib. I would like to set up a package structure which looks like the resulting namespace of a namespace package, but without unnecessary folder nesting; something like this:
namespace
├── logic
│ ├── __init__.py
│   ├── functions.py
│ └── setup.py # requires numpy
├── datastructure
│ ├── __init__.py
│   ├── functions.py
│ └── setup.py # requires namespace.logic and pandas
├── plotting
│ ├── __init__.py
│   ├── functions.py
│ └── setup.py # requires namespace.logic, namespace.datastructure and matplotlib
└── setup.py #should install every package in namespace
I figured this looks like a conventional package with modules, but I have not yet found a way to set it up as a package while maintaining the option to install only specific modules. I therefore assumed a namespace package should offer that option, but I cannot quite get it to work with pip.
At the moment I would be required to have two more directory levels like this:
namespace
├── NamespaceLogic #don't want this
│ ├── namespace #don't want this
│ │ └── logic
│ │ └── __init__.py
│ └── setup.py
├── NamespaceDatastructure #don't want this
│ ├── namespace #don't want this
│ │ └── datastructure
│ │ └── __init__.py
│ └── setup.py
├── NamespacePlotting #don't want this
│ ├── namespace #don't want this
│ │ └── plotting
│ │ └── __init__.py
│ └── setup.py
└── setup.py
My problem is similar to this question: Python pip install sub-package from own package. But I would like to avoid having too many subfolders, as this risks exceeding the path length restrictions of my system (+ it confuses everyone else). How do I need to configure the different setup.py files in order to be able to run
pip install namespace #installs namespace.logic, namespace.datastructure, namespace.plotting
pip install namespace.logic #installs only namespace.logic and works in an environment with numpy which does not have pandas or matplotlib
You can use the package_dir option of setuptools to your advantage to get rid of the empty folders for the namespace packages:
NmspcPing
├── ping
│ └── __init__.py
└── setup.py
import setuptools

setuptools.setup(
    name='NmspcPing',
    version='0.0.0.dev0',
    packages=['nmspc.ping'],
    package_dir={'nmspc.ping': 'ping'},
)
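Applied to the layout in the question, each subpackage's setup.py could use the same trick. A sketch for the logic package (the distribution name, version, and dependency pin are illustrative, not prescribed):

```python
import setuptools

# Sketch for namespace/logic/setup.py: maps the namespace.logic package
# onto the flat logic/ directory, mirroring the NmspcPing example above.
setuptools.setup(
    name='namespace.logic',
    version='0.0.0.dev0',
    packages=['namespace.logic'],
    package_dir={'namespace.logic': 'logic'},
    install_requires=['numpy'],
)
```

The datastructure and plotting packages would follow the same pattern, adding their own install_requires entries.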
Something like the following would also be feasible, but depending on how the projects are built or installed, the setup.py files might be included as part of the packages as well (which is probably unwanted):
.
├── ping
│ ├── __init__.py
│ └── setup.py
├── pong
│ ├── __init__.py
│ └── setup.py
└── setup.py
If the path length restriction is an issue, then using shorter package names might be a better bet. Because in many cases the packages are installed with all the directory levels anyway (unless they stay zipped), even if you skip them in your source code repository.
Honestly, I would be surprised if a path length restriction issue actually happened, and I believe it would still happen anyway on things that you don't have control over (third-party packages like numpy, pandas, and matplotlib probably have lots of nested sub-packages as well).
This must have been asked before, but I cannot for the life of me find it.
I have a project structure something like this
ROOT
├── README.md
├── modules
│ ├── __init__.py
│ ├── mod1.py
│ └── mod2.py
├── tests
│ ├── test_mod1.py
│ └── test_mod2.py
├── notebooks
│ ├── nb1.ipynb
│ ├── nb2.ipynb
│ └── sub_dir
│ ├── sub_nb.ipynb
│ ├── generate_py
│ └── py_files
│ └── sub_nb.py
├── definitions
└── main.py
So from main.py I am able to import anything from definitions, or any module from ROOT/modules.
What I want is to be able to import from these from anywhere within the notebooks directory tree. I know I could do this using:
import sys
sys.path.append("..")
But the notebooks directory tree has many layers, and I don't want my code to start looking like this:
import sys
sys.path.append("../../../../")
What's more, the file generate_py is a bash script that converts the Jupyter notebooks (.ipynb) to .py files and stashes the .py files in the ./py_files subdirectory.
With the above method I end up having to manually edit every file to put an extra ../ into the sys.path.append(). This is annoying.
If I run files from within PyCharm all works well (I'm guessing it updates your PYTHONPATH to include the project root when you create main.py?).
To further complicate, this project is run on several machines, shared via git. So absolute references are out.
But running from the terminal, or from within Jupyter, it can't find modules or definitions without going through the sys.path.append() process. And I feel there must be a better way.
So what is the best practice here?
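For what it's worth, one common workaround (not necessarily best practice) is to locate the project root by a marker file instead of counting ../ levels, so the same snippet works at any notebook depth. A sketch, assuming README.md marks the root as in the tree above:

```python
import os
import sys
import tempfile
from pathlib import Path

# Walk up from the current working directory until a marker file
# (README.md here, by assumption) identifies the project ROOT, then put
# ROOT on sys.path once. The same cell works at any notebook depth.
def add_project_root(marker="README.md"):
    here = Path.cwd().resolve()
    for candidate in (here, *here.parents):
        if (candidate / marker).exists():
            sys.path.insert(0, str(candidate))
            return candidate
    raise FileNotFoundError(f"no {marker} found above {here}")

# Tiny demonstration in a throwaway tree.
root = Path(tempfile.mkdtemp()).resolve()
(root / "README.md").write_text("# demo\n")
deep = root / "notebooks" / "sub_dir"
deep.mkdir(parents=True)
os.chdir(deep)
print(add_project_root() == root)  # -> True
```

Since the marker search is relative to wherever the notebook runs, it also survives being cloned onto other machines via git.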
Yes, I know this is a recurrent question, but I still couldn't find a convincing answer. I even read https://chrisyeh96.github.io/2017/08/08/definitive-guide-python-imports.html but could not figure out how to solve the problem:
I'm running a Python 3.6 project that includes Jupyter (IPython) notebooks. I want the notebook to import a custom local helpers.py module that I will probably also use later in other sources.
The project structure is similar to:
my_project/
│
├── my_project/
│   ├── notebooks/
│   │   └── a_notebook.ipynb
│   ├── __init__.py # supposed to make package `my_project` importable
│   └── helpers.py
│
├── tests/
│ └── helpers_tests.py
│
├── .gitignore
├── LICENSE
├── README.md
├── requirements.txt
└── setup.py
When importing helpers in the notebook I get the error:
----> 4 import helpers
ModuleNotFoundError: No module named 'helpers'
I also tried from my_project import helpers and I get the same error ModuleNotFoundError: No module named 'my_project'
I finally (and temporarily) used the usual trick:
import sys
sys.path.append('..')
import helpers
But it looks awful and I'm still looking for a better solution
One can tell Python where to look for modules via sys.path. I have a project structure like this:
project/
│
├── src/
│ └── my_module/
│ ├── __init__.py
│ └── helpers.py
├── notebooks/
│ └── a_notebook.ipynb
...
I was able to load the module like so:
import sys
sys.path.append('../src/')
from my_module import helpers
One should be able to load the module from wherever they have it.
Here I found several solutions, some of them similar to the ones given before:
https://mg.readthedocs.io/importing-local-python-modules-from-jupyter-notebooks/index.html
If you move the notebooks directory out one level, and then explicitly import your module from the package, that should do it. So your directory would look like this:
my_project/
│
├── my_project/
│ ├── __init__.py
│ └── helpers.py
├── notebooks/
│ └── a_notebook.ipynb
...
and then your import statement within the notebook would be:
from my_project import helpers
Try the following line:
from my_project.helpers import what_you_need
This line should also work:
import my_project.helpers
I think you need an __init__.py module in the notebooks/ directory. I haven't really used Jupyter notebooks before, so I could be wrong. You may also need to try changing your import statement to:
from .. import helpers
to indicate that the import statement is for a local package located in the parent directory of the Jupyter notebook.
This worked for me.
import sys
MODULE_FULL_PATH = '/<home>/<username>/my_project/my_project'
sys.path.insert(1, MODULE_FULL_PATH)
import helpers
If you are on a Unix/Linux system, another elegant solution is to create a "soft link" to the module file helpers.py that you would like to use. Change to the notebooks directory and create the link to the module file this way:
cd notebooks; ln -fs ../my_project/helpers.py .
This "soft link" is essentially a pointer (a shortcut) to the original target file. With the link in place you will be able to import your module file as usual:
import helpers