Relative Repo Path in __init__.py? - python

I'll share a problem we're having at work. We have a code repo with files spread all around (we're currently trying to get everyone to clean up). The issue is that with every file that gets created, we have to build a RELATIVE_REPO_PATH (RRP) in order to import other files in our repo. The repo is hosted using Mercurial, and every user has a clone on their local machine; they can each push and pull updates as needed. This means the RRP cannot be hardcoded anywhere. Below I'll show an example of our structure, and also what we currently do to get an RRP.
An example of our structure.
.
├── __init__.py
├── misc_functions
│   ├── __init__.py
│   ├── myTest.py
│   └── __pycache__
│       └── __init__.cpython-37.pyc
└── Test
    ├── __init__.py
    ├── pythonfile1 (copy).py
    └── pythonfile1.py
Here is what we currently do at work:
import os
import sys

if not hasattr(sys, 'frozen'):
    RELATIVE_REPO_PATH = os.path.dirname(os.path.dirname(os.path.realpath(__file__)))
    if __name__ == '__main__':
        sys.path.append(RELATIVE_REPO_PATH)
        sys.path.pop(0)
else:
    RELATIVE_REPO_PATH = os.path.dirname(sys.executable)
As you can see, the deeper you are from the root of the repo, the more os.path.dirname calls you need. We have tried this with while loops; however, a lot of our files import each other and it becomes a mess.
Is there any way to add a simple snippet of code to the top-level __init__.py, or to all the __init__.py files, that would eliminate the constant copying and pasting of this code?
Just to clarify: everything that we do works just fine. We want to know if there is a way to set up a global RRP in __init__.py that will work anywhere there is an __init__.py.
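One possible approach (not from the original post, just a sketch): since the repo is a Mercurial clone, every checkout has an `.hg` directory at its root, so a helper can walk upward from any file until it finds that marker instead of counting `os.path.dirname` calls. The function name and the choice of `.hg` as the marker are assumptions for illustration:

```python
import os

def find_repo_root(start, marker=".hg"):
    """Walk upward from `start` until a directory containing `marker` is found.

    `marker` is whatever identifies the repo root -- here we assume the
    Mercurial `.hg` directory, since the repo is hosted with Mercurial.
    """
    path = os.path.realpath(start)
    while True:
        if os.path.isdir(os.path.join(path, marker)):
            return path
        parent = os.path.dirname(path)
        if parent == path:  # hit the filesystem root without finding the marker
            raise RuntimeError("repo root not found above %r" % start)
        path = parent
```

Dropped into the top-level __init__.py, the non-frozen branch of the snippet above could then become `RELATIVE_REPO_PATH = find_repo_root(os.path.dirname(__file__))`, the same regardless of depth, while keeping the `os.path.dirname(sys.executable)` branch for frozen builds.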

Related

Pylint disagrees with VSCode and python in imports

I can't find a way to set things up so that both pylint and the execution of the code (within VSCode or from the command line) work.
There are some similar questions, but none seems to apply to my project structure, which has a src directory under which there will be multiple packages. Here's the simplified project structure:
.
├── README.md
├── src
│   ├── rssita
│   │   ├── __init__.py
│   │   ├── feeds.py
│   │   ├── rssita.py
│   │   └── termcolors.py
│   └── zanotherpackage
│       ├── __init__.py
│       └── anothermodule.py
└── tests
    ├── __init__.py
    └── test_feeds.py
From what I understand, rssita is one of my packages (because of the __init__.py file) with some modules under it, amongst which the rssita.py file contains the following imports:
from feeds import RSS_FEEDS
from termcolors import PC
The rssita.py code as shown above runs fine both within VSCode and from the command line (python src/rssita/rssita.py, run from the project root), but at the same time pylint (both within VSCode and from the command line, as pylint src or pylint src/rssita) flags the two imports as not found.
If I modify the code as follows:
from rssita.feeds import RSS_FEEDS
from rssita.termcolors import PC
pylint will then be happy, but the code no longer runs, since it cannot find the imports.
What's the cleanest fix for this?
As far as I'm concerned pylint is right; your setup / PYTHONPATH is screwed up: in Python 3, all imports are absolute by default, so
from feeds import RSS_FEEDS
from termcolors import PC
should look for top-level packages called feeds and termcolors which I don't think exist.
python src/rssita/rssita.py
That really isn't the correct invocation; it sets up a really weird PYTHONPATH in order to run a random script.
The correct imports should be package-relative:
from .feeds import RSS_FEEDS
from .termcolors import PC
Furthermore if you intend to run a package, that should be either a runnable package using __main__:
python -m rssita
or you should run the sub-package as a module:
python -m rssita.rssita
Because you're using an src layout, you'll either need to create a pyproject.toml so you can do an editable install, or you'll have to set PYTHONPATH=src before you run the command. This ensures the packages are visible at the top level of the PYTHONPATH, and thus correctly importable. Though I'm not a specialist in the interaction of src layouts and runnable packages, so there may be better solutions.
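For the editable-install route, a minimal pyproject.toml sketch for the src layout above might look like this (assuming the setuptools backend; the project name and version are placeholders):

```toml
# pyproject.toml -- minimal sketch for an src layout
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[project]
name = "rssita"          # placeholder
version = "0.1.0"        # placeholder

[tool.setuptools.packages.find]
where = ["src"]          # discover packages under src/
```

After `pip install -e .` from the project root, `python -m rssita.rssita` and the package-relative imports should work from any directory, and pylint should resolve them too.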

How to fix Pylint's false positive Unable to import error?

For the last couple of hours I have been trying to figure out the Pythonic way of importing modules from a parent directory and from sub-directories. I made a project just for testing purposes and hosted it on GitHub, so you can take a look at it to better understand my question.
In the project we have the following files structure:
├── __init__.py
├── main.py
└── project
    ├── a.py
    ├── __init__.py
    └── subdirectory
        ├── b.py
        └── __init__.py
I'm trying to figure out how to import modules from subdirectories and from the parent directories.
If I try to import the modules ./project/subdirectories/b.py and ./project/a.py into the main.py module without specifying the root directory's name in the import statement, then pylint starts to complain that it's unable to locate the modules, but the program runs fine:
If I do specify the root directory in the import statement then the pylint stops complaining, but the program doesn't run anymore:
Can someone please explain why I get those false positives from pylint when the program does work, and why, if I make pylint happy by specifying the root directory in the import statement, the program stops working?
Your IDE (and thus pylint) expects the code to be launched from the root of your git repository, not from the directory above it. There is no test directory in your git project structure, so I think the import from test.project import a works only because your git repository itself is named test and because you launch the program from above the project directory.
I.e. your real structure is this:
test
├── .git          # root of git repository inside test
├── __init__.py
├── main.py
└── project
    ├── a.py
    ├── __init__.py
    └── subdirectory
        ├── b.py
        └── __init__.py
But the test directory is not inside git, and the IDE treats the git root as the source root. If you start launching __main__ from the same directory you launch pylint from, then pylint's indications will be accurate. So the fix is to create the test directory inside your git project:
.git          # root of git repository, outside test/
test
├── __init__.py
├── main.py
└── project
    ├── a.py
    ├── __init__.py
    └── subdirectory
        ├── b.py
        └── __init__.py
You could also launch pylint from the same directory you launch the code by changing the root directory of your IDE.

Why docker-compose python no module found for airflow operator

I am working on tests for docker-airflow postgres etl. My project structure currently looks like this:
docker-airflow
├── Dockerfile
├── __init__.py
├── dags
│   ├── __init__.py
│   ├── pandas_etl.py
│   └── tuto.py
├── docker-compose.yml
└── operators
    ├── __init__.py
    └── pandas_etl_over_postgres_operator.py
When importing my pandas_etl_over_postgres_operator.py into the pandas_etl.py dag, I am getting an error that the module is not found.
The pandas_etl.py import code is:
from operators.pandas_etl_over_postgres_operator import PandasETLOverPostgresOperator
I have tried the following two alternatives; they give the same error:
from .operators.pandas_etl_over_postgres_operator import PandasETLOverPostgresOperator
and
from ..operators.pandas_etl_over_postgres_operator import PandasETLOverPostgresOperator
The import works fine locally but fails when I build and run using docker-compose.
Please note that for Airflow, by default, [core] > dags_folder has the value /usr/local/airflow/dags, meaning that Airflow will look for DAGs at the path /usr/local/airflow/dags.
As a result, all your DAG code should be inside that folder, and hence here are a few things you need to change for your code to work:
In docker-compose.yml file:
- ./dags:/usr/local/airflow/dags/dags
- ./logs:/usr/local/airflow/dags/logs
- ./operators:/usr/local/airflow/dags/operators
In pandas_etl.py file:
from operators.pandas_etl_over_postgres_operator import PandasETLOverPostgresOperator
Hope it helps!
I think it is because you need to put your operator inside a plugins directory and mount that into the container. Have a look at the puckel/docker-airflow documentation regarding plugins. You can also see and change where your particular Airflow instance looks for plugins by checking your configuration file.

PyQt5 Project Structure and PyInstaller ModuleNotFoundError

I'm in the process of structuring my PyQt5 application to more established conventions. Now it looks like this
MyProj
├── my_qt_tool
│   ├── __init__.py
│   ├── class1.py
│   ├── my_qt_tool.py
│   ├── wizard1.py
│   ├── resources
│   │   └── templates
│   │       └── tool.conf.template
│   └── ui
│       ├── __init__.py
│       ├── mainwindow.py
│       ├── mainwindow.ui
│       ├── wizard_01_start.py
│       ├── wizard_01_start.ui
│       └── ...
├── my_qt_tool.spec          # for PyInstaller
├── bin
│   └── generate_ui_code.py  # for compiling Qt *.ui to *.py
├── dist
│   └── my_qt_tool
├── environment.yml          # conda environment requirements
├── LICENSE
└── README.md
So MyProj is the top-level git repo, my_qt_tool is the package of my application, with a subpackage for UI specific code, my_qt_tool.py contains the "main" code which runs the GUI, class1.py handles business logic and wizard1.py is just some extra class for a GUI wizard.
Q1: Is this project structure canonical? Is the main function where it should be? Should *.ui files be separated to resources?
Now, after some haggling with imports, I added my_qt_tool as a source directory in PyCharm to make the imports work, and created a run configuration for my_qt_tool.py with working dir MyProj/my_qt_tool.
Q2: Technically, I want the working dir to be MyProj, but then I would have to reference resources/templates/tool.conf.template with my_qt_tool/resources.., which seems yucky... or is this the way to do it?
Now the imports in my_qt_tool look like this:
from class1 import DataModel
from ui.mainwindow import Ui_MainWindow
...
so no relative imports or the like, because everything is in the same package, right? (Again: to make this work, I had to add my_qt_tool as source directory in my PyCharm project settings...)
Q3: Okay, now the thing that doesn't work. Running PyInstaller on the spec file, which is pretty much stock with Analysis(['my_qt_tool/my_qt_tool.py'], ..., the resulting binary fails to start with the error message: ModuleNotFoundError: No Module named 'class1'. How can I fix this up?
Q1
If the project is going to get larger, you may want to create module-specific folders, each containing its py and GUI files, and structure it as an MVC project.
For an MVC folder structure see https://www.tutorialsteacher.com/mvc/mvc-folder-structure, and here is how a model-view architecture can be implemented: https://www.learnpyqt.com/courses/model-views/modelview-architecture/.
Q2
Read resources/templates/tool.conf.template when the application is bootstrapped instead of referencing it statically. This could be done in generate_ui_code.py, loading all configs as part of app startup.
so no relative imports or the like, because everything is in the same package, right? (Again: to make this work, I had to add my_qt_tool as source directory in my PyCharm project settings...)
There is no need to add my_qt_tool as a source directory if the app is bootstrapped properly.
Q3
Add these two lines at the top of the spec file:
import sys
sys.setrecursionlimit(5000)
If you still encounter the class1 issue, try importing it where PyInstaller is invoked; in your case that is my_qt_tool.py.
Fix the PyInstaller issue first, and then consider refactoring the folder structure with model-view conventions.
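Another option worth trying (an assumption about this setup, not something from the answer above) is to declare the modules PyInstaller's static analysis missed via hiddenimports in the spec file, and to add the package folder to pathex so they can be found:

```python
# my_qt_tool.spec (fragment) -- hypothetical sketch; only pathex and
# hiddenimports are changed relative to the stock spec
a = Analysis(
    ['my_qt_tool/my_qt_tool.py'],
    pathex=['my_qt_tool'],                      # make class1/ui importable at analysis time
    hiddenimports=['class1', 'ui.mainwindow'],  # modules the static analysis missed
)
```

This addresses the ModuleNotFoundError directly when the modules exist but are only reachable because of the PyCharm source-root setting.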
Here are some fairly large project examples:
https://wiki.python.org/moin/PyQt/SomeExistingApplications
https://github.com/topics/pyqt5-desktop-application

Python Not Finding Module

Given the following python project, created in PyDev:
├── algorithms
│   ├── __init__.py
│   └── neighborhood
│       ├── __init__.py
│       ├── neighbor
│       │   ├── connector.py
│       │   ├── __init__.py
│       │   ├── manager.py
│       │   └── references.py
│       ├── neighborhood.py
│       ├── tests
│       │   ├── fixtures
│       │   │   └── neighborhood
│       │   └── __init__.py
│       └── web
│           ├── __init__.py
│           └── service.py
├── configuration
│   ├── Config.py
│   └── __init__.py
├── __init__.py
└── webtrack
    ├── teste.py
    ├── .gitignore
    ├── __init__.py
    └── manager
        ├── Data.py
        ├── ImportFile.py
        └── __init__.py
We've been trying with no success to import modules from one folder to another, such as:
from algorithms.neighborhood.neighbor.connector import NeighborhoodConnector
Which yields the result:
Traceback (most recent call last):
  File "teste.py", line 49, in <module>
    from algorithms.neighborhood.neighbor.connector import NeighborhoodConnector
ImportError: No module named algorithms.neighborhood.neighbor.connector
We tried to append its path to the sys.path variable but with no success.
We also tried to use os.walk to insert all paths into the PATH variable, but we still get the same error, even though we checked that PATH does contain the path to the modules.
We are using Python 2.7 on Linux Ubuntu 13.10.
Is there anything we could be doing wrong?
Thanks in advance,
Getting imports right when running a script that lives within a package is tricky. You can read this section of the (sadly deferred) PEP 395 for a description of a bunch of ways that don't work to run such a script.
Given a file system hierarchy like:
top_level/
    my_package/
        __init__.py
        sub_package/
            __init__.py
            module_a.py
            module_b.py
            sub_sub_package/
                __init__.py
                module_c.py
        scripts/
            __init__.py
            my_script.py
            script_subpackage/
                __init__.py
                script_module.py
There are only a few ways to make running my_script.py work right.
The first would be to put the top_level folder into the PYTHONPATH environment variable, or use a .pth file to achieve the same thing. Or, once the interpreter is running, insert that folder into sys.path (but this can get ugly).
Note that you're adding top_level to the path, not my_package! I suspect this is what you've gotten messed up in your current attempts at this solution. It's very easy to get wrong.
Then, absolute imports like import my_package.sub_package.module_a will mostly work correctly. (Just don't try importing package.scripts.my_script itself while it is running as the __main__ module, or you'll get a weird duplicate copy of the module.)
However, absolute imports will always be more verbose than relative imports, since you always need to specify the full path, even if you're importing a sibling module (or "niece" module, like module_c from module_a). With absolute imports, the way to get module_c is always the big, ugly mouthful of code from my_package.sub_package.sub_sub_package import module_c regardless of what module is doing the importing.
For that reason, using relative imports is often more elegant. Alas, they're hard to get to work from a script. The only ways are:
Run my_script from the top_level folder with the -m flag (e.g. python -m my_package.scripts.my_script) and never by filename.
It won't work if you're in a different folder, or if you use a different method to run the script (like pressing F5 in an IDE). This is somewhat inflexible, but there's not really any way to make it easier (until PEP 395 gets undeferred and implemented).
Set up sys.path like for absolute imports (e.g. add top_level to PYTHONPATH or something), then use a PEP 366 __package__ string to tell Python what the expected package of your script is. That is, in my_script.py you'd want to put something like this above all your relative imports:
if __name__ == "__main__" and __package__ is None:
    __package__ = "my_package.my_scripts"
This will require updating if you reorganize your file organization and move the script to a different package (but that's probably less work than updating lots of absolute imports).
Once you've implemented one of those solutions, your imports can get simpler. Importing module_c from module_a becomes from .sub_sub_package import module_c. In my_script, relative imports like from ..sub_package import module_a will just work.
I know this is an old post, but I am still going to post my solution.
I had a similar issue. I just added the path with the following lines before importing the package:
import os
import sys

sys.path.append(os.path.join(os.path.dirname(__file__), '..'))
from lib import create_graph
The way imports work is slightly different in Python 2 and 3. First, Python 3 and the sane way (which you seem to expect): in Python 3, all imports are relative to the folders in sys.path (see here for more about the module search path). Python doesn't use $PATH, by the way.
So you can import anything from anywhere without worrying too much.
In Python 2, imports are relative and sometimes absolute. The document about packages contains an example layout and some import statements which might be useful for you.
The section "Intra-package References" contains information about how to import between packages.
From all the above, I think that your sys.path is wrong. Make sure the folder which contains algorithms (i.e. not algorithms itself but its parent) is in sys.path.
Just set __package__ = None in every .py file. It will set up the whole package hierarchy automatically.
After that you may freely use absolute module names for import:
from algorithms.neighborhood.neighbor.connector import NeighborhoodConnector

Categories

Resources