Python extension with multiple modules - python

I am building Python bindings for a standalone C library that I wrote. The file layout of the library is as follows:
<project root>
|
`- cpython
| |
| `- module1_mod.c
| `- module2_mod.c
| `- module3_mod.c
|
`- include
| |
| `- module1.h
| `- module2.h
| `- module3.h
|
`- src
| |
| `- module1.c
| `- module2.c
| `- module3.c
|
`- setup.py
I want to obtain a Python package so I can import modules in a namespace such as my_package.module1, my_package.module2, etc.
This is my setup.py so far:
from os import path

from setuptools import Extension, setup

ROOT_DIR = path.dirname(path.realpath(__file__))
MOD_DIR = path.join(ROOT_DIR, 'cpython')
SRC_DIR = path.join(ROOT_DIR, 'src')
INCL_DIR = path.join(ROOT_DIR, 'include')
EXT_DIR = path.join(ROOT_DIR, 'ext')

# Sources of the external libraries, compiled into the extension
ext_libs = [
    path.join(EXT_DIR, 'ext_lib1', 'lib.c'),
    # [...]
]

setup(
    name="my_package",
    version="1.0a1",
    ext_modules=[
        Extension(
            "my_package.module1",
            [
                path.join(SRC_DIR, 'module1.c'),
                path.join(MOD_DIR, 'module1_mod.c'),
            ] + ext_libs,
            include_dirs=[INCL_DIR],
            libraries=['uuid', 'pthread'],
        ),
    ],
)
Importing my_package.module1 works, but the problem is that the external libraries are also needed by module2 and module3 (not all of them by every module), and I assume that including the same external libs in the other modules would cause a lot of bloat.
I looked around sample setups on GitHub but haven't found an example that solves this problem.
What is a good way to organize my builds?
EDIT: This is actually a more severe problem, because I have symbols in module1 that are needed in module2, etc. For example, an object in module2 requires an object type defined in module1. If I create separate binaries without including all the sources each one depends on, the symbols won't be available at link time; including them, on the other hand, increases redundancy and makes it harder to keep track of what is needed by which module.

After a couple of days of digging into Python bug reports and scarcely documented features, I found an answer to this, which resolved both the multiple external dependencies and the internal cross-linking.
The solution was to create a monolithic "module" with all the modules defined inside it, then exposing them with a few lines of Python code in a package initialization file.
To do this, I converted the module source files into header files, keeping most of their functions static and exposing only the PyTypeObject structs and my object type structs so they can be used by other modules.
Then I moved the PyMODINIT_FUNC functions defining all the modules into a "package" module (py_mypackage.c), which also defines an empty module. The "package" module is named _my_package.
Finally, I added some internal machinery to an __init__.py script that extracts the module symbols from the .so file and exposes them as modules of the package. This technique is documented in the Python docs (importlib, "Importing a source file directly"):
import importlib.util
import sys

import _my_package

pkg_path = _my_package.__file__


def _load_module(mod_name, path):
    spec = importlib.util.spec_from_file_location(mod_name, path)
    module = importlib.util.module_from_spec(spec)
    sys.modules[mod_name] = module
    spec.loader.exec_module(module)
    return module


for mod_name in ('module1', 'module2', 'module3'):
    # Each name must match a PyInit_<name> function compiled into _my_package
    globals()[mod_name] = _load_module(mod_name, pkg_path)
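The loop above registers each module in sys.modules under its bare name (module1, ...), which could clash with an unrelated top-level module of the same name. Below is a sketch of a variant that registers them as my_package.module1 and so on. load_submodule is a hypothetical helper, and the approach relies on CPython deriving the PyInit_ function name from the last component of the dotted module name, so PyInit_module1 is still the function that gets called:

```python
import importlib.util
import sys
from types import ModuleType


def load_submodule(package: str, name: str, path: str) -> ModuleType:
    """Load module `name` from the file at `path` and register it as `package.name`.

    For a C extension, CPython calls PyInit_<name> (taken from the last dotted
    component), so one shared object can provide several differently named modules.
    """
    qualified = f"{package}.{name}"
    spec = importlib.util.spec_from_file_location(qualified, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {qualified!r} from {path!r}")
    module = importlib.util.module_from_spec(spec)
    # Register before executing, mirroring what the import system itself does.
    sys.modules[qualified] = module
    spec.loader.exec_module(module)
    return module


# Hypothetical use inside my_package/__init__.py:
#     import _my_package
#     for _name in ("module1", "module2", "module3"):
#         globals()[_name] = load_submodule(__name__, _name, _my_package.__file__)
```

Because spec_from_file_location picks the loader from the file suffix, the same helper also works on a plain .py file, which makes it easy to try out without a compiled extension.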
The new layout is thus:
<project root>
|
`- cpython
| |
| `- my_package
| | |
| | `- __init__.py
| |
| `- py_module1.h
| `- py_module2.h
| `- py_module3.h
| `- py_mypackage.c
|
`- include
| |
| `- module1.h
| `- module2.h
| `- module3.h
|
`- src
| |
| `- module1.c
| `- module2.c
| `- module3.c
|
`- setup.py
And setup.py:
setup(
    name="my_package",
    version="1.0a1",
    package_dir={'my_package': path.join(CPYTHON_DIR, 'my_package')},
    packages=['my_package'],
    ext_modules=[
        Extension(
            "_my_package",
            [...],  # all .c files in the cpython folder + ext library sources
            libraries=[...],
        ),
    ],
)
For the curious, the complete code is at https://notabug.org/scossu/lsup_rdf/src/e08da1a83647454e98fdb72f7174ee99f9b8297c/cpython (pinned at the current commit).

Related

How to get absolute path of root directory from anywhere within the directory in python

Let's say I have the following directory structure:
model_folder
|
|
------- model_modules
| |
| ---- __init__.py
| |
| ---- foo.py
| |
| ---- bar.py
|
|
------- research
| |
| ----- training.ipynb
| |
| ----- eda.ipynb
|
|
------- main.py
I want to import model_modules into a script in research.
I can do that with the following:
import sys
sys.path.append('/absolute/path/model_folder')
from model_modules.foo import Foo
from model_modules.bar import Bar
However, let's say I don't explicitly know the absolute path of the root, or simply don't want to hardcode it because it may change location. How could I get the absolute path of model_folder from anywhere in the directory, so I could do something like this?
import sys
sys.path.append(root)
from model_modules.foo import Foo
from model_modules.bar import Bar
I referred to this question, in which one of the answers recommends adding the following file to the project, like so:
utils.py:
from pathlib import Path

def get_project_root() -> Path:
    return Path(__file__).parent.parent
model_folder
|
|
------- model_modules
| |
| ---- __init__.py
| |
| ---- foo.py
| |
| ---- bar.py
|
|
|
------- src
| |
| ---- utils.py
|
|
|
|
|
------- research
| |
| ----- training.ipynb
| |
| ----- eda.ipynb
|
|
------- main.py
But then, when I try to import it from a file in a subdirectory, such as training.ipynb, I get an error:
from src.utils import get_project_root
root = get_project_root()
ModuleNotFoundError: No module named 'src'
So my question is, how can I get the absolute path to the root directory from anywhere within the directory in python?
When the program is started from the root (e.g. python main.py), sys.path[0] contains your root directory (the directory where the program is located). You can use it to add your subdirectories:
import sys
sys.path.append(sys.path[0] + "/model_modules")
import foo
And for cases where another foo.py may exist elsewhere on the path:
import sys
sys.path.insert(1, sys.path[0] + "/model_modules")  # put near the front of the list
import foo
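Another option, not from the answer above, is to walk up from the current directory until a file that only exists at the root is found. This is a minimal sketch, assuming main.py marks the root (any file unique to model_folder would do) and that the notebook's working directory lies inside the project, which is Jupyter's usual default. find_project_root is a hypothetical name:

```python
from pathlib import Path


def find_project_root(start=None, marker="main.py"):
    """Return the closest directory at or above `start` that contains `marker`.

    `start` defaults to the current working directory. Raises FileNotFoundError
    if no ancestor contains the marker.
    """
    start = Path.cwd() if start is None else Path(start)
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"no directory at or above {start} contains {marker!r}")


# Hypothetical use at the top of research/training.ipynb (the function has to be
# defined in the notebook itself, since src.utils is not importable yet):
#     import sys
#     sys.path.append(str(find_project_root()))
#     from model_modules.foo import Foo
```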

How can I create a dynamic logging structure in scrapy?

I am building a scraper using the Scrapy framework. I want to log every run in a structured manner. I already set up dynamic logging using the timestamp in the settings.py file:
import os
from datetime import datetime

# BASE_DIR is assumed to be defined earlier in settings.py
LOG_DIR = os.path.join(BASE_DIR, 'logs')
os.makedirs(LOG_DIR, exist_ok=True)  # no error if the folder already exists
LOG_FILE = os.path.join(LOG_DIR, f'{datetime.now().timestamp()}.log')
But I also want to store the logs in a nested directory structure, which would help me find the right log more easily,
e.g.:
- ROOT
|- logs
| |- 21-02-2022
| |- 22-02-2022
| | |- us
| | |- UK
| | | |- t-shirt
| | | |- hoodie
| | | | |- 1653029099.520938.log
Can someone please point me to how I can achieve this?
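The nested layout above can be produced by a plain helper that builds the path and creates the folders before Scrapy opens the log file. This is a minimal sketch; build_log_path is a hypothetical name, and how the country and category of a run reach settings.py is an assumption (environment variables with made-up names in the comment below):

```python
from datetime import datetime
from pathlib import Path


def build_log_path(base_dir, country, category, now=None):
    """Return logs/<DD-MM-YYYY>/<country>/<category>/<timestamp>.log under base_dir.

    The intermediate folders are created if they do not exist yet.
    """
    now = now or datetime.now()
    log_dir = Path(base_dir) / "logs" / now.strftime("%d-%m-%Y") / country / category
    log_dir.mkdir(parents=True, exist_ok=True)
    return log_dir / f"{now.timestamp()}.log"


# Hypothetical use in settings.py (with `import os`), reading the run's country and
# category from made-up SCRAPE_COUNTRY / SCRAPE_CATEGORY environment variables:
#     LOG_FILE = str(build_log_path(BASE_DIR,
#                                   os.environ.get("SCRAPE_COUNTRY", "unknown"),
#                                   os.environ.get("SCRAPE_CATEGORY", "unknown")))
```

Alternatively, the path can be computed outside Scrapy and passed per run with `scrapy crawl <spider> -s LOG_FILE=<path>`.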

exposing all classes and functions visible from the main module

I'm a little confused about the import in a python project.
I used this as a model to create my project:
https://docs.python-guide.org/writing/structure/
At the moment, working in Spyder, I set my working directory to MyProject/.
MyProject
|
|
--- mymodule
| |
| |--- myclass1.py (contains class MyClass1)
| |
| |--- myclass2.py (contains class MyClass2)
|
|
|--- tests
| |
| |-- test_MyClass1.py (contains class TestMyClass1(unittest.TestCase))
| |
| |
| |-- test_MyClass2.py (contains class TestMyClass2(unittest.TestCase))
Then I run test_MyClass1.py.
test_MyClass1.py imports MyClass1 this way:
from mymodule.myclass1 import MyClass1
and in myclass1.py, I import MyClass2 this way:
from mymodule.myclass2 import MyClass2
I read about the __init__.py and the namespace packages, the more I read the more confused I get...
Basically, I do not want to write:
mymodule.myfile.myclass
but rather:
import mymodule as mm
mm.MyClass1
or:
from mymodule import *
a = MyClass1()
Still, I want one file per class.
You can add the import of MyClass1 and MyClass2 in mymodule/__init__.py.
Basically you will have the following files:
mymodule/
__init__.py
myclass1.py
myclass2.py
tests/
test_myclass1.py
test_myclass2.py
where:
mymodule/__init__.py contains the following lines:
from mymodule.myclass1 import MyClass1
from mymodule.myclass2 import MyClass2
mymodule/myclass1.py contains MyClass1 definition
mymodule/myclass2.py contains MyClass2 definition
Then in tests/test_myclass1.py you can import MyClass1 with:
from mymodule import MyClass1
a = MyClass1()
or
import mymodule as mm
a = mm.MyClass1()
You can do the same for MyClass2
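Since the question also asks for `from mymodule import *`, an `__all__` list in `__init__.py` controls exactly which names that statement exports. The self-contained demo below writes the package into a temporary directory (standing in for MyProject/) and imports it; it uses relative imports (`.myclass1`) instead of the answer's absolute ones, which is a matter of taste rather than a requirement:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source of each file in the package, mirroring the layout in the answer.
FILES = {
    "__init__.py": (
        "from .myclass1 import MyClass1\n"
        "from .myclass2 import MyClass2\n"
        "\n"
        "# Names exported by `from mymodule import *`\n"
        "__all__ = ['MyClass1', 'MyClass2']\n"
    ),
    "myclass1.py": (
        "from .myclass2 import MyClass2\n"
        "\n"
        "class MyClass1:\n"
        "    def __init__(self):\n"
        "        self.helper = MyClass2()\n"
    ),
    "myclass2.py": "class MyClass2:\n    pass\n",
}


def build_package(root):
    """Write the mymodule package into the directory `root`."""
    pkg = Path(root) / "mymodule"
    pkg.mkdir()
    for filename, source in FILES.items():
        (pkg / filename).write_text(source)


project_dir = tempfile.mkdtemp()  # stands in for MyProject/
build_package(project_dir)
sys.path.insert(0, project_dir)
importlib.invalidate_caches()

mm = importlib.import_module("mymodule")
a = mm.MyClass1()
```

Without `__all__`, a star import would also pull in the submodule names myclass1 and myclass2, because importing them binds them as attributes of the package.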

Python Submodule is not getting imported

This is the directory layout of my PyDev project in Eclipse:
project
|
+----tests
| |
| +----subtests
| | |
| | +----__init__.py
| | |
| | +----test1.py
| |
| +----__init__.py
| |
| +----test2.py
|
+----mods
|
+----__init__.py
|
+----submods1
|
+----__init__.py
|
+----submods2
|
+----__init__.py
|
+----a.py
|
+----b.py
|
...
|
+----z.py
test1 and test2 are exactly the same, and all of the __init__.py files contain only comments. The tests import modules from the mods directory along with those modules' dependencies. When I run test1, all of the modules are found, but test2 is always unable to find the same module (let's call it "z.py") in submods2, even though it finds the rest of the modules. It's not that something inside z.py fails to import; the file itself cannot be found at all.
test2:
>>> from mods.submod1.submod2 import z
exec exp in global_vars, local_vars
File "<console>", line 1, in <module>
ImportError: cannot import name z
>>> from mods.submod1 import submod2
>>> hasattr(submod2, 'z')
False
The only difference in the sys.path during the two tests are the directories that tests are located in, project/tests/subtests for test1 and project/tests for test2.
I cannot figure out why test2 is unable to import z.py while test1 can, given that test2 can import all the other modules.
To help diagnose the issue, do:
from mods.submod1 import submod2
print(submod2)
My guess is that it's not the module you're expecting.
What Python version are you using?
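To follow up on the guess that submod2 is not the module you expect, a small helper (hypothetical, Python 3) can list every search-path entry that provides a top-level mods package or module. If it reports more than one directory, the first one wins, and which one comes first depends on the working directory and the paths PyDev adds for each run:

```python
import sys
from pathlib import Path


def shadowing_candidates(top_level, search_path=None):
    """Return every search-path entry that provides `top_level` as a package or module.

    Python imports from the first hit, so more than one hit means a copy can be shadowed.
    """
    hits = []
    for entry in (sys.path if search_path is None else search_path):
        base = Path(entry or ".")  # an empty entry stands for the current working directory
        if (base / top_level).is_dir() or (base / f"{top_level}.py").is_file():
            hits.append(base.resolve())
    return hits


# Hypothetical use at the top of test1 and test2, to compare the two runs:
#     print(shadowing_candidates("mods"))
#     from mods.submod1 import submod2
#     print(submod2.__file__)
```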
I think I found my solution. In the Run Configuration for test2, the Working directory in the Arguments tab was set to a custom path, ${workspace_loc:project/tests/}; I switched it to the default, ${project_loc:/selected project name}, and that seems to fix the issue. While I don't understand how this fixed the problem, the result is good enough for me.

PyCharm PYTHONPATH with different parts of single logical package

Assume I have two projects, deployment and cms, with this structure:
+ deployment
| + src
| | + my_company
| | | + __init__.py
| | | + deployment
| | | | + ...
+ cms
| + src
| | + my_company
| | | + __init__.py
| | | + cms
| | | | + ...
+ ...
My company has many projects that are distributed as a single logical package, my_company. This works because each my_company/__init__.py file calls pkgutil.extend_path:
https://docs.python.org/2/library/pkgutil.html#pkgutil.extend_path
So it is then possible to import like this:
from my_company import cms
from my_company import deployment
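A self-contained sketch of how the extend_path idiom from the pkgutil docs merges two portions of my_company at runtime. Temporary directories stand in for the real deployment/src and cms/src, and make_portion is a hypothetical helper (Python 3):

```python
import importlib
import sys
import tempfile
from pathlib import Path

# The idiom from the pkgutil.extend_path documentation; every portion's
# my_company/__init__.py contains exactly these two lines.
INIT = "from pkgutil import extend_path\n__path__ = extend_path(__path__, __name__)\n"


def make_portion(project_root, subpackage):
    """Create <project_root>/src/my_company/<subpackage>/ and return the src dir."""
    src = Path(project_root) / "src"
    sub = src / "my_company" / subpackage
    sub.mkdir(parents=True)
    (src / "my_company" / "__init__.py").write_text(INIT)
    (sub / "__init__.py").write_text(f"NAME = {subpackage!r}\n")
    return src


workspace = Path(tempfile.mkdtemp())  # stands in for the directory holding both projects
for project in ("deployment", "cms"):
    sys.path.append(str(make_portion(workspace / project, project)))
importlib.invalidate_caches()

from my_company import cms, deployment  # both resolve although they live in different src dirs
```

Only the first my_company/__init__.py found on sys.path is executed; its extend_path call appends the other portions to `my_company.__path__`. On Python 3.3+, omitting `__init__.py` entirely turns my_company into an implicit namespace package (PEP 420), which achieves the same merge without the idiom.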
The problem comes when I mark all src directories as Sources Root in PyCharm: PyCharm then sees only one package (probably the first one it encounters) at the first level of imports in the suggestions box. So if I want suggestions for the phrase import my_company., only deployment appears. Strangely, the second level of imports works correctly: all suggestions for the phrase import my_company.cms. appear as soon as I type the dot after the cms package name.
Is there any option in settings to fix this problem?
It looks like this is a known issue: https://youtrack.jetbrains.com/issue/PY-23087.
