I am building a scraper using the Scrapy framework, and I want to log every run in a structured manner. I have already set up dynamic logging using a timestamp in the settings.py file:
import os
from datetime import datetime

LOG_DIR = os.path.join(BASE_DIR, 'logs')
if not os.path.exists(LOG_DIR):
    try:
        os.mkdir(LOG_DIR)
    except OSError:
        pass
LOG_FILE = os.path.join(LOG_DIR, f'{datetime.now().timestamp()}.log')
but I further want to store the logs in a nested directory structure that helps me locate the right log more easily, e.g.:
- ROOT
|- logs
| |- 21-02-2022
| |- 22-02-2022
| | |- us
| | |- UK
| | | |- t-shirt
| | | |- hoodie
| | | | |- 1653029099.520938.log
Can someone please direct me on how I can achieve this?
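For context, I imagine something like the following sketch could work in settings.py, but I am not sure it is the right approach. The country and category values here are placeholders; in practice they would come from the spider's arguments or the environment rather than being hard-coded:

import os
from datetime import datetime

# Hypothetical run parameters, hard-coded only for this sketch.
COUNTRY = 'uk'
CATEGORY = 'hoodie'

run_date = datetime.now().strftime('%d-%m-%Y')
LOG_DIR = os.path.join(BASE_DIR, 'logs', run_date, COUNTRY, CATEGORY)
os.makedirs(LOG_DIR, exist_ok=True)  # creates the whole nested chain at once
LOG_FILE = os.path.join(LOG_DIR, f'{datetime.now().timestamp()}.log')

Note that os.makedirs with exist_ok=True replaces the mkdir/try/except pattern and creates every missing level of the tree in one call.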
I am trying to set up the following structure in order to have a more manageable folder structure, with a dedicated folder for Celery tasks that don't depend on a Django project, but I am getting unregistered-task errors even though the tasks are registered via app.autodiscover_tasks. What am I missing in my codebase? https://github.com/SkyBulk/celery/blob/main/thirdparty/celery_tasks/celery.py#L5
Let us focus on this path tree:
└── thirdparty
├── appone
│ ├── __init__.py
│ └── tasks.py
├── apptwo
│ ├── __init__.py
│ └── tasks.py
└── celery_tasks
├── celery.py
├── __init__.py
└── settings.py
Relative to the outer project (as the sources root), the modules are:
thirdparty.appone.tasks (contains add)
thirdparty.apptwo.tasks (contains mult)
thirdparty.celery_tasks.celery (contains app)
thirdparty.celery_tasks.settings (contains CELERY_BROKER_URL, CELERY_RESULT_BACKEND, etc.)
thirdparty/celery_tasks/__init__.py
from __future__ import absolute_import
from outside import file
from .celery import app as celery_app
__all__ = ['celery_app']
thirdparty/appone/tasks.py
import time

from celery import shared_task
from outside import file

@shared_task
def add(x, y):
    print('start appone add function')
    time.sleep(10)
    file.fun()
    return x + y
thirdparty/apptwo/tasks.py
import time

from celery import shared_task

@shared_task
def mult(x, y):
    print('start apptwo mult function')
    time.sleep(10)
    print('result:', x * y)
    return x * y
Error:
celery_1 | Did you remember to import the module containing this task?
celery_1 | Or maybe you're using relative imports?
celery_1 |
celery_1 | Please see
celery_1 | http://docs.celeryq.org/en/latest/internals/protocol.html
celery_1 | for more information.
celery_1 |
celery_1 | The full contents of the message body was:
celery_1 | b'[[], {}, {"callbacks": null, "errbacks": null, "chain": null, "chord": null}]' (77b)
celery_1 | Traceback (most recent call last):
celery_1 | File "/usr/local/lib/python3.8/site-packages/celery/worker/consumer/consumer.py", line 562, in on_task_received
celery_1 | strategy = strategies[type_]
celery_1 | KeyError: 'appone.tasks.add'
celery_1 | [2021-04-30 06:07:00,004: ERROR/MainProcess] Received unregistered task of type 'apptwo.tasks.mult'.
celery_1 | The message has been ignored and discarded.
They might have been registered under a different name. I suggest you run celery -A your.project inspect registered (replacing your.project with the correct name) and compare the registered task names with what you expect to have registered.
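For example, a minimal celery_tasks/celery.py along the lines of the tree above might look like the sketch below; the package list passed to autodiscover_tasks is an assumption based on that tree:

from __future__ import absolute_import

from celery import Celery

app = Celery('celery_tasks')

# Pull CELERY_BROKER_URL, CELERY_RESULT_BACKEND, etc. from the settings module.
app.config_from_object('thirdparty.celery_tasks.settings', namespace='CELERY')

# Register the tasks under their fully qualified names, e.g.
# 'thirdparty.appone.tasks.add', so the worker and the caller agree.
app.autodiscover_tasks(['thirdparty.appone', 'thirdparty.apptwo'])

With that, starting the worker from the outer project root as celery -A thirdparty.celery_tasks worker should register the tasks under names matching the ones the callers send.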
I am building Python bindings for a standalone C library that I wrote. The file layout of the library is as follows:
<project root>
|
`- cpython
| |
| `- module1_mod.c
| `- module2_mod.c
| `- module3_mod.c
|
`- include
| |
| `- module1.h
| `- module2.h
| `- module3.h
|
`- src
| |
| `- module1.c
| `- module2.c
| `- module3.c
|
`- setup.py
I want to obtain a Python package so I can import modules in a namespace such as my_package.module1, my_package.module2, etc.
This is my setup.py so far:
from os import path
from setuptools import Extension, setup
ROOT_DIR = path.dirname(path.realpath(__file__))
MOD_DIR = path.join(ROOT_DIR, 'cpython')
SRC_DIR = path.join(ROOT_DIR, 'src')
INCL_DIR = path.join(ROOT_DIR, 'include')
EXT_DIR = path.join(ROOT_DIR, 'ext')
ext_libs = [
    path.join(EXT_DIR, 'ext_lib1', 'lib.c'),
    # [...]
]

setup(
    name="my_package",
    version="1.0a1",
    ext_modules=[
        Extension(
            "my_package.module1",
            [
                path.join(SRC_DIR, 'module1.c'),
                path.join(MOD_DIR, 'module1_mod.c'),
            ] + ext_libs,
            include_dirs=[INCL_DIR],
            libraries=['uuid', 'pthread'],
        ),
    ],
)
Importing my_package.module1 works, but the problem is that the external libraries are also needed by module2 and module3 (not all of them for all the modules), and I assume that if I include the same external libs in the other modules, I will get a lot of bloat.
I looked around sample setups on GitHub but haven't found an example that resolves this problem.
What is a good way to organize my builds?
EDIT: This is actually a more severe problem in that I have symbols in module1 that are needed in module2, etc. E.g. an object in module2 requires an object type defined in module1. If I create separate binaries, each module has to include all the sources of its dependencies, otherwise the symbols won't be available at link time; this increases redundancy and the complexity of keeping track of what is needed for which module.
After a couple of days of digging into Python bug reports and scarcely documented features, I found an answer to this, which resolved both the multiple external dependencies and the internal cross-linking.
The solution was to create a monolithic "module" with all the modules defined inside it, then exposing them with a few lines of Python code in a package initialization file.
To do this, I changed the module source files into header files, keeping most of their functions static and exposing only the PyTypeObject structs and my object type structs so they can be used in other modules.
Then I moved the PyMODINIT_FUNC functions defining all the modules into a "package" module (py_mypackage.c), which also defines an empty module. The "package" module is named _my_package.
Finally, I added some internal machinery to an __init__.py script that extracts the module symbols from the .so file and exposes them as modules of the package. This approach is documented in the Python docs:
import importlib.util
import sys

import _my_package

pkg_path = _my_package.__file__

def _load_module(mod_name, path):
    spec = importlib.util.spec_from_file_location(mod_name, path)
    module = importlib.util.module_from_spec(spec)
    sys.modules[mod_name] = module
    spec.loader.exec_module(module)
    return module

for mod_name in ('module1', 'module2', 'module3'):
    locals()[mod_name] = _load_module(mod_name, pkg_path)
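With that loader in place, consumers can import the submodules in the usual way, for example:

import my_package

m1 = my_package.module1          # a real module object backed by the _my_package binary
from my_package import module2   # attribute-style imports work as well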
The new layout is thus:
<project root>
|
`- cpython
| |
| `- my_package
| | |
| | `- __init__.py
| |
| `- py_module1.h
| `- py_module2.h
| `- py_module3.h
| `- py_mypackage.c
|
`- include
| |
| `- module1.h
| `- module2.h
| `- module3.h
|
`- src
| |
| `- module1.c
| `- module2.c
| `- module3.c
|
`- setup.py
And setup.py:
setup(
    name="my_package",
    version="1.0a1",
    package_dir={'my_package': path.join(CPYTHON_DIR, 'my_package')},
    packages=['my_package'],
    ext_modules=[
        Extension(
            "_my_package",
            "<all .c files in cpython folder + ext library sources>",
            libraries=[...],
        ),
    ],
)
For the curious, the complete code is at https://notabug.org/scossu/lsup_rdf/src/e08da1a83647454e98fdb72f7174ee99f9b8297c/cpython (pinned at the current commit).
Assume I have projects deployment and cms with this structure:
+ deployment
| + src
| | + my_company
| | | + __init__.py
| | | + deployment
| | | | + ...
+ cms
| + src
| | + my_company
| | | + __init__.py
| | | + cms
| | | | + ...
+ ...
My company has many projects that are distributed as a single logical package my_company. This is made possible by calling extend_path in each my_company/__init__.py file.
https://docs.python.org/2/library/pkgutil.html#pkgutil.extend_path
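For reference, the __init__.py pattern from those docs is just:

# my_company/__init__.py in every project
from pkgutil import extend_path
__path__ = extend_path(__path__, __name__)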
So it is then possible to import like this:
from my_company import cms
from my_company import deployment
The problem comes when I mark all the src directories as Sources Root in PyCharm, because then PyCharm sees only one package (probably the first one it encounters) for the first level of imports in the suggestion box. So if I want suggestions for the phrase import my_company., only deployment appears. Strangely, the second level of imports works fine: all the suggestions for the phrase import my_company.cms. appear as soon as I type the dot after the cms package name.
Is there any option in settings to fix this problem?
It looks like it is a known issue: https://youtrack.jetbrains.com/issue/PY-23087.
I'm having a problem with resolving paths to static resource files.
My working directory structure is:
|- README.md
|- nlp
|  |-- morpheme
|  |  |-- morpheme_builder.py
|  |-- fsa_setup.py
|- tests
|  |-- test_fsa.py
|- res
|  |-- suffixes.xml
The code for fsa_setup.py is:
import os

class FSASetup():
    fsa = None

    def get_suffixes():
        list_suffix = list()
        file = os.path.realpath("../res/suffixes.xml")
        .....

if __name__ == "__main__":
    FSASetup.get_suffixes()
The code for morpheme_builder.py is:
class MorphemeBuilder:
    def get_all_words_from_fsa(self):
        ......

if __name__ == "__main__":
    FSASetup.get_suffixes()
When it is called from fsa_setup.py, the file path's value is '\res\suffixes.xml', which is correct, but in the other case the file path's value is '\nlp\res\suffixes.xml'.
I understand why it works like this, so how can I give the file the correct path to the resource?
The problem is that morpheme_builder.py is in the directory morpheme, so when you say ../res/suffixes.xml it goes one directory back, i.e. to nlp/res/suffixes.xml. What about using os.path.abspath("../res/suffixes.xml")?
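Note that both os.path.realpath and os.path.abspath resolve a relative path against the current working directory, not against the file that contains the code, so the result still depends on where the script is launched from. A more robust sketch, assuming fsa_setup.py lives in nlp/ and res/ is its sibling at the project root:

import os

# Anchor the resource path to this source file's location, not the CWD.
_THIS_DIR = os.path.dirname(os.path.abspath(__file__))
SUFFIXES_PATH = os.path.join(_THIS_DIR, os.pardir, 'res', 'suffixes.xml')

Any module can then import SUFFIXES_PATH (or build its own path the same way) and get the same file regardless of the working directory.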