Airflow PyTest DagBag Import Error from local module - python

I have a directory structure as follows for my DAGs:
.
├── __init__.py
├── _base.py
├── dag1.py
├── dag2.py
├── dag3.py
├── ...
When I run PyTest to fill my DagBag, I get the following error for each DAG file:
ERROR - Failed to import: dags/dag2.py
Traceback (most recent call last):
  File "~/code/myproject/venv/lib/python3.7/site-packages/airflow/models/dagbag.py", line 339, in parse
    loader.exec_module(new_module)
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "dags/dag2.py", line 12, in <module>
    from _base import start_here, task_factory
ModuleNotFoundError: No module named '_base'
My _base.py is a helper file that holds common task declarations like:
def task_factory(
    operator_func,
    base_env_vars=[],
    op_kwargs={},
):
    def build_airflow_task(task_id=None, extra_env_vars=[], **kwargs):
        task = operator_func(
            task_id=task_id,
            name=task_id,
            env_vars=[*base_env_vars, *extra_env_vars],
            **op_kwargs,
            **kwargs,
        )
        return task
    return build_airflow_task
In each DAG file, I do the following:
import os
from airflow import DAG
from airflow.kubernetes.secret import Secret
from airflow.providers.google.cloud.operators.kubernetes_engine import (
    GKEStartPodOperator,
)
from airflow.utils.helpers import chain
from kubernetes.client import models as k8s
from _base import start_here, finish, dummy_task, task_factory
I am not sure how to instrument Airflow/PyTest to find _base.py for tests, but in my actual deployment (Google Cloud Composer), my DAG files are able to find and import from _base.py without issue.
My test file looks like this:
import sys
sys.path.insert(0, "..")
from airflow.models import DagBag
def test_dagbag_compiles():
    dags = DagBag("dags", include_examples=False)
    assert len(dags.import_errors) == 0

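The simplest fix is to put the directory that contains _base.py (here, your dags folder) on the Python path when you run the tests. In a deployment, Airflow puts its configured DAGs folder on sys.path, which is presumably why the import works in Cloud Composer but not under PyTest:
PYTHONPATH=/path/to/module pytest ...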
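If you would rather not set the environment variable on every run, the same effect can be sketched in a conftest.py. This is only a sketch under assumptions: the helper name add_to_sys_path is made up for illustration, and it assumes pytest is started from the project root that contains the dags/ directory.

```python
# conftest.py (assumed to sit in the project root, next to dags/)
import sys
from pathlib import Path


def add_to_sys_path(directory):
    """Prepend directory to sys.path (once) so its modules import by bare name."""
    resolved = str(Path(directory).resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


# Assumption: the current working directory is the project root.
# After this, "from _base import ..." inside the DAG files can be resolved.
add_to_sys_path("dags")
```

pytest imports conftest.py before it collects the test modules, so the path is in place by the time DagBag parses the DAG files.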

Related

ImportError: cannot import name 'TryExcept' from 'utils' (mypath/utils.py)

I have a script with the following folder structure:
myfolder/
    myscript.py
    utils.py
    mymodels/
        mymodel.py
        utils.py
Within myscript.py I have the following imports:
from utils import funca, funcb, funcc
from mymodel.utils import DataLoader
from mymodels.mymodel import *
But mymodel itself loads a YOLOv5 model from torch.hub, and that load throws the following error:
  File "/home/vitouser/.local/lib/python3.10/site-packages/torch/hub.py", line 540, in load
    model = _load_local(repo_or_dir, model, *args, **kwargs)
  File "/home/vitouser/.local/lib/python3.10/site-packages/torch/hub.py", line 569, in _load_local
    model = entry(*args, **kwargs)
  File "/home/vitouser/.cache/torch/hub/ultralytics_yolov5_master/hubconf.py", line 83, in custom
    return _create(path, autoshape=autoshape, verbose=_verbose, device=device)
  File "/home/vitouser/.cache/torch/hub/ultralytics_yolov5_master/hubconf.py", line 33, in _create
    from models.common import AutoShape, DetectMultiBackend
  File "/home/vitouser/.cache/torch/hub/ultralytics_yolov5_master/models/common.py", line 28, in <module>
    from utils import TryExcept
ImportError: cannot import name 'TryExcept' from 'utils' (/myfolder/utils.py)
I am guessing the repeated use of the name "utils" is causing the problem, since Python looks for TryExcept in my own /myfolder/utils.py and can't find it there.
How do I get rid of this problem?
Running mymodel.py directly from the shell with python does not cause any issues, so I am guessing it must be the import of my own utils file.
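The guess above can be checked with a small experiment. The sketch below is not taken from the YOLOv5 or torch.hub code; it uses two throwaway directories (hypothetical stand-ins named mine and theirs) to show that once a module called utils has been imported, a later import of the same name is served from the sys.modules cache, even if another directory has since been placed first on sys.path.

```python
import sys
import tempfile
from pathlib import Path

# Two throwaway directories, each with its own utils.py. They are hypothetical
# stand-ins for /myfolder/utils.py and the utils package in the YOLOv5 checkout.
root = Path(tempfile.mkdtemp())
for name, body in [("mine", "funca = 1\n"), ("theirs", "TryExcept = object\n")]:
    (root / name).mkdir()
    (root / name / "utils.py").write_text(body)

sys.modules.pop("utils", None)  # start from a clean slate for the demo
sys.path.insert(0, str(root / "mine"))
import utils  # loads mine/utils.py and caches it under the name "utils"

# Assumed to mirror the situation in the question: another directory with its
# own "utils" is put first on sys.path after the first import has happened.
sys.path.insert(0, str(root / "theirs"))
import utils as again  # served from sys.modules; sys.path is not searched again

print(again.__file__)               # still points into the "mine" directory
print(hasattr(again, "TryExcept"))  # False: the cached module lacks TryExcept
```

Printing utils.__file__ right before the failing import is a quick way to see which file a name is actually bound to. Renaming the local utils.py to something more specific would avoid the collision altogether.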

Python grpc fail : PicklingError: Can't pickle <class 'demo_pb2.msg'>: import of module demo_pb2 failed

I can't get the Python modules generated by protobuf to import correctly. Here's the code.
error message
Here is the error message after running python main.py:
# python main.py
Traceback (most recent call last):
  File "/usr/lib64/python2.7/multiprocessing/queues.py", line 266, in _feed
    send(obj)
PicklingError: Can't pickle <class 'demo_pb2.msg'>: import of module demo_pb2 failed
^CTraceback (most recent call last):
  File "main.py", line 31, in <module>
    main()
  File "main.py", line 24, in main
    for msg in iter(q.get, None):
  File "/usr/lib64/python2.7/multiprocessing/queues.py", line 117, in get
    res = self._recv()
KeyboardInterrupt
code tree
.
├── __init__.py
├── main.py
└── pb
    ├── demo_pb2_grpc.py
    ├── demo_pb2.py
    ├── demo.proto
    ├── __init__.py
    └── run_codegen.py
main.py
from multiprocessing import Queue
import os
from threading import Thread
import time
from pb import demo_pb2
q = Queue()
def generate_file_path(path):
    for root, dirs, files in os.walk(path):
        for dir_ in dirs:
            q.put(demo_pb2.msg(path=os.path.join(root, dir_)))
            time.sleep(0.1)
        for file_ in files:
            q.put(demo_pb2.msg(path=os.path.join(root, file_)))
            time.sleep(0.1)
    q.put(None)
def main():
    t = Thread(target=generate_file_path, args=('/root/pip',))
    t.start()
    for msg in iter(q.get, None):
        print(msg)
    q.close()
    t.join()
if __name__ == '__main__':
    main()
pb/demo.proto
syntax = "proto3";
package demo;
message msg {
    string path = 1;
}
pb/run_codegen.py
from grpc_tools import protoc
protoc.main((
    '-I./',
    '--python_out=./',
    '--grpc_python_out=./',
    'demo.proto',
))
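multiprocessing.Queue requires that enqueued objects be picklable, and this protobuf message cannot be pickled here: as the error shows, pickle tries to import demo_pb2 and fails. Try serializing the message yourself on one end, for example to JSON, and deserializing it on the other.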
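A minimal sketch of that suggestion, written for Python 3 and using a plain dict in place of the generated demo_pb2.msg class so that it runs without protobuf installed. The function names produce and consume are made up for illustration. With a real message, google.protobuf.json_format.MessageToJson or the message's own SerializeToString() could play the role of json.dumps here.

```python
import json
from multiprocessing import Queue
from threading import Thread


def produce(q, paths):
    """Put each path on the queue as a JSON string, then a None sentinel."""
    for path in paths:
        # A str pickles without importing any generated module.
        q.put(json.dumps({"path": path}))
    q.put(None)


def consume(q):
    """Read JSON strings until the None sentinel and decode them."""
    return [json.loads(item)["path"] for item in iter(q.get, None)]


q = Queue()
paths = ["/root/pip/a", "/root/pip/b"]  # placeholder paths for the demo
t = Thread(target=produce, args=(q, paths))
t.start()
received = consume(q)
t.join()
q.close()
print(received)  # ['/root/pip/a', '/root/pip/b']
```

Because only strings cross the queue, the consumer never needs pickle to locate the message class.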

Python Azure Function: Failure Exception: ModuleNotFoundError: No module named '__main__'

I'm triggering an Azure Function that fetches some data and writes it to a SQL db. The Function works perfectly locally, but when I deploy it to Azure I keep getting the following error:
Result: Failure
Exception: ModuleNotFoundError: No module named '__main__'. Troubleshooting Guide: https://aka.ms/functions-modulenotfound
Stack:
  File "/azure-functions-host/workers/python/3.9/LINUX/X64/azure_functions_worker/dispatcher.py", line 301, in _handle__function_load_request
    func = loader.load_function(
  File "/azure-functions-host/workers/python/3.9/LINUX/X64/azure_functions_worker/utils/wrappers.py", line 42, in call
    raise extend_exception_message(e, message)
  File "/azure-functions-host/workers/python/3.9/LINUX/X64/azure_functions_worker/utils/wrappers.py", line 40, in call
    return func(*args, **kwargs)
  File "/azure-functions-host/workers/python/3.9/LINUX/X64/azure_functions_worker/loader.py", line 83, in load_function
    mod = importlib.import_module(fullmodname)
  File "/usr/local/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/home/site/wwwroot/jobinfo/__init__.py", line 7, in <module>
    from .auth_and_get import *
  File "/home/site/wwwroot/jobinfo/auth_and_get.py", line 9, in <module>
    load_dotenv()
  File "/home/site/wwwroot/.python_packages/lib/site-packages/dotenv/main.py", line 317, in load_dotenv
    f = dotenv_path or stream or find_dotenv()
  File "/home/site/wwwroot/.python_packages/lib/site-packages/dotenv/main.py", line 265, in find_dotenv
    if usecwd or _is_interactive() or getattr(sys, 'frozen', False):
  File "/home/site/wwwroot/.python_packages/lib/site-packages/dotenv/main.py", line 262, in _is_interactive
    main = __import__('__main__', None, None, fromlist=['__file__'])
In my Function's __init__.py I import a module, auth_and_get.py, containing methods that I want to call inside my Azure Function. I think something's wrong with my import, but I've tried fixing it without luck.
My current __init__.py file is
from __future__ import absolute_import
import datetime
import logging
import azure.functions as func
from .auth_and_get import *
def main(mytimer: func.TimerRequest) -> None:
    utc_timestamp = datetime.datetime.utcnow().replace(
        tzinfo=datetime.timezone.utc).isoformat()
    if mytimer.past_due:
        logging.info('The timer is past due!')
    authenticate()
    users()
    to_db()
    logging.info('Python timer trigger function ran at %s', utc_timestamp)
I suspect that the following statement is what fails: from .auth_and_get import *. I've tried removing the dot, giving from auth_and_get import *, but then the module becomes unresolvable.
The structure is
ProjectFolder/
| - .venv
| - .vscode
| - jobinfo/
| | - __pycache__
| | - __init__.py
| | - auth_and_get.py
| | - function.json
| | - sample.dat
| - .funcignore
| - host.json
| - local.settings.json
| - proxies.json
| - requirements.txt
QUESTION
Why do I get the Failure Exception: ModuleNotFoundError: No module named '__main__' error?
So after a variety of attempts, I found a solution by adding
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__))))
to the __init__.py file.
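The traceback in the question ends inside python-dotenv: load_dotenv() falls back to find_dotenv(), which is where the import of __main__ happens. The line f = dotenv_path or stream or find_dotenv() suggests that find_dotenv() is only reached when no dotenv_path is given, so another thing worth trying (an assumption based on that traceback, not something confirmed in this thread) is to pass the path explicitly. The helper env_path_beside and the .env file name are hypothetical.

```python
from pathlib import Path


def env_path_beside(module_file, name=".env"):
    """Return the path of an env file in the same directory as module_file."""
    return Path(module_file).resolve().parent / name


# In auth_and_get.py this would be called with __file__; a literal path is
# used here only so that the sketch runs on its own.
dotenv_path = env_path_beside("/home/site/wwwroot/jobinfo/auth_and_get.py")
print(dotenv_path)

try:
    from dotenv import load_dotenv  # python-dotenv, as seen in the traceback
except ImportError:
    load_dotenv = None  # the library is not installed where this sketch runs

if load_dotenv is not None:
    # With an explicit dotenv_path, the "or find_dotenv()" fallback from the
    # traceback should not be evaluated.
    load_dotenv(dotenv_path=dotenv_path)
```

On Azure, application settings are exposed to the Function as environment variables, so a .env file may not be needed in the deployed app at all.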
Have you set a startup command in the Azure Portal for your App Service?
I mean Settings->Configuration->General Settings, where the "Startup Command" field should contain something like:
gunicorn --bind=0.0.0.0 --timeout 600 --chdir jobinfo __init__:<app object>
Of course you can set a different command (a non-Gunicorn one, for instance); this is just an example based on the Azure docs.
By default, Azure looks for a main module named app.py or application.py.
It's explained here: https://learn.microsoft.com/en-us/azure/app-service/configure-language-python#customize-startup-command
ModuleNotFoundError: No module named '__main__' means that Azure is looking for a module named __main__ instead of __init__.

python module import conundrum (module from submodule)

I have the following project structure:
./app/__init__.py
./app/main/__init__.py
./app/main/views.py
./app/models.py
./app/resources/__init__.py
./app/resources/changesAPI.py
./config.py
./manage.py
The app/models.py file has the following line:
from app import db
db is defined in app/__init__.py
db = SQLAlchemy()
I'm importing classes from models.py from app/resources/__init__.py:
from app.models import User, Task, TaskChange, Revision
However, it fails when model tries to import db:
Traceback (most recent call last):
  File "manage.py", line 5, in <module>
    from app import create_app, db, api
  File "/Users/nahuel/proj/ptcp/app/__init__.py", line 16, in <module>
    from app.resources.changesAPI import ChangesAPI
  File "/Users/nahuel/proj/ptcp/app/resources/__init__.py", line 5, in <module>
    from app.models import User, Task, TaskChange, Revision
  File "/Users/nahuel/proj/ptcp/app/models.py", line 1, in <module>
    from app import db
ImportError: cannot import name db
What am I doing wrong?
You have a circular import.
In manage.py you import create_app, db and api from app. Running app/__init__.py triggers an import of the app.resources.changesAPI module, which first runs app/resources/__init__.py. That file tries to import your models, and models.py in turn fails on from app import db because db has not yet been defined in app/__init__.py.
You need to move importing ChangesAPI to after the line that defines db in your app/__init__.py file. Any name defined in app/__init__.py before the from app.resources.changesAPI import ChangesAPI is available to your sub-packages, names after are not.
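The ordering rule can be tried out in isolation. The sketch below builds a tiny throwaway package on disk (demo_app is a hypothetical stand-in for app, and a plain string stands in for SQLAlchemy()) in which db is defined before the import that depends on it, and then imports the package.

```python
import sys
import tempfile
import textwrap
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg = root / "demo_app"  # hypothetical stand-in for app/
pkg.mkdir()

(pkg / "__init__.py").write_text(textwrap.dedent("""
    db = "db placeholder"  # stands in for db = SQLAlchemy()

    # Imported only after db exists, so demo_app.models can run
    # "from demo_app import db" while this module is still initializing.
    from demo_app.models import User
"""))

(pkg / "models.py").write_text(textwrap.dedent("""
    from demo_app import db

    class User:
        database = db
"""))

sys.path.insert(0, str(root))
import demo_app

print(demo_app.User.database)  # db placeholder
```

Swapping the two statements in the generated __init__.py, so that the import comes before db is assigned, reproduces the failure from the question.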

Inheritance in web.py?

I am currently developing a web.py application. This is my web application, which is bound to web.py and WSGI.
root/main.py
import web
import sys
import imp
import os
sys.path.append(os.path.dirname(__file__))
#from module import module
from exam import exam
urls = (
    '/exam', 'exam'
)
application = web.application(urls, globals(), autoreload = True).wsgifunc()
My application has an abstract class called module in module.py in root directory and its purpose is to be inherited by modules.
root/module.py
class module:
    def fetchURL(self, url):
        # ...
        return content
The lower-level module called "exam" inherits from module:
root/exam/__init__.py
from module import module
class exam(module):
    def getResults(self):
        # error occurs here
        self.fetchURL('math.json')
When I call the parent method, web.py raises an exception
WalkerError: ('unexpected node type', 339)
Environment: Python 2.5
How can I resolve the problem? Thanks
// EDIT 03 July 10:22 GMT+0
The stack trace is as follows
mod_wsgi (pid=1028): Exception occurred processing WSGI script 'D:/py/labs_library/index.py'.
Traceback (most recent call last):
  File "D:\csvn\Python25\lib\site-packages\web\application.py", line 277, in wsgi
    result = self.handle_with_processors()
  File "D:\csvn\Python25\lib\site-packages\web\application.py", line 247, in handle_with_processors
    return process(self.processors)
  File "D:\csvn\Python25\lib\site-packages\web\application.py", line 244, in process
    raise self.internalerror()
  File "D:\csvn\Python25\lib\site-packages\web\application.py", line 467, in internalerror
    return debugerror.debugerror()
  File "D:\csvn\Python25\lib\site-packages\web\debugerror.py", line 305, in debugerror
    return web._InternalError(djangoerror())
  File "D:\csvn\Python25\lib\site-packages\web\debugerror.py", line 290, in djangoerror
    djangoerror_r = Template(djangoerror_t, filename=__file__, filter=websafe)
  File "D:\csvn\Python25\lib\site-packages\web\template.py", line 845, in __init__
    code = self.compile_template(text, filename)
  File "D:\csvn\Python25\lib\site-packages\web\template.py", line 924, in compile_template
    ast = compiler.parse(code)
  File "D:\csvn\Python25\lib\compiler\transformer.py", line 51, in parse
    return Transformer().parsesuite(buf)
  File "D:\csvn\Python25\lib\compiler\transformer.py", line 128, in parsesuite
    return self.transform(parser.suite(text))
  File "D:\csvn\Python25\lib\compiler\transformer.py", line 124, in transform
    return self.compile_node(tree)
  File "D:\csvn\Python25\lib\compiler\transformer.py", line 167, in compile_node
    raise WalkerError, ('unexpected node type', n)
WalkerError: ('unexpected node type', 339)
If it is possible I would like to turn off the template functionality as I use python only for JSON output for mobile app.
If you create a Python package, you should add an __init__.py at the top of your hierarchy:
dvedit/
    __init__.py
    clipview.py
    filters/
        __init__.py
This means that every directory that will be imported via from ... import ... should contain an __init__.py file.
further info available: http://wiki.cython.org/PackageHierarchy
