Python Celery - Using shared_task with QueueOnce

Main Problem
I cannot get QueueOnce (imported from celery-once) to work with @shared_task, which makes it impossible to test the Celery execution of a function.
Background
I am trying to implement a clean directory structure for celery using shared_tasks, so that I can re-use the function definitions when testing with pytest.
My directory structure looks like:
module
-> celery_app.py
-> tasks
-> task_1_foo.py
-> task_2_bar.py
The Celery app is initialized in celery_app.py. Example code:

import celery

from module.tasks.task_1_foo import task_1_foo
from module.tasks.task_2_bar import task_2_bar

def configure_app():
    app = celery.Celery("app")
    app.register_task(task_1_foo)
    app.register_task(task_2_bar)
    return app
A task file looks like:

from celery import shared_task

@shared_task(name='task_1_foo')
def task_1_foo(x, y):
    return x + y
Using the common testing strategies provided in the celery docs, I can spin up a redis server and see my tests executing properly.
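For reference, a minimal sketch of that kind of setup, assuming the celery.contrib.pytest plugin and a locally running Redis broker (the file name and URLs are illustrative):

# conftest.py -- sketch of the Celery pytest plugin setup from the Celery testing docs
import pytest

pytest_plugins = ("celery.contrib.pytest",)

@pytest.fixture(scope="session")
def celery_config():
    # Broker/backend URLs are placeholders for a local Redis instance.
    return {
        "broker_url": "redis://localhost:6379/0",
        "result_backend": "redis://localhost:6379/0",
    }

A test can then request the celery_worker fixture to run a task end to end, for example:

# def test_task_1_foo(celery_worker):
#     assert task_1_foo.delay(1, 2).get(timeout=10) == 3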
QueueOnce and Testing
We are trying to implement QueueOnce (from the celery-once plug-in), but it does not seem to work with @shared_task. This makes it very hard to test code that runs as a CRON job, because we cannot validate correct de-duplication of keys.
Changing the definition of task_1_foo:

from celery import shared_task
from celery_once import QueueOnce

@shared_task(name='task_1_foo', base=QueueOnce)
def task_1_foo(x, y):
    return x + y
Does anyone know how to make QueueOnce compatible with the shared_task decorator in Celery?
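For context, celery-once's documented setup attaches an ONCE setting to the app configuration before tasks using base=QueueOnce run. A minimal sketch of what configure_app might additionally need to set (the Redis URL and timeout values are placeholders):

def configure_app():
    app = celery.Celery("app")
    # Configuration expected by celery-once; values here are placeholders.
    app.conf.ONCE = {
        'backend': 'celery_once.backends.Redis',
        'settings': {
            'url': 'redis://localhost:6379/0',
            'default_timeout': 60 * 60,
        },
    }
    app.register_task(task_1_foo)
    app.register_task(task_2_bar)
    return app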

Related

Lambda Handler to invoke Python script directly

I'm not a Python developer but I got stuck with taking over the maintenance of a Python 3.9 project that currently runs as a cronjob on an EC2 instance. I need to get it running from AWS Lambda. Currently the crontab on the EC2 instance invokes a cronjob/run.py script that looks a little like this:
import os
import sys
from dotenv import load_dotenv
load_dotenv()
sync_events = get_sync_events()
# lots more stuff down here
The important thing here is that there is no __main__ method invoked. The crontab just treats this Python source file like a script and executes it from top to bottom.
My understanding is that the Lambda Handler needs a main method to be invoked. So I need a way to run the existing cronjob/run.py (that again, has no main entry point) from inside the Lambda Handler, somehow:
def lambda_handler(event, context):
    try:
        # run everything that's in cronjob/run.py right here
        pass
    except Exception as e:
        raise e

if __name__ == '__main__':
    lambda_handler(None, None)
So my question: do I need my Lambda Handler to have a __main__ method like the above, or is it possible to configure my Lambda to just call cronjob/run.py directly? If not, what are the best options here? Thanks in advance!
do I need my Lambda Handler to have a main method
No, you don't.
If you just want to run run.py with Lambda, you can keep things simple and just use:

import os
import sys

from dotenv import load_dotenv

def main(event, context):
    load_dotenv()
    sync_events = get_sync_events()
    # lots more stuff down here

and configure the Lambda function to have run.main as the handler.
The name of the handler function, in this case main, can be anything, but it must have event and context as arguments.
You can find more information on the Lambda handler here: https://docs.aws.amazon.com/lambda/latest/dg/python-handler.html
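If you would rather leave cronjob/run.py completely untouched, one alternative sketch is to execute it as a script from inside a thin handler, assuming the file is packaged with the deployment (the handler file name is illustrative):

# handler.py -- sketch: run the unmodified script from inside the handler
import runpy

def lambda_handler(event, context):
    # Executes cronjob/run.py top to bottom, as cron did on the EC2 instance.
    runpy.run_path("cronjob/run.py", run_name="__main__")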

Airflow 2.0.1/Python 3.7.9 ModuleNotFoundError for custom hook

I have the structure below in my native Airflow build:
dags/cust_dag.py
dags/jhook.py -- contains the class UtilTriggers, under which there are multiple methods
In the cust_dag code I am importing that hook/module as:

from jhook import UtilTriggers as trigger

When I check the Airflow UI, I get a broken DAG for cust_dag with the error:
ModuleNotFoundError: No module named 'jhook'
The same kind of code works on Composer 1.9; currently I am running this on native Airflow.
I have also tried adding an __init__.py file, as well as creating a new folder job_trigger and moving the file there, but it still does not work.
I have tried the solution mentioned in this question: Apache Airflow DAG cannot import local module
i.e. adding the code lines below in both the custom hook module and the DAG file:

import os
import sys

sys.path.insert(0, os.path.abspath(os.path.dirname(__file__)))

Please guide me on what the cause of this ModuleNotFoundError could be when everything looks okay.
As per your comments, the error message you are receiving is an import error, so the problem seems to be related to Python itself rather than Airflow.
dag1.py

from __future__ import print_function

import datetime

from airflow import models
from airflow.operators import bash_operator
from airflow.operators import python_operator

from new1 import hi as h1

default_dag_args = {
    # The start_date describes when a DAG is valid / can be run. Set this to a
    # fixed point in time rather than dynamically, since it is evaluated every
    # time a DAG is parsed. See:
    # https://airflow.apache.org/faq.html#what-s-the-deal-with-start-date
    'start_date': datetime.datetime(2018, 1, 1),
}

# Define a DAG (directed acyclic graph) of tasks.
# Any task you create within the context manager is automatically added to the
# DAG object.
with models.DAG(
        'demo_run',
        schedule_interval=datetime.timedelta(days=1),
        default_args=default_dag_args) as dag:
    # An instance of an operator is called a task. In this case, the
    # hello_python task calls the "greeting" Python function.
    hello_python = python_operator.PythonOperator(
        task_id='hello_world',
        python_callable=h1.funt,
        op_kwargs={"x": "python"})

    # Likewise, the goodbye_bash task calls a Bash script.
    goodbye_bash = bash_operator.BashOperator(
        task_id='bye',
        bash_command='echo Goodbye.')
new1.py

class hi:
    @staticmethod
    def funt(x):
        return x + " is a programming language"
Since you are using all of your methods as static methods, there is no need to pass self to them. The self keyword refers to the instance, and because static methods can be called without creating an object, they do not take self.
If you are passing arguments to your methods, make sure those arguments are also passed to the DAG tasks by providing the op_args and op_kwargs arguments, as in the sketch below.
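A short sketch of forwarding arguments through the operator (the task_id and values are made up; this would sit inside the with models.DAG(...) block above):

    # Positional arguments go through op_args, keyword arguments through op_kwargs.
    greet_python = python_operator.PythonOperator(
        task_id='greet_python',
        python_callable=h1.funt,
        op_args=["python"])          # equivalent to op_kwargs={"x": "python"}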
To answer your question about whether this could be a Kubernetes issue, since the environment is hosted there: this issue is not related to Kubernetes. When you create a Composer environment, the Composer service creates one GKE cluster per environment. The cluster is named and labeled automatically, should not be deleted manually by users, and is created and managed through the Deployment Manager.
If the cluster is deleted, the environment becomes irreparable and has to be recreated. Kubernetes errors would look more like “Http error status code: 400 Http error message: BAD REQUEST”, etc.
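As a side note, a layout that is commonly used with native Airflow (a sketch, assuming the dags folder itself is on sys.path, which the scheduler adds by default) is to keep the helper in a package next to the DAG and import it through that package:

# dags/job_trigger/__init__.py   (empty file; makes job_trigger a package)
# dags/job_trigger/jhook.py      (contains UtilTriggers)

# dags/cust_dag.py
from job_trigger.jhook import UtilTriggers as trigger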

How to patch globally in pytest?

I use pytest quite a bit for my code. The sample code structure looks like this; the entire codebase is Python 2.7.
core/__init__.py
core/utils.py
#feature
core/feature/__init__.py
core/feature/service.py
#tests
core/feature/tests/__init__.py
core/feature/tests/test1.py
core/feature/tests/test2.py
core/feature/tests/test3.py
core/feature/tests/test4.py
core/feature/tests/test10.py
The service.py looks something like this:

from modules import stuff
from core.utils import Utility

class FeatureManager:
    # lots of other methods
    def execute(self, *args, **kwargs):
        self._execute_step1(*args, **kwargs)
        # some more code
        self._execute_step2(*args, **kwargs)
        utility = Utility()
        utility.doThings(args[0], kwargs['variable'])
All the tests in feature/tests/* end up exercising the core.feature.service.FeatureManager.execute function. However, utility.doThings() does not need to run while the tests are running; it only needs to happen when the production application runs.
I can do something like this in my core/feature/tests/test1.py:

from mock import patch

class Test1:
    def test_1(self):
        with patch('core.feature.service.Utility') as MockedUtils:
            execute_test_case_1()
This would work. However, I have only just added Utility to the codebase and there are more than 300 test cases; I do not want to go into each test case and add this with statement.
I could write a conftest.py that sets an OS-level environment variable, based on which core.feature.service.FeatureManager.execute could decide not to call utility.doThings, but I do not know whether that is a clean solution to this issue.
I would appreciate it if someone could help me with patches that apply globally to the entire session. I would like to do what I did with the with block above, but globally, for the whole session. Any articles on this topic would be great too.
TL;DR: How do I create session-wide patches when running pytest?
I added a file called core/feature/conftest.py that looks like this:

import logging

import mock
import pytest

log = logging.getLogger(__name__)

@pytest.fixture(scope="session", autouse=True)
def default_session_fixture(request):
    """
    :type request: _pytest.python.SubRequest
    :return:
    """
    log.info("Patching core.feature.service")
    patched = mock.patch('core.feature.service.Utility')
    patched.__enter__()

    def unpatch():
        patched.__exit__()
        log.info("Patching complete. Unpatching")

    request.addfinalizer(unpatch)
This is nothing complicated. It is like doing

with mock.patch('core.feature.service.Utility') as patched:
    do_things()

but in a session-wide manner.
Building on the currently accepted answer for a similar use case (4.5 years later), using unittest.mock's patch with a yield also worked:
import logging
from typing import Iterator
from unittest.mock import patch

import pytest

log = logging.getLogger(__name__)

@pytest.fixture(scope="session", autouse=True)
def default_session_fixture() -> Iterator[None]:
    log.info("Patching core.feature.service")
    with patch("core.feature.service.Utility"):
        yield
    log.info("Patching complete. Unpatching")
Aside
For this answer I utilized autouse=True. In my actual use case, to integrate into my unit tests on a test-by-test basis, I utilized @pytest.mark.usefixtures("default_session_fixture").
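For example, opting in per test rather than using autouse (the test name and body are illustrative):

import pytest

@pytest.mark.usefixtures("default_session_fixture")
def test_execute_without_utility():
    # core.feature.service.Utility is patched for the duration of this test.
    execute_test_case_1()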
Versions
Python==3.8.6
pytest==6.2.2

Sharing CherryPy's BackgroundTaskQueue object between request handlers

I'm using cherrypy to build a web service. I came across the BackgroundTaskQueue plugin and I want to use it to handle specific time-consuming operations on a separate thread.
The documentation states the usage should be like the following:
import cherrypy
from complicated_logging import log

bgtask = BackgroundTaskQueue(cherrypy.engine)
bgtask.subscribe()

class Root(object):

    def index(self):
        bgtask.put(log, "index was called", ip=cherrypy.request.remote.ip)
        return "Hello, world!"
    index.exposed = True
But, IMHO, using the bgtask object like this isn't very elegant. I would like handlers from other Python modules to use this object too.
Is there a way to subscribe this plugin once and then "share" the bgtask object among other handlers (for example, by saving it in cherrypy.request)?
How is this done? Does this require writing a CherryPy tool?
Place

queue = BackgroundTaskQueue(cherrypy.engine)

in a separate file named, for instance, tasks.py. This way you create a module called tasks.
Now you can import tasks in other modules, and queue is a single shared instance.
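A minimal sketch of what that tasks.py might contain (the import location of the BackgroundTaskQueue recipe class is an assumption; it is not part of the cherrypy package itself):

# tasks.py -- shared, module-level queue instance
import cherrypy
from background_task_queue import BackgroundTaskQueue  # assumed module holding the recipe class

queue = BackgroundTaskQueue(cherrypy.engine)
queue.subscribe()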
For example, in a file called test.py:

import tasks

def test():
    print('works!')

tasks.queue.put(test)
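A request handler in another module could then reuse the same queue. A sketch, mirroring the docs example above and assuming complicated_logging.log as the enqueued function:

# handlers.py -- sketch of a handler reusing the shared queue
import cherrypy
from complicated_logging import log

import tasks

class Root(object):

    @cherrypy.expose
    def index(self):
        tasks.queue.put(log, "index was called", ip=cherrypy.request.remote.ip)
        return "Hello, world!"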

Fabric -- use common tasks from parent module

Using the @task and submodule convention, my "parent" fabfile imports two submodules ("dev" and "stable", whose tasks are defined in their respective __init__.py files). How do I get a @task in the dev module to invoke a task defined in the parent fabfile? I can't seem to get the imports to work correctly.
I also tried using imp.load_source, but that produced a nasty circular import (fabfile.py imports dev, which tries to import ../fabfile.py).
Using this as an example: http://docs.fabfile.org/en/1.4.3/usage/tasks.html#going-deeper
How would a task defined in lb.py call something in the top __init__.py, or a task in migrations.py call something in the top __init__.py?
You can invoke a Fabric task by name:

from fabric.api import execute, task

@task
def innertask():
    # arg1 and kwarg1 stand in for whatever the parent task expects
    execute("mytask", arg1, key1=kwarg1)
