Airflow Fernet key issue when trying to query an MSSQL DB - Python

I'm pretty new to Airflow. I've read through the documentation several times, torn through numerous S/O questions and many random articles online, but have yet to fix this issue. I have a feeling it's something super simple I'm doing wrong.
I have Docker for Windows and I pulled the puckel/docker-airflow image and ran a container with ports exposed so I can hit the UI from my host. I have another container running mcr.microsoft.com/mssql/server on which I restored the WideWorldImporters sample db. From the Airflow UI, I have been able to successfully create the connection to this db and can even query it from the Data Profiling section. Check images below:
Connection Creation
Successful Query to Connection
So while this works, my DAG fails at the second task, sqlData. Here is the code:
from airflow.models import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator
from airflow.operators.mssql_operator import MsSqlOperator
from datetime import timedelta, datetime

copyData = DAG(
    dag_id='copyData',
    schedule_interval='@once',
    start_date=datetime(2019, 1, 1)
)

printHelloBash = BashOperator(
    task_id="print_hello_Bash",
    bash_command='echo "Lets copy some data"',
    dag=copyData
)

mssqlConnection = "WWI"
sqlData = MsSqlOperator(
    sql="select top 100 InvoiceDate, TotalDryItems from sales.invoices",
    task_id="select_some_data",
    mssql_conn_id=mssqlConnection,
    database="WideWorldImporters",
    dag=copyData,
    depends_on_past=True
)

queryDataSuccess = BashOperator(
    task_id="confirm_data_queried",
    bash_command='echo "We queried data!"',
    dag=copyData
)

printHelloBash >> sqlData >> queryDataSuccess
Initially the error was:
[2019-02-22 16:13:09,176] {{logging_mixin.py:95}} INFO - [2019-02-22 16:13:09,176] {{base_hook.py:83}} INFO - Using connection to: 172.17.0.3
[2019-02-22 16:13:09,186] {{models.py:1760}} ERROR - Could not create Fernet object: Incorrect padding
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/airflow/models.py", line 171, in get_fernet
    _fernet = Fernet(fernet_key.encode('utf-8'))
  File "/usr/local/lib/python3.6/site-packages/cryptography/fernet.py", line 34, in __init__
    key = base64.urlsafe_b64decode(key)
  File "/usr/local/lib/python3.6/base64.py", line 133, in urlsafe_b64decode
    return b64decode(s)
  File "/usr/local/lib/python3.6/base64.py", line 87, in b64decode
    return binascii.a2b_base64(s)
binascii.Error: Incorrect padding
I noticed that this has to do with cryptography, so I ran pip install cryptography and pip install airflow[crypto], both of which reported that the requirement was already satisfied. Finally, I found something that said I just need to generate a fernet_key. The default key in my airflow.cfg file was fernet_key = $FERNET_KEY. So from the CLI in the container I ran:
python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
That produced a key, which I used to replace $FERNET_KEY. I restarted the container and re-ran the DAG, and now my error is:
[2019-02-22 16:22:13,641] {{models.py:1760}} ERROR -
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/cryptography/fernet.py", line 106, in _verify_signature
    h.verify(data[-32:])
  File "/usr/local/lib/python3.6/site-packages/cryptography/hazmat/primitives/hmac.py", line 69, in verify
    ctx.verify(signature)
  File "/usr/local/lib/python3.6/site-packages/cryptography/hazmat/backends/openssl/hmac.py", line 73, in verify
    raise InvalidSignature("Signature did not match digest.")
cryptography.exceptions.InvalidSignature: Signature did not match digest.
which, from an initial scan of the cryptography docs, seems to have something to do with key compatibility?
I'm at a loss now and decided I'd ask this question to see if I'm potentially going down the wrong path in resolving this. Any help would be greatly appreciated, as Airflow seems awesome.
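For context, here is a minimal sketch with the cryptography package (my own illustration, not Airflow's code) of what seems to be going on: a value encrypted under one Fernet key cannot be verified under a different key, which would be consistent with the InvalidSignature above if the WWI connection was saved before I changed fernet_key.

# My own illustration, not Airflow code: a token produced with one Fernet key
# cannot be verified with a different key.
from cryptography.fernet import Fernet, InvalidToken

old_key = Fernet.generate_key()
new_key = Fernet.generate_key()

# Encrypt something (e.g. a connection password) under the old key.
token = Fernet(old_key).encrypt(b"my-connection-password")

try:
    Fernet(new_key).decrypt(token)
except InvalidToken:
    # cryptography raises this when the HMAC check fails, i.e. the stored
    # value was encrypted with a different key than the one now configured.
    print("decryption failed: wrong key")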

Thanks to some side communication from @Tomasz I finally got my DAG to work. He recommended I try using docker-compose, which is also listed in the puckel/docker-airflow GitHub repo. I ended up using the docker-compose-LocalExecutor.yml file instead of the Celery Executor one, though. There was some small troubleshooting and more configuration I had to go through as well.

To begin, I took my existing MSSQL container that had the sample db in it and turned it into an image using docker commit mssql_container_name. The only reason I did this was to save time having to restore the backup sample dbs; you could always copy the backups into the container and restore them later if you want. Then I added my new image to the existing docker-compose-LocalExecutor.yml file like so:
version: '2.1'
services:
    postgres:
        image: postgres:9.6
        environment:
            - POSTGRES_USER=airflow
            - POSTGRES_PASSWORD=airflow
            - POSTGRES_DB=airflow

    mssql:
        image: dw:latest
        ports:
            - "1433:1433"

    webserver:
        image: puckel/docker-airflow:1.10.2
        restart: always
        depends_on:
            - postgres
            - mssql
        environment:
            - LOAD_EX=n
            - EXECUTOR=Local
        # volumes:
        #     - ./dags:/usr/local/airflow/dags
        # Uncomment to include custom plugins
        #     - ./plugins:/usr/local/airflow/plugins
        ports:
            - "8080:8080"
        command: webserver
        healthcheck:
            test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
            interval: 30s
            timeout: 30s
            retries: 3
Mind you, dw is what I named the new image that was based off of the mssql container. Next, I renamed the file to just docker-compose.yml so that I could easily run docker-compose up (though docker-compose -f can point directly to a different YAML file). Once everything was up and running, I navigated to the Airflow UI and configured my connection. Note: since you are using docker-compose you don't need to know the IP addresses of the other containers, since they use DNS service discovery, which I found out about here.

Then, to test the connection, I went to Data Profiling to do an ad-hoc query, but the connection wasn't there. This is because the puckel/docker-airflow image doesn't have pymssql installed. So just bash into the container with docker exec -it airflow_webserver_container bash and install it with pip install pymssql --user. Exit the container and restart all services using docker-compose restart. After a minute everything was up and running.

My connection showed up in Ad hoc Query and I could successfully select data. Finally, I turned my DAG on, the scheduler picked it up and everything was successful! Super relieved after spending weeks of googling. Thanks to @y2k-shubham for helping out, and some super huge appreciation to @Tomasz, whom I actually reached out to initially after his awesome and thorough post about Airflow on the r/datascience subreddit.
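If you want a quick sanity check from inside the webserver container after installing pymssql, a rough sketch like the following (assuming Airflow 1.10.x, where MsSqlHook lives in airflow.hooks.mssql_hook, and the WWI connection defined above) exercises the same code path the operator uses:

# Rough sketch: run with `python` inside the webserver container.
# Assumes pymssql is installed and the WWI connection exists in the metadata db.
from airflow.hooks.mssql_hook import MsSqlHook

# schema here plays the role of the operator's database parameter.
hook = MsSqlHook(mssql_conn_id="WWI", schema="WideWorldImporters")
rows = hook.get_records(
    "select top 5 InvoiceDate, TotalDryItems from sales.invoices"
)
print(rows)

If that prints rows, the MsSqlOperator task should be able to run against the same connection.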

Related

Automated data dumps from postgres database into .sql using django/python and celery

I'm trying to automate regular data dumps from my Postgres database using Django/Python. I'm using Celery to automate it as a task since I want the data dumps to run regularly, every day or every two days. Currently I'm using sh; here's my code:
import gzip

from celery.schedules import crontab
from sh import pg_dump

# periodic_task comes from Celery (celery.decorators in older releases;
# newer releases schedule tasks via beat_schedule instead)

@periodic_task(
    run_every=(crontab(minute='*/2')),  # set to 2 min interval for testing
    name="portal_backup",
    ignore_result=True)
def portal_backup():
    filename = "portal_backup.sql"
    with gzip.open(filename, 'wb') as f:  # open for writing, not the default read mode
        pg_dump('-U', 'postgres', '-W', '-F', 'p', 'Portal', _out=f)
    print("backup done")
All of my database information is in my .env file; however, when I run my code I get this error:
sh.ErrorReturnCode_1:
  RAN: /usr/bin/pg_dump -U postgres -W -F p Portal
  STDOUT:
  STDERR:
  Password:
  pg_dump: error: connection to database "Portal" failed: FATAL: Peer authentication failed for user "postgres"
I don't know why the authentication failed. If anyone has any better ideas on how I can automate this process through Django/Python and Celery, any info would be appreciated.
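For reference, a rough sketch of one possible workaround (untested against this setup; whether it works depends on pg_hba.conf): peer authentication only applies to local Unix-socket connections, so connecting over TCP with -h and supplying the password through the PGPASSWORD environment variable, instead of -W (which forces an interactive prompt that sh cannot answer), avoids the peer check:

import gzip
import os

from sh import pg_dump

def portal_backup(password, filename="portal_backup.sql.gz"):
    # PGPASSWORD lets pg_dump authenticate without prompting; --no-password
    # makes it fail fast instead of hanging if the password is wrong.
    env = dict(os.environ, PGPASSWORD=password)
    with gzip.open(filename, "wb") as f:
        pg_dump("-h", "localhost", "-U", "postgres", "-F", "p",
                "--no-password", "Portal", _out=f, _env=env)

Alternatively, pg_hba.conf can be changed to allow password (md5) auth on the local socket, but the TCP route keeps the task non-interactive without touching server config.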

Deploying Watson Visual recognition app fails

I created some custom classifiers locally and then I tried to deploy an app on Bluemix that classifies an image based on the classifiers I made.
When I try to deploy it, it fails to start.
import os
import json
from os.path import join, dirname
from os import environ
from watson_developer_cloud import VisualRecognitionV3
import time

start_time = time.time()
visual_recognition = VisualRecognitionV3(VisualRecognitionV3.latest_version, api_key='*************')

with open(join(dirname(__file__), './test170.jpg'), 'rb') as image_file:
    print(json.dumps(visual_recognition.classify(images_file=image_file, threshold=0, classifier_ids=['Angle_971786581']), indent=2))

print("--- %s seconds ---" % (time.time() - start_time))
Even if I try to deploy a simple print, it fails to deploy, but the starter app I get from Bluemix, or a Flask tutorial I found online (https://www.ibm.com/blogs/bluemix/2015/03/simple-hello-world-python-app-using-flask/), deploys just fine.
I'm very new to web programming and using cloud services, so I'm totally lost.
Thank you.
Bluemix is expecting your python application to serve on a port. If your application isn't serving some kind of response on the port, it assumes the application failed to start.
import os
from flask import Flask

app = Flask(__name__)

# On Bluemix, get the port number from the environment variable PORT
# When running this app on the local machine, default the port to 8080
port = int(os.getenv('PORT', 8080))

@app.route('/')
def hello_world():
    return 'Hello World! I am running on port ' + str(port)

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=port)
It looks like you're writing your code to just execute once and stop. Instead, make it do the work when someone hits your URL, like shown in the hello_world() function above.
Think about what you want to happen when someone goes to YOUR_APP_NAME.mybluemix.net
If you do not want your application to be a WEB application, but instead just execute once (a background worker application), then use the --no-route option at the end of your cf push command. Then, look at the logs using cf logs appname --recent to see the output of your application
https://console.ng.bluemix.net/docs/manageapps/depapps.html#deployingapps
The main problem was the watson-developer-cloud module, which gave me an error that it could not be found.
I downgraded to Python version 2.7.12, installing it for all users.
I modified runtime.txt and requirements.txt (requirements.txt possibly not needed).
I staged with Diego, using the --no-route option and the set-health-check APP_NAME none command.
Those fixed the problem, but I still get an exit status 0.
When you deploy an app in Bluemix, you should have a requirements.txt that includes the modules you use in your app.
So check your requirements.txt; maybe you left out
watson_developer_cloud
The requirements.txt should then look like this:
Flask==0.10.1
watson_developer_cloud

Unknown task error in Celery 3.1.6 with flower 0.7.3

I'm having a similar issue to this one:
"Unknown task" error in Celery Flower when posting a new task
However, I'm running flower 0.7.3, which already has the fix mentioned in the previous issue. When I load flower I see the following:
[D 141025 19:22:44 state:87] Registered: {'celery@myhost': ['crossbar.tasks.add.add',
'crossbar.tasks.ping.ping',
'crossbar.tasks.send_email.send_email',
'crossbar.tasks.send_message.send_message',
'crossbar.tasks.send_sms.send_sms']}
[D 141025 19:22:49 events:116] Enabling events
[D 141025 19:22:50 state:153] Resuming inspecting workers...
[I 141025 19:22:50 tasks:99] Invoking a task 'crossbar.tasks.add.add' with '[1, 2]' and '{}'
[W 141025 19:22:50 web:1404] 404 POST /api/task/async-apply/crossbar.tasks.add.add (127.0.0.1): Unknown task 'crossbar.tasks.add.add'
[W 141025 19:22:50 web:1811] 404 POST /api/task/async-apply/crossbar.tasks.add.add (127.0.0.1) 1.11ms
But as you can see, the POST fails. I'm posting as follows:
curl -X POST -d '{"args":[1,2]}' http://myhost:15629/api/task/async-apply/crossbar.tasks.add.add
Here is how I'm running Celery:
celery -A myapp worker --loglevel=info
And I'm running flower in a separate process like so:
flower --conf=src/crossbar/flowerconfig.py
If I replace async-apply with send-task, I get a 200 but then on the celery console I get the following error:
[2014-10-26 17:03:06,640: CRITICAL/MainProcess] Can't decode message body: ContentDisallowed('Refusing to deserialize untrusted content of type pickle (application/x-python-serialize)',) [type:'application/x-python-serialize' encoding:'binary' headers:{}]
body: '\x80\x02}q\x01(U\x07expiresq\x02NU\x03utcq\x03\x88U\x04argsq\x04]q\x05(K\x01K\x02eU\x05chordq\x06NU\tcallbacksq\x07NU\x08errbacksq\x08NU\x07tasksetq\tNU\x02idq\nU$f1e8fc87-d0ee-4fc6-86cb-8edded4a4f4cq\x0bU\x07retriesq\x0cK\x00U\x04taskq\rX\x16\x00\x00\x00crossbar.tasks.add.addq\x0eU\ttimelimitq\x0fNN\x86q\x10U\x03etaq\x11NU\x06kwargsq\x12}q\x13u.' (229b)
Traceback (most recent call last):
  File "/Users/psantann/Documents/git/crossbar-taskmgr_trunk/.tox/crossbar-taskmgr/lib/python2.6/site-packages/kombu/messaging.py", line 586, in _receive_callback
    decoded = None if on_m else message.decode()
  File "/Users/psantann/Documents/git/crossbar-taskmgr_trunk/.tox/crossbar-taskmgr/lib/python2.6/site-packages/kombu/message.py", line 142, in decode
    self.content_encoding, accept=self.accept)
  File "/Users/psantann/Documents/git/crossbar-taskmgr_trunk/.tox/crossbar-taskmgr/lib/python2.6/site-packages/kombu/serialization.py", line 174, in loads
    raise self._for_untrusted_content(content_type, 'untrusted')
ContentDisallowed: Refusing to deserialize untrusted content of type pickle (application/x-python-serialize)
OK, I don't get a serialization error anymore if I add 'pickle' to the list of accepted content types. However, flower still does not know about my tasks, so async-apply does not work. I could not get flower -A app to work because I'm not initializing flower via a Celery app but via flowerconfig.py. In flowerconfig.py I have CELERY_IMPORTS set up, but the actual tasks come from a different Python package. What would be the proper way to register those with flower?
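For reference, a minimal sketch of the settings involved (module names are placeholders pieced together from the registered task list above, not the actual flowerconfig.py):

# celeryconfig.py / flowerconfig.py sketch - placeholder names.
BROKER_URL = 'amqp://guest:guest@myhost:5672//'

# Accept JSON (the default) plus pickle, since the send-task payload above
# arrived pickled.
CELERY_ACCEPT_CONTENT = ['json', 'pickle']

# Import the external package that defines the tasks so the worker
# registers them under the names flower posts to.
CELERY_IMPORTS = (
    'crossbar.tasks.add',
    'crossbar.tasks.send_email',
)

Flower itself still needs to be pointed at the same app/config (e.g. the -A argument mentioned below) for async-apply to know the registered task names.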
It looks like you are not passing the app argument to flower. To start flower you need to invoke it with
flower -A your_app
or
celery flower -A your_app
If you don't pass the app argument, it throws a 404 error.

Why is boto dynamodb2 get_item speed inconsistent and seemingly frequently awful?

Why are my DynamoDB requests via boto get_item so slow, and so frequently very slow? The AWS console reports that my get latency has hit a high of 12.5ms, yet none of my requests are anywhere near that low.
Python 2.7.5
AWS region us-west-1
boto 2.31.1
dynamodb table size ~180k records
Code:
from boto.dynamodb2.fields import HashKey
from boto.dynamodb2.table import Table
from boto.dynamodb2.types import STRING
import boto.dynamodb2
import time

REGION = "us-west-1"
AWS_KEY = "xxxxx"
AWS_SECRET = "xxxxx"

start = time.time()
peeps = ("cefbdadf518f44da8a68e35b2321bb1f", "7e3a691df6134a4f83d381a5507cbb18")
connection = boto.dynamodb2.connect_to_region(REGION, aws_access_key_id=AWS_KEY, aws_secret_access_key=AWS_SECRET)
users = Table("users-test", schema=[HashKey("id", data_type=STRING)], connection=connection)

for peep in peeps:
    user = users.get_item(consistent=True, id=peep)
    print time.time() - start
Results:
(botot)➜ ~ python test2.py
0.056941986084
0.0681240558624
(botot)➜ ~ python test2.py
1.05709600449
1.06937909126
(botot)➜ ~ python test2.py
0.048614025116
0.0575139522552
(botot)➜ ~ python test2.py
0.0553398132324
0.064425945282
(botot)➜ ~ python test2.py
3.05251288414
3.06584000587
(botot)➜ ~ python test2.py
0.0579640865326
0.0699849128723
(botot)➜ ~ python test2.py
0.0530469417572
0.0628390312195
(botot)➜ ~ python test2.py
1.05059504509
1.05963993073
(botot)➜ ~ python test2.py
1.05139684677
1.0603158474
update 2014-07-11 08:03 PST
The actual use-case is looking up a user for each web request. As @gamaat said, the cost for DynamoDB is on the first lookup because that's when the HTTPS connection is made. So it seems that if I could store the DynamoDB connection between requests and reuse it, things would go faster. I used werkzeug.contrib.cache.FileSystemCache to store the connection, but it never seems to actually store the connection for retrieval. Other values get stored fine, just not this connection object. Any ideas? And if this is not a good way to store the connection between requests, then what is?
update 2014-07-11 15:30 PST
Since I'm using supervisor and uwsgi to manage my Flask app, it seems that the problem is actually how can I share the connection object between requests for my Flask app.
The solution that appears to yield better response times (average response time was ~500ms before and ~50ms after) was to do two things:
1) put the Boto DynamoDB connection object in default_settings.py so that it gets loaded once into app.config["DYNDB_CONN"] per application load; and
2) configure uwsgi with a cheaper value of num_processes - 1 and a cheaper-initial value of num_processes - 1. This tells uwsgi to always have num_processes - 1 uwsgi processes running at all times, with the option of starting up one more process if load requires it.
I did this to minimize the number of uwsgi processes that would restart and therefore create a new Boto DynamoDB connection object (incurring HTTP connection setup costs).
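A rough sketch of what the first change looks like (simplified; keys and module names are placeholders, and it reuses the table schema from the question):

# default_settings.py - placeholder credentials; the connection is created
# once per process when this module is imported.
import boto.dynamodb2

DYNDB_CONN = boto.dynamodb2.connect_to_region(
    "us-west-1",
    aws_access_key_id="xxxxx",
    aws_secret_access_key="xxxxx",
)

# app.py - reuse the connection from app.config on every request.
from boto.dynamodb2.fields import HashKey
from boto.dynamodb2.table import Table
from boto.dynamodb2.types import STRING
from flask import Flask

app = Flask(__name__)
app.config.from_object("default_settings")

users = Table("users-test", schema=[HashKey("id", data_type=STRING)],
              connection=app.config["DYNDB_CONN"])

@app.route("/user/<user_id>")
def get_user(user_id):
    item = users.get_item(consistent=True, id=user_id)
    return str(item["id"])

uwsgi then keeps those processes alive, so the connection in app.config is reused rather than rebuilt per request.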

Celery Received unregistered task of type (run example)

I'm trying to run the example from the Celery documentation.
I run: celeryd --loglevel=INFO
/usr/local/lib/python2.7/dist-packages/celery/loaders/default.py:64: NotConfigured: No 'celeryconfig' module found! Please make sure it exists and is available to Python.
"is available to Python." % (configname, )))
[2012-03-19 04:26:34,899: WARNING/MainProcess]
-------------- celery@ubuntu v2.5.1
---- **** -----
--- * *** * -- [Configuration]
-- * - **** --- . broker: amqp://guest@localhost:5672//
- ** ---------- . loader: celery.loaders.default.Loader
- ** ---------- . logfile: [stderr]@INFO
- ** ---------- . concurrency: 4
- ** ---------- . events: OFF
- *** --- * --- . beat: OFF
-- ******* ----
--- ***** ----- [Queues]
-------------- . celery: exchange:celery (direct) binding:celery
tasks.py:
# -*- coding: utf-8 -*-
from celery.task import task

@task
def add(x, y):
    return x + y
run_task.py:
# -*- coding: utf-8 -*-
from tasks import add
result = add.delay(4, 4)
print (result)
print (result.ready())
print (result.get())
In the same folder, celeryconfig.py:
CELERY_IMPORTS = ("tasks", )
CELERY_RESULT_BACKEND = "amqp"
BROKER_URL = "amqp://guest:guest@localhost:5672//"
CELERY_TASK_RESULT_EXPIRES = 300
When I run "run_task.py":
on python console
eb503f77-b5fc-44e2-ac0b-91ce6ddbf153
False
and these errors on the celeryd server:
[2012-03-19 04:34:14,913: ERROR/MainProcess] Received unregistered task of type 'tasks.add'.
The message has been ignored and discarded.
Did you remember to import the module containing this task?
Or maybe you are using relative imports?
Please see http://bit.ly/gLye1c for more information.
The full contents of the message body was:
{'retries': 0, 'task': 'tasks.add', 'utc': False, 'args': (4, 4), 'expires': None, 'eta': None, 'kwargs': {}, 'id': '841bc21f-8124-436b-92f1-e3b62cafdfe7'}
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/celery/worker/consumer.py", line 444, in receive_message
    self.strategies[name](message, body, message.ack_log_error)
KeyError: 'tasks.add'
Please explain what the problem is.
I think you need to restart the worker server. I met the same problem and solved it by restarting.
I had the same problem:
The reason for "Received unregistered task of type..." was that the celeryd service didn't find and register the tasks on service start (their list is visible when you start ./manage.py celeryd --loglevel=info).
These tasks should be declared in CELERY_IMPORTS = ("tasks", ) in the settings file.
If you have a special celery_settings.py file, it has to be passed on celeryd service start as --settings=celery_settings.py, as digivampire wrote.
You can see the current list of registered tasks in the celery.registry.TaskRegistry class. Could be that your celeryconfig (in the current directory) is not in PYTHONPATH so celery can't find it and falls back to defaults. Simply specify it explicitly when starting celery.
celeryd --loglevel=INFO --settings=celeryconfig
You can also set --loglevel=DEBUG and you should probably see the problem immediately.
Whether you use CELERY_IMPORTS or autodiscover_tasks, the important point is that the tasks can be found, and the names of the tasks registered in Celery should match the names the workers try to fetch.
When you launch Celery, say with celery worker -A project --loglevel=DEBUG, you should see the names of the tasks. For example, if I have a debug_task task in my celery.py:
[tasks]
. project.celery.debug_task
. celery.backend_cleanup
. celery.chain
. celery.chord
. celery.chord_unlock
. celery.chunks
. celery.group
. celery.map
. celery.starmap
If you can't see your tasks in the list, please check that your celery configuration imports the tasks correctly, either in --settings, --config, celeryconfig or config_from_object.
If you are using celery beat, make sure the task name (the task key) you use in CELERYBEAT_SCHEDULE matches a name in the celery task list.
app = Celery('proj',
             broker='amqp://',
             backend='amqp://',
             include=['proj.tasks'])
Please add include=['proj.tasks'].
You need to go to the top directory, then execute this:
celery -A app.celery_module.celeryapp worker --loglevel=info
not
celery -A celeryapp worker --loglevel=info
In your celeryconfig.py, set imports = ("path.path.tasks",).
Then please invoke the task from another module!
I also had the same problem; I added
CELERY_IMPORTS=("mytasks")
in my celeryconfig.py file to solve it.
Using --settings did not work for me. I had to use the following to get it all to work:
celery --config=celeryconfig --loglevel=INFO
Here is the celeryconfig file that has the CELERY_IMPORTS added:
# Celery configuration file
BROKER_URL = 'amqp://'
CELERY_RESULT_BACKEND = 'amqp://'
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_TIMEZONE = 'America/Los_Angeles'
CELERY_ENABLE_UTC = True
CELERY_IMPORTS = ("tasks",)
My setup was a little bit trickier because I'm using supervisor to launch celery as a daemon.
For me this error was solved by ensuring the app containing the tasks was included under django's INSTALLED_APPS setting.
What worked for me was to add an explicit name to the celery task decorator. I changed my task declaration from @app.task to @app.task(name='module.submodule.task').
Here is an example
At first my task was like:
# tasks/test_tasks.py
@celery.task
def test_task():
    print("Celery Task !!!!")
I changed it to:
# tasks/test_tasks.py
@celery.task(name='tasks.test_tasks.test_task')
def test_task():
    print("Celery Task !!!!")
This method is helpful when you don't have a dedicated tasks.py file to include it in celery config.
In my case, the issue was that my project was not picking up autodiscover_tasks properly.
In the celery.py file, the code for autodiscover_tasks was:
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
I changed it to the following one:
from django.apps import apps
app.autodiscover_tasks(lambda: [n.name for n in apps.get_app_configs()])
Best wishes to you.
I had this problem mysteriously crop up when I added some signal handling to my django app. In doing so I converted the app to use an AppConfig, meaning that instead of simply reading as 'booking' in INSTALLED_APPS, it read 'booking.app.BookingConfig'.
Celery doesn't understand what that means, so I added INSTALLED_APPS_WITH_APPCONFIGS = ('booking',) to my Django settings, and modified my celery.py from
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
to
app.autodiscover_tasks(
    lambda: settings.INSTALLED_APPS + settings.INSTALLED_APPS_WITH_APPCONFIGS
)
I had the same problem running tasks from Celery Beat. Celery doesn't like relative imports so in my celeryconfig.py, I had to explicitly set the full package name:
app.conf.beat_schedule = {
    'add-every-30-seconds': {
        'task': 'full.path.to.add',
        'schedule': 30.0,
        'args': (16, 16)
    },
}
Try importing the Celery task in a Python Shell - Celery might silently be failing to register your tasks because of a bad import statement.
I had an ImportError exception in my tasks.py file that was causing Celery to not register the tasks in the module. All other module tasks were registered correctly.
This error wasn't evident until I tried importing the Celery task within a Python Shell. I fixed the bad import statement and then the tasks were successfully registered.
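For example (module path is a placeholder), importing the tasks module directly in a Python shell surfaces the ImportError that Celery otherwise swallows during registration:

# Placeholder module path - run this in the same virtualenv as the worker.
# A bad import inside tasks.py raises here instead of being silently ignored.
import myapp.tasks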
This, strangely, can also be because of a missing package. Run pip to install all necessary packages:
pip install -r requirements.txt
autodiscover_tasks wasn't picking up tasks that used missing packages.
I did not have any issue with Django, but encountered this when I was using Flask. The solution was setting the config option:
celery worker -A app.celery --loglevel=DEBUG --config=settings
while with Django, I just had:
python manage.py celery worker -c 2 --loglevel=info
I encountered this problem as well, but it is not quite the same, so just FYI. Recent upgrades cause this error message due to this decorator syntax:
ERROR/MainProcess] Received unregistered task of type 'my_server_check'.
@task('my_server_check')
had to be changed to just
@task()
No clue why.
If you are using the apps config in INSTALLED_APPS like this:
LOCAL_APPS = [
    'apps.myapp.apps.MyAppConfig',
]
then in your app config, import the tasks in the ready method like this:
from django.apps import AppConfig


class MyAppConfig(AppConfig):
    name = 'apps.myapp'

    def ready(self):
        try:
            import apps.myapp.signals  # noqa F401
            import apps.myapp.tasks
        except ImportError:
            pass
Did you include your tasks.py file (or wherever your async methods are stored)?
app = Celery('APP_NAME', broker='redis://redis:6379/0', include=['app1.tasks', 'app2.tasks', ...])
I solved my problem: my task is under a Python package named 'celery_task'. When I exit this package and run the command celery worker -A celery_task.task --loglevel=info, it works.
As some other answers have already pointed out, there are many reasons why celery would silently ignore tasks, including dependency issues but also any syntax or code problem.
One quick way to find them is to run:
./manage.py check
Many times, after fixing the errors that are reported, the tasks are recognized by celery.
If you're using Docker, as said here, this will kill your pain:
docker stop $(docker ps -a -q)
For me, restarting the broker (Redis) solved it.
The task already showed up correctly in Celery's task list and all relevant Django settings and imports worked fine.
My broker was running before I wrote the task, and restarting Celery and Django alone didn't solve it.
However, stopping Redis with Ctrl+C and then restarting it with redis-server helped Celery to correctly identify the task.
If you are running into this kind of error, there are a number of possible causes but the solution I found was that my celeryd config file in /etc/defaults/celeryd was configured for standard use, not for my specific django project. As soon as I converted it to the format specified in the celery docs, all was well.
The solution for me was to add this line to /etc/default/celeryd:
CELERYD_OPTS="-A tasks"
Because when I run these commands:
celery worker --loglevel=INFO
celery worker -A tasks --loglevel=INFO
Only the latter command was showing task names at all.
I also tried adding a CELERY_APP line to /etc/default/celeryd, but that didn't work either:
CELERY_APP="tasks"
I had the issue with PeriodicTask classes in django-celery: while their names showed up fine when starting the celery worker, every execution triggered:
KeyError: u'my_app.tasks.run'
My task was a class named 'CleanUp', not just a method called 'run'.
When I checked table 'djcelery_periodictask' I saw outdated entries and deleting them fixed the issue.
Just to add my two cents for my case with this error...
My path is /vagrant/devops/test with app.py and __init__.py in it.
When I run cd /vagrant/devops/ && celery worker -A test.app.celery --loglevel=info I am getting this error.
But when I run it like cd /vagrant/devops/test && celery worker -A app.celery --loglevel=info everything is OK.
I've found that one of our programmers added the following line to one of the imports:
os.chdir(<path_to_a_local_folder>)
This caused the Celery worker to change its working directory from the project's default working directory (where it could find the tasks) to a different directory (where it couldn't find the tasks).
After removing this line of code, all tasks were found and registered.
Celery doesn't support relative imports, so in my celeryconfig.py you need absolute imports.
CELERYBEAT_SCHEDULE = {
    'add_num': {
        'task': 'app.tasks.add_num.add_nums',
        'schedule': timedelta(seconds=10),
        'args': (1, 2)
    }
}
An additional item to a really useful list.
I have found Celery unforgiving in relation to errors in tasks (or at least I haven't been able to trace the appropriate log entries) and it doesn't register them. I have had a number of issues with running Celery as a service, which have been predominantly permissions related.
The latest was related to permissions when writing to a log file. I had no issues in development or running celery at the command line, but the service reported the task as unregistered.
I needed to change the log folder permissions to enable the service to write to it.
My 2 cents
I was getting this in a Docker image using Alpine. The Django settings referenced /dev/log for logging to syslog. The Django app and Celery worker were both based on the same image. The entrypoint of the Django app image launched syslogd on start, but the one for the Celery worker did not. This was causing things like ./manage.py shell to fail because there wouldn't be any /dev/log. The Celery worker was not failing; instead, it silently ignored the rest of the app launch, which included loading shared_task entries from applications in the Django project.
