Airflow issue with pathlib / configparser - 'PosixPath' object is not iterable

I am trying to containerize my Airflow setup. I've been tasked with keeping the environment the same and just moving it into a Docker container. We currently have Airflow and all our dependencies installed within an Anaconda environment, so I created a custom Docker image that installs Anaconda and creates my environment. The problem is that our current environment uses systemd services to start Airflow, whereas Docker needs it to be started via the airflow command ("airflow webserver/scheduler/worker"), and when I run it like that, I get an error. The error appears after I start the scheduler.
Our DAGs require a custom repo that helps communicate to our database servers. Within that repo we are using pathlib to get the path of a config file and pass it to configparser.
Basically like this:
import configparser
from pathlib import Path
config = configparser.ConfigParser()
p = Path(__file__)
p = p.parent
config_file_name = 'comms.conf'
config.read(p.joinpath('config', config_file_name))
This throws the following error for all my DAGs in Airflow:
Broken DAG: [/opt/airflow/dags/example_folder/example_dag.py] 'PosixPath' object is not iterable
On the command line the error is:
[2021-01-11 19:53:13,868] {dagbag.py:259} ERROR - Failed to import: /opt/airflow/dags/example_folder/example_dag.py
Traceback (most recent call last):
File "/opt/anaconda3/envs/airflow/lib/python3.7/site-packages/airflow/models/dagbag.py", line 256, in process_file
m = imp.load_source(mod_name, filepath)
File "/opt/anaconda3/envs/airflow/lib/python3.7/imp.py", line 172, in load_source
module = _load(spec)
File "<frozen importlib._bootstrap>", line 696, in _load
File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 728, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/opt/airflow/example_folder/example_dag.py", line 8, in <module>
dag = Dag()
File "/opt/airflow/dags/util/dag_base.py", line 27, in __init__
self.comms = get_comms(Variable.get('environment'))
File "/opt/airflow/repository/repo_folder/custom_script.py", line 56, in get_comms
config = get_config('comms.conf')
File "/opt/airflow/repository/repo_folder/custom_script.py", line 39, in get_config
config.read(p.joinpath('config', config_file_name))
File "/opt/anaconda3/envs/airflow/lib/python3.7/site-packages/backports/configparser/__init__.py", line 702, in read
for filename in filenames:
TypeError: 'PosixPath' object is not iterable
I was able to replicate this behavior outside of the docker container, so I don't think that has anything to do with it. It has to be a difference between how airflow runs as a systemd service and how it runs via cli?
Here is my airflow service file that works:
[Unit]
Description=Airflow webserver daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service
Wants=postgresql.service mysql.service redis.service rabbitmq-server.service
[Service]
EnvironmentFile=/etc/sysconfig/airflow
User=airflow
Group=airflow
Type=simple
ExecStart=/opt/anaconda3/envs/airflow/bin/airflow webserver
Restart=on-failure
RestartSec=5s
PrivateTmp=true
[Install]
WantedBy=multi-user.target
Here is the airflow environment file that I'm using within the service file. Note that I needed to export these env variables locally to get airflow to run up to this point in the cli. Also note that the custom repos live in the /opt/airflow directory.
AIRFLOW_CONFIG=/opt/airflow/airflow.cfg
AIRFLOW_HOME=/opt/airflow
PATH=/bin:/opt/anaconda3/envs/airflow/bin:/opt/airflow/etl:/opt/airflow:$PATH
PYTHONPATH=/opt/airflow/etl:/opt/airflow:$PYTHONPATH
My airflow config is default, other than the following changes:
executor = CeleryExecutor
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@192.168.x.x:5432/airflow
load_examples = False
logging_level = WARN
broker_url = amqp://guest:guest@127.0.0.1:5672/
result_backend = db+postgresql://airflow:airflow@192.168.x.x:5432/airflow
catchup_by_default = False
configparser==3.5.3
My conda environment is using Python 3.7 and the Airflow version is 1.10.14. It's running on a CentOS 7 server. If anyone has any ideas that could help, I would appreciate it!
Edit: If I change the line config.read(p.joinpath('config', config_file_name)) to point directly to the config like this config.read('/opt/airflow/repository/repo_folder/config/comms.conf') it works fine. So it has something to do with how configparser handles the pathlib output? But it doesn't have a problem with this if airflow is run via systemd service?
Edit2: I can also wrap the pathlib object in str() and it works: config.read(str(p.joinpath('config', config_file_name))). I just want to know why this works fine with the systemd service. I'm afraid other stuff is going to be broken.

The path to the config file is computed wrongly.
This is because of the following lines:
# filename: custom_script.py
p = p.parent
confpath = p.joinpath('config', config_file_name)
confpath evaluates to /opt/airflow/repository/repo_folder/config/comms.conf
The path you shared where the configuration file lies is /opt/airflow/repository/repo_folder/conn.conf.
You need to resolve the config file relative to repo_folder by constructing its path using the folder custom_script.py is in.
# filename: custom_script.py
from pathlib import Path
# The folder custom_script.py is in (repo_folder)
p = Path(__file__).parent
confpath = p.joinpath(config_file_name)

I was able to fix this issue by uninstalling and installing a newer version of configparser.
configparser==5.0.1
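Judging by the traceback, the backported configparser 3.5.3 iterates over whatever is passed to read() unless it is a plain string, so a Path object is treated as a list of filenames. If upgrading is not possible, a minimal sketch of a defensive wrapper is shown here. The helper name read_config is hypothetical and not part of the original repo; it assumes the config is a single INI file.

```python
import configparser
import os


def read_config(path):
    """Read one INI file given as str or pathlib.Path (hypothetical helper)."""
    config = configparser.ConfigParser()
    # os.fspath() converts a Path to a plain str, which configparser
    # treats as a single filename instead of an iterable of filenames.
    loaded = config.read(os.fspath(path))
    if not loaded:
        # read() silently skips missing files, so fail loudly instead.
        raise FileNotFoundError("config file not found: {}".format(path))
    return config
```

In the code from the question this would be called as read_config(p.joinpath('config', config_file_name)), which is equivalent to the str() workaround from Edit2 but also reports a missing file.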

Related

Azure / Django / Celery / Ubuntu | tkinter & libtk8.6.so import issue

UPDATE / SOLUTION
Per Sytech's answer....
I did not realize that the build runs on Ubuntu, which has all the packages, but when Azure deploys the app to a Linux container, the needed packages are missing.
As in other questions/answers, just add these installs to a startup script that Azure will use, e.g.:
#!/bin/bash
apt-get update
apt-get install tk --yes
python manage.py wait_for_db
python manage.py migrate
gunicorn --bind=0.0.0.0 --timeout 600 app.wsgi --access-logfile '-' --error-logfile '-' &
celery -A app worker -l info --uid=1
Original Post:
When Azure builds and deploys a Python 3.9 Django/Django REST web app, it fails during startup.
Error in question ( full logs below )
2022-03-08T21:13:30.385999188Z File "/tmp/8da0147da65ec79/core/models.py", line 1, in <module>
2022-03-08T21:13:30.386659422Z from tkinter import CASCADE
2022-03-08T21:13:30.387587669Z File "/opt/python/3.9.7/lib/python3.9/tkinter/__init__.py", line 37, in <module>
2022-03-08T21:13:30.387993189Z import _tkinter # If this fails your Python may not be configured for Tk
2022-03-08T21:13:30.388227101Z ImportError: libtk8.6.so: cannot open shared object file: No such file or directory
I have come across other answers saying to make sure that tkinter is installed with sudo apt-get install python3-tk, which I have added to the deployment yml file.
It still seems to have the issue, though. Reverting to the previous code deploys successfully, and the only feature that has been added to the application is Celery. I'm not sure whether that has anything to do with it.
Am I adding the installation of tk/tkinter in the wrong sequence?
When I revert to the previous code and have a successful build/deploy, I ssh into the container, run the Python shell, and try to manually import the tkinter module:
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/opt/python/3.9.7/lib/python3.9/tkinter/__init__.py", line 37, in <module>
import _tkinter # If this fails your Python may not be configured for Tk
ImportError: libtk8.6.so: cannot open shared object file: No such file or directory
It errors out as expected.
When I run apt-get update && apt-get install python3-tk --yes manually in the container and then go back to the shell in the container, there is no error importing tkinter.
This leads me to believe something is not being installed in the right place (virtualenv?) or is being overwritten in the build process.
build:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v2
    - name: Set up Python version
      uses: actions/setup-python@v1
      with:
        python-version: "3.9"
    - name: Create and start virtual environment
      run: |
        python -m venv venv
        source venv/bin/activate
    - name: Install TK dependency
      run: |
        sudo apt-get update
        sudo apt-get install python3-tk
    - name: Install dependencies
      run: pip install -r requirements.txt
    - name: Upload artifact for deployment jobs
      uses: actions/upload-artifact@v2
      with:
        name: python-app
        path: |
          .
          !venv/
App Log spit out below...
2022-03-08T21:13:27.830330743Z Updated PYTHONPATH to ':/opt/startup/code_profiler:/tmp/8da0147da65ec79/antenv/lib/python3.9/site-packages'
2022-03-08T21:13:30.370903021Z Traceback (most recent call last):
2022-03-08T21:13:30.371872470Z File "/tmp/8da0147da65ec79/manage.py", line 22, in <module>
2022-03-08T21:13:30.372648510Z main()
2022-03-08T21:13:30.373176037Z File "/tmp/8da0147da65ec79/manage.py", line 18, in main
2022-03-08T21:13:30.373892773Z execute_from_command_line(sys.argv)
2022-03-08T21:13:30.374862922Z File "/tmp/8da0147da65ec79/antenv/lib/python3.9/site-packages/django/core/management/__init__.py", line 446, in execute_from_command_line
2022-03-08T21:13:30.374880323Z utility.execute()
2022-03-08T21:13:30.378586012Z File "/tmp/8da0147da65ec79/antenv/lib/python3.9/site-packages/django/core/management/__init__.py", line 420, in execute
2022-03-08T21:13:30.378603012Z django.setup()
2022-03-08T21:13:30.378607713Z File "/tmp/8da0147da65ec79/antenv/lib/python3.9/site-packages/django/__init__.py", line 24, in setup
2022-03-08T21:13:30.378612113Z apps.populate(settings.INSTALLED_APPS)
2022-03-08T21:13:30.378679216Z File "/tmp/8da0147da65ec79/antenv/lib/python3.9/site-packages/django/apps/registry.py", line 116, in populate
2022-03-08T21:13:30.378689817Z app_config.import_models()
2022-03-08T21:13:30.378694417Z File "/tmp/8da0147da65ec79/antenv/lib/python3.9/site-packages/django/apps/config.py", line 304, in import_models
2022-03-08T21:13:30.379003533Z self.models_module = import_module(models_module_name)
2022-03-08T21:13:30.381756173Z File "/opt/python/3.9.7/lib/python3.9/importlib/__init__.py", line 127, in import_module
2022-03-08T21:13:30.383257849Z return _bootstrap._gcd_import(name[level:], package, level)
2022-03-08T21:13:30.383423757Z File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
2022-03-08T21:13:30.383857479Z File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
2022-03-08T21:13:30.384148694Z File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
2022-03-08T21:13:30.384836329Z File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
2022-03-08T21:13:30.384850030Z File "<frozen importlib._bootstrap_external>", line 850, in exec_module
2022-03-08T21:13:30.385281052Z File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
2022-03-08T21:13:30.385999188Z File "/tmp/8da0147da65ec79/core/models.py", line 1, in <module>
2022-03-08T21:13:30.386659422Z from tkinter import CASCADE
2022-03-08T21:13:30.387587669Z File "/opt/python/3.9.7/lib/python3.9/tkinter/__init__.py", line 37, in <module>
2022-03-08T21:13:30.387993189Z import _tkinter # If this fails your Python may not be configured for Tk
2022-03-08T21:13:30.388227101Z ImportError: libtk8.6.so: cannot open shared object file: No such file or directory
2022-03-08T21:13:36.193Z ERROR - Container <container_name>_0_fd6a978c for site <container_name> has exited, failing site start

Tkinter is already included in the ubuntu-latest image. No particular setup is needed.
jobs:
  verify-tkinter:
    name: verify-tkinter
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python version
        uses: actions/setup-python@v1
        with:
          python-version: "3.9"
      - name: show tk version
        run: |
          python -c "import tkinter;print(tkinter.TkVersion)"
If this error occurs after deployment, you need to install tkinter in your deployment environment, which is separate from the GitHub Actions runner.
If your server is running Ubuntu 20, make sure the tk package is installed; it provides the libtk8.6.so file that is needed.
apt install -y tk

I came across this error because of a simple mistake.
My IDE had added from turtle import up to my .py file and I didn't notice.
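The traceback in the question points at from tkinter import CASCADE in core/models.py; in Django model code, CASCADE normally comes from django.db.models, so this may be the same kind of IDE auto-import slip. As a rough sketch (the function name find_gui_imports and the module list are assumptions made for illustration), a project could be scanned for accidental GUI imports like this:

```python
import ast

# Modules that need Tk at import time and rarely belong in server code.
GUI_MODULES = {"tkinter", "turtle"}


def find_gui_imports(source, filename="<string>"):
    """Return sorted (line, module) pairs for imports of GUI-only modules."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        for name in names:
            # Compare only the top-level package, e.g. "tkinter.ttk" -> "tkinter".
            if name.split(".")[0] in GUI_MODULES:
                hits.append((node.lineno, name))
    return sorted(hits)


sample = "from tkinter import CASCADE\nfrom os import path\n"
print(find_gui_imports(sample))  # [(1, 'tkinter')]
```

If a hit turns out to be unintended, replacing the import (for example with from django.db.models import CASCADE in a models file) removes the dependency on libtk8.6.so altogether.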

Flask app on OVH : ImportError: No module named 'flask'

I'm using Flask to build a project hosted on OVH. Unfortunately, it doesn't work.
Here is my app.py :
from flask import Flask, render_template, request, make_response
app = Flask(__name__)
@app.route('/')
@app.route('/test')
def test():
    return render_template('test.html')
if __name__ == '__main__':
    app.run(debug=True,host='0.0.0.0')
My requirement.txt :
click==7.1.2
Flask==1.1.4
itsdangerous==1.1.0
Jinja2==2.11.3
MarkupSafe==1.1.1
Werkzeug==1.0.1
My tree structure :
www
-templates
--- index.html
-requirement.txt
-my_py3_env
---pyvenv.cfg
---lib
-----python3.5
-------site-packages
---------flask
---bin
-app.py
-__pycache__
However, I get this output :
Traceback (most recent call last):
File "/usr/share/passenger/helper-scripts/wsgi-loader.py", line 369, in <module>
app_module = load_app()
File "/usr/share/passenger/helper-scripts/wsgi-loader.py", line 76, in load_app
return imp.load_source('passenger_wsgi', startup_file)
File "/usr/lib/python3.5/imp.py", line 172, in load_source
module = _load(spec)
File "<frozen importlib._bootstrap>", line 693, in _load
File "<frozen importlib._bootstrap>", line 673, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 673, in exec_module
File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
File "/home/deposec/www/app.py", line 2, in <module>
import flask
ImportError: No module named 'flask'
Does anyone know why ?
EDIT : I have added the following .platform.app.yaml :
name: app
type: python:3.5
web:
    commands:
        start: "gunicorn -b $PORT project.wsgi:application"
    locations:
        "/":
            root: ""
            passthru: true
            allow: false
        "/static":
            root: "static/"
            allow: true
hooks:
    build: |
        pip install -r requirements.txt
        pip install -e .
        pip install gunicorn
mounts:
    tmp:
        source: local
        source_path: tmp
    logs:
        source: local
        source_path: logs
disk: 512
However I still get No module named 'flask'... Do I also need a wsgi.py somewhere ?

The documentation you are pointing to is for the OVHcloud Web PaaS offer (powered by Platform.sh), not the Cloud Web one. They are two different products.
This means your .platform.app.yaml is ignored on Cloud Web.
To install your Python dependencies on Cloud Web, the only available documentation seems to be in French.
You need to connect to your Cloud Web instance through SSH to run your pip install command.
It looks like:
# 1 - Connect to your Cloud Web through SSH
# You can find these infos in OVH Manager > Web > Your cloudweb > "FTP - SSH"
ssh <cloudweb_username>@sshcloud.cluster024.hosting.ovh.net -p <your port>
# 2 - Setup a Python virtualenv
pip3 install --user virtualenv
export PATH=$PATH:~/.local/bin
echo 'export PATH=$PATH:~/.local/bin' >> ~/.profile
# If "www" is your root dir, otherwise adjust it:
cd www/
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
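The traceback in the question shows the WSGI loader running /usr/lib/python3.5, so packages installed in a virtualenv are only found if that interpreter is told where to look. One generic workaround, sketched here under the assumption that the virtualenv uses the same Python version as the loader, is to add its site-packages directory at the top of the startup file before importing flask. The function name add_venv_site_packages is hypothetical, and whether Cloud Web needs this step is not confirmed by the answer above.

```python
import os
import site
import sys


def add_venv_site_packages(venv_dir):
    """Make packages from a virtualenv importable in the current interpreter."""
    version = "python{}.{}".format(*sys.version_info[:2])
    site_packages = os.path.join(venv_dir, "lib", version, "site-packages")
    if not os.path.isdir(site_packages):
        raise FileNotFoundError(site_packages)
    # addsitedir() appends the directory to sys.path and processes .pth files.
    site.addsitedir(site_packages)
    return site_packages
```

With the layout from the answer, the call would be add_venv_site_packages('/home/<user>/www/venv'), where the actual home directory is whatever your hosting account uses.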

The problem involved WSL, Gunicorn, Docker and Flask

I am working on Windows 10 Pro, Git Bash, Docker Desktop.
Now I have a project which runs a Flask application in Docker through Gunicorn.
The entrypoint in Dockerfile:
ENTRYPOINT ["gunicorn", "-b",":8080","main.py"]
When I run the command below:
docker run -p 127.0.0.1:80:8080 jwt-api-test
It shows the error:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/gunicorn/util.py", line 358, in import_app
mod = importlib.import_module(module)
File "/usr/local/lib/python3.7/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 962, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'main.py'; 'main' is not a package
If I am right, it is related to gunicorn, which isn't available in Windows.
After googling, it seems WSL is an option. In fact, WSL is already turned on and running under Docker Desktop, as shown below:
wsl.exe --list --all --verbose
NAME STATE VERSION
* docker-desktop-data Running 2
docker-desktop Running 2
When I clicked the wsl.exe, and tried to open bash, it didn't work: no error, just nothing happened. I did use shift +restart according to some instructions, but it didn't work either.
May I ask for your help on how to make this Flask application work? Thanks.
Edited: The structure of main.py:
JWT_SECRET = os.environ.get('JWT_SECRET', 'abc123abc1234')
LOG_LEVEL = os.environ.get('LOG_LEVEL', 'INFO')
LOG = _logger()
LOG.debug("Starting with log level: %s" % LOG_LEVEL )
APP = Flask(__name__)
if __name__ == '__main__':
    APP.run(host='127.0.0.1', port=8080, debug=True)
The Dockerfile:
FROM python:stretch
COPY . /app
WORKDIR /app
RUN pip install --upgrade pip
RUN pip install -r requirements.txt
ENTRYPOINT ["gunicorn", "-b", ":8080", "main.py"]

Assuming that the rest of your setup is correct (relative paths, ports, Dockerfile, etc.), the problem could be passing main.py to gunicorn.
Usually you need to pass the module and your Flask application variable in the form module:variable, i.e. in your case replace "main.py" in your ENTRYPOINT with "main:APP" (see docs).
Apart from that: If you get the container running it might be the case that you cannot reach your API. In this case change your gunicorn binding to "0.0.0.0:8080" in your ENTRYPOINT.
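To see why a file name fails as the application argument, here is a simplified model of how a module:variable target gets resolved. This is an illustrative sketch and not gunicorn's actual implementation; load_wsgi_target is a hypothetical name.

```python
import importlib


def load_wsgi_target(target):
    """Resolve 'module:variable' to an object (simplified illustration)."""
    module_name, _, attr = target.partition(":")
    # gunicorn falls back to a variable named "application" when none is given.
    attr = attr or "application"
    # "main.py" would be read as submodule "py" of a package "main",
    # which is why the import fails with "'main' is not a package".
    module = importlib.import_module(module_name)
    return getattr(module, attr)


# A stdlib target works the same way as "main:APP" would:
dumps = load_wsgi_target("json:dumps")
```

Calling load_wsgi_target("string.py") raises ModuleNotFoundError for the same reason as in the question, because string is a plain module and has no submodule named py.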

How to rename Flask WSGI App during development?

I would like to know how can I change the name of the Flask WSGI App during the development stage.
Using the Flask Mega-Tutorial as reference, I was able to successfully setup a "Hello World" app.
Digressions from the tutorial:
Use pipenv as my Python virtual environment manager (instead of venv)
Name of the app is astronomer.py.
Now, I want to build on top of the existing app and customize the code to my requirements; starting with the app name that I have defined in the .flaskenv file as FLASK_APP env var.
Accordingly, I have renamed the root-level Python script from astronomer.py (in the tutorial) to galileo.py (for my use). After changing the corresponding value of FLASK_APP and restarting the Flask server via $ pipenv run flask run, the app crashes with the following error:
$ pipenv run flask run [12:29:33]
* Serving Flask app "astronomer.py" (lazy loading)
* Environment: development
* Debug mode: on
* Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
* Restarting with stat
* Debugger is active!
* Debugger PIN: 302-012-958
127.0.0.1 - - [09/Oct/2019 12:29:57] "GET / HTTP/1.1" 500 -
Traceback (most recent call last):
File "/Users/kshitij10496/.local/share/virtualenvs/galileo-iQPdbs28/lib/python3.7/site-packages/flask/cli.py", line 240, in locate_app
__import__(module_name)
ModuleNotFoundError: No module named 'astronomer'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/kshitij10496/.local/share/virtualenvs/galileo-iQPdbs28/lib/python3.7/site-packages/flask/cli.py", line 338, in __call__
self._flush_bg_loading_exception()
File "/Users/kshitij10496/.local/share/virtualenvs/galileo-iQPdbs28/lib/python3.7/site-packages/flask/cli.py", line 326, in _flush_bg_loading_exception
reraise(*exc_info)
File "/Users/kshitij10496/.local/share/virtualenvs/galileo-iQPdbs28/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
raise value
File "/Users/kshitij10496/.local/share/virtualenvs/galileo-iQPdbs28/lib/python3.7/site-packages/flask/cli.py", line 314, in _load_app
self._load_unlocked()
File "/Users/kshitij10496/.local/share/virtualenvs/galileo-iQPdbs28/lib/python3.7/site-packages/flask/cli.py", line 330, in _load_unlocked
self._app = rv = self.loader()
File "/Users/kshitij10496/.local/share/virtualenvs/galileo-iQPdbs28/lib/python3.7/site-packages/flask/cli.py", line 388, in load_app
app = locate_app(self, import_name, name)
File "/Users/kshitij10496/.local/share/virtualenvs/galileo-iQPdbs28/lib/python3.7/site-packages/flask/cli.py", line 250, in locate_app
raise NoAppException('Could not import "{name}".'.format(name=module_name))
flask.cli.NoAppException: Could not import "astronomer".
Debugging
After logging into the virtual env and checking the value of the env var FLASK_APP, I get the old value, astronomer.py. This explains why the application is not starting. However, I don't understand why this is happening.
I even tried eager loading the app using: $ pipenv run flask run --eager-loading
Still, the app fails to start with the same error message, of course.
I was able to solve this manually by unsetting the env var FLASK_APP from within the virtual env and restarting the Flask server. I'm curious why the app is not loading the .flaskenv file at initialization, and whether there is an automated way to do this.

With Pipenv, I think things are a little bit different when it comes to environment variables. As per the documentation, there is a builtin mechanism for loading a .env file:
If a .env file is present in your project, $ pipenv shell and $ pipenv run will automatically load it, for you
So I guess you should rename your file from .flaskenv to .env and then safely remove the python-dotenv dependency.
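The stale value is also consistent with how Flask documents its dotenv loading: variables already set in the environment take precedence over values from .env and .flaskenv. A simplified model of that precedence follows; load_env_file is a hypothetical helper for illustration, not python-dotenv's or Flask's real code.

```python
def load_env_file(lines, environ):
    """Apply KEY=VALUE lines without overriding keys that already exist."""
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # setdefault() keeps an existing value, mirroring "no override".
        environ.setdefault(key.strip(), value.strip())
    return environ


# A shell that still exports the old name wins over the edited file.
stale_shell = {"FLASK_APP": "astronomer.py"}
print(load_env_file(["FLASK_APP=galileo.py"], stale_shell)["FLASK_APP"])  # astronomer.py
```

This matches the observation in the question that unsetting FLASK_APP inside the virtual env made the new value take effect.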

GitPython error when pulling from remote repository

I'm trying to pull from a remote repo in a local one with this code:
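import logging
import git
repo = git.Repo('/home/user/repo/')
o = repo.remotes.origin
try:
    o.pull()
except Exception:
    # Log the full traceback instead of silently swallowing the error
    logging.exception("oops:")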
This fails miserably with the following traceback:
Traceback (most recent call last):
File "/home/user/my_site/app/app.py", line 58, in regen_logs
o.pull()
File "/home/user/my_site/venv/lib/python2.7/site-packages/git/remote.py", line 665, in pull
proc = self.repo.git.pull(self, refspec, with_extended_output=True, as_process=True, v=True, **kwargs)
File "/home/user/my_site/venv/lib/python2.7/site-packages/git/cmd.py", line 440, in <lambda>
return lambda *args, **kwargs: self._call_process(name, *args, **kwargs)
File "/home/user/my_site/venv/lib/python2.7/site-packages/git/cmd.py", line 834, in _call_process
return self.execute(make_call(), **_kwargs)
File "/home/user/my_site/venv/lib/python2.7/site-packages/git/cmd.py", line 576, in execute
raise GitCommandNotFound(str(err))
GitCommandNotFound: [Errno 2] No such file or directory
However, when doing the exact three commands (Repo, set origin, pull()) in an interactive session, it works just fine:
>>> import git
>>> repo = git.Repo('/home/user/repo')
>>> o = repo.remotes.origin
>>> o.pull()
[<git.remote.FetchInfo object at 0x242d2b8>]
I do have Git installed on the system:
$ rpm -q git
git-1.8.3.1-6.el7.x86_64
Any clues what I'm doing wrong here?

Found the culprit. Turns out gunicorn, which I used to start the app, clears out most environment variables, and defines a PATH pointing to the virtual environment's directory. Thus, the git executable could not be located anywhere. I fixed this by calling gunicorn with an extra option:
-e GIT_PYTHON_GIT_EXECUTABLE=/usr/bin/git

You need to set the path to git before importing git, like this:
import os
os.environ['GIT_PYTHON_GIT_EXECUTABLE'] = '/usr/local/bin/git'
from git import Repo
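Hard-coding the path works, but the location differs between systems (the two answers above already use /usr/bin/git and /usr/local/bin/git). A small sketch that looks the executable up first and only falls back to known locations is shown here; find_git and its candidate list are assumptions made for illustration.

```python
import os
import shutil


def find_git(candidates=("/usr/bin/git", "/usr/local/bin/git")):
    """Return a path to the git executable, or None if it cannot be found."""
    # First try the normal PATH lookup.
    found = shutil.which("git")
    if found:
        return found
    # PATH may have been replaced by the process manager, so try known spots.
    for path in candidates:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None


git_path = find_git()
if git_path is not None:
    # Keep an explicit setting (e.g. from gunicorn -e) if one is present.
    os.environ.setdefault("GIT_PYTHON_GIT_EXECUTABLE", git_path)
# Import git only after this point so GitPython sees the variable.
```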
