Trying to run Airflow on Databricks but getting an error - python

I am trying to use Airflow on Databricks.
I have installed apache-airflow 1.10.6 from https://pypi.org/project/apache-airflow/.
I am using Python 3.6 on Databricks.
But I get this error:
import airflow
ModuleNotFoundError: No module named 'werkzeug.wrappers.json'; 'werkzeug.wrappers' is not a package
I have tried the following:
Apache Airflow : airflow initdb results in "ImportError: No module named json"
Apache Airflow : airflow initdb throws ModuleNotFoundError: No module named 'werkzeug.wrappers.json'; 'werkzeug.wrappers' is not a package error
But I still get the same problem.
Thanks

Note: By default, "Airflow" and its dependency is not installed on the databricks.
You need to install the package explicitly.
Dependency installation: Using Databricks library utilities.
dbutils.library.installPyPI("Werkzeug")
You can install the packages in different methods.
Method1: Installing external packages using pip cmdlet.
Syntax: %sh /databricks/python3/bin/pip install <packagename>
%sh
/databricks/python3/bin/pip install apache-airflow
Method 2: Using Databricks library utilities
Syntax:
dbutils.library.installPyPI("pypipackage", version="version", repo="repo", extras="extras")
dbutils.library.restartPython() # Removes Python state, but some libraries might not work without calling this function
To install apache-airflow using the Databricks library utilities, use the command below.
dbutils.library.installPyPI("apache-airflow")
Method 3: GUI method
Go to Clusters => Select Cluster => Libraries => Install New => Library Source "PyPI" => Package "apache-airflow" => Install
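If the werkzeug.wrappers.json error persists after installing apache-airflow, it is typically caused by a Werkzeug 1.x release that removed that module; pinning an older Werkzeug alongside Airflow is a common workaround. A minimal notebook sketch (the pinned versions are assumptions, adjust as needed):
# Pin a pre-1.0 Werkzeug next to Airflow, then restart the Python process
# so the pinned packages are picked up by a fresh interpreter.
dbutils.library.installPyPI("werkzeug", version="0.16.1")  # assumed compatible pin
dbutils.library.installPyPI("apache-airflow", version="1.10.6")
dbutils.library.restartPython()

# In a new cell, the import should now succeed:
import airflow
print(airflow.__version__)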
Hope this helps. Do let us know if you have any further queries.
Do click on "Mark as Answer" and Upvote on the post that helps you, this can be beneficial to other community members.

Related

ImportError: No module named 'graphframes' databricks

I am trying to import graphframes into my Databricks notebook:
from graphframes import *
but it fails with the following error message:
ImportError: No module named 'graphframes'
How can I add/import it into the Databricks notebook? Any help appreciated.
Note: By default, "graphframes" is not installed on the databricks.
You need to install the package explicitly.
You can install the packages in different methods.
Method1: Installing external packages using pip cmdlet.
Syntax: %sh /databricks/python3/bin/pip install <packagename>
%sh
/databricks/python3/bin/pip install graphframes
Method 2: Using Databricks library utilities
Syntax:
dbutils.library.installPyPI("pypipackage", version="version", repo="repo", extras="extras")
dbutils.library.restartPython() # Removes Python state, but some libraries might not work without calling this function
To install graphframes using the Databricks library utilities, use the command below.
dbutils.library.installPyPI("graphframes")
I tried the examples available in the GraphFrames documentation and they ran successfully in the notebook.
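For reference, a minimal usage sketch once the package is installed (it assumes a Databricks notebook where spark is the provided SparkSession and the GraphFrames Spark package is available on the cluster; the data and column values are illustrative):
from graphframes import GraphFrame

# Tiny vertex and edge DataFrames with the column names GraphFrame expects.
vertices = spark.createDataFrame(
    [("a", "Alice"), ("b", "Bob"), ("c", "Carol")], ["id", "name"])
edges = spark.createDataFrame(
    [("a", "b", "friend"), ("b", "c", "follow")], ["src", "dst", "relationship"])

g = GraphFrame(vertices, edges)
g.inDegrees.show()  # basic sanity check that the import and API work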
Hope this helps.
graphframes is not a default dependency of Python; you need to install it.
You can install the graphframes module by opening your terminal and typing pip install graphframes.

ImportError: import apache_beam as beam. Module not found

I've installed the apache_beam Python SDK and the Apache Airflow Python SDK in a Docker container.
Python Version: 3.5
Apache Airflow: 1.10.5
I'm trying to execute an apache-beam pipeline using DataflowPythonOperator.
When I run the DAG from the Airflow UI, I get:
Import Error: import apache_beam as beam. Module not found
With the same setup I tried DataflowTemplateOperator and it works perfectly fine.
When I tried the same Docker setup with Python 2 and Apache Airflow 1.10.3 two months back, the operator didn't return any errors and worked as expected.
After SSHing into the Docker container, I checked the installed libraries (using pip freeze) and can see the installed versions of apache-beam and apache-airflow:
apache-airflow==1.10.5
apache-beam==2.15.0
Dockerfile:
RUN pip install --upgrade pip
RUN pip install --upgrade setuptools
RUN pip install apache-beam
RUN pip install apache-beam[gcp]
RUN pip install google-api-python-client
ADD . /home/beam
RUN pip install apache-airflow[gcp_api]
airflow operator:
new_task = DataFlowPythonOperator(
    task_id='process_details',
    py_file="path/to/file/filename.py",
    gcp_conn_id='google_cloud_default',
    dataflow_default_options={
        'project': 'xxxxx',
        'runner': 'DataflowRunner',
        'job_name': "process_details",
        'temp_location': 'GCS/path/to/temp',
        'staging_location': 'GCS/path/to/staging',
        'input_bucket': 'bucket_name',
        'input_path': 'GCS/path/to/bucket',
        'input-files': 'GCS/path/to/file.csv'
    },
    dag=test_dag)
This looks like a known issue: https://github.com/GoogleCloudPlatform/DataflowPythonSDK/issues/46
please run pip install six==1.10. This is a known issue in Beam (https://issues.apache.org/jira/browse/BEAM-2964) which we are trying to get fixed upstream.
So try installing six==1.10 with pip; a quick check you can run inside the container is sketched below.
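A minimal check (a sketch; it only verifies what the interpreter running the Airflow workers actually sees):
import apache_beam as beam
import six

# Both imports resolving from the same interpreter is the point of the check;
# the Beam issue linked above suggests pinning six to 1.10.0.
print("apache-beam:", beam.__version__)
print("six:", six.__version__)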
This may not be an option for you, but I was getting the same error with python 2. Executing the same script with python 3 resolved the error.
I was running through the dataflow tutorial:
https://codelabs.developers.google.com/codelabs/cpb101-simple-dataflow-py/
and when I follow the instructions as specified:
python grep.py
I get the error from the title of your post. Running it instead with:
python3 grep.py
and it works as expected. I hope it helps. Happy hunting if it doesn't. See the link for details on what exactly I was running.
This GitHub link will help you solve your problem. Follow the steps below.
Read the following article on virtualenv; it will help with the later steps:
https://www.dabapps.com/blog/introduction-to-pip-and-virtualenv-python/?utm_source=feedly
Create a virtual environment (note: I created it in the cloudml-samples folder and named it env):
titanium-vim-169612:~/cloudml-samples$ virtualenv env
Activate the virtual env:
#titanium-vim-169612:~/cloudml-samples$ source env/bin/activate
Install Cloud Dataflow using the following link (this brings in apache_beam):
https://cloud.google.com/dataflow/docs/quickstarts/quickstart-python
Now you can check that apache_beam is present in env/lib/python2.7/site-packages/:
#titanium-vim-169612:~/cloudml-samples/flowers$ ls ../env/lib/python2.7/site-packages/
Run the sample.
At this point, I got an error about missing tensorflow. I installed tensorflow in my virtualenv using the link below (follow the installation steps for virtualenv):
https://www.tensorflow.org/install/install_linux#InstallingVirtualenv
The sample seems to work now.
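If you just want to confirm that apache_beam imports and runs inside the virtualenv before touching Dataflow, here is a minimal sketch using the local DirectRunner (no GCP resources needed):
import apache_beam as beam

def log(element):
    # Print each element so we can see the pipeline actually ran.
    print(element)
    return element

# A tiny pipeline executed locally with the default DirectRunner.
with beam.Pipeline() as p:
    p | "Create" >> beam.Create(["hello", "beam"]) | "Log" >> beam.Map(log)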

Python external_dependencies: 'google-cloud-pubsub' not working

Environment details
Google cloud pubsub
window 10
Python version: 3.6.3
google-cloud-pubsub version: 0.39.1
Steps to reproduce
I used google-cloud-pubsub in an Odoo module. I will explain the issue step by step.
I added 'google-cloud-pubsub' to external_dependencies in the Python manifest file:
"external_dependencies": { 'python': ['google-cloud-pubsub'] },
Expected result:
Actually, I don't know why this error occurs; normally it should just work.
Actual result:
When I published the Python module to the Odoo server, it threw the error below:
odoo.exceptions.UserError: ('Unable to install module "caliva_wsp" because an
external dependency is not met: No module named google-cloud-pubsub', '')
How do I solve this issue? I have been stuck at this point for around 3 days.
Thanks!
This error message is the expected result if you have not installed the dependency. Odoo's module manifest external_dependencies only checks that the external module is available to the Odoo code; it does not install the module.
Install the Google Pub/Sub pip package on your Odoo server with pip3 install google-cloud-pubsub before installing your own Odoo module. After that, your module should be installable.
You can also automate the installation of the dependency by putting it in the module's requirements.txt file. More information can be found at https://www.odoo.com/documentation/user/12.0/odoo_sh/getting_started/first_module.html#use-an-external-python-library.
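As a quick sanity check (a sketch; it assumes the standard import path of the google-cloud-pubsub client), you can confirm the dependency is importable by the same interpreter that runs Odoo:
# Run this with the interpreter that serves Odoo (e.g. python3 on the server).
# The pip distribution is 'google-cloud-pubsub', but the importable package
# lives under google.cloud; referencing the client class is enough to prove
# the install worked, without needing any credentials.
from google.cloud import pubsub_v1

print("google-cloud-pubsub is importable:", pubsub_v1.PublisherClient.__name__)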

No module named pymysql - aws serverless framework

I deployed a Python Lambda function through the Serverless Framework and installed pymysql through pip. My handler info is: dynamodbtoauroradb/aurora-data-management/aurora-data-management.handler
I get this error:
Unable to import module 'dynamodbtoauroradb/aurora-data-management/aurora-data-management': No module named 'pymysql'
Not sure where the mistake is.
There is a chance that pymysql is present in your system packages, so when you built the virtual environment it used the system package.
Create a clean virtualenv using
virtualenv --no-site-packages envname
Or else you can use the current one, with
pip install pymysql --no-deps --ignore-installed
Use the plugin serverless-python-requirements with Docker.
This will package all your Python virtualenv dependencies into your Serverless package.
See this answer for more details.
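For reference, a minimal sketch of the handler module (file and handler names here are illustrative, not the asker's actual layout): the "Unable to import module" error is raised while the module is being loaded, so pymysql must be bundled into the deployment package before the handler ever runs.
# aurora_data_management.py (illustrative name)
import pymysql  # this import is what fails when pymysql is not packaged

def handler(event, context):
    # Returning the version is enough to confirm the dependency was bundled;
    # real connection details would come from environment variables or config.
    return {"pymysql_version": pymysql.__version__}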

Airflow command gives error due to missing api_auth.deny_all

On our staging machine, running any airflow command gives this error:
[2018-09-01 16:12:55,938] {__init__.py:37} CRITICAL - Cannot import api_auth.deny_all for API authentication due to: No module named api_auth.deny_all
api_auth seems to come along with airflow, since I tried pip install api_auth and could not find such a library.
On the same machine, I tried to reinstall a fresh clean airflow using virtualenv and pip install airflow, and still get this error.
I tried again on my own laptop and airflow works fine. So I suspect it is probably due to the historical ~/airflow/airflow.cfg on the staging machine.
I am not familiar with the airflow.cfg settings, and cannot find any clue on Google.
Does anyone know what may cause the issue and how to resolve it?
You are installing the wrong package for Apache Airflow.
Please install Airflow using the following:
pip install apache-airflow
instead of
pip install airflow
The Airflow package has been renamed to apache-airflow since 1.8.0.
Check the following link for documentation:
https://airflow.apache.org/installation.html#getting-airflow
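To confirm which distribution the environment actually has, a quick sketch using pkg_resources (it only lists what is installed):
import pkg_resources

# Print every installed distribution whose name mentions airflow, so you can
# see whether the legacy "airflow" package or "apache-airflow" is present.
for dist in pkg_resources.working_set:
    if "airflow" in dist.project_name.lower():
        print(dist.project_name, dist.version)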
