missing_dependencies error using Pandas in Azure Web Job - python

I need to run some long running job via Azure Web Job in Python.
I am facing below error trying to import pandas.
File "D:\local\Temp\jobs\triggered\demo2\eveazbwc.iyd\pandas_init_.py", line 13
missing_dependencies.append(f"{dependency}: {e}")
The Web app (under which I will run the web job) also has python code using pandas, and it does not throw any error.
I have tried uploading pandas and numpy folder inside the zip file (creating venv, installing packages and zipping Lib/site-packages content), (for 32 bit and 64 bit python) as well as tried appending 'D:/home/site/wwwroot/my_app_name/env/Lib/site-packages' to sys.path.
I am not facing such issues in importing standard python modules or additional package modules like requests.
Error is also thrown in trying to import numpy.
So, I am assuming some kind of version mismatch is happening somewhere.
Any pointers to solve this will be really useful.
I have been using Python 3.x, not sure if I should try Python 2.x (virtual env, install package and zip content of Lib/site-packages).
Regards
Kunal

The key to solving the problem is to add Python 3.6.4 x64 Extension on the portal.
Steps:
Add Extensions on portal.
Create a requirements.txt file.
pandas==1.1.4
numpy==1.19.3
Create run.cmd.
Create below file and zip them into a single zip.
Upload the zip for the webjob.
For more details, please read this post.
Webjobs Running Error (3587fd: ERR ) from zipfile

Related

Download and use Python libraries locally

I'm working on a solution that uses scipy.stats.multivariate_normal, logpdf() function. The solution has to be uploaded to a server when it's evaluated, and that server has only Python 3.6 Anaconda, numpy and pickle installed. Any other library has to be attached as a file in the package that I upload.
The question is what files I need to download and how to use them locally in my script? Or do I need to manually implement logpdf with numpy only?

No module named 'resource' installing Apache Spark on Windows

I am trying to install apache spark to run locally on my windows machine. I have followed all instructions here https://medium.com/#loldja/installing-apache-spark-pyspark-the-missing-quick-start-guide-for-windows-ad81702ba62d.
After this installation I am able to successfully start pyspark, and execute a command such as
textFile = sc.textFile("README.md")
When I then execute a command that operates on textFile such as
textFile.first()
Spark gives me the error 'worker failed to connect back', and I can see an exception in the console coming from worker.py saying 'ModuleNotFoundError: No module named resource'. Looking at the source file I can see that this python file does indeed try to import the resource module, however this module is not available on windows systems. I understand that you can install spark on windows so how do I get around this?
I struggled the whole morning with the same problem. Your best bet is to downgrade to Spark 2.3.2
The fix can be found at https://github.com/apache/spark/pull/23055.
The resource module is only for Unix/Linux systems and is not applicaple in a windows environment. This fix is not yet included in the latest release but you can modify the worker.py in your installation as shown in the pull request. The changes to that file can be found at https://github.com/apache/spark/pull/23055/files.
You will have to re-zip the pyspark directory and move it the lib folder in your pyspark installation directory (where you extracted the pre-compiled pyspark according to the tutorial you mentioned)
Adding to all those valuable answers,
For windows users,Make sure you have copied the correct version of the winutils.exe file(for your specific version of Hadoop) to the spark/bin folder
Say,
if you have Hadoop 2.7.1, then you should copy the winutils.exe file from the Hadoop 2.7.1/bin folder
Link for that is here
https://github.com/steveloughran/winutils
I edited worker.py file. Removed all resource-related lines. Actually # set up memory limits block and import resource. The error disappeared.

Import Python module into AWS Lambda

I have followed all the steps in the documentation:
https://docs.aws.amazon.com/lambda/latest/dg/lambda-python-how-to-create-deployment-package.html
create a directory.
Save all of your Python source files (the .py files) at the root level of this directory.
Install any libraries using pip at the root level of the directory.
Zip the content of the project-dir directory)
But after I uploaded the zip-file to lambda function, I got the error message when I test the script
my code:
import psycopg2
#my code...
the error:
Unable to import module 'myfilemane': No module named 'psycopg2._psycopg'
I don't know where is the suffix '_psycopg' from...
Any help regarding this?
You are using native libraries with lambda. We had this similar problem and here is how we solved it.
Spin a machine with AWS supported AMI that runs your real lambda.
https://docs.aws.amazon.com/lambda/latest/dg/current-supported-versions.html
As this writing, it is,
AMI name: amzn-ami-hvm-2017.03.1.20170812-x86_64-gp2
Full documentation in installing native modules your python lambda.
https://docs.aws.amazon.com/lambda/latest/dg/lambda-python-how-to-create-deployment-package.html
Install the required modules required for your lambda,
pip install module-name -t /path/to/project-dir
and prepare your package to upload along with the native modules under lambda ami environment.
Hope this helps.
I believe this is caused because psycopg2 needs to be build an compiled with statically linked libraries for Linux. Please reference Using psycopg2 with Lambda to Update Redshift (Python) for more details on this issue. Another [reference][1] of problems of compiling psycopg2 on OSX.
There are a few solutions, but basically it comes down to installing the library on a Linux machine and using that as the Psycopg2 Library in your upload package.

Azure ML Python with Script Bundle cannot import module

In Azure ML, I'm trying to execute a Python module that needs to import the module pyxdameraulevenshtein (https://pypi.python.org/pypi/pyxDamerauLevenshtein).
I followed the usual way, which is to create a zip file and then import it; however for this specific module, it seems to never be able to find it. The error message is as usual:
ImportError: No module named 'pyxdameraulevenshtein'
Has anyone included this pyxdameraulevenshtein module in Azure ML with success ?
(I took the package from https://pypi.python.org/pypi/pyxDamerauLevenshtein.)
Thanks for any help you can provide,
PH
I viewed the pyxdameraulevenshtein module page, there are two packages you can download which include a wheel file for MacOS and a source code tar file. I don't think you can directly use the both on Azure ML, because the MacOS one is just a share library .so file for darwin which is not compatible with Azure ML, and the other you need to first compile it.
So my suggestion is as below for using pyxdameraulevenshtein.
First, compile the source code of pyxdameraulevenshtein to a DLL file on Windows, please refer to the document for Python 2/3 or search for doing this.
Write a Python script using the DLL you compiled to implement your needs, please refer to the SO thread How can I use a DLL file from Python? for how to use DLL from Python and refer to the Azure offical tutorial to write your Python script
Package your Python script and DLL file as a zip file, then to upload the zip file to use it in Execute Python script model of Azure ML.
Hope it helps.
Adding the path to pyxdameraulevenshtein to your system path should alleviate this issue. The script checks the system path that the python script is running on and doesn't know where else to look for anything other than the default packages. If your python script is in the same directory as the pyxdameraulevenshtein package in your ZIP file, this should do the trick. Because you are running this within Azure ML and can't be sure of the exact location of your script each time you run it, this solution should account for that.
import os
import sys
sys.path.append(os.path.join(os.getcwd(), 'pyxdameraulevenshtein'))
import pyxdameraulevenshtein

Python 3.5.1 : How to check whether openpyxl package exist and if it does not exist how to download and install it "within" the script?

I started writing python codes two weeks ago and until now I have manipulated some excel data after converting it to txt file. Now, I want to manipulate excel data directly, so I need to install openpyxl package. However, my script will be used in many different places and computers (Note that: all of them use either OS X or a Linux distrubution) which might (probably do not) contain openpyxl package installed. I want my script to check whether this package exist, and if it does not exit, then download and install it.
For that purpose, as I searched and found that I could check the existence of the package by first importing the pip module and then pip.get_installed_distributions() method. However, I am not sure whether I am in a wrong way or not. Besides, I have no idea how to download and install openpyxl package "without leaving the script".
Would you like to help me ?
You are getting ahead of yourself by trying to solve a deployment problem when you don't have the software to deploy.
Having a script trying to manage its own deployment is generally a bad idea because it takes responsibility away from package managers.
Here is my answer to your question:
Build your software under the assumption that all packages are installed/deployed correctly. I recommend creating a virtualenv (virtual environment) on your development computer, using pip to install openpyxl and any other packages you need.
Once your software is built and functional, read up on how to deploy your software package to PyPi: https://pypi.python.org/pypi. Create your package by defining openpyxl as a dependency, ensure your package can be installed/run properly (I recommend adding in tests), then deploy to PyPi so anyone can use your software.
If you want to add a nice layer of validation in your script just in case, add the following to your main method:
#!/usr/bin/env python3
import sys
try:
import openpyxl
except ImportError as iE:
print("Python Excel module 'openpyxl' not found")
sys.exit(1)
... rest of script ...

Categories

Resources