How to use external library in python UDF on hive? - python

I want to transform a Hive table (hdfs spot instances) using a Python UDF that needs the external library "user-agents". My UDF works fine without the external library, but I cannot get it working once I try to use it.
I tried installing the library using the code itself given below.
import sys
import subprocess
import pip
import os
sys.stdout = open(os.devnull, 'w+')
pip.main(['install', '--user', 'pyyaml'])
pip.main(['install', '--user', 'ua-parser'])
pip.main(['install', '--user', 'user-agents'])
sys.stdout = sys.__stdout__
and after this I tried this
import user_agents
but the udf is crashing with an exception "No module found". I also tried checking the following paths through code :
/usr/local/lib/python2.7/site-packages
/usr/local/lib64/python2.7/site-packages
But no user_agents module was there. Any help on how to do it to get things working ? Would really appreciate it. Thanks !

I figured out a way around this. Anyone still stuck on the same UDF issue can try this solution and check whether it works for them too.
For external libraries, do the following steps:
Step 1: Force pip to install the external library through code itself to the current working directory of your UDF.
import sys
import os
import pip
sys.stdout = open(os.devnull, 'w+')
pip.main(['install', 'user-agents', '-t', os.getcwd(), '--ignore-installed'])
sys.stdout = sys.__stdout__
Step 2: Update your sys.path
sys.path.append(os.getcwd())
Step 3: Now import the library :)
from user_agents import parse
That's it. Please check and confirm if this works for you too.
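The three steps above can also be sketched with a subprocess call instead of pip.main, which newer pip releases no longer expose. This is only a sketch under assumptions: the helper names pip_target_command and install_into are made up for illustration, and it assumes pip is available to the interpreter that runs the UDF. The pip output is sent to DEVNULL for the same reason the answer silences sys.stdout: a streaming UDF writes its rows to stdout.

```python
import os
import subprocess
import sys


def pip_target_command(package, target_dir):
    """Build the pip command that installs `package` into `target_dir`."""
    return [
        sys.executable, "-m", "pip", "install",
        "--target", target_dir,   # long form of the -t flag used above
        "--ignore-installed",
        package,
    ]


def install_into(package, target_dir):
    """Install `package` into `target_dir` and make that directory importable."""
    # Keep pip's progress messages out of stdout, which carries the UDF rows.
    subprocess.check_call(
        pip_target_command(package, target_dir),
        stdout=subprocess.DEVNULL,
    )
    if target_dir not in sys.path:
        sys.path.append(target_dir)


# Hypothetical usage inside the UDF (needs network access, so left commented):
# install_into("user-agents", os.getcwd())
# from user_agents import parse
cmd = pip_target_command("user-agents", os.getcwd())
```

Building the command in its own function makes it easy to log or inspect what would be run before the UDF is shipped to the cluster.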

Related

Trouble loading azure-cosmos library for Python 3.8 Azure Function

I'm having a challenging time getting the Python azure-cosmos library to correctly load for the purposes of locally testing a function in VS Code.
The specific error I'm getting (with the file path shortened) is: Exception: ImportError: cannot import name 'exceptions' from 'azure.cosmos' ([shortened]/.venv/lib/python3.8/site-packages/azure/cosmos/__init__.py)
Things I've checked/tried so far:
Check that requirements.txt specifies azure-cosmos
Manually go into python for each of the interpreters available within VS code and ensure I can manually import azure.cosmos
As instructed here, attempt to reinstall the azure-cosmos library using pip3 and ensuring the --pre flag is used.
[Updated] Verified I can successfully import azure.cosmos.cosmos_client as cosmos_client without any errors
Any ideas? Thanks! Below is the relevant section of my code.
import datetime
import logging
import tempfile
import requests
import os
import zipfile
import pandas as pd
import azure.functions as func
from azure.cosmos import exceptions, CosmosClient, PartitionKey
def main(mytimer: func.TimerRequest, calendars: func.Out[func.Document]) -> None:
    logging.info("Timer function has initiated.")
The official sample covers this problem and how to solve it:
https://github.com/Azure-Samples/azure-cosmos-db-python-getting-started
So the solution is to install the pre-release version. (George Chen's solution is right.)
The root cause is that the pre-release version is not installed, but note that you must uninstall the existing package first; otherwise the pre-release version will not be installed. Running the install with --pre alone does not solve the problem: delete all of the related packages first, and then install the pre-release package.
Whether you need azure.cosmos at all depends on whether the function binding meets your needs; if the binding can do what you want, you don't need azure.cosmos.
As for the import error, I could reproduce the exception, and the GitHub solution I checked says a --pre flag has to be added.
So my solution is to open tasks.json under .vscode and add the --pre flag to the pip install command.
For more details about the Cosmos binding, see this doc: Azure Cosmos DB trigger and bindings

Install a package and import in the same python script

(Edit: I made a typo when writing this question: I put quotation marks around "pyparsing" in the script. Thanks @dswdsyd)
When running a Python script, only the Python standard library is available on the target machine. When a package is needed, I have to install it first. For example, when I tried to install pyparsing and import it:
subprocess.call([sys.executable, "-m", "pip", "install", "pyparsing"])
import pyparsing
I got error:
ModuleNotFoundError: No module named 'pyparsing'
So how to install and import a package in the same python script?
[Update:] In the second run of the script, the package can be imported. Strange.
Essentially you are passing pyparsing as a variable instead of a string, to fix this change pyparsing to "pyparsing"
try this:
import subprocess
import sys
subprocess.call([sys.executable, "-m", "pip", "install", "pyparsing"])
import pyparsing
Problem solved by importlib.invalidate_caches(). According to importlib's documentation:
importlib.invalidate_caches()
Invalidate the internal caches of finders stored at sys.meta_path. If a finder implements invalidate_caches() then it will be called to perform the invalidation. This function should be called if any modules are created/installed while your program is running to guarantee all finders will notice the new module’s existence.
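A small sketch of the behaviour described in that documentation excerpt, without needing network access: instead of running pip, it writes a module file to a temporary directory while the program is running, then invalidates the finder caches before importing it. The names import_fresh and late_module_demo are made up for this illustration.

```python
import importlib
import os
import sys
import tempfile


def import_fresh(module_name, search_dir):
    """Import a module that appeared in `search_dir` after the program started."""
    if search_dir not in sys.path:
        sys.path.insert(0, search_dir)
    # Tell the path finders to drop their cached directory listings so the
    # newly created file is noticed.
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


# Stand-in for "pip install": create a module on disk at runtime.
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "late_module_demo.py"), "w") as fh:
    fh.write("ANSWER = 42\n")

late = import_fresh("late_module_demo", workdir)
print(late.ANSWER)
```

The same call to importlib.invalidate_caches() would sit between the subprocess.call that runs pip and the import pyparsing line in the question's script.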

Python, using pypiwin32 to stop chromedriver.exe prompt from opening

I'm trying to implement the answer here to suppress the chromedriver.exe prompt:
https://stackoverflow.com/a/39937466/264975
which tells me to use the following:
from win32process import CREATE_NO_WINDOW
However, I cannot get win32process module to load. I am told that it requires pypiwin32, however, there is no information on how to use these modules? For instance, what am I actually supposed to import and from where?
I successfully installed pypiwin32 using pip however, I have no idea how to verify it is working due to the lack of help files.
Would be grateful for some pointers as to how to get the example working.
Does it matter that I'm on a 64-bit PC? I think the Python I am using is 32-bit, though.
Was trying to do the same thing.
Am on a Windows 10 64-bit machine using Python 2.7.
Kept on saying win32process not found.
I installed a bunch of different modules and a few command line install commands, but what got it working was after I installed this exe package pywin32-221.win-amd64-py2.7.exe from https://sourceforge.net/projects/pywin32/files/pywin32/
Then as https://stackoverflow.com/a/39937466/264975 instructs go to your Python folder, then
Lib\site-packages\selenium\webdriver\common\
and edit service.py (in the thread it mentions services.py but this is what was in my folder)
And include from win32process import CREATE_NO_WINDOW at the top of this script. Mine looks like this
import errno
import os
import platform
import subprocess
from subprocess import PIPE
from win32process import CREATE_NO_WINDOW
import time
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.common import utils
then further down in this script look for def start(self):, and just add this to the end of self.process =
creationflags=CREATE_NO_WINDOW
So mine looks like this
self.process = subprocess.Popen(cmd, env=self.env,
                                close_fds=platform.system() != 'Windows',
                                stdout=self.log_file,
                                stderr=self.log_file,
                                stdin=PIPE,
                                creationflags=CREATE_NO_WINDOW)
That's all. Now the chromedriver.exe console does not pop-up at all in my python scripts.
That's what worked for me. Hard to say if it was a number of things together that made it work or just by installing the pywin32-amd64.exe package.
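For reference, here is a standalone sketch of the creationflags idea outside of Selenium. It assumes Python 3.7 or later, where the subprocess module itself exposes CREATE_NO_WINDOW on Windows, so pywin32 is not needed for the constant; the helper name no_window_kwargs is made up for this example. On other platforms the helper returns no extra arguments, because a non-zero creationflags is only supported on Windows.

```python
import subprocess
import sys


def no_window_kwargs():
    """Return Popen keyword arguments that suppress the console window on Windows."""
    if sys.platform == "win32":
        # 0x08000000 is the Win32 CREATE_NO_WINDOW process creation flag,
        # used as a fallback if this Python lacks the named constant.
        flag = getattr(subprocess, "CREATE_NO_WINDOW", 0x08000000)
        return {"creationflags": flag}
    return {}


# Launch a child process (a trivial Python command stands in for chromedriver).
proc = subprocess.Popen(
    [sys.executable, "-c", "print('hello')"],
    stdout=subprocess.PIPE,
    **no_window_kwargs()
)
out, _ = proc.communicate()
print(out.strip())
```

Keeping the flag in a helper avoids editing files inside site-packages, though whether Selenium lets you pass such arguments through depends on the Selenium version in use.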

Having an issue creating an exe with py2exe and script importing xlrd

My goal is to create a python script that loops over cells of an excel document. This is my python script called reader.py, and it works just fine.
import xlrd
import os
exceldoc = raw_input("Enter the path to the doc [C:\\folder\\file.xlsx]: ")
wb = xlrd.open_workbook(exceldoc,'rb')
.... some code....
The problem I'm encountering is attempting to use py2exe to create an executable so this script can be used elsewhere.
Here is my setup.py file:
from distutils.core import setup
import py2exe
import sys
from glob import glob
setup(name='Excel Document Checker',console=['reader.py'])
I run the following command: python setup.py py2exe
It appears to run fine; it creates the dist folder that has my reader.exe file, but near the end of the command I get the following:
The following modules appear to be missing
['cElementTree', 'elementtree.ElementTree']
I did some searching online and tried the recommendations here (Re: Error: Element Tree not found), thus changing my setup.py file:
from distutils.core import setup
import py2exe
import sys
from glob import glob
options={
    "py2exe":{"unbuffered": True,"optimize": 2,
              'includes':['xml.etree.ElementPath', 'xml.etree.ElementTree', 'xml.etree.cElementTree'],
              "packages": ["elementtree", "xml"]}}
setup(name='Excel Document Checker',options = options,console=['reader.py'])
I'm now getting an error:
ImportError: No module named elementtree
I'm sort of at an impasse here. Any help or guidance is greatly appreciated.
Just some information - I'm running Python 2.6 on a 32 bit system.
You explicitly told setup.py to depend on a package named elementtree here:
"packages": ["elementtree", "xml"]}}
There is no such package in the stdlib. There's xml.etree, but obviously that's not the same name.
The example you found is apparently designed for someone who has installed the third-party package elementtree, which is necessary if you need features added after Python 2.6's version of xml.etree, or if you need to work with Python 1.5-2.4, but not if you just want to use Python 2.6's version. (And anyway, if you do need the third-party package… then you have to install it or it won't work, obviously.)
So, just don't do that, and that error will go away.
Also, if your code—or the code you import (e.g., xlrd) is using xml.etree.cElementTree, then, as the py2exe FAQ says, you must also import xml.etree.ElementTree before using it to get it working. (And you also may need to specify it manually as a dependency.)
You presumably don't want to change all the third-party modules you're using… but I believe that making sure to import xml.etree.ElementTree before importing any of those third-party modules works fine.
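One way to catch a name like elementtree before handing it to a packaging tool is to check which names the current interpreter can actually locate. The sketch below assumes a modern Python 3 (the question itself is on Python 2.6, where importlib.util is not available), and split_by_availability is a made-up helper name.

```python
import importlib.util


def split_by_availability(names):
    """Return (found, missing) lists for the given module or package names."""
    found, missing = [], []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except ModuleNotFoundError:
            # Raised for dotted names whose parent package is absent.
            spec = None
        (found if spec is not None else missing).append(name)
    return found, missing


# "no_such_pkg_for_demo" is a deliberately nonexistent placeholder name.
found, missing = split_by_availability(
    ["xml", "xml.etree.ElementTree", "no_such_pkg_for_demo"]
)
print("found:", found)
print("missing:", missing)
```

Any name that ends up in the missing list either needs to be installed first or removed from the "packages" option.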

How to install and import Python modules at runtime

I want to write a script to automatically setup a brand new ubuntu installation and install a django-based app. Since the script will be run on a new server, the Python script needs to automatically install some required modules.
Here is the script.
#!/usr/bin/env python
import subprocess
import os
import sys
def pip_install(mod):
    print subprocess.check_output("pip install %s" % mod, shell=True)
if __name__ == "__main__":
    if os.getuid() != 0:
        print "Sorry, you need to run the script as root."
        sys.exit()
    try:
        import pexpect
    except:
        pip_install('pexpect')
        import pexpect
    # More code here...
The installation of pexpect succeeds, but the next line, import pexpect, fails. I think it's because at runtime the code isn't aware of the newly installed pexpect.
How can I install and import Python modules at runtime? I'm open to other approaches.
You can import pip instead of using subprocess:
import pip
def install(package):
    pip.main(['install', package])
# Example
if __name__ == '__main__':
    try:
        import pexpect
    except ImportError:
        install('pexpect')
        import pexpect
Another take:
import pip
def import_with_auto_install(package):
    try:
        return __import__(package)
    except ImportError:
        pip.main(['install', package])
    return __import__(package)
# Example
if __name__ == '__main__':
    pexpect = import_with_auto_install('pexpect')
    print(pexpect)
[edit]
You should consider using a requirements.txt along with pip. Seems like you are trying to automate deployments (and this is good!), in my tool belt I have also virtualenvwrapper, vagrant and ansible.
This is the output for me:
(test)root@vagrant:~/test# pip uninstall pexpect
Uninstalling pexpect:
/usr/lib/python-environments/test/lib/python2.6/site-packages/ANSI.py
/usr/lib/python-environments/test/lib/python2.6/site-packages/ANSI.pyc
/usr/lib/python-environments/test/lib/python2.6/site-packages/FSM.py
/usr/lib/python-environments/test/lib/python2.6/site-packages/FSM.pyc
/usr/lib/python-environments/test/lib/python2.6/site-packages/fdpexpect.py
/usr/lib/python-environments/test/lib/python2.6/site-packages/fdpexpect.pyc
/usr/lib/python-environments/test/lib/python2.6/site-packages/pexpect-2.4-py2.6.egg-info
/usr/lib/python-environments/test/lib/python2.6/site-packages/pexpect.py
/usr/lib/python-environments/test/lib/python2.6/site-packages/pexpect.pyc
/usr/lib/python-environments/test/lib/python2.6/site-packages/pxssh.py
/usr/lib/python-environments/test/lib/python2.6/site-packages/pxssh.pyc
/usr/lib/python-environments/test/lib/python2.6/site-packages/screen.py
/usr/lib/python-environments/test/lib/python2.6/site-packages/screen.pyc
Proceed (y/n)? y
Successfully uninstalled pexpect
(test)root@vagrant:~/test# python test.py
Downloading/unpacking pexpect
Downloading pexpect-2.4.tar.gz (113Kb): 113Kb downloaded
Running setup.py egg_info for package pexpect
Installing collected packages: pexpect
Running setup.py install for pexpect
Successfully installed pexpect
Cleaning up...
<module 'pexpect' from '/usr/lib/python-environments/test/lib/python2.6/site-packages/pexpect.pyc'>
(test)root@vagrant:~/test#
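For those using pip 10 or later: pip no longer exposes a top-level main function, so the alternative is to use import pip._internal as pip instead of import pip, like this. Note that pip._internal is not a supported public API, so calling pip through subprocess with sys.executable -m pip is the more robust route.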
Updated answer of Paulo
import pip._internal as pip
def install(package):
    pip.main(['install', package])
if __name__ == '__main__':
    try:
        import pexpect
    except ImportError:
        install('pexpect')
        import pexpect
I actually made a module for this exact purpose (impstall)
It's really easy to use:
import impstall
impstall.now('pexpect')
impstall.now('wx', pipName='wxPython')
Github link for issues/contributions
I solved my problem using the imp module.
#!/usr/bin/env python
import sys
import pip
import imp
def install_and_load(package):
    pip.main(['install', package])
    # Hardcoded install location for the known target system
    path = '/usr/local/lib/python2.7/dist-packages'
    if path not in sys.path:
        sys.path.append(path)
    f, fname, desc = imp.find_module(package)
    # Load the module from the location that find_module returned
    return imp.load_module(package, f, fname, desc)
if __name__ == "__main__":
    try:
        import pexpect
    except ImportError:
        pexpect = install_and_load('pexpect')
    # More code...
Actually the code is less than ideal, since I need to hardcode the Python module directory. But since the script is intended for a known target system, I think that is ok.
I had the same issue, but none of my Google searches helped. After hours of debugging, I found that the likely cause is that sys.path is not refreshed with the new installation directory.
In my case, in an Ubuntu Docker container, I wanted to import dns.resolver at runtime for Python 3.8 (pre-installed). I had also created an ubuntu user and ran everything as this user (including my Python script).
Before installing, sys.path does not contain /home/ubuntu/.local/lib/python3.8/site-packages, since I had not installed anything yet.
Installing with subprocess or pip.main as above creates /home/ubuntu/.local/lib/python3.8/site-packages (a user installation).
After installing, sys.path should be refreshed to include this new location.
Since sys.path is managed by the site module, we should reload it (ref HERE):
import site
from importlib import reload
reload(site)
The full block for anyone that needs:
import subprocess
import sys
try:
    import dns.resolver
except ImportError:
    subprocess.check_call([sys.executable, "-m", "pip", "install", "dnspython"])
    import site
    from importlib import reload
    reload(site)
    import dns.resolver
I'm not a Python developer, so this code can probably be simplified further. It may help in cases such as a fresh CI/CD environment for DevOps engineers like me.
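Pulling the ideas from this thread together, a reusable helper might look like the sketch below. It is an illustration under assumptions, not a tested deployment tool: ensure_module is a made-up name, it assumes Python 3 with pip available to the running interpreter, and it combines the site reload from the answer above with importlib.invalidate_caches().

```python
import importlib
import site
import subprocess
import sys


def ensure_module(import_name, pip_name=None):
    """Import `import_name`, installing `pip_name` with pip first if it is missing."""
    try:
        return importlib.import_module(import_name)
    except ImportError:
        pass
    # Run pip with the same interpreter that is executing this script.
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", pip_name or import_name]
    )
    # A first user install can create a site-packages directory that was
    # not on sys.path at startup; reloading site picks it up.
    importlib.reload(site)
    # Make the import finders drop their cached directory listings.
    importlib.invalidate_caches()
    return importlib.import_module(import_name)


# "json" ships with Python, so this call returns without running pip.
json_module = ensure_module("json")
print(json_module.__name__)
```

For the case described above, the call would be ensure_module("dns.resolver", pip_name="dnspython"), which needs network access the first time it runs. The separate pip_name argument covers packages whose install name differs from their import name.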
