I have written a Python job that uses SQLAlchemy to query a SQL Server database. However, when using external libraries with AWS Glue you are required to wrap those libraries in an egg file. This causes an issue with the SQLAlchemy package, because it uses the pyodbc package, which to my understanding cannot be wrapped in an egg as it has other dependencies.
I have tried to find a way of connecting to a SQL Server database from a Python Glue job, but so far the closest advice I've found suggests writing a Spark job instead, which isn't appropriate here.
Does anyone have experience connecting to SQL Server from a Python 3 Glue job? If so, could I have an example snippet of code and the packages used?
Yes, I actually managed to do something similar by bundling the dependencies, including transitive ones.
Follow the steps below:
1 - Create a script which zips all of the code and dependencies into a zip file and uploads it to S3:
python3 -m pip install -r requirements.txt --target custom_directory
python3 -m zipapp custom_directory/
mv custom_directory.pyz custom_directory.zip
Upload this zip instead of an egg or wheel file.
2 - Create a driver program which executes the Python source we just zipped in step 1; a placeholder for the imported module is sketched after step 3:
import sys

if len(sys.argv) == 1:
    raise SyntaxError("Please provide a module to load.")

# Make the bundled zip importable before importing the real entry point.
sys.path.append(sys.argv[1])
from your_module import your_function

sys.exit(your_function())
3 - You can then submit your job using:
spark-submit --py-files custom_directory.zip your_program.py
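For illustration, your_module and your_function are just placeholder names for whatever lives inside custom_directory.zip; a minimal module honouring the driver's contract (returning an exit code) might look like this:

# your_module.py -- placeholder module bundled inside custom_directory.zip
def your_function():
    # Real job logic (e.g. the SQLAlchemy query) would go here; return an exit code.
    print("job ran successfully")
    return 0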
See:
How can you bundle all your python code into a single zip file?
I can't seem to get --py-files on Spark to work
Situation
I have an existing Python app in Google Colab that calls the Twitter API and sends the response to Cloud Storage.
I'm trying to automate the Twitter API call in GCP, and am wondering how to install the requests library for the API call, and os for authentication.
I tried doing the following library installs in a Cloud Function:
import requests
import os
Result
That produced the following error message:
Deployment failure: Function failed on loading user code.
Do I need to install those libraries in a Cloud Function? I'm trying to understand this within the context of my Colab Python app, but I'm not clear on whether the library installs are necessary.
Thank you for any input.
When you create your Cloud Function source code, there are two files:
main.py
requirements.txt
Add packages to requirements.txt as below:
#Function dependencies, for example:
requests==2.20.0
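With requests listed in requirements.txt, a minimal HTTP-triggered main.py might look like the sketch below (the function name, endpoint, and environment variable are illustrative assumptions, not part of your app):

import os
import requests

def call_twitter_api(request):
    # HTTP Cloud Function entry point; the name here is just an example.
    # Assumption: a bearer token is supplied via an environment variable.
    token = os.environ.get("TWITTER_BEARER_TOKEN", "")
    resp = requests.get(
        "https://api.twitter.com/2/tweets/search/recent",
        params={"query": "python"},
        headers={"Authorization": "Bearer " + token},
    )
    return resp.text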
Creating a new Python environment for your project might help, and it would be a good start for any project.
It is easy to create:
## for unix-based systems
## create a python environment
python3 -m venv venv
## activate your environment
## in linux-based systems
. ./venv/bin/activate
If you are using Google Colab, add "!" before these commands and they should work fine.
I am trying to implement Azure DevOps on a few of my PySpark projects.
Some of the projects are developed in PyCharm and some in IntelliJ with the Python API.
Below is the code structure committed to the git repository.
setup.py is the build file used to create the .egg file.
I have tried a few of the steps shown below to create a build pipeline in DevOps.
But the Python installation/execution part is failing with the error below:
##[error]The process 'C:\hostedtoolcache\windows\Python\3.7.9\x64\python.exe' failed with exit code 1
I would prefer the UI approach for building and creating .egg files; if that's not possible, YAML files.
Any leads appreciated!
I used the Use Python step and then everything worked like a charm.
Can you show details of the Install Python step?
I have used the following steps and it succeeds!
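For reference, a minimal setup.py that lets setuptools produce an .egg file might look like this (the package name and version are placeholders; your real project layout will differ):

# setup.py -- minimal sketch with placeholder metadata
from setuptools import setup, find_packages

setup(
    name="my_pyspark_project",
    version="0.1.0",
    packages=find_packages(),
)

A pipeline step running python setup.py bdist_egg would then drop the .egg file under dist/.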
I am trying to submit a job to an EMR cluster via Livy. My Python script (to submit the job) requires importing a few packages. I have installed all those packages on the master node of EMR. The main script resides on S3 and is called by the script that submits the job to Livy from EC2. Every time I try to run the job from the remote machine (EC2), it dies with import errors (no module named [mod name]).
I have been stuck on this for more than a week and have been unable to find a solution. Any help would be highly appreciated.
Thanks.
These packages that you are trying to import: are they custom packages? If so, how did you package them? Did you create a wheel file or zip file and specify them as --py-files in your spark-submit via Livy?
Possible problem: you installed the packages only on the master node. You will need to log into your worker nodes and install the packages there too. Alternatively, when you provision the EMR cluster, install the packages using bootstrap actions.
You should be able to add libraries via the --py-files option, but it's safer to just download the wheel files and use them rather than zipping anything yourself.
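As an illustration of supplying dependencies through Livy's batches API, a submission script might look roughly like the following (the Livy host and S3 paths are placeholders, not taken from the question):

import json
import requests

# Placeholder endpoint and paths -- replace with your own Livy host and S3 locations.
livy_url = "http://your-livy-host:8998/batches"
payload = {
    "file": "s3://your-bucket/main_script.py",        # main PySpark script on S3
    "pyFiles": ["s3://your-bucket/dependencies.zip"],  # bundled Python dependencies
}

resp = requests.post(livy_url, data=json.dumps(payload),
                     headers={"Content-Type": "application/json"})
print(resp.json())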
I'm using Apex to deploy Lambda functions in AWS. I need to write a Lambda function which runs a cleanup script on an Oracle RDS instance in my AWS VPC. Oracle has a very nice Python library called cx_Oracle, but I'm having some problems using it in a Lambda function (running on Python 2.7). My first step was to try to run the Oracle-described test code as follows:
from __future__ import print_function
import json
import boto3
import boto3.ec2
import os
import cx_Oracle
def handle(event, context):
    # Connect to the Oracle RDS instance and print the server version.
    con = cx_Oracle.connect('username/password@my.oracle.rds:1521/orcl')
    print(str(con.version))
    con.close()
When I try to run this piece of test code, I get the following response:
Unable to import module 'main': /var/task/cx_Oracle.so: invalid ELF header
Google has told me that this error occurs because the cx_Oracle library is not a complete Oracle implementation for Python; rather, it requires the SQLPlus client to be pre-installed, and the cx_Oracle library references components installed as part of SQLPlus.
Obviously pre-installing SQLPlus might be difficult.
Apex has the hooks {} functionality, which would allow me to pre-build things, but I'm having trouble finding documentation showing what happens to those artefacts and how that works. In theory I could download the libraries into a Nexus or an S3 bucket, and then in my hooks {} declaration I could add them to the zip file. I could then try to install them as part of the Python script. However, I have a few problems with this:
1 - How are the 'built' artefacts accessed inside the Lambda function? Can they be? Have I misunderstood this?
2 - Does a Python 2.7 Lambda function have enough access rights to the operating system of the host container to be able to install a library?
3 - If the answer to question 2 is no, is there another way to write a Lambda function to run some SQL against an Oracle RDS instance?
I'm trying to deploy a Flask web app with MySQL connectivity. It's my first time using Azure, and coming from Linux it all seems pretty confusing.
My understanding is that one lists the required packages in requirements.txt. When I build the default Flask app from Azure the file looks like this:
Flask<1
At this stage the site loads fine.
If I then include an additional line
https://cdn.mysql.com/Downloads/Connector-Python/mysql-connector-python-2.1.14.tar.gz
As per this answer https://stackoverflow.com/a/34489738/2697874
Then in my views.py file (which seems to be broadly synonymous with my old app.py file) I include import mysql.connector.
I then restart and reload my site, which then returns the error The page cannot be displayed because an internal server error has occurred.
Error logging spits out a load of HTML (which seems a pretty weird way to deliver error logs, so I must be missing something here). When I save it to HTML and load it up I get this...
How can I include the mysql.connector library within my Flask web app?
Per my experience, the resource https://cdn.mysql.com/Downloads/Connector-Python/mysql-connector-python-2.1.14.tar.gz is for Linux, not for Azure Web Apps based on Windows, and the link no longer seems to be available.
I used the command pip search mysql-connector to list the related packages. Then I tried to use mysql-connector instead of mysql-connector-python via pip install, and tried to import mysql.connector in a local Python interpreter, which worked fine.
So please use mysql-connector==2.1.4 instead of mysql-connector-python in the requirements.txt file of your project using your IDE, then re-deploy the project on Azure and try again. The package will be installed automatically, as the official doc says below:
Package Management
Packages listed in requirements.txt will be installed automatically in the virtual environment using pip. This happens on every deployment, but pip will skip installation if a package is already installed.
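As a quick sanity check that the package exposes mysql.connector, a minimal connection test might look like this (host, user, password, and database are placeholders):

import mysql.connector  # provided by the mysql-connector package in requirements.txt

# Placeholder connection details -- replace with your own MySQL host and credentials.
conn = mysql.connector.connect(
    host="your-mysql-host",
    user="your-user",
    password="your-password",
    database="your-database",
)
cur = conn.cursor()
cur.execute("SELECT VERSION()")
print(cur.fetchone())
cur.close()
conn.close()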
If there is any update, please feel free to let me know.