My PySpark EC2 instance got terminated (my fault) and I am trying to build another one. I have Spark running and am now trying to run a simple PySpark script that accesses an S3 bucket. The machine has Python 3 installed and I installed boto3, but I get an import error for the line below:
from boto3.session import Session
No module named 'boto3'.
Also, I get a logger error saying invalid syntax when I run the following:
print rddtest.take(10)
Not sure what I am missing. Thanks in advance.
You might have multiple Python installations on your machine. pip might be installing boto3 for Python 3.6 while you are actually executing Python 3.7.
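A quick way to confirm this (a minimal diagnostic sketch, assuming the script is launched with plain python3) is to print which interpreter is actually running and whether it can see boto3, then install with that exact interpreter, e.g. python3 -m pip install boto3. The invalid syntax error on the print line looks like a separate issue: print rddtest.take(10) is Python 2 syntax, and Python 3 requires print(rddtest.take(10)).

# Diagnostic sketch: run this with the same interpreter that executes the PySpark script.
import sys

print(sys.executable)  # which Python binary is actually running
print(sys.path)        # where this interpreter looks for packages

try:
    import boto3
    print("boto3 found at:", boto3.__file__)
except ModuleNotFoundError:
    # Install for THIS interpreter: python3 -m pip install boto3
    print("boto3 is not installed for", sys.executable)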
Despite setting the parameter for my Python AWS Glue Job like this:
--additional-python-modules pyathena
I still get the following error when I try and run the job:
ModuleNotFoundError: No module named 'pyathena'
I have also tried the following parameters:
--additional-python-modules pyathena
--pip-install pyathena
--pip-install pyathena==2.23.0
In case someone finds themselves with a similar issue, this is what worked for me.
I solved this issue by removing the '--additional-python-modules pyathena' argument and upgrading the AWS Glue Job Python version to 3.9 (where pyathena is automatically installed as part of the default analytics package).
As I understand it, the --additional-python-modules argument does not work with earlier versions of AWS Glue Python.
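For anyone scripting the job setup, a minimal sketch with boto3 (the job name, role ARN, and script location below are placeholders; the point is Command.PythonVersion set to "3.9" with no --additional-python-modules argument) might look like this:

# Sketch only: creates a Python 3.9 shell job so the preinstalled analytics
# packages (which include pyathena) are available without extra arguments.
import boto3

glue = boto3.client("glue")

glue.create_job(
    Name="my-pythonshell-job",                       # placeholder
    Role="arn:aws:iam::123456789012:role/GlueRole",  # placeholder
    Command={
        "Name": "pythonshell",
        "PythonVersion": "3.9",
        "ScriptLocation": "s3://my-bucket/scripts/job.py",  # placeholder
    },
)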
I want to use grequests on AWS Lambda. I created a venv and a reqs.txt, ran pip install into a folder named python, zipped that python folder, and uploaded it as a Lambda Layer. After that, I am facing this error:
Gevent is required for grequests.
I already tried this answer on Amazon Linux, but nothing changed.
Any solution or advice?
Edit: A suggested solution was to build the zip on the EC2 server rather than pulling the files from EC2 and zipping them locally, and I did so. The error message changed to:
Unable to import module 'lambda_function': No module named 'grequests'
Edit 2: I followed this guide, and I am facing a new error :)
Unable to import module 'lambda_function': No module named 'zope.interface'
and zope.interface is already installed.
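No fix is confirmed in this thread, but one thing worth checking is whether the layer zip is laid out the way Lambda expects: the packages must live under a top-level python/ directory, which the runtime mounts at /opt/python and puts on sys.path. A throwaway handler like this sketch shows what the runtime can actually see and why each import fails:

# Diagnostic sketch: deploy temporarily in place of the real handler.
import os
import sys

def lambda_handler(event, context):
    report = {
        "python_version": sys.version,
        "sys_path": sys.path,
        # Layers are mounted under /opt; Python packages should be in /opt/python.
        "opt_python": os.listdir("/opt/python") if os.path.isdir("/opt/python") else "missing",
    }
    for name in ("gevent", "grequests", "zope.interface"):
        try:
            __import__(name)
            report[name] = "importable"
        except Exception as exc:
            report[name] = repr(exc)
    return report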
I am attempting to import the google-cloud-firestore module in a Python script on AWS Lambda. I have the module installed in my virtual environment and upload the code/packages as a zip file, but I receive an import error.
I have successfully imported and executed a script with the requests module using the same approach.
I have attempted to uninstall and reinstall the following modules to my virtual environment and still have no luck: google-cloud-firestore, grpcio, google-cloud-core.
Does anyone have a solution to this problem? What could I be missing here?
This thread gives me the impression that the google-cloud-firestore module cannot be used in a zip file. If this is the case, can I install the module at runtime?
Thank you for any and all advice!
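A common culprit with google-cloud-firestore in a zip deployment (an assumption on my part, not something confirmed above) is its grpcio dependency: it ships compiled extensions, and wheels built on macOS or Windows will not import on Lambda's Linux runtime. A temporary handler like the sketch below surfaces the real import error along with the platform details:

# Diagnostic sketch: shows whether the bundled grpcio/firestore packages import at all.
import platform
import sys

def lambda_handler(event, context):
    report = {"python": sys.version, "platform": platform.platform()}
    try:
        import grpc  # grpcio's import name; wheels built off-Linux usually fail here
        report["grpc"] = grpc.__version__
        from google.cloud import firestore  # import path for google-cloud-firestore
        report["firestore"] = "importable"
    except Exception as exc:
        report["import_error"] = repr(exc)
    return report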
In the AWS documentation, they specify how to activate monitoring for Spark jobs (https://docs.aws.amazon.com/glue/latest/dg/monitor-profile-glue-job-cloudwatch-metrics.html), but not for Python shell jobs.
Using the code as is gives me this error: ModuleNotFoundError: No module named 'pyspark'
Worse, after commenting out from pyspark.context import SparkContext, I then get ModuleNotFoundError: No module named 'awsglue.context'. It seems Python shell jobs don't have access to the Glue context? Has anyone solved this?
Python shell jobs run in a purely Python-based environment and do not have access to PySpark (EMR in the backend). You will not be able to access the context attribute here; that is purely a Spark concept, and the Glue context is essentially a wrapper around PySpark.
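If the original goal was job monitoring, one workaround (my suggestion, not something covered by the linked AWS docs) is to publish custom CloudWatch metrics from the Python shell job with boto3; the namespace, metric name, and values below are placeholders:

# Sketch: emit a custom CloudWatch metric from a Glue Python shell job.
import boto3

cloudwatch = boto3.client("cloudwatch")

def record_rows_processed(job_name, row_count):
    cloudwatch.put_metric_data(
        Namespace="Glue/PythonShellJobs",  # placeholder namespace
        MetricData=[
            {
                "MetricName": "RowsProcessed",
                "Dimensions": [{"Name": "JobName", "Value": job_name}],
                "Value": float(row_count),
                "Unit": "Count",
            }
        ],
    )

record_rows_processed("my-pyshell-job", 12345)  # placeholder values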
I am getting into Glue Python shell jobs more and am resolving some dependencies in code files that are shared between my Spark jobs and Python shell jobs. I was able to resolve the pyspark dependency by including pyspark==2.4.7 in the requirements.txt used to build my .egg/.whl file (that particular version because another library required it).
You still cannot use the PySpark context, as mentioned above by Emerson, because this is a Python runtime, not the Spark runtime.
So when building the distribution with setuptools, you can have a requirements.txt that looks like the one below, and when the shell is set up it will install these dependencies (see the setup.py sketch after the list):
elasticsearch
aws_requests_auth
pg8000
pyspark==2.4.7
awsglue-local
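As a rough illustration (the package name is a placeholder, and this assumes the requirements.txt above sits next to setup.py), a minimal setup.py that folds those pins into the wheel/egg metadata could look like this:

# setup.py sketch: pull pinned dependencies from requirements.txt into the package metadata.
from setuptools import find_packages, setup

with open("requirements.txt") as f:
    requirements = [line.strip() for line in f if line.strip() and not line.startswith("#")]

setup(
    name="shared_glue_code",  # placeholder package name
    version="0.1.0",
    packages=find_packages(),
    install_requires=requirements,
)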
I created a Lambda function which uploads data to Snowflake. I installed all the requirements in a folder and zipped it along with my main Python file. While running in AWS, it shows an error:
No module named 'cryptography.hazmat.bindings._constant_time'.
But I have this module at the specified path, and I don't know why the error arises.
Here is the code:
def main(event, context):
    import snowflake.connector
    cnx = snowflake.connector.connect(
        user='xxx',
        password='yyyyy',
        account='zzzz',
        database="db Name",
        schema="schema Name",
    )
    try:
        query = "SELECT * FROM Table_Name"
        cnx.cursor().execute(query)
    finally:
        cnx.close()
I recently encountered the same issue. It turned out my Lambda function runtime was Python 3.8 but the 'cffi' library had been compiled for Python 3.6. I created a new Lambda function with the Python 3.6 runtime and uploaded my deployment package to it and it began working right away.
I faced the same issue recently and found it is a problem with the Windows environment. Try creating a Linux environment, install Python and the packages there, zip your code with all the libraries, and then upload it back to AWS Lambda; hopefully it will work.
I needed to set up a virtualenv for my Lambda package to work. I also found that pip install snowflake-connector-python did not install some cryptography libraries, although if I navigated to the directory I wanted them placed in, adding --target . did cause those libraries to be installed.
For Python 3.6, when I encountered the error "Unable to import module 'main': No module named '_cffi_backend'" in an AWS Lambda function, I ran mv _cffi_backend.cpython-36m-x86_64-linux-gnu.so _cffi_backend.so in my Linux Docker image with virtualenv and the issue was resolved. As mentioned above, some dependencies might be better placed with --target to get them where you need them.
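As a quick sanity check after repackaging (just a sketch; run it with the same Python version as the Lambda runtime), confirming that the renamed extension resolves, and from where, can save a redeploy cycle:

# Verify the compiled cffi extension is importable before re-uploading the zip.
import _cffi_backend

print(_cffi_backend.__file__)  # should point at the .so bundled in your package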