I'm trying to use boto3 client create_job() to create a Glue job, this is the script:
job = client.create_job(
    Name=xxx,
    Role=xxx,
    Command={
        'Name': 'glueetl',
        'ScriptLocation': 's3://my_bucket_name/my_project_name/src/glue.py',
        'PythonVersion': '3',
    },
    DefaultArguments={
        '--job-language': 'python',
        '--extra-py-files': 's3://my_bucket_name/my_project_name/src/test.zip',
        '--conf': 'spark.yarn.executor.memoryOverhead=7g --conf spark.jars.packages=xxx',
    },
    ExecutionProperty={
        'MaxConcurrentRuns': 1
    },
    GlueVersion='1.0'
)
test.zip contains an __init__.py file, a glue.py file (a duplicate of the one specified in ScriptLocation), and example.py.
Inside glue.py I have import example, but the job fails with the error "ErrorMessage":"ModuleNotFoundError: No module named \'example\'".
I tried from test import example, but that doesn't work either. I'm confused and stuck here: how does Glue read and import modules? Do I need to set something up? Might someone be able to help, please? Many thanks.
The _init_.py is incorrect. It should be __init__.py (double underscore) as explained in the AWS docs.
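Beyond the naming fix, it may help to know that --extra-py-files places test.zip itself on the job's Python path, so a module at the root of the zip should be importable by its bare name. A minimal sketch of glue.py under that assumption (not the asker's actual file):

# glue.py: the script at ScriptLocation (a sketch).
# With test.zip on sys.path via --extra-py-files, a top-level
# example.py inside the zip resolves directly:
import example

def main():
    # hypothetical entry point; call whatever example.py exposes
    print("loaded module:", example.__name__)

if __name__ == "__main__":
    main()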
My goal is to import code into three separate Flask servers. It's not going well. I am on Python 3.10.4. I have read perhaps 10 different posts that say things like "put an __init__.py file in your folders", which I have done.
For context I'm not exactly new to Python but I've never learned the importing/module system properly.
I have three Flask servers that run scraping operations on different (but similar) websites. I need them to be separate for various reasons. Anyway, all three need to run the same procedure of getting an IP for a proxy from my proxy provider. For this I have some code:
import requests

# details snipped to save space; download_list and token are defined elsewhere
def get_proxy_ip(choice):
    r = requests.get(download_list, headers={"Authorization": "Token " + token})
    selected_proxy_ip = r.json()["results"][choice]["proxy_address"]
    selected_proxy_port = r.json()["results"][choice]["port"]
    print(selected_proxy_ip)
    return selected_proxy_ip, selected_proxy_port
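For illustration, a caller in one of the servers would consume the returned pair along these lines (a sketch; the proxies dict shape is an assumption about how the IP is used, not from the original code):

# a sketch: feed the proxy into a requests call
ip, port = get_proxy_ip(0)
proxies = {
    "http": "http://{}:{}".format(ip, port),
    "https": "http://{}:{}".format(ip, port),
}
r = requests.get("https://example.com", proxies=proxies)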
I want to use this function across all 3 of my Flask servers. Here are various ways I've tried to import the code into one of the Flask servers:
scrapers/rentCanada/app.py
import requests
from flask import Flask, request, make_response
print("cats")
app = Flask(__name__)
print(__name__, __package__)
# from ..shared.ipgetter import get_proxy_ip
# from ..shared.checker import check_public_ip
# from scrapers.shared.ipgetter import get_proxy_ip
# from scrapers.shared.checker import check_public_ip
import shared.ipgetter as ipgetter
import shared.checker as checker
None of them work.
import shared.ipgetter as ipgetter yields:
cats
__main__ None
Traceback (most recent call last):
  File "/home/rlm/Code/canadaAps/scrapers/rentCanada/app.py", line 10, in <module>
    import shared.ipgetter as ipgetter
ModuleNotFoundError: No module named 'shared'
from scrapers.shared.ipgetter import get_proxy_ip yields: ModuleNotFoundError: No module named 'scrapers'
from ..shared.ipgetter import get_proxy_ip yields: ImportError: attempted relative import with no known parent package
At this point you need to see my folder structure.
/scrapers
    __init__.py
    setup.py
    /rentCanada
        __init__.py
        app.py
    /rentFaster
        __init__.py
        app.py
    /rentSeeker
        __init__.py
        app.py
    /shared
        __init__.py
        ipgetter.py
        checker.py
I need to be able to use any of the app.py files as entry points.
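For what it's worth, the __main__ None output above is the heart of the problem. A sketch of the module-style invocation that gives app.py a package context, assuming the layout shown (the cd path is hypothetical):

# app.py executed as a script has __name__ == "__main__" and
# __package__ == None (as the print above shows), so
# "from ..shared import ..." has no parent package to climb.
# Running it as a module from the directory that CONTAINS /scrapers
# gives it one:
#
#   cd /path/containing/scrapers
#   python -m scrapers.rentCanada.app
#
# Under that invocation the absolute form can resolve:
from scrapers.shared.ipgetter import get_proxy_ip
from scrapers.shared.checker import check_public_ip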
I also tried setup.py with this:
from setuptools import setup, find_packages

setup(
    name='tools',
    packages=find_packages(),
)
followed by python setup.py install, but that didn't make a "tools" import available in app.py like I wanted.
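As an aside: name= in setup() is only the distribution name, not something you import, and what becomes importable is whatever find_packages() discovers. A sketch of how an install could expose the shared code (the editable-install workflow and distribution name are assumptions, not the asker's exact setup):

# setup.py: a sketch, placed in /scrapers and installed with
# `pip install -e .` (assumption; not the asker's exact workflow).
from setuptools import setup, find_packages

setup(
    name='scraper-tools',      # distribution name only; you never import this
    packages=find_packages(),  # finds rentCanada, shared, ... via __init__.py
)

# afterwards, from any of the servers:
#   from shared.ipgetter import get_proxy_ip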
As a final note I suspect someone will tell me to use a blueprint. To me those look like a tool I'd use if I was adding a route. I'm not sure they're right for a simple function, but maybe I'm wrong.
My solution for now is to run Flask with python rentCanada/app.py from the /scrapers folder and use this code:
import sys
from pathlib import Path
sys.path.append(str(Path(__file__).parent.parent)) # necessary so util folder is available
import requests
from flask import Flask, request, make_response
print("cats")
app = Flask(__name__)
print(__name__, __package__)
from util.ipgetter import get_proxy_ip
from util.checker import check_public_ip
So the program appends the parent of the app.py file's folder to the path. That makes the util folder (formerly shared, renamed because of a naming conflict) available within app.py.
I had a def hello() function in my home/file.py file. I created a home/common/utils.py file and moved the function there.
Now, I want to import it in my file file.py.
I imported it like this: from utils import hello, and also as from common.utils import hello, and the import in my file doesn't throw an error locally. However, when I run it on AWS Lambda, I get this error:
Runtime.ImportModuleError: Unable to import module 'file': No module named 'utils'
How can I fix this, without having to use EC2 or something? This is how the deployment package is built:
data "archive_file" "file_zip" {
type = "zip"
source_file = "${path.module}/src/file.py"
output_file_mode = "0666"
output_path = "${path.module}/bin/file.zip"
}
The deployment package that you're uploading only contains your main Python script (file.py). Specifically, it does not include any dependencies such as common/utils.py. That's why the import fails when the code runs in Lambda.
Modify the creation of your deployment package (file.zip) so that it includes all needed dependencies.
For example:
data "archive_file" "file_zip" {
type = "zip"
output_file_mode = "0666"
output_path = "${path.module}/bin/file.zip"
source {
content = file("${path.module}/src/file.py")
filename = "file.py"
}
source {
content = file("${path.module}/src/common/utils.py")
filename = "common/utils.py"
}
}
If all of your files happen to be in a single folder then you can use source_dir instead of indicating the individual files.
Note: I don't use Terraform so the file(...) with embedded interpolation may not be 100% correct, but you get the idea.
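One way to sanity-check the resulting package locally, whichever approach builds it (a sketch; the zip path matches output_path above):

# Confirm common/utils.py sits at the path the Lambda import expects.
import zipfile

with zipfile.ZipFile("bin/file.zip") as z:
    names = z.namelist()

print(names)
assert "file.py" in names
assert "common/utils.py" in names  # needed for `from common.utils import hello`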
First of all, follow this standard guide: https://docs.aws.amazon.com/lambda/latest/dg/python-package.html (refer to the section titled "Deployment package with dependencies").
Now, if you notice, at the end of that section, it runs:
zip -g my-deployment-package.zip lambda_function.py
Follow the same approach for your utils folder and file:
zip -g my-deployment-package.zip common/
zip -g my-deployment-package.zip common/utils.py
Ensure that, in lambda_function.py, you are using the proper import statement, like:
from common.utils import util_function_name
Now, you can upload this zip and test for yourself. It should run.
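For completeness, a minimal sketch of the handler side; util_function_name stands in for whatever common/utils.py actually defines:

# lambda_function.py: a sketch; util_function_name is a placeholder
# for the real function exported by common/utils.py.
from common.utils import util_function_name

def lambda_handler(event, context):
    # call the shared helper and return its result in the response body
    return {"statusCode": 200, "body": str(util_function_name())}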
Hope this helps.
I'm struggling to import a folder that has many engines I need to use. I'm importing from main_file.py.
So I think I can use from engines import qr_code_gen, but I need to import a class named _QRCode_, so I tried from .engines.qr_code_gen import _QRCode_, but it says "module engines was not found".
Structure:
Server/start.sh
Server/wsgi.py
Server/application/main_file.py
Server/application/engines/qr_code_gen.py
Server/application/engines/__init__.py
...
I printed sys.path in main_file.py and got:
['C:\Users\Dzitc\Desktop\winteka2',
'C:\Users\Dzitc\AppData\Local\Programs\Python\Python37\Scripts\flask.exe',
'c:\users\dzitc\appdata\local\programs\python\python37\python37.zip',
'c:\users\dzitc\appdata\local\programs\python\python37\DLLs',
'c:\users\dzitc\appdata\local\programs\python\python37\lib',
'c:\users\dzitc\appdata\local\programs\python\python37',
'C:\Users\Dzitc\AppData\Roaming\Python\Python37\site-packages',
'c:\users\dzitc\appdata\local\programs\python\python37\lib\site-packages',
'c:\users\dzitc\appdata\local\programs\python\python37\lib\site-packages\win32',
'c:\users\dzitc\appdata\local\programs\python\python37\lib\site-packages\win32\lib',
'c:\users\dzitc\appdata\local\programs\python\python37\lib\site-packages\Pythonwin']
Going by the comments, you can import the engines package. Try this, then:
import engines.qr_code_gen  # importing the package alone doesn't load the submodule

engines.qr_code_gen._QRCode_
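If import engines itself fails, the sys.path dump above suggests Server/application is not on the search path; a fallback sketch mirroring the approach used elsewhere on this page (the path arithmetic assumes main_file.py lives in Server/application/):

# Top of main_file.py (a sketch): make Server/application importable
# so the engines package next to this file can be found.
import sys
from pathlib import Path

sys.path.append(str(Path(__file__).resolve().parent))  # Server/application

from engines.qr_code_gen import _QRCode_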
I wrote a simple Flask app to pass some data to Spark. The script works in IPython Notebook, but not when I try to run it in its own server. I don't think the Spark context is running within the script. How do I get Spark working in the following example?
from flask import Flask, request
from pyspark import SparkConf, SparkContext

app = Flask(__name__)

conf = SparkConf()
conf.setMaster("local")
conf.setAppName("SparkContext1")
conf.set("spark.executor.memory", "1g")
sc = SparkContext(conf=conf)

@app.route('/accessFunction', methods=['POST'])
def toyFunction():
    posted_data = sc.parallelize([request.get_data()])
    return str(posted_data.collect()[0])

if __name__ == '__main__':
    app.run(port=8080)
In IPython Notebook I don't define the SparkContext because it is automatically configured. I don't remember how I did this, I followed some blogs.
On the Linux server I have set the .py to always be running and installed the latest Spark by following up to step 5 of this guide.
Edit:
Following the advice by davidism I have now instead resorted to simple programs with increasing complexity to localise the error.
Firstly I created a .py file with just the script from the answer below (after appropriately adjusting the links):
import sys

try:
    sys.path.append("your/spark/home/python")
    from pyspark import context
    print("Successfully imported Spark Modules")
except ImportError as e:
    print("Can not import Spark Modules", e)
This returns "Successfully imported Spark Modules". However, the next .py file I made returns an exception:
from pyspark import SparkContext

sc = SparkContext('local')
rdd = sc.parallelize([0])
print(rdd.count())
This returns the exception:
"Java gateway process exited before sending the driver its port number"
Searching around for similar problems, I found this page, but when I run this code nothing happens: no print on the console and no error messages. Similarly, this did not help either; I get the same Java gateway exception as above. I have also installed Anaconda, as I heard this may help unite Python and Java, but again no success...
Any suggestions about what to try next? I am at a loss.
Okay, so I'm going to answer my own question in the hope that someone out there won't suffer the same days of frustration! It turns out it was a combination of missing code and bad setup.
Editing the code:
I did indeed need to initialise a Spark Context by appending the following in the preamble of my code:
from pyspark import SparkContext
sc = SparkContext('local')
So the full code will be:
from pyspark import SparkContext
sc = SparkContext('local')

from flask import Flask, request
app = Flask(__name__)

@app.route('/whateverYouWant', methods=['POST'])  # can set first param to '/'
def toyFunction():
    posted_data = sc.parallelize([request.get_data()])
    return str(posted_data.collect()[0])

if __name__ == '__main__':
    app.run(port=8080)  # note: set to 8080!
Editing the setup:
It is essential that the file (yourfilename.py) is in the correct directory, namely it must be saved to the folder /home/ubuntu/spark-1.5.0-bin-hadoop2.6.
Then issue the following command within the directory:
./bin/spark-submit yourfilename.py
which initiates the service at 10.0.0.XX:8080/accessFunction/.
Note that the port must be set to 8080 or 8081: Spark only allows the web UI on these ports by default, for master and worker respectively.
You can test the service with a RESTful client or by opening up a new terminal and sending POST requests with cURL:
curl --data "DATA YOU WANT TO POST" http://10.0.0.XX:8080/accessFunction/
I was able to fix this problem by adding the location of PySpark and py4j to the path in my flaskapp.wsgi file. Here's the full content:
import sys
sys.path.insert(0, '/var/www/html/flaskapp')
sys.path.insert(1, '/usr/local/spark-2.0.2-bin-hadoop2.7/python')
sys.path.insert(2, '/usr/local/spark-2.0.2-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip')
from flaskapp import app as application
Modify your .py file as shown in the linked guide 'Using IPython Notebook with Spark', second point of the first part. Instead of sys.path.insert, use sys.path.append. Try inserting this snippet:
import sys

try:
    sys.path.append("your/spark/home/python")
    from pyspark import context
    print("Successfully imported Spark Modules")
except ImportError as e:
    print("Can not import Spark Modules", e)
I have a file structure like so:
app.yaml
something/
__init__.py
models.py
test.py
I have a URL set up in app.yaml to run test.py:
...
- url: /test
  script: something/test.py
test.py imports models.py
When I try to navigate to http://myapp.appspot.com/test/ I get the following error:
Error: Server Error
The server encountered an error and could not complete your request.
If the problem persists, please report your problem and mention this error message and the query that caused it
And, when I check the logs on the dashboard I see the following error occurred:
<type 'exceptions.ImportError'>: No module named models
How do I import the file properly?
Cheers,
Pete
Inside test.py you can write at the top something like:
from something.models import *
This will import your models.
For correct code, though, the wildcard '*' is not great; you should explicitly import the models you're using:
from something.models import ModelName, OtherModel
and so on.
test.py should have import models, not import models.py.
Try to import models like this:
import something.models as models
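A minimal sketch of test.py with that import in place (MyModel is a hypothetical name; substitute a class models.py actually defines):

# test.py: a sketch. The package-qualified import works because the
# app root (the directory containing app.yaml) is on sys.path.
import something.models as models

entity = models.MyModel()  # hypothetical model class from models.py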