AWS Glue: Failed to start job run due to missing metadata - python

In order to run a job using boto3, the documentation states only JobName is required. However, my code:
def start_job_run(self, name):
    print("The name of the job to be run via client is: {}".format(name))
    self.response_de_start_job = self.client.start_job_run(
        JobName=name
    )
    print(self.response_de_start_job)
and the client is:
self.client = boto3.client(
    'glue',
    region_name='ap-south-1',
    aws_access_key_id=os.getenv('AWS_ACCESS_KEY_ID'),
    aws_secret_access_key=os.getenv('AWS_SECRET_ACCESS_KEY'),
)
when executed via Python 3, gives this error:
botocore.errorfactory.EntityNotFoundException: An error occurred (EntityNotFoundException) when calling the StartJobRun operation: Failed to start job run due to missing metadata
but when I start the same job from the UI or from the CLI (aws glue start-job-run --job-name march15_9), it works just fine.

In my experience this error usually means the job cannot be found.
Since jobs are bound to regions, the combination of name and region uniquely identifies a job, and an error in either of them (including a trivial typo) will lead to the error you experienced.
As an example, a job I am using is in us-east-1, so the following statement executes successfully.
glue_client = boto3.client('glue', region_name='us-east-1')
response = glue_client.start_job_run(
    JobName=glue_job_name)
However, the snippet below will produce the same error you are seeing:
glue_client = boto3.client('glue', region_name='us-west-1')
response = glue_client.start_job_run(
    JobName=glue_job_name)
botocore.errorfactory.EntityNotFoundException: An error occurred (EntityNotFoundException) when calling the StartJobRun operation: Failed to start job run due to missing metadata
In a case like this, it's relatively easy to check by running the CLI with an explicit --region parameter.
It would be something like:
aws glue start-job-run --job-name march15_9 --region ap-south-1
If this runs successfully (thus the region is indeed ap-south-1), I'd explicitly set the parameters in the code to remove unknown factors: instead of passing them through environment variables, you can temporarily put string values in the code.
Once the code works with hardcoded values, you can remove them one by one, thus finding the one (or the few) that need to be passed correctly.
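For example, a temporary version with everything hardcoded could look like the sketch below (all values are placeholders, not real credentials; the job name is taken from your CLI example):
import boto3

# Temporarily hardcode region and credentials to rule out environment-variable issues.
client = boto3.client(
    'glue',
    region_name='ap-south-1',                    # the region the CLI test confirmed
    aws_access_key_id='AKIA...PLACEHOLDER',      # placeholder, not a real key
    aws_secret_access_key='PLACEHOLDER_SECRET',  # placeholder, not a real secret
)

response = client.start_job_run(JobName='march15_9')
print(response['JobRunId'])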
All the best
P.S. Indeed, the documentation is correct: only JobName needs to be set as a parameter. I have code that works this way.

I faced the same error; the problem was passing the ARN of the Glue job as JobName. It was resolved by passing only the name of the Glue job.
response = client.start_job_run(
    JobName='Glue Job Name not ARN'
)

Check that the name of your Glue job is written correctly. I had a similar case and fixed it that way (for example, Job_ 01 instead of Job_01).
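One way to double-check the exact name (and that the job is actually visible in the region you are calling) is to list the jobs the client can see; a quick sketch, assuming the same ap-south-1 region as above:
import boto3

glue = boto3.client('glue', region_name='ap-south-1')

# Print job names exactly as Glue knows them; repr() makes stray spaces visible.
response = glue.list_jobs()
for job_name in response['JobNames']:
    print(repr(job_name))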

What is the Glue error log indicating?
You may be using parameters in the Glue job that you are not passing when calling the job.
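For instance, if the job script resolves a parameter with getResolvedOptions and no default value is defined for it, you would need to pass it when starting the run; a minimal sketch (the parameter name here is just an example):
response = self.client.start_job_run(
    JobName=name,
    Arguments={'--input_path': 's3://my-bucket/input/'}  # example parameter expected by the script
)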

Related

AWS Glue error - Invalid input provided while running python shell program

I have a Glue job, a Python shell script. When I try to run it I end up getting the below error.
Job Name : xxxxx Job Run Id : yyyyyy failed to execute with exception Internal service error : Invalid input provided
It is not specific to my code; even if I just put
import boto3
print('loaded')
I am getting the error right after clicking the run job option. What is the issue here?
It happened to me too, but the same job is working on a different account.
The AWS documentation is not really explicit about this error:
The input provided was not valid.
I doubt this is an Amazon issue, as mentioned by @Quartermass.
Same issue here in eu-west-2 yesterday, working now. This was only happening with Pythonshell jobs, not Pyspark ones, and job runs weren't getting as far as outputting any log streams. I can only assume it was an AWS issue they've now fixed and not issued a service announcement for.
I think Quatermass is right, the jobs started working out of the blue the next day without any changes.
I too received this super helpful error message.
What worked for me was explicitly setting properties like worker type, number of workers, Glue version and Python version.
In Terraform code:
resource "aws_glue_job" "my_job" {
name = "my_job"
role_arn = aws_iam_role.glue.arn
worker_type = "Standard"
number_of_workers = 2
glue_version = "4.0"
command {
script_location = "s3://my-bucket/my-script.py"
python_version = "3"
}
default_arguments = {
"--enable-job-insights" = "true",
"--additional-python-modules" : "boto3==1.26.52,pandas==1.5.2,SQLAlchemy==1.4.46,requests==2.28.2",
}
}
Update
After doing some more digging, I realised that what I needed was a Python shell script Glue job, not an ETL (Spark) job. By choosing this flavour of job, setting the Python version to 3.9 and "ticking the box" for Glue's pre-installed analytics libraries, my script, incidentally, had access to all the libraries I needed.
My Terraform code ended up looking like this:
resource "aws_glue_job" "my_job" {
name = "my-job"
role_arn = aws_iam_role.glue.arn
glue_version = "1.0"
max_capacity = 1
connections = [
aws_glue_connection.redshift.name
]
command {
name = "pythonshell"
script_location = "s3://my-bucket/my-script.py"
python_version = "3.9"
}
default_arguments = {
"--enable-job-insights" = "true",
"--library-set" : "analytics",
}
}
Note that I have switched to using Glue version 1.0. I arrived at this after some trial and error, and could not find this explicitly stated as the compatible version for pythonshell jobs… but it works!
Well, in my case, I get this error from time to time without any clear reason. The only thing that seems to cause it is modifying some job parameter and saving the changes. As soon as I save and try to execute the job, I usually get this error, and the only way to solve it is to destroy the job and then re-create it. Has anybody solved this issue by other means? As I saw in the accepted answer, the job simply began to work again without any manual action, which suggests the problem was a bug on the AWS side that was later corrected.
I was facing a similar issue. I was invoking my job from a workflow. I solved it by adding WorkerType, GlueVersion, and NumberOfWorkers to the job before adding the job to the workflow. It consistently failed before and succeeded after this addition.
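For reference, a rough boto3 sketch of creating a job with those properties set explicitly (job name, role ARN, script location and region are placeholders):
import boto3

glue = boto3.client('glue', region_name='eu-west-2')  # example region

# Explicitly set WorkerType, NumberOfWorkers and GlueVersion, as suggested above.
glue.create_job(
    Name='my-workflow-job',                               # placeholder job name
    Role='arn:aws:iam::123456789012:role/my-glue-role',   # placeholder role ARN
    Command={
        'Name': 'glueetl',
        'ScriptLocation': 's3://my-bucket/my-script.py',  # placeholder script
        'PythonVersion': '3',
    },
    GlueVersion='4.0',
    WorkerType='G.1X',
    NumberOfWorkers=2,
)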

Pass Dynamic Parameters to AWS Glue

I am trying to pass dynamic parameters to a glue job. I followed this question: AWS Glue Job Input Parameters
And configured my parameters like so:
I'm triggering the glue job with boto3 with the following code:
event = {
    '--ncoa': "True",
    '--files': 'file.csv',
    '--group_file': '3e93475d45b4ebecc9a09533ce57b1e7.csv',
    '--client_slug': 'test',
    '--slm_id': '12345'
}
glueClient.start_job_run(JobName='TriggerNCOA', Arguments=event)
and when I run this glue code:
args = getResolvedOptions(sys.argv, ['NCOA','Files','GroupFile','ClientSlug', 'SLMID'])
v_list=[{"ncoa":args['NCOA'],"files":args['Files'],"group_file":args['GroupFile'], "client_slug":args['ClientSlug'], "slm_id":args['SLMID']}]
print(v_list)
It just gives me 'a' for every value, not the values of the original event that I passed in from boto3. How do I fix that? It seems like I'm missing something very slight, but I've looked around and haven't found anything conclusive.
You are using CamelCase names with capital letters for the Glue job parameters, but lowercase snake_case keys in the Python code that overrides them.
For example, the key of the job parameter in Glue is --ClientSlug, but the key set in the Python Arguments dict is --client_slug.
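A minimal sketch of what the matching keys could look like, assuming the job parameters are defined with the CamelCase names used in getResolvedOptions:
# Keys passed to start_job_run must match the Glue job parameter names exactly.
event = {
    '--NCOA': 'True',
    '--Files': 'file.csv',
    '--GroupFile': '3e93475d45b4ebecc9a09533ce57b1e7.csv',
    '--ClientSlug': 'test',
    '--SLMID': '12345'
}
glueClient.start_job_run(JobName='TriggerNCOA', Arguments=event)

# Inside the job, the same names are then resolved:
# args = getResolvedOptions(sys.argv, ['NCOA', 'Files', 'GroupFile', 'ClientSlug', 'SLMID'])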

Cloud function triggered by object created storage getting file not found error

I have a cloud function configured to be triggered on google.storage.object.finalize in a storage bucket. This was running well for a while. However, recently I started getting FileNotFoundError when trying to read the file, yet if I download the file through gsutil or the console it works fine.
Code sample:
def main(data, context):
    full_filename = data['name']
    bucket = data['bucket']
    df = pd.read_csv(f'gs://{bucket}/{full_filename}')  # intermittently raises FileNotFoundError
The error occurs most often when the file has been overwritten. The bucket has object versioning enabled.
Is there something I can do?
As clarified in another similar case, caching can sometimes be an issue between Cloud Functions and Cloud Storage: when a file is overwritten, the cached listing no longer matches the object, so the file cannot be found and the FileNotFoundError shows up.
Calling invalidate_cache before reading the file can help in these situations, since it makes the read ignore the stale cache and avoids the error. The code for using invalidate_cache looks like this:
import gcsfs
fs = gcsfs.GCSFileSystem()
fs.invalidate_cache()
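A sketch of how this could fit into the function above (pandas reads gs:// paths through gcsfs, so invalidating the cache before the read should be enough; the rest of the function is unchanged):
import gcsfs
import pandas as pd

def main(data, context):
    full_filename = data['name']
    bucket = data['bucket']

    # Drop any cached listing so the freshly overwritten object can be found.
    fs = gcsfs.GCSFileSystem()
    fs.invalidate_cache()

    df = pd.read_csv(f'gs://{bucket}/{full_filename}')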
Check in the function logs whether your function is being triggered twice for a single object finalize:
first triggered execution with event attribute 'size': '0'
second triggered execution with the size attribute set to the actual object size
If your function fails on the first one, you can simply filter it out by checking the attribute value and continuing only if it is non-zero.
def main(data, context):
    object_size = data['size']
    if object_size != '0':
        full_filename = data['name']
        bucket = data['bucket']
        df = pd.read_csv(f'gs://{bucket}/{full_filename}')
I don't know what exactly is causing the double-triggering, but I had a similar problem once when using Cloud Storage FUSE, and this was a quick solution to it.

Pyinstaller with APScheduler - TypeError with IntervalTrigger

I had the same problem as here (see link below); briefly: unable to create an .exe of a Python script that uses APScheduler.
Pyinstaller 3.3.1 & 3.4.0-dev build with apscheduler
So I did as suggested:
from apscheduler.triggers import interval

scheduler.add_job(Run, 'interval', interval.IntervalTrigger(minutes=time_int),
                  args=(input_file, output_dir, time_int),
                  id=theID, replace_existing=True)
And indeed importing interval.IntervalTrigger and passing it as an argument to add_job solved this particular error.
However, now I am encountering:
TypeError: add_job() got multiple values for argument 'args'
I tested it and I can confirm it is occurring because of the way the trigger is passed now. I also tried defining trigger = interval.IntervalTrigger(minutes=time_int) separately and then just passing trigger, and the same thing happens.
If I ignore the error with try/except, I see that it does not add the job to the SQL database at all (I am using SQLAlchemy as a jobstore). Initially I thought it was because I am adding several jobs in a for loop, but it happens with a single job add as well.
Does anyone know of some other workaround for the initial problem, or have any idea why this error might occur? I can't find anything online either :(
Things always work better in the morning.
For anyone else who encounters this: you don't need both 'interval' and interval.IntervalTrigger() as arguments; that is where the error comes from. The code should be:
scheduler.add_job(Run, interval.IntervalTrigger(minutes=time_int),
                  args=(input_file, output_dir, time_int),
                  id=theID, replace_existing=True)
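A small self-contained sketch of the corrected call, with placeholder values for the function and its arguments; passing both 'interval' and an IntervalTrigger pushed the trigger object into add_job's positional args slot, which then clashed with the args= keyword:
from apscheduler.schedulers.background import BackgroundScheduler
from apscheduler.triggers import interval

def Run(input_file, output_dir, time_int):
    print(input_file, output_dir, time_int)

scheduler = BackgroundScheduler()

# Only the trigger object is passed, so args= no longer collides.
scheduler.add_job(Run, interval.IntervalTrigger(minutes=1),
                  args=('input.csv', '/tmp/out', 1),
                  id='example_job', replace_existing=True)

scheduler.start()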

Error when changing instance type in a python for loop

I have a Python 2 script which uses the boto3 library.
Basically, I have a list of instance ids and I need to iterate over it changing the type of each instance from c4.xlarge to t2.micro.
In order to accomplish that task, I'm calling the modify_instance_attribute method.
I don't know why, but my script fails with the following error message:
EBS-optimized instances are not supported for your requested configuration.
Here is my general scenario:
Say I have a piece of code like this one below:
def change_instance_type(instance_id):
    client = boto3.client('ec2')
    response = client.modify_instance_attribute(
        InstanceId=instance_id,
        InstanceType={
            'Value': 't2.micro'
        }
    )
So, if I execute it like this:
change_instance_type('id-929102')
everything works with no problem at all.
However, strangely enough, if I execute it in a for loop like the following
instances_list = ['id-929102']
for instance_id in instances_list:
    change_instance_type(instance_id)
I get the error message above (i.e., EBS-optimized instances are not supported for your requested configuration) and my script dies.
Any idea why this happens?
When I look at EBS optimized instances I don't see that T2 micros are supported:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSOptimized.html
I think you would need to add EbsOptimized=false as well.
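A sketch of that suggestion (modify_instance_attribute accepts only one attribute per call, so EBS optimization is turned off in a separate call before changing the type; this also assumes the instance is already stopped):
def change_instance_type(instance_id):
    client = boto3.client('ec2')

    # t2.micro does not support EBS optimization, so disable it first...
    client.modify_instance_attribute(
        InstanceId=instance_id,
        EbsOptimized={'Value': False}
    )

    # ...then change the instance type.
    client.modify_instance_attribute(
        InstanceId=instance_id,
        InstanceType={'Value': 't2.micro'}
    )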
