I am setting up an entire GCP architecture with Deployment Manager, using the Python template structure. I have tried to deploy the resource below:
{
'name': 'dataproccluster',
'type': 'dataproc.py',
'subnetwork': 'default',
'properties': {
'zone': ZONE_NORTH,
'region': REGION_NORTH,
'serviceAccountEmail': 'X@appspot.gserviceaccount.com',
'softwareConfig': {
'imageVersion': '1.4-debian9',
'properties': {
'dataproc:dataproc.conscrypt.provider.enable' : 'False'
}
},
'master': {
'numInstances': 1,
'machineType': 'n1-standard-1',
'diskSizeGb': 50,
'diskType': 'pd-standard',
'numLocalSsds': 0
},
'worker': {
'numInstances': 2,
'machineType': 'n1-standard-1',
'diskType': 'pd-standard',
'diskSizeGb': 50,
'numLocalSsds': 0
},
'initializationActions':[{
'executableFile': 'gs://dataproc-initialization-actions/python/pip-install.sh'
}],
'metadata': {
'PIP_PACKAGES':'requests_toolbelt==0.9.1 google-auth==1.6.31'
},
'labels': {
'environment': 'dev',
'data_type': 'X'
}
}
}
Which results in the following error:
Initialization action failed. Failed action 'gs://dataproc-initialization-actions/python/pip-install.sh',
I would like to work out whether this is an error on my side or some kind of API problem. I found Google issue tracker entries on this topic covering CLI deployment, but they were marked as resolved, and I found nothing on the Deployment Manager side.
If it is an error on my side, what am I doing wrong?
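For reference, this resource block sits inside the top-level Python template's generate_config roughly like the sketch below; the ZONE_NORTH / REGION_NORTH names and the dataproc.py type are taken from the snippet above, and the concrete zone/region strings are only placeholders:
# top-level template (sketch): how the resource above is returned from generate_config
ZONE_NORTH = 'europe-north1-a'    # placeholder value
REGION_NORTH = 'europe-north1'    # placeholder value

def generate_config(context):
    resources = [{
        'name': 'dataproccluster',
        'type': 'dataproc.py',
        'subnetwork': 'default',
        'properties': {
            'zone': ZONE_NORTH,
            'region': REGION_NORTH,
            # ... the remaining properties exactly as listed above ...
        }
    }]
    return {'resources': resources}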
I have an AWS WAF CDK stack that is working with rules, and now I'm trying to add a WAF rule with multiple statements, but I'm getting this error:
Resource handler returned message: "Error reason: You have used none or multiple values for a field that requires exactly one value., field: STATEMENT, parameter: Statement (Service: Wafv2, Status Code: 400, Request ID: 6a36bfe2-543c-458a-9571-e929142f5df1, Extended Request ID: null)" (RequestToken: b751ae12-bb60-bb75-86c0-346926687ea4, HandlerErrorCode: InvalidRequest)
My Code:
{
'name': 'ruleName',
'priority': 3,
'statement': {
'orStatement': {
'statements': [
{
'iPSetReferenceStatement': {
'arn': 'arn:myARN'
}
},
{
'iPSetReferenceStatement': {
'arn': 'arn:myARN'
}
}
]
}
},
'action': {
'allow': {}
},
'visibilityConfig': {
'sampledRequestsEnabled': True,
'cloudWatchMetricsEnabled': True,
'metricName': 'ruleName'
}
},
There are two things going on there:
Firstly, your capitalization is off. iPSetReferenceStatement cannot be parsed and creates an empty statement reference. The correct key is ipSetReferenceStatement.
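With just the key corrected, the statement portion of the rule would look like this (a minimal sketch based on the rule above):
'statement': {
    'orStatement': {
        'statements': [
            {'ipSetReferenceStatement': {'arn': 'arn:myARN'}},  # note the lower-case "ip"
            {'ipSetReferenceStatement': {'arn': 'arn:myARN'}}
        ]
    }
},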
However, as mentioned here, there is a jsii implementation bug causing some issues with the IPSetReferenceStatementProperty. This causes it not to be parsed properly, resulting in a jsii error when synthesizing.
You can fix it by using the workaround mentioned in the post.
Add to your file containing the construct:
import jsii
from aws_cdk import aws_wafv2 as wafv2  # just for clarity, you might already have this imported

@jsii.implements(wafv2.CfnRuleGroup.IPSetReferenceStatementProperty)
class IPSetReferenceStatement:
    @property
    def arn(self):
        return self._arn

    @arn.setter
    def arn(self, value):
        self._arn = value
Then define your ip reference statement as follows:
ip_set_ref_stmnt = IPSetReferenceStatement()
ip_set_ref_stmnt.arn = "arn:aws:..."
ip_set_ref_stmnt_2 = IPSetReferenceStatement()
ip_set_ref_stmnt_2.arn = "arn:aws:..."
Then in the rules section of the webacl, you can use it as follows:
...
rules=[
{
'name': 'ruleName',
'priority': 3,
'statement': {
'orStatement': {
'statements': [
wafv2.CfnWebACL.StatementProperty(
ip_set_reference_statement=ip_set_ref_stmnt
),
wafv2.CfnWebACL.StatementProperty(
ip_set_reference_statement=ip_set_ref_stmnt_2
),
]
}
},
'action': {
'allow': {}
},
'visibilityConfig': {
'sampledRequestsEnabled': True,
'cloudWatchMetricsEnabled': True,
'metricName': 'ruleName'
}
}
]
...
This should synthesize your stack as expected.
AVOID EVAL
My question has been answered and I initially used eval, but after some reading on what eval does and can do, I decided against it and used the alternative described here instead: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/eval#Do_not_ever_use_eval!
In my application I'm building the whole chart options object in the backend and returning it as a JSON response:
def get_chart_data(request):
chart = {
'title': {
'text': ''
},
'xAxis': {
'categories': [],
'title': {
'text': ''
},
'type': 'category',
'crosshair': True
},
'yAxis': [{
'allowDecimals': False,
'min': 0,
'title': {
'text': ''
}
}, {
'allowDecimals': False,
'min': 0,
'title': {
'text': ''
},
'opposite': True
}],
'series': [{
'type': 'column',
'yAxis': 1,
'name': '',
'data': []
}, {
'type': 'line',
'name': '',
'data': []
}, {
'type': 'line',
'name': '',
'data': []
}]
}
return JsonResponse(chart)
I then fetch the data with Ajax and pass the response straight to Highcharts:
Highcharts.chart('dashboard1', data);
I'm OK with this so far, but I've run into problems when I want to use Highcharts functions as part of the options, for example setting the color of text with Highcharts.getOptions().colors[0]:
'title': {
'text': 'Rainfall',
'style': {
'color': Highcharts.getOptions().colors[0]
}
},
If I don't quote this when building the options in views.py, it is treated as Python code and raises an error; if I do quote it, it is treated as a string in JavaScript, which doesn't work.
Is this possible, or should I just build the options in JavaScript and only get the data part from the backend rather than the whole thing?
You could return the JS code from Django as a string and then run eval() on it, but executing code like that opens up the possibility of an XSS attack, especially if the information is user-submitted.
Your best bet otherwise would be to create the styling on the JS end if possible and manipulate the incoming data.
document.querySelector('a').addEventListener('click', function (e) {
e.preventDefault();
var complexJson = {"parent": {"child": "alert('Here is a nested alert!')"}}
var alertString = "alert('Here is a simple alert!')";
eval(complexJson["parent"]["child"])
eval(alertString)
})
<a href="#">Click me!</a>
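If you go the data-only route, the backend can send a concrete value instead of a JS expression. A minimal sketch of the view, where the hex value is just an illustrative stand-in for whatever Highcharts.getOptions().colors[0] resolves to:
# views.py (sketch): send a plain value instead of a JS expression
from django.http import JsonResponse

def get_chart_data(request):
    chart = {
        'title': {
            'text': 'Rainfall',
            'style': {
                # a concrete color the frontend can use as-is; if you specifically
                # want Highcharts.getOptions().colors[0], look it up on the JS side
                'color': '#7cb5ec',
            },
        },
        # ... the rest of the options exactly as in the view above ...
    }
    return JsonResponse(chart)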
I'm trying to create a Premium SSD disk to attach to a VM in azure but can't seem to figure out how to specify that correctly - I keep ending up with a Standard HDD.
azure_client.compute_client.disks.create_or_update("my_resource_group", 'deleteme-' + str(disk_num), {
"location": "westus",
"disk_size_gb": 256,
'creation_data': {
'create_option': 'empty',
'sku': {
'name': 'Premium_LRS' # <=== What I want
}
},
'tags': {
"fake": "tags"
}
}).result().as_dict()
{
'id': '/subscriptions/5efe2633-26ac-4638-9f1f-6e24e494d9b4/resourceGroups/my_resource_group/providers/Microsoft.Compute/disks/deleteme-26',
'provisioning_state': 'Succeeded',
'name': 'deleteme-26',
'type': 'Microsoft.Compute/disks',
'time_created': '2019-02-05T00:37:41.907815Z',
'tags': {
'fake': 'tags'
},
'creation_data': {
'create_option': 'Empty'
},
'sku': {
'tier': 'Standard',
'name': 'Standard_LRS' # <== What I actually get
},
'location': 'westus',
'disk_size_gb': 256
}
I'm open to attaching the disk directly to the VM at creation, but I can't figure out the API for specifying the disk that way either. I've also tried specifying 'tier': 'Premium' in the sku description, but no change.
A bit embarrassing, but maybe someone else will do this in the future... I put the SKU in the wrong sub-dictionary. Azure doesn't yell at you if you put random stuff it doesn't understand in the creation_data section.
azure_client.compute_client.disks.create_or_update("my_resource_group", 'deleteme-' + str(disk_num), {
"location": "westus",
"disk_size_gb": 256,
'creation_data': {
'create_option': 'empty'
},
'sku': {
'name': 'Premium_LRS' # <=== Moved out of creation_data dict
},
'tags': {
"fake": "tags"
}
}).result().as_dict()
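If you prefer, the typed models in azure.mgmt.compute.models make it harder to drop a key into the wrong sub-dictionary. A rough sketch (model and method names can vary a little between azure-mgmt-compute versions):
from azure.mgmt.compute.models import CreationData, Disk, DiskSku

# build the request with typed models so a misplaced field fails fast
disk_params = Disk(
    location='westus',
    disk_size_gb=256,
    creation_data=CreationData(create_option='Empty'),
    sku=DiskSku(name='Premium_LRS'),
    tags={'fake': 'tags'},
)
result = azure_client.compute_client.disks.create_or_update(
    'my_resource_group', 'deleteme-' + str(disk_num), disk_params
).result().as_dict()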
Python 2.7.12
boto3==1.3.1
How can I add a step to a running EMR cluster and have the cluster terminated after the step is complete, regardless of it fails or succeeds?
Create the cluster
response = client.run_job_flow(
Name=name,
LogUri='s3://mybucket/emr/',
ReleaseLabel='emr-5.9.0',
Instances={
'MasterInstanceType': instance_type,
'SlaveInstanceType': instance_type,
'InstanceCount': instance_count,
'KeepJobFlowAliveWhenNoSteps': True,
'Ec2KeyName': 'KeyPair',
'EmrManagedSlaveSecurityGroup': 'sg-1234',
'EmrManagedMasterSecurityGroup': 'sg-1234',
'Ec2SubnetId': 'subnet-1q234',
},
Applications=[
{'Name': 'Spark'},
{'Name': 'Hadoop'}
],
BootstrapActions=[
{
'Name': 'Install Python packages',
'ScriptBootstrapAction': {
'Path': 's3://mybucket/code/spark/bootstrap_spark_cluster.sh'
}
}
],
VisibleToAllUsers=True,
JobFlowRole='EMR_EC2_DefaultRole',
ServiceRole='EMR_DefaultRole',
Configurations=[
{
'Classification': 'spark',
'Properties': {
'maximizeResourceAllocation': 'true'
}
},
],
)
Add a step
response = client.add_job_flow_steps(
JobFlowId=cluster_id,
Steps=[
{
'Name': 'Run Step',
'ActionOnFailure': 'TERMINATE_CLUSTER',
'HadoopJarStep': {
'Args': [
'spark-submit',
'--deploy-mode', 'cluster',
'--py-files',
's3://mybucket/code/spark/spark_udfs.py',
's3://mybucket/code/spark/{}'.format(spark_script),
'--some-arg'
],
'Jar': 'command-runner.jar'
}
}
]
)
This successfully adds a step and runs it; however, when the step completes successfully, I would like the cluster to auto-terminate, as noted in the AWS CLI docs: http://docs.aws.amazon.com/cli/latest/reference/emr/create-cluster.html
In your case (creating the cluster using boto3) you can add the flags 'TerminationProtected': False, 'AutoTerminate': True to your cluster creation. This way the cluster will be shut down after your step has finished running.
Another solution is to add a further step that kills the cluster immediately after the step you want to run. So basically you need to run this command as a step:
aws emr terminate-clusters --cluster-ids your_cluster_id
The tricky part is retrieving the cluster_id.
Here you can find a solution: Does an EMR master node know it's cluster id?
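For example, the final step could run a small script on the master node that reads the cluster id from the local metadata file and terminates the cluster. A sketch, assuming /mnt/var/lib/info/job-flow.json is present on the master (as discussed in the linked question) and the instance profile is allowed to call emr:TerminateJobFlows:
import json
import boto3

# runs on the EMR master node as the last step of the job flow
with open('/mnt/var/lib/info/job-flow.json') as f:
    cluster_id = json.load(f)['jobFlowId']

# shut the cluster down once all previous steps have completed
emr = boto3.client('emr', region_name='us-east-1')  # adjust to your cluster's region
emr.terminate_job_flows(JobFlowIds=[cluster_id])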
The 'AutoTerminate': True parameter as suggested did not work for me. However, it worked when I changed the 'KeepJobFlowAliveWhenNoSteps' parameter from True to False. Your code should then look like the following:
response = client.run_job_flow(
Name=name,
LogUri='s3://mybucket/emr/',
ReleaseLabel='emr-5.9.0',
Instances={
'MasterInstanceType': instance_type,
'SlaveInstanceType': instance_type,
'InstanceCount': instance_count,
'KeepJobFlowAliveWhenNoSteps': False,
'Ec2KeyName': 'KeyPair',
'EmrManagedSlaveSecurityGroup': 'sg-1234',
'EmrManagedMasterSecurityGroup': 'sg-1234',
'Ec2SubnetId': 'subnet-1q234',
},
Applications=[
{'Name': 'Spark'},
{'Name': 'Hadoop'}
],
BootstrapActions=[
{
'Name': 'Install Python packages',
'ScriptBootstrapAction': {
'Path': 's3://mybucket/code/spark/bootstrap_spark_cluster.sh'
}
}
],
VisibleToAllUsers=True,
JobFlowRole='EMR_EC2_DefaultRole',
ServiceRole='EMR_DefaultRole',
Configurations=[
{
'Classification': 'spark',
'Properties': {
'maximizeResourceAllocation': 'true'
}
},
],
)
You can create a short-lived cluster that automatically terminates after all steps have been run by specifying 'KeepJobFlowAliveWhenNoSteps': False in the Instances param. I've added a complete example to GitHub that shows how to do this.
Here's some of the code from the demo:
def run_job_flow(
name, log_uri, keep_alive, applications, job_flow_role, service_role,
security_groups, steps, emr_client):
try:
response = emr_client.run_job_flow(
Name=name,
LogUri=log_uri,
ReleaseLabel='emr-5.30.1',
Instances={
'MasterInstanceType': 'm5.xlarge',
'SlaveInstanceType': 'm5.xlarge',
'InstanceCount': 3,
'KeepJobFlowAliveWhenNoSteps': keep_alive,
'EmrManagedMasterSecurityGroup': security_groups['manager'].id,
'EmrManagedSlaveSecurityGroup': security_groups['worker'].id,
},
Steps=[{
'Name': step['name'],
'ActionOnFailure': 'CONTINUE',
'HadoopJarStep': {
'Jar': 'command-runner.jar',
'Args': ['spark-submit', '--deploy-mode', 'cluster',
step['script_uri'], *step['script_args']]
}
} for step in steps],
Applications=[{
'Name': app
} for app in applications],
JobFlowRole=job_flow_role.name,
ServiceRole=service_role.name,
EbsRootVolumeSize=10,
VisibleToAllUsers=True
)
cluster_id = response['JobFlowId']
logger.info("Created cluster %s.", cluster_id)
except ClientError:
logger.exception("Couldn't create cluster.")
raise
else:
return cluster_id
And here's some code that calls this function with some real params:
output_prefix = 'pi-calc-output'
pi_step = {
'name': 'estimate-pi-step',
'script_uri': f's3://{bucket_name}/{script_key}',
'script_args':
['--partitions', '3', '--output_uri',
f's3://{bucket_name}/{output_prefix}']
}
cluster_id = emr_basics.run_job_flow(
f'{prefix}-cluster', f's3://{bucket_name}/logs',
False, ['Hadoop', 'Hive', 'Spark'], job_flow_role, service_role,
security_groups, [pi_step], emr_client)
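Because the cluster tears itself down once the steps finish, one way to block until everything is done is the standard boto3 EMR waiter; a small sketch reusing emr_client and cluster_id from above:
# block until the auto-terminating cluster has finished and shut down
waiter = emr_client.get_waiter('cluster_terminated')
waiter.wait(ClusterId=cluster_id)
print('Cluster {} has terminated.'.format(cluster_id))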
I'm trying to deploy a Custom Instance Template using gcloud deployment-manager, but I keep getting this error:
ERROR: (gcloud.deployment-manager.deployments.update) Error in Operation [operation-1507833758152-55b5de788f540-e3be8bf6-a792d98e]: errors:
- code: RESOURCE_ERROR
location: /deployments/my-project/resources/worker-template
message: '{"ResourceType":"compute.v1.instanceTemplate","ResourceErrorCode":"400","ResourceErrorMessage":{"code":400,"errors":[{"domain":"global","message":"Invalid
value for field ''resource.properties'': ''''. Instance Templates must provide
instance properties.","reason":"invalid"}],"message":"Invalid value for field
''resource.properties'': ''''. Instance Templates must provide instance properties.","statusMessage":"Bad
Request","requestPath":"https://www.googleapis.com/compute/v1/projects/my-project/global/instanceTemplates","httpMethod":"POST"}}'
My python generate_config function is this:
def generate_config(context):
resources = [{
'type': 'compute.v1.instanceTemplate',
'name': 'worker-template',
'properties': {
'zone': context.properties['zone'],
'description': 'Worker Template',
'machineType': context.properties['machineType'],
'disks': [{
'deviceName': 'boot',
'type': 'PERSISTENT',
'boot': True,
'autoDelete': True,
'initializeParams': {
'sourceImage': '/'.join([
context.properties['compute_base_url'],
'projects', context.properties['os_project'],
'global/images/family', context.properties['os_project_family']
])
}
}],
'networkInterfaces': [{
'network': '$(ref.' + context.properties['network'] + '.selfLink)',
'accessConfigs': [{
'name': 'External NAT',
'type': 'ONE_TO_ONE_NAT'
}]
}]
}
}]
return {'resources': resources}
Properties is not empty, so the error message doesn't make much sense. Any ideas?
Thx!
After reading this example, I just found that the correct structure for compute.v1.instanceTemplate is:
...
'type': 'compute.v1.instanceTemplate',
'name': 'worker-template',
'properties': {
'project': 'my-project',
'properties': {
'zone': context.properties['zone'],
...
}
}
...
The structure follows this doc.
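Putting that together with the question's template, a corrected generate_config would look roughly like the sketch below; it only shows the nesting with the machine type, disks and network interfaces from the question, and the 'project' value follows the snippet above:
def generate_config(context):
    resources = [{
        'type': 'compute.v1.instanceTemplate',
        'name': 'worker-template',
        'properties': {
            'project': 'my-project',  # or context.env['project']
            # instance settings go under a nested 'properties' key
            'properties': {
                'machineType': context.properties['machineType'],
                'disks': [{
                    'deviceName': 'boot',
                    'type': 'PERSISTENT',
                    'boot': True,
                    'autoDelete': True,
                    'initializeParams': {
                        'sourceImage': '/'.join([
                            context.properties['compute_base_url'],
                            'projects', context.properties['os_project'],
                            'global/images/family', context.properties['os_project_family']
                        ])
                    }
                }],
                'networkInterfaces': [{
                    'network': '$(ref.' + context.properties['network'] + '.selfLink)',
                    'accessConfigs': [{
                        'name': 'External NAT',
                        'type': 'ONE_TO_ONE_NAT'
                    }]
                }]
            }
        }
    }]
    return {'resources': resources}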