I am trying to get the Docker container stats inside my Python code after running the container as shown below. I am referring to the Python Docker SDK (https://docker-py.readthedocs.io/en/stable/index.html) to run the container and get its details.
container = docker_client.containers.run(
    image=image,
    entrypoint=entrypoint,
    detach=True,
    tty=False,
    volumes={FILE_PATH + input_file: {'bind': '/src/input.txt', 'mode': 'rw'}},
)
container.wait()
data = container.stats(stream=False)
The stats I am getting do not have memory and a few other details populated. Below are the stats details:
{
'read': '0001-01-01T00:00:00Z',
'preread': '2020-04-21T10:07:54.773647854Z',
'pids_stats': {
},
'blkio_stats': {
'io_service_bytes_recursive': None,
'io_serviced_recursive': None,
'io_queue_recursive': None,
'io_service_time_recursive': None,
'io_wait_time_recursive': None,
'io_merged_recursive': None,
'io_time_recursive': None,
'sectors_recursive': None
},
'num_procs': 0,
'storage_stats': {
},
'cpu_stats': {
'cpu_usage': {
'total_usage': 0,
'usage_in_kernelmode': 0,
'usage_in_usermode': 0
},
'throttling_data': {
'periods': 0,
'throttled_periods': 0,
'throttled_time': 0
}
},
'precpu_stats': {
'cpu_usage': {
'total_usage': 208804435,
'percpu_usage': [
2260663,
0,
0,
7976886,
0,
2549616,
178168661,
1717192,
117608,
0,
1011534,
3305192,
0,
11372783,
0,
324300
],
'usage_in_kernelmode': 20000000,
'usage_in_usermode': 160000000
},
'system_cpu_usage': 98001601690000000,
'online_cpus': 16,
'throttling_data': {
'periods': 0,
'throttled_periods': 0,
'throttled_time': 0
}
},
'memory_stats': {
},
'name': '/quizzical_mcclintock',
'id': '4bb79d8468f2f91a91022b4a7086744a6b3cdefab2a98f7efa178c9aff7ed246'
}
How can I get all the stats details properly using the Python Docker SDK?
If you want the stats of all the running containers, the command below will help you. Otherwise you need to calculate the stats yourself from the totals, which can be tricky.
import os
os.system("docker stats $(docker ps -q) --no-stream")
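If you want to stay within the Python Docker SDK, here is a minimal sketch. It assumes the detailed counters (memory, pids, blkio) are only populated while the container is still running, so it reads the stats stream before calling wait(); image and entrypoint are the variables from your question, and the volumes argument is omitted for brevity.
import docker

client = docker.from_env()
container = client.containers.run(image, entrypoint=entrypoint, detach=True)

# Read the stats stream while the container is running; decode=True yields
# parsed dicts instead of raw JSON lines.
for stat in container.stats(stream=True, decode=True):
    print(stat.get("memory_stats"), stat.get("cpu_stats"))
    container.reload()  # refresh container.status from the daemon
    if container.status != "running":
        break

container.wait()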
I want to create a Dagster app that creates an EMR cluster and adds a spark-submit step, but due to a lack of documentation and examples I can't figure out how to do that (Copilot also struggles with it :-)).
The idea is to create a scheduler with Dagster that creates an EMR cluster and runs a Scala Spark app as one of its steps.
Here's the code I have (it's not working correctly, but you may get a sense of what I was trying to do):
from dagster_shell import create_shell_command_op
from dagster_aws.emr.emr import EmrJobRunner
from dagster import graph, op
@op
def create_emr_cluster(context):
    emr_job_runner = EmrJobRunner('us-east-1', aws_access_key_id='ACCESS_KEY',
                                  aws_secret_access_key='SECRET_KEY')
    cluster_id = emr_job_runner.create_cluster()
    step_dict = emr_job_runner.construct_step_dict_for_command(
        'Spark Step',
        'spark-submit --class org.apache.spark.examples.SparkPi --deploy-mode cluster s3://my-bucket/spark-examples.jar stage'
    )
    emr_job_runner.add_job_flow_steps(None, cluster_id, [step_dict])

@graph
def my_graph():
    # a = create_shell_command_op('echo "hello, world!"', name="a")  # this will invoke spark-submit on an existing cluster
    # a()
    create_emr_cluster()
my_job = my_graph.to_job()
How can I do it?
You had most of your components set up correctly. You were only missing the EMR job flow settings, which define the application you want to use (on EMR), the core/task node setup, and so on.
More details here:
https://docs.aws.amazon.com/emr/latest/APIReference/API_RunJobFlow.html
Dagster's EmrJobRunner has a run_job_flow method which takes this input and creates a cluster.
Here is a sample code snippet:
from dagster_aws.emr import EmrJobRunner
REGION="us-east-1"
emr_cluster_config = {
"Applications": [
{
"Name": "Spark"
}
],
"JobFlowRole": "SomeRole",
"Instances": {
"Ec2SubnetId": "subnet-1",
"EmrManagedSlaveSecurityGroup": "sg-slave",
"EmrManagedMasterSecurityGroup": "sg-master",
"KeepJobFlowAliveWhenNoSteps": True,
"TerminationProtected": False,
"InstanceGroups": [
{
"InstanceCount": 1,
"EbsConfiguration": {
"EbsBlockDeviceConfigs": [
{
"VolumeSpecification": {
"SizeInGB": 32,
"VolumeType": "gp3"
},
"VolumesPerInstance": 2
}
]
},
"InstanceRole": "MASTER",
"InstanceType": "r6g.2xlarge",
"Name": "EMR Master"
},
{
"InstanceCount": 2,
"EbsConfiguration": {
"EbsBlockDeviceConfigs": [
{
"VolumeSpecification": {
"SizeInGB": 256,
"VolumeType": "gp3"
},
"VolumesPerInstance": 2
}
]
},
"InstanceRole": "CORE",
"InstanceType": "r6g.2xlarge",
"Name": "EMR Core"
},
{
"InstanceCount":2,
"EbsConfiguration": {
"EbsBlockDeviceConfigs": [
{
"VolumeSpecification": {
"SizeInGB": 256,
"VolumeType": "gp3"
},
"VolumesPerInstance": 2
}
]
},
"InstanceRole": "TASK",
"InstanceType": "r6g.2xlarge",
"Name": "EMR Task"
}
]
},
"StepConcurrencyLevel": 1,
"ReleaseLabel": "emr-5.36.0",
"LogUri": "s3n://<somebucket>/logs/",
"EbsRootVolumeSize": 32,
"ServiceRole": "emr-role",
"Name": "<cluster_name>"
}
emr = EmrJobRunner(region=REGION)
# This step creates the cluster
cluster_id = emr.run_job_flow(emr_cluster_config)
step_name = 'test_step'
step_cmd = ['ls', '/']
step_ids = emr.add_job_flow_steps(
cluster_id, [emr.construct_step_dict_for_command(step_name, step_cmd)]
)
You can also look at the test cases in the Dagster repo; they provide very good examples for the same.
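To tie this back to the original @op/@graph structure, here is a minimal sketch that reuses only the EmrJobRunner calls shown above; it assumes REGION and emr_cluster_config are defined as in the snippet, and the op name is an illustrative choice:
from dagster import graph, op
from dagster_aws.emr import EmrJobRunner

@op
def create_cluster_and_submit_spark(context):
    emr = EmrJobRunner(region=REGION)
    # Create the cluster from the job flow settings defined above.
    cluster_id = emr.run_job_flow(emr_cluster_config)
    # Add the spark-submit step from the question.
    step_cmd = [
        'spark-submit',
        '--class', 'org.apache.spark.examples.SparkPi',
        '--deploy-mode', 'cluster',
        's3://my-bucket/spark-examples.jar', 'stage',
    ]
    step_dict = emr.construct_step_dict_for_command('Spark Step', step_cmd)
    emr.add_job_flow_steps(cluster_id, [step_dict])

@graph
def my_graph():
    create_cluster_and_submit_spark()

my_job = my_graph.to_job()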
So, I want to change my info in a JSON file from Python, but I am having trouble.
My JSON file is just info that I want to edit later:
[
{
"codigo": 10,
"Nom_articulo": "jabon",
"valor": 2500,
"cantidad": 6,
"subtotal": 0,
"descuento": 0
},
{
"codigo": 20,
"Nom_articulo": "Crema",
"valor": 9800,
"cantidad": 4,
"subtotal": 0,
"descuento": 0
},
{
"codigo": 30,
"Nom_articulo": "Cepillo",
"valor": 6000,
"cantidad": 7,
"subtotal": 0,
"descuento": 0
},
{
"codigo": 40,
"Nom_articulo": "Servilletas",
"valor": 3000,
"cantidad": 2,
"subtotal": 0,
"descuento": 0
},
{
"codigo": 50,
"Nom_articulo": "Desodorante",
"valor": 5000,
"cantidad": 6,
"subtotal": 0,
"descuento": 0
}
]
I want to change the value of "subtotal" in all my dictionaries.
So basically what I did was:
for i in range(len(archivo_r)):
    precio = archivo_r[i]["valor"]
    cantidad = archivo_r[i]["cantidad"]
    subtotal = precio * cantidad
    print(archivo_r[i]["codigo"], " - ", archivo_r[i]["Nom_articulo"], " = ", str(subtotal))
    # store my subtotals in the json file
    print("subtotal", archivo_r[i]["subtotal"])
    archivo_r[i]["subtotal"] = subtotal
    # archivo_r[i]["subtotal"].append(subtotal)
    # print(archivo_r)
write_json(**XXXXX**)
This part of the code:
archivo_r[i]["subtotal"] = subtotal
does exactly what I need, but (and this could be very silly, I am a little lost here) I do not know how to use that to re-write my JSON file. I mean, I have the function to write it:
def write_json(info, nombre_archivo="productos.json"):
    with open(nombre_archivo, "w") as p:
        json.dump(info, p)
I need to pass the information to write_json(**XXXXX**). I have been trying to store my archivo_r[i]["subtotal"] = subtotal in a variable to pass it, and other things, but nothing works. I know I am doing something wrong but I am not sure how to solve it.
Once you're done processing the data, simply pass archivo_r to your write_json() function and you should be fine.
As an aside, you can iterate directly over the JSON objects like so:
for section in archivo_r:
precio = section["valor"]
...
You can then replace all instances of archivo_r[i] with section, or whatever you want to call the variable.
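Putting both suggestions together, a minimal sketch could look like this (it assumes archivo_r is loaded from productos.json with json.load; the file and key names are the ones from your question):
import json

def write_json(info, nombre_archivo="productos.json"):
    with open(nombre_archivo, "w") as p:
        json.dump(info, p)

with open("productos.json") as p:
    archivo_r = json.load(p)

for section in archivo_r:
    # Compute and store the subtotal directly on each dictionary.
    section["subtotal"] = section["valor"] * section["cantidad"]

# Pass the whole updated list back to write_json.
write_json(archivo_r)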
I am setting up an entire GCP architecture using Deployment Manager with the Python template structure.
I have tried to execute the script below:
'name': 'dataproccluster',
'type': 'dataproc.py',
'subnetwork': 'default',
'properties': {
'zone': ZONE_NORTH,
'region': REGION_NORTH,
'serviceAccountEmail': 'X#appspot.gserviceaccount.com',
'softwareConfig': {
'imageVersion': '1.4-debian9',
'properties': {
'dataproc:dataproc.conscrypt.provider.enable' : 'False'
}
},
'master': {
'numInstances': 1,
'machineType': 'n1-standard-1',
'diskSizeGb': 50,
'diskType': 'pd-standard',
'numLocalSsds': 0
},
'worker': {
'numInstances': 2,
'machineType': 'n1-standard-1',
'diskType': 'pd-standard',
'diskSizeGb': 50,
'numLocalSsds': 0
},
'initializationActions':[{
'executableFile': 'gs://dataproc-initialization-actions/python/pip-install.sh'
}],
'metadata': {
'PIP_PACKAGES':'requests_toolbelt==0.9.1 google-auth==1.6.31'
},
'labels': {
'environment': 'dev',
'data_type': 'X'
}
}
Which results in the following error:
Initialization action failed. Failed action 'gs://dataproc-initialization-actions/python/pip-install.sh',\
I would like to evaluate whether it is an error on my side or an API problem of some sort. I found Google tickets related to this topic covering CLI deployment, but they were marked as solved. I found nothing on the Deployment Manager side.
If it is an error on my side, what am I doing wrong?
I am trying to update the array ['media_details'] with a local image path after the image has been downloaded. However, using $push just added the local_url on top.
This is what ['media_details'] looks like:
"image_details": [
{
"processed": true,
"position": 0,
"seconds": "46",
"src_url": "https://xxxxx/1.jpg",
"image_fname": "1.jpg",
},
{
"processed": true,
"position": 1,
"seconds": "55",
"src_url": "https://xxxxx/2.jpg",
"image_fname": "2.jpg",
},
My code then downloads the image from the src_url, and I want to add the local image URL to ['media_details'].
job = mongo.db.JobProcess
job.update({'_id': db_id},
{'$push': {
'image_details': {
'local_url': img_local_file,
}
}})
This adds the local_url to the top of ['media_details'], like so:
{'local_url': '/bin/static/5432ec0f-ea53-4fe1-83e4-f78166d1b9a6/1.jpg'},
{'local_url': '/bin/static/5432ec0f-ea53-4fe1-83e4-f78166d1b9a6/2.jpg'},
{'processed': True, 'position': 0, 'seconds': '46', 'src_url': 'https://xxxxx1.jpg', 'image_fname': '1.jpg'}
What I want it to do is:
"image_details": [
{
"processed": true,
"position": 0,
"seconds": "46",
"src_url": "https://xxxxx/1.jpg",
"image_fname": "1.jpg",
"local_url": "/bin/static/5432ec0f-ea53-4fe1-83e4-f78166d1b9a6/1.jpg"
},
But which command ($set, $push, $addToSet) is best suited for updating this, and how do I implement it?
You need to update the image_details array item using the positional operator $. You will need a query that can uniquely identify the array item, perhaps by src_url:
job.update(
    {'$and': [
        {'_id': db_id},
        {'image_details.src_url': img_src_url}
    ]},
    {'$set': {'image_details.$.local_url': img_local_file}},
    multi=False
)
You need to use the positional update operator:
job.update_one(
    {
        '_id': db_id,
        'image_details.src_url': yourUrl,
    },
    {
        '$set': {
            'image_details.$.local_url': img_local_file
        }
    }
)
I'm pretty new to Python API calls. I have the JSON data sample below.
I can get data by calling, for example, (swell.*) or (swell.minBreakingHeight), and it returns all the swell data, no worries, so I am OK with a working request.
I can't seem to narrow it down with success, for example swell.primary.height.
Obviously the format here is incorrect and keeps returning [].
How do I get to that extra level?
[{
timestamp: 1366902000,
localTimestamp: 1366902000,
issueTimestamp: 1366848000,
fadedRating: 0,
solidRating: 0,
swell: {
minBreakingHeight: 1,
absMinBreakingHeight: 1.06,
maxBreakingHeight: 2,
absMaxBreakingHeight: 1.66,
unit: "ft",
components: {
combined: {
height: 1.1,
period: 14,
direction: 93.25,
compassDirection: "W"
},
primary: {
height: 1,
period: 7,
direction: 83.37,
compassDirection: "W"
},
Working with your data snippet:
data = [{
'timestamp': 1366902000,
'localTimestamp': 1366902000,
'issueTimestamp': 1366848000,
'fadedRating': 0,
'solidRating': 0,
'swell': {
'minBreakingHeight': 1,
'absMinBreakingHeight': 1.06,
'maxBreakingHeight': 2,
'absMaxBreakingHeight': 1.66,
'unit': "ft",
'components': {
'combined': {
'height': 1.1,
'period': 14,
'direction': 93.25,
'compassDirection': "W"
},
'primary': {
'height': 1,
'period': 7,
'direction': 83.37,
'compassDirection': "W"
}
}
}
}
]
In [54]: data[0]['timestamp']
Out[54]: 1366902000
In [55]: data[0]['swell']['components']['primary']['height']
Out[55]: 1
So, using your dot notation, you should be calling:
swell.components.primary.height
For further insight on parsing JSON files, refer to this other Stack Overflow question.
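If it helps to see that dotted path resolved in plain Python against the data list above, here is a small illustrative helper (get_by_path is a hypothetical name, not part of any API):
def get_by_path(obj, dotted_path):
    # Walk nested dicts following a dotted path like 'swell.components.primary.height'.
    for key in dotted_path.split("."):
        obj = obj[key]
    return obj

print(get_by_path(data[0], "swell.components.primary.height"))  # -> 1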