I have code that creates a snapshot and then checks whether it is done. I wrote the following, but for some reason it doesn't update the state variable, and the while loop keeps printing the same thing even after the snapshot has completed.
Here is the code:
def call_creater():
    regions = ['eu-central-1']
    for region in regions:
        ec2 = boto3.resource('ec2', region, aws_access_key_id=ACCESS_KEY, aws_secret_access_key=SECRET_KEY)
        snapshot = ec2.create_snapshot(VolumeId='vol-f9e7d220', Description='fra01-he-trial-ansible01')
        while snapshot.state != 'completed':
            print snapshot.state
            print "Snapshot under creation"
            time.sleep(10)
        else:
            print "snapshot READY"
OUTPUT:
pending
Snapshot under creation
pending
Snapshot under creation
pending
Snapshot under creation
pending
Snapshot under creation
This just keeps printing "Snapshot under creation" even though the snapshot completes. I believe the reason is that I am not updating my state variable. Please help me figure out how.
snapshot = ec2.create_snapshot(VolumeId='vol-f9e7d220', Description='fra01-he-trial-ansible01')
This line is only executed once, and at that instant the state is "pending". You are not updating the snapshot variable anywhere inside your while loop, so you have to re-check the snapshot's state on each iteration with something like this:
snapshot = conn.get_all_snapshots(snapshot_ids=[<YOUR SNAPSHOT ID>])[0]
Check the boto library for how to get the state of a snapshot by its ID.
As Pratik mentioned, your statement is only executed once and never updated. You can refresh your resource with .load(). Even better, though, I recommend you use the waiter. It handles all the waiting logic for you and returns when your snapshot is completed. For that resource, you would use snapshot.wait_until_completed().
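Here is a minimal sketch of both approaches, reusing the volume ID from the question (credentials and region handling are assumed to match the original code):

import time
import boto3

ec2 = boto3.resource('ec2', region_name='eu-central-1')
snapshot = ec2.create_snapshot(VolumeId='vol-f9e7d220',
                               Description='fra01-he-trial-ansible01')

# Option 1: poll manually, refreshing the resource on every pass
while snapshot.state != 'completed':
    print(snapshot.state)
    time.sleep(10)
    snapshot.load()  # re-fetch the snapshot's attributes from the API

# Option 2: let boto3 handle the polling for you
snapshot.wait_until_completed()
print("snapshot READY")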
We run our experiment on AWS spot instances. Sometimes the experiments are stopped, and we would prefer to continue logging to the same run. How can you set the run-id of the active run?
Something like this pseudocode (not working):
if new:
    mlflow.start_run(experiment_id=1, run_name=x)
else:
    mlflow.set_run(run_id)
You can pass the run_id directly to start_run:
mlflow.start_run(experiment_id=1,
                 run_name=x,
                 run_id=<run_id_of_interrupted_run>  # pass None to start a new run
                 )
Of course, you have to store the run_id somewhere for this. You can get it from run.info.run_id.
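For example (a sketch; the experiment ID and the file path are placeholders, and in practice you would persist the ID somewhere that survives the spot interruption, such as S3 or EFS):

import mlflow

# First launch: start a fresh run and persist its ID
with mlflow.start_run(experiment_id="1", run_name="spot-experiment") as run:
    with open("/mnt/shared/run_id.txt", "w") as f:  # placeholder path
        f.write(run.info.run_id)
    mlflow.log_metric("loss", 0.9)

# After the spot instance is replaced: resume the same run
with open("/mnt/shared/run_id.txt") as f:
    run_id = f.read().strip()
with mlflow.start_run(run_id=run_id):
    mlflow.log_metric("loss", 0.42)  # logged to the interrupted run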
I am trying to get a python program to continuously run until a certain aws log is registered and printed. It is supposed to:
Run indefinitely even if no events happen
List whatever events occurs (in my case a log stating that the task is finished)
Stop running
The Python command looks like this: python3 watch_logs.py <log-source> --start=15:00:00
The logs are working fine, and the Python script can print them between given time frames as long as they already exist. A continuously running task writes events to the log, and the Python script should filter out the events I am looking for and print them.
But when I run the script, it won't print the event even though I can see the log entry appear in the file. If I kill the process and run it again with the same timestamp, it finds the log entry and ends the script like it should.
the code is fairly short:
logs = get_log_events(
    log_group=log_group,
    start_time=start_time,
    end_time=end_time
)

while True:
    for event in logs:
        print(event['message'].rstrip())
        sys.exit("Task complete")
Any insight into why this is happening would help a lot. I am fairly new to Python.
The value in the logs variable is stale by the time the log is updated; you need to refresh it. For example, if you used logs = myfile.read() at the start of your script, the value in logs would be a snapshot of the file at that moment.
Try storing event['message'].rstrip() in a variable and checking with an if statement if it corresponds to the log you want to find
If you don't want to read the file each time through the loop, you should have a look at pygtail (https://pypi.org/project/pygtail/).
I was overthinking the problem. I moved the logs assignment inside the loop so it is re-evaluated on each cycle:
while True:
    logs = get_log_events(
        log_group=log_group,
        start_time=start_time,
        end_time=end_time
    )
    for event in logs:
        print(event['message'].rstrip())
        sys.exit("Task complete")
I have some simple lambda functions that are always getting fired up to 4x on almost every S3 Object Created (All) trigger and I can't seem to figure out why.
The files are relatively small right now, only a few KB, coming from Kinesis Firehose.
In the screenshot below, everything with "File:" at the beginning of the message is a duplicate call on the same file. The "Uploading:" part is where the function uploads the file to a different folder watched by another Lambda function (that one executes multiple times too!). That only happened once in the screenshot, but I've seen it happen multiple times.
I added some code so that when the function executes, it first checks an S3 folder for a blank marker file to see whether the object has already been processed, and proceeds only if not. This catches some of the duplicate executions. The problem is that sometimes the function executes twice within the same millisecond, so this check is obviously no good.
def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.unquote_plus(event['Records'][0]['s3']['object']['key']).decode('utf8')
    print "File: " + bucket + "/" + key
    exists = False
    tmp_f = key.replace('firehose/', '')
    tmp_f = tmp_f.split('/')
    tmp_f_date = tmp_f[0] + "-" + tmp_f[1] + "-" + tmp_f[2]
    tmp_f_name = "folder/" + tmp_f_date + "/" + tmp_f[0] + "-" + tmp_f[1] + "-" + tmp_f[2] + "-" + tmp_f[3] + "____" + (tmp_f[4].replace('.gz', '.txt'))
    try:
        s3_client.download_file(bucket, tmp_f_name, '/tmp/tmpf')
        if os.path.isfile('/tmp/tmpf'):
            exists = True
            print "--> File exists..."
    except:
        print "--> New file..."
        pass
    if exists == True:
        return False
    open('/tmp/tmpf', 'a').close()
    s3_client.upload_file('/tmp/tmpf', bucket, tmp_f_name)
    return process_file()  # returns True
Doing some research, I'm not finding a solid solution. Some say this is the nature of Lambda's "at least once" execution and that you should keep track of the request ID, but we're seeing executions fire within a millisecond or two of each other, so we'd need something like a Redis instance just to keep track? The S3 file method above is apparently too slow.
This doesn't seem optimal, so I'm HOPING I'm missing a line of code or a callback? Someone please help :)
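For what it's worth, here is a sketch of the "keep track" bookkeeping using an atomic DynamoDB conditional write instead of an S3 marker file (the table name is hypothetical). Because the write is conditional, two near-simultaneous invocations cannot both claim the same object:

import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client('dynamodb')

def already_processed(bucket, key):
    # Try to atomically claim this object; the put fails if the item
    # already exists, i.e. another invocation claimed it first.
    try:
        dynamodb.put_item(
            TableName='processed-objects',  # hypothetical table
            Item={'object': {'S': bucket + '/' + key}},
            ConditionExpression='attribute_not_exists(#o)',
            ExpressionAttributeNames={'#o': 'object'},
        )
        return False  # we won the race; safe to process
    except ClientError as e:
        if e.response['Error']['Code'] == 'ConditionalCheckFailedException':
            return True  # a duplicate invocation got there first
        raise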
So I am creating a Python script with boto to let the user prepare to expand their Linux root volume and partition. During the first part of the script, I would like a while loop or something similar so the script does not continue until:
a) the instance has fully stopped
b) the snapshot has finished creating
Here are the code snippets for both of these:
Instance:
ins_prog = conn.get_all_instances(instance_ids=src_ins, filters={"instance-state":"stopped"})

while ins_prog == "[]":
    print src_ins + " is still shutting down..."
    time.sleep(2)
    if ins_prog != "[]":
        break
Snapshot:
snap_prog = conn.get_all_snapshots(snapshot_ids=snap_id, filters={"progress":"100"})

while snap_prog == "[]":
    print snap.update
    time.sleep(2)
    if snap_prog != "[]":
        print "done!"
        break
When conn.get_all_instances and conn.get_all_snapshots match nothing, they return an empty list, which is displayed as []. The problem is the while loop does not even run; it's as if it does not recognize [] as the string produced by the get_all functions.
If there is an easier way to do this, please let me know. I am at a loss right now :(
Thanks!
Edit: Based on garnaat's help, here is the follow-up issue.
snap_prog = conn.get_all_snapshots(snapshot_ids=snap.id)[0]
print snap_prog
print snap_prog.id
print snap_prog.volume_id
print snap_prog.status
print snap_prog.progress
print snap_prog.start_time
print snap_prog.owner_id
print snap_prog.owner_alias
print snap_prog.volume_size
print snap_prog.description
print snap_prog.encrypted
Results:
Snapshot:snap-xxx
snap-xxx
vol-xxx
pending
2015-02-12T21:55:40.000Z
xxxx
None
50
Created by expandDong.py at 2015-02-12 21:55:39
False
Note how snap_prog.progress comes back empty, while snap_prog.status stays as 'pending' when polled in a while loop.
SOLVED:
My colleague and I found out how to get the snapshot loop working:
snap = conn.create_snapshot(src_vol)

while snap.status != 'completed':
    snap.update()
    print snap.status
    time.sleep(5)
    if snap.status == 'completed':
        print snap.id + ' is complete.'
        break
The snap.update() call refreshes the snap variable with the most recent information, and snap.status then returns "pending" or "completed". I also had an issue with snap.status not matching the status of the snapshot in the console; apparently there is significant lag time between the console and the SDK call. I had to wait ~4 minutes for the status to update to "completed" after the snapshot showed as completed in the console.
If I wanted to check the state of a particular instance and wait until that instance reached some state, I would do this:
import time
import boto.ec2

conn = boto.ec2.connect_to_region('us-west-2')  # or whatever region you want
instance = conn.get_all_instances(instance_ids=['i-12345678'])[0].instances[0]

while instance.state != 'stopped':
    time.sleep(2)
    instance.update()
The funny business with the get_all_instances call is necessary because that call returns a Reservation object which, in turn, has an instances attribute that is a list of all matching instances. So, we are taking the first (and only) Reservation in the list and then getting the first (and only) Instance inside the reservation. You should probably put some error checking around that.
The snapshot can be handled in a similar way.
snapshot = conn.get_all_snapshots(snapshot_ids=['snap-12345678'])[0]

while snapshot.status != 'completed':
    time.sleep(2)
    snapshot.update()
The update() method on both objects queries EC2 for the latest state of the object and updates the local object with that state.
I will try to answer this generally first. So, you are querying a resource for its state. If a certain state is not met, you want to keep on querying/asking/polling the resource, until it is in the state you wish it to be. Obviously this requires you to actually perform the query within your loop. That is, in an abstract sense:
state = resource.query()
while state != desired_state:
    time.sleep(T)
    state = resource.query()
Think about how and why this works, in general.
Now, regarding your code and question, there are some uncertainties you need to figure out yourself. First of all, I am very sure that conn.get_all_instances() returns an empty list in your case and not actually the string '[]'. That is, your check should be for an empty list instead of for a certain string(*). Checking for an empty list in Python is as easy as not l:
l = give_me_some_list()
if not l:
    print "that list is empty."
The other problem in your code is that you expect too much of the language/architecture you are using. You query a resource once and store the result in ins_prog. After that, you keep checking ins_prog as if it would "magically" update via some process in the background. No, that is not happening! You need to periodically call conn.get_all_instances() to get updated information.
(*) This is documented here: http://boto.readthedocs.org/en/latest/ref/ec2.html#boto.ec2.connection.EC2Connection.get_all_instances -- the docs explicitly state "Return type: list". Not string.
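Putting both points together, the instance wait from the question might look like the sketch below (note that the EC2 filter is presumably instance-state-name rather than instance-state, and conn is the boto EC2 connection from the question):

import time

def wait_until_stopped(conn, src_ins):
    # Re-query inside the loop and test for an empty list,
    # not for the string "[]"
    while True:
        reservations = conn.get_all_instances(
            instance_ids=[src_ins],
            filters={"instance-state-name": "stopped"})
        if reservations:  # non-empty list: the instance is stopped
            break
        print(src_ins + " is still shutting down...")
        time.sleep(2)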
Hoping someone can help me understand whether I'm seeing an issue or just don't understand MongoDB tailable cursor behavior. I'm running MongoDB 2.0.4 and pymongo 2.1.1.
Here is a script that demonstrates the problem.
#!/usr/bin/python
import sys
import time
import pymongo

MONGO_SERVER = "127.0.0.1"
MONGO_DATABASE = "mdatabase"
MONGO_COLLECTION = "mcollection"

mongodb = pymongo.Connection(MONGO_SERVER, 27017)
database = mongodb[MONGO_DATABASE]

if MONGO_COLLECTION in database.collection_names():
    database[MONGO_COLLECTION].drop()

print "creating capped collection"
database.create_collection(
    MONGO_COLLECTION,
    size=100000,
    max=100,
    capped=True
)
collection = database[MONGO_COLLECTION]

# Run this script with any parameter to add one record
# to the empty collection and see the code below
# loop correctly
#
if len(sys.argv[1:]):
    collection.insert(
        {
            "key" : "value",
        }
    )

# Get a tailable cursor for our looping fun
cursor = collection.find({},
                         await_data=True,
                         tailable=True)

# This will catch ctrl-c and the error thrown if
# the collection is deleted while this script is
# running.
try:
    # The cursor should remain alive, but if there
    # is nothing in the collection, it dies after the
    # first loop. Adding a single record will
    # keep the cursor alive forever as I expected.
    while cursor.alive:
        print "Top of the loop"
        try:
            message = cursor.next()
            print message
        except StopIteration:
            print "MongoDB, why you no block on read?!"
            time.sleep(1)
except pymongo.errors.OperationFailure:
    print "Delete the collection while running to see this."
except KeyboardInterrupt:
    print "Ctrl-C Ya!"
    sys.exit(0)

print "and we're out"
# End
So if you look at the code, it is pretty simple to demonstrate the issue I'm having. When I run it against an empty collection (properly capped and ready for tailing), the cursor dies and the script exits after one loop. Adding a first record to the collection makes it behave the way I'd expect a tailing cursor to behave.
Also, what is the deal with the StopIteration exception killing the cursor.next() wait for data? Why can't the backend just block until data becomes available? I assumed await_data would actually do something, but it only seems to keep the connection waiting a second or two longer than without it.
Most of the examples on the net put a second while True loop around the cursor.alive loop, but then when the script tails an empty collection, the loop just spins and spins, wasting CPU for nothing. I really don't want to insert a single fake record just to avoid this issue on application startup.
This is known behavior, and the two-loop "solution" is the accepted practice for this case. When the collection is empty, rather than immediately retrying and entering a tight loop, you can sleep for a short time (especially if you expect that there will soon be data to tail).
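A sketch of that pattern, reusing the collection from the question (the sleep lengths are arbitrary):

import time

while True:
    # On an empty capped collection the first cursor is dead on
    # arrival, so re-create it each time it dies.
    cursor = collection.find({}, await_data=True, tailable=True)
    while cursor.alive:
        try:
            print(cursor.next())
        except StopIteration:
            time.sleep(1)  # nothing new yet; back off instead of spinning
    time.sleep(1)  # cursor died (e.g. empty collection); avoid a tight loop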