Using pymongo tailable cursors dies on empty collections

Hoping someone can help me understand if I'm seeing an issue or if I just don't understand mongodb tailable cursor behavior. I'm running mongodb 2.0.4 and pymongo 2.1.1.
Here is a script that demonstrates the problem.
#!/usr/bin/python
import sys
import time
import pymongo
MONGO_SERVER = "127.0.0.1"
MONGO_DATABASE = "mdatabase"
MONGO_COLLECTION = "mcollection"
mongodb = pymongo.Connection(MONGO_SERVER, 27017)
database = mongodb[MONGO_DATABASE]
if MONGO_COLLECTION in database.collection_names():
    database[MONGO_COLLECTION].drop()
print "creating capped collection"
database.create_collection(
    MONGO_COLLECTION,
    size=100000,
    max=100,
    capped=True
)
collection = database[MONGO_COLLECTION]
# Run this script with any parameter to add one record
# to the empty collection and see the code below
# loop correctly
#
if len(sys.argv[1:]):
    collection.insert(
        {
            "key" : "value",
        }
    )
# Get a tailable cursor for our looping fun
cursor = collection.find( {},
                          await_data=True,
                          tailable=True )
# This will catch ctrl-c and the error thrown if
# the collection is deleted while this script is
# running.
try:
    # The cursor should remain alive, but if there
    # is nothing in the collection, it dies after the
    # first loop. Adding a single record will
    # keep the cursor alive forever as I expected.
    while cursor.alive:
        print "Top of the loop"
        try:
            message = cursor.next()
            print message
        except StopIteration:
            print "MongoDB, why you no block on read?!"
            time.sleep(1)
except pymongo.errors.OperationFailure:
    print "Delete the collection while running to see this."
except KeyboardInterrupt:
    print "Ctrl-C Ya!"
    sys.exit(0)
print "and we're out"
# End
So if you look at the code, it is pretty simple to demonstrate the issue I'm having. When I run the code against an empty collection (properly capped and ready for tailing), the cursor dies and my code exits after one loop. Adding a first record in the collection makes it behave the way I'd expect a tailing cursor to behave.
Also, what is the deal with the StopIteration exception killing the cursor.next() waiting on data? Why can't the backend just block until data becomes available? I assumed the await_data would actually do something, but it only seems to keep the connection waiting a second or two longer than without it.
Most of the examples on the net show putting a second While True loop around the cursor.alive loop, but then when the script tails an empty collection, the loop just spins and spins wasting CPU time for nothing. I really don't want to put in a single fake record just to avoid this issue on application startup.

This is known behavior, and the 2 loops "solution" is the accepted practice to work around this case. In the case that the collection is empty, rather than immediately retrying and entering a tight loop as you suggest, you can sleep for a short time (especially if you expect that there will soon be data to tail).
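The two-loop pattern with a pause might be sketched roughly as follows. This is an illustration under assumptions, not pymongo's own code: `tail`, `open_cursor`, `handle`, `idle_sleep` and `max_reopens` are hypothetical names, and `open_cursor` is assumed to return a fresh tailable cursor each time it is called (for example by wrapping the `collection.find(...)` call from the question).

```python
import time


def tail(open_cursor, handle, idle_sleep=1.0, max_reopens=None):
    """Tail a capped collection, reopening the cursor whenever it dies.

    open_cursor: hypothetical zero-argument callable returning a new
                 tailable cursor (an iterable object with an .alive flag).
    handle:      hypothetical callback invoked once per document.
    max_reopens: optional cap on how many cursors are opened (None = forever).
    """
    opened = 0
    while max_reopens is None or opened < max_reopens:
        cursor = open_cursor()
        opened += 1
        # Inner loop: drain the cursor for as long as it stays alive.
        while cursor.alive:
            for doc in cursor:
                handle(doc)
            # Nothing to read right now; pause instead of spinning.
            time.sleep(idle_sleep)
        # The cursor died (as it does on an empty collection), so wait
        # a moment before opening a new one.
        time.sleep(idle_sleep)
```

The sleep after the inner loop is what keeps the outer loop from burning CPU while the collection is still empty; `idle_sleep` trades latency against wasted queries.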

Related

Check if a database connection is busy using python

I want to create a Database class which can create cursors on demand.
It must be possible to use the cursors in parallel (two or more cursor can coexist) and, since we can only have one cursor per connection, the Database class must handle multiple connections.
For performance reasons we want to reuse connections as much as possible and avoid creating a new connection every time a cursor is created:
whenever a request is made the class will try to find, among the opened connections, the first non-busy connection and use it.
A connection is still busy as long as the cursor has not been consumed.
Here is an example of such class:
class Database:
    ...
    def get_cursor(self, query):
        selected_connection = None
        # Find usable connection
        for con in self.connections:
            if not con.is_busy():  # <--- This is not PEP 249
                selected_connection = con
                break
        # If all connections are busy, create a new one
        if selected_connection is None:
            selected_connection = self._new_connection()
            self.connections.append(selected_connection)
        # Return cursor on query
        cur = selected_connection.cursor()
        cur.execute(query)
        return cur
However looking at the PEP 249 standard I cannot find any way to check whether a connection is actually being used or not.
Some implementations such as MySQL Connector offer ways to check whether a connection has still unread content (see here), however as far as I know those are not part of PEP 249.
Is there a way I can achieve what I described above with any PEP 249 compliant Python database API?

Perhaps you could use the status of the cursor to tell you if a cursor is being used. Let's say you had the following cursor:
new_cursor = new_connection.cursor()
new_cursor.execute(new_query)
and you wanted to see if that connection was available for another cursor to use. You might be able to do something like:
if new_cursor.rowcount == -1:
    another_new_cursor = new_connection.cursor()
    ...
Of course, all this really tells you is that the cursor hasn't executed anything yet since the last time it was closed. It could point to a cursor that is done (and therefore a connection that has been closed) or it could point to a cursor that has just been created or attached to a connection. Another option is to use a try/except block, something along the lines of:
try:
    another_new_cursor = new_connection.cursor()
except ConnectionError:  # not actually sure which error would go here, but you get the idea
    print("this connection is busy.")
Of course, you probably don't want to be spammed with printed messages but you can do whatever you want in that except block, sleep for 5 seconds, wait for some other variable to be passed, wait for user input, etc. If you are restricted to PEP 249, you are going to have to do a lot of things from scratch. Is there a reason you can't use external libraries?
EDIT: If you are willing to move outside of PEP 249, here is something that might work, but it may not be suitable for your purposes. If you make use of the mysql python library, you can take advantage of the is_connected method.
import mysql.connector
new_connection = mysql.connector.connect(host='myhost',
                                         database='myDB',
                                         user='me',
                                         password='myPassword')
# ...stuff happens...
if new_connection.is_connected():
    pass
else:
    another_new_cursor = new_connection.cursor()
    ...
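If you do need to stay within PEP 249, one possible approach is to track the busy state yourself instead of asking the connection. The sketch below is only an illustration under assumptions: `PooledCursor`, the `_busy` set and the `connect` argument are hypothetical names, and the standard-library `sqlite3` module stands in for whichever PEP 249 driver you actually use. A connection is treated as busy from the moment a cursor is handed out until that cursor is closed.

```python
import sqlite3


class PooledCursor:
    """Hypothetical wrapper that frees its connection when closed."""

    def __init__(self, pool, con, cur):
        self._pool, self._con, self._cur = pool, con, cur

    def __getattr__(self, name):
        # Delegate fetchone(), fetchall(), rowcount, ... to the real cursor.
        return getattr(self._cur, name)

    def __iter__(self):
        return iter(self._cur)

    def close(self):
        self._cur.close()
        self._pool._busy.discard(id(self._con))


class Database:
    def __init__(self, connect):
        self._connect = connect      # zero-argument factory for new connections
        self.connections = []
        self._busy = set()           # ids of connections with an open cursor

    def get_cursor(self, query):
        # Reuse the first connection that has no open cursor, if any.
        con = next((c for c in self.connections
                    if id(c) not in self._busy), None)
        if con is None:
            con = self._connect()
            self.connections.append(con)
        self._busy.add(id(con))
        cur = con.cursor()
        cur.execute(query)
        return PooledCursor(self, con, cur)
```

For example, `Database(lambda: sqlite3.connect(":memory:"))` opens a second connection only if `get_cursor` is called again before the first cursor has been closed. The caller has to close each cursor for its connection to be reused.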

Non-interfering printing from threads

I am doing some threading with a Python script and it spits out stuff in all kinds of orders. However, I want to print out a single "Remaining: x" at the end of each thread and then erase that line before the next print statement. So basically, I'm trying to implement a progress/status update that erases itself before the next print statement.
I have something like this:
for i in range(1,10):
    print "Something here"
    print "Remaining: x"
    sleep(5)
    sys.stdout.write("\033[F")
    sys.stdout.write("\033[K")
This works fine when you're printing this out just the way it is; however, as soon as you implement threading, the "Remaining" text doesn't get wiped out all the time and sometimes you get another "Something here" right before it wipes out the previous line.
Just trying to figure out the best way to get my progress/status text organized with multithreading.

Since your code doesn't include any threads it's difficult to give you specific advice about what you are doing wrong.
If it's output sequencing that bothers you, however, you should learn about locking, and have the threads share a lock. The idea would be to grab the lock (so nobody else can), send your output and flush the stream before releasing the lock.
However, the way the code is structured makes this difficult, since there is no way to guarantee that the cursor will always end up at a specific position when you have multiple threads writing output.

As the previous answer mentioned, you can use a mutex (RLock) object to serialize access.
import sys
import threading
from time import sleep
# global lock
stdout_lock = threading.RLock()
def log_stdout(*args):
    # Build the whole message first so it is written in a single call
    msg = "".join(args)
    with stdout_lock:
        sys.stdout.write(msg)
        sys.stdout.flush()
for i in range(1,10):
    log_stdout("Something here\n")
    log_stdout("Remaining: x\n")
    sleep(5)
    log_stdout("\033[F")
    log_stdout("\033[K")
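Locking each write on its own still lets another thread print between the status line and the escape codes that erase it. A possible refinement, sketched here under assumptions (the `StatusPrinter` class and its `log` method are hypothetical names, not part of any library), is to erase the old status, print the message, and redraw the status while holding the lock once:

```python
import sys
import threading


class StatusPrinter:
    """Hypothetical helper: one lock guards the log line and the status line."""

    def __init__(self, stream=None):
        self._stream = sys.stdout if stream is None else stream
        self._lock = threading.Lock()
        self._has_status = False

    def log(self, message, status):
        with self._lock:
            if self._has_status:
                # Move up one line and clear the previous status line.
                self._stream.write("\033[F\033[K")
            self._stream.write(message + "\n")
            self._stream.write(status + "\n")
            self._stream.flush()
            self._has_status = True
```

Each thread would then call something like `printer.log("Something here", "Remaining: 3")` on a shared instance, so the cursor is always just below the status line when the next call starts.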

AWS boto - Instance Status/Snapshot Status won't update Python While Loop

So I am creating a Python script with boto to allow the user to prepare to expand their Linux root volume and partition. During the first part of the script, I would like to have a While loop or something similar to make the script not continue until:
a) the instance has been fully stopped
b) the snapshot has finished being created.
Here are the code snippets for both of these:
Instance:
ins_prog = conn.get_all_instances(instance_ids=src_ins, filters={"instance-state":"stopped"})
while ins_prog == "[]":
    print src_ins + " is still shutting down..."
    time.sleep(2)
    if ins_prog != "[]":
        break
Snapshot:
snap_prog = conn.get_all_snapshots(snapshot_ids=snap_id, filters={"progress":"100"})
while snap_prog == "[]":
    print snap.update
    time.sleep(2)
    if snap_prog != "[]":
        print "done!"
        break
So when calling conn.get_all_instances and conn.get_all_snapshots they return an empty list if the filters show nothing, which is formatted like []. The problem is the While loop does not even run. It's as if it does not recognize [] as the string produced by the get_all functions.
If there is an easier way to do this, please let me know I am at a loss right now ):
Thanks!
Edit: Based on garnaat's help here is the follow up issue.
snap_prog = conn.get_all_snapshots(snapshot_ids=snap.id)[0]
print snap_prog
print snap_prog.id
print snap_prog.volume_id
print snap_prog.status
print snap_prog.progress
print snap_prog.start_time
print snap_prog.owner_id
print snap_prog.owner_alias
print snap_prog.volume_size
print snap_prog.description
print snap_prog.encrypted
Results:
Snapshot:snap-xxx
snap-xxx
vol-xxx
pending
2015-02-12T21:55:40.000Z
xxxx
None
50
Created by expandDong.py at 2015-02-12 21:55:39
False
Note how snap_prog.progress returns null, but snap_prog.status stays as 'pending' when being placed in a While loop.
SOLVED:
My colleague and I found out how to get the loop for the snapshot working.
snap = conn.create_snapshot(src_vol)
while snap.status != 'completed':
    snap.update()
    print snap.status
    time.sleep(5)
    if snap.status == 'completed':
        print snap.id + ' is complete.'
        break
The snap.update() call simply refreshes the variable snap with the most recent information, and snap.status then reports "pending" | "completed". I also had an issue with snap.status not showing the correct status of the snapshot according to the console. Apparently there is a significant lag between the Console and the SDK call. I had to wait ~4 minutes for the status to update to "completed" after the snapshot was shown as completed in the console.

If I wanted to check the state of a particular instance and wait until that instance reached some state, I would do this:
import time
import boto.ec2
conn = boto.ec2.connect_to_region('us-west-2') # or whatever region you want
instance = conn.get_all_instances(instance_ids=['i-12345678'])[0].instances[0]
while instance.state != 'stopped':
    time.sleep(2)
    instance.update()
The funny business with the get_all_instances call is necessary because that call returns a list of Reservation objects, each of which has an instances attribute that is a list of all matching instances. So, we are taking the first (and only) Reservation in the list and then getting the first (and only) Instance inside the reservation. You should probably put some error checking around that.
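The error checking could look roughly like the sketch below. The helper name `first_instance` is hypothetical, and it only assumes what is described above: a list of reservation-like objects that each expose an `instances` list.

```python
def first_instance(reservations):
    """Return the first instance from a get_all_instances()-style result.

    Raises LookupError instead of an opaque IndexError when nothing matched.
    """
    if not reservations:
        raise LookupError("no reservation matched the requested instance id")
    instances = reservations[0].instances
    if not instances:
        raise LookupError("the reservation contains no instances")
    return instances[0]
```

With this in place, the lookup would read `instance = first_instance(conn.get_all_instances(instance_ids=['i-12345678']))`.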
The snapshot can be handled in a similar way.
snapshot = conn.get_all_snapshots(snapshot_ids=['snap-12345678'])[0]
while snapshot.status != 'completed':
    time.sleep(2)
    snapshot.update()
The update() method on both objects queries EC2 for the latest state of the object and updates the local object with that state.

I will try to answer this generally first. So, you are querying a resource for its state. If a certain state is not met, you want to keep on querying/asking/polling the resource, until it is in the state you wish it to be. Obviously this requires you to actually perform the query within your loop. That is, in an abstract sense:
state = resource.query()
while state != desired_state:
    time.sleep(T)
    state = resource.query()
Think about how and why this works, in general.
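A runnable version of that abstract loop, with a timeout so that it cannot wait forever, might look like this. It is a minimal sketch: `poll_until` and its parameters are hypothetical names, and `query` stands for any zero-argument function that fetches the current state.

```python
import time


def poll_until(query, desired_state, interval=2.0, timeout=600.0):
    """Call query() repeatedly until it returns desired_state.

    Raises TimeoutError if the state is not reached within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    state = query()
    while state != desired_state:
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "still %r after %s seconds" % (state, timeout))
        time.sleep(interval)
        state = query()  # the query has to be repeated inside the loop
    return state
```

For the snapshot case, `query` could be a small function that calls `snap.update()` and then returns `snap.status`.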
Now, regarding your code and question, there are some uncertainties you need to figure out yourself. First of all, I am very sure that conn.get_all_instances() returns an empty list in your case and not actually the string '[]'. That is, your check should be for an empty list instead of for a certain string(*). Checking for an empty list in Python is as easy as not l:
l = give_me_some_list()
if not l:
    print "that list is empty."
The other problem in your code is that you expect too much of the language/architecture you are using here. You query a resource and store the result in ins_prog. After that, you keep on checking ins_prog, as if this would "magically" update via some magic process in the background. No, that is not happening! You need to periodically call conn.get_all_instances() in order to get updated information.
(*) This is documented here: http://boto.readthedocs.org/en/latest/ref/ec2.html#boto.ec2.connection.EC2Connection.get_all_instances -- the docs explicitly state "Return type: list". Not string.

Results of previous update returned on mysql select in Python3

I have one script running on a server that updates a list of items in a MySQL database to be processed by another script running on my desktop. The script runs in a loop, processing the list every 5 minutes (the server side script also runs on a 5 minute cycle). On the first loop, the script retrieves the current list (basic SELECT operation), on the second cycle, it gets the same version (not updated) list, on the third, it gets the list it should have gotten on the second pass. On every pass after the first, the SELECT operation returns the data from the previous UPDATE operation.
def mainFlow():
    activeList=[]
    d=()
    a=()
    b=()
    #cycleStart=datetime.datetime.now()
    cur = DBSV.cursor(buffered=True)
    cur.execute("SELECT list FROM active_list WHERE id=1")
    d=cur.fetchone()
    DBSV.commit()
    a=d[0]
    b=a[0]
    activeList=ast.literal_eval(a)
    print(activeList)
    buyList=[]
    clearOrders()
    sellDecide()
    if activeList:
        for i in activeList:
            a=buyCalculate(i)
            if a:
                buyList.append(i)
    print ('buy list: ',buyList)
    if buyList:
        buyDecide(buyList)
    cur.close()
    d=()
    a=()
    b=()
    activeList=[]
    print ('+++++++++++++END OF BLOCK+++++++++++++++')
state=True
while state==True:
    cycleStart=datetime.datetime.now()
    mainFlow()
    cycleEnd=datetime.datetime.now()
    wait=300-(cycleEnd-cycleStart).total_seconds()
    print ('wait=: ' +str(wait))
    if wait>0:
        time.sleep(wait)
As you can see, I am reinitializing all my variables, closing my cursor, and doing a commit() operation that is supposed to solve this sort of problem. I have tried plain cursors and cursors with buffered set to True and False, always with the same result.
When I run the exact same Select query from MySQL Workbench, the results returned are fine.
Baffled, and stuck on this for 2 days.

You're performing your COMMIT before your UPDATE/INSERT/DELETE transactions
Though a SELECT statement is, theoretically, DML it has certain differences with INSERT, UPDATE and DELETE in that it doesn't modify the data within the database. If you want to see the data that has been changed within another session then you must COMMIT it only after it's been changed. This is partially exacerbated by you closing the cursor after each loop.
You've gone too far in trying to solve this problem; there's no need to reset everything within the mainFlow() function (and I can't see a need for most of the variables).
def mainFlow():
    buyList = []
    cur = DBSV.cursor(buffered=True)
    cur.execute("SELECT list FROM active_list WHERE id = 1")
    activeList = cur.fetchone()[0]
    activeList = ast.literal_eval(activeList)
    clearOrders()
    sellDecide()
    for i in activeList:
        a = buyCalculate(i)
        if a:
            buyList.append(i)
    if buyList:
        buyDecide(buyList)
    DBSV.commit()
    cur.close()
while True:
    cycleStart = datetime.datetime.now()
    mainFlow()
    cycleEnd = datetime.datetime.now()
    wait = 300 - (cycleEnd - cycleStart).total_seconds()
    if wait > 0:
        time.sleep(wait)
I've removed a fair amount of unnecessary code (and added spaces), I've removed the reuse of variable names for different things and the declaration of variables that are overwritten immediately. This still isn't very OO though...
As we don't have detailed knowledge of exactly what clearOrders(), sellDecide() and buyCalculate() do, you might want to double-check this yourself.

Changes to a Gmail notification script

After a long search I've found this Python script that does what I need in order to get a real-time notification to my iOS app when a new email arrives. I usually write in Objective-C and this is the first time I'm dealing with Python. Before I try to set up and run the script I'd like to understand it a bit better.
This is the part that I'm not sure about:
# Because this is just an example, exit after 8 hours
time.sleep(8*60*60)
#finally:
# Clean up.
idler.stop()
idler.join()
M.close()
# This is important!
M.logout()
My questions:
Should I comment out time.sleep(8*60*60) if I want to keep the connection active at all times?
What's the use of the Clean up section? Do I need it if I want to keep the connection?
Why is M.logout() important?
The main question that includes all the above is: what changes (if any) do I need to make to this script in order for it to function without stopping or timing out?
Thanks

The script has started another thread, the actual work is done in this other thread.
For some reason the main thread is left without anything to do, that's why the author has put the time.sleep(8*60*60) to occupy it for a while.
If you want to keep the connection active at all times you need to uncomment the try:/finally:, see below.
If you are new to python beware that indentation is used to define blocks of code. The cleanup part might actually not be useful if you don't plan to stop the program, but with the try:/finally: the cleanup code will be executed even if you stop the program with Ctrl+C.
Not tested:
# Had to do this stuff in a try-finally, since some testing
# went a little wrong.....
try:
    # Set the following two lines to your creds and server
    M = imaplib2.IMAP4_SSL("imap.gmail.com")
    M.login(USER, PASSWORD)
    # We need to get out of the AUTH state, so we just select
    # the INBOX.
    M.select("INBOX")
    numUnseen = getUnseen()
    sendPushNotification(numUnseen)
    #print M.status("INBOX", '(UNSEEN)')
    # Start the Idler thread
    idler = Idler(M)
    idler.start()
    # Sleep forever, one minute at a time
    while True:
        time.sleep(60)
finally:
    # Clean up.
    idler.stop()
    idler.join()
    M.close()
    # This is important!
    M.logout()
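To see why the try:/finally: matters, here is a small stand-alone sketch, separate from the IMAP script. The names `run_with_cleanup`, `work` and `cleanup` are hypothetical. It shows that the finally block still runs when the work is interrupted, which is what happens when you press Ctrl+C during time.sleep().

```python
def run_with_cleanup(work, cleanup):
    """Run work(); always run cleanup(), even if work() is interrupted."""
    try:
        work()
    finally:
        # Runs on normal exit, on errors, and on KeyboardInterrupt (Ctrl+C).
        cleanup()
```

The exception still propagates after the cleanup has run, so the program exits as usual once the connection has been closed and logged out.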
