Python: Script works, but seems to deadlock after some time

I have the following script, which is working for the most part (link to PasteBin). The script's job is to start a number of threads, each of which starts a subprocess with Popen. The output from each subprocess looks like this:
1
2
3
.
.
.
n
Done
Basically the subprocess is transferring 10M records from tables in one database to different tables in another database, with a lot of data massaging/manipulation in between because of the different schemas. If the subprocess fails at any time in its execution (bad records, duplicate primary keys, etc.), or if it completes successfully, it will output "Done\n". If there are no more records to select for transfer, it will output "NO DATA\n".
My intent was to create my script "tableTransfer.py" which would spawn a number of these processes, read their output, and in turn output information such as number of updates completed, time remaining, time elapsed, and number of transfers per second.
I started running the process last night and checked in this morning to see that it had deadlocked. There were no subprocesses running, there were still records to be updated, and the script had not exited. It was simply sitting there, no longer outputting the current information, because no subprocesses were running to update the total number complete, which is what controls updates to the output. This is running on OS X.
I am looking for three things:
I would like to get rid of the possibility of this deadlock occurring so I don't need to check in on it as frequently. Is there some issue with locking?
Am I doing this in a bad way (a gThreading variable to control the loop that spawns additional threads, etc.)? I would appreciate some suggestions for improving my overall methodology.
How should I handle a Ctrl-C exit? Right now I need to kill the process, but I assume I should be able to use the signal module (or something similar) to catch the signal and shut down the threads; is that right?
I am not sure whether I should be pasting my entire script here, since I usually just paste snippets. Let me know if I should paste it here as well.

You have a few places in your script (lines 97 and 99) where you return without releasing your locks. This could cause the deadlock. This is where try/finally blocks can help you a lot, as they ensure that release is called no matter how the function exits.
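For illustration, here is a minimal sketch of that pattern; the lock and counter names are made up for this example and are not taken from the PasteBin script:

import threading

# gRecordsLock and gRecordsComplete are illustrative names only.
gRecordsLock = threading.Lock()
gRecordsComplete = 0

def add_progress(count):
    global gRecordsComplete
    gRecordsLock.acquire()
    try:
        if count <= 0:
            return  # the finally block still releases the lock
        gRecordsComplete += count
    finally:
        gRecordsLock.release()

# Equivalent and shorter: a with-block releases the lock on return or exception.
def add_progress_with(count):
    global gRecordsComplete
    with gRecordsLock:
        if count <= 0:
            return
        gRecordsComplete += count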

Related

How to identify and kill the child processes which were created during an API call whose parent processes have been killed? (Python code required)

I have created an API in Python using Tornado. For a given URL, it returns a summary of the data at that URL. There are several processes which are still there (not running, but consuming space). For example, in the attached image I can see 'chrome_crashpad', which is not running but still consuming space. So, I am looking for Python code that makes sure that, for each API hit, the processes which were created are cleaned up at the end, so that they don't affect the server over a longer period of time. In other words, the junk processes whose parents have been killed should also be identified and deleted.
I tried and was unable to identify whether these processes are junk processes or not. I was also unable to identify whether a process whose TIME shows 00:00:00 is a parent process or not. I need more clarity on this. The motive is to delete all the junk and zombie processes so that the server remains stable over a longer duration. (A screenshot of the processes shown on my server was attached.)
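No answer is recorded here, but one common approach (a sketch only, and it assumes the third-party psutil package, which the question does not mention) is to snapshot the children of the current process before doing the work and terminate any new children that remain afterwards:

import psutil

def kill_leftover_children(pids_before):
    """Terminate child processes spawned after the snapshot was taken."""
    me = psutil.Process()
    leftovers = [c for c in me.children(recursive=True) if c.pid not in pids_before]
    for child in leftovers:
        try:
            child.terminate()
        except psutil.NoSuchProcess:
            pass
    # Give them a moment to exit, then force-kill anything still alive.
    gone, alive = psutil.wait_procs(leftovers, timeout=3)
    for child in alive:
        child.kill()

def handle_api_request():
    pids_before = {c.pid for c in psutil.Process().children(recursive=True)}
    try:
        pass  # ... work that may spawn headless browsers, crashpad handlers, etc. ...
    finally:
        kill_leftover_children(pids_before)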

exiting a program with a cached exit code

I have a "healthchecker" program that calls a "prober" every 10 seconds to check if a service is running. If the prober exits with return code 0, the healthchecker considers the tested service fine. Otherwise, it considers it down.
I can't change the healthchecker (I can't make it check with a bigger interval, or using a better communication protocol than spawning a process and checking its exit code).
That said, I don't want to really probe the service every 10 seconds because it's overkill. I just want to probe it every minute.
My solution to that is to make the prober keep a "cache" of the last answer valid for 1 minute, and then just really probe when this cache expires.
That seems fine, but I'm having trouble thinking of a decent approach, considering the program must exit (to return an exit code). My best bet so far would be to transform my prober into a daemon (which would keep the cache in memory) and create a client that just queries it and exits with its response, but it seems like too much work (dealing with threads, and so on).
Another approach would be to use SQLite/memcached/redis.
Any other ideas?
Since no one has really proposed anything I'll drop my idea here. If you need an example let me know and I'll include one.
The easiest thing to do would be to serialize a dictionary that contains the system health and the last time.time() it was checked. At the beginning of your program, unpickle the dictionary and check the time; if less than your 60-second interval has passed, exit with the cached result. Otherwise, check the health like normal and cache it (with the time).
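A minimal sketch of that idea (the cache path, the probe_service() helper, and the exit-code convention are assumptions for illustration, not part of the original answer):

import pickle
import sys
import time

CACHE_FILE = "/tmp/prober_cache.pkl"  # assumed cache location
CACHE_TTL = 60                        # seconds the cached answer stays valid

def probe_service():
    """Placeholder for the real probe; return the exit code (0 = healthy)."""
    return 0

def main():
    # Reuse a recent cached result if one exists.
    try:
        with open(CACHE_FILE, "rb") as f:
            cache = pickle.load(f)
        if time.time() - cache["checked_at"] < CACHE_TTL:
            sys.exit(cache["exit_code"])
    except (OSError, EOFError, pickle.UnpicklingError, KeyError):
        pass  # no usable cache; fall through to a real probe

    exit_code = probe_service()
    with open(CACHE_FILE, "wb") as f:
        pickle.dump({"checked_at": time.time(), "exit_code": exit_code}, f)
    sys.exit(exit_code)

if __name__ == "__main__":
    main()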

Task queue for deferred tasks in GAE with python

I'm sorry if this question has in fact been asked before. I've searched around quite a bit and found pieces of information here and there but nothing that completely helps me.
I am building an app on Google App Engine in Python that lets a user upload a file, which is then processed by a piece of Python code, and the resulting processed file gets sent back to the user in an email.
At first I used a deferred task for this, which worked great. Over time I've come to realize that, since the processing can take more than the 10 minutes I have before I hit the DeadlineExceededError, I need to be more clever.
I therefore started to look into task queues, wanting to make a queue that processes the file in chunks, and then piece everything together at the end.
My present code for creating the single deferred task looks like this:
_ = deferred.defer(transform_function, filename, from, to, email)
so that the transform_function code gets the values of filename, from, to and email and sets off to do the processing.
Could someone please enlighten me as to how I turn this into a linear chain of tasks that get acted on one after the other? I have read all documentation on Google app engine that I can think about, but they are unfortunately not written in enough detail in terms of actual pieces of code.
I see references to things like:
taskqueue.add(url='/worker', params={'key': key})
but since I don't have a url for my task, but rather a transform_function() implemented elsewhere, I don't see how this applies to me…
Many thanks!
You can just keep calling deferred to run your task when you get to the end of each phase.
Other queues just allow you to control the scheduling and rate, but work the same.
I track the elapsed time in the task, and when I get near the end of the processing window the code stops what it is doing and calls defer for the next task in the chain, which either continues where the previous one left off or starts the next step, depending on whether the job is a discrete set of steps or one continuous chunk of work. This was all written back when tasks could only run for 60 seconds.
However, the problem you will face (it doesn't matter whether it's a normal task queue or deferred) is that each stage could fail for some reason and then be re-run, so each phase must be idempotent.
For long-running chained tasks, I construct an entity in the datastore that holds the description of the work to be done and tracks the processing state for the job; then you can just keep re-running the same task until completion. On completion it marks the job as done.
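A minimal sketch of that pattern (the TransferJob model, process_chunk(), and the time budget are illustrative assumptions, not code from the answer):

import time
from google.appengine.ext import deferred, ndb

TIME_BUDGET = 9 * 60  # stop safely before the roughly 10-minute task deadline

class TransferJob(ndb.Model):
    """Tracks how far the long-running job has progressed (illustrative model)."""
    cursor = ndb.IntegerProperty(default=0)
    total = ndb.IntegerProperty(required=True)
    done = ndb.BooleanProperty(default=False)

def process_chunk(job_key, filename, email):
    start = time.time()
    job = job_key.get()
    while job.cursor < job.total:
        if time.time() - start > TIME_BUDGET:
            # Out of time: persist progress and chain the next task.
            job.put()
            deferred.defer(process_chunk, job_key, filename, email)
            return
        # ... transform one idempotent slice of the file here ...
        job.cursor += 1
    job.done = True
    job.put()
    # ... email the finished file to the user ...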
To avoid the 10-minute timeout, you can direct the request to a backend or a B-type module using the "_target" param.
BTW, is there any reason you need to process the chunks sequentially? If all you need is some notification upon completion of all chunks (so you can "piece everything together at the end"), you can implement it in various ways. For example, each deferred task for a chunk can decrement a shared datastore counter (read the state, decrement, and update, all in the same transaction) that was initialized with the number of chunks. If the datastore update was successful and the counter has reached zero, you can proceed with combining all the pieces together. An alternative to using deferred that would simplify the suggested workflow is pipelines (https://code.google.com/p/appengine-pipeline/wiki/GettingStarted).
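A minimal sketch of that counter idea (ChunkCounter, process_chunk(), combine_pieces(), and start_job() are illustrative names, not from the answer), using an ndb transaction so the read-decrement-write is atomic:

from google.appengine.ext import deferred, ndb

class ChunkCounter(ndb.Model):
    """Shared countdown of chunks still outstanding (illustrative model)."""
    remaining = ndb.IntegerProperty(required=True)

@ndb.transactional
def _decrement(counter_key):
    # Read, decrement, and write in one transaction so concurrent chunks
    # cannot lose updates.
    counter = counter_key.get()
    counter.remaining -= 1
    counter.put()
    return counter.remaining

def combine_pieces():
    pass  # ... piece everything together and email the result ...

def process_chunk(counter_key, chunk_id):
    # ... process this chunk idempotently ...
    if _decrement(counter_key) == 0:
        deferred.defer(combine_pieces)  # the last chunk just finished

def start_job(num_chunks):
    counter_key = ChunkCounter(remaining=num_chunks).put()
    for chunk_id in range(num_chunks):
        deferred.defer(process_chunk, counter_key, chunk_id)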

Asynchronous listening/iteration of pipes in python

I'm crunching a tremendous amount of data and, since I have a 12-core server at my disposal, I've decided to split the work by using the multiprocessing library. The way I'm trying to do this is by having a single parent process that dishes out work evenly to multiple worker processes, and another that acts as a collector/funnel for all the completed work, to be moderately processed for final output. Having done something similar to this before, I'm using Pipes because they are crazy fast in contrast to managed queues.
Sending data out to the workers using the pipes is working fine. However, I'm stuck on efficiently collecting the data from the workers. In theory, the work being handed out will be processed at the same pace and they will all get done at the same time. In practice, this never happens. So, I need to be able to iterate over each pipe to do something, but if there's nothing there, I need it to move on to the next pipe and check if anything is available for processing. As mentioned, it's on a 12 core machine, so I'll have 10 workers funneling down to one collection process.
The workers use the following to read from their pipe (called WorkerRadio)
for Message in iter(WorkerRadio.recv, 'QUIT'):
    # Crunch numbers & perform tasks here...
    CollectorRadio.send(WorkData)
WorkerRadio.send('Quitting')
So, they sit there looking at the pipe until something comes in. As soon as they get something they start doing their thing. Then fire it off to the data collection process. If they get a quit command, they acknowledge and shut down peacefully.
As for the collector, I was hoping to do something similar but instead of just 1 pipe (radio) there would be 10 of them. The collector needs to check all 10, and do something with the data that comes in. My first try was doing something like the workers...
i = 0
for Message in iter(CollectorRadio[i].recv, 'QUIT'):
    # Crunch numbers & perform tasks here...
    if i < NumOfRadios:
        i += 1
    else:
        i = 0
CollectorRadio.send('Quitting')
That didn't cut it, and I tried a couple of other variations without success too. I either end up with syntax errors or, like the above, get stuck on the first radio because the index never changes. I looked into having all the workers talk into a single pipe, but the Python documentation explicitly states that "data in a pipe may become corrupted if two processes (or threads) try to read from or write to the same end of the pipe at the same time."
As I mentioned, I'm also worried about some processes going slower than the others and holding up progress. If at all possible, I would like something that doesn't wait around for data to show up (ie. check and move on if nothing's there).
Any help on this would be greatly appreciated. I've seen some use of managed queues that might allow this to work; but, from my testing, managed queues are significantly slower than pipes, and I can use as much performance on this as I can muster.
SOLUTION:
Based on pajton's post, here's what I did to make it work...
# create a list of pipes (labeled as radios)
TheRadioList = [CollectorRadio[i] for i in range(NumberOfRadios)]

while True:
    # check for data on the pipes/radios
    TheTransmission, Junk1, Junk2 = select.select(TheRadioList, [], [])
    # find out who sent the data (which pipe/radio)
    for TheSender in TheTransmission:
        # read the data from the pipe
        TheMessage = TheSender.recv()
        # crunch numbers & perform tasks here...
If you are using standard system pipes, then you can use the select system call to query which descriptors have data available. By default, select will block until at least one of the passed descriptors is ready:
read_pipes = [pipe_fd0, pipe_fd1, ...]
while True:
    read_fds, write_fds, exc_fds = select.select(read_pipes, [], [])
    for read_fd in read_fds:
        # read from the read_fd pipe descriptor
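For completeness, here is a small self-contained sketch of the same pattern with multiprocessing.Pipe (the worker count and payloads are made up). On Unix-like systems, Connection objects can be passed straight to select.select because they expose fileno():

import select
from multiprocessing import Pipe, Process

def worker(conn, worker_id):
    for n in range(3):
        conn.send((worker_id, n * n))  # stand-in for crunched data
    conn.send('QUIT')
    conn.close()

if __name__ == '__main__':
    connections, procs = [], []
    for worker_id in range(4):
        parent_end, child_end = Pipe()
        p = Process(target=worker, args=(child_end, worker_id))
        p.start()
        connections.append(parent_end)
        procs.append(p)

    # Collector: wait on whichever pipes have data instead of polling in order.
    while connections:
        ready, _, _ = select.select(connections, [], [])
        for conn in ready:
            message = conn.recv()
            if message == 'QUIT':
                connections.remove(conn)
            else:
                print('received', message)

    for p in procs:
        p.join()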

Terminate Python Program, but Recover Data

I have an inefficient simulation running (it has been running for ~24 hours).
It can be split into 3 independent parts, so I would like to cancel the simulation, and start a more efficient one, but still recover the data that has already been calculated for the first part.
When an error happens in a program, for example, you can still access the data that the script was working with, and examine it to see where things went wrong.
Is there a way to kill the process manually without losing the data?
You could start a debugger such as winpdb (or any of several IDE debuggers) in a separate session and attach to the running process; this halts it. Set a breakpoint in a section of the code that has access to your data, resume until you reach the breakpoint, and then save your data to a file. Your new process could then load that data as a starting point.
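As a concrete illustration of the "save your data to a file" step, this is roughly what could be typed at the debugger prompt once execution is paused at the breakpoint (part_one_results is a placeholder for whatever object actually holds the finished part):

import pickle

# Paused inside the running simulation; part_one_results is a placeholder name.
with open('part_one_results.pkl', 'wb') as f:
    pickle.dump(part_one_results, f)

# The new, more efficient simulation can then load the checkpoint:
with open('part_one_results.pkl', 'rb') as f:
    part_one_results = pickle.load(f)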
