exiting a program with a cached exit code - python

I have a "healthchecker" program, that calls a "prober" every 10 seconds to check if a service is running. If the prober exits with return code 0, the healthchecker considers the tested service fine. Otherwise, it considers it's not working.
I can't change the healthchecker (I can't make it check with a bigger interval, or using a better communication protocol than spawning a process and checking its exit code).
That said, I don't want to probe the service every 10 seconds because it's overkill. I just want to probe it every minute.
My solution to that is to make the prober keep a "cache" of the last answer valid for 1 minute, and then just really probe when this cache expires.
That seems fine, but I'm having trouble coming up with a decent approach, considering the program must exit (in order to return an exit code). My best bet so far would be to turn my prober into a daemon (which would keep the cache in memory) and write a client that just queries it and exits with its response, but that seems like too much work (dealing with threads, and so on).
Another approach would be to use SQLite/memcached/redis.
Any other ideas?

Since no one has really proposed anything I'll drop my idea here. If you need an example let me know and I'll include one.
The easiest thing to do would be to serialize a dictionary that contains the system health and the last time.time() it was checked. At the beginning of your program, unpickle the dictionary and check the time: if less than 60 seconds have passed, exit with the cached result. Otherwise check the health as normal and cache the new result (along with the time).
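For illustration, here is a minimal sketch of that idea. The cache file location and the probe_service function are assumptions for the example, not details from the question.

import os
import pickle
import sys
import time

CACHE_FILE = "/tmp/prober_cache.pkl"  # hypothetical location for the cache
CACHE_TTL = 60  # seconds the cached answer stays valid

def probe_service():
    """Placeholder for the real probe; must return an exit code (0 = healthy)."""
    return 0

def main():
    # Try to reuse a recent cached result.
    try:
        with open(CACHE_FILE, "rb") as f:
            cache = pickle.load(f)
        if time.time() - cache["checked_at"] < CACHE_TTL:
            sys.exit(cache["exit_code"])
    except (OSError, pickle.PickleError, KeyError):
        pass  # no usable cache, fall through to a real probe

    exit_code = probe_service()
    with open(CACHE_FILE, "wb") as f:
        pickle.dump({"checked_at": time.time(), "exit_code": exit_code}, f)
    sys.exit(exit_code)

if __name__ == "__main__":
    main()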

Related

How to change mongodb default cleanup time for TTL index?

I want to set a TTL of around 2-3 months, so it's clearly wasteful to check the TTL indexes every 60 seconds. I want to reduce overhead by checking the TTL once a day. Is there any way to manually/programmatically define this interval?
As far as I know, it is impossible to do this. Some time ago I was looking for this option and found nothing apart from disabling it completely.
I am inclined to think it cannot be modified, because the TTL documentation states explicitly that:
The background task that removes expired documents runs every 60
seconds.
and there is no parameter in the server configuration that does anything similar.
P.S. I understand that you see this as a waste of resources, but I would only start to worry about it when it actually shows up as a bottleneck.
P.P.S. If you do find that it is a bottleneck, you can implement your own cleanup: write a script which removes all documents older than some timestamp and run it once per day as a cron job.
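A minimal sketch of such a manual cleanup in Python, assuming a pymongo connection and a created_at field on the documents (both assumptions, not details from the question):

import datetime
from pymongo import MongoClient  # assumes pymongo is installed

client = MongoClient("mongodb://localhost:27017")  # hypothetical connection string
collection = client["mydb"]["mycollection"]        # hypothetical database/collection

# Remove everything older than ~90 days; run this once per day from cron.
cutoff = datetime.datetime.utcnow() - datetime.timedelta(days=90)
result = collection.delete_many({"created_at": {"$lt": cutoff}})
print("Removed %d expired documents" % result.deleted_count)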

MPI: How to get one process to terminate all others - python -> fortran

I have some MPI-enabled python MCMC sampling code that fires off parallel likelihood calls to separate cores. Because it's (necessarily - don't ask) rejection sampling, I only need one of the np samples to be successful to begin the next iteration, and have quite happily achieved a ~ np x speed-up by this method in the past.
I have applied this to a new problem where the likelihood calls an f2py-wrapped Fortran subroutine. In this case, on each iteration the other np-1 processes wait for the slowest (sometimes very slow) result to come back, even if one of the results already returned is acceptable.
So I suspect I need to pass a message to all non-winning (in speed terms) processes to terminate so that the next iteration can begin, and I need to get clear on some details of the best way to do this, as below.
The python code goes something like this. The sampler is PyMultiNEST.
from mpi4py import MPI

world = MPI.COMM_WORLD

def myloglike(parameters, data, noise):
    modelDataRealisation, status = call_fortran_sub(parameters)
    if status == 0:  # Model generated OK
        winner = world.rank  # This is the rank of the current winner
        # I want to pass a message to the other still-running processes
        # identifying that a successful sample has come back
        won = world.bcast(winner, root=winner)
        # I tried receiving the message here but the fortran_sub doesn't know
        # anything about this - need to go deeper - see below
    # Calculate chisq value etc.
    loglike = f(data, modelDataRealisation, noise)
    return loglike
Should the broadcast go via the master process?
Now, the tricky part is how to receive the kill signal in the F90 code. Presumably if the code is always listening out (while loop?) it will slow down a lot - but should I anyway be using something like:
call MPI_RECV(winner,1,MPI_DOUBLE_PRECISION,MPI_ANY_SOURCE,MPI_ANY_TAG&
&,MPI_COMM_WORLD,0,0)
And then how to best to kill that process once the message has been received?
Finally, do I need to do anything in the F code to make the next iteration restart OK/spawn new processes?
Thanks!
What you are trying to do is not exactly textbook MPI, so I don't have a textbook answer for you. It sounds like you do not know how long a "bad" result will take.
You ask "Presumably if the code is always listening out (while loop?) it will slow down a lot" -- but if you are using non-blocking sends and receives, you can do work for, say, 100 iterations and then test for a "stop work" message.
I would avoid MPI_Bcast here, as that's not exactly what you want. One process wins; that process should then send an "I won!" message to everyone else. Yes, you are doing n-1 point-to-point operations, which will become a headache when you have a million MPI processes.
On the worker side, MPI_Irecv with ANY_SOURCE will match any process's "I won!" message. Periodically test for completion.
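Here is a rough mpi4py sketch of that pattern, using iprobe rather than a stored Irecv request. The STOP_TAG value and do_chunk_of_work are placeholders for illustration, and note that the blocking Fortran call itself cannot be interrupted this way - the check only happens between bounded chunks of work.

from mpi4py import MPI

comm = MPI.COMM_WORLD
STOP_TAG = 77  # arbitrary tag reserved for the "I won!" announcements

def do_chunk_of_work():
    # Placeholder: one bounded piece of the likelihood calculation.
    pass

def worker_loop():
    # Do work in small chunks and check between chunks whether anyone has won.
    while True:
        do_chunk_of_work()
        if comm.iprobe(source=MPI.ANY_SOURCE, tag=STOP_TAG):
            winner = comm.recv(source=MPI.ANY_SOURCE, tag=STOP_TAG)
            break  # abandon this sample; the winning rank already has an acceptable one

def announce_win():
    # The winning rank tells every other rank to stop, point to point.
    for rank in range(comm.size):
        if rank != comm.rank:
            comm.send(comm.rank, dest=rank, tag=STOP_TAG)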

Having a function run at random time intervals, web2py

I'm currently making a program that sends text messages at randomly generated times during the day. I first wrote my program in Python and then realized that if I want other people to sign up to receive messages, I would have to use some sort of web framework. (If anyone knows a way to use my Python code without having to change it, that would be amazing, but for now I have been trying to use web2py.) I looked into the scheduler but it does not seem to do what I have in mind. If anyone knows whether there is a way to pass a time value into a function and have it run at that time, that would be great. Thanks!
Check out the APScheduler module for cron-like scheduling of events in Python - their examples show how to schedule some Python code to run in a cron-ish way.
Still not sure about the random part, though.
As for a web framework that may appeal to you (seeing as you are already familiar with Python), you should really look into Django (or, to keep things simple, just use WSGI).
Best.
I think you can actually use web2py's Scheduler and Tasks. I've never used it ;) but the documentation describes creating a task to which you can pass parameters from your code - exactly what you need - and it should work fine for your purposes:
scheduler.queue_task('mytask', start_time=myrandomtime)
So you need a web2py cron job that runs every day and fires code similar to the above for each message to be sent (passing the parameters you need, probably the message content and phone number; see the examples in the web2py book). This gives you a daily creation of tasks which are then processed by the scheduler.
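Purely for illustration, a minimal sketch of that daily job as it might look in a web2py model file, assuming a task named 'send_sms' and a subscriber table (both hypothetical, as are the db and scheduler globals that web2py provides in models):

import random
from datetime import datetime, timedelta

def queue_tomorrows_messages():
    # Hypothetical daily cron job: queue one task per subscriber at a random time tomorrow.
    tomorrow = datetime.now().replace(hour=0, minute=0, second=0, microsecond=0) + timedelta(days=1)
    for subscriber in db(db.subscriber).select():  # assumes a 'subscriber' table exists
        random_time = tomorrow + timedelta(seconds=random.randint(8 * 3600, 22 * 3600))
        scheduler.queue_task('send_sms',
                             pvars=dict(phone=subscriber.phone, message=subscriber.message),
                             start_time=random_time)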
You could also use a simpler solution: one daily cron job which prepares a queue of messages with random times for the next day, and a second one which runs every ten minutes or so, checks what is waiting to be processed, and sends the messages. So, no Tasks. This way is a bit ugly, though (consider a single run which takes more than 10 minutes). You may also want to keep and check statuses on the messages to be processed (like pending, ongoing, done) to prevent two jobs from working on the same message and to allow tracking progress. Anyway, you could use the cron method in an early version of your software and later replace it with a better one :)
In any case, you should check expected number of messages to process and average processing time on your target platform - to make sure that the chosen method is quick enough for your needs.
This is an old question, but in case someone is interested: the answer is APScheduler's blocking scheduler with jobs set to run at regular intervals with some jitter.
See: https://apscheduler.readthedocs.io/en/3.x/modules/triggers/interval.html
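A small sketch of that approach with APScheduler 3.x; the send_random_text function and the interval/jitter values are placeholders:

from apscheduler.schedulers.blocking import BlockingScheduler

def send_random_text():
    # Placeholder: send one message here.
    print("sending message")

sched = BlockingScheduler()
# Run roughly once per hour, randomly shifted by up to +/- 30 minutes each time.
sched.add_job(send_random_text, 'interval', hours=1, jitter=1800)
sched.start()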

How to ignore SIGKILL or force a process into 'D' sleep state?

I am trying to figure out how to get a process to ignore SIGKILL. The way I understand it, this isn't normally possible. My idea is to get a process into the 'D' state permanently. I want to do this for testing purposes (the corner case isn't really reproducible). I'm not sure this is possible programmatically (I don't want to damage hardware). I'm working in C++ and Python, but any language should be fine. I have root access.
I don't have any code to show because I don't know how to get started with this, or if it's even possible. Could I possibly set up a bad NFS and try reading from it?
Apologies in advance if this is a duplicate question; I didn't find anyone else trying to induce the D state.
Many thanks.
To get a process into the "D" state (uninterruptible sleep), you have to write kernel code which does that, and then call that code from user space via a system call.
In the Linux kernel, this is done by setting the current task state to uninterruptible, and invoking the scheduler:
set_current_state(TASK_UNINTERRUPTIBLE);
schedule();
Of course, these actions are normally wrapped with additional preparations so that the task has a way to wake up, such as registering on some wait queue or whatever.
Device drivers for low-latency devices such as mass storage use uninterruptible sleeps to simplify their logic. An uninterruptible sleep should only be used when there is a sure-fire way for the process to wake up almost no matter what happens.
Kernel code that does a small thing like performing an uninterruptible sleep can be put into a tiny module (start from a minimal driver skeleton) whose initialization function runs the code and then returns nonzero. You can then run the code using insmod, e.g.
insmod my_uninterruptible_sleep_mod.ko
There is no need to rmmod, because the init function fails and so the module is unloaded immediately.
It is not possible to ignore SIGKILL or handle it in any way.
From man sigaction:
The sa_mask field specified in act is not allowed to block SIGKILL or SIGSTOP. Any attempt to do so will be silently ignored.

Task queue for deferred tasks in GAE with python

I'm sorry if this question has in fact been asked before. I've searched around quite a bit and found pieces of information here and there but nothing that completely helps me.
I am building an app on Google App Engine in Python that lets a user upload a file, which is then processed by a piece of Python code, and the resulting processed file gets sent back to the user by email.
At first I used a deferred task for this, which worked great. Over time I've come to realize that since the processing can take more than the 10 minutes I have before hitting the DeadlineExceededError, I need to be more clever.
I therefore started to look into task queues, wanting to make a queue that processes the file in chunks and then pieces everything together at the end.
My present code for making the single deferred task looks like this:
_ = deferred.defer(transform_function, filename, from_, to, email)
so that the transform_function code gets the values of filename, from_, to and email and sets off to do the processing.
Could someone please enlighten me as to how I turn this into a linear chain of tasks that get acted on one after the other? I have read all the Google App Engine documentation I can find, but unfortunately it is not written in enough detail in terms of actual pieces of code.
I see references to things like:
taskqueue.add(url='/worker', params={'key': key})
but since I don't have a URL for my task - just a transform_function() implemented elsewhere - I don't see how this applies to me…
Many thanks!
You can just keep calling deferred to run your task when you get to the end of each phase.
Other queues just allow you to control the scheduling and rate, but work the same.
I track the elapsed time in the task, and when I get near the end of the processing window the code stops what it is doing and calls defer for the next task in the chain, which either moves on to the next step or continues where it left off, depending on whether the job is a discrete set of steps or one continuous chunk of work. This was all written back when tasks could only run for 60 seconds.
However, the problem you will face (it doesn't matter whether it's a normal task queue or deferred) is that each stage could fail for some reason and then be re-run, so each phase must be idempotent.
For long-running chained tasks, I construct an entity in the datastore that holds a description of the work to be done and tracks the processing state of the job; then you can just keep re-running the same task until it finishes and marks the job as complete.
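A hedged sketch of that pattern - the JobState model, process_some_items and the 9-minute budget are illustrative assumptions, not the answerer's actual code:

import time
from google.appengine.ext import deferred, ndb

TIME_BUDGET = 9 * 60  # stop a little before the 10-minute task deadline

class JobState(ndb.Model):
    cursor = ndb.StringProperty()                  # where processing got to last time
    complete = ndb.BooleanProperty(default=False)

def process_some_items(job):
    # Placeholder for the real, idempotent work: advance job.cursor and
    # eventually set job.complete = True.
    job.complete = True
    return job

def process_job(job_key):
    started = time.time()
    job = job_key.get()
    while not job.complete:
        job = process_some_items(job)
        job.put()  # persist progress so a re-run can resume from here
        if time.time() - started > TIME_BUDGET:
            deferred.defer(process_job, job_key)  # chain the next task
            return
    # Falling out of the loop means the whole job finished within this task.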
To avoid the 10-minute timeout you can direct the request to a backend or a B-type module using the "_target" param.
BTW, is there any reason you need to process the chunks sequentially? If all you need is some notification upon completion of all the chunks (so you can "piece everything together at the end"), you can implement it in various ways. For example, each chunk's deferred task can decrement a shared datastore counter (read the state, decrement and update, all in the same transaction) that was initialized with the number of chunks; if the datastore update succeeds and the counter has reached zero, you can proceed with combining all the pieces. An alternative to deferred that would simplify the suggested workflow is pipelines (https://code.google.com/p/appengine-pipeline/wiki/GettingStarted).
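A rough sketch of that counter-based fan-in, assuming an ndb model named ChunkCounter and helper functions do_chunk_work and combine_and_email (all hypothetical names, not from the answer):

from google.appengine.ext import deferred, ndb

class ChunkCounter(ndb.Model):
    remaining = ndb.IntegerProperty()  # initialized to the total number of chunks

def do_chunk_work(chunk_data):
    pass  # placeholder for the real per-chunk processing

def combine_and_email(email):
    pass  # placeholder: piece the results together and mail them out

@ndb.transactional
def mark_chunk_done(counter_key):
    # Read, decrement and write in one transaction so concurrent chunks don't race.
    counter = counter_key.get()
    counter.remaining -= 1
    counter.put()
    return counter.remaining

def process_chunk(chunk_data, counter_key, email):
    do_chunk_work(chunk_data)
    if mark_chunk_done(counter_key) == 0:
        # This was the last chunk to finish.
        deferred.defer(combine_and_email, email)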
