Synchronize a Linux system command and a while-loop in Python

On a Raspberry Pi I have to synchronize a Raspbian system command (raspivid -t 20000) with a while loop that continuously reads from a sensor and stores samples in an array. The Raspbian command starts a video recording with the Raspberry Pi CSI camera module, and I have to be sure that it starts at the same instant as the acquisition by the sensor. I have seen many solutions that have confused me, involving modules like multiprocessing, threading, subprocess, etc. So far the only thing I have understood is that the os.system() function blocks execution of the Python statements that follow it in the script for as long as it runs. So if I try:
import os
import numpy as np

os.system("raspivid -t 20000 /home/pi/test.h264")

# memory pre-allocation, supposing I have to save 20000 samples from the
# sensor (1 for each millisecond of the video)
data = np.zeros(20000, dtype="float")
indx = 0
while True:
    # readbysensor() is defined earlier in the script and reads one sample
    # from the sensor
    sens = readbysensor()
    data[indx] = sens
    if indx == 19999:
        break
    else:
        indx += 1
the while loop will run only after the os.system() call has finished. But as I wrote above, I need the two processes to be synchronized and to work in parallel. Any suggestion?

Just add an & at the end, to make the process detach to the background:
os.system("raspivid -t 20000 /home/pi/test.h264 &")
According to bash man pages:
If a command is terminated by the control operator &, the shell
executes the command in the background in a subshell. The shell does
not wait for the command to finish, and the return status is 0.
Also, if you want to minimize the time it takes for the loop to start after executing raspivid, you should allocate your data and indx prior to the call:
data = np.zeros(20000, dtype="float")
indx = 0
os.system("raspivid -t 20000 /home/pi/test.h264 &")
while True:
    # ....
Update:
Since we discussed further in the comments, it is clear that there is really no need to start the loop "at the same time" as raspivid (whatever that might mean), because if you are trying to read data from the I2C bus and make sure you don't miss any samples, you are best off starting the reading operation prior to running raspivid. This way you are certain that in the meantime (however big the delay between the two executions is) you are not missing any data.
Taking this into consideration, your code could look something like this:
data = np.zeros(20000, dtype="float")
indx = 0
os.system("(sleep 1; raspivid -t 20000 /home/pi/test.h264) &")
while True:
    # ....
This is the simplest version in which we add a delay of 1 second before running raspivid, so we have time to enter our while loop and start waiting for I2C data.
This works, but it is hardly production-quality code. For a better solution, run the data acquisition function in one thread and raspivid in a second thread, preserving the launch order (the reading thread is started first).
Something like this:
import Queue
import threading
import os

# we will store all data in a Queue so we can process
# it at a custom speed, without blocking the reading
q = Queue.Queue()

# thread for getting the data from the sensor;
# it puts the data in a Queue for processing
def get_data(q):
    for cnt in xrange(20000):
        # assuming readbysensor() is a blocking function
        sens = readbysensor()
        q.put(sens)

# thread for processing the results
def process_data(q):
    for cnt in xrange(20000):
        data = q.get()
        # do something with data here
        q.task_done()

t_get = threading.Thread(target=get_data, args=(q,))
t_process = threading.Thread(target=process_data, args=(q,))
t_get.start()
t_process.start()

# when everything is set and ready, run the raspivid
os.system("raspivid -t 20000 /home/pi/test.h264 &")

# wait for the threads to finish
t_get.join()
t_process.join()
# at this point all processing is completed
print "We are all done!"

You could rewrite your code as:
import subprocess
import numpy as np
n = 20000
p = subprocess.Popen(["raspivid", "-t", str(n), "/home/pi/test.h264"])
data = np.fromiter(iter(readbysensor, None), dtype=float, count=n)
subprocess.Popen() returns immediately, without waiting for raspivid to end.
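If, in addition, you want the script to block after the acquisition until the recording itself has finished, a small follow-up (my addition, not part of the original answer) is to wait on the Popen handle:

# after the acquisition (np.fromiter) has returned:
p.wait()  # block until raspivid exits
if p.returncode != 0:
    print("raspivid exited with code {}".format(p.returncode))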

Related

Python multiprocessing gradually increases memory until it runs out

I have a python program with multiple modules. They go like this:
Job class that is the entry point and manages the overall flow of the program
Task class that is the base class for the tasks to be run on given data. Many SubTask classes, created specifically for different types of calculations on different columns of data, are derived from the Task class. Think of 10 columns in the data, each one having its own Task to do some processing; e.g. the 'price' column can be used by a CurrencyConverterTask to return local currency values, and so on.
Many other modules like a connector for getting data, utils module etc, which I don't think are relevant for this question.
The general flow of program: get data from the db continuously -> process the data -> write back the updated data to the db.
I decided to use multiprocessing because the tasks are relatively simple: most of them do some basic arithmetic or logic operations, and running everything in one process takes a long time; in particular, getting data from a large db and processing it in sequence is very slow.
So the multiprocessing (mp) code looks something like this (I cannot expose the entire file, so I'm writing a simplified version; the parts not included are not relevant here. I've tested by commenting them out, so this is an accurate representation of the actual code):
import multiprocessing as mp

class Job():
    def __init__(self):
        self.block_size = 100  # process 100 rows at a time
        self.some_query = "SELECT * IF A > B"  # some query to filter data from db

    def data_getter(self):
        # continuously get data from the db and put it into a queue in blocks
        cursor = Connector.get_data(self.some_query)
        block = []
        for item in cursor:
            block.append(item)
            if len(block) == self.block_size:
                self.data_queue.put(block)
                block = []
        self.data_queue.put(None)  # this will indicate to the worker processes when to stop

    def monitor(self):
        # continuously monitor the system stats
        timer = Timer()
        while True:
            if timer.time_taken >= 60:  # log some stats every 60 seconds
                print(utils.system_stats())
                timer.reset()

    def task_runner(self):
        while True:
            # get data from the queue; if there's no data, break out of the loop
            data = self.data_queue.get()
            if data is None:
                break
            # run the tasks one by one
            for task in self.tasks:
                task.do_something(data)

    def run(self):
        # queue to put data in for processing
        self.data_queue = mp.Queue()
        # start a process for reading data from the db
        dg = mp.Process(target=self.data_getter)
        dg.start()
        # start a process for monitoring system stats
        mon = mp.Process(target=self.monitor)
        mon.start()
        # get a list of tasks to run
        self.tasks = [t for t in taskmodule.get_subtasks()]
        workers = []
        # start 4 processes to do the actual processing
        for _ in range(4):
            worker = mp.Process(target=self.task_runner)
            worker.start()
            workers.append(worker)
        for w in workers:
            w.join()
        mon.terminate()  # terminate the monitor process
        dg.terminate()   # end the data-getting process


if __name__ == "__main__":
    job = Job()
    job.run()
The whole program is run like: python3 runjob.py
Expected behaviour: a continuous stream of data goes into the data_queue, and each worker process gets the data and processes it until there is no more data from the cursor, at which point the workers finish and the entire program finishes.
This is working as expected, but what is not expected is that the system memory usage keeps creeping up continuously until the system crashes. The data I'm getting here is not copied anywhere (at least not intentionally). I expect the memory usage to be steady throughout the program. The length of the data_queue rarely exceeds 1 or 2, since the processes are fast enough to get the data when it's available, so it's not the queue holding too much data.
My guess is that all the processes started here are long-running ones, and that has something to do with this. Although I can print the pid, and if I follow the PID in the top command, the data_getter and monitor processes don't exceed more than 2% of memory usage. The 4 worker processes also don't use a lot of memory, and neither does the main process the whole thing runs in. But there is an unaccounted-for process that takes up 20%+ of the RAM, and it bugs me so much that I can't figure out what it is.
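For completeness, a small hedged sketch of how one could log the resident memory of every process started by run(), using psutil (psutil and the log_memory helper are my additions, not part of the program above); this may help pin down which process the unaccounted-for 20% actually belongs to:

import os
import time
import psutil  # assumption: psutil is installed

def log_memory(pids, interval=60):
    # print the RSS of each given pid every `interval` seconds
    while True:
        for pid in pids:
            try:
                rss = psutil.Process(pid).memory_info().rss
                print("pid {}: {:.1f} MiB".format(pid, rss / 1024.0 / 1024.0))
            except psutil.NoSuchProcess:
                pass
        time.sleep(interval)

# e.g. started in its own process from run(), after the workers are created:
# mp.Process(target=log_memory,
#            args=([os.getpid(), dg.pid, mon.pid] + [w.pid for w in workers],)).start()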

Streamz/Dask: gather does not wait for all results of buffer

Imports:
from dask.distributed import Client
import streamz
import time
Simulated workload:
def increment(x):
    time.sleep(0.5)
    return x + 1
Let's suppose I'd like to process some workload on a local Dask client:
if __name__ == "__main__":
with Client() as dask_client:
ps = streamz.Stream()
ps.scatter().map(increment).gather().sink(print)
for i in range(10):
ps.emit(i)
This works as expected, but sink(print) will, of course, enforce waiting for each result, thus the stream will not execute in parallel.
However, if I use buffer() to allow results to be cached, then gather() does not seem to correctly collect all results anymore and the interpreter exits before getting results. This approach:
if __name__ == "__main__":
    with Client() as dask_client:
        ps = streamz.Stream()
        ps.scatter().map(increment).buffer(10).gather().sink(print)
        #                           ^
        for i in range(10):         # - allow parallel execution
            ps.emit(i)              # - before gather()
...does not print any results for me. The Python interpreter just exits shortly after starting the script, before buffer() emits its results, thus nothing gets printed.
However, if the main process is forced to wait for some time, the results are printed in parallel fashion (so they do not wait for each other, but are printed nearly simultaneously):
if __name__ == "__main__":
    with Client() as dask_client:
        ps = streamz.Stream()
        ps.scatter().map(increment).buffer(10).gather().sink(print)

        for i in range(10):
            ps.emit(i)

        time.sleep(10)  # <- force main process to wait while ps is working
Why is that? I thought gather() should wait for a batch of 10 results since buffer() should cache exactly 10 results in parallel before flushing them to gather(). Why does gather() not block in this case?
Is there a nice way to otherwise check if a Stream still contains elements being processed in order to prevent the main process from exiting prematurely?
"Why is that?": because the Dask distributed scheduler (which executes the stream mapper and sink functions) and your python script run in different processes. When the "with" block context ends, your Dask Client is closed and execution shuts down before the items emitted to the stream are able reach the sink function.
"Is there a nice way to otherwise check if a Stream still contains elements being processed": not that I am aware of. However: if the behaviour you want is (I'm just guessing here) the parallel processing of a bunch of items, then Streamz is not what you should be using, vanilla Dask should suffice.

ffmpeg - Poll folder for files, and stream as video with rtp

(I'm a newbie when it comes to ffmpeg).
I have an image source which saves files to a given folder at a rate of 30 fps. I want to wait for every (let's say) 30-frame chunk, encode it to h264 and stream it with RTP to some other app.
I thought about writing a python app which just waits for the images, and then executes an ffmpeg command. For that I wrote the following code:
main.py:
import os
import Helpers
import argparse
import IniParser
import subprocess
from functools import partial
from Queue import Queue
from threading import Semaphore, Thread

def Run(config):
    os.chdir(config.Workdir)
    iteration = 1
    q = Queue()
    Thread(target=RunProcesses, args=(q, config.AllowedParallelRuns)).start()
    while True:
        Helpers.FileCount(config.FramesPathPattern, config.ChunkSize * iteration)
        command = config.FfmpegCommand.format(startNumber=(iteration - 1) * config.ChunkSize,
                                              vFrames=config.ChunkSize)
        runFunction = partial(subprocess.Popen, command)
        q.put(runFunction)
        iteration += 1

def RunProcesses(queue, semaphoreSize):
    semaphore = Semaphore(semaphoreSize)
    while True:
        runFunction = queue.get()
        Thread(target=HandleProcess, args=(runFunction, semaphore)).start()

def HandleProcess(runFunction, semaphore):
    semaphore.acquire()
    p = runFunction()
    p.wait()
    semaphore.release()

if __name__ == '__main__':
    argparser = argparse.ArgumentParser()
    argparser.add_argument("config", type=str, help="Path for the config file")
    args = argparser.parse_args()
    iniFilePath = args.config
    config = IniParser.Parse(iniFilePath)
    Run(config)
Helpers.py (not really relevant):
import os
import time
from glob import glob

def FileCount(pattern, count):
    count = int(count)
    lastCount = 0
    while True:
        currentCount = glob(pattern)
        if lastCount != currentCount:
            lastCount = currentCount
            if len(currentCount) >= count and all([CheckIfClosed(f) for f in currentCount]):
                break
        time.sleep(0.05)

def CheckIfClosed(filePath):
    try:
        os.rename(filePath, filePath)
        return True
    except:
        return False
I used the following config file:
Workdir = "C:\Developer\MyProjects\Streaming\OutputStream\PPM"
; Workdir is the directory of reference from which all paths are relative to.
; You may still use full paths if you wish.
FramesPathPattern = "F*.ppm"
; The path pattern (wildcards allowed) where the rendered images are stored to.
; We use this pattern to detect how many rendered images are available for streaming.
; When a chunk of frames is ready - we stream it (or store to disk).
ChunkSize = 30 ; Number of frames for bulk.
; ChunkSize sets the number of frames we need to wait for, in order to execute the ffmpeg command.
; If the folder already contains several chunks, it will first process the first chunk, then second, and so on...
AllowedParallelRuns = 1 ; Number of parallel allowed processes of ffmpeg.
; This sets how many parallel ffmpeg processes are allowed.
; If more than one chunk is available in the folder for processing, we will execute several ffmpeg processes in parallel.
; Only when one of the processes finishes will we allow another process execution.
FfmpegCommand = "ffmpeg -re -r 30 -start_number {startNumber} -i F%08d.ppm -vframes {vFrames} -vf vflip -f rtp rtp://127.0.0.1:1234" ; Command to execute when a bulk is ready for streaming.
; Once a chunk is ready for processing, this is the command that will be executed (same as running it from the terminal).
; There is however a minor difference. Since every chunk starts with a different frame number, you can use the
; expression "{startNumber}" which will automatically take the value of the matching start frame number.
; You can also use "{vFrames}" as an expression for the ChunkSize which was set above in the "ChunkSize" entry.
Please note that if I set "AllowedParallelRuns = 2" then it allows multiple ffmpeg processes to run simultaneously.
I then tried to play it with ffplay and see if I'm doing it right.
The first chunk was streamed fine. The following chunks weren't so great. I got a lot of [sdp # 0000006de33c9180] RTP: dropping old packet received too late messages.
What should I do so I get the ffplay, to play it in the order of the incoming images? Is it right to run parallel ffmpeg processes? Is there a better solution to my problem?
Thank you!
As I stated in the comment, since you rerun ffmpeg each time, the pts values are reset, but the client perceives this as a single continuous ffmpeg stream and thus expects increasing PTS values.
As I said, you could use an ffmpeg python wrapper to control the streaming yourself, but yeah, that is quite an amount of code. However, there is actually a dirty workaround.
So, apparently there is an -itsoffset parameter with which you can offset the input timestamps (see the FFmpeg documentation). Since you know and control the rate, you could pass an increasing value with this parameter, so that each next stream is offset by the proper duration. E.g. if you stream 30 frames each time, and you know the fps is 30, the 30 frames cover a time interval of one second. So on each call to ffmpeg you would increase the -itsoffset value by one second, which should then be added to the output PTS values. But I can't guarantee this works.
Since the idea about -itsoffset did not work, you could also try feeding the jpeg images via stdin to ffmpeg - see this link.
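For illustration, a rough sketch of that stdin-feeding idea (my sketch, not tested against your setup): keep a single long-lived ffmpeg process so the PTS values never reset, and write each frame's bytes to its stdin. The flags assume JPEG input via the image2pipe demuxer; your PPM frames would need a different input codec.

import subprocess

# one long-lived ffmpeg process reading images from stdin
cmd = ["ffmpeg", "-re", "-f", "image2pipe", "-r", "30",
       "-vcodec", "mjpeg", "-i", "-",
       "-vf", "vflip", "-f", "rtp", "rtp://127.0.0.1:1234"]
proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)

def send_frame(path):
    # append one encoded image to the running stream
    with open(path, "rb") as f:
        proc.stdin.write(f.read())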

Why are the parallel tasks always slow at the first time?

I have some classifiers which I want to evaluate on a single sample. This task can be run in parallel since they are independent of each other. This means that I want to parallelize it.
I tried it with Python and also as a bash script. The problem is that when I run the program for the first time, it takes 30s-40s to finish. When I run the program multiple times consecutively, it takes just 1s-3s to finish. Even if I feed the classifiers different input I get different results, so it doesn't seem to be caching. When I run some other program and afterwards rerun this one, it again takes 40s to finish.
I also observed in htop that the CPUs are not utilized that much when the program is run for the first time, but when I rerun it again and again the CPUs are fully utilized.
Can someone please explain this strange behaviour to me? How can I avoid it, so that even the first run of the program is fast?
Here is the python code:
import time
import os
import json
from fastText import load_model
from joblib import delayed, Parallel, cpu_count

os.system("taskset -p 0xff %d" % os.getpid())

def format_duration(start_time, end_time):
    m, s = divmod(end_time - start_time, 60)
    h, m = divmod(m, 60)
    return "%d:%02d:%02d" % (h, m, s)

def classify(x, classifier_name, path):
    f = load_model(path + os.path.sep + classifier_name)
    labels, probabilities = f.predict(x, 2)
    if labels[0] == '__label__True':
        return classifier_name
    else:
        return None

if __name__ == '__main__':
    with open('classifier_names.json') as json_data:
        classifiers = json.load(json_data)
    x = "input_text"
    start_time = time.time()  # start timing before launching the parallel jobs
    Parallel(n_jobs=cpu_count(), verbose=100, backend='multiprocessing', pre_dispatch='all')(
        delayed(classify)(x, classifier, 'clfs/') for classifier in classifiers)
    end_time = time.time()
    print(format_duration(start_time, end_time))
Here is the bash code:
#!/usr/bin/env bash
N=4
START_TIME=$SECONDS

open_sem(){
    mkfifo pipe-$$
    exec 3<>pipe-$$
    rm pipe-$$
    local i=$1
    for((;i>0;i--)); do
        printf %s 000 >&3
    done
}

run_with_lock(){
    local x
    read -u 3 -n 3 x && ((0==x)) || exit $x
    (
        "$@"
        printf '%.3d' $? >&3
    )&
}

open_sem $N
for d in classifiers/* ; do
    run_with_lock ~/fastText/fasttext predict "$d" test.txt
done

ELAPSED_TIME=$(($SECONDS - $START_TIME))
echo time taken $ELAPSED_TIME seconds
EDITED
The bigger picture is that I am running a flask app with 2 API methods. Each of them calls the function that parallelizes the classification. When I make requests, it behaves the same way as the program above. The first request to method A takes a long time, and then subsequent requests take about 1s. When I switch to method B, it is the same behaviour as with method A. If I switch between method A and method B several times, like A, B, A, B, then each request takes about 40s to finish.
One approach is to modify your python code to use an event loop, stay running all the time, and execute new jobs in parallel whenever new jobs are detected. One way to do this is to have a job directory, and place a file in that directory whenever there is a new job to do. The python script should also move completed jobs out of that directory to prevent running them more than once. See: How to run a function when anything changes in a dir with Python Watchdog?
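A minimal sketch of that job-directory idea, using the watchdog package (watchdog, the jobs/ layout and the process_job callback are assumptions for illustration):

import shutil
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class JobHandler(FileSystemEventHandler):
    def on_created(self, event):
        if event.is_directory:
            return
        process_job(event.src_path)                # your existing classification code
        shutil.move(event.src_path, "jobs/done/")  # prevent running the job twice

observer = Observer()
observer.schedule(JobHandler(), "jobs/")
observer.start()
try:
    while True:
        time.sleep(1)  # stay running, waiting for new job files
except KeyboardInterrupt:
    observer.stop()
observer.join()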
Another option is to use a fifo file which is piped to the python script, and add new lines to that file for new jobs. https://www.linuxjournal.com/content/using-named-pipes-fifos-bash
I personally dislike parallelizing in python, and prefer to parallelize in bash using GNU parallel. To do it this way, I would
implement the event loop and jobs directory or the fifo file job queue using bash and GNU parallel
modify the python script to remove all the parallel code
read each jobspec from stdin
process each one serially in a loop
pipe jobs to parallel, which pipes them to ncpu python processes, which each runs forever waiting for the next job from stdin
e.g., something like:
run_jobs.sh:
mkfifo jobs
cat jobs | parallel --pipe --round-robin -n1 ~/fastText/fasttext
queue_jobs.sh:
echo jobspec >> jobs
.py:
for jobspec in sys.stdin:
    ...
This has the disadvantage that all ncpu python processes may have the slow startup problem, but they can stay running indefinitely, so the problem becomes insignificant, and the code is much simpler and easier to debug and maintain.
Using a jobs directory and a file for each jobspec instead of a fifo jobs queue requires slightly more code, but it also makes it more straightforward to see which jobs are queued and which jobs are done.
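For illustration, a minimal sketch of such a stdin-driven worker, reusing load_model and predict from the question's code (the clfs/ directory layout and the output format are assumptions):

# worker.py - loads the models once, then classifies one jobspec per stdin line
import os
import sys
from fastText import load_model

# pay the slow model-loading cost exactly once, at startup
MODELS = {name: load_model(os.path.join("clfs", name))
          for name in os.listdir("clfs")}

for jobspec in sys.stdin:
    text = jobspec.strip()
    matches = [name for name, model in MODELS.items()
               if model.predict(text, 2)[0][0] == "__label__True"]
    print(matches)
    sys.stdout.flush()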

how to start multiple jobs in python and communicate with the main job

I am a novice user of python multithreading/multiprocessing, so please bear with me.
I would like to solve the following problem and I need some help/suggestions in this regard.
Let me describe in brief:
I would like to start a python script which does something in the
beginning sequentially.
After the sequential part is over, I would like to start some jobs
in parallel.
Assume that there are four parallel jobs I want to start.
I would also like to start these jobs on some other machines on the computing cluster using "lsf". My initial script is also running on an "lsf" machine.
The four jobs which I started on four machines will perform two logical steps A and B---one after the other.
When the jobs start, they each begin with logical step A and finish it.
After every job (4 jobs) has finished step A, it should notify the main job that started them. In other words, the main job waits for confirmation from these four jobs.
Once the main job receives confirmation from these four jobs, it should notify all four jobs to do logical step B.
Logical step B will automatically terminate the jobs after finishing the task.
The main job waits for all jobs to finish, and later on it continues with the sequential part.
An example scenario would be:
A Python script running on an "lsf" machine in the cluster starts four "tcl shells" on four "lsf" machines.
In each tcl shell, a script is sourced to do the logical step A.
Once the step A is done, somehow they should inform the python script which is waiting for the acknowledgement.
Once the acknowledgement is received from all four, the python script informs them to do logical step B.
Logical step B is also a script which is sourced in their tcl shell; this script will also close the tcl shell at the end.
Meanwhile, python script is waiting for all the four jobs to finish.
After all four jobs are finished, it should continue with the sequential part again and finish later on.
Here are my questions:
I am confused about whether I should use multithreading or multiprocessing. Which one suits this better?
In fact, what is the difference between these two? I read about them but I wasn't able to reach a conclusion.
What is the Python GIL? I also read somewhere that at any one point in time only one thread will execute.
I need some explanation here. It gives me the impression that I can't use threads.
Any suggestions on how I could solve my problem systematically and in a more pythonic way?
I am looking for some verbal step by step explanation and some pointers to read on each step.
Once the concepts are clear, I would like to code it myself.
Thanks in advance.
In addition to roganjosh's answer, I would include some signaling to start step B after A has finished:
import multiprocessing as mp
import time
import random
import sys

def func_A(process_number, queue, proceed):
    print "Process {} has been created".format(process_number)
    print "Process {} has ended step A".format(process_number)
    sys.stdout.flush()
    queue.put((process_number, "done"))
    proceed.wait()  # wait for the signal to do the second part
    print "Process {} has ended step B".format(process_number)
    sys.stdout.flush()

def multiproc_master():
    queue = mp.Queue()
    proceed = mp.Event()
    processes = [mp.Process(target=func_A, args=(x, queue, proceed)) for x in range(4)]
    for p in processes:
        p.start()

    # block=True waits until there is something available
    results = [queue.get(block=True) for p in processes]
    proceed.set()  # set continue-flag
    for p in processes:  # wait for all to finish (also in windows)
        p.join()
    return results

if __name__ == '__main__':
    split_jobs = multiproc_master()
    print split_jobs
1) From the options you listed in your question, you should probably use multiprocessing in this case to leverage multiple CPU cores and compute things in parallel.
2) Going further from point 1: the Global Interpreter Lock (GIL) means that only one thread can actually execute code at any one time.
A simple example for multithreading that pops up often here is having a prompt for user input for, say, an answer to a maths problem. In the background, they want a timer to keep incrementing at one second intervals to register how long the person took to respond. Without multithreading, the program would block whilst waiting for user input and the counter would not increment. In this case, you could have the counter and the input prompt run on different threads so that they appear to be running at the same time. In reality, both threads are sharing the same CPU resource and are constantly passing an object backwards and forwards (the GIL) to grant them individual access to the CPU. This is hopeless if you want to properly process things in parallel. (Note: In reality, you'd just record the time before and after the prompt and calculate the difference rather than bothering with threads.)
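Purely for illustration, a tiny sketch of that prompt-plus-timer scenario with two threads (written in the same Python 2 style as the examples here; the maths question and variable names are made up):

import threading
import time

answered = threading.Event()
elapsed = [0]  # mutable container so the timer thread can update it

def tick():
    while not answered.is_set():
        time.sleep(1)
        elapsed[0] += 1

timer_thread = threading.Thread(target=tick)
timer_thread.start()

answer = raw_input("What is 7 * 8? ")  # main thread blocks here; timer keeps counting
answered.set()
timer_thread.join()
print "You took about {} seconds".format(elapsed[0])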
3) I have made a really simple example using multiprocessing. In this case, I spawn 4 processes that compute the sum of squares for a randomly chosen range. These processes do not have a shared GIL and therefore execute independently unlike multithreading. In this example, you can see that all processes start and end at slightly different times, but we can aggregate the results of the processes into a single queue object. The parent process will wait for all 4 child processes to return their computations before moving on. You could then repeat the code for func_B (not included in the code).
import multiprocessing as mp
import time
import random
import sys

def func_A(process_number, queue):
    start = time.time()
    print "Process {} has started at {}".format(process_number, start)
    sys.stdout.flush()
    my_calc = sum([x**2 for x in xrange(random.randint(1000000, 3000000))])
    end = time.time()
    print "Process {} has ended at {}".format(process_number, end)
    sys.stdout.flush()
    queue.put((process_number, my_calc))

def multiproc_master():
    queue = mp.Queue()
    processes = [mp.Process(target=func_A, args=(x, queue)) for x in xrange(4)]
    for p in processes:
        p.start()

    # Unhash the below if you run on Linux (Windows and Linux treat multiprocessing
    # differently as Windows lacks os.fork())
    #for p in processes:
    #    p.join()

    results = [queue.get() for p in processes]
    return results

if __name__ == '__main__':
    split_jobs = multiproc_master()
    print split_jobs
