Saving images to disk faster - Python

I have developed a Python multi-threaded program: one producer thread which acquires frames (512 x 640, uint16) at high fps (around 75 fps), and two consumer threads, one for real-time visualization and the other for saving as 16-bit TIFF. Each consumer has its own queue.
Visualizing in real time works fine, but saving takes a very long time after the video is stopped (even 20 seconds for a 2-minute recording). For saving, I used the tifffile library or cv2, with similar performance.
UPDATED info
The images are grayscale 16-bit numpy arrays placed directly in the queue, with no compression, saved using tifffile.imsave. The second queue, for visualization, keeps up in real time, so saving must be the slowest process. I need to save each image independently; saving a 3D stack is not an option for the time being. Using different threads for saving may ruin my acquisition order.
Is there any way, in Python and/or the OS (Windows 10), to accelerate the process, taking into account that I need to save the frames in the same order they were recorded? My disk is an SSD (Samsung 970 EVO).
import os
import struct
import time
from threading import Thread

import cv2
import numpy as np
import pandas as pd
from tifffile import imsave

class VideoGet():
    def __init__(self, input_dict, folder, record):
        self.handle = input_dict['handle']
        self.frame_t = input_dict['frametype']
        self.frameSize = input_dict['frameSize']
        # self.buffer = np.zeros(shape=(513,640), dtype=np.uint16)
        self.record = record
        self.save = False
        self.done = False
        self.counter = 0
        self.folder = folder

    def displayer(self, q2):
        while self.record is True:
            if q2.empty() is True:
                pass
            else:
                framedisplay = q2.get()
                cv2.namedWindow("Video", cv2.WINDOW_NORMAL)
                cv2.imshow("Video", framedisplay[:-1, :])
                cv2.waitKey(1)
                q2.task_done()
        cv2.destroyAllWindows()

    def consumer(self, q):
        data = []
        while True:
            if q.empty():
                time.sleep(0.002)
            else:
                frame_get = q.get()
                if frame_get is None:
                    print('gone')
                    break
                imsave(os.path.join(self.folder, str(self.counter).zfill(5) + '.tiff'),
                       frame_get[:-1, :])
                # the first frame's timestamp is the reference for all the others
                if self.counter == 0:
                    TS = 1.0e-3 * struct.unpack('Q', (frame_get[-1, 6:10]).tobytes())[0]
                entr = [str(self.counter).zfill(5),
                        str(round(1.0e3 * (1.0e-3 * struct.unpack('Q', (frame_get[-1, 6:10]).tobytes())[0] - TS)))]
                data.append(entr)
                self.counter = self.counter + 1
                q.task_done()
        if data:
            df = pd.DataFrame(data, columns=['Picture', 'Timestamp'])
            df.to_csv(os.path.join(self.folder, 'timestamps.txt'), header=False, index=False, sep=' ')
        print('done')
        self.done = True

    def producer(self, buffer, q, q2):
        # camera is the external acquisition SDK handle (not shown in the question)
        while self.record is True:
            buffer = np.empty_like(buffer)
            if camera.properties.get_frame(self.handle, self.frame_t, 4, buffer, self.frameSize) == 0:
                frame = buffer
                q2.put(frame)
                if self.save is True:
                    q.put(frame)
                del frame
        print('None')
        q.put(None)

    def run(self, buffer, q, q2):
        prod_thread = Thread(target=self.producer, args=(buffer, q, q2,))
        display_thread = Thread(target=self.displayer, args=(q2,))
        con_thread = Thread(target=self.consumer, args=(q,))
        prod_thread.start()
        display_thread.start()
        con_thread.start()

It's hard to say what's going wrong when you don't show your code. Also, it is not clear to me why the acquisition order would change if you have multiple writers.
Here is a script to generate synthetic frames the same size as your images and save them as TIFF files, in order. It scales up in speed pretty linearly with more writer threads:
NFRAMES  NWRITERS  TIME(s)
1000     1         1.48
1000     2         0.78
1000     4         0.48
#!/usr/bin/env python3

import time
import numpy as np
import threading, queue

from tifffile import imsave

def writer(q):
    print('[WRITER] Started')
    total = 0
    while True:
        (frameNum, im) = q.get()
        if frameNum < 0:
            break
        # Save as TIFF
        imsave(f'frame-{frameNum}.tif', im)
        total += 1
    print(f'[WRITER] Complete: wrote {total} frames')

if __name__ == "__main__":
    # Edit these to suit
    NFRAMES = 1000
    NWRITERS = 4

    # Create dummy image of correct size
    h, w = 640, 512
    im = np.random.randint(0, 65536, (h, w), dtype=np.uint16)

    # Create a queue to pass frames to writer(s)
    q = queue.Queue(16)

    print('[MAIN] Started')
    start = time.time()

    # Create and start writer thread(s)
    threads = []
    for _ in range(NWRITERS):
        t = threading.Thread(target=writer, args=(q,))
        t.start()
        threads.append(t)

    # Generate a large number of frames to store
    for frameNum in range(NFRAMES):
        # Put a tuple of frameNum and image in queue
        q.put((frameNum, im))

    # Sentinel to tell each writer to exit
    for _ in range(NWRITERS):
        q.put((-1, -1))

    # Wait for our writer thread(s) to exit
    for thread in threads:
        thread.join()

    elapsed = time.time() - start
    print(f'[MAIN] Complete: {NFRAMES} frames, with {NWRITERS} writers in {elapsed} seconds')
Sample Output
[MAIN] Started
[WRITER] Started
[WRITER] Started
[WRITER] Started
[WRITER] Started
[WRITER] Complete: wrote 250 frames
[WRITER] Complete: wrote 250 frames
[WRITER] Complete: wrote 250 frames
[WRITER] Complete: wrote 250 frames
[MAIN] Complete: 1000 frames, with 4 writers in 0.4869719505310059 seconds
One thing I noticed is that it runs about 10% faster if you replace tifffile.imsave() with:
np.save(f'frame-{frameNum}.npy', im)
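On the ordering worry raised at the top: with this scheme the order lives in the filename, not in which writer thread happens to finish first, because each queue entry carries its own frame number. A minimal sketch of that idea in isolation (hypothetical helper, same queue shape as the script above):

import queue

q = queue.Queue(16)
frame_counter = 0  # owned by the single acquisition thread

def enqueue(frame):
    # Only the producer assigns indices, so files come out numbered in
    # acquisition order no matter which writer saves them first.
    global frame_counter
    q.put((frame_counter, frame))
    frame_counter += 1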

Related

How to use pyav or opencv to decode a live stream of raw H.264 data?

The data is received over a socket, with no container around it: pure I/P/B frames beginning with a NAL header (something like 00 00 00 01). I am currently using PyAV to decode the frames, but I can only decode the data after the second PPS info (in a key frame) has been received (so that the chunk of data I send to my decode thread begins with PPS and SPS); otherwise decode() or demux() returns the error "non-existing PPS 0 referenced decode_slice_header error".
I want to feed data to a persistent decoder that remembers the previous P-frame, so that after feeding one B-frame the decoder returns a decoded video frame. Alternatively, some form of IO that can be opened as a container while another thread keeps writing data into it.
Here is my key code:
# read thread... read until a key frame arrives, then make a new io.BytesIO() to store the new data.
rawFrames = io.BytesIO()
while flag_get_keyFrame:
    ....
    content = socket.recv(2048)
    rawFrames.write(content)
    ....

# decode thread... decode content between two key frames
....
rawFrames.seek(0)
container = av.open(rawFrames)
for packet in container.demux():
    for frame in packet.decode():
        self.frames.append(frame)
....
My code will play the video, but with a 3~4 second delay. I am not putting all of it here, because I know it does not actually achieve what I want.
I want to play the video as soon as the first key frame is received, and decode the following frames right after receiving them. PyAV, OpenCV, FFmpeg or something else: how can I achieve my goal?
After hours of searching for an answer to this as well, I figured it out myself.
For a single thread, you can do the following:
rawData = io.BytesIO()
container = av.open(rawData, format="h264", mode='r')
cur_pos = 0
while True:
    data = await websocket.recv()
    rawData.write(data)
    rawData.seek(cur_pos)
    for packet in container.demux():
        if packet.size == 0:
            continue
        cur_pos += packet.size
        for frame in packet.decode():
            self.frames.append(frame)
That is the basic idea. I have worked out a generic version with the receiving thread and the decoding thread separated. The code will also skip frames if the CPU cannot keep up with the decoding speed, and will resume decoding from the next key frame (so you will not get the torn green-screen effect). Here is the full version of the code:
import asyncio
import av
import cv2
import io
from multiprocessing import Process, Queue, Event
import time
import websockets

def display_frame(frame, start_time, pts_offset, frame_rate):
    if frame.pts is not None:
        play_time = (frame.pts - pts_offset) * frame.time_base.numerator / frame.time_base.denominator
        if start_time is not None:
            current_time = time.time() - start_time
            time_diff = play_time - current_time
            if time_diff > 1 / frame_rate:
                return False
            if time_diff > 0:
                time.sleep(time_diff)
    img = frame.to_ndarray(format='bgr24')
    cv2.imshow('Video', img)
    return True

def get_pts(frame):
    return frame.pts

def render(terminated, data_queue):
    rawData = io.BytesIO()
    cur_pos = 0
    frames_buffer = []
    start_time = None
    pts_offset = None
    got_key_frame = False
    while not terminated.is_set():
        try:
            data = data_queue.get_nowait()
        except:
            time.sleep(0.01)
            continue
        rawData.write(data)
        rawData.seek(cur_pos)
        if cur_pos == 0:
            container = av.open(rawData, mode='r')
            original_codec_ctx = container.streams.video[0].codec_context
            codec = av.codec.CodecContext.create(original_codec_ctx.name, 'r')
        cur_pos += len(data)
        dts = None
        for packet in container.demux():
            if packet.size == 0:
                continue
            dts = packet.dts
            if pts_offset is None:
                pts_offset = packet.pts
            if not got_key_frame and packet.is_keyframe:
                got_key_frame = True
            if data_queue.qsize() > 8 and not packet.is_keyframe:
                got_key_frame = False
                continue
            if not got_key_frame:
                continue
            frames = codec.decode(packet)
            if start_time is None:
                start_time = time.time()
            frames_buffer += frames
            frames_buffer.sort(key=get_pts)
            for frame in frames_buffer:
                if display_frame(frame, start_time, pts_offset, codec.framerate):
                    frames_buffer.remove(frame)
                if cv2.waitKey(1) & 0xFF == ord('q'):
                    break
        if dts is not None:
            container.seek(25000)
        rawData.seek(cur_pos)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    terminated.set()
    cv2.destroyAllWindows()

async def receive_encoded_video(websocket, path):
    data_queue = Queue()
    terminated = Event()
    p = Process(
        target=render,
        args=(terminated, data_queue)
    )
    p.start()
    while not terminated.is_set():
        try:
            data = await websocket.recv()
        except:
            break
        data_queue.put(data)
    terminated.set()
It's normal to get a 3~4 second delay, because you are reading encoded data and decoding it on the CPU takes time.
If you have GPU hardware, you can use FFmpeg to decode H.264 on the GPU. Here is an example.
If you don't have a GPU, decoding H.264 on the CPU will always introduce delays. You can use FFmpeg for more efficient decoding, but this will only decrease the total delay by roughly 10%.
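As an illustration of the GPU route, a minimal sketch that pipes through the FFmpeg CLI (assuming an NVIDIA GPU, an FFmpeg build with NVDEC/CUVID support, and a placeholder input file and resolution):

import subprocess
import numpy as np

W, H = 1280, 720  # assumed stream resolution

# Ask FFmpeg to decode with the h264_cuvid hardware decoder and emit raw
# BGR frames on stdout, which we slice into numpy arrays.
proc = subprocess.Popen(
    ['ffmpeg', '-c:v', 'h264_cuvid', '-i', 'input.h264',
     '-f', 'rawvideo', '-pix_fmt', 'bgr24', 'pipe:1'],
    stdout=subprocess.PIPE)

frame_size = W * H * 3
while True:
    raw = proc.stdout.read(frame_size)
    if len(raw) < frame_size:
        break
    frame = np.frombuffer(raw, dtype=np.uint8).reshape(H, W, 3)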

Extract and analyze frames while recording using PiCamera, OpenCV

I'm streaming video from my Raspberry Pi using PiCamera to a web socket, so that I can view it within my local network.
I want to write my own motion detection script from scratch, so I want to grab the first image from the video stream (which will be the plain background), then compare the following frames to it with a function to check whether something has changed (I have written those functions separately). I am not really worried about efficiency here.
MAIN ISSUE:
I want to get the data from those frames in a BytesIO object, then convert them to a 2D grayscale numpy array so I can perform operations on them, all while keeping the stream going (I have in fact reduced the frame rate to 4 fps to make it run faster on my computer).
PROBLEM ENCOUNTERED WITH THE FOLLOWING CODE:
One part of the problem I have identified is that the numbers are way off. I set my camera to a resolution of around 640*480 (which should give a 307,200-element numpy array of grayscale pixel data), whereas my len() computations return fewer than 100k pixels.
def main():
    print('Initializing camera')
    base_image = io.BytesIO()
    image_captured = io.BytesIO()
    with picamera.PiCamera() as camera:
        camera.resolution = (WIDTH, HEIGHT)
        camera.framerate = FRAMERATE
        camera.vflip = VFLIP  # flips image rightside up, as needed
        camera.hflip = HFLIP  # flips image left-right, as needed
        sleep(1)  # camera warm-up time
        print('Initializing websockets server on port %d' % WS_PORT)
        WebSocketWSGIHandler.http_version = '1.1'
        websocket_server = make_server(
            '', WS_PORT,
            server_class=WSGIServer,
            handler_class=WebSocketWSGIRequestHandler,
            app=WebSocketWSGIApplication(handler_cls=StreamingWebSocket))
        websocket_server.initialize_websockets_manager()
        websocket_thread = Thread(target=websocket_server.serve_forever)
        print('Initializing HTTP server on port %d' % HTTP_PORT)
        http_server = StreamingHttpServer()
        http_thread = Thread(target=http_server.serve_forever)
        print('Initializing broadcast thread')
        output = BroadcastOutput(camera)
        broadcast_thread = BroadcastThread(output.converter, websocket_server)
        print('Starting recording')
        camera.start_recording(output, 'yuv')
        try:
            print('Starting websockets thread')
            websocket_thread.start()
            print('Starting HTTP server thread')
            http_thread.start()
            print('Starting broadcast thread')
            broadcast_thread.start()
            time.sleep(0.5)
            camera.capture(base_image, use_video_port=True, format='jpeg')
            base_data = np.asarray(bytearray(base_image.read()), dtype=np.uint64)
            base_img_matrix = cv2.imdecode(base_data, cv2.IMREAD_GRAYSCALE)
            while True:
                camera.wait_recording(1)
                # insert here the code for frame analysis
                camera.capture(image_captured, use_video_port=True, format='jpeg')
                data_next = np.asarray(bytearray(image_captured.read()), dtype=np.uint64)
                next_img_matrix = cv2.imdecode(data_next, cv2.IMREAD_GRAYSCALE)
                monitor_changes(base_img_matrix, next_img_matrix)
        except KeyboardInterrupt:
            pass
        finally:
            print('Stopping recording')
            camera.stop_recording()
            print('Waiting for broadcast thread to finish')
            broadcast_thread.join()
            print('Shutting down HTTP server')
            http_server.shutdown()
            print('Shutting down websockets server')
            websocket_server.shutdown()
            print('Waiting for HTTP server thread to finish')
            http_thread.join()
            print('Waiting for websockets thread to finish')
            websocket_thread.join()

if __name__ == '__main__':
    main()
Solved. Basically the problem was all in the way I was handling the data and the BytesIO files. First of all, I needed to use unsigned int8 as the dtype of the buffer to decode it. Then I switched to np.frombuffer to read the files in their entirety, because the base image is never going to change, so it will always read the same thing, while the next one is initialized and discarded in every loop iteration. Also, I can replace cv2.IMREAD_GRAYSCALE with 0 in the function.
camera.start_recording(output, 'yuv')
base_image = io.BytesIO()
try:
    print('Starting websockets thread')
    websocket_thread.start()
    print('Starting HTTP server thread')
    http_thread.start()
    print('Starting broadcast thread')
    broadcast_thread.start()
    time.sleep(0.5)
    camera.capture(base_image, use_video_port=True, format='jpeg')
    base_data = np.frombuffer(base_image.getvalue(), dtype=np.uint8)
    base_img_matrix = cv2.imdecode(base_data, 0)
    while True:
        camera.wait_recording(0.25)
        image_captured = io.BytesIO()
        # insert here the code for frame analysis
        camera.capture(image_captured, use_video_port=True, format='jpeg')
        data_next = np.frombuffer(image_captured.getvalue(), dtype=np.uint8)
        next_img_matrix = cv2.imdecode(data_next, cv2.IMREAD_GRAYSCALE)
        monitor_changes(base_img_matrix, next_img_matrix)
        image_captured.close()
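For reference, a hypothetical monitor_changes along the lines the question describes (the asker wrote their own version, which is not shown; this illustration just differences the grayscale frames with standard cv2 calls):

import cv2

def monitor_changes(base, current, min_changed_pixels=500):
    # Difference the two grayscale frames, threshold the result, and
    # count how many pixels changed significantly.
    diff = cv2.absdiff(base, current)
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    if cv2.countNonZero(mask) > min_changed_pixels:
        print('Motion detected')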

How can I improve my python openCV video-stream?

I've been working on a project where I use a raspberry pi to send a live video feed to my server. This kinda works but not how I'd like it to.
The main problem is speed. Right now I can send a 640x480 video stream at around 3.5 FPS and a 1920x1080 stream at around 0.5 FPS, which is terrible. Since I am not a professional, I figure there should be a way of improving my code.
The sender (Raspberry pi):
def send_stream():
    connection = True
    while connection:
        ret, frame = cap.read()
        if ret:
            # You might want to enable this while testing.
            # cv2.imshow('camera', frame)
            b_frame = pickle.dumps(frame)
            b_size = len(b_frame)
            try:
                s.sendall(struct.pack("<L", b_size) + b_frame)
            except socket.error:
                print("Socket Error!")
                connection = False
        else:
            print("Received no frame from camera, exiting.")
            exit()
The Receiver (Server):
def recv_stream(self):
    payload_size = struct.calcsize("<L")
    data = b''
    while True:
        try:
            start_time = datetime.datetime.now()
            # keep receiving data until it gets the size of the msg.
            while len(data) < payload_size:
                data += self.connection.recv(4096)
            # Get the frame size and remove it from the data.
            frame_size = struct.unpack("<L", data[:payload_size])[0]
            data = data[payload_size:]
            # Keep receiving data until the frame size is reached.
            while len(data) < frame_size:
                data += self.connection.recv(32768)
            # Cut the frame to the beginning of the next frame.
            frame_data = data[:frame_size]
            data = data[frame_size:]
            frame = pickle.loads(frame_data)
            frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            end_time = datetime.datetime.now()
            fps = 1 / (end_time - start_time).total_seconds()
            print("Fps: ", round(fps, 2))
            self.detect_motion(frame, fps)
            self.current_frame = frame
        except (socket.error, socket.timeout) as e:
            # The timeout got reached or the client disconnected. Clean up the mess.
            print("Cleaning up: ", e)
            try:
                self.connection.close()
            except socket.error:
                pass
            self.is_connected = False
            break
One potential reason could be I/O latency when reading frames. Since cv2.VideoCapture().read() is a blocking operation, the main program is stalled until a frame is read from the camera device and returned. One way to improve performance is to spawn another thread to grab frames in parallel, instead of relying on a single thread to grab frames in sequential order: a dedicated thread only polls for new frames, while the main thread handles processing/graphing the most recent frame.
Your current approach (Sequential):
Thread 1: Grab frame -> Process frame -> Plot
Proposed approach (Parallel):
Thread 1: Grab frame
from threading import Thread
import time

def get_frames():
    while True:
        ret, frame = cap.read()
        time.sleep(.01)

thread_frames = Thread(target=get_frames, args=())
thread_frames.daemon = True
thread_frames.start()
Thread 2: Process frame -> Plot
def process_frames():
    while True:
        # Grab most recent frame
        # Process/plot frame
        ...
By having separate threads, your program will run in parallel: there will always be a frame ready to be processed, instead of having to wait for a frame to be read before processing can begin.
Note: This method gives you a performance boost based on I/O latency reduction. It isn't a true increase of FPS so much as a dramatic reduction in latency (a frame is always available for processing; we don't need to poll the camera device and wait for the I/O to complete). A self-contained version of this pattern is sketched below.
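A minimal self-contained sketch of the pattern (hypothetical class name; only standard cv2/threading calls):

import cv2
from threading import Thread

class VideoStream:
    # Poll frames on a background thread; read() always returns the latest.
    def __init__(self, src=0):
        self.cap = cv2.VideoCapture(src)
        self.ret, self.frame = self.cap.read()
        self.stopped = False
        Thread(target=self._update, daemon=True).start()

    def _update(self):
        while not self.stopped:
            self.ret, self.frame = self.cap.read()

    def read(self):
        return self.ret, self.frame

    def stop(self):
        self.stopped = True
        self.cap.release()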
After searching the internet for ages, I found a quick solution which doubled the fps (this is still way too low: 1.1 fps at 1080p). What I did was stop using pickle: I now JPEG-encode the frame and send it as base64 instead; apparently pickling the raw image just takes a while. Anyway, this is my new code:
The sender (Raspberry pi):
def send_stream():
    global connected
    connection = True
    while connection:
        if last_frame is not None:
            # You might want to uncomment these lines while testing.
            # cv2.imshow('camera', frame)
            # cv2.waitKey(1)
            frame = last_frame
            # The old pickling method.
            # b_frame = pickle.dumps(frame)
            encoded, buffer = cv2.imencode('.jpg', frame)
            b_frame = base64.b64encode(buffer)
            b_size = len(b_frame)
            print("Frame size = ", b_size)
            try:
                s.sendall(struct.pack("<L", b_size) + b_frame)
            except socket.error:
                print("Socket Error!")
                connection = False
                connected = False
                s.close()
                return "Socket Error"
        else:
            return "Received no frame from camera"
The Receiver (Server):
def recv_stream(self):
    payload_size = struct.calcsize("<L")
    data = b''
    while True:
        try:
            start_time = datetime.datetime.now()
            # keep receiving data until it gets the size of the msg.
            while len(data) < payload_size:
                data += self.connection.recv(4096)
            # Get the frame size and remove it from the data.
            frame_size = struct.unpack("<L", data[:payload_size])[0]
            data = data[payload_size:]
            # Keep receiving data until the frame size is reached.
            while len(data) < frame_size:
                data += self.connection.recv(131072)
            # Cut the frame to the beginning of the next frame.
            frame_data = data[:frame_size]
            data = data[frame_size:]
            # using the old pickling method.
            # frame = pickle.loads(frame_data)
            # Converting the image to be sent.
            img = base64.b64decode(frame_data)
            npimg = np.fromstring(img, dtype=np.uint8)
            frame = cv2.imdecode(npimg, 1)
            frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            end_time = datetime.datetime.now()
            fps = 1 / (end_time - start_time).total_seconds()
            print("Fps: ", round(fps, 2))
            self.detect_motion(frame, fps)
            self.current_frame = frame
        except (socket.error, socket.timeout) as e:
            # The timeout got reached or the client disconnected. Clean up the mess.
            print("Cleaning up: ", e)
            try:
                self.connection.close()
            except socket.error:
                pass
            self.is_connected = False
            break
I also increased the socket receive size, which increased the fps when sending from my local machine to itself while testing, but didn't change anything whatsoever when using the Raspberry Pi.
You can see the full code on my github: https://github.com/Ruud14/SecurityCamera
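A quick way to see where the gain probably comes from (a sketch; the numbers vary with image content): base64 by itself inflates the buffer by about a third, so the real win is most likely the JPEG compression shrinking the payload before it hits the socket.

import base64
import pickle

import cv2
import numpy as np

frame = np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8)
# Raw pickled frame vs. the JPEG + base64 payload actually sent on the wire.
print('pickled raw frame:', len(pickle.dumps(frame)), 'bytes')
ok, buffer = cv2.imencode('.jpg', frame)
print('jpeg + base64:    ', len(base64.b64encode(buffer)), 'bytes')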

Python multiprocessing hangs at join

I am reading a video file such that, out of every 20 frames, I store the first one in an input queue. Once I have all the required frames in the input queue, I run multiple processes to perform some operation on these frames and store the results in an output queue.
But the code always gets stuck at join. I tried different solutions proposed for such problems, but none of them seems to work.
import numpy as np
import cv2
import timeit
import face_recognition
from multiprocessing import Process, Queue, Pool
import multiprocessing
import os

s = timeit.default_timer()

def alternative_process_target_func(input_queue, output_queue):
    while not output_queue.full():
        frame_no, small_frame, face_loc = input_queue.get()
        print('Frame_no: ', frame_no, 'Process ID: ', os.getpid(), '----', multiprocessing.current_process())
        #canny_frame(frame_no, small_frame, face_loc)
        #I am just storing frame no for now but will perform something else later
        output_queue.put((frame_no, frame_no))
        if output_queue.full():
            print('Its Full ---------------------------------------------------------------------------------------')
        else:
            print('Not Full')
    print(timeit.default_timer() - s, ' seconds.')
    print('I m not reading anymore. . .', os.getpid())

def alternative_process(file_name):
    start = timeit.default_timer()
    cap = cv2.VideoCapture(file_name)
    frame_no = 1
    fps = cap.get(cv2.CAP_PROP_FPS)
    length = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    print('Frames Per Second: ', fps)
    print('Total Number of frames: ', length)
    print('Duration of file: ', int(length / fps))
    processed_frames = 1
    not_processed = 1
    frames = []
    process_this_frame = True
    frame_no = 1
    Input_Queue = Queue()
    while (cap.isOpened()):
        ret, frame = cap.read()
        if not ret:
            print('Size of input Queue: ', Input_Queue.qsize())
            print('Total no of frames read: ', frame_no)
            end1 = timeit.default_timer()
            print('Time taken to fetch useful frames: ', end1 - start)
            threadn = cv2.getNumberOfCPUs()
            Output_Queue = Queue(maxsize=Input_Queue.qsize())
            process_list = []
            #quit = multiprocessing.Event()
            #foundit = multiprocessing.Event()
            for x in range((threadn - 1)):
                # print('Process No : ', x)
                p = Process(target=alternative_process_target_func, args=(Input_Queue, Output_Queue))#, quit, foundit
                #p.daemon = True
                p.start()
                process_list.append(p)
                #p.join()
            # for proc in process_list:
            #     print('---------------------------------------------------------------', proc.p)
            i = 1
            for proc in process_list:
                print('I am hanged here')
                proc.join()
                print('I am done')
                i += 1
            end = timeit.default_timer()
            print('Time taken by face verification: ', end - start)
            break
        if process_this_frame:
            print(frame_no)
            small_frame = cv2.resize(frame, (0, 0), fx=0.25, fy=0.25)
            rgb_small_frame = small_frame[:, :, ::-1]
            face_locations = face_recognition.face_locations(rgb_small_frame)
            # frames.append((rgb_small_frame, face_locations))
            Input_Queue.put((frame_no, rgb_small_frame, face_locations))
        frame_no += 1
        if processed_frames < 5:
            processed_frames += 1
            not_processed = 1
        else:
            if not_processed < 15:
                process_this_frame = False
                not_processed += 1
            else:
                processed_frames = 1
                process_this_frame = True
        print('-----------------------------------------------------------------------------------------------')
    cap.release()
    cv2.destroyAllWindows()

alternative_process('user_verification_2.avi')
As the documentation on Process.join() says, hanging (or "blocking") is exactly what is expected to happen:
Block the calling thread until the process whose join() method is
called terminates or until the optional timeout occurs.
join() stops current thread until the target process finishes. Target process is calling alternative_process_target_func, so the problem is obviously in that function. It never finishes. There may be more than one reason for that.
Problem 1
alternative_process_target_func runs until output_queue.full(). What if the output queue never becomes full? Then the function never ends. It is really better to determine the end some other way, e.g. run until the input queue is empty.
Problem 2
input_queue.get() will block if the input queue is empty. As the documentation says:
Remove and return an item from the queue. If optional args block is true and timeout is None (the default), block if necessary until an item is available.
You are running multiple processes, so do not expect that there is something in input just because output_queue.full() was False a moment ago, and because input size is the same as output size. A lot could have happened in the meantime.
What you want to do is:
try:
    input_queue.get(False)  # or input_queue.get_nowait()
except Empty:
    break  # stop when there is nothing more to read from the input
Problem 3
output_queue.put((frame_no, frame_no)) will block if there is no room in the output to store the data.
Again, you are assuming that there is room in output, just because you checked output_queue.full() a few moments ago, and because input size is equal to output size. Never rely on such things.
You want to do the same thing as for input:
try:
    output_queue.put((frame_no, frame_no), False)
    # or output_queue.put_nowait((frame_no, frame_no))
except Full:  # note: put raises Full, not Empty
    # deal with this somehow, e.g.
    raise Exception("There is no room in the output queue to write to.")
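Putting the three fixes together, a minimal corrected worker might look like this (a sketch, assuming all frames are enqueued before the workers start, as in the question's code; note that even with multiprocessing.Queue, the exception classes come from the plain queue module):

from queue import Empty, Full

def alternative_process_target_func(input_queue, output_queue):
    while True:
        try:
            frame_no, small_frame, face_loc = input_queue.get_nowait()
        except Empty:
            break  # input is drained, so this worker is done
        try:
            output_queue.put_nowait((frame_no, frame_no))
        except Full:
            raise Exception("There is no room in the output queue to write to.")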

Why is writing to a file faster than multiprocessing.Pipe?

I am testing the fastest way to communicate between two processes. I have two processes: one writes data, the other receives it. My script shows that writing to and reading from a file is faster than a pipe. How can this happen? Isn't memory faster than disk?
Write and read from a file:
#!/usr/bin/env python
# -*- coding:utf-8 -*-
from mutiprocesscomunicate import gen_data

data_size = 128 * 1024  # KB

def send_data_task(file_name):
    with open(file_name, 'wb+') as fd:
        for i in range(data_size):
            fd.write(gen_data(1))
            fd.write('\n'.encode('ascii'))
        # end EOF
        fd.write('EOF'.encode('ascii'))
        print('send done.')

def get_data_task(file_name):
    offset = 0
    fd = open(file_name, 'r+')
    i = 0
    while True:
        data = fd.read(1024)
        offset += len(data)
        if 'EOF' in data:
            fd.truncate()
            break
        if not data:
            fd.close()
            fd = None
            fd = open(file_name, 'r+')
            fd.seek(offset)
            continue
    print("recv done.")

if __name__ == '__main__':
    import multiprocessing
    import time
    import os
    pipe_out = pipe_in = 'throught_file'
    p = multiprocessing.Process(target=send_data_task, args=(pipe_out,), kwargs=())
    p1 = multiprocessing.Process(target=get_data_task, args=(pipe_in,), kwargs=())
    p.daemon = True
    p1.daemon = True
    start_time = time.time()
    p1.start()
    time.sleep(0.5)
    p.start()
    p.join()
    p1.join()
    os.sync()
    print('through file', data_size / (time.time() - start_time), 'KB/s')
    open(pipe_in, 'w+').truncate()
Using a pipe:
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import multiprocessing
from mutiprocesscomunicate import gen_data

data_size = 128 * 1024  # KB

def send_data_task(pipe_out):
    for i in range(data_size):
        pipe_out.send(gen_data(1))
    # end EOF
    pipe_out.send("")
    print('send done.')

def get_data_task(pipe_in):
    while True:
        data = pipe_in.recv()
        if not data:
            break
    print("recv done.")

if __name__ == '__main__':
    pipe_out, pipe_in = multiprocessing.Pipe()
    p = multiprocessing.Process(target=send_data_task, args=(pipe_out,), kwargs=())
    p1 = multiprocessing.Process(target=get_data_task, args=(pipe_in,), kwargs=())
    p.daemon = True
    p1.daemon = True
    import time
    start_time = time.time()
    p1.start()
    p.start()
    p.join()
    p1.join()
    print('through pipe', data_size / (time.time() - start_time), 'KB/s')
The data-generating function:

def gen_data(size):
    onekb = "a" * 1024
    return (onekb * size).encode('ascii')
Result:
through file 110403.02025891568 KB/s
through pipe 75354.71358973449 KB/s
I am using macOS with Python 3.
Update
If the data is just 1 KB, the pipe is about 100 times faster than the file. But if the data is big, like 128 MB, the result is as above.
A pipe has a limited capacity, in order to match the speeds of producer and consumer (via back-pressure flow control) rather than consume an unlimited amount of memory. The particular limit on OS X, according to this Unix Stack Exchange answer, is 16KiB. As you're writing 128KiB, this means at least 8 times as many system calls (and context switches). When working with files, the size is limited only by your disk space or quota, and without an fdatasync or similar, the data won't need to make it to disk; it can be read again directly from cache. On the other hand, when your data is small, the time to find a place to put the file dominates, leaving the pipe far faster.
When you do use fdatasync, or when you exceed the memory available for disk caching, writing to disk also slows down to match actual disk transfer speeds.
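To put rough numbers on the handoff cost (an illustration, not a benchmark): the test moves 128 MiB through a ~16 KiB pipe buffer, which forces thousands of producer/consumer switches that the file version never pays for.

total_bytes = 128 * 1024 * 1024  # 128 MiB of 1 KB messages
pipe_buf = 16 * 1024             # ~16 KiB pipe capacity on OS X

# The writer can buffer at most 16 messages before it blocks, so the kernel
# must hand control between the two processes at least this many times:
print(total_bytes // pipe_buf)   # 8192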
Because quite often file data is first written into the page cache (which is in RAM) by the OS kernel.
