[Update]
I'm using Twisted 22.4.0 and Python 3.9.6
I'm trying to write an asynchronous application that must run an event loop at 250 Hz. So far, Twisted is not fast enough for my application (but I would like to know whether this can be fixed). On a Windows 10 i5 laptop, the highest frequency I can achieve in a LoopingCall is around 50 Hz. The following code runs fine at 50 Hz and prints "took 1.002 sec", but at 100 Hz it typically takes 1.5 seconds to run, and I need it to run with a 0.004 s interval (250 Hz).
from twisted.internet.task import LoopingCall
from twisted.internet import reactor
import time

class Loop():
    def __init__(self, hz):
        self.hz = hz
        self.lc = LoopingCall(self.fast_task)
        self.num_calls = 0
        self.lc.start(1/hz)
        reactor.run() # **Forgot to add this the first time**

    def fast_task(self):
        if self.num_calls == 0:
            self.start_time = time.time()
        if self.num_calls == self.hz:
            print("Stopping reactor...")
            print(f"took: {time.time() - self.start_time} sec")
            reactor.stop()
            return
        self.num_calls += 1

if __name__ == "__main__":
    l = Loop(100)
The above code typically takes ~1.5s to run.
My question:
Is there any way to speed this event loop up in Twisted on Windows?
I've run some similar code in asyncio and asyncio can definitely handle a 250Hz loop on my laptop. So one of the next things I tried was using the asyncioreactor with Twisted. Turns out, it still takes the same amount of time as the above code which doesn't use asyncioreactor.
But, I like the simplicity of Twisted for my use case - I need a few TCP servers and clients and a few UDP servers and clients plus some other heavy I/O processing.
One other note: I did find this ticket (https://twistedmatrix.com/trac/ticket/2424) for Twisted, in which I found out [Edit] that the author of Twisted, Glyph, chose not to move to a monotonic-based time unless there was an API change (which, to my knowledge, hasn't been implemented, except maybe when used with asyncioreactor). This also raises other concerns about using Twisted as a reliable, high-frequency event loop, such as NTP clock adjustments. It may be that using asyncio under the hood (with asyncioreactor) takes care of this problem, but it certainly doesn't seem to offer any speed advantage.
[Update 2] This may have fixed my problem:
I adjusted the windows sleep resolution with the following code, and now my LoopingCall seems to run the above code reliably in 1 sec at 250Hz, and reliably up to 1000Hz:
from ctypes import windll
windll.winmm.timeBeginPeriod(1) # This sets the time sleep resolution to 1 ms
[Update 3]
I've included the code I used to create the loop with asyncioreactor.
Note: you'll notice that I'm using WindowsSelectorEventLoopPolicy() - this is due to not having the latest Visual C++ libraries installed (not sure if that's important info here, though)
Note 2: I'm new to twisted, so I could be using this incorrectly (the usage of asyncioreactor, or the actual LoopingCall - although the LoopingCall seems pretty straightforward)
Note 3:
I'm running on Windows 10 v21H2, Processor: 1.6GHz i5
The v21H2 is important here since it's after v2004:
From: https://learn.microsoft.com/en-us/windows/win32/api/timeapi/nf-timeapi-timebeginperiod
Prior to Windows 10, version 2004, this function affects a global
Windows setting. For all processes Windows uses the lowest value (that
is, highest resolution) requested by any process. Starting with
Windows 10, version 2004, this function no longer affects global timer
resolution. For processes which call this function, Windows uses the
lowest value (that is, highest resolution) requested by any process.
For processes which have not called this function, Windows does not
guarantee a higher resolution than the default system resolution.
To see if I could prove this out, I tried running Windows Media Player, Skype, and other programs while not calling timeBeginPeriod(1) (the thought being that another process would request a lower timer period, that is, a higher resolution, and that this would affect my program). But this didn't change the timings you see below.
Note 4:
Timings for a 3 second run (3 runs each) at 1000Hz:
asyncioreactor with timeBeginPeriod(1): [3.019, 3.029, 3.009]
asyncioreactor with no timeBeginPeriod(1): [42.859, 43.65, 43.152]
no asyncioreactor with timeBeginPeriod(1): [3.012, 3.519, 3.146]
no asyncioreactor, no timeBeginPeriod(1): [45.247, 44.957, 45.325]
My implementation using asyncioreactor
import asyncio
asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
from twisted.internet import asyncioreactor
asyncioreactor.install()
from twisted.internet.task import LoopingCall
from twisted.internet import reactor
import time
from ctypes import windll
windll.winmm.timeBeginPeriod(1)
class Loop():
    def __init__(self, hz=1000):
        self.hz = hz
        ...
    ...
To fix my problem:
I adjusted the windows sleep resolution with the following code, and now my LoopingCall seems to run the above code reliably in 1 sec at 250Hz, and reliably up to 1000Hz:
from ctypes import windll
windll.winmm.timeBeginPeriod(1) # This sets the time sleep resolution to 1 ms
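A minimal sketch of how the call above could be wrapped so that the finer timer period is released again when it is no longer needed. It assumes Windows for the actual effect; on other platforms the helper does nothing, purely so the sketch runs anywhere. The names timer_resolution and measure_sleep are made up for illustration, while timeBeginPeriod and its counterpart timeEndPeriod are the real winmm functions.

```python
import sys
import time
from contextlib import contextmanager


@contextmanager
def timer_resolution(period_ms=1):
    """Request a finer Windows timer period for the duration of the block.

    On non-Windows platforms this is a no-op (an assumption of this sketch).
    """
    if sys.platform != "win32":
        yield
        return
    from ctypes import windll  # only importable on Windows
    windll.winmm.timeBeginPeriod(period_ms)
    try:
        yield
    finally:
        # Every timeBeginPeriod call should be matched by timeEndPeriod
        # with the same period value.
        windll.winmm.timeEndPeriod(period_ms)


def measure_sleep(requested=0.001, samples=50):
    """Return the average real duration of time.sleep(requested) in seconds."""
    start = time.perf_counter()
    for _ in range(samples):
        time.sleep(requested)
    return (time.perf_counter() - start) / samples


if __name__ == "__main__":
    print(f"default:     {measure_sleep() * 1000:.2f} ms per 1 ms sleep")
    with timer_resolution(1):
        print(f"1 ms period: {measure_sleep() * 1000:.2f} ms per 1 ms sleep")
```

Comparing the two printed averages on the target machine is a quick way to confirm that sleep granularity, rather than Twisted itself, limits the loop rate. In a Twisted program, the with block would presumably wrap the reactor.run() call.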
I have a python program where I need to load and de-serialize a 1GB pickle file. It takes a good 20 seconds and I would like to have a mechanism whereby the content of the pickle is readily available for use. I've looked at shared_memory but all the examples of its use seem to involve numpy and my project doesn't use numpy. What is the easiest and cleanest way to achieve this using shared_memory or otherwise?
This is how I'm loading the data now (on every run):
def load_pickle(pickle_name):
    return pickle.load(open(DATA_ROOT + pickle_name, 'rb'))
I would like to be able to edit the simulation code in between runs without having to reload the pickle. I've been messing around with importlib.reload but it really doesn't seem to work well for a large Python program with many files:
def main():
    data_manager.load_data()
    run_simulation()
    while True:
        try:
            importlib.reload(simulation)
            run_simulation()
        except:
            print(traceback.format_exc())
        print('Press enter to re-run main.py, CTRL-C to exit')
        sys.stdin.readline()
This could be an XY problem, rooted in the assumption that you must use pickles at all. They're awkward to deal with because of how they manage dependencies, and that makes them fundamentally a poor choice for any long-term data storage.
The source financial data is almost-certainly in some tabular form to begin with, so it may be possible to request it in a friendlier format
A simple middleware to deserialize and reserialize the pickles in the meantime will smooth the transition
input -> load pickle -> write -> output
Converting your workflow to use Parquet or Feather which are designed to be efficient to read and write will almost-certainly make a considerable difference to your load speed
Further relevant links
Answer to How to reversibly store and load a Pandas dataframe to/from disk
What are the pros and cons of parquet format compared to other formats?
You may also be able to achieve this with hickle, which internally uses an HDF5 format, ideally making it significantly faster than pickle while still behaving like one
An alternative to storing the unpickled data in memory would be to store the pickle in a ramdisk, so long as most of the time overhead comes from disk reads. Example code (to run in a terminal) is below.
sudo mkdir /mnt/pickle
sudo mount -o size=1536M -t tmpfs none /mnt/pickle
cp path/to/pickle.pkl /mnt/pickle/pickle.pkl
Then you can access the pickle at /mnt/pickle/pickle.pkl. Note that you can change the file names and extensions to whatever you want. If disk read is not the biggest bottleneck, you might not see a speed increase. If you run out of memory, you can try turning down the size of the ramdisk (I set it at 1536 MB, or 1.5 GB).
You can use shareable list:
So you will have one Python program running that loads the file and keeps it in memory, and another Python program that reads it from memory. Whatever your data is, you can load it into a dictionary, dump it as JSON, and then reload the JSON.
So
Program1
import pickle
import json
from multiprocessing.managers import SharedMemoryManager
YOUR_DATA=pickle.load(open(DATA_ROOT + pickle_name, 'rb'))
data_dict={'DATA':YOUR_DATA}
data_dict_json=json.dumps(data_dict)
smm = SharedMemoryManager()
smm.start()
sl = smm.ShareableList(['alpha','beta',data_dict_json])
print (sl)
#smm.shutdown() commenting shutdown now but you will need to do it eventually
The output will look like this
#OUTPUT
>>>ShareableList(['alpha', 'beta', "your data in json format"], name='psm_12abcd')
Now in Program2:
from multiprocessing import shared_memory
load_from_mem=shared_memory.ShareableList(name='psm_12abcd')
load_from_mem[1]
#OUTPUT
'beta'
load_from_mem[2]
#OUTPUT
yourdataindictionaryformat
You can look for more over here
https://docs.python.org/3/library/multiprocessing.shared_memory.html
Adding another assumption-challenging answer, it could be where you're reading your files from that makes a big difference
1G is not a great amount of data with today's systems; at 20 seconds to load, that's only 50MB/s, which is a fraction of what even the slowest disks provide
You may find you actually have a slow disk or some type of network share as your real bottleneck and that changing to a faster storage medium or compressing the data (perhaps with gzip) makes a great difference to read and writing
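One way to check which part dominates is to time the raw file read separately from the unpickling step. The sketch below does that for a small stand-in pickle written to a temporary file; time_load and the sample data are hypothetical, and the path should be pointed at the real pickle instead.

```python
import os
import pickle
import tempfile
import time


def time_load(path):
    """Return (read_seconds, unpickle_seconds, size_bytes) for a pickle file."""
    t0 = time.perf_counter()
    with open(path, "rb") as f:
        raw = f.read()        # pure I/O: bytes from disk (or from the OS cache)
    t1 = time.perf_counter()
    pickle.loads(raw)         # pure CPU: rebuilding the Python objects
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, len(raw)


if __name__ == "__main__":
    # Hypothetical stand-in data; use the real pickle path in practice.
    sample = {i: [float(i)] * 10 for i in range(100_000)}
    with tempfile.NamedTemporaryFile(suffix=".pkl", delete=False) as tmp:
        pickle.dump(sample, tmp, protocol=pickle.HIGHEST_PROTOCOL)
        path = tmp.name
    try:
        read_s, unpickle_s, size = time_load(path)
        print(f"size:     {size / 1e6:.1f} MB")
        print(f"read:     {read_s:.3f} s")
        print(f"unpickle: {unpickle_s:.3f} s")
    finally:
        os.remove(path)
```

A file that was just written or recently read is usually served from the operating system's cache, so the read time is only representative on the first load after a reboot or after the cache has been evicted. If the read time is small even then, faster storage or a ramdisk will not help much and the cost lies in deserialization.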
Here are my assumptions while writing this answer:
Your Financial data is being produced after complex operations and you want the result to persist in memory
The code that consumes must be able to access that data fast
You wish to use shared memory
Here are the codes (self-explanatory, I believe)
Data structure
'''
Nested class definitions to simulate complex data
'''
class A:
    def __init__(self, name, value):
        self.name = name
        self.value = value
    def get_attr(self):
        return self.name, self.value
    def set_attr(self, n, v):
        self.name = n
        self.value = v
class B(A):
    def __init__(self, name, value, status):
        super(B, self).__init__(name, value)
        self.status = status
    def set_attr(self, n, v, s):
        A.set_attr(self, n,v)
        self.status = s
    def get_attr(self):
        print('\nName : {}\nValue : {}\nStatus : {}'.format(self.name, self.value, self.status))
Producer.py
from multiprocessing import shared_memory as sm
import time
import pickle as pkl
import pickletools as ptool
import sys
from class_defs import B
def main():
    # Data Creation/Processing
    obj1 = B('Sam Reagon', '2703', 'Active')
    #print(sys.getsizeof(obj1))
    obj1.set_attr('Ronald Reagon', '1023', 'INACTIVE')
    obj1.get_attr()
    ###### real deal #########
    # Create pickle string
    byte_str = pkl.dumps(obj1, protocol=pkl.HIGHEST_PROTOCOL, buffer_callback=None)
    # compress the pickle
    #byte_str_opt = ptool.optimize(byte_str)
    byte_str_opt = bytearray(byte_str)
    # place data on shared memory buffer
    shm_a = sm.SharedMemory(name='datashare', create=True, size=len(byte_str_opt))#sys.getsizeof(obj1))
    buffer = shm_a.buf
    # the OS may round the block up to a whole page, so copy into a slice of matching length
    buffer[:len(byte_str_opt)] = byte_str_opt
    #print(shm_a.name) # the string to access the shared memory
    #print(len(shm_a.buf[:]))
    # Just an infinite loop to keep the producer running, like a server
    # a better approach would be to explore use of shared memory manager
    while(True):
        time.sleep(60)
if __name__ == '__main__':
    main()
Consumer.py
from multiprocessing import shared_memory as sm
import pickle as pkl
from class_defs import B # we need this so that while unpickling, the object structure is understood
def main():
    shm_b = sm.SharedMemory(name='datashare')
    byte_str = bytes(shm_b.buf[:]) # convert the shared_memory buffer to a bytes array
    obj = pkl.loads(byte_str) # un-pickle the bytes array (as a data source)
    print(obj.name, obj.value, obj.status) # get the values of the object attributes
if __name__ == '__main__':
    main()
When the Producer.py is executed in one terminal, it will emit a string identifier (say, wnsm_86cd09d4) for the shared memory. Enter this string in the Consumer.py and execute it in another terminal.
Just run the Producer.py in one terminal and the Consumer.py on another terminal on the same machine.
I hope this is what you wanted!
You can take advantage of multiprocessing to run the simulations inside of subprocesses, and leverage the copy-on-write benefits of forking to unpickle/process the data only once at the start:
import multiprocessing
import pickle
# Need to use forking to get copy-on-write benefits!
mp = multiprocessing.get_context('fork')
# Load data once, in the parent process
data = pickle.load(open(DATA_ROOT + pickle_name, 'rb'))
def _run_simulation(_):
    # Wrapper for `run_simulation` that takes one argument. The function passed
    # into `multiprocessing.Pool.map` must take one argument.
    run_simulation()
with mp.Pool() as pool:
    pool.map(_run_simulation, range(num_simulations))
If you want to parameterize each simulation run, you can do so like so:
import multiprocessing
import pickle
# Need to use forking to get copy-on-write benefits!
mp = multiprocessing.get_context('fork')
# Load data once, in the parent process
data = pickle.load(open(DATA_ROOT + pickle_name, 'rb'))
with mp.Pool() as pool:
    simulations = ('arg for simulation run', 'arg for another simulation run')
    pool.map(run_simulation, simulations)
This way the run_simulation function will be passed the values from the simulations tuple, which allows each simulation to run with different parameters, or simply assigns each run an ID number or name for logging/saving purposes.
This whole approach relies on fork being available. For more information about using fork with Python's built-in multiprocessing library, see the docs about contexts and start methods. You may also want to consider using the forkserver multiprocessing context (by using mp = multiprocessing.get_context('forkserver')) for the reasons described in the docs.
If you don't want to run your simulations in parallel, this approach can be adapted for that. The key thing is that in order to only have to process the data once, you must call run_simulation within the process that processed the data, or one of its child processes.
If, for instance, you wanted to edit what run_simulation does, and then run it again at your command, you could do it with code resembling this:
main.py:
import multiprocessing
from multiprocessing.connection import Connection
import pickle
from data import load_data
# Load/process data in the parent process
load_data()
# Now child processes can access the data nearly instantaneously
# Need to use forking to get copy-on-write benefits!
mp = multiprocessing.get_context('fork') # Consider using 'forkserver' instead
# This is only ever run in child processes
def load_and_run_simulation(result_pipe: Connection) -> None:
    # Import `run_simulation` here to allow it to change between runs
    from simulation import run_simulation
    # Ensure that simulation has not been imported in the parent process, as if
    # so, it will be available in the child process just like the data!
    try:
        run_simulation()
    except Exception as ex:
        # Send the exception to the parent process
        result_pipe.send(ex)
    else:
        # Send this because the parent is waiting for a response
        result_pipe.send(None)

def run_simulation_in_child_process() -> None:
    result_pipe_output, result_pipe_input = mp.Pipe(duplex=False)
    proc = mp.Process(
        target=load_and_run_simulation,
        args=(result_pipe_input,)
    )
    print('Starting simulation')
    proc.start()
    try:
        # The `recv` below will wait until the child process sends something, or
        # will raise `EOFError` if the child process crashes suddenly without
        # sending an exception (e.g. if a segfault occurs)
        result = result_pipe_output.recv()
        if isinstance(result, Exception):
            raise result # raise exceptions from the child process
        proc.join()
    except KeyboardInterrupt:
        print("Caught 'KeyboardInterrupt'; terminating simulation")
        proc.terminate()
    print('Simulation finished')

if __name__ == '__main__':
    while True:
        choice = input('\n'.join((
            'What would you like to do?',
            '1) Run simulation',
            '2) Exit\n',
        )))
        if choice.strip() == '1':
            run_simulation_in_child_process()
        elif choice.strip() == '2':
            exit()
        else:
            print(f'Invalid option: {choice!r}')
data.py:
from functools import lru_cache
import pickle
# <obtain 'DATA_ROOT' and 'pickle_name' here>
@lru_cache(maxsize=None)
def load_data():
    with open(DATA_ROOT + pickle_name, 'rb') as f:
        return pickle.load(f)
simulation.py:
from data import load_data
# This call will complete almost instantaneously if `main.py` has been run
data = load_data()
def run_simulation():
    # Run the simulation using the data, which will already be loaded if this
    # is run from `main.py`.
    # Anything printed here will appear in the output of the parent process.
    # Exceptions raised here will be caught/handled by the parent process.
    ...
The three files detailed above should all be within the same directory, alongside an __init__.py file that can be empty. The main.py file can be renamed to whatever you'd like, and is the primary entry-point for this program. You can run simulation.py directly, but that will result in a long time spent loading/processing the data, which was the problem you ran into initially. While main.py is running, the file simulation.py can be edited, as it is reloaded every time you run the simulation from main.py.
For macOS users: forking on macOS can be a bit buggy, which is why Python defaults to using the spawn method for multiprocessing on macOS, but still supports fork and forkserver for it. If you're running into crashes or multiprocessing-related issues, try adding OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES to your environment. See https://stackoverflow.com/a/52230415/5946921 for more details.
As I understand it:
something needs to be loaded
it needs to be loaded often, because the file with the code that uses it is edited often
you don't want to wait for it to load every time
Maybe a solution like this will be okay for you.
You can write a script loader file like this (tested on Python 3.8):
import importlib.util, traceback, sys, gc
# Example data
import pickle
something = pickle.loads(pickle.dumps([123]))
if __name__ == '__main__':
    try:
        mod_path = sys.argv[1]
    except IndexError:
        print('Usage: python3', sys.argv[0], 'PATH_TO_SCRIPT')
        exit(1)
    modules_before = list(sys.modules.keys())
    argv = sys.argv[1:]
    while True:
        MOD_NAME = '__main__'
        spec = importlib.util.spec_from_file_location(MOD_NAME, mod_path)
        mod = importlib.util.module_from_spec(spec)
        # Change to needed global name in the target module
        mod.something = something
        sys.modules[MOD_NAME] = mod
        sys.argv = argv
        try:
            spec.loader.exec_module(mod)
        except:
            traceback.print_exc()
        del mod, spec
        modules_after = list(sys.modules.keys())
        for k in modules_after:
            if k not in modules_before:
                del sys.modules[k]
        gc.collect()
        print('Press enter to re-run, CTRL-C to exit')
        sys.stdin.readline()
Example of module:
# Change 1 to some different number when first script is running and press enter
something[0] += 1
print(something)
Should work. And should reduce the reload time of pickle close to zero 🌝
UPD
Added the ability to accept a script name with command-line arguments
This is not an exact answer to the question, as the question reads as if pickle and SHM are required, but others went off that path, so I am going to share a few tricks of mine that might help you. There are some fine solutions here using pickle and SHM anyway; on that front I can only offer more of the same. Same pasta with slight sauce modifications.
Two tricks I employ when dealing with your situations are as follows.
First is to use sqlite3 instead of pickle. You can even easily develop a module for a drop-in replacement using sqlite. The nice thing is that data will be inserted and selected using native Python types, and you can define your own with converter and adapter functions that use the serialization method of your choice to store complex objects. It can be pickle or JSON or whatever.
What I do is define a class with data passed in through *args and/or **kwargs of a constructor. It represents whatever object model I need; then I pick up rows from "select * from table;" of my database and let Python unwrap the data during the new object's initialization. Loading a big amount of data with datatype conversions, even custom ones, is surprisingly fast. sqlite will manage buffering and I/O for you and do it faster than pickle. The trick is to construct your object so that it is filled and initialized as fast as possible. I either subclass dict() or use slots to speed things up.
sqlite3 comes with Python so that's a bonus too.
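A minimal sketch of the sqlite3 approach, using JSON as the serialization method behind the adapter and converter. The column type name JSONDATA, the Record class, and the table layout are all made up for illustration; only the sqlite3 and json calls are standard library.

```python
import json
import sqlite3

# Adapters turn Python objects into something SQLite can store;
# converters turn the stored bytes back into Python objects.
sqlite3.register_adapter(dict, json.dumps)
sqlite3.register_adapter(list, json.dumps)
sqlite3.register_converter("JSONDATA", lambda raw: json.loads(raw.decode("utf-8")))


class Record(dict):
    """Hypothetical object model: a dict subclass filled straight from a row."""

    def __init__(self, name, payload):
        super().__init__(name=name, payload=payload)


def save(conn, records):
    conn.execute("CREATE TABLE IF NOT EXISTS records (name TEXT, payload JSONDATA)")
    conn.executemany(
        "INSERT INTO records VALUES (?, ?)",
        [(r["name"], r["payload"]) for r in records],
    )
    conn.commit()


def load(conn):
    # Each row is unpacked directly into the constructor.
    return [Record(*row) for row in conn.execute("SELECT * FROM records")]


if __name__ == "__main__":
    # PARSE_DECLTYPES applies converters based on the declared column type.
    conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
    save(conn, [Record("AAPL", {"close": [1.0, 2.0]}), Record("MSFT", [3, 4])])
    for rec in load(conn):
        print(rec["name"], rec["payload"])
    conn.close()
```

Whether this beats a single large pickle depends on the shape of the data, so it is worth timing both on the real dataset.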
The other method of mine is to use a ZIP file and struct module.
You construct a ZIP file with multiple files within. E.g. for a pronunciation dictionary with more than 400000 words I'd like a dict() object. So I use one file, let's say lengths.dat, in which I define the length of a key and the length of a value for each pair in binary format. Then I have one file of words and one file of pronunciations, all one after the other.
When I load from file, I read the lengths and use them to construct a dict() of words with their pronunciations from two other files. Indexing bytes() is fast, so, creating such a dictionary is very fast. You can even have it compressed if diskspace is a concern, but some speed loss is introduced then.
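The layout described above might look roughly like the sketch below. The member name lengths.dat comes from the description; keys.dat, values.dat, the 16-bit length fields, and the function names are assumptions made for this example.

```python
import io
import struct
import zipfile

# One record per pair: key length, value length (assumes each is < 65536 bytes)
PAIR = struct.Struct("<HH")


def write_dict(target, mapping):
    """Write mapping as three ZIP members: lengths.dat, keys.dat, values.dat."""
    lengths, keys, values = bytearray(), bytearray(), bytearray()
    for key, value in mapping.items():
        k, v = key.encode("utf-8"), value.encode("utf-8")
        lengths += PAIR.pack(len(k), len(v))
        keys += k
        values += v
    # ZIP_STORED means no compression; ZIP_DEFLATED trades speed for disk space.
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr("lengths.dat", bytes(lengths))
        zf.writestr("keys.dat", bytes(keys))
        zf.writestr("values.dat", bytes(values))


def read_dict(source):
    """Rebuild the dict by slicing the two blobs with the stored lengths."""
    with zipfile.ZipFile(source) as zf:
        lengths = zf.read("lengths.dat")
        keys = zf.read("keys.dat")
        values = zf.read("values.dat")
    result = {}
    kpos = vpos = 0
    for klen, vlen in PAIR.iter_unpack(lengths):
        key = keys[kpos:kpos + klen].decode("utf-8")
        result[key] = values[vpos:vpos + vlen].decode("utf-8")
        kpos += klen
        vpos += vlen
    return result


if __name__ == "__main__":
    buf = io.BytesIO()  # stands in for a file on disk
    write_dict(buf, {"hello": "HH AH0 L OW1", "world": "W ER1 L D"})
    print(read_dict(buf))
```

Lengths are measured on the encoded bytes rather than on the str objects, so non-ASCII keys and values slice correctly.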
Both methods will take less place on a disk than the pickle would.
The second method requires you to read all the data you need into RAM; then you construct the objects, which takes almost double the RAM that the data took, and after that you can of course discard the raw data. Altogether it shouldn't require more than the pickle takes. As for RAM, the OS will manage almost anything using virtual memory/swap if needed.
Oh, yeah, there is a third trick I use. When I have a ZIP file constructed as mentioned above, or anything else that requires additional deserialization while constructing an object, and the number of such objects is large, I introduce lazy loading. I.e. let's say we have a big file with serialized objects in it. You make the program load all the data and distribute it per object, which you keep in a list() or dict().
You write your classes in such a way that when the object is first asked for data it unpacks its raw data, deserializes and what not, removes the raw data from RAM then returns your result. So you will not be losing loading time until you actually need the data in question, which is much less noticeable for a user than 20 secs taking for a process to start.
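A small sketch of that lazy-loading idea, using pickle as the per-object serialization. LazyRecord and load_all are hypothetical names; any deserializer could take the place of pickle.loads.

```python
import pickle

_UNSET = object()  # sentinel meaning "not deserialized yet"


class LazyRecord:
    """Holds serialized bytes and deserializes them on first access only."""

    __slots__ = ("_raw", "_value")

    def __init__(self, raw):
        self._raw = raw
        self._value = _UNSET

    @property
    def value(self):
        if self._value is _UNSET:
            self._value = pickle.loads(self._raw)  # pay the cost on first use
            self._raw = None                       # drop the raw bytes from RAM
        return self._value

    @property
    def loaded(self):
        return self._value is not _UNSET


def load_all(blobs):
    """Wrap each serialized blob without deserializing anything yet."""
    return {name: LazyRecord(raw) for name, raw in blobs.items()}


if __name__ == "__main__":
    blobs = {"prices": pickle.dumps([1.5, 2.5]), "names": pickle.dumps(["a", "b"])}
    records = load_all(blobs)
    print(records["prices"].loaded)  # nothing has been deserialized so far
    print(records["prices"].value)   # first access triggers deserialization
    print(records["prices"].loaded)
```

Start-up cost is then limited to reading the raw bytes, and each object pays its own deserialization cost only when it is first used.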
I implemented the python-preloaded script, which can help you here. It will store the CPython state at an early stage after some modules are loaded, and then when you need it, you can restore from this state and load your normal Python script. Storing currently means that it will stay in memory, and restoring means that it does a fork on it, which is very fast. But these are implementation details of python-preloaded and should not matter to you.
So, to make it work for your use case:
Make a new module, data_preloaded.py or so, and in there, just this code:
preloaded_data = load_pickle(...)
Now run py-preloaded-bundle-fork-server.py data_preloaded -o python-data-preloaded.bin. This will create python-data-preloaded.bin, which can be used as a replacement for python.
I assume you have started python your_script.py before. So now run ./python-data-preloaded.bin your_script.py. Or also just python-data-preloaded.bin (no args). The first time, this will still be slow, i.e. take about 20 seconds. But now it is in memory.
Now run ./python-data-preloaded.bin your_script.py again. Now it should be extremely fast, i.e. a few milliseconds. And you can start it again and again and it will always be fast, until you restart your computer.
I am using Python sounddevice library to record audio, but I can't seem to eliminate ~0.25 to ~0.5 second gaps between what should be consecutive audio files. I think this is because the file writing takes up time, so I learned to use Multiprocessing and Queues to separate out the file writing but it hasn't helped. The most confusing thing is that the logs suggest that the iterations in Main()'s loop are near gapless (only 1-5 milliseconds) but mysteriously the audio_capture function is taking longer than expected even tho nothing else significant is being done. I tried to reduce the script as much as possible for this post. My research has all pointed to this threading/multiprocessing approach, so I am flummoxed.
Background: Python 3.7 on Raspbian Buster
I am dividing the data into segments so that the files are not too big and I imagine programming tasks must deal with this challenge. I also have 4 other subprocesses doing various things after.
Log: The audio_capture part should only take 10:00
08:26:29.991 --- Start of segment #0
08:36:30.627 --- End of segment #0 <<<<< This is >0.6 later than it should be
08:36:30.629 --- Start of segment #1 <<<<< This is near gapless with the prior event
Script:
import logging
import sounddevice
from scipy.io.wavfile import write
import time
import os
from multiprocessing import Queue, Process
# this process is a near endless loop
def main():
    fileQueue = Queue()
    writerProcess = Process(target=writer, args=(fileQueue,))
    writerProcess.start()
    for i in range(9000):
        fileQueue.put(audio_capture(i))
    writerProcess.join()

# This func makes an audio data object from a sound source
def audio_capture(i):
    cycleNumber = str(i)
    logging.debug('Start of segment #' + cycleNumber)
    # each cycle is 10 minutes at 32000Hz sample rate
    audio = sounddevice.rec(frames=600 * 32000, samplerate=32000, channels=2)
    name = time.strftime("%H-%M-%S") + '.wav'
    path = os.path.join('/audio', name)
    sounddevice.wait()
    logging.debug('End of segment #' + cycleNumber)
    return [audio, path]

# This function writes the files.
def writer(input_queue):
    while True:
        try:
            parameters = input_queue.get()
            audio = parameters[0]
            path = parameters[1]
            write(filename=path, rate=32000, data=audio)
            logging.debug('File is written')
        except:
            pass

if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG, format='%(asctime)s.%(msecs)03d --- %(message)s', datefmt='%H:%M:%S',handlers=[logging.FileHandler('/audio/log.txt'), logging.StreamHandler()])
    main()
The documentation tells us that sounddevice.rec() is not meant for gapless recording:
If you need more control (e.g. block-wise gapless recording, overlapping recordings, …), you should explicitly create an InputStream yourself. If NumPy is not available, you can use a RawInputStream.
There are multiple examples for gapless recording in the example programs.
Use PyAudio and open a non-blocking audio stream. You can find a very good basic example on the front page of the PyAudio documentation. Choose a buffer size; I recommend 512 or 1024. Now just append the incoming data to a numpy array. I sometimes store up to 30 seconds of audio in one numpy array. When reaching the end of a segment, create another empty numpy array and start over. Create a thread and save the first segment somewhere. The recording will continue and not one sample will be dropped ;)
Edit: if you want to write 10 minutes to one file, I would suggest creating 10 arrays of 1 minute each and then appending and saving them.
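Both answers boil down to the same pattern: one continuously running input stream whose callback only enqueues blocks, plus a separate thread that cuts the stream into files. Below is a rough sketch of that pattern using sounddevice.RawInputStream and the standard-library wave module. The 16-bit sample format, the function names, and the file naming are assumptions made for this example, not something taken from the question.

```python
import queue
import threading
import wave

SAMPLE_RATE = 32000
CHANNELS = 2
SAMPLE_WIDTH = 2  # bytes per sample (16-bit), an assumption of this sketch


def segment_writer(blocks, frames_per_segment, make_path):
    """Consume raw audio blocks from `blocks` and write fixed-length WAV files.

    Meant to run in its own thread so slow disk writes never stall the audio
    callback. A `None` item ends the loop and flushes the last partial segment.
    """
    segment_bytes = frames_per_segment * CHANNELS * SAMPLE_WIDTH
    pending = bytearray()
    index = 0

    def flush(data):
        nonlocal index
        with wave.open(make_path(index), "wb") as wav:
            wav.setnchannels(CHANNELS)
            wav.setsampwidth(SAMPLE_WIDTH)
            wav.setframerate(SAMPLE_RATE)
            wav.writeframes(bytes(data))
        index += 1

    while True:
        block = blocks.get()
        if block is None:
            break
        pending += block
        while len(pending) >= segment_bytes:
            flush(pending[:segment_bytes])
            del pending[:segment_bytes]
    if pending:
        flush(pending)


def record(blocks, seconds):
    """Feed `blocks` from the sound card; requires the sounddevice package."""
    import sounddevice  # imported lazily so the writer works without it

    def callback(indata, frames, time_info, status):
        blocks.put(bytes(indata))  # copy, since the buffer is reused afterwards

    with sounddevice.RawInputStream(samplerate=SAMPLE_RATE, channels=CHANNELS,
                                    dtype="int16", callback=callback):
        sounddevice.sleep(int(seconds * 1000))
    blocks.put(None)  # tell the writer to finish
```

To use it, start segment_writer in a threading.Thread with a queue.Queue, a segment length such as 600 * SAMPLE_RATE frames, and a function mapping the segment index to a file path, then call record with the same queue from the main thread. Because the stream is never stopped between segments, the boundaries between files fall on exact frame counts instead of depending on how long each write takes.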
I want to send images taken by Picamera on Raspberry Pi to my windows computer.
I wrote some code, as follows (they are simplified here), but it gets stuck in frame = footage_socket.recv_string() in the client.py.
I don't get any error but it always gets stuck in the code like it freezes and can't go to the next line. The server.py works fine and prints 'test' continuously. If you look at the jpg_as_text, you can see encoded texts.
server.py :
import picamera
import socket
import threading
import zmq
import cv2
import base64
from picamera.array import PiRGBArray
if __name__ == "__main__":
    addr = 'ip_address'
    camera = picamera.PiCamera() # Camera initialization
    camera.resolution = (640, 480)
    camera.framerate = 7
    rawCapture = PiRGBArray(camera, size=(640, 480))
    # FPV initialization
    context = zmq.Context()
    footage_socket = context.socket(zmq.PUB)
    footage_socket.connect('tcp://%s:5555'%addr)
    print(addr)
    font = cv2.FONT_HERSHEY_SIMPLEX
    for frame in camera.capture_continuous( rawCapture,
                                            format = "bgr",
                                            use_video_port = True ):
        image = frame.array
        print('test')
        image = cv2.resize(image, (640, 480)) # resize the frame
        encoded, buffer = cv2.imencode('.jpg', image)
        jpg_as_text = base64.b64encode(buffer)
        footage_socket.send(jpg_as_text)
        rawCapture.truncate(0)
client.py :
from socket import *
import sys
import time
import threading as thread
import tkinter as tk
import math
import os
import cv2
import zmq
import base64
import numpy as np
if __name__ == "__main__":
    context = zmq.Context()
    footage_socket = context.socket(zmq.SUB)
    footage_socket.bind('tcp://*:5555')
    footage_socket.setsockopt_string(zmq.SUBSCRIBE, np.unicode(''))
    font = cv2.FONT_HERSHEY_SIMPLEX
    while 100:
        try:
            frame = footage_socket.recv_string() # This line of code is the problem.
            print('next successfuly connected')
            img = base64.b64decode(frame)
            npimg = np.frombuffer(img, dtype=np.uint8)
            source = cv2.imdecode(npimg, 1)
            cv2.imshow("Stream", source)
            cv2.waitKey(1)
        except KeyboardInterrupt:
            break
        except:
            pass
Q : How to receive images from Raspberry Pi over ZeroMQ PUB/SUB in Python?
OBSERVATION :
There is no bug.
You shall use other data-acquisition strategy + setup some self-defensive parameters.
.recv_string()-method is called in a blocking-mode ( it does and will, even forever, block the code-execution, until anything plausible meets the rules to become deliverable ).
Using zmq.NOBLOCK flag permits you to avoid such blocking-mode + using a .poll()-method can help you design private event-driven loops' logic, that call for .recv( zmq.NOBLOCK ) just in cases, there indeed is something ready to get delivered.
SUB-side will receive nothing, unless properly subscribed to receive something, the default state -like with newspapers- is to receive nothing, unless explicitly subscribed to. The safest mode to subscribe to any content, as per the API documented strategy, is to subscribe to a zero-length string, using a .setsockopt( zmq.SUBSCRIBE, "" )-method to do so.
Last, but not least, if willing to do RPi-Win streaming, there might be a wise strategy, as it is commonly of no value to enqueue/publish/transport/receive/dequeue any but the very latest frame, for which .setsockopt( zmq.CONFLATE, 1 ) is ready.
You may need more tweaking of resources, be it for boosting the .Context( nIOthreads )-instance performance, reserved Queue-depth, L3-stack parameters and many further possible enhancements.
Always do set .setsockopt( zmq.LINGER, 0 ) for you never know which versions will connect and what defaults might take place, here, with a chance to let your crashed instances of sockets hang forever (most often until the O/S reboot), which seems a bit wild, unhandled risk-factor for any production-grade software, doesn't it?
SOLUTION TIPS :
Avoid the risk of tripping over unicode-conventions, which differ between the Linux-side originator and the Windows O/S-side.
Since unicode objects have a wide range of representations, they are not stored as the bytes according to their encoding, but rather in a format called UCS (an older fixed-width Unicode format). On some platforms (OS X, Windows), the storage is UCS-2, which is 2 bytes per character. On most *ix systems, it is UCS-4, or 4 bytes per character. The contents of the buffer of a unicode object are not encoding dependent (always UCS-2 or UCS-4), but they are platform dependent.
...
The efficiency problem here comes from the fact that simple ascii strings are 4x as big in memory as they need to be (on most Linux, 2x on other platforms). Also, to translate to/from C code that works with char*, you always have to copy data and encode/decode the bytes. This really is horribly inefficient from a memory standpoint. Essentially, where memory efficiency matters to you, you should never ever use strings; use bytes. The problem is that users will almost always use str, and in 2.x they are efficient, but in 3.x they are not. We want to make sure that we don’t help the user make this mistake, so we ensure that zmq methods don’t try to hide what strings really are.
Read more about latency avoidance in ZeroMQ if you are trying to stream constant-size imagery whose shape is known a priori ( 640 x 480 x <colordepth> ). Conversions are expensive: turning a small-scale, low-res, low-FPS RGB / IR picture into a JPEG file just for transmission is pointless if a local LAN or a dedicated WLAN segment sits between the RPi and the Windows device. A latency-motivated design may test, and perhaps avoid, any kind of compression or serialisation via cPickle.dumps() or dill.dumps() and rather send the data as compactly as possible as a binary block (BLOB). Most often it is enough to use the aNumpyObject.data buffer to send straight from <read-write buffer for 0x7fa3cbe3f8a0, size 307200, offset 0 at 0x7f632bb2cc30>, or to do some binary mangling with the struct.pack() / .unpack() methods if you need to go beyond the numpy .data-access trick. All of this assumes .setsockopt( zmq.CONFLATE, 1 ) was activated on both sides, to avoid any excessive depth of buffering of live-streamed data.
For both performance and latency reasons, you may want to avoid the PUB/SUB pair of Archetypes, as ZeroMQ API v3.+ moved the workload of TOPIC filtering onto the PUB side, which is your weaker node. (The RPi has several cores, and you may boost the .Context( nIOthreads ) instance to have more power for I/O, yet the RPi runs at a fraction of the GHz of your Windows-side host, and tight robotic control loops may already have eaten up most of that.) PUSH/PULL would fit in quite the same way for a 1-to-1 topology, with less processing and lower E2E latency thanks to the handling avoided on the RPi side.
For .poll()-based, differently prioritised event handlers, and a few remarks about the seminal work of Mrs. Margaret HAMILTON and her MIT team, you may like to read this & this.
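As a rough illustration of the "binary block instead of pickle" idea above, the sketch below frames one image as a small fixed header plus the raw pixel bytes, using only struct. The header layout (width, height, channels, frame number) and the names pack_frame / unpack_frame are assumptions made up for this example; they are not part of ZeroMQ or of the original setup. With numpy the payload would simply be the array's own buffer.

```python
import struct

# Hypothetical header: width, height, channels (uint16 each) and a frame
# number (uint32), little-endian, no padding -> 10 bytes.
HEADER = struct.Struct("<HHHI")


def pack_frame(width, height, channels, frame_no, pixels):
    """Return header + raw pixel bytes as one message body."""
    expected = width * height * channels
    if len(pixels) != expected:
        raise ValueError("expected %d pixel bytes, got %d" % (expected, len(pixels)))
    return HEADER.pack(width, height, channels, frame_no) + bytes(pixels)


def unpack_frame(message):
    """Split a message produced by pack_frame() back into its parts."""
    width, height, channels, frame_no = HEADER.unpack_from(message, 0)
    pixels = message[HEADER.size:]
    return width, height, channels, frame_no, pixels


# Dummy 640 x 480 8-bit greyscale frame (307200 bytes, all zero).
frame = bytes(640 * 480)
message = pack_frame(640, 480, 1, 7, frame)
decoded = unpack_frame(message)
```

The message is only 10 bytes longer than the raw frame, and the receiver needs no decoding step beyond one unpack_from() call.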
I use the following code to do some immediate sound processing/analyzing. It works, but it is really slow (compared to the planned speed). I have added some time markers to find out where the problem is, and according to them there shouldn't be any. The typical duration (see below) is <0.01 s for all three computed times, but it still takes around a second to complete the loop. Where is the problem?
Edit: Please note that the time measurement is not the real issue here. To prove that: MyPeaks basically just finds the maximum of a pretty short FFT, nothing expensive. And the problem persists even when these routines are commented out.
Should I use something different than lambda function to make the cycle?
Did I make some mistake when starting and recording the stream?
etc.
import pyaudio
import struct
import mute_alsa
import time
import numpy as np
from Tkinter import *

def snd_process(k=0):
    if k<1000:
        t0=time.clock()
        data = stream.read(CHUNK)
        t1=time.clock()
        fl=CHUNK
        int_data = struct.unpack("%sh" %str(fl),data)
        ft=np.fft.fft(int_data)
        ft=np.fft.fftshift(ft)
        ft=np.abs(ft)
        t2=time.clock()
        pks=MyPeaks(np.log(ft))
        freq_out.configure(text=str(pks))
        t3=time.clock()
        print t1-t0, t2-t1, t3-t2
        master.after(1, lambda: snd_process(k+1))

CHUNK = 8000
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 4000

p = pyaudio.PyAudio()
stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)

#Tkinter stuff
master=Tk()
button_play=Button(master, command=snd_process, bg="yellow", text="Analyze")
button_play.grid(row=0, column=0)
freq_out = Label(master)
freq_out.grid(row=0, column=1)
freq_out.configure(text='base')
mainloop()
You are scheduling 1000 callbacks in the Tk main thread, and each callback is scheduled with a 1 ms delay (after()'s first argument). That means the last iteration starts roughly 1000 ms (1 second) after the first one.
Maybe that is why the loop still takes around a second to complete.
So, try using after_idle(). I don't think you really need to speed up the sound-processing algorithm, because numpy is already quite efficient.
[EDIT]
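Surprise!! At every iteration you read CHUNK = 8000 samples in 16-bit format from the audio channel, at a frame rate of 4000 Hz. stream.read() blocks until that much audio has actually been recorded, so every iteration has to wait for it: on the order of a second or more (nominally 8000 / 4000 = 2 s if CHUNK is counted in frames).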
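The waiting time follows directly from the buffer size and the sample rate. The small sketch below works it out for the values in the question. The helper name blocking_seconds is made up for this example. PyAudio's stream.read() takes a number of frames, so the first figure is the one that applies; the second shows what the same arithmetic gives if the 8000 is read as bytes of 16-bit mono audio.

```python
def blocking_seconds(frames, rate_hz):
    """Minimum time a blocking read needs to collect `frames` frames."""
    return frames / float(rate_hz)


CHUNK = 8000           # value from the question
RATE = 4000            # frames per second, from the question
BYTES_PER_SAMPLE = 2   # paInt16, mono

# CHUNK interpreted as frames (what stream.read(CHUNK) asks for).
as_frames = blocking_seconds(CHUNK, RATE)

# CHUNK interpreted as bytes: 8000 bytes hold 4000 16-bit samples.
as_bytes = blocking_seconds(CHUNK // BYTES_PER_SAMPLE, RATE)

print("read of %d frames blocks for about %.1f s" % (CHUNK, as_frames))
```

Either way the read alone costs at least a second per iteration, so a smaller CHUNK (or a higher RATE) is the only way to make each iteration faster.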
Squeezing I/O and calculations into the main loop like you are doing is the classical solution. But there are alternatives.
Do the audio gathering and calculations in a second thread. Since both I/O and numpy should release the GIL, it might be a good alternative here. There is one caveat: since GUI toolkits like Tkinter are generally not multithread-safe, you should not make Tkinter calls from the second thread. But you could set up a function that is called with after to check the progress of the calculation and update the UI, say, every 100 ms.
Do the audio gathering and calculations in a different multiprocessing.Process. This makes it completely separate from your GUI. You will have to set up a communication channel like e.g. a Queue to send the pks back to the main process. You should use an after function to check if the Queue has data available and to update the display if so.
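A minimal sketch of the first alternative (worker thread plus a queue that the GUI polls) could look like the following. To keep it runnable without audio hardware or a display, the blocking stream.read() is replaced by a hypothetical fake_source() and the analysis by max(); the names worker, drain and fake_source are all assumptions for this example.

```python
import queue
import threading
import time


def worker(read_chunk, analyse, results, stop_event):
    """Background thread: blocking I/O plus calculation, no GUI calls."""
    while not stop_event.is_set():
        data = read_chunk()
        if data is None:        # source exhausted
            break
        results.put(analyse(data))


def drain(results):
    """Called from the GUI thread; returns what is ready without blocking."""
    items = []
    while True:
        try:
            items.append(results.get_nowait())
        except queue.Empty:
            return items


def fake_source(chunks):
    """Stand-in for stream.read(CHUNK): sleeps briefly, then yields a chunk."""
    it = iter(chunks)

    def read_chunk():
        time.sleep(0.01)
        return next(it, None)

    return read_chunk


results = queue.Queue()
stop = threading.Event()
thread = threading.Thread(
    target=worker,
    args=(fake_source([[1, 5, 2], [9, 3], [4]]), max, results, stop),
    daemon=True,
)
thread.start()
thread.join()            # a GUI would not join; it would poll instead
peaks = drain(results)

# In the Tkinter program the polling side would be roughly:
#     def poll():
#         for pks in drain(results):
#             freq_out.configure(text=str(pks))
#         master.after(100, poll)
```

Only drain() touches the queue from the GUI side, so all Tkinter calls stay in the main thread. The multiprocessing variant has the same shape, with multiprocessing.Process and multiprocessing.Queue in place of the thread and queue.Queue.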
Depending on the OS you're running on, you might not be measuring actual 'wall-clock' time. See http://pythoncentral.io/measure-time-in-python-time-time-vs-time-clock/ for some details. Note that since Python 3.3 time.clock() is deprecated (it was removed in Python 3.8), and time.process_time() or time.perf_counter() is recommended instead.
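The difference is easy to see with a call that waits without using the CPU, which is what a blocking audio read does. In this small sketch (the helper name measure is made up for the example), time.sleep() stands in for the blocking read:

```python
import time


def measure(fn):
    """Return (wall-clock seconds, CPU seconds) spent inside fn()."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


# Sleeping blocks the thread but burns almost no CPU time.
wall, cpu = measure(lambda: time.sleep(0.05))
print("wall: %.3f s, cpu: %.3f s" % (wall, cpu))
```

A CPU-time clock reports almost nothing for the wait, which is how three timings of <0.01 s can add up to a loop that takes about a second.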