Log messages via TCP, significantly slower than similar code in Java

Log messages via TCP, significantly slower than similar code in Java - python

In Python 3 I have a QueueHandler attached to a Logger and a QueueListener shipping the LogRecords to a SocketHandler which sends the logs via TCP to a Java app that's listening.
Both programs run on localhost.
import logging
import queue
log_q = queue.Queue(-1)
logger = logging.getLogger('TestLogger')
socket_handler = logging.handlers.SocketHandler('localhost', 1337)
q_handler = logging.handlers.QueueHandler(log_q)
q_listener = logging.handlers.QueueListener(log_q, socket_handler)
logger.addHandler(q_handler)
q_listener.start()
I'm sending LogRecords with fairly large lists attached to them.
logger.info("PROX_MARKER", extra={'vector': [some_list]})
where [some_list] is a list of ~100k double values.
I run the following code to test the throughput:
for i in range(1000):
logger.info("PROX_MARKER", extra={'vector': [some_list]})
Which takes about 30-35 seconds to complete.
If I run a similar test in Java, the Java app is about twice as fast.
In Python the QueueHandler/-Listener + SocketHandler setup manages to send about 3 messages for every 10 put in the queue. By the time the program finishes it'll have sent ~300 msgs with about 700 still in the queue which will than slowly get sent after the main program already finished.
The QueueHandler/-Listener I use are default, the only change to the default SocketHandler is that I use a custom serializing method.
My goal (in case this wasn't obvious) is to try and speed up the python code. Unfortunately I'm still not 100% sure what's causing this slow behavior in the first place. It might have something to do with the socket (which I know very little about, I've tried playing around with various timeout setting and TCP_NODELAY - to no avail).
I've tried to ditch the QueueHandler/-Listener and just use the SocketHandler directly which takes about the same amount of time as before so I'm assuming that threading isn't an issue.
Any tips on what the issue might be or how to make this faster are very much appreciated.

Related

Python multi-processing one worker dynimc number of recievers of all worker data (1:n)

I am planing to setup a small proxy service for a remote sensor, that only accepts one connection. I have a temporary solution and I am now designing a more robust version, and therefore dived deeper into the python multiprocessing module.
I have written a couple of systems in python using a main process, which spawns subprocesses using the multiprocessing module and used multiprocessing.Queue to communicate between them. This works quite well and some of theses programs/scripts are doing their job in a production environment.
The new case is slightly different since it uses 2+n processes:
One data-collector, that reads data from the sensor (at 100Hz) and every once in a while receives short ASCII strings as command
One main-server, that binds to a socket and listens, for new connections and spawns...
n child-servers, that handle clients who want to have the sensor data
while communication from the child servers to the data collector seems pretty straight forward using a multiprocessing.Queue which manages a n:1 connection well enough, I have problems with the other way. I can't use a queue for that as well, because all child-servers need to get all the data the sensor produces, while they are active. At least I haven't found a way to configure a Queue to mimic that behaviour, as get takes the top most out of the Queue by design.
I looked into shared memory already, which massively increases the management overhead, since as far as I understand it while using it, I would basically need to implement a streaming buffer myself.
The only safe way I see right now, is using a redis server and messages queues, but I am a bit hesitant, since that would need more infrastructure than I like.
Is there a pure python internal way?

maybe You can use MQTT for that ?
You did not clearly specify, but sounds like observer pattern -
or do You want the clients to poll each time they need data ?
It depends which delays / data rate / jitter etc. You can accept.
after You provided the information :
The whole setup runs on one machine in one process space. What I would like to have, is a way without going through a third party process
I would suggest to check for observer pattern.
More informations can be found for example:
https://www.youtube.com/watch?v=_BpmfnqjgzQ&t=1882s
and
https://refactoring.guru/design-patterns/observer/python/example
and
https://www.protechtraining.com/blog/post/tutorial-the-observer-pattern-in-python-879
and
https://python-3-patterns-idioms-test.readthedocs.io/en/latest/Observer.html
Your Server should fork for each new connection and register with the observer, and will be therefore informed about every change.

With Logbook and ZeroMQ, why do I need to wait to be able to pass messages?

Code description:
My code is simple, it starts a ZeroMQHandler (socket based messaging) from the Logbook library (using pyzmq). The Logger (log) runs throughout the application. Finally, the handler closes the port. The .push() and .pop_application() methods are there instead of the
with handler.applicationbound(): and indent.
Purpose:
I'm testing this queue based messaging to see if it can be a low-impact asynchronous logging solution. I need to log about 15 000 messages per second. I'd prefer to use Python, but my fallback is writing the logger in C++ and exposing its handles to python.
The issue:
The issue is that if I don't wait a quarter of a second or more after opening the handler(socket), the program executes without any messages getting through (the test program takes less than 0.25 seconds to execute). This I interpret as being a set-up time needed for the ZeroMQ socket or something like that. So I'm reaching out to see if anyone has a similar experience, maybe this is documented anywhere, but I can't seem to figure it out by myself. I want to know why this is needed. Thanks for any input.
my working code looks something like this :
from logbook.queues import ZeroMQHandler
from logbook import Logger
import time
addr='tcp://127.0.0.1:5053'
handler = ZeroMQHandler(addr)
time.sleep(0.25) ################################################# THIS ! ####
log = Logger("myLogbook")
handler.push_application()
log.info("start of program")
foo()
log.info("end of program")
handler.close()
handler.pop_application()
receiver, running in different python kernel (for tests, gives output to stdout):
from logbook.queues import ZeroMQSubscriber
from logbook import Logger, StreamHandler
import sys
import time
addr='tcp://127.0.0.1:5053'
print("ZeroMQSubscriber begin with address {}".format(addr))
subscriber = ZeroMQSubscriber(addr)
handler = StreamHandler(sys.stdout)
log = Logger("A receiver")
handler.push_application()
try:
i=0
while True:
i += 1
record = subscriber.recv(2)
if not record:
pass # timeout
else:
print("got message!")
log.handle(record)
except KeyboardInterrupt:
print("C-C caught, program end after {} iterations".format(i))
handler.pop_application()

ZeroMQ indeed spends some time to create a Context()-instance per-se, it then asks O/S to allocate memory-bound resources, to spawn I/O-threads, which also takes some additional time. Next, each Socket()-instantiation consumes some add-on overhead time.
It is well documented in both the native API-documentation and in educational resources, that asynchronous signalling / messaging framework does spend some time, before any API-request actually gets processed inside the both "local" and "remote" Context()-instances and finally marked as a deliverable readable for the "remote"-end of some of the ZeroMQ Scalable Formal Communication Archetype.
This said, there ought be no surprise, that an even more re-wrapped use of ZeroMQ tools ( re-wrapped by another level of abstraction, coded into the logbook.queue.ZeroMQSubscriber, logbook.queue.ZeroMQHandler classes ), will only add additional { setup & operational }-overheads, so the known asynchronicity of the service will only grow.
If your application requires any sort of mutual reconfirmation between any pair of the both ends that they have reached a Ready-To-Operate state ( RTO-state ), the best would be to introduce some sort of smart-reconciliation policy, instead of remaining in a blind belief, relying on a long enough .sleep() to hope the things were let enough time to settle down and got into RTO.
In distributed-system there is always better be explicit, rather than remaining in an optimistic hope.
Epilogue:
Given your sustained throughput ought safely go under and also remain under the expected threshold of <= 66 [us/message] per message dispatched, let me also raise an interest into a proper Context()-parametrisation, so as to be indeed smoothly carrying your required workload under realistic hardware and system-wide resources planning.
Default values are not hammered into a stone and never ought be a point to rely on.

Can I have Python code to continue executing after I call Flask app.run?

I have just started with Python, although I have been programming in other languages over the past 30 years. I wanted to keep my first application simple, so I started out with a little home automation project hosted on a Raspberry Pi.
I got my code to work fine (controlling a valve, reading a flow sensor and showing some data on a display), but when I wanted to add some web interactivity it came to a sudden halt.
Most articles I have found on the subject suggest to use the Flask framework to compose dynamic web pages. I have tried, and understood, the basics of Flask, but I just can't get around the issue that Flask is blocking once I call the "app.run" function. The rest of my python code waits for Flask to return, which never happens. I.e. no more water flow measurement, valve motor steering or display updating.
So, my basic question would be: What tool should I use in order to serve a simple dynamic web page (with very low load, like 1 request / week), in parallel to my applications main tasks (GPIO/Pulse counting)? All this in the resource constrained environment of a Raspberry Pi (3).
If you still suggest Flask (because it seems very close to target), how should I arrange my code to keep handling the real-world events, such as mentioned above?
(This last part might be tough answering without seeing the actual code, but maybe it's possible answering it in a "generic" way? Or pointing to existing examples that I might have missed while searching.)

You're on the right track with multithreading. If your monitoring code runs in a loop, you could define a function like
def monitoring_loop():
while True:
# do the monitoring
Then, before you call app.run(), start a thread that runs that function:
import threading
from wherever import monitoring_loop
monitoring_thread = threading.Thread(target = monitoring_loop)
monitoring_thread.start()
# app.run() and whatever else you want to do
Don't join the thread - you want it to keep running in parallel to your Flask app. If you joined it, it would block the main execution thread until it finished, which would be never, since it's running a while True loop.
To communicate between the monitoring thread and the rest of the program, you could use a queue to pass messages in a thread-safe way between them.

The way I would probably handle this is to split your program into two distinct separately running programs.
One program handles the GPIO monitoring and communication, and the other program is your small Flask server. Since they run as separate processes, they won't block each other.
You can have the two processes communicate through a small database. The GIPO interface can periodically record flow measurements or other relevant data to a table in the database. It can also monitor another table in the database that might serve as a queue for requests.
Your Flask instance can query that same database to get the current statistics to return to the user, and can submit entries to the requests queue based on user input. (If the GIPO process updates that requests queue with the current status, the Flask process can report that back out.)
And as far as what kind of database to use on a little Raspberry Pi, consider sqlite3 which is a very small, lightweight file-based database well supported as a standard library in Python. (It doesn't require running a full "database server" process.)
Good luck with your project, it sounds like fun!

Hi i was trying the connection with dronekit_sitl and i got the same issue , after 30 seconds the connection was closed.To get rid of that , there are 2 solutions:
You use the decorator before_request:in this one you define a method that will handle the connection before each request
You use the decorator before_first_request : in this case the connection will be made once the first request will be called and the you can handle the object in the other route using a global variable
For more information https://pythonise.com/series/learning-flask/python-before-after-request

Python script execution time increases when executed multiple time parallely

I have a python script whose execution time is 1.2 second while it is being executed standalone.
But when I execute it 5-6 time parallely ( Am using postman to ping the url multiple times) the execution time shoots up.
Adding the breakdown of the time taken.
1 run -> ~1.2seconds
2 run -> ~1.8seconds
3 run -> ~2.3seconds
4 run -> ~2.9seconds
5 run -> ~4.0seconds
6 run -> ~4.5seconds
7 run -> ~5.2seconds
8 run -> ~5.2seconds
9 run -> ~6.4seconds
10 run -> ~7.1seconds
Screenshot of top command(Asked in the comment):
This is a sample code:
import psutil
import os
import time
start_time = time.time()
import cgitb
cgitb.enable()
import numpy as np
import MySQLdb as mysql
import cv2
import sys
import rpy2.robjects as robj
import rpy2.robjects.numpy2ri
rpy2.robjects.numpy2ri.activate()
from rpy2.robjects.packages import importr
R = robj.r
DTW = importr('dtw')
process= psutil.Process(os.getpid())
print " Memory Consumed after libraries load: "
print process.memory_info()[0]/float(2**20)
st_pt=4
# Generate our data (numpy arrays)
template = np.array([range(84),range(84),range(84)]).transpose()
query = np.array([range(2500000),range(2500000),range(2500000)]).transpose()
#time taken
print(" --- %s seconds ---" % (time.time() - start_time))
I also checked my memory consumption using watch -n 1 free -m and memory consumption also increases noticeably.
1) How do I make sure that the execution time of script remain constant everytime.
2) Can I load the libraries permanently so that the time taken by the script to load the libraries and the memory consumed can be minimized?
I made an enviroment and tried using
#!/home/ec2-user/anaconda/envs/test_python/
but it doesn't make any difference whatsoever.
EDIT:
I have AMAZON's EC2 server with 7.5GB RAM.
My php file with which am calling the python script.
<?php
$response = array("error" => FALSE);
if($_SERVER['REQUEST_METHOD']=='GET'){
$response["error"] = FALSE;
$command =escapeshellcmd(shell_exec("sudo /home/ec2-user/anaconda/envs/anubhaw_python/bin/python2.7 /var/www/cgi-bin/dtw_test_code.py"));
session_write_close();
$order=array("\n","\\");
$cleanData=str_replace($order,'',$command);
$response["message"]=$cleanData;
} else
{
header('HTTP/1.0 400 Bad Request');
$response["message"] = "Bad Request.";
}
echo json_encode($response);
?>
Thanks

1) You really can't ensure the execution will take always the same time, but at least you can avoid performance degradation by using a "locking" strategy like the ones described in this answer.
Basically you can test if the lockfile exists, and if so, put your program to sleep a certain amount of time, then try again.
If the program does not find the lockfile, it creates it, and delete the lockfile at the end of its execution.
Please note: in the below code, when the script fails to get the lock for a certain number of retries, it will exit (but this choice is really up to you).
The following code exemplifies the use of a file as a "lock" against parallel executions of the same script.
import time
import os
import sys
lockfilename = '.lock'
retries = 10
fail = True
for i in range(retries):
try:
lock = open(lockfilename, 'r')
lock.close()
time.sleep(1)
except Exception:
print('Got after {} retries'.format(i))
fail = False
lock = open(lockfilename, 'w')
lock.write('Locked!')
lock.close()
break
if fail:
print("Cannot get the lock, exiting.")
sys.exit(2)
# program execution...
time.sleep(5)
# end of program execution
os.remove(lockfilename)
2) This would mean that different python instances share the same memory pool and I think it's not feasible.

1)
More servers equals more availability
Hearsay tells me that one effective way to ensure consistent request times is to use multiple requests to a cluster. As I heard it the idea goes something like this.
The chance of a slow request
(Disclaimer I'm not much of a mathematician or statistician.)
If there is a 1% chance a request is going to take an abnormal amount of time to finish then one-in-a-hundred requests can be expected to be slow. If you as a client/consumer make two requests to a cluster instead of just one, the chance that both of them turn out to be slow would be more like 1/10000, and with three 1/1000000, et cetera. The downside is doubling your incoming requests means needing to provide (and pay for) as much as twice the server power to fulfill your requests with a consistent time, this additional cost scales with how much chance is acceptable for a slow request.
To my knowledge this concept is optimized for consistent fulfillment times.
The client
A client interfacing with a service like this has to be able to spawn multiple requests and handle them gracefully, probably including closing the unfulfilled connections as soon as it can.
The servers
On the backed there should be a load balancer that can associate multiple incoming client requests to multiple unique cluster workers. If a single client makes multiple requests to an overburdened node, its just going to compound its own request time like you see in your simple example.
In addition to having the client opportunistically close connections it would be best to have a system of sharing job fulfilled status/information so that backlogged request on other other slower-to-process nodes have a chance of aborting an already-fulfilled request.
This this a rather informal answer, I do not have direct experience with optimizing a service application in this manner. If someone does I encourage and welcome more detailed edits and expert implementation opinions.
2)
Caching imports
yes that is a thing, and its awesome!
I would personally recommend setting up django+gunicorn+nginx. Nginx can cache static content and keep a request backlog, gunicorn provides application caching and multiple threads&worker management (not to mention awesome administration and statistic tools), django embeds best practices for database migrations, auth, request routing, as well as off-the-shelf plugins for providing semantic rest endpoints and documentation, all sorts of goodness.
If you really insist on building it from scratch yourself you should study uWsgi, a great Wsgi implementation that can be interfaced with gunicorn to provide application caching. Gunicorn isn't the only option either, Nicholas Piël has a Great write up comparing performance of various python web serving apps.

Here's what we have:
EC2 instance type is m3.large box which has only 2 vCPUs https://aws.amazon.com/ec2/instance-types/?nc1=h_ls
We need to run a CPU- and memory-hungry script which takes over a second to execute when CPU is not busy
You're building an API than needs to handle concurrent requests and running apache
From the screenshot I can conclude that:
your CPUs are 100% utilized when 5 processes are run. Most likely they would be 100% utilized even when fewer processes are run. So this is the bottleneck and no surprise that the more processes are run the more time is required — you CPU resources just get shared among concurrently running scripts.
each script copy eats about ~300MB of RAM so you have lots of spare RAM and it's not a bottleneck. The amount of free + buffers memory on your screenshot confirms that.
The missing part is:
are requests directly sent to your apache server or there's a balancer/proxy in front of it?
why do you need PHP in your example? There are plently of solutions available using python ecosystem only without a php wrapper ahead of it
Answers to your questions:
That's infeasible in general case
The most you can do is to track your CPU usage and make sure its idle time doesn't drop below some empirical threshold — in this case your scripts would be run in more or less fixed amount of time.
To guarantee that you need to limit the number of requests being processed concurrently.
But if 100 requests are sent to your API concurrently you won't be able to handle them all in parallel! Only some of them will be handled in parallel while others waiting for their turn. But your server won't be knocked down trying to serve them all.
Yes and no
No because unlikely can you do something in your present architecture when a new script is launched on every request through a php wrapper. BTW it's a very expensive operation to run a new script from scratch each time.
Yes if a different solution is used. Here are the options:
use a python-aware pre-forking webserver which will handle your requests directly. You'll spare CPU resources on python startup + you might utilize some preloading technics to share RAM among workers, i.e http://docs.gunicorn.org/en/stable/settings.html#preload-app. You'd also need to limit the number of parallel workers to be run http://docs.gunicorn.org/en/stable/settings.html#workers to adress your first requirement.
if you need PHP for some reason you might setup some intermediary between PHP script and python workers — i.e. a queue-like server.
Than simply run several instances of your python scripts which would wait for some request to be availble in the queue. Once it's available it would handle it and put the response back to the queue and php script would slurp it and return back to the client. But it's a more complex to build this that the first solution (if you can eliminate your PHP script of course) and more components would be involved.
reject the idea to handle such heavy requests concurrently, and instead assign each request a unique id, put the request into a queue and return this id to the client immediately. The request will be picked up by an offline handler and put back into the queue once it's finished. It will be client's responsibility to poll your API for readiness of this particular request
1st and 2nd combined — handle requests in PHP and request another HTTP server (or any other TCP server) handling your preloaded .py-scripts

The ec2 cloud does not guarantee 7.5gb of free memory on the server. This would mean that the VM performance is severely impacted like you are seeing where the server has less than 7.5gb of physical free ram. Try reducing the amount of memory the server thinks it has.
This form of parallel performance is very expensive. Typically with 300mb requirement, the ideal would be a script which is long running, and re-uses the memory for multiple requests. The Unix fork function allows a shared state to be re-used. The os.fork gives this in python, but may not be compatible with your libraries.

It might be because of the way computers are run.
Each program gets a slice of time on a computer (quote Help Your Kids With Computer Programming, say maybe 1/1000 of a second)
Answer 1: Try using multiple threads instead of parallel processes.
It'll be less time-consuming, but the program's time to execute still won't be completely constant.
Note: Each program has it's own slot of memory, so that is why memory consumption is shooting up.

Building an HTTP API for continuously running python process

TL;DR: I have a beautifully crafted, continuously running piece of Python code controlling and reading out a physics experiment. Now I want to add an HTTP API.
I have written a module which controls the hardware using USB. I can script several types of autonomously operating experiments, but I'd like to control my running experiment over the internet. I like the idea of an HTTP API, and have implemented a proof-of-concept using Flask's development server.
The experiment runs as a single process claiming the USB connection and periodically (every 16 ms) all data is read out. This process can write hardware settings and commands, and reads data and command responses.
I have a few problems choosing the 'correct' way to communicate with this process. It works if the HTTP server only has a single worker. Then, I can use python's multiprocessing.Pipe for communication. Using more-or-less low-level sockets (or things like zeromq) should work, even for request/response, but I have to implement some sort of protocol: send {'cmd': 'set_voltage', 'value': 900} instead of calling hardware.set_voltage(800) (which I can use in the stand-alone scripts). I can use some sort of RPC, but as far as I know they all (SimpleXMLRPCServer, Pyro) use some sort of event loop for the 'server', in this case the process running the experiment, to process requests. But I can't have an event loop waiting for incoming requests; it should be reading out my hardware! I googled around quite a bit, but however I try to rephrase my question, I end up with Celery as the answer, which mostly fires off one job after another, but isn't really about communicating with a long-running process.
I'm confused. I can get this to work, but I fear I'll be reinventing a few wheels. I just want to launch my app in the terminal, open a web browser from anywhere, and monitor and control my experiment.
Update: The following code is a basic example of using the module:
from pysparc.muonlab.muonlab_ii import MuonlabII
muonlab = MuonlabII()
muonlab.select_lifetime_measurement()
muonlab.set_pmt1_voltage(900)
muonlab.set_pmt1_threshold(500)
lifetimes = []
while True:
data = muonlab.read_lifetime_data()
if data:
print "Muon decays detected with lifetimes", data
lifetimes.extend(data)
The module lives at https://github.com/HiSPARC/pysparc/tree/master/pysparc/muonlab.
My current implementation of the HTTP API lives at https://github.com/HiSPARC/pysparc/blob/master/bin/muonlab_with_http_api.
I'm pretty happy with the module (with lots of tests) but the HTTP API runs using Flask's single-threaded development server (which the documentation and the internet tells me is a bad idea) and passes dictionaries through a Pipe as some sort of IPC. I'd love to be able to do something like this in the above script:
while True:
data = muonlab.read_lifetime_data()
if data:
print "Muon decays detected with lifetimes", data
lifetimes.extend(data)
process_remote_requests()
where process_remote_requests is a fairly short function to call the muonlab instance or return data. Then, in my Flask views, I'd have something like:
muonlab = RemoteMuonlab()
#app.route('/pmt1_voltage', methods=['GET', 'PUT'])
def get_data():
if request.method == 'PUT':
voltage = request.form['voltage']
muonlab.set_pmt1_voltage(voltage)
else:
voltage = muonlab.get_pmt1_voltage()
return jsonify(voltage=voltage)
Getting the measurement data from the app is perhaps less of a problem, since I could store that in SQLite or something else that handles concurrent access.

But... you do have an IO loop; it runs every 16ms.
You can use BaseHTTPServer.HTTPServer in such a case; just set the timeout attribute to something small. bascially...
class XmlRPCApi:
def do_something(self):
print "doing something"
server = SimpleXMLRPCServer(("localhost", 8000))
server.register_instance(XMLRpcAPI())
server.timeout = 0
while True:
sleep(0.016)
do_normal_thing()
x.handle_request()
Edit: python has a built in server, also built on BaseHTTPServer, capable of serving a flask app. since flask.Flask() happens to be a wsgi compliant application, your process_remote_requests() should look like this:
import wsgiref.simple_server
remote_server = wsgire.simple_server('localhost', 8000, app)
# app here is just your Flask() application!
# as before, set timeout to zero so that you can go right back
# to your event loop if there are no requests to handle
remote_server.timeout = 0
def process_remote_requests():
remote_server.handle_request()
This works well enough if you have only short running requests; but if you need to handle requests that may possibly take longer than your event loop's normal polling interval, or if you need to handle more requests than you have polls per unit of time, then you can't use this approach, exactly.
You don't necessarily need to fork off another process, though, You can potentially get by using a pool of workers in another thread. roughly:
import threading
import wsgiref.simple_server
remote_server = wsgire.simple_server('localhost', 8000, app)
POOL_SIZE = 10 # or some other value.
pool = [threading.Thread(target=remote_server.serve_forever) for dummy in xrange(POOL_SIZE)]
for thread in pool:
thread.daemon = True
thread.start()
while True:
pass # normal experiment processing here; don't handle requests in this thread.
However; this approach has one major shortcoming, you now have to deal with concurrency! It's not safe to manipulate your program state as freely as you could with the above loop, since you might be, concurrently manipulating that same state in the main thread (or another http server thread). It's up to you to know when this is valid, wrapping each resource with some sort of mutex lock or whatever is appropriate.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.