Improving speed of xmlrpclib - python

I'm working with a device that is essentially a black box, and the only known communication method for it is XML-RPC. It works for most needs, except when I need to execute two commands in quick succession. Due to the overhead of waiting for each RPC response, this is not as quick as desired.
My main question is: how does one reduce this overhead to make this possible? I know the obvious solution is to ditch XML-RPC, but I don't think that's possible for this device, as I have no control over implementing any other protocols on the "server". This also seems to make a MultiCall impossible, as I cannot add valid instructions for MultiCall. Does MultiCall have to be implemented server side? For example, if I have method1(), method2(), and method3() all implemented by the server already, should this block of code execute them all in one request? From my testing so far I'd assume not, as the documentation shows examples where I need to initialize commands on the server side.
import xmlrpclib

server = xmlrpclib.ServerProxy(serverURL)
multicall = xmlrpclib.MultiCall(server)
multicall.method1()
multicall.method2()
multicall.method3()
results = multicall()
Also, looking through the source of xmlrpclib, I see references to a "FastParser" as opposed to the default one that is used. However, I cannot determine how to enable this parser over the default. Additionally, the comment on this answer mentions that the default parser processes one character at a time. I believe this is related, but again, I have no idea how to change this setting.

Unless the bulk size of your requests or responses is very large, it's unlikely that changing the parser will affect the turnaround time (since the CPU is much faster than the network).
You might want to consider, if possible, sending the next command to the device without waiting for the response to the first one. If the device can handle multiple requests at once, this may be of real benefit. Even if the device only handles requests in sequence, you can still have the next request waiting at the device so that there is no delay after it finishes processing the previous one. If the device serialises requests in this way, that's about the best you can do; a rough sketch follows.
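One way to get that overlap (a sketch only; serverURL and the method names are the placeholders from the question) is to issue each call from its own thread, with its own ServerProxy and therefore its own HTTP connection:

import threading
import xmlrpclib

results = {}

def call(name):
    # a separate ServerProxy per thread gives each request its own
    # connection, so the requests can be in flight at the same time
    proxy = xmlrpclib.ServerProxy(serverURL)
    results[name] = getattr(proxy, name)()

threads = [threading.Thread(target=call, args=(name,))
           for name in ("method1", "method2", "method3")]
for t in threads:
    t.start()
for t in threads:
    t.join()

Whether this actually helps depends entirely on whether the device accepts a second connection (or a pipelined request) while it is busy with the first.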

Related

Python multiprocessing: one worker, dynamic number of receivers of all worker data (1:n)

I am planning to set up a small proxy service for a remote sensor that only accepts one connection. I have a temporary solution, and I am now designing a more robust version, which has led me to dive deeper into the Python multiprocessing module.
I have written a couple of systems in Python using a main process that spawns subprocesses via the multiprocessing module and uses multiprocessing.Queue to communicate between them. This works quite well, and some of these programs/scripts are doing their job in a production environment.
The new case is slightly different since it uses 2+n processes:
One data-collector, which reads data from the sensor (at 100 Hz) and every once in a while receives short ASCII strings as commands
One main-server, which binds to a socket, listens for new connections, and spawns...
n child-servers, which handle clients who want the sensor data
While communication from the child-servers to the data-collector seems pretty straightforward using a multiprocessing.Queue, which manages an n:1 connection well enough, I have problems with the other direction. I can't use a queue for that as well, because all child-servers need to get all the data the sensor produces while they are active. At least I haven't found a way to configure a Queue to mimic that behaviour, since get() removes the topmost item from the Queue by design.
I have already looked into shared memory, but it massively increases the management overhead: as far as I understand it, I would basically need to implement a streaming buffer myself.
The only safe way I see right now is using a Redis server and message queues, but I am a bit hesitant, since that would need more infrastructure than I would like.
Is there a pure-Python, standard-library way to do this?
Maybe you can use MQTT for that?
You did not clearly specify, but it sounds like the observer pattern; or do you want the clients to poll each time they need data?
It depends on which delays / data rates / jitter you can accept.
After you provided the information:
The whole setup runs on one machine in one process space. What I would like to have is a way without going through a third-party process
I would suggest looking into the observer pattern.
More information can be found, for example, at:
https://www.youtube.com/watch?v=_BpmfnqjgzQ&t=1882s
https://refactoring.guru/design-patterns/observer/python/example
https://www.protechtraining.com/blog/post/tutorial-the-observer-pattern-in-python-879
https://python-3-patterns-idioms-test.readthedocs.io/en/latest/Observer.html
Your server should fork for each new connection and register with the observer; it will therefore be informed about every change. A rough sketch follows.
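A minimal sketch of the idea, adapted to the 1:n fan-out in the question (class and method names are made up): instead of one shared Queue, the data-collector holds one multiprocessing.Queue per child-server, so every subscriber sees every sample.

import multiprocessing

class SensorPublisher(object):
    """Observer-style fan-out: every subscriber receives every sample."""

    def __init__(self):
        self._queues = []

    def subscribe(self):
        # one queue per child-server turns the 1:1 Queue into a 1:n broadcast
        q = multiprocessing.Queue()
        self._queues.append(q)
        return q

    def publish(self, sample):
        # called by the data-collector for every sensor reading (100 Hz)
        for q in self._queues:
            q.put(sample)

Each child-server calls subscribe() when it is created and then reads from its own queue. Note that in a real multiprocessing setup each queue has to be created before the corresponding child process is spawned and passed to it.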

mpi4py recv data cap?

I am working with a group of people on a program that is communication-intensive. I'm not particularly good at debugging distributed programs, but I have a strong suspicion that I am sending too many messages at once to a process. I have reimplemented the actor model in mpi4py. Each process has a "mailbox" of jobs, and when it finishes with its mailbox it goes into CHECK_FOR_UPDATES mode, where it checks whether there are any new messages it can receive.
I had issues with the program that a group of students and I have been working on. When the load became too big it would start to crash, but we couldn't figure out where the issue was because we're all pretty bad at debugging stuff.
I asked some people at my school if they had any ideas, and one suggested that, as we are reimplementing the actor system, we should consider using Akka. A student this year said that there may still be a problem: one actor may get inundated with messages and crash. I asked about it here. The stream model seems not to be what we want (see my comment for more details), and I have since looked back at the mpi4py program, as I had not accounted for this problem before.
In the plain C or Fortran implementation, there is a count parameter for MPI_Recv. I noticed that comm.recv has no count parameter, and I suspect that when a process goes into CHECK_FOR_UPDATES mode it just consumes a ton of messages from a variety of sources and dies. (Technically I don't know for sure, but we suspect it might be the case.) Is there a way to cap the amount of data comm.recv accepts?
(Note: I want to avoid the comm.Recv variant, as it restricts the user to using numpy arrays.)
Found the answer:
The recv() and irecv() methods may be passed a buffer object that can be repeatedly used to receive messages avoiding internal memory allocation. The buffer must be sufficiently large to accommodate the transmitted messages.
Emphasis mine. Therefore, I have to use Send and Recv.
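A minimal sketch of that buffered variant (the buffer size is made up): preallocating the receive buffer is what caps how much data one receive can accept, at the price of being restricted to buffer-like objects such as NumPy arrays.

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
buf = np.empty(4096, dtype='b')   # hard cap: a message can fill at most 4 KB
status = MPI.Status()

# blocks until a message arrives; a message larger than buf is an error
comm.Recv([buf, MPI.BYTE], source=MPI.ANY_SOURCE, tag=MPI.ANY_TAG, status=status)
n = status.Get_count(MPI.BYTE)    # number of bytes actually received
payload = buf[:n]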

How to grab contents asynchronously using Python(Gevent)?

The scenario: save the response of an API request, using the IMDb id as a parameter.
I want to grab all the movie info from IMDb id tt0000001 to tt9999999.
Now I'm using gevent to run several greenlets (gevent.joinall(threads)), but it's not very fast.
Are there other solutions for this kind of problem, like using Celery+RabbitMQ?
For one, you must make sure that you aren't making any blocking calls in your code, as that will block everything else from running and slow the entire system.
Common causes of blocking are tight loops or IO that has not been patched by eventlet's monkey patch (e.g. C extensions).
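As a rough illustration (the API URL is made up), the usual gevent shape for this kind of crawl is to monkey-patch first and bound the concurrency with a pool:

from gevent import monkey
monkey.patch_all()  # patch the stdlib so urllib2 etc. yield instead of block

import urllib2
from gevent.pool import Pool

def fetch(movie_id):
    # hypothetical endpoint; substitute the real API you are calling
    url = "http://api.example.com/title/%s" % movie_id
    return movie_id, urllib2.urlopen(url).read()

pool = Pool(100)  # at most 100 requests in flight at a time
ids = ("tt%07d" % n for n in xrange(1, 10000000))
for movie_id, body in pool.imap_unordered(fetch, ids):
    with open(movie_id + ".json", "wb") as f:
        f.write(body)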
Celery supports using both eventlet and gevent, and that is probably the recommended concurrency option for what you are doing (web-request IO). Celery may not make your code run faster, but it does enable you to easily distribute the work to many machines.
To optimize, you should always profile your code to find out what the bottleneck is. It could be many things: a slow network, a slow host, slow DNS, or something else entirely.

Writing a Python data analysis server for a Java interface

I want to write data analysis plugins for a Java interface. This interface is potentially run on different computers. The interface will send commands and the Python program can return large data. The interface is distributed by a Java Webstart system. Both access the main data from a MySQL server.
What are the different ways to implement the communication, and what are their advantages? Of course, I've done some research on the internet; while there are many suggestions, I still don't know how they differ and how to decide on one. (I have no prior knowledge of them.)
I've found a suggestion to use sockets, which seems fine. Is it simple to write a server that dedicates a Python analysis process to each connection (temporary data might be kept between requests for that particular client)?
I was thinking to learn how to use sockets and pass YAML strings.
Maybe my main question is: What is the relation to and advantage of systems like RabbitMQ, ZeroMQ, CORBA, SOAP, XMLRPC?
There were also suggestions to use pipes or shared memory, but those wouldn't fit my requirements, would they?
Do any of the methods have advantages for debugging, or other peculiarities?
I hope someone can help me understand the technology and help me decide on a solution, as it is hard to judge from technical descriptions.
(I do not consider solutions like Jython, JEPP, ...)
Offering an opinion on the requirements you described: it sounds like you are dealing with potentially large data/queries that may take a long time to fetch and serialize, in which case you definitely want something that can handle concurrent connections without stacking up threads. Therefore, in the Python domain, I can't recommend any networking library other than Twisted.
http://twistedmatrix.com/documents/current/core/examples/
Whether you decide to use vanilla HTTP or your own protocol, Twisted is pretty much the one-stop shop for concurrent networking. Sure, the name gets thrown around a lot, and the documentation is Atlantean, but if you take the time to learn it there is very little in the networking domain you cannot accomplish. You can extend the base protocols and factories to make one server that handles your data in a reactor-based event loop and responds to deferred requests when ready.
The serialization format really depends on the nature of the data. Will there be any binary in what is output as a response? Complex types? If so, that rules out JSON, though it is becoming the most common serialization format. YAML sometimes seems to enjoy a position of privilege in the Python community; I haven't used it extensively, as most of the serialization work I've done was for data to be rendered in a frontend with JavaScript.
Message queues are really the most important tool in the toolbox when you need to defer background tasks without hanging the response. They are commonly employed in web apps where the HTTP request should not hang until whatever complex processing needs to take place completes, so the UI can render early and count on an implicit "promise" that the processing will take place. They have two important traits: they rely on eventual consistency, in that the process can finish long after the response in the protocol is sent, and they have fail-safe and try-again directives should a task fail. They are where you turn in the "do this really hard task as soon as you can, and I trust you to get it done" problem domain.
If we are not talking about potentially HUGE response bodies and the serialized output contains relatively simple data types, there is nothing wrong with rolling a simple deferred HTTP server in Twisted, along these lines:
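A minimal sketch (run_analysis is a stand-in for whatever the plugin really does): each request hands its work to a Deferred, so a slow query does not block other connections.

from twisted.internet import reactor, threads
from twisted.web import resource, server

def run_analysis(args):
    # placeholder for the real work, e.g. querying MySQL and crunching data
    return "results for %r" % (args,)

class AnalysisResource(resource.Resource):
    isLeaf = True

    def render_GET(self, request):
        # push the blocking work to a thread pool and finish the
        # response when the Deferred fires
        d = threads.deferToThread(run_analysis, request.args)
        d.addCallback(self._finish, request)
        return server.NOT_DONE_YET

    def _finish(self, result, request):
        request.write(result)
        request.finish()

reactor.listenTCP(8080, server.Site(AnalysisResource()))
reactor.run()

The Java side then only needs an HTTP client and whatever serialization format you settle on.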

Python twisted asynchronous write using deferred

With regard to the Python Twisted framework, can someone explain to me how to asynchronously write a very large data string to a consumer, say the protocol.transport object?
I think what I am missing is a write(data_chunk) function that returns a Deferred. This is what I would like to do:
data_block = get_lots_and_lots_data()
CHUNK_SIZE = 1024  # write 1 KB at a time

def write_chunk(data, i):
    # deferredWrite is hypothetical: a write that returns a Deferred
    # which fires once the chunk has been written
    d = transport.deferredWrite(data[i:i + CHUNK_SIZE])
    d.addCallback(lambda _: write_chunk(data, i + CHUNK_SIZE))

write_chunk(data_block, 0)
But after a day of wandering around in the Twisted API/documentation, I can't seem to locate anything like a deferredWrite equivalent. What am I missing?
As Jean-Paul says, you should use IProducer and IConsumer, but you should also note that the lack of deferredWrite is a somewhat intentional omission.
For one thing, creating a Deferred for potentially every byte of data that gets written is a performance problem: we tried it in the web2 project and found that it was the most significant performance issue with the whole system, and we are trying to avoid that mistake as we backport web2 code to twisted.web.
More importantly, however, having a Deferred which gets returned when the write "completes" would provide a misleading impression: that the other end of the wire has received the data that you've sent. There's no reasonable way to discern this. Proxies, smart routers, application bugs and all manner of network contrivances can conspire to fool you into thinking that your data has actually arrived on the other end of the connection, even if it never gets processed. If you need to know that the other end has processed your data, make sure that your application protocol has an acknowledgement message that is only transmitted after the data has been received and processed.
The main reason to use producers and consumers in this kind of code is to avoid allocating memory in the first place. If your code really does read all of the data that it's going to write to its peer into a giant string in memory first (data_block = get_lots_and_lots_data() pretty directly implies that) then you won't lose much by doing transport.write(data_block). The transport will wake up and send a chunk of data as often as it can. Plus, you can simply do transport.write(hugeString) and then transport.loseConnection(), and the transport won't actually disconnect until either all of the data has been sent or the connection is otherwise interrupted. (Again: if you don't wait for an acknowledgement, you won't know if the data got there. But if you just want to dump some bytes into the socket and forget about it, this works okay.)
If get_lots_and_lots_data() is actually reading a file, you can use the included FileSender class. If it's something which is sort of like a file but not exactly, the implementation of FileSender might be a useful example.
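For reference, FileSender usage is only a few lines (the file name is made up):

from twisted.protocols.basic import FileSender

sender = FileSender()
d = sender.beginFileTransfer(open("big.dat", "rb"), transport)
# the Deferred fires once the whole file has been handed to the transport
d.addCallback(lambda _: transport.loseConnection())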
The way large amounts of data are generally handled in Twisted is using the Producer/Consumer APIs. This doesn't give you a write method that returns a Deferred, but it does give you notification about when it's time to write more data; a minimal sketch follows.
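A pull-producer sketch of that pattern (the class name is illustrative): the transport calls resumeProducing whenever its buffer has drained and it wants the next chunk.

from zope.interface import implementer
from twisted.internet import interfaces

@implementer(interfaces.IPullProducer)
class StringProducer(object):
    """Feed a large string to a transport one chunk at a time."""

    def __init__(self, transport, data, chunk_size=16384):
        self.transport = transport
        self.data = data
        self.offset = 0
        self.chunk_size = chunk_size
        # streaming=False registers this object as a pull producer
        transport.registerProducer(self, False)

    def resumeProducing(self):
        # the transport calls this each time it is ready for more data
        chunk = self.data[self.offset:self.offset + self.chunk_size]
        self.offset += len(chunk)
        self.transport.write(chunk)
        if self.offset >= len(self.data):
            self.transport.unregisterProducer()

    def stopProducing(self):
        pass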
