mpi4py recv data cap?


I am working with a group of people on a communication-intensive program. I'm not particularly good at debugging distributed programs, but I strongly suspect that I am sending too many messages to one process at once. I have reimplemented the actor model in mpi4py. Each process has a "mailbox" of jobs, and when it finishes its mailbox it goes into CHECK_FOR_UPDATES mode, where it checks whether there are any new messages it can receive.
The program that a group of students and I have been working on had issues: when the load became too big it would start to crash, but we couldn't figure out where the problem was because we're all pretty bad at debugging.
I asked some people at my school whether they had any ideas, and they suggested that, since we are reimplementing the actor system, we should consider using Akka. A student this year said there could still be a problem: one actor may get inundated with messages and crash. I asked about it here. The stream model does not seem to be what we want (see my comment for more details), and I have since gone back to the mpi4py program, as I had not accounted for this problem before.
In the plain C or Fortran implementation, MPI_Recv has a count parameter. I noticed that comm.recv has no count parameter, and I suspect that when a process goes into CHECK_FOR_UPDATES mode it consumes a ton of messages from a variety of sources and dies. (I don't know this for sure, but we suspect it might be the case.) Is there a way to cap the amount of data comm.recv accepts?
(Note: I want to avoid the comm.Recv variant, as it restricts the user to buffer-like objects such as NumPy arrays.)

Found the answer:
The recv() and irecv() methods may be passed a buffer object that can be repeatedly used to receive messages avoiding internal memory allocation. The buffer must be sufficiently large to accommodate the transmitted messages.
Emphasis mine. Therefore, I have to use Send and Recv.
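Independently of the buffer size, the flood described in the question can also be bounded on the receiving side by capping how much a process drains in one CHECK_FOR_UPDATES pass. The following is a minimal sketch, assuming the inbox is reached through two callables: `probe()` returns the byte size of the next pending message (or None), and `receive()` takes that message off the inbox. The function name `check_for_updates` and its limits are hypothetical. With mpi4py, `probe` would plausibly wrap `comm.Iprobe(source=MPI.ANY_SOURCE, tag=MPI.ANY_TAG, status=status)` followed by `status.Get_count(MPI.BYTE)`, and `receive` would call `comm.recv(source=status.Get_source(), tag=status.Get_tag())`.

```python
from typing import Any, Callable, List, Optional


def check_for_updates(probe: Callable[[], Optional[int]],
                      receive: Callable[[], Any],
                      max_messages: int = 64,
                      max_bytes: int = 1 << 20) -> List[Any]:
    """Take at most max_messages (and roughly max_bytes) off the inbox, then return.

    probe() returns the size in bytes of the next pending message, or None if
    nothing is pending; receive() removes that message and returns it.
    (Hypothetical interface standing in for comm.Iprobe / comm.recv.)
    """
    received: List[Any] = []
    used = 0
    while len(received) < max_messages:
        size = probe()
        if size is None:
            break  # inbox is empty
        # Leave the message pending if it would exceed the byte budget, but always
        # accept at least one message per pass so a big message cannot starve.
        if received and used + size > max_bytes:
            break
        received.append(receive())
        used += size
    return received
```

Messages that are not taken stay queued until the next pass, so the actor gets back to its job mailbox in between. This limits how much one pass consumes; it does not stop senders from producing faster than the receiver can keep up, which would need flow control such as acknowledgements on the sending side.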

Related

Middleware to optimize postgres

In my company, we have an ingestion service written in Go whose job is to take messages from an HTTP endpoint and store them in Postgres. It receives a peak throughput of 50,000 messages/second. However, our database can handle a maximum of 30,000 messages/second.
Is it possible to write a middleware in Python to optimize this? If so, please explain.

It seems to be pretty unrelated to Python or any particular programming language.
These are the typical questions to ask, and the answers to give:
Are there duplicates? If so, don't save every message immediately; instead, wait for duplicates to arrive. This requires some kind of in-memory cache; the simplest one is a hash table (thread-safe, if several threads use it).
Batch your messages into large enough groups and then write them into PostgreSQL all at once. You have to determine what "large enough" means based on load tests.
Can you drop some of those messages? If your data is not of critical importance, or at least not all of it, you can detect overload by tracking the number of pending messages and start throwing incoming messages away until the load becomes acceptable.
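The three ideas above (deduplicate while buffered, write in batches, shed load when the backlog grows) can be combined in one small buffer. The following is a minimal sketch, assuming each message carries a hashable deduplication key and that a separate writer loop repeatedly calls `take_batch()` and writes each batch to PostgreSQL in one statement (for example a multi-row INSERT or COPY). The class name `IngestBuffer` and the `batch_size` / `max_pending` values are hypothetical and should come from load tests.

```python
import threading
from itertools import islice
from typing import Dict, Hashable, List


class IngestBuffer:
    """Deduplicating, bounded buffer between the HTTP handlers and the DB writer."""

    def __init__(self, batch_size: int = 1000, max_pending: int = 50_000) -> None:
        self.batch_size = batch_size
        self.max_pending = max_pending
        self._pending: Dict[Hashable, dict] = {}  # insertion-ordered: oldest first
        self._lock = threading.Lock()  # handlers and the writer may run on different threads
        self.duplicates = 0
        self.dropped = 0

    def add(self, key: Hashable, message: dict) -> bool:
        """Buffer a message; return False if it was a duplicate or was shed."""
        with self._lock:
            if key in self._pending:
                self.duplicates += 1  # same key is still waiting to be written
                return False
            if len(self._pending) >= self.max_pending:
                self.dropped += 1  # overloaded: throw the message away
                return False
            self._pending[key] = message
            return True

    def take_batch(self) -> List[dict]:
        """Remove and return up to batch_size of the oldest buffered messages."""
        with self._lock:
            keys = list(islice(self._pending, self.batch_size))
            return [self._pending.pop(k) for k in keys]
```

Duplicates are only caught while the first copy is still buffered; once a batch has been written, a later duplicate would need a unique constraint or an `ON CONFLICT DO NOTHING` clause on the PostgreSQL side.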

set_sequential_download() and set_piece_deadline() in libtorrent

I'm working on a project to build a streaming client on top of libtorrent.
I'm using the Python client (the Python binding).
I searched a lot about the functions set_sequential_download() and set_piece_deadline(), and I couldn't find a good answer on how to force pieces to download in order, meaning first piece 1, then 2, 3, 4, etc.
I saw people asking this in forums, but none of them got a good answer on what changes are needed to make it work.
I understand that set_sequential_download() just requests the pieces in order, but in fact they are downloaded in random order. I tried to change the deadline of the pieces using set_piece_deadline(), incrementing it for each piece, but it doesn't work for me at all.
UPDATE:
The goal I'm trying to accomplish is downloading one piece at a time, so that I can stream through torrents.
I hope some of you can help me.
Thanks, Ben.

set_sequential_download() will request pieces in order. However:
Not all peers have all pieces. If the next piece you want to download is 3 and one of your peers doesn't have 3 but the next it has is 5, libtorrent will start requesting blocks from piece 5 from that peer.
Peers provide varying upload rates, which means that some peers will satisfy your request sooner than others.
This makes it possible for the pieces to complete out-of-order.
set_piece_deadline() is a more flexible way to specify piece priority. It supports arbitrary range requests (as described by Jacob Zelek). Its main feature, though, is that it uses a different approach to requesting blocks. Instead of considering a peer at a time, and asking "what should I request from this peer", it considers a piece at a time, asking "which peer should I request this block from".
This makes libtorrent deliberately try to complete pieces in the order of their deadlines. It is still an estimate based on historical download rates from peers, and if the bottleneck for download rates is your own download capacity, it may be very difficult to predict future download rates for peers. A few important things to keep in mind when using the `set_piece_deadline()` API are:
It's not important that the deadline is in the future. If the deadline cannot be met given the current download or upload capacity, the pieces will be prioritized in the order they were asked to be completed.
If a deadline is far out in the future, libtorrent may wait to prioritize it until it believes it needs to request it to make the deadline. If you're streaming a large file, and you know the bit-rate, you can set up deadlines for every piece, and if your capacity is higher than the bitrate, you'll still request some pieces in rarest-first order. This improves swarm quality.
When streaming data, it's absolutely critical to read-ahead. If you don't set the deadline until you want the piece, you'll always fall behind. There's typically a pretty long round-trip between requesting a piece and completing it. If you don't keep the request pipe full of deadline-pieces, libtorrent will start requesting other pieces again, and you'll get non-prioritized pieces interleaved with your high-priority pieces. You should probably keep a few seconds and at least a few pieces as read-ahead. For video, I would imagine tens of megabytes is appropriate (but experimentation and measurement is the best way to tweak it).
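The read-ahead advice above can be made concrete by spacing deadlines according to the stream's bitrate. The following is a minimal sketch, assuming a known, constant bitrate and piece size; the function name `readahead_deadlines` and the window size are hypothetical. The resulting pairs would be passed to `handle.set_piece_deadline(piece, deadline_ms)` from the Python binding, where `handle` is your torrent handle and the deadline is in milliseconds.

```python
from typing import Dict


def readahead_deadlines(current_piece: int, num_pieces: int, piece_size: int,
                        bitrate_bps: float, readahead_pieces: int) -> Dict[int, int]:
    """Map each piece in the read-ahead window to a deadline in milliseconds.

    The piece being played now gets deadline 0; each later piece is due one
    piece's worth of playback time after the previous one.
    """
    ms_per_piece = piece_size * 8 / bitrate_bps * 1000.0  # playback time of one piece
    end = min(num_pieces, current_piece + readahead_pieces)
    return {piece: int((piece - current_piece) * ms_per_piece)
            for piece in range(current_piece, end)}
```

Re-running this as playback advances keeps the request pipe full of deadline pieces, which is what prevents non-prioritized pieces from being interleaved with the ones the player needs next.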
If you are in fact looking to stream video to a player or web browser over HTTP, you may want to take a look at (or use and submit pull requests to):
https://github.com/arvidn/libtorrent-webui/blob/master/src/file_downloader.cpp
That's a file-downloader provider that fits into the simple HTTP framework in that repository.
UPDATE:
If all you want is to guarantee that piece 1 completes before piece 2 (at any cost, specifically very poor performance), you can set the priority of all pieces to 0, except for the one piece you want to download. Once it completes, you'll be notified by an alert and you can set the priority of the next piece you want to 1. And so on.
This will be incredibly slow, since you'll pause the download constantly, and be in constant end-game mode (where you may download the same block from multiple peers, if one is slow). For instance, if you have more peers than there are blocks in one piece, you will leave download bandwidth unused, by not being able to request from all peers.
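The strict one-piece-at-a-time scheme from the update can be sketched as a helper that, given which pieces you already have, enables only the next missing one. This is a sketch under assumptions: the helper name is hypothetical, and it assumes you feed it the per-piece "have" flags (for example from `handle.status().pieces`) and apply the result with `handle.prioritize_pieces(priorities)` each time a `piece_finished_alert` arrives.

```python
from typing import List, Optional, Sequence, Tuple


def strict_order_priorities(have: Sequence[bool]) -> Tuple[Optional[int], List[int]]:
    """Enable only the first missing piece (priority 1); every other piece gets 0.

    Returns (next piece index or None when complete, priority list).
    """
    priorities = [0] * len(have)
    next_piece = next((i for i, done in enumerate(have) if not done), None)
    if next_piece is not None:
        priorities[next_piece] = 1
    return next_piece, priorities
```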

I've run into the same problem as you. Setting a torrent to sequential download means the pieces will be downloaded in a roughly ordered fashion. This may seem like the intuitive solution for streaming. However, streaming video is more complicated than just downloading all the pieces in order.
Video files come in different containers (e.g. mkv, mp4, avi) and with different codecs (h264, theora, etc.). Some codecs/containers store metadata/headers in different locations in a file. I can't remember off the top of my head, but a certain container/codec stores all header information at the end of the file. Such a file may not stream well if downloaded sequentially.
Unless you write the code that determines which pieces are needed to start streaming, you will have to rely on existing mechanisms. Take for example Peerflix, which spawns a browser video player, VLC, or MPlayer. These applications have a good idea of what byte ranges they need for various containers/codecs. When Peerflix launches VLC to play, let's say, an AVI file, VLC will attempt to read the first several bytes and the last several bytes (the headers).
The genius behind Peerflix is that it serves the video file through its own web server and therefore knows which byte ranges of the file VLC is seeking. It then determines which pieces those byte ranges fall into and prioritizes those pieces. Peerflix uses a Node.js BitTorrent library whose exact piece-prioritization mechanisms are unknown to me. However, in libtorrent-rasterbar, the set_piece_deadline() function lets you tell the library which pieces you need. In my experience, once I had determined the pieces needed, I would call set_piece_deadline() with a short deadline (50ms or so) and wait for them to arrive. Please note that using set_piece_deadline() is incompatible with sequential downloads (just set sequential download to false).
One thing to note: libtorrent-rasterbar will not write the piece to the hard drive as soon as it gets it. This is a trap I fell into, because I tried to read that byte range from the file as soon as the piece arrived. Instead, you will need to run a thread that catches the alerts libtorrent-rasterbar passes to your application. More specifically, you will receive the raw binary data for that piece in a read_piece_alert.
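The Peerflix approach described above boils down to translating the byte range a player asks for into piece indices, then calling set_piece_deadline() on those pieces. The following is a minimal sketch of that translation, assuming you already know where the file starts inside the torrent's concatenated data (`file_offset`) and the torrent's piece length (for example from `torrent_info.piece_length()`); the function name is hypothetical. Some versions of the Python binding also offer `torrent_info.map_file()` for the same mapping, if yours exposes it.

```python
def pieces_for_byte_range(file_offset: int, start: int, end: int,
                          piece_length: int) -> range:
    """Return the piece indices covering bytes [start, end] (inclusive) of one file.

    file_offset is where the file begins inside the torrent's concatenated data;
    start and end are offsets within that file, as in an HTTP Range header.
    """
    if start > end:
        raise ValueError("start must not be after end")
    first = (file_offset + start) // piece_length
    last = (file_offset + end) // piece_length
    return range(first, last + 1)
```

Each index in the returned range would then get a short deadline, e.g. `handle.set_piece_deadline(piece, 50)` to match the 50 ms mentioned above.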

Improving speed of xmlrpclib

I'm working with a device that is essentially a black box, and the only known communication method for it is XML-RPC. It works for most needs, except for when I need to execute two commands very quickly after each other. Due to the overhead and waiting for the RPC response, this is not as quick as desired.
My main question is: how does one reduce this overhead to make this functionality possible? I know the obvious solution is to ditch XML-RPC, but I don't think that's possible for this device, as I have no control over which protocols the "server" implements. This also seems to rule out MultiCall, since I cannot add valid instructions for MultiCall. Does MultiCall have to be implemented server side? For example, if method1(), method2(), and method3() are all already implemented by the server, should the following block of code execute them all in one reply? From my testing so far I'd assume not, as the documentation shows examples where the commands have to be initialized on the server side.
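import xmlrpclib

server = xmlrpclib.ServerProxy(serverURL)
multicall = xmlrpclib.MultiCall(server)
multicall.method1()
multicall.method2()
multicall.method3()
results = multicall()  # sends one system.multicall request; results come back in call order
for result in results:
    print(result)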
Also, looking through the source of xmlrpclib, I see references to a "FastParser" as opposed to the default one. However, I cannot determine how to enable this parser over the default. Additionally, the comment on this answer mentions that the default parser parses one character at a time. I believe this is related, but again, I have no idea how to change this setting.

Unless your requests or responses are very large, it's unlikely that changing the parser will affect the turnaround time (since the CPU is much faster than the network).
You might want to consider, if possible, sending more than one command to the device without waiting for the response to the first one. If the device can handle multiple requests at once, this may be of benefit. Even if the device only handles requests in sequence, you can still have the next request waiting at the device so that there is no delay after processing the previous one. If the device serialises requests in this way, that's going to be about the best you can do.
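The suggestion to send the next request without waiting for the previous response can be approximated on the client side by issuing several calls concurrently, each over its own connection. The following is a minimal sketch using Python 3's `xmlrpc.client` (the successor to `xmlrpclib`) and a thread pool; the helper name `call_concurrently` is hypothetical, and it only helps if the device accepts or queues more than one connection. Because the calls are in flight at the same time, the device may execute them in any order, so this suits only commands that don't depend on each other. (On the MultiCall question: `MultiCall` sends a single `system.multicall` request, so it only works if the server implements that method.)

```python
import xmlrpc.client
from concurrent.futures import ThreadPoolExecutor
from typing import Any, List, Sequence, Tuple


def call_concurrently(url: str, calls: Sequence[Tuple[str, tuple]],
                      max_workers: int = 4) -> List[Any]:
    """Issue several XML-RPC calls without waiting for each response in turn.

    Each call gets its own ServerProxy, because one proxy (and its HTTP
    connection) should not be shared between threads. Results are returned
    in the same order as `calls`.
    """
    def invoke(call: Tuple[str, tuple]) -> Any:
        method, args = call
        with xmlrpc.client.ServerProxy(url) as proxy:
            return getattr(proxy, method)(*args)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(invoke, calls))
```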

Writing a Python data analysis server for a Java interface

I want to write data analysis plugins for a Java interface. This interface is potentially run on different computers. The interface will send commands and the Python program can return large data. The interface is distributed by a Java Webstart system. Both access the main data from a MySQL server.
What are the different ways to implement the communication, and what are their advantages? Of course, I've done some research on the internet. While there are many suggestions, I still don't know what the differences are or how to decide on one. (I have no knowledge about them.)
I've found a suggestion to use sockets, which seems fine. Is it simple to write a server that dedicates a Python analysis process for each connection (temporary data might be kept after one communication request for that particular client)?
I was thinking of learning how to use sockets and passing YAML strings.
Maybe my main question is: What is the relation to and advantage of systems like RabbitMQ, ZeroMQ, CORBA, SOAP, XMLRPC?
There were also suggestions to use pipes or shared memory. But those wouldn't fit my requirements, would they?
Do any of the methods have advantages for debugging, or other peculiarities?
I hope someone can help me understand the technology and help me decide on a solution, as it is hard to judge from technical descriptions.
(I do not consider solutions like Jython, JEPP, ...)

Judging by what you described, it sounds like you are dealing with potentially large data/queries that may take a lot of time to fetch and serialize, in which case you definitely want something that can handle concurrent connections without stacking up threads. For that, in the Python domain, I can't recommend any networking library other than Twisted.
http://twistedmatrix.com/documents/current/core/examples/
Whether you decide to use vanilla HTTP or your own protocol, Twisted is pretty much the one-stop shop for concurrent networking. Sure, the name gets thrown around a lot, and the documentation is Atlantean, but if you take the time to learn it there is very little in the networking domain you cannot accomplish. You can extend the base protocols and factories to make one server that handles your data in a reactor-based event loop and responds to deferred requests when ready.
The serialization format really depends on the nature of the data. Will there be any binary data in the response? Complex types? If so, that rules out JSON, even though JSON is becoming the most common serialization format. YAML sometimes seems to enjoy a position of privilege in the Python community; I haven't used it extensively, as most of the serialization work I've done was for data rendered in a JavaScript frontend.
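If you do go with plain sockets and serialized strings, as considered in the question, each message needs framing, because TCP delivers a byte stream rather than separate messages. The following is a minimal sketch of length-prefixed framing, using JSON from the standard library in place of YAML (which would need a third-party package); the function names are hypothetical. As noted above, binary payloads would need a different encoding than JSON.

```python
import json
import socket
import struct
from typing import Any

_HEADER = struct.Struct("!I")  # 4-byte big-endian payload length


def send_message(sock: socket.socket, obj: Any) -> None:
    """Serialize obj as JSON and send it prefixed with its length."""
    payload = json.dumps(obj).encode("utf-8")
    sock.sendall(_HEADER.pack(len(payload)) + payload)


def _recv_exactly(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, since recv() may return fewer than requested."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        buf.extend(chunk)
    return bytes(buf)


def recv_message(sock: socket.socket) -> Any:
    """Read one length-prefixed JSON message."""
    (length,) = _HEADER.unpack(_recv_exactly(sock, _HEADER.size))
    return json.loads(_recv_exactly(sock, length).decode("utf-8"))
```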
Message queues are really the most important tool in the toolbox when you need to defer background tasks without hanging response. They are commonly employed in web apps where the HTTP request should not hang until whatever complex processing needs to take place completes, so the UI can render early and count on an implicit "promise" the processing will take place. They have two important traits: they rely on eventual consistency, in that the process can finish long after the response in the protocol is sent, and they also have fail-safe and try-again directives should a task fail. They are where you turn in the "do this really hard task as soon as you can and I trust you to get it done" problem domain.
If we are not talking about potentially HUGE response bodies, and relatively simple data types within the serialized output, there is nothing wrong with rolling a simple HTTP deferred server in Twisted.

Python twisted asynchronous write using deferred

With regard to the Python Twisted framework, can someone explain to me how to write asynchronously a very large data string to a consumer, say the protocol.transport object?
I think what I am missing is a write(data_chunk) function that returns a Deferred. This is what I would like to do:
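data_block = get_lots_and_lots_data()
CHUNK_SIZE = 1024  # write 1 KB at a time

def write_chunk(data, i):
    if i >= len(data):
        return  # everything has been written
    # deferredWrite is the wished-for API; Twisted has no such method
    d = transport.deferredWrite(data[i:i + CHUNK_SIZE])
    d.addCallback(lambda _: write_chunk(data, i + CHUNK_SIZE))

write_chunk(data_block, 0)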
But after a day of wandering around the Twisted API documentation, I can't find anything like a deferredWrite equivalent. What am I missing?

As Jean-Paul says, you should use IProducer and IConsumer, but you should also note that the lack of deferredWrite is a somewhat intentional omission.
For one thing, creating a Deferred for potentially every byte of data that gets written is a performance problem: we tried it in the web2 project and found that it was the most significant performance issue with the whole system, and we are trying to avoid that mistake as we backport web2 code to twisted.web.
More importantly, however, having a Deferred which gets returned when the write "completes" would provide a misleading impression: that the other end of the wire has received the data that you've sent. There's no reasonable way to discern this. Proxies, smart routers, application bugs and all manner of network contrivances can conspire to fool you into thinking that your data has actually arrived on the other end of the connection, even if it never gets processed. If you need to know that the other end has processed your data, make sure that your application protocol has an acknowledgement message that is only transmitted after the data has been received and processed.
The main reason to use producers and consumers in this kind of code is to avoid allocating memory in the first place. If your code really does read all of the data that it's going to write to its peer into a giant string in memory first (data_block = get_lots_and_lots_data() pretty directly implies that) then you won't lose much by doing transport.write(data_block). The transport will wake up and send a chunk of data as often as it can. Plus, you can simply do transport.write(hugeString) and then transport.loseConnection(), and the transport won't actually disconnect until either all of the data has been sent or the connection is otherwise interrupted. (Again: if you don't wait for an acknowledgement, you won't know if the data got there. But if you just want to dump some bytes into the socket and forget about it, this works okay.)
If get_lots_and_lots_data() is actually reading a file, you can use the included FileSender class. If it's something which is sort of like a file but not exactly, the implementation of FileSender might be a useful example.

Large amounts of data are generally handled in Twisted using the Producer/Consumer APIs. This doesn't give you a write method that returns a Deferred, but it does notify you when it's time to write more data.
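The following is a minimal sketch of the Producer/Consumer approach for data already held in memory: a pull producer that hands the transport one chunk each time the transport asks for more. The class name `ChunkProducer` is hypothetical. Real code would declare it with `@implementer(IPullProducer)` (from `zope.interface` and `twisted.internet.interfaces`); that decorator is left out here so the sketch runs without Twisted installed. Passing `False` to `registerProducer` marks it as a non-streaming (pull) producer, so the transport calls `resumeProducing()` whenever it can take more data.

```python
class ChunkProducer:
    """Pull producer feeding a large bytes object to a consumer in chunks.

    In Twisted, decorate with @implementer(IPullProducer); omitted here so the
    sketch has no Twisted dependency.
    """

    def __init__(self, consumer, data: bytes, chunk_size: int = 64 * 1024) -> None:
        self.consumer = consumer  # e.g. protocol.transport
        self.data = data
        self.chunk_size = chunk_size
        self.offset = 0

    def start(self) -> None:
        # streaming=False: the consumer pulls by calling resumeProducing()
        self.consumer.registerProducer(self, False)

    def resumeProducing(self) -> None:
        chunk = self.data[self.offset:self.offset + self.chunk_size]
        self.offset += len(chunk)
        if chunk:
            self.consumer.write(chunk)
        if self.offset >= len(self.data):
            self.consumer.unregisterProducer()  # done: stop being asked for data

    def stopProducing(self) -> None:
        # The connection went away; release the remaining data.
        self.data = b""
```

From a protocol, usage would look like `ChunkProducer(self.transport, data_block).start()`. For files, the `FileSender` class mentioned above already implements this pattern.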
