I'm trying to figure out the best way to publish and receive data between separate programs. My ideal setup is to have one program constantly receive market data from an external websocket api and to have multiple other programs use this data. Since this is market data from an exchange, the lower the overhead the better.
My first thought was to write out a file and have the others read it, but that seems like it would cause file locking issues. Another approach I tried was to use UDP sockets, but it seems like the socket blocks the rest of the program when receiving. I'm pretty new at writing full-fledged programs instead of little scripts, so sorry if this is a dumb question. Any suggestions would be appreciated. Thanks!
You can use SQS. It is easy to use, and the Python documentation for it is great. If you want a free alternative, you can use Kafka.
Try something like a message queue, e.g. https://github.com/kr/beanstalkd. You essentially control it via the clients: one that collects and sends, one that consumes and marks what it has read, and so on.
Beanstalk is very lightweight and simple compared to other message queues, which are closer to multi-application systems than plain queues.
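On the UDP attempt from the question: a UDP socket only stalls the program if it is left in blocking mode. Below is a minimal sketch, using only the standard library, of a receiver that collects whatever datagrams are already queued and returns immediately otherwise. The helper names `make_receiver` and `drain` and the payload are made up for illustration, and this is not a replacement for a real message queue (UDP can drop or reorder datagrams).

```python
import socket


def make_receiver(host="127.0.0.1", port=0):
    """Bind a UDP socket whose receive calls never block the caller."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))   # port 0 lets the OS pick a free port
    sock.setblocking(False)   # recvfrom() now raises instead of waiting
    return sock


def drain(sock, max_size=65535):
    """Return every datagram that is already queued, without waiting."""
    messages = []
    while True:
        try:
            data, _addr = sock.recvfrom(max_size)
        except BlockingIOError:
            return messages   # nothing (more) to read right now
        messages.append(data)


if __name__ == "__main__":
    receiver = make_receiver()
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender.sendto(b"tick 1", receiver.getsockname())  # hypothetical payload
    # The main loop of a consumer can call drain() between other work.
    print(drain(receiver))
```

A consumer would call `drain()` once per iteration of its own loop, so receiving market data never holds up the rest of the program.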
First of all, I have to admit that I am a beginner concerning concurrency in general, but I have been reading a lot about it recently. Because I heard that Golang is strong in that area, I wanted to ask how (concurrent) servers are written in this language.
I mean, there are different ways to write a server that can handle multiple requests/connections concurrently. You can use threads, asynchronous programming (async/asyncio in Python, for example), and in Golang there are goroutines, which are more or less lightweight threads.
With Python and async/asyncio you can have one single process and one thread, and it is still able to handle concurrency. However, the code is complicated (at least for me, without any background).
My question:
What is the way to go to write a concurrent server in Golang? Just a new goroutine for every connection or are there any asynchronous ways? What's the "best practice"?
I mean, isn't it expensive to have LOTS of goroutines on a heavily used server? How do you write a well-designed server in Golang?
For a beginner, the best way to start is to just use https://golang.org/pkg/net/http/ and write HTTP handlers. You don't need to start goroutines yourself; http.Server will do it for you.
The code will be straightforward, with blocking calls. You don't need to think about concurrency at this stage, as Go will handle it for you. For example, when you make a call like
record, err := someDb.GetRecordByID(123)
it is actually an asynchronous call: it blocks the current flow but releases the thread to other goroutines. The flow continues once the data is returned and a thread (possibly a different one from before) becomes available.
If you need to make concurrent calls within a single HTTP request, you can start goroutines. But leave that for a later stage and do the concurrency section of the Go tour first.
If you really need a high-load solution for HTTP requests, consider using https://github.com/valyala/fasthttp instead of the standard http package.
For HTTP, @icza's comments and Alexander's answer give a fair idea. Just to add: goroutines are not expensive because they are lighter than normal threads. They have variable-sized stacks (probably starting as low as 2 KB) and hence scale up very well with little operating overhead.
Also on HTTP, there are third-party libraries like Gorilla mux that can make life better, as well as frameworks like Buffalo which you can explore. While I haven't used the latter, I have heard it makes life easier.
Now if you are going to be writing your own custom server (something different from http) then again Go is a great choice for it. The program can start as simple as https://golang.org/pkg/net/#example_Listener (To try running this program, you can use netcat like this from another terminal)
$ nc localhost 2000
Hellow
Hellow
And finally channels in Go make sharing data & communication much easier and safer across routines taking care of the synchronization aspects. Hope this helps.
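For comparison with the async/asyncio route the question mentions, here is a minimal Python sketch of the same kind of echo server: one process, one thread, one coroutine per connection. It only uses the standard library; the names `handle_echo` and `echo_once` are made up for illustration, and the built-in client exists only so the example runs on its own.

```python
import asyncio


async def handle_echo(reader, writer):
    # One coroutine per connection: echo each line until the client hangs up.
    while True:
        line = await reader.readline()
        if not line:
            break
        writer.write(line)
        await writer.drain()
    writer.close()
    await writer.wait_closed()


async def echo_once(message: bytes) -> bytes:
    """Start the server on a free port, send one line, return the reply."""
    server = await asyncio.start_server(handle_echo, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        writer.write(message)
        await writer.drain()
        reply = await reader.readline()
        writer.close()
        await writer.wait_closed()
    return reply


if __name__ == "__main__":
    print(asyncio.run(echo_once(b"Hellow\n")))
```

The structure is close to the Go version, except that every blocking point has to be marked with `await`, which Go's runtime hides from you.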
My question: What is the way to go to write a concurrent server in
Golang? Just a new goroutine for every connection or are there any
asynchronous ways? What's "best practice"?
Golang's http package handles request concurrency for you, and I really like that the code looks synchronous and you don't need to add any async/await keywords. Here is how you start:
// Needs "fmt", "log" and "net/http" imported in package main.
func helloHandler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintf(w, "Hello")
}

func main() {
	http.HandleFunc("/hello", helloHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
I'd like to do the following:
the queries on a django site (first server) are sent to a second
server (for performance and security reasons)
the query is processed on the second server using sqlite
the python search function has to keep a lot of data in memory. a simple cgi would always have to reread data from disk which would further slow down the search process. so i guess i need some daemon to run on the second server.
the search process is slow and i'd like to send partial results back, and show them as they arrive.
this looks like a common task, but somehow i don't get it.
i tried Pyro first which exposes the search class (and then i needed a workaround to avoid sqlite threading issues). i managed to get the complete search results onto the first server, but only as a whole. i don't know how to "yield" the results one by one (as generators cannot be pickled), and i anyway wouldn't know how to write them one by one onto the search result page.
i may need some "push technology" says this thread: https://stackoverflow.com/a/5346075/1389074 talking about some different framework. but which?
i don't seem to search for the right terms. maybe someone can point me to some discussions or frameworks that address this task?
thanks a lot in advance!
You can use python tornado websockets. This will allow you to establish 2 way connection from the client side to the server and return data as it comes. Tornado is an async framework built in python.
I'm trying to do some machinery automation with python, but I've run into a problem.
I have code that does the actual control, code that logs, code that provides a GUI, and some other modules all being called from a single script.
The issue is that an error in one module halts all the others. So, for instance a bug in the GUI will kill the control systems.
I want to be able to have the modules run independently, so one can crash, be restarted, be patched, etc without halting the others.
The only way I can find to make that work is to store the variables in an SQL database, or files or something.
Is there a way for one python script to sort of... debug another, so that one script can read or change the variables in the other? I can't find a way to do that which also allows the scripts to be started and stopped independently.
Does anyone have any ideas or advice?
A fairly effective way to do this is to use message passing. Each of your modules is independent, but they can send messages to and receive messages from each other. A very good reference on the many ways to achieve this in Python is the Python wiki page for parallel processing.
A generic strategy
Split your program into pieces where there are servers and clients. You could then use middleware such as 0MQ, Apache ActiveMQ or RabbitMQ to send data between different parts of the system.
In this case, your GUI could send a message to the log parser server telling it to begin work. Once it's done, the log parser will send a broadcast message to anyone interested, giving them a reference to the results. The GUI could be a subscriber to the channel that the log parser publishes to. Once it receives the message, it will open the results file and display whatever the user is interested in.
Serialization and deserialization speed is important also. You want to minimise the overhead for communicating. Google Protocol Buffers and Apache Thrift are effective tools here.
You will also need some form of supervision strategy to prevent a failure in one of the servers from blocking everything. supervisord will restart things for you and is quite easy to configure. Again, it is only one of many options in this space.
Overkill much?
It sounds like you have created a simple utility. The multiprocessing module is an excellent way to have different bits of the program running fairly independently. You still apply the same strategy (message passing, no shared state, supervision), but with different tactics.
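To make the supervision part concrete, here is a minimal sketch of a supervisor that restarts a crashed child process. It uses `subprocess` from the standard library rather than supervisord or multiprocessing, and the function name `supervise` and the always-crashing child command are made up for illustration. A real supervisor would also add backoff and logging.

```python
import subprocess
import sys


def supervise(cmd, max_restarts=3):
    """Run cmd; restart it after a crash, up to max_restarts times.

    Returns the list of exit codes observed, one per run.
    """
    exit_codes = []
    for _attempt in range(max_restarts + 1):
        result = subprocess.run(cmd)
        exit_codes.append(result.returncode)
        if result.returncode == 0:
            break  # clean exit: nothing to restart
    return exit_codes


if __name__ == "__main__":
    # Hypothetical child that always crashes, standing in for a buggy GUI.
    crashing = [sys.executable, "-c", "raise SystemExit(1)"]
    print(supervise(crashing, max_restarts=2))  # prints [1, 1, 1]
```

Because each module runs as its own process, a crash in one shows up only as a non-zero exit code in the supervisor; the control code keeps running.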
You want multiple independent processes, and you want them to talk to each other. Hence: read up on which methods of inter-process communication are available on your OS. I recommend sockets (generic, and they work over a network and across different OSs). You can easily invent a simple (maybe HTTP-like) protocol on top of TCP, maybe with JSON for messages. The Python distribution ships a bunch of classes that make this easy (SocketServer.ThreadingMixIn, SocketServer.TCPServer, etc.; the module is named socketserver in Python 3).
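A minimal sketch of that idea, assuming Python 3 (`socketserver`) and a made-up protocol of one JSON object per line with `op`, `key` and `value` fields. `STATE`, `JsonLineHandler`, `Server` and `call` are hypothetical names for illustration, and there is no error handling or authentication here.

```python
import json
import socket
import socketserver
import threading

STATE = {"setpoint": 20}       # hypothetical variables owned by this process
STATE_LOCK = threading.Lock()  # handlers run in separate threads


class JsonLineHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # One JSON request per line; reply with the key's current value.
        for raw in self.rfile:
            request = json.loads(raw)
            with STATE_LOCK:
                if request["op"] == "set":
                    STATE[request["key"]] = request["value"]
                value = STATE.get(request["key"])
            reply = {"key": request["key"], "value": value}
            self.wfile.write(json.dumps(reply).encode() + b"\n")


class Server(socketserver.ThreadingMixIn, socketserver.TCPServer):
    allow_reuse_address = True
    daemon_threads = True


def call(address, request):
    """Send one request from another script and return the decoded reply."""
    with socket.create_connection(address) as conn:
        conn.sendall(json.dumps(request).encode() + b"\n")
        with conn.makefile("rb") as stream:
            return json.loads(stream.readline())


if __name__ == "__main__":
    with Server(("127.0.0.1", 0), JsonLineHandler) as server:
        threading.Thread(target=server.serve_forever, daemon=True).start()
        print(call(server.server_address, {"op": "get", "key": "setpoint"}))
        server.shutdown()
```

The control script would run the server part; the GUI or logger only needs `call()`, so either side can be stopped and restarted without taking the other down.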
I want to write data analysis plugins for a Java interface. This interface is potentially run on different computers. The interface will send commands and the Python program can return large data. The interface is distributed by a Java Webstart system. Both access the main data from a MySQL server.
What are the different ways and advantages to implement the communication? Of course, I've done some research on the internet. While there are many suggestions I still don't know what the differences are and how to decide for one. (I have no knowledge about them)
I've found a suggestion to use sockets, which seems fine. Is it simple to write a server that dedicates a Python analysis process for each connection (temporary data might be kept after one communication request for that particular client)?
I was thinking to learn how to use sockets and pass YAML strings.
Maybe my main question is: What is the relation to and advantage of systems like RabbitMQ, ZeroMQ, CORBA, SOAP, XMLRPC?
There were also suggestions to use pipes or shared memory. But that wouldn't fit my requirements?
Do any of the methods have advantages for debugging or other peculiarities?
I hope someone can help me understand the technology and help me decide on a solution, as it is hard to judge from technical descriptions.
(I do not consider solutions like Jython, JEPP, ...)
Offering an opinion on the merits you described, it sounds like you are dealing with potentially large data/queries that may take a lot of time to fetch and serialize, in which case you definitely want to go with something that can handle concurrent connections without stacking up threads. Thereby, in the Python domain, I can't recommend any networking library other than Twisted.
http://twistedmatrix.com/documents/current/core/examples/
Whether you decide to use vanilla HTTP or your own protocol, Twisted is pretty much the one-stop shop for concurrent networking. Sure, the name gets thrown around a lot, and the documentation is Atlantean, but if you take the time to learn it there is very little in the networking domain you cannot accomplish. You can extend the base protocols and factories to make one server that handles your data in a reactor-based event loop and responds to deferred requests when ready.
The serialization format really depends on the nature of the data. Will there be any binary in what is output as a response? Complex types? That rules out JSON if so, though that is becoming the most common serialization format. YAML sometimes seems to enjoy a position of privilege among the python community - I haven't used it extensively as most of the kind of work I've done with serials was data to be rendered in a frontend with javascript.
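To make the point about binary data concrete: the standard `json` module will not serialize raw bytes, so binary either rules out plain JSON or has to be wrapped in a text encoding such as base64 (at roughly a third more size). A small sketch; the payload and the `encode`/`decode` helpers are made up for illustration.

```python
import base64
import json

payload = bytes([0, 255, 16, 32])  # made-up binary data

try:
    json.dumps({"data": payload})
except TypeError as exc:
    # The stdlib encoder rejects bytes outright.
    print("plain JSON refuses bytes:", exc)


def encode(data: bytes) -> str:
    """Wrap binary data in base64 so it fits in a JSON string."""
    return json.dumps({"data": base64.b64encode(data).decode("ascii")})


def decode(text: str) -> bytes:
    """Reverse encode(): parse the JSON and undo the base64 step."""
    return base64.b64decode(json.loads(text)["data"])


if __name__ == "__main__":
    print(decode(encode(payload)) == payload)
```

If large binary responses are the norm, a binary-friendly format is likely a better fit than paying the base64 overhead on every message.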
Message queues are really the most important tool in the toolbox when you need to defer background tasks without hanging response. They are commonly employed in web apps where the HTTP request should not hang until whatever complex processing needs to take place completes, so the UI can render early and count on an implicit "promise" the processing will take place. They have two important traits: they rely on eventual consistency, in that the process can finish long after the response in the protocol is sent, and they also have fail-safe and try-again directives should a task fail. They are where you turn in the "do this really hard task as soon as you can and I trust you to get it done" problem domain.
If we are not talking about potentially HUGE response bodies, and relatively simple data types within the serialized output, there is nothing wrong with rolling a simple HTTP deferred server in Twisted.
I was trying to create a polling script in python that starts when another python script starts and then keeps supplying data back to this script.
I can obviously write an infinite loop, but is that the right way to go about it? I might lose control over how the functions work and how many times a function should be called in an hour.
Edit:
What I am trying to accomplish is to poll the REST API of twitter and get new mentions and people who follow me. I obviously can't keep polling because I will run out of API requests per hour. Thus, the issue. This poller, will send the new mention and follower id/user to the main script that would be listening to any such update.
I highly suggest looking into Twisted, one of the most popular async frameworks using the reactor pattern.
The "infinite loop" you are looking for is really an application pattern that Twisted implements to respond to events asynchronously, and it almost never makes sense to roll your own.
Twisted is largely used for networking requirements, but it has a LoopingCall interface to set up the kind of functionality you require. Using the core Twisted deferred as your request model allows you to set up a long-polling server that can perform the kind of conditional network test you need. It can initially be a little intimidating, but once you understand the core components (Factories, Reactors, Protocols, etc.) that you need to inherit from, it becomes much easier to visualize your problem.
This also might be a good tutorial to start looking at the basics of the "push" model:
http://carloscarrasco.com/simple-http-pubsub-server-with-twisted.html
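To illustrate the rate-limit side of the problem without pulling in Twisted, here is a plain standard-library sketch: the poll interval is derived from the hourly request budget, and only items that have not been seen before are forwarded to the main script. The names `poll_interval` and `run_poller`, the limit of 150 requests per hour and the fake batches are all assumptions for illustration; in Twisted, `task.LoopingCall(f).start(interval)` would take the place of the loop.

```python
import time


def poll_interval(requests_per_hour: int) -> float:
    """Seconds to wait between polls so the hourly budget is not exceeded."""
    return 3600.0 / requests_per_hour


def run_poller(fetch, on_update, interval, max_polls, sleep=time.sleep):
    """Call fetch() max_polls times, forwarding only items not seen before."""
    seen = set()
    for i in range(max_polls):
        for item in fetch():
            if item not in seen:
                seen.add(item)
                on_update(item)
        if i < max_polls - 1:
            sleep(interval)  # wait before the next poll, not after the last


if __name__ == "__main__":
    # Fake API responses standing in for mentions/followers.
    batches = iter([["alice"], ["alice", "bob"], ["bob", "carol"]])
    updates = []
    run_poller(lambda: next(batches), updates.append,
               interval=0.01, max_polls=3)
    print(updates)  # prints ['alice', 'bob', 'carol']
```

With a real limit you would pass `poll_interval(limit)` as the interval, and `on_update` would be whatever hands the new id to the listening script (a queue, a socket, a callback).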