Handling multiple clients with twisted and spyne - python

I'm trying to create a simple python server that can handle multiple RCP calls at the same time. I would like to use twisted for the networking and spyne to handle the RPCs. I found a good example in the spyne github repo here, but when I make a call to say_hello_with_sleep using curl I get an error.
exceptions.AssertionError: It looks like this protocol is not
async-compliant yet
This is the only one of the RPCs thats doesn't seem to work and the one that defines the type of nonblocking call I'm looking for.
The final RPCs that I need to implement will take around 40 sec to process before returning the request and Im honestly not sure if this is the best way to go about handling multiple requests at the same time.
Any help or direction would be greatly appreciated. Thanks!

This is fixed and will be released as part of Spyne 2.13.
You can use code from the master branch of http://github.com/arskom/spyne if you can't wait an indefinite amount of time until the release. Code only gets merged there if it passes all tests.

Related

Concurrent server in Golang

First of all I have to admit that I am a beginner concerning concurrency in general, but reading a lot about it recently. Because I heard that Golang is strong on that area. I wanted to ask how (concurrent) servers are written in this language.
I mean, there are different ways in how to write a server that can handle multiple requests/connections concurrently. You can use threads, asynchronous programming (async/asyncio in Python for example), and in Golang there are goroutines, which is more or less a lightweight thread.
However, when using Python and async/asyncio you can have one single process and one thread and it's able to handle concurrency. However, the code is complicated (at least for me without any background).
My question:
What is the way to go to write a concurrent server in Golang? Just a new goroutine for every connection or are there any asynchronous ways? What's the "best practice"?
I mean is it not expensive to have LOTS of goroutines on a highly used server? How to do a well-written server in Golang?
For beginner the best way to start is just use https://golang.org/pkg/net/http/ and just write http handlers. You don't need to initialize Go routines - the http.Server will do it for you.
The code will be straight forward with blocking calls. You don't need to think about concurrency at this stage as Go will do it for you. For example when you do a call like
record, err := someDb.GetRecordByID(123)
actually it's an asynchronous call that blocks current flow but release thread to other Go routines. It will continue flow once data returned and a thread (may be different from previous) becomes available.
If you will need to do concurrent calls within 1 HTTP request you can start Go routines. But leave it for later stage and do the Go lang tour on concurrency first.
If you really need a high load solution for HTTP requests consider using https://github.com/valyala/fasthttp instead of standard http package.
For HTTP #icza's comments & Alexander's answer give a fair idea. Just to add Goroutines are not expensive because they are lighter than normal threads. They can have variable sized stack (probably start as low as 2k) & hence can scale up very well with less operating overhead.
Also on http, there are third party libraries like Gorilla mux which can make life better as also other frameworks like Buffalo which you can explore. While I haven't used the latter, I have heard it makes life easier.
Now if you are going to be writing your own custom server (something different from http) then again Go is a great choice for it. The program can start as simple as https://golang.org/pkg/net/#example_Listener (To try running this program, you can use netcat like this from another terminal)
$ nc localhost 2000
Hellow
Hellow
And finally channels in Go make sharing data & communication much easier and safer across routines taking care of the synchronization aspects. Hope this helps.
My question: What is the way to go to write a concurrent server in
Golang? Just a new goroutine for every connection or are there any
asynchronous ways? What's "best practice"?
Golang http package will do requests concurrency handling for you and I really like that code looks like synchronous and you don't need to add any async/await keywords. Here is how you start
func helloHandler(w http.ResponseWriter, r *http.Request) {
fmt.Fprintf(w, "Hello")
}
http.HandleFunc("/hello", helloHandler)
log.Fatal(http.ListenAndServe(":8080", nil))

Class for making asynchronous API requests with Twisted

I am working on development of a system that collect data from rest servers and manipulates it.
One of the requirements is multiple and frequent API requests. We currently implement this in a somewhat synchronous way. I can easily implement this using threads but considering the system might need to support thousands of requests per second I think it would be wise to utilize Twisted's ability to efficiently implement the above. I have seen this blog post and the whole idea of deferred list seems to do the trick. But I am kind of stuck with how to structure my class (can't wrap my mind around how Twisted works).
Can you try to outline the structure of the class that will run the event-loop and will be able to get a list of URLs and headers and return a list of results after making the requests asynchronously?
Do you know of a better way of implementing this in python?
Sounds like you want to use a Twisted project called treq which allows you to send requests to HTTP endpoints. It works alot like requests. I recently helped a friend here in this thread. My answer there might be of some use to you. If you still need more help, just make a comment and I'll try my best to update this answer.

Best practice to make partial search results appear (one by one as they come in from a secondary server)

I'd like to do the following:
the queries on a django site (first server) are send to a second
server (for performance and security reasons)
the query is processed on the second server using sqlite
the python search function has to keep a lot of data in memory. a simple cgi would always have to reread data from disk which would further slow down the search process. so i guess i need some daemon to run on the second server.
the search process is slow and i'd like to send partial results back, and show them as they arrive.
this looks like a common task, but somehow i don't get it.
i tried Pyro first which exposes the search class (and then i needed a workaround to avoid sqlite threading issues). i managed to get the complete search results onto the first server, but only as a whole. i don't know how to "yield" the results one by one (as generators cannot be pickled), and i anyway wouldn't know how to write them one by one onto the search result page.
i may need some "push technology" says this thread: https://stackoverflow.com/a/5346075/1389074 talking about some different framework. but which?
i don't seem to search for the right terms. maybe someone can point me to some discussions or frameworks that address this task?
thanks a lot in advance!
You can use python tornado websockets. This will allow you to establish 2 way connection from the client side to the server and return data as it comes. Tornado is an async framework built in python.

How to create a polling script in Python?

I was trying to create a polling script in python that starts when another python script starts and then keeps supplying data back to this script.
I can obviously write an infinite loop but is that the right way to go about it? I might loose control over how the functions work and how many times a function should be called in an hour.
Edit:
What I am trying to accomplish is to poll the REST API of twitter and get new mentions and people who follow me. I obviously can't keep polling because I will run out of API requests per hour. Thus, the issue. This poller, will send the new mention and follower id/user to the main script that would be listening to any such update.
I highly suggest looking into Twisted, one of the most popular async frameworks using the reactor pattern.
The "infinite loop" you are looking for is really an application pattern that Twisted implements to respond to events asynchronously, and it almost never makes sense to roll your own.
Twisted is largely used for networking requirements, but the it has a LoopingCall interface to set up the kind of functionality you require. Using the core Twisted deferred as your request model allows you to set up a long-polling server that can perform the kind of conditional network test you need. It can intially be a little intimidating, but once you understand the core components (Factories, Reactors, Protocols etc) that you need to inherit it becomes much easier to visualize your problem.
This also might be a good tutorial to start looking at the basics of the "push" model:
http://carloscarrasco.com/simple-http-pubsub-server-with-twisted.html

Python Socket and Thread pooling, how to get more performance?

I am trying to implement a basic lib to issue HTTP GET requests. My target is to receive data through socket connections - minimalistic design to improve performance - usage with threads, thread pool(s).
I have a bunch of links which I group by their hostnames, so here's a simple demonstration of input URLs:
hostname1.com - 500 links
hostname2.org - 350 links
hostname3.co.uk - 100 links
...
I intend to use sockets because of performance issues. I intend to use a number of sockets which keeps connected (if possible and it usually is) and issue HTTP GET requests. The idea came from urllib low performance on continuous requests, then I met urllib3, then I realized it uses httplib and then I decided to try sockets. So here's what I accomplished till now:
GETSocket class, SocketPool class, ThreadPool and Worker classes
GETSocket class is a minified, "HTTP GET only" version of Python's httplib.
So, I use these classes like that:
sp = Comm.SocketPool(host,size=self.poolsize, timeout=5)
for link in linklist:
pool.add_task(self.__get_url_by_sp, self.count, sp, link, results)
self.count += 1
pool.wait_completion()
pass
__get_url_by_sp function is a wrapper which calls sp.urlopen and saves the result to results list. I am using a pool of 5 threads which has a socket pool of 5 GETSocket classes.
What I wonder is, is there any other possible way that I can improve performance of this system?
I've read about asyncore here, but I couldn't figure out how to use same socket connection with class HTTPClient(asyncore.dispatcher) provided.
Another point, I don't know if I'm using a blocking or a non-blocking socket, which would be better for performance or how to implement which one.
Please be specific about your experiences, I don't intend to import another library to do just HTTP GET so I want to code my own tiny library.
Any help appreciated, thanks.
Do this.
Use multiprocessing. http://docs.python.org/library/multiprocessing.html.
Write a worker Process which puts all of the URL's into a Queue.
Write a worker Process which gets a URL from a Queue and does a GET, saving a file and putting the File information into another Queue. You'll probably want multiple copies of this Process. You'll have to experiment to find how many is the correct number.
Write a worker Process which reads file information from a Queue and does whatever it is that you're trying do.
I finally found a well chosen path to solve my problems. I was using Python 3 for my project and my only option was to use pycurl, so this made me have to port my project back to Python 2.7 series.
Using pycurl, I gained:
- Consistent responses to my requests (actually my script has to deal with minimum 10k URLs)
- With the usage of ThreadPool class I am receiving responses as fast as my system can (received data is processed later - so multiprocessing is not much of a possibility here)
I tried httplib2 first, I realized that it is not acting as solid as it acts on Python 2, by switching to pycurl I lost caching support.
Final conclusion: When it comes to HTTP communication, one could need a tool like (py)curl at his disposal. It is a lifesaver, especially when one is dealing with loads of URLs (try sometimes for fun: you will get lots of weird responses from them)
Thanks for the replies, folks.

Categories

Resources