I'm a beginner in Python and I'd like to know what the difference is between the socket and requests modules. I'm sorry if the question is badly formed. If that is because I don't know exactly how the different Internet protocols work, I'd be grateful if you could share relevant literature so that I can look for the answer myself. Can you give examples?
How Everything is Related
Requests is one of the most used and highest-level Python HTTP clients. It is built on urllib3, which is built on http.client (httplib in Python 2). These are all libraries for the HTTP application protocol that use sockets to make their calls, so sockets are the foundation of them all.
More on the Foundational Sockets
Sockets are the interface most operating systems provide for network communication, and they support many types of connections for transmitting data. HTTP libraries rely on sockets, so there are no HTTP connections over TCP without them. The reverse does not hold: sockets can exist without HTTP.
Conclusion
In the end, the requests library is built on several libraries, with sockets at (basically) the bottom level. You'll find that requests is the easiest to use since it is the highest level and has the most documentation. You can also use sockets directly through the standard library socket module (PySocks, which is often mentioned in this context, is a SOCKS proxy client built on top of that module). This is much more difficult since it is lower level, but there is documentation out there that can point you in the right direction for making your own connections and sending your own encoded data to act as an HTTP request. The main trade-off is that raw sockets are harder to work with but much more customizable, with less overhead. Most Python web apps and scrapers use requests; raw sockets are used more rarely, mostly by experts in specific cases.
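To make the layering concrete, here is a minimal sketch of what "acting as an HTTP request" over a raw socket can look like. It is an illustration only: `raw_http_get`, `HelloHandler` and `demo` are hypothetical names, and the tiny local server exists just so the example runs without internet access.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def raw_http_get(host, port, path="/"):
    """Send a minimal HTTP/1.1 GET over a plain TCP socket and return the raw reply."""
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical local test server that always answers with 'hello'."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass


def demo():
    """Start the local server on a free port, fetch '/', and return the raw bytes."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return raw_http_get("127.0.0.1", server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()
```

The reply returned by `demo()` still contains the status line and headers, which you would have to parse yourself. With requests, the whole exchange collapses to something like `requests.get(url).text`, which is the convenience the higher-level library buys you.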
More Reading
Intro to HTTP: https://www.tutorialspoint.com/http/index.htm
HTTP vs. HTTPS (the distinction is very important to the web): https://www.cloudflare.com/learning/ssl/why-is-http-not-secure/
Requests (Python's most widely used HTTP library): https://docs.python-requests.org/en/latest/
PySocks (a SOCKS proxy client that wraps Python's socket module): https://pypi.org/project/PySocks/
The Relation of Senders/Receivers (advanced): https://www.geeksforgeeks.org/layers-of-osi-model/
I am trying to develop a script in Python which would function like the NetDisturb utility. Some of you might ask why I am doing this when there is a ready-made utility; the reason is that I want to integrate it into a web page. Using this web page I can access a particular set of interfaces, which I already know or which are present in the back-end script, and degrade their performance or simply block packets.
I have had some success, but now I want to implement a socket connection between the two connected interfaces. I am unable to get full-duplex communication over a socket connection, and I cannot decide which interface should act as master and which as slave, because when I make one of the interfaces the master, the listen and accept statements block further execution of the code.
Would using SOCK_DGRAM sockets instead of SOCK_STREAM help me?
You have to use non-blocking sockets. Here is an explanation of how to use select to handle non-blocking sockets (for beginners I can really recommend the complete article; it is a good start). Alternatives would be a multi-threaded architecture or asyncore. If you also want a GUI, I can recommend pygtk for the interface and glib.io_add_watch to handle the sockets.
But in general I would recommend some high level framework like zeromq. A second high level alternative would be Twisted, but it has a non-pythonic Java-like API and is (IMO) badly documented.
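For the select approach mentioned above, a minimal sketch might look like the following. It assumes a simple echo service; `echo_until_idle` and `demo_echo` are hypothetical names, and the idle timeout is only there so the example terminates on its own.

```python
import select
import socket
import threading


def echo_until_idle(listener, idle_timeout=1.0):
    """Echo data back to every client until nothing happens for idle_timeout seconds."""
    listener.setblocking(False)
    pending = {}  # client socket -> bytes still waiting to be sent back
    while True:
        writers = [sock for sock, buf in pending.items() if buf]
        readable, writable, _ = select.select(
            [listener, *pending], writers, [], idle_timeout
        )
        if not readable and not writable:
            break  # idle: no new connections, no data, nothing to flush
        for sock in readable:
            if sock is listener:
                conn, _addr = listener.accept()  # will not block: select said ready
                conn.setblocking(False)
                pending[conn] = b""
            else:
                data = sock.recv(4096)
                if data:
                    pending[sock] += data
                else:  # peer closed its side
                    del pending[sock]
                    sock.close()
        for sock in writable:
            if pending.get(sock):
                sent = sock.send(pending[sock])  # may send only part of the buffer
                pending[sock] = pending[sock][sent:]
    for sock in pending:
        sock.close()


def demo_echo(payload):
    """Run the loop on a free local port, send payload, and return what comes back."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen()
    worker = threading.Thread(target=echo_until_idle, args=(listener,))
    worker.start()
    try:
        with socket.create_connection(listener.getsockname(), timeout=5) as client:
            client.sendall(payload)
            received = b""
            while len(received) < len(payload):
                chunk = client.recv(4096)
                if not chunk:
                    break
                received += chunk
    finally:
        worker.join()
        listener.close()
    return received
```

Because one loop watches the listening socket and all client sockets at once, accept never stalls the rest of the program, and reading and writing on the same connection can interleave, which is what full-duplex use needs.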
I'm trying to work out how to approach building a "machine" to send and receive messages to WebSphere MQ, via Twisted. I want it to be as generic as possible, so I can reuse it for many different situations that interface with MQ.
I've used Twisted before, but many years ago now and I'm trying to resurrect the knowledge I once had...
The specific problem I'm having is how to implement the MQ IO using Twisted. There's a pymqi Python library that interfaces with MQ, and it provides all the interfaces I need. The MQ calls I need to implement are:
initiate a connection to a specific MQ server/port/channel/queue-manager/queue combination
take content and post it as a message to the desired queue
poll a queue and return the content of the next message in the queue
send a request to a queue manager to find the number of messages currently in a queue
All of these involve blocking calls to MQ.
As I'm intending to reuse the Twisted/MQ interface many times across a range of projects, should I be looking to implement the MQ IO as a Twisted protocol, as a Twisted transport, or just call the pymqi methods via deferToThread() calls? I realise this is a very broad question with possibly no definitive answer; I'm really after advice from those who may have encountered similar challenges before (i.e. working with queueing interfaces that will always block) and found a way that works well.
If you're going to use this functionality a lot, then having a native Twisted implementation is probably worth the effort. A wrapper based on deferToThread will be less work, but it will also be harder to test and debug, will perform less well, and will have problems on certain platforms where Python threads don't work especially well (e.g. FreeBSD).
The approach to take for a native Twisted implementation is probably to implement a protocol that can speak to MQ servers and give it a rich API for interacting with channels, queues, queue managers, etc, and then build a layer on top of that which abstracts the actual network connection away from the application (as I believe mqi/pymqi largely do).
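As a rough illustration of the thread-wrapper option, the sketch below pushes blocking queue calls onto a worker thread and hands back a future. To stay dependency-free it swaps in the standard library's ThreadPoolExecutor instead of Twisted's deferToThread, and `BlockingQueueClient` is a hypothetical stand-in, not the pymqi API.

```python
import queue
import time
from concurrent.futures import ThreadPoolExecutor


class BlockingQueueClient:
    """Hypothetical stand-in for a blocking MQ client; this is NOT the pymqi API."""

    def __init__(self):
        self._messages = queue.Queue()

    def put(self, body):
        time.sleep(0.01)  # simulate a slow, blocking network call
        self._messages.put(body)

    def get(self):
        time.sleep(0.01)
        return self._messages.get(timeout=1)

    def depth(self):
        return self._messages.qsize()


class ThreadedQueueFacade:
    """Runs each blocking call on a worker thread and returns a Future immediately."""

    def __init__(self, client, max_workers=1):
        self._client = client
        # One worker serialises the calls, assuming the client is not thread-safe.
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def put(self, body):
        return self._pool.submit(self._client.put, body)

    def get(self):
        return self._pool.submit(self._client.get)

    def depth(self):
        return self._pool.submit(self._client.depth)

    def close(self):
        self._pool.shutdown(wait=True)
```

In a Twisted program the same shape would use `deferToThread(self._client.put, body)` and return a Deferred rather than a Future. The facade keeps the blocking client in one place, so it could later be replaced by a native protocol implementation without changing callers.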
Thanks for the interesting responses thus far. In light of them I have changed my question a bit.
I guess what I really need to know is whether socketserver, as opposed to the straight-up socket library, is designed to handle both periods of latency and stress. That is, does it have additional mechanisms or features that justify its implicitly advertised status as a "server," or is it just slightly easier to use?
Everyone seems to be recommending socketserver, but I'm still not entirely clear why, as opposed to socket.
Thanks!!!
I've built some server programs in python based on the standard socket library http://docs.python.org/library/socket.html
I've noticed that they seem to work just fine except that without load they have a tendency to go to sleep after a while. I guess this may not be an issue in production (no doubt there will be plenty of other issues) but I would like to know if I am using the right code for the job here.
Looking around I saw that python also provides a socketserver library - http://docs.python.org/library/socketserver.html
The socket library provides the ability to listen for multiple connections, typically up to 5.
According to the socketserver page, its services are synchronous, i.e. blocking, but one may support asynchronous behavior via threading. I did notice it has the ability to maintain a request queue, with a default value of up to 5 requests...so maybe not much difference there.
I have also read that Twisted runs socketserver under the hood. Though I would rather not get into a beast the size of Twisted unless it's going to be worthwhile.
so my question is, is socketserver more robust than socket? If so, why? (And how do you know?)
incidentally, is socketserver built on top of python's socket or is it entirely separate?
finally, as a bonus if anyone knows what one could do wrong such that standard sockets 'fall asleep' please feel free to chime in on that too.
Oh, and I'm talking python 2.x rather than 3.x here if that makes a difference.
thanks folks!
jsh
Well, I don't have a technical answer but I've implemented SocketServer per folks' recommendations and it IS definitely more reliable. If anyone ever comes up with the low-level explanation please let me know...thanks!
The socket module is a very low-level module for sending and receiving packets. As said in the documentation, it "provides access to the BSD socket interface".
If you want something more elaborate, there is "socketserver" that takes care of the gory details for you, but it is still relatively low level.
On top of that you can find an HTTP server, with or without CGI, an XML-RPC server, and so on. These are frameworks, which usually means that their code calls your code. It makes things simpler because you just have to fill some "gaps" to have a fully working server, but it also means you have a little bit less control over what it does.
If you only need features of socketserver, I would probably go with it, unless you want to reinvent the wheel for some reason (and there are always good reasons to design new wheels, for example to understand how it works).
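To show what "filling the gaps" looks like, here is a minimal sketch of a threaded echo server built with socketserver. It uses the Python 3 module name (the module is called SocketServer in Python 2), and `EchoHandler`, `ThreadedEchoServer` and `echo_roundtrip` are hypothetical names chosen for the example.

```python
import socket
import socketserver
import threading


class EchoHandler(socketserver.BaseRequestHandler):
    """The only code we write ourselves: what to do with one connection."""

    def handle(self):
        # self.request is an ordinary socket.socket for this client
        while True:
            data = self.request.recv(4096)
            if not data:
                break
            self.request.sendall(data)


class ThreadedEchoServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    """TCPServer does bind/listen/accept; the mix-in runs each handler in a thread."""

    allow_reuse_address = True
    daemon_threads = True


def echo_roundtrip(payload):
    """Start the server on a free local port, send payload, return the echoed bytes."""
    with ThreadedEchoServer(("127.0.0.1", 0), EchoHandler) as server:
        threading.Thread(target=server.serve_forever, daemon=True).start()
        try:
            with socket.create_connection(server.server_address, timeout=5) as client:
                client.sendall(payload)
                client.shutdown(socket.SHUT_WR)  # tell the handler we are done sending
                received = b""
                while True:
                    chunk = client.recv(4096)
                    if not chunk:
                        break
                    received += chunk
        finally:
            server.shutdown()
    return received
```

The handler receives a plain socket object, which shows that socketserver sits on top of the socket module rather than replacing it. The accept loop, per-connection threading and clean shutdown are the parts the framework takes off your hands.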
I'm looking for a good server/client protocol supported in Python for making data requests/file transfers between one server and many clients. Security is also an issue - so secure login would be a plus. I've been looking into XML-RPC, but it looks to be a pretty old (and possibly unused these days?) protocol.
If you are looking to do file transfers, XMLRPC is likely a bad choice. It will require that you encode all of your data as XML (and load it into memory).
"Data requests" and "file transfers" sounds a lot like plain old HTTP to me, but your statement of the problem doesn't make your requirements clear. What kind of information needs to be encoded in the request? Would a URL like "http://yourserver.example.com/service/request?color=yellow&flavor=banana" be good enough?
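As a small illustration of how much a plain URL can carry, the sketch below pulls the request parameters out of the example URL using only the standard library (Python 3 module names).

```python
from urllib.parse import parse_qs, urlsplit

url = "http://yourserver.example.com/service/request?color=yellow&flavor=banana"

parts = urlsplit(url)
params = parse_qs(parts.query)  # each value is a list, since keys may repeat

print(parts.path)  # /service/request
print(params)      # {'color': ['yellow'], 'flavor': ['banana']}
```

If everything the server needs fits into a path plus a handful of key/value pairs like this, an ordinary HTTP GET is likely enough.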
There are lots of HTTP clients and servers in Python, none of which are especially great, but all of which I'm sure will get the job done for basic file transfers. You can do security the "normal" web way, which is to use HTTPS and passwords, which will probably be sufficient.
If you want two-way communication then HTTP falls down, and a protocol like Twisted's perspective broker (PB) or asynchronous messaging protocol (AMP) might suit you better. These protocols are certainly well-supported by Twisted.
ProtocolBuffers was released by Google as a way of serializing data in a very compact efficient way. They have support for C++, Java and Python. I haven't used it yet, but looking at the source, there seem to be RPC clients and servers for each language.
I personally have used XML-RPC on several projects, and it always did exactly what I was hoping for. I was usually going between C++, Java and Python. I use xmlrpclib in Python often because it's easy to memorize and type interactively, but it is actually much slower than the alternative pyxmlrpc.
PyAMF is mostly for RPC with Flash clients, but it's a compact RPC format worth looking at too.
When you have Python on both ends, I don't believe anything beats Pyro (Python Remote Objects). Pyro even has a "name server" that lets services announce their availability to a network. Clients use the name server to find the services they need no matter where those services are active at a particular moment. This gives you free redundancy, and the ability to move services from one machine to another without any downtime.
For security, I'd tunnel over SSH, or use TLS or SSL at the connection level. Of course, all these options are essentially the same, they just have various difficulties of setup.
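For the TLS option, a minimal client-side sketch with the standard library's ssl module could look like this. The function name and the optional `ca_file` parameter are assumptions for illustration; a private certificate authority would be one reason to pass it.

```python
import ssl


def make_client_context(ca_file=None):
    """Return a TLS client context that verifies the server certificate and hostname."""
    # create_default_context() turns on certificate and hostname checks by default;
    # ca_file optionally points at a custom CA bundle instead of the system store.
    return ssl.create_default_context(cafile=ca_file)
```

An existing TCP socket can then be upgraded with `context.wrap_socket(sock, server_hostname="yourserver.example.com")`, after which it is used like a normal socket.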
Pyro (Python Remote Objects) is fairly clever if all your servers and clients are going to be in Python. I use XMPP a lot, though, since I'm communicating with hosts that are not always Python. XMPP lends itself to being extended fairly easily too.
There is an excellent XMPP library for Python called PyXMPP, which is reasonably up to date and has no dependency on Twisted.
I suggest you look at:
1. XMLRPC
2. JSONRPC
3. SOAP
4. REST/ATOM
XMLRPC is a valid choice. Don't worry that it is too old; that is not a problem. It is so simple that little has needed changing since the original specification. The pro is that every programming language I know has a library for writing a client, certainly Python. I made it work with mod_python and had no problem at all.
The big problem with it is its verbosity. For simple values there is a lot of XML overhead. You can gzip it, of course, but then you lose some debugging ability with tools like Fiddler.
My personal preference is JSONRPC. It has all of the XMLRPC advantages and it is very compact. Further, Javascript clients can "eval" it so no parsing is necessary. Most implementations are built for version 1.0 of the standard. I have seen diverse attempts to improve on it, called 1.1, 1.2 and 2.0, but they are not built one on top of another and, to my knowledge, are not widely supported yet. 2.0 looks the best, but I would still stick with 1.0 for now (October 2008).
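To give a feel for the verbosity difference, here is a small sketch that serialises the same call both ways. The method name `add` and its arguments are made up for the example, and the JSON shape follows the JSON-RPC 1.0 convention of `method`, `params` and `id`; module names are the Python 3 ones.

```python
import json
import xmlrpc.client


def jsonrpc_request(method, params, request_id):
    """Serialise a JSON-RPC 1.0 style request (method, params, id)."""
    return json.dumps({"method": method, "params": list(params), "id": request_id})


def xmlrpc_request(method, params):
    """Serialise the same call as an XML-RPC methodCall document."""
    return xmlrpc.client.dumps(tuple(params), methodname=method)


json_body = jsonrpc_request("add", [2, 3], 1)
xml_body = xmlrpc_request("add", [2, 3])
print(len(json_body), len(xml_body))  # the XML document is the longer of the two
```

Printing both bodies side by side shows where the XML overhead comes from: every value is wrapped in several nested elements.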
The third candidate would be REST/ATOM. REST is a principle, and ATOM is how you convey the bulk of the data where it is needed: in POST and PUT requests and in GET responses.
For a very nice implementation of it, look at GData, Google's API. Real real nice.
SOAP is old, and lots of libraries and languages support it. It is heavy and complicated, but if your primary clients are .NET or Java, it might be worth the bother.
Visual Studio would import your WSDL file and create a wrapper, and to a C# programmer it would look just like a local assembly.
The nice thing about all this is that if you architect your solution right, existing libraries for Python would allow you to support more than one of these with almost no overhead. XMLRPC and JSONRPC are an especially good match.
Regarding authentication: XMLRPC and JSONRPC don't bother defining one. It is independent of the serialization, so you can implement Basic Authentication, Digest Authentication or your own scheme with any of those. I have seen a couple of examples of client-side Digest Authentication for Python, but have yet to see a server-side one. If you use Apache, you might not need one and can use the mod_auth_digest Apache module instead. This depends on the nature of your application.
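Since authentication lives at the HTTP layer rather than in the RPC format, a client only needs to add a header. Here is a minimal sketch for Basic Authentication; the function name is hypothetical.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


print(basic_auth_header("user", "pass"))  # Basic dXNlcjpwYXNz
```

The value goes into the `Authorization` request header. Base64 is an encoding, not encryption, so Basic Authentication only makes sense over an encrypted transport.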
Transport security: it is obviously SSL (HTTPS). I can't currently remember how XMLRPC deals with it, but with the JSONRPC implementation that I have it is trivial. You merely change http to https in your JSONRPC URLs and it goes over an SSL-enabled transport.
HTTP seems to suit your requirements and is very well supported in Python.
Twisted is good for serious asynchronous network programming in Python, but it has a steep learning curve, so it might be worth using something simpler unless you know your system will need to handle a lot of concurrency.
To start, I would suggest using urllib for the client and a WSGI service behind Apache for the server. Apache can be set up to deal with HTTPS fairly simply.
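As a starting point along those lines, here is a minimal sketch of a WSGI service and a urllib client talking to each other. For the sake of a self-contained example it runs the app with the standard library's wsgiref development server instead of Apache, and `app`, `QuietHandler` and `fetch_greeting` are hypothetical names.

```python
import threading
from urllib.parse import parse_qs
from urllib.request import ProxyHandler, build_opener
from wsgiref.simple_server import WSGIRequestHandler, make_server


def app(environ, start_response):
    """Tiny WSGI service: greets the value of the 'name' query parameter."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", ["world"])[0]
    body = f"hello {name}".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


class QuietHandler(WSGIRequestHandler):
    def log_message(self, *args):  # suppress per-request logging in the demo
        pass


def fetch_greeting(name):
    """Serve the app on a free local port and fetch one greeting with urllib."""
    server = make_server("127.0.0.1", 0, app, handler_class=QuietHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        opener = build_opener(ProxyHandler({}))  # ignore proxy settings for localhost
        with opener.open(f"http://127.0.0.1:{port}/?name={name}", timeout=5) as resp:
            return resp.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()
```

The same `app` callable can be deployed behind Apache (for example via mod_wsgi) without changes, which is where HTTPS would be configured.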
SSH can be a good choice for file transfer and remote control, especially if you are concerned with secure login. Most Linux and Solaris servers will already run an SSH service for administration, so if your Python program uses SSH then you don't need to open up any additional ports or services on remote machines.
OpenSSH is the standard and portable SSH client and server, and can be used via subprocesses from Python. If you want more flexibility Twisted includes Twisted Conch which is a SSH client and server implementation which provides flexible programmable control of an SSH stack, on both Linux and Windows. I use both in production.
I'd use http and start with understanding what the Python library offers.
Then I'd move onto the more industrial strength Twisted library.
There is no need to use HTTP (indeed, HTTP is not good for RPC in general in some respects), and no need to use a standards-based protocol if you're talking about a python client talking to a python server.
Use a Python-specific RPC library such as Pyro, or what Twisted provides (twisted.spread).
XMLRPC is very simple to get started with, and at my previous job, we used it extensively for intra-node communication in a distributed system. As long as you keep track of the fact that the None value can't be easily transferred, it's dead easy to work with, and included in Python's standard library.
Run it over https and add a username/password parameter to all calls, and you'll have simple security in place. Not sure about how easy it is to verify server certificate in Python, though.
However, if you are transferring large amounts of data, the coding into XML might become a bottleneck, so using a REST-inspired architecture over https may be as good as xmlrpclib.
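The point about None is easy to demonstrate with the standard library. The sketch below uses the Python 3 module names (xmlrpc.client and xmlrpc.server; in Python 2 they are xmlrpclib and SimpleXMLRPCServer), and `lookup` and `call_lookup` are hypothetical names for the example.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


def lookup(key):
    """Hypothetical remote function: returns None for unknown keys."""
    return {"answer": 42}.get(key)


def call_lookup(key):
    """Expose lookup() on a free local port and call it once through a proxy."""
    # allow_none=True enables the non-standard <nil/> extension on the server side.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), allow_none=True, logRequests=False)
    server.register_function(lookup)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        proxy = xmlrpc.client.ServerProxy(f"http://127.0.0.1:{port}/", allow_none=True)
        return proxy.lookup(key)
    finally:
        server.shutdown()
        server.server_close()


def none_is_rejected_by_default():
    """Without allow_none, marshalling None raises TypeError."""
    try:
        xmlrpc.client.dumps((None,))
    except TypeError:
        return True
    return False
```

With `allow_none=True` on the server, `call_lookup("missing")` comes back as None. Without it, a function that returns None produces a fault on the client side, which is the pitfall to keep track of.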
Facebook's Thrift project may be a good answer. It uses a lightweight protocol to pass objects around and allows you to use any language you wish. It may fall down on security, though, as I believe there is none.
In the RPC field, JSON-RPC will bring a big performance improvement over XML-RPC:
http://json-rpc.org/wiki/python-json-rpc