The socket module in Python wraps the _socket module, which is the C implementation. socket.socket also accepts a _sock parameter, which must implement the _socket interface. In some respects _sock must be an actual instance of the underlying socket type from _socket, since the C code does type checking (unlike pure Python).
Given that you can pass in a socket-like object for _sock, it seems like you could write a socket emulator to pass in to socket.socket. It would need to emulate the underlying socket behavior, but in memory rather than over a real network. It should not otherwise be distinguishable from _socket.
What would it take to build out this sort of emulation?
I know, I know, this is not terribly practical. In fact, I learned the hard way that using regular sockets was easier and that fake sockets were unnecessary; I had thought I would have better control of a test environment with fake sockets. Regardless of the time I "wasted", I learned a bunch about sockets and about Python in the process.
I was guessing any solution would have to be a stack of interacting objects like this:
something_that_uses_sockets (like XMLRPCTransport for ServerProxy)
|
V
socket.socket
|
V
FakeSocket
|
V
FakeNetwork
|
V
FakeSocket ("remote")
|
V
socket.socket
|
V
something_else_that_uses_sockets (like SimpleXMLRPCServer)
It seems like this is basically what goes on with real sockets, except over a real network instead of a fake one (plus OS-level sockets), and with _socket instead of FakeSocket.
Anyway, just for fun, any ideas on how to approach this?
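To make the idea concrete, here is a very incomplete sketch of the sort of duck-typed interface I imagine FakeSocket would need. Method names follow the real socket object; makefile, sendall, timeouts and error semantics are all missing, and the "network" is just a buffer shared with a peer:

class FakeSocket(object):
    """A very incomplete in-memory stand-in for _socket.socket."""
    def __init__(self, peer=None):
        self.inbox = []        # bytes "delivered" by the fake network
        self.peer = peer       # the "remote" FakeSocket, if any

    def connect(self, address):
        pass                   # no real network: nothing to do

    def send(self, data):
        self.peer.inbox.append(data)
        return len(data)

    def recv(self, bufsize):
        return self.inbox.pop(0)[:bufsize] if self.inbox else ''

    def close(self):
        self.peer = None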
Incidentally, with a FakeSocket you could do some socket-like stuff in Google Apps...
It's already been done. Twisted uses this extensively for unit tests of its protocol implementations. A good starting place would be looking at some of Twisted's unit tests.
In essence, you'd just call makeConnection on your protocol with a transport that isn't connected to a real socket. Super easy!
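For example, a minimal sketch of that approach (Echo is just a placeholder protocol here, and StringTransport lives in twisted.test.proto_helpers in the Twisted versions I'm familiar with):

from twisted.internet.protocol import Protocol
from twisted.test.proto_helpers import StringTransport

class Echo(Protocol):
    def dataReceived(self, data):
        self.transport.write(data)

proto = Echo()
proto.makeConnection(StringTransport())   # in-memory transport, no socket
proto.dataReceived(b'hello')
print(proto.transport.value())            # everything the protocol "sent"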
Related
Is there a way to use Python's socket module to replicate some of the functionality of the Linux ss utility?
ss is used to dump socket statistics. It shows information similar to netstat, and it can display more TCP and state information than other tools.
Pretty much all of the socket documentation revolves around creating new sockets, but I can't find any information on getting statistics from the system's sockets.
I don't see how such functionality could be implemented purely with the socket module. The socket module is for working directly with sockets: opening/closing, sending/receiving, etc. It's simply a thin wrapper over the standard BSD socket interface.
On the other hand, getting metadata about existing sockets already allocated on the system requires knowledge of the other processes running on a system. This has little to do with actually manipulating a socket, and much more to do with monitoring other processes and/or their file descriptors.
For example, it seems that both ss and netstat are actually implemented (at least on Linux) by reading and parsing the /proc pseudo-filesystem. (See here and here, for examples.) The kernel manages the processes and their opened sockets, and exposes (some of) the information about them to other processes via procfs. This provides a simple and safe way of exporting some of the information about processes to userspace, obviating the need for lots of system calls or reading kernel data structures directly.
Note that it pretty much has to work this way. Strong process isolation necessitates that information about another process's open files, including sockets, has to come through the kernel in some way. That could be either via procfs on Linux, or some kernel-provided API (e.g. libproc on macOS). Anything else would be a massive security hole.
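For illustration, here is a minimal sketch of that procfs approach (Linux only; the real tools also handle IPv6, UDP, socket states, and owning processes):

def local_tcp_endpoints(path='/proc/net/tcp'):
    endpoints = []
    with open(path) as f:
        next(f)                                       # skip the header row
        for line in f:
            fields = line.split()
            hex_ip, hex_port = fields[1].split(':')   # e.g. "0100007F:1F90"
            # the IPv4 address is stored as little-endian hex
            octets = [str(int(hex_ip[i:i + 2], 16)) for i in (6, 4, 2, 0)]
            endpoints.append(('.'.join(octets), int(hex_port, 16)))
    return endpoints

print(local_tcp_endpoints())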
As an alternative to the socket module, you could try the psutil package or something similar. The psutil.net_connections() function seems appropriate.
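For example, a rough sketch of listing listening TCP sockets with psutil, roughly what ss -lt reports (psutil is third-party, and on some platforms you need elevated privileges to see other users' processes):

import psutil

for conn in psutil.net_connections(kind='tcp'):
    if conn.status == psutil.CONN_LISTEN:
        print(conn.laddr.ip, conn.laddr.port, conn.pid)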
What does this line mean?
s=socket.socket(socket.AF_INET, socket.SOCK_STREAM)
What does this syntax mean: socket.socket() and socket.AF_INET?
Can't we just use AF_INET and SOCK_STREAM as parameters?
import socket  # for socket
import sys

try:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print "Socket successfully created"
except socket.error as err:
    print "socket creation failed with error %s" % (err)

# default port for HTTP
port = 80

try:
    host_ip = socket.gethostbyname('www.google.com')
except socket.gaierror:
    # this means the host name could not be resolved
    print "there was an error resolving the host"
    sys.exit()

# connecting to the server
s.connect((host_ip, port))
print "the socket has successfully connected to google (%s) on port %s" % (host_ip, port)
Object names reside in a namespace - a space containing names. A module (for example, socket) has its own namespace. The syntax is:
namespace.name
So socket.socket means the socket name (which happens to be a function) from the socket module - the module name comes first, then the function name. If we omit the namespace then the current one is assumed, which in a simple single-file program is called __main__.
We can arrange it so we import names into our own namespace and don't need to specify the module name, which is what you asked for:
from socket import *
but that's dangerous for a couple of reasons and is called namespace pollution.
One reason is that we can't easily determine where something comes from - the code you show is quite short and not typical.
The other reason is namespace collisions. What if two modules both happen to use the same name, for example closedown? The last one defined is the one that will be used - there will be no warning that one has masked the other, because Python is designed to be dynamic.
So we know that socket.socket comes from the socket module, and not from some module describing car tools or one concerning electrical circuits. If we want we can use all three in the same program, but we must specify the namespace first.
Unfortunately you will see from module import * quite a lot, because people are lazy. You can get away with it in a small program, but you would be taking a risk - over time programs only ever get bigger and more complicated, never smaller and simpler.
There are other ways to use import: you can restrict importing only certain names and you can create aliases, but you should learn more about programming before using them. They have their uses but when they are appropriate is a judgement decision.
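For completeness, those two forms look like this (illustrative names only):

# Import only specific names into our own namespace...
from socket import AF_INET, SOCK_STREAM

# ...or create an alias for the module name.
import socket as sock

s = sock.socket(AF_INET, SOCK_STREAM)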
You have imported the socket module, so everything from that module that you use will have "socket." in front of it.
So socket.socket() means run the socket() function from the socket module.
You have to write socket.AF_INET because AF_INET is also from the socket module, so this means get the AF_INET constant from the socket module. Similar logic applies for socket.SOCK_STREAM.
For more on sockets: https://docs.python.org/2/library/socket.html
Also, in terms of learning to code in general, copying code and then trying to understand it can work, but it is much more powerful to try to understand the underlying concepts and then write your own code.
Many of the Python standard libraries are fairly thin wrappers around the underlying system libraries. They expose many of the idiosyncrasies of the underlying OS facilities, and you have to be familiar with the underlying system to properly understand their semantics.
If you really want to understand sockets, there are many excellent introductions to the topic. Most of them will require some familiarity with C, which may be a bit of a distraction (but understanding the basics of C is probably also a good investment of your time if you expect to be spending more of it reading and writing code).
You could very well create a more pythonic replacement for the Python socket module with proper encapsulation of the underlying facilities. It is unclear whether it would serve any useful purpose, though. Most trivial uses of sockets get by with a small number of slightly opaque but common enough pieces of "copy/paste programming" that most readers will understand roughly what's going on in the code; others are involved enough that they do require full access to, and understanding of, the underlying facility.
https://docs.python.org/3.10/library/socket.html
class socket.socket(family=AF_INET, type=SOCK_STREAM, proto=0, fileno=None)
AF_INET is a constant that represents the address family (i.e. IPv4).
SOCK_STREAM is a constant that represents the socket type (i.e. TCP).
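Since those are the default values in the signature above, the call can in fact be shortened:

import socket

# These two calls are equivalent, because AF_INET and SOCK_STREAM
# are the defaults for family and type.
s1 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s2 = socket.socket()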
Using the following code, it seems I can fairly easily reconstruct a socket in a child process using multiprocessing.reduction:
import socket, os
import multiprocessing
from multiprocessing.reduction import reduce_handle, rebuild_handle

client = socket.socket()
client.connect(('google.com', 80))
rd = reduce_handle(client.fileno())
print "Parent: %s" % (os.getpid())

def test(x):
    print "Child: %s" % (os.getpid())
    build = rebuild_handle(x)
    rc = socket.fromfd(build, socket.AF_INET, socket.SOCK_STREAM)
    rc.send('GET / HTTP/1.1\n\n')
    print rc.recv(1024)

p = multiprocessing.Process(target=test, args=(rd,))
p.start()
p.join()
I have a Twisted game server that runs multiple matches at the same time. These matches may contain several players, each of whom has a Protocol instance. What I'd like to do is have matches split across a pool of Twisted subprocesses, and have the pools handle the clients of the matches they're processing themselves. It seems like reading/writing the client's data and passing that data to and from the subprocesses would be unnecessary overhead.
The Protocols are guaranteed to be TCP instances so I believe I can (like the above code) reduce the socket like this:
rd = reduce_handle(myclient.transport.fileno())
After passing that data to a subprocess, it seems (from looking at the Twisted source) that I can reconstruct it there like this:
import socket
from twisted.internet import reactor, tcp
from multiprocessing.reduction import reduce_handle, rebuild_handle
handle = rebuild_handle(rd)
sock = socket.fromfd(handle, socket.AF_INET, socket.SOCK_STREAM)
protocol = MyProtocol(...)
transport = tcp.Connection(sock, protocol, reactor=reactor)
protocol.transport = transport
I would just try this, but seeing as I'm not super familiar with the Twisted internals even if this works I don't really know what the implications might be.
Can anyone tell me whether this looks right and whether it would work? Is this inadvisable for some reason (I've never seen it mentioned in Twisted documentation or posts, even though it seems quite relevant)? If it works, is there anything I should be wary of?
Thanks in advance.
Twisted and the multiprocessing module are incompatible with each other. If the code appears to work, it's only by luck and accident, and a future version of either (there may well be no future versions of multiprocessing, but there will probably be future versions of Twisted) might turn this good luck into bad luck.
twisted.internet.tcp also isn't a great module to use in your applications. It's not exactly private but you also can't rely on it always working with the reactor your application uses, either. For example, iocp reactor uses twisted.internet.iocpreactor.tcp instead and will not work at all with twisted.internet.tcp (I don't expect it's very likely you'll be using iocp reactor with this code and the rest of the reactors Twisted ships with do use twisted.internet.tcp but third-party reactors may not and future versions of Twisted may change how the reactors are implemented).
There are two parts of the problem you're solving. One part is conveying the file descriptor between two processes. The other part is convincing the reactor to start monitoring the file descriptor and dispatching its events.
It's possible the risk of using multiprocessing.reduction with Twisted is minimal because there doesn't seem to be anything to do with process management in that module. Instead, it's just about pickling sockets. So you may be able to continue to convey your file descriptors using that method (and you might want to do this if you wanted to avoid using Twisted in the parent process for some reason - I'm not sure, but it doesn't sound like this is the case). However, an alternative to this is to use twisted.python.sendmsg to pass these descriptors over a UNIX socket - or better yet, to use a higher-level layer that handles the fiddly sendmsg bits for you: twisted.protocols.amp. AMP supports an argument type that is a file descriptor, letting you pass file descriptors between processes (again, only over a UNIX socket) just like you'd pass any other Python object.
As for the second part, you can add an already-established TCP connection to the reactor using reactor.adoptStreamConnection. This is a public interface that you can rely on (as long as the reactor actually implements it - which not all reactors do: you can introspect the reactor using twisted.internet.interfaces.IReactorSocket.providedBy(reactor) if you want to do some kind of graceful degradation or user-friendly error reporting).
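A minimal sketch of that, assuming fd is a file descriptor you received from the parent process and MyFactory is a placeholder for your own protocol factory:

import socket
from twisted.internet import reactor
from twisted.internet.interfaces import IReactorSocket

if IReactorSocket.providedBy(reactor):
    # hand the already-established connection to the reactor
    reactor.adoptStreamConnection(fd, socket.AF_INET, MyFactory())
else:
    raise RuntimeError("this reactor cannot adopt existing connections")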
What library should I use for network programming? Is sockets the best, or is there a higher level interface, that is standard?
I need something that will be pretty cross platform (ie. Linux, Windows, Mac OS X), and it only needs to be able to connect to other Python programs using the same library.
You just want to send Python data between nodes (possibly on separate computers)? You might want to look at SimpleXMLRPCServer. It's based on the built-in HTTP server, which is based on the built-in socket server; neither is the most industrial-strength server around, but it's easy to set up in a hurry:
from SimpleXMLRPCServer import SimpleXMLRPCServer

server = SimpleXMLRPCServer(("localhost", 9876))

def my_func(a, b):
    return a + b

server.register_function(my_func)
server.serve_forever()
And easy to connect to:
import xmlrpclib
s = xmlrpclib.ServerProxy('http://localhost:9876')
print s.my_func(2,3)
>>> 5
print type(s.my_func(2,3))
>>> <type 'int'>
print s.my_func(2,3.0)
>>> 7.0
Twisted is popular for industrial applications, but it's got a brutal learning curve.
There is a framework that you may be interested in: Twisted
The answer depends on what you are trying to do.
"What library should I use for network programming?" is pretty vague.
For example, if you want to do HTTP, you might look at standard libraries such as urllib, urllib2, httplib, or socket. It all depends on which protocol you are looking to use and which network layer you want to work at.
There are libraries in Python for various network tasks: email, web, RPC, and so on.
For starters, look over the standard library reference manual, see which tasks you want to do, and go from there: http://docs.python.org/library/index.html
As previously mentioned, Twisted is the most popular (by far). However, there are a lot of other alternatives worth exploring. Tornado and Diesel are probably the top two contenders. A more complete comparison is found here.
Personally I just use asyncore from the standard library, which is a bit like a very cut-down version of Twisted, but this is because I prefer a simple and low level interface. If you want a higher level interface, especially just to communicate with another instance of your own program, you don't necessarily have to worry about the networking layer, and can consider something higher level like RPyC or pyro instead. The network then becomes an implementation detail and you can concentrate on just sending the information.
A lot of people like Twisted. I was a huge fan for a while, but after working with it a bit and thinking about it more I've become disenchanted. It's complex, and the last time I looked, a lot of it assumed that your program would always be able to send data. This can lead to situations in which your program's memory usage grows endlessly, buffering data to send that isn't being picked up by the remote side, or isn't being picked up fast enough.
In my opinion, it depends a lot on what kind of network programming you want to do. A lot of times you don't really care about getting stuff done while you're waiting for IO. HTTP, for example, is very request-response oriented, and if you're only talking to a single server there is little reason to need something like Twisted and plain sockets or Python's built-in HTTP libraries will work fine.
If you're writing a server of any kind, you almost certainly need to be event-driven. Twisted has a slight edge there, but it still seems overly complex to me. BitTorrent, for example, was written in Python and doesn't use Twisted at all.
Another factor favoring Twisted is that there is code for a lot of protocols already written for it. So if you want to speak an existing protocol a lot of hard work may already have been done for you.
The socket module in the standard library is, in my opinion, a good choice if you don't need high performance.
It is a very famous API, known to developers of almost every language. It's quite simple, and there is a lot of information available on the internet. Moreover, it will be easier for other people to understand your code.
I guess that an event-driven framework like Twisted has better performance, but in basic cases standard sockets are enough.
Of course, if you use a higher-level protocol (HTTP, FTP...), you should use the corresponding implementation in the Python standard library.
Socket is a low-level API; it maps directly onto the operating system interface.
Twisted, Tornado and the like are high-level frameworks (built on sockets, of course, since sockets are the low-level primitive).
When it comes to TCP/IP programming, you should have some basic knowledge to decide what you should use:
Will you use a well-known protocol like HTTP or FTP, or create your own protocol?
Blocking or non-blocking? Twisted and Tornado are non-blocking frameworks (basically like Node.js).
Of course, sockets can do everything, because every other framework is built on them ;)
Can someone please tell me how to write non-blocking server code using the socket library alone? Thanks.
Frankly, just don't (unless it's for an exercise). The Twisted framework will do everything network-related for you, so you only have to write your protocol, without caring about the transport layer. Writing socket code is not easy, so why not use code somebody else wrote and tested?
Why socket alone? It's so much simpler to use another standard library module, asyncore -- and if you can't, at the very least select!
If you're constrained by your homework's conditions to only use socket, then I hope you can at least add threading (or multiprocessing), otherwise you're seriously out of luck - you can make sockets with timeouts, but juggling timing-out sockets without the needed help from any of the other obvious standard library modules (to support either async or threaded serving) is a serious mess indeed-y... ;-)
Not sure what you mean by "socket library alone" - you surely will need other modules from the standard Python library.
The lowest level of non-blocking code is the select module. This allows you to have many simultaneous client connections, and it reports which of them have input pending to process. So you select on both the server (accept) socket and any client connections that you have already accepted. A thin layer on top of that is the asyncore module.
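For illustration, a minimal sketch of that select-based approach, an echo server using only socket and select (no error handling, and note that sendall on a non-blocking socket can fail if the send buffer fills up):

import select
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(('localhost', 9876))
server.listen(5)
server.setblocking(False)

sockets = [server]
while True:
    # block until at least one socket has input pending
    readable, _, _ = select.select(sockets, [], [])
    for s in readable:
        if s is server:
            conn, _ = s.accept()         # new client connection
            conn.setblocking(False)
            sockets.append(conn)
        else:
            data = s.recv(1024)
            if data:
                s.sendall(data)          # echo it back
            else:                        # empty read: client closed
                sockets.remove(s)
                s.close()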
Use eventlet or gevent. They monkey-patch existing libraries, so the socket module can be used without any changes. Though the code appears synchronous, it executes asynchronously.
Example:
http://eventlet.net/doc/examples.html#socket-connect
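Along the same lines, a small sketch of the monkey-patching approach (the hostnames are arbitrary examples, and eventlet is a third-party package):

import eventlet
eventlet.monkey_patch()        # patch socket et al. to cooperate with eventlet

import socket                  # the standard module, now "green" under the hood

def fetch(host):
    s = socket.socket()
    s.connect((host, 80))
    s.sendall(b'HEAD / HTTP/1.0\r\nHost: ' + host.encode() + b'\r\n\r\n')
    return host, s.recv(64)

# run the fetches concurrently in green threads
pool = eventlet.GreenPool()
for host, reply in pool.imap(fetch, ['www.google.com', 'www.python.org']):
    print(host, reply.split(b'\r\n')[0])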