C/Python Socket Performance?

My question simply concerns the difference in performance between a socket in C and one in Python. Since my Python build is CPython, I assume it's similar, but I'm curious whether someone actually has "real" benchmarks, or at least an evidence-based opinion.
My logic is as follows:
- C sockets are much faster? Then write a C extension.
- No (or barely any) difference? Then keep writing in Python and figure out how to obtain packet-level control (scapy? dpkt?)
I'm sure someone will want to know, for either context or curiosity. I plan to build a sort of proxy for myself (not for internet browsing, anonymity, etc.) and will bind the application I want to use to a specific port. Then all packets on that port will be queued, have their address headers modified, and be sent on, and so on.
Thanks in advance.

In general, sockets in Python perform just fine. For example, the reference implementation of the BitTorrent tracker server is written in Python.
When doing networking operations, the speed of the network is usually the limiting factor. That is, any possible tiny difference in speed between C and Python's socket code is completely overshadowed by the fact that you're doing networking of some kind.
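As a rough sanity check of that overhead claim, you can time round-trips over a `socket.socketpair()`. No real network is involved, so the measurement is dominated by Python-level socket cost; the function name and numbers here are purely illustrative, not a real benchmark:

```python
import socket
import time

def loopback_roundtrips(n=1000, payload=b"x" * 1024):
    # socketpair() gives two connected sockets with no network in between,
    # so the elapsed time is mostly Python's socket-layer overhead
    a, b = socket.socketpair()
    start = time.perf_counter()
    for _ in range(n):
        a.sendall(payload)
        chunk = b.recv(len(payload))  # small payloads arrive whole in practice
    elapsed = time.perf_counter() - start
    a.close()
    b.close()
    return elapsed / n  # average seconds per round-trip
```

On a typical machine this comes out to somewhere in the microseconds per round-trip, orders of magnitude below the latency of any real network hop, which is the point being made above.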
However, your description of what you want to do indicates that you want to inspect and modify individual IP packets. This is beyond the capabilities of Python's standard networking libraries, and is in any case a very OS-dependent operation. Rather than asking "which is faster?" you will need to first ask "is this possible?"

I would think C would be faster, but Python would be a lot easier to manage and use.
The difference would be so small that you wouldn't need it unless you were trying to send massive amounts of data (something extreme, like a million GB per second).
joe

Related

Cleanest method of inter-program communication on separate devices

I need to settle on a method of communication between multiple programs spread out over multiple machines. The data itself is fairly straightforward, comprising variable-length vectors with some metadata descriptors.
Each program "type" will send data to one other program type, and expects to see a reply from it.
The number of connections a given program has is variable over time, and programs can be added or removed at any moment.
Programs may be distributed across several processors, which may use different operating systems, or could be microprocessors.
I have not really had to contend with such an inter-communication puzzle before and am unsure of what the cleanest approach would be. I've heard that ROS might be useful, but that it may not be suitable for Windows environments and has a steep learning curve. I can imagine creating a database between the nodes to act as a kind of blackboard, but this feels like it may be very inefficient. If sockets would address the efficiency concerns, how would the connections be managed and maintained?
My main development environment at the moment is Julia, but I can port the data over to C++ or Python without too much discomfort.
I am open to ideas and learning new things, but I can't figure out which path to take. Any nuggets of wisdom would be greatly appreciated.
As requested in a comment, here is some additional information:
Data: The data is a vector of floating-point numbers, which can vary in size anywhere from one value to thousands of values (maybe more depending on other aspects of the project, but I'm capping it for prototyping purposes). The metadata is in JSON format and provides information such as the total length of the array and a few identifiers (all integers).
Reliability: I can handle a few missed messages here and there. I'd need the data in the order it was sent though. So if a packet of data arrives after a more recent one it will be ignored. (In case you were instead referring to data integrity, then the data must not be corrupted in transit).
Latency: As little as possible, as the program could otherwise start to introduce errors. That being said, for prototyping purposes I can persevere with tens of milliseconds on average, and occasional spikes into the hundreds of milliseconds. I would very much like to keep this down, though.
Standard IP Facilities: You may assume all devices have these, yes.
Perhaps take a look at something well established (and therefore able to hook up to whichever language(s) you end up using) like Redis or RabbitMQ, seeing as you'll need to communicate with a number of different devices over a network. Either way, having some form of central hub that handles communication among the other devices would be advisable.
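Whichever transport or broker you pick, you will need to serialize the vector plus its JSON metadata into discrete messages. A minimal sketch of one possible wire format (a length-prefixed JSON header followed by packed doubles; all names here are hypothetical, not from any particular library):

```python
import json
import struct

def pack_message(values, meta):
    # 4-byte big-endian header length, then the JSON header, then the
    # vector packed as IEEE-754 doubles in network byte order
    header = json.dumps(dict(meta, length=len(values))).encode("utf-8")
    body = struct.pack(f"!{len(values)}d", *values)
    return struct.pack("!I", len(header)) + header + body

def unpack_message(blob):
    (hlen,) = struct.unpack_from("!I", blob, 0)
    meta = json.loads(blob[4:4 + hlen].decode("utf-8"))
    values = list(struct.unpack_from(f"!{meta['length']}d", blob, 4 + hlen))
    return values, meta
```

Adding a monotonically increasing sequence number to the header would make it easy to discard out-of-order packets, which matches the ordering requirement described above.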

twisted python request/response message and substantial binary data transfer

I am trying to implement a server using python-twisted with potential C# and ObjC clients. I started with LineReceiver and that works well for basic messaging, but I can't figure out the best approach for something more robust. Any ideas for a simple solution for the following requirements?
Request and response
ex. send message to get a status, receive status back
Receive binary data transfers (non-trivial, but not massive: less than a few MB)
ex. bytes of a small png file
AMP seems like a feasible solution for the first scenario, but may not be able to handle the size for the data transfer scenario.
I've also looked at full blown SOAP but haven't found a decent enough example to get me going.
I like AMP a lot. twisted.protocols.amp is moderately featureful and relatively easily testable (although documentation on how to test applications written with it is a little lacking).
The command/response abstraction AMP provides is comfortable and familiar (after all, we live in a world where HTTP won). AMP avoids the trap of excessive complexity (seemingly for the sake of complexity) that SOAP fell squarely into. But it's not so simple you won't be able to do the job with it (like LineReceiver most likely is).
There are intermediate steps - for example, twisted.protocols.basic.Int32StringReceiver gives you a more sophisticated framing mechanism (32-bit length prefixes instead of magic-bytes-terminated lines) - but in my opinion AMP is a really good first choice for a protocol. You may find you want to switch to something else later (one size really does not fit all), but AMP is at the sweet spot between features and simplicity that seems like a good fit for a very broad range of applications.
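That 32-bit length-prefix framing is easy to sketch in plain Python, if only to see what "length-prefixed instead of line-terminated" means. This is illustrative code, not Twisted's actual implementation:

```python
import struct

def frame(payload: bytes) -> bytes:
    # a 32-bit big-endian length prefix, then the raw payload
    return struct.pack("!I", len(payload)) + payload

def deframe(buffer: bytes):
    # split a byte stream into complete messages; return any leftover bytes
    messages = []
    while len(buffer) >= 4:
        (size,) = struct.unpack_from("!I", buffer, 0)
        if len(buffer) < 4 + size:
            break  # wait for more data to arrive
        messages.append(buffer[4:4 + size])
        buffer = buffer[4 + size:]
    return messages, buffer
```

Unlike line-delimited framing, this is binary-safe, so the bytes of a small PNG pass through untouched.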
It's true that there are some built-in length limits in AMP. This is a long standing sore spot that is just waiting for someone with a real-world need to address it. :) There is a fairly well thought-out design for lifting this limit (without breaking protocol compatibility!). If AMP seems otherwise appealing to you then I encourage you to engage the Twisted development community to find out how you can help make this a reality. ;)
There's also always the option of using AMP for messaging and setting up another channel (e.g., HTTP) for transferring your larger chunks of data.

Why is there a need for Twisted?

I have been playing around with the Twisted framework for about a week now (more out of curiosity than having to use it), and it's been a lot of fun doing event-driven asynchronous network programming.
However, there is something that I fail to understand. The twisted documentation starts off with
Twisted is a framework designed to be very flexible and let you write powerful servers.
My question is: why do we need such an event-driven library to write powerful servers when there are already very efficient implementations of various servers out there?
Surely there must have been more than a couple of concrete implementations the Twisted developers had in mind while writing this event-driven I/O library. What are those? Why exactly was Twisted made?
In a comment on another answer, you say "Every library is supposed to have ...". "Supposed" by whom? Having use-cases is certainly a nice way to nail down your requirements, but it's not the only way. It also doesn't make sense to talk about the use-cases for all of Twisted at once. There is no use case that justifies every single API in Twisted. There are hundreds or thousands of different use cases, each which justifies a lesser or greater subdivision of Twisted. These came and went over the years of Twisted's development, and no attempt has been made to keep a list of them. I can say that I worked on part of Twisted Names so that I would have a topic for a paper I was presenting at the time. I implemented the vt102 parser in Twisted Conch because I am obsessed with terminals and wanted a fun project involving them. And I implemented the IMAP4 support in Twisted Mail because I worked at a company developing a mail server which required tighter control over the mail store than any other IMAP4 server at the time offered.
So, as you can see, different parts of Twisted were written for widely differing reasons (and I've only given examples of my own reasons, not the reasons of any other developers).
The initial reason for a program being written often doesn't matter much in the long run though. Now the code is written: Twisted Names now runs the DNS for many domain names on the internet, the vt102 parser helped me get a job, and the company that drove the IMAP4 development is out of business. What really matters is what useful things you can do with the code now. As MattH points out, the resulting plethora of functionality has resulted in a library that (perhaps uniquely) addresses a wide array of interesting problems.
Why do we need such an event-driven library to write powerful servers when there are already very efficient implementations of various servers out there?
So, paraphrasing: you can't imagine why anyone would need a toolkit when die-cast products already exist?
I'm guessing you've never needed to knock up a protocol gateway, e.g.:
- write a daemon that MD5s local files on demand over a Unix socket
- interrogate a piece of software using UDP and expose statistics over HTTP
I wrote a little proof-of-concept for the second example for a question here on SO in a handful of minutes. I couldn't do that without Twisted.
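Even without Twisted, the first example is only a few lines in blocking stdlib Python, which gives a feel for the kind of task being described. A rough single-connection sketch (Unix-only, names hypothetical, no error handling):

```python
import hashlib
import socket

def serve_md5_once(sock_path):
    # accept one connection on a Unix socket, read a file path,
    # reply with the MD5 hex digest of that file's contents
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(sock_path)
    server.listen(1)
    conn, _ = server.accept()
    with conn:
        path = conn.recv(4096).decode("utf-8").strip()
        with open(path, "rb") as f:
            digest = hashlib.md5(f.read()).hexdigest()
        conn.sendall(digest.encode("ascii"))
    server.close()
```

A Twisted version of the same idea would replace the blocking accept/recv loop with a protocol class driven by the reactor, which is what makes serving many clients at once straightforward.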
Have you looked at: ProjectsUsingTwisted?
More on 'why' (disclaimer: I'm not a developer of Twisted proper): it's worth considering Twisted's age relative to the rest of the Python ecosystem. When Twisted was written, there was no sufficiently powerful non-blocking, event-driven networking library built around the reactor pattern (almost everyone was using threads back then). Twisted's initial use case was a large multiplayer game, although the specifics of that game seem to be somewhat lost in time.
Since those origins, as MattH's link suggests, a very large number of network servers written in Python have been based on Twisted.
This PyCon talk by the creator of Twisted should give you answers.
It has changed my opinion of Twisted. Before, I viewed it as a massive piece of software with interfaces and weird names, two things that many developers dislike but that are really just superficial, and now that I've seen the history behind it and the amazing number of use cases, I respect it a lot. Life is short; you need Twisted :)

Suggestion Needed - Networking in Python - A good idea?

I am considering programming the network related features of my application in Python instead of the C/C++ API. The intended use of networking is to pass text messages between two instances of my application, similar to a game passing player positions as often as possible over the network.
Although the Python socket module seems sufficient and mature, I want to check whether there are limitations in it that could become a problem at a later stage of development.
What do you think of the Python socket module?
Is it reliable and fast enough for production-quality software?
Are there any known limitations that could become a problem if my app needs more complex networking than regular client-server messaging?
Thanks in advance,
Paul
Check out Twisted, a Python networking engine. It has built-in support for TCP, UDP, SSL/TLS, multicast, Unix sockets, and a large number of protocols (including HTTP, NNTP, IMAP, SSH, IRC, FTP, and others).
Python is a mature language that can do almost anything that you can do in C/C++ (even direct memory access if you really want to hurt yourself).
You'll find that you can write beautiful code in it in a very short time, that this code is readable from the start and that it will stay readable (you will still know what it does even after returning one year later).
The drawback of Python is that your code will be somewhat slow. "Somewhat" as in "might be too slow for certain cases". So the usual approach is to write as much as possible in Python because it will make your app maintainable. Eventually, you might run into speed issues. That would be the time to consider to rewrite a part of your app in C.
The main advantages of this approach are:
You already have a running application. Translating the code from Python to C is much simpler than writing it from scratch.
You already have a running application. After translating a small part from Python to C, you only have to test that small part, and you can use the rest of the app (which didn't change) to do it.
You don't pay a price upfront. If Python is fast enough for you, you'll never have to do the optional optimization.
Python is much, much more expressive than C. A single line of Python can often do as much as tens or even hundreds of lines of C.
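For the position-passing use case in the question, a fixed binary layout keeps the Python side both simple and cheap. A minimal sketch over plain UDP (the field layout here is made up for illustration):

```python
import socket
import struct

# player id plus x, y coordinates, all in network byte order
POSITION = struct.Struct("!Iff")

def send_position(sock, addr, player_id, x, y):
    sock.sendto(POSITION.pack(player_id, x, y), addr)

def recv_position(sock):
    data, _ = sock.recvfrom(POSITION.size)
    return POSITION.unpack(data)  # (player_id, x, y)
```

Twelve bytes per update is small enough to send "as often as possible" without serialization ever becoming the bottleneck, which is exactly the situation where staying in Python is comfortable.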
To answer #1, I know that among other things, EVE Online (the MMO) uses a variant of Python for their server code.
The Python that EVE Online uses is Stackless Python (http://www.stackless.com/), and as far as I understand they use it for how it implements threading, through tasklets and the like. But since Python itself can handle something like an MMO with 40k people online, I think it can handle almost anything.
This is not really an answer to your question, rather an addition to the previous answer.
Alan.

What is the feasibility of porting a legacy C program to Python?

I have a program in C that communicates via UDP with another program (in Java) and then does process manipulation (start/stop) based on the UDP packet exchange.
This C program is now legacy, and I want to convert it to Python. Do you think Python would be a good choice for the tasks mentioned?
Yes, I do think that Python would be a good replacement. I understand that the Twisted Python framework is quite popular.
I'd say that if:
Your C code contains no platform specific requirements
You are sure speed is not going to be an issue going from C to python
You have a desire to not compile anymore
You would like to try utilise exception handling
You want to dabble in OO
You might choose to run on many platforms without porting
You are curious about dynamic typing
You want memory handled for you
You know or want to learn python
Then sure, why not.
There doesn't seem to be any technical reason you shouldn't use python here, so it's a preference in this case.
Remember as well, you can leave parts of your program in C, turn them into Python modules and build python code around them - you don't need to re-write everything up-front.
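That last point deserves a sketch: the standard library's `ctypes` lets Python call into existing compiled C without writing an extension module at all. Here libc stands in for your legacy code; on non-Linux systems the library lookup may differ:

```python
import ctypes
import ctypes.util

# locate and load the C library; a legacy .so/.dll would load the same way
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

# declare the signature so ctypes converts arguments and results correctly
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int

print(libc.abs(-42))  # calls the C function directly; prints 42
```

This makes the incremental approach practical: keep the battle-tested C routines, and rewrite only the surrounding control flow in Python.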
Assuming that you have control over the environment in which this application will run, and that the performance difference between an interpreted language (Python) and a compiled one (C) can be ignored, I believe Python is a great choice for this.
If I was faced with a similar situation I'd ask myself a couple of questions:
Is there anything more important I could be working on?
Does Python bring anything to the table that is currently handled poorly by the current application?
Will this allow me to add functionality that was previously too difficult to implement?
Is this going to disrupt service in any way?
If I can't answer those satisfactorily, then I'd put off the rewrite.
Yes, I think Python is a good choice, if all your platforms support it. Since this is a network program, I'm assuming the network is your runtime bottleneck? That's likely to still be the case in Python. If you really do need to speed it up, you can include your long-since-debugged, speedy C as Python modules.
If this is an embedded program, then it might be a problem to port it since Python programs typically rely on the Python runtime and library, and those are fairly large. Especially when compared to a C program doing a well-defined task. Of course, it's likely you've already considered that aspect, but I wanted to mention it in the context of the question anyway, since I feel it's an important aspect when doing this type of comparison.
