Measuring packet per second for connected clients IP addresses


How can we determine the packet rate of clients connected to our server, in the case of a multi-client server using Winsock? The idea I came up with is to keep a frequency map keyed by client IP address and count each client's packets over some arbitrary window of k seconds. After k seconds we traverse the map, find the IP addresses that sent more than 100*k packets, and block them. Then we empty the map and start again.
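PSEUDO CODE: (k = 10)
// needs <map>, <string>, <thread>, <chrono>; locking around counts is omitted for brevity
const int K = 10;                        // window length in seconds
std::map<std::string, int> counts;       // packets seen per client IP in the current window
void calculate() {                       // runs on its own thread
    for (;;) {
        std::this_thread::sleep_for(std::chrono::seconds(K));
        for (auto &ip : counts) {
            if (ip.second > 100 * K) blacklist(ip.first);   // more than 100*k packets
        }
        counts.clear();                  // start a new window
    }
}
SOCKET s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
// bind(), listen(), accept() -> client.sock, client.ip
std::thread(calculate).detach();         // check the counts in the background
while (1) {
    if (recv(client.sock, buff, len, 0) > 0) counts[client.ip]++;
}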

Per comments:
If someone is sending too fast, I'd like to block him permanently rather than receiving his messages less frequently. Something like this is what I'm trying to achieve
If this was UDP, I'd 100% be onboard with what you are trying to do and give the code I have. But this is TCP and your assumptions are flawed.
Let's say the sender invokes this:
send(sock, buffer, 1000, 0);
And then on the other side, you invoke this:
recv(sock, buffer, 1000, 0)
Did you know that recv may do any of the following:
It may return any value less than or equal to 1000. It could return 1 and expect you to invoke it another 999 times to consume the entire message. One of the biggest confusions with TCP sockets is assuming that each send call mirrors a recv call in a 1:1 fashion. Lots of buggy apps have shipped that way.
More probably, you'll get 1 or 2 recv calls because of IP fragmentation and/or TCP segmentation. How fast you invoke recv also matters. But this is never guaranteed or expected to be consistent. What you observe with local testing on your own LAN will not resemble actual internet behavior.
How many recv calls you get has nothing to do with how many actual IP packets or TCP segments, because "the packets" will get coalesced anyway by the TCP stack on the recv side.
Similarly, how many bytes you pass to send doesn't influence the packet count. TCP, including any number of routers and gateways in between, may split up this 1000 byte stream into additional fragments and segments.
I'm going to offer two suggestions:
Detect flood attacks by counting application protocol messages and/or the size of these application protocol messages - but not individual recv calls. That is, as you recv data, you'll accumulate this data stream into logical protocol messages based on a fixed size of bytes or a delimiter based message structure and pass it up to a higher part of your application for processing. Do the incremental count then.
Instead of trying to thwart flood attacks at the message level, it's probably simpler to just throttle clients to a fixed rate of data. That is, each time you recv data, count how many bytes it returns and use that count with a timer to measure an incoming bytes/second rate. If the remote side exceeds your limit, insert sleep statements in between recv calls. This will implicitly make TCP slow the other side down from sending too fast.
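The throttling suggestion could be sketched roughly like this in Python. This is only an illustration: `RateMeter`, `limit_bps` and the injectable `clock` are made-up names, and the policy (average rate since the connection started) is an assumption rather than anything prescribed above.

```python
import time


class RateMeter:
    """Tracks the average incoming byte rate of one client (hypothetical helper)."""

    def __init__(self, limit_bps, clock=time.monotonic):
        self.limit_bps = limit_bps   # allowed bytes per second (assumed policy)
        self.clock = clock           # injectable so the logic can be tested
        self.start = clock()
        self.total = 0

    def record(self, nbytes):
        """Call with the number of bytes each successful recv() returned."""
        self.total += nbytes

    def delay_needed(self):
        """Seconds to sleep before the next recv() to stay under the limit."""
        elapsed = self.clock() - self.start
        earliest = self.total / self.limit_bps   # time at which `total` bytes are allowed
        return max(0.0, earliest - elapsed)
```

In a receive loop one would call `meter.record(len(data))` after each `recv` and then `time.sleep(meter.delay_needed())`. While the server sleeps, its receive buffer fills up and TCP flow control holds the sender back.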

Related

Reliable udp using tcp retransmit [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 3 years ago.
I am a client receiving UDP multicast data sent by a Sender server (Stock Exchange data). I continuously receive a UDP multicast packet flow, sequentially numbered from 1 to approximately 35,000,000, sent uniformly over a period of 6 hours. I need to ensure that all packets up to, say, N have been received before that set of N packets is processed, which happens periodically after every ~256 packets, i.e. I need reliable UDP.
Reliable UDP is mimicked using TCP retransmit. If any udp packet(s) is lost/not received, it is requested by using tcp protocol by specifying the desired missing packet range (starting number, ending number).
The Sender keeps a record of all the packets (stock exchange data) it has sent via UDP multicast so far, so it will resend by TCP only those packet numbers that the receiver specifically requests via TCP. This is how the receiver achieves UDP reliability. The UDP drop ratio is very small (less than 0.001%), except when joining the UDP multicast in the middle of the day, in which case all previously sent UDP packets from 1 to some N need to be resent over TCP while the live UDP multicast transmission from packet number N+1 onward is being received. I can't ask the Sender (Stock Exchange) to change its protocol; it is fixed.
What is the efficient algorithm to implement this in terms of CPU?
The issue is speed (big-O complexity). I can write a naive algorithm using several nested loops and methods, but it is not necessarily the best.
I am thinking of maintaining a number N which confirms I have received UDP packets 1 through N. Any packet number M that is not the next expected packet number N+1 will be buffered, for, say, 256 packets, and then TCP will be used to request the missing numbers. Normal UDP reception then resumes from the last confirmed number once the TCP request has been filled.
Example:
Suppose UDP packets received by receiver are in the following sequence {1,2,3,6,7,8,9,10 ...}
After packet No. 3, the next packet is No. 6. Packets 4 through 5 are missing.
So the missing packets {4,5} are requested using TCP request({4 through 5}), and {6,7,8,9,10} are buffered. There is enough space on the 10GBaseT LAN card for buffering 35,000,000 packets.
So: receive UDP {1,2,3}, refill by TCP request {4,5}, continue receive UDP {6,7,8,9,10, ...}
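The bookkeeping described above could look something like the following Python sketch. `GapTracker` and its method names are invented for illustration. It assumes packets are numbered from 1 and that the same method is fed both the live UDP packets and the TCP refills.

```python
class GapTracker:
    """Tracks the highest in-order packet number and buffers packets that arrive early."""

    def __init__(self):
        self.confirmed = 0   # packets 1..confirmed have all been received
        self.pending = {}    # seq -> payload for packets that arrived ahead of a gap

    def on_packet(self, seq, payload):
        """Feed one packet (from UDP or from a TCP refill).

        Returns (deliverable, missing): the payloads that are now deliverable
        in order, and the first inclusive (first, last) range still missing,
        or None if there is no gap.
        """
        if seq <= self.confirmed or seq in self.pending:
            return [], None                      # duplicate, ignore it
        self.pending[seq] = payload
        deliverable = []
        while self.confirmed + 1 in self.pending:
            self.confirmed += 1
            deliverable.append(self.pending.pop(self.confirmed))
        missing = None
        if self.pending:
            # min() is linear in the number of buffered packets; fine for a sketch
            missing = (self.confirmed + 1, min(self.pending) - 1)
        return deliverable, missing
```

With the sequence from the example, feeding 1, 2, 3 and then 6 reports the missing range (4, 5). A real receiver would also need to remember which ranges it has already requested, since this sketch reports the same gap on every later packet until it is filled.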

I assume since you are using multicast that there are going to be multiple receivers of this data? (Because if not, you'd probably be using unicast instead)
Therefore, if the receivers are going to have the option of requesting TCP retransmission of packets they didn't get, that means that the transmitting program will need to keep a copy of recently-sent UDP packets in memory, so that when it receives a retransmit-request, it will have the requested data available to retransmit. Assuming you're stamping each packet with a unique ID, it can store this data in a std::map or std::unordered_map or similar for quick lookup.
The real question becomes, how much of this old-packet data should the transmitter retain? Ideally it would retain all of it, because you never know how much a given receiver might have missed and might want to request; but that would require infinite memory so that's not a realistic option. Probably the best you can do is decide how much RAM you're willing to tie up for this purpose, and keep a count of the total number of bytes you have in your table, and when it reaches the limit, start dropping the oldest packets from the table in order to keep its size under the limit.
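A byte-limited table of this kind might be sketched as below, in Python rather than with std::map. `RetransmitBuffer` and `max_bytes` are illustrative names. The sketch assumes packets are stored in the order they were sent, so the oldest entry is always the first one.

```python
from collections import OrderedDict


class RetransmitBuffer:
    """Keeps recently sent packets, evicting the oldest once a byte limit is exceeded."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.total = 0
        self.packets = OrderedDict()   # seq -> payload, oldest first

    def store(self, seq, payload):
        old = self.packets.pop(seq, None)
        if old is not None:            # same ID stored twice: do not double-count it
            self.total -= len(old)
        self.packets[seq] = payload
        self.total += len(payload)
        while self.total > self.max_bytes and self.packets:
            _, evicted = self.packets.popitem(last=False)   # drop the oldest packet
            self.total -= len(evicted)

    def get(self, seq):
        """Return the payload for a retransmit request, or None if it was dropped."""
        return self.packets.get(seq)
```

A `None` result from `get` is the case the transmitter has to decide how to handle: the receiver asked for something that has already been evicted.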
I wrote an open-source library that uses essentially the technique you describe (multicast UDP + TCP-retransmit-to-recover-from-packet-loss) to synchronize databases across multiple hosts as quickly as possible; some things I learned while implementing it include:
If/when you can, pack your data-messages together into larger packets, up to what fits within the MTU of the network you are transmitting over (e.g. roughly 1472 bytes of UDP payload for IPv4 over Ethernet with the usual 1500-byte MTU). Very small packet-sizes (like 48-bytes/packet) are inefficient, since the fixed-sized packet-headers make up a greater percentage of the total data sent/received.
Only try to send when your sending-socket indicates it is ready-for-write. (i.e. don't assume that you will never fill up the socket's outgoing-data-buffer; if your traffic is "bursty", you probably will at some point)
Minimize UDP packet loss by making your UDP sockets' send and receive buffers as large as you can get away with
Further minimize UDP packet loss by doing all the UDP receiving in a dedicated, high-priority thread (which can then route the received UDP data back to a normal-priority thread for further processing -- the main thing is to avoid allowing the receiving UDP-socket's incoming-data-buffer to overflow if possible)
For the TCP retransmission part, keep in mind that TCP streams can potentially slow down to nearly zero bytes-per-second in the worst case scenario, which makes it important to ensure that poor TCP performance to client A doesn't block the TCP communications to/from clients B, C, D, etc. This can be accomplished either via non-blocking I/O and select() (or poll() or similar), or asynchronous networking, or via multiple threads; avoid blocking I/O unless you are implementing a thread-per-socket model (and probably avoid that model as well, since a thread that is indefinitely-blocked-inside-recv() is difficult to shut down cleanly)
Think about under what circumstances (if any) it is acceptable for a client to never receive a particular packet at all; are there situations where that is okay? Or must the entire system grind to a halt until every receiver has received every packet in the group, regardless of how long that might take?
If you want to get really fancy, you can look into Forward Error Correction algorithms that encode data across packets, such that the receiver can still decode all of the data even if it never receives (up to a certain percentage of) the packets. This makes the need for a re-transmit request less likely, at the cost of making all of the packets slightly larger.

reading both tcp and udp packets from same socket

I am trying to read packets in a router, like this in python:
# (skipping the exception handling code here)
s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.ntohs(0x0003))
while True:
p = s.recvfrom(2000)
pkt = p[0]
# process pkt here ...
Answers to a related question (36115971) say that the parameters and methods for UDP and TCP data differ (some say recv is for TCP and recvfrom is for UDP, and others say the opposite; similarly, some say to use 1024 as the buffer size for TCP and a larger one for UDP, and again some say the reverse). Since I am reading in a router, I do not have separate sockets for TCP and UDP and need to read both from the same socket, so I am a bit confused about how I should read the incoming packets.
(1) Should I use recv() or recvfrom(), if I want to read both TCP and UDP packets?
(2) Do the calls return data one packet at a time, or do they return after the buffer is filled up? eg, if I have a large buffer of 4096 bytes, and the incoming streaming 2 packets have 2400 bytes each, will the call return as soon as the 1st packet ends, or will it return after filling up the buffer from the 2nd packet also?
(2a) same question, but if I have a smaller buffer of 2000 bytes. It is clear that on the 1st call I will get the first 2000 bytes of the 1st packet. But on the next call, will I get the last 400 bytes of the 1st packet, or the first 2000 bytes of the 2nd packet?
(3) If I am delayed in making the next call, maybe because I was busy processing the 1st dataset, am I in danger of losing data, or will the OS keep its internal queue of the incoming packets to be given to me when I call the next time? If the OS keeps its internal queue, where can I find information about its size?
NOTE: Some of the given replies have been divergent, so let me put in some boundaries to my question. Hopefully these restrictions will help to give more specific answers.
(a) My objective is to sniff the incoming packets with python sockets only. So other solutions involving tcpdump or tshark etc are outside the scope.
(b) The objective is to only sniff for incoming packets. Additional details like packet reordering (for connection oriented protocols like TCP) are outside the scope, actually they are avoidable overhead.

If you're reading packets from a raw socket (as shown in your source code), then you can easily read all packets from the same socket. Be sure this is what you intend to do. A raw socket is for doing packet inspection for troubleshooting, forensic, security or educational purposes. You cannot easily communicate with another system this way.
And likewise, the receive calls will not differ here by protocol because you are not actually using TCP or UDP, you're simply receiving the raw packets that those protocols build and decode.
(1) Should I use recv() or recvfrom(), if I want to read both TCP and UDP packets?
Either one will work. recv() will return to you only the actual packet data, while recvfrom will return to you the data along with metadata about the packet, including the interface from which the data was received (and other things defined in struct sockaddr_ll from the packet(7) man page).
(2) Do the calls return data one packet at a time, or do they return after the buffer is filled up? eg, if I have a large buffer of 4096 bytes, and the incoming streaming 2 packets have 2400 bytes each, will the call return as soon as the 1st packet ends, or will it return after filling up the buffer from the 2nd packet also?
When using a raw socket like this, you get exactly one packet at a time. You will never get more than one. If the buffer you give is not large enough, then the packet will be truncated (with the ending bytes discarded).
(2a) same question, but if I have a smaller buffer of 2000 bytes. It is clear that on the 1st call I will get the first 2000 bytes of the 1st packet. But on the next call, will I get the last 400 bytes of the 1st packet, or the first 2000 bytes of the 2nd packet?
Generally speaking, packets on most networks are limited to about 1514 bytes. This is because the traditional "MTU" (Maximum Transmission Unit) that is configured on the network interface is 1500 bytes and usually an Ethernet header containing two MAC addresses (6 bytes each) plus a two-byte Ethertype is prepended to that. In a switch or router, you may also see packets that carry an additional 4-byte VLAN tag (IEEE 802.1Q). (But, some networks internally use "jumbo" packets up to about 9K in size for specific purposes.)
You should also understand that, in writing an application, one can send UDP datagrams (or TCP buffers) larger than the maximum packet size. In that case, the OS breaks those up into smaller chunks for sending (and they are re-assembled on the destination side before being handed to an application). When you're receiving raw packets like this, you will see the packets in their low-level, possibly fragmented, state.
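Since a raw socket hands over whole frames, telling TCP from UDP is a matter of looking at the headers yourself. The following is a minimal Python 3 sketch, assuming untagged Ethernet II frames carrying IPv4 (no 802.1Q tag, no IPv6). `classify_frame` is a made-up helper name.

```python
import struct

ETH_HEADER_LEN = 14                   # two 6-byte MAC addresses + 2-byte Ethertype
ETH_P_IP = 0x0800                     # Ethertype for IPv4
PROTO_NAMES = {6: "TCP", 17: "UDP"}   # IANA protocol numbers


def classify_frame(frame):
    """Return 'TCP', 'UDP' or None for a raw Ethernet frame (IPv4 only, no VLAN tag)."""
    if len(frame) < ETH_HEADER_LEN + 20:        # too short for Ethernet + minimal IPv4 header
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETH_P_IP:
        return None
    protocol = frame[ETH_HEADER_LEN + 9]        # byte 9 of the IPv4 header is the protocol field
    return PROTO_NAMES.get(protocol)
```

In the loop from the question this would be called as `classify_frame(pkt)` on the bytes returned by `recvfrom`.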
(3) If I am delayed in making the next call, maybe because I was busy processing the 1st dataset, am I in danger of losing data, or will the OS keep its internal queue of the incoming packets to be given to me when I call the next time? If the OS keeps its internal queue, where can I find information about its size?
The OS will keep a queue of packets for you. The size is of course limited since there is no way you would be able to keep up with, say, a 1Gb NIC at full line rate (let alone a 10Gb or higher NIC). The size is configured in a system-specific way. On linux -- and probably other Unix-based systems -- you can call getsockopt with SOL_SOCKET / SO_RCVBUF to get an idea of the queue space available.
On linux, at least, the size can be set with setsockopt up to a system-imposed maximum (which itself can be configured with various sysctl settings).
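From Python, both calls are available through the socket object. A small sketch follows; `grow_receive_buffer` is an illustrative name, and the size requested in the usage note is arbitrary.

```python
import socket


def grow_receive_buffer(sock, requested):
    """Ask for a larger receive buffer and return the size the OS reports afterwards."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
```

For example, `grow_receive_buffer(s, 4 * 1024 * 1024)` on the sniffing socket. The value reported back often differs from the request: Linux, for instance, doubles the requested size for its own bookkeeping and caps it at the configured maximum, so it is worth checking the returned value rather than assuming the request was honoured.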

I think you should not do that, because TCP provides various guarantees such as reliability, ordering, flow control, and congestion control, whereas UDP does not guarantee anything.
These properties are fixed by the operating system at the moment the socket is created. That is why I think you cannot do what you are describing.
Open two different sockets, one native UDP socket and one native TCP socket.

How does the python socket.recv() method know that the end of the message has been reached?

Let's say I'm using 1024 as buffer size for my client socket:
recv(1024)
Let's assume the message the server wants to send to me consists of 2024 bytes.
Only 1024 bytes can be received by my socket. What's happening to the other 1000 bytes?
1.) Will the recv-method wait for a certain amount of time (say 2 seconds) for more data to come and stop working after this time span? (I.e., if the rest of the data arrives after 3 seconds, the data will not be received by the socket any more?)
or
2.) Will the recv-method stop working immediately after having received 1024 bytes of data? (I.e. will the other 1000 bytes be discarded?)
In case 1.) is correct: is there a way for me to determine how long the recv-method should wait before returning, or is that determined by the system? (I.e. could I tell the socket to wait for 5 seconds before it stops waiting for more data?)
UPDATE:
Assume, I have the following code:
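import socket
import sys

port = int(sys.argv[2])  # assumption: the port is passed as the second argument
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((sys.argv[1], port))
s.sendall(b'Hello, world')  # bytes, so this also runs on Python 3
data = s.recv(1024)
print("received: {}".format(data))
s.close()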
Assume that the server sends data of size > 1024 bytes. Can I be sure that the variable "data" will contain all the data (including those beyond the 1024th byte)?
If I can't be sure about that, how would I have to change the code so that I can always be sure that the variable "data" will contain all the data sent (in one or many steps) from the server?

It depends on the protocol. Some protocols like UDP send messages and exactly 1 message is returned per recv. Assuming you are talking about TCP specifically, there are several factors involved. TCP is stream oriented and because of things like the amount of currently outstanding send/recv data, lost/reordered packets on the wire, delayed acknowledgement of data, and the Nagle algorithm (which can hold back small sends until earlier data has been acknowledged), its behavior can change subtly as a conversation between client and server progresses.
All the receiver knows is that it is getting a stream of bytes. It could get anything from 1 to the fully requested buffer size on any recv. There is no one-to-one correlation between the send call on one side and the recv call on the other.
If you need to figure out message boundaries, it's up to the higher level protocols to figure that out. Take HTTP for example. It starts with a \r\n delimited header and then has a count of the remaining bytes the client should expect to receive. The client knows how to read the header because of the \r\n then knows exactly how many bytes are coming next. Part of the charm of RESTful protocols is that they are HTTP based and somebody else already figured this stuff out!
Some protocols use NUL to delimit messages. Others may have a fixed length binary header that includes a count of any variable data to come. I like zeromq which has a robust messaging system on top of TCP.
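As an illustration of the fixed-length-header approach, here is a minimal Python 3 sketch. The 4-byte big-endian length prefix and the names `recv_exact`, `send_msg` and `recv_msg` are assumptions made for the example, not part of any standard protocol.

```python
import socket
import struct

HEADER = struct.Struct("!I")   # 4-byte big-endian payload length (assumed format)


def recv_exact(sock, n):
    """Keep calling recv() until exactly n bytes have arrived."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)           # may return fewer than n bytes
        if not chunk:
            raise ConnectionError("peer closed before the full message arrived")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def send_msg(sock, payload):
    """Send one message: length header first, then the payload."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_msg(sock):
    """Receive one complete message, however many recv() calls that takes."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)
```

With this in place the 2024-byte message from the question arrives whole from `recv_msg`, regardless of how the stream was split up in transit.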
More details on what happens with receive...
When you do recv(1024), there are 6 possibilities
There is no receive data. recv will wait until there is receive data. You can change that by setting a timeout.
There is partial receive data. You'll get that part right away. The rest is either buffered or hasn't been sent yet and you just do another recv to get more (and the same rules apply).
There is more than 1024 bytes available. You'll get 1024 of that data and the rest is buffered in the kernel waiting for another receive.
The other side has shut down the socket. You'll get 0 bytes of data. 0 means you will never get more data on that socket. But if you keep asking for data, you'll keep getting 0 bytes.
The other side has reset the socket. You'll get an exception.
Some other strange thing has gone on and you'll get an exception for that.
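These cases map fairly directly onto a receive loop. The sketch below only suits protocols where the server closes the connection when it is done; the function name and the 5-second timeout are arbitrary choices for illustration.

```python
import socket


def read_until_closed(sock, timeout=5.0, bufsize=1024):
    """Collect everything the peer sends until it closes the connection."""
    sock.settimeout(timeout)            # no data at all: give up after `timeout` seconds
    chunks = []
    while True:
        # May return anything from 1 to bufsize bytes. Raises socket.timeout if
        # nothing arrives in time, or ConnectionResetError if the peer resets.
        chunk = sock.recv(bufsize)
        if not chunk:                   # 0 bytes: the peer shut down, nothing more will come
            break
        chunks.append(chunk)
    return b"".join(chunks)
```

Replacing the single `s.recv(1024)` in the question with `read_until_closed(s)` would return all the data, provided the server closes its end after sending.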

Limiting TCP sending rate

TCP flows by their own nature will grow until they fill the maximum capacity of the links used from src to dst (if all those links are empty).
Is there an easy way to limit that? I want to be able to send TCP flows with a maximum rate of X Mbps.
I thought about just sending X bytes per second using the socket.send() function and then sleeping the rest of the time. However if the link gets congested and the rate gets reduced, once the link gets uncongested again it will need to recover what it could not send previously and the rate will increase.

At the TCP level, the only control you have is how many bytes you pass off to send(), and how often you call it. Once send() has handed over some bytes to the networking stack, it's entirely up to the networking stack how fast (or slow) it wants to send them.
Given the above, you can roughly limit your transmission rate by monitoring how many bytes you have sent and how much time has elapsed since you started sending, and then holding off subsequent calls to send() (and/or reducing the number of data bytes you pass to send()) to keep the average rate from going higher than your target rate.
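A rough Python sketch of that approach is shown below. The names `pacing_delay`, `send_paced` and `target_bytes_per_sec` are illustrative, and the rate is expressed in bytes per second rather than Mbps.

```python
import time


def pacing_delay(bytes_sent, elapsed, target_bytes_per_sec):
    """Seconds to hold off so the average rate stays at or below the target."""
    return max(0.0, bytes_sent / target_bytes_per_sec - elapsed)


def send_paced(sock, data, target_bytes_per_sec, chunk_size=4096):
    """Send `data` in chunks, sleeping between send() calls to cap the average rate."""
    start = time.monotonic()
    sent = 0
    while sent < len(data):
        sent += sock.send(data[sent:sent + chunk_size])   # send() may accept fewer bytes
        time.sleep(pacing_delay(sent, time.monotonic() - start, target_bytes_per_sec))
```

Because this caps the average since the start, a connection that stalls is allowed to catch up afterwards, which is the behaviour the question worries about. Resetting `start` and `sent` periodically would bound how much catching up is possible.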
If you want any finer control than that, you'll need to use UDP instead of TCP. With UDP you have direct control of exactly when each packet gets sent. (Whereas with TCP it's the networking stack that decides when to send each packet, what will be in the packet, when to resend a dropped packet, etc)

Stop tcp packets from concatenating

I have two apps sending TCP packets, both written in Python 2. When the client sends TCP packets to the server too fast, the packets get concatenated. Is there a way to make Python receive only the last packet sent from the socket? I will be sending files with it, so I cannot just use some character as a packet terminator, because I don't know the content of the file.

TCP uses packets for transmission, but they are not exposed to the application. Instead, the TCP layer may decide how to break the data into packets, even fragments, and how to deliver them. Often, this happens because of the underlying network topology.
From an application point of view, you should consider a TCP connection as a stream of octets, i.e. your data unit is the byte, not a packet.
If you want to transmit "packets", use a datagram-oriented protocol such as UDP (but beware, there are size limits for such packets, and with UDP you need to take care of retransmissions yourself), or wrap them manually. For example, you could always send the packet length first, then the payload, over TCP. On the other side, read the size first, then you know how many bytes need to follow (beware, you may need to read more than once to get everything, because of fragmentation). Here, TCP will take care of in-order delivery and retransmission, so this is easier.
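The "length first, then payload" idea can also be implemented as a small decoder that is fed whatever each recv call happens to return. This is a sketch under the assumption of a 4-byte big-endian length prefix. `FrameDecoder` and `encode` are names invented for the example.

```python
import struct


class FrameDecoder:
    """Reassembles length-prefixed messages from arbitrary recv() chunks."""

    HEADER = struct.Struct("!I")   # 4-byte big-endian payload length (assumed format)

    def __init__(self):
        self.buffer = bytearray()

    def feed(self, chunk):
        """Add the bytes from one recv() call; return the messages completed by it."""
        self.buffer += chunk
        messages = []
        while len(self.buffer) >= self.HEADER.size:
            (length,) = self.HEADER.unpack_from(self.buffer)
            end = self.HEADER.size + length
            if len(self.buffer) < end:
                break                          # payload not complete yet
            messages.append(bytes(self.buffer[self.HEADER.size:end]))
            del self.buffer[:end]              # keep any bytes of the next message
        return messages


def encode(payload):
    """Prefix a payload with its length, ready to pass to sendall()."""
    return FrameDecoder.HEADER.pack(len(payload)) + payload
```

It does not matter whether two messages arrive concatenated in one chunk or one message arrives split across several: `feed` returns each message exactly once, as soon as it is complete. Since the payload is delimited by its length, it may contain any bytes, which suits file contents.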

TCP is a streaming protocol, which doesn't expose individual packets. While reading from stream and getting packets might work in some configurations, it will break with even minor changes to operating system or networking hardware involved.
To resolve the issue, use a higher-level protocol to mark file boundaries. For example, you can prefix the file with its length in octets (bytes). Or, you can switch to a protocol that already handles this kind of stuff, like http.

First you need to know whether the packets are combined before they are sent or after. Use Wireshark to check if the sender is sending one packet or two. If it is sending one, then your fix is to call flush() after each write. I do not know the answer if the receiver is combining packets after receiving them.
You could change what you are sending. You could send the byte count first, followed by the bytes themselves. Then the other side would know how many bytes to read.

Normally, TCP_NODELAY prevents that. But there are very few situations where you need to switch that on. One of the few valid ones are telnet style applications.
What you need is a protocol on top of the tcp connection. Think of the TCP connection as a pipe. You put things in one end of the pipe and get them out of the other. You cannot just send a file through this without both ends being coordinated. You have recognised you don't know how big it is and where it ends. This is your problem. Protocols take care of this. You don't have a protocol and so what you're writing is never going to be robust.
You say you don't know the length. Get the length of the file and transmit that in a header, followed by that number of bytes.
For example, if the header is a 64-bit number holding the length, then when you receive the header at the server end, you read the 64-bit number as the length and then keep reading until you have received that many bytes, which is the end of the file.
Of course, this is extremely simplistic but that's the basics of it.
In fact, you don't have to design your own protocol. You could use an existing protocol, such as HTTP.
