Fixed-length socket communication from Python to C - python

I'm working on updating my application, which uses sockets to communicate between a Python program and a C program, to use fixed-length headers as a protocol when sending messages.
As a test example, here is my C client code:
/*
Socket definition code put inside here
*/
char test2[] = "hello this is a test";
uint32_t test = sizeof(test2); // Get size of message and store in a 32-bit int
uint32_t test_conv = htonl(test); // Convert to network byte order
int header = send(ConnectSocket, &test_conv, sizeof(uint32_t), 0); // Send size of message
int body = send(ConnectSocket, test2, sizeof(test2), 0); // Send the actual message
Here is an excerpt of the Python server code:
msg = clientsocket.recv(4)
msg_length = int.from_bytes(msg, sys.byteorder)  # Get the header with the message length
msg_length = socket.ntohl(msg_length)
if len(msg) > 0:
    print(f"The message length: {msg_length}")
msg = clientsocket.recv(msg_length)  # Get the message
if len(msg) > 0:
    print(f'Message is: {msg.decode("utf-8")}')
Server.py output:
The message length: 21
Message is: hello this is a test
I am omitting the socket setup code and includes, as well as error checking, to save space in this post.
Is this a safe way to go about using fixed-length headers? I am not expecting a high amount of traffic, and I would only be sending a sentence or two's worth of information in each message.
My final question: why, when calling send in C, do I use &test_conv for the message parameter? Wouldn't that just send the address of the test_conv variable? Don't I need to send the actual message, not the address?
Thanks for any insight and please provide any links to resources if I should be using a different implementation.

Your solution, relying on a fixed-size integer announcing the number of bytes in the following message, seems correct to me.
Just be sure to use sizeof and/or strlen() consistently for the textual message in your actual program.
In your example, transmitting sizeof(test2) bytes includes the implicit '\0' at the end of test2; the Python string built at reception then contains this useless (invisible but nevertheless present) null byte as its last char.
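On the Python side the difference is visible in the decoded string (a quick sketch):
msg = clientsocket.recv(msg_length)
print(repr(msg.decode("utf-8")))  # 'hello this is a test\x00' - note the trailing null byte
Sending strlen(test2) bytes instead (or stripping with .rstrip("\x00") in Python) avoids it.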
Concerning &test_conv in send(), you need to understand that this system call only considers a sequence of bytes. It does not know that these bytes make up an integer. That's why you provide the address of the first byte to be sent (&test_conv) and the number of bytes to send (sizeof(uint32_t)) starting from this address. The receiver will obtain this exact same sequence of bytes and interpret them as the memory representation of a 32-bit integer (taking endianness into account, of course).
Note that the struct module in Python can help with the memory representation of integers and endianness.
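For instance, here is a minimal sketch of the server's receive path using struct (assuming the same 4-byte length prefix as in the question):
import struct

raw = clientsocket.recv(4)
(msg_length,) = struct.unpack("!I", raw)  # "!I" = unsigned 32-bit in network byte order
msg = clientsocket.recv(msg_length)       # then read the payload itself
The "!" prefix makes the byte order explicit, so no separate ntohl() step is needed.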

It seems to me you should use network byte order, not host byte order.
https://pythontic.com/modules/socket/byteordering-coversion-functions
https://www.ibm.com/support/knowledgecenter/en/SSB27U_6.4.0/com.ibm.zvm.v640.kiml0/asonetw.htm
Also, send() and recv() are allowed to return early, without transmitting everything that was requested - that's why they return lengths. Usually they will transmit all of your data and return the full length, but they aren't guaranteed to do so.
For Python, I like my http://stromberg.dnsalias.org/~dstromberg/bufsock.html module. It takes care of resumption for you.
For C, I think people pretty much just use a while loop for each send() and recv().
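In Python, such a resumption loop might look like this (a sketch; recv_exact is a hypothetical helper name):
def recv_exact(sock, n):
    """Call recv() repeatedly until exactly n bytes have arrived."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:  # peer closed the connection before sending everything
            raise ConnectionError("socket closed before full message arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)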

Related

depythonifying 'char', got 'str' for pyobjc

The story: I am using hardware that can be controlled automatically through an Objective-C framework. It is already used by many colleagues, so I treat it as a "fixed" library. I would like to use it via Python; with pyobjc I can already connect to the device, but I have failed to send data to it.
The Objective-C method in the header looks like this:
- (BOOL)executeabcCommand:(NSString*)commandabc
withArgs:(uint32_t)args
withData:(uint8_t*)data
writeLength:(NSUInteger)writeLength
readLength:(NSUInteger)readLength
timeoutMilliseconds:(NSUInteger)timeoutMilliseconds
error:(NSError **) error;
and from my Python code, data is an argument which can contain 256 bytes of data such as 0x00, 0x01, 0xFF. My Python code looks like this:
senddata=Device.alloc().initWithCommunicationInterface_(tcpInterface)
command = 'ABCw'
args= 0x00
writelength = 0x100
readlength = 0x100
data = '\x50\x40'
timeout = 500
success, error = senddata.executeabcCommand_withArgs_withData_writeLength_readLength_timeoutMilliseconds_error_(command, args, data, writelength, readlength, timeout, None)
Whatever I send into it, it always shows this:
ValueError: depythonifying 'char', got 'str'
I tried to dig in a little bit, but failed to find anything about converting a string or a list to char with pyobjc.
Objective-C follows the rules that apply to C.
So in Objective-C, as in C, a uint8_t* is in memory the very same thing as a char*. A string differs from this only in the convention that the last character is '\0', to indicate where the char* block that we call a string ends. So char* blocks end with '\0' because, well, it's a string.
What do we do in C to find out the length of a character block?
We iterate over the whole block until we find '\0', usually with a while loop that breaks when we find it; the counter inside the loop then tells you the length, if you weren't given it some other way.
It is up to you to interpret the data in the desired format. That is why it is sometimes easier to cast from void*, or to take a char* block which is then cast to, and declared as, uint8_t data inside the function that makes use of it. That's the nice part of C, being able to interpret the data as you wish; use that force that was given to you.
So to make your life easier, you could add a length parameter like so:
-withData:(uint8_t*)data andLength:(uint64_t)len;
to avoid parsing the character stream again, since you already know it is (or should be) 256 characters long. The one thing you want to avoid at all costs in C is a read at an out-of-bounds index, which throws a BAD_ACCESS exception.
This basic information should enable you to find a way to declare your char* block - containing uint8_t data, addressed by a pointer (*) to its first uint8_t character - as a str with a specific length, or up to the first appearance of '\0'.
Sidenote:
Objective-C's @"someNSString" corresponds to Python's u"pythonstring".
PS: from your question it is not clear who throws that error message.
Python? Because it could not interpret the data when receiving?
PyObjC? Because it is Python syntax hell when you mix it with Objective-C?
The Objective-C runtime? Because it follows the strict rules of C as well?
Python has always been very forgiving about shoe-horning one type into another, but Python 3 uses Unicode strings by default, and these need to be converted into byte strings before being plugged into pyobjc methods.
Try specifying the strings as bytes objects, e.g. b'this'.
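Applied to the call from the question, a minimal sketch (the data argument, which maps to the uint8_t* parameter, is the one that must be bytes):
data = b'\x50\x40'  # bytes, not str, for the uint8_t* parameter
success, error = senddata.executeabcCommand_withArgs_withData_writeLength_readLength_timeoutMilliseconds_error_(
    command, args, data, writelength, readlength, timeout, None)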
I was hitting the same error trying to use IOKit:
import objc
from Foundation import NSBundle
IOKit = NSBundle.bundleWithIdentifier_('com.apple.framework.IOKit')
functions = [("IOServiceGetMatchingService", b"II#"), ("IOServiceMatching", b"#*"),]
objc.loadBundleFunctions(IOKit, globals(), functions)
The problem arose when I tried to call the function like so:
IOServiceMatching('AppleSmartBattery')
Receiving
Traceback (most recent call last):
File "<pyshell#53>", line 1, in <module>
IOServiceMatching('AppleSmartBattery')
ValueError: depythonifying 'charptr', got 'str'
While as a byte object I get:
IOServiceMatching(b'AppleSmartBattery')
{
IOProviderClass = AppleSmartBattery;
}

Finding a Python 2.7 -> 3.8 struct.Pack("H") into Strings for joining into Lists?

This is... a long one, so I apologize for any inconsistency regarding code and problems. I'll try to add as much of the source code as I can to make the issue as clear as possible.
This project at work is an attempt at converting Python 2 to 3, and thus far it has been mildly straightforward. My coworker and I have reached a point, though, where no amount of googling or searching has given a straight answer, so here we are.
Alright, starting off with...
Python 2 code:
listBytes[102:104]=struct.pack('H',rotation_deg*100) # rotational position in degrees
listBytes[202:204]=struct.pack('H',rotation_deg*100) # rotational position in degrees
listBytes[302:304]=struct.pack('H',rotation_deg*100) # rotational position in degrees
# this continues on for a while in the same fashion
Where rotation_deg is a float between 0.00 and 359.99 (but for testing is almost always changing between 150-250)
For the purpose of testing, we're going to make rotation_deg be 150.00 all the time.
a = listBytes[102:104]=struct.pack('H',150.00*100)
print a
print type(a)
The print out of the following is:
�:
<type 'str'>
From what I understand, the Python 2 version of struct.pack packs the float as a short, which is then "added" to the list as a short. Python 2 sees it as a string and applies no encoding to it (we'll get to that later for Python 3). All simple and good; then, after a few more bits and bobs of dropping more stuff into the list, we get to:
return ''.join(listBytes)
Which, is being sent back to a simple variable:
bytes=self.UpdatePacket(bytes,statusIndex,rotation_deg,StatusIdList,StatusValueList, Stat,offsetUTC)
To then be sent along as a string through
sock.sendto(bytes , (host, port) )
This all comes together to look like this:
A string with a bunch of bytes (I think)
This is the working version, in which we are sending the bytes along the socket, data is being retrieved, and everyone is happy. If I missed anything, please let me know, otherwise, lets move to...
Python 3
This is where the Fun Begins
There are a few changes that are required between Python 2 and 3 right off the bat.
struct.pack('H', rotation_deg*100) requires an INT type to be packed, meaning all instances of packing had to be given int(rotation_deg*100) so as not to error the program.
sock.sendto(bytes, (host, port)) did not work anymore, as the socket needs a bytes object to send anything. No more strings that look like bytes; they had to be properly encoded to send. So this becomes sock.sendto(bytes.encode(), (host, port)) to properly encode the "bytes" string.
As more background: the length of listBytes should always be 1206. Any more and our socket won't work properly, and the issue is that no matter what we try with this Python 3 code, the .join seems to be sending a LOT more than just byte objects, often quintupling the length of listBytes and breaking socket.sendto.
listBytes[102:104] = struct.pack('H', int(rotation_deg * 100)) # rotational position in degrees
listBytes[202:204] = struct.pack('H', int(rotation_deg * 100)) # rotational position in degrees
listBytes[302:304] = struct.pack('H', int(rotation_deg * 100)) # rotational position in degrees
# continues on in this fashion again
return ''.join(str(listBytes))
returns to:
bytes = self.UpdatePacket(bytes, statusIndex, rotation_deg, StatusIdList, StatusValueList, Stat, offsetUTC)
sock.sendto(bytes.encode(), (host, port))
Here's where things start getting weird
a = struct.pack('H', int(150.00 * 100))
returns:
b'\x98:', with its type being <class 'bytes'>, which is fine and the value we want - except we specifically need to store this variable in the list as (maybe) a string... to encode it later to send as a bytes object over the socket.
You're starting to see the problem, yes?
The thing is, we've tried just about every technique to convert the two bytes that struct.pack returns into a string of some kind, and we've been able to convert them over, but then we run into the issue of the .join being evil.
Remember when I said listBytes had to remain a size of 1206 or else it would break? For some reason, if we .join literally anything other than the two bytes as a string, Python seems to add a bunch of other stuff that we don't need.
So for now, we're focusing on trying to match the python 2 equivalent to python 3.
Here's what we've tried
binascii.hexlify(struct.pack('H', int(150.00 * 100))).decode() returns '983a'
str(struct.pack('H', int(150.00 * 100.00)).decode()) returns an error, 'utf-8' codec can't decode byte 0x98 in position 0: invalid start byte
str(struct.pack('H', int(150.00 * 100.00)).decode("utf-16")) returns '㪘'. Can't even begin to understand that.
return b''.join(listBytes) returns an error because there are ints at the start of the list.
return ''.join(str(listBytes)).encode('utf-8') still is adding a bunch of nonsense.
Now we get to the .join, and the first loop around seems... fine? listBytes has a length of 1206 before .joining, but the second time around it creates a massive influx of junk, making the list 5503 in length. The third go-around it becomes 27487, and finally on the last go-around it becomes too large for the socket to handle and I get slapped with [WinError 10040] A message sent on a datagram socket was larger than the internal message buffer or some other network limit, or the buffer used to receive a datagram into was smaller than the datagram itself.
Phew, if you made it this far, thank you. Any help at all would be extremely appreciated. If you have questions or I'm missing something, let me know.
Thanks!
You’re just trying too hard. It may be helpful to think of bytes and unicode rather than the ambiguous str type (which is the former in Python 2 and the latter in Python 3). struct always produces bytes, and socket needs to send bytes, regardless of the language version. Put differently, everything outside your process is bytes, although some represent characters in some encoding chosen by the rest of the system. Characters are a data type used to manipulate text within your program, just like floats are a data type used to manipulate “real numbers”. You have no characters here, so you definitely don’t need encode.
So just accumulate your bytes objects and then join them with b''.join (if you can’t just directly use the buffer into which your slice assignments seem to be writing) to keep them as bytes.
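A minimal sketch under the assumption that listBytes can simply be a bytearray, so the slice assignments write raw bytes directly and no join, str(), or encode() round-trips are needed:
import socket
import struct

listBytes = bytearray(1206)  # fixed-size packet buffer
rotation_deg = 150.00
listBytes[102:104] = struct.pack('H', int(rotation_deg * 100))  # rotational position in degrees
listBytes[202:204] = struct.pack('H', int(rotation_deg * 100))

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(bytes(listBytes), ('127.0.0.1', 9999))  # hypothetical host and port
Because a two-byte slice is replaced by exactly two bytes, the buffer stays 1206 bytes long.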

Converting an IP Address string to exactly 4 bytes in Python

So this is very simple, but I'm having trouble getting it to work. For example, if the incoming IP address string is '168.108.114.22', I want to convert it to a bytes object like:
\xA8\x6C\x72\x16
Basically, each part of the IP address is converted to its hexadecimal equivalent.
I've tried so many ways but couldn't get what I want. String manipulation, using socket.inet_aton, packing, etc. I want to be able to send these bytes over a socket and then receive and parse them at the other end, but I am having trouble just getting my bytes object created and looking like that.
Python's inet_aton function should do what you need; it returns a string containing exactly 4 bytes:
import socket
print socket.inet_aton('168.108.114.22')
print socket.inet_aton('65.66.67.68')
These would display:
¨lr
ABCD
And to convert the four characters back again using inet_ntoa:
print socket.inet_ntoa('\xA8\x6C\x72\x16')
print socket.inet_ntoa('ABCD')
Giving:
168.108.114.22
65.66.67.68
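In Python 3 the same functions work with bytes objects (a quick sketch):
import socket

packed = socket.inet_aton('168.108.114.22')
print(packed)                    # b'\xa8lr\x16'
print(socket.inet_ntoa(packed))  # 168.108.114.22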
This:
ip='168.108.114.22'
b_out = bytes(map(int,ip.split('.')))
print(b_out)
on python 3 produces
b'\xa8lr\x16'
which should be what you are looking for, if I understand correctly.
Note: there are more specific and optimized utility functions to manipulate IP addresses
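For example, the standard-library ipaddress module can do the round trip directly (a sketch):
import ipaddress

ip = ipaddress.IPv4Address('168.108.114.22')
print(ip.packed)                             # b'\xa8lr\x16' - exactly 4 bytes
print(ipaddress.IPv4Address(b'\xa8lr\x16'))  # 168.108.114.22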

C++ wxsocket TCP server send unsigned char array but python client get 4 more bytes in

Hi, I sent a wxImage using a C++ TCP socket to a Python TCP client like this:
C++ TCP Server is like this:
// This is the part where the TCP server's m_sock sends the unsigned char* ImageData
std::stringstream imageStream1;
imageStream1 << ImageData;
m_sock->Write(imageStream1.str().c_str(), imageStream1.str().length());
//Then I send a simple string "hello"
std::stringstream dataStream2;
dataStream2 << "hello";
m_sock->Write(dataStream2.str().c_str(), dataStream2.str().length());
dataStream2.clear();
So I receive the two messages in Python:
# This is used to receive the image data with the exact same number of bytes as ImageData
packet = socket.recv(ImageDataSize)
myWxImage.SetData(packet)
# This is used to receive the second "hello" message
packet = socket.recv(1000)
I can receive the image successfully. But when I print the message, it shows "****hello" and not "hello". There is an additional 4-byte string in front of it; the four "*" are something Python cannot print out. What is it? Can I get rid of it?
std::stringstream imageStream1;
imageStream1 << ImageData;
m_sock->Write(imageStream1.str().c_str(), imageStream1.str().length());
looks like a bad idea. I don't know the type of ImageData, but converting it to a std::stringstream is definitely not the best way to go about this. If ImageData is just a reference to a raw image buffer, use that directly. Images are not strings, and obviously things will go wrong if you take the c_str() of something that might contain \x00 in its data (because that's what normally terminates C strings).
Then:
//Then I send a simple string "hello"
std::stringstream dataStream2;
dataStream2 << "hello";
m_sock->Write(dataStream2.str().c_str(), dataStream2.str().length());
dataStream2.clear();
Um, do you understand what you're writing here? You take a perfectly valid C string, "hello", and push it onto a stringstream, just to get a C string out again?
To be honest: I think you're copy&pasting code from an example without understanding it. You should go back and understand every single line before usage.
How to remove the "****"
In general, to just remove characters from a Python string you can do this:
cleaned_data = str(packet).replace("*", "")
Please remember that when you receive data into packet you are receiving bytes, not a string, so if you print it directly, Python does an implicit conversion for you. In this case, I think it would be better to do an explicit conversion and remove the beginning.
However, this doesn't solve the problem of why you are getting the characters in the first place. These 4 "*" could potentially stem from an encoding issue.
Troubleshooting
It would be worth connecting the Python program to a debugger (maybe PyCharm) so that you can see the actual values that come before "hello"; this would give you an idea of where to look, without having to deal with conversions from bytes to Unicode or whatever locale your console is in.
If you could get that information and post it, it might help others help you.
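A quicker alternative to a debugger is printing the repr of the received bytes, which shows the unprintable prefix as \x.. escape codes instead of letting the console mangle it (a sketch using the names from the question):
packet = socket.recv(1000)
print(repr(packet))  # the four prefix bytes appear as \x.. escapes before ...hello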

How to decode a variable string from C?

I am trying to program a UDP connection. The client is in Python and the server is in C.
In my Python code I defined my PDU as a struct (using the struct module) in this format: 'B 5s 50s' (unsigned char, char[5], char[50]). The issue is that if the strings are not filled, the remainder is garbage, which I should remove.
After unpacking the response from the server, if I do:
str = str_from_c.split('\0',1)
It returns this:
['useful data', '\x00\x01\x00\x00\x00r\x00\x00\x00\xae\xf2d\xb7\x18\x00\x00\x00\x02\x00\x00\x00\x0f\x00\x00\x00\x94\xecd\xb7\xa8\xe6\xb0\t\x00\x00\x00\x00\x00\xa9]\xb7\xca\xf1d\xb7']
How can I dispose of the second part?
If you just want the text, then only take that part from the result - note we're not calling the variable str here, as that would shadow the builtin str:
text, rest = str_from_c.split('\0', 1)
Then just use text and if you need rest you've got it for later...
Note that in the case of splitting once, str.partition is preferred, e.g.:
text, rest = str_from_c.partition('\0')[::2]
as this ensures there's always a 3-tuple result, so unpacking will always succeed even if no actual split occurred.
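Another option is to strip the padding right after unpacking (a sketch, assuming the 'B 5s 50s' layout from the question; response is a hypothetical name for the received datagram):
import struct

pdu_type, field1, field2 = struct.unpack('B5s50s', response)
text = field2.split(b'\0', 1)[0].decode()  # keep only the bytes before the first NUL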
How can I dispose of the second part?
Just do not send it.
The server code probably looks like this:
char buffer[123];
<init buffer partially, terminating the part with a 0>
write(..., buffer, 123);
Change it to be
write(..., buffer, strlen(buffer) + 1);
