How to decode a variable string from C?


I am trying to program a UDP connection. The client is in Python and the server is in C.
In my Python code I defined my PDU as a struct (using the struct module) with the format 'B 5s 50s' (unsigned char, char[5], char[50]). The issue is that if the strings are not completely filled, the remainder is garbage, which I need to remove.
After unpacking the response from the server, if I do:
str = str_from_c.split('\0',1)
It returns this:
['useful data', '\x00\x01\x00\x00\x00r\x00\x00\x00\xae\xf2d\xb7\x18\x00\x00\x00\x02\x00\x00\x00\x0f\x00\x00\x00\x94\xecd\xb7\xa8\xe6\xb0\t\x00\x00\x00\x00\x00\xa9]\xb7\xca\xf1d\xb7']
How can I dispose of the second part?

By despise do you mean dispose? If you just want the text, then only take that from the result - note we're not calling the variable str here as that will shadow the builtin str:
text, rest = str_from_c.split('\0', 1)
Then just use text and if you need rest you've got it for later...
Note that when splitting only once, str.partition is preferred, e.g.:
text, rest = str_from_c.partition('\0')[::2]
This ensures there is always a 3-tuple result, so the unpacking always succeeds even if no actual split occurred.
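On Python 3 the fields returned by struct.unpack are bytes, so the separator has to be b'\0' rather than '\0'. Here is a minimal sketch of the whole round trip, assuming the 'B 5s 50s' layout from the question; the helper name c_string and the sample values are made up for illustration:

```python
import struct

PDU_FORMAT = 'B 5s 50s'  # unsigned char, char[5], char[50], as in the question


def c_string(field: bytes) -> str:
    """Return the text before the first NUL byte of a fixed-size C char array."""
    text, _, _ = field.partition(b'\0')
    return text.decode('ascii', errors='replace')


# Simulate a datagram whose second string is followed by leftover bytes.
raw = struct.pack(PDU_FORMAT, 7, b'abc', b'useful data\x00\x01\x00r\xae')
kind, short_field, long_field = struct.unpack(PDU_FORMAT, raw)

print(kind, c_string(short_field), c_string(long_field))
```

Because partition always returns three items, c_string also works when a field is completely filled and contains no NUL at all.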

How I can despise the second part?
Just do not send it?
The server code probably looks like this:
char buffer[123];
<init buffer partially, terminating the part with a 0>
write(..., buffer, 123);
Change it to be
write(..., buffer, strlen(buffer) + 1);

Related

depythonifying 'char', got 'str' for pyobjc

The story: I am using hardware that can be controlled automatically through an Objective-C framework. Many colleagues already use it, so I treat it as a "fixed" library. I would like to use it from Python. With pyobjc I can already connect to the device, but I fail to send data to it.
The Objective-C method in the header looks like this:
(BOOL) executeabcCommand:(NSString*)commandabc
withArgs:(uint32_t)args
withData:(uint8_t*)data
writeLength:(NSUInteger)writeLength
readLength:(NSUInteger)readLength
timeoutMilliseconds:(NSUInteger)timeoutMilliseconds
error:(NSError **) error;
From my Python code, data is an argument that can contain 256 bytes of data such as 0x00, 0x01, 0xFF. My Python code looks like this:
senddata=Device.alloc().initWithCommunicationInterface_(tcpInterface)
command = 'ABCw'
args= 0x00
writelength = 0x100
readlength = 0x100
data = '\x50\x40'
timeout = 500
success, error = senddata.executeabcCommand_withArgs_withData_writeLength_readLength_timeoutMilliseconds_error_(command, args, data, writelength, readlength, timeout, None)
Whatever I send, it always shows this:
ValueError: depythonifying 'char', got 'str'
I tried to dig a little, but failed to find anything about converting a string or list to char with pyobjc.

Objective-C follows the rules that apply to C.
So in Objective-C, as in C, a uint8_t* is laid out in memory exactly like a char*. A C string differs only by convention: the block ends with a \0 byte that marks where the string stops.
How do we find the length of a character block in C?
We iterate over the block until we find \0, usually with a while loop that breaks on \0. The loop counter then gives the length, unless the length was passed in some other way.
It is up to you to interpret the data in the desired format.
That is why it is sometimes easier to cast from void*, or to take a char* block that is then cast to and declared as uint8_t data inside the function that uses it. That is the nice part of C: you can define the interpretation as you wish.
So to make your life easier, you could define a length parameter like so:
-withData:(uint8_t*)data andLength:(uint64_t)len;
This avoids parsing the character stream again, since you already know it is (or should be) 256 bytes long. The one thing to avoid at all costs in C is reading at indices that are out of bounds, which can crash with EXC_BAD_ACCESS.
This basic information should let you find a way to pass your char* block of uint8_t data, addressed by a pointer to its first byte, either with an explicit length or as data that runs up to the first \0.
Sidenote:
Objective-C @"someNSString" == Python's u"pythonstring"
PS: it is not clear from your question who throws that error message.
Python? Because it could not interpret the data when receiving?
Pyobjc? Because it is python syntax hell when you mix with objc?
The objc runtime? Because it follows the strict rules of C as well?

Python has always been very forgiving about shoe-horning one type into another, but python3 uses Unicode strings by default, which need to be converted into binary strings before plugging into pyobjc methods.

Try specifying the strings as bytes objects, such as b'this'.
I was hitting the same error trying to use IOKit:
import objc
from Foundation import NSBundle
IOKit = NSBundle.bundleWithIdentifier_('com.apple.framework.IOKit')
functions = [("IOServiceGetMatchingService", b"II#"), ("IOServiceMatching", b"#*"),]
objc.loadBundleFunctions(IOKit, globals(), functions)
The problem arose when I tried to call the function like so:
IOServiceMatching('AppleSmartBattery')
Receiving
Traceback (most recent call last):
File "<pyshell#53>", line 1, in <module>
IOServiceMatching('AppleSmartBattery')
ValueError: depythonifying 'charptr', got 'str'
While as a byte object I get:
IOServiceMatching(b'AppleSmartBattery')
{
IOProviderClass = AppleSmartBattery;
}
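The conversion itself needs nothing from pyobjc. Here is a minimal sketch of turning text and raw values into bytes before handing them to a C-style parameter; padding to 256 bytes is an assumption taken from the writeLength of 0x100 in the question, and whether a particular bridged method accepts the result still depends on its signature:

```python
# Text destined for a C `char *` / `uint8_t *` parameter has to be bytes, not str.
command_text = 'AppleSmartBattery'
as_bytes = command_text.encode('ascii')      # b'AppleSmartBattery'

# Raw binary payloads can be built directly as bytes as well.
payload = bytes([0x50, 0x40])                # same as b'\x50\x40'
padded = payload.ljust(256, b'\x00')         # pad to a fixed 256-byte buffer (assumed size)

print(type(as_bytes).__name__, len(padded))
```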

How to convert a regular string to a raw string? [duplicate]

I have a string s, its contents are variable. How can I make it a raw string? I'm looking for something similar to the r'' method.

I believe what you're looking for is str.encode with the unicode_escape codec. For example, if you have a variable that you want to turn into a 'raw string':
a = '\x89'
a.encode('unicode_escape')
b'\\x89'
Note: on Python 2.x use string-escape instead; there the result is the str '\\x89'.
I was searching for a similar solution and found the solution via:
casting raw strings python

Raw strings are not a different kind of string. They are a different way of describing a string in your source code. Once the string is created, it is what it is.
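A short check makes this concrete. The sample path is made up; the point is only that the r prefix changes how the literal is read, not what kind of object results:

```python
raw_literal = r'C:\new\table'
escaped_literal = 'C:\\new\\table'

# Both literals produce a str with exactly the same contents.
print(raw_literal == escaped_literal)    # True
print(type(raw_literal) is str)          # True

# Without the r prefix, \n and \t are turned into control characters.
plain = 'C:\new\table'
print(len(raw_literal), len(plain))      # 12 10
```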

Since strings in Python are immutable, you cannot "make it" anything different. You can however, create a new raw string from s, like this:
raw_s = r'{}'.format(s)

As of Python 3.6, you can use the following (similar to @slashCoder):
def to_raw(string):
    return fr"{string}"
my_dir = "C:\data\projects"
to_raw(my_dir)
yields 'C:\\data\\projects'. I'm using it on a Windows 10 machine to pass directories to functions.

Raw strings apply only to string literals. They exist so that you can more conveniently express strings that would be modified by escape sequence processing. This is especially useful when writing regular expressions or other forms of code in string literals. If you want a unicode string without escape processing, just prefix it with ur, like ur'somestring' (Python 2 only; in Python 3 ur is a syntax error and a plain r prefix is enough).

For Python 3, the way to do this that doesn't add double backslashes and simply preserves \n, \t, etc. is:
a = 'hello\nbobby\nsally\n'
escaped = a.encode('unicode-escape').decode().replace('\\\\', '\\')
print(escaped)
Which gives a value that can be written as CSV:
hello\nbobby\nsally\n
There doesn't seem to be a solution for other special characters, however, that may get a single \ before them. It's a bummer. Solving that would be complex.
For example, to serialize a pandas.Series containing a list of strings with special characters into a text file in the format BERT expects, with a CR between each sentence and a blank line between each document:
with open('sentences.csv', 'w') as f:
    current_idx = 0
    for idx, doc in sentences.items():
        # Insert a newline to separate documents
        if idx != current_idx:
            f.write('\n')
        # Write each sentence exactly as it appeared, one per line
        for sentence in doc:
            f.write(sentence.encode('unicode-escape').decode().replace('\\\\', '\\') + '\n')
This outputs (for the Github CodeSearchNet docstrings for all languages tokenized into sentences):
Makes sure the fast-path emits in order.
@param value the value to emit or queue up\n@param delayError if true, errors are delayed until the source has terminated\n@param disposable the resource to dispose if the drain terminates
Mirrors the one ObservableSource in an Iterable of several ObservableSources that first either emits an item or sends\na termination notification.
Scheduler:\n{@code amb} does not operate by default on a particular {@link Scheduler}.
@param the common element type\n@param sources\nan Iterable of ObservableSource sources competing to react first.
A subscription to each source will\noccur in the same order as in the Iterable.
@return an Observable that emits the same sequence as whichever of the source ObservableSources first\nemitted an item or sent a termination notification\n@see ReactiveX operators documentation: Amb
...

Just format like that:
s = "your string"; raw_s = r'{0}'.format(s)

A slightly corrected version of @Jolly1234's answer:
raw_string=path.encode('unicode_escape').decode()
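To see what the unicode_escape round trip does, here is a minimal sketch with a made-up sample string. It assumes ASCII input; non-ASCII characters would be written as \x, \u or \U escapes:

```python
s = 'hello\nbobby\tsally'

# Control characters become two-character backslash sequences.
escaped = s.encode('unicode_escape').decode('ascii')
print(escaped)            # hello\nbobby\tsally  (one line, literal backslashes)

# The transformation is reversible.
restored = escaped.encode('ascii').decode('unicode_escape')
assert restored == s
```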

s = "hel\nlo"
raws = '%r' % s  # conversion to the repr of the string
# print(raws) will print 'hel\nlo' with single quotes.
print(raws[1:-1])  # will print hel\nlo without single quotes.
# raws[1:-1] slices off the surrounding quotes

The solution that worked for me was:
fr"{original_string}"
Suggested in the comments by @ChemEnger.

I suppose the repr function can help you:
s = 't\n'
repr(s)
"'t\\n'"
repr(s)[1:-1]
't\\n'

Just simply use the encode function.
my_var = 'hello'
my_var_bytes = my_var.encode()
print(my_var_bytes)
And then, to convert it back to a regular string, do this:
my_var_bytes = b'hello'
my_var = my_var_bytes.decode()
print(my_var)
--EDIT--
The following does not make the string raw but instead encodes it to bytes and decodes it.

Fixed Length Socket communication from python to C

I am updating my application, which uses sockets to communicate between a Python program and a C program, to use fixed-length headers as the protocol when sending messages.
Here is a test example of my C client code:
/*
Socket definition code put inside here
*/
char test2[] = "hello this is a test";
uint32_t test = sizeof(test2); //Get size of message and store in 32bit int
uint32_t test_conv = htonl(test); //Convert to network byte order
header = send(ConnectSocket, &test_conv, sizeof(uint32_t), 0); //Send size of message
body = send(ConnectSocket, test2, sizeof(test2), 0); //Send the actual message
Here is an excerpt of the Python server code:
msg = clientsocket.recv(4)
msg_length = int.from_bytes(msg, sys.byteorder)  # Get the header with the message length
msg_length = socket.ntohl(msg_length)
if len(msg) > 0:
    print(f"The message length: {msg_length}")
    msg = clientsocket.recv(msg_length)  # Get the message
    if len(msg) > 0:
        print(f'Message is: {msg.decode("utf-8")}')
Server.py output:
The message length: 21
Message is: hello this is a test
I am omitting the socket headers and stuff as well as error checking to save space in this post.
Is this a safe way to go about using fixed length headers? I am not expecting a high amount of traffic and I would only be sending a sentence or two worth of information in each message.
My final question: why, when calling send in C, do I use '&test_conv' for the message parameter? Would that not just send the address of the test_conv variable? Don't I need to send the actual message, not the address?
Thanks for any insight and please provide any links to resources if I should be using a different implementation.

Your solution relying on a fixed sized integer announcing the number
of bytes in the following message seems correct to me.
Just be sure to use sizeof and/or strlen() consistently for
the textual message in your actual program.
In your example, transmitting sizeof(test2) bytes includes the
implicit '\0' at the end of test2; the Python string
that is built at reception then contains this useless (invisible
but nevertheless present) null byte as its last char.
Concerning &test_conv in send(), you need to understand that
this system call only considers a sequence of bytes.
It does not know that these bytes represent an integer.
That's why you provide the address of the first byte to be sent
(&test_conv) and the number of bytes to be sent
(sizeof(uint32_t)) starting from this address.
The receiver will obtain this exact same sequence of bytes and
interpret them as the memory representation of a 32-bit integer
(considering endianness of course).
Note that the struct package in python could help dealing
with memory representation of integers and endianness.
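As a rough illustration of that last point, the struct format '!I' describes a 4-byte unsigned integer in network byte order, which matches what htonl() produces on the C side. A minimal sketch, with the frame built locally instead of being read from a socket:

```python
import struct

HEADER = struct.Struct('!I')   # 4-byte unsigned int, network (big-endian) byte order

body = b'hello this is a test'
frame = HEADER.pack(len(body)) + body

# Receiving side: the first 4 bytes give the body length.
(length,) = HEADER.unpack(frame[:HEADER.size])
message = frame[HEADER.size:HEADER.size + length]
print(length, message.decode('utf-8'))   # 20 hello this is a test
```

With this format there is no need for the int.from_bytes plus ntohl combination, and the result does not depend on the byte order of the receiving machine.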

It seems to me you should use network byte order, not host byte order.
https://pythontic.com/modules/socket/byteordering-coversion-functions
https://www.ibm.com/support/knowledgecenter/en/SSB27U_6.4.0/com.ibm.zvm.v640.kiml0/asonetw.htm
Also, I believe send() and recv() are allowed to return early, without transmission of all that was requested - that's why they return lengths. Usually they will transmit all of your data and return the full length, but they aren't guaranteed to do so.
For python, I like my http://stromberg.dnsalias.org/~dstromberg/bufsock.html module. It takes care of resumption for you.
For C, I think people pretty much just use a while loop for each send() and recv().
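On the Python side the same loop can be written by hand. Here is a minimal sketch, assuming a 4-byte big-endian length header; the function names recv_exact and recv_message are made up, and a local socket pair stands in for the real client and server:

```python
import socket
import struct


def recv_exact(sock: socket.socket, count: int) -> bytes:
    """Keep calling recv() until exactly `count` bytes have arrived."""
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError('socket closed before the full message arrived')
        chunks.append(chunk)
        remaining -= len(chunk)
    return b''.join(chunks)


def recv_message(sock: socket.socket) -> bytes:
    """Read a 4-byte big-endian length header, then that many body bytes."""
    (length,) = struct.unpack('!I', recv_exact(sock, 4))
    return recv_exact(sock, length)


# Local demonstration using a connected socket pair instead of a real client.
left, right = socket.socketpair()
body = b'hello this is a test'
left.sendall(struct.pack('!I', len(body)) + body)
received = recv_message(right)
left.close()
right.close()
print(received)
```

For sending, socket.sendall already loops until everything is written, so only the receiving side needs the explicit loop.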

Hexadecimal Memory Address to Assembly

I am following a buffer overflow tutorial. I have set up my NOP block, I also set up my shell code, now I need to append the return address to the end of my string. I know my return address is :
0xbfffef40
however I need to write it in the form:
\xd0\xce\xff\xff (that's just an example address to show what format I need)
I'm not sure how to carry out the conversion between the two.

You can use struct.pack like this:
import struct
struct.pack('<L', 0xbfffef40)
Check the documentation of struct.pack if you want to change the endianness.
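For the address in the question, the packed bytes and a spelled-out \x form can be obtained as in this minimal sketch; the variable names are made up, and little-endian order is assumed because that is what '<L' selects:

```python
import struct

address = 0xbfffef40
packed = struct.pack('<L', address)     # little-endian, 4 bytes
print(packed)                            # b'@\xef\xff\xbf'

# Spell every byte out as a \x escape, e.g. for pasting into a shell command.
escaped = ''.join(f'\\x{byte:02x}' for byte in packed)
print(escaped)                           # \x40\xef\xff\xbf
```

The first byte prints as @ because 0x40 is a printable ASCII character; the value is the same as \x40.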

C++ wxsocket TCP server send unsigned char array but python client get 4 more bytes in

Hi, I send a wxImage from a C++ TCP socket to a Python TCP client like this:
The C++ TCP server looks like this:
//this part is the TCP server m_sock send the unsigned char* ImageData
std::stringstream imageStream1;
imageStream1 << ImageData;
m_sock->Write(imageStream1.str().c_str(), imageStream1.str().length());
//Then I send a simple string "hello"
std::stringstream dataStream2;
dataStream2 << "hello";
m_sock->Write(dataStream2.str().c_str(), dataStream2.str().length());
dataStream2.clear();
So I receive the two messages in Python:
# This receives the image data with exactly the same number of bytes as ImageData
packet = socket.recv(ImageDataSize)
myWxImage.SetData(packet)
# This receives the second message, "hello"
packet = socket.recv(1000)
I can receive the image successfully. But when I print the message, it shows "****hello" instead of "hello". There are 4 additional bytes in front of it. The four "*" are something Python cannot print. What are they? Can I get rid of them?

std::stringstream imageStream1;
imageStream1 << ImageData;
m_sock->Write(imageStream1.str().c_str(), imageStream1.str().length());
looks like a bad idea. I don't know the type of ImageData, but converting it to a std::stringstream is definitely not the best way to go about this. If ImageData just contains a reference to a raw image buffer, use that. Images are not strings, and obviously, things will go wrong if you get the c_str() from something that might contain \x00 in its data (because that's what normally terminates C strings).
Then:
//Then I send a simple string "hello"
std::stringstream dataStream2;
dataStream2 << "hello";
m_sock->Write(dataStream2.str().c_str(), dataStream2.str().length());
dataStream2.clear();
Um, do you understand what you're writing here? You take a perfectly valid C string, "hello", and push it onto a stringstream, just to get a C string out again?
To be honest: I think you're copy&pasting code from an example without understanding it. You should go back and understand every single line before usage.

How to remove the "****"
In general to just remove characters from a python string you can do this:
cleaned_data = str(packet).replace("*", "")
Please remember that the data you receive in packet is bytes and not a string, so if you try to print it directly, Python has to do an implicit conversion for you. In this case, I think it would be better to do an explicit conversion and remove the beginning.
However, it doesn't solve the problem of why you are getting the characters in the first place. These 4 "*" could potentially stem from an encoding issue.
Troubleshooting
It would be worth connecting the Python program to a debugger (maybe PyCharm) so that you can see the actual values that come before "hello". This would give you an idea of where to look, so that you do not have to deal with conversions from bytes to unicode or whatever locale your console is in.
If you could get that and post that info it might help others to help you.
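Even without a debugger, the raw values can be printed directly. Here is a minimal sketch; the packet contents are a made-up stand-in, since the real four prefix bytes are not known from the question:

```python
# Made-up stand-in for what socket.recv() returned; the real prefix is unknown.
packet = b'\x00\x01\x02\x03hello'

print(repr(packet))                  # shows non-printable bytes as \x escapes
print(packet[:4].hex())              # 00010203
print(packet[4:].decode('ascii'))    # hello
```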
