depythonifying 'char', got 'str' for pyobjc - python

The story: I was using a piece of hardware that can be controlled automatically through an Objective-C framework. The framework is already used by many colleagues, so I can treat it as a "fixed" library. I would like to use it via Python, and with PyObjC I can already connect to the device, but I fail when sending data to it.
The Objective-C method in the header looks like this:
- (BOOL) executeabcCommand:(NSString*)commandabc
withArgs:(uint32_t)args
withData:(uint8_t*)data
writeLength:(NSUInteger)writeLength
readLength:(NSUInteger)readLength
timeoutMilliseconds:(NSUInteger)timeoutMilliseconds
error:(NSError **) error;
From my Python code, data is an argument that can contain 256 bytes of data, such
as 0x00, 0x01, 0xFF. My Python code looks like this:
senddata=Device.alloc().initWithCommunicationInterface_(tcpInterface)
command = 'ABCw'
args= 0x00
writelength = 0x100
readlength = 0x100
data = '\x50\x40'
timeout = 500
success, error = senddata.executeabcCommand_withArgs_withData_writeLength_readLength_timeoutMilliseconds_error_(command, args, data, writelength, readlength, timeout, None)
Whatever I send into it, it always shows:
ValueError: depythonifying 'char', got 'str'
I tried to dig in a little bit, but failed to find anything about converting a string or list to char with PyObjC.

Objective-C follows the rules that apply to C.
So in Objective-C, as in C, a uint8_t* is in fact the very same thing as a char* in memory. A string differs from this only in the convention that the last character is \0, to indicate where the char* block we call a string ends. So char* blocks end with \0 because, well, it's a string.
What do we do in C to find out the length of a character block?
We iterate over the whole block until we find \0, usually with a while loop that breaks when the terminator is found; the counter inside the loop then tells you the length, if you were not given it some other way.
It is up to you to interpret the data in the desired format.
That is why it is sometimes easier to cast from void*, or to take a char* block that is then cast to and declared as uint8_t data inside the function that makes use of it. That's the nice part of C: you get to define that as you wish, so use that force that was given to you.
So to make your life easier, you could add a length parameter, like so:
-withData:(uint8_t*)data andLength:(uint64_t)len; which avoids parsing the character stream again, as you already know it is (or should be) 256 characters long. The only thing you want to avoid at all costs in C is a read at an index that is out of bounds, which throws a BAD_ACCESS exception.
This basic information should enable you to find a way to declare your char* block (containing uint8_t data, addressed by the very first pointer, which also points to the first uint8_t character of the block) as a str with a specific length, or one that runs up to the first appearance of \0.
Sidenote:
Objective-C's @"someNSString" == Python's u"pythonstring"
PS: from your question it is not clear who threw that error message.
Python? Because it could not interpret the data when receiving?
PyObjC? Because it is Python syntax hell when you mix it with Objective-C?
The Objective-C runtime? Because it follows the strict rules of C as well?

Python has always been very forgiving about shoe-horning one type into another, but Python 3 uses Unicode strings by default, and these need to be converted into byte strings before being passed to PyObjC methods.

Try specifying the strings as bytes objects: b'this'
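For instance, applied to the call from the question, a minimal sketch (reusing the question's variable names; the NSString* parameter can stay a Python str, but the uint8_t* parameter wants bytes):
command = 'ABCw'    # NSString*: PyObjC bridges str to NSString
data = b'\x50\x40'  # uint8_t*: pass bytes, not str, in Python 3
success, error = senddata.executeabcCommand_withArgs_withData_writeLength_readLength_timeoutMilliseconds_error_(
    command, args, data, writelength, readlength, timeout, None)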
I was hitting the same error trying to use IOKit:
import objc
from Foundation import NSBundle
IOKit = NSBundle.bundleWithIdentifier_('com.apple.framework.IOKit')
functions = [("IOServiceGetMatchingService", b"II#"), ("IOServiceMatching", b"#*"),]
objc.loadBundleFunctions(IOKit, globals(), functions)
The problem arose when I tried to call the function like so:
IOServiceMatching('AppleSmartBattery')
Receiving
Traceback (most recent call last):
File "<pyshell#53>", line 1, in <module>
IOServiceMatching('AppleSmartBattery')
ValueError: depythonifying 'charptr', got 'str'
Whereas with a byte object I get:
IOServiceMatching(b'AppleSmartBattery')
{
IOProviderClass = AppleSmartBattery;
}

Related

Finding a Python 2.7 -> 3.8 struct.Pack("H") into Strings for joining into Lists?

This is.. a long one. So I apologize for any inconsistency regarding code and problems. I'll be sure to try and add as much of the source code as I can to make sure the issue is as clear as possible.
This project at work is an attempt at converting Python 2 to 3, and thus far has been mildly straightforward. My coworker and I have reached a point though where no amount of googling or searching has given a straight answer, so here we are.
Alright, starting off with...
Python 2 code:
listBytes[102:104]=struct.pack('H',rotation_deg*100) # rotational position in degrees
listBytes[202:204]=struct.pack('H',rotation_deg*100) # rotational position in degrees
listBytes[302:304]=struct.pack('H',rotation_deg*100) # rotational position in degrees
# this continues on for a while in the same fashion
Where rotation_deg is a float between 0.00 and 359.99 (though for testing it is almost always between 150 and 250).
For the purpose of testing, we're going to make rotation_deg be 150.00 all the time.
a = listBytes[102:104]=struct.pack('H',150.00*100)
print a
print type(a)
The print out of the following is:
�:
<type 'str'>
From what I understand, the Python 2 version of struct.pack packs the float as a short, which is then "added" to the list as a short. Python 2 sees it as a string and applies no encoding to it (more on that later for Python 3). All simple and good; then after a few more bits and bobs of dropping stuff into the list, we get to:
return ''.join(listBytes)
Which, is being sent back to a simple variable:
bytes=self.UpdatePacket(bytes,statusIndex,rotation_deg,StatusIdList,StatusValueList, Stat,offsetUTC)
To then be sent along as a string through
sock.sendto(bytes , (host, port) )
This all comes together to look like this:
A string with a bunch of bytes (I think)
This is the working version, in which we are sending the bytes along the socket, data is being retrieved, and everyone is happy. If I missed anything, please let me know; otherwise, let's move on to...
Python 3
This is where the Fun Begins
There are a few changes that are required between Python 2 and 3 right off the bat.
struct.pack('H', rotation_deg*100) requires an int to be packed, meaning all instances of packing had to be given int(rotation_deg*100) so as not to error the program.
sock.sendto(bytes, (host, port)) did not work anymore, as the socket needs a bytes object to send. No more strings that look like bytes; they had to be properly encoded to send correctly. So this became sock.sendto(bytes.encode(), (host, port)) to properly encode the "bytes" string.
As background, the length of listBytes should always be 1206. Any more and our socket won't work properly, and the issue is that no matter what we try with this Python 3 code, the .join seems to be producing a LOT more than just byte objects, often quintupling the length of listBytes and breaking socket.sendto.
listBytes[102:104] = struct.pack('H', int(rotation_deg * 100)) # rotational position in degrees
listBytes[202:204] = struct.pack('H', int(rotation_deg * 100)) # rotational position in degrees
listBytes[302:304] = struct.pack('H', int(rotation_deg * 100)) # rotational position in degrees
# continues on in this fashion again
return ''.join(str(listBytes))
returns to:
bytes = self.UpdatePacket(bytes, statusIndex, rotation_deg, StatusIdList, StatusValueList, Stat, offsetUTC)
sock.sendto(bytes.encode(), (host, port))
Here's where things start getting weird
a = struct.pack('H', int(150.00 * 100))
returns:
b'\x98:', with its type being <class 'bytes'>, which is fine and the value we want, except we specifically need to store this variable in the list as maybe a string... to encode it later and send as a bytes object over the socket.
You're starting to see the problem, yes?
The thing is, we've tried just about every technique to convert the two bytes that struct.pack returns into a string of some kind, and we've been able to convert it over, but then we run into the issue of the .join being evil.
Remember when I said listBytes had to remain at a size of 1206 or else it would break? For some reason, if we .join literally anything other than the two bytes as a string, Python seems to add a bunch of other stuff that we don't need.
So for now, we're focusing on trying to match the python 2 equivalent to python 3.
Here's what we've tried
binascii.hexlify(struct.pack('H', int(150.00 * 100))).decode() returns '983a'
str(struct.pack('H', int(150.00 * 100.00)).decode()) returns an error, 'utf-8' codec can't decode byte 0x98 in position 0: invalid start byte
str(struct.pack('H', int(150.00 * 100.00)).decode("utf-16")) returns '㪘'. Can't even begin to understand that.
return b''.join(listBytes) returns an error because there are ints at the start of the list.
return ''.join(str(listBytes)).encode('utf-8') still is adding a bunch of nonsense.
Now we get to the .join, and the first loop around seems... fine? listBytes has length 1206 before the .join, but on the second loop around it picks up a massive influx of junk, making the list 5503 in length. Third go around it becomes 27487, and finally on the last go around it becomes too large for the socket to handle and I get slapped with [WinError 10040] A message sent on a datagram socket was larger than the internal message buffer or some other network limit, or the buffer used to receive a datagram into was smaller than the datagram itself
Phew, if you made it this far, thank you. Any help at all would be extremely appreciated. If you have questions or I'm missing something, let me know.
Thanks!
You’re just trying too hard. It may be helpful to think of bytes and unicode rather than the ambiguous str type (which is the former in Python 2 and the latter in Python 3). struct always produces bytes, and socket needs to send bytes, regardless of the language version. Put differently, everything outside your process is bytes, although some represent characters in some encoding chosen by the rest of the system. Characters are a data type used to manipulate text within your program, just like floats are a data type used to manipulate “real numbers”. You have no characters here, so you definitely don’t need encode.
So just accumulate your bytes objects and then join them with b''.join (if you can’t just directly use the buffer into which your slice assignments seem to be writing) to keep them as bytes.
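A minimal sketch of that approach, assuming the fixed 1206-byte packet described in the question:
import struct

rotation_deg = 150.00
listBytes = bytearray(1206)        # mutable buffer of zeroed bytes
listBytes[102:104] = struct.pack('H', int(rotation_deg * 100))
listBytes[202:204] = struct.pack('H', int(rotation_deg * 100))
# ... further slice assignments ...
payload = bytes(listBytes)         # already bytes: no str conversion, no .encode()
assert len(payload) == 1206        # slice assignment preserves the length
# sock.sendto(payload, (host, port))
Slice-assigning the two packed bytes into a bytearray keeps the buffer flat, so the length never balloons the way the str(listBytes) round-trip did.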

Using NULL bytes in bash (for buffer overflow)

I programmed a little C program that is vulnerable to a buffer overflow. Everything is working as expected, though I came across a little problem now:
I want to call a function which lies on address 0x00007ffff7a79450 and since I am passing the arguments for the buffer overflow through the bash terminal (like this:
./a "$(python -c 'print "aaaaaaaaaaaaaaaaaaaaaa\x50\x94\xA7\xF7\xFF\x7F\x00\x00"')" )
I get an error that bash is ignoring the null bytes.
/bin/bash: warning: command substitution: ignored null byte in input
As a result I end up with the wrong address in memory (0x7ffff7a79450 instead of 0x00007ffff7a79450).
Now my question is: How can I produce the leading 0's and give them as an argument to my program?
I'll take a bold stance and assert that what you want to do is not possible in a POSIX environment, because of the way arguments are passed.
Programs are run using the execve system call.
int execve(const char *filename, char *const argv[], char *const envp[]);
There are a few other functions, but all of them wrap execve in the end, or use an extended system call with the property that follows:
Program arguments are passed using an array of NUL-terminated strings.
That means that when the kernel takes your arguments and puts them aside for the new program to use, it only reads them up to the first NUL character and discards anything that follows.
So there is no way to make your example work if it has to include nul characters. This is why I suggested reading from stdin instead, which has no such limitation:
char buf[256];
/* deliberately request up to 2*sizeof(buf) bytes: the excess overflows buf */
read(STDIN_FILENO, buf, 2*sizeof(buf));
You would normally need to check the return value of read. For a toy problem it should be enough for you to trigger your exploit. Just pipe your malicious input into your program.
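For example, with Python 3 you can write the raw bytes, NULs included, to stdout and pipe them in; a sketch using the address from the question:
python3 -c 'import sys; sys.stdout.buffer.write(b"a"*22 + b"\x50\x94\xa7\xf7\xff\x7f\x00\x00")' | ./a
sys.stdout.buffer bypasses the text layer, so nothing gets dropped or re-encoded on the way out.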

Node packing data using buffer

Anyone know how to convert this Python snippet to Node.js:
return "".join(reversed(struct.pack('I',data)))
I tried to do the same in Node.js using Buffer, like this:
var buff = new Buffer(4).fill(0);
buff.writeInt16LE(data, 0);
return new Buffer(buff.reverse().toString('hex'),'hex');
But it does not work exactly like the Python snippet; some data makes my program get stuck, and it gives me this error:
buffer.js:830
throw new TypeError('value is out of bounds');
^
Make sure that data is a valid 16-bit signed integer. That means it must be a valid integer from -32,768 to 32,767.
However, the 'I' in Python's struct.pack() is for unsigned 32-bit integers, so what you should instead be using is buff.writeUInt32LE(data, 0).
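As an aside, on a little-endian machine the original snippet (pack in native little-endian order, then reverse) is just a roundabout way of producing big-endian byte order, which is easy to check in Python:
import struct

data = 15000
# reversing a little-endian pack equals packing big-endian directly
assert bytes(reversed(struct.pack('<I', data))) == struct.pack('>I', data)
So on the Node side, buff.writeUInt32BE(data, 0) would express the same intent in one call.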

Python: base64.b64decode() vs .decode?

The Code Furies have turned their baleful glares upon me, and it's fallen to me to implement "Secure Transport" as defined by The Direct Project. Whether or not we internally use DNS rather than LDAP for sharing certificates, I'm obviously going to need to set up the former to test against, and that's what's got me stuck. Apparently, an X509 cert needs some massaging to be used in a CERT record, and I'm trying to work out how that's done.
The clearest thing I've found is a script on Videntity's blog, but not being versed in python, I'm hitting a stumbling block. Specifically, this line crashes:
decoded_clean_pk = clean_pk.decode('base64', strict)
since it doesn't seem to like (or rather, to know) whatever 'strict' is supposed to represent. I'm making the semi-educated guess that the line is supposed to decode the base64 data, but I learned from the Debian OpenSSL debacle some years back that blindly diddling with crypto-related code is a Bad Thing(TM).
So I turn the illustrious python wonks on SO to ask if that line might be replaced by this one (with the appropriate import added):
decoded_clean_pk = base64.b64decode(clean_pk)
The script runs after that change, and produces correct-looking output, but I've got enough instinct to know that I can't necessarily trust my instincts here. :)
This line would have worked if you had called it like this:
decoded_clean_pk = clean_pk.decode('base64', 'strict')
Notice that strict has to be a string; otherwise the Python interpreter would try to search for a variable named strict, and if it didn't find it, or it held a value other than 'strict', 'ignore', or 'replace', it would probably complain about it.
Take a look at this code:
>>> import base64
>>> b = base64.b64encode('hello world')
>>> b.decode('base64')
'hello world'
>>> base64.b64decode(b)
'hello world'
Both decode and b64decode work the same when .decode is passed the 'base64' argument string.
The difference is that str.decode takes a string of bytes as its argument and returns its Unicode representation, depending on the encoding you pass as the first parameter. In this case you're telling it to handle a base64 string, so it will do so correctly.
To answer your question: both work the same, although b64decode/b64encode are meant to work only with base64 encodings, while str.decode can handle as many encodings as the library is aware of.
For further information, have a read of both doc sections: decode and b64decode.
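(One caveat for future readers: the 'base64' string codec is gone from str.decode in Python 3, so there base64.b64decode is the only option of the two; a quick sketch:)
import base64

b = base64.b64encode(b'hello world')   # b'aGVsbG8gd29ybGQ='
base64.b64decode(b)                    # b'hello world'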
UPDATE: Actually, and this is the most important example, I guess :), take a look at the source code of encodings/base64_codec.py, which is what decode() uses:
def base64_decode(input, errors='strict'):
    """ Decodes the object input and returns a tuple (output
        object, length consumed).

        input must be an object which provides the bf_getreadbuf
        buffer slot. Python strings, buffer objects and memory
        mapped files are examples of objects providing this slot.

        errors defines the error handling to apply. It defaults to
        'strict' handling which is the only currently supported
        error handling for this codec.
    """
    assert errors == 'strict'
    output = base64.decodestring(input)
    return (output, len(input))
As you can see, it actually uses the base64 module to do it :)
Hope this clarifies your question in some way.

Writing and reading headers with struct

I have a file header which I am reading and planning on writing which contains information about the contents; version information, and other string values.
Writing to the file is not too difficult, it seems pretty straightforward:
outfile.write(struct.pack('<s', "myapp-0.0.1"))
However, when I try reading back the header from the file in another method:
header_version = struct.unpack('<s', infile.read(struct.calcsize('s')))
I have the following error thrown:
struct.error: unpack requires a string argument of length 2
How do I fix this error and what exactly is failing?
Writing to the file is not too difficult, it seems pretty straightforward:
Not quite as straightforward as you think. Try looking at what's in the file, or just printing out what you're writing:
>>> struct.pack('<s', 'myapp-0.0.1')
'm'
As the docs explain:
For the 's' format character, the count is interpreted as the size of the string, not a repeat count like for the other format characters; for example, '10s' means a single 10-byte string, while '10c' means 10 characters. If a count is not given, it defaults to 1.
So, how do you deal with this?
Don't use struct if it's not what you want. The main reason to use struct is to interact with C code that dumps C struct objects directly to/from a buffer/file/socket/whatever, or a binary format spec written in a similar style (e.g. IP headers). It's not meant for general serialization of Python data. As Jon Clements points out in a comment, if all you want to store is a string, just write the string as-is. If you want to store something more complex, consider the json module; if you want something even more flexible and powerful, use pickle.
Use fixed-length strings. If part of your file format spec is that the name must always be 255 characters or less, just write '<255s'. Shorter strings will be padded, longer strings will be truncated (you might want to throw in a check for that to raise an exception instead of silently truncating).
Use some in-band or out-of-band means of passing along the length. The most common is a length prefix. (You may be able to use the 'p' or 'P' formats to help, but it really depends on the C layout/binary format you're trying to match; often you have to do something ugly like struct.pack('<h{}s'.format(len(name)), len(name), name).) See the sketch after this list.
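As a sketch of that last option, with hypothetical helper names, assuming a 2-byte little-endian length prefix and UTF-8 text:
import struct

def write_string(outfile, s):
    data = s.encode('utf-8')
    # 2-byte little-endian length prefix, then the raw bytes
    outfile.write(struct.pack('<H', len(data)))
    outfile.write(data)

def read_string(infile):
    (length,) = struct.unpack('<H', infile.read(2))
    return infile.read(length).decode('utf-8')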
As for why your code is failing, there are multiple reasons. First, read(11) isn't guaranteed to read 11 characters. If there's only 1 character in the file, that's all you'll get. Second, you're not actually calling read(11), you're calling read(1), because struct.calcsize('s') returns 1 (for reasons which should be obvious from the above). Third, either your code isn't exactly what you've shown above, or infile's file pointer isn't at the right place, because that code as written will successfully read in the string 'm' and unpack it as 'm'. (I'm assuming Python 2.x here; 3.x will have more problems, but you wouldn't have even gotten that far.)
For your specific use case ("file header… which contains information about the contents; version information, and other string values"), I'd just write the strings with newline terminators. (If the strings can have embedded newlines, you could backslash-escape them into \n, use C-style or RFC822-style continuations, quote them, etc.)
This has a number of advantages. For one thing, it makes the format trivially human-readable (and human-editable/-debuggable). And, while sometimes that comes with a space tradeoff, a single-character terminator is at least as efficient, possibly more so, than a length-prefix format would be. And, last but certainly not least, it means the code is dead-simple for both generating and parsing headers.
In a later comment you clarify that you also want to write ints, but that doesn't change anything. An 'i' int value will take 4 bytes, but most apps write a lot of small numbers, which only take 1-2 bytes (+1 for a terminator/separator) if you write them as strings. And if you're not writing small numbers, a Python int can easily be too large to fit in a C int, in which case struct will silently overflow and just write the low 32 bits.
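To make the newline-terminated suggestion concrete, a minimal sketch (hypothetical file name and field values):
with open('data.bin', 'wb') as outfile:
    outfile.write(b'myapp-0.0.1\n')    # version string
    outfile.write(b'42\n')             # a small int, written as text

with open('data.bin', 'rb') as infile:
    header_version = infile.readline().rstrip(b'\n')   # b'myapp-0.0.1'
    count = int(infile.readline())                     # 42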
