XOR encryption with a PyObject - python

I'm having trouble encrypting bytes from a PyObject using XOR.
So far I have only managed to print the bytes as an encoded string (with PyUnicode_AsEncodedString).
Here's what I tried (taken from this SO answer):
PyObject* repr = PyObject_Repr(wf.str); // wf.str is a PyObject *
PyObject* str = PyUnicode_AsEncodedString(repr, "utf-8", "~E~");
const char *bytes = PyBytes_AS_STRING(str);
printf("REPR: %s\n", bytes);
Py_XDECREF(repr);
Py_XDECREF(str);
From here on, I don't know what to do anymore.
I also tried to access bytes only using PyBytes_AS_STRING(wf.str) and then proceed with the encryption, but it only returned one byte.
Is there a way to XOR-encrypt bytes taken from a PyObject? Something like this:
bytes = getBytesFromPyObject(wf.str)
encrypted = XOREncryption(bytes)
AssignBytesToPyObject(encrypted, wf.str)
Note: I don't know much about C, all of this is almost new to me.
Edit: I'm using C instead of Python because I need to implement a function that uses XOR encryption in a built-in module for Python3.

"I also tried to access bytes only using PyBytes_AS_STRING(wf.str)
and then proceed with the encryption, but it only returned one byte."
Are you sure about this? PyBytes_AS_STRING returns a char *, i.e. a pointer to the first byte of the object's buffer, not a single byte.
In C, an array is accessed through a pointer to its first element. Adding 1 to that pointer (or indexing it with [1]) gives you the next element, which for char data is the next byte.
The issue is likely that you need some way to determine the size of your byte array (PyBytes_Size or PyBytes_GET_SIZE gives it). Then you can operate on the char * you've already obtained and iterate through each byte. Note that printing the buffer with %s stops at the first zero byte, which binary data often contains, so it may look as if only one byte came back.

I know that @h0r53 has already answered my question, but I want to post the code anyway in case it is useful to someone.
This was implemented in a function (PyMarshal_WriteObjectToString, used by marshal.dump and marshal.dumps) of my custom version of Python, in the marshal.c file:
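/* XOR the marshalled data in place. In PyMarshal_WriteObjectToString,
   wf.str is a bytes object created inside the function and not yet shared,
   so mutating it is safe. Assumes wf.str already has its final length. */
char *bytes = PyBytes_AS_STRING(wf.str);
Py_ssize_t n = PyBytes_GET_SIZE(wf.str);
/* unsigned char: values above 127 do not fit in a (signed) char */
static const unsigned char key[32] = {162, 10, 190, 161, 209, 110, 69, 181,
                                      119, 63, 176, 125, 158, 134, 48, 185,
                                      200, 22, 41, 43, 212, 144, 131, 169,
                                      158, 182, 8, 220, 200, 232, 231, 126};
for (Py_ssize_t i = 0; i < n; i++) {
    bytes[i] = (char)(bytes[i] ^ key[i % (Py_ssize_t)sizeof(key)]);
}
/* The buffer was modified in place, so no new object is needed; assigning
   PyBytes_FromStringAndSize(bytes, n) to wf.str would leak the old object. */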
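Whatever reads this data back (for example a stock interpreter's marshal.loads) would have to undo the XOR first. XOR with the same repeating key is its own inverse, so a single helper handles both directions. The sketch below is a Python rendering of the same loop with the same key; the helper name xor_bytes is hypothetical, and the demo XORs ordinary marshal.dumps output by hand rather than running the patched build.

```python
import marshal

# Same 32-byte key as in the C snippet above.
KEY = bytes([162, 10, 190, 161, 209, 110, 69, 181,
             119, 63, 176, 125, 158, 134, 48, 185,
             200, 22, 41, 43, 212, 144, 131, 169,
             158, 182, 8, 220, 200, 232, 231, 126])

def xor_bytes(data: bytes, key: bytes = KEY) -> bytes:
    """XOR each byte with the key byte at the same index modulo len(key)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

payload = marshal.dumps({"answer": 42})  # ordinary marshal output
scrambled = xor_bytes(payload)           # what the C loop would produce
restored = xor_bytes(scrambled)          # applying the XOR again undoes it
print(marshal.loads(restored))           # {'answer': 42}
```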

Related

Converting Python program to C: How can I multiply a character by a specified value and store it into a variable?

In need of general help with converting a small buffer overflow script from Python to C. It's a bit of a hack job and I am struggling to get the data types right. I can compile everything with only a single warning: "initialization makes pointer from integer without a cast - char *buff = ("%0*i", 252, 'A');"
This line is supposed to give the variable buff the value of 252 'A' characters.
I know that changing the data type can fix this, but the rest of the program relies on overflow being a pointer char *.
If anyone has any tips for me regarding any parts of the program they would be greatly appreciated.
cheers, Shiv
ORIGINAL Python:
import struct
from subprocess import call
stack_addr = 0xbffff1d0
rootcode = "\x31"
def conv(num):
    return struct.pack("<I", num)
buff = "A" * 172
buff += conv(stack_addr)
buff += "\x90" * 30
buff += rootcode
buff += "A" * 22
print "targeting vulnerable program"
call(["./vuln", buff])
Converted C code:
//endianess convertion
int conv(int stack_addr)
{
(stack_addr>>8) | (stack_addr<<8);
return(0);
}
int main(int argc, char *argv[])
{
int stack_addr = 0xbffff1d0;
int rootcode = *"\x31"
char *buff = ("%0*i", 252, 'A'); //give buff the value of 252 'A's
buff += conv(stack_addr); //endian conversion
buff += ("%0*i", 30, '\x90'); //append buff variable with 30 '\x90'
buff = buff + rootcode; //append buff with value of rootcode variable
buff += ("%0*i", 22, 'A'); //append buff with 22 'A's
}
The easiest way is to write a string with the needed number of characters manually, using the copy-paste feature of your favourite text editor:
"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
You can also build it from individual characters with a for loop, as described below. In fact, you can skip building the long string altogether and append the characters directly to the final string. This can be done in two ways: with strcat or without it. The first way is a little cleaner:
#include <string.h> // for strcat
char buff[400] = ""; // note: this should be an array, not a pointer!
// the array should be big enough to hold the final string; 400 seems enough
for (int i = 0; i < 252; i++)
    strcat(buff, "A"); // this part appends one string of length 1
The function strcat is inefficient: it recomputes the length of the string each time you append "A" to it. You don't need speed here, but if you ever want to write it efficiently, don't use strcat; instead, append individual chars (bytes) to the array directly:
char buff[400]; // note: this should be an array, not a pointer!
int pos = 0; // position at which to write data
for (int i = 0; i < 252; i++)
    buff[pos++] = 'A'; // this part appends one char 'A'; note single quotes
...
buff[pos++] = '\0'; // don't forget to terminate the string!
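The answer above covers the repeated characters but not conv. In the Python original, struct.pack("<I", num) turns the address into its 4 bytes, least significant byte first. The C conv in the question computes a shifted value, discards it, and returns 0, so it does not reproduce this. A quick Python 3 check of what the packing yields (the name manual is just for illustration of the shift-and-mask equivalent):

```python
import struct

stack_addr = 0xbffff1d0

# "<I" = little-endian unsigned 32-bit integer: least significant byte first.
packed = struct.pack("<I", stack_addr)
print(packed)  # b'\xd0\xf1\xff\xbf'

# Same bytes via int.to_bytes, and via explicit shifts and masks.
assert packed == stack_addr.to_bytes(4, "little")
manual = bytes([(stack_addr >> shift) & 0xFF for shift in (0, 8, 16, 24)])
assert manual == packed
```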

C++ boost.python cannot convert const char* to str

I want to calculate something in C++ and return the result to Python. This is part of the C++ code:
const Mat& flow_map_x, flow_map_y;
std::vector<uchar> encoded_x, encoded_y;
flow_map_x = ...;
flow_map_y = ...;
Mat flow_img_x(flow_map_x.size(), CV_8UC1);
Mat flow_img_y(flow_map_y.size(), CV_8UC1);
encoded_x.resize(flow_img_x.total());
encoded_y.resize(flow_img_y.total());
memcpy(encoded_x.data(), flow_img_x.data, flow_img_x.total());
memcpy(encoded_y.data(), flow_img_y.data, flow_img_y.total());
bp::str tmp = bp::str((const char*) encoded_x.data());
The error when running the Python script is:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
After debugging, I found that the error comes from this line:
bp::str tmp = bp::str((const char*) encoded_x.data());
I'm not good at C++. Could anyone tell me how to fix the error? Thanks in advance!
You can't, because bp::str decodes the data as UTF-8 text and encoded_x.data() is raw binary data, not UTF-8. You probably want a bytes object holding a copy of the raw data,
using PyObject* PyBytes_FromStringAndSize(const char *v, Py_ssize_t len). Or you can use PyByteArray_FromStringAndSize, which takes the same arguments, for a bytearray.
bp::object tmp(bp::handle<>(PyBytes_FromStringAndSize(
// Data to make `bytes` object from
reinterpret_cast<const char*>(encoded_x.data()),
// Amount of data to read
static_cast<Py_ssize_t>(encoded_x.size())
)));
In this case, you can get rid of the vector and use flow_img_x.data and flow_img_x.total() directly.
Or use a memoryview, which doesn't copy the data but just accesses the std::vector's data,
using PyObject* PyMemoryView_FromMemory(char *mem, Py_ssize_t size, int flags):
bp::object tmp(bp::handle<>(PyMemoryView_FromMemory(
reinterpret_cast<char*>(encoded_x.data()),
static_cast<Py_ssize_t>(encoded_x.size()),
PyBUF_WRITE // Or `PyBUF_READ` if you want a read-only view
)));
(If the vector were const, you would use const_cast<char*>(reinterpret_cast<const char*>(encoded_x.data())) and only PyBUF_READ.)
In this case you have to make sure the vector stays alive for as long as the memoryview is used, but no unnecessary copy is made.
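The difference between the two options can be seen from the Python side alone. In this sketch a bytearray stands in for the C++ std::vector (an assumption: no Boost.Python is involved, it only mirrors the copy-versus-view semantics of PyBytes_FromStringAndSize and PyMemoryView_FromMemory).

```python
# A bytearray plays the role of the C++ vector's buffer.
raw = bytearray([0x80, 0x00, 0xFF, 0x10])

# Decoding arbitrary binary data as UTF-8 fails, as in the question.
try:
    raw.decode("utf-8")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc.reason
print("decode failed:", decode_error)  # decode failed: invalid start byte

copy = bytes(raw)       # like a bytes object: an independent copy
view = memoryview(raw)  # like a memoryview: no copy, shares the buffer

raw[0] = 0x01           # the "C++ side" changes its buffer
print(copy[0], view[0])  # 128 1: the copy is unaffected, the view sees the change
```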

Pack list of ints in Python

I have a list that I am packing as bytes using the struct module in Python. Here is my list:
[39, 39, 126, 126, 256, 258, 260, 259, 257, 126]
I am packing my list as:
encoded = struct.pack(">{}H".format(len(list)), *list)
where I pass the number of elements in the list as part of the format.
Now, I need to unpack the packed struct. For that I will need a format where I again pass number of elements. For now I am doing it like so:
struct.unpack(">{}H".format(10), encoded)
However, I can't hard-code that number in the format, because the packed struct is then written to a file that I use for compressing an image. How can I store the number of elements in the file and unpack it afterwards?
P.S. I would like to get that 10 (in unpacking) from file itself that is packed as bytes.
From what I understood from the comments and the question, maybe this will be helpful:
import struct
data = [39, 39, 126, 126, 256, 258, 260, 259, 257, 126]
encoded = struct.pack(">{}H".format(len(data)), *data)
tmp = struct.pack(">H", len(data))
encoded = tmp + encoded  # prepend the element count
begin = 2  # size of the ">H" count field in bytes
try:
    size = struct.unpack(">H", encoded[0:begin])[0]
    print(size)
    print(struct.unpack(">{}H".format(size), encoded[begin:]))
except Exception as e:
    print(e)
Let me know if it helps.
Here is my approach of adding that [number of elements] to the file:
file.write(len(compressed_list).to_bytes(3,'big'))
I reserve 3 bytes for the length of compressed_list, convert the length to bytes, and write it at the beginning of the file, followed by the rest of the data.
Next, when I need that number, I get it from the file like so:
sz = int.from_bytes(encoded[0:3],'big')
which means I take the first three bytes of the byte array read from the file and convert them to an int.
That solved my problem.
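The two answers combine into one round trip: write the element count as a fixed-size header, then the packed values, and read the count back before unpacking. A minimal sketch, with io.BytesIO standing in for the real file and hypothetical helper names write_packed and read_packed. The 3-byte big-endian header follows the self-answer; the first answer used a 2-byte ">H" header instead, which caps the count at 65535.

```python
import io
import struct

def write_packed(f, values):
    # Header: element count as 3 big-endian bytes (as in the self-answer).
    f.write(len(values).to_bytes(3, "big"))
    # Body: each value as a big-endian unsigned 16-bit int (">H").
    f.write(struct.pack(">{}H".format(len(values)), *values))

def read_packed(f):
    count = int.from_bytes(f.read(3), "big")
    fmt = ">{}H".format(count)
    return list(struct.unpack(fmt, f.read(struct.calcsize(fmt))))

data = [39, 39, 126, 126, 256, 258, 260, 259, 257, 126]
buf = io.BytesIO()  # stands in for the real output file opened in "wb" mode
write_packed(buf, data)
buf.seek(0)
result = read_packed(buf)
print(result)  # [39, 39, 126, 126, 256, 258, 260, 259, 257, 126]
```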

Decode C const char* in Python with ctypes

I am using ctypes (imported as c) in Python 3 to call a C++ shared library. The library is loaded into Python using:
smpLib = c.cdll.LoadLibrary(os.getcwd()+os.sep+'libsmpDyn.so')
One of the functions has the extern "C" declaration const char* runSmpModel(...). The Python function prototype is defined and called as:
proto_SMP = c.CFUNCTYPE(c.c_char_p,...)
runSmpModel = proto_SMP(('runSmpModel',smpLib))
res = runSmpModel(...)
This all works beautifully, but I'm unable to decode the res variable and obtain the string passed out by the C runSmpModel function. The value of res is displayed (I'm using ipython3) as b'\xd0'. The best solution I've found online, res.decode('utf-8'), gives me the error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 0: unexpected end of data
The const char* return value from the runSmpModel function comes from
std::string scenID = SMPLib::SMPModel::runModel(...);
return scenID.c_str();
Inside runModel, it is ultimately built as shown here, where scenName is an input string:
auto utcBuffId = newChars(500);
sprintf(utcBuffId, "%s_%u", scenName.c_str(), microSeconds); // catenate scenario name & time
uint64_t scenIdhash = (std::hash < std::string>() (utcBuffId)); // hash it
auto hshCode = newChars(100);
sprintf(hshCode, "%032llX", scenIdhash);
scenId = hshCode;
The value of this specific res should be 0000000000000000BBB00C6CA8B8872E. How can I decode this string?
After a lot of further testing, I've identified the problem as the length of the string passed from the C function. There are no problems if the string is up to 15 characters long, but at 16 or longer, no dice. As a minimal working example, the C++ code is:
extern "C" {
const char* testMeSO()
{
string scenarioID = "abcdefghijklmnop";
return scenarioID.c_str();
}
}
and python code is (same definition of smpLib as shown above):
proto_TST = c.CFUNCTYPE(c.c_char_p)
testMeSO = proto_TST(('testMeSO',smpLib))
res = testMeSO()
print("Scenario ID: %s"%res.decode('utf-8'))
This gives the decode error unless any character is removed from the scenarioID variable in the C function. So it seems the question is: how can Python read a C char* longer than 15 characters using ctypes?
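After several days of debugging and testing, I've finally gotten this working, using the second solution posted by @Petesh on this SO post. The 15-character limit is not imposed by ctypes: both C++ functions return c_str() of a local std::string, which is destroyed when the function returns, so Python receives a dangling pointer (undefined behavior). With libstdc++, strings of up to 15 characters are stored inside the std::string object itself (the small-string optimization), while longer ones live in a heap buffer that is freed on return, which would explain why short strings happened to survive and longer ones did not.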
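Essentially, the solution is to pass into the C function an extra char * buff buffer, created beforehand with ctypes.create_string_buffer(32*16), plus an unsigned int buffsize with value 32*16. Then, in the C function, execute scenarioID.copy(buff, buffsize). The Python prototype is modified in the obvious way. Because the buffer is owned by Python, it stays valid after the C function returns. Note that std::string::copy does not write a terminating NUL; this works because create_string_buffer zero-fills the buffer and the string is shorter than buffsize.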
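The caller-allocated buffer pattern can be tried without the asker's library. In this sketch the C library's strncpy stands in for the modified C++ function (an assumption: it relies on ctypes.CDLL(None) exposing libc, which holds on Linux and macOS but not on Windows); call_with_buffer is a hypothetical helper.

```python
import ctypes

# Assumption: Linux or macOS, where CDLL(None) exposes the already-loaded libc.
libc = ctypes.CDLL(None)
libc.strncpy.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_size_t]
libc.strncpy.restype = ctypes.c_void_p  # return value is not used

def call_with_buffer(src: bytes, buffsize: int = 32 * 16) -> str:
    # Python owns this zero-filled buffer, so it outlives the C call.
    buff = ctypes.create_string_buffer(buffsize)
    # Copy at most buffsize - 1 bytes so the result stays NUL-terminated.
    libc.strncpy(buff, src, buffsize - 1)
    # .value reads up to the first NUL byte and returns bytes.
    return buff.value.decode("ascii")

print(call_with_buffer(b"0000000000000000BBB00C6CA8B8872E"))
```

For the real library, the prototype would presumably gain two trailing parameters, along the lines of c.CFUNCTYPE(c.c_char_p, ..., c.c_char_p, c.c_uint), with the existing arguments and return type left as they are.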

How to print values of a string full of "chaos question marks"

I'm debugging some Python audio code and having a hard time with the audio encoding.
Here I have a string full of audio data, say, [10, 20, 100].
However the data is stored in a string variable,
data = "����������������"
I want to inspect the values of this string.
Below are the things I tried.
Print as int
I tried print "%i" % data[0], which ended up with:
Traceback (most recent call last):
File "wire.py", line 28, in <module>
print "%i" % data[i]
TypeError: %d format: a number is required, not str
Convert to int
int(data[0]) ended up with:
Traceback (most recent call last):
File "wire.py", line 27, in <module>
print int(data[0])
ValueError: invalid literal for int() with base 10: '\xd1'
Any idea on this? I want to print the string in a numerical way, since it actually holds an array of sound-wave samples.
EDIT
All your answers turned out to be really helpful.
The string is generated from the microphone, so I believe it is raw waveform (vibration) data. Its format should be described in the documentation of the audio API, PortAudio.
After looking into PortAudio, I find this helpful example.
/* This routine will be called by the PortAudio engine when audio is needed.
** It may be called at interrupt level on some machines so don't do anything
** that could mess up the system like calling malloc() or free().
*/
static int patestCallback( const void *inputBuffer, void *outputBuffer,
                           unsigned long framesPerBuffer,
                           const PaStreamCallbackTimeInfo* timeInfo,
                           PaStreamCallbackFlags statusFlags,
                           void *userData )
{
    paTestData *data = (paTestData*)userData;
    float *out = (float*)outputBuffer;
    unsigned long i;
    (void) timeInfo; /* Prevent unused variable warnings. */
    (void) statusFlags;
    (void) inputBuffer;
    for( i=0; i<framesPerBuffer; i++ )
    {
        *out++ = data->sine[data->left_phase];  /* left */
        *out++ = data->sine[data->right_phase]; /* right */
        data->left_phase += 1;
        if( data->left_phase >= TABLE_SIZE ) data->left_phase -= TABLE_SIZE;
        data->right_phase += 3; /* higher pitch so we can distinguish left and right. */
        if( data->right_phase >= TABLE_SIZE ) data->right_phase -= TABLE_SIZE;
    }
    return paContinue;
}
This suggests that the data can be interpreted as floats.
To be clear, your audio data is a byte string: a representation of the bytes stored in the audio file. You won't be able to convert those bytes into meaningful values without first knowing how the binary data is laid out.
As an example, the MP3 specification says that each MP3 file is made of frames, each starting with a header (described here: http://en.wikipedia.org/wiki/MP3). To read a header, you would either use something like bitstring or, if you are comfortable doing the bitwise manipulation yourself, unpack a 4-byte integer and use shifts and masks to extract the values of its 32 individual bits.
It really all depends on what you are trying to read and how the data was generated. If the values are whole-byte numbers (for example, 16-bit integer samples), struct will serve you well.
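For raw microphone data, the sample format is chosen when the stream is opened, so once that format is known, struct can turn the byte string back into numbers. A minimal sketch, assuming 16-bit signed samples in little-endian order (PortAudio's paInt16 uses the host's native order, which is little-endian on x86). The input bytes are generated for the demo rather than recorded.

```python
import struct

# Demo input: three 16-bit little-endian samples, i.e. "chaos question marks".
data = struct.pack("<3h", 10, 20, 100)

n = len(data) // struct.calcsize("<h")  # number of 2-byte samples
samples = struct.unpack("<{}h".format(n), data)
print(samples)  # (10, 20, 100)

# For 32-bit float samples (paFloat32), the format letter is "f" instead.
floats = struct.unpack("<2f", struct.pack("<2f", 0.5, -0.25))
print(floats)  # (0.5, -0.25)
```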
If you're ok with the \xd1 mentioned above:
for item in data: print repr(item),
Note that for x in data iterates over each character rather than its position. If you want the position, you can use for i in range(len(data)): ...
If you want them in numerical form, replace repr(item) with ord(item).
It is better if you use the new {}.format method:
data = "����������������"
print '{0}'.format(data[3])
You could use ord to map each byte to its numeric value between 0-255:
print map(ord, data)
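Or, for code that also runs on Python 3 when data is a str:
print([ord(c) for c in data])
On a Python 3 str, this returns Unicode code points, which might not be what you want. On a Python 3 bytes object or a bytearray (in either version), iteration already yields ints from 0 to 255, so list(data) gives the values directly and ord() would raise a TypeError.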
