C program char buffer unexpected overflow - python

I am trying to understand two different behaviors when overflowing a buffer in a C program (call it vulnerable_prog) on Linux that asks for input so that you can overflow a buffer. I understand that the compiler lays out the stack frame in particular ways, which sometimes causes a bit of unpredictability. What I can't understand is why memory is handled differently when I overflow the buffer by using a Python script to feed 20 characters to the program, as opposed to running vulnerable_prog and typing the 20 characters by hand.
The example program declares an array "char name[20]", and the goal is to overflow it and write a specific value into the adjacent variable that gets overwritten. (This is from a classic wargaming site.)
I understand that the processor (64-bit) reads 8 bytes at a time, so arrays whose sizes are not multiples of 8 must be padded to keep memory organized. Therefore my char[20] actually occupies 24 bytes of memory and is accessible to the processor as 8-byte words.
The unexpected behavior is this:
When using a python script, the overflow behaves as follows:
$python -c'print "A"*20 + "\xre\xhe\xyt\xhe"' | /path/vulnerable_prog
The 20 characters overflow the buffer, and the expected value is written into the correct spot in memory.
HOWEVER, when you try to overflow the buffer by running the program from the command prompt and inputting 20 characters manually, followed by the required hex string to be written to memory, you must use one additional hex character in order to have your value end up in the correct place that you want it:
$echo$ 'AAAAAAAAAAAAAAAAAAAA\xre\xhe\xyt\xhe\xaf'
(output of the 'echo' is then copied and pasted into the prompt that vulnerable_prog offers when run from the command line)
Where does this difference in the padding of the character array between the script and the command line exploitation come into play?
I have been doing a lot of research of C Structure padding and reading in the ISO/IEC 9899:201x, but cannot find anything that would explain this nuance.
(This is my first question on Stack Overflow so I apologize if I did not quite ask this correctly.)

Your Python script, when piped, actually sends 25 characters into /path/vulnerable_prog. The print statement adds a newline character. Here is your Python program plus a small Python script that counts the characters written to its standard input:
python -c'print "A"*20 + "\xre\xhe\xyt\xhe"' | python -c "import sys; print(len(sys.stdin.read()))"
I'm guessing you're not pasting the newline character that comes from echo into the program's prompt. Unfortunately, I don't think I have enough information to explain why you need 25, not 24, characters to achieve what you're attempting.
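The extra character can be checked without the vulnerable program. The sketch below is written for Python 3 (the commands in this thread use Python 2 syntax) and uses a hypothetical value "BBBB" in place of the redacted bytes from the question. It captures what print writes with and without the trailing newline.

```python
import io

# Hypothetical stand-in for the question's payload: 20 filler characters plus
# a 4-character value ("BBBB" replaces the redacted bytes from the post).
payload = "A" * 20 + "BBBB"

# print() appends a newline by default.
with_newline = io.StringIO()
print(payload, file=with_newline)

# end="" suppresses the newline, so only the payload itself is written.
without_newline = io.StringIO()
print(payload, end="", file=without_newline)

length_with_newline = len(with_newline.getvalue())
length_without_newline = len(without_newline.getvalue())
print(length_with_newline, length_without_newline)
```

If the newline is the only difference, the piped input is one character longer than the pasted one.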
P.S. Welcome to Stack Overflow!

Related

Python script to inject null bytes in the middle of an executable's argument

I have a simple C program, which I am not allowed to modify, that uses strcpy to copy an argument (argv) into a buffer (a char array).
I want to write a Python script that passes a list of bytes (say \x00\x01\x04\x70) as one of the executable's arguments so that it gets copied into the buffer.
The way I'm doing it right now is by calling os.system and starting the program with the proper argument.
os.system('./program ' + '\xaa\xaa\xaa\xaa')
My problem is that I want to write bytes starting with a null byte, which Python complains about, and I cannot find a way to get past this problem.
In the end, after the Python script runs with the following bytes: (00, 01, 04, 70),
the buffer should look like this: [\x00, \x01, \x04, \x70]
Edit:
Or is there any way to create some sort of pipeline to inject null bytes into the arguments?
Edit:
[Workaround] for my specific task
I found out that most of the data had a null byte as the first character, so I treat the system as little-endian and add another script to the workflow that reverses the bytes, ignoring the last \x00, since we can consider the memory locations in the buffer to be 0 by default.
I will not mark the question as answered yet, as someone may find another workaround for this problem :)
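The complaint from Python can be reproduced in a small sketch. It assumes Python 3 and uses the Python interpreter itself as a stand-in for ./program, so the names here are illustrative only. The point is that the argument is rejected before any child process is started.

```python
import subprocess
import sys

# The Python interpreter stands in for ./program here (an assumption made so
# the sketch runs anywhere); the argument starts with a null byte.
argument = "\x00\x01\x04\x70"

try:
    subprocess.run([sys.executable, "-c", "pass", argument], check=True)
    error_message = None
except ValueError as exc:
    # Raised while the argument list is being prepared, before the child runs.
    error_message = str(exc)

print(error_message)
```

Arguments are handed to the new program as NUL-terminated strings, so a null byte cannot appear inside one regardless of which launcher function is used.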

Using NULL bytes in bash (for buffer overflow)

I programmed a little C program that is vulnerable to a buffer overflow. Everything is working as expected, though I came across a little problem now:
I want to call a function which lies on address 0x00007ffff7a79450 and since I am passing the arguments for the buffer overflow through the bash terminal (like this:
./a "$(python -c 'print "aaaaaaaaaaaaaaaaaaaaaa\x50\x94\xA7\xF7\xFF\x7F\x00\x00"')" )
I get a warning that bash is ignoring the null bytes.
/bin/bash: warning: command substitution: ignored null byte in input
As a result I end up with the wrong address in memory (0x7ffff7a79450 instead of 0x00007ffff7a79450).
Now my question is: How can I produce the leading 0's and give them as an argument to my program?
I'll take a bold move and assert what you want to do is not possible in a POSIX environment, because of the way arguments are passed.
Programs are run using the execve system call.
int execve(const char *filename, char *const argv[], char *const envp[]);
There are a few other functions but all of them wrap execve in the end or use an extended system call with the properties that follow:
Program arguments are passed using an array of NUL-terminated strings.
That means that when the kernel takes your arguments and puts them aside for the new program to use, it only reads each one up to the first NUL character and discards anything that follows.
So there is no way to make your example work if it has to include nul characters. This is why I suggested reading from stdin instead, which has no such limitation:
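#include <unistd.h>
char buf[256];
/* Deliberately allows up to twice the buffer size so the overflow can be triggered. */
read(STDIN_FILENO, buf, 2*sizeof(buf));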
You would normally need to check the returned value of read. For a toy problem it should be enough for you to trigger your exploit. Just pipe your malicious input into your program.
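Piping the input can be sketched from Python as follows. This assumes Python 3, and the child process is a stand-in (a Python one-liner that echoes its stdin), not the vulnerable program from the question. It only illustrates that null bytes survive a pipe unchanged.

```python
import struct
import subprocess
import sys

# 22 filler bytes followed by the address from the question, packed as a
# little-endian 64-bit value so the two high null bytes are included.
payload = b"a" * 22 + struct.pack("<Q", 0x00007FFFF7A79450)

# Stand-in child (an assumption): a Python one-liner that echoes its stdin
# back unchanged, used here in place of the vulnerable program.
child = [sys.executable, "-c",
         "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]

result = subprocess.run(child, input=payload, stdout=subprocess.PIPE, check=True)
received = result.stdout
print(len(received), received[-8:].hex())
```

To target a real binary, the child list would be replaced by the path to that program, which then has to read from standard input rather than argv.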

Python - print long string in terminal but stay at the beginning of the string

I am writing a script that runs in the terminal and that displays a (long) multiple line string. My problem is that, when the string is printed, the terminal automatically places the cursor at the end of the string.
Because the string is longer than the number of lines in the terminal, I only see its last 72 lines (my terminal window has 72 lines), so I am forced to scroll up to the beginning of the string every time I run the script, which is pretty annoying.
Is there a way to go back to the beginning of the string once it's printed?
End of string, the cursor is at the bottom:
Beginning of the string, ~200 lines above, where I want to be after the script runs:
I thought of using curses, but that seems to be overkill for what I am looking for.
Also, I'm on Mac OS and I don't particularly care about portability
While curses is the portable solution, try printing the sequence ESC [ H. It will likely work on all of the terminals you care about.
print "\033[H"
Reference:
https://en.wikipedia.org/wiki/ANSI_escape_code
Re your comment you made on Aug 28, 2015 at 1:51:
I think what you are looking for is printing this escape code:
print("\033[F")
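The two escape sequences mentioned in these answers can be wrapped in small helpers. This is a minimal sketch, assuming Python 3 and a terminal that honors ANSI escape codes; the function names are hypothetical. ESC [ H moves the cursor to the top-left corner, and ESC [ n F moves it to the start of the line n lines up.

```python
import sys

ESC = "\033"

def cursor_home():
    """Return the sequence that moves the cursor to the top-left corner."""
    return ESC + "[H"

def cursor_previous_line(count=1):
    """Return the sequence that moves the cursor to the start of the line `count` lines up."""
    return "{}[{}F".format(ESC, count)

def print_then_rewind(text, stream=sys.stdout):
    """Write text, then move the cursor back up by the number of lines written."""
    stream.write(text)
    lines = text.count("\n")
    if lines:
        stream.write(cursor_previous_line(lines))
    stream.flush()
```

Calling print_then_rewind("line 1\nline 2\n") leaves the cursor on the first printed line. Terminals generally stop cursor movement at the top of the visible screen, so for output longer than the window this may not scroll the view back.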

Python: Output prefixed with b

I'm pretty new to Python so please bear with me here!
I've taken some code from ActiveState (and then butchered it around a bit) to open a DBF file and then output to CSV.
This worked perfectly well on Python 2.5 but I've now moved it to Python 3.3 and ran into a number of issues, most of which I've resolved.
The final issue I have is that in order to run the code, I've had to prefix some items with b (because I was getting TypeError: expected bytes, bytearray or buffer compatible object errors)
The code now works, and outputs correctly, except that every field is displayed as b'DATAHERE' (where DATAHERE is the actual data of course!)
So... does anyone know how I can stop it from outputting the b character? I can post code if required but it's fairly lengthy so I was hoping someone would be able to spot what I expect to be something simple that I've done wrong!
Thanks!
You are seeing the code output byte values; if you expected unicode strings instead, simply decode:
yourdata.decode('ascii')
where ascii should be replaced by the encoding your data uses.
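A short illustration of the difference, assuming Python 3. The field value and the "ascii" encoding are placeholders, since the original code and data are not shown.

```python
# Hypothetical field value, as it might come back from reading the DBF file in binary mode.
raw_field = b"DATAHERE"

# str() on a bytes object keeps the b'...' wrapper, which is what ends up in the output.
as_repr = str(raw_field)

# Decoding yields a real text string; "ascii" is an assumption, use the file's actual encoding.
as_text = raw_field.decode("ascii")

print(as_repr)
print(as_text)
```

The first print shows the b prefix and quotes, while the second shows only the data.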

Maximum characters that can be stuffed into raw_input() in Python

For an InterviewStreet challenge, we have to be able to accommodate a 10,000-character string typed in from the keyboard, but when I copy/paste a 10k-long word into my local test, it cuts off at a thousand characters or so.
What's the official limit in Python? And is there a way to change this?
Thanks guys
Here's the challenge by-the-by:
http://www.interviewstreet.com/recruit/challenges/solve/view/4e1491425cf10/4edb8abd7cacd
Are you sure that your 10k-long word doesn't contain newlines?
raw_input([prompt])
If the prompt argument is present, it is written to standard output without a trailing newline. The function then reads a line from input, converts it to a string (stripping a trailing newline), and returns that. When EOF is read, EOFError is raised.
...
If the readline module was loaded, then raw_input() will use it to provide elaborate line editing and history features.
There is no maximum limit (in Python) on the size of the buffer returned by raw_input, and when I tested with some very large inputs on stdin I could not reproduce your result. I searched the web for information about this but found nothing that would help me answer your question.
my tests
:/tmp% python -c 'print "A"*1000000' | python -c 'print len (raw_input ())';
1000000
:/tmp% python -c 'print "A"*210012300' | python -c 'print len (raw_input ())';
210012300
:/tmp% python -c 'print "A"*100+"\n"+"B"*100' | python -c 'print len (raw_input ())';
100
I had this same experience, and found python limits the length of input to raw_input if you do not import the readline module. Once I imported the readline module, it lifted the limit (or at least raised it significantly enough to where the text I was using worked just fine). This was on my Mac with Python 2.7.15. Additionally, it’s been confirmed working on at least 3.9.5.
I guess this is part of the challenge. The FAQ suggests raw_input() might not be the optimal approach:
The most common (possibly naive) methods are listed below. (...)
There are indeed Python standard modules helping to handle system input/output.
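One such standard module is sys. The sketch below assumes Python 3 and assumes that reading a line directly from sys.stdin is the kind of alternative the FAQ has in mind, which the quoted excerpt does not confirm. The helper name is hypothetical, and an in-memory stream simulates the 10,000-character input.

```python
import io
import sys

def read_word(stream=sys.stdin):
    """Read one line from the stream and strip the trailing newline."""
    return stream.readline().rstrip("\n")

# Simulated input: a 10,000-character word followed by a newline.
fake_stdin = io.StringIO("A" * 10000 + "\n")
word = read_word(fake_stdin)
print(len(word))
```

In a real submission, read_word() would be called without arguments so that it reads from standard input. This shows only the reading side and says nothing about limits that may apply when pasting into an interactive terminal.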
