I'm quite new to python and trying to port a simple exploit I've written for a stack overflow (just a nop sled, shell code and return address). This isn't for nefarious purposes but rather for a security lecture at a university.
Given a hex string (deadbeef), what are the best ways to:
represent it as a series of bytes
add or subtract a value
reverse the byte order (for x86's little-endian memory layout, i.e. efbeadde)
Any tips and tricks regarding common tasks in exploit writing in python are also greatly appreciated.
In Python 2.6 and above, you can use the built-in bytearray class.
To create your bytearray object:
b = bytearray.fromhex('deadbeef')
To alter a byte, you can reference it using array notation:
b[2] += 7
To reverse the bytearray in place, use b.reverse(). To create an iterator that iterates over it in reverse order, you can use the reversed function: reversed(b).
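Putting those pieces together, a minimal demonstration (Python 2.6+):

b = bytearray.fromhex('deadbeef')
b[2] += 7                             # 0xbe becomes 0xc5
b.reverse()                           # in place: ef c5 ad de
print ' '.join('%02x' % x for x in b)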
You may also be interested in the new bytes class in Python 3, which is like bytearray but immutable.
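For the return-address part of the exploit specifically, the struct module is the usual companion; it packs an integer straight into little-endian byte order:

import struct
print repr(struct.pack('<I', 0xdeadbeef))   # '\xef\xbe\xad\xde'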
Not sure if this is the best way...
hex_str = "deadbeef"
bytes = "".join(chr(int(hex_str[i:i+2],16)) for i in xrange(0,len(hex_str),2))
rev_bytes = bytes[::-1]
Or might be simpler:
bytes = "\xde\xad\xbe\xef"
rev_bytes = bytes[::-1]
In Python 2.x, regular str values are binary-safe. You can use the binascii module's b2a_hex and a2b_hex functions to convert to and from hexadecimal.
You can use ordinary string methods to reverse or otherwise rearrange your bytes. However, doing any kind of arithmetic would require you to use the ord function to get numeric values for individual bytes, then chr to convert the result back, followed by concatenation to reassemble the modified string.
For mutable sequences with easier arithmetic, use the array module with type code 'B'. These can be initialized from the results of a2b_hex if you're starting from hexadecimal.
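A small Python 2 sketch combining the two modules:

import binascii, array
raw = binascii.a2b_hex('deadbeef')        # 4-byte str
arr = array.array('B', raw)               # mutable sequence of unsigned bytes
arr[0] = (arr[0] + 1) % 256               # arithmetic, with wrap-around
print binascii.b2a_hex(arr.tostring())    # dfadbeef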
Related
I can't find a bytearray method or anything similar in the Raku docs, like there is in Python. In Python, bytearray is defined as:
class bytearray([source[, encoding[, errors]]])
Return a new array of bytes. The bytearray class is a mutable sequence of integers in the range 0 <= x < 256. It has most of the usual methods of mutable sequences, described in Mutable Sequence Types, as well as most methods that the str type has, see String Methods.
Should Raku provide this as a built-in, or through some module?
I think you're looking for Buf - a mutable sequence of (usually unsigned) integers. Opening a file with :bin returns a Buf.
brian d foy's answer is essentially correct; you can pretty much translate that code into Perl 6:
my $frame = Buf.new;
$frame.append(0xA2);
$frame.append(0x01);
say $frame; # OUTPUT: «Buf:0x<a2 01>»
However, the declaration is not the same:
bu = bytearray('þor', encoding='utf8', errors='replace')
in Python would be equivalent to this in Perl 6
my $bú = Buf.new('þor'.encode('utf-8'));
say $bú; # OUTPUT: «Buf:0x<c3 be 6f 72>»
And to use something equivalent to the error transformation, the approach is different due to the way Perl 6 approaches Unicode normalization; you would probably have to use the UTF8-C8 (UTF-8 Clean-8) encoding.
For most uses, however, I guess Buf, as indicated by brian d foy, is correct.
Since Python does slice-by-copy, slicing strings can be very costly.
I have a recursive algorithm that is operating on strings. Specifically, if a function is passed a string a, the function calls itself on a[1:] of the passed string. The hangup is that the strings are so long, the slice-by-copy mechanism is becoming a very costly way to remove the first character.
Is there a way to get around this, or do I need to rewrite the algorithm entirely?
The only way to get around this in general is to make your algorithm use bytes-like types, either Py2 str or Py3 bytes; views of Py2 unicode/Py3 str are not supported. I provided details on how to do this in my answer to a related question, but the short version is: if you can assume bytes-like arguments (or convert to them), wrapping the argument in a memoryview and slicing that is a reasonable solution. Once converted to a memoryview, slicing produces new memoryviews at O(1) cost (in both time and memory), rather than the O(n) time/memory cost of text slicing.
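A minimal sketch of the idea (this assumes Python 3 and a bytes argument; the function and names are just for illustration):

def process(view):
    if len(view) == 0:
        return
    first = view[0]      # an int in Python 3
    # ... work on first ...
    process(view[1:])    # O(1): a new view over the same buffer, no copy

process(memoryview(b'abcdefgh'))

Note that deep recursion on very long inputs will still hit Python's recursion limit; that is a separate concern from the slicing cost.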
I come from a C background and am a beginner in Python. I want to know how strings are actually stored in memory in Python.
I did something like
s="foo"
id(s)=140542718184424
id(s[0])= 140542719027040
id(s[1])= 140542718832152
id(s[2])= 140542718832152
I don't understand how each character is stored in memory, why id(s) is not equal to id(s[0]) (as it would be in C), and why id(s[1]) and id(s[2]) are the same.
Python has no character type. Indexing into a string creates a new string, which (like every other object) promptly vanishes if you don't keep a reference to it around. So the id()s in your example can't be compared with each other; an object's id is only unique as long as the object lives. In particular, id(s[0]) != id(s) because the former is a new (temporary) object, and id(s[1]) == id(s[2]) because after the first operand is evaluated, the first temporary string is destroyed and the second temporary string is allocated at the previously freed memory address. The latter is an implementation detail and a coincidence, and cannot be relied on.
Reasoning about string memory is further complicated by implementation details like small strings (along with integers, some tuples, and more) being interned, so some_str is other_str may be true for equal strings that come from different sources (e.g. from indexing into a string with different indices).
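A quick demonstration of that interning effect (a CPython implementation detail, not a language guarantee):

s = "foo"
print(s[1] is s[2])   # True in CPython: both return the interned 'o'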
This article is a good read that explains how strings are stored. Briefly:
When working with empty strings or one-character ASCII strings, Python uses string interning. Interned strings act as singletons: if you have two identical strings that are interned, there is only one copy of them in memory.
Python does not use UTF-8 internally, in order to provide constant-time access to individual characters:
s = 'hello world'
s[0]
s[7]
neither requires scanning the string from the initial char (or, more correctly, the first substring of length 1) up to the i-th position.
This is why Python uses three kinds of internal representation for Unicode strings, with 1, 2 or 4 bytes per char (Latin-1, UCS-2, UCS-4), and does not use the space-optimised UTF-8.
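You can observe the per-character cost with sys.getsizeof (Python 3.3+; the exact overheads vary by CPython version and platform):

import sys
print(sys.getsizeof('a' * 1000))           # ~1049 bytes: 1 byte per char (Latin-1)
print(sys.getsizeof('\u0100' * 1000))      # ~2074 bytes: 2 bytes per char (UCS-2)
print(sys.getsizeof('\U00010000' * 1000))  # ~4076 bytes: 4 bytes per char (UCS-4)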
This is implementation dependent, but some implementations (not only of Python, other languages too) may keep a moderate-size set of constant values around for expected frequent use. In Python's case those might be values like True, None, 'o', 1, 2, etc. This way, when one of those common values is needed, there is no overhead to create it--just refer to the existing value.
While I know that there is the possibility:
>>> a = "abc"
>>> result = a[-1]
>>> a = a[:-1]
Now I also know that strings are immutable and therefore something like this:
>>> a.pop()
c
is not possible.
But is this really the preferred way?
Strings are "immutable" for good reason: it really saves a lot of headaches, more often than you'd think. It also allows Python to be very smart about optimising their use. If you want to process your string in increments, you can pull out part of it with split() or separate it into two parts using indices:
a = "abc"
a, result = a[:-1], a[-1]
This shows that you're splitting your string in two. If you'll be examining every byte of the string, you can iterate over it (in reverse, if you wish):
for result in reversed(a):
...
I should add this seems a little contrived: Your string is more likely to have some separator, and then you'll use split:
ans = "foo,blah,etc."
for a in ans.split(","):
...
Not only is it the preferred way, it's the only reasonable way. Because strings are immutable, in order to "remove" a char from a string you have to create a new string whenever you want a different string value.
You may be wondering why strings are immutable, given that you have to make a whole new string every time you change a character. After all, C strings are just arrays of characters and are thus mutable, and some languages that support strings more cleanly than C allow mutable strings as well. There are two reasons to have immutable strings: security/safety and performance.
Security is probably the most important reason for strings to be immutable. When strings are immutable, you can't pass a string into some library and then have that string change from under your feet when you don't expect it. You may wonder which library would change string parameters, but if you're shipping code to clients you can't control their versions of the standard library, and malicious clients may change out their standard libraries in order to break your program and find out more about its internals. Immutable objects are also easier to reason about, which is really important when you try to prove that your system is secure against particular threats. This ease of reasoning is especially important for thread safety, since immutable objects are automatically thread-safe.
Performance is often better with immutable strings, too. Because a string's value can never change, the runtime is free to share a single string object between any number of references: passing a string to a function or assigning it to another name never requires a defensive copy, and frequently used values can be interned. You get copy semantics without actually copying, which is a real performance win.
Eric Lippert explains more about the rationale behind the immutability of strings (in C#, not Python) here.
The precise wording of the question makes me think it's impossible.
return, to me, means that you have a function to which you have passed a string as a parameter.
You cannot change this parameter; assigning to it only rebinds the name within the function and does not affect the passed-in string. E.g.
>>> def removeAndReturnLastCharacter(a):
...     c = a[-1]
...     a = a[:-1]
...     return c
>>> b = "Hello, Gaukler!"
>>> removeAndReturnLastCharacter(b)
'!'
>>> b    # b has not been changed
'Hello, Gaukler!'
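If you want the caller to see the shortened string as well, return both values:

>>> def removeAndReturnLastCharacter(a):
...     return a[:-1], a[-1]
>>> b, last = removeAndReturnLastCharacter("Hello, Gaukler!")
>>> b
'Hello, Gaukler'
>>> last
'!'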
Yes, Python strings are immutable; any modification results in the creation of a new string. This is how it's mostly done.
So, go ahead with it.
I decided to go with a for loop that just skips the character in question. Is this an acceptable alternative?

new = ''
for i, item in enumerate(s):
    if i == n:        # skip only the n-th character
        continue
    new += item
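For comparison, slicing achieves the same thing in one step (assuming n holds the index of the character to drop):

new = s[:n] + s[n+1:]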
Right now, I am buffering bytes using strings, StringIO, or cStringIO. But, I often need to remove bytes from the left side of the buffer. A naive approach would rebuild the entire buffer. Is there an optimal way to do this, if left-truncating is a very common operation? Python's garbage collector should actually GC the truncated bytes.
Any sort of algorithm for this (keep the buffer in small pieces?), or an existing implementation, would really help.
Edit:
I tried to use Python 2.7's memoryview for this, but sadly, the data outside the "view" isn't GCed when the original reference is deleted:
# (This will use ~2GB of memory, not ~50MB)
smalls = []
for i in xrange(10):
    big = memoryview('z' * (200*1000*1000))   # memoryview requires Python 2.7+
    small = big[195*1000*1000:]               # keep only the last 5MB
    del big          # the 200MB string stays alive: the view still references it
    smalls.append(small)
    print '.',
A deque will be efficient if left-removal operations are frequent (unlike a list, string or buffer, it has amortised O(1) removal at either end). It will be more costly memory-wise than a string, however, as you'll be storing each character as its own string object rather than as a packed sequence.
Alternatively, you could create your own implementation (eg. a linked list of string / buffer objects of fixed size), which may store the data more compactly.
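A minimal Python 2 sketch of the deque approach:

from collections import deque
buf = deque("abcdefgh")   # one single-character str per element
buf.popleft()             # amortised O(1) removal from the left
print ''.join(buf)        # bcdefgh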
Build your buffer as a list of characters or lines and slice the list. Only join as string on output. This is pretty efficient for most types of 'mutable string' behaviour.
The GC will collect the truncated bytes because they are no longer referenced in the list.
UPDATE: For modifying the list head you can simply reverse the list first. This sounds like an inefficient thing to do; however, reversing a Python list in place is fast.
from http://effbot.org/zone/python-list.htm :
Reversing is fast, so temporarily reversing the list can often speed things up if you need to remove and insert a bunch of items at the beginning of the list:
L.reverse()
# append/insert/pop/delete at far end
L.reverse()
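Applied to left-truncating a character buffer, the trick looks like this:

buf = list("hello world")
buf.reverse()
buf.pop()              # drops the original first character, 'h', in O(1)
buf.reverse()
print ''.join(buf)     # ello world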