Decrypt macsec frame python (AES-GCM)

I'm a French student, and for an exercise my professor asked me to decrypt a MACsec frame with Python.
I have the key, but decryption fails with: ValueError: MAC check failed.
Here is the frame:
00 0c 29 45 13 e1 00 0c 29 b0 53 b2 88 e5 2c 00
00 00 00 16 00 0c 29 b0 53 b2 00 01 64 ad 0a 24
7f 79 b4 68 2a 4b 37 6e 20 72 c5 e7 af ee 90 7f
b6 8c de e7 5e 84 d1 01 9e f2 b6 a4 91 8f f3 bd
62 69 9a 44 86 ad 5a 29 08 a0 98 64 98 74 52 a1
e0 ae 89 10 55 90 a4 5e 99 99 72 d5 91 ac dc c0
c5 c2 c8 93 8f 3f 25 59 d0 9c b6 89 15 86 ae ec
93 0f ce 3b ae f5 91 94 3e 22 67 4d 73 75 39 8b
67 de
Here is the code:
key = binascii.unhexlify('fe0969aac4e169dfc89011326418aeae')
data = binascii.unhexlify('000c29b053b2000100000016000c294513e1000c29b053b28888e52C0000000016000c29b053b2000164ad0a247f79b4682a4b376e2072c5e7afee907fb68cdee75e84d1019ef2b6a4918ff3bd62699a4486ad5a2908a09864987452a1e0ae89105590a45e999972d591acdcc0c5c2c8938f3f2559d09cb6891586aeec930fce3baef591943e22674d7375398b67de')
iv, tag = data[:24], data[-32:]
cipher = AES.new(key, AES.MODE_GCM, iv)
cipher.decrypt_and_verify(data[24:-32], tag)
Could you help me, please?

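The task is essentially to identify in the frame the components that AES-GCM needs, namely nonce, AAD and tag. The posted code fails because it slices the byte string with hex-digit counts (24 and 32) rather than byte counts (12 and 16) and never supplies the AAD.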
The frame starts with the MAC DA (Destination Address) and the MAC SA (Source Address), each of them 6 bytes long. Then follows the 16 bytes long SecTAG (Security TAG), which is composed of the 2 bytes long MACsec Ether Type (0x88e5), the 1 byte long TCI/AN (TAG Control Information / Association Number), the 1 byte long SL (Short Length of the encrypted data), the 4 bytes long PN (Packet Number) and the 8 bytes long SCI (Secure Channel Identifier). Then comes the encrypted data and finally the 16 bytes long ICV (Integrity Check Value):
MAC DA: 0x000c294513e1
MAC SA: 0x000c29b053b2
MACsec Ether Type: 0x88e5
TCI/AN: 0x2c
SL: 0x00
PN: 0x00000016
SCI: 0x000c29b053b20001
enc. user data: 0x64ad0a247f79b4682a4b376e2072c5e7afee907fb68cdee75e84d1019ef2b6a4918ff3bd62699a4486ad5a2908a09864987452a1e0ae89105590a45e999972d591acdcc0c5c2c8938f3f2559d09cb6891586aeec930f
ICV: 0xce3baef591943e22674d7375398b67de
These portions map to the GCM components as follows: The 12 bytes GCM nonce corresponds to the SCI and PN concatenated in this order. The GCM AAD are the concatenated data of MAC DA, MAC SA and SecTAG (Ether Type, TCI/AN, SL, PN, SCI) in this order. The GCM tag corresponds to the ICV:
GCM nonce: 0x000c29b053b2000100000016
GCM AAD: 0x000c294513e1000c29b053b288e52c0000000016000c29b053b20001
GCM tag: 0xce3baef591943e22674d7375398b67de
Thus the encrypted data can be decrypted with PyCryptodome as follows:
from Crypto.Cipher import AES
import binascii
key = binascii.unhexlify('fe0969aac4e169dfc89011326418aeae')
nonce = binascii.unhexlify('000c29b053b2000100000016')
aad = binascii.unhexlify('000c294513e1000c29b053b288e52c0000000016000c29b053b20001')
tag = binascii.unhexlify('ce3baef591943e22674d7375398b67de')
data = binascii.unhexlify('64ad0a247f79b4682a4b376e2072c5e7afee907fb68cdee75e84d1019ef2b6a4918ff3bd62699a4486ad5a2908a09864987452a1e0ae89105590a45e999972d591acdcc0c5c2c8938f3f2559d09cb6891586aeec930f')
cipher = AES.new(key, AES.MODE_GCM, nonce)
cipher.update(aad)
decrypted = cipher.decrypt_and_verify(data, tag)
print(decrypted.hex())
with the output:
080045000054607040004001c6160a01000b0a0100160800b716022b0007a6c0c25e0000000012c5040000000000101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f3031323334353637
More details can be found here (test vectors, identification of the GCM components) and here (structure of the SecTAG).
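The field layout described above can also be applied to the raw frame programmatically instead of copying the hex values by hand. The following is a minimal sketch using only the standard library; `parse_macsec_frame` is a hypothetical helper name, and it assumes the 16-byte SecTAG with an explicit 8-byte SCI and the 16-byte ICV described in this answer.

```python
# Sketch: split a MACsec frame into the AES-GCM inputs described above.
# Assumption: 16-byte SecTAG (explicit SCI present) and 16-byte ICV.

FRAME_HEX = """
00 0c 29 45 13 e1 00 0c 29 b0 53 b2 88 e5 2c 00
00 00 00 16 00 0c 29 b0 53 b2 00 01 64 ad 0a 24
7f 79 b4 68 2a 4b 37 6e 20 72 c5 e7 af ee 90 7f
b6 8c de e7 5e 84 d1 01 9e f2 b6 a4 91 8f f3 bd
62 69 9a 44 86 ad 5a 29 08 a0 98 64 98 74 52 a1
e0 ae 89 10 55 90 a4 5e 99 99 72 d5 91 ac dc c0
c5 c2 c8 93 8f 3f 25 59 d0 9c b6 89 15 86 ae ec
93 0f ce 3b ae f5 91 94 3e 22 67 4d 73 75 39 8b
67 de
"""

HEADER_LEN = 6 + 6 + 16   # MAC DA + MAC SA + SecTAG
ICV_LEN = 16


def parse_macsec_frame(frame: bytes) -> dict:
    """Return nonce, AAD, ciphertext and tag as byte strings."""
    if len(frame) < HEADER_LEN + ICV_LEN:
        raise ValueError("frame too short")
    if frame[12:14] != b"\x88\xe5":
        raise ValueError("MACsec Ether Type 0x88e5 not found")
    pn = frame[16:20]    # Packet Number
    sci = frame[20:28]   # Secure Channel Identifier
    return {
        "nonce": sci + pn,                          # SCI || PN
        "aad": frame[:HEADER_LEN],                  # MAC DA || MAC SA || SecTAG
        "ciphertext": frame[HEADER_LEN:-ICV_LEN],
        "tag": frame[-ICV_LEN:],                    # ICV
    }


frame = bytes.fromhex("".join(FRAME_HEX.split()))
parts = parse_macsec_frame(frame)
for name, value in parts.items():
    print(name, value.hex())
```

The resulting `nonce`, `aad`, `ciphertext` and `tag` values can then be passed to the PyCryptodome calls shown above.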

Related

ECDSA signature verification mismatch

I see strange behaviour in ECDSA signature verification with the Node.js secp256k1 package, which sometimes fails the signature check. I use the following public key:
33 2E 16 0F 4C 24 1F 50 0B 5A 67 13 EB E1 52 52
D1 E2 BA A0 0A B9 7B 54 6E 5C CD 32 E4 FE 26 2A
B5 51 5A BF CA EF D5 9D FD 35 AA 3A 4B 23 1C 7C
1A 2E 3B 4A B7 84 7C 49 89 66 66 98 E6 4F FA B4
Now, given the message hash
0C 8D 6D 12 60 93 2B 13 04 DA 48 56 F5 DB 14 DE
E6 51 69 97 5D 04 89 1F 5E F3 56 A5 77 12 31 10
and the signature
989EFF3505B719017F9DC0CB1D46CBC305940CA458742357BABC0E81C306704FE4F1CD5921E42FEC1CD184FBF0D09E82BCCF3B7F8706D15E4B331302F9845A1F
With both Python's ecdsa and ST's X-CUBE-CRYPTOLIB it verifies successfully, whereas with the Node.js secp256k1 package the signature is rejected.
Any ideas? On Node.js I have to add 0x04 before the public key, and it works perfectly in most cases. The following signature/hash pair, for example, is accepted:
hash
43 82 6b bf 48 61 77 e7 c9 3e 47 b3 ad cf 80 c2 51 46 29 a1 97 15 13 3b 8c b5 bb a0 89 c5 cb bc
Signature
D5FA95C2B66DA7ECB294E9C677495BC24425076C6C9DE42DAB9C4F0FD25AE854649E6F3042611F8441DAE82A14D6145E3C3EB8504A8F673FADDF94702CF641C3
Thanks

Bitcoin and the secp256k1 library use canonical signatures, while this constraint does not apply to the ecdsa library (and presumably not to X-CUBE-CRYPTOLIB).
Canonical signature: In general, if (r, s) is a valid signature, then (r, -s) = (r, n - s) is also a valid signature (n: order of the base point). A canonical signature uses the value s' = n - s if s > n/2, see here.
Therefore, signatures with s > n/2 are always rejected by the secp256k1 library, while the ecdsa library accepts them. Signatures with smaller s are treated identically by both libraries. This is why the issue occurs only sporadically.
For the secp256k1 library to verify the posted signature as valid, it must be normalized. The secp256k1 library provides the signatureNormalize() function for this purpose:
const crypto = require('crypto')
const secp256k1 = require('secp256k1')
var signature = Buffer.from('989EFF3505B719017F9DC0CB1D46CBC305940CA458742357BABC0E81C306704FE4F1CD5921E42FEC1CD184FBF0D09E82BCCF3B7F8706D15E4B331302F9845A1F', 'hex');
signature = secp256k1.signatureNormalize(signature); // FIX!
var publicKey = Buffer.from('04332E160F4C241F500B5A6713EBE15252D1E2BAA00AB97B546E5CCD32E4FE262AB5515ABFCAEFD59DFD35AA3A4B231C7C1A2E3B4AB7847C4989666698E64FFAB4', 'hex');
var messageHash = Buffer.from("0C8D6D1260932B1304DA4856F5DB14DEE65169975D04891F5EF356A577123110", 'hex');
console.log(secp256k1.ecdsaVerify(signature, messageHash, publicKey)); // true
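The canonical-signature rule can also be checked without any crypto library. The sketch below, in plain Python, splits a 64-byte r || s signature, tests whether s > n/2 and, if so, replaces s with n - s. The helper names are hypothetical; the constant is the order n of the secp256k1 base point. It only illustrates the arithmetic and does not verify anything.

```python
# Order n of the secp256k1 base point.
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

REJECTED = ("989EFF3505B719017F9DC0CB1D46CBC305940CA458742357BABC0E81C306704F"
            "E4F1CD5921E42FEC1CD184FBF0D09E82BCCF3B7F8706D15E4B331302F9845A1F")
ACCEPTED = ("D5FA95C2B66DA7ECB294E9C677495BC24425076C6C9DE42DAB9C4F0FD25AE854"
            "649E6F3042611F8441DAE82A14D6145E3C3EB8504A8F673FADDF94702CF641C3")


def split_signature(sig_hex: str) -> tuple:
    """Split a 64-byte r || s signature into the integers (r, s)."""
    raw = bytes.fromhex(sig_hex)
    if len(raw) != 64:
        raise ValueError("expected a 64-byte r || s signature")
    return int.from_bytes(raw[:32], "big"), int.from_bytes(raw[32:], "big")


def is_high_s(sig_hex: str) -> bool:
    """True if s lies in the upper half of the range, i.e. s > n/2."""
    _, s = split_signature(sig_hex)
    return s > N // 2


def normalize(sig_hex: str) -> str:
    """Return the canonical form: s is replaced by n - s if s > n/2."""
    r, s = split_signature(sig_hex)
    if s > N // 2:
        s = N - s
    return (r.to_bytes(32, "big") + s.to_bytes(32, "big")).hex().upper()


print(is_high_s(REJECTED))   # True: the signature from the question has a high s
print(is_high_s(ACCEPTED))   # False: already canonical
print(normalize(REJECTED))
```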

How to print binary file as bytes?

I did
>>> b0 = open('file','rb')
Then
>>> b0.read(10)
gives
b'\xb8\xaaK\x1e^J)\xab_I'
How can I get things printed all as pure hex bytes? I want
b'\xb8\xaa\x4b\x1e\x5e\x4a\x29\xab\x5f\x49'
(PS: is it possible to print it pretty? like
B8 AA 4B 1E 5E 4A 29 AB 5F 49
or colon separated.)

>>> s = b'\xb8\xaaK\x1e^J)\xab_I'
>>> ' '.join('{:02X}'.format(c) for c in s)
'B8 AA 4B 1E 5E 4A 29 AB 5F 49'
or, slightly more concisely:
>>> ' '.join(map('{:02X}'.format, s))
'B8 AA 4B 1E 5E 4A 29 AB 5F 49'
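On newer interpreters the same results are available from `bytes.hex()`. The sketch below assumes Python 3.8 or later, where `bytes.hex()` accepts a separator argument; it also builds the fully escaped `\x..` form asked for in the question.

```python
s = b'\xb8\xaaK\x1e^J)\xab_I'

# Separator argument of bytes.hex() (Python 3.8+).
spaced = s.hex(' ').upper()
colon = s.hex(':').upper()

# Every byte written as a \xNN escape, including printable ones.
escaped = ''.join('\\x{:02x}'.format(c) for c in s)

print(spaced)    # B8 AA 4B 1E 5E 4A 29 AB 5F 49
print(colon)     # B8:AA:4B:1E:5E:4A:29:AB:5F:49
print(escaped)   # \xb8\xaa\x4b\x1e\x5e\x4a\x29\xab\x5f\x49
```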

printing number of snmpwalk results

We're trying to write a script on an Ubuntu server that reads the number of results from an snmpwalk command and then sends it to Cacti for graphing.
None of us has any programming knowledge, and nothing we have tried so far has worked.
It will go like this:
the script runs: snmpwalk -v 1 -c public -Cp 10.59.193.141 .1.3.6.1.4.1.11.2.14.11.6.4.1.1.8.1.1.2.1
The command will print
iso.3.6.1.4.1.11.2.14.11.6.4.1.1.8.1.1.2.1.0.34.250.121.174.124 = Hex-STRING: 00 22 FA 79 AE 7C
iso.3.6.1.4.1.11.2.14.11.6.4.1.1.8.1.1.2.1.0.35.20.11.246.64 = Hex-STRING: 00 23 14 0B F6 40
iso.3.6.1.4.1.11.2.14.11.6.4.1.1.8.1.1.2.1.0.38.198.89.34.192 = Hex-STRING: 00 26 C6 59 22 C0
iso.3.6.1.4.1.11.2.14.11.6.4.1.1.8.1.1.2.1.40.224.44.221.222.148 = Hex-STRING: 28 E0 2C DD DE 94
iso.3.6.1.4.1.11.2.14.11.6.4.1.1.8.1.1.2.1.100.163.203.10.120.83 = Hex-STRING: 64 A3 CB 0A 78 53
iso.3.6.1.4.1.11.2.14.11.6.4.1.1.8.1.1.2.1.120.214.240.8.133.165 = Hex-STRING: 78 D6 F0 08 85 A5
iso.3.6.1.4.1.11.2.14.11.6.4.1.1.8.1.1.2.1.132.0.210.179.213.93 = Hex-STRING: 84 00 D2 B3 D5 5D
iso.3.6.1.4.1.11.2.14.11.6.4.1.1.8.1.1.2.1.132.0.210.201.8.196 = Hex-STRING: 84 00 D2 C9 08 C4
iso.3.6.1.4.1.11.2.14.11.6.4.1.1.8.1.1.2.1.140.112.90.108.236.188 = Hex-STRING: 8C 70 5A 6C EC BC
iso.3.6.1.4.1.11.2.14.11.6.4.1.1.8.1.1.2.1.140.112.90.139.18.244 = Hex-STRING: 8C 70 5A 8B 12 F4
iso.3.6.1.4.1.11.2.14.11.6.4.1.1.8.1.1.2.1.180.240.171.112.37.69 = Hex-STRING: B4 F0 AB 70 25 45
Variables found: 11
Then the script should somehow do: read until Variables found: and read "11", and then print "11".
So basically we want the script to filter out the number "11" in this case which we can use in Cacti for graphing. We've tried some scripts on google and looked around for information, but found nothing.
I think it should be easy if you know how to do it, but we are beginners at programming.
Thanks in advance!

Using Perl, append the following command after a pipe to extract the number you want:
... | perl -ne 'm/\A(?i)variables\s+/ and m/(\d+)\s*$/ and printf qq|%s\n|, $1 and exit'
It will print:
11
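The same extraction can be sketched in Python. The function below searches the text for the "Variables found:" line and returns the number; `variables_found` is a hypothetical name, and the sample text is a shortened copy of the output from the question. In a real script the text would come from the snmpwalk command instead of a constant.

```python
import re

# Shortened copy of the snmpwalk output shown in the question.
SAMPLE_OUTPUT = """\
iso.3.6.1.4.1.11.2.14.11.6.4.1.1.8.1.1.2.1.0.34.250.121.174.124 = Hex-STRING: 00 22 FA 79 AE 7C
iso.3.6.1.4.1.11.2.14.11.6.4.1.1.8.1.1.2.1.180.240.171.112.37.69 = Hex-STRING: B4 F0 AB 70 25 45
Variables found: 11
"""

PATTERN = re.compile(r"^Variables found:\s*(\d+)\s*$",
                     re.MULTILINE | re.IGNORECASE)


def variables_found(text: str):
    """Return the count from the 'Variables found:' line, or None."""
    match = PATTERN.search(text)
    return int(match.group(1)) if match else None


print(variables_found(SAMPLE_OUTPUT))   # 11
```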

Problems with AUTH LOGIN in python SMTP server

I'm attempting to create a Python script to send an email over smtp.gmail.com. I am only allowed to use sockets.
Currently, I've got the script to successfully connect to the server, issue STARTTLS, and wrap my socket in SSL. However, I'm having issues when attempting to authenticate with the server.
Here is my authentication code:
clientSocketSSL.send('AUTH LOGIN\r\n')
clientSocketSSL.send(base64.b64encode('USERNAME')+'\r\n')
clientSocketSSL.send(base64.b64encode('PASS')+'\r\n')
The response I get is
501 5.5.2 Cannot Decode response
So then the MAIL FROM command fails as I'm not properly authenticated.
I feel like this should have a very easy solution. Am I just using AUTH LOGIN incorrectly? I've been looking for two hours but haven't been able to find anything...

It should work. I tried it myself with openssl:
OpenSSL> s_client -starttls smtp -connect smtp.gmail.com:587
CONNECTED(00000003)
depth=1 C = US, O = Google Inc, CN = Google Internet Authority
verify error:num=20:unable to get local issuer certificate
verify return:0
---
Certificate chain
0 s:/C=US/ST=California/L=Mountain View/O=Google Inc/CN=smtp.gmail.com
i:/C=US/O=Google Inc/CN=Google Internet Authority
1 s:/C=US/O=Google Inc/CN=Google Internet Authority
i:/C=US/O=Equifax/OU=Equifax Secure Certificate Authority
---
Server certificate
-----BEGIN CERTIFICATE-----
MIIDgDCCAumgAwIBAgIKO3T/ewAAAABoqDANBgkqhkiG9w0BAQUFADBGMQswCQYD
VQQGEwJVUzETMBEGA1UEChMKR29vZ2xlIEluYzEiMCAGA1UEAxMZR29vZ2xlIElu
dGVybmV0IEF1dGhvcml0eTAeFw0xMjA5MTIxMTU3NTBaFw0xMzA2MDcxOTQzMjda
MGgxCzAJBgNVBAYTAlVTMRMwEQYDVQQIEwpDYWxpZm9ybmlhMRYwFAYDVQQHEw1N
b3VudGFpbiBWaWV3MRMwEQYDVQQKEwpHb29nbGUgSW5jMRcwFQYDVQQDEw5zbXRw
LmdtYWlsLmNvbTCBnzANBgkqhkiG9w0BAQEFAAOBjQAwgYkCgYEAv0UvQmjW1y96
cOK6AdQVEYPRd3ZQ9UhxkKfuVaYS9riOESFkWxkz+b3Ts/EOA5SY8axkaJS7Qa/v
N7laztYY8tTkx9Ml+eCY4xh0fFq9z4/WWADGqTY5I0wvqjZr+jBuYGulK1fU4ZUS
QpuZMMO9x7Bmr5LVP9C5r2qnoqtMtJUCAwEAAaOCAVEwggFNMB0GA1UdJQQWMBQG
CCsGAQUFBwMBBggrBgEFBQcDAjAdBgNVHQ4EFgQUaCtARMZ9urIDfdpR6v1AkQsr
44owHwYDVR0jBBgwFoAUv8Aw6/VDET5nup6R+/xq2uNrEiQwWwYDVR0fBFQwUjBQ
oE6gTIZKaHR0cDovL3d3dy5nc3RhdGljLmNvbS9Hb29nbGVJbnRlcm5ldEF1dGhv
cml0eS9Hb29nbGVJbnRlcm5ldEF1dGhvcml0eS5jcmwwZgYIKwYBBQUHAQEEWjBY
MFYGCCsGAQUFBzAChkpodHRwOi8vd3d3LmdzdGF0aWMuY29tL0dvb2dsZUludGVy
bmV0QXV0aG9yaXR5L0dvb2dsZUludGVybmV0QXV0aG9yaXR5LmNydDAMBgNVHRMB
Af8EAjAAMBkGA1UdEQQSMBCCDnNtdHAuZ21haWwuY29tMA0GCSqGSIb3DQEBBQUA
A4GBADSkwmtEUhy/AhX2sIULT0Q5S9OlfKxbyE8hEc8nxls3jbk5yKZYd35Bzyy8
raoUPFuD3IH+zP/FGj5LPQirjnJLUvuFDsiM4eowPUthQad9SGWWdz6hCx8HpEUZ
1ssGnwb3HX34e9RH57v9LdtVUPdFYQsBJ36miGPylWk6r0xx
-----END CERTIFICATE-----
subject=/C=US/ST=California/L=Mountain View/O=Google Inc/CN=smtp.gmail.com
issuer=/C=US/O=Google Inc/CN=Google Internet Authority
---
No client certificate CA names sent
---
SSL handshake has read 2304 bytes and written 383 bytes
---
New, TLSv1/SSLv3, Cipher is ECDHE-RSA-RC4-SHA
Server public key is 1024 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
SSL-Session:
Protocol : TLSv1.1
Cipher : ECDHE-RSA-RC4-SHA
Session-ID: 3A9E6D2BD679FD124B6772C91C74A5AFCEE7699A212D514FBC11710B684BDE31
Session-ID-ctx:
Master-Key: D7B5B70090660B2359CFD8B82582033C16B569DEE6ACE1F6EB2CDD4E2042A613410B5E6DD07643664ABC33E8049547B8
Key-Arg : None
PSK identity: None
PSK identity hint: None
SRP username: None
TLS session ticket lifetime hint: 100800 (seconds)
TLS session ticket:
0000 - 63 53 11 b3 92 0d 59 63-15 90 58 10 84 f2 f7 6a cS....Yc..X....j
0010 - e8 4b b0 a8 41 0a 73 0e-41 ee 3c a0 ab 91 df df .K..A.s.A.<.....
0020 - f0 24 b5 08 18 7d cc 56-05 9b 05 f4 e5 57 23 1b .$...}.V.....W#.
0030 - e0 00 33 e6 61 11 6b a2-9e 05 32 bb a3 99 8f 64 ..3.a.k...2....d
0040 - 50 2c 6c 3a 5f 46 d1 53-2d 2a 3f 6a 8d cd c5 c8 P,l:_F.S-*?j....
0050 - 4e 0a 15 63 04 e7 4e a0-01 51 79 93 38 3c de 62 N..c..N..Qy.8<.b
0060 - 75 76 7a 0e 1c fc 98 0f-04 b5 b2 59 2a 1e c3 e5 uvz........Y*...
0070 - aa d2 f6 2b 36 8c b8 97-77 77 9e 77 37 a7 ed 12 ...+6...ww.w7...
0080 - d5 85 30 d2 e8 42 67 e8-84 97 0a f2 b6 95 fd 2f ..0..Bg......../
0090 - e7 f2 de 0e ....
Start Time: 1354229935
Timeout : 300 (sec)
Verify return code: 20 (unable to get local issuer certificate)
---
250 ENHANCEDSTATUSCODES
ehlo
250-mx.google.com at your service, [188.79.92.35]
250-SIZE 35882577
250-8BITMIME
250-AUTH LOGIN PLAIN XOAUTH XOAUTH2
250 ENHANCEDSTATUSCODES
auth login
334 VXNlcm5hbWU6
MY_EMAIL_BASE64
334 UGFzc3dvcmQ6
MY_PASS_BASE64
235 2.7.0 Accepted
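The exchange in the transcript can be mirrored in Python. The sketch below only builds the lines a client would send and decodes the server's 334 prompts; it opens no connection. The helper names are hypothetical and the credentials are placeholders. It is written for Python 3, where `base64.b64encode` and sockets work on bytes. Note that in the transcript each credential is sent only after the matching 334 prompt has arrived; whether sending all three lines at once is what triggers the 501 reply in the question is an assumption, not something the transcript proves.

```python
import base64


def auth_login_lines(username: str, password: str) -> list:
    """Return the three byte strings a client sends for AUTH LOGIN."""
    def b64_line(text: str) -> bytes:
        return base64.b64encode(text.encode("utf-8")) + b"\r\n"
    return [b"AUTH LOGIN\r\n", b64_line(username), b64_line(password)]


def decode_prompt(reply: str) -> str:
    """Decode the Base64 payload of a '334 ...' server reply."""
    code, _, payload = reply.partition(" ")
    if code != "334":
        raise ValueError("not a 334 continuation reply: " + reply)
    return base64.b64decode(payload).decode("ascii")


print(decode_prompt("334 VXNlcm5hbWU6"))   # Username:
print(decode_prompt("334 UGFzc3dvcmQ6"))   # Password:

# Placeholder credentials; send each line only after reading the server's reply.
for line in auth_login_lines("user@example.com", "not-a-real-password"):
    print(line)
```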

how do I specify extended ascii (i.e. range(256)) in the python magic encoding specifier line?

I'm using mako templates to generate specialized config files. Some of these files contain extended ASCII chars (>127), but mako chokes saying that the chars are out of range when I use:
## -*- coding: ascii -*-
So I'm wondering if perhaps there's something like:
## -*- coding: eascii -*-
That I can use that will be ok with the range(128, 256) chars.
EDIT:
Here's the dump of the offending section of the file:
000001b0 39 c0 c1 c2 c3 c4 c5 c6 c7 c8 c9 ca cb cc cd ce |9...............|
000001c0 cf d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 da db dc dd de |................|
000001d0 df e0 e1 e2 e3 e4 e5 e6 e7 e8 e9 ea eb ec ed ee |................|
000001e0 ef f0 f1 f2 f3 f4 f5 f6 f7 f8 f9 fa fb fc fd fe |................|
000001f0 ff 5d 2b 28 27 73 29 3f 22 0a 20 20 20 20 20 20 |.]+('s)?".      |
00000200 20 20 74 6f 6b 65 6e 3a 20 57 4f 52 44 20 20 20 |  token: WORD   |
00000210 20 20 22 5b 41 2d 5a 61 2d 7a 30 2d 39 c0 c1 c2 |  "[A-Za-z0-9...|
00000220 c3 c4 c5 c6 c7 c8 c9 ca cb cc cd ce cf d0 d1 d2 |................|
00000230 d3 d4 d5 d6 d7 d8 d9 da db dc dd de df e0 e1 e2 |................|
00000240 e3 e4 e5 e6 e7 e8 e9 ea eb ec ed ee ef f0 f1 f2 |................|
00000250 f3 f4 f5 f6 f7 f8 f9 fa fb fc fd fe ff 5d 2b 28 |.............]+(|
The first character that mako complains about is 000001b4. If I remove this section, everything works fine. With the section inserted, mako complains:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 19: ordinal not in range(128)
It's the same complaint whether I use 'ascii' or 'latin-1' in the magic comment line.
Thanks!
Greg

Short answer
Use cp437 as the encoding for some retro DOS fun. All byte values greater than or equal to 32 decimal, except 127, are mapped to displayable characters in this encoding. Then use cp037 as the encoding for a truly trippy time. And then ask yourself how do you really know which of these, if either of them, is "correct".
Long answer
There is something you must unlearn: the absolute equivalence of byte values and characters.
Many basic text editors and debugging tools today, and also the Python language specification, imply an absolute equivalence between bytes and characters when in reality none exists. It is not true that 74 6f 6b 65 6e is "token". Only for ASCII-compatible character encodings is this correspondence valid. In EBCDIC, which is still quite common today, "token" corresponds to byte values a3 96 92 85 95.
So while the Python 2.6 interpreter happily evaluates 'text' == u'text' as True, it shouldn't, because they are only equivalent under the assumption of ASCII or a compatible encoding, and even then they should not be considered equal. (At least '\xfd' == u'\xfd' is False and gets you a warning for trying.) Python 3.1 evaluates 'text' == b'text' as False. But even the acceptance of this expression by the interpreter implies an absolute equivalence of byte values and characters, because the expression b'text' is taken to mean "the byte-string you get when you apply the ASCII encoding to 'text'" by the interpreter.
As far as I know, every programming language in widespread use today carries an implicit use of ASCII or ISO-8859-1 (Latin-1) character encoding somewhere in its design. In C, the char data type is really a byte. I saw one Java 1.4 VM where the constructor java.lang.String(byte[] data) assumed ISO-8859-1 encoding. Most compilers and interpreters assume ASCII or ISO-8859-1 encoding of source code (some let you change it). In Java, string length is really the UTF-16 code unit length, which is arguably wrong for characters U+10000 and above. In Unix, filenames are byte-strings interpreted according to terminal settings, allowing you to open('a\x08b', 'w').write('Say my name!').
So we have all been trained and conditioned by the tools we have learned to trust, to believe that 'A' is 0x41. But it isn't. 'A' is a character and 0x41 is a byte and they are simply not equal.
Once you have become enlightened on this point, you will have no trouble resolving your issue. You have simply to decide what component in the software is assuming the ASCII encoding for these byte values, and how to either change that behavior or ensure that different byte values appear instead.
PS: The phrases "extended ASCII" and "ANSI character set" are misnomers.
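The distinction between bytes and characters drawn in the long answer can be observed directly. The following is a small Python 3 sketch (the answer itself discusses Python 2.6 and 3.1); the codec names are the ones shipped with the standard library.

```python
# The same five bytes are "token" only under an ASCII-compatible encoding.
raw = bytes.fromhex("746f6b656e")
print(raw.decode("ascii"))             # token
print(repr(raw.decode("cp037")))       # something else entirely under EBCDIC

# Conversely, "token" in EBCDIC (cp037) is a different byte sequence.
print("token".encode("cp037").hex())   # a396928595

# One byte from the question's dump, three different answers.
high = bytes([0xC3])
print(repr(high.decode("latin-1")))
print(repr(high.decode("cp437")))
try:
    high.decode("ascii")
except UnicodeDecodeError as exc:
    print(exc)
```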

Try
## -*- coding: UTF-8 -*-
or
## -*- coding: latin-1 -*-
or
## -*- coding: cp1252 -*-
depending on what you really need. The last two are similar except:
The Windows-1252 codepage coincides with ISO-8859-1 for all codes except the range 128 to 159 (hex 80 to 9F), where the little-used C1 controls are replaced with additional characters. Windows-28591 is the actual ISO-8859-1 codepage.
where ISO-8859-1 is the official name for latin-1.
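How the three suggested encodings differ can be checked with a few lines of Python 3. This is only a sketch of the codec behaviour; it says nothing about how mako itself reads the magic comment.

```python
# The byte range from the question's dump: 0xc0 .. 0xff.
sample = bytes(range(0xC0, 0x100))

# latin-1 and cp1252 agree on this range ...
same_high_range = sample.decode("latin-1") == sample.decode("cp1252")
print(same_high_range)

# ... and differ only in 0x80 .. 0x9f, e.g. for byte 0x80.
print(repr(b"\x80".decode("cp1252")))    # '€'
print(repr(b"\x80".decode("latin-1")))   # '\x80' (a C1 control character)

# The raw bytes are not valid UTF-8, so that choice fails for this file.
try:
    sample.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print(utf8_ok)
```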

Try examining your data with a critical eye:
000001b0 39 c0 c1 c2 c3 c4 c5 c6 c7 c8 c9 ca cb cc cd ce |9...............|
000001c0 cf d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 da db dc dd de |................|
000001d0 df e0 e1 e2 e3 e4 e5 e6 e7 e8 e9 ea eb ec ed ee |................|
000001e0 ef f0 f1 f2 f3 f4 f5 f6 f7 f8 f9 fa fb fc fd fe |................|
000001f0 ff 5d 2b 28 27 73 29 3f 22 0a 20 20 20 20 20 20 |.]+('s)?".      |
00000200 20 20 74 6f 6b 65 6e 3a 20 57 4f 52 44 20 20 20 |  token: WORD   |
00000210 20 20 22 5b 41 2d 5a 61 2d 7a 30 2d 39 c0 c1 c2 |  "[A-Za-z0-9...|
00000220 c3 c4 c5 c6 c7 c8 c9 ca cb cc cd ce cf d0 d1 d2 |................|
00000230 d3 d4 d5 d6 d7 d8 d9 da db dc dd de df e0 e1 e2 |................|
00000240 e3 e4 e5 e6 e7 e8 e9 ea eb ec ed ee ef f0 f1 f2 |................|
00000250 f3 f4 f5 f6 f7 f8 f9 fa fb fc fd fe ff 5d 2b 28 |.............]+(|
The non-ASCII runs in this dump are two copies of every byte value from 0xc0 to 0xff inclusive. You appear to have a binary file (perhaps a dump of compiled regexes), not a text file. I suggest that you read it as a binary file rather than pasting it into your Python source file. You should also read the mako docs to find out what it expects.
Update after eyeballing the text part of your dump: You may well be able to express this in ASCII-only regexes e.g. you would have a line containing
token: WORD "[A-Za-z0-9\xc0-\xff]+(etc)etc"
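As an illustration of the escaped form, the sketch below compiles the character class with Python 3's `re` module from a source file that contains only ASCII characters. The grammar file in the question is not Python, so this only shows that the `\xc0-\xff` escapes express the same range; the names are hypothetical.

```python
import re

# Text pattern: \xc0-\xff denotes the code points U+00C0 .. U+00FF.
WORD = re.compile('[A-Za-z0-9\xc0-\xff]+')

# Bytes pattern: the same range applied to raw byte values.
WORD_BYTES = re.compile(rb'[A-Za-z0-9\xc0-\xff]+')

print(bool(WORD.fullmatch('caf\xe9')))                           # True
print(bool(WORD_BYTES.fullmatch('caf\xe9'.encode('latin-1'))))   # True
print(bool(WORD.fullmatch('two words')))                         # False
```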
