Cryptography.Hazmat AES Cipher Output Length - python

In Python, using the cryptography.hazmat module for AES, the output length of the encryption is not a multiple of 16. Am I implementing the encryption cipher wrong, and if so, what is wrong? The output length I receive is 16 + len(input) (16 being the length of the IV). Here is the code:
from os import urandom
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives.ciphers import Cipher
from cryptography.hazmat.primitives.ciphers.algorithms import AES
from cryptography.hazmat.primitives.ciphers.modes import CBC, OFB, CFB

backend = default_backend()

class AES_Cipher:
    def __init__(self, key):
        self.key = key

    def encrypt(self, plain_text):
        # A fresh random IV is generated per message and prepended to the output.
        initialization_vector = urandom(16)
        cipher = Cipher(AES(self.key), OFB(initialization_vector), backend)
        encryption_engine = cipher.encryptor()
        return initialization_vector + encryption_engine.update(plain_text.encode("utf-8")) + encryption_engine.finalize()

    def decrypt(self, cipher_text):
        # The first 16 bytes are the IV, the rest is the actual ciphertext.
        initialization_vector = cipher_text[:16]
        cipher = Cipher(AES(self.key), OFB(initialization_vector), backend)
        decryption_engine = cipher.decryptor()
        return (decryption_engine.update(cipher_text[16:]) + decryption_engine.finalize()).decode("utf-8")
The cipher is called as so:
from hashlib import sha3_256
aes_key = sha3_256(b"Strong Encryption Key").digest()
aes_engine = AES_Cipher(aes_key)
aes_engine.encrypt("Hello World")
And this is the result:
b'\xc4I\xf2\xe5\xf4\xaeX\x96\xa5\xfe\xbd+\xde\x8ca\xd5\xdb\xad\x97S\x01\x81C\x9e\xd5\xd8#'
This is only 27 bytes long, compared to the expected 32 bytes. The 27 = 16 + len("Hello World"). Why is it not 32 bytes long? What is the code missing? Another thing; decryption works perfectly fine.

The length of 27 bytes is correct for OFB-mode.
The OFB-mode used in the Python-code turns a block cipher into a stream cipher. The difference between block cipher and stream cipher is described in more detail here. In particular, the length of the plaintext input can be arbitrary for a stream cipher, i.e. in contrast to a block cipher, the length does not have to be an integer multiple of the blocksize, so that no padding is required. The generated ciphertext has the same length as the plaintext.
In the current example, the plaintext Hello World, and therefore also the ciphertext, has a length of 11 bytes. Together with the IV, which has a length of 16 bytes, the total length is 27 bytes, which corresponds exactly to your result.
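The arithmetic can be checked without any crypto library. The following is only a minimal sketch: the helper name expected_length is hypothetical, and it assumes that a 16-byte IV is prepended and that a padded block mode would use PKCS#7-style padding, which always adds at least one byte.

```python
BLOCK_SIZE = 16  # AES block size in bytes
IV_SIZE = 16     # length of the IV prepended to the ciphertext


def expected_length(plaintext_len, stream_mode):
    """Return len(IV + ciphertext) for a given plaintext length."""
    if stream_mode:
        # OFB, CFB and CTR produce exactly as many bytes as they consume.
        return IV_SIZE + plaintext_len
    # Padded block modes (e.g. CBC with PKCS#7) round up to the next full
    # block; a plaintext that already fills a block gains one extra block.
    padded_len = (plaintext_len // BLOCK_SIZE + 1) * BLOCK_SIZE
    return IV_SIZE + padded_len


message = "Hello World".encode("utf-8")
print(expected_length(len(message), stream_mode=True))   # 27
print(expected_length(len(message), stream_mode=False))  # 32
```

The 32 bytes expected in the question would therefore only appear with a padded block mode such as CBC, not with OFB.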

Related

How to make Python Crypto AES faster

I'm looking for a very fast way for encrypting and decrypting short text snippets. Security is secondary in my use-case. Light encryption with a constant IV is fine. I'm currently doing this:
BS = 16
pad = lambda s: s + (BS - len(s) % BS) * chr(BS - len(s) % BS)
unpad = lambda s : s[:-ord(s[len(s)-1:])]

import base64
from Crypto.Cipher import AES

iv = '0123456789012345'

def encrypt(raw, key):
    raw = pad(raw)
    cipher = AES.new(key, AES.MODE_CBC, iv)
    return base64.b64encode( cipher.encrypt( raw ) )

def decrypt(enc, key):
    enc = base64.b64decode(enc)
    cipher = AES.new(key, AES.MODE_CBC, iv)
    return unpad(cipher.decrypt( enc ))

enc_text = encrypt('Hello World!','xyz1234567890abc')
print decrypt(enc_text, 'xyz1234567890abc')
How can I make this faster? Maybe by using another AES mode (MODE_CBC?), or is there a faster padding function, a faster way of hex-converting the output?
Counter mode (AES.MODE_CTR) will be faster for multi-block messages, as it can be parallelized for both encryption and decryption. CBC is serial on encryption because the resulting output of each block-cipher operation is fed as input to be XOR with the plaintext of the next block before encrypting. Because CTR generates a keystream by encrypting each (sequential) counter value with the key, it does not rely on the output of any previous block operation and can perform the tasks in parallel.
In addition, because CTR operates as a stream cipher, no message padding is required, so you'll save time on that operation in and out.
Note: Don't re-use counters. You mentioned that confidentiality is a secondary concern here, but while re-using IVs in CBC mode is "bad", re-using counters in CTR mode is end of the world bad. Just use a sequential counter (literally i++) combined with 64 bits of the ms since epoch start and you'll be fine. (See Stream Reuse or Many Time Pad Attack for good examples of why).
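One way to read the advice above is as a 128-bit counter block made of a 64-bit millisecond timestamp followed by a 64-bit block counter. The sketch below only builds such blocks with the standard library. The helper make_counter_blocks is hypothetical, the layout is an assumption about what the answer means, and it does nothing to protect against two messages that start within the same millisecond.

```python
import struct
import time


def make_counter_blocks(num_blocks, timestamp_ms=None):
    """Yield 16-byte counter blocks: 8-byte timestamp prefix + 8-byte counter."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    for i in range(num_blocks):
        # ">QQ" packs two unsigned 64-bit integers in big-endian order.
        yield struct.pack(">QQ", timestamp_ms, i)


# A fixed timestamp is used here only to make the example reproducible.
blocks = list(make_counter_blocks(3, timestamp_ms=1700000000000))
for block in blocks:
    print(block.hex())
```

Each block would be encrypted with the key to produce 16 bytes of keystream, so no two blocks under the same key may ever repeat.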

PyCrypto returns a keysize error for AES

I'm trying to write a program to encrypt files using AES, however I get a
ValueError: AES key must be either 16, 24, or 32 bytes long
error no matter the size of the key.
My code for generating the AES object is
def AESEncryptor(Seed, Block = 16): #Generate AES key and Cipher
    iv = Random.new().read(Block)
    cipher = AES.new(Seed.encode('utf8'), AES.MODE_CBC, iv)
    return cipher, iv
And my code for generating the key is
def genNewSeed(k=2048): #Generate seed for new AES key
    return hashlib.sha256(os.urandom(32)).hexdigest()[:11]
Which, according to sys.getsizeof() is equal to 32 bits yet it still returns the error
The problem is that you're slicing off only 11 bytes from a hex-encoded "seed" of 64 characters. Keep in mind that keys are supposed to have high entropy, otherwise it's easier to brute force them.
I suggest you use:
def AESEncryptor(Seed, Block = 16): #Generate AES key and Cipher
    iv = Random.new().read(Block)
    cipher = AES.new(Seed, AES.MODE_CBC, iv)
    return cipher, iv

def genNewSeed(k=2048): #Generate seed for new AES key
    return hashlib.sha256(os.urandom(32)).digest()
This will give you a 32 byte key which makes this AES-256. If you want AES-128, then you can slice the last 16 bytes off:
hashlib.sha256(os.urandom(32)).digest()[:16]
You cannot use sys.getsizeof() to determine the size of a key, because it includes the object's internal bookkeeping (reference count, type pointer, length and so on). For example, an empty string already has a size of 21 bytes here. That's why you thought you had 32 bytes when you only had 11 (21 + 11 = 32). Use the built-in len(key) instead.
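The difference is easy to see side by side. This is a small sketch using only the standard library; the exact sys.getsizeof() values depend on the Python version and platform, so no specific overhead is assumed.

```python
import hashlib
import os
import sys

# The "key" from the question: 11 hex characters, far too short for AES.
seed = hashlib.sha256(os.urandom(32)).hexdigest()[:11]
# The raw digest: 32 bytes, suitable as an AES-256 key.
key = hashlib.sha256(os.urandom(32)).digest()

# len() reports the payload; sys.getsizeof() adds the object overhead.
print(len(seed), sys.getsizeof(seed))
print(len(key), sys.getsizeof(key))
```

Only the len() values matter to AES, which accepts 16, 24 or 32 bytes.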

PyCrypto - How does the Initialization Vector work?

I'm trying to understand how PyCrypto works so I can use it in a project, but I don't fully understand the significance of the Initialization Vector (IV). I've found that I can use the wrong IV when decoding a string and still seem to get the message back, except for the first 16 bytes (the block size). Am I simply using it wrong or not understanding something?
Here's a sample code to demonstrate:
import Crypto
import Crypto.Random
from Crypto.Cipher import AES

def pad_data(data):
    if len(data) % 16 == 0:
        return data
    databytes = bytearray(data)
    padding_required = 15 - (len(databytes) % 16)
    databytes.extend(b'\x80')
    databytes.extend(b'\x00' * padding_required)
    return bytes(databytes)

def unpad_data(data):
    if not data:
        return data
    data = data.rstrip(b'\x00')
    if data[-1] == 128: # b'\x80'[0]:
        return data[:-1]
    else:
        return data

def generate_aes_key():
    rnd = Crypto.Random.OSRNG.posix.new().read(AES.block_size)
    return rnd

def encrypt(key, iv, data):
    aes = AES.new(key, AES.MODE_CBC, iv)
    data = pad_data(data)
    return aes.encrypt(data)

def decrypt(key, iv, data):
    aes = AES.new(key, AES.MODE_CBC, iv)
    data = aes.decrypt(data)
    return unpad_data(data)

def test_crypto ():
    key = generate_aes_key()
    iv = generate_aes_key() # get some random value for IV
    msg = b"This is some super secret message. Please don't tell anyone about it or I'll have to shoot you."
    code = encrypt(key, iv, msg)
    iv = generate_aes_key() # change the IV to something random
    decoded = decrypt(key, iv, code)
    print(decoded)

if __name__ == '__main__':
    test_crypto()
I'm using Python 3.3.
Output will vary on execution, but I get something like this: b"1^,Kp}Vl\x85\x8426M\xd2b\x1aer secret message. Please don't tell anyone about it or I'll have to shoot you."
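The behavior you see is specific to CBC mode. In CBC decryption, each ciphertext block is run through the block cipher and the result is XORed with the previous ciphertext block; only the first block is XORed with the IV (Wikipedia's article on block cipher modes of operation has a diagram of this).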
You can see that IV only contributes to the first 16 bytes of plaintext. If the IV is corrupted while it is in transit to the receiver, CBC will still correctly decrypt all blocks but the first one. In CBC, the purpose of the IV is to enable you to encrypt the same message with the same key, and still get a totally different ciphertext each time (even though the message length may give something away).
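The effect can be reproduced with the standard library alone. The sketch below is illustrative, not real cryptography: toy_encrypt_block and toy_decrypt_block are a hypothetical 4-round Feistel construction built on SHA-256 that merely stands in for AES as "some invertible 16-byte block cipher", and the CBC functions skip padding by assuming the message length is a multiple of 16.

```python
import hashlib
import os

BLOCK = 16


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key, i, half):
    # Keyed round function for the toy Feistel network (NOT AES).
    return hashlib.sha256(key + bytes([i]) + half).digest()[:8]


def toy_encrypt_block(key, block):
    left, right = block[:8], block[8:]
    for i in range(4):
        left, right = right, xor(left, _round(key, i, right))
    return left + right


def toy_decrypt_block(key, block):
    left, right = block[:8], block[8:]
    for i in reversed(range(4)):
        left, right = xor(right, _round(key, i, left)), left
    return left + right


def cbc_encrypt(key, iv, data):
    prev, out = iv, b""
    for i in range(0, len(data), BLOCK):
        # Each plaintext block is XORed with the previous ciphertext block.
        prev = toy_encrypt_block(key, xor(data[i:i + BLOCK], prev))
        out += prev
    return out


def cbc_decrypt(key, iv, data):
    prev, out = iv, b""
    for i in range(0, len(data), BLOCK):
        block = data[i:i + BLOCK]
        # The IV is only ever used here for the very first block.
        out += xor(toy_decrypt_block(key, block), prev)
        prev = block
    return out


key, iv, wrong_iv = os.urandom(16), os.urandom(16), os.urandom(16)
msg = b"0123456789abcdef" * 3  # exactly three blocks, so no padding is needed

garbled = cbc_decrypt(key, wrong_iv, cbc_encrypt(key, iv, msg))
print(garbled[:BLOCK] == msg[:BLOCK])  # first block is damaged
print(garbled[BLOCK:] == msg[BLOCK:])  # remaining blocks are intact
```

The damaged first block is exactly the original block XORed with both IVs, which is why a wrong IV garbles 16 bytes and nothing more.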
Other modes are less forgiving. If you get the IV wrong, the whole message is garbled at decryption. Take CTR mode for instance, where the nonce plays almost the same role as the IV: a wrong nonce produces a wrong keystream for every block.
The developer for PyCrypto pulled the specification for AES CBC Mode from NIST:
AES Mode_CBC -> referencing NIST 800-38a (The Recommendation for Cipher Mode Operations)
From that, page 8:
5.3 Initialization Vectors
The input to the encryption processes of the CBC, CFB, and OFB modes includes, in addition to the plaintext, a data block called the initialization vector (IV), denoted IV. The IV is used in an initial step in the encryption of a message and in the corresponding decryption of the message.
The IV need not be secret; however, for the CBC and CFB modes, the IV for any particular execution of the encryption process must be unpredictable, and, for the OFB mode, unique IVs must be used for each execution of the encryption process. The generation of IVs is discussed in Appendix C.
The thing to remember is that you need a fresh random IV for every message you compose. The IV acts like a 'salt' that makes each ciphertext unique; even though it travels in the open, it does not help an attacker break the encryption as long as the AES key is unknown. If you instead use the same 16 bytes for every message, repeated messages will look the same on the wire, and you could be subject to frequency and/or replay attacks.
A test for the results of random IVs vs static:
import binascii

def test_crypto ():
    print("Same IVs same key:")
    key = generate_aes_key()
    iv = b"1234567890123456"
    msg = b"This is some super secret message. Please don't tell anyone about it or I'll have to shoot you."
    # binascii.hexlify works on both Python 2 and 3; bytes has no .encode('hex') on Python 3.
    code = encrypt(key, iv, msg)
    print(binascii.hexlify(code))
    decoded = decrypt(key, iv, code)
    print(decoded)
    code = encrypt(key, iv, msg)
    print(binascii.hexlify(code))
    decoded = decrypt(key, iv, code)
    print(decoded)
    print("Different IVs same key:")
    iv = generate_aes_key()
    code = encrypt(key, iv, msg)
    print(binascii.hexlify(code))
    decoded = decrypt(key, iv, code)
    print(decoded)
    iv = generate_aes_key()
    code = encrypt(key, iv, msg)
    print(binascii.hexlify(code))
    decoded = decrypt(key, iv, code)
    print(decoded)
Hope this helps!

How AES in CTR works for Python with PyCrypto?

I am using python 2.7.1
I want to encrypt something using AES in CTR mode. I installed the PyCrypto library for Python and wrote the following code:
secret = os.urandom(16)
crypto = AES.new(os.urandom(32), AES.MODE_CTR, counter=lambda: secret)
encrypted = crypto.encrypt("asdk")
print crypto.decrypt(encrypted)
I have to run crypto.decrypt as many times as the byte size of my plaintext in order to get the decrypted data back correctly, i.e.:
encrypted = crypto.encrypt("test")
print crypto.decrypt(encrypted)
print crypto.decrypt(encrypted)
print crypto.decrypt(encrypted)
print crypto.decrypt(encrypted)
The last call to decrypt gives me the plaintext back. The other outputs from decrypt are gibberish strings.
I am wondering whether this is normal. Do I have to wrap it in a loop with as many iterations as the size of my plaintext every time, or have I gotten something wrong?
I'm going to elaborate on @gertvdijk's explanation of why the cipher behaved the way it did in the original question (my edit was rejected), but also point out that setting up the counter to return a static value is a major flaw and show how to set it up correctly.
Reset the counter for new operations
The reason why this behaves as you described in the question is because your plain text (4 bytes / 32 bits) is four times as small as the size of the key stream blocks that the CTR cipher outputs for encryption (16 bytes/128 bits).
Because you're using the same fixed value over and over instead of an actual counter, the cipher keeps spitting out the same 16 byte blocks of keystream. You can observe this by encrypting 16 null bytes repeatedly:
>>> crypto.encrypt('\x00'*16)
'?\\-\xdc\x16`\x05p\x0f\xa7\xca\x82\xdbE\x7f/'
>>> crypto.encrypt('\x00'*16)
'?\\-\xdc\x16`\x05p\x0f\xa7\xca\x82\xdbE\x7f/'
You also don't reset the cipher's state before performing decryption, so the 4 bytes of ciphertext are decrypted against the next 4 bytes of XOR key from the first output stream block. This can also be observed by encrypting and decrypting null bytes:
>>> crypto.encrypt('\x00' * 4)
'?\\-\xdc'
>>> crypto.decrypt('\x00' * 4)
'\x16`\x05p'
If this were to work the way you wanted, the result of both of those operations should be the same. Instead, you can see the first four bytes of the 16 byte block in the first result, and the second four bytes in the second result.
After you've used up the 16 byte block of XOR key by performing four operations on four-byte values (for a 16 byte total), a new block of XOR key is generated. The first four bytes (as well as all the others) of each XOR key block are the same, so when you call decrypt this time, it gives you back the plaintext.
This is really bad! You should not use AES-CTR this way - it's equivalent to simple XOR encryption with a 16 byte repeating key, which can be broken pretty easily.
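To illustrate how weak a repeating 16-byte keystream is, here is a minimal known-plaintext sketch. It uses no AES at all: keystream_block is random bytes standing in for the output of encrypting the fixed counter value, xor_repeating is a hypothetical helper, and the attacker is assumed to know the first 16 bytes of the message.

```python
import os


def xor_repeating(data, keystream_block):
    """XOR data with a keystream block that repeats every 16 bytes."""
    return bytes(b ^ keystream_block[i % len(keystream_block)]
                 for i, b in enumerate(data))


keystream_block = os.urandom(16)  # stand-in for AES_k(fixed counter value)
secret = b"Attack at dawn! Meet at the old bridge at 0500."
ciphertext = xor_repeating(secret, keystream_block)

# The attacker knows (or guesses) the first 16 bytes of the plaintext,
# XORs them with the ciphertext and obtains the whole keystream block.
known_prefix = secret[:16]
recovered_block = bytes(c ^ p for c, p in zip(ciphertext, known_prefix))

# With the block recovered, the rest of the message decrypts as well.
print(xor_repeating(ciphertext, recovered_block))
```

The same recovered block would also decrypt every other message produced with that key and fixed counter.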
Solution
You have to reset the state of the cipher before performing an operation on a new stream of data (or another operation on it), as the original instance will no longer be in the correct initial state. Your issue will be solved by instantiating a new crypto object for the decryption, as well as resetting the counter and keystream position.
You also need to use a proper counter function that combines a nonce with a counter value that increases each time a new block of keystream is generated. PyCrypto has a Counter class that can do this for you.
from Crypto.Cipher import AES
from Crypto.Util import Counter
from Crypto import Random
# Set up the counter with a nonce.
# 64 bit nonce + 64 bit counter = 128 bit output
nonce = Random.get_random_bytes(8)
countf = Counter.new(64, nonce)
key = Random.get_random_bytes(32) # 256 bits key
# Instantiate a crypto object first for encryption
encrypto = AES.new(key, AES.MODE_CTR, counter=countf)
encrypted = encrypto.encrypt("asdk")
# Reset counter and instantiate a new crypto object for decryption
countf = Counter.new(64, nonce)
decrypto = AES.new(key, AES.MODE_CTR, counter=countf)
print decrypto.decrypt(encrypted) # prints "asdk"
Start with a new crypto object for new operations
The reason why this behaves as you described in the question is because your plain text (4 bytes / 32 bits) is four times as small as the size the cryptographic engine works on for your chosen AES mode (128 bits) and also reusing the same instance of the crypto object. Simply don't reuse the same object if you're performing an operation on a new stream of data (or another operation on it). Your issue will be solved by instantiating a new crypto object for the decryption, like this:
# *NEVER* USE A FIXED COUNTER LIKE THE ONE BELOW IN PRODUCTION CODE. READ THE DOCS.
counter = os.urandom(16)
key = os.urandom(32) # 256 bits key
# Instantiate a crypto object first for encryption
encrypto = AES.new(key, AES.MODE_CTR, counter=lambda: counter)
encrypted = encrypto.encrypt("asdk")
# Instantiate a new crypto object for decryption
decrypto = AES.new(key, AES.MODE_CTR, counter=lambda: counter)
print decrypto.decrypt(encrypted) # prints "asdk"
Why it is not about padding with AES-CTR
This answer started out as a response on the answer by Marcus, in which he initially indicated the use of padding would solve it. While I understand it looks like symptoms of a padding issue, it certainly is not.
The whole point of AES-CTR is that you do not need padding, as it's a stream cipher (unlike ECB/CBC and so on)! Stream ciphers work on streams of data, rather than chunking data into blocks and chaining them in the actual cryptographic computation.
In addition to what Marcus says, the Crypto.Util.Counter class can be used to build your counter block function.
According to @gertvdijk, AES-CTR is a stream cipher which does not need padding, so I've deleted the related code.
Here's what I know.
You have to use the same key (the first parameter in AES.new(...)) for encryption and decryption, and keep the key private.
The encryption/decryption methods are stateful, which means crypto.en(de)crypt("abcd")==crypto.en(de)crypt("abcd") is not always true. In your CTR setup, the counter callback always returns the same thing, so encryption appears stateless (I am not 100% sure that is the reason), but decryption is still somewhat stateful. In conclusion, we should always use a new object for each operation.
The counter callback function must behave the same in encryption and decryption. In your case, that means both return the same secret. Yet I don't think the secret is really a "secret". You can use a randomly generated "secret" and pass it between the communicating peers without any encryption so that the other side can use it directly, as long as the secret is not predictable.
So I would write my cipher like this, hope it will offer some help.
import os
import hashlib
import Crypto.Cipher.AES as AES

class Cipher:
    @staticmethod
    def md5sum( raw ):
        m = hashlib.md5()
        m.update(raw)
        return m.hexdigest()

    BS = AES.block_size

    @staticmethod
    def pad( s ):
        """note that the padding is no necessary"""
        """return s + (Cipher.BS - len(s) % Cipher.BS) * chr(Cipher.BS - len(s) % Cipher.BS)"""
        return s

    @staticmethod
    def unpad( s ):
        """return s[0:-ord(s[-1])]"""
        return s

    def __init__(self, key):
        self.key = Cipher.md5sum(key)
        #the state of the counter callback
        self.cnter_cb_called = 0
        self.secret = None

    def _reset_counter_callback_state( self, secret ):
        self.cnter_cb_called = 0
        self.secret = secret

    def _counter_callback( self ):
        """
        this function should be stateful
        """
        self.cnter_cb_called += 1
        return self.secret[self.cnter_cb_called % Cipher.BS] * Cipher.BS

    def encrypt(self, raw):
        secret = os.urandom( Cipher.BS ) #random choose a "secret" which is not secret
        self._reset_counter_callback_state( secret )
        cipher = AES.new( self.key, AES.MODE_CTR, counter = self._counter_callback )
        raw_padded = Cipher.pad( raw )
        enc_padded = cipher.encrypt( raw_padded )
        return secret+enc_padded #yes, it is not secret

    def decrypt(self, enc):
        secret = enc[:Cipher.BS]
        self._reset_counter_callback_state( secret )
        cipher = AES.new( self.key, AES.MODE_CTR, counter = self._counter_callback )
        enc_padded = enc[Cipher.BS:] #we didn't encrypt the secret, so don't decrypt it
        raw_padded = cipher.decrypt( enc_padded )
        return Cipher.unpad( raw_padded )
Some test:
>>> from Cipher import Cipher
>>> x = Cipher("this is key")
>>> "a"==x.decrypt(x.encrypt("a"))
True
>>> "b"==x.decrypt(x.encrypt("b"))
True
>>> "c"==x.decrypt(x.encrypt("c"))
True
>>> x.encrypt("a")==x.encrypt("a")
False #though the input is same, the outputs are different
Reference: http://packages.python.org/pycrypto/Crypto.Cipher.blockalgo-module.html#MODE_CTR

AES 256 Encryption with PyCrypto using CBC mode - any weaknesses?

I have the following python script to encrypt/decrypt data using AES 256, could you please tell me if there's anything in the code that may make the encryption weak or if there's anything that I've not taken account of for AES 256 encryption using CBC mode? I've tested the script and it works fine, it is encrypting and decrypting data but just wanted a second opinion. Thanks.
from Crypto.Cipher import AES
from Crypto import Random

BLOCK_SIZE = 32
INTERRUPT = u'\u0001'
PAD = u'\u0000'

def AddPadding(data, interrupt, pad, block_size):
    new_data = ''.join([data, interrupt])
    new_data_len = len(new_data)
    remaining_len = block_size - new_data_len
    to_pad_len = remaining_len % block_size
    pad_string = pad * to_pad_len
    return ''.join([new_data, pad_string])

def StripPadding(data, interrupt, pad):
    return data.rstrip(pad).rstrip(interrupt)

SECRET_KEY = Random.new().read(32)
IV = Random.new().read(16)
cipher_for_encryption = AES.new(SECRET_KEY, AES.MODE_CBC, IV)
cipher_for_decryption = AES.new(SECRET_KEY, AES.MODE_CBC, IV)

def EncryptWithAES(encrypt_cipher, plaintext_data):
    plaintext_padded = AddPadding(plaintext_data, INTERRUPT, PAD, BLOCK_SIZE)
    encrypted = encrypt_cipher.encrypt(plaintext_padded)
    return encrypted

def DecryptWithAES(decrypt_cipher, encrypted_data):
    decoded_encrypted_data = encrypted_data
    decrypted_data = decrypt_cipher.decrypt(decoded_encrypted_data)
    return StripPadding(decrypted_data, INTERRUPT, PAD)

our_data_to_encrypt = u'abc11100000'
encrypted_data = EncryptWithAES(cipher_for_encryption, our_data_to_encrypt)
print ('Encrypted string:', encrypted_data)
decrypted_data = DecryptWithAES(cipher_for_decryption, encrypted_data)
print ('Decrypted string:', decrypted_data)
I've seen this code posted on the internet. There are, in principle, not too many things wrong with it, but there is no need to invent your own padding. Furthermore, I don't see why the first padding character is called INTERRUPT. I presume that INTERRUPT and PAD are each handled as a single byte (I'm not a Python expert).
The most common padding is PKCS#5 padding (strictly speaking PKCS#7 for AES's 16-byte blocks; the scheme is the same). It consists of N bytes, each with the value N, where N is the number of padding bytes. The padding used here looks more like 'ISO' padding, which consists of a single bit set to 1 to distinguish it from the data, followed by zero bits. In code, that first padding byte would be code point \u0080 rather than \u0001.
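Both schemes fit in a few lines of pure Python. This is a sketch rather than a vetted implementation: the function names are hypothetical, and the 'ISO' scheme is taken here to mean the 0x80-then-zeros padding commonly attributed to ISO/IEC 7816-4.

```python
def pkcs7_pad(data, block_size=16):
    pad_len = block_size - len(data) % block_size  # always 1..block_size
    return data + bytes([pad_len]) * pad_len


def pkcs7_unpad(padded):
    pad_len = padded[-1]
    if (pad_len < 1 or pad_len > len(padded)
            or padded[-pad_len:] != bytes([pad_len]) * pad_len):
        raise ValueError("invalid PKCS#7 padding")
    return padded[:-pad_len]


def iso7816_pad(data, block_size=16):
    pad_len = block_size - len(data) % block_size
    # One 0x80 marker byte, then zero bytes up to the block boundary.
    return data + b"\x80" + b"\x00" * (pad_len - 1)


def iso7816_unpad(padded):
    stripped = padded.rstrip(b"\x00")
    if not stripped.endswith(b"\x80"):
        raise ValueError("invalid ISO/IEC 7816-4 padding")
    return stripped[:-1]


data = b"abc11100000"
print(pkcs7_pad(data))    # b'abc11100000\x05\x05\x05\x05\x05'
print(iso7816_pad(data))  # b'abc11100000\x80\x00\x00\x00\x00'
```

Unlike the hand-rolled padding in the question, both variants always add at least one byte, so unpadding is unambiguous even when the data itself ends in zero bytes.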
So the encryption (which can provide confidentiality of data) seems to be used correctly. It depends on the use case if you also need integrity protection and/or authentication, e.g. by using a MAC or HMAC. Of course, no legal guarantees or anything provided.
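If integrity protection is needed, one common arrangement is encrypt-then-MAC with HMAC-SHA256 from the standard library. The following is a hedged sketch: add_tag and verify_and_strip are hypothetical helpers, the MAC key is assumed to be separate from the encryption key, and the ciphertext is a random placeholder rather than real AES-CBC output.

```python
import hashlib
import hmac
import os

TAG_SIZE = 32  # HMAC-SHA256 output length in bytes
IV_SIZE = 16


def add_tag(mac_key, iv, ciphertext):
    # The tag covers the IV as well as the ciphertext.
    tag = hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()
    return iv + ciphertext + tag


def verify_and_strip(mac_key, blob):
    body, tag = blob[:-TAG_SIZE], blob[-TAG_SIZE:]
    expected = hmac.new(mac_key, body, hashlib.sha256).digest()
    # compare_digest avoids leaking how many leading bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return body[:IV_SIZE], body[IV_SIZE:]  # (iv, ciphertext)


mac_key = os.urandom(32)
iv = os.urandom(IV_SIZE)
ciphertext = os.urandom(32)  # placeholder for the output of cipher.encrypt(...)

blob = add_tag(mac_key, iv, ciphertext)
print(verify_and_strip(mac_key, blob) == (iv, ciphertext))
```

The tag is verified before any decryption or unpadding is attempted, so tampered messages are rejected early.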
